TD M2 cis-reg ENS

Exercise 1: Motif descriptors

The goal of this first exercise is to manipulate different motif descriptors: consensus sequences and count matrices.
You will use the program convert-matrix from the RSA-tools suite. This tool allows you to:

Perform interconversions between various matrix formats
Produce consensus and regular expression descriptors
Calculate various statistics on the PSSMs
Reverse PSSMs
Permute PSSMs
Construct logos
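
To make two of these operations concrete, here is a minimal Python sketch of reversing and permuting a count matrix. It assumes that "reverse" means taking the reverse complement and that "permute" means shuffling the columns; the options actually exposed by convert-matrix may behave differently. The toy matrix and the function names are made up for illustration.

```python
import random

# Illustrative sketch only: not the convert-matrix implementation.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Made-up 3-column count matrix (one row per residue, one column per position).
toy = {"A": [4, 0, 1], "C": [0, 1, 0], "G": [0, 3, 0], "T": [0, 0, 3]}

def reverse_complement(counts):
    """Reverse the column order and swap complementary residue rows."""
    return {r: counts[COMPLEMENT[r]][::-1] for r in "ACGT"}

def permute_columns(counts, seed=None):
    """Shuffle the columns; each column keeps its own residue counts."""
    width = len(counts["A"])
    order = list(range(width))
    random.Random(seed).shuffle(order)
    return {r: [counts[r][i] for i in order] for r in "ACGT"}

rc = reverse_complement(toy)
permuted = permute_columns(toy, seed=1)
for name, matrix in (("reverse complement", rc), ("permuted", permuted)):
    print(name)
    for residue, row in matrix.items():
        print(residue, *row, sep="\t")
```

Permuting columns keeps the residue composition of the motif but scrambles the positions, which is why permuted matrices are commonly used as negative controls.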

As input data, you will construct a matrix from a multiple alignment, and also fetch a count matrix from the JASPAR database. The matrices describe the binding motifs of transcription factors seen in the course (Meis and Gcn4).

Constructing a personal matrix

Go to the RSAT teaching server

In the Matrix tool menu, select convert matrix

You will construct a count matrix for the factor Meis from a multiple alignment of transcription factor binding sites (TFBS) extracted from various vertebrate genomes. The alignment is in FASTA format.
Note that the tool can convert matrices to a wide range of formats.

Copy the following alignment into the matrix box, and select sequences as the input format

>1
TGACAA
>2
TGACAG
>3
TGATGG
>4
TGACAA
>5
TGGCAG
>6
TGATTG
>7
TGACAG
>8
TGACAG

The background model is not used in this exercise; you can leave the default option

Click on Go to run the program with default parameters
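
The counting performed on this alignment can be sketched in a few lines of Python. This is a minimal illustration, not the RSAT code: the function name count_matrix is hypothetical, and the result is printed with one row per residue and one column per position.

```python
# Illustrative sketch only: not the RSAT implementation.
SITES = ["TGACAA", "TGACAG", "TGATGG", "TGACAA",
         "TGGCAG", "TGATTG", "TGACAG", "TGACAG"]

def count_matrix(sites, alphabet="ACGT"):
    """Count each residue at each column of equal-length aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = {residue: [0] * width for residue in alphabet}
    for site in sites:
        for position, residue in enumerate(site.upper()):
            counts[residue][position] += 1
    return counts

counts = count_matrix(SITES)
for residue, row in counts.items():
    print(residue, *row, sep="\t")
```

Each column sums to the number of aligned sites (8 here), which is a quick sanity check on any count matrix.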

Questions
Compare the computed matrix with the one you made manually during the course.
Look at the consensus sequence under the matrix. Is it strict or degenerate? Compare it with the one you made manually during the course.
Have a look at the logo and note how the height differs from one column to the next.
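
To check your answers, the following Python sketch derives a strict consensus, a degenerate consensus, and the per-column information content that a logo typically displays as column height. It rests on simplifying assumptions that are not necessarily those of convert-matrix: the degenerate consensus keeps every residue with a frequency of at least 25% (an arbitrary threshold), and the information content is computed without pseudocounts or small-sample correction. All function names are hypothetical.

```python
import math

SITES = ["TGACAA", "TGACAG", "TGATGG", "TGACAA",
         "TGGCAG", "TGATTG", "TGACAG", "TGACAG"]

# IUPAC codes, keyed by the sorted set of residues they stand for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
         "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}

def column_frequencies(sites):
    """Relative frequency of each residue in each alignment column."""
    n = len(sites)
    return [{r: column.count(r) / n for r in "ACGT"} for column in zip(*sites)]

def strict_consensus(freqs):
    """Most frequent residue per column (ties resolved in A, C, G, T order)."""
    return "".join(max("ACGT", key=col.get) for col in freqs)

def degenerate_consensus(freqs, min_freq=0.25):
    """IUPAC code covering all residues at or above min_freq (assumed rule)."""
    letters = []
    for col in freqs:
        kept = "".join(r for r in "ACGT" if col[r] >= min_freq)
        letters.append(IUPAC.get(kept, "N"))  # "N" if nothing passes
    return "".join(letters)

def information_content(freqs):
    """Bits per column: 2 minus the Shannon entropy, with no correction."""
    result = []
    for col in freqs:
        entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
        result.append(2.0 - entropy)
    return result

freqs = column_frequencies(SITES)
strict = strict_consensus(freqs)
degenerate = degenerate_consensus(freqs)
bits = information_content(freqs)
print("strict:    ", strict)
print("degenerate:", degenerate)
print("bits:      ", " ".join(f"{b:.2f}" for b in bits))
```

A fully conserved column reaches the maximum of 2 bits, while a column with several alternative residues carries less information and therefore appears shorter in the logo.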

Go back to the previous page and rerun the program, choosing transfac as the output format

Questions
The transfac format is very different from the tab format used before. What is the main difference?
This is the format used by the TRANSFAC database. Many bioinformatics tools rely on it, and it has the advantage of integrating a name and an identifier (ID and AC fields) within the matrix itself.
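
As a rough illustration of this layout, the sketch below writes the Meis counts from the alignment above in a simplified TRANSFAC-style format. It only covers the AC, ID and P0 fields with XX separators and the // terminator; the exact fields and spacing produced by convert-matrix may differ. The accession M_example, the identifier Meis_example and the function name to_transfac are hypothetical.

```python
# Counts derived from the 8-site Meis alignment used in this exercise.
MEIS_COUNTS = {
    "A": [0, 0, 7, 0, 6, 2],
    "C": [0, 0, 0, 6, 0, 0],
    "G": [0, 8, 1, 0, 1, 6],
    "T": [8, 0, 0, 2, 1, 0],
}

def to_transfac(counts, accession, identifier):
    """Format a count matrix as simplified TRANSFAC-style text."""
    width = len(counts["A"])
    lines = [
        f"AC  {accession}",   # accession field
        "XX",
        f"ID  {identifier}",  # identifier field
        "XX",
        "P0  " + "".join(f"{residue:>6}" for residue in "ACGT"),
    ]
    # One line per motif position, one column per residue.
    for position in range(width):
        row = "".join(f"{counts[residue][position]:>6}" for residue in "ACGT")
        lines.append(f"{position + 1:02d}  {row}")
    lines += ["XX", "//"]     # "//" closes the matrix entry
    return "\n".join(lines)

transfac_text = to_transfac(MEIS_COUNTS, "M_example", "Meis_example")
print(transfac_text)
```

Because each entry carries its own name and ends with a terminator line, several matrices can be stored one after another in a single file.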

Obtaining a matrix from a database

You will retrieve the Gcn4 count matrix from the JASPAR database.

Go to the JASPAR database website

Search for the Gcn4 factor by name

Click on the logo to get more details

Questions
To which family does this transcription factor belong? From which organism was the matrix built?

The logo does not look very "nice". You will compute a logo in PDF format, usable for publications

Copy/paste the matrix into convert matrix

Keep the input format as tab, not jaspar!

Run the program to produce a logo. If you click on the logo, you can download the PDF file

Morgane Thomas-Chollier - Ecole Normale Supérieure mthomas[at]biologie.ens.fr