Morgane Thomas-Chollier - Ecole Normale Supérieure        M1 ENS - October 2015

Exercise 1: Motif descriptors

The goal of this first exercise is to manipulate different motif descriptors: consensus sequences and count matrices.
You will use the program convert-matrix from the RSA-tools suite, which converts matrices between a wide range of formats. As input data, you will construct a matrix from a multiple alignment, and also fetch a count matrix from the JASPAR database. The matrices describe the binding motifs of transcription factors seen in the course (Meis and Gcn4).

Constructing a personal matrix

  1. Go to the RSAT teaching server
  2. In the Matrix tool menu, select convert matrix
  3. You will construct a count matrix for the factor Meis, from the multiple alignment of TFBS extracted from various vertebrate genomes. The alignment is in FASTA format.
    Note that the tool can convert matrices to a wide range of formats.

  4. Copy the following alignment into the matrix box, and select sequences as the format
  5. >1
    TGACAA
    >2
    TGACAG
    >3
    TGATGG
    >4
    TGACAA
    >5
    TGGCAG
    >6
    TGATTG
    >7
    TGACAG
    >8
    TGACAG
    
  6. The background model is not used in this exercise; you can leave the default option
  7. Click on Go to run the program with default parameters
  8. Questions
    Compare the computed matrix with the one you made manually during the course.
    Look at the consensus sequence under the matrix. Is it strict or degenerate? Compare it with the one you made manually during the course.
    Have a look at the logo and note how the height differs from one column to the next.
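
To check the matrix and consensus you built by hand, the counting can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by convert-matrix: the function names are hypothetical, the reader assumes one line per sequence, and the degenerate consensus uses a simplified rule (keep every residue whose frequency in the column is at least 25%), so the result may differ from the consensus reported by the tool.

```python
# Simplified sketch: NOT the algorithm used by convert-matrix.
ALIGNMENT = """\
>1
TGACAA
>2
TGACAG
>3
TGATGG
>4
TGACAA
>5
TGGCAG
>6
TGATTG
>7
TGACAG
>8
TGACAG
"""

RESIDUES = "ACGT"

# IUPAC ambiguity codes, keyed by the residues they stand for (alphabetical order).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def read_sites(fasta_text):
    """Return the sequences of a FASTA text (assumes one line per sequence)."""
    return [line.strip().upper() for line in fasta_text.splitlines()
            if line.strip() and not line.startswith(">")]


def count_matrix(sites):
    """Count each residue at each position of the aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = {residue: [0] * width for residue in RESIDUES}
    for site in sites:
        for position, residue in enumerate(site):
            counts[residue][position] += 1
    return counts


def strict_consensus(counts):
    """Most frequent residue per column (ties go to the first of A, C, G, T)."""
    width = len(counts["A"])
    return "".join(max(RESIDUES, key=lambda r: counts[r][i]) for i in range(width))


def degenerate_consensus(counts, min_freq=0.25):
    """IUPAC code for the residues whose column frequency is >= min_freq.

    Assumed simplified rule; min_freq must not exceed 0.25, otherwise a
    column could end up with no residue kept.
    """
    width = len(counts["A"])
    letters = []
    for i in range(width):
        total = sum(counts[r][i] for r in RESIDUES)
        kept = "".join(r for r in RESIDUES if counts[r][i] / total >= min_freq)
        letters.append(IUPAC[kept])
    return "".join(letters)


sites = read_sites(ALIGNMENT)
counts = count_matrix(sites)
for residue in RESIDUES:
    print(residue, *counts[residue], sep="\t")
print("strict:", strict_consensus(counts))
print("degenerate:", degenerate_consensus(counts))
```

The matrix is printed with one row per residue and one column per position, which is the same orientation as the tab output of the tool.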

  9. Go back to the previous page and rerun the program, choosing transfac as the output format
  10. Questions
    The transfac format is very different from the tab format used before. What is the main difference?
    This is the format used by the TRANSFAC database. It is also used by many bioinformatics tools, and has the advantage of integrating a name and identifier (ID and AC fields) within the matrix format.
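
To make the role of the AC and ID fields concrete, the sketch below writes the Meis counts in a simplified TRANSFAC-like layout. It is an illustration under stated assumptions: the function name and the accession and identifier values are hypothetical, the column widths are arbitrary, and the exact output of convert-matrix may contain additional fields.

```python
# Counts obtained from the eight aligned Meis sites of this exercise.
MEIS_COUNTS = {
    "A": [0, 0, 7, 0, 6, 2],
    "C": [0, 0, 0, 6, 0, 0],
    "G": [0, 8, 1, 0, 1, 6],
    "T": [8, 0, 0, 2, 1, 0],
}


def to_transfac_like(counts, accession, identifier):
    """Render a count matrix in a simplified TRANSFAC-like text layout.

    accession and identifier are free text chosen by the user; the values
    used below are hypothetical examples, not database entries.
    """
    residues = "ACGT"
    width = len(counts["A"])
    lines = [
        f"AC  {accession}",
        "XX",
        f"ID  {identifier}",
        "XX",
        "P0  " + "".join(f"{residue:>6}" for residue in residues),
    ]
    # One line per motif position, one column per residue.
    for position in range(width):
        row = "".join(f"{counts[residue][position]:>6}" for residue in residues)
        lines.append(f"{position + 1:02d}  " + row)
    lines += ["XX", "//"]
    return "\n".join(lines)


transfac_text = to_transfac_like(MEIS_COUNTS, "meis_example", "Meis")
print(transfac_text)
```

Comparing this output with the tab output of the previous run is one way to approach the question above.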


Obtaining a matrix from a database

You will retrieve the Gcn4 count matrix from the JASPAR database.

  1. Go to the JASPAR database website
  2. Search for the Gcn4 factor by name
  3. Click on the logo to get more details
  4. Questions
    To which family does this transcription factor belong? From which organism was the matrix built?

    The logo does not look very "nice". You will compute a logo in PDF format, usable for publications.

  5. Copy/paste the matrix into convert matrix
  6. Keep the input format as tab, not jaspar!
  7. Run the program to produce a logo; clicking on it lets you download the PDF file
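
The column heights in a sequence logo reflect the information content of each position. The sketch below reads a matrix in a tab-like layout (one row per residue, an optional "|" separator, counts separated by tabs or spaces; this layout is an assumption about the tab format) and computes the information content per column with a uniform background, no pseudocounts and no small-sample correction. The function names are hypothetical, and the Meis counts from the first part of the exercise are used as input, since the Gcn4 matrix is not reproduced here.

```python
import math

# Meis counts from the first part of the exercise, in a tab-like layout.
TAB_MATRIX = """\
; lines starting with ';' are treated as comments
a | 0 0 7 0 6 2
c | 0 0 0 6 0 0
g | 0 8 1 0 1 6
t | 8 0 0 2 1 0
"""


def parse_tab_matrix(text):
    """Parse one row per residue: name, optional '|', then the counts."""
    matrix = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        fields = [field for field in line.split() if field != "|"]
        matrix[fields[0].upper()] = [float(value) for value in fields[1:]]
    return matrix


def information_content(matrix):
    """Information content (bits) per column: 2 + sum(p * log2(p)).

    Assumes four residues with a uniform background and no pseudocounts.
    """
    width = len(next(iter(matrix.values())))
    result = []
    for i in range(width):
        column = [row[i] for row in matrix.values()]
        total = sum(column)
        if total == 0:
            result.append(0.0)
            continue
        entropy_terms = [(c / total) * math.log2(c / total) for c in column if c > 0]
        result.append(2.0 + sum(entropy_terms))
    return result


matrix = parse_tab_matrix(TAB_MATRIX)
heights = information_content(matrix)
for position, bits in enumerate(heights, start=1):
    print(position, round(bits, 3), sep="\t")
```

Fully conserved positions reach the maximum of 2 bits, while positions where several residues occur get shorter columns.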

Morgane Thomas-Chollier - Ecole Normale Supérieure        mthomas[at]biologie.ens.fr