- Integration of sequence and expression data provides new insight into cross-species comparisons -
Using our methodology, we were able to identify differences between the sporulation processes of the two species,
despite conserved functions, revealing the plasticity of the functional modules. Comparing genomic properties of different
organisms is of fundamental importance. Several examples that represent characteristic situations and illustrate the
potential of combining sequence and expression data to address particular evolutionary issues are presented in:
- Supplementary Note S8: Conservation of expression between organisms can be used for improving functional gene annotation.
- Supplementary Note S9: Co-expression can be used for refining orthologous links between organisms.
- Co-expression can be used for refining orthologous links between organisms -
The identification of orthologous genes in S. cerevisiae and S. pombe is a key step in our approach.
For some amino acid sequences, it was not possible to determine a unique relationship between the two organisms.
Such "multiple inter-organism links" illustrate the complexity of orthology analysis, which arises from the large numbers of
paralogs within protein families. For example, the S. cerevisiae gene GPN1 (YDR508C) exhibited high similarity scores
with two genes in S. pombe (SPBPB2B2.01 and SPBC19F8.06C/meu22) (Figure S9 A). This situation may be the consequence of
lineage-specific gene duplications generating multiple paralogs in one species (in this case S. pombe), or of deletion
events resulting in the loss of the "true ortholog" of a gene in the other species. In such cases, it is non-trivial
to determine which of the paralogs is functionally equivalent to the gene in the other species.
After the procedure to optimise the superimposition of the two gene expression networks (Figure 2),
GPN1 showed a large average Ei value (Figure 5A, coloured in red) with its related genes in S. pombe.
GPN1 was attracted in opposite directions (Figure S9 B, red arrows), leading to deformation of the
displaced expression network. It was clear that this orthologous group includes genes that are expressed differently
during sporulation. We plotted the corresponding expression profiles (Figure S9 C). Interestingly,
of the two S. pombe expression profiles, that of meu22 (SPBC19F8.06C, red profile) was clearly
different from that of SPBPB2B2.01 but very similar to that of GPN1. This observation
suggests that, ultimately, only the orthologous link between GPN1 and meu22 is reliable. It illustrates a limitation of
cross-species comparison based primarily on genomic sequence information. Combining sequence and expression
data makes it possible to discriminate between homologous genes that cannot be distinguished by sequence comparison alone but
whose expression profiles are clearly different.
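The idea of using expression similarity to choose between several candidate orthologs can be sketched as follows. This is only an illustrative sketch, not the superimposition procedure used in the study: it assumes that the profiles of the two organisms have already been brought onto a common set of time points, it uses the Pearson correlation as the similarity measure (an assumption), and the gene names and values are made up for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def rank_candidates(query_profile, candidate_profiles):
    """Return (name, correlation) pairs, most similar candidate first."""
    scores = {
        name: pearson(query_profile, profile)
        for name, profile in candidate_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values for illustration only; these are NOT measured data.
query = [0.0, 0.5, 1.8, 2.4, 1.1]
candidates = {
    "candidate_a": [0.1, 0.4, 1.6, 2.5, 0.9],  # follows the query profile
    "candidate_b": [1.9, 1.2, 0.3, 0.1, 0.2],  # opposite trend
}

ranking = rank_candidates(query, candidates)
for name, score in ranking:
    print(f"{name}: r = {score:.2f}")
```

In this toy case the first candidate is ranked on top, which mirrors the situation described above, where only one of two sequence-similar genes shares the expression behaviour of the query gene.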
Figure legend: The S. cerevisiae gene GPN1/YDR508C exhibits orthologous relationships with two genes in S. pombe
(SPBPB2B2.01, SPBC19F8.06C/meu22).
(A) Multiple sequence alignment and BLAST similarity scores between the amino acid sequences of the three genes,
whose expression profiles are represented in (C). Boxes at the top distinguish the sequences
(SPBC19F8.06C/meu22 = red, SPBPB2B2.01 = black and GPN1/YDR508C = black star). Alignment and coloured version of the
alignment were generated using the Tcoffee web server (Poirot et al. 2003). Red residues correspond to highly reliable
portions of the multiple alignments.
(B) Initial location of gene YDR508C in the gene expression networks. Points represent genes, white segments
link pairs of genes for which the distance between their expression measurements is less than a threshold
D0 = 0.3, and inter-organism related genes are connected with red segments. Red arrows show the directions in which YDR508C
is attracted during the superimposition procedure.
(C) Expression profile of GPN1 during the time course experiments described in (Chu et al. 1998),
and expression profiles of the two S. pombe genes related to GPN1 with high similarity scores
(SPBPB2B2.01 = black and SPBC19F8.06C/meu22 = red) during the time course experiments described in (Mata et al. 2002).
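The within-organism links of panel (B), which connect genes whose expression distance is below D0 = 0.3, can be sketched as below. The legend does not state which distance is used, so the sketch assumes 1 minus the Pearson correlation; the gene names and values are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import sqrt

D0 = 0.3  # threshold quoted in the figure legend


def expression_distance(x, y):
    """ASSUMPTION: distance = 1 - Pearson correlation (metric not given in the legend)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    if norm == 0:
        raise ValueError("distance is undefined for a constant profile")
    return 1.0 - cov / norm


def build_edges(profiles, threshold=D0):
    """Link every pair of genes whose expression distance is below the threshold."""
    return [
        (gene_a, gene_b)
        for gene_a, gene_b in combinations(sorted(profiles), 2)
        if expression_distance(profiles[gene_a], profiles[gene_b]) < threshold
    ]


# Hypothetical profiles for illustration only; these are NOT measured data.
profiles = {
    "g1": [0.0, 1.0, 2.0, 3.0],
    "g2": [0.0, 1.0, 2.0, 3.5],  # rises like g1
    "g3": [3.0, 2.0, 1.0, 0.0],  # falls while g1 rises
}

edges = build_edges(profiles)
print(edges)
```

Only the two genes with similar profiles are linked here; the third gene stays unconnected because its distance to both others exceeds the threshold.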
