- Comparison of cell-cycle gene expression programs between budding and fission yeasts -

To illustrate our approach with another example, we compared the gene expression networks of the yeasts S. cerevisiae and S. pombe, using the cell cycle data described in (Spellman, Sherlock et al. 1998) and (Rustici, Mata et al. 2004). The cross-species comparison was performed using a procedure identical to that presented with the sporulation datasets. To summarize, this procedure can be divided into six main steps :
  • Selection of genes implicated in the cell cycle
  • Representation in a 3-dimensional space and classification of the expression profiles into temporal classes
  • Selection of orthologous genes implicated in the cell cycle
  • Representations of orthologous genes in a 3-dimensional space and optimization of the superimposition
  • Statistical analysis comparing the optimal superimposition to random controls
  • Interpretation


  • Step 1 : Selection of genes implicated in the cell cycle
Two different laboratories used DNA microarrays to study the genome-wide transcriptional programs of the S. cerevisiae and S. pombe cell cycles. The first study, published in 1998 (Spellman, Sherlock et al. 1998), identified ~800 periodically transcribed genes in S. cerevisiae. The second study, published in 2004 (Rustici, Mata et al. 2004), identified ~400 periodically expressed genes in S. pombe. We used the datasets named "alpha pheromone" (Spellman, Sherlock et al. 1998) and "elutriation1" (Rustici, Mata et al. 2004), for S. cerevisiae and S. pombe respectively. These datasets were chosen because, in both cases, cells were synchronized at the G1 stage and followed through two cycles. Because the lists of genes labelled by the authors as implicated in cell-cycle control were available online, we took advantage of this supplementary information to reduce the complexity of the studied system. Of all the genes for which expression data were available, we analysed only those reported as being periodic. According to the notations defined in Methods, the resulting expression matrices comprised G1 = 352 genes and N1 = 20 time points for the S. pombe genome, and G2 = 792 genes and N2 = 18 time points for the S. cerevisiae genome.
  • Step 2 : Representation in a 3-dimensional space and classification of the expression profiles into temporal classes
Genes selected in step 1 were mapped onto a 3-dimensional space, according to the expression measurements during the cell-cycle time course experiments (Figure below, A). They were also classified into four temporal classes by k-means clustering (Figure below, B). To visualize the successive waves of transcription during the cell cycle on the 3-dimensional representation, each gene was coloured according to its temporal class (blue, red, green and yellow classes).
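As an illustration of the classification step, the sketch below groups toy expression profiles into four temporal classes with a plain k-means (Lloyd) iteration. It is a minimal sketch and not the implementation used in this study: the toy profiles, the function name `kmeans`, the Euclidean distance and the farthest-point initialisation are all assumptions, since the text above does not specify these details.

```python
import math

def kmeans(profiles, k, n_iter=100):
    """Lloyd's k-means on a list of equal-length expression profiles."""
    # Deterministic farthest-point initialisation (an assumption: the
    # initialisation used in the study is not described).
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy profiles (invented): two genes peaking at each of four time points.
profiles = [
    [2.0, 0.0, 0.0, 0.0], [1.8, 0.1, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0], [0.1, 1.9, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0], [0.0, 0.1, 2.1, 0.0],
    [0.0, 0.0, 0.0, 2.0], [0.0, 0.0, 0.1, 1.9],
]
labels, centroids = kmeans(profiles, k=4)
print(labels)
```

Each label can then be associated with a colour (blue, red, green or yellow) to display the temporal classes on the 3-dimensional representation.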



Figure legend : (A) The 352 and 792 genes reported as being periodic during the cell cycle process in the fission (green) and budding (blue) yeasts were mapped in a 3-dimensional space according to the expression measurements during the time course experiments described in (Rustici, Mata et al. 2004) and (Spellman, Sherlock et al. 1998). Points represent genes, and white segments link pairs of genes for which the inter-gene distance between their expression measurements is less than a threshold D0 = 0.6 (a conventional correlation higher than 0.8). (B) Using the k-means algorithm, genes were classified according to their expression profiles into four classes. Genes contained in each group are shown as different colours so that they can be identified in the gene expression network. Unlike in (A), only the first and the second coordinates are represented here.
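The rule used to draw the white segments can be sketched as follows. This is only an illustration under stated assumptions: the gene names and profiles are invented, and the distance is assumed to be the Euclidean distance between centred, unit-norm expression profiles, for which d² = 2(1 - r), where r is the Pearson correlation. Under that assumption, D0 = 0.6 corresponds to r = 0.82, which is consistent with the correlation higher than 0.8 quoted in the legend; the distance actually used is the one defined in Methods.

```python
import math
from itertools import combinations

D0 = 0.6  # threshold quoted in the figure legend

def standardize(profile):
    """Centre a profile and scale it to unit norm (assumed normalisation)."""
    mean = sum(profile) / len(profile)
    centred = [x - mean for x in profile]
    norm = math.sqrt(sum(x * x for x in centred))
    return [x / norm for x in centred]

def network_edges(profiles, d0=D0):
    """Return the pairs of genes whose standardized profiles are closer than d0."""
    std = {gene: standardize(p) for gene, p in profiles.items()}
    return [(a, b) for a, b in combinations(std, 2)
            if math.dist(std[a], std[b]) < d0]

def correlation_from_distance(d):
    # Valid only for centred, unit-norm profiles: d**2 = 2 * (1 - r).
    return 1.0 - d * d / 2.0

# Invented profiles: geneA and geneB rise together, geneC falls.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.0, 3.2, 3.9],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
edges = network_edges(profiles)
print(edges)
print(correlation_from_distance(D0))
```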


  • Step 3 : Selection of orthologous genes implicated in the cell cycle
Using the orthology relationships inferred from the complete genomes of S. cerevisiae and S. pombe, we identified 132 orthologous gene pairs implicated in the cell cycle in both species. These pairs corresponded to G1 = 91 S. pombe genes and G2 = 104 S. cerevisiae genes.
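Because orthology relationships are not necessarily one-to-one, the number of pairs (132) can exceed the number of distinct genes in either species (91 and 104). A minimal sketch of this bookkeeping is given below; the gene identifiers are invented and the helper `select_periodic_pairs` is hypothetical, introduced only for illustration.

```python
def select_periodic_pairs(pairs, periodic_sp, periodic_sc):
    """Keep ortholog pairs whose two members are periodic in their own species."""
    kept = [(sp, sc) for sp, sc in pairs
            if sp in periodic_sp and sc in periodic_sc]
    genes_sp = {sp for sp, _ in kept}  # distinct S. pombe genes
    genes_sc = {sc for _, sc in kept}  # distinct S. cerevisiae genes
    return kept, genes_sp, genes_sc

# Invented identifiers: sp1 has two orthologs, and sc3 is shared by sp2 and sp3.
pairs = [("sp1", "sc1"), ("sp1", "sc2"), ("sp2", "sc3"),
         ("sp3", "sc3"), ("sp4", "sc4")]
periodic_sp = {"sp1", "sp2", "sp3"}          # sp4 is not periodic
periodic_sc = {"sc1", "sc2", "sc3", "sc4"}

kept, genes_sp, genes_sc = select_periodic_pairs(pairs, periodic_sp, periodic_sc)
print(len(kept), len(genes_sp), len(genes_sc))
```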
  • Step 4 : Representation of orthologous genes in a 3-dimensional space and optimization of the superimposition
First, each set of orthologous genes was independently represented in a 3-dimensional space (Figure below, A). Then, to analyse the concordance between orthology relationships and expression data, we modified the superimposition between the two 3D-gene expression networks using the optimization procedure described in Methods. Starting from the initial superimposition (Figure below, B), the S. pombe genome was moved with a W-value of 1.2 (Figure below, C). Note that the optimization was performed with different W-values (data not shown), and 1.2 appeared to be a good compromise between proximity of orthologous gene pairs and distortion of the displaced gene expression network.

Figure legend : (A) Orthologous genes between the budding and fission yeasts were independently represented in a 3-dimensional space according to their expression measurements during the time course experiments described in (Rustici, Mata et al. 2004) and (Spellman, Sherlock et al. 1998). Points represent the 91 and 104 genes belonging to the S. pombe (green) and S. cerevisiae (blue) genomes respectively. White segments link pairs of genes for which the inter-gene distance between their expression measurements is less than a threshold D0 = 0.6 (a conventional correlation higher than 0.8). (B) Initial superimposition of the 3D-gene expression networks shown in (A). Ortholog gene pairs are connected with red segments. (C) Result of the superimposition after the optimization procedure described in Methods. Only the S. pombe genome (green) was displaced, with a W-value = 1.2.



  • Step 5 : Statistical analysis comparing the final superimposition to random controls
To assess the significance of the overall 3D-proximity between orthologs after optimization, we compared the criterion E obtained for the final superimposition to its values in random controls (see the main text). The calculated Z-score was -3.37 (Figure below), meaning that orthology relationships and expression during the cell cycle are significantly concordant between S. cerevisiae and S. pombe.
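The comparison to random controls can be sketched as a permutation test. The code below is a minimal sketch under stated assumptions, not the procedure of the main text: the criterion E is assumed here to be the sum of 3D distances between orthologous genes, the controls are obtained by reshuffling one side of the ortholog list, and the coordinates, the function names and the number of controls are invented for illustration.

```python
import math
import random
import statistics

def criterion_e(coords_a, coords_b, pairs):
    # Assumed form of E: sum of 3-D distances between orthologous genes.
    return sum(math.dist(coords_a[a], coords_b[b]) for a, b in pairs)

def z_score(coords_a, coords_b, pairs, n_controls=1000, seed=0):
    """Z-score of the observed E against E values from reshuffled pairings."""
    rng = random.Random(seed)
    observed = criterion_e(coords_a, coords_b, pairs)
    genes_a = [a for a, _ in pairs]
    genes_b = [b for _, b in pairs]
    controls = []
    for _ in range(n_controls):
        rng.shuffle(genes_b)  # break the true orthology relationships
        controls.append(criterion_e(coords_a, coords_b, list(zip(genes_a, genes_b))))
    return (observed - statistics.mean(controls)) / statistics.stdev(controls)

# Toy networks (invented): every gene lies close to its true ortholog.
coords_a = {f"a{i}": (float(i), 0.0, 0.0) for i in range(6)}
coords_b = {f"b{i}": (i + 0.1, 0.0, 0.0) for i in range(6)}
pairs = [(f"a{i}", f"b{i}") for i in range(6)]

z = z_score(coords_a, coords_b, pairs)
print(f"Z-score: {z:.2f}")
```

With this convention, a negative Z-score indicates that orthologous genes are closer to each other than expected from random pairings.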
We then examined whether the overall concordance between sequence conservation and expression is similar for the four temporal classes of genes. The criterion E was decomposed into four components, E-blue, E-green, E-red and E-yellow, each representing the contribution of genes belonging to the same temporal class in the displaced genome (E = E-blue + E-green + E-red + E-yellow). Interestingly, the calculated Z-score for the "green" class was more significant than the Z-scores calculated for the other classes (-5.2, compared to -1.3, -1.8 and -1.2). This class comprises 31 genes and makes a major contribution to the global reduction of distances between orthologous genes. Such a result suggests that the expression of these genes during the cell cycle is particularly well conserved between the yeasts S. cerevisiae and S. pombe.

Figure legend : The final E-value, obtained after superimposition with W = 1.2, was decomposed into its four components E-blue, E-green, E-red and E-yellow, calculated for genes belonging to the same temporal class (E = E-blue + E-green + E-red + E-yellow). Each value (red line) was compared to random controls obtained by reshuffling the S. cerevisiae and S. pombe lists of ortholog genes. Means of the random distributions are indicated, with the calculated Z-scores in square brackets.
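The decomposition of E by temporal class can be sketched in the same spirit. The code below is an illustration under stated assumptions: E is assumed to be a sum of 3D distances between orthologous genes, each pair is assigned to the temporal class of its gene in the displaced genome, and the coordinates, class labels and function names are invented. In the toy data, the "green" pairs are well superimposed whereas the "blue" pairs are deliberately mismatched.

```python
import math
import random
import statistics
from collections import defaultdict

def components(coords_a, coords_b, pairs, class_of):
    """Split E into one component per temporal class of the displaced gene."""
    parts = defaultdict(float)
    for a, b in pairs:
        parts[class_of[a]] += math.dist(coords_a[a], coords_b[b])
    return dict(parts)

def class_z_scores(coords_a, coords_b, pairs, class_of, n_controls=500, seed=0):
    rng = random.Random(seed)
    observed = components(coords_a, coords_b, pairs, class_of)
    genes_a = [a for a, _ in pairs]
    genes_b = [b for _, b in pairs]
    controls = defaultdict(list)
    for _ in range(n_controls):
        rng.shuffle(genes_b)  # random control: reshuffled orthology
        parts = components(coords_a, coords_b, list(zip(genes_a, genes_b)), class_of)
        for c in observed:
            controls[c].append(parts.get(c, 0.0))
    return {c: (observed[c] - statistics.mean(controls[c])) / statistics.stdev(controls[c])
            for c in observed}

# Toy data (invented): genes a0-a3 are "green", genes a4-a7 are "blue".
positions = [0.0, 10.0, 20.0, 30.0]
coords_a = {f"a{i}": (positions[i % 4], 0.0, 0.0) for i in range(8)}
coords_b = {f"b{i}": (positions[i] + 0.1, 0.0, 0.0) for i in range(4)}
coords_b.update({f"b{i + 4}": (positions[3 - i], 0.0, 0.0) for i in range(4)})
class_of = {f"a{i}": "green" if i < 4 else "blue" for i in range(8)}
pairs = [(f"a{i}", f"b{i}") for i in range(8)]

observed = components(coords_a, coords_b, pairs, class_of)
z = class_z_scores(coords_a, coords_b, pairs, class_of)
print(observed)
print(z)
```

By construction, the components sum to the total E, so a class with a strongly negative Z-score can be read as contributing most to the overall proximity between orthologs.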

To extend this observation, we analysed the green class of genes using the functional gene annotations available in the Gene Ontology (GO) (Ashburner, Ball et al. 2000) and the GeneDB database (Hertz-Fowler, Peacock et al. 2004). We found that these genes are mostly involved in "chromosome structure" or "nucleic acid metabolism", for example HTA1 (histone H2A, alpha), HTA2 (histone H2A, beta), PHT1 (histone H2A variant), HTB1 (histone H2B, alpha), HHT2 (histone H3), H3.3 (histone H3), HHF1 (histone H4), HHF2 (histone H4), H4.3 (histone H4), CDC22 (ribonucleoside reductase large subunit Cdc22), HHT1 (histone H3), RHP51 (recombinase Rhp51), MRC1 (mediator of replication checkpoint 1). Interestingly, this result is in agreement with an observation made by Rustici et al. (2004), who compared the lists of cell cycle-regulated genes between S. cerevisiae and S. pombe and observed only a small overlap of orthologous genes periodically expressed in both species (~40 genes). To the question of why this relatively small core set of cell cycle-regulated genes has been conserved through evolution, they raised the following hypothesis : "Perhaps they are genes that must be highly regulated to ensure orderly progression or that required in relatively large amounts with peak demands, such as histone genes". However, as they did not directly compare the microarray results obtained for the two yeasts, our approach gives further insight into this issue. By combining orthology relationships and expression data, our methodology allows a direct comparison of overall and modular properties of the transcriptional programs between two species. In the cell cycle, we identified a core of 31 genes whose expression is particularly well conserved between the two yeasts, and half of these genes code for histones. These findings provide support for the hypothesis of Rustici et al. (2004).

  • Step 6 : Interpretation
In this study, we followed a procedure identical to that used to compare the sporulation datasets between S. cerevisiae and S. pombe. Together, these two examples demonstrate the effectiveness of our methodology, since both cross-species comparisons lead to interesting results that are in agreement with the conclusions of previous studies. Our approach was developed with the objective of performing cross-species comparisons based on selected experiments that are as similar as possible. That is why we decided to work first with the sporulation data, where the experimental procedure is simple (yeast cells are transferred to a nitrogen-deficient medium that induces sporulation, and changes in the concentrations of the mRNA transcripts from each gene are measured using DNA microarrays). In the cell cycle data, by contrast, cell cultures are synchronized by different methods that can introduce specific artefacts in the measured expression profiles (Shedden and Cooper 2002). In this situation, the observed differences between expression profiles can be due to experimental variability rather than to relevant biological reasons. Thus, when working with microarray data coming from different sources, a challenging part of the work is to verify that the comparison is reasonable (Jordan 2004; Marshall 2004). Our method, like all other methods, depends on the accuracy of the gene expression profiles estimated in the original studies. Nevertheless, even though the superimposition can be performed with gene expression networks that have nothing in common, the statistical analysis comparing the final superimposition to random controls is highly informative for estimating the relevance of the optimized superimposition (see supplementary note S10) as well as for identifying subparts of the gene expression networks that are more (or less) conserved between species.
The combination of expression and sequence data (orthology relationships) allows the discrimination between experimental noise and biological information. With the cell cycle data, our methodology allowed the identification of a subclass of genes whose expression is highly conserved between the two yeasts.



January 2006.