- Assessment of the approach with simulated data with known effects -

  • Principle
In this study, we wanted to evaluate the ability of our approach to identify orthologous pairs of genes whose expression is conserved, despite the addition of noise to the gene expression measurements. For this purpose, we used as a prototype the S. cerevisiae 3D-gene expression network derived from the sporulation dataset. Several synthetic datasets were generated by adding noise (low, medium or high intensity) to the gene expression measurements, except for 10 genes serving as a reference to estimate the relevance of the optimized superimposition (genes coloured in orange, Figure below). These 10 genes have the same expression measurements in the real and the synthetic expression datasets. After the procedure that optimally superimposes the gene expression networks represented in a multi-dimensional space, these genes should, in principle, appear to be more conserved than any other gene (for which noise was added to the gene expression measurements).
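The noise-addition step described above can be sketched as follows. This is only an illustrative sketch, not the code used for the study: the function name `add_noise`, the gene names and the toy expression values are hypothetical, and the noise is assumed to be drawn independently for each measurement from a normal distribution with the standard deviations given in the Figure legend (0.1, 0.3 and 1).

```python
import random

def add_noise(expression, sd, reference_genes, seed=0):
    """Return a noisy copy of `expression` (gene name -> list of measurements).

    Gaussian noise with standard deviation `sd` is added to every
    measurement, except for the genes listed in `reference_genes`,
    which keep exactly the same values as in the real dataset.
    """
    rng = random.Random(seed)
    noisy = {}
    for gene, values in expression.items():
        if gene in reference_genes:
            noisy[gene] = list(values)  # control gene: left unchanged
        else:
            noisy[gene] = [v + rng.gauss(0.0, sd) for v in values]
    return noisy

# Toy dataset (hypothetical gene names and values, for illustration only).
real = {f"gene{i}": [0.1 * i, -0.2 * i, 0.05 * i] for i in range(1, 21)}
controls = {f"gene{i}" for i in range(1, 11)}  # the 10 reference genes

# One synthetic dataset per noise level (low, medium, high).
synthetic = {sd: add_noise(real, sd, controls) for sd in (0.1, 0.3, 1.0)}
```

In such a setup, the control genes are the only ones whose measurements are guaranteed to be identical in the real and synthetic datasets, which is what makes them usable as a reference after superimposition.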
  • Simulation results
To test the ability of our approach to identify the 10 genes whose expression was unchanged between real and simulated data, the 3D-gene expression networks were first constructed from the synthetic datasets (low, medium and high noise levels). The results are presented in the Figure below (B). Note that the 10 genes serving as controls are shown in orange in each picture. In a second step, the real S. cerevisiae gene expression network (derived from the real expression data) was optimally superimposed on each noisy 3D-gene expression network, with a W-value of 2 (see below for initial and optimized superimpositions). Finally, genes were represented in biplots according to their final Ei (X-axis) and Delta-i (Y-axis) values (Figure below (C)). On the resulting graphs, we observed that whatever the noise level (low, medium or high), the orange genes were correctly superimposed, i.e. they exhibited low Ei and Delta-i values. In addition, the greater the noise level, the more clearly the orange genes stood out as having the most conserved expression. This was especially apparent in the biplot obtained with a noise level of 1: the orange genes were the only genes located in the lower left part of the plot. These results demonstrate that the procedure for optimal superimposition is tolerant of noise and illustrate the effectiveness of our approach. Indeed, from a biological point of view, the prerequisite for comparing two gene expression networks derived from two different species is to be able to identify precisely the classes of genes whose expression is well conserved between the two species, despite other classes of genes whose expression has varied across evolution. Thus, if "noise" represents the "evolutionary divergence" between two gene expression networks, the simulations presented in this section indicate that our approach does not extract spurious relationships, i.e. gene pairs whose expression measurements appear to be conserved while they are not.




Figure legend : (A) S. cerevisiae 3D-gene expression network presented in the main text (sporulation dataset). The orange points correspond to the 10 genes whose expression measurements were not changed during the simulation procedure. (B) Simulated 3D-gene expression networks, obtained by randomly adding noise drawn from a normal distribution with standard deviations of 0.1, 0.3 and 1 (i.e. low, medium and high noise levels). (C) Biplots representing each gene according to its final Ei (X-axis) and Delta-i (Y-axis) values. Whatever the noise level, the orange genes are correctly superimposed during the optimization procedure (low Ei and Delta-i values).
  • Enlarged views of the 3D-gene expression networks:
[Images: real data; noise level = 0.1; noise level = 0.3; noise level = 1]

  • Initial and optimized superimpositions of the real and simulated gene expression networks:
[Images: initial superimposition and optimized superimposition (w = 2) for noise level = 0.1, noise level = 0.3 and noise level = 1]


  • Relevance of the optimized superimposition
Estimating the significance of 3D-proximity between genes after optimization of the superimposition is of fundamental importance. One must keep in mind that, in theory, an "optimal superimposition" can be found for any pair of gene expression networks, even those that have nothing in common. In the main text, we present a statistical analysis to assess the relevance of the optimized superimposition. The idea is to compare the final 3D-proximity between genes to random controls, obtained by reshuffling the orthologous gene list (i.e. the "attractor" pairs of genes during the superimposition). To evaluate the ability of this statistical control to distinguish between the superimposition of two unrelated (randomly generated) gene expression networks and that of two gene expression networks whose organisation is partially conserved, we performed the following simulations. Once again, we used as a prototype the S. cerevisiae 3D-gene expression network derived from the sporulation dataset. The real S. cerevisiae gene expression network was optimally superimposed on a noisy 3D-gene expression network as well as on a randomised 3D-gene expression network (obtained by disordering all the expression values). Results were compared to random controls obtained by reshuffling the links between genes serving as "attractors" during the superimposition (see Figure below). As expected, the calculated Z-score was clearly significant (-10) when the S. cerevisiae gene expression network was compared to a network with a similar organisation (noisy network), whereas the calculated Z-score was not significant (only -1) when the S. cerevisiae gene expression network was compared to a randomly generated network, i.e. one with no conserved structure.


Figure legend : The real S. cerevisiae gene expression network derived from the real expression data (see the previous Figure) was optimally superimposed on: (A) a noisy 3D-gene expression network (noise level 0.3) and (B) a randomised 3D-gene expression network. The final E-values (red line) were compared to random controls obtained by reshuffling the list of "attractors". Means m of the random distributions are indicated, with the calculated Z-scores in square brackets.
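The logic of this reshuffling control can be illustrated with a simplified sketch. It is not the procedure used in the study, and several points are assumptions made for illustration only: the score E is taken here to be the sum of 3D distances between paired genes, the superimposition is not re-optimized for each reshuffled list, the Z-score is computed as (E - m) / s with m and s the mean and standard deviation of the random controls, and the "randomised" network is built by permuting coordinates among genes. All names (`total_distance`, `shuffle_z_score`) and the toy coordinates are hypothetical.

```python
import math
import random
import statistics

def total_distance(net_a, net_b, pairs):
    """Sum of 3D distances between paired genes (stand-in for the E-value)."""
    return sum(math.dist(net_a[a], net_b[b]) for a, b in pairs)

def shuffle_z_score(net_a, net_b, pairs, n_shuffles=200, seed=0):
    """Z-score of the observed score against reshuffled gene pairings."""
    rng = random.Random(seed)
    observed = total_distance(net_a, net_b, pairs)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    controls = []
    for _ in range(n_shuffles):
        rng.shuffle(right)  # break the links between "attractor" genes
        controls.append(total_distance(net_a, net_b, list(zip(left, right))))
    m = statistics.mean(controls)
    s = statistics.stdev(controls)
    return (observed - m) / s

# Toy networks: hypothetical 3D coordinates for 50 genes.
rng = random.Random(1)
genes = [f"gene{i}" for i in range(50)]
real = {g: [rng.gauss(0, 1) for _ in range(3)] for g in genes}
noisy = {g: [x + rng.gauss(0, 0.3) for x in real[g]] for g in genes}

# Randomised network: the noisy coordinates are reassigned to random genes.
shuffled_genes = genes[:]
rng.shuffle(shuffled_genes)
randomised = {g: noisy[h] for g, h in zip(genes, shuffled_genes)}

pairs = [(g, g) for g in genes]
z_noisy = shuffle_z_score(real, noisy, pairs)        # conserved structure
z_random = shuffle_z_score(real, randomised, pairs)  # no conserved structure
```

With a score defined this way, a strongly negative Z-score means that the true gene pairs lie much closer together than reshuffled pairs, which is the situation expected when the two networks share a conserved organisation.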



January 2006.