- Assessment of the approach with simulated data with known effects -
In this study, we wanted to evaluate the ability of our approach to identify orthologous pairs of genes whose expression
is conserved, despite noise added to the gene expression measurements. For this purpose, we used as a prototype
the S. cerevisiae 3D-gene expression network derived from the sporulation dataset. Several synthetic datasets were
generated by adding noise (low, medium or high intensity) to the gene expression measurements, except for 10 genes serving
as a reference to estimate the relevance of the optimized superimposition (genes coloured in orange, Figure below).
These 10 genes have the same expression measurements between the real and the synthetic expression datasets.
After the procedure to superimpose optimally the gene expression networks represented in a multi-dimensional space,
these genes should, in principle, appear to be more conserved than any other genes (for which noise was added in the gene
expression measurements).
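The construction of the synthetic datasets can be illustrated with a minimal sketch. It assumes the expression data are held in a genes x conditions NumPy array and that noise is drawn from a zero-mean normal distribution with standard deviation 0.1, 0.3 or 1 (as stated in the Figure legend). The function name `add_noise_except_controls`, the toy matrix and the choice of the first 10 rows as reference genes are hypothetical and serve only as illustration; this is not the original simulation code.

```python
import numpy as np

def add_noise_except_controls(expr, control_idx, sd, seed=0):
    """Return a noisy copy of expr (genes x conditions).

    Gaussian noise with standard deviation sd is added to every gene,
    then the rows listed in control_idx are restored to their original values.
    """
    rng = np.random.default_rng(seed)
    noisy = expr + rng.normal(0.0, sd, size=expr.shape)
    noisy[control_idx] = expr[control_idx]  # reference genes stay unchanged
    return noisy

# Toy stand-in for the real expression matrix (assumption: 200 genes, 7 conditions).
toy_rng = np.random.default_rng(42)
expr = toy_rng.normal(size=(200, 7))
control_idx = np.arange(10)  # hypothetical choice of the 10 reference genes

# One synthetic dataset per noise level (low, medium, high).
synthetic = {sd: add_noise_except_controls(expr, control_idx, sd)
             for sd in (0.1, 0.3, 1.0)}
```

In this sketch the reference genes are identical in the real and synthetic matrices, while every other gene differs by an amount that grows with the noise level.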
To test the ability of our approach to identify these 10 genes whose expression was unchanged between real and
simulated data, the 3D-gene expression networks were first constructed using the synthetic datasets
(low, medium and high noise levels). The results are presented in the Figure below (B).
Note that the 10 genes serving as controls are shown in orange on each picture. In a second step, the real S. cerevisiae
gene expression network (derived from the real expression data) was optimally superimposed on each noisy 3D-gene expression
network, with a W-value of 2 (see below for initial and optimized superimpositions). Finally, genes were represented
in biplots according to their final Ei (X-axis) and Delta-i (Y-axis) values (Figure below (C)).
On the resulting graphs, we observed that whatever the noise level (low, medium or high), the orange genes
were correctly superimposed, i.e. they exhibited low Ei and Delta-i values. In addition, the greater the noise
level, the more clearly the orange genes stood out as having the most conserved expression. This was especially apparent in
the biplot obtained with the noise level of 1: orange genes were the only genes located in the lower left part of
the plot. These results demonstrate that the procedure for optimal superimposition is tolerant of noise and
illustrate the effectiveness of our approach. Indeed, from a biological point of view, the prerequisite for comparing
two gene expression networks derived from two different species is to be able to identify precisely the classes of genes
whose expression is well conserved between the two species, despite other classes of genes whose expression has varied
across evolution. Thus, if "noise" is taken to represent the "evolutionary divergence" between two gene expression networks,
the simulations presented in this section indicate that our approach does not extract
spurious relationships, i.e. gene pairs whose expression measurements appear to be conserved when they are not.
Figure legend :
(A) S. cerevisiae 3D-gene expression network presented in the main text (sporulation datasets). The orange points
correspond to 10 genes whose expression measurements were not changed during the simulation procedure.
(B) Simulated 3D-gene expression networks, obtained by randomly adding noise drawn from a normal distribution with
standard deviations of 0.1, 0.3 and 1 (i.e. low, medium and high noise levels).
(C) Biplots representing each gene according to its final Ei (X-axis) and Delta-i (Y-axis) values.
Whatever the noise level, the orange genes are correctly superimposed during the optimization procedure
(low Ei and Delta-i values).
- 3D-gene expression networks (enlarged views): real data; noise level = 0.1; noise level = 0.3; noise level = 1.
- Initial and optimized (w = 2) superimpositions of the real and simulated gene expression networks, for noise levels 0.1, 0.3 and 1.

- Relevance of the optimized superimposition
Estimating the significance of 3D-proximity between genes after optimization of the superimposition is of fundamental
importance. One must keep in mind that, in theory, an "optimal superimposition" can be found for any pair of gene expression
networks, even those that have nothing in common. In the main text, we present a statistical analysis to assess
the relevance of the optimized superimposition. The idea is to compare the final 3D-proximity between genes to random
controls, obtained by reshuffling the orthologous gene list (i.e. the "attractor" pairs of genes during the
superimposition). To evaluate the ability of this statistical control to distinguish between the superimposition
of two randomly generated gene expression networks and two gene expression networks whose organisation is partially
conserved, we performed the following simulations. Once again, we used as a prototype the S. cerevisiae 3D-gene expression
network derived from the sporulation dataset. The real S. cerevisiae gene expression network was optimally superimposed
on a noisy 3D-gene expression network as well as on a randomised 3D-gene expression network (obtained by disordering
all the expression values). Results were compared to random controls obtained by reshuffling the links between genes
serving as "attractors" during the superimposition (see Figure below). As expected, the calculated Z-score was clearly
significant (-10) when the S. cerevisiae gene expression network was compared to a network with similar organization
(noisy network), whereas the calculated Z-score was not significant (only -1) when the S. cerevisiae gene expression
network was compared to a randomly generated network, i.e. one with no conserved structure.
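The logic of this reshuffling control can be sketched as follows. The sketch makes several assumptions that are not part of the original analysis: the final E-value is replaced by a simple stand-in (the mean 3D distance between paired genes, computed without any optimization of the superimposition), the Z-score is taken as the observed value minus the mean of the random distribution, divided by its standard deviation, and the names `pair_distance` and `reshuffle_z_score` as well as the toy coordinates are hypothetical.

```python
import numpy as np

def pair_distance(coords_a, coords_b, pairs):
    """Stand-in for the final E-value: mean 3D distance between paired genes."""
    a = coords_a[pairs[:, 0]]
    b = coords_b[pairs[:, 1]]
    return float(np.linalg.norm(a - b, axis=1).mean())

def reshuffle_z_score(coords_a, coords_b, pairs, n_random=1000, seed=0):
    """Z-score of the observed value against controls with reshuffled pairs."""
    rng = np.random.default_rng(seed)
    observed = pair_distance(coords_a, coords_b, pairs)
    random_e = np.empty(n_random)
    for k in range(n_random):
        shuffled = pairs.copy()
        # Break the gene-to-gene links while keeping the same two gene sets.
        shuffled[:, 1] = rng.permutation(pairs[:, 1])
        random_e[k] = pair_distance(coords_a, coords_b, shuffled)
    return (observed - random_e.mean()) / random_e.std(ddof=1)

# Toy 3D coordinates (assumption: 200 genes, each paired with itself).
toy_rng = np.random.default_rng(1)
real = toy_rng.normal(size=(200, 3))
pairs = np.column_stack([np.arange(200), np.arange(200)])

# Network with conserved structure: real coordinates plus noise (level 0.3).
noisy = real + toy_rng.normal(0.0, 0.3, size=real.shape)
# Network with no conserved structure: all values disordered.
randomised = toy_rng.permutation(real.ravel()).reshape(real.shape)

z_noisy = reshuffle_z_score(real, noisy, pairs)
z_random = reshuffle_z_score(real, randomised, pairs)
```

Under these assumptions, a strongly negative Z-score is expected for the noisy network and a Z-score close to zero for the randomised one, which mirrors the contrast described above.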
Figure legend :
The real S. cerevisiae gene expression network derived from the real expression data (see the previous Figure) was optimally
superimposed on: (A) a noisy 3D-gene expression network (noise level 0.3) and (B) a randomised 3D-gene expression network.
The final E-values (red line) were compared to random controls obtained by reshuffling the list of "attractors".
Means m of the random distributions are indicated, with the calculated Z-scores in square brackets.