Before running BioNetStat, you must set the execution parameters available on
the left sidebar. Below, we detail each differential
network analysis parameter.
Column classes name
After the variable values data are loaded, BioNetStat selects the first character (factor) column to classify the samples. If your dataset has more than one sample class column, you can choose the column containing the classes you want to compare.
Classes (conditions) being compared
Select the classes you want to analyze with BioNetStat.
Gene sets size range
BioNetStat performs tests for each variable set of a collection of sets defined in the Variable set database. If no file is provided, a single set containing all variables is analyzed. To test only a subcollection of sets, you can filter the sets by size with the "Minimum gene set size" and "Maximum gene set size" parameters.
The minimum gene set size allowed is 5. However, we recommend
testing groups with at least 15 variables.
Testing large gene sets can be time-consuming. In general,
a maximum size of a few hundred to 1000 variables is feasible, although this limit depends on the user's machine.
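The size filter described above can be sketched in Python as follows. This is a minimal illustration, not BioNetStat's code: the functions `filter_sets_by_size` and `all_variables_set` are hypothetical, and we assume both size bounds are inclusive.

```python
from typing import Dict, List

MIN_ALLOWED_SIZE = 5  # smallest "Minimum gene set size" the documentation allows


def filter_sets_by_size(sets: Dict[str, List[str]], min_size: int = 15,
                        max_size: int = 1000) -> Dict[str, List[str]]:
    """Keep the variable sets whose size lies in [min_size, max_size].

    Hypothetical helper; inclusive bounds are an assumption of this sketch.
    """
    if min_size < MIN_ALLOWED_SIZE:
        raise ValueError(f"minimum set size must be at least {MIN_ALLOWED_SIZE}")
    if max_size < min_size:
        raise ValueError("maximum set size must not be below the minimum")
    return {name: members for name, members in sets.items()
            if min_size <= len(members) <= max_size}


def all_variables_set(variables: List[str]) -> Dict[str, List[str]]:
    """Without a set file, a single set holding every variable is analyzed."""
    return {"all_variables": list(variables)}


collection = {
    "small": [f"g{i}" for i in range(8)],     # below the recommended 15
    "medium": [f"g{i}" for i in range(40)],   # kept
    "huge": [f"g{i}" for i in range(5000)],   # above the 1000 cap
}
print(sorted(filter_sets_by_size(collection)))  # ['medium']
```

The defaults mirror the recommended minimum of 15 and the suggested cap of 1000; lowering `min_size` to the allowed floor of 5 keeps the small set as well.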
Method for network construction
The network links are inferred from a measure of association between the variable values. BioNetStat provides three classical association measures:
-
Pearson: Pearson's correlation coefficient. It measures the linear
dependence between two variables. For the statistical test, we use the
Hmisc package.
-
Spearman: Spearman's correlation coefficient. It measures the monotonic
dependence between two variables. For the statistical test, we use the
Hmisc package.
-
Kendall: Kendall's Tau coefficient. It measures the monotonic
dependence between two variables. For the statistical test, we use the
psych package.
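To make the difference between the three measures concrete, here is a pure-Python sketch of the three coefficients. It computes only the coefficients, not the significance tests that BioNetStat delegates to Hmisc and psych. The function names are ours, and using the tie-corrected Kendall variant (tau-b) is an assumption.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r (linear dependence). Undefined if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho (monotonic dependence): Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b (monotonic dependence), corrected for ties.

    Undefined if either variable is constant.
    """
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: ignored
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (pairs + tied_y_only) * (pairs + tied_x_only))


x = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in x]  # monotonic but not linear in x
print(pearson(x, cubes) < 1, spearman(x, cubes), kendall_tau_b(x, cubes))
# True 1.0 1.0
```

The cubic example shows why the choice matters: both rank-based measures report perfect monotonic dependence, while Pearson's coefficient falls below 1 because the relation is not linear.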
Network type
You can choose between unweighted and weighted networks:
-
Unweighted: graphs in which every edge has weight one.
You must choose a threshold for edge selection. Only
the edges connecting genes whose association degree is greater than
the threshold remain in the graph.
-
Weighted: Weighted networks are full graphs where each edge has
a weight. The weight of an edge is defined as the association degree
between the two gene products that are connected by it.
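The two network types can be sketched from a symmetric matrix of association degrees. This is a minimal illustration with hypothetical function names; excluding self-loops (zero diagonal) is an assumption of the sketch.

```python
from typing import List

Matrix = List[List[float]]


def unweighted_network(assoc: Matrix, threshold: float) -> List[List[int]]:
    """Adjacency matrix with weight-one edges where association > threshold.

    Self-loops are excluded (an assumption of this sketch).
    """
    n = len(assoc)
    return [[1 if i != j and assoc[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]


def weighted_network(assoc: Matrix) -> Matrix:
    """Full graph: each pair of distinct nodes is linked with weight = association."""
    n = len(assoc)
    return [[0.0 if i == j else float(assoc[i][j]) for j in range(n)]
            for i in range(n)]


assoc = [[1.0, 0.9, 0.2],
         [0.9, 1.0, 0.6],
         [0.2, 0.6, 1.0]]
print(unweighted_network(assoc, threshold=0.5))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(weighted_network(assoc)[0])                 # [0.0, 0.9, 0.2]
```

The unweighted graph loses the information that the 0-1 link (0.9) is stronger than the 1-2 link (0.6); the weighted graph keeps it, at the cost of being a full graph.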
Statistic to link formation
The correlation coefficient or p-value obtained by one of the methods mentioned
above is used to set an association degree for each link of the network.
The following options are available to measure the association degrees:
- Absolute correlation: the absolute value of the correlation coefficient
- 1 - p-value: One minus the p-value of the test for dependence between
two gene products. The smaller the p-value, the stronger the evidence
that their expression levels are associated.
- 1 - q-value: One minus the adjusted p-value of the test for dependence
between two gene products. The p-value is adjusted by
the False Discovery Rate (Benjamini and
Hochberg, 1995) method for multiple testing.
After choosing the association measure, the user must choose the threshold value for link formation.
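The three association-degree options and the threshold step can be sketched as follows. This is an illustration under stated assumptions, not BioNetStat's code: `benjamini_hochberg` and `association_degrees` are hypothetical names, the input correlations and p-values are made-up example values, and we assume a link is formed when the degree is strictly greater than the threshold.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """FDR-adjusted p-values (q-values), Benjamini and Hochberg (1995) step-up."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # of p * m / rank so the adjusted values stay monotone and never exceed 1.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order of p-values
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def association_degrees(r: List[float], p: List[float], statistic: str) -> List[float]:
    """Association degree of each candidate link for one of the three options."""
    if statistic == "absolute correlation":
        return [abs(v) for v in r]
    if statistic == "1 - p-value":
        return [1 - v for v in p]
    if statistic == "1 - q-value":
        return [1 - q for q in benjamini_hochberg(p)]
    raise ValueError(f"unknown statistic: {statistic}")


r = [0.85, -0.70, 0.40, 0.10]  # illustrative correlations of four candidate links
p = [0.01, 0.04, 0.03, 0.20]   # illustrative p-values of the same links
link_degrees = association_degrees(r, p, "1 - q-value")
links = [d > 0.95 for d in link_degrees]  # threshold chosen by the user
print(links)  # [True, False, False, False]
```

With "1 - p-value" the same threshold would admit no link either, since every unadjusted p-value here is above 0.05 except the first; the FDR adjustment only raises p-values, so "1 - q-value" is the more conservative choice.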
Links weights
If the weighted option is selected, the user must also choose which measure is used as the link weight.
Method for gene networks comparison
BioNetStat compares the correlation networks between the classes for each variable set.
Below, we describe the methods available for comparing unweighted networks:
-
Spectral distribution test: The spectrum of an undirected
graph is the set of eigenvalues of its adjacency matrix. The spectral distribution describes many topological properties of a graph, such as
the number of walks, diameter, and cliques. The spectral distribution
test is based on the Kullback-Leibler (KL) divergence between spectral
distributions (Takahashi et al., 2012). It can be used to test whether
two graphs were generated by the same model.
-
Spectral entropy test: It uses the absolute difference between spectral entropies (Takahashi et al., 2012) to measure the difference in
the complexity of the graphs' topological organization.
-
Degree distribution test: The degree of a node is the number of edges
that connect to it. The degree distribution test is based on the
Kullback-Leibler (KL) divergence between the degree distributions. BioNetStat uses the
igraph package implementation of the node degree.
-
Degree centrality test: The degree centrality test
is based on the Euclidean distance between the degree centralities
of the two networks adjusted by the number of vertices.
-
Betweenness centrality test: The betweenness centrality of a node is the number of shortest paths going through it (Freeman, 1979).
The betweenness centrality test is based on the Euclidean distance between the
betweenness centralities of
the two networks adjusted by the number of vertices. BioNetStat uses the igraph package
implementation.
-
Closeness centrality test: The closeness centrality of a node is the
inverse of the average length of the shortest paths between it and all
the other vertices in the graph (Freeman, 1979). The closeness
centrality test is based on the Euclidean distance between the
closeness centralities of the two networks adjusted by the number of
vertices. BioNetStat uses the igraph package
implementation.
-
Eigenvector centrality test: The eigenvector centrality of a node
$v_i$ is the $i$-th entry of the first (leading) eigenvector
of the graph adjacency matrix (Bonacich, 1987). The eigenvector
centrality test is based on the Euclidean distance between
eigenvector centralities of the two networks adjusted by the number of
vertices. BioNetStat uses the igraph package
implementation.
-
Clustering coefficient test:
The local clustering coefficient of a node is the number of edges between the
vertices within its neighborhood divided by the number of edges that could
exist among them (Watts and Strogatz, 1998). The clustering coefficient test
is based on the Euclidean distance between the
local clustering coefficients of the two networks adjusted by the number
of vertices. BioNetStat uses the igraph package
implementation.
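Two of the node-level statistics above, and the distance that compares them between classes, can be sketched as follows. This only computes the test statistic; its p-value comes from the label permutations described under "Permutation test settings". The function names are hypothetical, dividing the Euclidean distance by the number of vertices is our assumption about the adjustment, and nodes with fewer than two neighbors get a clustering coefficient of 0 here, although other implementations may report them as undefined.

```python
import math
from typing import List

Adjacency = List[List[int]]


def node_degrees(adj: Adjacency) -> List[int]:
    """Degree of each node: number of edges incident to it."""
    return [sum(row) for row in adj]


def local_clustering(adj: Adjacency) -> List[float]:
    """Edges among a node's neighbors / edges that could exist among them."""
    n = len(adj)
    coeffs = []
    for v in range(n):
        nbrs = [u for u in range(n) if adj[v][u]]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # assumption: undefined coefficient reported as 0
            continue
        links = sum(adj[a][b] for i, a in enumerate(nbrs) for b in nbrs[i + 1:])
        coeffs.append(links / (k * (k - 1) / 2))
    return coeffs


def centrality_distance(c1: List[float], c2: List[float]) -> float:
    """Euclidean distance between two centrality vectors, divided by the number
    of vertices (one possible reading of "adjusted by the number of vertices")."""
    return math.dist(c1, c2) / len(c1)


# Class A: triangle 0-1-2 plus a pendant node 3 attached to node 0
net_a = [[0, 1, 1, 1],
         [1, 0, 1, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0]]
# Class B: path 0-1-2-3
net_b = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(node_degrees(net_a), node_degrees(net_b))  # [3, 2, 2, 1] [1, 2, 2, 1]
print(centrality_distance(node_degrees(net_a), node_degrees(net_b)))  # 0.5
```

The same `centrality_distance` applies to any of the centrality vectors listed above (clustering, closeness, betweenness, eigenvector), which is why the tests share one comparison step.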
BioNetStat includes generalizations of some of the statistics described above to
weighted undirected graphs. Let $G$ be a weighted undirected graph.
We define the weighted adjacency matrix of $G$ as the
matrix $W = (w_{ij})$, where $w_{ij}$ is the
weight of the edge that connects the vertices $v_i$ and $v_j$.
In this context, $0 \le w_{ij} \le 1$ and $G$ is a full graph.
Below, we describe the methods available for comparing weighted networks:
-
Spectral distribution test: Replaces the usual adjacency matrix by the
weighted adjacency matrix, and then performs the spectral distribution
test for unweighted networks.
-
Spectral entropy test: Replaces the usual adjacency matrix by the
weighted adjacency matrix, and then performs the spectral entropy
test for unweighted networks.
-
Degree distribution test: BioNetStat generalizes the degree of a node
to the sum of the weights of the edges that connect to it (Barrat, 2004).
The software uses the igraph implementation of the node strength.
It replaces the usual node degree by the weighted degree, and
then computes the degree distribution test for unweighted networks.
-
Degree centrality test: Replaces the usual node degree by the weighted
degree, and then computes the degree centrality test for
unweighted networks.
-
Betweenness centrality test: The betweenness centrality of a node is the number of shortest paths going through it (Freeman, 1979).
The betweenness centrality test is based on the Euclidean distance between the
betweenness centralities of
the two networks adjusted by the number of vertices. BioNetStat uses the igraph package
implementation.
-
Eigenvector centrality test: replaces the usual adjacency matrix by the
weighted adjacency matrix, and then performs the eigenvector centrality
test for unweighted networks (Newton, 2004).
-
Clustering coefficient test: replaces the local clustering coefficient
of a node by the sum of the weights of the edges between the vertices within its neighborhood divided by the number of edges that could exist among
them (Lopez-Fernandez et al., 2004). Then it performs the
clustering coefficient test for unweighted networks.
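Two of the weighted generalizations, node strength and the weighted clustering coefficient as worded above, can be sketched on a weighted adjacency matrix $W$. This follows the wording of this section rather than the igraph or Lopez-Fernandez implementations, which may differ. The function names are hypothetical, and we assume a vertex's neighbors are the vertices joined to it by an edge of positive weight (in a full graph with positive weights, every other vertex).

```python
from typing import List

Weights = List[List[float]]


def node_strength(w: Weights) -> List[float]:
    """Weighted degree: sum of the weights of the edges incident to each node."""
    return [sum(row[j] for j in range(len(row)) if j != i)
            for i, row in enumerate(w)]


def weighted_clustering(w: Weights) -> List[float]:
    """Sum of the weights of the edges among a node's neighbors, divided by
    the number of edges that could exist among them."""
    n = len(w)
    coeffs = []
    for v in range(n):
        # Assumption: neighbors are vertices linked to v with positive weight.
        nbrs = [u for u in range(n) if u != v and w[v][u] > 0]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # assumption: undefined coefficient reported as 0
            continue
        total = sum(w[a][b] for i, a in enumerate(nbrs) for b in nbrs[i + 1:])
        coeffs.append(total / (k * (k - 1) / 2))
    return coeffs


w = [[0.0, 0.5, 0.8],
     [0.5, 0.0, 0.2],
     [0.8, 0.2, 0.0]]
print(node_strength(w))        # strengths of nodes 0, 1, 2
print(weighted_clustering(w))  # [0.2, 0.8, 0.5]
```

With all weights equal to one, both functions reduce to the unweighted degree and local clustering coefficient, which is the sense in which they generalize the unweighted statistics.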
For the "Spectral distribution test", the "Spectral entropy test", and
the "Degree distribution test" methods, you must select a criterion to
define the bandwidth for the probability density function estimation. The
available methods for computing the bandwidth are:
-
Sturges: the bandwidth is defined as $(\max(x) - \min(x))/n_{bins}$ (Sturges, 1926), where
$x$ is the graph spectrum (for the tests based on the spectral density)
or the node degrees (for the degree distribution test), and
$n_{bins} = \lceil \log_2(n_V) + 1 \rceil$,
with $n_V$ denoting the number of genes.
-
Silverman: the bandwidth is defined as $0.9 \min\{\mathrm{sd}(x), \mathrm{IQR}(x)/1.34\}\, n_V^{-0.2}$
(Silverman, 1986), unless the quartiles coincide,
where $n_V$ is the number of genes, $\mathrm{sd}(x)$ is the standard deviation of $x$, and $\mathrm{IQR}(x)$ is the interquartile
range of $x$, with $x$ denoting the graph spectrum (for the tests based on the spectral density)
or the node degrees (for the degree distribution test). If the
graph is empty, it is defined as $0.9\, n_V^{-0.2}$.
BioNetStat uses the R 'density' function from the stats package for estimating the
probability density function.
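Both bandwidth rules can be sketched in Python. The function names are hypothetical. The fallback applied when $\min\{\mathrm{sd}(x), \mathrm{IQR}(x)/1.34\}$ is zero (use the standard deviation, then $|x_1|$, then 1) is an assumption modeled on R's `bw.nrd0`; it reproduces the $0.9\, n_V^{-0.2}$ value stated above for an empty graph.

```python
import math
import statistics
from typing import Sequence


def sturges_bandwidth(x: Sequence[float]) -> float:
    """(max(x) - min(x)) / n_bins with n_bins = ceil(log2(n_V) + 1)."""
    n_bins = math.ceil(math.log2(len(x)) + 1)
    return (max(x) - min(x)) / n_bins


def silverman_bandwidth(x: Sequence[float]) -> float:
    """0.9 * min(sd(x), IQR(x) / 1.34) * n_V ** -0.2, with an assumed fallback."""
    n = len(x)
    sd = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    # "inclusive" quartiles interpolate like R's default quantile type 7
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        # Assumed fallback, modeled on R's bw.nrd0: sd, then |x[0]|, then 1.
        # For an empty graph (all values 0) this yields 0.9 * n_V ** -0.2.
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * n ** -0.2


values = [0, 1, 2, 3, 4, 5, 6, 7]   # eight illustrative spectrum or degree values
print(sturges_bandwidth(values))     # 1.75  (n_bins = ceil(3 + 1) = 4)
print(silverman_bandwidth([0] * 10)) # empty 10-node graph: 0.9 * 10 ** -0.2
```

Sturges depends only on the range of $x$, so a single outlying eigenvalue or hub degree widens it; Silverman's rule uses the smaller of two spread estimates and is less sensitive to such values.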
Permutation test settings
To compute a p-value for the differential network analysis, BioNetStat performs
a permutation based test, which generates N random permutations of
the sample labels.
The minimum possible p-value is $1/(N + 1)$.
Therefore, the choice of N depends on the required significance level
of the test. You can set the N parameter on the
"Enter the number of label permutations" option.
To perform the same label permutations for all variable sets, you can set a seed
to generate the random permutations on the "Enter a seed to generate random
permutations" option.
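The p-value computation and the role of the seed can be sketched as follows. A toy two-class statistic (the hypothetical `mean_gap`) stands in for a network-comparison statistic such as the degree-centrality distance; in BioNetStat each permutation would rebuild both networks from the relabeled samples. Counting permuted statistics that are greater than or equal to the observed one is an assumption of this sketch.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Statistic = Callable[[Sequence[str]], float]


def permutation_pvalue(labels: Sequence[str], statistic: Statistic,
                       n_permutations: int, seed: int) -> float:
    """p-value = (1 + #permuted statistics >= observed) / (N + 1)."""
    rng = random.Random(seed)  # same seed -> same sequence of permutations
    observed = statistic(labels)
    shuffled: List[str] = list(labels)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # a fresh uniformly random relabeling
        if statistic(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


# Toy data standing in for the samples of one variable set
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]


def mean_gap(lbls: Sequence[str]) -> float:
    """Hypothetical statistic: absolute difference of class means."""
    a = [v for v, lab in zip(values, lbls) if lab == "A"]
    b = [v for v, lab in zip(values, lbls) if lab == "B"]
    return abs(mean(b) - mean(a))


p = permutation_pvalue(labels, mean_gap, n_permutations=999, seed=42)
print(p)  # never below 1 / (999 + 1) = 0.001
```

Because the observed labeling itself is counted, the p-value can never be zero; reusing the same seed for every variable set makes all sets be tested against the same relabelings.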
Running the analysis
After loading the dataset and setting the execution parameters, click the "Start
analysis" button. The message "The analysis is runing..." will be shown in the "Analysis results" section,
where the results and other execution messages are also displayed.