By default, CoNet runs without Rserve. Enabling Rserve gives access to
more advanced implementations of some measures, in particular the minet
package for mutual information.
Rserve configuration
By default, Rserve is disabled. Enable it by activating the "Enable Rserve" check box
(assuming you have installed and started the Rserve server outside of CoNet).
The host and port of the Rserve server can be specified in the host and port text fields.
Rserve installation
Rserve is a server that
allows other programming languages to connect to R.
If you do not have access to an Rserve server, you can install and run it
locally on your machine.
After installing the Rserve package in R, load it with:
library(Rserve)
Then start the Rserve server with the command:
Rserve(args="--no-save")
Rserve is now running locally on your machine with default parameters (host=127.0.0.1, port=6311).
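Before enabling Rserve in CoNet, it can be useful to confirm that something is listening on the configured host and port. The following is a minimal Python sketch, not part of CoNet; the function name rserve_reachable is hypothetical. It only checks that a TCP connection can be opened and does not verify that the listener is actually Rserve.

```python
import socket


def rserve_reachable(host="127.0.0.1", port=6311, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    The defaults are the Rserve defaults named in the text above.
    """
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling rserve_reachable() with no arguments tests the default local server; pass the values from CoNet's host and port text fields to test a remote one.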
More detailed information on Rserve installation is given here.
Implementation selection
For mutual information (MI), the choice of implementation affects the results.
Three implementations are offered. The default is the implementation by Jean Sebastien Lerat (option jsl).
Alternatively, MI can be computed with a re-implementation of the ARACNE implementation
(the original Matlab code is available here) which has one parameter,
the standard deviation of the Gaussian kernel (which can be set as one of the global constants). However, ARACNE cannot deal with missing values.
Last but not least, MI can be computed using the minet package in R. This requires Rserve to be enabled. Minet is by default run with the "mi.shrink"
estimator. Discretization strategy and estimator can be selected in the Minet settings.
Association rule mining binary
CoNet wraps the apriori association mining algorithm developed by Agrawal et al. and implemented by Christian Borgelt.
To use apriori in combination with CoNet, please download it from here
and indicate the directory in which you placed the binary. To do so, click "Select folder", select the folder of your choice and finally click "Choose".
Alternatively, you can place the binary in a directory that is on your PATH environment variable (e.g. /usr/bin or /usr/local/bin).
Note that association rule mining is not supported for Windows.
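Whether the binary is visible on the PATH can be checked with a short Python sketch. This is only an illustration: the function name find_binary is hypothetical, and the executable name "apriori" is an assumption that should be replaced by the actual name of the downloaded file.

```python
import shutil


def find_binary(name="apriori"):
    """Return the full path of an executable found on PATH, or None.

    The default name "apriori" is an assumption about how the binary is called.
    """
    return shutil.which(name)


location = find_binary()
if location is None:
    print("binary not found on PATH")
else:
    print("binary found at", location)
```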
Missing value treatment
Missing values can either be ignored (the default) or treated by omission ("pairwise_omit"). In the latter case, if a
score is computed between two matrix rows, all value pairs that contain a missing value are omitted. Non-pairwise computations
such as the ARACNE implementation of mutual information do not handle missing values and therefore cannot be used in combination
with "pairwise_omit".
Pairwise omission of missing values may shorten the vectors to such an extent that the computed score is meaningless. To prevent this,
a minimum number of missing-value-free pairs can be specified, which should however not exceed the number of columns in the input matrix.
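Pairwise omission with a minimum number of complete pairs can be sketched as follows. This is a minimal Python illustration, not CoNet's code: the function names are hypothetical, missing values are assumed to be encoded as None or NaN, and Pearson correlation is used only as an example score.

```python
import math


def pairwise_omit(x, y, min_pairs=2):
    """Drop every position where x or y is missing (None or NaN).

    Returns the two shortened lists, or None if fewer than min_pairs
    complete pairs remain.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    kept = [(a, b) for a, b in zip(x, y) if not missing(a) and not missing(b)]
    if not kept or len(kept) < min_pairs:
        return None
    xs, ys = zip(*kept)
    return list(xs), list(ys)


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


row_a = [1.0, 2.0, float("nan"), 4.0, 5.0]
row_b = [2.0, None, 6.0, 8.0, 10.0]

pairs = pairwise_omit(row_a, row_b, min_pairs=3)
if pairs is None:
    print("too few complete pairs, no score computed")
else:
    print(pearson(*pairs))
```

In this example two of the five columns are dropped, so three complete pairs remain; with min_pairs set to 4 no score would be computed for this pair of rows.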
Global constants
The Gauss kernel width is a parameter of the ARACNE implementation of mutual information.
The pseudocount is added to zeros in measures like Kullback-Leibler and Jensen-Shannon dissimilarity to prevent negative infinity from occurring.
Note that the variance of log ratios and, to a lesser extent, Kullback-Leibler are sensitive to the value of the pseudocount,
so the same pseudocount must be selected to compare their scores across different data sets. However, pseudocounts
are adjusted automatically when they are larger than the smallest non-zero value in the matrix.
It is therefore better to compare Kullback-Leibler and variance of log ratio networks not on the score level, but
on the p-value level.
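The sensitivity of the score to the pseudocount can be illustrated with a minimal Python sketch. It assumes the plain, asymmetric form sum(p_i * log(p_i / q_i)) and simply replaces zeros with the pseudocount; CoNet's exact formula and its automatic pseudocount adjustment are not reproduced here, and the function name is hypothetical.

```python
import math


def kullback_leibler(p, q, pseudocount=1e-8):
    """Sum of p_i * log(p_i / q_i), with zeros replaced by the pseudocount.

    Assumption: zeros are replaced rather than the pseudocount being added
    to every entry, and the vectors are not renormalized afterwards.
    """
    total = 0.0
    for a, b in zip(p, q):
        a = a if a > 0 else pseudocount
        b = b if b > 0 else pseudocount
        total += a * math.log(a / b)
    return total


p = [0.25, 0.25, 0.5]
q = [0.5, 0.5, 0.0]  # the zero would make the score infinite without a pseudocount

for pc in (1e-8, 1e-4, 1e-2):
    print(pc, kullback_leibler(p, q, pc))
```

The printed scores differ strongly between pseudocounts although the input vectors are identical, which is why scores computed with different pseudocounts are not directly comparable.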
Mutual information settings
Here, the default minet settings for mutual information estimation can be set (see the minet R package documentation for details).
In addition, the discretization method for both minet and mutual information as implemented in Jean Sebastien Lerat's library
can be selected. The selected discretization method is only applied if the matrix is an abundance matrix or if a count matrix has been
transformed into a continuous matrix by a normalization step.
The number of bins is always set to the square root of the number of columns.
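The bin-number rule can be sketched in Python as follows. This is an illustration under stated assumptions: the text does not say how the square root is rounded, so rounding down is assumed here, and equal-width binning is used only as one example of a discretization method. Both function names are hypothetical.

```python
import math


def number_of_bins(n_columns):
    """Square root of the number of columns.

    Assumption: the result is rounded down and at least one bin is used.
    """
    return max(1, math.isqrt(n_columns))


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant row falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


row = [0.1, 0.4, 0.35, 0.8, 0.05, 0.6, 0.9, 0.2, 0.75]
bins = number_of_bins(len(row))
print(bins, equal_width_bins(row, bins))
```

For a matrix with 9 columns this gives 3 bins; with 10 columns the assumed rounding still gives 3.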
System
Logging
The log level determines the verbosity of CoNet, ranging from "fatal" (only messages leading
to a crash are recorded) to "debug" (most verbose level). CoNet logs to the "output.log" file
in Cytoscape's root folder. By default, the "fatal" level is enabled to keep the log file size small.
In case errors occur, the log level can be increased.
Speed-up
From version 1.0b1 onwards, CoNet comes with a major re-implementation of its core to make it faster. However,
it is possible to switch back to the old implementation by disabling the speed-up.
The re-implementation covers the measures Pearson, Spearman, Kendall, Hellinger, Euclid, Steinhaus, Bray-Curtis, Kullback-Leibler
and variance of log ratios and adds the Hilbert-Schmidt independence criterion (which is not available with the old core).
The re-implementation also includes renormalization for these measures.
When speed-up is disabled, the following implementations are used:
Spearman: Apache commons.math
Pearson: JSC (Java Statistical Classes)
Kendall: JSC (Java Statistical Classes)
Hellinger: difference in formula (the normalization factor 1/sqrt(2) is not used)
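The effect of the 1/sqrt(2) factor on the Hellinger distance can be illustrated with a short Python sketch. It uses the standard textbook definition and assumes the inputs are vectors of proportions that each sum to 1; it is not taken from CoNet's source, and the function name and the normalize parameter are hypothetical.

```python
import math


def hellinger(p, q, normalize=True):
    """Hellinger distance between two vectors of proportions.

    With normalize=True the factor 1/sqrt(2) is applied and the result lies
    in [0, 1]. With normalize=False the factor is left out, as described for
    the old implementation, and the result lies in [0, sqrt(2)].
    """
    s = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))
    return s / math.sqrt(2) if normalize else s


p = [1.0, 0.0]
q = [0.0, 1.0]
print(hellinger(p, q))                   # maximally different vectors, with the factor
print(hellinger(p, q, normalize=False))  # same vectors, without the factor
```

The two variants differ only by a constant factor, so edge ranks are unaffected, but score thresholds chosen for one variant do not carry over to the other.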
For Kullback-Leibler, the score range (but not the edge ranks) changed for versions 1.0b1 to 1.0b3, such that old threshold files
cannot be re-used with these versions. However, from version 1.0b3.1 onwards, the Kullback-Leibler score range
is the same as for the alpha version.
From version 1.0b1 to 1.0b5, the re-implemented Spearman did not treat ties correctly.
In addition, it gave wrong results when run with missing value treatment enabled.
This has been fixed from 1.0b6 onwards.
From version 1.0b2 onwards, mutual information and Jensen-Shannon dissimilarity are also available (with the old core, only the ARACNE implementation of
mutual information is available and Jensen-Shannon is not supported).
Pairwise count/abundance measures not covered by the speed-up, which include logged Euclidean and Chi-Square distance, are removed.
Note that the speed-up does not cover any of the incidence measures.
Note on reproducibility
Networks constructed with the beta and the alpha version (using Kullback-Leibler, Spearman, Pearson and Bray Curtis as well as
the full pipeline, e.g. p-values computed from both renormalized permutations and bootstraps) do not overlap perfectly.
The Jaccard indices of edge overlap vary, but are typically between 0.5 and 0.6 (the Jaccard index ranges from 0 for no overlap to 1 for 100% overlap).
The main reason for this difference is the treatment of pseudocounts needed for Kullback-Leibler: in the beta version, it is ensured
that pseudocounts are not larger than 1/100th of the smallest value in the matrix.
Networks also vary slightly when re-computed with the same version of CoNet, if p-values are computed by randomization
(typically this variation should not be larger than 5% in terms of Jaccard index of edge overlap).
To reproduce results from the alpha version, the alpha version can be downloaded from CoNet's web page (click the About button for an active link to CoNet's web page).
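The Jaccard index of edge overlap mentioned above can be computed with a few lines of Python. This is a minimal sketch assuming undirected edges given as pairs of node names; the function name is hypothetical, and treating two empty networks as identical is a convention chosen here.

```python
def jaccard_edge_overlap(edges_a, edges_b):
    """Jaccard index of two edge lists: shared edges / edges in either network.

    Edges are treated as undirected, so ("A", "B") and ("B", "A") are equal.
    """
    a = {frozenset(edge) for edge in edges_a}
    b = {frozenset(edge) for edge in edges_b}
    union = a | b
    if not union:
        # Convention chosen here: two empty networks overlap completely.
        return 1.0
    return len(a & b) / len(union)


network_1 = [("A", "B"), ("B", "C"), ("C", "D")]
network_2 = [("B", "A"), ("C", "D"), ("D", "E")]
print(jaccard_edge_overlap(network_1, network_2))
```

In this example the two networks share two of four distinct edges, giving an index of 0.5.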