Pula-Magdeburg single-gene knockout benchmark dataset
A collection of single-gene knockout datasets has been produced as a benchmark for the network inference algorithms described in the paper "Reconstruction of large-scale regulatory networks based on perturbation graphs and transitive reduction: improved methods and their performance", BMC Systems Biology 2013, 7:73. The compendium consists of 270 datasets, simulated from 30 different 5000-gene networks under 9 noise configurations each.
For each network, a compressed archive (approximately 700 MB) is available for download. Each archive includes 19 files:
- The list of unsigned edges encoding the directed interactions in the 5000-gene network.
- The wild-type gene expression values, one file for each of the 9 noise configurations.
- The matrix of expression values after the single-gene knockout of all genes in the network, one file for each of the 9 noise configurations.
The data can be downloaded from these links:
The above SysGenSIM and DREAM4 networks have been used to evaluate this collection of scripts for network inference (a small bug was fixed on August 7th, 2013). The algorithms can also be easily adapted to reverse-engineer other gene networks from perturbation data.
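As an illustration of how the three kinds of files in each archive relate to one another, the following Python sketch parses an edge list, a wild-type vector, and a knockout matrix, then computes how far each gene moves from its wild-type level after each knockout. The file layout is an assumption, not taken from the archives: whitespace-separated values, one "source target" pair per line in the edge list, and row i of the knockout matrix holding the expression of all genes after knocking out gene i. The function names and the toy 3-gene data are hypothetical; check the real files before reusing any of it.

```python
import io


def read_matrix(handle):
    """Parse whitespace-separated numeric rows (assumed layout) from a text handle."""
    return [[float(x) for x in line.split()] for line in handle if line.strip()]


def read_edges(handle):
    """Parse an edge list with one 'source target' pair per line (assumed layout)."""
    edges = []
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            edges.append((fields[0], fields[1]))
    return edges


def knockout_deviation(wild_type, knockout):
    """Return knockout[i][j] - wild_type[j] for every knockout i and gene j."""
    return [[value - wt for value, wt in zip(row, wild_type)] for row in knockout]


# Toy 3-gene stand-ins for the real files (hypothetical values).
wt_text = "1.0 1.0 1.0\n"
ko_text = "0.0 0.4 1.0\n1.0 0.0 1.6\n1.0 1.0 0.0\n"
edge_text = "G1 G2\nG2 G3\n"

wild_type = read_matrix(io.StringIO(wt_text))[0]
knockout = read_matrix(io.StringIO(ko_text))
edges = read_edges(io.StringIO(edge_text))
deviation = knockout_deviation(wild_type, knockout)

for i, row in enumerate(deviation, start=1):
    print(f"knockout of G{i}:", [round(d, 2) for d in row])
```

With real archives, the `io.StringIO` objects would be replaced by opened files. For 5000-gene matrices, an array library would be a more practical choice than nested lists.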
StatSeq benchmark dataset
The StatSeq compendium consists of 72 datasets derived from 9 different in silico gene networks, each simulated under 8 different parameter settings, in order to investigate the performance of inference algorithms across network and population sizes, marker distances, and heritability. All datasets have been simulated with SysGenSIM 1.0.2.
The networks differ in size (100, 1000, and 5000 genes) and contain a large strongly connected component.
More detailed information about the compendium and the evaluation of predictions is available here: StatSeq dataset description.
For each dataset, gold standard networks and simulated gene expression and genotype data are available for download:
- 100-gene networks (6.3 MB)
- 1000-gene networks (62.7 MB)
- 5000-gene networks (311.3 MB)
- Median value of the heritability for each dataset
The evaluation of predictions may be accomplished through the following MATLAB script: Evaluation script and gold standard networks (148.8 KB).
DREAM5 benchmark dataset
The DREAM5 SysGenA compendium is a collection of simulated datasets, produced for the DREAM5 Systems Genetics In-silico Network subchallenge in 2010. The aim is to reverse-engineer gene networks from systems genetics data.
The whole dataset has been simulated with a preliminary version of SysGenSIM.
The compendium consists of 15 datasets, corresponding to 15 different 1000-gene in silico networks equipped with simulated gene expression and genotype data. In particular, 5 networks have data for only 100 RILs, 5 networks for 300 RILs, and 5 networks for 999 RILs. More information is available in the challenge description.
Data download:
- Gene expression and genotype data (29.7 MB): from DREAM5 website (registration required) or from CRS4 mirror
- Gold standard networks (178.3 KB)
Predictions were evaluated by calculating various scores (described here) with the following MATLAB script: Evaluation script.
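For readers without MATLAB, the sketch below shows one common way to score a ranked list of predicted edges against a gold standard network: average precision (a precision-recall summary) and the area under the ROC curve. It assumes that the scores are of this general kind, which this page does not confirm, so the linked description and the MATLAB script remain the authoritative definition. The function names and the toy edges are hypothetical.

```python
def average_precision(ranked_edges, gold_edges):
    """Mean of the precision values at each rank where a gold edge is recovered.

    Gold edges missing from the ranking count as never recovered.
    """
    gold = set(gold_edges)
    if not gold:
        raise ValueError("gold standard is empty")
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold)


def auroc(ranked_edges, gold_edges):
    """Fraction of (true edge, false edge) pairs in which the true edge ranks higher.

    Only edges present in the ranking are considered, and ties are not handled.
    """
    gold = set(gold_edges)
    positives = 0
    negatives = 0
    correct_pairs = 0
    for edge in ranked_edges:
        if edge in gold:
            positives += 1
        else:
            negatives += 1
            correct_pairs += positives  # every true edge seen so far outranks this one
    if positives == 0 or negatives == 0:
        raise ValueError("ranking needs at least one true and one false edge")
    return correct_pairs / (positives * negatives)


# Toy example: predictions sorted from most to least confident.
ranked = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "A")]
gold = [("A", "B"), ("A", "C")]

ap = average_precision(ranked, gold)
roc = auroc(ranked, gold)
print(f"average precision = {ap:.4f}, AUROC = {roc:.4f}")
```

Both functions treat edges as directed (source, target) pairs. If a ranking leaves out some candidate edges, the AUROC computed this way covers only the edges that were ranked.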