1 Installation
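
The installation step can be sketched as follows, assuming the standard Bioconductor workflow through BiocManager. The seed value is an arbitrary choice for illustration and is not prescribed by the package.

```r
# Install BiocManager from CRAN if it is missing, then install BioNERO
# from Bioconductor.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("BioNERO")

# Load the package and fix the random seed (arbitrary value) so that
# tree-based inference gives reproducible results.
library(BioNERO)
set.seed(123)
```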

2 Introduction and algorithm description

In the previous vignette, we explored all aspects of gene coexpression networks (GCNs), which are represented as undirected weighted graphs. They are undirected because, for a given link between gene A and gene B, we can only say that these genes are coexpressed; we cannot know whether gene A controls gene B or vice versa. They are weighted because some coexpression relationships between gene pairs are stronger than others. In this vignette, we will demonstrate how to infer gene regulatory networks (GRNs) from expression data with BioNERO. GRNs display interactions between regulators (e.g., transcription factors or miRNAs) and their targets (e.g., genes). Hence, they are represented as directed unweighted graphs.

Numerous algorithms have been developed to infer GRNs from expression data. However, the performance of each algorithm depends heavily on the benchmark data set. To address this uncertainty, Marbach et al. (2012) proposed applying the “wisdom of the crowds” principle to GRN inference. This approach consists of inferring GRNs with different algorithms, ranking the interactions identified by each method, and calculating the average rank for each interaction across all algorithms used. The result is a set of consensus, high-confidence edges that can be used in biological interpretations. For that, BioNERO implements three popular algorithms: GENIE3 (Huynh-Thu et al. 2010), ARACNE (Margolin et al. 2006) and CLR (Faith et al. 2007).
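
The rank-averaging idea can be illustrated with a minimal base R sketch. The edge scores, gene names and method names below are made up for illustration; this is not BioNERO's internal implementation, only the general principle of ranking within each method and averaging across methods.

```r
# Toy edge scores from three hypothetical methods (higher = stronger edge).
edges <- data.frame(
    regulator = c("TF1", "TF1", "TF2", "TF2"),
    target    = c("geneA", "geneB", "geneA", "geneC"),
    method1   = c(0.90, 0.20, 0.60, 0.40),
    method2   = c(0.80, 0.10, 0.70, 0.30),
    method3   = c(0.50, 0.30, 0.90, 0.20)
)

# Rank interactions within each method (rank 1 = highest score).
score_cols <- c("method1", "method2", "method3")
ranks <- sapply(edges[score_cols], function(s) rank(-s))

# Average the ranks across methods and sort by the consensus rank.
edges$mean_rank <- rowMeans(ranks)
consensus <- edges[order(edges$mean_rank), c("regulator", "target", "mean_rank")]
print(consensus)
```

Edges that rank consistently well across methods end up at the top of the consensus list, even if no single method scored them highest.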

3 Data preprocessing

Before inferring the GRN, we will preprocess the expression data the same way we did in the previous vignette.
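
A minimal preprocessing sketch follows. It assumes the example maize data set `zma.se` that ships with BioNERO and uses `exp_preprocess()` with illustrative threshold values; adjust them to your own data.

```r
library(BioNERO)

# Example maize expression data shipped with BioNERO
# (a SummarizedExperiment object).
data(zma.se)

# Filter genes by minimum expression and keep the 2000 most variable genes.
# The values of min_exp and n are illustrative, not recommendations.
final_exp <- exp_preprocess(
    zma.se,
    min_exp = 10,
    variance_filter = TRUE,
    n = 2000
)
final_exp
```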

4 Gene regulatory network inference

BioNERO requires only 2 objects for GRN inference: the expression data (SummarizedExperiment, matrix or data frame) and a character vector of regulators (transcription factors or miRNAs). The transcription factors used in this vignette were downloaded from PlantTFDB 4.0 (Jin et al. 2017).
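
As a sketch of the second input, the regulators can be extracted as a character vector. This assumes the example data frame `zma.tfs` bundled with BioNERO, which stores gene IDs in a `Gene` column.

```r
library(BioNERO)

# Example data frame of maize transcription factors shipped with BioNERO.
data(zma.tfs)
head(zma.tfs)

# Character vector of regulator gene IDs, as expected by the GRN functions.
regulators <- zma.tfs$Gene
```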

4.1 Consensus GRN inference

Inferring GRNs based on the wisdom of the crowds principle can be done with a single function: exp2grn(). This function will infer GRNs with GENIE3, ARACNE and CLR, calculate average ranks for each interaction and filter the resulting network based on the optimal scale-free topology (SFT) fit. In the filtering step, n different networks are created by subsetting the top n quantiles. For instance, if a network of 10,000 edges is given as input with nsplit = 10, 10 different networks will be created: the first with 1,000 edges, the second with 2,000 edges, and so on, with the last network being the original input network. Then, for each network, the function will calculate the SFT fit and select the best fit.
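
A hedged usage sketch of `exp2grn()` is shown below, assuming the bundled example data (`zma.se`, `zma.tfs`). The value `nTrees = 10` is only there to keep the example fast; a larger number of trees is advisable for real analyses.

```r
library(BioNERO)
set.seed(123) # arbitrary seed, for reproducibility

# Load and preprocess the example data (illustrative thresholds).
data(zma.se)
data(zma.tfs)
final_exp <- exp_preprocess(
    zma.se, min_exp = 10, variance_filter = TRUE, n = 2000
)

# Infer the consensus GRN. nTrees is passed on to GENIE3 and is reduced
# here for demonstration purposes only.
grn <- exp2grn(
    exp = final_exp,
    regulators = zma.tfs$Gene,
    nTrees = 10
)
head(grn)
```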

4.2 Algorithm-specific GRN inference

This section is intended for users who, for some reason (e.g., comparison, exploration), want to infer GRNs with particular algorithms. The available algorithms are:

GENIE3: a regression tree-based algorithm that decomposes the prediction of GRNs for n genes into n regression problems. For each regression problem, the expression profile of a target gene is predicted from the expression profiles of all other genes using random forests (default) or extra-trees.

ARACNE: an information-theoretic algorithm that aims to remove indirect interactions inferred by coexpression.

CLR: an extension of the relevance networks algorithm that uses mutual information to identify regulatory interactions.
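
Each algorithm listed above can be run on its own. The sketch below assumes the `grn_infer()` function with its `method` argument and the bundled example data. As before, `nTrees = 10` is a speed-oriented choice for demonstration.

```r
library(BioNERO)
set.seed(123) # arbitrary seed, for reproducibility

# Load and preprocess the example data (illustrative thresholds).
data(zma.se)
data(zma.tfs)
final_exp <- exp_preprocess(
    zma.se, min_exp = 10, variance_filter = TRUE, n = 2000
)

# GENIE3 (reduced number of trees for demonstration purposes).
genie3 <- grn_infer(
    final_exp, method = "genie3", regulators = zma.tfs$Gene, nTrees = 10
)

# ARACNE
aracne <- grn_infer(final_exp, method = "aracne", regulators = zma.tfs$Gene)

# CLR
clr <- grn_infer(final_exp, method = "clr", regulators = zma.tfs$Gene)

head(genie3)
```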

Users can also infer GRNs with the 3 algorithms at once using the function grn_combined(). The resulting edge lists are stored in a list of 3 elements.

NOTE: Under the hood, exp2grn() uses grn_combined() followed by averaging ranks with grn_average_rank() and filtering with grn_filter().
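
The three steps named in the note can be sketched manually. This assumes that the combined-inference function is exported as `grn_combined()`, that `grn_average_rank()` takes the list of edge lists, and that `grn_filter()` accepts an `nsplit` argument. Treat the exact signatures as assumptions and check the package reference before relying on them.

```r
library(BioNERO)
set.seed(123) # arbitrary seed, for reproducibility

# Load and preprocess the example data (illustrative thresholds).
data(zma.se)
data(zma.tfs)
final_exp <- exp_preprocess(
    zma.se, min_exp = 10, variance_filter = TRUE, n = 2000
)

# Step 1: infer one edge list per algorithm (GENIE3, ARACNE, CLR).
grn_list <- grn_combined(final_exp, regulators = zma.tfs$Gene, nTrees = 10)
names(grn_list)

# Step 2: average the rank of each interaction across the algorithms.
ranked_grn <- grn_average_rank(grn_list)

# Step 3: keep the top-ranked edges that give the best scale-free
# topology fit, testing nsplit nested subsets of the ranked edge list.
filtered_grn <- grn_filter(ranked_grn, nsplit = 10)
head(filtered_grn)
```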

5 Gene regulatory network analysis

After inferring the GRN, BioNERO allows users to perform some common downstream analyses.

5.1 Hub gene identification

GRN hubs are defined as the top 10% most highly connected regulators, but this percentile is flexible in BioNERO. They can be identified with get_hubs_grn().

NOTE: Remember that GRNs are represented as directed graphs. This implies that only regulators are taken into account when identifying hubs. The goal here is to identify regulators (e.g., transcription factors) that control the expression of several genes.
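
A hedged sketch of hub identification follows. It assumes the bundled example data and that the percentile is controlled through a `top_percentile` argument of `get_hubs_grn()`. The 5% cutoff is an arbitrary illustration.

```r
library(BioNERO)
set.seed(123) # arbitrary seed, for reproducibility

# Infer a consensus GRN from the example data (see the previous sections).
data(zma.se)
data(zma.tfs)
final_exp <- exp_preprocess(
    zma.se, min_exp = 10, variance_filter = TRUE, n = 2000
)
grn <- exp2grn(exp = final_exp, regulators = zma.tfs$Gene, nTrees = 10)

# Hubs with the default cutoff (top 10% most connected regulators).
hubs <- get_hubs_grn(grn)
hubs

# A stricter, arbitrary cutoff: top 5% most connected regulators.
hubs_strict <- get_hubs_grn(grn, top_percentile = 0.05)
```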

5.2 Network visualization

GRNs can also be visualized interactively for exploratory purposes.
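
Plotting can be sketched as below, assuming `plot_grn()` with its `interactive` and `dim` arguments and the bundled example data. The widget dimensions are arbitrary.

```r
library(BioNERO)
set.seed(123) # arbitrary seed, for reproducibility

# Infer a consensus GRN from the example data (see the previous sections).
data(zma.se)
data(zma.tfs)
final_exp <- exp_preprocess(
    zma.se, min_exp = 10, variance_filter = TRUE, n = 2000
)
grn <- exp2grn(exp = final_exp, regulators = zma.tfs$Gene, nTrees = 10)

# Static plot of the network.
p_static <- plot_grn(grn)
p_static

# Interactive plot for exploration (arbitrary widget size in pixels).
p_interactive <- plot_grn(grn, interactive = TRUE, dim = c(500, 500))
p_interactive
```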

Finally, BioNERO can also be used for visualization and hub identification in protein-protein interaction (PPI) networks. The functions get_hubs_ppi() and plot_ppi() work the same way as their equivalents for GRNs (get_hubs_grn() and plot_grn()).

Session information

This vignette was created under the following conditions:

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] BioNERO_1.0.1    BiocStyle_2.20.2
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1                backports_1.2.1            
##   [3] circlize_0.4.13             Hmisc_4.5-0                
##   [5] plyr_1.8.6                  igraph_1.2.6               
##   [7] splines_4.1.0               GENIE3_1.14.0              
##   [9] BiocParallel_1.26.1         GenomeInfoDb_1.28.1        
##  [11] ggnetwork_0.5.9             ggplot2_3.3.5              
##  [13] sva_3.40.0                  digest_0.6.27              
##  [15] foreach_1.5.1               htmltools_0.5.1.1          
##  [17] magick_2.7.2                GO.db_3.13.0               
##  [19] fansi_0.5.0                 magrittr_2.0.1             
##  [21] checkmate_2.0.0             memoise_2.0.0              
##  [23] cluster_2.1.2               doParallel_1.0.16          
##  [25] limma_3.48.1                openxlsx_4.2.4             
##  [27] ComplexHeatmap_2.8.0        fastcluster_1.2.3          
##  [29] Biostrings_2.60.1           annotate_1.70.0            
##  [31] matrixStats_0.59.0          jpeg_0.1-8.1               
##  [33] colorspace_2.0-2            ggrepel_0.9.1              
##  [35] blob_1.2.1                  haven_2.4.1                
##  [37] xfun_0.24                   dplyr_1.0.7                
##  [39] crayon_1.4.1                RCurl_1.98-1.3             
##  [41] jsonlite_1.7.2              genefilter_1.74.0          
##  [43] impute_1.66.0               survival_3.2-11            
##  [45] iterators_1.0.13            glue_1.4.2                 
##  [47] gtable_0.3.0                zlibbioc_1.38.0            
##  [49] XVector_0.32.0              GetoptLong_1.0.5           
##  [51] DelayedArray_0.18.0         car_3.0-11                 
##  [53] shape_1.4.6                 BiocGenerics_0.38.0        
##  [55] abind_1.4-5                 scales_1.1.1               
##  [57] edgeR_3.34.0                DBI_1.1.1                  
##  [59] rstatix_0.7.0               Rcpp_1.0.6                 
##  [61] xtable_1.8-4                htmlTable_2.2.1            
##  [63] clue_0.3-59                 foreign_0.8-81             
##  [65] bit_4.0.4                   preprocessCore_1.54.0      
##  [67] Formula_1.2-4               stats4_4.1.0               
##  [69] htmlwidgets_1.5.3           httr_1.4.2                 
##  [71] RColorBrewer_1.1-2          ellipsis_0.3.2             
##  [73] farver_2.1.0                pkgconfig_2.0.3            
##  [75] XML_3.99-0.6                nnet_7.3-16                
##  [77] sass_0.4.0                  locfit_1.5-9.4             
##  [79] utf8_1.2.1                  dynamicTreeCut_1.63-1      
##  [81] labeling_0.4.2              reshape2_1.4.4             
##  [83] tidyselect_1.1.1            rlang_0.4.11               
##  [85] AnnotationDbi_1.54.1        cellranger_1.1.0           
##  [87] munsell_0.5.0               tools_4.1.0                
##  [89] cachem_1.0.5                generics_0.1.0             
##  [91] RSQLite_2.2.7               statnet.common_4.5.0       
##  [93] broom_0.7.8                 evaluate_0.14              
##  [95] stringr_1.4.0               fastmap_1.1.0              
##  [97] yaml_2.2.1                  RhpcBLASctl_0.20-137       
##  [99] knitr_1.33                  bit64_4.0.5                
## [101] zip_2.2.0                   purrr_0.3.4                
## [103] KEGGREST_1.32.0             nlme_3.1-152               
## [105] compiler_4.1.0              rstudioapi_0.13            
## [107] curl_4.3.2                  png_0.1-7                  
## [109] ggsignif_0.6.2              minet_3.50.0               
## [111] tibble_3.1.2                statmod_1.4.36             
## [113] geneplotter_1.70.0          bslib_0.2.5.1              
## [115] stringi_1.6.2               highr_0.9                  
## [117] forcats_0.5.1               lattice_0.20-44            
## [119] Matrix_1.3-4                vctrs_0.3.8                
## [121] networkD3_0.4               pillar_1.6.1               
## [123] lifecycle_1.0.0             BiocManager_1.30.16        
## [125] jquerylib_0.1.4             GlobalOptions_0.1.2        
## [127] cowplot_1.1.1               data.table_1.14.0          
## [129] bitops_1.0-7                GenomicRanges_1.44.0       
## [131] R6_2.5.0                    latticeExtra_0.6-29        
## [133] bookdown_0.22               network_1.17.1             
## [135] rio_0.5.27                  gridExtra_2.3              
## [137] IRanges_2.26.0              codetools_0.2-18           
## [139] assertthat_0.2.1            SummarizedExperiment_1.22.0
## [141] DESeq2_1.32.0               rjson_0.2.20               
## [143] S4Vectors_0.30.0            GenomeInfoDbData_1.2.6     
## [145] intergraph_2.0-2            mgcv_1.8-36                
## [147] hms_1.1.0                   parallel_4.1.0             
## [149] grid_4.1.0                  rpart_4.1-15               
## [151] tidyr_1.1.3                 NetRep_1.2.4               
## [153] coda_0.19-4                 rmarkdown_2.9              
## [155] carData_3.0-4               MatrixGenerics_1.4.0       
## [157] Cairo_1.5-12.2              ggpubr_0.4.0               
## [159] ggnewscale_0.4.5            Biobase_2.52.0             
## [161] WGCNA_1.70-3                base64enc_0.1-3

References

Faith, Jeremiah J., Boris Hayete, Joshua T. Thaden, Ilaria Mogno, Jamey Wierzbowski, Guillaume Cottarel, Simon Kasif, James J. Collins, and Timothy S. Gardner. 2007. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” PLoS Biology 5 (1): 0054–0066. https://doi.org/10.1371/journal.pbio.0050008.

Huynh-Thu, Vân Anh, Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts. 2010. “Inferring regulatory networks from expression data using tree-based methods.” PLoS ONE 5 (9): 1–10. https://doi.org/10.1371/journal.pone.0012776.

Jin, J., F. Tian, D. C. Yang, Y. Q. Meng, L. Kong, J. Luo, and G. Gao. 2017. “PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants.” Nucleic Acids Research 45 (D1): D1040–D1045. https://doi.org/10.1093/nar/gkw982.

Marbach, Daniel, James C. Costello, Robert Küffner, Nicole M. Vega, Robert J. Prill, Diogo M. Camacho, Kyle R. Allison, et al. 2012. “Wisdom of crowds for robust gene network inference.” Nature Methods 9 (8): 796–804. https://doi.org/10.1038/nmeth.2016.

Margolin, Adam A., Ilya Nemenman, Katia Basso, Chris Wiggins, Gustavo Stolovitzky, Riccardo Dalla Favera, and Andrea Califano. 2006. “ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context.” BMC Bioinformatics 7 (SUPPL.1): 1–15. https://doi.org/10.1186/1471-2105-7-S1-S7.