publications
2025
- Construction of Multi-Modal Transcriptome-Small Molecule Interaction Networks from High-Throughput Measurements to Study Human Complex Traits. Vaha Akbary Moghaddam, Sandeep Acharya, Michaela Schwaiger-Haber, and 8 more authors. bioRxiv, 2025
Small molecules (SMs) are integral to biological processes, influencing metabolism, homeostasis, and regulatory networks. Despite their importance, a significant knowledge gap exists regarding their downstream effects on biological pathways and gene expression, largely due to differences in scale, variability, and noise between untargeted metabolomics and sequencing-based technologies. To address these challenges, we developed a multi-omics framework comprising a machine learning-based protocol for data processing, a semi-supervised network inference approach, and network-guided analysis of complex traits. The ML protocol harmonized metabolomic, lipidomic, and transcriptomic data through batch correction, principal component analysis, and regression-based adjustments, enabling unbiased and effective integration. Building on this, we proposed a semi-supervised method to construct transcriptome-SM interaction networks (TSI-Nets) by selectively integrating SM profiles into gene-level networks using a meta-analytic approach that accounts for scale differences and missing data across omics layers. Benchmarking against three conventional unsupervised methods demonstrated the superiority of our approach in generating diverse, biologically relevant, and robust networks. While single-omics analyses identified 18 significant genes and 3 significant SMs associated with insulin sensitivity (IS), network-guided analysis revealed novel connections between these markers. The top-ranked module highlighted a cross-talk between fiber-degrading gut microbiota and immune regulatory pathways, inferred by the interaction of the protective SM, N-acetylglycine (NAG), with immune genes (FCER1A, HDC, MS4A2, and CPA3), linked to improved IS and reduced obesity and inflammation. Together, this framework offers a robust and scalable solution for multi-modal network inference and analysis, advancing SM pathway discovery and their implications for human health. 
Leveraging data from a population of thousands of individuals with extended longevity, the inferred TSI-Nets demonstrate generalizability across diverse conditions and complex traits. These networks are publicly available as a resource for the research community.
- FISHNET: A Network-based Tool for Analyzing Gene-level P-values to Identify Significant Genes Missed by Standard Methods. Sandeep Acharya, Vaha Akbary Moghaddam, Wooseok J Jung, and 4 more authors. bioRxiv, 2025
FISHNET uses prior biological knowledge, represented as gene interaction networks and gene function annotations, to identify genes that do not meet the genome-wide significance threshold but replicate nonetheless. Its input is gene-level P-values from any source, including omicsWAS, aggregation of GWAS P-values, CRISPR screens, or differential expression analysis. It is based on the idea that genes whose P-values are low due to sampling error are distributed randomly across networks and functions, so genes with suggestive P-values that cluster in densely connected subnetworks and share common functions are less likely to reflect sampling error and more likely to replicate. FISHNET combines network and function analysis with permutation-based P-value thresholds to identify a small set of exceptional genes that we call FISHNET genes.
Applied to 11 cardiovascular risk traits, FISHNET identified 19 gene-trait relationships that missed genome-wide significance thresholds but, nonetheless, replicated in an independent cohort. The replication rate of FISHNET genes matched or exceeded that of other genes with similar P-values. FISHNET identified a novel association between RUNX1 expression and HDL that is supported by experimental evidence that RUNX1 promotes white fat browning, which increases HDL cholesterol levels. FISHNET also identified an association between LTB expression and BMI that is supported by experimental evidence that higher LTB expression increases BMI via activation of the LTβR pathway. Both associations failed genome-wide significance thresholds, highlighting FISHNET’s ability to uncover meaningful relationships missed by traditional methods. FISHNET software is freely available at https://doi.org/10.5281/zenodo.14765850.
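The intuition in the abstract above, that suggestive genes clustering in a network are unlikely to arise from sampling error alone, can be illustrated with a small permutation test. This is only a sketch of the general idea and not the FISHNET algorithm: the function names, the edge-count statistic, the 0.05 cutoff, and the toy network with gene labels `g0` to `g11` are all assumptions made for illustration.

```python
import random

def edges_within(genes, edges):
    """Count edges whose two endpoints are both in `genes`."""
    members = set(genes)
    return sum(1 for a, b in edges if a in members and b in members)

def clustering_permutation_p(pvalues, edges, suggestive=0.05, n_perm=2000, seed=0):
    """Empirical p-value for how tightly the suggestive genes are connected.

    The statistic (edges among suggestive genes) and the cutoff are
    illustrative assumptions, not values taken from the FISHNET paper.
    """
    rng = random.Random(seed)
    names = sorted(pvalues)
    hits = [g for g in names if pvalues[g] <= suggestive]
    observed = edges_within(hits, edges)
    at_least = 0
    for _ in range(n_perm):
        # Null model: the same number of genes drawn at random from the network.
        sample = rng.sample(names, len(hits))
        if edges_within(sample, edges) >= observed:
            at_least += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least + 1) / (n_perm + 1)

# Hypothetical toy network: g0..g3 form a clique, the rest are isolated pairs.
toy_edges = [
    ("g0", "g1"), ("g0", "g2"), ("g0", "g3"),
    ("g1", "g2"), ("g1", "g3"), ("g2", "g3"),
    ("g4", "g5"), ("g6", "g7"), ("g8", "g9"), ("g10", "g11"),
]
# Only the clique members carry suggestive p-values in this toy example.
toy_pvalues = {f"g{i}": (0.001 if i < 4 else 0.5) for i in range(12)}

observed, p_cluster = clustering_permutation_p(toy_pvalues, toy_edges)
print(observed, p_cluster)
```

In the toy data the four suggestive genes form a clique, so random draws of four genes rarely match their connectivity and the empirical p-value is small.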
2024
- A methodology for gene level omics-WAS integration identifies genes influencing traits associated with cardiovascular risks: the Long Life Family Study. Sandeep Acharya, Shu Liao, Wooseok J Jung, and 8 more authors. Human Genetics, 2024
The Long Life Family Study (LLFS) enrolled 4953 participants in 539 pedigrees displaying exceptional longevity. To identify genetic mechanisms that affect cardiovascular risks in the LLFS population, we developed a multi-omics integration pipeline and applied it to 11 traits associated with cardiovascular risks. Using our pipeline, we aggregated gene-level statistics from rare-variant analysis, GWAS, and gene expression-trait association by Correlated Meta-Analysis (CMA). Across all traits, CMA identified 64 significant genes after Bonferroni correction (p ≤ 2.8 × 10⁻⁷), 29 of which replicated in the Framingham Heart Study (FHS) cohort. Notably, 20 of the 29 replicated genes do not have a previously known trait-associated variant in the GWAS Catalog within 50 kb. Thirteen modules in Protein–Protein Interaction (PPI) networks are significantly enriched in genes with low meta-analysis p-values for at least one trait, three of which are replicated in the FHS cohort. The functional annotation of genes in these modules showed a significant over-representation of trait-related biological processes including sterol transport, protein-lipid complex remodeling, and immune response regulation. Among major findings, our results suggest a role of triglyceride-associated and mast-cell functional genes FCER1A, MS4A2, GATA2, HDC, and HRH4 in atherosclerosis risks. Our findings also suggest that lower expression of ATG2A, a gene we found to be associated with BMI, may be both a cause and consequence of obesity. Finally, our results suggest that ENPP3 may play an intermediary role in triglyceride-induced inflammation. Our pipeline is freely available and implemented in the Nextflow workflow language, making it easily runnable on any compute platform (https://nf-co.re/omicsgenetraitassociation).
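The abstract above does not spell out how Correlated Meta-Analysis combines its inputs, so the following is only a simplified sketch of one way gene-level p-values from correlated analyses can be combined: a Stouffer-type Z-score sum whose variance is taken from a correlation matrix. It assumes one-sided p-values and a correlation matrix that is already known; the function name `combine_correlated` and the example values are hypothetical and are not taken from the pipeline.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution

def combine_correlated(pvalues, corr):
    """Combine one-sided p-values whose Z-scores have correlation matrix `corr`.

    The summed Z-scores have variance equal to the sum of all entries of
    `corr`, so the sum is rescaled by the square root of that total.
    Each p-value must lie strictly between 0 and 1.
    """
    z = [_norm.inv_cdf(1.0 - p) for p in pvalues]
    k = len(z)
    variance = sum(corr[i][j] for i in range(k) for j in range(k))
    z_combined = sum(z) / sqrt(variance)
    return 1.0 - _norm.cdf(z_combined)

# Hypothetical example: three analyses that each report p = 0.01 for one gene.
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fully_correlated = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

p_independent = combine_correlated([0.01, 0.01, 0.01], independent)
p_correlated = combine_correlated([0.01, 0.01, 0.01], fully_correlated)
print(p_independent, p_correlated)
```

When the analyses are independent, the combined evidence is stronger than any single p-value. When they are perfectly correlated, the combined p-value equals the individual one, which is why accounting for correlation guards against counting the same signal more than once.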
2022
- Predicting which genes will respond to transcription factor perturbations. Yiming Kang, Wooseok J Jung, and Michael R Brent. G3 Genes|Genomes|Genetics, Jun 2022
The ability to predict which genes will respond to the perturbation of a transcription factor serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a biological sample by using data from the same or similar samples, including data on their transcription factor binding locations, histone marks, or DNA sequence. We report on a different challenge—training machine learning models to predict which genes will respond to the perturbation of a transcription factor without using any data from the perturbed cells. We find that existing transcription factor location data (ChIP-seq) from human cells have very little detectable utility for predicting which genes will respond to perturbation of a transcription factor. Features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to perturbation of any transcription factor. This shows that some genes are poised to respond to transcription factor perturbations and others are resistant, shedding light on why it has been so difficult to predict responses from binding locations. Certain histone marks, including H3K4me1 and H3K4me3, have some predictive power when located downstream of the transcription start site. However, the predictive power of histone marks is much less than that of gene expression level and expression variation. Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct transcription factor perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from transcription factor binding location data. These molecular features are largely reflected in and summarized by the gene’s expression level and expression variation. Code is available at https://github.com/BrentLab/TFPertRespExplainer.