
DOI: 10.1055/a-2599-3728
Big Data Analytics in Large Cohorts: Opportunities and Challenges for Research in Hepatology
Authors

Abstract
Advances in big data analytics, precision medicine, and artificial intelligence are transforming hepatology, offering new insights into disease mechanisms, risk stratification, and therapeutic interventions. In this review, we explore how the integration of genetic studies, multi-omics data, and large-scale population cohorts has reshaped our understanding of liver disease, using steatotic liver disease as a prototype for data-driven discoveries in hepatology. We highlight the role of artificial intelligence in identifying patient subgroups, optimizing treatment strategies, and uncovering novel therapeutic targets. Furthermore, we discuss the importance of collaborative networks, open data initiatives, and implementation science in translating these findings into clinical practice. Although data-driven precision medicine holds great promise, its impact depends on structured approaches that ensure real-world adoption.
* Joint authorship.
Publication History
Article published online:
21 May 2025
© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)
Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA