Semin Liver Dis 2019; 39(02): 124-140
DOI: 10.1055/s-0039-1679920
Review Article
Thieme Medical Publishers, 333 Seventh Avenue, New York, NY 10001, USA.

Genetics of Nonalcoholic Fatty Liver Disease: From Pathogenesis to Therapeutics

Silvia Sookoian
1  Institute of Medical Research A Lanari, School of Medicine, University of Buenos Aires, Ciudad Autonoma de Buenos Aires (C1427ARN), Argentina
2  Department of Clinical and Molecular Hepatology, National Scientific and Technical Research Council (CONICET), University of Buenos Aires, Institute of Medical Research (IDIM), Ciudad Autónoma de Buenos Aires (C1427ARN), Argentina
Carlos J. Pirola
1  Institute of Medical Research A Lanari, School of Medicine, University of Buenos Aires, Ciudad Autonoma de Buenos Aires (C1427ARN), Argentina
3  Department of Molecular Genetics and Biology of Complex Diseases, National Scientific and Technical Research Council (CONICET), University of Buenos Aires, Institute of Medical Research (IDIM), Ciudad Autonoma de Buenos Aires (C1427ARN), Argentina
Funding: Agencia Nacional de Promoción Científica y Tecnológica, Fondo para la Investigación Científica y Tecnológica (Fon-CyT) (PICT 2014-0432 and PICT 2015-0551 to S.S. and PICT 2014-1816 and PICT 2016-0135 to C.J.P.).

Address for correspondence

Silvia Sookoian, MD, PhD
Instituto de Investigaciones Médicas (UBA-CONICET), Combatientes de Malvinas 3150
CABA- C1427ARN
Argentina   
Carlos J. Pirola, PhD
Instituto de Investigaciones Médicas (UBA-CONICET), Combatientes de Malvinas 3150
CABA- C1427ARN
Argentina   

Publication history

Publication date:
25 March 2019 (eFirst)

 

Abstract

Here, the authors review the remarkable genetic discoveries that have illuminated the biology of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH). The authors integrate genes associated with NAFLD and NASH into regulatory pathways to elucidate the disease pathogenesis. They review the evidence for molecular mediators of chronic liver damage, which suggests that convergent pathophenotypes, including inflammation and fibrosis, share common genetic modifiers. They further demonstrate that genes involved in the genetic susceptibility of NAFLD and NASH participate in cross-phenotype associations with diseases of the metabolic syndrome, including type 2 diabetes, obesity, and cardiovascular disease. However, immune-related loci associated with NAFLD and NASH exhibit some level of pleiotropy influencing disparate phenotypes, such as premature birth or sepsis. They finally focus on the translation of current genetic knowledge of NAFLD and NASH toward precision medicine. They provide evidence of genetic findings that can be leveraged to identify therapeutic targets.



Nonalcoholic fatty liver disease (NAFLD) is a condition manifested by an abnormal accumulation of fat in the liver, which can present signs of hepatocyte injury and chronic damage, such as those that characterize nonalcoholic steatohepatitis (NASH).[1] The disease can progress into severe clinical forms, including NASH-fibrosis, cirrhosis, and even hepatocellular carcinoma (HCC).[1]

Epidemiological observations from population-based studies,[2] [3] familial aggregation studies,[4] [5] and twin studies[6] [7] have long provided evidence that NAFLD is, to some extent, a heritable trait. NAFLD clusters in families. Schwimmer et al found that fatty liver is significantly more common in siblings (59%) and parents (78%) of children with NAFLD.[4] Heritability estimates of NAFLD range from 20 to 70%, depending on the study design and on the diagnostic approaches used to determine the liver phenotype.[2] [3] [4] [5] [6]
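Twin-based heritability estimates like those cited above rest on comparing trait similarity within monozygotic (MZ) and dizygotic (DZ) twin pairs. As a minimal sketch, the classical Falconer approximation is shown below. This is an illustrative assumption: the cited twin studies may have fitted formal variance-component (ACE) models instead. The correlations are invented and do not come from any study.

```python
def falconer_components(r_mz: float, r_dz: float) -> dict:
    """Falconer-style partition of trait variance from twin correlations.

    r_mz: trait correlation within monozygotic pairs (share ~100% of segregating alleles)
    r_dz: trait correlation within dizygotic pairs (share ~50% on average)
    Returns additive genetic (a2), shared-environment (c2), and unique-environment (e2) fractions.
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    a2 = 2.0 * (r_mz - r_dz)  # heritability estimate
    c2 = 2.0 * r_dz - r_mz    # shared (family) environment
    e2 = 1.0 - r_mz           # unique environment plus measurement error
    return {"a2": a2, "c2": c2, "e2": e2}


# Hypothetical twin correlations for a liver-fat phenotype (not from the cited studies)
parts = falconer_components(r_mz=0.60, r_dz=0.30)
print({k: round(v, 2) for k, v in parts.items()})  # {'a2': 0.6, 'c2': 0.0, 'e2': 0.4}
```

By construction, the three fractions always sum to 1. The formula can still return values outside [0, 1] when its assumptions (equal environments for MZ and DZ pairs, purely additive effects) are violated. This sensitivity may partly explain why different designs yield different estimates.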

Variants in over 100 loci have been explored in candidate-gene association studies (see [Table 1]). These studies have generated plausible evidence that several loci are involved in the genetic susceptibility to NAFLD. Examples include genes encoding nuclear receptors, transcription factors that regulate lipid- and carbohydrate-related biosynthetic processes, and proteins involved in the inflammatory response and fibrogenesis.[8] [9] [10] Nevertheless, most candidate-gene studies of NAFLD and NASH have failed to convincingly demonstrate a robust causal relationship between the associated variant and the disease. This could be explained by the limited number of functional mechanistic studies designed to test the hypotheses driving the investigations, or simply by a lack of statistical power.

Table 1

Training gene list based on published evidence of the genetic component of NAFLD and NASH

Gene symbol (gene description)

RPL13AP7 (ribosomal protein L13a pseudogene 7)

ABCB11 (ATP binding cassette subfamily B member 11)

ACSL4 (acyl-CoA synthetase long chain family member 4)

ACTR5 (ARP5 actin related protein 5 homolog)

ADIPOQ (adiponectin, C1Q and collagen domain containing)

ADIPOR1 (adiponectin receptor 1)

ADIPOR2 (adiponectin receptor 2)

ADRB2 (adrenoceptor β 2)

ADRB3 (adrenoceptor β 3)

AGTR1 (angiotensin II receptor type 1)

APOC3 (apolipoprotein C3)

APOE (apolipoprotein E)

ARHGEF40 (Rho guanine nucleotide exchange factor 40)

C1orf94 (chromosome 1 open reading frame 94)

CACNA2D1 (calcium voltage-gated channel auxiliary subunit α2delta 1)

CD14 (CD14 molecule)

CDH2 (cadherin 2)

CFTR (cystic fibrosis transmembrane conductance regulator)

CLOCK (clock circadian regulator)

CNTN5 (contactin 5)

COL13A1 (collagen type XIII α 1 chain)

CRACR2A (calcium release activated channel regulator 2A)

CYP2E1 (cytochrome P450 family 2 subfamily E member 1)

DCLK1 (doublecortin like kinase 1)

DGAT1 (diacylglycerol O-acyltransferase 1)

DGAT2 (diacylglycerol O-acyltransferase 2)

DYSF (dysferlin)

EHBP1L1 (EH domain binding protein 1 like 1)

ENPP1 (ectonucleotide pyrophosphatase/phosphodiesterase 1)

ETS1 (ETS proto-oncogene 1, transcription factor)

FABP2 (fatty acid binding protein 2)

FARP1 (FERM, ARH/RhoGEF and pleckstrin domain protein 1)

FDFT1 (farnesyl-diphosphate farnesyltransferase 1)

GATAD2A (GATA zinc finger domain containing 2A)

GC (GC, vitamin D binding protein)

GCKR (glucokinase regulator)

GCLC (glutamate-cysteine ligase catalytic subunit)

HFE (homeostatic iron regulator)

HS3ST1 (heparan sulfate-glucosamine 3-sulfotransferase 1)

HSD17B13 (hydroxysteroid 17-β dehydrogenase 13)

IL18RAP (interleukin 18 receptor accessory protein)

IL1B (interleukin 1 β)

IL6 (interleukin 6)

IRS1 (insulin receptor substrate 1)

KHDRBS3 (KH RNA binding domain containing, signal transduction associated 3)

KLF6 (Kruppel-like factor 6)

LCP1 (lymphocyte cytosolic protein 1)

LEPR (leptin receptor)

LINC00322 (long intergenic nonprotein coding RNA 322)

LIPC (lipase C, hepatic type)

PRG1 (p53-responsive gene 1)

LTBP3 (latent transforming growth factor β binding protein 3)

LYPLAL1 (lysophospholipase like 1)

MACROD2 (MACRO domain containing 2)

MBOAT7 (membrane bound O-acyltransferase domain containing 7)

MC4R (melanocortin 4 receptor)

MIF (macrophage migration inhibitory factor)

MTCYBP22 (mitochondrially encoded cytochrome b pseudogene 22)

MTHFR (methylenetetrahydrofolate reductase)

MTTP (microsomal triglyceride transfer protein)

MUM1 (melanoma associated antigen (mutated) 1)

NCAN (neurocan)

NFIC (nuclear factor I C)

NGF (nerve growth factor)

NR1I2 (nuclear receptor subfamily 1 group I member 2)

OTX2P1 (orthodenticle homeobox 2 pseudogene 1)

PALLD (palladin, cytoskeletal associated protein)

PARVB (parvin β)

PBX2P1 (PBX homeobox 2 pseudogene 1)

PDGFA (platelet derived growth factor subunit A)

PEMT (phosphatidylethanolamine N-methyltransferase)

PNPLA3 (patatin like phospholipase domain containing 3)

PPARA (peroxisome proliferator-activated receptor α)

PPARG (peroxisome proliferator-activated receptor γ)

PPARGC1A (PPARG coactivator 1 α, PGC-1α)

PPP1R3B (protein phosphatase 1 regulatory subunit 3B)

PTGS2 (prostaglandin-endoperoxide synthase 2)

PTPRU (protein tyrosine phosphatase, receptor type U)

PZP (PZP, α-2-macroglobulin like)

RAB37 (RAB37, member RAS oncogene family)

SAMM50 (SAMM50 sorting and assembly machinery component)

SDK1 (sidekick cell adhesion molecule 1)

SEL1L3 (SEL1L family member 3)

SERPINA1 (serpin family A member 1)

SLC38A8 (solute carrier family 38 member 8)

SLC46A3 (solute carrier family 46 member 3)

SLC9A9 (solute carrier family 9 member A9)

SOD2 (superoxide dismutase 2)

SPINK1 (serine peptidase inhibitor, Kazal type 1)

ST8SIA1 (ST8 α-N-acetyl-neuraminide α-2,8-sialyltransferase 1)

STAT3 (signal transducer and activator of transcription 3)

TCF7L2 (transcription factor 7 like 2)

TEX36 (testis expressed 36)

TLR4 (toll like receptor 4)

TM6SF2 (transmembrane 6 superfamily member 2)

TMEM56 (transmembrane protein 56)

TNF (tumor necrosis factor)

TNFSF10 (TNF superfamily member 10)

TRAPPC9 (trafficking protein particle complex 9)

UCP1 (uncoupling protein 1)

UGT1A1 (UDP glucuronosyltransferase family 1 member A1)

YIPF1 (Yip1 domain family member 1)

ZNF512 (zinc finger protein 512)

ZP4 (zona pellucida glycoprotein 4)

Abbreviations: NAFLD, nonalcoholic fatty liver disease; NASH, nonalcoholic steatohepatitis.


Conversely, variants in three genes that regulate metabolic traits were discovered by genome-wide approaches,[2] [3] [11] [12] including genome-wide association studies (GWAS) and exome-wide association studies (EWAS): PNPLA3-rs738409, TM6SF2-rs58542926, and glucokinase regulator (GCKR)-rs780094 or GCKR-rs1260326. Several authors have demonstrated the association of these variants with the risk of NAFLD in cohorts of diverse ethnic backgrounds around the world.[13] [14] [15] In addition, variants in these genes have been associated with the risk of NASH and with histological features of disease severity, including liver fibrosis.[13] [16] [17] [18]

Variants in additional loci have been associated with a modest risk of NAFLD and NASH in an Italian population.[19] One example is a missense variant (p.Gly17Glu, rs641738 C/T) in exon 1 of transmembrane channel-like 4 (TMC4), located downstream of membrane-bound O-acyltransferase domain-containing 7 (MBOAT7). However, this association could not be replicated in populations of other ethnicities.[20] [21] [22]

More recently, a large collaborative study analyzed exome-sequence data linked to the electronic health records of 46,455 patients. It revealed a loss-of-function variant in the hydroxysteroid 17-β dehydrogenase 13 (HSD17B13) gene that confers protection against chronic liver injury and mitigates progression to NASH among European Americans.[23]

From the histopathologic point of view, NAFLD refers to potentially progressive lesions ranging from isolated steatosis (NAFL) to NASH,[1] as explained above. It is therefore reasonable to hypothesize that NAFL, NASH, and NASH-fibrosis share genetic modifiers. In fact, variants in loci influencing the risk of NAFLD, including PNPLA3, TM6SF2, and GCKR, contribute to the risk of NASH as well.

For example, rs738409 has a significant effect on liver fat accumulation: GG homozygotes show 73% higher liver fat content than CC homozygotes. It also affects susceptibility to more aggressive disease. Compared with CC homozygotes, GG homozygotes have a 3.24-fold greater risk of higher inflammatory scores and a 3.2-fold greater risk of developing fibrosis.[13]

The single nucleotide polymorphisms (SNPs) currently known to be involved in the genetic risk of NAFLD cannot distinguish between isolated steatosis and NASH. However, some of these variants can inform the likelihood of more advanced disease. For example, NASH is observed 3.5-fold more frequently in GG homozygotes than in CC homozygotes.[13]
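Genotype comparisons such as GG versus CC are commonly expressed as odds ratios computed from a 2 × 2 table of genotype by outcome. The sketch below assumes that form, since the text does not state which effect measure underlies each fold-risk figure. It computes an odds ratio with a Woolf (log-based) confidence interval from invented counts. The cited figures come from pooled published data, not from a single table like this.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2 x 2 table.

    a: GG homozygotes with the outcome (e.g., NASH)   b: GG homozygotes without it
    c: CC homozygotes with the outcome                d: CC homozygotes without it
    z: normal quantile for the interval (1.96 gives ~95% coverage)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; apply a continuity correction for zeros")
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Invented counts, used only to show the arithmetic (not data from reference [13])
or_value, lower, upper = odds_ratio(a=24, b=16, c=30, d=60)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The interval is built on the log scale because the sampling distribution of log(OR) is closer to normal than that of the OR itself.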

This raises the question of why most of the known genetic variants do not allow us to differentiate NAFL from progressive NASH. Several explanations can be suggested.

At one end is the assumption that there is no NASH without NAFL. On this view, variants influencing the risk of NAFL directly or indirectly affect the predisposition to NASH. At the other end is a more pragmatic view that questions the design of genetic studies, including imprecise phenotyping and the use of controls of uncertain comparability.

Certainly, if NAFL and NASH are stages of a single disease, the evidence suggests that some genetic or environmental factors affect the progression from NAFL to NASH. These factors are likely related to the inflammatory response and the fibrogenic process.

The Missing Heritability of NAFLD and NASH

According to the available evidence, the variants uncovered by GWAS or EWAS[2] [3] [11] [12] [23] explain only a small portion of the disease variance. In fact, variants in the loci mentioned above explain up to approximately 10% of the variance in NAFLD-related phenotypes. Much of the phenotypic variance of NAFLD and NASH, which stems from the interplay between genetic and environmental sources, therefore remains unexplained. This gap is known as missing heritability.
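The proportion of phenotypic variance attributed to a single variant depends on both its allele frequency and its effect size. A minimal sketch follows, using the standard expression 2p(1 − p)β² / Var(Y). It assumes a quantitative trait (for example, standardized liver fat content), an additive per-allele effect and Hardy-Weinberg equilibrium. The function name and all input values are hypothetical. For a binary outcome such as NAFLD status, variance explained would instead be expressed on a liability scale, which this sketch does not cover.

```python
def snp_variance_explained(p: float, beta: float, trait_var: float) -> float:
    """Fraction of a quantitative trait's variance explained by one biallelic SNP.

    Assumes an additive per-allele effect and Hardy-Weinberg equilibrium, so the
    variance of the effect-allele count is 2p(1 - p).
    p: effect-allele frequency; beta: per-allele effect in trait units;
    trait_var: total phenotypic variance of the trait.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    if trait_var <= 0.0:
        raise ValueError("trait variance must be positive")
    return 2.0 * p * (1.0 - p) * beta ** 2 / trait_var


# Hypothetical locus acting on a standardized trait (variance 1.0)
r2 = snp_variance_explained(p=0.25, beta=0.30, trait_var=1.0)
print(f"single locus: {r2:.2%}")

# Hypothetical (allele frequency, per-allele effect) pairs for three unlinked loci;
# summing is only valid without linkage disequilibrium or interaction between loci
loci = [(0.25, 0.30), (0.07, 0.25), (0.40, 0.10)]
total = sum(snp_variance_explained(p, b, trait_var=1.0) for p, b in loci)
print(f"three loci combined: {total:.2%}")
```

In this toy example, several common variants of moderate effect add up to only a few percent of the variance. This shows how a gap between heritability and variance explained by known loci can arise.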

As in many other common diseases, the missing heritability of NAFLD and NASH involves a complex spectrum of factors that remain poorly explored; some of them are illustrated in [Fig. 1]. Mapping the genetic component of NAFLD and NASH should include the search for rare variants, which probably have substantial effects on the phenotype. It should also include the exploration of structural variation, for example, copy number variants. Given the role of mitochondria in the pathophysiology of the disease, it is also worthwhile to characterize the genetic diversity of mitochondrial deoxyribonucleic acid (mtDNA).

Fig. 1 Missing heritability of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH). There is considerable disparity between the magnitude of heritability estimates of NAFLD and NASH and the proportion of variance explained by single nucleotide polymorphisms (SNPs). These SNPs were uncovered by genome-wide association studies (GWAS), exome-wide association studies (EWAS), and candidate-gene association studies. A significant proportion of the disease burden could be explained by the missing heritability. This covers genetic and epigenetic modifiers, and also their interaction with environmental exposures and with a highly interconnected and dynamic network of factors, including the microbiome.[115] [116] The genetic component of NAFLD and NASH may potentially be explained by several sources: undiscovered rare variants, structural variation (including copy number variation), variants in micro-ribonucleic acids (miRNAs) and long noncoding RNAs (lncRNAs), and expression quantitative trait loci (eQTLs). The allelic architecture of the human genome, which varies substantially across ethnic groups, plays an important role. Variation across populations might explain differences in the prevalence and severity of the disease among ethnic groups. Epigenetic factors include deoxyribonucleic acid (DNA) and histone methylation, and also chromatin remodeling and nonprotein coding RNAs. Epigenetic inheritance also involves modifications of the histone code, including those catalyzed by histone acetyltransferases (HAT) and deacetylases (HDAC). Abbreviations: circRNA, circular RNA; linRNAs, long intergenic RNA; NAT, natural antisense transcript; piRNA, PIWI-interacting RNA; snRNA, small nuclear RNA; snoRNA, small nucleolar RNA.

Recent evidence on the variability of the mtDNA genome suggests a significant role of mitochondrial genetics in the pathogenesis of NAFLD and the natural history of the disease.[24] One study comprehensively explored the complete liver mtDNA-mutation spectrum in patients with NAFLD at different stages of the disease. It showed that NAFLD is associated with an increased liver mtDNA mutational burden, including point mutations in genes encoding oxidative phosphorylation (OXPHOS) subunits.[24] In addition, patients with advanced fibrosis had a 1.4-fold higher overall mutation rate than those in whom fibrosis was mild or absent.[24] The accumulation of liver mtDNA-polymorphic sites in OXPHOS subunits was paralleled by the emergence of an OXPHOS-deficient phenotype.[24] Specifically, profiling of liver OXPHOS gene and protein expression provided evidence that the liver mtDNA mutational rate affects mitochondrial function.[24] Several other studies support these observations, highlighting the importance of mitochondrial homeostasis in the pathogenesis of NAFLD and in disease progression to NASH and NASH-fibrosis.[25] [26] [27]

Variants in micro-ribonucleic acids (miRNAs) that affect miRNA function are expected to account for a sizeable proportion of the disease risk and/or of the association of NAFLD with comorbidities. For instance, rs41318021 was associated with arterial hypertension in patients with NAFLD.[28] This variant lies in a miR-122-related sequence in the 3′ untranslated region of the L-arginine transporter gene (SLC7A1).

Across the disease spectrum, variants in long noncoding RNAs (lncRNAs) have been shown to account for a portion of the genetic component as well.[29] A survey of genetic variation in lncRNA genomic regions uncovered rs2829145 A/G, located in a lncRNA (lnc-JAM2–6), which was associated with NAFLD and disease severity. Moreover, prediction of regulatory elements in lnc-JAM2–6 indicated potential sequence-specific binding motifs for the oncogenes MAF bZIP transcription factor K (MAFK) and JunD proto-oncogene, AP-1 transcription factor subunit (JUND). It also indicated motifs for transcription factors involved in the inflammatory response.[29]

Results from a pilot GWAS on NAFLD showed that intergenic or intron variants with predicted functionality in lncRNAs might be associated with steatosis, lobular inflammation, and liver fibrosis.[30] SNPs in LYPLAL1 (rs12137855), PPP1R3B (rs4240624), and TRIB1 (rs2954021) were associated with liver fat content.[3] In each of these loci, the predicted functional element is a long intergenic noncoding RNA (lincRNA).

The concept of heritability derives from a mathematical partition of phenotypic variance into three sources: genetics (G), environment (E), and individual variation (noise). Heritability is thus a single number representing the fraction of variation between individuals in a population that is due to their genetic background. Unfortunately, the G × E interaction in the biology of NAFLD and NASH remains largely unexplored ([Fig. 1]).

There is, however, remarkable evidence of an interaction between adiposity and variants in genes predisposing an individual to NAFLD, specifically PNPLA3, TM6SF2, and GCKR.[31] The greatest effects were observed for the interaction between the rs738409-G risk allele and obesity. This interaction was found to affect liver enzyme levels and the entire spectrum of NAFLD, from steatosis to steatohepatitis to end-stage liver disease.[31] For example, consider persons homozygous for the G allele compared with CC homozygotes. Their risk of progression to cirrhosis varies from 2.4-fold among lean subjects (body mass index [BMI] < 25 kg/m²) to 5.8-fold among obese subjects (BMI > 35 kg/m²).[31]

Interactions between obesity and variants of TM6SF2 (rs58542926) and GCKR (rs1260326) were also observed. However, their effects were much smaller and largely confined to the amount of liver fat deposition.[31] As expected, these results suggest that excess caloric intake has a greater impact in individuals at genetic risk.
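The variance partition described above can be made concrete with a short Python sketch, which also shows how stratum-specific estimates hint at a gene-environment interaction. The variance components are hypothetical. The two fold-risk values are those quoted in the text [31], treated here as stratum-specific odds ratios for illustration only. Their ratio is therefore a crude descriptive measure, not a formal interaction test.

```python
def broad_sense_heritability(var_g: float, var_e: float,
                             var_gxe: float = 0.0, var_noise: float = 0.0) -> float:
    """H^2 = Var(G) / Var(P), where Var(P) = Var(G) + Var(E) + Var(GxE) + Var(noise).

    Here the GxE term only enlarges the denominator; whether it is counted as
    'genetic' is a modeling choice, so H^2 estimates shift with that choice.
    """
    var_p = var_g + var_e + var_gxe + var_noise
    if var_p <= 0.0:
        raise ValueError("total phenotypic variance must be positive")
    return var_g / var_p


def stratum_ratio(or_exposed: float, or_unexposed: float) -> float:
    """Ratio of stratum-specific odds ratios; values above 1 hint at a
    multiplicative gene-environment interaction (no significance test here)."""
    return or_exposed / or_unexposed


# Hypothetical variance components in arbitrary units
print(broad_sense_heritability(var_g=3.0, var_e=5.0, var_gxe=1.0, var_noise=1.0))  # 0.3

# GG vs CC fold risks of cirrhosis quoted in the text: 5.8 (BMI > 35) vs 2.4 (BMI < 25)
print(round(stratum_ratio(5.8, 2.4), 2))  # 2.42
```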

It remains uncertain whether the gene–environment interactions mentioned above are limited to populations of European ancestry. A note of caution is therefore warranted, as the allelic architecture of the human genome varies substantially across ethnic groups. Population-genetic data on variants in loci of interest are available in the International Genome Sample Resource (http://www.internationalgenome.org/). This resource includes data generated by the 1000 Genomes Project (African, Admixed American, East Asian, European, and South Asian populations). An important consideration is that the large majority of genetic studies of NAFLD and NASH remain limited to Caucasian populations. Non-Europeans are clearly underrepresented in the large studies currently available. Further studies in these populations might yield intriguing results, in particular regarding extreme genotypes or rapidly progressive forms of the disease.[32] [33]

Although the topic is beyond the scope of this review, a noticeable portion of the missing heritability of NAFLD may be explained by epigenetic factors.[25] [34] [35] [36] [37] [38] These factors may change gene expression by modifying the accessibility of chromatin to the transcription machinery. Among the most prominent epigenetic factors are DNA methylation and covalent histone modifications. We found that DNA methylation at loci of both the nuclear and the mitochondrial genome is associated with NAFLD pathophenotypes.[25] [37] [39]

Most importantly, knowledge is lacking on a broad range of insufficiently studied interactions that could explain the variance of the disease ([Fig. 1]). These include G × E, gene–gene (G × G), and genotype–phenotype (G × P) interactions, among others. The NAFLD–NASH heritability gap might plausibly be explained by the intricate relationships among three elements: genetic variation in the nuclear and mitochondrial genomes, the phenotype, and the still unexplored interactions with epigenetic and environmental factors, including the microbiome. Future exploration of this interaction network could help unravel the missing heritability of NAFLD.



Genetic Knowledge of NAFLD: Integrated Pathways of Disease Pathogenesis

Genes associated with a given disease often provide clues to its pathogenesis and to the mechanisms of tissue damage. For example, variants in PNPLA3 [40] [41] [42] [43] and TM6SF2 [12] [44] [45] have been functionally profiled to confirm a putative causal effect on the variability of liver fat content. Taken together, these studies yield insights into the role of three features in the pathogenesis of NAFLD: liver fat composition, lipid droplet biology and dynamics, and patterns of liver fat mobilization.

Despite the wealth of knowledge from early hypothesis-driven genetic studies and genome-wide investigations, the precise mechanisms that explain the variability of the NAFLD phenotype are not fully understood. We also lack an understanding of the precise processes that govern the disease progression, as well as the molecular mechanisms associated with the degree of disease severity.

To offer a framework for overcoming these limitations, we performed an integrative analysis using a tool that expands the annotation details of genes and proteins. Specifically, we integrated the genes/loci discovered via candidate-gene association studies or genome-wide investigations into the Protein ANalysis THrough Evolutionary Relationships (PANTHER) database (http://pantherdb.org). PANTHER contains comprehensive information on the evolution and function of protein-coding genes from Homo sapiens and a wide range of other completely sequenced genomes. The training set of genes is shown in [Table 1]. This list combines the results of a search of the GWAS Catalog (https://www.ebi.ac.uk/gwas/) using the “Nonalcoholic fatty liver disease” search string with genes that have been associated with the genetic risk of NAFLD and NASH.[8] [10] [46] [47] [48]

We used the Gene Ontology (GO) data set to infer and integrate information on the molecular function of all genes listed in [Table 1]. Four GO molecular functions ranked highest, each showing > 100-fold enrichment:

- adipokinetic hormone receptor activity (GO:0097003)
- adiponectin binding (GO:0055100)
- retinol O-fatty-acyltransferase activity (GO:0050252)
- β-adrenergic receptor activity (GO:0004939)

See [Table 2] for the complete list, including p-values and fold enrichment. Overrepresentation and enrichment tests based on Reactome pathways highlighted the following as significantly enriched:

- signaling to signal transducer and activator of transcription 3 (STAT3) (R-HSA-198745) (> 100-fold enrichment)
- acyl chain remodeling of diacylglycerol (DAG) and triacylglycerol (TAG) (R-HSA-1482883)
- adenosine monophosphate-activated protein kinase (AMPK) inhibition of chREBP transcriptional activation activity (R-HSA-163680)
- ligand-dependent caspase activation (R-HSA-140534)
- chylomicron-mediated lipid transport (R-HSA-174800)

See [Table 2] for the complete list.
We may therefore infer that the pathogenesis of NAFLD and NASH is mediated by processes associated with TAG and DAG remodeling. It also appears to involve the hepatocyte response to interleukins, cytokines, and adipokines, immune-mediated cell-death pathways, and acute-phase protein genes.
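PANTHER's overrepresentation test compares how often an annotation occurs in the analyzed gene list with how often it would be expected from the reference list. It evaluates the difference with Fisher's exact test. The sketch below reproduces that logic with the Python standard library. It computes fold enrichment as observed over expected, and a one-sided hypergeometric (upper-tail Fisher) p-value. Whether PANTHER applied a one- or two-sided test in this analysis is not stated here. The gene counts are hypothetical, not those underlying Table 2.

```python
from math import comb


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed / expected number of annotated genes in the analyzed list.

    k: list genes carrying the annotation    n: list genes mapped in the reference
    K: reference genes carrying it           N: all reference genes
    """
    expected = n * K / N
    return k / expected


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation: P(X >= k),
    where X is hypergeometric (n draws from N genes, K of them annotated)."""
    upper_tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper_tail / comb(N, n)  # exact integer arithmetic, one final division


# Hypothetical counts: 100 list genes, 20,000 reference genes, term annotated to 10
fe = fold_enrichment(k=3, n=100, K=10, N=20_000)
p = overrepresentation_p(k=3, n=100, K=10, N=20_000)
print(f"fold enrichment = {fe:.1f}, one-sided p = {p:.2e}")
```

With 100 list genes drawn from 20,000 reference genes, 0.05 annotated genes are expected by chance, so observing 3 gives a 60-fold enrichment. For the same reason, terms annotated to very few genes can reach > 100-fold enrichment with only a handful of hits.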

Table 2

Integrated pathways of disease pathogenesis: Gene Ontology (GO) molecular function and Reactome prediction of NAFLD-predisposing genes

Annotation data set | Fold enrichment | Raw p-value | FDR

GO molecular function (GO annotation number)
Adipokinetic hormone receptor activity (GO:0097003) | > 100 | 1.51E-04 | 3.91E-02
Adiponectin binding (GO:0055100) | > 100 | 2.51E-04 | 5.31E-02
Retinol O-fatty-acyltransferase activity (GO:0050252) | > 100 | 2.51E-04 | 5.08E-02
Beta-adrenergic receptor activity (GO:0004939) | > 100 | 2.51E-04 | 4.87E-02
Long-chain fatty acid binding (GO:0036041) | 45.38 | 6.72E-05 | 2.61E-02
Acylglycerol O-acyltransferase activity (GO:0016411) | 27.12 | 2.26E-05 | 1.75E-02
Fatty acid binding (GO:0005504) | 22.47 | 4.44E-05 | 2.30E-02
RNA polymerase II repressing transcription factor binding (GO:0001103) | 21.26 | 5.43E-05 | 2.53E-02
O-acyltransferase activity (GO:0008374) | 15.73 | 1.61E-04 | 3.95E-02
Nuclear receptor activity (GO:0004879) | 15.73 | 1.61E-04 | 3.75E-02
Transcription factor activity, direct ligand regulated sequence-specific DNA binding (GO:0098531) | 15.73 | 1.61E-04 | 3.57E-02
Monocarboxylic acid binding (GO:0033293) | 14.9 | 3.01E-05 | 1.75E-02
Carboxylic acid binding (GO:0031406) | 7.25 | 6.50E-05 | 2.75E-02
Organic acid binding (GO:0043177) | 6.85 | 9.14E-05 | 2.66E-02
Cytokine receptor binding (GO:0005126) | 5.6 | 1.12E-04 | 3.06E-02
Lipid binding (GO:0008289) | 3.92 | 1.52E-05 | 1.76E-02
Identical protein binding (GO:0042802) | 2.99 | 2.03E-07 | 9.42E-04
Protein dimerization activity (GO:0046983) | 2.91 | 2.70E-05 | 1.79E-02
Signaling receptor binding (GO:0005102) | 2.7 | 1.10E-05 | 1.70E-02
Molecular function regulator (GO:0098772) | 2.51 | 2.22E-05 | 2.07E-02
Enzyme binding (GO:0019899) | 2.45 | 6.17E-06 | 1.44E-02
Protein binding (GO:0005515) | 1.33 | 8.31E-05 | 2.58E-02
Molecular function (GO:0003674) | 1.15 | 6.75E-05 | 2.42E-02

Reactome pathway (identifier number)
Signaling to STAT3 (R-HSA-198745) | > 100 | 2.51E-04 | 4.54E-02
Acyl chain remodeling of DAG and TAG (R-HSA-1482883) | 84.28 | 1.47E-05 | 7.33E-03
AMPK inhibits chREBP transcriptional activation activity (R-HSA-163680) | 73.75 | 2.02E-05 | 6.69E-03
Ligand-dependent caspase activation (R-HSA-140534) | 42.14 | 8.13E-05 | 1.80E-02
Chylomicron-mediated lipid transport (R-HSA-174800) | 25.65 | 3.01E-04 | 4.28E-02
Caspase activation via extrinsic apoptotic signaling pathway (R-HSA-5357769) | 24.58 | 3.37E-04 | 4.19E-02
Transcriptional regulation of white adipocyte differentiation (R-HSA-381340) | 12.61 | 6.40E-05 | 1.59E-02
Lipid digestion, mobilization, and transport (R-HSA-73923) | 11.8 | 1.65E-05 | 6.56E-03
Glycerophospholipid biosynthesis (R-HSA-1483206) | 9.02 | 2.86E-04 | 4.74E-02
PPARA activates gene expression (R-HSA-1989781) | 8.94 | 2.98E-04 | 4.56E-02
Regulation of lipid metabolism by Peroxisome proliferator-activated receptor α (PPARalpha) (R-HSA-400206) | 8.78 | 3.23E-04 | 4.28E-02
Metabolism of vitamins and cofactors (R-HSA-196854) | 8.08 | 1.24E-04 | 2.47E-02
Fatty acid, triacylglycerol, and ketone body metabolism (R-HSA-535734) | 6.99 | 2.46E-05 | 6.99E-03
Metabolism of lipids and lipoproteins (R-HSA-556833) | 5.25 | 4.26E-09 | 4.24E-06
Metabolism (R-HSA-1430728) | 3.2 | 2.10E-09 | 4.17E-06

Abbreviations: DAG, diacylglycerol; FDR, false discovery rate; NAFLD, nonalcoholic fatty liver disease; NASH, nonalcoholic steatohepatitis; TAG, triacylglycerol.


Note: Enrichment analysis was performed by the PANTHER software available at http://pantherdb.org/;[120] analysis type: PANTHER Overrepresentation Test (Released December 5, 2017). Annotation version: PANTHER version 13.1 and Reactome version 58.


Statistical analysis: Fisher's exact test with false discovery rate (FDR) multiple test correction. Analyzed list: training set was the list of genes associated with NAFLD or NASH in candidate gene association studies or genome-wide approaches (see [Table 1]).


Reference list: Homo sapiens (all genes in the database).
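The FDR column in Table 2 reflects a multiple-testing correction of the raw p-values. A minimal sketch follows, assuming the Benjamini-Hochberg step-up procedure, since the note above names only "false discovery rate (FDR) multiple test correction". It computes adjusted p-values in input order; the example p-values are arbitrary.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank, then a running minimum taken from the
    largest p-value downward keeps the adjusted values monotone and <= 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Arbitrary raw p-values for four tested terms
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(v, 4) for v in q])  # [0.04, 0.0533, 0.0533, 0.2]
```

The adjustment depends on the total number of tests m, which the table does not list. The FDR values in Table 2 therefore cannot be recomputed from the displayed rows alone.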




Shared Molecular Regulatory Pathways of Chronic Liver Damage

Chronic liver diseases, particularly NAFLD and alcoholic liver disease (ALD), share pathogenic pathways and mechanisms.[49] [50] [51] [52] There are also consistent similarities with the pathogenesis of complex cholestatic disorders, including primary biliary cholangitis (formerly known as primary biliary cirrhosis) and primary sclerosing cholangitis.[53] [54] [55] Furthermore, chronic liver damage is associated with conserved pathogenic mechanisms. These include, in particular, hepatocyte cell death pathways, inflammatory processes that involve the immune response, and fibrogenesis.[52] [56]

It is therefore biologically plausible to presume that the genetic predisposition to convergent pathophenotypes, specifically liver inflammation and fibrosis, is shared, as discussed below.

An interesting example is the PNPLA3-rs738409 (I148M) variant, which was initially discovered in a GWAS of NAFLD. Subsequently, evidence emerged of its involvement in susceptibility to cirrhosis and end-stage liver disease in patients with ALD. This includes the development of alcohol-related cirrhosis[57] [58] [59] and HCC.[60] [61] In addition, patients homozygous for the risk G allele of rs738409 seem to be more susceptible to developing severe alcoholic hepatitis and to have poorer survival.[62] [63] Summarized evidence also suggests that the rs738409 variant is involved in the risk and severity of chronic hepatitis C.[64]

These remarkable observations suggest that rs738409 might have a causal role in inflammation, fibrosis, and hepatocarcinogenesis. It could act through a cis or trans eQTL effect on PNPLA3 expression and/or by encoding protein isoforms with diverse functions. In vitro studies showed that PNPLA3 is required for hepatic stellate cell (HSC) activation and that the rs738409 G variant potentiates the profibrogenic features of HSCs.[65] Previous studies on PNPLA3 protein regulation indicate that the adiponutrin protein family exhibits phospholipase but not retinyl esterase activity.[66] [67] Nevertheless, some evidence suggests that the rs738409 variant may be involved in retinol release.[68]

On the other hand, recent in vitro studies showed that overexpression of the PNPLA3-Met148 variant is associated with a 1.75-fold increase in lactic acid. This suggests a shift of the cellular response toward anaerobic metabolism and mitochondrial dysfunction.[69] This particular metabolic profile has also been observed in patients with NASH.[70] Furthermore, PNPLA3 silencing has been associated with global metabolic perturbations that resemble a catabolic response associated with protein breakdown.[69] These metabolic changes may support the involvement of PNPLA3 in broader metabolic functions in the liver.

More recently, the splice variant rs72613567 in the HSD17B13 gene was found to protect patients with chronic liver disease, including NAFLD and ALD, from severe and progressive damage, regardless of the etiology.[23] These findings were replicated in two recent studies.[71] [72]

Other liver-related traits, such as serum aminotransferase levels, have a genetic component of variability that is strongly influenced by the aforementioned variants, irrespective of the underlying cause of liver disease.[23] [73]

While shared biology and genetics might explain the pathogenesis of chronic liver damage, the number of loci potentially involved in shared mechanisms is unknown. Based on the available evidence, PNPLA3 and probably TM6SF2 could explain common pathogenic pathways of metabolic liver disease. However, some interesting observations suggest that other genes might also influence the shared mechanisms of liver damage. For example, variants in nuclear receptor subfamily 1, group I, member 2 (NR1I2, the nuclear pregnane X receptor) have been associated with NAFLD predisposition[74] and with drug clearance and drug-induced liver injury.[75]

Variants/mutations in homeostatic iron regulator (HFE), a membrane protein that is similar to major histocompatibility complex class I-type proteins and is involved in iron storage disorders, have been associated not only with hereditary hemochromatosis[76] but also with an increased risk of HCC in patients with alcoholic cirrhosis.[77] HFE variants have also been implicated in susceptibility to NAFLD,[78] [79] although a systematic review of the available data did not support this association.[47]

A missense (p.Glu366Lys, also known as PI*Z) variant in SERPINA1 (serpin family A member 1) gene that is known as a predisposing factor for developing α-1-antitrypsin deficiency[80] has been recently associated with the risk of cirrhosis in NAFLD and alcohol misuse.[81]



NAFLD Genes and Pleiotropy: Cross-Associations between NAFLD-Predisposing Genes and Phenotypes of the Metabolic Syndrome

Genome-wide association studies of complex diseases have demonstrated that a large number of SNPs are implicated in susceptibility to multiple, not necessarily related, traits. The effect of one gene on different phenotypes is known as pleiotropy.[82] [83] While the concept of pleiotropy was long confined largely to evolutionary biology, its relevance became evident during the past 10 years, when genome-wide approaches revealed cross-phenotype associations among a broad category of complex traits.[84] In fact, it is estimated that approximately 4.6% of SNPs discovered by GWAS show pleiotropic effects,[84] and 44% of genes reported in the GWAS Catalog are associated with more than one phenotype.[85]

A gene-based connectivity network built from gene/protein co-occurrence suggests genetic commonality between NAFLD and features of the metabolic syndrome (MetS), specifically obesity, type 2 diabetes, and arterial hypertension.[86] For example, a rare nonsense mutation (rs149847328, p.Arg227Ter) in GCKR was associated with a rapidly progressive clinical form of NASH, which might represent the first rare genetic form of the disease.[33] Interestingly, GCKR is considered a susceptibility gene for a form of maturity-onset diabetes of the young.[87]

There are, however, paradoxical examples of alleles that impart risk of developing NAFLD but protect against phenotypes closely related to the disease, including cardiovascular disease (CVD). For example, carriers of the minor T allele (EK + KK) of the TM6SF2 E167K (rs58542926 C/T) variant are protected from CVD, including myocardial infarction,[88] and show low levels of total plasma cholesterol, low-density lipoprotein cholesterol, and triglycerides.[12] [14] [18] [44] [89] At the same time, the minor T allele of rs58542926 is a risk factor for NAFLD and NASH.[12] [14] [17] [18] [89]

Together, these observations highlight the concept of a shared genetic basis of diverse phenotypes. This concept applies not only to biologically meaningful associations, for example, among immune-mediated and/or metabolic diseases, but also to traits/diseases whose pathogenic mechanisms appear a priori to be dissimilar.

We explored the extent of pleiotropy of loci known to be associated with the genetic risk of NAFLD and NASH. This exploration was performed by literature-enrichment analysis with the Geneset2Diseases (GS2D) Web server (http://cbdm.uni-mainz.de/geneset2diseases), a tool that computes associations of genes with diseases using biomedical literature annotations.[90] The GS2D algorithm ranks all human genes according to their relation to a biomedical topic, using all available scientific abstracts and orthology information.[90]

As expected, we found that a high proportion of NAFLD-related genes (listed in [Table 1]) are also involved in the pathogenesis of MetS phenotypes ([Table 3]). Examples of shared NAFLD- and MetS-related loci include PPARGC1A, a master regulator of carbohydrate and fat metabolism and of mitochondrial function, which has been associated with NAFLD, insulin resistance, and liver mitochondrial DNA copy number,[25] as well as with cardiac development[91] and cardiac disease.[92] Another example is clock circadian regulator (CLOCK), which has been linked to MetS in rodents[93] and in humans, specifically to obesity[94] [95] and NAFLD.[96]

Table 3

Extent of pleiotropy in NAFLD-predisposing genes

| Disease | Genes count | Fold change | p-Value | FDR | Genes (number of relevant citations in the biomedical literature) |
|---|---|---|---|---|---|
| Metabolic syndrome | 21 | 10.15 | 0.000e+00 | 0.000e+00 | ADRB2 (9), ADRB3 (22), AGTR1 (10), APOC3 (20), APOE (35), FABP2 (13), GCKR (10), IRS1 (8), LEPR (20), LIPC (8), MC4R (9), MTTP (7), ENPP1 (11), PPARG (51), UCP1 (5), ADIPOQ (163), CLOCK (7), PPARGC1A (8), ADIPOR1 (7), ADIPOR2 (5), PNPLA3 (10) |
| Insulin resistance | 22 | 6.08 | 5.038e-12 | 9.236e-11 | ADRB3 (35), APOC3 (15), FABP2 (28), GCKR (8), IL6 (53), IRS1 (99), LEPR (30), LIPC (10), MC4R (8), MTTP (8), ENPP1 (43), PPARA (10), PPARG (141), TCF7L2 (48), TNF (101), UCP1 (9), DGAT1 (5), ADIPOQ (200), PPARGC1A (28), ADIPOR1 (31), ADIPOR2 (28), PNPLA3 (21) |
| Hypertriglyceridemia | 7 | 14.31 | 5.279e-07 | 5.807e-06 | APOC3 (38), APOE (36), FABP2 (6), GCKR (8), LIPC (8), PPARA (8), ADIPOQ (6) |
| Morbid obesity | 8 | 10.85 | 6.807e-07 | 6.240e-06 | ADRB3 (8), LEPR (13), MC4R (19), PPARG (18), UCP1 (7), ADIPOQ (24), PPARGC1A (9), PNPLA3 (6) |
| Dyslipidemias | 8 | 10.32 | 9.975e-07 | 7.837e-06 | ADRB2 (5), APOC3 (17), APOE (39), GCKR (6), LIPC (9), PPARA (13), PPARG (11), ADIPOQ (11) |
| Alcoholic liver diseases | 5 | 26.58 | 1.018e-06 | 6.999e-06 | CD14 (6), CYP2E1 (10), HFE (9), TNF (10), PNPLA3 (8) |
| Obesity | 21 | 3.08 | 2.624e-06 | 1.604e-05 | ADRB2 (81), ADRB3 (99), FABP2 (27), GCKR (11), IRS1 (31), LEPR (152), LIPC (21), MC4R (199), ENPP1 (38), PPARA (26), PPARG (174), TCF7L2 (43), UCP1 (46), DGAT1 (6), ADIPOQ (200), CLOCK (16), PPARGC1A (28), ADIPOR1 (21), ADIPOR2 (13), PNPLA3 (44), LYPLAL1 (8) |
| Chronic periodontitis | 6 | 13.99 | 4.096e-06 | 2.253e-05 | CD14 (7), IL1B (33), IL6 (21), PTGS2 (10), TLR4 (7), TNF (13) |
| Diabetes mellitus, type 2 | 25 | 2.52 | 9.088e-06 | 4.544e-05 | ADRB3 (37), AGTR1 (30), APOC3 (34), APOE (111), FABP2 (39), GC (9), GCKR (52), IL6 (98), IRS1 (85), LEPR (32), LIPC (25), MC4R (32), MTTP (8), ENPP1 (60), PPARA (28), PPARG (200), SOD2 (25), TCF7L2 (200), TNF (123), UCP1 (17), ADIPOQ (200), PPARGC1A (75), ADIPOR1 (26), ADIPOR2 (22), LYPLAL1 (5) |
| Polycystic ovary syndrome | 9 | 5.75 | 2.564e-05 | 1.175e-04 | IL6 (16), IRS1 (25), PPARG (29), TCF7L2 (16), TNF (18), ADIPOQ (52), PPARGC1A (5), ADIPOR1 (6), ADIPOR2 (5) |
| Overweight | 7 | 7.82 | 3.058e-05 | 1.294e-04 | ADRB2 (7), ADRB3 (7), LEPR (10), MC4R (10), ADIPOQ (48), CLOCK (6), PNPLA3 (11) |
| Hyperlipidemias | 6 | 9.38 | 4.170e-05 | 1.638e-04 | ADRB3 (5), APOC3 (8), APOE (67), FABP2 (6), LIPC (10), PPARA (11) |
| Atherosclerosis | 14 | 3.39 | 5.669e-05 | 2.079e-04 | AGTR1 (10), APOC3 (12), APOE (61), CD14 (11), GCKR (5), IL6 (38), LIPC (7), MIF (9), PPARA (12), PPARG (29), TLR4 (33), TNF (45), ADIPOQ (57), PPARGC1A (7) |
| Diabetes, gestational | 6 | 8.05 | 9.822e-05 | 3.376e-04 | IRS1 (7), PPARG (16), TCF7L2 (17), TNF (15), ADIPOQ (37), PPARGC1A (5) |
| Weight loss | 6 | 7.67 | 1.291e-04 | 4.177e-04 | ADRB3 (11), FABP2 (5), LEPR (11), MC4R (16), ADIPOQ (28), CLOCK (6) |
| Coronary artery disease | 14 | 2.81 | 4.014e-04 | 1.226e-03 | AGTR1 (23), APOC3 (34), APOE (85), CD14 (17), FABP2 (5), GCKR (9), IL6 (63), LIPC (26), MTHFR (89), PPARA (14), PPARG (33), ADIPOQ (82), PPARGC1A (9), ADIPOR1 (7) |
| Periodontitis | 5 | 7.64 | 4.909e-04 | 1.421e-03 | CD14 (15), IL1B (68), IL6 (28), TLR4 (24), TNF (33) |
| Diabetic nephropathies | 8 | 4.29 | 5.476e-04 | 1.506e-03 | AGTR1 (20), APOE (27), MTHFR (29), ENPP1 (9), PPARG (27), SOD2 (8), TCF7L2 (7), ADIPOQ (31) |
| Glucose intolerance | 5 | 6.71 | 8.854e-04 | 2.319e-03 | IRS1 (7), LEPR (6), PPARG (12), TCF7L2 (14), ADIPOQ (31) |
| Premature birth | 7 | 4.45 | 9.875e-04 | 2.469e-03 | ADRB2 (10), CD14 (7), IL1B (19), IL6 (33), MTHFR (17), TLR4 (13), TNF (29) |
| Nasal polyps | 5 | 5.54 | 2.085e-03 | 4.986e-03 | CFTR (7), IL1B (7), IL6 (9), PTGS2 (9), TNF (10) |
| Abortion, habitual | 5 | 5.07 | 3.053e-03 | 6.996e-03 | APOE (9), IL1B (12), IL6 (16), MTHFR (68), TNF (24) |
| Helicobacter infections | 6 | 3.69 | 5.725e-03 | 1.260e-02 | CD14 (10), IL1B (117), MIF (8), PTGS2 (33), TLR4 (37), TNF (57) |
| Hepatitis B, chronic | 5 | 4.18 | 6.921e-03 | 1.464e-02 | HFE (7), IL6 (20), MIF (7), TNF (44), PNPLA3 (8) |
| Sepsis | 5 | 3.36 | 1.673e-02 | 3.408e-02 | CD14 (41), IL6 (51), MIF (12), TLR4 (46), TNF (55) |
| Diabetes mellitus | 7 | 2.63 | 1.719e-02 | 3.377e-02 | PPARA (10), PPARG (35), TCF7L2 (20), UCP1 (6), PPARGC1A (10), ADIPOR1 (7), ADIPOR2 (6) |
| Colitis, ulcerative | 7 | 2.53 | 2.051e-02 | 3.890e-02 | CD14 (12), IL1B (20), MIF (8), STAT3 (18), TLR4 (28), TNF (55), NR1I2 (5) |
| Pulmonary disease, chronic obstructive | 7 | 2.47 | 2.336e-02 | 4.283e-02 | ADRB2 (23), CFTR (13), GC (16), GCLC (5), IL6 (32), SERPINA1 (45), TNF (55) |

Abbreviations: EWAS, exome-wide association study; FDR, false discovery rate; GWAS, genome-wide association study; NAFLD, nonalcoholic fatty liver disease; NASH, nonalcoholic steatohepatitis.

Note: The exploration was performed by literature-enrichment analysis with the Geneset2Diseases (GS2D) Web server (http://cbdm.uni-mainz.de/geneset2diseases), a tool that computes associations of genes with diseases using biomedical literature annotations.[90] The training set consisted of genes extracted from published associations with NAFLD and NASH in candidate–gene association studies and genome-wide approaches (GWAS and EWAS); the full list is shown in [Table 1].

Disease: Disease term from the MeSH vocabulary (based on biomedical references represented by MEDLINE records).

Genes count: Number of input genes significantly associated with the disease. The search was restricted with the following filters: minimum number of genes in the set significantly associated with a disease = 5, and minimum number of disease-related citations per gene = 5.

Fold change: (number of input genes significantly associated with the disease in the literature / number of input genes) / (total number of genes significantly associated with the disease in the literature / total number of genes).

p-Value: Computed by Fisher's exact test. FDR: Computed by the Benjamini–Hochberg method. Genes: Symbols of the input genes significantly associated with the disease, with the number of relevant citations in the literature in parentheses.
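The fold change and p-value defined in the notes above can be reproduced for a single disease term with a few lines of Python. This is a minimal sketch, not GS2D's code: it assumes a one-sided (enrichment) Fisher's exact test computed as a hypergeometric upper tail, the function names are hypothetical, and the background counts in the example (300 disease-linked genes among 20,000) are made up for illustration.

```python
from math import comb


def fold_change(k: int, n: int, K: int, N: int) -> float:
    """(k / n) / (K / N), following the Table 3 note.

    k: input genes associated with the disease; n: input genes;
    K: all genes associated with the disease; N: all genes considered.
    """
    return (k / n) / (K / N)


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail): the
    probability of at least k disease-linked genes among n genes drawn
    without replacement from N genes, K of which are disease-linked."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: 7 of 104 input genes linked to a disease term that is
# linked to 300 of 20,000 genes overall.
print(round(fold_change(7, 104, 300, 20_000), 2))  # 4.49
p = fisher_enrichment_p(7, 104, 300, 20_000)
print(benjamini_hochberg([p, 0.03, 0.2]))
```

Exact integer arithmetic via `math.comb` avoids factorial overflow; for large-scale use, a statistics library's Fisher or hypergeometric routine would be the more practical choice.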


It could be argued that this analysis is inflated by highly correlated traits and outcomes, such as NAFLD, type 2 diabetes, insulin resistance, atherosclerosis, and dyslipidemia. Nevertheless, the analysis also yielded some surprising findings. For example, 7 of the 104 input genes were significantly associated in the literature with ulcerative colitis or premature birth ([Table 3]), and 5 of the 104 were associated with habitual abortion, sepsis, and nasal polyps ([Table 3]). These results, however, must be interpreted with caution, as further work to confirm causality and curate the data is needed. Still, if confirmed, these results may open a window for therapeutic exploration, whereby drugs could be designed to target pleiotropic loci or pleiotropic molecular targets that cover multiple traits, even when those traits are not obviously related.



Genetics of NAFLD and Precision Medicine

With advances in the genetic knowledge of NAFLD and NASH, it is becoming possible to use this information in clinical applications. Genetic data could be leveraged to identify individuals at risk of NAFLD, or to estimate the risk of severe histological outcomes, including NASH and NASH-fibrosis ([Fig. 2]). Genetic markers are already being used as tools in personalized clinical practice, including treatment decisions ([Fig. 2]). Specifically, PNPLA3-rs738409 has been incorporated into combined screening algorithms that include clinical and biochemical data. Nevertheless, the utility of the variant in NAFLD risk estimation remains inferior to that of classical predictive or imaging approaches. For example, Kotronen et al proposed the NAFLD liver fat score, which showed an area under the receiver operating characteristic curve (AUROC) of 0.872 (95% confidence interval [CI]: 0.84–0.91) in predicting liver fat content.[97] Adding rs738409 to this score, which comprises the presence of type 2 diabetes along with serum fasting insulin and aminotransferase levels, improved the prediction accuracy by less than 1%.[97] A more sophisticated multipanel score, the NAFLD multicomponent score, which integrates omics-derived and clinical variables, rs738409, and proteomic data, showed an AUROC of 0.932 for identifying NAFLD risk in the population.[98] Despite this high predictive value, such a biomarker panel would be neither practical nor cost-effective for large-scale population screening programs.
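The AUROC values quoted above summarize how well a score ranks people with the outcome above people without it. As a minimal sketch in Python (the scores below are made-up toy values, not data from the cited studies, and `auroc` is a hypothetical helper), the AUROC can be computed as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half; comparing this value for a score with and without a genotype term is one way to express the gain from an added predictor.

```python
def auroc(case_scores: list[float], control_scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    Counts, over all case/control pairs, how often the case scores higher
    (ties count 0.5), divided by the number of pairs.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy, hypothetical scores (not data from any cited study).
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
print(auroc(cases, controls))  # 0.9375 (15 of 16 pairs ranked correctly)
```

The pairwise loop is quadratic in cohort size; a rank-based formula gives the same value more efficiently for large cohorts.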

Fig. 2 Genetics of nonalcoholic fatty liver disease (NAFLD) and precision medicine. This figure shows examples of the use of genetic markers in the clinical setting, as well as potential yet unexplored applications.

Risk estimation of disease severity and progression, including NASH and NASH-fibrosis, offers greater opportunities for clinical translation. In fact, genetic testing might open a window for the development of gene-based strategies for the diagnosis of NASH, thus moving the assessment of disease severity from an invasive (liver biopsy) toward a noninvasive approach. Unfortunately, there is still no evidence that rs738409, or any other variant, is superior to liver biopsy in efficacy and accuracy for assessing liver histology. For example, combining laboratory tests (aspartate aminotransferase and fasting insulin), circulating metabolites, and rs738409 genotypes into the NASH Clinical Score and the NASH ClinLipMet score yielded AUROCs for NASH of 0.778 (95% CI: 0.709–0.846) and 0.866 (95% CI: 0.820–0.913), respectively.[99] Similar explorations have been conducted in pediatric settings, in which a polygenic risk score combining variants in four loci (PNPLA3-rs738409, SOD2-rs4880, KLF6-rs3750861, and LPIN1-rs13412852) showed an AUROC for NASH of 0.75 (95% CI: 0.67–0.82).[100] It should be emphasized that the more variants a polygenic score includes, the smaller the fraction of individuals found to carry the at-risk combination, a problem that worsens when minor allele frequencies are low.
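The closing point of the paragraph above is simple arithmetic. A minimal sketch in Python, assuming Hardy–Weinberg equilibrium, independently inherited loci (no linkage disequilibrium), and that "at risk" means carrying at least one risk allele at every locus in the score; the allele frequencies are hypothetical, not those of the four pediatric loci, and the helper names are illustrative.

```python
def carrier_frequency(q: float) -> float:
    """Share of people with at least one copy of a risk allele of frequency q,
    under Hardy-Weinberg equilibrium: 1 - (1 - q)**2."""
    return 1.0 - (1.0 - q) ** 2


def joint_carrier_frequency(freqs: list[float]) -> float:
    """Share of people carrying a risk allele at every locus, assuming the
    loci are inherited independently."""
    share = 1.0
    for q in freqs:
        share *= carrier_frequency(q)
    return share


# Hypothetical risk-allele frequencies, adding one locus at a time.
freqs = [0.30, 0.30, 0.10, 0.05]
for n_loci in range(1, len(freqs) + 1):
    print(n_loci, round(joint_carrier_frequency(freqs[:n_loci]), 4))
```

With these illustrative values, the share of people carrying the full combination falls from about 51% with one locus to under 1% with four, and the two low-frequency loci account for most of the drop.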

It should be noted that genetic assessments provide static information about the explored phenotype or disease trait. However, genetic markers could be used to dynamically predict the response to a therapeutic intervention, as shown in [Fig. 2]. For example, pilot studies indicate that homozygosity status for the rs738409 risk G allele was useful in predicting the absolute change in liver fat content of patients enrolled in a hypocaloric low-carbohydrate diet program[101] or a reduced-caloric-intake program.[102] The PNPLA3 variant also seems to be useful in predicting changes in body weight of morbidly obese NAFLD patients enrolled in a bariatric surgery program.[103]

Potential avenues for future research that would significantly affect prevention, surveillance, and prognosis assessment of NAFLD and NASH are summarized in [Fig. 2]. Poorly explored but promising uses of genetic markers include, for example, surveillance of HCC that could occur in cirrhotic and noncirrhotic patients with NASH, or assessment of liver transplantation prognostic outcomes ([Fig. 2]). The potential interplay between the recipient and donor genotype of variants of interest suggests an interesting yet poorly explored research avenue in the field of precision medicine. Findings yielded by a small number of studies suggest that the PNPLA3-rs738409 G allele in either the donor or the recipient could be a risk factor for NAFLD recurrence or appearance after liver transplantation.[104] [105]

The potential value of using genetic markers for treatment decisions for patients enrolled in NASH clinical trials remains largely unexploited. Nonetheless, this specific clinical application is expected to be explored in the near future, as novel drugs for the treatment of NASH become available on the market. Potential shortcomings and limitations of genetic markers in clinical decision making are shown in [Fig. 2].

Finally, while NAFLD is known to be a polygenic and complex disease, the value of polygenic risk scores for NASH diagnosis and prognosis, and their interaction with environmental exposures, remain largely unknown. Moreover, polygenic risk scores for personalized NAFLD care should be tested and optimized to perform well in diverse ethnic groups, because the frequency of the risk alleles varies significantly among populations.[2] [11] [12] [13] [14] [23] [106] Remarkable examples of allele frequency disparity among populations are PNPLA3-rs738409, whose G risk allele frequency varies from 12% in the African population to 48% in the South American (Mexican, Colombian, Peruvian, and Puerto Rican) population (as shown in http://www.ensembl.org), and HSD17B13-rs72613567, whose protective A-insertion allele frequency varies from 5% in the African population to 34% in the East Asian population (population genetics figures were extracted from the 1000 Genomes Project, http://www.internationalgenome.org/).
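To make the population differences concrete, the sketch below (Python, assuming Hardy–Weinberg equilibrium, which real populations only approximate) converts the rs738409 G allele frequencies quoted above, 12% and 48%, into expected genotype frequencies. The C/G labels follow the usual rs738409 C>G notation, and `hwe_genotypes` is a hypothetical helper.

```python
def hwe_genotypes(q: float) -> dict[str, float]:
    """Expected genotype frequencies for a biallelic C/G site under
    Hardy-Weinberg equilibrium, where q is the G (risk) allele frequency."""
    p = 1.0 - q
    return {"CC": p * p, "CG": 2.0 * p * q, "GG": q * q}


# G-allele frequencies quoted in the text for rs738409.
for population, q in [("African", 0.12), ("South American", 0.48)]:
    genotypes = hwe_genotypes(q)
    summary = ", ".join(f"{g} {100 * f:.1f}%" for g, f in genotypes.items())
    print(f"{population}: {summary}")
```

Under these assumptions, GG homozygotes would be about 16 times more common in the South American sample than in the African sample (about 23.0% versus 1.4%), because the homozygote frequency scales with the square of the allele frequency.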



Nonalcoholic Steatohepatitis Treatment Inferred from Genetic Discoveries

There are currently no approved pharmacologic therapies for NASH. However, many novel drugs are being tested for safety and efficacy.[107] Some of these drugs have been designed based on the available knowledge of NAFLD pathogenesis and the underlying mechanisms of the disease progression, including metabolic pathways, inflammatory cascades, and/or fibrogenesis.[107]

Patients with NAFLD and NASH currently receive lifestyle recommendations and are sometimes treated with known and relatively safe drugs that are already available on the market, for example, α-tocopherol (vitamin E), ursodeoxycholic acid (UDCA), metformin, losartan, or the insulin sensitizer pioglitazone.[107] [108] These drugs are usually prescribed not for NASH itself but for associated comorbidities, for example, type 2 diabetes and arterial hypertension. Hence, their use in the treatment of NASH is purely empirical and/or pragmatic, guided by the assumption of a putative effect on the disease. Despite this limitation, some of the commonly prescribed drugs, including vitamin E and pioglitazone, have been shown to partially improve liver outcomes, such as liver enzymes.[108]

To answer the question of whether the medications that patients receive in ordinary clinical practice are in line with disease mechanisms inferred from genetic discoveries, we analyzed text-mined chemical–gene–disease interactions with the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). We specifically modeled the interaction network among genes associated with NAFLD and NASH in previous studies ([Table 1]), genes associated with fibrosis and inflammation (mined from the curated gene–disease associations established by both the CTD data set and Online Mendelian Inheritance in Man), and drugs that have been used or are currently in use for the treatment of NASH (α-tocopherol/vitamin E, UDCA, metformin, pioglitazone, losartan, and liraglutide).

As this overview makes evident, translating the information generated from NAFLD genetic studies into new drugs and/or clinical biomarkers is a major challenge. In fact, data generated from either candidate–gene association studies or genome-wide surveys have not been exploited for drug discovery, even though some of the genes involved are shared by NAFLD and general processes, such as inflammation and fibrosis ([Fig. 3]). Nevertheless, we obtained some remarkable results. For example, some of these drugs, including vitamin E, pioglitazone, and even losartan, are predicted to target genes associated with the genetic risk of NAFLD or NASH ([Fig. 3]); conversely, liraglutide does not seem to match any of the genes discovered in genetic studies ([Fig. 3]).
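The Venn comparisons in [Fig. 3] reduce to set operations on gene lists. A minimal sketch in Python, assuming each list is available as a set of gene symbols; the gene labels and memberships below are placeholders, not CTD data, and the helper `venn_regions` is hypothetical rather than part of any CTD interface.

```python
def venn_regions(sets: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Group every element by the exact combination of sets containing it,
    i.e. the exclusive regions of a Venn diagram."""
    regions: dict[frozenset[str], set[str]] = {}
    for element in set().union(*sets.values()):
        members = frozenset(name for name, s in sets.items() if element in s)
        regions.setdefault(members, set()).add(element)
    return regions


# Placeholder gene sets (hypothetical labels, not real CTD annotations).
gene_sets = {
    "NAFLD": {"G1", "G2", "G3"},
    "fibrosis": {"G2", "G4"},
    "drug targets": {"G2", "G3", "G5"},
}
for members, genes in venn_regions(gene_sets).items():
    print(sorted(members), sorted(genes))
```

Each element lands in exactly one region, so the regions partition the union of all lists; a drug "matching" NAFLD genes corresponds to a nonempty intersection of the NAFLD set with that drug's target set.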

Fig. 3 Nonalcoholic fatty liver disease (NAFLD) genes and drug interaction network. This figure shows the shared genes associated with NAFLD and nonalcoholic steatohepatitis (NASH) (see [Table 1] for the training set of genes, n = 104), genes associated with fibrosis (n = 49), and inflammation (n = 130) (the list of genes under these terms is automatically established from the Comparative Toxicogenomics Database [CTD] data set) (A), and drugs that have been used or are currently in use for the treatment of NASH: α-tocopherol/vitamin E, ursodeoxycholic acid (UDCA) (B), pioglitazone, liraglutide (C), and metformin and losartan (D). Numbers after terms in Venn graphs, including genes associated with the selected drugs, indicate the number of genes stored in the CTD database for a given term: UDCA, n = 299; vitamin E, n = 174; pioglitazone, n = 686; liraglutide, n = 3; losartan, n = 172; and metformin, n = 424. CTD integrates information on chemicals, including chemical structures, curated interacting genes and proteins, curated and inferred disease relationships, and enriched pathways and functional annotations, which were extracted from the U.S. National Library of Medicine, the Online Mendelian Inheritance in Man (OMIM) database, and the gene database at the National Center for Biotechnology Information (NCBI). The interaction network was modeled by the CTD (http://ctdbase.org).

Particularly interesting are the following targets: peroxisome proliferator-activated receptor alpha (PPARα), peroxisome proliferator-activated receptor gamma (PPARγ) and its coactivator PPARG coactivator 1 alpha (PGC1α), signal transducer and activator of transcription 3 (STAT3), adrenoceptor beta 3 (ADRB3), and tumor necrosis factor (TNF).

Elafibranor (code name GFT505), a dual PPARα and PPARδ ligand currently in phase III, has been reported to improve histological outcomes associated with disease severity.[110] This pharmacological agent, which was specifically designed to target PPARs, is a remarkable example of a drug with potentially pleiotropic and systemic effects.[111]



NAFLD and NASH Genes and the Druggable Proteome

The variants with the greatest effects on NAFLD and NASH are missense SNPs (PNPLA3-I148M and TM6SF2-E167K) that explain only modest changes in gene/protein expression levels and hardly represent “druggable” targets.[106] These two loci either exert pleiotropic metabolic effects[69] or are associated with dual and opposite effects on critical phenotypes, as already mentioned for the TM6SF2-E167K variant.[14] Hence, the potential use of these proteins as pharmacological targets, by modulating their protein levels and/or enzymatic activity, is rather limited.[106]

As a proof of concept, we performed an in silico “druggability” prediction of known NAFLD GWAS-discovered genes, including PNPLA3, TM6SF2, GCKR, and HSD17B13, based on protein structural druggability, ligand-based druggability, and network-based druggability, as implemented by the canSAR resource (http://cansar.icr.ac.uk/). This resource contains information on the whole human proteome, as well as on 2,136 model organisms and 8,631 protein families.[112] Predictions are based on the premise that a protein is “druggable” if its activity can be modulated by binding to a drug-like small compound.[112] This analysis revealed that neither PNPLA3 nor TM6SF2 has a “druggable” protein structure, is associated with any bioactive compound, or is potentially druggable by any predicted ligand-based approach. Assessment of the same parameters for GCKR and HSD17B13 showed a contrasting scenario, as both proteins are potentially druggable targets based on their three-dimensional structures (Protein Data Bank) and on ligandability predictions performed for all identified pockets within each protein structure. Based on homology to the closest druggable structure(s), an approach that examines the structure of the protein and identifies any cavities on its surface where a drug-like compound could bind, we found that HSD17B13 and GCKR have a structural druggability of 66.67% and 100%, respectively. Nevertheless, druggability prediction using different approaches, including tumor-tissue and cell line expression and mutational analysis, indicated that the overall druggability percentile of GCKR is 44.08%, including druggability for cancer (46.36%) and other therapeutics (17.32%).

A focused canSAR analysis of candidate genes previously associated with NAFLD and disease severity, for instance STAT3,[113] revealed that the protein encoded by this gene has a ligand-based druggability score of 97%. This score indicates the likely druggability of the protein based on the chemical properties of the different compounds tested against the protein itself and/or its homologs. The STAT3 protein has an overall druggability percentile of 99.21% and a druggability for cancer therapeutics of 99.39%. Furthermore, the structural druggability of STAT3 is 100%.

Network-based druggability assessment of the STAT3 and GCKR proteins, which examines the structure of the protein–protein interaction network around the target, suggests that STAT3, but not GCKR, is a good drug target, as disrupting its activity would affect different and relevant cellular processes ([Fig. 4]). In fact, STAT3 performs better than the average targets of other therapeutic areas, even cancer ([Fig. 4]).

Fig. 4 Signal transducer and activator of transcription 3 (STAT3) and glucokinase regulator gene (GCKR) radar network-predicted druggability plots. Radar plots showing representative network property profiles of STAT3 and GCKR as potential drug targets (blue plot). The predicted network druggability is compared with the randomized network model of an average cancer target (green plot) or an average target for a noncancer drug. Prediction was performed by the canSAR resource available at https://cansar.icr.ac.uk. The network descriptors are divided into three categories: Substructures, Topological, and Community-based. The substructures were obtained from Przulj.[119] The graphlets are labeled as G-n and the orbits as O-. Topological descriptors: Betweenness centrality: A measure quantifying the influence of one protein on the communication between other proteins in a network. Closeness centrality: Measures how many steps are required for a protein to reach every other protein; a lower number of steps indicates faster communication. Burt's constraint: Based on Burt's structural holes and ego networks; constraint is higher when a protein's neighbors are also connected to each other, making the protein more redundant. k-core: A k-core is a maximal subgraph in which each protein has a degree of at least k within the subgraph. Kleinberg hub score: A measure of how authoritative each protein is, based on the principal eigenvector of the network's adjacency matrix. Google PageRank: A measure of the relative importance of the protein within the network; as the protein–protein network is an undirected graph, PageRank is positively correlated with degree. Clustering coefficient: The probability that a protein's neighbors are also connected to each other, calculated from the ratio of triangles connected to the protein. Community-based descriptors: Community size (Walktrap): Based on hierarchical clustering; attempts to find densely connected subgraphs via random walks across the network. Community size (Spinglass): Based on partitional clustering, where the number of communities to detect is predefined. Intracommunity: Ratio of intra- to intercommunity communication; a higher number indicates that the protein's neighbors are in the same community. Spinglass inner: Number of interactions within the community. Spinglass outer: Number of interactions between the community and the rest of the network.
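Several of the topological descriptors listed in the Fig. 4 caption have textbook graph-theory definitions that can be sketched directly. The Python below computes closeness centrality, the local clustering coefficient, and core numbers (k-core membership) on a toy undirected network. It is an illustration under standard definitions, not canSAR's implementation, whose normalizations may differ; the function names and the toy graph are hypothetical.

```python
from collections import deque

Graph = dict[str, set[str]]  # undirected: each node maps to its neighbors


def closeness(graph: Graph, node: str) -> float:
    """(reachable nodes - 1) / sum of BFS shortest-path lengths from node.
    Only nodes reachable from `node` are counted."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering(graph: Graph, node: str) -> float:
    """Fraction of pairs of neighbors that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return links / (k * (k - 1) / 2)


def core_numbers(graph: Graph) -> dict[str, int]:
    """Largest k such that each node belongs to the k-core, found by
    repeatedly removing a node of minimum remaining degree."""
    degree = {u: len(vs) for u, vs in graph.items()}
    remaining = set(graph)
    core: dict[str, int] = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.get)
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in graph[u]:
            if v in remaining:
                degree[v] -= 1
    return core


# Toy network: triangle A-B-C with D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(closeness(toy, "C"), clustering(toy, "C"), core_numbers(toy))
```

In the toy network, C is the most central node (it reaches every other node in one step), but only one of its three neighbor pairs is connected, and the triangle forms a 2-core while the pendant node D belongs only to the 1-core.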

A recently published experimental study in mice using a novel small-molecule STAT3 inhibitor (C188-9) demonstrated beneficial effects on liver-related outcomes.[114] C188-9 not only reduced tumor development but also improved liver steatosis, inflammation, and the pathological lesions of NASH in mice with hepatocyte-specific deletion of the Pten gene.[114] Further experimental and clinical evidence indicates that STAT3 is involved not only in the regulatory circuit of liver fibrogenesis ([Fig. 3]),[115] but also in NASH, by exacerbating insulin resistance.[116]

In conclusion, as genetic studies of NAFLD and NASH continue to expand, they are likely to provide further insights into the mechanisms of disease pathogenesis and progression. Knowledge of the variants associated with susceptibility to NASH offers an interesting opportunity not only for individualized risk prediction and prognosis, but also for the individual assessment of therapeutic response. Hence, future medicine in the field of NASH would benefit from patient-optimized strategies that, rather than being implemented uniformly on a wide scale, are tailored to the genetic makeup of each patient.



Main Concepts and Learning Points

  1. Concepts

    • NAFLD is a polygenic complex disease

    • NAFLD gene-regulatory networks

    • Shared pathogenic mechanisms of chronic liver damage

    • NAFLD genes: pleiotropy or just biologically meaningful associations?

    • NAFLD and the druggable proteome

  2. Learning Points

    • The genetic component of NAFLD and NASH is largely explained by variants in genes that regulate glucose and fat homeostasis.

    • Integrated pathways of disease pathogenesis suggest a > 100-fold enrichment of adiponectin- and STAT3-activated signaling pathways, retinol O-fatty-acyltransferase activity, and β-adrenergic activity.

    • Convergent pathophenotypes, including liver inflammation and fibrosis, share molecular regulatory pathways and disease-predisposing genes.

    • NAFLD-associated genes overlap with loci originally thought to play a role in traits associated with the metabolic syndrome.

    • Data generated from candidate–gene association studies and genome-wide surveys can be leveraged to identify therapeutic targets.



Conflicts of Interest

None.

Note

The authors apologize to the colleagues whose works could not be cited owing to manuscript length limitations.





Fig. 1 Missing heritability of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH). There is considerable disparity between the magnitude of heritability estimates of NAFLD and NASH and the proportion of variance explained by single nucleotide polymorphisms (SNPs) uncovered by genome-wide association studies (GWAS), exome-wide association studies (EWAS), and candidate–gene association studies. A significant proportion of the disease burden could be explained by the missing heritability, which covers not only genetic and epigenetic modifiers but also interactions with environmental exposures and with a highly interconnected and dynamic network of factors, including the microbiome.[115] [116] The genetic component of NAFLD and NASH may potentially be explained by undiscovered rare variants, structural variation (including copy number variation), variants in micro-ribonucleic acids (miRNAs) and long noncoding RNAs (lncRNAs), and expression quantitative trait loci (eQTLs). The allelic architecture of the human genome, which varies substantially across ethnic groups, plays an important role: variation across populations might explain differences in the prevalence and severity of the disease among ethnic groups. Epigenetic factors include not only deoxyribonucleic acid (DNA) and histone methylation but also chromatin remodeling and nonprotein coding RNAs. Epigenetic inheritance also involves modifications of the histone code, including those mediated by histone acetyltransferases (HAT) and deacetylases (HDAC). Abbreviations: circRNA, circular RNA; lincRNA, long intergenic noncoding RNA; NAT, natural antisense transcript; piRNA, PIWI-interacting RNA; snRNA, small nuclear RNA; snoRNA, small nucleolar RNA.
Fig. 2 Genetics of nonalcoholic fatty liver disease (NAFLD) and precision medicine. This figure shows examples of the use of genetic markers in the clinical setting, as well as potential, as yet unexplored, applications.
Fig. 3 Nonalcoholic fatty liver disease (NAFLD) genes and drug interaction network. This figure shows the genes shared among those associated with NAFLD and nonalcoholic steatohepatitis (NASH) (see [Table 1] for the training set of genes, n = 104), those associated with fibrosis (n = 49), and those associated with inflammation (n = 130) (the gene lists for these terms were established automatically from the Comparative Toxicogenomics Database [CTD] data set) (A), and drugs that have been or are currently used for the treatment of NASH: α-tocopherol/vitamin E and ursodeoxycholic acid (UDCA) (B), pioglitazone and liraglutide (C), and metformin and losartan (D). Numbers after terms in the Venn diagrams, including the genes associated with the selected drugs, indicate the number of genes stored in the CTD database for a given term: UDCA, n = 299; vitamin E, n = 174; pioglitazone, n = 686; liraglutide, n = 3; losartan, n = 172; and metformin, n = 424. CTD integrates information on chemicals, including chemical structures, curated interacting genes and proteins, curated and inferred disease relationships, enriched pathways, and functional annotations, extracted from the U.S. National Library of Medicine, the Online Mendelian Inheritance in Man (OMIM) database, and the gene database at the National Center for Biotechnology Information (NCBI). The interaction network was modeled with CTD (http://ctdbase.org).
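Each Venn diagram in this figure partitions genes by the combination of terms they belong to. A minimal sketch of that partition, assuming the gene lists have already been obtained as sets of gene symbols; the function name `venn_regions` and the identifiers G1 to G6 are hypothetical placeholders, not CTD data or a CTD interface.

```python
def venn_regions(sets: dict[str, set[str]]) -> dict[tuple[str, ...], set[str]]:
    """Partition elements into exclusive Venn regions.

    Each element goes to exactly one region, keyed by the tuple of set names
    (in the dict's insertion order) that contain it.
    """
    names = list(sets)
    regions: dict[tuple[str, ...], set[str]] = {}
    for element in set().union(*sets.values()):
        key = tuple(name for name in names if element in sets[name])
        regions.setdefault(key, set()).add(element)
    return regions


# Placeholder gene identifiers, not real CTD annotations.
regions = venn_regions({
    "NAFLD": {"G1", "G2", "G3", "G4"},
    "Fibrosis": {"G2", "G3", "G5"},
    "Drug": {"G3", "G4", "G6"},
})
# Print the most-shared regions first.
for key, genes in sorted(regions.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" & ".join(key), sorted(genes))
```

Keying each gene by its full membership tuple guarantees that the regions are disjoint, so their sizes add up to the total number of distinct genes, as the counts in a Venn diagram should.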
Fig. 4 Radar plots of network-predicted druggability for signal transducer and activator of transcription 3 (STAT3) and the glucokinase regulator gene (GCKR). The radar plots show representative network property profiles of STAT3 and GCKR as potential drug targets (blue plot). The predicted network druggability is compared with the randomized network model of an average cancer target (green plot) or an average target for a noncancer drug. Prediction was performed with the canSAR resource, available at https://cansar.icr.ac.uk. The network descriptors are divided into three categories: Substructures, Topological, and Community-based. The substructures were obtained from Przulj.[119] The graphlets are labeled as G-n and the orbits as O-. Topological descriptors: Betweenness centrality: A measure of the influence of one protein on the communication between other proteins in the network. Closeness centrality: Measures how many steps a protein requires to reach every other protein; a lower number of steps indicates faster communication. Burt's constraint: Based on Burt's theory of structural holes and ego networks; constraint is higher when a protein's neighbors are also connected to one another, making the protein more redundant. k-core: A k-core is a maximal subgraph in which each protein has a degree of at least k. Kleinberg hub score: A measure of how authoritative each protein is, based on the principal eigenvector of the network's adjacency matrix. Google PageRank: A measure of the relative importance of the protein within the network; because the protein–protein interaction network is an undirected graph, PageRank is positively correlated with degree. Clustering coefficient: The probability that two neighbors of a protein are also connected to each other, calculated as the ratio of the triangles that include the protein to the number of possible triangles. Community-based descriptors: Community size (Walktrap): Based on hierarchical clustering; attempts to find densely connected subgraphs via random walks across the network.
Community size (Spinglass): Based on partitional clustering, in which the number of communities to detect is predefined. Intracommunity: Ratio of intra- to intercommunity communication; a higher number indicates that more of the protein's neighbors are in the same community. Spinglass inner: Number of interactions within the community. Spinglass outer: Number of interactions between the community and the rest of the network.
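Several of the topological descriptors listed above have compact textbook definitions. As a rough illustration, the sketch below computes the clustering coefficient, closeness centrality, core number (the largest k for which a protein lies in a k-core), and PageRank on a hypothetical five-protein toy network using only the Python standard library. It follows common textbook formulations and is not canSAR's implementation; canSAR may normalize these measures differently, and betweenness, hub scores, graphlet orbits, and the community-based descriptors are omitted for brevity. All function names and the toy proteins A to E are hypothetical.

```python
from collections import deque

Graph = dict[str, set[str]]  # undirected adjacency sets for a hypothetical toy network


def add_edge(g: Graph, u: str, v: str) -> None:
    g.setdefault(u, set()).add(v)
    g.setdefault(v, set()).add(u)


def clustering_coefficient(g: Graph, node: str) -> float:
    """Fraction of pairs of neighbors that are themselves connected (closed triangles)."""
    nbrs = list(g[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in g[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def closeness_centrality(g: Graph, node: str) -> float:
    """(number of reachable proteins) / (sum of shortest-path lengths), via BFS."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def core_numbers(g: Graph) -> dict[str, int]:
    """Core number of each protein, found by repeatedly peeling a minimum-degree node."""
    degree = {u: len(vs) for u, vs in g.items()}
    remaining = set(g)
    core: dict[str, int] = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.__getitem__)
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in g[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def pagerank(g: Graph, damping: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Power iteration; each undirected edge is followed in both directions.

    Assumes every protein has at least one neighbor (no dangling nodes).
    """
    n = len(g)
    rank = {u: 1.0 / n for u in g}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in g}
        for u, vs in g.items():
            for v in vs:
                new[v] += damping * rank[u] / len(vs)
        rank = new
    return rank


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
g: Graph = {}
for u, v in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]:
    add_edge(g, u, v)

print(core_numbers(g))
print(clustering_coefficient(g, "C"), closeness_centrality(g, "C"))
print(pagerank(g))
```

In this toy network, protein C, which bridges the triangle and the tail, has the highest degree and the highest PageRank, in line with the caption's remark that PageRank tracks degree in an undirected network, while the triangle proteins form the 2-core.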