DOI: 10.1055/s-0045-1807460
Investigating the role of structural variants in insulin action and type 2 diabetes
Background Type 2 diabetes mellitus (T2DM) has a substantial genetic component, but so far only a fraction of the associated gene variants have been identified. In previous studies, we conducted crossbreeding and linkage analyses using T2DM-susceptible (NZO) and T2DM-resistant (C3H, 129/P2) mouse strains and uncovered three large-effect quantitative trait loci (QTL) for blood glucose (Nbg4/7/15 on chromosomes 4, 7 and 15) based on conventional single-nucleotide polymorphisms. Recent advances in sequencing technology and bioinformatics have enabled the generation and assembly of long-read sequencing data without a reference genome, making it possible to investigate structural variants (SVs) that are not resolved by conventional short-read DNA sequencing.
Aim The aim of the project is to a) generate high-quality genomes for the three mouse strains; b) overlay tissue-specific transcriptomes to characterize the impact of SVs on gene expression; and c) refine the genomic linkage regions involved in diabetes development.
Methods DNA and RNA were extracted from different tissues (pancreatic islets, skeletal muscle, liver, gWAT, BAT) of 6-week-old mice of the parental strains. Genome sequencing was performed using PacBio HiFi technology, and RNA-Seq was conducted by Illumina short-read sequencing. Both types of reads were aligned to the mouse reference genome. Based on the DNA assembly, SVs were called against the reference genome. Differential gene expression (DGE) analysis and differential splicing analysis are performed using nf-core/differentialabundance and rMATS, respectively.
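As an illustration of how called SVs can be read for downstream analysis, the following minimal sketch parses a single VCF data line into a chromosome, a start, an end and an SV type. It assumes the standard VCF column order and the INFO keys SVTYPE and END defined in the VCF specification; the function name parse_sv_record and the example record are hypothetical and are not taken from the study.

```python
def parse_sv_record(line):
    """Parse one tab-separated VCF data line into a small SV dictionary.

    Assumes the standard VCF column order (CHROM, POS, ID, REF, ALT,
    QUAL, FILTER, INFO) and the INFO keys SVTYPE and END.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value

    # Records without an END key (e.g. insertions) are treated as single-position.
    end = int(info_map.get("END", pos))
    return {
        "chrom": chrom,
        "start": pos,
        "end": end,
        "svtype": info_map.get("SVTYPE", "NA"),
    }


# Hypothetical example record, not real data from the study.
example = "chr7\t1500\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1600;SVLEN=-100"
record = parse_sv_record(example)
```

Header lines starting with "#" would need to be skipped before calling this function on a real file.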
Results For all three strains, the mean read length of the DNA fragments was above 50,000 bp. SV calling was performed in the QTL regions using the pbmm2/pbsv pipeline. Between 600 and 2,100 SVs were called in each strain, with 0 to 1.5% occurring in coding regions. Further collapsing of the data revealed that between 175 and 710 genes per QTL were affected by SVs in each strain, some of them in two or three strains. RNA sequencing revealed expression of 5 to 12 million unique transcripts in all five tissues analyzed. As expected, principal component analysis showed high similarity of gene expression across the strains but high variability between tissues.
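The collapsing of SV calls to affected genes per QTL can be pictured as an interval-overlap problem. The sketch below is one possible way to do this under simple assumptions, not the pipeline used in the study: SVs, genes and QTL regions are closed intervals on a chromosome, and a gene counts as affected if an SV inside the QTL overlaps it by at least one base. All coordinates and gene names are hypothetical; only the QTL name Nbg7 comes from the text.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if closed intervals [a_start, a_end] and [b_start, b_end] share a base."""
    return a_start <= b_end and b_start <= a_end


def genes_affected_by_svs(svs, genes, qtl_regions):
    """Collapse SV calls to the set of overlapped genes per QTL.

    svs:         list of (chrom, start, end)
    genes:       dict of gene name -> (chrom, start, end)
    qtl_regions: dict of QTL name -> (chrom, start, end)
    """
    affected = defaultdict(set)
    for chrom, start, end in svs:
        for qtl, (q_chrom, q_start, q_end) in qtl_regions.items():
            # Skip SVs on another chromosome or outside the QTL region.
            if chrom != q_chrom or not overlaps(start, end, q_start, q_end):
                continue
            for name, (g_chrom, g_start, g_end) in genes.items():
                if g_chrom == chrom and overlaps(start, end, g_start, g_end):
                    affected[qtl].add(name)
    return dict(affected)


# Hypothetical toy data: coordinates and gene names are invented for illustration.
qtl_regions = {"Nbg7": ("chr7", 1000, 5000)}
genes = {
    "geneA": ("chr7", 1200, 1800),
    "geneB": ("chr7", 3000, 3500),
    "geneC": ("chr7", 6000, 6500),
}
svs = [
    ("chr7", 1500, 1600),  # inside geneA
    ("chr7", 1700, 3100),  # spans geneA and geneB
    ("chr7", 6100, 6200),  # overlaps geneC but lies outside the QTL
    ("chr4", 1500, 1600),  # different chromosome
]
affected = genes_affected_by_svs(svs, genes, qtl_regions)
```

The nested loops are adequate for a few thousand SVs per strain; for genome-wide data an interval tree or a sorted sweep would scale better.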
Discussion and Outlook Using PacBio HiFi long-read sequencing, we identified SVs in candidate genes and intergenic regions that may contribute to the phenotype of the NZO/C3H and NZO/129/P2 outcross populations, respectively. Our genome assemblies provide, for the first time, highly accurate genomic data for the three mouse strains, thus improving future identification of new risk variants for T2DM and related phenotypes. DGE analysis of the RNA sequencing data and subsequent aggregation of the genomic and transcriptomic data, including the called SVs, will allow the investigation of relationships between SVs and gene expression. Plausible functional SVs discovered by this analysis will be further investigated.
Publication History
Article published online:
28 May 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany