The identification and characterisation of genomic changes (variants) that can lead to human diseases is one of the central aims of biomedical research. The generation of catalogues of genetic variants that have an impact on specific diseases is the basis of Personalised Medicine, where diagnoses and treatment protocols are selected according to each patient's profile. In this context, the study of complex diseases, such as Type 2 diabetes or cardiovascular alterations, is fundamental. However, these diseases result from the combination of multiple genetic and environmental factors, which makes the discovery of causal variants particularly challenging at a statistical and computational level. Genome-Wide Association Studies (GWAS), which are based on the statistical analysis of genetic variant frequencies across non-diseased and diseased individuals, have been successful in finding genetic variants that are associated with specific diseases or phenotypic traits. But GWAS methodology is limited when considering important genetic aspects of the disease and has not yet resulted in meaningful translation to clinical practice. This review presents an outlook on the study of the link between genetics and complex phenotypes. We first present an overview of the past and current statistical methods used in the field. Next, we discuss current practices and their main limitations. Finally, we describe the open challenges that remain and that might benefit greatly from further mathematical developments. ; L.A. was supported by grant BES-2017-081635. This publication is part of R&D and Innovation grant BES-2017-081635 funded by MCIN and by "FSE Investing in your future". I.M. was supported by grant FJCI-2017-31878. This publication is part of R&D and Innovation grant FJCI-2017-31878 funded by MCIN. C.S. received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433. 
; Peer Reviewed ; Postprint (published version)
Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implement an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and non-additive models to test for association. We apply this methodology to 62,281 subjects across 22 age-related diseases and identify 94 genome-wide associated loci, including 26 previously unreported. Moreover, we observe that 27.7% of the 94 loci are missed if we use standard imputation strategies with a single reference panel, such as HRC, and only test the additive model. Among the new findings, we identify three novel low-frequency recessive variants with odds ratios larger than 4, which need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases. ; This work has been sponsored by the grant SEV-2011-00067 and SEV2015-0493 of Severo Ochoa Program, awarded by the Spanish Government, by the grant TIN2015-65316-P, awarded by the Spanish Ministry of Science and Innovation, and by the Generalitat de Catalunya (contract 2014-SGR-1051). This work was supported by an EFSD/Lilly research fellowship. Josep M. Mercader was supported by a Sara Borrell Fellowship from the Instituto Carlos III, Beatriu de Pinós fellowship from the Agency for Management of University and Research Grants (AGAUR) and by the American Diabetes Association Innovative and Clinical Translational Award 1-19-ICTS-068. Sílvia Bonàs was supported by FI-DGR Fellowship from FIDGR 2013 from Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya), and a 'Juan de la Cierva' postdoctoral fellowship (MINECO;FJCI-2017-32090). 
Cecilia Salvoro received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433. Cristian Ramon-Cortes's pre-doctoral contract is financed by the Spanish Ministry of Science, Innovation, and Universities under contract BES-2016-076791. Elizabeth G. Atkinson was supported by the National Institute of Mental Health (grants K01MH121659 and T32MH017119).
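The difference between the additive and non-additive models mentioned in the abstract above can be illustrated with a minimal sketch of genotype recoding. This is not the GUIDANCE implementation; the function name `encode_genotypes` and the 0/1/2 alternate-allele-count input are assumptions made for illustration only.

```python
from typing import List


def encode_genotypes(alt_allele_counts: List[int], model: str) -> List[int]:
    """Recode alternate-allele counts (0, 1 or 2) under a genetic model.

    Hypothetical helper for illustration; not taken from GUIDANCE.
    """
    if model == "additive":
        # Each copy of the alternate allele contributes equally.
        return list(alt_allele_counts)
    if model == "dominant":
        # One copy is enough to carry the effect.
        return [1 if g >= 1 else 0 for g in alt_allele_counts]
    if model == "recessive":
        # Only homozygous carriers of the alternate allele carry the effect.
        return [1 if g == 2 else 0 for g in alt_allele_counts]
    raise ValueError(f"unknown model: {model}")
```

Under the recessive coding, heterozygous carriers are grouped with non-carriers, so an effect confined to homozygotes is not diluted across the three genotype classes. This is one intuition for why the recessive variants described above would need a larger sample to be detected with an additive test.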
[Background] The physical organization and chromosomal localization of genes within genomes is known to play an important role in their function. Most genes arise by duplication and move along the genome by random shuffling of DNA segments. Higher order structuring of the genome occurs in eukaryotes, where groups of physically linked genes are co-expressed. However, the contribution of gene duplication to gene order has not been analyzed in detail, as it is believed that co-expression due to recent duplicates would obscure other domains of co-expression. ; [Results] We have catalogued ordered duplicated genes in Drosophila melanogaster, and found that one in five of all genes is organized as tandem arrays. Furthermore, among arrays that have been spatially conserved over longer periods than would be expected on the basis of random shuffling, a disproportionate number contain genes encoding developmental regulators. Using in situ gene expression data for more than half of the Drosophila genome, we find that genes in these conserved clusters are co-expressed to a much higher extent than other duplicated genes. ; [Conclusions] These results reveal the existence of functional constraints in insects that retain copies of genes encoding developmental and regulatory proteins as neighbors, allowing their co-expression. This co-expression may be the result of shared cis-regulatory elements or a shared need for a specific chromatin structure. Our results highlight the association between genome architecture and the gene regulatory networks involved in the construction of the body plan. ; This work was supported by a grant from the BBVA Foundation (Spain) to MMan, DT and MMil, from the Spanish Government (grants BFU2005-00025 to MMan and BIO2006-15036 to DT), and from the EMBO Young Investigator Programme (to MMan). The work of MMan at the CNIC is supported by the Spanish Ministry of Science and Innovation and the Pro-CNIC Foundation. ; Peer reviewed
The combined analysis of haplotype panels with phenotype clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we contribute to fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high coverage (30x) whole-genomes from the Iberian GCAT Cohort, containing a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel is able to impute up to 14 360 728 SNVs/indels and 23 179 SVs, showing a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SVs analysis is shown through an imputed rare Alu element located in a new locus associated with Mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include the SVs into genome-wide genetic studies. ; GCAT|Genomes for Life, a cohort study of the Genomes of Catalonia, Fundació Institut Germans Trias i Pujol (IGTP); IGTP is part of the CERCA Program/Generalitat de Catalunya; GCAT is supported by Acción de Dinamización del ISCIII-MINECO; Ministry of Health of the Generalitat of Catalunya [ADE 10/00026]; Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR) [2017-SGR 529]; B.C. is supported by national grants [PI18/01512]; X.F. 
is supported by VEIS project [001-P-001647] (co-funded by European Regional Development Fund (ERDF), 'A way to build Europe'); a full list of the investigators who contributed to the generation of the GCAT data is available from www.genomesforlife.com/; Severo Ochoa Program, awarded by the Spanish Government [SEV-2011-00067 and SEV2015-0493]; Spanish Ministry of Science and Innovation [TIN2015-65316-P]; Generalitat de Catalunya [2014-SGR-1051 to D.T.]; Agencia Estatal de Investigación (AEI, Spain) [BFU2016-77244-R and PID2019-107836RB-I00]; European Regional Development Fund (FEDER, EU) (to M.C.); Spanish Ministry of Science and Innovation [FPI BES-2016-0077344 to J.V.M.]; C.S. received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement [H2020-MSCA-COFUND-2016-754433]; this study made use of data generated by the UK10K Consortium from UK10K COHORT IMPUTATION [EGAS00001000713]; formal agreement with the Barcelona Supercomputing Center (BSC); this study made use of data generated by the Genome of the Netherlands project, which is funded by the Netherlands Organization for Scientific Research [184021007], allowing us to use the GoNL reference panel containing SVs, upon request (GoNL Data Access request 2019203); this study also used data generated by the Haplotype Reference Consortium (HRC) accessed through the European Genome-phenome Archive with the accession number EGAD00001002729; formal agreement of the Barcelona Supercomputing Center (BSC) with WTSI; this study made use of data generated by the 1000 Genomes (1000G), accessed through the FTP portal (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/); this study used the GeneHancer-for-AnnotSV dump for GeneCards Suite Version 4.14, through a formal agreement between the BSC and the Weizmann Institute of Science. 
; Peer Reviewed ; "Article signat per 21 autors/es: Jordi Valls-Margarit, Iván Galván-Femenía, Daniel Matías-Sánchez, Natalia Blay, Montserrat Puiggròs, Anna Carreras, Cecilia Salvoro, Beatriz Cortés, Ramon Amela, Xavier Farre, Jon Lerga-Jaso, Marta Puig, Jose Francisco Sánchez-Herrero, Victor Moreno, Manuel Perucho, Lauro Sumoy, Lluís Armengol, Olivier Delaneau, Mario Cáceres, Rafael de Cid, David Torrents" ; Postprint (published version)
Genome-wide association studies (GWASs) identified hundreds of signals associated with type 2 diabetes (T2D). To gain insight into their underlying molecular mechanisms, we have created the translational human pancreatic islet genotype tissue-expression resource (TIGER), aggregating >500 human islet genomic datasets from five cohorts in the Horizon 2020 consortium T2DSystems. We impute genotypes using four reference panels and meta-analyze cohorts to improve the coverage of expression quantitative trait loci (eQTL) and develop a method to combine allele-specific expression across samples (cASE). We identify >1 million islet eQTLs, 53 of which colocalize with T2D signals. Among them, a low-frequency allele that reduces T2D risk by half increases CCND2 expression. We identify eight cASE colocalizations, among which we found a T2D-associated SLC30A8 variant. We make all data available through the TIGER portal (http://tiger.bsc.es), which represents a comprehensive human islet genomic data resource to elucidate how genetic variation affects islet function and translates into therapeutic insight and precision medicine for T2D. ; This work has been supported by the European Union's Horizon 2020 research and innovation program T2Dsystems under grant agreement no. 667191. L.A. was supported by grant BES-2017-081635 of the Severo Ochoa Program, awarded by the Spanish government. I.M. was supported by the FJCI-2017-31878 Juan de la Cierva grant, awarded by the Spanish government. Work in the Cnop and Eizirik labs was further supported by the Fonds National de la Recherche Scientifique (FNRS), the Brussels Region Innoviris project DiaType, and the Walloon Region SPW-EER Win2Wal project BetaSource, Belgium. D.L.E. is supported by a grant from the Welbio–FNRS, Belgium. P.M., L.G., D.L.E., and M.C. are supported by the Innovative Medicines Initiative 2 Joint Undertaking Rhapsody, under grant agreement no. 
115881, which is supported by the European Union's Horizon 2020 research and innovation programme, EFPIA and the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number 16.0097. J.M.M. is supported by American Diabetes Association Innovative and Clinical Translational Award 1-19-ICTS-068. J.C. is supported by an Expanding Excellence in England Award from Research England. H.M., J.L.S.E., and L.E. are supported by the Swedish Strategic Research Foundation (IRC15-0067). A.L.G. is a Wellcome Trust Senior Fellow in Basic Biomedical Science. This work was funded in Oxford and Stanford by the Wellcome Trust (095101, 200837, 106130, and 203141 [all to A.L.G.]) and the NIH (U01-DK105535 and U01-DK085545 [A.L.G.]). The research was funded by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC) (A.L.G.). I.M.-E. was supported by the EFSD/Novo Nordisk Rising Star Programme. Work in the Ferrer lab was supported by the Imperial College London Research Computing Service, the NIHR Imperial BRC, and the Centre for Genomic Regulation (CRG) genomics facility, and grants from Ministerio de Ciencia e Innovación (BFU2014-54284-R and RTI2018-095666-B-I00), the Medical Research Council (MR/L02036X/1), the Wellcome Trust Senior Investigator Award (WT101033), and the European Research Council Advanced Grant (789055). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. The technical support group from the Barcelona Supercomputing Center is gratefully acknowledged. Finally, we thank the entire Computational Genomics group at the BSC for their helpful discussions and valuable comments on the manuscript. 
We also acknowledge Cristian Opi and Laia Codó from the Barcelona Supercomputing Center for excellent website design and allocation of technical support and Isabelle Millard and Anyishaï Musuaya from the ULB Center for Diabetes Research for excellent technical and experimental support. ; Peer Reviewed ; "Article signat per 30 autors/es: Lorena Alonso, Anthony Piron, Ignasi Morán, Marta Guindo-Martínez, Sílvia Bonàs-Guarch, Goutham Atla, Irene Miguel-Escalada, Romina Royo, Montserrat Puiggròs, Xavier Garcia-Hurtado, Mara Suleiman, Lorella Marselli, Jonathan L.S. Esguerra, Jean-Valéry Turatsinze, Jason M. Torres, Vibe Nylander, Ji Chen, Lena Eliasson, Matthieu Defrance, Ramon Amela, MAGIC24, Hindrik Mulder, Anna L. Gloyn, Leif Groop, Piero Marchetti, Decio L. Eizirik, Jorge Ferrer, Josep M. Mercader, Miriam Cnop, David Torrents" ; Postprint (published version)
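As a rough illustration of how per-cohort eQTL effect estimates can be combined in a meta-analysis, the sketch below uses fixed-effect inverse-variance weighting. The abstract does not state which meta-analysis scheme TIGER uses, so the weighting scheme, the function name `inverse_variance_meta`, and its inputs are all assumptions for illustration.

```python
import math
from typing import List, Tuple


def inverse_variance_meta(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Combine per-cohort effect sizes with fixed-effect inverse-variance weights.

    Hypothetical helper: `betas` are per-cohort effect estimates for one
    variant-gene pair and `ses` are their standard errors.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    # More precise cohorts (smaller standard error) receive larger weights.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se
```

Because the pooled standard error shrinks as cohorts are added, a variant-gene pair that is not significant in any single cohort can still reach significance in the combined analysis, which is consistent with the stated goal of improving eQTL coverage.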
Lambda interferons (IFNLs) have immunomodulatory functions at epithelial barrier surfaces. IFN-λ4, a recent member of this family, is expressed only in a subset of the population due to a frameshift-causing DNA polymorphism, rs368234815. We examined the association of this polymorphism with atopy (aeroallergen sensitization) and asthma in a Polish hospital-based case-control cohort comprising well-characterized adult asthmatics (n = 326) and healthy controls (n = 111). In the combined cohort, we saw no association of the polymorphism with asthma and/or atopy. However, the IFN-λ4-generating ΔG allele protected older asthmatic women (>50 yr of age) from atopic sensitization. Further, the ΔG allele was significantly associated with features of less-severe asthma, including bronchodilator response and corticosteroid usage, in older women in this Polish cohort. We tested the association of related IFNL locus polymorphisms (rs12979860 and rs8099917) with atopy, allergic rhinitis and presence/absence of asthma in three population-based cohorts from Europe, but saw no significant association of the polymorphisms with any of the phenotypes in older women. The polymorphisms were marginally associated with lower occurrence of asthma in men/older men after meta-analysis of data from all cohorts. Functional and well-designed replication studies may reveal the true positive nature of these results. ; SC was supported as a visiting scientist by Healthy Ageing Research Centre, Medical University of Lodz. This study was supported by Polish National Science Centre grant no. 2013/09/B/NZ6/00746. The authors (SC, AW, MP, JM and MLK) have been partially supported by The Healthy Ageing Research Centre Project (REGPOT-2012-2013-1, 7FP). Inter99 and Health2006 study: TS was supported by a grant from the Lundbeck Foundation (Grant number R165-2013-15410), the Harboe Foundation (Grant number 16152), the A.P. 
Møller Foundation for the Advancement of Medical Science (Grant number 15-363), Aase and Einar Danielsen's Foundation (Grant number 10-001490), and the Weimann's grant. The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent Research Center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (www.metabol.ku.dk). GERA study: This work has been sponsored by the grant SEV-2011-00067 of Severo Ochoa Program, awarded by the Spanish Government. This work was supported by an EFSD/Lilly research fellowship. Josep M. Mercader was supported by a Sara Borrell Fellowship from the Instituto Carlos III. Sílvia Bonàs was supported by an FI-DGR Fellowship from FI-DGR 2013 from Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya). COPSAC2000 study: We greatly acknowledge the private and public research funding allocated to COPSAC and listed on www.copsac.com, with special thanks to The Lundbeck Foundation (Grant nr. R16-A1694); Ministry of Health (Grant nr. 903516); Danish Council for Strategic Research (Grant nr.: 0603-00280B); The Danish Council for Independent Research and The Capital Region Research Foundation as core supporters.
Background: One of the hallmarks of cancer is the disruption of gene expression patterns. Many molecular lesions contribute to this phenotype, and the importance of aberrant DNA methylation profiles is increasingly recognized. Much of the research effort in this area has examined proximal promoter regions, and epigenetic alterations at other loci are not well characterized. Results: Using whole genome bisulfite sequencing to examine uncharted regions of the epigenome, we identify a type of far-reaching DNA methylation alteration in cancer cells of the distal regulatory sequences described as super-enhancers. Human tumors undergo a shift in super-enhancer DNA methylation profiles that is associated with the transcriptional silencing or the overactivation of the corresponding target genes. Intriguingly, we observe locally active fractions of super-enhancers detectable through hypomethylated regions that suggest spatial variability within the large enhancer clusters. Functionally, the DNA methylomes obtained suggest that transcription factors contribute to this local activity of super-enhancers and that trans-acting factors modulate DNA methylation profiles with impact on transforming processes during carcinogenesis. Conclusions: We develop an extensive catalogue of human DNA methylomes at base resolution to better understand the regulatory functions of DNA methylation beyond those of proximal promoter gene regions. CpG methylation status in normal cells points to locally active regulatory sites at super-enhancers, which are targeted by specific aberrant DNA methylation events in cancer, with putative effects on the expression of downstream genes. 
; The research leading to these results received funding from: the European Research Council (ERC), grant EPINORC, under agreement number 268626; MICINN Projects–SAF2011-22803 and BFU2011-28549; Ministerio de Economía y Competitividad (MINECO), co-financed by the European Development Regional Fund, 'A way to achieve Europe' ERDF, under grant number SAF2014-55000-R; the Cellex Foundation; AGAUR Catalan Government Project #2009SGR1315; the Institute of Health Carlos III (ISCIII), under the Spanish Cancer Research Network (RTICC) number RD12/0036/0039, the Integrated Project of Excellence number PIE13/00022 (ONCOPROFILE) and the research grant PI11/00321; the Sandra Ibarra Foundation, under IV ghd Grants for breast cancer research; the Olga Torres Foundation; the European Community's Seventh Framework Programme (FP7/2007-2013), grant HEALTH-F5-2011-282510 – BLUEPRINT, and the Health and Science Departments of the Generalitat de Catalunya. H.H. is a Miguel Servet (CP14/00229) researcher funded by the Spanish Institute of Health Carlos III (ISCIII). D.T. and M.E. are ICREA Research Professors. ; Peer Reviewed ; Postprint (author's final draft)
The reanalysis of existing GWAS data represents a powerful and cost-effective opportunity to gain insights into the genetics of complex diseases. By reanalyzing publicly available type 2 diabetes (T2D) genome-wide association studies (GWAS) data for 70,127 subjects, we identify seven novel associated regions, five driven by common variants (LYPLAL1, NEUROG3, CAMKK2, ABO, and GIP genes), one by a low-frequency variant (EHMT2), and one driven by a rare variant in chromosome Xq23, rs146662075, associated with a twofold increased risk for T2D in males. rs146662075 is located within an active enhancer associated with the expression of Angiotensin II Receptor type 2 gene (AGTR2), a modulator of insulin sensitivity, and exhibits allele-specific activity in muscle cells. Beyond providing insights into the genetics and pathophysiology of T2D, these results also underscore the value of reanalyzing publicly available data using novel genetic resources and analytical approaches. ; This work has been sponsored by the grant SEV-2011-00067 of Severo Ochoa Program, awarded by the Spanish Government. This work was supported by an EFSD/Lilly research fellowship. Josep M. Mercader was supported by a Sara Borrell Fellowship from the Instituto Carlos III and a Beatriu de Pinós fellowship from the Agency for Management of University and Research Grants (AGAUR). Sílvia Bonàs was supported by an FI-DGR Fellowship from FI-DGR 2013 from Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya). This study makes use of data generated by the WTCCC. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113. This study also makes use of data generated by the UK10K Consortium, derived from samples from UK10K COHORT IMPUTATION (EGAS00001000713). A full list of the investigators who contributed to the generation of the data is available from www.UK10K.org. 
Funding for UK10K was provided by the Wellcome Trust under award WT091310. We acknowledge PRACE for awarding us access to the MareNostrum supercomputer, based in Barcelona, Spain. The technical support group, particularly Pablo Ródenas and Jorge Rodríguez, from the Barcelona Supercomputing Center is gratefully acknowledged. This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 667191. Mercè Planas-Fèlix is funded by the Obra Social Fundación la Caixa fellowship under the Severo Ochoa 2013 program. Work from Irene Miguel-Escalada, Ignasi Moran, Goutham Atla, and Jorge Ferrer was supported by the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre, the Wellcome Trust (WT101033), Ministerio de Economía y Competitividad (BFU2014-54284-R) and Horizon 2020 (667191). Irene Miguel-Escalada has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska–Curie grant agreement No 658145. We acknowledge Prof. Giulio Cossu (Institute of Inflammation and Repair, University of Manchester) for providing the muscle myoblast cell line. We also acknowledge the InterAct and SIGMA Type 2 Diabetes Consortia for access to the data to replicate the rs146662075 variant. A full list of the investigators of the SIGMA Type 2 Diabetes and the InterAct consortia is provided in Supplementary Notes 3 and 4. The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent research center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (www.metabol.ku.dk). This research has been conducted using the UK Biobank Resource (application number 16803). We also acknowledge Bianca C. Porneala, MS for his technical assistance in the collection and curation of the genotype and phenotype data from Partners Biobank. 
We also thank Marcin von Grotthuss for support in uploading the summary statistics data to the Type 2 Diabetes Genetic Portal (AMP-T2D portal). Finally, we thank the entire Computational Genomics group at the BSC for their helpful discussions and valuable comments on the manuscript. ; Peer Reviewed ; Postprint (published version)
The reanalysis of existing GWAS data represents a powerful and cost-effective opportunity to gain insights into the genetics of complex diseases. By reanalyzing publicly available type 2 diabetes (T2D) genome-wide association studies (GWAS) data for 70,127 subjects, we identify seven novel associated regions, five driven by common variants (LYPLAL1, NEUROG3, CAMKK2, ABO, and GIP genes), one by a low-frequency (EHMT2), and one driven by a rare variant in chromosome Xq23, rs146662057, associated with a twofold increased risk for T2D in males. rs146662057 is located within an active enhancer associated with the expression of Angiotensin II Receptor type 2 gene (AGTR2), a modulator of insulin sensitivity, and exhibits allelic specific activity in muscle cells. Beyond providing insights into the genetics and pathophysiology of T2D, these results also underscore the value of reanalyzing publicly available data using novel genetic resources and analytical approaches. ; This work has been sponsored by the grant SEV-2011-00067 of Severo Ochoa Program, awarded by the Spanish Government. This work was supported by an EFSD/Lilly research fellowship. Josep M. Mercader was supported by Sara Borrell Fellowship from the Instituto Carlos III and Beatriu de Pinós fellowship from the Agency for Management of University and Research Grants (AGAUR). Sílvia Bonàs was FI-DGR Fellowship from FI-DGR 2013 from Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR, Generalitat de Catalunya). This study makes use of data generated by the WTCCC. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113. This study also makes use of data generated by the UK10K Consortium, derived from samples from UK10K COHORT IMPUTATION (EGAS00001000713). A full list of the investigators who contributed to the generation of the data is available in www.UK10K.org. 
Funding for UK10K was provided by the Wellcome Trust under award WT091310. We acknowledge PRACE for awarding us access to the MareNostrum supercomputer, based in Barcelona, Spain. The technical support group from the Barcelona Supercomputing Center, particularly Pablo Ródenas and Jorge Rodríguez, is gratefully acknowledged. This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 667191. Mercè Planas-Fèlix is funded by the Obra Social Fundación la Caixa fellowship under the Severo Ochoa 2013 program. Work from Irene Miguel-Escalada, Ignasi Moran, Goutham Atla, and Jorge Ferrer was supported by the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre, the Wellcome Trust (WT101033), Ministerio de Economía y Competitividad (BFU2014-54284-R) and Horizon 2020 (667191). Irene Miguel-Escalada has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska–Curie grant agreement No 658145. We acknowledge Prof. Giulio Cossu (Institute of Inflammation and Repair, University of Manchester) for providing the muscle myoblast cell line. We also acknowledge the InterAct and SIGMA Type 2 Diabetes Consortia for access to the data to replicate the rs146662075 variant. A full list of the investigators of the SIGMA Type 2 Diabetes and the InterAct consortia is provided in Supplementary Notes 3 and 4. The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent research center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (www.metabol.ku.dk). This research has been conducted using the UK Biobank Resource (application number 16803). We also acknowledge Bianca C. Porneala, MS, for technical assistance in the collection and curation of the genotype and phenotype data from Partners Biobank. 
We also thank Marcin von Grotthuss for support in uploading the summary statistics data to the Type 2 Diabetes Genetic Portal (AMP-T2D portal). Finally, we thank the entire Computational Genomics group at the BSC for their helpful discussions and valuable comments on the manuscript. ; Peer Reviewed ; Postprint (published version)
As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here, using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ~100× is beneficial, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and the lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. ; We thank the DKFZ Genomics and Proteomics Core Facility and the OICR Genome Technologies Platform for provision of sequencing services. Financial support was provided by the consortium projects READNA under grant agreement FP7 Health-F4-2008-201418, ESGI under grant agreement 262055, GEUVADIS under grant agreement 261123 of the European Commission Framework Programme 7, ICGC-CLL through the Spanish Ministry of Science and Innovation (MICINN), the Instituto de Salud Carlos III (ISCIII) and the Generalitat de Catalunya. 
Additional financial support was provided by the PedBrain Tumor Project contributing to the International Cancer Genome Consortium, funded by German Cancer Aid (109252) and by the German Federal Ministry of Education and Research (BMBF, grants #01KU1201A, MedSys #0315416C and NGFNplus #01GS0883); the Ontario Institute for Cancer Research to PCB and JDM through funding provided by the Government of Ontario, Ministry of Research and Innovation; Genome Canada; the Canada Foundation for Innovation and Prostate Cancer Canada with funding from the Movember Foundation (PCB). PCB was also supported by a Terry Fox Research Institute New Investigator Award, a CIHR New Investigator Award and a Genome Canada Large-Scale Applied Project Contract. The Synergie Lyon Cancer platform has received support from the French National Institute of Cancer (INCa) and from the ABS4NGS ANR project (ANR-11-BINF-0001-06). The ICGC RIKEN study was supported partially by RIKEN President's Fund 2011, and the supercomputing resource for the RIKEN study was provided by the Human Genome Center, University of Tokyo. MDE, LB, AGL and CLA were supported by Cancer Research UK, the University of Cambridge and Hutchison-Whampoa Limited. SD is supported by the Torres Quevedo subprogram (MICINN) under grant agreement PTQ-12-05391. EH is supported by the Research Council of Norway under grant agreements 221580 and 218241 and by the Norwegian Cancer Society under grant agreement 71220-PR-2006-0433. Very special thanks go to Jennifer Jennings for administering the activity of the ICGC Verification Working Group and Anna Borrell for administrative support. ; This is the final version of the article. It first appeared from Nature Publishing Group via http://dx.doi.org/10.1038/ncomms10001
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits. ; B.P.C. acknowledges funding from Abigail Wexner Research Institute at Nationwide Children's Hospital; T.H. Nyrönen acknowledges funding from Academy of Finland grant #31996; A.M.-J., K.N., T.F.B., O.M.H., and Z.S. acknowledge funding from Australian Medical Research Future Fund; M.S. acknowledges funding from Biobank Japan; D. Bujold and S.J.M.J. acknowledge funding from Canada Foundation for Innovation; L.J.D. acknowledges funding from Canada Foundation for Innovation Cyber Infrastructure grant #34860; D. Bujold and G.B. acknowledge funding from CANARIE; L.J.D. acknowledges funding from CANARIE Research Data Management contract #RDM-090 (CHORD) and #RDM2-053 (ClinDIG); K.K.-L. acknowledges funding from CanSHARE; T.L.T. 
acknowledges funding from Chan Zuckerberg Initiative; T. Burdett acknowledges funding from Chan Zuckerberg Initiative grant #2017-171671; D. Bujold, G.B., and L.D.S. acknowledge funding from CIHR; L.J.D. acknowledges funding from CIHR grant #404896; M.J.S.B. acknowledges funding from CIHR grant #SBD-163124; M. Courtot and M. Linden acknowledge funding from CINECA project EU Horizon 2020 grant #825775; D. Bujold and G.B. acknowledge funding from Compute Canada; F.M.-G. acknowledges funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – NFDI 1/1 "GHGA – German Human Genome-Phenome Archive"; R.M.H.-S. acknowledges funding from Duke-Margolis Center for Health Policy; S.B. and A.J.B. acknowledge funding from EJP-RD EU Horizon 2020 grant #825575; A. Niewielska, A.K., D.S., G.I.S., J.A.T., J.R., M.A.K., M. Baudis, M. Linden, S.B., S.S., T.H. Nyrönen, and T.M.K. acknowledge funding from ELIXIR; A. Niewielska acknowledges funding from EOSC-Life EU Horizon 2020 grant #824087; J.-P.H. acknowledges funding from ETH Domain Strategic Focal Area "Personalized Health and Related Technologies (PHRT)" grant #2017-201; F.M.-G. acknowledges funding from EUCANCan EU Horizon 2020 grant #825835; B.M.K., D. Bujold, G.B., L.D.S., M.J.S.B., N.S., S.E.W., and Y.J. acknowledge funding from Genome Canada; B.M.K., M.J.S.B., S.E.W., and Y.J. acknowledge funding from Genome Quebec; F.M.-G. acknowledges funding from German Human Genome-Phenome Archive; C. Voisin acknowledges funding from Google; A.J.B. acknowledges funding from Health Data Research UK Substantive Site Award; D.H. acknowledges funding from Howard Hughes Medical Institute; S.B. acknowledges funding from Instituto de Salud Carlos III; S.-S.K. and K.T. acknowledge funding from Japan Agency for Medical Research and Development (AMED); S. Ogishima acknowledges funding from Japan Agency for Medical Research and Development (AMED) grant #20kk0205014h0005; C.Y. and K. 
Kosaki acknowledge funding from Japan Agency for Medical Research and Development (AMED) grant #JP18kk0205012; GEM Japan acknowledges funding from Japan Agency for Medical Research and Development (AMED) grants #19kk0205014h0004, #20kk0205014h0005, #20kk0205013h0005, #20kk0205012h0005, #20km0405401h0003, and #19km0405001h0104; J.R. acknowledges funding from La Caixa Foundation under project #LCF/PR/GN13/50260009; R.R.F. acknowledges funding from Mayo Clinic Center for Individualized Medicine; Y.J. and S.E.W. acknowledge funding from Ministère de l'Économie et de l'Innovation du Québec for the Can-SHARE Connect Project; S.E.W. and S.O.M.D. acknowledge funding from Ministère de l'Économie et de l'Innovation du Québec for the Can-SHARE grant #141210; M.A.H., M.C.M.-T., J.O.J., H.E.P., and P.N.R. acknowledge funding from Monarch Initiative grant #R24OD011883 and Phenomics First NHGRI grant #1RM1HG010860; A.L.M. and E.B. acknowledge funding from MRC grant #MC_PC_19024; P.T. acknowledges funding from National University of Singapore and Agency for Science, Technology and Research; J.M.C. acknowledges funding from NHGRI; A.H.W. acknowledges funding from NHGRI awards K99HG010157, R00HG010157, and R35HG011949; A.M.-J., K.N., D.P.H., O.M.H., T.F.B., and Z.S. acknowledge funding from NHMRC grants #GNT1113531 and #GNT2000001; D.L.C. acknowledges funding from NHMRC Ideas grant #1188098; A.B.S. acknowledges funding from NHMRC Investigator Fellowship grant #APP177524; J.M.C. and L.D.S. acknowledge funding from NIH; A.A.P. acknowledges funding from NIH Anvil; A.V.S. acknowledges funding from NIH contract #HHSN268201800002I (TOPMed Informatics Research Center); S.U. acknowledges funding from NIH ENCODE grant #UM1HG009443; M.C.M.-T. and M.A.H. acknowledge funding from NIH grant #1U13CA221044; R.J.C. acknowledges funding from NIH grants #1U24HG010262 and #1U2COD023196; M.G. acknowledges funding from NIH grant #R00HG007940; J.B.A., S.L., P.G., E.B., H.L.R., and L.S. 
acknowledge funding from NIH grant #U24HG011025; K.P.E. acknowledges funding from NIH grant #U2C-RM-160010; J.A.E. acknowledges funding from NIH NCATS grant #U24TR002306; M.M. acknowledges funding from NIH NCI contract #HHSN261201400008c and ID/IQ Agreement #17X146 under contract #HHSN2612015000031 and #75N91019D00024; R.M.C.-D. acknowledges funding from NIH NCI grant #R01CA237118; M. Cline acknowledges funding from NIH NCI grant #U01CA242954; K.P.E. acknowledges funding from NIH NCI ITCR grant #1U24CA231877-01; O.L.G. acknowledges funding from NIH NCI ITCR grant #U24CA237719; R.L.G. acknowledges funding from NIH NCI task order #17X147F10 under contract #HHSN261200800001E; A.F.R. acknowledges funding from NIH NHGRI grant #RM1HG010461; N.M. and L.J.Z. acknowledge funding from NIH NHGRI grant #U24HG006941; R.R.F., T.H. Nelson, L.J.B., and H.L.R. acknowledge funding from NIH NHGRI grant #U41HG006834; B.J.W. acknowledges funding from NIH NHGRI grant #UM1HG009443A; M. Cline acknowledges funding from NIH NHLBI BioData Catalyst Fellowship grant #5118777; M.M. acknowledges funding from NIH NHLBI BioData Catalyst Program grant #1OT3HL142478-01; N.C.S. acknowledges funding from NIH NIGMS grant #R35-GM128636; M.C.M.-T., M.A.H., P.N.R., and R.R.F. acknowledge funding from NIH NLM contract #75N97019P00280; E.B. and A.L.M. acknowledge funding from NIHR; R.G. acknowledges funding from Project Ris3CAT VEIS; S.B. acknowledges funding from RD-Connect, Seventh Framework Program grant #305444; J.K. acknowledges funding from Robertson Foundation; S.B. and A.J.B. acknowledge funding from Solve-RD, EU Horizon 2020 grant #779257; T.S. and S. Oesterle acknowledge funding from Swiss Institute of Bioinformatics (SIB) and Swiss Personalized Health Network (SPHN), supported by the Swiss State Secretariat for Education, Research and Innovation SERI; S.J.M.J. acknowledges funding from Terry Fox Research Institute; A.E.H., M.P.B., M. Cupak, M.F., and J.F. 
acknowledge funding from the Digital Technology Supercluster; D.F.V. acknowledges funding from the Australian Medical Research Future Fund, as part of the Genomics Health Futures Mission grant #76749; M. Baudis acknowledges funding from the BioMedIT Network project of Swiss Institute of Bioinformatics (SIB) and Swiss Personalized Health Network (SPHN); B.M.K. acknowledges funding from the Canada Research Chair in Law and Medicine and CIHR grant #SBD-163124; D.S., G.I.S., M.A.K., S.B., S.S., and T.H. Nyrönen acknowledge funding from the EU Horizon 2020 Beyond 1 Million Genomes (B1MG) Project grant #951724; P.F., A.D.Y., F.C., H.S., I.U.L., D. Gupta, M. Courtot, S.E.H., T. Burdett, T.M.K., and S.F. acknowledge funding from the European Molecular Biology Laboratory; Y.J. and S.E.W. acknowledge funding from the Government of Canada; P.G. acknowledges funding from the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-206); J.Z. acknowledges funding from the Government of Ontario; C.K.Y. acknowledges funding from the Government of Ontario, Canada Foundation for Innovation; C. Viner and M.M.H. acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (grant #RGPIN-2015-03948 to M.M.H. and Alexander Graham Bell Canada Graduate Scholarship to C.V.); K.K.-L. acknowledges funding from the Program for Integrated Database of Clinical and Genomic Information; J.K. acknowledges funding from the Robertson Foundation; D.F.V. acknowledges funding from the Victorian State Government through the Operational Infrastructure Support (OIS) Program; A.M.L., R.N., and H.V.F. acknowledge funding from Wellcome (collaborative award); F.C., H.S., P.F., and S.E.H. acknowledge funding from Wellcome Trust grant #108749/Z/15/Z; A.D.Y., H.S., I.U.L., M. Courtot, H.E.P., P.F., and T.M.K. acknowledge funding from Wellcome Trust grant #201535/Z/16/Z; A.M., J.K.B., R.J.M., R.M.D., and T.M.K. 
acknowledge funding from Wellcome Trust grant #206194; E.B., P.F., P.G., and S.F. acknowledge funding from Wellcome Trust grant #220544/Z/20/Z; A. Hamosh acknowledges funding from NIH NHGRI grants U41HG006627 and U54HG006542; J.S.H. acknowledges funding from National Taiwan University #91F701-45C and #109T098-02; the work of K.W.R. was supported by the Intramural Research Program of the National Library of Medicine, NIH. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. H.V.F. acknowledges funding from Wellcome Grant 200990/A/16/Z 'Designing, developing and delivering integrated foundations for genomic medicine'. ; Peer reviewed