Capturing variation impact on molecular interactions in the IMEx Consortium mutations data set
The current wealth of genomic variation data identified at the nucleotide level presents the challenge of understanding the mechanisms by which amino acid variation affects cellular processes. These effects may manifest as distinct phenotypic differences between individuals or result in the development of disease. Physical interactions between molecules are the linking steps underlying most, if not all, cellular processes. Understanding the effects that sequence variation has on a molecule's interactions is a key step towards connecting mechanistic characterization of nonsynonymous variation to phenotype. We present an open access resource created over 14 years by IMEx database curators, featuring 28,000 annotations describing the effect of small sequence changes on physical protein interactions. We describe how this resource was built and the formats in which the data is provided, and offer a descriptive analysis of the data set. The data set is publicly available through the IntAct website and is enhanced with every monthly release.
The IntAct database and EMBL-EBI-based authors received funding from EMBL core funding and Open Targets (grant agreement OTAR-044). The DIP database is funded by NIH grant R01GM123126. MINT received support from the DEPTH project of the European Research Council (grant agreement 322749). The British Heart Foundation-University College London (BHF-UCL) curation team is funded by British Heart Foundation grant RG/13/5/30112.
UniProt curation activities at EMBL-EBI and the Swiss Institute of Bioinformatics are funded by the National Eye Institute, National Human Genome Research Institute, National Heart, Lung, and Blood Institute, National Institute of Allergy and Infectious Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of General Medical Sciences, and National Institute of Mental Health of the National Institutes of Health under Award Number [U24HG007822], National Human Genome Research Institute under Award Numbers [U41HG007822 and U41HG002273], and the National Institute of General Medical Sciences under Award Numbers [R01GM080646, P20GM103446 and U01GM120953] (the content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health); the Swiss Federal Government through the State Secretariat for Education, Research and Innovation; and the aforementioned British Heart Foundation grants and EMBL core funding. DisGeNET is supported by EU-FP7 funds from ISCIII-FEDER (CP10/00524, CPII16/00026), IMI-JU (grant agreement no. 116030, TransQST) and in-kind contributions from EFPIA companies, and the EU H2020 Programme 2014-2020 (grant agreements no. 634143, MedBioinformatics and no. 676559, Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII and is supported by grant PT13/0001/0023, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER. The DCEXS is a 'Unidad de Excelencia María de Maeztu', funded by the MINECO (ref: MDM-2014-0370). I.J. and group are supported in part by the Krembil Foundation, the Ontario Research Fund (#34876), and the Canada Foundation for Innovation (CFI #225404, #30865, #33536).
The authors would like to thank Marco Galardini, Luz García-Alonso, Denes Turei and Martin Krallinger for valuable discussions when designing the data set output format; Iain Moal for providing key information about SKEMPI 2.0; Danish Memon for his help pre-processing cBioPortal data; and Luz García-Alonso as the creator of the Python scripts we used to parse UniProt mutagenesis annotations.