October 14, 2020
Machine learning model helps characterize compounds for drug discovery
WEST LAFAYETTE, Ind. – Tandem mass spectrometry is a powerful analytical tool used to characterize complex mixtures in drug discovery and other fields.
Now, Purdue University innovators have created a new method of applying machine learning concepts to the tandem mass spectrometry process to improve the flow of information in the development of new drugs. Their work is published in Chemical Science.
“Mass spectrometry plays an integral role in drug discovery and development,” said Gaurav Chopra, an assistant professor of analytical and physical chemistry in Purdue’s College of Science. “The specific implementation of bootstrapped machine learning with a small amount of positive and negative training data presented here will pave the way for becoming mainstream in day-to-day activities of automating characterization of compounds by chemists.”
Chopra said machine learning in the chemical sciences faces two major problems. Current methods do not explain, in chemical terms, the decisions the algorithm makes, and new methods are rarely put through blind experimental tests to see whether the proposed models are accurate enough for use in a chemical laboratory.
“We have addressed both of these items for a methodology that is isomer selective and extremely useful in chemical sciences to characterize complex mixtures, identify chemical reactions and drug metabolites, and in fields such as proteomics and metabolomics,” Chopra said.
The Purdue researchers created statistically robust machine learning models to work with less training data – a technique that will be useful for drug discovery. The model looks at a common neutral reagent – called 2-methoxypropene (MOP) – and predicts how compounds will interact with MOP in a tandem mass spectrometer in order to obtain structural information for the compounds.
“This is the first time that machine learning has been coupled with diagnostic gas-phase ion-molecule reactions, and it is a very powerful combination, leading the way to completely automated mass spectrometric identification of organic compounds,” said Hilkka Kenttämaa, the Frank Brown Distinguished Professor of Analytical Chemistry and Organic Chemistry. “We are now introducing many new reagents into this method.”
The Purdue team also introduced chemical reactivity flowcharts, which help chemists interpret the decisions made by the machine learning method and use them to extract structural information from the mass spectra.
This work aligns with other innovations and research from Chopra’s and Kenttämaa’s labs, whose team members work with the Purdue Research Foundation Office of Technology Commercialization to patent numerous technologies. For more information about their patented inventions, contact email@example.com.
About Purdue Research Foundation Office of Technology Commercialization
The Purdue Research Foundation Office of Technology Commercialization operates one of the most comprehensive technology transfer programs among leading research universities in the U.S. Services provided by this office support the economic development initiatives of Purdue University and benefit the university's academic activities through commercializing, licensing and protecting Purdue intellectual property. The office recently moved into the Convergence Center for Innovation and Collaboration in Discovery Park District, adjacent to the Purdue campus. In fiscal year 2020, the office reported 148 deals finalized with 225 technologies signed, 408 disclosures received and 180 issued U.S. patents. The office is managed by the Purdue Research Foundation, which received the 2019 Innovation and Economic Prosperity Universities Award for Place from the Association of Public and Land-grant Universities. In 2020, IPWatchdog Institute ranked Purdue third nationally in startup creation and in the top 20 for patents. The Purdue Research Foundation is a private, nonprofit foundation created to advance the mission of Purdue University. Contact firstname.lastname@example.org for more information.
About Purdue University
Purdue University is a top public research institution developing practical solutions to today’s toughest challenges. Ranked the No. 5 Most Innovative University in the United States by U.S. News & World Report, Purdue delivers world-changing research and out-of-this-world discovery. Committed to hands-on and online, real-world learning, Purdue offers a transformative education to all. Committed to affordability and accessibility, Purdue has frozen tuition and most fees at 2012-13 levels, enabling more students than ever to graduate debt-free. See how Purdue never stops in the persistent pursuit of the next giant leap at purdue.edu.
Hilkka Kenttämaa, email@example.com
Graph-based machine learning interprets and predicts diagnostic isomer-selective ion-molecule reactions in tandem mass spectrometry
Jonathan Fine, Judy Kuan-Yu Liu, Armen Beck, Kawthar Z. Alzarieni, Xin Ma, Victoria M. Boulos, Hilkka I. Kenttämaa and Gaurav Chopra
Diagnostic ion-molecule reactions employed in tandem mass spectrometry experiments can frequently be used to differentiate between isomeric compounds, unlike the popular collision-activated dissociation methodology. Selected neutral reagents, such as 2-methoxypropene (MOP), are introduced into an ion trap mass spectrometer where they react with protonated analytes to yield product ions that are diagnostic for the functional groups present in the analytes. However, the understanding and interpretation of the mass spectra obtained can be challenging and time-consuming. Here, we introduce the first bootstrapped decision tree model trained on 36 known ion-molecule reactions with MOP. It uses the graph-based connectivity of analytes’ functional groups as input to predict whether the protonated analyte will undergo a diagnostic reaction with MOP. A Cohen Kappa statistic of 0.70 was achieved with a blind test set, suggesting substantial inter-model reliability on limited training data. Prospective diagnostic product predictions were experimentally tested for 13 previously unpublished analytes. We introduce chemical reactivity flowcharts to facilitate chemical interpretation of the decisions made by the machine learning method that will be useful to understand and interpret the mass spectra for chemical reactivity.
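For readers unfamiliar with the Cohen Kappa statistic cited in the abstract, the short sketch below shows how such a value is generally computed for a binary "diagnostic reaction / no diagnostic reaction" prediction. It is an illustration only, not the authors' code: the function name cohen_kappa and the ten example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of items on which the two labelings match.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement expected from the marginal label frequencies alone.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_chance = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels (1 = diagnostic reaction observed or predicted, 0 = none).
# These values are invented for illustration and do not come from the paper.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(y_true, y_pred)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy case the two labelings agree on 8 of 10 items while chance alone would give 5 of 10, so kappa is about 0.6. On the widely used Landis and Koch scale, values from 0.61 to 0.80 are described as "substantial" agreement, which is consistent with the abstract's wording for 0.70.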