MIT scientists build a system that can generate AI models for biological research | Massachusetts Institute of Technology News

Is it possible to build a machine learning model without machine learning expertise?

Jim Collins, the Termeer Professor of Medical Engineering and Science in MIT’s Department of Biological Engineering and faculty lead for life sciences at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), decided to tackle this problem along with a number of colleagues when they faced a similar challenge. An open-access paper on their proposed solution, called BioAutoMATED, was published on June 21 in Cell Systems.

Recruiting machine-learning researchers can be a time-consuming and financially costly process for science and engineering labs. Even with a machine-learning expert on hand, selecting the appropriate model, formatting the dataset for that model, and then fine-tuning it takes a lot of work, and each of these choices can dramatically change how the model performs.

“How much time do you typically spend preparing and transforming data in your machine-learning projects?” asks a 2022 Google course on the Foundations of Machine Learning (ML). The two choices offered are “Less than half the project time” or “More than half the project time.” If you guessed the latter, you would be correct: Google states that formatting the data takes over 80 percent of project time, and that doesn’t even account for the time needed to frame the problem in machine-learning terms.

“It would take many weeks to find the right model for a dataset, and this is a really prohibitive step for a lot of people who want to use machine learning in biology,” says Jacqueline Valeri, a fifth-year PhD student in biological engineering in Collins’s lab and first co-author of the paper.

BioAutoMATED is an automated machine-learning system that can select and build an appropriate model for a given dataset and even handle the laborious task of data preprocessing, whittling a months-long process down to just hours. Automated machine-learning (AutoML) systems are still at a relatively early stage of development, and their current use focuses mainly on image and text recognition; they remain largely unused in subfields of biology, points out first co-author and Jameel Clinic postdoc Luis Soenksen PhD ’20.

“The fundamental language of biology is based on sequences,” explains Soenksen, who earned his PhD in mechanical engineering at MIT. “Biological sequences such as DNA, RNA, proteins, and glycans have the amazing informational property of being intrinsically standardized, like an alphabet. Many AutoML tools are developed for text, so it made sense to extend them to [biological] sequences.”
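Because sequences are built from a fixed alphabet, they can be turned into numbers a model can read in a standard way. One common approach is one-hot encoding: each position in the sequence becomes a vector with a single 1 marking which letter appears there. The minimal sketch below assumes plain Python with no dependencies; the function names and alphabets are illustrative, and the article does not say which encoding BioAutoMATED itself uses.

```python
# Illustrative sketch: one-hot encoding of biological sequences.
# The alphabets and function names are assumptions, not BioAutoMATED's API.

DNA_ALPHABET = "ACGT"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence: str, alphabet: str = DNA_ALPHABET) -> list[list[int]]:
    """Encode a sequence as a len(sequence) x len(alphabet) matrix of 0s and 1s."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    matrix = []
    for position, symbol in enumerate(sequence.upper()):
        if symbol not in index:
            # Ambiguous or unexpected letters (e.g. "N" in DNA) are rejected here.
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
        row = [0] * len(alphabet)
        row[index[symbol]] = 1
        matrix.append(row)
    return matrix


def flatten(matrix: list[list[int]]) -> list[int]:
    """Concatenate the rows into a single feature vector for a standard model."""
    return [value for row in matrix for value in row]


if __name__ == "__main__":
    print(one_hot("ACGT"))
    print(len(flatten(one_hot("MKV", PROTEIN_ALPHABET))))
```

The same function handles DNA, RNA (with "ACGU"), or proteins simply by swapping the alphabet, which is the practical upside of the standardization Soenksen describes.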

Moreover, most AutoML tools can only explore and build a limited set of model types. “But you can’t really know from the start of a project which model will be the best fit for your dataset,” he says. “By incorporating multiple tools under one overarching tool, we allow a much larger search space than any individual AutoML tool could achieve on its own.”
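The idea of a larger search space can be illustrated with a toy model search: train several candidate model families on the same data and keep whichever scores best on held-out examples. In this minimal sketch, the three hand-written learners (a majority-class baseline, a nearest-centroid classifier, and a 1-nearest-neighbor classifier) are hypothetical stand-ins for the separate AutoML tools the researchers combine; real AutoML tools also tune hyperparameters and architectures, which is omitted here.

```python
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], int]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_majority(X: list, y: list) -> Model:
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def train_nearest_centroid(X: list, y: list) -> Model:
    """Predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def train_one_nn(X: list, y: list) -> Model:
    """Predict the label of the single closest training example."""
    return lambda x: y[min(range(len(X)), key=lambda i: sq_dist(x, X[i]))]


def accuracy(model: Model, X: list, y: list) -> float:
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def search(candidates: dict, X_train, y_train, X_val, y_val):
    """Train every candidate, score it on validation data, return the best."""
    scores = {
        name: accuracy(train(X_train, y_train), X_val, y_val)
        for name, train in candidates.items()
    }
    best = max(scores, key=scores.get)  # ties go to the earlier candidate
    return best, scores


if __name__ == "__main__":
    # Toy data: two well-separated clusters labeled 0 and 1.
    X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y_train = [0, 0, 0, 1, 1, 1]
    X_val = [[0.5, 0.5], [5.5, 5.5], [1, 1], [6, 6]]
    y_val = [0, 1, 0, 1]

    candidates = {
        "majority": train_majority,
        "nearest_centroid": train_nearest_centroid,
        "one_nn": train_one_nn,
    }
    best, scores = search(candidates, X_train, y_train, X_val, y_val)
    print(best, scores)
```

Widening the search is just a matter of adding entries to `candidates`; the selection step stays the same, which is the appeal of placing several tools under one overarching tool.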

BioAutoMATED’s repertoire of supervised ML models includes three types: binary classification models (dividing data into two classes), multiclass classification models (dividing data into multiple classes), and regression models (fitting continuous numerical values or measuring the strength of relationships between variables). BioAutoMATED can also help determine how much data is required to properly train the chosen model.
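Which of these three model types applies depends on the target values in the dataset. The sketch below guesses the task from a list of labels using a simple heuristic; the function name `infer_task`, the `max_classes` cutoff, and the rules themselves are assumptions for illustration, not a description of how BioAutoMATED decides.

```python
from numbers import Real


def infer_task(labels: list, max_classes: int = 20) -> str:
    """Guess the supervised task type from a list of target labels.

    Illustrative heuristic (an assumption, not BioAutoMATED's logic):
    - non-numeric labels, or integer-valued numbers with at most
      `max_classes` distinct values, are treated as class labels;
    - exactly two classes -> "binary", more than two -> "multiclass";
    - any other numeric targets -> "regression".
    """
    if not labels:
        raise ValueError("need at least one label")
    distinct = set(labels)
    # bool is a subclass of int, so True/False are treated as class names.
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in labels)
    integer_valued = numeric and all(float(v).is_integer() for v in labels)
    if not numeric or (integer_valued and len(distinct) <= max_classes):
        if len(distinct) < 2:
            raise ValueError("need at least two distinct classes")
        return "binary" if len(distinct) == 2 else "multiclass"
    return "regression"


if __name__ == "__main__":
    print(infer_task(["active", "inactive", "active"]))
    print(infer_task([0, 1, 2, 1]))
    print(infer_task([0.12, 3.4, 2.2]))
```

A heuristic like this can misfire, since small integer targets might be counts rather than class IDs, so a real system would likely let the user override the guess.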

“Our tool explores models that are better suited for smaller, sparser biological datasets as well as more complex neural networks,” says Valeri. This is an advantage for research groups working with new data that may or may not be suited to a machine-learning problem.

“Conducting novel and successful experiments at the intersection of biology and machine learning can be very expensive,” explains Soenksen. “Labs need to invest in infrastructure and in people trained in AI-ML before they can even see whether their ideas are ready to pan out. We want to lower these barriers for domain experts in biology.” With BioAutoMATED, researchers are free to run initial experiments to assess whether it is worth hiring a machine-learning expert to build a different model for further experimentation.

The researchers stress that the open-source code is publicly available and easy to run. “What we want is for people to take our code, improve it, and work with the larger community to make it a tool for everyone,” Soenksen says. “We want to prime the biological research community and raise awareness of AutoML techniques as a useful pathway.”

Collins, the senior author of the paper, is also affiliated with the MIT Institute for Medical Engineering and Science, the Harvard-MIT Program in Health Sciences and Technology, the Broad Institute of MIT and Harvard, and the Wyss Institute. Additional MIT contributors to the paper include Katherine M. Collins ’21; Nicolaas M. Angenent-Mari PhD ’21; and Timothy K. Lu, a professor of biological engineering and of electrical engineering and computer science.

This work was supported, in part, by a Defense Threat Reduction Agency grant, the Defense Advanced Research Projects Agency SD2 program, the Paul G. Allen Frontiers Group, and the Wyss Institute for Biologically Inspired Engineering of Harvard University; an MIT-Takeda Fellowship, a Siebel Foundation Scholarship, a CONACyT grant, an MIT-TATA Center Fellowship, a Johnson & Johnson Undergraduate Research Scholarship, a Barry Goldwater Scholarship, a Marshall Scholarship, the Cambridge Trust, and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health. This research is part of the Antibiotics-AI Project, which is supported by the Audacious Project, Flu Lab, LLC, the Sea Grape Foundation, Rosamund Zander and Hansjorg Wyss for the Wyss Foundation, and anonymous donors.


