AI Meets Basic Science

Northwestern basic scientists are leveraging artificial intelligence and machine learning to untangle complex intracellular processes.

by Emily Ayshford

Human bodies are intricate networks of systems, and those networks become even more complex as we try to understand what happens when things go awry at the cellular level — when genetic mutations cause cancer, for example, or when viruses hijack cells. 

Fortunately, as medical imaging and genomic datasets grow, so do the abilities of powerful new tools in the scientist’s toolbox: namely, artificial intelligence (AI) and machine learning. Over the past several years, these quickly evolving technologies have already impacted medicine at the clinical level, whether used to diagnose disease or to enhance electronic health record systems. Now, basic science research, too, is poised to benefit from these pattern-finding methodologies.

Feinberg scientists have begun to explore the possibilities, including using AI and machine learning to understand how viruses affect a cell’s nucleus, discover new genetic mutations that cause cancer, and help define a subset of autism.  

The potential is tremendous. But because the field is still relatively new, many faculty aren’t proficient in AI systems, and many AI investigators don’t fully understand which biological questions AI could answer. Several Feinberg faculty members act as bridges between these two fields, and through Feinberg’s new Institute for Artificial Intelligence in Medicine (I.AIM), faculty are training the next generation of investigators to harness the power of AI.

“Machine learning has become really powerful as a tool to understand human disease and cancer at the cellular level,” says Feng Yue, PhD, the Duane and Susan Burnham Professor of Molecular Medicine, director of I.AIM’s Center for Advanced Molecular Analysis and director of the Center for Cancer Genomics at the Robert H. Lurie Comprehensive Cancer Center of Northwestern University. “The possibilities are unlimited, and I can’t wait to see what happens.” 

Analyzing infected cells 

To use artificial intelligence in basic science research, investigators often train machine learning systems to sift through datasets, find patterns, and ultimately make predictions based on those patterns. At the cellular level, this can involve finding patterns in high-resolution images of cells or in huge genomic and epigenomic datasets.  

Machine learning techniques are often limited by their datasets — they must have enough good data to learn from — but the explosion of data and higher-resolution imaging technology within the past several years has begun to lead to new insights in the field.  

For Derek Walsh, PhD, professor of Microbiology-Immunology, machine learning offered a way to understand how viruses control cells. Viruses can control cells in many ways, from viral proteins present in the nucleus directly controlling gene expression to proteins working on the cell’s surface or in the cytoplasm to control cell signaling networks. But how and why the nucleus is moved and reorganized under various conditions, including during viral infection, remained a matter of investigation. 

Walsh and his team used a dataset of images of individual cells and developed automated cell imaging systems that use AI-based networks to identify and analyze infected cells. They found that viruses can control structural and genetic polarity inside the cell nucleus. The findings were published in Nature.

“The intersection of AI and cell biology is still a relatively new but rapidly growing area, as people have begun to realize its power and it becomes more accessible or available,” Walsh says. “It requires some degree of proficiency in programming and understanding of AI that’s not common among cell biologists, at least not just yet. But we are getting there. There are several papers that have used AI approaches to analyze data at the cellular level in different contexts and that number is likely to grow exponentially.” 

Walsh is also a member of the Lurie Cancer Center.

Finding new genetic mutations in cancer 

Helping bridge the gap between programming and health are professors like Feng Yue, who has a PhD in computer science and also had a postdoctoral fellowship in a wet lab. “We need to have informatics people who have deep domain knowledge of human biology and disease,” Yue says. “You cannot say, ‘I am an AI expert, tell me what to do.’” 

Yue has used his training to develop machine learning models that can detect previously undetectable patterns in the genome. In recent work, he and his collaborators discovered hundreds of genetic mutations in cancer that are undetectable by current genome sequencing. 

Within each cell, long strands of DNA need to be precisely folded and organized so that they can fit inside the nucleus, which is usually only a few micrometers in diameter. Previously, Yue and his collaborators showed that structural variants in cancer genomes can be detected by genomic analysis tools. 

In this study, published in Science Advances, Yue and his collaborators collected a set of curated high-confidence structural variations of different types from eight cancer cell lines. These were used to train a deep learning model, named EagleC, to learn the hidden patterns buried in the data.

EagleC found hundreds of fusion events in different types of cancers that were missed by current genome sequencing techniques. Gene fusion is a chromosomal rearrangement event that can play a significant role in cancers. This knowledge could be especially useful in studying cancers with a high frequency of fusion events, such as brain tumors and breast cancer.

“We can use AI for basic science research, but we want to find the actionable item in the data,” Yue says. “How can we use this information on mutations to classify patients and make predictions on how they will respond to drugs? The ultimate goal is to find the genetic variations that are most important to human disease, so we can learn how to manipulate regulators and make a breakthrough in treatment.” 

Understanding autism 

In some cases, applying machine learning to cellular-level datasets combined with clinical datasets can lead to groundbreaking discoveries about hard-to-pin-down diseases.

Autism, for example, has no one defined cause, but it affects an estimated 1 in 44 children in the United States.

Yuan Luo, PhD, associate professor of Preventive Medicine in the Division of Health and Biomedical Informatics, used artificial intelligence to find a previously unknown biomarker for a subset of the disease.  

With access to huge amounts of genetic mutation data, sexually dimorphic gene expression patterns, animal model data, electronic health record data and health insurance claims data, he and his collaborators were able to cross-compare potential patterns.  

For a study published in Nature Medicine, the team identified clusters of gene exons — parts of genes that contain information coding for a protein — that function together during brain development. They then used a state-of-the-art AI algorithm, graph clustering, on gene expression data.  
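
To give a rough sense of what graph clustering on expression data involves, the Python sketch below treats each exon as a node, links two exons when their expression profiles are strongly correlated, and reads off the connected groups as clusters. It is a deliberately simple stand-in, not the algorithm used in the study; the exon names, the expression values, the 0.9 threshold, and the function name coexpression_clusters are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def coexpression_clusters(expression, threshold=0.9):
    """Group exons whose expression profiles are linked by strong correlation.

    Hypothetical helper: builds a graph with one node per exon, adds an edge
    when correlation >= threshold, and returns the connected components.
    """
    graph = {name: set() for name in expression}
    for a, b in combinations(expression, 2):
        if pearson(expression[a], expression[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Made-up expression values across five samples, for illustration only.
expression = {
    "exon_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "exon_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "exon_C": [5.0, 4.1, 2.9, 2.0, 1.1],
    "exon_D": [9.7, 8.2, 6.1, 3.8, 2.0],
    "exon_E": [3.0, 1.0, 4.0, 1.0, 3.0],
}
clusters = coexpression_clusters(expression)
print(clusters)
```

On this toy input, the sketch groups exon_A with exon_B and exon_C with exon_D, and leaves exon_E on its own. Real analyses typically use more sophisticated clustering methods and far larger datasets.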

They ultimately found that a certain autism subtype — known as dyslipidemia-associated autism, which represents about 7 percent of all diagnosed autism spectrum disorders in the United States — was characterized by abnormal lipid levels. This could lead to the first biomedical screening and intervention tool for this autism subtype.  

“That will allow early intervention in treatment,” says Luo, who is also chief AI officer at the Northwestern University Clinical and Translational Sciences (NUCATS) Institute and at I.AIM, as well as a member of the Lurie Cancer Center. “Only when we integrate different modalities of healthcare data can we actually develop profound insights of the disease etiology and come up with novel target interventions. In the next five to ten years, we’re going to see more and more studies that include multimodal data. That will really stoke the fire of multimodal machine learning.” 

Teaching the next generation 

Gathering the efforts of AI research at Feinberg is an important focus for I.AIM, whose mission is to bridge computational methods with human expertise to advance medical science and improve human health. 

Launched in 2020, the institute partners with investigators in the community to connect them with AI experts, help write grants, and educate the next generation of both clinical and basic science investigators on how to best use the tools of AI. 

The institute launched the Health Data Gymnasium, a website that teaches students the basics of how to use data science and machine learning methods to find new insights. In 2021, the institute also hosted the first annual Big Ten Augmented Intelligence Bowl, bringing together multi-disciplinary teams of investigators from institutions representing the Big Ten Academic Alliance to discover new ways that AI could address health disparities.  

“One of the biggest challenges in this space is that we don’t have enough qualified people in academic health who understand augmented intelligence and its possibilities,” says Abel Kho, MD, professor of Medicine and Preventive Medicine and director of the institute. “We want to cross-train as many people as we can to help build that pipeline.”

At NUCATS, Luo, who is also a trained computer scientist, helped launch the first AI class in the medical school and is helping to bring clinicians, basic scientists, and computer scientists together to expose them to the possibilities of AI in healthcare.

“There can sometimes be a lack of trust between the clinicians and AI scientists,” he says. “We have witnessed AI proponents boasting that AI could displace radiologists and pathologists, and clinicians feeling cynicism towards AI scientists throwing models at data. We really need to train them to work together and learn from each other in creative ways.”

To that end, Luo launched the AI for Health (AI4H) clinic in 2019, which is open to all practicing Northwestern Medicine physicians. In this clinic, physicians discuss a clinical problem that could benefit from AI, while AI scientists and trainees help brainstorm ideas, iterate on solutions, and deploy implementations.

“The clinic serves as a new arena to train clinicians and AI scientists, where they learn from each other through real-world case studies,” Luo says.  “For example, AI and machine learning can help prioritize scientific hypotheses for investigators — helping them focus their energies on the most promising candidate instead of searching for the needle in the haystack.” 

Kho discusses AI and machine learning in a recent episode of the Breakthroughs podcast.