Mining Big Data for Medical Discovery

New Center for Data Science and Informatics ramps up medical research

Byte by byte, big data continues to become bigger and even bigger-er. It is estimated that 2.5 quintillion bytes of data (a quintillion is 1 followed by 18 zeros) are created every single day. The amount of data created in all areas of biomedical research is staggering, with many bits of information propagating from multiple directions. Today’s investigators are just beginning to incorporate data from myriad sources, including: “omic” or molecular-based data such as genomics, proteomics and metabolomics; imaging and signal data; clinical data from patient encounters; and population data found in regional and national data sets such as Medicare and the Framingham Heart Study.

Donald Lloyd-Jones, MD, ScM, FACC, FAHA

At Northwestern Medicine, the potential for mining big data for groundbreaking medical discoveries is great—and has become even greater. In January, a new Center for Data Science and Informatics (CDSI) was launched. Housed within NUCATS, the CDSI encompasses and expands on the existing strength of the Northwestern Biomedical Informatics Center (NUBIC) by formally including the emerging area of data science. This strategy will further efforts to convert big data into big knowledge at Northwestern Medicine.

In addition to CDSI resources, the Galter Health Sciences Library, also part of NUCATS, can leverage massive amounts of bibliographic data for trend analyses to assess the full impact of Northwestern research. The library also offers new user-friendly statistical and bioinformatics tools such as SPSS, SAS, STATA, Partek Genomics Suite and Golden Helix SNP – as well as training on how to use them – so scientists can conduct their own data analyses.

“Big data is for this decade what mapping the human genome was for the last,” says Donald M. Lloyd-Jones, MD, ScM, senior associate dean for clinical and translational research, chair of the Department of Preventive Medicine and director of NUCATS. “Every discipline is producing massive amounts of information. Untangling that big data with data science is at the frontier of biomedical discovery and clinical care, and that’s where we want to be by creating the CDSI.”

Sanjiv Shah, MD

Charting New Frontiers

When cardiologist Sanjiv J. Shah’s moniker for a group of heart patients he was studying made it to the front lines of care, he took it as an early measure of success. “A medical resident called me and said, ‘Dr. Shah, I admitted a huff puffer last night!’” recalls the associate professor of medicine at Feinberg. “His use of the term validated the work we were doing to better classify and treat these patients.”

In 2007, Dr. Shah, a 2000 Feinberg School of Medicine alumnus, returned to Northwestern intent on starting the first clinical program in the country, if not the world, for patients with heart failure with preserved ejection fraction (HFpEF). A common and growing cardiovascular condition, it remains difficult to identify and treat, in part, due to a one-size-fits-all medical approach. HFpEF expert Shah dubbed the condition “Huff-Puff” Syndrome both for the sound of the acronym and the shortness of breath many of these patients develop. To set up his clinic, Shah needed a systematic method for finding huff puffers.

“These patients are not easy to find,” he explains. “Many different providers care for them and there are no easy diagnostic indices like a low ejection fraction or elevated cholesterol readings to hang your hat on.”

So he enlisted the services of the Northwestern University Clinical and Translational Sciences (NUCATS) Institute. With the help of NUCATS, Shah mined the Northwestern Medicine Enterprise Data Warehouse (NMEDW) to target individuals who met his specified HFpEF clinical criteria. Every day the NMEDW, a repository of clinical and research data generated by the Feinberg School of Medicine and Northwestern Memorial HealthCare (NMHC), would reveal a few names. Soon Dr. Shah found his patients—and much more.

Since 2008, Northwestern Medicine’s unique HFpEF outpatient clinic has seen more than 1,300 individuals with the heart condition. Dr. Shah has not only been able to deliver more personalized care but also to make important clinical observations that could vastly improve patient outcomes. In the January 2015 issue of Circulation, Dr. Shah detailed the first study to conduct high-density phenotypic classification (phenomapping) of HFpEF. Using big data analytics, the investigators discovered three distinct groups of HFpEF patients: each has significantly different clinical profiles and levels of risk for hospitalization or death that demand tailored therapeutic strategies. These findings are a revolutionary departure from the current standard of care that lumps these patients into one broad HFpEF category.

From the get-go, Shah employed bioinformatics and big data analytics to accelerate his discovery of a novel classification system for HFpEF. It’s an approach that the recently established Center for Data Science and Informatics (CDSI) plans to foster with many more investigators at Feinberg and beyond.

Justin Starren, MD, PhD

Fueling Medical Discovery

Electronic health record (EHR) systems cross most people’s minds when they think of big data generators in biomedicine. Indeed, electronic charts produce copious amounts of digitally available information with every patient visit. Add data from healthcare claims, imaging studies, clinical outcomes and molecular assays — and the data get real big, real fast. By aggregating and sifting through large amounts of information, the hope is that patterns and predictors of health and disease will rise to the surface. They will, in turn, improve the speed and quality of translational research. In the near future, computerized simulations of patient populations, or “synthetic cohorts,” could replace the need to use real people in research studies.

“Traditional science focuses on exhaustively studying a small sample and generalizing findings to the larger population,” says Justin B. Starren, MD, PhD, chief of health and biomedical informatics in the Department of Preventive Medicine, deputy director of NUCATS and director of the CDSI. “Big data flips that concept around. Give me everything on everybody and then I will filter it down to figure out where I should best focus my research efforts.”

The NMEDW currently stores close to 70 billion observations on 4.9 million Northwestern Medicine patients. From its inception in 2007, the data warehouse has served as a single platform for Feinberg School of Medicine research data and NMHC clinical operations reporting (from financials to quality control). Its joint governance and dual-use model sets it apart from others and provides a premier informatics infrastructure for the CDSI. Says Starren, “We are leaders in our ability to fully integrate both healthcare and research data.”

Emilie Powell, MD

Digging Deeper for Quality Care

Early intervention with antibiotics and blood pressure stabilization works wonders against sepsis, a serious and potentially deadly complication of infection. The problem: sepsis can be difficult to recognize, often hiding behind other health complaints, from pneumonia to a skin infection. Most of the time, individuals don’t even know they have sepsis. And when they land in the emergency room, it is not a given that the healthcare team will arrive at a timely diagnosis.

Looking to improve sepsis care in the emergency department (ED), Emilie S. Powell, MD, ’09 GME, assistant professor of emergency medicine, joined forces with NUCATS’ data analysts. Mining the NMEDW for sepsis patients, she wanted to tease out how their condition went from pretty good to really bad before they ended up in intensive care. “The challenge with sepsis is that a patient could look well when they enter the ED but quickly become sick,” says Dr. Powell. “We wanted to identify barriers to treating with evidence-based guidelines to see where we were missing the boat and look for areas of opportunity.”

Powell and the NUCATS team developed an algorithm combining diagnostic codes and clinical parameters. They identified 376 patients with severe sepsis and/or septic shock who came to Northwestern Medicine via the ED between 2009 and 2010 and then examined aspects of their care, from the taking of vital signs at triage to lab results. Partnering with David H. Salzman, ’05 MD, ’09 GME, MEd, assistant professor of emergency medicine and medical education, Powell used big data to develop an in situ simulation, complete with an actor portraying a sepsis patient, to educate and train ED staff at Northwestern Memorial Hospital (NMH). Insights from Powell’s project are benefiting the training of emergency medicine residents as well as the education of medical school students and graduate students in public health.

The Future of Science

Big data is meaningless and potentially misleading without the resources (people and technology) to mine, model and make sense of trends. As it gets off the ground, the new CDSI will work to provide investigators at Northwestern Medicine, NUCATS and clinical partners with increasingly sophisticated services and tools to capture, search, integrate and analyze big data.

The many connections of providers who touch a heart failure (HF) patient.

Already Feinberg’s data scientists are advancing their own field through a number of research initiatives. Siddhartha Jonnalagadda, PhD, assistant professor of Preventive Medicine, for example, has developed text-mining algorithms to summarize or select content from EHRs and biomedical literature. In fact, for Shah’s HFpEF clinical trials, Jonnalagadda uses natural language processing and machine learning to scour the NMEDW for provider notes that contain the words “heart failure” or “HF” (along with other inclusion/exclusion criteria) to find suitable study candidates. Interested in how team dynamics impact care, Nicholas Soulakis, PhD, assistant professor of Preventive Medicine, uses big data to develop networks of provider-patient connections. Among his findings: on average, 112 NMH employees will interact with a cardiovascular patient’s electronic chart during a seven-day hospital stay.
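For readers curious what the simplest layer of such a search might look like, here is a minimal Python sketch of the keyword step only. It is not Jonnalagadda's method, which the article describes as using natural language processing and machine learning alongside other inclusion/exclusion criteria. The pattern, the function name `find_candidate_notes`, the note IDs and the sample note text are all hypothetical, invented for illustration.

```python
import re

# Assumption: match "heart failure" in any letter case, and "HF" only as a
# standalone, uppercase token. The article names the two search terms but
# not the matching rules, so these rules are illustrative.
HF_PATTERN = re.compile(r"\b(?:(?i:heart\s+failure)|HF)\b")


def find_candidate_notes(notes):
    """Return the IDs of notes whose text mentions heart failure or HF.

    `notes` maps a note ID to its free-text content (hypothetical layout).
    """
    return [note_id for note_id, text in notes.items() if HF_PATTERN.search(text)]


# Made-up notes; no real patient data.
sample_notes = {
    "note-1": "Patient admitted with acute heart failure; EF preserved.",
    "note-2": "Follow-up visit for HF, dyspnea on exertion.",
    "note-3": "Seasonal allergies, no cardiac complaints.",
}

print(find_candidate_notes(sample_notes))  # ['note-1', 'note-2']
```

A bare keyword match like this would also flag notes that say "no evidence of heart failure," which is one reason a real screening pipeline layers language processing and further criteria on top of it.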

Big data, it appears, has much to offer.

“More than two-thirds of Feinberg researchers we surveyed about the need for data science at Northwestern stated it will be absolutely critical for future research,” explains Dr. Starren. “The use of big data is the future of science. Computation has now joined theory and experimentation as a new third pillar of scientific progress and will be essential to accelerating discoveries going forward.”