Tracking COVID-19: Hunting the Virus with Technology, AI, and Analytics

Experts offer insights on mapping and forecasting virus cases during Stanford HAI’s COVID-19 and AI Conference.

Sachin Waikar

from Stanford University

Last updated on 09 Apr, 2020. 4 minutes read

A COVID-19 task force examines a map of the virus’s spread. PHOTO CREDIT: Carlos Barria | Reuters

Mapping COVID-19

Beating COVID-19 requires tracking it closely.

Multiple groups have been monitoring the virus’s spread since late 2019, as discussed by John Brownstein, a Harvard pediatrics professor. “It takes a small army of people,” he says, highlighting efforts by universities and other organizations to use data mining and other tools to track early signs of the outbreak online (such as through China’s WeChat app) and to understand the effects of interventions. His group’s research, for instance, shows that China’s control measures mitigated the virus’s spread.

The general public’s direct inputs are instrumental in this effort, including through use of symptom-checker apps such as a clinically validated one provided by Buoy: Users answer questions about symptoms, and data are aggregated to understand trends. The COVID Near You tool launched by Google and others to identify patterns/hot spots at the zip-code level already has 250,000+ users.

Part of the challenge is the potentially large rate of unreported infections, which can spread latently across large regions, prolonging the pandemic. “More testing can help us estimate undetected cases, but it’s expensive,” says Lucy Li, a scientist at the Chan Zuckerberg Biohub. Similarly, modeling based on case count data yields uncertain results. So Li and colleagues have turned to viral genomic data — the number of virus mutations between identified cases — along with testing and WHO time-series data to estimate unreported-infection rates.

Using data from Hubei, China, they found that about one in seven infections was detected. Outside of China, the rate can be as low as one detected case in 20, meaning roughly 2,000 infections for every 100 detected. Such findings underscore the importance of social distancing and protective equipment like masks.
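The arithmetic behind these detection rates can be sketched in a few lines of Python. This is only an illustration of the scaling implied by the figures quoted above, not the genomic model Li's group used; the function name and its parameters are hypothetical.

```python
def estimated_infections(detected_cases: int, one_in: int) -> int:
    """Scale detected cases up to total infections.

    Assumes, as a simplification, that exactly 1 in `one_in`
    infections is detected (e.g. one_in=7 for the Hubei estimate).
    """
    if detected_cases < 0 or one_in < 1:
        raise ValueError("detected_cases must be >= 0 and one_in >= 1")
    return detected_cases * one_in


# Figures quoted in the article, applied to 100 detected cases.
hubei_total = estimated_infections(100, 7)     # about one in seven detected
outside_total = estimated_infections(100, 20)  # as low as one in 20 detected

print(hubei_total, outside_total)
```

For 100 detected cases, the one-in-seven rate implies about 700 infections, and the one-in-20 rate implies about 2,000.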

Forecasting the Spread

Forecasting COVID-19’s spread is even trickier than tracking it.

A big part of the challenge is that the go-to models and tools used to predict influenza rates and impact don’t apply “as is.” Ryan Tibshirani, a Carnegie Mellon professor of statistics and machine learning, works with the university’s DELPHI group, a CDC Center of Excellence for flu forecasting. “A pandemic doesn’t work like a typical flu,” he says.

His group and others are quickly building COVID data sets and adapting influenza-forecasting models including machine-learning-based ones, with help from large tech firms. The CDC has also launched a COVID forecasting challenge. Still, Tibshirani cautions, “Forecasting past four weeks is best avoided, for statistical and political reasons”— long-term forecasts may not be able to incorporate the effects of fast-changing policies aimed at the crisis, for example.

Gauging Impact on Well-Being

Physical health isn’t the only measure of interest to researchers addressing COVID. Many are assessing the pandemic’s mental-health and behavioral effects. “Anxiety and loneliness are affected by this slow-motion national disaster, reducing well-being,” says Johannes Eichstaedt, a Stanford assistant professor of psychology and HAI junior fellow.

Eichstaedt, who has spent a decade using Twitter data to understand well-being, analyzed 2 million recent COVID-hashtag tweets by content and geographic region, in combination with Census and other data.

Preliminary findings: More educated and urban areas are focused on information-seeking (such as about testing) and changing behaviors (hand washing); Trump voters have downplayed the potential impact (“like the flu”); collective negative sentiment centers around words like “economy” and “Trump.” The researchers are monitoring weekly changes in these trends.

Making Data Accessible

The urgency of the COVID crisis has underscored the need for “open science” — the growing trend to make data and other scientific-research inputs available so that other researchers can use these to verify existing findings and make new discoveries.

Mark Musen, Stanford professor of medicine and biomedical data science, says, “There’s a lot of data on the virus already, but it’s like a library with no good catalog.” That means researchers can’t easily identify and aggregate the key data they need to track and combat COVID. Musen and others are working to make COVID data “FAIR”: findable, accessible, interoperable, and reusable.

A large part of the effort involves creating good “metadata” (data about data), such as by standardizing variable names (rather than “age,” “age (yr),” and “age_years”). Multiple organizations are using Stanford-developed technology like the web-based CEDAR Workbench to improve available COVID data.
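To make the variable-name example concrete, here is a minimal Python sketch of harmonizing column labels against a small synonym table. It is not how the CEDAR Workbench works; the table, the canonical name "age_years," and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical synonym table: canonical name -> variants seen in data sets.
CANONICAL_NAMES = {
    "age_years": {"age", "age (yr)", "age_years"},
}


def _normalize(label: str) -> str:
    """Lowercase a label and collapse whitespace so variants compare equal."""
    return re.sub(r"\s+", " ", label.strip().lower())


# Reverse lookup: normalized variant -> canonical name.
LOOKUP = {
    _normalize(variant): canonical
    for canonical, variants in CANONICAL_NAMES.items()
    for variant in variants
}


def harmonize(record: dict) -> dict:
    """Return a copy of `record` with known variant keys renamed.

    Unknown keys are kept unchanged. If a record contains two variants
    of the same field, the later one overwrites the earlier one.
    """
    return {LOOKUP.get(_normalize(key), key): value for key, value in record.items()}


print(harmonize({"Age (yr)": 42, "zip": "94305"}))
```

Once every data set uses the same label for the same variable, records from different sources can be pooled or compared without manual relabeling.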

Controlling the Coronavirus

Analytics and technology are playing a large role in efforts to control the coronavirus, as well.

Taiwan took a proactive, analytics-fueled approach to COVID-19 from early on. “They focused on tracking people at risk and making information public,” says Jason Wang, director of Stanford’s Center for Policy, Outcomes and Prevention.

That included activating a centralized command center for 24/7 virus-combating operations; checking temperatures of incoming air passengers and quarantining those with fevers; linking the national health-insurance and customs databases to identify at-risk citizens (such as those who’d traveled to Wuhan); and tracking quarantined citizens’ movements.

The government also gathered data on personal protective equipment (PPE), quickly ramping up production to 11 million masks daily, while pointing citizens to retailers that sold PPE. These mutually reinforcing measures helped Taiwan keep its per-capita infection and fatality rates among the lowest, while earning the government a 90 percent approval rating.

Still, as Taiwan’s effort suggests, a big trade-off in tracking the virus is between making use of comprehensive data and protecting privacy and other civil liberties — at risk when apps use GPS and other personally identifying data.

Here, Tina White, Stanford mechanical engineering doctoral student, asks, “Can we create a tracking and alerting system that doesn’t infringe on our liberties?”

To answer that, she’s part of a growing research partnership developing a Bluetooth-based mobile app that automates contact-tracing (through proximity-sensing), with non-personal information uploaded to a database. The app would use the information to alert users when they may have been exposed to COVID, to improve awareness and reduce spread.
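One way such proximity-based alerting could work without personal data is sketched below in Python. This is not the design of the app White describes. It is a simplified illustration that assumes phones exchange random tokens over Bluetooth and that a user who tests positive uploads only the tokens their own phone sent. The class and function names are hypothetical.

```python
import secrets


class Phone:
    """Toy model of a phone running a proximity-sensing app."""

    def __init__(self) -> None:
        self.sent_tokens: list[str] = []    # random tokens this phone broadcast
        self.heard_tokens: set[str] = set() # tokens received from nearby phones

    def broadcast(self) -> str:
        # A fresh random token carries no name, phone number, or location.
        token = secrets.token_hex(16)
        self.sent_tokens.append(token)
        return token

    def hear(self, token: str) -> None:
        self.heard_tokens.add(token)

    def may_have_been_exposed(self, reported_tokens: set[str]) -> bool:
        # Matching happens on the phone: did we hear any reported token?
        return not self.heard_tokens.isdisjoint(reported_tokens)


def meet(a: Phone, b: Phone) -> None:
    """Simulate two phones coming within Bluetooth range of each other."""
    b.hear(a.broadcast())
    a.hear(b.broadcast())


alice, bob, carol = Phone(), Phone(), Phone()
meet(alice, bob)  # Carol never comes near Alice

# Alice tests positive and uploads only the tokens her phone sent.
reported = set(alice.sent_tokens)

bob_alerted = bob.may_have_been_exposed(reported)
carol_alerted = carol.may_have_been_exposed(reported)
print(bob_alerted, carol_alerted)
```

In this sketch the shared database holds only random tokens, and each phone checks for matches locally, so no identity or GPS trail needs to leave the device.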

In all of these ways, the speakers featured here are using technology, AI, and analytics to flatten the curve, slow the spread of COVID-19, and save lives.

Missed the conference? Watch the conference recording, and check the HAI blog for additional in-depth coverage.