Using machine learning, researchers are finding patterns in electronic health record data to better identify people who may have long COVID.
A research team supported by the National Institutes of Health has identified characteristics of people with long COVID and of those who are likely to have it. Using machine learning techniques, the researchers analyzed an unprecedented collection of electronic health records (EHRs) available for COVID-19 research to better identify who has long COVID. Working with deidentified EHR data from the National COVID Cohort Collaborative (N3C), a national, centralized public database led by NIH's National Center for Advancing Translational Sciences (NCATS), the team found more than 100,000 likely long COVID cases as of October (as of May 2022, the count is more than 200,000). The findings appear in The Lancet Digital Health.
Long COVID is marked by wide-ranging symptoms, including shortness of breath, fatigue, fever, headaches, “brain fog” and other neurological problems. Such symptoms can last for many months or longer after an initial COVID-19 diagnosis. One reason long COVID is difficult to identify is that many of its symptoms resemble those of other diseases and conditions. Better characterizing long COVID could lead to improved diagnoses and new therapeutic approaches.
Emily Pfaff, a co-author and clinical informaticist at the University of North Carolina, said: “It made sense to take advantage of modern data analysis tools and a unique big data resource like N3C, where many features of long COVID can be represented.”
The N3C data enclave currently contains information representing more than 13 million people nationwide, including nearly 5 million COVID-19-positive cases. The resource enables researchers to rapidly explore new questions about COVID-19 vaccines, treatments, risk factors and health outcomes.
The new study is part of a related, larger trans-NIH initiative, Researching COVID to Enhance Recovery (RECOVER), which aims to improve understanding of the long-term effects of COVID-19, known as post-acute sequelae of SARS-CoV-2 infection (PASC). RECOVER aims to accurately identify people with PASC and to develop approaches for its prevention and treatment. The program will also answer critical research questions about the long-term effects of COVID through clinical trials, long-term observational studies and more.
The models focused on identifying potential long COVID patients within three groups in the N3C database: all COVID-19 patients, patients hospitalized with COVID-19, and patients who had COVID-19 but were not hospitalized. The models proved accurate: the people they identified as at risk for long COVID were similar to patients seen in long COVID clinics. The machine learning systems classified approximately 100,000 patients in the N3C database whose profiles closely matched those of people with long COVID.
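The article does not describe how the models were built, so what follows is only an illustration of the three-cohort idea, not the study's method. It is a minimal Python sketch that assumes each patient is already summarized as a numeric feature vector, and it swaps in a simple nearest-centroid similarity rule: within each cohort, it flags patients whose features lie close to the average profile of long COVID clinic patients. The `Patient` fields, `split_cohorts`, `centroid`, `flag_similar` and the distance threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Patient:
    # Hypothetical record layout; the article does not specify one.
    patient_id: str
    hospitalized: bool
    features: tuple[float, ...]  # e.g. counts of new medications, visits, symptoms
    attended_long_covid_clinic: bool


def split_cohorts(patients: list[Patient]) -> dict[str, list[Patient]]:
    """Build the three groups named in the article."""
    return {
        "all": list(patients),
        "hospitalized": [p for p in patients if p.hospitalized],
        "not_hospitalized": [p for p in patients if not p.hospitalized],
    }


def centroid(vectors: list[tuple[float, ...]]) -> tuple[float, ...]:
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def flag_similar(cohort: list[Patient], threshold: float) -> list[str]:
    """Flag non-clinic patients within `threshold` (Euclidean) of the clinic centroid.

    A nearest-centroid rule used purely for illustration; the threshold is hypothetical.
    """
    clinic = [p.features for p in cohort if p.attended_long_covid_clinic]
    if not clinic:
        return []  # no reference profile in this cohort
    center = centroid(clinic)
    return [
        p.patient_id
        for p in cohort
        if not p.attended_long_covid_clinic and dist(p.features, center) <= threshold
    ]
```

For example, with two clinic patients at (5, 3) and (4, 3), the clinic centroid in the combined cohort is (4.5, 3.0), so a non-clinic patient at (4.5, 2.5) is flagged with a threshold of 1.0. A real model would learn its decision rule from data rather than rely on a hand-picked distance cutoff.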
Josh Fessel, senior clinical advisor to NCATS and head of the RECOVER research program, said: “Once you can identify who has long COVID in a large database of people, you can start asking questions about those people. Was there anything different about these people before they developed long COVID? Did they have certain risk factors? Was there anything in the way they were treated during acute COVID that might have increased or decreased their risk of long COVID?”
The models looked for common features, including new medications, doctor visits and new symptoms, in patients with a positive COVID-19 diagnosis who were at least 90 days past their acute infection. The models identified patients as potential long COVID patients if they had visited a long COVID clinic, or if they showed long COVID symptoms and likely had the condition but had not been diagnosed.
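As a hedged illustration of the 90-day rule, the Python sketch below summarizes one patient's post-acute record into counts of new medications, doctor visits and new symptoms. It assumes a hypothetical event format (`Event` with a date, a kind of "medication", "visit" or "symptom", and a code) and treats a medication or symptom as new only if its code never appeared before the COVID-19 index date. `post_acute_features` and these definitions are assumptions made for illustration; the study's actual feature definitions are not given in the article.

```python
from dataclasses import dataclass
from datetime import date, timedelta

POST_ACUTE_DAYS = 90  # from the article: at least 90 days after the acute infection


@dataclass(frozen=True)
class Event:
    # Hypothetical event layout; real EHR data models are far richer.
    day: date
    kind: str  # "medication", "visit" or "symptom"
    code: str  # identifier for the drug, visit type or symptom


def post_acute_features(covid_index: date, events: list[Event]) -> dict[str, int]:
    """Summarize events from day 90 after the COVID-19 index date onward.

    A medication or symptom counts as new only if its code never appeared
    before the index date; each new code is counted once. Every visit counts.
    """
    cutoff = covid_index + timedelta(days=POST_ACUTE_DAYS)
    baseline = {(e.kind, e.code) for e in events if e.day < covid_index}
    new_medications: set[str] = set()
    new_symptoms: set[str] = set()
    visits = 0
    for e in events:
        if e.day < cutoff:
            continue  # skip anything before day 90 after the index date
        if e.kind == "visit":
            visits += 1
        elif (e.kind, e.code) in baseline:
            continue  # already present before COVID-19, so not new
        elif e.kind == "medication":
            new_medications.add(e.code)
        elif e.kind == "symptom":
            new_symptoms.add(e.code)
    return {
        "new_medications": len(new_medications),
        "visits": visits,
        "new_symptoms": len(new_symptoms),
    }
```

For instance, with a hypothetical index date of 1 January 2021 the window opens on 1 April 2021, so a drug first prescribed in mid-April counts as one new medication, while a drug the patient was already taking in December 2020 does not.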