Award Abstract #2028280

RAPID: A novel platform for data integration and deep learning on COVID-19

NSF Directorate:
BIO - Directorate for Biological Sciences
NSF Division:

Division of Environmental Biology

Initial Amendment Date:

Latest Amendment Date:

Award Number:


Award Instrument:


Program Manager:

Shannon Fehlberg

Start Date:

End Date:

Awarded Amount to Date:



Gholamali Rahnavard (Principal Investigator)
Keith A Crandall (Co-Principal Investigator)
Marcos Perez-Losada (Co-Principal Investigator)


George Washington University
1918 F ST NW

NSF Program:
Systematics & Biodiversity Sci
COVID-19 Research
Program Reference Code(s):
Program Element Code(s):

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has fundamentally changed the world, and yet its ultimate impact is unknown. While China has experienced a slowdown in new cases, infections in the US continue to rise and are threatening to exceed our health care system’s capacity. Testing capacity is limited compared to the need, hospital services are becoming overwhelmed, and critical supplies are running short. A diversity of efforts is currently under way to develop both new treatments and vaccine strategies to combat COVID-19. Yet we know from experience that the virus will evolve responses to both host immune systems and intervention strategies. To diminish both the short-term and long-term impacts of COVID-19, it is essential to develop robust, repeatable, and accessible tools to integrate and analyze the diversity of data becoming available in the face of the pandemic. Developing a platform that characterizes the dynamic nature of mutations in the virus and tests for associations with clinical variables and biomarkers is an essential broader impact, and it will help in making informed predictions of health outcomes such as the stage and severity of the disease and the efficacy of treatment. Additionally, this project provides professional development opportunities for early career researchers.

Advances in omics technologies provide a broad and deep range of genotypic and phenotypic data to integrate with clinical phenotypes. Machine learning techniques such as clustering by phylogenetic distance and Deep Neural Networks (DNNs) are well suited to linking these DNA-level changes to clinical metadata for human disease prediction, diagnosis, and therapeutics. This project develops tools within an open-source platform for documented, repeatable analyses that can be conducted in real time, allowing integration of data from patients with new treatment and vaccine strategies. This deep learning bioinformatics platform will allow the prioritization of genes associated with outcome predictors, including health, therapeutic, and vaccine outcomes, and will inform improved DNA tests for predicting disease status and severity. The computational tools developed in this study will provide the research community and health professionals with comprehensive and generic approaches for characterizing the dynamics of genotype/phenotype associations in viruses. Such tools allow healthcare professionals and researchers to address specific properties of viruses, such as the frequency and location of mutations across the viral genome. When combined with other clinical and epidemiological data, such information could help pave the way to better treatments or a vaccine. The developed platform will provide a venue for robust, open, repeatable analyses of COVID-19 as more data become available.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.