Coronavirus Data Science

This repository contains Jupyter notebooks and Python scripts for investigating the 2019 coronavirus outbreak. It is meant as a starting point for tracking and analyzing the outbreak. Setting up an environment to read, analyze, and plot the outbreak data is not trivial, and I hope this helps more people get started.

If you are a researcher, journalist, or other interested member of the public, please use this freely. If you are a data scientist, please fork and contribute back to build a better foundation for future research.

Goals

Provide a framework and tools for loading outbreak data into Python
Easily visualize outbreak geodata
Facilitate collaboration among researchers

Background

From the CDC:

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people. The latest situation summary updates are available on CDC’s web page 2019 Novel Coronavirus, Wuhan, China.

This is an emerging, rapidly evolving situation and CDC will provide updated information as it becomes available.

Data Sources

The data for tracking the 2019-nCoV outbreak is provided by the Johns Hopkins Center for Systems Science and Engineering (CSSE), which has created an interactive GIS Dashboard. From the CSSE:

In response to this ongoing public health emergency, we developed an online dashboard (static snapshot shown below) to visualize and track the reported cases on a daily timescale; the complete set of data is downloadable as a google sheet. The case data visualized is collected from various sources, including WHO, U.S. CDC, ECDC China CDC (CCDC), NHC and DXY. DXY is a Chinese website that aggregates NHC and local CCDC situation reports in near real-time, providing more current regional case estimates than the national level reporting organizations are capable of, and is thus used for all the mainland China cases reported in our dashboard (confirmed, suspected, recovered, deaths). U.S. cases (confirmed, suspected, recovered, deaths) are taken from the U.S. CDC, and all other country (suspected and confirmed) case data is taken from the corresponding regional health departments. The dashboard is intended to provide the public with an understanding of the outbreak situation as it unfolds, with transparent data sources.

Pulling Updates from Google Sheets

The data is updated in a read-only Google Sheet.

Download credentials and install dependencies as described in the Google documentation, then run the script:

python pull_gsheet_csse.py
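Once rows have been pulled from the sheet, they still need to be shaped into a table. The sketch below is not the code in pull_gsheet_csse.py; it only illustrates one way to turn rows in the list-of-lists shape that the Sheets API returns into a Pandas DataFrame. The function name `rows_to_frame`, the column names, and the sample values are all assumptions made up for illustration. It pads short rows because the Sheets API omits trailing empty cells.

```python
import pandas as pd


def rows_to_frame(values):
    """Turn Sheets-style rows (header row first) into a DataFrame.

    Rows shorter than the header are padded with None, because the
    Sheets API omits trailing empty cells from each row.
    (Hypothetical helper, not part of this repository.)
    """
    if not values:
        return pd.DataFrame()
    header, *rows = values
    width = len(header)
    padded = [list(row[:width]) + [None] * (width - len(row)) for row in rows]
    return pd.DataFrame(padded, columns=header)


# Made-up rows for illustration only; column names are assumed.
sample = [
    ["Province/State", "Country/Region", "Confirmed", "Deaths"],
    ["Region A", "Country X", "10", "1"],
    ["Region B", "Country Y", "1"],  # trailing empty cell omitted
]

frame = rows_to_frame(sample)
# Cell values arrive as strings, so convert the count column to numbers.
frame["Confirmed"] = pd.to_numeric(frame["Confirmed"])
print(frame)
```

From here, `frame.to_csv(...)` would write a snapshot file that the notebooks could read back later.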

Progress

The Jan 25 Jupyter notebook works on a snapshot of the data from Jan 25. It walks through these steps:

Load the coronavirus data into a Pandas DataFrame and plot
Load world, China, and US shapefiles into GeoDataFrames
Merge the coronavirus DataFrame with the GeoDataFrames
Display on a map
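The merge step above can be sketched as follows. This is a minimal illustration rather than the notebook's code: a plain DataFrame stands in for the attribute table of a shapefile so the example runs with Pandas alone, and the column names (`NAME`, `Province/State`, `Confirmed`) and all values are assumptions. A left merge keeps every region in the shape table, including those with no reported cases, so they still appear on the map.

```python
import pandas as pd

# Stand-in for a shapefile's attribute table (one row per region).
# In the notebook this would be a GeoDataFrame with a geometry column.
regions = pd.DataFrame({"NAME": ["Region A", "Region B", "Region C"]})

# Made-up case counts for illustration only.
cases = pd.DataFrame(
    {
        "Province/State": ["Region A", "Region C", "Region D"],
        "Confirmed": [5, 2, 1],
    }
)

# Left merge: keep every region, attach case counts where names match.
merged = regions.merge(
    cases, how="left", left_on="NAME", right_on="Province/State"
)
# Regions without a matching case row get zero confirmed cases.
merged["Confirmed"] = merged["Confirmed"].fillna(0).astype(int)

# Case rows whose region name has no match in the shape table;
# these usually point at spelling differences between the two sources.
unmatched = sorted(set(cases["Province/State"]) - set(regions["NAME"]))

print(merged[["NAME", "Confirmed"]])
print("Unmatched case rows:", unmatched)
```

With geopandas installed, `regions` would come from `geopandas.read_file` on one of the shapefiles, the same `merge` call applies, and `merged.plot(column="Confirmed")` would draw the map.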

The nCoV Spread Jupyter notebook loads all data files into one time-indexed DataFrame.
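A minimal sketch of that idea, assuming each data file is a CSV snapshot with a known capture time. The in-memory CSV text, the timestamps, and the column names are made up for illustration and are not taken from the repository's data folder.

```python
import io

import pandas as pd

# Made-up snapshots keyed by capture time; real files would be read
# from disk instead of from these in-memory strings.
snapshots = {
    "2020-01-25 12:00": "Province/State,Confirmed\nRegion A,5\nRegion B,1\n",
    "2020-01-26 12:00": "Province/State,Confirmed\nRegion A,8\nRegion B,1\n",
}

frames = []
for stamp, text in snapshots.items():
    frame = pd.read_csv(io.StringIO(text))
    # Tag every row with the time its snapshot was taken.
    frame["timestamp"] = pd.Timestamp(stamp)
    frames.append(frame)

# Stack all snapshots and index the result by time.
combined = (
    pd.concat(frames, ignore_index=True)
    .set_index("timestamp")
    .sort_index()
)

# Total confirmed cases per snapshot.
totals = combined.groupby(level="timestamp")["Confirmed"].sum()
print(totals)
```

Indexing by time makes it straightforward to slice a date range with `combined.loc[...]` or to plot totals over time.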

Dependencies

Jupyter Notebooks

pip install pandas
pip install requests
pip install geopandas
pip install descartes
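To confirm the environment is ready before opening a notebook, a quick check like the one below may help. It is only a sketch; the function name `missing_packages` is hypothetical, and the package list simply mirrors the pip commands above.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import.

    (Hypothetical helper, not part of this repository.)
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages listed in the Dependencies section.
required = ["pandas", "requests", "geopandas", "descartes"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```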

Short-term Roadmap

Load and visualize a data snapshot ✅
Create a script to download new data from Google Sheets ✅
Visualize time-series data ✅

Contributions welcome!

Twitter @tyreus
