
How to access the Johns Hopkins University Covid-19 Data

Published: Apr 16, 2020

  • MongoDB
  • Atlas
  • ...

By Maxime Beugnet and Mark Smith


As the Covid-19 pandemic has swept the globe, the work of JHU (Johns Hopkins University) and their Covid-19 dashboard has become vitally important in keeping people informed about the progress of the virus in their communities, in their countries, and in the world in general.

JHU not only publishes their dashboard, but they make the data powering it freely available for anyone to use. However, this data is not in a format that is easy to consume. At MongoDB, we've been working with this data since it was first published while developing our Charts dashboard. Now we're making our more accessible version of the JHU data freely available for anyone to use.

We have not modified the data in any way. What we have done is structured it better and made it easier to query by placing it within a MongoDB Atlas cluster and by creating some convenient APIs.

#Where do we get the data?

All data comes from JHU, who source it from:

  • the World Health Organization,
  • the National Health Commission of the People’s Republic of China,
  • the United States Centers for Disease Control and Prevention,
  • the Australian Government Department of Health,
  • the European Centre for Disease Prevention and Control,
  • and many others.

You can read the full list on their GitHub repository.

#The MongoDB Dataset

We produced two different databases in our cluster.

  • covid19jhu contains the raw CSV files imported with the mongoimport tool,
  • covid19 contains the same dataset with a clean MongoDB schema design that follows the best practices we recommend.

Here is an example of the documents in the covid19 database:

{
  "_id" : ObjectId("5e8cd51700c98085a07a8d25"),
  "uid" : 250,
  "country_iso2" : "FR",
  "country_iso3" : "FRA",
  "country_code" : 250,
  "country" : "France",
  "combined_name" : "France",
  "population" : 65273512,
  "loc" : {
    "type" : "Point",
    "coordinates" : [ 2.2137, 46.2276 ]
  },
  "date" : ISODate("2020-04-06T00:00:00Z"),
  "confirmed" : 98010,
  "deaths" : 8911,
  "recovered" : 17250
}

{
  "_id" : ObjectId("5e8cd51800c98085a07ce9e1"),
  "uid" : 84036115,
  "country_iso2" : "US",
  "country_iso3" : "USA",
  "country_code" : 840,
  "fips" : 36115,
  "city" : "Washington",
  "state" : "New York",
  "country" : "US",
  "combined_name" : "Washington, New York, US",
  "population" : 61204,
  "loc" : {
    "type" : "Point",
    "coordinates" : [ -73.4304, 43.3115 ]
  },
  "date" : ISODate("2020-04-06T00:00:00Z"),
  "confirmed" : 19,
  "deaths" : 1
}

The documents above were obtained by joining the file UID_ISO_FIPS_LookUp_Table.csv with the time series CSV files you can find in this folder.
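As a rough illustration of that kind of join, the sketch below merges one lookup row with one time series row using only the Python standard library. It is not the actual import script: the two inline CSV samples are hand-made (with values copied from the France document above), the column names such as `UID`, `iso2`, `Country_Region`, `Combined_Key`, and `Country/Region` are assumed to match the JHU files, and `join_rows` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime

# Hand-made samples; the column names are ASSUMED to match the JHU CSV files.
LOOKUP_CSV = """UID,iso2,iso3,code3,Country_Region,Combined_Key,Population
250,FR,FRA,250,France,France,65273512
"""

# The real time series files are assumed to hold one column per day.
TIME_SERIES_CSV = """Country/Region,Lat,Long,4/6/20
France,46.2276,2.2137,98010
"""


def join_rows(lookup_csv, series_csv, metric):
    """Yield one document per (place, date), joining both files on the country name."""
    lookup = {
        row["Country_Region"]: row
        for row in csv.DictReader(io.StringIO(lookup_csv))
    }
    for row in csv.DictReader(io.StringIO(series_csv)):
        place = lookup[row["Country/Region"]]
        for column, value in row.items():
            if column in ("Country/Region", "Lat", "Long"):
                continue  # every remaining column is a date such as 4/6/20
            yield {
                "uid": int(place["UID"]),
                "country_iso2": place["iso2"],
                "country": place["Country_Region"],
                "combined_name": place["Combined_Key"],
                "population": int(place["Population"]),
                # GeoJSON points are ordered [longitude, latitude].
                "loc": {
                    "type": "Point",
                    "coordinates": [float(row["Long"]), float(row["Lat"])],
                },
                "date": datetime.strptime(column, "%m/%d/%y"),
                metric: int(value),
            }


docs = list(join_rows(LOOKUP_CSV, TIME_SERIES_CSV, "confirmed"))
print(docs[0])
```

Turning each date column into its own document is what makes a filter on `date` possible without knowing the CSV layout.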

If you would prefer to host the data yourself, the scripts required to download and transform the JHU data are open-source. You can view them and instructions for how to use them on our GitHub repository.

#Get Started

You can begin exploring the data right away without any MongoDB or programming experience using MongoDB Charts or MongoDB Compass.

With Charts, you can create visualisations of the data using any of the pre-built graphs and charts. You can then arrange this into a unique dashboard, or embed the charts in your pages or blogs.

If you want to create your own MongoDB Charts dashboard, you will need to set up your own free MongoDB Atlas cluster and import the dataset into it using the import scripts.

You can also use the code MAXIME200 to get $200 of MongoDB Atlas credit just in case you would like to try more advanced features.

Compass allows you to dig deeper into the data using the MongoDB Query Language or the Aggregation Pipeline visual editor. You can perform a range of operations on the data, including mathematical operations, comparisons, and groupings, and create documents that provide unique insights and interpretations. You can also use the output of your pipelines as data sources for your Charts.
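To give an idea of what such a pipeline can look like, here is a minimal sketch that builds one as plain Python data. It assumes the `statistics` documents carry the `date`, `combined_name`, `confirmed`, and `deaths` fields shown earlier; `fatality_ratio_pipeline` and the `ratio` field are hypothetical names chosen for this example, and the same stages could be recreated one by one in the Compass editor.

```python
from datetime import datetime


def fatality_ratio_pipeline(day, limit=5):
    """Build a pipeline listing the places with the highest deaths/confirmed ratio on `day`."""
    return [
        # Comparison: keep a single day and skip places without confirmed
        # cases, so that $divide never divides by zero.
        {"$match": {"date": day, "confirmed": {"$gt": 0}}},
        # Mathematical: compute a new field from two existing ones.
        {
            "$project": {
                "_id": 0,
                "combined_name": 1,
                "confirmed": 1,
                "deaths": 1,
                "ratio": {"$divide": ["$deaths", "$confirmed"]},
            }
        },
        {"$sort": {"ratio": -1}},
        {"$limit": limit},
    ]


pipeline = fatality_ratio_pipeline(datetime(2020, 4, 6))
for stage in pipeline:
    print(stage)
# With a connected pymongo client, the pipeline would be passed to
# db.get_collection("statistics").aggregate(pipeline).
```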

Screencast showing some of the features of MongoDB Compass for exploring Covid-19 Data

Because we store the data in MongoDB, you can also access it via the MongoDB Shell or any of our drivers. We've included examples below for Java, Node.js, and Python to get you started. Note that you only have read-only access.

mongo "mongodb+srv://covid-19.hip2i.mongodb.net/test" --username readonly --password readonly

For MongoDB Compass or your driver, you can use this connection string.

mongodb+srv://readonly:readonly@covid-19.hip2i.mongodb.net/test
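This connection string packs the same pieces as the shell command: the credentials, the cluster host, and a default database. The sketch below makes those parts explicit using only the Python standard library. It is an illustration, not the driver's own parser, and the variable names are ours.

```python
from urllib.parse import urlsplit

URI = "mongodb+srv://readonly:readonly@covid-19.hip2i.mongodb.net/test"

# urlsplit understands the generic scheme://user:password@host/path layout,
# which is enough to see what each part of the string means.
parts = urlsplit(URI)

print("scheme:  ", parts.scheme)            # the +srv suffix means the host is resolved via DNS
print("username:", parts.username)
print("password:", parts.password)
print("host:    ", parts.hostname)
print("database:", parts.path.lstrip("/"))  # default database used when none is requested
```

The default database here is `test`, which is why the examples in this article select the `covid19` database explicitly after connecting.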

In the following sections, we will show you how to consume this dataset using the Java, Node.js and Python drivers.

We will show you how to perform the following queries in each language:

  • Retrieve the last 5 days of data for a given place,
  • Retrieve all the data for the last day,
  • Make a geospatial query to retrieve data within a certain distance of a given place.
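The geospatial query deserves a word of explanation: the `$centerSphere` operator expects its radius in radians, so a distance in kilometres has to be divided by the Earth's radius in kilometres. A minimal sketch of that conversion, where `within_km` is a hypothetical helper and 6371 km is the approximate mean radius also used in the examples that follow:

```python
import math

EARTH_RADIUS_KM = 6371.0  # approximate mean Earth radius


def within_km(longitude, latitude, distance_km):
    """Build a filter matching documents whose `loc` lies within `distance_km` of a point."""
    # $centerSphere takes [longitude, latitude] and a radius expressed in radians.
    radians = distance_km / EARTH_RADIUS_KM
    return {"loc": {"$geoWithin": {"$centerSphere": [[longitude, latitude], radians]}}}


paris = within_km(2.341908, 48.860199, 500.0)
radius = paris["loc"]["$geoWithin"]["$centerSphere"][1]
print(f"{radius:.4f} rad, about {math.degrees(radius):.2f} degrees of arc")
print(paris)
```

The resulting dictionary can be combined with other conditions, such as a `date`, in the same filter document.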

#Accessing the Data with Java

Our Java examples are available in our GitHub Repository's Java folder.

#With the MongoDB Driver

Here is the main class of our Java example. You also need the three POJOs from the Java GitHub folder to make it work.

package com.mongodb.coronavirus;

import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.codecs.configuration.CodecRegistry;
import org.bson.codecs.pojo.PojoCodecProvider;
import org.bson.conversions.Bson;

import java.util.Date;

import static com.mongodb.client.model.Filters.*;
import static com.mongodb.client.model.Sorts.descending;
import static org.bson.codecs.configuration.CodecRegistries.fromProviders;
import static org.bson.codecs.configuration.CodecRegistries.fromRegistries;

public class MongoDB {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(getMongoClient())) {
            int earthRadius = 6371;
            MongoDatabase db = client.getDatabase("covid19");
            MongoCollection<Stat> statsCollection = db.getCollection("statistics", Stat.class);
            MongoCollection<Metadata> metadataCollection = db.getCollection("metadata", Metadata.class);

            System.out.println("Query to get the last 5 entries for France (continent only)");
            Bson franceFilter = eq("country", "France");
            Bson noStateFilter = eq("state", null);
            statsCollection.find(and(franceFilter, noStateFilter)).sort(descending("date")).limit(5).forEach(System.out::println);

            System.out.println("\nQuery to get the last day data (limited to 5 docs here).");
            Metadata metadata = metadataCollection.find().first();
            Date lastDate = metadata.getLastDate();
            Bson lastDayFilter = eq("date", lastDate);
            statsCollection.find(lastDayFilter).limit(5).forEach(System.out::println);

            System.out.println("\nQuery to get the last day data for all the countries within 500km of Paris.");
            Bson aroundParisFilter = geoWithinCenterSphere("loc", 2.341908, 48.860199, 500.0 / earthRadius);
            statsCollection.find(and(lastDayFilter, aroundParisFilter)).forEach(System.out::println);

            System.out.println("\nPrint the Metadata summary.");
            metadataCollection.find().forEach(System.out::println);
        }
    }

    private static MongoClientSettings getMongoClient() {
        String connectionString = "mongodb+srv://readonly:readonly@covid-19.hip2i.mongodb.net/test";
        CodecRegistry pojoCodecRegistry = fromProviders(PojoCodecProvider.builder().automatic(true).build());
        CodecRegistry codecRegistry = fromRegistries(MongoClientSettings.getDefaultCodecRegistry(), pojoCodecRegistry);
        return MongoClientSettings.builder()
                                  .applyConnectionString(new ConnectionString(connectionString))
                                  .codecRegistry(codecRegistry)
                                  .build();
    }
}

#Accessing the Data with Node.js

Our Node.js examples are available in our GitHub Repository's Node.js folder.

#With the MongoDB Driver

Check out the instructions in the Node.js folder.

const MongoClient = require("mongodb").MongoClient;

const uri =
  "mongodb+srv://readonly:readonly@covid-19.hip2i.mongodb.net/test?retryWrites=true&w=majority";

const client = new MongoClient(uri, {
  useNewUrlParser: true,
  useUnifiedTopology: true,
});

client.connect((err) => {
  if (err) {
    // Stop here if the connection failed.
    console.error(err);
    return;
  }

  const statistics = client.db("covid19").collection("statistics");

  // Find the latest 15 entries for France: sort by date, newest first.
  statistics
    .find({ country: "France" })
    .sort({ date: -1 })
    .limit(15)
    .toArray(function (err, docs) {
      if (err) {
        console.error(err);
      } else {
        console.log(docs);
      }
      client.close();
    });
});

#Accessing the Data with Python

Our Python examples are available in our GitHub Repository's Python folder.

#With the MongoDB Driver

See all the instructions to get started in the Python folder.

#!python3

import pymongo
from pymongo import MongoClient
from tabulate import tabulate

EARTH_RADIUS = 6371.0
MDB_URL = "mongodb+srv://readonly:readonly@covid-19.hip2i.mongodb.net/test?retryWrites=true&w=majority"


def main():
    client = MongoClient(MDB_URL)
    db = client.get_database("covid19")
    stats = db.get_collection("statistics")
    metadata = db.get_collection("metadata")

    # Get some results for the UK:
    print("\nMost recent 10 statistics for the UK:")
    results = (
        stats.find({"country": "United Kingdom", "state": None})
        .sort("date", pymongo.DESCENDING)
        .limit(10)
    )
    print_table(["date", "confirmed", "deaths"], results)

    # Get the last date loaded:
    meta = metadata.find_one()
    last_date = meta["last_date"]

    # Show the 5 locations with the most recovered cases:
    print("\nThe last day's highest reported recoveries:")
    results = (
        stats.find({"date": last_date}).sort("recovered", pymongo.DESCENDING).limit(5)
    )
    print_table(["combined_name", "recovered"], results)

    # Confirmed cases for all countries within 500km of Paris:
    print(
        "\nThe last day's confirmed cases for all the countries within 500km of Paris:"
    )
    results = stats.find(
        {
            "date": last_date,
            "loc": {
                "$geoWithin": {
                    "$centerSphere": [[2.341908, 48.860199], 500.0 / EARTH_RADIUS]
                }
            },
        }
    )
    print_table(["combined_name", "confirmed"], results)


def print_table(doc_keys, search_results, headers=None):
    """
    Utility function to print a query result as a table.

    Params:
        doc_keys: A list of keys for data to be extracted from each document.
        search_results: A MongoDB cursor.
        headers: A list of headers for the table. If not provided, attempts to
            generate something sensible from the provided `doc_keys`
    """
    if headers is None:
        headers = [key.replace("_", " ").replace("-", " ").title() for key in doc_keys]
    records = (extract_tuple(doc, doc_keys) for doc in search_results)
    print(tabulate(records, headers=headers))


def extract_tuple(mapping, keys):
    """
    Extract a tuple from a mapping by requesting a sequence of keys.

    Missing keys will result in `None` values in the resulting tuple.
    """
    return tuple([mapping.get(key) for key in keys])


if __name__ == "__main__":
    main()

#Wrap up

We see the value and importance of making this data as readily available to everyone as possible, so we're not stopping here. Over the coming days, we'll be adding a GraphQL and REST API, as well as making the data available within Excel and Google Sheets.

We've also launched an Atlas credits program for anyone working on detecting, understanding, and stopping the spread of COVID-19.

If you have any questions, suggestions, or would like any assistance working with the data, we're always available on the community forums. You can also reach out to Aaron, Joe, Mark, and Maxime directly on Twitter.
