A day in the life of the Internet
We have roughly 250 servers around the world, and they ping each other every hour. This used to be moderately reasonable when we had many fewer servers (it's basically a variant of the Handshake Problem, where order matters so it's `n * (n-1)`), but now we're generating tens of thousands of rows per hour. We save all that data to a database, then move it up in chunks to S3 and Athena once a month to avoid storing it all locally. At present we have 2,852,372,148 rows of data.
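To put a number on "tens of thousands", here is a quick Python sketch of the ordered-pair count. The figure of 250 is the approximate fleet size quoted above, not an exact count, and `ordered_pairs` is just an illustrative helper name.

```python
from itertools import permutations


def ordered_pairs(n: int) -> int:
    """Number of (source, destination) pairs when every server pings every other one."""
    return n * (n - 1)


# Cross-check the formula against brute-force enumeration for a small fleet.
assert ordered_pairs(5) == len(list(permutations(range(5), 2)))

# 250 is the rough server count mentioned in the text (an approximation).
rows_per_hour = ordered_pairs(250)
print(rows_per_hour)       # 62250
print(rows_per_hour * 24)  # 1494000
```

Order matters because a ping from A to B is recorded separately from a ping from B to A, which is why the count is `n * (n-1)` rather than half of that.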
We'd like to share some of it, so here's the data from July 19th & 20th, 2020:
Using It
The pings should look like this:
"source","destination","timestamp","min","avg","max","mdev"
"0","285","2020-07-20 00:00:17.000","307.86","308.256","309.32","0.842"
"2","285","2020-07-20 00:00:17.000","180.694","182.514","221.879","7.351"
"3","285","2020-07-20 00:00:17.000","128.309","129.832","152.21","4.398"
"4","285","2020-07-20 00:00:17.000","106.727","107.544","116.499","2.045"
"6","285","2020-07-20 00:00:17.000","327.508","327.723","328.113","0.138"
Those columns are:
Column | Description |
---|---|
source | The ID of the server running the ping |
destination | The ID of the server being pinged |
timestamp | The date and time the ping was initiated |
min | The fastest ping time (round-trip time) observed |
avg | The average ping time observed |
max | The slowest ping time observed |
mdev | The standard deviation of the observed ping times |
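If you'd rather work with typed values than raw strings, a minimal Python sketch for reading the pings file might look like the following. The `Ping` class and `parse_pings` function are hypothetical names for illustration, and the sample rows are copied from the excerpt above.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ping:
    source: int
    destination: int
    timestamp: datetime
    min: float
    avg: float
    max: float
    mdev: float


def parse_pings(handle):
    """Yield one Ping per CSV row, converting each column to a typed value."""
    for row in csv.DictReader(handle):
        yield Ping(
            source=int(row["source"]),
            destination=int(row["destination"]),
            # Timestamps in the sample look like "2020-07-20 00:00:17.000".
            timestamp=datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S.%f"),
            min=float(row["min"]),
            avg=float(row["avg"]),
            max=float(row["max"]),
            mdev=float(row["mdev"]),
        )


SAMPLE = '''\
"source","destination","timestamp","min","avg","max","mdev"
"0","285","2020-07-20 00:00:17.000","307.86","308.256","309.32","0.842"
"2","285","2020-07-20 00:00:17.000","180.694","182.514","221.879","7.351"
'''

pings = list(parse_pings(io.StringIO(SAMPLE)))
print(pings[0])
```

For the real file you would pass an open file handle instead of the `io.StringIO` wrapper.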
To build those records, we run `ping -c 30 <server hostname>` and scoop up the summary at the end. (If you want to read more about how `ping` works, Sean Connery will talk you through it.)
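As a rough illustration of "scooping up the summary", here is a Python sketch that pulls the four numbers out of a summary line. It assumes the Linux iputils format (`rtt min/avg/max/mdev = ... ms`); other `ping` implementations word this line differently. The function name and the example line are illustrative, with the numbers borrowed from the first sample record rather than captured from a real run, and this is not necessarily how our own collector does it.

```python
import re
from typing import Optional

# Assumes the iputils summary line, e.g.:
#   rtt min/avg/max/mdev = 307.860/308.256/309.320/0.842 ms
SUMMARY_RE = re.compile(
    r"rtt min/avg/max/mdev = "
    r"(?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+)/(?P<mdev>[\d.]+) ms"
)


def parse_ping_summary(output: str) -> Optional[dict]:
    """Return min/avg/max/mdev as floats, or None if no summary line is found."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


# Illustrative line built from the first sample record, not real ping output.
EXAMPLE_OUTPUT = "rtt min/avg/max/mdev = 307.860/308.256/309.320/0.842 ms\n"

print(parse_ping_summary(EXAMPLE_OUTPUT))
```

Returning `None` when the line is missing leaves it to the caller to decide how to record a ping that never completed.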
You can look `source` and `destination` up in the servers file, which looks like this:
"id","name","title","location","state","country","state_abbv","continent","latitude","longitude"
"0","JoaoPessoa","Joao Pessoa","Patos","Paraiba","Brazil","PB","2","-7.0833","-34.8333"
"1","Melbourne","Melbourne","Melbourne","Victoria","Australia","VIC","4","-37.7833","144.9667"
"2","Toronto","Toronto","Toronto","Ontario","Canada","ON","1","43.6481","-79.4042"
"3","Prague","Prague","Prague","Prague","Czech Republic","","3","50.0833","14.4167"
"4","Paris","Paris","Paris","Ile-de-France","France","IDF","3","48.8742","2.347"
"6","Tokyo","Tokyo","Tokyo","Tokyo","Japan","","3","35.6833","139.7667"
Those columns are:
Column | Description |
---|---|
id | The server ID (this will match source and destination from the pings) |
name | Our internal name for the server (usually title without spaces) |
title | A more readable (and technically correct) name for the server |
location | Rough actual location. We often name servers for major cities, so we identify the suburb here if we know it. |
state | The state or province the server resides in, if applicable. (This stuff gets really fun with different parts of the world recognizing different levels of government division, ask Mari about it.) |
country | The country/administrative region the server is in |
state_abbv | If there is a state, its abbreviation |
continent | The ID of the continent, using the 5-continent model of the world. |
latitude | The latitude in degrees of the server (generally the latitude of the city the server is in) |
longitude | The longitude in degrees of the server (generally the longitude of the city the server is in) |
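Looking IDs up by hand gets old quickly, so here is a small Python sketch that joins ping rows to server titles. The helper names (`load_servers`, `label`) are made up for this example, and the rows are a subset of the excerpts above. Server 285 is not in that subset, so the sketch falls back to a placeholder label for it.

```python
import csv
import io

SERVERS_CSV = '''\
"id","name","title","location","state","country","state_abbv","continent","latitude","longitude"
"0","JoaoPessoa","Joao Pessoa","Patos","Paraiba","Brazil","PB","2","-7.0833","-34.8333"
"2","Toronto","Toronto","Toronto","Ontario","Canada","ON","1","43.6481","-79.4042"
'''

PINGS_CSV = '''\
"source","destination","timestamp","min","avg","max","mdev"
"0","285","2020-07-20 00:00:17.000","307.86","308.256","309.32","0.842"
"2","285","2020-07-20 00:00:17.000","180.694","182.514","221.879","7.351"
'''


def load_servers(handle):
    """Index the servers file by integer ID for quick lookups."""
    return {int(row["id"]): row for row in csv.DictReader(handle)}


def label(server_id, servers):
    """Return the server's title, or a placeholder if the ID is unknown."""
    server = servers.get(server_id)
    return server["title"] if server else f"server {server_id}"


servers = load_servers(io.StringIO(SERVERS_CSV))
for row in csv.DictReader(io.StringIO(PINGS_CSV)):
    src = label(int(row["source"]), servers)
    dst = label(int(row["destination"]), servers)
    print(f"{src} -> {dst}: avg {row['avg']}")
```

With the full servers file loaded, the placeholder branch should only trigger for IDs that are genuinely missing from it.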
Notes
- We haven't done any data integrity checking on this data set. We just happened to share it with someone else and thought it might be more broadly useful.
- We work with data like these in MySQL. Importing the CSV files back into a relational database may make them easier to manipulate.
- We use the spherical law of cosines to calculate the great circle distance between two servers like this:
$lat1; // server 1 latitude in radians
$lon1; // server 1 longitude in radians
$lat2; // server 2 latitude in radians
$lon2; // server 2 longitude in radians
$radius = 6372.8; //Earth radius in kilometers
$dist = sin($lat1) * sin($lat2) + cos($lat1) * cos($lat2) * cos($lon1 - $lon2);
$dist = acos($dist) * $radius;
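For anyone not working in PHP, here is a Python translation of the same spherical law of cosines calculation. It is a sketch rather than our production code: the function name is illustrative, the inputs are taken in degrees (as they appear in the servers file) and converted to radians, and the clamp before `acos` is an added safeguard against floating-point rounding that is not in the snippet above.

```python
import math

EARTH_RADIUS_KM = 6372.8  # same Earth radius as the snippet above


def great_circle_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great circle distance in kilometers via the spherical law of cosines."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg)
    )
    cos_angle = (
        math.sin(lat1) * math.sin(lat2)
        + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2)
    )
    # Rounding can push the value a hair outside [-1, 1], which would make
    # math.acos raise ValueError, so clamp it first (an addition for safety).
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


# Coordinates copied from the servers excerpt above.
toronto = (43.6481, -79.4042)
paris = (48.8742, 2.347)
print(round(great_circle_km(*toronto, *paris)), "km")
```

Dividing a ping's round-trip time by twice this distance gives a rough sense of how far a route strays from the straight-line path.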
Request
If you do anything with these data, please credit & link to us as a data source. If you'd like to start pinging your own stuff, you can do that with our Where's it Up? API (which is what we're using). If you need to test your website from around the world, use WonderProxy (that's us).
Showcase
Check out some folks doing interesting things with this ping data. (Note: we're not affiliated with them, we just think they're neat.)
- The folks at 15til.com are interested in low-latency music making on the internet, and they've used our ping data to create a latency network visualization.
- Utku Demir published an article on Choosing cloud regions for lower latency.