
A day in the life of the Internet

Paul Reinheimer Oct 28, 2020 Development

We have roughly 250 servers around the world, and they ping each other every hour. This was reasonable enough when we had far fewer servers (it's basically a variant of the Handshake Problem, except that order matters, so there are n * (n-1) pings rather than n * (n-1) / 2), but now we're generating tens of thousands of rows per hour. We save all that data to a database, then once a month move it up in chunks to S3 and Athena to avoid storing it all locally. At present we have 2,852,372,148 rows of data.
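
To put numbers on that growth, here is the arithmetic as a small PHP sketch. The server count of 250 is the approximate figure above, and `pings_per_hour()` is a hypothetical helper for illustration, not part of our tooling.

```php
<?php
// Every server pings every other server, and A -> B is a separate row from
// B -> A, so we count ordered pairs: n * (n - 1).
function pings_per_hour(int $servers): int
{
    return $servers * ($servers - 1);
}

$hourly = pings_per_hour(250); // roughly 250 servers
$daily  = $hourly * 24;        // one full round of pings every hour

printf("%d rows per hour, %d rows per day\n", $hourly, $daily);
```

At roughly 250 servers that is 62,250 rows every hour, which is why the table grows so quickly.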

We'd like to share some of it, so here's the data from July 19th & 20th, 2020:

Using It

The pings file should look like this:

"source","destination","timestamp","min","avg","max","mdev"
"0","285","2020-07-20 00:00:17.000","307.86","308.256","309.32","0.842"
"2","285","2020-07-20 00:00:17.000","180.694","182.514","221.879","7.351"
"3","285","2020-07-20 00:00:17.000","128.309","129.832","152.21","4.398"
"4","285","2020-07-20 00:00:17.000","106.727","107.544","116.499","2.045"
"6","285","2020-07-20 00:00:17.000","327.508","327.723","328.113","0.138"

Those columns are:

Column       Description
source       The ID of the server running the ping
destination  The ID of the server being pinged
timestamp    The date and time the ping was initiated
min          The fastest round-trip time observed, in milliseconds
avg          The average round-trip time, in milliseconds
max          The slowest round-trip time observed, in milliseconds
mdev         The standard deviation of the individual round-trip times, in milliseconds (ping labels this mdev)
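
If you'd rather script against the file than import it into a database, a minimal PHP sketch of reading the rows looks like this. It assumes the file is small enough to hold in memory (for the full file you would stream it with `fopen()` and `fgetcsv()`), and the three rows are copied from the sample above. The timestamp is kept as text because the file doesn't state its time zone.

```php
<?php
// Three rows copied from the sample above. For the full file, read it row by
// row with fopen() and fgetcsv() instead of holding it in a string.
$csv = <<<'CSV'
"source","destination","timestamp","min","avg","max","mdev"
"0","285","2020-07-20 00:00:17.000","307.86","308.256","309.32","0.842"
"2","285","2020-07-20 00:00:17.000","180.694","182.514","221.879","7.351"
"3","285","2020-07-20 00:00:17.000","128.309","129.832","152.21","4.398"
CSV;

$lines = explode("\n", trim($csv));
// Pass the escape character explicitly; '' disables PHP's backslash escaping
$header = str_getcsv(array_shift($lines), ',', '"', '');

$pings = [];
foreach ($lines as $line) {
    // Pair each value with its column name, then cast IDs and timings
    $row = array_combine($header, str_getcsv($line, ',', '"', ''));
    $pings[] = [
        'source'      => (int) $row['source'],
        'destination' => (int) $row['destination'],
        'timestamp'   => $row['timestamp'], // time zone not stated, kept as text
        'min'         => (float) $row['min'],
        'avg'         => (float) $row['avg'],
        'max'         => (float) $row['max'],
        'mdev'        => (float) $row['mdev'],
    ];
}

// Example query: sort by average round-trip time, fastest first
usort($pings, fn(array $a, array $b): int => $a['avg'] <=> $b['avg']);
printf("Fastest source: %d (%.3f ms avg)\n", $pings[0]['source'], $pings[0]['avg']);
```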

To build those records, we run ping -c 30 <server hostname> (30 echo requests) and scoop up the summary at the end. (If you want to read more about how ping works, Sean Connery will talk you through it.)
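
The summary is the last line ping prints. Here is a minimal sketch of pulling the four numbers out of it, assuming the Linux iputils format (`rtt min/avg/max/mdev = ... ms`). Since we haven't said which ping implementation runs on each server, the regex also accepts the BSD/macOS `round-trip min/avg/max/stddev` wording. `parse_ping_summary()` is a hypothetical name, and the sample line is reconstructed from the first data row above rather than captured output.

```php
<?php
// Pull min/avg/max/mdev (milliseconds) out of ping's final summary line.
// Assumes Linux iputils output ("rtt min/avg/max/mdev = ..."); the BSD/macOS
// variant ("round-trip min/avg/max/stddev = ...") is accepted as well.
function parse_ping_summary(string $output): ?array
{
    $pattern = '~(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = '
             . '([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms~';
    if (!preg_match($pattern, $output, $m)) {
        return null; // no summary line, e.g. when every packet was lost
    }
    return [
        'min'  => (float) $m[1],
        'avg'  => (float) $m[2],
        'max'  => (float) $m[3],
        'mdev' => (float) $m[4],
    ];
}

// Illustrative summary line reconstructed from the first sample row above
$summary = "rtt min/avg/max/mdev = 307.860/308.256/309.320/0.842 ms\n";
print_r(parse_ping_summary($summary));
```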

You can look up the source and destination IDs in the servers file, which looks like this:

"id","name","title","location","state","country","state_abbv","continent","latitude","longitude"
"0","JoaoPessoa","Joao Pessoa","Patos","Paraiba","Brazil","PB","2","-7.0833","-34.8333"
"1","Melbourne","Melbourne","Melbourne","Victoria","Australia","VIC","4","-37.7833","144.9667"
"2","Toronto","Toronto","Toronto","Ontario","Canada","ON","1","43.6481","-79.4042"
"3","Prague","Prague","Prague","Prague","Czech Republic","","3","50.0833","14.4167"
"4","Paris","Paris","Paris","Ile-de-France","France","IDF","3","48.8742","2.347"
"6","Tokyo","Tokyo","Tokyo","Tokyo","Japan","","3","35.6833","139.7667"


Those columns are:

Column      Description
id          The server ID (this matches source and destination in the pings file)
name        Our internal name for the server (usually the title without spaces)
title       A more readable (and technically correct) name for the server
location    Rough actual location. We often name servers for major cities, so we identify the suburb here if we know it.
state       The state or province the server resides in, if applicable. (This gets really fun because different parts of the world recognize different levels of government division; ask Mari about it.)
country     The country or administrative region the server is in
state_abbv  The state's abbreviation, if there is a state
continent   The ID of the continent, using the 5-continent model of the world
latitude    The server's latitude in degrees (generally the latitude of the city the server is in)
longitude   The server's longitude in degrees (generally the longitude of the city the server is in)
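
Joining the two files is just a lookup on id. A minimal sketch, assuming both files fit in memory: `parse_csv()` and `describe_ping()` are hypothetical helpers, and the rows are copied from the samples above. Server 285 doesn't appear in the servers excerpt, so the sketch falls back to printing its raw ID.

```php
<?php
// Two rows copied from the servers sample above
$serversCsv = <<<'CSV'
"id","name","title","location","state","country","state_abbv","continent","latitude","longitude"
"2","Toronto","Toronto","Toronto","Ontario","Canada","ON","1","43.6481","-79.4042"
"4","Paris","Paris","Paris","Ile-de-France","France","IDF","3","48.8742","2.347"
CSV;

// Hypothetical helper: parse a small CSV string into associative arrays
function parse_csv(string $csv): array
{
    $lines  = explode("\n", trim($csv));
    $header = str_getcsv(array_shift($lines), ',', '"', '');
    return array_map(
        fn(string $line): array => array_combine($header, str_getcsv($line, ',', '"', '')),
        $lines
    );
}

// Index servers by id so each ping can look up both ends directly
$servers = array_column(parse_csv($serversCsv), null, 'id');

// Hypothetical helper: label a ping row with server names, falling back to
// the raw ID when a server is missing from the servers file
function describe_ping(array $ping, array $servers): string
{
    $label = fn(string $id): string => isset($servers[$id])
        ? "{$servers[$id]['title']}, {$servers[$id]['country']}"
        : "server #$id";
    return $label($ping['source']) . ' -> ' . $label($ping['destination'])
        . " (avg {$ping['avg']} ms)";
}

// Second sample ping row: Toronto to server 285, which is not in this excerpt
echo describe_ping(['source' => '2', 'destination' => '285', 'avg' => '182.514'], $servers), "\n";
```

In MySQL the same thing is a pair of joins from the pings table onto the servers table, one for each end of the ping.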

Notes

  • We haven't done any data integrity checking on this data set. We just happened to share it with someone else and thought it might be more broadly useful.
  • We work with data like these in MySQL. Importing the CSV files back into a relational database may make them easier to manipulate.
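  • We use the spherical law of cosines to calculate the great circle distance between two servers like this (the formula works in radians, while the servers file stores degrees, so convert with PHP's deg2rad() first):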
// Great circle distance in kilometers between two points given in radians
function great_circle_distance($lat1, $lon1, $lat2, $lon2)
{
    $radius = 6372.8; // Earth radius in kilometers

    // Spherical law of cosines: cosine of the central angle between the points
    $dist = sin($lat1) * sin($lat2) + cos($lat1) * cos($lat2) * cos($lon1 - $lon2);
    // Clamp to [-1, 1] so rounding error can't push acos() out of range (NAN)
    return acos(max(-1.0, min(1.0, $dist))) * $radius;
}
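
Because the servers file stores coordinates in degrees, here is a self-contained sketch that takes two rows from that file, converts them, and applies the same law-of-cosines formula. `server_distance_km()` is a hypothetical helper, and the coordinates are the Toronto and Paris rows from the servers sample.

```php
<?php
// Hypothetical helper: great circle distance in km between two rows of the
// servers file. Coordinates there are in degrees, so convert them first.
function server_distance_km(array $a, array $b): float
{
    $radius = 6372.8; // Earth radius in kilometers, the same value used above

    $lat1 = deg2rad((float) $a['latitude']);
    $lon1 = deg2rad((float) $a['longitude']);
    $lat2 = deg2rad((float) $b['latitude']);
    $lon2 = deg2rad((float) $b['longitude']);

    // Spherical law of cosines, clamped so acos() never sees a value outside [-1, 1]
    $cos = sin($lat1) * sin($lat2) + cos($lat1) * cos($lat2) * cos($lon1 - $lon2);
    return acos(max(-1.0, min(1.0, $cos))) * $radius;
}

// Coordinates for Toronto (id 2) and Paris (id 4) from the servers sample
$toronto = ['latitude' => '43.6481', 'longitude' => '-79.4042'];
$paris   = ['latitude' => '48.8742', 'longitude' => '2.347'];

printf("Toronto to Paris: %.0f km\n", server_distance_km($toronto, $paris));
```

For Toronto and Paris this comes out at roughly 6,000 km, which you can then set against the round-trip times between those two servers.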

Request

If you do anything with these data, please credit & link to us as a data source. If you'd like to start pinging your own stuff, you can do that with our Where's it Up? API (which is what we're using). If you need to test your website from around the world, use WonderProxy (that's us).

Paul Reinheimer

Developer, support engineer, and occasional manager. I enjoy working on new products, listening to customers, and forgetting to bill them. Also: co-founder.