How Do You Make a GeoIP Database?

Making a GeoIP Database

To make a GeoIP database, GeoIP database providers combine a variety of data sources with the goal of being able to provide a single answer for where each IP is located.

This guide is intended to help people understand how GeoIP data comes to be, not to instruct you in creating your own.

Source: Regional Internet Registry Database

When an organization like an ISP or hosting company wants a block of IP addresses, it asks the appropriate Regional Internet Registry. In North America that's ARIN. If the request meets the registry's requirements, ARIN assigns a block of IPs to the organization and records it as the holder of those IPs. You can look up the public information with the whois command:

whois 24.120.53.94

NetRange: 24.120.0.0 - 24.120.255.255
CIDR: 24.120.0.0/16
NetName: NETBLK-24-120-0-0
NetHandle: NET-24-120-0-0-1
Parent: NET24 (NET-24-0-0-0-0)
NetType: Direct Allocation
OriginAS: AS22773
Organization: Cox Communications Inc. (CXA)
RegDate: 2001-02-21
Updated: 2014-12-09
Comment: For legal requests/assistance please use the following contact information:
Comment:
Comment: Cox Subpoena Phone: 404-269-0100
Comment:
Comment: Cox Subpoena Info: http://www.cox.com/policy/leainformation/default.asp
Ref: https://whois.arin.net/rest/net/NET-24-120-0-0-1

OrgName: Cox Communications Inc.
OrgId: CXA
Address: 1400 Lake Hearn Dr.
City: Atlanta
StateProv: GA
PostalCode: 30319
Country: US
RegDate:
Updated: 2017-01-28
Comment: For legal requests/assistance please use the
Comment: following contact information:
Comment: Cox Subpoena Phone: 404-269-0100
Comment: Cox Subpoena Info: http://www.cox.com/policy/leainformation/default.asp
Ref: https://whois.arin.net/rest/org/CXA

That IP address (24.120.53.94) has been assigned to Cox Communications, and Cox Communications has an address in Atlanta, Georgia, USA. You can also see that every IP address between 24.120.0.0 and 24.120.255.255 belongs to Cox, so the database could start by assuming all those IPs are in Atlanta, Georgia.

Extrapolating location from whois information works, in varying levels of detail, for every IP out there, which makes it an easy place to start building a database.

There are two problems with this start:
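As a rough illustration of that starting point, the sketch below maps an IP to the registered organization and address of the block that contains it. The `REGISTRY_BLOCKS` table and the `registry_guess` function are hypothetical names invented for this example; the single record simply mirrors the whois output above.

```python
import ipaddress

# Hypothetical records built from registry data:
# (CIDR block, organization, registered address).
# The single entry mirrors the whois output shown above.
REGISTRY_BLOCKS = [
    (ipaddress.ip_network("24.120.0.0/16"), "Cox Communications Inc.", "Atlanta, GA, US"),
]

def registry_guess(ip):
    """Return (organization, address) for the most specific block containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [block for block in REGISTRY_BLOCKS if addr in block[0]]
    if not matches:
        return None
    # When blocks nest, prefer the longest prefix (the smallest block).
    _, org, place = max(matches, key=lambda block: block[0].prefixlen)
    return org, place

print(registry_guess("24.120.53.94"))
```

Every address in the /16 gets the same answer here, which is exactly the weakness described next.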
  1. ARIN's terms of service specifically forbid using whois as part of a commercial product:

    You are specifically prohibited from using the Whois Service (i) as part of a commercial service or product, including the solicitation and servicing of your, or your employer's, customers, even if additional data not derived from the Whois Service is incorporated or (ii) for advertising, direct marketing, marketing research or similar purposes.

  2. That's my IP right now, and I'm not in Atlanta, Georgia: I'm in the Tropicana Hotel, Las Vegas, Nevada, almost 2,000 miles away.

Source: Hostnames

Instead of looking at whois data, you could look at the hostname for an IP address to try to deduce its location:

nslookup 24.120.53.94

Non-authoritative answer:
94.53.120.24.in-addr.arpa name = wsip-24-120-53-94.lv.lv.cox.net.

Since we know the right answer, it's easy to spot "lv" and say, oh, it's in Las Vegas. Other answers are less revealing. For example, where is this:

3.114.226.24.in-addr.arpa name = d226-114-3.home.cgocable.net.
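A hostname rule of this kind might look like the following minimal sketch. The `LABEL_HINTS` table and the `hostname_hints` function are hypothetical; a real table would be much larger and curated by hand, and the one entry here is only the "lv" guess from the example above.

```python
import re

# Hypothetical hint table mapping hostname tokens to places.
LABEL_HINTS = {
    "lv": "Las Vegas, NV, US",
}

def hostname_hints(hostname):
    """Return the places suggested by any known dot- or dash-separated token."""
    tokens = re.split(r"[.\-]", hostname.lower().rstrip("."))
    places = []
    for token in tokens:
        place = LABEL_HINTS.get(token)
        if place and place not in places:
            places.append(place)
    return places

print(hostname_hints("wsip-24-120-53-94.lv.lv.cox.net."))  # ['Las Vegas, NV, US']
print(hostname_hints("d226-114-3.home.cgocable.net."))     # []
```

The second hostname produces no hint at all, which is why hostnames can only ever be one input among several.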

Source: Traceroute

One other network tool we can look at is traceroute:

traceroute 24.120.53.94
traceroute to 24.120.53.94 (24.120.53.94), 30 hops max, 60 byte packets
1 64.34.27.81 (64.34.27.81) 2.166 ms 2.104 ms 2.083 ms
2 10ge.tor-fr402-dis-1.peer1.net (216.187.113.83) 0.377 ms 0.379 ms 0.364 ms
3 10ge.xe-0-0-0.chi-eqx-dis-1.peer1.net (216.187.114.141) 38.202 ms 38.235 ms 10ge.xe-0-2-0.chi-eqx-dis-1.peer1.net (216.187.120.130) 38.446 ms
4 10ge-xe-0-1-1.dal-eqx-cor-1.peer1.net (216.187.124.73) 38.472 ms 38.462 ms 38.431 ms
5 10ge-xe-0-0-0.dal-eqx-cor-2.peer1.net (216.187.124.134) 38.056 ms 10ge.xe-0-3-0.dal-eqx-cor-2.peer1.net (216.187.89.193) 38.021 ms 10ge-xe-0-0-0.dal-eqx-cor-2.peer1.net (216.187.124.134) 38.247 ms
6 72.51.48.90 (72.51.48.90) 38.578 ms 38.509 ms 38.465 ms
7 nwstdsrj01-ae1.0.rd.lv.cox.net (68.1.0.85) 80.414 ms 80.032 ms sestdsrj01-ae1.0.rd.lv.cox.net (68.1.0.89) 67.647 ms
8 24-234-6-17.ptp.lvcm.net (24.234.6.17) 74.373 ms 24.120.251.2 (24.120.251.2) 80.662 ms 79.977 ms
9 wsip-24-120-53-94.lv.lv.cox.net (24.120.53.94) 81.316 ms 81.466 ms 81.305 ms

The traceroute tool yields the same hostname we saw before on its last line, but if that last line weren't helpful, the previous hops might be. The second- or third-to-last hop will often tell you a good deal about the provider serving that IP.
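To make use of the earlier hops programmatically, one option is to pull out every hop that has a reverse DNS name. This is a sketch only: `named_hops` is a hypothetical helper, and it assumes the common Linux traceroute output layout of a hop number followed by `name (ip)` pairs. The sample text is the tail of the run above.

```python
import re

# Matches "name (a.b.c.d)" pairs as printed by traceroute.
HOP_RE = re.compile(r"(\S+) \((\d{1,3}(?:\.\d{1,3}){3})\)")

def named_hops(traceroute_output):
    """Return [(hop_number, hostname)] for responders that have a reverse DNS name."""
    hops = []
    for line in traceroute_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip the header line and blank lines
        hop = int(parts[0])
        for name, ip in HOP_RE.findall(parts[1]):
            # A responder with no reverse DNS is printed as "ip (ip)"; skip those.
            if name != ip and (hop, name) not in hops:
                hops.append((hop, name))
    return hops

sample = """traceroute to 24.120.53.94 (24.120.53.94), 30 hops max, 60 byte packets
6 72.51.48.90 (72.51.48.90) 38.578 ms 38.509 ms 38.465 ms
7 nwstdsrj01-ae1.0.rd.lv.cox.net (68.1.0.85) 80.414 ms 80.032 ms sestdsrj01-ae1.0.rd.lv.cox.net (68.1.0.89) 67.647 ms
8 24-234-6-17.ptp.lvcm.net (24.234.6.17) 74.373 ms 24.120.251.2 (24.120.251.2) 80.662 ms 79.977 ms
9 wsip-24-120-53-94.lv.lv.cox.net (24.120.53.94) 81.316 ms 81.466 ms 81.305 ms"""

for hop, name in named_hops(sample):
    print(hop, name)
```

The names from hops 7 and 8 could then be fed through the same kind of hostname rules used for the target address itself.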

Source: Ping

Ping is a simple tool that tells you how long it takes to get a message to a remote server and get a response. Ping data is helpful when the time is very short, but less useful when it is long. The reasoning is: if a ping takes very little time, the server being pinged must be close. If a ping takes a while, the server could be far away, or the traffic could have been routed poorly. In one example, we saw packets travelling between our Baltimore and New York datacenters go first through Paris, France, then London, UK, and only then over to New York.

For this to work well, you need to know for certain where at least one server is, and ideally several. I know exactly where our webserver is (I took the physical server to the building and put it on the rack), and I trust that our providers give us accurate locations (some do lie; when that happens we find alternate providers).

According to some GeoIP providers, the IP 185.139.237.128 is in Kuwait, but if I ping it from our Frankfurt location I see:

wproxy@frankfurt *** 15:06:22 *** ~
> ping 185.139.237.128
PING 185.139.237.128 (185.139.237.128) 56(84) bytes of data.
64 bytes from 185.139.237.128: icmp_seq=1 ttl=60 time=1.54 ms
64 bytes from 185.139.237.128: icmp_seq=2 ttl=60 time=1.08 ms
64 bytes from 185.139.237.128: icmp_seq=3 ttl=60 time=0.822 ms
64 bytes from 185.139.237.128: icmp_seq=4 ttl=60 time=0.934 ms

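Light travels about 120.9 miles per millisecond through fibre, and a ping time covers the round trip, so a reply in under 1 ms means the server is at most about 60 miles away. It's about 2,500 miles from Frankfurt to Kuwait, so that location is not physically possible.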
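The arithmetic can be written as a small check. This is a sketch using the fibre speed quoted above; `max_distance_miles` and `location_is_plausible` are hypothetical helper names, and the round-trip times are the four replies from the Frankfurt ping.

```python
# Speed of light in fibre as quoted in the text, in miles per millisecond.
FIBRE_MILES_PER_MS = 120.9

def max_distance_miles(rtt_ms, miles_per_ms=FIBRE_MILES_PER_MS):
    """Upper bound on one-way distance: the signal covers the path twice in rtt_ms."""
    return rtt_ms / 2 * miles_per_ms

def location_is_plausible(rtt_ms, claimed_distance_miles):
    """False when the claimed distance exceeds what the round-trip time allows."""
    return claimed_distance_miles <= max_distance_miles(rtt_ms)

rtts_ms = [1.54, 1.08, 0.822, 0.934]
best = min(rtts_ms)  # the fastest reply gives the tightest bound

print(round(max_distance_miles(best), 1))   # 49.7
print(location_is_plausible(best, 2500))    # False
```

The bound only works in one direction: a fast ping rules out distant locations, but a slow ping rules out nothing, because routing detours and queueing add delay.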

Ping is a great way to prove some of your data points wrong.

Putting it all together

Building a database will involve combining all the data sources to form a single authoritative answer for each IP. A good start would be:

  1. License data from the five regional internet registry groups: AFRINIC, ARIN, APNIC, LACNIC, and RIPE NCC to start with some data for all IPs
  2. Put real effort into a good UI that would allow someone to manually look at all the data we've generated for an IP, and make a determination on its location
  3. Start running some hostname checks on each block of IPs, and generate some rules to turn those into approximate locations
  4. Combine hostname data with traceroutes; it's common to use local airport codes in hostnames for big routers, which can help speed processing.
  5. Use servers with known locations to spot-test some purportedly nearby systems with ping
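One possible way to combine these inputs is sketched below: rank the sources, then let a ping-derived distance bound veto any candidate that is too far from a probe server. Everything here is an assumption made for illustration, including the `SOURCE_PRIORITY` ranking, the `Candidate` type, the `pick_location` function, and the rough distances from an imagined probe in Los Angeles.

```python
from dataclasses import dataclass

# Hypothetical ranking, higher wins: manual review beats hostname rules,
# which beat the registry's address of record.
SOURCE_PRIORITY = {"registry": 1, "hostname": 2, "manual": 3}

@dataclass
class Candidate:
    source: str              # a key in SOURCE_PRIORITY
    place: str
    miles_from_probe: float  # distance from a probe server with a known location

def pick_location(candidates, max_miles_from_probe=None):
    """Pick the highest-priority candidate that the ping bound does not rule out."""
    viable = [
        c for c in candidates
        if max_miles_from_probe is None or c.miles_from_probe <= max_miles_from_probe
    ]
    if not viable:
        return None
    return max(viable, key=lambda c: SOURCE_PRIORITY[c.source]).place

# Illustrative distances only, measured from an imagined probe in Los Angeles.
candidates = [
    Candidate("registry", "Atlanta, GA, US", 1900.0),
    Candidate("hostname", "Las Vegas, NV, US", 230.0),
]

print(pick_location(candidates))                            # Las Vegas, NV, US
print(pick_location(candidates, max_miles_from_probe=100))  # None
```

Returning `None` when every candidate is ruled out is a deliberate choice in this sketch: those are the IPs worth sending to the manual review UI from step 2.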