Friday, February 28, 2014

The Iditarod track file

With Melinda away in London, a quick post from me, Chris. Over the last few weeks, I've been asked whether I could provide an Iditarod track file and calculate distances between the checkpoints directly from it. (I do this sort of thing in my work all the time, for science.) I thought this sounded like a good idea. In practice, it was a little more involved than I expected. So here's a little bit of information about track files, and you can download some at the end of this post.
First, let's clarify "track file". This is a non-specialist term for a file that is used to put a track on a map, with waypoints (for example checkpoints) and optional information (names, even images) enclosed. In geospatial jargon such files are called "vector files", which simply means that they contain collections of simplified real-life entities that can be represented as basic geometric objects: points, lines and polygons (the area enclosed by closed rings of lines, such as triangles, rectangles, or irregular shapes). For example, each tree in the forest outside my window could be represented as a point, whereas the area of the forest would correspond to a polygon -- or maybe a multipolygon (several non-overlapping polygons) if the forest consists of multiple wooded islands. The general term for these things is "feature", and the most common feature types are point, linestring (a sequence of points that make up a line), polygon (a sequence of lines that enclose an area), and the multi- versions of each (multipoint, multilinestring, multipolygon). Vector files contain the coordinates that define each feature. (Oh, right, we also need a coordinate system -- there are many, latitude/longitude on an approximate Earth ellipsoid being very common. Mapping is a surprisingly complicated topic that can, luckily, be left to the software we use, most of the time.)
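To make the feature types a bit more concrete, here is a minimal Python sketch of a point, a linestring and a polygon, written in the style of GeoJSON (one of the formats discussed further down). The coordinates are made up for illustration, and the helper `count_vertices` is a hypothetical name, not part of any library.

```python
# Hypothetical coordinates, given as [longitude, latitude] as GeoJSON does.
tree = {"type": "Point", "coordinates": [-147.85, 64.86]}

trail = {
    "type": "LineString",
    "coordinates": [[-147.85, 64.86], [-147.80, 64.88], [-147.72, 64.89]],
}

# A polygon is a list of rings; each ring is closed (first point == last point).
forest = {
    "type": "Polygon",
    "coordinates": [[
        [-147.90, 64.85], [-147.70, 64.85], [-147.70, 64.90],
        [-147.90, 64.90], [-147.90, 64.85],
    ]],
}


def count_vertices(geometry):
    """Count the coordinate pairs of a Point, LineString or Polygon geometry."""
    coords = geometry["coordinates"]
    if geometry["type"] == "Point":
        return 1
    if geometry["type"] == "LineString":
        return len(coords)
    if geometry["type"] == "Polygon":
        return sum(len(ring) for ring in coords)
    raise ValueError("unsupported geometry type: " + geometry["type"])


for name, geom in [("tree", tree), ("trail", trail), ("forest", forest)]:
    print(name, geom["type"], count_vertices(geom))
```

The multi- versions follow the same pattern with one more level of nesting, which is why software usually handles them with the same code paths.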
With this rough understanding of what kind of file we're dealing with, what types of track file could we get for the Iditarod 2014? Well, the specific type depends on the file's purpose:

  • for science and map making, the data is usually stored in what amounts to little databases that combine both the geographic coordinates of the features and a table -- sometimes a large table -- of information about them. The most common are: ESRI Shapefile (a proprietary binary format, but with an open specification), GeoJSON (which is human-readable), or formats requiring full-blown database software (PostGIS, Spatialite...). These files require quite specialized software to work with.
  • for GPS tracking, a variety of text-based formats are available, the easiest to use being GPX.
  • for consumer-accessible web mapping, mostly under the influence of Google's Maps and Earth products, Google's Keyhole Markup Language (KML) format has become widespread (and KMZ, which is just KML + some extra overlay resources, zipped together).
KML is similar to the first category, but is not really made for storing a lot of feature attributes in a standard way. It also contains a lot of extra information related to the presentation -- the colour of the lines, links to little icons, the order in which the various feature layers should be displayed. GeoJSON often does for open-source web mapping what KML does for Google Maps, but is also a nice alternative to shapefiles. 
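Of the formats listed above, GPX is probably the simplest to peek inside, since it is plain XML with waypoints (`wpt`) and track points (`trkpt`). The following is a minimal sketch using only Python's standard library; the sample document, its names and its coordinates are invented for illustration, and `read_waypoints` and `read_track_points` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# A tiny, made-up GPX document: two waypoints and one track with three points.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="61.75" lon="-150.05"><name>Start</name></wpt>
  <wpt lat="61.90" lon="-150.90"><name>Checkpoint A</name></wpt>
  <trk>
    <name>Example track</name>
    <trkseg>
      <trkpt lat="61.75" lon="-150.05"/>
      <trkpt lat="61.80" lon="-150.40"/>
      <trkpt lat="61.90" lon="-150.90"/>
    </trkseg>
  </trk>
</gpx>
"""


def read_waypoints(gpx_text):
    """Return (name, latitude, longitude) for every waypoint in the GPX text."""
    root = ET.fromstring(gpx_text)
    waypoints = []
    for wpt in root.findall("gpx:wpt", GPX_NS):
        name = wpt.findtext("gpx:name", default="", namespaces=GPX_NS)
        waypoints.append((name, float(wpt.get("lat")), float(wpt.get("lon"))))
    return waypoints


def read_track_points(gpx_text):
    """Return (latitude, longitude) for every track point, in file order."""
    root = ET.fromstring(gpx_text)
    points = root.findall("gpx:trk/gpx:trkseg/gpx:trkpt", GPX_NS)
    return [(float(p.get("lat")), float(p.get("lon"))) for p in points]


for name, lat, lon in read_waypoints(SAMPLE_GPX):
    print(name, lat, lon)
print(len(read_track_points(SAMPLE_GPX)), "track points")
```

Note that GPX stores latitude and longitude as separate attributes, whereas GeoJSON uses [longitude, latitude] pairs, so the order is easy to mix up when moving between the two.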
As for the Iditarod trail, a simple web search shows that KML files are widely available. We want one with the track for the northern route, that is, an even-numbered year. Well, here is one for 2008. If you download the file and have Google Earth installed, it will open directly. But if you look inside, it turns out that the location data isn't actually contained in the file, but in a different one that is imported via the web. This is an example of how Google KML files are just a lot more flexible -- or messy -- than file formats made for professional or scientific applications. But we have tools to transform one format into others. Without going into any detail, the most powerful ones are a set of command-line tools distributed with the Geospatial Data Abstraction Library (GDAL) and, for making GPX files (and putting the geospatial data files on a map), the online GPSVisualizer (which uses GPSBabel).
With these two, and a bit of knowledge, I extracted the track and checkpoint information and converted it:
  • ... to a set of ESRI Shapefiles (zipped archive) in Latitude/Longitude coordinates [1], for the routes and the checkpoints separately (shapefiles can only contain one type of feature, so I had to separate the checkpoints (points) from the route segments (linestrings); [2])
  • ... to a GeoJSON file containing both routes and checkpoints (this is an advantage of GeoJSON over ESRI Shapefiles)
  • ... to a single GPX file, also containing both routes and checkpoints
You can right-click, save, and play around with the files. What can you do with them?
The GPX file opens in Garmin's no-cost Basecamp software (or the older Mapsource), which you can use to push it to a Garmin GPS device. I would expect the same works in other GPS software. (Maybe I get this post out early enough for some people who work on the trail to try it out!)
The GeoJSON file can just be opened in a text editor and read. It is a much cleaner version of the KML file.
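Because GeoJSON is plain text, it is also easy to read with a few lines of code. The sketch below loads a small, made-up feature collection and groups the features by geometry type, which is the same separation of checkpoints (points) and route segments (linestrings) that a shapefile needs. The sample content and the `name` property are assumptions for illustration and may not match the downloadable file; `split_by_geometry_type` is a hypothetical helper name.

```python
import json

# A made-up feature collection mixing points and a linestring.
SAMPLE_GEOJSON = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Start"},
     "geometry": {"type": "Point", "coordinates": [-150.05, 61.75]}},
    {"type": "Feature",
     "properties": {"name": "Checkpoint A"},
     "geometry": {"type": "Point", "coordinates": [-150.90, 61.90]}},
    {"type": "Feature",
     "properties": {"name": "Start to Checkpoint A"},
     "geometry": {"type": "LineString",
                  "coordinates": [[-150.05, 61.75], [-150.40, 61.80],
                                  [-150.90, 61.90]]}}
  ]
}"""


def split_by_geometry_type(collection):
    """Group the features of a FeatureCollection by their geometry type."""
    groups = {}
    for feature in collection["features"]:
        geom_type = feature["geometry"]["type"]
        groups.setdefault(geom_type, []).append(feature)
    return groups


collection = json.loads(SAMPLE_GEOJSON)
groups = split_by_geometry_type(collection)
for geom_type, features in groups.items():
    names = [f["properties"]["name"] for f in features]
    print(geom_type, names)
```

For a real file, `json.load` on an open file handle would replace `json.loads` on the sample string.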
The shapefiles are suitable for use in mapping (GIS) software. Here is a little map of both the Iditarod (northern route) and Yukon Quest trails, which I made from these shapefiles (and similar ones for the Yukon Quest) using free mapping data from Natural Earth and the free GIS software called uDig.



Last, the shapefiles or the GeoJSON file can serve to calculate distances between the checkpoints. For programmers, here is a tutorial I wrote on how to do this. For everyone else, I'll just copy and paste the result. Note that it starts in Willow.

Willow --> Yentna
  Distance in km: 50.7
  Distance in miles: 31.7
Yentna --> Skwentna
  Distance in km: 47.7
  Distance in miles: 29.8
Skwentna --> Finger Lake
  Distance in km: 56.4
  Distance in miles: 35.2
Finger Lake --> Rainy Pass
  Distance in km: 40.1
  Distance in miles: 25.0
Rainy Pass --> Rohn
  Distance in km: 52.2
  Distance in miles: 32.6
Rohn --> Nikolai
  Distance in km: 103.8
  Distance in miles: 64.9
Nikolai --> McGrath
  Distance in km: 82.0
  Distance in miles: 51.3
McGrath --> Takotna
  Distance in km: 26.2
  Distance in miles: 16.4
Takotna --> Ophir
  Distance in km: 32.7
  Distance in miles: 20.4
Ophir --> Cripple
  Distance in km: 113.4
  Distance in miles: 70.9
Cripple --> Ruby
  Distance in km: 109.7
  Distance in miles: 68.5
Ruby --> Galena
  Distance in km: 84.8
  Distance in miles: 53.0
Galena --> Nulato
  Distance in km: 76.1
  Distance in miles: 47.6
Nulato --> Kaltag
  Distance in km: 56.3
  Distance in miles: 35.2
Kaltag --> Unalakeet
  Distance in km: 115.9
  Distance in miles: 72.4
Unalakeet --> Shaktoolik
  Distance in km: 74.4
  Distance in miles: 46.5
Shaktoolik --> Koyuk
  Distance in km: 64.9
  Distance in miles: 40.5
Koyuk --> Elim
  Distance in km: 71.5
  Distance in miles: 44.7
Elim --> Golovin
  Distance in km: 42.7
  Distance in miles: 26.7
Golovin --> White Mountains
  Distance in km: 24.7
  Distance in miles: 15.4
White Mountains --> Safety
  Distance in km: 78.1
  Distance in miles: 48.8
Safety --> Nome
  Distance in km: 34.5
  Distance in miles: 21.6

Total distance: 1438.8 km -- 899.3 miles

Okay! Well, first, Unalakleet is misspelled -- because it was misspelled in the original KML. (I fixed it in the GPX and GeoJSON files.) Second, the total distance comes out a little low, but if you add an extra 5% or so (because there are only 771 route points for the whole track, so curves get cut off), it looks pretty good. Last, closer inspection shows that the very first leg, Willow-Yentna, has a very long straight line between the very first two points and is therefore particularly underestimated. So if you are interested in these distances, don't trust them blindly and compare them with what the Iditarod Trail Committee says, but go ahead and use them any way you want.
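For the curious, here is a minimal sketch of one common way to get a track length from latitude/longitude points: summing great-circle (haversine) distances between consecutive points on a spherical Earth. This is not necessarily the method behind the numbers above, the coordinates are invented, and `haversine_km` and `track_length_km` are hypothetical helper names. It also shows why a sparse track comes out short: a leg drawn as one straight segment is never longer than the same leg with a bend in it.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; a spherical approximation
KM_PER_MILE = 1.609344     # exact definition of the international mile


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (in degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_length_km(points):
    """Sum the distances between consecutive (lat, lon) points of a track."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


# Hypothetical example: the same start and end, once as a single straight
# segment and once with an intermediate point that follows a bend.
straight = [(61.75, -150.05), (61.90, -150.90)]
with_bend = [(61.75, -150.05), (61.90, -150.40), (61.90, -150.90)]

for label, pts in [("straight", straight), ("with bend", with_bend)]:
    km = track_length_km(pts)
    print(f"{label}: {km:.1f} km, {km / KM_PER_MILE:.1f} miles")
```

One small observation on units: the mile values in the list look as if they were obtained with a rounded factor of 1.6 km per mile (50.7 / 1.6 is about 31.7), whereas the exact factor of 1.609344 gives slightly smaller numbers (about 31.5 for the first leg).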

[1] For advanced users, a set of files in Alaska Albers projection is also available -- this is better if you want to use it to measure distances, as the coordinates are in metres in a map projection that works well for Alaska (not much distortion).
[2] Also, if you unzip the archive, you will find more than just two files: a shapefile actually requires three or more related files -- the one ending in SHP is the actual shapefile with the geolocation information, the one ending in DBF contains the attribute table and the one ending in PRJ contains the coordinate system and map projection; then there are indices (SHX, ...) and so on.

3 comments:

  1. Has anyone collected an actual gps track from the iditarod, and checked how it compares with this? Just wondering, some of those mileages are way off, it has always been around 73 miles for example from Rohn to Nikolai the three times I have biked it. Perhaps it is shorter with a dog sled :)

  2. There appear to be a lot of pieces of gps tracks around but they tend to be guarded with some zeal by their owners. This is one of the interesting issues around Iditarod tracking - because they're using Trackleaders hardware, Trackleaders almost certainly has tracks from recent races, but not only does Iditarod not allow them to make those available, they actually do not expose tracking from prior races when you buy a subscription. It would be great if someone could put together a GPS track repository for Alaska trails at some point.

    1. I am surprised tracklogs from some of the mushers aren't available. In regards to a GPS track repository for alaska, have you seen trailmapper? http://www.trailmapper.org/
