Friday, February 28, 2014

The Iditarod track file

With Melinda away in London, a quick post from me, Chris. Over the last weeks, I've been asked whether I could provide an Iditarod track file and calculate distances between the checkpoints directly from it. (I do this sort of thing in my work all the time, for science.) I thought this sounded like a good idea. In practice, it was a little more involved than I expected. So here's a little bit of information about track files, and you can download some at the end of this post.
First let's clarify "track file". This is a non-specialist term for a file that is used to put a track on a map, with waypoints (for example checkpoints) and optional information (names, even images) enclosed. In geospatial jargon such files are called "vector files", which simply means that they contain collections of simplified real-life entities that can be represented as basic geometric objects: points, lines and polygons (the area enclosed by closed rings of lines, such as triangles, rectangles, or irregular shapes). For example, each tree in the forest outside my window could be represented as a point, whereas the area of the forest would correspond to a polygon -- or maybe a multipolygon (several non-overlapping polygons) if the forest consists of multiple wooded islands. The general term for these things is "feature", and the most common feature types are point, linestring (sequences of points that make up a line), polygon (sequences of lines that enclose an area), and the multi- versions of each (multipoint, multilinestring, multipolygon). Vector files contain the coordinates that define each feature. (Oh, right, we also need a coordinate system -- there are many, latitude/longitude on an approximate Earth ellipsoid being very common. Mapping is a surprisingly complicated topic that can, luckily, be left to the software we use, most of the time.)
With this rough understanding of what kind of file we're dealing with, what types of track file could we get for the Iditarod 2014? Well, the specific type depends on the file's purpose:

  • for science and map making, the data is usually stored in what amounts to little databases that combine both the geographic coordinates of the features and a table -- sometimes a large table -- of information about them. The most common are: ESRI Shapefile (a proprietary binary format, but with an open specification), GeoJSON (which is human-readable), or formats requiring full-blown database software (PostGIS, Spatialite...). These files require quite specialized software to work with.
  • for GPS tracking, a variety of text-based formats are available, the easiest to use being GPX.
  • for consumer-accessible web mapping, mostly under the influence of Google's Maps and Earth products, Google's Keyhole Markup Language (KML) format has become widespread (and KMZ, which is just KML + some extra overlay resources, zipped together).
KML is similar to the first category, but is not really made for storing a lot of feature attributes in a standard way. It also contains a lot of extra information related to the presentation -- the colour of the lines, links to little icons, the order in which the various feature layers should be displayed. GeoJSON often does for open-source web mapping what KML does for Google Maps, but is also a nice alternative to shapefiles. 
As for the Iditarod trail, a simple web search shows that KML files are widely available. We want one with the track for the northern route, that is, an even-numbered year. Well, here is one for 2008. If you download the file and have Google Earth installed, it will open directly. But if you look inside, it turns out that the location data isn't actually contained in the file, but in a different one that is imported via the web. This is an example of how Google KML files are just a lot more flexible -- or messy -- than file formats made for professional or scientific applications. But we have tools to transform one format into others. Without going into any detail, the most powerful ones are a set of command-line tools distributed with the Geospatial Data Abstraction Library (GDAL) and, for making GPX files (and putting the geospatial data files on a map), the online GPSVisualizer (which uses GPSBabel).
With these two, and a bit of knowledge, I extracted the track and checkpoint information and converted it:
  • ... to a set of ESRI Shapefiles (zipped archive) in Latitude/Longitude coordinates [1], for the routes and the checkpoints separately (a shapefile can only contain one type of feature, so I had to separate the checkpoints (points) from the route segments (linestrings); [2])
  • ... to a GeoJSON file containing both routes and checkpoints (this is an advantage of GeoJSON over ESRI Shapefiles)
  • ... to a single GPX file, also containing both routes and checkpoints
You can right-click, save, and play around with the files. What can you do with them?
The GPX file opens in Garmin's no-cost Basecamp software (or the older Mapsource), which you can use to push it to a Garmin GPS device. I would expect the same works in other GPS software. (Maybe I get this post out early enough for some people who work on the trail to try it out!)
The GeoJSON file can just be opened in a text editor and read. It is a much cleaner version of the KML file.
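For the curious, here is a minimal sketch of what reading such a file looks like in Python, using only the standard library. The tiny embedded document, the feature names and the property key `name` are made up for illustration and are not the contents of the actual download. The one thing fixed by the GeoJSON format itself is that coordinates are written longitude first, then latitude.

```python
import json

# A hypothetical, heavily shortened stand-in for the real file.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Checkpoint A"},
     "geometry": {"type": "Point", "coordinates": [-150.0, 61.7]}},
    {"type": "Feature",
     "properties": {"name": "Checkpoint B"},
     "geometry": {"type": "Point", "coordinates": [-150.6, 61.9]}},
    {"type": "Feature",
     "properties": {"name": "A to B"},
     "geometry": {"type": "LineString",
                  "coordinates": [[-150.0, 61.7], [-150.3, 61.8], [-150.6, 61.9]]}}
  ]
}
"""

def features_by_type(collection, geometry_type):
    """Return the features whose geometry has the given type."""
    return [f for f in collection["features"]
            if f["geometry"]["type"] == geometry_type]

data = json.loads(SAMPLE)
checkpoints = features_by_type(data, "Point")
routes = features_by_type(data, "LineString")

for cp in checkpoints:
    lon, lat = cp["geometry"]["coordinates"]  # GeoJSON order is lon, lat
    print(f'{cp["properties"]["name"]}: lat {lat}, lon {lon}')
for route in routes:
    n = len(route["geometry"]["coordinates"])
    print(f'{route["properties"]["name"]}: {n} route points')
```

To try it on a downloaded file, replace `json.loads(SAMPLE)` with `json.load(open(path))` for whatever path you saved it under, and check which property keys the file actually uses.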
The shapefiles are suitable for use in mapping (GIS) software. Here is a little map of both the Iditarod (northern route) and Yukon Quest trails, which I made from these shapefiles (and similar ones for the Yukon Quest) using free mapping data from Natural Earth and the free GIS software called uDig.

Last, the shapefiles or the GeoJSON file can serve to calculate distances between the checkpoints. For programmers, here is a tutorial I wrote on how to do this. For everyone else, I'll just copy and paste the result. Note that it starts in Willow.

Willow --> Yentna
  Distance in km: 50.7
  Distance in miles: 31.7
Yentna --> Skwentna
  Distance in km: 47.7
  Distance in miles: 29.8
Skwentna --> Finger Lake
  Distance in km: 56.4
  Distance in miles: 35.2
Finger Lake --> Rainy Pass
  Distance in km: 40.1
  Distance in miles: 25.0
Rainy Pass --> Rohn
  Distance in km: 52.2
  Distance in miles: 32.6
Rohn --> Nikolai
  Distance in km: 103.8
  Distance in miles: 64.9
Nikolai --> McGrath
  Distance in km: 82.0
  Distance in miles: 51.3
McGrath --> Takotna
  Distance in km: 26.2
  Distance in miles: 16.4
Takotna --> Ophir
  Distance in km: 32.7
  Distance in miles: 20.4
Ophir --> Cripple
  Distance in km: 113.4
  Distance in miles: 70.9
Cripple --> Ruby
  Distance in km: 109.7
  Distance in miles: 68.5
Ruby --> Galena
  Distance in km: 84.8
  Distance in miles: 53.0
Galena --> Nulato
  Distance in km: 76.1
  Distance in miles: 47.6
Nulato --> Kaltag
  Distance in km: 56.3
  Distance in miles: 35.2
Kaltag --> Unalakeet
  Distance in km: 115.9
  Distance in miles: 72.4
Unalakeet --> Shaktoolik
  Distance in km: 74.4
  Distance in miles: 46.5
Shaktoolik --> Koyuk
  Distance in km: 64.9
  Distance in miles: 40.5
Koyuk --> Elim
  Distance in km: 71.5
  Distance in miles: 44.7
Elim --> Golovin
  Distance in km: 42.7
  Distance in miles: 26.7
Golovin --> White Mountains
  Distance in km: 24.7
  Distance in miles: 15.4
White Mountains --> Safety
  Distance in km: 78.1
  Distance in miles: 48.8
Safety --> Nome
  Distance in km: 34.5
  Distance in miles: 21.6

Total distance: 1438.8 km -- 899.3 miles

Okay! Well, Unalakleet is misspelled -- because it was misspelled in the original KML. (I fixed it in the GPX and GeoJSON files.) Second, the total distance comes out a little low, but if you add an extra 5% or so (because there are only 771 route points for the whole track, so curves get cut off), it looks pretty good. Last, closer inspection shows that the very first leg Willow-Yentna has a very long straight line between the very first two points and therefore is particularly underestimated. So if you are interested in these distances, don't trust blindly, compare with what the Iditarod Trail Committee says, but go ahead and use them any way you want.
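
For readers who want to see the principle, here is a rough sketch of how a leg distance can be estimated from a sequence of route points: add up the great-circle (haversine) distances between consecutive points. It is an illustration under simple assumptions (a spherical Earth with a mean radius of 6371.0088 km, made-up coordinates) and not necessarily the method used in the tutorial mentioned above. It also shows why sparse points underestimate: leaving out the middle point of a bent line shortens the result.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def track_length_km(points):
    """Sum of segment lengths along a list of (lat, lon) tuples."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))

# Hypothetical route points, not taken from the real track.
detailed = [(61.70, -150.00), (61.80, -150.30), (61.75, -150.60)]
sparse = [detailed[0], detailed[-1]]  # same endpoints, the bend left out

print(f"detailed: {track_length_km(detailed):.1f} km")
print(f"sparse:   {track_length_km(sparse):.1f} km")
```

With projected coordinates in metres, such as the Alaska Albers files in footnote [1], the same summing idea works with plain Euclidean distances between consecutive points instead of the haversine formula.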

[1] For advanced users, a set of files in Alaska Albers projection is also available -- this is better if you want to use it to measure distances, as the coordinates are in metres in a map projection that works well for Alaska (not much distortion).
[2] Also, if you unzip the archive, you will find more than just two files: a shapefile actually requires three or more related files -- the one ending in SHP is the actual shapefile with the geolocation information, the one ending in DBF contains the attribute table and the one ending in PRJ contains coordinate system and map projection; then there are indices (SHX, ...) etc. etc.

Thursday, February 27, 2014

John Schandelmeier's ADN piece on trackers

I'm in London doing work-y things (workshop on strengthening the internet against pervasive surveillance, Internet Engineering Task Force meeting).  It's a long trek from Alaska, and while I was in transit John Schandelmeier published an article in the Anchorage Daily News questioning the value of GPS tracking in dogsled racing.  I actually agree with him substantially but think he's really not addressing a few things that matter a lot.

John is not the first racer I've heard or read saying things along these lines.  I expect that it is incredibly annoying to be on the trail and away from people, the world, etc., but to see a red light blinking at you hour after hour after hour after hour.  In addition to a general sense of being unable to disengage from the clutter, one thing of which I have absolutely no doubt is that some number of people carrying trackers on their sleds feel like they're under surveillance.

I also think it's an open question what value they bring to the races.  It's certainly less of a question in the case of Iditarod, since they seem to be making a profit on tracker subscriptions (I'd also argue that there are more people running Iditarod with marginal trail skills who need to be kept an eye on than there are in Quest, but I suppose that would be overly argumentative).  With Quest it's less clear that it's led to a substantial increase in financial support from fans, particularly given the state of the purse over the past few years.  And, of course, fan overreaction to things that they see in the trackers, plus managing the PR aspects of real problems in real time, increase both the workload and stress level for race staff.

And to be sure, there is no substitute for physical presence and human interaction.  Over all these years, hands-down and by a large margin my favorite race spectating experience was last year at the Two Rivers checkpoint.  Hugh's tracker was off and while we knew where Allen was we didn't know if Hugh was ahead of him, behind him, ... ?  So there was a crowd, mostly handlers and people from the dog-savvy Two Rivers community, waiting at the checkpoint to see who'd be the first in and the likely winner of the 2013 Quest.  There was a lot of chatter, a lot of suspense, and a lot of camaraderie as we waited.

That said, there is more than one way to experience the race, and I wouldn't denigrate the experience that people who cannot be here, who don't run dogs and don't know winter, are having as they follow along from home.  As I was flying out yesterday/last night/whenever the heck that was (it all runs together ... ) I looked down on the landscape, and the trails that go on for miles without ever crossing a road or encountering a town, and once again I realized how ridiculously lucky I am to be able to live in Alaska.  Most people don't, and most people can't.  They come up and visit and have, I'm sorry to say, staged, inauthentic experiences, but somehow it captures their imagination and they fall in love with the romance of the place even if they can't quite engage Alaska directly.  Following the races is one way for them to keep the romance alive.  I'd argue that's a good thing, even if it's not really got very much to do with what Alaska is actually about.

But still, one of the things I've been hammering on is that the data and the trackers do tell some stories, if you care to watch and listen.  Unfortunately is down right now but when it comes back up I'll post a bit of John's track from this year's Quest, where we did get to watch a story unfold and did get a sense of what was happening.  He was traveling with Matt Hall (and really, somebody has to have a word with the unfortunate PR people who were handling the Quest's Facebook page and were turning every instance of people traveling together into a race).  We watched them stop, leave the trail, go some distance, turn around, rejoin the trail, and stop again (here's an excerpt from Matt's track; John's looks much the same).  So, while we don't know what they looked and felt like, we do have some idea that they ran into some tough trail and we watched them deal with it.

Similarly, I think a lot of people following on the GPS tracker remember standing up and screaming at their computers while watching Rob Cooke on Eagle Summit last year.  We watched him motor on up, pause, and turn around.  This is a case where it was much less clear what was going on (it looked possible that he was having problems but it turned out that he had so little difficulty going up he thought he must have left the trail, and turned around to find it) but it was emotional in any event.

So no, it's not at all the same as being on the trail.  People are working with woefully little information and sometimes they don't understand what they're seeing at all (and this is where race organizations can be doing a better job, to head off overreaction and to help fans understand what they're seeing).  But they're having a different kind of experience and have their own level of emotional involvement in it.  John and others may not value it as highly as they value direct trail experience (and I wouldn't, either), but it's real and it's meaningful.

Sunday, February 23, 2014

Why Iditarod tracker mileage sorting is messed up

People have noticed that the sort by mileage function on the Iditarod leaderboard was wrong (it's been fixed), with 100-something miles sorting in front of fewer miles.  For example, see this screen shot grabbed by Dawn Beckwell:

Here's what's going on (and this will be old news to a few people and not interesting at all to most):  Computers are binary calculators, with all data, from programs to stored files, taking the form of a string of 0s and 1s.  Both characters and numbers are also strings of 0s and 1s, and there are standardized encoding schemes for representing character data.  By far the most popular/successful is known as ASCII, or the American Standard Code for Information Interchange (nice collection of ASCII tables here).  So, while the number 1 is "00000001" in base 2 (again, base 2 because each bit can take one of only two values, 0 or 1), the character "1" is encoded in ASCII as 00110001.  That's right, 1 and '1' are not the same.  When you see a '1' on a screen what's really behind that - how the data are really represented - is 00110001.  00000001 is translated to 00110001 for printing or display.
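
To see the difference on your own machine, here is a small sketch in Python (used purely for illustration; nothing here says what language the leaderboard is written in). It prints the 8-bit pattern of the number 1 and of the character '1'.

```python
number = 1
character = "1"

# format(..., "08b") renders an integer as 8 binary digits.
number_bits = format(number, "08b")
character_bits = format(ord(character), "08b")  # ord() gives the character's code

print(number_bits)     # 00000001
print(character_bits)  # 00110001

# Converting between the two is an explicit step.
assert int(character) == number
assert str(number) == character
```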

Programs that do things to data, like sort them, have no way to know what 00110001 is or how they should treat it.  In a lot of web-oriented and application-oriented programming languages it's very easy to sort data (old school, we had to write our own sort functions) but the default is that data are treated like characters.  They look at the first character in each string and sort on that, etc.  In that scheme, "1" is smaller than "7" and sorts earlier.  To sort as numbers, in modern programming languages you just need to tell the sorting function to treat the data as numbers, not characters, so instead of looking at it character-by-character it understands that the value it needs to sort is 101.5.
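
A minimal sketch of the effect, in Python and with made-up mileages (the leaderboard's actual code and language are not known here): sorting the values as text puts "101.5" ahead of "72.3", while telling the sort to compare them as numbers gives the expected order.

```python
# Hypothetical mileages as they might arrive from a web page: as text.
miles = ["72.3", "101.5", "9.8", "68.0"]

as_text = sorted(miles)                # compares character by character
as_numbers = sorted(miles, key=float)  # compares the numeric values

print(as_text)     # ['101.5', '68.0', '72.3', '9.8']
print(as_numbers)  # ['9.8', '68.0', '72.3', '101.5']
```

The only difference between the two calls is `key=float`, which is the "treat the data as numbers" instruction described above.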

Saturday, February 22, 2014

Answering a question with the new Iditarod tracker

Today, someone asked what time Conway got into Yentna.  So, how do you answer that question with the Iditarod tracker analytics?  Well, the best you can do is to take a look at what time his speed fell to 0mph.  Because there is no plot showing mile location against time (or time against miles), you can't say "Checkpoint <abc> is at mile 128 and Musher <def> was at mile 128 at 4:20, so Musher <def> arrived at <abc> at 4:20."  Instead, you can make inferences from speed.  And, in fact, the standings say he got in at 3:56.
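
As a sketch of that inference, suppose you had the speed readings as a list of (time, mph) pairs. That is an assumption: the post does not describe any such export from the tracker, and the numbers below are invented. The idea is to look for the first reading where the speed drops to zero after the team has been moving.

```python
# Invented readings: (clock time, speed in mph), in time order.
readings = [
    ("10:00", 9.1),
    ("10:10", 8.7),
    ("10:20", 8.9),
    ("10:30", 4.2),
    ("10:40", 0.0),
    ("10:50", 0.0),
]

def first_stop(samples):
    """Return the time of the first 0 mph reading that follows movement."""
    moving = False
    for time, mph in samples:
        if mph > 0:
            moving = True
        elif moving:
            return time
    return None  # never stopped, or never moved

print(first_stop(readings))  # 10:40
```

The answer is only as precise as the update interval: the real arrival falls somewhere between the last moving reading and the first stopped one, which is one reason an inferred time can differ from the official standings.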

Another possibility for figuring out when he got in is to use the replay.  They don't allow you to control the speed of the replay and wow, that's going to suck a lot when the race gets longer and you've got over 60 teams on the trail, but if you drag the slider you can do it manually.

[Another UX problem - that legend on top of the curves is really annoying!]

The new Iditarod tracker and user experience ("UX")

First, I'd like to apologize for not blogging much over here.  I've been extremely busy with work (that's a good thing, mostly) and have found that keeping a Facebook page is a handy way to get out short notes.  The Facebook page is here.

Anyway, Iditarod's new tracker is up and being used to track the Junior Iditarod.  It's pretty clear that they've written their own based on a data feed from Trackleaders, and it's also pretty clear that they didn't have time or the means to debug it.  Software quality assurance is much more difficult than you might think.  One common problem is that commercial-grade software needs to handle unexpected inputs gracefully.  It is an ongoing source of amazement what people will try to do with something, things you never could have expected and didn't plan for.  When people in the techie business say "The first 80% of the project takes 80% of the effort, and the last 20% of the project takes the other 80% of the effort," that's nearly always what they're talking about - quality assurance.  People who haven't done a lot of commercial software tend not to appreciate this and think that when a program does what they want it to, they're done.  Not sure what to say about that other than "Hah."

Anyway, rather than dwell on bugs I'd like to talk for a minute about user interface issues.  User interface is also a very highly specialized area in software.  It's something at which I am truly terrible, so I rely on people who understand user behavior, workflows, and so on.  But in this case I am a user and here's what I'm finding:

  • Some of the things which work well for a 9-team field are going to be nearly unbearable when there are nearly 70 teams being tracked
  • First, good for them for making the columns sortable in the leaderboard (the panel on the left-hand side of the map).  It's helpful, and it's going to be absolutely necessary when there are 60-odd teams on the trail
  • Too much clutter on the screen, and it covers up portions of the map.  Unfortunately because of a few other problems with the user interface we kind of need to keep some of it around (the leaderboard)
  • The base map layer is not a good choice.  I understand that this is what Mapbox provides but the lack of labels on geographic features is unfortunate.  It would be nice to have the option to switch between a map layer and a satellite image layer (a topo layer would be awesome but I understand it's a lot more difficult to come by - another plus for Trackleaders).  On a more positive note, today Iditarod switched from showing the road map layer to showing a terrain map layer.  It makes it easier to compare to a topo map, plus - let's face it - using a road map to track a wilderness race is kind of dumb
  • We can't zoom out to cover a larger geographic area.  Can't imagine why not unless it costs them money (does Mapbox charge for tile access?  Don't know).
  • They need to get a handle on this whole "rest" thing.  I'm very interested in run/rest schedules (you should be, too!  They're a key question in understanding distance dogsled races) and a 10-minute stop for snacks or to check booties is not at all the same thing as a 4-hour break.  Also, it's probably a mistake to display a musher as having stopped the same as a tracker that hasn't updated.  It doesn't help that a single 0mph reading is treated as stopped, because it means that they're also showing a single missed tracker update as stopped
  • If you hover over a flag on the map it gives you the geographic coordinates of that tracker update.  I assume they did that for debugging purposes, but for those of us following the race it would be a big improvement if they showed the musher's name, instead.  Right now you need to go back to the leaderboard to find who a given bib number belongs to.  That's going to suck when there are 69 teams on the trail.
  • I'm having a hard time calling their analytics "analytics," since they don't provide that much insight into what's going on on the trail.  I keep hammering on this because I think it's important: the competitive advantage that Trackleaders brings to the event tracking business is that they know how to tell a story using data.  Teams on the trail aren't simply moving down a line, they're also moving relative to one another, and that movement is much of the story of a race.  Who's traveling together?  Who's passing whom, and where is it happening? How much faster *is* one team traveling than another, really? Is there one particular spot on the trail that's a popular camping site?  I get the impression that the folks working for IonEarth were pressured by Iditarod to provide analytics, found a JavaScript library containing strip charts, and implemented that without thinking very hard about what they want to show.  Now, Iditarod is copying that.
  • Here's one thing the Iditarod analytics do do well: by mapping speed against time they give you some insight into a particular musher's run/rest schedule.
  • It's great that they let you get a musher's "analytics" directly from the leaderboard but it's kind of a klutzy process.  I usually start by noticing something on the map I'd like to look at more closely.  In this case, what we see on the map is bib numbers.  So you need to go over to the leaderboard (open it up if you've closed it to minimize clutter), find the bib number, then click on that person's analytics icon.  It'd be a lot more straightforward to be able to go from the map marker directly to the "analytics."
  • Another clutter-related issue is that because the pop-ups don't close when you open another, you can get a mess pretty quickly.  Unfortunately closing them can be a little hit-and-miss with your mouse.  Anyway, your moment of fugly:

  • They're not really strong on the mileage reporting and while the analytics show speed against time there's really no easy way to compare how two teams performed over the same section of trail
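
The point about rests in the list above can be sketched in code. Assume, hypothetically, readings at a fixed 10-minute interval, with a missed tracker update recorded as `None` rather than as 0 mph. A stop then counts as a rest only if it lasts a minimum time; the 60-minute threshold is an arbitrary number for illustration, not a mushing rule, and none of this reflects how the Iditarod tracker is actually implemented.

```python
INTERVAL_MIN = 10   # assumed spacing between tracker updates
MIN_REST_MIN = 60   # arbitrary threshold separating a rest from a short stop

def classify_stops(speeds):
    """Return (start_index, minutes, label) for each run of 0 mph readings.

    A reading of None means the tracker missed an update. It neither
    starts nor ends a stop, so it is never mistaken for one.
    """
    stops = []
    run_start = None
    # A trailing moving reading acts as a sentinel that closes the last run.
    for i, mph in enumerate(list(speeds) + [1.0]):
        if mph is None:
            continue                      # missing data: keep current state
        if mph == 0:
            if run_start is None:
                run_start = i             # first stopped reading of a run
        elif run_start is not None:
            minutes = (i - run_start) * INTERVAL_MIN
            label = "rest" if minutes >= MIN_REST_MIN else "short stop"
            stops.append((run_start, minutes, label))
            run_start = None
    return stops

# Invented data: a snack stop, a missed update while moving, then a long
# stop with one missed update in the middle.
speeds = [8.5, 0, 9.0, None, 8.8, 0, 0, 0, None, 0, 0, 0, 7.9]
print(classify_stops(speeds))
# [(1, 10, 'short stop'), (5, 70, 'rest')]
```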
Anyway, enough kvetching.  When you're in the software business and when you're an engineer, your first instincts when facing new technologies are 1) to figure out how it works, and 2) to figure out how to make it better.  Alas, this tracker is giving us plenty of opportunities for the latter.  But ultimately what matters is how it works when put to some basic tests, and in a couple of future posts I'll look at how to answer certain kinds of questions using this software.

Saturday, February 1, 2014

Now on Facebook, too

I've created a Mushing Tech Facebook page as a more efficient way of sharing information.  In addition to pointers to blog posts there will also be quick notes about things which may not merit an entire post.  The page is here.