Mushing Tech: 2014

Monday, December 22, 2014

Updated SPOT API example

Several years ago I posted some example code to exercise the SPOT tracker API. SPOT has since both updated their data format and added a JSON feed. I think both changes are for the better; certainly JSON is easier to deal with than XML and is far more efficient to process. So, I've updated my code to speak JSON. Note that in this code I'm pulling the JSON in from a file that I created by downloading the JSON using curl; you should be able to pull it in directly from SPOT by putting together a URL following the instructions on their API page (I did it using a local file because SPOT, frankly, is a bit parsimonious about server traffic). Nearly all of the change was to the function extract_gps_data().

Note that the data are simply a JSON-ized version of the data shown at the SPOT API page.

I've put the code up on GitHub, as well. The caveats from the previous API post still apply - don't write a tracker that runs in the browser, don't write it in JavaScript, don't hit the servers (either SPOT or Google Maps) more often than absolutely necessary, etc. Be sensitive to privacy issues, and what you're revealing when you write a tracker that's publicly available.

Let me know if you identify errors in the code or if you have questions! Most of all, have some fun with this.

<!DOCTYPE html>
<html>
<head>

 <title>My Wee Tracker</title>
 <meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
 <style type="text/css">
  html { height: 100% }
  body { height: 100%; margin: 0; padding: 0 }
  #map_canvas { height: 100% }
 </style>
 <script type="text/javascript" 
  src="http://maps.googleapis.com/maps/api/js?key=AIzaSyBFoJjPtS9vWXIENOa-egd0XFFnnQbfTIk&sensor=false&libraries=geometry">
 </script>

<script type="text/javascript">
//<![CDATA[


// convert text from the tracker data to a JSON object and
// pull out deeply-nested data elements

function extract_gps_data(trackerdata)  {
    var points = new Array();

    var track_data = JSON.parse(trackerdata);
    var messages = track_data['response']['feedMessageResponse']['messages']['message'];

    // loop over the messages actually present; "var" keeps i from leaking into the global scope
    for (var i = 0 ; i < messages.length ; i++)  {
        var timestamp = messages[i]['dateTime'];
        var latitude = messages[i]['latitude'];
        var longitude = messages[i]['longitude'];
        var point_holder = new point(timestamp, latitude, longitude);
        points.push(point_holder);
    }
    return points;
}


// "point" is an object we use to hold the data we'll be putting on the map

function point(timestamp, latitude, longitude)  {
    this.timestamp = timestamp;
    this.latitude = latitude;
    this.longitude = longitude;
}

function get_track(url)  {
    var request = new XMLHttpRequest();
    request.open("GET", url, false);
    request.send();
    return request.response;
}

function makeinfobox(pointnum, thispoint, theotherpoint)  {
    var latlnga, latlngb; 
    var distance;
    var infoboxtext;
    var timestamp;
    
    timestamp = new Date(thispoint.timestamp); // we convert it from ISO format to something more readable
    infoboxtext = String(timestamp);
    if (pointnum > 0 && theotherpoint)  {  // skip the distance for the first point, or when there is no neighboring point
        latlnga = new google.maps.LatLng(thispoint.latitude, thispoint.longitude);
        latlngb = new google.maps.LatLng(theotherpoint.latitude, theotherpoint.longitude);
        distance = google.maps.geometry.spherical.computeDistanceBetween(latlnga, latlngb) / 1609.344; // meters to statute miles
        infoboxtext = infoboxtext + "<br />" + distance.toFixed(2) + " miles";
    } 
    return infoboxtext; 
}

function initialize()  {
    var points;
    var url = "spot_track.json";
    var trackline = new Array();

    var trackerdata = get_track(url);
    points = extract_gps_data(trackerdata);

    var spot = new google.maps.LatLng(points[0].latitude, points[0].longitude);
    var my_options = {
        center: spot,
        zoom: 12,
        mapTypeId: google.maps.MapTypeId.ROADMAP
    };
    var map = new google.maps.Map(document.getElementById("map_canvas"), my_options);
    for ( var i = 0 ; i < points.length ; i++ )  {
        var contentstring = "Point " + i; 
        var spot = new google.maps.LatLng(points[i].latitude, points[i].longitude);
  // here we create the text that is displayed when we click on a marker
        var windowtext = makeinfobox(i, points[i], points[i+1]);  
        var marker = new google.maps.Marker( {
            position: spot, 
            map: map,
            title: points[i].timestamp,
            html: windowtext
        } );
  // instantiate the infowindow
  
        var infowindow = new google.maps.InfoWindow( {
        } );

  // when you click on a marker, pop up an info window
        google.maps.event.addListener(marker, 'click', function() {
            infowindow.setContent(this.html);
            infowindow.open(map, this);
        });

  // set up the array from which we'll draw a line connecting the readings
        trackline.push(spot);
    }  
 
 // here's where we actually draw the path 
    var trackpath = new google.maps.Polyline( {
        path: trackline,
        strokeColor: "#FF00FF",
        strokeWeight: 3
    } );
    trackpath.setMap(map);
}

//]]>

</script>
</head>

<body onload="initialize()">

<div id="map_canvas" style="width:100%; height:100%"></div>

</body>
</html>
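
The extract_gps_data() function above trusts the feed to have the nesting it expects and reads every message blindly. Here is a minimal sketch of a more defensive variant. The name extractPoints is hypothetical, and so is the assumption that "message" might arrive as a single object rather than an array when the feed holds only one message; check that against what your own feed returns before relying on it.

```javascript
// Hypothetical, more defensive variant of extract_gps_data().
// Assumption (not confirmed by the post): "message" may be a bare object
// instead of an array when there is only one message in the feed.
function extractPoints(trackerText) {
    const feed = JSON.parse(trackerText);
    const response = (feed.response || {}).feedMessageResponse;
    if (!response || !response.messages) {
        return [];   // nothing to plot (empty feed or an error payload)
    }

    const raw = response.messages.message;
    const messages = Array.isArray(raw) ? raw : [raw];

    return messages
        // keep only entries that carry numeric coordinates
        .filter(function (m) {
            return m && typeof m.latitude === "number" && typeof m.longitude === "number";
        })
        .map(function (m) {
            return { timestamp: m.dateTime, latitude: m.latitude, longitude: m.longitude };
        });
}

// Made-up sample text in the same shape the code above reads.
const sampleFeed = JSON.stringify({
    response: { feedMessageResponse: { count: 1, messages: { message: {
        dateTime: "2014-12-19T07:34:35+0000", latitude: 64.82709, longitude: -148.99679
    } } } }
});
const extracted = extractPoints(sampleFeed);
```

The returned objects have the same three fields as the "point" objects in the page, so they could be handed to the same map-drawing loop.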

Saturday, December 20, 2014

Catching up with the SPOT API

Over the holiday I'm planning on putting out an update of the sample code that uses the SPOT API, both to work with the new format and to work with the JSON representation rather than the XML. In the meantime, here's a sample JSON element representing a single SPOT tracker message:

{
    "@clientUnixTime": "0",
    "batteryState": "GOOD",
    "dateTime": "2014-12-19T07:34:35+0000",
    "hidden": 0,
    "id": 349242506,
    "latitude": 64.82709,
    "longitude": -148.99679,
    "messageType": "TRACK",
    "messengerId": "0-8283550",
    "messengerName": "Melinda's tracker",
    "modelId": "SPOT2",
    "showCustomMsg": "Y",
    "unixTime": 1418974475
}

The (minimal) SPOT API documentation is available online here. More later!
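
As a rough illustration of reading one of these messages in JavaScript, here is a small sketch. It uses a subset of the fields from the sample message above, written as strict JSON text; the variable names are made up for the example. It builds the date from unixTime rather than dateTime, on the assumption that an offset written as "+0000" (no colon) may not parse the same way in every JavaScript engine.

```javascript
// A subset of the sample message's fields, as strict JSON text.
const messageText = '{"@clientUnixTime": "0", "batteryState": "GOOD", ' +
    '"dateTime": "2014-12-19T07:34:35+0000", "latitude": 64.82709, ' +
    '"longitude": -148.99679, "messageType": "TRACK", "unixTime": 1418974475}';

const message = JSON.parse(messageText);

// unixTime is in seconds; JavaScript Dates count milliseconds.
const when = new Date(message.unixTime * 1000);

// Keys such as "@clientUnixTime" are not valid identifiers, so use brackets.
const clientTime = message["@clientUnixTime"];

// One line of text suitable for a marker title or a log entry.
const summary = when.toISOString() + " " + message.latitude + "," + message.longitude;
```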

Monday, November 24, 2014

2011 Quest runtimes

Following up on my previous post on 2013 Quest statistics, I've gone back, cleaned up my 2011 data, and run some numbers on that. What I found was pretty consistent with the 2013 numbers, except that there were a few surprises in the numbers that actually helped tell the story of the race.

If you've been following the Yukon Quest for a while you may remember that 2011 was pretty harrowing, with some bad weather in the middle and second half of the race that caused some very serious problems for those in the front of the pack. Brent Sass recorded some particularly memorable video as he helped a nearly-hypothermic Hans Gatt off American Summit:

and Sebastian Schnuelle's ice-caked boots in Central told a story, as well:

What I did with the 2011 data was virtually identical to what I did with the 2013 data: For each checkpoint I took a look at runtimes to get a general picture of what happened on that leg of the race. Then I collected those runtimes into a table, where I ran correlations of runtimes for each race segment against the finishing position.

The unsurprising part was that race segment distance correlated quite well with overall finishing position, with the longest race segment (Dawson to Eagle) showing a very, very large positive correlation with finishing position, at .8537. So again, this year look for longer race segments to have greater predictive value for finishing position.

The surprising bit, at least initially, was that there were four checkpoints at which runtime was inversely correlated with finishing position (in one case, strongly so). They were Slavens (-0.1755), Circle (-0.4027), Central (-0.2543), and 101 (-0.0877). At well over half-way into the race, the faster teams had pulled up towards the front, and this is where they ran into awful conditions, which slowed them down. By the time the back of the pack arrived the overflow had frozen and the weather had moderated. So, they were able to travel faster.

Nevertheless, the correlation between traveling speed over longer segments and overall finishing position remained reasonably strong, with an r value of 0.4527. Here's the plot:

If you're interested in looking at the data and playing with them yourself, they're here, with the correlations on the very last sheet. Unfortunately Trackleaders hadn't added the replay feature to their tracker at that point, but the race track is online here and looking at individual musher pages can help illustrate some of what happened (the closer together the breadcrumbs, the slower the team was moving [assuming that the breadcrumbs were being uploaded at regular intervals]).

Also, note that this analysis lacks anything resembling rigor, including questionable choice of metrics for correlation, etc. But, for a casual description of how the race played out and how that's reflected in the race statistics, I think it's adequate. Let me know if you spot a problem, or if you've got further questions.

Tuesday, October 14, 2014

A look at 2013 Quest runtimes

It's looking like this year's Yukon Quest has a pretty good field of entries, and with fall training well underway in interior Alaska we're all starting to speculate about how the race is going to go this year. It's only natural to look at past races, so I've started poking at the 2013 data, the last year the race was run in the Whitehorse-to-Fairbanks direction. I'm also interested in having a baseline set of data to which this year's race can be compared, once it's underway.

So, I've taken my own spreadsheets from 2013 and used them as a basis for running some numbers. In particular, I've created a spreadsheet containing the 2013 checkpoint runtime data and extracted summary data to get some basic descriptive statistics: fastest, slowest, mean, median, 1st quartile, and 3rd quartile for each race segment (between checkpoints). For example, between Braeburn and Carmacks, I've got a table that looks like this:

You'll note that I've also calculated the ratio between the fastest and slowest runtimes; there may be something interesting there to look at later. I've also plotted all runtimes as a histogram:

Again, this is largely to create a baseline dataset for comparison with this year.
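
For anyone who'd rather script the summary statistics than build them in a spreadsheet, here is a minimal sketch. The runtimes are made up (they are not the 2013 data), the function names are hypothetical, and the quartiles use linear interpolation between sorted values, which is only one of several common definitions and may not match what a given spreadsheet reports. The direction of the ratio (slowest divided by fastest) is also an assumption.

```javascript
// Quantile by linear interpolation between the two nearest sorted values.
function quantile(sorted, q) {
    const pos = (sorted.length - 1) * q;
    const lower = Math.floor(pos);
    const upper = Math.ceil(pos);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Summary statistics for one race segment's runtimes (any consistent unit).
function summarize(runtimes) {
    const sorted = runtimes.slice().sort(function (a, b) { return a - b; });
    const total = sorted.reduce(function (sum, x) { return sum + x; }, 0);
    return {
        fastest: sorted[0],
        slowest: sorted[sorted.length - 1],
        mean: total / sorted.length,
        q1: quantile(sorted, 0.25),
        median: quantile(sorted, 0.5),
        q3: quantile(sorted, 0.75),
        slowestToFastest: sorted[sorted.length - 1] / sorted[0]
    };
}

// Made-up runtimes in hours, for illustration only.
const stats = summarize([10, 12, 11, 14, 13]);
```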

However, I also ran some correlations, and while the results are obvious if you think about them for a few seconds, I haven't ever seen anybody say so explicitly:

The ranking of runtimes on longer race segments (more miles between checkpoints) tends to correlate more strongly with final standings, at least in this data set. That is to say, people who had the faster runtimes between checkpoints which are far apart tended to finish better than people with slower runtimes. Some of this is tautological (the longer runs are a greater percentage of the total race), some of it might (I haven't looked at this) be because on long runs everybody has to camp, even people who would otherwise prefer to rest at checkpoints, or because longer runs smooth out the variability you might see in shorter runs (law of large numbers, sort of).

Here's a plot of the correlations between segment runtime and finishing position, against segment distance.

The correlations are on the "correlations" worksheet in the spreadsheet. I'm using a standard Pearson product-moment correlation coefficient (r), which is not the best test for a complete dataset but is adequate for exploring these data. I'm not posting the numbers here in the interest of not having reader eyes glaze over, but definitely feel free to visit the spreadsheet, poke through the data, copy the spreadsheets, and ask questions.
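
For reference, the Pearson coefficient mentioned above can be computed in a few lines. This is a sketch with made-up numbers rather than the spreadsheet's data, and the function and variable names are hypothetical.

```javascript
// Pearson product-moment correlation coefficient (r) for two equal-length arrays.
function pearson(xs, ys) {
    if (xs.length !== ys.length || xs.length < 2) {
        throw new Error("need two equal-length arrays with at least 2 values");
    }
    const n = xs.length;
    const meanX = xs.reduce(function (s, x) { return s + x; }, 0) / n;
    const meanY = ys.reduce(function (s, y) { return s + y; }, 0) / n;

    let sxy = 0, sxx = 0, syy = 0;
    for (let i = 0; i < n; i++) {
        const dx = xs[i] - meanX;
        const dy = ys[i] - meanY;
        sxy += dx * dy;   // co-variation
        sxx += dx * dx;   // variation in x
        syy += dy * dy;   // variation in y
    }
    return sxy / Math.sqrt(sxx * syy);
}

// Made-up segment runtimes (hours) and finishing positions.
const segmentHours = [20.5, 22.0, 21.0, 25.5, 24.0];
const finishPlace = [1, 2, 3, 4, 5];
const r = pearson(segmentHours, finishPlace);
```

A positive r here means that slower runtimes on the segment go with worse (higher-numbered) finishing positions.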

I'm planning on doing something similar with 2011 data, as well as years run in the opposite direction, to see how well what we're seeing in 2013 holds up. Unfortunately getting the data into the spreadsheet and clean enough to use is pretty labor-intensive, so it might be some while before a follow-up to this post. But, once the data are in a spreadsheet there's a lot we can do with them, so there's incentive to do it beyond answering just these questions.

Here's looking forward to a great winter of distance dogsled racing!

Monday, September 1, 2014

New (ish?) event tracking software

It looks like there's a new GPS-based event tracking application, RaceBeacon. From what I can glean from their website it looks like they're consolidating GPS feeds from individual participants' personal smart phones. This is probably reasonable for shorter (as in, very short) informal events, and may provide a solid basis for building out a more robust platform with more features in the future. As much as I'm a fan of Trackleaders (and I'm a huge fan, for reasons I'll go into in the next paragraph), it's always great to see some competition in this space.

That said, just showing locations on a map is not that interesting, particularly for events with staggered starts (like sprint races). A map alone can show you where teams are in relation to each other, but it doesn't capture the dynamic nature of racing - the stuff that makes racing exciting. Prominent among the reasons that I like Trackleaders is that they really are both data guys and competitive cyclists, and they're interested in showing the story of a race as it unfolds. Also, because they're data guys and computer scientists they've already dealt with some relatively difficult problems, for example calculating with some degree of precision the race mile at which a given team is currently located (harder than you'd think). In this case, for sprint mushing races, the problem RaceBeacon is facing is how to demonstrate the relationship between two different teams' performances when the entire race is run in 20 minutes without stopping and the teams started 2 (or 4 or 18) minutes apart. But, if you're not committed to using another tracking system and you're putting on a sprint event (and you can count on most or all of your teams carrying smart phones with data plans and having their batteries fully charged and running a platform supported by RaceBeacon), this could be very interesting to experiment with.
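
To make the staggered-start problem concrete, here is a small sketch of the simplest possible correction: compare teams by elapsed time rather than by clock time. The team data and field names are made up, and this says nothing about how RaceBeacon or Trackleaders actually handle it.

```javascript
// Made-up sprint results: times in seconds since the first team left the chute.
const teams = [
    { name: "Team A", startOffset: 0,   finishClock: 1210 },
    { name: "Team B", startOffset: 120, finishClock: 1300 },
    { name: "Team C", startOffset: 240, finishClock: 1465 }
];

// Subtract each team's start offset, then sort by time actually spent on the trail.
function rankByElapsed(entries) {
    return entries
        .map(function (t) {
            return { name: t.name, elapsed: t.finishClock - t.startOffset };
        })
        .sort(function (a, b) { return a.elapsed - b.elapsed; });
}

const standings = rankByElapsed(teams);
```

In this example Team A crosses the line first on the clock, but Team B has the shortest elapsed time, which is exactly the kind of thing dots on a map won't show.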

If you've used them for tracking your event, how did it go? Who's planning on using them this fall or winter?

Saturday, May 31, 2014

Packable beer

Recently there's been some discussion of instant beer for backpacking or other backcountry travel. After all, beer is almost entirely water and water's really heavy, so if the water can be eliminated and added later, the problem is basically solved (well, almost).

This sounded promising but turned out to be a hoax, but in the meantime Pat's Backcountry Beverages has developed the real thing - a beer concentrate and a convenient technology for rehydrating it and adding the fizzy back. Note that they also have concentrates for various soft drinks including colas, lemon-lime drinks, ginger ale, and others (and you can actually probably use it to carbonate nearly anything). In the interest of Science we decided to use empirical methods to test the manufacturer's claims.

It's a 3-part system, consisting of a very lightweight easy-to-pack plastic carbonator:

the "activator" (a mix of citric acid and sodium bicarbonate, which comes in convenient small packets):

and the drink concentrate:

(a fun fact about the concentrate - yup, that's 58% alcohol):

Also, your basic low bar:

Anyway, the lid of the carbonator has a lever that's used to pump air in and pressurize the container, and making the brew is a quite straightforward process of pressurizing the device, releasing the pressure, and repeating that cycle for about two minutes. When you're done you've got something that looks like this:

which is a not-bad head on the beer. Poured into pint glasses we have:

There are two beers available, a "Pale Rail" and a "Black Hops." This is the Black Hops. It smells very malty and a little sweet, but the sweetness doesn't come through in the flavor. I'm pleased with the flavor (a little bitter) and find it very drinkable. Chris is German and therefore has profound beer expertise, and she thought it would be better much cooler (we used plain tap water) but was otherwise quite good.

Here's the bad news: while the purchase cost was high but not unreasonable, the shipping costs to Alaska were nuts. As in, about $30 for the carbonator, activator, and brew just for shipping. It's only available from one online vendor and there doesn't seem to be anybody selling it locally. (The good news: market opportunity!). The carbonator itself is about $40 (but it's reusable and seems durable), the activator is about $0.50/packet, the beer is about $2.50 for a packet to make a pint, and the soda is a bit under $1.50 for a packet to make a pint.

My one reservation is that because it's a liquid concentrate it's likely to freeze at low temperatures, but otherwise I'm very pleased. I'll be ordering the Pale Rail and some of their sodas, I think - I do think this is a pretty nifty gizmo, and very highly packable.

Hey, does anybody know if alcohol kills giardia?

Saturday, March 8, 2014

Iditarod and software development

As I posted on my Facebook page, Iditarod has removed my ability to post to their Facebook page. The proximate cause appears to have been my suggestion that they hire a tech support professional. I accept that I've been a little obnoxious. I don't think, however, that I'm wrong.

I don't think Iditarod realizes yet that they're now in the commercial software business. They wrote some software, they're selling it, and that's kind of that. I don't think they had any idea at all what they were getting into, and I've been trying to figure out why they did it in the first place. I think part of the issue is that Trackleaders' user interface really does look dated, even if they've got the best functionality in the event tracking business. Iditarod wants their stuff to be "branded." (They also want to charge a lot of money for it; Trackleaders is committed to tracking being free to fans). I think a better outcome would have been to work with Trackleaders on figuring out how to "skin" the Trackleaders app to develop a distinctive look.

But, that's not what they did. They decided to write their own software. I can't imagine they did it to save money, since programmers are really pretty expensive. Better ones make upwards of $90,000/year, plus benefits; really good ones make big piles of cash. On the other hand there are web sites for jobbing out work to what are really very good programmers in places like Pakistan and Bangladesh, and those folks are quite inexpensive (at the cost of some reliability issues due to both infrastructure and political stability - a friend hired some developers in Pakistan and then Bhutto was assassinated a few days later, which, among other things, pushed deadlines back).

What Stan and the other fun folks in that office might not have realized is that you never finish software, never. You release it, but there's always more work to do, bugs to fix, features to add, underlying technology changes to adapt to, and so on. For example, when Google changed their maps API, it put one of my favorite Alaska GIS websites out of business, because there was no money to hire programmers to adapt their software to the new interface. So, when the ITC decided to develop their own software, they decided to commit money to it, year after year after year after year.

Something else they apparently didn't realize is that software is not ready for release when the developer says "it works for me." The developer knows how it works and naive programmers only test happy path application use. Once the software is released into the wild, particularly if it's a web app, it's going to be run in a variety of platform and browser environments, users are going to try to do things you could not possibly have imagined, and so on. Bringing in a test professional to bang on your application is going to turn up problems before the software is released, giving you an opportunity to fix bugs and head off support issues. A typical test environment has a bunch of different operating systems running in clean VMs (virtual machines), with as many browsers (and versions of browsers) as they can possibly get their hands on. I'm still boggled that Iditarod developers apparently didn't test their stuff on Internet Explorer, which may be losing market share but is still the second most-widely used browser after Chrome (see here for browser stats). Test and QA professionals are in high demand and well-compensated for a very good reason - over the long run they save a project money, reputation, and headaches.

So here we are, with Iditarod having developed their own software and not having tested it before releasing it. Now what? Well, this is where having tech support people makes a huge, huge difference. For starters, actually knowing something about the technology is kind of a time-saver when trying to solve a technical problem. In library school, people who are planning on a career providing reference services are taught a skill called "negotiating the question," where they're taught techniques for finding out what a library patron's real question is when they come in with something vague or somewhat oblique to what they really want to find out. Tech support people do the same thing. They find out how to reproduce a problem and they know how to describe it to developers to find a solution if it's not something they can figure out themselves. They recognize the difference between a user error and a real bug.

But that's not what Iditarod has. They hired some social media people, whose job is to say "Watch this fantastic video! Then buy things." They may be able to navigate Facebook and Twitter extremely well, but those are different skills from being able to sort out technical problems. And so it is that Iditarod's social media people were answering questions from someone who was unable to watch videos because she clearly wasn't logged in. They told her "Reboot your computer" (I'm only sorry I never had an opportunity to ask them how they thought that would help). That led to a situation in which the social media people were flustered and frustrated and the user with the unsolved question was pissed off. It's not good for anybody.

So, perfect storm of really bad decisions on ITC's part. I don't expect Iditarod to fix this situation, because Iditarod doesn't fix these things and it's completely consistent with past performance (here's something I wrote about this almost exactly two years ago, when the handwriting was already on the wall with regard to IonEarth's long-term viability). It's a tough situation for those putzes who blocked me from posting on Iditarod's wall, and it's a tough situation for fans. I don't expect it to improve, at least not any time soon.

I really don't like the Iditarod organization, in case that's in any way unclear (!). But when you get past all the stupid decisions, the commercialism, the minstrel show aspects of some of what goes on, that they really do not put dogs first, it's still a 1000-mile race with superb mushers and incredible athlete dogs. Fortunately, as was foreseeable a few years ago, better and better photos, videos, and coverage are coming from free sources (in addition to the Anchorage Daily News and KNOM, KUAC started sending Emily Schwing out on the Iditarod trail two years ago, and this year Alaska Dispatch has really upped their coverage). The only thing that Iditarod provides that isn't available elsewhere is the tracking, and at this point it's nearly worthless, anyway. I am cheering for friends running the race and wishing them the best. The ITC, well, whatever.

Monday, March 3, 2014

This one's for the mapping nerds

Map enthusiasts (and that would be nearly all of us of a nerdly disposition) should know about a really nice, free mapping service built on top of Google Maps. Gmap4 includes not just road maps and satellite images, but also topo maps for the US and Canada, Open Street Maps, and a wide variety of tools, plus what's basically a RESTful API that allows you to integrate your own data without having to create an account or reveal personally identifiable information (PII). Bravo to the author, and enjoy!

Gmap4 is online here.

Sunday, March 2, 2014

Good morning, yo

I'm in totally the wrong timezone for following Iditarod. I went to bed before the start (sorry, all, I just can't bring myself to call it a "restart") and woke up at 4am GMT to a lot of complaints about the new tracker. But I wasn't interested in the complaints as much as I was in looking to see what friends on the trail were up to, so I opened the tracker myself.

With Trackleaders, the first thing I'd do in the morning was open the race flow chart and get a quick picture of who was moving the fastest, who was resting, who was passing whom, and so on. What I get with the Iditarod tracker is dots on a map. This morning I can infer who went out fast by what order the bib numbers are in on the trail (because they left the start in order), but that's not going to last for very much longer. But nevermind that, how are my friends doing?

To find that out, I had to open up the so-called "leaderboard" and chew up more screen space, then select someone from the list. Because the default sort is by bib number and it's a huge list, to find Mike I needed to click to sort by musher name, then select his name. A box pops up with basic information, and it COVERS UP THE DOTS ON THE MAP (yes, I'm shouting).

That is to say, the dots on the map aren't even visible. When I drag the screen around to uncover the dots on the map, is Mike highlighted? No, he's not, so I cannot even tell with a quick look where he is in relation to everybody else. So, I go back to the "leaderboard," sort it by name, and get his trail mile. Since trail miles are not displayed on the map, I need to find him in relation to the pack, which means that now I've got to sort the "leaderboard" by trail mile, which gives me some sense of how to find him (by finding other teams around him, which means looking for their bib numbers - I know, right?).

Mike's running last, so he was easy to find by dragging the map around some more. But here's an exercise for those following along at home: find Dan Kaduce. I'm waiting ...

Find him? How much clicking and dragging did you need to do to do that? If you'll recall, with Trackleaders all you needed to do was hover over his name and his dot-on-a-map would bounce, making him very easy to find, indeed. So basically, at this point in the race the tracker isn't carrying very much information. I think some of this is deliberate (they really don't want to make it easy for you to spot teams in trouble) but much of it is just lack of understanding of design issues and the software development process.

A digression: I just tried looking at mouseover pop-ups to see if that's a little easier, once you've got trail mile. It is, but the trail miles are wrong - they're showing a few people further down the trail at lower trail miles - see the screenshot showing Kristy Berington and, uh, somebody whose name is covered up by Kristy's pop-up:

Kristy is shown as being at race mile 36, yet she's ahead of someone who's shown as being at race mile 37.

Iditarod have made some very, very basic user interface mistakes, but because it's their software, they're the ones paying directly to fix every bug, correct every user interface error in the design, answer every user complaint, and so on. I am pretty sure they had absolutely no idea what they were getting into when they made the decision not to use someone else's software.

And one last screenshot before going back to bed, because if nothing else I'm grateful to Iditarod for such a clear demonstration that the development of production software should not be left to hobbyists:

Saturday, March 1, 2014

Start order and speed impacts, 2013

I've just run some numbers on the 2013 Iditarod, from the start to the first checkpoint (Yentna), with an eye towards getting a handle on the relationship between bib number (i.e. start order) and speed. The assumption is that because the trail gets torn up by everybody who passes over it, the teams with higher bib numbers will be traveling more slowly over the first section of trail. I looked at this in the Copper Basin in this blog post, and found that there was, in fact, a negative correlation (later teams traveled slower), albeit a fairly loose one.

So, I took a look at last year's Iditarod. The first thing I did was to plot speed against bib number:

If there's a relationship it certainly did not pop out as clearly as it did in the Copper Basin. So, I ran the actual numbers. Using the R statistical package, I ran a Kendall's rank correlation tau, which has the advantage of not making assumptions about the underlying distribution (for example, that the underlying data have a "normal" distribution). In a nutshell, nope, there does not appear to be a relationship between bib number and speed in these particular data, or at least not one that cannot be explained as the result of random fluctuation. Specifically, the results of the run are:

 Kendall's rank correlation tau

data:  y$Bib and y$speed 
z = -1.8238, p-value = 0.06818
alternative hypothesis: true tau is not equal to 0 
sample estimates:
       tau 
-0.1557844

So, with a p value of 0.06818, it does not meet a .05 significance level criterion.
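
For readers who want to see what the test is doing, here is a rough sketch of Kendall's tau and the z statistic from its normal approximation. It is the simple version (tau-a) that ignores ties, so it won't necessarily reproduce R's output when the data contain ties; the function name and the data are made up.

```javascript
// Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs,
// plus the z statistic from the large-sample normal approximation.
// Ties are counted as neither concordant nor discordant (no tie correction).
function kendallTau(xs, ys) {
    const n = xs.length;
    let concordant = 0;
    let discordant = 0;
    for (let i = 0; i < n - 1; i++) {
        for (let j = i + 1; j < n; j++) {
            const s = Math.sign(xs[i] - xs[j]) * Math.sign(ys[i] - ys[j]);
            if (s > 0) { concordant++; }
            else if (s < 0) { discordant++; }
        }
    }
    const tau = (concordant - discordant) / (n * (n - 1) / 2);
    // Var(tau) = 2(2n + 5) / (9n(n - 1)) under the null hypothesis of no association
    const z = 3 * tau * Math.sqrt(n * (n - 1)) / Math.sqrt(2 * (2 * n + 5));
    return { tau: tau, z: z };
}

// Made-up bib numbers and speeds (mph), for illustration only.
const bibs = [1, 2, 3, 4, 5, 6];
const speeds = [9.1, 8.7, 9.4, 8.2, 8.5, 8.0];
const result = kendallTau(bibs, speeds);
```

For a two-sided test at the .05 level, the usual rule of thumb is that |z| needs to exceed about 1.96, which is why a z of -1.8238 falls just short.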

It doesn't appear likely that distance makes a difference. You'd reasonably expect that a shorter trail would see greater impacts, but there's not that much difference in distance between the start and first checkpoints in the two races (42 miles in Iditarod, 50 miles in Copper Basin). So, there's still a bunch of work to do. It may have something to do with trail differences, or weather, or ... ? I'll be taking a look at the 2014 numbers after the last team is into Yentna this weekend, but over the longer term I'd like to aggregate a bunch of years and see what falls out.

Friday, February 28, 2014

The Iditarod track file

With Melinda away in London, a quick post from me, Chris. Over the last weeks, I've been asked whether I could provide an Iditarod track file and calculate distances between the checkpoints directly from it. (I do this sort of thing in my work all the time, for science.) I thought this sounded like a good idea. In practice, it was a little more involved than I expected. So here's a little bit of information about track files, and you can download some at the end of this post.

First let's clarify "track file". This is a non-specialist term for a file that is used to put a track on a map, with waypoints (for example checkpoints) and optional information (names, even images) enclosed. In geospatial jargon such files are called "vector files", which simply means that they contain collections of simplified real-life entities that can be represented as basic geometric objects: points, lines and polygons (the area enclosed by closed rings of lines, such as triangles, rectangles, or irregular shapes). For example, each tree in the forest outside my window could be represented as a point, whereas the area of the forest would correspond to a polygon -- or maybe a multipolygon (several non-overlapping polygons) if the forest consists of multiple wooded islands. The general term for these things is "feature", and the most common feature types are point, linestring (sequences of points that make up a line), polygon (sequences of lines that enclose an area), and the multi- versions of each (multipoint, multilinestring, multipolygon). Vector files contain the coordinates that define each feature. (Oh, right, we also need a coordinate system -- there are many, latitude/longitude on an approximate Earth ellipsoid being very common. Mapping is a surprisingly complicated topic that can, luckily, be left to the software we use, most of the time.)

With this rough understanding of what kind of file we're dealing with, what types of track file could we get for the Iditarod 2014? Well, the specific type depends on the file's purpose:

for science and map making, the data is usually stored in what amounts to little databases that combine both the geographic coordinates of the features and a table -- sometimes a large table -- of information about them. The most common are: ESRI Shapefile (a proprietary binary format, but with an open specification), GeoJSON (which is human-readable), or formats requiring full-blown database software (PostGIS, Spatialite...). These files require quite specialized software to work with.
for GPS tracking, a variety of text-based formats are available, the easiest to use being GPX.
for consumer-accessible web mapping, mostly under the influence of Google's Maps and Earth products, Google's Keyhole Markup Language (KML) format has become widespread (and KMZ, which is just KML + some extra overlay resources, zipped together).

KML is similar to the first category, but is not really made for storing a lot of feature attributes in a standard way. It also contains a lot of extra information related to the presentation -- the colour of the lines, links to little icons, the order in which the various feature layers should be displayed. GeoJSON often does for open-source web mapping what KML does for Google Maps, but is also a nice alternative to shapefiles.
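To make the "feature" idea concrete, here's a minimal sketch of what a GeoJSON feature collection looks like, written as a JavaScript object. The coordinates and property names below are rough, made-up values for illustration only, not taken from any real track file. One thing that is standard: GeoJSON stores each position as [longitude, latitude], in that order.

```javascript
// Illustrative GeoJSON: one checkpoint (Point) and one route segment (LineString).
// All coordinates and the "kind" property are invented for this example.
const trail = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      geometry: { type: "Point", coordinates: [-150.04, 61.75] },
      properties: { name: "Willow", kind: "checkpoint" }
    },
    {
      type: "Feature",
      geometry: {
        type: "LineString",
        coordinates: [[-150.04, 61.75], [-150.4, 61.9], [-150.66, 62.0]]
      },
      properties: { name: "Willow --> Yentna", kind: "route" }
    }
  ]
};

// Count how many features of each geometry type the collection holds.
function countByGeometryType(collection) {
  const counts = {};
  for (const feature of collection.features) {
    const type = feature.geometry.type;
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

console.log(countByGeometryType(trail)); // { Point: 1, LineString: 1 }
console.log(JSON.stringify(trail, null, 2)); // the human-readable text form
```

Note that points and lines sit side by side in the same collection, and that the attribute table is simply the "properties" object attached to each feature.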

As for the Iditarod trail, a simple web search shows that KML files are widely available. We want one with the track for the northern route, that is, an even-numbered year. Well, here is one for 2008. If you download the file and have Google Earth installed, it will open directly. But if you look inside, it turns out that the location data isn't actually contained in the file, but in a different one that is imported via the web. This is an example of how Google KML files are just a lot more flexible -- or messy -- than file formats made for professional or scientific applications. But we have tools to transform one format into others. Without going into any detail, the most powerful ones are a set of command-line tools distributed with the Geospatial Data Abstraction Library (GDAL) and, for making GPX files (and putting the geospatial data files on a map), the online GPSVisualizer (which uses GPSBabel).

With these two, and a bit of knowledge, I extracted the track and checkpoint information and converted it:

... to a set of ESRI Shapefiles (zipped archive) in Latitude/Longitude coordinates [1], for the routes and the checkpoints separately (shapefiles can only contain one type of feature, so I had to separate the checkpoints (points) from the route segments (linestrings); [2])
... to a GeoJSON file containing both routes and checkpoints (this is an advantage of GeoJSON over ESRI Shapefiles)
... to a single GPX file, also containing both routes and checkpoints

You can right-click, save, and play around with the files. What can you do with them?

The GPX file opens in Garmin's no-cost Basecamp software (or the older Mapsource), which you can use to push it to a Garmin GPS device. I would expect the same works in other GPS software. (Maybe I get this post out early enough for some people who work on the trail to try it out!)
The GeoJSON file can just be opened in a text editor and read. It is a much cleaner version of the KML file.

The shapefiles are suitable for use in mapping (GIS) software. Here is a little map of both the Iditarod (northern route) and Yukon Quest trail, which I made from these shapefiles (and similar ones for the Yukon Quest) using free mapping data from Natural Earth and the Iditarod and the free GIS software called uDig.

Last, the shapefiles or the GeoJSON file can serve to calculate distances between the checkpoints. For programmers, here is a tutorial I wrote on how to do this. For everyone else, I'll just copy and paste the result. Note that it starts in Willow.

Willow --> Yentna
  Distance in km: 50.7
  Distance in miles: 31.7
Yentna --> Skwentna
  Distance in km: 47.7
  Distance in miles: 29.8
Skwentna --> Finger Lake
  Distance in km: 56.4
  Distance in miles: 35.2
Finger Lake --> Rainy Pass
  Distance in km: 40.1
  Distance in miles: 25.0
Rainy Pass --> Rohn
  Distance in km: 52.2
  Distance in miles: 32.6
Rohn --> Nikolai
  Distance in km: 103.8
  Distance in miles: 64.9
Nikolai --> McGrath
  Distance in km: 82.0
  Distance in miles: 51.3
McGrath --> Takotna
  Distance in km: 26.2
  Distance in miles: 16.4
Takotna --> Ophir
  Distance in km: 32.7
  Distance in miles: 20.4
Ophir --> Cripple
  Distance in km: 113.4
  Distance in miles: 70.9
Cripple --> Ruby
  Distance in km: 109.7
  Distance in miles: 68.5
Ruby --> Galena
  Distance in km: 84.8
  Distance in miles: 53.0
Galena --> Nulato
  Distance in km: 76.1
  Distance in miles: 47.6
Nulato --> Kaltag
  Distance in km: 56.3
  Distance in miles: 35.2
Kaltag --> Unalakeet
  Distance in km: 115.9
  Distance in miles: 72.4
Unalakeet --> Shaktoolik
  Distance in km: 74.4
  Distance in miles: 46.5
Shaktoolik --> Koyuk
  Distance in km: 64.9
  Distance in miles: 40.5
Koyuk --> Elim
  Distance in km: 71.5
  Distance in miles: 44.7
Elim --> Golovin
  Distance in km: 42.7
  Distance in miles: 26.7
Golovin --> White Mountains
  Distance in km: 24.7
  Distance in miles: 15.4
White Mountains --> Safety
  Distance in km: 78.1
  Distance in miles: 48.8
Safety --> Nome
  Distance in km: 34.5
  Distance in miles: 21.6

Total distance: 1438.8 km -- 899.3 miles

Okay! Well, Unalakleet is misspelled -- because it was misspelled in the original KML. (I fixed it in the GPX and GeoJSON files.) Second, the total distance comes out a little low, but if you add an extra 5% or so (because there are only 771 route points for the whole track, so curves get cut off), it looks pretty good. Last, closer inspection shows that the very first leg Willow-Yentna has a very long straight line between the very first two points and therefore is particularly underestimated. So if you are interested in these distances, don't trust blindly, compare with what the Iditarod Trail Committee says, but go ahead and use them any way you want.
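For the curious, here's a rough sketch of why too few route points make a track come out short. It sums great-circle (haversine) distances between consecutive points, which is an assumption made for this example and not necessarily the method used in the tutorial linked above. The coordinates are invented, and the Earth is treated as a sphere, so treat the numbers as approximate.

```javascript
const EARTH_RADIUS_KM = 6371.0088; // mean Earth radius (spherical approximation)
const KM_PER_MILE = 1.609344;      // exact definition of the international mile

// Great-circle distance between two [longitude, latitude] positions, in km.
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Length of a linestring: the sum of its straight segments.
function pathLengthKm(coords) {
  let total = 0;
  for (let i = 1; i < coords.length; i++) {
    total += haversineKm(coords[i - 1], coords[i]);
  }
  return total;
}

// A made-up zigzag trail, and the same trail with only its two end points kept.
const winding = [
  [-150.0, 61.7], [-150.1, 61.8], [-150.0, 61.9], [-150.1, 62.0], [-150.0, 62.1]
];
const endpointsOnly = [winding[0], winding[winding.length - 1]];

const fullKm = pathLengthKm(winding);
const coarseKm = pathLengthKm(endpointsOnly);
console.log(fullKm.toFixed(1) + " km using every point");
console.log(coarseKm.toFixed(1) + " km using only the end points");
console.log((fullKm / KM_PER_MILE).toFixed(1) + " miles using every point");
```

Dropping the intermediate points always gives a shorter (or equal) length, which is the same effect as the long straight first segment out of Willow. As a side note, the mile figures in the listing above look consistent with dividing by a rounded 1.6 km per mile (50.7 / 1.6 ≈ 31.7); with the exact factor the total of 1438.8 km would be about 894 miles. That is an inference from the numbers, not something the listing states.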

[1] For advanced users, a set of files in Alaska Albers projection is also available -- this is better if you want to use it to measure distances, as the coordinates are in metres in a map projection that works well for Alaska (not much distortion).
[2] Also, if you unzip the archive, you will find more than just two files: a shapefile actually requires three or more related files -- the one ending in SHP is the actual shapefile with the geolocation information, the one ending in DBF contains the attribute table and the one ending in PRJ contains coordinate system and map projection; then there are indices (SHX, ...) etc. etc.

Thursday, February 27, 2014

John Schandelmeier's ADN piece on trackers

I'm in London doing work-y things (workshop on strengthening the internet against pervasive surveillance, Internet Engineering Task Force meeting). It's a long trek from Alaska, and while I was in transit John Schandelmeier published an article in the Anchorage Daily News questioning the value of GPS tracking in dogsled racing. I actually agree with him substantially but think he's really not addressing a few things that matter a lot.

John is not the first racer I've heard or read saying things along these lines. I expect that it is incredibly annoying to be on the trail and away from people, the world, etc., but to see a red light blinking at you hour after hour after hour after hour. In addition to a general sense of being unable to disengage from the clutter, one thing of which I have absolutely no doubt is that some number of people carrying trackers on their sleds feel like they're under surveillance.

I also think it's an open question what value they bring to the races. It's certainly less of a question in the case of Iditarod, since they seem to be making a profit on tracker subscriptions (I'd also argue that there are more people running Iditarod with marginal trail skills who need to be kept an eye on than there are in Quest, but I suppose that would be overly argumentative). With Quest it's less clear that it's led to a substantial increase in financial support from fans, particularly given the state of the purse over the past few years. And, of course, fan overreaction to things that they see in the trackers, plus managing the PR aspects of real problems in real time, increase both the workload and stress level for race staff.

And to be sure, there is no substitute for physical presence and human interaction. Over all these years, hands-down and by a large margin my favorite race spectating experience was last year at the Two Rivers checkpoint. Hugh's tracker was off and while we knew where Allen was we didn't know if Hugh was ahead of him, behind him, ... ? So there was a crowd, mostly handlers and people from the dog-savvy Two Rivers community, waiting at the checkpoint to see who'd be the first in and the likely winner of the 2013 Quest. There was a lot of chatter, a lot of suspense, and a lot of camaraderie as we waited.

That said, there is more than one way to experience the race, and I wouldn't denigrate the experience that people who cannot be here, who don't run dogs and don't know winter, are having as they follow along from home. As I was flying out yesterday/last night/whenever the heck that was (it all runs together ... ) I looked down on the landscape, and the trails that go on for miles without ever crossing a road or encountering a town, and once again I realized how ridiculously lucky I am to be able to live in Alaska. Most people don't, and most people can't. They come up and visit and have, I'm sorry to say, staged, inauthentic experiences, but somehow it captures their imagination and they fall in love with the romance of the place even if they can't quite engage Alaska directly. Following the races is one way for them to keep the romance alive. I'd argue that's a good thing, even if it's not really got very much to do with what Alaska is actually about.

But still, one of the things I've been hammering on is that the data and the trackers do tell some stories, if you care to watch and listen. Unfortunately Trackleaders.com is down right now but when it comes back up I'll post a bit of John's track from this year's Quest, where we did get to watch a story unfold and did get a sense of what was happening. He was traveling with Matt Hall (and really, somebody has to have a word with the unfortunate PR people who were handling the Quest's Facebook page and were turning every instance of people traveling together into a race). We watched them stop, leave the trail, go some distance, turn around, rejoin the trail, and stop again (here's an excerpt from Matt's track; John's looks much the same). So, while we don't know what they looked and felt like, we do have some idea that they ran into some tough trail and we watched them deal with it.

Similarly, I think a lot of people following on the GPS tracker remember standing up and screaming at their computers while watching Rob Cooke on Eagle Summit last year. We watched him motor on up, pause, and turn around. This is a case where it was much less clear what was going on (it looked possible that he was having problems but it turned out that he had so little difficulty going up he thought he must have left the trail, and turned around to find it) but it was emotional in any event.

So no, it's not at all the same as being on the trail. People are working with woefully little information and sometimes they don't understand what they're seeing at all (and this is where race organizations can be doing a better job, to head off overreaction and to help fans understand what they're seeing). But they're having a different kind of experience and have their own level of emotional involvement in it. John and others may not value it as highly as they value direct trail experience (and I wouldn't, either), but it's real and it's meaningful.

Sunday, February 23, 2014

Why Iditarod tracker mileage sorting is messed up

People have noticed that the sort by mileage function on the Iditarod leaderboard was wrong (it's been fixed), with 100-something miles sorting in front of fewer miles. For example, see this screen shot grabbed by Dawn Beckwell:

Here's what's going on (and this will be old news to a few people and not interesting at all to most): Computers are binary calculators, with all data, from programs to stored files, taking the form of a string of 0s and 1s. Both characters and numbers are also strings of 0s and 1s, and there are standardized encoding schemes for representing character data. By far the most popular/successful is known as ASCII, or the American Standard Code for Information Interchange (nice collection of ASCII tables here). So, while the number 1 is "00000001" in base 2 (again, base 2 because each bit can take one of only two values, 0 or 1), the character "1" is encoded in ASCII as 00110001. That's right, 1 and '1' are not the same. When you see a '1' on a screen what's really behind that - how the data are really represented - is 00110001. 00000001 is translated to 00110001 for printing or display.
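If you'd like to see those bit patterns for yourself, here's a tiny sketch. The helper name toBits is made up for this example. JavaScript strings are UTF-16 rather than ASCII, but for plain digits and letters the character codes are the same as the ASCII ones.

```javascript
// Format a small non-negative integer as an 8-bit binary string.
function toBits(n) {
  return n.toString(2).padStart(8, "0");
}

const asNumber = 1;               // the number one
const asChar = "1".charCodeAt(0); // the code for the character "1", which is 49

console.log(toBits(asNumber)); // 00000001
console.log(toBits(asChar));   // 00110001

// Going from the number to the printable digit means adding 48 (the code for "0").
console.log(String.fromCharCode(asNumber + 48)); // 1
```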

Programs that do things to data, like sort them, have no way to know what 00110001 is or how they should treat it. In a lot of web-oriented and application-oriented programming languages it's very easy to sort data (old school, we had to write our own sort functions) but the default is that data are treated like characters. The sort function compares the first character in each string, then the next, and so on. In that scheme, "1" is smaller than "7" and sorts earlier, so a string like "101.5" lands in front of one like "73.2". To sort as numbers, in modern programming languages you just need to tell the sorting function to treat the data as numbers, not characters, so instead of looking at it character-by-character it understands that the value it needs to sort is 101.5.
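Here's a minimal sketch of the two behaviors in JavaScript, where the default sort compares values as text. The mileage values are made up, and I have no idea what the leaderboard code actually looks like; this only shows the general mechanism.

```javascript
// Mileages as they might arrive from a data feed: as strings (values invented).
const miles = ["73.2", "101.5", "9.8", "128"];

// Default sort: character-by-character comparison, so "1..." sorts before "7...".
const asText = [...miles].sort();

// Numeric sort: tell the sort function to compare the parsed numbers instead.
const asNumbers = [...miles].sort((a, b) => parseFloat(a) - parseFloat(b));

console.log(asText);    // [ '101.5', '128', '73.2', '9.8' ]
console.log(asNumbers); // [ '9.8', '73.2', '101.5', '128' ]
```

The fix is one small comparison function, which is probably why this kind of bug tends to get fixed quickly once somebody notices it.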

Saturday, February 22, 2014

Answering a question with the new Iditarod tracker

Today, someone asked what time Conway got into Yentna. So, how do you answer that question with the Iditarod tracker analytics? Well, the best you can do is to take a look at what time his speed fell to 0mph. Because there is no plot showing mile location against time (or time against miles), you can't say "Checkpoint <abc> is at mile 128 and Musher <def> was at mile 128 at 4:20, so Musher <def> arrived at <abc> at 4:20." Instead, you can make inferences from speed. And, in fact, the standings say he got in at 3:56.
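To show what the mile-against-time approach would look like if the data were available, here's a hedged sketch. The sample format ({ minutes, mile }), the function name timeAtMile, and all of the numbers are hypothetical; the Iditarod tracker doesn't expose anything like this as far as I can tell. It assumes the samples are in time order and simply interpolates in a straight line between the two updates that bracket the checkpoint's mile marker.

```javascript
// Estimate when a team reached a given trail mile from time-ordered samples.
// Returns minutes since the start, or null if the team never got that far.
function timeAtMile(samples, mile) {
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1];
    const next = samples[i];
    if (prev.mile <= mile && mile <= next.mile) {
      // Team was sitting exactly at this mile for both updates: use the earlier one.
      if (next.mile === prev.mile) return prev.minutes;
      const fraction = (mile - prev.mile) / (next.mile - prev.mile);
      return prev.minutes + fraction * (next.minutes - prev.minutes);
    }
  }
  return null;
}

// Invented tracker updates: minutes since the start and trail mile.
const samples = [
  { minutes: 0, mile: 0 },
  { minutes: 60, mile: 8 },
  { minutes: 120, mile: 18 },
  { minutes: 180, mile: 26 }
];

console.log(timeAtMile(samples, 13)); // 90 (halfway between the 60 and 120 minute updates)
console.log(timeAtMile(samples, 30)); // null (not there yet)
```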

Another possibility for figuring out when he got in is to use the replay. They don't allow you to control the speed of the replay and wow, that's going to suck a lot when the race gets longer and you've got over 60 teams on the trail, but if you drag the slider you can do it manually.

[Another UX problem - that legend on top of the curves is really annoying!]

The new Iditarod tracker and user experience ("UX")

First, I'd like to apologize for not blogging much over here. I've been extremely busy with work (that's a good thing, mostly) and have found that keeping a Facebook page is a handy way to get out short notes. The Facebook page is here.

Anyway, Iditarod's new tracker is up and being used to track the Junior Iditarod. It's pretty clear that they've written their own based on a data feed from Trackleaders, and it's also pretty clear that they didn't have time or the means to debug it. Software quality assurance is much more difficult than you might think. One common problem is that commercial-grade software needs to handle unexpected inputs gracefully. It is an ongoing source of amazement what people will try to do with something, things you never could have expected and didn't plan for. When people in the techie business say "The first 80% of the project takes 80% of the effort, and the last 20% of the project takes the other 80% of the effort," that's nearly always what they're talking about - quality assurance. People who haven't done a lot of commercial software tend not to appreciate this and think that when a program does what they want it to, they're done. Not sure what to say about that other than "Hah."

Anyway, rather than dwell on bugs I'd like to talk for a minute about user interface issues. User interface is also a very highly specialized area in software. It's something at which I am truly terrible, so I rely on people who understand user behavior, workflows, and so on. But in this case I am a user and here's what I'm finding:

Some of the things which work well for a 9-team field are going to be nearly unbearable when there are nearly 70 teams being tracked
First, good for them for making the columns sortable in the leaderboard (the panel on the left-hand side of the map). It's helpful, and it's going to be absolutely necessary when there are 60-odd teams on the trail
Too much clutter on the screen, and it covers up portions of the map. Unfortunately because of a few other problems with the user interface we kind of need to keep some of it around (the leaderboard)
The base map layer is not a good choice. I understand that this is what Mapbox provides but the lack of labels on geographic features is unfortunate. It would be nice to have the option to switch between a map layer and a satellite image layer (a topo layer would be awesome but I understand it's a lot more difficult to come by - another plus for Trackleaders). On a more positive note, today Iditarod switched from showing the road map layer to showing a terrain map layer. It makes it easier to compare to a topo map, plus - let's face it - using a road map to track a wilderness race is kind of dumb
We can't zoom out to cover a larger geographic area. Can't imagine why not unless it costs them money (does Mapbox charge for tile access? Don't know).
They need to get a handle on this whole "rest" thing. I'm very interested in run/rest schedules (you should be, too! They're a key question in understanding distance dogsled races) and a 10-minute stop for snacks or to check booties is not at all the same thing as a 4-hour break. Also, it's probably a mistake to display a musher as having stopped the same as a tracker that hasn't updated. It doesn't help that a single 0mph reading is treated as stopped, because it means that they're also showing a single missed tracker update as stopped
If you hover over a flag on the map it gives you the geographic coordinates of that tracker update. I assume they did that for debugging purposes, but for those of us following the race it would be a big improvement if they showed the musher's name, instead. Right now you need to go back to the leaderboard to find who a given bib number belongs to. That's going to suck when there are 69 teams on the trail.
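On the "rest" point, here's one way the distinction could be sketched. Everything in it is an assumption made for illustration: the reading format ({ minutes, mph }), the name findStops, the labels, and the 60-minute threshold. A lone 0mph reading is flagged as uncertain rather than as a stop, since on its own it can't be told apart from a missed update.

```javascript
// Group consecutive 0 mph readings into stops and label each one.
// restMinutes is an arbitrary threshold chosen for this example.
function findStops(readings, restMinutes = 60) {
  const stops = [];
  let runStart = null; // index of the first 0 mph reading in the current run
  for (let i = 0; i <= readings.length; i++) {
    const stopped = i < readings.length && readings[i].mph === 0;
    if (stopped && runStart === null) runStart = i;
    if (!stopped && runStart !== null) {
      const first = readings[runStart];
      const last = readings[i - 1];
      const duration = last.minutes - first.minutes;
      let kind;
      if (i - runStart === 1) kind = "uncertain";      // one reading: maybe a glitch
      else if (duration >= restMinutes) kind = "rest"; // a real break
      else kind = "short stop";                        // snacks, booties, and so on
      stops.push({ start: first.minutes, duration, kind });
      runStart = null;
    }
  }
  return stops;
}

// Invented readings: minutes since the start and speed in mph.
const readings = [
  { minutes: 0, mph: 8 }, { minutes: 10, mph: 7.5 },
  { minutes: 20, mph: 0 },                            // single zero reading
  { minutes: 30, mph: 8 },
  { minutes: 40, mph: 0 }, { minutes: 50, mph: 0 },   // ten-minute stop
  { minutes: 60, mph: 7 },
  { minutes: 70, mph: 0 }, { minutes: 130, mph: 0 }, { minutes: 190, mph: 0 },
  { minutes: 250, mph: 0 }, { minutes: 310, mph: 0 }, // four-hour break
  { minutes: 320, mph: 6 }
];

console.log(findStops(readings).map((s) => s.kind));
// [ 'uncertain', 'short stop', 'rest' ]
```

A real tracker would also want to look at gaps between timestamps to catch updates that never arrived at all, which this sketch doesn't attempt.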

I'm having a hard time calling their analytics "analytics," since they don't provide that much insight into what's going on on the trail. I keep hammering on this because I think it's important: the competitive advantage that Trackleaders brings to the event tracking business is that they know how to tell a story using data. Teams on the trail aren't simply moving down a line, they're also moving relative to one another, and that movement is much of the story of a race. Who's traveling together? Who's passing whom, and where is it happening? How much faster *is* one team traveling than another, really? Is there one particular spot on the trail that's a popular camping site? I get the impression that the folks working for IonEarth were pressured by Iditarod to provide analytics and found a Javascript library containing strip charts so implemented that, without thinking very hard about what they want to show. Now, Iditarod is copying that.
Here's one thing the Iditarod analytics do do well: by mapping speed against time they give you some insight into a particular musher's run/rest schedule.
It's great that they let you get a musher's "analytics" directly from the leaderboard but it's kind of a klutzy process. I usually start by noticing something on the map I'd like to look at more closely. In this case, what we see on the map is bib numbers. So you need to go over to the leaderboard (open it up if you've closed it to minimize clutter), find the bib number, then click on that person's analytics icon. It'd be a lot more straightforward to be able to go from the map marker directly to the "analytics."
Another clutter-related issue is that because the pop-ups don't close when you open another, you can get a mess pretty quickly. Unfortunately closing them can be a little hit-and-miss with your mouse. Anyway, your moment of fugly:

They're not really strong on the mileage reporting and while the analytics show speed against time there's really no easy way to compare how two teams performed over the same section of trail

Anyway, enough kvetching. When you're in the software business and when you're an engineer, your first instincts when facing new technologies are 1) to figure out how it works, and 2) to figure out how to make it better. Alas, this tracker is giving us plenty of opportunities for the latter. But ultimately what matters is how it works when put to some basic tests, and in a couple of future posts I'll look at how to answer certain kinds of questions using this software.

Saturday, February 1, 2014

Now on Facebook, too

I've created a Mushing Tech Facebook page as a more efficient way of sharing information. In addition to pointers to blog posts there will also be quick notes about things which may not merit an entire post. The page is here.

Friday, January 31, 2014

2012, from the start to Two Rivers

Here's a quick summary of times from the start to Two Rivers in 2012:

Times ranged from 7:15 (7 hours and 15 minutes) to 12:51, with speeds from 5.6mph to 9.93mph. The fastest time was Hugh Neff's. I would not expect times tomorrow to be quite as fast, given trail conditions. The average (mean) time was 10:01:49, but the distribution was skewed and the median time was 10:50 (the mean speed was 7.4mph and the median speed 6.65mph).

Here's a histogram of the speeds:

As you can see, most of the speeds were clustered around the lower end of the scale, but a few speedsters pulled the average up.

Here's a complete table of the average speeds from the start to Two Rivers:

Musher Name	Speed (mph)
Kyla Durham	5.6
Jason Weitzel	5.95
Misha Pedersen	6.07
Maren Bradley	6.12
Abbie West	6.23
Paige Drobny	6.3
Allen Moore	6.38
Brian Wilmshurst	6.42
Michael Telpin	6.43
Mike Ellis	6.49
Joar Leifseth Ulsom	6.65
Sonny Lindner	6.24
Brent Sass	7.13
Yuka Honda	7.32
Nikolay Ettyne	7.59
Marcelle Fressineau	8.45
Trent Herbst	8.71
Jake Berkowitz	8.8
David Dalton	9.27
Gus Guenther	9.33
Lance Mackey	9.6
Kristy Berington	9.8
Hugh Neff	9.93
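If you want to check the mean and median speeds against the table, here's a short sketch that recomputes both from the values listed above. The function names are just for this example.

```javascript
// Average speeds (mph) from the table above, in the order listed.
const speeds = [
  5.6, 5.95, 6.07, 6.12, 6.23, 6.3, 6.38, 6.42, 6.43, 6.49, 6.65, 6.24,
  7.13, 7.32, 7.59, 8.45, 8.71, 8.8, 9.27, 9.33, 9.6, 9.8, 9.93
];

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function median(values) {
  // Numeric comparator: without it, sort() would compare the values as text.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(speeds.length);           // 23
console.log(mean(speeds).toFixed(2)); // 7.43
console.log(median(speeds));          // 6.65
```

The mean sits well above the median because the handful of teams running over 9mph drag it upward, while the middle team in the list is down at 6.65mph.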

Returning mushers (in order of speed) are Hugh Neff, Dave Dalton, Brent Sass, Mike Ellis, Brian Wilmshurst, and Allen Moore.