Thursday, January 29, 2015

Tools for looking at run/rest schedules

If you've followed distance mushing for any amount of time you're keenly aware of the role that run/rest schedules play in the sport.  They can be both strategic and tactical, and can reflect the breeding decisions a given musher makes, as well as their training regimen (more on that in a bit).

I've posted a couple of videos on my Facebook page using Trackleaders tools from last weekend's Northern Lights 300 to look at how to use them to get a better understanding of how teams are performing against each other as they move down the trail, and how to use the replay function to watch interesting, and occasionally surprising, things happen during a race, based on how the teams move on the map.  In the latter video I looked more closely at Larry Daugherty, because there were some unusual things happening on his tracker that stood out from the rest of the race.

In Larry's recap of the race he talks about deciding that he was going to try to be competitive, and adjusting his run/rest schedule to that end.  Of course, now we know that his team shut down on him twice, and given his comments and what happened, the first place to look is at how he rested his team.

During last year's Yukon Quest I wrote a short Python program to pull rest times out of a Trackleaders musher track.  It's available for download, but for the more visually-oriented, you can take a look at a musher's speed/time plot on their Trackleaders page.  For example, here's Kristy Berington's for this year's NL 300.  She won the race, so clearly the decisions she made worked well for the team she had and the training she'd done:


Reading this is absolutely straightforward.  The x-axis (horizontal) is race time - hours since the race start - and the y-axis (vertical) is speed.  This looks like a pretty standard schedule for a 300-mile race.  Her schedule looked roughly like:

        Run 7 hours, rest 6 hours
        Run 6 hours, rest 3 hours
        Run 5 hours, rest 5 hours
        Run 8 hours, rest 4 hours
        Run 7 hours to the finish

So, the longest of the runs were only moderately long, and pretty much in keeping with what we've seen be successful in other races.  None were what anybody would call very long.  There were four rests.  Note that the longest run was in the last half of the race, the second-to-last one, which was 8 hours.
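Reading a schedule off the speed/time plot can also be sketched in code.  This is not my actual script, just a minimal illustration of the idea: it assumes you already have (race hour, speed) samples, and it treats anything under a made-up threshold of 1 mph as resting.  The function name, the threshold, and the sample data are all hypothetical.

```python
def run_rest_schedule(samples, moving_mph=1.0):
    """Collapse (hour, speed_mph) samples into alternating run/rest blocks.

    A sample counts as "run" when its speed is at or above moving_mph
    (an assumed threshold). Returns a list of (state, start_hour, end_hour).
    """
    blocks = []
    for hour, speed in samples:
        state = "run" if speed >= moving_mph else "rest"
        if blocks and blocks[-1][0] == state:
            # Same state as the previous sample: extend the current block.
            blocks[-1] = (state, blocks[-1][1], hour)
        else:
            # State changed: close the previous block at this sample's time.
            if blocks:
                prev = blocks[-1]
                blocks[-1] = (prev[0], prev[1], hour)
            blocks.append((state, hour, hour))
    return blocks


# Hypothetical hourly samples, NOT real race data:
# three hours moving, two hours stopped, two hours moving.
samples = [(0, 9.0), (1, 8.5), (2, 8.0), (3, 0.0), (4, 0.0),
           (5, 8.2), (6, 8.0), (7, 7.9)]

schedule = run_rest_schedule(samples)
for state, start, end in schedule:
    print(f"{state:4s} {end - start:.1f} hours")
```

Real tracker data is noisier than this (GPS jitter while parked, short stops to snack dogs), so a real version would probably want a minimum rest duration as well as a speed threshold.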

Now, Larry said that his goal was to run long and rest long, as he'd seen other mushers, notably Allen Moore, do with a great deal of success.  So, let's take a look at Allen's successful Copper Basin 300 race earlier this month:


Again, time is on the horizontal axis and speed is on the vertical axis.  His schedule looked roughly like:

        Run 5 hours, rest 5 hours
        Run 7 hours, rest 7 hours
        Run 3 hours, rest 5 hours
        Run 9 hours, rest 2 hours
        Run 8 hours to the finish

Again, a few moderately long runs, four rests, and the longer runs in the second half of the race (also top speeds of 30mph in a few places - woo, Allen!  But those, sadly, are actually measurement errors).  Note, as well, that in mid-distance races with mandatory checkpoint rest, the run/rest schedule is going to be influenced by checkpoint location and layover rules.

So now we've looked at a couple of successful run/rest schedules, including one that Larry said he was using as a model.  Let's look at what Larry actually did:



        Run 7 hours, rest 7 hours
        Run 12 hours, rest 5 hours
        Run 8 hours, rest 6 hours
        Run 8 hours to the finish

Note that there are places in the third run and the run to the finish where his speed dropped considerably, and in one case where he actually stopped.  Those are the places where his dogs quit on him.

What pops out here is that he only did four runs over 300 miles, while Kristy did five runs in the NL 300 and Allen did five runs in the CB 300.  Note as well that his long run was comparatively quite long (12 hours), that it was in the first half of the race, and that it was not followed by much rest.  His run/rest schedule does not actually look much like that of mushers winning at that distance, and is clearly one possible reason why his dogs quit on him twice.  And this gets back to the training question - if he hadn't been training for very long runs prior to the race his dogs were likely not in condition either mentally or physically to pull one off.

Eyeballing the curves it also looks like Larry lost more speed on the first run than Kristy did.  It might be interesting to fit a regression line to each and see how the slopes compare, but I think it would only be a little bit interesting.  More interesting is that many top mushers have talked about leaving the start chute at the same speed they'd like to run their race, or about negative splits, where their speeds towards the end of the race are faster than their speeds earlier in the race.  Sebastian Schnuelle has talked memorably about being passed left and right by other teams towards the start of the race, and telling them "I will see you later."  And nearly always, he does.  Going out fast may or may not hurt but it's not clear that it helps.
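For anyone who does want to try the regression idea, here's a minimal sketch of fitting a straight line to speed over the course of a run and comparing slopes.  The speeds below are invented for illustration (they are not Larry's or Kristy's numbers), and fit_line is just a plain least-squares helper I've written out by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical speeds (mph) at each hour of a first run; NOT real race data.
hours = [0, 1, 2, 3, 4, 5, 6]
team_a = [10.0, 9.8, 9.6, 9.4, 9.2, 9.0, 8.8]    # loses 0.2 mph per hour
team_b = [11.0, 10.5, 10.0, 9.5, 9.0, 8.5, 8.0]  # loses 0.5 mph per hour

slope_a, _ = fit_line(hours, team_a)
slope_b, _ = fit_line(hours, team_b)
print(f"team A: {slope_a:+.2f} mph/hour, team B: {slope_b:+.2f} mph/hour")
```

A more negative slope means the team is shedding speed faster over the run.  With real tracker data you'd want to drop the resting samples first, or the stops will dominate the fit.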

This plot (the speed vs. time plot) is an incredibly handy tool for looking at run/rest schedules.  One enhancement I'd love to see would be the ability to overlay multiple mushers on the same plot, which would allow us not only to compare rest durations and locations, but also speeds while moving.  In the meantime, my script has the option to output the results in CSV format, which is handy for loading into a spreadsheet or into a data analysis package like R or NumPy.

Tuesday, January 20, 2015

A closer look at the Buser shortcuts in the Kusko

I think at this point most distance mushing fans are aware that both Rohn and Martin Buser left the trail during this past weekend's Kuskokwim 300, and that they received no penalty for having done so despite the trail they took being somewhat shorter than the race trail.

The Kusko 300 rules say:
"Racers must follow the marked and/or broken race trail. Leaving the marked and/or broken trail for the purpose of gaining a competitive advantage over other racers is not allowed."
Right there is an obvious problem: it implies that if someone leaves the trail and follows a shorter route by accident, they will not be penalized.  It requires the race judges to attempt to determine intent, and that's both difficult and unfair, as it introduces a highly subjective element into the decision.  There's also the question of incentives - if you kill a moose with your car in Alaska, you don't get to keep the meat, antlers, or any other valuable part of the animal because the state doesn't want you accidentally-on-purpose killing a moose.  Same with killing wildlife in defense of life and property - you don't get to keep anything of value from the animal because they also don't want you accidentally-on-purpose having a dangerous run-in with a bear.  Setting up a situation in which it's okay to leave the trail under particular circumstances removes some of the disincentives for leaving the trail.

But, this doesn't apply in Rohn's case, as he was told that he was off-course and given the opportunity to return to the race trail, which he did not do.  So.

What interests, I think, a lot of us is whether or not these shortcuts had an impact on the outcome of the race.  We can use several Trackleaders' tools to look at that question and try to sort it out.

First, let's look at the shortcut itself.  If you go to the Trackleaders page for the race, down the right-hand side of the map you'll see a column of buttons.  Click on "Map layers."



That will expand to a series of checkboxed menu items: Weather Conditions, Cloud Cover, All Musher tracks, Tent layer, and Scratch layer.  Click on "All Musher tracks."  That will draw all of the tracks for all of the mushers who are being tracked.  I've done that in the following image, and zoomed in on the trail near Bethel (I've also switched to satellite view, as the tracks stand out better on the map).



There's no question that Martin and Rohn did not follow the same trail as everybody else, and it appears that the trail they did take was shorter.  So, was it shorter, and if so, how much of an advantage did they gain?

To try to suss that out, let's look at the race flow chart, which can give us a pretty clear look at average traveling speeds and team speeds relative to one another, as well as showing us speed anomalies in the track.  Here's the race flow chart from Tuluksak to the finish.  Note the big vertical jag in Rohn's and Martin's curves - that's where they left the trail (to digress a bit, Trackleaders appears to calculate musher trail mile by comparing the location of the GPS reading to the track that they were given by the race organization).  Note, as well, that both Rohn and Martin had slowed down and were losing speed relative to the teams around them.  On the race flow chart the x-axis (horizontal) represents time since the race began and the y-axis (vertical) represents trail mile.  The steeper the slope of a musher's curve, the faster they're going, and the less steep it is, the slower they're going.  So, that they were losing ground is very clear, and that they got a bump from their shortcut is also very clear.
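To make the digression about trail mile a little more concrete, here's a sketch of one way a tracker could turn a GPS reading into a trail mile: snap the reading to the nearest point on the official track and report that point's mile.  I don't know how Trackleaders actually does it, so treat the nearest-vertex approach, the function names, and the coordinates below as assumptions for illustration only.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def trail_mile(fix, trail):
    """Return the trail mile of the track vertex closest to a GPS fix.

    fix is (lat, lon); trail is a list of (mile, lat, lon) vertices.
    """
    nearest = min(trail,
                  key=lambda v: haversine_miles(fix[0], fix[1], v[1], v[2]))
    return nearest[0]


# Hypothetical trail that loops north and comes back (made-up coordinates).
trail = [
    (0.0, 60.80, -161.80),
    (5.0, 60.87, -161.80),
    (10.0, 60.87, -161.65),
    (15.0, 60.80, -161.65),
]

on_trail = trail_mile((60.81, -161.80), trail)   # just past the start
shortcut = trail_mile((60.80, -161.70), trail)   # cutting straight across
print(on_trail, shortcut)
```

A team that cuts straight across the bottom of the loop gets snapped from the start of the loop to the end of it, which is the kind of sudden jump in trail mile that shows up as a vertical jag on the race flow chart.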



There are several things we can do in looking at the data.  One thing we can do is look at the bumps and try to figure out what they did to "effective" speed on the trail.  Switching over to looking at Rohn's individual tracker map, it appears that he left the trail right at about mile 253.5 and rejoined at mile 266.  Again, according to his individual track, he left the trail at about 4:37am and rejoined the trail at about 5:36am.  So, as far as the race is concerned he covered 12.5 miles in 59 minutes, or averaged about 12.7 mph over that section of trail (again, as far as the race is concerned).  If you take a look at his traveling speed prior to that (looking at the slope on the race flow chart) at hour 32 he was at mile 234.5 and at hour 34 he was at mile 253.5, so he was traveling about 9.5 mph in the two hours prior to leaving the trail.  That is to say, he got about a 3 mph boost.
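The arithmetic in that paragraph is easy to check.  Here's a small sketch using only the mile markers and times quoted above; the variable names are mine.

```python
def mph(miles, minutes):
    """Average speed in miles per hour."""
    return miles / (minutes / 60.0)


# Section the race "credits" Rohn with while he was off the trail.
off_trail_miles = 266.0 - 253.5   # rejoined at mile 266, left at mile 253.5
off_trail_minutes = 59            # about 4:37am to about 5:36am
credited_mph = mph(off_trail_miles, off_trail_minutes)

# Traveling speed over the two hours before he left the trail.
prior_mph = (253.5 - 234.5) / (34 - 32)

boost_mph = credited_mph - prior_mph
print(f"credited {credited_mph:.1f} mph, prior {prior_mph:.1f} mph, "
      f"boost {boost_mph:.1f} mph")
```

Keep in mind that the times and mile markers are eyeballed from the tracker, so the result is only as good as those readings.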

Now, if he'd stayed on the trail and continued traveling at about 9.5mph, would he still have beaten Jeff King to the finish?  More arithmetic.  He left the trail at about 4:37am, at trail mile 253.5.  The finish in Bethel is at about trail mile 267.3.  That's 13.8 miles.  If he'd been traveling at a constant 9.5 mph over that 13.8 miles he'd have arrived in Bethel about an hour and 27 minutes after the time at which he left the trail, or around 6:04 am.  Jeff got in around 5:58.  That's close enough, I think, to be questionable, but Jeff probably would have finished earlier and gotten the $17,000 check instead of the one for $11,500.
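The same projection can be written out as a few lines of code.  This is only a sketch of the arithmetic above: it assumes a constant speed all the way to the finish, and the calendar date is a placeholder since only the clock times matter.

```python
from datetime import datetime, timedelta


def projected_arrival(left_trail_at, miles_remaining, speed_mph):
    """Arrival time if the team holds speed_mph for miles_remaining."""
    return left_trail_at + timedelta(hours=miles_remaining / speed_mph)


# Placeholder date; the clock times are the ones quoted in the text.
left_trail_at = datetime(2015, 1, 18, 4, 37)
jeff_finish = datetime(2015, 1, 18, 5, 58)

miles_remaining = 267.3 - 253.5   # finish mile minus the mile he left at
arrival = projected_arrival(left_trail_at, miles_remaining, 9.5)

margin_minutes = (arrival - jeff_finish).total_seconds() / 60
print(arrival.strftime("%H:%M"), f"({margin_minutes:.0f} minutes behind Jeff)")
```

Because Rohn was actually slowing down, holding 9.5 mph constant is, if anything, generous to him.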

Just for fun, let's try extending Rohn's curve along its original path to see if it results in something much different, and as a way of validating (or not) the values I've been using for time and trail mile.  Also, pictures are just plain easier to understand.  So, what happens if I extend Rohn's line on the race flow chart along its original path?  This, which shows him finishing at about the same time as Jeff:




Note that Rohn's line, however, was not straight - it bulges a bit on top of the straight line because ... he was slowing down.

Running some numbers on Martin's track, he left the trail at about 5:54 am and returned to it at about 7:04am, so as far as the race is concerned he ran that section at 10.9 mph, while otherwise running about 8.7 mph.  If he'd stayed on the trail at a constant 8.7 mph, he would have arrived in Bethel in the general vicinity of 7:30, or right about the same time as Brent Sass.  This one is a lot fuzzier than the Rohn/Jeff situation.

So basically, yes, I think that if Rohn hadn't left the trail Jeff would have beaten him into Bethel, but not by much.  I don't understand why the Busers weren't penalized for leaving the trail and taking a shorter route and I especially don't understand why the race organization has said absolutely nothing. Given that the Kusko organization hasn't said a word about this I think it would be nice if Rohn would take responsibility and then donate the $5500 difference between a 2nd and 3rd place finish to a local charitable organization in Bethel.  The way the race ended just leaves a bad feeling all around.

[Update: KYUK reports that both Busers have been penalized 10 minutes and 10 percent of their winnings.  I do think that both received substantially more than a 10-minute benefit from their adventures on Church Slough, but I'm glad that the problem has been formally recognized by the race.]

Monday, December 22, 2014

Updated SPOT API example

Several years ago I posted some example code to exercise the SPOT tracker API. SPOT has since both updated their data format and added a JSON feed.  I think both changes are for the better; certainly JSON is easier to deal with than XML and is far more efficient to process.  So, I've updated my code to speak JSON.  Note that in this code I'm pulling the JSON in from a file that I created by downloading the JSON using curl; you should be able to pull it in directly from SPOT by putting together a URL following the instructions on their API page (I did it using a local file because SPOT, frankly, is a bit parsimonious about server traffic).  Nearly all of the change was to the function extract_gps_data().

Note that the data are simply a JSON-ized version of the data shown at the SPOT API page.

I've put the code up on Github, as well.  The caveats from the previous API post still apply - don't write a tracker that runs in the browser, don't write it in Javascript, don't hit the servers (either SPOT or Google Maps) more often than absolutely necessary, etc.  Be sensitive to privacy issues, and what you're revealing when you write a tracker that's publicly available. 

Let me know if you identify errors in the code or if you have questions!  Most of all, have some fun with this.



<!DOCTYPE html>
<html>
<head>

 <title>My Wee Tracker</title>
 <meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
 <style type="text/css">
  html { height: 100% }
  body { height: 100%; margin: 0; padding: 0 }
  #map_canvas { height: 100% }
 </style>
 <script type="text/javascript" 
  src="http://maps.googleapis.com/maps/api/js?key=AIzaSyBFoJjPtS9vWXIENOa-egd0XFFnnQbfTIk&sensor=false&libraries=geometry">
 </script>

<script type="text/javascript">
//<![CDATA[


// convert text from the tracker data to a JSON object and
// pull out deeply-nested data elements

function extract_gps_data(trackerdata)  {
    var points = new Array();

    var track_data = JSON.parse(trackerdata);
    var messages = track_data['response']['feedMessageResponse']['messages']['message'];

    // walk the message array itself rather than trusting the reported count
    for (var i = 0 ; i < messages.length ; i++)  {
        var timestamp = messages[i]['dateTime'];
        var latitude = messages[i]['latitude'];
        var longitude = messages[i]['longitude'];
        var point_holder = new point(timestamp, latitude, longitude);
        points.push(point_holder);
    }
    return points;
}


// "point" is an object we use to hold the data we'll be putting on the map

function point(timestamp, latitude, longitude)  {
    this.timestamp = timestamp;
    this.latitude = latitude;
    this.longitude = longitude;
}

function get_track(url)  {
    var request = new XMLHttpRequest();
    request.open("GET", url, false);
    request.send();
    return request.response;
}

function makeinfobox(pointnum, thispoint, theotherpoint)  {
    var latlnga, latlngb; 
    var distance;
    var infoboxtext;
    var timestamp;
    
    timestamp = new Date(thispoint.timestamp); // we convert it from ISO format to something more readable
    infoboxtext = String(timestamp);
    if (pointnum > 0 && theotherpoint)  {  // no point calculating distance on the first point, or without a neighboring point
        latlnga = new google.maps.LatLng(thispoint.latitude, thispoint.longitude);
        latlngb = new google.maps.LatLng(theotherpoint.latitude, theotherpoint.longitude);
        distance = google.maps.geometry.spherical.computeDistanceBetween(latlnga, latlngb) / 1609.344; // convert meters to miles
        infoboxtext = infoboxtext + "<br />" + distance.toFixed(2) + " miles";
    } 
    return infoboxtext; 
}

function initialize()  {
    var points;
    var url = "spot_track.json";
    var trackline = new Array();

    var trackerdata = get_track(url);
    points = extract_gps_data(trackerdata);

    var spot = new google.maps.LatLng(points[0].latitude, points[0].longitude);
    var my_options = {
        center: spot,
        zoom: 12,
        mapTypeId: google.maps.MapTypeId.ROADMAP
    };
    var map = new google.maps.Map(document.getElementById("map_canvas"), my_options);
    for ( var i = 0 ; i < points.length ; i++ )  {
        var contentstring = "Point " + i; 
        var spot = new google.maps.LatLng(points[i].latitude, points[i].longitude);
  // here we create the text that is displayed when we click on a marker
        var windowtext = makeinfobox(i, points[i], points[i+1]);  
        var marker = new google.maps.Marker( {
            position: spot, 
            map: map,
            title: points[i].timestamp,
            html: windowtext
        } );
  // instantiate the infowindow
  
        var infowindow = new google.maps.InfoWindow( {
        } );

  // when you click on a marker, pop up an info window
        google.maps.event.addListener(marker, 'click', function() {
            infowindow.setContent(this.html);
            infowindow.open(map, this);
        });

  // set up the array from which we'll draw a line connecting the readings
        trackline.push(spot);
    }  
 
 // here's where we actually draw the path 
    var trackpath = new google.maps.Polyline( {
        path: trackline,
        strokeColor: "#FF00FF",
        strokeWeight: 3
    } );
    trackpath.setMap(map);
}

//]]>

</script>
</head>

<body onload="initialize()">

<div id="map_canvas" style="width:100%; height:100%"></div>

</body>
</html>

Saturday, December 20, 2014

Catching up with the SPOT API

Over the holiday I'm planning on putting out an update of the sample code that uses the SPOT API, both to work with the new format and to work with the JSON representation rather than the XML.  In the meantime, here's a sample JSON element representing a single SPOT tracker message:


{
    '@clientUnixTime': '0',
    'batteryState': 'GOOD',
    'dateTime': '2014-12-19T07:34:35+0000',
    'hidden': 0,
    'id': 349242506,
    'latitude': 64.82709,
    'longitude': -148.99679,
    'messageType': 'TRACK',
    'messengerId': '0-8283550',
    'messengerName': 'Melinda\'s tracker',
    'modelId': 'SPOT2',
    'showCustomMsg': 'Y',
    'unixTime': 1418974475
}

The (minimal) SPOT API documentation is available online here.  More later!
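In the meantime, here's a rough Python sketch of pulling the timestamp and coordinates out of a feed.  It assumes the messages sit under response / feedMessageResponse / messages / message and carry the fields shown in the sample element above.  The feed text is a cut-down stand-in: the first message reuses the sample's values and the second is made up.  Handling a lone message that arrives as a single object rather than a list is a defensive guess on my part.

```python
import json


def extract_gps_data(feed_text):
    """Return (dateTime, latitude, longitude) tuples from a SPOT JSON feed."""
    feed = json.loads(feed_text)
    messages = feed["response"]["feedMessageResponse"]["messages"]["message"]
    if isinstance(messages, dict):
        # Assumption: a feed with one message may not wrap it in a list.
        messages = [messages]
    return [(m["dateTime"], m["latitude"], m["longitude"]) for m in messages]


# Cut-down stand-in for a downloaded feed; the second message is hypothetical.
feed_text = """
{"response": {"feedMessageResponse": {"count": 2, "messages": {"message": [
  {"dateTime": "2014-12-19T07:34:35+0000",
   "latitude": 64.82709, "longitude": -148.99679, "messageType": "TRACK"},
  {"dateTime": "2014-12-19T07:24:35+0000",
   "latitude": 64.82650, "longitude": -148.99500, "messageType": "TRACK"}
]}}}}
"""

points = extract_gps_data(feed_text)
for timestamp, lat, lon in points:
    print(timestamp, lat, lon)
```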

Monday, November 24, 2014

2011 Quest runtimes

Following up on my previous post on 2013 Quest statistics, I've gone back, cleaned up my 2011 data, and run some numbers on that.  What I found was pretty consistent with the 2013 numbers, except that there were a few surprises in the numbers that actually helped tell the story of the race.

If you've been following the Yukon Quest for a while you may remember that 2011 was pretty harrowing, with some bad weather in the middle and second half of the race that caused some very serious problems for those in the front of the pack.  Brent Sass recorded some particularly memorable video as he helped a nearly-hypothermic Hans Gatt off American Summit:




and Sebastian Schnuelle's ice-caked boots in Central told a story, as well:



What I did with the 2011 data was virtually identical to what I did with the 2013 data: For each checkpoint I took a look at runtimes to get a general picture of what happened on that leg of the race.  Then I collected those runtimes into a table, where I ran correlations of runtimes for each race segment against the finishing position.

The unsurprising part was that runtimes on longer race segments correlated quite well with overall finishing position, with the longest race segment (Dawson to Eagle) showing a very large positive correlation with finishing position, at 0.8537.  So again, this year look for longer race segments to have greater predictive value for finishing position.

The surprising bit, at least initially, was that there were four checkpoints at which runtime was inversely correlated with finishing position (in one case, strongly so).  They were Slavens (-0.1755), Circle (-0.4027), Central (-0.2543), and 101 (-0.0877).  At well over half-way into the race, the faster teams had pulled up towards the front, and this is where they ran into awful conditions, which slowed them down.  By the time the back of the pack arrived the overflow had frozen and the weather had moderated.  So, they were able to travel faster.

Nevertheless, the correlation between traveling speed over longer segments and overall finishing position remained reasonably strong, with an r value of 0.4527.  Here's the plot:




If you're interested in looking at the data and playing with them yourself, they're here, with the correlations on the very last sheet.  Unfortunately Trackleaders hadn't added the replay feature to their tracker at that point, but the race track is online here and looking at individual musher pages can help illustrate some of what happened (the closer together the breadcrumbs, the slower the team was moving [assuming that the breadcrumbs were being uploaded at regular intervals]).

Also, note that this analysis lacks anything resembling rigor, and includes a questionable choice of metrics for correlation, among other things.  But, for a casual description of how the race played out and how that's reflected in the race statistics, I think it's adequate.  Let me know if you spot a problem, or if you've got further questions.

Tuesday, October 14, 2014

A look at 2013 Quest runtimes

It's looking like this year's Yukon Quest has a pretty good field of entries, and with fall training well underway in interior Alaska we're all starting to speculate about how the race is going to go this year. It's only natural to look at past races, so I've started poking at the 2013 data, the last year the race was run in the Whitehorse-to-Fairbanks direction.  I'm also interested in having a baseline set of data to which this year's race can be compared, once it's underway.

So, I've taken my own spreadsheets from 2013 and used them as a basis for running some numbers.  In particular, I've created a spreadsheet containing 2013 checkpoint runtime data and extracted runtime summary data to get some basic descriptive statistics: fastest, slowest, mean, median, 1st quartile, and 3rd quartile for each race segment (between checkpoints).  For example, between Braeburn and Carmacks, I've got a table that looks like this:




You'll note that I've also calculated the ratio between the fastest and slowest runtimes; there may be something interesting there to look at later.  I've also plotted all runtimes as a histogram:



Again, this is largely to create a baseline dataset for comparison with this year.
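If you'd rather compute the same summary outside a spreadsheet, here's a minimal sketch using Python's standard statistics module.  The runtimes are invented, not the Braeburn-to-Carmacks numbers, and I'm assuming the ratio is slowest divided by fastest.  Quartile definitions vary a bit between tools, so a spreadsheet may give slightly different values than the "inclusive" method used here.

```python
import statistics


def summarize(runtimes_hours):
    """Basic descriptive statistics for one race segment's runtimes."""
    q1, median, q3 = statistics.quantiles(runtimes_hours, n=4,
                                          method="inclusive")
    fastest = min(runtimes_hours)
    slowest = max(runtimes_hours)
    return {
        "fastest": fastest,
        "slowest": slowest,
        "mean": statistics.mean(runtimes_hours),
        "median": median,
        "q1": q1,
        "q3": q3,
        "ratio": slowest / fastest,  # assumed direction: slowest / fastest
    }


# Hypothetical segment runtimes in hours; NOT real Quest data.
runtimes = [9.5, 10.0, 10.5, 11.0, 12.0, 13.0, 14.25]

summary = summarize(runtimes)
for name, value in summary.items():
    print(f"{name:8s} {value:.2f}")
```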

However, I also ran some correlations, and while the results are obvious if you think about them for a few seconds, I haven't ever seen anybody say so explicitly:

The ranking of runtimes on longer race segments (more miles between checkpoints) tends to correlate more strongly with final standings, at least in this data set.  That is to say, people who had the faster runtimes between checkpoints which are far apart tended to finish better than people with slower runtimes.  Some of this is tautological (the longer runs are a greater percentage of the total race), some of it might (I haven't looked at this) be because on long runs everybody has to camp, even people who would otherwise prefer to rest at checkpoints, or because longer runs smooth out the variability you might see in shorter runs (law of large numbers, sort of).

Here's a plot of the correlations between segment runtime and finishing position, against segment distance.



The correlations are on the "correlations" worksheet in the spreadsheet.  I'm using a standard Pearson product-moment correlation coefficient (r), which is not the best test for a complete dataset but is adequate for exploring these data.  I'm not posting the numbers here in the interest of not having reader eyes glaze over, but definitely feel free to visit the spreadsheet, poke through the data, copy the spreadsheets, and ask questions.
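For readers who want to see what the spreadsheet is doing, here's a small sketch of the Pearson r calculation for one race segment: segment runtime against finishing position.  The five runtimes are made up for illustration and pearson_r is written out by hand rather than taken from any particular library.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical data for one segment; NOT real Quest numbers.
finishing_position = [1, 2, 3, 4, 5]
segment_runtime_hours = [20.0, 21.5, 21.0, 23.0, 24.5]

r = pearson_r(finishing_position, segment_runtime_hours)
print(f"r = {r:.4f}")
```

A positive r here means slower runtimes on the segment go with worse (higher-numbered) finishing positions.  Repeating this for every segment and plotting r against segment distance gives the kind of plot shown above.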

I'm planning on doing something similar with 2011 data, as well as years run in the opposite direction, to see how well what we're seeing in 2013 holds up.  Unfortunately getting the data into the spreadsheet and clean enough to use is pretty labor-intensive, so it might be some while before a follow-up to this post.  But, once the data are in a spreadsheet there's a lot we can do with them, so there's incentive to do it beyond answering just these questions.

Here's looking forward to a great winter of distance dogsled racing!

Monday, September 1, 2014

New (ish?) event tracking software

It looks like there's a new GPS-based event tracking application, RaceBeacon.  From what I can glean from their website it looks like they're consolidating GPS feeds from individual participants' personal smart phones.  This is probably reasonable for shorter (as in, very short) informal events, and may provide a solid basis for building out a more robust platform with more features in the future.  As much as I'm a fan of Trackleaders (and I'm a huge fan, for reasons I'll go into in the next paragraph), it's always great to see some competition in this space.

That said, just showing locations on a map is not that interesting, particularly for events with staggered starts (like sprint races).  A map alone can show you where teams are in relation to each other but it doesn't capture the dynamic nature of racing - the stuff that makes racing exciting.  Prominent among the reasons that I like Trackleaders is that they really are both data guys and competitive cyclists, and they're interested in showing the story of a race as it unfolds.  Also, because they're data guys and computer scientists they've already dealt with some relatively difficult problems, for example calculating with some degree of precision the race mile at which a given team is currently located (harder than you'd think).  In this case, for sprint mushing races, the problem RaceBeacon is facing is how to demonstrate the relationship between two different teams' performances when the entire race is run in 20 minutes without stopping and the teams started 2 (or 4 or 18) minutes apart.  But, if you're not committed to using another tracking system and you're putting on a sprint event (and you can count on most or all of your teams carrying smart phones with data plans and having their batteries fully charged and being a platform supported by RaceBeacon), this could be very interesting to experiment with.
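To illustrate the staggered-start problem, here's a sketch of one way to compare two teams fairly: look at where each team was at the same elapsed time after its own start, rather than at the same moment on the clock.  This isn't anything RaceBeacon or Trackleaders has described; the function name and the fixes are hypothetical.

```python
def mile_at_elapsed(start_min, fixes, elapsed_min):
    """Trail mile a team had reached elapsed_min after its own start.

    fixes is a list of (clock_minute, trail_mile) pairs sorted by time;
    positions between fixes are linearly interpolated.
    """
    target = start_min + elapsed_min
    for (t0, m0), (t1, m1) in zip(fixes, fixes[1:]):
        if t0 <= target <= t1:
            return m0 + (m1 - m0) * (target - t0) / (t1 - t0)
    raise ValueError("no fixes bracket the requested time")


# Hypothetical sprint teams; team B starts 4 minutes after team A.
team_a = {"start": 0, "fixes": [(0, 0.0), (5, 1.5), (10, 3.0), (15, 4.5)]}
team_b = {"start": 4, "fixes": [(4, 0.0), (9, 1.6), (14, 3.2), (19, 4.8)]}

# Where each team was 10 minutes into its own race.
a_at_10 = mile_at_elapsed(team_a["start"], team_a["fixes"], 10)
b_at_10 = mile_at_elapsed(team_b["start"], team_b["fixes"], 10)
print(f"A: mile {a_at_10:.2f}, B: mile {b_at_10:.2f}")
```

On a plain map at clock minute 10, team A would look comfortably ahead, even though team B is covering more ground per minute of its own race.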

If you've used them for tracking your event, how did it go?  Who's planning on using them this fall or winter?