Mushing Tech

Taking a breather

2016-04-08T15:24:00.000-08:00

With the sport's pre-eminent event being such an unreconstructed and largely irredeemable crapshow, we're taking a break.

Race signups, fairness, and stress reduction

2015-10-03T12:48:00.000-08:00

Today sign-ups for two of the most popular mid-distance races in Alaska, the Copper Basin 300 and the Knik 200, opened. Because they are so popular and tend to become oversubscribed there's a crush of entries as soon as they open. Unfortunately there have been technology failures for both races, and mushers don't know whether or not they're in. Copper Basin has responded by providing alternative mechanisms for entering, and the Knik is, so far, sticking with web-based race entry.

We've got a few goals here at Mushing Tech. Among them is that we'd really like to make life easier for volunteers - without them there's no race, and being a race volunteer can be exhausting and stressful. I have very little doubt that the Copper Basin and Knik volunteers handling race entries are extremely stressed right now. Additionally, a number of mushers are questioning the fairness of the entry systems. Towards the bottom of this post I'll discuss the fairness question in more detail (particularly sources of statistical unfairness) but I think the more important question is what can be done to make everybody's lives a little easier, reduce stress levels, and reduce both the likelihood and the impact of technology failures.

I think the number one thing that can be done to improve race registrations would be to move away from the current land rush model, where everybody's trying to get their entry in early. Using this model creates a load on the race registration system that unintentionally increases the likelihood of technology failure. If races were instead to announce that they'd be doing random drawings on a given date, mushers would be able to enter at their leisure by that date, use varying technologies to submit their entry application (email, postal mail, etc.), and would be less likely to feel disadvantaged if they have only very slow or intermittent connectivity. There are some excellent sources of randomness available both on our own computers and online (see, for example, this) which can be used to draw from the pool of entries. Races can set aside some slots for veterans or for people running qualifiers and do random draws for everybody else, etc. The main thing is to remove the pressures associated with getting an entry form in before everybody else.
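Here's a rough sketch of what a deadline-plus-drawing could look like in Python, assuming the entries have already been collected into a simple list. The function name, the musher names, and the slot counts are all made up for illustration. The only real piece is `random.SystemRandom` from the standard library, which draws on the operating system's randomness source.

```python
import random

def draw_entries(entries, slots, reserved=()):
    """Fill reserved slots first, then draw the rest at random.

    entries  - everyone who applied by the deadline (hypothetical names)
    slots    - total number of places in the race
    reserved - entrants with a guaranteed place (e.g. veterans)
    """
    rng = random.SystemRandom()  # backed by the OS entropy source
    guaranteed = [e for e in entries if e in reserved][:slots]
    pool = [e for e in entries if e not in reserved]
    remaining = slots - len(guaranteed)
    drawn = rng.sample(pool, min(remaining, len(pool)))
    # Everyone not drawn goes on a waitlist, also in random order
    waitlist = [e for e in pool if e not in drawn]
    rng.shuffle(waitlist)
    return guaranteed + drawn, waitlist

if __name__ == "__main__":
    entries = ["Musher A", "Musher B", "Musher C", "Musher D", "Musher E"]
    accepted, waitlist = draw_entries(entries, slots=3, reserved={"Musher B"})
    print("In:", accepted)
    print("Waitlist:", waitlist)
```

Nothing in this depends on when an entry arrived, which is the whole point: an entry mailed a week early and one emailed an hour before the deadline have the same chance.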

Copper Basin has responded to their web outage by allowing mushers to enter by phone (voicemail) or email, and then the race organization will rely on timestamps to determine entry order. This has a number of problems with it, including that there's an excellent chance that the clocks on their computers and voicemail are not synchronized. Depending on whether or not the voicemail system is run by the phone company or is a home answering machine, it's possible that there's what's called in computing "head of line" blocking, where a queue is blocked on action on the head of the line, and queue members can only be serviced one at a time, unlike email, which can be received in parallel. Plus, voicemail is slower, both because of connection latency (the phone has to ring and be answered) and because it's just slower to leave a voicemail than it is to send a piece of email. So, we suspect that people who used voicemail to register were statistically disadvantaged relative to people who used email. However, people who live remotely and use satellite internet were disadvantaged relative to people who use DSL or cable for their internet connectivity, since the round-trip latency for satellite is at least 1/2 second (laws of physics!) and typically much higher. However, unlike the voicemail situation we'd be a bit surprised if delay associated with satellite internet had much of an impact on signup order. Similarly, if the issues with the Knik website are failures to establish a TCP connection to the race's web server, those who can retry quickly have an advantage over those with slow or high-latency connectivity.
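The satellite latency figure can be checked with a little arithmetic. This sketch assumes a geostationary satellite (the usual case for consumer satellite internet at the time, but an assumption on my part) sitting straight overhead, which is the best case. From Alaska the satellite is low on the horizon and the path is longer, so real numbers will be worse.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
GEO_ALTITUDE_KM = 35_786  # geostationary altitude above the equator

def min_round_trip_seconds(altitude_km=GEO_ALTITUDE_KM):
    """Best-case round trip: propagation delay only, satellite overhead."""
    # Request: user -> satellite -> ground station.
    # Reply: ground station -> satellite -> user.
    hops = 4
    return hops * altitude_km / SPEED_OF_LIGHT_KM_S

print(f"{min_round_trip_seconds():.3f} s")
```

That comes out to a bit under half a second before any processing, queuing, or slant-range penalty is added, which is why half a second is a reasonable floor in practice.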

At any rate, the goal here is to keep everybody happy: volunteers should not be stressed and mushers should feel that they're being treated fairly. We suspect that the current registration model doesn't really lend itself to that goal, and that a deadline model has a number of advantages over a land rush model.

How to access (and copy!) race spreadsheets

2015-07-30T20:48:00.000-08:00

Hi, all!

I thought it might be useful to produce a video showing how to access race spreadsheets. I know that many of you are curious about what might be shown in data from past races and have questions that you'd like to answer, but haven't had the data to do it. Data acquisition is quite labor-intensive and not very much fun, so there's not that much data out there, and what there is is often in, well, suboptimal shape. My race spreadsheets are in Google Drive and so are available on the web to anybody who's interested in taking a look at them, making copies, doing their own analyses, and so on.

So, I've made a video showing how to access them and how to make copies for your own use. I haven't figured out how to put clickable links in Youtube videos, but the folder containing the spreadsheets is here.

Have fun with it, let me know if you've got questions, and especially let me know if you find anything interesting! Future videos will show how to do simple calculations with the spreadsheets, easy plots (graphs), and so on.

Doing data

2015-07-25T21:25:00.000-08:00

I had big plans for the summer - getting some code written, some analyses written up, and getting the new Mushing Tech website up and running. Well, with one thing and another none of that has happened, but I thought it might be worthwhile to post something about things that people who are interested in a closer look at distance mushing race data can do for themselves.

Very broadly speaking, looking at the data involves two separate but related steps: 1) acquiring the data, and 2) doing the analysis.

Data acquisition and cleaning is a laborious grind. Right now race data tend to be in pretty shabby shape. They are generally available only as web pages, with no two races using the same format (even those using my spreadsheets have tweaked them for their own use, which is great but does increase the effort to take the data apart). That means scraping the data and converting it into a format that you can use with analytical tools. Worse, both major races have some major database errors - Iditarod's archived races have badly broken checkpoint links, and the Yukon Quest's archived races have screwed up bib numbers.

I've been pulling race data into my own spreadsheets using two mechanisms. For races underway, I manually enter data into live spreadsheets that I keep in Google Sheets. I've been bit-by-bit converting old race data by using table scraping tools available as browser extensions (for example, DataMiner for Chrome, and the extremely fabulously wonderful TableTools2 for Firefox). So, I do have a collection of race data available in spreadsheet form, here. Please feel free to use them yourselves, copy them, and so on. They are released under the BSD 3-clause license, so please respect that and provide attribution if you do use them.

Another source of data is the Trackleaders race tracking archive, and this is the really interesting one. For each race and each musher being tracked, the speed/time plot source data is actually embedded in the web page, and you can pull it out if you know where to look. I've got a Python program that does that and calculates rest schedules from it, here. Please feel free to grab a copy and tweak it for your own use (again, it's released under the BSD 3-clause license - it's yours to use as you wish at no cost other than providing attribution).
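To give a feel for the second half of that job, here is a minimal sketch of picking rest periods out of speed/time samples. This is not the program linked above, and it says nothing about how Trackleaders embeds its data. It assumes you've already pulled the samples into a list of (race hours, mph) pairs, and the thresholds are guesses you'd want to tune.

```python
def find_rests(samples, stop_mph=1.0, min_rest_hours=0.5):
    """Return (start, end) race-hour pairs for stretches spent stopped.

    samples        - list of (race_hours, speed_mph) tuples in time order
    stop_mph       - speeds below this count as "not moving" (assumed)
    min_rest_hours - ignore stops shorter than this (assumed)
    """
    rests = []
    rest_start = None
    for hours, mph in samples:
        if mph < stop_mph:
            if rest_start is None:
                rest_start = hours  # first stopped sample of a stretch
        else:
            if rest_start is not None and hours - rest_start >= min_rest_hours:
                rests.append((rest_start, hours))
            rest_start = None
    # A track that ends while stopped still counts as a rest
    if rest_start is not None and samples[-1][0] - rest_start >= min_rest_hours:
        rests.append((rest_start, samples[-1][0]))
    return rests

# Made-up samples: moving, stopped for about three hours, moving again
samples = [(0.0, 9.5), (1.0, 9.0), (2.0, 0.2), (3.0, 0.0),
           (4.0, 0.1), (5.0, 8.5), (6.0, 8.0)]
print(find_rests(samples))
```

The minimum-duration filter matters because trackers report brief stops (snacking dogs, tangles, passing) that aren't rests in any strategic sense.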

Okay, so that's the data side of things, but how do you learn how to look at data? Many people I've talked with have questions they want to ask about the data but aren't sure how to go about answering them.

One way is to ask someone with a stats background, but a more Alaskan approach might be to learn to do it yourself. There are a couple of possibilities at low or no cost.

Data science is a booming field right now and the educational resources are really impressive. There are a lot of new books, many of which require minimal technical background. O'Reilly has a remarkable collection in both print and ebook form. For example, Cathy O'Neil's "Doing Data Science" is a terrific book that gives you an overview of the approaches you can take to looking at data, while Joel Grus's "Data Science From Scratch" provides a hands-on introduction to a variety of analytical techniques while providing a grounding in Python programming (if you're already a Python programmer you'll probably want to use an existing data science module, like scikit-learn).

But, if you've got a computer and an internet connection, a better choice might be to take an online course at one of the MOOCs ("Massive Open Online Course"). Data science and introductory statistics courses are everywhere, and they provide an opportunity to learn new skills with as much or as little of a commitment as works for you. You can take one and do all the exercises and take all the quizzes (and earn a certificate), or just watch the lecture videos - it's all up to you.

Coursera features courses prepared by faculty at major universities and the quality is extremely high. As an example, take a look at the courses in the Data Science Specialization offered by Johns Hopkins University. They offer everything from a very high-level overview to a class on regression models. ("Getting and cleaning data" might be a good one for people interested in looking more closely at mushing data!)

Udacity is another popular MOOC site. It's more oriented towards practitioners and classes tend to be less consistent in quality than Coursera, but they are learn-at-your-own-pace with no deadlines for homework or quizzes, and some of the classes were developed by companies like AT&T and Google. They also have a large data science category, with some excellent introductory classes like their Introduction to Descriptive Statistics, as well as classes that might give you some idea about how to approach looking at mushing data, like their "Data Analysis With R" class.

And as I mentioned, I'll be making a few videos showing how to work with Google spreadsheets to look at data, dealing with questions like how to do arithmetic on dates and times, and how to do some simple summary statistics.

But summer passes quickly in interior Alaska. Yukon Quest sign-ups are in a week, and we should be able to start training the dogs regularly in about a month. It's starting to get dark at night and I think we're all looking forward to seeing the aurora again. But, in the meantime, there's summer to enjoy, fish to catch, projects to finish, and friends to visit. Have a great rest of your summer, and watch this space!

About that Iditarod post concerning Siberians

2015-03-22T18:32:00.000-08:00

Over the years I've been a bit baffled by Iditarod fans being so enthusiastic on the one hand but knowing so little about the sport on the other. This year it finally dawned on me that the likely reasons are: 1) many of the fans are fans of the race, not of distance mushing more generally, and 2) if you rely on Iditarod for most of your information you're going to find yourself in the weeds fairly often. Yesterday's "Eye on the Trail" post on Siberians and Siberian mushers was a pretty stellar example of the latter.

There were a number of smaller problems (misspelling Yvonne's name, identifying Lev as a purebred musher) and some enormous ones. Failure to mention Lisbet Norris, who finished earlier the same day, is the scion of the oldest Siberian Husky kennel in the world and one that's been incredibly influential, and whose grandfather ran the Iditarod with dogs from the same kennel back in the 1980s, is probably the most glaring error in judgment, but a nod absolutely needs to have been given to Isabelle Travadon, who also ran a purebred team and who did a very creditable job when things got difficult under circumstances that led others to scratch.

Also: YES, Rob Cooke became the first Siberian musher to finish both the Quest and the Iditarod in the same year. He did it with substantially the same dogs in both races. Rob is a friend and we are so proud of him we could bust. This is a big deal, and the author of that blog post should have known it. I think it's also worth mentioning that Rob did not have a good Quest, that it started out badly and he worked through it, solved his problems and got his team to the finish. That, I think, is a huge deal and speaks to the kind of dog man he is, which brings me to my real beef with the Iditarod post.

Shortly after I moved up to Two Rivers and before Chris arrived I needed to travel for work, so I boarded my eight (!) (at the time - I'm up to 20) Siberians at a kennel in the neighborhood. The fellow who owned the place was chatting and said "I used to run Siberians," so I said "oh?," curious to see where this was going to go. He went on to say "but Siberians have too much sense of self-preservation. They'll sit down on you. Alaskans will just go until they drop." That was a bit of an exaggeration but not completely. Siberians will leave a little in the tank (okay, sometimes a lot in the tank) when calling a time-out, and it takes a certain kind of musher to successfully run a Siberian team 1000 miles. It is not a coincidence that Mike Ellis in particular but also increasingly Rob Cooke and some other up-and-coming purebred mushers are known for exceptional dog care. The people who successfully run Siberians in 1000-mile races tend to be very fine dog people, in part because they have to be.

At this point it should not be a secret that the two things that people who don't know better say about Siberians, that they're slow and they're pretty, annoy a lot of Siberian mushers. Siberians have some traits that are highly valued in sled dogs: they have excellent feet, they're easy keepers, and they do extremely well in genuinely frigid conditions. They're tough dogs, but with an enthusiasm a friend describes as "joie de husky." And yes, they're pretty, but focusing on that is a bit like saying "What a pretty face!" or giving someone a Miss Congeniality award. It's a bit condescending and it's failing to acknowledge their qualities specifically as sled dogs. These are great dogs and tricky dogs and they're often being driven by great -- and underappreciated -- mushers.

I think this is a really great time for working Siberians and that it's just getting better, and I'm excited to see more purebred teams running Quest. In the meantime, when you see a purebred team finishing a 1000-mile race looking happy and ready for more, do not say "What pretty dogs." Instead, say "What a fine, fine dog musher. And wow, those dogs are pretty."

On Iditarod, justice, and "two-way communication" devices

2015-03-19T13:34:00.000-08:00

As I wrote on my Facebook page, I haven't been saying much about Brent's disqualification because I am nauseated by it. I haven't wanted to participate in some of the technical nitpicking that's been going on because I think the real issue is justice. Whether or not there's a network available is irrelevant - nobody, including the race judges, thinks that Brent had any intention to cheat or had taken any actions that would give him an unfair advantage. At the same time they acknowledge that there are other mushers on the trail with similar devices and they're not going to seek those people out for punishment. So, unjust rule unfairly applied.

That said, I do think it's worth talking about the technology a little bit, because another problem is that the rule is very, very poorly specified and I suspect a competent lawyer with some technical expertise or who had access to expert technical witnesses could have a field day. The real question from where I sit isn't whether or not there's a network that a musher could access, but rather what constitutes a two-way communication device.

By way of context, one of the things I do to earn my dogs' kibble is develop internet protocol specifications, as a participant in and chair of several working groups in the Internet Engineering Task Force. We develop the core protocols that are used on the internet, including things like routing, security, transport, and so on. We are an organization of fuss-budgets, and our work consists of specifying protocol and device behavior to a level of detail that would make most people comatose. So, I feel pretty comfortable looking at something like the Iditarod's rule 35 and trying to figure out whether or not it makes sense.

The bottom line is that I think it probably does not, for several reasons. The primary reason is that it's overly broad and would exclude devices like Bluetooth headsets, which perform a (two-way!) negotiation with another device in order to pair. It excludes special-purpose radio/wi-fi devices which cannot be used for anything but the purpose for which they were developed (for example, Nikon cameras speak PTP/IP over 802.11, with the camera acting as the access point/hotspot - completely useless for anything but camera control and transferring images/video). They also incorrectly identify a SPOT and other trackers as a one-way communication device. Technically, they are two-way - they receive radio from satellites and uplink back to the satellites. They only uplink if they "know" they're in contact with the satellites. A DeLorme InReach Satellite Communicator, which falls generally within the same category as SPOT devices, allows the person with the device to send arbitrary messages. And what about a Fitbit? Technically, those are two-way communication devices, since they swap messages with your computer, tablet, etc.

And then there's the more general problem of keeping up as technology changes and develops. For example, what about the Apple watch and other "smart watches?"

This probably sounds like nitpicking, and it is. There are a lot of two-way communication devices that don't provide general communication facilities. You can't send and receive email with a Bluetooth headset and you can't browse the web with a Nikon camera (yet, as far as I know). The problem is that this is the rule under which Brent was disqualified, and the rule is a mess as far as the specification of communication capabilities. Because this is the rule that was used and because the penalty was so severe, its technical correctness matters a lot. As I said I think a competent attorney (and likely even an incompetent one) could have a field day with it.

The Iditarod organization has repeatedly demonstrated itself to be technically unsophisticated. Usually this comes out in the form of making bad decisions about writing their own tracking system, having their social media people provide technical support (badly), and so on, but here's a case where their lack of ability to describe what it is that they'd like to prevent has caused material damage to someone who even the race judges who made the disqualification decision agreed wasn't cheating.

Given that the problem that they're trying to solve isn't really a technical one, although it could be instantiated using technology, I think they are probably much better off trying to define disqualifying behavior rather than disqualifying devices. Their technical incompetence has led to considerable injustice against someone who did nothing wrong.

The Yukon Quest on GPS or Google Earth

2015-02-05T08:54:00.003-09:00

Last year, we provided some Iditarod track files for quite a large number of software applications. This year, for the Quest, I've prepared copies of the most popular file types:

a KMZ file for use with Google Earth
a GPX file for use with compatible GPS units, such as Garmin's

KMZ file loaded into Google Earth

The files contain the same information: checkpoint (and dog drop etc) locations as well as the race trail for odd years. I condensed this data from a variety of sources and hand-cleaned it, but obviously it will only be approximate, and the trail will vary from year to year. Also, because the trail is made up of points that are about half a mile distant, the total distance is being underestimated (it comes out to 911 miles according to this track). So please take this (ENTIRELY UNOFFICIAL) file with a good pinch of salt! If it is helpful or useful, I'd be glad.
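To see why widely spaced points underestimate the distance, here's a small sketch that adds up great-circle (haversine) distances between consecutive track points. The coordinates are invented and the function names are mine. It isn't tied to any particular GPX library, so you'd need to get the latitude/longitude pairs out of the file yourself.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def track_miles(points):
    """Sum the straight-line hops between consecutive (lat, lon) points."""
    return sum(haversine_miles(*a, *b) for a, b in zip(points, points[1:]))

# Made-up bend in a trail: north, then east
detailed = [(64.00, -147.00), (64.01, -147.00), (64.01, -146.98)]
sparse = [detailed[0], detailed[-1]]  # same ends, corner point dropped

print(f"detailed: {track_miles(detailed):.2f} mi")
print(f"sparse:   {track_miles(sparse):.2f} mi")
```

Every corner that falls between two recorded points gets cut, so the sparser the track, the shorter it measures.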

The predictive value of split differences

2015-02-03T17:11:00.000-09:00

It seems that the format of the Knik 200 trail was not that much fun for teams, so we had a few mushers who didn't need qualifiers checking out early (that is to say, scratching). But, the format also gave us an opportunity to look at some data that haven't been available in the past for distance racing.

The Knik had mushers running from Deshka Landing up to Yentna and back, twice. That is to say, they passed over the same piece of trail 4 times. This gave us an opportunity to look at whether or not the consistency of the speed with which they covered the trail had any predictive value, in terms of final placement. If someone had pretty much the same times on each race leg, did they tend to finish higher or lower in the standings?

So, I created a sheet in the spreadsheet in which I took the differences between the two up times, the differences between the two back times, and the sum of the absolute values of the differences. This gave me a handle on just how much variability there was in a team's runtimes. I then ran correlations on the total differences, the run to Yentna differences, and the run back differences. I found a fairly strong correlation between the summed runtime differences and final placement, but also a fairly large standard error given the size of the field. That is to say:

a team whose times remained consistent from split to split tended to finish better than a team whose times varied more from split to split

This should not be particularly surprising, since the source of speed variation tends to be slowing down due to tiredness, etc. Another source of consistency, other than conditioning and fitness, could be the musher's expertise in managing their team's resources. Steady speeds probably don't cause a good finish, but they can tell us something about the team's "quality" (for lack of a better word).

In the correlations I ran, the difference in splits was the independent variable, and finishing position the dependent variable. Here's the table, with the first correlation being between the summed differences and the finish, the second being the summed differences on the run to Yentna, and the third being the summed differences on the run from Yentna back to Deshka:

correlation coefficient:	0.7031
standard error:	5.4138

correlation coefficient out splits:	0.5768
standard error:	6.2195

correlation coefficient in splits:	0.5781
standard error:	6.2121
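For anyone who'd rather do this outside a spreadsheet, here's a sketch of the same kind of calculation in plain Python. The leg times below are invented, not the Knik data, and I'm assuming that "standard error" means the standard error of the regression estimate (what a spreadsheet function like STEYX reports). If a different statistic was used, the second function won't match.

```python
import math

def split_variability(up1, back1, up2, back2):
    """Sum of absolute differences between repeated legs (any time unit)."""
    return abs(up1 - up2) + abs(back1 - back2)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def standard_error_of_estimate(xs, ys):
    """Standard error of y predicted from x by a least-squares line."""
    n = len(xs)
    my = sum(ys) / n
    syy = sum((y - my) ** 2 for y in ys)
    r = pearson_r(xs, ys)
    return math.sqrt(max(0.0, 1 - r * r) * syy / (n - 2))

# Made-up leg times in minutes: (up 1, back 1, up 2, back 2), finish place
teams = [
    ((300, 310, 305, 318), 1),
    ((310, 322, 330, 335), 2),
    ((305, 330, 340, 350), 3),
    ((320, 335, 345, 380), 4),
    ((330, 350, 390, 400), 5),
]
xs = [split_variability(*legs) for legs, _ in teams]  # independent variable
ys = [place for _, place in teams]                    # dependent variable
print(round(pearson_r(xs, ys), 4), round(standard_error_of_estimate(xs, ys), 4))
```

With a field this small the standard error stays large even when the correlation looks strong, which is the same caution that applies to the real numbers above.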

Tools for looking at run/rest schedules

2015-01-29T21:57:00.000-09:00

If you've followed distance mushing for any amount of time you're keenly aware of the role that run/rest schedules play in the sport. They can be both strategic and tactical, and can reflect the breeding decisions a given musher makes, as well as their training regimen (more on that in a bit).

I've posted a couple of videos on my Facebook page using Trackleaders tools from last weekend's Northern Lights 300 to look at how to use them to get a better understanding of how teams are performing against each other as they move down the trail, and how to use the replay function to watch interesting, and occasionally surprising, things happen during a race, based on how the teams move on the map. In the latter video I looked more closely at Larry Daugherty, because there were some unusual things happening on his tracker that stood out from the rest of the race.

In Larry's recap of the race he talks about deciding that he was going to try to be competitive, and making adjustments to his run/rest schedule to be more competitive. Of course, now we know that his team shut down on him twice, and the first place to look given his comments and given what happened is at how he rested his team.

During last year's Yukon Quest I wrote a short Python program to pull rest times out of a Trackleaders musher track. It's available for download, but for the more visually-oriented, you can take a look at a musher's speed/time plot on their Trackleaders page. For example, here's Kristy Berington's for this year's NL 300. She won the race, so clearly the decisions she made worked well for the team she had and the training she'd done:

Reading this is absolutely straightforward. The x-axis (horizontal) is race time - hours since the race start - and the y-axis (vertical) is speed. This looks like a pretty standard schedule for a 300-mile race. Her schedule looked roughly like:

        Run 7 hours, rest 6 hours
        Run 6 hours, rest 3 hours
        Run 5 hours, rest 5 hours
        Run 8 hours, rest 4 hours
        Run 7 hours to the finish

So, the longest of the runs were only moderately long, and pretty much in keeping with what we've seen be successful in other races. None were what anybody would call very long. There were four rests. Note that the longest run was in the last half of the race, the second-to-last one, which was 8 hours.

Now, Larry said that his goal was to run long and rest long, as he'd seen other mushers, notably Allen Moore, do with a great deal of success. So, let's take a look at Allen's successful Copper Basin 300 race earlier this month:

Again, time is on the horizontal axis and speed is on the vertical axis. His schedule looked roughly like:

        Run 5 hours, rest 5 hours
        Run 7 hours, rest 7 hours
        Run 3 hours, rest 5 hours
        Run 9 hours, rest 2 hours
        Run 8 hours to the finish

Again, a few moderately long runs, four rests, and the longer runs in the second half of the race (also top speeds of 30mph in a few places - woo, Allen! But those, sadly, are actually measurement errors). Note, as well, that in mid-distance races with mandatory checkpoint rest, the run/rest schedule is going to be influenced by checkpoint location and layover rules.

So now we've looked at a couple of successful run/rest schedules, including one that Larry said he was using as a model. Let's look at what Larry actually did:

        Run 7 hours, rest 7 hours
        Run 12 hours, rest 5 hours
        Run 8 hours, rest 6 hours
        Run 8 hours to the finish

Note that there are places in the third run and the run to the finish where his speed dropped considerably, and in one case where he actually stopped. Those are the places where his dogs quit on him.

What pops out here is that he only did four runs over 300 miles, while Kristy did five runs in the NL 300 and Allen did five runs in the CB 300. Note as well that his long run was comparatively quite long (12 hours), that it was in the first half of the race, and that it was not followed by much rest. His run/rest schedule does not actually look much like that of mushers winning at that distance, and is clearly one possible reason why his dogs quit on him several times. And this gets back to the training question - if he hadn't been training for very long runs prior to the race his dogs were likely not in condition either mentally or physically to pull one off.
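The three schedules are easy to put side by side with a few lines of Python. The hours are the rough ones listed above, read off the plots by eye, so treat the output as approximate. The dictionary layout and the `summarize` helper are just one way to organize it.

```python
# Rough run/rest hours as listed in this post
schedules = {
    "Kristy (NL 300)": {"runs": [7, 6, 5, 8, 7], "rests": [6, 3, 5, 4]},
    "Allen (CB 300)":  {"runs": [5, 7, 3, 9, 8], "rests": [5, 7, 5, 2]},
    "Larry (NL 300)":  {"runs": [7, 12, 8, 8],   "rests": [7, 5, 6]},
}

def summarize(runs, rests):
    """Boil a schedule down to a few comparable numbers."""
    longest = max(runs)
    return {
        "runs": len(runs),
        "longest_run": longest,
        "longest_is_run_number": runs.index(longest) + 1,
        "total_run_hours": sum(runs),
        "total_rest_hours": sum(rests),
    }

for name, sched in schedules.items():
    print(name, summarize(sched["runs"], sched["rests"]))
```

Laid out this way, the number of runs and where the longest run falls are the things that separate the third schedule from the other two.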

Eyeballing the curves it also looks like Larry lost more speed on the first run than Kristy did. It might be interesting to fit a regression line to each and see how the slopes compare, but I think it would only be a little bit interesting. More interesting is that many top mushers have talked about leaving the start chute at the same speed they'd like to run their race, or about negative splits, where their speeds towards the end of the race are faster than their speeds earlier in the race. Sebastian Schnuelle has talked memorably about being passed left and right by other teams towards the start of the race, and telling them "I will see you later." And nearly always, he does. Going out fast may or may not hurt but it's not clear that it helps.
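For the curious, fitting that regression line is only a few lines of code. This sketch computes an ordinary least-squares slope for speed against time on a single run. The two sets of samples are invented to show the idea and are not anybody's real tracker data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys against xs (here: mph lost per hour)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

hours = [0, 1, 2, 3]
team_a_mph = [10.0, 9.5, 9.0, 8.5]   # made up: fades steadily
team_b_mph = [9.0, 9.0, 8.9, 8.8]    # made up: holds speed

print("team A slope:", round(ols_slope(hours, team_a_mph), 3))
print("team B slope:", round(ols_slope(hours, team_b_mph), 3))
```

A more negative slope means the team is shedding speed faster over the run. A slope near zero, or a positive one, is what a negative-split strategy would look like.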

This plot (the speed vs. time plot) is an incredibly handy tool for looking at run/rest schedules. One enhancement I'd love to see would be the ability to overlay multiple mushers on the same plot, which would allow us not only to compare rest durations and locations, but also speeds while moving. In the meantime, my script has the option to output the results in CSV format, which is handy for loading into a spreadsheet or into a data analysis package like R or NumPy.

A closer look at the Buser shortcuts in the Kusko

2015-01-20T18:08:00.000-09:00

I think at this point most distance mushing fans are aware that both Rohn and Martin Buser left the trail during this past weekend's Kuskokwim 300, and that they received no penalty for having done so despite the trail they took being somewhat shorter than the race trail.

The Kusko 300 rules say:

"Racers must follow the marked and/or broken race trail. Leaving the marked and/or broken trail for the purpose of gaining a competitive advantage over other racers is not allowed."

Right there is an obvious problem: it implies that if someone leaves the trail and follows a shorter route by accident, they will not be penalized. It requires the race judges to attempt to determine intent, and that's both difficult and unfair, as it introduces a highly subjective element into the decision. There's also the question of incentives - if you kill a moose with your car in Alaska, you don't get to keep the meat, antlers, or any other valuable part of the animal because the state doesn't want you accidentally-on-purpose killing a moose. Same with killing wildlife in defense of life and property - you don't get to keep anything of value from the animal because they also don't want you accidentally-on-purpose having a dangerous run-in with a bear. Setting up a situation in which it's okay to leave the trail under particular circumstances removes some of the disincentives for leaving the trail.

But, this doesn't apply in Rohn's case, as he was told that he was off-course and given the opportunity to return to the race trail, which he did not do. So.

What interests, I think, a lot of us is whether or not these shortcuts had an impact on the outcome of the race. We can use several Trackleaders' tools to look at that question and try to sort it out.

First, let's look at the shortcut itself. If you go to the Trackleaders page for the race, down the right-hand side of the map you'll see a column of buttons. Click on "Map layers."

That will expand to a series of checkboxed menu items: Weather Conditions, Cloud Cover, All Musher tracks, Tent layer, and Scratch layer. Click on "All Musher tracks." That will draw all of the tracks for all of the mushers who are being tracked. I've done that in the following image, and zoomed in on the trail near Bethel (I've also switched to satellite view, as the tracks stand out better on the map).

Clearly there's no question that Martin and Rohn did not follow the same trail as everybody else, and it appears that the trail they did take was shorter. So, was it shorter, and if so, how much of an advantage did they gain?

To try to suss that out, let's look at the race flow chart, which can give us a pretty clear look at average traveling speeds and team speeds relative to one another, as well as showing us speed anomalies in the track. Here's the race flow chart from Tuluksak to the finish. Note the big vertical jag in Rohn's and Martin's curves - that's where they left the trail (to digress a bit, Trackleaders appears to calculate musher trail mile by comparing the location of the GPS reading to the track that they were given by the race organization). Note, as well, that both Rohn and Martin had slowed down and were losing speed relative to the teams around them. On the race flow chart the x-axis (horizontal) represents time since the race began and the y-axis (vertical) represents trail mile. The steeper the slope of a musher's curve, the faster they're going, and the less steep it is, the slower they're going. So, that they were losing ground is very clear, and that they got a bump from their shortcut is also very clear.

There are several things we can do in looking at the data. One thing we can do is look at the bump and try to figure out what it did to "effective" speed on the trail. Switching over to looking at Rohn's individual tracker map, it appears that he left the trail right at about mile 253.5 and rejoined at mile 266. Again, according to his individual track, that would mean that he left the trail at about 4:37am and rejoined the trail at about 5:36am. So, as far as the race is concerned he covered 12.5 miles in 59 minutes, or averaged a hair over 12.5 mph (about 12.7 mph) over that section of trail (again, as far as the race is concerned). If you take a look at his traveling speed prior to that (looking at the slope on the race flow chart) at hour 32 he was at mile 234.5 and at hour 34 he was at mile 253.5, so was traveling about 9.5 mph in the two hours prior to leaving the trail. That is to say, he got about a 3 mph boost.
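The speed arithmetic can be checked with a few lines of JavaScript. This is just a sketch using the times and trail miles quoted above; the helper names mph() and minutesOfDay() are made up for the example.

```javascript
// Average speed in mph over a stretch of trail
function mph(miles, minutes) {
    return miles / (minutes / 60);
}

// Minutes since midnight for an "h:mm" time
function minutesOfDay(hhmm) {
    var parts = hhmm.split(":");
    return Number(parts[0]) * 60 + Number(parts[1]);
}

var shortcutMinutes = minutesOfDay("5:36") - minutesOfDay("4:37"); // 59 minutes
var shortcutSpeed = mph(266 - 253.5, shortcutMinutes);  // speed as credited by the race
var priorSpeed = mph(253.5 - 234.5, 120);               // hour 32 to hour 34

console.log(shortcutSpeed.toFixed(1));                  // 12.7
console.log(priorSpeed.toFixed(1));                     // 9.5
console.log((shortcutSpeed - priorSpeed).toFixed(1));   // 3.2
```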

Now, if he'd stayed on the trail and continued traveling at about 9.5mph, would he still have beaten Jeff King to the finish? More arithmetic. He left the trail at about 4:37am, at trail mile 253.5. The finish in Bethel is at about trail mile 267.3. That's 13.8 miles. If he'd been traveling at a constant 9.5 mph over that 13.8 miles he'd have arrived in Bethel about an hour and 27 minutes after the time at which he left the trail, or around 6:04 am. Jeff got in around 5:58. That's close enough, I think, to be questionable, but Jeff probably would have finished earlier and gotten the $17,000 check instead of the one for $11,500.
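The same projection, sketched in code. It assumes a constant speed over the remaining trail, which is a simplification (he was in fact slowing down), and projectArrival() is a made-up helper name.

```javascript
// Project an arrival time from a start time, a distance, and a constant speed.
function projectArrival(startHhmm, miles, speedMph) {
    var parts = startHhmm.split(":");
    var startMinutes = Number(parts[0]) * 60 + Number(parts[1]);
    var arrival = Math.round(startMinutes + (miles / speedMph) * 60);
    var hours = Math.floor(arrival / 60);
    var minutes = arrival % 60;
    return hours + ":" + (minutes < 10 ? "0" : "") + minutes;
}

var remainingMiles = 267.3 - 253.5;   // about 13.8 miles from leaving the trail to Bethel
var projected = projectArrival("4:37", remainingMiles, 9.5);
console.log(projected);               // 6:04
```

Plugging in a slower speed for the last stretch pushes the projected arrival later, which only widens the gap to Jeff's 5:58 finish.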

Just for fun, let's try extending Rohn's curve along its original path to see if it results in something much different, and as a way of validating (or not) the values I've been using for time and trail mile. Also, pictures are just plain easier to understand. So, what happens if I extend Rohn's line on the race flow chart along its original path? This, which shows him finishing at about the same time as Jeff:

Note that Rohn's line, however, was not straight - it bulges a bit on top of the straight line because ... he was slowing down.

Running some numbers on Martin's track, he left the trail at about 5:54 am and returned to it at about 7:04am, so as far as the race is concerned he ran that section at 10.9 mph, while otherwise running about 8.7 mph. If he'd stayed on the trail at a constant 8.7 mph, he would have arrived in Bethel in the general vicinity of 7:30, or right about the same time as Brent Sass. This one is a lot fuzzier than the Rohn/Jeff situation.

So basically, yes, I think that if Rohn hadn't left the trail Jeff would have beaten him into Bethel, but not by much. I don't understand why the Busers weren't penalized for leaving the trail and taking a shorter route and I especially don't understand why the race organization has said absolutely nothing. Given that the Kusko organization hasn't said a word about this I think it would be nice if Rohn would take responsibility and then donate the $5500 difference between a 2nd and 3rd place finish to a local charitable organization in Bethel. The way the race ended just leaves a bad feeling all around.

[Update: KYUK reports that both Busers have been penalized 10 minutes and 10 percent of their winnings. I do think that both received substantially more than a 10-minute benefit from their adventures on Church Slough, but I'm glad that the problem has been formally recognized by the race.]

Updated SPOT API example

2014-12-22T12:46:00.000-09:00

Several years ago I posted some example code to exercise the SPOT tracker API. SPOT has since both updated their data format and added a JSON feed. I think both changes are for the better; certainly JSON is easier to deal with than XML and is far more efficient to process. So, I've updated my code to speak JSON. Note that in this code I'm pulling the JSON in from a file that I created by downloading the JSON using curl; you should be able to pull it in directly from SPOT by putting together a URL following the instructions on their API page (I did it using a local file because SPOT, frankly, is a bit parsimonious about server traffic). Nearly all of the change was to the function extract_gps_data().

Note that the data are simply a JSON-ized version of the data shown at the SPOT API page.

I've put the code up on Github, as well. The caveats from the previous API post still apply - don't write a tracker that runs in the browser, don't write it in Javascript, don't hit the servers (either SPOT or Google Maps) more often than absolutely necessary, etc. Be sensitive to privacy issues, and what you're revealing when you write a tracker that's publicly available.

Let me know if you identify errors in the code or if you have questions! Most of all, have some fun with this.

<!DOCTYPE html>
<html>
<head>

 <title>My Wee Tracker</title>
 <meta name="viewport" content="initial-scale=1.0, user-scalable=no" />
 <style type="text/css">
  html { height: 100% }
  body { height: 100%; margin: 0; padding: 0 }
  #map_canvas { height: 100% }
 </style>
 <script type="text/javascript" 
  src="http://maps.googleapis.com/maps/api/js?key=AIzaSyBFoJjPtS9vWXIENOa-egd0XFFnnQbfTIk&sensor=false&libraries=geometry">
 </script>

<script type="text/javascript">
//<![CDATA[


// convert text from the tracker data to a JSON object and
// pull out deeply-nested data elements

function extract_gps_data(trackerdata)  {
    var points = new Array();

    var track_data = JSON.parse(trackerdata);
    var messages = track_data['response']['feedMessageResponse']['messages']['message'];

    for (var i = 0 ; i < track_data['response']['feedMessageResponse']['count'] ; i++)  {
        var timestamp = messages[i]['dateTime'];
        var latitude = messages[i]['latitude'];
        var longitude = messages[i]['longitude'];
        var point_holder = new point(timestamp, latitude, longitude);
        points.push(point_holder);
    }
    return points;
}


// "point" is an object we use to hold the data we'll be putting on the map

function point(timestamp, latitude, longitude)  {
    this.timestamp = timestamp;
    this.latitude = latitude;
    this.longitude = longitude;
}

function get_track(url)  {
    var request = new XMLHttpRequest();
    request.open("GET", url, false);
    request.send();
    return request.response;
}

function makeinfobox(pointnum, thispoint, theotherpoint)  {
    var latlnga, latlngb; 
    var distance;
    var infoboxtext;
    var timestamp;
    
    timestamp = new Date(thispoint.timestamp); // we convert it from ISO format to something more readable
    infoboxtext = String(timestamp);
    if (pointnum > 0 && theotherpoint)  {  // skip the first point, and any point without a following reading
        latlnga = new google.maps.LatLng(thispoint.latitude, thispoint.longitude);
        latlngb = new google.maps.LatLng(theotherpoint.latitude, theotherpoint.longitude);
        distance = google.maps.geometry.spherical.computeDistanceBetween(latlnga, latlngb) / 1609.344; // convert meters to miles
        infoboxtext = infoboxtext + "<br />" + distance.toFixed(2) + " miles";
    } 
    return infoboxtext; 
}

function initialize()  {
    var points;
    var url = "spot_track.json";
    var trackline = new Array();

    var trackerdata = get_track(url);
    points = extract_gps_data(trackerdata);

    var spot = new google.maps.LatLng(points[0].latitude, points[0].longitude);
    var my_options = {
        center: spot,
        zoom: 12,
        mapTypeId: google.maps.MapTypeId.ROADMAP
    };
    var map = new google.maps.Map(document.getElementById("map_canvas"), my_options);
    for (var i = 0 ; i < points.length ; i++ )  {
        var spot = new google.maps.LatLng(points[i].latitude, points[i].longitude);
  // here we create the text that is displayed when we click on a marker
        var windowtext = makeinfobox(i, points[i], points[i+1]);  
        var marker = new google.maps.Marker( {
            position: spot, 
            map: map,
            title: points[i].timestamp,
            html: windowtext
        } );
  // instantiate the infowindow
  
        var infowindow = new google.maps.InfoWindow( {
        } );

  // when you click on a marker, pop up an info window
        google.maps.event.addListener(marker, 'click', function() {
            infowindow.setContent(this.html);
            infowindow.open(map, this);
        });

  // set up the array from which we'll draw a line connecting the readings
        trackline.push(spot);
    }  
 
 // here's where we actually draw the path 
    var trackpath = new google.maps.Polyline( {
        path: trackline,
        strokeColor: "#FF00FF",
        strokeWeight: 3
    } );
    trackpath.setMap(map);
}

//]]>

</script>
</head>

<body onload="initialize()">

<div id="map_canvas" style="width:100%; height:100%"></div>

</body>
</html>
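One thing the code above takes on faith is that 'message' is always an array and that 'count' always matches its length. I haven't confirmed how the feed behaves when there is only one message or when the request fails, so treat the following as an assumption-driven sketch: a variant of extract_gps_data() (here called extract_gps_data_safe(), a made-up name) that tolerates a lone message object and a response with no messages at all.

```javascript
// Same "point" holder as in the tracker page
function point(timestamp, latitude, longitude) {
    this.timestamp = timestamp;
    this.latitude = latitude;
    this.longitude = longitude;
}

function extract_gps_data_safe(trackerdata) {
    var track_data = JSON.parse(trackerdata);
    var feed = track_data.response && track_data.response.feedMessageResponse;
    if (!feed || !feed.messages || !feed.messages.message) {
        return [];   // nothing to plot (assumed shape of an empty or error response)
    }
    // [].concat() wraps a lone message object in an array and copies a real array
    var messages = [].concat(feed.messages.message);
    return messages.map(function (m) {
        return new point(m.dateTime, m.latitude, m.longitude);
    });
}

// A hand-built feed with a single message that is not wrapped in an array
var sample = JSON.stringify({
    response: { feedMessageResponse: { count: 1, messages: { message: {
        dateTime: "2014-12-19T07:34:35+0000", latitude: 64.82709, longitude: -148.99679
    } } } }
});
console.log(extract_gps_data_safe(sample).length);              // 1
console.log(extract_gps_data_safe('{"response": {}}').length);  // 0
```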

Catching up with the SPOT API

2014-12-20T16:46:00.001-09:00

Over the holiday I'm planning on putting out an update of the sample code that uses the SPOT API, both to work with the new format and to work with the JSON representation rather than the XML. In the meantime, here's a sample JSON element representing a single SPOT tracker message:

{
    "@clientUnixTime": "0",
    "batteryState": "GOOD",
    "dateTime": "2014-12-19T07:34:35+0000",
    "hidden": 0,
    "id": 349242506,
    "latitude": 64.82709,
    "longitude": -148.99679,
    "messageType": "TRACK",
    "messengerId": "0-8283550",
    "messengerName": "Melinda's tracker",
    "modelId": "SPOT2",
    "showCustomMsg": "Y",
    "unixTime": 1418974475
}
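A note on the two timestamps in that message: 'dateTime' uses a "+0000" offset with no colon, which is not quite the date-time string format the ECMAScript specification describes (that uses "Z" or "+00:00"), so how new Date() treats it can vary between JavaScript engines. A small sketch that sidesteps the question by using 'unixTime' instead; messageTime() is a made-up helper name.

```javascript
// unixTime is in seconds since the epoch; JavaScript Dates count milliseconds
function messageTime(message) {
    return new Date(message.unixTime * 1000);
}

var message = { dateTime: "2014-12-19T07:34:35+0000", unixTime: 1418974475 };
console.log(messageTime(message).toISOString());   // 2014-12-19T07:34:35.000Z
```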

The (minimal) SPOT API documentation is available online here. More later!

2011 Quest runtimes

2014-11-24T15:17:00.000-09:00

Following up on my previous post on 2013 Quest statistics, I've gone back, cleaned up my 2011 data, and run some numbers on that. What I found was pretty consistent with the 2013 numbers, except that there were a few surprises in the numbers that actually helped tell the story of the race.

If you've been following the Yukon Quest for a while you may remember that 2011 was pretty harrowing, with some bad weather in the middle and second half of the race that caused some very serious problems for those in the front of the pack. Brent Sass recorded some particularly memorable video as he helped a nearly-hypothermic Hans Gatt off American Summit:

and Sebastian Schnuelle's ice-caked boots in Central told a story, as well:

What I did with the 2011 data was virtually identical to what I did with the 2013 data: For each checkpoint I took a look at runtimes to get a general picture of what happened on that leg of the race. Then I collected those runtimes into a table, where I ran correlations of runtimes for each race segment against the finishing position.

The unsurprising part was that race segment distance correlated quite well with overall finishing position, with the longest race segment (Dawson to Eagle) showing a very, very large positive correlation with finishing position, at .8537. So again, this year look for longer race segments to have greater predictive value for finishing position.

The surprising bit, at least initially, was that there were four checkpoints at which runtime was inversely correlated with finishing position (in one case, strongly so). They were Slavens (-0.1755), Circle (-0.4027), Central (-0.2543), and 101 (-0.0877). At well over half-way into the race, the faster teams had pulled up towards the front, and this is where they ran into awful conditions, which slowed them down. By the time the back of the pack arrived the overflow had frozen and the weather had moderated. So, they were able to travel faster.

Nevertheless, the correlation between traveling speed over longer segments and overall finishing position remained reasonably strong, with an r value of 0.4527. Here's the plot:

If you're interested in looking at the data and playing with them yourself, they're here, with the correlations on the very last sheet. Unfortunately Trackleaders hadn't added the replay feature to their tracker at that point, but the race track is online here and looking at individual musher pages can help illustrate some of what happened (the closer together the breadcrumbs, the slower the team was moving [assuming that the breadcrumbs were being uploaded at regular intervals]).

Also, note that this analysis lacks anything resembling rigor, and it includes a questionable choice of metrics for correlation, among other things. But, for a casual description of how the race played out and how that's reflected in the race statistics, I think it's adequate. Let me know if you spot a problem, or if you've got further questions.

A look at 2013 Quest runtimes

2014-10-14T14:39:00.000-08:00

It's looking like this year's Yukon Quest has a pretty good field of entries, and with fall training well underway in interior Alaska we're all starting to speculate about how the race is going to go this year. It's only natural to look at past races, so I've started poking at the 2013 data, the last year the race was run in the Whitehorse-to-Fairbanks direction. I'm also interested in having a baseline set of data to which this year's race can be compared, once it's underway.

So, I've taken my own spreadsheets from 2013 and used them as a basis for running some numbers. In particular, I've created a spreadsheet containing 2013 checkpoint runtime data and extracted runtime summary data to get some basic descriptive statistics: fastest, slowest, mean, median, 1st quartile, and 3rd quartile for each race segment (between checkpoints). For example, between Braeburn and Carmacks, I've got a table that looks like this:

You'll note that I've also calculated the ratio between the fastest and slowest runtimes; there may be something interesting there to look at later. I've also plotted all runtimes as a histogram:

Again, this is largely to create a baseline dataset for comparison with this year.
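For anyone who would rather compute the summary statistics in code than in a spreadsheet, here's a minimal sketch. The runtimes are made up (they are not the Braeburn-to-Carmacks numbers), the function names are hypothetical, and the quartiles use linear interpolation between ranks, which may differ slightly from what a given spreadsheet function reports.

```javascript
// Quantile of an already-sorted array, interpolating linearly between ranks
function quantile(sorted, q) {
    var pos = (sorted.length - 1) * q;
    var lower = Math.floor(pos);
    var upper = Math.ceil(pos);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Basic descriptive statistics for one race segment
function summarize(runtimes) {
    var sorted = runtimes.slice().sort(function (a, b) { return a - b; });
    var total = sorted.reduce(function (sum, x) { return sum + x; }, 0);
    return {
        fastest: sorted[0],
        slowest: sorted[sorted.length - 1],
        mean: total / sorted.length,
        median: quantile(sorted, 0.5),
        q1: quantile(sorted, 0.25),
        q3: quantile(sorted, 0.75),
        ratio: sorted[sorted.length - 1] / sorted[0]   // slowest relative to fastest
    };
}

// Made-up runtimes in hours for a single segment
var stats = summarize([12.0, 9.5, 19.0, 10.5, 11.0, 14.0, 10.0, 13.0]);
console.log(stats);
```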

However, I also ran some correlations, and while the results are obvious if you think about them for a few seconds, I haven't ever seen anybody say so explicitly:

The ranking of runtimes on longer race segments (more miles between checkpoints) tends to correlate more strongly with final standings, at least in this data set. That is to say, people who had the faster runtimes between checkpoints which are far apart tended to finish better than people with slower runtimes. Some of this is tautological (the longer runs are a greater percentage of the total race), some of it might (I haven't looked at this) be because on long runs everybody has to camp, even people who would otherwise prefer to rest at checkpoints, or because longer runs smooth out the variability you might see in shorter runs (law of large numbers, sort of).

Here's a plot of the correlations between segment runtime and finishing position, against segment distance.

The correlations are on the "correlations" worksheet in the spreadsheet. I'm using a standard Pearson product-moment correlation coefficient (r), which is not the best test for a complete dataset but is adequate for exploring these data. I'm not posting the numbers here in the interest of not having reader eyes glaze over, but definitely feel free to visit the spreadsheet, poke through the data, copy the spreadsheets, and ask questions.
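If you'd like to reproduce a correlation outside the spreadsheet, the Pearson coefficient is short enough to write out. This is a sketch with made-up numbers (not the Quest data), and pearson() is just an illustrative function name.

```javascript
// Pearson product-moment correlation coefficient (r) for two equal-length arrays
function pearson(x, y) {
    var n = x.length;
    var meanX = 0, meanY = 0;
    for (var i = 0; i < n; i++) {
        meanX += x[i] / n;
        meanY += y[i] / n;
    }
    var sxy = 0, sxx = 0, syy = 0;
    for (var j = 0; j < n; j++) {
        var dx = x[j] - meanX;
        var dy = y[j] - meanY;
        sxy += dx * dy;   // co-variation
        sxx += dx * dx;   // variation in x
        syy += dy * dy;   // variation in y
    }
    return sxy / Math.sqrt(sxx * syy);
}

// Made-up segment runtimes (hours) and the finishing positions of the same teams
var runtimes = [20.5, 21.0, 22.5, 21.5, 24.0, 26.0];
var positions = [1, 2, 3, 4, 5, 6];
console.log(pearson(runtimes, positions).toFixed(4));
```

A value near +1 means slower runtimes on the segment went with worse finishing positions; a value near 0 means the segment told you little about the final standings.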

I'm planning on doing something similar with 2011 data, as well as years run in the opposite direction, to see how well what we're seeing in 2013 holds up. Unfortunately getting the data into the spreadsheet and clean enough to use is pretty labor-intensive, so it might be some while before a follow-up to this post. But, once the data are in a spreadsheet there's a lot we can do with them, so there's incentive to do it beyond answering just these questions.

Here's looking forward to a great winter of distance dogsled racing!

New (ish?) event tracking software

2014-09-01T21:02:00.000-08:00

It looks like there's a new GPS-based event tracking application, RaceBeacon. From what I can glean from their website it looks like they're consolidating GPS feeds from individual participants' personal smart phones. This is probably reasonable for shorter (as in, very short) informal events, and may provide a solid basis for building out a more robust platform with more features in the future. As much as I'm a fan of Trackleaders (and I'm a huge fan, for reasons I'll go into in the next paragraph), it's always great to see some competition in this space.

That said, just showing locations on a map is not that interesting, particularly for events with staggered starts (like sprint races). A map alone can show you where teams are in relation to each other but it doesn't capture the dynamic nature of racing - the stuff that makes racing exciting. Prominent among the reasons that I like Trackleaders is that they really are both data guys and competitive cyclists, and they're interested in showing the story of a race as it unfolds. Also, because they're data guys and computer scientists they've already dealt with some relatively difficult problems, for example calculating with some degree of precision the race mile at which a given team is currently located (harder than you'd think). In this case, for sprint mushing races, the problem RaceBeacon is facing is how to demonstrate the relationship between two different teams' performances when the entire race is run in 20 minutes without stopping and the teams started 2 (or 4 or 18) minutes apart. But, if you're not committed to using another tracking system and you're putting on a sprint event (and you can count on most or all of your teams carrying smart phones with data plans and having their batteries fully charged and using a platform supported by RaceBeacon), this could be very interesting to experiment with.

If you've used them for tracking your event, how did it go? Who's planning on using them this fall or winter?

Packable beer

2014-05-31T20:32:00.000-08:00

Recently there's been some discussion of instant beer for backpacking or other backcountry travel. After all, beer is almost entirely water and water's really heavy, so if the water can be eliminated and added later, the problem is basically solved (well, almost).

This sounded promising but turned out to be a hoax, but in the meantime Pat's Backcountry Beverages has developed the real thing - a beer concentrate and a convenient technology for rehydrating it and adding the fizzy back. Note that they also have concentrates for various soft drinks including colas, lemon-lime drinks, ginger ale, and others (and you can actually probably use it to carbonate nearly anything). In the interest of Science we decided to use empirical methods to test the manufacturer's claims.

It's a 3-part system, consisting of a very lightweight easy-to-pack plastic carbonator:

the "activator" (a mix of citric acid and sodium bicarbonate, which comes in convenient small packets):

and the drink concentrate:

(a fun fact about the concentrate - yup, that's 58% alcohol):

Also, your basic low bar:

Anyway, the lid of the carbonator has a lever that's used to pump air in and pressurize the container, and making the brew is a quite straightforward process of pressurizing the device, releasing the pressure, and repeating that cycle for about two minutes. When you're done you've got something that looks like this:

which is a not-bad head on the beer. Poured into pint glasses we have:

There are two beers available, a "Pale Rail" and a "Black Hops." This is the Black Hops. It smells very malty and a little sweet, but the sweetness doesn't come through in the flavor. I'm pleased with the flavor (a little bitter) and find it very drinkable. Chris is German and therefore has profound beer expertise, and she thought it would be better much cooler (we used plain tap water) but was otherwise quite good.

Here's the bad news: while the purchase cost was high but not unreasonable, the shipping costs to Alaska were nuts. As in, about $30 just for shipping the carbonator, activator, and brew. It's only available from one online vendor and there doesn't seem to be anybody selling it locally. (The good news: market opportunity!). The carbonator itself is about $40 (but it's reusable and seems durable), the activator is about $0.50/packet, the beer is about $2.50 for a packet to make a pint, and the soda is a bit under $1.50 for a packet to make a pint.

My one reservation is that because it's a liquid concentrate it's likely to freeze at low temperatures, but otherwise I'm very pleased. I'll be ordering the Pale Rail and some of their sodas, I think - I do think this is a pretty nifty gizmo, and very highly packable.

Hey, does anybody know if alcohol kills giardia?

Iditarod and software development

2014-03-08T15:05:00.000-09:00

As I posted on my Facebook page, Iditarod has removed my ability to post to their Facebook page. The proximate cause appears to have been my suggestion that they hire a tech support professional. I accept that I've been a little obnoxious. I don't think, however, that I'm wrong.

I don't think Iditarod realizes yet that they're now in the commercial software business. They wrote some software, they're selling it, and that's kind of that. I don't think they had any idea at all what they were getting into, and I've been trying to figure out why they did it in the first place. I think part of the issue is that Trackleaders' user interface really does look dated, even if they've got the best functionality in the event tracking business. Iditarod wants their stuff to be "branded." (They also want to charge a lot of money for it; Trackleaders is committed to tracking being free to fans). I think a better outcome would have been to work with Trackleaders on figuring out how to "skin" the Trackleaders app to develop a distinctive look.

But, that's not what they did. They decided to write their own software. I can't imagine they did it to save money, since programmers are really pretty expensive. Better ones make upwards of $90,000/year, plus benefits; really good ones make big piles of cash. On the other hand there are web sites for jobbing out work to what are really very good programmers in places like Pakistan and Bangladesh, and those folks are quite inexpensive (at the cost of some reliability issues due to both infrastructure and political stability - a friend hired some developers in Pakistan and then Bhutto was assassinated a few days later, which, among other things, pushed deadlines back).

What Stan and the other fun folks in that office might not have realized is that you never finish software, never. You release it, but there's always more work to do, bugs to fix, features to add, underlying technology changes to adapt to, and so on. For example, when Google changed their maps API, it put one of my favorite Alaska GIS websites out of business, because there was no money to hire programmers to adapt their software to the new interface. So, when the ITC decided to develop their own software, they decided to commit money to it, year after year after year after year.

Something else they apparently didn't realize is that software is not ready for release when the developer says "it works for me." The developer knows how it works and naive programmers only test happy path application use. Once the software is released into the wild, particularly if it's a web app, it's going to be run in a variety of platform and browser environments, users are going to try to do things you could not possibly have imagined, and so on. Bringing in a test professional to bang on your application is going to turn up problems before the software is released, giving you an opportunity to fix bugs and head off support issues. A typical test environment has a bunch of different operating systems running in clean VMs (virtual machines), with as many browsers (and versions of browsers) as they can possibly get their hands on. I'm still boggled that Iditarod developers apparently didn't test their stuff on Internet Explorer, which may be losing market share but is still the second most-widely used browser after Chrome (see here for browser stats). Test and QA professionals are in high demand and well-compensated for a very good reason - over the long run they save a project money, reputation, and headaches.

So here we are, with Iditarod having developed their own software and not having tested it before releasing it. Now what? Well, this is where having tech support people makes a huge, huge difference. For starters, actually knowing something about the technology is kind of a time-saver when trying to solve a technical problem. In library school, people who are planning on a career providing reference services are taught a skill called "negotiating the question," where they're taught techniques for finding out what a library patron's real question is when they come in with something vague or somewhat oblique to what they really want to find out. Tech support people do the same thing. They find out how to reproduce a problem and they know how to describe it to developers to find a solution if it's not something they can figure out themselves. They recognize the difference between a user error and a real bug.

But that's not what Iditarod has. They hired some social media people, whose job is to say "Watch this fantastic video! Then buy things." They may be able to navigate Facebook and Twitter extremely well, but those are different skills from being able to sort out technical problems. And so it is that Iditarod's social media people were answering questions from someone who was unable to watch videos because she clearly wasn't logged in. They told her "Reboot your computer" (I'm only sorry I never had an opportunity to ask them how they thought that would help). That led to a situation in which the social media people were flustered and frustrated and the user with the unsolved question was pissed off. It's not good for anybody.

So, perfect storm of really bad decisions on ITC's part. I don't expect Iditarod to fix this situation, because Iditarod doesn't fix these things and it's completely consistent with past performance (here's something I wrote about this almost exactly two years ago, when the handwriting was already on the wall with regard to IonEarth's long-term viability). It's a tough situation for those putzes who blocked me from posting on Iditarod's wall, and it's a tough situation for fans. I don't expect it to improve, at least not any time soon.

I really don't like the Iditarod organization, in case that's in any way unclear (!). But when you get past all the stupid decisions, the commercialism, the minstrel show aspects of some of what goes on, that they really do not put dogs first, it's still a 1000-mile race with superb mushers and incredible athlete dogs. Fortunately, as was foreseeable a few years ago, better and better photos, videos, and coverage are coming from free sources (in addition to the Anchorage Daily News and KNOM, KUAC started sending Emily Schwing out on the Iditarod trail two years ago, and this year Alaska Dispatch has really upped their coverage). The only thing that Iditarod provides that isn't available elsewhere is the tracking, and at this point it's nearly worthless, anyway. I am cheering for friends running the race and wishing them the best. The ITC, well, whatever.

This one's for the mapping nerds

2014-03-03T13:37:00.003-09:00

Map enthusiasts (and that would be nearly all of us of a nerdly disposition) should know about a really nice, free mapping service built on top of Google Maps. Gmap4 includes not just road maps and satellite images, but also topo maps for the US and Canada, satellite maps, Open Street Maps, and a wide variety of tools, plus what's basically a RESTful API that allows you to integrate your own data without having to create an account or reveal personally identifiable information (PII). Bravo to the author, and enjoy!

Gmap4 is online here.

Good morning, yo

2014-03-02T20:11:00.000-09:00

I'm in totally the wrong timezone for following Iditarod. I went to bed before the start (sorry, all, I just can't bring myself to call it a "restart") and woke up at 4am GMT to a lot of complaints about the new tracker. But I wasn't interested in the complaints as much as I was in looking to see what friends on the trail were up to, so I opened the tracker myself.

With Trackleaders, the first thing I'd do in the morning was open the race flow chart and get a quick picture of who was moving the fastest, who was resting, who was passing whom, and so on. What I get with the Iditarod tracker is dots on a map. This morning I can infer who went out fast by what order the bib numbers are in on the trail (because they left the start in order), but that's not going to last for very much longer. But nevermind that, how are my friends doing?

To find that out, I had to open up the so-called "leaderboard" and chew up more screen space, then select someone from the list. Because the default sort is by bib number and it's a huge list, to find Mike I needed to click to sort by musher name, then select his name. A box pops up with basic information, and it COVERS UP THE DOTS ON THE MAP (yes, I'm shouting).

That is to say, the dots on the map aren't even visible. When I drag the screen around to uncover the dots on the map, is Mike highlighted? No, he's not, so I cannot even tell with a quick look where he is in relation to everybody else. So, I go back to the "leaderboard," sort it by name, and get his trail mile. Since trail miles are not displayed on the map, I need to find him in relation to the pack, which means that now I've got to sort the "leaderboard" by trail mile, which gives me some sense of how to find him (by finding other teams around him, which means looking for their bib numbers - I know, right?).

Mike's running last, so he was easy to find by dragging the map around some more. But here's an exercise for those following along at home: find Dan Kaduce. I'm waiting ...

Find him? How much clicking and dragging did you need to do to do that? If you'll recall, with Trackleaders all you needed to do was hover over his name and his dot-on-a-map would bounce, making him very easy to find, indeed. So basically, at this point in the race the tracker isn't carrying very much information. I think some of this is deliberate (they really don't want to make it easy for you to spot teams in trouble) but much of it is just lack of understanding of design issues and the software development process.

A digression: I just tried looking at mouseover pop-ups to see if that's a little easier, once you've got trail mile. It is, but the trail miles are wrong - they're showing a few people further down the trail at lower trail miles - see the screenshot showing Kristy Berington and, uh, somebody whose name is covered up by Kristy's pop-up:

Kristy is shown as being at race mile 36, yet she's ahead of someone who's shown as being at race mile 37.

Iditarod have made some very, very basic user interface mistakes, but because it's their software, they're the ones paying directly to fix every bug, correct every user interface error in the design, answer every user complaint, and so on. I am pretty sure they had absolutely no idea what they were getting into when they made the decision not to use someone else's software.

And one last screenshot before going back to bed, because if nothing else I'm grateful to Iditarod for such a clear demonstration that the development of production software should not be left to hobbyists:

Start order and speed impacts, 2013

2014-03-01T11:01:00.001-09:00

I've just run some numbers on the 2013 Iditarod, from the start to the first checkpoint (Yentna), with an eye towards getting a handle on the relationship between bib number (i.e. start order) and speed. The assumption is that because the trail gets torn up by everybody who passes over it, the teams with higher bib numbers will be traveling more slowly over the first section of trail. I looked at this in the Copper Basin in this blog post, and found that there was, in fact, a negative correlation (later teams traveled slower), albeit a fairly loose one.

So, I took a look at last year's Iditarod. The first thing I did was to plot speed against bib number:

If there's a relationship it certainly did not pop out as clearly as it did in the Copper Basin. So, I ran the actual numbers. Using the R statistical package, I ran a Kendall's rank correlation tau, which has the advantage of not making assumptions about the underlying distribution (for example, that the underlying data have a "normal" distribution). In a nutshell, nope, there does not appear to be a relationship between bib number and speed in these particular data, or at least not one that cannot be explained as the result of random fluctuation. Specifically, the results of the run are:

 Kendall's rank correlation tau

data:  y$Bib and y$speed 
z = -1.8238, p-value = 0.06818
alternative hypothesis: true tau is not equal to 0 
sample estimates:
       tau 
-0.1557844

So, with a p value of 0.06818, it does not meet a .05 significance level criterion.
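
For readers who would rather poke at this outside of R, here is a minimal Python sketch of the same kind of test. It is not the code used for the run above: the bib numbers and speeds are made up for illustration, the function names are my own, and the z score uses the plain normal approximation without the tie corrections that a full statistics package applies.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Return (tau_b, s), where s = concordant pairs - discordant pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    denom = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    s = concordant - discordant
    return s / denom, s


def z_score(s, n):
    """Normal approximation for s, ignoring tie corrections (an assumption)."""
    return s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: bib numbers and speeds (mph) to the first checkpoint.
bibs = [1, 2, 3, 4, 5, 6, 7, 8]
speeds = [9.1, 8.7, 9.4, 8.2, 8.5, 7.9, 8.0, 7.6]

tau, s = kendall_tau_b(bibs, speeds)
p_value = two_sided_p(z_score(s, len(bibs)))
print(f"tau = {tau:.4f}, p = {p_value:.4f}")
```

As a sanity check, feeding the z value from the R output into two_sided_p(-1.8238) gives about 0.068, which agrees with the p-value reported above.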

It doesn't appear likely that distance makes a difference. You'd reasonably expect that a shorter trail would see greater impacts, but there's not that much difference in distance between the start and first checkpoints in the two races (42 miles in Iditarod, 50 miles in Copper Basin). So, there's still a bunch of work to do. It may have something to do with trail differences, or weather, or ... ? I'll be taking a look at the 2014 numbers after the last team is into Yentna this weekend, but over the longer term I'd like to aggregate a bunch of years and see what falls out.

The Iditarod track file

2014-02-28T21:42:00.000-09:00

With Melinda away in London, a quick post from me, Chris. Over the last weeks, I've been asked whether I could provide an Iditarod track file and calculate distances between the checkpoints directly from it. (I do this sort of thing in my work all the time, for science.) I thought this sounded like a good idea. In practice, it was a little more involved than I expected. So here's a little bit of information about track files, and you can download some at the end of this post.
First let's clarify "track file". This is a non-specialist term for a file that is used to put a track on a map, with waypoints (for example checkpoints) and optional information (names, even images) enclosed. In geospatial jargon such files are called "vector files", which simply means that they contain collections of simplified real-life entities that can be represented as basic geometric objects: points, lines and polygons (the area enclosed by closed rings of lines, such as triangles, rectangles, or irregular shapes). For example, each tree in the forest outside my window could be represented as a point, whereas the area of the forest would correspond to a polygon -- or maybe a multipolygon (several non-overlapping polygons) if the forest consists of multiple wooded islands. The general term for these things is "feature", and the most common feature types are point, linestring (sequences of points that make up a line), polygon (sequences of lines that enclose an area), and the multi- versions of each (multipoint, multilinestring, multipolygon). Vector files contain the coordinates that define each feature. (Oh, right, we also need a coordinate system -- there are many, latitude/longitude on an approximate Earth ellipsoid being very common. Mapping is a surprisingly complicated topic that can, luckily, be left to the software we use, most of the time.)
With this rough understanding of what kind of file we're dealing with, what types of track file could we get for the Iditarod 2014? Well, the specific type depends on the file's purpose:

for science and map making, the data is usually stored in what amounts to little databases that combine both the geographic coordinates of the features and a table -- sometimes a large table -- of information about them. The most common are: ESRI Shapefile (a proprietary binary format, but with an open specification), GeoJSON (which is human-readable), or formats requiring full-blown database software (PostGIS, Spatialite...). These files require quite specialized software to work with.
for GPS tracking, a variety of text-based formats are available, the easiest to use being GPX.
for consumer-accessible web mapping, mostly under the influence of Google's Maps and Earth products, Google's Keyhole Markup Language (KML) format has become widespread (and KMZ, which is just KML + some extra overlay resources, zipped together).

KML is similar to the first category, but is not really made for storing a lot of feature attributes in a standard way. It also contains a lot of extra information related to the presentation -- the colour of the lines, links to little icons, the order in which the various feature layers should be displayed. GeoJSON often does for open-source web mapping what KML does for Google Maps, but is also a nice alternative to shapefiles.
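
To make the idea of features a little more concrete, here is a small Python sketch of what a GeoJSON file with one point feature and one linestring feature looks like, and how it can be read with nothing but the standard library. The coordinates and property names are placeholders for illustration, not values taken from the real track file. Note that GeoJSON lists coordinates as longitude first, then latitude.

```python
import json

# A tiny, made-up GeoJSON document: one checkpoint (Point) and one
# route segment (LineString). Coordinates are [longitude, latitude].
GEOJSON_TEXT = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Willow", "kind": "checkpoint"},
      "geometry": {"type": "Point", "coordinates": [-150.0, 61.7]}
    },
    {
      "type": "Feature",
      "properties": {"name": "Willow to Yentna", "kind": "route"},
      "geometry": {
        "type": "LineString",
        "coordinates": [[-150.0, 61.7], [-150.3, 61.8], [-150.6, 61.9]]
      }
    }
  ]
}
"""


def summarize(collection):
    """Return (name, geometry type, number of points) for each feature."""
    rows = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        coords = geometry["coordinates"]
        # A Point holds a single coordinate pair; a LineString holds a list.
        count = 1 if geometry["type"] == "Point" else len(coords)
        rows.append((feature["properties"]["name"], geometry["type"], count))
    return rows


summary = summarize(json.loads(GEOJSON_TEXT))
for name, kind, count in summary:
    print(f"{name}: {kind} with {count} point(s)")
```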

As for the Iditarod trail, a simple web search shows that KML files are widely available. We want one with the track for the northern route, that is, an even-numbered year. Well, here is one for 2008. If you download the file and have Google Earth installed, it will open directly. But if you look inside, it turns out that the location data isn't actually contained in the file, but in a different one that is imported via the web. This is an example of how Google KML files are just a lot more flexible -- or messy -- than file formats made for professional or scientific applications. But we have tools to transform one format into others. Without going into any detail, the most powerful ones are a set of command-line tools distributed with the Geospatial Data Abstraction Library (GDAL) and, for making GPX files (and putting the geospatial data files on a map), the online GPSVisualizer (which uses GPSBabel).

With these two, and a bit of knowledge, I extracted the track and checkpoint information and converted it:

... to a set of ESRI Shapefiles (zipped archive) in Latitude/Longitude coordinates [1], for the routes and the checkpoints separately (shapefiles can only contain one type of feature, so I had to separate the checkpoints (points) from the route segments (linestrings); [2])
... to a GeoJSON file containing both routes and checkpoints (this is an advantage of GeoJSON over ESRI Shapefiles)
... to a single GPX file, also containing both routes and checkpoints

You can right-click, save, and play around with the files. What can you do with them?

The GPX file opens in Garmin's no-cost Basecamp software (or the older Mapsource), which you can use to push it to a Garmin GPS device. I would expect the same works in other GPS software. (Maybe I get this post out early enough for some people who work on the trail to try it out!)
The GeoJSON file can just be opened in a text editor and read. It is a much cleaner version of the KML file.

The shapefiles are suitable for use in mapping (GIS) software. Here is a little map of both the Iditarod (northern route) and Yukon Quest trail, which I made from these shapefiles (and similar ones for the Yukon Quest) using free mapping data from Natural Earth and the Iditarod and the free GIS software called uDig.

Last, the shapefiles or the GeoJSON file can serve to calculate distances between the checkpoints. For programmers, here is a tutorial I wrote on how to do this. For everyone else, I'll just copy and paste the result. Note that it starts in Willow.

Willow --> Yentna
  Distance in km: 50.7
  Distance in miles: 31.7
Yentna --> Skwentna
  Distance in km: 47.7
  Distance in miles: 29.8
Skwentna --> Finger Lake
  Distance in km: 56.4
  Distance in miles: 35.2
Finger Lake --> Rainy Pass
  Distance in km: 40.1
  Distance in miles: 25.0
Rainy Pass --> Rohn
  Distance in km: 52.2
  Distance in miles: 32.6
Rohn --> Nikolai
  Distance in km: 103.8
  Distance in miles: 64.9
Nikolai --> McGrath
  Distance in km: 82.0
  Distance in miles: 51.3
McGrath --> Takotna
  Distance in km: 26.2
  Distance in miles: 16.4
Takotna --> Ophir
  Distance in km: 32.7
  Distance in miles: 20.4
Ophir --> Cripple
  Distance in km: 113.4
  Distance in miles: 70.9
Cripple --> Ruby
  Distance in km: 109.7
  Distance in miles: 68.5
Ruby --> Galena
  Distance in km: 84.8
  Distance in miles: 53.0
Galena --> Nulato
  Distance in km: 76.1
  Distance in miles: 47.6
Nulato --> Kaltag
  Distance in km: 56.3
  Distance in miles: 35.2
Kaltag --> Unalakeet
  Distance in km: 115.9
  Distance in miles: 72.4
Unalakeet --> Shaktoolik
  Distance in km: 74.4
  Distance in miles: 46.5
Shaktoolik --> Koyuk
  Distance in km: 64.9
  Distance in miles: 40.5
Koyuk --> Elim
  Distance in km: 71.5
  Distance in miles: 44.7
Elim --> Golovin
  Distance in km: 42.7
  Distance in miles: 26.7
Golovin --> White Mountains
  Distance in km: 24.7
  Distance in miles: 15.4
White Mountains --> Safety
  Distance in km: 78.1
  Distance in miles: 48.8
Safety --> Nome
  Distance in km: 34.5
  Distance in miles: 21.6

Total distance: 1438.8 km -- 899.3 miles

Okay! Well, Unalakleet is misspelled -- because it was misspelled in the original KML. (I fixed it in the GPX and GeoJSON files.) Second, the total distance comes out a little low, but if you add an extra 5% or so (because there are only 771 route points for the whole track, so curves get cut off), it looks pretty good. Last, closer inspection shows that the very first leg Willow-Yentna has a very long straight line between the very first two points and therefore is particularly underestimated. So if you are interested in these distances, don't trust blindly, compare with what the Iditarod Trail Committee says, but go ahead and use them any way you want.
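
For the curious, the basic idea behind these numbers can be sketched in a few lines of Python: add up the great-circle distance between each pair of consecutive route points. This is not the code from the tutorial, only an illustration under simple assumptions (a spherical Earth with a mean radius of 6371 km, made-up test points, function names of my own choosing).

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; a spherical approximation
KM_PER_MILE = 1.609344     # exact definition of the international mile


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_length_km(points):
    """Sum the segment lengths of a track given as (lon, lat) tuples."""
    return sum(
        haversine_km(lon1, lat1, lon2, lat2)
        for (lon1, lat1), (lon2, lat2) in zip(points, points[1:])
    )


# Made-up points: the same start and end, with and without a corner point.
detailed = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
coarse = [(0.0, 0.0), (1.0, 1.0)]

print(f"detailed: {track_length_km(detailed):.1f} km")
print(f"coarse:   {track_length_km(coarse):.1f} km")
print(f"detailed in miles: {track_length_km(detailed) / KM_PER_MILE:.1f}")
```

The coarse track comes out shorter than the detailed one, which is the corner-cutting effect described above: the fewer route points there are, the more the total is underestimated. Also note that the mile figures in the listing appear to have been converted with a rounded factor of 1.6 km per mile (50.7 / 1.6 is about 31.7), so they run slightly high relative to the kilometre values; the sketch uses 1.609344.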

[1] For advanced users, a set of files in Alaska Albers projection is also available -- this is better if you want to use it to measure distances, as the coordinates are in metres in a map projection that works well for Alaska (not much distortion).
[2] Also, if you unzip the archive, you will find more than just two files: a shapefile actually requires three or more related files -- the one ending in SHP is the actual shapefile with the geolocation information, the one ending in DBF contains the attribute table and the one ending in PRJ contains coordinate system and map projection; then there are indices (SHX, ...) etc. etc.

John Schandelmeier's ADN piece on trackers

2014-02-27T14:56:00.000-09:00

I'm in London doing work-y things (workshop on strengthening the internet against pervasive surveillance, Internet Engineering Task Force meeting). It's a long trek from Alaska, and while I was in transit John Schandelmeier published an article in the Anchorage Daily News questioning the value of GPS tracking in dogsled racing. I actually agree with him substantially but think he's really not addressing a few things that matter a lot.

John is not the first racer I've heard or read saying things along these lines. I expect that it is incredibly annoying to be on the trail and away from people, the world, etc., but to see a red light blinking at you hour after hour after hour after hour. In addition to a general sense of being unable to disengage from the clutter, one thing of which I have absolutely no doubt is that some number of people carrying trackers on their sleds feel like they're under surveillance.

I also think it's an open question what value they bring to the races. It's certainly less of a question in the case of Iditarod, since they seem to be making a profit on tracker subscriptions (I'd also argue that there are more people running Iditarod with marginal trail skills who need to be kept an eye on than there are in Quest, but I suppose that would be overly argumentative). With Quest it's less clear that it's led to a substantial increase in financial support from fans, particularly given the state of the purse over the past few years. And, of course, fan overreaction to things that they see in the trackers, plus managing the PR aspects of real problems in real time, increase both the workload and stress level for race staff.

And to be sure, there is no substitute for physical presence and human interaction. Over all these years, hands-down and by a large margin my favorite race spectating experience was last year at the Two Rivers checkpoint. Hugh's tracker was off and while we knew where Allen was we didn't know if Hugh was ahead of him, behind him, ... ? So there was a crowd, mostly handlers and people from the dog-savvy Two Rivers community, waiting at the checkpoint to see who'd be the first in and the likely winner of the 2013 Quest. There was a lot of chatter, a lot of suspense, and a lot of camaraderie as we waited.

That said, there is more than one way to experience the race, and I wouldn't denigrate the experience that people who cannot be here, who don't run dogs and don't know winter, are having as they follow along from home. As I was flying out yesterday/last night/whenever the heck that was (it all runs together ... ) I looked down on the landscape, and the trails that go on for miles without ever crossing a road or encountering a town, and once again I realized how ridiculously lucky I am to be able to live in Alaska. Most people don't, and most people can't. They come up and visit and have, I'm sorry to say, staged, inauthentic experiences, but somehow it captures their imagination and they fall in love with the romance of the place even if they can't quite engage Alaska directly. Following the races is one way for them to keep the romance alive. I'd argue that's a good thing, even if it's not really got very much to do with what Alaska is actually about.

But still, one of the things I've been hammering on is that the data and the trackers do tell some stories, if you care to watch and listen. Unfortunately Trackleaders.com is down right now but when it comes back up I'll post a bit of John's track from this year's Quest, where we did get to watch a story unfold and did get a sense of what was happening. He was traveling with Matt Hall (and really, somebody has to have a word with the unfortunate PR people who were handling the Quest's Facebook page and were turning every instance of people traveling together into a race). We watched them stop, leave the trail, go some distance, turn around, rejoin the trail, and stop again (here's an excerpt from Matt's track; John's looks much the same). So, while we don't know what they looked and felt like, we do have some idea that they ran into some tough trail and we watched them deal with it.

Similarly, I think a lot of people following on the GPS tracker remember standing up and screaming at their computers while watching Rob Cooke on Eagle Summit last year. We watched him motor on up, pause, and turn around. This is a case where it was much less clear what was going on (it looked possible that he was having problems but it turned out that he had so little difficulty going up he thought he must have left the trail, and turned around to find it) but it was emotional in any event.

So no, it's not at all the same as being on the trail. People are working with woefully little information and sometimes they don't understand what they're seeing at all (and this is where race organizations can be doing a better job, to head off overreaction and to help fans understand what they're seeing). But they're having a different kind of experience and have their own level of emotional involvement in it. John and others may not value it as highly as they value direct trail experience (and I wouldn't, either), but it's real and it's meaningful.

Why Iditarod tracker mileage sorting is messed up

2014-02-23T08:48:00.002-09:00

People have noticed that the sort by mileage function on the Iditarod leaderboard was wrong (it's been fixed), with 100-something miles sorting in front of fewer miles. For example, see this screen shot grabbed by Dawn Beckwell:

Here's what's going on (and this will be old news to a few people and not interesting at all to most): Computers are binary calculators, with all data, from programs to stored files, taking the form of a string of 0s and 1s. Both characters and numbers are also strings of 0s and 1s, and there are standardized encoding schemes for representing character data. By far the most popular/successful is known as ASCII, or the American Standard Code for Information Interchange (nice collection of ASCII tables here). So, while the number 1 is "00000001" in base 2 (again, base 2 because each bit can take one of only two values, 0 or 1), the character "1" is encoded in ASCII as 00110001. That's right, 1 and '1' are not the same. When you see a '1' on a screen what's really behind that - how the data are really represented - is 00110001. 00000001 is translated to 00110001 for printing or display.
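
If you'd like to see the difference for yourself, here is a tiny Python sketch (the function names are mine, chosen for illustration) that prints the 8-bit pattern for the number 1 and for the character "1":

```python
def bits_of_number(value):
    """8-bit binary representation of a small non-negative integer."""
    return format(value, "08b")


def bits_of_character(char):
    """8-bit binary representation of a character's ASCII code."""
    return format(ord(char), "08b")


print(bits_of_number(1))       # 00000001
print(bits_of_character("1"))  # 00110001
print(1 == "1")                # False: the number and the character differ
```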

Programs that do things to data, like sort them, have no way to know what 00110001 is or how they should treat it. In a lot of web-oriented and application-oriented programming languages it's very easy to sort data (old school, we had to write our own sort functions) but the default is that data are treated like characters. They look at the first character in each string and sort on that, etc. In that scheme, "1" is smaller than "7" and sorts earlier. To sort as numbers, in modern programming languages you just need to tell the sorting function to treat the data as numbers, not characters, so instead of looking at it character-by-character it understands that the value it needs to sort is 101.5.
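
Here is a short Python sketch of the effect, using made-up mileages. I don't know what language or sort routine the Iditarod leaderboard actually uses, so treat this only as an illustration of the general problem: the same values sort differently depending on whether they are compared as characters or as numbers.

```python
# Hypothetical trail miles as they might arrive from a data feed: as text.
miles = ["101.5", "72.0", "36", "9.25"]

# Default sort: compares character by character, so "1..." comes first.
as_text = sorted(miles)

# Telling the sort to treat each value as a number gives the expected order.
as_numbers = sorted(miles, key=float)

print(as_text)     # ['101.5', '36', '72.0', '9.25']
print(as_numbers)  # ['9.25', '36', '72.0', '101.5']
```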

Answering a question with the new Iditarod tracker

2014-02-22T21:50:00.000-09:00

Today, someone asked what time Conway got into Yentna. So, how do you answer that question with the Iditarod tracker analytics? Well, the best you can do is to take a look at what time his speed fell to 0mph. Because there is no plot showing mile location against time (or time against miles), you can't say "Checkpoint <abc> is at mile 128 and Musher <def> was at mile 128 at 4:20, so Musher <def> arrived at <abc> at 4:20." Instead, you can make inferences from speed. And, in fact, the standings say he got in at 3:56.
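
If the tracker did expose trail mile against time, the question would be easy to answer. Here is a minimal Python sketch of the idea, assuming a hypothetical list of (time, trail mile) updates; the times and miles are invented to match the example above, not real race data. It finds the two updates that bracket the checkpoint's mile and interpolates between them.

```python
from datetime import datetime


def arrival_time(updates, checkpoint_mile):
    """Estimate when a team reached checkpoint_mile.

    updates is a list of (datetime, trail_mile) tuples sorted by time.
    Returns None if the team has not reached the checkpoint yet.
    """
    for (t0, m0), (t1, m1) in zip(updates, updates[1:]):
        if m0 <= checkpoint_mile <= m1:
            if m1 == m0:
                return t0
            # Linear interpolation between the two bracketing updates.
            fraction = (checkpoint_mile - m0) / (m1 - m0)
            return t0 + (t1 - t0) * fraction
    return None


# Hypothetical tracker updates for one team.
updates = [
    (datetime(2014, 2, 22, 3, 50), 122.0),
    (datetime(2014, 2, 22, 4, 10), 126.0),
    (datetime(2014, 2, 22, 4, 30), 130.0),
]

estimate = arrival_time(updates, 128.0)
print(estimate.strftime("%H:%M"))  # 04:20
```

Linear interpolation assumes the team kept a steady pace between the two updates, which is usually close enough when updates are only a few minutes apart.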

Another possibility for figuring out when he got in is to use the replay. They don't allow you to control the speed of the replay and wow, that's going to suck a lot when the race gets longer and you've got over 60 teams on the trail, but if you drag the slider you can do it manually.

[Another UX problem - that legend on top of the curves is really annoying!]

The new Iditarod tracker and user experience ("UX")

2014-02-22T21:23:00.000-09:00

First, I'd like to apologize for not blogging much over here. I've been extremely busy with work (that's a good thing, mostly) and have found that keeping a Facebook page is a handy way to get out short notes. The Facebook page is here.

Anyway, Iditarod's new tracker is up and being used to track the Junior Iditarod. It's pretty clear that they've written their own based on a data feed from Trackleaders, and it's also pretty clear that they didn't have time or the means to debug it. Software quality assurance is much more difficult than you might think. One common problem is that commercial-grade software needs to handle unexpected inputs gracefully. It is an ongoing source of amazement what people will try to do with something, things you never could have expected and didn't plan for. When people in the techie business say "The first 80% of the project takes 80% of the effort, and the last 20% of the project takes the other 80% of the effort," that's nearly always what they're talking about - quality assurance. People who haven't done a lot of commercial software tend not to appreciate this and think that when a program does what they want it to, they're done. Not sure what to say about that other than "Hah."

Anyway, rather than dwell on bugs I'd like to talk for a minute about user interface issues. User interface is also a very highly specialized area in software. It's something at which I am truly terrible, so I rely on people who understand user behavior, workflows, and so on. But in this case I am a user and here's what I'm finding:

Some of the things which work well for a 9-team field are going to be nearly unbearable when there are nearly 70 teams being tracked
First, good for them for making the columns sortable in the leaderboard (the panel on the left-hand side of the map). It's helpful, and it's going to be absolutely necessary when there are 60-odd teams on the trail
Too much clutter on the screen, and it covers up portions of the map. Unfortunately because of a few other problems with the user interface we kind of need to keep some of it around (the leaderboard)
The base map layer is not a good choice. I understand that this is what Mapbox provides but the lack of labels on geographic features is unfortunate. It would be nice to have the option to switch between a map layer and a satellite image layer (a topo layer would be awesome but I understand it's a lot more difficult to come by - another plus for Trackleaders). On a more positive note, today Iditarod switched from showing the road map layer to showing a terrain map layer. It makes it easier to compare to a topo map, plus - let's face it - using a road map to track a wilderness race is kind of dumb
We can't zoom out to cover a larger geographic area. Can't imagine why not unless it costs them money (does Mapbox charge for tile access? Don't know).
They need to get a handle on this whole "rest" thing. I'm very interested in run/rest schedules (you should be, too! They're a key question in understanding distance dogsled races) and a 10-minute stop for snacks or to check booties is not at all the same thing as a 4-hour break. Also, it's probably a mistake to display a musher as having stopped the same as a tracker that hasn't updated. It doesn't help that a single 0mph reading is treated as stopped, because it means that they're also showing a single missed tracker update as stopped
If you hover over a flag on the map it gives you the geographic coordinates of that tracker update. I assume they did that for debugging purposes, but for those of us following the race it would be a big improvement if they showed the musher's name, instead. Right now you need to go back to the leaderboard to find who a given bib number belongs to. That's going to suck when there are 69 teams on the trail.
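
On the "rest" question, here is a rough Python sketch of one way a tracker could separate a missed update, a short pause, and a real rest. The thresholds (at least two consecutive 0mph readings before calling it a stop, 60 minutes or more before calling it a rest) and the readings themselves are assumptions I picked for illustration, not anything Iditarod or Trackleaders actually uses.

```python
def classify(run, min_readings, rest_minutes):
    """Turn a streak of 0 mph reading times into a stop record, or None."""
    if len(run) < min_readings:
        return None  # a lone 0 mph reading may just be a missed update
    duration = run[-1] - run[0]
    kind = "rest" if duration >= rest_minutes else "pause"
    return (run[0], duration, kind)


def find_stops(readings, min_readings=2, rest_minutes=60):
    """Find stops in a list of (minute, speed_mph) readings sorted by time.

    Returns a list of (start_minute, duration_minutes, kind) tuples.
    """
    stops = []
    run = []  # times of the current streak of 0 mph readings
    for minute, speed in readings:
        if speed == 0:
            run.append(minute)
            continue
        stop = classify(run, min_readings, rest_minutes)
        if stop is not None:
            stops.append(stop)
        run = []
    # Handle a streak that is still open at the end of the data.
    stop = classify(run, min_readings, rest_minutes)
    if stop is not None:
        stops.append(stop)
    return stops


# Hypothetical readings every 10 minutes: (minutes since start, speed in mph).
readings = [(0, 8.5), (10, 0.0), (20, 8.1), (30, 0.0), (40, 0.0), (50, 7.9)]
readings += [(60 + 10 * i, 0.0) for i in range(25)]  # minutes 60 through 300
readings += [(310, 8.0)]

stops = find_stops(readings)
print(stops)  # [(30, 10, 'pause'), (60, 240, 'rest')]
```

The lone 0mph reading at minute 10 is ignored, the two readings at minutes 30 and 40 show up as a short pause, and the long streak shows up as a 4-hour rest.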

I'm having a hard time calling their analytics "analytics," since they don't provide that much insight into what's going on on the trail. I keep hammering on this because I think it's important: the competitive advantage that Trackleaders brings to the event tracking business is that they know how to tell a story using data. Teams on the trail aren't simply moving down a line, they're also moving relative to one another, and that movement is much of the story of a race. Who's traveling together? Who's passing whom, and where is it happening? How much faster *is* one team traveling than another, really? Is there one particular spot on the trail that's a popular camping site? I get the impression that the folks working for IonEarth were pressured by Iditarod to provide analytics and found a Javascript library containing strip charts so implemented that, without thinking very hard about what they want to show. Now, Iditarod is copying that.
Here's one thing the Iditarod analytics do do well: by mapping speed against time they give you some insight into a particular musher's run/rest schedule.
It's great that they let you get a musher's "analytics" directly from the leaderboard but it's kind of a klutzy process. I usually start by noticing something on the map I'd like to look at more closely. In this case, what we see on the map is bib numbers. So you need to go over to the leaderboard (open it up if you've closed it to minimize clutter), find the bib number, then click on that person's analytics icon. It'd be a lot more straightforward to be able to go from the map marker directly to the "analytics."
Another clutter-related issue is that because the pop-ups don't close when you open another, you can get a mess pretty quickly. Unfortunately closing them can be a little hit-and-miss with your mouse. Anyway, your moment of fugly:

They're not really strong on the mileage reporting and while the analytics show speed against time there's really no easy way to compare how two teams performed over the same section of trail

Anyway, enough kvetching. When you're in the software business and when you're an engineer, your first instincts when facing new technologies are 1) to figure out how it works, and 2) to figure out how to make it better. Alas, this tracker is giving us plenty of opportunities for the latter. But ultimately what matters is how it works when put to some basic tests, and in a couple of future posts I'll look at how to answer certain kinds of questions using this software.