Wednesday, February 27, 2013

More on Iditarod analytics

So, the other day I wrote a post on IonEarth's new "analytics" that they'll be using in this Iditarod for the first time.  In rereading it I thought it was too negative and not very helpful, so although I left it up I did not publicize it.

Instead, I thought it might be worthwhile to spend a little time talking about tracker analytics and how they're useful.  The first thing that needs to be addressed is the question of why to have them in the first place.  Frankly, a lot of people are just not interested - they're happy with seeing the location of teams on a map and that tells them what they want to know.

However, sometimes people want to know more.  They want to get a handle on a given team's run/rest schedule, which is so critical to distance dog racing, or they want to know if the gap between a given pair of teams is opening up or closing.  Static maps, in which you can only see the current location of a team, won't help you much unless they also allow you to step backwards and forwards.  And even then you probably still won't get a very good feel for the overall trajectory, and it can be extremely tedious to step through an entire race just to find the couple of hours you're interested in.
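To make "opening up or closing" a little more concrete, here's a minimal sketch in Python.  The sample points are invented (they are not real race data), and it assumes both teams were sampled at the same times, which real trackers don't guarantee:

```python
# Hypothetical (hour, trail mile) samples for two teams; not real race data.
team_a = [(0, 0.0), (2, 18.0), (4, 35.0), (6, 53.0)]
team_b = [(0, 0.0), (2, 17.0), (4, 32.0), (6, 47.0)]

def gaps(leader, chaser):
    """Return (hour, miles between teams) for samples taken at the same times."""
    return [(t, round(m_lead - m_chase, 1))
            for (t, m_lead), (_, m_chase) in zip(leader, chaser)]

gap_series = gaps(team_a, team_b)

# The gap is opening if every value is larger than the one before it.
opening = all(later[1] > earlier[1]
              for earlier, later in zip(gap_series, gap_series[1:]))
print(gap_series, opening)
```

A single position on a map gives you one of those gap values; it's the series that tells you which way things are heading.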

There are some obvious tools that seem to be helpful at first glance, like average speed.  Unfortunately conventional averages treat all sample points identically and you can't tell if the speed has been constant, increasing, or decreasing.  Given two teams with two averages, all you can tell is which team has been faster so far, not whether or not a team is accelerating or slowing down.  Two teams can have the same average traveling speed when one goes crazy-fast but rests a lot and the other plods steadily with few breaks.

Analytics can help you get a sense of the dynamics of a race, how things are changing, where the averages come from, and whether or not there's something surprising in the underlying data.  That is, they can help you with those things if they're good analytics.  While I admire Trackleaders' clever use of low-cost, commodity hardware, what I really, really love about them is that they understand that a race is dynamic and constantly changing, and they work pretty hard at figuring out ways to tell the story behind those points on a map.

Some of the tools Trackleaders provides are conceptually simple, but they're elegant and very expressive.  For example, their race flow chart just plots which mile each team had reached at a given time.  They put the top 10 teams on it (and they *really* need to allow us to choose which teams we'd like on the plot, but I digress).  It's a very simple idea but you can look at the plot and immediately see who's traveling faster, who's catching up with whom, etc.  You can see the relationships between the teams change over time and space and get a very good handle on what's been happening and what you can expect to happen in the near future.  It's a fantastic tool.  Here's an example, with the end of the 2013 Yukon Quest.  That race was a nailbiter almost right up until the end, but this plot shows Allen Moore opening a clear lead on Hugh Neff, as their lines get further apart over time (Allen's line is the yellow one on top, Hugh's is the light blue line very close to it):



Another of their tools that works well for distance dogsled racing is their plot of speed against time.  Again, you can take a look at one and get an immediate sense of run/rest schedules, whether a team is getting generally faster or generally slower, etc.  Here's a look at Brent Sass's speed/time plot from the 2013 Quest:


As you can see, he took very regular breaks on the trail (fantastic to see this kind of discipline from Brent; he could win this thing in the not-too-distant future and it's discipline like this that will help make it happen), along with his long, 40-hour layover in Dawson City.
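If you'd rather pull a run/rest schedule out of the numbers than eyeball it from a plot, something along these lines would do it.  This is only a sketch: the speeds are invented, I'm assuming one sample per hour, and the 0.5 mph "resting" cutoff is an arbitrary choice of mine, not anything Trackleaders publishes:

```python
from itertools import groupby

# Hypothetical hourly speed samples in mph; 0.5 mph is an assumed "resting" cutoff.
speeds = [8.1, 7.9, 7.5, 7.2, 0.0, 0.0, 0.1, 0.0, 7.8, 7.6, 7.0, 6.9, 6.5, 6.4]
REST_CUTOFF_MPH = 0.5

def run_rest_blocks(samples, cutoff=REST_CUTOFF_MPH):
    """Collapse consecutive samples into ("run" or "rest", number of samples) blocks."""
    labels = ("run" if s > cutoff else "rest" for s in samples)
    return [(label, len(list(group))) for label, group in groupby(labels)]

blocks = run_rest_blocks(speeds)
print(blocks)
```

With hourly samples, the made-up data above comes out as a 4-hour run, a 4-hour rest, and a 6-hour run.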

Okay, so what about Iditarod and IonEarth?  Well, I'm trying to find something nice to say and it pretty much comes down to that their pictures are pretty.  I don't think they'll find themselves copying Trackleaders, unfortunately, but I think there are a couple of things they could do to improve their analytics on their own.

For one thing, if they can overlay temperature on top of the speed plot (seriously, were drugs involved with that decision?), they can certainly start thinking of ways to combine plots from different teams in a way that expresses their relationship (distance, speed) over time.  Another thing they could do that would help a lot would be to provide moving averages (REAL moving averages, not their ferkakte average speed while moving) over, say, two different sets of terms, one longer and one shorter, to get a sense of how speed is changing and how one team's speed is changing relative to another.
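To show what I mean by moving averages over two different sets of terms, here's a minimal Python sketch.  It isn't anything IonEarth or Trackleaders runs as far as I know; the speeds are made up, and the window lengths of 2 and 4 samples are arbitrary assumptions:

```python
def moving_average(values, window):
    """Simple moving average: mean of the most recent `window` values at each step."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Hypothetical speed samples (mph) for a team that is gradually slowing down.
speeds = [9.0, 9.0, 8.0, 8.0, 7.0, 7.0, 6.0, 6.0]

short = moving_average(speeds, 2)    # reacts quickly to recent changes
long_ = moving_average(speeds, 4)    # smoother, slower to react
overall = sum(speeds) / len(speeds)  # one number for the whole run so far

print(short[-1], long_[-1], overall)
```

When the short-window value sits below the long-window value, as it does at the end of this series, the team has been slowing down recently.  The single overall average can't tell you that.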

The main thing here is that those of us who follow a race for its dynamics, for how things are changing over time, would find analytical tools that answer our questions or show graphically what's happening on the trail really, really helpful.  IonEarth has incredible hardware in the units they mount on Iditarod sleds, and if they provided better analytical tools during races I think they'd be nearly unstoppable.

Tuesday, February 26, 2013

Mike Ellis, Team Tsuga, and respecting your dogs

As I'm sitting here on my hindermost parts, working on my computer and looking out the window at another glorious Alaska late winter day (deep snow, lots of light, warm temperatures -- nice!  What on earth am I doing inside, again?), dog teams from interior Alaska are headed down to Anchorage for the Iditarod start.  I am not much of a fan of the Iditarod but I'm a huge fan of some of the people and dogs who are running it.

I think a lot of people know Mike Ellis as someone who runs purebred Siberian Huskies.  Some smaller number of people are aware that he and Sue are consistently recognized for outstanding dog care, both on and off the trail.




What even fewer people may be aware of is that they've got a very, very successful breeding program, one which is absolutely consistent with "Respect your doGs."  Not that long ago it was pretty common to hear mushers say "If you get one good puppy out of a litter you're doing well" or "Expect one good puppy for every 10 you whelp," and you still hear that from time to time.  But Mike and Sue have been producing consistently good litters, in some cases with all the puppies eventually making their Quest team.  By focusing on quality rather than volume, they're producing excellent litters, improving working Siberian Husky performance, and raising the bar for other mushers, purebred and Alaskan alike.

As a side note, there's been a trend towards breeding lankier and lankier Siberians for performance, with some dogs these days looking more like Alaskans than Siberians.  The breed standard says "moderate," though, and I've heard Mike insist on breeding moderate dogs.  Team Tsuga dogs look like Siberian Huskies.

So, needless to say, it was not a surprise when Team Tsuga won the Seppala Heritage grant ("Applicants have to demonstrate a commitment to work with, train and race sled dogs, and show value traits of generosity of spirit, courage, integrity and love for the dogs, land and people of Alaska").

Fortunately the Ellises are not the only mushers taking exceptional care of their dogs, and there are quite a few teams in this year's Iditarod that deserve a lot of respect.  I was at the road crossing in Two Rivers when some of this year's Quest teams came through, and Dan Kaduce's team looked incredible (the other eye-popping team I saw come through was Scott Smith's, although he's not running Iditarod this year). Dan and his wife, Jodi Bailey, run Dewclaw Kennel together, and Jodi will be taking some of the dogs from Dan's Quest team in Iditarod this year.

There are many other teams in there who display consistently great dog care (and I think one in particular is likely to win this year's race as long as nothing weird happens with the weather or trail), and my not mentioning them here should not be seen as suggesting that I don't think they're model dog people.  But every once in a while you see something kind of magical, and need to give it a nod.

Saturday, February 23, 2013

Junior Iditarod tracker is live and free

Not to be confused with "wild and free."

Anyway, the Junior Iditarod teams are on the trail, and like last year the Iditarod is providing free live tracking.  This year IonEarth has added an analytics page, so I've been taking a quick look at it to try to figure out whether or not it's helpful, what I can learn from it, what I can't learn from it, and so on.

The thing I ran into when looking at the Iron Dog tracker is that I had no idea what they meant by their various averages.  It looked like "moving average" and "average" were possibly the same thing, with one taken over more terms.  It turns out that while the definitions are not available on the Iron Dog tracker, they are available on the Jr Iditarod page, and look like this:

So, this was something of a surprise (as in a big one), since "moving average" means something to statisticians, and it's not this.  Both their average speed and moving average speed are actually cumulative averages (probably?  they aren't very clear).  Also, my suspicion is that "Speed" is not "across the course," but is actually instantaneous speed.  Anyway, I think they could remove the confusion and be much clearer (to both of us mushing fans with statistics backgrounds ... ) by changing "Moving average speed" to "Average moving speed."
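Here's a small Python sketch of the distinction as I understand it.  I don't know how IonEarth actually computes its numbers, so the function names, the equally spaced samples, and the 0.5 mph cutoff for "moving" are all my own assumptions:

```python
# Hypothetical equally spaced speed samples (mph); zeros are time spent resting.
samples = [10.0, 10.0, 0.0, 0.0, 8.0, 8.0, 0.0, 6.0]

def cumulative_average(values):
    """Running average of every sample so far, rests included."""
    total, out = 0.0, []
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out

def average_moving_speed(values, cutoff=0.5):
    """Average over only the samples where the team was moving (cutoff is assumed)."""
    moving = [v for v in values if v > cutoff]
    return sum(moving) / len(moving) if moving else 0.0

print(cumulative_average(samples)[-1], average_moving_speed(samples))
```

Both are cumulative: each new sample gets folded into everything that came before.  The only difference is whether the rests are counted, which is why "average moving speed" seems like the more honest label.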

After looking at the Iron Dog tracker and thinking that I had a pretty good handle on what was actually on the analytics plots (and being largely wrong ... ), this morning I looked at the Jr Iditarod analytics and immediately sat down to try to figure out what the curves represent:


[Also, go Noah!  Upstate NY represents!!].  So I was eyeballing the red line and trying to figure out if it was being taken over more terms (which seemed likely) than the purple line, and then I looked at the legend.  And, d'oh!  it's actually the temperature curve.  If they're going to put that on the plot at all they really need to put it on the same plot as altitude (the upper blue line), since it represents environmental factors and is just really confusing on the speed plot.  So, moving it up would make the plot a lot clearer.

I was also having some difficulty understanding how the curves can be so smooth given the very, very small number of samples (the race started at 10am AKST, it's now 11:30am AKST).  I think they just drew some Bézier curves, decided they look nice, and left it at that (Chris says "It looks like the analytics are a response to an RFP") rather than worrying about what information was being conveyed.

But, it's very, very early in the race and at this point I'm not sure how much useful analysis can be done, anyway.  As the race progresses I'll be very interested in how the speed curves represent what's happening in the race.

IonEarth makes incredible hardware - there's absolutely no question about that.  That they're as reliable as they are makes it possible to learn a lot more by glancing at their tracking map than you can by just looking at positions on a Trackleaders tracking map (because the SPOTs are less reliable you also need to take a look at the age of a "breadcrumb").  I'd love to see them improve their analytics, even though I'll acknowledge that it's probably just a small minority of us interested in those things.  What they've got now is pretty but perfunctory.

Friday, February 22, 2013

Using my spreadsheets for race data - Part 2

In my previous post I introduced some of the background and goals for using a spreadsheet (particularly  the Google Drive spreadsheet) for doing some record-keeping for distance dogsled race data.  In this post I'd like to go into some of the details of how I dealt with the calculations.

Each column represents a particular piece of data of interest.  I'm probably going to find myself slipping into calling them "fields" from time to time, so when you see that, know that I'm just referring to a particular column.  The columns/fields are:

  • Bib number
  • Musher name
  • Date and time in
  • Runtime from the previous checkpoint
  • Speed from the previous checkpoint
  • Speed ranking from the previous checkpoint (where "1" is the fastest, and higher numbers are slower)
  • Date and time out
  • Rest time
At the top of each column, in spreadsheet row 1, are the column headers explaining what's in each column.

I'll step through each column in the following sections.

Bib number


As I explained in my previous post, I'm using the bib number to tie data together across sheets.  I initially loaded my data from the Yukon Quest website (for example, here) using the mechanism described in this blog post.  But the Quest does something a little odd, and rather than having a separate bib number field in their own data, they put the bib number in parentheses next to the musher name.  Extracting it by hand is tedious and error prone, so I populated my bib number field by extracting it from the Quest's musher name field using the Google spreadsheet "regexextract" function.  A "regular expression," often abbreviated to "regex," is a way of describing patterns of text.  Regexextract pulls data out of a cell based on a regular expression.  Here's what I put into cell A2 (the "Bib" column in the "Hugh Neff" row) in the spreadsheet I'm using:

=regexextract(B2;"([0-9]+)")

What this says is that in cell B2 (the one containing "Hugh Neff (4)"), I'm looking for a digit from 0 through 9 followed by zero or more additional digits (in other words, one or more digits in a row), and I want to extract those into this cell.  I then copied that and pasted it into the other rows in this column (that is to say, cells A3 through A27), and the spreadsheet software automatically adjusted the cell number to match the current row.  So, when I pasted it into cell A4, it then adjusted the formula to examine cell B4.  The best thing is that you can select a range of cells to paste the formula into, and it just works - all the cell row numbers are adjusted.
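If you want to play with the pattern outside of a spreadsheet, the same regular expression works in Python.  This is just an illustration; `bib_number` is a name I made up, and it returns the first run of digits it finds, so a musher name that itself contained a number would trip it up:

```python
import re

def bib_number(musher_cell):
    """Return the first run of digits in the cell as an int, or None if there isn't one."""
    match = re.search(r"([0-9]+)", musher_cell)
    return int(match.group(1)) if match else None

print(bib_number("Hugh Neff (4)"))
```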

As I mentioned, the reason I did this is because of how the Quest shows the bib number.  You may prefer to manually enter the bib number in its own column for your race.

Musher

This is just the name of the musher.  In this particular spreadsheet I have the bib number in parentheses following the musher name, but that's only because the Yukon Quest does it that way, as explained in the previous section.

Date/time in

This cell contains the date and time that the musher arrived in this checkpoint, in the form 
month/day/year hour:minutes:seconds
Here are several things you should know:
  • The cell must have this particular format applied.  Go to the "Format" menu, choose "Number," and then choose "Date time".  Apply the format to every cell in the column except for the one in row 1, which contains the column header (label)


  • Google Drive will try to guess what you mean and can be pretty smart about format conversions. If you type something like "February 3 18:00" it will convert it to "2/3/2013 18:00:00" for you, automatically adding the year, converting the date to numbers, and adding the seconds.   I've run into one situation in which it did not convert correctly.  This may not be intuitive to people who don't write software, but if you enter a date/time that has leading blanks (that is to say, instead of saying "2/3/2013 18:00:00" it says "     2/3/2013 18:00:00") it will think that you mean that this is just a string of text characters and should not be converted.  I ran into this with the Quest 300 data, where it looks like they tried to center numbers in the cell by adding a bunch of blanks in front.  Don't do that!
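For comparison, here's roughly how I'd guard against those leading blanks if I were reading the same text in Python rather than a spreadsheet.  It's a sketch only: `parse_checkpoint_time` is a made-up name, and it assumes the month/day/year hour:minutes:seconds layout described above:

```python
from datetime import datetime

def parse_checkpoint_time(text):
    """Parse 'month/day/year hour:minutes:seconds', ignoring surrounding blanks."""
    return datetime.strptime(text.strip(), "%m/%d/%Y %H:%M:%S")

print(parse_checkpoint_time("     2/3/2013 18:00:00"))
```

The `strip()` call is the whole trick: throw away the padding before trying to interpret what's left as a date.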

Runtime from previous checkpoint

This is where you start to see some serious labor savings.  We tell the spreadsheet to do the work for us, by finding the right values to do the arithmetic on, and then doing the arithmetic.  Here's an example of what we put in a "runtime" cell on the sheet for the Circle City checkpoint:

=if (C2="", "", C2-vlookup(A2, 'slavens'!$A$2:$G$27, 7, FALSE))

Let's step through this one:

As I mentioned earlier, we need to make sure that we're catching errors where we can, so that we don't show confusing junk to the user.  At the start of the race I want to fill in the formulas, even though I don't know what the actual dates and times will be.  I do know where they are, though, so I put the whole thing inside a big "if" clause.  The "if" formula in Google spreadsheet has three parts, inside parentheses and separated by commas.  The first part is the one that's evaluated.  In this case we're checking to see if the C2 cell is empty.  The second part is what's done if the first part is true.  In this case we just keep it empty.  The third part is what we do if the first part is not true - if the C2 cell is not empty.

So here's what we do if we've got an arrival date and time: we need to subtract the time out of the previous checkpoint from the time into the current checkpoint to get the runtime.  The date and time into the current checkpoint is in column C, and we need to find the time out of the previous checkpoint to subtract from it.  We could use a fixed spreadsheet location, but because we want this to be as foolproof as we can make it, allow ourselves some flexibility in musher ordering, and reduce the likelihood of error, we're going to search for that number.  To do that, we use the spreadsheet "vlookup" function.  It takes four arguments, separated by commas:
  • the first argument is the value we want to look for.  In this case, we're using bib numbers to identify mushers and the bib number is stored in column A, so we're going to look for the value in A2 for row 2, the value in A3 for row 3, and so on.
  • the second argument is where to look, and this is a little tricky.  First, we want to look in the sheet for the Slavens dog drop, which I've labeled "slavens".  Within the "slavens" sheet, we want to search the range of cells from A2 through G27.  However, as I've mentioned above, the spreadsheet tries to be helpful by adjusting rows and columns when you copy and paste.  If you have a formula in cell B3 that references cell A2, and you copy that formula into cell C5, the spreadsheet will change that reference to B4.  So, to tell it you don't want it to adjust the cell reference values you prefix each row and column with a '$' sign.  '$A$2' means "yes, I really mean A2."
  • the third argument is the column from which I'd like to take the value.  I think this entire interface is a little awkward but it does give you some flexibility.  So, when you find the value from A2 (in this case) in the range of columns given in the second argument, take the value from the 7th column (wasn't that a movie?).  In this case, the 7th column on the previous sheet contains the departure time.  And magic, it does the arithmetic and calculates the runtime for you.
  • the last argument ("FALSE") tells vlookup that your data are not sorted.
  • Do make sure that the "Format" for the runtime cell is hours, using the Format menu:


So that's how we set up the spreadsheet to take care of the runtime arithmetic for us.  Once that's done, it's done.  
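For anyone who thinks more easily in code than in spreadsheet formulas, here's a rough Python equivalent of that runtime cell.  It's a sketch under assumptions: the dictionary `slavens_out` stands in for the "slavens" sheet, the times in it are invented, and `runtime` is just a name I picked:

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S"

# Hypothetical departure times from the previous checkpoint, keyed by bib number.
slavens_out = {4: "2/8/2013 06:15:00", 1: "2/8/2013 09:40:00"}

def runtime(bib, time_in, previous_out=slavens_out):
    """Mimic the runtime cell: blank in gives blank out, otherwise a timedelta."""
    if time_in == "":
        return ""  # no arrival recorded yet, so leave the cell empty
    departed = datetime.strptime(previous_out[bib], FMT)  # the "vlookup" step
    return datetime.strptime(time_in, FMT) - departed

print(runtime(4, "2/8/2013 13:45:00"))
print(repr(runtime(1, "")))
```

Looking the departure time up by bib number, rather than by position in the sheet, is what lets the mushers appear in a different order at every checkpoint.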

Speed

Speed is derived from runtime (speed is a function of time and distance).  I was pretty slovenly and put the distance between checkpoints directly into the formulas and then copied-and-pasted.  Here's what it looks like:

=iferror(round(58/(int(D2)*24+hour(D2)+minute(D2)/60),1))

This looks like a hairball but it actually isn't, and to be honest I'm not really sure that this is the best way to calculate the speed.  But, it works, given the limitation of the medium.  Again, we're going to tackle this from the outside ("iferror") in.

  • "iferror" is a function that allows us to catch errors and make sure that something that's clearly wrong won't be displayed.
  • "round" is a function that allows us to control the number of digits to the right of the decimal point (or precision).  When you're talking about dogsled speeds, something like "7.1" makes sense.  Something like "7.128475" really doesn't, so let's not show that.
  • Inside the "round" function is where we're actually doing the arithmetic - dividing the distance traveled by the time it took to travel it.
  • The rest of the string ("int(D2)*24+hour(D2)+minute(D2)/60") converts the runtime in D2 into just plain hours.  I can write some more about this at a later time but I think it may be a distraction here, since my assumption is that the people who use these spreadsheets are going to be doing a lot of copying and pasting and only need to know details to the extent that it helps them adapt the spreadsheets to their own race.  So: use this formula, but change the miles ("58") to the mileage from the last checkpoint to this one as needed.
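Here's the same arithmetic as a short Python sketch, in case seeing it spelled out helps.  The 58 miles comes from the formula above; the 7 hour 30 minute runtime is invented, and `speed_mph` is a made-up name.  Like the spreadsheet formula, it ignores leftover seconds:

```python
from datetime import timedelta

def speed_mph(miles, runtime):
    """Mirror round(miles / (int(D)*24 + hour(D) + minute(D)/60), 1) for a timedelta."""
    whole_hours = runtime.days * 24 + runtime.seconds // 3600
    minutes = (runtime.seconds % 3600) // 60
    hours = whole_hours + minutes / 60
    if hours == 0:
        return None  # stand-in for the blank cell that iferror would leave
    return round(miles / hours, 1)

print(speed_mph(58, timedelta(hours=7, minutes=30)))
```

The days term matters more than you'd think: a run that crosses 24 hours would otherwise lose a whole day and produce a wildly fast speed.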

Speed rank

I decided to rank speeds from the previous checkpoint because it's something people ask about, and because it's easy to do.  I put this in column F.  The formula is:
=iferror(rank(E2; $E$2:$E$28; 0))
Again, outside to inside we have
  • iferror, to catch mistakes like missing data
  • the "rank" function, which takes the following parameters/arguments
    • the first argument is the cell for which you'd like to find the ranking (speed, which is in column E)
    • the second is the range within which you're ranking the data (and again, you put '$' in front of the row and column references to make it clear to the spreadsheet that these are absolute, and should not be adjusted if and when you paste this formula into another cell)
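    • and the third argument tells rank whether to rank in descending order (0), so that the largest value (here, the fastest speed) gets rank 1, or in ascending order (1), so that the smallest value gets rank 1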
You may have noticed that in this case I'm using a semicolon (';') to separate arguments when previously I'd been using a comma (','). This is the kind of inconsistency that can creep into larger projects when they're done over time rather than in just one sitting. I can and should bring them into consistency at some point ...
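To see what the rank formula is doing, here's a tiny Python sketch with invented speeds.  With 0 as the third argument, the largest value gets rank 1, and tied values share a rank.  The function name `rank_descending` is mine, not the spreadsheet's:

```python
def rank_descending(value, values):
    """Rank with the largest value as 1: one plus the count of strictly larger values."""
    return 1 + sum(1 for v in values if v > value)

speeds = [7.7, 8.2, 6.9, 8.2]  # hypothetical speeds in mph
ranks = [rank_descending(s, speeds) for s in speeds]
print(ranks)
```

Note that the two teams at 8.2 mph both get rank 1, and the next team gets rank 3 rather than 2.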

Departure

Column G contains the last piece of manually entered data, the departure time from the checkpoint. Again, this needs to be in the Date time format.

Layover, or rest

The last column, H, contains the rest time at the checkpoint, and is simply the value you get from subtracting time in from time out:

=iferror(if (G2="","", G2-C2))

Wrapped, as always, in an "iferror".

So that's a pretty messy, incoherent overview of what's actually in the spreadsheet cells, and how the arithmetic is done.

In the next post, I'll tell you what you need to do to adapt these spreadsheets for your race.

Progress on antacids

The Finnmarksløpet, which has had some issues around dog care (see Dave King's article in the May/June 2012 issue of Mushing magazine), just released the happy news that they're now allowing the use of antacids (famotidine and omeprazole) on dogs running the race.  They've been using the IFSS's banned substances list, which excluded those life-saving treatments (as well as others).  This became an issue when the IFSS tried to have a distance world championship race in Alaska next month and was turned away because of that list.
This comes in conjunction with the IFSS's revision of their banned substances list, released a few weeks ago.  Much of this is driven by the IFSS's desire to see dog mushing recognized as an Olympic sport, which we continue to think is asinine, but as long as they're doing no harm (and their previous banned substances list did do harm), it's their business.

Thursday, February 21, 2013

New tracker analytics for Iditarod

I guess it's time to start ramping up for Iditarod coverage.  Recent interactions suggest that we won't see much improved video, but IonEarth (the GPS tracker service used by the Iditarod) seems to be feeling the heat from the guys at Trackleaders, who are committed to trying to find ways to reveal the stories in the tracker data.  For the most part IonEarth has been content to show you where everybody is, but that's no longer enough.

IonEarth is tracking the Iron Dog, a snow machine race from Big Lake to Nome to Fairbanks. (Incidentally, the Iron Dog runs over part of the Iditarod race trail and it may be worth taking a look at how that's going to get a sense of trail conditions).  Thanks to the sponsorship of the Alaska National Guard (which also sets up and staffs the Yukon Quest Two Rivers checkpoint - a huge w00t! to them), the Iron Dog trackers are free, so here's your chance to take it out for a test drive and see if you want to subscribe to the Iditarod's.

The Iron Dog tracker is here.  If you subscribed to the Iditarod tracker last year this should look very familiar, at least at first:


You'll note the addition of some new tabs across the top of the map:

  • Web view (that is to say, this view)
  • Analytics view
  • Leader view
  • Mobile view
  • Help
I've said before that IonEarth's stuff is a good example of why we don't let hardware people write software, and I'll say it again.  To get to the mobile view you need to load this map, which is a little slow to load on my machine with a quad-core processor, 12GB of memory, and a 3Mb down network connection.  To get to the mobile tracker you need to go to this page (as nerds say: FAIL).  On my phone, it took three minutes, and once it's loaded you get this:

Whut

Note that this picture was taken from a distance of several inches.  My middle-aged eyes looked at the screen and my middle-aged brain said "No.  Absolutely not."  I sincerely hope that Iditarod has the good judgment to compensate for IonEarth's mistake by providing a link directly from the Iditarod site to the IonEarth mobile tracker site (fortunately the IonEarth interface is sufficiently RESTful that you can).

Okay, back to the new analytics page.  When you click the Analytics tab it will take you to a new page with a plot for one of the Iron Dog racers.  You can add more teams - as many as two in total!  Very exciting.  But you can choose which teams to show (I continue to think that Trackleaders needs to allow us to add and remove mushers from the race flow plot).

The plot for each team looks like this:


The 'x' axis on the plot (the horizontal one) shows time and date (and note the little thumbs above the horizontal scrollbar that allow you to zoom in on particular times).  There are two values overlaid on the 'y' axis (the vertical one): altitude and speed.  The top line (the blue one) plots altitude against time.  (I'm unclear why last night was so short - I've got to assume that's a bug).

There are three speed values.  The green line shows "instantaneous" speed - the IonEarth tracker boxes contain an accelerometer in addition to the GPS and battery.  The purple line shows average speed and the lighter blue line shows moving average speed.  To be honest I don't know what they're trying to do with their averages (because they don't tell us - yay, IonEarth).  I have to assume that both are moving averages and that the purple one is taken over fewer terms, and possibly the blue line is actually a cumulative moving average (moving averages are a nice way of smoothing data to help get a better understanding of longer-term trends).  This may change with more experience playing around with their stuff, but at this point I do not find the moving averages particularly useful and I find the "moving average" curve to be particularly not useful.  But, as I said, this may change.

Here's where I think the speed/time plot will be very interesting: one of the things we try to understand during a dogsled race is run/rest schedules, as they can be one of the keys to performance.  Over the years teams have transitioned from a fairly strict 4-on/4-off schedule to longer run/longer rest or longer run/same rest (say, 6-on/4-off).  This will provide a nice graphic illustration of what teams are doing, just as Trackleaders' speed/time plots on individual team pages do.  But:  IonEarth is still not showing the dynamics of the race, and how teams are moving against each other.  They're telling more of a story than they used to, to be sure, but they're still not telling much of one, which is too bad.

In conversations with fans it's become clear that many people just want to know where teams are at a specific point in time.  But, a growing number of fans are increasingly sophisticated in their understanding of distance mushing events, and I think there's pretty clearly a market for tools which help reveal more of what's going on on the trail.  IonEarth has amazing hardware; I hope that one day they either get more software people (and analytical types) on-board, or that they open up their interfaces so that developers who understand this stuff a little better can start to produce improved, more interesting tools.

What about you: when you look at GPS trackers while you're following a race, what do you want to know?

Tuesday, February 19, 2013

Using my spreadsheets for race data - first post

Somebody-or-other said "The two qualities of a great programmer are laziness and hubris," but I'm too lazy to look it up (and am sufficiently hubristic to suggest that I'm a great programmer by claiming to be lazy).  When you create something it's a really good idea to make it easy to adapt for other uses.  So I put together these spreadsheets for my own use during the Yukon Quest and I think they're now mature enough that someone else may find them interesting.  I've been trying to figure out how to write up a description in a way that's got minimal babbling and maximal useful information and haven't been very successful, so I thought it might be worth stepping through each of the fields.

But first, design goals.  In the best of all possible worlds,

  1. you should only have to enter a piece of data once, and
  2. you should never, ever have to do arithmetic
And that's what I did with the spreadsheets.

I started with something in mind along the lines of a relational database implemented with a spreadsheet.  That is to say, I'd have different sheets/pages within the spreadsheet for each checkpoint, plus the start and finish.  On each sheet, all I'd enter would be the musher's name, the time in, and the time out, and the other data would all be calculated from those.

I chose to use bib numbers to tie data together.  For example, if I wanted to calculate runtime between two checkpoints for Brian Wilmshurst (bib number 1), I'd search for the departure time for bib #1 on the sheet for the previous checkpoint and subtract that from his arrival time at the current checkpoint.  The reason that I chose bib number instead of name is that there can be a lot of variation in the spelling of some names, especially when volunteers are tired, and if there was an error it would be easier to find it if we used bib numbers.  During the Copper Basin this year one musher's name went back and forth between "Matt" and "Mark" across multiple checkpoints, which convinced me to use something more reliable - bib numbers!

A couple of words on errors: it's a matter of professionalism for code to recognize and, where possible, correct errors.  A programmer who isn't diligent about error (or exception) handling is a slob.  What I found with the Google Drive spreadsheet (and spreadsheets in general, I guess) is that it's pretty easy to catch certain kinds of errors so that you wouldn't end up with garbage in cells, but catching input errors is pretty hard. Because of this, over the long term I think it's probably a good idea to write a race management package that's a little more specialized and that can be more robust against errors that are otherwise difficult to process in spreadsheets.

I'm going to assume that you've got some basic experience with spreadsheets, that you know that columns run up-and-down and rows run across, and that columns are labeled with letters, rows are labeled with numbers, and a "cell" is specified by its row and column.

So, I put mushers in rows and data types in columns.  Here's the column header for the race start, along with the first few rows:



As you can see I've got a column for the "position in," which is actually the bib number and should be changed to reflect that, the musher name, and the date and time out.  That's it!

But you'll probably have noticed that I've got the date and time out in the same field (cell).  It's possible to put the date in one cell and the time in another cell, but combining them makes it easier to do arithmetic on them (particularly across day values - for example, someone leaves at 11pm Tuesday and arrives at 3am Wednesday).
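To show why the combined field helps, here's a small Python sketch of the Tuesday-night example.  The specific dates are my own picks for illustration.

```python
from datetime import datetime

# Combined date-and-time values: subtraction just works across midnight.
left = datetime(2013, 2, 5, 23, 0)     # Tuesday, 11pm
arrived = datetime(2013, 2, 6, 3, 0)   # Wednesday, 3am
run_hours = (arrived - left).total_seconds() / 3600

# Time-of-day only: the same subtraction goes negative.
naive_hours = 3 - 23

print(run_hours)    # 4.0
print(naive_hours)  # -20
```

With the date split off into its own cell you'd have to add a special case for every run that crosses midnight, which is exactly the kind of arithmetic I'm trying to avoid.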

Here's what the column headers and first few rows for each checkpoint sheet look like:


For each team we've got a cell for the bib number, name, date and time in, runtime from previous checkpoint, speed from previous checkpoint, speed rank, date and time out, and rest time at that checkpoint.  The only cells in which I actually entered data were the name, date and time in, and date and time out.  All of the other fields (including bib number - more on that later) were calculated.
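As a rough picture of what those calculated columns contain, here's a Python sketch.  It isn't the spreadsheet's actual formulas: the leg distance, the two entries, and the field names are all assumptions made up for the example.

```python
from datetime import datetime

LEG_MILES = 50.0  # assumed distance from the previous checkpoint

# Hypothetical entries: bib -> (left previous checkpoint, time in, time out)
entries = {
    1: (datetime(2013, 1, 12, 10, 0), datetime(2013, 1, 12, 16, 0), datetime(2013, 1, 12, 20, 0)),
    2: (datetime(2013, 1, 12, 10, 2), datetime(2013, 1, 12, 15, 2), datetime(2013, 1, 12, 21, 2)),
}

def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600

rows = {}
for bib, (prev_out, time_in, time_out) in entries.items():
    run = hours(time_in - prev_out)
    rows[bib] = {
        "runtime": run,                     # hours on the trail for this leg
        "speed": LEG_MILES / run,           # miles per hour
        "rest": hours(time_out - time_in),  # hours spent in the checkpoint
    }

# Speed rank: 1 is the fastest team over this leg.
fastest_first = sorted(rows, key=lambda b: rows[b]["speed"], reverse=True)
for rank, bib in enumerate(fastest_first, start=1):
    rows[bib]["speed_rank"] = rank

print(rows[2])
```

Only the three timestamps per team are typed in; everything else falls out of them.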

Next up, how the arithmetic works.

The right thing to say is "Thank you"

Rob Loveman surely deserves some sort of recognition for outstanding achievement in grudge-holding.  His most recent blog post on the subject of race volunteers is truly unfortunate, asserting that if a volunteer is unwilling or unable to stay out on the trail for as long as it takes the last musher to complete the race, they aren't volunteers, they're spectators.

So, a few points:

  1. It is often extremely difficult for people to deal with uncertainty in their schedules.  They have families, jobs, and commitments outside the race.  Even small races run on volunteers and part of treating them well includes being able to tell them when and where they're needed
  2. Rob likes to remind us that he has a Ph.D. in physics.  Consequently I'm a little unclear on why he's not addressing extreme cases, limits, constraints, etc.  Presumably most would agree that it's not reasonable to keep a checkpoint open for six months.  If there are cases where it's not reasonable to keep a checkpoint open, there must be some sort of border between reasonable and unreasonable.  Is three weeks unreasonable?  Two weeks?  Eight weeks?  He doesn't say, so we're left with not being sure whether or not he does think there are limits
  3. Extrapolating from a local 300-mile race on the road system to 1000-mile races in the bush just doesn't work, not only because the former poses fewer logistical challenges over time and distance, but also because a really slow team in a mid-distance race may take an extra day.  A really slow team in a 1000-mile race may take an extra week.  These quantities are not comparable in terms of impacts
  4. Keeping checkpoints open costs money, particularly remote checkpoints with access only by air (or snow machine).  Keeping generators running, telecommunications, food, cooking fuel, etc.  Not sure where the money for that is going to come from.  
Volunteers make it possible to hold our races.  If someone can be there two weeks but not three, we not only should be grateful but most of us are grateful.  If someone can be there two weeks but not three, they're still an enormous help to the race organization.  If someone's willing to sit out at Eagle River for two weeks but not three, they're still a hero.  We need to say "Thank you" for whatever someone can give of themselves, and we need to respect their constraints and treat them well by being clear about what we're asking of them.

[edited to add: A highly competitive musher who'd prefer to stay anonymous commented "if a musher is not training hard enough and prepared to run with the race pack then they are a camper, not a racer. (directed specifically at Rob)." And we see a fair amount of that in the Iditarod, which always attracts more than its share of the top dog teams in distance racing, but which always also attracts bucket-listers, people who aren't necessarily mushers but want an adventure. Asking volunteers and the race organization to keep checkpoints open for campers seems wrong.]

Thursday, February 14, 2013

Yukon Quest


One of the things I love about living in interior Alaska is waking up every day knowing that something amazing can happen before day's end.  Seeing arctic grayling spawning in the Chena, a couple of moose cows posturing at each other and negotiating their own space across a pond, firewood shattering at -50F, northern lights filling the sky, even small moments when you suddenly notice that you don't hear anything but the wind through the trees.

Living in Two Rivers, the Yukon Quest runs through my backyard and I've seen nearly all of this year's teams come through.  Each one of those was one of those amazing moments, when you know you're seeing something rare and beautiful and extraordinary.  It's not a small thing that these dog teams were coming in from 1000 miles across the Yukon and Alaska looking strong and happy, but it certainly goes well beyond that, to the team and the experience they've just had.


So, many thanks to the mushers, their incredible dogs, the Quest professional and volunteer staffs, the veterinarians, the cooks, the handlers, and the entire organization for giving us another extraordinary event and not a few transcendent moments.



Saturday, February 9, 2013

A Monday finish?

I'm trying to get a handle on when to expect the leaders to show up in the Fairbanks area.  Taking a look at past results, it seems like it's generally been taking the leaders roughly two days to get from the Circle City checkpoint to the finish in Fairbanks.  That could put the winner into Fairbanks mid-day on Monday.  We had a local fun run in Two Rivers today over the same trail that's going to bring the Quest mushers down alongside the Chena, and the trail is in very good shape.  The winning teams in the Hamburger run had surprisingly fast times.  Right now it looks like the trail should hold up and the currently too-warm temperatures should drop, which is good news for the Quest teams.  In the meantime the Quest lead is being very, very closely contested, and that is likely to keep the overall speed up.

So, if the Quest winner arrives mid-day on Monday, will this be a record?  Possibly, at least in the Whitehorse to Fairbanks direction, which is widely acknowledged as being more difficult than the Fairbanks to Whitehorse direction.  The previous record for this direction belongs to Lance Mackey, who won the 2007 Quest in 10 days, 2 hours, and 27 minutes.  Unless something really surprising happens this is almost certainly going to fall.  The overall record belongs to Hans Gatt, who finished in 2010 in 9 days and 26 minutes.  I think this could possibly fall but right now it seems kind of unlikely.

As of this writing it's very hard to say who's going to win.  Hugh is in front but Allen is eating away at his lead, and in the meantime we're seeing tweets from the Fairbanks Daily News-Miner that Brent is planning on overtaking the leaders on the uphill side of Eagle Summit.  As the tweet says, "Yowza!"  But if anybody can do that it's certainly the ridiculously athletic Brent Sass.

Friday, February 8, 2013

That's kind of interesting




So here's a story from the race flow chart: Scott Smith (left-most purple line) has been cutting rest to move ahead.  He got within hailing distance of Brent Sass (left-most green line), who peeled out of Trout Creek and is putting some distance between him and Scott, while appearing to be gaining on Jake (red line).  I'm really curious to see what happens next!

Thursday, February 7, 2013

A bit more on the race flow chart

If you look at the Trackleaders map for the Quest, it's pretty easy to see how far apart two teams are, at least in theory.  You can look at the map, click on the icon for the team, and a window pops up that tells you, among other things, the trail miles for that team.  For example,



So, you can get the trail miles for each team, take the difference, and know how far they are apart.

But it's got its limitations.  One is that it's pretty easy to overlook the timestamp on the "breadcrumb" (GPS reading), and if it's more than 20 minutes old or so it can be misleading (especially if you're comparing two teams with timestamps of much different ages).  The other issue is related to the first, and that's that you don't get much sense of the race flow, or the dynamics of teams moving down the trail and their relationship to each other.  This is one of the reasons that I like the race flow chart - it tells the story of the race rather than just showing you static points on a map.  So right now (8:25 AKST, Thursday evening), we can see where Hugh and Allen are but the race flow chart can tell us more about how their relationship is changing as they travel.  And you can nail it down pretty easily.  Here's an example:



In this screen grab, I've drawn black horizontal lines for Hugh and Allen at two different times, the first at race hour 126 (the x-axis on the chart is hours since the race start, and y-axis on the chart is trail mile) and the second at hour 129.  A horizontal line across a given time will intersect the y-axis at a trail mile.  So, at hour 126 it looks like Allen is roughly at mile 511, and Hugh is at mile 518.5 (that level of granularity is more obvious when you scroll in more closely).  So, there's about 7.5 miles between them.  If you look at where they are at hour 129, you can eyeball it and see that the distance between them has closed a bit, but you can be a bit more exact by seeing what mile they're at.  It looks like Allen is at about mile 534 and Hugh is at about mile 540.  In other words, the distance between them is now roughly six miles, and the gap has closed by about 1.5 miles over three hours, or about .5 mph.
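If you'd rather let a computer do the subtraction, the arithmetic above looks like this in Python.  The mile readings are the ones I eyeballed off the chart, and the function name is just something I made up.

```python
def closing_rate(hour_a, leader_a, chaser_a, hour_b, leader_b, chaser_b):
    """Miles per hour by which the gap shrank between two readings."""
    gap_a = leader_a - chaser_a
    gap_b = leader_b - chaser_b
    return (gap_a - gap_b) / (hour_b - hour_a)

# Hugh (leader) and Allen (chaser) at race hours 126 and 129
rate = closing_rate(126, 518.5, 511, 129, 540, 534)
print(rate)  # 0.5

# Hours until the lines would cross IF both teams held their pace, which
# they won't once somebody stops to rest.
print((540 - 534) / rate)  # 12.0
```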

And you can play with it further - project the lines to see where they'd cross, etc.  But mostly it's a terrific tool for illustrating that the teams aren't just points on a map - they're moving through time and space, and where they are in relationship to each other is constantly changing.

Wednesday, February 6, 2013

A few notes

While Yukon Quest teams continue to pull into Dawson I thought I'd post a few quick notes:


  • Trackleaders also has a topo option for a map layer.  It's my preferred choice for looking at maps because of the detail it provides (including place names!), and because it gives a much clearer sense of the trail's topography than the terrain map display does.  There's also a satellite option and that can be interesting, but topo maps are the ticket.  Here's an example.  Note that the place to choose your map layer is a drop-down menu in the upper right-hand corner.



  • Among the tools the tracking software gives us to be able to get a better visual handle on what's happening on the trail are speed/distance and speed/time plots on the individual mushers' pages.  They're down towards the bottom.  I haven't found the speed/distance plot to be that interesting for dogsled races (although I'll bet it's really interesting for bicycle racing, etc.), but I like the speed/time plot a lot.  It can give you a very graphic image of how often and where teams are resting, and by comparing them you can get a quick grasp of who's resting more and who's resting less.  Here's Dan Kaduce's, shortly after he arrived in Dawson, and Hugh Neff's from about the same time.  Interesting, right?
Dan Kaduce


Hugh Neff

  • I've gotten a couple of questions about the tracker leaderboard.  The first thing to understand is that the times being shown aren't time of day (clock time), they're elapsed time since the start of the race, Saturday at 11.  So, for example, the tracker leaderboard shows Allen getting into Scroggie at 2:11:59.  Doing the arithmetic, that's Monday (two days after the start) at 22 hours (11:00am + 11 hours) and 59 minutes, or Monday at 10:59pm.  If you take a look at the Quest official times (and they are absolutely authoritative), they give his time into Scroggie as 10:47pm.  The tracking software figures out whether or not somebody's arrived by measuring their proximity to a particular geographic coordinate.  It's a pretty safe bet that the latitude/longitude for the real checkpoint location is going to be different from the ones the tracker has by some tens of feet.  You may have noticed that the Trackleaders leaderboard doesn't show anyone as having arrived in Dawson, and projects that Hugh will be arriving there Wednesday at 3:23pm, or just about 10 minutes from now as I'm writing this (he got in over 24 hours ago).  I expect to see that keep being bumped back until Hugh's sled finally gets close enough to the location they've identified as the checkpoint that it will register as having arrived.  Seriously, the Trackleaders leaderboard: no, don't use it, and ignore their projections.
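For anyone who wants to convert leaderboard times without doing the arithmetic by hand, here's a minimal Python sketch.  It assumes the leaderboard format is days:hours:minutes, which is how I read it above, and it ignores time zones entirely.

```python
from datetime import datetime, timedelta

RACE_START = datetime(2013, 2, 2, 11, 0)  # Saturday at 11am

def clock_time(elapsed):
    """Turn a 'days:hours:minutes' elapsed time into a calendar date and time."""
    days, hrs, mins = (int(part) for part in elapsed.split(":"))
    return RACE_START + timedelta(days=days, hours=hrs, minutes=mins)

print(clock_time("2:11:59").strftime("%A %I:%M%p"))  # Monday 10:59PM
```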


Monday, February 4, 2013

Tracker names

For people who enjoy this sort of thing, this is the sort of thing they'll enjoy.  Here's the complete list of tracker names and who's carrying them:



Musher        Tracker
West          Aster
Moore         Bee
Sass          Buck
Wilmshurst    Chip
Traverse      Cook
Strathe       Cosmo
Studer        Crux
Kaduce        Echo
Lee           Envy
Dalton        Fern
Tremblay      Gas
Bergen        Gump
Abrahamson    Hawk
Hopkins       Hunt
Neff          Iris
Berkowitz     Ivy
Griffin       Jill
Mackey        Kid
Ingebretsen   Lily
Failor        Lotus
Pedersen      Moby
Casavant      Neon
MacKenzie     Okra
Cooke         Olive
Smith         Opus
Rogan         Paco

Allen Moore is fast

Day 3 of the Yukon Quest, and Allen Moore is consistently running the fastest times between checkpoints.  He has, in fact, been the fastest into each checkpoint so far (spreadsheet here).  You can also see it in the race flow chart if you look carefully.  So how's Hugh keeping up?  Well, for one thing he's spending far less time in checkpoints.  He only stopped briefly in Braeburn and took his mandatory 4 in Carmacks, while Allen took his mandatory 4 in Braeburn but also spent nearly 5 hours in Carmacks.  Hugh spent only 21 minutes in Pelly and Allen appears to have spent about 7 hours there (the website leaderboard isn't updated, but we can see his movement on the tracker).

So the obvious question is: is Hugh spending a lot of time resting on the trail?  It would account for his slower runtimes between checkpoints.  Unfortunately he's having a lot of tracker outages so it's hard to say for sure.  When his tracker's working it shows him moving consistently, but there are enough outages that we may well be missing some stops.  On the other hand his longest outage is only three hours, so even if he's camping during the outages he's not camping much.

Speaking of trackers, I just learned yesterday that Trackleaders names their devices.  For example, Rob Cooke's tracker is named "Olive," and apparently she's a good, reliable performer (and we've seen relatively little outage from his tracker).  Trackleaders has a good working relationship with the SPOT manufacturers and was able to get more reliable, upgraded GPS chips installed in the existing SPOT 2 chassis, but there still have been a lot of outages so far during this race.  Trackleaders and the race management are working hard to figure out what's causing them.  There are several ideas about it but so far the best guess seems to be the metal mounting plates that were attached to sleds at the vet checks.

Saturday, February 2, 2013

A little about the "Race Flow Chart"

The Yukon Quest started about six hours ago, and in a 1000-mile race that means that it's too early to know much about much.  But, I'm seeing some discussion on Facebook that makes me think that it might be helpful to explain a little about Trackleaders' "Race Flow" plot.

One of the things I really like a lot about Trackleaders is that they want to do more than just show you static positions on a map.  Those are definitely useful but they don't capture the general sense of what's going on.  I love using numbers to explain things but I am not a mathematician.  Numbers need to describe something going on in the world - they need to have descriptive power.  And I really think that the race flow plot does.  It's a really, really simple idea but it is incredibly elegant.  It shows the dynamics of the race.

Here's the basic idea:  They have a two-dimensional graph.  The x-axis (the horizontal one) shows the race time (the number of hours since the race started).  The y-axis (the vertical one) shows the number of miles along the trail someone has traveled.  So, when they get a new tracker update for a given musher, they put a dot on the plot showing how many miles the team was along the trail at that point in time.  When you connect the dots with a line, that line has slope - the steeper the slope, the faster the team was traveling, and the more level the slope, the slower the team was traveling.  If it's completely level, the team has stopped.
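To make the "slope is speed" point concrete, here's a tiny Python sketch.  The breadcrumbs are invented for the example, not real tracker data.

```python
# Hypothetical breadcrumbs: (race hour, trail mile)
breadcrumbs = [(0.0, 0.0), (1.0, 9.0), (2.0, 18.5), (3.0, 18.5), (4.0, 27.0)]

def segment_speeds(points):
    """Slope of each line segment, in miles per hour."""
    speeds = []
    for (h0, m0), (h1, m1) in zip(points, points[1:]):
        speeds.append((m1 - m0) / (h1 - h0))
    return speeds

print(segment_speeds(breadcrumbs))  # [9.0, 9.5, 0.0, 8.5]
```

The 0.0 is the flat stretch on the plot: an hour went by and the trail mile didn't change, so the team was stopped.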

Here's a screen capture about 6.5 hours into the race:




You'll see that the lines are basically parallel, which shouldn't be surprising, especially not this early in the race.  You can take a look at the plot and by looking where the lines are, you can see things like that at about hour 5.5, Hugh was around mile 52 and Lance was around mile 49.  Similarly, you can see that Hugh passed mile 40 around 4 hours and 10 minutes into the race and Lance passed it around 4 hours and 35 minutes into the race.  You can get a sense of how far apart they are by measuring the distance between them on the y-axis (miles) at a given time.

So that's all great, but what really interests me are the stories being told here.  You can see that someone is catching up with someone else when their lines are converging (for example, Normand and Allen at about hour 6.5).  You can see when they're falling behind as their lines get further apart (for example, Randy MacKenzie falling behind Markus Ingebretsen at about 6 hours and 15 minutes).  You can see passes when their lines cross (Kelley Griffin passing Randy at about 4:15 into the race).  You can see who's traveling together (their lines are basically on top of each other, like Susie Rogan and Allen from about 5:45 into the race), and so on.  A very nice thing is that you can make some projections.  In this screen grab you can see that Hugh's tracker has been updating but Lance's hasn't (Lance's most recent "breadcrumb" was received shortly before 6 hours into the race, while Hugh's updated about 6 hours and 40 minutes into the race).  But this plot allows us to get a handle on where they're likely to be in relation to each other, by extending the lines for each musher on the same slope the line's currently on.  From here, it looks like Lance is gaining on Hugh, but very, very slowly.  So, even without recent tracker updates you can get a rough handle on where they might be in relation to each other.
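Extending a line on its current slope is simple enough to sketch in Python.  The readings below are hypothetical, and the whole thing assumes the team keeps traveling at the pace of its last two breadcrumbs, which is a big assumption.

```python
def project_mile(prev_point, last_point, at_hour):
    """Extend the line through the last two breadcrumbs out to at_hour."""
    (h0, m0), (h1, m1) = prev_point, last_point
    slope = (m1 - m0) / (h1 - h0)  # miles per hour
    return m1 + slope * (at_hour - h1)

# Hypothetical (race hour, trail mile) readings for a tracker that went quiet
projected = project_mile((5.0, 40.0), (6.0, 50.0), 6.5)
print(projected)  # 55.0
```

Do that for the team with the stale tracker, compare it to the latest real reading for the other team, and you've got a rough estimate of the current gap.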

It would be fantastic if Trackleaders would extend the plot to allow us to choose which mushers to show, but as it stands it's still my favorite tool for getting a quick handle on how the race is going.

Yukon Quest has started

Hands-down my favorite dogsled race, and easily my favorite event in Alaska.  There are a lot of people I'd be very happy to see win this year, and some others who are likely to run further back but who I wish a really fantastic race.

Anyway, I'd have thought that only two hours into the race there wouldn't be anything to say, and as it turns out there isn't much, but here are a few observations:

  • Trackleaders' "race flow chart" (one of the best tools in Trackleaders' analytic arsenal) shows Hugh traveling consistently faster than the others.  That's worked well for him in the past but more often has not worked well at all.  It's safe to predict that he'd go out fast and less safe to predict whether or not it will work for him.  Here's a screen grab of the race flow chart at about 1:15 Yukon time:
  • There's been talk of the Quest using spreadsheets this year to maintain their race data.  I hope that's the case and I especially hope that they've got them set up to do the speed/time/etc. calculations for them to save them a lot of time and effort
  • The Fairbanks Daily News-Miner printed an article on "What You Need to Know about the 2013 Yukon Quest."  In it, the reporter says "For fans tracking teams on the Internet, that means the standings after the Carmacks checkpoint will provide an accurate picture of who is leading the race."  A couple of points about that: 
    • Some people who are serious contenders are going to take it easy at least until Dawson City and may be hanging in the middle of the pack, or towards the back of the front of the pack depending on how things play out.  If Hugh's dogs aren't looking that great there may be a temptation to let him flame out.  At any rate I'm not sure that the standings at Carmacks have much predictive power.
    • We know what the start differentials are, so you don't need to have them actually taken in order to be able to figure out their impact on standings.
Best wishes to the mushers, the dogs, the handlers, the race officials, vets, photographers, press, Quest professional staff, trailbreakers, volunteers - everybody doing the hard work to put this epic northern event together.  

Friday, February 1, 2013

Following the Quest on Twitter

I've gotten a couple of questions about how to follow the Quest on Twitter, and in particular, who to follow.  There are a few Twitter users worth following, to be sure, but there's also a broader strategy, using search.

Starting with the "Who to follow" question, there are a few generating original content and a few who just retweet everything coming from the few generating original content.  Even for the few who are creating original content it will tend to be redundant with what shows up on their Facebook page (social media, sigh).  That said, Twitter users to follow include @theyukonquest, @EmilySchwing, @Whitehorse_Star, and @FDNMQuest.  The first in that list is the official race Twitter account, and the others are from the news media.  Let me know if you hear of handlers or race fans tweeting from the trail or checkpoints.

That said, you can scrape up a lot of Twitter activity about the Quest without following anybody, by using Twitter's search function, either on their search page or from a decent Twitter client.  Note that a lot of Twitter clients don't support search, and if yours doesn't you may want to use the Twitter search page or find another client.

If you go to Twitter's search page, you'll see this:


As simple as it gets - a text entry box with a search button.  If you enter just Yukon Quest you'll get a list of recent tweets (newest first) containing the string "Yukon Quest" (note that while Twitter cares whether the search operators are upper-case, lower-case, or mixed-case, the search itself is case-independent and you'll be fine with yukon quest, Yukon Quest, or yUkOn QuEsT):


The wider right-hand column shows you the tweets, while on the left it shows you who's tweeting about the Yukon Quest, as well as photos and videos related to your search.  If tweets arrive while you've got the search running you'll get a notification at the top of the search results panel, like this:



Click it, and you'll be shown the new tweet or tweets.

However, if you just search on "Yukon Quest" you'll probably be missing some interesting tweets.  With only 140 characters in a tweet, tweeters will leave stuff out.  You could easily miss something along the lines of "Someone just arrived in Braeburn with a broken runner."  One thing that can help is to do another search, one on the #yukonquest hashtag.  

You can combine the two searches into one, getting the search results in one place, by using the OR logical operator.  Note that "OR" needs to be entirely in upper case for Twitter searches, to distinguish it from a search on the term "or".  So, change your search string to:

yukon quest OR #yukonquest

and you'll find that you're getting more hits.
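If you end up juggling several search terms, the string is easy to build programmatically.  This is just a sketch of the string handling (it doesn't talk to Twitter at all), and the build_query name is my own invention.

```python
from urllib.parse import quote

def build_query(terms):
    """Join search terms with Twitter's upper-case OR operator."""
    return " OR ".join(terms)

query = build_query(["yukon quest", "#yukonquest"])
print(query)         # yukon quest OR #yukonquest

# Percent-encoded form, in case the string ever needs to go into a URL.
print(quote(query))  # yukon%20quest%20OR%20%23yukonquest
```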

Personally, I prefer to use a good Twitter client that allows me to follow more than one search at a time, some about mushing and some about technology.  I like Tweetdeck, since it runs on every platform I'm interested in, supports a variety of social networking sites, and has a powerful set of features.



You can easily add a separate search column for the Yukon Quest:  


Click on the "+" sign in the upper left-hand corner and a search box will open up:


Enter your search terms (Yukon Quest OR #yukonquest), and Tweetdeck will create a new search column for you, checking for tweets related to the Quest and showing you the new ones as they are posted.

[Edited to add that it looks like some people are using the hashtag #yq2013, so you may want a search string that contains it: yukon quest OR #yukonquest OR #yq2013]