Saturday, January 11, 2014

The relationship between speed and bib number

The Copper Basin 300 is underway, with nearly all teams having arrived at the first checkpoint.  It looks like the race organization is doing an excellent job, and while most people don't care about this sort of thing, I'm grateful that their race data are in a form that needs almost no massaging to be useful for analysis.

Anyway, people on the CB300 Facebook page have expressed surprise that Nic Petit is as fast as he is, and the general sense is that it has something to do with trail conditions.  That trails deteriorate with traffic is well known and just a given, but I was surprised that they were surprised that someone whose nickname is "Quick Nic," who was Iditarod rookie of the year in 2011 and who took 6th in last year's Iditarod, would be fast.  So, I decided to take a closer look at whether or not the numbers support their argument.

I use R for analysis.  R is a free, open-source, and widely used statistical environment, and a follow-on to Bell Labs's S language (I know, right?).  I've been running it inside the RStudio development environment, which packages a bunch of tools together in an easy-to-use manner that really boosts productivity.

So, the first thing I did was plot speed against bib number:

Eyeballing it, it certainly looked like there was a negative correlation: that is to say, low bib numbers tended to have higher speeds than high bib numbers.  Did the actual numbers support it?  As it turns out they did, with a Pearson correlation coefficient (r) of -.43 and a p-value of .0037.  That's a moderate correlation, and one that's unlikely to be the result of random fluctuation.
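The analysis here was done in R, but the calculation is simple enough to sketch in plain Python for anyone who wants to poke at it.  This is only a rough illustration: the bib numbers and speeds below are made up (they are NOT the Copper Basin 300 results), the function names are my own, and the p-value comes from a permutation test rather than the usual t-based test, so it won't exactly match what a stats package reports.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=10000, seed=0):
    """Two-sided p-value: the share of random shuffles of ys whose |r|
    is at least as large as the |r| observed in the real pairing."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


# Made-up numbers for illustration only, not actual race data.
bibs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
speeds_mph = [9.1, 8.7, 10.4, 8.9, 8.2, 8.5, 7.9, 8.3, 7.6, 7.8]

r = pearson_r(bibs, speeds_mph)
p = permutation_p_value(bibs, speeds_mph)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

A negative r means speed tends to drop as bib number goes up, and a small p means a pairing that lopsided rarely shows up when the speeds are shuffled at random.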

Then I decided to re-run it without Nic, since it's just the luck of the draw that "Quick Nic" drew bib #3, and that one data point could have skewed the results quite a bit.  Without Nic we ended up with a correlation coefficient of -.38 and a p-value of .0119, which is not quite as strong but is still present.
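Dropping one team and re-running the numbers is a simple sensitivity check, and it generalizes: you can drop each team in turn and see how much the correlation moves.  Here's a minimal Python sketch of that idea, again with made-up data and hypothetical function names rather than the real race results.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_without(xs, ys, drop_index):
    """Pearson r after removing the single observation at drop_index."""
    kept_x = [x for i, x in enumerate(xs) if i != drop_index]
    kept_y = [y for i, y in enumerate(ys) if i != drop_index]
    return pearson_r(kept_x, kept_y)


# Made-up numbers for illustration only, not actual race data.
bibs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
speeds_mph = [9.1, 8.7, 10.4, 8.9, 8.2, 8.5, 7.9, 8.3, 7.6, 7.8]

full_r = pearson_r(bibs, speeds_mph)
print(f"all teams: r = {full_r:.2f}")

# Drop each team in turn and report how far r moves from the full-field value.
for i, bib in enumerate(bibs):
    r_dropped = r_without(bibs, speeds_mph, i)
    print(f"without bib {bib}: r = {r_dropped:.2f} (change {r_dropped - full_r:+.2f})")
```

Which way the correlation moves when a team is dropped depends entirely on the data, so a big swing from one team is the thing to watch for, not the direction.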

I have a pile of deadlines hanging over me and some undone work, but when I get a moment I'd like to look at some other races (for example, last weekend's Knik 200, where fast mushers were more evenly distributed through the bib number space) and also take a look at whether or not the correlation we're seeing here deteriorates over the course of the race.  I expect it to end up close to 0 by the finish, but my intuitions can be very, very wrong.  In the meantime, this is kind of interesting.


  1. Interesting investigation, Melinda. Looking forward to see what the analysis of other races reveals.