It's looking like this year's Yukon Quest has a pretty good field of entries, and with fall training well underway in interior Alaska we're all starting to speculate about how the race is going to go this year. It's only natural to look at past races, so I've started poking at the 2013 data, the last year the race was run in the Whitehorse-to-Fairbanks direction. I'm also interested in having a baseline set of data to which this year's race can be compared, once it's underway.
So, I've taken my own spreadsheets from 2013 and used them as a basis for running some numbers. In particular, I've created a spreadsheet containing the 2013 checkpoint data and extracted the runtimes to get some basic descriptive statistics: fastest, slowest, mean, median, 1st quartile, and 3rd quartile for each race segment (between checkpoints). For example, between Braeburn and Carmacks, I've got a table that looks like this:
You'll note that I've also calculated the ratio between the fastest and slowest runtimes; there may be something interesting there to look at later. I've also plotted all runtimes as a histogram:
Again, this is largely to create a baseline dataset for comparison with this year.
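For anyone who'd rather build the same kind of baseline outside a spreadsheet, here's a minimal Python sketch of the per-segment summary. The function name and the sample runtimes are made up for illustration (they are not 2013 race data), and I'm assuming the ratio is slowest divided by fastest and that quartiles are computed the way a spreadsheet's inclusive QUARTILE function does it.

```python
import statistics


def segment_summary(runtimes_hours):
    """Summarize one race segment's runtimes (in hours).

    Hypothetical helper: returns the statistics described in the post.
    """
    # method="inclusive" treats the data as the whole population, which
    # matches the usual spreadsheet QUARTILE behavior (an assumption here).
    q1, median, q3 = statistics.quantiles(runtimes_hours, n=4, method="inclusive")
    fastest = min(runtimes_hours)
    slowest = max(runtimes_hours)
    return {
        "fastest": fastest,
        "slowest": slowest,
        "mean": statistics.mean(runtimes_hours),
        "median": median,
        "q1": q1,
        "q3": q3,
        # Assumed direction of the ratio: slowest runtime over fastest.
        "slow_fast_ratio": slowest / fastest,
    }


# Made-up runtimes for a single segment, NOT real race data.
example_runtimes = [8.0, 9.0, 10.0, 11.0, 12.0]
summary = segment_summary(example_runtimes)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Running it once per segment, with that segment's runtimes in a list, gives one row of the summary table for each stretch between checkpoints.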
However, I also ran some correlations, and while the results are obvious if you think about them for a few seconds, I haven't ever seen anybody say so explicitly:
The ranking of runtimes on longer race segments (more miles between checkpoints) tends to correlate more strongly with final standings, at least in this data set. That is to say, people who had faster runtimes between checkpoints that are far apart tended to finish better than people with slower runtimes on those segments. Some of this is tautological (the longer runs are a greater percentage of the total race). Some of it might be because on long runs everybody has to camp, even people who would otherwise prefer to rest at checkpoints (I haven't looked at this), or because longer runs smooth out the variability you might see in shorter runs (law of large numbers, sort of).
Here's a plot of the correlations between segment runtime and finishing position, against segment distance.
The correlations are on the "correlations" worksheet in the spreadsheet. I'm using a standard Pearson product-moment correlation coefficient (r), which is not the best test for a complete dataset but is adequate for exploring these data. I'm not posting the numbers here in the interest of not having reader eyes glaze over, but definitely feel free to visit the spreadsheet, poke through the data, copy the spreadsheets, and ask questions.
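If you want to check a correlation by hand, here's a small Python sketch of the Pearson r calculation for one segment. The function name and the numbers are hypothetical, made up just to show the arithmetic; they are not taken from the spreadsheet.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator of r).
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (denominator of r).
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cross / (spread_x * spread_y)


# Made-up example: five teams' runtimes (hours) on one segment, paired with
# their final finishing positions. NOT real race data.
segment_runtimes = [10.0, 11.0, 12.5, 12.0, 14.0]
finish_positions = [1, 2, 3, 4, 5]

r = pearson_r(segment_runtimes, finish_positions)
print(f"r = {r:.3f}")
```

Repeating this for every segment and pairing each r with that segment's mileage gives the points for a plot like the one above. A positive r here means slower runtimes go with worse (higher-numbered) finishing positions.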
I'm planning on doing something similar with 2011 data, as well as years run in the opposite direction, to see how well what we're seeing in 2013 holds up. Unfortunately, getting the data into the spreadsheet and clean enough to use is pretty labor-intensive, so it might be a while before there's a follow-up to this post. But once the data are in a spreadsheet there's a lot we can do with them, so there's incentive to do it beyond answering just these questions.
Here's looking forward to a great winter of distance dogsled racing!