Sign-ups for two of the most popular mid-distance races in Alaska, the Copper Basin 300 and the Knik 200, opened today. Because these races are so popular and tend to become oversubscribed, there's a crush of entries as soon as registration opens. Unfortunately there have been technology failures for both races, and mushers don't know whether or not they're in. Copper Basin has responded by providing alternative mechanisms for entering, and the Knik is, so far, sticking with web-based race entry.
We've got a few goals here at Mushing Tech. Among them is that we'd really like to make life easier for volunteers - without them there's no race, and being a race volunteer can be exhausting and stressful. I have very little doubt that the Copper Basin and Knik volunteers handling race entries are extremely stressed right now. Additionally, a number of mushers are questioning the fairness of the entry systems. Towards the bottom of this post I'll discuss the fairness question in more detail (particularly sources of statistical unfairness) but I think the more important question is what can be done to make everybody's lives a little easier, reduce stress levels, and reduce both the likelihood and the impact of technology failures.
I think the number one thing that can be done to improve race registrations would be to move away from the current land rush model, where everybody's trying to get their entry in early. Using this model creates a load on the race registration system that unintentionally increases the likelihood of technology failure. If races were instead to announce that they'd be doing random drawings on a given date, mushers would be able to enter at their leisure by that date, use varying technologies to submit their entry application (email, postal mail, etc.), and would be less likely to feel disadvantaged if they have only very slow or intermittent connectivity. There are some excellent sources of randomness available both on our own computers and online (see, for example, this) which can be used to draw from the pool of entries. Races can set aside some slots for veterans or for people running qualifiers and do random draws for everybody else, etc. The main thing is to remove the pressures associated with getting an entry form in before everybody else.
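To make the random-draw idea concrete, here is a minimal sketch in Python. It is only an illustration of one way a race might do this, not a description of any race's actual process: the function name `draw_entries`, the `reserved` list (standing in for veterans or people running qualifiers), and the musher names are all made up for the example. It uses the operating system's randomness source via `random.SystemRandom`.

```python
import random


def draw_entries(applicants, slots, reserved=(), rng=None):
    """Fill `slots` places: reserved applicants first, then a random draw.

    `applicants` is everyone who applied by the deadline, in any order.
    `reserved` names the applicants holding set-aside slots (hypothetical
    policy). Returns (accepted, waitlist); the waitlist is in draw order.
    """
    rng = rng or random.SystemRandom()  # OS-provided randomness by default
    reserved_set = set(reserved)
    guaranteed = [a for a in applicants if a in reserved_set]
    pool = [a for a in applicants if a not in reserved_set]

    rng.shuffle(pool)  # arrival order no longer matters after this

    open_slots = max(slots - len(guaranteed), 0)
    accepted = guaranteed[:slots] + pool[:open_slots]
    waitlist = guaranteed[slots:] + pool[open_slots:]
    return accepted, waitlist


if __name__ == "__main__":
    entries = ["Musher A", "Musher B", "Musher C", "Musher D", "Musher E"]
    accepted, waitlist = draw_entries(entries, slots=3, reserved=["Musher B"])
    print("In:", accepted)
    print("Waitlist:", waitlist)
```

Because the pool is shuffled after the deadline, it makes no difference whether an entry arrived by email, postal mail, or a slow satellite link.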
Copper Basin has responded to their web outage by allowing mushers to enter by phone (voicemail) or email, and the race organization will then rely on timestamps to determine entry order. This has a number of problems, including that there's an excellent chance that the clocks on their computers and voicemail are not synchronized. Depending on whether the voicemail system is run by the phone company or is a home answering machine, it's possible that there's what's called in computing "head of line" blocking, where a queue is blocked waiting on whatever is at the head of the line and queue members can only be serviced one at a time, unlike email, which can be received in parallel. Plus, voicemail is slower, both because of connection latency (the phone has to ring and be answered) and because it simply takes longer to leave a voicemail than to send a piece of email. So, we suspect that people who used voicemail to register were statistically disadvantaged relative to people who used email. People who live remotely and use satellite internet were also disadvantaged relative to people who use DSL or cable for their internet connectivity, since the round-trip latency for satellite is about 1/2 second at minimum (laws of physics!) and typically much higher. Unlike the voicemail situation, though, we'd be a bit surprised if the delay associated with satellite internet had much of an impact on signup order. Similarly, if the issues with the Knik website are failures to establish a TCP connection to the race's web server, those who can retry quickly have an advantage over those with slow or high-latency connectivity.
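For anyone curious where the half-second figure for satellite comes from, here is a back-of-the-envelope calculation in Python. It assumes a geostationary satellite directly overhead (about 35,786 km up) and counts only the time radio signals spend in flight; real connections add processing and queuing delay on top, which is why typical latency is much higher. The function name is just for illustration.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
GEO_ALTITUDE_KM = 35_786           # assumed: geostationary orbit altitude


def min_round_trip_seconds(altitude_km=GEO_ALTITUDE_KM):
    """Lower bound on round-trip time through a satellite straight overhead.

    A request goes up to the satellite and down to the ground station, and
    the reply comes back the same way: four trips of `altitude_km` each.
    """
    trips = 4
    return trips * altitude_km / SPEED_OF_LIGHT_KM_S


if __name__ == "__main__":
    print(f"Minimum round trip: {min_round_trip_seconds():.2f} seconds")
```

Under those assumptions the floor works out to a little under half a second, before any equipment or network delay is counted.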
At any rate, the goal here is to keep everybody happy: volunteers should not be stressed and mushers should feel that they're being treated fairly. We suspect that the current registration model doesn't really lend itself to that goal, and that a deadline model has a number of advantages over a land rush model.
Saturday, October 3, 2015
Thursday, July 30, 2015
How to access (and copy!) race spreadsheets
Hi, all!
I thought it might be useful to produce a video showing how to access race spreadsheets. I know that many of you are curious about what might be shown in data from past races and have questions that you'd like to answer, but haven't had the data to do it. Data acquisition is quite labor-intensive and not very much fun, so there's not that much data out there, and what there is is often in, well, suboptimal shape. My race spreadsheets are in Google Drive and so are available on the web to anybody who's interested in taking a look at them, making copies, doing their own analyses, and so on.
So, I've made a video showing how to access them and how to make copies for your own use. I haven't figured out how to put clickable links in YouTube videos, but the folder containing the spreadsheets is here.
Have fun with it, let me know if you've got questions, and especially let me know if you find anything interesting! Future videos will show how to do simple calculations with the spreadsheets, easy plots (graphs), and so on.
Saturday, July 25, 2015
Doing data
I had big plans for the summer - getting some code written, some analyses written up, and getting the new Mushing Tech website up and running. Well, with one thing and another none of that has happened, but I thought it might be worthwhile to post something about things that people who are interested in a closer look at distance mushing race data can do for themselves.
Very broadly speaking, looking at the data involves two separate but related steps: 1) acquiring the data, and 2) doing the analysis.
Data acquisition and cleaning is a laborious grind. Right now race data tend to be in pretty shabby shape. They are generally available only as web pages, with no two races using the same format (even those using my spreadsheets have tweaked them for their own use, which is great but does increase the effort to take the data apart). That means scraping the data and converting it into a format that you can use with analytical tools. Worse, both major races have some major database errors - Iditarod's archived races have badly broken checkpoint links, and the Yukon Quest's archived races have screwed up bib numbers.
I've been pulling race data into my own spreadsheets using two mechanisms. For races underway, I manually enter data into live spreadsheets that I keep in Google Sheets. I've been bit-by-bit converting old race data by using table scraping tools available as browser extensions (for example, DataMiner for Chrome, and the extremely fabulously wonderful TableTools2 for Firefox). So, I do have a collection of race data available in spreadsheet form, here. Please feel free to use them yourselves, copy them, and so on. They are released under the BSD 3-clause license, so please respect that and provide attribution if you do use them.
Another source of data is the Trackleaders race tracking archive, and this is the really interesting one. For each race and each musher being tracked, the speed/time plot source data is actually embedded in the web page, and you can pull it out if you know where to look. I've got a Python program that does that and calculates rest schedules from it, here. Please feel free to grab a copy and tweak it for your own use (again, it's released under the BSD 3-clause license - it's yours to use as you wish at no cost other than providing attribution).
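To give a feel for the "calculate rest schedules" step, here is a small Python sketch. To be clear, this is not the program linked above and it does not parse Trackleaders pages. It assumes you have already pulled out a list of (timestamp, speed) samples for one musher, and the thresholds (half a mile per hour, 30 minutes) are arbitrary values picked for illustration.

```python
from datetime import datetime, timedelta


def find_rests(samples, max_speed=0.5, min_duration=timedelta(minutes=30)):
    """Find stretches where a team was (nearly) stopped.

    `samples` is a time-sorted list of (datetime, speed_mph) pairs.
    A rest is a run of consecutive samples at or below `max_speed` that
    spans at least `min_duration`. Returns a list of (start, end) pairs.
    """
    rests = []
    start = None  # first stopped sample of the current run, if any
    last = None   # most recent stopped sample of the current run

    for when, speed in samples:
        if speed <= max_speed:
            if start is None:
                start = when
            last = when
        else:
            # The team is moving again: close out the run if it was long enough.
            if start is not None and last - start >= min_duration:
                rests.append((start, last))
            start = None

    # Handle a run that is still open when the data ends.
    if start is not None and last - start >= min_duration:
        rests.append((start, last))
    return rests


if __name__ == "__main__":
    t0 = datetime(2015, 2, 7, 12, 0)
    speeds = [8, 8, 0, 0, 0, 0, 0, 8, 8, 0, 8]  # one sample every 10 minutes
    samples = [(t0 + timedelta(minutes=10 * i), s) for i, s in enumerate(speeds)]
    for start, end in find_rests(samples):
        print(f"Rested from {start:%H:%M} to {end:%H:%M} ({end - start})")
```

Short pauses (snacking the dogs, fixing a tangle) fall under the minimum duration and are ignored, which is usually what you want when looking at run/rest schedules.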
Okay, so that's the data side of things, but how do you learn how to look at data? Many people I've talked with have questions they want to ask about the data but aren't sure how to go about answering them.
One way is to ask someone with a stats background, but a more Alaskan approach might be to learn to do it yourself. There are a couple of possibilities at low or no cost.
Data science is a booming field right now and the educational resources are really impressive. There are a lot of new books, many of which require minimal technical background. O'Reilly has a remarkable collection in both print and ebook form. For example, Cathy O'Neil's "Doing Data Science" is a terrific book that gives you an overview of the approaches you can take to looking at data, while Joel Grus's "Data Science From Scratch" provides a hands-on introduction to a variety of analytical techniques while providing a grounding in Python programming (if you're already a Python programmer you'll probably want to use an existing data science module, like scikit-learn).
But, if you've got a computer and an internet connection, a better choice might be to take an online course at one of the MOOCs ("Massive Open Online Course"). Data science and introductory statistics courses are everywhere, and they provide an opportunity to learn new skills with as much or as little of a commitment as works for you. You can take one and do all the exercises and take all the quizzes (and earn a certificate), or just watch the lecture videos - it's all up to you.
Coursera features courses prepared by faculty at major universities and the quality is extremely high. As an example, take a look at the courses in the Data Science Specialization offered by Johns Hopkins University. They offer everything from a very high-level overview to a class on regression models. ("Getting and cleaning data" might be a good one for people interested in looking more closely at mushing data!)
Udacity is another popular MOOC site. It's more oriented towards practitioners and classes tend to be less consistent in quality than Coursera, but they are learn-at-your-own-pace with no deadlines for homework or quizzes, and some of the classes were developed by companies like AT&T and Google. They also have a large data science category, with some excellent introductory classes like their Introduction to Descriptive Statistics, as well as classes that might give you some idea about how to approach looking at mushing data, like their "Data Analysis With R" class.
And as I mentioned, I'll be making a few videos showing how to work with Google spreadsheets to look at data, dealing with questions like how to do arithmetic on dates and times, and how to do some simple summary statistics.
But summer passes quickly in interior Alaska. Yukon Quest sign-ups are in a week, and we should be able to start training the dogs regularly in about a month. It's starting to get dark at night and I think we're all looking forward to seeing the aurora again. But, in the meantime, there's summer to enjoy, fish to catch, projects to finish, and friends to visit. Have a great rest of your summer, and watch this space!
Sunday, March 22, 2015
About that Iditarod post concerning Siberians
Over the years I've been a bit baffled by Iditarod fans being so enthusiastic on the one hand but knowing so little about the sport on the other. This year it finally dawned on me that the likely reasons are: 1) many of the fans are fans of the race, not of distance mushing more generally, and 2) if you rely on Iditarod for most of your information you're going to find yourself in the weeds fairly often. Yesterday's "Eye on the Trail" post on Siberians and Siberian mushers was a pretty stellar example of the latter.
There were a number of smaller problems (misspelling Yvonne's name, identifying Lev as a purebred musher) and some enormous ones. Failing to mention Lisbet Norris, who finished earlier the same day, who is the scion of the oldest Siberian Husky kennel in the world (and one that's been incredibly influential), and whose grandfather ran the Iditarod with dogs from the same kennel back in the 1980s, is probably the most glaring error in judgment. A nod also absolutely should have been given to Isabelle Travadon, who also ran a purebred team and who did a very creditable job when things got difficult under circumstances that led others to scratch.
Also: YES, Rob Cooke became the first Siberian musher to finish both the Quest and the Iditarod in the same year. He did it with substantially the same dogs in both races. Rob is a friend and we are so proud of him we could bust. This is a big deal, and the author of that blog post should have known it. I think it's also worth mentioning that Rob did not have a good Quest, that it started out badly and he worked through it, solved his problems and got his team to the finish. That, I think, is a huge deal and speaks to the kind of dog man he is, which brings me to my real beef with the Iditarod post.
Shortly after I moved up to Two Rivers and before Chris arrived I needed to travel for work, so I boarded my eight (!) (at the time - I'm up to 20) Siberians at a kennel in the neighborhood. The fellow who owned the place was chatting and said "I used to run Siberians," so I said "oh?," curious to see where this was going to go. He went on to say "but Siberians have too much sense of self-preservation. They'll sit down on you. Alaskans will just go until they drop." That was a bit of an exaggeration but not completely. Siberians will leave a little in the tank (okay, sometimes a lot in the tank) when calling a time-out, and it takes a certain kind of musher to successfully run a Siberian team 1000 miles. It is not a coincidence that Mike Ellis in particular but also increasingly Rob Cooke and some other up-and-coming purebred mushers are known for exceptional dog care. The people who successfully run Siberians in 1000-mile races tend to be very fine dog people, in part because they have to be.
At this point it should not be a secret that the two things that people who don't know better say about Siberians, that they're slow and they're pretty, annoy a lot of Siberian mushers. Siberians have some traits that are highly valued in sled dogs: they have excellent feet, they're easy keepers, and they do extremely well in genuinely frigid conditions. They're tough dogs, but with an enthusiasm a friend describes as "joie de husky." And yes, they're pretty, but focusing on that is a bit like saying "What a pretty face!" or giving someone a Miss Congeniality award. It's a bit condescending and it fails to acknowledge their qualities specifically as sled dogs. These are great dogs and tricky dogs and they're often being driven by great -- and underappreciated -- mushers.
I think this is a really great time for working Siberians and that it's just getting better, and I'm excited to see more purebred teams running Quest. In the meantime, when you see a purebred team finishing a 1000-mile race looking happy and ready for more, do not say "What pretty dogs." Instead, say "What a fine, fine dog musher. And wow, those dogs are pretty."
Thursday, March 19, 2015
On Iditarod, justice, and "two-way communication" devices
As I wrote on my Facebook page, I haven't been saying much about Brent's disqualification because I am nauseated by it. I haven't wanted to participate in some of the technical nitpicking that's been going on because I think the real issue is justice. Whether or not there's a network available is irrelevant - nobody, including the race judges, thinks that Brent had any intention to cheat or had taken any actions that would give him an unfair advantage. At the same time they acknowledge that there are other mushers on the trail with similar devices and they're not going to seek those people out for punishment. So, unjust rule unfairly applied.
That said, I do think it's worth talking about the technology a little bit, because another problem is that the rule is very, very poorly specified and I suspect a competent lawyer with some technical expertise or who had access to expert technical witnesses could have a field day. The real question from where I sit isn't whether or not there's a network that a musher could access, but rather what constitutes a two-way communication device.
By way of context, one of the things I do to earn my dogs' kibble is develop internet protocol specifications, as a participant in and chair of several working groups in the Internet Engineering Task Force. We develop the core protocols that are used on the internet, including things like routing, security, transport, and so on. We are an organization of fuss-budgets, and our work consists of specifying protocol and device behavior to a level of detail that would make most people comatose. So, I feel pretty comfortable looking at something like the Iditarod's rule 35 and trying to figure out whether or not it makes sense.
The bottom line is that I think it probably does not, for several reasons. The primary reason is that it's overly broad and would exclude devices like Bluetooth headsets, which perform a (two-way!) negotiation with another device in order to pair. It excludes special-purpose radio/wi-fi devices which cannot be used for anything but the purpose for which they were developed (for example, Nikon cameras speak PTP/IP over 802.11, with the camera acting as the access point/hotspot - completely useless for anything but camera control and transferring images/video). They also incorrectly identify a SPOT and other trackers as a one-way communication device. Technically, they are two-way - they receive radio from satellites and uplink back to the satellites. They only uplink if they "know" they're in contact with the satellites. A DeLorme InReach Satellite Communicator, which falls generally within the same category as SPOT devices, allows the person with the device to send arbitrary messages. And what about a Fitbit? Technically, those are two-way communication devices, since they swap messages with your computer, tablet, etc.
And then there's the more general problem of keeping up as technology changes and develops. For example, what about the Apple Watch and other "smart watches"?
This probably sounds like nitpicking, and it is. There are a lot of two-way communication devices that don't provide general communication facilities. You can't send and receive email with a Bluetooth headset and you can't browse the web with a Nikon camera (yet, as far as I know). The problem is that this is the rule under which Brent was disqualified, and the rule is a mess as far as the specification of communication capabilities. Because this is the rule that was used and because the penalty was so severe, its technical correctness matters a lot. As I said I think a competent attorney (and likely even an incompetent one) could have a field day with it.
The Iditarod organization has repeatedly demonstrated itself to be technically unsophisticated. Usually this comes out in the form of making bad decisions about writing their own tracking system, having their social media people provide technical support (badly), and so on, but here's a case where their lack of ability to describe what it is that they'd like to prevent has caused material damage to someone who even the race judges who made the disqualification decision agreed wasn't cheating.
Given that the problem that they're trying to solve isn't really a technical one, although it could be instantiated using technology, I think they are probably much better off trying to define disqualifying behavior rather than disqualifying devices. Their technical incompetence has led to considerable injustice against someone who did nothing wrong.
That said, I do think it's worth talking about the technology a little bit, because another problem is that the rule is very, very poorly specified and I suspect a competent lawyer with some technical expertise or who had access to expert technical witnesses could have a field day. The real question from where I sit isn't whether or not there's a network that a musher could access, but rather what constitutes a two-way communication device.
By way of context, one of the things I do to earn my dogs' kibble is develop internet protocol specifications, as a participant in and chair of several working groups in the Internet Engineering Task Force. We develop the core protocols that are used on the internet, including things like routing, security, transport, and so on. We are an organization of fuss-budgets, and our work consists of specifying protocol and device behavior to a level of detail that would make most people comatose. So, I feel pretty comfortable looking at something like the Iditarod's rule 35 and trying to figure out whether or not it makes sense.
The bottom line is that I think it probably does not, for several reasons. The primary reason is that it's overly broad and would exclude devices like Bluetooth headsets, which perform a (two-way!) negotiation with another device in order to pair. It excludes special-purpose radio/wi-fi devices which cannot be used for anything but the purpose for which they were developed (for example, Nikon cameras speak PTP/IP over 802.11, with the camera acting as the access point/hotspot - completely useless for anything but camera control and transferring images/video). They also incorrectly identify a SPOT and other trackers as a one-way communication device. Technically, they are two-way - they receive radio from satellites and uplink back to the satellites. They only uplink if they "know" they're in contact with the satellites. A DeLorme InReach Satellite Communicator, which falls generally within the same category as SPOT devices, allows the person with the device to send arbitrary messages. And what about a Fitbit? Technically, those are two-way communication devices, since they swap messages with your computer, tablet, etc.
And then there's the more general problem of keeping up as technology changes and develops. For example, what about the Apple watch and other "smart watches?"
This probably sounds like nitpicking, and it is. There are a lot of two-way communication devices that don't provide general communication facilities. You can't send and receive email with a Bluetooth headset and you can't browse the web with a Nikon camera (yet, as far as I know). The problem is that this is the rule under which Brent was disqualified, and the rule is a mess as far as the specification of communication capabilities. Because this is the rule that was used and because the penalty was so severe, its technical correctness matters a lot. As I said I think a competent attorney (and likely even an incompetent one) could have a field day with it.
The Iditarod organization has repeatedly demonstrated itself to be technically unsophisticated. Usually this comes out in the form of making bad decisions about writing their own tracking system, having their social media people provide technical support (badly), and so on, but here's a case where their lack of ability to describe what it is that they'd like to prevent has caused material damage to someone who even the race judges who made the disqualification decision agreed wasn't cheating.
Given that the problem that they're trying to solve isn't really a technical one, although it could be instantiated using technology, I think they are probably much better off trying to define disqualifying behavior rather than disqualifying devices. Their technical incompetence has led to considerable injustice against someone who did nothing wrong.
Thursday, February 5, 2015
The Yukon Quest on GPS or Google Earth
Last year, we provided some Iditarod track files for quite a large number of software applications. This year, for the Quest, I've prepared copies of the most popular file types:
KMZ file loaded into Google Earth
The files contain the same information: checkpoint (and dog drop etc) locations as well as the race trail for odd years. I condensed this data from a variety of sources and hand-cleaned it, but obviously it will only be approximate, and the trail will vary from year to year. Also, because the trail is made up of points that are about half a mile apart, the track cuts the corners between them and the total distance is underestimated (it comes out to 911 miles according to this track). So please take this (ENTIRELY UNOFFICIAL) file with a good pinch of salt! If it is helpful or useful, I'd be glad.
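For anyone curious why widely spaced points shorten a track, here is a minimal Python sketch. It sums great-circle (haversine) distances between consecutive points and compares a densely sampled zigzag with the same line sampled at every tenth point. The coordinates are made up for illustration and are not taken from the Quest files, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def track_length_miles(points):
    """Sum the straight-line hops between consecutive (lat, lon) points."""
    return sum(haversine_miles(*p, *q) for p, q in zip(points, points[1:]))


# Made-up winding trail: the longitude wiggles back and forth at every point.
dense = [(64.0 + 0.001 * i, -140.0 + 0.004 * (i % 2)) for i in range(101)]
# Keeping every tenth point drops the wiggles, so the track gets shorter.
sparse = dense[::10]

print(f"dense track:  {track_length_miles(dense):.2f} miles")
print(f"sparse track: {track_length_miles(sparse):.2f} miles")
```

The sparse version can only ever be shorter than or equal to the dense one, since each hop replaces a winding stretch with a straight line.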
Tuesday, February 3, 2015
The predictive value of split differences
It seems that the format of the Knik 200 trail was not that much fun for teams, so we had a few mushers who didn't need qualifiers checking out early (that is to say, scratching). But, the format also gave us an opportunity to look at some data that haven't been available in the past for distance racing.
The Knik had mushers running from Deshka Landing up to Yentna and back, twice. That is to say, they passed over the same piece of trail 4 times. This gave us an opportunity to look at whether or not the consistency of the speed with which they covered the trail had any predictive value, in terms of final placement. If someone had pretty much the same times on each race leg, did they tend to finish higher or lower in the standings?
So, I created a spreadsheet in which I took the differences between the two up times, the differences between the two back times, and the sum of the absolute values of the differences. This gave me a handle on just how much variability there was in a team's runtimes. I then ran correlations on the total differences, the run to Yentna differences, and the run back differences. I found a fairly strong correlation between the summed runtime differences and final placement, but also a fairly large standard error given the size of the field. That is to say:
a team whose times remained consistent from split to split tended to finish better than a team whose times varied more from split to split.
This should not be particularly surprising, since the source of speed variation tends to be slowing down due to tiredness, etc. Another source of consistency, other than conditioning and fitness, could be the musher's expertise in managing their team's resources. Steady speeds probably don't cause a good finish, but they can tell us something about the team's "quality" (for lack of a better word).
In the correlations I ran, the difference in splits was the independent variable, and finishing position the dependent variable. Here's the table, with the first correlation being between the summed differences and the finish, the second being the summed differences on the run to Yentna, and the third being the summed differences on the run from Yentna back to Deshka:
Summed differences (both directions): correlation coefficient 0.7031, standard error 5.4138
Out splits (run to Yentna): correlation coefficient 0.5768, standard error 6.2195
In splits (run back to Deshka): correlation coefficient 0.5781, standard error 6.2121
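The spreadsheet calculation described above can be sketched in a few lines of Python. This is only an illustration: the leg times and finishing positions below are invented, the function names are hypothetical, and it assumes that the "standard error" in the table is the standard error of the regression estimate (finishing position predicted from split difference), which the post does not state explicitly.

```python
import math


def summed_split_difference(up1, back1, up2, back2):
    """Sum of absolute differences between the two 'up' legs and the two 'back' legs."""
    return abs(up2 - up1) + abs(back2 - back1)


def _sums(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    sxy, sxx, syy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_estimate(xs, ys):
    """Standard error of y predicted from x by least squares (n - 2 degrees of freedom)."""
    sxy, sxx, syy = _sums(xs, ys)
    residual_ss = max(0.0, syy - sxy ** 2 / sxx)
    return math.sqrt(residual_ss / (len(xs) - 2))


# Invented leg times in minutes: (up1, back1, up2, back2, finishing position)
teams = [
    (300, 310, 305, 318, 1),
    (305, 320, 320, 340, 2),
    (298, 315, 330, 350, 3),
    (310, 330, 345, 380, 4),
    (320, 335, 380, 400, 5),
]
diffs = [summed_split_difference(*t[:4]) for t in teams]
places = [t[4] for t in teams]

print("correlation coefficient:", round(pearson_r(diffs, places), 4))
print("standard error:", round(standard_error_of_estimate(diffs, places), 4))
```

A positive coefficient here means larger split differences go with a larger finishing number, that is, a worse placement.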
Thursday, January 29, 2015
Tools for looking at run/rest schedules
If you've followed distance mushing for any amount of time you're keenly aware of the role that run/rest schedules play in the sport. They can be both strategic and tactical, and can reflect the breeding decisions a given musher makes, as well as their training regimen (more on that in a bit).
I've posted a couple of videos on my Facebook page that use Trackleaders tools from last weekend's Northern Lights 300. One shows how to use them to get a better understanding of how teams are performing against each other as they move down the trail, and the other shows how to use the replay function to watch interesting, and occasionally surprising, things happen during a race, based on how the teams move on the map. In the latter video I looked more closely at Larry Daugherty, because there were some unusual things happening on his tracker that stood out from the rest of the race.
In Larry's recap of the race he talks about deciding that he was going to try to be competitive, and adjusting his run/rest schedule accordingly. Of course, now we know that his team shut down on him twice, and given his comments and what happened, the first place to look is at how he rested his team.
During last year's Yukon Quest I wrote a short Python program to pull rest times out of a Trackleaders musher track. It's available for download, but for the more visually-oriented, you can take a look at a musher's speed/time plot on their Trackleaders page. For example, here's Kristy Berington's for this year's NL 300. She won the race, so clearly the decisions she made worked well for the team she had and the training she'd done:
Reading this is straightforward. The x-axis (horizontal) is race time - hours since the race start - and the y-axis (vertical) is speed. This looks like a pretty standard schedule for a 300-mile race. Her schedule looked roughly like:
Run 7 hours, rest 6 hours
Run 6 hours, rest 3 hours
Run 5 hours, rest 5 hours
Run 8 hours, rest 4 hours
Run 7 hours to the finish
So, the longest of the runs were only moderately long, and pretty much in keeping with what we've seen be successful in other races. None were what anybody would call very long. There were four rests. Note that the longest run was in the last half of the race, the second-to-last one, which was 8 hours.
Now, Larry said that his goal was to run long and rest long, as he'd seen other mushers, notably Allen Moore, do with a great deal of success. So, let's take a look at Allen's successful Copper Basin 300 race earlier this month:
Again, time is on the horizontal axis and speed is on the vertical axis. His schedule looked roughly like:
Run 5 hours, rest 5 hours
Run 7 hours, rest 7 hours
Run 3 hours, rest 5 hours
Run 9 hours, rest 2 hours
Run 8 hours to the finish
Again, a few moderately long runs, four rests, and the longer runs in the second half of the race (also top speeds of 30mph in a few places - woo, Allen! But those, sadly, are actually measurement errors). Note, as well, that in mid-distance races with mandatory checkpoint rest, the run/rest schedule is going to be influenced by checkpoint location and layover rules.
So now we've looked at a couple of successful run/rest schedules, including one that Larry said he was using as a model. Let's look at what Larry actually did:
Run 7 hours, rest 7 hours
Run 12 hours, rest 5 hours
Run 8 hours, rest 6 hours
Run 8 hours to the finish
Note that there are places in the third run and the run to the finish where his speed dropped considerably, and in one case where he actually stopped. Those are the places where his dogs quit on him.
What pops out here is that he did only four runs over 300 miles, while Kristy did five runs in the NL 300 and Allen did five runs in the CB 300. Note as well that his long run was comparatively quite long (12 hours), that it was in the first half of the race, and that it was not followed by much rest. His run/rest schedule does not actually look much like that of mushers winning at that distance, and is one plausible reason why his dogs quit on him several times. And this gets back to the training question - if he hadn't been training for very long runs prior to the race his dogs were likely not in condition either mentally or physically to pull one off.
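The comparison above can be tabulated with a short Python sketch. The hours are the rough schedules quoted in this post, with the final run to the finish given a rest of zero; the function and key names are hypothetical.

```python
# (run_hours, rest_hours) for each leg; the last leg runs to the finish, so rest is 0.
SCHEDULES = {
    "Kristy (NL 300)": [(7, 6), (6, 3), (5, 5), (8, 4), (7, 0)],
    "Allen (CB 300)": [(5, 5), (7, 7), (3, 5), (9, 2), (8, 0)],
    "Larry (NL 300)": [(7, 7), (12, 5), (8, 6), (8, 0)],
}


def summarize(legs):
    """Count the runs and locate the longest one within the race."""
    runs = [run for run, _ in legs]
    total_hours = sum(run + rest for run, rest in legs)
    longest = max(runs)
    i = runs.index(longest)
    # Elapsed race time when the longest run began.
    hours_before = sum(run + rest for run, rest in legs[:i])
    return {
        "runs": len(runs),
        "total_hours": total_hours,
        "longest_run": longest,
        "rest_after_longest": legs[i][1],
        "longest_in_first_half": hours_before < total_hours / 2,
    }


for name, legs in SCHEDULES.items():
    print(name, summarize(legs))
```

Whether a run counts as "first half" is judged here by when it started relative to total elapsed time, which is one reasonable choice among several.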
Eyeballing the curves it also looks like Larry lost more speed on the first run than Kristy did. It might be interesting to fit a regression line to each and see how the slopes compare, but I think it would only be a little bit interesting. More interesting is that many top mushers have talked about leaving the start chute at the same speed they'd like to run their race, or about negative splits, where their speeds towards the end of the race are faster than their speeds earlier in the race. Sebastian Schnuelle has talked memorably about being passed left and right by other teams towards the start of the race, and telling them "I will see you later." And nearly always, he does. Going out fast may or may not hurt but it's not clear that it helps.
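For what it's worth, fitting that regression line takes only a few lines. The sketch below computes the least-squares slope of speed against time for a single run; the sample numbers are invented purely to show the mechanics and are not read off anyone's tracker.

```python
def speed_slope(hours, speeds):
    """Least-squares slope of speed (mph) against time (hours): mph lost or gained per hour."""
    n = len(hours)
    mean_t = sum(hours) / n
    mean_v = sum(speeds) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, speeds))
    den = sum((t - mean_t) ** 2 for t in hours)
    return num / den


# Invented first-run samples, one reading per hour.
hours = [0, 1, 2, 3, 4, 5, 6]
team_a = [10.0, 9.9, 9.8, 9.8, 9.6, 9.6, 9.5]   # holds speed fairly well
team_b = [11.5, 11.0, 10.4, 9.9, 9.3, 8.8, 8.2]  # starts fast, fades

print("team A slope:", round(speed_slope(hours, team_a), 2), "mph per hour")
print("team B slope:", round(speed_slope(hours, team_b), 2), "mph per hour")
```

A more negative slope means the team shed speed faster over the run.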
This plot (the speed vs. time plot) is an incredibly handy tool for looking at run/rest schedules. One enhancement I'd love to see would be the ability to overlay multiple mushers on the same plot, which would allow us not only to compare rest durations and locations, but also speeds while moving. In the meantime, my script has the option to output the results in CSV format, which is handy for loading into a spreadsheet or into a data analysis package like R or NumPy.
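To give a flavor of what pulling rests out of a track involves, here is a rough Python sketch. It is not the script mentioned above: the input format, the speed threshold, the minimum rest length, and the CSV column names are all assumptions made for illustration.

```python
import csv
import io


def find_rests(samples, max_speed=1.0, min_hours=0.5):
    """Find stretches where speed stays at or below max_speed for at least min_hours.

    samples: list of (hours_since_start, mph) tuples, sorted by time.
    Returns a list of (rest_start, rest_end) tuples in hours.
    """
    rests = []
    start = last = None
    for t, mph in samples:
        if mph <= max_speed:
            if start is None:
                start = t
            last = t
        else:
            if start is not None and last - start >= min_hours:
                rests.append((start, last))
            start = None
    # Close out a rest that runs to the end of the samples.
    if start is not None and last - start >= min_hours:
        rests.append((start, last))
    return rests


def rests_to_csv(rests):
    """Render rests as CSV text (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["rest_start_h", "rest_end_h", "duration_h"])
    for start, end in rests:
        writer.writerow([start, end, end - start])
    return buf.getvalue()


# Invented hourly samples: run, rest from hour 2 to hour 4, run again.
samples = [(0, 9), (1, 9), (2, 0), (3, 0), (4, 0), (5, 8), (6, 8)]
print(rests_to_csv(find_rests(samples)))
```

Real tracker data is noisier than this (GPS jitter while parked, dropped readings), so the thresholds would need tuning per race.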
Tuesday, January 20, 2015
A closer look at the Buser shortcuts in the Kusko
I think at this point most distance mushing fans are aware that both Rohn and Martin Buser left the trail during this past weekend's Kuskokwim 300, and that they received no penalty for having done so despite the trail they took being somewhat shorter than the race trail.
The Kusko 300 rules say:
"Racers must follow the marked and/or broken race trail. Leaving the marked and/or broken trail for the purpose of gaining a competitive advantage over other racers is not allowed."
Right there is an obvious problem: it implies that if someone leaves the trail and follows a shorter route by accident, they will not be penalized. It requires the race judges to attempt to determine intent, and that's both difficult and unfair, as it introduces a highly subjective element into the decision. There's also the question of incentives - if you kill a moose with your car in Alaska, you don't get to keep the meat, antlers, or any other valuable part of the animal because the state doesn't want you accidentally-on-purpose killing a moose. Same with killing wildlife in defense of life and property - you don't get to keep anything of value from the animal because they also don't want you accidentally-on-purpose having a dangerous run-in with a bear. Setting up a situation in which it's okay to leave the trail under particular circumstances removes some of the disincentives for leaving the trail.
But, this doesn't apply in Rohn's case, as he was told that he was off-course and given the opportunity to return to the race trail, which he did not do. So.
What interests a lot of us, I think, is whether or not these shortcuts had an impact on the outcome of the race. We can use several Trackleaders tools to look at that question and try to sort it out.
First, let's look at the shortcut itself. If you go to the Trackleaders page for the race, down the right-hand side of the map you'll see a column of buttons. Click on "Map layers."
That will expand to a series of checkboxed menu items: Weather Conditions, Cloud Cover, All Musher tracks, Tent layer, and Scratch layer. Click on "All Musher tracks." That will draw all of the tracks for all of the mushers who are being tracked. I've done that in the following image, and zoomed in on the trail near Bethel (I've also switched to satellite view, as the tracks stand out better on the map).
Clearly there's no question that Martin and Rohn did not follow the same trail as everybody else, and it appears that the trail they did take was shorter. So, was it shorter, and if so, how much of an advantage did they gain?
To try to suss that out, let's look at the race flow chart, which can give us a pretty clear look at average traveling speeds and team speeds relative to one another, as well as showing us speed anomalies in the track. Here's the race flow chart from Tuluksak to the finish. Note the big vertical jag in Rohn's and Martin's curves - that's where they left the trail (to digress a bit, Trackleaders appears to calculate musher trail mile by comparing the location of the GPS reading to the track that they were given by the race organization). Note, as well, that both Rohn and Martin had slowed down and were losing speed relative to the teams around them. On the race flow chart the x-axis (horizontal) represents time since the race began and the y-axis (vertical) represents trail mile. The steeper the slope of a musher's curve, the faster they're going, and the less steep it is, the slower they're going. So, that they were losing ground is very clear, and that they got a bump from their shortcut is also very clear.
There are several things we can do in looking at the data. One is to look at the bumps and try to figure out what they did to "effective" speed on the trail. Switching over to Rohn's individual tracker map, it appears that he left the trail right at about mile 253.5 and rejoined at mile 266. Again according to his individual track, he left the trail at about 4:37am and rejoined it at about 5:36am. So, as far as the race is concerned he covered 12.5 miles in 59 minutes, or averaged a hair over 12.5 mph over that section of trail. If you take a look at his traveling speed prior to that (looking at the slope on the race flow chart), at hour 32 he was at mile 234.5 and at hour 34 he was at mile 253.5, so he was traveling about 9.5 mph in the two hours prior to leaving the trail. That is to say, he got about a 3 mph boost.
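The arithmetic in that paragraph is easy to check in Python. The trail miles and clock times are the approximate values read off the tracker as quoted above; the helper name is hypothetical.

```python
from datetime import datetime


def mph(miles, start, end):
    """Average speed in miles per hour between two clock times."""
    hours = (end - start).total_seconds() / 3600
    return miles / hours


FMT = "%H:%M"
left_trail = datetime.strptime("04:37", FMT)
rejoined = datetime.strptime("05:36", FMT)

# "Effective" speed while off the race trail, measured in trail miles.
shortcut_mph = mph(266.0 - 253.5, left_trail, rejoined)

# Speed over the two hours before leaving the trail (hour 32 to hour 34).
prior_mph = (253.5 - 234.5) / (34 - 32)

print(f"on the shortcut: {shortcut_mph:.1f} mph")
print(f"before it:       {prior_mph:.1f} mph")
print(f"difference:      {shortcut_mph - prior_mph:.1f} mph")
```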
Now, if he'd stayed on the trail and continued traveling at about 9.5mph, would he still have beaten Jeff King to the finish? More arithmetic. He left the trail at about 4:37am, at trail mile 253.5. The finish in Bethel is at about trail mile 267.3. That's 13.8 miles. If he'd been traveling at a constant 9.5 mph over that 13.8 miles he'd have arrived in Bethel about an hour and 27 minutes after the time at which he left the trail, or around 6:04 am. Jeff got in around 5:58. That's close enough, I think, to be questionable, but Jeff probably would have finished earlier and gotten the $17,000 check instead of the one for $11,500.
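The same projection, as a small Python sketch. It uses the approximate figures quoted above and assumes a constant 9.5 mph all the way to Bethel, which is itself generous given that he was slowing down; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def projected_finish(left_at, miles_remaining, speed_mph):
    """Clock time of arrival if the team holds speed_mph over miles_remaining."""
    return left_at + timedelta(hours=miles_remaining / speed_mph)


FMT = "%H:%M"
left_trail = datetime.strptime("04:37", FMT)   # Rohn leaves the trail at mile 253.5
jeff_finish = datetime.strptime("05:58", FMT)  # Jeff King's approximate finish

rohn_projected = projected_finish(left_trail, 267.3 - 253.5, 9.5)
margin_minutes = (rohn_projected - jeff_finish).total_seconds() / 60

print("projected finish:", rohn_projected.strftime(FMT))
print(f"minutes behind Jeff: {margin_minutes:.0f}")
```

Swapping in Martin's departure time and 8.7 mph gives the corresponding estimate for his run in.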
Just for fun, let's try extending Rohn's curve along its original path to see if it results in something much different, and as a way of validating (or not) the values I've been using for time and trail mile. Also, pictures are just plain easier to understand. So, what happens if I extend Rohn's line on the race flow chart along its original path? This, which shows him finishing at about the same time as Jeff:
Note that Rohn's line, however, was not straight - it bulges a bit on top of the straight line because ... he was slowing down.
Running some numbers on Martin's track, he left the trail at about 5:54 am and returned to it at about 7:04am, so as far as the race is concerned he ran that section at 10.9 mph, while otherwise running about 8.7 mph. If he'd stayed on the trail at a constant 8.7 mph, he would have arrived in Bethel in the general vicinity of 7:30, or right about the same time as Brent Sass. This one is a lot fuzzier than the Rohn/Jeff situation.
So basically, yes, I think that if Rohn hadn't left the trail Jeff would have beaten him into Bethel, but not by much. I don't understand why the Busers weren't penalized for leaving the trail and taking a shorter route and I especially don't understand why the race organization has said absolutely nothing. Given that the Kusko organization hasn't said a word about this I think it would be nice if Rohn would take responsibility and then donate the $5500 difference between a 2nd and 3rd place finish to a local charitable organization in Bethel. The way the race ended just leaves a bad feeling all around.
[Update: KYUK reports that both Busers have been penalized 10 minutes and 10 percent of their winnings. I do think that both received substantially more than a 10-minute benefit from their adventures on Church Slough, but I'm glad that the problem has been formally recognized by the race.]