Tuesday, March 19, 2013

I don't even know where to start

So, Dorado.  I find it nearly impossible to write about something as personal, as intimate, as losing a dog.  If it were me I'd be running around Unalakleet with an axe or a gun, or more likely just sitting out on the ice for a couple of weeks (or a couple of months) and avoiding people entirely.  Much love and honor to Paige and Cody for walking a hard road.

Danny Seavey wrote about what happened to Dorado on his business's Facebook page, and that piece has received a lot of attention, reposts, and wide acclaim from Iditarod fans.  It made me more than a little ill, frankly, and I'd like to talk about that and about responsibility to the sport.

The upshot of his post is that life is short, everything that lives must die at some point, and we need to decide where to draw the line between tragic and just sad.  He talks about eating meat and he talks about the euthanasia of unwanted pets.  He says a well-intended volunteer messed up and that we shouldn't take it out on that person.  He's saying that accidents happen and that's just the way it is.

But here's the thing: this wasn't a freak accident.  This was negligence on the part of the person or people staffing the dog yard at Unalakleet.  Whether or not they understood how snow fences work, they weren't checking on the dogs regularly enough to identify a developing problem, and they weren't checking the dogs regularly enough to remediate what was going wrong.

I think we're living in a very good period in the development of dog mushing.  Veterinary research has made great strides in identifying and developing mitigations against preventable dog deaths.  Ethical standards in dog care, husbandry, breeding, and so on are improving a lot.  Much improved nutrition has both improved dogs' performance and improved their quality of life.  Some top kennels are working directly with veterinary researchers, and veterinary care awards have become some of the most prestigious titles awarded for many races.

But many dog mushing fans come from warm places, places without mushers or dog teams, places without roadless areas or true wilderness.  They're drawn to the sport for the romance and adventure and often don't really know anything about running dogs beyond what they learn while following Iditarod and reading blog posts or Facebook statuses from mushers.  They don't have enough experience to contextualize what they read.  They want to support the mushers and dog teams, and when a relatively high-profile musher says something they tend to believe it.  The kind of fans they become depends on what they read and experience.  And so even leaving aside moral and ethical questions raised by Danny's unfortunate post, I think that it's important to the sport that mushers are clear that any dog death caused by negligence on the part of race volunteers, race staff, mushers, whomever, is completely unacceptable.  We don't just shrug it off and compare it to the euthanasia of an unwanted, homeless dog or to eating chicken.  What happened here is intolerable, and much shame on anybody who not only thinks otherwise but influences people who don't know better to think otherwise as well.

So, onward.  I believe that the sport is going to continue to improve, that dog care will be valued more and more highly, that a solid understanding of the ethics of how we live with dogs will spread, and that fans will learn what these extraordinary dogs really mean to the people who raise them and care for them and travel vast distances with them as partners.  But it will take outreach, and communicating to those fans who think that Danny Seavey's explanation is brilliant that no, this is not how I feel about my dogs and this is not how it works.

Thursday, March 14, 2013

The whizzdom of crowds

I've been enjoying the heck out of the Seaveys' "Fantasy Iditarod" game.  It occurred to me that with so many participants (469!) it might be interesting to look at how everybody bet, to see whether or not the game actually had any predictive value.

A few years back James Surowiecki wrote a book called "The Wisdom of Crowds," the basic premise of which is that "a large group's aggregated answers to questions involving quantity estimation, general world knowledge, and spatial reasoning has generally been found to be as good as, and often better than, the answer given by any of the individuals within the group." Over the past decade or so there's been tremendous growth in what are called "prediction markets," in which participants buy and sell prediction shares in things like political elections, Academy Awards, etc.

So, how well can a group of Iditarod fans predict the race outcome?  Not that well, as it turns out, but not that badly, either.  There seems to be some accuracy at the high ends (winners) and low ends (um, not winners), but the pricing and rules of the game have a distortive effect in the middle, I think.

Here's my premise: according to the rules of Fantasy Iditarod, each person has $27,000 to allocate to up to 7 mushers.  Each musher had a "price," with very experienced, successful mushers being priced quite high and rookies or people who hadn't been particularly successful in the past priced quite low.  The prices were set such that a player couldn't spend it all on top mushers - if they wanted 7 mushers they'd need to bet on some lesser-known or less-successful mushers.  I thought it was possible that the raw counts of how many people had included a given musher might indicate something about how the race would turn out.  So, I wrote a simple script to count the number of bets on each of the mushers (and did the same for the rookie bets - more on that later).
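
A count like that takes only a few lines.  The sketch below is illustrative rather than the script used for this post; it assumes each player's picks are available as a list of musher names, and both the `count_bets` name and the sample `entries` are made up for the example.

```python
from collections import Counter

def count_bets(entries):
    """Count how many players included each musher in their picks."""
    counts = Counter()
    for picks in entries:
        # set() guards against a name appearing twice in one entry
        counts.update(set(picks))
    return counts

# Hypothetical entries: one list of picks per player (not real game data)
entries = [
    ["Aliy Zirkle", "Mitch Seavey", "Joar Leifseth Ulsom"],
    ["Aliy Zirkle", "Martin Buser"],
    ["Aliy Zirkle", "Mitch Seavey"],
]

# Print mushers from most-picked to least-picked
for name, bets in count_bets(entries).most_common():
    print(name, bets)
```

The same function would work for the rookie bets, given one single-name list per player.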

I was surprised that Martin Buser had gotten as many bets as he had, and he was the jackrabbit early in the race.  Aliy Zirkle was the most popular choice and she came in a very close second.  Joar Leifseth Ulsom was ranked 5th, which is not what you'd normally expect for a rookie, but was quite close to how he actually did (7th).  Mitch Seavey was the 9th most popular choice but had far, far fewer bets than Aliy (74, to her 232).

I think to some extent the results were distorted by our inability to pick the seven teams we thought would place the highest, although I don't think they were distorted that much.  Almost certainly people with no chance received more votes than they would have absent the $27,000 limit, and people with some chance received a bit fewer.  I also think there was a lot of sentimental voting as a way of showing support for mushers people particularly like, regardless of what the expected outcome would be.

In the rookie race, Joar was the hands-down favorite.  If you follow dog mushing at all he would have had to have been your choice.  He received 92 votes for the Rookie Award, and the closest competitor was Travis Beals, at 45.  Josh Cadzow received 42, which surprised me quite a bit - I would have thought he'd get the second-most votes, based on past performance.  If anybody has any insight into why Travis got more votes from fans I'd be really interested to hear your take on it.

The tables are below, with names, votes, and actual placement.  Some teams are still on the trail so the final standings aren't complete, but 36 teams are in and I think that's enough to get a handle on how well the Fantasy Iditarod bets line up with the actual results.

Fantasy Iditarod bet counts
Name Bets Placement
Aliy Zirkle 232 2
Dallas Seavey 206 4
Martin Buser 156 17
Lance Mackey 127 19
Joar Leifseth Ulsom 98 7
Jeff King 97 3
Jake Berkowitz 86 8
Ramey Smyth 80 20
Mitch Seavey 74 1
DeeDee Jonrowe 71 10
Gerry Willomitzer 71 withdrawn
Kristy Berington 70
Mike Ellis 68 30
Travis Beals 68
Brent Sass 65 22
Josh Cadzow 61 14
Matt Giblin 54
Newton Marshall 51 scratched
Peter Kaiser 50 13
John Baker 47 21
Cindy Abbott 46 scratched
Allen Moore 44 33
Aaron Peck 42
Anna Berington 42
Richie Diehl 40 36
Cim Smyth 40 15
Matt Failor 38 28
Paige Drobny 36 34
Mikhail Telpin 32
Christine Roalofs 31
James Volek 30
Aaron Burmeister 28 11
Scott Janssen 28 scratched
Jason Mackey 27 scratched
Paul Gebhardt 26 16
Charley Bejna 25 scratched
Nicolas Petit 25 6
Jessie Royer 23 18
Luan Ramos Marques 23
Mike Williams Sr 23
Wade Marrs 22 32
Jodi Bailey 21
Karin Hendrickson 21
Angie Taggart 20
Jan Steves 18 scratched
Michelle Phillips 18 24
Justin Savidis 17
Ken Anderson 17 12
Ray Redington Jr 17 5
Jim Lanier 15 35
Michael Williams Jr 14 23
Curt Perano 14 27
Kelley Griffin 14 26
Louie Ambrose 13
Bob Chlupach 11
Jessica Hendricks 10 25
Robert Bundtzen 9 scratched
Kelly Maixner 9 31
Sonny Lindner 7 9
Cindy Gallea 7
Rudy Demoski Sr 6 scratched
Linwood Fiedler 5 29
Michael Suprenant 3 scratched
Gerald Sousa 3
Ed Stielstra 3 scratched
David Sawatzky 1 scratched

Fantasy Iditarod rookie bet counts
Joar Leifseth Ulsom 92
Travis Beals 45
Josh Cadzow 42
Paige Drobny 32
Mike Ellis 30
Richie Diehl 23
Cindy Abbott 11
Mikhail Telpin 10
Charley Bejna 9
James Volek 9
Christine Roalofs 8
Luan Ramos Marques 7
Louie Ambrose 6
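
One way to put a number on how well bet counts line up with placements is a rank correlation.  What follows is a rough sketch and not an analysis done for this post: it computes Spearman's rank correlation by hand (tied bet counts get averaged ranks), and it only makes sense for teams with a numeric placement.  The function names are invented for the example.

```python
def average_ranks(values, reverse=False):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(bets, placements):
    """Spearman rank correlation between bet counts and finishing places."""
    # Most bets = rank 1, so agreement with placement comes out positive
    rx = average_ranks(bets, reverse=True)
    ry = average_ranks(placements)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# First five rows of the bet table, just to show the call
bets = [232, 206, 156, 127, 98]
placements = [2, 4, 17, 19, 7]
print(spearman(bets, placements))
```

A value near 1 would mean the crowd's ordering matched the finish order closely, near 0 would mean no relationship, and a negative value would mean the crowd had it backwards.  Five rows is far too few to conclude anything; the call is only there to show how the finishers from the full table would be fed in.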

Saturday, March 9, 2013

Something interesting from the "analytics"

I know I'm getting pretty repetitive about the limitations of IonEarth's so-called "analytics," so I thought it might be novel to write about something I just noticed that may or may not be useful.  I was looking at Aliy's and Martin's plots for the last 24 hours or so, and it appears to be the case that while both Martin and Aliy are now traveling slower than their average speed while moving (although I don't think IonEarth's numbers here are particularly reliable), the difference between Martin's current speeds and his average speed is larger than the difference between Aliy's current speeds and her average speeds.  That is to say, he's slowed down more than she has.  Or at least I think that's the case - we're relying on visual guesstimates here but the y-axis scales seem to be the same.

So, at long last - IonEarth finally produces an insight not otherwise easily recognizable.

Thursday, March 7, 2013

Using the "analytics" to answer questions

I think that I've probably underestimated the value of the IonEarth "analytics."  With all due respect to Iditarod fans it's my sense that many of them are from warm places and don't have much (or any) experience with dogsled racing.  It shouldn't be surprising that they tend to be less knowledgeable than, say, Yukon Quest fans or Two Rivers 200 (this weekend!) fans.  And so the "analytics" are showing them things they don't know and are finding enlightening.  I could be wrong but it's my sense that this year we've got fewer people freaking out every time a team stops moving, because fans are coming to understand that there are run/rest cycles, that some teams prefer to take long rests on the trail rather than at checkpoints, etc.  That's definitely valuable.  The "analytics" have an important educational role to play.

However, they have a less useful analytical role to play, at least in terms of utility in answering the sorts of questions I've been wondering about.  Even when you clear away some of the bad visual design and disregard some of the odd "analytical" decisions (seriously: who thought plotting altitude against time was a good idea?), there are still some problems.  For example, quite possibly the most common question someone might want answered is whether one team is gaining on another or falling back.  Right now Martin Buser is the furthest down the trail, and Aliy Zirkle is chasing him, having just come off her 24.  Is she gaining ground?

So, here's how I'm approaching the problem.  It involves a ridiculous amount of clicking and may not be the most efficient way, and I'd love to hear from people who've come up with better ways to answer this question.

The first thing I do is remove all mushers from the tracker, then add back the two I'm interested in.  This reduces the likelihood of annoying unwanted pop-ups should my mouse cross another musher.  Unfortunately the "add musher" feature in the tracker uses bib numbers, which is kind of insane when you think about the number of teams in this race, but it is what it is so to find the two I'm interested in I either sort by name, or I sort by trail mile and then use the list to find their bib numbers.  (You can sort the "Selected mushers" list by clicking on the column header of interest).

Once I've got the bib numbers, I remove all mushers and then add the ones I'm interested in.  Right now, that leaves me with this:

[This is actually the second screenshot I took of this.  The screen updated during the first attempt and they threw up a big "updating" alert box.  I hope that sometime between this Iditarod and next, IonEarth hires someone who understands a little bit about software development and user interfaces and this kind of amateurish nonsense goes away.  Or better still, that Iditarod hires a better tracking service.]

Okay, so here's the problem: given that the tools IonEarth provides don't give an easy way to assess how teams are moving in relation to one another, what do I do?  I figure I've got some basic choices:

  1. Use the instantaneous speed reading - how fast both teams are moving as of the most recent reading
  2. Average the instantaneous speed reading over some number of readings, say 6 (an hour) or 3 (a half-hour)
  3. Measure the distance apart at several different points and see how it's changing.

Note that I am not looking at either the average speed or the average speed while moving.  The reason I'm not is that they're averages from the beginning of the race.  They fail to capture anything about overall trajectory.  A team that was moving very fast on Sunday and is plodding today could have the exact same averages as a team that was plodding on Sunday and is very fast today.  IonEarth's averages may or may not be interesting in and of themselves, but they don't help answer this question at all.

I've decided that my best bet is to see how distance between the teams changes over time.  This should be more-or-less equivalent to averaging the instantaneous speed readings over the same period, but I've noticed some odd readings in the instantaneous speeds and suspect that they're not that reliable (too many 0s).

So, what I'm doing is calculating the difference between them now by subtracting Aliy's trail mile (427) from Martin's (446) and finding that they're roughly 19 miles apart.  I'll then move backwards 20 minutes by using the dropdown time menu to the left of the map (the terrible user interface design choices just keep piling up, don't they?) to move the two teams backwards through time (if only I could find a way to do that for myself ... ).  20 minutes before the most recent reading, Martin was at trail mile 443 and Aliy was at trail mile 425, or they were 18 miles apart.  That is to say, Martin's gained about a mile in the last 20 minutes.  Going back an additional 20 minutes, Martin was at trail mile 441 and Aliy was at trail mile 423, also about 18 miles apart.  So, it looks like they're traveling at roughly the same speed, with Martin just a hair faster.
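
Once the trail miles are written down, the subtraction is easy to hand to a few lines of Python.  This is only a sketch of the bookkeeping: the readings are the three sets quoted above, typed in by hand (the tracker offers no export that I know of), and the function names are invented.

```python
# Trail miles read off the tracker, oldest first: (minutes ago, Martin, Aliy)
readings = [
    (40, 441, 423),
    (20, 443, 425),
    (0, 446, 427),
]

def gaps(readings):
    """Return (minutes_ago, leader_mile - chaser_mile) for each reading."""
    return [(t, lead - chase) for t, lead, chase in readings]

def gap_change(readings):
    """Change in the gap from the oldest reading to the newest.

    Positive means the leader is pulling away; negative means the
    chaser is gaining.
    """
    g = gaps(readings)
    return g[-1][1] - g[0][1]

print(gaps(readings))        # [(40, 18), (20, 18), (0, 19)]
print(gap_change(readings))  # 1
```

With whole-mile readings a one-mile change over 40 minutes is within rounding noise, which is another reason to take more than two or three readings before drawing conclusions.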

With the Trackleaders race flow plot this is graphically displayed in a way that you can take it in an instant, but given that this is what we've got, we can figure out how to get questions answered anyway, with a few more steps and a lot more effort and with some loss of information.  The main thing is understanding that the race isn't just a matter of a team moving through space and time, it's a matter of a lot of teams moving through space and time and having their relationships constantly shifting as a result.  It would be nice to have tools that represented those relationships better.  But in the meantime, I think that if we can figure out what questions we want answered we can also figure out how to answer them with the tools at hand.

Wednesday, March 6, 2013

Measurement artifacts in the Iditarod altitude plot

I just really don't like having altitude and temperature curves on the Iditarod speed plots, so when I start to look more closely at someone's "analytics" (I still cannot describe a speed/time plot as an "analysis" - sorry) the first thing I do is make them go away.  But today someone asked me a question about something he'd seen on a plot and while looking at his screen grab I noticed that someone's altitude was changing while the sled wasn't moving (i.e. when the speed was 0.0).  So, I decided it might be fun to take a look at someone who was parked a long time and see what happened.  Here's Martin Buser's 24 in Rohn:

As you can see, the altitude line is basically level but wiggles a bit (and I wish they'd get rid of the drop shadow on that line - it's pretty but it makes it a little more difficult to read the curve - more chartjunk!). How much does it actually vary?  Well, running my cursor back and forth and back and forth and back and forth and back and forth over the curve, the highest value appears to be 1489 feet on 3/5 at 11:10am, and the lowest value appears to be 1309 feet on 3/5 at 1:40pm.  In other words, by the plot the sled lost 180 feet of altitude in 2 1/2 hours while sitting still.

So, what's really going on?  A couple of possibilities:

  • The tracker is using barometric pressure to measure altitude, and this is due to normal fluctuation or weather changes.  Not likely.
  • The GPS in the tracker is triangulating altitude, along with lat/long.  This tends to be much less reliable than GPS lat/long readings and can fluctuate more, and is the more likely explanation.

Note that this represents an error range of about 12%.
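
For anyone checking the arithmetic, here it is spelled out.  The post doesn't say what the 180 feet is being compared against; the sketch assumes it's the highest reading, which lands on roughly 12%.

```python
high_ft = 1489  # highest altitude reading during the rest
low_ft = 1309   # lowest altitude reading during the rest

spread_ft = high_ft - low_ft
# Assumption: the percentage is taken relative to the highest reading
spread_pct = 100 * spread_ft / high_ft

print(spread_ft)             # 180
print(round(spread_pct, 1))  # 12.1
```

Measured against the lowest reading instead, the same 180 feet comes out closer to 14%, so "about 12%" is if anything on the generous side.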

[n.b. IonEarth will redraw the "analytics" while you're looking at it, taking it back out to the full scale even if you've zoomed in.  Thanks tons, guys - your stuff is a pleasure to work with.]

Tuesday, March 5, 2013

The Mike Williamses are so confusing!

Chris got home tonight and said "I think they swapped the Mike Williamses."  I had no idea what she was talking about but when she said "the tracker!" I thought "not too likely."  Well, she's right - the Mike Williams Jr on the tracker is the Mike Williams Sr on the leaderboard, and vice versa.  They swapped the bib numbers but, as it turns out, not the humans.  Here's how I approached figuring it out.

Since they arrived in Nikolai about an hour apart I thought I'd use the tracker's history mechanism to see which the tracker thinks arrived first.  The first thing I did was to remove all the mushers, then add the Mike Williamses back in (note to Iditarod: ordering the mushers by bib number on the "Choose your mushers" drop-down menu when there are 66 mushers is really dumb).  Then I took a look at the leaderboard to see when a Mike Williams first arrived in Nikolai.   We have a Mike Williams whose bib number is 46 arriving at 15:36, and a Mike Williams whose bib number is 35 arriving at 16:33.

That gave me a rough idea of how far back to rewind the tracker in order to see which Mike Williams it thought arrived at 15:36.  Here's what the tracker showed at 15:30 this afternoon:

So it thinks bib 35, Mike Williams Sr, arrived at 3:36pm, and the leaderboard thinks it was bib 46, Mike Williams, Jr.  We'd sort of expect it to be Mike Williams, Jr, given that he's a very speedy guy who's expected to place very high.

However, this:

It appears to be the case that they've associated the right name but the wrong bib number with that particular tracker.  With the unfortunate reliance on bib number in the IonEarth user interface it would be nice if they could fix it, but given the ham-handed way they treat data I would not expect them to be able to move retrospective race data correctly when and if they fix this, so there's a reasonable expectation of buggered-up speed calculations, run/rest schedules, and whatnot.

Seriously, Iditarod: do you really want to stick with this tracking service?

Exploring the "analytics" a little further

I like to use race analytics to try to answer questions I have about how a given race is unfolding or to reveal something otherwise not obvious, and I thought it might be worth spending some time looking at what's possible with the new IonEarth "analytics" (I have a hard time calling plots of speed against time  an "analytic," but we get what we get).  So, here's Justin Savidis's "analytic" plot:

First, let's get rid of the chartjunk - the stuff that doesn't add information but clutters up the plot with a bunch of noise.

Aaaah - much better.  Goodbye, altitude curve and insane, ridiculous temperature curve!  To get rid of a given curve, click its name in the box to the right side right above the plot.

One of the things I do like about the IonEarth analytics is that they flag checkpoint locations on the x (horizontal) axis (and it should be noted that this is really the only place in their "analytics" where they provide distance information).

When the instantaneous speed curve (the green one) is horizontal and at 0 on the Y axis (chartjunk alert!  IonEarth needs to move some of those labels to the right side of the plot and move temperature up to the altitude curve), the team isn't moving.  What we can see is that Justin's longer rests have almost all been in checkpoints so far.  We can also see that he stopped for about three hours on the trail earlier this morning.  So that's kind of revealing.

Contrast this with Brent Sass's plot:

For the most part Brent is taking his rest on the trail.  This should not be a surprise, for a few reasons:

  • Brent is racing and he's not going to let checkpoint placement control his run/rest schedule
  • Brent's got mad skillz and is expert at wilderness travel.  He knows how to camp and he's comfortable doing it

None of this is hugely surprising.  But one thing we can learn from these "analytic" plots is teams' run/rest schedules, and it's enormously interesting to compare those.  And, of course, it was on this plot that we could tell that Martin was just not giving his dogs a break as he hot-footed it to Rohn.

It's a lot harder to use IonEarth's "analytics" to do things like figuring out who's traveling together (often sort of interesting) and how teams are moving in relation to each other.  I'll start to look at those questions in subsequent posts.

Monday, March 4, 2013

Iditarod, Monday morning

So, it's Monday morning and the talk is all about Martin Buser opening a big gap on the rest of the field.  It's 8:40 here in Alaska and according to the tracker he's at trail mile 156 while the person running in the second position, Matthew Failor, is at trail mile 132.  That's a difference of 24 miles, or roughly three hours of running time.  Perhaps more interesting is that Lance Mackey is resting at trail mile 109, or 47 miles behind Martin.

Before the race started commentators and mushers alike stressed the importance of run/rest cycles, how they figure into the race, and what it means to keep your dogs perky.  I don't think we should forget that now that we've got a jackrabbit, especially since we're less than 24 hours into the race and we don't know whether or not it's going to work for him.

So, here are the IonEarth speed/time plots for Martin and Lance, with some chartjunk removed for clarity.  Martin's plot is the upper one; Lance's the lower:

The thing that pops out here is that Lance is banking a lot of rest and Martin is banking what appears to be none.  Lance took a three-hour break last night at 10pm and has now been parked for nearly 5 hours.  This is less rest than the old-school even run/rest schedules that used to be common, but it's still a good amount of rest.

Also note that Martin is moving more slowly than other teams (when they're moving) so it's crossed my mind that it's *possible* that he's carrying one or two dogs at a time to give them some rest rather than breaking the whole team, but there's no way to tell.  He's averaging 9mph while moving and Lance is averaging 9.7mph while moving, which is a respectable difference (although to be honest I don't really trust IonEarth's averages, so while I'm probably more confident than not in these numbers I'm not 100% confident).

Also worth mentioning and this should definitely figure into your thinking about positions: Martin has the lowest bib number, which means that he owes the most time (over two hours).  To figure out the difference between what he owes and another team owes, subtract Martin's bib number from the other team's bib number and multiply by two.  That's the number of minutes more that Martin owes.
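
The rule is simple enough to write as a one-line function.  This is a sketch of the rule exactly as stated above, nothing more; the function name is invented and the bib numbers in the example are picked for illustration, not taken from any particular pair of teams in contention.

```python
MINUTES_PER_BIB = 2  # the "multiply by two" in the rule above

def extra_minutes_owed(earlier_bib, later_bib):
    """Minutes the earlier starter owes beyond what the later starter owes."""
    return (later_bib - earlier_bib) * MINUTES_PER_BIB

# Illustrative bib numbers only
print(extra_minutes_owed(35, 46))  # 22
```

So a lead of less than 22 minutes held by bib 35 over bib 46 isn't really a lead at all until both have taken their 24.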

Sunday, March 3, 2013

Analytics and smell tests

One of the things I've really tried to hammer on over the last year or so is that the statistics that we talk about and that race organizations offer up need to be connected to something going on in the real world, and that they need to make sense.  Whether or not they're useful depends on whether or not they reflect something real and on whether or not they have any explanatory power.  I feel pretty strongly that tracker analytics are worthless without meeting those two conditions.

And so it is that I cast an eye upon the IonEarth analytics and find myself scratching my head.  This one is really easy, popped out immediately, required absolutely no arithmetic.  And here it is, tracker info on Rudy Demoski:

Here we are, less than three hours into the race.  Rudy is just a little past Scary Tree, at about race mile 25.  So we take a look at the IonEarth run/rest calculations, and they claim that he's taken about 1:20 of rest, and been running 2:40.  First, this fails the smell test just on the rest time alone.  It's quite unlikely that someone's already taken 1:20 rest before getting to mile 25 of the Iditarod, although we'll circle back around to this later.  If we add 1:20 and 2:40, we get 4 hours.  So, has Rudy been running 4 hours at 5:50 pm on Sunday?  Not very likely, since the first musher left at 2:00pm.  Bib 39 definitely did not leave before the race started.  If we drop the 1:20 rest, does the notion that he's been running 2:40 make sense?  Maybe.  By the clock he should have left at 3:14pm, and if he'd been running 2:40 at 5:50 the tracker thought he left at 3:10.
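
This kind of smell test is easy to automate, which is part of why it's frustrating that nobody did.  Here's a minimal sketch using the numbers above; the variable names and the `hm` helper are made up for the example, and it says nothing about how IonEarth actually computes anything.

```python
from datetime import datetime, timedelta

def hm(text):
    """Parse an 'H:MM' duration into a timedelta."""
    hours, minutes = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))

race_start = datetime(2013, 3, 3, 14, 0)       # first musher out at 2:00pm
now = datetime(2013, 3, 3, 17, 50)             # time of the tracker reading
official_start = datetime(2013, 3, 3, 15, 14)  # bib 39's start by the clock

claimed = hm("1:20") + hm("2:40")  # rest + run as reported by the tracker

# Check 1: the claimed total can't exceed the time since the race began
print(claimed > now - race_start)  # True: 4:00 claimed vs 3:50 available

# Check 2: the start time implied by the run time alone
implied_start = now - hm("2:40")
print(implied_start.strftime("%H:%M"))  # 15:10
print(official_start - implied_start)   # 0:04:00
```

Check 1 fails outright (that's the smell), while check 2 comes within four minutes of the official start, which is close enough given that all the times here are read off a screen.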

So that all makes sense, but where did the 1:20 rest come from?  Here's my guess:  IonEarth was incredibly sloppy in their treatment of pre-event data in the Junior Iditarod, including readings from when the trackers were in someone's car or truck on the way to the start.  So, early in the race you'd see average speeds of something like 45mph.  So what I think is that they truncated the pre-event data up to the start from some of their calculations and displays, but not the run/rest calculations.  The tracker was on Rudy's sled and turned on for 1:20 before he started the race.

So the other thing that doesn't pass the smell test, as it relates to rest time, is the average speeds.  If someone is taking rest their average speed will fall relative to their average moving speed (and IonEarth, seriously, kill that "moving average" label, at least if you don't want people to know that you don't have people working on this who've been through an introductory statistics class - it doesn't mean what you're using it to mean, and I think you could be using actual moving averages to provide better insights into your data).  Here, they're identical, as you'd intuitively expect early in the race.  If he'd really spent 1/3 of his race time parked, the average speed would be substantially lower.
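
To make that concrete, here's what the two averages would look like if the claimed rest were real.  This is a sketch with round numbers: the 25 miles is the approximate trail mile mentioned earlier, and the function is invented for the example.

```python
def average_speeds(distance_miles, run_minutes, rest_minutes):
    """Return (overall average, average while moving) in mph."""
    moving = distance_miles * 60 / run_minutes
    overall = distance_miles * 60 / (run_minutes + rest_minutes)
    return overall, moving

# About 25 miles covered; 2:40 running and 1:20 resting, as the tracker claims
overall, moving = average_speeds(25, 160, 80)
print(overall, moving)  # 6.25 9.375
```

A team that had really been parked for a third of its time would show an overall average about a third lower than its moving average.  Identical averages say there was no rest.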

So, we're not off to a good start, IonEarth analytics and I.  This is just really sloppy, and it's pretty clear that it did not go through any QA process.  There are a couple of things they can do.  One is to manually edit the raw data down.  Preferable, I think, would be to include the team's actual start time in their input data and use the Holy Mackerel of really simple programming to completely exclude anything that shows up before the start time (or after the finish time).  What they have now is internally inconsistent, confusing, and lacks the ability to explain what's happening on the trail.
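
The start-time filter really is about as simple as programming gets.  Here's a sketch of the idea, assuming readings arrive as (timestamp, speed) pairs; that format, the function name, and the sample readings are all made up, since I have no idea what IonEarth's data actually looks like internally.

```python
from datetime import datetime

def within_race(readings, start_time, finish_time=None):
    """Keep only readings taken between the team's start and finish.

    readings: iterable of (timestamp, speed_mph) tuples, a made-up format.
    """
    kept = []
    for timestamp, speed in readings:
        if timestamp < start_time:
            continue  # tracker was on before the team started
        if finish_time is not None and timestamp > finish_time:
            continue  # tracker still on after the finish
        kept.append((timestamp, speed))
    return kept

# Hypothetical readings around a 3:14pm start
start = datetime(2013, 3, 3, 15, 14)
readings = [
    (datetime(2013, 3, 3, 14, 0), 0.0),   # parked before the start
    (datetime(2013, 3, 3, 15, 0), 0.0),   # still parked
    (datetime(2013, 3, 3, 15, 20), 9.5),  # on the trail
    (datetime(2013, 3, 3, 15, 30), 10.1),
]

print(len(within_race(readings, start)))  # 2
```

Run every calculation (averages, run/rest, all of it) on the filtered readings and the phantom rest disappears, along with the 45mph truck rides.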

Saturday, March 2, 2013

Terrific improvements in the Iditarod website

I'd like to give a nod and a huge "Thank you!" to the Iditarod Trail Committee for one of the things they've done for their website.  It looks like the site, and the Iditarod media content, are being served out of Amazon Web Services.

What this means in practice is improved performance, potentially much improved performance.  Amazon Web Services is what's known as a "cloud" service, and it includes the ability to add resources (computational, disk, bandwidth) when they're needed and remove them when they're not.  It makes it possible to move services around, keeping them available even in the face of hardware failure. If they do a good job, and Amazon does an excellent job, you won't notice it at all because the data will flow smoothly and services will remain available.  I've been an Insider subscriber since they first started charging for it (although the last few years I haven't gotten a video subscription, but this year I have), and it seems like year after year after year after year they knew exactly how many subscribers they had and still underprovisioned their services, resulting in poor video performance, site outages, etc.  This should solve that problem.

I should mention that the first few days that they started to see a lot of traffic there were some performance issues that appeared to be related to their database servers rather than their web services, and it looks like they've sorted that out.  So, credit where credit is due, and many thanks to the ITC for taking these problems seriously and taking the right steps toward getting them fixed.