Sunday, March 3, 2013

Analytics and smell tests

One of the things I've really tried to hammer on over the last year or so is that the statistics that we talk about and that race organizations offer up need to be connected to something going on in the real world, and that they need to make sense.  Whether or not they're useful depends on whether or not they reflect something real and on whether or not they have any explanatory power.  I feel pretty strongly that tracker analytics are worthless without meeting those two conditions.

And so it is that I cast an eye upon the IonEarth analytics and find myself scratching my head.  This one is really easy, popped out immediately, required absolutely no arithmetic.  And here it is, tracker info on Rudy Demoski:

Here we are, less than three hours into Rudy's race.  He's just a little past Scary Tree, at about race mile 25.  So we take a look at the IonEarth run/rest calculations, and they claim that he's taken about 1:20 of rest, and been running 2:40.  First, this fails the smell test just on the rest time alone.  It's quite unlikely that someone's already taken 1:20 rest before getting to mile 25 of the Iditarod, although we'll circle back around to this later.  If we add 1:20 and 2:40, we get 4 hours.  So, has Rudy been on the trail for 4 hours at 5:50 pm on Sunday?  Not very likely, since that would put his start at 1:50pm and the first musher left at 2:00pm.  Bib 39 definitely did not leave before the race started.  If we drop the 1:20 rest, does the notion that he's been running 2:40 make sense?  Maybe.  By the clock he should have left at 3:14pm, and if he'd been running 2:40 at 5:50, Trackleaders thought he left at 3:10.
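That arithmetic boils down to a check anybody could automate: reported run time plus reported rest time can't add up to more than the time since the team actually left.  Here's a minimal sketch in Python using the times quoted above.  The function names and the five-minute tolerance are my own assumptions for illustration, not anything IonEarth or Trackleaders publishes.

```python
from datetime import datetime, timedelta

def implied_start(observed_at, run, rest=timedelta(0)):
    """Clock time the team must have started at, given reported run and rest."""
    return observed_at - (run + rest)

def passes_smell_test(observed_at, run, rest, actual_start,
                      tolerance=timedelta(minutes=5)):
    """True if run + rest fits between the actual start and the observation.

    The tolerance is an assumed allowance for clock and reporting slop.
    """
    return implied_start(observed_at, run, rest) >= actual_start - tolerance

observed = datetime(2013, 3, 3, 17, 50)    # 5:50 pm Sunday
bib_start = datetime(2013, 3, 3, 15, 14)   # 3:14 pm, his start by the clock
run = timedelta(hours=2, minutes=40)
rest = timedelta(hours=1, minutes=20)

# With the reported rest included, the implied start is 1:50 pm.
print(implied_start(observed, run, rest))
print(passes_smell_test(observed, run, rest, bib_start))

# With the rest dropped, the implied start is 3:10 pm, close to 3:14.
print(implied_start(observed, run))
print(passes_smell_test(observed, run, timedelta(0), bib_start))
```

With the rest included the check fails, and with the rest dropped it passes, which is the same conclusion as the back-of-the-envelope version.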

So that all makes sense, but where did the 1:20 rest come from?  Here's my guess:  IonEarth was incredibly sloppy in their treatment of pre-event data in the Junior Iditarod, including readings from when the trackers were in someone's car or truck on the way to the start.  So, early in the race you'd see average speeds of something like 45mph.  So what I think is that they truncated the pre-event data up to the start from some of their calculations and displays, but not the run/rest calculations.  The tracker was on Rudy's sled and turned on for about 1:20 before he started the race.

So the other thing that doesn't pass the smell test, as it relates to rest time, is the average speeds.  If someone is taking rest their average speed will fall relative to their average moving speed (and IonEarth, seriously, kill that "moving average" label, at least if you don't want people to know that you don't have people working on this who've been through an introductory statistics class - it doesn't mean what you're using it to mean, and I think you could be using actual moving averages to provide better insights into your data).  Here, they're identical, as you'd intuitively expect early in the race.  If he'd really spent 1/3 of his race time parked, the average speed would be substantially lower.
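To put rough numbers on that, take the figures quoted earlier: about 25 miles covered, 2:40 running, 1:20 claimed rest.  The sketch below is my own illustration (the function name is made up), and it just computes distance over moving time against distance over total time.

```python
def average_speeds(distance_miles, run_minutes, rest_minutes):
    """Return (moving, overall) average speed in mph."""
    moving = distance_miles * 60 / run_minutes
    overall = distance_miles * 60 / (run_minutes + rest_minutes)
    return moving, overall

# Figures from the post: ~25 miles, 2:40 running, 1:20 claimed rest.
moving, overall = average_speeds(25, run_minutes=160, rest_minutes=80)
print(f"moving {moving:.2f} mph, overall {overall:.2f} mph")

# With no rest at all, the two averages are the same number.
no_rest_moving, no_rest_overall = average_speeds(25, run_minutes=160, rest_minutes=0)
print(no_rest_moving == no_rest_overall)
```

If a third of the elapsed time really were rest, the overall average would be two thirds of the moving average.  Two identical averages only happen when the rest time is zero.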

So, we're not off to a good start, IonEarth analytics and I.  This is just really sloppy, and it's pretty clear that it did not go through any QA process.  There are a couple of things they can do.  One is to manually edit the raw data down.  Preferable, I think, would be to include the team's actual start time in their input data and use the Holy Mackerel of really simple programming to completely exclude anything that shows up before the start time (or after the finish time).  What they have now is internally inconsistent, confusing, and lacks the ability to explain what's happening on the trail.
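For what it's worth, the "exclude anything before the start time" fix really is about this small.  This is a sketch under my own assumptions: I have no idea what IonEarth's data actually looks like, so the `Reading` record, the field names, and the sample readings are all hypothetical.

```python
from datetime import datetime
from typing import NamedTuple

class Reading(NamedTuple):
    """Hypothetical tracker reading: a timestamp and a speed."""
    timestamp: datetime
    speed_mph: float

def in_race_window(readings, start, finish=None):
    """Keep only readings from the team's start through its finish (if known)."""
    return [
        r for r in readings
        if r.timestamp >= start and (finish is None or r.timestamp <= finish)
    ]

# Made-up sample data, for illustration only.
readings = [
    Reading(datetime(2013, 3, 3, 13, 0), 45.0),   # tracker riding in a truck
    Reading(datetime(2013, 3, 3, 14, 30), 0.0),   # on the sled, waiting to start
    Reading(datetime(2013, 3, 3, 15, 20), 10.0),  # on the trail
    Reading(datetime(2013, 3, 3, 17, 50), 9.0),   # on the trail
]

start = datetime(2013, 3, 3, 15, 14)  # the team's actual start time
race_readings = in_race_window(readings, start)
print(len(race_readings))
```

Run/rest totals and average speeds computed from `race_readings` would never see the truck ride or the wait in the starting chute.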


  1. You are definitely on it. The tech people are not connected to the real world. It's all an abstraction to them. It's the responsibility of the Iditarod organization to give them the feedback they need and to see that it is carried out.

  2. As Chris said, "Plotting speeds against time is not 'analytics.'"

  3. I just read a lot of your past posts regarding the IonEarth analytics. I agree with all your points and especially in the comparison of software between Iditarod and Quest. That RaceFlow tool can tell us so much. If we can just get Trackleaders to make a few options it would be even better.

    Keep up the interesting posts - I am interested to see how IonEarth handles odd data. I noticed an elevation glitch in Dallas' data just recently. What is with the Iditarod Insights too? Some are just way off... reading bad data in a database?

    thanks, Darren

  4. I think there are a few issues, some related and some not. The new website is being hosted out of Amazon Web Services (AWS) and my best guess would be that they either did not move the Insights data to a new AWS-hosted database or that they did and ran out of time to change the code. I don't know what it's like in the Yukon but the IT situation in Alaska is really grim.

    Unrelated to that are the many, many problems with IonEarth. My best guess would be that Iditarod went to them and said "Analytics, please!" and not really knowing what to do or having any in-house statisticians, IonEarth came up with something that's frankly a big mess. And not having any in-house statisticians they don't understand issues around dirty data. So there's a problem with bad design, both in terms of chartjunk and in terms of bad data analysis.

    But, there's also a problem with software quality and quality assurance practices. I'm sure they never load-tested this thing because if they had they'd have discovered the problem with blank/disappearing analytics. And because the IT situation in Alaska is as isolated and out-of-touch with current best practices as it is, Iditarod probably didn't have any sort of acceptance testing or criteria because they didn't know to write it into the contract.

    And then there's the attitude problem. IonEarth used to have this wonderful guy named Russ who worked with fans to get bug reports and feature requests flowing in and out of the company, and he'd also do tech support, answer user questions, etc. I swear this guy didn't sleep during Iditarod. He wasn't doing that for IonEarth last year and they replaced him with this surly guy who responded to bug reports and feature requests with some pretty defensive and unhelpful posts. Now there's nobody, which I guess is an improvement over Surly Guy but isn't going to help them sort out their problems.

    Sometimes I feel like the jackass from Silicon Valley when I talk about this stuff but I'm not sure it's understood by either the Iditarod or the Quest that now that they've got these huge international audiences they've got people watching and participating who are used to functioning the way the world outside Alaska functions.