Friday, January 18, 2013

Raise your glass




if you are wrong in all the right ways

I come from a technology culture in which it's taken as a given that if you aren't making mistakes it's probably because you're not actually doing anything.  Having someone else point them out is an opportunity to fix something broken, and make it better.  That's a very, very good thing.  I tend to work in environments in which a lot of people review my stuff and over the years I've come to see it as a big net positive when someone finds an error and I'm able to fix it before it propagates.

But sometimes - too often, probably - I forget that the rest of the world tends not to operate like that, and that people often regard having their mistakes identified as a personal affront.  I don't see it that way, and when I wrote my post about errors in the Copper Basin data I lost sight of the fact that many people do.  I screwed up by writing badly and not making my intent clear, and I apologize.

So here's the thing about the Copper Basin: like other races, by the end of the first night without sleep volunteers are tired, and by the second they're approaching burnout.  You've got revolving crews of volunteers at the checkpoints and different people have different ways of recording ins and outs.  Our experience with our own races has been that it's not uncommon to forget to record the number of dogs. Not everybody is equally good at arithmetic, and frankly arithmetic on times isn't very intuitive in the first place and can get downright confusing when you cross a day boundary.  Errors are inevitable.

Going forward from here, I'm interested in a couple of things:
  • good race data
  • the things you can do with good race data, and
  • making it easier for race organizations to deal with their data
The first thing is that people really are interested in your data.  Maybe not everybody, but I've been a little surprised to see that my spreadsheet from the Copper Basin is still seeing some use (I don't know by whom - the usage information is anonymized before I ever see it).  Inaccurate data are less useful, not only because they make it difficult to understand what's really going on but also because the more things are wrong, the more everything becomes suspicious.  You can't tell what's correct and what's not.

It's tempting to look at race timings as big piles of individual numbers, but there are stories buried in there that can be pulled out with some statistical analyses and with good summary graphics.  One of the reasons I'm such a fan of Trackleaders is that they are working hard to come up with new and better ways to reveal those stories to fans, whether it's the race flow chart or their (pretty bad, at least for dog mushing) projections.  I have a few things I'd like to try with the Copper Basin data when I get some time (that's becoming my theme song), which is why I care about the data quality.

Anyway, probably more immediate is the question of what can be done to make race volunteers' lives a lot easier while improving the data.  Improving the data at the cost of creating more work for the checkpoint and web staff is not an option, and making things easier for the volunteers by not showing any data, or permitting just really awful data, isn't a very attractive option.  But, there are a few things that can be done:
  • Create systems in which you only have to enter a given piece of data once.  A given piece of information, whether it's bib number, arrival time, number of dogs, whatever, shouldn't have to be retyped.  This suggests spreadsheets, databases, something to automate the process
  • Use calculators, particularly for time differences.  It may be a spreadsheet, it may be a smartphone app, it may be a website, but don't do any arithmetic that you don't have to do
    • To use spreadsheets, format arrival and departure time as number->time, and format the difference between them as hours.  Take a look at the format menu for your spreadsheet, although if you're using Microsoft Excel or OpenOffice you'll need to use the TEXT function.
    • A really excellent online resource for time arithmetic is the set of calculators at timeanddate.com.  You'll want this one.
    • As we know all too well, race checkpoints can often be in fairly remote places with no telecommunications or internet, so you won't be able to use web resources.  There are several date calculators available for free for smartphones, although I haven't tried any and would be loath to recommend one.  If none are adequate it should be quite straightforward to put together something for iPhone and/or Android, since both offer nice time picker primitives.
  • I'll acknowledge right now that this is kind of a pain in the tuches, but sanity check your data.  There are some quick heuristics that you can do to identify potential problems and protect yourself from uploading funky data.  For example,
    • the number of dogs in a team should be the same for the departure from the previous checkpoint and the arrival at the current checkpoint.  If not, ask questions
    • without having to figure out what the actual difference is between 7:25am on 1/18/2013 and 11:23pm on 1/17/2013, you can look at the last digits of the two times (5 and 3) and see whether what you get when you subtract them matches the last digit of the elapsed time calculation.  In this case, if it's not '2' you've got a problem
    • you can do the same with hours.  Even if you're not into carrying hours and days, you can get a sense for when a calculation is two or more hours off
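For anyone who'd rather let a computer do the carrying, here's a rough sketch in Python of the checks in the list above.  It's an illustration under some assumptions, not a finished tool: the function names are mine, it assumes times are recorded to the whole minute, and it assumes a run between two checkpoints takes less than a day.

```python
from datetime import datetime, timedelta


def elapsed(departure: datetime, arrival: datetime) -> timedelta:
    """Run time between leaving one checkpoint and arriving at the next.

    Subtracting full datetimes handles the day boundary for us.
    """
    return arrival - departure


def format_hm(delta: timedelta) -> str:
    """Format an elapsed time as hours:minutes, e.g. '8:02'."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


def last_digit_ok(departure: datetime, arrival: datetime,
                  recorded_minutes: int) -> bool:
    """Quick check on the last digit of the minutes.

    Hours and days are whole multiples of 10 minutes, so the last digit
    of the elapsed minutes depends only on the minutes of the two times.
    """
    expected = (arrival.minute - departure.minute) % 10
    return recorded_minutes % 10 == expected


def hours_ok(departure: datetime, arrival: datetime,
             recorded_hours: int) -> bool:
    """Quick check on the hours, without carrying.

    The true hour count is either the clock-hour difference or one less
    (when the minutes had to borrow an hour). Assumes runs under 24 hours.
    """
    expected = (arrival.hour - departure.hour) % 24
    return recorded_hours % 24 in (expected, (expected - 1) % 24)


def dogs_ok(dogs_out_previous: int, dogs_in_current: int) -> bool:
    """Dogs leaving the previous checkpoint should equal dogs arriving here."""
    return dogs_out_previous == dogs_in_current


# The example from the list: out at 11:23pm on 1/17/2013, in at 7:25am on 1/18/2013
out_time = datetime(2013, 1, 17, 23, 23)
in_time = datetime(2013, 1, 18, 7, 25)
run_time = format_hm(elapsed(out_time, in_time))
```

None of these checks will tell you what the right answer is.  They only flag an entry that deserves a second look before it gets uploaded.
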
Anyway, I'm really interested in helping to identify, and develop where necessary, tools to lighten volunteers' load, and incidentally produce more reliable results.  If you've got suggestions about what would make your life easier, definitely let me know.  In the meantime the Kuskokwim 300 just started, with some potentially dangerous weather starting to blow in.  The race is being tracked and Trackleaders has a new tool that looks pretty interesting - a replay button.  More soon!
