Monday, January 14, 2013

Experience using spreadsheets for Copper Basin 300

This weekend I've been using a Google Drive spreadsheet to follow the Copper Basin and do some basic calculations.  The spreadsheet is here.

So, a couple of observations:

  1. After about three checkpoints the single sheet approach dissolves into a very big mass of numbers, and it should be broken out in some form to make it more digestible.  Chris used a different strategy, and created sheets for each checkpoint.  It seems to me that doing that plus having a front sheet with summary statistics might be a good way to handle it.
  2. Using spreadsheets helps meet the goal of only having to enter a piece of data once, reducing both errors and effort
  3. Speaking of reducing errors and effort, having the spreadsheet calculate time differences, speed, etc. is an enormous win.  This can't be overemphasized
  4. However, if you're going to start calculating time differences and summary statistics, you probably need to be sophisticated enough to understand problems around missing data and how to cope with them.  Google spreadsheets and Excel have a basic "if then else" control structure that's easy to use if you're comfortable with lots of nested parentheses.  OpenOffice Sheet may, but I haven't looked at it 
  5. The rudimentary plotting stuff is helpful but you'll need a scratch space for the calculations.  I used a spare sheet that I just threw numbers into more-or-less willy-nilly.  When plotting histograms, etc. if you aren't able to reduce the data in some way you'll end up with a completely flat distribution, since it's rare that two teams are traveling at precisely the same speed.  So, using the "floor" and "frequency" functions I was able to group speeds by half-mph increments and plot those.  I didn't love the results but I think they can be improved a lot and may end up being useful.  Dunno, though.
  6. It is possible to provide multiple views into the data; for example display the musher stats for a checkpoint or checkpoint stats for a musher.  At some point, though, the spreadsheet turns into a big hairball and it becomes hard to understand the relationships and dependencies between sheets and cells, and in the long run it might be better to have better software that's written specifically for dogsled races
  7. Having something handles the arithmetic easily allows us to display a wider range of summary statistics and provide more information about how the race is playing out.  A lot of people are interested in those summary statistics but not everybody is, and it's important not to allow the analytical tables to clutter up the basic display and make it harder to read.
Jodi Bailey made some good suggestions on my Facebook page:
 I like the accumulated rest & average speed columns. I am thinking the plots show how many mushers were in each speed zone [ ... ].  Might be more interesting if you could plot mushers on it? Is that possible, may get busy looking. Also would love if the sheet could be sorted by current standings? Which would let people see that being high in the standings early on does not always reflect speed or rest time, which have a big impact. And maybe a simple check box for when a differential has been paid back to the clock, cause if that is calculated in rest it gets confusing on minimum mandatory rest races. (my 2 cents)
So, now that I've got a big pile of data, here's an opportunity to try a few things out, see what works well, see what's unnecessarily complicated, and try to understand what's involved in adapting it for different races.   I'd be really interested in hearing what people think about the spreadsheet that's there, and what they'd like to see be done differently.  (And, as always, let me know about any errors you find.)

No comments:

Post a Comment