Saturday, January 26, 2013

Tidying up a little more

Okay, now that we've figured out how to copy the NL 300 leaderboard spreadsheets to a place where we can sort them and (hopefully) merge them into a set of data from multiple (or all!) checkpoints, we need to reformat them into something that we can do arithmetic on.  Note that in the best of all possible worlds the data will already be in a format you can use, and you'll be able to roll up your sleeves and go to town on race data immediately.  However, we don't live in the best of all possible worlds, and you've got to work with what you have, so here we are.

[Throughout I'm going to assume that you've got basic spreadsheet knowledge but aren't a hardcore spreadsheet jockey.  You know how to reference cells and do calculations based on cell contents (that is to say, you know to preface a calculation with a '=') but aren't necessarily familiar with some of the more obscure issues around formatting or with spreadsheet functions]

The first problem is that the time is represented in a 12-hour format rather than a 24-hour format, so that the spreadsheet doesn't know whether "2:00" is 2am or 2pm.  For the first checkpoint this is straightforward - everybody arrived after noon but before midnight, so we can add 12 hours to get to a 24-hour representation.  How do we do that?  With a formula, plus a little bit of information about data formats.

Any piece of data is just basically a bag of bits.  When a computer sees the bits 01000001 it doesn't know (or care) whether that's an ASCII 'A' or the decimal number 65 or hexadecimal 41, etc.  It's up to the software that manipulates and displays that set of bits to know and/or decide what it is and how to deal with it.  And so it is with spreadsheets that you often need to tell a spreadsheet what format is used in a given cell.  Google often guesses correctly, but not always.  It has a reasonably rich set of time formats (as well as some handy time functions) and you can do a surprising amount of work with not very much effort.

So, if you go to the format menu you'll be able to see some of your options for displaying times and dates:


However, Google is smart enough to guess that when it sees three sets of two digit numbers separated by colons, it should treat it as a time-formatted value.  And when you want to manipulate something that's formatted as time, you'll need to make sure your numbers are in the right format, too.

Here's an example: we want to add 12 hours to arrival time in Yentna, outbound.  The obvious thing to do would be to take a cell that has the time for a given musher and add 12, but the spreadsheet has no way to know 12 whats you want to add.  12 hours, 12 minutes, 12 seconds, ... ?  So here I'll add a column to the right of the arrival time column to hold the adjust time, and try some arithmetic:



So, that doesn't pass the smell test - that's obviously not useful.  Although, come to think of it, a world with a 291:00 o'clock might make it easier to get more done in a day.  Anyway, the way to fix this is to use the "time" function to tell the spreadsheet that you want to add 12 hours.  The time function takes three arguments: hours, minutes, and seconds.  And what we're going to do is add time(12,0,0) to the contents of column C (the arrival times).



And that, by golly is correct.  The next thing to do is to copy the contents of that cell and paste it into every cell in that column that has a musher in it.  The spreadsheet is smart enough to figure out that you want to reference a cell in that row and makes adjustments when you paste it in:



That's better!  But it's not perfect - in a multiday race, we need to know which day, too.  Let's start with an easy situation - teams were going in and out of the checkpoint today.  Let's add another column for our new date/time arrival times.  We're going to use the "today()" spreadsheet function to get today's date, and add our corrected times from the previous step (note that I've added some column headings so that I can keep track of what I'm doing):



And once again, we'll copy this formula and paste it into the other rows in the column to fill out the table.

But wait!  The mushers didn't arrive at the checkpoint today (Saturday), they arrived yesterday.  So here's how to add a specific date, rather than today.  Instead of using the today() function, do the same thing but use the date() function instead.  It takes three arguments in this order: year, month, day.  Here's how it looks when I add January 25, 2013:


So, that's pretty easy.  As usual, copy the contents of that cell and paste into the other rows.

The next post will show how to deal with figuring out if you've crossed into the next day, and how to deal with missing data.

No comments:

Post a Comment