Tuesday, January 24, 2012

Alaska Map Resources

Nerds don't just like time and date formats, we also like Neal Stephenson novels, spicy food, and maps.  By far the best-known and most widely-used online maps are Google Maps and Bing Maps, but by no means are they the only ones.  In addition to free online topo services (for example, mytopo.com), we've got an excellent Alaska-specific online mapping service that not only includes road maps and topo maps but, in a few areas, also has higher-resolution satellite imagery than you'll find on Google or Bing.  It's the Alaska Statewide Digital Mapping Initiative, a cooperative project between the University of Alaska, the Department of Natural Resources, and other state agencies.  I live in Two Rivers, AK and spend a lot of time in the Chena River State Recreation Area, and the difference in quality there is pretty remarkable.  Here's what we get if we try to look at the confluence of the Middle Fork with the Chena in Google Maps:

Middle fork of the Chena, Google maps

Now, check out the same area from Alaska Mapped:

Middle fork of the Chena, Alaska Mapped

It varies by region, but I have yet to come across an area in which Google has higher-resolution pictures, and there are some where Alaska Mapped does, so the latter is always the first place I go when looking for satellite images of Alaska.  Also, it's got a topo layer, so it's trivial to pop back and forth among satellite images, road maps, and topo maps.  I'm not going to go into the various data services they provide because I think they'll tend to be less interesting to the casual user, but it's a fun site to explore and play around with if you like maps.

Friday, January 13, 2012

Do Twinkies really not freeze at -60F?

We're interested in other things besides the trackers.  For example, I've heard a few mushers say that Twinkies won't freeze at very low temperatures, even as low as -60F.  That seemed implausible to me, so I set out to find out if it's true.

As it turns out, it's not.

Thursday, January 12, 2012

What data would you like to see?

So far we've been pretty focused on the trackers, but they're not the only source of information about how a race is going.  There are also tables maintained by the race committees, showing start times, checkpoint arrival and departure times, and so on.  There's tremendous inconsistency from race to race about what data are shown, how they're presented, and so on.  Because we use the data to answer questions, when thinking about what data to display we need to start by identifying what questions we're trying to get answered.

That one, I think, is pretty easy: "So how's everybody doing?"  Knowing the answer to that involves information that's not readily available from the trackers, even when GPS trackers are being used.  Arrival times at checkpoints are a given.  We don't always get departure times and I would really like to see those, myself.  I'd also like to see runtimes between checkpoints and how much mandatory rest time is "owed."

As for presentation, a tabular format is inevitable, because it's both the easiest to maintain and the easiest to read.  But!  This is the 21st century and it would be fantastic if RGOs provided tables that allowed us to click on any column to sort on it.  So, if I wanted to see in what order everybody arrived at a checkpoint I could click on that column, and if I wanted to see an ordered list of runtimes I could click on that column instead.  Maybe this weekend while I'm inside hiding from the seriously cold temperatures we're expecting I'll put together a Google Docs spreadsheet that shows what I'm thinking.

So, in summary, here are the columns I'd like to see:

  • start time
  • checkpoint arrival time
  • checkpoint departure time
  • run time between checkpoints
  • how much mandatory rest is owed
Too much data are too much data and I think much more than that would result in a table that's cluttered and hard to read.
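
For the programmers in the audience, here's a minimal sketch of the "click a column to sort on it" idea.  The team names, times, and field names are all made up for illustration and don't come from any real race table; the point is just that run time falls out of departure and arrival times, and that the same rows can be ordered by whichever column you care about.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Hypothetical checkpoint log: when each team left the previous
# checkpoint and when it arrived at this one (invented values).
rows = [
    {"musher": "Team A", "depart_prev": "2012-01-07 10:00", "arrive": "2012-01-07 14:30"},
    {"musher": "Team B", "depart_prev": "2012-01-07 10:10", "arrive": "2012-01-07 14:35"},
    {"musher": "Team C", "depart_prev": "2012-01-07 10:20", "arrive": "2012-01-07 14:40"},
]

def arrival(row):
    """Arrival time at the checkpoint as a datetime."""
    return datetime.strptime(row["arrive"], FMT)

def run_time(row):
    """Run time between checkpoints: arrival minus previous departure."""
    return arrival(row) - datetime.strptime(row["depart_prev"], FMT)

# "Clicking a column" is just sorting the same rows on a different key.
by_arrival = sorted(rows, key=arrival)
by_run_time = sorted(rows, key=run_time)

for row in by_run_time:
    print(row["musher"], run_time(row))
```

In this made-up example the team that arrived last also had the shortest run time, which is exactly the kind of thing a sortable table makes easy to spot.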

That's a summary of what I'd like to see, but other people have different preferences.  What data would help you understand what's going on during a race?

Sunday, January 8, 2012

The "Race Flow" plot

The first few teams running the Knik 200 have arrived at the finish but most are still out on the trail.  During the race I became very fond of Trackleaders' new "Race Flow" graph, which plots trail location against race time.  It's a fantastic tool for visualizing how the race is actually playing out.  Right now they're only showing the first nine teams and I'm really hoping that they're going to beef it up over the coming months, adding features like allowing us to select which mushers we'd like displayed and putting better mouse controls on it.

Anyway, as of this writing the top nine teams have arrived back in Knik, and this is what a top-level view (the default) of the race flow looks like:

This is pretty straightforward.  They traveled at a pretty steady pace to Skwentna, did their mandatory layovers, and then traveled at a somewhat less steady pace back to Knik.  If you look closely you can see little wrinkles at about race mile 40, or about 4 hours into the race.  It's not too hard to guess what that's about: Yentna.  You see something similar on the return trip but it looks like it took them longer to travel from Yentna back to the start, and that shouldn't be too surprising, either.

At this level the plot doesn't provide that much insight, but let's zoom in a little to look at a couple of things that were happening towards the end of the race.  Let's start with Mike Santos, who ran a really strong race and looked like he could possibly win.  Then this happened (Mike is the purple line):

At arrow 1 we can see Mike's speed drop.  Not coincidentally, Jake (the beige line) speeds up and Mike is passed.  About 15 minutes later Mike stops (arrow 2), gets going again, stops again for about 20 minutes (arrow 3), trucks on for about an hour, and then stops again, for about another 20 minutes.  I don't know for sure what was going on but if I had to guess I'd guess he was moving dogs around to try to get his speed back (which he did).  So, these little lines are telling a story, and we can watch it as it unfolds.  

Now here's the really great story from the plot:

What we can see is that Lance (blue line) and Jake (beige line) were traveling at about the same speed (the slope represents speed - steeper is faster and flatter is slower, and these were parallel - i.e. the same speed) a little less than 1/2 mile apart when Jake made a move at about race clock 25.3.  If the curves are to be believed there was some passing going on, although this may be an artifact of the Spot updates coming in at different times.  At any rate, the lines converged, and that's when things got a little crazy.

The tracking map shows where the mushers are, but this terrific little plot shows how the race actually played out - where people rested, where they raced, where they passed, who really is traveling together, and all sorts of small pieces of data that help capture the essence of what's happening.  You can bet that I'll be glued to it during the Yukon Quest.

Projections update

Just a quick note:  everybody has now arrived in Skwentna.  If the issue with the projections were that the software calculates them on the incorrect assumption that teams are headed south when they're actually headed north, we'd expect the projections to become correct once folks turned around and started heading south for real.  That's not what's happened.  Here's a look at the current leaderboard:

At this time, according to the tracking map, Nicolas Petit is about a mile from Yentna.  The projection says to expect him at race clock time 08:10, or roughly 13 hours ago.  That led me to wonder if the issue is actually just an incorrect offset, but I don't think so.  The tracker shows Lev about 4 miles back from Nicolas but he's projected to arrive 15 minutes later, which suggests that the tracker is projecting that Lev will cover those 4 miles at 16mph, and that ain't right, nohow.
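
The sanity check above is simple enough to write down.  This is only a sketch of the arithmetic; the function name is made up and nothing here reflects how the tracker software actually computes its projections.

```python
def implied_speed_mph(gap_miles, gap_minutes):
    """Speed a team would need to cover gap_miles in gap_minutes."""
    return gap_miles / (gap_minutes / 60.0)

# Lev: about 4 miles back, projected to arrive 15 minutes after Nicolas.
print(implied_speed_mph(4, 15))  # 16.0
```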

It's worth pointing out, however, that the order in which the teams are projected to arrive in Yentna reflects what's going on on the trail, so whatever's going on here is systematic.

As always, remember the locations on the map do reflect where the team really was at the time the reading was taken, and those are data that can be relied on.

How long is the track? How long is the trail?

Melinda mentioned the conversations she and I have been having about how to look at the data, where the numbers may be coming from and what it all means. So tonight, I thought I'd chime in here on the blog and take a second look at what Melinda pointed out in her post about measuring speeds: Because the SPOT track is a succession of straight line segments, she wrote, "The actual distance traveled by the dog teams will always be longer than shown on the tracker."

How much longer? Ideally, when we receive the position of a musher from the SPOT tracker every 10 minutes, we could measure the distance travelled not simply as the straight line between these points, but along the actual race trail. This would require that we know where exactly the trail is, and how long it is between any two points on it. Unfortunately, neither of these two questions is easy to answer.

To illustrate some of the difficulties, take a look at this screenshot of Jeremy Rutledge's track while on his way out towards Skwentna. We can see two ways the GPS track deviates from the trail as it is marked in red (as always, click on the image for a larger version):

On top ("Deviation 2"), the GPS points are more or less on the Knik 200 trail, but the line that connects them cuts across a curve in the red line. This is exactly the situation we talked about, and we could fix the distance measurement by going along the red line, instead of cutting across it. However, the track points labeled "Deviation 1" show that it isn't quite that easy: Here the GPS points look like they aren't even on the trail! So was Jeremy lost or taking the wrong trail? No, I think it's the red trail line that isn't in the right place.(*)

Ouch. So on the one hand we only know where the musher is every 10 min, and on the other we aren't even sure where precisely the trail is.

But let's soldier on with our goal to measure the dog team's path along the trail. As it is marked in red on the map, the trail is nothing but another track, but with more points on it than the SPOT GPS track of the dog team. Now comes a trick: I can get the points of the trail track by poking around in the web page code. Or I could have asked Matthew Lee from Trackleaders.com. In any event, I procured a .kml file (the kind you can look at in Google Earth) that contains the red trail track. Neat. The file for the Knik 200 contains 544 waypoints, with latitude and longitude, from start to finish, which is very roughly speaking 5 trail waypoints between successive GPS points of a moving dogteam,  on the average (at Lance Mackey's speed, that is). The .kml file for the Gin Gin 200 contained 1139 such latitude/longitude pairs.

Now I have all the ingredients assembled to calculate distances along the track, as best I can. What I have to do is:

  1. Get the latitude and longitude of two GPS points (as transmitted by the SPOT trackers).
  2. Find the two waypoints on the trail track that are closest to these GPS points, taking into account the direction the team is travelling in, and applying precautions if there are loops in the trail (which can make it really really complicated to figure out which trail segment to assign to our dog team).
  3. Sum up all the distances between successive intermediate waypoints on the trail between these two points.
The reader will notice that we're still summing up straight line segment lengths -- but this time along the trail, not just along the GPS track, so the more trail points we have, the more accurate this measurement will become. And if the trail information we have isn't too wrong, the result will always be longer than the first shortest-GPS-path approximation. But it will still be less than the exact distance travelled by the dog team! (**)
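
For the curious, the three steps above can be sketched roughly as follows. This is not the actual code behind the numbers in this post: the haversine formula is one common choice that I'm assuming here, the function names are invented, and the "precautions" are reduced to searching forward only, so loops and out-and-back sections are not handled. The Earth radius is the one quoted further down, and the little four-point trail is made up.

```python
import math

EARTH_RADIUS_MILES = 3958.761  # radius quoted in this post

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def nearest_index(trail, point, start=0):
    """Step 2: index of the trail waypoint closest to point, from start onward."""
    return min(range(start, len(trail)),
               key=lambda i: haversine_miles(trail[i], point))

def along_trail_miles(trail, gps_a, gps_b):
    """Step 3: sum the trail segments between the waypoints nearest each GPS fix."""
    i = nearest_index(trail, gps_a)
    j = nearest_index(trail, gps_b, start=i)  # forward only: crude direction handling
    return sum(haversine_miles(trail[k], trail[k + 1]) for k in range(i, j))

# Made-up trail with two right-angle bends, and two GPS fixes near its ends.
trail = [(61.50, -150.00), (61.50, -149.90), (61.55, -149.90), (61.55, -149.80)]
gps_a, gps_b = (61.5001, -150.0), (61.5499, -149.8001)

print("straight line:", haversine_miles(gps_a, gps_b))
print("along trail:  ", along_trail_miles(trail, gps_a, gps_b))
```

On this toy trail the along-the-trail figure comes out noticeably longer than the straight line between the two fixes, which is the whole point of the exercise.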

OK, this was long and complicated, so let's look at a practical example. What I did was take the first half of the trail, from the start to Skwentna Road House. Measuring along the red line, I find a trail length of 82.15 miles. As Matthew writes in another comment, Trackleaders has the trail length down as 84 miles - I'm reasonably close, with the main source of error being the formula I'm using to calculate distance in miles from latitude/longitude pairs (the Earth radius of 3,958.761 miles is almost certainly not accurate up here, but I didn't have time to research the best way to compensate). Now to compare with the SPOT GPS track length, I summed up all 55 segments in Lance Mackey's track between the start and Skwentna. Result: 76.42 miles.

So what does this mean? Based on these numbers, as a rule of thumb, distance and speed measurements between successive GPS track points ("X miles traveled at Y mph") are on the average too low by about 7 % compared to along the trail-as-we-know-it, and possibly around 10 % (with a good margin of error here!) lower than the real-world distances and speeds. Obviously, the curvier a segment of the trail, the greater our error - we knew that already - and our method also averages out all the ways the trail information is incorrect. But now I have a tool to quantify these errors, and if I want, to calculate along-the-trail speeds for any team. Even if the method is really really fiddly.
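
Here is the rule-of-thumb arithmetic spelled out, as a small sketch using only the three lengths quoted above (the function name is made up). The second figure uses the 84-mile trail length Trackleaders lists as a stand-in for a longer "real" distance, which is just one possible reference.

```python
gps_track_miles = 76.42      # sum of the 55 SPOT segments, start to Skwentna
trail_line_miles = 82.15     # measured along the red trail line
listed_trail_miles = 84.0    # trail length as listed by Trackleaders

def shortfall_pct(measured, reference):
    """How far measured falls short of reference, as a percentage of reference."""
    return 100.0 * (reference - measured) / reference

print(round(shortfall_pct(gps_track_miles, trail_line_miles), 1))    # 7.0
print(round(shortfall_pct(gps_track_miles, listed_trail_miles), 1))  # 9.0
```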

(*) If you think about it, you'll come up with many ways that can happen: The Trackleaders team may never have received a real GPS track of the trail -- if the trailbreakers didn't have their own GPS device, they may have drawn the trail on a map, or they may be using a trail map from a previous year when the trail was in a slightly different location, especially on the river where there can be overflow. Also, when the weather's bad and the trail blows in, the trailbreakers sometimes make last minute changes. And then there are inaccuracies in Google maps, which are quite common up here in rural Alaska.

(**) As Matthew commented earlier, Trackleaders is multiplying the along-the-trail distance by 1.073 to get even closer to the real length. It surprises me a little that this number is apparently the same for the Gin Gin and the Knik, even though we had approximately twice as many trail points for the Gin Gin. I'd expect along-the-trail measurements for the Gin Gin to be more accurate than for the Knik.

Saturday, January 7, 2012


Well, I said I was going to look at Trackleaders's projections (scroll down to "Leaderboard," click "Refresh with projections") during the Knik 200, and I just did.  I have to admit that I'm not really sure what's going on, but Matthew Lee from Trackleaders has been wonderfully responsive to questions and I hope he can help me out here.  I hope that explaining what I find confusing might help provide some insight into how to look at these particular data.  There are two issues I've noticed:

Here's the easy one, for starters:  I'm looking at the projections for arrivals into Skwentna.  Zoya Denure is marked on the leaderboard as having arrived, but Vern Halter is not, Christine Roalofs is not, and other folks who've arrived (you can see it if you eyeball the tracker - there's a clump of them there near Zoya, they're not moving, and if you look at individual pages they've been sitting there for some time) are not.  If you look at the tracker what appears to have happened is that the checkpoint is marked on the map as being further down the trail than it actually is, or at least the dog lot is not as far down the trail as the checkpoint marker.  Zoya is the closest to the checkpoint marker, so I'm guessing that she's within some margin that allows the software to recognize her as having arrived, while everybody else who's parked is just a hair too far and the software's not marking them as having arrived even though they have.  What that means is that it's calculating projected arrival times despite them already being there.  So, sometimes you have to recognize that some data aren't going to help you much and throw them out.  As I've described in other places, if the raw data (in this case, GPS breadcrumbs) and the summaries of the data are telling you different things, trust the raw data.

But let's take a look at someone who's through Yentna and on the trail towards Skwentna: Jeremy Rutledge.  Here's how he shows up in the projections:

This was taken when Jeremy was 25.9 miles from Lance Mackey, who we know is in Skwentna.  If you click on this so that you can see the times, Jeremy and many of the others are projected as arriving in Skwentna before they left Yentna.  The laws of physics say this isn't possible and I usually trust the laws of physics, so the inevitable question is what's going on here that's incorrect.  Chris glanced at it, said "Oh, right, this is an out-and-back trail!"  If you've read this piece on what data are actually included in each message from the Spot you might remember that it's basically just a geographic coordinate and not much else.  Not speed, not direction.  So our best guess is that the projections are being calculated based on an incorrect assumption about the direction the team is traveling at this spot on the trail - i.e. the calculations might be based on arithmetic in which it's assumed the team is heading south, when it's really heading north.  We don't know that's the case but it's our best guess at the moment.

Who's in the lead?

We're roughly 50 miles into the 200-mile Knik 200, and there's a natural interest in knowing who's in the lead.  I'm going to argue that there's really no way of knowing who specifically is leading, although we can know with some certainty who's not leading.  Or put another way, we know who's in front but we can't tell who's "winning."

This gets into projecting arrival times, which involves a little bit of arithmetic, a moderate amount of specialized knowledge about the individual mushers and how they like to balance their run/rest schedules (as well as what sort of terrain and weather they favor, etc.), and a good amount of guesswork.  It's particularly dicey so early in the race, although it's easier to make some guesses about what order they'll arrive in Yentna.

So right now the frontrunners are Mackey, Jonrowe, and Berkowitz, with Petit a short distance behind.  They're all within a few miles of each other, which at this point in the race, in the absence of other factors, probably would count as statistical noise.  I did the arithmetic a few minutes ago and (assuming the trail length is exactly 200 miles, which it isn't) if Jake averages .1 mph - that's a tenth of a mile per hour - faster over the rest of the race, and adjusting for Jake's later start (I can't find his bib number but it looks like he started 20-25 minutes later than Lance, who's got bib #1), Jake will arrive back at Knik Lake roughly 20 minutes before Lance.  Jake closed much of the initial gap between them but if you go to the Race Flow plot you'll see that the slope on Lance's line and the slope on Jake's line are basically identical, which means that they're traveling basically the same speed and covering the same amount of ground at the moment.  [Note that if the trail is actually longer it takes less of a speed differential to catch up and if the trail is actually shorter it takes more of a speed differential to catch up.]  But at the same time there's the question of who's going to rest how much, and if I had to guess I'd guess Lance can get away with less rest.
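
If you'd like to play with this kind of arithmetic yourself, here's a rough sketch.  It is not a reconstruction of the calculation above (the 9 mph base speed is purely an assumption for illustration, and start-time differences and rest are ignored), but it does show the point in the bracketed note: the longer the remaining trail, the smaller the speed edge needed to make up a given gap.

```python
def minutes_gained(remaining_miles, base_mph, edge_mph):
    """Minutes a team gains over remaining_miles by averaging edge_mph faster."""
    slower_hours = remaining_miles / base_mph
    faster_hours = remaining_miles / (base_mph + edge_mph)
    return (slower_hours - faster_hours) * 60.0

def edge_needed_mph(remaining_miles, base_mph, gap_minutes):
    """Extra average speed needed to make up gap_minutes over remaining_miles."""
    target_hours = remaining_miles / base_mph - gap_minutes / 60.0
    return remaining_miles / target_hours - base_mph

BASE_MPH = 9.0  # assumed traveling speed, for illustration only

print(minutes_gained(150, BASE_MPH, 0.1))
print(edge_needed_mph(150, BASE_MPH, 20))  # shorter remaining trail
print(edge_needed_mph(200, BASE_MPH, 20))  # longer remaining trail, smaller edge
```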

At the Quest finisher's banquet last year Allen Moore talked about talking with the press in Dawson.  When they asked what he thought was going to happen in the second half of the race he said "I don't know, but it's going to be something."  If you follow sled dog racing you know that one thing you can count on in a distance race is that something unexpected is pretty much guaranteed to happen.  You also know that Lance Mackey isn't just a great dog trainer, he's also a great tactician and he knows how to keep his competition off balance.

So, it's early yet, and the tracker is fascinating but I'm not sure how much we can divine at this point.  We don't know what's going to happen yet, but it'll be something.

The Knik 200 is underway

The Knik 200 is underway, with the first mushers on the trail and a huge bunch waiting to start.  The race committee must have been eating their Wheaties this year because it's a much bigger race with a much slicker package than they've had in the past, and it looks awesome.  They've got a redesigned website, ham radio operators, and Spot trackers with tracking provided by Trackleaders.

Trackleaders are continuing to experiment with format and presentation and with every new feature they're making the race easier for fans to understand.  The race has just barely started and there's not much to report but one thing did jump out at me almost immediately, and it's this plot:

Trackleaders call this "race flow" but it's basically just total race time against total race distance, with time on the x (horizontal) axis and total distance on the y (vertical) axis.  This doesn't mean much (yet) but I just had to laugh when I saw it.  Lance Mackey's line is straight - he's been traveling at a nearly constant speed.  However, you can see that the other people who have started came out of the chute faster and then slowed down.  If you've ever stood on the back of a dogsled you know that the first couple of miles can be a little "enthusiastic" and that the dogs do settle down into a more regular traveling pace within a few miles.  Many mushers intend to leave the chute at their traveling pace but not that many people actually do.  I checked that Lance's line isn't straight simply because there are only two points on it - there are four data points, and Lance is really traveling at what looks like a dead steady speed.
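
Since the plot is just distance against time, the slope of each little segment is the speed over that stretch.  Here's a minimal sketch of that idea; the two sets of points are invented to mimic a steady team and a fast-out-of-the-chute team, and are not the actual race data.

```python
def segment_speeds(points):
    """Speed in mph for each segment of a list of (race_hours, race_miles) points."""
    return [(m2 - m1) / (t2 - t1)
            for (t1, m1), (t2, m2) in zip(points, points[1:])]

# Invented points: four fixes each, 15 minutes apart.
steady = [(0.0, 0.0), (0.25, 2.5), (0.5, 5.0), (0.75, 7.5)]
fast_start = [(0.0, 0.0), (0.25, 3.5), (0.5, 6.0), (0.75, 8.5)]

print(segment_speeds(steady))      # [10.0, 10.0, 10.0]
print(segment_speeds(fast_start))  # [14.0, 10.0, 10.0]
```

A straight line means every segment has the same slope, and you need at least three points before "straight" tells you anything.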

This should be a really fun race to follow.

Tuesday, January 3, 2012

Some oversimplified thinking about projections

I started writing about speed and pace calculations (they're a frequent source of confusion, based on Facebook comments on races) and it didn't take more than a couple of sentences to realize that it's nearly impossible to make that interesting.  Instead, I think I'd like to talk a little bit about speed, pace, and really, really simple projections of things like checkpoint arrival times.

Speed and pace are the inverse of one another, and both are calculated from time and distance.  It's very basic arithmetic but requires understanding what the relationships are.
  1. Speed: speed is the distance covered over a given period of time - for example, 10 miles per hour
  2. Pace: pace is the time it takes to cover a given distance - for example, 6 minutes per mile
As you can see, they take the same inputs (time and distance).

speed = distance / time
pace = time / distance
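
As a quick sketch, switching between the two (with speed in miles per hour and pace in minutes per mile) is the same one-line calculation in both directions.  The function names are just for illustration.

```python
def speed_to_pace(mph):
    """Pace in minutes per mile for a speed in miles per hour."""
    return 60.0 / mph

def pace_to_speed(minutes_per_mile):
    """Speed in miles per hour for a pace in minutes per mile."""
    return 60.0 / minutes_per_mile

print(speed_to_pace(10))   # 6.0 minutes per mile
print(pace_to_speed(7.5))  # 8.0 mph
```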

People seem comfortable with the concept of speed, probably because it's something most of us deal with every day: it's the measurement used in car speedometers and the one used to express speed limits.  Pace seems a little less intuitively comfortable for many people, although runners often express performance in terms of how many minutes it takes them to do a mile.  There's a pretty decent discussion of how to perform the calculations here.

Probably the most basic way to try to project arrival times is to start with these pieces of data:
  • speed or pace (how fast the team is moving)
  • distance from the destination
That's because, just as you can compute speed/pace from distance and time, you can compute distance from speed/pace and time, or time from speed/pace and distance.  Basically, if you've got any two of those pieces of data you can easily compute the third.  (See? Boring.)

Here's a really simple example: let's say that you can see from the tracker that someone is roughly 10 miles out and seems to be moving at 8mph, or is knocking out 7.5 minute miles.  If it takes them 7.5 minutes to go one mile it will take them 10 times that to go 10 miles, or 75 minutes.  Add 75 minutes (one hour and 15 minutes) to the current time and you've got a rough estimate of when they'll arrive.
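
Here's the same example as a tiny sketch.  The function name and the clock time are made up; the estimate starts from the time of the tracker fix and assumes the team holds the same speed the whole way in, which is exactly the assumption that makes it rough.

```python
from datetime import datetime, timedelta

def rough_eta(fix_time, miles_out, mph):
    """Naive arrival estimate: time of the fix plus distance divided by speed."""
    return fix_time + timedelta(hours=miles_out / mph)

fix_time = datetime(2012, 1, 7, 12, 0)  # hypothetical time of the trackpoint
print(rough_eta(fix_time, 10, 8))       # 2012-01-07 13:15:00
```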

But note that it's a very rough estimate, for a variety of reasons:
  • You'll need to factor in the age of the trackpoint - if it's a couple of minutes old, you're good.  If it's 4 hours old, you've got a more challenging problem
  • There's some natural variability in the rate of travel, so if the trail smooths out they'll start moving faster and arrive sooner, while if the trail goes to crap they may slow down quite a bit
  • The speed numbers provided by GPS trackers may not be very reliable, in the first place
  • If it's a great distance between where the team is now and their destination, they may throw in some rest breaks or even camp
  • The, er, occasional whimsical nature of dogs
  • other factors
There's an art to projecting arrival times and it's basically impossible to be right 100% of the time.  You can expect projections to converge towards reality the closer the team gets to their destination, but we tend to be more interested in projections when the teams are further out.  As the Knik 200 gets underway this weekend I'm interested in looking at how the projections on the tracker page perform.

In the meantime I love math but hate arithmetic, and appreciate the value of a good calculator.  There are many online calculators to help with speed and pace calculations.  Here's a basic one to help switch back and forth between speed and pace.