Thursday, March 22, 2012

Another tracking service!

The 2012 Percy DeWolfe started this morning.  It's a very nice, small race from Dawson City to Eagle and back on the Yukon River.  This year there are six entries in the main race, with additional entries in the shorter "Percy Junior."

I was a little surprised to see that they were providing trackers (although it's something we've talked about doing for the Two Rivers 200, another friendly, small mid-distance race).  When I saw that trackers would be available I assumed it would be through Trackleaders, since they're low-cost, use inexpensive hardware, and are familiar to the dog mushing community because they're used in so many races.

So, I was very surprised to see that the tracking service is being provided by Mammoth Mapping, a Dawson City-based GIS company.  They're projecting locations from Spot devices onto an embedded Google Earth map.  They aren't doing much beyond projecting locations, although if you click on each of the mushers you'll see their checkpoint times, which I gather are pulled out of manually-entered checkpoint timesheets.  There are no speed or distance computations, no historical track, no analytical tools.

I think that at this point, given what's been available through Trackleaders and IonEarth (the tracking service provider for the Iditarod), if this were a bigger race or one that attracts a broader fan base, people watching the trackers would be frustrated by the limited functionality provided by the Percy trackers.  But it seems to me that this is just about right for a small, friendly, slightly out-of-the-way race.  When talking about whether or not to provide trackers for races here in Two Rivers one person said "I don't want [famous musher] showing up!", suggesting that he thought that trackers were both an indicator of a fancy, high-end race and likely to draw big-name mushers, causing us to lose the friendly local feeling.  What we're seeing with the Percy is that there's a happy medium, where we can watch the race unfold without bringing to it the competitive, less intimate feeling of one of the huge mid-distance races.

Sunday, March 18, 2012

Experimenting with Google Docs spreadsheet

The Two Rivers 200 was this weekend.  Although it's a small, local Alaskan race we always get a few ringers in there, and now that mushing has turned into a spectator sport, interest in seeing results as they come in extends beyond just the participants and friends.  So, we decided to try using a Google Docs spreadsheet to share the data with fans online.

Throwing something basic together was trivial, but I'm terrible at user interface and esthetics so Chris cleaned it up to make it more legible and easier to deal with.  Here's what we ended up with, basically just in/out data.

Once we had that in place it took just another 1/2 hour or so to put together a sheet that contained run-time summaries for time between checkpoints (see the tabs on the left-hand side of the bottom of the spreadsheet).  Google spreadsheets does a nice job supporting multiple sheets and allowing referencing between sheets (and hooray for providing the ability to do arithmetic on time and date!).
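
For anyone who'd rather see the run-time arithmetic outside a spreadsheet, here's a minimal sketch in Python.  The timestamps are made up for illustration (they are not from the race sheet), and the function name `run_time` is just something I picked; the idea is simply "arrival at the next checkpoint minus departure from the previous one."

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

def run_time(departed, arrived):
    """Elapsed time between leaving one checkpoint and reaching the next."""
    return datetime.strptime(arrived, FMT) - datetime.strptime(departed, FMT)

# Hypothetical in/out times for one team on one leg (not real race data).
leg = run_time("2012-03-17 22:40", "2012-03-18 03:05")
print(leg)  # 4:25:00
```

Including the date in each timestamp is what keeps a leg that crosses midnight from coming out negative.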

Here's what we found:

  • This was quick and easy to put together, and having it embeddable in other web pages is a big win
  • Format and arithmetic support was richer than I expected
  • It's something that pretty much anybody who's ever worked on a spreadsheet can figure out how to use
  • We did lose a little bit of data in what appeared to be a race condition situation - two people working on the same cell at the same time.  When we noticed it we added it back
  • We really needed to be more systematic about how we gathered the data in the first place, since there were checkpoints with no phone or internet, etc.
  • There appears to be a data validation glitch, in that stuff I didn't want displayed if it didn't meet some criteria was displayed anyway
  • I think we can code our way around that last one (for example, if a given value is less than 0 display 0).
  • I love being able to provide different views into the same data and that's definitely something we can do here.  Fans are often interested in things like run times, average traveling speeds, etc., and that's something we can do pretty easily with spreadsheets.  We can do it with web pages, too, but frankly it's a lot more work.  I cannot overstate how easy this was.
  • The big drawback was that you need internet connectivity to be able to update the spreadsheet, and that's not always available.  
So, basically I think it was a win, and certainly the data display was superior to much of what we've seen from a number of other small races.  The biggest challenge is definitely the internet connectivity question.

[edited to add: I don't think this is a great approach for major races, since I think the question of who owns the data and what happens to them later is not a trivial one]

Thursday, March 8, 2012

Speed calculations - where do they come from?

When you click on a racer on the IonEarth satellite tracker, it shows you their geographic coordinates, the time of the reading, the race mile, and their speed.  I addressed the question of how they calculate the race mile in this post, and I thought it might be interesting to see how they calculate speed.  One thing that interested me a lot is why their speed numbers are so smooth, compared to the ones being computed by the Spot-based trackers.  The latter can produce quite erratic numbers with a systematic bias on the low side.  Ultimately where I ended up is that the IonEarth GPS units are probably doing more work than I realized, and are another example of good hardware engineering and smart tradeoffs.

Consumer GPS units typically calculate and display speed for you, but they're taking frequent location readings, which drains the batteries pretty quickly.  The IonEarth battery lasts for two weeks in extreme cold, so it seems unlikely that they're taking frequent readings.  My starting assumption was the obvious bet that they're using the distance between adjacent readings to calculate speed.  I chose Paul Gebhardt at random and two readings as he was nearing Rainy Pass: one at 10:20am on Monday and the following at 10:30am on Monday.  Based on the coordinates from the GPS readings, he traveled 1.35 miles in that 10 minutes.  Multiply by 6 (since we've got 6 10-minute periods in an hour), and we get 8.1mph.  However, the tracker says he was traveling 8.4mph in that period.  Why the difference?

So, I backed up to find the difference between the 10:30 reading and the 10:10 reading.  That gave me 2.7 miles in 20 minutes, or 8.1 mph.  Still no match.  Back one more!  Paul traveled 4.1 miles in 1/2 hour, or 8.2 mph.  I'm still not getting the 8.4 mph that the tracker claimed.  So, I went back and calculated the distance he'd traveled since 9:30, or exactly an hour, and I got 7.8 miles - not even close.
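
Here's that same arithmetic as a small Python sketch, using only the distances and time windows quoted above.  The helper name `mph` is mine; nothing here reflects how IonEarth actually does it.

```python
def mph(miles, minutes):
    """Average speed over a window, from straight-line miles and elapsed minutes."""
    return miles / (minutes / 60.0)

# (window in minutes, miles) for the windows ending at the 10:30 reading.
windows = [(10, 1.35), (20, 2.7), (30, 4.1), (60, 7.8)]

for minutes, miles in windows:
    print(f"{minutes:>2} min window: {mph(miles, minutes):.1f} mph")
```

None of the four windows lands on the tracker's 8.4 mph.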

This is starting to get annoying.

What I've been doing, here, is trying to calculate what are known as "moving averages."  Moving averages are a terrific way of smoothing erratic data and looking at longer-term trends.  It's a handy tool for taking a look at stock trends, polling data, all sorts of collections of numbers where you're interested in how they change over time.  I've noticed that the Iditarod speed numbers are more consistent than what we'd seen in the Quest and it occurred to me that they might be calculating moving averages on speed (although when they're stopped, they're stopped) as a way of smoothing the speed numbers.  If they are, I can't see it, since the speed numbers all seem to be on the high side of what I'm calculating (you wouldn't see a consistent bias in an average like that).
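
If you haven't run into moving averages before, here's a minimal sketch of a trailing one.  The speed readings are invented to look erratic, and the window size of 3 is an arbitrary choice on my part.

```python
def moving_average(values, window):
    """Trailing moving average; early entries average whatever readings exist so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up, erratic speed readings in mph (not tracker data).
speeds = [8.1, 6.0, 9.5, 7.2, 8.8]
smoothed = moving_average(speeds, 3)
print([round(s, 2) for s in smoothed])
```

Each smoothed value sits between the lowest and highest reading in its window, which is why an average like this can't explain numbers that are consistently higher than the raw calculations.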

So, the next candidate is weighting.  Trackleaders knows that since they're calculating speed based on the distance between two points and a racer is never traveling on a perfectly straight line, they need to do something to correct for underestimating the distance traveled and therefore underestimating the speed.  So, they increase the distance by about 7%, basically multiplying the distance by 1.07.  Is that what IonEarth is doing?  Back to the calculator!  Let's see if the speed calculations are consistently high by some percentage.

So, the tracker says he traveled at 8.4 mph from 10:20 to 10:30.  Based on the straight-line distance he traveled at 8.1 mph.  In other words, the tracker says he traveled about 4% faster than my calculations.  From 10:10 to 10:20 he traveled 1.4 miles.  The tracker says he was traveling 7.7 mph, and by my calculation he was traveling 8.4 mph.  Whoa!  That's off in the other direction, so clearly they're not weighting.
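
A quick way to check for a fixed multiplier is to divide the tracker's speed by the straight-line speed for each interval and see whether the ratios agree.  This sketch uses the two pairs of numbers quoted above; the 0.02 tolerance is an arbitrary threshold I chose for illustration.

```python
# (tracker mph, my straight-line mph) for the two intervals discussed above.
pairs = [(8.4, 8.1), (7.7, 8.4)]

ratios = [tracker / mine for tracker, mine in pairs]

def consistent_multiplier(ratios, tolerance=0.02):
    """True if every ratio is within `tolerance` of the others (tolerance is an assumption)."""
    return max(ratios) - min(ratios) <= tolerance

print([round(r, 3) for r in ratios])
print(consistent_multiplier(ratios))
```

A fixed correction like Trackleaders' 1.07 would put every ratio near 1.07; here one is above 1 and the other is below it.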

So, so far it's not really clear where the speed calculations are coming from, by just looking at the data being provided to us.  However, my best guess at this moment would be that the GPS is actually taking location readings more often and using those to calculate speed, but the only locations they're displaying on the map are 10 minutes apart.  One of the primary issues here is preserving the battery, and I think it would almost certainly use less power to calculate the speed on the GPS device than it would to uplink the additional data.  If this is what's going on it's another example of excellent hardware engineering on the part of IonEarth.

Tuesday, March 6, 2012

You know what would be pretty great?

If IonEarth had some predefined sets of mushers, so you could choose one menu item to see all the rookies (the rookie race is a major race-within-a-race), the women, the Canadians, etc.  There are a few subsets of the racers that group together naturally and that we're trying to sort through.  What groupings would you like to see?

Let's do some arithmetic

We're starting to reach the point in the race when leaders are beginning to sort themselves out from people who are out of contention and patterns are starting to emerge.  Granted, it's still early in the race and there's so much that could happen, but still, it's not likely that someone who's 100 miles back at this point is going to move into the lead pack.

But fans like to split hairs about who's in front, cite numbers from the GPS trackers, etc., so I thought I might take a look at the numbers and see if I can't help fans understand them a little better.

To start with, the GPS tracker displays are not providing very useful information about who's where.  They've got some anchor points and when a team is closer to a given point than they are to any other anchor point, they're assigned that "trail mile."  So, you can look at the tracker and have it be perfectly obvious that a bunch of teams are spread out along 5 miles of trail, yet the tracker will show them all at the same trail mile (see this).

I think that if you eyeball the map and the map scale and the mushers on the trail you can get a pretty good idea of just how far apart they really are, but if you want to calculate it exactly (and I know there are those of you who do), you can figure it out from their GPS coordinates.  Finding the distance between two sets of lat/long coordinates is kind of a pain in the tail, but fortunately there are some online tools to help you out.  I really like - no, I really LOVE - GPS Visualizer.  They've got calculators, they've got visualization tools, they've got maps, maps, and more maps, and I think they're the go-to resource for trying to solve map-related problems.  So let's do that.

Right now Aliy Zirkle is a few miles out of McGrath (and I suppose this is where I should say, "Two Rivers *represents*!  Go, Aliy woot woot woot!"  Okay, got that out of my system).  John Baker appears to be a few miles behind her.  Both GPS readings seem to have been taken at the same time.  How far apart are they really?  I need to take two things into account: the distance between them according to their GPS coordinates, and the time differential that's going to need to be made up when they take their 24-hour layover.

So, the first thing I'm going to do is go to the Calculators page at GPS Visualizer, where I see a whole bunch of blank text boxes.  What I'm trying to do is calculate the "great circle" distance (shortest distance on a sphere) between two points.  I'm just cutting and pasting Aliy's values as is into "Lat. 1, Lon. 1" and John's into "Lat. 2, Lon. 2".  I hit the button that says "Distance->", and voilà, we've got 2.566 miles between them:



So here's some crude arithmetic: if you figure they're running about 8mph, 2.566 miles is roughly 19 minutes apart.  One more piece of information: start time differential.  Aliy left Willow at 14:24, and John left Willow at 14:18.  That's really close together, but the bottom line is that John "owes" 6 more minutes of rest time than Aliy does, which puts them at about 25 minutes apart in reality.
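
The crude arithmetic, written out as a Python sketch.  The 8 mph pace is the same rough guess used above, and the function name `minutes_apart` is mine.

```python
def minutes_apart(miles_apart, pace_mph, start_gap_minutes):
    """Trail gap converted to minutes at an assumed pace, plus the start differential owed."""
    return miles_apart / pace_mph * 60 + start_gap_minutes

# Numbers from the post: 2.566 miles apart, roughly 8 mph, starts 6 minutes apart.
trail_minutes = minutes_apart(2.566, 8.0, 0)
total_minutes = minutes_apart(2.566, 8.0, 6)
print(f"{trail_minutes:.0f} minutes on the trail, {total_minutes:.0f} minutes overall")
```

Change the pace guess and the gap changes with it, so treat the result as a ballpark.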

So, for those who are really into the numbers, this is a pretty simple, straightforward way to get a more precise handle on where teams are in relation to each other than you can get off the tracker data itself.

Monday, March 5, 2012

Data presentation - they're doing it right!

As I mentioned in an earlier post, some extremely clever people associated with the Can-Am 250 up in Fort Kent, Maine figured out how to project dog team trail positions on a topo map based on arrival and departure times at checkpoints and safety stations, essentially putting together a very nice tracker without having to rely on GPS devices.  It's not as accurate as an actual GPS but it still gives a pretty good sense of how the race is unfolding and it's got some nice features.

But they've done a lot more than that - they've also put together what I think is the best data display tool for people who are trying to understand what's happening in a race.  Visually, it's what's sometimes referred to in the industry as "fugly" (that's a technical term, look it up!), but they've figured out how to show a big chunk of data in a way that's comprehensible, by giving us the option of different views into the data.

Basically, we're all working with the same chunks of information: start time, checkpoint arrival time, checkpoint departure time, mandatory rest taken, mandatory rest not taken, and finish time.  From these you can compute things like speed and pace, etc., but displaying these data along with the checkpoint data can be difficult to do in a way that doesn't look like one big grey mass of numbers.  So, the Can-Am guys don't try to show you everything at once, but instead give you different views of the data.  So simple it's brilliant.  Here's what's at the top of their data page:



The top row presents different ways of looking at the data.  Hit the "SETUP" button and it gives you a quick overview: checkpoint names and trail distances between them.  The "RESULTS" button gives you run times per leg and the total race run time.  "ARR/DEP" gives you checkpoint arrival and departure times, "MPH/REST" gives you trail speed per leg as well as checkpoint layover times, etc.  The point here is that we're using the web, and there's no need to cram all the data onto the same page.  These data can all be computed from the arrival/departure times (plus knowledge of mandatory rest and start time adjustments), but they do the arithmetic for us and make the data easy to comprehend.
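
To make the "different views of the same data" idea concrete, here's a small Python sketch.  The checkpoint names, distances, and times are invented, the view names just echo the Can-Am buttons, and I have no idea how their site is actually implemented; this only shows that both views fall out of the same arrival/departure records.

```python
# Hypothetical records for one team; times are minutes since the race start.
legs = [
    {"to": "CP1", "miles": 40.0, "left_previous": 0, "arrive": 300, "depart": 420},
    {"to": "CP2", "miles": 30.0, "left_previous": 420, "arrive": 660, "depart": 900},
]

def results_view(legs):
    """Run time per leg in minutes, plus the total run time."""
    runs = [leg["arrive"] - leg["left_previous"] for leg in legs]
    return runs, sum(runs)

def mph_rest_view(legs):
    """(checkpoint, trail speed in mph for the leg, layover in minutes) per leg."""
    rows = []
    for leg in legs:
        hours = (leg["arrive"] - leg["left_previous"]) / 60
        rows.append((leg["to"], leg["miles"] / hours, leg["depart"] - leg["arrive"]))
    return rows

print(results_view(legs))
print(mph_rest_view(legs))
```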

I think this is a fantastic model, and I'm starting to play with putting together a Google spreadsheet that does something similar with sheets for the upcoming Two Rivers 200.

Anyway, kudos to the Can-Am 250 for their extremely clever website.

Sunday, March 4, 2012

Sort order in the face of crappy location granularity

As I discussed in a previous post, it appears to be the case that IonEarth is calculating people's distance on the trail based on proximity to some point, and those points can be some considerable distance apart.  Because they're as far apart as they are, if you just look at the tabular data it appears that a bunch of teams are all in the same place.  If you look at the tracking map it becomes clear that they're not.  The real problem is that when you try to sort the tabular location by trail mileage (click on the "Mile" column header under "Selected Racers") the results do not look that much like what you see on the map, particularly early in the race before teams are strung out along the trail.  Here's an example, with a few of the leaders having arrived in Yentna and other teams starting to stream in:


I've sorted by trail mile on the tabular listing, and you can see that the Trail Breaker is at around mile 152, and we've got 5 teams plus the Teacher on the Trail "at" mile 59.7, which is the Skwentna checkpoint. However, if you look at the teams at mile 59.7, aside from Ray Redington and Jim Lanier, who really are at the checkpoint, the listing is ordered

  • Bill Pinkham
  • Jodi Bailey
  • DeeDee Jonrowe
If you look at the map you'll see it's actually 
  • Jodi Bailey
  • Bill Pinkham
  • DeeDee Jonrowe
From the group at mile 54.1 the top few in the tabular listing are
  • Cim Smyth
  • Jeff King
  • Aliy Zirkle
  • Trent Herbst
while on the map, it's 
  • Kelley Griffin
  • Jeff King
  • Aliy Zirkle
  • Paul Gebhardt
What appears to be happening is that within each clump of people at a given trail mile, they're sorting by bib number.  Not super helpful, right?  Could they do better?  I think so, and here's why:

If you're familiar with GPS or you've done much navigating with a compass, you know what "triangulation" means, and you understand that you can pinpoint location by comparing your distance from other points.  In the case of the IonEarth tracker they're already calculating distance from their anchor points in order to place them at one.  If they're closer to the 54.1 point than any other point, then they list them as being at mile 54.1.  In order to determine what they're closest to they need to do the arithmetic to compute distance from other anchor points, as well.

So, this shouldn't be difficult:  Let's say there are two teams both considered to be at mile 39.8.  Musher A is 3 miles from 45.6 and 2.8 miles from 39.8, and musher B is 4 miles from 45.6 and 1.8 miles from 39.8.  It's not that hard to figure that A is leading B.  Frankly computers are pretty good at this kind of stuff, and since IonEarth has to do these calculations anyway, I don't understand why they're not using the results of those calculations to sort trail order correctly.  There may well be a very good reason, or it's possible that there's something entirely different going on in how they calculate trail mileage.  But, I tend to think that the simplest explanation is most likely the correct one, and I think this is what's going on with the Iditarod GPS trackers.
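
Here's the musher A / musher B example as a Python sketch.  It assumes the tracker already knows each team's distance to the next anchor point up the trail (45.6 in the example), which is my guess about what's available, not something IonEarth has confirmed.

```python
def order_within_clump(teams):
    """teams: list of (name, miles to the next anchor point). Closest to it leads."""
    return [name for name, _ in sorted(teams, key=lambda team: team[1])]

# From the example above: both teams are "at" mile 39.8.
clump = [("B", 4.0), ("A", 3.0)]
print(order_within_clump(clump))  # ['A', 'B']
```

Sorting on distance to the next anchor, rather than on bib number, puts A ahead of B.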

The Iditarod trackers, and "trail miles"

IonEarth redesigned their tracking system between the last Iditarod and this one, and with the teams just having left Willow we're getting an opportunity to look at the thing and try and figure out how it works, what quirks might be confusing, etc.  It turns out that the trail mile representation might be really confusing to a lot of people.

The first thing to point out is that the red line on the map, which represents the trail, doesn't actually follow the trail.  It's pretty clearly the case that someone drew particular points and then lines were drawn between the points.  This is different from either drawing the trail in continuously freehand, or sticking a GPS on a snowmachine, sending it down the trail, and recording the track.  So, using this approach you get something like this:


Obviously the trail is going to stay on the river and not run up hills (in a perfectly straight line, no less), so it's no surprise to see dog teams deviating pretty far from the "trail."

Anyway, the first thing I wanted to look at was distance calculations (which is, of course, related to speed), so I took two recent GPS readings off Josh Cadzow's track: one at 4:20pm and one at 4:30pm.  According to the tracker he was at mile 3.1 at 4:20 and mile 8.6 at 4:30, which means he would have been traveling at over 30mph.  He's fast, but ...  Eyeballing the locations on the map, alongside the scale provided on the left-hand side of the map, it didn't look like he'd traveled anywhere near 5.5 miles.  So I did my own calculations on the distance between the two GPS readings (N 61°43.129', W 150°10.21' and N 61°42.903', W 150°12.748') and came up with 1.4 miles, which looks right and gives us a much more reasonable speed of 8.4mph.
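
If you want to check that distance yourself, here's a minimal Python sketch of the haversine (great-circle) formula applied to those two readings.  It treats the Earth as a sphere with a mean radius of 3958.8 miles, which is an approximation, and the helper names are my own.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean radius; spherical approximation

def dm_to_degrees(degrees, minutes):
    """Convert degrees plus decimal minutes to decimal degrees."""
    return degrees + minutes / 60.0

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in statute miles between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# The two readings quoted above; west longitudes are negative.
lat_a, lon_a = dm_to_degrees(61, 43.129), -dm_to_degrees(150, 10.21)
lat_b, lon_b = dm_to_degrees(61, 42.903), -dm_to_degrees(150, 12.748)

miles = great_circle_miles(lat_a, lon_a, lat_b, lon_b)
speed_mph = miles / (10 / 60)  # the readings are 10 minutes apart
print(f"{miles:.2f} miles, {speed_mph:.1f} mph")
```

This comes out at roughly 1.4 miles.  The speed lands a hair above the 8.4 mph I got from the rounded distance, since the unrounded distance is slightly more than 1.4.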

The nerd instinct at this point is to roll up our sleeves and figure out what sort of arithmetic error would have led them to a result of 5.5 miles in 10 minutes, but a little more poking around found something odd:


and this:


In a nutshell, we've got Josh in two different places, each of which is labeled as being race mile 8.6.  The next obvious step was to calculate the distance between the 4:20 reading and the 4:40 reading, and rats!  2.52 miles -- not even close to 5.5.

Continuing to scroll through the data I found that there were 3 different readings which reported being at trail mile 3.1, so I tried the first of them (at 4:300pm), and came up with 5.1 miles.

So, obviously the calculations aren't reproducing the trail miles being reported and something odd was going on.  Chris and I started kicking this around and independently came to the same conclusion: one reasonable explanation is that rather than calculating mileage based on map locations, they're anchoring "trail miles" to those map points I mentioned up at the top.  That is to say, this:



When you're closer to one of these points than to any other point, that's your trail mileage.
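
Here's a toy Python sketch of that guess.  The anchors sit on a flat grid measured in miles, 5.5 miles apart, purely to keep the arithmetic visible; the real tracker would be working from lat/long, and again, this is our hypothesis rather than anything IonEarth has told us.

```python
from math import hypot

# Hypothetical anchors: (trail mile, x, y) on a flat local grid measured in miles.
anchors = [(3.1, 0.0, 0.0), (8.6, 5.5, 0.0)]

def snapped_trail_mile(x, y, anchors):
    """Report the trail mile of whichever anchor is nearest to the team's position."""
    nearest = min(anchors, key=lambda a: hypot(x - a[1], y - a[2]))
    return nearest[0]

# A team moving a tenth of a mile across the midpoint between the anchors.
print(snapped_trail_mile(2.7, 0.0, anchors))  # 3.1
print(snapped_trail_mile(2.8, 0.0, anchors))  # 8.6
```

A team that moves a tenth of a mile can "jump" 5.5 trail miles, and several consecutive readings can all report the same trail mile, which matches what we saw in Josh's track.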

I don't know for a fact that this is what they're doing but it seems like a reasonable guess.  The distance between those two points certainly eyeballs out at 5.5 miles.  I hope IonEarth can tell us whether or not this is what they're doing.  I don't love it but dealing with a 1000-mile GPS track has got to be extremely challenging and while this would probably be something of a kludge, it's not an unreasonable kludge.  But it definitely might be confusing to fans.

Saturday, March 3, 2012

They have got to be freakin' kidding

I renewed my Iditarod Insider GPS tracker subscription today:


It's not that they're shortening subscriptions to six months, it's that they've started expiring them on 30 June.  Thanks for the heads-up, Insider!  It looks like they've basically doubled the cost of a subscription without informing folks.  So, I checked and while it's not on the front page of the Insider subscription signups and it's not on the payment page, they've buried it in the Terms of Use (a separate page).  Odd way to build fan loyalty.

Friday, March 2, 2012

A little more comparisonizing

I'm a little peeved with Iditarod Insider right now.  They shortened my last year's subscription to six months (i.e. they basically canceled it) and appear to have moved my renewal dates around.  Other people have been noticing similar issues and complaining, and the Insider Staff have been unresponsive.  And that reminded me that there's an issue here around business models, and differences in business models between the two major races.

My life is organized around dogs and technology (not always in that order, sad to say), and I've come to recognize that business models matter a lot in technical contexts.  A company with a very centralized revenue model is going to make different architectural (and other) decisions from one with a distributed model.

The Iditarod is a massive, highly commercial undertaking, with many sponsors, occasional broadcast contracts, an apparently well-compensated professional staff, and the race roster to show for it.  It's very well-known outside the mushing community and has become a major tourism event for Alaska.  A few years back they decided to try to recover some of the costs of providing online access to the public by charging for it.  They use a very high-end (that is to say, expensive) GPS tracking service, professional media production, and for a few years were even putting up a helicopter with a Steadicam hung off the bottom.  Eventually they jacked up the cost to subscribers by splitting the GPS tracking system and video access subscriptions, and they quietly dropped the helicopter and Steadicam (and if you haven't seen the video of Jessie Royer on the coast with her lead dog Kuling powering into a strong headwind like it was nothing, you might not recognize what a loss that was).  They also appear to be locked into a sponsorship contract with GCI despite every single year GCI's website being unable to support the video traffic load during the starts (it's almost as if they're surprised that everybody who paid to watch is, you know, watching).  Basically, they've locked themselves into

  1. a very high-cost approach to running a race, 
  2. high-cost technologies, and 
  3. cost recovery by charging fans for access.  

The Quest doesn't have that kind of budget, and generally takes a more typically Alaskan improvisational approach, getting by with creativity, elbow grease, and a lot of metaphorical duct tape.  Their media team did an incredible job this year with stuff like lots and lots of low-cost GoPro Hero cameras, and their GPS tracking system uses less-reliable commodity (i.e. cheap) Spot trackers and excellent but buggy Trackleaders.com.  They rely on a few high-end sponsors, lots of fan donations, and low-cost (~$500) business sponsorships.  It's a scramble to cover costs every year, but every year they get it done thanks to a core of fanatical fans and a creative, flexible professional staff and board.  And because they don't rely on expensive services that they have to sell to be able to continue to provide, they're continuing to build fan loyalty.  And in the meantime, technology will continue to improve because that's what technology does, and the Spots will become more reliable and the Trackleaders software will get its bugs worked out.

The Quest has had a tough row to hoe financially but they're holding it together.  I believe that over time, the Iditarod is going to have increasing problems covering the costs of the high-end services they employ, and that the Quest's costs will probably continue to burble along at the lower end (granted that fuel costs are a huge issue for the race).  In the meantime Quest vendors (media and tracking service) are going to continue to squeeze more and more out of the lower-end tracking service and video/photography/audio, while there's really just not that much room for the Iditarod to grow the quality of what they provide.  That is to say, that over time the Quest's production costs for fan access are probably going to stay low and the cost to fans will likely stay free, and the Iditarod's production costs for fan access are going to stay high and will need to be covered by subscription fees, but the quality of services seems to me to be very likely to converge.

Thursday, March 1, 2012

Tracking without trackers

It's a very cool thing that tucked away in some of the most unexpected places are some really clever people doing really clever things.  One of these places is Fort Kent, Maine (pop. ~4000), home of arguably the premier distance dogsled race in eastern North America, the Can-Am Crown.

I mentioned earlier that the first time I ever saw dog teams move on a map while a race was underway was during the Can-Am Crown, some number of years ago.  They showed teams from all three of their races (30, 60, and 250-milers) moving on a topo map.  At first I thought they must have had GPS units on the sleds and had figured out some way to uplink data (this was before IonEarth came up with their hardware, and before Spot started selling commodity trackers).  It turns out that that was not what was happening at all.

My understanding of how it actually works is incomplete but this is what I've got so far:  They use checkpoint arrival times to calculate traveling speeds and then project those onto the map.  I've heard, but can't confirm, that they've got spotters on the trails reporting when teams pass so that they're able to get better granularity on times.  If someone has some more information on how it actually works, I'd be grateful if you could let me know.
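
Since I don't know how they really do it, take this Python sketch as nothing more than my guess written down: assume a team holds a constant speed over a leg, and estimate how far into the leg it is at a given time.  The function name and the numbers are invented.

```python
def estimated_leg_miles(now, depart, arrive, leg_miles):
    """Estimated miles into a leg at time `now`, assuming constant speed over the leg.

    All times are minutes since the race start.
    """
    if now <= depart:
        return 0.0
    if now >= arrive:
        return leg_miles
    return leg_miles * (now - depart) / (arrive - depart)

# Hypothetical 40-mile leg run in 5 hours; where is the team 2.5 hours in?
print(estimated_leg_miles(150, 0, 300, 40.0))  # 20.0
```

Reports from spotters along the trail, if they really have them, would amount to splitting a leg into shorter pieces, each with its own pair of times.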

You can go back and rerun past races, and it may be worth doing to get used to the user interface, which is not, er, fully intuitive, in preparation for this weekend's Can-Am.

So, here goes!  Go to the race data page, here.  You'll see a crapload of buttons across the top of the page.  The page displays race data for the current year.  If it's before the start of this year's race (Saturday, March 3 2012) you'll see no data, but the buttons will be there.  Hit the down arrow button (circled in red in this image) once to get to the 2011 data:


Once there, hit the "TRACK!" button:


This will pop up a topo map with the trail marked in blue and a bunch of little yellow squares representing mushers, Swiss flags representing first aid stations, and big yellow squares representing actual checkpoints:


You can focus in on individual mushers and get data about where they are, their speed, how many dogs they've got, etc., and you can replay the race.  While the user interface takes some getting used to, I think this is a fantastically clever piece of work and the authors deserve major kudos.