My DSLR camera doesn't have GPS, so normally all my photos would not include the location of where I was when I took the photo. I used to use the Eye-Fi card that did geotagging, but that is no longer supported in the new "mobi" line. I could get an external GPS unit for my camera, but that sounds cumbersome and would only work with that one camera.
Since I already track everywhere I go, I figured I could use this data to geotag my photos when I upload them to Flickr. It turned out not to be so easy, due to the limitations of Exif, the metadata format that digital cameras use to store information about photos.
Exif lets the camera write arbitrary text data into a JPEG file when it saves it. There are a handful of standard properties that most cameras write, such as the time the photo was taken, camera settings such as shutter speed and f-stop, and the GPS location if the camera knows where it is. My thought was that if I know when the photo was taken, I can find out where I was at that time, and then add the GPS data to the photo.
Unfortunately, the format for storing dates in Exif does not support specifying a timezone offset. The format for dates is YYYY:MM:DD HH:MM:SS. Without the timezone offset, this series of numbers corresponds to many different actual points in time, depending on which timezone you interpret it as. So what I need is a way to turn the camera time into a specific point in time in order to find out where I was at that time.
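To make the problem concrete, here is a minimal sketch in Python of what parsing such a value looks like. The function name `parse_exif_datetime` is hypothetical, and the sketch assumes the string has already been read out of the photo's metadata. The result is a "naive" datetime, meaning it carries no timezone offset at all.

```python
from datetime import datetime

# Exif uses colons in the date part: YYYY:MM:DD HH:MM:SS
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_datetime(value: str) -> datetime:
    """Parse an Exif date string into a naive datetime (no offset attached)."""
    return datetime.strptime(value, EXIF_FORMAT)


clock_time = parse_exif_datetime("2016:07:16 19:00:00")
print(clock_time)         # the wall-clock reading only
print(clock_time.tzinfo)  # None: nothing says which timezone this was
```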
I realized that since I have a complete log of my GPS coordinates, I should have enough information to piece this together. Essentially the question I am asking is "where was I when my clock read 7:00pm on July 16 2016?" Note that there are two parts to the answer: my location, and the absolute point in time. It's kind of like solving an equation where there are three variables and you know two of them. The three variables are: my location, the clock time, and the timezone offset. If we knew my location and the clock time, we could find the timezone offset. If we knew the timezone offset and the clock time, then we could find my location.
Where was I when my clock read "7:00pm on July 16 2016"?
If we knew what timezone I was in, then "7:00pm on July 16, 2016" becomes a single reference to an absolute point in time. But we don't know what timezone I was in yet, so there are actually 24 possible absolute points in time this could be. (I'm simplifying this problem slightly by ignoring the 30-minute offset timezones.)
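As a rough sketch of this step, the snippet below attaches each whole-hour offset to the same clock reading, producing one absolute point in time per offset. The function name and the default range of -11 to +12 hours are assumptions made for illustration; any set of 24 whole-hour offsets would show the same thing.

```python
from datetime import datetime, timedelta, timezone


def candidate_instants(clock_time, offsets_hours=range(-11, 13)):
    """Return one timezone-aware datetime per candidate whole-hour offset.

    The default range (-11 to +12) is an assumption for this sketch and,
    like the text above, ignores 30-minute offsets.
    """
    return [
        clock_time.replace(tzinfo=timezone(timedelta(hours=h)))
        for h in offsets_hours
    ]


clock_time = datetime(2016, 7, 16, 19, 0, 0)  # "7:00pm on July 16, 2016"
for candidate in candidate_instants(clock_time):
    # Same clock reading, but a different absolute time for every offset.
    print(candidate.isoformat(), "=", candidate.astimezone(timezone.utc).isoformat())
```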
The solution is to find my location (which includes the absolute point in time) at all 24 possible points in time, find the timezone offset that corresponds to each location, then find the location where its timezone offset matches the candidate offset. Below is an example:
Offset-less time in question: 2016-05-12 16:00:00
This could be any of the absolute points in time:
(I left out some of the less common timezone offsets I frequent for the sake of clarity in this example.) Now let's query my GPS database to find out what my local time actually was at each of these points in time:
| Potential Time | Time from GPS | Location |
|---|---|---|
| 2016-05-12 16:00:00 -23:00 | 2016-05-13 10:59:03 -04:00 | New York |
| 2016-05-12 16:00:00 -22:00 | 2016-05-13 10:00:00 -04:00 | New York |
| 2016-05-12 16:00:00 -07:00 | 2016-05-12 19:00:00 -04:00 | New York |
| 2016-05-12 16:00:00 -08:00 | no data | |
| 2016-05-12 16:00:00 -06:00 | 2016-05-12 17:59:21 -04:00 | New York |
| 2016-05-12 16:00:00 -05:00 | 2016-05-12 16:59:53 -04:00 | New York |
| 2016-05-12 16:00:00 -04:00 | 2016-05-12 15:59:57 -04:00 | New York |
| 2016-05-12 16:00:00 -03:00 | 2016-05-12 14:52:46 +02:00 | France |
| 2016-05-12 16:00:00 +00:00 | 2016-05-12 14:52:46 +02:00 | France |
| 2016-05-12 16:00:00 +01:00 | 2016-05-12 14:52:46 +02:00 | France |
| 2016-05-12 16:00:00 +02:00 | 2016-05-12 14:52:46 +02:00 | France |
| 2016-05-12 16:00:00 +22:00 | 2016-05-11 19:15:41 +02:00 | Düsseldorf |
| 2016-05-12 16:00:00 +23:00 | 2016-05-11 18:46:26 +02:00 | Düsseldorf |
(Note that the times aren't an exact match, because my GPS device doesn't log a point every second. In reality it's more like every second when I'm moving and have a good GPS lock, and less often when I'm not moving. Also, on plane flights I sometimes lose the GPS signal partway through the flight, which is why many of the rows in this case show the same time from my GPS.)
As you can see by comparing the potential offset on the left with the actual offset on the right, there are two rows where the offsets match (-04:00 in New York and +02:00 in France), so we need to determine which is the correct one. This happens when I am traveling on a plane and cross timezones very quickly.
If we take the two candidates and look at the actual time difference in seconds between the timestamps described, the answer becomes obvious.
| Potential Time | Time from GPS | Difference |
|---|---|---|
| 2016-05-12 16:00:00 -04:00 | 2016-05-12 15:59:57 -04:00 | 3 seconds |
| 2016-05-12 16:00:00 +02:00 | 2016-05-12 14:52:46 +02:00 | 4034 seconds |
From this, I can conclude that when my clock read "2016-05-12 16:00:00", the actual time was "2016-05-12 16:00:00 -04:00" and I was in New York.
Most of the time only one offset matches, so this last step isn't necessary. It's only when I quickly cross timezones that there is potentially more than one match.
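The whole procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual implementation: the `GpsPoint` type, the function names, and the in-memory `SAMPLE_LOG` (four rows taken from the table above) are all hypothetical, and the lookup is assumed to return the most recent logged point at or before the requested time.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class GpsPoint(NamedTuple):
    time: datetime  # timezone-aware, carrying the local offset at that place
    place: str


def tz(hours):
    return timezone(timedelta(hours=hours))


def last_point_at_or_before(log, instant):
    """Most recent logged point at or before `instant` (log sorted by time)."""
    times = [p.time for p in log]
    i = bisect_right(times, instant)
    return log[i - 1] if i else None


def resolve(clock_time, log, offsets_hours=range(-11, 13)):
    """Return (absolute time, GPS point) for an offset-less clock reading."""
    matches = []
    for h in offsets_hours:
        candidate = clock_time.replace(tzinfo=tz(h))
        point = last_point_at_or_before(log, candidate)
        # Keep the candidate only if the place I was in used that same offset.
        if point is not None and point.time.utcoffset() == candidate.utcoffset():
            gap = abs((candidate - point.time).total_seconds())
            matches.append((gap, candidate, point))
    if not matches:
        return None
    # Tie-break: the match whose GPS point is closest in time wins.
    _, candidate, point = min(matches, key=lambda m: m[0])
    return candidate, point


# Hypothetical sample log built from rows of the table above, sorted by time.
SAMPLE_LOG = [
    GpsPoint(datetime(2016, 5, 11, 19, 15, 41, tzinfo=tz(2)), "Düsseldorf"),
    GpsPoint(datetime(2016, 5, 12, 14, 52, 46, tzinfo=tz(2)), "France"),
    GpsPoint(datetime(2016, 5, 12, 15, 59, 57, tzinfo=tz(-4)), "New York"),
    GpsPoint(datetime(2016, 5, 12, 16, 59, 53, tzinfo=tz(-4)), "New York"),
]

print(resolve(datetime(2016, 5, 12, 16, 0, 0), SAMPLE_LOG))
```

With this sample log, both the -04:00 and +02:00 candidates match, and the smaller gap (3 seconds versus 4034 seconds) selects New York.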
Since I want to be able to use this to geotag photos, it makes sense to include it as an API in the same system that stores my GPS logs. I encapsulated this logic in my GPS server, Compass, with a simple API that returns the answer given an offset-less time. Now I can use it in my geotagging script!
Let's talk time scales real quick. Your computer's CPU lives by the nanosecond: most CPUs can get a few things done in each nanosecond, mostly simple math and comparisons. To make this easier to grasp, suppose you're the CPU and instead of nanoseconds, you live and work second by second. For clarity I'll keep this metaphor to a single core of a single processor.
You can hold a few things in your head (registers). Not more than a dozen or two in your active memory, but you can recall any of them pretty much instantly. Information that's important to you, you'll often keep close by, either on sheets of loose-leaf paper on your working desk (L1 cache) a couple of seconds away, or in one of a handful of books in your place (L2 and up cache), which is so well organized that no individual piece of information is more than a dozen or so seconds away.
If you can't find what you're looking for there, you'll have to make a quick stop at the library down the street (RAM, i.e. main memory). Fortunately, it's close enough that you can go down, grab a book, and get back to work in only about eight and a half minutes, and it's enormous: some are thousands of times the size of a typical strip-mall book store. A little inconvenient, until you remember that this library has a free delivery service, so it's really no bother at all so long as you can still find things to work on while you wait.
But the local library mostly just stocks things on demand (which is fair; your bookcases, worksheets, and even the dozen or two facts you hold in your head are mostly the same way). The problem is that when you need something that's not there, it can take a while to get it. How long? Think Amazon.com in the age of exploration. They send out an old wooden boat and it could be a week, could be a month, and it's not unusual to wait 3 years before you hear a response. Welcome to the world of hard disk storage, where your information is retrieved by making plates of metal spin really fast.
Many metric tons of sweat have been spent making this as fast as possible, but it's hard to keep up with electrons flowing through wires. So when someone says that Solid State Disks are awesome, it's because they're able to turn that slow, unpredictable old sailing ship into a streamlined steam-powered vessel. A good SSD can often make the voyage in less than a week, sometimes in little more than a day. It can also make many thousands more quests for information per year.
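To translate the metaphor back into real units, the small Python sketch below applies the scale used above (one second of "your" time stands for one nanosecond of CPU time) to the durations mentioned in the text. The labels and helper names are made up for illustration, and the outputs are only what the metaphor implies, not measured latencies.

```python
from datetime import timedelta


def real_latency_ns(metaphor_duration: timedelta) -> float:
    """One second in the metaphor stands for one real nanosecond."""
    return metaphor_duration.total_seconds()


def format_ns(ns: float) -> str:
    """Render nanoseconds in the largest convenient unit."""
    for unit, size in (("ms", 1e6), ("us", 1e3)):
        if ns >= size:
            return f"{ns / size:.1f} {unit}"
    return f"{ns:.0f} ns"


# Durations quoted in the text; the labels are illustrative.
FIGURES = {
    "trip to the library (RAM)": timedelta(minutes=8, seconds=30),
    "SSD, quick voyage (a day)": timedelta(days=1),
    "hard disk, quick voyage (a week)": timedelta(weeks=1),
    "hard disk, slow voyage (3 years)": timedelta(days=3 * 365),
}

for label, duration in FIGURES.items():
    print(f"{label}: {format_ns(real_latency_ns(duration))}")
```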