Aaron Parecki

Long-Term Archiving of GPS Logs

    June 1, 2014

    I have been continuously logging GPS data for 6 years now, and have been thinking a lot about how I want to archive the data long-term. Currently I have the data in a MySQL database, which is not a good long-term solution.

    In addition to some of the issues documented on the database antipattern page, some of the problems I've encountered are:

    • The total data volume gets large after 6 years (there are already a million new rows from the last 6 months), making it hard to move the whole dataset around and back it up
    • Raw database files on disk are not always portable between versions, so to upgrade the database I need to dump and restore the data from very large SQL text files
    • Backing up the data as a SQL dump takes too long to do regularly since it locks the MySQL tables
    • Adding a new column or index to the data is a long process that locks the table for potentially an hour (which of course is bad because I'm constantly generating new data)

    Instead, I am considering how best to store the data in plain text files on disk. Below are some thoughts on the various options I'm considering. I would love to hear any feedback or other suggestions!

    Folder Structure

    I record at most one GPS point per second, so each day has a max of 86,400 records. I'll split the data into one file per day, in folders by year and month. UTC will be used to determine the filename date.

    ...
    2014/
    2014/04/
    2014/04/29.json
    2014/04/30.json
    2014/05/
    2014/05/01.json
    2014/05/02.json
    ...
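
    As a minimal sketch of the layout above, the path for a record could be derived from its timestamp like this. The function name daily_file_for is hypothetical, and the code assumes every timestamp is timezone-aware so it can be converted to UTC first.

```python
from datetime import datetime, timedelta, timezone


def daily_file_for(ts: datetime) -> str:
    """Return the relative YYYY/MM/DD.json path for a timezone-aware timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # The filename date is always taken from the UTC representation.
    utc = ts.astimezone(timezone.utc)
    return f"{utc:%Y/%m/%d}.json"


# 5:00pm on April 30 at UTC-7 is already May 1 in UTC.
pdt = timezone(timedelta(hours=-7))
print(daily_file_for(datetime(2014, 4, 30, 17, 0, 0, tzinfo=pdt)))  # 2014/05/01.json
```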
    

    Since each file holds only the data from one day, each has a max size of 18-20 MB (see below for size estimates). This is somewhat large, but not unwieldy for processing since it easily fits in RAM while reading, and can even be opened by most good text editors if needed.
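
    The upper bound follows from simple arithmetic on the per-record sizes quoted for each format later in this post (206 bytes for the JSON options, 232 bytes for YAML). A rough sketch, where max_daily_bytes is a hypothetical helper name:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # at most one GPS point per second


def max_daily_bytes(bytes_per_record: int) -> int:
    """Worst-case size of one daily file, ignoring any file header."""
    return SECONDS_PER_DAY * bytes_per_record


for label, size in [("GeoJSON", 206), ("YAML", 232)]:
    print(f"{label}: {max_daily_bytes(size) / 1_000_000:.1f} MB")
# GeoJSON: 17.8 MB
# YAML: 20.0 MB
```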

    Sharding the data into individual files means backing it up using a tool like rsync becomes very efficient, since it can skip entire unchanged files at a time.

    Because the data is sharded, reading a day or week of data means accessing only a small subset of the dataset. Since files are split on UTC day boundaries, a day in local time can span two files. For example, accessing the data for May 1 Pacific time (UTC-7 in May) will mean opening both the 2014/05/01.json and 2014/05/02.json files.
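
    Working out which UTC-dated files cover a given local day can be sketched as follows. The function name files_for_local_day is hypothetical, and a fixed UTC-7 offset stands in for Pacific daylight time here; a zoneinfo.ZoneInfo("America/Los_Angeles") object could be passed instead where time zone data is installed.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def files_for_local_day(day: date, tz: tzinfo) -> list[str]:
    """List the UTC-dated daily files that overlap one calendar day in tz."""
    # First and last second of the local day, expressed in UTC.
    start = datetime.combine(day, time.min, tzinfo=tz).astimezone(timezone.utc)
    next_midnight = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    end = next_midnight.astimezone(timezone.utc) - timedelta(seconds=1)

    paths = []
    current = start.date()
    while current <= end.date():
        paths.append(f"{current:%Y/%m/%d}.json")
        current += timedelta(days=1)
    return paths


pdt = timezone(timedelta(hours=-7))  # assumed offset for Pacific time in May
print(files_for_local_day(date(2014, 5, 1), pdt))
```

    A day requested in UTC itself maps to exactly one file, which is the main convenience of naming files by UTC date.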

    GeoJSON Feature Collection

    Pros

    • 206 bytes per record
    • Can be loaded into any GeoJSON viewer directly to display

    Cons

    • Appending requires parsing - must parse the existing data to add a new record
    • An index would have to reference each record by byte ranges
    • Reordering data would need to be done programmatically since it is hard to visually inspect
    {
       "type":"FeatureCollection",
       "features":[
          {
             "type":"Feature",
             "properties":{
                "date":"2011-09-19T00:02:07+0000",
                "speed":1,
                "accuracy":8,
                "altitude":8,
                "heading":0,
                "battery":90
             },
             "geometry":{
                "type":"Point",
                "coordinates":[
                   -122.64768183231,
                   45.512098073959
                ]
             }
          },
          {
             "type":"Feature",
             "properties":{
                "date":"2011-09-19T00:02:10+0000",
                "speed":1,
                "accuracy":6,
                "altitude":11,
                "heading":0,
                "battery":90
             },
             "geometry":{
                "type":"Point",
                "coordinates":[
                   -122.6476174593,
                   45.512092709541
                ]
             }
          },
          {
             "type":"Feature",
             "properties":{
                "date":"2011-09-19T00:02:10+0000",
                "speed":0,
                "accuracy":1000,
                "altitude":0,
                "heading":0,
                "battery":90
             },
             "geometry":{
                "type":"Point",
                "coordinates":[
                   -122.66402777778,
                   45.517569444444
                ]
             }
          },
          {
             "type":"Feature",
             "properties":{
                "date":"2011-09-19T00:02:12+0000",
                "speed":1,
                "accuracy":6,
                "altitude":9,
                "heading":0,
                "battery":90
             },
             "geometry":{
                "type":"Point",
                "coordinates":[
                   -122.64757454395,
                   45.512087345123
                ]
             }
          },
          {
             "type":"Feature",
             "properties":{
                "date":"2011-09-19T00:02:14+0000",
                "speed":1,
                "accuracy":4,
                "altitude":8,
                "heading":0,
                "battery":90
             },
             "geometry":{
                "type":"Point",
                "coordinates":[
                   -122.64753699303,
                   45.512076616287
                ]
             }
          }
       ]
    }
    

    (Newlines and spacing are for illustration purposes only and would not be included in the actual data.)
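
    To illustrate the "appending requires parsing" drawback, here is a rough sketch of adding one record to a FeatureCollection file. The function name append_to_feature_collection is hypothetical; the point is that the whole file has to be read, parsed, and rewritten for every new point.

```python
import json
from pathlib import Path


def append_to_feature_collection(path: Path, feature: dict) -> None:
    """Add one Feature by parsing and rewriting the entire daily file."""
    if path.exists():
        collection = json.loads(path.read_text(encoding="utf-8"))
    else:
        collection = {"type": "FeatureCollection", "features": []}
    collection["features"].append(feature)
    # Compact separators keep the file free of illustrative whitespace.
    path.write_text(json.dumps(collection, separators=(",", ":")), encoding="utf-8")
```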

    Individual rows of GeoJSON Features

    Pros

    • 206 bytes per record
    • Append without parsing - can add data to the end of the file without parsing the rest
    • An index could reference each record by line number
    • Possible to reorder by hand since each record is on its own line

    Cons

    • Must parse each line and add to a GeoJSON Feature Collection in order to display
    • Hand-modifying data is not as easy as the YAML option, but easier than the FeatureCollection option
    {"type":"Feature","properties":{"date":"2011-09-19T00:02:07+0000","speed":1,"accuracy":8,"altitude":8,"heading":0,"battery":90},"geometry":{"type":"Point","coordinates":[-122.64768183231,45.512098073959]}}
    {"type":"Feature","properties":{"date":"2011-09-19T00:02:10+0000","speed":1,"accuracy":6,"altitude":11,"heading":0,"battery":90},"geometry":{"type":"Point","coordinates":[-122.6476174593,45.512092709541]}}
    {"type":"Feature","properties":{"date":"2011-09-19T00:02:10+0000","speed":0,"accuracy":1000,"altitude":0,"heading":0,"battery":90},"geometry":{"type":"Point","coordinates":[-122.66402777778,45.517569444444]}}
    {"type":"Feature","properties":{"date":"2011-09-19T00:02:12+0000","speed":1,"accuracy":6,"altitude":9,"heading":0,"battery":90},"geometry":{"type":"Point","coordinates":[-122.64757454395,45.512087345123]}}
    {"type":"Feature","properties":{"date":"2011-09-19T00:02:14+0000","speed":1,"accuracy":4,"altitude":8,"heading":0,"battery":90},"geometry":{"type":"Point","coordinates":[-122.64753699303,45.512076616287]}}
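
    A minimal sketch of how this one-record-per-line layout might be used in practice. All three function names are hypothetical. Appending never reads existing data, displaying requires wrapping the lines in a FeatureCollection, and a simple index maps each "date" value to the first line where it appears (the sample above contains two records with the same timestamp).

```python
import json
from pathlib import Path


def append_feature(path: Path, feature: dict) -> None:
    """Append one Feature as a single line; existing data is never parsed."""
    line = json.dumps(feature, separators=(",", ":"))
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def load_feature_collection(path: Path) -> dict:
    """Parse every line and wrap the result so a GeoJSON viewer can show it."""
    with path.open(encoding="utf-8") as f:
        features = [json.loads(line) for line in f if line.strip()]
    return {"type": "FeatureCollection", "features": features}


def first_line_by_date(path: Path) -> dict[str, int]:
    """Map each "date" property to the 1-based line number of its first record."""
    index: dict[str, int] = {}
    with path.open(encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if line.strip():
                date = json.loads(line)["properties"]["date"]
                index.setdefault(date, number)
    return index
```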
    

    GeoYAML

    Pros

    • Append without parsing - can add data to the end of the file without parsing the rest
    • An index could reference each record by start/end line numbers
    • Easy for a human to visually inspect the data and add/modify properties
    • Possible to reorder by hand

    Cons

    • 232 bytes per record (slightly more than the JSON version because newlines are required)
    • Requires parsing before displaying - must parse the YAML and convert back to GeoJSON in order to display
    ---
    type: FeatureCollection
    features:
    - type: Feature
      properties:
        date: '2011-09-19T00:02:07+0000'
        speed: 1
        accuracy: 8
        altitude: 8
        heading: 0
        battery: 90
      geometry:
        type: Point
        coordinates:
        - -122.64768183231
        - 45.512098073959
    - type: Feature
      properties:
        date: '2011-09-19T00:02:10+0000'
        speed: 1
        accuracy: 6
        altitude: 11
        heading: 0
        battery: 90
      geometry:
        type: Point
        coordinates:
        - -122.6476174593
        - 45.512092709541
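
    Appending to the YAML file without parsing it could be done by writing each record in the layout shown above. This is only a sketch under some assumptions: the function name feature_to_yaml is hypothetical, only Point features with flat string or numeric properties are handled, and the file is assumed to already begin with the three header lines ("---", "type: FeatureCollection", "features:").

```python
def feature_to_yaml(feature: dict) -> str:
    """Render one Point Feature as a YAML list item matching the sample layout."""
    lon, lat = feature["geometry"]["coordinates"]
    lines = ["- type: Feature", "  properties:"]
    for key, value in feature["properties"].items():
        if isinstance(value, str):
            # Single-quote strings (doubling embedded quotes) so that
            # timestamps are always read back as strings.
            rendered = "'" + value.replace("'", "''") + "'"
        else:
            rendered = str(value)
        lines.append(f"    {key}: {rendered}")
    lines += [
        "  geometry:",
        "    type: Point",
        "    coordinates:",
        f"    - {lon!r}",
        f"    - {lat!r}",
    ]
    return "\n".join(lines) + "\n"
```

    The returned text can be written to the end of the daily file opened in append mode, in the same way as the one-line JSON records.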
    
    Sun, Jun 1, 2014 3:00pm -07:00 #gps #logs #indieweb #ownyourdata
© 1999-2025 by Aaron Parecki. Powered by p3k. This site supports Webmention.
Except where otherwise noted, text content on this site is licensed under a Creative Commons Attribution 3.0 License.