EverySingleStreet.jl: Working with GPS data

Creation date: 2023-01-29

Tags: julia, osm, map-matching

I was inspired about a month ago by a tweet from Dr. Donut, who walked every street in Berkeley. As a new citizen of the wonderful city of Hamburg, Germany I got intrigued by doing this kind of challenge myself. Normally, I stick to familiar routes and streets though I thought it's time to break with this habit. My interest in visualization, geo data and walking made it clear to me that this would be a wonderful project. It's a daunting task that will take me a couple of years to complete, but I see it as a valuable learning experience. Both learning my body as well as improving my programming and visualization skills.

First I wanted to see what kind of websites do already exist to track the progress on this kind of challenge. The main contenders here are streetferret.com which has a nice overview of percentage of streets visited though I can't really see a nice overview on a map. The other larger website is citystrides.com which has an overview map of all the tracks that one uploaded (they take it from a third party app like strava ) but it isn't really obvious which streets were already visited on the map and which streets count towards the progress etc.

It become obvious relatively quickly that I'll never be satisfied with an existing system if I don't get absolutely everything I can think off and probably even then I would be too interested in how it works in the background to not build a system myself that is worse in general but better for me.

Let's get to it!

In this post I'll walk you through the very first steps I took: like downloading a map, the recorded tracks and very basic visualization. Additionally I'll give a broad intro into the problem of matching those recorded tracks to the underlying map. In the next post we'll dive deep into the map matching part.

Loading data

For this we want to have the positions of all streets in a given city for me Hamburg, Germany as well as the paths we walked. Let's start with generating a Julia package first by using PkgTemplates.jl. I called the package EverySingleStreet.jl.

Next up I did a quick research of which packages might be useful that already exist in the Julia registry and found out about LightOSM.jl which is a wonderful package for working with data from openstreetmap.org. Additionally GPX.jl is very useful to read and write gpx files which is the file format to store recorded GPS paths.

Unfortunately the second package isn't registered, therefore I for now copied over the code into my own repository to get started.

Fortunately it's very easy to use the LightOSM package to download all the streets in a city in a json format or some others.

I've created this function:

using LightOSM
function download(place_name, filepath)

Which downloads all the walkable streets into a format that looks like this:

            "type": "node",
            "id": 10587442922,
            "lat": 53.6098603,
            "lon": 10.1843311
            "type": "way",
            "id": 1978,
            "nodes": [
            "tags": {
                "bicycle": "use_sidepath",
                "foot": "use_sidepath",
                "highway": "primary",
                "lanes": "4",
                "lit": "yes",
                "maxspeed": "50",
                "name": "Cuxhavener Straße",
                "ref": "B 73",
                "sidewalk": "separate",
                "smoothness": "good",
                "surface": "asphalt",
                "wikidata": "Q106605121"

That already finishes the first step of our goal. For the next one of parsing the gpx files of our walked tracks there is a bit more to it. First the decision of how we record it: I use strava as it works with both websites mentioned above so I still have the option to use them though I might switch over to just track it with the awesome Android app: OSMAnd which I use anyway for some other things but more on that later.

As mentioned above I want to use the GPX.jl package for this which I just loaded into a file gpx.jl and included it.

using Geodesy
function parse_gpx(fpath)
    gpxFile = GPX.read_gpx_file(fpath)
    @assert length(gpxFile.tracks) == 1
    @assert length(gpxFile.tracks[1].segments) == 1

    return [LLA(p.lat, p.lon, p.ele) for p in gpxFile.tracks[1].segments[1].points]

Here I also use the package Geodesy.jl which is part of the JuliaGeo organization. The main usage of that package becomes clearer when we jump into the visualization section. For this function we just use the LLA struct to store the gps points. I made some assumptions here that each gpx file consists of only one track and one segment (like no pauses in between). Only returning a vector of gps points. It would most likely make sense to have a bit more structure but for now let's keep it simple.

At some point I probably want to connect it to my strava account to avoid downloading the gpx files manually if I stay with using strava.

Visualization of tracks

Next up is a very basic visualization of the streets and paths on a white canvas. I use the wonderful package Luxor.jl at this point. Yes I know there are other packages which would do a better job of visualizing maps like Tyler.jl which is a very new package which isn't registered yet but looks quite promising. The reason for using Luxor is mostly that I'm familiar with it and I can experiment around very easily with it. A huge downside though is that it's not interactive so zooming in and stuff like that is not the best experience though maybe someone can suggest me a nice svg viewer for Linux or a good VSCode plugin for it to make this easier.

Before we go actually into bringing the data to the canvas I would like to have my own Map struct and Node and Way struct extracted from the json file we previously downloaded.

THis is the very basic information that I want to store for now.

struct Node 

struct Way

struct Map
    osm_id_to_node_id::Dict{Int, Int}
    osm_id_to_edge_id::Dict{Int, Int}

The map itself stores also a graph from the street network which can be used for shortest path calculations at a later point in time as well as a mapping from open street map ids to internal node and way ids.

function parse_map(fpath)
    json = readjson(fpath)
    elements = json[:elements]
    counter = elements[1]
    @assert counter[:type] == "count"
    nodes = Vector{Node}(undef, parse(Int, counter[:tags][:nodes]))
    ways = Vector{Way}(undef, parse(Int, counter[:tags][:ways]))
    nodeid_to_local = Dict{Int, Int}()
    wayid_to_local = Dict{Int, Int}()
    node_counter = 1
    way_counter = 1
    for element in elements
        if element[:type] == "node"
            nodeid_to_local[element[:id]] = node_counter
            nodes[node_counter] = Node(element[:id], element[:lat], element[:lon])
            node_counter += 1
        if element[:type] == "way"
            element[:tags][:oneway] = "no"
            way_nodes = [nodes[nodeid_to_local[node_id]] for node_id in element[:nodes]]
            wayid_to_local[element[:id]] = way_counter
            ways[way_counter] = Way(element[:id], way_nodes, get(element[:tags],:name, ""), get(element[:tags],:highway, ""))
            way_counter += 1
    json_string = convert_keys_recursive(json)
    graph = graph_from_object(json_string; weight_type=:distance, network_type=:walk)
    return Map(graph, nodeid_to_local, wayid_to_local, nodes, ways)

and this is the corresponding code for that. The convert_keys_recursive is a small helper to convert from a dictionary which uses symbols as keys to strings as keys which is needed for the graph_from_object but we don't have to worry about that for now.

Now to the actual drawing part we start with the overall drawing map and overlaying path function.

function draw_map(city_map, paths, outpath, scale_factor=0.1)
    origin_lla = get_centroid(city_map.nodes)
    trans = ENUfromLLA(origin_lla, wgs84)
    Drawing(1920, 1080, outpath)
    for way in city_map.ways
        draw_way(way, trans)
    for path in paths
        draw_path(path, trans)


To ensure the GPS coordinates can be displayed in a reasonable way, we first obtain the centroid of all nodes on the map using the Geodesy package, which is well-suited for this task.

One can define this get_centroid method for it:

function get_centroid(nodes::Vector{Node})
    lon = mean(node.lon for node in nodes)
    lat = mean(node.lat for node in nodes)
    return LLA(lat, lon)

Then we create a full HD canvas with white background and transfer the origin to the center and scale it such that north is facing up and we have a good zoom level. It depends on the city we want to visualize of course for me 0.1 works quite well if I focus on the current part where I walk atm 😄 I really need a zoomable map soon... Then we need to create two more functions to draw the map itself and the paths.

function draw_way(way::Way, trans)
    curve = [Point(getxy_from_lat_lon(node.lat, node.lon, trans)...) for node in way.nodes]
    poly(curve, :stroke)

In general we want to convert the node points to the reasonable coordinate system and draw the curve. The first function call is used to have the opacity based on the type of the street as I want to see all walkable paths but later only count the streets at least for the every single street challenge though I might extend it to every single way in every park challenge 😄

For that I defined:

get_way_opacity(::Val{:footway}) = 0.1
get_way_opacity(::Val{:pedestrian}) = 0.1
get_way_opacity(::Val) = 0.5

and for the coordinate transform:

function getxy_from_lat_lon(lat, lon, trans)
    x,y,z = trans(LLA(lat, lon))
    return x,y

This results in this without any walked paths:


For the walked paths one can basically do the same thing:

function draw_path(path, trans)
    curve = [Point(getxy_from_lat_lon(node.lat, node.lon, trans)...) for node in path]
    poly(curve, :stroke)

With a part of my first walk:


Map matching

It's time to get to the complicated, interesting and most intriguing part of this project. As mentioned before I want to dive deep into in the next post and just want to give a rough overview of what we have to deal with in this post. Kind of showing you how much I underestimated this task before starting the project. 😄

In general one has to figure out based on the gps data which of course isn't perfect which parts one as already walked. It is often relatively easy to see as a human though I guess we know that doesn't make it simple... In some other cases as the one shown below it's not entirely obvious in a split second though one can make a reasonable guess.

Strava view of one small section with messed up GPS

Here I walked coming from the north east (the western route of the two) walked into the Gerichtstraße and then down south (the eastern one of the two). Though it looks on first sight as if I have walked into the Schnellstraße as my gps position is relatively close to that one but there is no street I could have walked to from that street directly south to just cross the Gerichtstraße.

This makes it quite clear that simply matching a GPS position to the nearest street isn't the way to go. If we would do that we would still need to connect those parts between two streets by either not connecting them if they are not on the same street, which would mean we would miss parts around corners or use the shortest path between recorded points as an estimate of what we probably walked.

For this we have to dive a bit into probability theory and hidden markov models. In general the idea of a hidden markov model is that we have some measurements as well as a probabilistic model of the world in some form. Then we want to map the measurements to the network by making an informed guess about the states that were most likely taken in that network.

A very common example for this is that Alice and Bob chat every day about the activities that Bob did that day. He is a simple guy and does only three things: Walk, shop or clean. Alice takes notes on those activities and would like to guess the weather on those days based on the activities that Bob did. For that Alice has some basic understanding of the weather in the area Bob lives as well as the connection between activity to weather. As an example it is more likely that it rains tomorrow when it rained today than that it will be sunny tomorrow as well as it is more likely that Bob takes a walk when it's sunny outside.

This results in a graph of probabilities between the states like "sunny" and "rainy" in a state network as well as connections to activities. One can then calculate the probability of a guess to the measurements of ["walk", "walk", "clean", "shop"] like ["sunny", "sunny", "rainy", "rainy"]. The ideas is to make the guess with the maximum probability.

In our case we have measurements of gps points that we have to map with a probability to certain candidates on the map (we'll define candidates in the next post) and need to have probabilities from moving from one candidate to the next.

Then we can maximize the probability of the most likely used candidates and connect those using the shortest path.

Though I'll leave you with that cliffhanger for the next post 😉

Next steps

The next step will be to implement map matching which I'm currently doing and I'm relatively satisfied with my results so far and then for the first time calculate the length and streets of my current progress. Later in this project I want to create some visualizations as well as program a route planner which gives me a route with a certain length either to a specific location or round trips that will maximize my progress.

Thanks for reading!

Special thanks to my 9 patrons! Thanks so much for you continued support on my journeys in open source and tackling those projects.

Special special thanks to my >4$ patrons. The ones I thought couldn't be found 😄

Want to be updated? Consider subscribing and receiving a mail whenever a new post comes out.

Powered by Buttondown.

Subscribe to RSS