Generate label footprints for Satellite Imagery: the Pythonic Way [Part 2]

9 minute read

Published:

MathJax example

Hi there! I hope you'd enjoyed the previous part of this tutorial. If you haven't checked that out, then I recommend you to do so before reading this part. In the previous post we had seen what we would like to achieve using Overpass, OSM's web API. In this part we shall dive into wrting python code to do so. For this we will use overpy, a python wrapper to the Overpass API.

Things to know or to brush up

  1. geoPandas and gdal
  2. Basic understanding of GIS using Python. Here you can find a very detailed and wonderful tutorial
  3. Go over overPy python wrapper's documentation. Make sure to understand the syntax. In any case we will cover that in the tutorial.

We will cover each part step by step by going over and understaning the code. I will cover only the crux of the code and its logic. You can find the code in its entirity here. Make sure to have all the libraries installed. I am using Python 3.6, have anaconda installed on my system. I wrapped the code to make a simple API. You can play around with that in this jupyter notebook. Let's start then!

Step 0: Import Statements

import overpy
import geopandas as gpd
from shapely.geometry import Point, Polygon, LineString
from osgeo import gdal, ogr

Step 1: Read the geoJSON file and extract the bounds of the AOI

input_aoi = gpd.read_file(aoi)

bounds_coords = input_aoi['geometry'].bounds

south = bounds_coords['miny'][0]
west = bounds_coords['minx'][0]
north = bounds_coords['maxy'][0]
east = bounds_coords['maxx'][0]

    • We are given a geoJSON file (with the AOI) as an input. Remember from the last post that we had the bounding box coordinates saved as a geoJSON file. You can use this file here. We will then read this file using the geoPandas library.
    • In line 3 we then find the bounds of the four coordinates. This will help us to find the four coordinates.
    • Lines 4-7 does exactly that. We save this in four different variables.

Step 2: Query the OverPass API using the overpy wrapper

api = overpy.Overpass()

query_str = '[out:json];way({south:.8f}, {west:.8f}, {north:.8f},
               {east:.8f})[{query_landmark:.50s}]; (._;>;); out;'

query_aoi = query_str.format(north=north, south=south, east=east, west=west, query_landmark='building')

query_result = api.query(query_aoi)

    • First and foremost we initialise an object of the Overpy API
    • In line 3 we write a query in the format prescribed by Overpy. The query string accepts the four coordinates of the bounding box which we have already estimated in the previous step. As the fifth argument, the landmark that we are interested in is provided. Here, building Take your own time to understand the syntax here. It took to me quite soemtime to wrap my head around it
    • In line 6 the arguments are passed using format
    • Next, we query the Overpass API using the overpy wrapper to get all the buildings in the AOI. query_result stores the query.

Step 3: Get all the desired footprints in the AOI

result_ways = query_result.ways

nodes_of_ways = [way.get_nodes(resolve_missing=True) for way in result_ways]

list_lon_lat = []
for way in nodes_of_ways:
    list_of_ways = []
    for node in way:
        list_of_ways.append([float(node.lon), float(node.lat)])
    list_lon_lat.append(list_of_ways)

    • There is something known as ways, nodes and relations which in lay man term's is a way to represent each of the desired footprint. Let's say our desired footprint is buildings in the AOI. Each building is referred to as way and each of such a building is represented by some nodes. Each node is thus a coordinate of the building polygon. Each node has longitue and a latitude. We need to go through each of them step by step and extract all this information. We also have to make sure that we represent this information in a nice manner.
    • Line 1 gets all the ways in the given AOI.
    • Line 3 iterates through each of the ways and gets all the nodes in them. We collect all the nodes of given way in a list. And we do this for all the ways. Thus, we get a list of lists. I hope this makes sense. If not then I recommend yo to go over through overpy's or OSM's documentation.
    • Lines 5 through 10, loops over all the nodes in all the ways; and get all the (longitute,latitude) of the different nodes. If for examples we are looking for buildings then each of these (lon,lat) are bounding positions of the polygon
That's a lot to digest. Take your time and then proceed. No hurry!

Step 4: Create a geoDataFrame to store the query and its metadata

geopandas_dataframe = gpd.GeoDataFrame()
geopandas_dataframe['geometry'] = None

for index, way in enumerate(result_ways):
    coordinates = list_lon_lat[index]

    if coordinates[0] == coordinates[-1]:
        geopandas_dataframe.loc[index, 'geometry'] = Polygon(coordinates)
    elif len(coordinates) == 1:
        geopandas_dataframe.loc[index, 'geometry'] = Point(coordinates)
    else:
        geopandas_dataframe.loc[index, 'geometry'] = LineString(coordinates)

    geopandas_dataframe.loc[index, 'way id'] = str(way)
    geopandas_dataframe.loc[index, 'expanded_tag'] = str(way.tags)
    geopandas_dataframe.loc[index, 'queried_tag'] = tag

geopandas_dataframe.crs = {'init': 'epsg:4326'}

      We will put all of the stuff in step 3 into geoPandas dataframe, so that we can use it later with ease!
    • In line 1 we create an empty geoPandas dataframe. Line 2 creates a column called geometry. This column is where we will store each of the way in its appropriate shape viz. Point, Polygon or a LineString.
    • The for loop goes over each of the ways and determines whether each way is a Polygon, Point or a Line. In case we have buildings the it is: Polygon
    • The last three lines of the for loop stores the metadata associated with each way in three different columns. We won't do anything with this in this tutorial series but it is always nice to have that data stored for later use!
    • the line in this code snippet specifies the Coordinate Refrence System! More information here.

Step 5: Save the geoDataframe as a geoJSON file

name='name_of_geojson_file.geojson'

print('Saving as a geoJSON file as {name}'.format(name=name))
with open(name, 'w') as file:
    file.write(geopandas_dataframe.to_json())

    • In line 1 we provide a name for the geojson file.
    • Next, we save the geoPandas dataframe obtained from the earlier steps as a geojson file.

Step 6: Save the geoDataframe as a shapely object

Note that if the dataframe has multiple geometries, then we should convert them to a single geometry otherwise a shape file cannot be generated. Hence we will employ a hack in this part.

# Check which geometry occurs the most
geopandas_dataframe['geom_type'] = None
for index, row in geopandas_dataframe.iterrows():
     row['geom_type'] = row['geometry'].geom_type

max_occur_geom = geopandas_dataframe['geom_type'].value_counts().idxmax()

for index, row in data.iterrows():
geom = row['geometry']
if geom.geom_type is not max_occur_geom:
    if max_occur_geom == 'LineString':
        row['geometry'] = LineString(list(geom.exterior.coords))
    if max_occur_geom == 'Polygon':
        # for now let's handle it by deleting the row
        # though the LineString should be converted to a polygon
        data.drop(index, inplace=True)
    if max_occur_geom == 'Point':
        row['geometry'] = Point(list(geom.exterior.coords))

# Save as a shape file
data.to_file(name)

    • Lines 2-4 checks which geometry occurs the most. Let's say for the case where we query all the buildings, we get Polygons but there might be some outliers where we get some other object like Lines or Points. This might arise beacuse of some improper labelled data in OSM.
    • So, in that case we should either convert those outliers to Polygons or just discard them. This is handled by the second part of the code snippet.
    • Finally, save it as a shapely object!

Step 7: Convert the shapely object to a TIFF image

Now that we have the shapely object, we can convert that to the required mask i.e. an image!

source_ds = ogr.Open(shape_file)
source_layer = source_ds.GetLayer()
x_min, x_max, y_min, y_max = source_layer.GetExtent()

#Get the dimension of the output image
if cols is None and rows is None:
  cols = int((x_max - x_min) / x_res)
  rows = int((y_max - y_min) / y_res)

# Get the resolution of the output image
if cols is not None and rows is not None:
  x_res = (x_max - x_min) / cols
  y_res = (y_max - y_min) / rows

raster = gdal.GetDriverByName('GTiff').Create('mask.tif', cols, rows, 1, gdal.GDT_Byte)
raster.SetGeoTransform((x_min, x_res, 0, y_max, 0, -y_res))
gdal.RasterizeLayer(raster, [1], source_layer, burn_values=[label_color])

    • Lines 1-3 opens the shape file from step 6 and extracts the four coordinates.
    • Next we determine the size and resolution of the reuired image.
    • Once we get that, the last three lines of the code simply saves the image as a file named mask.tif

That's it you have made it to the end of the tutorial! Here we wrapped our head around the python wrapper for query data from OSM's API. Now we have cool looking images for our AOI. In the next and final part of this series we will use Bokeh to visualize the extracted labels on Google Maps! That will soooo cool! Go here for part three of this tutorial. Feel free to email me if you have questions!

Credits and References

    The code was written during my internship. I should give credits to Planet and all the other cool open source projects and libraries/APIs that were used to make this possible. If you feel that I have missed some credits, feel free to email me and I will make sure they appear here.