Filter raster files by bounding box using GDAL
Often it is not necessary to process all available raster maps, only a subset defined by a geographical region. Most of the time this region is given as a rectangular bounding box with minimum and maximum X and Y values. So how can we filter all the raster files conveniently and efficiently?
finding raster files
As a first step, we need an overview of which files are available. The standard Unix tool find (a separate program, not a Bash built-in) makes it easy to list all files of a given type, e.g. TIFF, in a directory and its subdirectories:
```bash
find /path/to/directory -type f -name "*.tif" > list_of_raster.txt
```
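If you prefer to stay in Python, a roughly equivalent sketch uses pathlib's rglob to collect the TIFF files recursively and write them to the same list file. The function name find_rasters and its parameters are illustrative assumptions, not part of the original script.

```python
from pathlib import Path


def find_rasters(root, pattern="*.tif", list_file="list_of_raster.txt"):
    """Recursively collect files matching `pattern` below `root`
    and write their paths, one per line, to `list_file`."""
    # rglob descends into all subdirectories, much like `find` does
    paths = sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())
    with open(list_file, "w") as f:
        for path in paths:
            f.write(path + "\n")
    return paths
```

Calling find_rasters("/path/to/directory") would then produce a list_of_raster.txt comparable to the one written by find.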
filter by bounding box using gdalbuildvrt
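Now that we have the list, we want to filter it. GDAL offers the program gdalbuildvrt, which “builds a VRT (Virtual Dataset) that is a mosaic of the list of input GDAL datasets”. It runs fast, and the VRT file takes up almost no disk space because it only references the input rasters instead of copying their pixels. The option -te sets the extent of the VRT as four space-separated values, xmin ymin xmax ymax, expressed in the georeferenced units of the rasters; input files that do not overlap this extent are left out of the VRT. A command with the bounding box of North Rhine-Westphalia would look like this (the values are longitude/latitude in degrees, so they only fit rasters stored in a geographic coordinate system such as WGS 84):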
```bash
gdalbuildvrt -te 5.8887 50.3332 9.4702 52.5212 -input_file_list list_of_raster.txt output.vrt
```
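To make the selection criterion concrete, the following pure-Python sketch shows the kind of test involved: it derives each raster's extent from a GDAL-style geotransform and keeps the raster only if that extent overlaps the bounding box. It illustrates the idea and is not GDAL's actual implementation; the helper names and the two example tiles are hypothetical, and the rasters are assumed to be north-up (no rotation terms).

```python
def raster_extent(geotransform, width, height):
    """Return (xmin, ymin, xmax, ymax) of a north-up raster.

    `geotransform` follows GDAL's 6-element convention:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height),
    where pixel_height is usually negative.
    """
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    x0, x1 = origin_x, origin_x + width * pixel_w
    y0, y1 = origin_y, origin_y + height * pixel_h
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


def intersects(extent, bbox):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap with positive area."""
    return (extent[0] < bbox[2] and extent[2] > bbox[0]
            and extent[1] < bbox[3] and extent[3] > bbox[1])


# Bounding box of North Rhine-Westphalia (lon/lat, degrees)
NRW = (5.8887, 50.3332, 9.4702, 52.5212)

# Hypothetical tiny tiles: 2 x 2 pixels of 0.5 degrees each
tiles = {
    "inside.tif": ((7.0, 0.5, 0.0, 52.0, 0.0, -0.5), 2, 2),
    "outside.tif": ((12.0, 0.5, 0.0, 48.0, 0.0, -0.5), 2, 2),
}

selected = [name for name, (gt, w, h) in tiles.items()
            if intersects(raster_extent(gt, w, h), NRW)]
print(selected)  # ['inside.tif']
```

In practice the geotransform and size of each file would come from GDAL itself; the point here is only that the filter boils down to a rectangle-overlap test in one common coordinate system.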
GDAL Python functions
In the next step, we use GDAL's Python bindings. The actual list of raster files within the bounding box is stored in the VRT, and we can read it out with the dataset method GetFileList(). The first entry of that list is the VRT file output.vrt itself, so we skip it by slicing the list with [1:].
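```python
import os
from osgeo import gdal

out_dir = "/path/to/directory"  # placeholder: where vrt_list.txt is written

# Open the VRT created by gdalbuildvrt in read-only mode
dataset = gdal.Open("output.vrt", gdal.GA_ReadOnly)
if dataset is None:
    raise RuntimeError("could not open output.vrt")

# The first entry is the VRT itself, the rest are the source rasters
filelist = dataset.GetFileList()

# Write the source rasters to a new file, one path per line
with open(os.path.join(out_dir, "vrt_list.txt"), "w") as f:
    for path in filelist[1:]:
        f.write(path + "\n")
```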
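Because a VRT is a plain XML file, the source rasters can also be read without the GDAL bindings. The sketch below is a swapped-in alternative to GetFileList(), not the approach of the original script: it collects the SourceFilename elements and resolves paths marked relativeToVRT="1" against the VRT's directory. The function name read_vrt_sources is hypothetical, and the sketch assumes a simple mosaic VRT of the kind gdalbuildvrt writes.

```python
import os
import xml.etree.ElementTree as ET


def read_vrt_sources(vrt_path):
    """Return the unique source raster paths referenced by a VRT file."""
    vrt_dir = os.path.dirname(os.path.abspath(vrt_path))
    root = ET.parse(vrt_path).getroot()
    sources = []
    # Each band lists its own sources, so the same file can appear repeatedly
    for elem in root.iter("SourceFilename"):
        path = elem.text.strip()
        if elem.get("relativeToVRT") == "1":
            # Relative paths are interpreted relative to the VRT's location
            path = os.path.join(vrt_dir, path)
        if path not in sources:
            sources.append(path)
    return sources
```

Unlike the [1:] slice, this never includes the VRT file itself, so it does not depend on the position of that entry in the list.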
The resulting text file is easy to reuse, for example in a while loop in Bash:
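```bash
# IFS= and -r keep each path exactly as written (spaces, backslashes)
while IFS= read -r line; do
    # do something with "$line" here; the loop body must not be empty
    echo "$line"
done < vrt_list.txt
```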
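The same list can just as well be consumed from Python. In this minimal sketch the per-file work is represented by a hypothetical process_raster callback; the helper only skips blank lines and hands each path over.

```python
def for_each_raster(list_file, process_raster):
    """Call `process_raster(path)` for every non-empty line in `list_file`
    and return the collected results."""
    results = []
    with open(list_file) as f:
        for line in f:
            path = line.rstrip("\r\n")  # drop only the line ending
            if path:  # ignore blank lines
                results.append(process_raster(path))
    return results
```

For example, for_each_raster("vrt_list.txt", print) simply prints every selected raster, mirroring the Bash loop.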
The script can be found on GitHub and is available for general use.