“Fair Use” Download of Large Vector Maps
With my switch to vector maps, the dramatically reduced storage requirements now allow me to keep maps for whole countries on my smartphone. Getting such maps still requires tile downloads, which add up to millions of web requests and gigabytes of downloaded data. To avoid putting unfair and excessive load on the map providers’ web services, and to avoid triggering blocking thresholds, I created a Python script that downloads maps at a slow rate and puts the downloaded tiles into MBtiles databases. The script is intended to run in the background whenever the computer is on, downloading maps over days and weeks. It works through a list of maps to download, and when finished with the last one, it starts over with the first one, updating it. In my case, I installed the script on my media center PC, so whenever a recording runs or I use the PC for watching a show or listening to music, my maps get updated as well.
Basic Concept
Web Map Structures
While vector maps are fundamentally different from raster image maps with regard to the data structures, the organisation of the data is similar: the area covered is divided into small squares, the “tiles”, and, although not strictly necessary, vector maps also come in different zoom levels to reduce the processing requirements for the map rendering. So a typical web map service URL looks something like this:
https://some.mapservice.tld/some/path/to/the/map/{z}/{x}/{y}/tile.pbf
https://another.mapsource.tld/path/to/the/map/{z}/{x}/{y}.pbf?some=variable&another=var
https://yet.another.map/some/mappath/{z}/{x}/{y}
where {z} represents the zoom level, and {x} and {y} are the tile coordinates within that zoom level. Each tile is a protobuf file containing the compressed vector data.
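The tile coordinates follow the usual “slippy map” (Web Mercator) tiling scheme, so the tile containing a given geographic coordinate can be computed directly. Here is a minimal helper to illustrate the scheme – not taken from the script itself:

```python
import math

def deg2tile(lat: float, lon: float, z: int) -> tuple[int, int]:
    """Convert a WGS84 lat/lon to slippy-map tile coordinates at zoom z."""
    n = 2 ** z                       # number of tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Berlin (52.52 N, 13.405 E) at zoom 10:
print(deg2tile(52.52, 13.405, 10))   # → (550, 335)
```

At zoom level z there are 2^z × 2^z tiles, with (0, 0) in the north-west corner – which is why y grows towards the south.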
MBtiles Databases
An MBtiles file is basically a SQLite database that stores map tiles – these can also be raster images, but for my purpose here let’s consider vector tiles: exactly the protobuf tiles that the web service provides. The MBtiles database consists of at least two tables:
- metadata
This table contains the name of the map, the minimum and maximum zoom available, the bounding box covered and some more.
- tiles
This table contains the actual map data, with the columns zoom_level, tile_column, tile_row and tile_data. Pretty self-explanatory – the only remark: tile_data contains the binary tile data from the protobuf files.
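Since MBtiles is plain SQLite, the two tables can be created and queried with Python’s built-in sqlite3 module. A minimal sketch of the schema described above (the pymbtiles library used later wraps this kind of access; the tile bytes here are placeholders). One detail worth knowing: the MBTiles specification stores tile_row in TMS order, i.e. flipped relative to the web (XYZ) y coordinate:

```python
import sqlite3

db = sqlite3.connect(":memory:")   # use a real .mbtiles path in practice
db.executescript("""
    CREATE TABLE metadata (name TEXT, value TEXT);
    CREATE TABLE tiles (
        zoom_level INTEGER, tile_column INTEGER,
        tile_row INTEGER, tile_data BLOB
    );
""")
db.execute("INSERT INTO metadata VALUES ('name', 'Demo map')")

z, x, y = 10, 550, 335                 # web (XYZ) tile coordinates
tms_row = (2 ** z - 1) - y             # MBTiles stores rows in TMS order
db.execute("INSERT INTO tiles VALUES (?, ?, ?, ?)",
           (z, x, tms_row, b"\x1f\x8b placeholder tile bytes"))
db.commit()

row = db.execute("SELECT tile_data FROM tiles WHERE zoom_level=? "
                 "AND tile_column=? AND tile_row=?", (z, x, tms_row)).fetchone()
```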
The Script Process
From that, the required process is pretty obvious and straightforward:
- Read in the data for the map web services to be processed from a configuration file
- Cycle through these services – for each:
- Create the MBtiles database with the correct metadata information
- Cycle through zoom, x and y and download the corresponding tile from the map service
- Store the tile into the MBtiles DB
Around this a bit of housekeeping and logging, and that’s it.
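The cycle above can be sketched in a few lines. This is a bare-bones outline, not the actual script: fetch and store are stand-ins for the web request and the MBtiles write, and the bounding-box limits and all housekeeping are left out:

```python
import itertools
import time

def run_map(cfg, fetch, store, sleep=time.sleep):
    """One pass over a single map: download every tile and store it.

    cfg   -- one map entry from the config (min_z, max_z, ReadSpacing)
    fetch -- callable (z, x, y) -> bytes, performs the web request
    store -- callable (z, x, y, data), writes into the MBtiles DB
    """
    for z in range(cfg["min_z"], cfg["max_z"] + 1):
        n = 2 ** z                            # tiles per axis at this zoom
        for x, y in itertools.product(range(n), range(n)):
            store(z, x, y, fetch(z, x, y))
            sleep(cfg["ReadSpacing"])         # pace the requests
```

The sleep between requests is the whole point of the exercise: it caps the request rate regardless of how fast the tiles come in.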
Important Notice
If you use my scripts and methods, I’d kindly ask you to respect and adhere to the usage conditions and policies of the web map service providers. Failing to do so may result in services going offline or moving behind authentication or paywalls, which would make everyone’s life more difficult.
Implementation
Config File
To provide the script with all the map information it needs, I decided to use JSON as the format for the configuration file – for three reasons: it is easily human-readable and structured, it is easily digestible for a Python script, and it is the most commonly used format for data related to map rendering engines, geo-information processors etc. that work with vector data. My file looks like this:
{
  "FreelyChosenMapName": {
    "DownloadURL": "https://{server}.mapsource.tld/path/to/pbf/tiles/{z}/{x}/{y}/tile.pbf?any=get&var=needed",
    "ServerParts": ["server1", "server2", ...],
    "BoundingBox": [min_lat, min_lon, max_lat, max_lon],
    "MBtilesDB": "/path/to/your/MBtiles-file.mbtiles",
    "Name": "Mapname in the MBtiles DB",
    "min_z": 0,
    "max_z": 14,
    "ReadSpacing": 1.5
  },
  "NextMap": {...},
  ...
}
The elements have the following meaning and/or function:
- FreelyChosenMapName is the name of the map internally in the script, and is used to name the status file for tracking a few items for this map. It needs to be unique within the JSON file.
- DownloadURL is the URL to fetch the tiles from. It has the {z}, {x} and {y} placeholders where the script inserts the coordinates of the tile to download. It may also contain a {server} placeholder, which the script uses if the map service offers more than one URL (e.g., https://s0.webservice.com/…, https://s1.webservice.com/…, https://s2.webservice.com/…) to fetch tiles from. Some web map services use this for load balancing; my script supports this and will use all offered web servers equally. The parts of the URL that change per server are provided in:
- ServerParts – this is an array containing the URL parts that are cycled through for load balancing. For the example given above, this would be ["s0", "s1", "s2"]. If the DownloadURL does not contain a {server} placeholder, ServerParts is an array with a single empty string, i.e. [""].
- BoundingBox contains the area covered as an array consisting of the minimum latitude and longitude, followed by the maximum latitude and longitude – basically the lower-left and upper-right corner coordinates of the rectangular area contained in the map data.
- MBtilesDB is the path to and name of the file that contains the SQLite database.
- Name is what goes into the database metadata table as name. I use this to include copyright notices required by the usage conditions of the map providers.
- min_z and max_z are the lowest and highest zoom level to be downloaded and stored.
- ReadSpacing is the time (in seconds) to wait between two web requests. The higher the value, the fewer web requests per unit of time go out to the web service, the less load this means for that service – and the longer processing the whole map takes.
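How DownloadURL and ServerParts play together can be illustrated in a few lines – the URL and values here are made up, and itertools.cycle provides the round-robin over the server parts:

```python
import itertools

# hypothetical config values, following the format described above
url_template = "https://vectortiles{server}.example.tld/{z}/{x}/{y}.pbf"
server_parts = ["0", "1", "2"]

servers = itertools.cycle(server_parts)      # round-robin load balancing
urls = [url_template.format(server=next(servers), z=14, x=8500, y=5700 + i)
        for i in range(4)]
# → https://vectortiles0.example.tld/14/8500/5700.pbf
#   https://vectortiles1.example.tld/14/8500/5701.pbf
#   https://vectortiles2.example.tld/14/8500/5702.pbf
#   https://vectortiles0.example.tld/14/8500/5703.pbf
```

With ServerParts set to [""], the same code degenerates to always formatting the URL with an empty server part, so no special-casing is needed in the script.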
Here is an example that will download the basemap.de vector maps and the contour lines, and the Swisstopo base map (see remark below):
{
  "BasemapDE": {
    "DownloadURL": "https://sgx.geodatenzentrum.de/gdz_basemapde_vektor/tiles/v2/bm_web_de_3857/{z}/{x}/{y}.pbf",
    "BoundingBox": [47.2, 5.8, 55.1, 15.1],
    "ServerParts": [""],
    "MBtilesDB": "/mnt/SSD/maps/AutoMap/basemapDE.mbtiles",
    "Name": "© GeoBasis-DE/BKG (2025) CC BY 4.0",
    "min_z": 0,
    "max_z": 14,
    "ReadSpacing": 1
  },
  "BasemapDEcontour": {
    "DownloadURL": "https://sgx.geodatenzentrum.de/gdz_basemapde_vektor/tiles/v2/bm_web_hl_de_3857/{z}/{x}/{y}.pbf",
    "BoundingBox": [47.2, 5.8, 55.1, 15.1],
    "ServerParts": [""],
    "MBtilesDB": "/mnt/SSD/maps/AutoMap/basemap_contour.mbtiles",
    "Name": "© GeoBasis-DE/BKG (2025) CC BY 4.0",
    "min_z": 8,
    "max_z": 14,
    "ReadSpacing": 1
  },
  "SwissTopo": {
    "DownloadURL": "https://vectortiles{server}.geo.admin.ch/tiles/ch.swisstopo.base.vt/v1.0.0/{z}/{x}/{y}.pbf",
    "BoundingBox": [45.818, 5.9559, 47.8084, 10.4921],
    "ServerParts": ["0", "1", "2", "3", "4"],
    "MBtilesDB": "/mnt/SSD/maps/AutoMap/SwissTopo.mbtiles",
    "Name": "© swisstopo, © EDA, © BAFU, © BABS, © SAC, © Naturfreunde Schweiz, © opentransportdata.swiss",
    "min_z": 0,
    "max_z": 14,
    "ReadSpacing": 1.5
  }
}
The SwissTopo example is untested as given here – I was just desperately looking for a public web service that uses load balancing; the load-balanced services I use myself are not public, so I can’t list them here. The SwissTopo download, however, is not really required, as you can download SwissTopo data ready-made as an MBtiles DB – isn’t that nice!
The Script
Prerequisites
The script makes use of a number of libraries, most of which are part of typical Unix distributions and, if not already installed, can usually be installed via the system package manager, e.g. sudo apt install python3-XXX. The only exception is the pymbtiles library, which you need to install via the Python package manager: pip install pymbtiles.
Pymbtiles brings all the routines that are required to create an MBtiles DB and add or update tiles in it – very nice, thanks to the Conservation Biology Institute for providing this for free!
On my Debian Trixie, installing this required the use of a Python virtual environment, which sounds more complicated than it actually is – here are helpful instructions. I added an appendix below with very condensed instructions on Python virtual environments.
What It Does
The script will do the following:
- Read the config JSON from the hard-coded filename mapconfig.json from the current directory.
- The main objective: download the configured vector maps at the pace given, and store them into the configured MBtiles databases, adding the configured metadata. To not overburden and wear down the storage device (SSDs, SD cards and other solid-state storage degrade with every write operation), tiles are only written to the DB after a given number of tiles has been downloaded successfully – this can be configured in the line
WriteInterval = 250
The downloads may make use of HTTP headers – primarily the User-Agent header is important. If you do not specify it, the Python default user agent is used, which some web services block because of abuse. The headers and User-Agent can be configured in the line
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:136.0) Gecko/20100101 Firefox/136.0'}
- Maintain a status file, which allows the script to pick up where it left off after being stopped, after an error, or after a reboot. The file name and location can be configured in the line
ProcessStateFile = "./DownloadState.txt"
- Append messages to a log file (this file will grow very large over time – consider truncating it every once in a while). This is mainly helpful if you encounter errors. The name and location can be configured in the line
LogfileName = "./download.log"
- Maintain a status file for each map in the config. The name of this file is hard-coded to <Mapname>_status.txt, and it will be placed in the current directory. <Mapname> is the freely chosen map name from the JSON config file.
- When a map has been downloaded completely, create a copy of the MBtiles database, named MBTiles-file<X>.mbtiles. <X> is the number of the successful full map download: 1 after the first full download, 2 after the second, etc. MBTiles-file.mbtiles is the name of the MBtiles file as configured in the JSON config. The idea is to always have a consistent copy of the file that can be used even while the main file is being processed. Over time these copies will consume a lot of disk space, so consider deleting old versions after some time.
- Gracefully shut down on a SIGTERM or SIGINT signal. SIGTERM is a problem if the shutdown has no grace period – the script is too slow to handle the final operations before the system stops it. This is not a big deal: you lose a few downloaded tiles, but the next run of the script will re-download them.
- Stop the download and the script if an unexpected HTTP status code is received. More intelligent error handling may develop over time – for now, the only status code handled specially (aside from 200 OK) is 404 (Not Found). This is taken as an indicator that a tile outside the bounding box was requested, which may happen because the calculation of the minimum and maximum tile coordinates can produce a few extra tiles due to rounding. A 404 goes into the log file as a warning, and the script continues.
How To Run
The simplest way, of course, is to run the script manually, but it is intended to run automatically in the background. I use this cron entry:
@reboot sleep 30 && cd /home/<user>/PBFdownloader && /path/to/python/virtual/environment/bin/python3 pbfdownloader.py > /home/<user>/PBFdownloader/jobrun.log 2>&1
Replace <user> with the home directory name of your user. PBFdownloader is the directory the script file is located in. /path/to/python/virtual/environment is the full path of the Python virtual environment.
Installed as a cron job, this will wait 30 seconds after a reboot for the system to come fully up and then start the script. The output, including error messages, goes into jobrun.log, which helps in case debugging is needed.
Get The Script
The latest version of my script can always be found in my GitHub repository. I published it under the CC BY-SA 4.0 (Attribution-ShareAlike 4.0 International) license.
Appendix: Python Virtual Environments – the Extremely Condensed Version:
Here are the commands to create, manage and use python virtual environments:
| Command | Remarks | Function |
|---|---|---|
| python3 -m venv my-virtual-env | my-virtual-env is a freely chosen name for the environment; a folder of this name will be created. Add the option --system-site-packages if you want modules that were installed via the system package manager to be available in the virtual environment. | Creates a virtual environment – needed once. |
| . ./my-virtual-env/bin/activate | Your shell prompt will change to (my-virtual-env) $ | Switches to this virtual environment – needed whenever you modify the environment or want to run python in it. |
| pip install pymbtiles | (within the activated environment) | Installs libraries and modules into the virtual environment. Needed once per install. |
| python3 | (within the activated environment) | Starts the python interpreter within the environment. |
| deactivate | Your shell prompt will change back to $ | Leaves the virtual environment. |
| ./my-virtual-env/bin/python3 | | Runs a python interpreter in the virtual environment without activating it first. |
If you installed all other Python modules via your system package manager, e.g., apt install …, they are not available in your virtual environment by default, unless you used the option --system-site-packages. If you want to change this later, edit ./my-virtual-env/pyvenv.cfg and modify the line
include-system-site-packages = true
true means the system packages are accessible, false means they are not.