The package provides easy access to publicly available German regional statistics through a wrapper for the GraphQL API of the Datenguide project.
- Free software: MIT license
- Documentation: https://datenguidepy.readthedocs.io/
- Overview of available statistics and regions:
- The package provides DataFrames listing the available statistics and regions. Users can query them without expert knowledge of regional statistics and without consulting the documentation of the underlying GraphQL API.
- Build and Execute Queries:
- The package provides an object-oriented interface for building queries that fetch selected statistics and return the results as a pandas DataFrame for further analysis.
To use the package, install it from the command line:
pip install datenguidepy
Within your Python file or notebook:
1. Import the package
from datenguidepy import Query
2. Create a query
- either for single regions
query = Query.region('01')
- or for all subregions of a region (e.g. all Kommunen in a Bundesland)
query_allregions = Query.all_regions(parent='01')
- How do I get the IDs of regions? See "Get information on fields and meta data" below.
3. Add statistics (fields)
- Add statistics you want to get data on
field = query.add_field('BEV001')
- How do I find the short name of a statistic? See "Get information on fields and meta data" below.
4. Add filters
- A field can also be added with filters. For example, you can specify that only data from specific years is returned.
field.add_args({'year': [2014, 2015]})
5. Add subfields
- A set of default subfields is defined for all statistics (year, value, source). If additional fields (columns in the results table) are needed, they can be added to the field.
field.add_field('GES') # Geschlecht
# By default, the summed value for a field is returned.
# E.g. if the field "Geschlecht" is added, the results table will show "None" in each row,
# which means the total value for women and men.
# To get disaggregated values, the enum values need to be passed explicitly as args.
# If e.g. only values for women shall be returned, use:
field.add_args({'GES': 'GESW'})
# If the values for all possible enum values shall be returned disaggregated, pass 'ALL':
field.add_args({'GES': 'ALL'})
6. Get results
- Get the results as a pandas DataFrame
df = query.results()
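The steps above can be combined into one small helper. This is a minimal sketch: the function name `fetch_statistic` and its parameters are hypothetical and not part of the package, and it only uses the calls shown in the steps (`Query.region`, `add_field`, `add_args`, `results`). Running it requires the package to be installed and, presumably, network access to the Datenguide API.

```python
def fetch_statistic(region_id, statistic, years, subfield=None):
    """Build and run a query for a single region.

    Hypothetical helper for illustration; returns whatever
    query.results() returns (a pandas DataFrame according to the README).
    """
    # Imported here so the sketch can be loaded without the package installed.
    from datenguidepy import Query

    # Steps 2 and 3: create the query and add the statistic as a field.
    query = Query.region(region_id)
    field = query.add_field(statistic)

    # Step 4: restrict the results to the requested years.
    field.add_args({"year": list(years)})

    # Step 5 (optional): add a subfield and request all of its enum
    # values disaggregated instead of the summed total.
    if subfield is not None:
        field.add_field(subfield)
        field.add_args({subfield: "ALL"})

    # Step 6: execute the query.
    return query.results()
```

For example, `fetch_statistic("01", "BEV001", [2014, 2015], subfield="GES")` would request the statistic BEV001 for region 01, split by Geschlecht, for the years 2014 and 2015.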
Get information on region ids
from datenguidepy import get_all_regions
get_all_regions()
Use the pandas query() functionality to select specific regions. For example, to get the IDs of all "Bundesländer", filter on the "nuts1" level. For more information on "nuts" levels, see Wikipedia.
get_all_regions().query("level == 'nuts1'")
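To illustrate how this filter behaves without calling the API, the following sketch applies the same `query()` expression to a small hand-made stand-in DataFrame. Only the `level` column is taken from the example above. The `name` column, the `region_id` index and the three rows are assumptions made for illustration and may not match what `get_all_regions()` actually returns.

```python
import pandas as pd

# Stand-in for the result of get_all_regions(); layout and rows are assumed.
regions = pd.DataFrame(
    {
        "name": ["Schleswig-Holstein", "Hamburg", "Flensburg"],
        "level": ["nuts1", "nuts1", "nuts3"],
    },
    index=pd.Index(["01", "02", "01001"], name="region_id"),
)

# Same expression as above: keep only rows whose level is 'nuts1'.
states = regions.query("level == 'nuts1'")

# In this stand-in the region IDs live in the index.
state_ids = states.index.tolist()
print(state_ids)
```

IDs collected this way could then be passed to `Query.region(...)` one at a time.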
Get information on statistic shortnames
from datenguidepy import get_statistics
get_statistics()
Get information on single fields
You can get further information on the description, possible arguments, fields and enum values of a field you have added to a query.
query = Query.region("01")
field = query.add_field("BEV001")
field.get_info()
For detailed examples see the notebooks within the use_case folder.
For detailed documentation of all statistics and fields, see the Datenguide API.
All this builds on the great work of Datenguide and their GraphQL API (datenguide/datenguide-api).
The data is retrieved via the Datenguide API from the "Statistische Ämter des Bundes und der Länder". Data used via this package must be credited according to the "Datenlizenz Deutschland – Namensnennung – Version 2.0".
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.