This week we will practice data classification and aggregation in Geopandas. We continue from last week's exercise with a rather similar idea. The overall aim this week is to define dominance areas [0] for 8 shopping centers in Helsinki with different travel modes (public transport, private car). The last step (optional) is to find out how many people live within the dominance areas of those big shopping centers in the Helsinki Region.
The exercise might be a rather demanding one, but don't panic: we will go through it carefully in the following weeks.
[0]: Here, we define the dominance area of a service as the geographical area from where the given service (shopping center) is the closest one to reach in terms of travel time.
- 100 % of point total if you return your solution within 1 week (due date 28.11.2016)
- 85 % of point total if you return your solution within 2 weeks (due date 05.12.2016)
  - Detailed hints provided
- 50 % of point total if you return your solution within 3 weeks (due date 12.12.2016)
  - Full solution provided
- Problem 1: Join accessibility datasets into a single GeoDataFrame and visualize it
- Problem 2: Calculate and visualize the dominance areas of shopping centers
- Problem 3 (optional): How many people live under the dominance area of each shopping center?
- Answers
- Hints
Steps:
-
Download a dataset from here that includes 7 text files containing data about accessibility in Helsinki Region and a Shapefile that contains a Polygon grid that can be used to visualize and analyze the data spatially. The datasets are:
- travel_times_to_[XXXXXXX]_[NAME-OF-THE-CENTER].txt, including travel times and road network distances to a specific shopping center
- MetropAccess_YKR_grid_EurefFIN.shp, including the Polygon grid with a YKR_ID column that can be used to join the grid with the accessibility data
-
Read those travel_time data files (one by one) with Pandas and select only the following columns from them:
- pt_r_tt
- car_r_t
- from_id
- to_id
-
Join the accessibility data files one by one with the Polygon grid. (Update: the intersect overlay analysis with Helsinki borders, done in a similar manner as in our lesson materials, will be skipped because it is not logical when we try to look at the patterns across the whole Helsinki Region.)
-
Visualize the classified travel times (Public transport AND Car) of at least one of the shopping centers using the classification methods that we went through in the lesson materials. You need to classify the data into a new column in your GeoDataFrame. For classification, you can either:
-
Use the common classifiers from pysal
-
Or create your own custom classifier. If you create your own, remember to document well how it works! Write a general description of it and comment your code as well.
-
Upload the map(s) you have visualized into your own Exercise 4 repository (they don't need to be pretty). If visualizing takes forever (as the computer instance can be a bit slow), it is enough that you visualize only one map using plotting in Geopandas. If it is really slow, you can also do the visualization using QuantumGIS in the computer instance or even ArcGIS in the GIS-lab (then you need to save the data as Shapefiles, upload them to GitHub, and download them again to the local computer).
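The reading and classification steps above can be sketched roughly as follows. This is only an illustration, not the model solution: the in-memory sample stands in for one travel time text file, the ';' separator and the extra walk_t column are assumptions about the file layout, and classify_travel_time with its 15-minute breaks is a hypothetical custom classifier.

```python
import io

import pandas as pd

# Hypothetical stand-in for one travel_times_to_... text file. The ';'
# separator and the extra column are assumptions about the file layout.
sample = io.StringIO(
    "from_id;to_id;walk_t;pt_r_tt;car_r_t\n"
    "5785640;5987221;200;52;28\n"
    "5785641;5987221;180;14;9\n"
    "5785642;5987221;150;31;17\n"
)

# Read the file and keep only the four columns named in the exercise.
data = pd.read_csv(sample, sep=";")
data = data[["pt_r_tt", "car_r_t", "from_id", "to_id"]]


def classify_travel_time(minutes, breaks=(15, 30, 45, 60)):
    """Return the index of the first class whose upper limit covers `minutes`.

    Class 0 covers 0-15 min, class 1 covers 15-30 min, and so on. Values
    above the last break fall into one extra open-ended class.
    """
    for class_index, upper_limit in enumerate(breaks):
        if minutes <= upper_limit:
            return class_index
    return len(breaks)


# Store the classes in new columns, one per travel mode.
data["pt_class"] = data["pt_r_tt"].apply(classify_travel_time)
data["car_class"] = data["car_r_t"].apply(classify_travel_time)
```

After joining such a table to the grid, the new class columns could be passed to the plot() method of the GeoDataFrame through its column parameter.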
In this problem, the aim is to define the dominance area for each of those shopping centers based on travel time.
One way to proceed with the given problem is:
- Iterate over the accessibility files one by one.
- Rename the travel time columns so that they can be identified.
  - You can include e.g. the to_id number as part of the column name (then the column name could be e.g. "pt_r_tt_5987221").
- Join those columns into MetropAccess_YKR_grid_EurefFIN.shp where YKR_ID in the grid corresponds to from_id in the travel time data file. At the end you should have a GeoDataFrame with different columns showing the travel times to different shopping centers.
- For each row, find the minimum value of all pt_r_tt_XXXXXX columns and insert that value into a new column called min_time_pt. You can now also parse the to_id value from the name of the column that had the minimum travel time value (i.e. parse the last number series from the column text) and insert that value as a number into a column called dominant_service. In this way, we are able to determine the "closest" shopping center for each grid cell and visualize it either by travel times or by using the YKR_ID number of the shopping center (i.e. the number series that was used in the column name).
- Visualize the travel times of our min_time_pt column using a common classifier from pysal (you can choose which one).
- Visualize also the values in the dominant_service column (no need to use any specific classifier). Notice that the value should be a number. If it is still text, you need to convert it first.
- Upload the map(s) you have visualized into your own Exercise 4 repository (they don't need to be pretty).
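The renaming, joining and minimum-finding steps above could look roughly like this. It is a minimal sketch under stated assumptions: a plain DataFrame stands in for the grid GeoDataFrame, the travel times are invented, and 5000001 is a hypothetical shopping center id (only 5987221 appears in the exercise text).

```python
import pandas as pd

# Hypothetical miniature grid. In the exercise this would be the GeoDataFrame
# read from MetropAccess_YKR_grid_EurefFIN.shp.
grid = pd.DataFrame({"YKR_ID": [1, 2, 3]})

# Invented travel time tables for two shopping centers.
tables = [
    pd.DataFrame({"from_id": [1, 2, 3], "to_id": [5987221] * 3,
                  "pt_r_tt": [10, 40, 25]}),
    pd.DataFrame({"from_id": [1, 2, 3], "to_id": [5000001] * 3,
                  "pt_r_tt": [30, 20, 35]}),
]

for table in tables:
    # Use the destination id to build a unique column name.
    center_id = table["to_id"].iloc[0]
    renamed = table[["from_id", "pt_r_tt"]].rename(
        columns={"pt_r_tt": f"pt_r_tt_{center_id}"}
    )
    # Join on YKR_ID == from_id and drop the now redundant key column.
    grid = grid.merge(renamed, left_on="YKR_ID", right_on="from_id", how="left")
    grid = grid.drop(columns="from_id")

# Find the row-wise minimum and the column that holds it.
pt_columns = [name for name in grid.columns if name.startswith("pt_r_tt_")]
grid["min_time_pt"] = grid[pt_columns].min(axis=1)

# Parse the trailing number series from the winning column name.
winning_column = grid[pt_columns].idxmin(axis=1)
grid["dominant_service"] = winning_column.str.split("_").str[-1].astype(int)
```

If the data files mark missing travel times with a sentinel value, those rows would need to be masked or removed before taking the minimum, otherwise the sentinel could be picked as the shortest time.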
Take advantage of last week's materials and find out how many people live under the dominance area of each shopping center. You should first aggregate your dominance areas into unified geometries using the dissolve() function in Geopandas.
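As a rough illustration of the aggregation step, the sketch below dissolves a tiny invented grid by dominant_service. It assumes that a population value has already been attached to each grid cell (for example with a spatial join); the population column name, the cell geometries and the id 5000001 are all hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 2 x 2 grid of unit squares with invented attribute values.
cells = gpd.GeoDataFrame(
    {
        "dominant_service": [5987221, 5987221, 5000001, 5000001],
        "population": [120, 80, 50, 30],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2), box(1, 1, 2, 2)],
)

# Merge all cells that share the same dominant_service into one geometry
# and sum the population of the merged cells.
dominance_areas = cells.dissolve(by="dominant_service", aggfunc="sum")
```

The result has one row per shopping center, indexed by dominant_service, so the population of each dominance area can be read directly from the population column.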
Write your answers for (optional) problem 3 here.
If you need more help with the exercise, read the hints.