UrbanExpansion

In this study we propose applying three machine learning algorithms: Random Forest (RF), Extremely Randomized Trees (Extra-Trees, ET), and regularized Logistic Regression. ET is a variant of the RF classifier (Geurts et al. 2006) that grows each tree on the entire training sample instead of a bootstrap replica and splits nodes at randomly chosen cut-points instead of searching for the optimal one. Compared with RF, ET (1) has a lower computational cost, (2) produces smoother decision boundaries because of the added randomization, and (3) tends to reduce overfitting.
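
The difference between the two tree ensembles comes down to how a node picks its split threshold. The sketch below illustrates this for a single feature using only the standard library. The function names and the toy data are hypothetical and are not taken from this repository. RF-style splitting scans the candidate thresholds and keeps the one with the lowest Gini impurity, whereas ET-style splitting draws one threshold uniformly at random between the feature's minimum and maximum.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two sides of a split."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def best_threshold(values, labels):
    """RF-style: evaluate every midpoint between sorted unique values."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


def random_threshold(values, rng):
    """ET-style: draw one threshold uniformly between min and max."""
    return rng.uniform(min(values), max(values))


# Hypothetical toy data: distance to the nearest highway (km) and a built-up label.
distance_km = [0.2, 0.5, 0.9, 1.4, 2.0, 3.5, 4.1, 5.0]
built_up = [1, 1, 1, 1, 0, 0, 0, 0]

rng = random.Random(0)
t_rf = best_threshold(distance_km, built_up)
t_et = random_threshold(distance_km, rng)
print("RF-style threshold:", t_rf, "impurity:", split_impurity(distance_km, built_up, t_rf))
print("ET-style threshold:", t_et, "impurity:", split_impurity(distance_km, built_up, t_et))
```

The random draw skips the scan over candidate thresholds, which is where the lower computational cost comes from. In the full algorithm of Geurts et al., one random threshold is drawn for each of several randomly selected features and the best of those candidates is kept; the sketch leaves that step out.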

Data Sources

Built Up Grid
The data contain an information layer on built-up presence derived from Sentinel-1 image collections.

  • Source: Global Human Settlement Layer (GHSL)
  • Temporality: 1990, 2000 and 2014
  • Format: raster with 250 m resolution

Population Grid
Generated by combining census data with the built-up index and areal weights. The resulting spatial distribution is expressed as the number of people per cell.

  • Source: Global Human Settlement Layer (GHSL)
  • Temporality: 1990, 2000 and 2015
  • Format: raster with 250 m resolution
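
One way to picture the disaggregation described for the population grid: the population of a census unit is spread over its grid cells in proportion to a weight built from each cell's built-up share and the area of the cell that falls inside the unit. This is a simplified sketch and not the method used to produce the source data. The function name, the area-only fallback, and the numbers are assumptions for illustration.

```python
def allocate_population(total_population, built_up_share, area_in_unit):
    """Split a census unit's population over its cells.

    built_up_share: built-up fraction of each cell (0..1).
    area_in_unit:   area of each cell that lies inside the census unit.
    Weights are built_up_share * area; if no cell is built up, the
    function falls back to plain areal weighting (an assumption).
    """
    weights = [b * a for b, a in zip(built_up_share, area_in_unit)]
    total_weight = sum(weights)
    if total_weight == 0:
        weights = list(area_in_unit)
        total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("census unit has no area")
    return [total_population * w / total_weight for w in weights]


# Hypothetical census unit with 1000 people and three equally sized cells.
cells = allocate_population(1000, [0.5, 0.25, 0.0], [1.0, 1.0, 1.0])
print(cells)
```

The cell without any built-up surface receives no population, and the per-cell values always add up to the census total.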

Digital Elevation model (DEM)
SRTM 90m Digital Elevation Database v4.1

  • Source: NASA
  • Format: raster with 90 m resolution
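
The preprocessing stage of the pipeline derives slope from this elevation model. This README does not say how the slope is computed, so the following is only a minimal sketch under stated assumptions: central differences on interior cells, square cells, and a raster that has already been projected so that the cell size is 90 m in both directions. `slope_degrees` is a hypothetical name.

```python
import math


def slope_degrees(dem, cell_size=90.0):
    """Slope in degrees for each interior cell of a 2-D elevation grid.

    dem is a list of rows with elevations in metres. Border cells have
    no neighbour on one side and are returned as None.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences: elevation change per metre in x and y.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2.0 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2.0 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# A ramp that climbs 90 m per 90 m cell from west to east.
ramp = [[0.0, 90.0, 180.0] for _ in range(3)]
print(slope_degrees(ramp)[1][1])
```

In practice a GIS tool such as gdal, which is already listed under Dependencies, would normally do this on the full raster; the sketch only shows the arithmetic.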

City Lights

  • Source: NOAA
  • Temporality: 1995, 2000 and 2013
  • Format: raster with 250 m resolution

Highways

  • Source: OpenStreetMap
  • Temporality: starting from 2008
  • Format: line geometry

Geolocations: airports, schools, universities, places of worship and hospitals

  • Source: OpenStreetMap
  • Temporality: starting from 2008
  • Format: point geometry

Water Bodies
Provides a basemap for the lakes, seas, oceans, large rivers, and dry salt flats of the world.

  • Source: Esri Data and Maps
  • Format: polygon geometry

Dependencies

  • Python 3.5.2
  • luigi
  • psql (PostgreSQL) 9.4
  • PostGIS 2.1.4
  • geos
  • gdal
  • geopandas
  • ...and many Python packages (see requirements.txt)

Repo Structure and How to Run

To run the pipeline, update the configuration files below with the values for your environment, then run the commands that follow.

Configuration Files:

  • pipeline/luigi.cfg will need to be configured to run luigi
  • pipeline/experiment.yaml will need to be configured for the models and features to run
  • pipeline/.env will need to be configured to connect to databases (make a copy from pipeline/_env)
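
The copy-and-edit step for pipeline/.env can be sketched as follows. The helper names `ensure_env_file` and `read_env_file` are hypothetical, and the KEY=VALUE line format is an assumption about the contents of pipeline/_env, which this README does not show.

```python
import shutil
from pathlib import Path


def ensure_env_file(pipeline_dir):
    """Copy _env to .env inside pipeline_dir unless .env already exists."""
    pipeline_dir = Path(pipeline_dir)
    env_path = pipeline_dir / ".env"
    if not env_path.exists():
        shutil.copy(str(pipeline_dir / "_env"), str(env_path))
    return env_path


def read_env_file(env_path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

Calling `ensure_env_file("pipeline")` from the repository root creates the file once and never overwrites an edited copy. Listing the keys whose value is still empty in the dictionary returned by `read_env_file` is a quick check that every connection setting has been filled in.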

Run the following commands:

To run locally (set --workers to the number of workers you want):

python3 -m luigi --local-scheduler --workers 10 --module UrbanExpansion RunUrbanExpansion

To run on a luigi server:

python3 -m luigi --workers 10 --module UrbanExpansion RunUrbanExpansion

Data Pipeline

Once the environment is set up, you can start using the pipeline. The general flow is:

  • Download the data
  • Preprocess (generate slope and city center)
  • Insert into the database
  • Generate grids
  • Generate feature grids
  • Generate urban clusters
  • Generate urban feature grids
  • Generate features and labels
  • Run models
  • Store models in the results schema
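
Luigi runs a task only after everything it requires has completed, which is how the stages above are kept in order. The sketch below mimics that behaviour with the standard library alone, so it runs without luigi or a scheduler. The stage names are shortened from the list above, and the strictly linear chain of requirements is an assumption; the actual tasks in this repository may branch.

```python
# Hypothetical stage names, in the order listed in this README.
STAGES = [
    "download_data",
    "preprocess",
    "insert_to_db",
    "generate_grids",
    "generate_feature_grids",
    "generate_urban_clusters",
    "generate_urban_feature_grids",
    "generate_features_and_labels",
    "run_models",
    "store_models",
]

# Assumption: every stage requires only the stage listed before it.
REQUIRES = {
    stage: ([STAGES[i - 1]] if i > 0 else [])
    for i, stage in enumerate(STAGES)
}


def run(stage, done, log):
    """Run the requirements of a stage first, then the stage itself, once."""
    if stage in done:
        return
    for dependency in REQUIRES[stage]:
        run(dependency, done, log)
    log.append(stage)  # placeholder for the real work of the stage
    done.add(stage)


log = []
run("store_models", set(), log)
print(" -> ".join(log))
```

Asking for the final stage is enough to pull in every earlier one, which mirrors how the single RunUrbanExpansion task is requested on the command line.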

The last stage populates the results schema, which includes the following tables:

  • evaluations: metrics and values for each model (e.g. precision@100)
  • feature_importances: for each model, gives feature importance values as well as rank (abs and pct)
  • models: stores all information pertinent to each model
  • predictions: for each model, stores the value for each cell
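
As a reference for the precision@100 example in the evaluations table: precision@k is usually defined as the share of truly positive items among the k items with the highest predicted score. The sketch below follows that common definition. Whether this repository computes the metric in exactly this way is an assumption, and `precision_at_k` is a hypothetical name.

```python
def precision_at_k(scores, labels, k):
    """Fraction of positive labels among the k highest-scoring cells.

    scores: predicted score per cell (higher means more likely urban).
    labels: 1 if the cell really became urban, else 0.
    If fewer than k cells exist, all of them are used.
    """
    if k <= 0 or not scores:
        raise ValueError("k must be positive and scores must not be empty")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Hypothetical predictions for four cells.
scores = [0.9, 0.8, 0.7, 0.1]
labels = [1, 0, 1, 0]
print(precision_at_k(scores, labels, 2))
```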