This repository contains the data and code underlying the paper "Cities, Lights, and Skills in Developing Economies" in the Journal of Urban Economics by Jonathan Dingel, Antonio Miscio, and Don Davis.
We thank Dylan Clarke for excellent research assistance, especially for doing the yeoman's work of implementing our algorithms in R after they were initially written in ArcGIS.
If you want to apply our algorithm in your own work (rather than replicate our paper), see the lights_to_cities repository.
If you want to download our metropolitan definitions for Brazil, China, or India without running any code, you can just download the CSV files.
The repository contains four top-level directories, one for each country: `brazil`, `china`, `india`, and `usa`.
The workflow for each country is organized as a series of tasks.
For example, the `china` directory contains 16 folders that represent 16 tasks.
Each task folder contains three folders: `input`, `code`, and `output`.
A task's output is used as an input by one or more downstream tasks.
This graph depicts the input-output relationships between tasks for `china`.
We use Unix's `make` utility to automate this workflow.
After downloading this replication package (and installing the relevant software), you can reproduce the figures and tables appearing in the paper simply by typing `make` at the command line.
The project's tasks are implemented via R code, Stata code, and shell scripts.
The taskflow structure employs symbolic links.
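As a minimal sketch of how a task folder can expose an upstream task's output through a symbolic link, the following builds a toy layout in a temporary directory. The task names `task_a` and `task_b` and the file `result.csv` are hypothetical; the actual task names and link layout in this repository may differ.

```shell
set -eu

# Scratch directory for the illustration; nothing is written into the repository.
root="$(mktemp -d)"

# Two hypothetical tasks, each with the input/code/output folders described above.
for task in task_a task_b; do
  for sub in input code output; do
    mkdir -p "$root/$task/$sub"
  done
done

# Pretend the upstream task (task_a) has produced an output file.
echo "id,value" > "$root/task_a/output/result.csv"

# The downstream task (task_b) sees that file via a relative symbolic link in its input folder.
ln -s ../../task_a/output/result.csv "$root/task_b/input/result.csv"

# Reading through the link returns the upstream file's contents.
linked_content="$(cat "$root/task_b/input/result.csv")"
link_target="$(readlink "$root/task_b/input/result.csv")"
echo "task_b/input/result.csv -> $link_target"
echo "$linked_content"

# Remove the scratch directory.
rm -rf "$root"
```

Because the link is relative, the toy layout keeps working if the whole directory tree is moved elsewhere.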
To run the code, you must have installed R, Stata, and Bash.
We ran our code using R 3.5.1, Stata 15, and GNU bash version 4.2.46(2).
Our R code leverages spatial and measurement packages with additional system requirements, namely `gdalUtils`, `rgdal`, `rgeos`, `sp`, `sf`, and `units`.
We used GEOS 3.7.0, GDAL 2.3.2, PROJ 4.9, and udunits 2.2.
We expect the code to work on other versions too.
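Before running anything, it may help to confirm that the required commands are on your `PATH`. The sketch below is not part of the replication package; the helper name `missing_tools` is hypothetical, and the list of commands is taken from the requirements stated above.

```shell
# Print the names of any commands (given as arguments) that are not found on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space before printing.
  echo "${missing# }"
}

# Check the commands this README says the workflow relies on.
missing_now="$(missing_tools make bash Rscript stata-se)"
if [ -n "$missing_now" ]; then
  echo "Not found on PATH: $missing_now"
else
  echo "All required commands were found."
fi
```

This checks only that the commands exist, not their versions or the R packages and system libraries listed above.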
- Download (or clone) this repository by clicking the green `Clone or download` button above. Uncompress the ZIP file into a working directory on your cluster or local machine.
- From the Unix/Linux/MacOSX command line, navigate to a country directory.
- Typing `make` in a country directory will execute all the code.
  - If you are in a computing environment that supports the Slurm workload manager (if the `Makefile` detects that the command `sbatch` is valid), tasks will be submitted as jobs to your computing cluster.
  - If `sbatch` is not available, the `Makefile` will execute `Rscript` and `stata-se` commands locally. (Mac OS X users should ensure that `Rscript` and `stata-se` are in their relevant `PATH`.)
- It is best to replicate the project using the `make` approach described above. Nonetheless, it is also possible to produce the results task-by-task in the order depicted in the flow chart for each country. These flow charts are available in the `symlinks_graph/output` folder for each country (e.g., China). If all upstream tasks have been completed, you can complete a task by navigating to the task's `code` directory and typing `make`.
- An internet connection is required so that each country directory's `install_packages` task can install R packages and Stata programs.
- The Brazil case requires gigabytes of microdata that are available from the IBGE. Read the `CENSO10_pes_dta_metadata.txt` file in the `initialdata` folder within the `brazil` directory. You can skip this step by running `skip_microdata.sh` within the `brazil` directory.
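
The choice between Slurm submission and local execution described in the steps above can be illustrated with a small shell sketch. This is an assumption about how such a check might look, not a copy of the repository's `Makefile` logic; the function name `launch_mode` is hypothetical.

```shell
# Report "slurm" if the given scheduler command is available on PATH, otherwise "local".
# The argument is the command to look for (sbatch in the steps above).
launch_mode() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "slurm"
  else
    echo "local"
  fi
}

mode="$(launch_mode sbatch)"
echo "Tasks would run in mode: $mode"
```

On a machine without Slurm this reports `local`, which corresponds to the case where `Rscript` and `stata-se` must be on your `PATH`.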