Housing Inventory

This repository contains historical San Francisco housing data and R scripts to graph that data. The data was used to generate the graphs and analysis in the blog post "Employment, construction, and the cost of San Francisco apartments", and was recently used in a paper by Stanford researchers, "The Effects of Rent Control Expansion on Tenants, Landlords, and Inequality: Evidence from San Francisco".

Data

Data for each year lives in a file named after that year. Files for later years may be named "craigslist-X", where X is the year.

To extract the rents, run ./extract-craigslist on a data file, for example ./extract-craigslist craigslist-2016. Note that the data is not perfect. Here are some samples from the 2016 Craigslist data:

799000 Apr 29 Exceptional Pacific Heights TIC $799000 / 2br - (Pacific Heights) pic
800 Apr 29 Awesome 5 Bedroom Available $800 / 5br - 3895ft2 - (2483 N Smiderle, San Bernardino, CA) pic
99 Apr 29 Jr. 1 BD. Washer & Dryer in unit! $99 deposit $3425 / 1br - 550ft2 - (nob hill) pic map

(It's not clear whether these prices were stripped before the averages in the housing-inventory file were generated.)
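
One way to screen rows like these is to read the rent from the dollar amount directly before the "/ Nbr" marker, rather than from the first number on the line, and then drop values outside a plausible range. The following Python sketch is not the repository's extract-craigslist script. The function name `parse_listing`, the regular expression, and the 500 to 20000 dollar bounds are all assumptions chosen for illustration.

```python
import re

# Assumed listing shape: "... $<rent> / <N>br ..." as in the samples above.
LISTING_RE = re.compile(r"\$(\d+) / (\d+)br")

# Hypothetical plausibility bounds for a monthly rent, in dollars.
MIN_RENT = 500
MAX_RENT = 20000


def parse_listing(line):
    """Return (rent, bedrooms) for a plausible listing, else None."""
    match = LISTING_RE.search(line)
    if match is None:
        return None
    rent, bedrooms = int(match.group(1)), int(match.group(2))
    if not MIN_RENT <= rent <= MAX_RENT:
        return None
    return rent, bedrooms


# The three sample rows quoted above, unchanged.
samples = [
    "799000 Apr 29 Exceptional Pacific Heights TIC $799000 / 2br - (Pacific Heights) pic",
    "800 Apr 29 Awesome 5 Bedroom Available $800 / 5br - 3895ft2 - (2483 N Smiderle, San Bernardino, CA) pic",
    "99 Apr 29 Jr. 1 BD. Washer & Dryer in unit! $99 deposit $3425 / 1br - 550ft2 - (nob hill) pic map",
]

for sample in samples:
    print(parse_listing(sample))
```

With these assumed bounds, the sale listing is dropped and the third row yields 3425 instead of the 99 dollar deposit. The San Bernardino listing still passes, so a range check alone does not catch out-of-area listings.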

You can combine the data sources by running the "combine" script, ./combine. This generates the combined file in this repository.

The charts in the blog post are generated by running the model script in this repository, on the combined data.

calc-medians computes the medians for each year in the file. It prints the median, 95th percentile, and 5th percentile for each year in the dataset. These values are stored in the medians file in this repository.
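
As a rough illustration of the statistics described above, the following Python sketch computes the median and the 95th and 5th percentiles per year. It is not the repository's calc-medians script. The input shape (a mapping from year to a list of rents), the function names, and the nearest-rank percentile definition are assumptions, so its numbers may differ slightly from those in the medians file.

```python
import statistics


def nearest_rank(sorted_values, percent):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    # Integer ceiling of percent * n / 100, clamped to at least rank 1.
    rank = max(1, (percent * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize(rents_by_year):
    """Map each year to (median, 95th percentile, 5th percentile)."""
    summary = {}
    for year, rents in sorted(rents_by_year.items()):
        ordered = sorted(rents)
        summary[year] = (
            statistics.median(ordered),
            nearest_rank(ordered, 95),
            nearest_rank(ordered, 5),
        )
    return summary


# Made-up rents, for illustration only; not taken from the dataset.
example = {2016: [2900, 3100, 3425, 3600, 4200]}
for year, (median, p95, p5) in summarize(example).items():
    print(year, median, p95, p5)
```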

Craigslist

To get the Craigslist data, open the SF rentals page, select all, and copy and paste the page's contents into a text file. Repeat for every page of results, appending to the same text file. Save this file as craigslist-YYYY-MM.

All Craigslist files should be combined into one file per year, for example:

cat craigslist-2019-* > craigslist-2019
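
For anyone who prefers not to rely on shell globbing, a rough Python equivalent of the cat command is sketched below. The function name `combine_year` and its `directory` parameter are hypothetical. The sketch assumes the monthly files follow the craigslist-YYYY-MM naming described above.

```python
import glob
import os


def combine_year(year, directory="."):
    """Concatenate craigslist-YEAR-* files into craigslist-YEAR."""
    pattern = os.path.join(directory, f"craigslist-{year}-*")
    # Sorted so that zero-padded months are appended in order (01, 02, ...).
    monthly_files = sorted(glob.glob(pattern))
    target = os.path.join(directory, f"craigslist-{year}")
    with open(target, "wb") as out:
        for path in monthly_files:
            with open(path, "rb") as monthly:
                out.write(monthly.read())
    return monthly_files
```

Calling `combine_year(2019)` in the repository directory would write craigslist-2019 and return the list of monthly files it read.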

After pulling in new data, recalculate the medians:

./calc-medians > medians