Wang_Rummler_Final_v3.Rmd

---
title: "Urban Growth Modeling in the Nashville Metro Area"
author: "Shengqian Wang and Jack Rummler"
date: "2023-04-26"
output:
  html_document:
    code_folding: hide
    toc: true
    toc_float: true
    number_sections: true
    theme: flatly
---
<style>
  .superbigimage{
      overflow-x:scroll;
      white-space: nowrap;
  }

  .superbigimage img{
     max-width: none;
  }


</style>

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(tigris)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(sf)
library(raster)
library(knitr)
library(kableExtra)
library(tidycensus)
library(tigris)
library(FNN)
#library(QuantPsyc) # JE Note: in R 4.1, QuantPsyc package not available.
library(caret)
library(yardstick)
library(pscl)
library(plotROC) 
library(ggrepel)
library(pROC)
library(grid)
library(gridExtra)
library(viridis)
library(igraph)
library(RSocrata)
library(paletteer)

plotTheme <- theme(
  plot.title =element_text(size=12),
  plot.subtitle = element_text(size=8),
  plot.caption = element_text(size = 6),
  axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
  axis.text.y = element_text(size = 10),
  axis.title.y = element_text(size = 10),
  # Set the entire chart region to blank
  panel.background=element_blank(),
  plot.background=element_blank(),
  panel.border=element_rect(colour="black"),
  # Format the grid
  panel.grid.major=element_line(colour="#D0D0D0",size=.75),
  axis.ticks=element_blank())

mapTheme <- theme(plot.title =element_text(size=12),
                  plot.subtitle = element_text(size=8),
                  plot.caption = element_text(size = 6),
                  axis.line=element_blank(),
                  axis.text.x=element_blank(),
                  axis.text.y=element_blank(),
                  axis.ticks=element_blank(),
                  axis.title.x=element_blank(),
                  axis.title.y=element_blank(),
                  panel.background=element_blank(),
                  panel.border=element_blank(),
                  panel.grid.major=element_line(colour = 'transparent'),
                  panel.grid.minor=element_blank(),
                  legend.direction = "vertical", 
                  legend.position = "right",
                  plot.margin = margin(1, 1, 1, 1, 'cm'),
                  legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))

source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

qB <- function(df, variable, rnd) {
  if (missing(rnd)) {
    as.character(quantile(round(df[[variable]],2),
                          c(.01,.2,.4,.6,.8), na.rm=T))
  } else if (rnd == FALSE | rnd == F) {
    as.character(formatC(quantile(df[[variable]],
                                  c(.01,.2,.4,.6,.8), na.rm=T),
                         digits = 3))
  }
}

palette2 <- c("#0E122DFF","#AFD0C9FF")
                        
paletteMain <- c("#AFD0C9FF", "#91A1BAFF", "#51628EFF","#182044FF","#0E122DFF")

paletteQual <- c("#B39DDB", "#9575CD", "#7986CB", "#4FC3F7", "#4DD0E1", "#4DB6AC", "#81C784", "#AED581", "#DCE775", "#FFF176", "#FFD54F", "#FFB74D", "#FF8A65", "#A1887F", "#90A4AE", "#B0BEC5")

#this function converts a column in to quintiles. It is used for mapping.
quintileBreaks <- function(df,variable) {
  as.character(quantile(df[[variable]],
                        c(.01,.2,.4,.6,.8),na.rm=T))
}

#This function can be used to convert a polygon sf to centroids xy coords.
xyC <- function(aPolygonSF) {
  as.data.frame(
    cbind(x=st_coordinates(st_centroid(aPolygonSF))[,1],
          y=st_coordinates(st_centroid(aPolygonSF))[,2]))
} 

#this function convert a raster to a data frame so it can be plotted in ggplot
rast <- function(inRaster) {
  data.frame(
    xyFromCell(inRaster, 1:ncell(inRaster)), 
    value = getValues(inRaster)) }
```


# Introduction

As cities experience consistent population growth, urban growth forecasting is a tactful strategy to predict where people may live in the future. Planners are in a constant battle between balancing the economic potential of their cities with the negative ecological externalities and infrastructure demands urban growth poses in the city. Many cities and regions in the U.S. have experienced outward growth rather than investing in densified in-fill projects. As cities are feeling the ramifications of sprawl on climate change mitigation, traffic impacts, and municipal tax bases, forecasting where growth occurs is vital to understand how proper management of growth strategies can benefit a city. In this study, we forecast urban growth in the Nashville metropolitan area, also called the Nashville-Davidson–Murfreesboro–Franklin, TN MSA.

## Methodology

We are developing a predictive model of urban growth for the year 2029. We do this by collecting land use cover change from the years 2011 and 2019 to analyze change between the two, and predict the land use change between 2019 and 2029. We also engineer a variety of features ranging from land use, population, infrastructure, and development to predict growth patterns. Land cover change, our dependent variable, and our predictive variables will be aggregated into a fishnet of our study region to create a granular, equal spatial structure. We will then use this model to develop an allocation procedure from both the demand side and supply side of future development.

**Demand:** Looking at population projections for the year 2029, we distribute growth demand projections among Nashville MSA.  
**Supply:** We propose an extension to the interstate system and forecast how that changes urban growth forecasts.  

## Study Region: Nashville MSA

```{r study area, warning=FALSE, message=FALSE, results=FALSE}
tennessee <- counties("TN") # county via tigris

nashville <- tennessee %>%
  dplyr::filter(NAMELSAD%in%c("Cheatham County", "Davidson County", "Dickson County", "Macon County", "Robertson County", "Rutherford County", "Smith County", "Sumner County", "Trousdale County", "Williamson County", "Wilson County", "Maury County", "Cannon County"
))%>%
  st_transform(crs = 2274)

ggplot()+
  geom_sf(data=tennessee, fill='grey', color='black')+
  geom_sf(data=nashville, fill=palette2[1], color=palette2[2])+
  labs(title="Nashville MSA")+
  mapTheme()
```

The Nashville MSA is located in the north-central part of the state of Tennessee, encompassing 13 out of 95 Tennessee counties. Its three principal cities include Nashville, Murfreesboro, and Franklin, which are the states' first, sixth, and seventh largest cities by population, respectively. It is the largest MSA in Tennessee and the 36th largest MSA in the country. We selected this region because it is the fifth-fastest growing MSA in the U.S., with a 10-year growth rate of 20.9%, nearly double the average of other MSA's growth rates.

## Planning Application

Planners often try to assess population growth using a variety of strategies, such as demographic trend analyses, economic analyses, and scenario planning, to name a few. We are completing this model because the process of predicting growth is challenging. While using previous population data, historical land use change, and other spatially related factors, growth of a city is often non-linear and non-predictive given factors like economic development and job opportunities, public policies, and migration. 

Moreover, while counties like Davidson County (Nashville is the capital) have a strong urban core, much of the region is rapidly sprawling. In 2014, a [study]("https://wpln.org/post/how-bad-is-nashvilles-sprawl-new-study-ranks-it-among-the-worst/") by Smart Growth America and University of Utah ranked the Nashville MSA as one of the worst of nearly 221 metro areas for urban sprawl. We know that incentivizing growth in areas with infill potential, investing in transit-oriented development, and disincentivizing continual urban sprawl can help planners engage in more sustainable, compact development to minimize strain of infrastructure and natural resources.

# Data Set-Up

First, we set up our data for analysis. This includes creating a fishnet grid overlay onto the MSA, and aggregating our dependent variable, land use change between 2009 and 2019.

## Fishnet

We first create the spatial structure of our fishnet grid. We decided to use a 4000 feet resolution. This was a challenge to decide given the balance between maximizing our model's accuracy while balancing out the computation intensity of the modeling process. The fishnet grid allows us to create a granular spatial structure across our study region instead of working with dissimilar areal units, like census tracts or neighborhoods.

```{r fishnet, warning=FALSE, message=FALSE}
# create fishnet
Nashville_fishnet <- 
  st_make_grid(nashville, 4000) %>%
  st_sf()

# clip to nashville
Nashville_fishnet <-
  Nashville_fishnet[nashville,]

# add unique ID to each fishnet 
Nashville_fishnet <-
  Nashville_fishnet %>%
  rownames_to_column("fishnetID") %>% 
  mutate(fishnetID = as.numeric(fishnetID)) %>%
  dplyr::select(fishnetID)

# plot
ggplot() +
  geom_sf(data=Nashville_fishnet) +
  labs(title="Fishnet, 4000 Feet Resolution", subtitle="Nashville MSA") +
  mapTheme()
```

## Land Use Change

We retrieve land use change data from USGS National Land Cover Database, which can be found [here](https://www.mrlc.gov/data/nlcd-land-cover-change-index-conus). Particularly, we are looking at the change between 2011 and 2019. The land cover change data has 30 meter resolution, so we take the centroids of each hexbin and compare it to the land cover type to downsample the data, making it as accurate and computationally runnable as possible.

We reclassified the data into several different overarching types to build as features into our model, and use the land cover change between the 2 years as our dependent variable (0 = unchanged, 1 = newly developed).

### Land Cover - 2011

First, we look at the land cover in the year 2011.

```{r landcover,  warning=FALSE, message=FALSE, img.width=10}
#NLCD LULC & LUCC

lucc_input <- raster("https://github.com/ObjQIAN/warehouseii/raw/main/final/luccmsa_resample.tif")
lulc_input <- raster("https://github.com/ObjQIAN/warehouseii/raw/main/final/lulc11f2p_Resample.tif")
#plot(lucc_input)

#lucc_clip_raw <- projectRaster(lucc_input, crs = 2274)
#lulc_clip_raw <- projectRaster(lulc_input, crs = 2274)
lucc_clip <- mask(lucc_input, nashville)
lulc_clip <- mask(lulc_input, nashville)
#lucc_clip <- aggregate(lucc_clip0, fact = 3)
#lulc_clip <- aggregate(lulc_clip0, fact = 3)


#plot lulc
ggplot() +
  geom_sf(data=nashville) +
  geom_raster(data=rast(lulc_clip) %>% na.omit %>% filter(value > 0), 
              aes(x,y,fill=as.factor(value))) +
  labs(title = "Land Cover (2011)", subtitle="Nashville MSA") +
  scale_fill_manual(values=c(paletteQual), name="Land\nCover\nType")+
  mapTheme() +
  theme(legend.position="bottom")
```

### Land Cover Change - 2009 to 2019

We compare the land cover in 2011 and 2019, and map below the areas that changed, colored by the land type it changed to.

```{r land change, warning=FALSE, message=FALSE, img.width=10}
#plot lucc
ggplot() +
  geom_sf(data=nashville, fill="black") +
  geom_raster(data=rast(lucc_clip) %>% na.omit %>% filter(value > 0), 
              aes(x,y,fill=as.factor(value))) +
  scale_fill_manual(values=c(paletteQual), name="Land Cover Type")+
  labs(title = "Land Cover Change (2011 to 2019)", subtitle="Nashville MSA") +
  mapTheme() +
  theme(legend.position="bottom")
```

### Binarized Classification of Change

We reclassify the centroids that changed to take on 1 values.

```{r reclass, warning=FALSE, message=FALSE}


#reclass matrix 
reclassMatrix <- 
  matrix(c(
    0,12,0,
    12,24,1,
    24,Inf,0),
    ncol=3, byrow=T)

#reclassMatrix

#reclassify lucc
lucc <- 
  reclassify(lucc_clip,reclassMatrix)

lucc[lucc < 1] <- NA

names(lucc) <- "lc_change"

# plot 1's
ggplot() +
  geom_sf(data=nashville, fill=palette2[1]) +
  geom_raster(data=rast(lucc) %>% na.omit, 
              aes(x,y,fill=as.factor(value))) +
  scale_fill_manual(values=palette2[2],
                    name="")+
  #scale_fill_viridis(discrete=TRUE, name ="Land Cover\nChange") + 
  labs(title="Land Use Change (2011 to 2019)", subtitle="Values of 1 indicate change") +
  mapTheme()
```

We feed this information back into our fishnet to categorize based on developed and undeveloped, which is mapped below.

```{r change points, warning=FALSE, message=FALSE}
changePoints <-
  rasterToPoints(lucc) %>%
  as.data.frame() %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(Nashville_fishnet))

fishnet <- 
  aggregate(changePoints, Nashville_fishnet, sum) %>%
  mutate(lc_change = ifelse(is.na(lc_change),0,1),
         lc_change = as.factor(lc_change))

ggplot() +
  geom_sf(data=nashville) +
  geom_point(data=fishnet, 
             aes(x=xyC(fishnet)$x, y=xyC(fishnet)$y, colour=lc_change)) +
  scale_colour_manual(values = palette2,
                      labels=c("No Change","New Development"),
                      name = "") +
  labs(title = "Land Cover Development Change", subtitle = "As fishnet centroids") +
  mapTheme()
```

We see below that of our 10,364 fishnet cells, 8,361 (80%) did not experience land cover change and 2,003 (20%) are newly developed. We hope that when training and testing our model, there is a relatively decent balance of non-developed and developed observations, which hopefully will improve our true positive (predicts change correctly), true negative (predicts non-change correctly), and accuracy.

```{r table}
table(fishnet$lc_change)
```

### Land Use Reclassification

Based on the 15 land class values, we reclassified them into 6 broader categories: *developed*, *forest*, *farm*, *wetlands*, *other undeveloped*, and *water*. These new categories are then re-aggregated into our fishnet so we can see the predominant land type of each.

```{r reclass2, warning=FALSE, message=FALSE}
# landuse reclassification
lulc_clip <- aggregate(lulc_clip, fact = 2)
developed <- lulc_clip == 21 | lulc_clip == 22 | lulc_clip == 23 | lulc_clip == 24
forest <- lulc_clip == 41 | lulc_clip == 42 | lulc_clip == 43 
farm <- lulc_clip == 81 | lulc_clip == 82 
wetlands <- lulc_clip == 90 | lulc_clip == 95 
otherUndeveloped <- lulc_clip == 52 | lulc_clip == 71 | lulc_clip == 31 
water <- lulc_clip == 11

names(developed) <- "developed"
names(forest) <- "forest"
names(farm) <- "farm"
names(wetlands) <- "wetlands"
names(otherUndeveloped) <- "otherUndeveloped"
names(water) <- "water"
```

```{r rster agg, warning=FALSE, message=FALSE}
# aggregateRaster


aggregateRaster <- function(inputRasterList, theFishnet) {
  #create an empty fishnet with the same dimensions as the input fishnet
  theseFishnets <- theFishnet %>% dplyr::select()
  #for each raster in the raster list
  for (i in inputRasterList) {
  #create a variable name corresponding to the ith raster
  varName <- names(i)
  #convert raster to points as an sf
    thesePoints <-
      rasterToPoints(i) %>%
      as.data.frame() %>%
      st_as_sf(coords = c("x", "y"), crs = st_crs(theFishnet)) %>%
      filter(.[[1]] == 1)
  #aggregate to the fishnet
    thisFishnet <-
      aggregate(thesePoints, theFishnet, length) %>%
      mutate(!!varName := ifelse(is.na(.[[1]]),0,1))
  #add to the larger fishnet
    theseFishnets <- cbind(theseFishnets,thisFishnet)
  }
  #output all aggregates as one large fishnet
   return(theseFishnets)
  }

theRasterList <- c(developed,forest,farm,wetlands,otherUndeveloped,water)

aggregatedRasters <-
  aggregateRaster(theRasterList, Nashville_fishnet) %>%
  dplyr::select(developed,forest,farm,wetlands,otherUndeveloped,water) %>%
  mutate_if(is.numeric,as.factor)

aggregatedRasters %>%
  gather(var,value,developed:water) %>%
  st_cast("POLYGON") %>%    #just to make sure no weird geometries slipped in
  mutate(X = xyC(.)$x,
         Y = xyC(.)$y) %>%
  ggplot() +
    geom_sf(data=nashville) +
    geom_point(aes(X,Y, colour=as.factor(value))) +
    facet_wrap(~var) +
    scale_colour_manual(values = palette2,
                        labels=c("Other","Land Cover"),
                        name = "") +
    labs(title = "Land Cover Types, 2001",
         subtitle = "As fishnet centroids") +
   mapTheme()
```

# Feature Aggregation

The following features have been added to our fishnet below, in addition to our land use change:

1. Total Population - American Community Survey (2011 and 2019)
2. % of Occupied Housing Units - American Community Survey (2011 and 2019)
3. Distance to Interstates - Tigris
4. Spatial Lag to Developed Areas
5. Fixed Effect - Distance to Cropland

We hypothesized that each of these features have some correlative effect to development patterns.

## Census Data

We pulled two main variables from the census: total population and occupied housing units. We hypothesized that these were predictive of development, thus predictive of land use change. With a greater density of people and housing units, these areas are likely to grow. We gathered these variables at the census tract level.

```{r access census, warning=FALSE, message=FALSE, results=FALSE}
census_api_key("3c896250ea8d421462ade754e4dcecdf8f55e0f2", overwrite = TRUE)

nashvillePop2011 <- 
  get_acs(geography = "tract", 
          variables = c("B01001_001E", # Total population
                        "B25002_002E" # Occupied housing units
          ),
          year = 2011,
          state = 47, 
          geometry = TRUE, 
          county = c(021, 037, 043, 081, 111, 147, 149, 159, 165, 169, 187, 189, 119, 015),
          output = "wide") %>%
  rename(totalPop2011 = B01001_001E,
         totalOccupied2011 = B25002_002E
  ) %>%
  dplyr::select(-c("B01001_001M", "B25002_002M")) %>%
  st_transform(crs = 2274)

nashvillePop2019 <- 
  get_acs(geography = "tract", 
          variables = c("B01001_001E", # Total population
                        "B25002_002E" # Occupied housing units
          ),
          year = 2019,
          state = 47, 
          geometry = TRUE, 
          county = c(021, 037, 043, 081, 111, 147, 149, 159, 165, 169, 187, 189, 119, 015),
          output = "wide") %>%
  rename(totalPop2019 = B01001_001E,
         totalOccupied2019 = B25002_002E
  ) %>%
  dplyr::select(-c("B01001_001M", "B25002_002M")) %>%
  st_transform(crs = 2274)
```

### Total Populaion - 2011 vs. 2019

First, we look at total population within each census tract in years 2011 and 2019.

```{r total pop, warning=FALSE, message=FALSE}
grid.arrange(
  ggplot()+
    geom_sf(data=nashvillePop2011, aes(fill=q5(totalPop2011)), color=NA) +
    geom_sf(data=nashville, color="white", fill=NA)+
    scale_fill_manual(values=paletteMain,
                      labels=(qBr(nashvillePop2011, "totalPop2011")),
                      name="Population\nQuintiles")+
    labs(title = "Total Population - 2011",
         subtitle = "Nashville MSA Census Tracts")+
    mapTheme(),
  ggplot()+
    geom_sf(data=nashvillePop2019, aes(fill=q5(totalPop2019)), color=NA) +
    geom_sf(data=nashville, color="white", fill=NA)+
    scale_fill_manual(values=paletteMain,
                      labels=(qBr(nashvillePop2019, "totalPop2019")),
                      name="Population\nQuintiles")+
    labs(title = "Total Population - 2019",
         subtitle = "Nashville MSA Census Tracts")+
    mapTheme(),
  ncol=2
)
```

### Occupied Housing Units - 2011 vs. 2019

Now, we visualize the total occupied housing units in each census tract.

```{r housing, warning=FALSE, message=FALSE}
grid.arrange(
  ggplot()+
    geom_sf(data=nashvillePop2011, aes(fill=q5(totalOccupied2011)), color=NA) +
        geom_sf(data=nashville, color="white", fill=NA)+
    scale_fill_manual(values=paletteMain,
                      labels=(qB(nashvillePop2011, "totalOccupied2011")),
                      name="Housing Unit\nQuintiles")+
    labs(title = "Occupied Housing Units - 2011",
         subtitle = "Nashville MSA Census Tracts")+
    mapTheme(),
  ggplot()+
    geom_sf(data=nashvillePop2019, aes(fill=q5(totalOccupied2019)), color=NA) +
        geom_sf(data=nashville, color="white", fill=NA)+
    scale_fill_manual(values=paletteMain,
                      labels=(qB(nashvillePop2019, "totalOccupied2019")),
                      name="Housing Unit\nQuintiles")+
    labs(title = "Occupied Housing Units - 2019",
         subtitle = "Nashville MSA Census Tracts")+
    mapTheme(),
  ncol=2
)
```

### Census Features as Fishnet

We aggregate all of the census features for each year into our fishnet grid, and visualize them within the fishnet, also including the total change in population and housing units for each grid below.

```{r cen to fish, warning=FALSE, message=FALSE}
fishnetPopulation11 <-
  st_interpolate_aw(nashvillePop2011["totalPop2011"], Nashville_fishnet, extensive=TRUE) %>%
  as.data.frame(.) %>%
  rownames_to_column(var = "fishnetID") %>%
  left_join(Nashville_fishnet %>%
              mutate(fishnetID = as.character(fishnetID)),
            ., by=c("fishnetID"='fishnetID')) %>% 
  mutate(totalPop2011 = replace_na(totalPop2011,0)) %>%
  dplyr::select(totalPop2011)

fishnetPopulation19 <-
  st_interpolate_aw(nashvillePop2019["totalPop2019"], Nashville_fishnet, extensive=TRUE) %>%
  as.data.frame(.) %>%
  rownames_to_column(var = "fishnetID") %>%
  left_join(Nashville_fishnet %>%
              mutate(fishnetID = as.character(fishnetID)),
            ., by=c("fishnetID"='fishnetID')) %>% 
  mutate(totalPop2019 = replace_na(totalPop2019,0)) %>%
  dplyr::select(totalPop2019)

fishnetHousing11 <-
  st_interpolate_aw(nashvillePop2011["totalOccupied2011"], Nashville_fishnet, extensive=TRUE) %>%
  as.data.frame(.) %>%
  rownames_to_column(var = "fishnetID") %>%
  left_join(Nashville_fishnet %>%
              mutate(fishnetID = as.character(fishnetID)),
            ., by=c("fishnetID"='fishnetID')) %>% 
  mutate(totalOccupied2011 = replace_na(totalOccupied2011,0)) %>%
  dplyr::select(totalOccupied2011)

fishnetHousing19 <-
  st_interpolate_aw(nashvillePop2019["totalOccupied2019"], Nashville_fishnet, extensive=TRUE) %>%
  as.data.frame(.) %>%
  rownames_to_column(var = "fishnetID") %>%
  left_join(Nashville_fishnet %>%
              mutate(fishnetID = as.character(fishnetID)),
            ., by=c("fishnetID"='fishnetID')) %>% 
  mutate(totalOccupied2019 = replace_na(totalOccupied2019,0)) %>%
  dplyr::select(totalOccupied2019)

fishnetCensus <- 
  cbind(fishnetPopulation11,fishnetPopulation19, fishnetHousing11, fishnetHousing19) %>%
  dplyr::select(totalPop2011, totalPop2019, totalOccupied2011, totalOccupied2019) %>%
  mutate(popChange = (totalPop2019 - totalPop2011),
         housingChange = (totalOccupied2019 - totalOccupied2011))

grid.arrange(
  ggplot()+
    geom_sf(data=fishnetCensus, aes(fill=q5(popChange)), color=NA)+
    scale_fill_manual(values=paletteMain,
                      labels=(qB(fishnetCensus, "popChange")),
                      name="Quintiles")+
    labs(title="Fishnet: Population Change")+
    mapTheme(),
  ggplot()+
    geom_sf(data=fishnetCensus, aes(fill=q5(housingChange)), color=NA)+
    scale_fill_manual(values=paletteMain,
                      labels=(qB(fishnetCensus, "housingChange")),
                      name="Quintiles")+
    labs(title="Fishnet: Housing Units Change")+
    mapTheme(),
  ncol=1
)
```

## Distance to Interstate

Our first distance-based classification we did was distance to interstates. We accessed interstate data from the Tigris package in R. We hypothesized new development may occur in areas with closer proximity to the interstate due to ease of accessibility. We first visualize interstates below.

```{r int, warning=FALSE, message=FALSE, results=FALSE}
nashvilleInterstate <- roads("TN", c("Cheatham", "Davidson", "Dickson", "Macon", "Robertson", "Rutherford", "Smith", "Sumner", "Trousdale", "Williamson", "Wilson", "Maury", "Cannon")) %>%
  dplyr::filter(RTTYP %in% "I") %>%
  st_transform(crs = 2274)

ggplot()+
  geom_sf(data=nashville, fill="black", color="white")+
  geom_sf(data=nashvilleInterstate, fill="red", color='red')+
  labs(title="Interstates", subtitle="Nashville MSA", caption="County boundaries in white\nInterstates in red")+
  mapTheme()
```

We aggregate the intersection of each interstate into the fishnet and calculate the distance to nearest interstate for each fishnet cell. Quintile distances are shown below.

```{r int fish, warning=FALSE, message=FALSE}
fishnetInterstate <- fishnet %>%
  mutate(uniqueID = as.character(row_number()))

fishnet_centroid <- fishnetInterstate %>%
  st_centroid()

interstateDist <- fishnet_centroid %>%
  st_distance(nashvilleInterstate %>%
                st_transform(st_crs(fishnet_centroid))) %>%
  as.data.frame() %>%
  mutate(uniqueID = as.character(row_number())) %>%
  gather(-uniqueID, key = "variable", value = "value") %>%
  dplyr::select(-variable) %>%
  group_by(uniqueID) %>%
  summarize(interstateDist = min(value))

interstateDist$interstateDist <- as.numeric(gsub("\\[US_survey_foot\\]", "", interstateDist$interstateDist))

fishnet <- left_join(fishnetInterstate, interstateDist)

ggplot() +
  geom_point(data=fishnet, aes(x=xyC(fishnet)[,1], 
                               y=xyC(fishnet)[,2], 
                               colour=factor(ntile(interstateDist,5))),size=1.5) +
  scale_colour_manual(values = paletteMain,
                      labels=substr(quintileBreaks(interstateDist,"interstateDist"),1,8),
                      name="Quintile\nBreaks") +
  geom_sf(data=nashvilleInterstate, colour = "red") +
  labs(title = "Distance to Interstates",
       subtitle = "Interstates visualized in red") +
  mapTheme()
```

## Spatial Lag to Development

Another feature we hypothesized is the spatial lag to already developed fishnets. We believe that as areas within the MSA grow, for the sake of creating continuous developed space, new development will be a function of current development. 

```{r spatial lag, warning=FALSE, message=FALSE}
#define function
nn_function <- function(measureFrom,measureTo,k) {
  #convert the sf layers to matrices
  measureFrom_Matrix <-
    as.matrix(measureFrom)
  measureTo_Matrix <-
    as.matrix(measureTo)
  nn <-   
    get.knnx(measureTo, measureFrom, k)$nn.dist
  output <-
    as.data.frame(nn) %>%
    rownames_to_column(var = "thisPoint") %>%
    gather(points, point_distance, V1:ncol(.)) %>%
    arrange(as.numeric(thisPoint)) %>%
    group_by(thisPoint) %>%
    summarize(pointDistance = mean(point_distance)) %>%
    arrange(as.numeric(thisPoint)) %>% 
    dplyr::select(-thisPoint) %>%
    pull()
  
  return(output)  
}

fishnet$lagDevelopment <-
  nn_function(xyC(fishnet),
              xyC(filter(aggregatedRasters,developed==1)),
              2)

# ggplot() +
#   geom_sf(data=nashville) +
#   geom_point(data=fishnet, 
#              aes(x=xyC(fishnet)[,1], y=xyC(fishnet)[,2], 
#                  colour=factor(ntile(lagDevelopment,5))), size=1.5) +
#   scale_colour_manual(values = paletteMain,
#                      labels=substr(quintileBreaks(fishnet,"lagDevelopment"),1,7),
#                      name="Quintile\nBreaks") +
#   labs(title = "Spatial Lag to 2001 Development",
#        subtitle = "As fishnet centroids") +
#   mapTheme()

ggplot() +
  geom_sf(data=nashville) +
  geom_point(data=fishnet, 
             aes(x=xyC(fishnet)[,1], 
                 y=xyC(fishnet)[,2], 
                 colour=factor(ntile(lagDevelopment, 5))), size = 1.5) +
    scale_colour_manual(values = paletteMain,
                      labels=substr(quintileBreaks(fishnet, "lagDevelopment"), 1, 8),
                      name="Quintile\nBreaks")+
  labs(title = "Spatial Lag to Development",
       subtitle = "As fishnet centroids") +
  mapTheme()
``` 

## Spatial Lag to Croplands

Our final feature is a fixed effect, distance to croplands. Using USGS data, we reclassified land types 81 and 82, and determined spatial lag to these areas. Cropland was an arbitrary choice, but reflects again how distance to a certain land type may be useful to understand development patterns, especially a land cover like cropland that may be antithetical to development.

```{r crops, warning=FALSE, message=FALSE}
crop_VALUE1 <- 81
crop_VALUE2 <- 82

# Define a function to keep only crop value, and set other values to 0
keep_values <- function(x) {
  x[!(x %in% c(crop_VALUE1, crop_VALUE2))] <- NA
  return(x)
}

# Apply the function to the original raster
cropland_raster <- calc(lulc_clip, keep_values)

#cropland_raster <- aggregate(lulc_clip, fact = 5) 
cropland_raster <- mask(cropland_raster, nashville)
cropland_raster <- aggregate(cropland_raster, fact = 2) 
cropland_distance <- distance(cropland_raster)
names(cropland_distance) <- "distance_croplands"

cropland_Points <-
  rasterToPoints(cropland_distance) %>%
  as.data.frame() %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(Nashville_fishnet))

cropland_Points_fishnet <- 
  aggregate(cropland_Points, Nashville_fishnet, mean) %>%
  mutate(croplandsDist = ifelse(is.na(distance_croplands), 0, distance_croplands))



ggplot() +
  geom_sf(data=nashville) +
  geom_point(data=cropland_Points_fishnet, aes(x=xyC(cropland_Points_fishnet)[,1], 
                                               y=xyC(cropland_Points_fishnet)[,2], 
                                               colour=factor(ntile(croplandsDist, 5))), size=1.5) +
  scale_colour_manual(values = paletteMain,
                      labels=substr(quintileBreaks(cropland_Points_fishnet, "croplandsDist"), 1, 8),
                      name="Quintile\nBreaks") +
  #  geom_raster(data=cropland_raster, aes(x, y, fill=as.factor(value)), alpha=0.5) + # Add the cropland raster to the plot
  labs(title = "Distance to Croplands",
       subtitle = "As fishnet centroids") +
  mapTheme()
```

## Supply Side Solution: I-840 Expansion

In the supply side solution of our modeling work, we decided to understand how a new interstate expansion would impact development demand. We drew a hypothetical interstate line using Google Earth Pro, which would be a continuation of I-840, and would intersect with I-40 and I-65.

```{r supply side solution, warning=FALSE, message=FALSE, results=FALSE}
newInterstate <- st_read("C:/Users/jtrum/Desktop/CPLN6750/int.kml") %>%
  st_transform(st_crs(nashvilleInterstate))

ggplot()+
  geom_sf(data=nashville, fill="black", color="white")+
  geom_sf(data=nashvilleInterstate, fill="red", color='red')+
  geom_sf(data=newInterstate, fill="#FFD54F", color="#FFD54F", aes(geometry = geometry))+
  labs(title="Proposed Interstate Expansion", subtitle="Nashville MSA", caption="County boundaries in white\nInterstates in red")+
  mapTheme()

```

We now calculate the distance of each fishnet cell to the new interstate, which will be re-visited in Section 7.

```{r new interstate distance, supply side solution, warning=FALSE, message=FALSE, results=FALSE}
fishnetSupply <- fishnet %>%
  mutate(uniqueID = as.character(row_number()))

fishnet_centroid2 <- fishnetSupply %>%
  st_centroid()

newInterstateDist <- fishnet_centroid2 %>%
  st_distance(newInterstate %>%
                st_transform(st_crs(fishnet_centroid2))) %>%
  as.data.frame() %>%
  mutate(uniqueID = as.character(row_number())) %>%
  gather(-uniqueID, key = "variable", value = "value") %>%
  dplyr::select(-variable) %>%
  group_by(uniqueID) %>%
  summarize(newInterstateDist = min(value))

newInterstateDist$newInterstateDist <- as.numeric(gsub("\\[US_survey_foot\\]", "", newInterstateDist$newInterstateDist))

fishnet <- left_join(fishnetSupply, newInterstateDist)

ggplot() +
  geom_point(data=fishnet, aes(x=xyC(fishnet)[,1], 
                               y=xyC(fishnet)[,2], 
                               colour=factor(ntile(newInterstateDist,5))),size=1.5) +
  scale_colour_manual(values = paletteMain,
                      labels=substr(quintileBreaks(newInterstateDist,"newInterstateDist"),1,8),
                      name="Quintile\nBreaks") +
  geom_sf(data=nashvilleInterstate, colour="black")+
  geom_sf(data=newInterstate, colour = "red") +
  labs(title = "Distance to Interstate Proposal",
       subtitle = "New interstate visualized in red\nExisting interstates visualized in black") +
  mapTheme()
```

## Final Fishnet

```{r study area counties, supply side solution, warning=FALSE, message=FALSE, results=FALSE}
options(tigris_class = "sf")

studyAreaCounties <- 
  counties("Tennessee") %>%
  st_transform(st_crs(nashville)) %>%
  dplyr::select(NAME) %>%
  .[st_buffer(nashville,-1000), , op=st_intersects]
```

We develop a final dataset of all of our features to use for modeling purposes, where variable *lc_change* serves as our dependent variable.

```{r final fishnet, supply side solution, warning=FALSE, message=FALSE, results=FALSE}
# fishnet has interstate distance and spatial lag of development
# needs all of the census variables, distance to bus stop, fixed effects, building permit apps

dat <- 
  cbind(
    fishnet, aggregatedRasters, fishnetCensus, cropland_Points_fishnet) %>%
  dplyr::select(lc_change,
                interstateDist,
                newInterstateDist,
                lagDevelopment,
                developed,
                forest,
                farm,
                otherUndeveloped,
                water,
                wetlands,
                totalPop2011,
                totalPop2019,
                totalOccupied2011,
                totalOccupied2019,
                popChange,
                housingChange,
                croplandsDist
                ) %>%
  st_join(studyAreaCounties) %>%
  mutate(developed19 = ifelse(lc_change == 1 & developed == 1, 0, developed)) %>%
  filter(water == 0) 
```



# Exploratory Analysis

Next, we made some plots to explore the implications of our predictive variables on land use change.

## New Development as a Function of Distance to Built Environment Amenities

First, we calculated each fishnet's average distance to some of our features, grouped by their classification of land use change. We compared the spatial lag to development, distance to croplands, and distance to interstate. We notice below that new development is on average twice as close to interstates and the spatial lag of development, and an a cell is just slightly more likely to be closer to cropland if unchanged.

```{r warning=FALSE, message=FALSE, fig.width=10}
dat %>%
  dplyr::select(lagDevelopment, interstateDist, croplandsDist, lc_change) %>%
  gather(Variable, Value, -lc_change, -geometry) %>%
  ggplot(., aes(lc_change, Value, fill=lc_change)) + 
  geom_bar(position = "dodge", stat = "summary", fun.y = "mean") +
  facet_wrap(~Variable) +
  scale_fill_manual(values = palette2,
                    labels=c("No Change","New Development"),
                    name="") +
  labs(title="New Development as a Function of Distance Variables") +
  plotTheme()
```

## New Development as a Function of Distance to Population

Next, we see total population in years 2011 and 2019, in addition to population change, that there are a greater number of people living in newly developed areas over unchanged areas, which indicates there is greater demand for areas that are experiencing population growth.

```{r, warning=FALSE, fig.width=10, message=FALSE}
dat %>%
  dplyr::select(totalPop2011,totalPop2019,popChange,lc_change) %>%
  gather(Variable, Value, -lc_change, -geometry) %>%
  ggplot(., aes(lc_change, Value, fill=lc_change)) + 
  geom_bar(position = "dodge", stat = "summary", fun.y = "mean") +
  facet_wrap(~Variable) +
  scale_fill_manual(values = palette2,
                    labels=c("No Change","New Development"),
                    name="") +
  labs(title="New Development as a Function of Population Variables") +
  plotTheme()
```

## New Development as a Function of Amount of Occupied Housing

We also look at the same calculation but for the amount of occupied housing in 2011 and 2019, and the change in occupied housing between them. The plots indicate that there newly developed areas are driven by greater population increases. 

```{r, warning=FALSE, fig.width=10, message=FALSE}
dat %>%
  dplyr::select(totalOccupied2011,totalOccupied2019,housingChange,lc_change) %>%
  gather(Variable, Value, -lc_change, -geometry) %>%
  ggplot(., aes(lc_change, Value, fill=lc_change)) + 
  geom_bar(position = "dodge", stat = "summary", fun.y = "mean") +
  facet_wrap(~Variable) +
  scale_fill_manual(values = palette2,
                    labels=c("No Change","New Development"),
                    name="") +
  labs(title="New Development as a Function of Occupancy Status") +
  plotTheme()
```

# Modeling

We begin our modeling process by splitting the data into an equally sized training and testing set, partitioned on the *developed* land use classification. We create six different binary logistic regression models, each increasing with number of variables.

```{r model 407, warning=FALSE, message=FALSE, results=FALSE}
set.seed(407)
trainIndex <- 
  createDataPartition(dat$developed, p = .50,
                                  list = FALSE,
                                  times = 1)
datTrain <- dat[ trainIndex,]
datTest  <- dat[-trainIndex,]
```

```{r first six models, warning=FALSE, message=FALSE, results=FALSE}
Model1 <- glm(lc_change ~ developed + wetlands + forest + farm + otherUndeveloped, 
              family="binomial"(link="logit"), data = datTrain) # only land type features

Model2 <- glm(lc_change ~ developed + wetlands + forest + farm + otherUndeveloped + totalPop2011 + totalPop2019 + popChange, # plus population features
              family="binomial"(link="logit"), data = datTrain)
              
Model3 <- glm(lc_change ~ developed + wetlands + forest + farm + otherUndeveloped + totalPop2011 + totalPop2019 + popChange + totalOccupied2011 + totalOccupied2019 + housingChange, # plus housing occupancy features
              family="binomial"(link="logit"), data = datTrain)          
              
Model4 <- glm(lc_change ~ developed + wetlands + forest + farm + otherUndeveloped + totalPop2011 + totalPop2019 + popChange + totalOccupied2011 + totalOccupied2019 + housingChange + croplandsDist, # plus cropland spatial feature
              family="binomial"(link="logit"), data = datTrain)              
            
Model5 <- glm(lc_change ~ developed + wetlands + forest + farm + otherUndeveloped + totalPop2011 + totalPop2019 + popChange + totalOccupied2011 + totalOccupied2019 + housingChange + croplandsDist + interstateDist, # plus interstate spatial feature
              family="binomial"(link="logit"), data = datTrain)              
              
Model6 <- glm(lc_change ~ developed + wetlands + forest + farm + otherUndeveloped + totalPop2011 + totalPop2019 + popChange + totalOccupied2011 + totalOccupied2019 + housingChange + croplandsDist + interstateDist + lagDevelopment, # plus development lag spatial feature
              family="binomial"(link="logit"), data = datTrain) 
```

Looking at our model with the most variables, we see several features have high importance to our model, especially interstate distance and lag of development.

```{r model6 sum, warning=FALSE, message=FALSE, results=TRUE}
summary(Model6)
```

## McFadden Score

The McFadden Score, also known as the pseudo R-squared, helps give us insight to explaining how much our variables explain variance in land patterns. Our model 6 has the highest R-squared, though all of the models are quite low.

```{r mcfadden, warning=FALSE, message=FALSE, results=FALSE}
modelList <- paste0("Model", 1:6)

map_dfc(modelList, function(x)pR2(get(x)))[4,] %>%
  setNames(paste0("Model",1:6)) %>%
  gather(Model,McFadden) %>%
  ggplot(aes(Model,McFadden)) +
    geom_bar(stat="identity", fill=palette2[1]) +
    geom_text(aes(label=round(McFadden,4)), vjust=1.5, color="white") +
    labs(title= "McFadden Score by Model") +
    plotTheme()

```

## Density of Observations

This plot indicated the histogram of our test set predicted probabilities classified by land use change. We expect the density of *no change* to fall closer to 0 along the x-axis, and the density of *new development* to fall closer to 1 on the x-axis. 

```{r density histograms, warning=FALSE, message=FALSE, results=FALSE}
testSetProbs <- 
  data.frame(class = datTest$lc_change,
             probs = predict(Model6, datTest, type="response")) 
  
ggplot(testSetProbs, aes(probs)) +
  geom_density(aes(fill=class), alpha=0.5) +
  scale_fill_manual(values = palette2,
                    labels=c("No Change","New Development"),
                    name="Land Change Class") +
  labs(title = "Histogram of Predicted Probabilities",
       x="Predicted Probabilities",y="Density") +
  plotTheme()
```

## Threshold Testing

We experiment with two probability thresholds: 22% and 32%. These thresholds help us binarize probabilities, so that anything falling under the threshold gets assigned 0 (no change), and anything above the threshold gets assigned 1 (new development). We chose these values 

We see that from the table: both thresholds do an okay job at correctly predicting new development (sensitivity), a much better job at predicting no change (specificity), and have pretty decent accuracy scores.

```{r threshold testing, warning=FALSE, message=FALSE}
options(yardstick.event_first = FALSE)

testSetProbs <- 
  testSetProbs %>% 
  mutate(predClass_22 = as.factor(ifelse(testSetProbs$probs >= 0.22 ,1,0)),
         predClass_32 = as.factor(ifelse(testSetProbs$probs >= 0.32 ,1,0))) 

testSetProbs %>%
  dplyr::select(-probs) %>%
  gather(Variable, Value, -class) %>%
  group_by(Variable) %>%
  summarize(Sensitivity = round(yardstick::sens_vec(class,factor(Value)),2),
            Specificity = round(yardstick::spec_vec(class,factor(Value)),2),
            Accuracy = round(yardstick::accuracy_vec(class,factor(Value)),2)) %>% 
  kable() %>%
  kable_styling(full_width = F, bootstrap_options = c("bordered"))
```

```{r development predictions, warning=FALSE, message=FALSE, results=FALSE}
predsForMap <-         
  dat %>%
    mutate(probs = predict(Model6, dat, type="response") ,
           Threshold_22_Pct = as.factor(ifelse(probs >= 0.22 ,1,0)),
           Threshold_32_Pct =  as.factor(ifelse(probs >= 0.32 ,1,0))) %>%
    dplyr::select(lc_change,Threshold_22_Pct,Threshold_32_Pct) %>%
    gather(Variable,Value, -geometry) %>%
    st_cast("POLYGON")

# ggplot() +
#   geom_point(data=predsForMap, aes(x=xyC(predsForMap)[,1], y=xyC(predsForMap)[,2], colour=Value)) +
#   facet_wrap(~Variable) +
#   scale_colour_manual(values = palette2, labels=c("No Change","New Development"),
#                       name="") +
#   labs(title="Development Predictions", subtitle="Actual Land Cover Change, Land Cover Change at 22%, Land Cover Change at 32%") + 
#   mapTheme()+theme(legend.position = "bottom")
```

With the plot below, we see an increase in threshold increases incorrect unchanged predictions and an increase in correct new development predictions.

```{r conf matrix, warning=FALSE, message=FALSE}
ConfusionMatrix.metrics <-
  dat %>%
    mutate(probs = predict(Model6, dat, type="response") ,
           Threshold_22_Pct = as.factor(ifelse(probs >= 0.22 ,1,0)),
           Threshold_32_Pct =  as.factor(ifelse(probs >= 0.32 ,1,0))) %>%
    mutate(TrueP_22 = ifelse(lc_change  == 1 & Threshold_22_Pct == 1, 1,0),
           TrueN_22 = ifelse(lc_change  == 0 & Threshold_22_Pct == 0, 1,0),
           TrueP_32 = ifelse(lc_change  == 1 & Threshold_32_Pct == 1, 1,0),
           TrueN_32 = ifelse(lc_change  == 0 & Threshold_32_Pct == 0, 1,0)) %>%
    dplyr::select(., starts_with("True")) %>%
    gather(Variable, Value, -geometry) %>%
    st_cast("POLYGON") 

ggplot(data=ConfusionMatrix.metrics) +
  geom_point(aes(x=xyC(ConfusionMatrix.metrics)[,1], 
                 y=xyC(ConfusionMatrix.metrics)[,2], colour = as.factor(Value))) +
  facet_wrap(~Variable) +
  scale_colour_manual(values = palette2, labels=c("Correct","Incorrect"),
                       name="") +
  labs(title="Development Predictions",
       subtitle="Threshold of 22% and 32%") + mapTheme()
```

## Spatial Cross-Validation

To prove our model is generalizable to different contexts, we cross-validate our model based on individual county's land use change patterns. We run a spatial cross-validation function to loop over our data and complete a leave-one-group-out cross validation (LOGO-CV) metric, classify outcomes by county, and assess their true positive, true negative, and accuracy scores.

From the graph below, we see a general trend of lower sensitivity, higher specificity, and relatively high accuracy, mirroring our model's predictions, indicating this model is applicable to a variety of contexts.

```{r spatial cv, warning=FALSE, message=FALSE, results=FALSE}
spatialCV <- function(dataFrame, uniqueID, dependentVariable, modelName) {

#initialize a data frame 
endList <- list()

#create a list that is all the spatial group unqiue ids in the data frame (ie counties)    
  uniqueID_List <- unique(dataFrame[[uniqueID]])  
  x <- 1
  y <- length(uniqueID_List)
  
#create a counter and while it is less than the number of counties...  
  while(x <= y) 
  {
#call a current county    
    currentUniqueID <- uniqueID_List[x]
#create a training set comprised of units not in that county and a test set of units
#that are that county
    training <- dataFrame[ which(dataFrame[[uniqueID]] != uniqueID_List[x]),]
    testing <- dataFrame[ which(dataFrame[[uniqueID]] == uniqueID_List[x]),]
#create seperate xy vectors
    trainingX <- training[ , -which(names(training) %in% c(dependentVariable))]
    testingX <- testing[ , -which(names(testing) %in% c(dependentVariable))]
    
    trainY <- training[[dependentVariable]]
    testY <- testing[[dependentVariable]]
#Calculate predictions on the test county as part of a data frame including the observed
#outcome and the unique county ID    
   thisPrediction <- 
     data.frame(class = testY,
                probs = predict(modelName, testingX, type="response"),
                county = currentUniqueID) 

#Row bind the predictions to a data farme
   endList <- rbind(endList, thisPrediction)
#iterate counter    
    x <- x + 1 
  } 
#return the final list of counties and associated predictions  
  return (as.data.frame(endList))
}
```

```{r spatial cv counties, warning=FALSE, message=FALSE, results=FALSE}
spatialCV_counties <-
  spatialCV(dat,"NAME","lc_change", Model6) %>%
  mutate(predClass = as.factor(ifelse(probs >= 0.22 ,1,0)))


spatialCV_metrics <-
  spatialCV_counties %>% 
    group_by(county) %>% 
    summarize(Observed_Change = sum(as.numeric(as.character(class))),
              Sensitivity = round(yardstick::sens_vec(class,predClass),2),
              Specificity = round(yardstick::spec_vec(class,predClass),2),
              Accuracy = round(yardstick::accuracy_vec(class,predClass),2)) 

spatialCV_metrics %>%
  dplyr::select(-c('Observed_Change')) %>%
  gather(key = "metric", value = "value", -county) %>%
  arrange(county) %>%
  ggplot(aes(x = county, y = value, fill = metric)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values=c(paletteMain[1], paletteMain[3], paletteMain[5]))+
  labs(x = "Value", y = "Name", fill = "Metric",
       title="Confusion Matrix Metrics by County",
       subtitle = "LOGO-CV") +
  plotTheme()+theme(axis.text.x = element_text(angle = 90, hjust = 1))

```

# Demand-Side Solution: Predicting for Development in 2029

We further develop this model to try to predict land cover change demand in 2029, connceting back to our use case that planners can use this sort of model to understand the spatial patterns of land use change. We begin this process by analyzing population change between 2019 and 2029 at the county level.

## Population Change (2019 to 2029)

We pulled population projections from the [University of Tennessee Boyd Center for Business and Economic Research](https://tnsdc.utk.edu/estimates-and-projections/boyd-center-population-projections/).

```{r dat1, warning=FALSE, message=FALSE, results=FALSE}
dat1 <-
  dat %>%
  mutate(lagDevelopment = nn_function(xyC(.), xyC(filter(.,developed19 == 1)),2))

countyPopulation_2030 <- 
  data.frame(
   NAME = 
     c("Cannon",
       "Cheatham", 
       "Davidson", 
       "Dickson", 
       "Macon", 
       "Maury",
       "Robertson",
       "Rutherford", 
       "Smith", 
       "Sumner", 
       "Trousdale", 
       "Williamson",
       "Wilson"),
   county_projection_2030 = 
     c(15500,
       42532,
       751527,
       58862,
       27209,
       113400,
       79113,
       412856,
       21233,
       223103,
       12211,
       301247,
       175908)) %>%
   left_join(
     dat %>%
       st_set_geometry(NULL) %>%
       group_by(NAME) %>%
       summarize(county_population_2019 = round(sum(totalPop2019))))

countyPopulation_2030_wPct <- countyPopulation_2030 %>%
  mutate(pctChange = (county_projection_2030-county_population_2019)/(county_projection_2030)*100)

countyPopulation_2030 %>%
  gather(Variable,Value, -NAME) %>%
  ggplot(aes(reorder(NAME,-Value),Value)) +
  geom_bar(aes(fill=Variable), stat = "identity", position = "dodge") +
  scale_fill_manual(values = palette2,
                    labels=c("2019","2029"),
                    name="Population") +
  labs(title="Population Change by County: 2019 - 2029",
       x="County", y="Population") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  plotTheme()

```

## Development Demand (2029)

We calculate infill potential of each county and feed it into our model 6, and look at the spatial characteristics of demand in 2029. We see there is a higher demand for development in more urbanized counties with high development already, with a decrease in development demand in counties with less population and more undeveloped land.

```{r data infill, warning=FALSE, message=FALSE, results=FALSE}
dat_infill <-
  dat1 %>%
  #calculate population change
    left_join(countyPopulation_2030) %>%
    mutate(proportion_of_county_pop = totalPop2019 / county_population_2019,
           pop_2030.infill = proportion_of_county_pop * county_projection_2030,
           pop_Change = round(pop_2030.infill - totalPop2019),2) %>%
    dplyr::select(-county_projection_2030, -county_population_2019, 
                  -proportion_of_county_pop, -pop_2030.infill) %>%
  #predict for 2030
    mutate(predict_2030.infill = predict(Model6,. , type="response"))

dat_infill %>%
  ggplot() +  
  geom_point(aes(x=xyC(dat_infill)[,1], y=xyC(dat_infill)[,2], colour = factor(ntile(predict_2030.infill,5)))) +
  scale_colour_manual(values = paletteMain,
                    labels=substr(quintileBreaks(dat_infill,"predict_2030.infill"),1,4),
                    name="Quintile\nBreaks") +
  geom_sf(data=studyAreaCounties, fill=NA, colour="white", size=1) +
  labs(title= "Development Demand in 2029", subtitle="Model 6") +
  mapTheme()
```


```{r development demand and env sus, warning=FALSE, message=FALSE, results=FALSE}
lulc19 <- raster("https://github.com/ObjQIAN/warehouseii/raw/main/final/lulc19f2p_Resample.tif")

developed19_1 <- lulc19 == 21 | lulc19 == 22 | lulc19 == 23 | lulc19 == 24
forest19 <- lulc19 == 41 | lulc19 == 42 | lulc19 == 43 
farm19 <- lulc19 == 81 | lulc19 == 82 
wetlands19 <- lulc19 == 90 | lulc19 == 95 
otherUndeveloped19 <- lulc19 == 52 | lulc19 == 71 | lulc19 == 31 
water19 <- lulc19 == 11

names(developed19_1) <- "developed19_1"
names(forest19) <- "forest19"
names(farm19) <- "farm19"
names(wetlands19) <- "wetlands19"
names(otherUndeveloped19) <- "otherUndeveloped19"
names(water19) <- "water19"

theRasterList19 <- c(developed19_1,forest19,farm19,wetlands19,otherUndeveloped19,water19)

dat2 <-
  aggregateRaster(theRasterList19, dat1) %>%
  dplyr::select(developed19_1,forest19,farm19,wetlands19,otherUndeveloped19,water19) %>%
  st_set_geometry(NULL) %>% 
  bind_cols(.,dat1) %>%
  st_sf() %>%
  st_cast("POLYGON")

dat2 <-
  dat2 %>%
   mutate(sensitive_lost19 = ifelse(forest == 1 & forest19 == 0 |
                                    wetlands == 1 & wetlands19 == 0,1,0))

# dat2 %>%
#         sensitive_lost19 = ifelse(farm == 1 & farm19 == 0 |
#                                     wetlands == 1 & wetlands19 == 0,1,0)
# 
# ggplot() +
#   geom_point(data=dat2, aes(x=xyC(dat2)[,1], y=xyC(dat2)[,2], colour=as.factor(sensitive_lost19))) +
#   scale_colour_manual(values = palette2,
#                       labels=c("No Change","Sensitive Lost"),
#                       name = "") +
#   labs(title = "Sensitive lands lost: 2001 - 2011",
#        subtitle = "As fishnet centroids") +
#   mapTheme()
```

# Supply-Side Solution: I-840 Expansion

Revisiting our proposed interstate expansion, we chose this as the Nashville MSA's interstate system felt as though there were gaps to fill. We drew a line that goes through Sumner and Wilson counties, which have the 2nd and 3rd highest projected growth rates, respectively.

```{r county pop, warning=FALSE, message=FALSE, results=FALSE}
countyPopulation_2030_wPct_sorted <- countyPopulation_2030_wPct[order(-countyPopulation_2030_wPct$pctChange),]

kable(countyPopulation_2030_wPct_sorted) %>%
  kable_styling(full_width = F, bootstrap_options = c("bordered")) %>%
  row_spec(c(2,3), background = "#FFD54F")
```

## Modeling Development Demand

```{r modeling development demand, warning=FALSE, message=FALSE, results=FALSE}
set.seed(408)
trainIndex2 <- 
  createDataPartition(dat2$developed19_1, p = .50,
                                  list = FALSE,
                                  times = 1)
datTrain2 <- dat2[ trainIndex,]
datTest2  <- dat2[-trainIndex,]
```

We create our seventh model, which contains all of our features from before, in addition to our proposed interstate. We see significance in several of our variables again, including the distance to the new interstate.

```{r model7, warning=FALSE, message=FALSE, results=TRUE}
Model7 <- glm(lc_change ~ developed + wetlands + forest + farm + otherUndeveloped + totalPop2011 + totalPop2019 + popChange + totalOccupied2011 + totalOccupied2019 + housingChange + croplandsDist + interstateDist + lagDevelopment + newInterstateDist, 
              family="binomial"(link="logit"), data = datTrain2) 

summary(Model7)
```

## Map of Development Demand

We plot below our development demand's with the new model. We can see that cells that the interstate crosses through have some of the highest development demand, and some of these cells did not belong to the higher quintiles before, indicating that interstate access and roadway accessibility can likely induce higher demand for development.

```{r dat3, warning=FALSE, message=FALSE, results=FALSE}
dat3 <-
  dat2 %>%
  mutate(lagDevelopment = nn_function(xyC(.), xyC(filter(.,developed19_1 == 1)),2))

dat_infill3 <-
  dat3 %>%
  #calculate population change
    left_join(countyPopulation_2030) %>%
    mutate(proportion_of_county_pop = totalPop2019 / county_population_2019,
           pop_2030.infill = proportion_of_county_pop * county_projection_2030,
           pop_Change = round(pop_2030.infill - totalPop2019),2) %>%
    dplyr::select(-county_projection_2030, -county_population_2019, 
                  -proportion_of_county_pop, -pop_2030.infill) %>%
  #predict for 2030
    mutate(predict_2030.infill = predict(Model7,. , type="response"))

  ggplot(dat_infill3) +  
  geom_point(aes(x=xyC(dat_infill3)[,1], y=xyC(dat_infill3)[,2], colour = factor(ntile(predict_2030.infill,5)))) +
  scale_colour_manual(values = paletteMain,
                    labels=substr(quintileBreaks(dat_infill3,"predict_2030.infill"),1,4),
                    name="Quintile\nBreaks") +
  geom_sf(data=studyAreaCounties, fill=NA, colour="white", size=1) +
  geom_sf(data=newInterstate, color="red", aes(geometry=geometry))+
  labs(title= "Development Demand in 2029: Predicted Probabilities", subtitle="Model 7", caption="Interstate proposal in red") +
  mapTheme()

```

# Conclusion

Predicting growth can be extremely challenging for planners given the uncertainties of economic conditions, public policy, migration patterns, among several other reasons why a city's growth may change. While some existing methods often provide meaningful insight, leveraging regression modeling based on land cover change, demographic indicators, and built environment infrastructure appears to provide meaningful insights toward urban growth patterns.

In this scenario, we modeled the growth of the Nashville MSA and developed models that were both accurate and generalizable. While our models did not explain a lot of the variance within variables, we were able to develop predictions that were consistently accurate and decently predicted no change or new development. We applied this to demand and supply scenarios to highlight the importance of both development patterns due to population change in addition to proximity to amenities.

This model is definitely a very simplistic version of a model that could capture several more predictive features, but we show in this process that even with simple features, we can gain strong insights in land use planning patterns. Especially in a context area like Nashville MSA, with intense urban sprawl and an extremely rapid population growth, access to housing and amenities is becoming more vital than ever. Replicating this study could be extremely useful for both Nashville and other MSA's alike facing similar problems to understand where development has happened, where it is projected to happen, and how to enact better land use planning as a result.