---
title: "eGRID Production"
author:
  - "Sean Bock, Abt Global"
  - "Claire Lay, Abt Global"
  - "Justin Stein, Abt Global"
  - "Teagan Goforth, Abt Global"
  - "Emma Russell, Abt Global"
  - "Sara Sokolinski, Abt Global"
  - "Caroline Watson, Abt Global"
  - "Madeline Zhang, Abt Global"
freeze: true
format:
  html:
    toc: true
    toc-expand: true
    toc-location: left
    html-table-processing: none
    code-fold: true
execute:
  message: false
  warning: false
params:
  eGRID_year: "2023"
  version: "1.0.1"
  run_demo_file: FALSE # running demographic file takes 4-5 hours
  # FALSE = default, do not run
  # TRUE = runs script to collect demographic file
editor: visual
project:
  execute_dir: project
---
# Overview
This project includes all necessary scripts and documentation to create the Emissions & Generation Resource Integrated Database (eGRID).
# Background
eGRID is a comprehensive source of data from [EPA's Clean Air and Power Division (CAPD)](https://epa.gov/power-sector) on the environmental characteristics of almost all electric power generated in the United States. eGRID is based on available plant-specific data for all U.S. electricity generating plants that provide power to the electric grid and report emissions and electricity data to the U.S. government. Data reported include, but are not limited to, net electric generation; resource mix (the share of generation by resource or fuel type); mass emissions of carbon dioxide (CO<sub>2</sub>), nitrogen oxides (NO<sub>x</sub>), sulfur dioxide (SO<sub>2</sub>), methane (CH<sub>4</sub>), and nitrous oxide (N<sub>2</sub>O); emission rates for CO<sub>2</sub>, NO<sub>x</sub>, SO<sub>2</sub>, CH<sub>4</sub>, and N<sub>2</sub>O; heat input; and nameplate capacity. eGRID reports this information on an annual basis (as well as by ozone season for heat input and NO<sub>x</sub>) at different levels of geographic aggregation.
The final eGRID dataset includes eight levels of data aggregation:
- **Generator**: A set of equipment that produces electricity and is connected to the U.S. electricity grid.
- **Unit**: A set of equipment that either produces electricity and is connected to the U.S. electricity grid, or is connected to a generator that produces electricity and is connected to the U.S. electricity grid.
- **Plant**: A facility with one or more units and/or generators that provide power to the electric grid.
- **State**: U.S. states, Puerto Rico (PR), and the District of Columbia (DC).
- **Balancing authority**: Regional power system operators that ensure a balance of supply and demand.
- **eGRID subregion**: EPA-defined subregions designed to limit the impacts of the import and export of electricity.
- **NERC (North American Electric Reliability Corporation) regions**: Each NERC region listed in eGRID represents one of nine regional portions of the North American electricity transmission grid: six in the contiguous United States, plus Alaska, Hawaii, and Puerto Rico, which are not formal NERC regions but are treated as such in eGRID.
- **National U.S.**: Contains all 50 states, Puerto Rico (PR), and the District of Columbia (DC).
Further information on the eGRID methodology can be found in the [eGRID Technical Guide](https://www.epa.gov/egrid/egrid-technical-guide).
The dataset that this code produces is publicly available [here](https://www.epa.gov/egrid/download-data).
## Using the .qmd file
This is a Quarto (.qmd) file that documents and runs the code needed to create the eGRID database. Along with sequentially executing the scripts that build the database, it documents the process throughout, including built-in outputs (e.g., row counts and variable names) and QA steps. The document can be used in two ways. First, within an IDE such as RStudio, it serves as an enhanced master script, allowing users to perform every step of eGRID production from a single file. Second, when viewed as a rendered file, it provides thorough documentation of the steps involved in creating eGRID. The rendered version also offers tools for navigating the project: the table of contents gives both an overview of the project structure and a way to move through the document, and hidden code chunks, marked by a "Code" button with an arrow next to it, can be expanded to reveal the underlying code from the R script sourced in a given section.
# Install Libraries and Set Parameters
Before loading any data or beginning to construct the eGRID database, we install all necessary libraries used in subsequent scripts. Next, the eGRID year is defined, which controls which year of data is loaded from the raw data sources.
## Install required libraries
The script `install_libraries.R` uses `renv::dependencies()` to detect all libraries required by the project, checks whether each is installed, installs any that are missing, and loads them into the workspace.
```{r}
#| label: install-libraries.R
#| file: "scripts/install_libraries.R"
#| echo: false
```
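The install-if-missing pattern can be sketched in base R as follows. This is a simplified illustration, not the contents of `install_libraries.R`. The helper names `find_missing_packages()` and `install_and_load()` are hypothetical, and the package vector is assumed to come from `renv::dependencies()` (for example, the `Package` column of its result).

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() returns FALSE (rather than erroring) when a package is missing.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install anything missing, then attach every package.
install_and_load <- function(pkgs, install = TRUE) {
  missing <- find_missing_packages(pkgs)
  if (install && length(missing) > 0) {
    install.packages(missing)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage (assumption: the package list comes from renv's dependency scan)
# pkgs <- unique(renv::dependencies("scripts")$Package)
# install_and_load(pkgs)
```

Checking with `requireNamespace()` before calling `install.packages()` means a rerun only installs what is new, which keeps repeated renders fast.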
```{r}
#| label: load-packages
#| echo: false
library(docstring)
library(dplyr)
library(ggiraph)
library(ggplot2)
library(gt)
library(gtExtras)
library(kableExtra)
library(knitr)
library(patchwork)
library(readxl)
library(stringr)
```
## Define eGRID year
The eGRID year is specified as the parameter `eGRID_year` in the YAML header of the Quarto document. Wherever a year value is needed (e.g., when pulling data from the CAMPD API), it is taken from `params$eGRID_year`.
**Current year setting: `r params$eGRID_year`**
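Because the year is stored as a string (`"2023"`), it can be dropped directly into year-specific paths, but arithmetic such as finding the prior year needs an explicit conversion. The sketch below uses base R `file.path()` in place of the `glue::glue()` calls used in this document. `demo_params` is a hypothetical stand-in for Quarto's `params` object (named differently so the real one is not overwritten), and `build_year_path()` is a hypothetical helper.

```r
# Hypothetical stand-in for Quarto's `params` object
demo_params <- list(eGRID_year = "2023")

# Hypothetical helper: build a year-specific path such as
# data/raw_data/epa/2023/epa_raw.RDS
build_year_path <- function(base_dir, file_name, year) {
  file.path(base_dir, year, file_name)
}

epa_raw_path <- build_year_path("data/raw_data/epa", "epa_raw.RDS",
                                demo_params$eGRID_year)

# eGRID_year is a character string, so convert it before doing arithmetic
prior_year <- as.character(as.integer(demo_params$eGRID_year) - 1)
```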
## Load all helper functions
A set of helper functions, defined in the folder `scripts/functions`, is used throughout the project. The `docstring` package provides documentation for these functions, similar to typical package documentation. To view the documentation for a given function, run `docstring({function_name})`.
```{r}
#| label: load-functions
#| echo: false
# sourcing each file in functions folder
functions <- list.files("scripts/functions")
purrr::walk(paste0("scripts/functions/", functions), ~ source(.x))
```
::: panel-tabset
### Overview
*Click on the tab to view the helper function names.*
### Helper Functions
```{r}
#| label: list-functions
#| echo: false
# listing function names
kable(stringr::str_remove(stringr::str_remove(functions, "\\.R$"), "^function_"),
      col.names = c("Helper Functions")) %>%
  kable_styling(full_width = FALSE)
```
:::
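The `load-functions` chunk sources every file in `scripts/functions`. The sketch below shows a slightly more defensive variant of the same pattern using a hypothetical `source_helper_functions()` wrapper. It only matches files ending in `.R` and uses full paths, so stray non-R files in the folder are skipped.

```r
# Hypothetical wrapper: source every .R file in `dir` into the global
# environment and return the base names of the files that were sourced.
source_helper_functions <- function(dir = "scripts/functions") {
  files <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  for (f in files) {
    source(f, local = FALSE)
  }
  invisible(sub("\\.R$", "", basename(files)))
}
```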
# Load Data
## Load data from raw sources
### EPA
EPA's Clean Air and Power Division (CAPD) maintains power plant emissions, compliance, and allowance data. We combine several CAPD sources into a composite file covering facility attributes, annual emissions, and annual emissions during the ozone months.
These data are available through the [CAPD API](https://www.epa.gov/power-sector/cam-api-portal#/documentation). `data_load_epa.R` downloads the facility attributes and emissions data for a selected year from the CAPD API. These raw files will be combined and cleaned in subsequent steps. For this script to run, an API key is required.
An API key can be requested [here](https://www.epa.gov/power-sector/cam-api-portal#/api-key-signup). Once a key is obtained, paste your key into the file `api_keys/epa_api_key.txt` and save. Once saved, the script `data_load_epa.R` will be able to successfully connect to and load data from the CAPD API.
```{r}
#| label: load-data-epa
#| file: "scripts/data_load_epa.R"
#| echo: fenced
#| error: false
```
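Before calling the API, it can help to fail early with a clear message when the key file is missing or empty. The function below is a hypothetical sketch of that check. It assumes the key sits on the first line of `api_keys/epa_api_key.txt`, as described above, and is not necessarily how `data_load_epa.R` reads the key.

```r
# Hypothetical helper: read the CAPD API key, stopping with a clear message
# if the file is missing or empty. Surrounding whitespace is trimmed.
read_api_key <- function(path = "api_keys/epa_api_key.txt") {
  if (!file.exists(path)) {
    stop("API key file not found at '", path,
         "'. Request a key and paste it into this file.", call. = FALSE)
  }
  key <- trimws(readLines(path, n = 1, warn = FALSE))
  if (length(key) == 0 || !nzchar(key)) {
    stop("API key file '", path, "' is empty.", call. = FALSE)
  }
  key
}
```

Keeping the key in a separate text file, rather than in the script, also makes it straightforward to exclude from version control.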
Three data tables are downloaded from CAPD and stored in a single .RDS file, `epa_raw.RDS`: facility attributes, annual emissions, and ozone-season emissions. A summary of the raw EPA file, separated into these categories, is shown below.
#### EPA Raw Tables
```{r}
#| label: create-epa-tables
#| echo: false
# function to create summary table
source("scripts/functions/function_summary_table.R")
epa_raw <- read_rds(glue::glue("data/raw_data/epa/{params$eGRID_year}/epa_raw.RDS"))
table_epa_raw <- create_summary_table(epa_raw)
epa_facilities <- epa_raw[1:26]
epa_emissions <- epa_raw[27:46]
epa_emissions_ozone <- epa_raw[47:58]
table_facilities <- create_summary_table(epa_facilities)
table_emissions <- create_summary_table(epa_emissions)
table_emissions_ozone <- create_summary_table(epa_emissions_ozone)
```
::: panel-tabset
#### Overview
*Click through the tabs above to view files represented from each EPA source.*
**EPA facility attributes**: Columns 1 - 26 of `epa_raw.RDS`.
**EPA annual emissions**: Columns 27 - 46 of `epa_raw.RDS`.
**EPA ozone season emissions**: Columns 47 - 58 of `epa_raw.RDS`. The ozone season is defined as May through September.
#### EPA facility attributes
```{r}
#| label: print-epa-tab-facilities
#| echo: false
table_facilities
```
#### EPA annual emissions
```{r}
#| label: print-epa-tab-emissions
#| echo: false
table_emissions
```
#### EPA ozone season emissions
```{r}
#| label: print-epa-tab-emissions-ozone
#| echo: false
table_emissions_ozone
```
:::
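The `create-epa-tables` chunk splits `epa_raw` into three tables by hard-coded column positions (1:26, 27:46, and 47:58). If CAPD adds or removes a field, those positions silently shift. A hypothetical base R check like the one below, which is not part of the project scripts, confirms that the ranges neither overlap nor skip a column before splitting.

```r
# Hypothetical helper: split a data frame into named pieces by column position,
# stopping if the ranges overlap or fail to cover every column exactly once.
split_by_column_ranges <- function(df, ranges) {
  all_idx <- unlist(ranges, use.names = FALSE)
  if (anyDuplicated(all_idx) > 0) {
    stop("Column ranges overlap.")
  }
  if (!setequal(all_idx, seq_along(df))) {
    stop("Column ranges do not cover exactly the columns of `df`.")
  }
  lapply(ranges, function(idx) df[idx])
}

# Usage with the positions from this document (assumes epa_raw has 58 columns):
# split_by_column_ranges(epa_raw, list(facilities = 1:26,
#                                      emissions = 27:46,
#                                      emissions_ozone = 47:58))
```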
### EIA
The U.S. Energy Information Administration (EIA), part of the Department of Energy, collects and maintains energy-related data for policymakers and the public. eGRID draws values from several EIA data forms.
As of 2024, the EIA API does not contain the most detailed data available, which eGRID construction requires. More detailed data for EIA forms [923](https://www.eia.gov/electricity/data/eia923/), [860](https://www.eia.gov/electricity/data/eia860/), and [861](https://www.eia.gov/electricity/data/eia861/) are available as zipped Excel file downloads on the EIA website. `data_load_eia.R` creates a new folder called "raw_data" in the project folder, then downloads and unzips the zip files for each form within it. Each Excel file contains several sheets that serve as the raw EIA data sources used to create eGRID.
```{r}
#| label: data-load-eia
#| file: "scripts/data_load_eia.R"
#| error: false
#| echo: fenced
```
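The download-and-unzip step can be sketched with base R's `utils::download.file()` and `utils::unzip()`. This is an illustration under stated assumptions, not the contents of `data_load_eia.R`. The folder layout produced by the hypothetical `eia_zip_path()` helper is assumed, and the download URL must be supplied by the caller rather than filled in here.

```r
# Hypothetical helper: where a form's zip file is saved (layout assumed)
eia_zip_path <- function(form, year, root = file.path("data", "raw_data", "eia")) {
  file.path(root, year, paste0("eia", form, "_", year, ".zip"))
}

# Download one zip file and extract it into the same folder.
# mode = "wb" keeps the binary file intact on Windows.
download_and_unzip <- function(url, zip_path) {
  dir.create(dirname(zip_path), recursive = TRUE, showWarnings = FALSE)
  utils::download.file(url, destfile = zip_path, mode = "wb")
  utils::unzip(zip_path, exdir = dirname(zip_path))
}
```

For example, `download_and_unzip(url, eia_zip_path("923", params$eGRID_year))`, where `url` is the zip link from the relevant EIA form page.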
The tables in [EIA Files and Sheets](#eia-files-sheets) list each unzipped raw file across the EIA forms (EIA-923, EIA-860, EIA-861, and EIA-860m), including the sheets embedded within each file.
```{r}
#| label: custom-gt-print
#| echo: false
knit_print.gt <- function(x, ...) {
  stringr::str_c("<div style='all:initial;'>\n", gt::as_raw_html(x), "\n</div>") %>%
    knitr::asis_output()
}
registerS3method("knit_print", "gt_tbl", knit_print.gt, envir = asNamespace("gt"))
```
```{r}
#| label: create-eia-sheets-list
sheets_923 <- get_sheets("923")
sheets_860 <- get_sheets("860")
sheets_861 <- get_sheets("861")
sheets_860m <- get_sheets("860m")
tab_923 <- make_sheets_table(sheets_923, "923")
tab_860 <- make_sheets_table(sheets_860, "860")
tab_861 <- make_sheets_table(sheets_861, "861")
tab_860m <- make_sheets_table(sheets_860m, "860m")
sheets_used923 <- c("Page 1 Generation and Fuel Data", # eia-923
                    "Page 1 Puerto Rico",
                    "Page 3 Boiler Fuel Data",
                    "Page 4 Generator Data",
                    "8C Air Emissions Control Info")
sheets_used860 <- c("Operable", # eia-860
                    "Proposed",
                    "Retired and Canceled",
                    "Boiler Generator",
                    "Boiler NOx",
                    "Boiler SO2",
                    "Boiler Mercury",
                    "Boiler Particulate Matter",
                    "Emissions Control Equipment",
                    "Emissions Standards & Strategies",
                    "Boiler Info & Design Parameters",
                    "FGD",
                    "Plant")
sheets_used860m <- c("Operating_PR", # eia-860m
                     "Retired_PR")
sheets_used861 <- c("Balancing Authority", # eia-861
                    "States")
# file_name_schedule_2_3_4_5_m_12 <- grep("2_3_4_5_M_12", eia_923_files, value = TRUE)
# tab_923 <- style_sheets_table(sheets_used923, tab_923)
# tab_860 <- style_sheets_table(sheets_used860, tab_860)
# tab_860m <- style_sheets_table(sheets_used860m, tab_860m)
# tab_861 <- style_sheets_table(sheets_used861, tab_861)
```
#### EIA Files and Sheets {#eia-files-sheets}
::: panel-tabset
#### Overview
*Click through the tabs above to view files represented from each EIA source.*
**EIA-923**: Data reported on fuel consumption and generation
**EIA-860**: Data reported on electric generators
**EIA-861**: Data collected from distribution utilities and power marketers
**EIA-860m**: Data reported monthly on generating units (used to obtain data for Puerto Rico)
#### EIA-923
```{r}
#| label: print-eia-923-sheets
#| tbl-cap: "EIA-923 sheets"
#| echo: false
tab_923
```
#### EIA-860
```{r}
#| label: print-eia-860-sheets
#| echo: false
#| tbl-cap: "EIA-860 sheets"
tab_860
```
#### EIA-861
```{r}
#| label: print-eia-861-sheets
#| echo: false
#| tbl-cap: "EIA-861 sheets"
tab_861
```
#### EIA-860m
```{r}
#| label: print-eia-860m-sheets
#| echo: false
#| tbl-cap: "EIA-860m sheets"
tab_860m
```
:::
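The `create-eia-sheets-list` chunk records which sheets the project actually uses (`sheets_used923`, `sheets_used860`, and so on), but sheet names can change between EIA releases. The hypothetical base R check below flags each sheet as used or unused and warns when an expected sheet is missing from a file. It assumes the sheet names are available as a character vector, for example from `readxl::excel_sheets()`.

```r
# Hypothetical helper: mark which sheets are used, and warn about expected
# sheets that do not appear in the file (e.g., after a sheet is renamed).
flag_used_sheets <- function(all_sheets, used_sheets) {
  missing <- setdiff(used_sheets, all_sheets)
  if (length(missing) > 0) {
    warning("Expected sheet(s) not found: ", paste(missing, collapse = ", "))
  }
  data.frame(sheet = all_sheets,
             used = all_sheets %in% used_sheets,
             stringsAsFactors = FALSE)
}

# Usage (assumption: `path_860` points to an unzipped EIA-860 workbook)
# flag_used_sheets(readxl::excel_sheets(path_860), sheets_used860)
```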
## Load Crosswalks and Static Tables
Crosswalks and static tables supplement the EPA and EIA files, providing one-off corrections, descriptions, emission factors, and other conversions.
```{r}
#| label: create-xwalk-summary
path <- glue::glue("data/static_tables")
files <- list.files(path)
# filter for only necessary files
file_files <- stringr::str_subset(files, "\\.(xlsx?|csv)$")
# table for crosswalks
tab_xwalk <-
  tibble(file_files) %>%
  rename("Crosswalks and Static Tables" = file_files) %>%
  gt::gt() %>%
  gt::tab_style(
    style = gt::cell_text(weight = "bold"),
    locations = gt::cells_row_groups()
  ) %>%
  gt::tab_style(
    style = gt::cell_text(size = 14, weight = "bold"),
    locations = gt::cells_column_labels()
  ) %>%
  gt::tab_caption(caption = glue::glue("Crosswalks and Static Tables"))
```
::: panel-tabset
#### Overview
*Click on the tab to view crosswalk and static table files*
#### Files
```{r}
#| label: xwalk-summary
#| echo: false
tab_xwalk
```
:::
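A crosswalk is typically applied as a lookup: an identifier in an EPA or EIA table is matched against one column of the crosswalk and replaced by the corresponding value in another. The minimal base R sketch below shows that pattern. The helper `apply_id_crosswalk()` and the column names `from_id` and `to_id` are hypothetical; they do not describe the actual structure of files such as `xwalk_oris_epa.csv`.

```r
# Hypothetical helper: replace values in `col` that appear in xwalk[[from]]
# with the matching xwalk[[to]] value; unmatched values are left unchanged.
apply_id_crosswalk <- function(df, xwalk, col, from = "from_id", to = "to_id") {
  idx <- match(df[[col]], xwalk[[from]])
  has_match <- !is.na(idx)
  df[[col]][has_match] <- xwalk[[to]][idx[has_match]]
  df
}
```

Because `match()` returns only the first match, this approach never duplicates rows when a crosswalk repeats a key, unlike a join.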
# Clean Raw Data Files
## EPA
Several procedures are applied to the raw EPA file:

- Variable name standardization
    - All variable names are converted to snake case (e.g., "snake_case").
    - Each form includes identifiers such as plant name, prime mover, and fuel type, but the assigned column names are inconsistent. To facilitate data operations (e.g., joins) and reduce confusion, we use a common naming scheme across all files (both EIA and EPA):
        - `plant_id`
        - `plant_name`
        - `plant_state`
        - `prime_mover`
        - `fuel_type`
        - `generator_id`
        - `boiler_id`
        - `nameplate_capacity`
- Removing unnecessary plants and columns
    - Plants listed as future, retired, or in long-term cold storage are removed, as are plants with IDs above 80000.
- Creating source variables, each set to `EPA/CAPD`:
    - `heat_input_source`
    - `heat_input_oz_source`
    - `nox_source`
    - `nox_oz_source`
    - `so2_source`
    - `co2_source`
    - `hg_source`
- Re-coding values to standardized abbreviations
    - e.g., operating status values to abbreviations such as `OP`, and unit type descriptions to unit type abbreviations
- Removing unnecessary notes about start dates
    - In the raw data, some plants have notes about dates or the plant added to their values; removing these keeps all rows in a consistent, usable format.
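Several of these steps can be sketched in base R as below. This is a simplified illustration, not `data_clean_epa.R`. The raw column names in `name_map`, the column `operating_status`, and the status labels in `drop_status` are assumptions, and the re-coding and note-removal steps are omitted.

```r
# Hypothetical sketch of renaming, filtering, and source-flagging EPA data.
clean_epa_sketch <- function(df, name_map,
                             drop_status = c("Future", "Retired", "Long-term Cold Storage")) {
  # 1. Rename raw columns to the common scheme (name_map: common = raw)
  hits <- match(names(df), name_map)
  renamed <- !is.na(hits)
  names(df)[renamed] <- names(name_map)[hits[renamed]]

  # 2. Drop non-operating plants and plant IDs above 80000
  keep <- !(df$operating_status %in% drop_status) &
    !is.na(df$plant_id) & df$plant_id <= 80000
  df <- df[keep, , drop = FALSE]

  # 3. Record EPA/CAPD as the source of every reported value
  source_cols <- c("heat_input_source", "heat_input_oz_source", "nox_source",
                   "nox_oz_source", "so2_source", "co2_source", "hg_source")
  for (col in source_cols) {
    df[[col]] <- rep("EPA/CAPD", nrow(df))
  }
  df
}
```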
```{r}
#| label: data-clean-epa
#| file: "scripts/data_clean_epa.R"
#| results: hold
```
::: panel-tabset
#### Overview
*Click tab above to view variables contained in* `epa_clean.RDS`.
#### EPA clean
```{r}
#| label: create-epa-summary-table
#| echo: false
create_summary_table(readr::read_rds(glue::glue("data/clean_data/epa/{params$eGRID_year}/epa_clean.RDS")))
```
:::
## EIA
From the raw Excel downloads, we load, clean, and save select files that are used in eGRID production. Three "clean" EIA files are ultimately created:

- `eia_923_clean.RDS`
- `eia_860_clean.RDS`
- `eia_861_clean.RDS`

Each of these .RDS files contains a list of the relevant tables (stored as data frames) from the corresponding EIA form.

Several procedures are applied to each of the raw EIA files:

- Handling Excel format
    - Each Excel file contains header rows of varying lengths. These rows are skipped when the file is read in.
    - Files contain various missing-value characters, including `" "`, `"X"`, and `"."`. These characters are converted to explicit missing values (i.e., `NA`).
- Variable name standardization
    - The same method is used as for the EPA data above.
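The two generic steps, converting missing-value characters to `NA` and converting names to snake case, can be sketched in base R. Both helpers are hypothetical and not necessarily how `data_clean_eia.R` implements these steps. For comparison, packages such as `janitor` offer name cleaning, and `readxl::read_excel()` accepts `skip` and `na` arguments that can handle header rows and missing-value strings at read time.

```r
# Hypothetical helper: replace missing-value markers with NA in character columns
replace_na_markers <- function(df, markers = c(" ", "X", ".")) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) col[col %in% markers] <- NA_character_
    col
  })
  df
}

# Hypothetical helper: convert names such as "Plant Id" or
# "Nameplate Capacity (MW)" to snake case
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase boundaries
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)               # runs of other characters -> "_"
  gsub("^_+|_+$", "", x)                        # trim leading/trailing "_"
}
```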
```{r}
#| label: data-clean-eia
#| file: "scripts/data_clean_eia.R"
#| results: hold
```
```{r}
#| label: load-eia-clean-files
#| echo: false
eia_923_files <- read_rds(
  glue::glue("data/clean_data/eia/{params$eGRID_year}/eia_923_clean.RDS"))
eia_860_files <- read_rds(
  glue::glue("data/clean_data/eia/{params$eGRID_year}/eia_860_clean.RDS"))
eia_861_files <- read_rds(
  glue::glue("data/clean_data/eia/{params$eGRID_year}/eia_861_clean.RDS"))
```
### EIA-923
::: panel-tabset
#### Overview
*Click through the tabs above to preview values represented from required EIA-923 files.*
```{r}
#| label: eia-923-summary-tabs
#| echo: false
#| results: asis
tabs_923 <-
  eia_923_files %>%
  purrr::map(~ create_summary_table(.x))
purrr::iwalk(tabs_923, ~ {
  cat("#### ", .y, "\n\n")
  print(.x)
  cat("\n\n")
})
```
:::
### EIA-860
::: panel-tabset
#### Overview
*Click through the tabs above to preview values represented from required EIA-860 files.*
```{r}
#| label: eia-860-summary-tabs
#| echo: false
#| results: false
tabs_860 <-
  eia_860_files %>%
  purrr::map(~ create_summary_table(.x))
```
```{r}
#| label: eia-860-summary-tabs-2
#| echo: false
#| results: asis
purrr::iwalk(tabs_860, ~ {
  cat("#### ", .y, "\n\n")
  print(.x)
  cat("\n\n")
})
```
:::
### EIA-861
::: panel-tabset
#### Overview
*Click through the tabs above to preview values represented from required EIA-861 files.*
```{r}
#| label: eia-861-summary-tabs
#| echo: false
#| results: asis
tabs_861 <-
  eia_861_files %>%
  purrr::map(~ create_summary_table(.x))
purrr::iwalk(tabs_861, ~ {
  cat("#### ", .y, "\n\n")
  print(.x)
  cat("\n\n")
})
```
:::
# Generator File
The generator file uses data from `eia_860_clean.RDS`, including all operable and retired generators. The code pulls variables from EIA-860 data (`boiler_generator` and `combined`) and counts the number of boilers. Generation is then assigned to each generator.

Generation is assigned directly to each generator from the `EIA-923 Generator` file. For generators not included in the `EIA-923 Generator` file, generation is estimated by applying a nameplate capacity ratio to `EIA-923 Generation and Fuel` data. A capacity factor is then assigned to each generator.
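The nameplate capacity ratio and capacity factor can be sketched in base R as below. This is a simplified illustration, not `generator_file_create.R`. It makes the following assumptions:

- Each generator is matched to an EIA-923 Generation and Fuel total (`total_generation`, in MWh) by plant, prime mover, and fuel type.
- That total is allocated in proportion to nameplate capacity (MW).
- The capacity factor is computed against 8,760 hours.

Column names other than the common identifiers are hypothetical; see the Technical Guide for the exact rules.

```r
# Hypothetical sketch: allocate group generation by nameplate capacity share,
# then compute a capacity factor for each generator.
allocate_generation <- function(gens, gen_fuel, hours = 8760) {
  key <- c("plant_id", "prime_mover", "fuel_type")
  gens <- merge(gens, gen_fuel, by = key, all.x = TRUE)
  # Total nameplate capacity of each plant/prime mover/fuel group
  group_capacity <- ave(gens$nameplate_capacity,
                        gens$plant_id, gens$prime_mover, gens$fuel_type,
                        FUN = sum)
  # Each generator's share of the group total, by nameplate capacity
  gens$generation_ann <- gens$total_generation * gens$nameplate_capacity / group_capacity
  # Capacity factor: generation relative to running at full capacity all year
  gens$capacity_factor <- gens$generation_ann / (gens$nameplate_capacity * hours)
  gens
}
```

Generators with no matching Generation and Fuel group are kept (`all.x = TRUE`) with missing generation, so they remain visible for QA instead of being dropped silently.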
For full documentation, reference the [eGRID Technical Guide](https://www.epa.gov/egrid/egrid-technical-guide).
Crosswalks used in this file:
- `epa_plants_to_delete.csv`
- `manual_corrections.xlsx`
- `og_oth_units_to_change_fuel_type.csv`
- `xwalk_oris_epa.csv`
## Produce Generator File
```{r}
#| label: generator-file-create
#| file: "scripts/generator_file_create.R"
#| results: hold
```
```{r}
#| label: generator-file-table
#| include: false
generator_file <- readr::read_rds(
  glue::glue("data/outputs/{params$eGRID_year}/generator_file.RDS"))
```
## View Generator File Data
::: panel-tabset
#### Overview
*Click through the tabs above to preview data contained within the generator file.*
#### Data Summary
```{r}
#| label: gen-file-summary
create_summary_table(generator_file)
```
#### Generation Distributions
```{r}
#| label: gen-file-distributions
#| fig-height: 10
plot_gen_ann <-
  generator_file %>%
  ggplot(aes(x = generation_ann)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "Annual Generation")
plot_gen_oz <-
  generator_file %>%
  ggplot(aes(x = generation_oz)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "Ozone Months Generation")
plot_gen_ann_source <-
  generator_file %>%
  ggplot(aes(x = gen_data_source, y = generation_ann)) +
  geom_boxplot() +
  theme_minimal() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  labs(title = "Annual Generation by Data Source")
(plot_gen_ann + plot_gen_oz) / plot_gen_ann_source
```
#### Distribution of Data Sources
```{r}
#| label: generator-file-data-source-dists
generator_file %>%
  count(gen_data_source) %>%
  mutate(proportion = n/sum(n)) %>%
  ggplot(aes(x = gen_data_source, y = proportion)) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  scale_y_continuous(labels = scales::label_percent()) +
  geom_text(aes(label = scales::percent(proportion)),
            nudge_y = .05) +
  theme_minimal() +
  labs(
    title = "Distribution of Generation Data Sources",
    y = "Share of generators",
    x = NULL
  )
```
:::
# Unit File
The unit file includes grid-connected units from EPA/CAPD data, unique EIA-923 boilers, and unique EIA-860 generators.

For each unit, the file includes heat input and emissions values where data are available.
For full documentation, reference the [eGRID Technical Guide](https://www.epa.gov/egrid/egrid-technical-guide).
Crosswalks used in this file:
- `biomass_units_to_add_to_unit_file.csv`
- `co2_ch4_n2o_ef.csv`
- `emission_factors.csv`
- `epa_plants_to_delete.csv`
- `fuel_type_categories.csv`
- `geothermal_emission_factors.csv`
- `manual_corrections.xlsx`
- `nrel_geothermal_table.csv`
- `og_oth_units_to_change_fuel_type.csv`
- `units_to_remove.csv`
- `xwalk_860_boiler_control_id.csv`
- `xwalk_boiler_firing_type.csv`
- `xwalk_epa_eia_power_sector.csv`
- `xwalk_oris_epa.csv`
- `xwalk_pr_oris.csv`
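One common way to fill a missing emissions value is to multiply heat input by a fuel-specific emission factor, and the crosswalk list above includes emission factor tables such as `emission_factors.csv`. The sketch below illustrates that idea for NOx under these assumptions:

- Heat input is in MMBtu, emission factors are in lb/MMBtu, and mass is in short tons.
- The column name `nox_ef_lb_per_mmbtu` and the source label "Estimated using emission factor" are hypothetical.

This is not the exact eGRID procedure, which the Technical Guide documents.

```r
# Hypothetical sketch: keep reported NOx where present; otherwise estimate it
# as heat input (MMBtu) * emission factor (lb/MMBtu) / 2000 lb per short ton.
fill_nox_mass <- function(units, ef_table) {
  ef <- ef_table$nox_ef_lb_per_mmbtu[match(units$fuel_type, ef_table$fuel_type)]
  estimated <- units$heat_input * ef / 2000
  use_est <- is.na(units$nox_mass) & !is.na(estimated)
  units$nox_mass[use_est] <- estimated[use_est]
  units$nox_source[use_est] <- "Estimated using emission factor"
  units
}
```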
## Produce Unit File
```{r}
#| label: unit-file-create
#| file: "scripts/unit_file_create.R"
#| results: hold
```
```{r}
#| label: unit-file-table
#| include: false
unit_file <- readr::read_rds(
  glue::glue("data/outputs/{params$eGRID_year}/unit_file.RDS"))
```
## View Unit File Data
::: panel-tabset
#### Overview
*Click through the tabs above to preview data contained within the unit file.*
#### Data Summary
```{r}
#| label: unit-file-summary
create_summary_table(unit_file)
```
#### Heat Input Distributions
```{r}
#| label: unit-file-heat-in-dist
#| fig-height: 10
plot_unt_heat_in <-
  unit_file %>%
  ggplot(aes(x = heat_input)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "Heat Input")
plot_unt_heat_in_oz <-
  unit_file %>%
  ggplot(aes(x = heat_input_oz)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "Heat Input Ozone")
plot_unt_heat_in_source_dist <-
  unit_file %>%
  count(heat_input_source) %>%
  mutate(proportion = n/sum(n)) %>%
  ggplot(aes(x = heat_input_source, y = proportion)) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  scale_y_continuous(labels = scales::label_percent()) +
  geom_text(aes(label = scales::percent(proportion)),
            nudge_y = .05) +
  theme_minimal() +
  labs(
    title = "Distribution of Heat Input Data Sources",
    y = "Share of units",
    x = NULL
  )
plot_unt_heat_in_oz_source_dist <-
  unit_file %>%
  count(heat_input_oz_source) %>%
  mutate(proportion = n/sum(n)) %>%
  ggplot(aes(x = heat_input_oz_source, y = proportion)) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  scale_y_continuous(labels = scales::label_percent()) +
  geom_text(aes(label = scales::percent(proportion)),
            nudge_y = .05) +
  theme_minimal() +
  labs(
    title = "Distribution of Heat Input Ozone Data Sources",
    y = "Share of units",
    x = NULL
  )
(plot_unt_heat_in + plot_unt_heat_in_oz) / plot_unt_heat_in_source_dist / plot_unt_heat_in_oz_source_dist
```
#### NOx Distributions
```{r}
#| label: unit-file-nox-dist
#| fig-height: 10
plot_unt_nox <-
  unit_file %>%
  ggplot(aes(x = nox_mass)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "NOx Mass")
plot_unt_nox_oz <-
  unit_file %>%
  ggplot(aes(x = nox_oz_mass)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "NOx Mass Ozone")
plot_unt_nox_sources_dist <-
  unit_file %>%
  count(nox_source) %>%
  mutate(proportion = n/sum(n)) %>%
  ggplot(aes(x = nox_source, y = proportion)) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  scale_y_continuous(labels = scales::label_percent()) +
  geom_text(aes(label = scales::percent(proportion)),
            nudge_y = .05) +
  theme_minimal() +
  labs(
    title = "Distribution of NOx Mass Data Sources",
    y = "Share of units",
    x = NULL
  )
plot_unt_nox_oz_source_dist <-
  unit_file %>%
  count(nox_oz_source) %>%
  mutate(proportion = n/sum(n)) %>%
  ggplot(aes(x = nox_oz_source, y = proportion)) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  scale_y_continuous(labels = scales::label_percent()) +
  geom_text(aes(label = scales::percent(proportion)),
            nudge_y = .05) +
  theme_minimal() +
  labs(
    title = "Distribution of NOx Mass Ozone Data Sources",
    y = "Share of units",
    x = NULL
  )
(plot_unt_nox + plot_unt_nox_oz) / (plot_unt_nox_sources_dist) / (plot_unt_nox_oz_source_dist)
```
#### SO2 Distributions
```{r}
#| label: unit-file-so2-dist
#| fig-height: 10
plot_unt_so2 <-
  unit_file %>%
  ggplot(aes(x = so2_mass)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "SO2 Mass")
plot_unt_so2_sources_dist <-
  unit_file %>%
  count(so2_source) %>%
  mutate(proportion = n/sum(n)) %>%
  ggplot(aes(x = so2_source, y = proportion)) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  scale_y_continuous(labels = scales::label_percent()) +
  geom_text(aes(label = scales::percent(proportion)),
            nudge_y = .05) +
  theme_minimal() +
  labs(
    title = "Distribution of SO2 Mass Data Sources",
    y = "Share of units",
    x = NULL
  )
plot_unt_so2 / plot_unt_so2_sources_dist
```
#### CO2 Distributions
```{r}
#| label: unit-file-co2-dist
#| fig-height: 10
plot_unt_co2 <-
  unit_file %>%
  ggplot(aes(x = co2_mass)) +
  geom_histogram() +
  theme_minimal() +
  labs(title = "CO2 Mass")
plot_unt_co2_sources_dist <-
  unit_file %>%
  count(co2_source) %>%
  mutate(proportion = n/sum(n)) %>%
  ggplot(aes(x = co2_source, y = proportion)) +
  geom_col() +
  coord_flip() +
  scale_x_discrete(labels = scales::wrap_format(30)) +
  scale_y_continuous(labels = scales::label_percent()) +
  geom_text(aes(label = scales::percent(proportion)),
            nudge_y = .05) +
  theme_minimal() +
  labs(
    title = "Distribution of CO2 Mass Data Sources",
    y = "Share of units",
    x = NULL
  )
plot_unt_co2 / plot_unt_co2_sources_dist
```
:::
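Each source-distribution plot in the unit file tabs repeats the same count-and-proportion step. A hypothetical helper, sketched here in base R, factors that step out so that each plot differs only in the column name. Like `dplyr::count()`, it keeps missing values as their own group.

```r
# Hypothetical helper: count each value of `source_col` and its share of rows
source_shares <- function(df, source_col) {
  counts <- table(df[[source_col]], useNA = "ifany")
  out <- data.frame(source = names(counts),
                    n = as.integer(counts),
                    stringsAsFactors = FALSE)
  out$proportion <- out$n / sum(out$n)
  out
}

# Possible use with ggplot2, mirroring the plots above:
# ggplot(source_shares(unit_file, "nox_source"), aes(x = source, y = proportion)) +
#   geom_col() + coord_flip()
```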
# Plant File
The plant file combines data from the `EIA-860`, `EIA-861`, and `EIA-923` forms with the outputs of the previous two files: `generator_file.RDS` and `unit_file.RDS`.
The plant file calculates unadjusted and adjusted heat input and emissions, generation (total and by fuel type), emission rates, and resource mixes for each plant.
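Plant emission rates and resource mixes are ratios of the quantities above. The base R sketch below assumes emissions in short tons, net generation in MWh, and heat input in MMBtu, so that output rates are in lb/MWh and input rates in lb/MMBtu. The helper names are hypothetical. Returning `NA` for zero or negative denominators is a simplifying choice made here, not necessarily eGRID's rule.

```r
# Output emission rate (lb/MWh): short tons * 2000 lb/ton / net generation
output_emission_rate <- function(emissions_tons, net_generation_mwh) {
  ifelse(net_generation_mwh > 0, emissions_tons * 2000 / net_generation_mwh, NA_real_)
}

# Input emission rate (lb/MMBtu): short tons * 2000 lb/ton / heat input
input_emission_rate <- function(emissions_tons, heat_input_mmbtu) {
  ifelse(heat_input_mmbtu > 0, emissions_tons * 2000 / heat_input_mmbtu, NA_real_)
}

# Resource mix: each fuel category's share of total net generation
resource_mix <- function(gen_by_fuel) {
  gen_by_fuel / sum(gen_by_fuel)
}
```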
Adjusted heat input and emission values account for combined heat and power (CHP) and biomass facilities. The plant file also reports CHP- and biomass-specific heat input and emissions values.
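Conceptually, an adjusted value removes the portion of a plant's unadjusted total attributed to biomass and keeps only the share attributed to electricity generation at CHP plants. The sketch below illustrates that idea for CO2 with hypothetical inputs:

- `co2_biomass` is the biomass portion.
- `electric_allocation` is a CHP electric allocation factor between 0 and 1 (1 for non-CHP plants). It is taken as given rather than calculated.

The ordering and exact rules shown here are assumptions; the Technical Guide defines how eGRID actually computes adjusted values.

```r
# Hypothetical sketch: adjusted CO2 = (unadjusted - biomass portion) * CHP electric allocation
adjust_co2 <- function(co2_unadjusted, co2_biomass = 0, electric_allocation = 1) {
  if (any(electric_allocation < 0 | electric_allocation > 1, na.rm = TRUE)) {
    stop("electric_allocation must be between 0 and 1.")
  }
  (co2_unadjusted - co2_biomass) * electric_allocation
}
```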
For full documentation, reference the [eGRID Technical Guide](https://www.epa.gov/egrid/egrid-technical-guide).
Crosswalks used in this file:
- `ba_codes.csv`
- `chp_database.csv`
- `co2_ch4_n2o_ef.csv`
- `egrid_2022_chp.csv`
- `egrid_nerc_subregions.csv`
- `fuel_type_categories.csv`
- `global_warming_potential.csv`
- `manual_corrections.xlsx`
- `nerc_assessment_areas_grouped_by_plant.csv`
- `og_oth_units_to_change_fuel_type.csv`
- `state_county_fips.csv`
- `xwalk_alaska_fips.csv`
- `xwalk_balancing_authority.csv`
- `xwalk_fips_names_update.csv`
- `xwalk_nerc_assessment.csv`
- `xwalk_oris_epa.csv`