
Organization-README

Purpose

This repo gives an overview of how this organization's repos fit together and the steps to follow to create experiments with synthetic data similar to ours. Each repo covers a different step in the process, so it is important to know which repo to use for which step and to understand the general workflow for creating these experiments.

Using this repo

Make sure to read about the team's work at our website. This should give you a good sense of our methodology and the experiments we performed to test the utility of synthetic data. Then read through the workflow steps below and follow along in the process.

Workflow

Creating the real/baseline dataset

  1. Get real images of the target energy infrastructure. For images of wind turbines, you can download the ones we have assembled on our figshare. For high voltage transmission towers, we have resources at this repo (TODO: NEED LINK). Otherwise, you will need to find a resource from which you can download images. A couple of options are EarthOnDemand, or this script, which downloads images given a list of coordinates. For example, you could look up a dataset of coordinates for a specific type of energy infrastructure and then use the script to automatically collect images at those coordinates.
  2. Label the real images. If you used our wind turbine dataset, you can use the labels we have already created, which are included in the dataset. If you don't have labels for your images, you will need to create them with a program such as labelImg. Depending on which model you are using, you will need to export these labels in the correct format; if you are using YOLOv3, make sure the labels are exported in YOLOv3 format (one .txt file per image, with one `class x_center y_center width height` line per object, all coordinates normalized to [0, 1] — see the first sketch after this list).
  3. Optional: Split data into geographic domains. Here, a domain means a geographic region. Depending on the goal of your experiment, it might make sense to divide the collected images/labels into different domains. Since our team was interested in how synthetic data could help the object detection model overcome the problem of domain adaptation (training on one domain and testing on a visually different domain), we needed to split our data into different domains. This can be done in a variety of ways, but we chose to do it by groups of states: we looked at our collected data and, based on the characteristics and landscapes of the imagery in each state, grouped the states together. This resulted in five domains: Northwest (NW), Southwest (SW), Western Midwest (WM), Eastern Midwest (EM), and Northeast (NE), where each domain contained a set of states. For our experiments, we used data from NW, SW, EM, and NE, but not from WM. For example, NW contained all of the imagery from WA, OR, ID, and MT. The imagery within a group was meant to be self-similar but look different from the imagery in the other groups.
  4. Create training and testing sets. If the data is split into multiple domains, this should be done for each domain; otherwise, it is done for the entire dataset. To create the training and testing sets, we ran DBSCAN clustering within the current domain to get clusters of imagery based on geographic distance, and then sampled from the clusters in a stratified manner to form the training and testing sets. The purpose of this is to make sure both sets equally represent the imagery in the domain (see the second sketch after this list). The clustering script can be found here. The output of this step should be a .csv file containing the image names for each training/testing set, such as the files contained in this folder.
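
As a reference for the label format in step 2, here is a minimal sketch of converting one pixel-space bounding box to a YOLOv3 label line. The function name and the (xmin, ymin, xmax, ymax) input convention are our own illustrative assumptions, not the output of any particular labeling tool.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space box to a YOLOv3 label line:
    class x_center y_center width height, all normalized to [0, 1]."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a single wind turbine box in a 608x608 image.
print(to_yolo_line(0, 100, 150, 220, 300, 608, 608))
```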

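And for step 4, a minimal sketch of the cluster-then-stratify idea using scikit-learn. The .csv schema, the DBSCAN eps/min_samples values, and the 80/20 split are placeholders, not the settings used in our clustering script.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.model_selection import train_test_split

# Hypothetical input: one row per image with its coordinates.
df = pd.read_csv("domain_images.csv")  # columns: image_name, lat, lon

# Cluster images by geographic proximity. eps is in degrees here and,
# like min_samples, would need tuning; outliers get cluster label -1.
df["cluster"] = DBSCAN(eps=0.1, min_samples=3).fit_predict(df[["lat", "lon"]])

# Stratified split so each cluster is proportionally represented in both sets.
train, test = train_test_split(df, test_size=0.2, stratify=df["cluster"], random_state=0)
train["image_name"].to_csv("train_set.csv", index=False)
test["image_name"].to_csv("test_set.csv", index=False)
```
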
Creating the Synthetic Data

Note that if you want to use our synthetic data, it is available at our figshare, and this section (steps 5 through 9) is not necessary.

  5. Collect background images for the synthetic data. These images should contain landscapes without the target energy infrastructure, because the point is to render 3D energy infrastructure models on top of them. This can be done in many ways, but for our CityEngine synthetic-generation scripts, the images should be 1300x1300 (see the first sketch after this list). Our approach was to collect background images close to the testing site locations for each domain, which was done with the script here.
  6. Find a 3D model of the target energy infrastructure. The other item needed for creating synthetic data is a 3D model of the infrastructure. You can find various models (some free, some paid) at CGTrader or other websites. If you are using wind turbines, the models are already included in our CityEngine-Files repo, so you don't have to do anything for this step.
  7. Optional: Create CityEngine model size bins.
  8. Generate the synthetic imagery using CityEngine. Now that we have background images and a 3D model, we can create the synthetic imagery. Follow the steps in the CityEngine-Files repo. You'll first need to download CityEngine; if you are part of Duke, instructions for downloading CityEngine through Duke can be found here. Then make a new project in CityEngine, clone the CityEngine-Files repo, and copy its contents into that project. Pay attention to the repo's branches: the most up-to-date branch is dynamic_size_mar24, so you'll likely want to switch to that branch before copying the files into the CityEngine project. Your background images should go in the maps folder and your 3D model in the models folder. If you did step 7 and are using dynamic_size_mar24, place the scale_bins.csv file in the data folder. Make any adjustments you want to the script, and then you should be able to run it and generate the synthetic imagery.
  9. Create the synthetic labels. From the previous step, you should have RGB and black-and-white images of the target infrastructure. You can then use the Synthetic-Label-Generation repo to convert the black-and-white images to YOLOv3-formatted .txt labels, each sharing the name of the image it corresponds to (see the second sketch after this list).
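
As a reference for the image size in step 5, a minimal sketch that resizes collected backgrounds to the 1300x1300 size our CityEngine scripts expect; the folder names are placeholders.

```python
from pathlib import Path
from PIL import Image

SRC, DST = Path("raw_backgrounds"), Path("backgrounds_1300")  # placeholder folders
DST.mkdir(exist_ok=True)

# Force every background image to 1300x1300 for the CityEngine scripts.
for path in SRC.glob("*.png"):
    img = Image.open(path).convert("RGB")
    img.resize((1300, 1300), Image.BILINEAR).save(DST / path.name)
```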

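And for step 9, a sketch of what the black-and-white-to-label conversion involves. This is not the Synthetic-Label-Generation repo's actual code; it assumes a single class and that each white blob in a mask is one object.

```python
from pathlib import Path
import cv2

MASK_DIR, LABEL_DIR = Path("masks"), Path("labels")  # placeholder folders
LABEL_DIR.mkdir(exist_ok=True)

for mask_path in MASK_DIR.glob("*.png"):
    mask = cv2.imread(str(mask_path), cv2.IMREAD_GRAYSCALE)
    h, w = mask.shape
    _, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    # Treat each white blob as one object of class 0.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    lines = []
    for c in contours:
        x, y, bw, bh = cv2.boundingRect(c)
        lines.append(f"0 {(x + bw / 2) / w:.6f} {(y + bh / 2) / h:.6f} {bw / w:.6f} {bh / h:.6f}")
    # The label file shares the image's base name, as YOLOv3 expects.
    (LABEL_DIR / f"{mask_path.stem}.txt").write_text("\n".join(lines))
```
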
Setting up and Running the Experiments

  10. Set up the experiment. You can now use the Experiment-Setup repo to generate the files necessary to run the experiments. This assumes you are using YOLOv3, since it generates the file structure that YOLOv3 requires for experiments (the train/test list .txt files, plus the .DATA file pointing to them and the .NAMES file listing the classes — see the first sketch after this list). It takes the train and test set .csv files as inputs, as well as the names of all of the synthetic images.
  11. Run the experiments. There are many options for this, but we chose to store our data on Duke Box and run the experiments on Google Colab. To store the data on Duke Box, it is easiest to sync Duke Box to your computer. You'll need to upload individual .zip files for the real images, real labels, synthetic images, and synthetic labels, and then use the settings on Duke Box to create a direct download link for each one. You'll also need to zip and upload the files generated in step 10, specifically the folder(s) that directly contain the .txt, .DATA, and .NAMES files; if you haven't changed the naming configuration in the Experiment-Setup repo, these folders will be called 'baseline' and 'adding_synthetic'. Next, make a new notebook or copy one of ours from the Colab-Notebooks-for-Training-Models repo, replace the direct download links with your own, and run the experiment with your own data (see the second sketch after this list).
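
To make step 10's outputs concrete, here is a minimal sketch of what darknet-style .NAMES and .DATA files contain; the single 'turbine' class and the file paths are illustrative assumptions, not the Experiment-Setup repo's actual configuration.

```python
from pathlib import Path

# Hypothetical single-class setup; Experiment-Setup generates the real files.
Path("obj.names").write_text("turbine\n")  # one class name per line

Path("obj.data").write_text(
    "classes = 1\n"        # number of classes, matching obj.names
    "train = train.txt\n"  # list of training image paths
    "valid = test.txt\n"   # list of testing image paths
    "names = obj.names\n"
    "backup = backup/\n"   # where darknet saves checkpoint weights
)
```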

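And for step 11, a minimal sketch of the download-and-unzip pattern a notebook cell would use; the direct-download URLs are placeholders for your own Duke Box links.

```python
import urllib.request
import zipfile

# Placeholder direct-download links; replace with your own from Duke Box.
archives = {
    "real_images.zip": "https://duke.box.com/shared/static/<link-1>",
    "real_labels.zip": "https://duke.box.com/shared/static/<link-2>",
    "synthetic_images.zip": "https://duke.box.com/shared/static/<link-3>",
    "synthetic_labels.zip": "https://duke.box.com/shared/static/<link-4>",
}

for name, url in archives.items():
    urllib.request.urlretrieve(url, name)         # download the archive
    with zipfile.ZipFile(name) as zf:
        zf.extractall(name.removesuffix(".zip"))  # extract next to the notebook
```
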
Questions/Issues

If you have any questions about using this or run into any problems, feel free to create an issue on the relevant GitHub repo, or email [email protected].
