adding section on Creating data products #73

Open
paolap opened this issue Jul 20, 2023 · 9 comments

@paolap
Contributor

paolap commented Jul 20, 2023

application ready data
data products finalised so that they are usable by other disciplines, stakeholders, etc., and therefore tailored to the intended users' needs. This might mean converting existing data to suitable formats or adding extra quality control
how the data is packaged, tailored to different uses, and documented so that errors and quality are clearly described, since the "clients" are non-experts
user centric

@hot007
Contributor

hot007 commented Jul 20, 2023

From @AliciaTak, might be good to rewrite in our own words (hah!) and include!

I asked ChatGPT how to create application-ready RCM data, and here are the steps it gave me 😊

Step 1: Data Collection and Download

  • Identify the RCM simulations or datasets relevant to your research or application.
  • Obtain the RCM data from reputable sources, such as climate modeling centers or research institutions. Ensure that the data aligns with your study area, resolution, and time period of interest.

Step 2: Data Preprocessing and Quality Control

  • Check and handle missing values, inconsistencies, or outliers in the RCM data.
  • Perform quality control checks to ensure the integrity and reliability of the data.
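
As an illustration of the two checks above, here is a minimal sketch in plain Python. The function name `quality_control`, the fill value `-9999.0`, and the valid range are assumptions made for this example; they do not come from any particular RCM dataset, and a real workflow would normally run on gridded arrays rather than lists.

```python
import math


def quality_control(values, valid_min, valid_max, fill_value=-9999.0):
    """Mask fill values and out-of-range readings as NaN and count them.

    `fill_value`, `valid_min` and `valid_max` are hypothetical settings;
    use whatever the dataset's own metadata declares.
    """
    cleaned = []
    n_missing = 0
    n_out_of_range = 0
    for v in values:
        if v == fill_value or math.isnan(v):
            # Already missing: keep it masked and count it.
            cleaned.append(math.nan)
            n_missing += 1
        elif v < valid_min or v > valid_max:
            # Physically implausible: mask rather than silently keep.
            cleaned.append(math.nan)
            n_out_of_range += 1
        else:
            cleaned.append(v)
    report = {"missing": n_missing, "out_of_range": n_out_of_range}
    return cleaned, report


# Example: daily temperatures in degrees Celsius, one fill value, one outlier.
cleaned, report = quality_control([15.2, -9999.0, 70.0, 18.1], -60.0, 60.0)
```

Returning the counts alongside the cleaned series makes it easy to record in the product documentation how much data was masked.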

Step 3: Spatial Subset and Resampling

  • If the RCM data covers a larger region than needed, spatially subset it to the specific study area of interest.
  • If your application requires merging RCM data with other datasets of different resolutions, perform spatial resampling or interpolation to align them.
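
The subsetting part of this step could be sketched as below, assuming a regular latitude/longitude grid held as plain Python lists. `subset_bbox` and its argument names are hypothetical, and resampling to another resolution is not covered here.

```python
def subset_bbox(lats, lons, data, lat_min, lat_max, lon_min, lon_max):
    """Keep only grid cells whose centre lies inside a bounding box.

    `data` is a list of rows, indexed as data[lat_index][lon_index].
    """
    lat_idx = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    lon_idx = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    sub_lats = [lats[i] for i in lat_idx]
    sub_lons = [lons[j] for j in lon_idx]
    sub_data = [[data[i][j] for j in lon_idx] for i in lat_idx]
    return sub_lats, sub_lons, sub_data


# Example grid: 3 latitudes x 4 longitudes with made-up values.
lats = [-40.0, -35.0, -30.0]
lons = [140.0, 145.0, 150.0, 155.0]
data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
sub_lats, sub_lons, sub_data = subset_bbox(
    lats, lons, data, -36.0, -29.0, 144.0, 151.0
)
```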

Step 4: Temporal Aggregation and Time Synchronization

  • Depending on your application, aggregate the RCM data to the desired temporal resolution (e.g., daily, monthly, seasonal means) to match the timescale required for your analysis.
  • Ensure that all datasets are synchronized and aligned in time, accounting for any time lags or offsets.
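
A minimal sketch of the aggregation bullet, assuming a daily series for a single grid point with no missing values. The function name `monthly_means` is hypothetical, and time synchronisation between datasets is not shown.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def monthly_means(dates, values):
    """Average daily values into (year, month) bins."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[(d.year, d.month)].append(v)
    # Sort the keys so the result is in chronological order.
    return {key: fmean(vals) for key, vals in sorted(groups.items())}


# Example: three daily values spanning two months.
dates = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 2, 1)]
means = monthly_means(dates, [10.0, 20.0, 5.0])
```

Keying on the year as well as the month keeps January 2020 separate from January 2021, which matters for time series as opposed to climatologies.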

Step 5: Bias Correction (if needed)

  • Apply bias correction techniques, if necessary, to address any systematic errors or biases in the RCM data compared to observed climate data.
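
To make this step concrete, the sketch below shows one of the simplest possible approaches: removing the mean difference between model and observations over a shared reference period. This is only an illustrative assumption, not a method the thread prescribes; the function name `additive_bias_correction` is hypothetical, and methods such as quantile mapping are often preferred in practice.

```python
from statistics import fmean


def additive_bias_correction(model_reference, obs_reference, model_target):
    """Subtract the mean model-minus-observation bias from a model series.

    `model_reference` and `obs_reference` cover the same reference period;
    `model_target` is the series to correct.
    """
    bias = fmean(model_reference) - fmean(obs_reference)
    return [v - bias for v in model_target]


# Example: the model runs 2 degrees warm over the reference period.
corrected = additive_bias_correction(
    [12.0, 14.0, 16.0], [10.0, 12.0, 14.0], [15.0, 17.0]
)
```

An additive shift suits variables like temperature; for bounded variables such as precipitation a multiplicative scaling is the more usual simple choice.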

Step 6: Coordinate Reference System (CRS) Standardization

  • Check the coordinate reference system of the RCM data and convert it to a standardized CRS (e.g., WGS 84) if required for compatibility with other datasets or GIS applications.

Step 7: Metadata Documentation

  • Document all the essential metadata for the RCM data, including model information, downscaling methods, time period, spatial resolution, and any preprocessing steps undertaken.
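
One way to keep this documentation machine-readable is a small metadata record stored alongside the data. The sketch below is an assumption for illustration: the function `build_metadata`, the field names, and the placeholder values are all hypothetical, and a real product would more likely follow an established convention for its file format.

```python
import json


def build_metadata(model, downscaling_method, start, end,
                   resolution_km, processing_steps):
    """Collect the items listed above into one serialisable record."""
    return {
        "model": model,
        "downscaling_method": downscaling_method,
        "time_period": {"start": start, "end": end},
        "spatial_resolution_km": resolution_km,
        "processing_steps": list(processing_steps),
    }


# Placeholder values only; they do not describe a real simulation.
record = build_metadata(
    model="EXAMPLE-RCM",
    downscaling_method="dynamical",
    start="1990-01-01",
    end="2020-12-31",
    resolution_km=12.5,
    processing_steps=["quality control", "spatial subset", "monthly means"],
)
metadata_json = json.dumps(record, indent=2)
```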

Step 8: Data Format and Storage

  • Save the processed RCM data in an appropriate format, such as NetCDF, which is commonly used for climate data storage and analysis.

Step 9: Data Validation and Evaluation

  • Perform validation and evaluation exercises to assess the performance and accuracy of the RCM data against observations or benchmark datasets.
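
Two common summary scores for this kind of comparison are the mean bias and the root-mean-square error (RMSE). The sketch below assumes paired model and observation values at matching times and locations; the function name `evaluate` is hypothetical.

```python
import math


def evaluate(model, obs):
    """Return (mean bias, RMSE) of model values against observations."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and the same length")
    errors = [m - o for m, o in zip(model, obs)]
    mean_bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_bias, rmse


# Example with made-up values.
mean_bias, rmse = evaluate([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The mean bias shows the direction of a systematic error, while the RMSE also penalises scatter, so reporting both gives users a fuller picture.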

Step 10: Data Sharing and Dissemination

  • Make the application-ready RCM data available to the scientific community and other stakeholders by sharing it through climate data portals, data repositories, or other relevant platforms.

By following these steps, you will create high-quality, application-ready Regional Climate Model data that is well-prepared for use in various climate research and applications, such as climate impact assessments, climate change projections, or coupled modeling studies.

When I ask it how to create application-ready climate products, I get the following:

Step 1: Define the Climate Product

  • Clearly define the purpose and scope of the climate product. Determine what specific information or insights the product aims to provide and the target audience.

Step 2: Data Collection and Selection

  • Identify the relevant climate data required to create the product. This could include observational data, climate model outputs, reanalysis data, or other relevant sources.
  • Select the appropriate datasets that align with the product's goals and cover the relevant spatial and temporal scales.

Step 3: Data Preprocessing and Quality Control

  • Preprocess the climate data to handle missing values, outliers, and any data inconsistencies.
  • Conduct quality control checks to ensure the data's integrity and reliability.

Step 4: Data Analysis and Derivation

  • Perform data analysis and derive the necessary climate indicators or variables required for the product.
  • Apply relevant statistical or climatological methods to compute meaningful information, such as long-term averages, anomalies, trends, or extreme events.
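
As a small example of the derived quantities mentioned above, anomalies can be computed by subtracting a baseline average from each value. This is a sketch under the assumption of a single series and a single baseline mean; the function name `anomalies` is hypothetical.

```python
from statistics import fmean


def anomalies(values, baseline):
    """Return each value minus the mean of the baseline period."""
    climatology = fmean(baseline)
    return [v - climatology for v in values]


# Example: two annual means compared with a three-year baseline.
result = anomalies([11.0, 9.0], [10.0, 10.0, 10.0])
```

For monthly or daily data the baseline is usually computed separately for each calendar month or day so that the seasonal cycle is removed as well.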

Step 5: Visualization and Plotting

  • Create clear and visually appealing visualizations that effectively communicate the climate information.
  • Use appropriate maps, graphs, and charts to highlight key patterns, trends, and variations in the data.

Step 6: Product Packaging and Formatting

  • Organize the climate product in a well-structured format that is easy for users to understand and navigate.
  • Ensure that the product includes clear titles, legends, units, and metadata to provide context and interpretability.

Step 7: User Interface (if applicable)

  • If the climate product is interactive or web-based, design a user-friendly interface that allows users to explore and interact with the data easily.
  • Consider providing user-guides or tooltips to assist users in interpreting the information.

Step 8: Documentation and User Support

  • Prepare comprehensive documentation that explains the methodology, data sources, processing steps, and limitations of the climate product.
  • Offer user support, such as FAQs or contact information, to help users interpret and utilize the product effectively.

Step 9: Validation and Sensitivity Analysis

  • Validate the climate product by comparing it with other reliable datasets or observations.
  • Conduct sensitivity analyses to assess the product's robustness and reliability under different scenarios.

Step 10: Data Distribution and Sharing

  • Make the application-ready climate product available to the intended audience through online platforms, data portals, or publication in scientific journals.
  • Consider licensing and copyright issues if applicable.

By following these steps, you will create high-quality, application-ready climate products that provide valuable insights and information to users, facilitating their understanding of climate patterns, trends, and variations.

@AliciaTak

@paolap Here are slightly different steps when I reword the question (what to do with raw RCM outputs?), e.g., steps 9, 10 and 12 were not included in the previous prompts but are important, especially step 12.

Step 1: Data Collection and Preprocessing

  • Obtain the raw RCM output data from your simulation or model runs.
  • Preprocess the data to handle any missing values, inconsistencies, or outliers.

Step 2: Spatial and Temporal Subset

  • Determine the study area and time period of interest for your application.
  • Subset the raw RCM data to cover the specific spatial domain and time range relevant to your research.

Step 3: Data Bias Correction (if needed)

  • Evaluate and, if necessary, apply bias correction techniques to ensure that the RCM data aligns well with observed climate data.

Step 4: Calculate Climate Indices

  • Calculate relevant climate indices, such as temperature extremes (e.g., heatwaves, cold spells), precipitation indices (e.g., dry spells, extreme rainfall events), or other climatic variables of interest.
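
One of the indices named above, dry spell length, can be sketched as the longest run of consecutive dry days. The 1 mm wet-day threshold is an assumption chosen for this example, and the function name `longest_dry_spell` is hypothetical.

```python
def longest_dry_spell(daily_precip_mm, wet_threshold_mm=1.0):
    """Return the longest run of days with precipitation below the threshold."""
    longest = 0
    current = 0
    for p in daily_precip_mm:
        if p < wet_threshold_mm:
            current += 1
            longest = max(longest, current)
        else:
            # A wet day ends the current dry spell.
            current = 0
    return longest


# Example: a week of made-up daily totals in millimetres.
spell = longest_dry_spell([0.0, 0.2, 5.0, 0.0, 0.0, 0.0, 3.0])
```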

Step 5: Data Aggregation and Averaging

  • If needed, aggregate the data to different temporal resolutions (e.g., daily, monthly, seasonal) to create meaningful averages or trends.

Step 6: Spatial Interpolation (if needed)

  • If your application requires higher-resolution data, consider interpolating the RCM data to a finer grid.
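
A minimal sketch of interpolating to a point inside one grid cell, using bilinear interpolation between the four surrounding values. Bilinear interpolation is an assumption chosen for illustration, and the function and argument names are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinearly interpolate inside the cell [x0, x1] x [y0, y1].

    f00 is the value at (x0, y0), f10 at (x1, y0),
    f01 at (x0, y1) and f11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    bottom = f00 * (1 - tx) + f10 * tx  # interpolate along x at y0
    top = f01 * (1 - tx) + f11 * tx     # interpolate along x at y1
    return bottom * (1 - ty) + top * ty  # then interpolate along y


# Example: value at the centre of a unit cell with made-up corner values.
centre = bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 0.0, 10.0, 20.0, 30.0)
```

Interpolating to a finer grid smooths the existing values and does not add information that the model did not resolve, which is worth stating in the product documentation.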

Step 7: Data Visualization and Analysis

  • Create visualizations and plots to analyze the processed climate data.
  • Use appropriate charts, graphs, and maps to represent the data effectively.

Step 8: Quality Control and Validation

  • Perform quality control checks on the processed data to ensure accuracy and reliability.
  • Validate the results against observational data or other reputable climate datasets.

Step 9: Documentation

  • Document all processing steps, data sources, methodologies, and assumptions made during the data processing and analysis.

Step 10: Create User-Friendly Climate Products

  • Package the processed data and visualizations into user-friendly climate products.
  • Provide clear explanations and interpretations of the climate information for the intended audience.

Step 11: Data Distribution and Sharing

  • Make the application-ready climate products accessible to other researchers, policymakers, or stakeholders through data repositories, online platforms, or scientific publications.

Step 12: Continual Update and Improvement

  • Continually update and improve the climate products as new data becomes available or as your research evolves.

@paolap
Contributor Author

paolap commented Jul 20, 2023

While this is great, for the moment I would rather just add a generic scope for the guidelines. This is very useful, but it is a very specific use case, so I'm hesitant to write it down as it is. What I could do is pick the generic steps as points to cover, for example:

These guidelines should cover:

  • Data Collection and Preprocessing
  • Spatial and Temporal Subset
  • Data Bias Correction
  • ...
  • Create User-Friendly Climate Products etc.

Leaving out completely things like "Step 4: Calculate Climate Indices" and the "sub-steps" as they're specific to that example.

@AliciaTak

Yeah, absolutely @paolap! I just thought I'd share the prompt about the definition of "application-ready" data and what it believes it involves. I think we also need to be mindful of copyright issues, as it is not clear who owns the prompts :)

@paolap
Contributor Author

paolap commented Jul 20, 2023

Great, it's still good to have the longer and detailed list here for later on.
And yes I will add copyright/license etc...

@paolap
Contributor Author

paolap commented Jul 20, 2023

Ok, I've had a go at writing the intro and will put up a pull request soon. I re-organised the steps into 4 broader categories, let me know what you think

@paolap
Contributor Author

paolap commented Sep 20, 2023

Just realised we didn't add these guidelines to the intro page

@paolap
Contributor Author

paolap commented May 29, 2024

I finally added something about this to the introduction page. I also pointed out that this is a work in progress and that we welcome input from others who have some experience/ideas. It's in branch paola_isssues

@AliciaTak

Thanks, Paola!

paolap added a commit that referenced this issue May 30, 2024
@paolap paolap mentioned this issue Jun 5, 2024