feat: Adapt load_dataset() to 3W Dataset 2.0.0 #125

Open
wants to merge 1 commit into
base: main

@castrokelly commented on Oct 19, 2024

Adapt load_dataset() to 3W Dataset 2.0.0

This pull request adapts the load_dataset() function in toolkit/base.py to the 3W Dataset 2.0.0. The function, now named load_3w_dataset(), handles the folder structure and the different data types (real, simulated, and imputed) of the 3W Dataset 2.0.0.

Changes made:

  • Replaced load_dataset() with load_3w_dataset(): The new load_3w_dataset() function replaces the old load_dataset() function and is designed to handle the 3W Dataset 2.0.0.
  • Handled folder structure: The new function iterates through the folders (0 to 9) in the dataset directory and loads all Parquet files within each folder.
  • Handled data types: The function allows loading specific data types ('real', 'simulated', or 'imputed') using the data_type parameter.
  • Maintained code standards: The new function follows the code standards of the 3W Toolkit, including documentation and error handling.
  • Updated documentation: The README.md file was updated to include information about the new function and how to use it.
  • Updated example notebook: The example notebook problems/01_binary_classifier_of_spurious_closure_of_dhsv/_baseline/main.ipynb was updated to use the new function.
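
For readers who want a picture of the folder and data-type handling described in the list above, here is a minimal sketch. It is not the code in toolkit/base.py: the helper names find_parquet_files() and load_files(), the filename prefixes used to tell the data types apart, and the extra "folder" column are all assumptions made for illustration.

```python
from pathlib import Path

# ASSUMPTION: data types are told apart by filename prefix. These prefixes are
# hypothetical placeholders; replace them with the convention the dataset uses.
ASSUMED_PREFIXES = {
    "real": "WELL-",
    "simulated": "SIMULATED_",
    "imputed": "IMPUTED_",
}


def find_parquet_files(base_path, data_type, prefixes=ASSUMED_PREFIXES):
    """Return (folder_number, path) pairs for one data type in folders 0 to 9."""
    if data_type not in prefixes:
        raise ValueError(
            f"Unknown data_type {data_type!r}; expected one of {sorted(prefixes)}"
        )
    base = Path(base_path)
    if not base.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {base}")

    found = []
    for number in range(10):  # folders are named "0" to "9"
        folder = base / str(number)
        if not folder.is_dir():
            continue  # tolerate partially downloaded datasets
        for path in sorted(folder.glob("*.parquet")):
            if path.name.startswith(prefixes[data_type]):
                found.append((number, path))
    return found


def load_files(files):
    """Read the selected Parquet files into a single DataFrame."""
    # Imported here so that file discovery works without pandas installed.
    # Reading Parquet also needs an engine such as pyarrow.
    import pandas as pd

    frames = []
    for number, path in files:
        frame = pd.read_parquet(path)
        frame["folder"] = number  # hypothetical column recording the source folder
        frames.append(frame)
    return pd.concat(frames) if frames else pd.DataFrame()
```

Keeping file discovery separate from reading makes the folder and data-type logic easy to check on its own, without real Parquet files.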

Example usage in problems/01_binary_classifier_of_spurious_closure_of_dhsv/_baseline/main.ipynb:

# Import the load_3w_dataset function
from toolkit.base import load_3w_dataset

# ... (other imports) ...

# Load the 3W Dataset 2.0.0
df = load_3w_dataset(data_type='real', base_path='C:/Users/kelly/3W/dataset')  # Replace with the correct path

# Create the folds manually
folds = tk.EventFolds(
    experiment=experiment,
    df=df,  # Pass the loaded DataFrame to the EventFolds class
    # ... (other parameters, if needed) ...
)

# ... (rest of the code) ...
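
The path in the example above is specific to one machine. One possible way to avoid hard-coding it in the notebook is to read it from an environment variable. The sketch below is only a suggestion: the resolve_dataset_path() helper and the THREEW_DATASET_PATH variable name are hypothetical and not part of the toolkit.

```python
import os
from pathlib import Path


def resolve_dataset_path(env_var="THREEW_DATASET_PATH", default=None):
    """Return the dataset folder from an environment variable or a fallback.

    Both the helper and the variable name are hypothetical.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        if default is None:
            raise FileNotFoundError(
                f"Set the {env_var} environment variable or pass a default path."
            )
        raw = default
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {path}")
    return path
```

The result could then be passed as base_path, for example load_3w_dataset(data_type='real', base_path=str(resolve_dataset_path())).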

Benefits:

  • Compatibility with 3W Dataset 2.0.0: The 3W Toolkit is now compatible with the latest version of the dataset.
  • Improved data loading: A single call loads all Parquet files from the class folders (0 to 9), which keeps data loading organized in one place.
  • Flexibility: Users can choose to load specific data types in the 2.0.0 version of 3W.
  • Updated documentation: The documentation reflects the changes and provides clear instructions on how to use the new function.

This contribution enhances the 3W Toolkit by enabling the use of the 3W Dataset 2.0.0, which provides more data and new types of events for analysis and experimentation.


By creating this pull request, I confirm that I have read and fully accept and agree with one of the Petrobras' Contributor License Agreements (CLAs):

Our CLAs are based on the Apache Software Foundation's CLAs:

@ricardoevvargas
Collaborator

Hello, @castrokelly.

Thank you for submitting this PR.

Over the last few weeks, we have been concentrating on the steps needed to add to the 3W Dataset some types of undesirable events that occur during the well drilling stage. We believe that this expansion (planned for the coming months) will promote significant progress in the 3W Project.

We intend to evaluate this and the other open PRs over the next few days.

Once again, thank you for submitting this PR.

@castrokelly changed the title from "feat: Adapt load_dataset() to 3W Dataset 2.0" to "feat: Adapt load_dataset() to 3W Dataset 2.0.0" on Jan 22, 2025