Implement Ray for distributed task execution #18

Open
wants to merge 2 commits into
base: development

Conversation

ake2l
Member

@ake2l ake2l commented Dec 14, 2024

Overview

Introduces the Ray framework for distributed task execution and multiprocessing support in Datamimic. This change moves us from single-process execution to a distributed model where:

  • Parent process handles model parsing and orchestration
  • Ray workers handle parallel data generation and processing
  • Resource utilization improves across multiple cores

Key Features

  • New GenerateWorker class using Ray for distributed execution
  • Parent/worker process separation for better resource management
  • Automatic scaling across available CPU cores
  • Proper context isolation between workers
  • Efficient cleanup of resources
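
As a rough illustration of scaling across available cores, the sketch below splits a record count into one contiguous range per core. The helper name `split_into_chunks` and the one-worker-per-core policy are assumptions made for this example, not names or behavior taken from the PR.

```python
import os


def split_into_chunks(total_records: int, num_workers: int) -> list[tuple[int, int]]:
    """Split [0, total_records) into at most num_workers contiguous (start, end) ranges."""
    if total_records <= 0 or num_workers <= 0:
        return []
    chunk_size = -(-total_records // num_workers)  # ceiling division
    return [
        (start, min(start + chunk_size, total_records))
        for start in range(0, total_records, chunk_size)
    ]


# Assumption: one worker per CPU core. os.cpu_count() may return None, so fall back to 1.
num_workers = os.cpu_count() or 1
chunks = split_into_chunks(1000, num_workers)
```

Each `(start, end)` pair could then be handed to one worker, so the number of tasks follows the machine's core count rather than a hard-coded value.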

Technical Implementation

  • Ray workers handle chunks of data generation tasks
  • Parent process manages task distribution and result aggregation
  • Context copying ensures proper isolation between workers
  • Resource cleanup in finally blocks
  • Error propagation from workers to parent process
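
The distribution, context copying, and error propagation described above can be sketched as follows. To stay runnable with only the standard library, this example uses `ThreadPoolExecutor` as a stand-in for Ray; with Ray, the `submit` call would roughly correspond to calling `.remote(...)` on a `@ray.remote` function and `future.result()` to `ray.get(...)`, which also raises in the caller when a task failed. The names `Context`, `generate_chunk`, and `run_parent` are hypothetical and do not come from the PR.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical stand-in for the state the parent shares with workers."""
    seed: int
    counters: dict = field(default_factory=dict)


def generate_chunk(context: Context, start: int, end: int) -> list[int]:
    """Worker body: generate records for [start, end) using its own context copy."""
    if start > end:
        raise ValueError(f"invalid chunk: start={start} > end={end}")
    context.counters["generated"] = end - start  # mutates the worker's copy only
    return [context.seed + i for i in range(start, end)]


def run_parent(context: Context, chunks: list[tuple[int, int]], max_workers: int = 4) -> list[int]:
    """Parent body: distribute chunks, then aggregate results in submission order."""
    results: list[int] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            # Each task gets a deep copy, so workers never share mutable state.
            pool.submit(generate_chunk, copy.deepcopy(context), start, end)
            for start, end in chunks
        ]
        for future in futures:
            # result() re-raises any exception the worker raised.
            results.extend(future.result())
    return results


parent_context = Context(seed=100)
records = run_parent(parent_context, [(0, 2), (2, 5)])
```

Because every task works on a deep copy, `parent_context.counters` stays empty after the run, and a failing chunk surfaces in the parent as an ordinary exception instead of being lost in the worker.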

Architecture

Parent Process

  • Parse models
  • Initialize Ray
  • Distribute tasks to workers
  • Aggregate results

Ray Workers (@ray.remote)

  • Handle data generation
  • Process data chunks
  • Manage database connections
  • Clean up resources
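
A minimal sketch of the worker-side responsibilities listed above, assuming each worker opens its own database connection and closes it in a `finally` block. An in-memory SQLite database and a thread pool stand in for the real database and for Ray workers, and `process_chunk` is a hypothetical name used only for this example.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk: list[int]) -> int:
    """Worker body: open a private connection, store the chunk, return the row count."""
    # Assumption: in-memory SQLite stands in for whatever database the workers target.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE records (value INTEGER)")
        conn.executemany(
            "INSERT INTO records (value) VALUES (?)", [(value,) for value in chunk]
        )
        conn.commit()
        (count,) = conn.execute("SELECT COUNT(*) FROM records").fetchone()
        return count
    finally:
        conn.close()  # runs whether the chunk succeeded or raised


chunks = [[1, 2, 3], [4, 5], [6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in the same order as the input chunks.
    counts = list(pool.map(process_chunk, chunks))
```

Keeping the connection local to the worker function means no connection object has to cross the parent/worker boundary, which matters because database connections generally cannot be serialized.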

Notes

This is our first implementation using Ray for distributed processing. The architecture separates concerns between parent orchestration and worker execution while maintaining our existing model parsing and validation.

@ake2l ake2l requested a review from dangsg December 14, 2024 18:11
@ake2l
Member Author

ake2l commented Dec 14, 2024

@dangsg a first draft of what we discussed, you can have a look ... all unit tests and nearly all integration tests are already passing

This update replaces synchronous test functions with asynchronous equivalents using `async def` and `pytest.mark.asyncio`. This ensures compatibility with async operations and improves consistency across functional test files.