Read event data in parallel to backtest #124

Draft
wants to merge 3 commits into base: master

Conversation

garyttierney
Contributor

@garyttierney garyttierney commented Aug 17, 2024

Remove the read_data() calls within the backtest implementations and replace them with recv() calls on a lock-free queue. This avoids the pause that previously happened when a backtest reached the end of the current period's data and began loading the next file. With this model, the data for the next period should be available by the time the previous one finishes.
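To make the intended control flow concrete, here is a minimal sketch of the producer/consumer shape described above. The Event fields, file names, load_file, and the use of std's bounded sync_channel are illustrative stand-ins for the PR's actual types and lock-free queue.

```rust
// Sketch only: `Event`, the file list, and `load_file` are placeholders, and
// std's bounded sync_channel stands in for the PR's lock-free queue.
use std::sync::mpsc::sync_channel;
use std::thread;

#[derive(Debug, Clone)]
struct Event {
    exch_ts: i64,
    local_ts: i64,
    px: f64,
    qty: f64,
}

// Placeholder: parse one period's file into events.
fn load_file(path: &str) -> Vec<Event> {
    println!("loading {path}");
    vec![Event { exch_ts: 0, local_ts: 0, px: 0.0, qty: 0.0 }]
}

fn main() {
    let files = vec!["day1.npz".to_string(), "day2.npz".to_string()];
    // A bounded queue caps memory use and applies backpressure to the loader.
    let (tx, rx) = sync_channel::<Event>(1 << 16);

    let loader = thread::spawn(move || {
        for path in files {
            for ev in load_file(&path) {
                if tx.send(ev).is_err() {
                    return; // the backtest finished early
                }
            }
        }
        // `tx` is dropped here, which closes the channel and ends the loop below.
    });

    // The backtest loop calls recv() instead of read_data(): while the current
    // period is being processed, the loader thread is already filling the queue
    // with the next period's events.
    while let Ok(ev) = rx.recv() {
        let _ = ev; // advance the backtest with this event
    }

    loader.join().unwrap();
}
```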

There are still a couple of improvements that need to be made here:

  • Clone is unnecessary; readers could easily accept a reference to Event, but BusReader doesn't give out references
  • Files still need to be read in their entirety before they are sent down the bus.
  • Send and handle IO errors downstream in backtest engine
  • Some features (unstable_l3) currently aren't compiling because they haven't been updated
  • DataSource::Data is currently unsupported because it is not Send or Sync

There are also a few peculiarities in the implementation, like having peek so that initialize_data can be trivially implemented; I'd like to see about restructuring this.

Remaining todo items

  • Replace bus with a simple circular queue
  • NpyReader<R, T> that yields single T items from a reader R (see the sketch after this list)
  • Make DataSource::Data work
  • Clean up and consolidate the EventConsumer/TimestampedEventQueue/etc. traits that were introduced to reduce implementation effort.
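As a rough illustration of the NpyReader<R, T> idea, here is a hedged sketch of a reader that yields one fixed-size record at a time from any Read, so files no longer need to be read in their entirety. It skips npy header parsing, assumes a plain-old-data record layout, and the Event type and RecordReader name are placeholders rather than the crate's real types.

```rust
use std::io::{self, Read};
use std::marker::PhantomData;

// Placeholder record type; the real crate's event layout will differ.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
struct Event {
    exch_ts: i64,
    local_ts: i64,
    px: f64,
    qty: f64,
}

// Yields single `T` records from any reader `R`, without buffering the whole file.
struct RecordReader<R, T> {
    inner: R,
    _marker: PhantomData<T>,
}

impl<R: Read, T: Copy + Default> RecordReader<R, T> {
    fn new(inner: R) -> Self {
        Self { inner, _marker: PhantomData }
    }

    /// Reads exactly one `T`, or returns `None` at end of stream.
    /// A real implementation would also distinguish a truncated final record.
    fn next(&mut self) -> io::Result<Option<T>> {
        let mut value = T::default();
        // SAFETY: illustrative only; assumes `T` is a plain-old-data #[repr(C)] struct.
        let buf = unsafe {
            std::slice::from_raw_parts_mut(
                &mut value as *mut T as *mut u8,
                std::mem::size_of::<T>(),
            )
        };
        match self.inner.read_exact(buf) {
            Ok(()) => Ok(Some(value)),
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
            Err(e) => Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    // An in-memory buffer of two zeroed records stands in for a data file.
    let raw = vec![0u8; std::mem::size_of::<Event>() * 2];
    let mut reader = RecordReader::<_, Event>::new(io::Cursor::new(raw));
    while let Some(ev) = reader.next()? {
        println!("{ev:?}");
    }
    Ok(())
}
```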

@@ -60,7 +61,7 @@ where
 /// Provides a data cache that allows both the local processor and exchange processor to access the
 /// same or different data based on their timestamps without the need for reloading.
 #[derive(Clone, Debug)]
-pub struct Cache<D>(Rc<RefCell<HashMap<String, CachedData<D>>>>)
+pub struct Cache<D>(Arc<RefCell<HashMap<String, CachedData<D>>>>)
garyttierney (Contributor Author)

Leftover from the initial prototype; needs to be reverted.
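As background for the note above (this is general Rust behaviour, not code from the PR): swapping Rc for Arc doesn't make the cache shareable across threads on its own, because RefCell is not Sync, so Arc<RefCell<T>> is still neither Send nor Sync. A thread-safe cache would need a lock or a different design, which is why reverting to Rc for the single-threaded processors is the simpler fix.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc<RefCell<_>> still cannot cross threads: RefCell<u64> is not Sync,
    // so the spawn below would not compile if uncommented.
    let not_shareable = Arc::new(RefCell::new(0u64));
    // thread::spawn(move || *not_shareable.borrow_mut() += 1);
    let _ = not_shareable;

    // A thread-safe variant needs interior mutability that is Sync, e.g. a Mutex.
    let shareable = Arc::new(Mutex::new(0u64));
    let handle = {
        let shareable = Arc::clone(&shareable);
        thread::spawn(move || {
            *shareable.lock().unwrap() += 1;
        })
    };
    handle.join().unwrap();
    assert_eq!(*shareable.lock().unwrap(), 1);
}
```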

@@ -49,6 +49,7 @@ hmac = { version = "0.13.0-pre.3", optional = true }
 rand = { version = "0.8.5", optional = true }
 uuid = { version = "1.8.0", features = ["v4"], optional = true }
 nom = { version = "7.1.3", optional = true }
+bus = { version = "2.4" }
@bohblue2 bohblue2 (Contributor) Aug 17, 2024

NOTE: bus sometimes busy-waits in the current implementation, which may cause increased CPU usage — see jonhoo/bus#23.

From the linked issue: "There is at least one case where the readers race with the writer and may not successfully wake it up, so the writer has to park with a timeout. I would love to get rid of this, but haven't had a chance to dig into it, and no longer use this library actively myself. If you want to take a look, I'd be happy to help out!"

It probably won't cause any problems (because this happens in SPMC and we're an SPSC structure), but I think I'll put up some docs anyway.

@nkaz001
Owner

nkaz001 commented Aug 18, 2024

I conducted a quick test and obtained the following results:

  • Current: 87 seconds (100%)
  • parallel-load: 72 seconds (82.8%)
  • parallel-loading-v2: 99 seconds (113.8%)

I have some concerns about whether this is the right direction. The approach introduces an additional copy. Moreover, the original daily loading method was designed to handle multiple days of data within limited memory by loading data one by one. With the new suggestion, there is a risk of exceeding memory capacity if data consumption doesn't keep pace with how quickly it is enqueued into the bus.

@nkaz001
Owner

nkaz001 commented Aug 18, 2024

Without the strategy implementation, the test uses only elapse(100ms). I will include a test with more intensive data, such as BTCUSDT.

  • current: 23s (100%)
  • parallel-load: 23s (100%)
  • parallel-loading-v2: 45s (195.7%)

@garyttierney
Contributor Author

> I have some concerns about whether this is the right direction. The approach introduces an additional copy.

That is simply a limitation of the bus API in use. Replacing this with a ring-buffer is on the todo list above and gets the readers back to zero-copy and good cache coherence.

> With the new suggestion, there is a risk of exceeding memory capacity if data consumption doesn't keep pace with how quickly it is enqueued into the bus.

The queue is a fixed size, so there's no risk of exceeding memory capacity. That said, it should load incrementally by copying chunks of Events out of the file at a time; that is also on the todo list.
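A tiny demonstration of the fixed-size-queue point, assuming a bounded channel with blocking send semantics (std's sync_channel here; the PR's queue is a different type, but the backpressure argument is the same):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    // Capacity 2 is deliberately tiny to make the effect visible.
    let (tx, rx) = sync_channel::<u64>(2);

    let producer = thread::spawn(move || {
        for i in 0..10 {
            // Blocks whenever the queue already holds 2 items, so the producer
            // can never run ahead of the consumer by more than the capacity.
            tx.send(i).unwrap();
            println!("enqueued {i}");
        }
    });

    // A deliberately slow consumer.
    while let Ok(i) = rx.recv() {
        println!("processed {i}");
        thread::sleep(Duration::from_millis(10));
    }

    producer.join().unwrap();
}
```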

> Without the strategy implementation, the test uses only elapse(100ms). I will include a test with more intensive data, such as BTCUSDT.

Can you share this test? It'd be useful to put in a benchmark as I iterate.

@nkaz001
Owner

nkaz001 commented Aug 18, 2024

Even though the queue implementation is lock-free, doesn't introducing an atomic value to check items in the producer/consumer potentially trigger cache invalidation, adding another layer of overhead?

I used the Rust version of the grid trading backtest example. It would be beneficial to have two benchmarks: one with and one without the strategy implementation, using the BTCUSDT data provided here to ensure the benchmarks are aligned.

@garyttierney
Contributor Author

Working on replacing the bus with a ring buffer that eliminates the copies now. I think we can get away with very little atomic usage on x64, references:
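For reference, one common shape this can take (a sketch of the general technique, not the PR's implementation): a single-producer/single-consumer ring where each side owns one monotonically increasing index, the two indices are padded onto separate cache lines, and each operation needs only one acquire load of the other side's index plus one release store of its own. A further refinement, caching the other side's index between calls, cuts the shared loads down even more but is omitted here for brevity.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Keep each index on its own cache line so the producer's stores to `head`
// don't invalidate the cache line holding `tail`, and vice versa.
#[repr(align(64))]
struct PaddedIndex(AtomicUsize);

struct Ring<T> {
    buf: Box<[UnsafeCell<MaybeUninit<T>>]>,
    head: PaddedIndex, // next slot to write; only the producer stores to it
    tail: PaddedIndex, // next slot to read; only the consumer stores to it
}

// Safe to share across two threads as long as only one pushes and one pops.
unsafe impl<T: Send> Sync for Ring<T> {}

impl<T> Ring<T> {
    fn new(capacity: usize) -> Arc<Self> {
        let buf = (0..capacity)
            .map(|_| UnsafeCell::new(MaybeUninit::uninit()))
            .collect();
        Arc::new(Self {
            buf,
            head: PaddedIndex(AtomicUsize::new(0)),
            tail: PaddedIndex(AtomicUsize::new(0)),
        })
    }

    /// Producer side: one acquire load of `tail`, one release store of `head`.
    fn push(&self, value: T) -> Result<(), T> {
        let head = self.head.0.load(Ordering::Relaxed);
        let tail = self.tail.0.load(Ordering::Acquire);
        if head - tail == self.buf.len() {
            return Err(value); // full
        }
        unsafe { self.buf[head % self.buf.len()].get().cast::<T>().write(value) };
        self.head.0.store(head + 1, Ordering::Release);
        Ok(())
    }

    /// Consumer side: one acquire load of `head`, one release store of `tail`.
    fn pop(&self) -> Option<T> {
        let tail = self.tail.0.load(Ordering::Relaxed);
        let head = self.head.0.load(Ordering::Acquire);
        if head == tail {
            return None; // empty
        }
        let value = unsafe { self.buf[tail % self.buf.len()].get().cast::<T>().read() };
        self.tail.0.store(tail + 1, Ordering::Release);
        Some(value)
    }
}

fn main() {
    let ring = Ring::new(1024);
    let producer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || {
            for i in 0..100_000u64 {
                while ring.push(i).is_err() {} // spin when full (sketch only)
            }
        })
    };
    let mut received = 0u64;
    while received < 100_000 {
        if ring.pop().is_some() {
            received += 1;
        }
    }
    producer.join().unwrap();
}
```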
