Read event data in parallel to backtest #124
base: master
Conversation
Remove the `read_data()` calls within the backtest implementations and replace them with `recv()` calls on a lock-free queue. This avoids the pause that previously happened when a backtest reached the end of the current period's data and began loading the next file. With this model, the data for the next period should be available by the time the previous one finishes.
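The prefetching model can be sketched with the standard library's bounded channel: a background thread loads each period and sends it, so the next period is already loaded while the previous one is being consumed. The `Event` type and helper names below are illustrative, not the PR's actual API:

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

// Hypothetical stand-in for one period's event data.
struct Event {
    ts: u64,
}

// Load periods on a background thread. The channel capacity of 1 means
// at most one prefetched period is buffered, so memory stays bounded
// while the consumer never waits for a cold load (after the first).
fn spawn_loader(periods: Vec<Vec<Event>>) -> Receiver<Vec<Event>> {
    let (tx, rx) = sync_channel(1);
    thread::spawn(move || {
        for p in periods {
            // Blocks when the buffer is full; stops if the consumer is gone.
            if tx.send(p).is_err() {
                break;
            }
        }
    });
    rx
}

// The backtest side: iterate `recv()` results instead of calling
// read_data() at each period boundary.
fn consume_all(rx: Receiver<Vec<Event>>) -> usize {
    let mut n = 0;
    for period in rx {
        n += period.len();
    }
    n
}
```

The channel closes automatically when the loader thread finishes, which ends the consumer's loop without a sentinel value.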
```diff
@@ -60,7 +61,7 @@ where
 /// Provides a data cache that allows both the local processor and exchange processor to access the
 /// same or different data based on their timestamps without the need for reloading.
 #[derive(Clone, Debug)]
-pub struct Cache<D>(Rc<RefCell<HashMap<String, CachedData<D>>>>)
+pub struct Cache<D>(Arc<RefCell<HashMap<String, CachedData<D>>>>)
```
Leftover from the initial prototype; this needs to be reverted.
```diff
@@ -49,6 +49,7 @@
 hmac = { version = "0.13.0-pre.3", optional = true }
 rand = { version = "0.8.5", optional = true }
 uuid = { version = "1.8.0", features = ["v4"], optional = true }
 nom = { version = "7.1.3", optional = true }
+bus = { version = "2.4" }
```
NOTE: `bus` sometimes busy-waits in the current implementation, which may cause increased CPU usage (see jonhoo/bus#23). There is at least one case where the readers race with the writer and may not successfully wake it up, so the writer has to park with a timeout. I would love to get rid of this, but haven't had a chance to dig into it, and I no longer actively use this library myself. If you want to take a look, I'd be happy to help out!
It probably won't cause any problems (the race occurs in SPMC use, and ours is an SPSC structure), but I think I'll add some documentation anyway.
I conducted a quick test and obtained the following results:
I have some concerns about whether this is the right direction. The approach introduces an additional copy. Moreover, the original daily loading method was designed to handle multiple days of data within limited memory by loading files one at a time. With the new approach, there is a risk of exceeding memory capacity if data consumption doesn't keep pace with how quickly data is enqueued into the bus.
Without the strategy implementation, the test uses only `elapse(100ms)`. I will include a test with more intensive data, such as BTCUSDT.
That is simply a limitation of the
The queue is a fixed size, so there's no risk of exceeding memory capacity. Although it should be loading incrementally by copying chunks of
Can you share this test? It'd be useful to put in a benchmark as I iterate.
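To illustrate the bounded-memory point: with a fixed-capacity queue the producer simply cannot enqueue past capacity, so a slow consumer causes backpressure rather than unbounded memory growth. A sketch using the standard library's bounded channel:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// With a capacity-2 channel, the third enqueue attempt fails (or, with
// a blocking send(), waits) until the consumer drains an item. The
// queue's memory footprint is therefore fixed regardless of how fast
// the producer loads data.
fn capacity_is_enforced() -> bool {
    let (tx, _rx) = sync_channel::<u64>(2);
    tx.try_send(1).unwrap();
    tx.try_send(2).unwrap();
    matches!(tx.try_send(3), Err(TrySendError::Full(3)))
}
```

In the backtest loader, the blocking `send()` variant gives the same guarantee: the loader thread parks instead of buffering more periods than the capacity allows.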
Even though the queue implementation is lock-free, doesn't introducing an atomic value to check for items on the producer/consumer side potentially trigger cache invalidation, adding another layer of overhead?

I used the Rust version of the grid trading backtest example. It would be beneficial to have two benchmarks: one with and one without the strategy implementation, using the BTCUSDT data provided here so that the benchmarks are aligned.
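One common mitigation for that overhead is to keep the producer and consumer counters on separate cache lines, so that a write by one core doesn't invalidate the line holding the other core's counter (false sharing). A minimal sketch; the 64-byte line size is an assumption for x86-64:

```rust
use std::sync::atomic::AtomicUsize;

// Force each counter onto its own 64-byte cache line. Writes to `tail`
// by the producer then never invalidate the consumer's cached copy of
// `head`, and vice versa.
#[allow(dead_code)]
#[repr(align(64))]
struct CachePadded(AtomicUsize);

#[allow(dead_code)]
struct Indices {
    head: CachePadded, // consumer-owned
    tail: CachePadded, // producer-owned
}

fn padded_sizes() -> (usize, usize) {
    (
        std::mem::size_of::<CachePadded>(), // rounded up to one line
        std::mem::size_of::<Indices>(),     // two separate lines
    )
}
```

This doesn't remove the atomic loads themselves, but it confines cache-coherence traffic to the moments when one side genuinely needs the other's progress.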
Working on replacing the bus with a ring buffer that eliminates the copies now. I think we can get away with very little atomic usage on x64; references:
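A minimal sketch of the index arithmetic such a bounded SPSC ring buffer would use. This is illustrative only (names are not from the PR): a real concurrent version needs `UnsafeCell` and separate producer/consumer handles, but the Acquire/Release pattern shown is the standard one, and on x86-64 those loads and stores compile to plain `mov` instructions:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Bounded SPSC ring with one slot sacrificed to distinguish full/empty.
// `head` is only written by the consumer, `tail` only by the producer,
// so each side needs just one Acquire load of the other's counter.
struct SpscRing<T> {
    buf: Vec<Option<T>>,
    head: AtomicUsize, // next slot to pop (consumer-owned)
    tail: AtomicUsize, // next slot to push (producer-owned)
}

impl<T> SpscRing<T> {
    fn new(cap: usize) -> Self {
        Self {
            buf: (0..cap + 1).map(|_| None).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    fn push(&mut self, v: T) -> Result<(), T> {
        let tail = self.tail.load(Ordering::Relaxed); // own counter
        let next = (tail + 1) % self.buf.len();
        if next == self.head.load(Ordering::Acquire) {
            return Err(v); // full
        }
        self.buf[tail] = Some(v);
        // Publish the slot: consumers that Acquire `tail` see the write.
        self.tail.store(next, Ordering::Release);
        Ok(())
    }

    fn pop(&mut self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed); // own counter
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let v = self.buf[head].take();
        self.head.store((head + 1) % self.buf.len(), Ordering::Release);
        v
    }
}
```

Because each counter has a single writer, no compare-and-swap or fences beyond Acquire/Release are needed, which is what makes the x64 hot path cheap.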
There are still a couple of improvements that need to be made here:

- `Clone` is unnecessary; readers could easily accept a reference to `Event`, but `BusReader` doesn't give out references
- `DataSource::Data` is currently unsupported because it is not `Send` or `Sync`

There are also a few peculiarities in the implementation, like having `peek` so that `initialize_data` can be trivially implemented; I'd like to see about restructuring this.

Remaining todo items:

- Replace `bus` with a simple circular queue
- `NpyReader<R, T>` that yields single `T` items from a reader `R`
- Make `DataSource::Data` work
- The `EventConsumer`/`TimestampedEventQueue`/etc. traits that were introduced to reduce implementation effort
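The `NpyReader<R, T>` idea (pulling fixed-size records one at a time from any `Read`, so that only a small chunk is ever resident instead of a whole file) can be sketched like this. The type here is a hypothetical illustration, hard-coded to `u64` records; the proposed reader would be generic over a plain-old-data `T`:

```rust
use std::io::Read;

// Yields one record at a time from any `Read` source, allowing the
// loader to stream a file incrementally instead of materializing it.
struct ItemReader<R> {
    inner: R,
}

impl<R: Read> Iterator for ItemReader<R> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let mut buf = [0u8; 8]; // one fixed-size record
        match self.inner.read_exact(&mut buf) {
            Ok(()) => Some(u64::from_le_bytes(buf)),
            Err(_) => None, // EOF (or a short trailing read) ends the stream
        }
    }
}
```

Since `&[u8]`, `File`, and `BufReader` all implement `Read`, the same iterator works for in-memory tests and for streaming real data files.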