I tried running all the blocks of the file_index.json example on my laptop. It drove CPU and memory usage up to the point where the UI froze and I had to hard-reset the machine. This will also be an issue in distributed scenarios where multiple producers (e.g. a crawler) feed into one consumer. The solution is to add throttling to push/pull connections.
We need to discuss the design options. Here are a few ideas:
- Does ZeroMQ actually queue messages? If so, can we get queue-size statistics?
- Look at CPU/memory usage on the consumer side. If it exceeds some threshold (e.g. 80%), request that producers back off a bit.
- Messages would include timestamps. The consumer would send a periodic acknowledgement message to the producer indicating the wait time and processing time for messages. If these grow too large, the producer would back off.
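The timestamp-based idea could be sketched roughly like this (a minimal sketch; `Producer`, `max_wait`, and the feedback channel are hypothetical names to illustrate the backoff rule, not part of the codebase):

```python
import time

class Producer:
    """Hypothetical producer that backs off based on consumer feedback.
    The real transport (e.g. a ZeroMQ PUSH socket) is abstracted as `socket`."""

    def __init__(self, max_wait=0.5, max_delay=5.0):
        self.max_wait = max_wait   # acceptable consumer-side wait, in seconds
        self.max_delay = max_delay
        self.delay = 0.0           # current pause between sends

    def on_ack(self, wait_time, processing_time):
        """Handle the consumer's periodic acknowledgement."""
        if wait_time > self.max_wait:
            # Consumer is falling behind: back off multiplicatively.
            self.delay = min(self.max_delay, max(0.1, self.delay * 2))
        else:
            # Consumer is keeping up: ramp back up gradually.
            self.delay /= 2

    def send(self, socket, payload):
        socket.send_json({"ts": time.time(), "payload": payload})
        if self.delay:
            time.sleep(self.delay)
```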
Unfortunately, I cannot reproduce that on my machine. Maybe we can debug this during our phone call today.
> Does ZeroMQ actually queue messages? If so, can we get queue-size statistics?
Yes, it does, but unfortunately we cannot access the queue data. Right now we maintain counters on each block that track how many requests have been served. This data is sent to the Master, where it helps with parallelization decisions.
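One related point worth noting: while ZeroMQ doesn't expose queue-depth statistics, it does let us bound the queues via per-socket high-water marks, which gives a crude form of throttling for free. A minimal pyzmq sketch (the endpoint name is arbitrary):

```python
import zmq

ctx = zmq.Context.instance()

# Each socket queues messages up to its high-water mark. Once a PUSH
# socket's SNDHWM is reached, sends block (or time out with SNDTIMEO),
# which is built-in backpressure even without queue statistics.
push = ctx.socket(zmq.PUSH)
push.setsockopt(zmq.SNDHWM, 1000)   # buffer at most ~1000 outgoing messages
push.setsockopt(zmq.SNDTIMEO, 100)  # fail after 100 ms instead of blocking forever
push.bind("inproc://file_index_demo")

pull = ctx.socket(zmq.PULL)
pull.setsockopt(zmq.RCVHWM, 1000)   # and at most ~1000 incoming messages
pull.connect("inproc://file_index_demo")
```

Note the effective buffer per connection is roughly SNDHWM + RCVHWM (plus OS socket buffers for TCP transports), so the bound is approximate.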
The other two design options could work, but in some cases the producers cannot back off, as they might be getting data from external sources. The solution in that case would be to shard the consumer and add extra consumers when traffic goes up.
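Sharding fits push/pull naturally: a PUSH socket round-robins across all connected PULL peers, so a newly added consumer immediately takes a share of the traffic. A toy simulation of that fair-queuing behaviour (names are illustrative, not from the codebase):

```python
from itertools import cycle

def distribute(messages, consumers):
    """Simulate ZeroMQ PUSH round-robin: each message goes to the next
    connected consumer in turn, so adding a consumer splits the load."""
    queues = {c: [] for c in consumers}
    for consumer, msg in zip(cycle(consumers), messages):
        queues[consumer].append(msg)
    return queues
```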