Need throttling for push/pull connections #17

Open
jfischer opened this issue Dec 10, 2011 · 1 comment

@jfischer
Contributor

I tried running all the blocks of the file_index.json example on my laptop. It drove CPU and memory usage up to the point where the UI froze and I had to hard-reset the laptop. This will also be an issue in distributed scenarios where multiple producers (e.g. a crawler) feed into one consumer. The solution is to add throttling to push/pull connections.

We need to discuss the design options. Here are a few ideas:

  • Does ZeroMQ actually queue messages? If so, can we get queue size statistics?
  • We could monitor CPU/memory usage on the consumer side. If it exceeds some threshold (e.g. 80%), the consumer would ask producers to back off a bit.
  • Messages would include timestamps. The consumer would periodically send an acknowledgement message to the producer reporting the wait time and processing time for messages. If these are too large, the producer would back off.
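A minimal sketch of the third idea, in Python using only the standard library rather than ZeroMQ. `Ack`, `summarize`, and `ProducerThrottle` are hypothetical names, not part of the project. The back-off policy (double the delay when the consumer reports too much latency, shrink it by a fixed step otherwise) is an assumed choice, and computing wait time from producer timestamps assumes producer and consumer clocks are roughly synchronized.

```python
import time
from dataclasses import dataclass


@dataclass
class Ack:
    """Hypothetical acknowledgement sent periodically from consumer to producer."""
    avg_wait_s: float        # average time a message waited before processing
    avg_processing_s: float  # average time spent processing a message


def summarize(records):
    """Build an Ack from (sent_at, started_at, finished_at) timestamps in seconds.

    sent_at is the timestamp the producer put in the message; the wait time is
    only meaningful if producer and consumer clocks are roughly in sync.
    """
    if not records:
        return Ack(avg_wait_s=0.0, avg_processing_s=0.0)
    n = len(records)
    wait = sum(started - sent for sent, started, _ in records) / n
    proc = sum(finished - started for _, started, finished in records) / n
    return Ack(avg_wait_s=wait, avg_processing_s=proc)


class ProducerThrottle:
    """Producer-side delay between sends, adjusted from consumer acks.

    Assumed policy: double the delay when the consumer reports more latency
    than max_latency_s, otherwise shrink it by step_s (never below min_delay_s).
    """

    def __init__(self, max_latency_s=1.0, min_delay_s=0.0,
                 max_delay_s=5.0, step_s=0.01):
        self.max_latency_s = max_latency_s
        self.min_delay_s = min_delay_s
        self.max_delay_s = max_delay_s
        self.step_s = step_s
        self.delay_s = min_delay_s

    def on_ack(self, ack):
        """Update and return the delay after receiving an acknowledgement."""
        latency = ack.avg_wait_s + ack.avg_processing_s
        if latency > self.max_latency_s:
            # Consumer is falling behind: back off exponentially (at least one step).
            self.delay_s = min(self.max_delay_s, max(self.step_s, self.delay_s * 2))
        else:
            # Consumer is keeping up: recover gradually.
            self.delay_s = max(self.min_delay_s, self.delay_s - self.step_s)
        return self.delay_s

    def wait(self):
        """Sleep for the current delay; call before each send."""
        time.sleep(self.delay_s)
```

In use, the producer would call `wait()` before each send and `on_ack()` whenever an acknowledgement arrives. Since push/pull connections only carry data one way, the acks would need their own channel back to the producer; that part is left open here.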
@t-saideep
Contributor

Were you trying examples/file_index_noquery.json?

Unfortunately, I cannot reproduce that on my machine. Maybe we can debug this during our phone call today.

> Does ZeroMQ actually queue messages? If so, can we get queue size statistics?

Yes, it does, but unfortunately we cannot access the queue data. Right now, each block maintains counters that track how many requests it has served. These counts are sent to the Master, which can use them to decide how to parallelize.
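A sketch of how per-block counters like the ones described above might be turned into request rates on the Master side. `BlockCounter`, `Master`, `receive`, and `busiest` are hypothetical names, not the project's actual classes, and the snapshot format `(name, served, timestamp)` is an assumption.

```python
class BlockCounter:
    """Hypothetical per-block counter of served requests."""

    def __init__(self, name):
        self.name = name
        self.served = 0

    def record(self, n=1):
        """Count n more served requests."""
        self.served += n

    def snapshot(self, now):
        """Report sent periodically to the Master: (block name, total served, time)."""
        return (self.name, self.served, now)


class Master:
    """Hypothetical Master that turns counter snapshots into per-block rates."""

    def __init__(self):
        self._last = {}  # block name -> (served, time) from the previous snapshot
        self.rates = {}  # block name -> requests per second over the last interval

    def receive(self, snapshot):
        name, served, now = snapshot
        if name in self._last:
            prev_served, prev_time = self._last[name]
            elapsed = now - prev_time
            if elapsed > 0:
                self.rates[name] = (served - prev_served) / elapsed
        self._last[name] = (served, now)

    def busiest(self):
        """Block with the highest recent rate, a candidate for parallelization."""
        return max(self.rates, key=self.rates.get) if self.rates else None
```

If producers also counted the messages they send, the difference between a producer's sent count and its consumer's served count would give a rough estimate of the backlog that ZeroMQ does not expose directly.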

The other two design choices could work. In some cases, however, producers cannot back off because they may be receiving data from external sources. The solution there would be to shard the consumer and add extra consumer instances when traffic is high.
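A sketch of the sizing decision behind consumer sharding. ZeroMQ PUSH sockets round-robin messages across all connected PULL peers, so extra consumer instances that connect to the same producer endpoint share the load. The function below only decides how many consumers to run; `consumers_needed`, the 80% target utilization, and the min/max bounds are assumptions chosen for illustration.

```python
import math


def consumers_needed(arrival_rate, per_consumer_rate, target_utilization=0.8,
                     min_consumers=1, max_consumers=16):
    """Number of consumer shards so each runs at or below the target utilization.

    arrival_rate and per_consumer_rate are in messages per second (for example,
    derived from the per-block counters the Master already collects).
    """
    if per_consumer_rate <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("need per_consumer_rate > 0 and 0 < target_utilization <= 1")
    # Each consumer may use only target_utilization of its capacity.
    needed = math.ceil(arrival_rate / (per_consumer_rate * target_utilization))
    # Clamp to the allowed range of shard counts.
    return max(min_consumers, min(max_consumers, needed))
```

For example, 100 messages/s arriving at consumers that each handle 50 messages/s gives 100 / (50 × 0.8) = 2.5, rounded up to 3 consumers. Keeping headroom below full utilization leaves room for bursts before queues start to grow.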
