Performances #8
Comments
Thanks for the question! We have not run a detailed analysis of inference speed, but it is slower than normal inference because of the gradient-based updates to the activations. We are working on an extension that alleviates some of this, but generation does get slower as the number of gradient updates increases.
(not an issue or resolution, just a note) I'm also super grateful you've open-sourced this! It's a very creative approach to perturb the past and rerun iteratively. I've productionized this, figured I'd share some learnings:
In short, running this setup in production is tough; you can get decent speeds (5+ words per second with smaller gpt2 models on a GPU), but concurrent calls will queue if your Flask server has only one worker.
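To make the queueing point concrete, here is a rough sketch of when each request would finish if a single worker handles them one at a time. The function name, the 20-word request length, and the assumption that requests are served strictly in arrival order are all illustrative; only the 5 words per second figure comes from the comment above.

```python
def queue_completion_times(n_requests, words_per_request, words_per_second):
    """Estimate when each of n simultaneous requests finishes (in seconds)
    if one worker serves them strictly one after another.

    Hypothetical model: ignores network, tokenization and scheduling overhead.
    """
    seconds_per_request = words_per_request / words_per_second
    # Request i (1-based) must wait for the i - 1 requests ahead of it.
    return [seconds_per_request * i for i in range(1, n_requests + 1)]


if __name__ == "__main__":
    # Three users asking for 20 words each at 5 words per second.
    for position, done_at in enumerate(queue_completion_times(3, 20, 5.0), start=1):
        print(f"request {position} finishes after ~{done_at:.0f} s")
```

Under these assumptions the last caller waits roughly n times as long as the first, which is why adding workers (or batching) matters more than raw per-token speed once there is any concurrency.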
To directly answer the question: if I understand this code correctly, inference costs (1 + num_iterations) times as much as simply calling the model as-is. That rests on the simplifying assumption that the model's predict function accounts for 100% of the total inference time.
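As a back-of-the-envelope version of that estimate, the sketch below turns the (1 + num_iterations) factor into a per-token latency. The function names and the 0.05 s baseline are hypothetical, and the model inherits the same simplification: every perturbation iteration is assumed to cost as much as one plain model call, and nothing else contributes to the runtime.

```python
def relative_cost(num_iterations):
    """Cost multiplier versus unperturbed generation:
    one final model call plus one call per perturbation iteration."""
    if num_iterations < 0:
        raise ValueError("num_iterations must be non-negative")
    return 1 + num_iterations


def estimated_seconds_per_token(baseline_seconds_per_token, num_iterations):
    """Scale a measured baseline latency by the cost multiplier.

    baseline_seconds_per_token is something you would measure yourself
    on your own hardware; it is not a figure from this thread.
    """
    return baseline_seconds_per_token * relative_cost(num_iterations)


if __name__ == "__main__":
    baseline = 0.05  # hypothetical: 50 ms per token without perturbation
    for n in (0, 1, 3, 10):
        est = estimated_seconds_per_token(baseline, n)
        print(f"num_iterations={n:2d}: x{relative_cost(n)} -> ~{est:.2f} s/token")
```

In practice the backward pass needed for each gradient update probably costs more than a forward call, so this is better read as a lower bound than as a prediction.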
Thanks for open-sourcing the code!
This approach is very interesting, but I'm curious about its impact on performance (inference speed).
Is there any benchmark showing how different parameters affect performance?