[SD-119] Implement layer execution latency measurements for Pytorch #48
base: develop
Conversation
@@ -7,4 +7,6 @@ requires-python = ">=3.11"
 dependencies = [
     "dvc-s3>=3.2.0",
     "pandas>=2.2.3",
     "pillow>=11.1.0",
Is this package being used anywhere?
It's an implicit requirement of PyTorch to run resnet18.
        module: the module to register hook.
        input: tuple containing the input arguments to module's forward method.
    """
    layer_time_dict[layer_name] = (time.time(), datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
I am not sure time.time is a reliable way to measure this, as it depends on the system clock. I think time.perf_counter or time.perf_counter_ns() makes more sense: these layers are going to execute quickly, and we need more precise estimates (these functions are monotonic and offer a higher resolution).
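For reference, a quick way to compare the two clocks (this snippet is just an illustration, not part of the PR):

import time

# time.time() follows the system (wall) clock and can jump if the clock is
# adjusted; time.perf_counter() is monotonic and uses the highest-resolution
# timer available, which matters for layers that run in roughly a millisecond.
print(time.get_clock_info("time").resolution)          # e.g. 0.015625 s on Windows
print(time.get_clock_info("perf_counter").resolution)  # e.g. 1e-07 s or finer

start = time.perf_counter()
_ = sum(i * i for i in range(10_000))
elapsed_ms = (time.perf_counter() - start) * 1e3
print(f"elapsed: {elapsed_ms:.3f} ms")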
I think we don't need to worry about the CPU timing implementation too much, as the hardware we're using has CUDA, and CUDA events will be used in the next ticket.
layer_time_dict = {}

for layer_name, layer in get_layers(model):
    layer.register_forward_pre_hook(partial(layer_time_pre_hook, layer_time_dict, layer_name))
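For context, here is a minimal, self-contained sketch of the hook-based approach the diff takes. get_layers, layer_time_pre_hook and the post-hook body are reconstructed from the quoted snippets (using time.perf_counter per the suggestion above), so details may differ from the actual PR:

import time
import datetime
from functools import partial

import torch
import torchvision


def get_layers(model):
    """Yield (qualified_name, module) pairs for leaf modules only."""
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:
            yield name, module


def layer_time_pre_hook(layer_time_dict, layer_name, module, input):
    """Store the start time just before the layer's forward pass runs."""
    layer_time_dict[layer_name] = (time.perf_counter(), datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))


def layer_time_hook(layer_time_dict, results, layer_name, module, input, output):
    """Compute the elapsed time once the layer's forward pass returns."""
    start, timestamp = layer_time_dict[layer_name]
    elapsed_ms = (time.perf_counter() - start) * 1e3
    results.append({"layer": layer_name, "timestamp": timestamp, "latency_ms": elapsed_ms})


model = torchvision.models.resnet18().eval()
layer_time_dict, results = {}, []

for layer_name, layer in get_layers(model):
    layer.register_forward_pre_hook(partial(layer_time_pre_hook, layer_time_dict, layer_name))
    layer.register_forward_hook(partial(layer_time_hook, layer_time_dict, results, layer_name))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))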
Why are hooks being used? Does the profiler API not work? I think it would provide much better results on CPU and GPU.
As far as I understand, the autograd profiler gives us the wrong granularity (it reports latencies by operation type, not by layer). But @osw282 can give more context, I guess.
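To illustrate the granularity point, a standalone snippet (not from the PR; the exact profiler invocation is my own):

import torch
import torchvision

model = torchvision.models.resnet18().eval()
x = torch.randn(1, 3, 224, 224)

with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU]) as prof:
    with torch.no_grad():
        model(x)

# Rows are keyed by operator name (e.g. "aten::convolution"), so timings from
# different conv layers are aggregated into a single row rather than reported
# per named module.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))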
A small nit about milliseconds; otherwise good to go, I think.
        module: the module to register hook.
        input: tuple containing the input arguments to module's forward method.
    """
    layer_time_dict[layer_name] = (time.time(), datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
N.B. these functions will need to use CUDA events from SD-118 in the actual benchmark script.
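A rough sketch of what that could look like, assuming SD-118 exposes torch.cuda.Event timing; the hook and variable names here are hypothetical and the code requires a CUDA device:

import torch


def cuda_layer_time_pre_hook(layer_time_dict, layer_name, module, input):
    """Record a CUDA event on the current stream just before the layer runs."""
    start_event = torch.cuda.Event(enable_timing=True)
    start_event.record()
    layer_time_dict[layer_name] = start_event


def cuda_layer_time_hook(layer_time_dict, results, layer_name, module, input, output):
    """Record an end event and store the GPU-side elapsed time in milliseconds."""
    end_event = torch.cuda.Event(enable_timing=True)
    end_event.record()
    end_event.synchronize()  # elapsed_time() needs both events to have completed
    start_event = layer_time_dict[layer_name]
    results.setdefault(layer_name, []).append(start_event.elapsed_time(end_event))

These would be registered with register_forward_pre_hook / register_forward_hook via functools.partial, exactly like the time-based hooks in this PR.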
        module: the module to register hook.
        input: tuple containing the input arguments to module's forward method.
    """
    layer_time_dict[layer_name] = (time.time(), datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
Not enough precision for the layer start time; we will need milliseconds too (layers in resnet18 take about 1.5 ms to execute). Worth fixing here, so we don't forget to fix it in the next ticket with CUDA events.
This PR implements a script that records the layer execution time of a PyTorch model during inference using CPU only.
The script outputs a JSON file containing the execution time, timestamp, and layer name for every inference cycle.
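The output schema itself isn't shown in this conversation; assuming the per-layer records collected by the hooks, serialization might look like this (field names are illustrative only, not the PR's actual schema):

import json

# `results` stands in for the records collected by the forward hooks.
results = [
    {"layer": "layer1.0.conv1", "timestamp": "2024-01-01 12:00:00", "latency_ms": 1.5},
]

with open("layer_latencies.json", "w") as f:
    json.dump(results, f, indent=2)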