Add benchmarking for inference notebooks #184
@rishirich please add any updates here on the approach you are taking to solve this issue.
@Shreyanand I started by taking direct measurements of the time taken per PDF, but found that this time is largely dictated by the number of pages and the general text density of the document. Instead, we can run inference over all the PDFs and compute the average inference time per page, per PDF, per question, then collect these per-PDF averages and report their mean, min, max, and standard deviation. This way we account for the average text density per page of each PDF, and the varying sizes of individual PDFs won't dominate the average inference times. Afterwards, for any particular PDF, multiplying this average by that PDF's page count gives the expected inference time, which we can also record alongside the actual inference time.
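As a concrete sketch of this aggregation (the file names and timing values below are hypothetical; in practice they would be recorded while running relevance inference over the sample PDFs):

```python
import statistics

# Hypothetical recorded timings: (pdf_name, num_pages, total_inference_seconds)
timings = [
    ("report_a.pdf", 120, 96.0),
    ("report_b.pdf", 45, 41.4),
    ("report_c.pdf", 300, 255.0),
]

# Average inference time per page for each PDF, so that large PDFs
# do not dominate the statistics.
per_page = [total / pages for _, pages, total in timings]

print(f"mean: {statistics.mean(per_page):.3f} s/page")
print(f"min:  {min(per_page):.3f} s/page")
print(f"max:  {max(per_page):.3f} s/page")
print(f"std:  {statistics.stdev(per_page):.3f} s/page")

# Expected inference time for a new PDF = average per-page time x page count.
pages_in_new_pdf = 200  # e.g. a 200-page PDF
expected = statistics.mean(per_page) * pages_in_new_pdf
print(f"expected time for a {pages_in_new_pdf}-page PDF: {expected:.1f} s")
```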
We discussed in the Data Extraction weekly meeting that the extractor's pattern for recognizing paragraphs (a newline, or perhaps a pair of newlines) was producing pessimal results for CDP documents, where a paragraph can be a short sentence ("State the global scope 1 CO2 emissions (in megatons)") and the answer even shorter ("1000"). Many small paragraphs are not conducive to its method of extraction, and they create lots of fruitless paragraphs to search. The team will try a new approach: use a regexp that matches question numbers (C4.1a, C4.2, etc.) and treat all the text between consecutive question numbers as one block of sentences; a sketch of this splitting appears below. This will both create much more context and greatly reduce the number of paragraphs that have to be searched. Bottom line: the number of "paragraphs" as well as pages should be measured.
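A minimal sketch of that splitting, assuming question codes of the form C4, C4.1, or C4.1a at the start of a line (the exact regexp the team adopts may differ):

```python
import re

# Matches question codes such as C4, C4.1, C4.1a at the start of a line.
QUESTION_RE = re.compile(r"^C\d+(?:\.\d+)?[a-z]?\b", re.MULTILINE)

def split_by_question(text: str) -> list[str]:
    """Return one chunk per question, keeping each code with its answer text.

    Any preamble before the first question code is ignored in this sketch.
    """
    starts = [m.start() for m in QUESTION_RE.finditer(text)]
    if not starts:
        return [text.strip()]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]

sample = (
    "C4.1a State the global scope 1 CO2 emissions (in megatons)\n"
    "1000\n"
    "C4.2 Provide a breakdown by business unit.\n"
    "Unit A: 600, Unit B: 400\n"
)
for chunk in split_by_question(sample):
    print(repr(chunk))
```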
@DaBeIDS @MichaelTiemannOSC for visibility
Contains modified inferencing code for both the Relevance and KPI Inferencing phases of the inference pipeline, adding benchmarking steps. Both models have been benchmarked thoroughly, and the following metrics are reported:

1) Relevance Model:
- Total Number of Data Points Processed
- Total Inference Time
- Average Number of Pages Per PDF
- Average Inference Time Per PDF
- Minimum Inference Time of PDF
- Maximum Inference Time of PDF
- Std of Inference Times of PDFs
- Average Time Per Data Point Processed
- Average Data Points Processed Per Second

2) KPI Model:
- Total Number of Data Points Processed
- Total Inference Time
- Average Inference Time Per CSV
- Minimum Inference Time of CSV
- Maximum Inference Time of CSV
- Std of Inference Times of CSVs
- Average Time Per Data Point Processed
- Average Data Points Processed Per Second

Signed-off-by: [Rishikesh Gawade](https://github.com/rishirich/)
To evaluate sparsification results, we need to measure the performance of each inference step: relevance and KPI extraction.
As part of this issue, create a notebook called benchmarks.ipynb in the demo2 directory. First, this notebook will load the relevance model and the KPI extraction model and run inference over a large number of PDFs (145 samples), recording the time taken.
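A rough sketch of the timing loop such a notebook might contain; `relevance_infer` and `kpi_infer` below are hypothetical placeholders for the project's actual inference entry points, and `data/pdfs` is an assumed location for the 145 sample PDFs:

```python
import time
from pathlib import Path

def relevance_infer(pdf_path):
    """Hypothetical placeholder for the relevance-model inference step."""
    ...

def kpi_infer(relevance_output):
    """Hypothetical placeholder for the KPI-extraction inference step."""
    ...

pdf_dir = Path("data/pdfs")  # assumed location of the 145 sample PDFs
timings = {}

for pdf in sorted(pdf_dir.glob("*.pdf")):
    start = time.perf_counter()
    kpi_infer(relevance_infer(pdf))  # run both inference steps end to end
    timings[pdf.name] = time.perf_counter() - start

if timings:
    total = sum(timings.values())
    print(f"{len(timings)} PDFs in {total:.1f} s "
          f"({total / len(timings):.2f} s per PDF on average)")
```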
Results should look something like this: *(sample output)*
Second, get the performance metrics for each model. Look at the end of train_relevance and train_kpi_extraction, borrow the relevant code and the test dataset, and compute the performance metrics: F1 score, recall, precision, and accuracy.
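A minimal sketch of the metrics computation, assuming the test labels and predictions can be flattened into lists (scikit-learn is one common way to compute these; the training notebooks may compute them differently):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0]  # hypothetical ground-truth labels from the test set
y_pred = [1, 0, 0, 1, 0]  # hypothetical model predictions

print(f"accuracy:  {accuracy_score(y_true, y_pred):.3f}")
print(f"precision: {precision_score(y_true, y_pred):.3f}")
print(f"recall:    {recall_score(y_true, y_pred):.3f}")
print(f"F1:        {f1_score(y_true, y_pred):.3f}")
```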
Results should look something like this: *(sample output)*
Print the model size in MB for both models.
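One way to report this is to sum the on-disk size of each model's saved files; the directory paths below are assumptions and should point at wherever the relevance and KPI-extraction artifacts are actually stored:

```python
from pathlib import Path

def model_size_mb(model_dir: str) -> float:
    """Total size in MB of all files under a saved-model directory."""
    root = Path(model_dir)
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file()) / 1e6

# Assumed artifact locations -- adjust to the project's actual paths.
for name, path in [("relevance", "models/relevance"),
                   ("kpi-extraction", "models/kpi_extraction")]:
    print(f"{name} model: {model_size_mb(path):.1f} MB")
```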