
Add benchmarking for inference notebooks #184

Open
Shreyanand opened this issue Jul 14, 2022 · 4 comments
Labels
sparsification Indicates that the issue exists to achieve model sparsification.

Comments

@Shreyanand
Member

To evaluate sparsification results, we need to measure the performance of each inference step: relevance and KPI extraction.

  • As part of this issue, create a notebook called benchmarks.ipynb in the demo2 directory. This notebook will load the relevance model and the KPI extraction model and run inference on a large number of PDFs (145 samples).

  • Results should look something like this:

    • Relevance: [t1, t2, ... t145] distribution of 145 inference times; find its min, mean, max, and std
    • KPI extraction: [t1, t2, ... t145] distribution of 145 inference times; find its min, mean, max, and std
    • This should borrow inference code from the infer_relevance and infer_kpi notebooks.
  • Second, get the performance metrics for each model. Look at the end of the train_relevance and train_kpi_extraction notebooks, borrow the relevant code and the test dataset, and compute the performance metrics: F1 score, recall, precision, and accuracy.

  • Results should look something like this:

    • Relevance: F1 score, recall, precision, and accuracy (on the test set, ~30 files assuming an 80/20 split; double-check this)
    • KPI extraction: F1 score, recall, precision, and accuracy
  • Print the model size in MB for both models (a rough sketch of these benchmarking steps follows after this list).

@Shreyanand added the sparsification label on Jul 14, 2022
@Shreyanand
Member Author

@rishirich please post any updates here on the approach you are taking to solve this issue.

@rishirich
Contributor

@Shreyanand I was trying to take direct measurements of the time taken for each PDF, but then figured that the time taken per PDF is largely dictated by its number of pages and overall text density.
I did a deep dive into the code and checked how the data was gathered and how it was chunked.
I think a more accurate way of benchmarking would be to create chunks out of each individual page (chunk = number of questions × number of paragraphs), run inference on that chunk (i.e., a page), and then proceed to the next.
Once all the pages in the PDF are processed, we can note the average time taken per page in that PDF.
A benefit of this method is that the text density of each page in the PDF is taken into account.

We can then run this for all the PDFs, get the average inference time per page per PDF per question, and collect these per-PDF averages to compute their mean, min, max, and std. This way we account for the average text density per page per PDF, and the varying sizes (number of pages) won't dictate the average inference time per PDF.

After this, for any particular PDF, we can multiply this average by the number of pages in that PDF to get the expected inference time, and also record the actual inference time for comparison.

@MichaelTiemannOSC
Contributor

We discussed in the Data Extraction weekly meeting that the extractor's pattern for recognizing paragraphs (a newline, or perhaps a pair of newlines) was creating pessimal results for CDP documents, where a paragraph is a short sentence ("State the global scope 1 CO2 emissions (in megatons)") and the answer is even shorter ("1000"). Many small paragraphs are not conducive to its method of extraction, and they also create lots of fruitless paragraphs to search. The team will try a new approach: use a question-number regexp (one that would match C4.1a, C4.2, etc.) and treat all the text in between as sentences. This will both create a lot more context and greatly reduce the number of paragraphs that have to be searched.

Bottom line: number of "paragraphs" as well as pages should be measured.

@MichaelTiemannOSC
Contributor

@DaBeIDS @MichaelTiemannOSC for visibility

rishirich added a commit to rishirich/aicoe-osc-demo that referenced this issue Aug 10, 2022
Contains modified inference code for both the Relevance and KPI inference phases of the inference pipeline to add benchmarking steps.
Both models have been benchmarked thoroughly, and the benchmarks include the following metrics:
1) Relevance Model:
    - Total Number of Data Points Processed
    - Total Inference Time
    - Average Number of Pages Per PDF
    - Average Inference Time Per PDF
    - Minimum Inference Time of PDF
    - Maximum Inference Time of PDF
    - Std of Inference Times of PDFs
    - Average Time Per Data Point Processed
    - Average Data Points Processed Per Second
2) KPI Model:
    - Total Number of Data Points Processed
    - Total Inference Time
    - Average Inference Time Per CSV
    - Minimum Inference Time of CSV
    - Maximum Inference Time of CSV
    - Std of Inference Times of CSVs
    - Average Time Per Data Point Processed
    - Average Data Points Processed Per Second

Signed-off-by: [Rishikesh Gawade](https://github.com/rishirich/)