-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 321.pdf-generator benchmark and its data #217
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Abhishek Kumar <[email protected]>
WalkthroughThe recent updates involve the addition of new files and configurations for a benchmark project focused on PDF generation using Puppeteer. Key changes include the introduction of configuration files, scripts for input generation, and automation for setting up a Chromium environment. These modifications streamline the process of converting HTML to PDF in a cloud setting and enhance resource management. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Generator
participant CloudStorage
participant Puppeteer
User->>Generator: Request PDF generation
Generator->>CloudStorage: Retrieve HTML file
CloudStorage->>Generator: Send HTML file
Generator->>Puppeteer: Render HTML to PDF
Puppeteer-->>Generator: Return PDF
Generator->>CloudStorage: Upload PDF
CloudStorage-->>Generator: Confirm upload
Generator-->>User: PDF generation complete
Warning Review ran into problemsProblems (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
When I was testing the HTML code with sample image puppeteer couldn't fetch the image from the location to add it in the pdf, so that's why I was thinking to use a direct (URL of the image in the HTML) document or use the web page URL (which of course we cant use because of the discussion we had). So this was the main issue, other wise the benchmark is working perfectly fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Outside diff range, codebase verification and nitpick comments (1)
benchmarks/600.pdf/601.pdf-generator/input.py (1)
7-32
: Add comments to clarify the input generation process.The function
generate_input
is well-structured but could benefit from additional comments to explain its steps.Consider adding comments like this:
def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func): # Paths for the HTML template and images input_file_path = os.path.join(data_dir, 'template', 'demo.html') images_dir = os.path.join(data_dir, 'template', 'images') # Initialize the input configuration dictionary input_config = {'object': {}, 'bucket': {}} # Upload the HTML file to the input bucket upload_func(0, "demo.html", input_file_path) # Configure the bucket paths input_config['bucket']['bucket'] = benchmarks_bucket input_config['bucket']['input'] = input_paths[0] input_config['bucket']['output'] = output_paths[0] # Upload each image in the images directory for file in glob.glob(os.path.join(images_dir, '*.png')): img = os.path.relpath(file, data_dir) upload_func(0, img, file) # Store image configuration in 'object' input_config['object']['key'] = "images/" input_config['object']['input_file'] = 'demo.html' return input_config
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (6)
- benchmarks-data (1 hunks)
- benchmarks/600.pdf/601.pdf-generator/config.json (1 hunks)
- benchmarks/600.pdf/601.pdf-generator/input.py (1 hunks)
- benchmarks/600.pdf/601.pdf-generator/nodejs/function.js (1 hunks)
- benchmarks/600.pdf/601.pdf-generator/nodejs/init.sh (1 hunks)
- benchmarks/600.pdf/601.pdf-generator/nodejs/package.json (1 hunks)
Files skipped from review due to trivial changes (3)
- benchmarks-data
- benchmarks/600.pdf/601.pdf-generator/config.json
- benchmarks/600.pdf/601.pdf-generator/nodejs/package.json
Additional context used
Shellcheck
benchmarks/600.pdf/601.pdf-generator/nodejs/init.sh
[warning] 4-4: VERBOSE appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: SCRIPT_DIR appears unused. Verify use (or export if used externally).
(SC2034)
Benchmark PR data is here spcl/serverless-benchmarks-data#4 |
Signed-off-by: Abhishek Kumar <[email protected]>
effb193
to
7802114
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- benchmarks/300.utilities/321.pdf-generator/config.json (1 hunks)
- benchmarks/300.utilities/321.pdf-generator/input.py (1 hunks)
- benchmarks/300.utilities/321.pdf-generator/nodejs/function.js (1 hunks)
- benchmarks/300.utilities/321.pdf-generator/nodejs/init.sh (1 hunks)
- benchmarks/300.utilities/321.pdf-generator/nodejs/package.json (1 hunks)
Files skipped from review due to trivial changes (1)
- benchmarks/300.utilities/321.pdf-generator/nodejs/package.json
Additional context used
Shellcheck
benchmarks/300.utilities/321.pdf-generator/nodejs/init.sh
[warning] 4-4: VERBOSE appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: SCRIPT_DIR appears unused. Verify use (or export if used externally).
(SC2034)
Additional comments not posted (11)
benchmarks/300.utilities/321.pdf-generator/config.json (3)
2-2
: LGTM!The timeout setting of 60 seconds seems reasonable for a PDF generation task.
3-3
: LGTM!The memory setting of 256 MB seems reasonable for a PDF generation task.
4-4
: LGTM!The languages setting appropriately includes "nodejs" for the task.
benchmarks/300.utilities/321.pdf-generator/nodejs/init.sh (4)
6-10
: LGTM!The Chromium URL and download directory setup are appropriate.
Tools
Shellcheck
[warning] 9-9: SCRIPT_DIR appears unused. Verify use (or export if used externally).
(SC2034)
12-16
: LGTM!The directory creation and file download are correctly implemented.
18-22
: LGTM!The file extraction and cleanup are correctly implemented.
24-28
: LGTM!The file move and directory removal are correctly implemented.
benchmarks/300.utilities/321.pdf-generator/input.py (1)
1-5
: LGTM!The imports are appropriate, and the
buckets_count
function correctly returns a tuple.benchmarks/300.utilities/321.pdf-generator/nodejs/function.js (3)
7-9
: LGTM!The variable declarations are appropriate for the task at hand.
The code changes are approved.
12-53
: Improve error handling and readability.
- Add error handling for
page.setContent
andpage.pdf
to handle potential errors during PDF generation.- Consider using async/await for stream operations to improve readability.
- Ensure the
inputStream
is properly closed after use.Apply this diff to improve error handling and readability:
try { // Download the HTML file from storage const inputStream = await readStreamPromise; inputStream.pipe(htmlStream); // Launch Puppeteer and generate the PDF const browser = await puppeteer.launch({ executablePath: browserPath }); const page = await browser.newPage(); try { await page.setContent(await streamToString(htmlStream), { waitUntil: 'networkidle0' }); const pdfBuffer = await page.pdf({ format: 'A4' }); // Close Puppeteer await browser.close(); // Pipe the PDF buffer into the write stream writeStream.write(pdfBuffer); writeStream.end(); // Wait for upload to complete await promise; return { bucket: output_prefix, key: uploadName }; } catch (error) { console.error('Error generating PDF:', error); await browser.close(); throw error; } finally { inputStream.destroy(); } } catch (error) { console.error('Error generating PDF:', error); throw error; }
55-63
: LGTM!The
streamToString
function is well-implemented and handles stream conversion correctly.The code changes are approved.
@octonawish-akcodes Thanks for the PR! What would be needed to add later a Python version? |
Yes, python version is remaining, since nodejs one was quite fast in development, I first added that. Need to dig deeper about the supported |
Signed-off-by: Abhishek Kumar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- benchmarks/300.utilities/321.pdf-generator/nodejs/function.js (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- benchmarks/300.utilities/321.pdf-generator/nodejs/function.js
This Pull Request introduces a new PDF Generation benchmark for SEBS, implemented using Node.js. The benchmark leverages the Puppeteer library to facilitate the PDF generation process.
Details
puppeteer-core
at version22.15.0
.puppeteer-core
does not include a bundled browser. For optimal performance and compatibility, the benchmark is configured to work with Chrome version127.0.6533.88
, which is fully supported bypuppeteer-core
version22.15.0
.The specific Chrome version was chosen to ensure compatibility and stability with the selected
puppeteer-core
version, allowing the benchmark to reliably generate PDFs in various testing environments.Summary by CodeRabbit
New Features
Bug Fixes