Web Pdf Saver

This is a simple web application that allows users to save web pages as PDFs.

Support Us 💖

If you find this project useful, please consider:

  • Star the Repo: Give it a star on GitHub to help increase its visibility.

Installation 🔧

  1. Clone the repository
git clone git@github.com:juliogarciape/web-pdf-saver.git
  2. Install the dependencies
npm install
  3. Start the application
npm start

Usage 🛠️

The application uses Puppeteer to generate PDFs from web pages.
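For orientation, the core Puppeteer flow behind a tool like this can be sketched as follows. This is a minimal sketch, not the project's actual generatePDF implementation: the helper name renderPageToPdf and the networkidle2 wait condition are assumptions, while page.goto and page.pdf are standard Puppeteer calls.

```javascript
// Hypothetical helper: not part of this repository.
// `page` is expected to be a Puppeteer Page, e.g. from `await browser.newPage()`.
async function renderPageToPdf(page, url, outPath, format = 'A4') {
  // Load the page and wait until network activity has mostly settled (assumed choice).
  await page.goto(url, { waitUntil: 'networkidle2' });
  // Ask the browser to print the rendered page to a PDF file at outPath.
  await page.pdf({ path: outPath, format });
  return outPath;
}
```

Taking the page as a parameter keeps the sketch independent of how the browser is launched, so the same function works with whatever launch options the configuration supplies.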

Authentication 🔑

If the web page requires authentication, you can provide the credentials as stored session data. The src/storage directory contains two files, cookies.json and localStorage.json, which hold the cookies and the local storage entries, respectively.

/* src/storage/cookies.json */

[
  {
	"name": "cookie_name",
	"value": "cookie_value",
	"domain": "example.com",
	"path": "/",
	"expires": 1634025600,
	"size": 50,
	"httpOnly": false,
	"secure": false,
	"session": false,
	"sameSite": "Lax"
  }
]

/* src/storage/localStorage.json */

{
  "userSettings": "{\"language\":\"en\",\"theme\":\"dark\"}",
  "lastPageVisited": "/home",
  "savedItems": "[{\"id\":\"item1\",\"name\":\"Item One\",\"quantity\":3},{\"id\":\"item2\",\"name\":\"Item Two\",\"quantity\":1}]",
  "sessionId": "xyz789"
}
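How these two files reach the browser is not shown here, so the following is only a sketch of one way stored session data could be applied to a Puppeteer page. The helper names toCookieParams and applyStoredSession are hypothetical; page.setCookie and page.evaluateOnNewDocument are existing Puppeteer methods. The sketch drops the size and session fields, which Puppeteer reports when reading cookies but which are not among its cookie-setting parameters.

```javascript
// Hypothetical helpers: a sketch of applying stored session data to a Puppeteer Page.

// Return copies of the cookies without `size` and `session`,
// which are reported on read but are not settable.
function toCookieParams(cookies) {
  return cookies.map(({ size, session, ...settable }) => settable);
}

async function applyStoredSession(page, cookies, localStorageData) {
  // Each cookie carries its own domain, so cookies can be set before navigating.
  await page.setCookie(...toCookieParams(cookies));
  // localStorage is per-origin, so the entries are written by a script that
  // runs in each new document before the page's own scripts.
  await page.evaluateOnNewDocument((entries) => {
    for (const [key, value] of Object.entries(entries)) {
      window.localStorage.setItem(key, value);
    }
  }, localStorageData);
}
```

Both calls would need to happen before page.goto so that the first request already carries the cookies and the page sees the local storage entries on load.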

Single Page 📄

To generate a PDF for a single web page, pass its URL as the webPage option.

/* src/index.js */

import { generatePDF } from './lib/scraper.js';
import config from './config/config.js';

const baseUrl = 'https://www.example.com'; // URL of the web page

/* Generate PDF for a single page */

await generatePDF({
  webPage: baseUrl,
  pdfOptions: config.pdfOptions,
  puppeteerOptions: config.puppeteerOptions,
});

Multiple Pages 📚

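To generate PDFs for multiple pages, add the multiPage option. It takes a startIndex (the index of the first page) and a linkSelector (a CSS selector that finds the links to the individual pages).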

/* src/index.js */

import { generatePDF } from './lib/scraper.js';
import config from './config/config.js';

const baseUrl = 'https://www.example.com'; // URL of the web page

/* Generate PDF for multiple pages */

await generatePDF({
  webPage: baseUrl,
  pdfOptions: config.pdfOptions,
  puppeteerOptions: config.puppeteerOptions,
  multiPage: {
    startIndex: config.puppeteerOptions.startIndex, // Index of the first page
    linkSelector: '#sidebar-collection-categories a', // Selector for finding links to multiple pages
  },
});
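The link discovery behind multiPage could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the helper name collectPageLinks is hypothetical, and treating startIndex as "skip the links before this position" is one reading of the comment above. page.$$eval is a standard Puppeteer method.

```javascript
// Hypothetical helper: a sketch of multi-page link discovery.
// `page` is expected to be a Puppeteer Page that has already loaded the base URL.
async function collectPageLinks(page, linkSelector, startIndex = 0) {
  // Read the resolved href of every element matched by the selector.
  const hrefs = await page.$$eval(linkSelector, (anchors) => anchors.map((a) => a.href));
  // Drop empty values and duplicates while keeping document order.
  const unique = [...new Set(hrefs.filter(Boolean))];
  // Skip the links before startIndex (assumed meaning), e.g. to resume a run.
  return unique.slice(startIndex);
}
```

Each returned URL could then be rendered to its own PDF in turn.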

Configuration ⚙️

The default configuration lives in src/config/config.js.

/* src/config/config.js */

const config = {
	puppeteerOptions: {
		userAgent:
			'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
		headless: false,
		viewport: { width: 1024, height: 768 },
		startIndex: 0,
	},
	pdfOptions: {
		format: 'A4',
		outDir: './pdfs',
	},
};

export default config;
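Of these options, headless is a Puppeteer launch option, userAgent and viewport correspond to the page-level calls page.setUserAgent and page.setViewport, and format is a page.pdf option. startIndex and outDir are not Puppeteer options, so they are presumably consumed by the project itself. A minimal sketch of applying the page-level settings, using a hypothetical helper named openConfiguredPage:

```javascript
// Hypothetical helper: a sketch of applying puppeteerOptions to a new page.
// `browser` is expected to be a Puppeteer Browser, e.g. from
// `await puppeteer.launch({ headless: puppeteerOptions.headless })`.
async function openConfiguredPage(browser, puppeteerOptions) {
  const page = await browser.newPage();
  // Apply the user agent only when one is configured.
  if (puppeteerOptions.userAgent) {
    await page.setUserAgent(puppeteerOptions.userAgent);
  }
  // Apply the viewport ({ width, height }) only when one is configured.
  if (puppeteerOptions.viewport) {
    await page.setViewport(puppeteerOptions.viewport);
  }
  return page;
}
```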

Save PDFs 💾

Generated PDFs are saved in the pdfs directory (the outDir value in pdfOptions).

web-pdf-saver
├── pdfs
│   ├── example-1.pdf
│   ├── example-2.pdf
│   ├── example-3.pdf
│   └── ...
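How the file names are derived is not documented here. As an illustration only, a numbered naming scheme that matches the listing above could be sketched like this; the function pdfOutputPath and its parameters are hypothetical.

```javascript
// Hypothetical naming scheme: <name>-<n>.pdf inside outDir, with n starting at 1.
function pdfOutputPath(outDir, name, index) {
  // Collapse every run of characters other than a-z and 0-9 into a single
  // hyphen, then trim hyphens at both ends.
  const safeName = name.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
  // Strip trailing slashes from outDir so the join never doubles the separator.
  return `${outDir.replace(/\/+$/, '')}/${safeName}-${index + 1}.pdf`;
}

console.log(pdfOutputPath('./pdfs', 'example', 0)); // prints "./pdfs/example-1.pdf"
```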

Project Background 💡

When a well-known website was about to shut down, I created this app to preserve its valuable content, as saving the large number of pages manually was impractical.

License 📜

This project is licensed under the MIT License. See the LICENSE file for details.

Contact 📧

For questions or support, please contact me at [email protected].