
This Python script automates the extraction of crucial financial data from Bank of America statements in PDF format. It processes files in the input directory, extracting account balances, deposits, withdrawals, and daily ledger entries. The data is then organized into JSON files and saved in the output folder.


kmqasim055/PDF-Scraper-for-Bank-of-America-Statements


PDF-Scraper-for-Bank-of-America-Statements


Overview

This script extracts specific information from PDF files, particularly Bank of America statements, and saves the relevant data in JSON format. Below is an overview of its functionality:

Script Functionality

  1. Input and Output Paths

• The script expects the PDF files to be in a folder named input within the "Bank of America" directory.
• The extracted JSON files are saved in a folder named output within the same "Bank of America" directory.

  2. Data Extraction

• The script extracts financial details including account information, balances, deposits, withdrawals, and daily ledger balances.

  3. Data Structuring

• The extracted information is organized into a structured JSON format, making it easy to access and analyze.

  4. File Naming

• Each resulting JSON file has the same name as the PDF file it was generated from.
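The extraction and structuring steps can be illustrated with a minimal sketch. This is not the script's actual code: it assumes the plain text of a statement has already been pulled out of the PDF, and the phrases it searches for ("Beginning balance on", "Deposits and other additions", "Withdrawals and other subtractions", "Ending balance on", "Daily ledger balances") as well as the helper names structure_statement and to_amount are hypothetical, chosen to resemble typical statement wording. Adjust the patterns to match your own PDFs.

```python
import re

# Assumed statement wording: these labels are illustrative guesses,
# not taken from the original script. Adjust them to your PDFs.
MONEY = r"(-?\$?[\d,]+\.\d{2})"  # e.g. "$1,234.56" or "-3,000.00"
SUMMARY_PATTERNS = {
    "beginning_balance": r"Beginning balance on [^\n$]*?" + MONEY,
    "deposits": r"Deposits and other additions\s+" + MONEY,
    "withdrawals": r"Withdrawals and other subtractions\s+" + MONEY,
    "ending_balance": r"Ending balance on [^\n$]*?" + MONEY,
}
LEDGER_HEADING = "Daily ledger balances"  # assumed section heading
LEDGER_ENTRY = re.compile(r"(\d{2}/\d{2})\s+" + MONEY)  # "MM/DD amount"


def to_amount(raw: str) -> float:
    """Convert a string such as '$1,234.56' or '-3,000.00' to a float."""
    return float(raw.replace("$", "").replace(",", ""))


def structure_statement(text: str) -> dict:
    """Build a JSON-ready dict from the plain text of one statement."""
    summary = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        # Fields the pattern cannot find are stored as None, not guessed.
        summary[key] = to_amount(match.group(1)) if match else None

    ledger = []
    # Only look for "MM/DD amount" pairs after the ledger heading.
    _, found, after = text.partition(LEDGER_HEADING)
    if found:
        for date, amount in LEDGER_ENTRY.findall(after):
            ledger.append({"date": date, "balance": to_amount(amount)})

    return {"summary": summary, "daily_ledger_balances": ledger}
```

Returning a plain dict of strings, floats, lists, and None keeps the result directly serializable with json.dumps, which matches the JSON output described above.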

How to Use

  1. Setting Up the Environment

• Ensure the required packages are installed. PyMuPDF (imported as fitz) and pandas are third-party packages (for example, pip install pymupdf pandas); os, time, and re are part of the Python standard library.

  2. Folder Structure

• Create a directory named "Bank of America".
• Within this directory, create subdirectories named input and output.

  3. Placing PDF Files

• Put the PDF files you want to process in the input directory.

  4. Running the Script

• Execute the script. It processes all PDF files in the input directory.

  5. Output Files

• Once the script finishes, the corresponding JSON files are in the output directory.

  6. Customization (Optional)

• To adjust parameters or behavior, refer to the comments within the script.
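The workflow above (PDFs in the input folder, one JSON file per PDF in the output folder) could look roughly like the following sketch. The function names process_folder and extract_page_text and the extract parameter are hypothetical. The default extractor only collects raw page text with PyMuPDF (fitz), standing in for the script's real parsing, and unlike the steps above, this sketch also creates the input and output folders if they are missing.

```python
import json
from pathlib import Path
from typing import Callable

# Folder layout from the steps above, relative to the working directory.
BASE_DIR = Path("Bank of America")


def extract_page_text(pdf_path: Path) -> dict:
    """Hypothetical placeholder extractor: return the raw text of each page."""
    import fitz  # PyMuPDF; imported lazily so the loop can be tried without it

    with fitz.open(pdf_path) as doc:
        return {"pages": [page.get_text() for page in doc]}


def process_folder(
    base_dir: Path = BASE_DIR,
    extract: Callable[[Path], dict] = extract_page_text,
) -> list[Path]:
    """Write one JSON file to base_dir/output for every PDF in base_dir/input."""
    input_dir = base_dir / "input"
    output_dir = base_dir / "output"
    # Creating the folders is an addition of this sketch; exist_ok skips existing ones.
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # Match ".pdf" case-insensitively so "STATEMENT.PDF" is not skipped.
    pdf_paths = sorted(p for p in input_dir.iterdir() if p.suffix.lower() == ".pdf")
    for pdf_path in pdf_paths:
        data = extract(pdf_path)
        # Keep the PDF's base name and swap the extension for .json.
        json_path = output_dir / f"{pdf_path.stem}.json"
        json_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        written.append(json_path)
    return written
```

Calling process_folder() with no arguments reads from "Bank of America/input" and writes to "Bank of America/output". Passing a different extract function lets you swap in your own parsing logic without touching the file handling.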

Important Notes

• Python Environment:

Make sure Python and the required packages are installed before running the script.

• File Compatibility:

This script is designed for Bank of America statements in PDF format. The extraction depends on the statement layout, so PDFs that do not follow the expected format may not be extracted accurately.
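One way to act on the compatibility note is to check the extracted text for a few expected phrases before parsing it, and to reject files that lack them instead of writing JSON full of empty fields. The sketch below does that; the marker phrases and the function names missing_markers and ensure_expected_format are assumptions for illustration, not part of the original script.

```python
# Hypothetical phrases a statement in the expected layout is assumed to contain.
REQUIRED_MARKERS = ("Bank of America", "Beginning balance on", "Ending balance on")


def missing_markers(text: str, markers: tuple[str, ...] = REQUIRED_MARKERS) -> list[str]:
    """List the expected phrases that do not appear in the statement text."""
    return [marker for marker in markers if marker not in text]


def ensure_expected_format(text: str, name: str) -> None:
    """Raise ValueError if the text does not look like the expected statement layout."""
    missing = missing_markers(text)
    if missing:
        raise ValueError(f"{name}: not in the expected layout, missing {missing}")
```

A caller can catch the ValueError, log the file name, and move on to the next PDF.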

Disclaimer

This script is provided as-is and may require modification for specific use cases. Use it responsibly and verify the results before relying on them for critical applications.
