LLM + 10-K

Leveraging Large Language Models (LLMs) to extract and summarize financial information from 10-K filings retrieved from the SEC EDGAR database.

Form 10-K filings are annual financial reports submitted by publicly reporting companies in the U.S. that disclose crucial information about a corporation's financial status, figures, and risks. They are notoriously long, with some exceeding 50,000 words, which makes it tedious to derive meaningful insights from them manually. LLMs, with their ability to perform information retrieval and summarization, can automate this process and reduce the overhead of parsing the filings by hand.

Features

- Retrieve, compile, and visualize key metrics (e.g. net sales, gross margin) across a timespan
  - Default metrics are Net Sales, Gross Margin, and Total Cost of Operations. I picked these because they are discussed in most filings for the tickers I've selected, and intuitively they are crucial for evaluating a company's financial status.
  - Ability to customize the metrics that the LLM retrieves from 10-K filings
- Compare key metrics across three different companies/tickers
- Generate summaries of important sections of a particular Form 10-K
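As a rough illustration of the "retrieve, compile, and visualize" feature, the sketch below merges one JSON response per fiscal year into a per-metric series that could be charted. The response shape (a flat JSON object keyed by metric name), the `compile_metrics` helper, and the sample numbers are all assumptions made for this example; they are not taken from the project's code or from any real filing.

```python
import json

# Default metrics named in the feature list above.
DEFAULT_METRICS = ["Net Sales", "Gross Margin", "Total Cost of Operations"]


def compile_metrics(responses_by_year, metrics=DEFAULT_METRICS):
    """Turn {year: JSON string} into {metric: {year: value or None}}.

    Hypothetical helper: assumes each LLM response is a flat JSON object
    that maps metric names to numbers.
    """
    table = {metric: {} for metric in metrics}
    for year in sorted(responses_by_year):
        parsed = json.loads(responses_by_year[year])
        for metric in metrics:
            # A metric missing from the response is recorded as None,
            # so gaps stay visible instead of being silently dropped.
            table[metric][year] = parsed.get(metric)
    return table


# Made-up numbers for illustration only.
sample = {
    2022: '{"Net Sales": 100.0, "Gross Margin": 40.0, "Total Cost of Operations": 70.0}',
    2023: '{"Net Sales": 120.0, "Gross Margin": 50.0}',
}
compiled = compile_metrics(sample)
print(compiled["Net Sales"])
```

Passing a different `metrics` list is one way the customization option could be modeled. Comparing companies would then amount to calling the helper once per ticker and plotting the resulting series side by side.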

Tech Stack

Below is a list of libraries/tools I've used heavily in this project:

- edgartools: One of the most polished and full-featured libraries for retrieving 10-K filings from SEC EDGAR that I've encountered. It makes it easy to extract the raw text of each filing, and it also works especially well as a CLI tool for debugging and exploration.
- gemini-1.5-flash-latest: The LLM API of choice for this project. It supports a very generous input context window (up to 1 million tokens), which is ideal for a document as large as a Form 10-K. It can also generate responses in JSON format, which makes the retrieved data much easier to work with for visualization.
- streamlit: Used for the UI frontend: displaying visualizations, showing user options, taking input for LLM API calls, etc. The library was intuitive and hassle-free for the most part. I also used Streamlit Community Cloud to host the project site.
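To show how these pieces might fit together, here is a minimal sketch of a fetch, prompt, and parse flow. It is not the project's actual code: the function names, the prompt wording, and the placeholder filing text are assumptions. The library calls (`Company`, `set_identity`, `get_filings`, `latest`, `text` from edgartools; `configure`, `GenerativeModel`, `generate_content` from google-generativeai) reflect my understanding of those packages and should be checked against their current documentation.

```python
import json


def fetch_latest_10k_text(ticker, identity):
    """Fetch the raw text of a company's most recent 10-K via edgartools."""
    from edgar import Company, set_identity  # pip install edgartools

    set_identity(identity)  # SEC EDGAR asks callers to identify themselves
    filing = Company(ticker).get_filings(form="10-K").latest()
    return filing.text()


def build_prompt(filing_text, metrics):
    """Ask for one JSON object keyed by metric name (hypothetical wording)."""
    wanted = ", ".join(metrics)
    return (
        f"From the Form 10-K below, report these metrics: {wanted}.\n"
        "Respond with a single JSON object that maps each metric name "
        "to a number, or null if the filing does not state it.\n\n"
        f"{filing_text}"
    )


def ask_gemini(prompt, api_key):
    """Send the prompt to Gemini and parse the JSON reply into a dict."""
    import google.generativeai as genai  # pip install google-generativeai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(
        "gemini-1.5-flash-latest",
        # Request JSON output so the reply can be parsed directly.
        generation_config={"response_mime_type": "application/json"},
    )
    return json.loads(model.generate_content(prompt).text)


# No network calls are made here; only the prompt is built and inspected.
prompt = build_prompt("(filing text goes here)", ["Net Sales", "Gross Margin"])
print(prompt.splitlines()[0])
```

With an API key and network access, `ask_gemini(build_prompt(fetch_latest_10k_text(...), metrics), api_key)` would return a dict that a Streamlit page could chart, for example with `st.line_chart`. The third-party imports sit inside the functions so the sketch loads even when those packages are not installed.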

richardso21/llm-plus-10k
