LlamaIndex: Chat with Pandas DataFrame

This application leverages OpenAI's language models, the pandas data analysis library, and LlamaIndex's agents and tools to provide users with real-time qualitative data analysis and insights into the uploaded file.

Python Libraries

This application is powered by several libraries:

Streamlit: For the User Interface 🖥️
Pandas: For performing data analysis 📊
LlamaIndex: For creating LLM agents and tools 🔗
OpenAI: The Large Language Models (LLM) provider 🧠

Getting started 🏁

Requirements

Python 3 must be installed on your computer. A recent Python 3 release is recommended; the application was tested with Python 3.10.12 on Ubuntu 22.04.5 LTS.

Installation

Clone the repository and install the dependencies:

git clone [this repository]
cd LlamaIndex-Chat-with-pandas-DataFrame
python3 -m pip install -r requirements.txt

Run the application

streamlit run chat_pandas_df_llamaindex.py

Usage 📖

Thanks to the graphical user interface, using this application is pretty intuitive. 🤓

Paste your OpenAI key in the sidebar. The key won't be stored anywhere. 🚫
Upload your data file. Only CSV, XLS, XLSX, XLSM, and XLSB files are supported, with a size limit of 200 MB. 📂
Enter a query about the data file, or even an unrelated question. ❓
Based on your query and data file, the LLM agent will either perform data analysis through tool calling or answer your question directly. 💡
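The file-type rule in the upload step can be sketched as a small loader that picks a pandas reader from the file extension. This is only an illustration, not the code in chat_pandas_df_llamaindex.py: the helper name `load_table` and the constant `EXCEL_SUFFIXES` are hypothetical, and reading the Excel formats assumes the matching optional pandas engine is installed.

```python
from pathlib import Path

import pandas as pd

# Excel extensions listed in the usage steps (hypothetical constant name).
EXCEL_SUFFIXES = {".xls", ".xlsx", ".xlsm", ".xlsb"}


def load_table(path):
    """Load a supported tabular file into a DataFrame (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in EXCEL_SUFFIXES:
        # pandas chooses an engine from the extension; the matching optional
        # package must be installed for the read to succeed.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Rejecting unknown extensions up front gives the user a clear error instead of a parser failure later on.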

Note: If you don't have a suitable data file, sample datasets are provided on my GitHub as well.

Features ✨

Natural language interface for automatic data analysis by an LLM agent and tools. 📊
Supports various dataset file types. 📄
Implemented with LlamaIndex. (Note that most related projects are built with LangChain.) 👍
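To make the idea of tool calling concrete, here is a minimal sketch in plain Python. It does not use LlamaIndex's API and is not taken from this application: the tool names, the `TOOLS` registry, and `run_tool` are all hypothetical, and the model's choice of tool is hard-coded as a stand-in.

```python
import pandas as pd


def row_count(df):
    """Return the number of rows in the table."""
    return len(df)


def column_mean(df, column):
    """Return the mean of a numeric column."""
    return float(df[column].mean())


# Hypothetical registry mapping tool names to functions.
TOOLS = {"row_count": row_count, "column_mean": column_mean}


def run_tool(df, name, arguments):
    """Run the requested tool on the DataFrame and return its result."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](df, **arguments)


df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})
# Stand-in for a model response; a real agent would produce this choice.
request = {"name": "column_mean", "arguments": {"column": "price"}}
print(run_tool(df, request["name"], request["arguments"]))
```

In an agent framework, the model receives the tool descriptions, selects one together with its arguments, and the result is fed back into the conversation.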

Limitations ⚠️

Cannot deal with non-tabular data, or extract tabular data from unsupported file types. 💔
Cannot analyze large datasets because of the LLM's token limit. 🚫
Neither the data nor the analysis report is cached. 🔄
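One common way to work within a token limit is to send the model a compact description of the table instead of the full data. The sketch below is an assumption about how that could look, not something this application does: `summarize_for_prompt` and its parameters are hypothetical, and the cap is in characters, which only approximates a token budget.

```python
import pandas as pd


def summarize_for_prompt(df, max_rows=5, max_chars=2000):
    """Build a short text description of a DataFrame (hypothetical helper)."""
    lines = [f"rows={len(df)}, columns={len(df.columns)}"]
    # One line per column with its dtype.
    for name, dtype in df.dtypes.items():
        lines.append(f"- {name}: {dtype}")
    lines.append("preview:")
    # Only the first few rows are included, regardless of table size.
    lines.append(df.head(max_rows).to_csv(index=False).strip())
    text = "\n".join(lines)
    # Hard cap so the summary does not grow with the dataset.
    return text[:max_chars]
```

The summary stays roughly the same size whether the table has a hundred rows or a million, so the prompt length no longer depends on the dataset.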

Improvements 🚀

Voice interface: Convert the user's speech to text and perform data analysis 🗣️
Third-party data sources: Integrate internal and external data without file uploading 🤝
Perform intermediate checks on the results to avoid LLM bias 🤔
Handle larger datasets 📚

Background 🧑‍🎓

My name is Sheldon Hsin-Peng Lin. I'm a software engineer and research staff member, and I build various applications in the telecommunications industry. 👨‍🔧 LLMs are really good at understanding human semantics, and an agent can perform data analysis through LLM reasoning and tool calling. 📚 I developed this application on that basis, and I hope it can help you as well. 👍

Acknowledgements 🙏

The application is greatly inspired by the LangChain Streamlit agent examples. ❤️
