This application leverages OpenAI's language models, the pandas data analysis library, and LlamaIndex's agents and tools to provide users with real-time qualitative data analysis and insights of the upload file.
This application is powered by several libraries:
- Streamlit: For the User Interface 🖥️
- Pandas: For performing data analysis 📊
- LlamaIndex: For creating LLMs agents and tools 🔗
- OpenAI: The Large Language Models (LLM) provider 🧠
The Python Runtime Environment should be installed on your computer. Please choose the latest version of Python 3. The tested Python version is 3.10.12 on Ubuntu 22.04.5 LTS.
Clone the repository and install the dependencies:
git clone [this repository]
cd LlamaIndex-Chat-with-pandas-DataFrame
python3 -m pip install -r requirements.txt
streamlit run chat_pandas_df_llamaindex.py
Thanks to the graphical user interface, the usage of this application is pretty tuitive. 🤓
- Paste your OpenAI key on the sidebar. The key won't be stored in anywhere. 🚫
- Upload your data file. We only support CSV, XLS, XLSX, XLSM, and XLSB file types with 200MB size limitation. 📂
- Enter your query text about the data file or even unrelated question. ❓
- The LLM agent will perform data analysis by tool calling or directly anwser your question based on your query text and data file. 💡
Note: If you don't have a suitable data file. Sample datasets are provided on my github as well.
- Natural language interface for automatic data analyzing by LLM agent and tools. 📊
- Support variouse file types fo dataset. 📄
- Implementation of LlamaIndex. (Note that most of the related projects are developed by LangChain) 👍
- Cannot deal with non-tabular data, or extract tabular data from unsupported file types. 💔
- Cannot perform a data analysis on large datasets since the LLM has its token limitation. 🚫
- The data is not cached and the analysis report as well. 🔄
- Voice interface: Convert user's speech to text and perform a data analysis 🗣️
- Third-party's data sources: Integrating internal and external data without file uploading 🤝
- Perform intermediates checkings on the results to avoid LLM bias 🤔
- Handle larger datasets 📚
My name is Sheldon Hsin-Peng Lin. I'm a software engineer and a research staff. I build various applications in telecommunication industry. 👨🔧 Since LLMs are really good at understanding human semantics, and an agent can perform data analysis by LLM reasoning and tool calling. 📚 This application is developed based on the above conditions, and I hope it can help you as well. 👍
The application is greatly inspired by LangChain Streamlit agent examples. ❤️