This project delivered a Python script for a client in the agriculture industry that extracts current prices for vegetables, fruits, and ornamentals from the client's website. The extracted data then went through exploratory data analysis (EDA) and machine learning models to identify patterns and predict future price trends. The results were inserted into a PostgreSQL database and displayed in a real-time dashboard for the client.
The project follows data science best practices, including modular programming, data cleaning, and normalization. The code is organized into several modules, each with a single task, and follows the DRY (Don't Repeat Yourself) principle. The modules cover data extraction, data cleaning, data normalization, EDA, machine learning, and database management.
The data extraction module scrapes price data from the client's website. The data cleaning module removes irrelevant or incomplete records and handles missing values. The data normalization module converts the remaining data to a consistent, standardized format.
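The extraction, cleaning, and normalization steps can be illustrated with a minimal sketch. It assumes the prices sit in a simple HTML table; the markup in `SAMPLE_HTML`, the column order, and the names `TableRowParser`, `extract_rows`, and `clean_and_normalize` are all hypothetical, since the client's actual page structure and the project's real module code are not shown here. Only the Python standard library is used.

```python
from html.parser import HTMLParser

# Hypothetical markup: the client's real page structure is not described here.
SAMPLE_HTML = """
<table id="prices">
  <tr><th>Product</th><th>Category</th><th>Price</th></tr>
  <tr><td> Tomato </td><td>Vegetables</td><td>$1.20</td></tr>
  <tr><td>apple</td><td>FRUITS</td><td>0.95</td></tr>
  <tr><td>Rose</td><td>ornamentals</td><td>n/a</td></tr>
  <tr><td></td><td>fruits</td><td>2.00</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being parsed, or None outside a row
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Extraction step: return the raw cell text of every data row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean_and_normalize(rows):
    """Cleaning and normalization: drop unusable rows, standardize the rest."""
    records = []
    for row in rows:
        if len(row) != 3:
            continue  # incomplete row
        product, category, price = (cell.strip() for cell in row)
        if not product:
            continue  # missing product name
        try:
            value = float(price.lstrip("$"))
        except ValueError:
            continue  # missing or non-numeric price such as "n/a"
        records.append(
            {"product": product.lower(), "category": category.lower(), "price": value}
        )
    return records


records = clean_and_normalize(extract_rows(SAMPLE_HTML))
```

In this sketch, rows with a missing value are dropped rather than imputed; that is one possible way to handle missing values, not necessarily the one the project uses.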
The EDA module uses visualizations and statistical analysis to reveal patterns and relationships in the data. The machine learning module applies regression and time-series forecasting algorithms to predict future price trends.
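As a rough illustration of the kind of analysis described, the sketch below computes summary statistics for a price series and fits a straight-line trend by ordinary least squares, then extrapolates it. The specific models used in the project are not stated, so the linear trend is a stand-in chosen for brevity; the function names and the sample prices are invented for the example.

```python
from statistics import mean


def summarize(prices):
    """Basic descriptive statistics, the starting point of most EDA."""
    return {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "mean": mean(prices),
    }


def fit_linear_trend(prices):
    """Least-squares fit of price against the time index 0, 1, 2, ..."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(prices)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(prices))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(prices, steps):
    """Extrapolate the fitted line for the next `steps` time points."""
    slope, intercept = fit_linear_trend(prices)
    n = len(prices)
    return [intercept + slope * (n + k) for k in range(steps)]


# Illustrative values only, not data from the project.
prices = [1.0, 1.5, 2.0, 2.5]
summary = summarize(prices)
slope, intercept = fit_linear_trend(prices)
predicted = forecast(prices, steps=2)
```

A straight line ignores seasonality, which matters for produce prices; a real forecasting module would likely need a model that accounts for it.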
Finally, the results are inserted into a PostgreSQL database and displayed in a real-time dashboard for the client to monitor. The code is fully documented, and the project is kept under version control with Git.
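The database step might look like the sketch below, which inserts cleaned records through a Python DB-API connection using a parameterized query. The table name `prices`, its columns, and the function `insert_prices` are assumptions for illustration, and the database driver used in the project is not stated. The default `%s` placeholder is the style that psycopg2 accepts for PostgreSQL. So that the example runs without a database server, the demonstration at the bottom swaps in an in-memory SQLite database, which uses `?` placeholders instead.

```python
import sqlite3

# Hypothetical schema: the project's real table layout is not described here.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS prices (
    product  TEXT NOT NULL,
    category TEXT NOT NULL,
    price    NUMERIC NOT NULL
)
"""


def insert_prices(conn, records, placeholder="%s"):
    """Insert records via any DB-API connection and return the row count.

    Only the placeholder token is formatted into the SQL string; the values
    themselves are always passed as parameters, never interpolated.
    """
    marks = ", ".join([placeholder] * 3)
    sql = f"INSERT INTO prices (product, category, price) VALUES ({marks})"
    rows = [(r["product"], r["category"], r["price"]) for r in records]
    cur = conn.cursor()
    cur.execute(CREATE_SQL)
    cur.executemany(sql, rows)
    conn.commit()
    cur.close()
    return len(rows)


# Stand-in for PostgreSQL so the sketch is runnable anywhere.
conn = sqlite3.connect(":memory:")
sample = [
    {"product": "tomato", "category": "vegetables", "price": 1.2},
    {"product": "apple", "category": "fruits", "price": 0.95},
]
inserted = insert_prices(conn, sample, placeholder="?")
stored = conn.execute(
    "SELECT product, price FROM prices ORDER BY product"
).fetchall()
```

Against PostgreSQL, the same function would be called with a connection from the chosen driver and the default placeholder; connection details are deliberately left out because they are not given in this write-up.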
Overall, this project demonstrates the ability to apply data science best practices to solve real-world problems and deliver actionable insights to clients.