Welcome to my GitHub repository! This is a collection of projects from my coursework and personal work, spanning programming, data science, and technology consulting. Each project showcases different skills and concepts I've learned and applied.
- Description: This project investigates how U.S. unemployment rate announcements affect the prices of three major ETFs: SPY, QQQ, and IWM. By analyzing daily trading data, the project aims to identify patterns in ETF performance around these announcements and to give investors insight into how the market tends to move in response.
- Key Concepts: Event study methodology, abnormal returns, economic indicators, market volatility, ETF performance, unemployment rate impact on financial markets.
- Technologies: Python, Pandas, Yahoo Finance API, Data Visualization (Infogram).
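
The abnormal-return calculation at the core of an event study can be sketched as follows. This is a minimal illustration that assumes a constant-mean-return baseline; the project itself may use a different one, such as a market model. The function names, window lengths, and prices are hypothetical placeholders, not the project's code or real ETF data.

```python
from statistics import mean


def daily_returns(prices):
    """Simple returns from a list of consecutive daily closing prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def abnormal_returns(returns, event_idx, estimation_days=5, window=1):
    """Abnormal returns under a constant-mean-return model (an assumption).

    AR_t = R_t - mean(R over the estimation window). The estimation window
    ends right before the event window starts, so the announcement does not
    contaminate the baseline.
    """
    start = event_idx - window          # first day of the event window
    end = event_idx + window            # last day of the event window
    est_start = start - estimation_days
    if est_start < 0 or end >= len(returns):
        raise ValueError("not enough data around the event")
    expected = mean(returns[est_start:start])
    return [r - expected for r in returns[start:end + 1]]


def cumulative_abnormal_return(abnormal):
    """Sum of abnormal returns over the event window (CAR)."""
    return sum(abnormal)


# Hypothetical closing prices, made up for illustration only.
prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 110.0, 111.0]
returns = daily_returns(prices)
ar = abnormal_returns(returns, event_idx=6)  # index 6 is the 106 -> 110 move
print("abnormal returns:", ar)
print("CAR:", cumulative_abnormal_return(ar))
```

In practice the estimation window would be much longer than five days, and the same calculation would be repeated per ETF and per announcement date before averaging.
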
- Description: This project predicts a song's popularity from both audio features and video metrics. I used presence on the Billboard Global Top 200 chart as the indicator of popularity. Billboard's ranking system takes into account factors such as online streaming, physical sales, social media engagement, and radio plays, making it a comprehensive measure of a song's success.
- Key Concepts: Combining audio features from Spotify (e.g., tempo, danceability) with video metrics from YouTube (e.g., view count, like count). This combination reflects the multifaceted nature of what makes a song popular, beyond just its audio qualities.
- Technologies: Python, Spotify API, YouTube API, Billboard data, Data Visualization.
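
Combining the two feature sources can be sketched as a join on a shared track identifier. This is a rough illustration: the track IDs, field names, values, and the binary `charted` label are assumptions made for the example, not the project's actual schema or data.

```python
def build_feature_rows(audio_features, video_metrics, charted_ids):
    """Join per-track audio and video features and attach a chart label.

    Tracks missing from either source are skipped so that every row carries
    the full feature set. `charted` is 1 if the track appeared on the chart.
    """
    rows = []
    shared_ids = audio_features.keys() & video_metrics.keys()
    for track_id in sorted(shared_ids):
        row = {"track_id": track_id}
        row.update(audio_features[track_id])
        row.update(video_metrics[track_id])
        row["charted"] = int(track_id in charted_ids)
        rows.append(row)
    return rows


# Hypothetical placeholder data, made up for illustration only.
audio = {
    "track_a": {"tempo": 120.0, "danceability": 0.8},
    "track_b": {"tempo": 95.0, "danceability": 0.5},
    "track_c": {"tempo": 140.0, "danceability": 0.6},
}
video = {
    "track_a": {"view_count": 1_000_000, "like_count": 50_000},
    "track_b": {"view_count": 20_000, "like_count": 900},
}
charted = {"track_a"}

for row in build_feature_rows(audio, video, charted):
    print(row)
```

The resulting rows could then be fed to any classifier, with `charted` as the target.
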
- Description: This project performs sentiment analysis on tweets written in Algerian Arabic, a language with limited NLP resources. Using a dataset from the "AfriSenti-SemEval 2023 Shared Task," I applied several machine learning models (Naive Bayes, SVM, Logistic Regression, Random Forest, and LSTM) along with a custom 2-step classification method to classify each tweet's sentiment as positive, neutral, or negative. The 2-step classification model achieved the highest accuracy, at 70%.
- Key Concepts: Sentiment analysis, machine learning, natural language processing (NLP), 2-step classification, data augmentation, text preprocessing.
- Technologies: Python, Scikit-learn, TensorFlow, AraVec embeddings, Hugging Face API.
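
One way a 2-step classifier can be structured is sketched below. The split used here (step 1 separates neutral from polar tweets, step 2 separates positive from negative) is an assumption for illustration; the project's actual two steps may differ. `TokenVoteClassifier` is a toy stand-in model and the training texts are English placeholders, not project data.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Toy stand-in model: each token votes for the labels it was seen with."""

    def fit(self, texts, labels):
        self.votes_ = defaultdict(Counter)
        self.default_ = Counter(labels).most_common(1)[0][0]
        for text, label in zip(texts, labels):
            for token in text.split():
                self.votes_[token][label] += 1
        return self

    def predict(self, texts):
        predictions = []
        for text in texts:
            tally = Counter()
            for token in text.split():
                tally.update(self.votes_.get(token, {}))
            # Fall back to the most frequent training label for unseen text.
            predictions.append(tally.most_common(1)[0][0] if tally else self.default_)
        return predictions


class TwoStepSentimentClassifier:
    """Step 1: neutral vs. polar. Step 2: positive vs. negative (assumed split)."""

    def __init__(self, make_model=TokenVoteClassifier):
        self.polarity_detector = make_model()    # step 1
        self.polarity_classifier = make_model()  # step 2

    def fit(self, texts, labels):
        step1_labels = ["neutral" if y == "neutral" else "polar" for y in labels]
        self.polarity_detector.fit(texts, step1_labels)
        # Step 2 only ever sees the non-neutral examples.
        polar = [(t, y) for t, y in zip(texts, labels) if y != "neutral"]
        self.polarity_classifier.fit([t for t, _ in polar], [y for _, y in polar])
        return self

    def predict(self, texts):
        step1 = self.polarity_detector.predict(texts)
        polar_idx = [i for i, s in enumerate(step1) if s == "polar"]
        step2 = self.polarity_classifier.predict([texts[i] for i in polar_idx])
        result = ["neutral"] * len(texts)
        for i, label in zip(polar_idx, step2):
            result[i] = label
        return result


# Placeholder training data, made up for illustration only.
train_texts = ["good great", "great love", "bad awful",
               "awful hate", "today meeting", "meeting schedule"]
train_labels = ["positive", "positive", "negative",
                "negative", "neutral", "neutral"]

model = TwoStepSentimentClassifier().fit(train_texts, train_labels)
print(model.predict(["great good", "awful bad", "schedule today"]))
```

Because the wrapper only relies on `fit` and `predict`, `make_model` could instead return any model exposing those two methods, such as a Scikit-learn pipeline.
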