This is my first data science project: binary text classification, using the dataset from https://www.kaggle.com/datasets/andrewmvd/cyberbullying-classification
As social media usage becomes increasingly prevalent in every age group, a vast majority of citizens rely on this essential medium for day-to-day communication. Social media’s ubiquity means that cyberbullying can effectively impact anyone at any time or anywhere, and the relative anonymity of the internet makes such personal attacks more difficult to stop than traditional bullying.
On April 15th, 2020, UNICEF issued a warning in response to the increased risk of cyberbullying during the COVID-19 pandemic due to widespread school closures, increased screen time, and decreased face-to-face social interaction. The statistics of cyberbullying are outright alarming: 36.5% of middle and high school students have felt cyberbullied and 87% have observed cyberbullying, with effects ranging from decreased academic performance to depression to suicidal thoughts.
In light of this, the dataset contains more than 47,000 tweets labelled according to the class of cyberbullying:
Age; Ethnicity; Gender; Religion; Other type of cyberbullying; Not cyberbullying
The data has been balanced to contain ~8000 tweets of each class.
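Since the project treats this as a binary task, the six classes have to be collapsed into two labels. The snippet below is a minimal sketch of one way to do that, not necessarily the mapping used in this project. The label strings (such as "not_cyberbullying") and the helper name to_binary are assumptions for illustration; check the actual values in the downloaded CSV before relying on them.

```python
from collections import Counter

# Assumed spelling of the non-bullying label; verify it against the real data.
NOT_BULLYING = "not_cyberbullying"


def to_binary(label: str) -> int:
    """Return 0 for the non-bullying class and 1 for any bullying class."""
    return 0 if label.strip().lower() == NOT_BULLYING else 1


# Made-up sample standing in for the label column of the real dataset.
sample_labels = [
    "age",
    "ethnicity",
    "gender",
    "religion",
    "other_cyberbullying",
    "not_cyberbullying",
]

binary_labels = [to_binary(label) for label in sample_labels]
counts = Counter(binary_labels)
print(counts)
```

Note that collapsing the labels this way removes the balance: with ~8000 tweets per class, roughly five of every six tweets end up in the bullying class, so the binary task may call for class weighting or resampling.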
Trigger Warning: These tweets either describe a bullying event or are themselves the offense, so explore the data only to the point where you feel comfortable.