# best-model.py: Best model
import os
import pickle

import joblib  # to save & load the model (sklearn.externals.joblib is deprecated; use the standalone joblib package)
import numpy as np
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import TfidfVectorizer
import preprocessor as prep  # Twitter preprocessor (tweet-preprocessor package)

## Load the model:
cbc = joblib.load('./Models/best-model.joblib')
"""
Both files are stored in .pkl format.
1) x_test : list containing all tweets of users
2) y_test : contains binary class values, 1: Hate | 0: Counter
"""
## Load your test set tweets and labels:
PATH = './Models'  # assumed location of the pickled test files; adjust as needed
x_test = pickle.load(open(os.path.join(PATH, 'x_test.pkl'), 'rb'))
y_test = pickle.load(open(os.path.join(PATH, 'y_test.pkl'), 'rb'))
"""
We have already described how to create Tfidf and Lexicon Features in our previous model.
Now to get complete feature set use: Vader,Textblob and profanity.
# Dependencies:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import textblob
Profanity : Download the profane words from:
https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
"""
# Run VADER sentiment on x_test: it yields four sentiment values per tweet
# (neutral, negative, positive, compound); TextBlob yields polarity and
# subjectivity. Store them as sentiment.pkl and textblob.pkl.
# For profanity, count the profane words in each user's tweets to build a
# per-user list, and store it as profane.pkl.
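# A minimal sketch of how these three feature blocks could be computed and
# pickled. The helper code below is an assumption, not the original
# implementation; 'en' is assumed to be the English word list downloaded
# from the LDNOOBW repository linked above (one word per line).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

analyzer = SentimentIntensityAnalyzer()

# VADER: neg/neu/pos/compound scores per tweet.
sentiment = np.array([[s['neg'], s['neu'], s['pos'], s['compound']]
                      for s in (analyzer.polarity_scores(t) for t in x_test)])

# TextBlob: polarity and subjectivity per tweet.
textblob_feats = np.array([[b.sentiment.polarity, b.sentiment.subjectivity]
                           for b in (TextBlob(t) for t in x_test)])

# Profanity: number of profane words in each user's tweets.
with open('en') as f:
    profane_words = {w.strip().lower() for w in f if w.strip()}
profane = np.array([[sum(w in profane_words for w in t.lower().split())]
                    for t in x_test])

pickle.dump(sentiment, open('sentiment.pkl', 'wb'))
pickle.dump(textblob_feats, open('textblob.pkl', 'wb'))
pickle.dump(profane, open('profane.pkl', 'wb'))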
"""
So Here features are composed of :
tfidf[word+char] + user history + Lexical[empath] + Sentiments[Vader+TextBlob+Profane]
in the respective order
"""
feature = np.c_[np.asarray(word_features.todense()),np.asarray(char_features.todense()),lexical_features,sentiment.pkl,textblob.pkl,profane.pkl]
"""
We have provided our already prepared feature vector and labels for a test run :
./Models/features.pkl | ./Models/labels.pkl
"""
# Print the classification report of your model's performance:
print(classification_report(y_test, cbc.predict(features)))
print('Accuracy:', cbc.score(features, y_test))