BERT Model Example #32
base: main
@@ -0,0 +1,478 @@
Line #10. hash_vocab('bert-base-cased-vocab.txt', 'voc_hash.txt')
TODO: Figure out whether we can provide a download link for this vocabulary file.
Line #16. df = getDF('/nvme/1/ssayyah/nv-wip/amazon_bookreview.json.gz')
TODO: Figure out whether we can provide a download link for this dataset.
Line #5. bert = AutoModel.from_pretrained('bert-base-uncased')
Replace it with this model:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
Can we use a fixed length here to keep the example minimal, maybe something like 256?
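A minimal sketch of what a fixed length of 256 could look like, independent of the tokenizer used in the notebook. The helper name pad_or_truncate, the pad id of 0, and the sample token ids are assumptions for illustration only; the notebook's tokenizer may expose its own padding and truncation options, which would be preferable if available.

```python
from typing import List, Tuple

MAX_LEN = 256  # fixed length suggested in the review comment
PAD_ID = 0     # assumption: the padding token maps to id 0 in the vocabulary


def pad_or_truncate(
    ids: List[int], max_len: int = MAX_LEN, pad_id: int = PAD_ID
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_len long."""
    kept = ids[:max_len]                  # drop anything past max_len
    n_pad = max_len - len(kept)           # number of padding positions needed
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Illustrative ids only; real ids come from the tokenizer.
short_ids, short_mask = pad_or_truncate([101, 2023, 2338, 102])
long_ids, long_mask = pad_or_truncate(list(range(1, 301)))
```

Note that plain truncation like this can cut off a trailing separator token on long reviews, so a real implementation may want to re-append it.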
Line #1. train_seq = torch.tensor(tokens_train['input_ids'])
This goes through the CUDA array interface rather than DLPack. Are there performance implications either way?
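One relevant point is that torch.tensor(...) is documented to always copy its input, so the exchange protocol may matter less than the copy itself. The sketch below is a CPU analogue using NumPy (assuming NumPy 1.22 or later for np.from_dlpack), not the GPU path in the notebook: __array_interface__ stands in for __cuda_array_interface__, and the InterfaceOnly wrapper is a hypothetical helper. It only illustrates that both protocols can share memory, while a constructor that copies does not.

```python
import numpy as np


class InterfaceOnly:
    """Hypothetical wrapper exposing a buffer only through __array_interface__."""

    def __init__(self, arr):
        self._arr = arr  # keep the owning array alive

    @property
    def __array_interface__(self):
        return self._arr.__array_interface__


src = np.arange(6, dtype=np.int64)

copied = np.array(InterfaceOnly(src))            # np.array copies by default
via_interface = np.asarray(InterfaceOnly(src))   # wraps the same memory, no copy
via_dlpack = np.from_dlpack(src)                 # DLPack exchange, no copy

src[0] = 99  # the write is visible through the two zero-copy views only
```

Whether the same holds for the GPU tensors in this notebook would need to be measured there.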
First draft PR of an example that uses the cuML BERT tokenizer and a BERT model for sentiment classification on the Amazon book review dataset.