Where are the samples of automated evaluation? #19
Comments
Are those samples in the [human_annotation/pplm_labeled_csvs] directory? |
Hi, |
@ehsan-soe Hi,
import math
import torch
from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

# Load the pretrained GPT tokenizer and language model used for scoring.
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
model.eval()

def score(sent):
    # Perplexity of `sent` under GPT: exp of the mean per-token loss.
    indexed_tokens = tokenizer.encode(sent)
    tokens_tensor = torch.tensor([indexed_tokens])
    with torch.no_grad():
        outputs = model(tokens_tensor, labels=tokens_tensor)
    loss = outputs[0]
    return math.exp(loss.item())

sents = ['there is a book on the desk',
         'there is a plane on the desk',
         'there is a book in the desk']
print([score(s) for s in sents]) |
@ehsan-soe Do you know how I can use this code to get the perplexity scores reported in the paper? |
@dathath Sorry, I can't find any generated samples in this repository. Could you point me to their location, or give me some instructions on how to use this code to get the perplexity scores of PPLM? |
@Guaguago Thanks. |
@ehsan-soe You can compute the perplexity of the generated text under another language model (GPT), which is what we do here. @Guaguago human_annotation/pplm_labeled_csvs has the generated samples. You can read the CSVs into Python and then process the samples using GPT to compute perplexity. |
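The workflow described above (read the CSVs, then score each sample with GPT) could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `read_samples`, `make_gpt_scorer` and `mean_perplexity` are hypothetical, the column name passed to `read_samples` has to be looked up in the actual CSV header, and averaging per-sample perplexities is one possible aggregation, not necessarily the one used for the paper.

```python
import csv
import io
import math
from typing import Callable, Iterable, List


def read_samples(csv_text: str, column: str) -> List[str]:
    """Return the non-empty values of one column from CSV text.

    `column` is a placeholder: check the real CSV header for the
    name of the column that holds the generated text.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()]


def make_gpt_scorer() -> Callable[[str], float]:
    """Build a perplexity scorer around the pretrained 'openai-gpt' model.

    The heavy imports live inside the function so that the CSV helpers
    can be used without torch/transformers installed.
    """
    import torch
    from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
    model.eval()

    def score(text: str) -> float:
        ids = torch.tensor([tokenizer.encode(text)])
        with torch.no_grad():
            loss = model(ids, labels=ids)[0]  # mean per-token cross-entropy
        return math.exp(loss.item())

    return score


def mean_perplexity(texts: Iterable[str], score: Callable[[str], float]) -> float:
    """Average the per-sample perplexities (an assumed aggregation)."""
    scores = [score(t) for t in texts]
    if not scores:
        raise ValueError("no samples to score")
    return sum(scores) / len(scores)


# Hypothetical usage (file name and column name are placeholders):
#   with open("some_file.csv", newline="", encoding="utf-8") as f:
#       samples = read_samples(f.read(), "text")
#   print(mean_perplexity(samples, make_gpt_scorer()))
```

Passing the scorer in as an argument keeps the CSV handling and the aggregation checkable without downloading the model.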
@dathath Thanks. Can you correct me if I am wrong? Are perplexities usually computed on the test set, that is, using the NLL of the trained model on the target text? |
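For reference, the relation between NLL and perplexity that this question relies on can be written out in a few lines. In the script earlier in the thread, the loss returned by the model is the mean per-token cross-entropy, so `math.exp(loss)` is the perplexity. The function below is a minimal sketch with a hypothetical name (`perplexity`) that takes the probabilities a model assigned to each target token.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood of the target tokens)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# A model that is uniformly unsure among 4 choices at every step
# has a perplexity of 4 (up to floating-point rounding).
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```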
@ehsan-soe Soga! Thank you! |
You can use the 'parse_*.ipynb' notebooks to process the CSVs. That should give you samples from different models separately. |
@dathath Thank you very much, I will try it! Following your earlier suggestions, I have written two programs to compute PPL and distinct-n respectively. But the scores do not match the paper's, so I would like to double-check 3 questions:
|
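For readers unfamiliar with distinct-n: it is commonly computed as the number of unique n-grams divided by the total number of n-grams. The sketch below is one possible version, not the evaluation code behind the paper's numbers. The function name `distinct_n` is hypothetical, and two choices here are assumptions: the samples are already tokenized (for example by whitespace), and n-grams are pooled over all samples rather than averaged per sample.

```python
from typing import Iterable, List


def distinct_n(token_lists: Iterable[List[str]], n: int) -> float:
    """Unique n-grams divided by total n-grams, pooled over all samples."""
    unique = set()
    total = 0
    for tokens in token_lists:
        # Slide a window of length n over the sample; n-grams never
        # cross sample boundaries.
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


samples = [["a", "b", "a", "b"], ["a", "c"]]
print([distinct_n(samples, n) for n in (1, 2, 3)])
```

Tokenization and pooling both change the result, so they are worth pinning down before comparing against published scores.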
Are your scores in the same range as the paper?
|
@dathath Thank you so much! That was a really helpful clue: I now get exactly the same Dist-1, Dist-2 and Dist-3 scores as the paper. For PPL, however, most of my scores are slightly higher than the paper's (by about 0 to 1.5), so I have some questions:
import math
import torch
from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')
model.eval()

def score(sent):
    indexed_tokens = tokenizer.encode(sent)
    tokens_tensor = torch.tensor([indexed_tokens])
    with torch.no_grad():
        outputs = model.forward(tokens_tensor, labels=tokens_tensor)
    loss = outputs[0]
    return math.exp(loss.item()) |
Yes, the samples are the same. This is what we do, so the results should ideally match. You could try matching the perplexities by topic/sentiment (see the appendix). The layer-norm layers in the Hugging Face implementation of transformers seem to have changed slightly between versions. I suspect this is one possible cause of the discrepancy if you're using a recent version of "pytorch-transformers". |
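Since library versions are suspected here, it can help to record exactly which versions are installed when reporting scores. A small sketch using only the standard library (Python 3.8+) follows. The helper name `installed_versions` is hypothetical, and the package list is just the set of distributions mentioned in this thread plus `torch`.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["torch", "transformers", "pytorch-transformers"]))
```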
@dathath Is there any special handling of the "<|endoftext|>" token and of '\n' within a sentence when calculating PPL? |
@dathath After fixing some bugs and warnings, I found that my PPL results are now much lower than the paper's, while the Dist scores match almost exactly. I have tried different versions of transformers, but the results did not change. Could you correct me if any of the steps I use to separate out the samples for each model, as follows, are wrong?
I need some help and would appreciate your reply. |
Hi,
Can you drop me (and Andrea) an email? The issue is a little hard to follow here. I will try to respond by the weekend. |
@dathath @Andrea Hi, thank you, you are so nice! This is my email: [email protected] |
@dathath Hi |
Thanks for your reply. I have written a program to calculate perplexity using the Hugging Face transformers interface.
But I am not sure which samples are used for the perplexity calculation.