I am trying to fine-tune a RoBERTa (or BERT) model on TriviaQA, using the question-answering example (run_squad.py) from Hugging Face Transformers. Before training, I converted TriviaQA to SQuAD format with convert_to_squad_format.py.
When I run the fine-tuning, there is an error while loading the data, as follows:
File "run_squad.py", line 826, in
main()
File "run_squad.py", line 768, in main
train_dataset = load_and_cache_examples(args, tokenizer, evaluate=False, output_examples=False)
File "run_squad.py", line 447, in load_and_cache_examples
examples = processor.get_train_examples(args.data_dir, filename=args.train_file)
File "/usr/local/lib/python3.6/dist-packages/transformers/data/processors/squad.py", line 578, in get_train_examples
return self._create_examples(input_data, "train")
File "/usr/local/lib/python3.6/dist-packages/transformers/data/processors/squad.py", line 605, in _create_examples
title = entry["title"]
KeyError: 'title'
It seems like the conversion is dropping part of the data that the SQuAD format requires (here, the "title" field of each entry).
Could you comment on what might be wrong or how to handle this issue?
Here are the runtime parameters for run_squad.py:
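For reference, here is a minimal sketch of how the missing field could be patched in after conversion. It assumes the converted file keeps the SQuAD-style top-level layout of {"version": ..., "data": [...]} and that each entry under "data" is simply missing "title"; the script name and output path are made up for illustration.

import json

in_path = "squad-wikipedia-train-4096.json"
out_path = "squad-wikipedia-train-4096-titled.json"

with open(in_path) as f:
    dataset = json.load(f)

# SquadProcessor._create_examples reads entry["title"] for every entry in
# dataset["data"], so give each entry at least a placeholder title if the
# converter did not emit one.
for i, entry in enumerate(dataset["data"]):
    entry.setdefault("title", "entry_{}".format(i))

with open(out_path, "w") as f:
    json.dump(dataset, f)

The placeholder value is arbitrary; run_squad.py only carries the title through to the examples, so any string per entry should be enough to get past the KeyError, assuming the rest of the structure ("paragraphs", "context", "qas") matches SQuAD.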
python run_squad.py \
--model_type roberta \
--model_name_or_path roberta-base \
--do_train \
--do_eval \
--do_lower_case \
--train_file $SQUAD_DIR/squad-wikipedia-train-4096.json \
--predict_file $SQUAD_DIR/squad-wikipedia-dev-4096.json \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 4096 \
--doc_stride 128 \
--output_dir ro_tri_st_debug_squad/ \
--fp16 \
--per_gpu_eval_batch_size 1 \
--per_gpu_train_batch_size 1 \
--gradient_accumulation_steps 8