-
I implemented question answering using the "deepset/bert-base-cased-squad2" model from Hugging Face.
However, if I change the Hugging Face model to "mrm8488/longformer-base-4096-finetuned-squadv2", a Longformer model that accepts input paragraphs of up to 4096 tokens, I get the following error:
How can I solve this problem? Any help is welcome.
-
I think the problem is:
The model doesn't take `token_type_ids` as input. If you convert the model with
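For illustration only (this is not the original conversion script): `torch.jit.trace` freezes the traced module's positional inputs, so a model traced with only `input_ids` and `attention_mask` will reject a third `token_type_ids` tensor at inference time. A minimal stand-in module shows the effect:

```python
import torch

class TwoInputQA(torch.nn.Module):
    """Stand-in for a QA model that takes only input_ids and attention_mask."""
    def forward(self, input_ids, attention_mask):
        # Dummy "logits": masked ids cast to float.
        return (input_ids * attention_mask).float()

model = TwoInputQA().eval()
ids = torch.tensor([[1, 2, 3]])
mask = torch.tensor([[1, 1, 0]])

# Tracing fixes the signature at exactly these two inputs.
traced = torch.jit.trace(model, (ids, mask))

print(tuple(traced(ids, mask).shape))  # (1, 3)

try:
    # A translator that also sends token_type_ids triggers this failure.
    traced(ids, mask, torch.zeros_like(ids))
except (RuntimeError, TypeError):
    print("extra token_type_ids-style input rejected")
```

The same mismatch happens in reverse: a BERT model traced with three inputs fails when the Longformer tokenizer supplies only two.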
-
Thanks @frankfliu for your answer. Correct, I converted the model with
-
This is the code to reproduce the error:
```java
import ai.djl.ModelException;
import ai.djl.huggingface.translator.QuestionAnsweringTranslatorFactory;
import ai.djl.inference.Predictor;
import ai.djl.modality.nlp.qa.QAInput;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.training.util.ProgressBar;
import ai.djl.translate.TranslateException;

import java.io.IOException;
import java.nio.file.Paths;

public class HuggingFaceLongformQaInference {

    public static void main(String[] args) throws IOException, TranslateException, ModelException {
        String question = "Where is my house?";
        String paragraph = "My house is in London.";
        QAInput input = new QAInput(question, paragraph);
        String answer = HuggingFaceLongformQaInference.qa_predict(input);
        System.out.println(answer); // --> London
    }

    public static String qa_predict(QAInput input) throws IOException, TranslateException, ModelException {
        Criteria<QAInput, String> criteria = Criteria.builder()
                .setTypes(QAInput.class, String.class)
                .optModelPath(Paths.get("./model/longformer-base-4096-finetuned-squadv2/longformer-base-4096-finetuned-squadv2.pt"))
                .optTranslatorFactory(new QuestionAnsweringTranslatorFactory())
                .optEngine("PyTorch")
                .optProgress(new ProgressBar())
                .build();
        // Close both the model and the predictor when done.
        try (ZooModel<QAInput, String> model = criteria.loadModel();
                Predictor<QAInput, String> predictor = model.newPredictor()) {
            return predictor.predict(input);
        }
    }
}
```

gradle dependencies:
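The original dependency list was cut off. For reference, a typical Gradle setup for this code would pull in DJL's core API, the HuggingFace tokenizers extension (which contains `QuestionAnsweringTranslatorFactory`), and the PyTorch engine; the version below is a placeholder, not necessarily the one the poster used:

```groovy
dependencies {
    // DJL core API (Criteria, ZooModel, Predictor, ...)
    implementation "ai.djl:api:0.23.0"                    // version is a placeholder
    // HuggingFace tokenizer/translator support
    implementation "ai.djl.huggingface:tokenizers:0.23.0"
    // PyTorch engine, resolved at runtime
    runtimeOnly "ai.djl.pytorch:pytorch-engine:0.23.0"
}
```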
-
@xxx24xxx You are right, padding the input doesn't really work.
I think you have to modify the model code in `modeling_longformer.py` at line 678. Change:

to:

You don't need padding any more; the existing `djl-convert` and Java code should work.