[E2E] Timm models got fail_accuracy #1217

Open
Tracked by #1223
mengfei25 opened this issue Dec 26, 2024 · 2 comments

Comments

@mengfei25
Contributor

🐛 Describe the bug

Failed dtypes: bfloat16 and amp_bf16. float32, float16 and amp_fp16 passed.
python benchmarks/dynamo/timm_models.py --accuracy --bfloat16 -d xpu -n10 --training --only tf_efficientnet_b0 --backend=inductor
============ Summary for timm_models bfloat16 inference accuracy ============
Real failed models: 13 [['tf_efficientnet_b0', 'fail_accuracy'], ['spnasnet_100', 'fail_accuracy'], ['inception_v3', 'fail_accuracy'], ['regnety_002', 'fail_accuracy'], ['dla102', 'fail_accuracy'], ['dpn107', 'fail_accuracy'], ['hrnet_w18', 'fail_accuracy'], ['lcnet_050', 'fail_accuracy'], ['swsl_resnext101_32x16d', 'fail_accuracy'], ['fbnetc_100', 'fail_accuracy'], ['ghostnet_100', 'fail_accuracy'], ['dm_nfnet_f0', 'fail_accuracy'], ['mnasnet_100', 'fail_accuracy']]
============ Summary for timm_models bfloat16 training accuracy ============
Real failed models: 30 [['lcnet_050', 'fail_accuracy'], ['mixnet_l', 'fail_accuracy'], ['regnety_002', 'fail_accuracy'], ['res2next50', 'fail_accuracy'], ['ese_vovnet19b_dw', 'fail_accuracy'], ['convmixer_768_32', 'fail_accuracy'], ['mobilenetv3_large_100', 'fail_accuracy'], ['fbnetv3_b', 'fail_accuracy'], ['repvgg_a2', 'fail_accuracy'], ['selecsls42b', 'fail_accuracy'], ['swsl_resnext101_32x16d', 'fail_accuracy'], ['tf_efficientnet_b0', 'fail_accuracy'], ['dla102', 'fail_accuracy'], ['dm_nfnet_f0', 'fail_accuracy'], ['mnasnet_100', 'fail_accuracy'], ['inception_v3', 'fail_accuracy'], ['mobilenetv2_100', 'fail_accuracy'], ['spnasnet_100', 'fail_accuracy'], ['visformer_small', 'fail_accuracy'], ['adv_inception_v3', 'fail_accuracy'], ['gluon_inception_v3', 'fail_accuracy'], ['ghostnet_100', 'fail_accuracy'], ['res2net50_14w_8s', 'fail_accuracy'], ['hrnet_w18', 'fail_accuracy'], ['pnasnet5large', 'fail_accuracy'], ['convnext_base', 'fail_accuracy'], ['tf_mixnet_l', 'fail_accuracy'], ['rexnet_100', 'fail_accuracy'], ['res2net101_26w_4s', 'fail_accuracy'], ['fbnetc_100', 'fail_accuracy']]
============ Summary for timm_models amp_bf16 inference accuracy ============
Real failed models: 19 [['hrnet_w18', 'fail_accuracy'], ['swsl_resnext101_32x16d', 'fail_accuracy'], ['regnety_002', 'fail_accuracy'], ['lcnet_050', 'fail_accuracy'], ['rexnet_100', 'fail_accuracy'], ['gluon_inception_v3', 'fail_accuracy'], ['adv_inception_v3', 'fail_accuracy'], ['dla102', 'fail_accuracy'], ['ghostnet_100', 'fail_accuracy'], ['mnasnet_100', 'fail_accuracy'], ['mobilenetv2_100', 'fail_accuracy'], ['fbnetc_100', 'fail_accuracy'], ['res2net101_26w_4s', 'fail_accuracy'], ['res2net50_14w_8s', 'fail_accuracy'], ['dpn107', 'fail_accuracy'], ['inception_v3', 'fail_accuracy'], ['res2next50', 'fail_accuracy'], ['spnasnet_100', 'fail_accuracy'], ['tf_efficientnet_b0', 'fail_accuracy']]
============ Summary for timm_models amp_bf16 training accuracy ============
Real failed models: 29 [['convmixer_768_32', 'fail_accuracy'], ['fbnetc_100', 'fail_accuracy'], ['hrnet_w18', 'fail_accuracy'], ['ese_vovnet19b_dw', 'fail_accuracy'], ['ghostnet_100', 'fail_accuracy'], ['regnety_002', 'fail_accuracy'], ['tf_mixnet_l', 'fail_accuracy'], ['mixnet_l', 'fail_accuracy'], ['repvgg_a2', 'fail_accuracy'], ['selecsls42b', 'fail_accuracy'], ['rexnet_100', 'fail_accuracy'], ['res2net101_26w_4s', 'fail_accuracy'], ['res2next50', 'fail_accuracy'], ['fbnetv3_b', 'fail_accuracy'], ['swsl_resnext101_32x16d', 'fail_accuracy'], ['adv_inception_v3', 'fail_accuracy'], ['spnasnet_100', 'fail_accuracy'], ['mobilenetv3_large_100', 'fail_accuracy'], ['inception_v3', 'fail_accuracy'], ['visformer_small', 'fail_accuracy'], ['gluon_inception_v3', 'fail_accuracy'], ['lcnet_050', 'fail_accuracy'], ['tf_efficientnet_b0', 'fail_accuracy'], ['dla102', 'fail_accuracy'], ['mobilenetv2_100', 'fail_accuracy'], ['mnasnet_100', 'fail_accuracy'], ['pnasnet5large', 'fail_accuracy'], ['eca_halonext26ts', 'fail_accuracy'], ['res2net50_14w_8s', 'fail_accuracy']]
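The reproducer command covers only one model in one configuration. As a minimal sketch, the Python below builds one reproducer command per failing dtype and mode, reusing the flags from that command. `repro_commands` is a hypothetical helper, not part of the benchmark suite, and mapping amp_bf16 to `--amp --amp-dtype bfloat16` is an assumption to verify against `benchmarks/dynamo/common.py`.

```python
import shlex

# Flags shared by every run, taken from the reproducer command in this issue.
BASE = ["python", "benchmarks/dynamo/timm_models.py", "--accuracy",
        "-d", "xpu", "-n10", "--backend=inductor"]

# Assumption: amp_bf16 corresponds to "--amp --amp-dtype bfloat16" in the
# dynamo benchmark harness; check benchmarks/dynamo/common.py before relying on it.
DTYPE_FLAGS = {
    "bfloat16": ["--bfloat16"],
    "amp_bf16": ["--amp", "--amp-dtype", "bfloat16"],
}
MODE_FLAGS = {"inference": ["--inference"], "training": ["--training"]}


def repro_commands(model: str) -> list[str]:
    """Return one shell command per failing (dtype, mode) combination."""
    cmds = []
    for dtype_flags in DTYPE_FLAGS.values():
        for mode_flags in MODE_FLAGS.values():
            argv = BASE + dtype_flags + mode_flags + ["--only", model]
            # shlex.join quotes any argument that needs it for a POSIX shell.
            cmds.append(shlex.join(argv))
    return cmds


if __name__ == "__main__":
    for cmd in repro_commands("tf_efficientnet_b0"):
        print(cmd)
```

Running the printed commands one at a time keeps the accuracy log for each configuration separate.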

Versions

env:
python: 3.10
XPU_OPS: 9ed0a1a
TRITON_COMMIT_ID: e98b6fcb8df5b44eb0d0addb6767c573d37ba024
TORCH_COMMIT_ID: 4f8b7c4272db521f7ffc4070ce1bdece513d1183
TORCHBENCH_COMMIT_ID: 03cde49eba0580ed17f9ae2250832fd8af4ed756
TORCHVISION_COMMIT_ID: d23a6e1664d20707c11781299611436e1f0c104f
TORCHAUDIO_COMMIT_ID: a6b0a140cc13216975e8922093459019537bb80a
TRANSFORMERS_VERSION: 243e186efbf7fb93328dd6b34927a4e8c8f24395
TIMM_COMMIT_ID: ac3470188b914c5d7a5058a7e28b9eb685a62427
DRIVER_VERSION: 1.23.10.49.231129.50
KERNEL_VERSION: 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023
BUNDLE_VERSION: 2025.0.1.20241113
OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
GCC_VERSION: 11

@mengfei25
Contributor Author

Caused by commit e035f6b.

@mengfei25
Contributor Author

Most are fixed; only the following still fail:
============ Summary for timm_models bfloat16 training accuracy ============
Real failed models: 1 [['convnext_base', 'fail_accuracy']]
============ Summary for timm_models amp_bf16 training accuracy ============
Real failed models: 2 [['fbnetv3_b', 'fail_accuracy'], ['eca_halonext26ts', 'fail_accuracy']]
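To compare the summaries posted in this thread, here is a minimal sketch that parses each "Real failed models" line into a set per (dtype, mode) and intersects the sets. It assumes the exact log layout shown in this issue; `parse_summary` and `common_failures` are hypothetical helpers, not part of the benchmark suite.

```python
import ast
import re

# Assumes the summary layout printed by the benchmark runs in this issue.
HEADER = re.compile(r"Summary for (\S+) (\S+) (\S+) accuracy")
FAILED = re.compile(r"Real failed models: (\d+) (\[.*\])")


def parse_summary(log: str) -> dict[tuple[str, str], set[str]]:
    """Map (dtype, mode) to the set of models reported as failing."""
    results: dict[tuple[str, str], set[str]] = {}
    key = None
    for line in log.splitlines():
        if m := HEADER.search(line):
            _suite, dtype, mode = m.groups()
            key = (dtype, mode)
        elif (m := FAILED.search(line)) and key is not None:
            count = int(m.group(1))
            # The payload is a Python literal: [['model', 'status'], ...]
            models = {name for name, _status in ast.literal_eval(m.group(2))}
            # Sanity check: the reported count should match the parsed list.
            if count != len(models):
                raise ValueError(f"{key}: reported {count}, parsed {len(models)}")
            results[key] = models
    return results


def common_failures(results: dict[tuple[str, str], set[str]]) -> set[str]:
    """Models that fail in every parsed configuration."""
    sets = list(results.values())
    return set.intersection(*sets) if sets else set()
```

Applied to the four summaries in the original report, `common_failures` should return 11 models that fail in every bf16 configuration, including tf_efficientnet_b0, dla102 and hrnet_w18.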
