Crop Segmentation local training error & AML training error #67
Comments
Thanks for using FarmVibes.AI and reporting the issue, @Amr-MKamal. I'll investigate this and return to you as soon as possible.
@Amr-MKamal I couldn't properly reproduce your error. Are you running it for the same region as the notebook (within the continental USA, where CDL is available)? The
That assertion checks whether all the values in the mask (in this case, the CDL maps) are defined and finite. This should be the case for the CDL, as the samples generated by the CDLMask dataset are the result of a
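To see what such an assertion is reacting to, it can help to count the non-finite values before the data reaches the model. The following is a minimal pure-Python sketch, not code from the notebook: `count_non_finite` is a hypothetical helper that takes a flat sequence of floats and reports how many are NaN or infinite.

```python
import math
from typing import Dict, Iterable


def count_non_finite(values: Iterable[float]) -> Dict[str, int]:
    """Count NaN, +inf, -inf and finite entries in a flat sequence of floats."""
    counts = {"nan": 0, "posinf": 0, "neginf": 0, "finite": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif math.isinf(v):
            # Distinguish the sign of the infinity.
            counts["posinf" if v > 0 else "neginf"] += 1
        else:
            counts["finite"] += 1
    return counts


# Toy data standing in for a flattened mask (hypothetical values).
sample = [0.0, 1.0, float("nan"), float("inf"), 2.0, float("-inf")]
print(count_non_finite(sample))
```

For a PyTorch tensor `t`, the flat input could be obtained with `t.flatten().tolist()`.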
@rafaspadilha, that error was in the old version of this notebook. The error I'm getting now for local training in Section [4] (after trying the example area for 2021-2022):
This error seems to happen because the Are you using the same region as the notebook, or have you decreased the size of the input geometry? Please let me know if that fixes your issue.
I used the same input_region provided in the example file. I also didn't change the chip size or the other related training parameters:
I will try decreasing the chip size/img_size to 128 and tell you how that goes.
I tried going all the way down to CHIP_SIZE = 1 and I still get the same error. Reducing these parameters, alone or together, doesn't solve the error, @rafaspadilha.
I see. Please, could you check for me:
The result of `len(ndvi_rasters)` is 330.
However, I now still get the assertion error in cell [6] :\
Hey, @Amr-MKamal. Yes, Are you still having
@rafaspadilha Thank you. No, I'm getting the same assertion error I got at the beginning, in cell [6]:

  | Name  | Type | Params
0 | model | FPN  | 23.3 M
@rafaspadilha As a last resort, I went to notebook_lib/models.py and commented out the line `assert torch.all(torch.isfinite(t))`. The rest of the cells in the local training notebook then ran successfully, and I was able to export the model to ONNX.
Note that I get this error when running the provided example (same area and date, 2020) with the provided environment.
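Commenting out the assertion lets training proceed, but whatever non-finite values triggered it are still in the data. One way to narrow down where they come from is to scan a batch sample by sample. This is a minimal sketch under that assumption: `find_bad_samples` is a hypothetical helper, and the batch is represented as plain nested lists rather than the notebook's tensors.

```python
import math
from typing import List, Sequence


def find_bad_samples(batch: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of samples containing at least one NaN or infinity."""
    return [
        i
        for i, sample in enumerate(batch)
        if not all(math.isfinite(v) for v in sample)
    ]


# Toy batch of three flattened samples (hypothetical values);
# the second sample contains a NaN.
batch = [
    [0.0, 1.0, 2.0],
    [0.0, float("nan"), 2.0],
    [3.0, 4.0, 5.0],
]
print(find_bad_samples(batch))
```

For a PyTorch batch tensor `t` with the sample index as its first dimension, `t.flatten(1).tolist()` yields nested lists in the shape used here.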
AssertionError Traceback (most recent call last)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _call_and_handle_interrupt(self, trainer_fn, *args, **kwargs)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _run(self, model, ckpt_path)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _run_stage(self)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _run_train(self)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _run_sanity_check(self)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py in run(self, *args, **kwargs)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py in advance(self, *args, **kwargs)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py in run(self, *args, **kwargs)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py in advance(self, data_fetcher, dl_max_batches, kwargs)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py in _evaluation_step(self, **kwargs)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _call_strategy_hook(self, hook_name, *args, **kwargs)
~/miniconda3/envs/crop-seg/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py in validation_step(self, *args, **kwargs)
~/farmvibes-ai/notebooks/crop_segmentation/notebook_lib/models.py in validation_step(self, batch, batch_idx)
~/farmvibes-ai/notebooks/crop_segmentation/notebook_lib/models.py in _shared_step(self, batch, batch_idx)
AssertionError:
I've been working my way through the crop-segmentation notebook for a while, and I'm finally at the training stage. However, I get errors for both local and AML training. For local training, this is what I get in In [20] after running `trainer.fit(model,data)`:
For the AML training, the job fails after I submit it to a compute instance. Again, it appears to have something to do with inter-package compatibility. The error message from AML:
Could really use help in moving forward from this.