feature/comet-logger-update #2

Draft
wants to merge 108 commits into
base: master

Conversation

japdubengsub

@japdubengsub japdubengsub commented Sep 5, 2024

In this pull request, the CometML logger was updated to support the recent Comet SDK.
It has been unified with the comet_ml.start() method to ensure ease of use. The unit tests have also been updated.


📚 Documentation preview 📚: https://pytorch-lightning--2.org.readthedocs.build/en/2/

@github-actions github-actions bot added the pl label Sep 5, 2024

github-actions bot commented Sep 5, 2024

⛈️ Required checks status: Has failure 🔴

Warning
This job will need to be re-run to merge your PR. If you do not have write access to the repository, you can ask Lightning-AI/lai-frameworks to re-run it. If you push a new commit, all of CI will re-trigger.

Groups summary

🔴 pytorch_lightning: Tests workflow
Check ID Status
pl-cpu (macOS-13, lightning, 3.9, 2.1, oldest) failure
pl-cpu (macOS-14, lightning, 3.10, 2.1) failure
pl-cpu (macOS-14, lightning, 3.11, 2.2) failure
pl-cpu (macOS-14, lightning, 3.11, 2.3) failure
pl-cpu (macOS-14, lightning, 3.12, 2.4) failure
pl-cpu (ubuntu-20.04, lightning, 3.9, 2.1, oldest) failure
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.1) failure
pl-cpu (ubuntu-20.04, lightning, 3.11, 2.2) failure
pl-cpu (ubuntu-20.04, lightning, 3.11, 2.3) failure
pl-cpu (ubuntu-20.04, lightning, 3.12, 2.4) failure
pl-cpu (windows-2022, lightning, 3.9, 2.1, oldest) failure
pl-cpu (windows-2022, lightning, 3.10, 2.1) failure
pl-cpu (windows-2022, lightning, 3.11, 2.2) failure
pl-cpu (windows-2022, lightning, 3.11, 2.3) failure
pl-cpu (windows-2022, lightning, 3.12, 2.4) failure
pl-cpu (macOS-14, pytorch, 3.9, 2.1) failure
pl-cpu (ubuntu-20.04, pytorch, 3.9, 2.1) failure
pl-cpu (windows-2022, pytorch, 3.9, 2.1) failure
pl-cpu (macOS-12, pytorch, 3.10, 2.1) failure
pl-cpu (ubuntu-22.04, pytorch, 3.10, 2.1) failure
pl-cpu (windows-2022, pytorch, 3.10, 2.1) failure

These checks are required after the changes to src/lightning/pytorch/loggers/comet.py, tests/tests_pytorch/loggers/conftest.py, tests/tests_pytorch/loggers/test_comet.py.

🟡 pytorch_lightning: Azure GPU
Check ID Status
pytorch-lightning (GPUs) (testing Lightning latest) no_status
pytorch-lightning (GPUs) (testing PyTorch latest) no_status

These checks are required after the changes to src/lightning/pytorch/loggers/comet.py, tests/tests_pytorch/loggers/conftest.py, tests/tests_pytorch/loggers/test_comet.py.

🟡 pytorch_lightning: Benchmarks
Check ID Status
lightning.Benchmarks no_status

These checks are required after the changes to src/lightning/pytorch/loggers/comet.py.

🔴 pytorch_lightning: Docs
Check ID Status
docs-make (pytorch, doctest) success
docs-make (pytorch, html) failure

These checks are required after the changes to src/lightning/pytorch/loggers/comet.py.

🟢 mypy
Check ID Status
mypy success

These checks are required after the changes to src/lightning/pytorch/loggers/comet.py.

🟡 install
Check ID Status
install-pkg (ubuntu-22.04, fabric, 3.9) no_status
install-pkg (ubuntu-22.04, fabric, 3.11) no_status
install-pkg (ubuntu-22.04, pytorch, 3.9) no_status
install-pkg (ubuntu-22.04, pytorch, 3.11) no_status
install-pkg (ubuntu-22.04, lightning, 3.9) no_status
install-pkg (ubuntu-22.04, lightning, 3.11) no_status
install-pkg (ubuntu-22.04, notset, 3.9) no_status
install-pkg (ubuntu-22.04, notset, 3.11) no_status
install-pkg (macOS-12, fabric, 3.9) no_status
install-pkg (macOS-12, fabric, 3.11) no_status
install-pkg (macOS-12, pytorch, 3.9) no_status
install-pkg (macOS-12, pytorch, 3.11) no_status
install-pkg (macOS-12, lightning, 3.9) no_status
install-pkg (macOS-12, lightning, 3.11) no_status
install-pkg (macOS-12, notset, 3.9) no_status
install-pkg (macOS-12, notset, 3.11) no_status
install-pkg (windows-2022, fabric, 3.9) no_status
install-pkg (windows-2022, fabric, 3.11) no_status
install-pkg (windows-2022, pytorch, 3.9) no_status
install-pkg (windows-2022, pytorch, 3.11) no_status
install-pkg (windows-2022, lightning, 3.9) no_status
install-pkg (windows-2022, lightning, 3.11) no_status
install-pkg (windows-2022, notset, 3.9) no_status
install-pkg (windows-2022, notset, 3.11) no_status

These checks are required after the changes to src/lightning/pytorch/loggers/comet.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

@japdubengsub japdubengsub marked this pull request as draft September 5, 2024 12:19
Member

@Lothiraldan Lothiraldan left a comment

The following example fails with this branch but passes with the latest version of Lightning.

Lightning 2.4.0, experiment: https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/64e6b0df893b435c93f54f1bc48a8958

Output:

CometLogger will be initialized in online mode
COMET INFO: Experiment is live on comet.com https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/64e6b0df893b435c93f54f1bc48a8958

COMET INFO: Couldn't find a Git repository in '/tmp' nor in any parent directory. Set `COMET_GIT_DIRECTORY` if your Git Repository is elsewhere.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------

COMET INFO: Experiment is live on comet.com https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/64e6b0df893b435c93f54f1bc48a8958


  | Name | Type   | Params | Mode 
----------------------------------------
0 | l1   | Linear | 7.9 K  | train
----------------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode
Sanity Checking: |                                                                                       | 0/? [00:00<?, ?it/s]/home/lothiraldan/.virtualenvs/tempenv-60a6200361ab/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
Sanity Checking DataLoader 0: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 38.05it/s]/home/lothiraldan/.virtualenvs/tempenv-60a6200361ab/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/logger_connector/result.py:431: It is recommended to use `self.log('val_loss', ..., sync_dist=True)` when logging on epoch level in distributed setting to accumulate the metric across devices.
/home/lothiraldan/.virtualenvs/tempenv-60a6200361ab/lib/python3.12/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
Epoch 2: 100%|███████████████████████████████████████████████████████████████████| 469/469 [00:31<00:00, 14.73it/s, v_num=8958]`Trainer.fit` stopped: `max_epochs=3` reached.                                                                                 
Epoch 2: 100%|███████████████████████████████████████████████████████████████████| 469/469 [00:31<00:00, 14.73it/s, v_num=8958]
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml ExistingExperiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     name                  : upset_soil_1490
COMET INFO:     url                   : https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/64e6b0df893b435c93f54f1bc48a8958
COMET INFO:   Metrics [count] (min, max):
COMET INFO:     train_loss [28] : (0.4863688051700592, 1.2028049230575562)
COMET INFO:     val_loss [3]    : (0.9357529878616333, 0.9526914358139038)
COMET INFO:   Others:
COMET INFO:     Created from : pytorch-lightning
COMET INFO:   Parameters:
COMET INFO:     layer_size : 784
COMET INFO:   Uploads:
COMET INFO:     model graph : 1
COMET INFO: 
COMET INFO: Please wait for metadata to finish uploading (timeout is 3600 seconds)
COMET INFO: Uploading 1651 metrics, params and output messages
True
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     name                  : upset_soil_1490
COMET INFO:     url                   : https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/64e6b0df893b435c93f54f1bc48a8958
COMET INFO:   Others:
COMET INFO:     Created from : pytorch-lightning
COMET INFO:   Parameters:
COMET INFO:     batch_size : 64
COMET INFO:   Uploads:
COMET INFO:     environment details : 1
COMET INFO:     filename            : 1
COMET INFO:     installed packages  : 1
COMET INFO:     source_code         : 2 (17.51 KB)
COMET INFO: 

This branch, experiment: https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/26baa02c5c7244b4a5dc48a72e84392e

Output:

COMET INFO: Experiment is live on comet.com https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/26baa02c5c7244b4a5dc48a72e84392e

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
COMET INFO: Couldn't find a Git repository in '/tmp' nor in any parent directory. Set `COMET_GIT_DIRECTORY` if your Git Repository is elsewhere.
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 2 processes
----------------------------------------------------------------------------------------------------


  | Name | Type   | Params | Mode 
----------------------------------------
0 | l1   | Linear | 7.9 K  | train
----------------------------------------
7.9 K     Trainable params
0         Non-trainable params
7.9 K     Total params
0.031     Total estimated model params size (MB)
1         Modules in train mode
0         Modules in eval mode
W0906 18:27:21.134000 140399680829248 torch/multiprocessing/spawn.py:146] Terminating process 4052339 via signal SIGTERM
Traceback (most recent call last):
  File "/tmp/Comet_and_Pytorch_Lightning.py", line 86, in <module>
    main()
  File "/tmp/Comet_and_Pytorch_Lightning.py", line 76, in main
    trainer.fit(model, train_loader, eval_loader)
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/trainer/trainer.py", line 538, in fit
    call._call_and_handle_interrupt(
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/trainer/call.py", line 46, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/strategies/launchers/multiprocessing.py", line 144, in launch
    while not process_context.join():
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lothiraldan/.virtualenvs/tempenv-5fbd1040246d4/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 189, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/lothiraldan/.virtualenvs/tempenv-5fbd1040246d4/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 76, in _wrap
    fn(i, *args)
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/strategies/launchers/multiprocessing.py", line 173, in _wrapping_function
    results = function(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/trainer/trainer.py", line 574, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/trainer/trainer.py", line 964, in _run
    _log_hyperparams(self)
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/loggers/utilities.py", line 93, in _log_hyperparams
    logger.log_hyperparams(hparams_initial)
  File "/home/lothiraldan/.virtualenvs/tempenv-5fbd1040246d4/lib/python3.12/site-packages/lightning_utilities/core/rank_zero.py", line 42, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/lothiraldan/project/cometml/pytorch-lightning/src/lightning/pytorch/loggers/comet.py", line 282, in log_hyperparams
    self.experiment.__internal_api__log_parameters__(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute '__internal_api__log_parameters__'

COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     name                  : sleepy_monastery_3541
COMET INFO:     url                   : https://www.comet.com/lothiraldan/comet-example-pytorch-lightning/26baa02c5c7244b4a5dc48a72e84392e
COMET INFO:   Parameters:
COMET INFO:     batch_size : 64
COMET INFO:   Uploads:
COMET INFO:     environment details : 1
COMET INFO:     filename            : 1
COMET INFO:     installed packages  : 1
COMET INFO:     source_code         : 2 (14.93 KB)
COMET INFO: 

Please investigate what is happening
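
For readers following the traceback: it shows `self.experiment` evaluating to `None` inside the spawned rank-0 process. The sketch below is not the code in `comet.py`. It is a hypothetical, self-contained illustration of one way such a state could arise (the live experiment being dropped when the logger object is copied into a worker) together with a lazy re-creation guard. `FakeExperiment`, `SketchLogger`, and the `__getstate__` behavior are all assumptions made for illustration, and `copy.copy` stands in for the pickle round trip because it goes through the same `__getstate__` hook.

```python
import copy


class FakeExperiment:
    """Hypothetical stand-in for a Comet experiment object."""

    def __init__(self):
        self.params = {}

    def log_parameters(self, params):
        self.params.update(params)


class SketchLogger:
    """Hypothetical logger used only for this sketch; not Lightning's CometLogger."""

    def __init__(self):
        self._experiment = FakeExperiment()

    def __getstate__(self):
        # Assumption: the live experiment is not carried over when the
        # logger is serialized for a spawned worker process.
        state = self.__dict__.copy()
        state["_experiment"] = None
        return state

    @property
    def experiment(self):
        # Guard: re-create the experiment lazily instead of returning None.
        if self._experiment is None:
            self._experiment = FakeExperiment()
        return self._experiment

    def log_hyperparams(self, params):
        self.experiment.log_parameters(params)


original = SketchLogger()
worker_copy = copy.copy(original)  # uses __getstate__, like pickling would
worker_copy.log_hyperparams({"batch_size": 64})
```

Without the guard in the `experiment` property, the call on `worker_copy` would raise the same kind of `AttributeError` on `NoneType` as in the traceback above. Whether this is the actual cause in this branch is not established in this thread.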

src/lightning/pytorch/loggers/comet.py Outdated Show resolved Hide resolved
src/lightning/pytorch/loggers/comet.py Outdated Show resolved Hide resolved
src/lightning/pytorch/loggers/comet.py Outdated Show resolved Hide resolved
@japdubengsub
Author

I did some testing with the following Trainer() parameters.

CPU

| Devices | Strategy | Status |
|---|---|---|
| 1 | None | Works |
| 2 | None | Works |
| 1 | ddp_spawn | Works |
| 2 | ddp_spawn | Works |
| 1 | ddp_fork | HANG |
| 2 | ddp_fork | CRASH: torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGSEGV |
| 1 | ddp_notebook | HANG |
| 2 | ddp_notebook | CRASH: torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGSEGV |
| 1 | fsdp | ValueError: The strategy fsdp requires a GPU accelerator, but got: cpu |
| 2 | fsdp | ValueError: The strategy fsdp requires a GPU accelerator, but got: cpu |

GPU

| Devices | Strategy | Status |
|---|---|---|
| 1 | None | Works |
| 2 | None | Works |
| 1 | ddp_spawn | Works |
| 2 | ddp_spawn | Works |
| 1 | ddp_fork | CRASH: torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGSEGV |
| 2 | ddp_fork | CRASH: torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGSEGV |
| 1 | ddp_notebook | CRASH: torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGSEGV |
| 2 | ddp_notebook | CRASH: torch.multiprocessing.spawn.ProcessExitedException: process 1 terminated with signal SIGSEGV |
| 1 | fsdp | Works |
| 2 | fsdp | Works |

MULTI-NODE (two VM nodes, each has one CUDA-device)

| Devices | Nodes | Strategy | Status |
|---|---|---|---|
| 1 | 2 | ddp | Works |

With or without the current PR, everything works the same.
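
The script used for these runs is not shown in this thread. As a rough sketch of how the single-node part of such a matrix could be enumerated, the snippet below builds keyword arguments for `Trainer(...)` from the accelerator, devices, and strategy values listed in the tables. The helper name `trainer_kwargs_matrix` is hypothetical, and treating the "None" strategy rows as "no `strategy` argument passed" is an assumption. The multi-node run is not covered.

```python
from itertools import product

# Values taken from the CPU and GPU tables above.
ACCELERATORS = ("cpu", "gpu")
DEVICES = (1, 2)
STRATEGIES = (None, "ddp_spawn", "ddp_fork", "ddp_notebook", "fsdp")


def trainer_kwargs_matrix():
    """Yield one dict of Trainer keyword arguments per table row (hypothetical helper)."""
    for accelerator, strategy, devices in product(ACCELERATORS, STRATEGIES, DEVICES):
        kwargs = {"accelerator": accelerator, "devices": devices}
        # Assumption: "None" in the tables means the strategy argument was omitted.
        if strategy is not None:
            kwargs["strategy"] = strategy
        yield kwargs


for kwargs in trainer_kwargs_matrix():
    # In a real run each dict would be passed as Trainer(logger=..., **kwargs).
    print(kwargs)
```

Each combination would need to be run in a fresh process, since some of the rows above hang or crash the interpreter.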

@alexkuzmik

@japdubengsub very nice job on testing, Sasha!
@Lothiraldan I also made a few runs, and they worked well. The results are the same as I had with the previous branch.

japdubengsub and others added 7 commits September 11, 2024 17:38
update tutorials to `d5273534`

Co-authored-by: Borda <[email protected]>
…ning-AI#20267)

* build(deps): bump Lightning-AI/utilities from 0.11.6 to 0.11.7

Bumps [Lightning-AI/utilities](https://github.com/lightning-ai/utilities) from 0.11.6 to 0.11.7.
- [Release notes](https://github.com/lightning-ai/utilities/releases)
- [Changelog](https://github.com/Lightning-AI/utilities/blob/main/CHANGELOG.md)
- [Commits](Lightning-AI/utilities@v0.11.6...v0.11.7)

---
updated-dependencies:
- dependency-name: Lightning-AI/utilities
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
…ing-AI#20266)

Bumps [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) from 6 to 7.
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](peter-evans/create-pull-request@v6...v7)

---
updated-dependencies:
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
)

* Update favicon

* Update favicons - all sizes
amorehead and others added 30 commits November 25, 2024 23:40
…on()` to robustly seed NumPy-dependent dataloader workers (Lightning-AI#20369)

* Update seed.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update seed.py

* Update seed.py

* Update seed.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
…Lightning-AI#20440)

* Minimal transformer examples

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add tests for compile after fsdp2/tp

* Add README's

* Add docs

* Rename folder, add cross-reference

* Fix link

* Newline after code-block directive

* Update section name

* Fix reference

* Half standalone tests batch size

* Fix integration tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* make plugin type check more flexible

* Change signature and make the equivalent changes to Fabric connector

---------

Co-authored-by: Jianing Yang <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
…ightning-AI#20476)

Temporarily pin Python at 3.12.7 to avoid jsonargparse issue
update tutorials to `1e0e8073`

Co-authored-by: Borda <[email protected]>
* Patch argparse _parse_known_args

* Add patch to test

* Avoid importing lightning in assistant

* Fix return type
* Remove deprecated distutils

* Fix format

* Fix package name

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…teau (Lightning-AI#20471)

* fix TypeError in configure_optimizers

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Lukas Salchow <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add doc for TBPTT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove url to prevent linting error

* attempt to fix linter

* add tbptt.rst file

* adjust doc:

* nit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make example easily copy and runnable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address comments

* fix doc test warning

* Update docs/source-pytorch/common/tbptt.rst

---------

Co-authored-by: Alan Chu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Alan Chu <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
* allow user to pass kwargs to DeepSpeedStrategy

* Update deepspeed.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update deepspeed.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* make timeout explicit in DeepSpeedStrategy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <[email protected]>
…deprecated (Lightning-AI#20361) (Lightning-AI#20477)

* Update checkpointing documentation to mark resume_from_checkpoint as deprecated

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update docs/source-pytorch/common/checkpointing_basic.rst

Co-authored-by: Luca Antiga <[email protected]>

* Update docs/source-pytorch/common/checkpointing_basic.rst

Co-authored-by: Luca Antiga <[email protected]>

* Address review comments

* Address review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
* Add `convert_module` to FSDP

* Update ChangeLog

* make plugin type check more flexible (Lightning-AI#20186)

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>

* Make plugin type check more flexible (Fabric) (Lightning-AI#20452)

* make plugin type check more flexible

* Change signature and make the equivalent changes to Fabric connector

---------

Co-authored-by: Jianing Yang <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>

* Pin setuptools for gpu builds

* Fix link in doc

---------

Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Jianing Yang <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
…ctions=False. (Lightning-AI#20484)

* any_on_epoch referenced before assignment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <[email protected]>
* Add feature implementation to datamodule for str method
First implementation sketch

* Removed list / tuple case for datamodule str method

* Added test cases for DataModule string function
Added alternative Boring Data Module implementations
Added test cases for all possible options
Added additional check for NotImplementedError in string function of DataModule

* Reverted accidental changes in DataModule

* Updated dataloader str method
Made changes to comply with requested suggestions
Switched from hardcoded \n to more general os.linesep

* Improvements to implementation of str method for datamodule
Corrected the annotation for the internal function and the list that is supposed to store the information on the datasets

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Implementing str method for datamodule
Fixed type annotation issue
Reduced code size by using Sized object from abc library

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add string method to datamodule
Switched from Dataset based implementation to Dataloader based implementation

* Implementing str method for dataloader
Added missing size value to tuple in the error case instead of returning only a string

* Implementing str function for datamodule
Adjusted test to match the new implementation requirements
Added necessary BoringModules for tests
Fixed bugs and annotation issues in the str method

* Implementing str method for datamodule
Refactored code and made it more readable by implementing more abstract function methods
Adjusted tests
Removed debug statements
Removed TODO comments

* Finalized required adjustments for dataloader string proposal method
Renamed variables to more sensible names to increase readability

* Implementing str method
Switched name from dataset to dataloader
Switched name Prediction to Predict
removed available keyword and instead write None if not available
Switched from unknown to NA

* Update src/lightning/pytorch/core/datamodule.py

* Update src/lightning/pytorch/core/datamodule.py

---------

Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Luca Antiga <[email protected]>
* Force hook standalone tests to single device

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…arameters (Lightning-AI#20221)

* Fix LightningCLI failing when both module and data module save hyperparameters due to conflicting internal  parameter

* Update changelog pull link

* Only skip logging internal LightningCLI params

* Only skip logging internal LightningCLI params

* Only skip _class_path

---------

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
…g-AI#20176)

* Add step to TensorBoardLogger.log_hyperparams

The metrics that get logged using this method are always logged to step 0, so no meaningful graph is made

* Change expected step value to None

---------

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
* Update (ci): github action's artifact upgrade due to EOL for versions less than 4

* fix (ci:  linkcheck): ignore 403 status code from habana.ai since it redirects intel.com documentation

* fix with pattern + merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use dist

* Revert "use dist"

This reverts commit ed44fec.

* retention

* ls -lh pypi/

* strategy.job-index

* stupid missing needs: build-packages

* tree pypi

* sudo

* Apply suggestions from code review

---------

Co-authored-by: Jirka B <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>