Comments Ludwig Tools Deploy_V1 #15

paulocilasjr · 2024-11-05T00:25:58Z

Comments from: natefoo

Explicitly overriding $TMP* like this is probably not a good idea: Galaxy defaults to tmp space in the job directory, and admins often override this as needed for particular destinations or tools.
I see Ludwig has this option to disable multithreading, which is exposed in the tool as an option for reproducibility purposes. But if Ludwig has an option for controlling the number of threads, I don't see it. What may happen is it gets scheduled on a node with 64 cores but is only allocated a fraction of those, assumes it can use them all, and blows up. If there is any way to pass in the number of cores, that would be great, otherwise we might have to get creative in scheduling.
Minor, but ${dataset.element_identifier} is typically the dataset name I believe? So this can result in some weirdness when creating those symlinks, but should be safe at least since they are quoted.
Also minor, but a lot of those pwd calls can probably just be replaced by ., unless Ludwig changes the cwd internally.
This might fail under Pulsar. I am not sure if there is a "preferred" way of looking at tool stdout like this, but the IUC channel probably has an answer.
Should this be a yaml.safe_dump()?
Quoting construction in ludwig_visualize.yml is a bit creative but I think ok, but if an IUC person has a look at that as well that would be great.
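
On the core-count point above, here is a minimal sketch of one way a wrapper script could pick up the allocation. It assumes the job exposes its slot count through the GALAXY_SLOTS environment variable and that capping PyTorch's intra-op pool with torch.set_num_threads() before Ludwig starts is enough. Whether Ludwig resets that value internally is not verified here. The function names resolve_thread_count and apply_thread_limit are hypothetical.

```python
import os


def resolve_thread_count(default=1):
    """Return the number of threads the job may use.

    Reads GALAXY_SLOTS (assumed to hold the number of cores allocated
    to the job) and falls back to `default` if it is unset or invalid.
    """
    raw = os.environ.get("GALAXY_SLOTS", "")
    try:
        slots = int(raw)
    except ValueError:
        return default
    return slots if slots > 0 else default


def apply_thread_limit(n_threads):
    """Cap PyTorch's intra-op thread pool, if PyTorch is importable.

    Returns True when the limit was applied, False when torch is missing.
    """
    try:
        import torch
    except ImportError:
        return False
    torch.set_num_threads(n_threads)
    return True
```

Calling apply_thread_limit(resolve_thread_count()) at the top of the wrapper would keep a job that was given 4 slots on a 64-core node from assuming it owns all 64.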

Comments from: bgruening

The tests could also use some asserts, as the sim_size comparison is not very strict.
The format="auto" on outputs should be avoided if possible. It will disable certain features in workflows.
Is there any reason you need to use extra_files_path?

Comments from: bernt-matthias

Use . instead of pwd
element_identifier needs to be sanitized (see here for an example)
is ludwig_model a proper Galaxy datatype?
if you unzip here

Galaxy-Ludwig/tools/ludwig_evaluate.xml

Line 16 in 00f7da5

unzip -o -q '$raw_data' -d ./;

you do not have any control over the output path name - I think this should be controlled.
Seems that the tool uses multithreading but does not allow any control over it. This will lead to problems.
It would be great to add min and max to numeric parameters
typo: 'randonness'
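
For the element_identifier point above, a minimal sketch of the kind of sanitizing meant, written in plain Python rather than in the tool's command template. The allowed character set and the helper name sanitize_identifier are assumptions, not taken from the linked example.

```python
import re


def sanitize_identifier(name, fallback="dataset"):
    """Make a dataset name safe to use as a symlink or file name.

    Every character other than ASCII letters, digits, '_', '-' and '.'
    is replaced with '_'. Leading dots are stripped so the result can
    never be '.', '..' or a hidden file.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    cleaned = cleaned.lstrip(".")
    return cleaned or fallback
```

With this rule, a name such as "my data (1).csv" becomes "my_data__1_.csv", and an identifier that is empty or consists only of dots falls back to "dataset".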


paulocilasjr · 2024-11-07T01:29:59Z

Removed the explicit $TMP* override from the code.
All pwd calls were replaced by .
${dataset.element_identifier} is now sanitized.
Regarding the multithreading: Ludwig uses PyTorch under the hood. torch.get_num_threads() and torch.get_num_interop_threads() report the number of threads PyTorch uses for intra-op and inter-op CPU parallelism, which by default is derived from the CPU cores available. The Ludwig parameter disable_parallel_threads controls whether multithreading is used in PyTorch (set_multithreading_enabled).
There are some alternatives for passing the number of cores, if we find it necessary.
Regarding the tool_stdout, I searched through the IUC and found some discussions and PRs, but nothing that directly answered the question. Additionally, while looking at the Galaxy codebase, I came across this test, which uses the same path. Could this indicate that it is okay to use it?
yaml.safe_dump() incorporated
format="auto" was changed in Ludwig_visualize.xml.
Regarding the extra_files_path, I couldn't find an explanation for it.
ludwig_model was replaced with the html format.
Regarding the Unzip control: I'm not sure how to address this properly within Galaxy, considering that the tabular data includes the file_path as a column, which could potentially cause issues.
Numeric parameters now have min and max in place.
Typo fixed.
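
On the unzip question above, one possible direction is to extract into a directory the tool chooses and reject archive members that would land outside it. The sketch below uses only the standard library. The helper name extract_into is hypothetical, and it assumes the file_path column holds paths relative to the archive root, so they would still resolve once the table is read relative to the chosen directory.

```python
import os
import zipfile


def extract_into(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the member names.

    Raises ValueError before extracting anything if a member's path
    (absolute, or containing '..') would resolve outside dest_dir.
    """
    dest_root = os.path.realpath(dest_dir)
    os.makedirs(dest_root, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.filename}")
        archive.extractall(dest_root)
        return archive.namelist()
```

The design choice here is to fail loudly on a suspicious member instead of silently rewriting its path, so a mismatch between the table and the extracted files shows up as a tool error.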

qchiujunhao · 2024-12-31T20:36:10Z

Added assertions to the tests and replaced all unnecessary extra_files with collections using discover_datasets. However, since the Ludwig model is designed to include extra_files, we have retained them for its output.

@paulocilasjr Could you add your changes to release_v0.10 via a PR? Thank you!
