Update README.md #785

Open. Wants to merge 1 commit into base: master.
22 changes: 22 additions & 0 deletions README.md
@@ -29,6 +29,7 @@ If you've finished a project with PySR, please submit a PR to showcase your work
- [Quickstart](#quickstart)
- [→ Documentation](https://ai.damtp.cam.ac.uk/PySR)
- [Contributors](#contributors-)
- [Tips for setting hyperparameters](#tips)

<div align="center">

@@ -432,3 +433,24 @@ If you have an idea for a new feature, don't hesitate to share it on the [issues
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->


### Tips

1. When running PySR with Julia `1.11`, the process seems to hit a memory leak that exhausts the available memory and halts execution. Pinning Julia to version `1.10` can help avoid this problem:
Owner

This memory leak is a Julia bug and will be fixed once 1.11.3 is released (as well as 1.10.8 hopefully) – JuliaLang/julia#56801. So we probably don't need to provide this guidance as it will only be temporary until the new Julia is out?

Author

then maybe put as a highlighted issue (same as the "have you used PySR in your paper") while this is not fixed

Owner

Good idea!


```python
import juliapkg
juliapkg.require_julia("~1.10")
```

2. Another memory issue can occur when the `maxsize` parameter is set to a large value **(Miles: is there any explanation for that?)**. Since PySR is mainly used to discover scientific equations, this value should be kept small anyway. Anything beyond $50$ seems to cause a significant slowdown and high memory usage.
Owner

> Anything beyond $50$ seems to cause a significant slowdown and high memory usage.

I think we can turn this off now. It was basically only there because some beginners were running with like 10,000 maxsize, so I wanted it to warn them 😄

Author

$10,000$ 😶 lol, I wonder why

3. Using a single population (`populations=1`) often makes the algorithm unstable, with high variance in the results. A reasonable starting value for `populations` is $10$.
4. It can be good practice to set `optimizer_nrestarts` to a value larger than $1$, depending on the computational budget. The error surface of a nonlinear regression model is generally multimodal, so several restarts may be needed to assess the quality of an equation.
5. The default model selection may not work well in many situations. It may be worth implementing your own, or using the following code to predict with the most accurate equation on the Pareto front:
Owner

> The default model selection may not work well in many situations. It may be worth implementing your own, or using the following code to predict with the most accurate equation on the Pareto front:

You can do model.model_selection = "accuracy" for this btw

Author

it didn't work for me I don't know why. But I can rerun some experiments to see if it was related to using a single population instead of multiple islands. I'll let you know

Author

ok, just tested here again. The instability issue was due to using populations=1 , the model_selection = "accuracy" works.


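```python
from pysr import PySRRegressor

reg = PySRRegressor(...)
reg.fit(X_train, y_train)  # equations_ is only populated after fitting
ix = reg.equations_.loss.argmin()  # position of the lowest-loss equation
y_hat = reg.predict(X_train, index=ix)
```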
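
As a rough sketch, the tips on `populations`, `optimizer_nrestarts`, `maxsize`, and model selection could be gathered into one set of keyword arguments. It uses `model_selection="accuracy"`, which the review thread confirms works once more than one population is used. Only `populations=10` comes from the tips. The `optimizer_nrestarts` and `maxsize` values are illustrative assumptions, and `TIP_SETTINGS` and `build_regressor` are hypothetical names, not part of PySR.

```python
# Illustrative values only: populations=10 comes from tip 3; the
# optimizer_nrestarts and maxsize numbers are assumptions, not
# recommendations from the PySR maintainers.
TIP_SETTINGS = {
    "populations": 10,              # avoid the instability seen with populations=1
    "optimizer_nrestarts": 3,       # more than one restart, budget permitting
    "maxsize": 30,                  # keep equations small (well under 50)
    "model_selection": "accuracy",  # predict with the lowest-loss equation
}


def build_regressor(**overrides):
    """Return an unfitted PySRRegressor configured with TIP_SETTINGS.

    Hypothetical helper, not part of PySR. The import is deferred so the
    settings can be inspected without PySR or Julia installed.
    """
    from pysr import PySRRegressor

    # Keyword overrides take precedence over the defaults above.
    return PySRRegressor(**{**TIP_SETTINGS, **overrides})
```

With these settings, calling `predict` on the fitted regressor without an explicit `index` should select the most accurate equation, so the manual `argmin` lookup becomes optional.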