Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New example glm ordinal features #717

Merged
merged 25 commits into from
Dec 14, 2024

Conversation

jonsedar
Copy link
Contributor

@jonsedar jonsedar commented Oct 27, 2024

New example notebook: GLM-ordinal-features

The current example notebook for ordinal regression
GLM-ordinal-regression.ipynb
shows how to handle ordinals in the target (endogenous) feature, but not in the predictor (exogenous) feature(s).

I wanted to treat an ordinal feature in a model myself and dug up an old example by Austin Rochford, itself based on a 2018 paper by Burkner & Charpenitier. The example was good, but I thought I'd take the opportunity
to build out a full workflow, add more explanations and include in pymc-examples.

I've a few other related ones in the pipeline, but starting simple.


📚 Documentation preview 📚: https://pymc-examples--717.org.readthedocs.build/en/717/

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@jonsedar
Copy link
Contributor Author

@ maintainer team

Hello! It's been years since I submitted an examples notebook. This is the first and simplest of a handful of new notebooks that I'd like to add, and am starting simple and working upwards.

I've tried to add all the various formatting bits and pieces that seem to make the current standard, so I hope any changes required will be minor, but I'll happily edit as needed!

I've tested this running locally and in Google Collab

@jonsedar jonsedar marked this pull request as draft October 27, 2024 11:30
@jonsedar
Copy link
Contributor Author

jonsedar commented Oct 27, 2024

WIP need to create a simple env with jupyter-black in pre-commit so we have the same validation. My env uses more and different validation (!), will return to this in a couple of hours after I get back from something

@jonsedar jonsedar marked this pull request as ready for review October 27, 2024 17:56
@jonsedar
Copy link
Contributor Author

Alrighty - ready for review!

Copy link

review-notebook-app bot commented Oct 28, 2024

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2024-10-28T03:00:29Z
----------------------------------------------------------------

Probably better as paragraphs rather than bullet points (except where there are actual lists). For consistency with other notebooks, as well as readability.

Key issue with ordinal variables is that the distance between categories are unknown and potentially unequal (e.g. the difference between better and good vs that between good and very good).


jonsedar commented on 2024-10-28T05:30:51Z
----------------------------------------------------------------

Thanks - I've updated to be more clear :)

Copy link

review-notebook-app bot commented Oct 28, 2024

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2024-10-28T03:00:30Z
----------------------------------------------------------------

Assume the target_accept comment is not necessary since the model runs fine with 0.8?


jonsedar commented on 2024-10-28T04:57:44Z
----------------------------------------------------------------

Yep fair enough - I was drawing attention to it for when I change it in Model B, but we can just have the implict default

Copy link

review-notebook-app bot commented Oct 28, 2024

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2024-10-28T03:00:30Z
----------------------------------------------------------------

🔥


jonsedar commented on 2024-10-28T04:58:21Z
----------------------------------------------------------------

I couldn't resist ;)

Copy link

review-notebook-app bot commented Oct 28, 2024

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2024-10-28T03:00:31Z
----------------------------------------------------------------

A few typos here.


jonsedar commented on 2024-10-28T05:00:22Z
----------------------------------------------------------------

My tpying skills aren't what they were!

Copy link

review-notebook-app bot commented Oct 28, 2024

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2024-10-28T03:00:32Z
----------------------------------------------------------------

Cells above can be removed, no?


jonsedar commented on 2024-10-28T07:57:47Z
----------------------------------------------------------------

Stylistic thing.. I quite like to preserve the workflow if possible to explicitly show that data prep steps have been considered but not needed. I've added a note above in section 0 (NOTE some boilerplate steps are included but ~~struck through~~ with and explanatory comment

e.g. "Not needed in this simple example". This is to preserve the logical workflow which is

more generally useful)

Happy to remove completely if you prefer

Copy link

review-notebook-app bot commented Oct 28, 2024

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2024-10-28T03:00:33Z
----------------------------------------------------------------

A bit more exposition on the encoding of categoricals may be helpful for those unfamiliar.


jonsedar commented on 2024-10-28T07:54:04Z
----------------------------------------------------------------

Sure, have added more description and dicussion of the data

Copy link

review-notebook-app bot commented Oct 28, 2024

View / edit / reply to this conversation on ReviewNB

fonnesbeck commented on 2024-10-28T03:00:33Z
----------------------------------------------------------------

A bit more exposition about the target variable would be helpful, either here or in the introduction. I'm not really sure what I'm looking at.


jonsedar commented on 2024-10-28T07:53:55Z
----------------------------------------------------------------

Sure, have added more description and dicussion of the data

Copy link
Contributor Author

I couldn't resist ;)


View entire conversation on ReviewNB

Copy link
Contributor Author

My tpying skills aren't what they were!


View entire conversation on ReviewNB

Copy link
Contributor Author

Well caught!


View entire conversation on ReviewNB

Copy link
Contributor Author

Thanks - I've updated to be more clear :)


View entire conversation on ReviewNB

Copy link
Contributor Author

jonsedar commented Oct 28, 2024

Thanks Chris - much appreciated! Nice to have all the software admin tools surrounding the process too - makes life much better.

Your request to have more detail on the data encouraged me to properly read that Burkner paper and see that the ordinal scales are supposed to be [0-4] on both features (d450, d455), but on d450 we only observe values [0-3]. So there's a missing data problem too, which makes this example more rich and reinforces the need for ordinal handling (not numeric).

That paper actually doesn't handle the missing value, so I'll take the opportunity to improve upon it in our notebook here

Copy link
Contributor Author

Sure, added a title with a little explantion to encourage folk to dig more


View entire conversation on ReviewNB

Copy link
Contributor Author

Indeed! I've had this boilerplate for a while and can't remember why I was using it over and above factorize :D

I've changed it accordingly and suffered no impact here so maybe I'll have to change my boilerplate too


View entire conversation on ReviewNB

Copy link
Contributor Author

Poor naming on my part, I've tried to clarify


View entire conversation on ReviewNB

Copy link
Contributor Author

Sure, have added more description and dicussion of the data


View entire conversation on ReviewNB

Copy link
Contributor Author

Sure, have added more description and dicussion of the data


View entire conversation on ReviewNB

Copy link
Contributor Author

Stylistic thing.. I quite like to preserve the workflow if possible to explicitly show that data prep steps have been considered but not needed. I've added a note above in section 0 (NOTE some boilerplate steps are included but ~~struck through~~ with and explanatory comment

e.g. "Not needed in this simple example". This is to preserve the logical workflow which is

more generally useful)

Happy to remove completely if you prefer


View entire conversation on ReviewNB

@jonsedar
Copy link
Contributor Author

Huh? Cell 11 does come after 4... why is that a problem?

Check cells were executed sequentially........................................................Failed
- hook id: check-execution-order
- exit code: 1

Cell 11 comes after 4 in file 'examples/generalized_linear_models/GLM-ordinal-features.ipynb'
Cell 15 comes after 11 in file 'examples/generalized_linear_models/GLM-ordinal-features.ipynb'
Cell 20 comes after 16 in file 'examples/generalized_linear_models/GLM-ordinal-features.ipynb'
Cell 30 comes after 21 in file 'examples/generalized_linear_models/GLM-ordinal-features.ipynb'
Cell 35 comes after 30 in file 'examples/generalized_linear_models/GLM-ordinal-features.ipynb'

@jonsedar
Copy link
Contributor Author

Aha, the checks are defeated! Over to you @fonnesbeck :)

@jonsedar
Copy link
Contributor Author

This is ready to go, how do we get this merged? @fonnesbeck ? Cheers :)

Copy link
Member

@fonnesbeck fonnesbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fonnesbeck fonnesbeck merged commit a3f03a0 into pymc-devs:main Dec 14, 2024
2 checks passed
@fonnesbeck
Copy link
Member

Sorry this took so long to get back to!

@jonsedar
Copy link
Contributor Author

No worries - thanks for the merge!

@jonsedar jonsedar deleted the new-example-glm-ordinal-features branch December 16, 2024 06:19
fonnesbeck pushed a commit to fonnesbeck/pymc-examples that referenced this pull request Dec 20, 2024
* + added new notebook GLM-ordinal-features.ipynb
+ already complete and created in another env

* Created using Colab

* minor updates for latest seaborn

* Tweaks for collab and included authors

* added header

* + added new reference article and online
+ cited inside Notebook
+ adjusted errata

* + ran black again...

* + rep Burkner as B{\"u}rkner

* + maybe fixed readthedocs complaint pybtex.database.InvalidNameString: Too many commas in 'B{\\"u}rkner, P., & Charpentier, E.'

* + ran black-jupyter, let's see

* + another run of black-jupyter

* + installed local pymc_examples env
+ ran pre-commit in full
+ autocreated *.myst file

* + update tags

* + possibly persuaded the precommits to all pass

* + rerun on colab to confirm all good post new pre-commit process

* + okay, reran in colab again... lets see if this passes

* + added (again) the myst.md

* + minor updates post review
+ new work to positively include a coeff in mdlb for d450 = c4

* + reran precommit and readded myst.md

* + rerun localyl e2e

* + added myst.md again

* + reran again to ensure cell execution order even for markdown cells

* + reran again again to ensure order

* + minor update: forced addtional level c4 into d450 categorical feature
+ fixed a couple of typos

* + changed rating to intermediate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants