Request for feedback #22

Closed
eigenfoo opened this issue Jun 27, 2019 · 13 comments
@eigenfoo
Owner

eigenfoo commented Jun 27, 2019

Hello @lindeloev!

Following up from lindeloev/tests-as-linear#16, the port is ready for you to take a look - any feedback you have would be fantastic! You can view the page here: https://eigenfoo.xyz/tests-as-linear/.

Some specific notes for you:

  1. Your cheatsheets are licensed under CC-BY 4.0. Can I assume that the entire R post is similarly licensed? I've gone ahead and licensed this project the same way, citing your post as the source work.

  2. I've learnt that the Python statistics ecosystem really doesn't compare to R! scipy and statsmodels don't support:
    a. Two-way ANOVA
    b. ANCOVA
    c. One-way ANOVAs on GLMs
    I've just left these code snippets un-ported, with appropriate warning boxes.

  3. I haven't ported your simulations/appendices - hopefully somebody else will pick up "Port appendices/simulations to Python" (#14).

  4. I'm not sure about how to implement Welch's t-test with a linear model: you don't go into details about how to do that in your original post. Do you know of anywhere I can read up on that?

@lindeloev

lindeloev commented Jun 27, 2019

Awesome!

  1. Yes, all good re license!

  2. That's also where I got stuck. Since ANOVA, ANCOVA, etc. are linear models, statsmodels does support them (ols combined with anova_lm) - it just doesn't have dedicated built-in functions for them. Actually, R's car::Anova also just calls lm in the background and then does some post-processing like anova_lm, so it's not all that different. pyvttbl.DataFrame.anova may be one alternative, but I don't know how popular it is.

In any case, I would consider re-phrasing what you write in the yellow boxes to something like this:

"Note on Python port: Unfortunately, scipy.stats does not have a dedicated function to perform XXXX, so we cannot demonstrate directly that it is fundamentally a linear model."

In general, I'm still really surprised that Python's t-tests etc. do not provide confidence intervals. Could be worth some MORE yellow boxes but you decide :-)

  3. No problem. I made the simulations to study equivalences. R was just a means to that end. I don't expect people to look at it to learn how to do these simulations.

  4. Ugh, I fail to find (via Google) Python packages that model independent variances (not just correlated ones). Here is the reasoning behind the nlme::gls approach and an lme4::lmer equivalent in R: https://stats.stackexchange.com/questions/142685/equivalent-to-welchs-t-test-in-gls-framework
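To illustrate the `ols` combined with `anova_lm` route from point 2, here is a minimal sketch on simulated data. The column names (`group`, `mood`, `age`, `y`) and the effect sizes are hypothetical, and this is only one way to do it in statsmodels, not necessarily how the port does it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated data: 3 groups x 2 moods, 20 observations per cell (hypothetical design).
rng = np.random.default_rng(1)
n = 20
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 2 * n),
    "mood": np.tile(np.repeat(["happy", "sad"], n), 3),
})
df["age"] = rng.uniform(20, 60, len(df))
df["y"] = (
    rng.normal(0, 1, len(df))
    + (df["group"] == "b") * 0.5
    + (df["mood"] == "sad") * 1.0
    + 0.02 * df["age"]
)

# Two-way ANOVA: a linear model with two categorical predictors and their interaction.
two_way = smf.ols("y ~ 1 + C(group) * C(mood)", data=df).fit()
two_way_table = anova_lm(two_way, typ=2)

# ANCOVA: a categorical predictor plus a continuous covariate.
ancova = smf.ols("y ~ 1 + C(group) + age", data=df).fit()
ancova_table = anova_lm(ancova, typ=2)

print(two_way_table)
print(ancova_table)
```

Each table has one row per model term plus a `Residual` row, with the F statistic and p-value per term, which is roughly the post-processing that `car::Anova` does on top of `lm`.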

Other ideas for the Python notebook:

  • I deliberately simulated data which created mid-sized values for ease of comparison. p = 3.321709e-07 and p = 9.83425e-08 look very different but are not. Easier to compare p = 0.43 vs. p = 0.45.
  • I wonder whether it would be easier to leave out sections 8-10 and simply link to the R tutorial? These sections are not Python-specific AFAICS. (Edit: hmm, it does link to the appropriate sections in the Python version. What do you think?)
  • I think we should make the cheat sheets differ somewhat. Maybe include a clear R-logo and a Python-Logo somewhere in the two cheat sheets? Right now, I struggle to find an aesthetic solution - mostly because the R logo is so ugly! :-)

Ideas for the cheat sheet:

  • Vertical text to the left,
  • Fixing icons on the right,
  • change lm to a python solution in ANOVA + Kruskal-Wallis row(s).
  • Extreme petitesse: Maybe the titles could be "Built-in function in scipy.stats" and "Equivalent linear model in smf.ols"?

@eigenfoo
Owner Author

eigenfoo commented Jun 28, 2019

Thanks for the feedback @lindeloev!

pyvttbl.DataFrame.anova may be one alternative, but I don't know how popular it is.

I also came across that solution, but I'd prefer not to use it. The last release of pyvttbl was back in 2013 and it doesn't look like it's actively maintained.

I guess the best thing to do would just be to leave notes in the yellow boxes.

In any case, I would consider re-phrasing what you write in the yellow boxes to something like this:

Done!

In general, I'm still really surprised that Python's t-tests etc. do not provide confidence intervals. Could be worth some MORE yellow boxes but you decide :-)

Probably not! I've written a short note describing the lack of CIs (it's when I first actually use a scipy.stats function), but I don't think it's worth putting it in its own yellow box 😄
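As a rough illustration of the CI point, the equivalent linear model does report confidence intervals even where the built-in test result is used only for its statistic and p-value. This is a sketch on simulated data for the one-sample t-test; the column name `y` and the simulated values are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.stats
import statsmodels.formula.api as smf

# Simulated sample (hypothetical values).
rng = np.random.default_rng(2)
df = pd.DataFrame({"y": rng.normal(0.3, 1.0, size=40)})

# Built-in one-sample t-test against a mean of 0.
ttest = scipy.stats.ttest_1samp(df["y"], 0.0)

# Equivalent linear model: intercept only. The intercept is the sample mean.
lm = smf.ols("y ~ 1", data=df).fit()

# 95% confidence interval for the mean, taken from the linear model.
ci_low, ci_high = lm.conf_int(alpha=0.05).loc["Intercept"]

print(ttest.statistic, ttest.pvalue)
print(lm.tvalues["Intercept"], lm.pvalues["Intercept"], (ci_low, ci_high))
```

The t statistic and p-value agree between the two approaches; the interval comes from the model fit.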

Ugh, I fail to find (via Google) Python packages that model independent variances (not just correlated ones). Here is the reasoning behind the nlme::gls approach and an lme4::lmer equivalent in R:

I'm not sure how to achieve this in statsmodels, but I've added this as a comment to the code.
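One possible approximation in statsmodels, offered as a sketch rather than as the method used in either notebook: fit a weighted least squares model in which each observation is weighted by the inverse of its own group's sample variance. With two groups this reproduces Welch's t statistic, but the model uses n - 2 residual degrees of freedom rather than the Welch-Satterthwaite correction, so the p-value differs slightly. The data and column names below are hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.stats
import statsmodels.formula.api as smf

# Two simulated groups with clearly unequal variances (hypothetical values).
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.5, 3.0, size=50)

df = pd.DataFrame({
    "y": np.concatenate([a, b]),
    "group": np.repeat([0, 1], [len(a), len(b)]),
})

# Weight each observation by 1 / (sample variance of its own group).
group_var = df.groupby("group")["y"].transform("var")
wls = smf.wls("y ~ 1 + group", data=df, weights=1.0 / group_var).fit()

# Built-in Welch's t-test; argument order (b, a) matches the sign of the slope.
welch = scipy.stats.ttest_ind(b, a, equal_var=False)

wls_t = wls.tvalues["group"]
welch_t = welch.statistic
print(wls_t, welch_t)            # same t statistic
print(wls.pvalues["group"], welch.pvalue)  # close, but not identical (different df)
```

This mirrors the idea behind the `nlme::gls` approach of giving each group its own variance, without claiming to be an exact port of it.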

I wonder whether it would be easier to leave out sections 8-10 and simply link to the R tutorial? These sections are not Python-specific AFAICS. (Edit: hmm, it does link to the appropriate sections in the Python version. What do you think?)

I think it's better to leave them in. As you say, it's not language-specific content, so there's no reason to keep them separate: it would be a pain to have to click around two web pages to read those sections!

Ideas for the cheat sheet:

I admit that I skimped on the effort for the cheatsheet! I'm not a big fan of cheatsheets, but it could definitely be better and prettier. I'll look into writing it up better in LaTeX.

@eigenfoo
Owner Author

eigenfoo commented Jun 28, 2019

Cheatsheets fixed (b962693)! I quickly gave up on LaTeX - I forget what a nightmare it is to write in LaTeX.

I'm not sure if adding the Python logo is wise - it might clutter the sheet even more, and it's already jam-packed with information! I think it should be obvious which cheatsheet it is, depending on how the viewer finds it (i.e. either through your blog or mine).

I think this might be ready to release and publicize - what do you think @lindeloev?

@lindeloev

This all sounds reasonable and the updated cheat sheet looks great!

My major worry re the cheat sheet was that when your Python version (hopefully!) goes viral, people would think the "N/A"s mean "not possible in theory" when it is just a technical limitation of the Python modules right now. How about either:

  1. Making these N/A links which point to https://lindeloev.github.io/tests-as-linear
  2. Same as 1, but writing "N/A in Python, but see R version"

In addition, I just added links to your Python version in the R cheat sheet:

image

Maybe you could do the same below the title in "your" cheat sheet, pointing to the R version? (Also added links to the Python version in my Notebook).

With this, I think it's ready for prime time!

@eigenfoo
Owner Author

eigenfoo commented Jun 28, 2019

Ah, valid concerns! I suppose I don't spend enough time thinking about how things could be misconstrued. I've fixed up the "N/A" comments on the cheatsheet, and also linked back to the original R version. I've also released this as v1.0.0.

Would you do the honors of publicizing it? You're the original author, after all! 🚀

EDIT: my Twitter handle is @_eigenfoo, if you'd prefer to tweet it out.

@lindeloev

I'd be very happy to tweet it with all sorts of praise for your work here! Will do it this afternoon. https://eigenfoo.xyz/tests-as-linear/ does not seem to include the updated cheat sheet, though, so I'll hold off until then (if this is the link to be shared?).

@lindeloev

Only if you feel like it is worth the time: Consider making a Twitter Card to show the cheat sheet: https://cards-dev.twitter.com/validator. This is the HTML I used to do so: https://github.com/lindeloev/tests-as-linear/blob/master/include/twitter_card.html (change it to your Twitter handle too). I haven't played with HTML in ipython notebooks or the export.

@eigenfoo
Owner Author

Master Twitter user! I'll do this once I get back to a keyboard. I'll let you know!

@eigenfoo
Owner Author

does not seem to include the updated cheat sheet

Hmm, it seems to be working for me. Could you refresh the link again?

@eigenfoo
Owner Author

Twitter card created! I just embedded the HTML tags directly into the index.html. I'm realizing that I should invest some time into some basic web design skills... serving a single HTML file makes me feel bad.

Nevertheless, I think everything is ready for prime time!

(if this is the link to be shared?)

Yes! https://eigenfoo.xyz/tests-as-linear/ would be perfect.

@lindeloev

It's live! https://twitter.com/jonaslindeloev/status/1144587998291464195

@lindeloev

It took me a few hours to find not-too-ugly ways to do share buttons, twitter/facebook/linkedin cards, etc. From now on, I'll just copy-paste the header of the notebook you ported :-)

But it's worth it because people use it quite a lot and it's fun to get Twitter mentions so that you can follow the spread of your work. You should definitely put some share buttons on your blog/website :-)

@eigenfoo
Owner Author

I'll try to find some time to take a look at that! In any case, thanks so much for your time! Perhaps we'll bump into each other again sometime.
