From 1972616aa64c3513fdf565b50be938a5b2219f5d Mon Sep 17 00:00:00 2001
From: George Ho <19851673+eigenfoo@users.noreply.github.com>
Date: Fri, 28 Jun 2019 13:59:58 +0800
Subject: [PATCH] ENH: onboard lindeloevs feedback

---
 index.html            | 54 +++++++++++++++++-----------------
 tests-as-linear.ipynb | 68 +++++++++++++++++++++----------------------
 2 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/index.html b/index.html
index 9509c82..93e33df 100644
--- a/index.html
+++ b/index.html
@@ -13167,7 +13167,7 @@
It couldn't be much simpler to run these models with statsmodels (smf.ols) or scipy (scipy.stats.pearsonr). They yield identical slopes, p and t values, but there's a catch: smf.ols gives you the slope, and even though that is usually much more interpretable and informative than the correlation coefficient $r$, you may still want $r$. Luckily, the slope becomes $r$ if x and y have a standard deviation of exactly 1. You can achieve this by scaling the data: data /= data.std().
-Notice how scipy.stats.pearsonr and smf.ols (scaled) have the same slopes, $p$ and $t$ values.
+Notice how scipy.stats.pearsonr and smf.ols (scaled) have the same slopes, $p$ and $t$ values. Also note that statistical functions from scipy.stats do not provide confidence intervals, while performing the linear regression with smf.ols does.
correlated = pd.DataFrame()
correlated["x"] = np.linspace(0, 1)
-correlated["y"] = 5 * correlated.x + 2 * np.random.randn(len(correlated.x))
+correlated["y"] = 1.5 * correlated.x + 2 * np.random.randn(len(correlated.x))
scaled = correlated / correlated.std()
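For concreteness, here is a minimal sketch (not part of the patched notebook; it only assumes numpy, pandas, scipy, and statsmodels, and the seed is arbitrary) showing that the Pearson correlation and the regression slope on standardized data coincide:

import numpy as np
import pandas as pd
import scipy.stats
import statsmodels.formula.api as smf

np.random.seed(1618)  # arbitrary seed, only to make the sketch reproducible

# Same construction as the snippet above.
correlated = pd.DataFrame({"x": np.linspace(0, 1)})
correlated["y"] = 1.5 * correlated.x + 2 * np.random.randn(len(correlated.x))
scaled = correlated / correlated.std()  # both columns now have SD = 1

r, p = scipy.stats.pearsonr(correlated.x, correlated.y)  # correlation and p-value
res = smf.ols("y ~ 1 + x", data=scaled).fit()            # regression on scaled data

print(r, res.params["x"])    # the slope equals r
print(p, res.pvalues["x"])   # the p-values agree as well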
@@ -13518,27 +13518,27 @@ 3.0.3 Python code: Pearson corre
(table columns: value, p-value, t-value, CI lower, CI upper)
 scipy.stats.pearsonr
-  0.649620   3.321709e-07
+  0.249694   0.080332
   NaN        NaN            NaN
 smf.ols
-  5.012744   3.321709e-07   5.91995    3.310230   6.715258
+  1.512744   0.080332       1.78652   -0.189770   3.215258
 smf.ols (scaled)
-  0.649620   3.321709e-07   5.91995    0.428985   0.870255
+  0.249694   0.080332       1.78652   -0.031324   0.530712
@@ -13628,19 +13628,19 @@ 3.0.4 Python code: Spearman cor
(same table columns: value, p-value, t-value, CI lower, CI upper)
 scipy.stats.spearmanr
-  0.634958   7.322277e-07
+  0.233421   0.102803
   NaN        NaN            NaN
 smf.ols (ranked)
-  0.634958   7.322277e-07   5.694307   0.410757   0.859159
+  0.233421   0.102803       1.663134  -0.048772   0.515615
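As an illustrative sketch (not from the notebook itself; variable names and the seed are arbitrary), the Spearman row above can be reproduced by running the ordinary regression on rank-transformed data: the ranks of x and the ranks of y have identical standard deviations, so the slope equals Spearman's rho.

import numpy as np
import pandas as pd
import scipy.stats
import statsmodels.formula.api as smf

np.random.seed(1618)  # arbitrary seed
correlated = pd.DataFrame({"x": np.linspace(0, 1)})
correlated["y"] = 1.5 * correlated.x + 2 * np.random.randn(len(correlated.x))

rho, p = scipy.stats.spearmanr(correlated.x, correlated.y)

# Spearman correlation is Pearson correlation on ranks, i.e. an ordinary
# linear regression on rank(x) and rank(y).
ranked = correlated.rank()
res = smf.ols("y ~ 1 + x", data=ranked).fit()

print(rho, res.params["x"])   # slope on ranks equals rho
print(p, res.pvalues["x"])    # p-values agree (cf. the table above)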
@@ -15234,7 +15234,7 @@ 6.2 Two-way ANOVA
6.2.2 Python code: Two-way ANOVA
Note on Python port:
- Unfortunately, scipy.stats does not have any function to perform a two-way ANOVA, so we can't verify that the linear model gives the same results as some other Python statistical function. Nevertheless, we'll go through the motions of performing the linear regression.
+ Unfortunately, scipy.stats does not have a dedicated function to perform two-way ANOVA, so we cannot demonstrate directly that it is fundamentally a linear model. Nevertheless, we will write the code to perform the linear regression.
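Going through those motions might look like the following sketch (the DataFrame df and its columns are hypothetical stand-ins, not the notebook's actual data):

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical two-factor dataset; the notebook's own data will differ.
np.random.seed(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 20),
    "mood": np.tile(np.repeat(["happy", "sad"], 10), 3),
})
df["y"] = np.random.randn(len(df))

# Two-way ANOVA as a linear model: main effects for both factors plus
# their interaction, followed by F-tests on each term.
model = smf.ols("y ~ group * mood", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))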
- Unfortunately, scipy.stats does not have any function to perform ANCOVA, so again, we can't verify that the linear model gives the same results as some other Python statistical function. Nevertheless, we'll go through the motions of performing the linear regression.
+ Unfortunately, scipy.stats does not have a dedicated function to perform ANCOVA, so again, we cannot demonstrate directly that it is fundamentally a linear model. Nevertheless, we will write the code to perform the linear regression.
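A corresponding sketch for ANCOVA, reusing the hypothetical setup above but swapping the second factor for a continuous covariate (the column names are illustrative only):

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: one factor plus one continuous covariate.
np.random.seed(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 20),
    "age": np.random.uniform(20, 60, size=60),
})
df["y"] = np.random.randn(len(df))

# ANCOVA as a linear model: the group effect is assessed while adjusting
# for the continuous covariate.
model = smf.ols("y ~ group + age", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))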
smf.ols does not support GLMs: we need to use sm.GLM. While sm.GLM does not have a patsy-formula interface, we can still use patsy.dmatrices to get the endog and exog design matrices, and then feed that into sm.GLM.
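A sketch of that workflow under the same hypothetical data assumptions as above (the formula and family here are illustrative, not the notebook's actual choices):

import numpy as np
import pandas as pd
import patsy
import statsmodels.api as sm

# Hypothetical count outcome with one factor; the notebook's data will differ.
np.random.seed(0)
df = pd.DataFrame({"group": np.repeat(["a", "b", "c"], 20)})
df["y"] = np.random.poisson(5, size=len(df))

# No formula interface on sm.GLM, so build the design matrices with patsy
# and pass them in as endog (response) and exog (predictors).
endog, exog = patsy.dmatrices("y ~ group", data=df, return_type="dataframe")
result = sm.GLM(endog, exog, family=sm.families.Poisson()).fit()
print(result.summary())

# Note: sm.stats.anova_lm(result) would fail here, since anova_lm only
# accepts ordinary linear-model results; this is the limitation described below.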
- Unfortunately, statsmodels does not currently support performing a one-way ANOVA test on GLMs (the anova_lm function only works for linear models), so while we can perform the GLM, there is no support for computing the F-statistic or its p-value. Nevertheless, we'll go through the motions of performing the generalized linear regression.
+ Unfortunately, statsmodels does not currently support performing a one-way ANOVA test on GLMs (the anova_lm function only works for linear models), so while we can perform the GLM, there is no support for computing the F-statistic or its p-value. Nevertheless, we will write the code to perform the generalized linear regression.
Several named tests are still missing from the list and may be added at a later time. These include the Sign test (which requires a large N to be reasonably approximated by a linear model), Friedman as RM-ANOVA on rank(y), McNemar, and Binomial/Multinomial. See the notes on these in the section on links to further equivalences. If you think that they should be included here, feel free to submit "solutions" to the GitHub repo of this doc!
-Common statistical tests are linear models: Python port by Jonas Kristoffer Lindeløv and George Ho is licensed under a Creative Commons Attribution 4.0 International License.
+Common statistical tests are linear models: Python port by George Ho and Jonas Kristoffer Lindeløv is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://lindeloev.github.io/tests-as-linear/.
Permissions beyond the scope of this license may be available at https://github.com/eigenfoo/tests-as-linear.
diff --git a/tests-as-linear.ipynb b/tests-as-linear.ipynb
index 37d321d..f3c74fa 100644
--- a/tests-as-linear.ipynb
+++ b/tests-as-linear.ipynb
@@ -21,7 +21,7 @@
 {
  "data": {
   "text/markdown": [
-   "Last updated: June 27, 2019"
+   "Last updated: June 28, 2019"
  ],
  "text/plain": [
-    " Unfortunately, scipy.stats does not have any function to perform a two-way ANOVA, so we can't verify that the linear model gives the same results as some other Python statistical function. Nevertheless, we'll go through the motions of performing the linear regression.\n",
+    " Unfortunately, scipy.stats does not have a dedicated function to perform two-way ANOVA, so we cannot demonstrate directly that it is fundamentally a linear model. Nevertheless, we will write the code to perform the linear regression.\n",
"scipy.stats
does not have any function to perform ANCOVA, so again, we can't verify that the linear model gives the same results as some other Python statistical function. Nevertheless, we'll go through the motions of performing the linear regression.\n",
+ " Unfortunately, scipy.stats
does not have a dedicated function to perform ANCOVA, so again, we cannot demonstrate directly that it is fundamentally a linear model. Nevertheless, we will write the code to perform the linear regression.\n",
"statsmodels
does not currently support performing a one-way ANOVA test on GLMs (the anova_lm
function only works for linear models), so while we can perform the GLM, there is no support for computing the F-statistic or its p-value. Nevertheless, we'll go through the motions of performing the generalized linear regression.\n",
+ " Unfortunately, statsmodels
does not currently support performing a one-way ANOVA test on GLMs (the anova_lm
function only works for linear models), so while we can perform the GLM, there is no support for computing the F-statistic or its p-value. Nevertheless, we will write the code to perform the generalized linear regression.\n",
"