Large redesign to add flexibility and user defaults and fewer dependencies #139

junder873 · 2023-07-31T01:58:13Z

This is more or less a complete rewrite of the package to create more flexibility, allow the user to easily set defaults, and move toward the use of extensions with Julia 1.9. I did not originally set out to rewrite everything, but as I worked on a few additions it became easier to do so. While this is not 100% complete (a few details still need fixing, the documentation needs updating, and I need to make sure everything is tested), I wanted to post it to get some feedback.

In general, this PR does four main things:

More flexibility.
a. There were several things I wanted to do with regression tables that are currently difficult or impossible. The biggest is that I wanted to be able to add an extra line (via extralines) whose value sits between two columns, which is useful if you need to add a statistic that compares two values, e.g.:

rr1 = reg(df, @formula(Sales ~ NDI + Price + fe(State) + fe(Year)), Vcov.cluster(:State))
rr2 = reg(df, @formula(Sales ~ NDI + Price + fe(State) + fe(Decade)), Vcov.cluster(:State))
regtable(
    rr1, rr2;
    align=:c,
    extralines=[["New line", 0.55 => 2:3]]
)
# ------------------------------------------
#                               Sales
#                        -------------------
#                           (1)        (2)
# ------------------------------------------
# NDI                     -0.005**    -0.001
#                         (0.003)    (0.002)
# Price                  -0.823***   -0.273*
#                         (0.190)    (0.157)
# ------------------------------------------
# State Fixed Effects       Yes        Yes
# Year Fixed Effects        Yes
# Decade Fixed Effects                 Yes
# ------------------------------------------
# N                        1,380      1,380
# R2                       0.846      0.796
# Within-R2                0.227      0.148
# New line                      0.550
# ------------------------------------------

There is some clunkiness to this solution. If the user does not use the :c align, then the user can pass a DataRow, which has its own settable alignment for each cell.
b. Another feature is adding statistics. Currently, the user is limited to those provided ($R^2$, Adj. $R^2$, etc.) or to manually adding extralines. I expanded the set available (e.g., Pseudo $R^2$), and it is now possible for the user to define any new statistic and use it just as if it were built into the package.
c. A long-standing todo in this package is to enable custom block ordering (such as stats in front of fixed effects). This is now possible.
d. The underlying type in this update is a vector of vectors. This means that if the user needs to combine tables in a somewhat unusual way to fit their needs, it should be more feasible (I want to do more with this).
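To illustrate the spanning syntax in (a), here is a minimal, self-contained sketch of how a row such as ["New line", 0.55 => 2:3] could be laid out. The helpers center and render_row and the widths vector are hypothetical and are not the package's implementation; the sketch only assumes that a value => columns pair means the value is centered across those columns.

```julia
# Center `s` in a field of width `w`; any odd leftover space goes to the right.
center(s::AbstractString, w::Integer) = rpad(lpad(s, length(s) + (w - length(s)) ÷ 2), w)

# Lay out one row. A cell is either a plain value (occupying the next column)
# or `value => columns` (spanning the given column range).
function render_row(row, widths; gap = 2)
    cells = String[]
    col = 1
    for cell in row
        value, cols = cell isa Pair ? (first(cell), last(cell)) : (cell, col:col)
        # A spanning cell also absorbs the gaps between the columns it covers.
        width = sum(widths[cols]) + gap * (length(cols) - 1)
        push!(cells, center(string(value), width))
        col = last(cols) + 1
    end
    return join(cells, " "^gap)
end

# Hypothetical widths: a label column plus two regression columns.
widths = [10, 8, 8]
println(render_row(["Label", "(1)", "(2)"], widths))
println(render_row(["New line", 0.55 => 2:3], widths))
```

Because the spanning cell is given the combined width of its columns, both rows come out the same length, which is what keeps the table rectangular.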

User-settable defaults. I tend to want all my tables to look similar, but to do so in the current package, I need to make sure to change certain settings for each table. Other settings are difficult to change. This update tries to fix these issues.
a. As an example of this, in my LaTeX tables, I almost always want to use tabular*, but to do so I need to define a new RenderSetting that would match, which takes ~53 lines of code even though only 2 lines really need to change. Now, the user only needs to change those 2 lines (the package also now exports LatexTableStar, which does this as well):

RegressionTables.tablestart(::RegressionTables.AbstractLatex, align) = "\\begin{tabular*}{\\textwidth}{$(align[1])@{\\extracolsep{\\fill}}$(align[2:end])}"
RegressionTables.tableend(::RegressionTables.AbstractLatex) = "\\end{tabular*}"

b. As another example, I prefer using T-Stats in my tables. In the current package, I would need to set this in every table. Now, this default is settable by the user:

RegressionTables.default_below_statistic() = TStat
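The mechanism behind these overrides is ordinary method dispatch: the package defines a fallback method and the user adds or replaces a method. The following is a minimal standalone sketch of that pattern, not code from the package. All names in it (ToyRender, ToyAscii, ToyLatex, ToyStdError, ToyTStat, toy_default_below, toy_below) are hypothetical stand-ins.

```julia
# Toy stand-ins; none of these names come from RegressionTables itself.
abstract type ToyRender end
struct ToyAscii <: ToyRender end
struct ToyLatex <: ToyRender end

struct ToyStdError end
struct ToyTStat end

# "Package" default: every render type shows standard errors below coefficients.
toy_default_below(::ToyRender) = ToyStdError

# "User" override for one render type only; the more specific method wins.
toy_default_below(::ToyLatex) = ToyTStat

# The keyword default is looked up through dispatch, so the user override is
# picked up automatically, while an explicit keyword still takes precedence.
toy_below(render::ToyRender; below = toy_default_below(render)) = below

println(toy_below(ToyAscii()))
println(toy_below(ToyLatex()))
println(toy_below(ToyLatex(); below = ToyStdError))
```

The design choice this illustrates is that a default set once applies to every later call, but a single table can still deviate by passing the keyword explicitly.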

With Julia 1.9, extensions are now possible. I think this is valuable for a package like RegressionTables to minimize dependencies. This part of the proposed update is definitely not complete (I have tested it quite a bit with FixedEffectModels.jl, but not with the others).
As an added bonus, I wanted this package to be more friendly to other types of data. For example, I often create descriptive tables that need to make it into a paper, ideally with a similar style to my regression tables. The DataFrames.jl package provides a good setup for this, so I just needed to add a new function that works for this:

RegressionTables.RegressionTable(
    names(df_described),
    Matrix(df_described)
)
# ---------------------------------------------------------------------
# variable      mean        std         q25        median        q75
# ---------------------------------------------------------------------
# State         26.826      14.481      15.000      26.500       40.000
# Year          77.500       8.659      70.000      77.500       85.000
# Price         68.700      41.986      34.775      52.300       98.100
# Pop        4,537.113   4,828.836   1,053.000   3,174.000    5,280.250
# Pop16      3,366.616   3,641.847     781.175   2,315.300    3,914.325
# CPI           73.597      36.529      38.800      62.900      107.600
# NDI        7,525.023   4,747.859   3,327.869   6,281.201   11,024.110
# Sales        123.951      30.991     107.900     121.200      133.200
# Pimin         62.899      38.323      31.975      46.400       90.500
# ---------------------------------------------------------------------

With these changes, I also added a lot of other changes that I think are useful:

There is now an order argument, which keeps all of the coefficients but changes the order. A drop argument is a work in progress; I think it will be pretty simple.
Related to order and drop, these arguments are now more flexible. In the current package, you need to provide the full string of each coefficient you want to keep. This proposed update has 4 options: strings, integers, ranges, and regex. Integers and ranges are pretty straightforward. Regex applies the occursin function, so any coefficient names that match the regex will be used (kept, dropped, or moved up in the order).
Every statistic type has its own custom formatting options. For example, if the user wants $R^2$ values to be displayed as a percentage while other statistics are still displayed in the old way, this is now possible.
I changed how renaming works for interactions and categorical variables. Before, an interaction was treated as a completely different variable name. Now, each piece of the interaction has the name of the base variable. In other words, relabeling these variables is much simpler (similarly for categorical variables):

rr1 = reg(df, @formula(Sales ~ NDI * Price), Vcov.cluster(:State))
rr2 = reg(df, @formula(Sales ~ NDI + Price), Vcov.cluster(:State))
regtable(
    rr1, rr2;
    labels=Dict("NDI" => "Newspaper Advertising", "Price" => "Cigarette Price"),
    order=[r"Price", r"Adv"],
)
# -----------------------------------------------------------------
#                                                    Sales
#                                           -----------------------
#                                                  (1)          (2)
# -----------------------------------------------------------------
# Cigarette Price                            -0.813***    -0.938***
#                                              (0.251)      (0.173)
# Newspaper Advertising & Cigarette Price       -0.000
#                                              (0.000)
# Newspaper Advertising                       0.007***     0.007***
#                                              (0.001)      (0.002)
# (Intercept)                               133.068***   138.480***
#                                              (8.502)      (5.753)
# -----------------------------------------------------------------
# N                                              1,380        1,380
# R2                                             0.212        0.209
# -----------------------------------------------------------------

Related to the change to variable naming, different table types now use different interaction symbols. For example, the above would be Newspaper Advertising $\times$ Cigarette Price if using LatexTable(). This prevents the "&" symbols from being a problem in LaTeX tables, but it is also settable by the user if the user prefers \& or something similar.
For fixed effect models, there is now a suffix applied to the names. I think this makes it a little more consistent from a display perspective.
Several of the default display options are now dependent on what is passed. For example, in the current package the "estimator section" is always printed. Now, it is only printed (by default, which is again user settable) if multiple regression types are passed (e.g., IV and OLS). Another smaller example is that column numbers are only printed if more than 1 regression is passed.
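The four selector kinds described above for order and drop (string, integer, range, regex) map naturally onto one method per type. This is a minimal standalone sketch of that idea, assuming exact matching for strings and occursin for regex as the description says. The helpers match_indices and reorder are hypothetical and are not the package's functions.

```julia
# Hypothetical helper: indices of `names` selected by a single selector.
match_indices(names, s::AbstractString) = findall(==(s), names)            # exact name
match_indices(names, s::Integer) = [Int(s)]                                # position
match_indices(names, s::AbstractRange) = collect(s)                        # positions
match_indices(names, s::Regex) = findall(n -> occursin(s, n), names)       # pattern

# Move selected names to the front, in selector order, and keep every
# unselected name afterwards in its original order.
function reorder(names::Vector{String}, selectors)
    picked = Int[]
    for s in selectors
        for i in match_indices(names, s)
            i in picked || push!(picked, i)   # a name matched twice is placed once
        end
    end
    rest = setdiff(eachindex(names), picked)
    return names[vcat(picked, rest)]
end

coefs = ["(Intercept)", "NDI", "Price", "NDI & Price"]
println(reorder(coefs, [r"Price", "NDI"]))
println(reorder(coefs, [2:3]))
```

In this sketch a regex such as r"Price" pulls both the main effect and the interaction forward, which mirrors the ordering shown in the example table above. A keep or drop variant would return names[picked] or names[rest] instead.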

I am sure there is something in there I forgot. I would very much appreciate any feedback. Where possible, I tried to stick with the current user interface, but obviously with such large changes the interface changes as well. This proposal is also not completely finished, particularly the extensions.

resolves #130, resolves #109, resolves #105, resolves #90, resolves #52, resolves #17, resolves #12

(It would also sort of solve #129 and #128)

jmboehm · 2023-07-31T17:20:50Z

Thanks a lot for the PR. I haven't gone through all this yet, but overall it looks like a very sensible set of improvements.

Of course tests and documentation would need to be updated.
There are a bunch of breaking changes. I feel that the implicit contract with the user is that while it's ok to have some breaking changes every once in a while, we should also set out to minimize them (nobody enjoys having to update their code). So I think it's worth writing them all up and thinking about whether they're necessary and whether it's possible to mitigate them.
Finally, since you've rewritten most of the package (and since my ability to be involved in FOSS development is rapidly deteriorating), I'd suggest and invite you to take over maintaining the package (of course this needs to be ok with @greimel as well). It would be much more costly for me to maintain code that was largely written by you.

I wanted to check how comparable the new backend is to what already exists. There are a few settings that might change, but this is to show that the results are very comparable between the two. There are some minor spacing changes when dealing with multicolumn objects, but it is otherwise capable of producing very similar tables.

greimel · 2023-08-07T10:46:58Z

Thanks for the effort, @junder873.

I agree with @jmboehm. I think it would be important that the tests are updated (and pass) so that we can see more clearly what changes from a user's perspective. (Hopefully, most old code would still run.)

We can only merge this if you agree to maintain the new codebase.

From a maintainer's perspective, such a big rewrite is really hard to review. Not sure what the perfect solution is here. One option would be to split the PR into small chunks that we can actually review. Another would be to trust the tests and just merge if they look good.

junder873 · 2023-08-07T18:40:37Z

Thank you both for the input (and the past work on this package). Just to respond to the different comments: I am happy to maintain the package going forward, though I appreciate your input. I am trying to minimize the breaking changes that the user faces, though this isn't perfect (see below). I have focused less on maintaining compatibility on the backend pieces; the changes there are just too big to make that doable. I can try to think through creating multiple pull requests, it is possible that doing the backend first would work, but I would have to think more about how to do it.

I have been a little slow to work on the tests because I have been trying to make sure the front end works well and work on the backward compatibility. In the most recent set of changes, I added more backward compatibility and reran the tests to see where things stand. Before I get to those results, a few notes:

I focused on the tests that produce tables, so the label_transforms and decorations tests still need updating.
With this set of updates, I am proposing to change some defaults to (hopefully) be more useful. To create a comparable set of results, I undid these changes. Specifically, the following 4 defaults are different in the current proposal compared to the tests:

RegressionTables.default_fe_suffix(x::RegressionTables.AbstractRenderType) = ""
RegressionTables.default_print_control_indicator(x::RegressionTables.AbstractRenderType) = false
RegressionTables.default_regression_statistics(x::RegressionTables.AbstractRenderType, rrs::Tuple) = [Nobs, R2]
RegressionTables.default_print_estimator(x::RegressionTables.AbstractRenderType, rrs) = true

The first two are new features (adding a suffix after fixed effects and printing a yes/no if coefficients are omitted). The next two defaults vary based on conditions: a nonlinear regression will include the Pseudo R2, and the estimator section is only printed if more than one type of estimator is provided. Because these defaults did not exist or are different from the current version of the package, I changed them back to make the tests comparable.
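As a rough illustration of what "defaults that vary based on conditions" means, the sketch below computes defaults from the set of results passed in. It is standalone and hypothetical: ToyResult and the toy_ functions are not part of the package, and the exact statistics chosen for the nonlinear case are an assumption for illustration.

```julia
# Toy stand-in for a fitted model; not a RegressionTables type.
struct ToyResult
    estimator::String   # e.g. "OLS", "IV", "NL"
    nonlinear::Bool
end

# Print the estimator section only when more than one estimator type is present.
toy_print_estimator(results) = length(unique(r.estimator for r in results)) > 1

# Print column numbers only when more than one regression is passed.
toy_print_column_numbers(results) = length(results) > 1

# Add a pseudo R2 when any model is nonlinear (assumed rule, for illustration).
toy_statistics(results) =
    any(r.nonlinear for r in results) ? [:Nobs, :R2, :PseudoR2] : [:Nobs, :R2]

ols = ToyResult("OLS", false)
iv  = ToyResult("IV", false)
nl  = ToyResult("NL", true)

println(toy_print_estimator((ols, ols)))
println(toy_print_estimator((ols, iv)))
println(toy_statistics((ols, nl)))
```

Pinning a default to a constant, as in the four overrides above, simply replaces the conditional rule with a fixed return value.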

First, going through the actual table output, for the most part the results are similar. There are a few places where spacing is different, often connected to allowing the Estimator to be more than OLS, IV or NL. The HTML tables are also quite different since the padding information is moved into the style section of the table instead of between each cell.

From a user perspective, things are mostly similar. The biggest difference that shows up is that file is now a separate argument from renderSettings. I will discuss below why this is and why I am not sure how to fix it. Other differences are:

The regressors argument is now keep. This is one I can probably put back in with a deprecation warning. regressors seems inconsistent with the other arguments, drop and order, which work similarly to keep.
custom_statistics is gone, replaced by extralines, and how these work has changed. Simply passing two vectors with the information (with the label in the first position) works. This means that labels are no longer necessary there.
The decorator arguments (estim_decoration, number_regressions_decoration and below_decoration) are gone. The idea is that for most users these are "set and forget" arguments, so they remain changeable, but the intent is to change them for a table type or for all tables, not necessarily for one specific table. In the tests, I create new table types for the two tables where this matters, but a user who wants those settings everywhere would not need to create a type.

I don't see any other differences between the current package and this proposal, so I wanted to come back to the differences in renderSettings. The renderSettings argument now expects an AbstractRenderType. The idea behind this type system is to make more use of the Julia type system. The render type provided controls how every other type is rendered, including rounding, labels and decorators, as well as the defaults used in the regression table. Importantly, this allows users to set up defaults on a per-table basis. For example, if a user has two tables (e.g., a descriptive LaTeX table and a regression LaTeX table), the needs might be different for rounding, headings, etc., but those changes should not require a lot of work to create.
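The idea of a render type that carries no data but steers every rendering decision can be sketched in a few lines. This is a standalone toy, not the package's code: the ToyRender hierarchy, digits_for, interaction_sep, render_number and render_interaction are all hypothetical names chosen for illustration.

```julia
# Toy render-type hierarchy; the types hold no fields, only identity.
abstract type ToyRender end
abstract type ToyLatexLike <: ToyRender end
struct ToyAsciiTable <: ToyRender end
struct ToyLatexTable <: ToyLatexLike end
struct ToyDescriptiveLatex <: ToyLatexLike end   # a user-defined table type

# Fallback defaults for every render type.
digits_for(::ToyRender) = 3
interaction_sep(::ToyRender) = " & "

# Override for all LaTeX-like tables, so "&" never lands in a LaTeX cell.
interaction_sep(::ToyLatexLike) = raw" $\times$ "

# Override for a single user-defined table type only.
digits_for(::ToyDescriptiveLatex) = 1

render_number(r::ToyRender, x::Real) = string(round(x; digits = digits_for(r)))
render_interaction(r::ToyRender, a, b) = a * interaction_sep(r) * b

println(render_interaction(ToyAsciiTable(), "NDI", "Price"))
println(render_interaction(ToyLatexTable(), "NDI", "Price"))
println(render_number(ToyLatexTable(), 0.8234))
println(render_number(ToyDescriptiveLatex(), 0.8234))
```

A new table style here costs one struct definition plus one method per setting that differs, which is the "should not require a lot of work" property the paragraph above is after.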

In order to keep the creation of these new types as simple as possible, I didn't want to include any actual information with the type, so a file does not fit well. One solution (that feels kind of hacky) is to use multiple dispatch to split the arguments, so something like:

const asciiOutput = AsciiTable
const latexOutput = LatexTable
const htmlOutput = HtmlTable
(::Type{T})(file::String) where {T<:AbstractRenderType} = T(), file # returns tuple
default_render(x::Nothing) = AsciiTable()
default_render(x::AbstractRenderType) = x
default_render(x::Tuple{<:AbstractRenderType, String}) = x[1]
default_file(rndr::AbstractRenderType, renderSettings) = nothing
default_file(rndr::AbstractRenderType, renderSettings::Tuple{<:AbstractRenderType, String}) = renderSettings[2]

function regtable(
    rrs...;
    renderSettings = nothing,
    rndr::AbstractRenderType = default_render(renderSettings),
    file= default_file(rndr, renderSettings),
    ...
)

This would use the old naming system and allow old code to continue to work (based on some simple testing), probably with a deprecation warning. It is obviously a little ugly and could create some confusion if somebody provides both renderSettings and rndr, since rndr would dominate.

Once again, I appreciate the input and want to minimize the change for users.

greimel

Could you please remove these two lines? They lead to an error on CI.

Can you also change the minimum Julia version in this line

RegressionTables.jl/.github/workflows/ci.yml

Line 18 in 34c5256

    
           - '1.8' # Replace this with the minimum Julia version that your package supports. E.g. if your package requires Julia 1.5 or higher, change this to '1.5'.

to 1.9?

Hopefully we can then see the state of the tests.

test/RegressionTables.jl

test/runtests.jl

Co-authored-by: Fabian Greimel <[email protected]>

greimel · 2023-08-07T19:58:08Z

> I am trying to minimize the breaking changes that the user faces

Thanks!

You are proposing the following (sets of) changes.

Change defaults
Separate file from renderSettings
Change regressors to keep
Remove custom_statistics
Handle decorators differently
Remove dependencies, introduce package extensions

Ideally, you prepare separate PRs, where each PR contains the minimal changes to introduce your proposed change.

However I can imagine that some of these changes would depend on the same fundamental backend work. So it might be beneficial to prepare an initial PR with these backend changes first. I think that's what you're contemplating.

> I can try to think through creating multiple pull requests, it is possible that doing the backend first would work, but I would have to think more about how to do it.

greimel · 2023-08-07T20:01:05Z

I am trying to run CI again. But it doesn't - let me close and re-open this PR

junder873 · 2023-09-20T20:54:15Z

I think I am finally happy overall with the state of this. Do you (@jmboehm and @greimel) have any other suggestions on things to add or change? I have tried to resolve as many of the outstanding pull requests as possible as a part of this.

If not, then I think it is ready to merge.

jmboehm · 2023-09-20T21:20:12Z

Looks great to me, thanks. No more suggestions from my side.

junder873 · 2023-10-09T19:54:05Z

@jmboehm For the documentation to work after merging this, there would need to be some setup under the GitHub settings for the project. As described here, there would need to be a change to the GitHub Pages option, and here, a "DOCUMENTER_KEY" needs to be added so that TagBot can publish versions of the docs.

junder873 · 2023-10-25T14:41:38Z

@jmboehm Just following up: I can't change the setting to allow the documentation to work properly. Once that is done, though, I am happy to merge this request.

jmboehm · 2023-11-18T19:51:13Z

Hi @junder873 , apologies for the delay. I've now added the deploy key and the environment variable to the repo. Is there a way to test to see whether it works?

junder873 · 2023-11-20T16:30:43Z

As far as I can tell, there is no easy way to test the documenter key. However, as long as the GitHub Pages part works, it is possible to manually upload versions of documentation if there is an error with the key.

jmboehm · 2023-12-02T18:58:17Z

Great. Is this ready to be merged? I feel I've been holding this up longer than it should have been held up.

junder873 · 2023-12-04T17:38:51Z

I will go ahead and merge it, thank you so much for the suggestions and help.

junder873 added 10 commits June 13, 2023 21:48

Adjust to use output types as Julia types for multiple dispatch

b1ce1f6

change to by row printing

de8ea32

add functionality around interaction terms

09a32b4

some cleanup

b5bfea0

put stats back into main package

faf7f30

reorganize

ed166b2

standardize how text is rendered

a856da7

improve front end

70bc3b9

fixes to make work

7d906d8

correct colwidths with mixed single characters and multiple characters

5d86d12

junder873 marked this pull request as draft July 31, 2023 15:29

junder873 added 4 commits August 5, 2023 12:14

add all extensions, add standardize_coef, make regressiontype dependent on distribution

e32fa95

fix defaults

a33100f

fixes to some naming and correct default decoration

1c46fa3

junder873 added 3 commits August 7, 2023 12:30

add symbols as option for backwards compatibility

4c7f37e

add estimformat and statisticformat for backward compatibility, revert tests to original setup

41fe13b

fix tests

874ded0

greimel requested changes Aug 7, 2023

test/RegressionTables.jl (outdated, resolved)

test/runtests.jl (outdated, resolved)

junder873 and others added 3 commits August 7, 2023 15:39

Update ci.yml to 1.9

95c6590

Update test/RegressionTables.jl

be585b9

Co-authored-by: Fabian Greimel <[email protected]>

Update runtests.jl

b1fa77f

greimel approved these changes Aug 7, 2023

greimel closed this Aug 7, 2023

This was linked to issues Sep 11, 2023

Automatically use :latex for latexOutput() #124

Closed

custom_statistics : format and storing stats in regression object #129

Closed

junder873 added 7 commits September 17, 2023 19:38

add other_stats option to increase flexibility when combining regression results

c34b7d2

fix test for nightly

2c0d098

fix doc reference and statsmodels 0.7.3

ef38a3b

try to correct docs

4ccecdc

Revert "try to correct docs"

4904e55

This reverts commit 4ccecdc.

make sure all docs are part of tests

44b1a66

update mixedmodels example to try to be constant which should make output testable

00a94a7

This was linked to issues Sep 20, 2023

Number of clusters argument to regression_statistics #40

Closed

Show clusters #121

Closed

junder873 added 4 commits September 20, 2023 13:53

finish removing fe_terms

41715e5

remove SimpleRegressionResult

5ec980e

Helps simplify the overall package

rename simpleRegressionResult -> regressionResults

e5d02ad

fix filename

d2eeba9

junder873 added 3 commits October 3, 2023 11:14

Update README.md

b758d63

update tagbot for documenter key

03d39b3

add deploydocs

843c82d

junder873 added 2 commits November 20, 2023 11:32

doc wording adjustments

3624c10

Add Aqua compat

c8b0263

Add docs link

9c028a8

junder873 merged commit c80aea9 into jmboehm:master Dec 4, 2023
4 of 6 checks passed
