`custom_statistics` : format and storing stats in regression object #129

Gkreindler · 2023-03-16T19:43:25Z

I find the custom_statistics option super helpful, for example for displaying the mean of the outcome variable, number of units (in a panel setting), etc.

I have two questions on features that would make this easier to use for me:

1. Is it possible to specify a custom format for custom_statistics? I think they are currently all formatted with %0.3f.
2. At the moment, we need to define a NamedTuple. I define these stats after running my regression and carry them alongside the regression object. It would be more convenient to store these stats in the regression object itself and tell regtable what to print. Is this feasible? (I realize this might (also) be a question for the regression packages.)


jmboehm · 2023-03-16T20:36:38Z

If the statistic is in numeric format, it should be formatted according to statisticformat (which defaults to %0.3f). I agree that this isn't always desirable, in particular if you want to show integers. If the "statistic" is a string, however, it will be displayed as is, so one possible workaround is to format the output before passing it to custom_statistics. To use the example from the test script:

using Statistics, Formatting
comments = ["Baseline", "Preferred"]
means = [sprintf1("%0.6f",Statistics.mean(df.SepalLength[rr1.esample])), sprintf1("%0.6f",Statistics.mean(df.SepalLength[rr2.esample]))]
mystats = NamedTuple{(:comments, :means)}((comments, means))
RegressionTables.regtable(rr1, rr2; renderSettings = RegressionTables.asciiOutput(), regression_statistics = [:nobs, :r2],custom_statistics = mystats, labels = Dict("__LABEL_CUSTOM_STATISTIC_comments__" => "Specification", "__LABEL_CUSTOM_STATISTIC_means__" => "My custom mean") )
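
The same pre-formatting idea can be sketched with only the `Printf` standard library, which is handy for integer-valued statistics such as the number of units. The numbers and the names `nunits` and `ymeans` below are made up for illustration; the result is a NamedTuple of string vectors, the same shape as `mystats` in the example above.

```julia
using Printf

# Hypothetical per-regression statistics (one entry per table column)
unit_counts   = [1250, 1187]
outcome_means = [5.843333, 5.006]

# Pre-format as strings so that each row gets its own format
nunits = [@sprintf("%d", n) for n in unit_counts]
ymeans = [@sprintf("%0.2f", m) for m in outcome_means]

# NamedTuple of string vectors, ready to pass as custom_statistics
mystats = (nunits = nunits, ymeans = ymeans)
```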

If you have an idea about what would be a good interface for the formatting of these additional statistics, let me know.

On storing the statistics in the regression object: yeah, I agree. We had a similar discussion in the context of having the RegressionModel (or FixedEffectModel etc.) store custom covariance matrices. Three options:

1. If you feel that the statistic should be part of every regression model, you could file a PR in StatsBase.jl to add the relevant statistic to the abstraction.
2. If you think it's something that the output from FixedEffectModels or GLFixedEffectModels should have, we could add that.
3. If it's very specific to your application, you could define your own struct that contains the RegressionModel (or whatever type your estimator is producing) as well as your custom statistics, and then write a short function that wraps regtable and fills in the relevant custom_statistics.

If none of these sounds satisfactory, we could have custom_statistics accept functions that take the RegressionModel as their argument and produce formatted output, something like this:

function mycustomstatistic(rr::RegressionModel)
	return 3.141592 	# or something that depends only on rr
end
mystats = NamedTuple{(:foo,)}((mycustomstatistic,))  # one-element tuples need the trailing comma
RegressionTables.regtable(rr1, rr2; renderSettings = RegressionTables.asciiOutput(), custom_statistics = mystats)
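
One way such function-valued statistics could be evaluated is sketched below. This is not an existing RegressionTables feature: `ToyModel`, `evaluate_statistics`, and the two statistic functions are hypothetical, with `ToyModel` standing in for a fitted `RegressionModel`.

```julia
using Printf

# Hypothetical stand-in for a fitted model; a real version would accept a RegressionModel
struct ToyModel
    nobs::Int
    rss::Float64
end

# User-supplied statistics: each takes one model and returns a formatted string
nobs_stat(m::ToyModel) = @sprintf("%d", m.nobs)
rmse_stat(m::ToyModel) = @sprintf("%0.3f", sqrt(m.rss / m.nobs))

# Apply every function in the NamedTuple to every model, keeping the statistic names
evaluate_statistics(stats::NamedTuple, models...) =
    map(f -> [f(m) for m in models], stats)

mystats = (nobs = nobs_stat, rmse = rmse_stat)
rows = evaluate_statistics(mystats, ToyModel(100, 25.0), ToyModel(400, 100.0))
```

Here `rows` is again a NamedTuple of string vectors, so it could be handed to `custom_statistics` in the same way as the pre-formatted values.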

Let me know what you think.

Gkreindler · 2023-03-16T20:54:53Z

"pre"-formatting the statistics as strings is a convenient solution!

For the 2nd issue, the ideal for my workflow would be for RegressionModel to have an attribute other_stats, a Dict that I could load anything into (application-specific). After I estimate a model, I like to compute a few other statistics and store them in (attach them to) the rr object. These may also depend on the dataframe, etc., which, as far as I understand, is (for good reasons) not included in rr.

It is a great suggestion to do this via a struct that includes RegressionModel and a wrapper to regtable! I'll report back with an example if I implement that.

Gkreindler · 2023-03-18T00:13:59Z

Here is my code to wrap fixed-effects models (linear and GL) so that they carry additional statistics, and to then include those statistics (with formatting) in a regression table. I'm sure that this can be much improved!

using FixedEffectModels
using GLFixedEffectModels
using RegressionTables
using DataFrames
import Formatting: sprintf1

### Define FixedEffectModel with additional statistics 
mutable struct FEmodel
    model::Union{FixedEffectModel, GLFixedEffectModel}
    stats::Dict{Symbol, Union{String, Number}}
end

function FEmodel(mymodel::Union{FixedEffectModel, GLFixedEffectModel}, stats::Union{Nothing, Dict{Symbol, Union{String, Number}}}) 
    if isnothing(stats)
        emptydict = Dict{Symbol, Union{String, Number}}()
        return FEmodel(mymodel, emptydict)
    end
    return FEmodel(mymodel, stats)
end

function regtable_stats(
    mymodels::Vararg{FEmodel};
    # Tuple{Vararg{Symbol}} accepts tuples of any length (Tuple{Symbol} matches only 1-tuples)
    custom_statistics_order::Union{Nothing, Vector{Symbol}, Tuple{Vararg{Symbol}}}=nothing,
    custom_statistics_format::Union{Nothing, Dict{Symbol, String}}=nothing,
    kwargs...)

    # all statistics names
    allkeys = union([Set(keys(mymodel.stats)) for mymodel=mymodels]...)
    if isnothing(custom_statistics_order)
        custom_statistics_order = sort(allkeys |> collect)
    else
        @assert Set(custom_statistics_order) == allkeys
    end

    # formatting
    if isnothing(custom_statistics_format)
        custom_statistics_format = Dict{Symbol, String}()
    end

    # one vector of formatted strings per statistic, with one entry per model
    custom_statistic_dict = Dict{Symbol, Any}()
    for mykey=custom_statistics_order
        stat_entries = Vector{String}(undef, length(mymodels))

        for (idx, mymodel) = enumerate(mymodels)
            if mykey ∈ keys(mymodel.stats)

                myentry = mymodel.stats[mykey]

                # user-supplied format if given, otherwise a default based on the value type
                if mykey ∈ keys(custom_statistics_format)
                    myformat = custom_statistics_format[mykey]
                else
                    if isa(myentry, String) || isa(myentry, Bool)
                        myformat = "%s"
                    elseif isa(myentry, Int)
                        myformat = "%'i" # "%d"
                    elseif isa(myentry, Real)
                        myformat = "%0.3f"
                    end
                end

                stat_entries[idx] = sprintf1(myformat, myentry)
            else
                # this model does not have the statistic: leave the cell empty
                stat_entries[idx] = ""
            end
        end

        custom_statistic_dict[mykey] = stat_entries
    end

    # custom stats names
    custom_stats_vectors = [custom_statistic_dict[mykey] for mykey=custom_statistics_order]
    custom_statistics = NamedTuple{Tuple(custom_statistics_order)}(Tuple(custom_stats_vectors))

    # call regtable
    my_models = [mymodel.model for mymodel=mymodels]
    return regtable(my_models...; custom_statistics=custom_statistics, kwargs...)
end


### Fake data
testdf = DataFrame("a" => [1,0,1,0.5], "b" => [1.2, 3.2, 1.1, 1.01], "fe" => [1, 1, 0, 0])
testdf.b2 = testdf.b .^ 2

### Run some regressions
r1 = reg(testdf, term(:a) ~ term(:b) + FixedEffectModels.fe(:fe))
rr = FEmodel(r1,  Dict(:quadratic => 0.0, :linear => "a"))

r2 = reg(testdf, term(:a) ~ term(:b2) + FixedEffectModels.fe(:fe))
rr2 = FEmodel(r2, Dict(:quadratic => 1.99, :square => true))

mymodels = [rr, rr2]

### Table -- minimal options
regtable_stats(mymodels..., renderSettings = asciiOutput())

### Table -- full control
custom_statistics_order = [:square, :quadratic, :linear] # Need to include ALL stats here, otherwise errors
custom_statistics_format = Dict(:square => "%s", :quadratic => "%0.1f") # ok to only include some
regtable_stats(mymodels...,
    custom_statistics_order=custom_statistics_order,
    custom_statistics_format=custom_statistics_format,
    renderSettings = asciiOutput())
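
The type-based default formats used in `regtable_stats` could also be expressed with multiple dispatch and the `Printf` standard library, which accepts run-time format strings through `Printf.Format` (Julia 1.6 or later). This is only a sketch of an alternative: the names `default_format` and `format_stat` are hypothetical, and a plain `%d` is used for integers in place of `%'i`.

```julia
using Printf

# Default format per value type, chosen by dispatch (the Bool method is more specific than Integer)
default_format(::AbstractString) = "%s"
default_format(::Bool) = "%s"
default_format(::Integer) = "%d"
default_format(::Real) = "%0.3f"

# Format one statistic, optionally overriding the default format string
format_stat(x, fmt::AbstractString = default_format(x)) =
    Printf.format(Printf.Format(fmt), x)

default_str  = format_stat(1.99)           # uses "%0.3f"
override_str = format_stat(1.99, "%0.1f")  # caller-supplied format
count_str    = format_stat(1200)           # uses "%d"
label_str    = format_stat("a")            # uses "%s"
```

Adding a default for a new value type then only requires one more `default_format` method rather than another `elseif` branch.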

jmboehm · 2023-03-19T21:27:01Z

Looks neat! I'm wondering whether it would make sense to implement this as a new parametric type in RegressionTables.jl, something like this:

struct AugmentedRegressionModel{T}
    model::T
    stats::Dict{Symbol, Union{String, Number}}
end

The advantage would be that it could work out-of-the-box with any output model type, including anything that's implementing the StatsBase abstraction. That could be a neat way to override the estimated VCov matrix with some custom one as well...
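
A minimal sketch of how such a wrapper might behave is given below. Everything beyond the struct definition is an assumption made for illustration: `ToyFit` stands in for an estimator's output, `coefficients` stands in for the StatsBase-style accessors that a real implementation would forward, and `getstat` and the convenience constructors are hypothetical helpers.

```julia
struct AugmentedRegressionModel{T}
    model::T
    stats::Dict{Symbol, Union{String, Number}}
end

# Convenience constructors (hypothetical): accept any dictionary, or no statistics at all
AugmentedRegressionModel(model::T, stats::AbstractDict) where {T} =
    AugmentedRegressionModel{T}(model, Dict{Symbol, Union{String, Number}}(stats))
AugmentedRegressionModel(model) =
    AugmentedRegressionModel(model, Dict{Symbol, Union{String, Number}}())

# Toy estimator output and accessor, standing in for a real model type
struct ToyFit
    coefs::Vector{Float64}
end
coefficients(m::ToyFit) = m.coefs

# Forward the accessor to the wrapped model, whatever its type
coefficients(m::AugmentedRegressionModel) = coefficients(m.model)

# Look up an attached statistic, with a fallback for models that lack it
getstat(m::AugmentedRegressionModel, key::Symbol, default = "") = get(m.stats, key, default)

aug = AugmentedRegressionModel(ToyFit([0.5, 1.2]), Dict(:ymean => 5.84, :spec => "Baseline"))
bare = AugmentedRegressionModel(ToyFit([0.1]))
```

Because the wrapper is parametric in `T`, the same definitions apply to any model type for which the forwarded accessors are defined.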

junder873 mentioned this issue Jul 31, 2023

Large redesign to add flexibility and user defaults and fewer dependencies #139

Merged

junder873 linked a pull request Sep 11, 2023 that will close this issue

Large redesign to add flexibility and user defaults and fewer dependencies #139

Merged

junder873 closed this as completed in #139 Dec 4, 2023
