-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V3: complete overhaul #372
Conversation
@Datseris This is as ready as it gets from my side now. A review would be appreciated! PS: I completely understand that this will probably take some time to go through, given the useless git diff. Please use documentation, the examples and the docstrings to guide the review! There are many, many more examples now, and they are mostly all cross-referenced. I essentially have no more time to spend on this in the foreseeable future, so I hope this is good enough to at least publish a new version that is up to speed with the latest version ComplexityMeasures.jl. At least it will be better than the current version, which lacks some compatibility. |
hi, sorry for the next two months I am super duper booked finishing projects as well as finishing some papers I am currently writing. I can have a detailed look at this after mid September (please ping again if I forget at this time!). I think Associations.jl is a fine name. Having lots of breaking changes is also fine as this is a major new release. And, if you rename the packge, you in fact release a new package anyways as that's how renaming a package works in Julia. |
@Datseris Thank you for the quick response. I think moving quickly here is preferred, as opposed to waiting several months for a release, for two reasons:
I already spent a few weeks quality-checking this improving tests (~95% coverage) and expanding with more runnable examples (all measures/estimators, independence tests and causal graph algorithms have runnable examples that are cross-referenced in docstrings). Given your busy schedule (and mine): this is good enough for a new release as is. If anything major shows up that requires breaking changes, we solve it either by deprecations or by releasing another new major version fairly quickly. Any other minor kinks can just be sorted with regular PRs. I propose to merge this now, change the package name and release a new package Associations.jl. Yay/nay? |
Okay, yes, this is fine. I went through the docs now (very quickly) and it looks very good anyways. I can open issues after you've released Associations.jl v1 anyways. I think this needs a central tutorial akin to ComplexityMeasures.jl, but I undersatnd this would take also time :D In the next semester I may be able to get PhD funding for someone working on Causal Graphs in cloudiness and if I get that they would also contribute and maintain this library. Fingers crossed! |
Perfect.
It does! We can just open an issue for it. For my part, it would be better to write this is a while, when I no longer have a fresh mental model of the package in my head. That'll make it easier to get into the package beginner mindset, and know what to focus on.
Nice! Fingers crossed for grant money💰 |
V3: breaking and other changes
This PR contains a complete overhaul of the codebase. The git diff is meaningless. It is better to have a look at the documentation (docstrings and examples, which there are plenty of).
The docs are here. Little content has changed, but the syntax has changed across the board.
This will be the definitely last silly-big PR. There will be no more large redesigns unless there is an extremely good reason for it. Given that we can now seamlessly use everything from ComplexityMeasures.jl here, I doubt there will be anything new coming up in the foreseeable future.
Package structuring
The package is organised around three main themes:
This is now reflected systematically across folder structure documentation structure and through the type system (hence a lot of moving files, and the huge diff).
Breaking changes
Many changes are breaking. The reason is simple: I don't have the extra time to spend rigorously making everything backwards-compatible. Finalising this PR already took many weeks, and it builds on endless experimentation and back-and-forth design changes over the past few years (in tandem with the work we did in ComplexityMeasures.jl). I simply don't have more time to spend on this nitpicking this any more, given that my industry jobs take up most of my time now. I'd much rather lock the design, finish V3, and move on with finishing all the features I've been working on and finalising the accompanying paper. Hence: deprecations for earlier versions are gone, much is breaking (but these changes should be well-covered in the changelog).
The main changes are
Three main contents define the package: association measures/estimators, independence testing, and network inference.
association
is the function that computes associations between variables.Estimators contains measure definitions (as we do in ComplexityMeasures.jl). For example
association(KSG1(MIShannon()), x, y)
computes Shannon mutual information.Counts/probabilities for multivariate data.
A complete overhaul of the multivariate information API.
Two different ways of discretising input data: per row/point (
CodifyPoints
) and per variable/column (CodifyVariables
).CodifyPoints
takesEncoding
s as inputs, andCodifyVariables
takesOutcomeSpace
s as inputs. See the documentation for how these work.Rename
SurrogateTest
toSurrogateAssociationTest
, since TimeseriesSurrogates.jl now definesSurrogateTest
.Slim down the documentation by keeping much more (condensed) stuff as tables in docstrings
Convenience functions are gone.
Example systems are removed (these should go into the PredefinedDynamicalSystems.jl package, or some other package, at some point).
A LOT of examples. Every association measure, every estimator, every independence testing method, and every network inference method is documented with at least one runnable example that gets executed when the docs build.
Increased test coverage to >92%.
Made a breakout-PR (Future features from 3.0 #374 ) containing a lot of unfinished content that will at some point go into the package. But this has to be done in a more structured manner in separate PRs, so I've completely removed anything temporarily from the codebase in this PR.
Big decision: package renaming?
I seriously want to rename the package. It consists of three main things
"CausalityTools" is a loaded term, which may give the user the impression that all or most functionality here is related too "causal inference". But causality has very specific meanings in specific contexts, and I don't want to dive into that controversy here. In preparation for the paper that will accompany v3.X, I want to keep everything as neutral as possible here.
We had a discussion in the Bergen group and we agreed that "Associations.jl" is a better name.
("AssociationMeasures.jl" could also work, to mirror ComplexityMeasures.jl, however, it limits the scope of the package - we'd rather use "associations" alone, since that also can contain things like independence testing and network inference under the same umbrella).
Yay/nay?