
Add more examples
kahaaga committed Jul 22, 2024
1 parent a1e1b09 commit 0aa4b73
Showing 9 changed files with 252 additions and 74 deletions.
196 changes: 188 additions & 8 deletions docs/src/examples/examples_associations.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,107 @@
# Examples of association measure estimation

## [`HellingerDistance`](@ref)

### [From precomputed probabilities](@id example_HellingerDistance_precomputed_probabilities)

```@example example_HellingerDistance
using CausalityTools
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(HellingerDistance(), p1, p2)
```
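As a sanity check, the same quantity can be reproduced by hand from the formula stated in the docstring, ``D_H = \tfrac{1}{\sqrt{2}} \sum_i (\sqrt{p_i} - \sqrt{q_i})^2`` (a plain-Julia sketch of that stated form; no package required):

```julia
p1 = [0.1, 0.5, 0.2, 0.2]
p2 = [0.3, 0.3, 0.2, 0.2]
# (1/√2) Σ (√pᵢ - √qᵢ)², following the form stated in the docstring
hd = sum((sqrt.(p1) .- sqrt.(p2)) .^ 2) / sqrt(2)  # ≈ 0.0559
```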

### [[`JointProbabilities`](@ref) + [`OrdinalPatterns`](@ref)](@id example_HellingerDistance_JointProbabilities_OrdinalPatterns)

We expect the Hellinger distance between two uncorrelated variables to be close to zero.

```@example example_HellingerDistance
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(HellingerDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
```

## [`KLDivergence`](@ref)

### [From precomputed probabilities](@id example_KLDivergence_precomputed_probabilities)

```@example example_KLDivergence
using CausalityTools
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(KLDivergence(), p1, p2)
```
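The same value can be verified by hand from the definition ``D_{KL}(p \| q) = \sum_i p_i \log_2(p_i / q_i)``, assuming base-2 logarithms (the estimator's default base):

```julia
p1 = [0.1, 0.5, 0.2, 0.2]
p2 = [0.3, 0.3, 0.2, 0.2]
# Σ pᵢ log₂(pᵢ / qᵢ); entries where pᵢ = qᵢ contribute zero
kl = sum(p * log2(p / q) for (p, q) in zip(p1, p2))  # ≈ 0.21
```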

### [[`JointProbabilities`](@ref) + [`OrdinalPatterns`](@ref)](@id example_KLDivergence_JointProbabilities_OrdinalPatterns)

We expect the [`KLDivergence`](@ref) between two uncorrelated variables to be close to zero.

```@example example_KLDivergence
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(KLDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
div_kl = association(est, x, y) # pretty close to zero
```


## [`RenyiDivergence`](@ref)

### [From precomputed probabilities](@id example_RenyiDivergence_precomputed_probabilities)

```@example example_RenyiDivergence
using CausalityTools
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(RenyiDivergence(), p1, p2)
```
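For a concrete order ``q``, the Rényi divergence ``D_q(p \| q) = \tfrac{1}{q-1} \log_2 \sum_i p_i^q q_i^{1-q}`` can also be computed by hand. The sketch below assumes ``q = 2`` and base-2 logarithms (both assumptions; the snippet above uses the package defaults):

```julia
p1 = [0.1, 0.5, 0.2, 0.2]
p2 = [0.3, 0.3, 0.2, 0.2]
q = 2
# (1/(q-1)) log₂ Σ pᵢ^q rᵢ^(1-q), here with q = 2
rd = log2(sum(p^q * r^(1 - q) for (p, r) in zip(p1, p2))) / (q - 1)
```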

### [[`JointProbabilities`](@ref) + [`OrdinalPatterns`](@ref)](@id example_RenyiDivergence_JointProbabilities_OrdinalPatterns)

We expect the [`RenyiDivergence`](@ref) between two uncorrelated variables to be close to zero.

```@example example_RenyiDivergence
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(RenyiDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
div_r = association(est, x, y) # pretty close to zero
```


## [`VariationDistance`](@ref)

### [From precomputed probabilities](@id example_VariationDistance_precomputed_probabilities)

```@example example_VariationDistance
using CausalityTools
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(VariationDistance(), p1, p2)
```
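This one is easy to verify by hand, assuming the total-variation form ``D_V(p \| q) = \tfrac{1}{2} \sum_i |p_i - q_i|``:

```julia
p1 = [0.1, 0.5, 0.2, 0.2]
p2 = [0.3, 0.3, 0.2, 0.2]
# ½ Σ |pᵢ - qᵢ| = (0.2 + 0.2) / 2
vd = sum(abs.(p1 .- p2)) / 2  # = 0.2
```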

### [[`JointProbabilities`](@ref) + [`OrdinalPatterns`](@ref)](@id example_VariationDistance_JointProbabilities_OrdinalPatterns)

We expect the [`VariationDistance`](@ref) between two uncorrelated variables to be close to zero.

```@example example_VariationDistance
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(VariationDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_vd = association(est, x, y) # pretty close to zero
```

## [`JointEntropyShannon`](@ref)

### [[`JointProbabilities`](@ref) with [`Dispersion`](@ref)](@id example_JointEntropyShannon_Dispersion)
@@ -45,7 +147,7 @@ association(est, x, y)

## [`ConditionalEntropyShannon`](@ref)

### Analytical examples
### [Analytical examples](@id example_ConditionalEntropyShannon_analytical)

This is essentially example 2.2.1 in Cover & Thomas (2006), where they use the following
relative frequency table as an example. Note that Julia is column-major, so we need to
@@ -88,7 +190,7 @@ pyx = Probabilities(transpose(freqs_yx))
ce_y_given_x = association(ConditionalEntropyShannon(), pyx) |> Rational
```

### [`ConditionalEntropyShannon`](@ref) with [`JointProbabilities`](@ref) estimator
### [[`JointProbabilities`](@ref) + [`CodifyVariables`](@ref) + [`UniqueElements`](@ref)](@id example_ConditionalEntropyShannon_JointProbabilities_CodifyVariables_UniqueElements)

We can of course also estimate conditional entropy from data. To do so, we'll use the
[`JointProbabilities`](@ref) estimator, which constructs a multivariate PMF for us.
@@ -98,14 +200,92 @@ are estimated under the hood for us.
Let's first demonstrate on some categorical data. For that, we must use
[`UniqueElements`](@ref) as the discretization (i.e. just count unique elements).

```@example
using CausalityTools, Random
rng = MersenneTwister(1234)
x = rand(rng, 1:3, 1000)
y = rand(rng, ["The Witcher", "Lord of the Rings"], 1000)
```@example example_ConditionalEntropyShannon_JointProbabilities_CodifyVariables_UniqueElements
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)
disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyShannon(), disc)
association(est, x, y)
association(est, rating, movie)
```
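Since `rating` and `movie` are drawn independently, the conditional entropy should be close to the unconditional entropy of `rating`, about ``\log_2 6 \approx 2.585`` bits. The underlying definition can be sketched in plain Julia on a small, hypothetical joint PMF (two independent fair binary variables; no package needed):

```julia
# Hypothetical joint PMF; rows index X, columns index Y
pxy = [0.25 0.25; 0.25 0.25]
py = sum(pxy, dims = 1)  # marginal distribution of Y
# H(X | Y) = -Σ_{x,y} p(x, y) log₂( p(x, y) / p(y) )
ce = -sum(pxy[i, j] * log2(pxy[i, j] / py[j])
          for j in axes(pxy, 2), i in axes(pxy, 1) if pxy[i, j] > 0)
# For independent fair binary variables, H(X | Y) = H(X) = 1 bit
```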

### [[`JointProbabilities`](@ref) + [`CodifyPoints`](@ref) + [`UniqueElementsEncoding`](@ref)](@id example_ConditionalEntropyShannon_JointProbabilities_CodifyPoints_UniqueElementsEncoding)

```@example example_ConditionalEntropyShannon_JointProbabilities_CodifyPoints_UniqueElementsEncoding
using CausalityTools
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyShannon(), disc);
association(est, X, Y)
```

## [`ConditionalEntropyTsallisAbe`](@ref)

### [[`JointProbabilities`](@ref) + [`CodifyVariables`](@ref) + [`UniqueElements`](@ref)](@id example_ConditionalEntropyTsallisAbe_JointProbabilities_CodifyVariables_UniqueElements)

We'll here repeat the analysis we did for [`ConditionalEntropyShannon`](@ref) above.

```@example example_ConditionalEntropyTsallisAbe_JointProbabilities_CodifyVariables_UniqueElements
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)
disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyTsallisAbe(q = 1.5), disc)
association(est, rating, movie)
```

### [[`JointProbabilities`](@ref) + [`CodifyPoints`](@ref) + [`UniqueElementsEncoding`](@ref)](@id example_ConditionalEntropyTsallisAbe_JointProbabilities_CodifyPoints_UniqueElementsEncoding)

```@example example_ConditionalEntropyTsallisAbe_JointProbabilities_CodifyPoints_UniqueElementsEncoding
using CausalityTools
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyTsallisAbe(q = 1.5), disc);
association(est, X, Y)
```


## [`ConditionalEntropyTsallisFuruichi`](@ref)

### [[`JointProbabilities`](@ref) + [`CodifyVariables`](@ref) + [`UniqueElements`](@ref)](@id example_ConditionalEntropyTsallisFuruichi_JointProbabilities_CodifyVariables_UniqueElements)

We'll here repeat the analysis we did for [`ConditionalEntropyShannon`](@ref) and [`ConditionalEntropyTsallisAbe`](@ref) above.

```@example example_ConditionalEntropyTsallisFuruichi_JointProbabilities_CodifyVariables_UniqueElements
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)
disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyTsallisFuruichi(q = 0.5), disc)
association(est, rating, movie)
```

### [[`JointProbabilities`](@ref) + [`CodifyPoints`](@ref) + [`UniqueElementsEncoding`](@ref)](@id example_ConditionalEntropyTsallisFuruichi_JointProbabilities_CodifyPoints_UniqueElementsEncoding)

```@example example_ConditionalEntropyTsallisFuruichi_JointProbabilities_CodifyPoints_UniqueElementsEncoding
using CausalityTools
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyTsallisFuruichi(q = 0.5), disc);
association(est, X, Y)
```


@@ -56,6 +56,16 @@ where ``h^S(\\cdot)`` and ``h^S(\\cdot | \\cdot)`` are the [`Shannon`](@ref)
differential entropy and Shannon joint differential entropy, respectively. This is the
definition used when calling [`entropy_conditional`](@ref) with a
[`DifferentialEntropyEstimator`](@ref).
## Estimation
- [Example 1](@ref example_ConditionalEntropyShannon_analytical): Analytical example from Cover & Thomas's book.
- [Example 2](@ref example_ConditionalEntropyShannon_JointProbabilities_CodifyVariables_UniqueElements):
  [`JointProbabilities`](@ref) estimator with [`CodifyVariables`](@ref) discretization and
[`UniqueElements`](@ref) outcome space on categorical data.
- [Example 3](@ref example_ConditionalEntropyShannon_JointProbabilities_CodifyPoints_UniqueElementsEncoding):
[`JointProbabilities`](@ref) estimator with [`CodifyPoints`](@ref) discretization and [`UniqueElementsEncoding`](@ref)
encoding of points on numerical data.
"""
Base.@kwdef struct ConditionalEntropyShannon{B} <: ConditionalEntropy
base::B = 2
@@ -28,6 +28,15 @@ H_q^{T_A}(X | Y) = \\dfrac{H_q^T(X, Y) - H_q^T(Y)}{1 + (1-q)H_q^T(Y)},
where ``H_q^T(\\cdot)`` and ``H_q^T(\\cdot, \\cdot)`` is the [`Tsallis`](@ref)
entropy and the joint Tsallis entropy.
## Estimation
- [Example 1](@ref example_ConditionalEntropyTsallisAbe_JointProbabilities_CodifyVariables_UniqueElements):
  [`JointProbabilities`](@ref) estimator with [`CodifyVariables`](@ref) discretization and
[`UniqueElements`](@ref) outcome space on categorical data.
- [Example 2](@ref example_ConditionalEntropyTsallisAbe_JointProbabilities_CodifyPoints_UniqueElementsEncoding):
[`JointProbabilities`](@ref) estimator with [`CodifyPoints`](@ref) discretization and [`UniqueElementsEncoding`](@ref)
encoding of points on numerical data.
"""
Base.@kwdef struct ConditionalEntropyTsallisAbe{B, Q} <: ConditionalEntropy
base::B = 2
@@ -35,6 +35,15 @@ p(x, y) \\log(p(x | y))
If any of the entries of the marginal distribution for `Y` are zero, then the
measure is undefined and `NaN` is returned.
## Estimation
- [Example 1](@ref example_ConditionalEntropyTsallisFuruichi_JointProbabilities_CodifyVariables_UniqueElements):
  [`JointProbabilities`](@ref) estimator with [`CodifyVariables`](@ref) discretization and
[`UniqueElements`](@ref) outcome space on categorical data.
- [Example 2](@ref example_ConditionalEntropyTsallisFuruichi_JointProbabilities_CodifyPoints_UniqueElementsEncoding):
[`JointProbabilities`](@ref) estimator with [`CodifyPoints`](@ref) discretization and [`UniqueElementsEncoding`](@ref)
encoding of points on numerical data.
"""
Base.@kwdef struct ConditionalEntropyTsallisFuruichi{B, Q} <: ConditionalEntropy
base::B = 2
@@ -61,9 +70,11 @@ function association(definition::ConditionalEntropyTsallisFuruichi, pxy::Probabi
qlog = logq0(q)
for j in 1:Ny
pyⱼ = py[j]
for i in 1:Nx
pxyᵢⱼ = pxy[i, j]
ce += pxyᵢⱼ^q * qlog(pxyᵢⱼ / pyⱼ)
if (pyⱼ > 0)
for i in 1:Nx
pxyᵢⱼ = pxy[i, j]
ce += pxyᵢⱼ^q * qlog(pxyᵢⱼ / pyⱼ)
end
end
end
ce *= -1.0
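The `pyⱼ > 0` guard added here matters: when a column marginal is zero, every joint cell in that column is zero too, and the naive summand evaluates to `0 * log(0/0) = NaN` instead of contributing zero. A minimal plain-Julia illustration (using `log` as a stand-in for the q-logarithm `qlog`):

```julia
pyj, pxyij, q = 0.0, 0.0, 1.5
# 0.0^q = 0.0, but 0.0 / 0.0 = NaN, so the product is NaN
naive = pxyij^q * log(pxyij / pyj)
# the guard skips the zero-marginal column, contributing 0.0
guarded = pyj > 0 ? pxyij^q * log(pxyij / pyj) : 0.0
```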
@@ -27,23 +27,11 @@ D_{H}(P_X(\\Omega) || P_Y(\\Omega)) =
\\dfrac{1}{\\sqrt{2}} \\sum_{\\omega \\in \\Omega} (\\sqrt{p_x(\\omega)} - \\sqrt{p_y(\\omega)})^2
```
## Examples
```julia
using CausalityTools
# From raw data
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(HellingerDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(HellingerDistance(), p1, p2)
```
## Estimation
- [Example 1](@ref example_HellingerDistance_precomputed_probabilities): From precomputed probabilities
- [Example 2](@ref example_HellingerDistance_JointProbabilities_OrdinalPatterns):
[`JointProbabilities`](@ref) with [`OrdinalPatterns`](@ref) outcome space
"""
struct HellingerDistance <: DivergenceOrDistance end

@@ -43,22 +43,11 @@ D_{KL}(P_X(\\Omega) || P_Y(\\Omega)) =
Distances.jl also defines `KLDivergence`. Qualify it if you're loading both
packages, i.e. do `association(CausalityTools.KLDivergence(), x, y)`.
## Examples
```julia
using CausalityTools
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(KLDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
# There should be zero information gain from `x` over `y` for independent random variables.
div_kl = association(est, x, y)
abs(div_kl) ≤ 0.001
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(KLDivergence(), p1, p2)
```
## Estimation
- [Example 1](@ref example_KLDivergence_precomputed_probabilities): From precomputed probabilities
- [Example 2](@ref example_KLDivergence_JointProbabilities_OrdinalPatterns):
[`JointProbabilities`](@ref) with [`OrdinalPatterns`](@ref) outcome space
"""
struct KLDivergence{B} <: DivergenceOrDistance
base::B
@@ -41,22 +41,12 @@ D_{q}(P_X(\\Omega) || P_Y(\\Omega)) =
Distances.jl also defines `RenyiDivergence`. Qualify it if you're loading both
packages, i.e. do `association(CausalityTools.RenyiDivergence(), x, y)`.
## Examples
```julia
using CausalityTools
# From raw data
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(RenyiDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(RenyiDivergence(), p1, p2)
```
## Estimation
- [Example 1](@ref example_RenyiDivergence_precomputed_probabilities): From precomputed probabilities
- [Example 2](@ref example_RenyiDivergence_JointProbabilities_OrdinalPatterns):
[`JointProbabilities`](@ref) with [`OrdinalPatterns`](@ref) outcome space
"""
struct RenyiDivergence{Q, B} <: DivergenceOrDistance
q::Q
@@ -30,20 +30,9 @@ D_{V}(P_X(\\Omega) || P_Y(\\Omega)) =
## Examples
```julia
using CausalityTools
# From raw data
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(VariationDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(VariationDistance(), p1, p2)
```
## Estimation
- [Example 1](@ref example_VariationDistance_precomputed_probabilities): From precomputed probabilities
- [Example 2](@ref example_VariationDistance_JointProbabilities_OrdinalPatterns):
[`JointProbabilities`](@ref) with [`OrdinalPatterns`](@ref) outcome space
"""
struct VariationDistance <: DivergenceOrDistance end

