New outcome space and encoding: `BubbleSortSwaps` and `BubbleSortSwapsEncoding` (#390)

* WIP: "bubble entropy"
* Fix tests and be more efficient
* Add to docs
* Rename and add more tests
* Fix StatisticalComplexity test

  Parameters must be defined inside test set
* Add example
* Add `BubbleEntropy` complexity measure
* Address comments
* Update src/outcome_spaces/bubble_sort_swaps.jl

  Co-authored-by: George Datseris <[email protected]>
* Update src/outcome_spaces/bubble_sort_swaps.jl

  Co-authored-by: George Datseris <[email protected]>
* Update src/outcome_spaces/bubble_sort_swaps.jl

  Co-authored-by: George Datseris <[email protected]>
* Remove redundant method
* Remove, not comment away
* Add change log entry and up version
* address comment

---------

Co-authored-by: George Datseris <[email protected]>
Showing 17 changed files with 284 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ name = "ComplexityMeasures" | |
uuid = "ab4b797d-85ee-42ba-b621-05d793b346a2" | ||
authors = "Kristian Agasøster Haaga <[email protected]>, George Datseries <[email protected]>" | ||
repo = "https://github.com/juliadynamics/ComplexityMeasures.jl.git" | ||
version = "3.3.0" | ||
version = "3.4.0" | ||
|
||
[deps] | ||
Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,3 +50,9 @@ entropy_complexity_curves | |
```@docs | ||
LempelZiv76 | ||
``` | ||
|
||
## Bubble entropy | ||
|
||
```@docs | ||
BubbleEntropy | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
export BubbleEntropy | ||
|
||
""" | ||
BubbleEntropy <: ComplexityEstimator | ||
BubbleEntropy(; m = 3, τ = 1, definition = Renyi(q = 2)) | ||
The `BubbleEntropy` complexity estimator [Manis2017](@cite) is just a difference | ||
between two entropies, each computed with the [`BubbleSortSwaps`](@ref) outcome space, for | ||
embedding dimensions `m + 1` and `m`, respectively. | ||
[Manis2017](@citet) use the [`Renyi`](@ref) entropy of order `q = 2` as the | ||
information measure `definition`, but here you can use any [`InformationMeasure`](@ref). | ||
[Manis2017](@citet) formulate the "bubble entropy" as the normalized measure below, | ||
while here you can also compute the unnormalized measure. | ||
## Definition | ||
For input data `x`, the "bubble entropy" is computed by first embedding the input data | ||
using embedding dimension `m` and embedding delay `τ` (call the embedded points `y`), and | ||
then computing the difference between the two entropies: | ||
```math | ||
BubbleEn_T(τ) = H_T(y, m + 1) - H_T(y, m) | ||
``` | ||
where ``H_T(y, m)`` and ``H_T(y, m + 1)`` are entropies of type ``T`` | ||
(e.g. [`Renyi`](@ref)) computed with the input data `x` embedded to dimension ``m`` and | ||
``m+1``, respectively. Use [`complexity`](@ref) to compute this non-normalized version. | ||
Use [`complexity_normalized`](@ref) to compute the normalized difference of entropies: | ||
```math | ||
BubbleEn_T(τ)^{norm} = | ||
\\dfrac{H_T(y, m + 1) - H_T(y, m)}{\\max(H_T(y, m + 1)) - \\max(H_T(y, m))}, | ||
``` | ||
where the maxima of the entropies for dimensions `m` and `m + 1` are computed using | ||
[`information_maximum`](@ref). | ||
## Example | ||
```julia | ||
using ComplexityMeasures | ||
x = rand(1000) | ||
est = BubbleEntropy(m = 5, τ = 3) | ||
complexity(est, x) | ||
``` | ||
""" | ||
Base.@kwdef struct BubbleEntropy{M, T, D} <: ComplexityEstimator | ||
m::M = 3 | ||
τ::T = 1 | ||
definition::D = Renyi(q = 2) | ||
end | ||
|
||
function complexity(est::BubbleEntropy, x) | ||
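# Forward the embedding delay `τ` too, so the estimator honors its own field. | ||
o_m = BubbleSortSwaps(m = est.m, τ = est.τ) | ||
o_m₊₁ = BubbleSortSwaps(m = est.m + 1, τ = est.τ) | ||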
h_m = information(est.definition, o_m, x) | ||
h_m₊₁ = information(est.definition, o_m₊₁, x) | ||
return h_m₊₁ - h_m | ||
end | ||
|
||
function complexity_normalized(est::BubbleEntropy, x) | ||
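# Forward the embedding delay `τ` too, so the estimator honors its own field. | ||
o_m = BubbleSortSwaps(m = est.m, τ = est.τ) | ||
o_m₊₁ = BubbleSortSwaps(m = est.m + 1, τ = est.τ) | ||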
h_m = information(est.definition, o_m, x) | ||
h_m₊₁ = information(est.definition, o_m₊₁, x) | ||
|
||
# The normalized factor as (I think) described in Manis et al. (2017). | ||
# Their description is a bit unclear to me. | ||
h_max_m = information_maximum(est.definition, o_m, x) | ||
h_max_m₊₁ = information_maximum(est.definition, o_m₊₁, x) | ||
norm_factor = (h_max_m₊₁ - h_max_m) # maximum difference for dims `m` and `m + 1` | ||
|
||
return (h_m₊₁ - h_m)/norm_factor | ||
end |
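
The definition in the docstring above can be sketched without the package. This is a minimal plain-Julia illustration, not the package's implementation: `swaps`, `delay_vectors`, `renyi2_swaps` and `bubble_entropy` are hypothetical helper names, the entropy uses the natural logarithm (the package's default base may differ), and the probabilities are plain relative frequencies.

```julia
# Hypothetical helper: the number of bubble sort swaps equals the number of
# inversions, i.e. pairs i < j with v[i] > v[j].
swaps(v) = count(v[i] > v[j] for i in eachindex(v) for j in (i + 1):lastindex(v))

# Hypothetical stand-in for a delay embedding: length-m vectors with lag τ.
delay_vectors(x, m, τ) = [x[i:τ:(i + (m - 1) * τ)] for i in 1:(length(x) - (m - 1) * τ)]

# Rényi entropy of order 2 (natural log) of the swap-count distribution.
function renyi2_swaps(x, m, τ)
    codes = [swaps(v) for v in delay_vectors(x, m, τ)]
    nmax = (m * (m - 1)) ÷ 2                       # worst-case number of swaps
    p = [count(==(k), codes) for k in 0:nmax] ./ length(codes)
    return -log(sum(abs2, p))
end

# Unnormalized "bubble entropy": entropy at dimension m + 1 minus entropy at m.
bubble_entropy(x; m = 3, τ = 1) = renyi2_swaps(x, m + 1, τ) - renyi2_swaps(x, m, τ)

x = [sin(0.7 * t) + 0.3 * cos(2.3 * t) for t in 1:500]
bubble_entropy(x; m = 3, τ = 2)
```

For a strictly increasing series every state vector needs zero swaps, so both entropies vanish and the difference is zero, which makes a convenient sanity check.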
61 changes: 61 additions & 0 deletions
src/encoding_implementations/bubble_sort_swaps_encoding.jl
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
using StaticArrays | ||
export BubbleSortSwapsEncoding | ||
""" | ||
BubbleSortSwapsEncoding <: Encoding | ||
BubbleSortSwapsEncoding{m}() | ||
`BubbleSortSwapsEncoding` is used with [`encode`](@ref) to encode a length-`m` input | ||
vector `x` into an integer in the range `ω ∈ 0:((m*(m-1)) ÷ 2)`, by counting the number | ||
of swaps required for the bubble sort algorithm to sort `x` in ascending order. | ||
[`decode`](@ref) is not implemented for this encoding. | ||
## Example | ||
```julia | ||
using ComplexityMeasures | ||
x = [1, 5, 3, 1, 2] | ||
e = BubbleSortSwapsEncoding{5}() # constructor type argument must match length of vector | ||
encode(e, x) | ||
``` | ||
""" | ||
struct BubbleSortSwapsEncoding{m, V <: AbstractVector} <: Encoding | ||
x::V # tmp vector | ||
end | ||
|
||
function BubbleSortSwapsEncoding{m}() where {m} | ||
v = zeros(m) | ||
return BubbleSortSwapsEncoding{m, typeof(v)}(v) | ||
end | ||
|
||
function encode(encoding::BubbleSortSwapsEncoding, x::AbstractVector) | ||
return n_swaps_for_bubblesort(encoding, x) | ||
end | ||
|
||
# super naive bubble sort | ||
function n_swaps_for_bubblesort(encoding::BubbleSortSwapsEncoding, state_vector) | ||
(; x) = encoding | ||
x .= state_vector | ||
L = length(state_vector) | ||
n = 0 | ||
swapped = true | ||
while swapped | ||
swapped = false | ||
n_swaps = 0 | ||
for j = 1:(L - 1) | ||
if x[j] > x[j+1] | ||
n_swaps += 1 | ||
x[j], x[j+1] = x[j+1], x[j] # move the larger element to the right | ||
end | ||
end | ||
if iszero(n_swaps) | ||
return n | ||
else | ||
swapped = true | ||
n += n_swaps | ||
end | ||
end | ||
return n | ||
end | ||
|
||
# there's no meaningful way to define `decode`, so it is not implemented. |
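
As a cross-check on the encoding above: the number of swaps bubble sort performs equals the number of inversions in the input (pairs `i < j` with `x[i] > x[j]`). The sketch below is standalone and uses hypothetical names (`bubble_swaps`, `inversions`); it sorts a copy rather than a preallocated buffer, so it is not the implementation in this diff.

```julia
# Count the adjacent swaps performed by bubble sort; works on a copy of `v`.
function bubble_swaps(v::AbstractVector)
    x = collect(v)                  # copy, so the caller's data is untouched
    n = 0
    for pass in 1:(length(x) - 1)
        swapped = false
        for j in 1:(length(x) - pass)
            if x[j] > x[j + 1]
                x[j], x[j + 1] = x[j + 1], x[j]   # larger element moves right
                n += 1
                swapped = true
            end
        end
        swapped || break            # no swaps in this pass: already sorted
    end
    return n
end

# Count inversions directly: pairs i < j with v[i] > v[j].
inversions(v) = count(v[i] > v[j] for i in eachindex(v) for j in (i + 1):lastindex(v))

x = [1, 5, 3, 1, 2]                 # same vector as in the docstring example
bubble_swaps(x), inversions(x)      # both give 5 for this vector
```

Ties are not swapped (the comparison is strict), so repeated values such as the two `1`s above contribute no swaps.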
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
export BubbleSortSwaps | ||
|
||
""" | ||
BubbleSortSwaps <: CountBasedOutcomeSpace | ||
BubbleSortSwaps(; m = 3, τ = 1) | ||
The `BubbleSortSwaps` outcome space is based on [Manis2017](@citet)'s | ||
paper on "bubble entropy". | ||
## Description | ||
`BubbleSortSwaps` does the following: | ||
- Embeds the input data using embedding dimension `m` and embedding lag `τ` | ||
- For each state vector in the embedding, counts how many swaps are necessary for | ||
the bubble sort algorithm to sort that state vector. | ||
For [`counts_and_outcomes`](@ref), we then define a distribution over the number of | ||
necessary swaps. This distribution can then be used to estimate probabilities using | ||
[`probabilities_and_outcomes`](@ref), which again can be used to estimate any | ||
[`InformationMeasure`](@ref). An example of how to compute the "Shannon bubble entropy" | ||
is given below. | ||
## Outcome space | ||
The [`outcome_space`](@ref) for `BubbleSortSwaps` is the set of integers | ||
`0:N`, where `N = (m * (m - 1)) ÷ 2` (the worst-case number of swaps). Hence, | ||
the number of [`total_outcomes`](@ref) is `N + 1`. | ||
## Implements | ||
- [`codify`](@ref). Returns the number of swaps required for each embedded state vector. | ||
## Examples | ||
With the `BubbleSortSwaps` outcome space, we can easily compute a "bubble entropy" | ||
inspired by [Manis2017](@cite). Note: this is not actually a new entropy - it is just | ||
a new way of discretizing the input data. To reproduce the bubble entropy complexity | ||
measure from [Manis2017](@cite), see [`BubbleEntropy`](@ref). | ||
```julia | ||
using ComplexityMeasures | ||
x = rand(100000) | ||
o = BubbleSortSwaps(; m = 5) # 5-dimensional embedding vectors | ||
information(Shannon(; base = 2), o, x) | ||
# We can also compute any other "bubble quantity", for example the | ||
# "Tsallis bubble extropy", with arbitrary probabilities estimators: | ||
information(TsallisExtropy(), BayesianRegularization(), o, x) | ||
``` | ||
""" | ||
Base.@kwdef struct BubbleSortSwaps{M, T} <: CountBasedOutcomeSpace | ||
m::M = 3 | ||
τ::T = 1 | ||
end | ||
|
||
# Add one to the total number of possible swaps because it may happen that we don't | ||
# need to swap. | ||
total_outcomes(o::BubbleSortSwaps{m}) where {m} = round(Int, (o.m * (o.m - 1)) / 2) + 1 | ||
outcome_space(o::BubbleSortSwaps{m}) where {m} = 0:(total_outcomes(o) - 1) | ||
|
||
function counts_and_outcomes(o::BubbleSortSwaps, x) | ||
observed_outs = codify(o, x) | ||
return counts_and_outcomes(UniqueElements(), observed_outs) | ||
end | ||
|
||
function codify(o::BubbleSortSwaps, x) | ||
encoding = BubbleSortSwapsEncoding{o.m}() | ||
x_embedded = vec(embed(x, o.m, o.τ)) | ||
return encode.(Ref(encoding), x_embedded) | ||
end |
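
To make the outcome space concrete, here is a small plain-Julia sketch for `m = 3`. The helpers `swaps` and `delay_vectors` are hypothetical stand-ins for the encoding and the delay embedding, and the counts are binned over the full outcome space `0:N` (including unobserved outcomes), which is an illustrative choice rather than what `counts_and_outcomes` returns.

```julia
# Hypothetical helper: bubble sort swap count = number of inversions.
swaps(v) = count(v[i] > v[j] for i in eachindex(v) for j in (i + 1):lastindex(v))

# Hypothetical stand-in for a delay embedding: length-m vectors with lag τ.
delay_vectors(x, m, τ) = [x[i:τ:(i + (m - 1) * τ)] for i in 1:(length(x) - (m - 1) * τ)]

m, τ = 3, 1
x = [4, 1, 3, 2, 5, 0]

# One swap count per state vector:
# [4,1,3] → 2, [1,3,2] → 1, [3,2,5] → 1, [2,5,0] → 2
codes = [swaps(v) for v in delay_vectors(x, m, τ)]

# The worst case is a reverse-sorted vector, which needs m(m-1)/2 swaps.
max_swaps = (m * (m - 1)) ÷ 2
worst = swaps(collect(m:-1:1))      # equals max_swaps

# One bin per possible outcome 0:max_swaps, i.e. max_swaps + 1 outcomes in total.
counts = [count(==(k), codes) for k in 0:max_swaps]
probs = counts ./ sum(counts)
```

Here `codes` has `length(x) - (m - 1) * τ` entries, matching the length check in the test file for `τ = 1`.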
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
using Test, ComplexityMeasures | ||
using Random; rng = MersenneTwister(1234) | ||
|
||
x = rand(rng, 1000) | ||
est = BubbleEntropy(m = 5) | ||
@test complexity(est, x) isa Real | ||
@test 0.0 <= complexity_normalized(est, x) <= 1.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
using Test, ComplexityMeasures | ||
using Random; rng = MersenneTwister(1234) | ||
using DelayEmbeddings | ||
|
||
x = rand(rng, 10000) # use the seeded `rng` so the test is reproducible | ||
m = 5 | ||
x_embed = embed(x, m, 1) | ||
encoding = BubbleSortSwapsEncoding{m}() | ||
symbols = encode.(Ref(encoding), x_embed.data) | ||
@test all(0 .<= symbols .<= (m * (m - 1)) ÷ 2) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
using ComplexityMeasures, Test | ||
using Random; rng = MersenneTwister(1234) | ||
|
||
# Constructor | ||
@test BubbleSortSwaps(; m = 3, τ = 1) isa BubbleSortSwaps | ||
@test BubbleSortSwaps(; m = 3, τ = 1) isa ComplexityMeasures.CountBasedOutcomeSpace | ||
|
||
# Codify | ||
x = rand(rng, 100000) # enough points to cover the outcome space for small `m` | ||
m = 3 | ||
o = BubbleSortSwaps(; m = m, τ = 1) | ||
observed_outs = codify(o, x) | ||
@test length(observed_outs) == length(x) - (m - 1) | ||
|
||
# Outcomes | ||
o = BubbleSortSwaps(; m = 3, τ = 1) | ||
cts, outs = counts_and_outcomes(o, x) | ||
@test total_outcomes(o) == (m * (m - 1) / 2) + 1 | ||
@test total_outcomes(o, x) == total_outcomes(o) | ||
@test outcome_space(o) == collect(0:(total_outcomes(o) - 1)) # 0 included, so 1 less | ||
@test outs == outcome_space(o) # should be enough points in `x` to be true |
89c61a8
@JuliaRegistrator register
Registration pull request created: JuliaRegistries/General/99000
Tip: Release Notes
Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates.
To add them here just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface.
@JuliaRegistrator register
Release notes:
3.4
- `BubbleEntropy`
- `BubbleSortSwaps`
- `BubbleSortSwapsEncoding`
Registration pull request updated: JuliaRegistries/General/99000
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface.