New outcome space and encoding: BubbleSortSwaps and `BubbleSortSwapsEncoding` (#390)

* WIP: "bubble entropy"

* Fix tests and be more efficient

* Add to docs

* Rename and add more tests

* Fix StatisticalComplexity test

Parameters must be defined inside test set

* Add example

* Add `BubbleEntropy` complexity measure

* Address comments

* Update src/outcome_spaces/bubble_sort_swaps.jl

Co-authored-by: George Datseris <[email protected]>

* Update src/outcome_spaces/bubble_sort_swaps.jl

Co-authored-by: George Datseris <[email protected]>

* Update src/outcome_spaces/bubble_sort_swaps.jl

Co-authored-by: George Datseris <[email protected]>

* Remove redundant method

* Remove, not comment away

* Add change log entry and up version

* address comment

---------

Co-authored-by: George Datseris <[email protected]>
kahaaga and Datseris authored Jan 16, 2024
1 parent c191814 commit 89c61a8
Showing 17 changed files with 284 additions and 1 deletion.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,12 @@

Changelog is kept with respect to version 0.11 of Entropies.jl. From version v2.0 onwards, this package has been renamed to ComplexityMeasures.jl.

## 3.4

- New complexity measure: `BubbleEntropy`.
- New outcome space: `BubbleSortSwaps`.
- New encoding: `BubbleSortSwapsEncoding`.

## 3.3

- Added the `SequentialPairDistances` outcome space. In the literature, this outcome
2 changes: 1 addition & 1 deletion Project.toml
@@ -2,7 +2,7 @@ name = "ComplexityMeasures"
uuid = "ab4b797d-85ee-42ba-b621-05d793b346a2"
authors = "Kristian Agasøster Haaga <[email protected]>, George Datseries <[email protected]>"
repo = "https://github.com/juliadynamics/ComplexityMeasures.jl.git"
version = "3.3.0"
version = "3.4.0"

[deps]
Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"
11 changes: 11 additions & 0 deletions docs/refs.bib
@@ -30,6 +30,17 @@ @article{Li2019
publisher={MDPI}
}

@article{Manis2017,
title={Bubble entropy: An entropy almost free of parameters},
author={Manis, George and Aktaruzzaman, MD and Sassi, Roberto},
journal={IEEE Transactions on Biomedical Engineering},
volume={64},
number={11},
pages={2711--2718},
year={2017},
publisher={IEEE}
}

@article{Zhou2023,
title={Using missing dispersion patterns to detect determinism and nonlinearity in time series data},
author={Zhou, Qin and Shang, Pengjian and Zhang, Boyi},
6 changes: 6 additions & 0 deletions docs/src/complexity.md
@@ -50,3 +50,9 @@ entropy_complexity_curves
```@docs
LempelZiv76
```

## Bubble entropy

```@docs
BubbleEntropy
```
7 changes: 7 additions & 0 deletions docs/src/probabilities.md
@@ -92,6 +92,12 @@ Diversity
SequentialPairDistances
```

### Bubble sort swaps

```@docs
BubbleSortSwaps
```

### Spatial outcome spaces

```@docs
@@ -161,5 +167,6 @@ RelativeMeanEncoding
RelativeFirstDifferenceEncoding
UniqueElementsEncoding
PairDistanceEncoding
BubbleSortSwapsEncoding
CombinationEncoding
```
75 changes: 75 additions & 0 deletions src/complexity_measures/bubble_entropy.jl
@@ -0,0 +1,75 @@
export BubbleEntropy

"""
    BubbleEntropy <: ComplexityEstimator
    BubbleEntropy(; m = 3, τ = 1, definition = Renyi(q = 2))

The `BubbleEntropy` complexity estimator [Manis2017](@cite) is the difference
between two entropies, each computed with the [`BubbleSortSwaps`](@ref) outcome space, for
embedding dimensions `m + 1` and `m`, respectively.

[Manis2017](@citet) use the [`Renyi`](@ref) entropy of order `q = 2` as the
information measure `definition`, but here you can use any [`InformationMeasure`](@ref).

[Manis2017](@citet) formulate the "bubble entropy" as the normalized measure below,
while here you can also compute the unnormalized measure.

## Definition

For input data `x`, the "bubble entropy" is computed by embedding `x` twice with
embedding delay `τ`, once with embedding dimension `m` and once with `m + 1`, and
then computing the difference between the two entropies:

```math
BubbleEn_T(τ) = H_T(x, m + 1) - H_T(x, m)
```

where ``H_T(x, m)`` and ``H_T(x, m + 1)`` are entropies of type ``T``
(e.g. [`Renyi`](@ref)) computed with the input data `x` embedded to dimension ``m`` and
``m+1``, respectively. Use [`complexity`](@ref) to compute this non-normalized version.
Use [`complexity_normalized`](@ref) to compute the normalized difference of entropies:

```math
BubbleEn_T^{norm}(τ) =
\\dfrac{H_T(x, m + 1) - H_T(x, m)}{\\max(H_T(x, m + 1)) - \\max(H_T(x, m))},
```

where the maxima of the entropies for dimensions `m` and `m + 1` are computed using
[`information_maximum`](@ref).

## Example

```julia
using ComplexityMeasures
x = rand(1000)
est = BubbleEntropy(m = 5, τ = 3)
complexity(est, x)
```
"""
Base.@kwdef struct BubbleEntropy{M, T, D} <: ComplexityEstimator
    m::M = 3
    τ::T = 1
    definition::D = Renyi(q = 2)
end

function complexity(est::BubbleEntropy, x)
    # Pass `τ` on to the outcome spaces; otherwise the default `τ = 1` is always used.
    o_m = BubbleSortSwaps(m = est.m, τ = est.τ)
    o_m₊₁ = BubbleSortSwaps(m = est.m + 1, τ = est.τ)
    h_m = information(est.definition, o_m, x)
    h_m₊₁ = information(est.definition, o_m₊₁, x)
    return h_m₊₁ - h_m
end

function complexity_normalized(est::BubbleEntropy, x)
    o_m = BubbleSortSwaps(m = est.m, τ = est.τ)
    o_m₊₁ = BubbleSortSwaps(m = est.m + 1, τ = est.τ)
    h_m = information(est.definition, o_m, x)
    h_m₊₁ = information(est.definition, o_m₊₁, x)

    # The normalized factor as (I think) described in Manis et al. (2017).
    # Their description is a bit unclear to me.
    h_max_m = information_maximum(est.definition, o_m, x)
    h_max_m₊₁ = information_maximum(est.definition, o_m₊₁, x)
    norm_factor = (h_max_m₊₁ - h_max_m) # maximum difference for dims `m` and `m + 1`

    return (h_m₊₁ - h_m)/norm_factor
end
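
As a worked illustration of the normalization factor in `complexity_normalized`, and assuming that `information_maximum` returns the logarithm of the number of possible outcomes (which is the maximum of a Rényi entropy, attained by the uniform distribution), the denominator can be written out explicitly. Here ``K_m = m(m-1)/2 + 1`` is the number of outcomes for dimension ``m``, matching `total_outcomes` for `BubbleSortSwaps`; the symbol ``K_m`` is introduced only for this sketch.

```math
\max(H_T(x, m + 1)) - \max(H_T(x, m)) = \log K_{m+1} - \log K_m
= \log \dfrac{m(m+1) + 2}{m(m-1) + 2}
```

For the default `m = 3` this gives ``\log(7/4)``, since there are 4 possible swap counts for dimension 3 and 7 for dimension 4.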
1 change: 1 addition & 0 deletions src/complexity_measures/complexity_measures.jl
@@ -4,3 +4,4 @@ include("reverse_dispersion_entropy.jl")
include("missing_dispersion.jl")
include("statistical_complexity.jl")
include("lempel_ziv.jl")
include("bubble_entropy.jl")
61 changes: 61 additions & 0 deletions src/encoding_implementations/bubble_sort_swaps_encoding.jl
@@ -0,0 +1,61 @@
using StaticArrays
export BubbleSortSwapsEncoding
"""
    BubbleSortSwapsEncoding <: Encoding
    BubbleSortSwapsEncoding{m}()

`BubbleSortSwapsEncoding` is used with [`encode`](@ref) to encode a length-`m` input
vector `x` into an integer in the range `ω ∈ 0:((m*(m-1)) ÷ 2)`, by counting the number
of swaps required for the bubble sort algorithm to sort `x` in ascending order.

[`decode`](@ref) is not implemented for this encoding.

## Example

```julia
using ComplexityMeasures
x = [1, 5, 3, 1, 2]
e = BubbleSortSwapsEncoding{5}() # constructor type argument must match length of vector
encode(e, x)
```
"""
struct BubbleSortSwapsEncoding{m, V <: AbstractVector} <: Encoding
    x::V # tmp vector
end

function BubbleSortSwapsEncoding{m}() where {m}
    v = zeros(m)
    return BubbleSortSwapsEncoding{m, typeof(v)}(v)
end

function encode(encoding::BubbleSortSwapsEncoding, x::AbstractVector)
    return n_swaps_for_bubblesort(encoding, x)
end

# super naive bubble sort
function n_swaps_for_bubblesort(encoding::BubbleSortSwapsEncoding, state_vector)
    (; x) = encoding
    x .= state_vector # copy into the tmp vector, so the input is not mutated
    L = length(state_vector)
    n = 0 # total number of swaps over all passes
    swapped = true
    while swapped
        swapped = false
        n_swaps = 0 # number of swaps in the current pass
        for j = 1:(L - 1)
            if x[j] > x[j+1]
                n_swaps += 1
                x[j], x[j+1] = x[j+1], x[j] # move the larger element to the right
            end
        end
        if iszero(n_swaps)
            return n
        else
            swapped = true
            n += n_swaps
        end
    end
    return n
end

# there's no meaningful way to define `decode`, so it is not implemented.
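
The swap count produced by the encoding above equals the number of inversions in the input vector (pairs `i < j` with `x[i] > x[j]`), which is why it is bounded by `(m*(m-1)) ÷ 2`. The following is a minimal, package-free sketch of that equivalence; `count_bubble_swaps` and `count_inversions` are hypothetical helper names and are not part of ComplexityMeasures.jl.

```julia
# Hypothetical helpers for illustration only; not part of ComplexityMeasures.jl.

# Count the adjacent swaps performed by a plain bubble sort on a copy of `v`.
function count_bubble_swaps(v::AbstractVector)
    x = collect(v)   # work on a copy so the input is left untouched
    n = 0
    swapped = true
    while swapped
        swapped = false
        for j in 1:(length(x) - 1)
            if x[j] > x[j + 1]
                x[j], x[j + 1] = x[j + 1], x[j]
                n += 1
                swapped = true
            end
        end
    end
    return n
end

# Count inversions directly: pairs (i, j) with i < j and v[i] > v[j].
count_inversions(v::AbstractVector) =
    count(v[i] > v[j] for i in 1:length(v) for j in (i + 1):length(v))

count_bubble_swaps([1, 5, 3, 1, 2])   # 5
count_inversions([1, 5, 3, 1, 2])     # 5
count_bubble_swaps([5, 4, 3, 2, 1])   # 10, the worst case for m = 5
```

A fully reversed vector of length `m` has every pair inverted, which gives the worst case of `(m*(m-1)) ÷ 2` swaps.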
1 change: 1 addition & 0 deletions src/encoding_implementations/encoding_implementations.jl
@@ -5,5 +5,6 @@ include("ordinal_pattern.jl")
include("relative_mean_encoding.jl")
include("relative_first_difference_encoding.jl")
include("unique_elements_encoding.jl")
include("bubble_sort_swaps_encoding.jl")
include("combination_encoding.jl")
include("distance_pair_encoding.jl")
73 changes: 73 additions & 0 deletions src/outcome_spaces/bubble_sort_swaps.jl
@@ -0,0 +1,73 @@
export BubbleSortSwaps

"""
    BubbleSortSwaps <: CountBasedOutcomeSpace
    BubbleSortSwaps(; m = 3, τ = 1)

The `BubbleSortSwaps` outcome space is based on [Manis2017](@citet)'s
paper on "bubble entropy".

## Description

`BubbleSortSwaps` does the following:

- Embeds the input data using embedding dimension `m` and embedding lag `τ`.
- For each state vector in the embedding, counts how many swaps are necessary for
  the bubble sort algorithm to sort the state vector.

For [`counts_and_outcomes`](@ref), we then define a distribution over the number of
necessary swaps. This distribution can then be used to estimate probabilities using
[`probabilities_and_outcomes`](@ref), which again can be used to estimate any
[`InformationMeasure`](@ref). An example of how to compute the "Shannon bubble entropy"
is given below.

## Outcome space

The [`outcome_space`](@ref) for `BubbleSortSwaps` are the integers
`0:N`, where `N = (m * (m - 1)) ÷ 2` (the worst-case number of swaps). Hence,
the number of [`total_outcomes`](@ref) is `N + 1`.

## Implements

- [`codify`](@ref). Returns the number of swaps required for each embedded state vector.

## Examples

With the `BubbleSortSwaps` outcome space, we can easily compute a "bubble entropy"
inspired by [Manis2017](@cite). Note: this is not actually a new entropy - it is just
a new way of discretizing the input data. To reproduce the bubble entropy complexity
measure from [Manis2017](@cite), see [`BubbleEntropy`](@ref).

```julia
using ComplexityMeasures
x = rand(100000)
o = BubbleSortSwaps(; m = 5) # 5-dimensional embedding vectors
information(Shannon(; base = 2), o, x)

# We can also compute any other "bubble quantity", for example the
# "Tsallis bubble extropy", with arbitrary probabilities estimators:
information(TsallisExtropy(), BayesianRegularization(), o, x)
```
"""
Base.@kwdef struct BubbleSortSwaps{M, T} <: CountBasedOutcomeSpace
    m::M = 3
    τ::T = 1
end

# Add one to the total number of possible swaps because it may happen that we don't
# need to swap. (The embedding dimension is read from the field `o.m`, so no type
# parameter is needed in the method signatures.)
total_outcomes(o::BubbleSortSwaps) = round(Int, (o.m * (o.m - 1)) / 2) + 1
outcome_space(o::BubbleSortSwaps) = 0:(total_outcomes(o) - 1)

function counts_and_outcomes(o::BubbleSortSwaps, x)
    observed_outs = codify(o, x)
    return counts_and_outcomes(UniqueElements(), observed_outs)
end

function codify(o::BubbleSortSwaps, x)
    encoding = BubbleSortSwapsEncoding{o.m}()
    x_embedded = vec(embed(x, o.m, o.τ))
    return encode.(Ref(encoding), x_embedded)
end
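
The pipeline that `codify` and `counts_and_outcomes` implement (embed, count swaps per state vector, tally the counts) can be sketched without the package. This is a rough illustration under stated assumptions, not the package's implementation: `swap_counts` and `shannon_entropy` are hypothetical names, the delay embedding is assumed to use forward windows `x[i], x[i+τ], …, x[i+(m-1)τ]`, and the swap count is computed as the inversion count of each window.

```julia
# Hypothetical, package-free sketch; names are illustrative only.

# Histogram of bubble sort swap counts over all delay-embedded windows of `x`.
function swap_counts(x::AbstractVector, m::Int, τ::Int = 1)
    n_vectors = length(x) - (m - 1) * τ
    counts = zeros(Int, (m * (m - 1)) ÷ 2 + 1)   # index k + 1 holds the tally for k swaps
    for i in 1:n_vectors
        window = x[i:τ:(i + (m - 1) * τ)]        # assumed forward delay embedding
        # The inversion count equals the number of bubble sort swaps.
        k = count(window[a] > window[b] for a in 1:m for b in (a + 1):m)
        counts[k + 1] += 1
    end
    return counts
end

# Plug-in Shannon entropy of a count vector, skipping empty bins.
function shannon_entropy(counts; base = 2)
    p = counts ./ sum(counts)
    return -sum((q * log(base, q) for q in p if q > 0); init = 0.0)
end

x = rand(10_000)
h = shannon_entropy(swap_counts(x, 5))   # a "Shannon bubble entropy" for m = 5
```

The histogram has one bin per possible swap count, so unobserved swap counts show up as zeros rather than being dropped.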
1 change: 1 addition & 0 deletions src/outcome_spaces/outcome_spaces.jl
@@ -8,4 +8,5 @@ include("transfer_operator/transfer_operator.jl")
include("dispersion.jl")
include("cosine_similarity_binning.jl")
include("sequential_pair_distances.jl")
include("bubble_sort_swaps.jl")
include("spatial/spatial.jl")
1 change: 1 addition & 0 deletions test/complexity/complexity.jl
@@ -6,6 +6,7 @@
testfile("measures/entropy_sample.jl")
testfile("measures/statistical_complexity.jl")
testfile("measures/lempel_ziv.jl")
testfile("measures/entropy_bubble.jl")

testfile("missing_outcomes.jl")
end
7 changes: 7 additions & 0 deletions test/complexity/measures/entropy_bubble.jl
@@ -0,0 +1,7 @@
using Test, ComplexityMeasures
using Random; rng = MersenneTwister(1234)

x = rand(rng, 1000)
est = BubbleEntropy(m = 5)
@test complexity(est, x) isa Real
@test 0.0 <= complexity_normalized(est, x) <= 1.0
1 change: 1 addition & 0 deletions test/encodings/encodings.jl
@@ -6,4 +6,5 @@ testfile("encodings/ordinal_pattern_encoding.jl")
testfile("encodings/rectangular_bin_encoding.jl")
testfile("encodings/unique_elements_encoding.jl")
testfile("encodings/distance_pair_encoding.jl")
testfile("encodings/bubble_sort_swaps_encoding.jl")
testfile("encodings/combination_encoding.jl")
10 changes: 10 additions & 0 deletions test/encodings/encodings/bubble_sort_swaps_encoding.jl
@@ -0,0 +1,10 @@
using Test, ComplexityMeasures
using Random; rng = MersenneTwister(1234)
using DelayEmbeddings

x = rand(rng, 10000) # use the seeded `rng` so the test is reproducible
m = 5
x_embed = embed(x, m, 1)
encoding = BubbleSortSwapsEncoding{m}()
symbols = encode.(Ref(encoding), x_embed.data)
@test all(0 .<= symbols .<= (m * (m - 1)) ÷ 2)
21 changes: 21 additions & 0 deletions test/outcome_spaces/implementations/bubble_sort_swaps.jl
@@ -0,0 +1,21 @@
using ComplexityMeasures, Test
using Random; rng = MersenneTwister(1234)

# Constructor
@test BubbleSortSwaps(; m = 3, τ = 1) isa BubbleSortSwaps
@test BubbleSortSwaps(; m = 3, τ = 1) isa ComplexityMeasures.CountBasedOutcomeSpace

# Codify
x = rand(rng, 100000) # enough points to cover the outcome space for small `m`
m = 3
o = BubbleSortSwaps(; m = m, τ = 1)
observed_outs = codify(o, x)
@test length(observed_outs) == length(x) - (m - 1)

# Outcomes
o = BubbleSortSwaps(; m = 3, τ = 1)
cts, outs = counts_and_outcomes(o, x)
@test total_outcomes(o) == (m * (m - 1) / 2) + 1
@test total_outcomes(o, x) == total_outcomes(o)
@test outcome_space(o) == collect(0:(total_outcomes(o) - 1)) # 0 included, so 1 less
@test outs == outcome_space(o) # should be enough points in `x` to be true
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -20,6 +20,7 @@ testfile(file, testname=defaultname(file)) = @testset "$testname" begin; include
testfile("outcome_spaces/implementations/dispersion.jl")
testfile("outcome_spaces/implementations/cosine_similarity_binning.jl")
testfile("outcome_spaces/implementations/sequential_pair_distances.jl")
testfile("outcome_spaces/implementations/bubble_sort_swaps.jl")
testfile("outcome_spaces/implementations/spatial/spatial_ordinal_patterns.jl")
testfile("outcome_spaces/implementations/spatial/spatial_dispersion.jl")


4 comments on commit 89c61a8

@kahaaga

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/99000

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v3.4.0 -m "<description of version>" 89c61a8b4cc260b95e1146fc0fe00b6c5c643ebb
git push origin v3.4.0

@kahaaga

@JuliaRegistrator register

Release notes:

3.4

  • New complexity measure: BubbleEntropy.
  • New outcome space: BubbleSortSwaps.
  • New encoding: BubbleSortSwapsEncoding.

@JuliaRegistrator

Registration pull request updated: JuliaRegistries/General/99000

