Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

O(n) worst case sampling runtime #21

Open
LilithHafner opened this issue Dec 11, 2024 · 3 comments
Open

O(n) worst case sampling runtime #21

LilithHafner opened this issue Dec 11, 2024 · 3 comments

Comments

@LilithHafner
Copy link
Collaborator

The worst case sample time is also O(n) (or O(1) with a large constant factor, if you prefer to think about it that way), achieved when the list of levels starts with a lot of zeros:

julia> using DynamicSampling, Chairmarks

julia> ds = DynamicSampler()
DynamicSampler(Tuple{Int64, Float64}[])

julia> push!(ds, 1, 1.0)
DynamicSampler([(1, 1.0)])

julia> @b ds rand
8.429 ns

julia> for i in 2:1000
           push!(ds, i, 2.0^i)
       end

julia> @b ds rand
16.424 ns

julia> for i in 2:1000
           delete!(ds, i)
       end

julia> @b ds rand
1.292 μs (1 allocs: 48 bytes)

It's possible that this could be fixed by performing a partial sort while traversing the list during sampling, though maybe there's another way with less sampling overhead.

@LilithHafner
Copy link
Collaborator Author

IIUC, the best this algorithm can get is O(log(n)) sampling time because of the case where weights are of the form [1, 1/2, 1/2, 1/4, 1/4, 1/4, 1/4, ...] and level selection alone takes log(n) time.

@Tortar
Copy link
Owner

Tortar commented Dec 12, 2024

yes my idea for this is to sort at some point after deleting/adding a certain number of elements e.g. now I'm using the really (too) simple https://github.com/Tortar/DynamicSampling.jl/blob/main/src/DynamicWeightedSampler.jl#L252, do you have suggestion for a better criterion? In general It would be cool to have some theoretical results on this I think

@Tortar
Copy link
Owner

Tortar commented Dec 13, 2024

IIUC, the best this algorithm can get is O(log(n)) sampling time because of the case where weights are of the form [1, 1/2, 1/2, 1/4, 1/4, 1/4, 1/4, ...] and level selection alone takes log(n) time.

Yes my earlier comment was wrong, you are right, it actually requires O(log(n)) as you say, and this method can't do better than that for this case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants