Add disk caching for memoized functions #4

Open
CameronBieganek opened this issue Aug 7, 2020 · 2 comments

Comments


CameronBieganek commented Aug 7, 2020

As was discussed on Discourse, sometimes one would like to save memoized functions to disk for later reuse.

Here's my use case, in brief:

I'm using the Scopus Abstract Retrieval API to build a graph of article citations. There is a weekly quota of 10,000 queries, so I want to memoize my query function to avoid duplicate queries (since most articles will be cited by more than one other article). However, even with memoization, the total number of queries is likely to be more than 10,000, so I need to be able to save the memoized function to disk so that the graph construction can be resumed a week later with a fresh quota.
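The resume-after-quota workflow described above can be sketched with nothing but a plain `Dict` and the `Serialization` standard library. This is only an illustration under assumptions: `fetch_citations` is a hypothetical stand-in for the real Scopus call, and `load_cache`, `save_cache`, and `cached_citations` are made-up helper names, not part of any package.

```julia
using Serialization  # standard library

# Hypothetical stand-in for the real API call. It counts invocations so the
# effect of caching is visible.
const QUERY_COUNT = Ref(0)
function fetch_citations(article_id::String)
    QUERY_COUNT[] += 1
    return ["ref-of-$article_id"]
end

# Load a previously saved cache if the file exists, otherwise start empty.
load_cache(path) = isfile(path) ? deserialize(path) : Dict{String,Vector{String}}()

# Write the whole cache back to disk (call this before the session ends).
save_cache(path, cache) = serialize(path, cache)

# Look up an article, calling the API only on a cache miss.
cached_citations(cache, id) = get!(() -> fetch_citations(id), cache, id)
```

Calling `save_cache` before the quota runs out and `load_cache` a week later would let the graph construction pick up without repeating earlier queries. Note that `Serialization` files are not guaranteed to be readable across Julia versions.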

marius311 (Owner) commented Aug 7, 2020

Yea, that'd be really nice to have. The way to do it is with a disk-backed Dict data type like:

@memoize DiskDict foo(x) = ....

The problem is I can't seem to find one in Julia. Do you know if one exists? I do think such a Dict would live outside this package, though.

In terms of this package though, we do need an easier way to specify arguments to the cache constructor, like

@memoize DiskDict("foo.dat") foo(x) = ....
@memoize DiskDict("bar.dat") bar(x) = ....

which is slightly non-trivial due to the use of generated functions here, but something I can work on.
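To make the idea of a disk-backed Dict concrete, here is a rough sketch of one that keeps everything in an ordinary `Dict` and rewrites a file on every insert. `FileDict` is a hypothetical name, the `Serialization` standard library is an assumed storage choice, and this has not been tried with `@memoize`.

```julia
using Serialization  # standard library

# Hypothetical write-through dictionary: an in-memory Dict mirrored to a file.
struct FileDict{K,V} <: AbstractDict{K,V}
    path::String
    data::Dict{K,V}
end

# Open an existing file if present, otherwise start with an empty Dict.
function FileDict{K,V}(path::AbstractString) where {K,V}
    data = isfile(path) ? deserialize(path)::Dict{K,V} : Dict{K,V}()
    return FileDict{K,V}(String(path), data)
end

# Minimal read interface, forwarded to the in-memory Dict.
Base.length(d::FileDict) = length(d.data)
Base.iterate(d::FileDict, state...) = iterate(d.data, state...)
Base.get(d::FileDict, key, default) = get(d.data, key, default)

# Every insert rewrites the whole file: simple, but slow for large caches.
function Base.setindex!(d::FileDict, value, key)
    d.data[key] = value
    serialize(d.path, d.data)
    return d
end

# Compute and store the value only when the key is missing.
function Base.get!(default::Base.Callable, d::FileDict, key)
    haskey(d.data, key) && return d.data[key]
    d[key] = default()
    return d.data[key]
end
```

Rewriting the full file on each insert keeps the sketch short. A real implementation would more likely append entries or use a proper storage format.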

marius311 (Owner) commented

Ok, so syntax like

@memoize DiskDict("foo.dat") foo(x) = ....

now works on master.

Unfortunately a disk-backed dict doesn't appear to exist anywhere, but I hacked together an absolutely minimal implementation of one below. With this you can now do,

julia> using Memoization, Main.JLD2BackedDicts

julia> @memoize JLD2BackedDict("foo.jld2", maxsize=2) foo(x) = (println("Computed $x"); x)

julia> foo(3)
Computed 3
3

julia> foo(3)
3

# restart session

julia> using Memoization, Main.JLD2BackedDicts

julia> @memoize JLD2BackedDict("foo.jld2", maxsize=2) foo(x) = (println("Computed $x"); x)

julia> foo(3)
3

There's a JLD2 disk-backing, and an LRUCache memory-backing. A ton of stuff doesn't fully work, including emptying, but maybe you would like to play with it for a while and see if it works for your problem? If you do end up building anything out, please feel free to create a repo; I'm guessing it would be useful to many. If not, no worries, I might get to it at some point in the future.

Here's the JLD2BackedDict code:

module JLD2BackedDicts

using LRUCache
using JLD2: JLDFile, jldopen
using Base: Callable

export JLD2BackedDict

struct JLD2BackedDict{K,V} <: AbstractDict{K,V}
    disk_backing :: JLDFile
    memory_backing :: LRU{K,V}
end

JLD2BackedDict(args...; kwargs...) = JLD2BackedDict{Any,Any}(args...; kwargs...)

function JLD2BackedDict{K,V}(filename::String; kwargs...) where {K,V}
    disk_backing = jldopen(filename,"a+")
    memory_backing = LRU{K,V}(;kwargs...)
    JLD2BackedDict{K,V}(disk_backing, memory_backing)
end

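# Look up `key` in the in-memory LRU first; on a miss, fall back to the JLD2
# file, calling `default()` only if the value is not on disk either.
# NOTE: the on-disk name is `string(hash(key))`, so distinct keys whose hashes
# collide would share one entry.
function Base.get!(default::Callable, d::JLD2BackedDict{K,V}, key::K) where {K,V}
    get!(d.memory_backing, key) do
        get!(default, d.disk_backing, string(hash(key)))
    end
end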

Base.show(io::IO, ::MIME"text/plain", d::JLD2BackedDict) = print(io, d.memory_backing, " + ", d.disk_backing)

end
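
The module above names each disk entry `string(hash(key))`. One possible alternative, sketched here purely as an assumption and not something proposed in this thread, is to derive the name from a SHA-256 digest of the serialized key. That makes accidental collisions far less likely and depends on the key's contents rather than on `hash`. `stable_key` is a hypothetical helper, and the result is only expected to be reproducible within the same Julia version.

```julia
using Serialization, SHA  # both standard libraries

# Hypothetical helper: turn any serializable key into a 64-character hex name.
function stable_key(key)
    io = IOBuffer()
    serialize(io, key)                   # content-based byte representation
    return bytes2hex(sha256(take!(io)))  # hex digest usable as an entry name
end
```

With this, the `string(hash(key))` call could in principle be swapped for `stable_key(key)`, at the cost of serializing the key on every lookup that misses the memory cache.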
