Convert ddp.jl to Column-Major #109
Conversation
Commits:

- transpose both s_wise_max fns (to search over columns then rows, i.e. over a transposed vals matrix)
- transpose R, tidy tests
- transpose multiplication code
- properly transpose RQ_sigma
- tidy the tests up a bit
- improve tests, tidy up
- fix a commit mistake
@ZacCranko Thanks for your efforts on this! What about the SA (or AS) formulation? For this case, … (Actually, I now suspect we don't need the two versions of internal data structures, and in fact the SA version alone is enough, for both the Julia and Python versions.)
Let me add some more words. There are two issues: (1) how to store the data in memory; (2) how to access them.

On (1), my view is the following: the values (rewards) of `R` …; the values (probabilities) of `Q` …; then, given the values tomorrow `v`, ….

On (2): we should discuss issue (1) first. Is there a unique most efficient way? Or are there several equally efficient ways, in which case we should decide based on issue (2)?
If I understand correctly, the issue here is that with column-major ordering, an s x a x t array Q will be stored linearly in memory as

[ [Q(1,1,1), Q(2,1,1), ..., Q(s,1,1)], [Q(1,2,1), ..., Q(s,2,1)], ..., [Q(1,a,1), ..., Q(s,a,1)], ... ]

where the brackets group the columns. This is why I permuted the indices of `Q`:

```julia
@inbounds for i in 1:m, j in 1:n, k in 1:o
    Q_new[k, i, j] = Q_old[i, j, k]
end
```

Since the inner products happen over the 't' indices, we make these the "columns" in the matrix.
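For reference, that loop is equivalent to a `permutedims` call. A quick sketch to verify (the dimensions here are chosen arbitrarily for illustration):

```julia
# Permuting dimensions (1, 2, 3) -> (3, 1, 2) makes the old third index
# (states_tomorrow) the fastest-varying, column-contiguous index.
m, n, o = 3, 4, 5
Q_old = rand(m, n, o)
Q_new = permutedims(Q_old, (3, 1, 2))   # size (o, m, n)
@assert Q_new[2, 1, 3] == Q_old[1, 3, 2]  # Q_new[k, i, j] == Q_old[i, j, k]
```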
(I misread your ….) If we want to align the probabilities in memory as …
My design of the Python version may have misled us. I was ignorant about the distinction among:

1. …
2. …
3. …

I didn't realize that there is no need for 2 to be the same as 3, and I paid no attention to 1 ex ante (but ex post it seems to be the efficient one). We should discuss 1 first.
There's no difference performance-wise between picking an orientation for Q of states_tomorrow x actions x states versus (the current) states_tomorrow x states x actions. I can't think of a clear reason to prefer one over the other, but if you'd rather I implement it the other way, I'm happy to do so.
On my machine there is a difference:

```julia
julia> n, m = 10^3, 10^3;

julia> vals = randn(n, m);

julia> vals_T = transpose(vals);

julia> @time maximum(vals, 2); @time maximum(vals, 2); @time maximum(vals, 2);
  0.003414 seconds (14 allocations: 8.359 KB)
  0.002886 seconds (14 allocations: 8.359 KB)
  0.002876 seconds (14 allocations: 8.359 KB)

julia> @time maximum(vals_T, 1); @time maximum(vals_T, 1); @time maximum(vals_T, 1);
  0.001320 seconds (14 allocations: 8.359 KB)
  0.001179 seconds (14 allocations: 8.359 KB)
  0.001205 seconds (14 allocations: 8.359 KB)
```

The reason is that accessing contiguous memory is faster: `maximum(vals_T, 1)` reduces down columns, which Julia stores contiguously, while `maximum(vals, 2)` strides across rows. Does this make sense?
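A minimal sketch of the same contiguity effect with explicit loops (the function names below are made up for this illustration): iterating with the first index innermost walks adjacent memory, while the reverse order strides across it.

```julia
# Summing down columns (first index fastest) visits contiguous memory.
function sum_cols(A)
    s = 0.0
    for j in 1:size(A, 2), i in 1:size(A, 1)  # i varies fastest: contiguous
        s += A[i, j]
    end
    s
end

# Summing across rows strides by size(A, 1) elements per step.
function sum_rows(A)
    s = 0.0
    for i in 1:size(A, 1), j in 1:size(A, 2)  # j varies fastest: strided
        s += A[i, j]
    end
    s
end

A = randn(10^3, 10^3);
# Run each twice to exclude JIT compilation; sum_cols is typically faster.
@time sum_cols(A); @time sum_cols(A);
@time sum_rows(A); @time sum_rows(A);
```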
When I said before "there was no difference", this was with regard to defining the three-dimensional multiplication `Q * v`.
Looking over the multiplication code, it may even be a tiny bit faster to use states_tomorrow x actions x states, but the difference is likely negligible.
```diff
@@ -571,7 +592,7 @@ s_wise_max!(ddp::DiscreteDP, vals::AbstractMatrix, out::Vector, out_argmax::Vect
 Return the `Vector` `max_a vals(s, a)`, where `vals` is represented as a
 `AbstractMatrix` of size `(num_states, num_actions)`.
 """
-s_wise_max(vals::AbstractMatrix) = vec(maximum(vals, 2))
+s_wise_max(vals::AbstractMatrix) = vec(maximum(vals, 1))
```
Why did you change this while keeping the shape (states, actions) fixed?
Good question; I'm not sure. Should it be `vec(maximum(vals, 2))`?
If the sa pairs are ordered so that the state changes first (so that the action is accessed by the second index), then it should be `vec(maximum(vals, 2))`, and vice versa.
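To make the two conventions concrete, a toy sketch (same 0.4-era `maximum(A, dim)` syntax as elsewhere in this thread):

```julia
# vals in (states, actions) orientation: the s-wise max runs along dimension 2.
vals = [1.0 3.0; 4.0 2.0]   # 2 states x 2 actions
vec(maximum(vals, 2))       # => [3.0, 4.0]

# Transposed (actions, states) orientation: the same max runs along
# dimension 1, the column-contiguous (faster) direction.
vals_T = transpose(vals)
vec(maximum(vals_T, 1))     # => [3.0, 4.0]
```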
There seems to be a confusion (see the above comment). Try this:

```julia
beta = 0.95;
n, m = 2, 2;
R = [5.0 10.0; -1.0 -Inf];
Q = Array{Float64}(n, n, m);
Q[:, 1, 1] = [0.5, 0.5];
Q[:, 1, 2] = [0, 1];
Q[:, 2, 1] = [0, 1];
Q[:, 2, 2] = [0.5, 0.5];  # arbitrary
ddp0 = DiscreteDP(R, Q, beta);
```

Then I get:

```julia
julia> bellman_operator(ddp0, [0, 0])
2-element Array{Float64,1}:
  5.0
 10.0
```

The outcome should be [10, -1].
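A quick sanity check of the expected values, reusing `R` from the snippet above: with v = [0, 0] the discounted continuation term vanishes, so the Bellman operator reduces to the action-wise (row-wise) max of `R`.

```julia
# With v = [0, 0], T(v)(s) = max_a R[s, a], since beta * (Q * v) is zero:
# state 1: max(5.0, 10.0) = 10.0;  state 2: max(-1.0, -Inf) = -1.0.
vec(maximum(R, 2))  # => [10.0, -1.0], not [5.0, 10.0]
```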
Easy fix. I'll make sure to add this to the test suite.
Do we agree now that there is a clear reason why one of the two orders is preferred to the other?
I still don't know why one would be preferred to the other. Please let me know how you'd like me to set it up and I'll be happy to implement it.

Of course I agree that computing the maximum is faster over the columns of an array. I just don't understand what you'd like me to change in the codebase. Do you want me to change the orientation of the Q matrix to states_tomorrow x actions x states? What other changes need to be made?
Ping. @ZacCranko and @oyamad, where did we leave off with this? Can we make some checkboxes (either in a separate comment or in the original description above) that provide a set of TODO items that would help us get this merged?
I would like to discuss first what would be the best way to order the data in memory. If I understand correctly, it is language independent and so should be common between the Python and Julia versions.
@oyamad, by your most recent comment do you mean that you believe there is a single most efficient memory ordering for our particular algorithms with Markov chains? If so, then I think we might be able to have a fruitful discussion. Otherwise, could you please explain more about what you meant above?
@spencerlyon2 Let's discuss in a separate issue: #124.
Summary

This is a placeholder to convert the "row-major" code in `ddp.jl` to a more Julia-friendly column-major layout. To that end, the specification for `R` is now of dimension states x actions, and `Q` is of dimension states_tomorrow x states x actions.

There are two functions where we reap the benefits of the new data structures: in the computation of the product `Q * v`, and in solving for the state-wise max with `s_wise_max!`.

Technical Details
There is a new method of `ctranspose` defined for `Array{T, 3}` that transposes the old-format `Q` to the new one, in particular permuting according to `Q_new[k, i, j] = Q_old[i, j, k]`, as in the loop quoted above. Transposes of a three-dimensional array are somewhat arbitrary to define, but the reason I chose this particular configuration for `Q` is that it is the most efficient for the multiplication `Q * v`: with the new-format `Q`, the elements of the resulting matrix `Q * v` are given by inner products of `v` and the "columns" of `Q`.

Similarly, the `s_wise_max!` code now computes the max over the columns of the computed matrix `R` instead of the rows.
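As a sketch of why this layout helps (the helper `Q_v` below is a made-up name for illustration, not the code in this PR): with `Q` of dimension states_tomorrow x states x actions, each element of `Q * v` is an inner product of `v` with a contiguous column `Q[:, s, a]`.

```julia
# Sketch: (Q * v)[s, a] = sum_t Q[t, s, a] * v[t].
# With the new layout, Q[:, s, a] is a contiguous column, so the inner
# loop over t walks adjacent memory.
function Q_v(Q::Array{Float64,3}, v::Vector{Float64})
    num_t, num_s, num_a = size(Q)
    out = zeros(num_s, num_a)
    @inbounds for a in 1:num_a, s in 1:num_s
        acc = 0.0
        for t in 1:num_t
            acc += Q[t, s, a] * v[t]
        end
        out[s, a] = acc
    end
    out
end

# Usage (shapes only; this Q is not a stochastic array):
Q = rand(3, 2, 2);  # states_tomorrow x states x actions
v = rand(3);
Q_v(Q, v)           # 2x2 matrix of continuation values per (state, action)
```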
Comparison

Solving the `ddp0` model from the tests with VFI:

Branch `master`: …

Branch `transddp`: …

These results can be computed with ….
Some issues to be addressed before merging:

- The tests currently use `ctranspose`, transposing `R` and `Q` to the new format. This is convenient for troubleshooting, but once we're happy I'll hardcode the tests, and we should consider removing the `ctranspose` method for `Array{T, 3}`, either as part of this PR or at some later stage.
- … the `DiscreteDP` constructor (which checks that `Q` is of the correct orientation, to stop users accidentally using the old format).
- … `random_mc.jl`. I'll fix this once everyone is happy with the code in `ddp.jl`.
Update: I originally mistakenly said the new-format Q matrix is states x states_tomorrow x actions; this has been fixed.