Convert ddp.jl to Column-Major #109
Conversation
Commits:

- transpose both s_wise_max fns (to search over columns then rows, i.e. over a transposed vals matrix)
- transpose R, tidy tests
- transpose multiplication code
- properly transpose RQ_sigma
- tidy the tests up a bit
- improve tests, tidy up
- fix a commit mistake
@ZacCranko Thanks for your efforts on this! What about the SA (or AS) formulation? For this case, … (Actually, I now suspect we don't need the two versions of internal data structures, and in fact the SA version alone is enough, for both the Julia and Python versions.)
Let me add some more words. There are two issues: (1) how to store the data in memory; (2) how to access them.

On (1), my view is the following: the values (rewards) of `R` …; the values (probabilities) of `Q` …; then, given the values tomorrow `v`, ….

On (2): we should discuss issue (1) first. Is there a unique most efficient way? Or are there several equally efficient ways, in which case we should decide based on issue (2)?
If I understand correctly, the issue here is that with column-major ordering, an s x a x t array Q will be stored linearly in memory as

[ [Q(1,1,1), Q(2,1,1), ..., Q(s,1,1)], [Q(1,2,1), ..., Q(s,2,1)], ..., [Q(1,a,1), ..., Q(s,a,1)], ... ]

where the brackets group the columns. This is why I permuted the indices of `Q`:

```julia
@inbounds for i in 1:m, j in 1:n, k in 1:o
    Q_new[k, i, j] = Q_old[i, j, k]
end
```

Since the inner products happen over the 't' indices, we make these the "columns" in the matrix.
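For reference, that loop is equivalent to a `permutedims` call. A quick sketch to verify (the dimensions here are chosen arbitrarily for illustration):

```julia
# Permuting dimensions (1, 2, 3) -> (3, 1, 2) makes the old third index
# (states_tomorrow) the fastest-varying, column-contiguous index.
m, n, o = 3, 4, 5
Q_old = rand(m, n, o)
Q_new = permutedims(Q_old, (3, 1, 2))   # size (o, m, n)
@assert Q_new[2, 1, 3] == Q_old[1, 3, 2]  # Q_new[k, i, j] == Q_old[i, j, k]
```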
(I misread your ….) If we want to align the probabilities in memory as …
My design of the Python version may have misled us. I was ignorant about the distinction among:

1. …
2. …
3. …

I didn't realize that there is no need for 2 to be the same as 3, and I paid no attention to 1 ex ante (but ex post it seems to be the efficient one). We should discuss 1 first.
There's no difference performance-wise between picking an orientation for Q of states_tomorrow x actions x states versus (the current) states_tomorrow x states x actions. I can't think of a clear reason to prefer one over the other, but if you'd rather I implement it the other way, I'm happy to do so.
On my machine there is a difference:

```julia
julia> n, m = 10^3, 10^3;

julia> vals = randn(n, m);

julia> vals_T = transpose(vals);

julia> @time maximum(vals, 2); @time maximum(vals, 2); @time maximum(vals, 2);
  0.003414 seconds (14 allocations: 8.359 KB)
  0.002886 seconds (14 allocations: 8.359 KB)
  0.002876 seconds (14 allocations: 8.359 KB)

julia> @time maximum(vals_T, 1); @time maximum(vals_T, 1); @time maximum(vals_T, 1);
  0.001320 seconds (14 allocations: 8.359 KB)
  0.001179 seconds (14 allocations: 8.359 KB)
  0.001205 seconds (14 allocations: 8.359 KB)
```

The reason is that accessing contiguous memory is faster: `maximum(vals_T, 1)` reduces down columns, which Julia stores contiguously, while `maximum(vals, 2)` strides across rows. Does this make sense?
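A minimal sketch of the same contiguity effect with explicit loops (the function names below are made up for this illustration): iterating with the first index innermost walks adjacent memory, while the reverse order strides across it.

```julia
# Summing down columns (first index fastest) visits contiguous memory.
function sum_cols(A)
    s = 0.0
    for j in 1:size(A, 2), i in 1:size(A, 1)  # i varies fastest: contiguous
        s += A[i, j]
    end
    s
end

# Summing across rows strides by size(A, 1) elements per step.
function sum_rows(A)
    s = 0.0
    for i in 1:size(A, 1), j in 1:size(A, 2)  # j varies fastest: strided
        s += A[i, j]
    end
    s
end

A = randn(10^3, 10^3);
# Run each twice to exclude JIT compilation; sum_cols is typically faster.
@time sum_cols(A); @time sum_cols(A);
@time sum_rows(A); @time sum_rows(A);
```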
When I said before "there was no difference", this was with regard to defining the three-dimensional multiplication `Q * v`.
Looking over the multiplication code, it may even be a tiny bit faster to use states_tomorrow x actions x states, but the difference is likely negligible.
```diff
@@ -571,7 +592,7 @@ s_wise_max!(ddp::DiscreteDP, vals::AbstractMatrix, out::Vector, out_argmax::Vect
 Return the `Vector` `max_a vals(s, a)`, where `vals` is represented as a
 `AbstractMatrix` of size `(num_states, num_actions)`.
 """
-s_wise_max(vals::AbstractMatrix) = vec(maximum(vals, 2))
+s_wise_max(vals::AbstractMatrix) = vec(maximum(vals, 1))
```
Why did you change this while keeping the shape (states, actions) fixed?
Good question; I'm not sure. Should it be `vec(maximum(vals, 2))`?
If the sa pairs are ordered so that the state changes first (so that the action is accessed by the second index), then it should be `vec(maximum(vals, 2))`, and vice versa.
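To make the two conventions concrete, a toy sketch (same 0.4-era `maximum(A, dim)` syntax as elsewhere in this thread):

```julia
# vals in (states, actions) orientation: the s-wise max runs along dimension 2.
vals = [1.0 3.0; 4.0 2.0]   # 2 states x 2 actions
vec(maximum(vals, 2))       # => [3.0, 4.0]

# Transposed (actions, states) orientation: the same max runs along
# dimension 1, the column-contiguous (faster) direction.
vals_T = transpose(vals)
vec(maximum(vals_T, 1))     # => [3.0, 4.0]
```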
There seems to be a confusion (see the above comment). Try this:

```julia
beta = 0.95;
n, m = 2, 2;
R = [5.0 10.0; -1.0 -Inf];
Q = Array{Float64}(n, n, m);
Q[:, 1, 1] = [0.5, 0.5];
Q[:, 1, 2] = [0, 1];
Q[:, 2, 1] = [0, 1];
Q[:, 2, 2] = [0.5, 0.5];  # arbitrary
ddp0 = DiscreteDP(R, Q, beta);
```

Then I get:

```julia
julia> bellman_operator(ddp0, [0, 0])
2-element Array{Float64,1}:
  5.0
 10.0
```

The outcome should be [10, -1].
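A quick sanity check of the expected values, reusing `R` from the snippet above: with v = [0, 0] the discounted continuation term vanishes, so the Bellman operator reduces to the action-wise (row-wise) max of `R`.

```julia
# With v = [0, 0], T(v)(s) = max_a R[s, a], since beta * (Q * v) is zero:
# state 1: max(5.0, 10.0) = 10.0;  state 2: max(-1.0, -Inf) = -1.0.
vec(maximum(R, 2))  # => [10.0, -1.0], not [5.0, 10.0]
```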
Easy fix. I'll make sure to add this to the test suite.
Do we agree now that there is a clear reason why one of the two orders is preferred to the other?
I still don't know why one would be preferred to the other. Please let me know how you'd like me to set it up and I'll be happy to implement it.

Of course I agree that computing the maximum is faster over the columns of an array. I just don't understand what you'd like me to change in the codebase. Do you want me to change the orientation of the Q matrix to states_tomorrow x actions x states? What other changes need to be made?
Ping. @ZacCranko and @oyamad, where did we leave off with this? Can we make some checkboxes (either in a separate comment or in the original description above) that provide a set of TODO items that would help us get this merged?
I would like to discuss first what would be the best way to order the data in memory. If I understand correctly, it is language independent and so should be common between the Python and Julia versions.
@oyamad, by your most recent comment do you mean that you believe there is a single most efficient memory ordering for our particular algorithms with Markov chains? If so, then I think we might be able to have a fruitful discussion. Otherwise, could you please explain more about what you meant above?
@spencerlyon2 Let's discuss in a separate issue: #124.
Summary

This is a placeholder to convert the "row-major" code in `ddp.jl` to a more Julia-friendly column-major layout. To that end, the specification for `R` is now of dimension states x actions, and `Q` is of dimension states_tomorrow x states x actions.

There are two functions where we reap the benefits of the new data structures: in the computation of the product `Q * v`, and in solving for the state-wise max with `s_wise_max!`.

Technical Details
There is a new method of `ctranspose` defined for `Array{T, 3}` that transposes the old-format `Q` to the new one, in particular permuting according to `Q_new[k, i, j] = Q_old[i, j, k]`, as in the loop quoted above. Transposes of a three-dimensional array are somewhat arbitrary to define, but the reason I chose this particular configuration for `Q` is that it is the most efficient for the multiplication `Q * v`: with the new-format `Q`, the elements of the resulting matrix `Q * v` are given by inner products of `v` and the "columns" of `Q`.

Similarly, the `s_wise_max!` code now computes the max over the columns of the computed matrix `R` instead of the rows.
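As a sketch of why this layout helps (the helper `Q_v` below is a made-up name for illustration, not the code in this PR): with `Q` of dimension states_tomorrow x states x actions, each element of `Q * v` is an inner product of `v` with a contiguous column `Q[:, s, a]`.

```julia
# Sketch: (Q * v)[s, a] = sum_t Q[t, s, a] * v[t].
# With the new layout, Q[:, s, a] is a contiguous column, so the inner
# loop over t walks adjacent memory.
function Q_v(Q::Array{Float64,3}, v::Vector{Float64})
    num_t, num_s, num_a = size(Q)
    out = zeros(num_s, num_a)
    @inbounds for a in 1:num_a, s in 1:num_s
        acc = 0.0
        for t in 1:num_t
            acc += Q[t, s, a] * v[t]
        end
        out[s, a] = acc
    end
    out
end

# Usage (shapes only; this Q is not a stochastic array):
Q = rand(3, 2, 2);  # states_tomorrow x states x actions
v = rand(3);
Q_v(Q, v)           # 2x2 matrix of continuation values per (state, action)
```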
Comparison

Solving the `ddp0` model from the tests with VFI:

Branch `master`: …

Branch `transddp`: …

These results can be computed with ….
Some issues to be addressed before merging:

- The tests currently use `ctranspose`, transposing `R` and `Q` to the new format. This is convenient for troubleshooting, but once we're happy I'll hardcode the tests, and we should consider removing the `ctranspose` method for `Array{T, 3}`, either as part of this PR or at some later stage.
- … the `DiscreteDP` constructor (which checks that `Q` is of the correct orientation, to stop users accidentally using the old format).
- … `random_mc.jl`. I'll fix this once everyone is happy with the code in `ddp.jl`.
Update: I originally mistakenly said the new-format Q matrix is states x states_tomorrow x actions; this has been fixed.