Creation of 0D PencilArrays when global size cannot be factored by processor config #43
Hi, are you sure you ran your example with 2 MPI processes? I ask because the results you get are the same as those I obtain when running with 4 MPI processes, which means that the data is in ranks 2 and 3.

Of course, I agree this is not ideal, since some processes are left with no data, and ideally we shouldn't try to distribute data over singleton dimensions. The thing is that PencilFFTs wasn't designed with singleton dimensions in mind, and up to now there hasn't been interest in this kind of usage. It might be possible to make some improvements, but I have to think about the proper way of doing this without affecting type stability.

That said, working generically with arbitrary numbers of dimensions is possible, and is actually an important feature of PencilArrays / PencilFFTs. It's just that the way to do it is rather to pass a different number of dimensions. In your example, you would just do:

```julia
nproc = MPI.Comm_size(comm)
plan = PencilFFTs.PencilFFTPlan((8, 44), PencilFFTs.Transforms.FFT!(), (nproc,), comm)
```

and then work with 2D arrays everywhere. Note that in this case data is distributed over a single dimension at a time.

More concretely, the Navier-Stokes tutorial (which I'm aware may lack some explanations) is mostly written in a dimension-generic way. Apart from the initialisation steps, almost everything is dimension-independent, and it would be quite straightforward to convert the whole tutorial to 2D. This is actually how I would write a NS code that could work in both 2D and 3D.

To be fair, it should be possible to make the
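To make the 2D suggestion above concrete, here is a minimal sketch of setting up and inspecting such a plan (my own illustration, using only calls that already appear in this thread; the (8, 44) size is just the example from the snippet above):

```julia
# Sketch: the 2D-plan route, distributing over a single dimension.
using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD
nproc = MPI.Comm_size(comm)

# 2D global size, in-place FFT, 1D decomposition over all ranks.
plan = PencilFFTs.PencilFFTPlan((8, 44), PencilFFTs.Transforms.FFT!(), (nproc,), comm)

u = PencilFFTs.allocate_input(plan)   # local 2D data for this rank
@show size(parent(first(u)))          # roughly (8, 44 ÷ nproc) when it divides evenly
```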
Ah sorry, I bungled the example a bit. I think your output illustrates the issue still though, because there are arrays with size 0?
Right, I see! If we are distributed in x, y, this also requires that we "pre-process" with a transpose from the configuration with a singleton local dimension into a configuration where the non-singleton dimension is local, correct?

Just to motivate this (and maybe it's already clear): this is a generic situation in geophysics. What we have is a system that is extremely flat (both physically and computationally), with huge dimensions in x, y but a relatively thin z dimension (a large example: 16,000 by 8,000 by 100). In addition, the entire algorithm boils down to:
Another issue (which is maybe an Oceananigans design issue, but one that will be very difficult to change) is that arrays are always 3D. When we do a 2D problem, we reduce the size of a dimension to 1 (and also often tag that direction with a type to indicate it has been reduced).

I think I understand that you're saying this problem can be solved now with the framework provided. It seemed to me (but everything is definitely not crystal clear for me yet...) that I might have to implement different solvers for different dimensionalities that mix transposes and transforms according to PencilFFTs' rules: three for xy, xz, and yz, and then one for xyz, using the appropriate layouts and specifications in each case. But maybe there is another solution for writing relatively simple Oceananigans code that can support the full spectrum of configurations. I'll continue to self-educate...
Feel free to close this if you don't think it's feasible to support singleton dimensions!
I agree! It would be good to handle this case in a more intelligent way.
In that case, maybe it would be possible to reshape the arrays to
Ideally you should be able to write a single solver, with perhaps some "glue" code to deal with the different possible layouts. You can for example have a single call to

```julia
pencilfft_args(grid::RectilinearGrid{<:Any, Periodic, Periodic, Periodic}) = ...
pencilfft_args(grid::RectilinearGrid{<:Any, Periodic, Periodic, Flat}) = ...
```
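Expanding that idea into a slightly fuller sketch (hypothetical code, not an existing API; I've added a proc_dims argument, and the Nx/Ny/Nz fields and topology type parameters are assumed to follow Oceananigans' RectilinearGrid):

```julia
# Hypothetical glue code: map a grid's topology to PencilFFTPlan arguments.
using PencilFFTs: PencilFFTPlan, Transforms
using Oceananigans: RectilinearGrid, Periodic, Flat  # assumed exports

# Fully periodic 3D problem: transform all three dimensions, 2D decomposition.
pencilfft_args(grid::RectilinearGrid{<:Any, Periodic, Periodic, Periodic}, proc_dims) =
    ((grid.Nx, grid.Ny, grid.Nz), Transforms.FFT!(), proc_dims)

# Flat z: drop the singleton dimension, work with 2D arrays, and distribute
# along a single dimension, as suggested earlier in the thread.
pencilfft_args(grid::RectilinearGrid{<:Any, Periodic, Periodic, Flat}, proc_dims) =
    ((grid.Nx, grid.Ny), Transforms.FFT!(), (prod(proc_dims),))

# A single plan-construction call then covers both cases:
# plan = PencilFFTPlan(pencilfft_args(grid, proc_dims)..., comm)
```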
Ok, I'll get to work on that and see what comes out!
With @jipolanco's gracious help I'm starting to understand the logic. The problem is that configurations need to be "factorable" (somehow) in order for the work to be distributed among all processes (it's not actually a problem with singleton dimensions?). I rewrote the above script:

```julia
using MPI
using PencilFFTs
MPI.Init()
comm = MPI.COMM_WORLD
plan = PencilFFTs.PencilFFTPlan((2, 2, 64), PencilFFTs.Transforms.FFT!(), (2, 4), comm)
input = PencilFFTs.allocate_input(plan)
msg = [MPI.Comm_rank(comm),
size(parent(input[1]))...,
size(parent(input[2]))...,
size(parent(input[3]))...]
function showmsg(msg)
rank = msg[1]
size1 = msg[2:4]
size2 = msg[5:7]
size3 = msg[8:10]
msg = string("rank: ", rank,
" size 1: ", size1,
" size 2: ", size2,
" size 3: ", size3)
@show msg
end
messages = MPI.Gather(msg, 0, comm)
if MPI.Comm_rank(MPI.COMM_WORLD) == 0
msg0 = messages[1:10]
msg1 = messages[11:20]
showmsg(msg0)
showmsg(msg1)
end
```

which yields

```
$ mpiexec -n 8 julia --project mpitest.jl
msg = "rank: 0 size 1: [2, 1, 16] size 2: [2, 1, 16] size 3: [64, 0, 1]"
msg = "rank: 1 size 1: [2, 1, 16] size 2: [2, 1, 16] size 3: [64, 1, 1]"
```

The first transform (along x) is length 2, distributed among 8 processes (2 in y, 4 in z). The second transform (along y) is also length 2, and can be divided among 8 processes (2 in z, 4 in x). But the third transform must divide a 2x2 region in (x, y) among 8 processes, which is not possible. In other words, the processor configuration must be able to factor each pencil (yz, xz, xy, in that order); here the xy pencil cannot be factored by (2, 4). Therefore, the only solution for this configuration is to let 4 processes sit idle while the 4 pencils are handled by just 4 of the 8 total processes, right? I believe this is the essential logic behind the issue in the first post.
msg = "rank: 0 size 1: [1, 2, 32] size 2: [2, 1, 32] size 3: [64, 1, 1]"
msg = "rank: 1 size 1: [1, 2, 32] size 2: [2, 1, 32] size 3: [64, 1, 1]" In this case, the requested configuration (1, 2) can factor all cases (EDIT: is this surprising?). Just to hammer it home (for me...) plan = PencilFFTs.PencilFFTPlan((1, 2, 64), PencilFFTs.Transforms.FFT!(), (2, 1), comm) $ mpiexec -n 2 julia --project mpitest.jl [16:15:27]
msg = "rank: 0 size 1: [1, 1, 64] size 2: [2, 0, 64] size 3: [64, 2, 0]"
msg = "rank: 1 size 1: [1, 1, 64] size 2: [2, 1, 64] size 3: [64, 2, 1]" This last case is interesting, because the second decomposition uses the permutation (y, x, z), which fails because (2, 1) cannot decompose the size (1, 2). But shouldn't it be possible to compute this problem if the intermediate configuration has the permutation (y, z, x)? Perhaps I'm missing something. It's a generic "issue" that not all configurations admit even distribution of work? I think this issue could become useful in 1 or 2 ways:
A few other things that may be achievable:
That last feature makes sense to support for applications where the memory decomposition is driven by factors other than the cost of FFT + transpositions (but where FFTs are still essential to the algorithm). But if I'm not mistaken, it's also an edge case that in principle can be implemented as a special case on the user side.
Yes, that's exactly it. It's not really a problem with singleton dimensions. Sorry if it wasn't very clear previously... I agree that some docs on this would be very helpful. More generally, it's the problem of partitioning a 1D grid of
This should be possible, but I don't know if it's that easy to do in a generic way. Roughly speaking, the problem is that there's a logic behind the

To see this, first consider your example with

Here, both intermediate configurations

I think it's easy to see that the current intermediate configuration

Meanwhile, the alternative configuration

The better solution would be to document what the intermediate and output configurations are, starting from a given input configuration, which I'm aware is not clear right now.
That would be great, your help is more than welcome here!
Yes, that's a good idea. I would do this when one ends up with 0D arrays. I hadn't thought about the specific problem with broadcasting.
Definitely, I think it's possible to automate the selection of a "good" ordering. The best would be for a user to be able to say "I want to partition the domain along 2 dimensions" and then for us to choose the best decomposition.
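Something along these lines might work as a starting point; this is only a heuristic sketch of mine (no such helper exists in PencilArrays), and it assumes the process-grid axes can be assigned to whichever dimensions fit best, which is exactly the ordering question above:

```julia
# Hypothetical heuristic: given the total number of processes and the 3D global
# size, pick a balanced 2D process grid (p1, p2) such that even the pencil that
# decomposes the two smallest dimensions can be split without empty local arrays.
# Returns nothing if no such grid uses all the processes (i.e. some would sit idle).
function choose_proc_dims(nprocs, size_global::Dims{3})
    n1, n2 = sort(collect(size_global))[1:2]   # two smallest global dimensions
    best = nothing
    for p1 in 1:nprocs
        nprocs % p1 == 0 || continue
        p2 = nprocs ÷ p1
        # both process-grid axes must fit in the tightest pencil (in some order)
        (min(p1, p2) <= min(n1, n2) && max(p1, p2) <= max(n1, n2)) || continue
        if best === nothing || abs(p1 - p2) < abs(best[1] - best[2])
            best = (p1, p2)                    # prefer the most balanced grid
        end
    end
    return best
end

choose_proc_dims(2, (1, 2, 64))  # → (1, 2), the configuration that worked above
choose_proc_dims(8, (2, 2, 64))  # → nothing: no 2D grid of 8 avoids empty pencils
```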
That's also an option, though I'm not sure if it's worth the extra complexity.
I'm not sure I've quite wrapped my head around how this is all supposed to work, but it seems that when I have a local first dimension of length 1, I obtain unusable results. I'm using the third interface in the docs:
https://jipolanco.github.io/PencilFFTs.jl/dev/PencilFFTs/#PencilFFTs.PencilFFTPlan
which (I thought) allows me to specify the number of processes in the 2nd and 3rd dimensions (but I might be confused).
This code:
when run with
produces
So we end up with arrays of size 0. Are singleton dimensions not handled correctly (a bug?), or am I not using PencilFFTs correctly? I expected proc_dims = (2, 2) to use two processes for the 2nd and 3rd dimensions (which seems to have happened for first(input)), but input[2] and input[3] have size 0 and so can't be used for computation?

Basically, I need to write generic code that can reduce to the 2D case when one or more dimensions is singleton, but also be valid for 3D cases.
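The script this post refers to isn't reproduced above, so purely as an illustration, a minimal stand-in consistent with the description might look like the following (the (1, 8, 8) grid size and the file name are my own inventions, not the original example):

```julia
# Hypothetical stand-in for the missing script: a domain with a singleton first
# dimension, decomposed with proc_dims = (2, 2).
# Run with: mpiexec -n 4 julia --project issue_example.jl
using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD

plan = PencilFFTs.PencilFFTPlan((1, 8, 8), PencilFFTs.Transforms.FFT!(), (2, 2), comm)
input = PencilFFTs.allocate_input(plan)

# Print the local size of each pencil configuration on this rank; the report
# above is that some of these end up with a zero-sized dimension.
for i in 1:3
    println("rank ", MPI.Comm_rank(comm), " config ", i, ": ", size(parent(input[i])))
end
```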