This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

Add support for strided tensors #494

Open: wants to merge 1 commit into master from strided-tensors

Conversation

@protonu (Contributor) commented Jun 7, 2018

This commit is to start support for strided tensors. I made changes
to percolate a vector in TensorInfo down to emitCudaKernel to allow
codegen to cast strided tensors. This required changes to a unit test
to expect the correct cast.
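For context, the cast in question looks roughly like this (an illustrative C++ sketch, not the actual emitCudaKernel output; sizes, strides, and names are invented):

```cpp
// Illustrative sketch only: not the actual emitCudaKernel output; sizes,
// strides, and names are invented.
constexpr int K = 64;

// With sizes known at codegen time, a contiguous input can be cast to a
// multi-dimensional array type and indexed as A[i][j]:
float loadContiguous(float* pA, int i, int j) {
  float (*A)[K] = reinterpret_cast<float (*)[K]>(pA);
  return A[i][j];
}

// For a strided tensor that cast is no longer valid; the emitted body has to
// carry the strides and do the index arithmetic explicitly:
float loadStrided(const float* pA, int i, int j, long stride0, long stride1) {
  return pA[i * stride0 + j * stride1];
}
```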

@ftynse (Contributor) left a comment


Please do not commit spurious files (tags and .swp).

@protonu force-pushed the strided-tensors branch 2 times, most recently from ff1ed36 to 2f842fb on June 8, 2018 at 03:35
@nicolasvasilache (Contributor) left a comment


I don't understand the point of passing possibly empty information which is default-initialized to empty.
Simply passing the actual values everywhere should be enough.

codegen is called (without TensorInfo) from a number of unit tests. I can either remove the default argument or change the unit tests. Please let me know what you think.

@protonu force-pushed the strided-tensors branch 2 times, most recently from c4be53d to 9229b0c on June 8, 2018 at 14:45
@protonu force-pushed the strided-tensors branch from 9229b0c to 9cd8fc9 on June 8, 2018 at 14:56
@nicolasvasilache (Contributor)

After looking a bit deeper at the impl, I am not a fan of having two ways of specifying the strides during codegen (both Halide and TensorInfo), so we should choose one. Worse, the mechanisms themselves may be incorrect, because the name of the generated kernel will not vary with strides but the content of the kernel will be hardcoded for them.

So I think I would go the following route, but this probably requires a second set of eyes, cc @abadams @ftynse:

  1. since Halide can represent strides, let's use Halide
  2. add new parameters for each stride in translateParam and translateOutput (in tc2halide.cc) and make those the strides of the Halide ImageParam and OutputImageParam (see the sketch below)
  3. update computeParamValueMap in halide_utils.cc to tie the JIT stride information to the Halide inputs
  4. update codegen to take input strides into account

This will increase the dimension of the parameter space for ISL; it will also increase the specializedName and the list of values we send to each kernel call.

On the other hand, this will allow us to go to a parametric stride version since strides will be passed to the kernel.

So this is more complex than I originally anticipated for a bootcamp task, sorry about that.
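A very rough sketch of what step 2 might look like, assuming Halide's ImageParam::dim(i).set_stride(Expr) API; the helper and parameter names are invented and this is not the actual tc2halide.cc change:

```cpp
#include <string>
#include "Halide.h"

// Hypothetical helper: build an input whose strides are fresh symbolic
// parameters instead of the default dense layout. Names are invented.
Halide::ImageParam makeStridedInput(int ndims, const std::string& name) {
  Halide::ImageParam in(Halide::Float(32), ndims, name);
  for (int d = 0; d < ndims; ++d) {
    // One stride parameter per dimension; these would later be tied to the
    // JIT-time values in computeParamValueMap and forwarded to the kernel.
    Halide::Param<int> stride(name + "_stride_" + std::to_string(d));
    in.dim(d).set_stride(stride);
  }
  return in;
}
```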

@ftynse (Contributor) commented Jun 9, 2018

Because Scop already owns the description of in/out tensors (through Halide in/out parameters), it makes sense for it to also own the stride information. It is more consistent than passing this information to codegen separately, with room for mismatched size or stride errors.

> This will increase the dimension of the parameter space for ISL; it will also increase the specializedName and the list of values we send to each kernel call.

And we probably need to introduce new parameters due to polynomial strides. That is, for A[N][M][K], the strides are M*K, K, 1, and you cannot express M*K as an affine function. So you need another parameter _M_times_K and some external knowledge that _M_times_K == M * K.
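To make that concrete, a small illustration (the MK parameter name is purely hypothetical, not an actual TC parameter):

```cpp
// Illustrative only: indexing A[N][M][K] laid out with strides (M*K, K, 1).
// The product M*K is quadratic in the parameters, so the polyhedral side
// would see it as an opaque extra parameter (called MK here), together with
// the external knowledge that MK == M * K, enforced by the caller.
inline float loadA(const float* A, long i, long j, long k, long MK, long K) {
  return A[i * MK + j * K + k];  // affine in (i, j, k) once MK is a parameter
}
```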

@skimo-openhub (Contributor) commented Jun 12, 2018 via email

@nicolasvasilache (Contributor)

> I don't agree with this reasoning. It's not because we can do something that we necessarily need to do it.

Yeah, that's essentially my motto. Who said we need to do it this way? I am putting forward a strawman proposal; feel free to propose something better. The reasoning behind using Halide for strides is that we are already using Halide at this level and that Halide already supports strides; we have just not used them. What's your alternative?

> Hold on! Why does this need to be encoded as isl parameters?

They definitely don't need to be; it is a proposal that followed the code flow as I was reading it and that would work with minimal changes. But if having these extra parameters passed through ISL is a problem, then by all means let's have separate storage for them.

@nicolasvasilache (Contributor)

At some point I will also reconsider pulling in this piece of code we had been using in ancient days.
There is no particular reason to limit ourselves to subarray semantics; we just need the no-alias property.

@nicolasvasilache (Contributor)

Also, note that there is a tradeoff:

  1. hardcoding the strides in the kernel, which forces us to recompile each time there is a change in strides;
  2. using parametric strides, but I think CUDA still does not support C99-style variable length arrays, so that would force us to pull in the DeviceTensor abstraction, which OTOH buys us more flexible striding support.
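Roughly, the two options compare as follows (a sketch with invented names; StridedView3 is only a stand-in for a DeviceTensor-like descriptor, not the real abstraction):

```cpp
// Sketch only; names are invented and StridedView3 is just a stand-in for a
// DeviceTensor-like descriptor.

// Option 1: strides frozen into the generated kernel at JIT time; any change
// in strides requires generating and compiling a new kernel.
float loadHardcoded(const float* A, long i, long j, long k) {
  return A[i * 4096 + j * 64 + k];  // 4096 and 64 are baked-in constants
}

// Option 2: strides passed at launch time through a small descriptor, at the
// cost of extra kernel arguments (and, without C99 VLAs, explicit indexing).
struct StridedView3 {
  const float* data;
  long strides[3];
};

float loadParametric(const StridedView3& A, long i, long j, long k) {
  return A.data[i * A.strides[0] + j * A.strides[1] + k * A.strides[2]];
}
```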

@skimo-openhub (Contributor) commented Jun 12, 2018 via email

@skimo-openhub (Contributor) commented Jun 12, 2018 via email

@nicolasvasilache (Contributor) commented Jun 12, 2018

> Doesn't changing the strides usually change the sizes too?

That's because strides is an overloaded term; in this instance, polyhedral folks have been using it improperly IMO. Sizes and strides are totally unrelated: you can mix and match them arbitrarily. In ML, a tensor is almost always defined as a "view" over a contiguous block of memory.

A Torch/DLPack/Numpy/other tensor is a view with an initial offset and a list of sizes and strides (of the same length). The sizes (sometimes also called shape) determine the number of elements in the view; the strides determine the spacing between consecutive elements in the view.

Here is an example:
T: offset=3, sizes=[2, 2], strides=[3, 7] refers to element (i, j) at offset 3 + i * 3 + j * 7 (notice the potential for aliasing if the tensor had more than 21 elements):

(0, 0) -> @(3)
(0, 1) -> @(10)
(1, 0) -> @(6)
(1, 1) -> @(13)

Strides corresponding to a C99 variable length array are a special case: int foo[M][N][P] has a canonical view representation of offset=0, sizes=[M, N, P], strides=[N * P, P, 1] in an M*N*P allocation.
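Spelled out in code (a self-contained illustration with names of my choosing, not TC code):

```cpp
#include <cstddef>
#include <vector>

// Offset of element `idx` inside a view: offset + dot(idx, strides).
long viewOffset(long offset, const std::vector<long>& strides,
                const std::vector<long>& idx) {
  for (std::size_t d = 0; d < strides.size(); ++d)
    offset += idx[d] * strides[d];
  return offset;
}

// Contiguous (C99-array-like) strides implied by a size vector:
// suffix products, with innermost stride 1.
std::vector<long> contiguousStrides(const std::vector<long>& sizes) {
  std::vector<long> strides(sizes.size(), 1);
  for (int d = static_cast<int>(sizes.size()) - 2; d >= 0; --d)
    strides[d] = strides[d + 1] * sizes[d + 1];
  return strides;
}

// E.g. for the view above: viewOffset(3, {3, 7}, {1, 1}) == 13, and
// contiguousStrides({M, N, P}) gives {N * P, P, 1}.
```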

Such a view of a memory region has many properties that people like to use. For instance, a transpose operation just consists of permuting two strides (and the corresponding sizes): it is a pure metadata transformation and requires no memory copy (setting aside contiguity and performance, of course).

People also like to "re-view" tensors (traditionally called the reshape operation); for instance, the 4-D output of a batched 2-D convolution can be reshaped (stride contiguity permitting) into a 2-D matrix that can be passed to a GEMM operation. This happens in many places in neural networks.

Of all these properties, we only care about non-aliasing, because of obvious correctness issues.
Like many interesting abstractions, people can abuse them: many reshape/expand/narrow/slice/transpose operations that you may find in neural network implementations are unnecessary; sometimes they are only there to make a computation fit a library call (look at the BLAS GEMM API and you'll see similar stride concepts: lda, ldb, ldc, IIRC).

Lastly, note that the type of compact representation polyhedral folks are accustomed to is called a "contiguous" tensor (i.e. sizes=[M, N, P], strides=[N * P, P, 1] in an allocation of size >= M*N*P); such tensors can safely be cast to a C99 array.

So the classification is: C99 array == contiguous ML tensor <= non-aliasing ML tensor <= ML tensor.

Makes sense?

@facebook-github-bot

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired.

Before we can review or merge your code, we need you to email [email protected] with your details so we can update your status.

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
