This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

Add support for strided tensors #494

Open: wants to merge 1 commit into master from strided-tensors

Conversation

@protonu (Contributor) commented Jun 7, 2018

This commit is to start support for strided tensors. I made changes
to percolate a vector in TensorInfo down to emitCudaKernel to allow
codegen to cast strided tensors. This required changes to a unit test
to expect the correct cast.
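For context, the cast in question looks roughly like this (an illustrative C++ sketch, not the actual emitCudaKernel output; sizes, strides, and names are invented):

```cpp
// Illustrative sketch only: not the actual emitCudaKernel output; sizes,
// strides, and names are invented.
constexpr int K = 64;

// With sizes known at codegen time, a contiguous input can be cast to a
// multi-dimensional array type and indexed as A[i][j]:
float loadContiguous(float* pA, int i, int j) {
  float (*A)[K] = reinterpret_cast<float (*)[K]>(pA);
  return A[i][j];
}

// For a strided tensor that cast is no longer valid; the emitted body has to
// carry the strides and do the index arithmetic explicitly:
float loadStrided(const float* pA, int i, int j, long stride0, long stride1) {
  return pA[i * stride0 + j * stride1];
}
```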

@ftynse (Contributor) left a comment


Please do not commit spurious files (tags and .swp).

@protonu force-pushed the strided-tensors branch 2 times, most recently from ff1ed36 to 2f842fb on June 8, 2018 at 03:35
@nicolasvasilache (Contributor) left a comment


I don't understand the point of passing possibly empty information which is default-initialized to empty.
Simply passing the actual values everywhere should be enough.

codegen is called (without TensorInfo) from a number of unit tests. I can either remove the default argument or change the unit tests. Please let me know what you think.

@protonu force-pushed the strided-tensors branch 2 times, most recently from c4be53d to 9229b0c on June 8, 2018 at 14:45
@protonu force-pushed the strided-tensors branch from 9229b0c to 9cd8fc9 on June 8, 2018 at 14:56
@nicolasvasilache (Contributor)

After looking a bit deeper at the impl, I am not a fan of having two ways of specifying the strides during codegen (both Halide and TensorInfo), so we should choose one. Worse, the mechanisms themselves may be incorrect, because the name of the generated kernel will not vary with strides but the content of the kernel will be hardcoded for them.

So I think I would go the following route, but this probably requires a second set of eyes, cc @abadams @ftynse:

  1. since Halide can represent strides, let's use Halide
  2. add new parameters for each stride in translateParam and translateOutput (in tc2halide.cc) and make those the strides of the Halide ImageParam and OutputImageParam (see the sketch below)
  3. update computeParamValueMap in halide_utils.cc to tie the JIT stride information to the Halide inputs
  4. update codegen to take input strides into account

This will increase the dimension of the parameter space for ISL; it will also increase the specializedName and the list of values we send to each kernel call.

On the other hand, this will allow us to go to a parametric stride version since strides will be passed to the kernel.

So this is more complex than I originally anticipated for a bootcamp task, sorry about that.
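A very rough sketch of what step 2 might look like, assuming Halide's ImageParam::dim(i).set_stride(Expr) API; the helper and parameter names are invented and this is not the actual tc2halide.cc change:

```cpp
#include <string>
#include "Halide.h"

// Hypothetical helper: build an input whose strides are fresh symbolic
// parameters instead of the default dense layout. Names are invented.
Halide::ImageParam makeStridedInput(int ndims, const std::string& name) {
  Halide::ImageParam in(Halide::Float(32), ndims, name);
  for (int d = 0; d < ndims; ++d) {
    // One stride parameter per dimension; these would later be tied to the
    // JIT-time values in computeParamValueMap and forwarded to the kernel.
    Halide::Param<int> stride(name + "_stride_" + std::to_string(d));
    in.dim(d).set_stride(stride);
  }
  return in;
}
```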

@ftynse (Contributor) commented Jun 9, 2018

Because Scop already owns the description of in/out tensors (through Halide in/out parameters), it makes sense for it to also own the stride information. It is more consistent than passing this information to codegen separately, with room for mismatched size or stride errors.

> This will increase the dimension of the parameter space for ISL; it will also increase the specializedName and the list of values we send to each kernel call.

And we probably need to introduce new parameters due to polynomial strides. That is, for A[N][M][K], the strides are M*K, K, 1, and you cannot express M*K as an affine function. So you need another parameter _M_times_K and some external knowledge that _M_times_K == M * K.
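To make that concrete, a small illustration (the MK parameter name is purely hypothetical, not an actual TC parameter):

```cpp
// Illustrative only: indexing A[N][M][K] laid out with strides (M*K, K, 1).
// The product M*K is quadratic in the parameters, so the polyhedral side
// would see it as an opaque extra parameter (called MK here), together with
// the external knowledge that MK == M * K, enforced by the caller.
inline float loadA(const float* A, long i, long j, long k, long MK, long K) {
  return A[i * MK + j * K + k];  // affine in (i, j, k) once MK is a parameter
}
```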

@skimo-openhub (Contributor) commented Jun 12, 2018 via email

@nicolasvasilache (Contributor)

> I don't agree with this reasoning. It's not because we can do something that we necessarily need to do it.

Yeah, that's essentially my motto. Who said we need to do it this way? I am putting forward a strawman proposal; feel free to propose something better. The reasoning behind using Halide for strides is that we are already using Halide at this level and that Halide already supports strides; we have just not used them. What's your alternative?

> Hold on! Why does this need to be encoded as isl parameters?

They definitely don't need to be; it is a proposal that followed the code flow as I was reading it and that would work with minimal changes. But if having these extra parameters passed through ISL is a problem, then by all means let's have separate storage for them.

@nicolasvasilache (Contributor)

At some point I will also reconsider pulling in this piece of code we had been using in ancient days.
There is no particular reason to limit ourselves to subarray semantics; we just need the no-alias property.

@nicolasvasilache (Contributor)

Also, note that there is a tradeoff:

  1. hardcoding the strides in the kernel, which forces us to recompile each time there is a change in strides;
  2. using parametric strides, but I think CUDA still does not support C99-style variable length arrays, so that would force us to pull in the DeviceTensor abstraction, which OTOH buys us more flexible striding support.
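Roughly, the two options compare as follows (a sketch with invented names; StridedView3 is only a stand-in for a DeviceTensor-like descriptor, not the real abstraction):

```cpp
// Sketch only; names are invented and StridedView3 is just a stand-in for a
// DeviceTensor-like descriptor.

// Option 1: strides frozen into the generated kernel at JIT time; any change
// in strides requires generating and compiling a new kernel.
float loadHardcoded(const float* A, long i, long j, long k) {
  return A[i * 4096 + j * 64 + k];  // 4096 and 64 are baked-in constants
}

// Option 2: strides passed at launch time through a small descriptor, at the
// cost of extra kernel arguments (and, without C99 VLAs, explicit indexing).
struct StridedView3 {
  const float* data;
  long strides[3];
};

float loadParametric(const StridedView3& A, long i, long j, long k) {
  return A.data[i * A.strides[0] + j * A.strides[1] + k * A.strides[2]];
}
```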

@skimo-openhub (Contributor) commented Jun 12, 2018 via email

@skimo-openhub (Contributor) commented Jun 12, 2018 via email

@nicolasvasilache (Contributor) commented Jun 12, 2018

> Doesn't changing the strides usually change the sizes too?

That's because strides is an overloaded term; in this instance, polyhedral folks have been using it improperly IMO. Sizes and strides are totally unrelated: you can mix and match them arbitrarily. In ML, a tensor is almost always defined as a "view" over a contiguous block of memory.

A Torch/DLPack/Numpy/other tensor is a view with an initial offset and a list of sizes and strides (of the same length). The sizes (sometimes also called shape) determine the number of elements in the view; the strides determine the spacing between consecutive elements in the view.

Here is an example:
T: offset=3, sizes=[2, 2], strides=[3, 7] refers to element (i, j) at offset 3 + i * 3 + j * 7 (notice the potential for aliasing if the tensor had more than 21 elements):

(0, 0) -> @(3)
(0, 1) -> @(10)
(1, 0) -> @(6)
(1, 1) -> @(13)

Strides corresponding to a C99 variable length array are a special case: int foo[M][N][P] has a canonical view representation of offset=0, sizes=[M, N, P], strides=[N * P, P, 1] in an M*N*P allocation.
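Spelled out in code (a self-contained illustration with names of my choosing, not TC code):

```cpp
#include <cstddef>
#include <vector>

// Offset of element `idx` inside a view: offset + dot(idx, strides).
long viewOffset(long offset, const std::vector<long>& strides,
                const std::vector<long>& idx) {
  for (std::size_t d = 0; d < strides.size(); ++d)
    offset += idx[d] * strides[d];
  return offset;
}

// Contiguous (C99-array-like) strides implied by a size vector:
// suffix products, with innermost stride 1.
std::vector<long> contiguousStrides(const std::vector<long>& sizes) {
  std::vector<long> strides(sizes.size(), 1);
  for (int d = static_cast<int>(sizes.size()) - 2; d >= 0; --d)
    strides[d] = strides[d + 1] * sizes[d + 1];
  return strides;
}

// E.g. for the view above: viewOffset(3, {3, 7}, {1, 1}) == 13, and
// contiguousStrides({M, N, P}) gives {N * P, P, 1}.
```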

Such a view of a memory region has many properties that people like to use. For instance, a transpose operation just consists of permuting two strides (and the corresponding sizes): it is a pure metadata transformation and requires no memory copy (setting aside contiguity and performance, of course).

People also like to "re-view" tensors (traditionally called the reshape operation); for instance, the 4-D output of a batched 2-D convolution can be reshaped (stride contiguity permitting) into a 2-D matrix that can be passed to a GEMM operation. This happens in many places in neural networks.

Of all these properties, we only care about non-aliasing, because of obvious correctness issues.
Like many interesting abstractions, people can abuse them: many reshape/expand/narrow/slice/transpose operations that you may find in neural network implementations are unnecessary; sometimes they are only there to make a computation fit a library call (look at the BLAS GEMM API and you'll see similar stride concepts: lda, ldb, ldc, IIRC).

Lastly, note that the type of compact representation polyhedral folks are accustomed to is called a "contiguous" tensor (i.e. sizes=[M, N, P], strides=[N * P, P, 1] in an allocation of size >= M*N*P); such tensors can safely be cast to a C99 array.

So the classification is: C99 array == contiguous ML tensor <= non-aliasing ML tensor <= ML tensor.

Makes sense?

@facebook-github-bot

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours has expired.

Before we can review or merge your code, we need you to email [email protected] with your details so we can update your status.

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
