
Potential optimizations using System.Span, System.Memory, and stackalloc #22

Open
lostromb opened this issue May 31, 2018 · 8 comments

@lostromb
Owner

lostromb commented May 31, 2018

Concentus interpreted almost all C pointers as a combination of an array and an integer offset, e.g. "int[] x, int x_ptr". These could potentially all be replaced by the new Span constructs in .NET Core 2.0 for better optimization. Furthermore, most of the original stack array allocations are done on the heap in C#, which also hurts performance (except where they were hand-inlined inside the FFT).
See https://msdn.microsoft.com/en-us/magazine/mt814808.aspx
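For illustration, here is a minimal sketch of the kind of change meant here (the helper name is made up, not actual Concentus code):

```csharp
// The existing convention passes an array plus an integer offset:
static int InnerProduct(int[] x, int x_ptr, int[] y, int y_ptr, int len)
{
    int sum = 0;
    for (int i = 0; i < len; i++)
        sum += x[x_ptr + i] * y[y_ptr + i];
    return sum;
}

// The same kernel over spans; callers pass x.AsSpan(x_ptr, len) instead
// of a separate array and offset:
static int InnerProduct(ReadOnlySpan<int> x, ReadOnlySpan<int> y)
{
    int sum = 0;
    for (int i = 0; i < x.Length; i++)
        sum += x[i] * y[i];
    return sum;
}
```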

@prepconcede

prepconcede commented Sep 15, 2021

Hello, sir. How can I help? I'm not really good at low-level stuff, but I'm ready to try where I can.

@lostromb
Owner Author

Uh, yeah, sure. You can start with some of the tight-loop kernels such as here. Replace the (byte[] array, int array_offset, int array_length) function signatures with Span<byte> and then work upwards until the build passes again. Run the parity test console for a while to ensure there are no regressions. There are also places that allocate temporary heap arrays which could be switched to stackalloc, such as here in the NLSF encoder. You will probably have to reference the original C code to determine safe fixed-length array sizes, since you can't do variable-length stackalloc.
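A minimal sketch of the stackalloc part, assuming a fixed bound taken from the C code (the method name is hypothetical, not actual Concentus code):

```csharp
static void QuantizeResidual(ReadOnlySpan<short> input, Span<short> output, int order)
{
    // Stand-in for whatever bound the original C #define guarantees.
    const int MAX_LPC_ORDER = 16;

    // Before: short[] tmp = new short[order];   // heap allocation on every call
    Span<short> tmp = stackalloc short[MAX_LPC_ORDER];
    tmp = tmp.Slice(0, order);                    // only use the part we need

    for (int i = 0; i < order; i++)
        tmp[i] = (short)(input[i] >> 1);          // placeholder work

    tmp.CopyTo(output);
}
```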

@lostromb
Owner Author

lostromb commented Oct 1, 2021

I took a quick shot at this because I was curious whether .NET 6 carries any measurable performance benefit for this application. Porting the obvious places in the kernels, xcorr, VQ, and NLSF yielded a slight (1-2%) benefit. Interestingly, when I ported over the MDCT code to use spans, performance decreased by about 2%. My hunch is that the pointer increments in the C code (such as in kf_bfly5()) run faster when they are interpreted as incrementing an array index variable (Fout_ptr++) rather than doing Fout = Fout.Slice(1). So it seems likely that the hand-inlined code is already quite optimized in some places.
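For illustration, the two loop styles being compared (a trivial stand-in, not the real kf_bfly5 kernel):

```csharp
// Walking the buffer with an index variable, mirroring the C pointer increment.
static void ScaleWithIndex(Span<float> Fout, float scale)
{
    int Fout_ptr = 0;
    for (int i = 0; i < Fout.Length; i++)
    {
        Fout[Fout_ptr] *= scale;
        Fout_ptr++;
    }
}

// Walking the buffer by re-slicing the span on every step.
static void ScaleWithSlice(Span<float> Fout, float scale)
{
    while (!Fout.IsEmpty)
    {
        Fout[0] *= scale;
        Fout = Fout.Slice(1);   // creates a new span view each iteration
    }
}
```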

@AraHaan

AraHaan commented Mar 15, 2022

> Performance-wise, the current build runs about 40-50% as fast as its equivalent libopus build, mostly due to the lack of stack arrays and vectorized intrinsics in managed languages.

How ironic, I was going to file this issue as well, since for C# this is no longer the case. Also, I think Tanner Gooding knows of some intrinsics that could increase perf here as well.
cc @tannergooding (sorry for the ping, Tanner).
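For what it's worth, a rough sketch of the kind of vectorization being suggested, using System.Numerics.Vector<T> on a generic dot product (not tied to any specific Concentus kernel):

```csharp
using System;
using System.Numerics;

static class VectorizedKernels
{
    // Processes Vector<float>.Count lanes per iteration (maps to SSE/AVX
    // where available), then falls back to a scalar tail.
    public static float DotProduct(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
    {
        float sum = 0f;
        int i = 0;
        int lanes = Vector<float>.Count;

        for (; i <= x.Length - lanes; i += lanes)
        {
            var vx = new Vector<float>(x.Slice(i, lanes));
            var vy = new Vector<float>(y.Slice(i, lanes));
            sum += Vector.Dot(vx, vy);
        }

        for (; i < x.Length; i++)   // scalar tail
            sum += x[i] * y[i];

        return sum;
    }
}
```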

@AraHaan

AraHaan commented Mar 15, 2022

> Uh, yeah, sure. You can start with some of the tight-loop kernels such as here. Replace the (byte[] array, int array_offset, int array_length) function signatures with Span<byte> and then work upwards until the build passes again. Run the parity test console for a while to ensure there are no regressions. There are also places that allocate temporary heap arrays which could be switched to stackalloc, such as here in the NLSF encoder. You will probably have to reference the original C code to determine safe fixed-length array sizes, since you can't do variable-length stackalloc.

For variable-length stackalloc I would rather rent a Memory<T>, if that is currently possible.
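A sketch of that renting idea: for variable-length scratch buffers, rent from ArrayPool instead of stackalloc'ing an unbounded size (the rented array can be exposed as Memory<T>/Span<T> via AsMemory/AsSpan):

```csharp
using System;
using System.Buffers;

static class Scratch
{
    public static void With(int length, Action<Span<int>> body)
    {
        int[] rented = ArrayPool<int>.Shared.Rent(length);
        try
        {
            // The pooled array may be longer than requested, so slice it.
            body(rented.AsSpan(0, length));
        }
        finally
        {
            ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```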

@in0finite

Hey @lostromb,

I noticed that the library generates a LOT of garbage when decoding. Even for the smallest chunk (10 ms), it allocates 37 KB of memory.

I took a quick look at the code and spotted some obvious places where Span can be used instead of arrays.

Also, ArrayPool could probably be used in some places.

Is there any plan to do this, or do you accept pull requests?

@lostromb
Owner Author

My current "plan" is to chip away at this branch and overhaul the entire code to use unsafe pointers and unmanaged allocation, which should run a lot faster, have better parity with the original code, make SIMD easier to implement, and have zero allocations. I just haven't had time to work on it for a bit.
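For reference, a very rough sketch of what the unsafe/unmanaged direction might look like (the type and names are illustrative, not taken from the actual branch; requires compiling with /unsafe):

```csharp
using System;
using System.Runtime.InteropServices;

// A small unmanaged buffer wrapper: allocated outside the GC heap,
// so it produces zero managed allocations and never moves.
unsafe struct NativeBuffer : IDisposable
{
    public int* Data;
    public int Length;

    public NativeBuffer(int length)
    {
        Data = (int*)Marshal.AllocHGlobal(length * sizeof(int));
        Length = length;
    }

    public Span<int> AsSpan() => new Span<int>(Data, Length);

    public void Dispose()
    {
        Marshal.FreeHGlobal((IntPtr)Data);
        Data = null;
    }
}
```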

@in0finite

It's good to see that there is some progress.
I don't mind having pointers in the code, but ArrayPools and stackallocs still seem better than unmanaged memory.

Plus, I am using this library on WebAssembly, so compatibility is also a concern for me.
