Releases: ermig1979/Simd

Simd v4.3.75

07 Mar 11:12

Algorithms

New features
  • Base implementation, SSE2, SSSE3, AVX2 and AVX-512BW optimizations of function BgraToYuva420p.
  • NEON optimization of function NeuralSigmoid.
  • NEON optimization of function NeuralTanh.
  • NEON optimization of function NeuralPow.
  • NEON version of functions GetFlushToZero and SetFlushToZero.
  • NEON optimization of function Fill32f.
  • NEON optimization of function AlphaFilling.
  • NEON optimization of function CosineDistance16f.
  • NEON optimization of function CosineDistance32f.
  • NEON optimization of function Gemm32fNN.
  • NEON optimization of function Gemm32fNT.
  • NEON optimization of function FillPixel.
  • NEON optimization of function ReduceColor2x2.
  • NEON optimization of function BayerToBgra.
  • NEON optimization of function BayerToBgr.
  • NEON optimization of function TransformImage.
  • NEON optimization of function BgraToYuva420p.
  • NEON optimization of function Yuva420pToBgra.
  • NEON optimization of function Resizer.
  • NEON optimization of function HogLiteFindMax7x7.
  • NEON optimization of function HogLiteCreateMask.
  • NEON optimization of function HogLiteFilterSeparable.
  • NEON optimization of function HogLiteCompressFeatures.
  • NEON optimization of function HogLiteResizeFeatures.
  • NEON optimization of function HogLiteFilterFeatures.
  • NEON optimization of function HogLiteExtractFeatures.
  • NEON optimization of function Winograd2x3SetFilter.
  • NEON optimization of function Winograd4x3SetFilter.
  • NEON optimization of function Winograd2x3SetInput.
  • NEON optimization of function Winograd2x3SetOutput.
  • NEON optimization of function SynetAddBias.
  • NEON optimization of function SynetEltwiseLayerForward.
  • NEON optimization of function SynetPoolingForwardMax.
  • NEON optimization of function SynetFusedLayerForward0.
  • NEON optimization of function SynetFusedLayerForward1.
  • NEON optimization of function SynetFusedLayerForward2.
  • NEON optimization of function SynetFusedLayerForward3.
  • NEON optimization of function SynetFusedLayerForward4.
  • NEON optimization of function SynetInnerProductLayerForward.
  • NEON optimization of function SynetLrnLayerCrossChannels.
  • NEON optimization of function SynetPreluLayerForward.
  • NEON optimization of function SynetRestrictRange.
  • NEON optimization of function SynetScaleLayerForward.
  • NEON optimization of function SynetSoftmaxLayerForward.
  • NEON optimization of function ConvolutionForward.
Improving
  • AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
  • SSE, AVX, AVX2 and AVX-512F optimizations of function Resizer.
Bug fixing
  • Error in AVX-512BW optimization of function ChangeColors.
  • Error in AVX-512BW optimization of function NormalizeHistogram.
  • Error in AVX-512F optimization of function NeuralConvolutionForward.
  • Error in NEON optimization of function Uint8ToFloat32.
  • Error in NEON optimization of function SquaredDifferenceSum16f.
  • Error in SSE version of function GetFlushToZero.
  • Error in Base implementation of function SynetFusedLayerForward0.

Test framework

New features
  • Tests for verifying functionality of function BgraToYuva420p.
  • Tests for verifying NEON optimization of function NeuralSigmoid.
  • Tests for verifying NEON optimization of function NeuralTanh.
  • Tests for verifying NEON optimization of function NeuralPow.
  • Tests for verifying NEON optimization of function Fill32f.
  • Tests for verifying NEON optimization of function AlphaFilling.
  • Tests for verifying NEON optimization of function CosineDistance16f.
  • Tests for verifying NEON optimization of function CosineDistance32f.
  • Tests for verifying NEON optimization of function Gemm32fNN.
  • Tests for verifying NEON optimization of function Gemm32fNT.
  • Tests for verifying NEON optimization of function FillPixel.
  • Tests for verifying NEON optimization of function ReduceColor2x2.
  • Tests for verifying NEON optimization of function BayerToBgra.
  • Tests for verifying NEON optimization of function BayerToBgr.
  • Tests for verifying NEON optimization of function TransformImage.
  • Tests for verifying NEON optimization of function BgraToYuva420p.
  • Tests for verifying NEON optimization of function Yuva420pToBgra.
  • Tests for verifying NEON optimization of function Resizer.
  • Tests for verifying NEON optimization of function HogLiteFindMax7x7.
  • Tests for verifying NEON optimization of function HogLiteCreateMask.
  • Tests for verifying NEON optimization of function HogLiteFilterSeparable.
  • Tests for verifying NEON optimization of function HogLiteCompressFeatures.
  • Tests for verifying NEON optimization of function HogLiteResizeFeatures.
  • Tests for verifying NEON optimization of function HogLiteFilterFeatures.
  • Tests for verifying NEON optimization of function HogLiteExtractFeatures.
  • Tests for verifying NEON optimization of function Winograd2x3SetFilter.
  • Tests for verifying NEON optimization of function Winograd4x3SetFilter.
  • Tests for verifying NEON optimization of function Winograd2x3SetInput.
  • Tests for verifying NEON optimization of function Winograd2x3SetOutput.
  • Tests for verifying NEON optimization of function SynetAddBias.
  • Tests for verifying NEON optimization of function SynetEltwiseLayerForward.
  • Tests for verifying NEON optimization of function SynetPoolingForwardMax.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward0.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward1.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward2.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward3.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward4.
  • Tests for verifying NEON optimization of function SynetInnerProductLayerForward.
  • Tests for verifying NEON optimization of function SynetLrnLayerCrossChannels.
  • Tests for verifying NEON optimization of function SynetPreluLayerForward.
  • Tests for verifying NEON optimization of function SynetRestrictRange.
  • Tests for verifying NEON optimization of function SynetScaleLayerForward.
  • Tests for verifying NEON optimization of function SynetSoftmaxLayerForward.
  • Tests for verifying NEON optimization of function ConvolutionForward.
Bug fixing
  • Error (on 32-bit OS) in test of function HogLiteFindMax7x7.

Simd v4.2.74

01 Feb 06:43

Algorithms

New features
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetFilter (NHWC mode).
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd4x3SetFilter (NHWC mode).
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetInput (NHWC mode).
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetOutput (NHWC mode).
  • Parameter gemm (a pointer to external function of matrix multiplication) in function ConvolutionInit.
  • Choice of the best gemm function at runtime.
  • SIMD_RUNTIME_GEMM_STATISTIC macro (annotation of the runtime choice of gemm).
  • Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of function SynetPoolingForwardMax.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward4.
  • Base implementation, SSE2, AVX2 and AVX-512F optimizations of function SynetSoftmaxForward.
  • Base implementation, SSE2, AVX2 and AVX-512BW optimizations of function Yuva420pToBgra.
  • Base implementation, SSSE3 optimization of function TransformImage.
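
The runtime choice of a gemm function mentioned in the list above can be illustrated with a small sketch. It is not the library's implementation: the `GemmPtr` signature, both candidate kernels and `ChooseFastestGemm` are hypothetical names introduced here, and the selection rule (time each candidate once, keep the fastest) is an assumption about how such a choice might be made.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical signature for this sketch only (not the Simd API):
// C[M x N] = A[M x K] * B[K x N], all matrices row-major and densely packed.
typedef void (*GemmPtr)(size_t M, size_t N, size_t K, const float* A, const float* B, float* C);

// Candidate 1: textbook i-j-k loop order.
void GemmIjk(size_t M, size_t N, size_t K, const float* A, const float* B, float* C)
{
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j)
        {
            float sum = 0.0f;
            for (size_t k = 0; k < K; ++k)
                sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

// Candidate 2: i-k-j loop order, which walks B and C along rows.
void GemmIkj(size_t M, size_t N, size_t K, const float* A, const float* B, float* C)
{
    for (size_t i = 0; i < M; ++i)
    {
        for (size_t j = 0; j < N; ++j)
            C[i * N + j] = 0.0f;
        for (size_t k = 0; k < K; ++k)
        {
            const float a = A[i * K + k];
            for (size_t j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
    }
}

// Runs every candidate once on a problem of the given size and
// returns the index of the one with the shortest measured time.
size_t ChooseFastestGemm(const std::vector<GemmPtr>& candidates, size_t M, size_t N, size_t K)
{
    std::vector<float> A(M * K, 1.0f), B(K * N, 1.0f), C(M * N, 0.0f);
    size_t best = 0;
    double bestTime = 0.0;
    for (size_t i = 0; i < candidates.size(); ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        candidates[i](M, N, K, A.data(), B.data(), C.data());
        const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
        if (i == 0 || elapsed.count() < bestTime)
        {
            best = i;
            bestTime = elapsed.count();
        }
    }
    return best;
}
```

A caller would build a list such as `{ GemmIjk, GemmIkj }`, call `ChooseFastestGemm` once for the matrix sizes it cares about, and reuse the returned index afterwards. A single timing run is noisy, so a real selector would likely repeat the measurement.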
Improving
  • SSE, AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
Removing
  • Function Winograd2x3iSetInput.
  • Function Winograd2x3iSetOutput.
Bug fixing
  • Error in AVX-512F optimization of ConvolutionDirectHwcConvolutionBiasActivationDefault.

Test framework

New features
  • Tests for verifying functionality of function Winograd2x3SetFilter (NHWC mode).
  • Tests for verifying functionality of function Winograd4x3SetFilter (NHWC mode).
  • Tests for verifying functionality of function Winograd2x3SetInput (NHWC mode).
  • Tests for verifying functionality of function Winograd2x3SetOutput (NHWC mode).
  • Printing of internal performance statistics.
  • Tests for verifying functionality of function SynetPoolingForwardMax.
  • Tests for verifying functionality of function FusedLayerForward4.
  • Tests for verifying functionality of function SynetSoftmaxForward.
  • Tests for verifying functionality of function Yuva420pToBgra.
  • Tests for verifying functionality of function TransformImage.

Infrastructure

Bug fixing
  • The input variable CMAKE_CXX_FLAGS can contain invalid options (-mtune=native, -march=haswell, -mavx, etc.).

Simd v4.2.73

02 Jan 06:19

Algorithms

New features
  • Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward3.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function ConvolutionBiasAndActivation (NHWC mode).
Improving
  • SSE, AVX, AVX2 and AVX-512F optimizations of function Gemm32fNN.
  • Add output parameter 'internal' to function ConvolutionSetWeight.
Bug fixing
  • Wrong assert condition in AVX-512F optimization of function NeuralRelu.
  • Visual Studio 2017 compiler error (intrinsic _mm512_maskz_loadu_epi8 in Release mode).
  • Crash: reading of unaligned memory in AVX-512BW optimization of function HogLiteFilterFeatures.
  • Performance bug in functions SynetAddBias, SynetFusedLayerForwardX, SynetPreluLayerForward and SynetScaleLayerForward when (count = 1, trans = 1).

Test framework

New features
  • Tests for verifying functionality of function FusedLayerForward3.

Simd v4.2.72

03 Dec 05:52

Algorithms

New features
  • PReLU activation function in convolution framework.
  • DepthwiseDotProduct optimization in convolution framework.
  • AVX2 and AVX-512F optimizations of ImgToCol function in convolution framework.
  • Transposed flag in function SynetAddBias.
  • Transposed flag in function SynetScaleLayerForward.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function SynetPreluLayerForward.
  • Transposed flag in function FusedLayerForward0.
  • Transposed flag in function FusedLayerForward1.
  • Transposed flag in function FusedLayerForward2.
  • SIMD_NO_MANS_LAND macro.
Bug fixing
  • Memory reading outside of input array in SSE, AVX and AVX-512F optimizations of function Winograd2x3pSetInput.

Test framework

New features
  • Tests for verifying functionality of function SynetPreluLayerForward.

Simd v4.2.71

01 Nov 05:54

Algorithms

New features
  • Base implementation, SSE3, AVX and AVX-512F optimizations of function SynetRestrictRange.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Fill32f.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function ConvolutionSetActivation.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward0.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward1.
  • Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of function SynetInnerProductLayerForward.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward2.
Improving
  • Base implementation, SSE, SSE3, AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
  • Add output parameter 'internal' to function ConvolutionSetWeight.
Bug fixing
  • Compiler error in function Gemm32fNN (32 bit mode).
  • Error in Relu when slope > 1.
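
The release note does not say what the Relu error was. As an illustration only, the sketch below shows one common way a leaky ReLU can go wrong for slope > 1: the shortcut `max(x, slope * x)` matches the piecewise definition only while slope <= 1. Both helper names are hypothetical and are not Simd functions, and the link to the fixed bug is an assumption.

```cpp
#include <algorithm>

// Hypothetical helpers for illustration; not part of the Simd API.

// Shortcut form: equals the piecewise definition only when 0 <= slope <= 1.
inline float ReluViaMax(float x, float slope)
{
    return std::max(x, slope * x);
}

// Piecewise form: x for positive inputs, slope * x otherwise; valid for any slope.
inline float ReluPiecewise(float x, float slope)
{
    return x > 0.0f ? x : slope * x;
}
```

With slope = 0.5 the two forms agree for every input. With slope = 2 they differ on both sides of zero: for x = -1 the shortcut returns -1 where the piecewise form returns -2, and for x = 2 the shortcut returns 4 where the piecewise form returns 2.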

Test framework

New features
  • Tests for verifying functionality of function SynetRestrictRange.
  • Tests for verifying functionality of function Fill32f.
  • Tests for verifying functionality of function FusedLayerForward0.
  • Tests for verifying functionality of function FusedLayerForward1.
  • Tests for verifying functionality of function SynetInnerProductLayerForward.
  • Tests for verifying functionality of function FusedLayerForward2.

Infrastructure

New features
  • PRINT_INFO option for CMake.
  • UpdateCopyrights.sh script.
Bug fixing
  • CMake build error when Simd is used as external project.

Simd v4.2.70

01 Oct 10:09

Algorithms

New features
  • AVX optimization of function Winograd2x3iSetInput.
  • AVX and AVX-512F optimizations of function Winograd2x3pSetInput.
  • AVX and AVX-512F optimizations of function Winograd2x3pSetOutput.
  • Base implementation and SSE and AVX optimizations of function Winograd2x3iSetOutput.
  • Own implementation of XML instead of tinyxml2.
  • Base implementation of function ConvolutionInit.
  • Base implementation of function ConvolutionBufferSize.
  • Base implementation and SSE optimization of function ConvolutionSetWeight.
  • Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
  • Base implementation, SSE3, AVX, AVX2 and AVX-512F optimizations of function Gemm32fNT.

Test framework

New features
  • Tests for verifying AVX optimization of function Winograd2x3iSetInput.
  • Tests for verifying AVX and AVX-512F optimizations of function Winograd2x3pSetInput.
  • Tests for verifying AVX and AVX-512F optimizations of function Winograd2x3pSetOutput.
  • Tests for verifying functionality of function Winograd2x3iSetOutput.
  • Tests for verifying functionality of function ConvolutionInit.
  • Tests for verifying functionality of function ConvolutionBufferSize.
  • Tests for verifying functionality of function ConvolutionSetWeight.
  • Tests for verifying functionality of function ConvolutionForward.
  • Tests for verifying functionality of function Gemm32fNT.

Simd v4.2.69

03 Sep 06:40

Algorithms

New features
  • SSE2, SSSE3, AVX2, AVX-512BW optimizations of function ReduceColor2x2.
  • Function Simd::Reduce2x2.
  • Function Simd::ResizeArea.
  • Conversion of Simd::Point to cv::Point2f.
  • Base implementation and SSE optimization of function Winograd2x3iSetInput.
  • Base implementation and SSE optimization of function Winograd2x3pSetFilter.
  • Base implementation and SSE optimization of function Winograd2x3pSetInput.
  • Base implementation and SSE optimization of function Winograd2x3pSetOutput.
  • Base implementation and SSE optimization of function Winograd4x3pSetFilter.
  • Base implementation of function Winograd4x3pSetInput.
  • Base implementation of function Winograd4x3pSetOutput.
Bug fixing
  • Error in AVX2 optimization of function ReduceGray2x2 for Visual Studio 2013.
  • Assert in function Font::Draw.
  • Linker error when options -march=native and -DAVX512=0 are used for SkylakeX.
  • Compiler error (Visual Studio 2017 for Android).

Test framework

New features
  • Tests for verifying functionality of SSE2, SSSE3, AVX2 and AVX-512BW optimizations of function ReduceColor2x2.
  • Tests for verifying functionality of function Winograd2x3iSetInput.
  • Tests for verifying functionality of function Winograd2x3pSetFilter.
  • Tests for verifying functionality of function Winograd2x3pSetInput.
  • Tests for verifying functionality of function Winograd2x3pSetOutput.
  • Tests for verifying functionality of function Winograd4x3pSetFilter.
  • Tests for verifying functionality of function Winograd4x3pSetInput.
  • Tests for verifying functionality of function Winograd4x3pSetOutput.

Infrastructure

New features
  • Compilation without generation of file SimdVersion.h.

Simd v4.2.68

06 Aug 07:07

Algorithms

New features
  • Error message in function Allocate.
Bug fixing
  • Error in AVX-512F optimization of function HogLiteCompressFeatures.
  • Error in AVX-512F optimization of function HogLiteFilterSeparable.
  • Error in AVX-512F optimization of function HogLiteFilterFeatures.

Test framework

Bug fixing
  • Test error for function CosineDistance32f.
  • Error in test for function HogLiteFilterSeparable.
  • Error in test for function ReduceGray4x4.

Infrastructure

Removing
  • Extraction of current SVN revision.

Documentation

Removing
  • References to old project on sourceforge.net.

Simd v4.2.67

03 Jul 12:55

Algorithms

New features
  • NEON optimization of function NeuralConvolutionForward.
  • Extension of functionality of SynetEltwiseLayerForward.
  • SSE2, AVX2 and AVX-512BW optimizations of function BayerToBgra.
  • SSSE3, AVX2 and AVX-512BW optimizations of function BayerToBgr.
Bug fixing
  • Visual Studio warning (NOMINMAX macro redefinition) in file SimdEnable.h.

Test framework

New features
  • Tests for verifying functionality of NEON optimization of function NeuralConvolutionForward.
  • Tests for verifying functionality of SSE2, AVX2 and AVX-512BW optimizations of function BayerToBgra.
  • Tests for verifying functionality of SSSE3, AVX2 and AVX-512BW optimizations of function BayerToBgr.

Infrastructure

New features
  • SIMD_TEST option in CMakeLists.txt.
  • Library building in arbitrary directory.
  • Library building using MinGW.
  • New release hosting site (github.com).

Documentation

New features
  • An example for function View::Ref.