Releases: ermig1979/Simd
Simd v4.3.75
Algorithms
New features
- Base implementation, SSE2, SSSE3, AVX2 and AVX-512BW optimizations of function BgraToYuva420p.
- NEON optimization of function NeuralSigmoid.
- NEON optimization of function NeuralTanh.
- NEON optimization of function NeuralPow.
- NEON version of functions GetFlushToZero and SetFlushToZero.
- NEON optimization of function Fill32f.
- NEON optimization of function AlphaFilling.
- NEON optimization of function CosineDistance16f.
- NEON optimization of function CosineDistance32f.
- NEON optimization of function Gemm32fNN.
- NEON optimization of function Gemm32fNT.
- NEON optimization of function FillPixel.
- NEON optimization of function ReduceColor2x2.
- NEON optimization of function BayerToBgra.
- NEON optimization of function BayerToBgr.
- NEON optimization of function TransformImage.
- NEON optimization of function BgraToYuva420p.
- NEON optimization of function Yuva420pToBgra.
- NEON optimization of function Resizer.
- NEON optimization of function HogLiteFindMax7x7.
- NEON optimization of function HogLiteCreateMask.
- NEON optimization of function HogLiteFilterSeparable.
- NEON optimization of function HogLiteCompressFeatures.
- NEON optimization of function HogLiteResizeFeatures.
- NEON optimization of function HogLiteFilterFeatures.
- NEON optimization of function HogLiteExtractFeatures.
- NEON optimization of function Winograd2x3SetFilter.
- NEON optimization of function Winograd4x3SetFilter.
- NEON optimization of function Winograd2x3SetInput.
- NEON optimization of function Winograd2x3SetOutput.
- NEON optimization of function SynetAddBias.
- NEON optimization of function SynetEltwiseLayerForward.
- NEON optimization of function SynetPoolingForwardMax.
- NEON optimization of function SynetFusedLayerForward0.
- NEON optimization of function SynetFusedLayerForward1.
- NEON optimization of function SynetFusedLayerForward2.
- NEON optimization of function SynetFusedLayerForward3.
- NEON optimization of function SynetFusedLayerForward4.
- NEON optimization of function SynetInnerProductLayerForward.
- NEON optimization of function SynetLrnLayerCrossChannels.
- NEON optimization of function SynetPreluLayerForward.
- NEON optimization of function SynetRestrictRange.
- NEON optimization of function SynetScaleLayerForward.
- NEON optimization of function SynetSoftmaxLayerForward.
- NEON optimization of function ConvolutionForward.
Improving
- AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
- SSE, AVX, AVX2 and AVX-512F optimizations of function Resizer.
Bug fixing
- Error in AVX-512BW optimization of function ChangeColors.
- Error in AVX-512BW optimization of function NormalizeHistogram.
- Error in AVX-512F optimization of function NeuralConvolutionForward.
- Error in NEON optimization of function Uint8ToFloat32.
- Error in NEON optimization of function SquaredDifferenceSum16f.
- Error in SSE version of function GetFlushToZero.
- Error in Base implementation of function SynetFusedLayerForward0.
Test framework
New features
- Tests for verifying functionality of function BgraToYuva420p.
- Tests for verifying NEON optimization of function NeuralSigmoid.
- Tests for verifying NEON optimization of function NeuralTanh.
- Tests for verifying NEON optimization of function NeuralPow.
- Tests for verifying NEON optimization of function Fill32f.
- Tests for verifying NEON optimization of function AlphaFilling.
- Tests for verifying NEON optimization of function CosineDistance16f.
- Tests for verifying NEON optimization of function CosineDistance32f.
- Tests for verifying NEON optimization of function Gemm32fNN.
- Tests for verifying NEON optimization of function Gemm32fNT.
- Tests for verifying NEON optimization of function FillPixel.
- Tests for verifying NEON optimization of function ReduceColor2x2.
- Tests for verifying NEON optimization of function BayerToBgra.
- Tests for verifying NEON optimization of function BayerToBgr.
- Tests for verifying NEON optimization of function TransformImage.
- Tests for verifying NEON optimization of function BgraToYuva420p.
- Tests for verifying NEON optimization of function Yuva420pToBgra.
- Tests for verifying NEON optimization of function Resizer.
- Tests for verifying NEON optimization of function HogLiteFindMax7x7.
- Tests for verifying NEON optimization of function HogLiteCreateMask.
- Tests for verifying NEON optimization of function HogLiteFilterSeparable.
- Tests for verifying NEON optimization of function HogLiteCompressFeatures.
- Tests for verifying NEON optimization of function HogLiteResizeFeatures.
- Tests for verifying NEON optimization of function HogLiteFilterFeatures.
- Tests for verifying NEON optimization of function HogLiteExtractFeatures.
- Tests for verifying NEON optimization of function Winograd2x3SetFilter.
- Tests for verifying NEON optimization of function Winograd4x3SetFilter.
- Tests for verifying NEON optimization of function Winograd2x3SetInput.
- Tests for verifying NEON optimization of function Winograd2x3SetOutput.
- Tests for verifying NEON optimization of function SynetAddBias.
- Tests for verifying NEON optimization of function SynetEltwiseLayerForward.
- Tests for verifying NEON optimization of function SynetPoolingForwardMax.
- Tests for verifying NEON optimization of function SynetFusedLayerForward0.
- Tests for verifying NEON optimization of function SynetFusedLayerForward1.
- Tests for verifying NEON optimization of function SynetFusedLayerForward2.
- Tests for verifying NEON optimization of function SynetFusedLayerForward3.
- Tests for verifying NEON optimization of function SynetFusedLayerForward4.
- Tests for verifying NEON optimization of function SynetInnerProductLayerForward.
- Tests for verifying NEON optimization of function SynetLrnLayerCrossChannels.
- Tests for verifying NEON optimization of function SynetPreluLayerForward.
- Tests for verifying NEON optimization of function SynetRestrictRange.
- Tests for verifying NEON optimization of function SynetScaleLayerForward.
- Tests for verifying NEON optimization of function SynetSoftmaxLayerForward.
- Tests for verifying NEON optimization of function ConvolutionForward.
Bug fixing
- Error (on 32-bit OS) in test of function HogLiteFindMax7x7.
Simd v4.2.74
Algorithms
New features
- Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetFilter (NHWC mode).
- Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd4x3SetFilter (NHWC mode).
- Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetInput (NHWC mode).
- Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetOutput (NHWC mode).
- Parameter gemm (a pointer to external function of matrix multiplication) in function ConvolutionInit.
- Choice of the best gemm function at runtime.
- SIMD_RUNTIME_GEMM_STATISTIC macro (annotation of the runtime choice of gemm).
- Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of function SynetPoolingForwardMax.
- Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward4.
- Base implementation, SSE2, AVX2 and AVX-512F optimizations of function SynetSoftmaxForward.
- Base implementation, SSE2, AVX2 and AVX-512BW optimizations of function Yuva420pToBgra.
- Base implementation, SSSE3 optimization of function TransformImage.
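The runtime choice of the best gemm function listed above can be pictured with a small sketch. This is not the library's actual mechanism, which these notes do not describe. The GemmPtr signature, the two candidate functions and ChooseGemm are all hypothetical names introduced for illustration: each candidate is timed once on the real problem size and the fastest one is kept.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical signature: C = A * B for row-major M x K and K x N matrices.
using GemmPtr = void (*)(size_t M, size_t N, size_t K, const float* A, const float* B, float* C);

// Candidate 1: dot-product order (i, j, k).
void GemmIjk(size_t M, size_t N, size_t K, const float* A, const float* B, float* C)
{
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j)
        {
            float sum = 0.0f;
            for (size_t k = 0; k < K; ++k)
                sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

// Candidate 2: row-update order (i, k, j), which walks B and C contiguously.
void GemmIkj(size_t M, size_t N, size_t K, const float* A, const float* B, float* C)
{
    for (size_t i = 0; i < M * N; ++i)
        C[i] = 0.0f;
    for (size_t i = 0; i < M; ++i)
        for (size_t k = 0; k < K; ++k)
        {
            const float a = A[i * K + k];
            for (size_t j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
}

// Runs every candidate once on the given problem and returns the fastest.
// Returns nullptr when the candidate list is empty.
GemmPtr ChooseGemm(const std::vector<GemmPtr>& candidates, size_t M, size_t N, size_t K,
    const float* A, const float* B, float* C)
{
    GemmPtr best = nullptr;
    auto bestTime = std::chrono::steady_clock::duration::max();
    for (GemmPtr gemm : candidates)
    {
        const auto start = std::chrono::steady_clock::now();
        gemm(M, N, K, A, B, C);
        const auto elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed < bestTime)
        {
            bestTime = elapsed;
            best = gemm;
        }
    }
    return best;
}
```

A single timed run is noisy, so a real selector would likely repeat the measurement and cache the result per matrix shape. The sketch only shows the general idea of picking an implementation by measurement rather than by a fixed rule.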
Improving
- SSE, AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
Removing
- Function Winograd2x3iSetInput.
- Function Winograd2x3iSetOutput.
Bug fixing
- Error in AVX-512F optimization of ConvolutionDirectHwcConvolutionBiasActivationDefault.
Test framework
New features
- Tests for verifying functionality of function Winograd2x3SetFilter (NHWC mode).
- Tests for verifying functionality of function Winograd4x3SetFilter (NHWC mode).
- Tests for verifying functionality of function Winograd2x3SetInput (NHWC mode).
- Tests for verifying functionality of function Winograd2x3SetOutput (NHWC mode).
- Printing of internal performance statistics.
- Tests for verifying functionality of function SynetPoolingForwardMax.
- Tests for verifying functionality of function FusedLayerForward4.
- Tests for verifying functionality of function SynetSoftmaxForward.
- Tests for verifying functionality of function Yuva420pToBgra.
- Tests for verifying functionality of function TransformImage.
Infrastructure
Bug fixing
- The input variable CMAKE_CXX_FLAGS can contain invalid options (-mtune=native, -march=haswell, -mavx, etc.).
Simd v4.2.73
Algorithms
New features
- Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward3.
- Base implementation, SSE, AVX and AVX-512F optimizations of function ConvolutionBiasAndActivation (NHWC mode).
Improving
- SSE, AVX, AVX2 and AVX-512F optimizations of function Gemm32fNN.
- Add output parameter 'internal' to function ConvolutionSetWeight.
Bug fixing
- Wrong assert condition in AVX-512F optimization of function NeuralRelu.
- Visual Studio 2017 compiler error (intrinsic _mm512_maskz_loadu_epi8 in Release mode).
- Crash: reading of unaligned memory in AVX-512BW optimization of function HogLiteFilterFeatures.
- Performance bug in functions SynetAddBias, SynetFusedLayerForwardX, SynetPreluLayerForward and SynetScaleLayerForward when (count = 1, trans = 1).
Test framework
New features
- Tests for verifying functionality of function FusedLayerForward3.
Simd v4.2.72
Algorithms
New features
- PReLU activation function in convolution framework.
- DepthwiseDotProduct optimization in convolution framework.
- AVX2 and AVX-512F optimizations of ImgToCol function in convolution framework.
- Transposed flag in function SynetAddBias.
- Transposed flag in function SynetScaleLayerForward.
- Base implementation, SSE, AVX and AVX-512F optimizations of function SynetPreluLayerForward.
- Transposed flag in function FusedLayerForward0.
- Transposed flag in function FusedLayerForward1.
- Transposed flag in function FusedLayerForward2.
- SIMD_NO_MANS_LAND macro.
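For readers unfamiliar with PReLU, the following is a minimal scalar reference, not the library's code. The function name PreluReference, its signature and the meaning given to the trans flag (channel-minor data layout) are assumptions made for illustration. The notes above mention a transposed flag but do not define it.

```cpp
#include <cstddef>

// Hypothetical reference PReLU with one slope per channel.
// trans == false: channel-major data, index = c * spatial + s.
// trans == true:  channel-minor data, index = s * channels + c.
void PreluReference(const float* src, const float* slope, size_t channels, size_t spatial,
    bool trans, float* dst)
{
    for (size_t c = 0; c < channels; ++c)
        for (size_t s = 0; s < spatial; ++s)
        {
            const size_t i = trans ? s * channels + c : c * spatial + s;
            // Positive values pass through; negative values are scaled by the channel slope.
            dst[i] = src[i] > 0.0f ? src[i] : slope[c] * src[i];
        }
}
```

The only difference between the two layouts is which slope applies to a given element, so an optimized version has to either broadcast one slope across a run of elements or load a vector of slopes per step.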
Bug fixing
- Memory reading outside of input array in SSE, AVX and AVX-512F optimizations of function Winograd2x3pSetInput.
Test framework
New features
- Tests for verifying functionality of function SynetPreluLayerForward.
Simd v4.2.71
Algorithms
New features
- Base implementation, SSE3, AVX and AVX-512F optimizations of function SynetRestrictRange.
- Base implementation, SSE, AVX and AVX-512F optimizations of function Fill32f.
- Base implementation, SSE, AVX and AVX-512F optimizations of function ConvolutionSetActivation.
- Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward0.
- Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward1.
- Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of function SynetInnerProductLayerForward.
- Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward2.
Improving
- Base implementation, SSE, SSE3, AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
- Add output parameter 'internal' to function ConvolutionSetWeight.
Bug fixing
- Compiler error in function Gemm32fNN (32 bit mode).
- Error in Relu when slope > 1.
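The notes do not say what the Relu error was. One plausible way such an error can arise (an assumption, not confirmed here) is the common shortcut max(x, slope * x), which matches the definition only while the slope stays between 0 and 1. The two function names below are hypothetical and serve only to contrast the shortcut with the explicit form.

```cpp
#include <algorithm>

// Shortcut form: equals the leaky ReLU definition only while 0 <= slope <= 1.
inline float ReluShortcut(float x, float slope)
{
    return std::max(x, slope * x);
}

// Explicit form: follows the definition for any slope.
inline float ReluExplicit(float x, float slope)
{
    return x > 0.0f ? x : slope * x;
}
```

With slope = 2 the shortcut returns 4 for x = 2 and -1 for x = -1, while the explicit form returns 2 and -2. With slope = 0.5 both forms agree.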
Test framework
New features
- Tests for verifying functionality of function SynetRestrictRange.
- Tests for verifying functionality of function Fill32f.
- Tests for verifying functionality of function FusedLayerForward0.
- Tests for verifying functionality of function FusedLayerForward1.
- Tests for verifying functionality of function SynetInnerProductLayerForward.
- Tests for verifying functionality of function FusedLayerForward2.
Infrastructure
New features
- PRINT_INFO option for CMake.
- UpdateCopyrights.sh script.
Bug fixing
- CMake build error when Simd is used as external project.
Simd v4.2.70
Algorithms
New features
- AVX optimization of function Winograd2x3iSetInput.
- AVX and AVX-512F optimizations of function Winograd2x3pSetInput.
- AVX and AVX-512F optimizations of function Winograd2x3pSetOutput.
- Base implementation and SSE and AVX optimizations of function Winograd2x3iSetOutput.
- Own implementation of XML instead of tinyxml2.
- Base implementation of function ConvolutionInit.
- Base implementation of function ConvolutionBufferSize.
- Base implementation and SSE optimization of function ConvolutionSetWeight.
- Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
- Base implementation, SSE3, AVX, AVX2 and AVX-512F optimizations of function Gemm32fNT.
Test framework
New features
- Tests for verifying AVX optimization of function Winograd2x3iSetInput.
- Tests for verifying AVX and AVX-512F optimizations of function Winograd2x3pSetInput.
- Tests for verifying AVX and AVX-512F optimizations of function Winograd2x3pSetOutput.
- Tests for verifying functionality of function Winograd2x3iSetOutput.
- Tests for verifying functionality of function ConvolutionInit.
- Tests for verifying functionality of function ConvolutionBufferSize.
- Tests for verifying functionality of function ConvolutionSetWeight.
- Tests for verifying functionality of function ConvolutionForward.
- Tests for verifying functionality of function Gemm32fNT.
Simd v4.2.69
Algorithms
New features
- SSE2, SSSE3, AVX2, AVX-512BW optimizations of function ReduceColor2x2.
- Function Simd::Reduce2x2.
- Function Simd::ResizeArea.
- Conversion of Simd::Point to cv::Point2f.
- Base implementation and SSE optimization of function Winograd2x3iSetInput.
- Base implementation and SSE optimization of function Winograd2x3pSetFilter.
- Base implementation and SSE optimization of function Winograd2x3pSetInput.
- Base implementation and SSE optimization of function Winograd2x3pSetOutput.
- Base implementation and SSE optimization of function Winograd4x3pSetFilter.
- Base implementation of function Winograd4x3pSetInput.
- Base implementation of function Winograd4x3pSetOutput.
Bug fixing
- Error in AVX2 optimization of function ReduceGray2x2 for Visual Studio 2013.
- Assert in function Font::Draw.
- Linker error when options -march=native and -DAVX512=0 are used for SkylakeX.
- Compiler error (Visual Studio 2017 for Android).
Test framework
New features
- Tests for verifying functionality of SSE2, SSSE3, AVX2 and AVX-512BW optimizations of function ReduceColor2x2.
- Tests for verifying functionality of function Winograd2x3iSetInput.
- Tests for verifying functionality of function Winograd2x3pSetFilter.
- Tests for verifying functionality of function Winograd2x3pSetInput.
- Tests for verifying functionality of function Winograd2x3pSetOutput.
- Tests for verifying functionality of function Winograd4x3pSetFilter.
- Tests for verifying functionality of function Winograd4x3pSetInput.
- Tests for verifying functionality of function Winograd4x3pSetOutput.
Infrastructure
New features
- Compilation without generation of file SimdVersion.h.
Simd v4.2.68
Algorithms
New features
- Error message in function Allocate.
Bug fixing
- Error in AVX-512F optimization of function HogLiteCompressFeatures.
- Error in AVX-512F optimization of function HogLiteFilterSeparable.
- Error in AVX-512F optimization of function HogLiteFilterFeatures.
Test framework
Bug fixing
- Test error for function CosineDistance32f.
- Error in test for function HogLiteFilterSeparable.
- Error in test for function ReduceGray4x4.
Infrastructure
Removing
- Extraction of current SVN revision.
Documentation
Removing
- References to old project on sourceforge.net.
Simd v4.2.67
Algorithms
New features
- NEON optimization of function NeuralConvolutionForward.
- Extension of functionality of SynetEltwiseLayerForward.
- SSE2, AVX2 and AVX-512BW optimizations of function BayerToBgra.
- SSSE3, AVX2 and AVX-512BW optimizations of function BayerToBgr.
Bug fixing
- Visual Studio warning (NOMINMAX macro redefinition) in file SimdEnable.h.
Test framework
New features
- Tests for verifying functionality of NEON optimization of function NeuralConvolutionForward.
- Tests for verifying functionality of SSE2, AVX2 and AVX-512BW optimizations of function BayerToBgra.
- Tests for verifying functionality of SSSE3, AVX2 and AVX-512BW optimizations of function BayerToBgr.
Infrastructure
New features
- SIMD_TEST option in CMakeLists.txt.
- Library building in arbitrary directory.
- Library building with MinGW.
- New release hosting site (github.com).
Documentation
New features
- An example for function View::Ref.