Releases: ermig1979/Simd
Simd v5.0.115
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cdc.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Cd.
- Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512BF16, AMX optimizations of class SynetMergedConvolution32fBf16Dc.
- AVX-512BF16 extension support.
- AVX-512BF16 optimizations of function Float32ToBFloat16.
- AVX-512BF16, AMX optimizations of class SynetConvolution32fBf16Nhwc.
- AMX extension support.
- Support of 3D pooling in Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetPoolingMax32f.
Improving
- AVX-512BW optimizations of function Fill32f.
Renaming
- Rename function SynetPoolingForwardAverage to SynetPoolingAverage.
- Rename function SynetPoolingForwardMax32f to SynetPoolingMax32f.
- Rename function SynetPoolingForwardMax8u to SynetPoolingMax8u.
Replacing
- Replace AVX-512F optimizations to AVX-512BW for function SvmSumLinear.
- Replace AVX-512F optimizations to AVX-512BW for function Fill32f.
- Replace AVX-512F optimizations to AVX-512BW for class ResizerNearest.
- Replace AVX-512F optimizations to AVX-512BW for class ResizerFloatBilinear.
- Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceSum32f.
- Replace AVX-512F optimizations to AVX-512BW for function SquaredDifferenceKahanSum32f.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralConvolutionForward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Forward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Backward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution2x2Sum.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Forward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Backward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution3x3Sum.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Forward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Backward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution4x4Sum.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Forward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Backward.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddConvolution5x5Sum.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralProductSum.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAdaptiveGradientUpdate.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling1x1Max3x3.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max2x2.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralPooling2x2Max3x3.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralUpdateWeights.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddValue.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVector.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralAddVectorMultipliedByValue.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughSigmoid2.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeSigmoid.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralRoughTanh.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeTanh.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralDerivativeRelu.
- Replace AVX-512F optimizations to AVX-512BW for function NeuralPow.
- Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNN.
- Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fGemmNT.
- Replace AVX-512F optimizations to AVX-512BW for class SynetConvolution32fWinograd.
- Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fGemmNN.
- Replace AVX-512F optimizations to AVX-512BW for class SynetDeconvolution32fNhwcDirect2x2.
- Replace AVX-512F optimizations to AVX-512BW for function SynetDeconvolution32fInit.
- Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fGemm.
- Replace AVX-512F optimizations to AVX-512BW for class SynetInnerProduct32fProd.
- Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProduct32fInit.
- Replace AVX-512F optimizations to AVX-512BW for function ConvolutionBiasAndActivation.
- Replace AVX-512F optimizations to AVX-512BW for function SynetReorderImage.
- Replace AVX-512F optimizations to AVX-512BW for function SynetReorderFilter.
- Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNN.
- Replace AVX-512F optimizations to AVX-512BW for function Gemm32fNT.
- Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward0.
- Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward1.
- Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward2.
- Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward3.
- Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward4.
- Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward8.
- Replace AVX-512F optimizations to AVX-512BW for function SynetFusedLayerForward9.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetFilter.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetInput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x3Block1x4SetOutput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetFilter.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetInput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel1x5Block1x4SetOutput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetFilter.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetInput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block2x2SetOutput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetFilter.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetInput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel2x2Block4x4SetOutput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetFilter.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetInput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block2x2SetOutput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetFilter.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetInput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block3x3SetOutput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetFilter.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetInput.
- Replace AVX-512F optimizations to AVX-512BW for function WinogradKernel3x3Block4x4SetOutput.
- Replace AVX-512F optimizations to AVX-512BW for function SynetElu32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetHardSigmoid32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetHswish32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetMish32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetPreluLayerForward.
- Replace AVX-512F optimizations to AVX-512BW for function SynetRelu32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetRestrictRange32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetSigmoid32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetSoftplus32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetSwish32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetTanh32f.
- Replace AVX-512F optimizations to AVX-512BW for function SynetScaleLayerForward.
- Replace AVX-512F optimizations to AVX-512BW for function SynetPoolingAverage.
- Replace AVX-512F optimizations to AVX-512BW for function SynetAddBias.
- Replace AVX-512F optimizations to AVX-512BW for function SynetEltwiseLayerForward.
- Replace AVX-512F optimizations to AVX-512BW for function SynetInnerProductLayerForward.
- Replace AVX-512F optimizations to AVX-512BW for function SynetLrnLayerCrossChannels.
Simd v4.10.114
Algorithms
New features
- Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToUyvy422.
- AVX-512BW, NEON optimizations of function Uyvy422ToYuv420p.
- AVX-512BW, NEON optimizations of function Uyvy422ToBgr.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Float32ToBFloat16.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function BFloat16ToFloat32.
- Base implementation of class SynetConvolution32fBf16Gemm.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of class SynetConvolution32fBf16Nhwc.
- Base implementation of class SynetMergedConvolution32fBf16.
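As background for the Float32ToBFloat16 and BFloat16ToFloat32 entries above: a bfloat16 value is the upper 16 bits of an IEEE 754 float32. The sketch below shows one scalar way to convert in both directions. The function names are hypothetical, and round-to-nearest-even is an assumption chosen for illustration; the changelog does not state which rounding mode the library uses.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of the Simd API.
// Converts float32 to bfloat16 with round-to-nearest, ties-to-even (assumed mode).
inline uint16_t FloatToBf16Sketch(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits)); // reinterpret without aliasing issues
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return uint16_t((bits >> 16) | 0x0040u); // NaN: force a mantissa bit so it stays NaN
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u); // bias for ties-to-even
    return uint16_t((bits + rounding) >> 16);
}

// Hypothetical helper, not part of the Simd API.
// Converts bfloat16 to float32; this direction is exact.
inline float Bf16ToFloatSketch(uint16_t value)
{
    uint32_t bits = uint32_t(value) << 16; // lower 16 mantissa bits become zero
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

Only 8 bits of significand precision survive the conversion, so a round trip through bfloat16 reproduces values such as 1.0f or 0.5f exactly but loses the low-order bits of most other inputs.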
Removing
- Remove external GEMM function parameter from function SynetConvolution32fInit.
- Remove external GEMM function parameter from function SynetDeconvolution32fInit.
Test framework
New features
- Tests for verifying functionality of function Yuv420pToUyvy422.
- Tests for verifying functionality of function Float32ToBFloat16.
- Tests for verifying functionality of function BFloat16ToFloat32.
Infrastructure
New features
- Project files for Microsoft Visual Studio 2022.
Simd v4.9.113
Algorithms
New features
- SSE4.1, AVX2, AVX-512BW optimizations of class ResizerByteArea2x2.
Improving
- Base implementation of class ResizerByteArea1x1.
Bug fixing
- Error in Base implementation of class ResizerByteArea2x2.
- Error in AVX optimizations of class SynetConvolution32fDirectNchw.
Removing
- SimdSynetCompatibilityFloatZero flag.
Infrastructure
New features
- Git commit ID info in function SimdVersion.
- Git branch name in function SimdVersion.
Simd v4.9.112
Algorithms
New features
- NEON optimizations of function Base64Encode.
- NEON optimizations of ImageJpegSaver class.
- NEON optimizations of function Yuv420pSaveAsJpegToMemory.
- NEON optimizations of function Nv12SaveAsJpegToMemory.
- Owner method in View structure.
- Owner method in Frame structure.
- Capture method in View structure.
- Capture method in Frame structure.
- Base implementation of class ResizerByteAreaReduced2x2.
Bug fixing
- MSVS compiler error in AVX-512BW optimizations of function Yuv420pToBgraV2.
- Error in AVX2 optimizations of function BgraToRgb.
- Error (aligned reading of unaligned memory) in SSE4.1, AVX2, AVX-512BW optimizations of function InterleaveBgra.
- Error in function View::ToOcv.
- Error in View copy constructor (from OpenCV Mat).
Test framework
Bug fixing
- Wrong default ROOT_PATH for Linux.
- Error in test SynetConvert32fTo8uAutoTest.
- Special test ResizeYuv420pSpecialTest.
Simd v4.9.111
Algorithms
New features
- AVX2, AVX-512BW optimizations of ResizerByteBicubic class.
- SSE4.1, AVX2, AVX-512BW, NEON optimizations of function Base64Decode.
- NEON optimizations of function SynetSwish32f.
- Swish activation function to NEON optimizations of SynetConvolution32f framework.
- Swish activation function to NEON optimizations of SynetDeconvolution32f framework.
- Swish activation function to NEON optimizations of SynetMergedConvolution32f framework.
- Swish activation function to NEON optimizations of SynetConvolution8i framework.
- Swish activation function to NEON optimizations of SynetMergedConvolution8i framework.
- NEON optimizations of function Yuv444pToBgraV2.
- SSE2, AVX2, AVX-512BW, NEON optimizations of function Yuv420pToBgraV2.
Improving
- SSE4.1 optimizations of ResizerByteBicubic class.
Bug fixing
- Compiler error in NEON optimizations of function AlphaUnpremultiply.
- MSVS compiler warnings in SSE4.1, AVX2, AVX-512BW optimizations of function TransformImage.
Simd v4.9.110
Algorithms
New features
- Base implementation, SSE4.1 optimizations of ResizerByteBicubic class.
- Base implementation of function BgraToYuv444pV2.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Nv12SaveAsJpegToMemory.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Yuv420pSaveAsJpegToMemory.
- Base implementation of function BgraToYuv420pV2.
Bug fixing
- Error in SSE4.1, AVX2, AVX-512BW optimizations of function BgraToRgba.
- Error in SSE4.1, AVX2 optimizations of function BgraToBgr.
- Error in SSE4.1, AVX2 optimizations of function BgraToRgb.
- Error in Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of function AlphaUnpremultiply.
Test framework
New features
- Tests for verifying functionality of function BgraToYuv444pV2.
- Tests for verifying functionality of function Nv12SaveAsJpegToMemory.
- Tests for verifying functionality of function Yuv420pSaveAsJpegToMemory.
- Tests for verifying functionality of function BgraToYuv420pV2.
Simd v4.9.109
Algorithms
New features
- New parameter in function Uyvy422ToBgr.
- SSE4.1, AVX2 optimizations of function Uyvy422ToBgr.
- Base implementation, SSE4.1, AVX2 optimizations of function Uyvy422ToYuv420p.
- Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function Base64Encode.
- Base implementation of function Base64Decode.
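For readers unfamiliar with what Base64Encode computes, here is a minimal scalar sketch of standard Base64 encoding as defined in RFC 4648 (standard alphabet, '=' padding). The function name and the std::string interface are assumptions made for illustration and do not reflect the library's signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical reference encoder, not the Simd implementation.
inline std::string Base64EncodeSketch(const std::string& src)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string dst;
    dst.reserve((src.size() + 2) / 3 * 4);
    size_t i = 0;
    // Every full group of 3 input bytes becomes 4 output characters.
    for (; i + 3 <= src.size(); i += 3)
    {
        uint32_t v = (uint32_t(uint8_t(src[i])) << 16) |
                     (uint32_t(uint8_t(src[i + 1])) << 8) |
                      uint32_t(uint8_t(src[i + 2]));
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += table[(v >> 6) & 63];
        dst += table[v & 63];
    }
    // A tail of 1 or 2 bytes is padded with '=' up to 4 characters.
    size_t tail = src.size() - i;
    if (tail == 1)
    {
        uint32_t v = uint32_t(uint8_t(src[i])) << 16;
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += "==";
    }
    else if (tail == 2)
    {
        uint32_t v = (uint32_t(uint8_t(src[i])) << 16) |
                     (uint32_t(uint8_t(src[i + 1])) << 8);
        dst += table[(v >> 18) & 63];
        dst += table[(v >> 12) & 63];
        dst += table[(v >> 6) & 63];
        dst += '=';
    }
    return dst;
}
```

The fixed 3-to-4 mapping in the main loop is what makes the algorithm a natural fit for vectorized implementations: each block is independent of its neighbours.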
Improving
- AVX2 optimizations of class ResizerNearest for Bgr24, Uv16.
Renaming
- Function UyvyToBgr to Uyvy422ToBgr.
Test framework
New features
- Tests for verifying functionality of function Uyvy422ToYuv420p.
- Tests for verifying functionality of function Base64Encode.
- Tests for verifying functionality of function Base64Decode.
Documentation
Changes
- Update developers list.
Simd v4.9.108
Algorithms
New features
- SSE4.1, AVX2, AVX-512F, AVX-512BW optimizations of class ResizerNearest.
- Add SimdResizeMethodNearestPytorch to SimdResizeMethodType enumeration.
- Add parameter BackgroundStatUpdateTime to Motion Detector.
- MotionDetector performance optimization (case of falling star).
- 16-bit UYVY image format in View.
- Base implementation of function UyvyToBgr.
- Base implementation, SSE2, AVX2, AVX-512F optimizations of function SynetSwish32f.
- SimdConvolutionActivationSwish item of SimdConvolutionActivationType enumeration.
- Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetConvolution32f framework.
- Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetDeconvolution32f framework.
- Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetMergedConvolution32f framework.
- Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8i framework.
- Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
- SimdYuvType enumeration.
- Base implementation, SSE2, AVX2, AVX-512BW optimizations of function Yuv444pToBgraV2.
- Function Simd::Resize supports images with 16-bit channel size.
- Base implementation of function Yuv420pToBgraV2.
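The Swish entries above refer to the activation commonly defined as x * sigmoid(slope * x). A scalar reference sketch of that formula follows. The function name, the parameter order, and the presence of a slope parameter are assumptions for illustration; consult the library headers for the actual SynetSwish32f signature.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference, not the Simd implementation.
// Computes dst[i] = src[i] * sigmoid(slope * src[i]) for each element.
inline void SwishSketch(const float* src, size_t size, float slope, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = src[i] / (1.0f + std::exp(-slope * src[i]));
}
```

With slope equal to 1 this is the function also known as SiLU.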
Improving
- Refactoring of SimdResizeMethodType enumeration.
Bug fixing
- Stack corruption in function Simd::Avx2::JpegWriteBlockSubs.
Test framework
New features
- Tests for verifying functionality of function UyvyToBgr.
- Tests for verifying functionality of function SynetSwish32f.
- Tests for verifying functionality of function Yuv444pToBgraV2.
- Tests for verifying functionality of function Yuv420pToBgraV2.
Infrastructure
Bug fixing
- Wrong compiler options in CMake.
Simd v4.9.107
Algorithms
New features
- Internal class Holder to replace std::unique_ptr for old compilers without support of C++11 standard.
- SimdBayerLayoutType enumeration.
- Base implementation of class ResizerNearest.
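The Holder entry above describes an owning pointer for compilers that lack std::unique_ptr. A minimal sketch of how such a class can be written in pre-C++11 style is shown below. The name HolderSketch, its method names, and the Tracked helper are all hypothetical and are not taken from the library source.

```cpp
#include <cstddef>

// Hypothetical sketch of a C++03-compatible owning pointer; illustrative only.
template <class T> class HolderSketch
{
public:
    explicit HolderSketch(T* ptr = NULL) : _ptr(ptr) {}
    ~HolderSketch() { delete _ptr; }

    T* Get() const { return _ptr; }
    T* operator->() const { return _ptr; }
    T& operator*() const { return *_ptr; }

    // Destroys the currently owned object (if any) and takes ownership of ptr.
    void Reset(T* ptr = NULL)
    {
        if (ptr != _ptr)
        {
            delete _ptr;
            _ptr = ptr;
        }
    }

    // Gives up ownership without destroying the object.
    T* Release()
    {
        T* ptr = _ptr;
        _ptr = NULL;
        return ptr;
    }

private:
    // Copying is disabled the pre-C++11 way: declared private and never defined.
    HolderSketch(const HolderSketch&);
    HolderSketch& operator=(const HolderSketch&);

    T* _ptr;
};

// Small helper type used only to observe construction and destruction.
struct Tracked
{
    explicit Tracked(int* alive) : _alive(alive) { ++*_alive; }
    ~Tracked() { --*_alive; }
    int* _alive;
};
```

Without move semantics such a class cannot transfer ownership by assignment, which is why the sketch exposes explicit Reset and Release methods instead.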
Bug fixing
- Compiler error when macro SIMD_SSE2_DISABLE is defined.
- Compiler error when macro SIMD_NEON_DISABLE is defined.
Infrastructure
New features
- SIMD_ROOT CMake parameter.
Simd v4.9.106
Algorithms
New features
- Base implementation, SSE2, AVX, AVX-512F, NEON optimizations of function SynetHardSigmoid32f.
- SimdConvolutionActivationHardSigmoid item of SimdConvolutionActivationType enumeration.
- HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetConvolution32f framework.
- HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetDeconvolution32f framework.
- HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetMergedConvolution32f framework.
- NEON optimizations of SynetMergedConvolution32fDc class.
- NEON optimizations of SynetMergedConvolution32fCd class.
- NEON optimizations of SynetInnerProduct32fGemm class.
- NEON optimizations of SynetInnerProduct32fProd class.
- HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, NEON optimizations of SynetConvolution8i framework.
- HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
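The HardSigmoid entries above refer to the piecewise-linear approximation of the sigmoid, commonly defined as max(0, min(1, scale * x + shift)). A scalar reference sketch follows. The function name and parameter order are assumptions for illustration and may not match the SynetHardSigmoid32f signature.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical scalar reference, not the Simd implementation.
// Computes dst[i] = clamp(scale * src[i] + shift, 0, 1) for each element.
inline void HardSigmoidSketch(const float* src, size_t size, float scale, float shift, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = std::max(0.0f, std::min(1.0f, scale * src[i] + shift));
}
```

Because it needs only a multiply-add and two clamps, with no exponential, this activation is cheap to vectorize on every instruction set listed above.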
Bug fixing
- Compiler error in file SimdInit.h (Clang, Windows).
Removing
- Remove inclusion of SimdConfig.h from SimdLib.h.
Test framework
New features
- Tests for verifying functionality of function SynetHardSigmoid32f.
- '-pi' test parameter (to print internal performance statistics of Simd Library to console).