FFMPEG errors when recognizing MP4 AAC rather than WAV #2677

Open
tkzv opened this issue Dec 29, 2024 · 0 comments

I built whisper.cpp with -DWHISPER_FFMPEG=ON and tried to transcribe an MP4 file with a command like:

$ .../whisper.cpp-1.7.3/bin/main -m .../ggml-large-v3-turbo-q8_0.bin -of Speech -otxt -osrt Speech.mp4 -l auto

whisper.cpp printed many FFmpeg decoder errors, and only the first 7 seconds of the 2-minute file were transcribed. Typical errors were:

[aac @ 0x55cd235cfb00] TNS filter order 29 is greater than maximum 12.
[aac @ 0x55cd235cfb00] Number of bands (26) exceeds limit (2).
[aac @ 0x55cd235cfb00] skip_data_stream_element: Input buffer exhausted before END element found
[aac @ 0x55cd235cfb00] Prediction is not allowed in AAC-LC.
[aac @ 0x55cd235cfb00] Scalefactor (-1) out of range.
[aac @ 0x55cd235cfb00] invalid band type
[aac @ 0x55cd235cfb00] Gain control is not implemented. Update your FFmpeg version to the newest one from Git. If the problem still occurs, it means that your file has a feature which has not been implemented.
[aac @ 0x55cd235cfb00] channel element 2.5 is not allocated
[aac @ 0x55cd235cfb00] Sample rate index in program config element does not match the sample rate index configured by the container.
[aac @ 0x55cd235cfb00] decode_pce: Input buffer exhausted before END element found
[aac @ 0x55cd235cfb00] Reserved bit set.
[aac @ 0x55cd235cfb00] channel element 2.2 is not allocated
[aac @ 0x55cd235cfb00] Pulse data corrupt or invalid.
[aac @ 0x55cd235cfb00] skip_data_stream_element: Input buffer exhausted before END element found
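For anyone triaging a longer log of this kind, a small script can group the decoder messages so that repeated ones stand out. This is only a sketch: the helper name `tally_decoder_messages` is hypothetical, and it assumes every decoder line follows the `[component @ 0xADDRESS] message` shape seen above.

```python
import re
from collections import Counter

# Matches FFmpeg's "[component @ 0xADDRESS] message" log prefix (assumed format).
LOG_LINE = re.compile(
    r"^\[(?P<component>\w+) @ 0x[0-9a-fA-F]+\]\s*(?P<message>.+)$"
)


def tally_decoder_messages(lines):
    """Count distinct (component, message) pairs; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group("component"), match.group("message"))] += 1
    return counts


# A few lines copied from the report above, used as sample input.
sample_log = [
    "[aac @ 0x55cd235cfb00] invalid band type",
    "[aac @ 0x55cd235cfb00] Reserved bit set.",
    "[aac @ 0x55cd235cfb00] skip_data_stream_element: Input buffer exhausted before END element found",
    "[aac @ 0x55cd235cfb00] skip_data_stream_element: Input buffer exhausted before END element found",
]

for (component, message), count in tally_decoder_messages(sample_log).most_common():
    print(f"{count:4d}  [{component}] {message}")
```

To use it on a real log, pass the lines of the saved console output (for example `open("log1.txt")`) instead of `sample_log`.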

The same MP4 was correctly converted to WAV without errors by the command:

$ ffmpeg -i Speech.mp4 -ar 16k out.wav
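Since the standalone ffmpeg conversion succeeds, one possible workaround until the built-in FFmpeg path is fixed is to convert first and feed the WAV to whisper.cpp. The sketch below is an assumption-laden illustration, not a confirmed fix: the function names are hypothetical, the extra `-ac 1 -c:a pcm_s16le` options follow the 16 kHz mono 16-bit PCM format suggested in the whisper.cpp README rather than the command above, and the binary and model paths are placeholders the caller must supply.

```python
import subprocess


def build_ffmpeg_cmd(src, dst):
    """Arguments to convert src to 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",        # -y: overwrite dst if it already exists
        "-i", src,
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono (assumption, not in the report)
        "-c:a", "pcm_s16le",   # 16-bit PCM (assumption, not in the report)
        dst,
    ]


def build_whisper_cmd(whisper_bin, model, wav, out_base, language="auto"):
    """Same flags as the command in this report, with the WAV as input."""
    return [whisper_bin, "-m", model, "-of", out_base,
            "-otxt", "-osrt", "-l", language, wav]


def transcribe_via_wav(src, wav, whisper_bin, model, out_base):
    """Run both steps; raises CalledProcessError if either command fails."""
    subprocess.run(build_ffmpeg_cmd(src, wav), check=True)
    subprocess.run(build_whisper_cmd(whisper_bin, model, wav, out_base), check=True)


# Print the commands without running them; the paths here are placeholders.
print(" ".join(build_ffmpeg_cmd("Speech.mp4", "out.wav")))
print(" ".join(build_whisper_cmd("path/to/main", "path/to/model.bin", "out.wav", "Speech")))
```

Keeping the command construction separate from `subprocess.run` makes it easy to inspect exactly what would be executed before touching any files.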

The video was downloaded from https://t.me/botcharov/11840.
OS: Gentoo Linux, AMD64.
whisper.cpp version 1.7.3.

$ ffmpeg -version
ffmpeg version 6.1.1 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 13 (Gentoo 13.3.1_p20241025 p1)
configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --docdir=/usr/share/doc/ffmpeg-6.1.1-r8/html --mandir=/usr/share/man --enable-shared --cc=x86_64-pc-linux-gnu-gcc --cxx=x86_64-pc-linux-gnu-g++ --ar=x86_64-pc-linux-gnu-ar --nm=x86_64-pc-linux-gnu-nm --strip=x86_64-pc-linux-gnu-strip --ranlib=x86_64-pc-linux-gnu-ranlib --pkg-config=x86_64-pc-linux-gnu-pkg-config --optflags='-O2 -pipe -march=native' --disable-static --disable-libaribcaption --enable-avfilter --disable-stripping --disable-optimizations --disable-libcelt --enable-nonfree --enable-version3 --enable-version3 --disable-indev=oss --disable-outdev=oss --enable-version3 --enable-version3 --enable-version3 --enable-nonfree --enable-bzlib --enable-runtime-cpudetect --disable-debug --enable-gcrypt --enable-gnutls --enable-gmp --enable-gpl --enable-hardcoded-tables --enable-iconv --enable-libxml2 --enable-lzma --enable-network --enable-opencl --enable-openssl --enable-postproc --enable-libsmbclient --enable-ffplay --enable-sdl2 --enable-vaapi --enable-vdpau --enable-vulkan --enable-xlib --enable-libxcb --enable-libxcb-shm --enable-libxcb-xfixes --enable-zlib --enable-libcdio --disable-libiec61883 --disable-libdc1394 --enable-libcaca --enable-openal --enable-opengl --enable-libv4l2 --enable-libpulse --enable-libdrm --enable-libjack --enable-libopencore-amrwb --enable-libopencore-amrnb --enable-libcodec2 --enable-libdav1d --enable-libfdk-aac --enable-libopenjpeg --disable-libjxl --enable-libbluray --enable-libgme --enable-libgsm --enable-libaribb24 --enable-libmodplug --enable-libopus --disable-libvpl --enable-libilbc --enable-librtmp --enable-libssh --enable-libspeex --enable-libsrt --enable-librsvg --disable-ffnvcodec --enable-libvorbis --enable-libvpx --enable-libzvbi --disable-appkit --enable-libbs2b --enable-chromaprint --disable-cuda-llvm --enable-libflite --enable-fontconfig --enable-frei0r --enable-libfribidi --disable-libglslang --enable-ladspa --enable-lcms2 --enable-libass 
--disable-libplacebo --enable-libtesseract --enable-lv2 --enable-librubberband --disable-libshaderc --enable-libfreetype --enable-libharfbuzz --enable-libvidstab --disable-libvmaf --enable-libzmq --enable-libzimg --enable-libsoxr --enable-pthreads --enable-amf --enable-libvo-amrwbenc --enable-libkvazaar --enable-libaom --enable-libmp3lame --enable-libopenh264 --enable-librav1e --enable-libsnappy --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --disable-gnutls --enable-version3 --disable-armv5te --disable-armv6 --disable-armv6t2 --disable-neon --disable-vfp --disable-vfpv3 --disable-armv8 --disable-dotprod --disable-i8mm --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu --disable-altivec --disable-vsx --disable-power8 --disable-amd3dnow --disable-amd3dnowext --disable-avx2 --enable-pic --cpu=host --disable-doc --disable-htmlpages --enable-manpages
libavutil      58. 29.100 / 58. 29.100
libavcodec     60. 31.102 / 60. 31.102
libavformat    60. 16.100 / 60. 16.100
libavdevice    60.  3.100 / 60.  3.100
libavfilter     9. 12.100 /  9. 12.100
libswscale      7.  5.100 /  7.  5.100
libswresample   4. 12.100 /  4. 12.100
libpostproc    57.  3.100 / 57.  3.100

The full console output of whisper.cpp is attached as log1.txt.
Also note that of the 6 transcribed lines, only the first 3 are correct. The text after 00:00:06.420 is garbage that does not appear anywhere in the file being transcribed.
