FrameText Extractor is an open-source tool for optimized text extraction (OCR) from videos. It combines OpenCV, Pillow, and Tesseract to extract text from individual video frames, using multithreading to improve processing performance. It also includes a feature for text correction using a language model.
- Text Recognition (OCR): Extracts text from video frames using Tesseract OCR.
- Optimized Video Processing: Processes frames at regular intervals (e.g., 1 frame per second) and uses multithreading for better performance.
- Motion Detection: Detects changes between frames to avoid unnecessary text extraction on static frames.
- Scalable Processing: Utilizes all available CPU cores for faster execution.
- Flexible Customization: Allows for dynamic adjustment of frame interval, frame size, and motion detection sensitivity.
- Text Correction with LLM: Corrects extracted text using the DeepSeek API a language model.
To use this project, you'll need:
- Python 3.x
- OpenCV (
cv2
) - Pillow
- Tesseract OCR (installed and available in the system path)
- pytesseract
- Numpy
- OpenAI (DeepSeek API for text correction)
If you are using Windows, ensure that Tesseract is installed and the path is set correctly.
You can install the necessary Python libraries with:
pip install opencv-python pillow pytesseract numpy
Install Tesseract OCR:
- Windows: Tesseract Download
- macOS: Install via Homebrew:
Install additional language packages:
brew install tesseract
brew install tesseract-lang
- Linux: Install via your system’s package manager (e.g.,
apt
on Ubuntu):Install additional language packages:sudo apt install tesseract-ocr
Replacesudo apt install tesseract-ocr-[language-code]
[language-code]
with the specific code for the language you need (e.g.,deu
for German).
-
Set Tesseract Path (Windows only): Update the path in the
set_tesseract_path()
function if necessary:pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
-
Set API Key: Obtain an API key from DeepSeek and set it in the
api_key
variable:api_key = "<DeepSeek API Key>"
-
Process Video: Place your video in the same directory or specify the path in the
video_path
variable. -
Run the Script:
python frametext_extractor.py
The extracted and corrected text will be saved to the output file specified in
output_text
.
if __name__ == "__main__":
video_path = "video.mp4"
api_key = "<DeepSeek API Key>"
output_text = "corrected_extracted_text.txt"
final_text = process_and_correct_text(video_path, api_key)
with open(output_text, 'w', encoding='utf-8') as f:
f.write(final_text)
logging.info(f"Processing complete. Corrected text saved to {output_text}.")
- Load Video: The video is loaded, and frames are processed at regular intervals (e.g., 1 frame per second).
- Resize Frames: Frames are resized to speed up processing.
- Motion Detection: The script checks if the current frame differs significantly from the previous frame to avoid unnecessary OCR operations.
- Text Extraction: If motion is detected, text is extracted using Tesseract OCR.
- Text Correction: Extracted text is processed and corrected using the DeepSeek API and a language model.
- Save Results: The corrected text is saved to a text file.
You can customize the following parameters to suit your needs:
-
Frame Interval: Process more or fewer frames by adjusting the interval between frames. This is done by setting the
frame_interval
parameter when calling theprocess_video_optimized
function:process_video_optimized(video_path, output_text, frame_interval=2)
This example processes one frame every two seconds (if
fps = 1
). -
Frame Size: Adjust the scaling of the frames to influence processing time. Use the
scale_factor
parameter to resize frames. For example:process_video_optimized(video_path, output_text, scale_factor=3)
-
Motion Threshold: Adjust the sensitivity of motion detection by changing the
motion_threshold
parameter. A higher threshold reduces sensitivity (i.e., fewer movements are detected), while a lower threshold increases sensitivity:process_video_optimized(video_path, output_text, motion_threshold=0.1)
You can combine all these options to fine-tune your video processing:
process_video_optimized(video_path, output_text, frame_interval=1, scale_factor=2, motion_threshold=0.05)
Contributions are welcome! Please submit a pull request or open an issue if you have improvements or find bugs.
This project is licensed under the MIT License – see the LICENSE file for details.