psifx - Psychological and Social Interactions Feature Extraction


Content

  1. Installation
    1. Docker
    2. Local
  2. Usage
    1. Concept
    2. Audio
      1. Manipulation
        1. Extraction
        2. Conversion
        3. Mixdown
        4. Normalization
      2. Diarization
        1. Inference
        2. Visualization
      3. Identification
        1. Inference
      4. Transcription
        1. Inference
        2. Enhance
      5. Non-verbal Feature Extraction
        1. Inference
    3. Video
      1. Manipulation
        1. Process
      2. Pose Estimation
        1. Inference
        2. Visualization
      3. Face Analysis
        1. Inference
        2. Visualization
  3. Examples

Installation

We recommend using Docker to reduce compatibility issues.

Docker

  1. Install Docker Engine and make sure to follow the post-install instructions. Alternatively, install Docker Desktop.
  2. If you have a GPU and want to use it to accelerate compute:
    1. Install NVIDIA CUDA Toolkit.
    2. Install NVIDIA Container Toolkit.
  3. Run the latest image:
    export PSIFX_VERSION="0.0.2"
    export DATA_PATH="/path/to/data" 
    
    docker run \
       --user $(id -u):$(id -g) \
       --gpus all \
       --mount type=bind,source=$DATA_PATH,target=$DATA_PATH \
       --interactive \
       --tty \
       guillaumerochette/psifx:$PSIFX_VERSION
  4. Check out the available psifx commands:
    psifx --all-help

Local

  1. Install the following system-wide:
    sudo apt install ffmpeg ubuntu-restricted-extras
  2. Create a dedicated conda environment, running the commands in this order:
    conda create -y -n psifx-env python=3.9 pip
    conda activate psifx-env
  3. Now install psifx:
    pip install 'git+https://github.com/GuillaumeRochette/psifx.git'
  4. We provide an API endpoint to use OpenFace, usable only if you comply with their license agreement, e.g. for academic, research or non-commercial purposes.
    1. Install the following system-wide:
      sudo apt install \
      build-essential \
      cmake \
      wget \
      libopenblas-dev \
      libopencv-dev \
      libdlib-dev \
      libboost-all-dev \
      libsqlite3-dev
    2. Install OpenFace using our fork.
      wget https://raw.githubusercontent.com/GuillaumeRochette/OpenFace/master/install.py && \
      python install.py

Usage

Concept

psifx is a Python package that can be used as a library:

from psifx.audio.diarization.pyannote.tool import PyannoteDiarizationTool

# Parameterize a tool with specific settings, such as the underlying neural network.
tool = PyannoteDiarizationTool(...)
# Run the inference method on some data, here an audio track for example.
tool.inference(...)

It also comes with its own CLI, which can be run directly in a terminal:

psifx audio diarization pyannote inference --audio /path/to/audio.wav --diarization /path/to/diarization.rttm
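
The CLI can also be driven from a Python script. The sketch below rebuilds the command shown above as an argument list and runs it only when a `psifx` executable is found on the PATH. The helpers `build_command` and `run_if_available` are hypothetical names introduced for illustration and are not part of psifx; only the subcommands and flags already shown above are used.

```python
import shutil
import subprocess


def build_command(*subcommands, **options):
    """Build an argv list such as ['psifx', 'audio', ..., '--audio', 'x.wav'].

    Hypothetical helper, not part of psifx: each keyword argument becomes
    a `--name value` pair appended after the subcommands.
    """
    command = ["psifx", *subcommands]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


def run_if_available(command):
    """Run the command if its executable is installed, otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=True)


command = build_command(
    "audio", "diarization", "pyannote", "inference",
    audio="/path/to/audio.wav",
    diarization="/path/to/diarization.rttm",
)
print(" ".join(command))
```

Passing the list to `run_if_available(command)` would then execute it. Keeping the arguments as a list avoids shell quoting issues with paths that contain spaces.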

Audio

Manipulation

Extraction
psifx audio manipulation extraction [-h] --video VIDEO --audio AUDIO
                                           [--overwrite | --no-overwrite]
                                           [--verbose | --no-verbose]

    Tool for extracting the audio track from a video.

optional arguments:
  -h, --help            show this help message and exit
  --video VIDEO         path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
  --audio AUDIO         path to the output audio file, such as `/path/to/audio.wav`
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Conversion
psifx audio manipulation conversion [-h] --audio AUDIO --mono_audio
                                           MONO_AUDIO
                                           [--overwrite | --no-overwrite]
                                           [--verbose | --no-verbose]

    Tool for converting any audio track to a mono audio track at 16kHz sample rate.

optional arguments:
  -h, --help            show this help message and exit
  --audio AUDIO         path to the input audio file, such as `/path/to/audio.wav` (or .mp3, etc.)
  --mono_audio MONO_AUDIO
                        path to the output audio file, such as `/path/to/mono-audio.wav`
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Mixdown
psifx audio manipulation mixdown [-h] --mono_audios MONO_AUDIOS
                                        [MONO_AUDIOS ...] --mixed_audio
                                        MIXED_AUDIO
                                        [--overwrite | --no-overwrite]
                                        [--verbose | --no-verbose]

    Tool for mixing multiple mono audio tracks.

optional arguments:
  -h, --help            show this help message and exit
  --mono_audios MONO_AUDIOS [MONO_AUDIOS ...]
                        paths to the input mono audio files, such as `/path/to/mono-audio-1.wav /path/to/mono-audio-2.wav`
  --mixed_audio MIXED_AUDIO
                        path to the output mixed audio file, such as `/path/to/mixed-audio.wav`
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Normalization
psifx audio manipulation normalization [-h] --audio AUDIO
                                              --normalized_audio
                                              NORMALIZED_AUDIO
                                              [--overwrite | --no-overwrite]
                                              [--verbose | --no-verbose]

    Tool for normalizing an audio track.

optional arguments:
  -h, --help            show this help message and exit
  --audio AUDIO         path to the input audio file, such as `/path/to/audio.wav`
  --normalized_audio NORMALIZED_AUDIO
                        path to the output normalized audio file, such as `/path/to/normalized-audio.wav`
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Diarization

Inference
psifx audio diarization pyannote inference [-h] --audio AUDIO
                                                  --diarization DIARIZATION
                                                  [--num_speakers NUM_SPEAKERS]
                                                  [--model_name MODEL_NAME]
                                                  [--api_token API_TOKEN]
                                                  [--device DEVICE]
                                                  [--overwrite | --no-overwrite]
                                                  [--verbose | --no-verbose]

    Tool for diarizing an audio track with pyannote.

optional arguments:
  -h, --help            show this help message and exit
  --audio AUDIO         path to the input audio file, such as `/path/to/audio.wav`
  --diarization DIARIZATION
                        path to the output diarization file, such as `/path/to/diarization.rttm`
  --num_speakers NUM_SPEAKERS
                        number of speaking participants, if ignored the model will try to guess it, it is advised to specify it
  --model_name MODEL_NAME
                        version number of the pyannote/speaker-diarization model, c.f. https://huggingface.co/pyannote/speaker-diarization/tree/main/reproducible_research
  --api_token API_TOKEN
                        API token for the downloading the models from HuggingFace
  --device DEVICE       device on which to run the inference, either 'cpu' or 'cuda'
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Visualization
psifx audio diarization pyannote visualization [-h] --diarization
                                                      DIARIZATION
                                                      --visualization
                                                      VISUALIZATION
                                                      [--overwrite | --no-overwrite]
                                                      [--verbose | --no-verbose]

    Tool for visualizing the diarization of a track.

optional arguments:
  -h, --help            show this help message and exit
  --diarization DIARIZATION
                        path to the input diarization file, such as `/path/to/diarization.rttm`
  --visualization VISUALIZATION
                        path to the output visualization file, such as `/path/to/visualization.png`
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)
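
The diarization output uses the `.rttm` extension. Assuming the file follows the usual RTTM layout (whitespace-separated fields, with the onset in the fourth field, the duration in the fifth and the speaker label in the eighth), a minimal sketch for reading it back and summing the speaking time per speaker could look like this. The function names and the sample lines are made up for illustration; they are not produced by psifx.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Return (speaker, start, end) tuples from RTTM `SPEAKER` lines."""
    turns = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-speaker lines
        start, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
        turns.append((speaker, start, start + duration))
    return turns


def speaking_time(turns):
    """Sum the duration of the turns of each speaker, in seconds."""
    totals = defaultdict(float)
    for speaker, start, end in turns:
        totals[speaker] += end - start
    return dict(totals)


# Made-up sample, for illustration only.
sample = [
    "SPEAKER Mixed 1 0.50 2.00 <NA> <NA> SPEAKER_00 <NA> <NA>",
    "SPEAKER Mixed 1 3.00 1.50 <NA> <NA> SPEAKER_01 <NA> <NA>",
    "SPEAKER Mixed 1 5.00 1.00 <NA> <NA> SPEAKER_00 <NA> <NA>",
]
turns = parse_rttm(sample)
totals = speaking_time(turns)
print(totals)
```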

Identification

Inference
psifx audio identification pyannote inference [-h] --mixed_audio
                                                     MIXED_AUDIO --diarization
                                                     DIARIZATION --mono_audios
                                                     MONO_AUDIOS
                                                     [MONO_AUDIOS ...]
                                                     --identification
                                                     IDENTIFICATION
                                                     [--model_names MODEL_NAMES [MODEL_NAMES ...]]
                                                     [--api_token API_TOKEN]
                                                     [--device DEVICE]
                                                     [--overwrite | --no-overwrite]
                                                     [--verbose | --no-verbose]

    Tool for identifying speakers from an audio track with pyannote.

optional arguments:
  -h, --help            show this help message and exit
  --mixed_audio MIXED_AUDIO
                        path to the input mixed audio file, such as `/path/to/mixed-audio.wav`
  --diarization DIARIZATION
                        path to the input diarization file, such as `/path/to/diarization.rttm`
  --mono_audios MONO_AUDIOS [MONO_AUDIOS ...]
                        paths to the input mono audio files, such as `/path/to/mono-audio-1.wav /path/to/mono-audio-2.wav`
  --identification IDENTIFICATION
                        path to the output identification file, such as `/path/to/identification.json`
  --model_names MODEL_NAMES [MODEL_NAMES ...]
                        names of the embedding models
  --api_token API_TOKEN
                        API token for the downloading the models from HuggingFace
  --device DEVICE       device on which to run the inference, either 'cpu' or 'cuda'
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Transcription

Inference
psifx audio transcription whisper inference [-h] --audio AUDIO
                                                   --transcription
                                                   TRANSCRIPTION
                                                   [--language LANGUAGE]
                                                   [--model_name MODEL_NAME]
                                                   [--translate_to_english | --no-translate_to_english]
                                                   [--device DEVICE]
                                                   [--overwrite | --no-overwrite]
                                                   [--verbose | --no-verbose]

    Tool for transcribing an audio track with Whisper.

optional arguments:
  -h, --help            show this help message and exit
  --audio AUDIO         path to the input audio file, such as `/path/to/audio.wav`
  --transcription TRANSCRIPTION
                        path to the output transcription file, such as `/path/to/transcription.vtt`
  --language LANGUAGE   language of the audio, if ignore, the model will try to guess it, it is advised to specify it
  --model_name MODEL_NAME
                        name of the model, check https://github.com/openai/whisper#available-models-and-languages
  --translate_to_english, --no-translate_to_english
                        whether to transcribe the audio in its original language or to translate it to english (default: False)
  --device DEVICE       device on which to run the inference, either 'cpu' or 'cuda'
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Enhance
psifx audio transcription whisper enhance [-h] --transcription
                                                 TRANSCRIPTION --diarization
                                                 DIARIZATION --identification
                                                 IDENTIFICATION
                                                 --enhanced_transcription
                                                 ENHANCED_TRANSCRIPTION
                                                 [--overwrite | --no-overwrite]
                                                 [--verbose | --no-verbose]

    Tool for enhancing a transcription with diarization and identification.

optional arguments:
  -h, --help            show this help message and exit
  --transcription TRANSCRIPTION
                        path to the input transcription file, such as `/path/to/transcription.vtt`
  --diarization DIARIZATION
                        path to the input diarization file, such as `/path/to/diarization.rttm`
  --identification IDENTIFICATION
                        path to the input identification file, such as `/path/to/identification.json`
  --enhanced_transcription ENHANCED_TRANSCRIPTION
                        path to the output transcription file, such as `/path/to/enhanced-transcription.vtt`
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)
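
The transcription files use the `.vtt` extension, i.e. WebVTT, where each cue starts with a `start --> end` timing line whose timestamps are written as `mm:ss.ttt` or `hh:mm:ss.ttt`. The sketch below reads cue timings and text from such a document. It is an illustration under that assumption only: the sample text is made up, the function names are hypothetical, and the way the enhanced transcription encodes speaker information is not covered here.

```python
import re

# A WebVTT timestamp is `mm:ss.ttt` or `hh:mm:ss.ttt`.
TIMESTAMP = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
CUE_TIMING = re.compile(rf"^{TIMESTAMP}\s+-->\s+{TIMESTAMP}")


def to_milliseconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp parts to integer milliseconds."""
    total_seconds = (int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_vtt(text):
    """Return (start_ms, end_ms, text) for each cue of a WebVTT document."""
    cues = []
    lines = text.splitlines()
    for index, line in enumerate(lines):
        match = CUE_TIMING.match(line.strip())
        if match is None:
            continue
        parts = match.groups()
        payload = []
        for following in lines[index + 1:]:
            if not following.strip():
                break  # a blank line ends the cue
            payload.append(following.strip())
        start, end = to_milliseconds(*parts[:4]), to_milliseconds(*parts[4:])
        cues.append((start, end, " ".join(payload)))
    return cues


# Made-up sample, for illustration only.
sample = """WEBVTT

00:00.000 --> 00:02.500
Hello.

01:00:01.000 --> 01:00:03.250
How are you?
"""
cues = parse_vtt(sample)
print(cues)
```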

Non-verbal Feature Extraction

Inference
psifx audio speech opensmile inference [-h] --audio AUDIO --diarization
                                              DIARIZATION --features FEATURES
                                              [--feature_set FEATURE_SET]
                                              [--feature_level FEATURE_LEVEL]
                                              [--overwrite | --no-overwrite]
                                              [--verbose | --no-verbose]

    Tool for extracting non-verbal speech features from an audio track with OpenSmile.

optional arguments:
  -h, --help            show this help message and exit
  --audio AUDIO         path to the input audio file, such as `/path/to/audio.wav`
  --diarization DIARIZATION
                        path to the input diarization file, such as `/path/to/diarization.rttm`
  --features FEATURES   path to the output feature archive, such as `/path/to/opensmile.tar.gz`
  --feature_set FEATURE_SET
                        available sets: ['ComParE_2016', 'GeMAPSv01a', 'GeMAPSv01b', 'eGeMAPSv01a', 'eGeMAPSv01b', 'eGeMAPSv02', 'emobase']
  --feature_level FEATURE_LEVEL
                        available levels: ['lld', 'lld_de', 'func']
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Video

Manipulation

Process
psifx video manipulation process [-h] --in_video IN_VIDEO --out_video
                                        OUT_VIDEO [--start START] [--end END]
                                        [--x_min X_MIN] [--y_min Y_MIN]
                                        [--x_max X_MAX] [--y_max Y_MAX]
                                        [--width WIDTH] [--height HEIGHT]
                                        [--overwrite | --no-overwrite]
                                        [--verbose | --no-verbose]

    Tool for processing videos.
    The trimming, cropping and resizing can be performed all at once, and in that order.

optional arguments:
  -h, --help            show this help message and exit
  --in_video IN_VIDEO   path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
  --out_video OUT_VIDEO
                        path to the output video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
  --start START         trim: timestamp in seconds of the start of the selection
  --end END             trim: timestamp in seconds of the end of the selection
  --x_min X_MIN         crop: x-axis coordinate of the top-left corner in pixels
  --y_min Y_MIN         crop: y-axis coordinate of the top-left corner in pixels
  --x_max X_MAX         crop: x-axis coordinate of the bottom-right corner in pixels
  --y_max Y_MAX         crop: y-axis coordinate of the bottom-right corner in pixels
  --width WIDTH         resize: width of the resized output
  --height HEIGHT       resize: height of the resized output
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)
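
Because trimming, cropping and resizing are applied in that order, the expected duration and frame size of the output can be checked before launching a long job. The sketch below does this with the values of the `Left.mp4` command from the Examples section. It assumes the crop box is `x_max - x_min` pixels wide and `y_max - y_min` pixels tall; how psifx treats the boundary pixels, or a resize where only one of `--width` and `--height` is given, is not stated here. `describe_process` is a hypothetical helper, not part of psifx.

```python
def describe_process(start, end, x_min, y_min, x_max, y_max, width=None, height=None):
    """Return the expected duration (s) and frame size (px) after trim, crop, resize.

    Assumes the crop box is x_max - x_min wide and y_max - y_min tall.
    """
    if end <= start:
        raise ValueError("--end must be greater than --start")
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("the bottom-right corner must lie beyond the top-left corner")
    duration = end - start                 # 1. trim
    size = (x_max - x_min, y_max - y_min)  # 2. crop
    if width is not None and height is not None:
        size = (width, height)             # 3. resize replaces the cropped size
    return {"duration": duration, "size": size}


# Values taken from the Left.mp4 command in the Examples section.
result = describe_process(start=18, end=210, x_min=1347, y_min=459, x_max=2553, y_max=1898)
print(result)
```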

Pose Estimation

Inference
psifx video pose mediapipe inference [-h] --video VIDEO --poses POSES
                                            [--masks MASKS]
                                            [--mask_threshold MASK_THRESHOLD]
                                            [--model_complexity MODEL_COMPLEXITY]
                                            [--smooth | --no-smooth]
                                            [--device DEVICE]
                                            [--overwrite | --no-overwrite]
                                            [--verbose | --no-verbose]

    Tool for inferring human pose with MediaPipe Holistic.

optional arguments:
  -h, --help            show this help message and exit
  --video VIDEO         path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
  --poses POSES         path to the output pose archive, such as `/path/to/poses.tar.gz`
  --masks MASKS         path to the output segmentation mask video file, such as `/path/to/masks.mp4` (or .avi, .mkv, etc.)
  --mask_threshold MASK_THRESHOLD
                        threshold for the binarization of the segmentation mask
  --model_complexity MODEL_COMPLEXITY
                        complexity of the model: {0, 1, 2}, higher means more FLOPs, but also more accurate results
  --smooth, --no-smooth
                        temporally smooth the inference results to reduce the jitter (default: True)
  --device DEVICE       device on which to run the inference, either 'cpu' or 'cuda'
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Visualization
psifx video pose mediapipe visualization [-h] --video VIDEO --poses
                                                POSES --visualization
                                                VISUALIZATION
                                                [--confidence_threshold CONFIDENCE_THRESHOLD]
                                                [--overwrite | --no-overwrite]
                                                [--verbose | --no-verbose]

    Tool for visualizing the poses over the video.

optional arguments:
  -h, --help            show this help message and exit
  --video VIDEO         path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
  --poses POSES         path to the input pose archive, such as `/path/to/poses.tar.gz`
  --visualization VISUALIZATION
                        path to the output visualization video file, such as `/path/to/visualization.mp4` (or .avi, .mkv, etc.)
  --confidence_threshold CONFIDENCE_THRESHOLD
                        threshold for not displaying low confidence keypoints
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Face Analysis

Inference
psifx video face openface inference [-h] --video VIDEO --features
                                           FEATURES
                                           [--overwrite | --no-overwrite]
                                           [--verbose | --no-verbose]

    Tool for inferring face features from videos with OpenFace.

optional arguments:
  -h, --help            show this help message and exit
  --video VIDEO         path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
  --features FEATURES   path to the output feature archive, such as `/path/to/openface.tar.gz`
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)

Visualization
psifx video face openface visualization [-h] --video VIDEO --features
                                               FEATURES --visualization
                                               VISUALIZATION [--depth DEPTH]
                                               [--f_x F_X] [--f_y F_Y]
                                               [--c_x C_X] [--c_y C_Y]
                                               [--overwrite | --no-overwrite]
                                               [--verbose | --no-verbose]

    Tool for visualizing face features from videos with OpenFace.

optional arguments:
  -h, --help            show this help message and exit
  --video VIDEO         path to the input video file, such as `/path/to/video.mp4` (or .avi, .mkv, etc.)
  --features FEATURES   path to the input feature archive, such as `/path/to/openface.tar.gz`
  --visualization VISUALIZATION
                        path to the output video file, such as `/path/to/visualization.mp4` (or .avi, .mkv, etc.)
  --depth DEPTH         projection: assumed static depth of the subject in meters
  --f_x F_X             projection: x-axis of the focal length
  --f_y F_Y             projection: y-axis of the focal length
  --c_x C_X             projection: x-axis of the principal point
  --c_y C_Y             projection: y-axis of the principal point
  --overwrite, --no-overwrite
                        overwrite existing files, otherwise raises an error (default: False)
  --verbose, --no-verbose
                        verbosity of the script (default: True)
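
The `--f_x`, `--f_y`, `--c_x` and `--c_y` options are the focal length and principal point of a pinhole camera model, and `--depth` is the assumed distance of the subject. As a reminder of what these parameters mean, the sketch below applies the standard pinhole projection, u = f_x * X / Z + c_x and v = f_y * Y / Z + c_y. It is not taken from the psifx source, and the numeric values are hypothetical, not psifx defaults.

```python
def project(point, f_x, f_y, c_x, c_y):
    """Project a 3D camera-space point (X, Y, Z), Z > 0, to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    return (f_x * x / z + c_x, f_y * y / z + c_y)


# Hypothetical intrinsics for a 1920x1080 video; not psifx defaults.
intrinsics = dict(f_x=1600.0, f_y=1600.0, c_x=960.0, c_y=540.0)

# A point on the optical axis, 2 m away, lands on the principal point.
center = project((0.0, 0.0, 2.0), **intrinsics)
# A point 0.5 m to the right and 0.25 m up (negative Y) at the same depth.
offset = project((0.5, -0.25, 2.0), **intrinsics)
print(center, offset)
```

A larger assumed depth shrinks the projected offsets from the principal point, while larger focal lengths magnify them.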

Examples

psifx video manipulation process --in_video Videos/Left.mp4 --out_video Videos/Left.processed.mp4 --start 18 --end 210 --x_min 1347 --y_min 459 --x_max 2553 --y_max 1898 --overwrite
psifx video manipulation process --in_video Videos/Right.mp4 --out_video Videos/Right.processed.mp4 --start 18 --end 210 --x_min 1358 --y_min 435 --x_max 2690 --y_max 2049 --overwrite

Audio

Pre-processing

psifx audio manipulation extraction --video Videos/Left.mp4 --audio Audios/Left.wav
psifx audio manipulation extraction --video Videos/Right.mp4 --audio Audios/Right.wav

psifx audio manipulation mixdown --mono_audios Audios/Right.wav Audios/Left.wav --mixed_audio Audios/Mixed.wav

psifx audio manipulation normalization --audio Audios/Right.wav --normalized_audio Audios/Right.normalized.wav
psifx audio manipulation normalization --audio Audios/Left.wav --normalized_audio Audios/Left.normalized.wav
psifx audio manipulation normalization --audio Audios/Mixed.wav --normalized_audio Audios/Mixed.normalized.wav

Inference

psifx audio diarization pyannote inference --audio Audios/Mixed.normalized.wav --diarization Diarizations/Mixed.rttm --num_speakers 2 --device cuda

psifx audio identification pyannote inference --mixed_audio Audios/Mixed.normalized.wav --diarization Diarizations/Mixed.rttm --mono_audios Audios/Left.normalized.wav Audios/Right.normalized.wav --identification Identifications/Mixed.json --device cuda

psifx audio transcription whisper inference --audio Audios/Mixed.normalized.wav --transcription Transcriptions/Mixed.vtt --model_name large --language fr --device cuda

psifx audio transcription whisper enhance --transcription Transcriptions/Mixed.vtt --diarization Diarizations/Mixed.rttm --identification Identifications/Mixed.json --enhanced_transcription Transcriptions/Mixed.enhanced.vtt

Visualization

psifx audio diarization pyannote visualization --diarization Diarizations/Mixed.rttm --visualization Visualizations/Mixed.png

Video

Inference

psifx video pose mediapipe inference --video Videos/Right.mp4 --poses Poses/Right.tar.xz --masks Masks/Right.mp4
psifx video pose mediapipe inference --video Videos/Left.mp4 --poses Poses/Left.tar.xz --masks Masks/Left.mp4

psifx video face openface inference --video Videos/Right.mp4 --features Faces/Right.tar.xz
psifx video face openface inference --video Videos/Left.mp4 --features Faces/Left.tar.xz

Visualization

psifx video pose mediapipe visualization --video Videos/Right.mp4 --poses Poses/Right.tar.xz --visualization Visualizations/Right.mediapipe.mp4
psifx video pose mediapipe visualization --video Videos/Left.mp4 --poses Poses/Left.tar.xz --visualization Visualizations/Left.mediapipe.mp4

psifx video face openface visualization --video Videos/Right.mp4 --features Faces/Right.tar.xz --visualization Visualizations/Right.openface.mp4
psifx video face openface visualization --video Videos/Left.mp4 --features Faces/Left.tar.xz --visualization Visualizations/Left.openface.mp4

Development

Build & Push

export PSIFX_VERSION="0.0.2"
export HF_TOKEN="write-your-hf-token-here"

docker buildx build \
   --build-arg PSIFX_VERSION=$PSIFX_VERSION \
   --build-arg HF_TOKEN=$HF_TOKEN \
   --tag "guillaumerochette/psifx:$PSIFX_VERSION" \
   --push .