
# Torch Batcher

Serve batched requests using Redis. Throughput scales linearly with the number of workers per device and with the number of devices.
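The core pattern can be sketched as follows. This is a minimal illustration only, with an in-memory deque standing in for the Redis queue; the function names, the `MAX_BATCH` limit, and the dummy model are all assumptions for the sketch, not the project's actual API.

```python
from collections import deque

MAX_BATCH = 4           # illustrative batch-size cap
queue = deque()         # stands in for the Redis list shared by workers

def enqueue(request_id, payload):
    """Client side: push a request onto the shared queue."""
    queue.append((request_id, payload))

def worker_step(model):
    """Worker side: drain up to MAX_BATCH requests and run them as one batch."""
    batch = []
    while queue and len(batch) < MAX_BATCH:
        batch.append(queue.popleft())
    if not batch:
        return {}
    ids, inputs = zip(*batch)
    outputs = model(list(inputs))  # a single forward pass for the whole batch
    return dict(zip(ids, outputs))

# Example with a dummy "model" that doubles each input.
for i in range(6):
    enqueue(i, i * 10)

results = worker_step(lambda xs: [x * 2 for x in xs])
print(results)  # first 4 requests served in one batch: {0: 0, 1: 20, 2: 40, 3: 60}
```

In the real system each worker process runs this loop against Redis, so adding workers (per device, or across devices) adds batch-serving capacity.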

## Dependencies

## Usage

- For linear scaling, start `nvidia-cuda-mps-control` (see Section 2.1.1, "GPU utilization", for details):

  ```bash
  nvidia-cuda-mps-control -d   # start the MPS daemon

  # To exit MPS after stopping the server:
  nvidia-cuda-mps-control      # enters the MPS command prompt
  quit                         # type at the prompt to quit
  ```
- Start Redis (persistence disabled, since the queue is transient):

  ```bash
  redis-server --save "" --appendonly no
  ```
- Start batch serving:

  ```bash
  supervisord -c supervisor.conf   # starts 3 workers on a single GPU
  ```
- Run the batched benchmark:

  ```bash
  python3 bench_batched.py
  ```
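For reference, a `supervisor.conf` that starts 3 workers on one GPU would look roughly like the sketch below. The repository ships the real file; the `worker.py` entry point and every value here are assumptions for illustration.

```ini
; Hypothetical supervisor.conf sketch -- the repository's real file may differ.
[supervisord]
nodaemon=true

[program:worker]
command=python3 worker.py                          ; assumed worker entry point
numprocs=3                                         ; 3 workers sharing one GPU via MPS
process_name=%(program_name)s_%(process_num)02d    ; required when numprocs > 1
environment=CUDA_VISIBLE_DEVICES="0"
autorestart=true
```

With MPS running, the three worker processes can share the GPU concurrently instead of serializing their kernels.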