Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gunicorn 20.1.0 worker does not respect graceful timeout option #2839

Closed
wonjoonSeol-WS opened this issue Aug 3, 2022 · 6 comments
Closed
Assignees

Comments

@wonjoonSeol-WS
Copy link

wonjoonSeol-WS commented Aug 3, 2022

gunicorn 20.1.0

TERM: Graceful shutdown. Waits for workers to finish their current requests up to the graceful_timeout.

Example code: (download django-mvp.zip)

import time

from django.http import HttpResponse

def index(request):
    return HttpResponse("Hello, world. You're at the polls index.")

def test(request):
    is_pending = True
    print(1, is_pending)
    time.sleep(10)

    is_pending = False
    print(2, is_pending)
    return HttpResponse("completed")

i.e. it sleeps for 10 secs and print.

Test 1 Timeout < 10 graceful timeout > 10

Expected: Worker terminate gracefully.

How to run test:
GET test url:
curl -X 'GET' 'http://localhost:8000/polls/test/'

while running send TERM signal:

kill -s TERM {pid}

Result: Worker does not terminate gracefully.

gunicorn -w 1 --max-requests 2 --timeout 8 --graceful-timeout 14 mysite.wsgi
[2022-08-04 01:08:42 +0900] [86314] [INFO] Starting gunicorn 20.1.0
[2022-08-04 01:08:42 +0900] [86314] [INFO] Listening at: http://127.0.0.1:8000 (86314)
[2022-08-04 01:08:42 +0900] [86314] [INFO] Using worker: sync
[2022-08-04 01:08:42 +0900] [86315] [INFO] Booting worker with pid: 86315
1 True
[2022-08-04 01:08:56 +0900] [86314] [CRITICAL] WORKER TIMEOUT (pid:86315)
[2022-08-03 16:08:56 +0000] [86315] [INFO] Worker exiting (pid: 86315)
[2022-08-04 01:08:56 +0900] [86393] [INFO] Booting worker with pid: 86393

Test 2 timeout > 10 graceful timeout < 10

Expected: worker to be killed without completing request.

gunicorn -w 1 --max-requests 2 --timeout 14 --graceful-timeout 2 mysite.wsgi
[2022-08-04 01:20:58 +0900] [87093] [INFO] Starting gunicorn 20.1.0
[2022-08-04 01:20:58 +0900] [87093] [INFO] Listening at: http://127.0.0.1:8000 (87093)
[2022-08-04 01:20:58 +0900] [87093] [INFO] Using worker: sync
[2022-08-04 01:20:58 +0900] [87094] [INFO] Booting worker with pid: 87094
1 True
2 False
[2022-08-03 16:21:13 +0000] [87094] [INFO] Worker exiting (pid: 87094)
[2022-08-04 01:21:13 +0900] [87166] [INFO] Booting worker with pid: 87166
1 True
2 False
[2022-08-03 16:22:03 +0000] [87166] [INFO] Worker exiting (pid: 87166)
[2022-08-04 01:22:03 +0900] [87264] [INFO] Booting worker with pid: 87264

i.e. gunicorn worker only seems to respect timeout option but not graceful timeout option.
This is not consistent with the documented behaviour.

I notice however, that gunicorn log does not show [INFO] Handling signal: TERM in console?
It does for INT signal for some reason.

@benoitc benoitc self-assigned this Aug 6, 2022
@ankush
Copy link

ankush commented Nov 2, 2023

I was facing a similar problem (frappe/bench#1493) but it turned out to be related to supervisord. I can't reproduce this with vanilla gunicorn.

Is this still valid?

Here's what I did to reproduce:

  • timeout > 10 (120) graceful timeout = 10
def slow_function():
    import time

    for i in range(300):
        print("Sleeping", i)
        time.sleep(1)

This is what get as output. Since we aren't issuing request at the same exact time, the time is off by a second. but that's acceptable (?)

I curled a request in one terminal and sent SIGTERM kill -s TERM 24227 in another.

[2023-11-02 12:07:19 +0530] [24227] [INFO] Starting gunicorn 21.2.0
[2023-11-02 12:07:19 +0530] [24227] [INFO] Listening at: http://127.0.0.1:8000 (24227)
[2023-11-02 12:07:19 +0530] [24227] [INFO] Using worker: sync
[2023-11-02 12:07:19 +0530] [24228] [INFO] Booting worker with pid: 24228
[2023-11-02 12:07:19 +0530] [24229] [INFO] Booting worker with pid: 24229
[2023-11-02 12:07:19 +0530] [24230] [INFO] Booting worker with pid: 24230
[2023-11-02 12:07:19 +0530] [24231] [INFO] Booting worker with pid: 24231
Sleeping 0
Sleeping 1
[2023-11-02 12:07:32 +0530] [24227] [INFO] Handling signal: term
[2023-11-02 12:07:32 +0530] [24228] [INFO] Worker exiting (pid: 24228)
[2023-11-02 12:07:32 +0530] [24229] [INFO] Worker exiting (pid: 24229)
[2023-11-02 12:07:32 +0530] [24231] [INFO] Worker exiting (pid: 24231)
Sleeping 2
Sleeping 3
Sleeping 4
Sleeping 5
Sleeping 6
Sleeping 7
Sleeping 8
Sleeping 9
Sleeping 10
Sleeping 11
[2023-11-02 12:07:42 +0530] [24227] [INFO] Shutting down: Master

@katzlbt
Copy link

katzlbt commented Dec 23, 2023

UseCase: save work on timeout, idempotent handlers
sync-worker does not give time to save work on timeout, graceful option is ignored. SIGKILL almost immediately follows SIGTERM <1s
Documentation of the commandline options is not helpful.

I found this problem too. It behaves differently depending on worker. gthread ignores the timeout and gives infinite time (from the users perspective at least, but maybe both treads must be stuck to trigger).

def sdk_test_something(request):
    try:
        sleep(35) # cause worker timeout
    except SystemExit as e:
        logging.error(e)
        sleep(5) # save work done in the 30s until SystemExit
        return "SystemExit, run again"
        
    return "Hello World!"
sync:
gunicorn -b :$PORT --workers 2 --timeout 30 --graceful-timeout 40 main:app

[2023-12-23 21:52:10 +0100] [61603] [INFO] Starting gunicorn 21.2.0
[2023-12-23 21:52:10 +0100] [61603] [INFO] Listening at: http://0.0.0.0:61815 (61603)
[2023-12-23 21:52:10 +0100] [61603] [INFO] Using worker: sync
[2023-12-23 21:52:10 +0100] [61604] [INFO] Booting worker with pid: 61604
[2023-12-23 21:52:10 +0100] [61605] [INFO] Booting worker with pid: 61605
INFO     2023-12-23 20:52:11,283 instance.py:294] Instance PID: 61603
INFO     2023-12-23 20:52:11,297 module.py:862] default: "GET /_ah/warmup HTTP/1.1" 200 6
[2023-12-23 21:52:18 +0100] [61595] [INFO] Parent changed, shutting down: <Worker 61595>
[2023-12-23 21:52:18 +0100] [61595] [INFO] Worker exiting (pid: 61595)
[2023-12-23 21:52:18 +0100] [61594] [INFO] Parent changed, shutting down: <Worker 61594>
[2023-12-23 21:52:18 +0100] [61594] [INFO] Worker exiting (pid: 61594)
[2023-12-23 21:52:39 +0100] [61598] [CRITICAL] WORKER TIMEOUT (pid:61600)
ERROR:root:1
[2023-12-23 21:52:40 +0100] [61598] [ERROR] Worker (pid:61600) was sent SIGKILL! Perhaps out of memory?
[2023-12-23 21:52:40 +0100] [61609] [INFO] Booting worker with pid: 61609
INFO     2023-12-23 20:52:40,894 module.py:862] default: "GET /h/sdk/testsomething HTTP/1.1" 500 739
gthread
gunicorn -b :$PORT --workers 2 --threads 3 --timeout 30 main:app

[2023-12-23 21:55:09 +0100] [61627] [INFO] Starting gunicorn 21.2.0
[2023-12-23 21:55:09 +0100] [61627] [INFO] Listening at: http://0.0.0.0:61857 (61627)
[2023-12-23 21:55:09 +0100] [61627] [INFO] Using worker: gthread
[2023-12-23 21:55:09 +0100] [61628] [INFO] Booting worker with pid: 61628
[2023-12-23 21:55:09 +0100] [61629] [INFO] Booting worker with pid: 61629
INFO     2023-12-23 20:55:10,537 instance.py:294] Instance PID: 61627
INFO     2023-12-23 20:55:10,549 module.py:862] default: "GET /_ah/warmup HTTP/1.1" 200 6
INFO     2023-12-23 20:55:43,863 module.py:862] default: "GET /h/sdk/testsomething HTTP/1.1" 200 12
Hello World!

@benoitc
Copy link
Owner

benoitc commented Dec 25, 2023

well graceful timeout purpose is to wait soem long worker tp o be killed but doesn'tnot ensure it will wait this tile so send the signal. I will re-check for the coming version.

@tilgovi
Copy link
Collaborator

tilgovi commented Dec 27, 2023

I'd like to keep all graceful shutdown related issues to #1236, so I'm going to close this one. It's documented in that issue that the threaded worker might not behave exactly like the others with respect to graceful shutdown.

@katzlbt
Copy link

katzlbt commented Dec 28, 2023

@tilgovi Ack! My issue is not related to graceful shutdown, but to timeout behavior.

All this is about a misunderstanding of the documentation that the options --timeout and --graceful-timeout are related.

My misunderstanding was:
--timeout 30 ... send SIGTERM after 30s (catchable SysExit, can save progress of a task-queue handler) AND THEN ...
--graceful-timeout 40 ... onTimeout send SIGKILL after 40s (not catchable) meaning "make timeout graceful"
UseCase: save work on timeout by catching SysExit and having a predictable time until SIGKILL arrives

The actual meaning of the option is "set a different timeout for graceful shutdown". Suggest to improve the doc to avoid this misinterpretation or call the option --graceful-shutdown-timeout

@tilgovi
Copy link
Collaborator

tilgovi commented Dec 28, 2023

Thanks for following up. I'll close this for now, but I've been contemplating for a long time what might be done to make timeout behavior clearer and I'll take your idea under consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants