Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpfland: "failed to run for" on heavy loaded system #1212

Open
kmikita opened this issue Jan 17, 2025 · 8 comments
Open

bpfland: "failed to run for" on heavy loaded system #1212

kmikita opened this issue Jan 17, 2025 · 8 comments
Labels
bug Something isn't working scx_bpfland

Comments

@kmikita
Copy link

kmikita commented Jan 17, 2025

On heavy loaded system (especially during compilation) bpfland restarts ~ every hour with dmesg log:

[31456.565759] sched_ext: BPF scheduler "bpfland" disabled (runnable task stall)
[31456.565772] sched_ext: bpfland: ld.lld[8779] failed to run for 5.344s
[31456.565774]    process_scheduled_works+0x1f8/0x440
[31456.565781]    worker_thread+0x35f/0x420
[31456.565784]    kthread+0x2f3/0x330
[31456.565786]    ret_from_fork+0x3e/0x50
[31456.565789]    ret_from_fork_asm+0x1a/0x30
[31458.471197] sched_ext: BPF scheduler "bpfland" enabled
[35951.212757] sched_ext: BPF scheduler "bpfland" disabled (runnable task stall)
[35951.212770] sched_ext: bpfland: clang++[13478] failed to run for 6.673s
[35951.212772]    process_scheduled_works+0x1f8/0x440
[35951.212779]    worker_thread+0x35f/0x420
[35951.212781]    kthread+0x2f3/0x330
[35951.212784]    ret_from_fork+0x3e/0x50
[35951.212786]    ret_from_fork_asm+0x1a/0x30
[35952.729087] sched_ext: BPF scheduler "bpfland" enabled

linux 6.12.8, scx 1.0.8

@arighi
Copy link
Contributor

arighi commented Jan 17, 2025

Can you post the trace printed by the scheduler (scx_bpfland stdout)?

@kmikita
Copy link
Author

kmikita commented Jan 17, 2025

journalctl -o short-iso -u scx.service
2025-01-17T12:25:01+0300 dx systemd[1]: Started Start scx_scheduler.
2025-01-17T12:25:01+0300 dx bash[1022]: 09:25:01 [INFO] scx_bpfland 1.0.9-g5b10a44e x86_64-unknown-linux-gnu SMT off
2025-01-17T12:25:02+0300 dx bash[1022]: 09:25:02 [INFO] primary CPU domain = 0x3
2025-01-17T12:25:02+0300 dx bash[1022]: 09:25:02 [INFO] cpufreq performance level: max
2025-01-17T12:25:02+0300 dx bash[1022]: 09:25:02 [INFO] L2 cache ID 0: sibling CPUs: [0]
2025-01-17T12:25:02+0300 dx bash[1022]: 09:25:02 [INFO] L2 cache ID 1: sibling CPUs: [1]
2025-01-17T12:25:02+0300 dx bash[1022]: 09:25:02 [INFO] L3 cache ID 0: sibling CPUs: [0]
2025-01-17T12:25:02+0300 dx bash[1022]: 09:25:02 [INFO] L3 cache ID 1: sibling CPUs: [1]
2025-01-17T21:09:11+0300 dx bash[1022]: DEBUG DUMP
2025-01-17T21:09:11+0300 dx bash[1022]: ================================================================================
2025-01-17T21:09:11+0300 dx bash[1022]: kworker/u32:3[8272] triggered exit kind 1026:
2025-01-17T21:09:11+0300 dx bash[1022]:   runnable task stall (ld.lld[8779] failed to run for 5.344s)
2025-01-17T21:09:11+0300 dx bash[1022]: Backtrace:
2025-01-17T21:09:11+0300 dx bash[1022]:   process_scheduled_works+0x1f8/0x440
2025-01-17T21:09:11+0300 dx bash[1022]:   worker_thread+0x35f/0x420
2025-01-17T21:09:11+0300 dx bash[1022]:   kthread+0x2f3/0x330
2025-01-17T21:09:11+0300 dx bash[1022]:   ret_from_fork+0x3e/0x50
2025-01-17T21:09:11+0300 dx bash[1022]:   ret_from_fork_asm+0x1a/0x30
2025-01-17T21:09:11+0300 dx bash[1022]: CPU states
2025-01-17T21:09:11+0300 dx bash[1022]: ----------
2025-01-17T21:09:11+0300 dx bash[1022]: CPU 0   : nr_run=11 flags=0x1 cpu_rel=0 ops_qseq=30809884 pnt_seq=7964631
2025-01-17T21:09:11+0300 dx bash[1022]:           curr=kworker/u32:3[8272] class=0xffffffff980d2f20
2025-01-17T21:09:11+0300 dx bash[1022]:  *R kworker/u32:3[8272] +0ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=366486986423024974
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     scx_dump_state+0x654/0x830
2025-01-17T21:09:11+0300 dx bash[1022]:     scx_ops_error_irq_workfn+0x28/0x90
2025-01-17T21:09:11+0300 dx bash[1022]:     irq_work_run+0x7d/0x1e0
2025-01-17T21:09:11+0300 dx bash[1022]:     __sysvec_irq_work+0x1c/0xc0
2025-01-17T21:09:11+0300 dx bash[1022]:     sysvec_irq_work+0x6e/0x80
2025-01-17T21:09:11+0300 dx bash[1022]:     asm_sysvec_irq_work+0x1b/0x20
2025-01-17T21:09:11+0300 dx bash[1022]:     scx_watchdog_workfn+0x155/0x240
2025-01-17T21:09:11+0300 dx bash[1022]:     process_scheduled_works+0x1f8/0x440
2025-01-17T21:09:11+0300 dx bash[1022]:     worker_thread+0x35f/0x420
2025-01-17T21:09:11+0300 dx bash[1022]:     kthread+0x2f3/0x330
2025-01-17T21:09:11+0300 dx bash[1022]:     ret_from_fork+0x3e/0x50
2025-01-17T21:09:11+0300 dx bash[1022]:     ret_from_fork_asm+0x1a/0x30
2025-01-17T21:09:11+0300 dx bash[1022]:   R ld.lld[8779] -5344ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580922238514107
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     irqentry_exit+0x12/0x50
2025-01-17T21:09:11+0300 dx bash[1022]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T21:09:11+0300 dx bash[1022]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T21:09:11+0300 dx bash[1022]:   R scx_bpfland[1022] -137ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694209164419
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x1e7d/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R amule[2548] -119ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694140861170
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     __se_sys_poll+0x108/0x1a0
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x37e/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R ThreadPoolForeg[8986] -107ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694249029891
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     futex_wait+0xc7/0x180
2025-01-17T21:09:11+0300 dx bash[1022]:     do_futex+0x177/0x1f0
2025-01-17T21:09:11+0300 dx bash[1022]:     __se_sys_futex+0x194/0x230
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_futex+0x29/0x30
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x1c1a/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R amule[2590] -51ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694140768029
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     futex_wait+0xc7/0x180
2025-01-17T21:09:11+0300 dx bash[1022]:     do_futex+0x177/0x1f0
2025-01-17T21:09:11+0300 dx bash[1022]:     __se_sys_futex+0x194/0x230
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_futex+0x29/0x30
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x1c1a/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R ThreadPoolForeg[8989] -36ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694252822265
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     futex_wait+0xc7/0x180
2025-01-17T21:09:11+0300 dx bash[1022]:     do_futex+0x177/0x1f0
2025-01-17T21:09:11+0300 dx bash[1022]:     __se_sys_futex+0x194/0x230
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_futex+0x29/0x30
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x1c1a/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R ThreadPoolForeg[8990] -36ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694249158291
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     futex_wait+0xc7/0x180
2025-01-17T21:09:11+0300 dx bash[1022]:     do_futex+0x177/0x1f0
2025-01-17T21:09:11+0300 dx bash[1022]:     __se_sys_futex+0x194/0x230
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_futex+0x29/0x30
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x1c1a/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R gdbus[2405] -6ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694128947929
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     __se_sys_poll+0x108/0x1a0
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x37e/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R ThreadPoolForeg[8914] -4ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694126479657
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     irqentry_exit+0x46/0x50
2025-01-17T21:09:11+0300 dx bash[1022]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T21:09:11+0300 dx systemd[1]: scx.service: Main process exited, code=exited, status=1/FAILURE
2025-01-17T21:09:11+0300 dx bash[1022]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T21:09:11+0300 dx bash[1022]:     do_user_addr_fault+0x393/0x900
2025-01-17T21:09:11+0300 dx bash[1022]:     exc_page_fault+0x59/0xc0
2025-01-17T21:09:11+0300 dx bash[1022]:     asm_exc_page_fault+0x27/0x30
2025-01-17T21:09:11+0300 dx bash[1022]:   R ThreadPoolForeg[8917] +0ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694069927171
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     blk_account_io_start+0xef/0x150
2025-01-17T21:09:11+0300 dx bash[1022]:     blk_mq_submit_bio+0x6f6/0x9f0
2025-01-17T21:09:11+0300 dx bash[1022]:     __submit_bio+0x80/0x270
2025-01-17T21:09:11+0300 dx bash[1022]:     submit_bio_noacct_nocheck+0x12c/0x370
2025-01-17T21:09:11+0300 dx bash[1022]:     submit_bio_noacct+0x3c3/0x660
2025-01-17T21:09:11+0300 dx bash[1022]:     submit_bio+0xe5/0xf0
2025-01-17T21:09:11+0300 dx bash[1022]:     ext4_mpage_readpages+0xa3c/0xa70
2025-01-17T21:09:11+0300 dx bash[1022]:     ext4_readahead+0x37/0x40
2025-01-17T21:09:11+0300 dx bash[1022]:     read_pages+0xb4/0x420
2025-01-17T21:09:11+0300 dx bash[1022]:     page_cache_ra_unbounded+0xc0/0x340
2025-01-17T21:09:11+0300 dx bash[1022]:     page_cache_sync_ra+0x42c/0x460
2025-01-17T21:09:11+0300 dx bash[1022]:     filemap_get_pages+0x164/0xb60
2025-01-17T21:09:11+0300 dx bash[1022]:     filemap_read+0xe3/0x440
2025-01-17T21:09:11+0300 dx bash[1022]:     generic_file_read_iter+0x39/0x110
2025-01-17T21:09:11+0300 dx bash[1022]:     ext4_file_read_iter+0x171/0x340
2025-01-17T21:09:11+0300 dx bash[1022]:     vfs_read+0x2a7/0x300
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_pread64+0x92/0x110
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x22e0/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]: CPU 1   : nr_run=6 flags=0x1 cpu_rel=0 ops_qseq=47108100 pnt_seq=15992891
2025-01-17T21:09:11+0300 dx bash[1022]:           curr=chrome[8983] class=0xffffffff980d2f20
2025-01-17T21:09:11+0300 dx bash[1022]:  *R chrome[8983] +0ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=424580693858046882
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     irqentry_exit+0x46/0x50
2025-01-17T21:09:11+0300 dx bash[1022]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T21:09:11+0300 dx bash[1022]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T21:09:11+0300 dx bash[1022]:   R tixati[2602] -137ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694140605134
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_epoll_wait+0xe9/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x2233/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R clang++[8949] -136ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580700093338484
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     irqentry_exit+0x12/0x50
2025-01-17T21:09:11+0300 dx bash[1022]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T21:09:11+0300 dx bash[1022]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T21:09:11+0300 dx bash[1022]:   R transmission-da[2026] -68ms
2025-01-17T21:09:11+0300 dx systemd[1]: scx.service: Failed with result 'exit-code'.
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694140711369
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_epoll_wait+0xe9/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x2233/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R transmission-da[2028] -39ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694140822040
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_nanosleep+0xd0/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x83/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]:   R ThreadPoolForeg[8915] -6ms
2025-01-17T21:09:11+0300 dx bash[1022]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T21:09:11+0300 dx bash[1022]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=424580694074781638
2025-01-17T21:09:11+0300 dx bash[1022]:       cpus=ff
2025-01-17T21:09:11+0300 dx bash[1022]:     filemap_get_pages+0x52b/0xb60
2025-01-17T21:09:11+0300 dx bash[1022]:     filemap_read+0xe3/0x440
2025-01-17T21:09:11+0300 dx bash[1022]:     generic_file_read_iter+0x39/0x110
2025-01-17T21:09:11+0300 dx bash[1022]:     ext4_file_read_iter+0x171/0x340
2025-01-17T21:09:11+0300 dx bash[1022]:     vfs_read+0x2a7/0x300
2025-01-17T21:09:11+0300 dx bash[1022]:     __x64_sys_pread64+0x92/0x110
2025-01-17T21:09:11+0300 dx bash[1022]:     x64_sys_call+0x22e0/0x2d60
2025-01-17T21:09:11+0300 dx bash[1022]:     do_syscall_64+0x65/0x130
2025-01-17T21:09:11+0300 dx bash[1022]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T21:09:11+0300 dx bash[1022]: ================================================================================
2025-01-17T21:09:11+0300 dx bash[1022]: 18:09:11 [INFO] Unregister scx_bpfland scheduler
2025-01-17T21:09:11+0300 dx bash[1022]: Error: EXIT: runnable task stall (ld.lld[8779] failed to run for 5.344s)
2025-01-17T21:09:11+0300 dx systemd[1]: scx.service: Service hold-off time over, scheduling restart.
2025-01-17T21:09:11+0300 dx systemd[1]: scx.service: Scheduled restart job, restart counter is at 1.
2025-01-17T21:09:11+0300 dx systemd[1]: Stopped Start scx_scheduler.
2025-01-17T21:09:11+0300 dx systemd[1]: Started Start scx_scheduler.
2025-01-17T21:09:11+0300 dx bash[9148]: 18:09:11 [INFO] scx_bpfland 1.0.9-g5b10a44e x86_64-unknown-linux-gnu SMT off
2025-01-17T21:09:11+0300 dx bash[9148]: 18:09:11 [INFO] primary CPU domain = 0x3
2025-01-17T21:09:11+0300 dx bash[9148]: 18:09:11 [INFO] cpufreq performance level: max
2025-01-17T21:09:11+0300 dx bash[9148]: 18:09:11 [INFO] L2 cache ID 0: sibling CPUs: [0]
2025-01-17T21:09:11+0300 dx bash[9148]: 18:09:11 [INFO] L2 cache ID 1: sibling CPUs: [1]
2025-01-17T21:09:11+0300 dx bash[9148]: 18:09:11 [INFO] L3 cache ID 0: sibling CPUs: [0]
2025-01-17T21:09:11+0300 dx bash[9148]: 18:09:11 [INFO] L3 cache ID 1: sibling CPUs: [1]
2025-01-17T22:24:05+0300 dx bash[9148]: DEBUG DUMP
2025-01-17T22:24:05+0300 dx bash[9148]: ================================================================================
2025-01-17T22:24:05+0300 dx bash[9148]: kworker/u32:3[8272] triggered exit kind 1026:
2025-01-17T22:24:05+0300 dx bash[9148]:   runnable task stall (clang++[13478] failed to run for 6.673s)
2025-01-17T22:24:05+0300 dx bash[9148]: Backtrace:
2025-01-17T22:24:05+0300 dx bash[9148]:   process_scheduled_works+0x1f8/0x440
2025-01-17T22:24:05+0300 dx bash[9148]:   worker_thread+0x35f/0x420
2025-01-17T22:24:05+0300 dx bash[9148]:   kthread+0x2f3/0x330
2025-01-17T22:24:05+0300 dx bash[9148]:   ret_from_fork+0x3e/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:   ret_from_fork_asm+0x1a/0x30
2025-01-17T22:24:05+0300 dx bash[9148]: CPU states
2025-01-17T22:24:05+0300 dx bash[9148]: ----------
2025-01-17T22:24:05+0300 dx bash[9148]: CPU 0   : nr_run=6 flags=0x1 cpu_rel=0 ops_qseq=36627392 pnt_seq=8070411
2025-01-17T22:24:05+0300 dx bash[9148]:           curr=Chrome_ChildIOT[12669] class=0xffffffff980d2f20
2025-01-17T22:24:05+0300 dx bash[9148]:  *R Chrome_ChildIOT[12669] -1ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=111688987676330929
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     ___sys_recvmsg+0x1c3/0x200
2025-01-17T22:24:05+0300 dx bash[9148]:     __x64_sys_recvmsg+0xbe/0x120
2025-01-17T22:24:05+0300 dx bash[9148]:     x64_sys_call+0x2c1c/0x2d60
2025-01-17T22:24:05+0300 dx bash[9148]:     do_syscall_64+0x71/0x130
2025-01-17T22:24:05+0300 dx bash[9148]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T22:24:05+0300 dx bash[9148]:   R clang++[13829] -705ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111689016627346708
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x12/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T22:24:05+0300 dx bash[9148]:   R chrome[13685] -77ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111688988910508995
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x12/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T22:24:05+0300 dx bash[9148]:   R Chrome_ChildIOT[12676] -6ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111688988804459853
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x46/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     sysvec_reschedule_ipi+0x67/0x80
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_sysvec_reschedule_ipi+0x1b/0x20
2025-01-17T22:24:05+0300 dx bash[9148]:     do_syscall_64+0x37/0x130
2025-01-17T22:24:05+0300 dx bash[9148]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T22:24:05+0300 dx bash[9148]:   R ThreadPoolForeg[13733] -2ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111688988714210450
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x12/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T22:24:05+0300 dx bash[9148]:   R ThreadPoolForeg[13841] +0ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111688988747165337
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x12/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     sysvec_apic_timer_interrupt+0x44/0x90
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_sysvec_apic_timer_interrupt+0x1b/0x20
2025-01-17T22:24:05+0300 dx bash[9148]: CPU 1   : nr_run=6 flags=0x1 cpu_rel=0 ops_qseq=56060913 pnt_seq=17172409
2025-01-17T22:24:05+0300 dx bash[9148]:           curr=kworker/u32:3[8272] class=0xffffffff980d2f20
2025-01-17T22:24:05+0300 dx bash[9148]:  *R kworker/u32:3[8272] +0ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=313740076
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     scx_dump_state+0x654/0x830
2025-01-17T22:24:05+0300 dx bash[9148]:     scx_ops_error_irq_workfn+0x28/0x90
2025-01-17T22:24:05+0300 dx bash[9148]:     irq_work_run+0x7d/0x1e0
2025-01-17T22:24:05+0300 dx bash[9148]:     __sysvec_irq_work+0x1c/0xc0
2025-01-17T22:24:05+0300 dx bash[9148]:     sysvec_irq_work+0x6e/0x80
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_sysvec_irq_work+0x1b/0x20
2025-01-17T22:24:05+0300 dx bash[9148]:     scx_watchdog_workfn+0x155/0x240
2025-01-17T22:24:05+0300 dx bash[9148]:     process_scheduled_works+0x1f8/0x440
2025-01-17T22:24:05+0300 dx bash[9148]:     worker_thread+0x35f/0x420
2025-01-17T22:24:05+0300 dx bash[9148]:     kthread+0x2f3/0x330
2025-01-17T22:24:05+0300 dx bash[9148]:     ret_from_fork+0x3e/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     ret_from_fork_asm+0x1a/0x30
2025-01-17T22:24:05+0300 dx bash[9148]:   R clang++[13478] -6673ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x1 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111689503189692039
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x46/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     sysvec_irq_work+0x41/0x80
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_sysvec_irq_work+0x1b/0x20
2025-01-17T22:24:05+0300 dx bash[9148]:     local_irq_enable_exit_to_user+0x6/0x10
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x12/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     common_interrupt+0x50/0xa0
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_common_interrupt+0x27/0x40
2025-01-17T22:24:05+0300 dx bash[9148]:   R ThreadPoolForeg[13731] -40ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111688988872646430
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     futex_wait+0xc7/0x180
2025-01-17T22:24:05+0300 dx bash[9148]:     do_futex+0x177/0x1f0
2025-01-17T22:24:05+0300 dx bash[9148]:     __se_sys_futex+0x194/0x230
2025-01-17T22:24:05+0300 dx bash[9148]:     __x64_sys_futex+0x29/0x30
2025-01-17T22:24:05+0300 dx bash[9148]:     x64_sys_call+0x1c1a/0x2d60
2025-01-17T22:24:05+0300 dx bash[9148]:     do_syscall_64+0x65/0x130
2025-01-17T22:24:05+0300 dx bash[9148]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T22:24:05+0300 dx bash[9148]:   R chrome[13729] -38ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111688988806245614
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx systemd[1]: scx.service: Main process exited, code=exited, status=1/FAILURE
2025-01-17T22:24:05+0300 dx bash[9148]:     futex_wait+0x150/0x180
2025-01-17T22:24:05+0300 dx bash[9148]:     do_futex+0x177/0x1f0
2025-01-17T22:24:05+0300 dx bash[9148]:     __se_sys_futex+0x194/0x230
2025-01-17T22:24:05+0300 dx bash[9148]:     __x64_sys_futex+0x29/0x30
2025-01-17T22:24:05+0300 dx bash[9148]:     x64_sys_call+0x1c1a/0x2d60
2025-01-17T22:24:05+0300 dx bash[9148]:     do_syscall_64+0x65/0x130
2025-01-17T22:24:05+0300 dx bash[9148]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T22:24:05+0300 dx bash[9148]:   R tixati[2333] -29ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x0 dsq_vtime=111688988890806528
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     futex_wait+0x150/0x180
2025-01-17T22:24:05+0300 dx bash[9148]:     do_futex+0x177/0x1f0
2025-01-17T22:24:05+0300 dx bash[9148]:     __se_sys_futex+0x194/0x230
2025-01-17T22:24:05+0300 dx bash[9148]:     __x64_sys_futex+0x29/0x30
2025-01-17T22:24:05+0300 dx bash[9148]:     x64_sys_call+0x1c1a/0x2d60
2025-01-17T22:24:05+0300 dx bash[9148]:     do_syscall_64+0x65/0x130
2025-01-17T22:24:05+0300 dx bash[9148]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T22:24:05+0300 dx bash[9148]:   R chrome[12432] -1ms
2025-01-17T22:24:05+0300 dx bash[9148]:       scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
2025-01-17T22:24:05+0300 dx bash[9148]:       sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=111688987677359648
2025-01-17T22:24:05+0300 dx bash[9148]:       cpus=ff
2025-01-17T22:24:05+0300 dx bash[9148]:     irqentry_exit+0x46/0x50
2025-01-17T22:24:05+0300 dx bash[9148]:     common_interrupt+0x50/0xa0
2025-01-17T22:24:05+0300 dx bash[9148]:     asm_common_interrupt+0x27/0x40
2025-01-17T22:24:05+0300 dx bash[9148]:     read_hpet+0x35/0x100
2025-01-17T22:24:05+0300 dx bash[9148]:     ktime_get_ts64+0x96/0x170
2025-01-17T22:24:05+0300 dx bash[9148]:     posix_get_monotonic_timespec+0x26/0xc0
2025-01-17T22:24:05+0300 dx bash[9148]:     __x64_sys_clock_gettime+0x81/0x100
2025-01-17T22:24:05+0300 dx bash[9148]:     x64_sys_call+0x2421/0x2d60
2025-01-17T22:24:05+0300 dx bash[9148]:     do_syscall_64+0x65/0x130
2025-01-17T22:24:05+0300 dx bash[9148]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
2025-01-17T22:24:05+0300 dx bash[9148]: ================================================================================
2025-01-17T22:24:05+0300 dx bash[9148]: 19:24:05 [INFO] Unregister scx_bpfland scheduler
2025-01-17T22:24:05+0300 dx bash[9148]: Error: EXIT: runnable task stall (clang++[13478] failed to run for 6.673s)
2025-01-17T22:24:05+0300 dx systemd[1]: scx.service: Failed with result 'exit-code'.
2025-01-17T22:24:05+0300 dx systemd[1]: scx.service: Service hold-off time over, scheduling restart.
2025-01-17T22:24:05+0300 dx systemd[1]: scx.service: Scheduled restart job, restart counter is at 2.
2025-01-17T22:24:05+0300 dx systemd[1]: Stopped Start scx_scheduler.
2025-01-17T22:24:05+0300 dx systemd[1]: Started Start scx_scheduler.
2025-01-17T22:24:05+0300 dx bash[13862]: 19:24:05 [INFO] scx_bpfland 1.0.9-g5b10a44e x86_64-unknown-linux-gnu SMT off
2025-01-17T22:24:06+0300 dx bash[13862]: 19:24:06 [INFO] primary CPU domain = 0x3
2025-01-17T22:24:06+0300 dx bash[13862]: 19:24:06 [INFO] cpufreq performance level: max
2025-01-17T22:24:06+0300 dx bash[13862]: 19:24:06 [INFO] L2 cache ID 0: sibling CPUs: [0]
2025-01-17T22:24:06+0300 dx bash[13862]: 19:24:06 [INFO] L2 cache ID 1: sibling CPUs: [1]
2025-01-17T22:24:06+0300 dx bash[13862]: 19:24:06 [INFO] L3 cache ID 0: sibling CPUs: [0]
2025-01-17T22:24:06+0300 dx bash[13862]: 19:24:06 [INFO] L3 cache ID 1: sibling CPUs: [1]

@arighi
Copy link
Contributor

arighi commented Jan 18, 2025

...

2025-01-17T21:09:11+0300 dx bash[1022]: CPU 0 : nr_run=11 flags=0x1 cpu_rel=0 ops_qseq=30809884 pnt_seq=7964631
2025-01-17T21:09:11+0300 dx bash[1022]: curr=kworker/u32:3[8272] class=0xffffffff980d2f20

Hm... why the sched class symbol is not resolved here? Can you share your kernel .config?

Are you running services that are adjusting niceness / priority dynamically (such as ananicy-cpp or similar)?

@arighi
Copy link
Contributor

arighi commented Jan 18, 2025

BTW @htejun what do you think about including p->scx.weight in the task dump? It could help to understand if certain tasks were overly prioritized/de-prioritized due to a big niceness gap.

@kmikita
Copy link
Author

kmikita commented Jan 18, 2025

Are you running services that are adjusting niceness / priority dynamically (such as ananicy-cpp or similar)?

I assume the only thing which cause a conflict: used "conservative" cpu governor with ignore_nice_load=1 .

Can you share your kernel .config?

config-6.12.8-llvm.txt

@arighi
Copy link
Contributor

arighi commented Jan 18, 2025

@kmikita can you check if bpfland-next is fixing this? Thanks!

https://github.com/sched-ext/scx/tree/bpfland-next

@kmikita
Copy link
Author

kmikita commented Jan 18, 2025

@arighi I will be testing bpfland-next, but I do not promise results soon, because it needs a longtime heavy load cpu (multithreaded gcc or clang) and the issue does not often happens.

@arighi
Copy link
Contributor

arighi commented Jan 18, 2025

@arighi I will be testing bpfland-next, but I do not promise results soon, because it needs a longtime heavy load cpu (multithreaded gcc or clang) and the issue does not often happens.

Sure, no rush and only if you have time. Thanks!

@hodgesds hodgesds added scx_bpfland bug Something isn't working labels Jan 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working scx_bpfland
Projects
None yet
Development

No branches or pull requests

3 participants