
Slow scan_check? #28

Open
psi-c opened this issue Mar 21, 2023 · 10 comments


psi-c commented Mar 21, 2023

Running jive5ab built from issue25 commit d31d7ce results in a rather lethargic scan_check?

Recordings here are single-threaded VDIF_8000-512-16-2

In 'some' flavour of jive5ab-3.0.0 scan_check returns before you have time to blink e.g.

FS:
2023.080.10:10:12.25:checkfb
2023.080.10:10:12.31/scan_check/?,r11095_on_080-1008,VDIF,32,2023y80d10h8m20.1709s,110.829s,16.000000,0,8032

j5ab:
202308011:10:12 2023-03-21 10:10:12.25: Processing command 'scan_check?'
202308011:10:12 2023-03-21 10:10:12.27: openfile_type: found 7122135040 bytes in 28 chunks, 100%
202308011:10:12 2023-03-21 10:10:12.30: Reply: !scan_check? 0 : ? : r11095_on_080-1008 : VDIF : 32 : 2023y080d10h08m20.1709s : 110.829s : 16Mbps : 0 : 8032 ;


FS:
2023.080.10:12:43.25:checkfb
2023.080.10:12:43.33/scan_check/?,r11095_on_080-1011a,VDIF,32,2023y80d10h11m9.6695s,92.3304s,16.000000,0,8032

j5ab:
202308011:12:43 2023-03-21 10:12:43.25: Processing command 'scan_check?'
202308011:12:43 2023-03-21 10:12:43.28: openfile_type: found 5932884992 bytes in 24 chunks, 100%
202308011:12:43 2023-03-21 10:12:43.32: Reply: !scan_check? 0 : ? : r11095_on_080-1011a : VDIF : 32 : 2023y080d10h11m09.6695s : 92.3304s : 16Mbps : 0 : 8032 ;


FS:
2023.080.10:15:17.44:checkfb
2023.080.10:15:17.51/scan_check/?,r11095_on_080-1014a,VDIF,32,2023y80d10h14m11.5259s,64.474s,16.000000,0,8032

j5ab:
202308011:15:17 2023-03-21 10:15:17.44: Processing command 'scan_check?'
202308011:15:17 2023-03-21 10:15:17.47: openfile_type: found 4143499968 bytes in 17 chunks, 100%
202308011:15:17 2023-03-21 10:15:17.51: Reply: !scan_check? 0 : ? : r11095_on_080-1014a : VDIF : 32 : 2023y080d10h14m11.5259s : 64.474s : 16Mbps : 0 : 8032 ;

Restarting with issue25, things become a lot more laid back

FS:
2023.080.11:05:11.25:checkfb
2023.080.11:05:12.27#flagr#flagr/antenna,acquired
2023.080.11:05:15.32#flagr#flagr/antenna,off-source
2023.080.11:05:18.58/scan_check/?,r11095_on_080-1103b,VDIF,32,2023y80d11h3m37.0074s,92.9925s,16.000000,0,8032

j5ab:
202308012:05:11 2023-03-21 11:05:11.25: Processing command 'scan_check?'
202308012:05:18 2023-03-21 11:05:18.53: openfile_type: found 5976016832 bytes in 24 chunks, 100%
202308012:05:18 2023-03-21 11:05:18.57: Reply: !scan_check? 0 : ? : r11095_on_080-1103b : VDIF : 32 : 2023y080d11h03m37.0074s : 92.9925s : 16Mbps : 0 : 8032 ;


FS:
2023.080.11:08:20.26:checkfb
2023.080.11:08:21.27#flagr#flagr/antenna,new-source
2023.080.11:08:27.42/scan_check/?,r11095_on_080-1106,VDIF,32,2023y80d11h6m11.0017s,127.998s,16.000000,0,8032

j5ab:
202308012:08:20 2023-03-21 11:08:20.26: Processing command 'scan_check?'
202308012:08:27 2023-03-21 11:08:27.38: openfile_type: found 8225346304 bytes in 33 chunks, 100%
202308012:08:27 2023-03-21 11:08:27.42: Reply: !scan_check? 0 : ? : r11095_on_080-1106 : VDIF : 32 : 2023y080d11h06m11.0017s : 127.998s : 16Mbps : 0 : 8032 ;

FS:
2023.080.11:10:09.14:checkfb
2023.080.11:10:17.43?ERROR m5 -104 mk5cn: time-out, connection closed, will reopen on next attempt

j5ab:
202308012:10:09 2023-03-21 11:10:09.14: Processing command 'scan_check?'
202308012:10:17 2023-03-21 11:10:17.44: openfile_type: found 2955840256 bytes in 12 chunks, 100%
202308012:10:17 2023-03-21 11:10:17.49: Reply: !scan_check? 0 : ? : r11095_on_080-1109a : VDIF : 32 : 2023y080d11h09m21.0102s : 45.9896s : 16Mbps : 0 : 8032 ;

FS seems to have an 8s timeout for scan_check and a small proportion of these calls end up taking >8s.

Monitoring with htop and iotop, nothing particularly seems to be happening during scan_check.

Oddly, the last example, which took the longest, was a mere 2.75 GB scan, whilst the others at 4-8 GB were marginally quicker.


haavee commented Mar 21, 2023

Could you please restart with -m 4 or -m 5 capturing output, repeat the scan_check, and send the output?


psi-c commented Mar 21, 2023

Perhaps -m 5 is a bit too verbose...

scan_check produces a gazillion (well 2329866 before I ctrl-c'd after 30s) lines of

202308015:49:14 2023-03-21 14:49:14.25: Cmd: scan_check?
202308015:49:14 2023-03-21 14:49:14.25: Processing command 'scan_check?'
202308015:49:14 2023-03-21 14:49:14.25: isRecordingEntry: checking entry . against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:14 2023-03-21 14:49:14.25: isRecordingEntry: checking entry .. against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:14 2023-03-21 14:49:14.25: isRecordingEntry: checking entry ny2217_ns_218-0638 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:14 2023-03-21 14:49:14.25: isRecordingEntry: checking entry ny2217_ns_218-0639 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:14 2023-03-21 14:49:14.25: isRecordingEntry: checking entry ny2217_ns_218-0641 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:14 2023-03-21 14:49:14.25: isRecordingEntry: checking entry ny2217_ns_218-0643 against ^r11095_on_080-1447b(_ds[^_\.]+)?$

........

202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0459a_3 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0459a_5 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0504_2 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0504_4 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0505a_5 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0505a_6 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0505a_7 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0506b_0 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0506b_4 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0507b_0 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0507b_2 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0507b_6 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0509_2 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0509_4 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0509_7 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0510_0 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0510_2 against ^r11095_on_080-1447b(_ds[^_\.]+)?$
202308015:49:33 2023-03-21 14:49:33.56: isRecordingEntry: checking entry vo2321_ow_322-0510_3 against ^r11095_on_080-1447b(_ds[^_\.]+)?$


psi-c commented Mar 21, 2023

With -m 4 only silence after scan_check

202308015:58:54 2023-03-21 14:58:54.49: Processing command 'scan_check?'
202308015:59:01 2023-03-21 14:59:01.72: openfile_type: found 2569621536 bytes in 11 chunks, 100%
202308015:59:01 2023-03-21 14:59:01.76: scan_check[1/*] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m13.***s 0b #156
202308015:59:01 2023-03-21 14:59:01.76: scan_check[2/8] read 995968b from offset = 366941920b (2202679616 to EOF)
202308015:59:01 2023-03-21 14:59:01.76: scan_check[2/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m18.***s 367929856b #5964
202308015:59:01 2023-03-21 14:59:01.76: scan_check[3/8] read 995968b from offset = 733883840b (1835737696 to EOF)
202308015:59:01 2023-03-21 14:59:01.76: scan_check[3/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m24.***s 734871776b #3649
202308015:59:01 2023-03-21 14:59:01.76: scan_check[4/8] read 995968b from offset = 1100825760b (1468795776 to EOF)
202308015:59:01 2023-03-21 14:59:01.76: scan_check[4/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m30.***s 1101813696b #1334
202308015:59:01 2023-03-21 14:59:01.76: scan_check[5/8] read 995968b from offset = 1467767680b (1101853856 to EOF)
202308015:59:01 2023-03-21 14:59:01.76: scan_check[5/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m35.***s 1468755616b #7019
202308015:59:01 2023-03-21 14:59:01.76: scan_check[6/8] read 995968b from offset = 1834709600b (734911936 to EOF)
202308015:59:01 2023-03-21 14:59:01.76: scan_check[6/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m41.***s 1835697536b #4704
202308015:59:01 2023-03-21 14:59:01.76: scan_check[7/8] read 995968b from offset = 2201651520b (367970016 to EOF)
202308015:59:01 2023-03-21 14:59:01.76: scan_check[7/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m47.***s 2202639456b #2389
202308015:59:01 2023-03-21 14:59:01.76: scan_check[8/8] read 995968b from offset = 2568625568b (995968 to EOF)
202308015:59:01 2023-03-21 14:59:01.76: scan_check[8/8] = VDIFx32@16000000 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d14h58m52.9999s 2568978976b #7999
202308015:59:01 2023-03-21 14:59:01.76: VDIF from 2023y080d14h58m13.0195s to 2023y080d14h58m52.9999s ntrack=32 bitrate/track=16Mbps framesize=8032 dataarray=8000 threads={0} missing_bytes=0
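The -m 4 output above shows that the check itself is cheap: scan_check reads just 8 blocks of ~1 MB (995968 bytes) spread across the recording, with the first at offset 0 and the last ending at EOF. A sketch of that sampling arithmetic (the exact spacing jive5ab uses is a guess; only the endpoints are confirmed by the log):

```cpp
// Illustrative reconstruction of the sampling pattern visible in the
// -m 4 log: n small blocks at spread-out offsets, first block at the
// start, last block ending at EOF. The even spacing in between is an
// assumption, not jive5ab's actual offset computation.
#include <cstdint>
#include <vector>

std::vector<int64_t> sample_offsets(int64_t recording_size,
                                    int64_t block_size, int n) {
    std::vector<int64_t> offsets;
    if (n < 2 || recording_size <= block_size) {
        offsets.push_back(0);
        return offsets;
    }
    const int64_t span = recording_size - block_size; // last read ends at EOF
    for (int i = 0; i < n; ++i)
        offsets.push_back(span * i / (n - 1));
    return offsets;
}
```

With the 2569621536-byte recording from the log, the last of 8 offsets lands at 2568625568, matching the log's `scan_check[8/8] read 995968b from offset = 2568625568b`.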


psi-c commented Mar 21, 2023

Another attempt at -m 5 on a fresh non-recording jive5ab: scan_set takes time to run, but a subsequent scan_check? returns instantly. Log file for just these two commands: 192 MB!

2023-03-21 15:05:22.90: Cmd: scan_set=r11095_on_080-1504
2023-03-21 15:05:22.90: Processing command 'scan_set=r11095_on_080-1504'
2023-03-21 15:05:22.90: isRecordingEntry: checking entry . against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:22.90: isRecordingEntry: checking entry .. against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:22.90: isRecordingEntry: checking entry t2159_on_039-1614 against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:22.90: isRecordingEntry: checking entry . against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:22.90: isRecordingEntry: checking entry .. against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:22.90: isRecordingEntry: checking entry ny2217_ns_218-1335 against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:22.90: isRecordingEntry: checking entry ny2217_ns_218-1338 against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:22.90: isRecordingEntry: checking entry ny2217_ns_218-1343 against ^r11095_on_080\-1504(_ds[^_\.]+)?$

..........

2023-03-21 15:05:30.48: isRecordingEntry: checking entry r11095_on_080-0201 against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:30.48: isRecordingEntry: checking entry r11095_on_080-0205b against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:30.48: isRecordingEntry: checking entry r11095_on_080-0210 against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:30.48: isRecordingEntry: checking entry r11095_on_080-0218a against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:30.48: isRecordingEntry: checking entry r11095_on_080-0222a against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:30.48: isRecordingEntry: checking entry r11095_on_080-0226a against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:30.48: isRecordingEntry: checking entry r11095_on_080-0231 against ^r11095_on_080\-1504(_ds[^_\.]+)?$
2023-03-21 15:05:30.48: openfile_type: found 2248614624 bytes in 9 chunks, 100%
2023-03-21 15:05:30.48: Reply: !scan_set= 0 ;
2023-03-21 15:05:30.48: Polling for events....
2023-03-21 15:05:30.48: fd#6 got POLLIN,
2023-03-21 15:05:30.48: Polling for events....

2023-03-21 15:08:06.61: listensok[jive5ab] got POLLIN,
2023-03-21 15:08:06.61: incoming jive5ab on fd#6 129.16.208.148:38048
2023-03-21 15:08:06.61: Polling for events....
2023-03-21 15:08:06.61: fd#6 got POLLIN,
2023-03-21 15:08:06.61: Found 1 command
2023-03-21 15:08:06.61: Cmd: scan_check?
2023-03-21 15:08:06.61: Processing command 'scan_check?'
2023-03-21 15:08:06.61: filechunk_type:open_chunk[/mnt/disk6/r11095_on_080-1504/r11095_on_080-1504.00000000] fd#7 Success
2023-03-21 15:08:06.65: scan_check[1/*] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h04m29.***s 0b #126
2023-03-21 15:08:06.65: filechunk_type:close_chunk[/mnt/disk6/r11095_on_080-1504/r11095_on_080-1504.00000000] fd#7
2023-03-21 15:08:06.65: filechunk_type:open_chunk[/mnt/disk15/r11095_on_080-1504/r11095_on_080-1504.00000001] fd#7 Success
2023-03-21 15:08:06.65: scan_check[2/8] read 995968b from offset = 321087232b (1927527392 to EOF)
2023-03-21 15:08:06.65: scan_check[2/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h04m34.***s 322075168b #225
2023-03-21 15:08:06.65: filechunk_type:close_chunk[/mnt/disk15/r11095_on_080-1504/r11095_on_080-1504.00000001] fd#7
2023-03-21 15:08:06.65: filechunk_type:open_chunk[/mnt/disk3/r11095_on_080-1504/r11095_on_080-1504.00000002] fd#7 Success
2023-03-21 15:08:06.65: scan_check[3/8] read 995968b from offset = 642174464b (1606440160 to EOF)
2023-03-21 15:08:06.65: scan_check[3/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h04m39.***s 643162400b #201
2023-03-21 15:08:06.65: filechunk_type:close_chunk[/mnt/disk3/r11095_on_080-1504/r11095_on_080-1504.00000002] fd#7
2023-03-21 15:08:06.65: filechunk_type:open_chunk[/mnt/disk24/r11095_on_080-1504/r11095_on_080-1504.00000003] fd#7 Success
2023-03-21 15:08:06.65: scan_check[4/8] read 995968b from offset = 963261696b (1285352928 to EOF)
2023-03-21 15:08:06.65: scan_check[4/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h04m44.***s 964249632b #177
2023-03-21 15:08:06.65: filechunk_type:close_chunk[/mnt/disk24/r11095_on_080-1504/r11095_on_080-1504.00000003] fd#7
2023-03-21 15:08:06.65: filechunk_type:open_chunk[/mnt/disk12/r11095_on_080-1504/r11095_on_080-1504.00000005] fd#7 Success
2023-03-21 15:08:06.65: scan_check[5/8] read 995968b from offset = 1284348928b (964265696 to EOF)
2023-03-21 15:08:06.65: scan_check[5/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h04m49.***s 1285336864b #153
2023-03-21 15:08:06.65: filechunk_type:close_chunk[/mnt/disk12/r11095_on_080-1504/r11095_on_080-1504.00000005] fd#7
2023-03-21 15:08:06.65: filechunk_type:open_chunk[/mnt/disk18/r11095_on_080-1504/r11095_on_080-1504.00000006] fd#7 Success
2023-03-21 15:08:06.65: scan_check[6/8] read 995968b from offset = 1605436160b (643178464 to EOF)
2023-03-21 15:08:06.65: scan_check[6/8] = VDIFx32@0 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h04m54.***s 1606424096b #129
2023-03-21 15:08:06.65: filechunk_type:close_chunk[/mnt/disk18/r11095_on_080-1504/r11095_on_080-1504.00000006] fd#7
2023-03-21 15:08:06.65: filechunk_type:open_chunk[/mnt/disk1/r11095_on_080-1504/r11095_on_080-1504.00000007] fd#7 Success
2023-03-21 15:08:06.65: scan_check[7/8] read 995968b from offset = 1926523392b (322091232 to EOF)
2023-03-21 15:08:06.65: scan_check[7/8] = VDIFx32@16000000 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h04m58.9999s 1926659936b #7999
2023-03-21 15:08:06.65: filechunk_type:close_chunk[/mnt/disk1/r11095_on_080-1504/r11095_on_080-1504.00000007] fd#7
2023-03-21 15:08:06.65: filechunk_type:open_chunk[/mnt/disk29/r11095_on_080-1504/r11095_on_080-1504.00000008] fd#7 Success
2023-03-21 15:08:06.65: scan_check[8/8] read 995968b from offset = 2247618656b (995968 to EOF)
2023-03-21 15:08:06.65: scan_check[8/8] = VDIFx32@16000000 V:8032x1thrds {thrd#0=16ch x 2bits/sample} => 2023y080d15h05m03.9999s 2247939936b #7999
2023-03-21 15:08:06.65: VDIF from 2023y080d15h04m29.0158s to 2023y080d15h05m03.9999s ntrack=32 bitrate/track=16Mbps framesize=8032 dataarray=8000 threads={0} missing_bytes=0
2023-03-21 15:08:06.65: Reply: !scan_check? 0 : ? : r11095_on_080-1504 : VDIF : 32 : 2023y080d15h04m29.0158s : 34.9841s : 16Mbps : 0 : 8032 ;
2023-03-21 15:08:06.65: Polling for events....
2023-03-21 15:08:06.65: fd#6 got POLLIN,
2023-03-21 15:08:06.65: Polling for events....



haavee commented Mar 21, 2023

It seems the scan for chunks takes ages. That is the big difference between 3.0.0 and this one: automatically finding chunks of multiple data stream recordings. Is this particular flexbuff quite full? Of course, printing all the debug info also takes a non-zero amount of time, especially at this volume.

If you manage to compile & run this specific commit (1a147cd) you should see that both the scan_set= and the scan_check? would take ~8 seconds; that was an oversight that I fixed only recently (caching the open vbs recording).

Somehow I think the slowness is related to either the machine doing something time-consuming in parallel, or being very full/having LOADS of chunk names to filter. Hmmm.
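The fix mentioned above ("caching the open vbs recording") can be sketched roughly as follows; this is guesswork at the shape of the fix, and `OpenRecording` is a hypothetical stand-in for whatever handle type jive5ab actually keeps:

```cpp
// Guesswork sketch of "caching the open vbs recording": remember the
// most recently opened recording so a scan_check? following a scan_set=
// does not repeat the expensive mount-point scan. Not the real jive5ab
// types; OpenRecording is a hypothetical placeholder.
#include <memory>
#include <string>

struct OpenRecording {
    std::string name;
    // ... chunk list, sizes, file descriptors, etc.
};

class RecordingCache {
    std::shared_ptr<OpenRecording> cached;
public:
    std::shared_ptr<OpenRecording> get(const std::string& name) {
        if (cached && cached->name == name)
            return cached;               // hit: skip the mount-point scan
        cached = std::make_shared<OpenRecording>();
        cached->name = name;             // miss: the expensive scan goes here
        return cached;
    }
};
```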


psi-c commented Mar 21, 2023

> Is this particular flexbuff quite full?

Well, about 70% full so it can only get worse ;)

It seems to take about the same time with -m 3 (scan_check) as with -m 5 (scan_set) when -m 5 is piped only to a file and not teed to stdout. Interestingly, the change to this release was to take advantage of "-D", as exactly this issue was hitting the first 6-8s of the recordings since yesterday; now the 'dead-time' has jumped to post-scan!

Machine's idle apart from this, but grepping for isRecordingEntry gives 1668982 matches in the 192 MB log so that's probably how many individual scan-directories there are.

Like I said, I didn't see much happening in either iotop or htop, but I can double-check later with your suggested commit.


psi-c commented Mar 21, 2023

Righty, maybe I blinked the first time, but running 1a147cd ....

Initially

time echo "scan_set=r11095_on_080-1504" | nc -N <flexbuff> 2620
!scan_set= 0 ;

real	0m12.210s
user	0m0.003s
sys	0m0.000s

Then subsequent alternating calls to scan_set & scan_check take 7.6-ish s each, during which all 4 cores boost to 4.3 GHz and show about 50% usage, with 36 extra threads created. Could it be that 36 threads on 4 cores is a bit too much and they're starting to fight over core time? Note that on this buff jive5ab is typically pinned to just 2 cores anyway: 1 core saved for network & 1 core for "everything else".


haavee commented Mar 22, 2023

> Interestingly the change to this release was to take advantage of "-D" as exactly this issue was hitting the first 6-8s of the recordings since yesterday,

Yes, the start-up delay (checking for duplicate recording names) suffers from the exact same issue. I take it the reason this was not happening before was because an older version (say 3.0.0 or before) was in use?

> During which all 4 cores boost to 4.3 GHz and show about 50% usage with 36 extra threads created. Could it be 36 threads on 4 cores is a bit too much and they're starting to fight over core time?

The threads are 1:1 for each mount point to scan - if we don't parallelise scanning them then it would take even longer. However, in this case, I really don't know which is the bottleneck: the disk I/O (36 threads trying to execute readdir_r() at the same time on different mount points) or the CPU (8 threads / core trying to regex_match() on many strings). Is there scope to let jive5ab loose over a few more cores to see what happens?
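The 1:1 thread-per-mountpoint scan described above can be sketched as below; this is an assumed shape, not jive5ab's actual code, and it uses plain readdir() on per-thread DIR* handles (POSIX only guarantees trouble when threads share a stream):

```cpp
// Minimal sketch (assumption, not jive5ab's source) of one scanning
// thread per mount point: each thread lists its own directory stream
// and the per-thread results are merged under a mutex at the end.
#include <dirent.h>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::vector<std::string> scan_mounts(const std::vector<std::string>& mounts) {
    std::vector<std::string>  found;
    std::mutex                found_mtx;
    std::vector<std::thread>  workers;
    for (const auto& mp : mounts) {
        workers.emplace_back([&, mp]() {
            std::vector<std::string> local;
            if (DIR* d = opendir(mp.c_str())) {
                // Each thread owns its DIR*, so plain readdir() is fine here.
                while (dirent* e = readdir(d))
                    local.emplace_back(e->d_name);
                closedir(d);
            }
            std::lock_guard<std::mutex> lk(found_mtx);
            found.insert(found.end(), local.begin(), local.end());
        });
    }
    for (auto& t : workers)
        t.join();
    return found;
}
```

Whether this wins depends on where the time goes: with 36 threads on 4 cores, directory listing parallelises across spindles, but the regex filtering of the results is pure CPU contention.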


psi-c commented Mar 22, 2023

Hmph somebody's cleaned up the buff overnight...

With 3.0.0 there was a startup-delay but scan_check was instant.

Currently at about 40% full now

4 cores: fairly consistent scan_check & scan_set at 4.1s
3 cores: 4.1s seven times, 3.1s once
2 cores: 3.0-3.1s
1 core: 3.2s


haavee commented Mar 22, 2023

So it's not the number of CPUs/cores but really the actual opening of directories and scanning of the entries, by the looks of it.
Ah wait! There is a huge GIL-like construction; the [n]ftw() calls are probably NOT mt-safe:

> The nftw() function need not be thread-safe.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/nftw.html

So within jive5ab I've put a global mutex around nftw(), which effectively serialises the scanning to some extent, limiting it to collecting the matching entries. I've tried to parallelise as much of the processing of those results as possible by doing that without locking. It might be an idea to see if this can be lifted. Could have a go at trying with the help of C++11 lambdas.
(update: well, it seems that for this style of callback function the capturing lambda I was thinking of won't work)
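The workaround described above might look roughly like this; the shape is assumed, not taken from the jive5ab source. It also shows why the capturing lambda won't work: nftw() takes a plain function pointer and passes no user-data argument, so a capturing lambda cannot convert to it and state has to travel through file scope instead:

```cpp
// Sketch (assumed shape, NOT the actual jive5ab code) of serialising
// nftw(): a global mutex guards the walk, since POSIX says nftw() need
// not be thread-safe. nftw()'s callback is a plain function pointer
// with no user-data parameter, so results are collected via a
// file-scope pointer rather than a capturing lambda.
#include <ftw.h>
#include <mutex>
#include <string>
#include <vector>

static std::mutex                nftw_mtx;    // global: serialises all nftw() walks
static std::vector<std::string>* current_out; // callback has no user-data argument

extern "C" int collect_cb(const char* path, const struct stat*,
                          int typeflag, struct FTW*) {
    if (typeflag == FTW_F)                    // regular files only
        current_out->push_back(path);
    return 0;                                 // keep walking
}

std::vector<std::string> walk_serialised(const std::string& root) {
    std::vector<std::string> result;
    std::lock_guard<std::mutex> lk(nftw_mtx); // only the walk is locked;
    current_out = &result;                    // process results outside, unlocked
    nftw(root.c_str(), collect_cb, 16, FTW_PHYS);
    current_out = nullptr;
    return result;
}
```

Processing the collected names (regex matching, sorting) can then happen outside the mutex, which matches the "parallelise the processing, serialise the walk" split described above.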
