set_disks? not entirely benign? #33
Interesting issue! All the more puzzling since all the calls to "set_disks?" should be entirely benign (the query only reports, it changes nothing). What I see in this log is what I suspect to be a truncated reply to the last one.
Interesting, your description is indeed how I'd expect it to behave (i.e. it affects nothing). The final (good) reply is just truncated in my copy-paste; it's a full set of disks in the log. The only change I'm aware of since this started happening is the insertion of a "set_disks?" for each scan recorded, hence the finger pointing... I'll check nothing else was fiddled with at the same time!
So it seems they are running several m5copys in parallel (vbs to remote file) and somewhere along the line files are not being closed, though transfers seem to complete correctly at the destination. After one jive5ab gave up last night they switched to another on a different port and resumed. On skirner, that new jive5ab has around 440 open vbs files (the transfer finished about 4 hours ago), most of which are open by 2 threads plus the main PID. Can't see anything unusual in the log though. The same was done on gyller but that jive5ab seems to have a healthy number of files open...
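(Aside, in case it helps with checking: a minimal Linux-only C++ sketch that counts a process's open descriptors by listing /proc/<pid>/fd, the same number "ls /proc/<pid>/fd | wc -l" gives; the PID argument is whatever jive5ab instance is being inspected.)

```cpp
// Minimal Linux-only sketch: count a process's open file descriptors
// by listing the entries in /proc/<pid>/fd.
#include <dirent.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    char path[64];
    std::snprintf(path, sizeof(path), "/proc/%s/fd", argv[1]);

    DIR* dir = ::opendir(path);
    if (dir == nullptr) {
        std::perror("opendir");
        return 1;
    }
    unsigned count = 0;
    for (struct dirent* e = ::readdir(dir); e != nullptr; e = ::readdir(dir))
        if (e->d_name[0] != '.')   // skip "." and ".."
            ++count;
    ::closedir(dir);
    std::printf("%u open file descriptors\n", count);
    return 0;
}
```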
Ah ok. It may be that indeed another transfer in the same process is hogging file descriptors. I think I heard before that some transfers seem to not release all their file descriptors. Is this all disk2net?
This is using disk2net (vbs to net); odd that both buffs suffered the same fate last night. I've suggested removing the set_disks? query and repeating the transfers to see if it happens any more.
My .02 arbitrary currency units say that it won't affect the outcome 🤞
I'll likely be offering you .02 zl in a couple of weeks then! I did count the occurrences of the relevant commands in the second log file on skirner: each popped up 442 times, which matches rather well with the number of files open - 1 per scan.
Crikey. Well, not-so-arbitrary currency units after all!
I count 444 occurrences of transient so I'd say that's a big yes! The first 2 transients died due to no write permissions (I guess a bad path) on the remote.
That's very useful to know, more hints at where to search. Thanks! |
I think I have an idea what's happening: we introduced a counted-pointer-to-filedescriptor in the file-reading code (the change was introduced to fix an earlier issue).

My expectation was that this mechanism would leak at most one file descriptor: it would keep the last one used open, but as soon as a new transfer was started, that would replace the counted pointer with a new one, making the refcount of the previously opened file go to zero and have it deleted.

Now that I'm typing this I realise that this may be per runtime. So now we have each runtime leaking one fd because it does not get reused. Probably the "transient" runtimes with unique names interact badly with each runtime leaking something.
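A minimal sketch of the suspected mechanism described above, assuming a shared_ptr-style counted fd; the names Runtime, open_scan and make_counted_fd are invented for illustration, and this is not jive5ab's actual code:

```cpp
// Sketch of per-runtime fd caching: a reference-counted fd keeps the
// last-used file open; replacing it closes the old one, but only if
// the runtime is reused.
#include <fcntl.h>
#include <unistd.h>
#include <memory>
#include <string>

using CountedFd = std::shared_ptr<int>;

// Wrap an open fd in a shared_ptr whose deleter closes it once the
// refcount drops to zero.
CountedFd make_counted_fd(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0)
        return nullptr;
    return CountedFd(new int(fd), [](int* p) { ::close(*p); delete p; });
}

struct Runtime {
    CountedFd last_fd;   // cached per *runtime*, not per process

    void open_scan(const std::string& path) {
        // Assigning a new counted fd drops the refcount of the previous
        // one to zero and closes it -- so a *reused* runtime leaks at
        // most one descriptor.
        last_fd = make_counted_fd(path);
    }
};

int main() {
    // Two "transient" runtimes, as created one-per-transfer: neither is
    // ever reused, so neither cached fd is ever replaced. One leaked fd
    // per runtime, i.e. one per scan -- matching the ~442 files observed.
    Runtime rt1, rt2;
    rt1.open_scan("/dev/null");   // placeholder paths for the sketch
    rt2.open_scan("/dev/null");
    return 0;   // fds only close here, when the runtimes are destroyed
}
```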
It would appear that should one have a fascination with the "set_disks?" query, jive5ab becomes rather annoyed and ultimately retaliates by replying:

```
Reply: !set_disks? 0 : 0 ;
```

Which for a flexbuff isn't particularly useful.
One buff (30 mount points) got away with 1037 queries before bad things happened, the other (36 mount points) with 1092, so that looks to be the order of what jive5ab (linux/ulimit?) tolerates. From the log below, could it be set_disks? is opening the mountpoints again on each call without freeing them? (There's an illustrative sketch after the log excerpt.)
```
202333420:16:02 2023-11-30 19:16:02.65: Processing command 'set_disks?'
202333420:16:02 2023-11-30 19:16:02.65: Reply: !set_disks? 0 : 30 : /mnt/disk0 : /mnt/disk1 : /mnt/disk10 : /mnt/disk11 : /mnt/disk12 : /mnt/disk13 : /mnt/disk14 : /mnt/disk15 : /mnt/disk>
202333420:16:04 2023-11-30 19:16:04.81: Processing command 'record=mk6:0'
202333420:16:04 2023-11-30 19:16:04.81: Reply: !record= 0 ;
202333420:16:04 2023-11-30 19:16:04.82: Processing command 'scan_set=v23334_oe_334-1735_3::'
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk21)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk22)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk23)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk24)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk25)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk26)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk27)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk15)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk2)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk16)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk20)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk12)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk28)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk13)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.82: scanMk6RecordingMountpoint(/mnt/disk3)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk5)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk7)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk9)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk6)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk29)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk4)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk8)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk18)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk17)/ ::opendir fails - Too many open files
202333420:16:04 2023-11-30 19:16:04.83: scanMk6RecordingMountpoint(/mnt/disk14)/ ::opendir fails - Too many open files
```
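The arithmetic gives a hint about the leak rate: if each set_disks? opened all 30-36 mountpoints without freeing them, the descriptor limit would be hit within a few dozen queries, whereas roughly a thousand queries before failure suggests about one leaked descriptor per query, of the same order as a typical ulimit -n of 1024. A standalone sketch of that failure mode (nothing jive5ab-specific, it just leaks /dev/null descriptors on purpose):

```cpp
// Standalone sketch of the failure mode in the log above: deliberately
// leak one file descriptor per iteration (as if each query left one
// open) and count how many fit under the process fd limit.
#include <fcntl.h>
#include <cerrno>
#include <cstdio>

int main() {
    unsigned long leaked = 0;
    for (;;) {
        int fd = ::open("/dev/null", O_RDONLY);   // never closed: the leak
        if (fd < 0) {
            if (errno == EMFILE)
                std::printf("EMFILE after %lu leaked fds\n", leaked);
            else
                std::perror("open");
            break;
        }
        ++leaked;
    }
    return 0;
}
```

With a default limit of 1024 this gives up after roughly 1020 iterations (stdin/stdout/stderr take a few slots); a long-running jive5ab with sockets and log files open would start lower, so failing somewhere on the order of the observed 1037 and 1092 queries is consistent with one descriptor leaked per query rather than one per mountpoint.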