Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Handle Leak in SearchFiles demo and LargeInputFST test code #615

Closed
ShreyTiwari opened this issue Feb 3, 2022 · 5 comments · Fixed by #1074
Closed

File Handle Leak in SearchFiles demo and LargeInputFST test code #615

ShreyTiwari opened this issue Feb 3, 2022 · 5 comments · Fixed by #1074
Labels
is:bug pri:low up-for-grabs This issue is open to be worked on by anyone

Comments

@ShreyTiwari
Copy link

Hi,

I am new to the Lucenenet project. While investigating this project, I found this bug related to a file handle leak.
Location: https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Demo/SearchFiles.cs#L117
Bug: The "input" variable can point to a FileStream that is never disposed.

Another example:
Location: https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests.Suggest/Suggest/Fst/LargeInputFST.cs#L45
Bug: The "reader" variable points to a FileStream that is never disposed.

I was also wondering if such kind of leaks related to IDisposables exist in the code base. Running an existing CodeQL (static analysis tool) query for detecting local un-disposed objects in the code, I observed 45 alerts being generated for the entire codebase. I do not know if all of these 45 alerts are actual leaks or benign alerts.

Could you please share information on ways in which leaks, in general, are currently being prevented in the Lucenenet project (any tools being used or processes in place)?

Thanks in advance!

@NightOwl888
Copy link
Contributor

Thanks for the report.

Technically, this is a duplicate of #265. We haven't yet reviewed the IDisposable usages to ensure they are correct from a .NET perspective. Although, that issue is up-for-grabs if you are interested in contributing, and we welcome your assistance. We have largely saved this task for the community and have been focusing on more complicated issues that require more than just an understanding of .NET best practices to solve.

Lucene.NET is (for the most part) a line-by-line port of Lucene 4.8.0. As such, we carried over many of the gaps between Java and .NET. One such gap is that in Java the Closable interface inherits the AutoCloseable interface, which can be configured to automatically be cleaned up. We haven't yet researched whether Lucene is configured with that option enabled, but suspect that is the reason why the code doesn't explicitly call .close() (and why the .NET version doesn't explicitly call .Dispose()). This includes the two you mentioned.

Given this fact, there are likely lots of places where there are missing calls to Dispose(), most of which can be added (with a comment LUCENENET specific: added call to Dispose() or LUCENENET specific: added using block).

Note there is at least 1 known place where the call to Dispose() is problematic in #271. We are seeing multiple calls to Dispose() because we have added this line in Tokenizer that didn't exist in Lucene. I suspect that Lucene falls back to the AutoCloseable config in this case, and the closest .NET equivalent is to use a finalizer, but both of these assumptions need confirmation. Clearly, using a finalizer has drawbacks and we should avoid that approach and use explicit calls to Dispose(), if possible. It also seems like Dispose() and Close() should be separate concepts in this particular case, being that this class is expected to be reused after it is "closed" without creating a new instance. But, the bottom line is we cannot just add all of the missing Dispose() calls and using statements and assume that is the correct approach - each case needs to be considered carefully according to its existing usage in Java.

It sounds like CodeQL can be utilized to speed up the process of locating them. My suggestion would be to submit a separate commit for each of the issues you found, and break them down into multiple PRs in some sensible way, so we can review and test them thoroughly chunk by chunk

@ShreyTiwari
Copy link
Author

Thank you for all the useful links. From the current analysis it doesn't seem like all the alerts that have been raised could be bugs. For example, "MemoryStream" instances in the code do not necessarily have to be disposed since they would be garbage collected without causing any resource leaks.

I will raise PR's for the bugs and submit the fixes as part of separate commits.

@paulirwin paulirwin added this to the 4.8.0-beta00018 milestone Oct 28, 2024
@paulirwin
Copy link
Contributor

Looking at the two files listed, one of them is in Demo, and the other is in unit tests. The Demo one still matches latest Lucene. (I can't find the other test file quickly in latest Lucene, might have been renamed.) The latter isn't even referenced in Lucene.NET anyways and probably just serves as example code. I think these cases can be fixed so I'll leave this open, but I don't think this is high priority. As @NightOwl888 noted, #265 would serve as a good review of this solution-wide.

@paulirwin paulirwin changed the title File Handle Leak File Handle Leak in SearchFiles demo and LargeInputFST test code Nov 19, 2024
@paulirwin paulirwin added pri:low up-for-grabs This issue is open to be worked on by anyone labels Nov 21, 2024
@stesee
Copy link
Contributor

stesee commented Nov 24, 2024

Missing dispose calls can also be found by the NETAnalyzers that integrate into your local dev env. Are you interested in activating them in the whole solution?

Seems like there are a lot more missing dispose calls - 28 CA2000 warnings in the main project "Lucene.Net".

@paulirwin
Copy link
Contributor

@stesee Thanks, I moved your comment over to #265 which is a ticket to do a larger sweep of IDisposable usage, where that might be useful. We'll keep this issue focused on these two particular issues.

paulirwin added a commit to paulirwin/lucene.net that referenced this issue Dec 22, 2024
paulirwin added a commit that referenced this issue Dec 26, 2024
* Add using statements where possible, per CA2000, #265

* Fix file handle leaks in demo code, #615

* Add leaveOpen parameter to InputStreamDataInput, #265

* Add leaveOpen parameter to OutputStreamDataOutput, #265
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:bug pri:low up-for-grabs This issue is open to be worked on by anyone
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants