Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer overflow when comparing two large automata #84

Open
tneele opened this issue Jan 6, 2025 · 3 comments
Open

Integer overflow when comparing two large automata #84

tneele opened this issue Jan 6, 2025 · 3 comments

Comments

@tneele
Copy link

tneele commented Jan 6, 2025

I have two large automata for which I call the function Automata.findShortestSeparatingWord to find some difference (if any). However, these automata are apparently so large that an integer overflow happens on the following line.

This means that the subsequent check if (totalStates > MAP_THRESHOLD) passes and the program ultimately crashes with a NegativeArraySizeException on

@mtf90
Copy link
Member

mtf90 commented Jan 6, 2025

Thank you for reporting this issue. Unfortunately, I don't have a quick solution in mind. For computing the shortest separating word, the method performs a breadth-first traversal on the product-automaton which requires n1 * n2 possible cache entries to detect cycles (more importantly, it requires the complete range to correctly identify pairs of states).

Do you have anything specific in mind, or is it more about reporting the correct issue (e.g., using an IllegalArgumentException with a more expressive error message)?

If you don't care about shortest counterexamples, you could use the NearLinearEquivalenceTest instead (or Automata#findSeparatingWord) which only requires O(n1 + n2) instead of O(n1 * n2) space/ids.

@tneele
Copy link
Author

tneele commented Jan 10, 2025

Unless I'm overlooking something, I think the method findSeparatingWordLarge in the same file would work for me. As far as I can see, this method does not allocate a cache a priori, but builds it up gradually in a HashMap. Thus, if we manage to find a separating word and terminate early, we might never hit an OOM. If both automata are equal and also minimal, then the set of reachable states in the product is not any larger than either automaton, and we also do not hit an OOM.

Even though findSeparatingWordLarge is meant as a fallback to findSeparatingWord, we never get there, because the check on line 46 fails due to the integer overflow.

Do you agree with this?

If you don't care about shortest counterexamples

Unfortunately, my algorithms assume shortest counterexamples.

@mtf90
Copy link
Member

mtf90 commented Jan 10, 2025

The main problem is not running out of memory, but not being able to compute the correct ids / indices. Despite overflowing, I think two's complement still allows you to correctly compute multiplication and addition (i.e., abusing negative values as an unsigned interpretation) but as soon as your input automata have more than 2^16 states each, you hit a dead end.

Luckily, a rather simple solution is to use longs instead of ints. Especially in the large case, you can easily use them as keys for the Map without too much of a performance overhead. I refactored the class in b7dc4ca and took the freedom to remove the large version as well since it was an identical implementation with just different bookkeeping.

I also added a test-case that now passes. However, I would be greatful if you could double-check with your use-case as well. Since the class is pretty much self-contained, you should be able to just copy it as-is and call it instead of the current upstream version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants