Integer overflow when comparing two large automata #84

tneele · 2025-01-06T13:45:40Z

I have two large automata for which I call the function Automata.findShortestSeparatingWord to find some difference (if any). However, these automata are apparently so large that an integer overflow happens on the following line.

automatalib/util/src/main/java/net/automatalib/util/automaton/equivalence/DeterministicEquivalenceTest.java

Line 44 in d156e08

int totalStates = refSize * other.size();

This means that the subsequent check if (totalStates > MAP_THRESHOLD) passes and the program ultimately crashes with a NegativeArraySizeException on

automatalib/util/src/main/java/net/automatalib/util/automaton/equivalence/DeterministicEquivalenceTest.java

Line 74 in d156e08

Pred<I>[] preds = new Pred[totalStates];

The text was updated successfully, but these errors were encountered:

mtf90 · 2025-01-06T18:54:36Z

Thank you for reporting this issue. Unfortunately, I don't have a quick solution in mind. For computing the shortest separating word, the method performs a breadth-first traversal on the product-automaton which requires n1 * n2 possible cache entries to detect cycles (more importantly, it requires the complete range to correctly identify pairs of states).

Do you have anything specific in mind, or is it more about reporting the correct issue (e.g., using an IllegalArgumentException with a more expressive error message)?

If you don't care about shortest counterexamples, you could use the NearLinearEquivalenceTest instead (or Automata#findSeparatingWord) which only requires O(n1 + n2) instead of O(n1 * n2) space/ids.

tneele · 2025-01-10T09:39:00Z

Unless I'm overlooking something, I think the method findSeparatingWordLarge in the same file would work for me. As far as I can see, this method does not allocate a cache a priori, but builds it up gradually in a HashMap. Thus, if we manage to find a separating word and terminate early, we might never hit an OOM. If both automata are equal and also minimal, then the set of reachable states in the product is not any larger than either automaton, and we also do not hit an OOM.

Even though findSeparatingWordLarge is meant as a fallback to findSeparatingWord, we never get there, because the check on line 46 fails due to the integer overflow.

Do you agree with this?

If you don't care about shortest counterexamples

Unfortunately, my algorithms assume shortest counterexamples.

mtf90 · 2025-01-10T21:27:01Z

The main problem is not running out of memory, but not being able to compute the correct ids / indices. Despite overflowing, I think two's complement still allows you to correctly compute multiplication and addition (i.e., abusing negative values as an unsigned interpretation) but as soon as your input automata have more than 2^16 states each, you hit a dead end.

Luckily, a rather simple solution is to use longs instead of ints. Especially in the large case, you can easily use them as keys for the Map without too much of a performance overhead. I refactored the class in b7dc4ca and took the freedom to remove the large version as well since it was an identical implementation with just different bookkeeping.

I also added a test-case that now passes. However, I would be greatful if you could double-check with your use-case as well. Since the class is pretty much self-contained, you should be able to just copy it as-is and call it instead of the current upstream version.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integer overflow when comparing two large automata #84

Integer overflow when comparing two large automata #84

tneele commented Jan 6, 2025

mtf90 commented Jan 6, 2025

tneele commented Jan 10, 2025

mtf90 commented Jan 10, 2025

Integer overflow when comparing two large automata #84

Integer overflow when comparing two large automata #84

Comments

tneele commented Jan 6, 2025

mtf90 commented Jan 6, 2025

tneele commented Jan 10, 2025

mtf90 commented Jan 10, 2025