Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate possible bug in im crate #3395

Closed
Tracked by #3400
bojanserafimov opened this issue Jan 20, 2023 · 20 comments
Closed
Tracked by #3400

Investigate possible bug in im crate #3395

bojanserafimov opened this issue Jan 20, 2023 · 20 comments
Assignees
Labels
c/storage Component: storage
Milestone

Comments

@bojanserafimov
Copy link
Contributor

If the bug is not in the im crate then it's in our layer map so it's good to check. If the bug is in the im crate, it's good to minimize and report/fix

More context:

Originally posted by @bojanserafimov in #2998 (comment)

@knizhnik
Copy link
Contributor

If you have working version with rpds, then it seems to be quite easy to investigate this bug: just dump located layer in both cases and find the first difference. Do you want me to look at this bug or your prefer to investigate it yourself?

@neondatabase-bot neondatabase-bot bot added this to the 2023/01 milestone Jan 23, 2023
@bojanserafimov
Copy link
Contributor Author

just dump located layer in both cases and find the first difference

Yes it could be easy. Or it might get more complicated with the heisenbug mentioned here #2998 (comment) It's very suspicious that get_prev mutates something.

Anyway, feel free to steal this task, just let me know (assign it to yourself). I'll work on layer map gc first

@knizhnik
Copy link
Contributor

This branch is now removed (once you have merged in in main).
Do you have version with im somewhere?

Also I wonder about get_prev() method - I failed to find it in TreeMap specification:
https://docs.rs/immutable-map/latest/immutable_map/index.html
Are you using this crate?

@bojanserafimov
Copy link
Contributor Author

The branch lm_im still exists https://github.com/neondatabase/neon/compare/main...lm_im?expand=1

But I created this before a big refactor of the code, so maybe it's easier to start from scratch. Just replace rpds with im, and insert_mut with insert, etc. Should be a 10 line diff.

Here's get_prev https://docs.rs/im/latest/im/struct.OrdMap.html#method.get_prev

@knizhnik
Copy link
Contributor

I wonder on which test IM is failed?
I tried it with pgbench and it works normally and shows results quite similar to rpds (both population database and select-only).

@bojanserafimov
Copy link
Contributor Author

I wonder on which test IM is failed?

See here #2998 (comment)

Fails on the checks in the benchmark. Also previously it failed on running pgbench init 10GB 3 times on the same timeline. Not sure if that one was deterministic.

@knizhnik
Copy link
Contributor

Which benchmark? bench_layer_map? I was able to run it... And once again I do not see that im is faster:
rpds:

captest_uniform_queries time:   [5.9445 ms 6.0608 ms 6.2280 ms]
captest_rel_dir_query   time:   [257.27 ns 260.78 ns 265.20 ns]
Finished layer map init in 990.251568ms
real_map/uniform_queries
                        time:   [4.8665 ms 4.9175 ms 4.9728 ms]
real_map/get_difficulty_map
                        time:   [13.663 ms 13.819 ms 14.002 ms]
sequential/uniform_queries
                        time:   [23.100 µs 23.256 µs 23.446 µs]

im:

captest_uniform_queries time:   [6.0459 ms 6.1214 ms 6.2084 ms]
                        change: [-2.0298% +1.0005% +3.5463%] (p = 0.51 > 0.05)
captest_rel_dir_query   time:   [324.09 ns 328.31 ns 334.09 ns]
                        change: [+16.280% +18.682% +21.332%] (p = 0.00 < 0.05)
Finished layer map init in 762.662596ms
real_map/uniform_queries
                        time:   [6.2236 ms 6.3163 ms 6.4160 ms]
                        change: [+26.030% +28.446% +30.901%] (p = 0.00 < 0.05)
real_map/get_difficulty_map
                        time:   [21.821 ms 21.973 ms 22.140 ms]
                        change: [+56.668% +59.010% +61.147%] (p = 0.00 < 0.05)
 sequential/uniform_queries
                        time:   [18.917 µs 19.094 µs 19.305 µs]
                        change: [-19.097% -18.097% -16.849%] (p = 0.00 < 0.05)

@bojanserafimov
Copy link
Contributor Author

Are you using get_prev?

@bojanserafimov
Copy link
Contributor Author

real_map/get_difficulty_map has an internal check that failed.

@knizhnik
Copy link
Contributor

No.
But I replaced rev().next() with next_back()

@bojanserafimov
Copy link
Contributor Author

Read #2998 (comment)

At first they passed, but the code was slower. Then I changed .range(..=key).rev().next() to .get_prev(&key), and the code got faster, but it failed the checks. The two different get_difficulty implementations returned different results, even though they both use the same map.

@bojanserafimov
Copy link
Contributor Author

I think get_prev runs actually different code. It doesn't make the same allocations that range makes.

@bojanserafimov
Copy link
Contributor Author

If you share a link to your branch I can help reproduce it.

@bojanserafimov
Copy link
Contributor Author

Nevermind, here's a branch you can use: https://github.com/neondatabase/neon/compare/layer_map_im?expand=1

Just run the benchmarks, and the get_difficulty_map one will fail

@bojanserafimov
Copy link
Contributor Author

Updated the branch with a tighter assertion. Found that get_prev(&key) sometimes retuns None when range(..=key).rev.next() returns Some.

@bojanserafimov
Copy link
Contributor Author

bojanserafimov commented Jan 24, 2023

Found a shorter sequence of updates that reveals the bug (about 100 updates). See https://github.com/bojanserafimov/im_crate_bug_report

Also, I previously thought thatget_prev mutates something, but that's not the case. It was just causing an early return because of a ?. So there's no unexpected mutation, just incorrect return value.

@bojanserafimov
Copy link
Contributor Author

Reported the bug bodil/im-rs#205

Also given this other issue bodil/im-rs#202 and the lack of response to it, I think it's better to stay with rpds

I'm currently running the same fuzz test on rpds and have found no issues so far

@knizhnik
Copy link
Contributor

I tried to step through get_prev() code in debugger.
May be I missed something, but looks like the following code is not correct:

46	        match A::search_key(&self.keys, key) {
247	            Ok(index) => Some(&self.keys[index]),
248	            Err(index) => match self.children[index] {
249	                None if index == 0 => None,
250	                None => self.keys.get(index - 1).map(|_| &self.keys[index - 1]),
251	                Some(ref node) => node.lookup_prev(key),
252	            },
253	        }

If key is not found in the children, then no attempt is made to search it in previous children and as a result None is propagated upstairs.
In any case I think it is right decision to stay with rpds.
Original question was whether it is bug in IM or in our layer_map implementation.
Now we can say for sure that it is bug in IM.

@knizhnik
Copy link
Contributor

with the following IM patch layer bench normally completed:

diff --git a/src/nodes/btree.rs b/src/nodes/btree.rs
index 84f63fa..2e6452e 100644
--- a/src/nodes/btree.rs
+++ b/src/nodes/btree.rs
@@ -235,6 +235,24 @@ impl<A: BTreeValue> Node<A> {
         }
     }
 
+    fn child<'a, BK>(&'a self, key: &BK, index: usize) -> Option<&A>
+    where
+        BK: Ord + ?Sized,
+        A::Key: Borrow<BK>,
+    {
+        match self.children[index] {
+            None if index == 0 => None,
+            None => self.keys.get(index - 1).map(|_| &self.keys[index - 1]),
+            Some(ref node) => node.lookup_prev(key).or_else(|| {
+                if index == 0 {
+                    None
+                } else {
+                    self.child(key, index - 1)
+                }
+            }),
+        }
+    }
+
     pub(crate) fn lookup_prev<'a, BK>(&'a self, key: &BK) -> Option<&A>
     where
         BK: Ord + ?Sized,
@@ -245,11 +263,7 @@ impl<A: BTreeValue> Node<A> {
         }
         match A::search_key(&self.keys, key) {
             Ok(index) => Some(&self.keys[index]),
-            Err(index) => match self.children[index] {
-                None if index == 0 => None,
-                None => self.keys.get(index - 1).map(|_| &self.keys[index - 1]),
-                Some(ref node) => node.lookup_prev(key),
-            },
+            Err(index) => self.child(key, index),
         }
     }

Performance with patched IM and get_prev is the following:

captest_uniform_queries time:   [3.4951 ms 3.5300 ms 3.5703 ms]
captest_rel_dir_query   time:   [188.40 ns 190.01 ns 191.80 ns]
Finished layer map init in 764.03677ms
real_map/uniform_queries
                        time:   [3.7333 ms 3.8060 ms 3.8843 ms]
real_map/get_difficulty_map
                        time:   [19.668 ms 20.244 ms 20.842 ms]
Finished layer map init in 625.197474ms
sequential/uniform_queries
                        time:   [10.334 µs 10.756 µs 11.213 µs]

Seems to be not "times" faster I have replaced next_back() only in one place - in query(&self, key: i128).
It seems to me that you wrote that get_prev() is 70% faster. On which workload?

@bojanserafimov
Copy link
Contributor Author

#2998 (comment)

It seems to me that you wrote that get_prev() is 70% faster. On which workload?

30%, not 70% (see my comment), on my pc, on most of the microbenchmarks. Mostly vec allocation is what disappeared from the flame graph. rpds also does allocations, which we can also probably patch out. The next bottleneck is Arc::clone, and internal overhead for using the Sync version of the immutable data structure. That's harder to avoid without a larger refactor. Not worth imo, what we have is fast enough. Memory use will be a larger concern as we scale the layer map with more granularity of layer internals.

If we want more layer map perf, a better place to start is to change the search api so we're not looking for the latest image on every request. We should only be looking for the latest image once per reconstruction call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage Component: storage
Projects
None yet
Development

No branches or pull requests

2 participants