Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Max order comparator #2

Open
wants to merge 29 commits into
base: develop
Choose a base branch
from
Open

Conversation

santialvarezcolombo
Copy link

The new logging and caching module for antidote, will use eleveldb as one of its possible backends. In order to use eleveldb, we needed to code a new comparator, so keys get sorted as we want.

Keys are composed as follows:

{antidote_key, max_value_in_vc, vc_hash, op/snap, vc}

  • antidote_key: Same key as in Antidote.
  • max_value_in_vc: The max value for all DCs in the VC.
  • vc_hash: A hash of the VC. This is done in order to provide a more performant sorting algorithm, since we don't need to compare all the VC, to find that keys are different. If the hash matches, we also compare the VC, just to make sure it's not a collision.
  • op/snap: an atom indicating if the stored value corresponds to an operation or a snapshot.
  • vc: the vc of when the op/snap ocurred.

Sorting of keys

Keys get sorted first by the Antidote key.
Then we look at the MAX value of it's VC, sorting first the biggest one. We reverse the "natural" order, so we can have the most recent operations first.
If the max value for two keys matches, we sort taking into account the hash value.
For matching hashes, we check the rest of the key, sorting accordingly.

As Erlang serialises objects sent to eleveldb, according http://erlang.org/doc/apps/erts/erl_ext_dist.html, so what we do is check those code to compare keys.

* develop: (69 commits)
  add 2.0.27 info
  revert leveldb version to blank before merge to develop branch
  address code review issues.
  revert worker thread default to 71 due to fact some customers have static advanced.conf files that do NOT set thread count.  so their production environment would drop from 71 threads to 7 ... oops.
  adjust code to use 7 worker threads in non-production modes and/or developer mode.  71 threads will still be assigned by cuttlefish defaults for production.
  update test for default compression format to lz4.  2.0 branch not backported to develop
  remove double delete at line 148 by using RefDecNoDelete() instead of RefDec().  Use accessors instead direct variable access to enable memory fencing
  move m_CloseRequest to protected, forcing code to use accessors ... which now provides memory fencing
  address Issue 212, cleanup unused writebatch upon error.  Switch access to m_CloseRequested to functions instead of direct access.  Allows easy memory fencing.
  branch specific change
  disable time conversion in debug routine currently not used.
  update release info for 2.0.25
  make lz4 the default compression instead of snappy
  Adjust the disabling of expiry_minutes to use unlimited keyword.
  revert to expiry module per vnode, allows multibackend differences
  update expiry cuttlefish params to match notes from cv
  create compression multi_backend test
  remove _Default case in compression translation block
  Adjusted open_options type definition to include proper type for compression tag.
  update -type and -spec for compression
  ...
* develop: (69 commits)
  add 2.0.27 info
  revert leveldb version to blank before merge to develop branch
  address code review issues.
  revert worker thread default to 71 due to fact some customers have static advanced.conf files that do NOT set thread count.  so their production environment would drop from 71 threads to 7 ... oops.
  adjust code to use 7 worker threads in non-production modes and/or developer mode.  71 threads will still be assigned by cuttlefish defaults for production.
  update test for default compression format to lz4.  2.0 branch not backported to develop
  remove double delete at line 148 by using RefDecNoDelete() instead of RefDec().  Use accessors instead direct variable access to enable memory fencing
  move m_CloseRequest to protected, forcing code to use accessors ... which now provides memory fencing
  address Issue 212, cleanup unused writebatch upon error.  Switch access to m_CloseRequested to functions instead of direct access.  Allows easy memory fencing.
  branch specific change
  disable time conversion in debug routine currently not used.
  update release info for 2.0.25
  make lz4 the default compression instead of snappy
  Adjust the disabling of expiry_minutes to use unlimited keyword.
  revert to expiry module per vnode, allows multibackend differences
  update expiry cuttlefish params to match notes from cv
  create compression multi_backend test
  remove _Default case in compression translation block
  Adjusted open_options type definition to include proper type for compression tag.
  update -type and -spec for compression
  ...
* develop:
  Make multi_backend compression settings commented
  update BASHO_RELEASES
  change default compression expectation in basic_schema_test
  add description to compression algorithms
  snappy as default if nothing in riak.conf, default riak.conf sets lz4
@peterzeller
Copy link

I wonder if you still need the custom comparator, if you don't have to compare vector clocks any more.

I think you could now use Russells approach (here at 33:40 he explains it in a talk) and just format the key such that the native ordering works. He used 0-bytes in the key to separate the different parts. For comparing numbers you would probably have to pad them with zeros to make them comparable with the native ordering.

I don't know if that would be faster, but I think it would be easier to maintain if we later want to update eleveldb or if we want to add big-sets to Antidote.

Another comment: Wouldn't it make sense to put the op/snap part more to the front, so that snapshots and operations are stored separately?

@santialvarezcolombo
Copy link
Author

We need the comparator because it makes the lookup for partial keys work. What I mean by partial keys is this. If you simply reverse the order, that fold doesn't match any starting key, and therefore, folds all the DB.

I was planning to first get this version working in Antidote, and then use what Russell commented of getting rid of the C++ code, which I would really like too.

I can try out the op/snap thing once this code is working on Antidote and check if there is any difference.

@marc-shapiro
Copy link

Santiago,

Repeating the suggestion by Russell: instead of coding a new comparator (which implies complexity and overhead), couldn't you just encode the LevelDB key as you mention below, so that the default comparison operator returns the right thing (or close enough)?

                        Marc

Le 27 oct. 2016 à 01:38, Santiago Alvarez Colombo [email protected] a écrit :

The new logging and caching module for antidote, will use eleveldb as one of its possible backends. In order to use eleveldb, we needed to code a new comparator, so keys get sorted as we want.

Keys are composed as follows:

{antidote_key, max_value_in_vc, vc_hash, op/snap, vc}

antidote_key: Same key as in Antidote.
max_value_in_vc: The max value for all DCs in the VC.
vc_hash: A hash of the VC. This is done in order to provide a more performant sorting algorithm, since we don't need to compare all the VC, to find that keys are different. If the hash matches, we also compare the VC, just to make sure it's not a collision.
op/snap: an atom indicating if the stored value corresponds to an operation or a snapshot.
vc: the vc of when the op/snap ocurred.
Sorting of keys

Keys get sorted first by the Antidote key.
Then we look at the MAX value of it's VC, sorting first the biggest one. We reverse the "natural" order, so we can have the most recent operations first.
If the max value for two keys matches, we sort taking into account the hash value.
For matching hashes, we check the rest of the key, sorting accordingly.

As Erlang serialises objects sent to eleveldb, according http://erlang.org/doc/apps/erts/erl_ext_dist.html http://erlang.org/doc/apps/erts/erl_ext_dist.html, so what we do is check those code to compare keys.

You can view, comment on, or merge this pull request online at:

#2 #2
Commit Summary

first changes to add antidote comparator
fix elseif order
added erlang external format checks + comparison of antidote keys
fix bug in Slice size while copying
parse list size method
key compare finished. started with VC comparison
first version for VC comparison
revert change in checkList method
added missing tuple parsing
revrse sorting order. most recent VCs first
added code to provide folding
refactored VCs comparator to treat keys with different amount of VCs
refactored VCs comparator to treat keys with different amount of VCs
fixed bug while parsing ints
VCs keys are now atoms instead of ints
Parsing for empty lists (empty snapshots)
fix bug while comparing keys with != number of DCs
first implementation to use big numbers
fixed assert
fixed power
New comparator method, asuming VCs is a list sorted by DCs
Merge branch 'develop' into antidote_comparator
Merge branch 'develop' into antidote_comparator
Merge branch 'antidote_comparator' of https://github.com/SyncFree/eleveldb into antidote_comparator
Fixed comparison of keys with same VC to deferentiate op vs snaps.
Merge branch 'develop' into antidote_comparator
New comparator ignoring VCs name and only taking into consideration
New comparator function using only the MAX value in the VC
File Changes

A c_src/antidote.cc https://github.com/SyncFree/eleveldb/pull/2/files#diff-0 (218)
A c_src/antidote.h https://github.com/SyncFree/eleveldb/pull/2/files#diff-1 (11)
M c_src/eleveldb.cc https://github.com/SyncFree/eleveldb/pull/2/files#diff-2 (11)
M src/eleveldb.erl https://github.com/SyncFree/eleveldb/pull/2/files#diff-3 (1)
Patch Links:

https://github.com/SyncFree/eleveldb/pull/2.patch https://github.com/SyncFree/eleveldb/pull/2.patch
https://github.com/SyncFree/eleveldb/pull/2.diff https://github.com/SyncFree/eleveldb/pull/2.diff

You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub #2, or mute the thread https://github.com/notifications/unsubscribe-auth/AGWtn65zkaa-OqF-wg6RIbdm1vuVNiezks5q3-RggaJpZM4KhynI.

@santialvarezcolombo
Copy link
Author

@marc-shapiro I'll talk to Russell again so we can get rid of this code faster...

…ator

* upstream/develop:
  revert previous eleveldb.cc check-in
  add comments for 2.0.33 tag
  remove explicity dependency to mv-bucket-expiry prior to merge
  busted:  failed to compile and unit test before check-in.
  tie this branch to same name in leveldb ... for testing
  Use CreateExpiryModule() instead of new so EE and OS version of leveldb operate correctly
  Port submit_to_thread_queue() from riak_ts-develop to develop
  more iterator hardening against two thread use case.  moved from 2.0 branch.
  clean LEVELDB_VSN in build_deps.sh and update BASHO_RELEASES
  update to use combined leveldb branches:  mv-no-md-expiry and mv-tuning8
  create special case RefDec to deal with AAE multi-process issue.
  remove stale iterator debug logging code
  Make LevelIteratorWrap an embedded object instead of dynamic ref counted object.
  Switch the few remaining naked pointer uses to reference counted pointers.  This fills some remaining race condition holes (of which one recently seen).
  update for tag 2.0.30
  Don't use deprecated erlang:now/0
  unsigned long for memory_sz so it compiles on scaleway / arm
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants