Skip to content

Latest commit

 

History

History
executable file
·
66 lines (47 loc) · 2.64 KB

zewtopia_visitor_counting.md

File metadata and controls

executable file
·
66 lines (47 loc) · 2.64 KB

This example shows how to use Count Min Sketch to keep track of many items that need to be counted

As is common with probablistic data structures - there is the potential for accuracy to be sacrificed in exchange for speed and memory conservation

Imagine we have a need to keep track of everytime someone visits any of our exhibits in Zewtopia - we have hundreds of exhibits and we get millions of visitors a year.

We don't really care if we are off by a few percentage points - do we?

  • Define and Create the Count Min Sketch Object in Redis: (the smaller the values - the more accurate the count)
CMS.INITBYPROB {cms}:zew:visitor:counts 0.0015 0.25
  • We will use the Count Min Sketch to record visitors as they stop at various exhibits like this:
CMS.INCRBY {cms}:zew:visitor:counts <some_entry_id> <some-val>
  • The following script will populate many tens of thousands of unique happenings. All participating slots should have one of the two total values that are stored in the two deterministic Strings. ( The script also stores two of the counts in redis strings so that we can compare them and their deterministic scores to the scores recorded by our Count Min Sketch):
EVAL "for index = 80,2000 do local vcount = math.random(30,80) redis.call('CMS.INCRBY','{cms}:zew:visitor:counts',ARGV[1], vcount, ARGV[2], index%vcount ) redis.call('INCRBY','{cms}'..ARGV[1],vcount) redis.call('INCRBY','{cms}'..ARGV[2],index%vcount) for inner = 1,708 do redis.call('CMS.INCRBY','{cms}:zew:visitor:counts',ARGV[1]..inner, vcount, ARGV[2]..inner, index%vcount ) end end" 1 {cms} 'zebra_pen' 'lion_pen' 
  • Query the Count Min Sketch: (some rare results may be inflated)
CMS.QUERY {cms}:zew:visitor:counts 'lion_pen' 'zebra_pen' 'lion_pen22' 'zebra_pen100'
1) (integer) 105872
2) (integer) 105833
3) (integer) 52936
4) (integer) 105833
  • Check results against the deterministic strings:
mget {cms}lion_pen {cms}zebra_pen
1) "52936"
2) "105833"

As shown above it can be seen that 100% accuracy is not guaranteed. You can get close to 100% if you size the data structure accordingly.

You could try the above exercise again - this time using these setting for the Count Min Sketch object:

CMS.INITBYPROB {cms}:zew:visitor:counts 0.001 0.01

It is worth noting the difference in memory needed as the settings change

You can use this command to find out what each version requires:

  • the original settings use this much ram:
memory usage {cms}:zew:visitor:counts
(integer) 21408
  • the more accurate version with the smaller rate of error uses this much:
memory usage {cms}:zew:visitor:counts
(integer) 112064