-
Notifications
You must be signed in to change notification settings - Fork 4
Project1
This page for documentation Project1: Signal
There's three versions of my implementation.
Of course, the multi thread
implementation has the best performance. but it return query mismatch randomly on server (tiny and small set work well). And mismatched query is change every time with same code (if remove race-condition;just single thread). In the end, I switched to a lower performance implementation(master
).
First of all, main idea is aho-corasick algorithm. But my implementation is little bit different. Check from below.
This for find matching patterns in various patterns from query. There is KMP Algorithm for find matching single pattern from query. But it takes O(m + zn)
in set of patterns (m
as total length of patterns, z
as count of patterns and n
as size of query). Aho-corasick can solve this problem in O(m + n + k)
(k
as number of patterns in query).
There is three major parts in Aho-corasick Algorithm.
- Construct Trie for patterns.
- Trace trie nodes and create node if there are no matches.
- Make
Failure Link
andOutput Link
-
Failure Link
for prevent looking back from the beginning. -
Output Link
ensure that works correctly even when sub string exists.
-
- Find matching
- Follow the trie nodes, find the pattern if there is an output link.
Redesign for performance.
- Change Trie structure to 2d array(
maximum state size
×char size(a-z)
) - Define
Final
,Normal
andInit
states as automata. - Design efficient add / deletion algorithm.
- Lazy update
- O(n) addition
- O(n) (or similar) deletion
Process
- Make 2d array for Aho-corasick accept automata. (Table::update_table)
- Define
Init State
as size of patterns. - From
Init State
, trace char of pattern and make node. - It is
Final State
ifState
<Init State
elseNormal State
.
- Define
- Lazy update when add / deletion.
- Insert pattern to buffer when add/deletion occurs.
- Update table collectively, when match requested.
- Resize 2d array if pattern size overflow and clean up.
- Rebuild table.
- Find match
- Trace from
Init State
to until meet-1
(this is out state) - Found matched pattern if
State
<Init State
- Append to result list if no exists in results.
- Trace from
Example
Example Input as
3 // init pattern count
abc
ab
bbc
Q abc
Q bbca
A bc
Q abbc
After init patterns, we can make 2d array as below
a | b | c | |
---|---|---|---|
0 | - | - | - |
1 | - | - | 0 |
2 | - | - | - |
3 | 4 | 6 | - |
4 | - | 1 | - |
5 | - | - | - |
6 | - | 7 | - |
7 | - | - | 2 |
follow update_table()
And abc
requested then start at 3(Init State
), and follow node with 'a'. [3][a]
is 4
, so next state is 4
. Find next char with next state, [4][b]
is 1
. 1
is Final State
(< Init State
). So, there is matched pattern marked as 1
(ab
). And next [1][c]
is 0
and it is Final State
too. Finally, find two accept pattern(0
, 1
). And It is ab
, abc
.
you can comments and make merge requests to fix my codes. 😄