External string searching library (x-search) written in C++ (C++20)
- libboost-program-options1.74-dev (only for examples)
- liblz4-dev
- libzstd-dev
- cmake
- g++ or clang
For installation instructions, see the corresponding Wiki entry: Installation
As a brief example on how to use x-search, we will create a small (very basic) grep-like executable:
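// my_grep.cpp
#include <xsearch/xsearch.h>
#include <iostream>

int main(int argc, char** argv) {
    // Expect a search pattern and a file path: my_grep <pattern> <file>
    if (argc < 3) {
        std::cerr << "usage: my_grep <pattern> <file>\n";
        return 1;
    }
    auto searcher = xs::extern_search<xs::lines>(argv[1], argv[2], false, 1);
    // Print each matching line.
    for (auto const& line : *searcher->getResult()) {
        std::cout << line << '\n';
    }
}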
Now, just build it and link against xsearch as described here
Done! We have created a grep-like command line search tool. Let's check if it can be as fast as GNU grep...
# GNU grep:
$ time grep Sherlock opensubtitles.en.txt > /tmp/grep.result
real 0m3.379s
user 0m2.525s
sys 0m0.843s
# Our implementation using x-search
$ time my_grep Sherlock opensubtitles.en.txt > /tmp/my_grep.result
real 0m1.154s
user 0m0.716s
sys 0m0.469s
x-search provides a simple one-function API call to search on external files.
#include <xsearch/xsearch.h>
// count number of matches:
auto res = xs::extern_search<xs::count>(pattern, file_path, meta_file_path, num_threads, max_num_readers);
Besides xs::count, xs::extern_search is specialized for the following template arguments:
- xs::count_lines: count lines containing a match
- xs::match_byte_offsets: a vector of the byte offsets of all matches
- xs::line_byte_offsets: a vector of the byte offsets of matching lines
- xs::line_indices: a vector of the line indices of matching lines
- xs::lines: a vector of lines (as std::string) containing the match
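To make the difference between these result types concrete, here is a minimal in-memory sketch that computes all of them with a naive single-threaded scan. It does not use x-search and is not how the library works internally. The names NaiveResults and naive_search are hypothetical, and the sketch assumes that line indices are 0-based, that a line's byte offset is the offset of its first byte, and that overlapping matches are counted separately; x-search may define these details differently.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical container for illustration only; not part of x-search.
struct NaiveResults {
    std::size_t count = 0;                        // analogous to xs::count
    std::size_t count_lines = 0;                  // analogous to xs::count_lines
    std::vector<std::size_t> match_byte_offsets;  // analogous to xs::match_byte_offsets
    std::vector<std::size_t> line_byte_offsets;   // analogous to xs::line_byte_offsets
    std::vector<std::size_t> line_indices;        // analogous to xs::line_indices (0-based here)
    std::vector<std::string> lines;               // analogous to xs::lines
};

// Naive scan over an in-memory buffer, one line at a time.
inline NaiveResults naive_search(std::string_view text, std::string_view pattern) {
    NaiveResults r;
    if (pattern.empty()) return r;
    std::size_t line_start = 0;
    std::size_t line_index = 0;
    while (line_start < text.size()) {
        std::size_t line_end = text.find('\n', line_start);
        if (line_end == std::string_view::npos) line_end = text.size();
        std::string_view line = text.substr(line_start, line_end - line_start);
        bool matched = false;
        // Record every occurrence of the pattern within this line.
        for (std::size_t pos = line.find(pattern); pos != std::string_view::npos;
             pos = line.find(pattern, pos + 1)) {
            r.match_byte_offsets.push_back(line_start + pos);
            ++r.count;
            matched = true;
        }
        // Record the line itself once, no matter how many matches it holds.
        if (matched) {
            ++r.count_lines;
            r.line_byte_offsets.push_back(line_start);
            r.line_indices.push_back(line_index);
            r.lines.emplace_back(line);
        }
        line_start = line_end + 1;
        ++line_index;
    }
    return r;
}
```

For the input "foo bar\nbaz\nbar bar\n" and the pattern "bar", this sketch reports three matches on two lines: the match-based results grow with every occurrence, while the line-based results grow once per matching line.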
After calling xs::extern_search, the returned shared_ptr of the Searcher instance can be...
- ... joined (res->join()): the main thread sleeps until the search process finishes
- ... used to access already created results using the iterator: for (auto r : *res->getResult()) { ... }
- ... ignored: the threads started for the search are joined on destruction.
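The three options above follow a common pattern for handles that own background threads. The toy model below sketches that pattern with standard-library threads only. ToySearcher and snapshot are hypothetical names, and the class is not x-search's Searcher; it only illustrates what "join explicitly, read partial results, or let the destructor join" can look like under the assumption of a single worker thread and a mutex-protected result vector.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for a searcher handle; NOT x-search's Searcher class.
class ToySearcher {
public:
    ToySearcher(std::vector<std::string> lines, std::string pattern)
        : worker_([this, lines = std::move(lines), pattern = std::move(pattern)] {
              for (auto const& line : lines) {
                  if (line.find(pattern) != std::string::npos) {
                      std::lock_guard<std::mutex> lock(mutex_);
                      results_.push_back(line);
                  }
              }
          }) {}

    // Option 3: if the caller never joins, the destructor does.
    ~ToySearcher() { join(); }

    // Option 1: block the calling thread until the search has finished.
    void join() {
        if (worker_.joinable()) worker_.join();
    }

    // Option 2: copy out whatever has been found so far.
    std::vector<std::string> snapshot() {
        std::lock_guard<std::mutex> lock(mutex_);
        return results_;
    }

private:
    // Declared before worker_ so they exist before the thread starts.
    std::mutex mutex_;
    std::vector<std::string> results_;
    std::thread worker_;
};
```

Calling snapshot() before join() may return only part of the final result, which mirrors the "already created results" wording above; after join() the snapshot is complete.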