-
Notifications
You must be signed in to change notification settings - Fork 0
bgfeldm/Csv2Lucene
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Csv2Lucene Note: code is largely a proof of concept. Description: Csv2Lucene's goal is to bulk index a large amount of huge CSV files quickly. The focus is on the record level and not the file level when creating threads. Multi-Threading on record lines instead of files has advantages when it comes to speed as well as recovery. Requirements: - Quickly index a large amount of huge CSV files. - Thread on record lines not on files. Advantages of threading on the record level instead of by file: - Working on a single huge database dump file. - Faster to index until the very end, keeping all threads busy until the last few lines - Simpler recovery from abrupt application halts, since we are reading a smaller set of files at a time. Note: More than one file can be read at a time, when reading the tail end of one file the beginning of the next file is read, keeping a continuous flow of record lines until the last record of the last file is read.
About
Fast CSV file indexer for Lucene Index
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published