Sometimes it is useful to create a fuzzy dictionary or fuzzy database that is unique for a document, so we need to change the database on demand.
Fuzzy Dictionaries are a powerful but mostly unknown feature of Kofax Transformation. A Format Locator normally uses a regex or a pattern, but if it is given a Dictionary instead of a regex then it becomes a Fuzzy Dictionary Locator, which has the advantage over a Database Locator that it finds ALL references to a word, whereas a Database Locator only finds ONE reference (and that may not be the first one in the document)
In Kofax Transformation Dictionary and Database files are static. There are two ways that they are designed to be changable
- Update the import text file. Kofax Transformation Modules Recognition Server, and KTA Extraction Activity and RPA's Document Transformation Server DTS will all check upon opening a new batch whether the database import files have changed - they will then be imported before the batch is processed.
- Search and Match Server has a setting to regularly update a remote fuzzy from it's SQL or text file source.
This guide will show you how the safely make a dictionary or database dynamic. It will describe the steps for a database, but the dictionary uses exactly the same approach. You also need to consider that documents could be processed in parallel and so you need to ensure that the database for one document cannot leak to another document - the script below is safe for parallel processing because each database has a GUID in its name, and the project file is NEVER saved.
-
Create a fuzzy database with sample entries or even empty entries.
-
Add a Database Locator that uses this fuzzy database. Test that it works.
-
Add a Script Locator before the Databaselocator called SL_CreateTempDatabase and one after called SL_DeleteTempDatabase
The script does the following- Create a GUID for this document (This ensures no conflict between this document and another document in parallel processing)
- Create a databasefile right next to the original import file.
- Create a new temporary database.
- Imports the temporary database.
- Alters the Database Locator to point to the new database. This is a potentially dangerous step, because the project is altering itself. Make sure that you do not save your project in an invalid state. You can corrupt your project file if you make a mistake here.
-
After the Database Locator has finished running, the locator is pointed back to the original database and the temporary database files are deleted.
MAKE SURE THAT YOUR DOCUMENTS ARE CLASSIFIED (hotkey F5) BEFORE TESTING!! pXDoc.ExtractionClass is needed to find the Database Locator Definition.
Option Explicit
' Class script: NewClass1
Private Sub SL_CreateTempDatabase_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pLocator As CASCADELib.CscXDocField)
Dim TempName As String
TempName=Database_CreateTemp("Items","Windows 7;10;190.00") 'copy the format of a database, but give it new content
DatabaseLocator_SetDatabase(pXDoc.ExtractionClass, "DL_Items",TempName) 'Point the database locator at the new database
End Sub
Private Sub SL_DeleteTempDatabase_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pLocator As CASCADELib.CscXDocField)
DatabaseLocator_SetDatabase(pXDoc.ExtractionClass,"DL_Items", "Items") 'point the database locator back to the default database
Database_DeleteTemp("Items_Temp")
End Sub
Private Function Database_CreateTemp(DatabaseName As String, Values As String) As String
'This function copies a database with new values
Dim Database As CscDatabase, TempDatabase As New CscDatabase, F As Long, TypeLib As Object, GUID As String
Set TypeLib = CreateObject("Scriptlet.TypeLib") ' this library can create GUIDs
GUID=Mid(TypeLib.GUID,2,36) ' remove { and }
If Project.Databases.ItemExists(DatabaseName+ "_TEMP") Then Database_DeleteTemp(DatabaseName) 'clean up old temp database if it's there
Set Database=Project.Databases.ItemByName(DatabaseName)
If Not Database.DatabaseType=CscDatabaseType.CscFUZZYLocalType Then Err.Raise(487,,"Database '" & Database.Name & "' must be a fuzzy local database!")
TempDatabase.Name= Database.Name & "_TEMP"
TempDatabase.DatabaseType=CscDatabaseType.CscFUZZYLocalType
TempDatabase.ImportFilename = Replace(Database.ImportFilename, ".txt", "_" & GUID & ".txt")
TempDatabase.DelimiterChars=Database.DelimiterChars
Open TempDatabase.ImportFilename For Output As #1
Print #1, vbUTF8BOM; 'make the database file UTF-8 compatible
'Write the captions to the new database if they are there
If Database.FirstLineIsCaption Then
For F = 0 To Database.FieldCount-1
Print #1, Database.FieldName(F); 'The final ";" prevents a newline being printed
If F<Database.FieldCount-1 Then Print #1, Database.DelimiterChars ; 'print a delimiter between the columns
Next
Print #1, "" ' New line
Print #1, Values
Close #1
End If
TempDatabase.FirstLineIsCaption=Database.FirstLineIsCaption
TempDatabase.DelimiterChars=Database.DelimiterChars
TempDatabase.AutoUpdate=False
TempDatabase.DetectFieldCount(TempDatabase.DelimiterChars)
Project.Databases.Add(TempDatabase)
TempDatabase.ImportDatabase(True)
Database_CreateTemp=TempDatabase.Name 'This generates the database file (just a copy of the import file) and the fuzzy index file (.crp2)
End Function
Sub DatabaseLocator_SetDatabase(ClassName As String, LocatorName As String, DatabaseName As String)
'Add reference to Kofax Cascade DatabaseLocator
Dim DLMethod As CscDatabaseLocator
If Not Project.Databases.ItemExists(DatabaseName) Then Err.Raise(456,,"Database '" & DatabaseName & "' doesn't exist!")
Set DLMethod=Project.ClassByName(ClassName).Locators.ItemByName(LocatorName).LocatorMethod
DLMethod.DatabaseName=DatabaseName 'setting to the new database
End Sub
Private Sub Database_DeleteTemp(DatabaseName As String)
Dim Database As CscDatabase
Set Database=Project.Databases.ItemByName(DatabaseName)
Kill Database.ImportFilename 'Delete the import file
Kill Database.DatabasePath 'delete the fuzzy index file .crp2
Kill Database.TextFilename 'delete the copy of the text file
Project.Databases.RemoveByName(Database.Name)
End Sub
Function FormatLocator_ReplaceDict(ByVal pXDoc As CASCADELib.CscXDocument,LocatorName As String,DictFileName As String) As String
'Kofax Cascade FormatLocator
Dim FLMethod As CscRegExpLocator, Dict As CscDictionary
Set FLMethod=Project.ClassByName(pXDoc.ExtractionClass).Locators.ItemByName(LocatorName).LocatorMethod
FLMethod.RegularExpressions(0).RegularExpression="§"& DictFileName & "§"
End Function