refactor pipelines to include contentSources #83

Merged: 50 commits, merged on Aug 19, 2024.
Diff view: changes from 2 commits.
Commits (50)
f753b9b refactor pipeliens (faberf, Jul 8, 2024)
d237eda adding textquery (faberf, Jul 10, 2024)
3a5dec1 retrieval task instructions (faberf, Jul 10, 2024)
d13422c Initial draft of ContentMergingTransformer (v0idness, Jul 11, 2024)
9d2c414 Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (v0idness, Jul 11, 2024)
f19fe95 Extended ContentMergingTransformer to include a template and fill mat… (v0idness, Jul 15, 2024)
6fe6bbb added unnormalized fusion and fixed score bug (faberf, Jul 17, 2024)
7b4540b Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (faberf, Jul 17, 2024)
62a3f7e Tested ContentMergingTransformer on sample pipeline; pipeline config … (v0idness, Jul 17, 2024)
1e9837a Merge branches 'feature/contentpipelines' and 'feature/contentpipelin… (v0idness, Jul 17, 2024)
1c83e7e bidirectional content author maps (faberf, Jul 17, 2024)
a08ab40 allow image captioning to use content as prompt (faberf, Jul 17, 2024)
01d0ee3 Updated ContentMergingTransformer to transform content to content rat… (v0idness, Jul 18, 2024)
f64358c Refactored parameters to reduce redundancy: content to include taken … (v0idness, Jul 18, 2024)
54e244a Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (v0idness, Jul 18, 2024)
93ec49a Refactor regex to not be applied for each retrievable (v0idness, Jul 24, 2024)
7f4f1b3 bug fixes for content pipelines (faberf, Jul 29, 2024)
e186494 Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (faberf, Jul 29, 2024)
2d460aa Merge remote-tracking branch 'origin/dev' into feature/contentpipelines (faberf, Jul 31, 2024)
e34f891 simplified passthrough, renamed templatetext (faberf, Jul 31, 2024)
ffe3805 WIP: committing current changes (faberf, Aug 8, 2024)
bfff8b7 Merge remote-tracking branch 'origin/dev' into feature/contentpipelines (faberf, Aug 8, 2024)
7b132bd finished merge with dev (faberf, Aug 8, 2024)
f2a10ec reimplemented batched extraction (faberf, Aug 8, 2024)
6b36992 WIP on feature/contentpipelines (net-cscience-raphael, Aug 12, 2024)
d93d940 improves log message (net-cscience-raphael, Aug 14, 2024)
23522da Define default string and regex as constant (v0idness, Aug 14, 2024)
e147e2a Deleted test schema and pipeline files (v0idness, Aug 14, 2024)
001d47a Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (net-cscience-raphael, Aug 14, 2024)
af39709 small refactoring (faberf, Aug 14, 2024)
c650d1a Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (faberf, Aug 14, 2024)
9c6238e Adds filter threshold and topk (net-cscience-raphael, Aug 14, 2024)
544910a Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (net-cscience-raphael, Aug 14, 2024)
446b6f1 Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (faberf, Aug 14, 2024)
2191431 Merge branch 'feature/contentpipelines' of github.com:vitrivr/vitrivr… (faberf, Aug 14, 2024)
e2815f7 adds missing field (net-cscience-raphael, Aug 15, 2024)
442a38f adds escaping entity name for pg (net-cscience-raphael, Aug 15, 2024)
0854d69 Adds retrievableId for persisting (net-cscience-raphael, Aug 15, 2024)
7ade79b removes postgres escapeing (net-cscience-raphael, Aug 15, 2024)
1131708 Escapes and lowercases field names (net-cscience-raphael, Aug 15, 2024)
8286c85 Merge remote-tracking branch 'origin/dev' into feature/contentpipelines (faberf, Aug 16, 2024)
29f800b escapes schema in PGVectorConnection (net-cscience-raphael, Aug 16, 2024)
7127034 changes credentials for test db (net-cscience-raphael, Aug 16, 2024)
7b79384 maintains lc pg naming convention (net-cscience-raphael, Aug 16, 2024)
29b105e bugfix (net-cscience-raphael, Aug 19, 2024)
bf8b78f bugfix (net-cscience-raphael, Aug 19, 2024)
8516eeb Merge branch 'dev' into feature/contentpipelines (net-cscience-raphael, Aug 19, 2024)
2cc9cbe Merge branch 'dev' into feature/contentpipelines (net-cscience-raphael, Aug 19, 2024)
5fecf89 debug CI (net-cscience-raphael, Aug 19, 2024)
3f0076f adds change due immutable retrievable (net-cscience-raphael, Aug 19, 2024)
@@ -0,0 +1,54 @@
package org.vitrivr.engine.index.transform

import io.github.oshai.kotlinlogging.KotlinLogging
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow
import org.vitrivr.engine.core.context.Context
import org.vitrivr.engine.core.context.IndexContext
import org.vitrivr.engine.core.model.content.factory.ContentFactory
import org.vitrivr.engine.core.model.retrievable.Retrievable
import org.vitrivr.engine.core.model.retrievable.attributes.ContentAuthorAttribute
import org.vitrivr.engine.core.operators.Operator
import org.vitrivr.engine.core.operators.general.Transformer
import org.vitrivr.engine.core.operators.general.TransformerFactory

private val logger = KotlinLogging.logger {}

class ContentMergingTransformer : TransformerFactory {
    override fun newTransformer(name: String, input: Operator<out Retrievable>, context: Context): Transformer {
        val contentFields = context[name, "contentFields"]?.split(",") ?: throw IllegalArgumentException("The content merging transformer requires a list of content fields.")
        return Instance(
            input = input,
            contentFactory = (context as IndexContext).contentFactory,
            contentFields = contentFields,
            name = name
        )
    }

    private class Instance(
        override val input: Operator<out Retrievable>,
        val contentFactory: ContentFactory,
        val contentFields: List<String>,
        val name: String
    ) : Transformer {
        override fun toFlow(scope: CoroutineScope): Flow<Retrievable> = flow {
            input.toFlow(scope).collect { retrievable: Retrievable ->
                val mergedContent = StringBuilder()
                contentFields.forEach { fieldName ->
                    retrievable.getContent(fieldName)?.let { content ->
                        mergedContent.append(content)
                        mergedContent.append("\n")
                    }
                }
                if (mergedContent.isNotEmpty()) {
                    val content = contentFactory.newTextContent(mergedContent.toString().trim())
                    retrievable.addContent(content)
                    retrievable.addAttribute(ContentAuthorAttribute(content.id, name))
                    logger.debug { "Contents from fields $contentFields of retrievable ${retrievable.id} have been merged into a single content element." }
                }
                emit(retrievable)
            }
        }
    }
}
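
The core of `Instance.toFlow` in the transformer above is plain string assembly: for each configured field that is present, append its text and a newline, then trim and emit one merged text content. The sketch below isolates that step from the vitrivr-engine types so it can run on its own. The helpers `parseContentFields` and `mergeFieldTexts`, and the `Map<String, String>` standing in for a retrievable's per-field text, are hypothetical. The parser also trims each entry of the comma-separated `contentFields` value and drops empty entries, which the bare `split(",")` in the diff does not do.

```kotlin
/**
 * Hypothetical helper: parses a comma-separated `contentFields` value such as
 * "caption, ocr,asr". Unlike a bare split(","), it trims whitespace and drops
 * empty entries. Throws IllegalArgumentException when the value is missing.
 */
fun parseContentFields(raw: String?): List<String> {
    val value = requireNotNull(raw) { "The content merging transformer requires a list of content fields." }
    return value.split(",").map { it.trim() }.filter { it.isNotEmpty() }
}

/**
 * Hypothetical helper: joins the texts of the configured fields in configured
 * order, one per line, then trims the result. Fields absent from [fieldTexts]
 * are skipped; returns null when none of the configured fields is present.
 */
fun mergeFieldTexts(fieldTexts: Map<String, String>, contentFields: List<String>): String? {
    val merged = StringBuilder()
    for (fieldName in contentFields) {
        val text = fieldTexts[fieldName] ?: continue // skip missing fields
        merged.append(text).append('\n')
    }
    return if (merged.isNotEmpty()) merged.toString().trim() else null
}
```

For example, with `contentFields = "ocr, caption"` and field texts `ocr = "STOP"`, `caption = "a cat"`, the merged text is `"STOP\na cat"`: output order follows the configuration, not the order in which fields were produced. In this sketch a field that is present but holds an empty string still yields a (possibly empty) result rather than null, because the newline separator makes the builder non-empty.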