Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds JSONL connection #88

Merged
merged 12 commits into from
Aug 6, 2024
Merged

Adds JSONL connection #88

merged 12 commits into from
Aug 6, 2024

Conversation

lucaro
Copy link
Member

@lucaro lucaro commented Jul 27, 2024

A connection that reads and writes data to line-wise JSON files. Can be useful for larger-scale distributed extractions on shared collections or just for small-scale testing purposes.

@lucaro lucaro requested a review from ppanopticon July 27, 2024 19:18
Copy link
Member

@ppanopticon ppanopticon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic itself looks good. Surely could be useful. I have a few fundamental comments though:

  • I don't see why this is part of the vitrivr-engine-index module. This means, that I cannot use it of retrieval only (unless I include this module). Why not just have a separate module for this database connector?
  • Having a few tests that make sure the basic functionality works as intended would go a long ways to keep this stable. I'd suggest to implement at least those that I've created for the other database connectors.
  • Personally, I'm not a huge fan of the old Java File API and I'd probably use java.nio for everything new (however, this is not a must just a personal preference).

Other than that, the the PR looks good to me.

import org.vitrivr.engine.core.model.types.Value
import java.util.*

/** A typealias to identify the [UUID] identifying a [Descriptor]. */
typealias DescriptorId = UUID
typealias DescriptorId = @Serializable(UUIDSerializer::class) UUID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works? Amazing!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was surprized too

* A [Type] that represents a [UUID] value.
*/
@Serializable
data object UUID : Type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I can understand why this is needed: UUIDs (thus far) are only used internally by the engine as IDs. Adding this as a type basically means, that we want to be able to actually use UUIDs as fields. Is that the intent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I had to touch the type system anyway to make everything serializable, I thought adding this would be nice for completeness' sake. This way, the type can be used, for example, in a struct descriptor that needs to deal with external identifiers.

}

override fun update(item: D): Boolean {
LOGGER.warn { "JsonlWriter.update is not supported" }
Copy link
Member

@ppanopticon ppanopticon Jul 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why one cannot support deletes in this case. But why are updates not supported if the entry is stored on a single line anyway? Just update the line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically correct, this will still require to rewrite the entire file. I'm not sure if the complexity, especially since this then requires additional synchronization, is worth the effort.

@lucaro lucaro marked this pull request as ready for review July 31, 2024 08:20
@lucaro lucaro requested a review from ppanopticon July 31, 2024 08:20
@ppanopticon
Copy link
Member

Looks good to me.

@ppanopticon ppanopticon merged commit 27a8dcf into dev Aug 6, 2024
1 check passed
@lucaro lucaro deleted the feature/jsonl-connection branch August 6, 2024 14:03
@lucaro
Copy link
Member Author

lucaro commented Aug 6, 2024

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants