Special characters get removed #359

Open
cfgs opened this issue Jan 5, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@cfgs

cfgs commented Jan 5, 2025

Description

I'm using Verba (1.28.2) together with Weaviate Cloud. When I upload .txt or .pdf files containing the characters å, ä, and ö, those characters are missing when I inspect the uploaded documents.

For example, example.txt:
Uploaded: "Hej detta är ett test, jag bor på en ö"
Output: "Hej detta r ett test, jag bor p en "
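The output above is exactly what you get when non-ASCII characters are silently dropped somewhere between upload and storage. The Python sketch below only reproduces that pattern; it is an assumption about the cause, not a confirmed trace of where Verba (or its PDF/text readers) actually does this.

```python
# Hypothesis sketch: reproduce the reported output by dropping non-ASCII characters.
# This does NOT show where (or whether) Verba does this; it only shows the pattern.
uploaded = "Hej detta är ett test, jag bor på en ö"

# Path 1: encoding a str to ASCII with errors="ignore" silently drops å, ä, ö.
stripped_via_encode = uploaded.encode("ascii", errors="ignore").decode("ascii")

# Path 2: decoding UTF-8 file bytes as ASCII with errors="ignore" has the same effect,
# because each of å, ä, ö is stored as two non-ASCII bytes in UTF-8.
raw_bytes = uploaded.encode("utf-8")
stripped_via_decode = raw_bytes.decode("ascii", errors="ignore")

# Correct handling: decoding the bytes as UTF-8 keeps the text unchanged.
preserved = raw_bytes.decode("utf-8")

print(repr(stripped_via_encode))  # 'Hej detta r ett test, jag bor p en '
print(repr(stripped_via_decode))  # 'Hej detta r ett test, jag bor p en '
print(repr(preserved))            # 'Hej detta är ett test, jag bor på en ö'
```

If the stored document matches the first two lines character for character, an ASCII encode/decode with `errors="ignore"` (or an equivalent filter) in the ingestion path is a plausible place to look.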

The built-in chat in Verba performs well, but the referenced document is, as described above, missing å, ä, and ö. What could be causing this?

Installation

  • pip install goldenverba
  • pip install from source
  • Docker installation

If you installed via pip, please specify the version:

Weaviate Deployment

  • Local Deployment
  • Docker Deployment
  • Cloud Deployment

Configuration

Reader:
Chunker: Token
Embedder: VoyageAI (Model: voyage-multilingual-2, should support Swedish)
Retriever: Advanced
Generator: OpenAI (Model: gpt-4o)

Steps to Reproduce

  1. Upload a .txt or .pdf file containing the characters å, ä, or ö through the "Import Data" function in the Verba GUI.
  2. Select the uploaded file in the list on the left and press "Import selected".
  3. Go to "Documents".
  4. Click the uploaded document in the list on the left.
  5. Inspect the document (now open on the right) and find a sentence that should contain å, ä, or ö. Confirm that the letter is missing.
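To rule out the input file itself, a test file can be generated with a known encoding before step 1. This Python sketch writes the example sentence to `example.txt` as UTF-8 (the encoding is an assumption; the report does not say how the original files were saved) and checks that å, ä, and ö are present as UTF-8 byte sequences.

```python
from pathlib import Path

SENTENCE = "Hej detta är ett test, jag bor på en ö"
path = Path("example.txt")

# Write explicitly as UTF-8 so the file's encoding is known before upload.
path.write_text(SENTENCE, encoding="utf-8")

# Read the raw bytes back: in UTF-8, å is b"\xc3\xa5", ä is b"\xc3\xa4", ö is b"\xc3\xb6".
raw = path.read_bytes()
for ch in "åäö":
    assert ch.encode("utf-8") in raw, f"{ch!r} missing from file"

# Round-trip check: decoding as UTF-8 must give back the original sentence.
assert raw.decode("utf-8") == SENTENCE
print(f"Wrote {len(raw)} bytes for {len(SENTENCE)} characters")
```

If this file still loses å, ä, and ö after import, the problem is in the ingestion pipeline rather than in how the file was saved.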
@thomashacker
Collaborator

Interesting, thanks for the issue! I'll look into this 🚀

@thomashacker added the bug label on Jan 6, 2025.