Show original sentence on hover #5

Open
jerinphilip opened this issue Feb 14, 2022 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jerinphilip

So the old Google Translate web interface used to be able to pop up a bubble showing the original text. I always thought this was a valuable feature when it was available.

The sentence byte-range annotations in Response are envisioned to be used for this (aside from its use in quality annotations).

Could you implement this feature, if it's not too much work, using Response.source[idx] and the corresponding Response.target[idx]? I expect this to be hard given that the HTML is translated in place. Plaintext should be easier to begin with, and pursuing an equivalent for HTML could also improve the bergamot-translator library's notions of HTML and sentence demarcation.
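To make the idea of pairing Response.source[idx] with Response.target[idx] concrete, here is a minimal sketch in Python. The `AnnotatedText` class and its `sentence` method are hypothetical stand-ins, not the bergamot-translator API; the only thing taken from this thread is that sentences are annotated as byte ranges and that source and target share the same sentence index.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedText:
    """Hypothetical stand-in for one side of a Response:
    the text plus one (begin, end) byte range per sentence."""
    text: str
    ranges: list  # list of (begin, end) offsets into the UTF-8 bytes of `text`

    def sentence(self, idx):
        begin, end = self.ranges[idx]
        # Offsets count UTF-8 bytes, not characters, so slice the encoded form.
        return self.text.encode("utf-8")[begin:end].decode("utf-8")


def sentence_pairs(source, target):
    """Pair each translated sentence with the original at the same index."""
    return [(source.sentence(i), target.sentence(i))
            for i in range(len(target.ranges))]


# "ö" is two bytes in UTF-8, so the first source sentence is 12 characters
# but 13 bytes long.
source = AnnotatedText("Schönen Tag. Wie geht's?", [(0, 13), (14, 25)])
target = AnnotatedText("Have a nice day. How are you?", [(0, 16), (17, 29)])
pairs = sentence_pairs(source, target)
```

For plaintext this pairing is all a hover popup would need; the hard part for HTML is finding which index the cursor is over.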

A page-wide "show original" button could also be useful.

@jelmervdl
Owner

I was thinking about this, because it would also be useful during development. But I have no idea how to do it right now. You'd need to identify which sentence you're hovering over. That means going

  1. from cursor position
  2. to HTML/DOM tree position (relatively easy, only estimating the character position is tricky)
  3. to identifying the translation response associated with that node (easy)
  4. to byte position in the translated HTML (ehhh maybe insert a temporary element, then get innerHTML, then count bytes until you encounter that temp element?)
  5. to byte position in the original HTML (just follow the indexes)
  6. to the slice of original HTML that covers the sentence (doable with some Utf8Array magic)
  7. to either removing the HTML, or fixing it so it is valid for that slice (just remove it, then it's doable)
  8. to then displaying the popup in the webpage. (that's the easiest part, hehe)
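Steps 4 to 7 above can be sketched roughly as follows, in Python for brevity. This assumes the byte offset in the translated HTML is already known (step 4) and that the sentence byte ranges index into the HTML including its tags; both are assumptions, and every function name here is hypothetical rather than part of bergamot-translator.

```python
from bisect import bisect_right
from html.parser import HTMLParser


def sentence_at(ranges, byte_offset):
    """Index of the sentence whose [begin, end) range holds byte_offset, else None."""
    begins = [begin for begin, _ in ranges]
    idx = bisect_right(begins, byte_offset) - 1
    if idx >= 0 and byte_offset < ranges[idx][1]:
        return idx
    return None


class _TextOnly(HTMLParser):
    """Collects text content and drops all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


def original_for_offset(source_html, source_ranges, target_ranges, target_offset):
    idx = sentence_at(target_ranges, target_offset)      # step 5: follow the indexes
    if idx is None:
        return None
    begin, end = source_ranges[idx]
    fragment = source_html.encode("utf-8")[begin:end].decode("utf-8")  # step 6
    return strip_tags(fragment)                          # step 7: just remove the HTML


# Illustrative data; the ranges were counted by hand for these two strings.
source_html = "<p>Schönen <b>Tag</b>. Wie geht's?</p>"
target_html = "<p>Have a nice <b>day</b>. How are you?</p>"
source_ranges = [(3, 23), (24, 35)]
target_ranges = [(3, 26), (27, 39)]
```

Calling `original_for_offset(source_html, source_ranges, target_ranges, 19)` looks up the sentence containing the translated word "day" and returns the tag-free original sentence. Steps 1 to 4, getting from the cursor to that byte offset, are not covered here and remain the tricky part.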

Okay maybe it is doable. Google figured it out… But it feels like a major undertaking.

@jerinphilip
Author

jerinphilip commented Feb 14, 2022

Oh, I know how they did it (because I used it to get some seed data to train a translation model at some point). They projected what was previously one node onto two nodes (in our case this would mean we modify "sentences"). You'll already know the following at construction (C++).

> You'd need to identify which sentence you're hovering over.

Your HTML pipeline can potentially inject these dummy nodes and wrap a dummy element around them. The target node would be visibility: visible and the source node would be visibility: hidden. On hover of the parent node, JavaScript would highlight the sentence.

This could sit behind a flag while experimenting, then be opened up once stable.

Edit: I guess we may or may not be using Response.source.sentenceAsByteRange(...) in this case.

@jelmervdl
Owner

jelmervdl commented Feb 14, 2022

Looked at it on https://www.coderepublics.com/howto/how-to-google-translate.php

What Google seems to do is wrap the text node in a <font> tag, or multiple font tags if there are multiple sentence segments. That is something that would be really easy to do inside bergamot-translator as well. On hover, it adds a CSS class to all font elements associated with that sentence to highlight the sentence. It also does an absolutely positioned tooltip with the original sentence. No attempt is made to have the original styling in that original text.
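A rough sketch of that wrapping, in Python for brevity. The `data-sentence` attribute name is made up for illustration, and the native `title` tooltip is a simplification standing in for Google's absolutely positioned tooltip. It also assumes each sentence sits in a single text node, whereas a sentence spanning several nodes would need several font elements sharing one index.

```python
from html import escape


def wrap_sentences(pairs):
    """Wrap each translated sentence in a <font> element that carries
    its sentence index and the original sentence.

    pairs: list of (original, translated) plain-text sentences.
    """
    wrapped = []
    for idx, (original, translated) in enumerate(pairs):
        wrapped.append(
            # data-sentence is a hypothetical attribute name; a hover handler
            # could use it to find all elements belonging to one sentence.
            f'<font data-sentence="{idx}" title="{escape(original, quote=True)}">'
            f"{escape(translated)}</font>"
        )
    return " ".join(wrapped)


markup = wrap_sentences([
    ("Schönen Tag.", "Have a nice day."),
    ("Wie geht's?", "How are you?"),
])
```

Escaping both the attribute value and the text matters here, because the original sentence can contain quotes or angle brackets.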

I think they picked <font> because nobody is dumb enough to add CSS rules for that element, nor does it have any styling of its own. And you can use it almost everywhere, even inside a button. Pretty smart. I would imagine it would break Google News though because of #4.

@jelmervdl
Owner

Related: https://github.com/jelmervdl/bergamot-translator/tree/html-embed-original-sentence

I'd rather not use the "add font tags everywhere with metadata" way of implementing this as it breaks React websites since we can't properly re-use text nodes in the page for the translated text without modifying the DOM tree too much.

… But I don't know another way of implementing it. Storing sentences by offsets somewhere sounds really difficult for a tree. As does determining at which offset we would be when hovering over some translated text.
