feat(translator_commons): support excluding tags from translation #959

Closed
wants to merge 1 commit into from

@pfeiwu pfeiwu commented Dec 11, 2024

Pull request

Issue tracker

Fixes will automatically close the related issue

Fixes #

Feature

Describe feature of pull request
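TL;DR: allow script_translator/table_translator to skip segments that carry specific tags.

Example scenario:

Input: orydarenosiwaza (ory is the prefix that triggers the secondary Japanese translator)

Candidates at this point: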

[だれのしわざ]|

page: 1 (of size 5)

  1. [誰の仕業]

...

Suppose I now select candidate 2.

The composition is then: [{japanese,phony}ok|{japanese,partial}dare=>誰|{abc,japanese}nosiwaza=>諾斯哇咋]

Candidates:

誰[no si wa za]|

page: 1 (of size 5)

  1. [諾斯哇咋]

  2. の仕業

...

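Because the trailing segment also carries the abc tag, its translation is affected by the main translator.

This PR lets users configure

translator/exclude_tags: [japanese]

so that the main translator no longer handles segments carrying the listed tags.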
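
As a rough illustration of the proposed behavior, the check could look like the sketch below. This is a self-contained model, not the actual patch: Segment, TranslatorOptions and ShouldTranslate are hypothetical stand-ins rather than librime's real classes, and the default tag list of "abc" is an assumption made for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a segment; only its tag set matters here.
struct Segment {
  std::set<std::string> tags;

  // True if the segment carries at least one of the given tags.
  bool HasAnyTagIn(const std::vector<std::string>& candidates) const {
    for (const auto& tag : candidates) {
      if (tags.count(tag) > 0) return true;
    }
    return false;
  }
};

// Hypothetical translator options: the tags it accepts and the tags it skips.
struct TranslatorOptions {
  std::vector<std::string> tags{"abc"};   // assumed default for this sketch
  std::vector<std::string> exclude_tags;  // models translator/exclude_tags
};

// Exclusion wins over inclusion: a segment tagged {abc, japanese} is skipped
// once "japanese" is listed in exclude_tags.
bool ShouldTranslate(const Segment& segment, const TranslatorOptions& options) {
  if (segment.HasAnyTagIn(options.exclude_tags)) return false;
  return segment.HasAnyTagIn(options.tags);
}
```

With exclude_tags left empty the model behaves like a plain tag match, so existing configurations would be unaffected under this assumption.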

Unit test

  • Done

Manual test

  • Done

Code Review

  1. Unit and manual test pass
  2. GitHub Action CI pass
  3. At least one contributor reviews and votes
  4. Can be merged clean without conflicts
  5. PR will be merged by rebase upstream base

Additional Info

@lotem
Member

lotem commented Dec 11, 2024

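Looks good.

I'd like to ask how the tags on the {abc,japanese}nosiwaza segment, abc and japanese, get added, and whether abc could be removed.
In principle, once a segment carries that tag, it should produce the corresponding translation. So intuitively, rather than excluding other tags, it would be better to remove abc.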

@pfeiwu
Author

pfeiwu commented Dec 11, 2024

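> Looks good.
>
> I'd like to ask how the tags on the {abc,japanese}nosiwaza segment, abc and japanese, get added, and whether abc could be removed. In principle, once a segment carries that tag, it should produce the corresponding translation. So intuitively, rather than excluding other tags, it would be better to remove abc.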

I just looked at the code and found that the affix segmentor does in fact have logic for removing the abc tag:

  if (!segmentation->back().HasTag(tag_)) {
    if (segmentation->size() >= 2) {
      Segment& previous_segment(*(segmentation->rbegin() + 1));
      if (previous_segment.HasTag("partial") && previous_segment.HasTag(tag_)) {
        // the remaining part of a partial selection should inherit the tag
        segmentation->back().tags.insert(tag_);
        // without adding new tag "abc"
        if (!previous_segment.HasTag("abc")) {
          segmentation->back().tags.erase("abc");
        }
      }
    }
    return true;
  }

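In my case, I had placed affix_segmentor before abc_segmentor in the configuration (many of the schemas I used as references do the same, so I simply followed them), which meant this removal never took effect.
After swapping the order of the two segmentors, my own need was met :)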
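
The ordering effect can be reproduced with a small model. The sketch below is a simplification under stated assumptions, not librime's real pipeline: AbcStep and AffixStep are hypothetical reductions of the two segmentors, the tag is hard-coded to "japanese", and only the inheritance branch quoted above is modeled.

```cpp
#include <set>
#include <string>
#include <vector>

using Tags = std::set<std::string>;
struct Segment { Tags tags; };
using Segmentation = std::vector<Segment>;
using Step = void (*)(Segmentation&);

// Simplified abc step: tags the last segment with "abc".
void AbcStep(Segmentation& segmentation) {
  segmentation.back().tags.insert("abc");
}

// Simplified affix step: the remainder of a partial selection inherits the
// tag, and "abc" is erased unless the previous segment also carried it.
void AffixStep(Segmentation& segmentation) {
  const std::string tag = "japanese";  // assumed affix tag for this example
  if (segmentation.size() < 2 || segmentation.back().tags.count(tag) > 0) return;
  const Segment& previous = segmentation[segmentation.size() - 2];
  if (previous.tags.count("partial") > 0 && previous.tags.count(tag) > 0) {
    segmentation.back().tags.insert(tag);
    if (previous.tags.count("abc") == 0) {
      segmentation.back().tags.erase("abc");
    }
  }
}

// Runs the steps in order over a partially selected "japanese" segment
// followed by an untagged remainder, and returns the remainder's tags.
Tags RunPipeline(const std::vector<Step>& steps) {
  Segmentation segmentation(2);
  segmentation[0].tags.insert("japanese");
  segmentation[0].tags.insert("partial");
  for (Step step : steps) step(segmentation);
  return segmentation.back().tags;
}
```

In this model, RunPipeline({AffixStep, AbcStep}) leaves the remainder tagged with both abc and japanese, because the erase runs before abc is added, while RunPipeline({AbcStep, AffixStep}) leaves only japanese.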

@lotem
Member

lotem commented Dec 13, 2024

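The author thought this through quite thoroughly. Impressive. I'm curious how the remaining part can still carry the japanese tag after half of the input has been selected. The pattern match should no longer hold at that point. I'll have to read the code to figure it out.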

@pfeiwu
Author

pfeiwu commented Dec 20, 2024

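> The author thought this through quite thoroughly. Impressive. I'm curious how the remaining part can still carry the japanese tag after half of the input has been selected. The pattern match should no longer hold at that point. I'll have to read the code to figure it out.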

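Thanks for the guidance.

If I understand correctly, this is the logic described by the comment // the remaining part of a partial selection should inherit the tag: after a partial selection, the remaining segment inherits the tag of the segment before it.

Anyway, this change is unnecessary, so I'm closing it for now.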

@pfeiwu pfeiwu closed this Dec 20, 2024