Commit
Add Cantonese vocabulary to improve recall
laubonghaudoi committed Sep 3, 2024
1 parent 01d98d3 commit 73f38b8
Showing 2 changed files with 5 additions and 5 deletions.
8 changes: 4 additions & 4 deletions cantofilter/judge.py
@@ -12,10 +12,10 @@
from typing import Set, Tuple

CANTO_UNIQUE = re.compile(
-r'[嘅嗰啲咗佢喺咁噉冇啩哋畀嚟諗惗乜嘢閪撚𨳍𨳊瞓睇㗎餸𨋢摷喎嚿噃嚡嘥嗮啱揾搵喐逳噏𢳂岋糴揈捹撳㩒𥄫攰癐冚孻冧𡃁嚫跣𨃩瀡氹嬲掟孭黐唞㪗埞忟𢛴]|' +
-r'唔[係得會好識使洗駛通知到去走掂該錯差]|點[樣會做得解]|[琴尋噚聽第]日|[而依]家|家[下陣]|[真就實梗又話都但淨剩只定一]係|邊[度個位科]|' +
-r'[嚇凍攝整揩逢淥浸激][親嚫]|[橫搞傾諗得唔]掂|仲[有係話要得好衰唔]|返[學工去歸]|執[好生實返輸]|' +
-r'屋企|收皮|慳錢|傾[偈計]|幫襯|求其|是[但旦]|[濕溼]碎|零舍|肉[赤緊酸]|核突|同埋|勁[秋抽]')
+r'[嘅嗰啲咗佢喺咁噉冇啩哋畀嚟諗惗乜嘢閪撚𨳍𨳊瞓睇㗎餸𨋢摷喎嚿噃嚡嘥嗮啱揾搵喐逳噏𢳂岋糴揈捹撳㩒𥄫攰癐冚孻冧𡃁嚫跣𨃩瀡氹嬲掟揼揸孭黐唞㪗埞忟𢛴踎𧦠奀啫]|' +
+r'唔[係得會想好識使洗駛通知到去走掂該錯差多少]|點[樣會做得解知]|[琴尋噚聽第]日|[而依]家|[真就實梗緊堅又話都但淨剩只定一]係|邊[度個位科]|' +
+r'[嚇凍攝整揩逢淥浸激][親嚫]|[橫搞傾諗得唔好]掂|仲[有係話要得好衰唔]|返[學工去翻番到]|執[好生實返輸]|[癡痴][埋線住起身]|[同帶做整溝]埋|[剩淨坐留]低|' +
+r'屋企|收皮|慳錢|屈機|隔籬|傾[偈計]|幫襯|求其|家陣|是[但旦]|[濕溼]碎|零舍|肉[赤緊酸]|核突|勁[秋抽]')
MANDO_UNIQUE = re.compile(r'[這哪您們唄咱啥甭她]|還[是好有]')
# 在, 不 and 把 are excluded from the criteria: too many words containing them have been absorbed into Cantonese
2 changes: 1 addition & 1 deletion cantofilter/version.py
@@ -1 +1 @@
-__version__ = "1.1.1"
+__version__ = "1.1.2"
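
A minimal sketch of how patterns like the ones in `judge.py` might be applied, assuming the judgment compares per-text hit counts. `CANTO_SAMPLE`, `MANDO_SAMPLE`, `count_features` and `matches_empty` are hypothetical names, and the patterns are reduced subsets of the lists above, not the package's actual API. The second helper checks for a common pitfall in long alternations: a stray `||` creates an empty branch that matches at every position.

```python
import re
from typing import Tuple

# Reduced, illustrative subsets of the patterns in judge.py (assumption: not the full lists).
CANTO_SAMPLE = re.compile(r'[嘅嗰啲咗佢喺冇]|唔[係得會想]|[同帶做整溝]埋|屋企')
MANDO_SAMPLE = re.compile(r'[這哪您們唄咱啥甭她]|還[是好有]')


def count_features(text: str) -> Tuple[int, int]:
    """Return (Cantonese hits, Mandarin hits) found in the text."""
    return len(CANTO_SAMPLE.findall(text)), len(MANDO_SAMPLE.findall(text))


def matches_empty(pattern: str) -> bool:
    """Return True if the pattern can match the empty string.

    An alternation with an empty branch (for example a stray '||') matches
    everywhere, which inflates findall() counts and shadows later branches.
    """
    return re.fullmatch(pattern, '') is not None


if __name__ == '__main__':
    print(count_features('我哋同埋佢喺屋企'))   # (4, 0)
    print(count_features('這是他們的家'))       # (0, 2)
    print(matches_empty(r'核突||勁[秋抽]'))     # True
    print(matches_empty(r'核突|勁[秋抽]'))      # False
```

Widening a literal such as `同埋` to a class such as `[同帶做整溝]埋` raises recall because `做埋`, `整埋` and similar forms now count as Cantonese features as well.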
