Need more prompt examples for Chinese #5

Open
timwu-ipevo opened this issue Jul 15, 2024 · 1 comment

timwu-ipevo commented Jul 15, 2024

Thanks for your great work. The Kaggle notebook generates the transcription file without problems, and I have tested various samples from the MediaTek-Research/formosaspeech dataset. However, no matter how I modify the prompt, the transcription is still incorrect. For example, take grandchallenge-1st-round/A0001544.wav in the dev split.

Ground truth

「金金分離」是金管會近期所推動的政策,又被稱為「寶佳條款」,起因於寶佳集團近來插旗多家金融機構,金管會認為,同一大股東在多家銀行擁有一定股權,或握有董事席次,會產生「競業禁止」與「營業秘密外洩」等問題,有必要限制最多只能插旗一家金融機構。

Vanilla Whisper (via HF space https://huggingface.co/spaces/sanchit-gandhi/whisper-large-v2)

金金分離是金管會近期所推動的政策又被稱為保家條款其因與保家集團近來插旗多家金融機構金管會認為同一大股東在多家銀行擁有一定股權或握有董事席次會產生競業禁止與營業秘密外洩等問題有必要限制最多只能插旗一家金融機構

GFD

my prompt file

asr_prompt: "以下是繁體中文轉錄文檔,有下列詞彙: 寶佳集團 寶佳條款"
llm_prompt: "以下是繁體中文轉錄文檔,有下列詞彙: 寶佳集團 寶佳條款"

output

"金金分離是金管會近期所推動的政策又被稱為「保加條款」其因於保加集團近來插旗多家金融機構金管會認為同一大股東在多家銀行擁有一定股權或握有董事席次會產生競業禁止與營業秘密外洩等問題有必要限制最多只能插旗一家金融機構"

I believe the way I am prompting is probably incorrect. Currently the most complete prompt file is atco2-asr-promot.yaml.
However, it is in English. It would be great if you could provide more prompt examples for Chinese audio.

Contributor

Splend1d commented Sep 2, 2024

Hi Tim,
Thank you for trying out our model!
We recommend the following: for the Whisper prompt, give the target keywords only; for the LLM prompt, any descriptive phrase works.
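
As a rough sketch, that recommendation applied to the prompt file from the original post might look like the following. This assumes `asr_prompt` is the text passed to Whisper and `llm_prompt` is the text passed to the LLM, as the key names suggest; the exact wording is illustrative and has not been tested against the fixed code.

```yaml
# Hypothetical prompt file for grandchallenge-1st-round/A0001544.wav.
# Whisper prompt: target keywords only, no surrounding sentence.
asr_prompt: "寶佳集團 寶佳條款"
# LLM prompt: any descriptive phrase. This one is unchanged from the original
# post and reads roughly: "The following is a Traditional Chinese transcript
# containing these terms: 寶佳集團 寶佳條款".
llm_prompt: "以下是繁體中文轉錄文檔,有下列詞彙: 寶佳集團 寶佳條款"
```

The only change from the original prompt file is that `asr_prompt` drops the descriptive sentence and keeps just the two keywords.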

In addition, we found a bug that kept the prompt from working as intended.
We have now fixed it; would you mind trying this example again? Thanks!
