This repository has been archived by the owner on Nov 5, 2022. It is now read-only.

How to create pronunciation lexicon for Bengali? #23

Open
Rajan-sust opened this issue May 29, 2019 · 6 comments

Comments

@Rajan-sust
Contributor

Rajan-sust commented May 29, 2019

To create the pronunciation of a word, we have to do two tasks: finding the phonemes and splitting them into syllables. I think splitting into syllables is the hard part. The expected format is [1]. How can I do this programmatically?

[1] https://github.com/google/language-resources/blob/master/bn/data/lexicon.tsv
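For the syllable-splitting half of the task, a minimal Python sketch might look like the following. It assumes the phonemes have already been found, and it uses a hypothetical set of vowel symbols (`VOWELS`) rather than the project's real inventory. It also assumes a hypothetical output layout for the pronunciation field, with phonemes separated by spaces and syllables by ` . `. The splitting rule is a naive heuristic, not the method used to build [1]: every vowel is a syllable nucleus, and in the consonants between two vowels only the last one starts the next syllable.

```python
# Hypothetical vowel symbols; replace with the vowel phonemes listed in the
# project's transcription guide.
VOWELS = {"a", "e", "i", "o", "u", "E", "O"}


def syllabify(phonemes, vowels=VOWELS):
    """Split a phoneme list into syllables with a naive heuristic.

    Each vowel is a nucleus. Between two nuclei, the last consonant (if any)
    becomes the onset of the next syllable and any earlier consonants close
    the previous one. Leading and trailing consonants stay with the first
    and last syllable respectively.
    """
    nuclei = [i for i, p in enumerate(phonemes) if p in vowels]
    if not nuclei:
        return [list(phonemes)]
    starts = []
    for prev, nxt in zip(nuclei, nuclei[1:]):
        consonants_between = nxt - prev - 1
        # No consonant: split right before the next vowel (hiatus).
        # Otherwise: the next syllable starts at the last consonant.
        starts.append(nxt if consonants_between == 0 else nxt - 1)
    bounds = [0] + starts + [len(phonemes)]
    return [list(phonemes[a:b]) for a, b in zip(bounds, bounds[1:])]


def to_lexicon_field(syllables):
    """Join syllables as 'p h . p h' (an assumed layout, not a confirmed one)."""
    return " . ".join(" ".join(syl) for syl in syllables)
```

For example, `to_lexicon_field(syllabify(["k", "o", "l", "o", "m"]))` gives `"k o . l o m"`. Sending only one consonant to the next syllable is a crude stand-in for onset maximization; a real syllabifier would likely need a list of legal Bengali onsets (and handling for diphthongs), and the phoneme-finding step still has to come from somewhere else.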

@pasindud
Contributor

Do you mean a program that can take in arbitrary words and output the transcription for that?

@Rajan-sust
Contributor Author

Yes, how can I do it?

@pasindud
Contributor

pasindud commented Jun 2, 2019

The quick answer is no.

@Rajan-sust
Contributor Author

I hope you will share it if you find a way.

@Rajan-sust
Contributor Author

We merged the lexicon words from [1] and [2]. The merged lexicon has 64,969 unique entries, and 4,443 unique words in our corpus do not appear in it. What is the best procedure for transcribing these 4,443 words into lexicon entries?

[1] https://github.com/google/language-resources/blob/master/bn/data/lexicon.tsv
[2] https://github.com/google/language-resources/blob/master/bn/festvox/lexicon.scm
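Finding the corpus words that lack an entry can be sketched in Python as below. This assumes each lexicon line is `word<TAB>pronunciation` (only the first column is used) and that the corpus is tokenized on whitespace; the function names are hypothetical. Both sides are NFC-normalized because the same Bengali word can be encoded as different, canonically equivalent code point sequences, which would otherwise inflate the count of missing words.

```python
import unicodedata


def _norm(word):
    # NFC so that canonically equivalent spellings compare equal.
    return unicodedata.normalize("NFC", word)


def lexicon_words(*tsv_sources):
    """Union of headwords (first tab-separated column) from several lexicons.

    Each source is any iterable of lines, e.g. an open text file.
    """
    words = set()
    for lines in tsv_sources:
        for line in lines:
            line = line.strip()
            if line:
                words.add(_norm(line.split("\t", 1)[0]))
    return words


def missing_words(corpus_text, lexicon):
    """Sorted unique whitespace-separated corpus tokens absent from the lexicon."""
    return sorted({_norm(tok) for tok in corpus_text.split()} - lexicon)
```

For example, passing two open TSV files to `lexicon_words` merges their headwords, and `len(missing_words(text, lexicon))` gives the number of unique corpus tokens that still need a transcription. Real corpora usually also need punctuation stripped before the comparison.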

@pasindud
Contributor

Note that [2] is generated from [1]; the only difference is the file format.

  • This conversion is done by running

    cat bn/data/lexicon.tsv | python festival_utils/festival_lexicon_from_tsv.py > bn/festvox/lexicon.scm
    

The transcription guide can be found at [3].

[1] https://github.com/google/language-resources/blob/master/bn/data/lexicon.tsv
[2] https://github.com/google/language-resources/blob/master/bn/festvox/lexicon.scm
[3] https://github.com/google/language-resources/blob/master/bn/transcription.md
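To illustrate what the TSV-to-Festival conversion involves, here is a minimal Python sketch. It is not `festival_lexicon_from_tsv.py`; it only shows the general shape of a Festival lexicon entry, `("word" pos ((( phones ) stress) ...))`. It assumes each TSV line is `word<TAB>pronunciation`, with phonemes separated by spaces and syllables by `.`, and it assigns stress 0 to every syllable and `nil` as the part of speech because no such information is assumed to be in the TSV.

```python
def festival_entry(word, pron):
    """Build a Festival-style entry such as ("word" nil (((k o) 0) ((l o m) 0)))."""
    # Assumed layout: syllables separated by ".", phonemes by spaces.
    # Every syllable gets stress 0 since no stress marks are assumed.
    syllables = [syl.split() for syl in pron.split(".")]
    body = " ".join("((%s) 0)" % " ".join(syl) for syl in syllables if syl)
    return '("%s" nil (%s))' % (word, body)


def tsv_to_festival(lines):
    """Yield one entry per non-empty 'word<TAB>pronunciation' line."""
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            word, pron = line.split("\t", 1)
            yield festival_entry(word, pron)
```

For example, `festival_entry("কলম", "k o . l o m")` returns `("কলম" nil (((k o) 0) ((l o m) 0)))`. Words containing double quotes would need escaping, which this sketch does not do.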
