how to recognize blank, recognize English and Chinese in one model #48

Open
ghost opened this issue Apr 23, 2020 · 8 comments

Comments

@ghost

ghost commented Apr 23, 2020

First, your code is great. I trained with the SynthText90k dataset and achieved very good performance on English words.

I have several questions and hope you can give me a hand. Thank you very much for your time.

  1. How can I recognize a space (blank) within a sentence?
    For example, I want to recognize "I love python", which has a space between "I" and "love". How should I handle this?
    Is it enough to add a space to the alphabet, like this, and prepare the training data accordingly?
    alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ """

  2. Can we recognize English and Chinese in one model?
    If so, how?
    Is it enough to make the alphabet contain all the English and Chinese characters, like this?
    alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ是不我一有大在人了中到資..."""

  3. What if we want to recognize very long sentences?
    Do you think it would be better to train with very long sentences, or can we just train with short ones?
    Your current model only supports text shorter than 26 characters, so I would have to modify the network to train with long sentences.

@Holmeyoung
Owner

Hi,

  1. Recognizing a sentence and segmenting a sentence into words are two different tasks. Simply adding a space to the labels is not recommended.

  2. Yes, just make the alphabet contain all the English and Chinese characters, as you describe.

  3. Calculate the sequence length T at the output of the last LSTM. The wider you resize the input image, the larger T becomes and the longer the text you can train with, because each output position predicts at most one character.
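
The arithmetic behind point 3 can be sketched as follows. This is only a rough sketch: it assumes the widely used CRNN layout (four pooling stages, the last two with stride 1 along the width, then a 2x2 convolution without padding). The layer list and the names `conv_out` and `crnn_sequence_length` are illustrative assumptions, not code from this repository, so check them against your own network definition.

```python
def conv_out(size, kernel, stride, padding):
    """Standard output-size formula for a conv or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def crnn_sequence_length(img_width):
    """Width of the final feature map (the sequence length T) for an ASSUMED layout."""
    # (kernel, stride, padding) along the width axis only.
    # 3x3 convolutions with stride 1 and padding 1 keep the width, so they are omitted.
    width_layers = [
        (2, 2, 0),  # pooling 1: halves the width
        (2, 2, 0),  # pooling 2: halves the width
        (2, 1, 1),  # pooling 3: stride 1 along the width (assumption)
        (2, 1, 1),  # pooling 4: stride 1 along the width (assumption)
        (2, 1, 0),  # final 2x2 convolution without padding (assumption)
    ]
    width = img_width
    for kernel, stride, padding in width_layers:
        width = conv_out(width, kernel, stride, padding)
    return width


print(crnn_sequence_length(100))  # 26
print(crnn_sequence_length(400))  # 101
```

Under these assumptions a 100-pixel-wide input gives T = 26, which matches the length limit mentioned in the question, and a wider input gives a proportionally longer sequence. Note that CTC needs T to be at least the label length, plus one extra step for every pair of identical adjacent characters.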

@ghost
Author

ghost commented Apr 25, 2020

Thank you so much for your help.

@ghost
Author

ghost commented Apr 26, 2020

How can I recognize the space between two English words?
With my current model, if I input an English sentence, the output concatenates all the words together. For example:
Input image:
[image: 36]
Recognized result:

A------l-ll-t-h--e--r-e-c--o--g-n-i-tiio---n--a-c-cc--ur-a--c-i-e-s---o--n--t-h---e => Alltherecognitionaccuraciesonthe

So how can I recognize the space between two English words?
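
For readers unfamiliar with the raw output above: the `-` symbols are the CTC blank, which is a different thing from a space character in the text. A minimal sketch of greedy CTC collapsing (merge repeated symbols, then drop the CTC blank) is shown below. The function name `ctc_greedy_collapse` is illustrative and is not taken from this repository.

```python
from itertools import groupby


def ctc_greedy_collapse(raw, blank="-"):
    """Merge consecutive repeated symbols, then remove the CTC blank symbol."""
    merged = [symbol for symbol, _ in groupby(raw)]
    return "".join(symbol for symbol in merged if symbol != blank)


raw = ("A------l-ll-t-h--e--r-e-c--o--g-n-i-tiio---n--a-c-cc--ur-a--c-i-e-s"
       "---o--n--t-h---e")
print(ctc_greedy_collapse(raw))  # Alltherecognitionaccuraciesonthe

# If a space were part of the alphabet, it would survive the collapse like any
# other character, because only the CTC blank ("-") is removed.
print(ctc_greedy_collapse("I-- -l-oo-v-e"))  # I love
```

The words run together because the model never emits a space symbol, not because the decoding step removes it.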

@Holmeyoung
Owner

Holmeyoung commented Apr 28, 2020

Try giving the label as

all#the#recognition#accuracies#on#the

Replace all the spaces with #, and add the character # to alphabets.py.

Then, wherever there is a space, the net will output #. Replace # with a space afterwards and you will get normal sentences.

You can try this, but I am not sure it will work.
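
The placeholder idea can be sketched as two small helpers, one applied to labels before training and one applied to predictions afterwards. The names `PLACEHOLDER`, `to_training_label`, and `from_prediction` are hypothetical, and the sketch assumes `#` never appears in the real text.

```python
PLACEHOLDER = "#"  # must also be added to the alphabet used for training


def to_training_label(text):
    """Replace spaces with the placeholder before writing the training label."""
    if PLACEHOLDER in text:
        # The round trip would be ambiguous if the text already contained "#".
        raise ValueError("text already contains the placeholder character")
    return text.replace(" ", PLACEHOLDER)


def from_prediction(prediction):
    """Turn placeholder characters in a decoded prediction back into spaces."""
    return prediction.replace(PLACEHOLDER, " ")


label = to_training_label("all the recognition accuracies on the")
print(label)                   # all#the#recognition#accuracies#on#the
print(from_prediction(label))  # all the recognition accuracies on the
```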

@ghost
Author

ghost commented Apr 28, 2020

Thanks for your reply.
Perhaps the space itself can also be treated as a character, so for now I have decided not to replace spaces with #.
Instead, I will add the space character itself to the alphabet and train with English sentences.
I will report my results.
Thank you so much.

@ghost
Author

ghost commented May 5, 2020

@Holmeyoung
It is not necessary to replace spaces with #. Just treat the space itself as one character and add it to the alphabet, then prepare English sentences as training data.
alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'. -,"
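
As a minimal sketch of what this looks like on the label side: the space gets its own class index, just like any other character. The sketch assumes the common convention of reserving index 0 for the CTC blank. The helper names `encode` and `decode` are hypothetical and are not this repository's converter.

```python
alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'. -,"

# Index 0 is reserved for the CTC blank (assumption), so characters start at 1.
char_to_index = {ch: i + 1 for i, ch in enumerate(alphabet)}
index_to_char = {i: ch for ch, i in char_to_index.items()}


def encode(text):
    """Map each character, including spaces, to its class index."""
    return [char_to_index[ch] for ch in text]


def decode(indices):
    """Map class indices back to characters, skipping the CTC blank (0)."""
    return "".join(index_to_char[i] for i in indices if i != 0)


encoded = encode("I love python")
print(char_to_index[" "])  # 65
print(decode(encoded))     # I love python
```

The space is an ordinary class here, distinct from the CTC blank at index 0, so the training labels can simply contain spaces.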

My training images:
[image]
[image]

Training progress:
[image]

As you can see, it works.
Finally, I'm very grateful to you for your responses to all my questions.
Thank you again.

@ducbluee

ducbluee commented Aug 7, 2020

@cvchongci Hi, I am also having problems with the spaces between English words. Could you please share your model? Thanks!

@ghost
Author

ghost commented Aug 13, 2020

@ducbluee Hi, I trained the model with very limited synthetic data, so it does not work well on real-world images.
You can follow the approach I used for handling spaces.
