how to recognize blank, recognize English and Chinese in one model #48

ghost · 2020-04-23T15:54:09Z

First of all, your code is great. I trained it on the SynthText90k dataset and got very good performance on English words.

I have several questions and hope you can give me a hand. Thank you very much for your time.

How do I recognize the blanks (spaces) in a sentence?
For example, I want to recognize "I love python", which has a space between "I" and "love". How should I handle this?
Do I just add a space to the alphabet, like this, and prepare the training data accordingly?
alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ """
Can we recognize English and Chinese with one model?
If we want a single model to recognize both English and Chinese, what should we do?
Do we just make the alphabet contain all English and Chinese characters, like this?
alphabet = """0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ是不我一有大在人了中到資..."""
How do we recognize very long sentences?
Do you think it is better to train with very long sentences, or can we just train with short ones?
Your current model only supports text shorter than 26 characters, so I would have to modify the network to train with long sentences.
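For the second question, a mixed English/Chinese alphabet does not have to be typed by hand: it can be collected from the training labels, which guarantees that every labeled character (including the space, if the labels contain spaces) is covered and that no character appears twice. A minimal sketch, assuming the labels are available as Python strings; `build_alphabet` and `missing_chars` are hypothetical helpers, not functions from the repository, which keeps its alphabet as a string literal in alphabets.py.

```python
from typing import Iterable, Set


def build_alphabet(labels: Iterable[str]) -> str:
    """Collect every distinct character in the labels into one alphabet string.

    Hypothetical helper: derives a mixed English/Chinese alphabet from the
    training labels instead of writing it out by hand.
    """
    chars: Set[str] = set()
    for label in labels:
        chars.update(label)
    # Sort so the character-to-index mapping is stable between runs;
    # training and inference must use exactly the same alphabet order.
    return "".join(sorted(chars))


def missing_chars(label: str, alphabet: str) -> Set[str]:
    """Return the characters of `label` that the alphabet cannot encode."""
    return set(label) - set(alphabet)


if __name__ == "__main__":
    alphabet = build_alphabet(["hello", "中国", "OCR识别2020"])
    print(alphabet)
    print(missing_chars("中文", alphabet))
```

Sorting by code point is only one choice; any fixed order works as long as the same alphabet string is used for training and for inference.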


Holmeyoung · 2020-04-25T09:40:02Z

Hi,

These are two different things: recognizing a sentence and segmenting a sentence into words. Simply adding a blank (space) to the labels is not recommended.
For English and Chinese, just make the alphabet contain all English and Chinese characters, as you suggested.
For long text, calculate the sequence length T output by the last LSTM layer. The wider you resize the input image, the longer the text you can train with, since each output location corresponds to one character.
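To make the "last LSTM T length" concrete, here is a sketch that traces the image width through the CNN. It assumes the crnn.pytorch-style CRNN backbone (32-pixel-high input; 3×3 convolutions with padding 1, which keep the width; two 2×2 max-pools with stride 2; two max-pools with kernel (2, 2), stride (2, 1) and padding (0, 1); and a final 2×2 convolution without padding). Under those assumptions T = ⌊⌊W/2⌋/2⌋ + 1, which gives T = 26 for W = 100, consistent with the 26-character limit mentioned above. Check the layer settings in your own copy of the model before relying on these numbers. CTC also needs one blank step between identical adjacent characters, so a label fits only if its length plus its number of repeated adjacent pairs is at most T. All function names here are illustrative.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis of a conv or pool layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def crnn_time_steps(img_width: int) -> int:
    """Width of the final CNN feature map, i.e. the sequence length T for the LSTMs.

    Assumes the crnn.pytorch-style backbone; only layers that change the
    width are listed (3x3 convs with padding 1 leave it unchanged).
    """
    w = img_width
    w = conv_out(w, kernel=2, stride=2)             # max-pool 2x2, stride 2
    w = conv_out(w, kernel=2, stride=2)             # max-pool 2x2, stride 2
    w = conv_out(w, kernel=2, stride=1, padding=1)  # max-pool, stride (2,1), padding (0,1): width axis
    w = conv_out(w, kernel=2, stride=1, padding=1)  # same pooling again
    w = conv_out(w, kernel=2, stride=1, padding=0)  # final 2x2 conv, no padding
    return w


def ctc_min_steps(label: str) -> int:
    """Minimum number of CTC time steps needed to emit `label`.

    One step per character, plus one blank step between identical adjacent
    characters (otherwise decoding would merge them into one).
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def fits(label: str, img_width: int) -> bool:
    """True if a label of this content can be emitted at this input width."""
    return ctc_min_steps(label) <= crnn_time_steps(img_width)


if __name__ == "__main__":
    for width in (100, 200, 400):
        print(width, crnn_time_steps(width))
    print(fits("all the recognition accuracies on the", 100))
    print(fits("all the recognition accuracies on the", 400))
```

Widening the input (for example from 100 to 400 pixels) raises T without changing the network, which is the trade-off described above: longer text needs proportionally wider images.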

ghost · 2020-04-25T11:30:52Z

Thank you so much for your help.

ghost · 2020-04-26T07:19:16Z

How do I recognize the space between two English words?
With my current model, if I input an image of an English sentence, the output concatenates all the English words together. For example:
Input image: [image not available]

Recognized result:

A------l-ll-t-h--e--r-e-c--o--g-n-i-tiio---n--a-c-cc--ur-a--c-i-e-s---o--n--t-h---e => Alltherecognitionaccuraciesonthe
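The raw string above is a best-path (greedy) CTC output: one symbol per time step, with `-` standing for the CTC blank, which is a special "no character" class and not the space between words. Decoding merges runs of the same symbol and then drops the blanks; because the alphabet had no space character, no space could ever be emitted. A minimal sketch of that collapse step (a generic implementation, not necessarily the repository's own decoder):

```python
def ctc_greedy_collapse(raw: str, blank: str = "-") -> str:
    """Turn a per-time-step CTC output string into the final text.

    Best-path decoding: merge consecutive repeats of a symbol, then drop
    the CTC blank symbol. A blank between two identical characters keeps
    them separate ("l-l" -> "ll"), while "ll" with no blank becomes "l".
    """
    out = []
    prev = None
    for ch in raw:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    raw = "A------l-ll-t-h--e--r-e-c--o--g-n-i-tiio---n--a-c-cc--ur-a--c-i-e-s---o--n--t-h---e"
    print(ctc_greedy_collapse(raw))  # Alltherecognitionaccuraciesonthe
```

If a real space character is part of the alphabet, it passes through this step like any other character, which is why adding it to the alphabet lets the decoded text contain spaces.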

So how can I get the model to output the spaces between English words?

Holmeyoung · 2020-04-28T11:51:06Z

Try giving the label as:

all#the#recognition#accuracies#on#the

That is, replace all the spaces with #, and add the character # to alphabets.py.

Then, wherever there is a space, the network will output #; replace each # with a space and you will get normal sentences.

You can try this, but I am not sure it will work.
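A minimal sketch of this label trick, assuming labels are plain Python strings. `ALPHABET`, `SPACE_TOKEN`, `encode_spaces` and `decode_spaces` are hypothetical names for illustration; in the repository the alphabet itself is defined in alphabets.py.

```python
# Hypothetical alphabet: the original characters plus the placeholder "#".
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" + "#"
SPACE_TOKEN = "#"  # must never occur in real text, or decoding becomes ambiguous


def encode_spaces(label: str) -> str:
    """Prepare a training label: replace every space with the placeholder."""
    if SPACE_TOKEN in label:
        raise ValueError(f"label already contains {SPACE_TOKEN!r}: {label!r}")
    return label.replace(" ", SPACE_TOKEN)


def decode_spaces(prediction: str) -> str:
    """Post-process a decoded prediction: map the placeholder back to spaces."""
    return prediction.replace(SPACE_TOKEN, " ")


if __name__ == "__main__":
    label = encode_spaces("all the recognition accuracies on the")
    print(label)                 # all#the#recognition#accuracies#on#the
    print(decode_spaces(label))  # all the recognition accuracies on the
```

The check in `encode_spaces` matters because the mapping is only reversible if the placeholder never appears in genuine text.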

ghost · 2020-04-28T15:28:04Z

Thanks for your reply.
Maybe the space itself can also be treated as a character, so for now I have decided not to replace spaces with #.
Instead, I will add the space character itself to the alphabet and train with English sentences.
I'll wait for my results.
Thank you so much.

ghost · 2020-05-05T13:58:17Z

@Holmeyoung
It is not necessary to replace spaces with #. Just treat the space itself as a character: add it to the alphabet, then prepare English sentences as training data. For example, with this alphabet (note the space between '.' and '-'):
alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'. -,"
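With the space in the alphabet, it becomes an ordinary output class, so a label like "I love python" can be encoded directly. A minimal sketch, assuming the common CRNN convention of reserving class index 0 for the CTC blank so that alphabet position i maps to class i + 1; `encode` and `decode` are hypothetical helpers, not the repository's converter. Rejecting unknown characters up front catches labels whose characters (a space, or Chinese characters) were never added to the alphabet.

```python
from typing import List

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'. -,"

# Class 0 is assumed to be the CTC blank, so real characters start at 1.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(label: str) -> List[int]:
    """Map a text label to class indices, rejecting characters not in the alphabet."""
    unknown = set(label) - set(ALPHABET)
    if unknown:
        raise ValueError(f"characters not in alphabet: {sorted(unknown)}")
    return [CHAR_TO_INDEX[ch] for ch in label]


def decode(indices: List[int]) -> str:
    """Map class indices back to text, skipping the blank class 0.

    This inverts `encode` for labels; raw network outputs still need the
    CTC collapse (merge repeats, drop blanks) first.
    """
    return "".join(ALPHABET[i - 1] for i in indices if i != 0)


if __name__ == "__main__":
    print(encode("I love"))
    print(decode(encode("I love python")))
```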

My training images: [images not available]

The training progress: [screenshot not available]

As you can see, it works.
Finally, I'm very grateful to you for answering all my questions.
Thank you again.

ducbluee · 2020-08-07T04:56:26Z

@cvchongci Hi, I am also having problems with the spaces between English words. Could you please share your model? Thanks!

ghost · 2020-08-13T19:00:45Z

@ducbluee Hi, I trained the model on very limited synthetic data, so it does not work well on real-world images.
You can follow the way I handled spaces above.

ghost mentioned this issue on May 20, 2020, in DayBreak-u/chineseocr_lite#131, "how to recognize the blank between two English words" (open).
