Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hebrew and digits don't render ("Failed to get emoji face") #49

Closed
mattansb opened this issue Oct 28, 2024 · 11 comments · Fixed by #53
Closed

Hebrew and digits don't render ("Failed to get emoji face") #49

mattansb opened this issue Oct 28, 2024 · 11 comments · Fixed by #53

Comments

@mattansb
Copy link

It seems that AGG devices are unable to render Hebrew text with a numeral. When this is attempted, the following message appears in the console:

Failed to get emoji face: <seemingly random, non-reproducible number/string>
  • ragg version: 1.3.3
  • R version: 4.4.1
  • OS: Win 11

Here is some reproducible code (I'm using {ggplot2} here, but this is not specific to ggplot2, see tidyverse/ggplot2#6013 and rstudio/rstudio#15370):

library(ggplot2)

df <- data.frame(
  x = 1:3,
  y = 0,
  label = c("זה טקסט", 
            "this is text with number 1",
            "זה טקסט עם מספר 2")
)

p <- ggplot(df, aes(x, y)) + 
  geom_label(aes(label = label))

ggsave("plot.png", device = png) # works

plot

ggsave("plot_agg.png", device = ragg::agg_png)
#> Saving 7 x 7 in image
#> Failed to get emoji face:  , 1763456416
#> Failed to get emoji face:  ��ֶ�, 1885696607
#> Failed to get emoji face:  ��ֶ�, -474210312
#> Failed to get emoji face:  ��ֶ�, -474210312
#> Failed to get emoji face:  [

plot_agg

These other devises produce the same pattern:

ggsave("plot.tiff", device = tiff)
ggsave("plot_agg.tiff", device = ragg::agg_tiff)

ggsave("plot.jpeg", device = jpeg)
ggsave("plot_agg.jpeg", device = ragg::agg_jpeg)
@mattansb
Copy link
Author

It seems this behavior is avoided if the string begins with Latin text + Latin text separates the non-Latin text and the number:

library(ggplot2)

df <- tibble::tibble(
  label = c(
    "هذا مكتوب باللغة العربية 1",
    "זה כתוב בעברית 2",
    "هي سنڌيءَ ۾ لکيو ويو آهي 3",
    "العربية English separates 4",
    "English before نڌيءَ 5",
    "English before עברית English separates 6"
  ),
  y = rev(seq_along(label))
)

ggplot(df) + 
  geom_label(
    aes(0, y, label = label)
  )

image

@oriobr
Copy link

oriobr commented Dec 8, 2024

If I understand correctly, this is a package https://github.com/r-lib/textshaping/tree/main related problem.
Because the warning appears there in the file https://github.com/r-lib/textshaping/blob/main/src/string_shape.cpp

@thomasp85 thomasp85 transferred this issue from r-lib/ragg Dec 9, 2024
@thomasp85
Copy link
Member

Thanks for digging into this. It is most likely a textshaping issue, though it baffles me somewhat

@thomasp85
Copy link
Member

thomasp85 commented Dec 9, 2024

With the just released version of textshaping I get better results (no warning and everything gets rendered but the digit is misplaced)

@thomasp85
Copy link
Member

This is due to some wrong expectations in the output of fribidi--I assumed that weak characters such as digits would take on whatever direction they were close to but that is not the case. Instead the shaping needs to take into account the overall direction of the text as deduced by fribidi.

This will take some effort to correct but I'm on it

@oriobr
Copy link

oriobr commented Dec 9, 2024

First, thank you for the great work you are doing.
I downloaded the new version but the results remained the same.
The warnings appear and only the last text is produced.

library(ggplot2)

df <- tibble::tibble(
  label = c(
    "هذا مكتوب باللغة العربية 1",
    "זה כתוב בעברית 2",
    "هي سنڌيءَ ۾ لکيو ويو آهي 3",
    "العربية English separates 4",
    "English before نڌيءَ 5",
    "English before עברית English separates 6"
  ),
  y = rev(seq_along(label))
)

ggplot(df) + 
  geom_label(
    aes(0, y, label = label)
  )

image

sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=Hebrew_Israel.utf8  LC_CTYPE=Hebrew_Israel.utf8   
#> [3] LC_MONETARY=Hebrew_Israel.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=Hebrew_Israel.utf8    
#> 
#> time zone: Asia/Jerusalem
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     fastmap_1.2.0     xfun_0.49         glue_1.8.0       
#>  [5] knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29    lifecycle_1.0.4  
#>  [9] cli_3.6.3.9001    reprex_2.1.1      withr_3.0.2       compiler_4.4.1   
#> [13] rstudioapi_0.17.1 tools_4.4.1       evaluate_1.0.1    yaml_2.3.10      
#> [17] rlang_1.1.4.9000  fs_1.6.5

Created on 2024-12-09 with reprex v2.1.1

@thomasp85
Copy link
Member

Strange. However, I'm embarking on a larger internal change to the shaping engine that should hopefully resolve this completely

@mattansb
Copy link
Author

Just to confirm that I get the same results as @oriobr with the new version of textshaping (0.4.1).

Thanks for looking into to the @thomasp85 !

@thomasp85
Copy link
Member

With the latest changes in main I now get:

test

When running the example from @oriobr.

Some notes on how it looks:
You will notice that the arrangement of glyphs in label 4 and 5 differs from how it is presented in the browser. I'm frankly surprised by the difference since textshaping is using the unicode bidi algorithm (as implemented by fribidi) and I would assume all others were doing that as well. The reason for the difference in 4) is that textshaping/fribidi detects the string as rtl (the bidi algorithm specifies that the first encountered strong character determines the global direction). 5) is a bit trickier as I think fribidi (and by extension textshaping) is doing the wrong thing. Weak types such as european numbers should take on the direction of the first strong preceding character (here arabic). I'm in contact with the fribidi developer about this

@mattansb
Copy link
Author

mattansb commented Jan 7, 2025

Wow, thanks @thomasp85 for taking the time to work on this! I've actually learned so much just from trying to pin-point the bug and watching your iterations fixing the problem.
Thanks again!

@thomasp85
Copy link
Member

With the latest change (#60) 5) is now also correct

Moreover, I've now integrated unicodes test suite for bidi implementations (91707 test cases) and textshaping passes them all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants