Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DOS CRLF as line ending in parser #138

Closed
wants to merge 1 commit into from
Closed

Support DOS CRLF as line ending in parser #138

wants to merge 1 commit into from

Conversation

alixinne
Copy link
Contributor

I realized some parsers expect \n as the single char for a line ending, which isn't the case on Windows with its notorious \r\n. This PR replaces occurrences of \n in parsers with nom's line_ending which supports both. This also adds a test to prevent regressions.

Example of failure on Windows (on one of my personal projects): https://travis-ci.com/github/vtavernier/tinygl/jobs/393882051#L396

---- test_glsl_from_string stdout ----

Error: GlslParseError(ParseError { info: "0: at line 1:\n#version 460 core\n                 ^\nexpected \'\n\', found \r\n\n1: at line 1, in Alt:\n#version 460 core\n                 ^\n\n" })

@hadronized
Copy link
Owner

Hello, thank you a lot! I’ll try reviewing that tomorrow and merge ASAP!

@AlexTMjugador
Copy link
Contributor

Any news on the merge status for this PR? I also stumbled on this issue on my project and look forward to see it merged.

@hadronized
Copy link
Owner

Parts of this got fixed in a recent PR (#145). Can you check whether something is still missing? Sorry for the delay.

@alixinne
Copy link
Contributor Author

#145 is lacking the following fixes in the helpers:

diff --git a/glsl/src/parsers.rs b/glsl/src/parsers.rs
index 1983f39..064ae11 100644
--- a/glsl/src/parsers.rs
+++ b/glsl/src/parsers.rs
@@ -9,7 +9,7 @@ mod nom_helpers;
 
 use nom::branch::alt;
 use nom::bytes::complete::{tag, take_until, take_while1};
-use nom::character::complete::{anychar, char, digit1, space0, space1};
+use nom::character::complete::{anychar, char, digit1, line_ending, space0, space1};
 use nom::character::{is_hex_digit, is_oct_digit};
 use nom::combinator::{cut, map, not, opt, peek, recognize, value, verify};
 use nom::error::{ErrorKind, ParseError as _, VerboseError, VerboseErrorKind};
@@ -1631,7 +1631,7 @@ pub(crate) fn pp_version_profile(i: &str) -> ParserResult<syntax::PreprocessorVe
 ///
 /// This parser is needed to authorize breaking a line with the multiline annotation (\).
 pub(crate) fn pp_space0(i: &str) -> ParserResult<&str> {
-  recognize(many0_(alt((space1, tag("\\\n")))))(i)
+  recognize(many0_(alt((space1, preceded(tag("\\"), line_ending)))))(i)
 }
 
 /// Parse a preprocessor define.
diff --git a/glsl/src/parsers/nom_helpers.rs b/glsl/src/parsers/nom_helpers.rs
index 44084f7..7bf02c1 100644
--- a/glsl/src/parsers/nom_helpers.rs
+++ b/glsl/src/parsers/nom_helpers.rs
@@ -6,6 +6,7 @@ use nom::character::complete::{anychar, line_ending, multispace1};
 use nom::combinator::{map, recognize, value};
 use nom::error::{ErrorKind, VerboseError, VerboseErrorKind};
 use nom::multi::fold_many0;
+use nom::sequence::preceded;
 use nom::{Err as NomErr, IResult};
 
 pub type ParserResult<'a, O> = IResult<&'a str, O, VerboseError<&'a str>>;
@@ -74,7 +75,13 @@ where
 /// Discard any leading newline.
 pub fn str_till_eol(i: &str) -> ParserResult<&str> {
   map(
-    recognize(till(alt((value((), tag("\\\n")), value((), anychar))), eol)),
+    recognize(till(
+      alt((
+        value((), preceded(tag("\\"), line_ending)),
+        value((), anychar),
+      )),
+      eol,
+    )),
     |i| {
       if i.as_bytes().last() == Some(&b'\n') {
         &i[0..i.len() - 1]
@@ -91,5 +98,5 @@ pub fn str_till_eol(i: &str) -> ParserResult<&str> {
 //
 // Taylor Swift loves it.
 pub fn blank_space(i: &str) -> ParserResult<&str> {
-  recognize(many0_(alt((multispace1, tag("\\\n")))))(i)
+  recognize(many0_(alt((multispace1, preceded(tag("\\"), line_ending)))))(i)
 }

Which means that other PR is not enough to fix line continuations on Windows. I have rebased and updated my PR to include a test for this case.

@AlexTMjugador
Copy link
Contributor

AlexTMjugador commented Jul 11, 2021

FWIW, I've been using @vtavernier's fork with these changes for a while and they work like a charm. I also like the fact that they have some unit tests.

@AlexTMjugador
Copy link
Contributor

After reviewing PR #149 I realized the changes this PR proposes in str_till_eol seem to not strip the carriage return for CRLF line endings, so it could return a string with an unwanted \r character in it. I pushed a commit to my fork, ComunidadAylas@9381db4, that completely strips line endings in both cases.

@alixinne
Copy link
Contributor Author

According to the GLSL 4.60 spec, 3.1:

Lines are relevant for compiler diagnostic messages and the preprocessor. They are terminated by
carriage-return or line-feed. If both are used together, it will count as only a single line termination.
For the remainder of this document, any of these combinations is simply referred to as a new-line.
Lines may be of arbitrary length.

This means we should define a glsl_line_ending that matches either of those 4 strings:

  • \r\n
  • \n\r
  • \r
  • \n

And then we could define the other parsers (str_till_eol and pp_space0) in terms of this, stripping what glsl_line_ending matches?

@AlexTMjugador
Copy link
Contributor

That sounds perfect to me 👍

@alixinne alixinne closed this by deleting the head repository Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants