Suggestions: New features for version 0.2.0? #1

Open
Evelyn-H opened this issue Apr 26, 2019 · 8 comments

Comments

@Evelyn-H
Collaborator

Hey,
Just putting this issue here to list suggestions / ideas for the next version.

  1. port everything to use nom 5.0 (alpha)
  2. ...
@zypeh

zypeh commented Apr 29, 2019

I think it would be nice to add a parsing guard feature in the next version.

E.g. (just my 2 cents):

html =
"<" <tag1: string> ">"
value
"<" "/" <tag1: string> ">"

Basically, it adds a runtime constraint that requires the closing tag to match the opening tag's name. It could also be used as a combinator.

@Evelyn-H
Collaborator Author

Evelyn-H commented Apr 29, 2019

@zypeh
Hm, I'm not sure I entirely get what you mean. Could you provide a bit more in-depth example?

Edit: Do you mean a way to ensure that the two strings/tags parsed are equal, and make the parser fail otherwise?

This would definitely be useful for parsing non-regular grammars

@zypeh

zypeh commented Apr 30, 2019

@Evelyn-H

> Do you mean a way to ensure that the two strings/tags parsed are equal, and make the parser fail otherwise?

Yes. 😃 The example I gave was not very precise, my bad. But I would like to know how to implement this feature.

@Evelyn-H
Collaborator Author

Evelyn-H commented Apr 30, 2019

Hm, I'd probably make it a bit more general and optionally allow the code block at the end to return a Result.

Maybe something like this:

html =
"<" <left_tag: string> ">" 
value
"<" "/" <right_tag: string> ">"
=> ?{
    if left_tag == right_tag { Ok(result) } else { Err(error) }
}

Note the ? before the code block to signify that it returns a Result. (Just an idea, definitely not the final syntax)
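
To make the idea concrete, here is a rough hand-written equivalent in plain Rust, without nom or the grammar! macro. It is only a sketch of the semantics such a `=> ?{ ... }` block might have, not how nom-peg would implement it; `Element`, `tag`, and the `String` error type are hypothetical names chosen for the illustration.

```rust
/// Hypothetical result type for this sketch: the tag name and the enclosed text.
#[derive(Debug, PartialEq)]
pub struct Element<'a> {
    pub tag: &'a str,
    pub value: &'a str,
}

/// Parses `<name>` (or `</name>` when `closing` is true) and returns the
/// name together with the remaining input.
fn tag(input: &str, closing: bool) -> Result<(&str, &str), String> {
    let prefix = if closing { "</" } else { "<" };
    let rest = input
        .strip_prefix(prefix)
        .ok_or_else(|| format!("expected `{}`", prefix))?;
    let end = rest.find('>').ok_or_else(|| "expected `>`".to_string())?;
    Ok((&rest[..end], &rest[end + 1..]))
}

/// Parses `<tag>value</tag>`. The final `if` plays the role of the proposed
/// fallible code block: it can reject an otherwise successful parse.
pub fn html(input: &str) -> Result<Element<'_>, String> {
    let (left_tag, rest) = tag(input, false)?;
    let split = rest
        .find('<')
        .ok_or_else(|| "expected a closing tag".to_string())?;
    let (value, rest) = rest.split_at(split);
    let (right_tag, _rest) = tag(rest, true)?;
    if left_tag == right_tag {
        Ok(Element { tag: left_tag, value })
    } else {
        Err(format!("mismatched tags: `{}` vs `{}`", left_tag, right_tag))
    }
}
```

With this sketch, `html("<b>hi</b>")` succeeds, while `html("<b>hi</i>")` returns an error even though every individual piece parsed fine, which is the behaviour the guard is meant to provide.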

@jgall
Contributor

jgall commented Sep 2, 2019

It might be nice to be able to specify ranges of characters. I'm not entirely sure what the best way to do this would be, but something along the following would be really nice.

textdata = (' '-'!'|'#'-'+'|'-'-'~')

as an alternative to manually writing the nom parser for this, which would look like the following:

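use nom::{AsChar, IResult, InputTakeAtPosition};

pub fn textdata<T>(input: T) -> IResult<T, T>
where
    T: InputTakeAtPosition,
    <T as InputTakeAtPosition>::Item: AsChar,
{
    // split_at_position splits at the first item for which the predicate is
    // true, so the predicate has to flag the first non-TEXTDATA character.
    input.split_at_position(|item| !is_textdata(item.as_char()))
}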

/// TEXTDATA as seen here: https://tools.ietf.org/html/rfc4180#section-2
fn is_textdata(input: char) -> bool {
    (' ' <= input && input <= '!')
        || ('#' <= input && input <= '+')
        || ('-' <= input && input <= '~')
}

I'm also not sure whether it would be better to use characters here, or strings, or the numeric values (i.e. 0x20-0x21). Another alternative could be to use the re_capture! macro in nom.
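
For comparison, the same character class can be written with Rust's inclusive range patterns, which read fairly close to the proposed grammar syntax. The sketch below uses only the standard library and works on `&str`; the `(matched, rest)` return shape and the choice to reject an empty match are assumptions made for the illustration, not nom's behaviour.

```rust
/// TEXTDATA from RFC 4180, section 2: %x20-21 / %x23-2B / %x2D-7E.
fn is_textdata(c: char) -> bool {
    matches!(c, ' '..='!' | '#'..='+' | '-'..='~')
}

/// Splits `input` into the longest prefix of TEXTDATA characters and the rest.
/// Returns `None` when that prefix is empty (a design choice for this sketch).
pub fn textdata(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !is_textdata(c))
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```

Here `textdata("abc,def")` stops at the comma (0x2C is excluded), and a leading double quote (0x22) produces no match at all.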

@Evelyn-H
Collaborator Author

Evelyn-H commented Sep 2, 2019

Yeah, I've been thinking about this too, it's definitely one of the next features I wanna add.

Having a specific syntax for nom-peg would significantly increase the complexity of the procedural macro parsing code though, which is already quite complex.
So I had the same idea of maybe outsourcing it to the regex macros/functions in nom, but I haven't looked into it in detail yet.

One potential problem with that approach could be that, if I remember correctly, the semantics of regexes are slightly different from those of PEG grammars. So interspersing them could become confusing and produce unintuitive results.
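
One well-known instance of that difference is ordered choice. A backtracking regex such as `(a|ab)c` matches `abc`, because the engine can go back and try the second alternative. In a PEG, `("a" / "ab") "c"` fails on the same input: once `"a"` has matched, the choice is committed. The sketch below hand-codes that PEG rule with the standard library only; `lit`, `a_or_ab`, and `peg_rule` are hypothetical helper names.

```rust
/// Matches a literal at the start of `input`, returning the remaining input.
fn lit<'a>(input: &'a str, s: &str) -> Option<&'a str> {
    input.strip_prefix(s)
}

/// PEG ordered choice `"a" / "ab"`: the first alternative that matches wins,
/// and the choice is never revisited afterwards.
fn a_or_ab(input: &str) -> Option<&str> {
    lit(input, "a").or_else(|| lit(input, "ab"))
}

/// PEG sequence `("a" / "ab") "c"`.
pub fn peg_rule(input: &str) -> Option<&str> {
    let rest = a_or_ab(input)?;
    lit(rest, "c")
}
```

`peg_rule("ac")` succeeds, but `peg_rule("abc")` returns `None`, since the choice consumed only `a` and the following `"c"` then sees `bc`.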

@jgall
Contributor

jgall commented Sep 2, 2019

Normally, if I were just writing a nom parser and had repeated occurrences of this "all chars in x range" pattern, I'd write a function like the following:

pub fn in_range<T>(a: char, b: char) -> impl Fn(T) -> IResult<T, T>
where
    T: InputTakeAtPosition,
    <T as InputTakeAtPosition>::Item: AsChar,
{
    // Split at the first item that falls outside the range a..=b.
    move |input: T| input.split_at_position(|item| !between(item.as_char(), a, b))
}

fn between(input: char, start: char, end: char) -> bool {
    start <= input && input <= end
}

However, within the pattern declaration portion of the grammar! macro I am unable to call Rust functions.
The code below does not compile:

textdata: &'input str = (in_range(' ', '!')|"#")

Maybe allowing some syntax for pure Rust blocks that return nom parsers within the macro would be an alternative way to implement this.

@Evelyn-H
Collaborator Author

Evelyn-H commented Sep 3, 2019

Adding support for calling arbitrary nom parser functions in the grammar! macro is definitely planned too. This way you could write a regular nom function and include it as a nonterminal in the grammar.
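
As a rough picture of what "a regular function as a nonterminal" could mean, the sketch below composes a parser-returning function with a literal using plain Rust and the standard library. It does not use nom or the grammar! macro, and the `(matched, rest)` result shape as well as the names `in_range`, `hash`, `alt`, and `textdata` are assumptions for the illustration only.

```rust
/// Returns a parser that consumes one or more characters in `a..=b` and
/// yields `(matched, rest)`.
pub fn in_range(a: char, b: char) -> impl for<'a> Fn(&'a str) -> Option<(&'a str, &'a str)> {
    move |input: &str| {
        let end = input
            .find(|c: char| !(a..=b).contains(&c))
            .unwrap_or(input.len());
        if end == 0 {
            None
        } else {
            Some(input.split_at(end))
        }
    }
}

/// Parser for the literal "#".
fn hash(input: &str) -> Option<(&str, &str)> {
    if input.starts_with('#') {
        Some(input.split_at(1))
    } else {
        None
    }
}

/// Tries `first`, then `second`, like the PEG ordered choice `first / second`.
pub fn alt<'a>(
    first: impl Fn(&'a str) -> Option<(&'a str, &'a str)>,
    second: impl Fn(&'a str) -> Option<(&'a str, &'a str)>,
    input: &'a str,
) -> Option<(&'a str, &'a str)> {
    first(input).or_else(|| second(input))
}

/// Stand-in for the grammar rule `textdata = (in_range(' ', '!') | "#")`.
pub fn textdata(input: &str) -> Option<(&str, &str)> {
    alt(in_range(' ', '!'), hash, input)
}
```

The point of the sketch is that `in_range(' ', '!')` is an ordinary function call whose result is used exactly like any other rule, which is what the macro would need to allow.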

3 participants