Add script to detect and clean up non-idiomatic grammars. #4

Open
kaby76 opened this issue Oct 25, 2024 · 2 comments

kaby76 commented Oct 25, 2024

See antlr/grammars-v4#4291 (comment)

kaby76 commented Oct 29, 2024

Bison/Yacc does not have the equivalent of an optional element, i.e., there is no ?-operator. https://www.gnu.org/software/bison/manual/bison.html

After converting a Bison grammar to Antlr, it would be best to convert the non-idiomatic usages to idiomatic Antlr. There should be scripts to rewrite empty alternatives into rules that use the ?-operator.
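
To make the intended rewrite concrete, here is a minimal sketch in Python on a toy grammar model. It is an illustration only, not how the Trash-based scripts work: the dict-of-lists representation, the function name `make_optional`, and the rule names in the example are all assumptions made for the sketch. A rule that has an empty alternative next to at least one non-empty alternative loses the empty alternative, and every plain reference to it (one with no `?`, `*`, or `+` already attached) gets a `?` appended.

```python
def make_optional(grammar):
    """Rewrite empty alternatives into ?-operator uses (toy model, hypothetical).

    grammar maps a rule name to a list of alternatives; each alternative is a
    list of symbol strings, and [] stands for an empty alternative.
    """
    # Rules with an explicit empty alternative AND at least one non-empty one.
    # Rules whose only alternative is empty are left untouched in this sketch.
    targets = {rule for rule, alts in grammar.items() if [] in alts and any(alts)}

    result = {}
    for rule, alts in grammar.items():
        new_alts = []
        for alt in alts:
            if rule in targets and not alt:
                continue  # drop the empty alternative of a target rule
            # Symbols that already carry a suffix (e.g. "x*") do not equal a
            # rule name, so only plain references receive the "?".
            new_alts.append([s + "?" if s in targets else s for s in alt])
        result[rule] = new_alts
    return result


# Bison-style input:  decl : type opt_init SEMI ;  opt_init : ASSIGN expr | ;
grammar = {
    "decl": [["type", "opt_init", "SEMI"]],
    "opt_init": [["ASSIGN", "expr"], []],
}
print(make_optional(grammar))
# {'decl': [['type', 'opt_init?', 'SEMI']], 'opt_init': [['ASSIGN', 'expr']]}
```

In Antlr terms the example goes from `decl : type opt_init SEMI ; opt_init : ASSIGN expr | ;` to `decl : type opt_init? SEMI ; opt_init : ASSIGN expr ;`.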

So far the scripts are:

  • Detect:
#
# set -x
set -e
set -o pipefail
# Visit every grammar directory (each one contains a desc.xml).
for dir in $(find . -name desc.xml | sed 's#/desc.xml##' | sort -u)
do
	# Find rules that contain a top-level empty alternative.
	# Note: this is incomplete, because an alternative may be non-empty yet still derive empty.
	dotnet trparse -l "$dir"/*.g4 2>/dev/null | dotnet trxgrep ' //parserRuleSpec[./ruleBlock/ruleAltList/labeledAlt/alternative[count(./*) = 0]]/RULE_REF' | dotnet trcaret
done

  • Fix:
#
# set -x
set -e
set -o pipefail
# Visit every grammar directory (each one contains a desc.xml).
for dir in $(find . -name desc.xml | sed 's#/desc.xml##' | sort -u)
do
	echo "$dir"
	# Find rules that contain a top-level empty alternative.
	# Note: this is incomplete, because an alternative may be non-empty yet still derive empty.
	dotnet trparse -l "$dir"/*.g4 2>/dev/null > save.pt
	dotnet trxgrep ' //parserRuleSpec[./ruleBlock/ruleAltList/labeledAlt/alternative[count(./*) = 0]]/RULE_REF/text()' < save.pt > rules.txt
	for r in $(cat rules.txt)
	do
		# Strip any CR/LF left over from the tool output.
		rr=$(printf '%s' "$r" | tr -d '\r\n')
		echo "Working on $rr"
		# Find uses of the rule that have no EBNF operator applied yet, and append '?'.
		dotnet trparse "$dir"/*.g4 2>/dev/null | dotnet trquery replace ' //parserRuleSpec/ruleBlock//element[./atom and not(./ebnfSuffix)]/atom/ruleref/RULE_REF[./text() = "'$rr'"]' "'$rr?'" | dotnet trsponge -o "$dir" -c
		# Remove the empty alternative in the rule, together with its adjacent '|':
		# first the case where the '|' precedes it, then the case where the '|' follows it.
		dotnet trparse "$dir"/*.g4 2>/dev/null | dotnet trquery delete "  //parserRuleSpec[RULE_REF/text() = '$rr']/ruleBlock/ruleAltList/labeledAlt[alternative/count(./*) = 0 and ./preceding-sibling::*[last()]/self::OR]/(. | ./preceding-sibling::OR[last()])" | dotnet trsponge -o "$dir" -c
		dotnet trparse "$dir"/*.g4 2>/dev/null | dotnet trquery delete "  //parserRuleSpec[RULE_REF/text() = '$rr']/ruleBlock/ruleAltList/labeledAlt[alternative/count(./*) = 0 and ./following-sibling::*[1]/self::OR]/(. | ./following-sibling::OR[1])" | dotnet trsponge -o "$dir" -c
	done
done
  • Rename:

kaby76 changed the title from "Add script to detect and clean up empty 'alternative' grammars." to "Add script to detect and clean up non-idiomatic grammars." on Oct 29, 2024

kaby76 commented Oct 31, 2024

After cleaning up the non-idiomatic empty productions, I wrote a trnullable app to determine whether a parser rule is nullable. It does the analysis purely with a visitor over the grammar AST, in contrast to the ATN-based method used in the Antlr4 Tool (checkEpsilonClosure). This is fine because the two are dual solutions to the same problem. (Note: there was a paper that made the same observation, namely that you never need to construct ATNs.)
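
For readers unfamiliar with nullability analysis done directly on the grammar, the following is a minimal sketch of the standard fixed-point computation in Python. It is not the trnullable implementation, and it is not Antlr's checkEpsilonClosure: the toy grammar model (a dict from rule name to a list of alternatives, each a list of symbol strings), the function names, and the example rules are assumptions made for illustration. It also covers the case the detection script notes as missed, where an alternative is not empty but can still derive empty.

```python
def symbol_is_nullable(symbol, nullable):
    """True if one grammar symbol can match the empty string (toy model)."""
    if symbol.endswith(("?", "*")):
        return True  # optional or zero-or-more: always nullable
    # "x+" is nullable exactly when "x" is; tokens are never in the set.
    return symbol.rstrip("+") in nullable


def nullable_rules(grammar):
    """Return the set of parser rules that can derive the empty string.

    grammar maps a parser rule name to a list of alternatives; each
    alternative is a list of symbol strings, and [] is an empty alternative.
    """
    nullable = set()
    changed = True
    while changed:  # iterate until no new nullable rule is found
        changed = False
        for rule, alts in grammar.items():
            if rule in nullable:
                continue
            # A rule is nullable if every symbol of some alternative is
            # nullable; all() over an empty alternative is True.
            if any(all(symbol_is_nullable(s, nullable) for s in alt) for alt in alts):
                nullable.add(rule)
                changed = True
    return nullable


grammar = {
    "a": [["b", "c"]],        # nullable only because b and c both are
    "b": [["X"], []],         # explicit empty alternative
    "c": [["b"], ["Y"]],      # no empty alternative, but b derives empty
    "d": [["X", "b"]],        # token X makes this non-nullable
}
print(sorted(nullable_rules(grammar)))  # ['a', 'b', 'c']
```

Here `c` and `a` have no empty alternative, so a purely syntactic check for empty alternatives would not report them, while the fixed-point pass does.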
