fix[parser]: fix bad tokenization of hex strings #4406

Merged
charles-cooper merged 14 commits into vyperlang:master from fix/hexstr-location on Jan 9, 2025

Conversation

charles-cooper
Member

@charles-cooper charles-cooper commented Dec 18, 2024

this commit fixes parsing of hex strings. it leaves the locations unmodified, so as to minimize the changes to locations in the source code

What I did

Fix #4405

How I did it

How to verify it

Commit message

this commit fixes parsing of hex strings. there were several issues
with the hex string pre-parser, including:

- modification of the string locations
- failure to exit the state machine when a non-string token is
  encountered.

this commit fixes the state machine and changes the pre-parser to leave
the locations of hex strings unmodified, so as to minimize the changes
to locations in the reformatted code vs the source code. to see the
effect, print out the reformatted code of the test cases included in
this PR before and after this commit.

this commit additionally adds several sanity checks to the pre-parser so
that the chance of future tokenization bugs is minimized.
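
As a rough illustration of the state-machine fix described above, here is a minimal sketch built on Python's standard `tokenize` module. It is not the code in `vyper/ast/pre_parser.py`; the names `State` and `find_hex_strings`, and the rule that the string must start exactly where the `x` ends, are assumptions made for this example. The point it shows is that the machine leaves the "saw `x`" state after exactly one token, whether or not that token is a string, and that it only records locations without rewriting them.

```python
import io
import tokenize
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SAW_X = auto()  # the previous token was the bare name `x`


def find_hex_strings(source: str) -> list[tuple[int, int]]:
    """Return the (row, col) start of each string literal written as x'...'.

    Hypothetical helper: positions are recorded, never modified.
    """
    locations = []
    state = State.IDLE
    x_end = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if state is State.SAW_X:
            # Leave SAW_X unconditionally, so a non-string token
            # cannot keep the machine armed for a later string.
            state = State.IDLE
            if tok.type == tokenize.STRING and tok.start == x_end:
                locations.append(tok.start)
                continue
        if tok.type == tokenize.NAME and tok.string == "x":
            state = State.SAW_X
            x_end = tok.end
    return locations
```

With this sketch, `find_hex_strings("a = x'baba'\n")` reports one location, while a variable named `x` followed later by an ordinary string, as in `"x = 1\nb = 'baba'\n"`, reports none.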

Description for the changelog


@charles-cooper charles-cooper marked this pull request as ready for review December 18, 2024 04:51
@cyberthirst cyberthirst added the "release - must release blocker" label on Dec 27, 2024
@cyberthirst
Collaborator

please add tests for all 3 error cases as they are listed in #4405

@charles-cooper charles-cooper changed the title from "fix[parser]: fix bad fixup of hex strings" to "fix[parser]: fix bad tokenization of hex strings" on Jan 1, 2025
@cyberthirst
Collaborator

the error for invalid hex digits is not very informative:

def test() -> Bytes[10]:
    a: Bytes[10] = x'gloglo'
    return a
ValueError: non-hexadecimal number found in fromhex() arg at position 0
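
The message above is the stock `ValueError` raised by Python's `bytes.fromhex`. One way a compiler could surface something friendlier is to validate the literal body before decoding it. The sketch below is only an illustration: `InvalidHexLiteral`, `decode_hex_literal` and its `lineno` / `col_offset` parameters are hypothetical names, not Vyper APIs. Note that this stricter check also rejects whitespace, which `bytes.fromhex` would skip.

```python
import string


class InvalidHexLiteral(ValueError):
    """Hypothetical error type carrying a source position."""


def decode_hex_literal(body: str, lineno: int, col_offset: int) -> bytes:
    """Decode the text between the quotes of x'...' with a positioned error."""
    for i, ch in enumerate(body):
        if ch not in string.hexdigits:
            raise InvalidHexLiteral(
                f"invalid character {ch!r} in hex string "
                f"at line {lineno}, column {col_offset + i}"
            )
    if len(body) % 2 != 0:
        raise InvalidHexLiteral(
            f"hex string at line {lineno} needs an even number of digits"
        )
    return bytes.fromhex(body)
```

Called as `decode_hex_literal("gloglo", 2, 21)`, this raises an error that names the offending character and where it sits in the source.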

@cyberthirst
Collaborator

crashes on our grammar:

def test_hexstring(get_contract):
    src = """
@external
def test() -> Bytes[10]:
    a: Bytes[10] = x \
    'baba'
    return a
    """

    c = get_contract(src)

    assert c.test() == b'\xba\xba'
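
One detail worth keeping in mind when reading the test above: `src` is a regular (non-raw) triple-quoted Python string, so Python itself removes the backslash-newline pair before the contract source is ever compiled. The sketch below uses Python's `tokenize` module as a stand-in tokenizer to show the difference between the two spellings; `PLAIN`, `RAW` and `name_and_string_tokens` are names invented for this example.

```python
import io
import tokenize

# Non-raw literal: Python drops the backslash-newline pair.
PLAIN = """a = x \
'baba'
"""

# Raw literal: the backslash and the newline both survive.
RAW = r"""a = x \
'baba'
"""


def name_and_string_tokens(source):
    """List (token type, text, starting row) for NAME and STRING tokens."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tokenize.tok_name[tok.type], tok.string, tok.start[0])
        for tok in tokens
        if tok.type in (tokenize.NAME, tokenize.STRING)
    ]
```

For `PLAIN` the string token sits on the same row as `x`, separated only by whitespace; for `RAW` it starts on the following row. If the intent is to exercise a real line continuation between `x` and the string, the test source would presumably need a raw string or an escaped backslash.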

@cyberthirst
Collaborator

what about we pop the hex_string_location in parse and assert at the end that the locations are empty? as a sanity check to prevent confusion between str and hexbytes

@cyberthirst
Collaborator

questionable, but maybe ok..

this compiles

def test() -> Bytes[10]:
    x: Bytes[10] = x'  10  '
    return x
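
A likely explanation, assuming the hex string body is decoded with Python's `bytes.fromhex` (the `fromhex()` error message quoted earlier in this thread suggests it is): `bytes.fromhex` skips ASCII whitespace, so padded input decodes without complaint. A quick check in plain Python:

```python
# bytes.fromhex ignores ASCII whitespace around and between byte pairs.
padded = bytes.fromhex("  10  ")
spaced = bytes.fromhex("ba ba")

assert padded == b"\x10"
assert spaced == b"\xba\xba"
```

Whether the compiler should accept such literals is a separate design question; the snippet only shows where the leniency would come from under that assumption.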

@charles-cooper
Member Author

what about we pop the hex_string_location in parse and assert at the end that the locations are empty? as a sanity check to prevent confusion between str and hexbytes

good idea, added here 24bb91d
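
For readers who want a picture of the sanity check being discussed, here is a minimal sketch using Python's own `ast` module. It is not the code from the referenced commit; `collect_hex_nodes` and the shape of `hex_string_locations` (a list of `(lineno, col_offset)` pairs) are assumptions for illustration. Each recorded location is consumed when a matching string node is found, and anything left over at the end trips an assertion.

```python
import ast


def collect_hex_nodes(tree: ast.AST, hex_string_locations):
    """Match recorded locations to string nodes, then assert none are left."""
    remaining = list(hex_string_locations)
    hex_nodes = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            key = (node.lineno, node.col_offset)
            if key in remaining:
                # Consume the location so it cannot match twice.
                remaining.remove(key)
                hex_nodes.append(node)
    # A leftover location means the pre-parser and parser disagree.
    assert not remaining, f"unconsumed hex string locations: {remaining}"
    return hex_nodes
```

The value of the final assertion is that a mismatch between tokenization and parsing fails loudly instead of silently treating a hex string as an ordinary string or the reverse.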

codecov bot commented Jan 5, 2025

Codecov Report

Attention: Patch coverage is 16.66667% with 15 lines in your changes missing coverage. Please review.

Project coverage is 44.91%. Comparing base (194d60a) to head (04faac3).
Report is 4 commits behind head on master.

Files with missing lines    Patch %    Lines
vyper/ast/pre_parser.py     12.50%     13 Missing and 1 partial ⚠️
vyper/ast/parse.py          50.00%     1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (194d60a) and HEAD (04faac3).

HEAD has 137 fewer uploads than BASE.

Flag    BASE (194d60a)    HEAD (04faac3)
        138               1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #4406       +/-   ##
===========================================
- Coverage   91.86%   44.91%   -46.96%     
===========================================
  Files         119      119               
  Lines       16640    16677       +37     
  Branches     2801     2806        +5     
===========================================
- Hits        15286     7490     -7796     
- Misses        929     8623     +7694     
- Partials      425      564      +139     


@charles-cooper
Member Author

questionable, but maybe ok..

this compiles

def test() -> Bytes[10]:
    x: Bytes[10] = x'  10  '
    return x

i'd say even if this is an issue, it's out of scope because it's not related to the tokenization changes in this PR

@charles-cooper charles-cooper merged commit 9db1546 into vyperlang:master Jan 9, 2025
157 checks passed
@charles-cooper charles-cooper deleted the fix/hexstr-location branch January 9, 2025 16:51
Labels
release - must release blocker
Development

Successfully merging this pull request may close these issues.

Incorrect HexString Parsing Leads To Compilation Error Or Type Confusion
2 participants