Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lower the boom on the mess in the saw-script typechecker #2157

Merged
merged 22 commits into from
Dec 17, 2024

Conversation

sauclovian-g
Copy link
Contributor

No description provided.

No functional change intended.
No functional change intended, but could have happened by accident.
When checking a function call, and it's wrong, we expect the error to
be reported as the arguments having the wrong type, not the function
being the wrong type for the arguments we provided.

Therefore, check an application by matching the argument against the
function, not the function against the arguments. In particular, we
should generate a fresh argument type and unify it with the type of
the function, _then_ unify it with the argument expression, not the
other way around.

Someone stepped on this today internally and it was confusing all
around.

Add a test case in test_type_errors.
@sauclovian-g sauclovian-g added type: bug Issues reporting bugs or unexpected/unwanted behavior type: enhancement Issues describing an improvement to an existing feature or capability topics: error-messages Issues involving the messages SAW produces on error usability An issue that impedes efficient understanding and use tech debt Issues that document or involve technical debt topics: error-handling Issues involving the way SAW responds to an error condition labels Dec 11, 2024
This field used to store a copy of the monad type of the do-block the
StmtBind appears in. This is a reasonable thing to have, but it was
creating complications in the typechecker... and it turns out it's
never used. So there's no reason to keep it around.
Check types for being well formed as well as having the expected kind.

This fixes some of the cases where you can freely use unknown typedef
names and they turn into fresh type variables.
This is somewhat messy and will need further cleanup later.
…them.

Export enough logic from the typechecker to make this possible.

This is a hack; doing it properly will take a good bit of further cleanup.

Add some pertinent tests.

Closes #2077.
@sauclovian-g
Copy link
Contributor Author

Provided the CI doesn't fail, this probably ought to get reviewed and merged before I go on.

You'll almost certainly want to review it commit by commit; the overall diff to MGU.hs is completely unreadable.

Apparently whether zipWithM appears otherwise depends on the GHC
version.
Copy link
Contributor

@RyanGlScott RyanGlScott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a first pass, although I will admit that my review did not thoroughly examine some of the denser parts of the code. Please let me know if there are parts that you think warrant further scrutiny.

src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
intTests/test_type_errors/err005.saw Show resolved Hide resolved
(and noting that some of your scripts might break)
Fixes a version skew problem where zipWithM has different provenance
in different builds, which then leads to warnings if the imports are
only partially restricted.
Clarify/add comments, make a couple minor adjustments. Some of this is
provisional and may change again.
@sauclovian-g
Copy link
Contributor Author

On the further/deeper scrutiny topic: the one to be suspicious of is 01c4ca8, particularly the statement checking; it's unlikely that any of the other ones that weren't supposed to cause functional changes actually did, and the other changes afterwards are all relatively straightforward.

@RyanGlScott
Copy link
Contributor

On the further/deeper scrutiny topic: the one to be suspicious of is 01c4ca8, particularly the statement checking; it's unlikely that any of the other ones that weren't supposed to cause functional changes actually did, and the other changes afterwards are all relatively straightforward.

I take it that you're referring to test suite failures such as this one?

[23:54:57.507] /workdir/memory-safety/salsa20/solution.saw:3:32-3:40: Unbound type variable LLVMType

That is worrisome indeed. I decided to see how this worked to begin with, and it's quite miraculous that it ever did:

  • Unlike, say, LLVMSpec (which is hard-coded as a type constant), LLVMType is not hard-coded anywhere. How, then, does SAW know how to treat LLVMType differently from other types?

  • The only knowledge of LLVMType that SAW has is that it is listed in the types of various primitives in SAWScript.Interpreter. That turns out to be of vital importance...

  • ...here, which builds the initial TopLevel RW environment. If you trace through the code here, you'll see that it is constructing an initial rwTypes map by mapping each primitive's name to its type (as specified in SAWScript.Interpreter).

    As one example, consider the primitive "llvm_float" "LLVMType". The type is specified as a string, so we need a way to turn the string into a full-blown AST Type. That is done using SAWScript's parser. Specifically, when it sees a type name that is not hard-coded (such as LLVMSpec), it assumes that it is a TyVar:

    BaseType :: { Type }
    : name { tVar (getPos $1) (tokStr $1) }

    TyVar is a somewhat interesting choice, as we will see later.

  • Now let's suppose we want to define a SAWScript function of the following type:

    let f (x : LLVMType) = x;
    

    There are two conceivable types that SAW could give to f:

    1. {LLVMType} LLVMType -> LLVMType (i.e., LLVMType is a forall-quantified type variable)
    2. LLVMType -> LLVMType (i.e., LLVMType is a type constructor)

    SAW chooses which type to give f in the generalize function in SAWScript.MGU.

  • generalize determines which variables are already in scope (and therefore do not need to be generalized) by consulting the type environment here:

    envUnify <- unifyVarsInEnv
    envNamed <- namedVarsInEnv
    let is = M.toList (unifyVars ts M.\\ envUnify)
    let bs = M.toList (namedVars ts M.\\ envNamed)

    If you trace back through this code, the type environment ultimately comes from the initial TopLevel RW, which contains the types of all primitives. Moreover, there is at least one primitive whose type contains a TyVar with the name LLVMType, which means that as far as generalize is concerned, LLVMType is already in scope. As such, generalize will pick the type LLVMType -> LLVMType (without a forall) for f.

Phew, that was a lot! (Apologies if I am repeating things that you already know here, but I found it useful to write this up for my own understanding.)


A related question: what changed in this PR? My guess is that commit 91f9bbf subtly broke the delicate arrangement that allowed this to work previously. Specifically, the checkType case for TyVar pos x looks up the type variable x in the typedefEnv, and if it's not found, it throws the Unbound type variable error seen above. As far as I understand, this is a check that was simply not performed before this PR, and functions like f above will now fail this check since LLVMType is not in the typedefEnv.

One possible (albeit hacky) way to restore the old behavior prior to this PR would be to see if x is in the namedVars of the typeEnv (which contains primitives that use the name LLVMType in their types). Alternatively, we could make LLVMType et al. hard-coded constants, which avoid this problem in the first place. I am somewhat more fond of this alternative idea, as it feels a bit more principled, although it would likely be more work to implement. Let me know what you think.

@kiniry
Copy link
Member

kiniry commented Dec 13, 2024

One possible (albeit hacky) way to restore the old behavior prior to this PR would be to see if x is in the namedVars of the typeEnv (which contains primitives that use the name LLVMType in their types).

"hacky" a thousand times is what got us into this mess, IMO.

@sauclovian-g
Copy link
Contributor Author

I take it that you're referring to test suite failures such as this one?

Actually, unfortunately, I wasn't -- was speaking in general. I didn't mean for you to have to go investigate that problem yourself; I posted elsewhere about it but I guess you saw this first.

However, that explains what the intended purpose of the code at lines 844-847 is. (See #2105, wherein it fails to handle user typedefs.) Note that this relies on the fact that the primitives in the primitives table are not actually declared via the typechecker; if it were, the types like LLVMType would get generalized and forall-bound.

Anyway the right way to fix this is to declare the abstract types. I've been thinking about what the best way to represent them is, and I'll push a draft version shortly.

Add the ability to have aliases of commands, and make :t an alias for
:type; this way the existence of :tenv doesn't prevent you from using
:t for :type.

Along with the previous commit, closes #2161.
…anups

Rename concatSubst to mergeSubst, which seems like a better name for it.

Use pattern guards the way Ryan likes them.
The former is the typing environment for values, that is, maps
variable names to their types. Calling it rwTypes is perfectly correct
but also very misleading in the sense that it's easy to think it maps
type names to their definitions. Since the key type is String there's
no obvious obstacle to this interpretation, and while I haven't found
any mistakes of this form (yet), they shouldn't be encouraged.

Meanwhile, I'm about to extend rwTypedef (which _does_ map from type
names to type information) so it doesn't only contain typedef names,
and consequently it ought to get a different name... and the natural
name is rwTypes. But that clearly won't go. So pick something else.

If anyone has better suggestions, I'm happy to hear them.

No functional change intended.
…types.

Don't populate it with any yet, though.
Create a table of the builtin types that aren't already special cases
in AST.hs, and initialize the named type environment with them.

(Note that some of the special cases in AST.hs probably don't need to
be special cases there and can be moved to this table, but that's for
later.)

This makes the types like LLVMType work again after the
typedef-checking changes.

Note that I haven't yet arranged to typecheck the primitives as part
of loading them. That's coming up, but needs some other fixes first.
Copy link
Contributor

@RyanGlScott RyanGlScott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there!

src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/Interpreter.hs Outdated Show resolved Hide resolved
src/SAWScript/AST.hs Outdated Show resolved Hide resolved
src/SAWScript/Interpreter.hs Outdated Show resolved Hide resolved
saw/SAWScript/REPL/Trie.hs Outdated Show resolved Hide resolved
src/SAWScript/MGU.hs Outdated Show resolved Hide resolved
src/SAWScript/Interpreter.hs Outdated Show resolved Hide resolved
Copy link
Contributor

@RyanGlScott RyanGlScott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work, @sauclovian-g. I'm eager to see this land!

@sauclovian-g
Copy link
Contributor Author

Yay! 😸 I'll merge it once the CI finishes.

@sauclovian-g sauclovian-g merged commit 4658e03 into master Dec 17, 2024
34 checks passed
@sauclovian-g sauclovian-g deleted the dholland-typechecker branch December 17, 2024 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tech debt Issues that document or involve technical debt topics: error-handling Issues involving the way SAW responds to an error condition topics: error-messages Issues involving the messages SAW produces on error type: bug Issues reporting bugs or unexpected/unwanted behavior type: enhancement Issues describing an improvement to an existing feature or capability usability An issue that impedes efficient understanding and use
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants