The codebase of HMℓ (HMloc) has all basic components of a type checker: lexer, parser, typer (but no code generation). For testing, there is a test suite and a web demo.
We now give a high-level introduction to each compiler component and its correspondence to
our Scala sources. Note that source file paths are rooted in /shared/src/main/scala/hmloc
.
Class Lexer
in Lexer.scala
is the lexer class. It takes an origin
object
which contains the original source string together with the source file name,
the number of the first line, and some helper functions. Each token has information
about the source code location where it was introduced.
Scala native types are used as token types.
Class OcamlParser
in OcamlParser.scala
is the parser class. It uses parser
combinators functions to parse a subset of OCaml syntax. It consumes the source
and produces terms. Method pgrm
splits the source by the separator
into blocks that are parsed by toplvl
method. toplvl
parses type declarations,
function declarations or other terms.
File syntax.scala
contains surface syntax data types of HMloc
which are immutable, and different from internal representations in the
typer for later type inference. Here we introduce several surface syntax
data types:
- Class
TypeDef
defines type and data constructors.TypeDefKind
is the kind of the declaration, aCls
for data constructor,Als
for type constructor. - Class
Def
defines top level functions - Class
Term
includes HMloc term data types. - Class
IfBody
terms for the body of an if-then-else and match-with term. - Class
Type
defines surface level types.
The typer accepts the abstract syntax tree of a program and performs type inference. It tracks source locations of the program terms. It wraps the inferred type with the provenance that contains the source location of the corresponding terms. For more information about the type system, see section 3 of the paper.
The corresponding files are:
Typer.scala
contains the main typer class.TyperDatatypes.scala
contains classTyperDatatypes
which includes data types to support internal representation of types with mutable states to supportTypeDefs.scala
contains classTypeDefs
and methods for declaring data types.TypeSimplifier.scala
contains type simplification algorithms to simplify inferred types.TyperHelpers.scala
contains classTyperHelpers
that provides helper methods for the typer.
The typer (class Typer
) works with a typing context (class Ctx
) which
mainly maintains all global and local bindings of names to their types.
The typer accepts a type (surface level), a top level definition, or type constructor
definitions or terms. It types the term/statement and returns the resulting type.
All the typing errors are collected and reported at once. Any new bindings
and definitions are added to the context. The resulting type
is sent to the type simplifier and is finally expanded, i.e., converted
back to types in the surface syntax for presentation.
The Typer
also defines builtin bindings and types.
TyperDatatypes.scala
defines internal representation of types.
- Class
PolymorphicType
represents a type with universally quantified type variables. By convention, in the type body, type variables of levels greater than the polymorphic type's level are polymorphic. - Class
SimpleType
is a general type form of all types. It has a fieldlevel
for level-based polymorphism.- Class
FunctionType
is a function type. - Class
TupleType
is a tuple type. The tuple's fields are a list of types. - Class
ProvType
is a derived type form to store more type provenance information. - Class
TypeRef
is a reference to named types - builtin or user-defined It has a list of type arguments, for parametric type definitions. - Class
TypeVariable
represents a type variable, which is unified with other types for type inference. It has mutable state to store types it's been unified with. - Class
RigidTypeVariable
is used when unifying type signatures with inferred types. They can only be unified with themselves.
- Class
- Class
TypeProvenance
stores the source code location where a type is introduced.
It has two methods for typing -
typeTerm
accepts a term and types it. It also tracks the source code location of the term using aTypeProvenance
. When two inferred types need to be unified it generates a data flow constraint and passes it on to theUnificationSolver
typeType
accepts a surface level type and converts it into an internal type representation.
TypeDefs.scala
defines the internal representation of classes and aliases. It
defines the processTypeDefs
method which type checks new type definitions and
adds them to the Ctx
.
UnificationSolver.scala
defines the UnificationSolver
class which unifies types.
Unification is particularly interesting because it tracks the data flow through
source code locations. As described in section 4 of the corresponding paper, it uses
the provenance of types, which contains the source code locations,
to track the data flow.
-
class
Unification
stores a sequence of data flows that shows how two types are unified -
abstract class
DataFlow
shows how two types are unified- class
Constraint
is the smallest data flow and shows two types unified together - class
Constructor
is a data flow that unifies the arguments of a constructor type tuple type, function type, record types etc.
- class
-
method
unify
unifies two types and propagates any newly generated constraints -
method
createErrorMessage
creates the textual representation of the error if two types cannot be unified.
Unification logic can be broadly described as follows -
- A type can be unified with a type variable with a
Constraint
. The unification is recorded in with type variable. The type is unified with all other types the type variable has been unified with. - Two primitives can be unified with a
Constraint
if they are equal otherwise it's an error - Two constructor types can be unified with a
Constraint
and their type arguments can be unified with aConstructor
data flow, if they are equal otherwise it's an error