There can be multiple ways to model a certain type of data. When students are coming up with data definitions, they might create something equivalent to a correct structure, but they might instead create something that does not contain all the necessary data or contains possibly redundant information. We want to model these data types as graphs and try to find isomorphisms between them. For example, finding an injective but not isomorphic map between the nodes of a student solution and the nodes of the correct solution might mean that the student’s solution lacks the ability to express necessary data. On the other hand, a surjective but not isomorphic map might indicate that the student’s solution has multiple equivalent representations of the same information. In discovering properties about correct and student graphs, we hope to establish the foundation for a tool that will give students actionable feedback on their data definitions.
We used Alloy* to represent our data types and find mappings between them.
In this project, we worked with algebraic data types: each type of data can be represented as one or more variants, each of which is a collection of fields, where a field is a name to type mapping. For example, here is a (Pyret) data definition for a binary tree type:
data BinTree:
| mt
| node(val :: Number, left :: BinTree, right :: BinTree)
end
Here, the BinTree
can be either the mt
or node
variant. The mt
variant
has no fields, as empty trees contain no data, while the node
variant has
three fields: val
, the numerical data stored at the node, and left
and
right
, the children of this node. In general, our data types can be
self-referential or recursive.
Part of our project was an automated translation from these Pyret definitions
to our Alloy* sigs. First, we have an abstract Type
sig, which has a set
of Variant
s. The Variant
sig contains a sequence of Type
s, which are the
fields of the variant (a sequence in Alloy is a functional Int->Type
mapping).
We represented fields as sequences rather than sets because, as above in the
tree example, a variant might have multiple fields of the same type. Here is
an example of what the BinTree
type graph would look like:
In order to compare a student's data definition to the "correct" instructor
definition, we had three abstract sigs extend Type
: StudentType
,
InstructorType
, and BuiltinType
. StudentType
s and InstructorType
s come
from data definitions written by students and instructors, respectively, while
BuiltinType
are things like numbers and strings which neither students nor
instructors define, but both reference.
To make the process of going from Pyret data definitions to Alloy* specs easy, we've built a Chrome extension that will take a set of definitions from the Pyret editor and produce a series of sigs that represent that data. Currently, we support Pyret files with data definitions that have annotations indicating whether the definition is a student or instructor one. See the following code snippet for an example of valid syntax:
# @Instructor
data foo:
| noFields
| oneField(bar: Number)
end
# @Student
data baz:
| severalFields(a: String, b: String, c: baz)
end
To install the extension, go to chrome://extensions,
enter developer mode, click "Load unpacked", and select the crx
directory
from this project. Then, when you're on code.pyret.org
, you'll be able to
click on the extension favicon to use it.
The purpose of our project is to perform a comparison between student-defined
and instructor-defined data types. To do so, we have Alloy* attempt to find
a function from StudentType
to InstructorType
, or vice-versa.
In order to say that a student's representation is equivalent to the
instructor's, we constrain the function to be a bijection. A bijection itself
is not sufficient: if type A maps to type B, then the variants of type A must
be equivalent to those of type B. To define variant-equivalence, we again
construct a bijection, this time between the variants of type A and those of
type B. This bijection must have the property that if variant A maps to variant
B, they have the "same" fields, where two fields are the same if they are the
same BuiltinType
or they are corresponding student and instructor types from
our initial bijection. The naming and ordering of fields does not matter for
this comparison.
If a student's definitions are not equivalent to the instructor's, it may be that they are more or less expressive. The process for finding whether either of these cases hold is very similar to the equivalence case. We looked at two ways a type could be less expressive than another: it could be missing a variant or a variant could be missing a field (there are other ways, but these were the two we focused on). In the first case, we again found a bijection between types, but then an injection between variants. Variants mapped to each other still had to be equivalent. For the second case, we found an bijection between types and between variants, but let the mapping between fields of a variant be injective. Both of these comparisons can be done in the instructor to student or student to instructor directions: instructor to student when the student was more expressive, and student to instructor when the student was less expressive.
Finally, we also attempted to do some analysis of variants within a type. Consider the following definition:
data BinTree:
| mt
| leaf(val :: Number)
| node(val :: Number, left :: BinTree, right :: BinTree)
end
The leaf
variant is "redundant" in the sense that the same data could be
expressed as a node
with mt
left
and right
fields. We wanted to identify
when a type had a variant which could have all of its data represented through
the other variants of the type. While a programmer could intentionally use this
sort of redundancy to express some information beyond just the data contained in
the type, especially for a beginning programmer, this could be unintentional.
To perform this comparison, we see if there exists a variant such that there is an injective mapping of fields of that variant (variant A) to another (variant B). Variant A is "redundant" if the fields of variant B which are not mapped to could all contain no data: that is, the types of those fields have an empty variant, one with no fields.
There are many more ways that this sort of redundancy could occur, but we felt this was the most likely way a student could make this mistake.
We provide several examples of data definitions and their corresponding sigs in
the sample-types
directory. They illustrate the different comparisons we can
make between different student and instructor definitions.
We believe that our project is a solid foundation for performing comparisons of different data definitions. There are possible applications of this work for introductory courses such as cs0111 and cs0190. Further extensions could include adding more ways to compare student and instructor types, such as if one uses a helper type that another does not, and making more robust the analysis of possibly redundant types.