Name spaces #36
Well, I've tentatively implemented this using the first candidate rule from the issue description: within functions, each variable is treated as global until the first point where it gets overwritten by some operation. From that point on it is treated as "local", i.e., it is thereafter replaced with a mangled version of its name (regardless of whether that operation was inside an …
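A minimal sketch of that first-write rule, assuming a hypothetical simplified IR in which each statement just lists the names it reads and the names it writes, and assuming the `__foo_b`-style mangling pattern used as an example in this issue. The names `Stmt`, `mangle` and `first_write_rule`, and the example function `foo`, are illustrative only, not the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """Hypothetical simplified IR: the names one statement reads and writes."""
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)


def mangle(func: str, name: str) -> str:
    """Illustrative mangling scheme: ("foo", "b") -> "__foo_b"."""
    return f"__{func}_{name}"


def first_write_rule(func: str, params: list[str], body: list[Stmt]) -> list[Stmt]:
    """Single pass over the body: a name stays global until the first statement
    that writes it and is local (mangled) from that statement onward, even if
    the write sat inside an `if`. Parameters are local from the start."""
    local = set(params)
    out = []
    for stmt in body:
        # Reads are resolved before this statement's own writes take effect,
        # so `b = b + 1` reads the global b and writes the local one.
        reads = [mangle(func, n) if n in local else n for n in stmt.reads]
        local.update(stmt.writes)
        writes = [mangle(func, n) for n in stmt.writes]
        out.append(Stmt(reads, writes))
    return out


# Hypothetical function, flattened to one Stmt per line
# (builtins such as `print` are not tracked in this sketch):
# def foo(a):
#     print(b)
#     if a:
#         b = a
#     print(b)
body = [
    Stmt(reads=["b"]),
    Stmt(reads=["a"]),
    Stmt(reads=["a"], writes=["b"]),
    Stmt(reads=["b"]),
]
result = first_write_rule("foo", ["a"], body)
for s in result:
    print(s)
```

Here the first `print(b)` resolves to the global `b` and the last one to `__foo_b`, because the decision is made once, at compile time, at the line of the first write, whether or not the `if` branch actually runs.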
I've said this a few times before, but I'll repeat it here now that it has a proper issue. Python also has this problem with scopes: if a function reads a global and then sets it ("read global, set global"), the call fails:

```python
>>> x = 0
>>> def f():
...     x += 1
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
UnboundLocalError: local variable 'x' referenced before assignment
```

The correct solution is to declare it with `global`:

```python
>>> def f():
...     global x
...     x += 1
...
>>> f()
>>> x
1
```
Implementation-wise it's trivial: everything in a function is namespaced and local to that function, unless there is a `global` declaration for it. Note: this should be implemented in at least two commits (one for the scopes, another to support …)
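A minimal sketch of that proposal using Python's standard `ast` module, assuming the compiler can see a real Python syntax tree and reusing the `__<function>_<name>` mangling pattern from this issue. The helper name `mangle_locals` is hypothetical, and the sketch deliberately ignores nested functions, default argument values and decorators, which it would rename incorrectly.

```python
import ast


def mangle_locals(src: str) -> str:
    """Hypothetical sketch: in every top-level function, parameters and assigned
    names are local (renamed to __<func>_<name>) unless declared `global`.
    Assumes plain positional parameters and no nested functions, default
    values or decorators."""
    tree = ast.parse(src)
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        declared_global: set[str] = set()
        assigned = {a.arg for a in func.args.args}
        for node in ast.walk(func):
            if isinstance(node, ast.Global):
                declared_global.update(node.names)
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                # Covers `x = ...`, `x += ...`, for-loop targets, etc.
                assigned.add(node.id)
        local = assigned - declared_global
        prefix = f"__{func.name}_"
        # Rename every use of a local name, reads and writes alike.
        for node in ast.walk(func):
            if isinstance(node, ast.Name) and node.id in local:
                node.id = prefix + node.id
            elif isinstance(node, ast.arg) and node.arg in local:
                node.arg = prefix + node.arg
    return ast.unparse(tree)  # requires Python 3.9+


print(mangle_locals("""
def f(n):
    global x
    x += n
    y = x * 2
    return y
"""))
```

With the `global x` declaration, `x` stays unmangled while `n` and `y` become `__f_n` and `__f_y`; without it, `x += 1` would be rewritten to `__f_x += 1`, which mirrors Python's `UnboundLocalError` case shown above.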
We've talked elsewhere about potentially introducing some analog of Python's name spaces. Currently, we do this only for the args to out-of-line functions, giving them mangled names to keep them distinct from like-named variables elsewhere. All other variables used within functions are effectively treated as global.
In Python, each call of a function generates its own distinct namespace, so, e.g., in recursive calls, each layer in the recursion can have different values for its (local) variables. To implement that, we'd need to use a stack and lots of push/pop operations, and would likely find that the cost in processor cycles is too steep for whatever minor benefits this might give.
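A small Python simulation of why recursion is the sticking point. It is not the project's code; the names `memory`, `stack`, `fact_static`, `fact_stack` and the slot `__fact_n` are hypothetical. If a function's "local" `n` lives in one fixed mangled slot, a recursive call overwrites the caller's value; giving each call its own frame on a stack fixes that, at the price of a push and a pop on every call.

```python
# Hypothetical simulation: a recursive function whose local `n` lives either in
# one fixed mangled slot or in a per-call stack frame.

memory: dict[str, int] = {}  # flat storage: one slot per (mangled) name


def fact_static(n: int) -> int:
    memory["__fact_n"] = n  # every call reuses the same slot
    if memory["__fact_n"] <= 1:
        return 1
    # `r` is kept as an ordinary Python local for brevity; only `n` is shared.
    r = fact_static(memory["__fact_n"] - 1)
    # By now the recursive call has overwritten __fact_n with a smaller value.
    return memory["__fact_n"] * r


stack: list[int] = []  # one frame (here just `n`) per active call


def fact_stack(n: int) -> int:
    stack.append(n)  # push a fresh frame for this call
    try:
        if stack[-1] <= 1:
            return 1
        r = fact_stack(stack[-1] - 1)
        return stack[-1] * r  # our own frame is back on top
    finally:
        stack.pop()  # pop on every return path


print(fact_static(4), fact_stack(4))
```

This prints `1 24`: the shared slot collapses every level to the innermost value of `n`, while the stack version pays for correctness with the extra push/pop work described above.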
If we set aside cases of recursion, we could get a decent approximation of Python namespaces by mangling any variable names that we want to treat as being "local" to some particular function. In Python, the global/local distinction is handled by explicit declaration, and by a rule that says that, in the absence of explicit guidance, every assignment/"write" output must be local, whereas all other (input/"read") uses of variables will go to the smallest scope in which they are defined. This means that same-looking variables may end up accessing quite different memory addresses at different points in a function, and which they access may depend upon conditions that can't be known at compile-time.
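For checking candidate rules against real Python, CPython's standard-library `symtable` module exposes the classification its own compiler makes for each function. In this sketch the helper `classify` and the example source are illustrative: `b` is assigned only inside the `if`, yet CPython reports it as local for the whole of `foo`, including the earlier `print(b)`, while `c` is global only because of its explicit declaration. In CPython that local/global split is fixed per function at compile time; what depends on run-time conditions is whether a local has been bound yet when it is read (if not, the read raises `UnboundLocalError`).

```python
import symtable


def classify(src: str, func_name: str) -> dict[str, list[str]]:
    """Ask CPython's symbol-table pass how it classifies the names used inside
    `func_name`, a function defined at the top level of `src`."""
    module = symtable.symtable(src, "<example>", "exec")
    func = next(t for t in module.get_children() if t.get_name() == func_name)
    return {
        "parameters": sorted(func.get_parameters()),
        "locals": sorted(func.get_locals()),
        "globals": sorted(func.get_globals()),
    }


# Only compiled, never run (calling foo(0) would raise UnboundLocalError).
SRC = """
def foo(a):
    global c
    print(b)
    if a:
        b = 1
    c = b
"""

print(classify(SRC, "foo"))
```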
I don't think there's any good way to do all this with name-mangling. We can handle references to a global `b` by leaving `b` unmangled, and we can handle references to a quasi-local `b` by using a mangled version like `__foo_b`. But whichever one of these we pick for the middle `print(b)` above, it won't always give the same behavior as Python does. (That might be a good thing, since Python's way of doing this confuses and frustrates many novices.)
If we want to implement some version of this, we'll need a further simplification away from Python (in addition to the no-new-namespace-upon-recursion simplification). That then raises the question: what rules should we use to decide which variables count as "local" in the absence of explicit declaration? One fairly plausible candidate rule would be to say that any variable the compiler hasn't yet seen written to within the function defaults to being global/unmangled, but starting with the first line where it gets written to (even within an `if` statement) it becomes local/mangled for the rest of the function. Another plausible candidate would say that each variable is either global throughout the function or local throughout the function (no switching like you can have in Python), and that being written to anywhere in a function (even within an `if` statement) makes a variable count as local throughout the function. This latter option fits better with our "once type C, always type C" rule, but it would require mangling to be done on a later pass, not at initial compilation.
My own inclination here is to just say "Everything but function arguments is global, so be careful about using the same variable for different things!" But anyway, I've been reworking name-mangling to fit better with the new type-detection system, so now would be a good point to add some further approximation of name spaces, if we want it. This may also make a difference to whether I do name-mangling in the first pass, which can be slightly more efficient, or wait to do mangled substitutions later, as would be needed if we opted for something like the latter candidate rule above.
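A minimal sketch of the second candidate rule, assuming a hypothetical simplified IR in which each statement lists the names it reads and writes; the names `Stmt` and `written_anywhere_rule`, the example function `foo`, and the `__foo_b` pattern are illustrative. It shows why this rule needs a later pass: the first `print(b)` can only be resolved once the whole body has been scanned and the write inside the `if` has been seen.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """Hypothetical simplified IR: the names one statement reads and writes."""
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)


def written_anywhere_rule(func: str, params: list[str], body: list[Stmt]) -> list[Stmt]:
    """Two passes: a name written anywhere in the function (even inside an `if`)
    is local for the whole function; every other name stays global."""
    # Pass 1: the full set of local names is known only after the whole body.
    local = set(params)
    for stmt in body:
        local.update(stmt.writes)

    # Pass 2: substitute mangled names everywhere, including earlier reads.
    def resolve(name: str) -> str:
        return f"__{func}_{name}" if name in local else name

    return [Stmt([resolve(n) for n in s.reads], [resolve(n) for n in s.writes]) for s in body]


# Hypothetical function, flattened to one Stmt per line:
# def foo(a):
#     print(b)
#     if a:
#         b = a
#     print(b)
body = [
    Stmt(reads=["b"]),
    Stmt(reads=["a"]),
    Stmt(reads=["a"], writes=["b"]),
    Stmt(reads=["b"]),
]
result = written_anywhere_rule("foo", ["a"], body)
for s in result:
    print(s)
```

Under this rule both `print(b)` lines read `__foo_b`, consistent with "once type C, always type C"; the cost is that the early read may touch a slot that nothing has written yet.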