-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for lists and tuples #51
Comments
Huh, I learn something new. I didn't know that thing with squary brackets is tuple (not a list) and it's immutable. Thought it was list initialization. I don't know python, I just use it from time to time. I really ought to sit down for a day and learn it properly Well neither "real" computers saves string into memory. They save number representation. There is a reason why following works in c: char c1='a',c2='b';
char c3=c1+c2;
printf("%d%c",c3,c3); Strings are just character arrays, which are actually number arrays. When doing stuff in "lower level" languages than python, like c/cpp, strings are saved in dynamic arrays and have wrapper class to allow certain methods to be done on them. At least that was how would I implement it. I had idea to make C compiler (since I already talk way to much "low" level instead of python) using LLVM, but it's too much work for some game I play once every 2 years |
Sorry I didn't make it clear. That was an example with a list. my_list = [1, 2, 3]
my_list.append(4)
my_list[3] = 5
my_list.pop() # 5
my_tuple = (1, 2, 3) # (parenthesis optional)
# my_tuple[2] = 4 # error! tuples are immutable
# my_tuple.append(4) # error! tuples do not have .append or .pop
Yeah, I'm aware. However, unlike C (and most languages), mlog does not really have support to access a string's bytes, or characters, or unicode points. So, not even pyndustric could possibly "compile" storing strings to memory. If mlog had an
This is simply not possible with the instructions mlog provides in the 6.0 release, and I don't remember anything in the 7.0 which would enable this either.
No need for a C compiler, just a new LLVM backend would be enough, and we'd get all benefits from having an already-existing optimizing compiler. But this is not the issue here (I wouldn't worry that much about hyper-optimizing mlog programs). |
As another example, if pyndustric compiled: a = "Hey" to
then print(a) couldn't possibly work because there is no way to turn the numbers back to a string in mlog. Now, if pyndustric compiled: a = "Hey" to
then print(a) would work just fine (as it does now), but there's no way to store the characters in memory cells because mlog doesn't have to pick a string's bytes apart. |
How about a = "Hey"
print(a) compiles to (I'm going to mix some higher langauge here)
I looked and haven't seen any instruction that takes in a string besides print, so it seems there is no need to concatinate (for now, correct me if I missed something)
|
Now, that was demonstration of simple static arrays. Having array of strings would be simple array holding pointers to arrays of characters. |
With 95 printable characters, assuming the switch consists of Any string operation also becomes ridiculously expensive. Storing strings in memory is just not feasible (I would not even be willing to merge such a feature - it's too much). Anyway we are deviating from the main topic of implementing lists and tuples, but essentially lists cannot be dynamically stored anywhere in a reasonable way so I won't be willing to merge such a thing for strings. |
Ok, we throw out supporting strings that arent literals in registers. We are left with implementing stack (static) arrays that hold regular numbers (which is really useful and not that hard) |
There is a problem implementing tuples (roughly "statically sized arrays") when variables are dynamically typed with no way of knowing a variable's type at runtime or compile time. Essentially, a 1-tuple is different from a 2-tuple, which is different from a 3-tuple, etc. So if you had: tup = ("hello", "bye")
for elem in tup:
print(tup) When compiling if random() > 0.5:
tup = (1, 2, 3, 4)
else:
tup = ("hello", "bye")
for elem in tup:
print(tup) Here it will either be a 2-tuple or a 4-tuple at runtime, but if tuples were to be implemented using infinite variables (to support strings, which can't go in memory cells), then knowing their length simply becomes impossible. Perhaps we have to give up on implementing tuples. |
For dynamically sized lists, memory cells can indeed be used. But remember that a memory cell can be as small as 64 numbers in capacity. For this code: if random() > 0.5:
arr = [1, 2, 3, 4]
else:
arr = [101, 102]
for elem in arr:
print(elem) When compiling the loop, the only information we have again is just "user is trying to iterate over Assuming the compiler stored "a pointer" in the (All of this applies to number-only tuples too. Perhaps that's what should be implemented, just tuples backed by memory.) Dynamically sized lists would mean the compiler also needs to write some sort of memory allocator, which means reserving some memory for the allocator itself, and would also need to embed routines to create and free allocations (at a minimum; a resize / move routine could also be added for more efficiency). If a list grows large enough that it doesn't even fit in a single memory cell, well, that's a fun problem: you would probably need to handle a dynamically sized amount of memory cells for compiling lists in the general case. I think allocators and dynamic lists are completely overkill for this project. Logic processors in Mindustry have a limited amount of instructions. Embedding so much code, even if it were only embedded when the features are used, feels like far too much. I would rather investigate how to improve the ergonomics of using memory cells. I think supporting things such as: start = 10
end = 20
for elem in Mem.cell1[start:end]:
print(elem)
Mem.cell1[5:10] = Mem.cell1[25:30]
Mem.cell1[40:45] = [1, 2, 3, 4, 5] ...would mostly mitigate the need for having actual lists. Non-growable lists (or tuples) can probably be implemented with the approach I described (compiler generates a pair of pointer + cell per declared list variable), but the fact it could silently fail with this approach (e.g. iterating over some variable when it's not actually a list) is not very nice. If only iteration over |
Sorry it took a while to get back, I was super busy! I thought about implementing tuples. We agree on a lot of stuff, but here are my caviats:
Accessing tuple elements (if they are stored in infinite registers (as they probably should, sorry about my static-stack blip, having stuff beyond numbers in tuples is useful)) should be done via switch case (I don't see any ohter way because of register naming) so that compiler shouldn't generate access points when it encounters for loop, but from information about tuple assignment. I guess it would be reasonable to have some sort of lookahead and see what is the maximum length of tuple to generate switch case acording to that. (each tuple would have its own switch case) For loop would obviously iterate over actual number of elements. Also, compiler could (if told to, for example by flag) instert out-of-bound checks In regards to what if later on programmer assigns a number to variable that was holding a tuple and then tries to iterate: let him shoot himself in the foot. So I guess tuple being represented as Another thing would be to make elements mutable (there is no reason not to, it would just add more instructions to compiled program). The only requirement for tuples is to have size limited at compile time. (hence why my static-stack blip) I checked largest memory cell and was super dissapointed when I figured out it was 512 adresses. I thought I've seen some K somewhere. Such harsh limitations...
I think tuples are a must and if we decide to give up on something heap idea is that! Here are some ideas: It makes sense to hold only numbers there. Programmer knows how much heap he will give to processor (by how much he built ingame) so passing options to compiler as: [ {cell_number , cell_size} ] Memory allocator is a must tho! Having alloc and free isn't that big of a problem. Easiest algorithm would be to have a list of elements holding free sections of heap, and it would get choped up anytime there is alloc, and added to it anytime there is free.1 Ergonomic improvements for accessing cells is great idea. Implementation of for loop should somehow account for all of the possibile things it can iterate over (There is a lot about this to discuss, so maybe later when we see about previous stuff) Footnotes
|
Not sure how to best answer to your comment. Tuples actually compiling to a switch seems reasonable, since we also want the following to compile: index = 2
tup = ('a', 'b', 'c', 'd')
elem_c = tup[index] And indeed the only way to do that at runtime within Mindustry's processor is some sort of switch or jump table. This worries me though: n = 2
backup = None
for i in range(n):
tup = ('a', 'b', i)
if backup is None:
backup = tup
print(backup[2]) # should print 0, but tup[2] is 1 The loop will run its body twice. On the first iteration, a tuple Regarding mutability of tuples, yeah, we can do that. It's not something normally allowed in Python but it can be done. pyndustric would be repurposing "tuples for fixed size, lists for dynamic", both mutable.
Not sure about this. I think the code should specify which cell to use, not a compiler option. But I can see your idea being "easier" on the programmer and harder on the compiler developer. |
Can we just make tuples immovable and anytime someone tries to assign them, we actually provide reference? |
We cannot know the type of variables, so we cannot force them to be "immovable" (if by that you mean |
I mean that when someone tries to copy it ( |
It's your second idea:
I think the only feasable way is to do it like that (by passing tuples around as reference) |
Assuming tup_a = ('a', 'b')
tup_b = tup_a
for x in tup_b:
print(x) compiling to (kind of):
Seems like we have a plan, next would be implementing it and adding tests. In essence tuples compile to pretty much "functions" which need to be "called" at a certain index ( |
Note that: nested_tulpe = ( ("a", "b"), (1, 2), ("c", "d") ) should probably compile fine and result in 4 compiled jump tables (three for the inner ones, then one for the outer one). |
Maybe: tup = ("a", "b")
a, b = tup could also work (well, that would be another feature of tuples). |
I'm glat that that's sorted out. What should we do about figuring out type we are trying to iterate over? |
We do not care about that. Whenever the compiler sees Any of these expressions (and perhaps more) should assume
|
Yes, it's simple if we only do tuples, but what about Env.links() and what if we also decide to implement heap arrays? |
The compiler can simply forbid |
Hmmm, an idea that comes to my mind to prevent problems with tuples and dynamic arrays is for memory managment system to bake in at compile time forbidden addresses (pointers) that are used by tuples and to use other addresses. When we jump to iterate, first we check if we are having "tuple address" or something else, and if it's something else, we pass it to memory managment system |
This would add an extra check every time such an operation occurs, with an overhead of 2 instructions (compare and jump) at a minimum. We shouldn't get too fancy. The 1000 instruction limits isn't a high number. |
I'll need to think about this then |
I would first implement tuples and then worry about dynamic arrays (dynamic arrays could be similar to tuples. Imagine they compile to the same switch table, but instead of jumping back, they jump elsewhere to determine the real address or similar: this would eliminate the check from every access and push the overhead to dynamic ones only). |
Discussion from #50:
The text was updated successfully, but these errors were encountered: