-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathroadmap.html
2421 lines (2171 loc) · 191 KB
/
roadmap.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!Doctype html>
<html lang="en">
<head>
<title>Roadmap</title>
<meta charset="UTF-8">
<!--<link rel="stylesheet" href="css/bootstrap.min.css">-->
<link rel="stylesheet" href="css/style_new.css">
<script src="js/jquery-1.12.1.min.js" charset="utf-8"></script>
<link rel="stylesheet" href="js/embed-2cd369fa1c0830bd3aa06c21d4f14a13e060d2d31bbaae740f4af4.css"><div id="gist28627206" class="gist"></div>
<link rel="stylesheet" href="js/embed-cbe5b40fa72b0964f90d4919c2da8f8f94d7c9f6c2aa49c07f6fa3.css"><div id="gist28627206" class="gist"></div>
</head>
<div class="container">
<header id="navtop">
<a href="index.html" class="logo fleft"><img src="img/logo.png" alt=""></a>
<nav class="fright">
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="about.html">About</a></li>
<li><a href="roadmap.html" class="navactive">Roadmap</a></li>
<li><a href="documentation.html">Documentation</a></li>
</ul>
</nav>
</header>
<div class="Services-page main grid-wrap">
<header class="grid col-full">
<hr/>
<p class="fleft">ROADMAP</p>
<div style="background-color: #dff0d8;padding:4px 8px;border-radius: 4px;" class="grid col-full">
<h4 style="font-size: 16px">
mkDocs version of the roadmap is available at <a href="https://silcnitc.github.io/expl-docs/roadmap" target="_blank">silcnitc.github.io/expl-docs/roadmap</a>
</h4>
</div>
<!--<a class="button" href="">Download as PDF</a>-->
</header>
<aside class="grid col-one-quarter mq2-col-full">
<menu>
<ul>
<li class="sec"><a href="#nav-stage0">0. Installation</a> </li>
<li class="sec"><a href="#nav-stage1">1. CodeGeneration for Arithmetic Expressions</a></li>
<li class="sec"><a href="#nav-stage2">2. Introduction to static storage allocation</a></li>
<li class="sec"><a href="#nav-stage3">3. Adding Flow Control Statements</a></li>
<li class="sec"><a href="#nav-stage4">4. User Defined Variables and arrays</a></li>
<li class="sec"><a href="#nav-stage5">5. Adding Functions</a></li>
<li class="sec"><a href="#nav-stage6">6. User defined types and Dynamic Memory Allocation</a></li>
<!-- <li class="sec"><a href="#nav-stage7">7. Register Allocation</a></li> -->
<li class="sec"><a href="#nav-stage7">7. Adding Objects – Data encapsulation</a></li>
<li class="sec"><a href="#nav-stage8">8. Inheritance and Sub-type Polymorphism</a></li>
</ul>
</menu>
</aside>
<body>
<section class="grid col-three-quarters mq2-col-full">
<div class="grid-wrap">
<article class="grid col-full">
<h2>Using the Roadmap</h2>
<p>
This roadmap is divided into several stages, to be done in sequential order. Incrementally you will build a compiler for the ExpL language according to its specification. Links are provided for background reading material wherever appropriate. It will be assumed that you have background in C programming, Data Structures and Principles of Computer Organization.
</p>
</article>
<article class="grid col-full" id="nav-stage0">
<h2>Stage 0 : Installation and Preparation</h2>
<b>Time Estimate :</b> 2 weeks, 5-10 hours/week
<p>
<b>Pre-requisites</b>: NIL<br>
In this stage, you will download and familiarize yourself with the simulation package and learn the compiler design software tools LEX and YACC. Follow the instructions below.
</p>
<p>
<ul>
<li id="otis">1. Install the LEX, YACC and the XSM simulator package. Follow the instructions <a href="install.html" target="_blank">here</a></li>.
</ul>
</p>
<p>
You need to learn two software tools - YACC and LEX which you will use in the project. These tools are somewhat
sophisticated. Fortunately, understanding what is enough for the purpose of our compiler project is not very difficult. The following tutorials will help you through this process.
</p>
<p>
If you are not already familiar with the tools LEX and YACC do the following:<br>
<ul>
<li id="otis">2. Complete the <a href="lex.html" target="_blank">LEX tutorial</a>.</li>
<li id="otis">3. Complete the <a href="yacc.html" target="_blank">YACC tutorial</a>.</li>
<li id="otis">4. Complete the <a href="ywl.html" target="_blank">Using YACC with LEX tutorial</a>.</li>
</ul>
</p>
<p>
If you are not already familiar with using the GNU debugger, do the following:
<ul>
<li id="otis">5. Complete the <a href="gdb.html" target="_blank">GDB tutorial</a>.</li>
</ul>
</p>
<p>
The next step is to understand the target machine envionment. You must carefully go through the following tutorial
before proceeding to the next stage of this roadmap.
</p>
<p>
<ul>
<li id="otis">6. Complete the <a href="xsm-environment-tut.html">XSM execution environment tutorial</a>.</li>
</ul>
</p>
<p>
With this, you are ready with all the required pre-requisites to proceed further in this roadmap.
</p>
<div class="up column3 mright"> <a href="#navtop" class="ir">Go up</a> </div>
</article>
<article class="grid col-full" id="nav-stage1">
<h2>Stage 1 : Code generation for Arithmetic Expressions</h2>
<b>Time Estimate :</b> 0.5 week, 5-10 hours/week
<p>
<b>Prerequisites</b>:<br/>
<ul>
<li id="otis">1. You must be comfortable with LEX and YACC. ( If you are not, you must first do <a href="lex.html" target="_blank">LEX</a> tutorial, <a href="yacc.html" target="_blank">YACC</a> Tutorial and <a href="ywl.html" target="_blank">Using Lex with Yacc</a> tutorials.)</li>
<li id="otis">2. You must have completed the <a href="xsm-environment-tut.html" target="_blank">XSM environment tutorial</a> <b>including all the exercises</b> before staring this stage.</li>
</ul>
</p>
<p>
<b>Learning Objectives</b>:<br/>
<p>In this stage, you will:</p>
<ul>
<li id="otis">1. Parse an input arithmetic expression and create an expression tree using YACC and LEX.</li>
<li id="otis">2. Recursively traverse the tree and generate assembly language code. The allocation of registers for storing results of intermediate computations will be handled enroute.</li>
</ul>
<hr>
</p>
<p>
A compiler is a software that takes as input a high level program and produces a machine
recognizible target program as output. The high level program typically allows variables,
expressions, conditionals, iterative constructs, user defined types, functions etc.
The low level target program on the other hand will be a sequence of assembly level
instructions that can be run on a target machine (which is the XSM simulator in this
project).
</p>
<p>
The strategy of the roadmap is to help you build the compiler in stages.
We start here by building a very simple compiler whose input (high level) program
contains only simple arithmetic expressions.
In subsequent stages, we will add more and more constructs to the
input language one by one, learning the relevant theoretical concepts along the way.
</p>
<!-- <p>
In this stage, you will implement a very simple compiler that can take an arithmetic expression as input (from some input file) and generate a target executable file containing XSM instructions to evaluate the expression and output the result.
</p> -->
<p>
We assume that you have implemented the library routine for handling console output,
which you were asked to do in the <a href="xsm-environment-tut.html" target="_blank">XSM execution environment tutorial</a>.
</p>
<p>
Consider arithmetic expressions with the following syntax.
<div class="syntax">
E : E + E | (E) | NUM
</div>
Where the <a href="lex.html#navyytext" target="_blank">lexeme</a> <b>NUM</b> correspond to integers. Assume left <a href="yacc.html#associativity" target="_blank">associativity</a> for '+'. Thus, the <a href="lex.html#token" target="_blank">tokens</a> relevant are NUM and +. The attribute value associated with a number is the number read. Assume that the input file is passed as argument to the main() function in YACC.
</p>
<p>
The lexer must pack the attribute into a tree node of the following structure:
<div class="syntax">
typedef struct tnode{<br>
 int val;
 char *op; //indicates the name of the operator for a non leaf node<br>
 struct tnode *left,*right; //left and right branches<br>
}tnode;<br>
<br>
#define YYSTYPE tnode*
</div>
Since the semantics actions in the parser must build the tree, the following function must be written:
<div class="syntax">
/*Make a leaf tnode and set the value of val field*/<br/>
struct tnode* makeLeafNode(int n);<br/>
<br/>
/*Make a tnode with operator, left and right branches set*/<br/>
struct tnode* makeOperatorNode(char op,struct tnode *l,struct tnode *r);
</div>
</p>
<p>
<b>Task 1</b>: Build the expression tree for the given input.<br>
<b>Exercise 1</b>: Output the prefix and postfix forms of the expression from the tree.<br>
</p>
<p>
(Note: You would have already completed this task if you have done the <a href="ywl.html" target="_blank"> Using Yacc With Lex tutorial </a>).
</p>
<p>
Now, comes the next task - to generate assembly language program equivalent for the expression and write it out into an executable file in the XEXE format. Once this is done, you can use the simulator to load the XEXE file into the memory of the XSM machine and execute it as outlined in the <a href="xsm-environment-tut.html" target="_blank"> XSM run time environment tutorial</a>.
</p>
<p>
To do this, one needs to know the following:<br>
<ul>
<li id="otis">1. The <a href="abi.html#nav-XSM-instruction-set" target="_blank">machine model and the instruction set</a> of the target machine.</li>
<li id="otis">2. Format of the <a href="abi.html#nav-XEXE-executable-file-format" target="_blank"> executable file</a>.</li>
<li id="otis">3. You need to know the address in the memory (in the target machine) where each instruction you generate will be loaded (by the OS loader). This is because program control instructions like JMP, CALL etc., requires specification of the jump address.
</p>
<p>
As already outlined in the <a href="xsm-environment-tut.html" target="_blank"> XSM run time environment tutorial</a>, the header will be loaded into addresses 2048-2055. The first instruction generated by you will be loaded to the address 2056. Each XSM instruction occupies 2 memory words. Hence, the next instruction will be loaded at address 2058 and so on. The entry point field of the header must contain the address of the first instruction to be fetched and executed.</li>
<li id="otis">4. You need to fix the memory addresses where variables and other data is stored. For example, for each variable in the program, the compiler will have to allocate storage space in memory. The ABI stipulates that the region for this is the <a href="abi.html#nav-virtual-address-space-model" target="_blank"> stack region</a>. Thus each variable must be stored in some address between 4096 and 5119.</li>
<li id="otis">5. Since XSM machine stipulates that arithmetic and logic instructions can be done only when operands are loaded into machine registers, we need to load the contents of variables/constants in the program into the machine registers before processing them. This brings in the problem of register allocation. The XSM machine makes available 20 registers (R0-R19) for the compiler.</li>
</ul>
</p>
<p>
Of the above, the XSM execution environment tutorial has already explained (1) and (2). Evaluation of expressions do not involve either storage allocation or program control transfer (JMP). Hence, we will not take up (3) and (4) at this stage. However, we need to solve (5) now.
</p>
<p><h3>What must be the evaluation strategy?</h3></p>
<p>
Let us take an example:<br>
If you are given a two node expression tree as shown below corresponding to the expression (3+2):<br>
</p>
<p>
<img src="img/tree1.png">
</p>
<p>
The evaluation strategy will be:
<ul>
<li id="otis">1. Store 3 in a register – say R0.</li>
<li id="otis">2. Store 2 in a register – say R1.</li>
<li id="otis">3. ADD R0, R1.</li>
</ul>
</p>
<p>
The result will be stored in R0 and is sufficient for us. To generate code for the above tasks and write it into a target_file, you must write code as:
<div class="syntax">
fprintf(target_file, "MOV R0, 3");<br>
fprintf(target_file, "MOV R1, 2");<br>
fprintf(target_file, "ADD R0, R1");<br>
</div>
However, life becomes complicated if we have an expression like (3+2)+(5+6) resulting in the following tree.
</p>
<p>
<img src="img/tree2.png">
</p>
<p>
Of course, we can “hand craft” this case also. But the strategy will not generalize. The basic issue is that your compiler does not know the input expression before-hand. Technically speaking, the issue is that the “expression is not available at <b>compile time</b>, but only known at <b>run time</b>”. Your code generation module must be more <i>"intelligent"</i> to handle <i>arbitrary expressions</i>.
</p>
<p>
The root of the problem with the above code is that R0 and R1 were picked by you and not by your compiler. Thus, we must have a <b>register assignment policy</b> (basically a function) that returns a free register whenever we require one. That is, you must design the following functions:
</p>
<div class="syntax">
int getReg() // Allocate a free register
</div>
<p>
That returns the register number of an unallocated register, so that your code for adding 3 and 2 would look like:
</p>
<div class="syntax">
int p = getReg();<br>
int q = getReg();<br>
fprintf(target_file, “MOV R%d, 3”, p);<br>
fprintf(target_ file, “MOV R%d, 2”, q);<br>
fprintf(target_file, “ADD R%d, R%d,”, p,q);<br>
</div>
<p>
In addition to allocating registers, you must also have mechanism to <b>release a register</b> back into the register pool. In the above example, after the ADD instruction is generated R1 can be released and send back to the register pool.
</p>
<p>
For this purpose, you will write a function
</p>
<div class="syntax">
freeReg() // Releases a register.
</div>
<p>
To make the allocation strategy simple, we suggest that you generate target code in such a way that <i>the result of a CPU instruction involving two registers will be always stored in the register with lower index</i>. In the code above the result of the computation is kept in R0 and not R1 so that the register with the higher index value can be released. As a consequence, the freeReg() function does not require any arguments. Instead, freeReg() and getReg() can be designed to keep track of the highest numbered register allocated so far and hence can keep track of the correct register that must be allocated or freed.
</p>
<p>
The following summarizes the register allocation strategy:
</p>
<div class="syntax">
1. Whenever a register is needed, allocate the lowest numbered register that is free. (Thus, give R0 if possible, otherwise R1 etc.)<br>
2. Whenever we free a register, always release the highest used register that was allocated previously. (Thus, if R0, R1 and R2 were allocated, freeReg() must release R2).
</div>
<p>
Finally, we must design a code generation module. The strategy here is to start with an expression tree and do the following:
</p>
<div class="syntax">
1. At the leaf nodes of the tree (corresponding to a NUM), Allocate a new register and store the number  to the register.<br>
2. At the intermediete nodes :<br>
 a. Generate code for the left subtree (recursively). Find out the register holding the result.<br>
 b. Evaluate the right subtree (recursively). Find out the register holding the result.<br>
 c. ADD the contents of the two registers and store the result in the lower numbered register.<br>
 d. Release the higher numbered register and return.<br>
</div>
<p>
In the above box, as step 2.a and 2.b requires finding the index of the register which stores the result of expression evaluation. The simplest strategy is to <b>design a codeGen() function that can take as input an expression tree and generates code for the expression, returning the index of the register storing the result</b>:
</p>
<div class="syntax">
#define reg_index int;<br>
reg_index codeGen( struct node *t) {<br>
 ..<br>
 ..<br>
 return register number storing result<br>
}
</div>
<p>
The codeGen() function takes as input a pointer to the root of an expression tree and generates code for the subtree rooted at that node. After generating code, the function must return the index of the register storing the result. See this <a href="codegen.html" target="_blank">link</a> for furthur details.
</p>
<p>
<b>Task 2</b>: Complete the simple compiler for expression evaluation and generate the executable file. The result of expression evaluation may be stored in the first location of the stack region – memory address 4096. This value may be printed out using the write system call. Note that the XEXE executable format must be adhered so that the XSM simulator can load and execute the file.
</p>
<p>
<b>Note</b>: To run the simulator, you must prepare the library.lib together with the XEXE executable file. Please follow instructions in the <a href="xsm-environment-tut.html" target="_blank">XSM environment tutorial</a>.
</p>
<p>
<b>Exercise 2</b>: Modify the grammar to
<div class="syntax">E : E + E | E*E | E-E| E/E | (E) | NUM</div> Assume standard rules of precedence and associativity.
</p>
<p>
<b>Exercise 3</b>: Redo Exercise 2 assuming that the input expression is given in prefix from.
</p>
<p>
<b>Note:</b> Here we assumed that machine registers never get exhausted. XSM provides 20 general purpose registers and these registers are sufficient for all practial purposes. However, if all registers are exhausted, then space will have to be allocated in memory. We will not address this contingency in this roadmap. If register pool is exhausted, your compiler may stop compilation and flag "Out of registers" error.
</p>
<div class="up column3 mright">
<a href="#navtop"> ↑ </a>
</div>
</article>
<article class="grid col-full" id="nav-stage2">
<h2>Stage 2. Introduction to static storage allocation</h2>
<b>Time Estimate :</b> 0.5 week, 5-10 hours/week
<p>
<b>Prerequisites</b> :
<ul>
<li id="otis">1. You must be comfortable with LEX and YACC. (If you are not, you must first do <a href="lex.html" target="_blank"> LEX tutorial</a>, <a href="yacc.html" target="_blank">YACC Tutorial</a> and <a href="ywl.html" target="_blank"> YACC+LEX tutorial</a>.)</li>
<li id="otis">2. You must have completed the <a href="xsm-environment-tut.html" target="_blank">XSM environment tutorial</a> before starting this stage.</li>
</ul>
</p>
<p>
<b>Learning Objectives</b> :
<ul>
<li id="otis">In this stage, you will extend the expression evaluator of the previous stage to support a set of pre-defined variables with Input/Output and assignment statements. You will get introduced to the notion of <b>static storage allocation</b> enroute. You will also learn to differentiate between statements and expressions and also construct an <i>abstract syntax tree representation</i> for a program.</li>
</ul>
<hr>
</p>
<p>
Consider a simple programming language with the following syntax:
<div class="syntax">
Program ::= BEGIN Slist END | BEGIN END<br>
<br>
Slist ::= Slist Stmt | Stmt<br>
<br>
Stmt ::= InputStmt | OuptputStmt | AsgStmt<br>
<br>
InputStmt ::= READ(ID);<br>
<br>
OutputStmt ::== WRITE(E);<br>
<br>
AsgStmt ::== ID = E;<br>
</div>
Apart from the <a href="lex.html" target="_blank">literal tokens</a>, BEGIN, END, READ, and WRITE are tokens corresponding to keywords "begin", "end", "read" and "write". ID is a token for variables. We will permit only variables [a-z] in this stage.
</p>
<p>
To support variables to appear in expressions, you must add the rule <b>E := ID</b> to the expression syntax used in Stage 1. The above syntax defines a small programming language that permits just straight line programs (programs without conditionals, loops, jumps or such control transfer constructs). There are only 26 pre-defined variables that are supported – a, b,c,..,z. A typical program would look like:
<div class="syntax">
begin<br>
 read (a);<br>
 read (b);<br>
 d = a + 2 * b;<br>
 write (a+d);<br>
end;
</div>
</p>
<p>
We will assume that variables can store only integers. Handling variables of multiple types will be taken up in subsequent stages.
</p>
<p>
A conceptual point to note here is that apart from the addition of variables, the extended language now has two kinds of constructs – expressions and <b>statements</b>. While an expression evaluates to a value (in this case, we limit ourselves to integer expressions), a statement commands the execution of some action by the machine. For example, the statement <i>read(a);</i> instructs the action of reading a variable from the console into a variable a. <i>write(a+d);</i> instructs evaluation of the expression (a+d) and printing the result into the console output.
</p>
<p>
Another important conceptual point to note is that the introduction of variables also demand binding them to storage (memory) locations. The storage location associated with a variable must hold the value of the variable at each point of program execution. A statement (like the assignment statement or a read statement) that alters the value of a variable must result in a change the value stored in the corresponding storage location.
</p>
<p>
In the present case, the compiler can fix the address for each variable in memory right at the time of program compilation. Since the ABI stipulate that storage allocation must be done in the stack region, we can pre-allocate the first 26 memory locations in the stack region of <a href="abi.html#nav-virtual-address-space-model" target="_blank">memory</a> for the variables a-z. Thus, variable <i>a</i> will refer to contents of address 4096, <i>b</i> to contents of address 4097 and so on. Any time the compiler encounters the variable – say <i>a</i>, the address to be looked at is fixed – in this case 4096. Such allocation policy is called <b>static allocation</b>. In later stages you will encounter situations where it will not be possible for the compiler to fix memory address of a variable at compile time. This leads to <b>run time</b> and <b>dynamic memory allocation</b> policies. For now, we will be content with static allocation.
</p>
<p>
To implement the above, your compiler must:
<ul>
<li id="otis">1. Fix the storage location for each variable. As noted above, the first 26 locations of the stack region starting at address 4096 may be assigned for a to z. Note that the XSM machine can store an integer in a single memory location. Hence, for each variable we need to allocate only 1 memory word. Note that "allocation" here means that while generating code, the compiler assumes that the variable a is stored in location 4096, b in location 4097 and so forth.</li>
<li id="otis"><b>Note</b> : Some programming languages stipulate that variables must be initialized to zero. In that case, the compiler must generate code to MOV 0 to each of these locations before generating code for statements in the program. Some machines provide machine instructions that support initializing memory to zero. Certain operating systems would have initialized all memory regions (except those to which code is loaded into) to zero at load time. We will not pursue these issues here.</li>
<li id="otis">2. To translate an assignment statement, the compiler must generate code to evaluate the expression and then MOV the contents of the register storing the result to the memory location allocated for the variable.</li>
<li id="otis">3. To translate a Read statement, the compiler must generate code to invoke the <a href="abi.html#nav-eXpOS-system-library-interface" target="_blank">library</a> function for read, passing the address of the variable as argument. Write is implemented similarly.</li>
</ul>
</p>
<p>
But before getting into code generation, we must create an <b>abstract syntax tree (AST)</b> for the program. An abstract syntax tree is a tree representation of the program, just like an expression tree for expressions. An abstract syntax tree for the above program would look like the following:
</p>
<p>
<img src="img/ast_stage2.png">
</p>
<p>
Observe that each node now needs to store distinguishing information like:
<ul>
<li id="otis">1. Whether it corresponds to a variable, constant, operator, assignment statement, write statement or read statement. </li>
<li id="otis">2. In case of operators, the information on the operator must be present. In the case of constants, the value must be stored in the node. In the case of variables, the node must contain the variable name.</li>
<li id="otis">3. There are also connector nodes which simply join two subtrees of statements together.</li>
</ul>
</p>
<p>
This leads to the definition of the following node structure:
<script src="js/dd1979bba5d35250a0c9419520a6b5b8.js"></script>
</p>
<p>
<b>Task 1</b> : Use Yacc and Lex to generate abstract syntax tree representation of a program. A file containing the source program will be input to your program.
</p>
<p>
Thus, after parsing, we use the syntax directed translation scheme of YACC to construct an intermediate representation – namely, the abstract syntax tree. This phase of compilation is sometimes called the <b>front end</b> of the compiler. The next step is to recursively traverse the expression tree to generate executable code. This is typically called the <b>back end</b>. The output of the front end is generally a <b>machine independent intermediate representation</b> like the AST. The back end of course will be dependent on the target platform.
</p>
<p>
<b>Task 2</b> : Modify CodeGen() function of Stage 1 to generate code for the abstract syntax tree generated as Task 1 above.
</p>
<p>
In the next stage, we will see how program control instructions like if-then-else can be incorporated into the language.
</p>
<p>
<b>Note</b> : An abstract syntax tree is an <a href="https://en.wikipedia.org/wiki/Intermediate_representation" target="_blank">intermediate representation</a> of the source program in a tree form suitable for code generation. There are several other forms of intermediate representations like the three address code form, the static single assignment form etc. This roadmap will be based on the abstract syntax tree representation.
</p>
<p>
In commercial strength compilers, the source is first translated to intermediate forms like the three address form which is a <b>lower level representation</b> (that is the intermediate form is closer to machine code) than the AST. Typically machine independent <a href="https://en.wikipedia.org/wiki/Optimizing_compiler" target="_blank">code optimizations</a> are performed on the intermediate code and only then the back-end code generation is run. This step is followed by another set of machine dependent code optimizations before the target file is finally generated. As these issues are beyond the scope of our project, we will not dwell into these matters further in this roadmap.
</p>
<p>
<b>Exercise 1</b> : Build an <b>evaluator</b> for the program. (Hint: Your front end does not change. But, instead of generating code from the AST, you can recursively "evaluate" it. For storage allocation of variables, you can simply declare an array that can store 26 integers and allocate one entry for each variable).
</p>
<p>
<b>Note</b> : The compiler generates target code which must be executed by the target machine. In our case, the compiler you wrote as Task 2 actually is a <b>cross compiler</b>. This means that your compiler generated target code that is not for your host system, but on some other target platform – which in our case the simulated XSM machine. The evaluator done in Exercise 1 actually does not generate "code" for any machine. Instead, it executes the program in "then and there". Such a program could be classified as an <b>interpreter</b>. (Unfortunately, the standard terminology in literature associated with the term "interpreter" seems to be contradictory to this classification).
</p>
<div class="up column3 mright">
<a href="#navtop"> ↑ </a>
</div>
</article>
<article class="grid col-full" id="nav-stage3">
<h2>Stage 3: Adding Flow Control Statements</h2>
<b>Time Estimate :</b> 1 week, 5-10 hours/week
<p>
<b>Prerequisite</b>: You must read the <a href="label-translation.html" target="_blank">label translation tutorial</a> before proceeding with this stage.
</p>
<p>
<b>Learning Objectives</b>:
<ul>
<li id="otis">In this stage, you will extend the straight-line-program compiler of Stage 2 to support control flow constructs like if-then-else, while-do, break and continue. You will encounter integer and boolean expressions and the notion of <i>type</i> enroute. You will also learn the use of <i>labels</i> for handling control flow constructs.</li>
</ul>
<hr>
</p>
<p>
The if-then-else and the while-do constructs can be added to the source language of Stage 2 by adding the grammar rule:
<div class="syntax">
Ifstmt ::= IF (E) then Slist Else Slist ENDIF<br>
     | IF (E) then Slist ENDIF;
<br>
Whilestmt ::= WHILE (E) DO Slist ENDWHILE;<br>
</div>
To permit logical expressions, we need to add to the grammar the following productions:
<div class="syntax">
E ::= E < E | E > E | E < =E | E >= E | E != E | E == E;<br>
</div>
A simple program in this language to find the largest among three numbers would look like the following:
<div class="syntax">
 read(a);<br>
 read(b);<br>
 read(c);<br>
 if (a < b) then<br>
  if (b < c) then Write(c); else Write(b); endif; <br>
 else<br>
  if (a < c) then Write(c); else Write(a); endif; <br>
 endif; <br>
</div>
Note that we continue to assume that variables hold only integer values. The first task in translation is to complete the front end.
</p>
<p>
There is one important conceptual point to understand here before proceeding to the front end implementation. With the introduction of logical expressions, there are two <b>types</b> of expressions in the language – <b>arithmetic expressions</b> and <b>logical expressions</b>. An arithmetic expression evaluates to an integer value whereas a logical expression evaluates to a <b>boolean value</b> – that is true/false.
</p>
<p>
The guard of an if-else statement or a while-do statement must be a boolean expression. On the other hand, the expression on the right side of an assignment statement must be of integer type as variables are assumed to hold integer values only. In other words, the statements given below are <b>invalid</b>.
<div class="syntax">
if (a+b) then Write(c);<br>
  OR<br>
a = b < c;<br>
</div>
</p>
<p>
Your compiler must flag a "<i>type mismatch</i>" error if such constructs are encountered during the AST construction process. A program with type errors must not pass the compiler's type check scrutiny and the compiler must report error without generating code. Type analysis is a part of the responsibilities of a compiler (normally classified under <a href="https://en.wikipedia.org/wiki/Semantic_analysis_(compilers)" target="_blank">semantic analysis</a>).
</p>
<p>
A simple way to handle this issue is to <i>annotate each node in the AST with a <b>type attribute</b></i> that indicates what is the type of the expression (or subexpression) with this node as the root.
</p>
<p>
For example, consider the AST for the following erratic expression.
<div class="syntax">
d = ( a + b ) + ( c < 3 )
</div>
<p>
<img src="img/ast3.png">
</p>
</p>
<p>
Here, the root of the AST is an assignment node which is <b>typeless</b>. (statements have no type, only expressions have a type associated with them). The left subtree of the root is a variable, and hence has type integer. The right subtree is a <b>+</b> node of type integer. Hence, at the root, there is no type mismatch. However, the right child of the right subtree has type boolean and does not match the operand type for the + operator. Hence the compiler must terminate compilation flagging error "type mismatch". Note that the compiler can stop processing when the first error is encountered without proceeding further with the tree construction.
</p>
<p>
To implement type checking, add a type field in the AST node structure.
<script src="js/09129e56138b0207d595eaefbf2873cd.js"></script>
At the leaf nodes of the tree, since you have either constants or variables, the type must be set to integer. Next, while constructing the tree for intermediate nodes, check whether the types of the children are compatible with the operator at the root. For instance, for the addition operation, the check could be as the following:
<div class="syntax">
E :== E+E {<br>
      if (($1->type != inttype) || ($2->type != inttype)) {<br>
       error("type mismatch"); <br>
       exit();<br>
      } else {<br>
       $$->type = inttype); <br>
    }<br>
</div>
</p>
<p>
If there is no mismatch, you must annotate root node ($$->type) with the proper type (integer in the above case).
</p>
<p>
<li id="otis">
<b>Note</b>: The above check is better done inside the TreeCreate() function so that the YACC file is not cluttered with C statements.
</li>
</p>
<p>
The essential idea is that the type of each node can by synthesized from the types of the subtrees. At any stage, the compiler may terminate flagging error if a type error is found.
</p>
<p>
<b>Task 1</b>: Complete the front-end module (AST construction) for the programming language. You need to
<ul>
<li id="otis">(1) add additional lexical tokens for the new constructs</li>
<li id="otis">(2) make appropriate modifications in the tree node structure including provision for storing type attribute</li>
<li id="otis">(3) modify the TreeCreate() function to have three subtrees passed (for if-then-else) etc.</li>
</ul>
</p>
<p>
<b>Exercise 1</b>: To test the implementation of Task 1, implement an <b>evaluator</b> for the expression tree. Test with simple programs like those for finding the largest of 3 numbers, sum of n numbers (n read from input) etc.
</p>
<p>
The next task is to complete the back-end code generation phase. For better clarity, we will split the task into two steps.
</p>
<p>
<ul>
<li id="otis"><b>Step 1</b>: Generate code with <b>labels</b>. At this stage labels will be placed at various control flow points of the target assembly code so that a JMP instruction will only indicate the label corresponding to the instruction to which transfer of program flow must happen.</li>
<li id="otis"><b>Step 2</b>: Replace the labels with addresses.</li>
</ul>
</p>
<p>
<b>Important note</b>: You must have read the <a href="label-translation.html" target="_blank">label translation tutorial</a> before proceeding any further.
</p>
<p>
We will now look at Subtask 1. Consider the following statement:
<div class="syntax">
while (a < b) {<br>
  a = a+1 ;<br>
}<br>
</div>
</p>
<p>
The expression tree for the above statement would look like:
</p>
<p>
<img src="img/ast31.png">
</p>
<p>
Suppose variable a is bound to address 4096, b to address 4097, then our plan is to generate code that would look like the following:
<div class="syntax">
L1: <br>
MOV R0, [4096] // transfer a to R0<br>
MOV R1, [4097] // transfer b to R1<br>
LT R0, R1 // a<b<br>
JZ R0,L2 // if (a<b) is false goto L2 <br>
MOV R0, [4096] // transfer a to R0<br>
ADD R0, 1 // add 1 <br>
MOV [4096], R0 // transfer sum back to a<br>
JMP L1 // goto next iteration. <br>
L2: <br>
... Next Instruction .. <br>
</div>
</p>
<p>
Note the use of labels L1 and L2 indicating control flow points in the above code. A while statement involves two jumps and two labels. The labels are just symbols that are placed at the start of instructions to which jump instructions must branch to. Placing labels in the code relieves us from bothering about the exact memory address to which jump must be made. Of course, this is only a temporary measure. The final target code must not contain labels.
</p>
<p>
Our strategy here is to first generate code with labels and then replace the labels with addresses. To implement the plan, we may name labels in the program L0, L1,.... We must design an <i>int GetLabel()</i> function that returns the index of the next unused label. Thus the first call to <i>GetLabel()</i> returns 0, next call returns 1 and so forth.
</p>
<p>
The code generation strategy for the while-do statement is illustrated by the following pseudo-code.
<div class="syntax">
int label_1 = getLabel();<br>
int label_2 = getLabel();<br>
fprintf (target_file "L%d", Label_1) // Place the first label here. <br>
Generate code for the guard expression. <br>
Generate code to compare the result to zero and if so jump to label_2 // loop exit <br>
Generate code for the body of the while loop.<br>
fprintf(target_file, "JMP L%d", label_1); // return to the beginning of the loop. <br>
fprintf(target_file, "L%d", label_2); // Place the second label here<br>
</div>
</p>
<p>
<b>Task 2</b>: Complete the code generation with labels for while-do, if-then and if-then-else constructs.
</p>
<p>
Now, we must complete Step 2 of replacing the labels with the correct addresses. This is explained in the <a href="label-translation.html" target="_blank">label translation documentation</a>.
</p>
<p>
<b>Task 3</b>: Read the link specified above and complete the label translation for if-then, if-then-else and the while-do statement.
</p>
<p>
<b>Exercise 2</b>: Test your Task 3 code with the following programs:
<ul>
<li id="otis">(a) program to find the largest for a, b, c (values read from input)</li>
<li id="otis">(b) program to read numbers till 0 is input and output the sum. </li>
</ul>
</p>
<p>
<b>Task 4</b>: Add <b>break</b> and <b>continue</b> statements. Code for these statements need be generated only if they appear inside some while loop. Otherwise, the compiler may simply ignore these statements, generating no code. (The primary task is to keep track of which label to jump to when one of these statements is encountered).
</p>
<p>
<b>Exercise</b> : Add <i>repeat-until</i> and <i>do-while</i> statements to the language with standard semantics.
</p>
<p>
<b>Reading Exercise</b> : Please read the <a href="gdb.html" target="_blank">GNU Debugger (GDB) tutorial</a> ,before proceeding to the next stage for learning the GDB debugger.
</p>
<p>
<b>Note</b>: Often in practice, programming languages allow a program to be split into different functions, written in different source files. In such cases, each file is separately compiled and the compiler generates target code with labels without translating them into addresses. Even variable references will be symbolic and the actual addresses may not be determined. Such target files are called <a href="https://en.wikipedia.org/wiki/Object_file" target="_blank">object files</a>. The compiler will include symbol table information in the object file for translation later. A separate software called the <a href="https://en.wikipedia.org/wiki/Linker_(computing)" target="_blank">linker</a> will collect the information in all the symbol tables and combine the object files into a single executable file replacing labels and symbolic variable references with actual addresses.
</p>
<div class="up column3 mright">
<a href="#navtop"> ↑ </a>
</div>
</article>
<article class="grid col-full" id="nav-stage4">
<h2>Stage 4: User Defined Variables and arrays</h2>
<b>Time Estimate :</b> 1 week, 5-10 hours/week
<br>
<p>
<b>Prerequisites</b>:<br/>
<ul>
<li id="otis">1. You must have completed the <a href="gdb.html" target="_blank">GNU Debugger (GDB) tutorial</a> before starting this stage.</li>
</ul>
</p>
<p>
<b>Learning Objectives</b>:<br>
You will extend the language of Stage 3 to permit users to declare and use variables of <i>integer</i> and <i>string</i> types. You will learn <b>symbol table</b> management enroute.
<hr>
</p>
<p>
In this stage, we allow the program to contain variable declarations of the following syntax:
<div class="syntax">
Declarations ::= DECL DeclList ENDDECL | DECL ENDDECL<br>
<br>
DeclList ::= DeclList Decl | Decl<br>
<br>
Decl ::= Type VarList ;<br>
<br>
Type ::= INT | STR<br>
<br>
VarList ::= Varlist , ID | ID<br>
</div>
</p>
<p>
We will assume hereafter that all variables used in a program must be <b>declared</b> in the declaration section of the program (between the <i>decl</i> and <i>enddecl</i> keywords). Since string type variables are allowed, we will allow string constants as well. (See <a href="expl.html#nav-constant">ExpL specification</a> for details).
</p>
<p>
A simple program in this language to find the sum of numbers entered from the console (until a zero is entered) would look like the following:
<div class="syntax">
decl<br>
  int num, sum;<br>
  str mesg;<br>
enddecl<br><br>
read(num);<br>
sum = 0;<br>
while (num != 0) do<br>
  sum = sum + num;<br>
  read(num);<br>
endwhile;<br>
write("sum is:");<br>
write(sum);<br>
mesg = "good bye";<br>
write(mesg);<br>
</div>
</p>
<p>
It is the responsibility of the compiler to track for various <b>semantic errors</b> as:
<ul>
<li id="otis">1. Flag error if any variable not declared is used.</li>
<li id="otis">2. Flag error if a type mismatch involving any variable is found.</li>
</ul>
</p>
<p>
To this end, while parsing declarations, the compiler transfers the information about variables in a compile time data structure called the <b>symbol table</b>. The symbol table stores the following information about each variable:
<ul>
<li id="otis">1. Name of the variable (known at the time of declaration).</li>
<li id="otis">2. Type (For the present stage, only integer/string).</li>
<li id="otis">3. Size (For the time being, we will assume that all variables have size one).</li>
<li id="otis">4. The memory <b>binding</b> of each variable – that is, static memory address determined by the compiler for the variable.</li>
</ul>
</p>
<p>
The first three entries are determined by the declaration of the variable. For the fourth, a simple strategy would be to allocate the first address (4096) for the variable declared first, 4097 for the next variable and so on. Note that here too we are fixing the address of each variable at compile time (<b>static allocation</b>).
</p>
<p>
The following structure may be used for a symbol table entry:
<div class="syntax">
struct Gsymbol {<br>
  char* name;  // name of the variable<br>
  int type;   // type of the variable<br>
  int size;   // size of the type of the variable<br>
  int binding; // stores the static memory address allocated to the variable<br>
  struct Gsymbol *next;<br>
<!-- ... any other field for data structure maintainance .. -->
}
</div>
</p>
<p>
The symbol table entries for the program above would look as below:
<img src="img/gsymboltable1.png">
</p>
<p>
To implement the symbol table, you must write two functions. For a simple implementation, a linear linked list suffices. In modern compilers, hash tables are maintained to make search efficient.
</p>
<p>
<div class="syntax">
struct Gsymbol *<i>Lookup</i>(char * name); // Returns a pointer to the symbol table entry for the variable, returns NULL otherwise.<br>
<br>
void <i>Install</i>(char *name, int type, int size); // Creates a symbol table entry.
</div>
</p>
<p>
<b>Note</b>: You must check before installing a variable whether the variable is already present. If a variable is declared multiple times, the compiler must stop the compilation and flag error.
</p>
<p>
<b>Task 1</b>: Complete the program to parse declarations and set up the symbol table entries and print out the contents of the symbol table.
</p>
<p>
The next task is to make necessary modifications to the AST construction and code generation. These are straightforward. Add a an additional field to the tree node structure
<div class="syntax">
typedef struct tnode{<br>
  int val;       //value of the constant<br>
  char* varname;   //name of the variable<br>
  int type;      //type of the variable<br>
  int nodetype;    //node type information<br>
  struct Gsymbol *Gentry;  //pointer to GST entry for global variables and functions<br>
  struct tnode *left,*right;  //left and right branches<br>
}tnode;
</div>
</p>
<p>
While constructing the tree, if a variable is encountered, keep a pointer to the corresponding symbol table entry. Set the type field as well to the correct type. The rest of the type checking steps are exactly as in the previous stage. The AST of while loop present in the above code is as follows (the relevant part of the code is shown below for easy reference):
<div class="syntax">
.<br>
.<br>
.<br>
while (num != 0) do<br>
  sum = sum + num;<br>
  read(num);<br>
endwhile;<br>
.<br>
.<br>
.<br>
</div>
</p>
<p>
<img src="img/ast_stage4.png">
</p>
<p>
There is no serious change to the code generation process, except that for variables, the binding address is obtained from the symbol table.<br><br>
<b>Important Note</b>: The XSM architecture is unrealistic in that allows a memory word to hold a string. Normally, in a real system, a string would require storing the characters one after another in consecutive memory locations as in an array. You will anyway learn array allocation immediately.
</p>
<p>
<b>Task 2</b>: Complete the AST construction and code generation steps.
</p>
<p>
<h4>Adding arrays</h4>
The next step is to allow declaration of arrays like:
<div class="syntax">
decl<br>
 ---<br>
 int a[100];<br>
 str names[20];<br>
 ---<br>
enddecl
</div>
The declaration syntax must permit:
<div class="syntax">
Varlist ::= Varlist , ID[NUM] | ID[NUM]
</div>
To implement this, for each variable, you must reserve as much static space as specified by the declaration and set the size field in the symbol table to indicate the number of words allocated. The next variable must be allocated space only below this space allocated.
</p>
<p>
For instance, for the declaration,
<div class="syntax">
decl<br>
  int a[10], b;<br>
enddecl
</div>
The binding field in the symbol table for the variable a may be set to address 4096.
The size entry set to 10. This means that we are allocating locations 4096-4105 for the array.
The next variable, b can be bound to the address 4106.
<img src="img/gsymboltable2.png">
</p>
<p>
<b>Task 2</b>: Complete the implementation of single dimensional arrays.
<br><br>
<b>Exercise 1</b>: Permit two dimensional arrays like:
<div class="syntax"> int a[10][10]; </div>
Test your implementation with a program for multiplying two nxn matrices.
<br><br>
<b id="stage4_ex2">Exercise 2</b>: Permit <i>pointer type</i> variables as in the following declaration as in the C programming language.
<div class="syntax">
decl<br>
  int x, *p;<br>
  str p, *q;<br>
enddecl
</div>
If you permit assignments like p=&x; and q=&p; , pointer variables may also be permitted in expressions like *p=*q+1; for referring to the data pointed to, as permitted in the C programming language. Semantic rules as in the C programming language may be assumed.
</p>
<p>
<b>Note</b>: Right now, you are not equipped to do dynamic memory
allocation for pointer variables (as done by the malloc() function of C).
Hence, a pointer type variable can be used as a pointer to another statically
declared variable of the corresponding type. Dynamic memory allocation will be discussed in later stages.
</p>
<h4> <b>Test Programs </b></h4>
<p>
Check your implementation with the following test cases : <br>
<h4> 1. Bubblesort (iterative) </h4>
<p>
<p>
This test program reads elements into an array and sorts them using the classic bubblesort algorithm. (iterative version)
</p>
<p>
<b> Input </b> : 1. Number of elements to be sorted from standard input.
2. Elements to be sorted
</p>
<p>
<b> Output </b> : A sorted array of elements.
</p>
To get the code for this test program <a href="testprograms/stage4/bubblesort.html" target="_blank">click here</a>.
</p>
</p>
<h4> 2. Nth Fibonacci Number(iterative) </h4>
<p>
<p>
This test program prints the nth fibonacci number
</p>
<p>
<b> Input </b> : 1.An integer n
</p>
<p>
<b> Output </b> : nth fibonacci number
</p>
To get the code for this test program <a href="testprograms/stage4/fibaofn.html" target="_blank">click here</a>.
</p>
</p>
<h4> 3. Is Prime or Not </h4>
<p>
<p>
This program tests if a given integer is prime or not.
</p>
<p>
<b> Input </b> : 1.An integer n
</p>
<p>
<b> Output </b> : Prime if n is prime else not a prime.
</p>
To get the code for this test program <a href="testprograms/stage4/prime.html" target="_blank">click here</a>.
</p>
</p>
<h4> 4. Sum of n factorials (iterative) </h4>
<p>
<p>
This program prints the sum to n factorial for a given n.
</p>
<p>
<b> Input </b> : 1.An integer n
</p>
<p>
<b> Output </b> : sum of factorial of all integers 1 to n.
</p>
To get the code for this test program <a href="testprograms/stage4/sum-to-n-fact.html" target="_blank">click here</a>.
</p>
</p>
<br>
</article>
<article class="grid col-full" id="nav-stage5">
<h2>Stage 5: Adding Functions</h2>
<b>Time Estimate :</b> 2 weeks, 5-10 hours/week
<p>
<b>Prerequisite Reading</b> : You must read the following documents before proceeding with this stage:<br>
<ul>
<li id="otis">1. The main page of the document <a href="run-data-structures.html" target="_blank">Run time allocation</a>.</li>
<li id="otis">2. <a href="run_data_structures/run-time-stack.html" target="_blank">Run time stack Allocation</a>.</li>
</ul>
</p>
<p>
<b>Learning Objectives</b>:
<ul>
<li id="otis">You will extend the language of Stage 4 by adding functions with support for recursion. Addition of functions to the language requires handling <b>scope</b> of variables. Support for recursion demands <b>run-time storage allocation</b>. Only <i>integer</i> and <i>string</i> type variables will be supported.</li>
</ul>
<hr>
</p>
<p>
This is the first major stage in the ExpL project. A skeletal outline of the syntax rules for defining the extension of the language of Stage 4 to support subroutines is as below. You are required to fill in rules required to complete the grammar. Note that variables may be only of type integer/string.
<div class="syntax">
Program ::= GDeclBlock FdefBlock <a href="grammar-outline.html" target="_blank">MainBlock</a><br>
        | GdeclBlock MainBlock<br>
        | MainBlock<br>
<br>
GdeclBlock ::= DECL GdeclList ENDDECL | DECL ENDDECL<br>
<br>
GdeclList ::= GDeclList GDecl | GDecl
<br>
GDecl ::= Type GidList ; <br>
<br>
GidList ::= GidList , Gid | Gid <br>
<br>
Gid ::= ID <br>
     | ID[NUM] <br>
     | ID(ParamList) <br>
--------------------------------------------------------------------------------------
<br>
FDefBlock ::= FdefBlock Fdef | Fdef <br>
<br>
Fdef ::=Type ID ( ParamList ) { LdeclBlock Body } <br>
<br>
ParamList ::= ParamList , Param | Param<br>
        |   /*param can be empty*/<br>
<br>
Param ::= Type ID <br>
<br>
Type ::= INT | STR <br>
-----------------------------------------------------------------------------------------
LdeclBlock ::= DECL LDecList ENDDECL | DECL ENDDECL <br>
<br>
LDecList ::= LDecList LDecl | LDecl<br>
<br>
LDecl ::= Type IdList ; <br>
<br>
IdList ::= IdList, ID | ID <br>
<br>
Type ::= INT | STR <br>
</div>
</p>
<p>
Since a function call is treated as an expression (whose value is the return value of the function), the following rules must be added:
<div class="syntax">
E ::= ID () | ID(ArgList) <br>
<br>
ArgList ::= ArgList, E | E <br>
</div>
</p>
<p>
Here is an <a href="run_data_structures/run-time-stack.html#nav-illustration" target="_blank">example</a> for a program with a function. We will take up semantic analysis and AST representation before proceeding to code generation.
</p>
<p>
Each function requires a <b>declaration</b>. The declaration of functions must be made along with the global declarations. The declaration of a function must specify the types and names of the <b>formal parameters</b> and the <b>return type</b> of the function. The compiler must store the declaration information in the global symbol table. For example, the declaration
<div class="syntax">
decl<br>
..<br>
..<br>
 int factorial(int n);<br>
..<br>
..<br>
enddecl<br>
</div>
specifies that <i>factorial</i> is a function that takes as input one integer argument and returns an integer. This is sometimes called the <b>signature</b> of the function. Conceptually, to invoke the factorial function, the <b>caller</b> must know:
<ul>
<li id="otis">1. The memory address to which the function call must be directed (<b>binding</b>).</li>
<li id="otis">2. The <b>types and names of the formal parameters</b> to the function and the order in which the actual <b>arguments</b> must be given as input to the function.</li>
<li id="otis">3. The <b>return type</b> of the function.</li>
</ul>
This precisely is the information that the symbol table stores.
</p>
<p>
A function <b>definition</b> contains:
<ul>
<li id="otis">a) The function's signature.</li>
<li id="otis">b) The declaration of <b>local variables</b> of the function.</li>