Compiler HWK#5 Due date: 2:20pm, June 14, 2007 Write a simple compiler to parse a simple PASCAL-like expression language with 1-D array, IF .. THEN .. ELSE, while loop, for loop and case statement. Code generation in C-- (Version 2.3); A sample program is as follows: INCLUDE filename PROGRAM program_name VAR %% beginning of variable declarations, optional x, y, z: INTEGER; a : REAL; b: ARRAY[-20..30] OF REAL; %% 61 elements ENDVAR BEGIN READ(y); x := (y + 3 * 4) - 5; IF (x >= -20) AND (x <= 30) THEN BEGIN a := 0.3; b[x] := a; END ENDIF WRITE(y); WRITESP(); WRITE(x); WRITELN() ; END The definition of your language is as follows: 1. This language is case-sensitive with reserved words. 1.0 is epsilon or INCLUDE filename[1] %% each include statement must be in one line .... INCLUDE filename[n] The included file contains the definition of procedures. ProcedureName():ReturnType ; ProcedureName(Type1 ID1,Type2 ID2,..., TypeN IDN):ReturnType; 1.1
PROGRAM name optional var decl BEGIN statements or blocks END 1.2 is epsilon or VAR ... ENDVAR VAR ... ENDVAR is a declaration of local variables which can be empty or completely missing. VAR ... ENDVAR must be before any statements and can have only one appearance. 1.3 IF boolean-expression THEN block/statement ENDIF IF boolean-expression THEN block/statement ELSE block/statement ENDIF 1.4 BEGIN statements or blocks END BEGIN and END starts and ends a block, respectively. 1.5 boolean-expression: expression (> | < | = | <= | >= | <>) expression boolean-expression (OR | AND) boolean-expression NOT boolean-expression Precedence and Associativity for operators follow C. 1.6 a. assignment of expressions, where expressions can have variables, real-constant, integer-constant, (, ), +, -, *, / and "uminus". assignment is ":=", b. input statement %% it's legal when sysio.h is included READ(single variable) c. output statements %% it's legal when sysio.h is included WRITE(single variable) --- output a variable WRITESP() --- output a single space WRITELN() --- write a new line There may have spaces before, in between or after "()". d. Procedure call is in the form of %% declared in header files procedure_name() There may have spaces before, in between or after "()". e. The statement terminator is ";" 1.7 precedence/associativity of operators same as C 1.8 for loop FOR int-var := int-expression-1 TO int-expression-2 DO statement/block FOR int-var := int-expression-1 DOWNTO int-expression-2 DO statement/block 1.9 while loop WHILE boolean-expression DO statement/block 1.10 case statement CASE expression OF constant1: statement/block constant2: statement/block ... [optional] OTHERWISE: statement/block ENDCASE; 2. each line has at most one statement can have a blank line cannot have a null statement, ";" 3. Two different data types: INTEGER and REAL. 3.1 Each occupies 4 bytes. 3.2 No type conversion is allowed. 3.3 Can have 1-D array of INTEGER or REAL x, y ,z : ARRAY[20..30] OF INTEGER; ... x[a] := y[3] + 20; 4. variable must be declared before its usage. variable/program names uses the rule as HWK4. However, the length of a variable name is 1024. There can have lots of variables. I expect you to use either hash table or balanced-binary search tree for symbol tables. 5. Anything after %% are comments until to the end of the line. 6. Must detect the following error and reports its line number(report all errors): a. invalid ID names b. undeclared variables c. duplicatedly declared variables d. mismatched data type e. undefined procedure. f. duplicatedly declared procedures g. invalid constant h. illegal syntax 7. Name your compiler "pascal-" and it should support -S and -o option: pascal- program.p -o prog %% generate an executable called prog and default is p.out pascal- -S program.p %% generate an file contains the intermediate code in C-- Note: (1) TA will announce details for error handling later. (2) To submit your homework, use the software submission system designed by TA. Read TA's web site for details. (3) You need to have LEX and YACC programs. (4) TA will announce rules for submitting homeworks.