Compiler HWK#5 Due date: 2:20pm, June 13, 2013 Write a simple compiler to parse a simple PASCAL-like expression language with integer and real data types, block structure, 1-D array, I/O statement, and IF .. THEN .. ELSE. Code generation in C-- (Version 2.3); A sample program is as follows: PROGRAM program_name VAR %% beginning of variable declarations, optional INTEGER: x, y, z; REAL: a; ARRAY[-20..30] OF REAL: b; %% a total of 61 elements ENDVAR BEGIN READ(y); x := (y + 3 * 4) - 5; IF (x >= -20) AND (x <= 30) THEN BEGIN a := 0.3; b[x] := a; END ENDIF WRITE(y); WRITESP(); WRITE(x); WRITELN() ; END The definition of your language is as follows: 1. This language is case-sensitive with reserved words. 1.1
PROGRAM name optional var decl BEGIN statements or blocks END 1.2 is epsilon or VAR ... ENDVAR VAR ... ENDVAR is a declaration of local variables which can be empty or completely missing. VAR ... ENDVAR must be before any statements and can have only one appearance. 1.3 IF boolean-expression THEN block/statement ENDIF IF boolean-expression THEN block/statement ELSE block/statement ENDIF 1.4 BEGIN statements or blocks END BEGIN and END starts and ends a block, respectively. 1.5 boolean-expression: expression (> | < | = | <= | >= | <>) expression boolean-expression (OR | AND) boolean-expression NOT boolean-expression Precedence and Associativity for operators follow C. 1.6 a. assignment of expressions, where expressions can have variables, real-constant (digits.digits), integer-constant (digits), (, ), +, -, *, / and "uminus". assignment is ":=", b. input statement %% it's legal when sysio.h is included READ(single variable) c. output statements %% it's legal when sysio.h is included WRITE(single variable) --- output a variable WRITESP() --- output a single space WRITELN() --- write a new line There may have spaces before, in between or after "()". d. The statement terminator is ";" 2. each line has at most one statement can have a blank line cannot have a null statement, ";" 3. Two different data types: INTEGER and REAL. 3.0 There is no auto-type conversion. 3.1 Each occupies 4 bytes. 3.2 No type conversion is allowed. 3.3 Can have 1-D array of INTEGER or REAL ARRAY[20..30] OF INTEGER: x,y,z; ... x[a] := y[3] + 20; 4. variable must be declared before its usage. variable/program names uses the rule as HWK4. However, the length of a variable name is 1024. There can have lots of variables. I expect you to use either hash table or balanced-binary search tree for symbol tables. 5. Anything after %% are comments until to the end of the line. 6. Must detect at least the following error and reports its line number(report all errors): a. invalid ID names b. undeclared variables c. duplicatedly declared variables d. mismatched data type e. invalid constant f. illegal syntax 7. Name your compiler "pascal-" and it should support -S and -o option: pascal- program.p -o prog %% generate an executable called prog and default is p.out pascal- -S program.p %% generate an file contains the intermediate code in C-- Note: (1) TA will announce details for error handling later. (2) To submit your homework, use the software submission system designed by TA. Read TA's web site for details. (3) You need to have LEX and YACC programs. (4) TA will announce rules for submitting homeworks.