Catch errors that are not caught by parsing (because many constructs are not context-free).
Examples (C):
void foo (int n) { int a = 0, i; printf("a = %d\n", a); for (i = 0; i < n; i++) { int a = 1; printf("a + j = %d\n", a + j); //a's value is 1. where is j. error? } printf("a = %d\n", a); }
void foo(int x) { int a = 4; bar(3); } void bar(int y) { printf("a = %d\n", a); //refers to a in the closest enclosing binding in the execution of the program }
n
n
n
{ //new scope int x = 4; //declarations f(x); } Tree: NewScope -->(left child) declarations x = 4 NewScope -->(right child) statementsOn entry to the new scope, the new declarations will be added to the symbol table before recursing down right child (statements). On returning from the new scope, the new declarations will be removed and the old declarations of x (if any) will be restored in the symbol table.
enter_scope()
: start a new nested scope
find_symbol(x)
: search stack starting from top, for the first scope that contains x
. Return first x
found or NULL if none found
add_symbol(x)
: add a symbol to the current scope (top scope)
check_scope(x)
: true if x
defined in the current scope (top scope). allows to check for double definitions.
exit_scope()
: exit current scope
add $r1,$r2,$r3What are the types of $r1, $r2, $r3? Can't say. For all you know, r1 may represent a bit-pattern that represents a string. The only thing we can say is that these have the base-type bitvector.
e1
has type Int
and e2
has type Int
, then e1+e2
has type Int
(e1:Int ^ e2:Int) => e1+e2:Int
(e1:Int ^ e2:Int) => e1+e2:Int
is a special case of Hypothesis1 ^ Hypothesis2 ... ^ HypothesisN => Conclusion
|- Hypothesis ... |- Hypothesis ------------------------------- |- Conclusion
|- e:T
|-
means "it is provable that ..."
i is an integer literal (constant) ---------------------------------- [Int] |- i:int
|-e1:int |-e2:int ---------------------------------- [Add] |-e1+e2:int
1 is an int literal ------------------- |- 1:int 2 is an int literal ------------------- |- 2:int |-1:int |-2:int ----------------- |-1+2:int
|-e:T
e
evaluates to a value of type T
1 is an integer literal ----------------------- |-1:voidThis is correct (sound) but not the most precise typing rule. e.g., it will not allow 1 to be added because voids cannot be added. assumption:
int
is a subtype of void
e:T
e
:
e
's subexpressions
e
------------------ [false] |- false : bool
s is a character literal ------------------------- [char] |- s : char
|-s:T ----------------------- [address-of] |- &s : T *
s is a string literal ----------------------- [string] |- s : char *
|-b:bool ----------------------- [Not] |- !b : bool
|-x:T ----------------------- [Expr-Stmt] |- x=y; : T
|-x:T |-y:T ----------------------- [Assignment-Stmt] |- x=y : T
|-e:bool |-x:T ------------------------- [While-Stmt] |- while (e) x : voidThis is a design decision. One could have said that the type of a while loop is the type of the last statement that executes. But then what if the statement didn't execute even once? Hence a type 'void' means that the statement has a type which is not usable. Similarly for if-then-else: the types in the two branches can be different, so the final type is void.
But what is the type of a variable reference?
x is a variable ---------------- [Var] |- x:?The local, structural rule does not carry enough information to give
x
a type.
Solution: put more information in the rules!
A type environment gives types for free variables
x
is free in expression x
x
is free in expression x+y
x
is free in expression {int y = ...; x+y;}
y
is a bound variable in expression {int y = ...; x+y;}
The sentence O |- e:T
is read
e
has the type T
i is an integer literal (constant) ---------------------------------- [Int] O |- i:int
O |- e1:int O |- e2:int ---------------------------------- [Add] O |- e1+e2:intNotice that the same assumptions are required in both the hypotheses and the conclusion.
And we can write new rules
O(x) = T --------- [Var] O |- x:T
O[T0/x] |- e1:T1 ---------------------------- [Let-No-Init] O |- { T0 x; e1 } : T1O[T/x] represents one function, where
O[T/x](x) = T O[T/x](y) = O(y)When we enter the scope, we add a new assumption in our type environment. When we leave the scope, we remove the assumption.
This terminology reminds us of the symbol table. So the type-environment is stored in the symbol table.
O[T0/x] |- e1:T1 O |- e2:T0 ---------------------------- [Let-Init] O |- { T0 x = e2; e1 } : T1Notice that this rule says that the initializer
e2
should have the same type T0
as x
. This is quite weak. In general, we should allow such assignment as long as e2
is of a subtype of T0
.
Define a relation <= on C++ classes
O[T0/x] |- e1:T1 O |- e2:T0 T0 <= T ---------------------------- [Let-Init] O |- { T x = e2; e1 } : T1A better version of [Assignment-Stmt] using subtyping
O|-x:T0 O|-e1:T1 T1 <=T0 ------------------------------ [Assignment-Stmt] |- x=e1 : T1(Notice that the type of the assignment statement is T1, so it can participate in more operations than T0)
X?y:z with subtyping. define least-upper-bound. use least-upper-bound to type this ternary operator. Compare with if-then-else
O |- e1:T1 ... O |- en:T1 ------------------------------ [Dispatch] O |- f(e1,e2,...,en):?Also maintain a mapping for method signatures in O
O(f) = (T1,...,Tn,T(n+1))means that there is a method f such that
f(x1:T1,x2:T2,x3:T3,...,xn:Tn):T(n+1)Hence, our dispatch rule becomes:
O |- e1:T1 ... O |- en:Tn O |- f:(T1',..,Tn',T(n+1)') T1<=T1',...Tn<=Tn' ---------------------------------------- [Dispatch] O |- f(e1,e2,...,en):T(n+1)Object-oriented languages also implement encapsulation (e.g., some functions are only visible in certain classes, etc.). To handle them, one can extend O to be a function from a tuple of class-name and the identifier-name. Further, to express inheritance, we will use the subtype-relation for the "this" object in the method dispatch.
O |- e0:T0 O |- e1:T1 ... O |- en:Tn O |- C::f : (T1',..,Tn',T(n+1)') T0 < T T1<=T1',...Tn<=Tn' ---------------------------------------- [Static-Dispatch] O |- e0@T.f(e1,e2,...,en):T(n+1)For object-oriented languages, you also need to know the "current class" C in which the expression is sitting.
The form of a sentence in the logic is:
O,C |- e:TE.g.,
O,C |- e1:int O,C |- e2:int ---------------------------------- [Add] O,C |- e1+e2:intGeneral themes
O,C |- e1:int O,C |- e2:int ---------------------------------- [Add] O,C |- e1+e2:intThis says that to type-check
e1+e2
, then we have to type-check
e1
and e2
separately in the same O,C environment.
TypeCheck(Environment, e1+e2) = { T1 = TypeCheck(Environment, e1); Check T1 == int; T2 = TypeCheck(Environment, e2); Check T2 == int; return int; }Let's consider a more complex example:
O |- e0:T0 O[T0/x] |- e1:T1 T0 <= T ---------------------------- [Let-Init] O |- { T x = e0; e1 } : T1Let's look at the corresponding type-checking implementation:
TypeCheck(Environment, { T x = e0; e1 }) = { T0 = TypeCheck(Environment, e0); T1 = TypeCheck(Environment.add(x:T), e1); Check subtype(T0,T1); return T1; }
new C
" expression that created it.
Soundness theorem: for all expressions E, dynamic_type(E) = static_type(E)
In all executions, E evaluates to values of the type inferred by the compiler. You have to actually run the program to get the dynamic_type. In the early days of programming languages, this was the type of property that was proved for their type systems.
Consider C++ program:
class A { ... } class B : public A { ... } void foo() { A foo; B bar; .... foo = bar }Static type of foo is "A", but the dynamic type of foo is "A" and "B" at different program points.
Soundness theorem for object-oriented languages that allow subtyping.
\forall{E}{dynamic_type(E) <= static_type(E)}All operations that can be used on an object of type C can also be used on an object of type C' <= C.
The dynamic type is obtained through the execution semantics of the program, e.g., if we have an expression x++
, and the current state of the program determines x
to be of type int
, there is a dynamic typing rule that says that the type of the new expression is also int
. Consider the example { T x = e0; x; }
where e0:T0 <= T
; here the dynamic type of the expression is T0
(not T
!). Similarly, the dynamic
type of if x then y:T1 else Y:T2
will be either T1
or T2
(not lub(T1,T2)
or void
).
In other words, soundness states that the static type system will not accept any incorrect program (that will not pass the dynamic type check of equal power). A dynamic type check for array-bounds is not of equal power (it has more checks enabled); a sound static type check for array-bounds would reject all incorrect programs (but it will also reject a few more that will actually pass the dynamic type check).
Soundness of static type system: all dynamically-type-incorrect programs will be rejected. Ensured by ensuring that static_type is a supertype of dynamic_type. A trivial static-type system that rejects all programs is sound (but not useful).
Completeness: all dynamically-type-correct programs will be accepted. Not possible to ensure in general.
Methods can be redefined but with same type! e.g., C++, Java override.
class Count { int i = 0; //default value = 0 Count inc() { i <-- i + 1; return *this; } };Now consider a subclass
Stock
of Count
:
class Stock : public Count { string name; //name of the item };And the following use of
Stock
:
Stock a; Stock b = a.inc(); ... b.name ...This code does not type-check because the return value of
inc
is Count
which is not a subtype of Stock
.
This seems like a major limitation as now the derived classes (subtypes) will be unable to use the inc
method.
a.inc()
has dynamic type Stock
.
So it is legitimate to write:
Stock b <-- a.inc();
But this is not well-typed
a.inc()
has static type Count
.
dynamic_cast
: returns nullptr at runtime if not-successful; else returns a pointer to the object of the new type. Allows bypassing the static type system. May entail runtime cost.
template<typename T> class Count<T> { int i = 0; T inc() { i <-- i + 1; return *static_cast<T *>(this); } //static_cast gets checked at compile-time! } class Stock : public Count<Stock> { ... }
dynamic_cast
operator. Will fail at runtime if the cast cannot be executed successfully. The compiler just treats this as an operator that either returns nullptr
(cast not successful) or (a pointer to) an element of the new type. In both cases, the compiler can perform the rest of the static type-check using the new type.
No_type
for use with ill-typed expressions
No_type <= C
for all types C
. Avoids cascading type errors due to one type-error.
No_type
No_type
result
No_type
at the bottom