Catch errors that are not caught by parsing (because many constructs are not context-free).

Examples (C):

  1. All identifiers are declared
  2. Types
  3. ...
The set of legal programs is a subset of the parse-able programs.

Scope

Symbol Tables

Types

Type checking formalism

Some rules

i is an integer literal (constant)
----------------------------------  [Int]
     |- i:int
|-e1:int    |-e2:int
----------------------------------  [Add]
     |-e1+e2:int

Type Checking

Type Environments

------------------  [false]
  |- false : bool
 s is a character literal
------------------------- [char]
  |- s : char
   |-s:T
----------------------- [address-of]
  |- &s : T *
 s is a string literal
----------------------- [string]
  |- s : char *
    |-b:bool
----------------------- [Not]
  |- !b : bool
    |-x:T
----------------------- [Expr-Stmt]
  |- x=y; : T
 |-x:T  |-y:T
----------------------- [Assignment-Stmt]
  |- x=y : T
 |-e:bool  |-x:T
------------------------- [While-Stmt]
  |- while (e) x : void
This is a design decision. One could have said that the type of a while loop is the type of the last statement that executes. But then what if the statement didn't execute even once? Hence a type 'void' means that the statement has a type which is not usable. Similarly for if-then-else: the types in the two branches can be different, so the final type is void.

But what is the type of a variable reference?

   x is a variable
  ----------------  [Var]
      |- x:?
The local, structural rule does not carry enough information to give x a type.

Solution: put more information in the rules!

A type environment gives types for free variables

The sentence O |- e:T is read

i is an integer literal (constant)
----------------------------------  [Int]
    O |- i:int
 O |- e1:int    O |- e2:int
----------------------------------  [Add]
    O |- e1+e2:int
Notice that the same assumptions are required in both the hypotheses and the conclusion.

And we can write new rules

O(x) = T
---------  [Var]
O |- x:T
      O[T0/x] |- e1:T1
----------------------------    [Let-No-Init]
 O |- { T0 x; e1 } : T1
O[T/x] represents one function, where
O[T/x](x) = T
O[T/x](y) = O(y)
When we enter the scope, we add a new assumption in our type environment. When we leave the scope, we remove the assumption.

This terminology reminds us of the symbol table. So the type-environment is stored in the symbol table.

      O[T0/x] |- e1:T1
         O |- e2:T0
----------------------------    [Let-Init]
 O |- { T0 x = e2; e1 } : T1
Notice that this rule says that the initializer e2 should have the same type T0 as x. This is quite weak. In general, we should allow such assignment as long as e2 is of a subtype of T0.

Define a relation <= on C++ classes

A better version of [Let-Init] using subtyping:
      O[T0/x] |- e1:T1
         O |- e2:T0
       T0 <= T
----------------------------    [Let-Init]
 O |- { T x = e2; e1 } : T1
A better version of [Assignment-Stmt] using subtyping
 O|-x:T0  O|-e1:T1 T1 <=T0
------------------------------ [Assignment-Stmt]
  |- x=e1 : T1
(Notice that the type of the assignment statement is T1, so it can participate in more operations than T0)

X?y:z with subtyping. define least-upper-bound. use least-upper-bound to type this ternary operator. Compare with if-then-else

Typing Methods

C++ example
        O |- e1:T1
        ...
        O |- en:T1
------------------------------ [Dispatch]
   O |- f(e1,e2,...,en):?
Also maintain a mapping for method signatures in O
O(f) = (T1,...,Tn,T(n+1))
means that there is a method f such that
f(x1:T1,x2:T2,x3:T3,...,xn:Tn):T(n+1)
Hence, our dispatch rule becomes:
        O |- e1:T1
        ...
        O |- en:Tn
        O |- f:(T1',..,Tn',T(n+1)')
        T1<=T1',...Tn<=Tn'
---------------------------------------- [Dispatch]
   O |- f(e1,e2,...,en):T(n+1)
Object-oriented languages also implement encapsulation (e.g., some functions are only visible in certain classes, etc.). To handle them, one can extend O to be a function from a tuple of class-name and the identifier-name. Further, to express inheritance, we will use the subtype-relation for the "this" object in the method dispatch.
        O |- e0:T0
        O |- e1:T1
        ...
        O |- en:Tn
        O |- C::f : (T1',..,Tn',T(n+1)')
        T0 < T
        T1<=T1',...Tn<=Tn'
---------------------------------------- [Static-Dispatch]
   O |- e0@T.f(e1,e2,...,en):T(n+1)
For object-oriented languages, you also need to know the "current class" C in which the expression is sitting.

The form of a sentence in the logic is:

O,C |- e:T
E.g.,
 O,C |- e1:int    O,C |- e2:int
----------------------------------  [Add]
    O,C |- e1+e2:int
General themes Warning: Type rules are very compact! A lot of information in compact rules and it takes time to interpret them.

Implementing Type Checking

Let's consider the following rule:
 O,C |- e1:int    O,C |- e2:int
----------------------------------  [Add]
    O,C |- e1+e2:int
This says that to type-check e1+e2, then we have to type-check e1 and e2 separately in the same O,C environment.
TypeCheck(Environment, e1+e2) = {
  T1 = TypeCheck(Environment, e1);
  Check T1 == int;
  T2 = TypeCheck(Environment, e2);
  Check T2 == int;
  return int;
}
Let's consider a more complex example:
         O |- e0:T0
      O[T0/x] |- e1:T1
       T0 <= T
----------------------------    [Let-Init]
 O |- { T x = e0; e1 } : T1
Let's look at the corresponding type-checking implementation:
TypeCheck(Environment, { T x = e0; e1 }) = {
  T0 = TypeCheck(Environment, e0);
  T1 = TypeCheck(Environment.add(x:T), e1);
  Check subtype(T0,T1);
  return T1;
}

Static vs. Dynamic Typing

Formalizing the relationship:

Soundness theorem: for all expressions E, dynamic_type(E) = static_type(E)

In all executions, E evaluates to values of the type inferred by the compiler. You have to actually run the program to get the dynamic_type. In the early days of programming languages, this was the type of property that was proved for their type systems.

Consider C++ program:

class A { ... }
class B : public A { ... }
void foo() {
  A foo;
  B bar;
  ....
  foo =  bar
}
Static type of foo is "A", but the dynamic type of foo is "A" and "B" at different program points.

Soundness theorem for object-oriented languages that allow subtyping.

\forall{E}{dynamic_type(E) <= static_type(E)}
All operations that can be used on an object of type C can also be used on an object of type C' <= C. Subclasses only add attributes or methods (they never remove attributes/methods).

The dynamic type is obtained through the execution semantics of the program, e.g., if we have an expression x++, and the current state of the program determines x to be of type int, there is a dynamic typing rule that says that the type of the new expression is also int. Consider the example { T x = e0; x; } where e0:T0 <= T; here the dynamic type of the expression is T0 (not T!). Similarly, the dynamic type of if x then y:T1 else Y:T2 will be either T1 or T2 (not lub(T1,T2) or void).

In other words, soundness states that the static type system will not accept any incorrect program (that will not pass the dynamic type check of equal power). A dynamic type check for array-bounds is not of equal power (it has more checks enabled); a sound static type check for array-bounds would reject all incorrect programs (but it will also reject a few more that will actually pass the dynamic type check).

Soundness of static type system: all dynamically-type-incorrect programs will be rejected. Ensured by ensuring that static_type is a supertype of dynamic_type. A trivial static-type system that rejects all programs is sound (but not useful).

Completeness: all dynamically-type-correct programs will be accepted. Not possible to ensure in general.

Methods can be redefined but with same type! e.g., C++, Java override.

Example where static type system can be too restrictive

While a static type system may be sound, it may be too restrictive and may disallow certain programs that may be dynamically well-typed.
class Count {
  int i = 0; //default value = 0
	Count inc() { i <-- i + 1; return *this; }
};
Now consider a subclass Stock of Count:
class Stock : public Count {
  string name; //name of the item
};
And the following use of Stock:
Stock a;
Stock b = a.inc();
... b.name ...
This code does not type-check because the return value of inc is Count which is not a subtype of Stock.

This seems like a major limitation as now the derived classes (subtypes) will be unable to use the inc method.

a.inc() has dynamic type Stock.

So it is legitimate to write: Stock b <-- a.inc();

But this is not well-typed

Error Recovery