CMSC330

Typing

Type Systems
Type Checking
Subtyping
Type Inference

Typing

Type System: Rules that assign a type to a construct

Type: a categorization, classification or set (ultimately an ontological problem)

Examples for PL: int, float, bool, function

Construct: A meaningful piece of program syntax

Examples: 5, var1, fun x -> x + 1

Type Systems give meaning to constructs (ultimately bits)

Tells us two things:

  • What type is _
  • What can we do with type _

Type Systems can be sound, complete, or decidable (but never all 3)

  • Sound: All bad programs will be rejected
  • Complete: All good programs will be accepted
  • Decidable: the checker terminates with an answer for every program

Basically: how much can the type system actually enforce?

A program can be well-typed (accepted by the type system) but still be ill-defined


char buff[4];
buff[x];
          

If \(x \ge 4\) (or \(x < 0\)), the access is out of bounds: undefined behavior in C

C's type system only says \(x\) has to be an int

Partial type coverage is common

If an operation is partial, does the type system enforce this?

Type-safe languages enforce this; type-unsafe languages do not

Type safe: well-typed \(\Rightarrow\) well-defined
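
For comparison with the C snippet, here is a minimal OCaml sketch of the same out-of-bounds access. The names `buff` and `read` are illustrative and not from the slides. The standard library's `Bytes.get` checks the index at run time and raises `Invalid_argument` rather than reading arbitrary memory, so the bad access stays well-defined.

```ocaml
(* A 4-byte buffer, mirroring `char buff[4]` in the C example. *)
let buff = Bytes.make 4 'a'

(* `read` is a hypothetical helper: it turns the run-time bounds check
   performed by Bytes.get into an option value. *)
let read (x : int) : char option =
  try Some (Bytes.get buff x)
  with Invalid_argument _ -> None
```

The type of `read` still only says that `x` is an int. The difference is that the language defines what happens for every int.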

Recap

Type System: Rules about types and constructs

Also rules about what we can do with types

Well typed: program accepted by the type system

Type Checking

Type systems are rules

Applying the rules: type checking

Can do this at compile time: static type checking

Can do this at run time: dynamic type checking

This is done on the AST

We will create rules for how an OCaml program will type check

Suppose our language is small: only numbers


(* Grammar *)
E -> n
          

A type checker needs a rule for what type an expression has

\(e : t\)

Like operational semantics, but instead of \(e \Rightarrow v\), we say \(e : t\)

We also need an environment (which we call the context) mapping variables to types: \(G\)

\(\cfrac{}{G \vdash n:int}\)

Let's add a new data type


(* Grammar *)
E -> n|true|false
          

\(\cfrac{}{G \vdash true:bool}\)

\(\cfrac{}{G \vdash false:bool}\)

And now Variables


(* Grammar *)
E -> x|n|true|false|let x = E in E
          

\(\cfrac{G(x) = t}{G \vdash x:t}\)

\(\cfrac{G\vdash e_1 : t_1\qquad G,x:t_1\vdash e_2 : t_2}{G \vdash \text{let}\ x = e_1\ \text{in}\ e_2:t_2}\)

And now conditional


(* Grammar *)
E -> let x = E in E|if E then E else E|V
V -> x|n|true|false
          

\(\cfrac{G\vdash e_1 : bool\qquad G\vdash e_2 : t\qquad G\vdash e_3 : t}{G \vdash \text{if}\ e_1\ \text{then}\ e_2\ \text{else}\ e_3:t}\)

Notice: \(\texttt{if 3 then 4 else 5}\) is accepted by the grammar but not the type checker

And a simple function


(* Grammar *)
E -> let x = E in E|if E then E else E|eq0 E|V
V -> x|n|true|false
          

\(\cfrac{G\vdash e : int}{G \vdash \text{eq0}\ e\ :bool}\)
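
The rules so far can be read as a recursive function over the AST. The following is a minimal sketch, not the course's reference implementation: the constructor names (`Num`, `Bool`, `Var`, `Let`, `If`, `Eq0`), the exception `Type_error`, and the choice of an association list for the context \(G\) are all assumptions made for illustration.

```ocaml
type ty = TInt | TBool

type expr =
  | Num of int
  | Bool of bool
  | Var of string
  | Let of string * expr * expr
  | If of expr * expr * expr
  | Eq0 of expr

exception Type_error of string

(* The context G is modeled as an association list from names to types. *)
let rec type_of (g : (string * ty) list) (e : expr) : ty =
  match e with
  | Num _ -> TInt                        (* G |- n : int *)
  | Bool _ -> TBool                      (* G |- true : bool, G |- false : bool *)
  | Var x ->
    (match List.assoc_opt x g with      (* premise: G(x) = t *)
     | Some t -> t
     | None -> raise (Type_error ("unbound variable " ^ x)))
  | Let (x, e1, e2) ->
    let t1 = type_of g e1 in
    type_of ((x, t1) :: g) e2            (* premise: G, x:t1 |- e2 : t2 *)
  | If (e1, e2, e3) ->
    if type_of g e1 <> TBool then raise (Type_error "if: guard must be bool");
    let t2 = type_of g e2 in
    let t3 = type_of g e3 in
    if t2 = t3 then t2
    else raise (Type_error "if: branches must have the same type")
  | Eq0 e1 ->
    if type_of g e1 = TInt then TBool
    else raise (Type_error "eq0: argument must be int")
```

With these definitions, `type_of [] (If (Eq0 (Num 0), Num 4, Num 5))` evaluates to `TInt`, while the AST for `if 3 then 4 else 5` raises `Type_error`.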

Now add records (will need later)


(* Grammar *)
E -> let x = E in E|if E then E else E|eq0 E|V
V -> x|n|true|false|{x1=E1;...;xn=En}
          

\(\cfrac{G\vdash e_1:t_1 \qquad ... \qquad G\vdash e_n:t_n}{G \vdash \{l_1=e_1;...;l_n=e_n\}:\{l_1:t_1;...;l_n:t_n\}}\)

Subtyping

How to deal with polymorphism?

Recall that types are an ontological problem

Ultimately, categories will group entities together based on properties

Polymorphism is about entities that belong to multiple categories (in a particular way)

All squares are rectangles

Squares have square properties

They also have rectangle properties.

We can say that squares are a (sub)type of rectangle

Subtyping: Expressing polymorphic types

Liskov Substitution Principle: \(Subtype(S,T):\forall x \in T[P(x)] \Rightarrow \forall y \in S[P(y)]\)

i.e., I can use \(S\) where I expect \(T\) if \(S\) is a subtype of \(T\)

So now we can do something like \(3 + 1.4\) (in Java)

Generalize ints and floats to a more general type (e.g. number)

\(\cfrac{G\vdash e:S\qquad S <:T}{G\vdash e:T}\)

\(S\) is more specific than \(T\)

Subtyping is reflexive and transitive

\(x <: x\)

\(x <: y \land y <: z \Rightarrow x <: z\)

In our OCaml-like language: subtyping is used for records (real OCaml records are nominal; structural subtyping there applies to objects)

{x:int;y:int} <: {x:int}

Records where \(x:int \land y:int\) are a subtype of records where \(x:int\)

Generalized: Longer records are a subtype of shorter ones (with the same labels/types)

\(\{l_i:T_i^{i\in 1..n+k}\}<:\{l_i:T_i^{i\in 1..n}\}\)

A record is also a subtype when each field type is a subtype of the corresponding field type

{x:int} <: {x:number}

\(\cfrac{\forall i.S_i <:T_i}{\{l_i:S_i^{i\in 1..n}\}<:\{l_i:T_i^{i\in1..n}\}}\)

And the order of the labels does not matter

{x:int;y:float} <: {y:float;x:number}

Putting this all together

{x:int;z:bool;y:{a:float;b:string}}
<:
{y:{a:float};z:bool}
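
The three record rules (extra fields, field-wise subtyping, label order) can be combined into one check. The following is a minimal sketch under assumptions: the `ty` datatype, the constructor names, and the function `subtype` are illustrative, and `TNumber` stands in for the `number` supertype of int and float used in the examples above.

```ocaml
type ty =
  | TInt
  | TFloat
  | TBool
  | TString
  | TNumber                          (* assumed supertype of int and float *)
  | TRecord of (string * ty) list    (* label/type pairs, in any order *)

let rec subtype (s : ty) (t : ty) : bool =
  match s, t with
  | (TInt | TFloat), TNumber -> true
  | TRecord fs, TRecord ft ->
    (* Every field required by t must appear in s at a subtype.
       Extra fields in s and the order of labels are ignored. *)
    List.for_all
      (fun (l, tl) ->
         match List.assoc_opt l fs with
         | Some sl -> subtype sl tl
         | None -> false)
      ft
  | _ -> s = t                       (* reflexivity for the remaining cases *)
```

Looking each label of the supertype up in the subtype handles width and permutation together, and the recursive call handles nested records such as the `y` field in the example above.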

Functions can also be subtyped

\(\cfrac{S <: T\qquad P <: U}{T \rightarrow P <: S \rightarrow U}\)

In the subtype, the input is more general (contravariant)

In the subtype, the output is more specific (covariant)

\(rectangle \rightarrow square<: square \rightarrow rectangle\)

\(rectangle \rightarrow rectangle<: square \rightarrow shape\)
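
The function rule can be checked mechanically. Below is a minimal sketch under assumptions: the `ty` datatype and `subtype` function are illustrative names, and the hierarchy square <: rectangle <: shape is hard-coded to match the two examples.

```ocaml
type ty =
  | Shape
  | Rectangle
  | Square
  | Arrow of ty * ty                 (* Arrow (a, b) stands for a -> b *)

let rec subtype (s : ty) (t : ty) : bool =
  match s, t with
  | Square, (Rectangle | Shape) -> true
  | Rectangle, Shape -> true
  | Arrow (a1, r1), Arrow (a2, r2) ->
    (* Arguments are compared in the flipped direction (contravariant),
       results in the same direction (covariant). *)
    subtype a2 a1 && subtype r1 r2
  | _ -> s = t                       (* reflexivity *)
```

Under this sketch, `subtype (Arrow (Rectangle, Square)) (Arrow (Square, Rectangle))` holds, while swapping the two function types makes the check fail.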

Type Inference

How to determine types in a function? (or a free variable?)

Free variable: a variable that is not bound by an enclosing binder (such as a function parameter) in the given context

\(fun\ x \rightarrow x + y\)[\(y\) is free, \(x\) is not]

\(fun\ x \rightarrow fun\ y \rightarrow x + y\)[\(x,y\) not free]
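
The two examples above can be checked with a small function that collects free variables. This is a minimal sketch: the `expr` datatype and the name `free_vars` are assumptions for illustration, covering only variables, numbers, addition, and `fun`.

```ocaml
module S = Set.Make (String)

type expr =
  | Var of string
  | Num of int
  | Add of expr * expr
  | Fun of string * expr             (* fun x -> body *)

let rec free_vars (e : expr) : S.t =
  match e with
  | Var x -> S.singleton x
  | Num _ -> S.empty
  | Add (e1, e2) -> S.union (free_vars e1) (free_vars e2)
  | Fun (x, body) -> S.remove x (free_vars body)   (* x is bound inside body *)
```

For `Fun ("x", Add (Var "x", Var "y"))` the result contains only `y`; wrapping the body in another `Fun ("y", ...)` leaves the set empty.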

Context is important

In OCaml, how variables are used determines their type

x + 3 makes x an int

More accurate to say \(+:int\rightarrow int\rightarrow int\)

\((e_1:t_1 + e_2:t_2):t_3\)

We can write constraints on \(t_1,t_2,t_3\)

\(t_1=int,t_2=int,t_3=int\)

\(\cfrac{G\vdash e_1:t_1,C_1\qquad G\vdash e_2:t_2,C_2}{G\vdash e_1 + e_2: t_3,\{t_1=int,t_2=int,t_3=int\}\cup C_1 \cup C_2}\)

\(C\) is a set of constraints

should be empty if we don't need to infer

\(\cfrac{}{G\vdash n:int,\{\}}\)

\(\cfrac{}{G\vdash true:bool,\{\}}\)

Otherwise, \(C\) contains the constraints imposed by known (built-in) functions

\(\cfrac{G\vdash e_1:t_1,C_1\qquad G\vdash e_2:t_2,C_2}{G\vdash e_1 - e_2: t_3,\{t_1=int,t_2=int,t_3=int\}\cup C_1 \cup C_2}\)

We can do this for the built-in functions

\(\cfrac{G\vdash e_1:t_1,C_1\qquad G\vdash e_2:t_2,C_2}{G\vdash e_1 = e_2: t_3,\{t_1=t_2,t_3=bool\}\cup C_1 \cup C_2}\)

\[\cfrac{G\vdash e_1:t_1,C_1\qquad G\vdash e_2:t_2,C_2\qquad G\vdash e_3:t_3,C_3}{G\vdash \text{if}\ \ e_1\ \text{then}\ e_2\ \text{else}\ e_3: t_3,\{t_1=bool,t_2=t_3\}\cup C_1 \cup C_2 \cup C_3}\]

We can also type anonymous functions (reading \(make(t_1)\) as introducing a fresh type variable \(t_1\))

\(\cfrac{make(t_1)\qquad G,x:t_1\vdash e:t_2,C}{G\vdash \text{fun}\ x\ \rightarrow\ e : t_1 \rightarrow t_2,C}\)

and call them

\(\cfrac{make(t_3)\qquad G\vdash e_1:t_1,C_1\qquad G\vdash e_2:t_2,C_2}{G\vdash e_1\ e_2 : t_3,\{t_1=t_2\rightarrow t_3\}\cup C_1\cup C_2}\)
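
The inference rules can be read as a function that returns a type together with a constraint set. The following is a minimal sketch under several assumptions: the datatypes, the helper `fresh` (standing in for \(make\)), and the representation of \(C\) as a list of pairs are illustrative. One simplification is labeled in the comments: for `+` and `=` the result type is returned directly rather than through a separate \(t_3\) with a constraint \(t_3 = int\) or \(t_3 = bool\).

```ocaml
type ty =
  | TInt
  | TBool
  | TVar of int                      (* an unknown type, identified by a number *)
  | TArrow of ty * ty

type expr =
  | Num of int
  | Bool of bool
  | Var of string
  | Add of expr * expr
  | Eq of expr * expr
  | If of expr * expr * expr
  | Fun of string * expr
  | App of expr * expr

(* `fresh` plays the role of make(t) in the rules: a new type variable. *)
let counter = ref 0
let fresh () : ty =
  incr counter;
  TVar !counter

(* Returns the type of e and the constraint set C as a list of equations. *)
let rec infer (g : (string * ty) list) (e : expr) : ty * (ty * ty) list =
  match e with
  | Num _ -> (TInt, [])
  | Bool _ -> (TBool, [])
  | Var x -> (List.assoc x g, [])    (* raises Not_found if x is unbound *)
  | Add (e1, e2) ->
    let t1, c1 = infer g e1 in
    let t2, c2 = infer g e2 in
    (* simplification: return int directly instead of t3 with t3 = int *)
    (TInt, [ (t1, TInt); (t2, TInt) ] @ c1 @ c2)
  | Eq (e1, e2) ->
    let t1, c1 = infer g e1 in
    let t2, c2 = infer g e2 in
    (TBool, ((t1, t2) :: c1) @ c2)
  | If (e1, e2, e3) ->
    let t1, c1 = infer g e1 in
    let t2, c2 = infer g e2 in
    let t3, c3 = infer g e3 in
    (t3, [ (t1, TBool); (t2, t3) ] @ c1 @ c2 @ c3)
  | Fun (x, body) ->
    let t1 = fresh () in
    let t2, c = infer ((x, t1) :: g) body in
    (TArrow (t1, t2), c)
  | App (e1, e2) ->
    let t3 = fresh () in
    let t1, c1 = infer g e1 in
    let t2, c2 = infer g e2 in
    (t3, ((t1, TArrow (t2, t3)) :: c1) @ c2)
```

For `fun x -> x + 3` this yields an arrow type from a fresh variable to `TInt`, plus a constraint that the fresh variable equals `TInt`. Solving the constraints is a separate step.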

To solve, we need to unify constraints

\((x:t_1 = 4:t_2):t_3,\{t_1=t_2,t_2=int,t_3=bool\}\)

\(\Rightarrow t_1 = int\)

Unification should fail if we reach a contradiction: \(\{int = bool\}\)
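
Unification can be sketched as repeatedly eliminating one type variable at a time. The code below is a minimal illustration under assumptions: the datatype, the exception `Unify_error`, and the helpers `subst` and `occurs` are made-up names, and type variables are identified by strings such as `"t1"`.

```ocaml
type ty =
  | TInt
  | TBool
  | TVar of string
  | TArrow of ty * ty

exception Unify_error of ty * ty

(* Replace every occurrence of the variable a in u by the type t. *)
let rec subst (a : string) (t : ty) (u : ty) : ty =
  match u with
  | TVar b when a = b -> t
  | TArrow (u1, u2) -> TArrow (subst a t u1, subst a t u2)
  | _ -> u

(* Does the variable a appear inside u? *)
let rec occurs (a : string) (u : ty) : bool =
  match u with
  | TVar b -> a = b
  | TArrow (u1, u2) -> occurs a u1 || occurs a u2
  | _ -> false

(* Solve a list of equations; the result maps type variables to types. *)
let rec unify (cs : (ty * ty) list) : (string * ty) list =
  match cs with
  | [] -> []
  | (s, t) :: rest when s = t -> unify rest
  | (TVar a, t) :: rest | (t, TVar a) :: rest ->
    (* reject cyclic equations such as t1 = t1 -> int *)
    if occurs a t then raise (Unify_error (TVar a, t));
    let rest' = List.map (fun (l, r) -> (subst a t l, subst a t r)) rest in
    let sol = unify rest' in
    (* apply the solution of the remaining equations to t *)
    (a, List.fold_left (fun acc (b, tb) -> subst b tb acc) t sol) :: sol
  | (TArrow (s1, s2), TArrow (t1, t2)) :: rest ->
    unify ((s1, t1) :: (s2, t2) :: rest)
  | (s, t) :: _ -> raise (Unify_error (s, t))   (* contradiction, e.g. int = bool *)
```

On the constraint set from the example above, the solution maps `t1` and `t2` to `TInt` and `t3` to `TBool`; the set containing only `int = bool` raises `Unify_error`.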

Let polymorphism

\(\text{let}\ x\ =\ e_1\ \text{in}\ e_2\) is hard: \(x\) may need a polymorphic type that is used at different types within \(e_2\)
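
A small OCaml example of what let polymorphism permits. The names `pair` and `id` are illustrative. OCaml accepts this because `id` is bound by `let`; the same body with `id` bound as a function parameter, as in `(fun id -> (id 1, id true)) (fun x -> x)`, is rejected.

```ocaml
(* id is bound by let, so its type 'a -> 'a is generalized and can be
   used at int and at bool within the same body. *)
let pair =
  let id = fun x -> x in
  (id 1, id true)
```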