User blog:Emlightened/Ordinals in Type Theory

This post presents a simple type system that can represent powerful fast-growing recursive functions.

It is based on the simply typed lambda calculus, and contains expressions for countable ordinals. The system could be extended in a way that permits multiple cardinalities and collapsing functions.

The Simply Typed Lambda Calculus \(\lambda\to\)
This system is presented without contexts, and with \(\beta\) but not \(\eta\) equivalence.

The lambda calculus has two sorts of thing: terms (which are represented by lowercase letters) and types (which are represented by uppercase letters). There are also three statements (judgements) that can be formed. These are:


 * \(A\text{ Type}\) which expresses that \(A\) is a type.
 * \(a : A\) which expresses that the term \(a\) has (unique) type \(A\). This symbol is occasionally written \(\in\).
 * \(a = b\) which expresses that \(a\) and \(b\) are equal.

A rule is written in the form

\[\frac{a_1 \quad \cdots \quad a_n}{c}\].

This means that, if we know that the assumptions \(a_1, \cdots, a_n\) (which are all statements) are true, we can deduce that \(c\) (which is also a statement) is true. If a rule doesn't have any required assumptions, then we can just write \(c\).

(In case you were wondering, we don't have any rules which correspond to 'for all', 'and', 'or', or anything like that. This theory is a base theory, in the same way that first-order logic is a base theory. That's why we use this notation (called natural deduction) instead of using implication etc.)

How to make terms and types
Terms and types are made in a few specific, predetermined ways.

In the simply typed lambda calculus, there is only one way to make a type: if \(A\) and \(B\) are types, then \(A \to B\) is a type; the type of functions from \(A\) to \(B\). This is written as: \[\frac{A\text{ Type} \quad B\text{ Type}}{A \to B \text{ Type}}\]

There are multiple ways to make a term. However, terms also have types, and it's possible to put terms together into something that doesn't have a type, so we have to be careful when first forming terms. Nevertheless, terms here are always in one of the following forms:


 * \(x^A, y^A, z^A...\) variables of the type \(A\), for any \(A\).
 * \(c\) some constant, which is introduced later.
 * \(f(a)\) (also written \(fa\)) the application of the function \(f\) to \(a\).
 * \(\lambda x^A.a\) the lambda abstraction to form a function, which binds the variable \(x^A\). This takes a term \(a\) with free variable \(x^A\) and turns it into a function. When the function is used, the variable \(x\) is replaced with whatever the argument of the function is.

To save space (and parentheses), there are some conventions about how terms are understood. Function types associate to the right, so \(A \to (B \to C) \to D \to E\) is \((A \to ((B \to C) \to (D \to E)))\). Function application associates to the left, so \(fab(gc)d\) is \(((((fa)b)(gc))d)\), which is also \(f(a)(b)(g(c))(d)\). Lambda abstractions extend as far to the right as possible, so \(\lambda x^A.ax^A(\lambda y^B.bcy^B)x^A\) is \(\lambda x^A.(ax^A(\lambda y^B.(bcy^B))x^A)\). Finally, we sometimes write \(B^A\) in place of \(A \to B\); they have exactly the same meaning.

There is also an operation on terms called substitution, \(a[x^A := b]\), which replaces all occurrences of the variable \(x^A\) in \(a\) with the term \(b\). If \(a\) contains the subterm \(\lambda x^A.b\) then it may be unclear what to do: won't the substitution change the meaning of the function? Functions shouldn't depend on the name of the variable they use, though, so \(\lambda x^A.b = \lambda y^A.c\) where \(c = b[x^A:=y^A]\). If this renaming is done first, then there are no worries! This is called capture-avoiding substitution.
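
To see what capture avoidance involves operationally, here is a minimal sketch in Python (`Var`, `App`, `Lam` and `fresh` are illustrative names for a toy term representation, not part of the theory above):

```python
# Toy untyped terms and capture-avoiding substitution t[x := b].
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class App:
    f: object
    a: object

@dataclass
class Lam:
    x: str
    body: object

def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, App):
        return free_vars(t.f) | free_vars(t.a)
    return free_vars(t.body) - {t.x}

def fresh(avoid):
    # Pick a variable name not in `avoid`.
    i = 0
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"

def subst(t, x, b):
    # t[x := b], renaming bound variables so free variables of b are not captured.
    if isinstance(t, Var):
        return b if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.f, x, b), subst(t.a, x, b))
    if t.x == x:                 # x is rebound inside; nothing to substitute
        return t
    if t.x in free_vars(b):      # would capture: rename the bound variable first
        y = fresh(free_vars(b) | free_vars(t.body) | {x})
        return Lam(y, subst(subst(t.body, t.x, Var(y)), x, b))
    return Lam(t.x, subst(t.body, x, b))

# (λy.x)[x := y] must not capture the y:
renamed = subst(Lam("y", Var("x")), "x", Var("y"))
assert renamed.x != "y" and renamed.body == Var("y")
```

Substituting \(y\) for \(x\) in \(\lambda y.x\) renames the bound \(y\) first, so the result is (up to renaming) \(\lambda v_0.y\) rather than the incorrect \(\lambda y.y\).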

Rules for the lambda calculus
There are some basic rules about equality and variables:


 * \(x^A : A\)
 * \(a = a\)
 * \(\frac{a = b}{b = a}\)
 * \(\frac{a = b \quad b = c}{a = c}\)
 * \(\frac{a = b \quad c = d}{a[x^A:=c] = b[x^A:=d]}\)
 * \(\lambda x^A.s = \lambda y^A.s[x^A:=y^A]\)

Where \([x^A := y^A]\) is the substitution operator described above. There are also some rules for functions:

\[\frac{A\text{ Type} \quad B\text{ Type}}{A \to B \text{ Type}}\] is the basic rule for forming function types (given any two types \(A\), \(B\), you can form the type of functions from \(A\) to \(B\)).

\[\frac{b:B}{\lambda x^A.b : A \to B}\] (where \(x^A\) may be free in \(b\)) is the rule which says how to construct (introduce) functions (if \(b\) is in \(B\) (which may also but doesn't have to contain the free variable \(x^A\)), then we can form a function by abstracting that variable).

\[\frac{a : A \quad f : A \to B}{f(a) : B}\] is the rule which says how to apply (eliminate) functions (if \(f\) is a function from \(A\) to \(B\), and \(a\) is in \(A\), then \(f(a)\) is in \(B\)).

\[(\lambda x^A.b)(d) = b[x^A:=d]\] is the rule that says what happens when you evaluate a function (substitute the variable in the lambda abstraction for the argument).

These are the main rules for the lambda calculus.

With \(\times\)
The plain simply typed lambda calculus is - well - simple. It doesn't allow us to make a number of simple constructions, such as pairs. Many extensions in this vein can be made, but we opt to consider only product types, as these are both the most useful and the most familiar.

\(\times\) binds more tightly than \(\to\), so \(A \to B \times C \to D\) is \(A \to (B \times C) \to D\).


 * \(\frac{A \text{ Type} \quad B \text{ Type}}{A \times B \text{ Type}}\)
 * \(\text{pair} : A \to B \to A \times B\)
 * \(\pi_0 : A \times B \to A\)
 * \(\pi_1 : A \times B \to B\)
 * \(\pi_0(\text{pair}(a)(b)) = a\)
 * \(\pi_1(\text{pair}(a)(b)) = b\)

\(\text{pair}(a)(b)\) could also be written as \(\langle a, b \rangle\).

This type \(A \times B\), put simply, is the type of pairs which consist of a member of \(A\) and a member of \(B\). \(\pi_0\) and \(\pi_1\) retrieve the first and second members of the pair respectively.

If you're unfamiliar with all this, it might be a good idea to prove to yourself why \(A \times B\) is the type of pairs of \(A\) and \(B\).

Now, convince yourself that the types \((A \times B) \to C\) and \(A \to (B \to C)\) are essentially equivalent. If you want, you can construct functions from one to the other as a small exercise.
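
For the exercise, the two directions of the equivalence are the familiar currying and uncurrying maps. A quick sketch in Python, with pairs modelled as tuples (`curry`/`uncurry` are the standard names, used here for illustration):

```python
# Witnesses of the equivalence between (A × B) → C and A → (B → C).

def curry(f):
    """Turn f : (A × B) → C into A → (B → C)."""
    return lambda a: lambda b: f((a, b))

def uncurry(g):
    """Turn g : A → (B → C) back into (A × B) → C."""
    return lambda p: g(p[0])(p[1])

add_pair = lambda p: p[0] + p[1]
assert curry(add_pair)(2)(3) == 5
assert uncurry(curry(add_pair))((2, 3)) == 5
```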

With \(\texttt{Nat}\)
Now, we still can't do much with this system; there aren't even any types we can write down! (\(A\), \(B\) etc. are just metavariables representing arbitrary types.) Even if we did have a base type, we couldn't do much arithmetic with it, as we could only really define addition and multiplication. So instead, we choose to add a new type of natural numbers.

\(\texttt{Nat}\text{ Type}\)

\(0 : \texttt{Nat}\)

\(S : \texttt{Nat} \to \texttt{Nat}\)

\(rec_A^N : A \to (A \to \texttt{Nat} \to A) \to \texttt{Nat} \to A\)

Let \(R := rec_A a f\)

\(R 0 = a\)

\(R (Sn) = f (R n) n\)

We drop the superscript in \(rec_A^N\) wherever unambiguous.
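
The recursor and its two computation rules can be modelled directly in Python; here is a small sketch (`ZERO`, `S`, `to_int` are illustrative names; numbers are towers of \(S\)), defining addition using nothing but `rec`:

```python
# Nat as an inductive type: 0 and S, plus the recursor rec_A^N with
# R 0 = a  and  R (S n) = f (R n) n, matching the rules above.

ZERO = ("0",)
def S(n): return ("S", n)

def rec(a, f):
    def R(n):
        if n == ZERO:
            return a
        _, m = n
        return f(R(m))(m)
    return R

def to_int(n):
    return 0 if n == ZERO else 1 + to_int(n[1])

# Addition defined purely with the recursor: add m n = rec m (λr.λk. S r) n.
def add(m, n):
    return rec(m, lambda r: lambda k: S(r))(n)

two   = S(S(ZERO))
three = S(S(S(ZERO)))
assert to_int(add(two, three)) == 5
```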

With \(\texttt{Ord}\)
Now, this post is (naturally) geared towards people who want to use this to make large numbers, and the simplest way to enable this is to add a type of countable ordinals.

\(\texttt{Ord}\text{ Type}\)

\(0^O : \texttt{Ord}\)

\(\sup : \texttt{Ord}^\texttt{Nat} \to \texttt{Ord}\)

\(rec_A^O : A \to (\texttt{Ord}^\texttt{Nat} \to A^\texttt{Nat} \to A) \to \texttt{Ord} \to A\)

Let \(R = rec_A a f\)

\(R 0 = a\)

\(R (\sup s) = f(\lambda n:\texttt{Nat}.sn)(\lambda n:\texttt{Nat}.R (sn))\)

We typically write \(0^O\) just as \(0\), as it will rarely cause confusion, and drop the scripts in \(rec_A^O\) wherever unambiguous.

Now, although it was clear for the natural numbers how to make new naturals, it is considerably less clear for the countable ordinals. Clearly, \(0\) is the smallest ordinal, but how do we make \(1\), given that we don't have a successor function?

The answer can be found in a description of what \(\sup\) actually does. \(\sup\) essentially takes a fundamental sequence for an ordinal, and turns that fundamental sequence into an ordinal. So, what about successor ordinals? Well, the only element of the fundamental sequence of \(\alpha + 1\) is \(\alpha\), so it makes sense to try using the constant function to make \(\alpha + 1\). And indeed this works! We can define the successor function as follows:

\(\text{suc}\,\alpha = \sup(\lambda n:\texttt{Nat}.\alpha)\)

It may be unclear how to use the recursor for ordinals - how to actually make functions using ordinals. In the next section, we make simple arithmetic operations, define a version of the Hardy hierarchy, and provide a function for fixpoints of functions \(f:\texttt{Ord} \to \texttt{Ord}\).

(A word of warning - different choices of fundamental sequences actually do lead to different 'ordinals'. \(\omega[n] = n\) and \(\omega'[n] = 2n\) are actually different ordinals, internally.)
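
Putting this section together, here is a minimal Python model of \(\texttt{Ord}\): ordinals as tagged tuples, fundamental sequences as Python functions, and the recursor as above (all names are illustrative, an assumption of this sketch rather than part of the theory):

```python
# Ord as in the post: 0^O, and sup packaging a fundamental sequence Nat → Ord.
# The recursor satisfies R 0 = a and R (sup s) = f s (λn. R (s n)).

ZERO = ("0",)
def Sup(s): return ("sup", s)        # s is a Python function int -> Ord

def rec(a, f):
    def R(alpha):
        if alpha == ZERO:
            return a
        _, s = alpha
        return f(s)(lambda n: R(s(n)))
    return R

def suc(alpha):
    """suc α = sup (λn. α): the constant fundamental sequence."""
    return Sup(lambda n: alpha)

def fin(k):
    """The finite ordinal k, built with suc."""
    return ZERO if k == 0 else suc(fin(k - 1))

# A toy use of the recursor: walk the 0th branch of each sup and count steps.
count = rec(0, lambda s: lambda r: r(0) + 1)
assert count(fin(3)) == 3

# Different fundamental sequences give internally different copies of ω:
omega  = Sup(fin)                    # ω[n] = n
omega2 = Sup(lambda n: fin(2 * n))   # ω'[n] = 2n
```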

Making ordinal functions
In the previous section, we defined how ordinals work in our theory. Here, we'll define how to make functions from ordinals.

One of the simplest, perhaps surprisingly, is a modified Hardy Hierarchy. The rules for this hierarchy are:


 * \(H_0(n) = n\)
 * \(H_{\alpha+1}(n) = H_{\alpha}(n+1)\)
 * \(H_{\alpha}(n) = H_{\alpha[n]}(n+1)\) for limit \(\alpha\)

In the theory, we can define:

\[H_\alpha = rec_{\texttt{Nat} \to \texttt{Nat}} (\lambda n:\texttt{Nat}.n) (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda f:(\texttt{Nat} \to \texttt{Nat})^\texttt{Nat}. \lambda n:\texttt{Nat}.fn(n+1) ) \alpha \]

We can even check that it works:

\(H_0(m) = (rec_{\texttt{Nat} \to \texttt{Nat}} (\lambda n:\texttt{Nat}.n) (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda f:(\texttt{Nat} \to \texttt{Nat})^\texttt{Nat}. \lambda n:\texttt{Nat}.fn(n+1) )0)m = (\lambda n:\texttt{Nat}.n)m = m\)

\(H_{\sup s}(m) = (rec_{\texttt{Nat} \to \texttt{Nat}} (\lambda n:\texttt{Nat}.n) (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda f:(\texttt{Nat} \to \texttt{Nat})^\texttt{Nat}. \lambda n:\texttt{Nat}.fn(n+1))(\sup s))m = (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda f:(\texttt{Nat} \to \texttt{Nat})^\texttt{Nat}. \lambda n:\texttt{Nat}.fn(n+1))(\lambda n:\texttt{Nat}.sn)(\lambda n:\texttt{Nat}.H_{sn})m = (\lambda f:(\texttt{Nat} \to \texttt{Nat})^\texttt{Nat}. \lambda n:\texttt{Nat}.fn(n+1))(\lambda n:\texttt{Nat}.H_{sn})m = (\lambda n:\texttt{Nat}.(\lambda k:\texttt{Nat}.H_{sk})n(n+1))m = (\lambda k:\texttt{Nat}.H_{sk})m(m+1) = H_{sm}(m+1) = H_{s[m]}(m+1)\)

As required.
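
The same calculation can also be run. Below is a self-contained Python sketch of \(H\), using a toy ordinal encoding (ordinals as tagged tuples, fundamental sequences as Python functions; `ZERO`, `Sup`, `rec`, `fin` are illustrative names):

```python
# A runnable sketch of H_α = rec_{Nat→Nat} (λn.n) (λs.λf.λn. f n (n+1)) α.

ZERO = ("0",)
def Sup(s): return ("sup", s)        # s : int -> Ord, the fundamental sequence

def rec(a, f):
    # R 0 = a;  R (sup s) = f s (λn. R (s n))
    def R(alpha):
        if alpha == ZERO:
            return a
        _, s = alpha
        return f(s)(lambda n: R(s(n)))
    return R

def suc(alpha): return Sup(lambda n: alpha)            # suc α = sup (λn. α)
def fin(k): return ZERO if k == 0 else suc(fin(k - 1))

def H(alpha):
    return rec(lambda n: n,
               lambda s: lambda f: lambda n: f(n)(n + 1))(alpha)

omega = Sup(fin)                     # ω with ω[n] = n
assert H(ZERO)(5) == 5               # H_0(n) = n
assert H(fin(2))(5) == 7             # H_2(5) = H_1(6) = H_0(7) = 7
assert H(omega)(3) == 7              # H_ω(3) = H_3(4) = 7
```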

The next operation we would typically want to make would be ordinal addition. This is defined as so:

\(\alpha + \beta = rec_\texttt{Ord} \alpha (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda f:\texttt{Ord}^\texttt{Nat}. \sup f) \beta\)

Multiplication and exponentiation are similarly defined:

\(\alpha \cdot \beta = rec_\texttt{Ord} 0 (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda f:\texttt{Ord}^\texttt{Nat}. \sup \lambda n:\texttt{Nat}.fn + \alpha) \beta\)

\(\alpha^\beta = rec_\texttt{Ord} 1 (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda f:\texttt{Ord}^\texttt{Nat}. \sup \lambda n:\texttt{Nat}.fn \cdot \alpha) \beta\)
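
These three definitions translate directly into a self-contained Python sketch (same toy encoding: tagged tuples and Python fundamental sequences, illustrative names). Since the internal representations pick up extra successors from constant \(\sup\)s, we check the results by measuring with \(H\) from the previous section:

```python
# Ordinal addition, multiplication and exponentiation from the definitions above.

ZERO = ("0",)
def Sup(s): return ("sup", s)

def rec(a, f):
    def R(alpha):
        if alpha == ZERO:
            return a
        _, s = alpha
        return f(s)(lambda n: R(s(n)))
    return R

def suc(alpha): return Sup(lambda n: alpha)
def fin(k): return ZERO if k == 0 else suc(fin(k - 1))

def H(alpha):
    return rec(lambda n: n, lambda s: lambda f: lambda n: f(n)(n + 1))(alpha)

def add(a, b):
    # α + β = rec α (λs.λf. sup f) β
    return rec(a, lambda s: lambda f: Sup(f))(b)

def mul(a, b):
    # α · β = rec 0 (λs.λf. sup λn. f n + α) β
    return rec(ZERO, lambda s: lambda f: Sup(lambda n: add(f(n), a)))(b)

def opow(a, b):
    # α ^ β = rec 1 (λs.λf. sup λn. f n · α) β
    return rec(fin(1), lambda s: lambda f: Sup(lambda n: mul(f(n), a)))(b)

omega = Sup(fin)
assert H(add(fin(2), fin(3)))(1) == 6    # 2 + 3 acts like 5: H_5(1) = 6
assert H(add(omega, fin(1)))(3) == 9     # ω + 1 is a successor above ω
# mul/opow pick up extra internal successors from constant sups, echoing
# the earlier warning that representations of 'the same' ordinal differ.
```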

The last two operations we want to define are the iteration diagonaliser:

\(\text{diag} : \texttt{Ord} \to (\texttt{Ord} \to \texttt{Ord}) \to \texttt{Ord}\)

\(\text{diag} \alpha f = \sup \lambda n: \texttt{Nat}. rec_\texttt{Ord}^N \alpha (\lambda \beta:\texttt{Ord}. \lambda k:\texttt{Nat}.f\beta) n\)

\(\text{diag} \alpha f\) is the supremum of the sequence \(\alpha, f(\alpha), f(f(\alpha)), \cdots\).

And the normal diagonaliser:

\(\text{norm} : (\texttt{Ord} \to \texttt{Ord}) \to (\texttt{Ord} \to \texttt{Ord})\)

\(\text{norm} f \alpha = rec_\texttt{Ord} 0 (\lambda s:\texttt{Ord}^\texttt{Nat}. \lambda g:\texttt{Ord}^\texttt{Nat}.\sup g) (1+\alpha)\)

\(\text{norm} f\) is the function that enumerates the limit points of \(f\):

\(\text{norm} f 0 = 0\) and \(\text{norm} f (\sup s) = \sup \lambda n:\texttt{Nat}.\text{norm} f (sn)\)

Using these operations, it's not difficult to define some ordinals, and hence large numbers:

\(\omega = \text{diag} 0 \text{suc}\)

\(\varepsilon_0 = \text{diag} 1 (\lambda \beta:\texttt{Ord}.\omega^\beta)\)

\(\text{nexteps} \alpha = \text{diag} (\alpha + 1) (\lambda \beta:\texttt{Ord}.\omega^\beta)\)

\(\varepsilon = \text{norm }\text{nexteps}\)

\(\zeta_0 = \text{diag} 0 \varepsilon\)

\(H_{\zeta_0}(6)\) is a somewhat large number made in this system.

Make larger numbers in the comments section.