2. Basic Definitions (Chapter 2)

2.1. Data Types and Structured Data Types

Let me begin the course by giving definitions for the terms data type and structured data type.

A data type consists of

Example 1: boolean or logical data type provided by most programming languages.

Example 2: As a second example, consider the datatype fraction. How can we specify the domain and operations that define fractions? It seems straightforward to name the operations; fractions are numbers so all the normal arithmetic operations apply, such as addition, multiplication, comparison. In addition there might be some fraction-specific operations such as normalizing a fraction by removing common terms from its numerator and denominator - for example, if we normalized 6/9 we'd get 2/3.

But how do we specify the domain for fractions, i.e. the set of possible values for a fraction?

2.2. Structural and Behavioural Definitions

There are two different approaches to specifying a domain: we can give a structural definition, or we can give a behavioural definition. Let us see what these two are like.

Structural Definition of the domain for `Fraction'

The value of of a fraction is made of three parts (or components):

This is called a structural definition because it defines the values of type `fraction' by imposing an internal structure on them (they have 3 parts...). The parts themselves have specific types, and there may be further constraints. For example, we could have insisted that a fraction's numerator and denominator have no common divisor (in that case we wouldn't need the normalize operation - 6/9 would not be a fraction by this definition).

Behavioural Definition of the domain for `Fraction'

The alternative approach to defining the set of values for fractions does not impose any internal structure on them. Instead it just adds an operation that creates fractions out of other things, such as

CREATE_FRACTION(N,D)
where N is any integer, D is any non-zero integer.
The values of type fraction are defined to be the values that are produced by this function for any valid combination of inputs.

The parameter names were chosen to suggest its intended behaviour: CREATE_FRACTION(N,D) should return a value representing the fraction N/D (N for numerator, D for denominator).

You are probably thinking, this is crazy, CREATE_FRACTION could be any old random function, how do we guarantee that CREATE_FRACTION(N,D) actually returns the fraction N/D?

The answer is that we have to constrain the behaviour of this function, by relating it to the other operations on fractions. For example, one of the key properties of multiplication is that:

                NORMALIZE ((N/D) * (D/N)) = 1/1
This turns into a constraint on CREATE_FRACTION:
    NORMALIZE (CREATE_FRACTION(N,D) * CREATE_FRACTION(D,N)) = CREATE_FRACTION(1,1)
So you see CREATE_FRACTION cannot be any old function, its behaviour is highly constrained, because we can write down lots and lots of constraints like this.

And that's the reason I call this sort of definition behavioural, because the definition is strictly in terms of a set of operations and constraints or axioms relating the behaviour of the operations to one another.

In this style of definition, the domain of a data type - the set of permissible values - plays an almost negligible role. Any set of values will do, as long as we have an appropriate set of operations to go along with it.

Which Kind of Definition Should We Use?

In this course, we will mainly use behavioural definitions. Their main advantage is that they are more abstract, imposing absolutely no unnecessary or arbitrary constraints on the data type. This makes them ideal as abstract definitions, which are central to this course. Behavioural definitions are also the best ones to use if you ever need to reason abstractly about a data type: we won't stress this in this course, but in general it is an important reason for using abstract definitions.

We will occasionally use structural definitions. They are somewhat more intuitive, until you get used to the behavioural kind, and they are very common in computer science (e.g. many textbooks use them). Also, because they impose structure on the values they are quite useful in giving ideas about how to implement a particular data type in a computer program.