2. Our Specification Of Collection - Details

As mentioned earlier, we view a list as a special kind of collection, and we would like to be able to apply to lists all the operations on general-purpose collections. So, before we specify lists themselves, we will first specify the primitive operations on collections that we wish to support.

The operations are very simple and for most I only have a few minor comments to add to the description in the handout. Note that all preconditions are checked by the user. Also look carefully for the C-primed (C') notation: if there is an input, C, that is changed by an operation, this will be indicated by having an output named C'.

CREATE

input: C (a collection)
output: C'
preconditions: C is undefined
postconditions: C' is the empty collection. Storage is allocated as needed.

The allocation of storage depends on low-level implementation details; it turns out that sometimes (often, in fact) dynamically-allocated storage is needed even for an empty collection.

DESTROY

input: C (a collection)
output: C'
preconditions: C is defined.
postconditions: C' is undefined. All dynamically allocated memory associated with C is returned to the global pool.

This is the inverse of CREATE.

SIZE

input: C (a collection)
output: S, a non-negative integer.
preconditions: C is defined.
postconditions: S = the number of elements in C.

Extremely frequently used. As mentioned above, we have included this as a primitive operation in order to ensure that it is computed in constant time.

IS_EMPTY

input: C (a collection)
preconditions: C is defined.
postconditions: true if C is empty, false otherwise.

Of course, it is possible to write this function in terms of the SIZE function: IS_EMPTY(C) <==> (SIZE(C) = 0). However, there are often ways of computing IS_EMPTY that are more efficient than computing SIZE, and we therefore include this as a separate operation.

INSERT

input: C (a collection), V (a value).
output: C'.
preconditions: V is of a type suitable for collection elements.
postconditions: C' = C with value V added. Storage is allocated as needed. If V already occurs in C, C' contains an additional copy of V.

A collection in which you are permitted to have multiple copies of a value is sometimes called a bag, or a multiset. This is the kind of collection we are going to implement, primarily because we are thinking ahead to lists: we certainly want to allow multiple occurrences of an element in our lists. In a list different occurrences of a value can be distinguished by their position - e.g. we talk of the first occurrence of a value - in a collection there is no way to distinguish between the different occurrences.

DELETE

input: C (a collection), V (a value).
output: C'.
preconditions: V is of a type suitable for collection elements.
postconditions: C' = C with value V removed.
If V does not occur in C, the operation does nothing. If there are several copies of V in C, just one is deleted. If possible, the storage associated with the deleted occurrence of V is returned to the global pool.

JOIN

input: C1 and C2 (collections).
output: C1' and C2'.
preconditions: C1 and C2 are both defined, and are different collections.
postconditions: C1' is the union of the two collections (i.e. it contains all the elements in C1 and C2). C2' is undefined.

JOIN does just what the name suggests. If C1 = { A, B, C } and C2 = { D, E }, then JOIN(C1,C2) changes C1 to { A, B, C, D, E }. What about C2? The options were discussed above, in the context of lists. As the postconditions state, C2 is undefined after the JOIN operation.

MAP

input: C (a collection), P (a procedure), D extra data parameter for P.
output: determined by P.
preconditions: C is defined. P takes two parameters: the first is of type pointer to `collection element', and the second is D.
postconditions: P has been called with every element of C.

This is an extremely powerful operation, even though it is very simple. It goes through the collection one element at a time, and calls the procedure with each of the elements.

Suppose, for example, that C = { 9, 3, 1 } and P(X,D) = printf("%d\n",X) then MAP(C,P,D) would call P three times: P(9,D), P(3,D), and P(1,D). This would result in the values in the collection being written out onto the terminal, one value on each line. We cannot be sure what order the values will get written out: the postconditions do not say in what order the elements are `processed'.

The purpose of the extra data parameter D will be illustrated in the following. It is always a pointer. By convention it is assumed to be of type char*, and it is the responsibility of procedure P to cast it to the appropriate pointer type.

2.1. Programming notes

C allows procedures to be passed as parameters. Assuming collection elements are of type int, the above example could be written:

        typedef int collection_element;

        void print_element (collection_element *x,char *ignored)
        { printf("%d\n",*x); }

        void print_collection (collection c)
        { MAP(c,print_element,NULL); }

Notice that the call to MAP supplies a NULL pointer for the data parameter and that print_element ignores this argument. Here is a way of using MAP to compute the size of a collection: for each element, we add 1 to a counter. We need to pass a procedure to MAP that adds 1 to a counter. The extra data parameter will be a pointer to this counter.

        void increment_counter (collection_element *x,char *n)
        { * ((int *) n) += 1; }

        int size (collection c)
        { int n = 0; MAP(c,increment_counter,&n); return n; }

There is no magic here. MAP calls the function increment_counter once with each element in the given collection. increment_counter increments, i.e. adds one to, the counter pointed to by its second argument. Notice that this pointer is received as a char* and must be cast to int*.

Notice in the specification of MAP that procedure P's first parameter is a pointer to an element. This means that the procedure can modify the element itself. Therefore MAP can be used to change the elements in a collection.

For example, suppose we wanted to double every element in a collection. This is easily done, by mapping across the collection a procedure that doubles the value it is given:

        void double (collection_element *x,char *ignored)
        { *x += *x; }

        void double_everything (collection c)
        { MAP(c,double,NULL); }

As a final example, how could we write a function that takes a value, V, and a collection C, and counts the number of times V occurs in C? We need to compare V with each element of C, and add 1 to a counter if V is equal to the element. We'll write a procedure to do this, and pass this procedure to MAP - the fact that we will process the elements of C one by one is what tells us we can use MAP. The data parameter to this procedure must contain both the value V and the counter. Therefore we will implement it by a pointer to a structure with two fields: a collection element and an integer;

        typedef struct {
          collection_element v;
          int n;
        } occurrence_data;

        void increment_if_equal (collection_element *x,char *data)
        {
          occurrence_data * odata = (occurrence_data *) data;
          if (*x == odata->v) odata->n++;
        }

        int number_of_occurrences (collection c,collection_element v)
        {
          occurrence_data data; data.v = v; data.n = 0;
          MAP(c,increment_if_equal,(char *) &data);
          return data->n;
        }

2.2. Our Initial Implementation Of Collection

Before moving on to the specification of lists, let me briefly explain a few details of our initial implementation of Collection, which you will be completing and using in the next few assignments.

The main things that need to be explained are the data type definitions:

        typedef int collection_element;

        typedef struct collection_node {
          collection_element value;
          struct collection_node * next;
        } collection_node;

        typedef struct {
          int size;
          collection_node *first,*last;
        } collection_header;

        typedef collection_header *collection;

For this assignment, each element in a collection will be an integer. Each element will be stored in a node. The memory needed for a node will be dynamically allocated... you are given procedures called get_node_memory and return_node_memory for allocating and releasing this memory.

The nodes will be linked together by pointers to form a chain. The link/pointer is stored inside the node in the field next. This, of course, imposes an order on the nodes, but this order is completely arbitrary - it can be whatever you like (whatever is easiest to implement and most efficient to execute). You get this freedom because by definition, the elements in a collection are unordered: The ordering on nodes is just an implementation trick which the user has no say about (as far as he is concerned there is no order).

A collection is just a pointer to a record called a header. Headers are a very useful idea - we'll discuss them further below. For now, all you need to know is that a collection is a pointer to a header, which is a record containing three pieces of information:

size, the total number of elements currently in the collection.
first, a pointer to the first node in the chain of nodes.
last, a pointer to the last node in the chain of nodes.

In an empty collection, size will be 0, and first and last will be NULL (i.e. the distinguished pointer to nothing). Note that every operation that changes the size of a collection must update the size field in this record; otherwise the value stored there will be wrong. The memory for headers is allocated dynamically - you are given procedures called get_header_memory and return_header_memory for allocating and releasing this memory.

To give a concrete example: the collection C={ 9, 1, 5 } would be represented:

Explain the following regarding Assignment #4:

Explain type definitions.
Explain header_count and node_count memory business.