1. General Considerations For Designing An Implementation

Time efficiency

We would like all our operations to be constant time (except for MAP, DELETE, and DESTROY, which cannot possibly be constant time).

Space efficiency

no maximum limit on list size: dynamic allocation of memory for elements (nodes).
pointers consume memory, they are usually 32 bits, which is 4 times the size of a character and, in many systems, twice the size of an integer. This can be a significant overhead, so we'd like to minimize the number of pointers we have per element, while still having enough to permit all our operations to be constant time. It turns out that we can get away with one pointer per element. This is possible because of our choice of primitives: for example, if we had a command for moving backwards, we'd need two pointers per element.

Should we use a Header for our lists/collections?

In the simplest possible dynamically-allocated implementation of List, or Collection, a variable of type list is simply a pointer to one node in the list (usually the first, but not always). This node then points to one or more other nodes etc.

The alternative is to have a header for your list. A header contains a pointer, as before, but it also contains other information - sometimes some additional pointers, and almost always some information about the list contents. In the implementation of Collection you're working with, there is a header. It contains a pointer to the first node, and, in addition, a pointer to the last node, and the number of elements in the collection. Why bother with headers? What use is this extra information, which we could compute when we needed it from the one essential piece of information, the pointer to the first node?

The answer is efficiency: we can look up the information in a header very quickly, in constant time, but to compute that information, given only the first node pointer, would require linear (non-constant) time. The size information obviously lets us compute size in constant time.

Why do we need the last node pointer?

Answer: To implement JOIN in constant time.

Whenever you want to compute some feature of a data structure in constant time, if it is not perfectly easy to compute directly (such as whether or not a data structure is empty) you should consider storing its value in a header. The catch is this: all the information stored in a header has to be kept up-to-date: every operation that affects the information must update the value in the header. How much of an overhead is this?

It depends on the operation. For example, for collections, the operations that affect size are: CREATE (sets to 0), INSERT (adds 1), DELETE (subtracts 1 if the element was in the list), and JOIN - that's on the assignment. Obviously all of these updates are very simple and efficient.

With lists we have one extra operation that affects size: SPLIT. Now actually splitting the list in two is no more difficult than the JOIN operation: we just need to change a few pointers. However, if we are storing the size information in a header, then the SPLIT operation has to update this information so that it correctly indicates the length of both lists after the SPLIT has been done. Can this be done in constant time? Not easily. We cannot just zip down one of the lists and count its length - that isn't a constant time operation. We could update the sizes in constant time if we knew the position in the list where the SPLIT was taking place. It would be easy to store position information with each element, but this information also changes when we SPLIT the list - the positions of all the elements in L2 change. How can we update that information without zipping down L2, one element at a time?

I don't know. So it looks like we have to give up on computing size in constant time. But there is a trick we'll see next lecture that makes this computation almost constant time. So, we will use a header for our lists: the pointer to the last node makes JOIN and SPLIT truly constant time, and we'll also store in the header the information needed for the trick that makes SIZE almost constant-time.

Semantics Of Assignments

The final consideration is a very important one, although it sounds trivial, and a very general one - like the preceding considerations it arises whenever you implement any abstract type. The issue is this:

What is the effect of the assignment statement, L2 = L1, on lists, and on windows (W1 = W2)?

Now there is, of course, the obvious sense in which the effect of the assignment operation is well-defined. It always does the same thing. X = Y makes a copy of the value in Y and stores it in X. It must be emphasized that X gets a copy of the value in Y; if we subsequently change the value in X, the value in Y does not change. That is all there is to the assignment statement.

X = Y works the same whether Y is a pointer or a normal value: X gets a copy of Y. However, if Y is a pointer, X only gets a copy of the pointer, it does not get a copy of the thing that Y points to.

To take a concrete example, let us see what would happen if we defined the type Collection to be a structure instead of a pointer to a structure:

        typedef struct {
          int size;
          collection_node *first, *last;
        } collection;

Suppose we have two collections, C1 and C2, and C2 = { 9, 1, 5 }

If we say, C1 = C2, whatever is stored in C2 gets copied into C1, like this:

And this is bad news! Can anyone say why this is a total disaster?

Answer: Because they are sharing some, but not all, of their information. If they didn't share any information, there'd be no problem - changing one would have no effect on the other. Likewise, if they shared all of their information, they would be unified, any change to one would change them both - we'll see this in a minute.

But the current arrangement, with part of the information shared (the nodes) and part of it copied (the header), is trouble. If you change one of them, the shared part will be changed in both, but the copied part will only be changed in ONE of them. Which means that the other will have out-of-date information in its header. For example, if you did DELETE(C2,5) you'd get:

C2 is perfect. Its last node was deleted and the memory returned to the global pool. C2 now has two elements (the change in size has been recorded in the header). But C1 is all screwed up. Its size is wrong, and its last pointer is dangling.

It is our responsibility as designers to make the assignment statement as safe as possible (or to warn the user that its effects are unpredictable).

There are two ways to make X = Y a safe operation:

Option 1: If the memory location Y contains all the information about Y, then X = Y will make a complete copy of Y; they will share nothing, and changes to one will have absolutely no effect on the other.
Option 2: If the memory location Y contains no information about Y, but just a pointer to the information about Y, then X = Y makes X and Y point to the same place... in other words they share all their information, and any change to one of them automatically changes the other.

To illustrate Option 2, let us now redo the preceding example using the type definition for Collection you have in your assignment, in which collection is a pointer to the header...

Then the result of C1 = C2 is

And if we now DELETE(C2,5), C1 and C2 are both changed appropriately:

Every operation will work perfectly with this arrangement: that is why we defined Collection as a pointer to a header.

With this analysis we have identified two safe ways of handling the assignment statement. Which shall we use for lists? Which for windows?

With lists we really have no choice. If we are going to have elements stored in dynamically allocated memory, we cannot store all the information about list L inside the variable L. Some of it will be outside L. The only remaining option, then, is to have all the information about list L outside of variable L: variable L will just point to the header, which will contain some information and point to the rest (the nodes). This is what we have just sketched.

One consequence of choosing this way to implement lists is that every list that is defined requires dynamically allocated memory for its header. Even the empty list. If L is list variable, we cannot use L = NIL to indicate that L is empty. Why not? Suppose L1 has one element, and we do L2 = L1, and then delete the element in L1. L1 has just become empty, so, if NIL pointers are going to represent empty lists, L1 should be a NIL and the memory for its header must be returned to the global pool. This makes L1 empty, but L2 is now invalid: it is not NIL, it is pointing at the header which is now gone!

Unlike lists, for Windows we could use either option. A window does not absolutely demand dynamically-allocated storage, we could store all the information about a window inside the window's own memory.

Is this what we want? Suppose we say W1 = W2 and then MOVE_FORWARD(W1) - the question is, should this operation affect W2, or not? If we want it to affect W2, then assignment should UNIFY W1 and W2 - which means windows must be pointers to their information (option 2). If we want MOVE_FORWARD(W1) to have no effect on W2 then assignment should copy W2's information into W1, which means the windows must contain all their own information (option 1).

For windows, the copying option is extremely useful. If W2 is pointing at an element you want to keep track of, you can set W1 = W2, and then move W2 elsewhere in search of some other element. This is how we use array indices, and it will be nice to have list-windows act the same.

These then are the four major considerations in designing our implementation:

Speed
Memory Efficiency
Headers or No Headers
The assignment statement

After considering the alternatives, we are ready to write down the the C type definitions for our implementation of lists. They are essentially the same as for collection with the addition of windows. As mentioned at the beginning of this discussion, we are now completely finished. With the specification and the type definitions decided, there are no more decision to make about the implementation: it's time to write the code.