An Introduction to C++ Programming - Part 10

The data representation problem

In the file array as implemented last month, data was always stored in a raw binary format, exactly mirroring the bits as they lay in memory. This works fine for integers and such, but can be disastrous in other situations. Imagine a file array of strings (where string is a ``char*''). With the implementation from last month, the pointer value would be stored, not the data pointed to. When reading, a pointer value is read, and when dereferenced, whatever happens to be at the memory location pointed to (if anything) will be used (which is more than likely to result in a rather quick crash.) Anything with pointers is dangerous when stored in a raw binary format, yet we must somehow allow pointers in the array, and preferably so without causing problems for those using the array with built-in arithmetic types. How can this be done?
In part 4, when templates were introduced, a clever little construct called ``traits classes'' was shown. I then gave this rather terse description: ``A traits class is never instantiated, and doesn't contain any data. It just tells things about other classes, that is its sole purpose.'' Doesn't that smell like something we can use here? A traits class that tells how the data types should be represented on disk?
What do we need from such a traits class? Obviously, we need to know how much disk space each element will take, so a ``size'' member will definitely be necessary, otherwise we cannot know how much disk space will be required. We also need to know how to store the data, and how to read it. The easiest way is probably to have member functions ``writeTo'' and ``readFrom'' in the traits class. Thus we can have something looking like this:

  template <class T> class FileArrayElementAccess
  {
  public:
    static const size_t size;
    static void writeTo(T value, ostream& os);
    static T readFrom(istream& is);
  };
The array is then rewritten to use this when dealing with the data. The change is extremely minor. ``storeElement'' needs to be rewritten as:

  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& element)
  {
    // what if index >= array_size?
    typedef FileArrayElementAccess<T> traits;
    (*pstream).seekp(traits::size*index
                      +sizeof(array_size), ios::beg);
    // what if seek fails?
    traits::writeTo(element,*pstream);
    // what if write failed?
    // what if too much data was written?
  }
The change for ``readElement'' is of course analogous. However, as indicated by the last comment, a new error possibility has shown up. What if the ``writeTo'' and ``readFrom'' members of the traits class are buggy and write or read more data to disk than they're allowed to? Since it's the user of the array that must write the traits class (at least for their own data types) we cannot solve the problem, but we can give the user a chance to discover that something went wrong. Unfortunately for writing, the error is extremely severe; it means that the next entry in the array will have its data destroyed... In the traits class, by the way, the constant ``size'', used for telling how many bytes in the stream each ``T'' will occupy, poses a problem with most C++ compilers today (modern ones make life so much easier.) The problem is that a static variable, and also a static constant, in a class, needs to reside somewhere in memory, and the class declaration is not enough for that. This problem is two-fold. To begin with, where should it be stored? That is very much up to whoever writes the class, but somewhere in the code, there must be something like:

  const size_t FileArrayElementAccess<X>::size = ...;
where ``X'' is the name of the class dealt with by the particular traits specialisation. The second problem is that this is totally unnecessary. What we want is a value that can be used by the compiler at compile time, not a memory location to read a value from. As I mentioned, a modern compiler does make this much easier. In standard C++ it is allowed to write:

  template<> class FileArrayElementAccess<X>
  {
  public:
    static const size_t size = ...;
  ...
  };
Note that for some reason that I do not know, this construct is only legal if the type is a constant of an integral or enumeration type. ``size_t'' is such a type, it's some unsigned integral type, probably ``unsigned int'', but possibly ``unsigned long''. The expression denoted ``...'' must be possible to evaluate at compile time. Unless code is written that explicitly takes the address of ``size'', we need not give the constant any space to reside in. The odd construct ``template <>'' is also new C++ syntax, and means that what follows is a specialisation of a previously declared template. For old compilers, however, there's a work-around for integral values, no larger than the largest ``int'' value. We cheat and use an enum instead of a ``size_t''. This makes the declaration:

  class FileArrayElementAccess<X>
  {
  public:
    enum { size = ... };
  ...
  };
This is a bit ugly, but it is perfectly harmless. The advantage gained by adding the traits class is flexibility and safety. If someone wants to use a file array for their own class, they're free to do so. However, they must first write a ``FileArrayElementAccess'' specialisation. Failure to do so will result in a compilation error. This early error detection is beneficial. The sloppy solution from last month would not yield any error until run-time, which means a (usually long) debugging session.
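For concreteness, a specialisation for ``int'' that stores the value in raw binary form might look something like this (my sketch, not code from the sources; the on-disk representation is entirely up to whoever writes the specialisation):

  class FileArrayElementAccess<int>
  {
  public:
    enum { size = sizeof(int) }; // bytes each element occupies on disk
    static void writeTo(int value, ostream& os)
    {
      // raw binary representation, exactly ``size'' bytes
      os.write((const char*)&value, size);
    }
    static int readFrom(istream& is)
    {
      int value;
      is.read((char*)&value, size);
      return value;
    }
  };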

Several arrays in a file

What is needed in order to host several arrays in the same file? One way or the other, there must be a mechanism for finding out where one array begins and another ends. I think the simplest solution is to let go of the file names, and instead make the constructors accept an ``fstream&''. We can then require that the put and get pointers of the stream must be where the array can begin, and we can in turn promise that the put and get pointers will be positioned at the byte after the array end. Of course, in addition to having a reference to the ``fstream'' in our class, we also need the ``home'' position to seek relative to when indexing the array. Not only is this easy for us to write, it is also easy to use. For someone requiring only one array in a file there will be slightly more code; an ``fstream'' object must be explicitly initialised somewhere and passed to the constructor of the array, instead of just giving the array a file name. I think the trade of slightly more code for the increased functionality is favourable.
In order to improve the likelihood of finding errors, we can waste a few bytes of disk space by writing a well known header and trailer pattern at the beginning and end of the array (before the first element, and after the last one.) If someone wants to allocate an array using an existing file, we can find out if the get pointer is in place for an array start.
The constructor that creates a new array should, however, first try to read from the stream to see if an array already exists there. If one does, the array should be initialised from the stream, just like the constructor accepting a stream only does. If the read fails, however, we can safely assume that there is no array there, and one should be created.
The changes to the class definition and the constructor implementations are relatively straightforward, if long:

  template <class T>
  class FileArray
  {
  public:
    FileArray(fstream& fs, size_t elements);
    // create a new file.

    FileArray(fstream& fs);
    // use an existing file and get size from there
  ...
  private:
    void initFromFile(const char*);

    fstream& stream;
    size_t array_size; // in elements
    streampos home;
  };

  template <class T>
  FileArray<T>::FileArray(fstream& fs, size_t elements)
    : stream(fs),
      array_size(elements)
  {
    // what if the file could not be opened?
    // first try to read and see if there's a begin
    // pattern. Either there is one, or we should
    // get an eof.

    char pattern[6];
    stream.read(pattern,6);
    if (stream.eof()) {
      stream.clear(); // clear error state
                      // and initialise.

      // begin of array pattern.
      stream.write("ABegin",6);
      // must store size of elements, as last month
      const size_t elem_size
        =FileArrayElementAccess<T>::size;
      stream.write((const char*)&elem_size,
                   sizeof(elem_size));
      // and of course the number of elements
      stream.write((const char*)&array_size,
                   sizeof(array_size));
      // Now that we've written the maintenance
      // stuff, we know what the home position is.

      home = stream.tellp();

      // Then we must go to the end and write
      // the end pattern.

      stream.seekp(home+elem_size*array_size);
      stream.write("AEnd",4);

      // set put and get pointer to past the end pos.
      stream.seekg(stream.tellp());
      return;
    }

    initFromFile(pattern); // shared with other
                           // stream constructor
    if (array_size != elements) {
      // Uh oh. The data read from the stream,
      // and the size given in the constructor
      // mismatches! What now?
      stream.clear(ios::failbit);
    }

    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }

  template <class T>
  FileArray<T>::FileArray(fstream& fs)
    : stream(fs)
  {
    // First read the head pattern to see if
    // it's right.
    char pattern[6];
    stream.read(pattern,6);
    initFromFile(pattern);
    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }

  template <class T>
  void FileArray<T>::initFromFile(const char* p)
  {
    // Check if the read pattern is correct
    if (strncmp(p,"ABegin",6)) {
      // What to do? It was all wrong!
      stream.clear(ios::failbit);
      // for lack of better,
      // set the fail flag.
      return;
    }
    // OK, we have a valid array, now let's see if
    // it's of the right kind.
    size_t elem_size;
    stream.read((char*)&elem_size,sizeof(elem_size));
    if (elem_size != FileArrayElementAccess<T>::size)
    {
      // wrong kind of array, the element sizes
      // mismatch. Again, what to do? Let's set
      // the fail flag for now.
      stream.clear(ios::failbit);
      // stupid name for the
      // member function, right?
      return;
    }
    // Get the size of the array. Can't do much with
    // the size here, though.
    stream.read((char*)&array_size,sizeof(array_size));
    // Now we're past the header, so we know where the
    // data begins and can set the home position.

    home = stream.tellg();

    stream.seekg(home+elem_size*array_size);

    // Now positioned immediately after the last
    // element.

    char epattern[4];
    stream.read(epattern,4);
    if (strncmp(epattern,"AEnd",4)) {
      // Whoops, corrupt file!
      stream.clear(ios::failbit);
      return;
    }
    // Seems like we have a valid array!
  }
Other than the above, the only change needed for the array is that seeking will be done relative to ``home'' rather than the beginning of the file (plus the size of the header entries.) The new versions of ``storeElement'' and ``readElement'' become:

  template <class T>
  T FileArray<T>::readElement(size_t index) const
  { // what if index >= array_size?
    typedef FileArrayElementAccess<T> traits;
    stream.seekg(home+index*traits::size);
    // what if seek fails?

    return traits::readFrom(stream);
    // what if read fails?
    // What if too much data is read?
  }

  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& element)
  { // what if index >= array_size?
    typedef FileArrayElementAccess<T> traits;
    stream.seekp(home+traits::size*index);
    // what if seek fails?
    traits::writeTo(element,stream);
    // what if write failed?
    // what if too much data was written?
  }

Temporary file array

Making use of a temporary file to store a file array that's not to be persistent between runs of the application isn't that tricky. The implementation so far makes use of a stream and known data about the beginning of the stream, the number of elements and the size of the elements. This can be used for the temporary file as well. The only thing we need to do is to create the temporary file first, open it with an fstream object, tie the stream reference to that object, and remember to delete the file in the destructor.
What's the best way of creating something and making sure we remember to undo it later? Well, of course, creating a new helper class which creates the file in its constructor and removes it in its destructor. Piece of cake. The only problem is that we shouldn't always create a temporary file, and when we do, we can handle it a bit differently from what we do with a ``global'' file that can be shared. For example, we know that we have exclusive rights to the file, and that it won't be reused, so there's no need for the extra information at the beginning and end. So, how is a temporary file created? The C++ standard doesn't say, and neither is there any support for it in the old de-facto standard. C offers a little help, though: the function ``tmpnam'' is part of the standard C library, and most implementations also provide the more flexible ``tempnam'' as an extension. Both are declared in ``stdio.h''. I have in this implementation chosen to use ``tempnam''. ``tempnam'' works like this: it accepts two string parameters named ``dir'' and ``prefix''. It first attempts to generate a name for a temporary file in the directory named by the environment variable ``TMPDIR''. If that fails, it uses the directory indicated by the ``dir'' parameter, unless that is 0, in which case a hard-coded default is used. It returns a ``char*'' pointing to a file name to use. The memory area pointed to is allocated with the C function ``malloc'', and thus must be deallocated with ``free'' and not ``delete[]''.
Over to the implementation details:
We add a class called temporaryfile, which does the above mentioned work. We also add a member variable ``pfile'' which is of type ``ptr<temporaryfile>''. Remember the ``ptr'' template from last month? It's a smart pointer that deallocates whatever it points to in its destructor. It's important that the member variable ``pfile'' is listed before the ``stream'' member, since initialisation is done in the order listed, and the ``stream'' member must be initialised from the file object owned by ``pfile''. We also add a constructor with the number of elements as its sole parameter, which makes use of the temporary file.

  class temporaryfile
  {
  public:
    temporaryfile();
    ~temporaryfile();
    iostream& stream();
  private:
    char* name;
    fstream fs;
  };

  temporaryfile::temporaryfile()
    : name(::tempnam(".","array")),
      fs(name, ios::in|ios::out|ios::binary)
  {
    // what if tempnam fails and name is 0?
    // what if fs is bad?
  }

  temporaryfile::~temporaryfile()
  {
    fs.close();
    ::remove(name);
    // what if remove fails?
    ::free(name);
  }
In the above code, ``tempnam'', ``remove'' and ``free'' are prefixed with ``::'', to make sure that it's the names in global scope that are meant, just in case someone enhances the class with a few more member functions whose names might clash. For the sake of syntactical convenience, I have added yet another operator to the ``ptr'' class template:

  template <class T> class ptr
  {
  public:
    ptr(T* tp=0) : p(tp) {};
    ~ptr() { delete p; };
    T* operator->(void) const { return p; };
    T& operator*(void) const { return *p;};
  private:
    ptr(const ptr&);
    ptr& operator=(const ptr&);
    T* p;
  };
It's the ``operator->'' that's new, which allows us to write things like ``p->x'', where ``p'' is a ``ptr<X>'' and the type ``X'' contains some member named ``x''. The return type of ``operator->'' must be something that ``operator->'' can be applied to. The explanation sounds recursive, but it makes sense if you look at the above code. ``ptr<X>::operator->()'' returns an ``X*'', and ``X*'' is something you can apply the built-in ``operator->'' to (which gives you access to the members.)
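For example, given the classes above, usage might look like this (a small sketch of mine):

  ptr<temporaryfile> pfile(new temporaryfile);
  iostream& io = pfile->stream();  // operator-> forwards to the
                                   // owned temporaryfile object
  // When pfile goes out of scope, its destructor deletes the
  // temporaryfile, which in turn closes and removes the file.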

  template <class T>
  FileArray<T>::FileArray(size_t elements)
    : pfile(new temporaryfile),
      stream(pfile->stream()),
      array_size(elements),
      home(stream.tellg())
  {
    const size_t elem_size=
      FileArrayElementAccess<T>::size;
    // put a char just after the end to make
    // sure there's enough free disk space.
    stream.seekp(home+array_size*elem_size);
    char c;
    stream.write(&c,1);
    // what to do if write fails?
    // set put and get pointer to past the end pos
    stream.seekg(stream.tellp());
  }
That's it! The rest of the array works exactly as before. No need to rewrite anything else.

Code reuse

If you're an experienced C programmer, especially experienced with programming embedded systems where memory constraints are tough and you also have a good memory, you might get a feeling that something's wrong here.
What I'm talking about is something I mentioned the first time templates were introduced: ``Templates aren't source code. The source code is generated by the compiler when needed.'' This means that if a program uses FileArray<int>, FileArray<double>, FileArray<X> and FileArray<Y> (where ``X'' and ``Y'' are some classes,) there will be code for all four types. Now, have a close look at the member functions and see in what way ``FileArray<int>::FileArray(iostream& fs, size_t elements)'' differs from ``FileArray<X>::FileArray(iostream& fs, size_t elements)''. Please do compare them.
What did you find? The only difference at all is in the handling of the member ``elem_size'', yet the same code is generated several times with that as the only difference. This is what is often referred to as the template code bloat of C++. We don't want code bloat. We want fast, tight, and slick applications.
Since the only thing that differs is the size of the elements, we can move the rest to something that isn't templatised, and use that common base everywhere. I've already shown how code reuse can be done by creating a separate class and having a member variable of that type. In this article I want to show an alternative way of reusing code, and that is through inheritance. Note very carefully that I did not say public inheritance. Public inheritance models ``is-A'' relationships only. We don't want an ``is-A'' relationship here. All we want is to reuse code to reduce code bloat. This is done through private inheritance. Private inheritance is used far less than it should be. Here's all there is to it. Create a class with the desired implementation to reuse and inherit privately from it. Nothing more, nothing less. To a user of your class, it matters not at all if you choose not to reuse code at all, reuse through encapsulation of a member variable, or reuse through private inheritance. It's not possible to refer to the descendant class through a pointer to the private base class; private inheritance is an implementation detail only, and not an interface issue.
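As a minimal sketch of the mechanics (the class names here are made up for illustration):

  class Counter              // implementation we want to reuse
  {
  public:
    Counter() : count(0) {}
    void tick() { ++count; }
    long value() const { return count; }
  private:
    long count;
  };

  class Turnstile : private Counter  // reuse, not an is-A relationship
  {
  public:
    void pass() { tick(); }                // uses the inherited code
    long passes() const { return value(); }
  };
  // A Turnstile cannot be referred to through a Counter*;
  // the inheritance is invisible to users of Turnstile.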
To the point. What can, and what cannot, be isolated and put in a private base class? Let's first look at the data. The ``stream'' reference member can definitely be moved to the base, and so can the ``pfile'' member for temporary files. The ``array_size'' member can safely be there too, and also the ``home'' member marking the beginning of the array on the stream. By doing that alone we have saved just about nothing at all, but if we also add a data member in the base class for the size (on disk) of the elements, initialised from the ``FileArrayElementAccess<T>::size'' traits member, all seeking in the file, including the initial seeking when creating the file array, can be moved to the base class. Now a lot has been gained, and very little will be left in the class template itself. Let's look at the new, improved implementation:
Now for the declaration of the base class.

  class FileArrayBase
  {
  public:
  protected:
    FileArrayBase(iostream& io,
                  size_t elements,
                  size_t elem_size);
    FileArrayBase(iostream& io);
    FileArrayBase(size_t elements, size_t elem_size);
    iostream& seekp(size_t index) const;
    iostream& seekg(size_t index) const;
    size_t size() const; // number of elements
    size_t element_size() const;
  private:
    class temporaryfile
    {
    public:
      temporaryfile();
      ~temporaryfile();
      iostream& stream();
    private:
      char* name;
      fstream fs;
    };
    void initFromFile(const char* p);
    ptr<temporaryfile> pfile;
    iostream& stream;
    size_t array_size;
    size_t e_size;
    streampos home;
  };
The only surprise here should be the nesting of the class ``temporaryfile.'' Yes, it's possible to define a class within a class. Since the ``temporaryfile'' class is defined in the private section of ``FileArrayBase'', it's inaccessible from anywhere other than the ``FileArrayBase'' implementation. It's actually possible to nest classes in class templates as well, but few compilers today support that. When implementing the member functions of the nested class, it looks a bit ugly, since the surrounding scope must be used.

  FileArrayBase::temporaryfile::temporaryfile()
    : name(::tempnam(".","array")),
      fs(name,ios::in|ios::out|ios::binary)
  {
    // what if tempnam fails and name is 0?
    // what if fs is bad?
  }

  FileArrayBase::temporaryfile::~temporaryfile()
  {
    fs.close();
    ::remove(name);
    // What if remove fails?
    ::free(name);
   }

  iostream& FileArrayBase::temporaryfile::stream()
  {
    return fs;
  }
The implementation of ``FileArrayBase'' is very similar to the ``FileArray'' earlier. The only difference is that we use a parameter for the element size, instead of the traits class.

  FileArrayBase::FileArrayBase(iostream& io,
                               size_t elements,
                               size_t elem_size)
    : stream(io),
      array_size(elements),
      e_size(elem_size)
  {
    char pattern[sizeof(ArrayBegin)];
    stream.read(pattern,sizeof(pattern));
    if (stream.eof()) {
      stream.clear(); // clear error state
                      // and initialize.
      // begin of array pattern.
      stream.write(ArrayBegin,sizeof(ArrayBegin));

      // must store size of elements
      stream.write((const char*)&elem_size,
                   sizeof(elem_size));

      // and of course the number of elements
      stream.write((const char*)&array_size,
                   sizeof(array_size));

      // Now that we've written the maintenance
      // stuff, we know what the home position is.
      home = stream.tellp();

      // Then we must go to the end and write
      // the end pattern.

      stream.seekp(home+elem_size*array_size);
      stream.write(ArrayEnd,sizeof(ArrayEnd));

      // set put and get pointer to past the end pos.
      stream.seekg(stream.tellp());
      return;
    }
    initFromFile(pattern); // shared with other
                           // stream constructor

    if (array_size != elements) {
      // Uh oh. The data read from the stream,
      // and the size given in the constructor
      // mismatches! What now?

      stream.clear(ios::failbit);
    }
    if (e_size != elem_size) {
      stream.clear(ios::failbit);
    }
    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }
To make life a little bit easier, I've assumed two arrays of char named ``ArrayBegin'' and ``ArrayEnd'', which hold the patterns to be used for marking the beginning and end of an array on disk.
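Their definitions are not shown in the article; something along these lines would do, reusing the patterns from the template version earlier (note that ``sizeof'' on these arrays includes the terminating '\0', which is harmless as long as the same constants are used both for writing and for reading):

  static const char ArrayBegin[] = "ABegin";
  static const char ArrayEnd[]   = "AEnd";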

  FileArrayBase::FileArrayBase(iostream& io)
    : stream(io)
  {
    char pattern[sizeof(ArrayBegin)];
    stream.read(pattern,sizeof(pattern));
    initFromFile(pattern);

    // set put and get pointer to past the end pos.
    stream.seekp(stream.tellg());
  }

  FileArrayBase::FileArrayBase(size_t elements,
                               size_t elem_size)
    : pfile(new temporaryfile),
      stream(pfile->stream()),
      array_size(elements),
      e_size(elem_size),
      home(stream.tellg())
  {
    stream.seekp(home+array_size*e_size);
    char c;
    stream.write(&c,1);
    // set put and get pointer to past the end pos.
    stream.seekg(stream.tellp());
  }

  void FileArrayBase::initFromFile(const char* p)
  {
    // Check if the read pattern is correct
    if (strncmp(p,ArrayBegin,sizeof(ArrayBegin))) {
      // What to do? It was all wrong!
      stream.clear(ios::failbit); // for lack of better,
                                  // set the fail flag.
      return;
    }
    // OK, we have a valid array, now let's see if
    // it's of the right kind.
    stream.read((char*)&e_size,sizeof(e_size));

    // Get the size of the array. Can't do much with
    // the size here, though.
    stream.read((char*)&array_size,sizeof(array_size));

    // Now we're past the header, so we know where the
    // data begins and can set the home position.
    home = stream.tellg();
    stream.seekg(home+e_size*array_size);
    // Now positioned immediately after the last
    // element.
    char epattern[sizeof(ArrayEnd)];
    stream.read(epattern,sizeof(epattern));
    if (strncmp(epattern,ArrayEnd,sizeof(ArrayEnd)))
    {
      // Whoops, corrupt file!
      stream.clear(ios::failbit);
      return;
    }
    // Seems like we have a valid array!
  }

  iostream& FileArrayBase::seekg(size_t index) const
  {
    // what if index is out of bounds?
    stream.seekg(home+index*e_size);
    // what if seek failed?
    return stream;
  }

  iostream& FileArrayBase::seekp(size_t index) const
  {
    // What if index is out of bounds?
    stream.seekp(home+index*e_size);
    // What if seek failed?
    return stream;
  }

  size_t FileArrayBase::size() const
  {
    return array_size;
  }

  size_t FileArrayBase::element_size() const
  {
    return e_size;
  }
Apart from the tricky questions, it's all pretty straightforward. The really good news, however, is how easy this makes the implementation of the class template ``FileArray''.

  template <class T>
  class FileArray : private FileArrayBase
  {
  public:
    FileArray(iostream& io, size_t size);// create one.
    FileArray(iostream& io); // use existing array
    FileArray(size_t elements);  // create temporary
    T operator[](size_t index) const;
    FileArrayProxy<T> operator[](size_t index);
    size_t size() { return FileArrayBase::size(); };
  private:
    FileArray(const FileArray&); // illegal
    FileArray& operator=(const FileArray&);
    // illegal

    T readElement(size_t index) const;
    void storeElement(size_t index, const T& elem);
    friend class FileArrayProxy<T>;
  };
Now watch this!

  template <class T>
  FileArray<T>::FileArray(iostream& io, size_t elements)
    : FileArrayBase(io,
                    elements,
                    FileArrayElementAccess<T>::size)
  {
  }

  template <class T>
  FileArray<T>::FileArray(iostream& io)
    : FileArrayBase(io)
  {
    // what if element_size is wrong?
  }

  template <class T>
  FileArray<T>::FileArray(size_t elements)
    : FileArrayBase(elements,
                    FileArrayElementAccess<T>::size)
  {
  }

  template <class T>
  T FileArray<T>::operator[](size_t index) const
  {
    // what if index>= size()?
    return readElement(index);
  }

  template <class T>
  FileArrayProxy<T>
  FileArray<T>::operator[](size_t index)
  {
    // what if index>= size()?
    return FileArrayProxy<T>(*this, index);
  }

  template <class T>
  T FileArray<T>::readElement(size_t index) const
  {
    // what if index>= size()?
    iostream& s = seekg(index); // parent seekg
    return FileArrayElementAccess<T>::readFrom(s);
    // what if read failed?
    // What if too much data was read?
  }

  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& element)
  { // what if index>= size()?
    iostream& s = seekp(index); // parent seekp
    // what if seek fails?
    FileArrayElementAccess<T>::writeTo(element,s);
    // what if write failed?
    // What if too much data was written?
  }
How much easier can it get? This reduces code bloat, and it also makes the source code easier to understand, extend and maintain.

What can go wrong?

Already in the very beginning of this article series, in part 1, I introduced exceptions, the C++ error handling mechanism. Of course exceptions should be used to handle the error situations that can occur in our array class. When I introduced exceptions, I didn't tell the whole truth about them. There was one thing I left out, because at that time it wouldn't have made much sense. That one thing is that when exceptions are caught, dynamic binding works, or, to use wording slightly more English-like, we can create exception class hierarchies with public inheritance, and we can choose at what level to catch. Here's a mini example showing the idea:

  class A {};
  class B : public A {};
  class C : public A {};
  class B1 : public B{};

  void f() throw (A); // may throw any of the above

  void x()
  {
    try {
      f();
    }
    catch (B& b) {
      // **1
    }
    catch (C& c) {
      // **2
    }
    catch (A& a) {
      // **3
    }
  }
At ``**1'' above, objects of class ``B'' and class ``B1'' are caught if thrown from ``f''. At ``**2'' objects of class ``C'' (and descendants of C, if any are declared elsewhere) are caught. At ``**3'' all others from the ``A'' hierarchy are caught. This may seem like a curious detail of purely academic worth, but it's extremely useful. We can use abstraction levels for errors. For example, we can have a root class ``FileArrayException'', from which all other exceptions regarding the file array inherit. We can see that there are clearly two kinds of errors that can occur in the file array: abuse, and environmental issues outside the control of the programmer. By abuse I mean things like indexing outside the valid bounds, and by environmental issues I mean things like faulty or full disks. (Since there are several programs running, a check that there's enough disk space is still taking a chance; even if there was enough free space when the check was made, that space may be occupied by the time the next statement in the program is executed.)
A reasonable start for the exception hierarchy then becomes:

  class FileArrayException {};
  class FileArrayLogicError
    : public FileArrayException {};
  class FileArrayRuntimeError
    : public FileArrayException {};
Here ``FileArrayLogicError'' is for clear violations of the (not too clearly stated) preconditions, and ``FileArrayRuntimeError'' is for things that the programmer may not have a chance to do anything about. In a perfectly debugged program, the only exceptions ever thrown from file arrays will be of the ``FileArrayRuntimeError'' kind. We can divide those further into:

  class FileArrayCreateError
    : public FileArrayRuntimeError {};
For whenever the creation of the array fails, regardless of why (it's not very easy to find out if it's a faulty disk or lack of disk space, for example.)

  class FileArrayStreamError
    : public FileArrayRuntimeError {};
If after creation, something goes wrong with a stream; for example if seeking or reading/writing fails.

  class FileArrayDataCorruptionError
    : public FileArrayRuntimeError {};
If an array is created from an old existing file, and we note that the header or trailer doesn't match the expected pattern.

  class FileArrayBoundsError
    : public FileArrayLogicError {};
Addressing outside the legal bounds.

  class FileArrayElementSizeError
    : public FileArrayLogicError {};
If the read/write members of the element access traits class are faulty and either write too much (thus overwriting the data of the next element) or read too much (in which case the last few bytes read will be garbage picked from the next element.) It's of course possible to take this even further; I think this is quite enough, though.
Now we have a reasonably fine level of error reporting, yet an application that wants only a coarse level of error handling can choose to catch the higher levels of the hierarchy only.
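As an illustration, application code might catch at whichever level suits it; a sketch (the function and the choice of handling are made up):

  int safeRead(FileArray<int>& a, size_t i)
  {
    try {
      return a[i];
    }
    catch (FileArrayLogicError&) {
      // fine-grained: we abused the array, e.g. a bad index
      return 0;
    }
    catch (FileArrayException&) {
      // coarse: any other file array problem; pass it on
      throw;
    }
  }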
As an exercise, I invite you to add the throws to the code. Beware, however: it's not a good idea to add exception specifications to the member functions making use of the T's (since you cannot know which operations on T may throw, and what they throw.) You can increase the code size and legibility gains from the private inheritance of the implementation in the base class by putting quite a lot of the error handling there.

Iterators

An iterator into a file array is something whose behavior is analogous to that of pointers into arrays. We want to be able to create an iterator from the array (in which case the iterator refers to the first element of the array.) We want to access that element by dereferencing the iterator (unary operator *,) and we want iterator arithmetic with integers.
An easy way of getting there is to let an iterator contain a pointer to a file array, and an index. Whenever the iterator is dereferenced, we return (*array)[index]. That way we even get, for free from the array itself, error handling for iterator arithmetic that leads us outside the valid range of the array. The iterator arithmetic becomes simple too, since it's just ordinary arithmetic on the index type. The implementation thus seems easy; all that's needed is to define the operations needed for the iterators, and the actions we want. Here's my idea:
  • creation from array yields iterator referring to first element
  • copy construction and assignment are of course well behaved.
  • moving forwards and backwards with operator++ and operator--.
  • addition of array and ``long int'' value ``n'' yields iterator referring to n:th element of array.
  • iterator+=n (where n is of type long int) adds n to the value of the index in the iterator. This addition is never an error; it's dereferencing the iterator that's an error if the index is out of range. Operator -= is analogous.
  • iterator+n yields a new iterator referring to the iterator.index+n:th element of the array, and analogous for operator-.
  • iterator1-iterator2 yields a long int which is the difference between the indices of the iterators. If iterator1 and iterator2 refer to different arrays, it's an error and we throw an exception.
  • iterator1==iterator2 returns non-zero if the arrays and indices of iterator1 and iterator2 are equal.
  • iterator1!=iterator2 returns !(iterator1==iterator2)
  • *iterator returns whatever (*array)[index] returns, i.e. a FileArrayProxy<T>.
  • iterator[n] returns (*array)[index+n].
  • iterator1<iterator2 returns non-zero if iterator1.index < iterator2.index. If the iterators refer to different arrays, it's an error and we throw an exception. Likewise for operator>.
  • iterator1>=iterator2 returns !(iterator1<iterator2), and analogously for operator<=.
I think the above is an exhaustive list. None of the above is difficult; it's just a lot of code to write, and thus a good chance of making errors. With a little thought, however, quite a lot of code can be reused over and over, reducing both the amount to write and the risk of errors. As an example, a rule of thumb when writing a class for which, given an object ``o'' and some other value ``v'', the operations ``o+=v'', ``o+v'' and ``v+o'' are well defined and behave like they do for the built-in types (which they really ought to, unless you want to give the class users some rather unhealthy surprises) is to define ``operator+='' as a member of the class, and two versions of ``operator+'' outside the class that are implemented with ``operator+=''. Here's how it's done in the iterator example:

  template <class T>
  class FileArrayIterator
  {
  public:
    FileArrayIterator(FileArray<T>& f);
    FileArrayIterator& operator+=(long n);
    FileArrayProxy<T> operator*();
    FileArrayProxy<T> operator[](long n);
    ...
  private:
    FileArray<T>* array;
    unsigned long index;
  };

  template <class T> FileArrayIterator<T>
  operator+(const FileArrayIterator<T>& i, long n);

  template <class T> FileArrayIterator<T>
  operator+(long n, const FileArrayIterator<T>& i);

  template <class T>
  FileArrayIterator<T>::FileArrayIterator(
    FileArray<T>& a
  )
    : array(&a),
      index(0)
  {
  }

  template <class T>
  FileArrayIterator<T>::FileArrayIterator(
    const FileArrayIterator<T>& i
  )
    : array(i.array),
      index(i.index)
  {
  }

  template <class T>
  FileArrayIterator<T>&
  FileArrayIterator<T>::operator+=(long n)
  {
    index+=n;
    return *this;
  }

  template <class T> FileArrayIterator<T>
  operator+(const FileArrayIterator<T>& i, long n)
  {
    FileArrayIterator<T> it(i);
    return it+=n;
  }

  template <class T> FileArrayIterator<T>
  operator+(long n, const FileArrayIterator<T>& i)
  {
    FileArrayIterator<T> it(i);
    return it+=n;
  }

  template <class T>
  FileArrayProxy<T> FileArrayIterator<T>::operator*()
  {
    return (*array)[index];
  }

  template <class T>
  FileArrayProxy<T>
  FileArrayIterator<T>::operator[](long n)
  {
    return (*array)[index+n];
  }
Surely, the code for the two versions of ``operator+'' must still be written, but since their behaviour is defined in terms of ``operator+='', it means that if we have an error, there's only one place to correct it. There's no need to display all the code here in the article; you can study it in the sources. The above shows how it all works, though, and as you can see, it's fairly simple.

Recap

This month the news in short was:
  • You can increase flexibility for your templates without sacrificing ease of use or safety by using traits classes.
  • Enumerations in classes can be used to have class-scope constants of integral type.
  • Modern compilers do not need the above hack. Defining a class-scope static constant of an integral type in the class declaration is cleaner and more type safe.
  • Standard C++ does not have any support for the notion of temporary files, and standard C offers only the bare minimum. Fortunately there are commonly supported extensions to the languages that help.
  • Private inheritance can be used for code reuse.
  • Private inheritance is very different from public inheritance. Public inheritance models ``is-A'' relationships, while private inheritance models ``is-implemented-in-terms-of'' relationships.
  • A user of a class that has privately inherited from something else cannot take advantage of this fact. To a user the private inheritance doesn't make any difference.
  • Private inheritance is, in real life, used far less than it should be. In many situations where public inheritance is used, private inheritance should have been used.
  • Exception catching is polymorphic (i.e. dynamic binding works when catching.)
  • The polymorphism of exception catching allows us to create an arbitrarily fine-grained error reporting mechanism while still allowing users who want a coarse error reporting mechanism to use one (they'll just catch classes near the root of the exception class inheritance tree.)
  • Always implement binary operator+, operator-, operator* and operator/ as functions outside the classes, and always implement them in terms of the operator+=, operator-=, operator*= and operator/= members of the classes.

Exercises

  • Alter the file array such that it's possible to instantiate two (or more) kinds of FileArray for the same element type in the same program, where the alternatives store the data in different formats. (Hint: the alternatives will all need different traits class specialisations.)
  • What's the difference between using private inheritance of a base class, and using a member variable of that same class, for reusing code?
  • In which situations is it crucial which alternative you choose?

An Introduction to C++ Programming - Part 9

In parts 5 and 6, the basics of I/O were introduced, with formatted reading and writing from standard input and output. We'll now have a look at I/O for files. In a sense, it's better to stop using the term I/O here, and instead use streams and streaming, since the ideas expressed here and in parts 5 and 6 can be used for other things than I/O, for example in-memory formatting of data (we'll see that at the very end of this article.)

Files

In what way is writing ``Hello world'' on standard output different from writing it to a file? The question is worth some thought, since in many programming languages there is a distinct difference. Is the message different? Is the format (as seen from the program) different? I cannot see any difference in those aspects. The only thing that truly differs is the media where the formatted message ends up. In the former case, it's on your screen, but for file I/O it's in a file somewhere on your hard disk. In other words, there is very little difference, or at least, there's very much in common.
As we've seen so far, commonality is expressed either through inheritance or templates, depending on what's common and what's not. To refresh your memory, templates are used when we want the same kind of behaviour, independent of data. For example a stack of some data type. Inheritance is used when you want similar, but in some important aspects different, behaviour at runtime for the same kind of data. We saw this for the staff hierarchy and mailing addresses in parts 7 and 8. In this case it's inheritance that's the correct solution, since the data will be the same, but where it will end up (and most notably, how it ends up there) differs. (Incidentally, there's a good case for using templates too, regarding the type of characters used. The C++ standard does indeed have templatised streams, just for differentiating between character types. Few compilers today support this, however. See the ``Standards Update'' towards the end of the article for more information.)
The inheritance tree for the stream types looks like this:

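                     ios
                    /   \
              istream   ostream
              /    \     /    \
       ifstream   iostream   ofstream
              \               /
               \             /
                \           /
                  fstream
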
The way to read this is that there's a base class named ``ios'', from which the classes ``istream'' and ``ostream'' inherit. The classes ``ifstream'' and ``ofstream'' in their turn inherit from ``istream'' and ``ostream'' respectively. The ``f'' in the names implies that they're file streams. Then there are the odd ones: ``iostream'', which inherits from both ``istream'' and ``ostream'', and ``fstream'', which inherits from both ``ifstream'' and ``ofstream.'' Inheriting from two bases is called multiple inheritance, and is by many seen as evil. Many programming languages have banned it: Objective-C, Java and Smalltalk to mention a few, while other programming languages, like Eiffel, go to the other extreme and allow you to inherit the same base several times. Personally I think multiple inheritance is very useful if used right, but it can cause severe problems. Here is a situation where it's used in the right way. Anyway, this means that ``fstream'' is a file stream for both reading and writing, while ``iostream'' is an abstract stream for both reading and writing. More often than you might think, you don't want to use the ``iostream'' or ``fstream'' classes.
This inheritance, however, means that all the stream insertion and extraction operators (the ``operator>>'' and ``operator<<'' functions) you've written will work with file streams just as they do with ``cin'' and ``cout''. Now, wasn't that neat? In other words, the only things you need to learn for file based I/O are the details that are specific to files.

File Streams

The first thing you need to know before you can use file streams is how to create them. The parts of interest look like this:

  class ifstream : public istream
  {
    ifstream();
    ifstream(const char* name,
             int mode=ios::in);
    void open(const char* name,
              int mode=ios::in);
    ...
  };

  class ofstream : public ostream
  {
    ofstream();
    ofstream(const char* name,
             int mode=ios::out);
    void open(const char* name,
              int mode=ios::out);
    ...
  };

  class fstream : public ofstream, public ifstream
  {
    fstream();
    fstream(const char* name,
            int mode);
    void open(const char* name,
              int mode);
    ...
  };
You get access to the classes by #including ``fstream.h''. The empty constructors always create a file stream object that is not tied to any file. To tie such an object to a file, a call to ``open'' must be made. ``open'' and the constructors with parameters behave identically. ``name'' is of course the name of the file. Since you normally use either ``ifstream'' or ``ofstream'' and rarely ``fstream'', this is normally the only parameter you need to supply. Sometimes, however, you need to use the ``mode'' parameter. It's a bit field, in which you use bitwise or (``operator|'') to combine any of the values ``ios::in'', ``ios::out'', ``ios::ate'', ``ios::app'', ``ios::trunc'', and finally ``ios::binary.'' Some implementations also provide ``ios::nocreate'' and ``ios::noreplace,'' but those are extensions. Some implementations do not have ``ios::binary,'' while others call it ``ios::bin.'' These variations of course make it difficult to write portable C++ today. Fortunately, the six listed first are required by the standard (although they belong to class ``ios_base,'' rather than ``ios.'') The meaning of these are:

  ios::in        open for reading

  ios::out       open for writing

  ios::ate       open with the get and put pointers at the end
                 (see Seeking for info) of the file.

  ios::app       open for append, that is, any write you make
                 to the file will be appended to the file.

  ios::trunc     scrap all data in the file if it already exists.

  ios::binary    open in binary mode, that is, do not do the brain
                 damaged LF<->CR/LF conversions that OS/2,
                 DOS, CP/M (RIP), Windows, and probably other
                 operating systems, so often insist on. The reason
                 some implementations do not have ios::binary
                 is that many operating systems do not have this
                 conversion, so there's no need for it.

  ios::noreplace cause the open to fail if the file already exists.

  ios::nocreate  cause the open to fail if the file doesn't exist.
Of course, combinations like ``ios::noreplace | ios::nocreate'' don't make sense -- the failure is guaranteed. On many implementations today there's also a third parameter to the constructors and ``open'': a protection parameter. How this parameter behaves is very operating system dependent.
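For example, opening one file for both reading and writing in binary mode could look like this (a sketch; the file name is made up):

  fstream f("data.bin", ios::in | ios::out | ios::binary);
  if (!f) {
    // the open failed
  }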
Now for some simple usage:

  #include <fstream.h>

  int main(int argc, char* argv[])
  {
    if (argc != 2) {
      cout << "Usage: " << argv[0] << " filename" << endl;
      return 1; // error code
    }

    ofstream of(argv[1]); // create the ofstream object
                          // and open the file.

    if (!of) { // something went wrong
      cout << "Error, cannot open " << argv[1] << endl;
      return 2;
    }

    // Now the file stream object is created. Write to it!
    of << "Hello file!" << endl;
    return 0;
  }
As you can see, once the stream object is created, its usage is analogous to that of ``cout'' that you're already familiar with. Of course reading with ``ifstream'' is done the same way, just use the object as you've used ``cin'' earlier. The file stream classes also have a member function ``close'', that by force closes the file and unties the stream object from it. Few are the situations when you need to call this member function, since the destructors do close the file.
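Reading works the same way; here is a small sketch (the function and variable names are mine) that prints the whitespace-separated words of a file:

  void printWords(const char* filename)
  {
    ifstream in(filename);  // create the ifstream object
                            // and open the file.
    if (!in) {              // something went wrong
      return;
    }
    char word[100];
    while (in >> word)      // formatted reading, just as with cin
      cout << word << endl;
  }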
Actually this is all there is that's specific to files.

Binary streaming

So far we've dealt with formatted streaming only, that is, the process of translating raw data into a human readable form, or translating human readable data into the computer's internal representation. Sometimes you want to stream raw data as raw data, for example to save space in a file. If you look at a file produced by, for example, a word processor, it's most likely not in a human readable form. Note that binary streaming does not necessarily mean using the ``ios::binary'' mode when opening a file (although, that is indeed often the case.) They're two different concepts. Binary streaming is what you use your stream for, raw data that is, and opening a file with the ``ios::binary'' mode means turning the brain damaged LF<->CR/LF translation off.
Binary streaming is done through these stream member functions:

  class ostream ...
  {
  public:
    ostream& write(const char* s, streamsize n);
    ostream& put(char c);
    ostream& flush();
  ...
  };

  class istream ...
  {
  public:
    istream& read(char* s, streamsize n);
    int get();
    istream& get(char& c);
    istream& get(char* s, streamsize n, char delim='\n');
    istream& getline(char* s, streamsize n,
                     char delim='\n');
    istream& ignore(streamsize n=1, int delim=EOF);
  };
The writing interface is extremely simple and straightforward, while the reading interface includes a number of small but important differences. Note that these member functions are implemented in the classes ``istream'' and ``ostream,'' so they're not specific to files, although files are where you're most likely to use them. Let's have a look at them, one by one:

  ostream& ostream::write(const char* s, streamsize n);
Write ``n'' characters to the stream, from the array pointed to by ``s.'' ``streamsize'' is a signed integral data type. Despite ``streamsize'' being signed, you're of course not allowed to pass a negative size here (what would that mean?) Exactly the characters found in ``s'' will be written to the stream, no more, no less.

  ostream& ostream::put(char c);
Inserts the character into the stream.

  ostream& ostream::flush();
Force the data in the stream to be written (file streams are usually buffered.)

  istream& istream::read(char* s, streamsize n);
Read ``n'' characters into the array pointed to by ``s.'' Here you better make sure that the array is large enough, or unpleasant things will happen. Note that only the characters read from the stream are inserted into the array. It will not be zero terminated, unless the last character read from the stream indeed is '\0'.

  int istream::get();
Read one character from the stream, and return it. The value is an ``int'' instead of ``char'' since the return value might be ``EOF'' (which is not uniquely representable as a ``char.'')

  istream& istream::get(char& c);
Same as above, but the character is read into ``c'' instead. Here a ``char'' is used instead of an ``int,'' since you can check for end of file directly by calling ``eof()'' on the stream reference returned.

  istream& istream::get(char* s, streamsize n,
                        char delim='\n');
This one's similar to ``read'' above, but with the difference that it reads at most ``n'' characters. It stops if the delimiter character is found. Note that when the delimiter is found, it is not read from the stream.

  istream& istream::getline(char* s, streamsize n,
                            char delim='\n');
The only difference between this one and ``get'' above, is that this one does read the delimiter from the stream. Note, however, that the delimiter is not stored in the array.

  istream& istream::ignore(streamsize n=1,
                           int delim=EOF);
Reads at most ``n'' characters from the stream, but doesn't store them anywhere. If the delimiter character is read, it stops there. Of course, if the delimiter is ``EOF'' (as is the default) it does not read past ``EOF,'' that's physically impossible.
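To make the difference between ``get'' and ``getline'' concrete, here's a small sketch (the stream ``is'' and its contents are assumed): suppose the next characters in ``is'' are ``abc'' followed by a newline.

  char buf[80];

  is.get(buf, 80);       // buf becomes "abc"; the '\n' is still
                         // the next character in the stream.
  is.get();              // reads (and discards) that '\n'.

  // With getline the '\n' would have been read from the stream,
  // but still not stored in buf:
  //   is.getline(buf, 80);  // buf becomes "abc"; '\n' consumed.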

Array on file

An example: Say we want to store an array of integers in a file, and we want to do this in raw binary format. Naturally we want to be able to read the array as well. A reasonable way is to first store a size (in elements) followed by the data. Both the size and the data will be in raw format.

  #include <iostream.h>
  #include <stddef.h>  // for size_t

  void storeArray(ostream& os, const int* p, size_t elems)
  {
    os.write((const char*)&elems,sizeof(elems));
    os.write((const char*)p, elems*sizeof(*p));
  }
The above code does a lot of ugly type casting, but that's normal for binary streaming. What's done here is to use brute force to see the address of ``elems'' as a ``const char*'' (since that's what ``write'' expects) and then say that only the ``sizeof(elems)'' bytes from that pointer are to be read. What this actually does is to write out the raw memory that ``elems'' resides in to the stream. After this, it does the same kind of thing for the array. Note that ``sizeof(*p)'' reports the size of the type that ``p'' points to. I could as well have written ``sizeof(int),'' but that is a dangerous duplication of facts. It's enough that I've said that ``p'' is a pointer to ``int.'' Repeating ``int'' again just means I'll forget to update one of them when I change the type to something else. To read such an array into memory requires a little more work:

  #include <iostream.h>
  #include <stddef.h>  // for size_t

  size_t readArray(istream& is, int*& p)
  {
    size_t elems;
    is.read((char*)&elems, sizeof(elems));
    p = new int[elems];
    is.read((char*)p, elems*sizeof(*p));
    return elems;
  }
It's not particularly hard to follow; first read the number of elements, then allocate an array of that size, and read the data into it.
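Putting the two functions together with file streams might look like this (a sketch; the file name and data are made up, and ``fstream.h'' is assumed to be included):

  void demo()
  {
    int data[3] = { 1, 2, 3 };
    {
      ofstream os("ints.dat", ios::out|ios::binary);
      storeArray(os, data, 3);
    } // destructor closes the file

    ifstream is("ints.dat", ios::in|ios::binary);
    int* p;
    size_t n = readArray(is, p);
    // ... use p[0] .. p[n-1] ...
    delete[] p;
  }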

Seeking

Up until now we have seen streams as, what it sounds like, continuous streams of data. Sometimes, however, there's a need to move around, both backwards and forwards. Streams like standard input and standard output are truly continuous streams, within which you cannot move around. Files, in contrast, are true random access data stores. Random access streams have something called position pointers. They're not to be confused with pointers in the normal C++ sense; rather, they refer to where in the file you currently are. There's the put pointer, which refers to the next position to write data to if you attempt to write anything, and the get pointer, which refers to the next position to read data from. An ostream of course only has the put pointer, and an istream only the get pointer. There's a total of 6 new member functions that deal with random access in a stream:

  streampos istream::tellg();

  istream& istream::seekg(streampos);

  istream& istream::seekg(streamoff, ios::seek_dir);

  streampos ostream::tellp();

  ostream& ostream::seekp(streampos);

  ostream& ostream::seekp(streamoff, ios::seek_dir);
``streampos'', which you get from ``tellg'' and ``tellp'', is an absolute position in a stream. You cannot use the values for anything other than ``seekg'' and ``seekp''. You especially cannot examine a value and hope to find something useful there (i.e. you can, but what you find out might hold only for the current release of your specific compiler; other compilers, or other releases of the same compiler, might show different characteristics for ``streampos.'') Well, there are two other things you can do with ``streampos'' values. You can subtract two values and get a ``streamoff'' value, and you can add a ``streamoff'' value to a ``streampos'' value. ``streamoff,'' by the way, is some signed integral type, probably a ``long.'' By using the value returned from ``tellg'' or ``tellp,'' you have a way of finding your way back, or of doing relative seeks by adding/subtracting ``streamoff'' values.
The ``seekg'' and ``seekp'' member functions that accept a ``streamoff'' value and a direction work in a slightly different way. You seek your way to a position relative to the beginning of the stream, the end of the stream, or the current position, the selection of which is done through the ``ios::seek_dir'' enum, which has the three values ``ios::beg'', ``ios::end'' and ``ios::cur.'' To make the next write occur on the very first byte of the stream, call ``os.seekp(0,ios::beg),'' where ``os'' is some random access ``ostream.''
In any reasonable implementation, the seek member functions use lazy evaluation. That is, when you call any of the seek member functions, the only thing that happens is that some member variable in the stream object changes value. It's not until you actually read or write that something truly happens on disk (or wherever the stream data resides.)
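A small sketch of how ``tellg'' and ``seekg'' can be combined to remember a position and return to it (assuming ``is'' is some random access ``istream''):

  streampos here = is.tellg();          // remember where we are
  is.seekg(0, ios::end);                // jump to the end of the stream
  streamoff length = is.tellg() - here; // distance to the end
  is.seekg(here);                       // and back to where we started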

A stream array, for really huge amounts of data

Suppose we have a need to access enormous amounts of simple data, say 10 million floating point numbers. It's not a very good idea to just allocate that much memory, at least not on my machine with a measly 64Mb RAM. It'll not just make this application crawl, but probably the whole system due to excessive paging. Instead, let's use a file to access the data. This makes for slow access, for sure, but nothing else will suffer.
Here's the idea. The array must be possible to use with any data type, including user defined classes. Its usage must resemble that of real arrays as much as possible, but extra functionality that arrays do not have, such as asking for the number of elements, is OK. There must be a type, resembling pointers into arrays, that can be used for traversing it. We do not want the size of the array to be part of its type (if you've programmed in Pascal, you know why.) In addition to what arrays offer, we want some measure of safety from stupid mistakes, such as addressing beyond the range of the array, and also for errors that arrays cannot have (disk full, cannot create file, disk corruption, etc.) We also want to say that an array is just a part of a file and not necessarily an entire file. This would allow the user to create several arrays within the same file. To prevent this article from growing way too long, quite a few of the above listed features will be left for next month. The things to cover this month are: an array of built-in fundamental types only, which lacks the pointer-like type and is limited to one file per array. We'll also skip error handling for now (you can add it as an exercise; I'll raise some interesting questions along the way,) and add that too next month.
First of all, the array must be a template, so it can be used to store arbitrary types. Since we do not want the size to be part of the type signature, the size is not a template parameter, but a parameter for the constructor. Of course, we cannot have the entire array duplicated in memory (then all the benefits will be lost,) instead we will search for the data on file every time it's needed.
Here's the outline for the class.

  template <class T>
  class FileArray
  {
  public:
    FileArray(const char* name, size_t elements);
    // Create a new array and set the size.

    FileArray(const char* name);
    // Create an array from an existing file, get the
    // size from the file.

    // use compiler defined destructor.

    T operator[](size_t index) const;
    ??? operator[](size_t index);

    size_t size() const;
  private:
    // don't want these to be used.
    FileArray(const FileArray&);
    FileArray& operator=(const FileArray&);
    ...
  };
As can be expected, ``operator[]'' can be overloaded, which is handy for providing a familiar syntax. However, already here we see a problem: what should the non-const ``operator[]'' return? To see why this is a problem, ask yourself what you want ``operator[]'' to do. I want ``operator[]'' to do two things, depending on where it's used; like this:

  FileArray<int> x("file", 10);
  ...
  x[5] = 4;
  int y = x[3];
When ``operator[]'' is on the left hand side of an assignment, I want to write data to the file, and when it's on the right hand side of an assignment, I want to read data from the file. Ouch. Warning: I've often seen it suggested that the solution is to have the const version read and return a value, and the non-const version write a value. As slick as it would be, it's wrong and it won't work. The const version is called for const array objects, and the non-const version for non-const array objects; which side of an assignment the expression appears on has nothing to do with it.
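A small illustration, using the array as outlined above, of why the suggestion fails; the overload chosen depends only on the const-ness of the array object:

  FileArray<int> x("file", 10); // x is not const...
  int y = x[3];                 // ...so the non-const operator[] is
                                // called here too, although we only read
  const FileArray<int>& cx = x;
  int z = cx[3];                // only here is the const version called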
Instead what we have to do is to pull a little trick. The trick is, as so often in computer science, to add another level of indirection. This is done by not taking care of the problem in ``operator[],'' but rather letting it return a type that does the job. We create a class template, looking like this:

  template <class T>
  class FileArrayProxy
  {
  public:
    FileArrayProxy& operator=(const T&); // write value
    operator T() const; // read a value

    // compiler generated destructor

    FileArrayProxy&
    operator=(const FileArrayProxy& p);

    FileArrayProxy(const FileArrayProxy&);
  private:
    ... all other constructors.
    FileArray<T>& array;
    const size_t index;
  };
We have to make sure, of course, that there are member functions in ``FileArray'' that can read and write (and of course, those functions are not ``operator[],'' since then we'd have an infinite recursion.) All constructors, except for the copy constructor, are made private to prevent users from creating objects of the class whenever they want to. After all, this class is a helper for the array only, and is not intended to ever even be seen. This, however, poses a problem; with the constructors being private, how can ``FileArray::operator[]()'' create and return one? Enter another C++ feature: friends. Friends are a way of breaking encapsulation. What?!?! Yes, what you read is right. Friends break encapsulation, and (this is the real shock) that's a good thing! Friends break encapsulation in a controlled way. In ``FileArrayProxy'' we can declare ``FileArray'' to be a friend. This means that ``FileArray'' can access everything in ``FileArrayProxy,'' including things that are declared private. Paradoxically, violating encapsulation with friendship strengthens encapsulation when done right. The only alternative here to using friendship is to make the constructors public, but then anyone can create objects of this class, and that's what we wanted to prevent. Friends are useful for strong encapsulation, but it's important to use them only in situations where two (or more) classes are so tightly bound to one another that they're meaningless on their own. This is the case with ``FileArrayProxy.'' It's meaningless without ``FileArray,'' thus ``FileArray'' is declared a friend of ``FileArrayProxy.'' The declaration then becomes:

  template <class T>
  class FileArrayProxy
  {
  public:
    FileArrayProxy& operator=(const T&); // write a value
    operator T() const; // read a value
    // compiler generated destructor

    FileArrayProxy& // read from p and then write
    operator=(const FileArrayProxy& p);

    // compiler generated copy constructor
  private:
    FileArrayProxy(FileArray<T>& fa, size_t n);
    // for use by FileArray only.

    FileArray<T>& array;
    const size_t index;

    friend class FileArray<T>;
  };
We can now start implementing the array. Some problems still lie ahead, but I'll mention them as we go.

  // farray.hpp
  #ifndef FARRAY_HPP
  #define FARRAY_HPP

  #include <fstream.h>
  #include <stddef.h> // size_t

  template <class T> class FileArrayProxy;
  // Forward declaration necessary, since FileArray
  // returns the type.

  template <class T> class FileArray
  {
  public:
    FileArray(const char* name, size_t size); // create
    FileArray(const char* name); // use existing array
    T operator[](size_t size) const;
    FileArrayProxy<T> operator[](size_t size);
    size_t size() const;
  private:
    FileArray(const FileArray&); // illegal
    FileArray& operator=(const FileArray&);

    // for use by FileArrayProxy
    T readElement(size_t index) const;
    void storeElement(size_t index, const T&);

    fstream stream;
    size_t max_size;

    friend class FileArrayProxy<T>;
  };
The functions for reading and writing are made private members of the array, since they're not for just anyone to use. Again, we need friendship, this time to grant ``FileArrayProxy'' the right to access them. Let's define them right away:

  template <class T>
  T FileArray<T>::readElement(size_t index) const
  {
    T t;
    stream.seekg(sizeof(max_size)+index*sizeof(T));
    // what if seek fails?

    stream.read((char*)&t, sizeof(t));
    // what if read fails?

    return t;
  }
All of a sudden, we face an unexpected problem: the above code won't compile. The member function is declared ``const'', and as such, all member variables are ``const'', and neither ``seekg'' nor ``read'' is allowed on constant streams. The problem is one of differentiating between logical constness and bitwise constness. This member function is logically ``const'', as it does not alter the array in any way. However, it is not bitwise const; the stream member changes. C++ cannot understand logical constness, only bitwise constness. If you have a modern compiler, the solution is very simple; you declare the member as ``mutable fstream stream;'' in the class definition. I, however, have a very old compiler, so I have to find a different solution. This solution is, yet again, one of adding another level of indirection. I can have a pointer to an ``fstream.'' In a ``const'' member function, the pointer is ``const'', but not what it points to (there's a difference between a constant pointer, and a pointer to a constant.) The only reasonable way to achieve this is to store the stream object on the heap, and in doing this I introduce a possible danger; what if I forget to delete the pointer? Sure, I'll delete it in the destructor, but what if an exception is thrown already in the constructor? Then the destructor will never execute (since no object has been created that must be destroyed.) Do you remember the ``thing to think of until this month?'' The clues were destructor, pointer and delete. Thought of anything? What about this extremely simple class template?

  template <class T>
  class ptr
  {
  public:
    ptr(T* pt);
    ~ptr();

    T& operator*() const;
  private:
    ptr(const ptr&); // we don't want copying
    ptr& operator=(const ptr&); // nor assignment

    T* p;
  };

  template <class T>
  ptr<T>::ptr(T* pt)
    : p(pt)
  {
  }

  template <class T>
  ptr<T>::~ptr()
  {
    delete p;
  }

  template <class T>
  T& ptr<T>::operator*() const
  {
    return *p;
  }
This is probably the simplest possible member of the family known as ``smart pointers.'' I'll probably devote a whole article exclusively to these some time. Whenever an object of this type is destroyed, whatever it points to is deleted. The only thing we have to keep in mind when using it is to make sure that whatever we feed it is allocated on the heap (and is not an array) so it can be deleted with operator delete. This solves our problem nicely. When the ``ptr'' object is a constant, the thing pointed to still isn't a constant (look at the return type of ``operator*,'' it's a ``T&,'' not a ``const T&.'') So, instead of using an ``fstream'' member variable called ``stream,'' let's use a ``ptr<fstream>'' member named ``pstream.'' With this change, ``readElement'' must be slightly rewritten:

  template <class T>
  T FileArray<T>::readElement(size_t index) const
  {
    (*pstream).seekg(sizeof(max_size)+index*sizeof(T));
    // what if seek fails?

    T t;
    (*pstream).read((char*)&t, sizeof(t));
    // what if read fails?

    return t;
  }
I bet the change wasn't too horrifying.
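As an aside, here is the constant pointer versus pointer-to-constant difference in isolation (a sketch with two hypothetical streams, ``f1'' and ``f2''):

  fstream f1("a.bin", ios::in|ios::out|ios::binary);
  fstream f2("b.bin", ios::in|ios::out|ios::binary);

  fstream* const cp = &f1; // constant pointer: cannot be re-seated,
                           // but what it points to may be modified
  const fstream* pc = &f1; // pointer to constant: may be re-seated,
                           // but what it points to may not be modified

  (*cp).seekg(0);          // OK, the pointee isn't const
  // cp = &f2;             // error: cp itself is const
  pc = &f2;                // OK
  // (*pc).seekg(0);       // error: *pc is const
The first situation, a constant pointer to a non-constant stream, is exactly what we rely on inside the ``const'' member functions. ``storeElement'' gets the same treatment: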

  template <class T>
  void FileArray<T>::storeElement(size_t index,
                                  const T& elem)
  {
    (*pstream).seekp(sizeof(max_size)+index*sizeof(T),
                     ios::beg);
    // what if seek fails?

    (*pstream).write((char*)&elem, sizeof(elem));
    // what if write failed?
  }
Now for the constructors:

  template <class T>
  FileArray<T>::FileArray(const char* name, size_t size)
    : pstream(new fstream(name, ios::in|ios::out|ios::binary)),
      max_size(size)
  {
    // what if the file could not be opened?

    // store the size on file.
    (*pstream).write((const char*)&max_size,
                     sizeof(max_size));
    // what if write failed?

    // We want to write a value (any value) at the end
    // to make sure there is enough space on disk.

    T t;
    storeElement(max_size-1,t);
    // What if this fails?
  }

  template <class T>
  FileArray<T>::FileArray(const char* name)
    : pstream(new fstream(name, ios::in|ios::out|ios::binary)),
      max_size(0)
  {
    // get the size from file.
    (*pstream).read((char*)&max_size,
                    sizeof(max_size));
    // what if read fails or max_size == 0?
    // How do we know the file is even an array?
  }
The access members:

  template <class T>
  T FileArray<T>::operator[](size_t size) const
  {
    // what if size >= max_size?
    return readElement(size);
    // What if read failed because of a disk error?
  }

  template <class T>
  FileArrayProxy<T> FileArray<T>::operator[](size_t size)
  {
    // what if size >= max_size?
    return FileArrayProxy<T>(*this, size);
  }
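The ``size'' member is trivial; a minimal sketch:

  template <class T>
  size_t FileArray<T>::size() const
  {
    return max_size;
  }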
Well, this wasn't too much work, but then, as can be seen from the comments, there's absolutely no error handling here. The ``size'' member, sketched above, needed no further comment. Next in line is ``FileArrayProxy.''

  template <class T>
  class FileArrayProxy
  {
  public:
    // copy constructor generated by compiler
    operator T() const;
    FileArrayProxy& operator=(const T& t);
    FileArrayProxy&
      operator=(const FileArrayProxy& p);
    // read from one array and write to the other.
  private:
    FileArrayProxy(FileArray<T>& f, size_t i);

    size_t index;
    FileArray& fa;

    friend class FileArray<T>;
  };
The copy constructor is needed, since the return value must be copied (on return from ``FileArray::operator[],'') and it must be public for this to succeed. The one the compiler generates for us, which just copies all member variables, will do just fine. The compiler doesn't generate a default constructor (one which accepts no parameters,) since we have explicitly defined a constructor. The assignment operator is necessary, however. Sure, the compiler will try to generate one for us if we don't, but it will fail, since references (``fa'') can't be rebound. Note that if we had used a pointer instead of a reference, the generation would succeed, but the result would *NOT* be what we want. It would just copy the member variables, while what we want is to read data from one array and write it to the other. Now for the implementation:

  template <class T>
  FileArrayProxy<T>::FileArrayProxy(FileArray<T>& f,
                                    size_t i)
    : index(i),
      fa(f)
  {
  }

  template <class T>
  FileArrayProxy<T>::operator T() const
  {
    return fa.readElement(index);
  }

  template <class T>
  FileArrayProxy<T>&
  FileArrayProxy<T>::operator=(const T& t)
  {
    fa.storeElement(index,t);
    return *this;
  }

  template <class T>
  FileArrayProxy<T>& FileArrayProxy<T>::operator=(
    const FileArrayProxy& p
  )
  {
    fa.storeElement(index,p);
    return *this;
  }

#endif // FARRAY_HPP
That was it. Can you see what happens with the proxy? Let's analyze a small code snippet:

  1 FileArray<int> arr("file",10);
  2 arr[2]=0;
  3 int x=arr[2];
  4 arr[0]=arr[2];
On line two, ``arr.operator[](2)'' is called, which creates a ``FileArrayProxy<int>'' from ``arr'' with the index 2. The object, which is a temporary and does not have a name, has as its member ``fa'' a reference to ``arr'', and as its member ``index'' the value 2. On this temporary object, ``operator=(int)'' is executed. This operator in turn calls ``fa.storeElement(index, t),'' where ``index'' is still 2 and the value of ``t'' is 0. Thus, ``arr[2]=0'' ends up as ``arr.storeElement(2,0)''. On line 3, a similar proxy is created through the call to ``operator[](2)''. This time, however, ``operator int() const'' is called. This member function in turn calls ``fa.readElement(2)'' and returns its value; thus ``int x=arr[2]'' translates to ``int x=arr.readElement(2).'' On line 4, finally, ``arr[0]=arr[2]'' creates two temporary proxies, one referring to index 0, and one to index 2. The assignment operator is called, which in turn calls ``fa.storeElement(0,p)'', where ``p'' is the temporary proxy referring to element 2. Since ``storeElement'' wants an ``int,'' ``p.operator int() const'' is called, which calls ``arr.readElement(2).'' In other words, ``arr[0] = arr[2]'' generates the code ``arr.storeElement(0, arr.readElement(2)).'' As you can see, the proxies don't add any new functionality; they're just syntactic sugar, albeit very useful. With them we can treat our file arrays very much like any other kind of array. There's one thing we cannot do, though:

  int* p = &arr[2];
  int& x = arr[3];
  *p=2;
  x=5;
With ordinary arrays, the above would be legal and have well defined semantics, assigning arr[2] the value 2, and arr[3] the value 5. With our file array we cannot do this, but unfortunately the compiler does not prevent it (a decent compiler will warn that we're binding a reference or a pointer to a temporary.) We'll mend that hole next month (think about how,) and also add iterators, which will allow us to use the file arrays almost exactly like real ones.

In memory data formatting

One frequently faced problem is that of converting strings representing some data to that data, or vice versa. With the aid of ``istrstream'', ``ostrstream'' and ``strstream'', this is easy. For example, say we have a string containing digits and want those digits as an integer; the thing to do is to create an ``istrstream'' object from the string. An example will explain:

  char* s = "23542";
  istrstream is(s);
  int x;
  is >> x;
After executing this snippet, ``x'' will have the value 23542. ``istrstream'' isn't much more exciting than that. ``ostrstream,'' on the other hand, is. There are two alternative uses for ``ostrstream'': one where you have an array you want the data formatted into, and one where you let the ``ostrstream'' allocate a buffer for you, as needed (usually because you have no idea what size the buffer must have.) The former usage is like this:

  char buffer[24];
  ostrstream os(buffer, sizeof(buffer));
  double x=23.34;
  os << "x=" << x << ends;
The variable ``buffer'' will contain the string ``x=23.34'' after this snippet. The stream manipulator ``ends'' zero terminates the buffer. Zero termination is not done by default, since the stream cannot know where to put it, and besides you might not always want it. The other variant, where you don't know how large a buffer you will need, is generally more useful (I think.)

  ostrstream os;
  double x=23.34, y=34.45;
  os << x << '*' << y << '=' << x*y << ends;
  const char* p = os.str();
  const size_t length=os.pcount();

  // work with p and length.
  os.freeze(0); // release the memory.
I think the example pretty much shows what this kind of usage does. The member function ``str'' returns a pointer to the internal buffer (which is then frozen; that is, the stream guarantees that it will not deallocate the buffer, nor overwrite it. Attempts to alter the stream while frozen will fail.) ``pcount'' returns the number of characters stored in the buffer. Last, ``freeze'' can either freeze the buffer, or ``unfreeze'' it; the latter is done by giving it a parameter with the value 0. I find this interface unfortunate. It's so easy to forget to release the buffer (by simply forgetting to call ``os.freeze(0)'') and that leads to a memory leak. One possible remedy is sketched below.
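This is just a sketch of one idea (the class ``unfreezer'' is my own invention, not part of any library); it echoes the ``ptr'' idea from the file array by letting a destructor do the cleanup:

  class unfreezer
  {
  public:
    unfreezer(ostrstream& s) : os(s) {}
    ~unfreezer() { os.freeze(0); } // always release the buffer
  private:
    ostrstream& os;
    unfreezer(const unfreezer&);            // no copying
    unfreezer& operator=(const unfreezer&); // nor assignment
  };
Declare an ``unfreezer'' right after the ``ostrstream'' and ``freeze(0)'' is called when both go out of scope, even if an exception is thrown while working with the buffer.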
``strstream,'' finally, is, just like ``fstream,'' the combined read/write stream. The string streams can be found in the header <strstream.h> (or, for some compilers, <strstrea.h>.)

Standards update

With the C++ standard, a lot of things have changed regarding streams. As I mentioned already last month, the headers have dropped their ``.h'' suffix, and the names are std::istream, std::ostream, etc. The streams are templatized too, which both makes life easier and not. The underlying type for std::ostream is:

  std::basic_ostream<class charT,
                     class traits=std::char_traits<charT> >
``charT'' is the basic character type for the stream. For ``ostream'' this is ``char'' (``ostream'' is actually a typedef.) There's another typedef, ``std::wostream'', where the underlying type is ``wchar_t'', which on most systems will probably be 16-bit Unicode. The class template ``char_traits'' is a traits class which holds the type used for EOF, the value of EOF, and some other housekeeping things. Why the standard has removed the file stream open modes ios::create and ios::nocreate is beyond me, as they're extremely useful.
Casting is ugly, and it's hard to see in large code blocks. The standard therefore adds four new cast operators that are highly visible. They're (in approximate order of increasing danger) dynamic_cast, static_cast, const_cast and reinterpret_cast. In the binary streaming seen in this article, reinterpret_cast would be used, as a way of saying, ``Yeah, I know I'm violating type safety, but hey, I know what I'm doing, OK?'' The good thing about it is that it's so visible that anyone doubting it can easily spot the dangerous lines and have a careful look. The syntax is: os.write(reinterpret_cast<const char*>(&variable), sizeof(variable));
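As an illustration only, here is how the earlier ``readElement'' might look with the new-style cast instead of the C-style cast (error handling still omitted):

  template <class T>
  T FileArray<T>::readElement(size_t index) const
  {
    (*pstream).seekg(sizeof(max_size)+index*sizeof(T));
    T t;
    (*pstream).read(reinterpret_cast<char*>(&t), sizeof(t));
    return t;
  }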
Finally, the generally useful strstreams have been replaced by ``std::istringstream'', ``std::ostringstream'' and ``std::stringstream'' (plus wide variants, std::wistringstream, etc.) defined in the header <sstream>. They do not operate on ``char*'', but on strings (there is a string class, or rather, again, a string class template, where the most important template parameter is the underlying character type.) ``std::ostringstream'' does not suffer from the freeze problem that ``ostrstream'' does.
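As a sketch of what that means in practice, here is the earlier in-memory formatting example redone with ``std::ostringstream'' (``std::ostringstream'' and ``std::string'' come from <sstream> and <string>); ``str()'' returns a ``std::string'' by value, so there is nothing to freeze or release:

  std::ostringstream os;
  double x = 23.34, y = 34.45;
  os << x << '*' << y << '=' << x*y;
  const std::string result = os.str(); // a copy of the buffer
  // work with result.size() and result.c_str()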

Recap

The news this month was:
  • streams dealing with files, or doing in-memory formatting, are used just the same way as the familiar ``cout'' and ``cin,'' which saves both learning and coding (the already written ``operator<<'' and ``operator>>'' can be used for all kinds of streams.)
  • streams can be used for binary, unformatted I/O too. This normally doesn't make sense for ``cout'' and ``cin'' or in-memory formatting (as the name implies,) but it's often useful when dealing with files.
  • It is possible to move around in streams, at least file streams and in-memory formatting streams. It's generally not possible to move around in ``cin'' and ``cout.''
  • proxy classes can be used to differentiate read and write operations for ``operator[]'' (the construction can of course be used elsewhere too, but it's most useful in this case.)
  • friends break encapsulation in a way that, when done right, strengthens encapsulation.
  • there's a difference between logical const and bitwise const, but the C++ compiler doesn't know and always assumes bitwise const.
  • truly simple smart pointers can save some memory management house keeping, and also be used as a work around for compilers lacking ``mutable'' (i.e. the way of declaring a variable as non-const for const members, in other words, how to differentiate between logical and bitwise const.)
  • streams can be used also for in-memory formatting of data.

Exercises

  • Improve the file array such that it accepts a ``stream&'' instead of a file name, and allows for several arrays in the same file.
  • Improve the proxy such that ``int& x=arr[2]'' and ``int* p=&arr[1]'' becomes illegal.
  • Add a constructor to the array that accepts only a ``size_t'' describing the size of the array, which creates a temporary file and removes it in its destructor.
  • What happens if we instantiate ``FileArray'' with a user defined type? Is it always desirable? If not, what is desirable? If you cannot define what's desirable, how can instantiation with user defined types be banned?
  • How can you, using the stream interface, calculate the size of a file?

An Introduction to C++ Programming - Part 8

Short recap of inheritance

Inheritance can be used to make runtime decisions about things we know conceptually, but not in detail. The employee/engineer/manager inheritance tree was an example of that; by knowing about employees in general, we can handle any kind of employee, including engineers, managers, secretaries, project leaders, and even kinds of employees we haven't yet thought of, for example marketers, salesmen and janitors.

A deficiency in the model

While this is good, it's not quite enough. The classic counter example is a vector drawing program. Such a program usually holds a collection of shapes. A shape can be a square, a circle, a rectangle, a collection of grouped images, text, lines and so on. The problem in the model lies in the common base, shape. You know a number of things about shapes in general; you can draw them on a canvas, you can scale them, you can rotate them and translate them. The problem is, how do you do any of these for a shape in general? How is a generic shape drawn or rotated? It's impossible. It's only the concrete shapes that can be drawn, rotated, translated, scaled, etc. This in itself is not a problem; we can create our base class ``Shape'' with virtual member functions ``draw(Canvas&)'', ``rotate(double degrees)'', ``translate(Coordinate c)'', ``scale(double)'' and so on, and make sure to override these in our concrete shape classes. Herein lies the problem: how do we force the descendants to override them? One way (a bad way) is to implement them in the base class in such a way that they bomb with an error message when called. The bad thing with that is that it violates a very simple rule-of-thumb: ``The sooner you catch an error, the better.'' There are 5 phases in which an error can be found: design, edit, compile, link and runtime. Please note the obvious: errors that cannot be detected until runtime might go undetected! How to discover errors at design or edit time is not for this article (or even this article series), but there's a simple way of moving this particular discovery from runtime to compile time.

Pure virtual (abstract base classes)

C++ offers a way of saying ``This member function must be overridden by all descendants.'' Saying so also implies that objects of the class itself can never be instantiated, only objects of classes inheriting from it. This makes sense. What would you do with a generic shape object? It's better to make it impossible to create one by mistake, since it's meaningless anyway.
Here's how a pure abstract base class might be defined:

  class Shape
  {
  public:
    virtual void draw(Canvas&) = 0;
    virtual void rotate(double angle) = 0;
    virtual void translate(Coordinate c) = 0;
    virtual void scale(double) = 0;
  };
The ``= 0'' ending of a member function declaration makes it pure virtual. Pure virtual means that it must be overridden by descendants. Having one or more pure virtual member functions in a class makes the class an abstract base class; abstract, because you cannot instantiate objects of the class. If you try, you'll get compiler errors. A class which has only pure virtual member functions and no data is often called a pure abstract base class, or sometimes an interface class. The latter is more descriptive; the class defines an interface that descendants must conform to, and any piece of code that understands the interface can operate on objects implementing it (the concrete classes like ``Triangle'', ``Rectangle'', and ``Circle''). The graphically experienced reader has of course noticed that rotation of a circle can be implemented extremely efficiently by doing nothing at all, so how can we take care of that scenario? It's unnecessary to write code that does nothing, is it not? Let's have a look at the alternatives.
  • Let's just ignore it. It won't work, though, since then our ``Circle'' class will be an abstract class (at least one pure virtual is not ``terminated.'')
  • We can change the interface of ``Shape'' such that ``rotate'' is not pure virtual, and code its implementation to do nothing. This doesn't seem like a good idea, because then the programmer implementing the square might forget to implement ``rotate'' without getting compiler errors.
The root of this lies in the illusion that doing nothing at all is the default behaviour, while it is really an optimization for circles. As such, the ``do nothing at all'' code belongs in ``Circle'' only. In other words, the best solution is the original pure abstract ``Shape'' class, with an empty implementation for ``Circle::rotate,'' as sketched below.
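A sketch only (deliberately incomplete, with the other members still to be implemented elsewhere): ``Circle'' terminates all the pure virtuals, and only ``rotate'' gets the do-nothing body.

  class Circle : public Shape
  {
  public:
    virtual void draw(Canvas&);
    virtual void rotate(double) {} // the optimization: do nothing
    virtual void translate(Coordinate c);
    virtual void scale(double);
    // ... centre, radius, constructors etc. omitted.
  };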

Addressing pure virtuals

I won't write a drawing program, since that'd make this article way too long, and the point would be drowned in all other intricacies of graphical programming. Instead I'll attack another often forgotten issue; addresses. Mailing addresses have different formatting depending on sender and receiver country. If you send something internationally you add the destination country to the address, while for domestic letters that's not necessary. The formatting itself also differs from country to country. Here are a few (simplified) examples:

  Sweden

  Name
  Street Number
  {Country-Code}Postal-Code City
  {Country-Name}

  USA

  Name
  Number Street
  City, State Zip
  {Country-Name}

  Canada and U.K.

  Name
  Number Street
  City
  {Country}
  Postal-Code
Then, of course, there are totally different types of addresses: e-mail, Ham Radio call-signs, phone numbers, fax numbers, etc. As a simplification for this example I'll treat State and Zip in U.S. addresses as a unit, and I will assume that Postal-Code and State/Zip in U.S. addresses are synonymous (i.e. I'll only have one field that's used either as postal code or as state/zip combination, depending on country). As an exercise you can improve this: make sure ``State'' is only dealt with in address kinds where it makes sense. The Country-Code, as seen in the Swedish address example, will also be ignored (this too makes for an excellent exercise to include). The address class hierarchy will be done such that other kinds of addresses, like e-mail addresses and phone numbers, can be added.
Here's the base class:

  class Address
  {
  public:
    virtual const char* type() const = 0;
    virtual void print(int international=0) const = 0;
    virtual void acquire(void) = 0;
    virtual ~Address();
  };
The idea here is that ``type'' can be used to ask an address object what kind of address it is, a mailing address, e-mail address and so on. If the parameter for ``print'' is non-zero, the address will be printed in international form, (i.e. country name will be added to mailing addresses and international prefixes added to phone numbers). The member function ``acquire'' is used for asking an operator to enter address data. Note that the destructor is virtual, but not pure virtual (what would happen if it was?)

Unselfish protection

All kinds of mailing addresses will share a base, inheriting from ``Address'', that contains the address fields, and ways to access them. This class, however, will not implement any of the formatting pure virtuals from ``Address.'' That must be done by the concrete address classes with knowledge about the country's formatting and naming. The member function ``type'' will be defined here, however, to always return the string ``Mailing address'', since all kinds of mailing addresses are mailing addresses, whether they're Swedish addresses or U.S. addresses. Access to the address fields is for the concrete classes only, and this is a problem. We've seen how we can make things generally available by declaring them public, or hide them from the general public by making them private. Here we want something in between: we want descendants, the concrete address classes, to access the address fields, but only the descendants and no one else. This can be achieved through the third protection level, ``protected.'' Protected means that access is limited to the class itself (of course) and all descendants of it. It is thus looser than private, but much stricter than public.
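As a minimal sketch (with hypothetical classes, unrelated to the addresses), the three levels look like this:

  class B
  {
  public:
    void for_everyone();    // any code may call this
  protected:
    void for_descendants(); // only B and classes inheriting from B
  private:
    void for_B_only();      // only B itself
  };

  class D : public B
  {
    void f()
    {
      for_everyone();       // OK
      for_descendants();    // OK, D is a descendant
      // for_B_only();      // error: private to B
    }
  };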
Here comes the ``MailingAddress'' base class:

  class MailingAddress : public Address
  {
  public:
    virtual ~MailingAddress();
    const char* type() const;
  protected:
    MailingAddress();

    void name(const char*); // set
    const char* name() const; // get

    void street(const char*); // set
    const char* street() const; // get

    void number(const char*); // set
    const char* number() const; // get

    void city(const char*); // set
    const char* city() const; // get

    void postalCode(const char*); // set
    const char* postalCode() const; // get

    void country(const char*); // set
    const char* country() const; // get
  private:
    char* name_data;
    char* street_data;
    char* number_data;
    char* city_data;
    char* postalCode_data;
    char* country_data;
    //
    // declared private to disallow them
    //
    MailingAddress(const MailingAddress&);
    MailingAddress& operator=(const MailingAddress&);
  };
Here the copy constructor and assignment operator are declared private to disallow copying and assignment. This is not because they conceptually don't make sense, but because I'm too lazy to implement them (and yet want protection from the stupid mistakes that would come, no doubt, if I left it to the compiler to generate them). It's the responsibility of this class to manage memory for the data strings; distributing that responsibility to the concrete descendants is asking for trouble. As a rule of thumb, protected data is a bad mistake. Keeping all data private, always managing the resources for the data in a controlled way, and giving controlled access through protected access member functions will drastically cut down your aspirin consumption. The reason for the constructor to be protected is more or less just aesthetic; no one but descendants can construct objects of this class anyway, since some of the pure virtuals from ``Address'' aren't yet terminated.
Now we get to the concrete address classes:

  class SwedishAddress : public MailingAddress
  {
  public:
    SwedishAddress();
    virtual void print(int international=0) const;
    virtual void acquire(void);
  };


  class USAddress : public MailingAddress
  {
  public:
    USAddress();
    virtual void print(int international=0) const;
    virtual void acquire(void);
  };
As you can see, the definitions of ``USAddress'' and ``SwedishAddress'' are identical. The only difference lies in the implementations of ``print'' and ``acquire''. I've left the destructors to be generated at the compiler's discretion. Since there's no data to take care of in these classes (it's all in the parent class) we don't need to do anything special here; we know the parent takes care of it. Don't be afraid of copy construction and assignment: they were declared private in ``MailingAddress'', which means the compiler cannot generate them for ``USAddress'' and ``SwedishAddress.'' Let's look at the implementation. For the ``Address'' base class only one thing needs implementing, and that is the destructor. Since the class holds no data, the destructor will be empty:

  Address::~Address()
  {
  }
A trap many beginners fall into is to think that since the destructor is empty, we can save a little typing by declaring it pure virtual, so there won't be a need to implement it. That's wrong, though, since the destructor will be called when a descendant is destroyed. There's no way around that. If you declare it pure virtual and don't implement it, you'll probably get a nasty run-time error when the first concrete descendant is destroyed. The observant reader might have noticed a nasty pattern in the author's refusal to get to the point with pure virtuals and implementation. Yes, you can declare a member function pure virtual, and yet implement it! Pure virtual does not illegalize implementation. It only means that the pure virtual version will NEVER be called through virtual dispatch (i.e. by just calling the function on an object, a reference or a pointer to an object.) Since it will never, ever, be called through virtual dispatch, it must be implemented by the descendants, hence the rule that you cannot instantiate objects where pure virtuals are not terminated. By termination, by the way, I mean declaring it in a non pure virtual way. OK, so a pure virtual won't ever be called through virtual dispatch. Then how can one be called? Through explicit qualification. Let's assume, just for the sake of argument, that we, through some magic, found a way to implement some reasonable generic behaviour for ``acquire'' in ``Address,'' but we want to be certain that descendants do implement it. The only way to call the implementation of ``acquire'' in ``Address'' is to explicitly write ``Address::acquire.'' This is what explicit qualification means. There's no escape for the compiler; writing it like this can only mean one thing, even if ``Address::acquire'' is declared pure virtual.
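A minimal sketch (with hypothetical classes, not part of the address hierarchy) of a pure virtual that is implemented anyway, and reached only through explicit qualification:

  class Abstract
  {
  public:
    virtual void work() = 0; // pure virtual...
  };

  void Abstract::work()      // ...yet implemented.
  {
    // some generic behaviour
  }

  class Concrete : public Abstract
  {
  public:
    void work()              // terminates the pure virtual
    {
      Abstract::work();      // explicit qualification: the only way to
                             // reach the pure virtual's implementation
    }
  };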
Now let's look at the middle class, the ``MailingAddress'' base class.

  MailingAddress::~MailingAddress()
  {
    delete[] name_data;
    delete[] street_data;
    delete[] number_data;
    delete[] city_data;
    delete[] postalCode_data;
    delete[] country_data;
  }
I said when explaining the interface for this class that it is responsible for handling the resources for the member data. Since we don't know the length of the fields, we oughtn't restrict them, but rather dynamically allocate whatever is needed. The ``delete[]'' syntax is for deleting arrays, as opposed to plain ``delete'' which deletes single objects. Note that it's legal to delete the 0 pointer; this is used here. If, for some reason, one of the fields is not set to anything, it will be 0, and deleting the 0 pointer does nothing at all. From this to the constructor:

  MailingAddress::MailingAddress()
  : name_data(0),
    street_data(0),
    number_data(0),
    city_data(0),
    postalCode_data(0),
    country_data(0)
  {
  }
The only thing the constructor does is to make sure all pointers are 0, in order to guarantee destructability. The ``type'' and read-access methods are trivial:

  const char* MailingAddress::type(void) const
  {
    return "Mailing address";
  }

  const char* MailingAddress::name(void) const
  {
    return name_data;
  }

  const char* MailingAddress::street(void) const
  {
    return street_data;
  }

  const char* MailingAddress::number(void) const
  {
    return number_data;
  }

  const char* MailingAddress::city(void) const
  {
    return city_data;
  }

  const char* MailingAddress::postalCode(void) const
  {
    return postalCode_data;
  }

  const char* MailingAddress::country(void) const
  {
    return country_data;
  }
The write access methods are a bit trickier, though. First we must check if the source and destination are the same, and do nothing in those situations. This is to achieve robustness. While it may seem like a very stupid thing to do, it's perfectly possible to see something like:

  name(name());
The meaning of this is, of course, ``set the name to what it currently is.'' We must make sure that doing this works (or find a way to illegalize the construct, but I can't think of any way). If the source and destination are different, however, the old destination must be deleted, a new one allocated on heap and the contents copied. Like this:

  void MailingAddress::name(const char* n)
  {
    if (n != name_data) {
      delete[] name_data; // OK even if 0
      name_data = new char[strlen(n)+1];
      strcpy(name_data,n);
    }
  }
This is done so many times, over and over, in exactly the same way for all kinds of data members, that we'll use a convenience function, ``replace,'' to do the job. ``strlen'' and ``strcpy'' are the C library functions from <string.h> that calculate the length of, and copy, strings.

  static void replace(char*& data, const char* n)
  {
    if (data != n) {
      delete[] data;
      data = new char[strlen(n)+1];
      strcpy(data,n);
    }
  }
Using this convenience function, the write-access member functions will be fairly straightforward:

  void MailingAddress::name(const char* n)
  {
    ::replace(name_data,n);
  }

  void MailingAddress::street(const char* n)
  {
    ::replace(street_data,n);
  }

  void MailingAddress::number(const char* n)
  {
    ::replace(number_data,n);
  }

  void MailingAddress::city(const char* n)
  {
    ::replace(city_data,n);
  }

  void MailingAddress::postalCode(const char* n)
  {
    ::replace(postalCode_data,n);
  }

  void MailingAddress::country(const char* n)
  {
    ::replace(country_data,n);
  }
That was all the ``MailingAddress'' base class does. Now it's time for the concrete classes. All they do is to ask questions with the right terminology and output the fields in the right places:

  SwedishAddress::SwedishAddress()
  : MailingAddress()
  {
    country("Sweden"); // what else?
  }

  void SwedishAddress::print(int international) const
  {
    cout << name() << endl;
    cout << street() << ' ' << number() << endl;
    cout << postalCode() << ' ' << city() << endl;
    if (international) cout << country() << endl;
  }

  void SwedishAddress::acquire(void)
  {
    char buffer[100]; // A mighty long field

    cout << "Name: " << flush;
    cin.getline(buffer,sizeof(buffer));
    name(buffer);

    cout << "Street: " << flush;
    cin.getline(buffer,sizeof(buffer));
    street(buffer);

    cout << "Number: " << flush;
    cin.getline(buffer,sizeof(buffer));
    number(buffer);

    cout << "Postal code: " << flush;
    cin.getline(buffer,sizeof(buffer));
    postalCode(buffer);

    cout << "City: " << flush;
    cin.getline(buffer,sizeof(buffer));
    city(buffer);
  }

  USAddress::USAddress()
  : MailingAddress()
  {
    country("U.S.A."); // what else?
  }

  void USAddress::print(int international) const
  {
    cout << name() << endl;
    cout << number() << ' ' << street() << endl;
    cout << city() << ' ' << postalCode() << endl;
    if (international) cout << country() << endl;
  }

  void USAddress::acquire(void)
  {
    char buffer[100]; // Seems like a mighty long field

    cout << "Name: " << flush;
    cin.getline(buffer,sizeof(buffer));
    name(buffer);

    cout << "Number: " << flush;
    cin.getline(buffer,sizeof(buffer));
    number(buffer);

    cout << "Street: " << flush;
    cin.getline( buffer,sizeof(buffer));
    street(buffer);

    cout << "City: " << flush;
    cin.getline(buffer, sizeof(buffer));
    city(buffer);

    cout << "State and ZIP: " << flush;
    cin.getline(buffer,sizeof(buffer));
    postalCode(buffer);
  }

A toy program

Having done all this work with the classes, we must of course play a bit with them. Here's a short and simple example program that (of course) also makes use of the generic programming paradigm introduced last month.

  int main(void)
  {
    const unsigned size=10;
    Address* addrs[size];
    Address** first = addrs; // needed for VACPP (bug?)
    Address** last = get_addrs(addrs,addrs+size);

    cout << endl << "--------" << endl;

    for_each(first,last,print(1));
    for_each(first,last,deallocate<Address>());
    return 0;
  }
OK, that was mean. Obviously there's a function ``get_addrs'', which reads addresses into a range of iterators (in this case pointers in an array) until the array is full, or it terminates for some other reason. Here's how it may be implemented:

  Address** get_addrs(Address** first,Address** last)
  {
    Address** current = first;
    while (current != last)
    {
      cout << endl << "Kind (U)S, (S)wedish or (N)one "
           << flush;

      char answer[5]; // Should be enough.
      cin.getline(answer,sizeof(answer));
      if (!cin) break;

      switch (answer[0]) {
      case 'U': case 'u':
        *current = new USAddress;
        break;
      case 'S': case 's':
        *current = new SwedishAddress;
        break;
      default:
        return current;
      }
      (**current).acquire();
      ++current;
    }
    return current;
  }
In part 6 I mentioned that virtual dispatch could replace switch statements, and yet here is one. Could this one be replaced with virtual dispatch as well? It would be unfair of me to say ``no'', but it would be equally unfair of me to propose using virtual dispatch here. The reason is that we'd need to do a lot of work without gaining anything. Why? We obviously cannot do virtual dispatch on the ``Address'' objects we're about to create, since they're not created yet. Instead we'd need a set of address creating objects, which we could access through some subscript or other, and call a virtual creation member function on. Doesn't seem to save a lot of work, does it? Probably the selection mechanism for which address creating object to call would be a switch statement anyway! So, that was reading; now for the rest. ``for_each'' does something for every iterator in a range. It could be implemented like this:

  template <class OI, class F>
  void for_each(OI first, OI last, const F& functor)
  {
    while (first != last) {
      functor(*first);
      ++first;
    }
  }
In fact, in the (final draft) C++ standard, there is a beast called ``for_each'' behaving almost like this one (it returns the functor). It's pretty handy; imagine never having to explicitly loop through a complete collection again. What is ``print'' then? ``print'' is a ``functor,'' or ``function object'' as they're often called. It's something which behaves like a function, but which might carry a state of some kind (in this case whether the country should be added to printed addresses or not), and which can be passed around like any object. Defining one is easy, although it looks odd at first.

  class print
  {
  public:
    print(int i) ;
    void operator()(const Address*) const;
  private:
    int international;
  };

  print::print(int i)
   : international(i)
  {
  }

  void print::operator()(const Address* p) const
  {
    p->print(international);
    cout << endl;
  }
What on earth is ``operator()''? It's the member function that's called if we boldly treat the name of an object just as if it was the name of some function, and simply call it. Like this:

  SwedishAddress addr;
  print pobject(1); // define print object.
  pobject(&addr);   // pobject.operator()(&addr);
This is usually called the ``function call'' operator, by the way. The only remaining thing now is ``deallocate'', but you probably already guessed it looks like this:

  template <class T>
  class deallocate
  {
  public:
    void operator()(T* p) const;
  };

  template <class T>
  void deallocate<T>::operator()(T* p) const
  {
    delete p;
  }
This is well enough for one month, isn't it? You know what? You know by now most of the C++ language, and have some experience with the C++ standard class library. Most of the language issues that remain are more or less obscure and little known. We'll look mostly at library stuff and clever ideas for how to use the language from now on.

Recap

This month, you've learned:
  • what pure virtual means, and how you declare pure virtual functions.
  • that despite what most C++ programmers believe, pure virtual functions can be implemented.
  • that the above means that there's a distinction between terminating a pure virtual, and implementing one.
  • why it's a bad idea to make destructors pure virtual.
  • a new protection level, ``protected.''
  • why protected data is bad, and how you can work around it in a clever way.
  • that switch statements cannot always be replaced by virtual dispatch.
  • that there is a ``function call'' operator and how to define and use it.

Exercises

  • Find out what happens if you declare the ``MailingAddress'' destructor pure virtual, and yet define it.
  • Think of two ways to handle the State/Zip problem, and implement both (what are the advantages, disadvantages of the methods?)
  • Rewrite ``get_addrs'' to accept templatized iterators instead of pointers.