An Introduction to C++ Programming - Part 5

Introduction

We've seen how the fundamental types of C++ can be written to the screen with "cout << value" and read from standard input with "cin >> variable". This month, you will learn how you can do the same for your own classes and structs. It's surprisingly easy to do.

Exploring I/O of fundamental types

Formatted I/O is not part of the language proper in C++ (or in C, for that matter.) It's handled by an I/O library that's implemented in the language. (If you're familiar with Pascal, try to implement something like Write and WriteLn in Pascal. You can't; the language doesn't allow it, which is why they're built into the language itself.) We've seen a number of times how we can print something with "cout << value". How can this be expressed in the language? To begin with, the syntax is legal only because you can overload operators in C++. You've already seen that with operator=. Let's see what actually happens when we use operator=.

  class X
  {
  public:
    ...
    X& operator=(int i);
    ...
  };

  X x;

  x=5; //**
At the last line of the example, what actually happens is that operator= is called for the object named "x". Another way of expressing this is:

  x.operator=(5);
In fact, this syntax is legal, and it generates identical code, because this is how the compiler will treat the more human-readable form "x=5".
As we can see, then, an operator overloaded in a class is just like any other member function; it's just called in a peculiar form.
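To make the equivalence concrete, here is a minimal sketch. The public member "value" and the two helper functions are assumptions added only so that the effect of the assignment can be observed; they are not part of the class X shown above.

```cpp
// A tiny stand-in for the class X above. The public member "value" is an
// assumption, added only so the result of the assignment can be inspected.
class X
{
public:
  X() : value(0) {}
  X& operator=(int i) { value = i; return *this; }
  int value;
};

// Assigns with the ordinary, human-readable operator syntax.
int assignWithOperatorSyntax()
{
  X x;
  x = 5;
  return x.value;
}

// Assigns by calling the operator as the named member function it really is.
int assignWithFunctionSyntax()
{
  X x;
  x.operator=(5);
  return x.value;
}
```

Both functions leave the same value in the object, since the second form is simply the first one spelled out.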
Let's go back to printing again. "cout" is an object of some class, which has operator<<(T) overloaded, where T is any of the fundamental types of C++. The relevant section of the class definition looks as follows:

  class ostream
  {
    ...
  public:
    ...
    ostream& operator<<(char);
    ostream& operator<<(signed char);
    ostream& operator<<(unsigned char);
    ostream& operator<<(short);
    ostream& operator<<(unsigned short);
    ostream& operator<<(int);
    ostream& operator<<(unsigned int);
    ostream& operator<<(long);
    ostream& operator<<(unsigned long);
    ostream& operator<<(float);
    ostream& operator<<(double);
    ostream& operator<<(long double);
    ostream& operator<<(const char*);
    ...
  };
The value returned by each of these is the stream object itself (i.e. if you call "operator<<(char)" on "cout", the return value will be a reference to "cout" itself.)
With the above in mind, we can see that writing

  int i;
  double d;

  cout << i << d;
is synonymous with

  int i;
  double d;
  (cout.operator<<(i)).operator<<(d);
The only difference for reading is that the class is called "istream" instead, and that the operator used is operator>>().
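The following sketch shows both spellings side by side, for writing as well as for reading. It is written against the standard headers described in the standards update at the end of this article, and it uses string streams from <sstream> (an assumption; they are not covered in this part) so that the result can be inspected instead of going to the screen. The function names are made up for the illustration.

```cpp
#include <sstream>
#include <string>

// Prints with the usual chained syntax.
std::string printChained(int i, double d)
{
  std::ostringstream os;
  os << i << d;
  return os.str();
}

// Prints by calling the member operators explicitly, one after the other.
std::string printExplicit(int i, double d)
{
  std::ostringstream os;
  (os.operator<<(i)).operator<<(d);
  return os.str();
}

// Reads an int and a double with explicit calls to operator>>,
// and returns their sum.
double readExplicit(const std::string& text)
{
  std::istringstream is(text);
  int i = 0;
  double d = 0.0;
  (is.operator>>(i)).operator>>(d);
  return i + d;
}
```

Note that nothing separates the two values on output, so printing 7 and 2.5 gives "72.5". The streams print exactly what they are told to and nothing more.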

I/O with our own types

The most important thing to recognise is that our own types (classes and structs) always consist of fundamental types. This is important. The C++ I/O library only supports I/O of the fundamental types, so if our own data types consisted of something completely different, I/O would be very difficult indeed.
So, how do we make sure we can do I/O on ranges and stacks (from the earlier lessons?) What about extending our own class with the members operator<< and operator>>? This would, sort of, work, but the syntax would change. As I wrote above, "a << b" is identical with "a.operator<<(b)", and if we add operators << and >> to our class, we'll require our object on the left hand side, and the stream to print on/read from, on the right hand side, and that's not what we want. Another possible way of doing this is to edit the ostream and istream class to contain operator<< and operator>> for our own classes. Does that seem like a good idea to you? It doesn't to me.
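A small sketch of the first idea shows how backwards the syntax becomes when operator<< is a member of our own class. The class "Celsius" and the function "printBackwards" are hypothetical, invented only for this illustration, and the code uses the standard headers and a string stream so that the result can be checked.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class, used only to show what a member operator<<
// does to the calling syntax.
class Celsius
{
public:
  explicit Celsius(int d) : degrees(d) {}

  // As a member, the object is the left operand and the stream the right one.
  std::ostream& operator<<(std::ostream& os) const
  {
    return os << degrees << 'C';
  }
private:
  int degrees;
};

std::string printBackwards()
{
  std::ostringstream os;
  Celsius c(21);
  c << os;      // compiles, but reads the wrong way around
  // os << c;   // would not compile: no such operator<< exists
  return os.str();
}
```

It works, in the sense that the value ends up on the stream, but nobody used to "cout << value" would guess that this is how to print a "Celsius".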
The solution yet again lies in operator overloading, but this time in a somewhat different way. We just saw how we can overload an operator for a class, such that the operator becomes a member function for that class (only, in its use, the syntax differs.) It's also possible to overload operators such that the operator becomes a non-member function, provided that at least one of the parameters to the operator is not a built-in type. Most operators that can be defined as non-member functions accept two parameters. Such is the case for our new friends operator<< and operator>>.
Let's revisit our old friend, the class "Range." This is the definition of "Range", for those who do not have old issues handy (I've added "const" on the member functions, now that you know what it's for. See part 3 for details if you've forgotten):

  struct BoundsError {};
  class Range
  {
  public:
    Range(int upper_bound = 0, int lower_bound = 0)
    throw (BoundsError);
    // Precondition: upper_bound >= lower_bound
    // Postconditions:
    //   lower == lower_bound
    //   upper == upper_bound

    int lowerBound() const throw ();
    int upperBound() const throw ();
    int includes(int aValue) const throw ();
  private:
    int lower;
    int upper;
  };
How should this thing be printed and read? Here's a wishlist. We'll reduce it a little bit, to be more realistic later.
  1. The syntax and semantics for printing must be the same as for the fundamental types of C++.
  2. Full commit or roll back, that is, either we print all there is to be printed, or we print nothing at all.
  3. The print must be in a form distinguishable from, say, two integers separated by a comma.
  4. Full type safety
  5. Encapsulation not violated.
  6. No unnecessary computations.
  7. We want printing and reading synchronized (i.e., if we read something, then print a range, then read something, we want the reading to complete before printing, and we want it all printed before reading again.) Since both reading and writing are normally buffered, it is not at all obvious that this will occur.
All of these are possible, but #2, #6 and #7 are usually skipped. I'll skip #2 for now. What appearance do we want a range to have when printed, and what format should we accept when reading? A golden rule in I/O (and not just in C++) is to be very strict in your output, but very liberal in what you accept as input. Normally, the C++ I/O library handles exactly this for you. The format I chose is "[upper,lower]", no spaces anywhere. On input, however, white space is allowed before the first bracket, and between any of the tokens (the tokens here are '[', number, ',' and ']').
OK, so now we have a pretty good picture of what to do. Now... how? By overloading operator<< as a global function. The signature becomes:

  ostream& operator<<(ostream&, const Range&);
This declares a function, which has the syntax of a left shift operator. If we have code like:

  Range r;
  int i;
  ...
  cout << r << i;
The compiler will treat it as

  operator<<(cout, r).operator<<(i);
This even works for more complex expressions, like:

  Range r;
  int i;
  int j;
  ...
  cout << i << r << j;
Which the compiler interprets as:

  operator<<(cout.operator<<(i),r).operator<<(j);
Study these examples carefully, to make sure you understand what's going on. Now, after these examples, it's fairly easy to get down to work with implementing the operator<< function.

  ostream& operator<<(ostream& os, const Range& r)
  {
    os << '[' << r.upperBound()
      << ',' << r.lowerBound() << ']';
    return os;
  }
Here "r" is passed as const reference, since the function does not alter "r" in any way (and promises it won't.) The stream, "os", however, is passed by non-const reference. This is essential. Printing does alter a stream. It will not be the same after printing as it was before printing. It is not possible to pass it by value, since passing by value means copying, and copying a stream doesn't make much sense (think about it.) Inside the function, we're printing known types, char and int, so the operator<< provided by the I/O class library suits just fine. How well does this suit the 7 points above? The syntax is correct, and the semantics are too, given the facts known this far (more is needed, as you will see further down.) We do not have full commit or rollback, but I mentioned already in the beginning that we'll skip that for now. The format is distinct enough, we have type safety and encapsulation is not violated. This is as far as most books on C++ cover when it comes to printing your own types. However, we do make some unnecessary computations if the stream is bad in one way or the other. Say, for example, we have a detached process. Detached processes do not have standard output and standard input (unless redirected) and as such printing will always fail. Why then, even try? We also do not synchronize our output with input. The check and synchronization is simple to make, but oddly enough not mentioned in most C++ books.

  ostream& operator<<(ostream& os, const Range& r)
  {
    if (!os.opfx())
      return os;
    os << '[' << r.upperBound()
      << ',' << r.lowerBound() << ']';
    os.osfx();
    return os;
  }
The "prefix" ("opfx" means "output prefix") function checks for a valid stream, and also synchronizes output with input. The "suffix" ("osfx" means "output suffix") function signals end of output, so that synchronized input streams can begin accepting input again. I dare you to find this in a C++ book (I know of one book.) I don't know why it's just about always skipped, since it isn't more difficult than this to avoid unnecessary computations and synchronize input with output. That was printing, how about reading? The signature and general appearance of the function is pretty much clear from the above discussions. Let's make a try:

  istream& operator>>(istream& is, Range& r)
  {
    if (!is.ipfx())
      return is;
    char c;
    is >> c;
    if (c != '[')
      // signal error somehow and roll back stream.
      ;
    int upper;
    is >> upper;
    is >> c;
    if (c != ',')
      // signal error somehow and roll back stream.
      ;
    int lower;
    is >> lower;
    is >> c;
    if (c != ']')
      // signal error and roll back stream.
      ;
    r=Range(upper,lower);
    is.isfx(); // ERROR! Does not exist!
    return is;
  }
Hmm... OK, so reading wasn't as easy... There are three issues above that need to be resolved: how to signal an error, how to roll back the stream, and how to deal with the suffix function, since the guess "is.isfx()" was wrong. Let's begin from the easy end, the suffix function. The solution is that there isn't one, so we needn't even try. The problem is fixed by removing the faulty line (don't you just love bugs that you fix solely by removing code!) Rolling back the stream is interesting indeed, since it's very difficult to do. In fact, it's almost impossible. We can put back a character. One character, that is all that is guaranteed to work. In other words, our only chance is if the first character read is not right. Putting back a character is done with "istream::putback(char)". It's also absolutely necessary that the character put back is the same as the last one read, otherwise the behaviour is undefined (which literally means all bets are off, the program may do *anything*, but in practice it means you cannot know if it just backs a position, or actually changes the character.)
The obvious solution to signalling an error, to throw an exception, is wrong. The reason is conceptual. Use exceptions to signal exceptional situations, and other means to handle the expected. The wrong input *is* expected. Remember you're dealing with input generated by human beings here. Sure you can, in theory, demand that the users of your program enter the exact right data in the exact correct format every time, but you won't be very popular among them, and soon will have none. No, erroneous user input is expected, and thus not exceptional, and thus not to be handled with exceptions. How then?
A stream object has an error state consisting of three orthogonal failure flags: "bad", "eof" and "fail". "eof" is used to signal end of file, a not too unusual situation (as a matter of fact, a situation most programs rely on, but if it occurs in the middle of reading something, it's usually a failure.) "fail" is the one we're interested in here, it's used to signal that we received something that was not what we expected, but the stream itself is OK. "bad" is something we hope to never see, since it means the stream is really out of touch with reality and we cannot trust anything from it (I've only seen this one once, and it was due to a bug in a library!) I guess we can expect "bad" if we read from a file and hit a bad sector.
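To get a feeling for when the flags appear, here is a small sketch using the standard library's string streams (an assumption; the article itself predates the standard headers). The helper names "describeState" and "stateAfterReadingInt" are made up for the example. Be aware that in the standard library "fail()" also reports true when the "bad" flag is set.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Summarises the failure flags as text, for example "fail+eof",
// or "good" when none of them is set.
std::string describeState(const std::istream& is)
{
  std::string s;
  if (is.bad())
    s += "bad";
  if (is.fail())
    s += (s.empty() ? "fail" : "+fail");
  if (is.eof())
    s += (s.empty() ? "eof" : "+eof");
  return s.empty() ? "good" : s;
}

// Tries to read one int from the given text and reports the stream state.
std::string stateAfterReadingInt(const std::string& text)
{
  std::istringstream is(text);
  int n = 0;
  is >> n;
  return describeState(is);
}
```

Reading "42" succeeds but runs into the end of the input while looking for more digits, so only "eof" is set. Reading "abc" finds something that is not a number, so "fail" is set while the stream itself is fine. An empty input gives both.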
So, what we should do if we read something unexpected is to set the stream state to "fail." This is done with the oddly named member function "clear(int)". "clear" sets the status bits of the stream to the pattern of the integer parameter (which defaults to 0, so if nothing is passed, the name makes sense.) The bits we can set are "ios::badbit", "ios::failbit" and "ios::eofbit". We can get the current status bits by calling "is.rdstate()", and usually we want to do that when setting or resetting a status bit, since we want to affect only that bit, and leave the other bits as they were before the call. The status bits can also be checked with the calls "is.fail()", "is.bad()" and "is.eof()" (which return 0 if the bit they represent is not set, and non-zero otherwise.) A fourth call "is.good()" returns non-zero if no error state bits are set, and 0 otherwise. Now with the above in mind, let's make another try:

  istream& operator>>(istream& is, Range& r)
  {
    if (!is.ipfx())
      return is;
    char c;
    is >> c;
    if (c != '[')
    {
      is.putback(c);
      is.clear(ios::failbit|is.rdstate());
      return is;
    }
    int upper;
    is >> upper >> c;
    if (c != ',')
    {
      is.clear(ios::failbit|is.rdstate());
      return is;
    }
    int lower;
    is >> lower >> c;
    if (c != ']')
    {
      is.clear(ios::failbit|is.rdstate());
      return is;
    }
    if (is.good()) {
      if (upper >= lower)
        r=Range(upper,lower);
      else
        is.clear(ios::failbit|is.rdstate());
    }
    return is;
  }
This actually solves the problem as far as is possible. The call to "is.ipfx()" not only synchronizes the input stream with output streams, but also checks for error conditions and reads past leading white space. If the first character read is not a '[', we put the character back and set the fail bit (the order is important, "putback" is not guaranteed to work if the stream is in error.) After this we read the upper limit of the range, and the separator. Note that operator>> for built in types skips leading whitespace, so we needn't work on that at all. If the separator is not ",", mark the stream as failed, and return. Then read the lower limit and the terminator. If the terminator is not ']', we set the stream state to failed and return. If reading of either upper limit or lower limit failed, the stream is set to fail state, and other reads will not do anything at all (not even alter the stream error state,) thus the check near the end for "is.good()" is enough to know if all parts were read as we expected. If they were, all we need to do is to check that the upper limit indeed is at or above the lower limit (precondition for the range) and if so set "r" (since we haven't declared an assignment operator, the compiler did it for us, so the call is valid,) otherwise set the fail error state. How well do we match the 7 item wish list? You check and judge; I think we're doing fine, and in fact better than what can be found in most books on the subject.
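For those who want to try this with a current compiler, below is a sketch of the same logic written against the standard headers. It rests on several assumptions: the bodies of the "Range" member functions come from earlier parts and are guessed here, the exception specifications are left out, the call to "ipfx()" is replaced by letting the first "is >> c" do the state check and the skipping of white space, and the helpers "format" and "parse" exist only to make the behaviour easy to check.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct BoundsError {};

// Assumed implementation of Range; only the interface is shown in the article.
class Range
{
public:
  Range(int upper_bound = 0, int lower_bound = 0)
    : lower(lower_bound), upper(upper_bound)
  {
    if (upper_bound < lower_bound)
      throw BoundsError();
  }
  int lowerBound() const { return lower; }
  int upperBound() const { return upper; }
private:
  int lower;
  int upper;
};

std::ostream& operator<<(std::ostream& os, const Range& r)
{
  return os << '[' << r.upperBound() << ',' << r.lowerBound() << ']';
}

std::istream& operator>>(std::istream& is, Range& r)
{
  char c = 0;
  if (!(is >> c))          // checks the stream and skips leading white space
    return is;
  if (c != '[')
  {
    is.putback(c);         // put back first, then mark the failure
    is.clear(std::ios::failbit | is.rdstate());
    return is;
  }
  int upper = 0;
  is >> upper >> c;
  if (c != ',')
  {
    is.clear(std::ios::failbit | is.rdstate());
    return is;
  }
  int lower = 0;
  is >> lower >> c;
  if (c != ']')
  {
    is.clear(std::ios::failbit | is.rdstate());
    return is;
  }
  if (is.good())
  {
    if (upper >= lower)
      r = Range(upper, lower);
    else
      is.clear(std::ios::failbit | is.rdstate());
  }
  return is;
}

// Helper for experiments: the printed form of a range.
std::string format(const Range& r)
{
  std::ostringstream os;
  os << r;
  return os.str();
}

// Helper for experiments: true if a range could be read from the text.
bool parse(const std::string& text, Range& r)
{
  std::istringstream is(text);
  is >> r;
  return !is.fail();
}
```

With these helpers, " [ 7 , 3 ]" is accepted despite the extra white space, while "7,3]" and "[1,9]" both leave the range untouched and the stream in the fail state.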

Formatting

There are a number of ways in which the output format of the fundamental types of C++ can be altered, and a few ways in which the requirements on the input format can be altered. For example, a field width can be set, and alignment within that field. For integral types, the base can be set (decimal, octal, hexadecimal). For floating point types the format can be fixed point or scientific. All of these, and yet some, are controlled with a few formatting flags, and a little data. All flags are set or cleared with the member functions "os.setf()" and "os.unsetf()". I think they're difficult to use, but fortunately there are easier ways of achieving the same effect, and we'll visit those later.
The base for integral output is altered with a call to "os.setf(v, ios::basefield)", where v is one of "ios::hex", "ios::dec" or "ios::oct". As a small example, consider:

  #include <iostream.h>

  int main(void)
  {
    int i=19;
    cout << i << endl;
    cout.setf(ios::hex, ios::basefield);
    cout << i << endl;
    cout.setf(ios::oct, ios::basefield);
    cout << i << endl;
    cout.setf(ios::dec, ios::basefield);
    cout << i << endl;
    return 0;
  }
The result of running this program is:

  19
  13
  23
  19
The base is converted as expected, but there is no way to see what base it is. This can be improved with the formatting flag ios::showbase, so let's set that one too.

  int main(void)
  {
    int i=19;
    cout.setf(ios::showbase);
    cout << i << endl;
    cout.setf(ios::hex, ios::basefield);
    cout << i << endl;
    cout.setf(ios::oct, ios::basefield);
    cout << i << endl;
    cout.setf(ios::dec, ios::basefield);
    cout << i << endl;
    return 0;
  }
The output of this program is

  19
  0x13
  023
  19
That's more like it, right? The call to "setf()" for setting the "ios::showbase" flag is different, though. "setf()" is overloaded in two forms. One accepts a set of flags and a mask, the other one a full set of flags only. All the formatting flags of the iostreams are represented as bits in an integer, and the version with the mask clears the bits represented by the mask, except those explicitly set by the first parameter. Formatting bits not represented by the mask will remain unchanged. The second form, the one accepting only one parameter, sets the flags sent as parameter, and leaves the others unchanged (in other words, it bitwise "or"es the current bit-pattern with the one provided as the parameter.) Now you begin to see why this is messy. If the masked version is called, and the mask is "ios::basefield", the only formatting flags of the stream that will be affected are "ios::hex", "ios::dec" and "ios::oct". The three of these are mutually exclusive, so a call to "os.setf(ios::hex)" is potentially dangerous (what if "ios::oct" was already set? Then you'd end up with both being set.) The second parameter "ios::basefield" guarantees that if you set "ios::hex", then "ios::oct" and "ios::dec" will be cleared. While it's possible to set two, or all three of these flags at the same time, it's not a very good idea (yields undefined behaviour.)
That was setting the base for integral types, now for something that's common to all types, field width and alignment. The field width is set with "os.width(int)", and the curious can get the current field width by calling "os.width(void)." Simple enough, let's try it out:

  #include <iostream.h>

  int main()
  {
    cout << '[' << -55 << ']' << endl;
    cout.width(10);
    cout << '[' << -55 << ']' << endl;
    cout << '[';
    cout.width(10);
    cout << -55 << ']' << endl;
    cout << '[' << -55 << ']' << endl;
    return 0;
  }
Executing this program shows something interesting; the width set does not affect the printing of separate characters, and the width is reset after printing the first thing that uses it. This is not very intuitive, I think. The result of running the program is shown below:

  [-55]
  [       -55]
  [       -55]
  [-55]
Had you expected this? I didn't, for sure. Now, let's play with alignment within a field. If the field width is not set, or the field width set is smaller than that necessary to represent the value to be printed, alignment doesn't matter, but if there's extra room, alignment does make a difference. Alignment is set with the two parameter version of "os.setf()", where the first parameter is one of "ios::left", "ios::right", or "ios::internal", and the second parameter is "ios::adjustfield". As with the base for integral types, the three alignment forms are mutually exclusive, so don't set two of them at the same time. Let's alter the width setting program to show the behaviour.

  #include <iostream.h>

  int main()
  {
    cout.setf(ios::right, ios::adjustfield);
    cout << '[' << -55 << ']' << endl;
    cout.setf(ios::left, ios::adjustfield);
    cout << '[' << -55 << ']' << endl;
    cout.setf(ios::internal, ios::adjustfield);
    cout << '[' << -55 << ']' << endl;
    cout.width(10);
    cout.setf(ios::right, ios::adjustfield);
    cout << '[' << -55 << ']' << endl;
    cout.width(10);
    cout.setf(ios::left, ios::adjustfield);
    cout << '[' << -55 << ']' << endl;
    cout.width(10);
    cout.setf(ios::internal, ios::adjustfield);
    cout << '[' << -55 << ']' << endl;
    return 0;
  }
The result of running this is, after the above explanations, not very surprising:

  [-55]
  [-55]
  [-55]
  [       -55]
  [-55       ]
  [-       55]
Well, OK, I found the formatting of "ios::internal" to be a bit odd, but it kind of makes sense. If the field width is larger than that required for a value, the current alignment defines where in the field the value will be, and where in the field the space will be. Space, by the way, is just the default; we can change the "padding character" by calling "os.fill(char)", and get the current value with a call to "os.fill(void)". Let's exercise that one too:

  #include <iostream.h>

  int main()
  {
    cout.width(10);
    cout.fill('.');
    cout << -5 << endl;
    cout.width(10);
    cout << -5 << endl;
    return 0;
  }
Running it yields the surprising result

  ........-5
  ........-5
Why was this surprising? Earlier we saw that the field width is "forgotten" once used. The pad character, however, remains the same until explicitly changed. Now that you have the general idea, why not try the other formatting flags there are:
  • ios::fixed and ios::scientific control the format of floating point numbers (the mask used is ios::floatfield.)
  • ios::showpos controls whether a "+" should be prepended to positive numbers or not (just like a "-" is prepended to negative numbers.)
  • ios::uppercase controls whether hexadecimal digits should be displayed with upper case letters or lower case letters.
  • ios::showpoint controls whether the decimals should be shown for floating point numbers if they are all zero.
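As a starting point for those experiments, here is a sketch that exercises the flags in the list. It is written against the standard headers and prints to string streams so that the results can be compared; the function names are made up for the illustration, and the results noted afterwards are what a standard-conforming library gives, which may differ from older compilers.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Prints an int with ios::showpos set.
std::string withShowpos(int value)
{
  std::ostringstream os;
  os.setf(std::ios::showpos);
  os << value;
  return os.str();
}

// Prints an int in hexadecimal with upper case digits.
std::string upperHex(int value)
{
  std::ostringstream os;
  os.setf(std::ios::hex, std::ios::basefield);
  os.setf(std::ios::uppercase);
  os << value;
  return os.str();
}

// Prints a double with ios::showpoint set.
std::string withShowpoint(double value)
{
  std::ostringstream os;
  os.setf(std::ios::showpoint);
  os << value;
  return os.str();
}

// Prints a double in the given floating point format
// (std::ios::fixed or std::ios::scientific).
std::string inFloatFormat(double value, std::ios::fmtflags format)
{
  std::ostringstream os;
  os.setf(format, std::ios::floatfield);
  os << value;
  return os.str();
}
```

With a standard library, "withShowpos(5)" gives "+5", "upperHex(255)" gives "FF", "withShowpoint(2.0)" gives "2.00000" and "inFloatFormat(1.5, std::ios::fixed)" gives "1.500000".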
The only thing remaining for formatting is "os.precision", which comes in two flavours. One without parameters which reports the current precision, and one with an int parameter. The unpleasant thing about this parameter, is that many compilers interpret it differently. Some think the precision is the number of digits after the decimal point, while most think it's the number of digits to display. The November 1997 draft C++ standards document (which, by the way, most probably is the final C++ standards document,) says the number of digits after the decimal point is what's controlled, but I'm not sure if that's what the current standards document says. At any rate, inconsistencies aside, this is a mess, isn't it?
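To see what your own compiler does, a sketch like the following can be used. It assumes the standard headers and a string stream, and "withPrecision" is a made-up helper name. In the library as finally standardised, the meaning of the precision depends on the floating point format: with neither "fixed" nor "scientific" set it is the maximum number of significant digits, and with either of them set it is the number of digits after the decimal point.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Prints a double with the given precision, optionally in fixed notation.
std::string withPrecision(double value, int digits, bool fixedNotation)
{
  std::ostringstream os;
  if (fixedNotation)
    os.setf(std::ios::fixed, std::ios::floatfield);
  os.precision(digits);
  os << value;
  return os.str();
}
```

On a standard-conforming library, 3.14159 printed with precision 3 becomes "3.14" in the default format and "3.142" in fixed notation.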

An easier way

The authors of the I/O package realized that this is a mess, so they defined something called "manipulators." You've already used one manipulator a lot, "endl." A manipulator may, or may not, print something on the stream, but it will alter the stream in some way. For example "endl" prints a new line character, and flushes the stream buffer. There are two kinds of manipulators, those accepting a parameter, and those that do not. Let's first focus on those that don't, just like "endl." The ones available are: "dec", "hex", "oct", "endl", "ends", and "flush". Their use is simple:

  #include <iostream.h>

  int main(void)
  {
    cout << hex << 127 << " " << oct << 127 << " "
        << dec << 127 << endl;
    return 0;
  }
The advantage of this is both that the code becomes clearer, and that there's no way you can accidentally set illegal base flag combinations. "ends" is rarely used, it's there to print a terminating '\0' (the terminating '\0' of strings is never printed normally.) "flush" flushes the stream buffer (i.e. forces printing right away.) How do these manipulators work? There's a rather odd looking operator<< for output streams. It looks like:

  ostream& operator<<(ostream& (*f)(ostream&))
  {
    return f(*this);
  }
Now, what on earth does this mean? It means that if you have a function accepting an ostream& parameter, and returning an ostream&, that function can be "printed," and if you do, the function will be called with the stream as its parameter. Let's exercise this by rolling our own "left" alignment manipulator:

  ostream& left(ostream& os)
  {
    os.setf(ios::left, ios::adjustfield);
    return os;
  }
This function matches the required signature, so if we "print" it with "cout << left", the above mentioned operator<< is called, and it in its turn calls the function for the stream, so "cout << left" actually ends up as "left(cout)". Cool, eh? Roll your own "right" and "internal" manipulators as a simple exercise (they're handy too.) Then there are some manipulators accepting a parameter. To access them, you need to #include <iomanip.h>. The ones usually accessed from there are "setw" (for setting the field width,) "setprecision", and "setfill". Their use is fairly straightforward and doesn't require any example.
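For the curious, here is nevertheless a short sketch combining a hand-written manipulator with the parameterised ones. It uses the standard headers, where the header is called <iomanip> and the library already provides a "left" of its own, so the home-made one is named "myLeft" here to avoid confusion. That name, and the function "formatted", are assumptions made for this illustration.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hand-written manipulator; does the same job as the "left" shown above.
std::ostream& myLeft(std::ostream& os)
{
  os.setf(std::ios::left, std::ios::adjustfield);
  return os;
}

// Prints a left aligned, padded integer followed by a rounded double.
std::string formatted()
{
  std::ostringstream os;
  os << std::setfill('.') << myLeft << std::setw(6) << -42
     << '|' << std::setprecision(3) << 3.14159;
  return os.str();
}
```

The result is "-42...|3.14". As with "os.width()", the width given to "setw" is used up by the first value printed, while the fill character and the precision stay in effect.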
Every compiler I've seen provides its own mechanism for writing such manipulators, so doing it in a portable way is very difficult. Or actually, it isn't if you skip the mechanism offered by your compiler vendor and do the job yourself, because it really is simple. Let's write one that prints a defined number of spaces:

  class spaces
  {
  public:
    spaces(int s) : nr(s) {}
    ostream& printOn(ostream& os) const {
      // print on the stream we were given, not necessarily cout
      for (int i=0; i < nr; ++i)
        os << ' ';
      return os;
    }
  private:
    int nr;
  };

  ostream& operator<<(ostream& os, const spaces& s)
  {
    return s.printOn(os);
  }
Can you see what happens if we call "cout << spaces(40)"? First the object of class "spaces" is created, with a parameter of 40. That parameter is in the constructor stored in the member variable "nr". Then the global operator<< for an ostream& and a const spaces& is called, and that function in its turn calls the printOn member function for the spaces object, which goes through the loop printing space characters. I think writing manipulators requiring parameters this way is lots easier than trying to understand the non-portable way provided by your compiler vendor.
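Here is a sketch of the same idea in a form that can be tried with a current compiler. It uses the standard headers, prints on the stream passed in, and adds a made-up helper, "indent", that writes to a string stream so the number of spaces can be checked.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Manipulator-like class that prints a given number of spaces.
class spaces
{
public:
  explicit spaces(int s) : nr(s) {}
  std::ostream& printOn(std::ostream& os) const
  {
    for (int i = 0; i < nr; ++i)
      os << ' ';
    return os;
  }
private:
  int nr;
};

std::ostream& operator<<(std::ostream& os, const spaces& s)
{
  return s.printOn(os);
}

// Helper for experiments: the text in brackets, preceded by n spaces.
std::string indent(int n, const std::string& text)
{
  std::ostringstream os;
  os << '[' << spaces(n) << text << ']';
  return os.str();
}
```

Since "printOn" uses the stream it is handed, the manipulator works just as well on a string stream or a file as on "cout".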
Now something for you to think about until next month: what about the I/O of our own classes with respect to the formatting state of the stream? How's the "Range" class printed if the field width and alignment are set to something? How should it be printed (hint, you probably want it printed differently from what will be the case if you don't take care of it.)

Exercises

  • Find out which formatting parameters "stick" (like the choice of padding character) and which ones are dropped immediately after first use (like the field width.)
  • With the above in mind, and remembering that destructors can be put to good work, write a class which will accept an ostream as its constructor parameter, and which on destruction will restore the ostream's formatting state to what it was on construction.
  • Experiment with the formatting flags on input. Which have an effect, and which don't? Of those that do have an effect, do they have the effect you expect?
  • Write an input manipulator accepting a character, which when called compares it with a character read from the stream, and sets the ios::failbit status bit if they differ.

Recap

This month you've learned a number of things regarding the fundamentals of C++ I/O. For example
  • How to set, clear and recognise the error state of a stream.
  • Why exceptions are not to be used when input is wrong.
  • How to make sure your own classes can be written and read.
  • The very messy, and the somewhat less messy way of altering the formatting state of a stream.
  • How to write your own stream manipulators.

Standards update

  • The prefix and suffix functions are history. Instead you create an object of type istream::sentry or ostream::sentry, and check it, like this:

      istream& work(istream& is)
      {
        istream::sentry cerberos(is);
        if (cerberos) {
          ...
        }
        return is;
      }

    The destructor of the sentry object does the work corresponding to that of the suffix function.
  • "istream" and "ostream" are in fact not classes in the standard, but typedefs for class templates. The class templates are "template <class charT, class traits> class basic_istream" and "template <class charT, class traits> class basic_ostream". "istream" is typedefed as "basic_istream<char, char_traits<char> >", and "ostream" as "basic_ostream<char, char_traits<char> >". There's also the pair "wistream" and "wostream", which are streams of wide characters.
  • The mechanism for writing manipulators is standardised (and heavily based on templates.) I still think it's easier to write a class the way I showed you.
  • Any operation that sets an error status bit may throw an exception. Which error status bits cause exceptions to be thrown is controlled with an exception mask (a bit mask.) By default, though, no exceptions are thrown.
  • Formatting of numeric types (and time) is localised. By default most implementations will probably use the same formatting as they do today, but with the support for "imbuing" streams with other locales (formatting rules.)
  • The header name is <iostream> (no .h) and the names are actually std::istream and std::ostream (everything in the C++ standard library is named std::whatever, and every standard header is named without trailing .h)
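To tie the points above together, here is a sketch of an output operator written the standard way, with a sentry object instead of the prefix and suffix functions. The type "Point" and the helper "printPoint" are hypothetical and exist only for this illustration.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical two-field type, used only to have something to print.
struct Point
{
  int x;
  int y;
};

std::ostream& operator<<(std::ostream& os, const Point& p)
{
  std::ostream::sentry guard(os);   // takes the place of the prefix function
  if (guard)
  {
    os << '(' << p.x << ',' << p.y << ')';
  }
  return os;                        // guard's destructor does the suffix work
}

// Prints a point to a string stream, optionally after putting the stream in
// the fail state, and returns whatever ended up on the stream.
std::string printPoint(int x, int y, bool breakStreamFirst)
{
  std::ostringstream os;
  if (breakStreamFirst)
    os.setstate(std::ios::failbit);
  Point p = { x, y };
  os << p;
  return os.str();
}
```

On a healthy stream this prints "(1,2)" for the point 1, 2. If the stream is already in the fail state, the sentry tests false and nothing is written at all.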
