As we have seen in the previous chapter, C++ provides the tools to derive classes from one base type, to use base class pointers to address derived objects, and subsequently to process derived objects in a generic class.
Concerning the allowed operations on all objects in such a generic class we
have seen that the base class must define the actions to be performed on all
derived objects. In the example of the Vehicle this was the
functionality to store and retrieve the weight of a vehicle.
When using a base class pointer to address an object of a derived class, the
pointer type (i.e., the base class type) normally determines which actual
function will be called. This means that the code example from section VStorage,
which uses the storage class VStorage, will incorrectly compute
the combined weight when a Truck object (see section Truck) is in the
storage: only one weight field, that of the cabin part of the
truck, is taken into consideration. The reason for this is straightforward: a
Vehicle *vp calls the function Vehicle::getweight()
and not Truck::getweight(), even when that pointer actually points
to a Truck.
The opposite is however also possible: C++ allows
a Vehicle *vp to call the function Truck::getweight()
when the pointer actually points to a Truck. The term for
this feature of C++ is polymorphism: it is as though the pointer
vp assumes several forms when pointing to several objects. In other
words, vp might behave like a Truck* when pointing to
a Truck, or like an Auto* when pointing to an
Auto, etc.
A second term for this feature is late binding. This name refers to the fact that the decision which function to call (one of the base class or one of the derived classes) cannot be made at compile-time. The right function is selected at run-time.
The default behavior of the activation of a member function via a pointer is
that the type of the pointer determines the function. E.g., a
Vehicle* will activate Vehicle's member functions,
even when pointing to an object of a derived class. This is referred to as
early or static binding, since the function to be called is known
at compile-time. Late or dynamic binding is achieved in
C++ with virtual functions.
A function becomes virtual when its declaration starts with the keyword
virtual. Once a function is declared virtual in a base
class, it remains virtual in all derived classes, even
when the keyword virtual is not repeated in the declarations in the
derived classes.
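The difference between the two kinds of binding can be sketched with two small classes. The names Parent, Child, plain() and late() are hypothetical and do not belong to the vehicle classification system; the sketch only assumes what was stated above about the keyword virtual.

```cpp
#include <cassert>

// Hypothetical classes, used only to contrast the two kinds of binding.
class Parent
{
    public:
        virtual ~Parent () {}
        int plain () const              // not virtual: early binding
        {
            return (1);
        }
        virtual int late () const       // virtual: late binding
        {
            return (1);
        }
};

class Child: public Parent
{
    public:
        int plain () const              // hides Parent::plain()
        {
            return (2);
        }
        int late () const               // virtual, although the keyword
        {                               // is not repeated here
            return (2);
        }
};

// Calls both functions through a base class pointer to a Child.
void demo ()
{
    Child c;
    Parent *pp = &c;

    assert (pp->plain () == 1);         // the pointer type decides
    assert (pp->late () == 2);          // the object type decides
}
```

Calling demo() from main() runs both checks: the non-virtual call ends up in Parent, the virtual call in Child.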
As far as the vehicle classification system is concerned (see section VehicleSystem ff.),
the two member functions getweight() and
setweight() might be declared as virtual. The class
definitions below illustrate the classes Vehicle (which is the
overall base class of the classification system) and Truck, which
has Vehicle as an indirect base class. The functions
getweight() of the two classes are also shown:
class Vehicle
{
public:
// constructors
Vehicle ();
Vehicle (int wt);
// interface.. now virtuals!
virtual int getweight () const;
virtual void setweight (int wt);
private:
// data
int weight;
};
// Vehicle's own getweight() function:
int Vehicle::getweight () const
{
return (weight);
}
class Land: public Vehicle
{
.
.
};
class Auto: public Land
{
.
.
};
class Truck: public Auto
{
public:
// constructors
Truck ();
Truck (int engine_wt, int sp, char const *nm,
int trailer_wt);
// interface: to set two weight fields
void setweight (int engine_wt, int trailer_wt);
// and to return combined weight
int getweight () const;
private:
// data
int trailer_weight;
};
// Truck's own getweight() function
int Truck::getweight () const
{
return (Auto::getweight () + trailer_weight);
}
Note that the keyword virtual appears only in the definition of
the base class Vehicle; it need not be repeated in the derived
classes (though a repetition would be no error).
The effect of the late binding is illustrated in the next fragment:
Vehicle
v (1200); // vehicle with weight 1200
Truck
t (6000, 115, // truck with cabin weight 6000, speed 115,
"Scania", // make Scania, trailer weight 15000
15000);
Vehicle
*vp; // generic vehicle pointer
int main ()
{
// see below (1)
vp = &v;
printf ("%d\n", vp->getweight ());
// see below (2)
vp = &t;
printf ("%d\n", vp->getweight ());
// see below (3)
printf ("%d\n", vp->getspeed ());
return (0);
}
Since the function getweight() is defined as
virtual, late binding is used here: in the statements below
the (1) mark, Vehicle's function
getweight() is called. In contrast, the statements below
(2) use Truck's function getweight().
Statement (3) however will still lead to a compilation error. The
function getspeed() is not a member of Vehicle, and hence
it cannot be called via a Vehicle*.
The rule is that when using a pointer to a class, only the functions which
are members of that class can be called. These functions can be
virtual, but this only affects the type of binding (early vs.
late).
When functions are defined as virtual in a base class (and hence
in all derived classes), and when these functions are called using a pointer to
the base class, the pointer can, as it were, assume several forms: it is polymorphic.
In this section we illustrate the effect of polymorphism on the manner in which
programs in C++ can be developed.
A vehicle classification system in C might be implemented with
Vehicle being a union of structs, and having an
enumeration field to determine which actual type of vehicle is represented. A
function getweight() would typically first determine what type of
vehicle is represented, and then inspect the relevant fields:
typedef enum /* type of the vehicle */
{
is_vehicle,
is_land,
is_auto,
is_truck,
} Vtype;
typedef struct /* generic vehicle type */
{
int weight;
} Vehicle;
typedef struct /* land vehicle: adds speed */
{
Vehicle v;
int speed;
} Land;
typedef struct /* auto: Land vehicle + name */
{
Land l;
char *name;
} Auto;
typedef struct /* truck: Auto + trailer */
{
Auto a;
int trailer_wt;
} Truck;
typedef union /* all sorts of vehicles in 1 union */
{
Vehicle v;
Land l;
Auto a;
Truck t;
} AnyVehicle;
typedef struct /* the data for all vehicles */
{
Vtype type;
AnyVehicle thing;
} Object;
int getweight (Object *o) /* how to get weight of a vehicle */
{
switch (o->type)
{
case is_vehicle:
return (o->thing.v.weight);
case is_land:
return (o->thing.l.v.weight);
case is_auto:
return (o->thing.a.l.v.weight);
case is_truck:
return (o->thing.t.a.l.v.weight +
o->thing.t.trailer_wt);
default: /* unknown type: report no weight */
return (0);
}
}
A disadvantage of this approach is that the implementation cannot be easily
changed. E.g., if we wanted to define a type Airplane, which would,
e.g., add the functionality to store the number of passengers, then we'd have to
re-edit and re-compile the above code.
In contrast, C++ offers the possibility of polymorphism. The advantage
is that `old' code remains usable. The implementation of an extra class
Airplane would in C++ mean one extra class, possibly with
its own (virtual) functions getweight() and
setweight(). A function like:
void printweight (Vehicle const *any)
{
printf ("Weight: %d\n", any->getweight ());
}
would still work; the function wouldn't even need to be recompiled, since late binding is in effect.
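A minimal sketch of this scenario follows. The Vehicle class is a reduced stand-in for the one in this chapter, and the Airplane class, its passenger count and the weight of 80 per passenger are assumptions made only for this illustration. Unlike the function shown above, this printweight() also returns the value it printed, so that the result can be checked.

```cpp
#include <cassert>
#include <cstdio>

// Reduced stand-in for the chapter's Vehicle class.
class Vehicle
{
    public:
        Vehicle (int wt) : weight (wt) {}
        virtual ~Vehicle () {}
        virtual int getweight () const
        {
            return (weight);
        }
    private:
        int weight;
};

// `Old' code, written before Airplane existed.
// Returns the printed weight so that callers can check it.
int printweight (Vehicle const *any)
{
    int wt = any->getweight ();
    std::printf ("Weight: %d\n", wt);
    return (wt);
}

// New class, added later. The passenger count and the assumed
// weight of 80 per passenger are hypothetical.
class Airplane: public Vehicle
{
    public:
        Airplane (int wt, int pass) : Vehicle (wt), passengers (pass) {}
        int getweight () const
        {
            return (Vehicle::getweight () + 80 * passengers);
        }
    private:
        int passengers;
};

// Passes an old and a new kind of object to the unchanged function.
void demo ()
{
    Vehicle v (1200);
    Airplane a (40000, 100);

    assert (printweight (&v) == 1200);
    assert (printweight (&a) == 48000);     // 40000 + 80 * 100
}
```

The point of the sketch is that printweight() is not touched when Airplane is introduced: the call any->getweight() finds Airplane::getweight() at run-time.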
This section briefly describes how polymorphism is implemented in C++. Understanding the implementation is not necessary for the usage of this feature of C++, though it does explain why there is a cost of polymorphism in terms of memory usage.
The fundamental idea of polymorphism is that the C++ compiler does not
know which function to call at compile-time; the right function can only be
selected at run-time. That means that the address of the function must be stored
somewhere, to be looked up prior to the actual call. This `somewhere' place must
be accessible from the object in question. E.g., when a Vehicle *vp
points to a Truck object, then vp->getweight()
calls a member function of Truck; the address of this function is
determined from the actual object which vp points to.
The most common implementation is the following. An object which contains virtual functions holds as its first data member a hidden field, pointing to an array of pointers which hold the addresses of the virtual functions. It must be noted that this implementation is compiler-dependent, and is by no means dictated by the C++ ANSI definition.
The table of the addresses of virtual functions is shared by all objects of the class. It may even be the case that two classes share the same table. The overhead in terms of memory consumption is therefore one extra pointer field per object, plus one table of function pointers per class.
A statement like vp->getweight() therefore first inspects the
hidden data member of the object pointed to by vp. In the case of
the vehicle classification system, this data member points to a table of two
addresses: one pointer for the function getweight() and one pointer
for the function setweight(). The actual function which is called
is determined from this table.
The organization of the objects concerning virtual functions is further illustrated in the following figure:
As can be seen from figure ImplementationFigure,
all objects which use virtual functions must have one (hidden) data member
to address a table of function pointers. The objects of the classes
Vehicle and Auto both address the same table. The
class Truck however introduces its own version of
getweight(): therefore, this class needs its own table of function
pointers.
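The mechanism can be imitated by hand with plain structs and function pointers. The following is only a sketch of the common implementation described above, not what any particular compiler generates; all names (VTable, VehicleRep, TruckRep, call_getweight() and so on) are hypothetical, and only getweight() is entered in the table.

```cpp
#include <cassert>

struct VehicleRep;

struct VTable                       // one table per class
{
    int (*getweight) (VehicleRep const *);
};

struct VehicleRep                   // imitates a Vehicle object
{
    VTable const *vptr;             // the hidden first data member
    int weight;
};

struct TruckRep                     // imitates a Truck object
{
    VehicleRep base;                // the base part comes first
    int trailer_weight;
};

int vehicle_getweight (VehicleRep const *v)
{
    return (v->weight);
}

int truck_getweight (VehicleRep const *v)
{
    // The base part is the first member, so its address is also
    // the address of the enclosing TruckRep.
    TruckRep const *t = reinterpret_cast<TruckRep const *> (v);
    return (t->base.weight + t->trailer_weight);
}

VTable const vehicle_table = { vehicle_getweight };
VTable const truck_table   = { truck_getweight };

// What vp->getweight() boils down to: look up the address, then call.
int call_getweight (VehicleRep const *vp)
{
    return (vp->vptr->getweight (vp));
}

void demo ()
{
    VehicleRep v = { &vehicle_table, 1200 };
    TruckRep t = { { &truck_table, 6000 }, 15000 };

    assert (call_getweight (&v) == 1200);
    assert (call_getweight (&t.base) == 21000);     // 6000 + 15000
}
```

Each object carries one extra pointer, while the tables themselves exist once per class. This matches the memory cost discussed above.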
Until now the base class Vehicle contained its own, concrete,
implementations of the virtual functions getweight() and
setweight(). In C++ it is however also possible merely to
declare virtual functions in a base class, without defining them. The
functions are then concretely implemented in a derived class. This approach defines a
protocol, which has to be followed in the derived classes.
The special feature of only declaring functions in a base class, and not defining them, is that derived classes must take care of the actual definition: the C++ compiler will not allow the definition of an object of a class which doesn't concretely define the function in question. The base class thus enforces a protocol by declaring a function by its name, return value and arguments; but the derived classes must take care of the actual implementation. The base class itself is therefore only a model, to be used for the derivation of other classes. Such base classes are also called abstract classes.
The functions which are only declared but not defined in the base class are
called pure virtual functions. A function is made pure virtual by
preceding its declaration with the keyword virtual and by
postfixing it with = 0. An example of a pure virtual function
occurs in the following listing, where the definition of a class
Sortable requires that all subsequent classes have a function
compare():
class Sortable
{
public:
virtual int compare (Sortable const &other) const = 0;
};
The function compare() must return an int and
receives a reference to a second Sortable object. Possibly its
action would be to compare the current object with the other one.
The function is not allowed to alter the other object, as other is
declared const. Furthermore, the function is not allowed to alter
the current object, as the function itself is declared const.
The above base class can be used as a model for derived classes. As an
example consider the following class Person (a prototype of which
was introduced in section Person), capable of comparing two Person
objects by the alphabetical order of their names and addresses:
class Person: public Sortable
{
public:
// constructors, destructors, and stuff
Person ();
Person (char const *nm, char const *add, char const *ph);
Person (Person const &other);
Person const &operator= (Person const &other);
// interface
char const *getname () const;
char const *getaddress () const;
char const *getphone () const;
void setname (char const *nm);
void setaddress (char const *add);
void setphone (char const *ph);
// requirements enforced by Sortable
int compare (Sortable const &other) const;
private:
// data members
char *name, *address, *phone;
};
int Person::compare (Sortable const &o) const
{
Person
const &other = (Person const &)o;
int
cmp;
// first try: if names unequal, we're done
if ( (cmp = strcmp (name, other.name)) )
return (cmp);
// second try: compare by addresses
return (strcmp (address, other.address));
}
Note in the implementation of Person::compare() that the
argument of the function is not a reference to a Person but a
reference to a Sortable. Remember that C++ allows function
overloading: a function compare(Person const &other) would be
an entirely different function from the one required by the protocol of
Sortable. In the implementation of the function we therefore cast
the Sortable& argument to a Person&
argument.
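The benefit of such a protocol is that generic code can be written against Sortable alone. The sketch below assumes a hypothetical function sortitems() and a hypothetical class Number; neither is part of the original text. As in Person::compare(), the concrete compare() simply assumes that the other object has the same type.

```cpp
#include <cassert>

class Sortable
{
    public:
        virtual ~Sortable () {}
        virtual int compare (Sortable const &other) const = 0;
};

// Generic insertion sort: it knows only the Sortable protocol.
void sortitems (Sortable const **items, int n)
{
    for (int i = 1; i < n; i++)
    {
        Sortable const *cur = items [i];
        int j = i;

        // shift larger elements one place up
        while (j > 0 && items [j - 1]->compare (*cur) > 0)
        {
            items [j] = items [j - 1];
            j--;
        }
        items [j] = cur;
    }
}

// Hypothetical concrete class following the protocol.
class Number: public Sortable
{
    public:
        Number (int v) : value (v) {}
        int compare (Sortable const &o) const
        {
            // assumes that `o' really is a Number
            Number const &other = static_cast<Number const &> (o);
            return (value < other.value ? -1 : value > other.value);
        }
    private:
        int value;
};

void demo ()
{
    Number a (3), b (1), c (2);
    Sortable const *items [] = { &a, &b, &c };

    sortitems (items, 3);
    assert (items [0] == &b);       // 1
    assert (items [1] == &c);       // 2
    assert (items [2] == &a);       // 3
}
```

Note that sortitems() can be compiled before any class derived from Sortable exists.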
Sometimes it may be useful to know in the concrete implementation of a pure
virtual function what the other object is. E.g., the function
Person::compare() should make the comparison only if the
other object is a Person too: imagine what the
statement
strcmp (name, other.name)
would do if the other object were in fact not a
Person and hence did not have a char *name
data member.
We therefore present here an improved version of the protocol of the class
Sortable. This class is expanded to require that each derived class
implements a function int getsignature():
class Sortable
{
.
.
virtual int getsignature () const = 0;
.
.
};
The concrete function Person::compare() can now compare names
and addresses only if the signatures of the current and other object match:
int Person::compare (Sortable const &o) const
{
int
cmp;
// first, check signatures
if ( (cmp = getsignature () - o.getsignature ()) )
return (cmp);
Person
const &other = (Person const &)o;
// next: if names unequal, we're done
if ( (cmp = strcmp (name, other.name)) )
return (cmp);
// last try: compare by addresses
return (strcmp (address, other.address));
}
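The crux of the matter is of course the function getsignature().
This function should return a unique int value for its particular
class. An elegant implementation is the following. Note that it assumes that an
int is wide enough to hold an address, which is not the case on many 64-bit platforms: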
class Person: public Sortable
{
.
.
// getsignature() now required too
int getsignature () const;
};
int Person::getsignature () const
{
static int // Person's own tag, I'm quite sure
tag; // that no other class can access it
return ( (int) &tag ); // hence, &tag is unique for Person
}
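Where run-time type information is available, the same check can be sketched without a hand-made signature, using the standard dynamic_cast and typeid operators. This is an alternative technique, not the one used in this chapter. The helper typeorder() and the class Number are hypothetical, and Person is reduced to a name that it does not own.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

class Sortable
{
    public:
        virtual ~Sortable () {}
        virtual int compare (Sortable const &other) const = 0;
};

// Orders two objects of different classes by their run-time types.
// The resulting order is implementation-defined, but consistent
// within one run of the program.
int typeorder (Sortable const &a, Sortable const &b)
{
    return (typeid (a).before (typeid (b)) ? -1 : 1);
}

class Person: public Sortable       // reduced to a name only
{
    public:
        Person (char const *nm) : name (nm) {}
        int compare (Sortable const &o) const
        {
            // dynamic_cast yields a null pointer if `o' is no Person
            Person const *other = dynamic_cast<Person const *> (&o);
            if (!other)
                return (typeorder (*this, o));
            return (std::strcmp (name, other->name));
        }
    private:
        char const *name;           // not owned, to keep the sketch short
};

class Number: public Sortable       // hypothetical second class
{
    public:
        Number (int v) : value (v) {}
        int compare (Sortable const &o) const
        {
            Number const *other = dynamic_cast<Number const *> (&o);
            if (!other)
                return (typeorder (*this, o));
            return (value < other->value ? -1 : value > other->value);
        }
    private:
        int value;
};

void demo ()
{
    Person alice ("Alice"), bob ("Bob");
    Number one (1);

    assert (alice.compare (bob) < 0);
    assert (alice.compare (alice) == 0);
    // different classes: never equal, and ordered consistently
    assert (alice.compare (one) != 0);
    assert ((alice.compare (one) < 0) != (one.compare (alice) < 0));
}
```

Which of the two classes sorts first is left to the implementation; the sketch only relies on the order being the same in both directions of the comparison.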
When the operator delete releases memory which is occupied by a
dynamically allocated object, a corresponding destructor is called to ensure
that internally used memory of the object can also be released. Now consider the
following code fragment, in which the two classes from the previous sections are
used:
Sortable
*sp;
Person
*pp = new Person ("Frank", "frank@icce.rug.nl", "633688");
sp = pp; // sp now points to a Person
.
.
delete sp; // object destroyed
In this example an object of a derived class (Person) is
destroyed using a base class pointer (Sortable*). For a `standard'
class definition this will mean that the destructor of Sortable is
called, instead of the destructor of Person.
C++ however allows virtual destructors. By preceding the declaration
of a destructor with the keyword virtual we can ensure that the
right destructor is activated even when called via a base class pointer. The
definition of the class Sortable would therefore become:
class Sortable
{
public:
virtual ~Sortable ();
virtual int compare (Sortable const &other) const = 0;
.
.
};
Should the virtual destructor of the base class be a pure virtual
function or not? In general, the answer to this question would be no: for a
class such as Sortable the definition should not force
derived classes to define a destructor. In contrast, compare() is a
pure virtual function: in this case the base class defines a protocol which must
be adhered to.
By defining the destructor of the base class as virtual, but not
as purely so, the base class offers the possibility of redefinition of the
destructor in any derived classes. The base class doesn't enforce the
choice.
The conclusion is therefore that the base class must define a destructor function, which is used in the case that derived classes do not define their own destructors. Such a destructor could be an empty function:
Sortable::~Sortable ()
{
}
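The effect of the virtual destructor can be observed with a small sketch. The class Item, its buffer and the counter destroyed are hypothetical and serve only to make the destructor call visible.

```cpp
#include <cassert>

int destroyed = 0;                  // counts calls of ~Item()

class Sortable
{
    public:
        virtual ~Sortable ()        // virtual, with an empty body
        {
        }
};

class Item: public Sortable         // hypothetical derived class
{
    public:
        Item () : buffer (new char [16])
        {
        }
        ~Item ()
        {
            delete [] buffer;       // release internally used memory
            destroyed++;
        }
    private:
        char *buffer;
};

void demo ()
{
    Sortable *sp = new Item;        // base class pointer, derived object

    delete sp;                      // ~Item() runs first, then ~Sortable()
    assert (destroyed == 1);
}
```

Without the keyword virtual in Sortable, the statement delete sp would not be required to reach ~Item(), and the buffer could remain allocated.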
As was previously mentioned in chapter Inheritance it is possible to derive a class from several base classes at once. Such a derived class inherits the properties of all its base classes. Of course, the base classes themselves may be derived from classes yet higher in the hierarchy.
A slight difficulty in multiple inheritance may arise when more than one
`path' leads from the derived class to the base class. This is illustrated in
the code fragment below: a class Derived is doubly derived from a
class Base:
class Base
{
public:
void setfield (int val)
{ field = val; }
int getfield () const
{ return (field); }
private:
int field;
};
class Derived: public Base, public Base
{
};
Due to the double derivation, the functionality of Base now
occurs twice in Derived. This leads to ambiguity: when the function
setfield() is called for a Derived object,
which function should that be, since there are two? In such a duplicate
derivation, many C++ compilers will fail to generate code and (correctly)
identify the error.
The above code clearly duplicates its base class in the derivation. Such a
duplication can be easily avoided here. But duplication of a base class can also
occur via nested inheritance, where an object is derived from, say, an
Auto and from an Air (see the vehicle classification
system, section VehicleSystem). Such a class would be needed to represent, e.g.,
a flying car. An AirAuto would ultimately contain two Vehicles, and
hence two weight fields, two setweight() functions and
two getweight() functions.
Let's investigate more closely why an AirAuto introduces ambiguity
when derived from Auto and Air.
An AirAuto is an Auto, hence a
Land, and hence a Vehicle.
An AirAuto is also an Air, and hence a
Vehicle. The duplication of Vehicle data is further illustrated in the
following figure:
The internal organization of an AirAuto is shown in the
following figure:
The C++ compiler will detect the ambiguity in an AirAuto
object, and will therefore fail to produce code for a statement like:
AirAuto
cool;
printf ("%d\n", cool.getweight());
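The question of which member function getweight() should be
called cannot be resolved by the compiler. The programmer has two possibilities
to resolve the ambiguity explicitly. One possibility is to modify the function call
where the ambiguity occurs, naming the intended base class: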
// let's hope that the weight is kept in the Auto
// part of the object..
printf ("%d\n", cool.Auto::getweight ());
Note the place of the scope operator and the class
name: before the name of the member function itself.
The other possibility is to create a dedicated function getweight() for
the class AirAuto:
int AirAuto::getweight () const
{
return (Auto::getweight ());
}
The second possibility from the two above is preferable, since it relieves
the programmer who uses the class AirAuto of special
precautions.
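Both possibilities can be tried out with a reduced version of the hierarchy. In the sketch below the classes carry only a weight, and the constructor arguments are assumptions made for this illustration: the Auto part receives the weight, while the Air part is given 0 to make the two separate weight fields visible.

```cpp
#include <cassert>

class Vehicle
{
    public:
        Vehicle (int wt) : weight (wt) {}
        int getweight () const
        {
            return (weight);
        }
    private:
        int weight;
};

class Land: public Vehicle
{
    public:
        Land (int wt) : Vehicle (wt) {}
};

class Auto: public Land
{
    public:
        Auto (int wt) : Land (wt) {}
};

class Air: public Vehicle
{
    public:
        Air (int wt) : Vehicle (wt) {}
};

class AirAuto: public Auto, public Air
{
    public:
        // hypothetical choice: the weight is kept in the Auto part
        AirAuto (int wt) : Auto (wt), Air (0) {}

        // dedicated function, hiding both inherited versions
        int getweight () const
        {
            return (Auto::getweight ());
        }
};

void demo ()
{
    AirAuto cool (1500);

    assert (cool.getweight () == 1500);         // dedicated function
    assert (cool.Auto::getweight () == 1500);   // explicit scope
    assert (cool.Air::getweight () == 0);       // the second Vehicle
}
```

The last assertion shows that two Vehicle parts, each with its own weight, are indeed present in one AirAuto.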
However, besides these explicit solutions, there is a more elegant one. This will be discussed in the next section.
As is illustrated in figure InternalOrganization,
more than one object of the type Vehicle is present in one
AirAuto. The result is not only an ambiguity in the functions which
access the weight data, but also the presence of two
weight fields. This is somewhat redundant, since we can assume that
an AirAuto has just one weight.
We can ensure that only one Vehicle is contained in an
AirAuto. This is done by defining the base class which is
multiply present in a derived class as a virtual base class.
The behavior of virtual base classes is the following: when a base class
B is a virtual base class of a derived class D, then
B may be present in D but this is not necessarily so.
The compiler will leave out the inclusion of the members of B when
these are already present in D.
For the class AirAuto this means that the derivation of
Land and Air is changed:
class Land: virtual public Vehicle
{
.
.
};
class Air: virtual public Vehicle
{
.
.
};
The virtual derivation ensures that via the Land route, a
Vehicle is only added to a class when not yet present. The same
holds true for the Air route. This means that we can no longer say
by which route a Vehicle becomes a part of an AirAuto;
we can only say that there is one Vehicle object embedded.
The internal organization of an AirAuto after virtual derivation
is shown in the following figure:
Concerning virtual derivation we make the following final remarks:
Both Land and Air must be derived virtually from Vehicle. If only one
of them used virtual derivation, the other would still embed its own
Vehicle, and an AirAuto would again contain two Vehicle objects.
The fact that the Vehicle in an AirAuto is no
longer `embedded' in Auto or Air has a consequence
for the chain of construction. The constructor of an AirAuto will
directly call the constructor of a Vehicle; this constructor will
not be called from the constructors of Auto or Air.
Summarizing, virtual derivation has the consequence that ambiguity in the calling of member functions of a base class is avoided. Furthermore, duplication of data members is avoided.
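A reduced sketch of the virtually derived hierarchy shows both effects: the single Vehicle and the changed chain of construction. The counter vehicles_built and the dummy arguments -1 and -2 are hypothetical; they only serve to show which constructor call actually initializes the Vehicle.

```cpp
#include <cassert>

int vehicles_built = 0;             // counts Vehicle constructor calls

class Vehicle
{
    public:
        Vehicle (int wt) : weight (wt)
        {
            vehicles_built++;
        }
        int getweight () const
        {
            return (weight);
        }
    private:
        int weight;
};

class Land: virtual public Vehicle
{
    public:
        Land (int wt) : Vehicle (wt) {}
};

class Air: virtual public Vehicle
{
    public:
        Air (int wt) : Vehicle (wt) {}
};

class Auto: public Land
{
    public:
        // used only when an Auto itself is the complete object
        Auto (int wt) : Vehicle (wt), Land (wt) {}
};

class AirAuto: public Auto, public Air
{
    public:
        // The most derived class initializes the virtual base itself;
        // the Vehicle initializers of Auto, Land and Air are skipped.
        AirAuto (int wt) : Vehicle (wt), Auto (-1), Air (-2) {}
};

void demo ()
{
    vehicles_built = 0;
    AirAuto cool (1500);

    assert (vehicles_built == 1);           // just one Vehicle
    assert (cool.getweight () == 1500);     // no ambiguity, no -1 or -2

    Vehicle *vp = &cool;                    // conversion is unambiguous
    assert (vp->getweight () == 1500);
}
```

The call cool.getweight() compiles without a scope operator, since only one Vehicle part is left in the object.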
In contrast to the previous definition of a class such as
AirAuto, situations may arise where the double presence of the
members of a base class is appropriate. To illustrate this, consider the
definition of a Truck from section Truck:
class Truck: public Auto
{
public:
// constructors
Truck ();
Truck (int engine_wt, int sp, char const *nm,
int trailer_wt);
// interface: to set two weight fields
void setweight (int engine_wt, int trailer_wt);
// and to return combined weight
int getweight () const;
private:
// data
int trailer_weight;
};
// example of constructor
Truck::Truck (int engine_wt, int sp, char const *nm,
int trailer_wt)
: Auto (engine_wt, sp, nm)
{
trailer_weight = trailer_wt;
}
// example of interface function
int Truck::getweight () const
{
return
( // sum of:
Auto::getweight () + // engine part plus
trailer_weight // the trailer
);
}
This definition shows how a Truck object is constructed to hold
two weight fields: one via its derivation from Auto and one via its
own int trailer_weight data member. Such a definition is of course
valid, but could be rewritten. We could let a Truck be derived from
an Auto and from a Vehicle, thereby explicitly
requesting the double presence of a Vehicle; one for the weight of
the engine and cabin, and one for the weight of the trailer.
A small item of interest here is that a derivation like
class Truck: public Auto, public Vehicle
is rejected or at least flagged by C++ compilers: a Vehicle is already
part of an Auto, so the members of the directly inherited Vehicle could
not be addressed unambiguously. An intermediate class
resolves the problem: we derive a class TrailerVeh from
Vehicle, and Truck from Auto and from
TrailerVeh. All ambiguities concerning the member functions are
then resolved in the class Truck:
class TrailerVeh: public Vehicle
{
public:
TrailerVeh (int wt);
};
TrailerVeh::TrailerVeh (int wt)
: Vehicle (wt)
{
}
class Truck: public Auto, public TrailerVeh
{
public:
// constructors
Truck ();
Truck (int engine_wt, int sp, char const *nm,
int trailer_wt);
// interface: to set two weight fields
void setweight (int engine_wt, int trailer_wt);
// and to return combined weight
int getweight () const;
};
// example of constructor
Truck::Truck (int engine_wt, int sp, char const *nm,
int trailer_wt)
: Auto (engine_wt, sp, nm), TrailerVeh (trailer_wt)
{
}
// example of interface function
int Truck::getweight () const
{
return
( // sum of:
Auto::getweight () + // engine part plus
TrailerVeh::getweight () // the trailer
);
}
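The construction can be tried out in reduced form. In the sketch below Auto is derived directly from Vehicle, and the speed and name arguments are left out; these simplifications are assumptions made to keep the example self-contained, and the bodies of setweight() are one plausible way to fill in the interface declared above.

```cpp
#include <cassert>

class Vehicle
{
    public:
        Vehicle (int wt) : weight (wt) {}
        int getweight () const
        {
            return (weight);
        }
        void setweight (int wt)
        {
            weight = wt;
        }
    private:
        int weight;
};

class Auto: public Vehicle          // simplified: no Land, speed or name
{
    public:
        Auto (int wt) : Vehicle (wt) {}
};

class TrailerVeh: public Vehicle    // the intermediate class
{
    public:
        TrailerVeh (int wt) : Vehicle (wt) {}
};

class Truck: public Auto, public TrailerVeh
{
    public:
        Truck (int engine_wt, int trailer_wt)
            : Auto (engine_wt), TrailerVeh (trailer_wt)
        {
        }
        // each weight goes to its own Vehicle part
        void setweight (int engine_wt, int trailer_wt)
        {
            Auto::setweight (engine_wt);
            TrailerVeh::setweight (trailer_wt);
        }
        int getweight () const
        {
            return (Auto::getweight () + TrailerVeh::getweight ());
        }
};

void demo ()
{
    Truck t (6000, 15000);

    assert (t.getweight () == 21000);       // 6000 + 15000
    t.setweight (5000, 10000);
    assert (t.getweight () == 15000);       // 5000 + 10000
}
```

Here the double presence of Vehicle is intended, so no virtual derivation is used.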