OOP in C++ and D

Introduction

The C programming language has been remarkably successful. The language is simple and efficient, but for better or for worse, it lacks object orientation. Since C's inception, there have been a number of attempts to turn C into an Object Oriented Programming (OOP) language. Two notable ones are C++ and D.

C++

Simply put, C++ was designed to be a “better C” [BS2007]. The primary purpose of the language was to turn C into an OOP language, but it also has many features which are not strictly object oriented, such as namespaces, template metaprogramming, and the standard library [BS1997].

C++ is (almost) a superset of C, so (almost) all of the features in C are also in C++. There are a few exceptions to this with the version of C defined in 1999 (C99), but they are mostly insignificant.

D

To say that C++ is a feature-rich language would probably be an understatement. For example, one 10 year old feature, export, is currently only supported by two compilers and is unlikely to gain any more support in the near future [SP2003]. It was in this context that the D language arose. Its designers explicitly state the language was created for “those who decide the promise of C++ object oriented programming is not fulfilled due to the complexity of it” [D2007O]. Most of the differences between the two languages were guided by this relationship.

The designers of D decided not to make their language a superset of C. The incompatibility is lamentable, but the end result is that D is a more consistent language, and it allowed D to fix a number of design issues that C++ retains (like arrays). However, D compilers are designed to be “link compatible” with C. They can import C libraries and call C functions.

The D language is more evolutionary than it is revolutionary. Most of its features have appeared in some language before, although there are a number of notable differences.

Basic Similarities

C-isms

While C++ and D are different languages, they have plenty in common. Both of them are statically typed languages. C++ and D favor a stronger and safer style, but they both support traditional pointers and casts. Consider the following lines:

int main()
{
    char* str = "Hello World";
    str += 6;
    printf("%s\n", str);
    return 0;
}

This is valid code in both languages, although in C++ the line #include <stdio.h> is required at the top (a C compiler may only give a cryptic warning message if it is missing). The code outputs World plus a newline. However, if the number in line 4 is large enough (greater than 11), then printf will try to print the contents of some unknown part of memory. The actual result depends on many factors, but it is always ugly, and out-of-bounds accesses of this kind are at the root of the serious security vulnerabilities known as buffer overflows.
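
One way to guard against stepping past the end of the string is to check the offset against the string's length first. The following is a minimal C++ sketch of that idea; the helper name suffix_from and its clamping behavior are assumptions made for illustration, not part of either language's library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a pointer `offset` characters into `s`.
// If `offset` would step past the end of the string, it is clamped so the
// result points at the terminating '\0' (an empty string) instead.
const char* suffix_from(const char* s, std::size_t offset)
{
    assert(s != nullptr);
    const std::size_t len = std::strlen(s);
    if (offset > len) {
        offset = len;  // clamp instead of reading unknown memory
    }
    return s + offset;
}
```

With this helper, an offset of 6 into "Hello World" still yields World, while an offset of 50 yields an empty string rather than whatever happens to follow the literal in memory.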

Classes

The basic syntax for defining a class in the two languages is fairly similar. The chief difference is that in D, the implementation and the definition are in the same place. D has done away with the need for header files (although there is an optional compiler feature called interface files to speed up the compilation process).

Header files in C (and C++) serve three purposes. The first one is a workaround for the fact that the compiler must read the files sequentially and so functions must be declared before they are used in the code. The result is a list of function prototypes at the beginning of the code. It seemed natural to put these prototypes in a separate file. The second reason for header files is to serve as a lookup table when importing libraries when the source code is not available. Finally, header files can be a useful place to provide a quick reference for the definition without providing details about the implementation. Extra details can make it difficult to single out the important information in code.

In D, none of these quite apply. D compilers are designed to be able to find function definitions that come after the code that calls them. Lookup tables are built into the libraries themselves, so header files do not need to be shipped separately. Finally, the argument for headers as a reference is becoming increasingly less important with the advent of more modern development tools which can automatically build documentation straight from the code itself. On the other hand, having a separate definition file does force a more intentional approach to code design. It is difficult to say exactly which method is better here.

In C++ a class is defined like this:

#include <string>
#include <iostream>
using namespace std;
// definition
class Cat {
public:
    Cat(string);
    ~Cat();
    Litter mate(Cat&);
private:
    string name;
};
// implementation
Cat::Cat(string name) {
    this->name = name;
}
Cat::~Cat() {
    cout << "poor " << this->name <<
          " died :(" << endl;
}
Litter Cat::mate(Cat& spouse) {
    // do stuff and return a Litter
}

In D it would look like:

import std.stdio;
class Cat {
   private char[] name;

   this(char[] name) {
      this.name = name;
   }
   ~this() {
      writefln("Poor " ~ this.name ~ " has died");
   }
   Litter mate(Cat spouse){
      // do stuff here and return Litter
   }
}

Note that in C++ the constructor is a function with the same name as the class. In D, a function named this is used. However, this difference is merely cosmetic and the two constructors behave the same.

Differences

References and copying

In C++, objects have copy semantics. To pass an object by reference, the parameter must be declared as a reference with &, or a pointer to the object must be passed (like the magic this pointer). If a pointer is used, slightly different syntax is needed for attribute and method access. The expression pointer.attribute is invalid because pointer is a pointer, not an object. The forms pointer->attribute and (*pointer).attribute are equivalent and both valid.
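
The three ways of passing an object in C++ can be sketched as follows. The Cat struct and the function names here are hypothetical and exist only to make the difference observable.

```cpp
#include <cassert>
#include <string>

// Hypothetical type used only for this illustration.
struct Cat {
    std::string name;
};

// Receives a copy: changes are not visible to the caller.
void renameCopy(Cat cat) { cat.name = "copy"; }

// Receives a reference: changes affect the caller's object.
void renameByReference(Cat& cat) { cat.name = "reference"; }

// Receives a pointer: members are reached with -> or (*ptr).member.
void renameByPointer(Cat* cat)
{
    assert(cat != nullptr);  // a pointer, unlike a reference, may be null
    cat->name = "pointer";   // equivalent to (*cat).name = "pointer";
}
```

Calling renameCopy leaves the caller's object untouched, while the other two functions change its name.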

In D, all objects (and dynamic arrays) have reference semantics. Passing an object will, by default, pass the reference. D also includes an operator to test identity, the is operator, which is necessary in a number of cases.

First, it is necessary for testing whether an object has been initialized. object == null will cause a compile time error because the null type does not have a version of opEquals (the overloading function for the == operator) that works with objects. So the test should be object is null. Second, is is more efficient, and doesn't need to be overloaded to function properly.

Third, when passing an object to a function, the function may either make a copy of the object, modify the copy, and then return the copy; or it may take a reference and modify that reference directly. In D, the convention is to take a reference and return the reference if it is unmodified, but if the object is to be modified, then create a copy and return the copy.

While the behavior is probably the most versatile, this can be a problem sometimes if you want to make sure the result is a copy. For example, the following code will cause a segmentation fault:

import std.string;

int main()
{
    char[] start = "SPAM";
    char[] end;
    end = toupper(start);
    end[1] = 'p';
    return 0;
}

This is because string literals are read-only. Since start is all upper case, the toupper function does not modify start, so a reference to start is returned. Therefore, end is the same object as start, a read-only string. Then the program proceeds to change values in the read-only string, which causes a segmentation fault.

If the value in start was “sPAM”, the program would run successfully. So if you are planning to modify end, make sure to test if end is start first. If it is, make end a copy of itself before modifying.

Allocation

In C++ objects can be statically or dynamically allocated. A dynamically allocated object is accessed through a pointer. Also, there is no garbage collector, so dynamically created objects must be manually destroyed when they are no longer needed. Here are some code samples (using made-up names):

Static allocation:

ClassName object(args);

Dynamic allocation:

DeclaredClass *object = new RealClass(args);
// note, remember to do this later on
delete object;

In both cases the variable has copy semantics as noted earlier; however, with dynamic allocation, the variable is actually a pointer, so it acts like it has reference semantics (unless you dereference it). Also, DeclaredClass and RealClass are often the same class, unless we are using polymorphism.
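
The difference in lifetime between the two kinds of allocation can be made visible with a small counter. This is only a sketch; the Tracked class and both functions are made-up names for illustration.

```cpp
#include <cassert>

// Hypothetical class that counts how many instances are currently alive.
class Tracked {
public:
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
    static int alive;
};
int Tracked::alive = 0;

// Static (automatic) allocation: the object is destroyed when the block ends.
int aliveInsideBlock()
{
    Tracked local;
    return Tracked::alive;  // counts `local`
}

// Dynamic allocation: the object lives until delete is called explicitly.
int aliveAfterNew()
{
    Tracked* heap = new Tracked();
    int count = Tracked::alive;  // counts `*heap`
    delete heap;                 // without this line the object would leak
    assert(Tracked::alive == count - 1);
    return count;
}
```

After either function returns, the counter is back to zero, but in the dynamic case only because of the explicit delete.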

In D, there is only one way to allocate an object. It is dynamic, and has reference semantics. However, unlike the C++ version, it is not a pointer. D has a garbage collector, so it is not necessary to destroy the object manually; it will be reclaimed at some point after it is no longer reachable.

Allocation in D:

DeclaredClass object = new RealClass(args);

Virtual Functions

In C++, suppose you have a pointer that is declared as a pointer to Animal but actually points to an object of a derived class (say Cat, where Cat extends Animal). Calls to virtual functions look for the function in the Cat class. Calls to non-virtual functions always look for the function in the class the pointer is declared as (in this case, Animal).

Here is an example of the above:

#include <string>
#include <iostream>
using namespace std;

class Animal {
public:
    Animal(string);
    string getName();
    virtual int countLegs();
protected:
    string name;
};
class Cat: public Animal {
public:
    Cat(string);
    string getName();
    virtual int countLegs();
};

Animal::Animal(string name) {
    this->name = name;
}
Cat::Cat(string name):Animal(name){
}
string Animal::getName() {
    return this->name;
}
int Animal::countLegs() {
    return 0;
}
string Cat::getName() {
    return "cat: " + this->name;
}
int Cat::countLegs() {
    return 4;
}

int main() {
    Animal *garfield = new Cat("garfield");
    cout << garfield->getName() << endl;
    cout << garfield->countLegs() << endl;
    return 0;
}

And the output:

garfield
4

Note that Cat::getName is not called because Animal::getName is not virtual, but Cat::countLegs is called because Animal::countLegs is virtual.

In D, all methods which can be virtual are virtual. Some methods (like constructors) cannot possibly be used virtually, so the compiler will ignore them when setting up virtual functions.

Operator Overloading

Both C++ and D support overloading of operators. However, the way that they do it, and what they allow is somewhat different.

Operator overloading in C++ is extremely flexible, possibly to a fault. Almost all of the operators can be overloaded (including operators like new and delete). There are no constraints on how the overloading works, so it is entirely up to the programmer to decide what is “reasonable” behavior. Also, if the object is on the right-hand side of the expression and the left-hand operand is of another type, the overloading must be done in a function outside of that object's class. This kind of overloading can result in confusing code.
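
As a small illustration of overloading inside and outside the class, consider scaling a vector by a number. The Vec2 type below is hypothetical; the point is only where each overload has to live.

```cpp
// Hypothetical 2D vector type for illustration.
struct Vec2 {
    double x;
    double y;

    // Member overload: handles `vec * scalar` (Vec2 is the left operand).
    Vec2 operator*(double k) const { return Vec2{x * k, y * k}; }
};

// Non-member overload: handles `scalar * vec` (Vec2 is the right operand).
// A member function cannot express this because the left operand is a double.
Vec2 operator*(double k, const Vec2& v) { return v * k; }
```

Nothing in the language forces the two overloads to agree with each other; keeping them consistent is left to the programmer.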

D's operator overloading provides constraints that remove most of these problems. First, while the list of overloadable operators is fairly large, operators for object instantiation cannot be overloaded. Also, operators may be commutative (e.g. A == B is the same as B == A). So the opAdd function (for +) can be used if the object is on the right or the left side of the expression. If the operation is not supposed to be commutative (e.g. matrix multiplication), then a separate reversed function with the _r suffix, such as opAdd_r, can be defined for the case where the object is on the right. This is the same for many other operators too. Some operators, like - and %, are never commutative.

Also, operators which should do complementary actions, do. The == operator and the != operator both call the opEquals function, and opCmp is used for <, <=, >, and >=. Thus, reasonably consistent behavior can be expected from overloaded operators.
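
C++ does not enforce this, but the same discipline can be followed by hand. The sketch below is a C++ analogy, not D code: the Version type and its compare function are hypothetical, and for brevity it routes the equality operators through the same function as the ordering operators, whereas D keeps opEquals and opCmp separate.

```cpp
// Hypothetical version-number type for illustration.
struct Version {
    int majorPart;
    int minorPart;

    // Single three-way comparison, playing a role similar to D's opCmp:
    // negative if *this < other, zero if equal, positive if greater.
    int compare(const Version& other) const
    {
        if (majorPart != other.majorPart) return majorPart < other.majorPart ? -1 : 1;
        if (minorPart != other.minorPart) return minorPart < other.minorPart ? -1 : 1;
        return 0;
    }
};

// Every operator is defined in terms of compare(), so they cannot disagree.
inline bool operator==(const Version& a, const Version& b) { return a.compare(b) == 0; }
inline bool operator!=(const Version& a, const Version& b) { return a.compare(b) != 0; }
inline bool operator<(const Version& a, const Version& b)  { return a.compare(b) < 0; }
inline bool operator<=(const Version& a, const Version& b) { return a.compare(b) <= 0; }
inline bool operator>(const Version& a, const Version& b)  { return a.compare(b) > 0; }
inline bool operator>=(const Version& a, const Version& b) { return a.compare(b) >= 0; }
```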

Pure Virtual and Abstract

Both languages support ways to declare that a method is not yet implemented and so must be implemented by a derived class. In C++ this is done with pure virtual methods. A pure virtual method is a virtual method that is declared with = 0 at the end. While the technique is less than obvious, and slightly confusing, it works. For example:

class AbstractParent {
public:
    virtual void doFoo() =0;
};
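
A slightly fuller sketch shows how a derived class supplies the missing implementation. ConcreteChild and runFoo are hypothetical names, and doFoo returns a string here (rather than void as above) purely so that the result can be observed.

```cpp
#include <string>

class AbstractParent {
public:
    virtual ~AbstractParent() {}
    virtual std::string doFoo() = 0;  // pure virtual: no implementation here
};

// Hypothetical derived class; it can be instantiated only because it
// implements every pure virtual method of its base.
class ConcreteChild : public AbstractParent {
public:
    std::string doFoo() override { return "foo done"; }
};

// Works with any derived class through the abstract interface.
std::string runFoo(AbstractParent& p) { return p.doFoo(); }
```

Attempting to create an AbstractParent directly is rejected by the compiler, while a ConcreteChild can be created and passed to runFoo.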

In D, classes can be labeled abstract. It works in a fairly predictable manner:

abstract class AbstractParent {
   void doFoo();
}

Attribute Visibility

Both languages support data encapsulation, but there are some differences in the details. C++ supports 4 levels: private, protected, friend, and public. D supports 4 slightly different levels: private, protected, package, and public. In C++ private means available inside the class, protected means available to the class and all subclasses, friend names another class or function that can access the class's private and protected members, and public means available to everyone.
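
The C++ levels can be seen side by side in a short sketch. The Account classes and the peekBalance function are invented for this example.

```cpp
// Hypothetical class showing each C++ visibility level.
class Account {
public:
    explicit Account(int initial) : auditCount(0), balance(initial) {}
    int publicBalance() const { return balance; }  // public: anyone may call
protected:
    int auditCount;                                // protected: class and subclasses
private:
    int balance;                                   // private: this class only
    friend int peekBalance(const Account& a);      // friend: granted private access
};

// Subclass: may touch protected members but not private ones.
class AuditedAccount : public Account {
public:
    explicit AuditedAccount(int initial) : Account(initial) {}
    int audit() { return ++auditCount; }
    // int leak() { return balance; }  // would not compile: balance is private
};

// Friend function: not a member, yet allowed to read the private field.
int peekBalance(const Account& a) { return a.balance; }
```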

In D, there are two differences. First, private and protected members are also available within the module in which they are defined. Second, the package visibility means that a member is available within the package (a group of modules).

D also has another specifier, export, which allows access to external libraries and modules. However, this is not particular to classes or OOP.

Arity of Inheritance and Interfaces

In C++, a class may inherit from multiple base classes. In D, support for multiple inheritance has been intentionally dropped because the complexity it would add to compiler implementations was judged to outweigh any perceived value.
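
For reference, multiple inheritance in C++ looks roughly like this. The Swimmer, Flyer, and Duck classes are hypothetical.

```cpp
#include <string>

// Two hypothetical, unrelated base classes.
class Swimmer {
public:
    virtual ~Swimmer() {}
    std::string swim() const { return "swimming"; }
};

class Flyer {
public:
    virtual ~Flyer() {}
    std::string fly() const { return "flying"; }
};

// Duck inherits the members of both bases and can be used wherever
// either a Swimmer or a Flyer is expected.
class Duck : public Swimmer, public Flyer {
};
```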

Instead of multiple inheritance, D provides interfaces. While interfaces are limited in what they can do, they are sufficient in many cases. An interface in D is fairly similar to how it is done in other programming languages:

interface List
{
    int length();
    Object get(in int);
    void put(in int, in Object);
}

Note that in D, there are in, out, and inout keywords that specify whether a parameter is to be used for input, output or both. This eliminates the need for a separate definition of the interface in IDL.

Additionally, in D, all objects inherit from the class Object, either directly, if they have no other parent class, or indirectly, because their parent class will inherit from Object.

How to make a string

Strings in C

Earlier, the problems with careless pointers were outlined (n.b. in C/C++ arrays are really glorified pointers so problems that apply to one generally apply to the other). The situation also has other side effects. For example, string concatenation does not work as one might expect.

char* hello = "Hello ";
hello + "World\n";
// error: invalid operands to binary +

Simply put, the length of the string is not a property of the string itself. The strlen function can compute it by looking for the first \0 character, but that does not always give the desired results. Copying one string to another is probably the most common case. Consider the following implementation of the standard function strcpy:

char *strcpy(char *dest, const char *src)
{
    char *start = dest;  /* strcpy returns the start of dest */
    while ((*dest++ = *src++))
        ;
    return start;
}

This is how strcpy was designed and every C compiler ships with a similar function. The problem is that if the strings are not the same length, memory loss or buffer overflows can occur. To illustrate this:

char s1[] = "foo";
char s2[] = "Hello World";
strcpy(s1, s2);

Here the destination s1 has room for only 4 bytes, so a buffer overflow occurs: s1 contains "Hell", and the next 8 bytes in memory are filled with "o World\0". This is a subtle bug because printing s1 would result in the expected output, Hello World, but depending on how the system arranges the memory, the code may overwrite memory that belongs to another variable or program. If the arguments were reversed, s2 would end up looking like "foo\0o World\0" and everything after the first \0 would be out of reach.

As a result, the strncpy function is provided, which takes the maximum number of bytes to copy. This means that the length of a string must be stored separately from the string itself. It also needs to be manually updated whenever the string is changed.
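
A bounded copy built on strncpy might look like the sketch below. The wrapper boundedCopy is a hypothetical helper, not a standard function; note that strncpy itself does not write a terminating \0 when the source is too long, so the wrapper adds one.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies at most destSize - 1 characters from src and
// always terminates dest. Returns true if the whole string fit.
bool boundedCopy(char* dest, std::size_t destSize, const char* src)
{
    if (destSize == 0) {
        return false;           // no room even for the terminator
    }
    std::strncpy(dest, src, destSize - 1);
    dest[destSize - 1] = '\0';  // strncpy does not terminate on truncation
    return std::strlen(src) < destSize;
}
```

The caller still has to pass the size of the destination by hand, which is exactly the bookkeeping burden described above.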

String objects in C++

C++ provides a solution to this problem with the string class in its standard library (loosely referred to here as the Standard Template Library, or STL). Strings have variable length, and operator overloading allows strings to perform a number of basic operations easily. Assigning a char[] works as expected:

char f[] = "foo";
std::string s = f;

Strings can be added (together or with char[]'s) which concatenates them. Strings can be compared and have their indexes accessed as if they were an array of characters. Strings know how long they are and can resize themselves when necessary.

Like all good objects, strings have methods too. The earlier problems with figuring out how long a string is can be solved by either capacity() or length() depending on the situation (length() gives the number of characters stored, capacity() the space currently allocated). There are also methods for getting substrings, iterating through the string, clearing the string, and many other things. Like other classes, string can be sub-classed (although some methods like operator= are not inherited directly).
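
A few of these operations are shown in the short sketch below; the greet function is a made-up example.

```cpp
#include <string>

// Builds a greeting by concatenating std::string and C-style strings.
std::string greet(const std::string& name)
{
    const char prefix[] = "Hello ";
    std::string result = prefix;  // construct from a char array
    result += name;               // append another std::string
    result += "!";                // append a string literal
    return result;
}
```

The returned string can then be compared with ==, indexed with [], measured with length(), and sliced with substr(), without any manual length bookkeeping.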

There are a few problems with the C++ model. First of all, it introduces a new type into the language which is not compatible with the old one. Overloading of common functions like getline has helped to ease the pain of this, but the difference is still there. Also, it doesn't solve the base problem: arrays don't know their own length.

Strings in D

The D solution is not object oriented (although it has some similarities). Instead of creating a separate class for the string, D fixes the underlying problems with the array (and to varying extents, the other primitive types as well).

Arrays in D may be dynamically sized or given a static size. Dynamically sized arrays are created simply by not specifying the size in the initialization (e.g. char[] foo; is a dynamic array).

While arrays and strings in D are not objects, and do not have methods, they do have a feature called properties, which mostly makes up for the lack of methods. Properties will be explained later on.

The lack of string objects with methods is a disadvantage, although there is a sufficient list of string functions in the std.string module. On the other hand, strings in D are more efficient, and having fixed the problems with the arrays is a big bonus. The new arrays (and associative arrays too) are almost as powerful as vector and map in STL, and much more efficient.

Complex numbers

As with strings, C++ provides a complex number class in STL. D avoids OOP again and provides a built-in type. There are a number of advantages to this.

  1. Simpler and more intuitive syntax for creating complex numbers. In D: cfloat a = 1 + 2i / (2 + 1i). The equivalent in C++: complex<float> a = 1.0f + complex<float>(0,2) / complex<float>(2,1);

  2. As with strings, the D version is more efficient, particularly because D has the ifloat, idouble, and ireal types, which spare the program from having to do extra calculations when dealing with purely imaginary numbers.

  3. In C++ there is no imaginary type, only complex. According to Digital Mars [D2007C], in some equations this can result in incorrect values. Also, it is doubtful that there would be any use for inheritance from a complex number class. If there was, a workaround using composition would be more than sufficient.
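
For comparison, the C++ form of the expression from the first point can be written out in full as follows. The function name evaluateExpression is made up for this sketch.

```cpp
#include <complex>

// Evaluates 1 + 2i / (2 + 1i) with std::complex<float>.
std::complex<float> evaluateExpression()
{
    const std::complex<float> twoI(0.0f, 2.0f);     // 2i
    const std::complex<float> divisor(2.0f, 1.0f);  // 2 + 1i
    // The real operand is written as a float literal: an int operand does
    // not match the templated operator+ for complex<float>.
    return 1.0f + twoI / divisor;
}
```

Working the division by hand gives 2i / (2 + 1i) = 0.4 + 0.8i, so the result should be approximately 1.4 + 0.8i.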

Properties in D

D uses a feature called properties. Every variable, type and expression has properties associated with it. The properties init, sizeof, alignof, mangleof, and stringof are always defined.

Here is an example of properties on an array:

import std.stdio;

int main()
{
    char[] hello = "Uryyb Jbeyq";
    writefln(hello.sizeof);
    writefln(hello.length);
    writefln(hello);
    return 0;
}

And the resulting output:

16
11
Uryyb Jbeyq

Classes and structs have properties as well. By default they have the five that all types have, but more can be defined as well. For classes and structs, a property is simply any method with 0 or 1 arguments. Thus to get a property:

var = obj.propertyName;
// is translated into
var = obj.propertyName();

And to set a property:

obj.propertyName = var;
// is translated into
obj.propertyName(var);

Consider the following class definition:

class Person
{
    protected char[] fName;
    protected char[] lName;

    this(char[] firstName, char[] lastName)
    {
        fName = firstName;
        lName = lastName;
    }

    public char[] name()
    {
        return fName ~ " " ~ lName;
    }

    public char[] firstName(char[] str)
    {
        return fName = str;
    }
}

Then the following program will give the following output:

import std.stdio;

int main()
{
    Person p = new Person("Some","Guy");
    writefln(p.name);
    p.firstName = "Another";
    writefln(p.name);
    return 0;
}

And the output:

Some Guy
Another Guy

Property setters can be overloaded like other methods, and the actual method called will depend on the type of the value being assigned to the property.

This enables D to conform to the Uniform Access Principle [BM2005] and gives programmers more flexibility. The traditional design pattern of getter and setter functions can be implemented transparently.
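
For contrast, here is roughly how the same getter and setter pattern looks in C++, which has no property syntax. This is a hypothetical counterpart of the Person class written for illustration; the accessors must be called with explicit parentheses, so callers can tell a method from a plain field.

```cpp
#include <string>

// Hypothetical C++ counterpart of the D Person class.
class Person {
public:
    Person(const std::string& firstName, const std::string& lastName)
        : fName(firstName), lName(lastName) {}

    // Getter: computed from two fields on each call.
    std::string name() const { return fName + " " + lName; }

    // Setter: overloads could be added for other argument types.
    void firstName(const std::string& value) { fName = value; }

private:
    std::string fName;
    std::string lName;
};
```

Here p.name() and p.firstName("Another") take the place of D's p.name and p.firstName = "Another".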

Conclusion

Both C++ and D promote unique styles of object oriented programming. While they do share some similarities, there are a large number of differences in the details of how things are done and the constraints that are set upon different parts.

C++ is an old and widely used language. C++ is heavily dependent on STL, using objects and classes for almost anything possible. C++ has many, many features and writing a compliant compiler would be a difficult task.

D is a relatively new language. It solves many of the problems with C, but it breaks compatibility with the language too. To make the language easier to understand and use, D does not add a number of the less useful features of C++, like multiple inheritance and non-virtual methods.

While classes and objects are a fundamental part of the language, arrays can sometimes be used to achieve the same effect in a much simpler and more efficient manner. D also has many other features that are interesting, but not particularly related to OOP, like contracts and built-in unit tests. It is definitely a language worth observing.

Notes

The code samples in this document were all compiled and tested with the GNU Compiler Collection (GCC) version 4.1.2 with the GDC (D front-end) version 0.23. The D code should be compatible with version 1.007 of the Digital Mars D compiler (DMD).

References

[BS2007] Bjarne Stroustrup. 2007-1-28. The C++ Programming Language.
[SP2003] Herb Sutter and Tom Plum. 2003-3-3. Why We Can’t Afford Export.
[D2007C] 2007-3-23. D Complex Types and C++ std::complex. In Digital Mars – The D Programming Language.
[BM2005] Bertrand Meyer. 2005-10. Business plus pleasure. EiffelWorld.