The Joy of Reference Counting
David Chappell - August 1997
One of the standard parts of object technology is the notion of
an object's lifecycle, the key aspects of which are creation and
deletion. To create a new instance of some class, for example, C++
and Java programmers can use the new operator, while a COM programmer
might call CoCreateInstance to create a new COM object. While it
exists, this object instance can be accessed by one or more clients,
and some memory is typically allocated for its state.
Figuring out how to get rid of an object instance when it's no longer needed is a bit more challenging. One option is to assume omniscience on the part of the programmer: just as she knows when to create a new object, she also knows when to destroy it. This is essentially the strategy used in C++, where an object allocated with new can be destroyed with delete.
In Java, however, a client can create an object, use it for a while, then just forget about it. Objects with no active clients are eventually deleted by Java's garbage collector, the existence of which frees programmers from worrying about explicitly deleting objects. Automatic garbage collection is a very nice thing to have, since it's easy to forget to delete an object when you're done using it.
Deleting COM Objects
Unlike C++, COM doesn't provide an explicit delete operation. COM also doesn't necessarily assume the presence of an automatic garbage collector. Instead, COM relies on reference counting to determine when an object can safely delete itself. To accomplish this, each COM object maintains a count of how many clients hold references to its interfaces. Every time the object hands out a pointer to one of its interfaces, it adds one to that reference count. In some cases, when a client acquires an interface pointer to an object from a source other than the object itself, that client must call the object's AddRef method, which also causes the object to increment its reference count by one. Whenever a client is finished using an interface pointer, it calls Release on that pointer, and the object subtracts one from the reference count. When all clients have finished using all of an object's interfaces, i.e., when the reference count falls to zero, the object typically commits suicide, freeing any resources it has been consuming.
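The bookkeeping just described can be sketched in a few lines of C++. This is a minimal illustration, not real COM. The class name `CountedObject` and its `liveObjects` counter are hypothetical. The class implements only the AddRef/Release half of IUnknown (there is no QueryInterface), and its counter is not thread-safe. The creator receives the first reference, and the object deletes itself when Release drops the count to zero.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object. Real COM objects implement
// IUnknown (QueryInterface, AddRef, Release); only the counting part is
// sketched here, and it is not thread-safe.
class CountedObject {
public:
    static int liveObjects;  // instances currently alive (for illustration only)

    // The creator holds the first reference, so the count starts at one.
    CountedObject() : refCount_(1) { ++liveObjects; }

    // Another holder of an interface pointer: bump the count.
    unsigned long AddRef() { return ++refCount_; }

    // A holder is finished: drop the count and self-destruct at zero.
    unsigned long Release() {
        unsigned long remaining = --refCount_;
        if (remaining == 0) {
            delete this;  // the object "commits suicide" when no references remain
        }
        return remaining;
    }

private:
    // Private destructor: clients cannot delete the object directly.
    ~CountedObject() { --liveObjects; }

    unsigned long refCount_;
};

int CountedObject::liveObjects = 0;
```

A client that obtains a second copy of the pointer calls AddRef, and every holder calls Release exactly once when it is done; forgetting either call produces exactly the premature or lingering objects the next paragraph describes.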
It doesn't take a whole lot of thought to realize that requiring software developers to correctly call AddRef and Release all the time might lead to problems. Some developers (not readers of this magazine, of course, but some other developers) might forget to call AddRef when they should, or they might call Release too often or not often enough. Errors like these will result in odd behavior, such as objects that die prematurely or wear out their welcome by hanging around after all of their clients are gone. Yet the original view of C++ COM programming assumed just this kind of correctness on the part of COM programmers. The predictable result was that, sometimes, things didn't work as expected.
COM Reference Counting Today
The situation is different today. For one thing, COM is supported by lots of different languages, including Visual Basic, Java, and many more. In most of these cases, reference counting is hidden from the programmer, who never needs to call AddRef or Release. Instead, the language runtime takes care of the work required to make reference counting work correctly.
For example, in Microsoft's implementation of the Java Virtual Machine, a Java client can treat an external COM object as if it were a Java object. When the client is done using the object, it can forget about it, just as with any other Java object. As always, the Java garbage collector notices the now-unused object and proceeds to delete it. If the garbage collector in Microsoft's Java VM determines that the object in question is a COM object, it simply calls Release on the object. Reference counting is still used to control the lifetime of every COM object, but the Java programmer is freed from the responsibility of worrying about it: it's taken care of automatically.
Even C++ programmers can now avoid this potentially error-prone task. Microsoft's Visual C++ 5.0 can automatically generate smart pointer classes that know how to do reference counting correctly. While many C++ COM programmers have long used this technique to simplify their lives, having the classes produced automatically is quite handy. Reference counting remains the underlying mechanism used for controlling the lifetime of COM objects, but it no longer need be the programmer's concern.
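The idea behind such smart pointers can be shown with a small hand-written template. This is a hypothetical `RefPtr`, not the classes Visual C++ generates: copying the pointer calls AddRef, and destroying it calls Release, so the count stays correct without any explicit calls in client code. The `Widget` type is a stand-in reference-counted object included only so the sketch compiles on its own.

```cpp
#include <cassert>
#include <utility>

// Hypothetical reference-counted object, included so the example is self-contained.
class Widget {
public:
    static int liveObjects;
    Widget() : refCount_(1) { ++liveObjects; }  // creator owns the first reference
    unsigned long AddRef() { return ++refCount_; }
    unsigned long Release() {
        unsigned long remaining = --refCount_;
        if (remaining == 0) delete this;
        return remaining;
    }
private:
    ~Widget() { --liveObjects; }
    unsigned long refCount_;
};
int Widget::liveObjects = 0;

// Hypothetical smart pointer: copies call AddRef, destruction calls Release.
template <typename T>
class RefPtr {
public:
    RefPtr() : p_(nullptr) {}
    // Adopts a reference the caller already owns, so no AddRef here.
    explicit RefPtr(T* p) : p_(p) {}
    // Copying creates a new holder, so the object's count must go up.
    RefPtr(const RefPtr& other) : p_(other.p_) { if (p_) p_->AddRef(); }
    // Moving transfers the existing reference; the count is unchanged.
    RefPtr(RefPtr&& other) noexcept : p_(other.p_) { other.p_ = nullptr; }
    // Copy-and-swap: the parameter's destructor releases the old pointee.
    RefPtr& operator=(RefPtr other) noexcept {
        std::swap(p_, other.p_);
        return *this;
    }
    // Leaving scope releases this holder's reference automatically.
    ~RefPtr() { if (p_) p_->Release(); }
    T* operator->() const { return p_; }
    T* get() const { return p_; }
private:
    T* p_;
};
```

The raw-pointer constructor adopts an existing reference rather than adding one, matching an object that starts life with a count of one. Any real smart pointer class has to make that same choice explicit, since getting it wrong reintroduces exactly the AddRef/Release mismatches the class is meant to prevent.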
Reference Counting for Remote Objects
Hiding reference counting from programmers is unquestionably a good thing, but it doesn't affect another potential problem: how can reference counting be done effectively across a network? Distributed COM (DCOM) faces exactly this problem, since COM uses reference counting to control the lifetimes of both local and remote objects.
There are two main things to worry about with remote reference counting. The first concern is efficiency: if clients call AddRef and Release frequently (and they typically do), one could imagine that lots of network traffic might be generated just keeping each remote object's reference count up to date. The second problem is making sure that unexpected client failures (remember, we're talking about clients running Windows here) don't result in garbage objects that hang around forever.
DCOM's designers handled the first problem by realizing that all an object needs to avoid premature self-immolation is a non-zero reference count. Whether the count's value is one or one hundred, the object won't go away. This means that a large number of the AddRef and Release calls clients make on remote objects can be handled locally on the client's system: as long as an object thinks it has at least one client, it won't go away. In DCOM, most calls to AddRef and Release go no farther than the client proxy (DCOM's term for a client stub); they aren't actually sent across the network. Only when the last Release occurs is a message actually sent to the remote object. While this means that the object's reference count won't always match its actual number of clients, who cares? The result is the same (objects die only when they're supposed to), and a good deal of unnecessary network traffic is avoided.
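A rough model of this optimization follows, assuming a proxy that acquires one remote reference when it is created and then counts its clients' AddRef and Release calls locally. `RemoteObject`, `Proxy`, and their method names are hypothetical; the network is represented by nothing more than a message counter, and real DCOM proxies do considerably more than this.

```cpp
#include <cassert>

// Hypothetical model of the remote object. It sees only the messages
// that actually cross the network, counted in messagesReceived.
struct RemoteObject {
    unsigned long refCount = 0;
    int messagesReceived = 0;
    bool destroyed = false;

    void RemoteAddRef() { ++messagesReceived; ++refCount; }
    void RemoteRelease() {
        ++messagesReceived;
        if (--refCount == 0) destroyed = true;  // the object would delete itself here
    }
};

// Hypothetical client-side proxy: AddRef and Release are counted locally,
// and only the final Release is forwarded to the remote object.
class Proxy {
public:
    explicit Proxy(RemoteObject& remote) : remote_(remote), localCount_(1) {
        // One remote reference stands in for all of the local ones.
        remote_.RemoteAddRef();
    }

    // Handled entirely on the client's machine: no network traffic.
    unsigned long AddRef() { return ++localCount_; }

    unsigned long Release() {
        unsigned long remaining = --localCount_;
        if (remaining == 0) {
            remote_.RemoteRelease();  // one message when the last local reference goes
        }
        return remaining;
    }

private:
    RemoteObject& remote_;
    unsigned long localCount_;
};
```

With this arrangement, a client can call AddRef and Release a hundred times each and the remote object sees no extra traffic; it only ever observes a count of one until the final Release arrives.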
The second problem is a little harder. When a client dies unexpectedly, it will never be able to call Release on the objects to which it holds references. Without some way for the remote object to learn about its client's untimely demise, it will never be able to delete itself. If this happens a lot (as it will in a network of any reasonable size), servers will grow without bound and will eventually need to be shut down and restarted to free those resources. This is an unattractive solution, one that's bound to annoy existing clients that are using those servers.
There's a standard way for an object to learn about its client's death: the client can periodically send the object's server a packet, reassuring the object that its client is still alive. If enough time passes without the arrival of one of these keepalive packets, the server can assume the client is dead and take the appropriate action. This is exactly what DCOM does. Every two minutes a ping packet is sent to the server; if no such packets arrive for three ping periods (six minutes), the server assumes the client is dead and calls Release as needed.
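The server-side timeout rule can be sketched as follows, using the two-minute period and three missed pings given above. `PingMonitor` and its methods are hypothetical names; times are plain seconds supplied by the caller, and the sketch only reports which clients appear dead, leaving the actual Release calls to the caller. Treating exactly 360 seconds of silence as death (rather than waiting for strictly more) is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Figures from the text: a ping every two minutes, three missed periods allowed.
constexpr long kPingPeriod = 120;                             // seconds
constexpr int kMissedPingsAllowed = 3;
constexpr long kTimeout = kPingPeriod * kMissedPingsAllowed;  // 360 s, six minutes

// Hypothetical server-side bookkeeping for keepalive pings.
class PingMonitor {
public:
    // Record a ping (or first contact) from a client at time `now`.
    void OnPing(const std::string& client, long now) { lastPing_[client] = now; }

    // Return clients that have been silent for the full timeout and stop
    // tracking them, so the caller can Release their references.
    std::vector<std::string> CollectDeadClients(long now) {
        std::vector<std::string> dead;
        for (auto it = lastPing_.begin(); it != lastPing_.end();) {
            if (now - it->second >= kTimeout) {
                dead.push_back(it->first);
                it = lastPing_.erase(it);  // erase returns the next iterator
            } else {
                ++it;
            }
        }
        return dead;
    }

private:
    std::map<std::string, long> lastPing_;  // client id -> time of last ping
};
```

A client that keeps pinging on schedule is never collected, while one that goes silent is reported once the six-minute window has elapsed since its last ping.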
The prospect of every client sending a ping packet to every object it holds a reference to on every server is chilling. If done naively, it's easy to imagine all of a network's bandwidth being eaten up by pinging. Fortunately, DCOM's designers didn't choose to do things this way. Instead, the DCOM infrastructure automatically creates ping sets consisting of every client/referenced object combination on each pair of machines. The entire ping set is then kept alive with a single packet sent every two minutes between those two machines. One packet every two minutes between each pair of machines, while not exactly free, is much better than blindly sending a separate ping packet for each client/object pair.
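The savings from ping sets come from grouping. The sketch below, whose types and function names are hypothetical, counts how many packets a single ping period would cost under each scheme: one per client/object pair if pinging is done naively, versus one per pair of machines when all references between the same two machines share a ping set. It models only the packet count, not what a ping actually carries.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One remote reference: a client on some machine holding a reference to an
// object on a server machine. Assumes each client/object pair appears once.
struct RemoteReference {
    std::string clientMachine;
    std::string client;
    std::string serverMachine;
    std::string object;
};

// Naive scheme: every client/object pair sends its own ping each period.
std::size_t NaivePingsPerPeriod(const std::vector<RemoteReference>& refs) {
    return refs.size();
}

// Ping-set scheme: all references between the same two machines are
// covered by a single packet each period.
std::size_t PingSetPacketsPerPeriod(const std::vector<RemoteReference>& refs) {
    std::set<std::pair<std::string, std::string>> machinePairs;
    for (const auto& r : refs) {
        machinePairs.insert({r.clientMachine, r.serverMachine});
    }
    return machinePairs.size();
}
```

For example, five references spread over two client machines and two servers cost five packets per period naively but only three with ping sets; the gap widens as more clients and objects share the same machine pairs.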
Complaining About Problems vs. Solving Them
The OMG has been fiercely critical of DCOM's pinging approach to distributed garbage collection, claiming that it will never scale. But the CORBA standards deal with this problem by pretending it doesn't exist: they don't define a solution. As a result, different CORBA-based products do different things. Some essentially require servers to be periodically shut down and restarted to avoid infinite growth due to garbage objects maintained for crashed clients. Others rely on some kind of pinging, whether based on TCP keepalives or an ORB-specific mechanism. The key point is that these allegedly standard products solve this crucial problem in different ways, which doesn't bode well for interoperability among them.
Reference counting isn't without cost. But for true component architectures, where an all-knowing component that knows when to delete everything can't exist, it's the only viable solution to the problem of deleting objects when they're no longer needed (in fact, even the apparently dead OpenDoc relied on reference counting). And when hidden beneath language-appropriate mechanisms, reference counting can be simple; programmers don't even need to know it's there.