
Understanding .NET's Common Type System
David Chappell - December 14, 2001
What is a programming language? One way to think about it is as
a specific syntax with a set of keywords that can be used to define
data and express operations on that data. While language syntaxes
differ, the underlying abstractions of most popular languages today
are very similar. All of them support various data types such as
integers and strings, all allow grouping executable code into methods,
and all provide a way to group data and methods into classes. When
a new programming language is defined, the usual approach is to
define underlying abstractions such as these, which are key aspects
of the language's semantics, concomitantly with the language's
syntax.
Yet there are other possibilities. Suppose you chose to define
core programming abstractions without mapping them to any particular
syntax. If the abstractions were general enough, they could then
be used in many different programming languages. Rather than inextricably
mingling syntax and semantics, they could be kept separate, allowing
different languages to be used with the same set of underlying abstractions.
This is exactly what's done in the Common Type System (CTS).
A fundamental part of the .NET Framework's Common Language
Runtime (CLR), the CTS specifies no particular syntax or keywords,
but instead defines a common set of types that can be used with
many different language syntaxes. Each language is free to define
any syntax it wishes, but if that language is built on the CLR,
it will use at least some of the types defined by the CTS. While
the creator of a CLR-based language is free to implement only a
subset of the types defined by the CTS, and even to add types of
her own to her language, most languages built on the CLR make extensive
use of the CTS-defined types. Visual Basic.NET, C#, and pretty much
every other language used with the .NET Framework rely heavily on
the CTS.
Figure 1: The CTS defines reference and value types, all of which
inherit from a common Object type.
Figure 1 shows a substantial subset of the types defined by the CTS.
The first thing to note is that every type inherits either directly
or indirectly from a base Object type. Notice, too, that every type
defined by the CTS is either a reference type or a value
type. As their names suggest, an instance of a reference type always
contains a reference to a value of that type, while an instance
of a value type contains the value itself. Reference types inherit
from Object, either directly or through other reference types, while
all value types inherit from a type called ValueType, which in turn
inherits from Object.
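The inheritance relationships just described can be checked with reflection. What follows is a minimal C# sketch, not part of the original article; the class name TypeHierarchyDemo and the helper InheritanceChain are hypothetical names chosen for illustration. It walks each type's BaseType chain up to Object.

```csharp
using System;

public static class TypeHierarchyDemo
{
    // Hypothetical helper: follows BaseType links from the given type up to
    // System.Object and returns the chain as a readable string.
    public static string InheritanceChain(Type type)
    {
        string chain = type.Name;
        for (var parent = type.BaseType; parent != null; parent = parent.BaseType)
        {
            chain += " -> " + parent.Name;
        }
        return chain;
    }

    public static void Main()
    {
        // A value type reaches Object through ValueType.
        Console.WriteLine(InheritanceChain(typeof(int)));     // Int32 -> ValueType -> Object
        // String is a reference type that inherits directly from Object.
        Console.WriteLine(InheritanceChain(typeof(string)));  // String -> Object
        Console.WriteLine(typeof(int).IsValueType);           // True
        Console.WriteLine(typeof(string).IsValueType);        // False
    }
}
```

C#'s int and string keywords are simply that language's names for the CTS types Int32 and String, which is why the printed names differ from the keywords used in the code.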
Value types tend to be simple. As Figure 1 shows, the types in
this category include Byte, Char, signed integers of various lengths,
unsigned integers of various lengths, single- and double-precision
floating point, Decimal, Boolean, and more. Reference types, by
contrast, are more complex. As shown in the figure, for instance,
the CTS's reference types include the following:
- Class: A CTS class can have methods, events, and properties;
  it can maintain its state in one or more fields; and it can
  contain nested types. Classes have one or more constructors,
  which are initialization methods that execute when a new instance
  of the class is created. A class can directly inherit from
  at most one other class, although it can act as the direct parent
  of any number of inheriting child classes. A class can also
  implement one or more interfaces, described next.
- Interface: An interface can include methods, properties,
  and events. Unlike a class, an interface can inherit from one
  or more other interfaces simultaneously.
- Array: An array is a group of values of the same type.
  Arrays can have one or more dimensions, and their upper and
  lower bounds can be set more or less arbitrarily (although languages
  built on the CTS commonly restrict this freedom).
- String: A string is a sequence of Unicode characters.
  Strings can't be modified once they're created.
- Delegate: A delegate is effectively a pointer to a method.
  Delegates are commonly used for event handling and callbacks.
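To make the reference types in this list concrete, here is a small C# sketch showing one of each. Every type and member name in it (IReadable, IWritable, IReadWrite, TextBuffer, LoggingBuffer, CountingBuffer, Transformer, ReferenceTypesDemo) is hypothetical and invented for illustration; only Array.CreateInstance, GetLowerBound, GetUpperBound, SetValue, and ToUpper are existing .NET library calls.

```csharp
using System;

// Interfaces can inherit from several other interfaces at once.
public interface IReadable { string Read(); }
public interface IWritable { void Write(string text); }
public interface IReadWrite : IReadable, IWritable { }

// A class has at most one direct base class but can implement interfaces.
public class TextBuffer : IReadWrite
{
    private string contents = "";                          // state kept in a field
    public TextBuffer() { }                                // constructor
    public int Length { get { return contents.Length; } }  // property
    public string Read() { return contents; }
    public void Write(string text) { contents += text; }
}

// Two classes sharing the same direct parent.
public class LoggingBuffer : TextBuffer { }
public class CountingBuffer : TextBuffer { }

// A delegate type: a reference to any method with this signature.
public delegate int Transformer(int x);

public static class ReferenceTypesDemo
{
    public static int Twice(int x) { return x * 2; }

    // Creates a one-dimensional int array whose indices start at lowerBound.
    public static Array MakeShiftedArray(int length, int lowerBound)
    {
        return Array.CreateInstance(typeof(int), new[] { length }, new[] { lowerBound });
    }

    public static void Main()
    {
        IReadWrite buffer = new LoggingBuffer();
        buffer.Write("CTS");
        Console.WriteLine(buffer.Read());             // CTS

        Array shifted = MakeShiftedArray(3, 10);      // valid indices are 10..12
        shifted.SetValue(7, 10);
        Console.WriteLine(shifted.GetLowerBound(0));  // 10
        Console.WriteLine(shifted.GetUpperBound(0));  // 12

        string original = "abc";
        string upper = original.ToUpper();            // builds a new string
        Console.WriteLine(original + " " + upper);    // abc ABC

        Transformer transform = Twice;                // delegate bound to a method
        Console.WriteLine(transform(21));             // 42
    }
}
```

The array with a lower bound of 10 has to be built through the library call because C#'s own array syntax always starts at zero, an instance of a language restricting a freedom the CTS allows.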
To really understand the difference between value types and reference
types, a fundamental distinction in the CTS, you must first
understand how memory is allocated for instances of each type. In
managed code, values can have their memory allocated either on the
stack managed by the CLR or on a CLR-managed heap.
Variables allocated on the stack are typically created when a method
is called or when a running method creates them. In either case,
the memory used by stack variables is automatically freed when the
method in which they were created returns. Variables allocated on
the heap, however, don't have their memory freed when the method
that created them ends. Instead, the memory used by these variables
is freed via a process called garbage collection.
A basic difference between value types and reference types is that
an instance of a value type typically has its value allocated on the
stack, while an instance of a reference type has only a reference to
its actual value allocated on the stack. The value itself is allocated
on the heap. Figure 2 shows an abstract picture of how this looks. In
the case shown here, three instances of value types (Int16, Char, and
Int32) have been created on the managed stack, while one instance of
the reference type String exists on the managed heap. Note that even
the reference type instance has an entry on the stack (a reference
to the memory on the heap), but the instance's contents are stored
on the heap.
Figure 2: Instances of value types are allocated on the managed stack,
whereas instances of reference types are allocated on the managed
heap.
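One visible consequence of this layout is what happens on assignment. The C# sketch below is an illustration under stated assumptions rather than anything from the original article: PointValue, PointRef, and CopySemanticsDemo are hypothetical names. Assigning a value type copies the value itself, while assigning a reference type copies only the reference, so both variables end up pointing at the same heap object.

```csharp
using System;

public struct PointValue { public int X; }   // value type (inherits from ValueType)
public class PointRef { public int X; }      // reference type

public static class CopySemanticsDemo
{
    // Returns the X of the original struct after its copy has been modified.
    public static int CopyStruct()
    {
        var a = new PointValue { X = 1 };
        var b = a;       // copies the whole value
        b.X = 99;        // changes only the copy
        return a.X;
    }

    // Returns the X of the original object after it is modified via a second reference.
    public static int CopyClass()
    {
        var a = new PointRef { X = 1 };
        var b = a;       // copies only the reference
        b.X = 99;        // changes the single shared object on the heap
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(CopyStruct());   // 1
        Console.WriteLine(CopyClass());    // 99
    }
}
```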
There are cases when an instance of a value type needs to be treated
as an instance of a reference type. For situations like this, a
value type instance can be converted into a reference type instance
through a process called boxing. When a value type instance
is boxed, storage is allocated on the heap and the instance's
value is copied into that space. A reference to this storage is
placed on the stack. The boxed value is an object, a reference type
that contains the contents of the value type instance. A boxed value
type instance can also be converted back to its original form, a
process called unboxing.
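In C#, boxing happens implicitly when a value type instance is assigned to a variable of type object, and unboxing is written as an explicit cast. The following is a minimal sketch of that round trip; BoxingDemo and its method names are hypothetical and chosen for illustration.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes an Int32 and then unboxes it again.
    public static int RoundTrip(int value)
    {
        object boxed = value;   // boxing: the value is copied into a heap object
        return (int)boxed;      // unboxing: the value is copied back out
    }

    // Shows that the box holds its own copy of the value.
    public static bool BoxIsIndependentCopy()
    {
        int original = 5;
        object boxed = original;
        original = 6;           // does not affect the boxed copy
        return (int)boxed == 5;
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip(42));            // 42
        Console.WriteLine(BoxIsIndependentCopy());   // True

        object boxed = 42;
        Console.WriteLine(boxed.GetType().Name);     // Int32
    }
}
```

The boxed object still reports its type as Int32, which matches the description of a boxed value as an object containing the contents of the value type instance.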
CLR-based programming languages such as C# and Visual Basic.NET
construct their own type systems on top of the CTS types. Although
each language represents these types in its own way, their semantics
are essentially the same across typical CLR-based languages. Because
of this, no matter which CLR-based language you're working
in (C#, VB.NET, or something else), the CTS underlies a large
part of what you're doing. As Windows development shifts more
and more to .NET, the CTS will become the foundation for a growing
part of the world's software.