C#: A language alternative or just J--?, Part 1
What the new language for .Net and post-Java Microsoft means to you
C# (pronounced “C sharp”) is Microsoft researcher Anders Hejlsberg’s latest accomplishment. C# looks astonishingly like Java; it includes language features like single inheritance, interfaces, nearly identical syntax, and compilation to an intermediate format. But C# distinguishes itself from Java with language design features borrowed from Delphi, direct integration with COM (Component Object Model), and its key role in Microsoft’s .Net Windows networking framework.
In this article, I will examine common motivations for creating a new computer language, and speculate on which might have led to C#. Next I will introduce C# with regard to its similarities to Java. Then I will discuss a couple of high-level, fundamental differences in scope between Java and C#. I close the article by evaluating the wisdom (or lack thereof) in developing large applications in multiple languages, a key strategy for .Net and C#.
Currently, C# and .Net are available only as a C# language specification (not yet in final form), a “pre-beta SDK Technology Preview” for Windows 2000, and a quickly growing corpus of articles on MSDN. This article is based on those resources and some of my own speculation.
Read the whole series, “C#: A Language Alternative or Just J--?”:
- Part 1. What the new language for .Net and post-Java Microsoft means to you
- Part 2. An in-depth look into the semantic differences and design choices between C# and Java
Enter C#
Imagine you’re creating a new computer language, and you want to solve some of the traditional problems for C and C++ programmers: memory leaks, difficulty writing multithreaded applications, static linking, illegal pointer references, overly complex multiple-inheritance rules, and so on. To flatten the learning curve, you design the language to look a great deal like C and C++. Then you add garbage collection, integrated thread interlocking, and dynamic linking, you throw out pointers, you allow only single inheritance but introduce the concept of an interface, and so on. Five years ago, Sun Microsystems introduced Java technology, which did those things and was platform-neutral, to boot.
In June 2000, Microsoft preannounced C#, which was designed expressly for its nascent .Net application development framework. In addition to C#, the immensely talented Hejlsberg created the revolutionary languages Turbo Pascal and Delphi while at Borland, but also the counterrevolutionary Visual J++ while at Microsoft. C# and Java address many of the same problems with C and C++. In fact, C# looks so much like Java that you could very easily confuse them.
So why create C# at all? Is C# a “Java wannabe”? Since Microsoft obviously needs to deal with the Visual J++ developers it has left stranded, is C# just “Visual J--”; that is, Java with some new features and without the Sun logo, trademark, and narrow-eyed lawyers? Or is C# a technology that gives Windows developers the functionality of Java, could possibly compete directly with Java, and is useful in its own right?
It’s easy to be skeptical of C#, given its almost surreal similarity to Java in syntax, design, and even runtime behavior. It looks almost as if, having failed to corrupt the Java marketplace with proprietary extensions and strategic omissions, Microsoft has simply created a copy of Java, with a new name and a familiar market approach. That is not entirely the case: in the context of COM and .Net, C# may well have a place in the world of Windows development.
Motivation for creating a new language
A new computer language could be created as part of a research project, to explore new system architectures or new ideas in programming semantics, or to pull together advances from several other language projects to produce a more powerful language. Innovations in computer technology often change basic assumptions about programming and system development, and new languages arise to take advantage of new ideas. Special applications sometimes require new languages, which are tied intimately to the domain in which they operate. General-purpose languages, however, are usually created either to address existing languages’ inadequacies, to fill some business need, or both.
For example, C++ was created as an extension of the C programming language, and was originally called “C with classes.” Though innovative and extremely powerful, C suffered from problems with scalability, code fragility, and memory management complexity, among others. C++ was created as an object-oriented approach to solving those problems.
C++ has been widely accepted as a system development language, but its “improvements” came at the cost of increased complexity. C, and to a lesser extent C++, are widely considered to be highly portable, exemplified by the portability of the Unix operating system.
Portability between processors is different from portability between underlying operating system APIs. Different operating systems factor access to similar system services differently. The resulting “impedance mismatch” (to appropriate a lousy metaphor) creates complexity and potential flaws in the software layer where the application accesses system services. Anyone who has tried to create, for example, a GUI framework portable across platforms understands this problem.
Java was created, in part, to address the issues of language complexity, memory management, and cross-platform portability. Java also addresses the business needs of consumers and companies who want to leverage their existing hardware assets, instead of being locked into a particular platform by an operating system vendor. Finally, the rise of the Internet and the ubiquity of network computing make cross-platform portability and airtight security even more important.
C#, announced by Microsoft but not yet released, addresses technical and business problems that Microsoft has recently encountered. Despite several attempts at simplification, the COM object programming framework has never been easy to use, and DCOM (Distributed Component Object Model) adds yet another layer of difficulty. Thus, COM development has been mostly limited to highly trained (and expensive) Windows C/C++ programmers, and Visual Basic users who have taken the time to learn to use a stripped-down interface to COM. The C and C++ languages alone require a great deal of skill to be used effectively and safely; Visual Basic has some object-oriented-like features, but is not a true object-oriented language.
When Java burst onto the scene in 1995, it grabbed an enormous amount of mindshare from Microsoft; people started to talk about a world where the operating system underlying an application was irrelevant. Java looked so much like C and C++ that existing programmers came up to speed in record time. Java also provided cross-platform portability at the operating-system level and addressed many problems that had limited the productivity of C and C++ programmers.
Microsoft initially embraced Java as a language that solved problems with C and C++ while maintaining the training assets of the existing C and C++ programmer base. Unfortunately, Microsoft found that when it tried to extend Java in Visual J++ and tie it more closely to the Windows operating system, Sun hit Microsoft with a lawsuit (see Resources) for violating the terms of its licensing agreement. As a result, Microsoft dumped its Visual J++ product (as well as the developers it had attracted to the tool). There was talk last year of a possible new Microsoft language called Cool, which Microsoft did not acknowledge. Rumor has it C# is that language. (Microsoft still sells Visual J++, but there has not been a new release since October 1998 and Visual J++ has no place in the .Net platform. Java is being integrated into .Net by a separate vendor.)
So what kind of language has Microsoft created? The next section discusses C# in terms of its similarity to Java, since an understanding of Java is common to most JavaWorld readers.
C# and Java similarities
In the grand tradition of programming tutorials that began with C, my comparison of Java and C# begins with a familiar “Hello, world!” example. The code for this multilingual example appears in Table 1.
Java | C# |
---|---|
 | |
The similarities between these two simple programs are obvious. Both encapsulate their main function, which is static, within an enclosing class. Both access a global name, `System`, that wraps access to system services. The similarities do not end with source code: Java, as you probably know, compiles to byte code, the operation codes in the instruction set of the Java Virtual Machine. C# compiles to MSIL (Microsoft Intermediate Language, formerly known as portable binary format), an intermediate, assembly-like language to which all .Net languages compile. MSIL could easily be called “Windows byte code”; however, just-in-time (JIT) compiling is only one of its design goals. MSIL’s design was influenced heavily by the design goal of language interoperability. (Learn more about this in the section entitled Intermediate Language below.)
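The listings in Table 1 are not reproduced here, so the following is only a minimal sketch of what the Java side of such a program conventionally looks like. The class name `HelloWorld` and the `greeting()` helper are illustrative assumptions, not the original listing.

```java
// Minimal "Hello, world!" sketch in Java.
// HelloWorld and greeting() are hypothetical names chosen for illustration.
class HelloWorld {
    // Helper that returns the message, so it can be inspected without
    // capturing console output.
    static String greeting() {
        return "Hello, world!";
    }

    // The entry point is a static method enclosed in a class.
    public static void main(String[] args) {
        // System is a class in java.lang; out is its static PrintStream field.
        System.out.println(greeting());
    }
}
```

Note that `System` is reached here without any import, which is the point the text makes about it being a globally visible name.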
External reference
Usage of code external to a module is handled similarly in Java and C#. Java uses the `import` keyword to declare references to external names; C# provides the `using` keyword, as shown in Table 2.
Java | C# |
---|---|
 | |
The two keywords work in a similar manner; both allow you to use names from another compilation unit without fully specifying the name. Neither C# nor Java uses the C preprocessor construct `#include`, because the reference to the external module is at a logical, not a lexical, level. This means the external reference is resolved at link time, as well as at compile time. This has special significance for C#, since it allows modules to subclass and operate with modules written in other languages.
The difference between `import` in Java and `using` in C# is that Java has a concept of packages, which has a specific meaning in the context of symbol accessibility, while C# uses namespaces much like those of C++. The `using` keyword makes all names in the given namespace accessible to a module. So, the line `using System;` lets you access the .Net runtime namespace `System`. The `System` namespace contains the `Console` class, whose static method `System.Console.WriteLine()` is then accessible as `Console.WriteLine()` without specifying the `System` namespace. (Compare Tables 1 and 2.) In the Java example, `System` is a class defined in `java.lang`, which is implicitly imported into every Java source file; therefore, the `import` statement is not needed. However, including `import java.lang.System.*;` does not permit you to omit the `System.` from `System.out.println` as in C#, because `System` is a class, not a namespace. Thus, external names are referenced in a way that seems similar, but has different underlying mechanisms. This difference could be more confusing to programmers accustomed to Java than to C++ programmers who understand and use namespaces. Neither option is more expressively powerful; the two languages simply use different mechanisms to disambiguate names.
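The Java half of this comparison can be sketched as follows. The class name `ImportSketch` and its two methods are hypothetical, and `java.util.Vector` is used only as a convenient example of an imported name.

```java
import java.util.Vector; // lets this file say "Vector" instead of "java.util.Vector"

// ImportSketch is a hypothetical class used only to illustrate name resolution.
class ImportSketch {
    // Returns true when the short name and the fully qualified name
    // resolve to the same class.
    static boolean sameClass() {
        Class<?> shortName = Vector.class;          // resolved through the import
        Class<?> fullName = java.util.Vector.class; // fully qualified, no import needed
        return shortName == fullName;
    }

    // System needs no import because java.lang is imported implicitly.
    // It is a class, so its members are still reached through "System.".
    static String describeSystem() {
        return System.class.getName();
    }
}
```

The import changes only how a name may be written in the source file; both spellings refer to the same class.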
Control constructs
Simple statements in C# and Java look alike, since both languages descend primarily from C and C++. Table 3 presents common language constructs in C# and Java.
Statement | Java | C# |
---|---|---|
if/then/else | | |
switch | | |
while | | |
do/while | | |
foreach | | |
break/continue | | |
return | | |
new | | |
throw, try/catch/finally | | |
exclusive access | | |
class definition | | |
interface definition | | |
interface implementation | | |
As you can see in Table 3, most procedural statements in C# are similar, if not identical, to their corresponding statements in Java, and both languages are very similar to C++. There are a few differences worth noting:
The `foreach` keyword: C# has a built-in construct, `foreach`, used for iterating collections. In Table 3, this keyword iterates over a collection of `int`s. The expression `(int j in theList)` defines an iteration variable `j`, which is assigned each integer value of the array `theList` in turn. If a value in `theList` cannot be converted and assigned to an integer, an exception is thrown; in other words, the iterated collection is not of a specific type, and therefore the `foreach` keyword is not type-safe at compile time.
Java lacks the `foreach` keyword, of course, so one possible implementation of collection iteration appears in Table 3, using `java.util.Vector` as the collection. Vectors are not type-safe either, and the collection iteration relies on methods in the collection class, rather than on a language construct like `foreach`. In addition, Java’s distinction between `Integer` objects and `int` primitives results in some typecasting that is unnecessary in C#.
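A rough sketch of the kind of `java.util.Vector` iteration described above follows. It is not the Table 3 listing; the class and method names are hypothetical, and the `@SuppressWarnings` annotations are there only because current compilers warn about the untyped `Vector` that this style relies on.

```java
import java.util.Enumeration;
import java.util.Vector;

// VectorIterationSketch is a hypothetical class illustrating iteration
// through methods of the collection rather than a language construct.
class VectorIterationSketch {
    // Sums the Integer elements of an untyped Vector.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int sum(Vector theList) {
        int total = 0;
        for (Enumeration e = theList.elements(); e.hasMoreElements();) {
            // The cast is checked only at runtime; a non-Integer element
            // raises ClassCastException on this line.
            Integer boxed = (Integer) e.nextElement();
            total += boxed.intValue(); // unwrap the object into a primitive int
        }
        return total;
    }

    // Shows that a wrongly typed element is accepted by the compiler
    // and rejected only when the loop reaches it.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rejectsNonInteger() {
        Vector mixed = new Vector();
        mixed.addElement(Integer.valueOf(1));
        mixed.addElement("two"); // compiles, because the Vector holds plain Objects
        try {
            sum(mixed);
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }
}
```

The explicit cast and the `intValue()` call are the typecasting overhead the text refers to.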
Empty `catch` clauses: In C#, the clause that follows the `catch` keyword and specifies which exception is caught is optional. If the catch clause is absent, the `catch` block is executed for any exception the `try` block throws. If the programmer doesn’t care about the contents of the thrown exception, omitting the catch clause alleviates the burden of defining a variable (the thrown exception) that isn’t used. I think the optional catch clause encourages programmers to be cavalier about ignoring error conditions, and therefore encourages poor programming practice. Compared to the absence of explicit exception declarations (the next item in this list), which I consider a serious design flaw, this minor language feature is a venial sin at worst.
No explicit exception declarations: In Java, a `throws` clause is mandatory for checked exceptions; in C#, `throws` doesn’t exist. Novice Java programmers often complain that `throws` is tedious, and usually short-circuit their program exceptions by using `try`/`catch` statements with empty `catch` blocks. That technique is the software equivalent of putting pennies behind fuses: it works fine until something goes wrong. (Regular readers of my columns are not invited to point out that I use this technique myself in sample code for my articles. I don’t do it in “real” programs.) By not requiring explicit exception declarations in method signatures, C# values short-term programmer convenience over program safety and correctness.
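To make the Java side of this point concrete, here is a small sketch contrasting an explicitly handled checked exception with the empty `catch` block criticized above. All names (`CheckedExceptionSketch`, `readSetting`, and so on) are hypothetical, and `IOException` simply stands in for any checked exception.

```java
import java.io.IOException;

// CheckedExceptionSketch is a hypothetical class used only for illustration.
class CheckedExceptionSketch {
    // The throws clause is part of the signature: callers must catch
    // IOException or declare it themselves, or the code does not compile.
    static String readSetting(String name) throws IOException {
        if (name == null || name.isEmpty()) {
            throw new IOException("no setting name given");
        }
        return "value-of-" + name;
    }

    // Handles the failure explicitly by substituting a fallback value.
    static String readOrDefault(String name, String fallback) {
        try {
            return readSetting(name);
        } catch (IOException e) {
            return fallback;
        }
    }

    // The "pennies behind fuses" style: the exception is swallowed, and the
    // caller receives null with no indication of what went wrong.
    static String readIgnoringErrors(String name) {
        String result = null;
        try {
            result = readSetting(name);
        } catch (IOException ignored) {
            // empty on purpose, to illustrate the anti-pattern
        }
        return result;
    }
}
```

Both callers compile, which is why the compiler's insistence on `throws` does not by itself prevent the careless version.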
Interface and class definition syntax: C# replaces the Java `extends` and `implements` keywords with a colon.
Java and C# are similar not only in syntax, but in architectural language design decisions. C#’s designers arrived at many of the same conclusions about how to “fix” C++ that Java’s designers had reached five years (or so) earlier.
Both languages use automatic garbage collection for memory management (though C# allows access to pointer types and unmanaged memory in so-called “unsafe” code sections). Both have eliminated the `delete` operator. Neither currently has generic types (implemented in C++ as templates), which are clearly needed. Both allow only single inheritance, offering interfaces as an arguably superior alternative to multiple inheritance. Both provide a hierarchical naming scheme, though C#’s namespaces are a more C++-like solution than are Java packages. Both have a base object type, from which all other classes are, by definition, derived. And so on. This does not necessarily mean that C# directly copies Java. Perhaps these decisions simply reflect the current consensus about what is desirable in this type of language.
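Several of these shared decisions (single inheritance, interfaces, and a common base object type) can be sketched in a few lines of Java. The type names below are hypothetical; in C#, per the syntax note above, the `extends` and `implements` keywords would both be written as a colon.

```java
// Named, Measurable, Shape, and Circle are hypothetical names for illustration.
interface Named {
    String name();
}

interface Measurable {
    double area();
}

// Declares no superclass, so it implicitly extends java.lang.Object.
abstract class Shape {
    @Override
    public String toString() { // overrides the method inherited from Object
        return "Shape";
    }
}

// Exactly one superclass (extends), any number of interfaces (implements).
class Circle extends Shape implements Named, Measurable {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    public String name() {
        return "circle";
    }

    public double area() {
        return Math.PI * radius * radius;
    }
}
```

A `Circle` can be used wherever a `Shape`, a `Named`, a `Measurable`, or a plain `Object` is expected, without any multiple-inheritance rules coming into play.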
The similarity between C++ and C# is a great benefit to any organization with an existing training investment in C and C++. Programmers accustomed to C++ will have no trouble understanding C#. Moreover, Windows programmers who invested time learning Java (often in the form of Visual J++) will come up to speed on C# even more quickly than C++ programmers. As I noted before, former Visual J++ programmers developing Windows applications will not have to choose between Java’s ease of use and clean design and an API designed specifically for Windows. Writing to the Windows API sacrifices cross-platform compatibility in any case, and Microsoft would rather have a language it can control than one controlled by the Java Community Process. C# provides Windows developers with a language as easy to use as Java, and provides Microsoft with a language, targeted directly at its platform, that it can control entirely. Interestingly, C# has been submitted to a standards body, a topic I will discuss further in Part 2 of this article. If C# becomes standardized, it will be interesting to see how much control over C# Microsoft is really willing to relinquish.
C# and Java contrasts
C#’s most intriguing facets are its differences from Java, not its similarities. This section (and much of Part 2 of this series) covers features of C# that Java implements differently or entirely lacks.
Intermediate language
Microsoft is very flexible about choosing when MSIL is compiled to the native machine code. The company takes care to say that MSIL is not interpreted, but compiled to machine code. It also understands that many programmers, if not most, accept the idea that Java programs are inherently slower than anything written in C. The implication is that MSIL-based programs (written in C#, Visual Basic, “Managed C++,” which is a version of C++ that conforms to the Common Language Subset, or CLS, and so on) will outperform “interpreted” Java byte code. Of course, this has yet to be demonstrated, since C# and other MSIL-producing compilers have not yet been released. But the ubiquity of JIT compilers for Java makes Java and C# relatively equal in terms of performance. Statements like “C# is compiled and Java is interpreted” are simply marketing spin. Java byte code and MSIL are both intermediate assembly-like languages that are compiled to machine code for execution, at runtime or otherwise.
COM integration
The biggest win for Windows programmers with C# may be its painless integration of COM, Microsoft’s Win32 component technology. In fact, it will eventually be possible to write COM clients and servers in any .Net language. Classes written in C# can subclass an existing COM component; the resulting class can be used as a COM component too, and can then be subclassed in, for example, JScript, to provide yet a third COM component. The result is an environment in which components are network services, subclassable in any .Net language.
Microsoft’s goal is to make component creation accessible from as many languages as possible, integrated within the .Net framework. Several vendors have already committed to creating .Net-enabled versions of programming languages as diverse as COBOL and Haskell. Developers could choose different languages to solve different problems. More important, programmers would not have to learn a new language to use .Net: they could choose one they already knew. For more on COM integration, see Jacques Surveyer’s excellent survey of C# in Dr. Dobb’s Journal. (See Resources.)
Best of breed or Franken-code?
The language interoperation goal certainly has its appeal. Imagine a system that uses “best of breed” languages for various tasks, “empowering” every developer to use his or her favorite language, “leveraging” existing information assets, and so on. The idea of being able to use any language anywhere, and even use multiple languages within a particular inheritance hierarchy, sounds exciting. However, I’m not sure I’m interested in that sort of excitement.
C# will initially ship with Visual Studio 7, which may be able to provide source-level debugging for multiple languages, an impressive achievement if Microsoft can make it work. But imagine actually working on a project of significant size in which multiple languages are used in a single application. Consider these concerns:
- Would you want to manage that project? (Yeah, OK hotshot, you understand all those languages, because you’re an über-hacker. Let me rephrase the question: Would you want your current boss to manage that project?)
- How many languages are you using? Three? Six? Eight? What about that guy in your group who refuses to code in anything but APL, Haskell, or Prolog? There’s a lot to like about those languages, but do you want to train everyone in your group to use them just so they can effectively debug the system? Or will Mr. Prolog have to sit in on all debugging sessions that use his code?
- Let’s say you’re using six languages — are you single-sourced for any of them? What will you do if the only company that makes the compiler for one of your languages goes out of business? Or if the other vendor’s language products are incompatible with your code because your developers used language extensions proprietary to the original, now unsupported, language version?
- What are the language version numbers for the n languages you’re using to implement your system? If you think this question doesn’t matter, think again. Languages may change more slowly than programs, but they do change.
- Q. What’s more of a headache than a bug in a compiler? A. Bugs in six compilers.
- Microsoft’s Common Language Subset, which describes language features necessary for .Net interoperation, places restrictions on languages that compile to MSIL. For example, Microsoft will provide Managed C++, which is C++ with some additional proprietary Microsoft “managed extensions.” C++ programmers will have to learn to use these Microsoft-specific extensions. The same may be true for other languages.
- What if you decide you want or need to change platform vendors? What if you someday decide that Microsoft’s products aren’t meeting your needs? Or some technology you desperately want or need doesn’t integrate with .Net (possibly because Microsoft locks that technology out of .Net because it’s a threat to other MS products)? Where will you go? And, more important, in this scenario, who decides where you go? (Hint: it’s not you.)
With a little thought, I’m sure you can add to this list. The reality probably isn’t quite that bad, though. Many large data processing systems (the World Wide Web, for example) are composed of components written in multiple languages. Most servers, and many applications, are extensible in multiple scripting and/or compiled languages. Used wisely, language mix-and-match in a project can provide needed flexibility and power, but only if the languages are selected for valid architectural reasons, not simply to “leverage” (salvage) creaky, poorly structured legacy code, to make use of undertrained entry-level programmers, or to satisfy managers cutting corners on training costs.
Conclusion
This article, the first in a two-part series, has provided a superficial overview of C#, focusing on its similarity to Java. The next article will cover C#’s language features, and demonstrate that C# and Java are actually quite different: they differ in many subtle semantic details and design choices, and fill different technological and market niches. I’ll also cover Microsoft’s attempts to standardize C#, and what it may mean to Java. Stay tuned next month.