Why are there so many programming languages?

A few friends who aren’t in software have recently asked me why there are so many different programming languages out there, and what makes them distinct from one another. They’ve accompanied these queries with rather delightful misconceptions such as “XML sounds like the adult version of HTML”, and “Ruby on Rails, is that like for women?” The latter is totally understandable, by the way, given that the GoldieBlox engineering toy set aimed at girls has a character named Ruby Rails, after the Ruby language and its Rails web framework.

As I’ve attempted to answer these questions for them, I’ve come up with a few analogies I find useful, such as European languages and car transmissions. I’d like to share them more widely, so that a broader audience can get a window into the strange inner workings of the software world.

Comparing programming languages to natural languages

The dominant languages of the world change as new societies emerge and power shifts. Looking back at the last millennium, roughly a dozen new natural languages have become influential, which gives us one new major language about every 100 years. Programming languages evolve much faster; over the last 50 years, a new language has been popularized about every 5 years. Later I’ll explain how programming languages become dominant and stick around, but first, let’s take a look at a select few of the most influential languages in the history of programming.

How might we map programming languages onto natural languages? Compared to natural languages, programming languages are all very closely related to one another; they evolved for specific purposes over a very short timeframe, adapting their terminology from the successful languages that came before them. Because of these tight-knit relationships, the analogy works best if we limit the comparison to Indo-European languages and their influence on the Western world.

The equivalent of classical Greek would be ALGOL. Just as Greek played an important role in the foundations of science and philosophy in classical antiquity, ALGOL was developed in the classical days of programming, the 1950s, when computer scientists were just beginning to explore the theoretical properties of programs and algorithms. ALGOL popularized the idea of programs structured into blocks, and heavily influenced most programming languages that are popular today, including C and its descendants. And just as Greek is no longer a lingua franca but its letters and roots still appear frequently in academia and science, ALGOL-style notation still appears in textbook discussions of algorithms, even though the language itself is rarely used in software development.

The equivalent of Latin would be C. C emerged in the 1970s alongside the Unix operating system, and was partly influenced by ALGOL. Just as Latin directly influenced many major world languages of today (French, Spanish, Portuguese, Italian), C directly influenced major programming languages (C++, C#, and Java), some of which are largely backwards-compatible with C. Just as Greek and ALGOL were influential in theoretical matters, Latin and C are both influential in institutional ones: Latin for law, medicine, and religion; C for operating systems, networking, and compilers. Unlike Latin, though, C is still commonly used for everyday, practical purposes. Some of the most widespread languages that descend from C are Java and Python; perhaps they correspond to French and Spanish.

And the equivalent of English would be JavaScript. Just as English is a hybrid of influences (Germanic at its core, with French and many other languages contributing vocabulary), so is JavaScript (Scheme and Self at its core, with C and Java shaping its syntax). English and JavaScript are both easy to pick up but difficult to master, in part because of those many influences. And both owe their popularity to the Internet: the Internet cemented English’s position as the dominant language of the information age, and JavaScript is the language that web browsers run, making it literally the programming language of the consumer Internet.

So now let’s explore some of the different reasons that languages emerge.

Different languages are supported by different operating systems

Many operating systems require that their apps be written in one or two supported programming languages. Java is used for Android apps; Objective-C is used for iOS and Mac OS X apps; C# (pronounced “C sharp”) is used for Windows apps; and C and C++ can be used on many of these operating systems.

Once an operating system creator like Microsoft or Apple decides which programming languages to support, it will usually continue supporting those languages indefinitely, so that popular apps keep working in the newest version of the OS. For example, if I wrote an iOS 6 app in Objective-C, it’s much easier for me to modify it slightly so that it works in iOS 7 than to rewrite it completely in a different language. This language lock-in becomes reinforced over time, as reusable open-source code for iOS apps gets written in Objective-C.

Thus, the decision to tie certain languages to certain operating systems often dates back decades. The decision that led to Apple’s use of Objective-C, for instance, goes back to the 1980s, when NeXT chose Objective-C for its NeXTSTEP operating system; Apple later acquired NeXT, and NeXTSTEP became the basis for Mac OS X and, later, iOS.

So long as these operating systems remain popular, platform-tied languages like Java, C#, and Objective-C will remain in heavy use, even though they all offer a similar set of programming features and performance trade-offs, as explained below.

Newer languages make computer memory easier to use

Newer programming languages make it easier to perform certain tasks automatically, and this improved ease of use lets them flourish and surpass their predecessors in usage. The programming language C lets you modify computer memory directly, and may require you to specify when you’re done using it; the newer language Java only lets you modify the variables in your code, not memory itself, and the memory behind those variables is managed for you. Java’s strict separation of memory from programmer control can be seen as an abstraction of memory, and newer languages often offer improved abstractions over older ones. Automatic memory management is a feature that differentiates many languages popularized in the 1990s from languages popularized earlier: C and C++ require manual memory management, while Java, C#, and Objective-C manage memory automatically, as do all of the scripting languages explained below.
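
To make the contrast concrete, here is a minimal sketch of manual memory management in C, with comments noting what a garbage-collected language like Java would do for you instead. The array size and the numbers are arbitrary, purely for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* In C, the programmer asks the machine for memory explicitly... */
    int *scores = malloc(100 * sizeof(int));   /* space for 100 whole numbers */
    if (scores == NULL) {
        return 1;   /* the request for memory can fail, and we have to handle that */
    }

    scores[0] = 42;
    printf("first score: %d\n", scores[0]);

    /* ...and has to give the memory back explicitly when done with it.
       Forgetting this line leaks memory. In Java there is no free() to
       call: the garbage collector reclaims unused objects automatically. */
    free(scores);
    return 0;
}
```

The equivalent Java code would simply create the array and move on; once nothing in the program refers to it anymore, the runtime reclaims the memory on its own.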

So why is Java’s memory management enough of an improvement over C’s to let it flourish as a new language? Java’s memory management can be compared to an automatic transmission in a car, while C’s is like a manual transmission. With Java and an automatic, it’s harder to make mistakes that crash or stall the machine. It’s also easier to operate, since the machine does some of the work for you: Java programmers don’t need to clean up memory that’s no longer used, and drivers of automatics don’t need to keep one hand on the stick shift while accelerating.

Even though the automatic approach is easier and safer, the manual approach still has its uses: it gives the operator more fine-grained control, which matters to certain users. C programmers can write programs that run faster and use less memory than equivalent Java programs, so programmers building operating systems and browsers prefer languages like C, where performance and memory usage are critical. Similarly, race car drivers prefer manual transmissions because full control over gear changes lets them keep the car under control when handling it in extreme ways. Since Java and C each have their uses, both remain well-known languages used in industry and in academia.

Scripting languages make programming more interactive

Besides memory management, another important distinction among programming languages is compiled versus scripting languages. Compiled languages include C, C++, Java, C#, and Objective-C. Scripting languages, which include Python, PHP, Ruby, and JavaScript, were mostly popularized in the 2000s alongside the spread of the Internet.

Compiled languages must be read and executed in separate stages: the entire program is translated by the computer in a stage called compilation, and the program only runs if compilation succeeds. If the programmer makes a mistake in the code that the computer can’t understand (a syntax error), the entire compilation stops and nothing runs at all. It’s like filming a long take: any mistake means filming the whole take over again from the start. And indeed, some particularly large programs take minutes to compile and must be recompiled after a syntax error is fixed, so this drawback of compiled languages can be quite bothersome.
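
As a rough illustration, here is a tiny C program with the two stages sketched in comments; the file and program names are made up for the example. Introduce a single syntax error anywhere, say by deleting a semicolon, and the compile stage fails, so no program is produced and nothing runs.

```c
/* hello.c -- a made-up example file */
#include <stdio.h>

int main(void) {
    printf("Hello, world\n");   /* delete this semicolon and the whole compilation fails */
    return 0;
}

/* Stage 1 (compile): translate the entire file into an executable program:
 *     cc hello.c -o hello
 * Stage 2 (run): only if stage 1 succeeded, run the result:
 *     ./hello
 */
```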

Scripting languages, on the other hand, can be read line by line. A line is read in from the program and then run by itself, so a syntax error only disrupts the execution of that one line rather than the entire program, making such errors less costly. Continuing the film analogy, this is like splitting a long take into shorter takes, where a mistake no longer costs minutes of everyone’s time. This property of scripting languages is particularly useful because it lets programmers improvise a program line by line and quickly see the result, rather than having to plan out several interacting parts in advance. This interactivity also makes scripting languages easier to learn.

However, this benefit of scripting languages comes at a cost in speed and memory efficiency: if C is a race car and Java is a regular car, then scripting languages are golf carts at best. Writing an operating system or a browser is like driving across the country: over that distance the difference in speed is huge, and you’ll choose speed over ease of use. But if you’re just writing a simple application like an address book or a puzzle game, it’s like driving down the block: the difference in speed hardly matters because there’s so little work to be done.

Other types of languages

In the explanations above, I only covered a small set of general-purpose programming languages, so there are many languages you may have heard of that weren’t mentioned, and I’d like to note them here for the sake of completeness. I left out functional languages like Lisp and Haskell, which are not as widespread. There are also languages specific to certain uses: markup and stylesheet languages like XML, HTML, and CSS that describe data, documents, and how they’re presented; query languages like SQL that describe operations on databases; and mathematical and statistical languages such as MATLAB and R. Correspondingly, there are programmers who specialize in building webpages, databases, and analysis tools using these languages.