 
The Infinite Bit: An Inside Story of Digital Technology  
Published by Arvind Padmanabhan.  
Smashwords Edition.

Copyright © Arvind Padmanabhan, 2013.  
Cover Design © Arvind Padmanabhan, 2013.  
Chapter Cartoons © Boopathy Srinivasan, 2013.

Pictures under Creative Commons license are released as such.  
Pictures in public domain remain so.  
Other Illustrations © Arvind Padmanabhan, 2013.

Thank you for downloading this free ebook. Although this is a free book, it remains the copyrighted property of the author, and may not be reproduced, copied and distributed for commercial or non-commercial purposes. If you enjoyed this book, please encourage your friends to download their own copy at Smashwords.com, where they can also discover other works by this author. Thank you for your support.

**ARVIND PADMANABHAN** graduated from the National University of Singapore with a master's degree in electrical engineering. He has worked extensively on various wireless technologies. His interests include cryptography, Internet technology, and natural language processing. This is his first book on digital technology and it aims to simplify the subject for the layperson. He lives in Bangalore, India.

Mapping of Book Chapters to a Typical Digital Communication System

# Table of Contents

Preface

0001 Once Upon A Time

0010 The Science of Engineering

0011 Appreciating Noise

0100 A Measure of Information

0101 All in a Few Words

0110 Reaching for the Limit

0111 For Your Eyes Only

1000 In the Land of Ones and Zeros

1001 The Goodness of Being Soft

1010 Beyond Borders

1011 Bits on Wings

1100 From Carbon to Silicon

Acknowledgements

Notes

Bibliography

#  Preface

**Some eight years** ago, I received an email from a customer complaining that our solution wasn't working for him. We supplied equipment for testing cellular mobile phones before they were released to the market. The customer experienced failures often enough that he decided to capture a series of screenshots and attach them to the email so that I could debug the problem.

The problem was that each image was 2.5 megabytes (MB) in size and he had ten of them. Worse still, his mail server imposed some sort of upper limit on attachment sizes. He got around it by sending me a series of five emails, each with two attachments. At my end, the corporate mail servers and systems were configured to scan all emails and attachments for viruses. In the end, the entire process became dead slow. Downloading 25 MB of data from a remote mail server held up more urgent emails for a quarter of an hour. When I finally got down to analysing the screenshots, I found that the error messages were textual, with simple error codes. The screenshots, at least in this case, added little value.

The problem with today's technology is that it can be easily misused. Technology has not reached a point where it can assess and decide what the user wants. It is as yet unable to find the best match across user needs, content, and delivery. Sometimes when it tries to do so, it gets it wrong. Until technology becomes smart, until it learns to learn and adapt, users need to know something about the tools they work with. Only then can they use them in the right manner. Every engineer's dream is to build a system whose technology is transparent to the user. In other words, the common person on the street need not know anything about it to use it. But this dream is yet to be realized.

Since that email incident, things have changed so much that the old paradigms survive only in pockets of almost obsolete systems. The order of the day is networking and ubiquitous connectivity. Broadband data speeds have increased dramatically. Systems are usually connected to the Internet. Large data farms have grown the world over, riding on waves of miniaturization, specialization, distributed computing, and the ensuing economies of scale.

Today, the same customer might send me a notification by email or Twitter, but the screenshots themselves might be stored in a web cloud. I could pick up the files from the cloud when I want them. In other words, the culture of _pushing_ content to collaborators has been transformed to one of _pulling_. The customer's email system might be more intelligent, compressing large images before sending them out. Files amounting to 25 MB can be brought down to just 1 MB. The customer can dispense with images altogether and send only the necessary error codes. For interactive debugging, given the necessary privileges, I can even log into the customer's system halfway across the world, debug, and perhaps fix the problem in a matter of minutes.

But not everyone is tech-savvy. People often learn from their mistakes but to learn by trial and error is expensive. To learn by reading about technology is usually difficult since most books are written in textbook style, riddled with equations and technical jargon. This book aims to simplify the concepts of digital technology from a broad perspective. With such an understanding, it is hoped that readers will appreciate the complexity of technology that is often taken for granted.

If technology is perceived to be simple, it is a tribute to the engineers. It is the engineer's mission to make complex things simple to use. Where engineering is successful, external simplicity often belies an underlying complexity. This has the unfortunate consequence that engineers are often not appreciated for their contributions. By telling their stories, this book hopes to set the record straight. While the numerous stories behind discoveries and inventions are often interesting, we restrict ourselves to the best and the most important of them. The effort is to weave together a complete and coherent account rather than be comprehensive. After all, this is not an encyclopedia. Neither is this a textbook. It has few equations. It uses little mathematics but is not unmathematical. Mathematics, when its essence is conveyed in plain words, can add value and bring clarity.

Today's digital world is often equated with the Internet. The Internet is only one of many things that make up this digital world. A better approach to understanding digital technology is to look at the general framework of a digital communication system. The need of every user is to communicate and to experience. Communication in the beginning was private. When it became public, it was in the form of broadcasting. Recent years have seen a steady shift in the dynamics, where individual expression in the public domain is as important as standard news articles and announcements. Experience comes with a rich diversity that includes interaction, learning, and entertainment. Digital systems attempt to satisfy these needs based on three core principles:

1. Efficiency—since resources are expensive and finite, we must make the best of them.

2. Correctness—preserve data integrity since corrupt data is useless or even misleading.

3. Security—keep our identities intact, protect our data from eavesdropping, and enable confidence in systems that drive e-commerce.

To these traditional core principles, recent decades have seen the emerging importance of three secondary principles:

1. Connectivity—interconnect systems and networks to make possible transparent access to distributed services.

2. Mobility—enable access beyond systems tethered to fixed locations so that users can avail themselves of services worldwide, on the move.

3. Usability—build smart systems that need minimal manual intervention to achieve the best user experience.

The enablers of the above principles are science and engineering. This book is about these twin enablers as much as it is about the men and women behind them. Readers can relate by direct experience since many aspects of this technology are pretty much integrated into our lives and culture—secure e-commerce transactions, mobile telephony, Internet radio, 10-megapixel digital camera, DVD movies, MP3 songs, HDTV on LCD flat screens, Skype calls, JPEG images attached to emails, broadband modems, Wi-Fi hotspots, or laser printing. Needless to say, all these acronyms of modern technology need getting used to.

A Framework for Digital Technology

A Caltech colleague once asked the eminent physicist, Richard Feynman, to explain certain concepts relating to Fermi-Dirac statistics. Feynman had by then made discoveries of his own in the field of quantum mechanics. He was one of the most respected physicists of his time. Responding to his colleague's request, and being ever the enthusiastic educator, he decided to prepare a freshman lecture on the topic. He came back a few days later and told the faculty member, "You know, I couldn't do it. I couldn't reduce it to the freshman level. That means we really don't understand it."

To simplify a subject as complex and vast as digital technology is a daunting task. Almost every branch of the field leads to many sub-branches, many of which have in time matured to become formidable branches of knowledge and advancement in their own right. Then there is the complexity of inter-branch influences and cross-application of concepts. To do justice to this immense tree of digital technology, as far as it is within my reach, I have taken the approach of breadth of coverage rather than depth.

If I have failed to bring clarity in some areas, I must appeal to Feynman's sentiments: that I don't really understand it at a fundamental level. Nonetheless, the task of writing this book has brought me much new knowledge and clarified the old.

#  0001 Once Upon A Time

**One winter evening** in 1819, students at the University of Copenhagen assembled for a lecture-cum-demonstration. The subject was electricity, still in its infancy. Researchers had been experimenting with electricity since the early seventeenth century. There was still much to be understood and much more to be explained. This lecture was therefore nothing short of the state of the art in knowledge and investigative work at the time. The lecturer for the evening was the university's Professor of Physics, Hans Christian Oersted.

Professor Oersted had spent the entire afternoon assembling the equipment for the lecture. It was customary for him to demonstrate a few standard experiments for which the results were known in advance. When ideas occurred to him, he usually added new experiments in the presence of his inquisitive audience, often enlisting them to assist him in his endeavours.

The lecture proceeded as intended. The apparatus was explained. The expected effects were observed and the underlying theory was put forward to the students. Just as his audience proceeded to disperse, something unexpected happened. A magnetic needle had been placed accidentally, or perhaps by fortunate chance, near a wire. When one of his assistants closed the circuit by mistake, the needle turned. The professor exclaimed in surprise. The audience moved forward for a closer look. The professor repeated the experiment, opening and then closing the circuit repeatedly. The results were unmistakable. The flow of current in the wire affected the magnet. Thus was born the science of electromagnetism.

The above story is nothing more than a popular account of Oersted's discovery of electromagnetism. The public loves nothing more than the picture of an eccentric scientist struggling with the elements for years, until nature takes pity and decides to give up the secrets. If the scientist is an astute observer, he does not let slip this rare chance, and the discovery is his for all eternity. Historians of science, looking at the evidence before them, often incomplete and sometimes contradictory, have quite a different view of Oersted's discovery. For this, we need to travel further back to the start of the nineteenth century.

The year 1800 is well remembered as the year of the birth of electric current. _Galvanic current_, as it was later called, was named in tribute to the Italian Luigi Galvani, who in 1786 had experimented with electric discharges through the bodily fluid and muscle tissue of a dead frog laid bare on a metallic table. Professor Galvani could neither offer a satisfactory explanation for the convulsions of the frog nor translate his experiment into more useful effects. That was done partially by his contemporary and compatriot Alessandro Volta, a professor at the University of Pavia.

Until the time of Volta, electricity was known only in its static form. In other words, electric charge could be accumulated, often in great quantities, and discharged. Discharge happened in an instant and often in spectacular fashion. It was man-made lightning, only on a much smaller scale. Among the early experimenters was the American Benjamin Franklin, who in 1752 trapped charges from lightning by flying a kite in a thunderstorm. By this process, he was able to store large amounts of charge in Leyden jars, an early form of today's capacitors. Using this knowledge, he would later invent the lightning rod, something we use to this day.

Through much of the eighteenth century, static electricity did not have any significant application. It was used for entertainment such as Francis Hauksbee's glowing glass spheres. It was used at times for shock treatment of patients. There was no scientific basis for this. No one understood electricity much, less so the effect of electric discharge through the human body. Still, those were curious times and there were many who were not hesitant to try new things.

Until prehistoric man tamed fire, he had been in awe of it. As the eighteenth century drew to a close, man had some control over static electric discharges but he had not yet tamed electricity. If anything, the European Renaissance had brought about a new outlook in scientific enquiry. Prehistoric man had been a poor scientific being. He had accepted nature as it was. Nineteenth-century man was more inquisitive. He was not a mere observer of nature's workings. He wanted to pry open her secrets. He was, in short, passionately involved with her.

It was in such a time that Volta tamed electricity. Taking the cue from Galvani's results, he hypothesized that physical contact of two dissimilar metals results in charge separation and charge flow. Building on this hypothesis, his experiments led him to invent what we now call the _Voltaic pile_, a stack of pairs of zinc and copper plates separated by cloths soaked in brine. Soon Volta invented a variant of the pile, which came to be called _Volta's crown of cups_. With these inventions, Volta could produce a form of current that was no longer instantly discharged. Current was now brought to a form that was continuous.

By the year 1800, we thus had three distinct fields of knowledge—magnetism, in which early pioneering work had been done by William Gilbert in the court of Queen Elizabeth I of England; electricity, then a term that referred only to static or frictional electricity that could be created by rubbing dissimilar materials; galvanism, the continuous current produced by Volta's invention of the cell, soon to be improved by other scientists in the field. More than these branches of knowledge, scientists became interested in their similarities and interrelationships.

It was not difficult to see that static electricity and galvanism were one and the same. Both related to flow of electric charges. Their difference was only one of form—one uncontrolled and impulsive, the other controlled and continuous. The relationship between these and magnetism was less obvious. It was not even clear if the two were connected in some way.

By the time Oersted entered the field, scientists had been seeking a way to unify electricity and magnetism for nearly two hundred years. In a series of carefully constructed experiments, William Gilbert had established two fundamental facts about magnetism—that like poles repel and opposite poles attract. In 1785, performing even more delicate experiments involving accurate measurement apparatus that he himself had constructed, French scientist Charles Coulomb did the same for electric charges—that like charges repel and opposite charges attract. While Gilbert had experimented on electrostatics as well, he had not known about charges repelling. Seventeenth-century folks who cared about these scientific advancements noted that at times when lightning occurred, iron needles lying nearby got magnetized. The suggestion that electricity and magnetism had an underlying unifying force, a common origin, was not outrageous even to careful sceptics.

But not everyone was convinced. The French mathematician André-Marie Ampère, for one, expressed his views in 1802, stating that electricity and magnetism were two fluids of rather distinct nature. The Englishman Thomas Young in 1807 likewise saw no obvious connection between the two. Ampère would later make pioneering discoveries and lay the foundations of the science of electrodynamics. Young for his part had already proposed that light travels as a wave, challenging the Newtonian view that light travelled as particles. In time, Ampère's friend Augustin Fresnel would bring Ampère over to the wave theory camp. The commonality between Ampère's and Young's contributions would be electromagnetism. Light was, after all, an electromagnetic wave. Conclusive proof of this would come only in the 1880s at the hands of an ingenious German physicist (Chapter 11).

Scientists of the age were influenced as much by philosophy as by the work of their colleagues. Those enquiring into the observable nature of the world were known as _natural philosophers_ only because the word _scientist_ was yet to be coined. The twin aspects of philosophical theorizing and active experimentation often, but not always, went hand in hand. When science had so little foundation to build upon, this was a necessity. Scientists were part-philosophers. In this environment grew the German school of Romantic _Naturphilosophie_ of Friedrich W. J. Schelling, whose ideas influenced Oersted. Schelling believed in the unity of the forces of nature, manifested in various forms in our world of experience. He also held that it is impossible to know this unity by experiment alone: we must take some concepts as a priori, from which speculative physics becomes possible and acceptable.

Oersted believed that Schelling's idea of unity meant that electricity and magnetism must be related, somehow. As often is the case in science, belief in something is the starting point for experimentation and discovery. But the line of attack was not obvious. No experiments suggested themselves to Oersted or others. As if to elevate the challenge, a new element was thrown into the mix—chemistry.

Ever since Volta invented the cell, investigation into electrical phenomena took a new turn. Within a few weeks of Volta's discovery, the English chemists William Nicholson and Anthony Carlisle decomposed water by passing galvanic current through it. Hydrogen bubbles formed at one end and oxygen bubbles at the other. A little later, in 1807, Humphry Davy in England decomposed metal salts to isolate potassium and sodium. A year later, he isolated many new metals from their oxides—magnesium, strontium, barium, and calcium. Thus was born the new field of _electrochemistry_. Galvanic current directly propelled chemistry to new heights. Electrochemistry suggested that electric forces could be used to overcome chemical forces—that a chemical compound could be broken apart by the use of electric force.

When the English chemists passed electricity through water, galvanism might have inspired them to do so. They possibly applied Schelling's belief in the unity of all things. Volta had shown that a chemical reaction could produce electrical reaction—although Volta himself did not advance the chemical explanation. To Nicholson and Carlisle, this possibly suggested that electrical reaction could produce chemical reaction.

These developments were not lost on Oersted. He enquired if electric forces and chemical forces were simply different manifestations of the same force. Oersted's first communication on this matter was published in a French journal in 1806. One would have expected things to move on quickly from this point onwards but Oersted remained silent for six years. He clarified his views in 1812 in a German publication, followed a year later in a French publication. Ideas were now beginning to take a definitive shape in Oersted's mind but much of it was still speculative in the manner of Schelling's _Naturphilosophie_. Nonetheless, it was an important starting point in the evolution of scientific thinking.

Oersted was seeking to identify the unifying force behind all chemical processes. With this, all of chemistry would be explained with reference to primary forces. He also conjectured that perhaps galvanic electricity had a greater affinity to magnetism than frictional electricity. On the physical reality of electricity, borrowing from Young's wave theory, he came closer to the truth. He proposed that electricity was not a fluid but disturbances of equilibrium in matter, a series of continual loss and replenishment of charges, which gives the effect of a continuous flow of galvanic current. This idea was directly influenced by chemical reactions. Thus, electricity propagated like a wave and did not flow like a fluid, about which he stated,

One could express this succession of opposed forces which exists in the transmission of electricity by saying that electricity is always propagated in an undulatory manner.

Following these remarks in 1813, Oersted seems to have moved on to other things and set aside further investigation, until the winter of 1819. By the time of the memorable lecture of 1819, he had come to regard galvanism as a transition between the extremes of static electricity and magnetism. Static electricity was transient and momentarily discharged. Magnetism was ever-present. The earth's magnetic field always existed. Lodestones and magnets rarely lost their power to attract or repel. Galvanism, with its continuous flow of current so long as the Voltaic pile remained chemically potent, seemed to fit snugly between static electricity and magnetism.

Historians do not agree on the dates of Oersted's discovery. Was it in the winter of 1819 or was it in the spring of 1820 that Oersted made the discovery in front of an audience? Some claim that it happened only in July 1820 when full details of his experiments became public. Oersted's own account of 1820 mentions the winter of 1819 in insufficient detail. However, modern commentaries that include English translations of Oersted's 1821 publications state clearly that the discovery was made in April 1820. Historiography of science is not an easy science. What concerns us more is the nature of the discovery.

Oersted's main impediment seems to have been a Newtonian view of the world, in which forces are central and act at a distance. Even the very terminology of the day reflected the overbearing influence of Newton. Scientists talked and wrote about magnetic forces, electric forces, and chemical forces. Volta was a living person and voltage as a word was not yet in the vocabulary of researchers in electricity. _Force_ was the key operative word and it followed directly from Newtonian mechanics and gravitational force laws. What we today call voltage was for a good part of the nineteenth century called _electromotive force_ or _emf_. The term persists in some modern textbooks. Against the well-established theories of Newton and everything that followed in agreement, it was difficult to conceive or propose contrary views. Scientists had great faith in Newtonian theories and this prevented them from thinking differently. It was in this environment of constrained selective thinking that Oersted made his first mistake.

When a magnet is placed in the east-west direction near a magnetic needle, the needle, which normally points to the earth's magnetic north, reorients itself east-west to align with the magnet. Oersted extended this idea and placed a current-carrying wire in the east-west direction in place of the magnet. If magnetism proceeded from galvanism, the effect of the current would be similar to that of the magnet. The poles of the magnet due to the flowing current would be located somewhere along the wire. Nothing happened to the needle, which kept pointing to earth's magnetic north. Oersted concluded that the effect was perhaps too small to overcome the earth's magnetism; after all, battery technology in the 1810s was primitive.

Electric discharge gave off light and heat. An idea occurred to Oersted. If all forces of nature were related, then light and heat would possibly be related to electricity and magnetism. He added a platinum wire to his galvanic circuit so that he now had an incandescent wire that carried current. The results were the same. The magnetic needle remained unaffected. Apparently, even a glowing wire did not result in magnetism strong enough to deflect the needle. All this while, Oersted had been reluctant to place the wire in the north-south direction.

He then made an incremental change in the setup. He placed the wire perpendicular to the plane of the needle, perhaps vaguely inspired by lightning affecting magnetic compasses. He noticed some effect on the needle but the results were not consistent. Nothing could be concluded. He tried bending the wire into different shapes. He obtained consistent results but his attempts to locate the poles of magnet along the wire failed.

It was this that he demonstrated to his audience. The fact that he failed to locate the poles did not impress the audience and they proceeded to leave. By now, Oersted was in a mixed state of desperation and disappointment. All his scientific lines of attack following the tradition of Newton had failed. In a last desperate attempt, he ditched Newton, placed the wire in north-south direction, and saw the needle turn. The departing visitors were called back and the effect was demonstrated.

In the days that followed, Oersted performed a series of experiments, as many as sixty of them, to establish some basic facts of electromagnetism. He placed the magnetic needle above the galvanic current-carrying wire. Then he placed the needle below the wire. He also reversed the direction of current flow. The result of these experiments was that the central force theory of Newton was in doubt. Magnetism was not inside the conductor, it was outside it. It was also circular, looping all around the conductor with decreasing effect as the distance from the conductor increased. The question of identifying the poles became irrelevant. Circularity meant that the poles were not concentrated within the wire and the magnetic needle's deflection depended strictly on its placement relative to the direction of the current. In Oersted's own words,

It is sufficiently evident from the preceding facts that the electric conflict is not confined to the conductor, but dispersed pretty widely in the circumjacent space. From the preceding facts we may likewise collect that this conflict performs circles; for without this condition, it seems impossible that the one part of the uniting wire, when placed below the magnetic pole, should drive it towards the east, and when placed above it towards the west; for it is the nature of a circle that the motions in opposite parts should have an opposite direction.

Oersted's Experiments on Electromagnetism

(a) Magnetic needle aligns itself tangential to a circular magnetic field centred on the wire. (b) Effect of galvanism on the needle is not seen because the needle already points north. (c) Oersted's crucial discovery when wire is placed in north-south direction. In this case, the needle deflects to east-west. (d) Changing the current direction changes deflection by 180 degrees. (e) Placing the wire below the needle changes deflection by 180 degrees.

Oersted summarized the results in a short Latin paper of July 1820. The paper was dispatched to many leading scientific institutions across Europe and immediately triggered feverish experimental work. Ampère came to know of Oersted's work only in September that year. Within a week, he verified many of Oersted's experiments and established his own current laws, which are now fundamental to the science of electrodynamics. By September 1820, Johann Schweigger in Germany created a device to multiply the magnetic effect by looping wires many times over. With its increased sensitivity, it could be used for accurate quantitative measurements of current flow in a circuit. Ampère would later name this device the _galvanometer_, heralding the start of instrumentation.

Although Newton's theory was challenged and the circular nature of magnetism was established, there was no satisfactory scientific explanation. Observation is one thing, but without explanation it could not withstand critical scrutiny. It would take a lot more than observation and experimental results to topple Newton's central force theory. A proper scientific explanation had to wait for two of the greatest names in the field—Michael Faraday and James Clerk Maxwell (Chapter 11). In the decades that followed, central force theory was limited to Newton's gravitational laws; but in the 1910s, a German physicist named Albert Einstein would rewrite the books. It is therefore fair to say that today's knowledge is only a partial truth, its validity not absolute but relative to the limitations of our current understanding.

Oersted's discovery must not be dismissed as either trivial or obvious. For nearly two hundred years, the link between electricity and magnetism had been suspected but no one had managed to prove it. In the years 1776 and 1777, the Bavarian Academy of Sciences announced a prize to anyone who could find the missing link. Needless to say, there were no winners. For twenty years since the birth of the Voltaic pile, no scientist conceived of a suitable experiment though galvanic circuits had been in regular use in laboratories all across Europe and America. Given this scientific landscape, Oersted's discovery is remarkable. If it was accidental, accident had a minor role to play when considered against a backdrop of numerous experimental failures and evolving thought processes that led to the discovery. A few decades later, microbiologist Louis Pasteur made a philosophical remark that seems relevant to Oersted's discovery: "Chance favours the prepared mind."

Electromagnetism is one of the fundamental principles of modern science. The twenty-first-century world as we have built it would be quite different without an understanding of this principle. Steam turbines that convert heat into electricity rely on electromagnetism. A kitchen blender would not work without it. A computer hard disk would not exist. Satellite TV and mobile phones that depend on wireless transmission exploit it. The earliest forms of communication by electricity would not have been conceived. It is to this form of communication that we now turn our attention.



**One remarkable property** of electricity that everyone noticed from the early years was that it was instantaneous. If there was a limit to the speed at which it travelled, it was imperceptible. In 1746, Abbé Nollet arranged a group of two hundred Carthusian monks in a mile-long circuit for an unprecedented experiment. The monks, clad in their simple robes and tunics, held hands and an iron wire as if in fervent communal prayer, quite unsure what was going to happen next. They had unwittingly consented to stand in as eighteenth-century lab mice. When Nollet discharged a Leyden jar through the circuit, the shock caused the monks to jump up and shriek out at the same time. It was no doubt an experiment designed to impress spectators. If current flowed thus without delay, it seemed possible to use it to pass information quickly from one place to another.

The idea of transmitting information using electricity was a conceptual leap in scientific thinking. Communication need no longer be linked to transportation. The days of carrier pigeons and horseback messengers were numbered. However, the separation of communication from transportation was not as novel as it sounds. Smoke signals had been used in primitive communities. For centuries, people had been using blazing beacons and flickering lanterns from hilltops and ridges to communicate, particularly at night. The idea of using light to communicate was applied to reflecting mirrors and waving flags. Both the British and the French fleets made extensive use of flag signalling at the 1805 Battle of Trafalgar. Well into the twentieth century, navies around the world continued to use flag signalling. Even today, software engineers use the term _flag_ to signal control operations from one code block to another. The problem with these optical means of communication was that they required line of sight. Any obstruction in the path had to be cleared away or alternative sites located. Fog or inclement weather resulted in a communication blackout. But there was an alternative.

Sound too had been a means of distant communication for centuries. Using bugles, bells, and drums, generals had given orders to their lieutenants fighting on battlefronts. From the eighteenth century onwards, Europeans exploring sub-Saharan Africa discovered an ingenious system of African _talking drums_. These drums were capable of conveying long phrases and entire sentences in rich rhythmic tones and overtones. Their inspiration had been human speech itself, from which the Africans had evolved an entire phonetic vocabulary for the drums. Europeans took a long time to understand this advanced system. An authoritative insight was published only in 1949 by John Carrington.

Electrical communication—the idea occurred to many, some using Leyden jar discharges and others using the continuous current of Volta. In Germany, Samuel von Sömmering used the still new science of electrochemistry in 1809 to convey information at a short distance. In his apparatus, galvanic electricity was passed through water to create bubbles. The bubbles were trapped into a column, its volume measured and interpreted according to agreed conventions between the sender and the receiver. Following the tradition of Nollet, Francis Ronalds managed to construct a message transmission system in 1816 to a simulated 8-mile distance. Two wooden frames supported a cage of wires in his back garden in a London suburb. The sender and the receiver ends had synchronized dials using a clockwork mechanism. When a message needed to be sent, an electrically charged wire common to both parties was grounded. This discharged a pair of separated pith balls attached to the ends of silk threads suspended from the wire. With the loss of charge, the balls came together. At that exact moment, an operator at the receiver read the pointed markings on the dial. The idea of using static discharges and pith balls had been suggested anonymously in 1753.

The Ronalds system was slow and depended on the two dials being always in synchronization. With the benefit of hindsight, we today realize that any system based on loosely synchronized clock dials was doomed to fail. Synchronization cannot be assumed a priori. In almost all modern communication systems, both parties first establish synchronization before commencing message exchanges. They are also required by design to maintain synchronization at all times, track, and correct drifts as often as possible. Thus, synchronization precedes communication and subsequently communication assists in maintenance of synchronization. In some cases, an external common source of synchronization may be employed. Such is the case with mobile cellular systems that use Global Positioning System (GPS) satellites as the common clock reference.
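For readers who know a little programming, here is a minimal Python sketch of that ordering—synchronize first, then communicate. The names are hypothetical and stand in for what would be an elaborate handshake in a real system:

```python
# A minimal sketch (hypothetical names): communication is refused
# until synchronization has been established.

class Link:
    def __init__(self):
        self.synchronized = False

    def synchronize(self):
        # In a real system, a handshake would align clocks or symbol
        # timing between the two parties and keep tracking drift.
        self.synchronized = True

    def send(self, message):
        if not self.synchronized:
            raise RuntimeError("synchronize before communicating")
        print("sent:", message)

link = Link()
link.synchronize()  # synchronization precedes communication
link.send("hello")
```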

From such early developments was born electric telegraphy. By 1820, Oersted had, in a sense, traced the separate lineages of electricity, galvanism, and magnetism to discover electromagnetism at their root. Within a year, Ampère proposed that it might be possible to transmit information at a distance using deflections of a magnetic needle. He was of course perfectly correct but, as every engineer knows, there is a vast difference between scientific theory and reduction to practice. The earliest known implementation of Ampère's suggestion came sixteen years later. Called Alexander's telegraph, it was exhibited in Edinburgh in 1837. At best, it was a working prototype. Thirty magnetic needles were arranged in a 5 x 6 grid, so that an operator could signal all letters of the English alphabet plus four punctuation marks. The entire setup required thirty pairs of conducting wires. Using that many wires simply wasn't practical. Meanwhile, a completely different form of telegraphy had already ventured beyond the confines of the laboratory and into the open world.

For some time, the Frenchman Claude Chappe had been intrigued by the possibilities of communication at a distance. But he was a clergyman and did not take the decisive step of putting his ideas into practice. Then came the French Revolution with the storming of the Bastille in 1789. Chappe lost his clerical position and returned to his hometown of Brûlon. There, with apparently nothing else to do, he turned his attention to communication. With the assistance of his four brothers, he conceived of a primitive system using a couple of synchronized pendulum clocks with identical dials. Messages were exchanged by nothing more inventive than banging on casseroles. The limitation of using sound was obvious: even with casseroles of the best acoustic properties, they could not communicate beyond 400 metres. Experiments with static electricity failed from the start due to lack of proper insulators for the wires. The problem with static electricity had always been unreliability due to leakages. Therefore, the Chappe brothers fell back upon the ancient method of using optical signals. Though optical signalling had been in use for centuries, the Chappe brothers brought an essential improvement.

It sometimes happens that just when a system is thought to have attained maturity, a new technology enters the scene and propels the system to new heights. For optical communication, this technology was the invention of the telescope. Dutchman Hans Lippershey invented it in 1608 and Galileo Galilei improved it the following year. By the time of the French Revolution, the telescope had seen numerous innovations—some using only refracting lenses, others only reflecting mirrors, and yet others a combination of the two in various configurations. Telescopes had become more precise, portable, and affordable. The popular one then was the achromatic refracting telescope patented by John Dollond in 1758. Beyond the possibility of increasing viewing distance by using a Dollond telescope, Chappe believed that he could find much better ways of sending messages from one place to another.

The fact was that in ancient times messages were agreed upon in advance between communicating parties. By mutual agreement, if someone waved a red flag, it might mean danger. A white flag might mean surrender. But what if someone wanted to say something more complex: "It is dangerous to attack now. Wait till after dusk." Unless this message had been agreed upon earlier, there was no way to signal arbitrary messages. In other words, one had to talk within the constraint of possible messages that both parties had worked out in advance. Clearly, this was a big constraint. One either had to have a large number of possible messages and a suitable means of signalling all those messages, or often resort to approximations by selecting a message that compromised only a little on the meaning. Aeneas (350 BC) and Greek historian Polybius (150 BC) had written about such signalling systems of fixed message sets.

It was this grand problem that Claude Chappe intended to solve. He wanted to convey any arbitrary message without prior agreement between the sender and the receiver. Using the same pendulum clocks, but this time replacing the clumsy casseroles with telescopes and rotating panels, the Chappe brothers demonstrated a working prototype to municipal officers on March 2, 1791. The world's first ever telegraph message was sent from Brûlon to Parcé, a distance of 12 miles, a phrase of nine words communicated in just four minutes. The message read, " _Si vous réussissez vous serez bientôt couvert de gloire_ (If you succeed you will soon bask in glory)." History was made that day. Suddenly, the world did not seem as big as it had been assumed to be. Unfortunately, the details of translating those nine pertinent words into optical signals are not preserved. It was clear, however, that Claude Chappe had solved the translation problem to handle any message. All that was needed now was to refine the method.

The Chappe brothers did not bask in this initial success, for they wanted glory itself. Claude Chappe recognized many areas of potential improvements. The pendulum clocks had to go. Something simpler in form yet more powerful in capability had to be invented. This had to happen quickly before competition came in. The government, still in its uneasy formative period, had to be convinced of its value.

In the ensuing months, the Chappe brothers experimented with both the transmission apparatus and the method of conversion from message to optical signals. In modern computing terminology, we would call the former _hardware_ and the latter _software_. More precisely, the method of conversion is what engineers term _encoding_, which is simply the process of representing the original message in a form that is more suitable for transmission. It is the encoding that forms the revolutionary aspect of the Chappe telegraph.

In 1792, a new apparatus composed of movable shutters was installed in Belleville, northeast of Paris. Before any trials could be done, a French mob, suspecting Royalist involvement, destroyed it. Then in 1793 came the Reign of Terror and King Louis XVI went to the guillotine. In such troubling times, Claude Chappe pulled off another successful trial, this time with a new and improved system. The National Convention, convinced of the power that telegraph would bring in such revolutionary times, sanctioned close to sixty thousand francs for the construction of the first optical telegraph line. But the sanctioned money would not come on time and Claude Chappe had to deal with labour problems and delays.

Despite all odds, in July 1794, less than a year after the sanction, the line was completed. Optical telegraphy was born. It also went by the names _aerial telegraphy_ or _semaphore line_. Operation commenced on the first semaphore line, from Paris to Lille, a distance of 120 miles covered by eighteen telegraph stations. Its first application, one may say its exclusive use, was for the military, who craved rapid news from the frontiers to the capital. In fact, the news of the recapture of Le Quesnoy from the Austrians and the Prussians travelled from Lille to Paris within two hours of the victory. This victory was in some sense a victory for telegraphy. In the decades to come, the French optical telegraph system would grow to become the most advanced and well-managed system in the world, until the coming of electric telegraph. The system extended into neighbouring countries under Napoleonic control, reaching as far as Algeria, Morocco, and Egypt. Some countries followed the French system with variations while others succumbed to the not-invented-here syndrome.

Murray's Shutter Telegraphy

With six shutters, each in two possible positions, an alphabet of 64 messages could be signalled. This was among the earliest binary systems in communication technology.

In 1795, the English adopted a system of six shutters proposed by Lord George Murray. Each shutter took one of two possible positions—vertical (closed) or horizontal (open). This meant that a total of 2⁶, or 64, distinct messages could be signalled. It is claimed that Murray got the idea from Abraham Edelcrantz's system of ten shutters in operation in Sweden since 1794. However, there is no doubt that the shutter system had been partially attempted by Claude Chappe himself in 1792 before the fateful riot. Today we can recognize in these shutters the earliest form of a binary system used for communication, a method of representation built from only two possibilities—open or closed, a zero or a one.
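The arithmetic of the shutters is easy to check with a few lines of Python. This is only an illustration of the counting; the pattern-to-letter assignments of the real Murray code are not reproduced here:

```python
# Six shutters, each open (1) or closed (0), give 2**6 = 64 patterns.
SHUTTERS = 6
print(2 ** SHUTTERS)  # 64

# Render the first few patterns as shutter positions.
for code in range(3):
    bits = format(code, "06b")  # six binary digits, e.g. '000010'
    print(code, "->", " ".join("open" if b == "1" else "closed" for b in bits))
```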

Murray's shutters did not catch on because they had practical difficulties. Chappe was a true engineer of the field, not a scientist of the laboratory. His first-hand experience told him that shutters were not easy to view from a distance. When the sun reflected off the panels at particular angles or if the sky was too bright, operators made mistakes. An open shutter could be wrongly interpreted as being closed, and vice versa. Moreover, the use of only two states, though efficient, slowed transmission due to the inherent difficulties in converting a letter or a number to the correct orientation of each of the shutter panels. What was needed was a method that compromised a little on efficiency but increased the speed of transmission. What was needed was a system that would be easier to operate for the sender. What was needed was a system that would be easier to read through a telescope some six miles away.

The end result of much deliberation was that Chappe came up with a system that would stand the test of time. His apparatus consisted of a central regulator, to which were hinged two smaller indicators that could be folded or extended as required. The indicators were balanced with counterweights. The regulator itself was mounted on a ladder and linked by a system of brass chains and pulleys to operator controls. One key design feature was that the controls mimicked the arrangement of the regulator and the twin indicators. Thus the operator knew exactly what signal had been "written in the air" without ever going outside to see what he had written. The word telegraphy itself comes from its Greek roots, _tele_ (far) and _graphein_ (write), and therefore stands for "far writing."

The regulator and indicators were wooden and painted black to stand out against the bright sky. To improve stability in strong winds, which were common since the installations were mostly on hilltops and tall buildings, the regulator and indicators were louvred. This had the added advantage of a lighter mechanism, which was easier to operate. In addition, the regulator and indicators were divided into segments so that the louvres of alternate segments were offset by ninety degrees. This gave better visibility whatever the angle of the sun. The overall design resembled a human communicating with outstretched arms. Chappe argued that this gave much better visibility than shutters. The chance of making a mistake was much lower.

Chappe Telegraphy

Positions of regulator and twin indicators signal a particular symbol. As an example, the illustration shows signalling of the French word _bonjour_.

As for the encoding scheme, everything depended on the angles of the regulator and its indicators. The regulator could be in a vertical or horizontal position. Diagonal positions of the regulator were special positions to indicate that the operator was in the process of setting the signal and the receiving operator should wait for it to be ready. The indicators could be vertical, horizontal, or at 45-degree angles. Barring the position in which an indicator was aligned with the regulator, this meant each indicator had seven possible positions. Overall, the system was capable of indicating 2 x 7 x 7 = 98 distinct symbols in the air. A system capable of handling only 98 symbols was not very impressive, particularly given the fact that constructing and installing the signalling apparatus through towns and countryside involved a substantial investment. Chappe had to figure out a way to signal hundreds of messages using only 98 symbols.

Chappe understood one key aspect of communication—the separation of _message symbols_ and _control symbols_. Message symbols are those that carry information from sender to receiver. Control symbols are those that facilitate transmission of message symbols. If things go wrong, control symbols help to resume transmission and ensure proper, error-free communication. Control symbols are like watchdogs so that message symbols can be transmitted as intended by the sender. Chappe's motivation for this was simple. It is impossible to guarantee proper communication with only message symbols. Where human operators are involved, mistakes can happen. Foggy weather can introduce errors. To overcome these limitations, the system ought to have a built-in ability to catch errors, recover from mistakes, and confirm proper message reception. Control symbols enable this.

To be fair, Chappe was not a genius, for he had borrowed this idea from one of Newton's contemporaries, Robert Hooke. Way back in 1684, speaking at a lecture at the Royal Society of London, Hooke put forward the basic principles of a communication system. He stated that any communication system must adopt a set of symbols that can be arbitrary but must be an efficient representation of the language alphabet. With the regulator and indicators, Chappe had done just this and assigned meaning to each of his new symbols. Hooke also proposed the separation of message and control symbols. With control symbols, the designer could define and implement a necessary set of rules. These rules would enable both parties to coordinate their actions to ensure smooth communication. To clarify, Hooke gave explicit details for such control signalling,

I am ready to communicate [synchronization]. I am ready to observe [synchronization]. I shall be ready presently [delay]. I see plainly what you shew [acknowledgement]. Shew the last again [error detection and retransmission]. Not too fast [rate control]. Shew faster [rate control]. Answer me presently [request acknowledgement]. Dixi [end of message]. Make haste to communicate this to the next correspondent [message priority]. I stay for an answer [stop transmission and wait for reply].

This last aspect, critical for any communication system, is what we today call _protocol_. A protocol is nothing more than a set of rules and control messages that allow two computers or devices to talk to each other without running into confusion. Protocols are at the heart of all message exchanges on the modern Internet. Therefore, before we pat ourselves on the back in self-congratulatory manner for the wonders of the Internet and the ingenuity of our own generation, let us pause to see that the basic principles had already been laid down by Robert Hooke more than three centuries ago.
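To make the parallel concrete, here is a sketch that recasts a few of Hooke's signals as a modern protocol vocabulary in Python. The names are hypothetical; the point is only that a conversation interleaves control symbols with message symbols:

```python
from enum import Enum

class Control(Enum):
    READY_TO_SEND = 1     # "I am ready to communicate"
    READY_TO_OBSERVE = 2  # "I am ready to observe"
    ACK = 3               # "I see plainly what you shew"
    RETRANSMIT = 4        # "Shew the last again"
    SLOW_DOWN = 5         # "Not too fast"
    SPEED_UP = 6          # "Shew faster"
    END_OF_MESSAGE = 7    # "Dixi"

# A conversation is a stream of control and message symbols.
conversation = [Control.READY_TO_SEND, "attack at dusk", Control.END_OF_MESSAGE]
for symbol in conversation:
    print(symbol)
```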

Hooke, as great a scientific figure as Newton himself, did not manage to make his proposal into a working system of any commercial success. His proposal lay neglected for more than a hundred years. Whether Chappe discovered Hooke's proposal or did the proposal discover Chappe, either way, it was a fruitful union from which optical telegraphy was born. Claude Chappe thus became the world's first communication engineer. The official title given to him at the time was _Ingénieur Télégraphe_.

Inspired by the ideas of Hooke, Chappe reserved six symbols for control, leaving only 92 for messages. If a single symbol is used per message, the system can handle only 92 messages. However, if one were to combine two symbols to indicate a message, the operator can signal 92 x 92 = 8464 possible messages. This can easily be understood within the context of the modern decimal numeric system. We have only the digits 0-9, but if we use two digits to represent a number, we can represent any number from 0-99. Any higher number can be represented by stringing together more digits. The key concept here is the encoding of a larger set of messages or numbers in terms of a smaller set of symbols or digits. Engineers would later exploit this rather simple-looking concept. In fact, it would become the most important tool in their engineering toolkit; but the discovery would come independently from an unexpected quarter (Chapter 2).
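A short Python sketch shows the idea: a codebook index from 0 to 8463 splits into a pair of symbols, each one of 92 values, exactly as two decimal digits encode the numbers 0 to 99. The numbering is illustrative, not Chappe's actual assignment:

```python
SYMBOLS = 92  # message symbols available per position

def encode(message_number):
    # Split an index into (first symbol, second symbol), just as
    # divmod(137, 100) splits 137 into (1, 37).
    assert 0 <= message_number < SYMBOLS * SYMBOLS  # 8464 messages
    return divmod(message_number, SYMBOLS)

def decode(first, second):
    return first * SYMBOLS + second

print(encode(8463))          # (91, 91), the very last message
print(decode(*encode(137)))  # 137
```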

To handle the enlarged set of possible messages, Chappe prepared a codebook. While the first symbol referred to a line number, the second symbol referenced a particular page in the codebook. In addition, the use of a single symbol for a message was retained for the more frequent messages. In essence, letters of the French alphabet, numbers, common words, and phrases used single symbols. This optimized the speed of transmission. By 1799, code designers had added more messages to enhance the capability of the system. In later years, more control codes were added based on Edelcrantz's system in Sweden. The fact that the Swedish system had better control codes might have come about out of necessity, due to the higher error rates in using shutters rather than Chappe's better visual design. In fact, the spirit of Chappe's design was not unlike the modern sign language used by the deaf.

Chappe's optical telegraphy established the first principles on which future developments would take shape. Speed of communication was revolutionized. A communication network was established with an effective system of relay stations manned by trained operators. Some intermediate stations did more than simply relay the message. They interpreted the messages with the aid of codebooks and asked for retransmission if errors were detected. In 1833, there were 1000 uniformed operators, 34 inspectors, and 20 directors. By the 1840s, the network had more than 530 relay stations spanning almost 5000 kilometres. To oversee operations, special civic bodies were set up. In fact, laws had been passed earlier to enable Chappe to cut down trees and access any property across the land to put up stations. Most importantly, the need for codes for efficient signalling inspired engineering innovation.

Despite these successes, optical telegraphy had its problems. Communication by night was not possible. Bad weather increased error rates. In winter months, transmission was prone to errors and only one in three signals arrived correctly. The mechanical nature of the transmitting devices meant breakdowns and repairs. The entire network was costly to operate and was never opened for private messaging. The government used it to relay military commands, lottery results, and financial news. Thus from the outset, it was the state that owned telegraphy in France. It was all too important and powerful to be left in private hands. Countries across Europe adopted the model. This would go on to influence electric telegraphy and even the telephony of the future. On the other hand, the US government would fail to see the potential of these new technologies and leave them to private players. To this day, private companies dominate telecommunications in the US. Only towards the end of the twentieth century would European governments embark on the path of privatization.

If only Oersted's discovery had happened thirty years earlier, and had scientists focused on tapping this new power for communication as a priority, electric telegraphy might have displaced optical telegraphy early on. But nature had decreed that the secrets of electromagnetism would be more difficult to unravel. In fact, the road to electric telegraphy was fraught with misleading signposts along the way. The first of these came in 1824, when Englishman Peter Barlow famously declared that electric telegraphy could not work for distances more than 200 feet. To transmit electricity over long distances was indeed a first-class problem worthy of the best scientists. Any of the leading European minds could have solved this problem had they only persevered and not been biased by Barlow's statement. The honour was left to an American.

Joseph Henry was born of Scottish parents and was apprenticed to a watchmaker. At the age of 22, he got interested in science and enrolled at the Albany Academy in New York. His diligence and commitment to science paid off when he eventually became a professor at the Academy. The work of European scientists kindled his interest in electromagnetism. In 1827, he began research in that field in earnest. In 1825, William Sturgeon had invented a powerful electromagnet that could lift weights. He had done this by first bending a soft iron bar into the shape of a horseshoe and winding conducting wire around the bar. His experiments had shown that this increased the magnetic strength at the poles.

Henry combined Sturgeon's work and Schweigger's multiplier to significantly increase the strength of the magnet. He first insulated the wire with silk, but rather than loosely wind the wire, he tightly covered the iron core from pole to pole with many turns. Henry published this work in 1828 but this was just a start. After performing numerous experiments, his seminal work on the subject appeared in 1831. In those days, the terms current and voltage had not yet been established. The equivalent terms of the day were, respectively, _quantity_ and _intensity_. Henry's research led him to conceive of two types of electromagnets—the quantity magnet and the intensity magnet.

What Henry discovered was that when he increased the number of turns, magnetic strength increased as expected, but sometimes, when he used a different cell, the result was reversed. He surmised that the length of the winding must have some effect and decided to connect in parallel separate smaller windings, collectively still covering the entire horseshoe. He argued that this would give the current multiple paths of passage. The result was more dramatic than he had expected. While a single small winding covering a ninth of the core could lift only 7 pounds, nine such windings in parallel could lift 650 pounds of weight. This astonishing result was accomplished using only a small galvanic cell. He also discovered that if the small windings were connected in series, in a sense similar to a single winding covering the entire core, more lifting power could be obtained by adding more cells to the galvanic battery.

The principle Henry had discovered is what we today call _impedance matching_. In simple words, the power transferred to a load is maximized when its resistance equals the battery's internal resistance. In other words, the resistances of load and battery should be matched. But in those early days, no one knew anything about resistance let alone the intricacies of matchmaking. George Ohm had implicitly defined resistance and published his famous law only a few years earlier. His law, first published in German, was at the time neither well known nor widely accepted. An English translation of Ohm's work appeared only in 1838. Henry had independently arrived at the _idea of resistance_ but had not put it through quantitative analysis. Nonetheless, Henry had established the key fact that wires connected in parallel result in a lower resistance than the same connected in series.
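
The arithmetic behind this can be sketched in a few lines of code. The following is a minimal illustration, not anything Henry computed: the battery voltage and internal resistance are assumed values, and the sweep simply shows that the power delivered to the load peaks when the two resistances are equal.

```python
# Maximum power transfer: P = V**2 * R_load / (R_load + R_int)**2
V, R_int = 1.5, 4.0   # assumed battery voltage (volts) and internal resistance (ohms)

loads = [1.0, 2.0, 4.0, 8.0, 16.0]                 # candidate load resistances
powers = [V**2 * r / (r + R_int)**2 for r in loads]

best_power, best_load = max(zip(powers, loads))
print(f"peak power {best_power:.4f} W at R_load = {best_load} ohms")  # peak at 4.0
```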

From these remarkable studies, Henry drew important conclusions. An intensity battery, one supplying a large voltage using many cells in series, could drive an intensity magnet miles away. It did not matter that this circuit had a higher resistance due to the long wire: cells in series resulted in higher battery resistance and hence were matched to the circuit. This intensity magnet, by opening or closing a secondary circuit, could trigger a quantity magnet. The quantity magnet needed only a small cell, but its magnetic strength derived from tightly wound coils connected in parallel. Thus, mechanical effects at a distance could be obtained by a combination of these two types of magnet. Henry demonstrated this to his students by passing current through a mile of wire and thereby ringing a bell or dropping heavy weights. The basic principle was that an electrical circuit could affect, via electromagnetic action, a neighbouring circuit. Henry's device that utilized this phenomenon was later named the _relay_. In the early twentieth century, the relay would play a key role in the workings of the world's first computers.

Henry's Experimental Setup Using a Quantity Magnet

In this setup, Henry shows how a quantity magnet (A) with many parallel windings is strong enough to lift heavy weights even when the power source is a single cell (B & C). Source: (Henry 1831, p. 408).

The relay was independently invented by William Cooke and Charles Wheatstone in England. Unlike Henry, who remained a scientist all his life, Cooke was an enterprising businessman. Although by profession a maker of anatomical models, his interest in electric telegraphy came about after being introduced to an early device of P. L. Schilling, a Russian diplomat in Germany. Schilling had used magnetic needles, as many as six, to signal messages. In later years, he reduced his system to use a single needle and came pretty close to something similar to Morse code. His premature death in 1837 forestalled further development.

Teaming up with Wheatstone, then a professor at King's College, London, Cooke invented a form of telegraph that used five magnetic needles. Since each needle could be made to turn left or right, one is tempted to think that his telegraph could signal 2^5, that is 32, symbols. But the design was such that some orientations of the needles were invalid. Because each symbol was signalled by selecting a pair of needles, the design could accommodate only 20 symbols; in return, the arrangement made it easy to read the symbols directly from the receiving instrument's dashboard. The fact that letters C, J, Q, U, X, and Z were left out did not cause major confusion. G was substituted for J, K for C, and KW for QU. Often the surrounding context in a message gave clues to interpreting the substitutions. Punctuation was left out and there was no lower case. Cooke and Wheatstone took out an English patent in 1837, No. 7390, the world's first patent concerning electric telegraphy.

Five-Needle Telegraph of Cooke and Wheatstone

To signal a symbol, two needles are activated. The example shows how the word _cat_ can be signalled; since C is not part of the signalling alphabet, it is substituted with K.
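
The count of twenty follows from simple combinatorics. As a rough sketch, assuming (as the figure suggests) that each symbol deflects one pair of needles, which can converge on a letter in one of two regions of the display:

```python
from itertools import combinations

# Choose 2 of the 5 needles; each chosen pair can converge on a letter in
# one of two regions of the board (an assumed model of the instrument).
symbols = [(i, j, region)
           for i, j in combinations(range(5), 2)   # 10 pairs of needles
           for region in ("upper", "lower")]       # 2 convergence points each

print(len(symbols))   # 20, well short of the naive 2**5 = 32
```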

Across the Atlantic, similar developments were taking place, though from an unlikely source—a New York painter named Samuel Morse. Although technically skilled, Morse was only half-successful as a painter. His preference for classical themes was at odds with the tastes of the American public, who demanded only portraits to decorate their halls and offices. Perhaps this experience honed his approach to engineering—never put effort into something that has little commercial value. His initiation into telegraphy came rather unexpectedly on one of his return journeys from the Continent to America. On board the ship, he met Dr Charles Jackson, who introduced him to needle telegraphy. Morse was immediately inspired. These were early days when science was still a fledgling and the engineering application of science was almost nil. Every month seemed to bring new discoveries that excited the public. Public lectures were commonly held and the excitement was shared. Since 1825, eminent scientists of the day had given Christmas lectures to the public at the Royal Institution of London. Queen Victoria and Prince Albert sometimes attended them. Michael Faraday gave as many as nineteen Christmas lectures over the years, primarily on electricity and chemistry. In such an era, it was easy for persons outside the scientific community to get a foothold in scientific research. It was in this manner that both Cooke and Morse entered the realm of telegraphy.

The telegraphy of Morse differed from that of Cooke and Wheatstone in two key aspects. First, the receiver enabled automatic recording of messages as they came through. This eliminated possible mistakes that a human operator could make in translating the received symbols. The second, perhaps the most important contribution of Morse, was the code used for transmission. Morse invented a system of dots and dashes. Each unique combination of dots and dashes represented a letter of the English alphabet, a numeric digit, or a punctuation mark. Thus, the code served as the language of transmission. In physical terms, the transmitted signals were short or long electric pulses. In modern computing terminology, we call the symbols _characters_. The method of representing characters as dot-and-dash sequences is known as _character coding_.

In Morse code, we find the first use of a binary system of electrical transmission, the use of two states. While Murray's shutter telegraph used six individual panels, the Morse system was much more versatile and better suited for electrical transmission. All messages went through a single stream of dots and dashes, separated only in time. Short pauses represented intersymbol separation, longer pauses stood for word separation, and even longer ones delimited sentences. In addition, two characters could have the same number of dots and differ only in the pauses amongst the dots. Likewise, the length of each dash mattered. In other words, the element of time played an essential role in the definition of the code. Ultimately, it was simplicity that made Morse code a success. It is therefore understandable that today Morse is well remembered for his code rather than for telegraphy itself.

Morse went about creating a codebook of letters, words, and phrases, all of which would be represented by only dots and dashes. The approach was not very different from the codebooks of Chappe. Making these codebooks was a mountain of labour. But language is never static. It evolves over time. New words suddenly become common with changing times. Old phrases become archaic. This meant that codebooks had to evolve too, for one could always find more common and up-to-date phrases to include. This explained why the Chappe brothers continued to expand their codebooks over many years. After many months of effort, Morse completed his codebooks in 1837. Just then, a question arose in his mind—why bother with encoding words and phrases if letters of the alphabet alone would do? This had been difficult in the era of optical telegraphy, but with electric telegraphy, the speed of signalling would not be limited by the manipulation of regulator and indicators using unwieldy cables and pulleys. Electric transmission is comparatively easier. In addition, more frequent letters of the alphabet could be mapped to shorter sequences and hence reduce transmission times. Thus, E was given a single dot; T a single dash; J, being rare, a longer sequence of dash-dot-dash-dot. The result of this belated insight was that Morse discarded the codebooks and the Morse code of 1838 was born.
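
The benefit of giving frequent letters short codes is easy to see with a toy calculation. The sketch below is illustrative only: it uses a few code lengths from the later International Morse (the American Morse of 1838 differed in detail) and a made-up message, and compares the average number of dot-and-dash elements per letter against a hypothetical fixed-length code.

```python
from collections import Counter

# Illustrative code lengths (International Morse shown; American Morse differed)
code = {"E": ".", "T": "-", "A": ".-", "I": "..", "N": "-.", "M": "--"}

text = "ATTAINMENT MEANT AIM"                  # toy message using common letters
freq = Counter(c for c in text if c.isalpha())

avg = sum(len(code[c]) * n for c, n in freq.items()) / sum(freq.values())
# A fixed-length dot/dash code covering 26 letters needs 5 elements (2**5 = 32).
print(f"variable-length: {avg:.2f} elements/letter; fixed-length: 5")
```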

Message Encoded in Morse Code

Alfred Vail's example of a message encoded in the original American Morse code. Notice how S, C, and R use three dots but differ by the placement or absence of a pause. Source: (Vail 1845, p. 20).

For his telegraphic system, Morse took out a caveat in 1837 and obtained a full American patent in 1840. He failed to get patents in Europe, except in France. Ironically, the French were not interested in electric telegraphy. They stuck to their established semaphore system, arguing that saboteurs could cut telegraphic wires. But all these developments came later. Back in 1836, Morse had got stuck. He had been unable to solve the distance problem. In England, the state of affairs for Cooke and Wheatstone was not much better.

It is at this juncture that Henry's ideas came to the fore. In 1836, Morse was introduced to Henry's paper of 1831. The following year, Henry visited London and met Wheatstone. The two exchanged notes and discussed their respective discoveries. The fact that Henry had solved the distance problem and had invented the electromagnetic relay became apparent. Telegraph engineering had finally obtained the missing pieces of the puzzle. There was no stopping now.

In 1837, Cooke and Wheatstone's telegraph system was installed between Euston and Camden Town, both stations on the much longer London-Birmingham Railway. Transmissions over this 13-mile distance were entirely successful. This heralded the partnership between electric telegraphy and the railways in England. The same year, Morse gave a public demonstration of his system in New York. One of those in the audience was Alfred Vail, an expert machinist who would soon join Morse and take telegraph engineering to new heights.

The US Congress was not easily convinced of the gains telegraphy would bring. After years of lobbying, the Congress finally appropriated $30,000 for the first experimental line between Washington DC and Baltimore, a distance of 30 miles. There were engineering problems to be solved at every turn. Both the sending and the receiving devices needed improvement. Outdoor installation of electric wires had never been done before. The first approach was to lay insulated cables underground inside lead pipes. This was a costly enterprise and after spending $23,000 of the allocated money, the insulation broke. Someone came up with a proposal to string the wires above ground from pole to pole, the wires supported by cross-arms. The proposal worked and engineers completed the line within budget. The first message was dispatched to Baltimore on May 24, 1844. The sentence, taken from the Bible, read, "What hath God wrought." But clearly, God had nothing to do with it. It was solely man's understanding of nature and his ingenuity in solving real-world problems.

The neglected real hero of this success was Alfred Vail. His skills in machining and original inventive genius transformed what was an unreliable Morse model into a working one fit for rugged everyday use. The original recording equipment of Morse moved the pencil transverse to the motion of the paper tape. Vail changed this to a vertical motion that embossed dots and dashes as the tape rolled. This device, now called the _Morse Register_ , improved stability and reliability. Vail's own comments on the invention make it clear that engineering success is rarely a flash of inspiration. Rather, it is a series of experiments, evaluation, and redesign,

The first working model of the Telegraph was furnished with a lead pencil, for writing its characters upon paper. This was found to require too much attention, as it needed frequent sharpening, and in other respects was found inferior to a pen of peculiar construction, which was afterwards substituted. This pen was supplied with ink from a reservoir attached to it. It answered well, so long as care was taken to keep up a proper supply of ink, which, from the character of the letters, and sometimes the rapid, and at others the slow rate of writing, was found to be difficult and troublesome. And then again, if the pen ceased writing for a little time, the ink evaporated and left a sediment in the pen, requiring it to be cleaned, before it was again in writing order. These difficulties turned the attention of the inventor to other modes of writing, differing from the two previous modes. A variety of experiments were made, and among them, one upon the principle of the manifold letter writers; and which answered the purpose very well, for a short time. This plan was also found objectionable, and after much time and expense expended upon it, it was thrown aside for the present mode of marking the telegraphic letter. This mode has been found to answer in every respect all that could be desired. It produces an impression upon the paper not to be mistaken. It is clean, and the points making the impression being of the very hardest steel, do not wear, and renders the writing apparatus always ready for use.

Vail also invented the _Morse Key_, a simple tapping device that could make or break the electric circuit in quick succession. This dramatically improved transmission speeds. To define Morse code with short codes for frequent letters and longer codes for less frequent ones, the inventors needed statistical data on the English alphabet. It was Vail who got the idea of visiting a local newspaper. Looking at their stock of typefaces, he derived the letter frequencies that led to an optimized Morse code design.

The experimental line of 1844 was not a commercial success. The line was mostly idle and few people wanted to send messages. This left Vail ample time to experiment. Over the next few months, he performed more than fifty experiments. This was engineering at its best, solving problem after problem, studying and improving every aspect of electrical transmission over long distances. He laid out procedures of operation. He studied best configurations of battery, line voltage, insulation, and relays. It is therefore regrettable that Morse, a proud man, did not give adequate credit to Vail. He regarded Vail as little more than his assistant. Vail died unrecognized while Morse profited from the eventual commercial success of electric telegraphy.

The commercial failure of the first line meant that the US Congress was no longer interested in telegraphy. This proved to be a turning point for telecommunication in the US. Private companies mushroomed all over the country and intense competition spurred innovation. It was then, as it is today, a case of big-fish-eat-small-fish. A giant was in the making; and in 1856, it absorbed many smaller companies. This was the Western Union Telegraph Company. In 1866, it also acquired the company Morse had founded. The age of large corporations in telecommunications had started.

Just as the French had built a network of semaphore lines fifty years earlier, electric telegraphy had similar but bigger ambitions. Starting in the 1850s, the race was on to build national networks. The first transcontinental line from New York to California was completed in 1861. News that would have taken ten days by Pony Express now took minutes. Subsequently, countries started to build international messaging systems and agreements. In Europe, the International Telegraph Union (ITU) was formed in 1865. To connect Europe with Asia was no easy task and the British were the first to grapple with this challenge.

In the early 1840s, Europeans had discovered gutta-percha, a rubber-like material from the Malayan Peninsula. It had interesting properties that caught the attention of engineers. It was waterproof. It was a good insulator. At high temperatures, it softened and could be drawn easily. At higher pressures or lower temperatures, its insulation improved. It was the best material for undersea telegraph cabling. Thus far, rivers had been crossed by stringing wires between high masts on both banks. The coming of gutta-percha got engineers thinking. Perhaps it was not too far-fetched to bridge the English Channel.

In Germany, Werner Siemens introduced in 1847 a reliable manufacturing method to insulate conducting cores with gutta-percha. This invention made the use of gutta-percha practical on an industrial scale. In 1851, the English Channel was telegraphically spanned. When the Crimean War broke out in 1854, the British rushed to connect Balaklava to Varna with a 340-mile cable laid at the bottom of the Black Sea. Russia for its part built its own line from Crimea to St Petersburg. Telegraphy changed the way wars would be fought from then on. Interestingly, the French, who allied with the British, used a mobile version of their reliable semaphores during the war.

The Crimean War heightened British urgency in connecting Calcutta with London. By the early 1860s, the Indian side of the line had reached Kabul and the European side had reached Baghdad. Now the cable had to pass through Iranian territory, for which the British had already secured an arrangement with the Shah. The real obstacle lay elsewhere. A border dispute between the Ottoman Empire and Iran, unsettled since the seventeenth century, began to raise its ugly head. The disputed territory was a short segment from Qasr-i-Shīrīn to Khanaqin, a mere 18 miles. So, here was an entire line stretching from London to Calcutta, over land and under sea, sitting idle for nearly eight months awaiting resolution of an old border dispute. For the British, the cooperation of these two powerful empires was vital. They saw the Iranian and Ottoman Empires as a buffer against Russian interests in the Middle East and India.

It took all of British tact and diplomacy to get this sorted. If nothing else, the solution was almost hilarious. The Ottomans had constructed other parts of the line using iron poles while the Iranians had used wooden poles through Iranian territory. The agreement reached was to have alternating wooden and iron poles along the disputed border segment. Iranians would patrol and repair their wooden poles while the Ottomans would take care of their iron ones. The border issue itself was sidestepped, at least for the moment. The economic and military benefits of a telegraph line were obvious to all parties and they had finally agreed to a compromise. The construction itself took only two days and on January 7, 1865, Calcutta and London were telegraphically connected.

Now engineers turned their attention to the Atlantic, to connect Europe and North America. This was no mean task, a distance of 2300 nautical miles, the cable alone weighing a few thousand tons. Logistics was also a concern. Could there be a ship strong enough to carry such a cable? Could such a large ship get across the Atlantic on steam power alone? The reasoning behind these questions seemed sound but was flawed. Engineers reasoned that to sail across the Atlantic a certain quantity of coal was required to power any steamship, but to carry that amount of coal meant increasing the size of the ship's hull, hence increasing surface area and water resistance. This implied yet more coal and a yet bigger hull. It was a catch-22 situation. Then came Isambard Kingdom Brunel, possibly the greatest engineer of the Industrial Revolution. Brunel argued that if one increased the hull size, the capacity to carry coal increased in cubic proportion while resistance increased only in quadratic proportion. Thus, a steamship, no matter how large, could cross the Atlantic.
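
Brunel's argument is a scaling law that can be checked on the back of an envelope. The toy calculation below assumes the simple model described here: coal capacity grows with hull volume, the cube of the scale factor, while resistance grows with surface area, its square, so the achievable range grows roughly in direct proportion to scale.

```python
# Brunel's scaling argument as a toy calculation (assumed model):
# coal ~ volume ~ k**3, resistance ~ area ~ k**2, range ~ coal/resistance ~ k
for k in (1.0, 1.5, 2.0, 3.0):
    coal, drag = k**3, k**2
    print(f"scale x{k}: coal x{coal:>5.2f}, drag x{drag:>4.2f}, range x{coal / drag:.2f}")
```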

The first such ship was the _Great Western_, its hull measuring 236 feet in length. In 1843 came the _Great Britain_ with a 320-foot hull. It had two of Brunel's innovations. It introduced the screw propeller, whereas earlier ships had used paddle wheels. This was not an arbitrary design change. Rather, Brunel had tested both mechanisms and compared the results. For greater strength, the ship replaced the older wooden hull construction with a modern iron hull. Suddenly, carpenters were made obsolete and had to quickly learn new skills in metalworking. Then in 1857 came the largest ship of the day, the _Great Eastern_, with a 692-foot hull, a mammoth that would not be surpassed for another forty years. It had important design improvements as well—a watertight double hull, bulkheads, and sealed compartments.

The first attempt of 1857 to span the Atlantic failed. Three more attempts the next year resulted in a partial success. The cable worked for about a month before improper usage destroyed its insulation. Apparently, the line voltage had been too high. This was an expensive business and the financiers did not take it lightly. Immediately a study committee was formed to understand all that could possibly be understood of long-distance electrical transmission, cable laying, and undersea operation. Another attempt was made in 1865, for the first time using the _Great Eastern_, but the cable snapped when just 600 miles remained to the coast of Newfoundland. A renewed attempt the next year succeeded. The _Great Eastern_ then searched for, found, and repaired the cable of 1865. Thus, in the year 1866, the world had two cables spanning the Atlantic. By then, there were about fifty submarine cables across the world. Given that Europe and Asia had been connected the previous year, Calcutta was now connected to Alaska. In the 1870s came an all-sea route to India and subsequent routes to the Far East. By 1904, a circuit spanned the globe.

Crossing the Atlantic was the real pinnacle of engineering achievement, and much of the credit goes to the Scotsman William Thomson. In his paper of 1855, he analysed the propagation of signals from a theoretical perspective. Fleeming Jenkin, in a paper from 1862, experimentally confirmed much of Thomson's work. Thomson had shown that there is a limit to the speed of transmission on long cables. Long lines have significant capacitance (charge accumulation), so that electrical changes at one end take time to reach the other end. This delay limits the rate of communication. In addition, dots and dashes have different maximum voltages and different trails of charge and discharge. This meant that neighbouring dots and dashes electrically interfered with one another. This interference set a much lower limit on the speed of signal transmission. The phenomenon would reappear in a different avatar in communication systems a century later, in what engineers today call _intersymbol interference_ or _ISI_. In Jenkin's own words,

Long before the limit is reached at which signals cease to produce any change at the receiving end, the interference of one signal with another causes so great a confusion in the currents received as to put a fresh limit to the practicable speed of transmission. This confusion is still further increased by the effect of a pause in the signals between letters, words, or sentences.
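
A toy simulation makes the effect concrete. The sketch below is not Thomson's analysis; it merely assumes, for illustration, that a long cable behaves like a first-order resistor-capacitor (RC) low-pass filter. Fast on/off pulses passed through such a filter never settle to clean levels, so each symbol leaves a residue that corrupts its neighbours.

```python
import numpy as np

fs = 1000                                    # samples per second (arbitrary)
t = np.arange(0, 1, 1 / fs)
pulses = ((t * 20) % 2 < 0.5).astype(float)  # fast on/off keying, 10 symbols/s

alpha = 0.01                                 # ~1/(R*C*fs); small = long cable (assumed)
y = np.zeros_like(pulses)
for n in range(1, len(pulses)):
    y[n] = y[n - 1] + alpha * (pulses[n] - y[n - 1])   # first-order RC response

# The received swing is far less than the clean 0-to-1 swing: each pulse's
# residual charge overlaps the next, i.e. intersymbol interference.
print(f"received swing: {y.max() - y.min():.2f} (clean would be 1.00)")
```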

The loss of the 1858 cable had been due to high voltage. Thomson had already invented the mirror galvanometer, capable of measuring even the slightest of magnetic deflections. This meant that a small voltage would be sufficient for signal transmission. He analysed the problem of cable breakage by studying the profile in which cables fall to the ocean floor, given a constant speed of the moving ship. To improve tensile strength, the conducting core was made thicker. Although this increased the cable's weight, it also increased buoyancy so that the effective weight in water was less. The cable itself was different for shallow waters than for deeper waters, the former having much more protection with the use of hemp yarn and iron-wire armouring. Through all the troubling failures that faced this project, engineers never lost hope. Led by Thomson, they were always confident of success. William Thomson, one of the great scientists of the nineteenth century, with many contributions to his name, was knighted by the Queen for his work in spanning the Atlantic. He is more commonly known today as Lord Kelvin and is associated with the Kelvin temperature scale.

The impact of electric telegraphy was far greater than that of optical telegraphy. Telegraphy was now for everyone, not just for governments. The availability of information and news in almost an instant had a huge economic impact. New services, not unlike those offered by today's Internet, flourished—postal money orders, purchase of railway tickets, catalogue shopping, and delivery. Modern commentators have even referred to telegraphy as the _Victorian Internet_. Empires, be they British, Ottoman, or Iranian, embraced the new technology, for it enabled them to centralize power and exercise greater control over their provinces. In the scientific realm, particularly with the Atlantic cables of 1866, for the first time longitudes could be determined accurately: a telegraph signal allowed clocks at two distant places to be compared at the same instant, and since the earth turns fifteen degrees per hour, the difference in local times gave the difference in longitude. Given the time at Greenwich, one could know accurately the time at San Francisco. Socially, people communicated using telegrams. Telegrams, which initially brought only foreboding news of death and suffering, began to be used for joyous occasions, particularly when tariffs dropped.

Those who did not understand the technology had a tough time. There was a case of a man wanting to "send" a pair of boots to his son over the wire. He had heard of money being sent over the wire. So, he hung the boots on overhead transmission lines and waited all day. Nothing happened. The next morning he found that the boots were gone. Without realizing that they had been stolen overnight, he assumed that his son must have received them.

Mistakes were sometimes made, particularly when messages were handwritten by customers. In one case, an Indian Maharaja had wired his durbar to receive him at the train station. Instead, he found his barber waiting for him. Other mistakes were a lot costlier, for they caused loss of revenue or overbuying of goods. The message "Sell for 55 dollars", if miscommunicated as "Sell for 5 dollars", would be disastrous. For this reason, telegraph companies advised customers to be explicit: "Sell for fifty-five dollars." The cost of transmitting a digit was the same as for a word. When mistakes occurred, legal battles were fought between telegraph companies and customers. Publications such as the _Michigan Law Review_, _The Yale Law Journal_, and _Columbia Law Review_ describe many such cases from the 1910s and 1920s. Telegrams survive to this day, but only as collector's items for those who desire a whiff of nostalgia. The difference is that today the Internet is used to relay the messages.

As electric telegraphy expanded across the world, it soon replaced optical telegraphy. The French for a while used electric telegraphy to transmit Chappe codes but soon realized that Morse code was far better. Countries in Europe purchased rights to the Morse code. The needle telegraphy of Cooke and Wheatstone faded into history. Transmissions on the French semaphore system finally ceased in 1855. The last optical telegraph line closed in 1917 at Curaçao, off the coast of Venezuela. In time, Morse code would inspire popular culture and Hollywood. Years later, in a strange twist, the phone company Nokia filed a patent in 2005 for transmitting Morse code from a mobile phone using LED lights, to be read and understood by a camera-phone.

It generally happens that any new technology, while solving problems, begets new problems. As telegraphy became an international phenomenon, traffic increased beyond expectations. At the time, a telegraph wire could carry only one message at a time. Telegraph companies kept stringing more and more wires to keep up with the growing demand. It soon became obvious that this solution could not scale. The problem became so urgent that Western Union offered a million dollars to anyone who could solve the wiring problem. The best minds of the day got down to serious work.



**The first person** to alleviate the wiring problem was Joseph Stearns, who in 1872 came up with a system of duplexing that enabled simultaneous transmission and reception of messages on a single line. It was not long before another American, Thomas Edison, undoubtedly the greatest inventor of all time, patented in 1874 a system that he called _quadruplex_. This allowed four messages, two in each direction, to be handled simultaneously. It did not matter at the time that each message required two operators. The problem of infrastructure was much more acute than the hiring of trained operators. Automated transmission and reception would have to wait until the need arose. Meanwhile, some engineers had the conviction and boldness to think that a single pair of wires could carry many more than four messages. Among them was one notable inventor whose research led him to invent something better than the telegraph itself.

Alexander Graham Bell, with the financial backing of Gardiner Hubbard and Thomas Sanders, started looking into this problem in 1874. But the seeds of his approach had been sown at least a decade earlier, long before the Bell family migrated to the US in 1870. What separated Bell from the rest of the research crowd was his intimate knowledge of sound. His grandfather, Alexander Bell, had personally tutored him in elocution, with attention to diction and accent. His father Alexander Melville Bell too had done pioneering work in speech. In particular, Melville Bell had analysed and catalogued human speech sounds. This led him to devise representations of sound in terms of the position of the tongue and lips, really a tool to teach speech to the deaf. He called this _visible speech_. It was therefore natural for Bell to follow in this tradition.

Bell came across research done earlier by Hermann von Helmholtz. Helmholtz had used tuning forks excited by intermittent currents, together with resonating cavities, to create human vowel sounds. Charles Wheatstone, working on a suggestion from Baron von Kempelen, created a machine from pipes, valves, bellows, and levers. Wheatstone's idea was quite simple—sound was nothing more than vibrating air, and any mechanism to do just that should lead the way to artificially creating speech. Working from this assortment of new developments, Bell picked up on the tuning fork resonance of Helmholtz. Using the phenomenon of sympathetic vibrations, he measured the pitch of each vowel sound. For the moment, these experiments had nothing to do with solving the problem of the telegraph industry, but it was not long before many recognized that the two were related.

So when Bell started his research in 1874, the approach was inspired by his work with the tuning forks. If multiple forks were used, each of a different pitch, each paired fork at the receiving end would respond by sympathetic vibration at its own pitch. Since each tuning fork responded only to its own pitch, all tuning forks could be operated at the same time. And since this could be done with sound, there was no reason why it couldn't be done with telegraph signals on a wire. Unlike Edison, who had always been scornful of scientists, Bell viewed theory as a necessary component of invention. In this regard, his intimate knowledge of speech production guided him towards harmonic telegraphy. In harmonic telegraphy, multiple messages can be transmitted at the same time by carrying each stream of dots and dashes at its own pitch or frequency. Engineers today call this _Frequency Division Multiplexing (FDM)_.
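
The idea translates directly into a toy demonstration. The following sketch is a minimal FDM illustration rather than a model of Bell's apparatus: two on/off message streams are keyed onto tones of different pitches (the frequencies and timings here are arbitrary choices), summed onto a single "wire", and recovered by correlating against each reference pitch.

```python
import numpy as np

fs, T = 8000, 0.05                        # sample rate and symbol period (assumed)
t = np.arange(0, T, 1 / fs)               # one symbol's worth of time samples
f1, f2 = 440.0, 700.0                     # each channel's pitch (assumed)

msg1, msg2 = [1, 0, 1, 1], [0, 1, 1, 0]   # two independent dot/silence streams
line = np.concatenate([                   # both tones share the single wire
    on1 * np.sin(2 * np.pi * f1 * t) + on2 * np.sin(2 * np.pi * f2 * t)
    for on1, on2 in zip(msg1, msg2)
])

n = len(t)
for name, f in (("channel 1", f1), ("channel 2", f2)):
    ref = np.sin(2 * np.pi * f * t)       # receiver tuned to one pitch only
    decoded = [int(abs(line[i * n:(i + 1) * n] @ ref) > n / 4) for i in range(4)]
    print(name, decoded)                  # recovers msg1 and msg2 respectively
```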

Bell got down to the task of building a prototype. A struggle of many months finally culminated in an experiment conducted in June 1875. Consisting of multiple pairs of transmitters and receivers, each tuned to a different pitch, the system was able to handle multiple tones simultaneously. Then something unexpected happened. Bell heard not just the clear tones but also overtones. This immediately suggested to Bell that not just telegraphy but also human speech could be carried over wires electrically. Bell's interest shifted significantly from telegraphy to speech transmission. Hubbard was not very happy, particularly since harmonic telegraphy addressed a clear and immediate market need. For speech transmission over wires, there was as yet no clear market.

The birth of an idea is one thing, but to make a working prototype calls for different skills altogether. Bell had solved the problem of transmission but there remained the problem of conversion. The crux of the problem can be summarized in one word: transducer. That's the name engineers give to any device that converts one form of signal to another. In this case, Bell had to convert human speech, vibrations in the air, into electrical signals. At the receiving end, the reverse process had to be accomplished. Receiver design was comparatively simple; Helmholtz and Wheatstone had already done this earlier. Making the transmitter was the main challenge. Others before Bell had either worked on the idea or at least given it some thought.

Charles Bourseul had proposed the electrical transmission of speech as early as 1854. Johann Philipp Reis independently pursued the idea and built an apparatus in 1861. Reis in particular came close to building the telephone. The apparatus worked accidentally and Reis did not understand the principle at all. In fact, his approach was wrong from the start. Taking a hint from the dots and dashes of telegraphy, Reis attempted to transmit sound by making and breaking the circuit. The intermittent flow of current gave poor reproduction of speech. It was only by accident, when the contacts did not break the circuit cleanly, that Reis obtained a continuous current that carried speech.

Bell was granted his telephone patent, titled "Improvement in telegraphy," on March 7, 1876, a remarkable fact given that it was filed only on February 14, 1876. More surprising is the fact that months of experimentation had led to no satisfactory prototype. He was not ready to file a patent but Hubbard, suspecting that the competition might beat them to it, rushed Bell to file one prematurely. This was alright since the Patent Office had since 1870 waived the requirement of a working model for new inventions. Three days after the patent was granted, Bell got a new model working, and the first famous words were spoken to his assistant, "Mr. Watson—come here—I want to see you!" After months of disappointment, this sudden success was not accidental. Later investigations revealed that Bell's attorneys had visited the Patent Office in Washington DC just before filing the patent. On the same day, February 14, 1876, another inventor Elisha Gray had filed a caveat for a similar idea. It is claimed that Bell's attorneys surreptitiously took a peek at Gray's caveat and made last-minute changes. When Bell visited Washington DC later that month, the attorneys made him add handwritten sentences on the margins of his earlier draft. In fact, the idea of transmitting "vocal or other sounds" was mentioned only in the fifth claim, almost as a footnote to preceding claims on harmonic telegraphy. The fourth claim relating to the method of conversion from sound to electricity is said to have been added just before patent submission. Speech transmission was not explicitly mentioned in the patent. Author Seth Shulman in a recent book makes a compelling case that Bell stole Gray's idea.

Gray's own idea had eluded Bell for long. Both Bell and Gray knew that the approach of Reis was flawed. Speech transmission had to be continuous, unlike the intermittent nature of telegraphy. That current could be controlled by changing circuit resistance was a critical idea in transmitter design, but was this something Bell had picked up from Gray? The change of resistance was to be effected by speech-induced vibrations in the air. Imposing speech this way, by changing a property of the electric circuit, is today called _modulation_. Gray's transmission apparatus was innovative: a metallic needle, attached to a vibrating diaphragm, dipped into water. Both the needle and the solution formed part of the electric circuit. The motion of the needle was smooth, sensitive, and continuous, resulting in accurate speech modulation of circuit resistance. This "liquid transmitter" was present in Gray's caveat but also appears in Bell's patent, with mercury substituted for water. The discrepancy is that the idea is completely absent from Bell's laboratory notebook. Gray's idea appears in the notebook on March 8, only after Bell's return from Washington DC. This key point would later become important in all legal suits involving the Bell patent. Gray for his part did not proceed to convert his caveat into a full patent, being content with harmonic telegraphy. He believed that the transmission of speech had no major commercial value. In any case, other inventors, including Edison, came up with much better transmitter designs.

Gray's Caveat and Bell's Notebook Sketch

(a) Drawing in Gray's caveat of February 14, 1876, shows the workings of the liquid transmitter. (b) Bell's laboratory notebook reproduces a similar sketch dated March 9. Such a design does not appear in his notebook before March 8.
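
The principle of resistance modulation is simple enough to sketch numerically. The snippet below is purely illustrative, with made-up values for the battery voltage, base resistance, and needle sensitivity: a speech-like vibration varies the circuit resistance, and by Ohm's law the current follows it continuously, quite unlike the on/off current of telegraphy.

```python
import numpy as np

fs = 8000
t = np.arange(0, 0.01, 1 / fs)
speech = 0.5 * np.sin(2 * np.pi * 440 * t)   # stand-in for a voice vibration

V, R0, k = 6.0, 100.0, 50.0                  # volts, base ohms, ohms per unit (assumed)
R = R0 + k * speech                          # needle depth varies the resistance
current = V / R                              # Ohm's law gives a continuous current

print(f"current swings smoothly: {current.min():.4f} A to {current.max():.4f} A")
```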

Perhaps historians should rewrite the books but students continue to be taught that Bell invented the telephone. Surprisingly, it has been acknowledged recently that the real inventor of the telephone is neither Bell nor Gray. The honour goes to an Italian migrant to the US, a contemporary of Bell. Antonio Meucci had a working model as far back as 1849, almost thirty years before Bell. Meucci had filed a caveat for his invention in December 1871 but being poor and lacking the right connections, he could not patent it or renew the caveat. In his final years, he lived on the charity of his friends. History remembers not those who just invent in their garages and attics, but only those who take their inventions out into the open world.

When Bell demonstrated his telephone at the Centennial Exhibition at Philadelphia in June 1876, it caused quite a stir. Sentences from Shakespeare's Hamlet were read out. Writing about it two years later, William Thomson recalled,

Mr. Alexander Graham Bell exhibits apparatus by which he has achieved a result of transcendent scientific interest—the transmission of spoken words by electric currents through a telegraph wire.... With my ear pressed against this disc I heard it speak distinctly several sentences. I need scarcely say I was astonished and delighted; so were others, including some judges of our group who witnessed the experiments and verified with their own ears the electric transmission of speech. This, perhaps, the greatest marvel hitherto achieved by the electric telegraph, has been obtained by appliances of quite a homespun and rudimentary character.

The Bell Telephone Company was created in 1877. Western Union, initially sceptical of the telephone's potential, realized its mistake in not buying Bell's patent when Hubbard had offered it for $100,000. In the same year, it formed the American Speaking Telephone Company, its technology being based on the patents of Gray, Edison, and others. This was a natural move for Western Union since in 1872 it had already acquired a one-third stake in Gray's original company of 1869 and promptly renamed it the Western Electric Manufacturing Company. For the new kid on the block, Western Union was a giant with thousands of miles of telegraph wires across the country. How could David compete against Goliath? The Bell Telephone Company had one important stone in its sling: the patent.

As Bell's company went from strength to strength under the able leadership of Theodore Vail, Western Union in November 1879 settled its patent dispute with Bell and its business was subsequently limited to telegraphy. In 1881, the American Bell Telephone Company acquired Western Electric. Through many transitions of the original Bell Telephone Company, the American Telephone and Telegraph Company, AT&T, was formed in 1885. This grew to such an extent that in 1909 it acquired a controlling interest in Western Union. This marked a turning point in history when telephony overtook telegraphy, although the two would continue to exist side by side for many more decades.

In the year 1800, Volta's galvanism was born. In 1820, electromagnetism was discovered. By 1844, an experimental telegraph line was in operation. In 1876, the telephone was invented. In 1879, Edison invented the incandescent electric lamp. In 1882, Pearl Street power station in New York was up and running. The age of electrification had begun. Electricity had changed not just how we communicated but also how we lived. If coal and steam power had given impetus to the Industrial Revolution, electricity had already launched its own revolution for the modern era. Only one nagging problem remained—no one really understood electricity.

#  0010 The Science of Engineering

**To say that** engineering is applied science is an oversimplification, if not naïve. The relationship between science and engineering is a lot more intimate than one is perhaps willing to acknowledge. For generations, scientists and engineers have been cynical of each other. Scientists consider themselves pioneers at the vanguard of path-breaking work. They regard engineering as nothing more than fitting the bolts and turning the screws to get things working. Engineers for their part regard scientists as laboratory geeks working in the ethereal realm of the abstract, quite happily isolated from the real world and its needs.

Telegraphy is perhaps the first major application of science. The works of Gilbert, Coulomb, Volta, Oersted, Ampère, and Henry are works of science. What followed almost naturally were novel applications of these discoveries in the hands of Schilling, Cooke, Wheatstone, Morse, and a few others. It is not a coincidence that all these inventors started working on telegraphy at the same time. When science had established the fundamentals, it was ready for others to make use of it. All engineers saw the obvious need and advantage of communicating at a distance. Only their approaches differed, but they all relied on Oersted's electromagnetism and Henry's theories. Henry was quite right when he said in 1842, "science is now fully ripe." With telegraphy, a new era was born in which science gave impetus to technology. This may even suggest that technology did not exist before the nineteenth century.

On the contrary, technology is as old as human civilization itself, perhaps even going back to prehistoric man. The design of stone tools, the sharpening of arrowheads, and the crafting of stone beads are all aspects of engineering. Imhotep, regarded as the world's first engineer, built the Step Pyramid in 2800 BC. The ancient Greeks, who gave the world many eminent mathematicians and philosophers, were among the first to enquire and question, but they did not follow up with applications of their speculative theories. In fact, they considered building anything beneath their status as philosophical thinkers. One exception was Archimedes, whose giant catapults defended his hometown of Syracuse from Roman invasion in 212 BC. Even for Archimedes, his engineering devices came out of dire need and had secondary status to his philosophy and mathematics.

The Romans, on the other hand, focusing more on conquests and everyday concerns, almost ignored science but took engineering to new heights. The baths and aqueducts that survive today are a testament to the greatness of Roman engineering. What differentiates these past creations from the present is that their creators did not require a great deal of scientific understanding to make them. To build an aqueduct, the Romans needed little more than the connection between gravity and water flow. To build the baths, they perfected a process to melt and beat lead into sheets without understanding what made lead heavy, durable, or malleable. They knew how to heat the baths without knowing the intricacies of convection. Those were the days when science was dispensable. Engineering was grand but not sophisticated.

Since science thus lagged behind engineering until the coming of the European Renaissance, it was almost predictable that engineering should enable science. Engineering opened up new opportunities that had not been possible earlier. The telescope was an engineering invention. Galileo used it to discover the satellites of Jupiter and sunspots. The science of astronomy rose to new heights, eventually establishing the Copernican view that the sun and not the earth was at the centre of the Solar System. Another example relates to warfare. In 1742, Benjamin Robins invented a device called the _ballistic pendulum_. It was designed to measure muzzle velocity and thus validate whether a cannon had been constructed to the required specifications. His experiments required him to factor in air resistance. In the course of this work, he unexpectedly encountered an anomaly. At a speed of about 1100 ft/sec, air resistance increased dramatically. It was in this manner that he discovered the sound barrier. It was an engineer, not a scientist, who made the discovery.

Telegraphy marked a turning point. Science was no longer going to be in the back seat. From here arose the distinction between a scientist and an engineer. A scientist discovers; an engineer creates. A scientist uncovers; an engineer assembles. A scientist needs to be inquisitive while an engineer needs to be creative. The essential attribute of the scientist is curiosity; for the engineer, it is ingenuity. Attempts to put scientists and engineers into such straitjacket definitions have always been controversial.

History has shown us that scientists often have engineering inventiveness and engineers are capable of scientific theorizing. When a scientist puts forward a theory, he is actually being inventive. When an engineer tinkers with parts of an engine, he is being curious. One may therefore argue that these attributes are part of a process in which the final goal becomes the key differentiator. The goal of a scientist is to understand the workings of nature and uncover natural phenomena. He may be creative and inventive but this is of secondary importance. He should first be curious. The goal of an engineer is to tap into natural phenomena to make something useful out of them. Again, an engineer's curiosity, though necessary and important, is less of a motivating factor than his need to create something. As telegraphy expanded and competitors entered the fray, a controversy that had been simmering for a while soon erupted. The protagonists were Morse and Henry, an engineer and a scientist.

Henry reluctantly got dragged into an argument with Morse, who had sued his competitors for patent infringement. These competitors called Henry under subpoena to testify that he had established the principles of telegraphy prior to Morse's patent. The result was that Morse's patent stood but its wide scope of application was reduced. The patent was valid for Morse's code and apparatus but not for the principles of telegraphy. For the rest of their lives, Morse and Henry remained bitter enemies. The entire episode brought to attention the key divide and differing perspectives between scientists and engineers. Henry placed much greater importance on the discovery of scientific principles than on their application. Henry therefore did not patent his discoveries. In fact, he had freely given his expert advice to both Morse and Wheatstone in 1837. Though he credited Morse for applying the principles to telegraphy, he stated that Morse had not made

a single original discovery, in electricity, magnetism, or electro-magnetism, applicable to the invention of the telegraph. I have always considered his merit to consist in combining and applying the discoveries of others in the invention of a particular instrument and process for telegraphic purposes.

The developments of the nineteenth century made it obvious that science and engineering could not exist independent of each other. The failure of the undersea Atlantic cabling of 1858 was an engineering failure but also a failure born of ignorance of basic science. It spurred the search for greater scientific understanding. The success of 1866 was due to Thomson's scientific studies as well as the engineering efforts of Jenkin, who supplied detailed measurement data on the resistance of copper and on the insulation and capacitance of gutta-percha. Thomson, for his part, engineered the mirror galvanometer, which became essential for cable operation.

Science, Engineering, and Entrepreneurship

History has thrown up numerous examples to show that commercial success is a combination of three disciplines.

Notwithstanding all the advances in the generation and use of electricity, no one had a clue about its inner nature. Was electricity a movement of matter, perhaps a fluid of some sort, a force acting on matter, or something else altogether? Charles F. DuFay in France had proposed in the eighteenth century that electricity was composed of two types of fluids. Objects with one type attracted objects with the other type, while objects of the same type repelled each other. Thus, when glass is rubbed against silk, positive charges are produced on the former and negative charges on the latter. This two-fluid theory was questioned by Benjamin Franklin, who promoted an alternative one-fluid theory. All objects normally had a balance of positive and negative charges, but the process of rubbing resulted in transferring charges from one object to another. Therefore, we did not require two fluids when a single fluid offered a satisfactory explanation. In Franklin's proposal, we find the essence of any scientific enquiry: one should always prefer the simpler explanation over the complex. More importantly, unlike what DuFay had suggested, charges are not produced but merely transferred. Implicitly, Franklin established in this statement the principle of conservation of charge.

These early theories were just that, theories that could not be verified by experimentation. Meanwhile, electromagnetism had been discovered. Well into the second half of the nineteenth century, the original question about the nature of electricity remained. Without an answer, scientists realized that further progress with electricity was unlikely. Engineers for their part did not bother with the question. Most engineers thought that everything that needed to be discovered had already been discovered, in a way downplaying the importance of scientists. All that one needed to do now was refine measurements, methods, and processes. What followed from that point onwards was an unparalleled synergy between the two camps. With differing but not conflicting focus, scientists and engineers complemented each other and jointly paved the way for progress.

The practical needs of the industry motivated the engineers. In particular, undersea telegraphy was the most challenging of all technologies in that era. Essential to reliable operation was the accurate measurement of resistance at all points along the cable. Resistance measurements could also identify points of failure quickly. Most importantly, resistance required precise validation at the factory before cables were laid across continents. The problem was that there was no standard for resistance. In England, a mile of No. 16 copper wire was used as the reference for a unit of resistance. The Germans used a mile of No. 8 iron wire while the French used a kilometre of iron wire 4 millimetres in diameter. The result was that when one engineer quoted a resistance measurement, another could not easily verify it. The problem beset the scientists as well.

While resistance is important, it is the wide range of resistances found in materials that makes electricity usable at all. A photographer requires both light and shadow. A musician requires both sound and silence. Likewise, for electricity to work, insulators are just as important as conductors. For without insulators, electrical circuits could not be constructed and we would have no control over the paths of current flow. In his famous _Treatise on Electricity and Magnetism_, James Clerk Maxwell succinctly summarized the importance of resistance,

In the present state of electrical science, the determination of the electric resistance of a conductor may be considered as the cardinal operation in electricity, in the same sense that the determination of weight is the cardinal operation in chemistry.

Engineers and scientists thus began to collaborate on standardizing resistance. But this idea of resistance is so fundamental that we need to trace it to its origins to understand it better.

Ohm's Law, known to every electrical engineer today, was formulated by Georg Simon Ohm in 1827. In the early days of electricity and magnetism, experiments were qualitative. It was Coulomb who first introduced quantitative measurements in his experiments with static charges. Ohm followed in this tradition and put electricity on a quantitative footing. It is remarkable that while research from the 1800s to the 1820s was focused on batteries, electrochemistry, and electromagnetism, no one thought of studying the characteristics of the circuit itself. These were days when avenues for investigation were many and often a new discovery crowded out another waiting by the sidelines. Voltage, current, and resistance were concepts in primitive formation. People were loosely and interchangeably referring to terms of their own invention without any consensus whatsoever—tension, intensity, quantity, excitation, electromotive force, and electroscopic force. Ohm did not set out to solve this problem. His ambitions were far grander: to understand how electricity flowed.

Are geniuses born of genetics, or are they a product of circumstance? The answer, of course, is elusive and could never be conclusive. Ohm's is the classic case of a scientist born at a time when rapid advances in knowledge were happening in various fields. One such field that directly influenced Ohm was heat. In 1822, the French scientist Joseph Fourier published what is today regarded as a classic, _The Analytic Theory of Heat_. This single publication was to revolutionize not just our scientific approach to heat but the whole of engineering for generations to come. Ohm was among those who came across Fourier's work.

Fourier did not address the real question of the inner nature of heat. The physical reality of heat did not bother Fourier; his mathematical analysis did not require it. Fourier's real concern was the dynamics of heat transfer from the hot end of a metal bar to its cooler end. Heat flowed faster when the temperature difference was greater, slowed down when the difference was less, and eventually stopped when thermal equilibrium was reached. Moreover, the rate of heat transfer was determined by the cross section of the bar and a property of the metal known as thermal conductivity. Fourier's work on heat inspired Ohm. An analogy to the flow of electric charges was suggestive. Ohm asked himself a few questions. Could each conductor have a property that we may call electrical conductivity? Could the rate of flow of charges be increased if we increased the conductor's cross section? Could there be a gradual dip in tension (electric potential) from one end of the wire to the other, just as temperature drops in Fourier's analysis of heat? Could there be a local equilibrium at each point of the conductor so that charges going out are replenished continuously by charges coming in from the other side?

Ohm, being the son of a master locksmith, had learnt the art of metalworking at a young age. His mechanical skills were promptly put to good use. He accurately constructed bars of different dimensions and metals. For lack of sophisticated measuring devices, he used a reference wire for benchmarking. This was connected in series with the wire under test. Current was measured using an early galvanometer. Voltaic cells back then were unreliable since their voltage quickly decreased with usage. No one had complained about it earlier, but Ohm needed to perform accurate quantitative measurements over many experiments. He needed something more reliable. Fortunately for Ohm, Thomas Seebeck had invented the thermoelectric cell just a few years earlier. Such a cell worked on the principle that junctions of different metals, such as copper and bismuth, when maintained at different temperatures, produced an electric current. Seebeck's cell handed Ohm one more reason to believe in the analogy he had drawn between heat and electricity.

With the setup ready, Ohm performed his experiments. He not only answered in the affirmative the questions he had asked himself earlier but went further to formulate an equation connecting current, voltage, electric conductivity, and wire dimensions. He showed that if he doubled the length and doubled the cross section, circuit current remained unchanged. From his tests of nine different metals, he concluded that copper was the most conductive and lead the least. He even gave a quantitative figure for each metal, giving copper a reference figure of 1000. Most importantly, this conductivity was a property of the metal and had nothing to do with circuit current or voltage. His paper of 1826 was the first publication of what we today call Ohm's Law. There was no mention of resistance, a word that would be coined much later. Instead, Ohm talked about _reduced length_, in which he included wire dimensions and conductivity. The beauty of Ohm's Law lies in this final reduction. Reduced length, today's equivalent of resistance, was an abstraction that liberated circuit analysis from physical parameters. Physical parameters are necessary only in the manufacture of a resistor, but engineers who design or analyse circuits need concern themselves only with the concept of resistance. Researchers later applied this powerful method of abstraction in many areas. They could treat currents and voltages as abstract signals, which could then be analysed in a generic way. The real irony was that despite the fundamental importance of Ohm's Law, it was ignored for a long time. Recognition came only in the 1840s.

Ohm's Law

The famous law is simply stated as V = IR. Greater voltage (V) drives greater current (I); greater resistance (R) opposes and reduces the current.
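
For readers who like to see the arithmetic, here is a minimal sketch in Python; the battery voltage and resistor value are made-up figures for illustration only.

```python
# A minimal illustration of Ohm's Law, V = I * R, rearranged as I = V / R.
# The voltage and resistance below are assumed values for the example.
voltage = 9.0       # volts, say a small battery
resistance = 450.0  # ohms

current = voltage / resistance
print(f"Current: {current * 1000:.0f} mA")  # prints "Current: 20 mA"
```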

For centuries, a handful of measuring concepts had been sufficient—mass, length, time, angle, and temperature. But electricity was a completely new phenomenon. There was no consensus on how to express it quantitatively. There was a clear need for a physical standard and everyone had to agree to use the same one. The British Association for the Advancement of Science (BAAS) had been set up in 1831 to address the problem of the decline of science in Britain. In the matter of standardizing resistance, the BAAS perhaps saw its greatest opportunity to put science back on track. In 1861, the British Association Committee on Electrical Standards, under the able leadership of William Thomson, commenced work on standardizing resistance. Some of the groundwork had been done earlier by two Germans, Carl Friedrich Gauss and Wilhelm Weber.

Most mathematicians agree that Gauss is one of the three greatest mathematicians of all time, the other two being Archimedes and Newton. It is unfortunate that, just as the public knows engineers less well than scientists, it knows mathematicians even less. That Archimedes and Newton are well known is largely due to their being primarily physicists. In their hands, mathematics was merely a crafted tool. Archimedes is known for dashing naked from his bath, the words "Eureka! Eureka!" proclaiming his revelation. Not to be outdone, Newton too has a story to tell—that the seed of dormant ideas germinated with the falling of an apple. Gauss, on the other hand, was primarily a mathematician for whom there is no such anecdote. While his best mathematics was done at a young age, he contributed to physics as well in his later years. In particular, it was physicist Weber who persuaded Gauss to experiment on magnetism. The project had a grand ambition from the start: to map the earth's magnetism.

Much of their work was done in the 1830s. In a landmark 1833 paper, Gauss proposed a system of measurements that could be used to measure magnetism. What was really needed was to relate mechanical and non-mechanical quantities in a consistent manner. Measurement of length, mass, and time had been in practice for centuries. Defining these as fundamental units, Gauss showed how non-mechanical quantities could be derived in terms of these fundamental units. Newtonian mechanics had done this for mechanical quantities. For example, velocity does not require a new unit since it is based on length and time. Likewise, force can be expressed in terms of mass (M), length (L), and time (T) by using Newton's Second Law: F = ma. Thus, force has dimensions [MLT⁻²]. We may give the unit of force a new name, _newton_ as it is called today, but it is a derived unit that can be expressed in terms of fundamental units. By definition, it would be the force needed to give one unit of mass an acceleration of one metre per second squared. Coulomb and Ampère had already expressed charge and magnetic interaction in terms of forces. Using these laws, it was a small step to define units of electricity and magnetism based on mass, length, and time. The use of MLT as fundamental units is purely arbitrary and perhaps ill suited for quantities involving electricity and magnetism. Gauss selected them perhaps for historic reasons, since standards existed for them.

In surveying the earth's magnetism, Gauss and Weber were perhaps the first in the world to create a system of codes and communicate up to a distance of a mile using their own electric telegraphy. This by-product of their main research was not well documented and eventually lost. It was at this time that Morse started working on telegraphy but it would take him a few more years to demonstrate a working model. In those years, Gauss invented the heliotrope as well as the bifilar magnetometer. It is rare for mathematical genius and engineering skills to be found in the same person. By 1838, Gauss had gathered enough measurements to plot earth's magnetic field at various points. He showed mathematically that earth's magnetic field originates from within the earth. This was a significant finding. William Gilbert had suspected it two centuries earlier but no one had managed to prove it until now.

We know that historically static electricity and magnetism evolved separately. Though Oersted had shown the link between the two, for the purpose of measurements, scientists continued to treat them independently. This was as much a legacy of eighteenth-century practice as an unwillingness to conceive of something better. When the problem was finally addressed, unexpected difficulties surfaced. There were units for static charges and separate units for moving charges. The dilemma was how to express an equation in which both static and moving charges were involved. This problem confronted Weber in 1846 when he modified Coulomb's Law to express force as a function of not just static but also moving charges. This led to a strange constant in the equation, which was not immediately understood.

Weber noted in 1851 that resistance could be expressed as a velocity. This may appear strange to modern engineers, but the approach he took to arrive at this conclusion was quite simple. By looking at the dimensions of quantities expressed in either electrostatic (esu) or electromagnetic (emu) units, he saw that they related to velocity. This approach is what we today call _dimensional analysis_. Specifically, resistance when expressed in emu units had the dimensions of velocity: [LT⁻¹]. Rudolf Kohlrausch got interested in the problem, and together with Weber he proceeded to measure this velocity indirectly. Their experiment itself was quite simple. A Leyden jar was charged and the amount of charge was quantified in esu units. Then the jar was discharged and the current was measured using a galvanometer. This measurement yielded the charge in emu units. A ratio of the two yielded the much-sought-after velocity, the same conversion factor that had appeared in Weber's equation. In 1856, they published their findings. The ratio was 310,740,000 metres/sec. For the first time, here was proof that electricity is not instantaneous. Its speed was limited and appeared to be a constant.

This was an exciting period for scientists. They were getting closer to answering the question about electricity. The number quoted by Kohlrausch and Weber was uncannily close to the speed of light. Englishman James Bradley had estimated the speed of light experimentally at 301,000,000 metres/sec way back in 1729. More recently, in 1849, French physicist Hippolyte Fizeau had measured it at 315,000,000 metres/sec. That anyone conceived that the speed of light could be a constant is in itself a remarkable achievement of science, let alone the task of measuring it. The work of Kohlrausch and Weber prompted German physicist Gustav Kirchhoff to comment in 1857 that electricity travelled at the speed of light. In subsequent decades, right up to the turn of the century, measurement of the speed of light became a prime concern. Many pursued the dimensional approach of Kohlrausch and Weber, except that they used better and more accurate apparatus. At least seven different methods of determining the speed of light are known. Others took a direct approach as suggested by Fizeau or Foucault.

James Clerk Maxwell turned out to be a key figure in these developments. He not only measured the speed of light but also laid down new theoretical foundations of understanding. Forget electricity for a while, he proposed, and let us look at light itself. Since the flow of electric current was at the same speed as light, did it not make sense to say that light was electromagnetic in nature? By now, the debate about the nature of light had been going on for two centuries, since the time of Descartes and Newton. Maxwell leaned towards the wave theory of light. By mathematically formulating the equations of electricity and magnetism, he proposed that light was an electromagnetic wave that propagated through a medium just as sound waves did. In addition, Maxwell's theory introduced a new force to complement Newton's gravitational force. It was the electromagnetic force that later explained why atoms are relatively stable.

Maxwell's theory of electromagnetic propagation stands to this day as a formidable pillar of modern science. That Maxwell should discover one of nature's deepest secrets is vindication of his simplicity, modesty, and love of nature. These attributes were matched by his penetrative thoughts and mathematical prowess. It is often said of mathematics that one doesn't really understand a formula or a derivation unless one has done it oneself. Maxwell subscribed to this philosophy. He was only fifteen when his work on conic sections was presented at the Edinburgh Royal Society. When he moved to Cambridge, his zest for experimentation was quite clear: he packed "his scraps of gelatins, gutta percha, and unannealed glass, his bits of magnetised steel, and other objects."

William Thomson credits Fleeming Jenkin with putting into engineering practice the absolute system of measurements proposed by Gauss and Weber. Inspired by these developments, the British Association Committee standardized the unit of resistance in 1862. The unit was named _ohm_. The ohm was then a fundamental unit, but today under the Système International d'Unités (SI) the only electromagnetic quantity that is fundamental is the _ampere_, the unit of current. Today the ohm is a derived unit. It may be said that the work of standardizing resistance partly helped in the understanding of electricity. Given that the speed of electricity was finite and that the equations of Gauss, Weber, and Maxwell agreed, this showed that electricity was in fact a movement of charged particles. In a letter of October 1861, C. J. Monro communicated his thoughts to Maxwell,

The coincidence between the observed velocity of light and your calculated velocity of a transverse vibration in your medium seems a brilliant result. But I must say I think a few such results are wanted before you can get people to think that, every time an electric current is produced, a little file of particles is squeezed along between rows of wheels. But the instances of bodily transfer of matter in the phenomena of galvanism look like it already, and I admit that the possibility of convincing the public is not the question.

No one really knew what exactly this "little file of particles" was. When the answer came towards the end of the century, it would shatter more than two thousand years of thought and herald the birth of a new science.

Since the time of the ancient Greeks, it had been supposed that the world was made up of small indivisible particles. Democritus in the fourth century BC gave such an entity a name: the atom. To Aristotle, this was rubbish. There was no such thing as an atom. The universe was made up of four elements—air, water, earth, and fire. Beyond the realm of the planets was a fifth element named quintessence. As chemistry progressed, it was clear that Aristotle had been wrong. To the chemists of the nineteenth century, atoms were represented physically as chemical elements. Atoms were the building blocks of nature. One could combine atoms to form molecules and compounds but one could not subdivide atoms. There was, however, a fundamental problem.

If atoms were indivisible, shouldn't all atoms be of the same nature? If there were so many different elements, shouldn't these differences arise due to different internal makeup of these elements? If this were the case, shouldn't atoms be divisible? Alternatively, if we consider atoms to be indivisible, why are there so many of them and how many more are still undiscovered? If there are so many different indivisible atoms, doesn't this imply that nature is complex at a fundamental level? Isn't this contrary to the simplicity we have come to expect of nature?

The answer came in 1897 from an English physicist experimenting at the University of Cambridge. J. J. Thomson was at the time Cavendish Professor of Experimental Physics. He was not particularly skilled as an experimenter, but "his talent lay instead in knowing at every moment what was the next problem to be attacked." It was under his leadership that the Cavendish Laboratory rose to great fame and recognition. The first problem Thomson attacked was the study of mysterious emissions inside cathode ray tubes. Cathode ray tubes had been in existence for more than half a century. At the start of the century, chemists had been inspired to pass electricity through water. Later, scientists got the idea of passing electricity through gas. The cathode ray tube was born from such a conception. Initial attempts did not throw up any significant findings. The turning point came when Johann Geissler in Germany and Hermann Sprengel, another German but settled in London, independently invented the mercury pump that could create a high degree of vacuum. All of a sudden, the cathode ray tube came into its own and revealed new secrets. In the years 1858-1859, Julius Plücker noticed a greenish glow near the cathode. Something seemed to come out of the cathode, travel through near vacuum, and reach the anode. Moreover, a magnetic field near the tube deflected the path taken by the "rays," suggesting that whatever was coming out of the cathode clearly had a negative charge. William Crookes of England reported similar results in 1878-1879. Neither offered a satisfactory explanation, and that was how things stood until Thomson took up the investigation in the 1890s.

What was needed was a quantitative approach and accurate measurements. Interpretation and explanation would follow in proper sequence. Thomson designed his apparatus such that in separate experiments a magnetic field or an electric field would deflect the rays. He would measure the deflections and, based on other known parameters, derive the mass-to-charge ratio. His method did not allow either mass or charge to be measured individually. For the moment, the ratio would have to suffice. In separate experiments, he filled the tube with various gases—air, hydrogen, or carbon dioxide. He changed the cathode too—aluminium or platinum. His experiments showed that the mass-to-charge ratio in all cases turned out to be about the same value. In fact, when compared against modern measurements, it was overestimated by a factor of two, but this hardly affects the conclusion. The conclusion that Thomson derived from his experiments is what startled the world of science,

we have in the cathode rays matter in a new state, a state in which the subdivision of matter is carried very much further than in the ordinary gaseous state: a state in which all matter—that is, matter derived from different sources such as hydrogen, oxygen, &c.—is of one and the same kind; this matter being the substance from which all the chemical elements are built up.

It was clear from Thomson's experiments that whatever was coming out of the cathode was in fact something so basic that it is found in all types of atoms. Physicists and chemists knew about ions, which are charged atoms or molecules. However, the new particle identified in Thomson's experiment was a thousand times lighter. It had to come from inside the atom. Some years earlier, when studying ions, George Stoney had coined a name for the fundamental unit of electric charge: _electron_. Now the term acquired a new meaning.

The discovery of the electron was an important milestone in the progress of science. The atom was divisible after all. Suddenly, differences among elements could be understood. The chemists could explain why certain elements combined in certain ways and why certain molecules were more stable than others. Maxwell had written about electromagnetic force and the negatively charged electron eventually explained it. Most importantly, the flow of electricity was really flow of electrons. Electrons were the charge carriers in electrical flow. In good conductors of electricity, some electrons were loosely bound to the metallic structure. These _free electrons_ moved with purpose in an electric field and this phenomenon explained electricity. The science of subatomic physics was born. In time, it would give rise to quantum physics. While new discoveries continue to happen and particles once thought fundamental turn out otherwise, the electron remains to this day stable and unchanged since its original discovery.

The electron finally explained electricity, but the inner nature of electromagnetism continued to be elusive. Which was the base principle—electricity or magnetism? While Gauss had answered the question about the location of earth's magnetism, a more important question had been asked long ago. What caused earth's magnetism? Ampère answered it using a thought experiment. Oersted had shown that current flow creates magnetism. Ampère started with the postulate that the earth's magnetic field is possibly due to circulating currents within the earth. Applying the same thought process, Ampère argued that a permanent magnet gets its magnetism from little circulating currents within the magnet. When not magnetized, the orientations of these currents were random and cancelled each other's effect. When magnetized, they aligned to give rise to magnetism. This view was in contrast to Coulomb's belief that magnetism arose from magnetic fluids.

Modern science agrees with Ampère. Earth's magnetism is due to currents in its molten core, currents that result from convection of the molten metal. Permanent magnets get their property due to electron spins and orbital motions. In his honour, we have named these _amperian currents_. But back in those days, Ampère's views were doubted. Faraday criticized them. A suggestion was no proof. A journey to the centre of the earth was fiction and no one had seen little currents inside magnets. Yet Ampère had put forward these proposals with conviction that came from the power of his thought experiments.

_Gedankenexperiment_, a term coined by Oersted and later popularized by physicist Ernst Mach, is German for _thought experiment_. Thought experiments can be traced to the time of Galileo Galilei in the seventeenth century. Galileo holds an enviable position in the history of science as the first to adopt new methods of scientific investigation. He dismissed Aristotelian methods, which seemed to throw up many contradictions. The wisdom of Aristotle had been handed down through the generations as gospel truth. For Aristotle, the world of experience gathered through our senses was inferior to logic and thought. Experience could be used to look at effects, but it could not lead you to the causes, or for that matter give a suitable explanation. By such philosophical thinking, Aristotle proposed that heavier objects fall faster than lighter objects. Weight was the cause and speed of motion was the effect. No one thought otherwise since everyone had seen an iron ball fall faster than a feather. Galileo, arriving at the scene almost two millennia later, questioned the logic of Aristotle. Something didn't seem right.

In his book _Two New Sciences_, Galileo visualizes an experiment in which a heavier ball is falling and midway through the fall a lighter ball is attached to the heavier one. Aristotelian logic says that the balls taken together are heavier, and would therefore fall faster. At the same time, it is known by common experience that attaching an object to another retards its speed of motion. This contradiction means that Aristotelian logic is flawed. From here it is concluded that the rate of fall is independent of an object's weight. This classic example is a suitable introduction to the idea of thought experiments. They are not real experiments but experiments in a limited sense, in that they are performed in the laboratory of the mind. They do not involve new observations or the study of new phenomena. Rather, they reorganize and recombine in new ways knowledge already gained from prior experience. Thus, thought experiments are not philosophical speculations in the Aristotelian style. They are in fact empiricism extended.

In Galileo's own time, the concept of experimentation was not yet born. Without performing any real experiment, Galileo concluded that all objects fall at the same speed given the same time. His conclusion was drawn from his prior experience and knowledge rearranged to give new knowledge. His method was philosophical and his book was written in the style of dramatic dialogues between three proxy-characters. The difference was that Galileo dressed his philosophy in mathematics with empirical support. If he performed experiments at a later date, it was only to demonstrate his knowledge to disbelievers. Therefore, unwittingly he started the scientific method of experimentation although his book did not quote a single experimental result. His experiments were only thought experiments. Yet he gave sufficient details for others to set up apparatus and conduct experiments.

In the history of scientific methods, two schools subsequently emerged from the failings of Aristotelian methods. One was rationalism, which stressed reasoning with the aid of mathematics as a way of proving hypotheses. Copernicus, Kepler, and Galileo were among the rationalists. Their work can be seen as mirroring the rationalist philosophers Descartes, Spinoza, and Leibniz. To these philosophers, knowledge was available to the mind a priori. Everything about the world could be constructed by reasoning and logic. The opposing school was empiricism, which stressed experimentation. Things in nature are not obvious, and a scientist must create circumstances to obtain new observations that might lead to new knowledge. In this school were Francis Bacon, William Gilbert, Robert Hooke, Henry Cavendish, and William Harvey. They too had their brethren-philosophers, chiefly Locke, Berkeley, and Hume. To them, the mind was incapable of a priori knowledge. All knowledge comes from the world around us via our senses. True to this philosophical stand, Francis Bacon articulated the need for controlled experiments. Rebecca Goldstein, a Harvard researcher, recently expressed Bacon's concerns rather poignantly,

Nature should be looked on as an uncooperative witness in a courtroom, who must be interrogated and even tortured in order that the information be extracted.

It is easy to see that the rationalists were congregated on the Continent while the empiricists lived across the English Channel. It is therefore not surprising that when the Royal Society of London was founded in 1660, the focus from the outset was on experimentation and observation. In this tradition arose Humphry Davy and Michael Faraday. While mathematics languished in England after Newton, Europe dominated the field all through the eighteenth and nineteenth centuries. It was therefore natural for European scientists to follow in their steps and adopt a mathematical approach to enquiry. Instituted in Paris in 1794, and later championed by Napoleon Bonaparte, the École Polytechnique attracted the best intellectuals of France. In its three-year curriculum, the focus for the first two years was on basic sciences and mathematics. In this mathematical tradition arose Laplace, Jean-Baptiste Biot, and Fourier. In hindsight, today we can rightly say that the world needed them both.

It was only towards the end of the nineteenth century that Mach articulated the concept of thought experiments. Mach reconciled empiricism with knowledge creation without requiring further experimental observations. History had shown that many before Mach's time had taken recourse to thought experiments—Galileo, Newton, Ampère, and Maxwell. There were limits to what could be learnt by experimentation alone. On the other hand, experimental results provided a vast store of knowledge, which could be rearranged to reveal something new. In time, modern science would make extensive use of thought experiments. Maxwell's Demon and Schrödinger's Cat are colourful creations of thought experiments. In formulating his theories of relativity, Einstein took thought experiments to a new level, so much so that Mach himself regretted that Einstein had gone beyond empiricism and ventured into ideas purely metaphysical.

If Mach's criticism of Einstein is valid, perhaps scientists had regressed into Aristotelian methods in which experiments had no place. On the contrary, Einstein had one powerful tool the Aristotelians did not possess—mathematics. The mathematics of the ancient Greeks had been limited to geometry. Mathematics in Einstein's time was much more sophisticated. If Einstein indulged in thought experiments, it was because mathematics gave him the confidence to do so. Einstein's method was not purely thought and reasoning, as the Aristotelians' was. For Newton, experience served well in his treatment of forces, motion, and gravity, even though he remained uncomfortable about his notion of absolute space. For Einstein, experience alone could not lead to relativity. There was no experiment he could have performed to arrive at relativity. He had to do it the other way—start with thought experiments, arrive at the simplest possible theory, and leave it to others to validate by experimentation. In a lecture delivered at Oxford in 1933, Einstein commented,

If, then, it is true that the axiomatic basis of theoretical physics cannot be extracted from experience but must be freely invented, can we ever hope to find the right way?... I answer without hesitation that there is, in my opinion, a right way, and that we are capable of finding it. Our experience hitherto justifies us in believing that nature is the realization of the simplest conceivable mathematical ideas. I am convinced that we can discover by means of purely mathematical constructions the concepts and the laws connecting them with each other, which furnish the key to the understanding of natural phenomena. Experience may suggest the appropriate mathematical concepts, but they most certainly cannot be deduced from it. Experience remains, of course, the sole criterion of the physical utility of a mathematical construction. But the creative principle resides in mathematics. In a certain sense, therefore, I hold it true that pure thought can grasp reality, as the ancients dreamed.

Mathematics allowed a confident Einstein to predict effects even before anyone had observed them in the real world. The famous experiment conducted by Arthur Eddington in 1919 on the island of Principe, off the coast of Africa, is a case in point. Through his _General Theory of Relativity_, Einstein had unified his own _Special Theory of Relativity_ and the gravity of Newton. One of the predictions of this theory was that intense gravitational fields would bend light, which otherwise travelled in straight lines. Of course, when the theory was first published in 1915 not many believed in it. A chance came in May 1919 when a solar eclipse was expected. With the moon standing directly between the sun and an earth-observer, stars near the sun's edge would become visible, and the sun's gravity should deflect their light by the predicted amount. The day arrived. The experiment was duly performed. The good news was conveyed to Einstein. Einstein did not jump for joy. All along, he had known that his theory was correct, and had there been a contradiction, he would have deemed the experiment faulty.

The power of mathematics to predict effects and uncover secrets has many remarkable examples. Irish mathematician William Hamilton was a child prodigy. As a boy, he never attended school yet he knew more than a dozen languages. At seventeen, even before entering Trinity College, Dublin, he made significant strides in the study of optics, to which he applied algebra. Unlike his predecessors who had analysed optical rays individually, Hamilton considered a multitude of light paths as a system of rays and reduced it to a single mathematical function. From this came his most spectacular discovery about the refraction of light through biaxial crystals such as topaz. He predicted that a single ray of light would be refracted to give an infinite number of rays arranged in a conical geometry. When Humphrey Lloyd experimentally verified this, Hamilton became an overnight celebrity within scientific circles. The methods that Hamilton employed would later prove useful to quantum theorists.

Sometimes mathematics contributes in seemingly trivial ways that are no more than an ordered arrangement of numbers with a fair bit of ingenuity thrown in. In 1869, when Russian chemist Dmitri Mendeleev arranged his table of chemical elements, he took the approach of organizing the elements by chemical _valence_, valence being a measure of the combining power of an element. All elements of the same valence had to be in the same column since they exhibited similar properties. To satisfy this criterion he sometimes placed elements of higher atomic weights before those of lower atomic weights, an idea that had not occurred to others. Moreover, he did not insist that the periods of rise and fall of valence should be fixed. When he had done this, he noticed gaps in the table. Rather than treating these gaps as failures of his theory, he remarked that at these positions there were yet undiscovered elements. True to his prediction, the elements gallium, scandium, and germanium were later discovered. This was just the beginning. Scottish chemist William Ramsay discovered the gas argon in 1894. It had a valence of zero, but no column in Mendeleev's table had this valence. However, if Mendeleev's approach was indeed correct, argon hinted that there would be more such elements with a valence of zero. Indeed, Ramsay went on to discover helium, neon, krypton, and xenon, all within the family of inert gases.

Mathematics is never parochial in scope. It gives power equally to both science and engineering. Modern mathematics received a much-needed boost in the hands of Isaac Newton and Gottfried Wilhelm Leibniz. Not only did these two men independently formulate calculus, they also had distinct approaches to analysing the physical world. Newton looked at the details, striving to explain the world in terms of forces and momentum. Leibniz analysed at the system level, studying such aspects as work and energy. This diversity in the application of mathematics also gave it an unusual power. Results by one method could be verified or questioned using the other method. In the experiments leading to the discovery of the electron, J. J. Thomson had in fact used both methods. Unfortunately for Thomson, he mistook the more accurate result for an anomaly.

While the eighteenth century saw great advances, it is the nineteenth century that we today recognize as the golden age of mathematics. From this period was born a concept so extraordinary that it would soon become the bread and butter of every engineer.



**The most useful** gift that mathematics has bequeathed to the world of engineering is perhaps the concept of orthogonality. Although orthogonality had been known to mathematicians much earlier, engineers picked it up seriously only in the nineteenth century. Pre-Renaissance engineering was based on a strict system of apprenticeship and craft guilds. Engineering then was not a science but more an artistic tradition. Traditional skills in metalworking, carpentry, or tannery that had evolved slowly over many centuries were passed from one generation to the next. Those who participated in these crafts lived at the level of subsistence. There was little scope for risk-taking or experimentation. A skilful practitioner's experience sometimes suggested small design or process changes. Those that worked survived. Those that didn't, perished, at times taking the craftsman with them. Without science or mathematical support for upfront analysis, no one could afford to make big design changes.

The Renaissance brought changes, first in scientific thinking and later in engineering practices. New inventions brought new business opportunities, which in turn encouraged investors to take risks. The medieval guild system slowly began to lose its power. New ideas entered mathematics. Mathematics entered physics and their fruitful union was called mathematical physics. From here, ideas percolated into engineering. From this point onwards, engineering became less an art and more a science. Design would no longer be trial and error. Engineers could formulate sophisticated rules of practice. Engineers could simulate, analyse, and predict performance even before building the first prototype. Engineers could compare alternatives. Engineers could execute complex projects under strict constraints of time and resources. Taken together, these changes enabled engineers to do more than just create. They could now innovate.

Orthogonality. The word suggests something complex, but at a basic level it is quite simple; simple things, however, are often hard to explain. If one is asked to give one's location on earth, three things are required—latitude, longitude, and altitude above sea level. It is not possible to substitute one for the other. For example, given latitude and longitude one could not derive the altitude. In this sense, all three are independent of one another and defined to be orthogonal. This system of three entities makes up earth's coordinate system and each entity represents a dimension of space. Since we know that space is three-dimensional—ignoring for the moment the deeper reality of Einstein's space-time continuum or the multidimensional ideas of String Theory—any spatial location in the universe can be represented with just three dimensions. These dimensions need not be based on latitude, longitude, and altitude, which are limited to earth; they could be anything we suitably define them to be, so long as they are orthogonal.

One way to understand orthogonality is through the physics of Newton. Two forces that are perpendicular to each other in direction are orthogonal. One force cannot be projected on to the other. In a tug of war, the rope moves only left or right, never up or down. A woman attempting to swim directly downstream will never be able to cross to the other bank. Her best chance of crossing quickly is to swim directly across. She may be washed downstream, but her strokes directly contribute to how fast she can get to the other bank. In this sense, the direction of the stream is orthogonal to the direction of crossing. The river will never help the woman to get across. The fact that she is a good swimmer is her only hope. Likewise, a block placed on an inclined plane may or may not slide down. The way to analyse this is to project the force of gravity into two orthogonal components, one parallel to the inclined plane and the other perpendicular to it. The perpendicular component presses the block against the plane and determines the frictional force; the block stays put only if this friction can match the pull of the parallel component. Orthogonality helps us to break down complex situations into simpler forms and arrive at solutions.
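
The decomposition is easy to try out. The sketch below, with an assumed mass, slope angle, and friction coefficient, projects gravity into the two orthogonal components and checks whether the block slides.

```python
import math

# Decomposing gravity on an inclined plane into orthogonal components.
# All numbers are illustrative assumptions, not from the text.
mass = 2.0                # kilograms
g = 9.81                  # m/s^2, acceleration due to gravity
angle = math.radians(30)  # incline angle
mu = 0.7                  # assumed coefficient of static friction

parallel = mass * g * math.sin(angle)       # pulls the block down the slope
perpendicular = mass * g * math.cos(angle)  # presses the block onto the plane
max_friction = mu * perpendicular           # friction available to resist sliding

print(f"Pull along slope   : {parallel:.2f} N")
print(f"Max static friction: {max_friction:.2f} N")
print("Block slides" if parallel > max_friction else "Block stays put")
```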

The above examples suggest that orthogonality was perhaps understood by the early natural philosophers based on their observations of the world around them. Any concept that remains at the level of accustomed use never rises to greater glory. It was therefore left to mathematicians to give a formal definition of orthogonality. René Descartes of seventeenth-century France is generally not regarded as a great mathematician but he made one important contribution. He brought algebra to geometry and gave rise to the field of analytic geometry. Although he did not articulate the idea of orthogonality, the roots of the idea are to be found here.

The ancient Greeks had almost perfected geometry through such greats as Euclid and Apollonius. The Western world had no knowledge of algebra until the twelfth century when it was introduced from the Arab world. What Descartes did was to bring the analytical power of algebra to the visualization of geometry. For example, in planar (two-dimensional) geometry, he carved up the plane into four quarters by drawing two lines perpendicular to each other. One of them is what we today call the _x-axis_, the other being the _y-axis_. Now a circle of radius r could be represented in algebra by an equation: x² + y² = r². Intersection of a line such as y = −2x and a parabola represented by a quadratic equation such as y = x² − 2x − 3 could be solved algebraically but also verified by geometry. What Descartes gave us was not just a representation of points on a plane but also the idea of relationship, meaning that a y-value can be written in terms of an x-value.
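
The algebra can be checked in a few lines. Setting −2x equal to x² − 2x − 3 gives x² = 3, so the curves meet at x = ±√3; the illustrative Python snippet below confirms that both equations give the same y-value at those points.

```python
import math

# Where does the line y = -2x meet the parabola y = x^2 - 2x - 3?
# Algebra gives x^2 = 3, so x = +sqrt(3) and x = -sqrt(3).
for x in (math.sqrt(3), -math.sqrt(3)):
    line = -2 * x
    parabola = x**2 - 2*x - 3
    print(f"x = {x:+.4f}: line y = {line:+.4f}, parabola y = {parabola:+.4f}")
```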

Descartes gave equations connecting the x and y variables but fell short of identifying independent and dependent variables. Had he only done that, he might have arrived at the modern notion of functions. Today, we say that y is a function of x and write y = f(x). Leibniz coined the word _function_ a few decades later and functional analysis itself came of age only in the eighteenth century. In any event, Descartes marked the starting point in a line of developments that led to functional analysis. Without Descartes, calculus might well have arrived a generation later than it did. His name is immortalized in such terms as Cartesian plane and Cartesian coordinates.

It will now be apparent that a line parallel to the x-axis implies that the y-value has no dependence on the x-value. Likewise, for a vertical line, the y-value does not influence the x-value. Thus, we may say that the x-coordinate is orthogonal to the y-coordinate. Both are necessary to mark a point in the Cartesian plane. The concept is easily extended to the third dimension, usually named the z-axis, to form a space coordinate system not unlike the latitude, longitude, and altitude of the earth coordinate system. Although surveyors in ancient Egypt (3200 BC) used a grid of squares to locate areas on land, there was no mathematical formulation of the kind Descartes effected.

It all seems trivial today but this remarkable development of Descartes had not occurred to anyone for two thousand years following the age of Greek supremacy in geometry. Later in life, Descartes attributed his success to long mornings spent in bed in peaceful contemplation. As a boy, he had been physically weak and had taken to the habit of getting up late. These morning meditations, if we may call them so, gave him ample occasions to philosophize and penetrate the deeper truths of reality. He never married, preferring to dedicate his life to philosophy and mathematics in which he found both peace and purpose.

However, Descartes was left with one problem that defied Cartesian representation. About a century earlier, European mathematicians had encountered a strange mathematical fact. They did not believe at first what they found and relegated it to the domain of fiction. While seeking solutions of cubic equations, they ended up with the need to take square roots of negative numbers. They had had trouble understanding negative numbers in the first place and now they were asked to perform non-trivial operations on them. Though this can happen even for quadratic equations, for some reason they either had not met them or had dismissed them lightly. These numbers are what we today call _complex numbers_ , that is, numbers that have in them a square root of minus one. Descartes declared that these numbers did not have a geometric representation. He called them _imaginary numbers_ , commonly represented today with the letter _i_.

The conceptual leap happened at this point in representing complex numbers not on the Cartesian plane but on a Cartesian-like plane. In 1806, Jean-Robert Argand was among the first to conceive of representing real numbers on the x-axis and imaginary numbers on the y-axis. With this move, geometry joined algebra in the realm of the abstract. A geometric plane need not have a physical interpretation. Since then, such representations have been called _Argand Diagrams_. Earlier, in 1797, the Norwegian Caspar Wessel had presented a similar idea to the Royal Danish Academy of Sciences, but his contribution was largely ignored until about a hundred years later. Wessel's paper articulated not just the complex plane representation but also how it could be used. In Wessel we find the beginnings of modern vector analysis.

A vector can be visualized as an arrow whose direction indicates the direction of a quantity and whose length indicates its magnitude. A billiard ball rolling towards a pocket has both speed and direction. Speed is just a number whereas velocity is a vector that incorporates both speed and direction. When two billiard balls collide, mathematical analysis considers the interaction of two vectors. Although Wessel did not treat this exact problem of colliding billiard balls, the principles he established provided a new mathematical treatment of old Newtonian mechanics. He used the geometry of a parallelogram to add two vectors to get the resultant vector. While his parallelogram law is now famous, underlying it is the concept of orthogonality. By separating each vector into its orthogonal components, adding them individually, and finally recomposing the final vector, one arrives at the same answer. In other words, orthogonal components add numerically without affecting other orthogonal components. William Hamilton articulated the same concept in the 1830s. He gave an algebraic definition of complex numbers (or vectors) by considering each complex number (_a_ + _ib_) as an ordered pair (_a_, _b_). From here, he went on to define addition of ordered pairs: (_a_, _b_) + (_c_, _d_) = (_a_ + _c_, _b_ + _d_), just as Wessel had done geometrically.
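
Hamilton's ordered-pair addition takes only a couple of lines of code. The sketch below, with assumed component values, shows each orthogonal component adding independently of the other, which is the parallelogram law in numbers.

```python
# Adding two vectors by their orthogonal components, as Wessel did
# geometrically and Hamilton did algebraically with ordered pairs.
def add_vectors(v, w):
    # Each component adds independently: (a, b) + (c, d) = (a + c, b + d)
    return (v[0] + w[0], v[1] + w[1])

# Illustrative vectors (assumed values):
v = (3.0, 1.0)
w = (1.0, 2.0)
print(add_vectors(v, w))  # (4.0, 3.0), the diagonal of the parallelogram
```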

The above examples are of the simplest kind, the vectors being two-dimensional. In these cases, it is evident that two vectors orthogonal to each other suffice to represent any arbitrary vector. Given a set of _M_ _N_-dimensional vectors, how are we to know if they are orthogonal to each other? If not, can we find or construct a set of orthogonal vectors in which all _M_ vectors can be represented? Effectively, what we are trying to do is to find a minimum set of vectors that can be used to represent any arbitrary vector. This is of extreme importance in engineering, and we have seen examples of this in Chappe telegraphy and Morse code.

To answer these questions, mathematicians first defined _orthonormal vectors_, these being orthogonal vectors whose magnitudes evaluate to unity. This normalization is only for mathematical convenience. Towards the end of the nineteenth century, arising from the work of two researchers, the _Gram-Schmidt procedure_ was formulated. This enabled engineers to construct a set of orthonormal vectors, now called an _orthonormal basis_ or _basis vectors_. In essence, the procedure subtracts from each vector its projections on the orthonormal vectors already identified; whatever remains, once normalized, is a new orthonormal vector, and the process continues until nothing new remains. To illustrate the idea of an orthonormal basis, let us take the example of four three-dimensional vectors:

v₁ = [ 1 0 0 ], v₂ = [ 0 1 0 ], v₃ = [ 0 0 1 ], v₄ = [ 3 2 0 ]

We may think of these dimensions as [ x y z ] of a Cartesian system or [ Latitude Longitude Altitude ] of earth coordinate system. It is easy to see that v₄ can be represented in terms of v₁ and v₂: v₄ = 3v₁ + 2v₂. This can be seen as v₄ being projected on to vectors v₁ and v₂. However, there is no way we can represent v₃ in terms of only v₁ and v₂. In this case, we state that the set { v₁, v₂, v₃ } forms an orthonormal basis. Any vector can be represented with this basis alone. From here arises the concept of _linear independence_. The vectors of the orthonormal basis are linearly independent in that none of them can be represented as a linear combination (simple addition with numerical factors) of the others in the set. The fact that v₄ is linearly dependent on this set means that it is not part of the orthonormal basis. What we have done successfully in this example is to arrive at a method of representing any vector with just three basis vectors. This has enormous practical importance in engineering, as we shall see in later chapters.
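
The Gram-Schmidt procedure itself is short enough to sketch in code. The following illustrative Python version runs on the four vectors above and discards v₄ as linearly dependent; the tolerance value is an arbitrary choice for the sketch.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-10):
    """A bare-bones sketch of the Gram-Schmidt procedure."""
    basis = []
    for v in vectors:
        # Subtract from v its projections on every basis vector found so far.
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # anything left over is a new orthogonal direction
            basis.append([wi / norm for wi in w])
    return basis

# The four vectors from the text; v4 = 3*v1 + 2*v2 adds nothing new.
vs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [3, 2, 0]]
print(gram_schmidt(vs))  # three basis vectors; v4 is linearly dependent
```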

As early as Ohm, the concept of abstraction had taken shape. Currents and voltages could be treated generically as signals. However, in those days electric current was primarily direct current. The power of signal analysis was therefore not necessary. Soon after, when Faraday and Henry independently discovered electromagnetic induction, alternating current was born. Induction and alternating current are intimately related. When a permanent magnet is plunged into the core of a coil of wires, current flows. When the magnet is retrieved, current flows in the opposite direction. Repeating this motion of the magnet indefinitely gives rise to a periodic current that is continuously changing direction. To complicate matters, capacitance and inductance in the circuit gave rise to complex relationships between current and voltage. The analysis of these relationships led to _phasors_, which are vector representations of signals. Phasor diagrams are an engineering application of Argand diagrams. Thus, any signal could be represented as a linear combination of basis signals, and this approach lent itself well to electrical circuit analysis.

The name phasor is due to the word _phase_, which is the angle of the signal with the x-axis. This representation is a transformation of the Cartesian form to the Polar form, which uses amplitude and phase (_r_, _θ_) rather than (_x_, _y_) for a signal. A transformation such as this can change the basis signals but cannot reduce the number of basis signals. To understand phasors, let us consider for a moment a circuit in which a capacitor is in series with a resistor. The capacitor acquires charge due to the flow of current and, as charge accumulates, voltage across the capacitor builds up. Thus, for a capacitor, voltage lags the current by 90 degrees. Engineers call this _phase lag_. This is unlike a resistor, for which voltage and current are always in phase. The power of phasors comes from the fact that circuit analysis and the circuit laws that had been invented for direct currents can be applied to alternating currents with only minor modifications. For example, Ohm's Law uses resistance R, but with alternating currents the equivalent term is _impedance_, represented as _Z_ = _R_ + _jX_, where _R_ is resistance as usual and _X_ is reactance due to capacitors and inductors in the circuit. Impedance expressed this way is mathematically a complex number, engineers preferring to use _j_ rather than _i_. A short sketch after the figure below puts this complex arithmetic to work.

Phasor Diagram of an RC Series Circuit

(a) A simple circuit with a resistor (R) in series with a capacitor (C). (b) Phasor representation of the RC circuit showing the application of the parallelogram law. We can see that the capacitor voltage VC lags the current I by 90 degrees.
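
Because impedance is just a complex number, the analysis can be mimicked with any language's complex arithmetic. The sketch below, with assumed values for R, C, and the source, applies Ohm's Law in its phasor form to the RC circuit of the figure.

```python
import cmath
import math

# Impedance of the RC series circuit as a complex number, Z = R + jX.
# Component values and source voltage are assumptions for illustration.
R = 1000.0           # resistance in ohms
C = 1e-6             # capacitance in farads
f = 50.0             # signal frequency in hertz
w = 2 * math.pi * f  # angular frequency

Z = complex(R, -1 / (w * C))  # capacitive reactance X = -1/(wC)
V = 230.0                     # source voltage, taken as the phase reference

I = V / Z  # Ohm's Law carries over unchanged to phasors
print(f"|Z| = {abs(Z):.0f} ohm, current = {abs(I) * 1000:.1f} mA, "
      f"phase = {math.degrees(cmath.phase(I)):.1f} degrees")
```

The positive phase printed for the current says the current leads the source voltage, which is just the capacitor's 90-degree lag seen from the other side.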

Let us think of current (I) as a sinusoidal wave. Thus, I = Iₘ·cos(ωt + θ), where Iₘ is the maximum value and ω is the angular frequency, which represents the rate at which the current is varying. If voltage (V) is lagging the current by 90 degrees, this means that V = Vₘ·sin(ωt + θ). Functions sine and cosine are the well-known trigonometric functions. In the phasor diagram, sine appears as a 90-degree rotation of cosine. This agrees well with the mathematical understanding that the imaginary part is a similar rotation of the real part. In fact, this rotation is at the heart of the proper definition of a complex number. Mathematicians claim that the use of the word _imaginary_ is an unfortunate historical mistake of early misunderstanding. This is something they have had to live with forever.

Although the theory of phasors developed only towards the end of the nineteenth century, the final beauty that crowns it was figured out by Leonhard Euler, one of the giants of eighteenth-century mathematics. Euler gave a beautiful relationship between what is now the Polar form of a phasor and its trigonometric representation. A phasor (P) can be written as P = r·e^(jθ) = r(cos θ + j·sin θ). Here we clearly see the orthogonal form as a complex number. The trigonometric functions sine and cosine are in fact orthogonal to each other over an entire period of oscillation. It therefore appears possible to form an orthonormal basis of signals based on only trigonometric functions.

Orthogonality of Sine and Cosine Waves

Consider the curve sin(x)·cos(x). Positive areas under the curve (A & C) offset the negative areas (B & D). Hence, sine and cosine waves over one period are orthogonal to each other.
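
This offsetting of areas is easy to verify numerically. The short sketch below approximates the integral of sin(x)·cos(x) over one period with a Riemann sum and finds it vanishing, just as the figure suggests.

```python
import math

# A crude numerical check that sine and cosine are orthogonal over one
# period: the integral of sin(x)*cos(x) from 0 to 2*pi should vanish.
N = 100000
dx = 2 * math.pi / N
integral = sum(math.sin(i * dx) * math.cos(i * dx) * dx for i in range(N))
print(f"Integral of sin*cos over one period: {integral:.6f}")  # ~ 0.000000
```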

It so happens that long before electricity was born and phasors roamed the earth, someone had already thought about this. Sometime during the early part of the eighteenth century, mathematicians started to study the phenomena of vibrating strings, propagation of sound, and the harmony in music. By then, the calculus of Newton and Leibniz of the previous century had advanced sufficiently to be applied to diverse areas of physics. Jean Le Rond d'Alembert in 1747 not only came up with an equation for the vibrating string (such as that of a violin or cello) but also gave a general solution for it. In arriving at his general solution, he imposed certain restrictions on the initial displacement of the string. Euler welcomed this solution but disagreed with D'Alembert, saying that the restrictions were not necessary.

Meanwhile by 1742, Swiss mathematician Daniel Bernoulli had come up with his own solution. It was natural for Daniel Bernoulli to attack the most difficult problems of his time. He was after all a member of the great Bernoulli family. His father was a mathematician and so were his two uncles. His two brothers were mathematicians and so were his two nephews. His father tried to make a businessman out of him but Daniel Bernoulli instead chose to become a physician, until the instinctive call of mathematics put him on the right path.

Unlike Euler and D'Alembert, Daniel Bernoulli adopted a physical approach to the problem. He noticed that when he vibrated an elastic cord he could hear both simple tones and dissonant overtones. In the case of vibrating strings, the overtones were perceived less easily as they were consonant with the fundamental tone. Effectively, the main tone was accompanied by its harmonics, whose frequencies are integral multiples of the main one. Though all these different tones are produced at the same time, they do not interfere with one another, each one retaining its independence and nature. There was no wave equation or mathematical derivation in Bernoulli's method. His own heuristic reasoning, combined with similar work published by Joseph Sauveur in 1701, prompted him to suggest that a multitude of sinusoidal waves, all of them harmonics of the fundamental frequency, can be superimposed to describe the motion of the vibrating string. Bernoulli may also have realized that each sinusoidal component is orthogonal to the others. Harmonics are always orthogonal, just as sine and cosine of a given frequency are orthogonal to each other.

Bernoulli's solution posed a problem for both Euler and D'Alembert. Later, when the French mathematician Joseph Lagrange entered the debate with his own opinions, it only complicated matters. The debate raged for many years and there was really no resolution even towards the end of that century. Lagrange denied that Bernoulli could have heard harmonics from the elastic cord. Harmonics, he said, must have come from resonance with surrounding objects. Thus, there could be no physical reality to Bernoulli's solution, although he agreed that trigonometric series could be an approximate representation of continuous functions. While Euler partially agreed with Bernoulli's superposition of harmonics, he stated that an infinite number of sinusoids could not have a physical reality. To D'Alembert, multiple harmonics from the same string were sheer nonsense, and he preferred to enjoy the music without getting into its mathematics. For him this was purely a mathematical problem, and the physical reality did not trouble him. However, Euler and D'Alembert disagreed between themselves on certain restrictions to the problem at hand.

This debate amongst the greatest mathematicians of the eighteenth century highlighted many loopholes in mathematics. Calculus as Newton and Leibniz had left it was not rigorous enough. New applications of calculus threw up new unforeseen problems. The tools that mathematicians had in their hands were themselves open to suspicion. Mathematics had to question itself repeatedly and reinvent itself at times. What exactly was a function? When can we claim a function is continuous? The result of such soul-searching was that mathematics progressed. Definitions were refined. A new level of rigour was introduced. Loopholes were not simply plugged. Rather, entire structures were sometimes torn down and rebuilt to form a solid defence. Much of this work happened in the nineteenth century.

However, at the start of the nineteenth century, the validity of representing a function as a superposition of sinusoids was still an open question. Building on earlier results on the problem of vibrating strings, French mathematician Joseph Fourier took up the challenge. Born to a tailor, orphaned at eight, and educated at an institution run by Benedictine monks, Fourier was destined for priesthood. But men of talent and determination make their own destiny, and Fourier eventually found God's calling not in the clergy but in mathematics. His first publication on the matter of vibrating strings appeared in 1798. Fourier agreed with Bernoulli on the superposition principle and the physical reality of harmonics. No significant contribution appeared during the period 1798-1801, when Fourier was kept busy in Egypt by Napoleon's desire to institute a cultural reformation of the "uncivilized" Egyptians. Nothing much came of this except that, years later, Fourier published his research on Egyptian history, thus becoming one of history's few men to contribute to both mathematics and the humanities.

A few years after his return from Egypt, Fourier commenced work towards understanding heat propagation, which at the time was a great, unsolved problem. By then, many facts about heat had been established. Unlike vibrating strings, heat was intangible, but it could be sensed from temperature differences. Newton had figured out long ago that the rate of cooling was proportional to the temperature difference. In the 1760s, Joseph Black defined latent heat, the heat a body absorbs without a change in temperature: when ice melted into water, the water "absorbed" heat, and this is termed latent heat. Black also defined specific heat, the heat required to change the temperature of a given mass of a body by one degree. Fourier's approach naturally used these discoveries.

The work done by Jean-Baptiste Biot prior to Fourier was in the Newtonian tradition, which treated point masses at different temperatures within the solid. These point masses interacted with one another at a distance. Fourier initially followed this tradition but quickly realized the complexity of the enterprise. He later began to represent a solid as a composition of thin slices whose surfaces were at slightly different temperatures. Applying the principle of energy conservation to such a thin slice, he arrived at the now famous _heat equation_. He also considered the loss of heat to the surroundings, thus formalizing the study of _boundary-value problems_.

Fourier's solution of the heat equation was a function that gave the temperature at a future time, at any point within the solid under consideration. The problem itself is complex but Fourier solved it by representing the initial conditions, and hence the solution, as a superposition of harmonics. In general, any arbitrary function f(x) denoting the initial conditions could be written as,

f(x) = a0 + a1·cos(x) + a2·cos(2x) + ... + b1·sin(x) + b2·sin(2x) + ...

The coefficients of this expression could be determined using the property of orthogonality. For example, a1 = (1/π)·∫f(x)·cos(x) dx, the integral taken over one period, since all the other terms, being orthogonal to cos(x), integrate to zero. Thus, a periodic function could be expressed in terms of trigonometric functions, which are by nature periodic. Fourier makes clear his belief in this solution and its similarity to vibrating strings,

If the order that takes place in these phenomena [of heat propagation] could be seized by our senses, it would cause us an impression comparable to the harmonic resonances.
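Fourier's orthogonality argument is easy to verify numerically. Here is a minimal sketch in Python; the test function and its coefficients are arbitrary illustrative choices, not anything from Fourier's text.

```python
# A minimal sketch verifying the orthogonality trick numerically:
# recover the coefficient a1 of f(x) = 2*cos(x) + 0.5*sin(3x) by
# integrating f(x)*cos(x) over one period.

import numpy as np

x = np.linspace(-np.pi, np.pi, 100_000)
f = 2 * np.cos(x) + 0.5 * np.sin(3 * x)

# All terms other than 2*cos(x) integrate to zero against cos(x),
# so the integral picks out a1 alone.
a1 = np.trapz(f * np.cos(x), x) / np.pi

print(f"a1 ~ {a1:.4f}")   # ~2.0, the coefficient we put in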

In 1807, Fourier submitted his thesis for consideration to the Institut de France. Among the reviewers were Laplace and Lagrange. The old debates on vibrating strings, still fresh in old Lagrange's mind, meant that Fourier's representation using infinite trigonometric series was not accepted. Since trigonometric functions are continuous, it seemed illogical to represent discontinuous initial conditions using Fourier's suggested expansion. In the problem of heat conduction, discontinuity was common: two dissimilar metals in contact created exactly such a discontinuity. With an infinite number of terms, even the convergence of the Fourier expansion was not certain. To arrive at coefficients such as a1 and b1, the necessary integral could very well diverge. The paper went unpublished until, many years later, Fourier took the trouble to publish it independently in 1822. It became an immediate classic, not because everyone agreed with it but because it was found to be practical for engineering applications.

Here is one of the fundamental differences between mathematics and mathematical physics, a difference even more pronounced between science and engineering. Mathematics as pursued by Lagrange, Laplace, and Euler had to be exact. Definitions and methods had to be perfect from every angle. Mathematics could not rely on anything external to itself. Nature cannot contribute to its understanding. Even in the abstract, mathematics is more real to a mathematician than all the realities of physics. This is the reason pure mathematicians never use thought experiments. Mathematics is built up from basic axioms and deduced thereafter from the simple to the complex. Mathematics is the rationalist's approach to knowledge. Though a self-contained system such as this would be partial and have its limitations—a debate waiting to happen in the twentieth century—there was no suitable alternative in the eighteenth century.

Fourier claimed that his trigonometric expansion was valid for any function. The nature of heat conduction was found to fit this claim. Fourier did not set out to prove his claims rigorously. He gave particular proofs for a number of cases experienced in the world of physics. Fourier in this sense was a practical mathematical physicist. Likewise, engineers do not look for exact solutions, which in many cases are neither practical nor warranted. Engineers look for approximations within suitable bounds.

For a scientist, the resistance of a resistor may be taken as 5 ohms. He uses this to make an exact analysis of current flow and power dissipation. For an engineer, the same resistor is 5 ± 0.05 ohms. An engineer addresses real-world problems. He recognizes that it is impossible to manufacture a resistor or measure its resistance exactly. There is always a window of uncertainty. This uncertainty is factored into his design. This idea of uncertainty and tolerance is essential to engineering practice, since parts can be replaced or interchanged so long as they are within a prescribed zone of tolerance. Loose tolerances for parts generally lead to poor quality or demand better system design. Very tight tolerances increase cost. Engineering precision is all about making devices within set tolerance limits.
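To make the idea concrete, here is a minimal sketch of worst-case tolerance analysis in Python, applied to a hypothetical voltage divider built from two such 5 ± 0.05 ohm resistors; the circuit is an illustrative assumption, not an example from the text.

```python
# A minimal sketch of worst-case tolerance analysis for a voltage
# divider made of two nominal 5-ohm resistors rated at +/-1%.
# The component values are illustrative choices.

def divider_ratio(r1, r2):
    """Fraction of the input voltage appearing across r2."""
    return r2 / (r1 + r2)

nominal = 5.0          # ohms
tol = 0.01             # +/-1 percent, i.e. 5 +/- 0.05 ohms

lo, hi = nominal * (1 - tol), nominal * (1 + tol)

# Worst cases occur when the two parts sit at opposite extremes.
worst_low = divider_ratio(hi, lo)
worst_high = divider_ratio(lo, hi)

print(f"nominal ratio : {divider_ratio(nominal, nominal):.4f}")
print(f"worst case    : {worst_low:.4f} to {worst_high:.4f}")
```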

For decades, mathematicians struggled to formalize the work of Fourier. In the process, mathematics itself underwent a transformation. The _Fourier Series_ as a summation of trigonometric terms was applied in engineering, but doubts remained. Since the series contains an infinite number of terms, will there be cases when it does not converge? In the first place, are there interesting functions that cannot be represented as a Fourier series? Counter-examples were proposed of functions that resisted Fourier series expansion. Ideas of convergence, limits, and continuity were revisited. Greater rigour was brought into calculus. Overall, this was good for mathematics, and it did not rob engineering of its most useful mathematical tool.

Fourier series led to _Generalized Fourier Series_ expansions, whereby the orthogonal basis need not be trigonometric functions. Any set of orthogonal functions could be used. From Fourier series came the idea of _Fourier Transforms_. These are extensions of the same concept from the discrete to the continuous. Summation is replaced with integration. Trigonometric functions are replaced with the exponential form due to Euler. While Fourier series applied to periodic functions, Fourier transforms extended the concept to non-periodic functions as well. Together, their use came to be called _Fourier Analysis_. Sometimes engineers referred to this as _spectral analysis_ or _harmonic analysis_, since the analysis essentially transformed a signal in the time domain to a signal in the frequency domain given by the many harmonics. This meant that signals that are hard to analyse in the time domain can be analysed more easily in the frequency domain.

Making of a Square Wave via Fourier Series

A few sine terms result in an approximate square wave. By using more terms in the series, we obtain a better approximation. Each term is orthogonal to another. (a) Use of two terms: sin(x) + sin(3x)/3. (b) Use of five terms: sin(x) + sin(3x)/3 + sin(5x)/5 + sin(7x)/7 + sin(9x)/9. (c) Use of twenty terms.
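The synthesis shown in the figure is easy to reproduce. A minimal sketch in Python, summing the same odd harmonics:

```python
# A minimal sketch of the square-wave synthesis in the figure above:
# partial sums of sin(x) + sin(3x)/3 + sin(5x)/5 + ...

import numpy as np

x = np.linspace(0, 2 * np.pi, 1000)

def square_wave_partial_sum(x, n_terms):
    """Sum the first n_terms odd harmonics of the series."""
    total = np.zeros_like(x)
    for i in range(n_terms):
        k = 2 * i + 1                  # odd harmonics: 1, 3, 5, ...
        total += np.sin(k * x) / k
    return total

for n in (2, 5, 20):                   # the three panels in the figure
    approx = square_wave_partial_sum(x, n)
    # The partial sums overshoot near the jumps (the Gibbs
    # phenomenon) but flatten elsewhere towards a square wave of
    # height pi/4 (the caption omits the usual 4/pi scaling).
    print(n, "terms -> peak value", approx.max().round(3))
```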

Applications of Fourier analysis in engineering are diverse. Its use in solving partial differential equations relating to vibrating strings or heat conduction is of historic importance. In the domain of image processing, it can be used to remove noise, which may appear in the frequency domain as isolated spikes. Image processing is an excellent example to demonstrate that Fourier analysis is not limited to time-varying signals. It can just as well be applied to signals defined in space. In chemical engineering, Fourier analysis can be used to study the diffusion of solutes in liquids. The study of water flow through porous media, relevant to groundwater basins, is one more example where Fourier analysis comes to engineering's aid. Many natural phenomena are periodic and Fourier series suits them perfectly—recurrence of sunspots, phases of the moon, or frequency of brain waves at various stages of sleep. When William Thomson analysed the behaviour of electrical signals in long transmission cables in the 1850s, he adopted the approach suggested by Fourier. He could do this because his analysis yielded a differential equation that was similar to Fourier's heat equation, the difference being that temperature was replaced with electric potential along the wire.

Most people today are familiar with Magnetic Resonance Imaging (MRI) due to its extensive use in modern medical diagnosis. MRI has become popular because it is non-invasive, non-radioactive, and gives excellent images. MRI is a good example of the effective use of many scientific principles in the practice of engineering. The launch of subatomic physics with J. J. Thomson's discovery of the electron led to later discoveries of the proton and the neutron inside the atomic nucleus. Protons carry positive charge and they too spin. A hydrogen atom has a single proton within its nucleus. Hydrogen is also abundantly found in the human body. Knowledge of Oersted's electromagnetism and amperian currents was applied to MRI technology. Fourier analysis assisted engineers in applying the right electromagnetic pulses that can be effectively absorbed by the hydrogen nucleus. It also helped in imaging the human body when the absorbed energy was released by the nucleus.

For MRI to work, the main magnetic field should be about 40,000 times greater than the earth's magnetic field. Such high magnetism requires high currents. Following Ohm's Law, engineers achieved these currents by reducing resistance to extremely low levels. Called _superconductivity_, the phenomenon had been discovered years earlier, in the 1910s. The technique for reducing electrical resistance was to cool the electric coils to temperatures as low as 4 kelvin, something that can be done using liquid helium. The technique of reaching such low temperatures is in itself an achievement of engineering. Finally, modern computers are employed to control the magnetic fields, trigger electromagnetic pulses, perform detection, and process the images. Many image slices are generated so that the radiologist has a three-dimensional view of the human body. In addition, he can manipulate these images to enhance certain features to aid in diagnosis. MRI is a complex technology that has evolved over many decades. It is an engineering application of a series of discoveries of natural phenomena.

Electrical engineers were at one time called electricians. It was only when science elevated engineering that they truly became engineers. Despite their many contributions that make this world a better place, despite their many gifts of technology that improve lives, engineers are less well known in the eyes of the public. Most of us can probably name a few scientists but will be tongue-tied when asked to name some engineers. Possibly one may name Scott Adams's Dilbert, the only engineer in popular public view and open to satire. We all know Einstein as a great scientist, but how many of us know that he was briefly an engineer, who in 1930 obtained a patent for a new type of refrigerator? The popular conception that engineers fix things persists. Engineers are mostly seen as glorified mechanics. Worse still, when things go wrong, engineers are often at the receiving end of criticism. A classic quote from 1967 summarizes society's take on the engineering profession,

Every rocket-firing that is successful is hailed as a scientific achievement; every one that isn't is regarded as an engineering failure.

While society places value on doctors and teachers, engineers are seen in a less personal way since they work mostly with machines and not with people. Yet a single failure on a commercial aircraft can cause loss of many lives. This is the great responsibility that an engineer carries. A scientist may predict that an asteroid is likely to hit the earth six months from now, but it is an engineer who is entrusted with the difficult task of taking measures to prevent a disaster. An engineer has a moral obligation to society while a scientist is free from it. When Lise Meitner discovered nuclear fission in 1938, she had been wearing a scientist's hat. When America asked her to join the Manhattan Project to assist in the making of the atomic bomb, she felt uncomfortable. Her morals came into play when she was asked to make the transition from scientist to engineer.

Nonetheless, engineers soldier on unperturbed by society's partiality. Engineers rarely crave recognition or honours from the public; but within scientific circles disputes over priority of invention are common. The Bell-Gray controversy and the Henry-Morse controversy are only two examples. Engineers are great problem solvers and problems are their primary motivation. They are on the constant lookout for improvements. By the end of the nineteenth century, electrical engineers had solved many problems. They had incorporated new developments of mathematics and science into engineering practice. They understood electricity much better than at the start of the century. They knew a great deal about currents, voltages, phasors, and techniques of signal analysis. Yet all this while, they had conveniently overlooked a fundamental fact of nature that would soon put their priorities into perspective.

#  0011 Appreciating Noise

**Nature's rich diversity** often hides an underlying similarity. No one suspected in the nineteenth century that something as simple as the electron would be a common feature in all types of matter, or that common to all living organisms is the biological cell. For that matter, at the start of the twentieth century, when the existence of the atom itself was still hotly denied by those who believed only what they could see, no one would have dared to claim that there was something common between botany and electricity. No two fields of science could perhaps be so different. Yet when one of the greatest debates in modern science was finally put to rest, the answer would directly benefit electrical engineers. The origins of that answer lay in botany.

In the summer of 1827, British botanist Robert Brown was peering through his microscope to observe the behaviour of some pollen he had immersed in water. Ever since the microscope was invented in the seventeenth century, and Dutchman Antonie van Leeuwenhoek had made numerous discoveries with it, a whole new world had been opened up to science. This microscopic view of the world revealed living organisms and structures more spectacular than anyone could ever have imagined. The microscope did to biology what the telescope did to Chappe telegraphy. It gave biology a new direction, a new line of interpretation.

What Brown saw at first did not surprise him. He noticed that the pollen moved erratically within the fluid in which it was suspended. Pollen is organic matter, directly responsible for fertilization in flowers. It was not unnatural to see pollen particles move in a lifelike manner. Researchers in the past had noted that such movement could be due to currents in the fluid or to evaporation. Brown convinced himself that the movements he observed were due not to the surrounding fluid but to the particles themselves. Brown called these particles molecules, a word that to most biologists suggested life.

In the true spirit of a botanist, he next considered pollen from different types of plants taken at different stages of development. In all cases the effect remained, modified only in intensity and form. Brown's description of his experimental observations, written in the first person, leads the reader through his process of questioning, discovery, and interpretation. One gets the sense of being in the laboratory along with Brown and observing in vivid terms the motions of pollen—swimming, oscillating, moving, contracting, swelling, or turning. One gets the feeling of being on the verge of a new discovery—perhaps a new species, a new fundamental form of life, or a window into the very origin of life on earth. It is easy to mistake such vivid description for an indication of life itself. Nothing would suggest otherwise, and certainly none of Brown's predecessors who had occasionally seen such movement doubted that it was a manifestation of life. It was at this point that Brown made the decisive move.

It is not clear what prompted Brown to consider replacing live pollen with pollen specimens of plants that had been long dead. His subsequent experiments convinced him that even dead pollen exhibited the same behaviour. This was a new discovery. Why should dead matter look alive under the microscope? He therefore proceeded to experiment on inorganic matter he knew for certain did not contain life—numerous types of rock, window-glass, volcanic ash, and even a fragment from the Sphinx. He then tried burning bits of cotton, wool, silk, or hair, extinguishing them before they charred and subjecting them to the same examination. Using drops of oil mixed with water, he eliminated the usual physical causes of circulating currents and evaporation. Thus, in the true spirit of scientific enquiry, Brown left nothing to chance, so that every doubt was disposed of with experimental observation.

What had started as an investigation into the reproductive mechanisms of plants now seemed outside the scope of botany itself. Brown took the view that the particles wriggled and moved not because they were alive but because of some fundamental property of the medium in which they were suspended. His publications give us no clue as to what might be causing the zigzag motions. Standing on unfamiliar ground, he was clearly not willing to speculate. Many years later, in his autobiographical account, the now famous Charles Darwin recalled his association with Brown, including his visits of 1831,

He seemed to me to be chiefly remarkable for the minuteness of his observations and their perfect accuracy. His knowledge was extraordinarily great, and much died with him, owing to his excessive fear of ever making a mistake. He poured out his knowledge to me in the most unreserved manner, yet was strangely jealous on some points. I called on him two or three times before the voyage of the _Beagle_ , and on one occasion he asked me to look through a microscope and describe what I saw. This I did, and believe now that it was the marvellous currents of protoplasm in some vegetable cell. I then asked him what I had seen; but he answered me, "That is my little secret."

The greater secret, however, lay with nature. Once Brown had established that such vivid movements were not due to life forms, botanists gave up the problem to the physicists; but physicists of the period had their hands full. Ohm's Law had just been formulated and electromagnetism was a new and exciting science. No one had time to look into erratic movements of microscopic particles in fluids. Therefore, Brown's discovery went neglected for nearly three decades until someone noticed a link in a new theory. Strangely, this theory had nothing to do with either microscopes or particles suspended in liquids.

This was the new and developing science of thermodynamics. Thermodynamics deals with the relationship between energy and matter. Through thermodynamics one attempts to understand why matter is generally stable, and if it changes state, what conditions trigger such changes. Why does water become steam at a certain temperature? When heated and allowed to expand, why do gases expand so much more than solids or liquids? What is the relationship between pressure, volume, and temperature? These are the questions that thermodynamics attempts to answer. The challenging aspect of this new science was that heat, work, energy, pressure, volume, and temperature were all familiar terms known even to the layperson. Thermodynamics had the tough task of rethinking familiar concepts from a scientific perspective and formulating theories that could explain as well as transcend everyday experiences.

Thermodynamics is one science that started with a basic scientific understanding but was soon overtaken by industrial application. The early scientific principles were due to Robert Boyle, Jacques Charles, and Joseph Gay-Lussac. Together they established the fundamental relationship between pressure, volume, and temperature of gaseous matter. The first engineering application of this understanding was due to Denis Papin, who invented the pressure cooker in 1679. Other applications proceeded in parallel with science, the most significant being the invention of the steam engine. The early decades of the eighteenth century saw the commercial success of Thomas Newcomen's engines. These were soon surpassed by those of James Watt, engines of much higher efficiency. Steam power thus became one of the cornerstones of the Industrial Revolution. Engineers began to ask if one could design engines of higher efficiency. They began to investigate the fundamental limits, if any, for the conversion of thermal energy to useful mechanical work. The old laws of Boyle, Charles, and Gay-Lussac, though still valid, had developed rather slowly. For all practical purposes, they seemed to have hit a dead end.

Indeed, thermodynamics might have died then and there had it not been for Frenchman Sadi Carnot, who not only gave it a new lease of life but also redefined it in the modern sense. Rightly called the father of thermodynamics today, Carnot was a student of such eminent professors as Joseph Gay-Lussac, Siméon Poisson, and André-Marie Ampère. Unlike his contemporary Fourier, who gave a mathematical basis for heat propagation, Carnot sought a physical explanation of the workings of steam engines. Common approaches prevalent at the time were to increase pressure or use a different working fluid. The first thing Carnot did was to set aside the complications of real-world steam engines. He imagined an idealized engine and argued in abstraction. In doing so, he could safely ignore imperfections of engineering construction, loss due to friction, or even the type of working fluid.

What Carnot showed in 1824 was that efficiency depended solely on the temperatures of the two reservoirs, one hot and one cold, between which the engine operated. No other factor influences efficiency, and no other type of engine could achieve a higher efficiency. In presenting his arguments, Carnot departed from the prevalent _caloric theory of heat_ in one respect: work was not due to the loss of caloric, a subtle substance supposed to cause heat, but due to the transfer of caloric from hot to cold bodies. The decisive blow to the caloric theory came in the 1840s when James Prescott Joule proved from experimental work that heat was equivalent to work. They were simply different forms of energy. At first, the idea was contrary to prevailing thinking, but by the end of the decade many came to accept it. Caloric theory was soon relegated to the annals of history. Heat was not a fluid as previously thought. It was a form of energy.

Joule, not widely recognized for his work when it was first published, had adopted approaches to experimentation and deduction so novel that they baffled even the experts for years. Based in Manchester, quite removed from the scientific circles of London, Joule worked in an isolation that might have contributed to his unconventional methods. What separated Joule from the crowd of physicists was that he came from the background of electrochemistry. To this, he added his own conviction that chemical, thermal, mechanical, and electrical energies were all equivalent. In a series of experiments, using no more than an electrolysis cell, a voltaic cell, a galvanometer, a mercury thermometer, and a rotating coil carrying magnetically induced current, he measured currents and temperatures to derive accurate conversion factors and energy losses. His results are surprisingly accurate even by today's standards. It is therefore fitting that the SI unit of energy, the _joule_, is today named in his honour.

From here was born the _First Law of Thermodynamics_, which states that energy is conserved. It is neither created nor destroyed but merely transferred from one form to another. This may be obvious to us in hindsight, but back then even Carnot had believed in conservation of heat, not of energy. With the dispelling of such long-standing myths, thermodynamics was now set on a road of development with far-reaching implications. The science of the mid-nineteenth century would belong to thermodynamics. Central to its early development were many who can be grouped into two categories—the British, including William Thomson (Lord Kelvin) and James Clerk Maxwell; and the Germans, including Rudolf Clausius, Julius Robert Mayer, and Hermann von Helmholtz. All were indebted to the early work of Carnot; but the children of French Carnot, being half-British and half-German, squabbled over priority.

It is surprising to modern-day scientists and engineers that Helmholtz, Maxwell, and Thomson made major contributions in electromagnetism as well as in thermodynamics. True genius is rarely limited in scope or application. Motivation for a scientist lies in understanding nature, and the domain of work is of little importance. Thomson, at first sceptical of Joule's ideas, brought an open mind to his own enquiry. In time, he not only accepted Joule's work but also collaborated with him. While Joule experimented, Thomson supplied the theoretical foundations and suggested new experiments.

Classical thermodynamics as laid down by Clausius and Thomson looked at macroscopic properties of a gas and the relationships among them. Some scientists started to ask what caused these properties. Daniel Bernoulli, whom we encountered earlier in relation to vibrating strings, had given one of the first explanations in 1738. He had argued that a gas was composed of little particles in random motion. When they collide against the walls of the container, the macroscopic effect is what we call pressure. This explanation, long ignored, found favour more than a century later. Clausius agreed that molecules in motion caused pressure, but he went further to say that temperature was a reflection of the average kinetic energy of the molecules. Thus, macroscopic properties were explained at a fundamental level based on microscopic molecules. This was the beginning of the _kinetic theory of heat_.

When a puff of cigarette smoke rises in the air, it diffuses slowly and in no particular direction. When an odour is released at one corner of a closed room, it takes some time for it to be sensed at the opposite corner. Clausius was puzzled by this delay. He knew that molecules travel at incredible speeds. He reasoned that molecules moving randomly through the room collide with one another. The path is never direct, always zigzag. He then went on to calculate the _mean free path_, that is, the mean distance a molecule travels without experiencing a collision. That this is a small value explained the diffusion delay. These developments of the 1850s were exciting on their own, but no one yet noticed the similarity with the microscopic observations of Robert Brown some thirty years earlier. If only the founders of thermodynamics had taken note of Brown's work, thermodynamics might have evolved very differently.
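Clausius's calculation is simple enough to sketch. The following Python fragment uses standard textbook values for air; the number density and molecular diameter are assumptions for illustration, not figures from the text.

```python
# A minimal sketch of the mean free path for air at roughly room
# conditions: n is the number density of molecules, d an effective
# molecular diameter. Both are standard illustrative values.

import math

n = 2.5e25        # molecules per cubic metre at ~1 atm, 300 K
d = 3.7e-10       # effective molecular diameter in metres

# Mean free path, with the sqrt(2) correction for moving targets:
mfp = 1.0 / (math.sqrt(2) * n * math.pi * d ** 2)

print(f"mean free path ~ {mfp * 1e9:.0f} nanometres")
# Roughly 65-70 nm: a molecule collides billions of times per
# second, which is why an odour crosses a room so slowly.
```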

Brownian Motion

(a) Cigarette smoke diffuses as it rises up. (b) A drop of ink diffuses in still water. Reprinted with permission of Daniel Harari. (c) A computer simulation shows the Brownian path of a particle in motion.

Clausius was the first to make the link between molecular kinetic energy and temperature. Although Clausius did realize that not all molecules could be moving at the same speed, he did not emphasize this point. He did his calculations based on a supposed average speed. To Maxwell, this was too gross an approximation. Relying on his mathematical physics, he showed in 1860 that it was simply not possible for all molecules to have the same kinetic energy. The random nature of molecular collisions meant that some molecules slow down and others speed up. Many will be spread around the average speed of the entire gaseous sample. He went on to derive a statistical distribution of molecular speeds and hence of kinetic energy.

When Austrian physicist Ludwig Boltzmann entered the University of Vienna in 1863, these ideas were still new and controversial. Boltzmann got interested in Maxwell's version of the kinetic theory of heat and went on to do his PhD dissertation on the subject. It took him many more years to derive the now famous _Maxwell-Boltzmann Distribution_ that formalized the spread of molecular speeds. When a gaseous substance was heated, the effect was to increase the kinetic energy of the molecules on average. At the same time, the peak of the distribution lowered, since the molecular speeds were now spread over a wider range about the average. In addition, Boltzmann showed that given any initial state of the molecules, they would eventually evolve to the Maxwell-Boltzmann Distribution.
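The distribution is easy to simulate. Here is a minimal sketch in Python, drawing each velocity component from a Gaussian, as Maxwell's argument implies; the choice of gas (nitrogen) and the temperatures are illustrative assumptions.

```python
# A minimal sketch: sample molecular speeds by drawing each velocity
# component from a Gaussian. The resulting speeds then follow the
# Maxwell-Boltzmann distribution. Values are for nitrogen, purely
# for illustration; the constants are standard.

import numpy as np

k_B = 1.380649e-23          # Boltzmann constant, J/K
m = 4.65e-26                # mass of an N2 molecule, kg
rng = np.random.default_rng(0)

def sample_speeds(temperature, n=100_000):
    """Speeds of n molecules at the given temperature (kelvin)."""
    sigma = np.sqrt(k_B * temperature / m)       # per-component spread
    v = rng.normal(0.0, sigma, size=(n, 3))      # vx, vy, vz
    return np.linalg.norm(v, axis=1)

for T in (300, 600):
    s = sample_speeds(T)
    print(f"T = {T} K: mean speed ~ {s.mean():.0f} m/s")
# Heating raises the average speed and spreads the distribution,
# lowering its peak -- just as Boltzmann showed.
```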

Suppose we take a container with two partitions, one with cold gas (slow molecules) and the other with hot gas (fast molecules). Then we remove the partition. The two gases will not remain in their halves. They will mix freely and end up according to Boltzmann's predictions. Not only that, it is highly unlikely they will ever separate and congregate into their original partitions. In arriving at this evolving probability distribution, Boltzmann introduced to the world the concept of the random process, more formally known to physicists and engineers as the _stochastic process_. Randomness arises from molecular collisions. The molecules, being far too many, cannot practically be analysed using traditional Newtonian laws. However, precisely because there are so many molecules to consider, the macroscopic properties of the system can be derived from a statistical analysis of the microscopic.

While the focus had always been on gaseous matter, the theory worked just as well for liquids and solids, with molecular collisions replaced by vibrations. Thermodynamics had established a fundamental fact: any matter above absolute zero temperature is in motion on a microscopic scale, although by common experience the same matter might appear at rest. It was now a small step to link this with Brown's observations, which by now had acquired the name of _Brownian Motion_. During the 1860s, Christian Wiener and Giovanni Cantoni were among the first to suggest that Brownian Motion could be due to thermal energy within the fluid. Others erroneously proposed that it could be due to heat transferred to the medium, heating it non-uniformly and causing currents. Experiments confirmed that this could not be the case. By the end of the next decade, the general opinion was that it was due to molecular impacts from the surrounding fluid.

Swiss botanist Karl Nägeli disagreed with this interpretation. Nägeli had been studying the spreading of fungus by air. He had with him estimates of the dimensions and speeds of fungus particles and air molecules. Fungus particles were 300 million times more massive than air molecules. There was no way they could be pushed around. English chemist William Ramsay expressed a similar view when analysing suspended particles in water. Their opinions seemed to make sense for a while, until French physicist Léon Gouy offered an alternative explanation. It is true that water molecules are a lot smaller than pollen, but because there are so many of them, it is quite likely that many of them gang up and strike the pollen at the same time. Of course, water molecules do not perform a concerted attack. It happens purely by chance that many of them collide with the pollen in the same direction, resulting in a movement visible through the microscope. It is statistically probable that such "coordinated" collisions occur, although Gouy did not exactly bring in concepts of probability or statistics.

Thermodynamics supplied the first clue towards an explanation of Brownian Motion. The statistical mechanics of Maxwell and Boltzmann supplemented the classical thermodynamics of Clausius and Thomson. By the end of that century, thermodynamics had stated its First Law and its Second Law. The laws served to explain most known phenomena relating to heat. The Second Law in particular had implications beyond what anyone had ever suspected. It explained why we could never build perpetual motion machines, why steam engines can never fully convert steam power to mechanical work, and why we need energy and effort to maintain order. But before this scientific success could be celebrated, other physicists questioned the very foundations on which thermodynamics had been built. They could not ignore the strides of progress that thermodynamics had made; nor could they quietly accept it, for the implications would transform all of science at its core.

The first aspect of their attack was the statistical nature of the proofs and methods employed. From the time of Newton, physics had been an exact science. Given the position, mass, and velocity of a particle, and an external force of impact, the physicist could calculate precisely where the particle would be at a future time. In their view, the world was deterministic. Pierre-Simon Laplace, sometimes called the Newton of France, took an extreme view that would have surprised Newton himself. What is now called _Laplacian Determinism_ implied that given the state of the world today and all the physical laws, one could predict the future. The statistical approach adopted by Maxwell and Boltzmann was contrary to this accepted view of physics. Mathematicians, for that matter, had ignored the branch of probability and statistics until the seventeenth century. Chance was associated with the will of God and it was not man's prerogative to attempt even to study it. Ironically, Laplace published his own views on probability in 1774, but his was only a mathematical and philosophical treatment. It was never applied to physics. When Napoleon was given a copy of Laplace's masterpiece on celestial mechanics, Napoleon enquired if it was true that Laplace had not mentioned the Creator even once. Laplace replied, "Sire, I have no need of this hypothesis."

The second objection was much more serious, going all the way back to the Greek philosopher Democritus. Statistical mechanics and the kinetic theory of heat had relied a little too much on atoms and molecules. In their exuberance to offer explanations, their proponents had overlooked the validity of their assumptions. Where was the proof that atoms existed? Matter as we normally perceive it is continuous. To say that matter is made up of atoms was simply unhealthy imagination. Chemists for their part had long accepted the existence of atoms in the tradition of Englishman John Dalton (1766-1844). Italian physicist Amedeo Avogadro (1776-1856) even claimed that the volume of any gas related directly to the number of molecules present. These early views were somewhat philosophical, without experimental evidence or mathematical backing. Not everyone took them seriously.

On the other hand, Maxwell and Boltzmann had built the entire science of thermodynamics on molecular collisions. Just because the results were consistent with those of classical thermodynamics did not mean the approach was right. It did not prove that atoms existed. Wilhelm Ostwald and Ernst Mach were the most vocal of the anti-atomists. Even J. J. Thomson's discovery of the electron did little to change their opinion. If you say atoms exist, show them to us, they said. Much of their opposition was targeted directly at Boltzmann. Boltzmann had no satisfactory answer. For years, his work was publicly criticized. In the end, he took his own life and was labelled as having lost his mind. With the perspective of time, we celebrate him today as a martyr of science.

By the start of the twentieth century, the scientific community was divided—the atomists and the non-atomists; the descendants of Laplacian Determinism and the believers in Boltzmann's statistical theories. Is the world made of atoms or is it just continuous matter as we see it? Some mathematicians believed that God created the integers. Everything else, the reals and the irrationals, were creations of men. Central to this argument was the concept of infinity. Infinity was something no one really understood. Even the great mathematician Gauss avoided absolute infinities and preferred to view them through limits and convergence. Towards the end of the nineteenth century, mathematician Georg Cantor showed there were many more irrationals than integers. Between any two integers, one could name an infinity of irrationals. The physical relevance of this mathematical debate was whether matter could be subdivided endlessly, or whether subdivision would stop when the atom itself stood isolated and identified. If infinity really existed, would we ever arrive at the atom? This became the greatest question of all time. One person connected the dots and saw in Brownian Motion an opportunity to prove the existence of atoms.

To say that Albert Einstein wanted to explain Brownian Motion is a misunderstanding. Einstein had the knack of selecting the most important problems of the day. He chose Brownian Motion because if he could relate it to measurable properties as well as molecular dimensions, it would prove beyond doubt the existence of atoms. Many before Einstein had linked Brownian Motion to thermal energy at the molecular level but no one had proposed a method for experimentation. There remained a gap between theory and experimental verification. Einstein closed this gap and informed the scientific world exactly what was to be measured.

1905. That was the year Einstein published all his major theories—the Special Theory of Relativity, the Photoelectric Effect, and Brownian Motion. It was as if all the ideas that had been taking shape in his mind over many months suddenly matured and burst out into the open. But by March that year, Einstein, then pursuing a doctorate at the University of Zurich, was yet to submit a suitable dissertation. His first three submissions had been rejected. One of them would be published in June the same year and would become famous as the "relativity paper." Something built solely from concepts and reasoning was all right for publication but certainly not good enough for a PhD dissertation. The physics of the day emphasized experimental work and observable phenomena. Faced with this challenge of meeting the mundane demands of academia, Einstein submitted in April a method by which molecular dimensions could be calculated and experimentally verified. He made use of two well-known laws of physics—fluid viscosity due to G. G. Stokes and the osmotic theory of J. H. van't Hoff. Closely related to his dissertation came his paper of May, published in the German journal _Annalen der Physik_ and titled "On the Movement of Small Particles Suspended in Stationary Liquids Required by the Molecular-Kinetic Theory of Heat."

In the very first paragraph of this paper, Einstein states his intentions; and in the second paragraph, the consequences. To learn how one should write technical papers, one should read Einstein. His writings are a window to the clarity of his thoughts. In just a few lines, he lays bare the important question that is at stake. He wastes no time in informing the reader that this is not just about Brownian Motion but something far grander,

If the movement discussed here can actually be observed (together with the laws relating to it that one would expect to find), then classical thermodynamics can no longer be looked upon as applicable with precision to bodies even of dimensions distinguishable in a microscope: an exact determination of actual atomic dimensions is then possible. On the other hand, had the prediction of this movement proved to be incorrect, a weighty argument would be provided against the molecular-kinetic conception of heat.

Einstein was the first to adopt a probabilistic approach to the analysis as well as to apply kinetic theory to liquids. First, he derived the diffusion coefficient in terms of temperature, the gas constant, Avogadro's Number, fluid viscosity, and the radius of the particle experiencing Brownian Motion. Next, he wrote down the partial differential equation of the particle's distribution in time and space. This diffusion equation was the same as Fourier's heat equation for temperature. However, Einstein did not need Fourier's trigonometric series since he was not looking for a general solution. He instead studied the displacement over time of a single pollen particle dropped in water. In technical jargon, this sort of initial condition is called an _impulse_. The system's behaviour under such an input is called its _impulse response_. This is what Einstein set out to evaluate. The solution is the well-known _Gaussian Distribution_ in space, whose spread grows with time. Einstein's next step is almost magical.
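In modern notation, this step can be sketched as follows; the symbols here are the standard textbook ones, not Einstein's own. For the particle density ρ(x, t), the one-dimensional diffusion equation reads

∂ρ/∂t = D·∂²ρ/∂x²

and, with all particles starting at a single point, its impulse response is the Gaussian

ρ(x, t) = (1/√(4πDt))·exp(−x²/(4Dt))

whose spread grows with time: the mean-square displacement is ⟨x²⟩ = 2Dt, so the root-mean-square distance travelled grows as the square root of time.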

By eliminating the diffusion coefficient, Einstein related the mean distance travelled in a given direction to known constants, including the famous _Avogadro's Number_. That his number would one day play such an important role would never have occurred to Avogadro. In fact, the number itself was named decades later. With Einstein's equation, one could actually measure the mean distance travelled. Such a measurement had not been possible with the mean free path of Clausius. This displacement was proportional to the square root of time. Moreover, one could arrive at a precise value for Avogadro's Number. No one had done this earlier via Brownian Motion. If someone could verify this experimentally, there would be no doubt that atoms existed. Equally important is the fact that a deterministic number could be arrived at by measurements of a stochastic quantity.
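A random-walk simulation shows the same square-root law. Here is a minimal sketch in Python; the step sizes and particle counts are arbitrary illustrative choices.

```python
# A minimal sketch: simulate many random walks and check that the
# mean-square displacement grows linearly with time, so the
# root-mean-square distance grows as sqrt(t).

import numpy as np

rng = np.random.default_rng(1)
n_particles, n_steps = 10_000, 400

# Each particle takes unit steps left or right with equal chance.
steps = rng.choice((-1.0, 1.0), size=(n_particles, n_steps))
positions = np.cumsum(steps, axis=1)

for t in (100, 200, 400):
    msd = np.mean(positions[:, t - 1] ** 2)
    print(f"t = {t:3d}: <x^2> = {msd:6.1f} (expected ~ {t})")
# Doubling the time doubles <x^2>: displacement ~ sqrt(time),
# exactly the measurable quantity Einstein pointed to.
```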

French physicist Jean Perrin was awarded the Nobel Prize in Physics in 1926 for the experimental work that finally convinced even sceptics such as Ostwald of the existence of atoms. Using various methods, he derived Avogadro's Number, and the value obtained via Einstein's analysis of Brownian Motion agreed well. The scientific world took a new interest in Brownian Motion. Mathematicians started to study the properties of the Brownian path, that is, the path traced by a particle of pollen. They had been working on _pathological functions_ for some time, functions that were continuous but nowhere differentiable. In other words, these functions change direction at every point. These were only mathematical curiosities and mathematicians never once expected to find such functions in nature. To their surprise, Brownian paths exactly fit the description of pathological functions. Paul Langevin introduced to the world _stochastic differential equations_, which became a new and important tool for mathematical physics. Slowly, everyone started to accept statistical analysis in science.

Einstein himself developed the treatment of stochastic processes over many years following his seminal paper on Brownian Motion, only this time studying light and blackbody radiation. Notable among his ideas in this field is the relationship between the _autocorrelation_ of a signal in time and its _power spectrum_ in the frequency domain. The former says how well a signal is correlated with time-shifted copies of itself. The latter represents the distribution of the signal's power with respect to frequency. The two are directly related by the Fourier transform, and Einstein saw this as early as 1914. Today this relationship goes by the name _Wiener-Khintchine Theorem_, but Einstein had found it long before either Norbert Wiener or Alexandre Khintchine. This theorem is essential to signal analysis in communication engineering. Ironically, Einstein, who had applied statistical methods successfully to Brownian Motion, would later object to their use in quantum mechanics.
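The theorem can be demonstrated numerically in a few lines. Here is a minimal sketch in Python using the FFT; the test signal is an arbitrary illustrative choice.

```python
# A minimal sketch of the Wiener-Khintchine relationship: the
# autocorrelation of a signal and its power spectrum form a
# Fourier-transform pair.

import numpy as np

rng = np.random.default_rng(2)
n = 1024
t = np.arange(n)

# A sinusoid buried in noise.
signal = np.sin(2 * np.pi * 0.05 * t) + rng.normal(0, 1, n)

# Power spectrum from the FFT of the signal...
spectrum = np.abs(np.fft.fft(signal)) ** 2

# ...and the (circular) autocorrelation recovered as its inverse
# transform, normalized by the number of samples.
autocorr = np.real(np.fft.ifft(spectrum)) / n

# Check at zero lag: autocorrelation equals the mean signal power.
print(f"{autocorr[0]:.3f} vs {np.mean(signal ** 2):.3f}")  # ~equal
```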

Brownian Motion has a special place in physics. It is the link between a deterministic world and a world of uncertainties. Nothing in life functions without heat. Thermal vibrations at a molecular level are persistent so long as matter is above absolute zero temperature. At the same time, thermodynamics tells us that one requires almost infinite energy to cool matter down to absolute zero. Therefore, there is no escape from uncertainty. When communication engineers transmit signals, be it for telegraphy or telephony, this uncertainty plays its part. What is received is usually not an exact copy of what was sent. Engineers have to overcome this uncertainty and make communication possible.



**In the beginning,** engineers used a pair of wires for telegraph transmission, but later, for cost reasons, they replaced one of them with a ground return. This had worked well for telegraphy but caused problems for telephony. Telegraph signals were composed of only dots and dashes. Telephone lines, on the other hand, carried continuous voice signals. Static charges picked up via the ground affected continuous signals far more easily. This was the first realization of noise in electrical circuits. The solution was quite simple—reintroduce the wire pair and do away with the ground return.

Graham Bell realized early on that braiding the two wires improved transmission characteristics. This was the beginning of the twisted wire pair, commonly used even today. When telephony expanded and distances increased, it was seen that transmission on one wire affected signals on the other. This phenomenon was termed _crosstalk_. Crosstalk was a lot worse in cables that bundled many telephone circuits together. A quick solution to this problem was to separate transmission and reception circuits, giving rise to four-wire circuits. Improved insulation in cables also reduced crosstalk.

Another problem with long lines was echo. Echoes occur when impedances along the signal path are not perfectly matched. Signals get reflected and the speaker hears himself after a short delay. This is just one type of echo among many. To say the least, echoes are annoying. Again, the use of four-wire circuits alleviated the problem.

Soon engineers realized that noise comes in different shapes and sizes. That signals got weaker as they travelled farther had been known since the early days of electricity. William Thomson had claimed in 1854 that signals got weaker in proportion to the square of the distance. _Signal attenuation_, as engineers call this weakening, was a well-known phenomenon by the time telephony became commercial. Open wires picked up environmental noise, and although underground cabling avoided such noise, it weakened signals because the wires were a lot thinner. Engineers inexperienced in the science of long-distance transmission may naïvely propose to solve the attenuation problem by simply pumping enough power into the signal so that it comes out all right at the other end. Experienced engineers know that the amount of power required to do this is impractical if not impossible. To communicate in this manner halfway across the globe using copper circuits would require an estimated 50,000 trillion times the total power radiated from the sun; to talk across the North American continent by radio would require 1.80 x 10^29 kilowatts.

Fortunately, engineers knew of an alternative—amplify the signal often enough along the line, before noise has a chance to take over. Solving attenuation for telegraphy was rather simple. Intermediate stations received the messages and forwarded them to the next intermediate station towards the destination. Even Chappe telegraphy had done this. Thus, the signal was sent as far as possible, and this determined the number of intermediate stations along a line from sender to receiver. The compromise was on the speed of communication, since multiple receptions and forwardings en route incurred delays. No one complained, because it was vastly better than the old days of carrier pigeons and post-chaises. However, for telephony, the solution had to be more sophisticated.
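The arithmetic behind this trade-off is worth sketching. Assuming, purely for illustration, a line that loses 1 dB per kilometre:

```python
# A minimal sketch of why repeaters beat brute-force power. Assume,
# for illustration only, a line that attenuates 1 dB per kilometre
# (a power factor of ~1.26 per km).

db_per_km = 1.0
distance_km = 500

# Without repeaters: the transmitter must overcome the whole loss.
total_loss_db = db_per_km * distance_km
print(f"one hop : {total_loss_db:.0f} dB = "
      f"a power factor of 10^{total_loss_db / 10:.0f}")

# With a repeater every 50 km, each hop only makes up 50 dB.
hop_km = 50
hops = distance_km // hop_km
print(f"{hops} hops: each needs only {db_per_km * hop_km:.0f} dB of gain")
# Attenuation is exponential in distance, so splitting the line
# into short hops turns an impossible power demand into a modest one.
```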

One fundamental difference between telegraphy and telephony was that the former need not be real-time. By this, we mean that a message sent at one end need not be received immediately. Likewise, no immediate response is expected. This is very much like today's electronic mail systems. The message is stored and the receiver can pick it up at any convenient time. Telephony on the other hand is real-time. The nature of telephony is live interactivity between communicating parties. Thus, a system of storing and relaying messages via intermediate stations would not work for telephony. If intermediate stations have to relay signals to cover longer distances, they have to do it with minimal and imperceptible delay.

It may surprise modern engineers that these intermediate stations, also called _repeaters_, were initially mechanical systems. The repeater received electrical signals and converted them to sound using a conventional telephone receiver. The vibrating diaphragm of the receiver doubled as direct input to a telephone transmitter, which generated electrical signals to be sent on to the next repeater. This was the state of the art in the first decade of the twentieth century. The problem with these repeaters was that one could not keep adding them indefinitely along the path of a long-distance line. Costs would mount. Transmission delays would increase. Imperfections in the mechanical apparatus and the frequent conversions between sound and electric current added noise to the signal at each repeater. The upper limit was just three repeaters in tandem. Anything more, and speech became noisy and annoying. Something had to be done to solve this; but this was only one of many problems that telephone engineers faced at the time.

More problematic for telephony was _signal distortion_. Telephone lines are designed to carry human speech at frequencies from a few hundred hertz to about 3,300 hertz, the _hertz_ being the modern unit of frequency. Since line capacitance attenuates signals in a frequency-dependent manner, distortion occurs. Distortion is a serious problem. It garbles speech and makes it unintelligible. Communication becomes impossible. In fact, it was distortion that Bell had alleviated by twisting wires, but his simple approach was clearly not going to work on longer lines.

The solution to the distortion problem is one of the classic examples of science directly helping engineering. It found immediate use in the Bell System at the turn of the century. When it came out, it was timely and saved AT&T close to a hundred million dollars in installation costs during the first quarter of the twentieth century. The story of this invention is quite remarkable. It comes with its own controversy over priority and the well-known tussle that sometimes happens between science and engineering.

Michael Idvorsky Pupin was born into a peasant family of Serbian descent. His village of Idvor was too small to either contain his dreams or satisfy his ambitions. Pupin inevitably felt the calling of America. If the history of telecommunication engineering looks somewhat biased towards American contributions, it is because the United States has welcomed the best minds from all across the globe. It was then, as it is now, a land of opportunity, perhaps more so then due to relaxed immigration laws. With little more than a Turkish fez on his head and five cents in his pocket, Pupin arrived on Manhattan Island at the age of sixteen. He worked on farms and took up odd jobs in cities. He learned the American way of life, read widely, took good advice, all of which prepared him well for college. He did so well in the entrance examinations that he obtained tuition-free admission into Columbia University. Despite his wide knowledge and interests, he was soon drawn to the electromagnetism of Faraday and Maxwell.

Once the Norwegian mathematician Niels Abel was asked how he had managed to achieve so much at such a young age. Abel supposedly had replied, "By studying the masters, not their pupils." It was so with Pupin. He learnt French so that he could study the original works of Laplace, Lagrange, and Ampère. He travelled to the University of Berlin to learn first-hand from Hermann von Helmholtz the latter's take on Maxwell's theories. It was under Helmholtz that he completed his PhD in 1889. The same year he returned to Columbia University as a teacher in the Department of Electrical Engineering. The life of Pupin is a heart-warming tale of someone from a modest background epitomizing the dreams of all immigrants to the United States. He is also perhaps the only engineer to win the Pulitzer Prize, in 1924, for his autobiography, _From Immigrant to Inventor_. What exactly did Pupin invent?

It was known in Pupin's time that the problem of distortion was due to capacitance. One solution was to increase the distance between the wires, but this was quite impractical in the field. Even when the science of underground cabling was mastered with better insulation, costs and rights of way dictated that wire pairs normally followed the same paths. Circuits were often encased within the same cable. An alternative and more practical solution was to neutralize the capacitance by increasing circuit inductance. Since capacitance was distributed all along the wire, inductance too had to follow suit. An idea is one thing, execution quite another. No one knew how much inductance to add or how to distribute it uniformly along the wire. Ad hoc approaches had been tried with limited success. Simply put, engineers had not achieved a theoretical understanding of the problem.

British mathematician and electrical theorist Oliver Heaviside had already established the underlying theory as early as 1893. In fact, Heaviside and Josiah Willard Gibbs are credited with inventing the modern method of vector analysis. Maxwell had expressed his equations of electromagnetism in unwieldy forms based on Hamilton's quaternions. Heaviside rewrote these equations in vector form, and that is the way they stand today. Until Heaviside looked into the problem of signal propagation along transmission lines, theorists had used Fourier's diffusion equation. William Thomson had done just that in the years of the undersea transatlantic cable project. Now Heaviside used the power of Maxwell's equations to analyse transmission lines in terms of transverse electromagnetic waves travelling along the wire. In the end, he derived what we call the _Heaviside criterion_: a definite relationship between the line's resistance, inductance, capacitance, and leakage. If this criterion is met, said Heaviside, the signal would travel with minimum distortion.

In telephone lines of the time, inductance was far lower than what the Heaviside criterion required. Naturally, the solution was to introduce extra inductance into the line. Despite the urgency of the problem to the industry, no one took note of Heaviside's proposals. One problem was that engineers did not understand Heaviside's theory. Heaviside, for his part, did not take the trouble to simplify his knowledge or translate it into practical numbers that engineers could use. Secondly, Heaviside's proposal ran against engineering intuition. Engineers had in the past used inductors only to introduce impedance into circuits. They could not see any benefit in adding inductors to solve the distortion problem. Moreover, in the 1880s, phasors were a new invention and their use in electrical analysis had not yet crept into engineering practice.
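To see how large the shortfall was, here is a minimal sketch using the standard distortionless condition R/L = G/C; the per-kilometre line constants are plausible illustrative assumptions, not values from the text.

```python
# A minimal sketch of the Heaviside distortionless condition,
# R/L = G/C, with illustrative per-kilometre line constants.

R = 10.0     # resistance, ohms per km
C = 50e-9    # capacitance, farads per km
G = 1e-6     # leakage conductance, siemens per km
L = 0.6e-3   # actual inductance of a bare wire pair, henries per km

L_needed = R * C / G     # inductance satisfying R/L = G/C

print(f"actual inductance : {L * 1e3:.2f} mH/km")
print(f"needed inductance : {L_needed * 1e3:.0f} mH/km")
# The required inductance is orders of magnitude above what a bare
# wire pair provides -- hence loading coils to make up the deficit.
```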

It was John Stone who first applied Heaviside's theory. He increased circuit self-inductance by using a bimetallic wire of iron and copper. Stone's primary intention was to eliminate reflections in transmission lines, such reflections being caused by a mismatch of impedances between line and load. He was granted a patent for his invention in 1897, and although he made no particular reference to the distortion problem, he was aware that his solution solved not only reflection but also distortion. When Stone left AT&T in 1899, a promising young engineer named George A. Campbell was tasked with continuing Stone's work.

Campbell immediately saw that such bimetallic wires would be costly. He looked for an alternative approach. His mathematical ability served him well in understanding and applying Heaviside's theories. Instead of increasing circuit self-inductance uniformly, he argued that adding inductive coils at specific locations along the wire would result in similar behaviour. Heaviside himself had suggested this approach but it was Campbell who expressed Heaviside's equations in terms of coil impedance and the distance between successive coils. Thus, he made the theory practical for engineering application. He went further and tested his proposals on cables similar to those being used in the field. By September 1899, he had demonstrated quality voice conversation over a 46-mile circuit, double the best distance previously possible without inductive loading. His work got noticed high up in the organization and work commenced on patenting the invention. The patent application was filed in March 1900 and the invention went into active commercial use within a couple of months.
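Campbell's engineering numbers reduce, in today's filter language, to a cutoff condition. As a rough sketch (a standard image-parameter approximation, not Campbell's own formulation), a line carrying one coil of inductance L_c per section of line capacitance C_s behaves as a low-pass filter:

```latex
% Approximate cutoff frequency of a periodically loaded line
f_c \approx \frac{1}{\pi\sqrt{L_c\,C_s}}
% Voice frequencies below f_c pass cleanly; spacing the coils too far apart
% (larger C_s per section) pushes the cutoff down into the voice band.
```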

Unfortunately for Campbell and AT&T, their patent application conflicted with a prior application. Pupin had filed his own patent in December 1899, just three months before Campbell. Although Pupin had started working on the problem way back in 1894, his ideas matured only in 1899. If only Campbell and the Bell patent attorney had known the urgency of the situation, they might have filed earlier. What is more interesting is that Pupin's inspiration came from a completely different source, one apparently unrelated to electrical signal propagation.

Among the original works of the French masters that Pupin read was Lagrange's treatment of the vibrating string. Thus, once more and rather unexpectedly, we arrive at the same starting point from which Fourier had begun almost a century earlier when investigating heat propagation. Lagrange had modelled the vibrating string using discrete point masses distributed uniformly along the string. This appealed to Pupin and he saw in it an analogy to the problem at hand. Inductors would perform a similar function of retarding signals and thus preserve their form. Distortion would be removed. In his patent application of 1899, he described it thus,

The magnetic energy of the current corresponds to the kinetic energy of the vibrating string, and just as a dense string transmits mechanical energy more efficiently than does a light string so a wire of large self-inductance per unit length will under otherwise the same conditions transmit energy in the form of electrical waves more efficiently than a wire with small self-inductance per unit length, for a wire of large self-inductance can store up a given quantity of magnetic energy with a smaller current than is necessary with a wire of small self-inductance.

Pupin's Inductive Loading of Transmission Lines

(a) Pupin draws an analogy with vibrating strings by showing that an unloaded string suffers signal attenuation. (b) A string loaded with point masses achieves maximum transfer of energy. (c) Loading coils placed at regular intervals on an electrical transmission line achieve the same effect. (d) Pupin's toroidal coil. Source: (Pupin 1899, fig. 3, 7, 10, 11, 12).

Today most people recognize Pupin as the inventor of the loading coil, although modern commentators have claimed that the experimental work he did in a basement laboratory of the university was minimal. Support for Pupin is to some extent biased by his popular autobiography. Nonetheless, Pupin did invent the toroidal inductor, a critical component for loading telephone lines. A conventional inductor did not work because mutual induction between the lines increased crosstalk. Pupin published his work often and was an active member of the American Institute of Electrical Engineers (AIEE), while Campbell rarely published anything. Heaviside, for his part, like Joseph Henry, remained a scientist, never attempted to commercialize his research, and never took out a patent.

AT&T realized the importance of the loading coil early on. Before loading coils came into the industry, the distance of 1200 miles from Boston to Chicago appeared to be the limit for telephone conversations. Loading coils practically doubled that distance. AT&T was not going to sit back and wait patiently for a conclusion to the patent litigation. That conclusion incidentally came in 1904 but by then AT&T had bought Pupin's patent. By January 1901, Pupin had become richer by $200,000. That a university professor could own a patent and then capitalize on it so handsomely is rare even today.

Telegraphy had spanned the North American continent in 1861 but by 1905, transcontinental telephony was still a dream. The success of Pupin's loading coils gave engineers confidence and hope that perhaps someday this dream would be realized. Sure enough, by 1911, the line from New York to Chicago was extended to Denver with equivalent performance. Reaching San Francisco, however, remained far out of reach. Pupin's invention alone would not lead them to the promised land. A whole new approach was needed. A young engineer who had just joined AT&T in 1904 voiced the same opinion. This was Frank Jewett, whose father had been an enterprising engineer in railways and electric lighting. Jewett certainly inherited his interest in engineering from his father. It was Campbell who personally spotted and handpicked Jewett at a chance meeting at the Massachusetts Institute of Technology (MIT) the previous year. Jewett's contributions to engineering from a purely technical perspective are unclear but his true genius lay in analysing system bottlenecks, identifying technical challenges, articulating his views clearly to higher management, and above all, recruiting the right people through his contacts in academia.

When AT&T President Theodore Vail articulated in 1908 the needs of the industry and challenged his engineers to build a transcontinental telephone line by 1914, in time for the Panama-Pacific Exposition, he was not joking. He had confidence in his engineers and above all in Jewett, who was put in charge of the project. To Vail, this was not a gimmick but rather a necessary response to the stiff competition that AT&T was facing at the time. Ever since the expiry of Bell's telephone patent in 1894, many new players had entered the industry. By the time Vail resumed leadership of AT&T in 1907, after an interlude in Argentina running successful enterprises of his own, half of the telephones were from non-Bell companies. To Vail, the future of AT&T lay in long-distance telephone calls, a premium market in which competition was slim. To make this a reality, he was committed to research and backed his engineers fully.

Jewett got down to work and undertook a detailed survey of the situation. From this came his significant insight, which was expressed in a memorandum of 1910,

to achieve this result it will be necessary to employ skilled physicists who are familiar with the recent advances in molecular physics and who are capable of appreciating such further advances as are continually being made, also that the work must be carefully supervised by someone having a full understanding of the requirements.

Jewett's contacts in academia now helped. He was aware that Robert Millikan of the University of Chicago had been working for some years on electronic repeaters. Millikan would later achieve fame for experimentally verifying Einstein's photoelectric equation as well as accurately measuring the electron's charge in his now famous oil-drop experiment. Millikan responded with a candidate and in 1911, Harold D. Arnold joined the Western Electric Engineering Department, Western Electric being the manufacturing arm of AT&T.

It was Arnold who made transcontinental telephony a reality but the story of his invention starts three decades earlier. Given the importance of this invention, it is worthwhile tracing it from its very beginnings. Like any story, it has its twists and turns, entwined with the lives of the actors who made it. There is no better way to start the story than with the greatest inventor ever, the man with more than a thousand patents to his name—Thomas Edison.

When Edison was only twenty-two, he obtained his first patent. It was for an automatic vote recorder, which unfortunately did not sell. Edison immediately resolved never to invent anything that didn't sell. This early experience honed his approach towards selecting problems that were relevant to industry and the needs of the people. His methods of research were just as distinctive. When an idea occurred, Edison would make a sketch and discuss it with his team. His team would then attempt to realize the idea in a working model. Often the team would work entire nights, Edison himself snatching only occasional naps on laboratory benches. The team did state-of-the-art research in an environment of informality and open camaraderie.

In 1876, Edison started a dedicated R&D facility at Menlo Park, the first of its kind anywhere in the world. There was not going to be any manufacturing done at Menlo Park. It was purely R&D with the sole mandate to generate patents. Within the first two years, this facility produced about seventy-five inventions. Among them was an improvement of Graham Bell's telephone transmitter. Edison's use of carbon granules at the transmitter was not an overnight revelation. Edison and his team had tried more than two thousand different substances, some of which seem outrageously irrelevant—rubber, tobacco leaf, fish bladder, and ivory. Edison was never daunted by failures. To him, each failure was a discovery, one more possibility that he knew didn't work.

When the electric lamp was invented in 1879, commercial success required an entire ecosystem to generate and deliver electric power to the residents of New York. Edison and his team came up with an entire gamut of inventions—generators, power stations, distribution lines, switches, sockets, safety fuses, lamp fixtures, and meters. While all this was going on, Edison was also tinkering with the electric lamp itself in an attempt to improve its efficiency. One day in 1883, he noticed a strange glow at the far end of the bulb away from the incandescent filament. Edison modified the bulb to incorporate a long glass nose. A metal plate was placed at the end of the nose to trap the glow. When the plate was positively charged, current flowed. When it was negatively charged, there was no current. Frankly, Edison did not understand what was happening. He had little patience for scientific investigation. Like a true inventor, he documented his apparatus, described accurately what he saw, and took out a patent. This eventually became famous as the _Edison Effect_. Unwittingly, Edison had discovered one of science's most important emission phenomena. This would one day give birth to electronics.

If only Edison had investigated the effect, he might have discovered the electron a full fourteen years before J. J. Thomson. In fact, the glow that Edison saw was due to electrons coming out of the filament. It is sometimes said that a bad idea is a good idea ignored. The Edison Effect lay neglected for two decades, until John Ambrose Fleming, working across the Atlantic, found a use for it. As a consultant to the Marconi Wireless Telegraph Company, Fleming did something deceptively simple in 1904. His device was quite similar to Edison's. It had a filament placed close to a metal plate whose job was to collect the electrons. Instead of supplying direct current (DC) to the device, he supplied it with alternating current (AC). When the current went into positive phase (plate positive with respect to the filament), current flowed. In negative phase, there was no current. Fleming had simply extended Edison's observation. Effectively, he created a means to convert AC to DC, a process now called _rectification_. The device was called a _rectifier_ but because it had two elements (plate and filament), it was also named the _diode_. The diode thus became the world's first electronic device. The British, drawing an analogy with mechanical devices that allowed water to flow in only one direction, called it a _valve_. Fleming was so impressed by this that he commented,

So nimble are these little electrons that however rapidly we change the rectification, the plate current is correspondingly altered, even at the rate of a million times per second.

There is a reason why the world's greatest inventor missed out on the diode. Edison from the start was a staunch promoter of DC. When the electric lamp was invented, DC was the main form of current. Few engineering applications used AC. Moreover, AC generators were often linked to commutators that converted AC to DC. Motors that used AC directly were not proven until the late 1880s. In Germany, the Siemens brothers had invented in 1876 an AC generator to be used with Jablochkoff candles but when Edison's incandescent DC lamp came out, there was no further use for that generator. For all practical purposes, it looked as if AC had no future.

Nikola Tesla, a Serbian immigrant living in New York, went completely against established conventions and focused only on AC. Tesla had seen that commutator contacts attached to AC generators often sparked. He started thinking about a machine that would be free of sparks. From this investigation emerged his great idea of a polyphase system based on magnetic fields rotating out of phase with one another. In 1884, he joined Edison's company but Edison was dead against AC. Tesla quit the very next year. By 1888, he had devised a complete polyphase AC system for which he obtained a patent. The wonderful thing about AC was that it could be used to distribute power over long distances. DC, usually operating at low voltages, required large currents and thick conductors for long-distance distribution, and much power was wasted along the way. AC, on the other hand, could easily be raised to a higher voltage and distributed at low currents. Faraday and Henry had long ago discovered the transformer principle, later made practical by other inventors. The transformer was exactly what AC needed to raise and lower voltages.

George Westinghouse, an enterprising businessman, bought the necessary patents. He teamed up with Tesla and the system was demonstrated to raving success at the 1893 World's Columbian Exposition. Their greatest success came with the design and installation of an 11,000-volt three-phase system for the Niagara Falls Power System, which went into operation in 1896. In these years of the rise of AC, the battle between DC and AC was really a battle between Edison and Tesla, between General Electric and Westinghouse. Eventually, they cross-licensed their patents and agreed to jointly promote AC. There was no question now that AC was the winner for long-distance power distribution.

Thus, by the time Fleming resurrected the Edison Effect from its long forgotten crypt, AC was of considerable interest in laboratories. Fleming himself had been researching AC transformers since 1885. In 1896, he even reported the property of rectification but the idea for the valve did not come to him until eight years later. This is a remarkable example of the fact that sometimes discoveries of science sit around waiting for an application. Fleming's real motivation was that wireless telegraphy as an application was in need of a suitable detector in the wireless receiver (Chapter 11). Thus, to be fair to the genius of Edison, he was quite right to ignore the effect because there was really no application for it in the early 1880s.

Fleming's diode interested an American inventor, Lee de Forest, one of science's most interesting characters. A graduate of Yale University, de Forest preferred to remain an independent inventor in the spirit of Edison, rather than work for someone else. The problem with de Forest was that he rarely gave credit to others whose work he freely used. He got into numerous legal disputes, sued others, and got sued in return. Very few in the scientific community respected him. Yet, it was de Forest who took up Fleming's diode and transformed it into something powerful. The principle he introduced into the diode is something we use even today in the twenty-first century.

It is not known what prompted de Forest to do what he did, but this is what he did. He took a piece of nickel wire and introduced it between the filament and the plate of the diode. Nothing magical happened at this point. Based on a suggestion from his assistant, John Grogan, he then increased the surface area by crumpling the wire. He called this the _grid_. Then he connected the grid to a positive voltage so that electrons coming off the filament were more easily channelled to the plate. The remarkable thing about this setup was its high sensitivity. Even a small change in the grid voltage resulted in a much larger change in electron flow from the filament to the plate. De Forest patented his invention, which he called the _audion_. It has also been called the _triode_ due to the three elements inside the tube.

De Forest, unable to see any serious utility for his audion, sold it to AT&T in 1913 for a sum of $50,000. It was at this point that Harold Arnold, working under the direction of Jewett, started looking into the audion. He saw immediately that electron flow between filament and plate could be directly controlled by charges on the grid. The electron flow mimicked the signal input to the grid. In other words, a small signal into the grid was amplified in the same form as the current between filament and plate. What Arnold had created was the world's first electronic amplifier. To achieve reliable and efficient operation, the vacuum inside the tube had to be increased. He did this and thus was born the _high vacuum thermionic tube_ or _amplifier_. In the transistors that would later replace such tubes, the elements corresponding to the filament, the plate, and the grid are called the _emitter_, _collector_, and _base_.

Early Electronic Devices

(a) Parts of a diode. (b) Parts of a triode. (c) A sample of early vacuum tube amplifiers.

Arnold's important modifications to de Forest's audion and their direct application to telephony came just in time for the planned exposition, which in any case had been postponed to 1915. Loading coils continued to be used but now electronic amplifiers complemented them with active signal amplification. It is worth mentioning that loading coils are _passive devices_, in the sense that they do not introduce extra power into the transmission line. Amplifiers, on the other hand, are _active devices_. The veterans of telephony, Graham Bell and Thomas Watson, were asked to mark yet another milestone in history. From New York, Bell placed the first transcontinental call to San Francisco, where Watson listened to speech that had gone through a few hundred loading coils spaced eight miles apart and eight vacuum tube amplifiers.

Arnold's key invention became the workhorse of the telephone industry for more than three decades. Apart from signal amplification, vacuum tubes were used as switching devices (Chapter 10) and eventually replaced the old electromechanical relays. They were used by the millions all across the Bell System and beyond. When radio communication became possible, they were essential components (Chapter 11). The first electronic computers were built out of these vacuum tubes (Chapter 8). It seemed that engineers had solved the noise problem through signal amplification. Little did they know that noise would not be so easy to defeat.



**It is commonly** known that understanding the enemy is the first step towards victory. For a long time communication engineers ignored this maxim and failed to understand their enemy—noise. They resigned themselves to the belief that noise was ever-present. Thermodynamics had said so. There was simply nothing they could do to eliminate noise. All they could do was increase the transmission power of a signal and thus overcome noise. To their surprise, they were proved wrong. Whenever they increased signal power, they discovered that noise power increased too. Moreover, noise tended to propagate and accumulate over long transmission lines, often marking what appeared to be a fundamental limit to long-distance communication. This important problem required immediate attention. Given the phenomenal growth of telephony during the early decades of the twentieth century, there was a palpable sense of urgency among engineers to find a solution. Therefore, they paused in their routine work and turned to study the enemy.

There is a fine line between noise and system faults, which can be mistaken for fundamental noise. Poor electrical connections, variable battery outputs, or inadequate vacuum in tubes are really system faults that manifest as noise. Likewise, a mismatch between transmitter and receiver parameters can introduce noise. Poor frequency and phase synchronization come under this category. Poorly designed amplifiers, or amplifiers operated at the margins, result in non-linear behaviour. Non-linearity causes signal distortion since the output does not follow the shape and form of the input. Engineers recognized this and adopted a two-pronged approach. One was to improve manufacturing quality and optimize system operation to alleviate all of the above. The other was to investigate fundamental noise sources.

Working from the German laboratories of Siemens, Walter Schottky became the first person to study electronic noise. Sometimes it happens that a technical paper not only publicizes a new discovery but launches a whole new science. Just as Einstein's June 1905 paper launched relativity theory, so did Schottky's 1918 paper launch the study of electronic noise. Schottky identified two main types of noise, which he named _Schroteffekt_ and _Wärmeeffekt_. The former, translated into English as shot effect, was due to the random manner in which electrons are emitted from the filament. Sometimes more electrons are emitted, at other times fewer. Electrons are also emitted at different velocities. These variations affected signal form and energy. Anything over which the electronics engineer has no control, and which is generally not deterministic, interferes with the signal and is therefore noise. The second type of noise, thermal noise, was due to the thermal energy of loose electrons in the conductor. More electrons flow in the circuit when they possess enough energy to dislodge themselves from the surrounding ionic lattice. Thus, Schottky got down to the fundamental causes of electronic noise. Now further work could be directed towards analysing the details that might suggest solutions.
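Schottky's shot effect has a compact modern form. What follows is the standard textbook statement of his 1918 result rather than his original notation: the mean-square noise current observed in a bandwidth Δf is

```latex
% Shot noise (Schottky, 1918)
% q = electron charge, I = average DC current, \Delta f = bandwidth
\overline{i_n^2} = 2\,q\,I\,\Delta f
```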

Meanwhile in the United States, Theodore Vail had been actively promoting his vision of a nationwide network of seamless service. His famous slogan was, "One policy, one system, and universal service." The key aspects of his vision were standardization, integration, and innovation. He travelled all over the country spreading his message at all levels so that in time, ideology was transformed into corporate culture. Though AT&T had acquired something of a monopoly status, Vail did give in to governmental pressures. In 1913, Western Union was divested from AT&T. Local operating companies were allowed to interoperate with the Bell System while AT&T retained control of toll and long-distance calls.

It is generally true that only with balanced competition is there motivation for innovation and progress. Yet despite the lack of competition, Vail did not let short-term goals mask the long-term needs of not just AT&T but of the entire telephone industry. It was under Vail's leadership that AT&T was transformed into an innovation powerhouse. There were two aspects to this innovation. AT&T's own engineering department focused on basic research, aimed at developing new technologies that would cause paradigm shifts within the industry. The engineering department of Western Electric matched this by focusing on adaptive research, engineering research aimed at improving system performance and processes. Arnold's vacuum tube amplifiers came under this category. Arnold had improved and adapted de Forest's audion for application in telephone networks.

In 1925, realizing the importance of basic research, AT&T incorporated Bell Telephone Laboratories. Jewett was appointed its first President. Over many decades, this institution, commonly known as Bell Labs, produced numerous disruptive inventions. It is fair to say that the lifestyles we lead today would have been delayed by many decades, if not very different, without these inventions. The engineering of Western Electric was no less important. Any serious fault in design, if discovered too late, would mean replacement of millions of parts in the network, network shutdown, and loss of revenue. It is for this reason that though transatlantic undersea telegraphy had been realized as early as 1866, the first transatlantic telephone cable could be operated successfully only in 1956. The latter required undersea amplifiers that had to work without fault for many years. Years later, speaking at a centenary dinner in 1969, Paul Gorman, the President of Western Electric, summed up the company's commitment to quality and reliability,

when we build underseas amplifiers to carry messages across the oceans, they're built to perform faultlessly for a minimum of 20 years. Every component is made of specially selected materials and assembled under surgically clean conditions. Or consider the simple relay, which we make by the millions for use in telephone switching offices. They are designed and built to last for 40 years. They have to. An ordinary telephone call involves the operation of about 1,200 such relays. If just one fails, the call doesn't get through.

Bell Labs attracted some of the best engineers of the day. Among them were two Swedish immigrants, J. B. Johnson and Harry Nyquist. Johnson came across Schottky's paper two years after it was published but his first significant paper appeared eight years later, in 1928. What could explain this delay of a decade? What really happened was that most engineers accepted Schottky's argument that shot noise dominated in vacuum tubes. T. C. Fry published some theoretical work in 1925 in agreement with Schottky's results. Johnson's early papers followed Schottky's findings. It was only when Johnson started serious experimental work in 1927 that limitations in Schottky's work came to the surface.

What Johnson discovered was that as filament temperature increased, shot noise increased as expected. However, after a certain maximum, shot noise dropped and eventually became almost negligible. This was due to a dense accumulation of charges near the filament, which smoothed the variations of current flow from the filament to the plate. In fact, it is in this region of temperature saturation that vacuum tubes are usually operated, implying that it was really thermal noise that dominated in electronic circuits and not shot noise as engineers had long supposed. Johnson also discovered other types of noise, though their fundamental causes were only conjectured or at best partly understood. One was 1/f noise that occurred at low frequencies. The other was ionic noise that occurred with tungsten filaments at high currents. Ionic noise was so high that it swamped both shot noise and thermal noise. Schottky himself had identified the _Funkeleffekt_ or flicker effect with certain filaments.

That Schottky discovered shot noise and Johnson discovered thermal noise is really a modern myth. Schottky saw that shot noise dominated circuits under certain conditions, chose to study that, and ignored thermal noise. The important realization that thermal noise dominated in vacuum tubes came to Johnson after testing a sample of a hundred triode tubes and plotting the results, which showed that noise hit an abrupt minimum even when amplification was reduced. This was an unexpected finding and he discussed it with his colleague Nyquist, who had a PhD in physics from Yale University. Within a month, Nyquist provided a mathematical formula for thermal noise. Having seen the history of thermodynamics and Brownian motion, we will not be surprised to learn that Nyquist's formula was derived from thermodynamics and statistical mechanics.

Both Johnson's and Nyquist's papers were published back to back in _Physical Review_, July 1928. The fact that noise power was independent of resistance and frequency was astonishing. In fact, noise power per unit bandwidth equalled _kT_, _k_ being Boltzmann's constant and _T_ being temperature on the Kelvin scale. In other words, thermal noise was dependent only on temperature. More importantly, contrary to what engineers had assumed, noise was not "everywhere" in the spatial sense. The noise that really mattered was at the input of the receiver or amplifier. Yes, it's true that temperature determined thermal noise, but only the temperature of the receiver or amplifier circuit mattered. The temperature of the room where a person was speaking, or at the bottom of the Atlantic through which a telegraph cable passed, had little relevance.
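In modern notation, the twin results read as follows; this is the standard textbook form rather than the notation of the original papers:

```latex
% Johnson-Nyquist thermal noise
% Available noise power in a bandwidth \Delta f:
P_n = k\,T\,\Delta f
% Mean-square open-circuit noise voltage across a resistance R:
\overline{v_n^2} = 4\,k\,T\,R\,\Delta f
```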

Johnson's paper is a window to the wonderful world of laboratory work. To correlate temperature with noise, Johnson varied temperature using Dewar flasks with materials as diverse as "boiling water, melting ice, solid carbon dioxide in acetone, and old liquid air." On the other hand, the amount of noise coming into the circuit or amplifier was dependent on the input resistance. To verify this dependence, Johnson used an even more diverse and somewhat ingenious range of materials to achieve resistances from a few kilo-ohms to a few mega-ohms—India ink on paper, carbon windings on lavite, platinum deposits on glass, and sulphuric acid in ethyl alcohol. One thing that unites the world's best scientists and engineers is their superb ability to improvise. From these experiments, Johnson could derive Boltzmann's constant. The value agreed so well with prior knowledge of the constant that there remained no doubt that Johnson's experiments and Nyquist's formula were consistent with each other.

These discoveries immediately benefited engineers. First, they looked to improve the amplifiers themselves in terms of quality—high vacuums, higher purity of filaments, and good electrical contacts. Second, they redesigned their circuits so that resistance at the amplifier's input could be optimized. Some fundamental facts were known. If input impedance was high, grid noise dominated; otherwise, plate noise dominated. In addition, the operating temperature had to be chosen so that the shot effect was minimal. Temperature also had to be low so that thermal noise was minimized.

With so many types of noise working against the primary goal of preserving signals intact, achieving low noise was not trivial. Engineers first had to model noise and put it through circuit analysis. Fortunately, circuit theory had sufficiently progressed by the end of the 1920s. Circuit theorists had laid a good foundation—Gustav Kirchhoff, Léon Charles Thévenin, and Edward Norton. In particular, Kirchhoff's Current Law and Voltage Law were by now standard knowledge. Thévenin had shown that any circuit, no matter how complex, can be abstracted into just two components—a voltage source and a series resistance. Norton had established a similar abstraction using a current source in parallel with a resistance. These directly helped in the modelling of electronic noise. Using Schottky's equation for shot noise and Nyquist's equation for thermal noise, engineers could model these noise sources and put them through the rigours of circuit analysis. Ultimately, they could evaluate their designs from the perspective of noise. Their approach was thus no longer limited to signal analysis as in the olden days.
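Here is how the abstraction plays out for a noisy resistor, sketched in today's notation: the real resistor is replaced by a noiseless one plus a fictitious source that carries all the randomness.

```latex
% Thevenin form: noiseless R in series with an RMS noise voltage source
v_n = \sqrt{4\,k\,T\,R\,\Delta f}
% Norton form: noiseless R in parallel with an RMS noise current source
i_n = \sqrt{4\,k\,T\,\Delta f / R}
```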

There was, however, a separate group of engineers who remained ignorant of electronic noise all through the 1920s. These were the radio engineers who had grown up in the tradition of Guglielmo Marconi. At the turn of the century, Marconi had introduced to the world a completely new form of communication, one that did not use wires (Chapter 11). Radio engineers faced a set of problems quite different from telephone engineers. Telephony at the time was not wireless. Therefore, it happened that radio engineers did not follow the work of Schottky, Johnson, or Nyquist. The turning point came when F. B. Llewellyn of Bell Labs decided to publicize the work of his colleagues to a broader audience. His paper appeared in the _Proceedings of the Institute of Radio Engineers_, February 1930.

The delay probably didn't matter too much to radio engineers because what beset them in those days was not electronic noise. The main noise they had to contend with was environmental or static noise. Users normally experienced this as hiss or crackle. The sources of this type of noise are quite diverse—solar flares, lightning, electrostatic discharges, sky noise, and man-made noise. These noises never troubled telephone engineers much, so long as they insulated their cables and circuits properly. For radio engineers, this noise limited system performance since it was easily picked up by receiver antennas. Static noise appeared to be everywhere. John R. Carson of Bell Labs went so far as to state, "Static, like the poor, will always be with us."

One of the earliest solutions is due to G. W. Pickard, who in 1920 published an article describing what we today call _directional antennas_. If we know that our desired signal is coming from a specific direction, it doesn't make sense to receive from all directions. Pickard's approach was to optimize signal reception and minimize noise pickup by orienting the antenna towards the signal. The next advance in this area was _selectivity_, whereby the receiver system was improved to block all frequencies that were not of interest. A signal typically contains frequencies of a well-defined range, a range that engineers call _bandwidth_. If receiver systems are designed to process only within this bandwidth, then the noise entering the system is minimized through out-of-band rejection.

When radio engineers were introduced to electronic noise in 1930, it was probably timely because by the late 1930s, wireless transmission had slowly migrated from the low frequency spectrum to higher frequencies. Static noise was dominant at low frequencies but not at higher frequencies, where thermal noise became dominant. By the time of World War II, the noise problems that radio engineers faced were similar to those of telephone engineers. One group could directly benefit from the other's work.

While all this was going on within radio engineering, something unexpected happened. It was a discovery that went beyond engineering. Karl Jansky, working at Bell Labs, was tasked with analysing static noise in the sky. Radio engineers had always recognized static noise but no one had actually measured it systematically. Jansky simply took a directional antenna, pointed it at different points in the sky, and measured whatever the antenna picked up. His measurements, collected over many months, varied widely and clearly depended on where the antenna pointed. It became clear that noise picked up by the antenna was extraterrestrial. Many points in the sky had extremely low noise but never nil. On the other hand, there were some very noisy sources such as the sun, interstellar gas clouds, and the Milky Way's centre. The strongest noise seemed to come from the direction of the constellation Sagittarius, about 25,000 light-years away.

When the results were published in 1933, they triggered curious interest among the public about an alien race attempting to contact the inhabitants of earth. The New Yorker magazine wryly commented that this was "the longest distance anybody ever went to look for trouble." What Jansky had really done was to father the science of radio astronomy. It became possible to tell the age and composition of stars and galaxies by detecting such radiation. Years later, in 1965, A. A. Penzias and R. W. Wilson measured the minimum background noise and placed it at 3 degrees Kelvin. This noise is believed to be the last ripples of the Big Bang, the formative fire of the universe. Thus, it is to some extent true that noise exists everywhere, except that such low noise sources are not the dominant ones in our communication systems.

Once the different types and sources of noise were established, system engineers for convenience clubbed them under the general heading of noise. Electronics engineers and radio engineers are concerned about specific noise sources. System engineers, on the other hand, care only about the total noise entering their receivers. Their job is to design a system so that the signal can be extracted as cleanly as possible. By abstracting the specifics into a single general noise source, they defined a measure to indicate how much stronger the signal was with respect to noise. This they named the _signal-to-noise ratio (SNR)_. SNR became a key metric often quoted in system design and evaluation. It was a relative measure and usually used on the logarithmic scale with the decibel (dB) as its unit. Noise power itself was quoted in the absolute unit of decibel-milliwatt (dBm), power expressed relative to 1 mW. This made sense because powers in communication systems are so low that the watt is unwieldy as a unit. For example, by Nyquist's formula, the thermal noise floor of a source operating at 10 megahertz (MHz) bandwidth at a room temperature of 300 K is 41.4 × 10⁻¹⁵ watts, which is better expressed as -103.8 dBm. One final question remained to be answered—how noisy was an amplifier?
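The arithmetic in that example is easy to reproduce. Here is a minimal Python sketch (the function name is mine, not the book's) that computes the thermal noise floor kTB and converts watts to dBm:

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann's constant k, in joules per kelvin

def noise_floor_dbm(bandwidth_hz: float, temp_k: float = 300.0) -> float:
    """Thermal noise floor P = kTB, returned in dBm (decibels relative to 1 mW)."""
    power_watts = BOLTZMANN * temp_k * bandwidth_hz
    return 10 * math.log10(power_watts * 1e3)  # 1 W = 1000 mW

# The example from the text: 10 MHz bandwidth at 300 K
print(f"{noise_floor_dbm(10e6):.1f} dBm")  # prints -103.8 dBm
```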

The motivation for studying electronic noise in the first place was the problem of noise in amplifiers. While the signal got amplified, additional noise was also introduced. The question was by how much. Dwight North of the R&D Laboratory of RCA Manufacturing Company first answered it in 1942. He introduced the concept of the _noise factor_ and in the process simplified noise measurements. Two years later, Harald Friis of Bell Labs introduced the concept of the _noise figure_, defined as the ratio of input SNR to output SNR. In the process, he introduced the new and powerful concepts of available gain and available power. From here, Friis derived the cumulative noise figure of a circuit that used cascaded amplifiers. The clarity of his paper and the power of his methods meant that many in the engineering community adopted the proposal of Friis at the expense of North. Those who followed North's noise factors only introduced confusion since there were now two groups talking in two languages about the same thing. Part of North's dissatisfaction was that Friis had not cited his earlier work. Here was yet one more controversy over priority. Only this time, there were no patents to talk about. It was only about acceptance and reputation, and the engineering fraternity had clearly voiced its verdict.
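Friis' cascade result, given here in its now standard form, makes the conclusion of the next paragraph immediate. With linear (not dB) noise factors F and available gains G for each stage:

```latex
% Friis' formula for cascaded amplifier stages
F_{\text{total}} = F_1 + \frac{F_2 - 1}{G_1} + \frac{F_3 - 1}{G_1 G_2} + \cdots
% Each later stage's noise contribution is divided by all the gain that
% precedes it, so the first stage dominates the total.
```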

The fundamental result of this work was that the noise figure of the first amplifier mattered most. It was alright for the first amplifier to have a low gain but it was imperative that it had a low noise figure. Such an amplifier, a critical part of any radio receiver, was christened with a special name—the _low noise amplifier (LNA)_. Later amplifier stages could compensate with higher gains. Thus, the work of Friis directly established clear design principles from which engineers could benefit immediately. Because the noise figures of good amplifiers were extremely small numbers, engineers defined, for convenience, the _effective noise temperature_. Just as dBm had replaced mW some years earlier, effective noise temperature (K) now replaced noise figure (dB). Then something really strange happened.
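The conversion between the two units is simple. In the now standard convention, with T₀ = 290 K as the reference temperature:

```latex
% Effective noise temperature from the (linear) noise factor F
T_e = (F - 1)\,T_0, \qquad T_0 = 290\ \text{K}
% Example: F = 1.07 (about 0.3 dB) gives T_e of roughly 20 K; small
% fractions of a decibel spread out into convenient whole kelvins.
```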

For years, engineers had been trying to conquer noise but now they started to build noise generators. Non-engineers are often puzzled by this but engineers have a perfectly valid explanation. The Indian spiritualist Swami Vivekananda had once mentioned that to remove a thorn one must use a thorn. It is in this spirit that engineers used noise to analyse noise. To start with, they had to measure amplifier noise figures. Simply comparing input and output signals did not isolate the noise performance of amplifiers. They used noise generators as controlled noise sources. Noise itself could not be controlled, but its statistical properties could; noise generators therefore became essential for measuring and characterizing amplifier performance.
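How such controlled sources are used can be sketched with the modern descendant of this idea, the Y-factor method; this reflects present-day practice rather than a technique described in the text. A calibrated source is switched between two known temperatures, T_hot and T_cold, and the ratio of the measured output powers yields the amplifier's effective noise temperature:

```latex
% Y-factor method with a calibrated two-temperature noise source
Y = \frac{N_{\text{hot}}}{N_{\text{cold}}}, \qquad
T_e = \frac{T_{\text{hot}} - Y\,T_{\text{cold}}}{Y - 1}
```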

Friis introduced in 1942 a laborious measurement technique. This was soon dropped in favour of the diode as the noise source. The diode's shot effect, modelled on Schottky's equation, served well for years until engineering applications entered the spectrum above 300 MHz. Engineers tried stopgap measures to extend the use of the noise diode in this high frequency region but none proved satisfactory. Then came an unexpected breakthrough.

Bill Mumford, working at Bell Labs under the direction of Friis, was asked to perform accurate noise figure measurements of triodes. Not having the right equipment, Mumford tried to figure out a way to build better noise generators. One day while watching television at home, he noticed that the picture suddenly turned noisy. Upon investigation, he discovered that his wife had just switched on a fluorescent lamp in the kitchen. He correlated the two and inferred that these lamps generated noise at television frequencies. It might therefore be possible to use gas discharges to generate noise at microwave frequencies. Thus was born in 1949 the gas discharge noise source. Previous measurements of noise figures using diodes had been accurate only to within a few dB. Over the years, Mumford perfected his device to such an extent that an accuracy of ±0.1 dB became possible. Mumford's device pretty much replaced the noise diode completely.

Thus, over a span of three decades, noise was categorized, analysed, measured, and understood. Amplifier designs improved. Design principles were established. But great inventions have a way of their own. Two inventions that tremendously reduced noise flouted all the key design principles. It is to these two great inventions that we now turn our attention.



**The problem with** successful technology is that once it becomes popular, it opens up new problems for itself. The good thing is that new problems are right up the engineer's alley and he immediately starts searching for solutions. It is therefore fair to say that successful technology starts a chain reaction of discoveries, inventions, and theories. The vacuum tube amplifier made transcontinental telephony possible but it also brought with it a host of new challenges.

To start with, amplifier performance was sensitive to temperature. Gain varied with plate voltage or the age of the amplifier. On long transmission lines with their associated long delays, echoes were a problem. Particularly on the transcontinental lines, the use of many loading coils increased signal delay and hence the echoes. Given that amplifiers were also used, they amplified these annoying echoes. Engineers attempted a combination of reducing the loading and increasing the number of amplifiers. The signal therefore became more dependent on active components. It was imperative to get amplifier performance right.

When an amplifier entered non-linear regions of operation, the worst-case scenario for amplifiers, it distorted signals and also caused interference to other channels. Each channel carried a single voice call and it was common to multiplex many channels on the same wire. This concept had been introduced earlier in telegraphy (Chapter 1). Telephony used the same method of frequency division multiplexing. Such systems on the whole are named _carrier systems_ since each channel used a unique carrier frequency. Channels often occupied contiguous bands in the frequency spectrum. Therefore, when amplifiers introduced distortions due to non-linearity, signals from one channel interfered with other channels on the wire. In 1923, the Bell System had a four-channel transcontinental carrier system with twelve amplifiers. This meant that only four voice calls could happen simultaneously between New York and San Francisco. When this was the state of the art, it will not surprise us that AT&T supervisor J. S. Jammer was quite puzzled when one of his engineers came up with analytical data for a system that multiplexed 3000 channels. Jammer wondered why bother with so many when only four were in use.

That engineer was Harold S. Black, who had joined AT&T in 1921 for a weekly salary of $32. Within a few months, he contemplated resigning because other new recruits who had joined at $27 a week were raised to $30 while he himself did not get a raise. Luckily for him and for engineering, he didn't resign—for Black proved to be an engineer of talent and diligence. As a boy, he had taken a keen interest in electromagnetism. He even built a primitive communication system, stringing a wire to the house of a neighbour who had five daughters, and heard all their conversations until the tap was discovered and destroyed. While at AT&T, he used his Sundays to study systematically every important development in its history from 1898 to 1921. This self-pursued activity gave him a complete perspective on telephony and the challenges the industry then faced. Given the growth of telephony, Black knew that one day the world would need carrier systems that handled thousands of channels. Therefore, the results he showed his supervisor were certainly not misplaced, though perhaps a little ahead of their time.

For two years, Black and other engineers did not make much progress. Making a linear and stable amplifier seemed an impossible task. Engineers could possibly achieve a ten-fold improvement but what they needed was many thousandfold. In March 1923, Black attended a talk by Charles Steinmetz, a renowned scientist. Inspired by Steinmetz, Black got down to basics and approached the problem from a new angle. Instead of trying to linearize amplifiers, it was perhaps better to accept that amplifiers are non-linear and instead focus on removing the distortions that result. In fact, Black was with the Systems Engineering Department, and system-level solutions are often about integrating components the right way rather than designing better components. From this grew his idea of the _feedforward amplifier_. Essentially, this translated to isolating the distortion, amplifying it in a separate amplifier, and subtracting it from the original output.

For the next four years, Black struggled with the feedforward amplifier, not because it was a wrong idea but because it was not practical. Every experienced engineer knows that there is a vast difference between laboratory work and fieldwork. In fact, many start-up technology firms that claim a ready product as soon as the prototype works have learnt this lesson the hard way. Black's amplifier was notoriously difficult to maintain. Changes in voltages and temperatures meant that distortion could never be cancelled fully. Amplifier gain varied by as much as 1 dB and this was problematic.

When active thinking fails, sometimes one must give the subconscious free rein. When the four walls of claustrophobic office space prove counterproductive, one must free the mind in more expansive settings. Black's moment of serendipity came one morning (August 2, 1927) on his way to the office, on a ferry crossing from New Jersey to Manhattan. One can never pin down that momentary transition from ignorance to knowledge. Black himself failed to explain it later. Was it the quiet waters of the Hudson or the boat's rhythmic motions that soothed the mind towards reflection? Was it the familiar sight of Manhattan's skyline backlit against the rising sun? Whatever it was, when the flash of insight came, Black did not lose it. Finding a blank page in _The New York Times_, he quickly scribbled his thoughts. A similar flash of insight occurred four days later, again on the same ferry. Black once more used the day's paper to record these improvements. Thus was born the _negative feedback amplifier_. Writing about it in 1934, Black summarized his invention,

by building an amplifier whose gain is deliberately made, say 40 decibels higher than necessary (10,000 fold excess on energy basis), and then feeding the output back on the input in such a way as to throw away the excess gain, it had been found possible to effect extraordinary improvement in constancy of amplification and freedom from non-linearity.

Negative feedback essentially takes the difference between input and output and uses it to correct the output, maintaining stability and linearity while sacrificing gain. Engineers until then had been using the _positive feedback amplifier_, in which the output of the amplifier was fed back into the input for greater gain and sensitivity. Therefore, when Black suggested negative feedback, it seemed like a crazy idea. Engineers had sweated for a couple of decades to increase amplifier gains and here was Black suggesting that they throw away hard-won gain. Even Harold Arnold, the man who made de Forest's audion practical, rejected Black's idea and asked him to work on conventional amplifiers. In fact, Black's proposal was so outrageous that not even the US Patent Office could see its benefits. It would take a working model and nine long years before the patent was granted.
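The arithmetic behind "throwing away" gain is short; this is a standard modern rendering rather than Black's own notation. If A is the forward gain and β the fraction of the output fed back in opposition to the input:

```latex
% Closed-loop gain with negative feedback
G = \frac{A}{1 + A\beta} \approx \frac{1}{\beta} \quad (A\beta \gg 1)
% Sensitivity: relative variations in A are suppressed by the loop
\frac{dG}{G} = \frac{1}{1 + A\beta}\,\frac{dA}{A}
% Throwing away 40 dB of gain makes 1 + A\beta about 100, so a 1 dB
% wobble in A shows up as roughly 0.01 dB in G.
```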

Negative Feedback Amplifier

(a) Plain amplification of signal. (b) With negative feedback, part of the signal is taken from the output and subtracted from the input.

Negative feedback gave numerous advantages. The frequency response of the amplifier was flattened, meaning that it gave the same gain over a wide bandwidth. This was of great importance in carrier systems designed to carry hundreds of channels, just as Black had envisioned in 1921. To put this in perspective, today's fibre optic carrier systems carry about half a million channels. Harmonics introduced by non-linearity were reduced ten-thousandfold. Negative feedback stabilized gain to a variation within 0.01 dB even in the face of voltage or temperature fluctuations. In other words, it had become possible to coax linear behaviour from non-linear components. While this is what Black had meant when he referred to stability, stability had a different meaning for other engineers at Bell Labs. To them, stability meant freedom from oscillation. This is easily understood using the example of a thermostat. If the room temperature is too high, the heating is switched off. If the room cools off quickly and the temperature drops below the desired value, the thermostat triggers the heating to come on. Thus, response delays in the system and improper control can result in oscillations. The room temperature will oscillate about the desired value and, worse still, the heating system will be continuously switching on and off. Oscillation is a real problem in all systems that employ feedback. This concerned engineers, though Black himself never really addressed it.
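A few lines of Python make the gain-stabilization claim concrete. This is a minimal sketch of the ideal feedback equation, not a circuit simulation, and the numbers are purely illustrative:

```python
def closed_loop_gain(forward_gain: float, beta: float) -> float:
    """Ideal negative feedback: G = A / (1 + A*beta)."""
    return forward_gain / (1 + forward_gain * beta)

BETA = 0.01  # fraction of the output fed back in opposition to the input

# Let the forward gain A collapse by half (ageing tubes, supply sag, heat):
for A in (100_000.0, 50_000.0):
    print(f"A = {A:9.0f}  ->  G = {closed_loop_gain(A, BETA):.3f}")
# A =    100000  ->  G = 99.900
# A =     50000  ->  G = 99.800   (a 50% drop in A moves G by only ~0.1%)
```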

It was Harry Nyquist who presented the theory of negative feedback in 1932. In the past, if an amplifier went into oscillation, engineers would simply throw up their hands in despair and start tinkering with the design almost blindly. Nyquist gave engineers a simple method: measure amplifier gain and phase shift over the amplifier's bandwidth, plot these on a polar diagram, and read off directly whether the amplifier was stable. If not, he gave them the criterion under which it would become stable. Another theorist at Bell Labs, Hendrik Bode, followed up on Nyquist's work and vastly improved it. Bode took feedback theory to a new level of sophistication. He gave engineers powerful tools that could be used to shape gain and frequency response in a precise way. Today engineers commonly use _Nyquist Diagrams_ and _Bode Plots_. In a direct application of negative feedback, two students of Stanford University designed and built a high precision audio oscillator in a garage. They were the pioneers of Silicon Valley culture. They were William Hewlett and David Packard. The company they founded in 1939 is now known to most engineers.
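Nyquist's test can be stated in one line; this is the textbook simplification of his 1932 criterion, adequate for the common case of an amplifier that is stable before the loop is closed. Plot the loop gain Aβ(jω) in the complex plane as the frequency ω sweeps:

```latex
% Simplified Nyquist stability criterion (open-loop stable case)
\text{the locus of } A\beta(j\omega) \text{ must not encircle the point } -1 + j0
```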

The idea of negative feedback is used extensively not just in amplifier design but in all systems that require stability and control. In the eighteenth century, James Watt had used a governor so that engine speed could be maintained at a constant level. This is a mechanical example of negative feedback. An example from biology is the maintenance of a constant concentration of glucose in the bloodstream. If glucose increases, insulin is released; if glucose decreases, glucagon is released. Information about current glucose levels is the feedback and it triggers the appropriate corrective action. What Black, Nyquist, and Bode gave the world was an abstraction of the negative feedback principle that could be applied to more than just engineering.

That feedback had to be only positive was really a myth that Black destroyed. Another myth of the time was that a smaller signal bandwidth resulted in lower noise. Indeed, since bandwidth is a limited resource, it was desirable to transmit and receive messages in as small a bandwidth as possible. In addition, Nyquist had clearly shown that thermal noise power was directly proportional to bandwidth. When theory lends credence, a myth becomes established as fact. One engineer thought otherwise.

If the public is asked to name a celebrity by the name of Armstrong, most people would name Neil Armstrong. It is funny how history selects and immortalizes all the wrong people. In this case, history chose to remember a small, insignificant step of a man when it should have celebrated the giant leap of mankind. In fact, there is no such thing as a giant leap. Progress comes from a train of small steps and occasional frog leaps. It is therefore proper to remember the right Armstrong and give him due recognition.

Edwin H. Armstrong belonged to the old tradition of independent inventors that includes Morse, Bell, Edison, and Tesla. By the 1930s, this was no longer common since establishments such as Bell Labs had changed the dynamics of R&D and patent generation. Big corporations had their own R&D departments and the rights to inventions belonged to them and not to the inventors per se. It was in this era that Armstrong proposed a new method of noise reduction, first conceived in 1931. The reactions from other engineers were not very different from what Black had experienced in his own time.

_Frequency Modulation (FM)_ was a technology that had been in the laboratory for a long time. All along, it remained a complex technology that could not be made practical for the field. Telephony, on the other hand, relied on _Amplitude Modulation (AM)_, in which the amplitude of speech directly modulates the strength of the electric current being transmitted. In FM, instead of the amplitude, the instantaneous frequency of the carrier is varied according to the modulating signal. Interest in FM started in the 1910s with the aim of conserving precious bandwidth. Engineers tried in vain until John R. Carson of Bell Labs theoretically proved in 1922 that FM bandwidth must be at least twice the highest modulating frequency. This came to be known as _Carson's Rule_. Engineers quietly gave up and continued to use AM. No one was particularly interested in sacrificing precious bandwidth. When technology later moved to higher parts of the frequency spectrum, bandwidth availability renewed interest in FM among radio engineers, though it remained of little interest to telephone engineers.

Analogue Signal Modulation by AM and FM

(a) Modulating signal such as speech. (b) Carrier waveform to be modulated. (c) AM changes signal amplitude. (d) FM changes carrier frequency keeping amplitude constant.
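In modern notation, the two schemes in the figure can be written down directly; this is a standard textbook sketch, with m(t) the message, A_c and f_c the carrier amplitude and frequency, k_f the frequency sensitivity, Δf the peak deviation, and f_m the highest message frequency:

```latex
% Amplitude modulation: the envelope follows the message
s_{AM}(t) = A_c\,[1 + m(t)]\cos(2\pi f_c t)
% Frequency modulation: the instantaneous frequency follows the message
s_{FM}(t) = A_c\cos\!\left(2\pi f_c t + 2\pi k_f \int_0^t m(\tau)\,d\tau\right)
% Carson's Rule, in its now standard form, for the FM transmission bandwidth
B_T \approx 2\,(\Delta f + f_m)
```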

In this scheme of things, Armstrong suggested that the signal amplitude be clipped to defined limits at the receiver, thereby discarding noise. This was alright for FM since the modulation was on frequency and not on amplitude. He added that the bandwidth of FM transmission must be increased well beyond what Carson's Rule prescribed. Rules, after all, are meant to be broken. This was contrary to all current knowledge, which straightaway predicted an increase in noise. It took Armstrong some effort to convince the crowd. A demonstration to Radio Corporation of America (RCA) engineers in 1934, followed by another in 1935 at a meeting of the Institute of Radio Engineers (IRE), made it obvious that signal quality was much improved with Armstrong's method. The invention was named _wideband FM_ to differentiate it from the older narrowband version.

FM is a non-linear process. It is exceedingly difficult to analyse mathematically. Fourier analysis leads to non-trivial coefficients. While the mathematical formulation came a few years later, Armstrong himself remarkably did all his analysis using just phasors. How did increasing FM bandwidth lead to noise reduction? This became a question of great curiosity and over the years, many different explanations were given.

In wideband FM, engineers faced the classic exercise that has always been at the core of all engineering: the _trade-off_. To gain something, one has to give up something else. To make high-precision equipment, an increase in manufacturing cost must be accepted. With FM, the trade-off meant that it was alright to use up extra bandwidth because one gained much more in terms of SNR, just as it is alright to widen lanes to reduce road accidents. Engineers never forgot this lesson. Years later, they applied the same principle when designing PCM systems (Chapter 4) and CDMA systems (Chapter 11).

When the superiority of wideband FM became established, RCA tried to claim part of the glory. They challenged Armstrong's patent of 1933, contending that RCA engineers had played a significant role in developing wideband FM. The earliest note of Armstrong's that documents his idea is from July 1932. Supposedly, the idea first occurred to him in September 1931, but there is no dated proof of this. The fact that he was a consultant to RCA on narrowband FM made things difficult for him. The bitter dispute that followed broke a 20-year friendship between RCA's David Sarnoff and Armstrong. As a lone inventor, Armstrong faced challenges far greater than Black's.

To be fair to historians, Armstrong didn't make it easy for them to establish the facts. He did not document his ideas well enough. As an independent inventor, he had never learnt that necessary skill which all corporate engineers acquire. Take Black, for example. When the idea occurred to him on the ferry, he wrote it down immediately and dated it. When he reached his office, he had it witnessed by a colleague. It is customary for engineers to maintain laboratory log books giving as much detail and clarity as possible. These then become indispensable for patent attorneys to establish priority.

RCA did not promote Armstrong's FM and in fact attempted to block its adoption. After much struggle and personal investment, Armstrong launched FM broadcasting on his own at 42.8 MHz frequency. Many licensed Armstrong's invention but the inventor spent much more than he ever earned. From then until the end of World War II, FM radio offered the American public the only form of high-fidelity audio, since magnetic tapes and vinyl LPs for public use appeared only later. Armstrong's FM was licensed to the US military free of charge and it was used extensively during the war years.

In 1945, television broadcasters lobbied the Federal Communications Commission (FCC) and had the FM spectrum moved to 88-108 MHz, where it remains to this day. The move made obsolete all of Armstrong's transmitter and receiver sets. For television broadcasting, RCA used wideband FM without a proper license. Armstrong sued. The litigation dragged on, and all that Armstrong got were expensive legal fees. By 1953, his finances were running out, his patents had already expired three years earlier, and RCA continued to lay claim to his invention. This claim gave others an excuse not to pay Armstrong the license fees due to him. Depressed, in January 1954, he threw himself from the thirteenth floor of his apartment building. For the man who made radio noiseless, the world was still too noisy to live in.

This then is the grand story of noise, from the days of Brownian Motion to the days of FM radio. Because noise can never be eliminated completely, and because insatiable demand constantly pushes technology to its limits, noise always has a place in a world of signals. In fact, the reverse is just as true—signals have to find their place in a world of noise. The stage was thus set for a revolutionary new technology to enter the scene.

#  0100 A Measure of Information

**In the days** of the Second World War, there was a need to maintain secrecy in communication. Ciphers had been known for centuries and with time, they had improved in sophistication and protection. In fact, the use of private ciphers had become popular with telegraphy, particularly so because operators who relayed messages could read them. Unlike postal messages, telegraph messages were not in sealed envelopes. Given that ciphering of important telegraph messages was in common use, one would expect that achieving secrecy for the Allies would be an easy task, a simple application of already known principles of cryptography. However, it was not. The requirements given to engineers were quite different. It was not the simple written messages of the past that needed a new engineering solution.

The telephonic conversations between American President Franklin D. Roosevelt and Winston Churchill, in his War Rooms on the other side of the Atlantic, were of such importance that it was imperative to protect them from enemy eavesdroppers. Telephony across the Atlantic at the time was quite different from what we are used to today. Radio waves carried voice directly, and these waves could travel such great distances because of refraction from a layer of the earth's atmosphere called the ionosphere. The use of satellites and undersea cables for telephony came years later. The problem with radio transmission was that anyone could listen in; and no one had succeeded in creating a suitable voice cipher system.

The problem with voice, unlike telegraph's dots and dashes representing a language alphabet, is that the waveform is a continuous signal. This signal follows the contours and variations of speech, except that instead of being in sound it is in the electrical domain. Ciphering any message means that the message should resemble gibberish to the enemy. In fact, one may say that the message should look like noise to the enemy and try as he might, he shouldn't be able to figure out the message. Of course, making any good thing bad is easy business. The challenge for engineers was to make the message look like noise but still enable complete reconstruction of the original message at the receiving end, something an authenticated party can do but not the enemy. This had been possible for telegraphy since the alphabet set was fixed and finite. Ciphering for telegraphy was merely a translation of the message into an unintelligible form that used the same alphabet set.

With voice, the signal is not discrete in the sense that it is not drawn from a fixed set of possibilities as in telegraphy. Since a voice waveform is continuous, it can take on an infinite number of forms. Engineers simply did not know how to cipher such continuous waveforms. Had they tried, they would have changed the composition of the signal in terms of its orthogonal components. Signal bandwidth might have been affected. The message itself might have changed in an irreversible manner. Thus, the receiver would be in the same position as the enemy, unable to decipher the message. Communication would have become impossible. Despite these challenges, the A-3 Scrambler was engineered and was in operation at the start of the war. Realizing that it was probably not very secure, top officials urged Bell Labs to come up with something better. Sure enough, unknown to the Allies, the Germans had cracked the A-3 Scrambler. A solution was needed urgently. Fortunately for the engineers of Bell Labs, a British engineer had invented, some five years earlier, something that might be just what was now needed. Unfortunately for the same engineers, they were blissfully unaware of the invention across the Atlantic and started excitedly on their own line of design and development.

Alec Reeves obtained his bachelor's degree from the City and Guilds Engineering College and later a postgraduate degree from Imperial College, London. Soon after, he joined the International Western Electric Company, which in 1925 was acquired by the International Telephone and Telegraph (ITT). The same year Reeves moved to Paris to work at the company's central laboratories, the Laboratoire Central de Télécommunications (LCT). On this side of the Atlantic, the LCT attempted to match the research efforts of AT&T's Bell Labs. Here, Reeves found himself in the right place. To be surrounded by vacuum tubes and radio receivers is a passionate engineer's idea of heaven, and Reeves found himself in one. However, Reeves was more eccentric than one might suppose an ordinary engineer to be. His work on radio systems stirred his curiosity about extraterrestrials. It was normal for the office night guards to spot him strolling on the terraces with a pensive gaze directed at the starry skies. In the same spirit, he believed he was communicating with the dead. In particular, he saw Michael Faraday as his guiding spirit.

Engineers, cloistered in their own limited world of signals and systems, are often derided for being unimaginative. The truth is that an engineer's imagination is vast, though confined to his own world. It is an imagination that outsiders fail to understand simply because there is a wide gap between everyday experiences and the intricacies of engineering invention. At LCT, Reeves made numerous improvements to communication systems. This work familiarized him with the problem of noise in radio communications. The transatlantic telephone system was noisy. Reeves began thinking of a way to solve the noise problem, and his approach was unlike anything anyone had done before. This was the time when radio engineers knew about static noise as well as electronic noise from the pioneering work of Schottky, Johnson, and Nyquist.

The history of modern technology informs us that science often supplies a new phenomenon, which engineering then harnesses. The invention of Reeves is perhaps one of the few that relied less on scientific discovery and more on engineering imagination. Amplitude modulation was the earliest method of transmitting voice messages using electrical signals. Telegraphy, by contrast, had shown engineers that dots and dashes modulated carriers with pulses. From this was born the concept of _pulse modulation_. Someone saw the general principle that pulses suited data transmission much better than the continuous signals of amplitude modulation.

As early as 1842, Scottish inventor Alexander Bain conceived of a system to send facsimile images via telegraphy. Those who followed in Bain's footsteps either made incremental improvements or came up with different designs without any accompanying improvement in efficiency. Arthur Korn of Germany introduced in 1902 an important invention that combined optical scanning at the transmitter with image recording on photographic film at the receiver. The critical innovation in facsimile transmission did not come until 1921, when Western Electric's Paul M. Rainey came up with a system that could transmit images using telegraphy. What separated Rainey from his predecessors was the use of pulses together with an innovative encoding process, not unlike Morse code. Of the sixteen claims of his US patent application, the first summarized his innovation,

The method of transmitting pictures by electricity which comprises transmitting a code combination of electrical impulses for each elemental area of the picture, utilizing said combinations to correspondingly vary the intensity of a light beam, and causing said beam to reproduce the picture on a sensitized form at a distant station.

Many key ideas are discernible in this first claim—impulses rather than continuously varying signals, codes defining combinations, a code combination representing brightness, and the entire picture viewed as a jigsaw of small pieces. The last of these particularly agrees with today's approach of dividing a digital image into basic _picture elements_ , also called _pixels_. The term megapixel has come into common everyday vocabulary in relation to the capability of digital cameras. Pixels are the bridge between images we perceive as continuous and their digital representation.

Rainey's idea too suffered the same fate as Bain's—the idea was ahead of its time, the industry did not seem to need it, and the future of telecommunications lay entrenched in traditional telegraphy and voice telephony. Facsimile systems of the day were variously referred to as teleautograph, telephotography, phototelegraphy, telegraphoscope, pantelegraph, or _kopiertelegraph_ (copying telegraph). When revolutionary technologies require new terms of reference, even name-giving becomes a problem since people often approach it with differing perspectives or worse still, don't understand it.

Communication engineers have been using the term _data_ to mean anything non-voice. Telegraphy was data and so was facsimile. Pulse modulation had been proposed for data transmission but now the question was asked if it could be used for voice as well. No one had asked this for a long time. Voice was always linked to telephony. Data on the other hand evolved from telegraphy. Reeves was among the few who first imagined what would happen if one used pulse modulation for voice. Indeed, this was an uncomfortable notion since pulses were discrete in nature and voice was continuous.

Even before Rainey, many engineers had used pulse modulation in relation to _Time Division Multiplexing (TDM)_. In the beginning, _Frequency Division Multiplexing (FDM)_ was used exclusively in carrier systems for telegraphy, telephony, and radio transmissions. With FDM, multiple channels could be carried on the same medium, each channel occupying a unique frequency. With TDM, channels used the same frequency but were separated in time; that is, conversations took turns to transmit. To engineers, frequency and time were both resources that could be traded off as desired to meet system requirements.

The nature of TDM is that it requires accurate time synchronization so that conversations can be switched on and off at high speed. If, by mistake, two conversations overlapped their transmissions even for a small interval, communication suffered. Such poor system design manifested itself as noise. The general name for this is _interchannel interference_. FDM systems were comparatively simple to design and implement. This explains why TDM systems, though proposed as early as the mid-nineteenth century, saw their first practical demonstration only in 1903. Building on a decade-old invention of F. J. Patten, Willard W. Miner built a system that switched each conversation as fast as 4,300 times per second. At any lower rate, voice quality was poor. Just before transmission, the voice signal was sampled. The value was then represented as an electrical pulse of equivalent amplitude and sent down the line. This became known as _Pulse Amplitude Modulation (PAM)_. It was the first significant form of pulse modulation that later went into commercial use. Little did Miner know the theoretical basis of his system. He deemed 4,300 samples per second enough, but this was far from the ideal. Pulses suffered from intersymbol interference, and this was nothing new: Thomson and Jenkin had known of it back in the 1850s.

Reeves looked at PAM from the perspective of noise rather than multiplexing. Since PAM represented voice samples as pulses of varying amplitudes, it had the same problems as amplitude modulation. Noise altered amplitude values and hence distorted the signal. Reeves asked if there was a way to make the signal resilient to noise. He was not analysing noise or looking to build better amplifiers. These were traditional approaches to signals that were continuous, signals that used either AM or FM. Instead, Reeves started to look at the pulses themselves. He put aside physical considerations of the signal and looked at it in abstraction. He wondered if there could be a different method of mapping speech to pulses, a method that resulted in better noise immunity.

The remarkable thing about engineering, or scientific research for that matter, is that the essential ingredients of anything new are long in place. Often they are scattered across different laboratories and across national boundaries. Technical papers published in different languages remain known to a limited readership. It's only years later when someone translates such a paper, travels to a conference, or comes across a new journal by a chance recommendation that the links are made. The concept then takes shape and a new invention or knowledge is born. At other times, the inventor makes the discovery independently without knowledge of earlier works of others.

What Reeves invented is today named _Pulse Code Modulation (PCM)_. Being of the family of pulse modulations, PCM is related to PAM though vastly superior in terms of noise immunity. It is quite likely that Reeves got his inspiration for PCM from Baudot Code, which Frenchman Émile Baudot had invented in the 1870s. In Baudot Code, five symbols are used whereby each symbol can take one of two values. This is similar to Morse code, which uses only dots and dashes. The difference is that with Baudot Code the characters are all of a fixed length of five binary symbols whereas in Morse code this number is variable. Thus, Baudot Code is able to signal 2⁵ = 32 characters in all. This is strongly reminiscent of Murray Shutter Telegraphy. What Reeves did next was almost magical, though if we are not careful, we might say it was obvious.

Reeves took the pulsed nature of PAM and combined it with the coding principle of Baudot Code. Instead of representing speech samples using pulses of differing amplitudes, he encoded each sample using a combination of pulses much like Baudot Code, which used five symbols per character. In PAM, there is no intermediate level. Speech sounds directly become electrical pulses. In PCM, Reeves introduced a level of abstraction in between, that of encoding samples before converting them into electrical pulses. Even when pulses were limited to only two values, present or absent, PCM encoding did the trick to represent a range of values.

At the time of this invention, a modified version of Baudot Code was quite popular for data transmission. Since the 1920s, teleprinters had grown in popularity for the exchange of text-based data. Though Morse code was more bandwidth-efficient, Baudot Code's fixed character length made it easier to build automated equipment for message transmission and reception. Paper tapes recorded the characters using punched holes. Given this familiarity with Baudot Code, anyone could have invented PCM. Yet it was only Reeves who did it. His idea germinated in 1937, was patented in France in 1939 and in the US in 1942.

The aspect of PCM that made it revolutionary was that noise affected it only in the most adverse conditions. Since the encoding was in binary form, a one or a zero for each symbol, noise had to be really significant to change a pulse representing a one into a complete absence of pulse, a zero, or vice versa. Small changes in pulse amplitudes did not affect the encoding and hence preserved the final interpretation of the speech sample. Moreover, repeaters along the line would _reshape_ the pulses to their original forms and thus remove noise early, before it grew significant enough to cause errors. PCM's intermediate level of encoding brought noise immunity and improved voice quality. The trade-off for this improved performance was that signal bandwidth increased, a trade-off not unlike that of FM. In fact, PCM is so good that it outclasses FM: in FM, the SNR measured in decibels increases only logarithmically with bandwidth; in PCM, it increases linearly. If PCM was a simple application of Baudot Code from telegraphy to telephony, the real difficulty lay in taking samples of a varying speech signal. Telegraphy was inherently suited to pulse transmission. Speech signals, on the other hand, varied continuously.

PAM and PCM

(a) Clean analogue speech waveform. (b) Channel adds noise to the signal and corrupts it. (c) PAM samples are just as vulnerable to noise. (d) PCM encoding overcomes noise since encoding preserves original sample value. Analogue signal is transmitted in digital form.
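The contrast in panels (c) and (d) can be made concrete with a toy simulation. The sketch below, in Python with numpy, uses an 8-bit encoding and a noise level chosen purely for illustration; these are not Reeves' own parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 0.005, 1 / 8_000)            # a few milliseconds at 8 kHz
speech = np.sin(2 * np.pi * 440 * t)          # a tone standing in for speech

# Encode: quantize each sample to one of 256 levels, then to 8 binary pulses.
codes = np.round((speech + 1) / 2 * 255).astype(int)
bits = (codes[:, None] >> np.arange(7, -1, -1)) & 1

# Channel: noise nudges every pulse, but a bit is misread only if noise
# pushes a pulse across the halfway decision threshold.
rx = bits + rng.normal(0, 0.1, bits.shape)
bits_rx = (rx > 0.5).astype(int)              # pulses regenerated, noise discarded

# Decode: rebuild the sample values; only the quantization error remains.
codes_rx = (bits_rx << np.arange(7, -1, -1)).sum(axis=1)
speech_rx = codes_rx / 255 * 2 - 1
print(np.max(np.abs(speech - speech_rx)))     # about 1/255, whatever the channel adds
```

A PAM system subjected to the same noise would pass the perturbed amplitudes straight through to the listener; here the decision threshold wipes the noise out at every regeneration.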

The question did not trouble Reeves. Many researchers working independently in the preceding decades had already answered it. Even PAM designers knew the answer and had applied it successfully. While the answer leads to one of the most famous results in all of communication engineering, there is a much more fundamental question hidden within the principle of PCM. At the turn of the century, Einstein and Perrin, among others, had put to rest the almost eternal question of atomism. Engineers working on applying pulse modulation to telephony faced a similar interplay between the continuous and the discrete. Speech signals, even when interspersed with periods of silence, were continuous. Pulse modulation sampled this continuous waveform at a constant rate and represented it as discrete pulses. If continuous speech were to be represented thus as a train of discrete pulses, it was at best an approximation to the original signal. How many samples should we take per second to ensure an acceptable approximation? If we take too few, the signal is poorly represented. If we take too many, the bandwidth utilized is too high. Engineers looked for the theoretical optimum that would allow the receiver to reconstruct a smooth continuous waveform from just the discrete samples.

Harry Nyquist, the same Bell Labs theoretician whom we encountered in relation to thermal noise, provided the definitive answer to these questions. His famous result, implicit in a paper from 1928, was initially named the _Nyquist Sampling Theorem_. Essentially, the theorem states that a band-limited signal must be sampled at at least twice its maximum frequency. This enables complete reconstruction of the signal from its samples without any distortion. If a signal is sampled below this rate, a phenomenon called _aliasing_ occurs. Aliasing is not limited to speech. It can happen with any continuous signal that is inadequately sampled, including images. That Nyquist was the first to prove this result is stated in most engineering textbooks. Today it is widely recognized that Nyquist was one among many and certainly not the first.

Undersampling and Aliasing

(a) When too few samples are taken, the original signal is misinterpreted as a signal with a lower frequency. (b) When sufficient samples are taken, original signal can be properly reconstructed. (c) Sample image of Barbara. Copyright ownership is unknown. (d) Image is undersampled. Lines on her trousers and scarf suffer distortion. This is visual aliasing.
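The folding arithmetic behind panel (a) is easy to verify. In this hypothetical sketch (Python; the tone and sampling rates are arbitrary choices), a 3 kHz tone sampled fast enough survives, but sampled too slowly it masquerades as a 1 kHz tone:

```python
def apparent_frequency(f, fs):
    """Frequency a tone of f hertz appears to have after sampling at fs hertz.

    Sampling cannot tell f apart from f plus any multiple of fs, so a tone
    above fs/2 folds back, or aliases, into the band below fs/2.
    """
    f_folded = f % fs
    return min(f_folded, fs - f_folded)

print(apparent_frequency(3_000, 8_000))   # 3000: the rate exceeds twice the tone
print(apparent_frequency(3_000, 4_000))   # 1000: undersampled, the tone aliases
```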

Russian scientist Vladimir A. Kotelnikov had arrived at a similar result in 1933 but his work was not known in the West for many years. Claude E. Shannon indirectly cited Kotelnikov's work in a paper of 1949. By then, Shannon was such an important figure in communication engineering that his publication prompted others to coin a new name—the _Shannon-Nyquist Sampling Theorem_. Later historical digging found much earlier precedents. Similarities were found in Joseph Lagrange's 1765 treatment of vibrating strings and Charles-Jean de la Vallée Poussin's 1908 work on interpolation from equidistant functional values. The truly astonishing precedent from the mathematical world was due to E. T. Whittaker (1915), whose equations could be applied directly to band-limited signals. In recognition of the various contributors, the theorem has more recently acquired an exotic name—the _Whittaker-Nyquist-Kotelnikov-Shannon Theorem_. In a concise article summarizing these developments, Professor Hans Lüke has commented,

this history also reveals a process which is often apparent in theoretical problems in technology or physics: first the practicians put forward a rule of thumb, then the theoreticians develop the general solution, and finally someone discovers that the mathematicians have long since solved the mathematical problem which it contains, but in "splendid isolation."

Another remarkable result from this theorem is that it tells engineers how to construct a band-limited pulse or signal, that is, one that has a well-defined bandwidth with no frequencies outside this defined range. When talking of pulses, we often visualize sharp rectangular pulses. These look ideal in time domain but they are a disaster in frequency domain. Fourier analysis informs us that such a sharp pulse has frequencies spread over a wide range. The signal has significant energy outside the main frequency band. This was clearly not good because it resulted in intersymbol interference, and in the case of carrier systems, _intercarrier interference_. The Fourier transform of a rectangular pulse is of the form sin(x)/x, which engineers define as _sinc(x)_. Engineers including Nyquist and Kotelnikov argued that if we took sinc(t) in the time domain as the pulse, then the signal in the frequency domain would be perfectly band-limited. Thus, a sinc pulse gives the perfect frequency response. Engineers today refer to this as _pulse shape_. Nyquist called it _shape factor_.
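In the usual textbook notation (a sketch only; conventions differ on where the factor of π sits inside sinc), the pulse-spectrum pair reads

$$\operatorname{rect}\!\left(\frac{t}{T}\right)\;\xleftrightarrow{\;\mathcal{F}\;}\;T\,\frac{\sin(\pi f T)}{\pi f T},$$

where T is the pulse duration. By the symmetry of the Fourier transform, the pair also works the other way round: a sinc-shaped pulse in time has a perfectly rectangular, strictly band-limited spectrum, which is exactly the property Nyquist and Kotelnikov exploited.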

With the coming of pulse modulation, Fourier analysis, which in earlier times had been limited to signals, was now applied to pulses as well. Shannon looked at the same thing in terms of orthogonal basis expansion, since time-shifted sinc pulses are all orthogonal to one another. He went on to state that such an infinite expansion could be approximated by a finite and manageable one. This has come to be known as the _Dimensionality Theorem_. What comes out of Shannon's statement is that while _Fourier bandwidth_ is what a signal occupies, _Shannon bandwidth_ defines what the signal needs to use. The closer the two are to each other, the more efficient the design. These theories gave engineers methods and measurable goals to aim for in their designs.

But out in the real world—since philosophers keep telling us that this world is an illusion, to keep sanity and purpose intact, engineers keep reminding themselves that the world is real—constructing a sinc pulse is a difficult proposition. Engineers like approximations just as mathematicians like limits and convergence. It does not matter if the ideal is not achievable but at least we can approach it as close as possible. So engineers approximated the sinc pulse using a triangular pulse, a raised cosine (RC) pulse, and a root-raised cosine pulse. The last of these has become a popular form of pulse shaping that minimizes energy spill into frequencies outside the main band.

The Nyquist theorem helped Reeves in his sampling of speech signals. Since human speech as sent down a telephone line is usually in the range 300 to 3,300 hertz, it was necessary to sample at 6,600 samples per second. Although the sampling rate was thus settled to avoid signal distortion, the question of encoding each sample remained open. Baudot Code had used five binary symbols per character. If we used only five binary symbols for each speech sample, we could represent only thirty-two distinct levels. Would that be enough? More binary symbols meant more bandwidth. Fewer binary symbols meant a coarser representation of the actual sample value. It was necessary to get this right, since irreversible distortion occurred if one used too few levels. The receiver can never reconstruct the original sample value, only an approximation. This process of approximating a sample value from a fixed set of discrete amplitude levels is called _quantization_. In PCM, quantization is the dominant form of noise. Each additional binary symbol improves SNR by about 6 dB. Incredibly, noise, which once depended on thermal or environmental considerations, was now determined by the engineer's choice. Noise, which was once uncontrollable, was now taken care of upfront at the design stage.
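The 6 dB figure follows from a short argument: each extra bit doubles the number of quantization levels, which halves the step size and quarters the error power. For a full-scale sine wave and uniform quantization with N bits, the standard textbook approximation is

$$\mathrm{SNR} \approx 6.02\,N + 1.76\ \text{dB},$$

the 6.02 being 20 log₁₀ 2, the decibel value of one doubling.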

Quantization can be understood by an analogy with Lego blocks. Suppose one is required to construct a Roman arch of semicircular profile. The arch can only be built in an approximate fashion because the blocks themselves are rectangular. If one uses large blocks, the arch will look more like a triangle, suggesting the pointed gable of a farmhouse. If one uses small blocks, there is greater control and one can effect a curve that looks like an arch. If we look at the details, we will notice a _staircase effect_; but from a distance, these details are lost and the arch is what we see. Quantization of speech is quite similar. Use of small quantization steps leads to an approximation that is close to the actual signal. Low-pass filters in receivers smooth out the final output; that is, removing high frequencies removes the staircase effect.

Quantization

(a) Clean analogue waveform. (b) Quantization with only 3 bits shows the staircase effect. (c) Quantization with 6 bits gives a better approximation of the original signal. (d) Image quantized with 3 bits can represent only eight shades of grey. (e) Image quantized with 6 bits.
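A uniform quantizer fits in a few lines of code. The sketch below (Python with numpy; the 3-bit and 6-bit choices mirror the figure) rounds every sample to the nearest of 2^bits levels:

```python
import numpy as np

def quantize(x, bits):
    """Mid-rise uniform quantizer: map samples in [-1, 1] onto 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
    return (idx + 0.5) * step                 # reconstruct at the level's midpoint

t = np.linspace(0, 1, 1_000, endpoint=False)
x = np.sin(2 * np.pi * 3 * t)
print(np.abs(x - quantize(x, 3)).max())       # ~0.125: a coarse, visible staircase
print(np.abs(x - quantize(x, 6)).max())       # ~0.016: the staircase nearly vanishes
```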

Reeves himself did not bother too much about how best to quantize signals since he did not actually build a system for commercial deployment. In his patent, he talked about using five bits per sample. Later experimental investigations by others showed that four bits per sample reproduced good speech but when speech was loud six bits were found necessary. Today's PCM systems use 12 or 13 bits per sample plus a sign bit to differentiate between positive and negative values. What is actually sent down the line is far less due to the use of an intelligent technique called _companding_.

We see that engineers not only borrow words but also coin new ones. In companding, we find a marriage of _compressing_ and _expanding_. The former is done at the transmitting end and the latter at the receiving end to reconstruct the original samples. In companding, samples are resolved finely at low signal levels and coarsely at high levels. This is because human auditory perception is more sensitive at lower levels, while at higher levels the coarser quantization is masked by the greater signal strength. Companding requires only 8 bits per sample, including the sign bit. Thus, only 8 bits are sent down the line. The receiver expands them back before converting them to analogue speech form. Two variants of companding exist—_A-Law_ and _μ-Law_. Every country uses one of the two.
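As a sketch of the idea, here is the standard μ-law curve with μ = 255, the parameter used in North America and Japan (Python with numpy):

```python
import numpy as np

MU = 255.0  # the standard mu-law parameter

def compress(x):
    """Transmitter side: stretch small amplitudes, squeeze large ones."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    """Receiver side: the exact inverse of compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

x = np.array([0.01, 0.1, 0.5, 1.0])
print(compress(x))            # quiet samples get a bigger share of the code range
print(expand(compress(x)))    # round-trips back to the original values
```

Real systems apply a segmented, piecewise-linear approximation of this curve rather than the smooth logarithm, but the shape and the intent are the same.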

Today, speech is typically taken to be band-limited at 4 kHz. By Nyquist's theorem, it is sampled at 8 kHz. When companding is used, 8 bits per sample are required. This translates to a bit rate of 64 kilobits per second (kbps). In a basic method of modulation, this implies a bandwidth of 64 kHz. Thus, PCM has increased signal bandwidth sixteen-fold but this penalty is more than made up for through improved system performance. This is now standard for any high quality speech telephony. Such quality is usually referred to as _carrier-grade_. PCM enabled TDM whereby many circuits were multiplexed on the line. In fact, Baudot himself had multiplexed multiple telegraph channels on a single line using 5-bit pulses. Baudot multiplexing was PCM/TDM in every sense except that it was for data. PCM/TDM extended the principle to voice samples and two transmission systems evolved—T1 carrier that multiplexed 24 circuits and E1 that multiplexed 32 circuits.

In T1, the line rate was at first 24 x 64 = 1536 kbps = 1.536 Mbps. Time was divided into _slots_, each slot carrying 8 bits of a single circuit. While T1 lines were the _physical channels_, engineers started referring to calls and individual circuits as _logical channels_. The multiplexing of 24 logical channels was grouped into a higher logical structure, termed a _frame_. Thus, each frame had 8 x 24 = 192 bits. This was not enough. The designers of T1 needed a mechanism to maintain synchronization between transmitter and receiver, synchronization being a critical part of all TDM systems. The data itself could not be used reliably for this purpose since it was arbitrary. Engineers decided to add a single bit to each frame for the purpose of synchronization. Thus, the T1 line rate ended up being 1.544 Mbps. When these numbers were published, 193 being a prime number and 1.544 having a mystical aura to it, engineers outside the transmission group joked that the transmission engineers who had designed T1 must be numerologists. In Europe's E1 system, the line rate was 32 x 64 = 2048 kbps = 2.048 Mbps. This made sense to Europeans, who wanted elegance, 32 being a power of 2. Of the 32 circuits, two were reserved for synchronization and signalling.
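The numerology is easy to check, one frame being sent for every 8 kHz sampling period (a worked sketch in Python):

```python
frames_per_second = 8_000                 # one frame per speech sample period

t1_bits = 24 * 8 + 1                      # 24 slots of 8 bits, plus the sync bit = 193
print(t1_bits * frames_per_second)        # 1_544_000 bits/s, i.e. 1.544 Mbps

e1_bits = 32 * 8                          # 32 slots of 8 bits = 256
print(e1_bits * frames_per_second)        # 2_048_000 bits/s, i.e. 2.048 Mbps
```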

Such high line rates, if they were to be harnessed from existing cables, meant that the loading coils of Pupin and Campbell had to go. Progress is nothing more than one technology making another obsolete. From T1 and E1, an entire hierarchy of transmission rates has been built up. These transmission technologies are the stuff from which today's telephony networks are built. A call from California to Singapore, while carried on a dedicated circuit, is just one of many calls multiplexed on a shared T1 line. Often it is only when the circuit reaches the local telephone exchange that it is de-multiplexed from the T1 line, the pulses decoded using the μ-Law, and the decoded samples converted to continuous waveforms that are then sent to the customer's premises. PCM has enabled almost noise-free transmission across thousands of miles with fewer repeaters than were once needed for waveforms that were neither pulsed nor encoded. While T1 and E1 hierarchies continue to exist in many parts of the world, since the 1990s they have been slowly replaced by optic fibres that use modulated light rather than electric current.

Since PCM was pulsed, the pulses representing a binary code of just ones and zeros, it was easy to devise a method to cipher sampled, quantized, and encoded voice. What had been a difficult task with continuous waveforms became a breeze with PCM. Engineers at Bell Labs had designed their own voice encryption that went by the name _SIGSALY_, also known as the _X System_. An early demonstration of this system happened in 1939. It was about this time that the engineers came across Reeves' PCM and realized its importance. The principle of PCM was adopted for SIGSALY by using distinct frequencies to represent each of the binary symbols after the voice samples had been encrypted. All through the war years, voice encryption that relied on PCM technology remained a matter of secrecy. Research papers in this area were classified. Had the defence establishment known of PCM earlier or realized its potential, they would never have allowed it to be patented, since patents are in the public domain. SIGSALY came into operation in July 1943 and possibly played an essential role in the Allied victory. Speaking at its formal opening, O. E. Buckley, then president of Bell Labs, paid tribute to the engineers behind it,

As a technical achievement, I should like to point out that it must be counted among the major advances in the art of telephony. Not only does it represent the achievement of a goal long sought—complete secrecy in radiotelephone transmission—but it represents the first practical application of new methods of telephone transmission that promise to have far-reaching effects.... To do these things called for a degree of precision and a refinement of techniques that scarcely seemed possible when the researches that led to this result were undertaken. That speech transmitted in this manner sounds somewhat unnatural and that voices are not always recognizable should not be surprising. The remarkable thing is that it can be done at all.

Given these advantages of PCM/TDM, one would expect them to have dominated communication technology within years of their conception. Yet the earliest commercial use of T1 multiplexing occurred only in 1962, a good quarter of a century after Reeves had invented PCM. Translating an idea into a working system of high reliability is not a trivial task. Often it calls for many particular solutions to many particular problems. To take one example, the encoding of PCM samples into bits was encoding only from a logical perspective. It was not obvious at the time how the ones and zeros were to be represented on the line, even though the sampling theorem had established the shape of an ideal pulse. Should a zero be zero voltage or a negative pulse? Should a one be a full positive pulse or a combination of positive and negative half-pulses? The method of such a translation is called _line coding_.

There were of course many possibilities and engineers had to choose the best possible line coding. Particularly with the high line rates of TDM and the bundling of many such lines into a single cable, crosstalk could become a serious problem. The early trials of 1956 by Bell Labs engineers showed that crosstalk was indeed very high. They had used _unipolar_ line coding, which used only positive pulses for ones and nothing for zeros. A subsequent field trial in 1959 demonstrated minimal crosstalk using a new method that used pulses of alternating polarity for the ones. This came to be called _bipolar_ line coding. Thus, the solution to a single engineering problem took three long years from concept to field demonstration.
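The bipolar rule is simple enough to state in a few lines. The sketch below (Python) implements the alternating-polarity scheme, later known as Alternate Mark Inversion; one happy side effect of the alternation is that the line carries no net direct current:

```python
def bipolar_ami(bits):
    """Line-code bits as pulse levels: zeros stay at 0, ones alternate +1/-1."""
    out, polarity = [], +1
    for b in bits:
        if b:
            out.append(polarity)
            polarity = -polarity          # the next 'one' flips sign
        else:
            out.append(0)
    return out

print(bipolar_ami([1, 0, 1, 1, 0, 1]))    # [1, 0, -1, 1, 0, -1]
```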

When Reeves designed PCM in the late 1930s, what he had in mind was improved noise immunity on microwave links. These are wireless transmissions at frequencies in the order of gigahertz. At such high frequencies, enough bandwidth was available to allow the use of PCM. However, the bulk of telephony was still carried on cables, and PCM was not exactly the right match. The network was already heavily invested in analogue technology, and the transition to PCM was naturally delayed.

A more compelling reason for PCM/TDM's delay was that computers of the 1940s were primitive. The best technology of the day was the vacuum tube, which simply could not supply the computational power necessary to make PCM/TDM work. Only the military could afford the cost and complexity. In fact, SIGSALY stretched the day's state-of-the-art computing power. SIGSALY was a colossal 55-ton machine arranged in forty racks. It consumed a whopping 30 kW of power. The Second World War slowed the adoption and growth of telephony. All research of the early 1940s was directed towards the war effort. In the post-war period, efforts towards commercializing PCM/TDM proved difficult, particularly when FDM carrier technology became more efficient and cheaper. Among the early PCM inventions was a ten-inch vacuum tube that contained a perforated plate to encode speech samples into a sequence of on-off pulses. It was no doubt ingenious but cumbersome and impractical. Enabling technologies including transistors and integrated circuits were invented and commercialized years later (Chapter 8). Depending on one's point of view, one may say that PCM was far ahead of its time or that the enablers had a much-delayed delivery. It is therefore not surprising to learn that Reeves himself got recognition only in the mid-1960s. When it finally came, Reeves commented on his invention,

Pulse code modulation has been a child with a long infancy.... Twenty-five years after its invention, it can be said that pulse code modulation has little past as yet; the real interest is in its future.



**With PCM/TDM,** the world of communication was transformed once and forever. The world no longer belonged to analogue systems that used AM, FM, or FDM. PCM/TDM showed that communication was possible with just ones and zeros. Complexity of any level can be built from these simplest of elements. With this was born digital communication. This went hand in hand with the development of digital computers, which too relied on just ones and zeros (Chapter 8). Thus, communication and computing together propelled the glorious development of digital technology. Digital communication does not mean that the electrical transmission itself is in the form of ones and zeros. Line coding has shown us that the transmitted waveform is continuous; but on a logical level, it represents the message in digital form. Thus, we don't actually see discrete sharp pulses of uncontrolled bandwidth sent on a communication channel. Much of telecommunications, starting from the time of Chappe telegraphy, makes a clear separation of physical and logical, form and content.

Digital communication could have been born much earlier. In fact, Morse telegraphy was digital from the start because it used only dots and dashes. When telephony came into prominence, everyone took the obvious approach of Graham Bell's speech waveforms. At the time, engineers neither understood noise nor recognized its importance. Their research abandoned the simplicity of telegraphy for the sweet-sounding voices of their own. It was only when noise asserted its presence on long-distance lines that sweet voices became coarse, annoying, and at times noise itself. PCM changed all that and brought digital methods back onto the R&D agenda.

A common misconception is that communication in the beginning was analogue and that digital communication came later. Paradoxically, non-engineers also argue that communication has always been digital. Have we not always done our counting with digits since the earliest of times? A number such as 486 is composed of three digits. Shouldn't any system making use of numbers in this fashion be called digital?

Engineering often gives new meanings to common words and expressions. One might even say that engineers have hijacked many common words and used them suitably in their own domain. While this familiarity works in favour of understanding engineering concepts, at times it only confuses. Analogies go only so far, and original meanings often tend to cloud understanding. Thus, Microsoft Windows is not exactly windows in the common sense; yet it is a window to a whole new world of experience in a different space. A mouse is not a nocturnal dormouse; yet it is physiologically similar, with a compact body and a tail. Just as humans evolved from their ancestral apes and got rid of their tails, modern wireless mice too have found their ancestral tails unnecessary. Technology at times evolves in ways surprisingly similar to life; and it is not too far-fetched to stretch Darwinism to technology.

The word digital comes from the use of digits, meaning fingers or toes. This physiological origin is not surprising since ancient man used his digits to do counting. While the Mayans and Lincolnshire shepherds did adopt base-20 counting systems, the word evolved to mean any of the digits from 0 to 9 in conformance with the decimal numeric system. In the modern era, the word digital has acquired a new meaning whose origins are in engineering. The only digits that count are 0 and 1. Formally named _binary digits_, they are often shortened to _bits_. With this came the popularity of the binary numeric system. In this system, any number can be represented by a string of ones and zeros. For example, the number 73 can be written as 64 + 8 + 1 = 1·2⁶ + 0·2⁵ + 0·2⁴ + 1·2³ + 0·2² + 0·2¹ + 1·2⁰ = 1001001₂. One may wonder who actually invented such a binary representation of numbers. Was it Reeves, Rainey, or Baudot?
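The decomposition is entirely mechanical, as a two-line sketch in Python shows:

```python
n = 73
print(bin(n))                                               # '0b1001001'
print([2 ** i for i in range(6, -1, -1) if (n >> i) & 1])   # [64, 8, 1]
```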

The work of a historian is never done. Just when she thinks that the origin of something has been identified, she uncovers an older layer of dust that reveals a deeper secret. In fact, the binary numeric system is a lot older than we may think. English mathematician Thomas Harriot is credited with inventing such a system at the start of the seventeenth century. His work remained unknown until recently since he never published it. Then in 1670, a bishop by the name of Juan Caramuel Lobkowitz not only published the binary system but also showed how one could perform arithmetic operations with just ones and zeros. Unfortunately for Lobkowitz, no one took notice of his work and it faded into oblivion. Thus, the honour of inventing the binary system went to Gottfried W. Leibniz, whose work appeared about three decades after Lobkowitz's. Leibniz, a man of many talents and interests, was not only a mathematician but also a keen philosopher. Through his contact with a Jesuit missionary in China, he came across a binary representation that was possibly four thousand years old.

Leibniz became aware of the famous Chinese text _Yijing_ , attributed to the legendary Fu Xi, the father of Chinese writing. The _Yijing_ supposedly contained the binary system but historical records trace the origins only as far as the first century BC. In any case, Leibniz concluded that the Chinese had a complete understanding of the binary system. In fact, modern scholars do not agree on this point. The texts betray no mathematical operations such as addition or multiplication. The Chinese indeed had a binary system but it was more philosophical than numeric. Leibniz saw the philosophical perspective as well. One represented existence, unity, and God. Zero represented nothing, the starting point of all creation. The world in its entirety was a creation out of ones and zeros. The binary system thus represented complexity from simplicity, variety from economy. When English romantic poet William Blake wrote in the nineteenth century, "To see a world in a grain of sand,... and eternity in an hour," he gave words to the notion that every part has in it the essence of the whole. The bit embodies the infinite just as the atom embodies all matter. These philosophical qualities attracted Leibniz to the binary system.

Writing about it in 1703, Leibniz presented patterns in numbers when they were written as strings of ones and zeros. He saw an inherent order, symmetry, and harmony. One may even claim that Leibniz saw an essential beauty in the binary representation of numbers. Though the binary system is cumbersome for humans, Leibniz argued that it is perfectly suited for scientific enquiry. Leibniz found support for his philosophy in the Chinese. To the Chinese, all of creation was based on two principles—the _yin_ and the _yang_. Contraries made up this world—light and shadow, white and black, male and female, active and passive. This then was their basis for a binary treatment of all things in nature. From the combining of _yin_ and _yang_ arise the great _yang_, the small _yin_, the small _yang_, and the great _yin_. Taking three basic forms, we arrive at eight elements—heaven, earth, lake, mountain, fire, water, thunder, and wind. These elements arise from forms that can be grouped as heavenly (sun, moon, stars, space) or earthly (water, fire, ground, stone). What becomes obvious in this description is that all these counts are powers of two.

The Binary Nature of Chinese Philosophy

Though many of us may not care for Leibniz's philosophical view or that of the Chinese, it is certain that the binary system brings simplicity. Transmission of signals encoded as strings of ones and zeros brings simplicity to equipment design and improves system performance. Imagine a positive pulse signalled using +5 volts. In binary, it would take noise as high as ±2.5 volts to introduce an error. Instead, if we were to use a decimal system, the range of 5 volts would be divided up into ten parts and it would take noise only ±0.25 volts to introduce an error. It is far easier to decide between two alternatives than ten of them. The binary system of number representation is essential to PCM and all of digital technology.

Now that PCM established transmission in binary, engineers naturally got curious about the rate of transmission. Exactly how many bits can we transmit per second? What was its relation to such important measures as bandwidth, signal power, and noise? Engineers at Bell Labs who had been working on pulse modulations for telegraphy had already asked this question even before Reeves invented PCM. It is in this context that we once more encounter Harry Nyquist. Of course, since PCM had not yet been invented, Nyquist did not talk in terms of bits. Rather, he talked about symbols and the number of symbols used to form a coded character. Different systems might use different values for these design parameters. Nyquist was looking for a way to unify these differences, so that apples and oranges could be compared not as apples and oranges but as fruits. He borrowed a common English word and used it in a new context: _intelligence_.

In his classic paper of 1924, Nyquist defined the rate at which intelligence could be transmitted. The relationship was logarithmic with respect to the number of symbols and their encoding. He constructed an "ideal" code that was able to convey more intelligence per symbol. He then used this to analyse existing telegraph systems and found that Continental Morse code using two current values (symbols) needed 8.45 signalling elements per letter. This was poor compared to Nyquist's ideal of 3.63. Continental Morse code using three current values, a ternary system of signalling, was at 3.77, a figure a lot closer to the ideal. In Nyquist's time, such a ternary system was in use for submarine telegraphy.
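In modern notation, Nyquist's 1924 relation is usually rendered as (a sketch, with W the rate at which intelligence is transmitted, m the number of current values, and K a constant fixed by the signalling speed)

$$W = K \log m,$$

which is why a three-valued system squeezes more intelligence out of each signalling element than a two-valued one.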

The next advancement came in 1928 from an American engineer. Ralph Hartley was a Rhodes Scholar with a degree from Oxford University. In the pioneering years of radiotelephony he had made splendid contributions in electronics, and he matched this achievement during the years of World War I. Studying the transmission of intelligence, as his colleague at Bell Labs had termed it, Hartley came up with a significant finding about band-limited systems,

the total amount of information which may be transmitted over such a system is proportional to the product of the frequency-range which it transmits by the time during which it is available for the transmission.

To put it in simpler terms, the greater the signal bandwidth, the greater the information that can be conveyed; and the greater the time available for transmission, the greater the information. With this definition, Hartley for the first time introduced the word _information_.

These were the days when certain ideas that had been in nebulous forms took root in theory and began to mature. Ideas that had been in the background came to be well articulated and recognized. Before the 1928 seminal papers of Hartley and Nyquist, it was recognized rather informally that the rate at which pulses could be sent depended on bandwidth; but the notion was vague and had no mathematical backing. Hartley and Nyquist changed all that. Today this relationship is taken for granted as if it had always existed since the beginning of telegraphy. Yet what seems simple today in fact took a long time to enter mainstream engineering.

While the sampling theorem of Nyquist states that a signal should be sampled often enough to avoid aliasing, there is an interesting counterpart to it in relation to signalling. In his paper of 1928, Nyquist noted that if a system attempts to signal pulses at a rate of more than twice the available channel bandwidth, intersymbol interference (ISI) occurs. With the use of sinc pulses at a rate of exactly twice the channel bandwidth, ISI can be overcome. This is possible only because when any one pulse is at its peak, all other pulses pass through zero. Today this condition goes by the name _Nyquist criterion for zero ISI_. Hartley had assumed that ISI was a limiting condition for information rates. He had followed in the tradition of Thomson and Jenkin of the early transatlantic telegraph era. Now Nyquist established the conditions for zero ISI and debunked a long-standing myth. At least, this was the theory. Reducing it to practice proved more difficult.

With sinc pulses, even a slight timing error leads to ISI that does not converge. Engineers compromised by using non-ideal pulses at the expense of extra bandwidth. It is from here that raised cosine pulses have their origin. They are more tolerant of timing errors since they decay faster in time than sinc pulses. Alternatively, Nyquist suggested that it was possible to maintain the same bandwidth with non-ideal pulses but allow some ISI that could be corrected using a receiver technique called _equalization_. Equalization has some similarity to Harold Black's negative feedback amplifier: delayed samples of the received signal are fed back into the receiver chain to compensate for ISI.
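A hypothetical sketch (Python with numpy; the roll-off α = 0.5 is an arbitrary illustrative choice) shows the zero-ISI property in action: sampled at the symbol instants, a raised cosine pulse is 1 at its own instant and 0 at every neighbour's:

```python
import numpy as np

def raised_cosine(t, T=1.0, alpha=0.5):
    """Raised cosine pulse: zero at every nonzero multiple of the symbol time T."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * alpha * t / T) ** 2
    safe = np.where(np.abs(denom) < 1e-12, 1.0, denom)     # dodge the removable singularity
    shape = np.where(np.abs(denom) < 1e-12,
                     np.pi / 4.0,                          # the limit at t = +/- T/(2*alpha)
                     np.cos(np.pi * alpha * t / T) / safe)
    return np.sinc(t / T) * shape                          # np.sinc(x) is sin(pi*x)/(pi*x)

# Nyquist's zero-ISI criterion: neighbouring pulses vanish at the sampling instants.
print(np.round(raised_cosine(np.arange(-3, 4)), 6))        # [0 0 0 1 0 0 0]
```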

Though both Nyquist and Hartley answered the question about symbol rate, they had given it different names. In the end, Hartley's information prevailed over Nyquist's intelligence, never mind that both words were distortions of their common meanings. Clarification of what exactly was meant by information had to wait for later decades. For the moment, engineers were happy with their definitions and no one questioned them. Like Nyquist, Hartley had derived a similar logarithmic equation relating information to symbols. While Nyquist had presented his logarithmic measure without much justification, Hartley had related it to intersymbol interference and transient signal decay on the line. The pioneering ideas of Nyquist and Hartley remained dormant for two decades before they were taken up by a brilliant engineer who would one day be called the father of information theory.

This future father was born in 1916, the son of Claude Sr and Mabel Shannon. Claude E. Shannon showed from a young age his twin interests in building things and in abstract thinking. This marriage of opposites was rare and marked Shannon's uniqueness. It was therefore natural for him to graduate from the University of Michigan with a double degree—one in electrical engineering and the other in mathematics. The same year, he landed his first job as a research assistant at MIT. He was then only twenty, and within the next four years he completed both his master's and his PhD dissertations. While the latter was not widely known, the former was a work of the first class (Chapter 8). This alone would have granted him immortal fame in engineering circles, but something else happened. Sometime in 1939, Shannon started looking into the problems of digital communication.

Shannon picked up the threads where Nyquist and Hartley had left them. He was particularly inspired by Hartley's idea of information and how it related to symbol encoding, bandwidth, and time. Though Shannon saw many avenues for improving these results, he never got a chance to work on them full-time. War loomed on the horizon and there was every possibility that America would be dragged into it. It was under these circumstances that Shannon joined Bell Labs in the summer of 1941. During the war years, he contributed in two areas. The first was cryptography, with direct involvement in the X System in which PCM played a pivotal role. Here too Shannon produced a masterpiece that would become public only after the war ended. The second was the problem of controlling anti-aircraft guns, which on the surface looks to be a problem unrelated to the general work of Bell Labs. What on earth does gunfire have to do with telephony?

Engineering and mathematics have one thing in common—the ability to see through specifics and derive powerful generalizations. The problem in anti-aircraft gunfire control was to be able to predict the future position of an aircraft based on past data. Engineers saw that past positions are usually imprecise. Noise corrupts measurements. To be able to remove this noise and obtain clean usable data was like a telephone receiver picking out signal from the noise. Signal analysis was in itself an abstract method that could be applied equally well to telephony and gunfire control.

This was right in the domain of Norbert Wiener, a professor of mathematics at MIT. A recognized child prodigy, Wiener had studied such diverse subjects as biology, mathematics, and philosophy. He obtained his doctorate in philosophy from Harvard at the age of nineteen, but it was his postdoctoral years in Cambridge under the guidance of G. H. Hardy that made him one of the best mathematicians of the century. Hardy was unique among professors. Not only did he introduce his students to the beauty of mathematics, but he also took a keen interest in promoting exceptional talent. Just as the Indian mathematician Ramanujan benefited from Hardy's guidance, so did Wiener. Where Wiener differed from Hardy was in his broad perspective, his interest in applying the subject to science and engineering. Thus, we find in Wiener a parallel with Shannon.

Among Wiener's earlier works were functional integration applied to Brownian Motion and generalized harmonic analysis, which can be seen as an extension of Fourier analysis. Wiener's interest in both stochastic processes and harmonic analysis was a perfect match for the gunfire control problem. Wiener's idea was to filter out the noise, smooth out the data, and thus derive predictions. That signal could be modelled as a stochastic process was Wiener's key idea. Previously, it was only noise that had been treated as stochastic, not signal. His classified report was circulated in 1942. It came with a yellow cover. Everyone who read it knew that Wiener had done something significant but very few could explain precisely what he had done. Rich in mathematics, the report was difficult for engineers to understand. It was soon canonized as the _Yellow Peril_.

Shannon read Wiener's work and found similarity to what he himself had been doing. Shannon too had been looking at message sources probabilistically. While Wiener had looked at continuous sources, Shannon started with discrete sources, which he later extended to continuous ones. Through the war years, Shannon worked on and off on his ideas, after office hours, at night, at home. Therefore when his classified work on cryptography appeared in 1945, it contained novel ideas that would later come under the all-encompassing heading of _information theory_.

Meanwhile on another continent, unknown to researchers in the West, Russian mathematician Andrei Kolmogorov had made significant advances in probability theory. His masterpiece, _Grundbegriffe der Wahrscheinlichkeitsrechnung_ (Foundations of the Theory of Probability), was published in 1933 and established the place of probability firmly through mathematical formalism. Probability and statistics, which for long had been relegated to secondary importance, now occupied a place within the inner sanctum of mathematics. Modern theory of probability owes a great deal to Kolmogorov, though some may see his contributions merely as a synthesis of what had gone before. In later years, Kolmogorov did analysis on _stationary time-series data_ , that is, data that varies with time but whose statistical nature does not vary with time. This statistical constancy is what engineers call _stationarity_. This was similar to the gunfire control problem. The task of estimating a signal from noisy observations is done by what we today call the _Kolmogorov-Wiener filter_.

This revival of probability within mathematics was partly due to its successful application, first to thermodynamics and later in the 1920s to quantum mechanics. German physicist Werner Heisenberg, through his famous _Uncertainty Principle_ , gave probability an essential role in quantum mechanics. The world of atoms and quantum theories excites our curiosity simply because we find it alien to our own world of everyday experiences. Heisenberg himself commented that language is inadequate to describe the atomic world but mathematics is perfectly suited to do this. Mathematics has no limitations with regard to expressive power. According to the Uncertainty Principle, it is impossible to be certain of an electron's position and its momentum at the same time. There is uncertainty associated with each of these measures and it is probability that aids in understanding this uncertainty.

Like Kolmogorov's, Vladimir Kotelnikov's 1933 contribution to the sampling theorem came years before Shannon's own publication on the subject. For his doctoral dissertation of 1947, Kotelnikov applied statistics to signal detection in the presence of noise. His contributions to modern communications are immense, though much of the impact remained within Russian borders and reached the West only years later.

Many of these developments happened unknown to Shannon whose inspiration came from Nyquist, Hartley, Reeves, and others at Bell Labs working on diverse aspects of technology. There was radio. There was telephony. There was telegraphy. There was PCM. There was cryptography. There was Wiener's prediction theory. Shannon saw in this diversity the need for a unifying concept. Ultimately, it was all about communication, getting messages from one point to another. Neither the medium nor the form of message transmission mattered all that much. These were secondary considerations. Shannon wanted to get to first principles.

Any cursory reading of technical journals and proceedings will tell us that most research is evolutionary. Research papers start with principles established by others. Then follow improvements, alternatives, verifications, or challenges. We get the impression of a community of the world's best scientific minds collaborating, not competing. Disputes over priority are really in the minority. In this atmosphere of sharing and collaboration grew many notable organizations—the Institute of Radio Engineers (IRE) and the American Institute of Electrical Engineers (AIEE), to name a couple. Perhaps the world's oldest scientific institution that continues to exist to this day is the Royal Society of London, founded in 1660. But once in a while, a revolution happens.

While ideas about information since Hartley's time had been circulating loosely among scientific circles, it was Shannon who pinned them to the board and gave precise definitions. It was Shannon who picked up the scattered threads, trimmed their frayed ends, threw out the dross, created missing ones, and wove them all together. He could do this because he could see the tapestry, where others had seen only disconnected threads. He could see all components of the communication system and how they related to one another. Unconsciously following in the tradition of Newton, his theories were not published in a series of papers but _ex cathedra_. Shannon's information theory, which had taken shape all through the war years, did not appear in the public domain until 1948. Under the title "A Mathematical Theory of Communication," it ran to nearly eighty pages and appeared in two issues of the _Bell System Technical Journal_ , of July and October. With 28 sections, 23 theorems, and 7 appendices, it was nearly a book, and it became the de facto manual for all communication theorists.

That Shannon's theory was mathematical was in no doubt. The first task was to define information. In the world of mathematics, we can define it in any way we want so long as it is consistent and practical. Shannon started with three axioms and stated that the only function that satisfied these axioms was the logarithm. Logarithm was also practical since every time an extra signalling symbol was added, it increased the number of possibilities exponentially. For example, when three binary symbols were used, there were 2^3 = 8 possibilities but when four were used, this doubled to sixteen. It made more sense to define information as the logarithm of these possibilities, that is, proportional to the number of signalling symbols, since this corresponded to system resources of time and bandwidth. Moreover, since the simplest way of looking at any information source was in terms of two values per signalling symbol, such as a one or a zero, the logarithmic measure was in base two. This was how Shannon defined information. It was a new measure that needed a new unit. It was named _binary digit_ or _bit_. This unit appeared for the first time in Shannon's 1948 paper, as a new meaning of an old English word of German origin.

Today we use this word so often that we rarely pause to seek its origins. Eight bits make a _byte_. An image file may be of the order of thousands of bytes called _kilobytes_ (kB). An MP3 file may be of the order of a few million bytes called _megabytes_ (MB). The size of a hard disk may be many thousands of millions of bytes called _gigabytes_ (GB). In terms of rates at which these bits are transferred, one talks of kilobits per second (kbps), kilobytes per second (kBps), megabits per second (Mbps), or gigabits per second (Gbps).

Talking of bits and bytes in this manner is not very interesting because it is an absolute measure. Shannon realized that the key is to analyse information sources themselves. He attempted to understand the source first and later apply that knowledge to message representation and transmission. Understanding the source meant studying source probabilities and states. This notion had completely escaped Hartley, who had looked at symbols straightaway without looking at the source. Information had to be related to sources and these had to be modelled probabilistically. Take, for example, the toss of a coin. It has two equal possibilities of either head or tail. It is therefore capable of producing a single bit of information. The roll of a dice has six equal possibilities and we require three bits to signal the outcome. Clearly, three bits can signify eight outcomes, not just six. So the information of a dice throw was somewhere between two and three bits. This is where Shannon's logarithmic measure came into play. The information of a dice throw is exactly log2(6) = 2.585 bits per outcome. What would happen if the dice were biased, so that most of the time only four outcomes arise and the other two are very rare, say only 1% each? In this case, we expect the information from the biased dice to be a lot closer to two bits, two bits being sufficient to represent four outcomes. Thus, Shannon introduced source probabilities in the definition of information.
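
In code, Shannon's measure reduces to a one-line formula. Here is a minimal sketch; the biased-dice probabilities of 24.5% and 1% are illustrative values consistent with the text above, not figures from Shannon's paper:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits per outcome: H = -sum(p * log2(p))."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                # fair coin: exactly 1 bit
print(entropy([1/6] * 6))                 # fair dice: log2(6) = 2.585 bits
print(entropy([0.245] * 4 + [0.01] * 2))  # biased dice: about 2.12 bits, close to 2
```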

Now the question naturally arose in Shannon's mind whether source probabilities were fixed. In the case of a dice throw, the probabilities were clearly constant. One throw did not affect the outcome of future throws. Such a system is what engineers call _memoryless_. Each event is independent of preceding events. In the real world, we are not usually communicating coin tosses or dice throws. More commonly, we express ourselves in languages of well-defined structures. Every letter and every word has a dependence on those surrounding it. For example, in the sentence, "Today is Wednes___," it is easy to guess the next three letters that complete it. Everyday language is not memoryless. The probabilities of occurrence of letters are always changing. Not only that, it is easy to predict to some extent what may come next given what has already been read or heard. If we are able to predict something with certainty before it actually occurs, then even when it occurs later, it does not surprise us. Anything that does not surprise gives no new information.

Shannon's measure of information was a measure of uncertainty. If things are always certain then there is no information. If a biased coin always shows heads, it brings no information. Not even that single bit is necessary since the outcome is already known. This is why an unbiased dice has higher information content than a biased one: the former has more uncertainty. While Nyquist had used the letter _W_ to signify the transmission rate of intelligence, Hartley for some reason used the letter _H_ for information. This choice was particularly prescient of what happened in Shannon's own paper. Shannon not only used the same _H_ , but also drew an analogy to a science that apparently had no affinity to information theory—the science of thermodynamics.

In the late nineteenth century, it was Maxwell and Boltzmann who had redefined thermodynamics from a statistical perspective. In particular, the Second Law of Thermodynamics, which came to represent not just its subject matter but also the fundamental workings of the universe, was shown by Boltzmann to be not absolute but only statistical. In the process, Boltzmann had talked about uncertainty and disorder. He quantified these in terms of _entropy_ , a term that Clausius had introduced earlier through classical thermodynamics. Josiah Gibbs later extended this measure of entropy. Entropy was simply energy that was unavailable. The First Law showed that energy could never be destroyed and the Second Law implied that it could become unavailable. The letter _S_ came to represent the entropy of thermodynamics.

Even in its own time, the entropy of thermodynamics caused confusion. The implications of the Second Law arrived at through statistical mechanics were so unsettling that entropy became the scapegoat of all debates. The controversy between classical and statistical thermodynamics gave birth to such fantastic creatures as _Maxwell's Demon_. The Demon showed that through his supreme intelligence, sharp vision, and nimble fingers he could see individual molecules, separate fast ones from slower ones, and thereby overcome the Second Law. Of course, the Demon was only a thought experiment and Maxwell used it merely to show that, in reality, no demon exists and the Second Law stands valid. In a gaseous mixture in thermodynamic equilibrium, one cannot tell slower molecules from faster ones. This is uncertainty. Though energy differences exist, useful work cannot be got from this system. In other words, energy is unavailable. This is entropy. Uncertainty and entropy are simply different ways of looking at the system.

Shannon saw the relevance to information sources and gave information another name, entropy. This was perhaps unfortunate because the use of the term stirred up old debates. People started to stretch the analogy between thermodynamics and information theory, so much so that it did more harm than good. For example, information was used to explain the actions of Maxwell's Demon. The Demon gathered information by observation and this meant consumption of _negative entropy_. Negative entropy, represented by the letter _H_ , had been central to Boltzmann's controversial _H-theorem_ , which was simply another way of stating the Second Law. Shannon's own motivation for using the symbol _H_ for information entropy came from the H-theorem. The only problem was that in Shannon's definition there was a negative sign.

Entropy was in itself a confusing concept and now people started to talk about negative entropy. Perhaps this explained the Second Law better but it did not benefit information theorists. Mathematician John von Neumann, who in the 1920s had reframed Gibbs entropy for quantum mechanics, supposedly supported Shannon in his choice of the word entropy. Von Neumann argued in jest that Shannon would win every argument simply because no one really understood entropy. Shannon quickly followed up the definition of entropy by explaining its relevance,

The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its _relative entropy_. This is the maximum compression possible when we encode into the same alphabet. One minus the relative entropy is the _redundancy_.

Entropy is a measure of how many bits on average one requires to transmit the source symbols. It is a statistical measure. Some symbols may use a lot of bits but on average, the source requires only what its entropy specifies. If one uses more than this number of bits, then there is redundancy in the transmission. We are wasting bandwidth. There is potential room for compressing the transmission to fewer bits. Entropy essentially defines a fundamental limit. One cannot compress the source messages to fewer bits than the entropy specifies. What Sadi Carnot had once done for steam engines, Shannon now did for data compression. Such fundamental limits are the pillars of scientific theories. They serve as ultimate engineering goals. Aiming for anything better is as futile as squaring the circle.
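
Shannon's two definitions above translate directly into a pair of ratios. A small sketch with a hypothetical four-symbol source:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical four-symbol source
H = entropy(probs)                  # 1.75 bits per symbol
H_max = log2(len(probs))            # 2 bits if all four symbols were equiprobable
print(H / H_max)                    # relative entropy: 0.875
print(1 - H / H_max)                # redundancy: 0.125, the room for compression
```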

It is due to Shannon, Wiener, and Kotelnikov that communication became statistical, just as in earlier times thermodynamics had become statistical. The foundations of entropy and redundancy are in probability and statistics. There are many ways to understand entropy and redundancy. Suppose one is asked to produce a completely wet bath towel with the constraint that it should also be the lightest. In this case, we would wring out every possible drop of water from it but still not remove all of it because we want to preserve the essential quality of wetness. Every thread of the towel is wet but we can't squeeze out a single drop more. Such a towel is at its entropy. All redundancy has been removed.

Shannon himself talked of redundancy based on statistical properties within the source. A source in which all symbols are equiprobable has no redundancy. This was certainly true for an unbiased dice but not so for languages. Taking the English language as an example, he showed that random choices of letters and words, based on statistical analysis of the language, resulted in sentences that looked a lot like English. For example, when letters were randomly selected with dependence on the previous two letters, the following resulted,

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

The above sentence does not make sense but it contains a few correct words and others that suggest corrections. The prepositions also suggest a grammatical structure. English has lots of redundancy because of its structure and its grammatical rules. This is apparent because _evn whn mny letrs ar misng we cn oftn mke out wat tey rpresnt_. Some letter combinations in English, such as _th_ and _he_ , occur more frequently than _xp_. If the source is taken to contain 27 symbols (26 letters plus space), English has a maximum entropy of log2(27) = 4.755 bits per symbol. In reality, most of these bits do not convey information due to redundancy. It is estimated that the entropy of English is only about a single bit per symbol.
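
A rough way to see the gap is to estimate entropy from single-letter frequencies alone. Even this crude count, over an arbitrary sample sentence, falls below the 4.755-bit maximum; modelling digrams, words, and grammar pushes the figure further down towards one bit:

```python
from math import log2
from collections import Counter

text = "it is estimated that the entropy of english is only about one bit"
counts = Counter(text)
n = len(text)
# empirical entropy of the single-letter (and space) frequencies
H = -sum(c / n * log2(c / n) for c in counts.values())
print(log2(27), H)   # 4.755-bit maximum vs. a lower single-letter estimate
```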

Another classic explanation says that entropy is the average number of questions one would need to ask to identify a specific source message from a set of possible ones. Let's say a deck of playing cards (without the jokers) is used as the source. A card is picked at random and we are required to guess it. If we are really lucky, we may get it right with a single try. If we are really unlucky, we would need 51 tries. Let's say that this game is played thousands of times. We will find that on average, we would require 26.5 tries. In terms of information measure, this is 26.5 bits. Can we do any better? What is the entropy of this deck of cards?

Shannon's formula says that the entropy is log2(52) = 5.700 bits. Theoretically, we require only 5.7 questions on average to guess the right card, not 26.5 questions. Suppose we adopt a different strategy. Instead of running through the deck one card at a time, let us partition the message space into distinct sets—blacks or reds; hearts, diamonds, spades, or clubs; ≤ 6 or otherwise; and so on. Now we ask intelligent questions by this method of divide and conquer. Is it a red card? Yes. Is it a diamond? No. Is it ≤ 6? Yes. Is it ≤ 3? No. Is it four of hearts? Yes. In this case, we have managed to do it in just five questions, lower than the entropy. In most other cases, we require six questions. On average, it can be shown that we require 5.769 bits. This is significantly better than our old strategy that made us ask 26.5 questions on average and 51 questions in the worst case. Thus, we have certainly approached the entropy but there is still room to improve on our strategy.

Approaching the Entropy of a Deck of Cards

By asking intelligent questions, we require on average (5·3 + 6·10) · 4/52 = 5.769 bits per symbol. This is close to the source entropy of 5.700 bits per symbol.
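
The figure's arithmetic can be reproduced by letting every question split the remaining candidates as evenly as possible. A sketch (the recursive helper is ours, not Shannon's notation):

```python
from math import log2

def avg_questions(n):
    """Average yes/no questions to isolate one of n equally likely items,
    splitting the candidates as evenly as possible at every step."""
    if n == 1:
        return 0.0
    half = n // 2
    return 1 + (half * avg_questions(half)
                + (n - half) * avg_questions(n - half)) / n

print(avg_questions(52))   # 5.769 questions on average
print(log2(52))            # 5.700 bits: the entropy we are approaching
```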

Shannon argued that rather than guess the card from a single experiment, it makes sense to perform lots of experiments and make a combined guess of all experiments. It sounds counter-intuitive and complex. This is perhaps why no one had thought of it earlier. Only when Shannon put it down as equations and mathematical derivations did people understand what he really meant. In a way, it is the advantage of economies of scale in the world of probability. In exceptional circumstances, it is possible to encode source messages in bits equal to the entropy. For example, in a test cricket match, assume that there can be only four outcomes—team A wins, team B wins, match is a draw, or match is abandoned due to rain. The respective probabilities are {1/2, 1/4, 1/8, 1/8} and the bit encodings are {0, 10, 110, 111}. The average number of bits to perform this encoding can be obtained by a useful technique called a _weighted average_ , the weights in this case being the probabilities. Thus, the average number of bits is 1·1/2 + 2·1/4 + 3·1/8 + 3·1/8 = 1.75 bits per outcome. This is exactly the source entropy, which is given by -1/2·log2(1/2) - 1/4·log2(1/4) - 1/8·log2(1/8) - 1/8·log2(1/8) = 1.75 bits per outcome.
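
The cricket arithmetic is easy to check in code. Because the probabilities are inverse powers of two, the weighted average code length lands exactly on the entropy:

```python
from math import log2

probs = {"A wins": 1/2, "B wins": 1/4, "draw": 1/8, "abandoned": 1/8}
codes = {"A wins": "0", "B wins": "10", "draw": "110", "abandoned": "111"}

avg_len = sum(p * len(codes[o]) for o, p in probs.items())  # weighted average
H = -sum(p * log2(p) for p in probs.values())               # source entropy
print(avg_len, H)   # both are exactly 1.75 bits per outcome
```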

However, such well-ordered distributions of source probabilities, as inverse powers of two, are uncommon. In the card game, all cards have equal probability. In such situations, Shannon showed that encodings (or guesses) should be based on sequences of source symbols, not just symbols in isolation. There is some wastage in bit encoding if symbols are considered in isolation. With long sequences approaching infinite length, encoding asymptotically approaches entropy. Even when source symbols are not equally probable, advantage is gained because in such long sequences, some combinations of outcomes are typical. _Typical sequences_ are those that occur with probabilities within a certain expected interval. Most probable outcomes are often not typical. Encoding typical sequences with shorter codewords brings us closer to the entropy. This is Shannon's famous _noiseless coding theorem_ , also called the _source coding theorem_. The penalty is that we have to wait for a long sequence of experiments before we can start guessing or encoding the results. This delay means that in communication systems, Shannon's idea is not realizable, although from a theoretical perspective it is insightful.

Shannon did not stop here. The noiseless coding theorem ignores one thing that all forms of message transmissions have in common: noise. While Nyquist and Hartley had looked at information and symbols, neither had related them to noise. This is ironical since Nyquist himself had done theoretical analysis of thermal noise the same year (1928). Shannon saw that information had to be related directly to noise, which was fundamental to all communication systems. He gave noise the prominence it deserved in system engineering. Noise intruded upon signals before they reached the receiver. Everyone knew this but Shannon put this down in his paper in a form that engineers call a _block diagram_. In such a diagram, the system is analysed in terms of its components. Even before Shannon, engineers had realized the power of abstraction in such diagrams. These diagrams enabled system-level analysis without requiring details of amplifiers, impedance matching, or line capacitances. It really didn't matter if noise was from the environment or at the input of the receiver. That noise was modelled as adding itself to the signal was the real significance. Shannon's block diagram of a communication system is now canonical. Some have even called it the information theorist's "coat-of-arms."

Shannon's Canonical Diagram for a Communication System

Source: (Shannon 1948a, p. 381). Reprinted with permission of Alcatel-Lucent USA Inc.

While Wiener had tried to filter out noise from the signal, Shannon overcame noise by the process of encoding. Even in Shannon's time, encoding was an old concept, in fact as old as Chappe telegraphy. Though encoding made the signal resilient to noise, it also slowed down the rate of transmission. Say the toss of a coin is to be communicated. Instead of sending {0, 1} we might send {000, 111}. The basic symbol is repeated so that even if one bit is in error, by applying a majority rule, the receiver can figure out messages correctly. Reception of 001, 100, or 010 will be interpreted as zero. If only one of these three bits is in error, the system can correct it. If two or three bits are in error, the message gets corrupted. It became obvious that if we want to reduce error probability below a certain threshold, we need to increase the repetition. While this is good from the perspective of overcoming noise and approaching error-free communication, encoding by repetition reduces the rate of communication. In the above example, if the channel can handle transmission at 300 kbps, the effective rate for the source is only 100 kbps. Clearly, slower communication was not desirable. This is why not many had looked further into encoding. What came next was Shannon's most spectacular result.
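
A minimal sketch of this three-fold repetition scheme, with the majority rule correcting a single flipped bit per triplet:

```python
def encode(bits):
    """Repeat every bit three times: '10' -> '111000'. Rate drops to 1/3."""
    return "".join(b * 3 for b in bits)

def decode(coded):
    """Majority vote over each received triplet."""
    return "".join("1" if coded[i:i + 3].count("1") >= 2 else "0"
                   for i in range(0, len(coded), 3))

sent = encode("10")       # "111000"
received = "110000"       # noise flips one bit in the first triplet
print(decode(received))   # "10": the single error is corrected
```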

Shannon proved that it was not necessary that transmission rates be reduced to near zero to achieve error-free communication. Encoding by repetition was really a dumb way to do it. If we adopt more intelligent encoding schemes, it is possible to overcome noise without sacrificing transmission rate. In fact, transmission rate can approach channel capacity and still make communication possible with arbitrarily low probability of error. This was as revolutionary as Einstein saying that travelling close to the speed of light slows down time and we stop aging. Just as physicists at the turn of the century found it difficult to accept Einstein's theory of relativity, so did communication engineers about Shannon's _noisy coding theorem_ , also called the _channel coding theorem_ or the _fundamental theorem_. Shannon knew this would be the case and to convince the sceptics, he gave an example of how such a code could be constructed.

Since source symbols have associated probabilities, these can be used to design better codes. Transmission of a message was nothing more than selection from a possible set of messages with an a priori probability. Shannon once more looked at long typical sequences. Instead of encoding each source symbol separately, we need to encode long typical sequences. The reason this works is simply because each typical sequence is separated from another by a "large distance," so that even with errors in some bits, the decoder in the receiver can figure out the correct transmitted sequence and hence the correct source message. In a typical sequence of _n_ source symbols, we can visualize the encoding as being effected in an _n_ -dimensional space (Chapter 6). Probabilistically, noise may introduce errors in a few dimensions but almost never in so many dimensions as to cause a decoding error. The best part is that the codes can be chosen at random. The only criteria are that sequences must be really long and transmission rate must be below channel capacity. Specifically, encoded sequences can be selected at random from the 2^nH(X) typical sequences, where H(X) is the source entropy and X is the random source symbol.

All that now remained to be defined was channel capacity. In channel capacity, all the concepts fell into place beautifully. Source entropy was related to the entropy of symbols received at the other end of the channel. In a noisy channel, uncertainty about the source alphabet was reduced at the receiver based on symbols received. The difference is today called the _mutual information of the channel_ or _information processed by the channel_. Since uncertainty means information, the goal was to maximize this mutual information; this maximum is defined as the channel capacity. Shannon himself saw encoding as a process of matching source characteristics to the channel for maximum transmission rate. This matching process was not very different from the impedance matching of circuit theory. Based on entropies of sent and received symbols, Shannon gave an intuitive explanation for transmission rates and hence channel capacity,

as the amount of information sent less the uncertainty of what was sent.... the amount received less the part of this which is due to noise.... the sum of the two amounts less the joint entropy and therefore in a sense is the number of bits per second common to the two.

The final embellishment to the fundamental theorem came through a direct relationship between channel capacity and noise. Capacity was proportional to bandwidth and grew with the logarithm of the SNR. This equation established the necessary trade-off between signal power and bandwidth. It gave engineers the ultimate goal for the future. Given a certain SNR and bandwidth, they aimed to achieve the capacity given by Shannon. The way to do that was through intelligent coding. In the 1940s, practical systems operated nowhere near channel capacity. Shannon's equation suggested that information theory had just been born and there was still a long way to maturity. Reaching channel capacity remained the ultimate goal all through the twentieth century and beyond.
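
The relationship is the now-famous formula C = W·log2(1 + S/N). A quick evaluation (the telephone-line numbers below are illustrative assumptions, not from Shannon's paper):

```python
from math import log2

def capacity(bandwidth_hz, snr_db):
    """Shannon-Hartley channel capacity in bits per second."""
    snr = 10 ** (snr_db / 10)        # convert dB to a linear power ratio
    return bandwidth_hz * log2(1 + snr)

# roughly telephone-line numbers: 3 kHz of bandwidth at 30 dB SNR
print(capacity(3000, 30))   # about 30,000 bits per second
```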

Some mathematicians, including Joseph L. Doob, commented that Shannon's proofs were not rigorous enough, but Shannon was addressing mostly engineers. Though his theory was mathematical, he had great physical intuition about how things worked. His block diagrams, fan diagrams, intuitive explanations, examples, and relevant discussions made the subject matter readable. Detailed proofs, when required, were relegated to the appendices. Moreover, Shannon hated writing. When he had solved a problem, he immediately moved on to other problems of interest. If at all he took the trouble to publish, he dictated to his wife Betty; the result was typed and sent off to the printing press without further editing. Betty, herself a mathematician at Bell Labs, had met Shannon at work. They had dated over lunch. They got married in the spring of 1949, when his classic 1948 paper was already well known and his reputation established.

In later years, though Shannon continued to work on information theory, his achievements afforded him sufficient freedom to do what he wanted. He programmed a mechanical mouse that could solve mazes. He programmed a computer to play chess. He analysed the art of juggling mathematically. He was often seen in office corridors riding a unicycle while juggling at the same time. No one at Bell Labs questioned it. Shannon's preference for pure research with possibly no foreseeable output is almost impossible today. After fifteen productive years in Bell Labs, he returned to MIT in 1956 and took to teaching. He was enthusiastic about teaching. His students had the rare privilege of learning the subject from one who had founded it.

As information theory got popular, it came to be applied in diverse areas of study—biology, linguistics, financial market analysis, thermodynamics, and quantum computing. Biology, which had long been about chemical reactions, cellular processes, or enzymes, now added a new line of enquiry, that of information. Biological structures are complex and the question of passing them from one generation to the next, or one cell to another, was really passing information through biological processes. Ukrainian researcher George Gamow went so far as to reduce the analysis of amino acids and proteins to a purely combinatorial problem. In the field of quantum mechanics, Heisenberg's Uncertainty Principle could be quantified if necessary in terms of information theory's entropy.

In a publication to celebrate fifty years of the theory, Sergio Verdú listed its influence on eight scientific disciplines spanning more than forty sub-disciplines. This impressive list is perhaps only partial. This popularity can be traced to debates over entropy and to extrapolation of the word information beyond the boundaries defined by Shannon. Message, information, bits, uncertainty, redundancy—these are words of everyday usage. Information theory had simply reused them in a new context; but people found it easy to extend the theory to other disciplines because language served as a convenient bridge. What and how we understand, communicate, and interpret are strongly, though not exclusively, based on language. Shannon cautioned researchers about this new fad of jumping onto the bandwagon of information theory. Wiener agreed and urged that information theory should never lose the essence of its statistical origins.

While some embraced information theory in an attempt to bring mathematical structure to their own scientific disciplines, others saw its faults. We communicate to be understood. Information we exchange carries meaning. Is it then sensible to quantify communication and meaning using just bits? And why are all bits equal? Clearly, some words are more important than others. Between the phrases, "please do not sit" and "please do sit" it is clear that loss of a single important word can change the entire meaning. This suggested to many that different parts of a sentence carry different amounts of information. The theory does not consider non-verbal cues, which are so important in communication. Hugh Prather had once said, "If a man takes off his sunglasses, I can hear him better." Information theory had said nothing about such realities of communication. It had failed by focusing solely on abstract symbols, probabilities, and encodings.

Shannon had made it clear on the first page of his 1948 paper that meaning was irrelevant and communication from an engineering perspective was not about human cognition but about transfer of messages,

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have _meaning_ ; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem.

Part of the confusion may be due to a 1949 publication in _Scientific American_ by Warren Weaver. With a degree in civil engineering and mathematics, and interest in mathematical physics, Weaver had published in his younger days a book on electromagnetism. This had become the standard textbook for graduate students for many years. In 1932, he moved from the world of teaching and research to become director for the natural sciences of the Rockefeller Foundation. The focus of the foundation was in the life sciences. When war came, it was under Weaver's leadership that the anti-aircraft gunfire control project operated. It was under such circumstances that Weaver met Shannon. When Shannon's paper came out, Weaver intended to introduce the work to a wider audience.

He removed the difficult mathematics, simplified the concepts, and drew analogies from everyday experiences of communication. He equated the information source to the human brain, the transmitter to the vocal cords, the signal to pressure variations in the air, the channel to the air, the receiver to the ear, and finally the destination to the listener's brain. It was an innocuous representation meant only to simplify Shannon's original block diagram, which might have appeared alien to common readers. This analogy did not bother information theorists, who knew all along what they were doing; but it was a disaster for psychologists and linguists. For years, they had been following the Aristotelian model consisting of speaker, listener, and message. Suddenly information theory threw encoding, decoding, channel, and noise into the mix.

On one hand, psychologists and linguists wanted to benefit from information theory; on the other hand, they got themselves into a knot over the interpretation of the communication process. It was difficult to reconcile the two viewpoints. Information theory suggested that all of human communication was simply mechanistic, executed by mechanical apparatus. There was no role for the human mind. There was nothing to understand, merely incoming bits to consume. The nature of information did not matter. What mattered was how much and how fast. To the linguists, information theory was not about communication at all. It was only about transportation of bits. It seemed absurd that one could talk about communication without acknowledging the essential role of meaning. Charles Kettering summarized the argument in particularly poignant words, "You can send a message around the world in less than a second but it takes years to get it through the human skull."

What this debate brings out is that one must be cautious in applying information theory. The use of the word information itself has a specific connotation within the theory. When we try to do more with such common words, we are asking for trouble. Information theory definitely does not concern itself with meaning. Meaning cannot be obtained from raw bits, which is why machine translators are complex to build, which is why human translators and interpreters are still employed at the United Nations. Meaning is in itself a complex topic and firmly belongs to the linguists. Associating meaning to words and phrases relies on cultural, sociological, and linguistic influences. Even linguists squabble amongst themselves about how language conveys meaning. Suppose one were to look up any word in the comprehensive Oxford English Dictionary. The word would be explained in terms of other words. To understand the explanation, one may need to refer to those words, which in turn refer to other words. When explanation of words is based on only words, there is circularity. Words alone do not serve to bring out meaning. Ideas and associations are learnt through a complex process with support of human memory and experience. Applying information theory to meaning is not trivial and perhaps not recommended.

By mid-twentieth century, PCM had kick-started digital technology. Statistical communication theory had laid the necessary foundations. Shannon had pointed out the ultimate goals for communication engineers. He had, however, not told them how to achieve those goals in a practical manner. His typical sequences needed complex processing and introduced long delays. Nonetheless, Shannon's theories were reason enough to keep engineers busy for the rest of the century. There was a revolution waiting to happen and it was digital.

# 0101 All in a Few Words

**One day in** 1951, Robert M. Fano gave his students at MIT a choice. They could choose to write the final exam or submit a term paper. The subject matter was information theory, formally inaugurated by Claude Shannon just three years earlier. There was still much to be discovered and many more practical problems to be solved for any enterprising engineer. Should a student choose the latter option, he was required to come up with a better method of encoding the source than the existing ones. Most students took the easier option of writing the final exam. As early as 1948, Fano himself had proposed a method of encoding that brought the number of required bits closer to the source entropy. Shannon too had come up with a method based on cumulative probabilities of source symbols. He noted the similarity to Fano's work. Since then, that method has come to be called _Shannon-Fano Coding_. It was a challenge indeed, perhaps even presumptuous, to think it possible to beat the father of information theory at his own game. Fano, however, knew that Shannon-Fano coding was not optimal and there was much room for improvement. One student rose to the challenge: David A. Huffman.

As a child, Huffman was a slow learner. He had a difficult childhood, which included the divorce of his parents. As he started school, it soon became clear to his mother and teachers that he was gifted. He graduated at eighteen from Ohio State University, immediately joined the US Navy, and served during the Second World War. His love for mathematics was what motivated Huffman to take up the challenge posed by Fano. It was clear to Huffman that any coding method had to make use of source probabilities. Morse code had done this. Shannon-Fano coding had done this. What was not clear to Huffman, or to anyone else, was the means to achieve optimality. After many frustrating months of hard thinking that seemed to lead nowhere, Huffman almost gave up on the task. It seemed probable that he would write the final exam instead. It was then that he had a flash of insight. The problem was quickly solved and so was born the famous _Huffman Coding_.

Methods of encoding that include Morse code, Shannon-Fano coding, and Huffman coding come under the broad category of _source coding_. The goal of source coding is to encode the source messages or symbols using as few bits as possible. Source codes strive to remove redundancy and approach source entropy. While Morse code is based on a priori probabilities of the source alphabet, Huffman coding looks at the source, calculates source symbol probabilities, and then commences the encoding. Thus, we do not talk of Huffman code but of Huffman coding. Huffman coding is all about the method rather than about a fixed code. In this sense, Huffman coding is also more general. While Morse code is limited to the English alphabet, Huffman coding can be applied to any source. Morse code for instance could be used to communicate text written in Chinese, Japanese, or Korean scripts (CJK), but this would involve tedious manual mapping of nearly 75,000 symbols to elaborate and lengthy sequences of dots and dashes. The procedure for doing this is automatic if one applies Huffman coding. Moreover, it is matched to particular source statistics. Thus, two different Japanese texts could both use Huffman coding but end up using different optimal Huffman codes.

The underlying principle for all source codes is the same—to use shorter codewords for more frequently occurring source symbols. Shannon-Fano coding had done this by successively partitioning the symbol alphabet into roughly equal cumulative probabilities. Huffman saw the flaw in this approach. Sometimes it happened in Shannon-Fano coding that some of the shorter codewords remained unused. Since encoding is done in binary digits, it made sense to Huffman to consider two symbols at a time rather than cumulative probabilities of many symbols. In other words, Huffman saw the essential principle that it was better to start with the lowest probable symbols and consider pairwise accumulation of probabilities rather than consider sets of symbols. Huffman's bottom-up approach, rather than Shannon-Fano's top-down approach, proved to be optimal. Once the source was analysed in the form of a _code tree_ , it only remained to assign ones and zeros to define the codewords. Huffman himself used visual imagery to make this point,

Since the combining of messages into their composites is similar to the successive confluences of trickles, rivulets, brooks, and creeks into a final large river, the procedure thus far described might be considered analogous to the placing of signs by a water-borne insect at each of these junctions as he journeys downstream. It should be remembered that the code which we desire is that one which the insect must remember in order to work his way back upstream.

Considering a biased dice as an example, it is easy to demonstrate the gain due to Huffman coding. Suppose the symbol probabilities are {0.2, 0.35, 0.1, 0.2, 0.1, 0.05} for the respective symbols {1, 2, 3, 4, 5, 6}. This has an entropy of 2.3394 bits per symbol. Shannon-Fano coding will give us the binary encoding {100, 00, 110, 01, 1110, 1111} with a weighted average of 2.6 bits per symbol. Though Huffman coding does not give a unique code, all possible codes have the same efficiency. One possible Huffman code for the biased dice is {11, 00, 011, 10, 0100, 0101} that uses only 2.4 bits per symbol, thus better than Shannon-Fano coding. It is clear in this example that Shannon-Fano coding has not utilized the two-bit codewords fully. With Huffman coding, a sequence of outcomes of dice throws such as "1225432146" will be encoded and transmitted as "11 00 00 0100 10 011 00 11 10 0101".

Huffman Coding of a Biased Dice

Huffman coding gives 2.4 bits per symbol and is known to be optimal. Entropy of this source is 2.34 bits per symbol.
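
A minimal sketch of Huffman's bottom-up procedure for this biased dice, repeatedly merging the two least probable entries of a priority queue (the tie-breaking counter is an implementation convenience, not part of the method):

```python
import heapq
from itertools import count
from math import log2

def huffman(probs):
    """Merge the two least probable entries again and again, prepending
    a 0 to one branch's codewords and a 1 to the other's."""
    ticket = count()  # tie-breaker so equal probabilities never compare dicts
    heap = [(p, next(ticket), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in a.items()}
        merged.update({s: "1" + w for s, w in b.items()})
        heapq.heappush(heap, (p1 + p2, next(ticket), merged))
    return heap[0][2]

probs = {1: 0.2, 2: 0.35, 3: 0.1, 4: 0.2, 5: 0.1, 6: 0.05}
code = huffman(probs)
avg = sum(probs[s] * len(w) for s, w in code.items())
H = -sum(p * log2(p) for p in probs.values())
print(code)
print(avg, H)   # 2.4 bits per symbol against an entropy of 2.3394
```

The codewords printed may differ from those in the text, since Huffman codes are not unique, but the average of 2.4 bits per symbol is the same.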

In reality, spaces are not transmitted. Yet the decoder knows exactly where a symbol begins and where it ends. This is because Huffman codes, like Shannon-Fano codes, have the important property that the bitstream is _uniquely decodable_ even without symbol separators. That is not all. The encoding is such that the bitstream is _instantaneously decodable_. In other words, the decoder does not have to look ahead and then decide on the bits already received. The decoder can decide as soon as enough bits have been received. For example, receiving "00" implies without doubt a 2. Receiving "01" means that the decoder has to wait for at least one more bit. If the next received bit is "1" it implies 3; if it is "0" the decoder waits for one more bit to decide between a 5 and a 6. Such codes are called _instantaneous codes_. This is possible only because no codeword is a prefix of another codeword. These fundamental principles of source coding had been laid down by pioneers in the field in the early years of information theory. In 1949, L. G. Kraft for his master's thesis at MIT laid down the basic mathematical formulation for the existence of instantaneous codes. Seven years later, B. McMillan went further and proved that the same condition holds to ensure that codes are uniquely decodable.
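
The prefix property makes the decoder almost trivial: it accumulates bits and emits a symbol the instant its buffer matches a codeword. A sketch using the dice code above, with Kraft's condition checked first:

```python
code = {1: "11", 2: "00", 3: "011", 4: "10", 5: "0100", 6: "0101"}

# Kraft's condition: an instantaneous code with these lengths exists
# because sum(2 ** -length) <= 1
assert sum(2 ** -len(w) for w in code.values()) <= 1

def decode(bits, codebook):
    inverse = {w: s for s, w in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:   # safe: no codeword is a prefix of another
            out.append(inverse[buf])
            buf = ""
    return out

# the dice sequence 1225432146, sent without any separators
print(decode("1100000100100110011100101", code))
```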

Huffman did indeed produce a term paper but it turned out to be much more than that. His method was published in September 1952 in the _Proceedings of the IRE_ , then a leading technical journal. The simplicity of the whole thing, which others had obviously missed, made Fano exclaim, "Is that all there is to it!" Huffman code was the culmination of source coding techniques that had begun informally with Morse code more than a century earlier. For its class of coding techniques, Huffman coding remains to this day the optimal way of encoding the source. As Shannon had shown in 1948, better encoding comes if we extend the code by considering joint probabilities of many symbols and encoding them together. This, however, is rarely done in practice because of the complexity in calculating joint probabilities.

The idea of source coding is that fewer bits are required for communication. One penalty is that the code itself is not known in advance to the decoder in the receiver. While Morse code is a fixed code and known in advance, any code due to Huffman's method is generated based on a particular source. The encoder is therefore required to transmit the codewords, one per symbol, to the decoder. This is a small penalty often outweighed by the gains, especially when the source has only a few symbols. The second disadvantage of Huffman coding is that the encoder requires two passes over the source: the first to calculate the probabilities and the second to do the encoding. This may be acceptable for encoding a text file but it is unsuitable for real-time communications that often comprise continuous streams of messages.

This prompted information theorists to think of alternatives to Huffman coding but real interest did not come for almost two decades after Huffman's work. This highlights some underlying truths about research. Once theorists identify something as optimal, they give up the problem to engineers. Engineers rarely take up something out of curiosity alone, unless they have fat R&D budgets to work with and nothing else to do. To interest engineers, the problem must be immediate and real. All through the 1950s and 1960s, voice dominated communications. PCM/TDM had established itself as the dominant technology. Research focus was all about improving PCM/TDM. Data communications was limited to a few specialized groups. The public was content with voice, at least for now.

What no one anticipated was that the era of data was just around the corner. The early beginnings of the modern Internet happened in the late 1960s. Given extremely low data transmission speeds, the need was felt to improve upon Huffman coding, not necessarily in the theoretic sense but in terms of simplicity and computational footprint. Among the early published papers was one by L. D. Davisson of the University of Southern California, published in 1973 under the promising title "Universal Noiseless Coding." Davisson's own research had been sponsored by the National Science Foundation (NSF) and by the Advanced Research Projects Agency (ARPA) of the Department of Defense. ARPA itself had been established in 1958 during the Cold War era, when the Soviet Union seemed to be taking the lead in technological supremacy. The ancestor of the Internet, ARPANET, was itself an outcome of ARPA funding. Ironically, Davisson quoted earlier works of Russians, Kolmogorov and B. M. Fitingof, and gave Kolmogorov credit for introducing the term _universal coding_.

As the name suggests, universal coding can be applied to any source even when the encoder has no knowledge of source statistics. In fact, when encoding continuous streams of data there is no guarantee that what was true of the past will be true either of the present or of the future. In engineering jargon, universal coding does not assume stationarity of the source. The source is treated not just as a stochastic process but possibly one whose statistical behaviour is changing over time. What then is the secret of universal coding? How does it manage to approach source entropy by ignoring source symbol probabilities?

Davisson presented a simple case of a binary source generating only ones and zeros. Davisson treated even the probability of a one as a stochastic parameter of a certain distribution. The trick is to encode large blocks of source symbols together. Each block can then be encoded in two parts—the first part indicating the number of ones, the second part containing a codeword that specifies the positions of the ones. Davisson then went on to show that with increasing block size, the first part tended to zero on a per symbol basis while the second part tended to the source entropy. Both parts are fixed-length codes unlike the variable lengths of Huffman codes. Davisson's method remarkably approached source entropy even without knowledge of source probabilities. However, his method was only theoretical, not constructive. It was not practical to encode extremely large blocks of symbols. Nonetheless, Davisson's paper formally launched the idea of universal coding that would soon interest many others.
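
A sketch of the two-part idea (the block and its 20% density of ones are made-up values; Davisson's treatment is far more general). The per-symbol cost of the count plus the arrangement index approaches the entropy of the source, with no probabilities assumed in advance:

```python
from math import ceil, comb, log2

def two_part_bits(block):
    """First part: the count of ones (one of n+1 values); second part:
    which of the C(n, k) possible arrangements of k ones occurred."""
    n, k = len(block), block.count("1")
    return ceil(log2(n + 1)) + ceil(log2(comb(n, k)))

block = "0010000100" * 100                 # a 1000-bit block with 20% ones
print(two_part_bits(block) / len(block))   # per-symbol cost: about 0.73 bits
print(-0.2 * log2(0.2) - 0.8 * log2(0.8))  # entropy of a p=0.2 source: 0.722
```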

Jacob Ziv and Abraham Lempel, two Israeli researchers of the Technion-Israel Institute of Technology at Haifa, published the most famous paper on universal coding. The work had been done two years earlier but as is common for reputed journals, there is a long delay between submission of manuscript and final publication. Technical submissions go through thorough peer reviews, often involving multiple reviewers. Corrected drafts go through further reviews. In the process, ideas are often sharpened, improved, discussed at conferences, and circulated privately. This explains why sometimes some papers are referenced by others even before they are published. Though the new proposal of Ziv and Lempel had been floating around in scientific circles for two years, it came to wider public attention only in 1977. In time, their method became famous as the LZ77.

A year before Davisson, Ziv had proposed an encoding method that did not involve source symbol probabilities. Now in 1977, Ziv and Lempel addressed practical concerns of performance and implementation. It was time for methods of source coding to move out of purely theoretical considerations. LZ77 was indeed quite simple when we pare it down to the basics. The algorithm learnt and built a dictionary of symbols as it proceeded with the encoding. When a new symbol was encountered, it was added to the dictionary. When a symbol was already in the dictionary, the encoder replaced the actual symbol with an index into the dictionary. The real saving came because the dictionary contained not just single symbols but long sequences of symbols. When similar long sequences occurred again, the encoder sent only the index and the length. LZ77 was extremely simple to implement. It was versatile in the sense that it could be applied to letters of the English alphabet or at a lower level of bit sequences. What was powerful about LZ77 was that the process of dictionary building was sequential. This meant that the encoder didn't need to send the dictionary to the decoder. The decoder could build the dictionary in exactly the same way based on bits already received.
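
A toy version of the scheme (the window size and the triple format are simplifications; practical implementations pack these fields into bits):

```python
def lz77_encode(data, window=255):
    """Emit (offset, length, next_char) triples. An offset and length
    of 0 means next_char is a literal with no match in the window."""
    out, i = [], 0
    while i < len(data):
        best_off = best_len = 0
        # search the recent window for the longest match with the lookahead
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        nxt = data[i + best_len] if i + best_len < len(data) else ""
        out.append((best_off, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decode(triples):
    s = ""
    for off, length, nxt in triples:
        for _ in range(length):
            s += s[-off]   # copy from the already-decoded window
        s += nxt
    return s

tokens = lz77_encode("abracadabra abracadabra")
print(tokens)   # the second half collapses into a single back-reference
assert lz77_decode(tokens) == "abracadabra abracadabra"
```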

This method of indexing is quite prevalent not just in modern communications and computing but also in everyday experiences. Suppose one walks into a Chinese restaurant in London's Soho area. It suffices to say "14" and the waiter instantly understands the main course that has been ordered. A few minutes later, just as one expects, a hot steamy bowl of egg noodles with shredded chicken, shiitake mushrooms, and spinach is at the table. The only requirement is that both the customer and the waiter share the same mapping of numbers to items in the menu. In the case of LZ77, both the encoder and the decoder generate this mapping dynamically and independently. Yet they end up with the same dictionary since the _method_ of building and updating it is predetermined.

It turns out that this method of using a custom-built dictionary goes beyond mere data compression. In a 2004 interview, Ziv described how it can be used for classification. For example, a dictionary built out of Beethoven's _Eroica_ may compress his _Symphony No. 9_ well but won't work quite as well for Mozart's _Eine kleine Nachtmusik_. This is because the dictionary consists of musical phrases that are typical of Beethoven but not of Mozart. This is particularly true of longer phrases. This application of the LZ algorithm is quite novel and is relevant to classification in diverse fields.

One of the problems with LZ77 was that the dictionary grew quite large at times. The creators had foreseen this problem and had addressed it by limiting the dictionary to a certain length of recent symbols. This was really a trade-off in the design of universal coding. Large dictionaries needed large memories, and the process of encoding could become slow if the dictionary overflowed fast-access memories to slower hard drives. Smaller dictionaries capped performance degradation and hardware cost, but the penalty was paid in lower coding efficiency. These considerations spawned an entire family of compression methods over the next three decades, each one proposing small changes to the basic framework of LZ77. LZ77 thus became the granddaddy of all methods that came to be classified under the general heading of _lossless data compression_.

The term data compression had been in common use since the early 1960s. It was a less pompous title compared to Shannon's noiseless source coding or Huffman's minimum redundancy coding. Lossless compression methods are those by which the original source symbols can be reconstructed exactly, without loss of precision or fidelity. Voice on the other hand had taken the PCM route of quantized speech samples. The analogue nature of voice meant that PCM transmission and speech reconstruction at the receiver was only an approximation. It was not an exact copy of the original source. This meant that PCM came under the general category of _lossy compression_. It was alright to suffer some loss of fidelity for voice because quality was perceptual. Lossy compression is something like précis writing that many of us may well remember from our days of school grammar and composition. The idea is to leave out the details and focus on the main ideas. Data compression, on the other hand, had to be exact. Nothing could be discarded.

Ziv and Lempel, and their successors, left voice alone for the moment and focused on lossless data compression. The following year they created a variation of their original method, which came to be called LZ78. This was subsequently improved in 1984 by Terry Welch to give birth to LZW. LZW might have dominated compression technology all through the 1980s except that it was patented. When patent owner Unisys started suing users of LZW, this proved disastrous not just for LZW but also for its predecessor, LZ78. The industry started to favour LZ77 and its variants. The result was that though LZ78 was more efficient than LZ77, it was unlucky, and its misfortune propagated to all those who used it. Meanwhile, a company named CompuServe had introduced a new graphics image format in 1987. It was named _Graphics Interchange Format (GIF)_.

Until then, bitmaps or BMP files were common for storing and exchanging images. A single bitmap file, such as the screenshot of a 1024 x 768 display at 24 bits per pixel, would consume 2.25 MB of memory. GIF was so revolutionary for its time that in some cases it could bring the same image down to only 100 kB. The real credit was due to the method that created the magic under the hood. This was LZW. The use of LZW was unfortunate for GIF since the threat of patent suits curtailed the spread of the format. Nonetheless, GIF continues to survive to this day because of its simplicity. Its early adoption has ensured widespread support in modern tools. Though limited to 256 colours, it has remained popular, particularly for its ability to handle animations, which are effected by sequencing many images in a single file.

Fearing the death of GIF, the industry responded with a new image format called _Portable Network Graphics (PNG)_, developed in the mid-1990s by an open group of engineers with CompuServe's backing. The name itself was suggestive of the new trend in computing, which was to move away from standalone systems to networked systems. Networking required standard and efficient means of exchanging data. The 1990s was the decade when the World Wide Web (WWW) came into existence. It was in this era that PNG was born. This time the designers did not make the mistake of using any compression method based on LZ78. Instead, they chose DEFLATE, a method based on LZ77 and invented by Phil Katz in 1993. DEFLATE achieved a compromise between compression efficiency and speed by using a mix of LZ77 and Huffman coding. DEFLATE was all set to become the greatest compression method of the twentieth century.

Meanwhile, a separate concept had evolved for data exchange. The logic was quite simple. If one had to send many files to another person, it was cumbersome and inefficient to send each file one by one. If a bunch of papers had to be sent to someone in another country, it made sense to parcel them together rather than send them in separate envelopes. Clearly, the idea was neither new nor revolutionary, but no one had come up with a format for exchanging digital data in this manner. It was from such a need that Thom Henderson released in 1985 the first file archival format, called ARC. Interestingly, ARC did not just archive files but also compressed them using LZW. Phil Katz improved on ARC and released PKARC in 1987. Being based on LZW, PKARC turned out to be a dead end, just like GIF. Katz himself got into trouble with Henderson for copying the ARC format. The ARC encoder/decoder was commercial software, but Katz had released PKARC as shareware, meaning that those who downloaded it used it on a trial basis and got no support.

Katz did not waste the experience he had gained in creating PKARC. From his subsequent efforts was born DEFLATE, which was used in a new archival format that he named PKZIP. Version 2.0 of PKZIP was released in 1993. It was a time when data was growing quickly and memory storage devices were lagging behind. Katz saw that users might need to store large files or archives spread across multiple 3.5-inch disks. PKZIP 2.0 introduced such a split-file feature, one that proved useful for many years. DEFLATE continued to remain the workhorse of many compression and archival tools through the 1990s and beyond. Compression utilities such as WinZip on Windows and gzip on UNIX used DEFLATE. DEFLATE is used by much networking software that works quietly in the background, often unnoticed by computer users. Though Katz had a patent on DEFLATE, he never enforced it. Perhaps he saw that though DEFLATE was superior to LZW, the patent problems of LZW had also given DEFLATE an uncontested popularity. Katz made money from his invention but did not live long to enjoy it. Given to alcoholism, and even convicted of drunk driving, he died at the age of thirty-seven. Given his contribution to compression technology, we may rightly say that he lived a full life, though somewhat compressed.

While such dictionary-based methods were simple, there emerged in the 1970s another method of coding that did not require building and maintaining a dictionary. The encoding is based on the idea of reducing source messages to a single number. It seems almost incredible that a single number can represent long messages, but the idea is rooted in the concept of infinity. Infinity is at the heart of that ethereal divide between the continuous and the discrete. Initial ideas date back to the ancient Greeks. In the fifth century BC, the Greek philosopher Zeno proposed a thought experiment in which the legendary hero Achilles races against a proverbially slow tortoise. The distance is only a hundred metres and heroic Achilles graciously gives the tortoise a head start of half the distance. If we assume that the tortoise can run at half the speed of Achilles, Zeno argued that Achilles can never catch up. When he reaches the 50-metre mark, the tortoise has advanced to the 75-metre mark. When he reaches the 75-metre mark, the tortoise continues to be ahead at the 87.5-metre mark. This was one of Zeno's many famous paradoxes. It underscored the difficulty that the Greeks faced in understanding infinity.

The paradox is easily resolved by noting that as the race proceeds, the tortoise's lead is shrinking all the time. As the competitors approach the finish line, the distance separating the two is infinitesimally small. In fact, there may really be no clear winner since at the limit both may reach the finish line at exactly the same time. In reality, neither Achilles nor the tortoise can take infinitesimally small steps; nor can they live their lives in infinitesimally small parcels of time; nor can the judges arbitrating at the race measure infinitesimally small distances. The paradox is all about convergence to a finite limit of an infinite series of diminishing numbers. The relevance of Zeno's paradox to data compression is that on the number line from zero to one, one can name an infinity of real numbers. For that matter, given the two numbers 0.999999 and 1, one can still name an infinity of numbers greater than 0.999999 and less than 1. It is from here that the idea of _arithmetic coding_ was born.

Arithmetic coding was implicit in Shannon's 1948 paper when he talked about encoding using cumulative probabilities. Through the sixties, others refined this idea, but real development came only in the seventies at one of the world's largest makers of computing machines. International Business Machines (IBM) had been founded in 1924, but its history can be traced all the way back to the 1890s. During the 1970s, the company started to look into distributed computing. Distributed computing implied frequent data exchanges between computers and called for novel methods of compressing data. Like Bell Labs, IBM had its own technical journal, the _IBM Journal of Research and Development_. It was in this journal that two engineers of the IBM Research Laboratory, J. Rissanen and G. G. Langdon Jr, published a dedicated paper on arithmetic coding in 1979.

The basic idea of arithmetic coding is not very different from the shrinking distances of Zeno's paradox. Suppose we want to communicate the results of the familiar biased dice experiment encountered earlier. Rissanen and Langdon suggested that we first partition the number line from 0 to 1 in conformance with the probabilities of the symbols. Given the symbol probabilities {0.2, 0.35, 0.1, 0.2, 0.1, 0.05}, if the dice shows up a 1, any number between 0 and 0.2 would represent such an outcome. If 2 is the result, the encoded number would be between 0.2 and 0.55. To convey a sequence of outcomes such as "23" we would select the interval for 2, partition it once more into smaller intervals, and select the smaller interval for 3. Thus, "23" would be encoded as 0.2 + 0.35 x 0.55 = 0.3925, the lower end of the final interval. The encoder transmits this number in binary format. As with Huffman coding, the symbol probabilities, or at least their estimates, have to be sent to the decoder.

Arithmetic Coding

This example illustrates the encoding of outcomes of a dice throw. The sequence of outcomes is "236." (a) Number line represents probabilities of outcomes of the biased dice. (b) & (c) Subdivisions of the number line when a particular outcome is being encoded. In this case, the value 0.42575 would be used to signal "236."
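
For the curious reader, the interval-shrinking procedure of this example is compact enough to sketch in a few lines of Python. This is a minimal illustration, not a production encoder; real implementations use finite-precision integer arithmetic, and floating-point rounding here may nudge the last digits.

```python
# A minimal sketch of arithmetic encoding for the biased dice example.
# Each symbol narrows the working interval; the lower end is reported.
probs = {1: 0.2, 2: 0.35, 3: 0.1, 4: 0.2, 5: 0.1, 6: 0.05}

def encode(sequence, probs):
    low, width = 0.0, 1.0
    for symbol in sequence:
        cumulative = 0.0
        for s in sorted(probs):              # walk the partition in order
            if s == symbol:
                low += width * cumulative    # jump to the symbol's sub-interval
                width *= probs[s]            # and shrink to its width
                break
            cumulative += probs[s]
    return low

print(encode([2, 3], probs))     # 0.3925, as computed in the text
print(encode([2, 3, 6], probs))  # 0.42575, matching the figure
```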

It becomes obvious from this example of just two symbols that the encoded number demands ever greater precision with the addition of each symbol. Therefore, we can imagine how quickly this requirement of precision will grow when we attempt to compress a file of a few megabytes. This and other problems with arithmetic coding found practical solutions through the next decade, but real implementation and widespread use came only in the 1990s. This was because arithmetic coding needed more computational resources. It was only in the 1990s that computers and memories got faster and cheaper. When this happened, the _Lempel-Ziv Markov chain Algorithm_ _(LZMA)_ was born in 1998, and within a year it found commercial application in a new archival format named 7-Zip. LZMA was derived from the LZ77 family. While most relatives of the family had applied the LZ77 method at byte level, LZMA did bitwise compression. Compressed data was then put through a form of arithmetic coding to achieve further compression.

It is likely that compression and archival methods, formats, and tools will not stop evolving and improving. It is also likely that many will coexist. Without them, storing and sharing files would have been onerous. Thanks to them, today we are able to pack hundreds or even thousands of compressed files into memory devices almost as small as pills in our pockets. Although these come under the class of universal coding, they are universal only in the sense of being able to compress any type of data. In terms of performance and effectiveness, some perform better with text and others with images. Some that compromised compression for performance may continue to live on older machines. Ziv and Lempel could hardly have known back in 1977 that their method would have such a long history and such an impact on data compression.

Compression alone is not exciting without consideration of an everyday application. In fact, some compression techniques were perfectly suited to a particular application. One such technique was born in 1966 under the name _run-length coding_. Solomon W. Golomb of the University of Southern California, possibly in support of the croupier, introduced it in a rather interesting fashion,

Secret Agent 00111 is back at the Casino again, playing a game of chance, while the fate of mankind hangs in the balance. Each game consists of a sequence of favorable events (probability _p_ ), terminated by the first occurrence of an unfavorable event (probability _q_ = 1 - _p_ ). More specifically, the game is roulette, and the unfavorable event is the occurrence of 0, which has a probability of _q_ = 1/37. No one seriously doubts that 00111 will come through again, but the Secret Service is quite concerned about communicating the blow-by-blow description back to Whitehall.

The bartender, who is a free-lance agent, has a binary channel available, but he charges a stiff fee for each bit sent. The problem perplexing the Service is how to encode the vicissitudes of the wheel so as to place the least strain on the Royal Exchequer. It is easily seen that, for the case _p_ = _q_ = 1/2, the best that can be done is to use 0 and 1 to represent the two possible outcomes. However, the case at hand involves _p_ >> _q_ , for which the "direct coding" method is shockingly inefficient.

Finally, a junior code clerk who has been reading up on Information Theory, suggests encoding the _run lengths_ between successive unfavorable events.

Perhaps in the 1960s transmission of data was much more expensive than what James Bond could lose at the casino. It was certainly many times costlier than what we pay today for fast broadband connections direct to the home. The idea of run-length coding is simple and powerful when long runs of certain outcomes are expected. Suppose James Bond had this opening sequence of outcomes at the roulette: 111111101111111111110110. In this case, one simply encodes the lengths of the successive runs of favourable outcomes given by the ones—7, 12, and 2. If each run-length is encoded using four bits, then the encoded transmission would be 011111000010. Thus, 24 bits of the original sequence have been compressed to just 12 bits. Obviously, the use of 4-bit encoding caps the longest run that can be represented. Real applications of run-length coding have many refinements that allow the encoder to handle runs of any length and thereby achieve maximum compression. One such application is facsimile transmission.
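
The secret agent's scheme is simple enough to write down directly. A minimal sketch in Python, using the same sequence and 4-bit run-lengths as above:

```python
# Encode the lengths of runs of 1s terminated by each 0, four bits per run.
def run_lengths(bits):
    runs, count = [], 0
    for b in bits:
        if b == "1":
            count += 1
        else:                   # a 0 ends the current run of favourable 1s
            runs.append(count)
            count = 0
    if count:
        runs.append(count)      # a trailing run with no terminating 0
    return runs

sequence = "111111101111111111110110"
runs = run_lengths(sequence)                 # [7, 12, 2]
encoded = "".join(f"{r:04b}" for r in runs)
print(runs, encoded)                         # [7, 12, 2] 011111000010
```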

Facsimile is one of those unfortunate technologies that was born premature, had an uncertain childhood, languished long in its teenage years, and nervously entered adulthood only when society was ready to accept it. Even its monosyllabic childish nickname, _fax_, was coined only in adolescence. Born in the 1840s thanks to Alexander Bain, it might have displaced telegraphy early on except that the market was not ready for it. No one felt the need to send copies of documents or images across distances. At least, the need was not urgent and old traditional ways sufficed. The simplicity of telegraph appealed to one and all, particularly at a time when transmission speeds were low. Telegraphy, with its short messages of dots and dashes, was suited perfectly for the growing volume of messages that came from universal adoption. Facsimile was really a luxury, and early telegraph networks were ill placed to handle its demands of high bandwidth.

Italian inventor Giovanni Caselli's pantelegraph led to the first commercial facsimile service. This was in the 1860s in France but the system did not survive into the next decade. Apparently, the general perception was that facsimile was intended for image transmission. It was not obvious that even typed or handwritten documents could benefit from it. Through the rest of the century many other inventions came, tried, and failed. It seemed that the century belonged to telegraphy and its new rival, telephony. The public did not seem to need anything more. When Arthur Korn invented his novel methods of facsimile in 1902, it generated new interest. Nonetheless, it took Korn many years to commercialize his system. Korn's success caught the interest of big corporations but they could not effectively compete until the 1920s. This was the decade when facsimile had a real beginning. It found use in limited circles for transmitting news photographs and weather charts.

Right up to the 1960s, this was the state of the facsimile market. It remained in specialized industries and was therefore expensive. An early attempt to introduce facsimile into homes for news broadcasts was stillborn. It was not a bad idea but the timing was bad, since facsimile had by then acquired a formidable competitor—the television. Towards the close of the 1960s, it appeared that the time of facsimile had finally come. Riding on the strength of the transistor, an invention of 1948, electronic devices were getting cheaper and more reliable. The Japanese saw in facsimile a perfect technology for transmitting documents, which in their language were mostly pictorial. The growth of digital transmission methods was an added incentive. Dacom's Rapidfax became the first fax machine to use digital data compression in the late 1960s. This did not immediately translate into a roaring success, partly due to lack of standardization. The first facsimile standards introduced by the Comité Consultatif International Téléphonique et Télégraphique (CCITT) in Europe were analogue. The first digital standards, Group 3 and Group 4, came out only in the 1980s. Dropping costs of hardware and general affordability made fax machines attractive for businesses and even homes. The new digital standards, along with methods of modulating waveforms digitally (Chapter 6), allowed machines to be directly connected to telephone lines, which were by then widespread. Standards allowed interoperability between machines from different manufacturers.

Typical documents have large areas of white with lines of black text. This meant that run-length coding was perfectly suited for compressing such documents. Naturally, compression is best for text documents and poorer for images with many shades of grey or fine details. Once a line is scanned, it is converted to run-lengths. These are then coded using Huffman codes. For high compression, the standards specify different Huffman codes for different run-lengths and for contiguous white or black regions. While there have been many factors behind the eventual success of facsimile technology, the final push came from digital data compression. With analogue facsimile standards, a page had required many minutes of transmission. With digital compression, it is done in a matter of a few seconds with better image quality. While it seems that facsimile found growing up difficult, its old age seems to be just as difficult. Then again, which technology doesn't have a difficult end of years? Obsolescence is the eventual fate of most technologies. Fax machines have largely been displaced today by the Internet and digitally signed documents (Chapter 7).

Perhaps the best method of data compression is to remove redundancy right at the source. This is not an engineering problem. Compression of this sort is purely a matter of taste, culture, and fashion. We know that English has much redundancy built into its structure. So why bother saying, "Great! I'll see you tonight," when one can mean the same thing with fewer characters: "g8 c u 2nite." Or be a little more cryptic with "GIAG BBFN TC" for saying, "Give it a go. Bye-bye for now. Take care." This is compression at user level. For such expressions, there are few rigid rules. It is a convention, a new language established by users, one that evolves like any formal language. This is the _Short Messaging Service (SMS)_ , which has gained much popularity through the growth of mobile telephony.

Examples of SMS Language

Mobile phone technology is extremely complex because it integrates diverse aspects of science and engineering. Yet in all this complexity, the application that saw tremendous growth through the 1990s and early 2000s was SMS. Even today, SMS continues to hold a key place in mobile communications, particularly in the delivery of notifications, reminders, and subscribed services. The success of SMS surprised even the technologists who had initially viewed it only as an extension to the older paging service. The power of SMS is its simplicity, the same attribute that made Morse code a success. SMS messages are short, simple, and to the point. There are no long ramblings. Messages are limited to 160 characters. Users are required to be terse and say no more than what is needed. Unlike the ringing of the telephone, SMS messages are non-intrusive. The user can choose to read and reply at her convenience. Messages can be stored and delivered to the user even hours later should she be unreachable. It is not surprising that many of these characteristics are shared by another successful application: Twitter. When Twitter turned six in March 2012, by then clocking 340 million tweets a day, a post on Twitter's own blog highlighted the power of expressing ourselves in a few words,

Now it seems that there are as many ways to express yourself in 140 characters as there are people doing it.... However concisely, it turns out there's plenty to say.

Part of the success of SMS has a psychological basis. Popular with teenagers, it occupies that delicate space between disconnection and interaction. Frequent messaging between two dating teenagers gives them the experience of a conversation that neither demands instant replies nor suffers awkward silences. It gives them space to think, evaluate, and reply. It has an addictive influence too. Users send messages and restlessly wait for a reply. SMS is today more than a means of communication. It is a culture of the twenty-first century. This new culture that talks in its own language of compressed expressions, imprecise grammar, and complete disregard for punctuation has raised concerns among parents and teachers. The UK daily _Daily Telegraph_ reported in 2003 the case of a Scottish schoolgirl who had handed in an essay in the language of SMS. Those unskilled and unschooled in the art of SMS find it difficult to decipher messages, so much so that some say SMS is the hieroglyphics of the modern era. Its successor that enables exchange of images and videos, the _Multimedia Messaging Service_ or _MMS_, hasn't quite seen the same level of success. Images and videos too have their own special ways of compressing data by methods that go far deeper than just bits and bytes.



**Sometime in 1973,** researchers at the University of Southern California were searching for a high quality colour image to use as a demonstration of compression techniques they had been investigating. Their research was sponsored by ARPA, whose ARPANET was the ancestor of today's Internet. Those were the days when one couldn't simply download an image from a remote server. File sharing and searching across geographic boundaries was still a research concept. So the researchers rummaged through their limited collection of boring, mundane pictures from the previous decade, pictures that had been used in TV research. Just then, a colleague walked into the laboratory with the November 1972 issue of _Playboy_ magazine.

When they looked at a nude picture of Lena Sjööblom, a 21-year-old Swedish model, the engineers, being engineers, had thoughts that were far from kinky. It occurred to them that this was the perfect image for their research. It had fine details, regions of varying contrasts, and flat regions of uniform colour. It probably didn't matter to them that Lena described her ideal man as a "24-year-old advertising salesman who's nutty and chubby." The image was cropped to the head and shoulders, scanned, and converted to digital form using a Muirhead wirephoto scanner and a Hewlett-Packard minicomputer. The image was put through digital compression techniques and the results were published. This was the beginning of Lena's rise to stardom in the world of image processing. For nearly thirty years thereafter, Lena appeared in various technical journals, so much so that _IEEE Transactions on Image Processing_ was considered the sexiest journal for years. Lena was the Mona Lisa of the image-processing world. Incidentally, that particular issue of _Playboy_ remains to this day the magazine's best-selling issue, with more than seven million copies sold.

Communication has always been a mixture of verbal and non-verbal elements. When telegraphy first came, the non-verbal aspects were sidelined and the message was reduced to just words. Telephony repaired some of this damage. Human voice carried not just words but emotions as well. Telephony brought back the personal connections that users had with one another. Telephony was not just about the content of the message. Interactivity and conversational communication made telephony a success. In the early days of the Internet, technology once more regressed to the days of the telegraph. This was not intentional. The Internet was conceived as a system for exchanging data, a system for machines to communicate. It was not meant as a system for human communication.

Those were times when data transmission technology was primitive. Telephone lines and networks had bandwidth to carry primarily text. Anything more sophisticated was a strain on the networks' capabilities. It was not long before users started to use the early Internet to send messages and mails. There was interactivity via Internet chatrooms but users talked with their fingers. The real change came through the 1990s when Internet's cousin, the World Wide Web, began to support images, videos, and even speech. This growth was the commercial aspect of communications. What really forged it was the digitization of all media.

Still pictures, videos, audio recordings, and speech come under the all-encompassing heading of _multimedia_. In the digital world, all media are reduced to bits. Once this happened, it was easy to build systems that did not care all that much about the type of media. Internet, WWW, and telephone networks had different capabilities but they could all carry bits. Only equipment at the periphery of these networks needed to translate between bits and their analogue forms. Networks carried bits. Users experienced multimedia. In between were hardware and software systems that did the necessary translations. In a world of analogue signals, all media had been different and needed unique solutions. One remembers the old days when VHS tapes stored videos, audio cassettes held rock and roll hits on black magnetic strips, and rare recordings were available only on gramophone discs. In a digital world, all one needed was a device to store bits—all sorts of bits, regardless of what they represented. With digitization, one witnessed the democratization of all media. In this digital democracy, all bits are created equal. It later turned out that some bits were more equal than others (Chapter 12).

Democratization of Media

In a digital world, all media are reduced to bits. Once in this common format, bits can be more easily processed, copied, stored, or shared. In addition, bits bring better transmission quality due to their inherent noise immunity.

What separated text from multimedia was that the former was created in a digital world. Text was a composition from a finite set of symbols. Multimedia, on the other hand, was born in an analogue world and then converted to digital formats for processing, storage, and transmission. In the case of speech, the bits were finally converted back into analogue waveforms. In the case of video, it really depended on the end device. If the consumer had subscribed to digital cable TV but had an old analogue TV set, a _set-top box_ in between converted digital TV data to analogue signals. If the consumer had a digital TV, no conversion was necessary. Data was received digitally and displayed digitally. The human eye, however, did not see distinct pixels of a digital image or video. The human eye perceived and the brain interpreted images as a whole, as if they had been analogue all along.

What makes us perceive digital representations as analogue relies on the concept of _resolution_. Resolution has its underpinnings in Nyquist's sampling theorem. Paul Rainey's 1921 patent for a facsimile system had laid down the idea of spatially sampling an image and then coding the values with bits. Too few samples horizontally or vertically, or too few bits per sample, meant loss of resolution and a coarse approximation of the actual image. In such an approximation, details would be lost and fine features would become annoying contours. There would also be aliasing when Nyquist's sampling theorem was ignored. On the other hand, if engineers took care to render a digital image in high resolution, the image would look almost like the original. Since resolution is a critical criterion of image quality, most digital devices quote it prominently. Scanners and printers are specified in terms of _dots per inch (DPI)_. Digital cameras are valued for the number of megapixels that are used to capture an image. Screen resolutions such as 1024 x 768 pixels define the display capabilities of laptop computers. In general, the bigger these numbers, the better the approximation to the analogue original. But why live with approximations if the original is available? If analogue is so good, why bother with digital?

Apart from the democratization of media, digital has many other advantages. Digital has much higher noise immunity, which means that it can be transmitted over long distances with fewer errors. Digital representations can be easily compressed, thus saving on the precious resources of time and transmission bandwidth. Compressed data can be stored and shared more easily. The engineering challenge is to get the parameters of conversion and compression right. While higher resolution generally leads to better quality, it is not always the case. We expect a digital camera of 10 megapixels to be better than one of 5 megapixels, but this is not guaranteed if the increase in the number of pixels is achieved by making each pixel smaller. A smaller pixel means less capacity to gather light and hence a lower SNR. Likewise, a standard digital video on a 40-inch LCD flat panel display is not going to look much better than on a 21-inch display unless the viewing distance is adequately increased.

Shannon had realized the difficulty in compressing analogue signals way back in 1948. Since an analogue waveform has an infinite number of possibilities, the entropy of such a source is infinite; we would require an infinite number of bits to represent any sample and a channel of infinite capacity to transmit it. It seemed a hopeless situation until Shannon pointed out that sometimes we don't need to transmit exact values. An approximation will suffice if we accept a certain level of signal distortion. When Reeves had quantized PCM samples in the 1930s, he was actually accepting a certain distortion, distortion that could be quantified, perhaps even measured, but not really noticeable to the human ear. Though PCM continued to grow in interest after the Second World War, no one gave much importance to studying distortion until a decade later. It was Shannon who once more came to the rescue and published his classic paper of 1959. It was this paper that launched _rate-distortion theory_ and, in the process, Shannon became a father all over again.

It is from here that the formal analysis of all lossy compression methods has its beginnings. Huffman coding, run-length coding, and arithmetic coding came under the class of lossless source coding. Their aim had always been to approach source entropy for discrete sources. For the encoding of continuous sources, the aim was to minimize the transmission rate given an acceptable level of distortion. Distortion was quantified in terms of deviation from the original signal. One common measure was the _mean-square-error distortion_. Such definitions enabled analysis and engineering design. While GIF, PNG, and BMP image formats are considered lossless, they are lossless only in the handling of bits after sampling and quantization. Strictly speaking, there is some loss of fidelity because GIF can handle only 256 colours and because all digital images have a finite spatial resolution. However, this is not the usual definition of lossy image compression. With the coming of the WWW, the timing was just about right for an efficient lossy image compression technique to come into existence. It turned out to be so good that it soon became the de facto standard for image capture, storage, and transmission.
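
The mean-square-error measure mentioned above is straightforward to state: average the squared deviations between original and reconstructed samples. A minimal sketch, with invented sample values purely for illustration:

```python
# Mean-square-error distortion between a source and its reconstruction.
original      = [0.20, 0.55, 0.90, 0.40]
reconstructed = [0.25, 0.50, 1.00, 0.35]   # e.g. after coarse quantization

mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
print(mse)  # 0.004375
```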

JPEG is today the dominant image format, from publishing on the Web to exchanging pictures by email, from satellite imaging to storage in the digital cameras that capture them. The acronym stands more for the group that created it than for the format itself: _Joint Photographic Experts Group_. It became a standard in 1994 and quickly acquired popularity among experts and amateurs alike. The compression that JPEG achieves is at times staggering. A good quality image may be twenty times smaller than a BMP image. GIF may achieve only a fifth of JPEG's compression. However, when it comes to compressing non-typical images, such as scanned textual documents, a GIF file may be half the size of a JPEG file. To understand why this is so, we need to look into JPEG's methods.

All lossy compression techniques exploit the single fact that human perception is not perfect, nor does it need to be. To put it differently, human perception is tolerant of distortion within certain thresholds. It is only when distortion becomes noticeable, what engineers call _Just Noticeable Distortion_ _(JND)_, that compression is at its limit. Until that happens, it is quite alright to discard information that we are not going to miss anyway. In a live soccer match, viewers don't really care about the patterns on the ball or the stitches on its seams. These details are irrelevant for a pleasant viewing experience of the match. The details viewers care about are the kicks and the passes, or the quick flash of a flag that signals the offside rule. Compression can therefore be tuned to eliminate unnecessary details.

Years of research have gone into understanding human perception. Our vision is not all that sensitive to details, particularly on the periphery of a scene in view. In a large scene, our focus is limited to small areas of the picture, which is why magicians often manage to distract attention and fool us. The eye is more sensitive to greens and less to reds and blues. More generally, the eye is more sensitive to luminance than to colour information. Therefore, it is pointless to support millions of colours when the eye is incapable of distinguishing so many. When it comes to video, it is sufficient to convey smooth motion by taking as few as twenty-five time samples per second. This is because every image formed on the retina remains impressed long enough to merge seamlessly with the next so as to convey motion. Engineers looked to these properties of human vision to compress images and videos more intelligently. As for the right approach, they needed help and mathematics provided it.

In the late 1940s, two statisticians, Kari Karhunen and Michel Loève, working independently in Finland and France respectively, came up with the idea of representing any stochastic process as an expansion of orthonormal basis functions. This was not very different from Fourier expansions except that in their method the coefficients are stochastic variables and the basis functions are derived from the statistics of the process itself. Unlike the fixed sines and cosines of Fourier expansion, the _Karhunen-Loève Theorem_ selected the best possible basis functions for the process under consideration. When this was applied to images, the result was that the data was decorrelated, so that the image could be represented in terms of coefficients of the basis functions. This was only a transformation of data, called the _Karhunen-Loève Transform (KLT)_, which represented data in a more suitable domain that could then be compressed effectively. Compression came because we could discard the coefficients of the higher basis functions, since the eye did not perceive the loss easily.
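
For discrete data, the KLT basis is nothing more than the eigenvectors of the data's covariance matrix. A minimal numerical sketch with numpy, using invented correlated data:

```python
# Derive a KLT basis from the data itself and decorrelate the samples.
import numpy as np

rng = np.random.default_rng(0)
# 1000 samples of a 4-dimensional process with strong correlation
x = rng.normal(size=(1000, 1)) * np.array([1.0, 0.9, 0.8, 0.7]) \
    + 0.1 * rng.normal(size=(1000, 4))

cov = np.cov(x, rowvar=False)           # estimated covariance matrix
eigvals, basis = np.linalg.eigh(cov)    # KLT basis = its eigenvectors
coeffs = x @ basis                      # transformed, decorrelated data

# The coefficients' covariance is diagonal; most of the energy sits in
# one eigenvector, so the remaining coefficients can be discarded.
print(np.round(np.cov(coeffs, rowvar=False), 3))
```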

Considering KLT from a philosophical perspective, it is nothing short of remarkable. A stochastic process may have statistical properties but it has no structural properties, no symmetry, and no predictability. Despite being born from the outset into a condemned life of randomness, a stochastic process holds within itself a secret genetic code. The process is built up from fundamental stochastic processes that function as the genetic basis. By tapping into this code, one arrives at a language that allows us to express the genetic code in as few words as possible. KLT represented this language. Because the best things in life are rarely free, KLT came with its set of practical problems.

Though applying KLT to an entire image gave the best compression, it also required enormous computation. It was also necessary to include the set of basis functions along with the coefficients. This set is so specific to the image that it cannot be used anywhere else. The purpose of compression itself was defeated. The compromise engineers suggested was to break up the image into smaller blocks, say blocks of 8 x 8 pixels, and apply KLT on each of the blocks. Computation was reduced at the loss of some compression. Compression came because one discarded higher coefficients and reused basis functions across blocks. In addition, it made use of the correlation between the first coefficients (called _DC coefficients_) of neighbouring blocks. This is easy to visualize since in any scene, changes in colour and brightness are usually gradual at the scale of just eight pixels. The DC coefficient of one block gives us an estimate of those of its neighbours. Removing this redundancy achieves greater compression. Despite these advances, KLT remained at best a tool for the laboratories. Its complexity was generally high and could not be reduced for practical application.

The breakthrough came in the early 1970s when N. Ahmed and T. Natarajan of Kansas State University, and K. R. Rao of the University of Texas at Arlington, discovered a new transform that gave almost optimal compression while being computationally tractable at the same time. Their findings were published in a paper of 1974 and this was the launch of the famous _Discrete Cosine Transform (DCT)_. Today it is impossible to think of image compression without reference to DCT, which does the magic for JPEG compression. Like KLT, it results in decorrelated data in the frequency domain, which is then compressed by discarding higher frequency coefficients. Discarding is done by simply quantizing the coefficients. This is where the lossy nature of JPEG plays its part. The quantized coefficients are coded using run-length coding. This is followed by Huffman coding. In this sense, JPEG built on the success of the digital facsimile standards that preceded it.

JPEG Image Compression

This demonstrates that even with extremely high compression, JPEG does well to retain a semblance of the original image (b). A high quality image (d) can be obtained while still achieving an impressive compression ratio of 1:10. The original colour images are reproduced here in monochrome.
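
The decorrelate-then-quantize pipeline just described can be sketched on a single row of pixels. This is an illustration of the principle, not the JPEG specification: JPEG applies the DCT in two dimensions on 8 x 8 blocks, and the sample values and quantization step here are invented.

```python
# DCT-II of one 8-sample block, followed by coarse quantization.
import math

def dct(block):
    n = len(block)
    return [sum(x * math.cos(math.pi / n * (i + 0.5) * k)
                for i, x in enumerate(block))
            for k in range(n)]

block = [52, 55, 61, 66, 70, 61, 64, 73]    # one row of pixel intensities
coeffs = dct(block)

step = 50                                   # larger step = more loss
quantized = [round(c / step) for c in coeffs]
print(quantized)   # the high-frequency terms mostly quantize to zero;
                   # run-length and Huffman coding then mop up the zeros
```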

The simplicity of DCT and the quantization that follows it allow creators to control image quality by simply adjusting the quantization steps. The smaller the steps, the lower the loss and the higher the quality; but if greater compression is desired one may choose larger quantization steps. Compression is also achieved by a process named _subsampling_, whereby fewer samples of colour are taken than of luminance, since the eye is less sensitive to colour. Colour components are also put through larger quantization steps that result in fewer coefficients. JPEG has the plain vanilla baseline encoding, but it also has a progressive encoding that is suitable for large images. Bits in a progressive JPEG are arranged such that the main coefficients are processed first, so that software such as web browsers can display an approximation first. As higher coefficients come in, the display is updated with progressively greater image detail. The successors of the early JPEG standard, the JPEG 2000 and JPEG XR standards, have vastly improved on compression. They bring greater flexibility to image quality and even allow for lossless coding.

The newer standards of JPEG reflect the fact that no single image size or imaging format can suit all purposes. JPEG, being lossy, is not suitable for image editing. The strength of JPEG is in image transmission on bandwidth-limited channels. This brings out a key difference between archival images and distribution images. JPEG is for distribution in a final form. Formats such as BMP and TIFF, or one of the proprietary formats such as Adobe's PSD or GIMP's XCF, are best suited for image editing since they are lossless. A thumbnail image is best for display on mobile devices with small screens. A bigger image is more suitable for display on a desktop digital monitor. This scalability of image size and resolution is now possible with the new JPEG standards that use _wavelet transform coding_, also called _sub-band coding_. What this means is that the same image on a web server can be scaled and displayed as fits the display device. While wavelet transforms may one day displace DCT, that day is not anywhere near.

JPEG 2000 failed to address devices that were low on computational resources. Mobile phone manufacturers therefore neglected the standard until, in 2009, JPEG XR simplified the computations. While JPEG had its own patent problems like GIF, luckily it came out unscathed thanks to prior art that invalidated patent claims. The newer JPEG standards are secure for now but universal support from tools and web browsers is still lacking. Meanwhile, a new lossy imaging format, Google's WebP, is starting to challenge JPEG's supremacy.

In the last couple of decades, image compression has become more important than ever. In 1971, a student at the University of Illinois, Michael Hart, launched Project Gutenberg. His grand aim was to digitally preserve rare and historic documents. With this in mind, he started by digitizing the US Declaration of Independence with no more sophistication than manually typing it word by word. Over the years, the project has digitized more than forty thousand books and documents, including such classics as Newton's _Opticks_ (1730) and Lewis Carroll's _Alice's Adventures in Wonderland_ (1865). The value of digitization is apparent even for recent publications. Specialized books often go out of print since they have a limited market. Once digitized, books never go out of print. Production, warehousing, and distribution of bits are far easier. Improper digitization, however, can lead to poor quality and large file sizes. Though manual typing of text hindered Project Gutenberg's speedy growth in its early years, engineers came to its aid with a new technique: _Optical Character Recognition (OCR)_.

With OCR, language is converted from image representation to textual code. This not only brings down file sizes but also enables such operations as word indexing, search, and copy. For instance, René Descartes' _A Discourse on Method_, published in 1637, is probably about seventy pages in modern paperback form. Project Gutenberg's digitization of the same in HTML web format consumes as little as 150 kB. On the other hand, Claude Shannon's classic paper of July 1948 runs to forty-five pages and takes up a whopping 18 MB, about 400 kB per page. This is because Bell Labs had digitized Shannon's paper as images without undertaking character-level analysis. These numbers give us some indication that a picture is worth more than a thousand words—in fact, a lot more. Sometimes this is done for genuine reasons. The intent is to preserve the original presentation and not just the content. More often, it is a mismatch between user needs and technical decisions. The possibility of misuse also discourages owners from digitizing their copyrighted content as text. OCR is a complex technology that requires a lot of computational resources. When documents are physically damaged by stains, holes, and mildew, it works poorly. It has been reported that even the best OCR tools with 99% accuracy require ten corrections per page. This makes manual proofreading essential.

While OCR still has a long way to go, its current limitations are useful in web applications that require users to fill out forms. Often users are prompted to enter the text of a grainy, distorted string of letters and numbers to validate form submission. This is really to prevent automated software from submitting forms using ghost identities. Such software uses OCR to figure out the validation strings. As OCR improved, so did the purposeful distortions in validation images, also called CAPTCHAs. This is no more than a tussle between technology makers and hackers. In the process, OCR has constantly improved. However, not much progress has been made in character recognition of non-Latin scripts. This has made it difficult to digitize valuable classics composed in medieval Chinese, Tamil, or Arabic, to name a few.

The case of video compression is an extension of image compression. While image compression exploits spatial redundancy, video compression exploits that plus temporal redundancy. Redundancy is particularly high in scenes such as news reads or interviews where motion is limited to head movements and gestures. Backgrounds rarely change in these scenes. Videos are pictures in sequence. Just as images are processed in spatial blocks, videos are processed in temporal _frames_. Frames played in quick succession give the impression of continuous and smooth motion. When successive frames are largely similar, compression is achieved by encoding the differences rather than sending original pixel values. Delving into the details, the scheme is much more interesting than that.

Video encoders look beyond just frame differences. They first apply something called _motion compensation_. Given a block of pixel data, encoders try displacing this block within a limited neighbourhood to see if differences with the previous frame can be minimized. Suppose a company has moved its office to the next block on the same street; it is easier to inform regular customers "next block" than to give the full address. Motion compensation is something like that. Its goal is to remove as much redundancy as possible by specifying a displacement vector and thereby achieve greater compression. Motion compensation doesn't work well for fast-moving scenes such as a soccer match. It doesn't work at all when scenes suddenly shift, something that happens very often in music videos or action-packed movies. In such cases, the frame is compressed as a still image, not unlike JPEG compression. These _intraframes_ also find a regular place in any digital video sequence. Intraframes allow viewers to quickly fast-forward or rewind digital video as desired.
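
A toy block-matching search conveys the idea. In this sketch, a bright square moves one pixel to the right between two tiny invented frames, and the encoder finds the displacement that minimizes the sum of absolute differences (SAD), one common matching cost:

```python
# Find the displacement of a block that best matches the previous frame.
def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b)
                          for x, y in zip(row_a, row_b))

def extract(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def best_displacement(prev, block, top, left, size, search=1):
    """Try displacements in [-search, search] around (top, left)."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            candidate = extract(prev, top + dy, left + dx, size)
            cost = sad(block, candidate)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

prev_frame = [[10, 10, 10, 10],
              [10, 90, 90, 10],
              [10, 90, 90, 10],
              [10, 10, 10, 10]]
curr_frame = [[10, 10, 10, 10],        # the bright square has moved
              [10, 10, 90, 90],        # one pixel to the right
              [10, 10, 90, 90],
              [10, 10, 10, 10]]

block = extract(curr_frame, 1, 1, 2)
print(best_displacement(prev_frame, block, 1, 1, 2))  # (0, 0, -1): perfect match
```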

The pioneer among video compression standards was ITU-T H.261, released in 1988, ITU-T being an organization that had evolved from the earlier CCITT. H.261 established many concepts that found acceptance in subsequent standards. At the same time, H.261 borrowed many of the principles of JPEG compression. From the ideas of H.261 was born the MPEG-1 video standard that became popular in the early 1990s. Like its counterpart JPEG, MPEG is named after the committee that standardized it: the _Moving Picture Experts Group_. With quality as good as VHS video, it operated at rates low enough to fit an entire movie on a couple of _Video Compact Discs (VCD)_. VCD quality was decent and initially good enough for a public used to old-style analogue video. Whether the public needed better quality at that point is debatable, but engineers saw that there was much untapped potential in digital video. The revolution was just around the corner. Meanwhile, VCDs ruled for a few years. In developing countries, piracy of VCDs became widespread. Going digital had proved once more, in a rather unexpected fashion, that it was indeed easier to copy and distribute bits.

The revolution did come and the most successful video compression standard to date was released in 1994. That the new standard, called MPEG-2, had launched a revolution was not in doubt. The standards body ISO/IEC was awarded the 1995-1996 Engineering Emmy for Outstanding Achievement in Technological Development for its development of the MPEG and JPEG standards. While MPEG-2 was of a higher quality than its predecessor, it came with a higher bandwidth requirement, otherwise called _bit rate_ in digital terminology. This was an agreeable trade-off for digital broadcasters, who quickly adopted MPEG-2. MPEG-2 is now standard for satellite TV and cable TV. Soon _Digital Versatile Discs (DVD)_ displaced VCDs for the distribution of digital videos.

The wonder of MPEG-2 is its flexibility to cater to different delivery channels. DVD videos are of the highest quality and so are videos broadcast on cable TV. The same MPEG-2 format can be used to create videos of lower sizes, frame rates, and hence bit rates to suit video streaming on the Web. Videos with small footprints are matched for delivery over wireless channels of limited bandwidths for display on mobile devices. For example, a high quality video of 10 Mbps is clearly unsuitable for web streaming since only a small fraction of users have access to such a high bandwidth. A lower figure of 2 Mbps is more suited for the Web. This is usually achieved by reducing the video resolution (320 x 180) and encoding at a lower frame rate of 15 frames per second (fps), which is just about the lower limit to give flicker-free video for scenes of average motion. Strictly speaking, some of the advances towards lower bit rate video have come from the descendants of MPEG-2. These have found support in high-end entertainment devices including the Xbox 360, PlayStation Portable, iPod, and iPhone.

It was natural for technologists to ask if the industry should slowly get rid of analogue video and migrate completely to digital. Unfortunately, this question was asked only in the 1990s when it should have come two decades earlier. Despite the success of digital technology in PCM/TDM voice systems and the rise of digital telephony, adoption of digital video did not come so easily. The success of analogue television had polarized engineering thinking in terms of such hard metrics as frame rates and resolutions, when they should have looked to the flexibility of bits and bytes. Thus, it happened that when the Japanese started thinking of better quality video at higher resolutions, they looked to analogue video. This was the beginning of _High Definition Television (HDTV)_. The Japanese called it Hi-Vision and it was analogue. Europe and the US followed suit and took the analogue route. What came out after years of research was an inflexible analogue solution that very few could afford. Worse still, the European standard was different from the Japanese one. The FCC in the US, fearing Japanese dominance in their markets, started to look towards formulating its own analogue standard.

Meanwhile, DCT had been invented. JPEG and MPEG standards had proved their worth. Computers and memories had become cheaper and faster. The necessary computational power that was needed for digital compression had become available. As they say, better late than never. The analogue programmes were ditched and the tide was turned in favour of digital HDTV. What MPEG-2 really achieved was to prove beyond doubt that digital video was the way to go for the future. Analogue video was too outdated for the modern public that hungered for more channels. It was inefficient for broadcasters who wanted to squeeze more content into a fixed bandwidth. Digital video, compressed by the sophistication of MPEG-2 encoders, made this happen. An uncompressed HDTV video at 1280 x 720 resolution and sampled at 30 fps runs at a bit rate of 650 Mbps. To store a two-hour movie at this rate would require 570 GB of memory or nearly 140 DVDs. When compressed, the same video runs at 10 Mbps and fits into a couple of DVDs.
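
Those figures are simple to check, assuming 24 bits per pixel and noting that the text's numbers are rounded:

```python
# Back-of-the-envelope check of the uncompressed HDTV figures above.
width, height, bits_per_pixel, fps = 1280, 720, 24, 30

bit_rate = width * height * bits_per_pixel * fps   # bits per second
print(bit_rate / 1e6)                              # ~663.6, i.e. roughly 650 Mbps

movie_bytes = bit_rate * 2 * 3600 / 8              # a two-hour movie
print(movie_bytes / 1e9)                           # ~600 GB uncompressed

compressed_bytes = 10e6 * 2 * 3600 / 8             # the same movie at 10 Mbps
print(compressed_bytes / 1e9)                      # 9 GB: a couple of DVDs
```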

Since the industry was heavily invested in analogue technology, broadcasters and users needed time to make the switch. In the transition period, unused frequency spectrum was used to broadcast video programmes in HDTV while the analogue transmissions of the same programmes continued in parallel. This was termed _simulcast_. Those who had analogue systems and picked up transmissions off the air received analogue video. Those who had subscribed to HDTV channels usually upgraded to an HDTV set; if they hadn't, set-top boxes converted HDTV transmissions for their analogue displays. The reverse was more common. Users who had purchased HDTV sets of say 1280 x 720 resolution could watch almost any type of digital video. Channels that were in the standard resolution of 640 x 480, called _standard-definition television (SDTV)_, were enlarged automatically to fit the display resolution of 1280 x 720. Channels that were programmed in HDTV 1920 x 1080 resolution would be subsampled to 1280 x 720 by the television set. This flexibility to scale demonstrates the power of digital technology. Broadcasters have full flexibility to determine how they want to programme channels or how many channels to squeeze within the allotted bandwidth. They could choose to squeeze in many SDTV channels. If premium quality sports coverage is in demand, they could instead choose to transmit one HDTV channel for sports and fewer SDTV channels for other programmes.

The conversion between analogue and digital created one key problem. When television first came in the 1920s, it followed the format of 35-mm photographic film. What engineers call _aspect ratio_ is the ratio of the width to the height of the image display. Analogue television had an aspect ratio of 4:3. As television became popular, movie studios wanted to recapture consumer interest and entice viewers back to the cinemas. Through the 1950s, movies were produced in wider formats under various commercial names—Cinemascope, Panascope, Techniscope, or Warnerscope. These wider formats were not necessarily in the same aspect ratio. It was clear though that HDTV wanted to approach the viewing experience of cinematic productions. HDTV had to be vastly better than the analogue transmissions of the day. J. A. Flaherty, who had been part of the advisory committee that evaluated technical proposals for HDTV, commented in a lecture of 1985,

As we evaluate tomorrow's TV and HDTV and plan for its implementation, we must bear in mind that today's 'standard of service' enjoyed by the viewer will not be his 'level of expectation' tomorrow. 'Good enough' is no longer 'perfect,' and may become wholly unsatisfactory. Quality is a moving target, both in programs and in technology. Our judgements as to the future must not be based on today's performance, nor on minor improvements thereto.

Based on the research of Kerns Powers of the David Sarnoff Research Center in Princeton, the aspect ratio of HDTV was fixed at 16:9. This is the reason why today, when HDTV or widescreen movies are displayed on analogue TV sets of 4:3 ratio, black bands appear at the top and bottom. What is worse is that often movies are clipped off at the right or left so that they can fit within the 4:3 ratio. This is annoying when important scene details are removed, even when the shot is panned before being clipped. Worse still are widescreen videos stretched vertically to fill up a 4:3 display screen. This distorts scenes, but surprisingly people put up with them, possibly because women flatteringly appear slimmer and men taller. A more acceptable case is to view SDTV programmes on an HDTV display. Again, the picture could be distorted if the settings on the TV set are wrong, this time not in any flattering way. More commonly, black curtains appear on the sides of the screen and the video is suitably displayed in its original 4:3 format.

These problems plague computer users as well. These days it has become common to use widescreen LCD displays as computer monitors. While this is right for viewing HDTV movies from DVDs, it results in distortions when we are browsing the Web or composing digital artwork. A circle will appear as an ellipse, a square as a rectangle. The user experience becomes an effort because the brain has to work harder to adjust to the change in perception. Even the latest technology, when improperly used, does more harm than good.

Interworking of SDTV and HDTV Formats

Programmes in standard TV formats when viewed on HDTV screens can result in either curtaining (c) or image distortion (e). Likewise, programmes in HDTV format when viewed on standard displays can result in either letterboxing (b) or image distortion (d). Figures (a) and (f) show the best match between transmission format and display format. Note that transmission in standard format implies that extreme ends of the scene are not available. Source: Christofer Andersson on Flickr.

Apart from aspect ratio, a similar issue of compatibility and preference occurred between TV broadcasters and computer makers. TV sets used a form of line display called _interlacing_. In interlacing, odd-numbered lines are displayed first, followed by even-numbered lines. This was done to save on analogue bandwidth. Computer monitors used _progressive_ scanning instead, by which all pixels of the frame are drawn in a single pass. Progressive scanning is better, particularly for videos with considerable vertical movement or fine details. Since HDTV was digital, the computer industry pushed for the progressive scan method. TV broadcasters, caught up in their legacy systems, argued for the interlaced method. In the end, the HDTV standard catered for both. Creators of Internet videos mostly realize these differences and are careful to deliver videos in progressive format. Elsewhere, it is common these days to find HDTV sets as well as programme broadcasts in 1080i or 720p. The ultimate in user experience is 1080p resolution, but since this takes up twice the bandwidth of 1080i, broadcasts in this format are less common and so are the TV sets.

What has been ignored thus far is the role of audio. MPEG and HDTV are not just about digital video but also about sound that goes with it. Starting from the days of PCM, digital speech has a much longer history than digital video. In fact, basic techniques of speech compression played a key role in influencing image and video compression.



**During the years** when Alec Reeves had been designing PCM with the aim of improving quality, there were some engineers who had a completely different focus. They ignored the problem of speech transmission for the moment and studied the source itself. They began to analyse the manner in which humans talked. They noticed that when we talked, we also listened in between to the voice at the other end of the line. This meant that human speech was often interspersed with pauses. Therefore, the idea was put forward that these pauses could and should be removed so that precious bandwidth could be utilized more efficiently. While the focus of Reeves had been to reduce noise by expanding signal bandwidth, the removal of pauses enabled a reduction of the average bandwidth per speech call. Some of the earliest published work on this came from engineers at Bell Labs.

A. C. Norwine and O. C. Murphy published experimental results in a 1938 paper in which they defined such esoteric terms as talkspurt, double talking, resumption time, response time, and lockout. What is remarkable about their experiments is that they were not simulated conversations in the laboratory. Rather, they rigged up recording and measurement equipment on the New York to Chicago line and collected data on actual telephone conversations. Technology often gives engineers a foothold to improve and optimize systems. In other words, technology builds upon itself by giving not just operational systems but also tools to analyse those systems. By such experimental research, engineers found that on average a talkspurt—a continuous segment of speech without pauses—lasts only about four seconds before the speaker gives the other party a chance to talk. A maximum talkspurt of 144 seconds was also recorded, an indication that the speaker might have been either a habitual bore or a great orator.

The first transatlantic telephone cable, named TAT-1, came into operation in 1956. It carried 36 speech calls, each with a bandwidth of 4 kHz. This figure came from an understanding of human speech production and hearing sensitivity. Everyday speech, as opposed to the high-pitched renditions of operatic performers, is essentially limited to 4 kHz. Further analysis of aural perception showed that our hearing is most sensitive to frequencies in a limited range of 800-3000 Hz. Telephone engineers exploited this to reduce channel bandwidth to 3 kHz, allowing twelve more calls to be carried on TAT-1. While this introduced some loss of fidelity, speech was still intelligible. This was the first form of lossy speech compression. Later, engineers following in the tradition of Norwine and Murphy discovered that during conversations pauses account for as much as 60% of call time. This enabled them to increase the call-carrying capacity of TAT-1 to 72 calls. Physically, the TAT-1 cable allowed 48 calls, each with 3 kHz bandwidth; yet engineers had managed to multiplex 72 calls.

The concept that made this possible is today called _statistical multiplexing_. When a pause occurred in an active call, the channel was quickly reassigned to another call that was just getting into active talking mode. Statistical analysis showed that it was extremely rare for more than 48 calls to be in talking mode at the same time when 72 calls were ongoing. In fact, research showed that we talk only 40% of the time during a telephone call. The challenge in implementation lay in accurately detecting the transition from talkspurts to pauses and vice versa, and in switching channels as quickly as possible. Achieving this was far from simple. Line echoes had to be almost eliminated so that pauses could be detected correctly. Pauses had to be distinguished from whispers. The method found successful application on the TAT-1 cable from 1959, a delay of two decades after the initial work of Norwine and Murphy. This delay is easily explained in the same way we explained the delay in PCM/TDM. Statistical multiplexing was later applied to TDM and named _Statistical TDM (STDM)_. In the subsequent years, many papers were published on statistical multiplexing and greater understanding was gained through the 1960s.
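The arithmetic is easy to check for oneself. Below is a minimal Monte Carlo sketch in Python, using the figures quoted above (72 calls, 48 physical channels, talkers active 40% of the time); the trial count is an arbitrary choice.

```python
import random

# A minimal Monte Carlo sketch of statistical multiplexing:
# 72 ongoing calls, each talking independently 40% of the time,
# multiplexed onto a cable that physically carries 48 channels.
CALLS, CHANNELS, TRIALS = 72, 48, 1_000_000

overloads = 0
for _ in range(TRIALS):
    talking = sum(random.random() < 0.4 for _ in range(CALLS))
    if talking > CHANNELS:
        overloads += 1

# With a mean of about 29 active talkers, exceeding 48 is roughly a
# 4.7-sigma event; the printed rate is on the order of one in a million.
print(f"overload fraction: {overloads / TRIALS:.6f}")
```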

At the heart of all compression is the idea of removing redundancy, which is simply correlation between source symbols. In other words, given a symbol, one gets some idea of the next symbol. Suppose a weather station on a remote island is required to measure temperature every minute and send this data to a meteorological centre. It is unlikely that successive measurements will change by any significant amount. In this sense, the data is correlated. However, temperature may change over a longer period of, say, one hour. It is the same with speech, except that the time scale is on the order of milliseconds. While the sampling rate is designed to accommodate the highest frequency, a typical conversation mostly contains mid-range frequencies. In this typical range, one voice sample is correlated with the next. It is therefore more efficient to encode the differences between successive voice samples. Differences have far less correlation, less redundancy, and hence compress well. It is from here that the idea of _Differential PCM (DPCM)_ was born.

Dutch researchers at the Philips Company were the first to propose DPCM, in a French patent of 1951 that had been filed two years earlier. Cassius C. Cutler of Bell Labs filed for a similar patent in 1950, which was granted in 1952. Cutler noted that one could quantize voice samples and take their differences; or take sample differences first and then quantize the differences. Ingeniously, Cutler combined these two approaches in such a way that it simplified the decoding procedure and avoided accumulation of quantization errors. The solution was to quantize and transmit the difference between the current sample and the quantized form of the previous sample. This method is now standard in modern implementations of DPCM. Cutler saw the advantage of his proposal:

It is a characteristic of differential type systems in accordance with the invention that errors of quantization are not cumulative since a quantization error made on one sample is subtracted from the next sample and thereby tends to be corrected in the next quantization so that there is effectively no cumulative error.

This observation of Cutler's is reminiscent of Harold Black's negative feedback amplifier. Indeed, DPCM operates on the principle of negative feedback. The quantized form of a sample is in fact constructed, or predicted, from previous quantized samples. Even if the predictor does a less than perfect job, the difference is given due importance, quantized, and transmitted on the channel. Thus, the only error the system suffers is quantization noise, not prediction error. This result comes directly from Black's negative feedback principle. Of course, poor prediction means that the encoder achieves less compression than it otherwise could.
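A small sketch may make Cutler's feedback loop concrete. The code below is illustrative, not any standard's algorithm: the encoder quantizes the difference between the current sample and the previously *reconstructed* sample, so the decoder can mirror it exactly and quantization errors never accumulate. The step size is an invented value.

```python
# A minimal sketch of DPCM in Cutler's form: quantize the difference
# between the current sample and the reconstructed previous sample.
STEP = 4

def quantize(d):
    return STEP * round(d / STEP)       # a uniform quantizer

def dpcm_encode(samples):
    codes, predicted = [], 0            # predictor: last reconstructed sample
    for s in samples:
        d = quantize(s - predicted)     # only the difference is transmitted
        codes.append(d)
        predicted += d                  # encoder mirrors the decoder exactly
    return codes

def dpcm_decode(codes):
    out, predicted = [], 0
    for d in codes:
        predicted += d
        out.append(predicted)
    return out

samples = [0, 3, 9, 18, 25, 27, 26, 20, 11, 5]
print(dpcm_decode(dpcm_encode(samples)))
# Every decoded value stays within STEP/2 of the original; the error
# never grows from sample to sample, exactly as Cutler observed.
```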

DPCM thus marked a milestone in the history of data compression: differential encoding had attained an important status. Its impact was later felt outside the domain of speech compression. JPEG and MPEG removed spatial correlation by differentially encoding DC coefficients of neighbouring blocks. MPEG also differentially encoded motion-compensated blocks, thus removing temporal correlation. Even within speech compression, DPCM inspired other differential methods including _Delta Modulation (DM)_. DPCM remained the state of the art in speech compression through the 1960s. When the next decade commenced, engineers started to wonder what would happen if DPCM could be made adaptive.

The very definition of adaptive was imprecise in the beginning since different engineers approached the problem differently. The efficiency of DPCM was directly linked to how well the encoder predicted the sample from a weighted sum of previous samples. Questions that troubled engineers were many. What was the best way to choose the weights? Could weights be adapted from sample to sample for better prediction? How many previous samples should the encoder consider? Given short-term and long-term statistics of the source, could the prediction model adapt itself? It was obvious to all that the problem was no different from Norbert Wiener's prediction theory applied to gunfire control during the early 1940s. It was then, in 1973, that P. Cummiskey, N. S. Jayant, and J. L. Flanagan of Bell Labs redefined the meaning of adaptive.

Adaptive didn't have to mean better prediction. Adaptive could be as simple as changing the quantization steps and thereby minimizing quantization noise. This was a new way of looking at an old problem, and far simpler than adaptive prediction. In fact, this too is reminiscent of Harold Black's work on feedback amplifiers. Black had accepted the non-linearity of devices and removed its effects by redesign. In the same manner, Cummiskey and company accepted that prediction could be inaccurate. Instead of trying to fix the prediction problem, they looked for the best way to minimize quantization noise. While PCM operated at a bit rate of 64 kbps, DPCM had brought this down to 48 kbps. _Adaptive Differential PCM (ADPCM)_ reduced the bit rate to 32 kbps, with potential for further reduction. What this really meant was that traditional PCM/TDM systems could now carry twice the number of voice calls on the same cabling with minimal loss of call quality, the loss being mostly in the higher frequency range. ADPCM is now commonly used in voice mail systems and answering machines.
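The adaptive-quantization idea can be sketched in a few lines. The step multipliers below are illustrative placeholders, not the values of any standardized ADPCM codec; the point is that both ends adapt the step from the transmitted codes alone, so no side information needs to be sent.

```python
# A hedged sketch of adaptive quantization in a differential loop:
# large codes suggest the step was too small, so grow it; small
# codes suggest it was too large, so shrink it.
def adpcm_encode(samples):
    codes, predicted, step = [], 0.0, 2.0
    for s in samples:
        code = max(-4, min(4, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step        # encoder mirrors the decoder
        step = max(0.5, step * (1.6 if abs(code) >= 3 else 0.9))
    return codes

def adpcm_decode(codes):
    out, predicted, step = [], 0.0, 2.0
    for code in codes:                  # same adaptation rule as encoder,
        predicted += code * step        # so no side information is needed
        out.append(predicted)
        step = max(0.5, step * (1.6 if abs(code) >= 3 else 0.9))
    return out

ramp = [0, 2, 7, 18, 40, 70, 95, 100, 90, 60]
print(adpcm_decode(adpcm_encode(ramp)))  # tracks the input approximately
```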

Human speech is something of a paradox. It is difficult to express precisely what we feel. It is difficult to describe our myriad emotions—anxiety, fear, disappointment, or elation. Language seems a poor proxy for emotion. Pages filled with words may mean less than a smile or a sigh. Despite our best efforts with words, the source encoder always claims that we talk too much. Josh Billings once said, "I don't care how much a man talks, if he only says it in a few words." However, human speech, being a product of thought, culture, language, and relationships, always has redundancy built into it. No matter how few words we use, the speech encoder always finds a way to compress them. Even if we were to speak gibberish, it would be compressed. The reason is that speech encoders operate at the lowest level of bits. They don't work at the level of languages and meanings. Some engineers then broke away from conventional thinking and began to consider the intermediate level of sounds.

Human speech is conceived in the brain and modulated by emotions. What the listener hears is a mechanical representation of the speaker's thoughts and emotions. In between lie the mechanics of sound production—the vocal tract, the vocal cords, the larynx, the pharynx, the lips, and the tongue. Engineers began to wonder if speech could be compressed by studying speech production. Engineers thus became part-time biologists and speech therapists. From here was born _Linear Predictive Coding (LPC)_. In fact, the idea went beyond compression. If engineers could construct speech from an understanding of its components, they could make machines talk. This would also help the handicapped. The quest to produce human speech artificially is older than telephony itself. In fact, the early work of Helmholtz, Kempelen, and Wheatstone had inspired Bell to invent the telephone.

The antecedent of LPC was the _voice coder_ or _vocoder_. This goes back to the days before SIGSALY, the wartime voice communication system. Homer W. Dudley of Bell Labs came up with the novel idea of filtering speech into multiple frequency bands. In each band, the waveform is characterized and its parameters are transmitted. Speech could thus be compressed to within 400 Hz. Dudley did not talk about bit rate because the time for bits and digital transmission had not yet come; his system was FDM. Dudley's vocoder found use in SIGSALY, which soon combined elements of Reeves' PCM encoding. Dudley remains all but forgotten, his name known to only a few. Yet the impact of his work is enormous. Not only was it essential to SIGSALY and the victory of the Allies, it also became the starting point of modern LPC compression methods. Mobile telephony, teleconferencing, Skype calls over the Internet, and secure telephony use LPC because of its low bit rates.

Japanese researchers Fumitada Itakura and Shuzo Saito were the first to write about LPC, in a 1968 conference paper. The idea of LPC is easily understood by studying any musical instrument. Suppose a soloist plays a Vivaldi number on his violin. One could either record the performance and play it back; or input the notes and their precise timing into a computer program so that the computer could reconstruct the performance. The latter approach is LPC when applied to speech. Engineers analysed speech in terms of its components and classified them. By combining these components in different ways, distinct sounds could be produced. This general approach is thus known as _analysis-synthesis_.

Engineers saw that the basic unit of speech was not the syllable. It was something more fundamental. They called this the _phoneme_. All sounds came from phonemes. They identified 75 phonemes in all. For example, phonemes _b_ and _m_ both involve lip closure, while only the latter involves a nasal sound. Phoneme _i_ does not involve the lips. But phonemes alone do not complete the picture. For example, the phoneme _t_ can be used in so many different ways—tea, tree, steep, city, it'll, or beaten. Thus, when phonemes combine with other phonemes, the spectral characteristics are modified. Engineers termed these phonetic segments _allophones_. Importantly, speech could be broadly classified into _voiced_ segments and _unvoiced_ segments. The former are quasi-periodic and contain most of the signal energy. The latter are random and spread over a broad spectrum of frequencies. This distinction between voiced and unvoiced segments helped engineers synthesize speech. By analysing speech in this manner from its basic units and building up slowly into higher units, engineers came to understand speech mathematically. They discussed these intricacies at conferences and in journals in such esoteric terms as prosody, diphones, morphemes, resonances, and formants.

LPC compresses speech by providing the decoder with parameters from which speech samples can be reconstructed. Instead of transmitting sampled speech, the encoder transmits the parameters of a speech model. Dictionaries are used so that parameters can be indexed and looked up rather than sent in full. Owing to this reliance on speech models and parameters, LPC comes under the category of _model-based compression_. The fact that LPC can compress speech to as low as 4 kbps makes it attractive in environments where quality is not the prime concern. Fair-quality LPC speech operates at about 16 kbps. Even at this quality, there is a problem.
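The "linear predictive" core of LPC can be illustrated with a toy analysis step. The sketch below simply fits predictor coefficients to one frame by least squares; real codecs use the autocorrelation method with the Levinson-Durbin recursion and add pitch and voicing parameters on top. The frame, order, and sampling rate are invented illustration values.

```python
import numpy as np

# Fit coefficients that predict each sample from the previous `order`
# samples, by least squares over one short frame of speech-like signal.
def lpc_coefficients(frame, order=8):
    rows = [frame[n - order:n][::-1] for n in range(order, len(frame))]
    A = np.array(rows)                  # past samples, newest first
    b = frame[order:]                   # samples to be predicted
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

# A toy "voiced" frame: a decaying oscillation sampled at 8 kHz.
t = np.arange(240) / 8000.0
frame = np.exp(-t * 300) * np.sin(2 * np.pi * 500 * t)

a = lpc_coefficients(frame)
pred = np.array([frame[n - 8:n][::-1] @ a for n in range(8, len(frame))])
residual = frame[8:] - pred
print("residual energy / signal energy:",
      float(np.sum(residual**2) / np.sum(frame[8:]**2)))
# The ratio is tiny: almost everything is captured by a few coefficients.
```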

Given the score, a Vivaldi performance by an expert violinist would most likely differ from one played by a computer. The computer does exactly what it is programmed to do. It lacks the emotion that's essential for any musical performance to come alive. It will therefore not surprise us that speech constructed by LPC sounds artificial and synthetic. Lack of naturalness means poor user experience. It may even be difficult to recognize the speaker at the other end of the line. This is the price one pays for high compression and reduced bandwidth. Recent research in this area has looked to combine the high compression of LPC with the higher quality of ADPCM in a new generation of _hybrid coders_. Hybrid coders have drawn inspiration from techniques used in image compression and applied them to speech. KLT, DCT, and sub-band coding have achieved fair to good quality speech at 9.6 kbps. This comes at an increased complexity and cost of encoders and decoders. To this day, good quality speech at 2.4 kbps remains the holy grail of speech compression.

Most people like to talk more than they like to listen. Perhaps this is why engineers focused more on how we talk than on how we listen. In reality, speech technology came first because it is simpler. Speech is limited to 4 kHz. Human hearing has a much higher range, right up to 20 kHz. This includes not just human speech, but music and all sorts of environmental sounds, from the hoot of an owl to the high-pitched wail of an ambulance siren. From the outset, telephone networks had been designed for speech, not audio. This is why, when a receptionist puts us on hold to the sound of elevator music, it doesn't sound all that great. While a lot of work had been done on speech compression, progress in audio compression was limited even by the start of the 1980s.

Sometime during the 1970s, Professor Dieter Seitzer of Erlangen-Nuremberg University in Germany started looking into the problem of compressing speech. His timing was perhaps not right because the telephone industry was changing. Through the course of the decade, telephone companies started looking into fibre optic cables, which had the potential to carry thousands of voice calls. Facsimile in analogue form had become a success, and work was underway on standardizing digital facsimile. At the same time, telephone companies thought of making speech digital all the way to the end subscriber. While much of the telephone network had moved to digital transmission, at the customer premises speech was still analogue. An initiative was launched to define a new standard for digital services of all types to homes and businesses: the _Integrated Services Digital Network (ISDN)_. A basic data rate of 128 kbps was to be made available to the customer. Suddenly, speech compression didn't appear so important. The professor therefore turned to audio compression instead.

For years, the public had listened to quality music in analogue form. Even today, audiophiles swear by rare vinyl LPs, even though each LP holds less than half an hour of music per side. When music was digitized in the form of the _Compact Disc (CD)_, it was a revolution. In 1982, Billy Joel's _52nd Street_ became the first album to be released on CD. When the Japanese company Sony released the _Walkman_ in 1979, music became portable. With its successors, the _Discman_ and the _MiniDisc_, digital music too became portable. The quality of CD storage is truly remarkable, with an SNR of about 90 dB at 32 bits per stereo sample. The downside of CD-quality music is that the bit rate is a high 1.4 Mbps, so a CD could hold only 74 minutes of uncompressed audio. Meanwhile, the Internet had been growing since the 1970s. It was natural to look for better ways of distributing music. The idea was to compress audio without sacrificing quality. After all, it was not far-fetched to visualize that someday people might download music directly to their computers or want to save thousands of songs in a personal library of their own.

When a research team was formed in 1987 between Erlangen-Nuremberg University and the Fraunhofer Institute for Integrated Circuits, their vision was not quite so futuristic. With European research funding, their modest goal was to come up with a new audio format for Digital Audio Broadcasting (DAB). One of Seitzer's PhD students, Karlheinz Brandenburg, completed his doctoral thesis on what he called _Optimal Coding in the Frequency Domain (OCF)_. In his invention, he employed ideas from psychoacoustics. Just as engineers had studied sound production not long before, Brandenburg studied the manner in which humans listen. In Brandenburg's own time, psychoacoustics was not really a novel subject. Study of the human auditory system and the way we perceive sound had been going on for decades. As early as 1876, the year of the telephone's invention, John Strutt (Lord Rayleigh) had published a study on how we perceive the direction of sound. Then in 1931, A. D. Blumlein obtained a British patent on stereophonic sound. This research continued on and off right up to the late 1970s, but no one really thought of applying psychoacoustics seriously to audio compression. Brandenburg was among the early researchers to do just that. Thus was born one of the most famous audio codecs to date, _codec_ being a short form for coder-decoder. It became a standard in 1992 as part of MPEG-1, as Layer 3 audio. It is more commonly known today as MP3.

The principles of psychoacoustics are not that difficult to understand. The human ear's sensitivity to sound varies with frequency. Essentially, human hearing is most sensitive in the range of human speech. Charles Darwin might put this down to evolution. While we may hear a whisper at 1 kHz, sound at 100 Hz would have to be 30 dB higher for us to perceive it. Moreover, a tone at 1.1 kHz that's 10 dB lower than another at 1 kHz will be masked due to close proximity in the spectral domain. The same is true in the time domain when a louder sound can mask the presence of a whisper even when the whisper is an octave higher. For that matter, the encoder could even discard the entire audio spectrum beyond 16 kHz and the distortion would not be noticeable in most cases. Coding that made use of such characteristics came under the general heading of _Perceptual Audio Coding (PAC)_.

Brandenburg saw that if audio samples could be separated into many fine sub-bands, then each sub-band could be quantized differently to suit the perceptual characteristics of human hearing. Where sensitivity was high, more bits could be allocated. Where sounds were masked, fewer bits could be used through coarser quantization. Though this introduced quantization noise, it really didn't matter because our auditory system didn't perceive it. In some sense, this was like a visitor perceiving a room as clean and tidy even when the dirt lay under the carpet and the menagerie was stuffed into the closet.
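A toy sketch of this bit-allocation idea follows. The signal levels and masking thresholds are invented numbers, not a real psychoacoustic model; the allocation is driven by the common rule of thumb that each extra bit buys roughly 6 dB of quantization SNR.

```python
# Give each sub-band only as many bits as its signal-to-mask ratio
# demands; a band whose signal lies below its threshold gets nothing.
subbands = [
    # (centre frequency Hz, signal level dB, masking threshold dB)
    (200,   60, 35),
    (1000,  70, 20),   # the ear is most sensitive here: low threshold
    (3500,  55, 30),
    (10000, 40, 45),   # signal below threshold: inaudible, zero bits
]

for freq, signal_db, mask_db in subbands:
    smr = signal_db - mask_db              # signal-to-mask ratio
    bits = max(0, round(smr / 6))          # ~6 dB per bit rule of thumb
    print(f"{freq:>6} Hz: SMR {smr:>3} dB -> {bits} bits per sample")
```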

At the foundation of perceptual audio coding is the fact that quality cannot be measured with such hard metrics as SNR. Quality is defined by the way we perceive sound, or fail to perceive its imperfections. For decades before MP3, engineers had known this and had therefore defined subjective measures of quality. Trained listeners rated quality on a scale from excellent to bad. Such ratings were used to compare different compression methods. These subjective measures necessarily refer to human perception and experience. Human perception involves not just our senses but also the manner in which the brain processes information. The fundamental limits, idiosyncrasies, and filtered selectivity of human perception are at the heart of lossy compression. Such compression, being tuned to humans, quite likely won't work for others. Bats and dolphins communicate in a different aural spectrum. Cats are said to be colour blind. Many other species may have a higher visual sensitivity to the reds and the blues rather than the greens. The less we say of aliens, the better. Therefore, if non-humans were ever to listen to MP3 songs or look at JPEG images on the Web, they might not appreciate them quite as much as we do. They may perceive a lot more distortion than we do.

MP3 was not the only codec to exploit psychoacoustics. One other famous codec that used it came from Dolby Laboratories, a firm that had been involved in high-end audio research since the 1960s. In 1991, digital audio was launched by Dolby under the name Dolby AC-3, or Dolby Digital 5.1. This is now standard for audio recordings in movies, video games, DVDs, and HDTV. While Dolby AC-3 offers higher quality, it also achieves less compression. Hence, for most everyday purposes, MP3 is the preferred method for carrying our favourite songs on the move. It is now common to carry 10,000 songs compressed at high quality in just 40 GB of memory. In the days before MP3, carrying 10,000 songs would have required some 700 audio CDs.

The success of MP3 is now well known to most people who have ever downloaded songs from the Internet. Ironically, MP3 didn't become the standard for DAB, for which a lower-complexity codec was selected. Brandenburg, now known as the father of MP3, later recalled that when he first heard Suzanne Vega's _Tom's Diner_ drifting down the office corridors, he knew he had found the perfect song for testing the codec he had invented. By 1999, Internet searches for "MP3" had toppled "sex" from the top spot. For all its success, MP3 is not a vinyl LP, a magnetic tape, an optical CD, or a DVD. It has no physical form, unlike most of its predecessors in the world of audio. MP3 is truly a product of the digital world. It is just software, a method of analysing audio and arranging bits in flexible ways. This flexibility means that it can be easily upgraded if desired. More likely, it can be just as easily pirated or displaced into obsolescence.

When PCM came out in the 1930s, it was revolutionary. It is still considered revolutionary, except that it is also seen as simplistic. Data compression has come a long way since PCM. At every stage in the progress of compression, it was felt that perhaps the effort was unnecessary. After all, with every passing decade hardware technology benefited from economies of scale—greater volume, falling production costs, and higher manufacturing efficiency. Memories and storage devices got cheaper. Transmission bandwidth increased due to more cabling and newer technology such as fibre optics. Yet compression finds a permanent place, and source coding remains as relevant today as in the mid-twentieth century. The reason is perhaps obvious. While memories got cheaper, the amount of data grew just as quickly. While transmission channels increased in number and capacity, so did the volume of traffic.

For compression and source coding, the goal has always been to approach source entropy by removing redundancy, just as Shannon had said in 1948; or to allow for a certain distortion deemed acceptable. For engineers working towards these goals, the journey has been long and hard. Today they are nearer to the goal than ever before. While these engineers had sweated over compression for many decades, some other engineers were strangely adding redundancy. After source coding engineers had compressed data to fewer bits, these others purposefully added extra bits before either storing them on a DVD or transmitting them on the channel. It seemed that engineers working on information theory were split into two opposing factions. It turns out that the world is big enough for both. It turns out that both are just as essential for digital communication.

# 0110 Reaching for the Limit

**Somewhere in the** Dravidian heartland of South India, a fruit seller is busy with his oranges. He meticulously places them such that each orange touches its neighbours for a compact arrangement. When he is done with one layer, he builds a second layer balanced within the little voids of the layer beneath it. Within half an hour, he has thus built an attractive pyramid of fresh juicy fruit, just the kind of enticement for folks who would soon battle the midday heat of an Indian summer.

Somewhere in the cold expanse of the Solar System, a spacecraft is cruising effortlessly through the endless void. It is an itinerant wanderer that travels without a destination. It challenges its own limits of knowledge and knows no boundaries. It captures all that it sees and beams the images back to earth. In these images of other worlds are the rings of Saturn and Jupiter's giant swirl of red clouds.

Somewhere closer home, a philosopher who had been living an isolated life for many years has just stepped out of his barrel. He learns of the progress that mathematicians have made. He begins to ponder the secrets of the universe by contemplating a number, a really large number: 808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000. The number looks random but the philosopher knows otherwise.

These seemingly unrelated scenes have something in common. To unravel their secret we need to set the context and start at a familiar place: telephony. By the mid-twentieth century, the telephone network was vast. Engineers began to ask how such a network could be leveraged to carry digital information. Telephone lines carried analogue speech within a defined bandwidth. These waveforms were baseband signals. _Baseband_ means that the signal is in its natural frequency spectrum. It has not been shifted to another band, as is commonly done in FDM carrier systems. Analogue modulations, including AM and FM, perform such shifts from baseband to carrier band. Engineers now asked themselves if digital information could somehow modulate analogue waveforms.

Technological progress is often the combining of existing elements in new ways. It is often the result of someone asking a "what if" question and indulging in an unconventional approach. With information becoming digital and modulation already mature, it was almost predictable that engineers would one day come up with _digital modulation_. It didn't require a leap of imagination. A digital modulator takes ones and zeros and converts them to analogue waveforms. A digital demodulator, being part of the receiver system, does exactly the opposite.

Among the early binary digital modulation schemes were _Amplitude Shift Keying (ASK)_ and _Frequency Shift Keying (FSK)_. In ASK, a one would be signalled as a cosine wave. To signal a zero, nothing would be sent. In this simple form, ASK is also known as _On-Off Keying_. More generally, it is known as _Digital PAM_. With FSK, the modulator toggles between two frequencies, each frequency representing one of the two binary states. A device could thus be attached between a computer and a telephone line so that digital information could be carried as analogue waveforms. The signals would make their way through the telephone network and eventually be converted back into and consumed as bits. The device that did the translation was the _data modem_.

What this modem did was to transform an ordinary telephone line into a medium that could carry bits. The modem took bits on one side and converted them into analogue waveforms on the other, and vice versa. The word modem is in fact a compressed term for what it does: **mod**ulate and **dem**odulate. It formed a bridge between two worlds, one analogue and one digital. With the coming of the modem, one thing was clear. Information had been liberated from the medium. It was now possible to move seamlessly between the two worlds.

The real reason the data modem became a success was that it came in an era that saw the birth of the Internet. Electronic mail, or email, was one of the Internet's first widespread applications. It would not have been possible without modems. By volume, data communications was nowhere close to voice communications, but it was growing. The modem was just what the public needed to send their precious bits on existing telephone lines. It was a timely solution to an immediate need.

Modulation, whether analogue or digital, is as important to electrical communication as language is to everyday communication. It is with language that we articulate our thoughts. Just as there are many languages, so there are many methods of modulation. Modulation enables the transfer of information from one communicating party to another. The method of modulation depends on the medium in use—twisted copper wire pairs, wireless, or optic fibres. At the heart of all methods of modulation is the need to use resources efficiently and make communication effective.

The first modems weren't all that efficient. Though telephone bandwidth is nearly 3 kHz, an early modem released by AT&T in 1958 supported a mere 300 bps. It seems unbelievable today that a data rate that slow could actually be useful. Yet when CCITT standardized the V.21 modem standard in 1962, using FSK at 300 bps, this was the state of the art. Computers of the day were slower, less powerful, and certainly less hungry for communication bandwidth than today's counterparts. Devices for facsimile transmission over telephone lines, called _fax modems_, were finally beginning to see real demand from the public. They too settled happily for 300 bps.

While the concern of early modems was with sending bits over telephone lines, digitized speech transmission was already in operation in the early 1960s with the coming of PCM/TDM systems. Bits were sent on T1/E1 lines as baseband signals. The reason was that multiple circuits were time-multiplexed; they didn't need to be separated in frequency. There was no modulation in the usual sense, since bits were sent down the line as electrical pulses. Engineers called this _line coding_. T1 lines, for example, used Alternate Mark Inversion (AMI) while E1 lines used High Density Bipolar (HDB3) coding. The difference between line coding and modulation is subtle. The former is composed of positive and negative pulses of electrical current. The latter is a modification of an analogue carrier waveform. Line coding is therefore a true child of the digital world.

Though speech had been digitized through PCM/TDM, this change had happened inside the telephone network. Telephone lines coming to subscribers were still analogue and of limited bandwidth. After all, the network had been designed for voice and no one had foreseen the need to send data. Modems became necessary devices in this context. Engineers were required to design digital modulation schemes far more efficient than ASK or FSK.

It was therefore timely that in 1962 two engineers, Jim Cryer and Arthur Kohlenberg, founded Codex Corporation, based out of Newton, Massachusetts. The Cold War was on and the US seemed to be clearly lagging behind its rival in the Space Race. Under John F. Kennedy's leadership, the nation was galvanized into action. Defence budgets were boosted. Codex Corporation was just one among many to capitalize on the government's willingness to dish out fat dollars for suitable solutions. What Codex Corporation sold to the government is less important to the present discussion than what it achieved in the early seventies. It became the first company to commercialize high-speed data modems. High speed for that era meant 9,600 bits per second, staggeringly fast compared to anything that had come before it. To understand how Codex engineers managed this, we need to start with one of the simplest of digital modulations.

While ASK used amplitude to differentiate ones and zeros, FSK used frequencies. It occurred to engineers that signal phase could be used as well. For every bit that was a one, the modulator could transmit a cosine wave. For every zero, it could transmit a negative cosine wave. This type of modulation was termed _Binary Phase Shift Keying (BPSK)_. It was binary because the waveform had only two states, a positive or a negative cosine wave. More importantly, there was a phase relation between these two states: a negative cosine wave can be seen as a 180-degree shift of the positive cosine wave. Each wave was termed a symbol, just as Claude Chappe's symbols were written in the air with the regulator and indicators. In the case of BPSK, it was clear that the number of symbols transmitted per second was exactly the same as the number of bits per second. Engineers then looked to increase transmission rates.

The first approach was to increase the number of symbols per second. Nyquist had shown that symbols could be signalled at a rate of up to twice the channel bandwidth without suffering from ISI. Therefore, the limit of 300 bps was only superficial. It was a limitation of engineering. As modem hardware evolved through faster and smaller devices, engineers could improve on the symbol rate. The second approach was to see beyond the dual states of BPSK modulation. If modems could be made to distinguish more than two states, then it would be possible to improve efficiency.

In BPSK, symbol phase defines the state of modulation. Instead of just two phases, if the modem could construct, transmit, and recognize four distinct phases, then each symbol could carry more information. A single symbol would convey one of four possibilities, each possibility represented by two bits. Each symbol could therefore carry two bits at a time. This was a remarkable innovation. It was something like increasing the passenger-carrying capacity of a highway. If the highway had been built to handle at most a hundred cars per minute, the only way to increase passenger traffic was to add more passengers per car.

While BPSK carried only one bit per symbol, the new method could carry twice as much. It was termed _Quadrature Phase Shift Keying (QPSK)_. Engineers then extended this concept to _8-PSK_ modulation, which could carry three bits per symbol. These new forms of modulation needed new terminology. _Symbols per second_ was defined as the _baud rate_, a term in recognition of Emile Baudot's early work on data communication systems. Digital modulation made the distinction between bits and symbols, between bit rate and baud rate, between bit rate and bandwidth. Bit rate is a digital term, while bandwidth is an analogue term that refers to the frequency spectrum. The efficiency of a modulation links the two together. With these definitions, it became easy to compare different modulation schemes. For example, BPSK and 8-PSK could operate at the same baud rate, but the latter gives thrice the bit rate.
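The relationship between baud rate and bit rate is simple enough to express in a few lines of code. The 2,400 baud figure below is just an illustrative value of the kind used by the modems discussed in this chapter.

```python
import math

# An M-point constellation carries log2(M) bits per symbol, so the
# bit rate is the baud rate multiplied by bits per symbol.
BAUD = 2400  # symbols per second

for name, points in [("BPSK", 2), ("QPSK", 4), ("8-PSK", 8)]:
    bits_per_symbol = int(math.log2(points))
    print(f"{name}: {bits_per_symbol} bit(s)/symbol -> "
          f"{BAUD * bits_per_symbol} bps at {BAUD} baud")
```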

Concepts of digital modulation evolved simultaneously in many research laboratories and corporations. Still, we must give the engineers at Codex due credit for reducing known techniques to a form that could be manufactured cost-effectively and reliably for the mass market. By the end of the 1960s, Codex engineers achieved 9,600 bps by squeezing more and more bits into the limited voiceband. Their competition managed only half as much. The price for squeezing more bits into a limited bandwidth was ISI. The way to overcome ISI is to use an equalizer. Nyquist had laid the groundwork four decades earlier, but the market of the sixties needed a compact implementation that could be sold commercially.

Fortunately, Robert Lucky of Bell Labs had come up with a design for an adaptive equalizer to do just that. In 1965, he proposed a technique using a known training sequence of pulses. The equalizer at the receiver looked at this sequence after it had gone through the channel and used this information to adapt the equalizer settings. An early prototype of such an equalizer had about a hundred relays assembled in a six-foot rack. Lucky himself described how these early electromechanical systems gave engineers a real feel for the system and its workings. When the equalizer set off on its task, a hundred relays started clicking frantically. When the sounds quieted into sporadic clicks, the engineer knew that the equalizer had done its job.

While focusing on reducing ISI, Lucky had ignored noise, or at least had assumed noise was low enough to be insignificant. His criterion for the equalizer was to force pulse tails to zero at the instants where other pulses were at their peaks. This _zero-forcing criterion_ worked at low symbol rates but failed at higher rates when noise started acting up. Researchers then came up with an enhancement. Rather than the zero-forcing criterion, it was better to use a _mean square error criterion_. With this, the deviation of the equalized signal from the ideal signal was minimized. In the long run, this proved to be the better criterion. These research findings found their way into commercial modems.

Codex's first modem, the AE-96, was released in 1968. The modem's name can be attributed to a built-in equalizer, "AE" standing for _adaptive equalization_. Increasing miniaturization in electronics had made this possible. Equalizers were no longer six-foot racks. At $23,000 for a pair of these modems, it was an expensive package. It was not meant for home users. Rather, business managers and corporations needed to exchange data across many remote locations. The cost was easily justified. Many terminals could be connected to a multiplexer and share a single 9,600 bps line. This would work out cheaper than leasing multiple lines operating at lower bit rates. Market interest in the AE-96 was overwhelming. Orders were more than expected. Unfortunately, the modem simply wouldn't work in the field. Codex engineers joked that they might have manufactured something like a hundred modems but shipped out three hundred, as customers kept returning their units for repairs.

It was clear from the outset that equalizers were critical for error-free operation. During installation, qualified technicians performed careful measurements on individual lines and used these to tune, or adapt, the equalizers. The AE-96 was not designed to do this automatically. The problem was that telephone line characteristics varied with time. This resulted in _phase jitter_, which meant that the channel altered the phase unpredictably. This was a disaster for the product, since demodulation depended on phase. The AE-96, revolutionary as it was, failed in the real world. Codex engineers learnt an important lesson in the process. Product design and development can never succeed in isolation. The process requires an understanding of the system in which the product is meant to work. In this case, engineers had not understood the telephone system very well. They had not known how badly phase jitter would affect data transmission.

The net effect was that an AE-96 might work well for a few days, then slowly deteriorate until one day bit errors became excessive. David Forney came to the rescue. Forney had a PhD from MIT in information theory. He had interviewed with Bell Labs and IBM Research but had chosen to join Codex instead. He came up with a method to detect and correct phase jitter. Codex engineers provided this solution as a separate attachment to the AE-96. Even then, the AE-96 did not capture the market. Its early technical problems had alienated potential customers. Worse still, the equalizer was sold as an additional unit at extra cost. Customers chose to stick to 4,800 bps modems, which they perceived as more reliable. It was at this point that Codex made an important technical decision. The AE-96 was then using 7-level PAM, meaning that amplitude was the primary means of encoding signal states. At about the same time, Robert Gallager, Forney's former guide at MIT, was looking at an alternative modulation technique.

Gallager's modulation varied both phase and amplitude so that each symbol could carry more bits. From this idea was born 16-QAM modulation, whereby four bits select one of sixteen possible symbols for transmission. _Quadrature Amplitude Modulation (QAM)_ used variations of both phase and amplitude to represent symbols. While BPSK did modulation on a per-bit basis, 16-QAM modulated the carrier in blocks of four bits. Two bits were used to distinguish among four amplitude levels. The remaining two bits were used to distinguish among four phase values. QAM turned out to be superior to what the original AE-96 had used; it was more resilient to phase jitter. The coefficients of the equalizer were still relevant, but now they were complex numbers. It was precisely 16-QAM that the Codex engineers used for their revolutionary 9,600 bps modem of 1971. Operating at 2,400 baud, it carried four bits per symbol. It took the competition a couple of years to catch up. By then, Codex had made money and established itself. CCITT subsequently standardized this modulation in the V.32 modem standard, which operated at a carrier frequency of 1,800 Hz.
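A toy mapper along the lines just described might look as follows. The amplitude and phase tables are invented for illustration; real V-series constellations place their sixteen points differently.

```python
import cmath, math

# Two bits pick one of four amplitudes, two bits pick one of four
# phases: sixteen distinct symbols in all, each carrying four bits.
AMPLITUDES = [0.25, 0.5, 0.75, 1.0]
PHASES_DEG = [0, 90, 180, 270]

def map_16qam(bits):                    # bits: a string of four, e.g. "1011"
    amplitude = AMPLITUDES[int(bits[:2], 2)]
    phase = math.radians(PHASES_DEG[int(bits[2:], 2)])
    return cmath.rect(amplitude, phase) # a point on the complex plane

for b in ["0000", "0101", "1010", "1111"]:
    z = map_16qam(b)
    print(f"{b} -> ({z.real:+.2f}, {z.imag:+.2f})")
```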

If increasing bandwidth efficiency were as easy as this, it would seem obvious that we could obtain higher efficiency by expanding modulation schemes to more levels of phase and amplitude. Potentially, we could get many more bits through for the same symbol rate and bandwidth. Communication engineers knew the real reasons why this was not possible. Firstly, controlling the phase or amplitude of a signal to precise levels is a challenge for the modulator. As more levels are added, the cost and complexity of the modulator increase. Secondly, the situation is far worse for the demodulator.

We may presume that the modulator and the demodulator function as equal partners in a digital communications system. On the contrary, the demodulator has always been in the limelight. Over the decades, it has constantly been fussed over, cajoled, and even coaxed into working. While the modulator is comparatively simple, the demodulator has the tough job of digging the signal out of the noise. As more and more phases and amplitudes are added to a modulation scheme, it becomes easier for noise to corrupt the signal; that is, to move the signal from its phase or amplitude to another nearby. The demodulator doesn't know that it has received a corrupted symbol. It simply translates such a signal into wrong bits. While in the analogue world engineers talked of distortion, in the digital world they talked of the _bit error rate (BER)_, a more suitable and fundamental measure of digital modulation. It was true that packing more bits into a symbol increased bandwidth efficiency; but this was of little use if the demodulator ended up with a high number of bits in error. BER thus became an important measure for judging whether a particular modulation worked well under prevailing channel noise. A good system achieved a bit error rate of one in a million (10⁻⁶). An excellent system improved this to one in a billion (10⁻⁹).

Engineers took a geometric approach towards evaluating BER. They took the complex plane and applied it to their signals. In the case of BPSK, since there was a single cosine wave with differing polarities, the two symbols could simply be represented as two points on the x-axis. The distance separating the points depended on the signal energy: the more energy the signal had, the greater the distance. For errors to occur, noise had to be strong enough to overcome this distance. Noise had to be strong enough to move a symbol from its own half of the complex plane to the other. To calculate the theoretical value of BER, engineers needed to model the noise process. Noise was modelled as a stochastic process that was white and Gaussian. _White_ meant that the noise was present with equal power at all frequencies of the spectrum. _Gaussian_ referred to the distribution of noise strength, peaking at a mean value and falling off on both sides. Such a model, now commonly called _Additive White Gaussian Noise (AWGN)_, is standard in all basic analysis. Just as radio engineers and amplifier designers had once studied noise out of necessity, engineers involved with digital systems had to accord noise its due importance.
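This geometric picture can be tested with a short simulation. The sketch below places BPSK symbols at +1 and -1 on the real axis, adds white Gaussian noise, counts how often a symbol crosses into the wrong half-plane, and compares the result against the textbook formula; the Eb/No value is an arbitrary choice.

```python
import math, random

# BPSK over AWGN: an error occurs whenever noise pushes a symbol
# across zero into the other half of the plane.
EBN0_DB = 6.0
N_BITS = 200_000

ebn0 = 10 ** (EBN0_DB / 10)
sigma = math.sqrt(1 / (2 * ebn0))   # noise std dev for unit-energy bits

errors = 0
for _ in range(N_BITS):
    tx = random.choice([+1.0, -1.0])
    rx = tx + random.gauss(0, sigma)
    if (rx >= 0) != (tx > 0):
        errors += 1

# Theoretical BER for BPSK: Q(sqrt(2 Eb/N0)) = erfc(sqrt(Eb/N0)) / 2.
theory = 0.5 * math.erfc(math.sqrt(ebn0))
print(f"measured BER {errors / N_BITS:.5f}, theory {theory:.5f}")
```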

A single cosine wave was enough for BPSK. In the case of QPSK, 8-PSK, or 16-QAM, a cosine wave alone did not suffice since there were now many more phases to deal with. It seems reasonable to suppose that to encode 8-PSK, many more waveforms would be needed. Surprisingly, engineers figured out that they needed only two waveforms—a cosine wave and a sine wave. These two are orthogonal to each other. Using just these two as an orthonormal basis, engineers were able to construct all the different PSK and QAM modulation schemes. In fact, given any set of symbols, the Gram-Schmidt procedure gave engineers the power to reduce the symbol set to a minimum number of basis functions. Even the apparently complex 16-QAM was simplified because the modulator and the demodulator needed to deal with only two basis functions. This lowered cost and improved system reliability in practice. An equally important consequence was for demodulating signals against a background of noise. The only noise that mattered was the component that mapped onto the orthonormal basis. Any other noise, no matter how large, did not affect the signals. This was almost like saying that no matter how loudly bats scream and squeal, we don't hear them. The only noise that matters to us is that of owls and hyenas.

It so happens that the complex plane is ideally suited to the geometric representation of all modulation schemes that can be described with only two basis functions. Such geometric treatment started with Kotelnikov and Shannon in the late 1940s. Engineers identified the axes with the basis functions. Knowing the phase and amplitude of the symbols, all symbols could be expressed in terms of the basis functions and marked as points on the plane. The resulting diagram of all symbol points was named the _signal constellation diagram_. This became a powerful tool for engineers to analyse BER performance. In higher dimensions beyond the complex plane, Shannon visualized a sphere around each point in the constellation. Even after noise was added to a signal, if the displaced point remained within the sphere, it could be demodulated without error. Only if noise exiled the point to a neighbouring sphere was the symbol in error.

Signal Constellation of Common Digital Modulations

4-PAM and BPSK are seen as one-dimensional. 8-PSK and 16-QAM are seen as two-dimensional requiring two orthonormal basis functions.

By looking at the constellation, it becomes clear that as more points are added within the same area, the distances between points decrease. Thus, it becomes easier for noise to introduce errors. Noise puts a limit on how many bits engineers can squeeze into a single symbol. Engineers then proposed what might appear an obvious solution to the noise problem: every time more symbols are added to the constellation, increase the distance between points. This was actually a good proposal because bandwidth efficiency was improved without compromising on BER. The catch was that by increasing the constellation area, the system effectively consumed more power. Therefore, it was not fair to compare 16-QAM against QPSK when the former consumed, say, four times more power. Here lies one of the toughest decisions facing communication engineers.

The problem of selecting a suitable digital modulation presents an engineer with a classic trade-off between bandwidth and power. While _bandwidth efficiency_ is all about getting more bits out of each symbol, _communication efficiency_ is all about achieving almost error-free communication at the lowest possible signal power. The former is expressed in terms of bits per second per hertz (b/s/Hz). The latter is quantified by BER and SNR. Something more fundamental than SNR is Eb/No, Eb being the energy per bit and No the noise power spectral density. Eb/No is particularly useful for comparing modulation schemes since signal bandwidth and transmission rate are taken out of consideration.

What trade-off means in this context is that it is sometimes acceptable to increase power for the benefit of improved bandwidth efficiency. This makes sense for systems that are _bandwidth-limited_, for example, telephone lines and microwave links. In general, engineers have a rule of thumb for such systems: given bandwidth _W_ and symbol time _T_, the smaller the product _WT_, the higher the efficiency. Where power comes at a high premium, the engineer can sacrifice bandwidth efficiency. These are _power-limited_ systems. Satellite communication systems and deep space probes that have little power to work with come under this category. Engineers knew that there was no single solution or best modulation that could be applied across the board. Each system required its own solution, often fine-tuned. What engineers possessed were not rigid solutions but guidelines, principles, and methods for arriving at suitable solutions.

PSK and QAM modulations were used in bandwidth-limited systems. When it came to power-limited systems, the use of two orthogonal basis signals was simply not enough. Bandwidth was sacrificed to conserve power. More orthogonal signals were added, so that bandwidth efficiency dropped below 1 b/s/Hz. In fact, Shannon had pointed out long ago that it was possible to achieve error-free communication with Eb/No as low as -1.6 dB. This came to be known as the _Shannon Limit_. The limit is not achievable in practice since infinite bandwidth would be required to communicate at such low power. It is interesting though: if only we had infinite bandwidth, error-free communication would be possible even when signal power was below noise power. One example of the use of multiple orthogonal signals in power-limited systems is _M-ary FSK_, which uses _M_ orthogonal signals. Unlike QAM symbols, which are differentiated by phase and amplitude, FSK symbols are differentiated by frequency. When 64 such signals are used, bandwidth efficiency drops to 0.2 b/s/Hz, but the system still requires 6 dB of Eb/No to communicate at a BER of 10⁻⁵. As more and more orthogonal signals are added, it becomes all the more difficult to reduce power requirements. One of the first uses of M-ary FSK was in SIGSALY, the secure voice communication system of World War II.
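The figure of -1.6 dB drops out of Shannon's capacity formula once bandwidth is allowed to grow without bound. A brief sketch of the limit, with C the capacity in bits per second, W the bandwidth, Eb the energy per bit, and No the noise power spectral density:

```latex
% Capacity of an AWGN channel, writing signal power S as E_b C:
%   C = W log2(1 + E_b C / (N_0 W))
% Divide by C and let W grow; since log2(1 + x) ~ x / ln 2 for small x:
\[
  1 \;=\; \frac{W}{C}\,\log_2\!\Bigl(1 + \frac{E_b C}{N_0 W}\Bigr)
  \;\longrightarrow\; \frac{E_b}{N_0 \ln 2},
  \qquad\text{so}\qquad
  \frac{E_b}{N_0} \;\to\; \ln 2 \approx 0.693,
  \;\text{i.e. } -1.59\ \text{dB}.
\]
```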

Though the V.32 modem standard used 16-QAM at 9,600 bps, it incorporated one important provision. Should a good channel become noisy at times, V.32 allowed the demodulator to instruct the modulator to switch to QPSK. QPSK compromised on the bit rate, delivering only 4,800 bps; but at least it suffered fewer bit errors. This is not very different from an American talking slowly to a Japanese listener so as to be understood. Modem technology is not rigid. The design allows modems to negotiate and configure themselves as appropriate to channel conditions or equipment capability. It is therefore customary for modems to go into a digital crackle of beeps and blinking lights at the start of every communication. This is their way of negotiating the best parameters for the talk that is to follow. Engineers have suitably named this behaviour _handshaking_.

This was all well and good, but one little detail had been overlooked. How exactly was one to map bits to symbols? This was so trivial a question that no one had paid much attention to it. In the case of QPSK, one may start with the symbol in the top right quadrant and map it to bits 00. Moving anticlockwise from there, the other symbols may be mapped respectively as 01, 10, and 11. Very quickly, engineers realized that in some cases when symbol errors occurred, two bits were in error; at other times, only one bit was in error. It became apparent to them that the key to reducing errors lay in mapping bits to symbols smartly. It didn't take them long to figure out that they didn't need to be smart. They didn't need to innovate. Someone had already done it at Bell Labs in the 1940s.

Frank Gray, a researcher at Bell Labs, put out a patent application in 1947. It was titled "Pulse Code Modulation" but it was less about PCM and much more about a novel method of mapping quantized speech samples to bits. It was customary to map quantized values to their direct binary representation. So a value of seven was simply represented as 00111 at 5-bit resolution without companding. If the next quantized sample was eight, PCM encoding represented that as 01000. In other words, for this transition from seven to eight, four bit positions of the encoding changed. This is not really a problem for modern PCM systems, but back in the 1940s the best PCM encoders were implemented by cathode beams, encoding masks, and collectors inside vacuum tubes. Even a small shift in the timing or deflection of the beam would lead to erroneous encoding at the transition between the seven and the eight. This resulted in as many as four wrong bits. Many solutions were proposed towards greater precision of timing and beam control. Naturally, these approaches increased the complexity and cost of PCM encoders. It was in this context that Gray proposed his revolutionary encoding.

Gray pointed out that the problem was not necessarily with hardware. It was software that needed some tweaking. The problem was in design, not in manufacturing. He proposed a new method of encoding whereby two adjacent symbols on the encoding mask differed in only one bit. So if there was an error, only one bit was affected. Interestingly, Gray's new code had other useful properties. Manufacturing of the mask was made simpler. The code had a symmetrical structure so that one half could be constructed as a reflection of the other. Gray named his code _reflected binary code_. Truly great inventors are not vainglorious in naming. Glory follows afterwards, naturally. Today the code is more commonly called _Gray Code_.

Years later, when engineers tackled the problem of mapping bits to digitally modulated symbols, they used Gray Code. For the example of QPSK, the symbols would be mapped in sequence as {00, 01, 11, 10} rather than {00, 01, 10, 11}. This way, two adjacent symbols differ in only one bit. Symbols that differ by two bits are separated further apart, along the diagonal of the signal constellation. This had the immediate effect of reducing BER and improving communication efficiency. Similar logic applies to higher modulation schemes. Gray himself might never have imagined that one day his code would occupy such a central position in digital communication systems.
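Gray's code is also delightfully simple to compute: exclusive-OR a value with itself shifted right by one bit. A minimal sketch:

```python
# Binary-to-Gray conversion: adjacent integers then differ in exactly
# one bit of their Gray codes.
def to_gray(n):
    return n ^ (n >> 1)

for n in range(8):
    print(f"{n}: binary {n:03b} -> Gray {to_gray(n):03b}")
# 0:000  1:001  2:011  3:010  4:110  5:111  6:101  7:100
```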

Gray Coding

Gray Coding ensures that adjacent symbols differ in only one bit. This has the advantage that if noise moves a symbol to an adjacent symbol, only one bit is in error.

The most difficult part of any modem is the work of the demodulator. Naturally, a good deal of engineering research has focused on the demodulator. In a limited sense, a digital communication system is not all that different from human face-to-face interaction. Talking is simple. Listening is difficult. Take 8-PSK for example. Each symbol is separated from its adjacent ones by a phase of 45 degrees. From each received symbol, the demodulator has to estimate its phase accurately. If it gets it wrong, bits come out with errors. BPSK is far simpler to demodulate because the symbols are separated by 180 degrees. All PSK modulation schemes require the demodulator to estimate the phase of the incoming carrier wave and track it accurately, a specialized job delegated to the _Phase Locked Loop (PLL)_. Demodulation done using such a carrier reference is called _coherent demodulation_.

Once the demodulator achieves coherence, it does some sort of a pattern matching. It takes the received signal and compares it against each of the possible signals that correspond to each point in the constellation. The point that gives the closest match is taken as the transmitted symbol. The symbol is interpreted to its associated bits and the job of the demodulator is done. The demodulator can be implemented using two equivalent techniques: _correlators_ or _matched filters_. Both can be viewed as implementations of pattern matching. Either way, demodulating a 16-QAM signal requires sixteen correlators or matched filters. In practice, the solution is ingeniously simple thanks to the concept of orthogonality. Correlation of the received signal is done against only the two basis functions, the cosine wave and the sine wave. The demodulator then locates the received signal as a point on the plane of these two components and selects the closest constellation point. This sort of demodulation is called _I/Q demodulation_, that is, demodulation of a signal based on its basis components that are termed _in-phase_ and _quadrature-phase_.
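
A minimal numerical sketch of this idea follows in Python. It assumes an idealized QPSK link with perfect carrier and symbol synchronization; the parameter values and function name are illustrative, not drawn from any real modem.

```python
import numpy as np

fc, T, fs = 4.0, 1.0, 100           # carrier frequency, symbol period, samples per symbol
t = np.arange(fs) / fs * T
# QPSK constellation points at 45, 135, 225, and 315 degrees.
constellation = np.exp(1j * np.pi * np.array([1, 3, 5, 7]) / 4)

def demodulate_symbol(rx):
    # Correlate against the two basis functions, cosine and sine.
    i = 2 * np.mean(rx * np.cos(2 * np.pi * fc * t))
    q = -2 * np.mean(rx * np.sin(2 * np.pi * fc * t))
    # Pattern matching: pick the closest constellation point.
    return int(np.argmin(np.abs(constellation - (i + 1j * q))))

# Transmit symbol 2, add a little noise, and recover it.
tx = np.real(constellation[2] * np.exp(2j * np.pi * fc * t))
rx = tx + 0.1 * np.random.randn(fs)
print(demodulate_symbol(rx))        # -> 2 (with high probability)
```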

As systems became complex, it was natural and necessary for engineers to study and refine their components. They made a distinction between demodulator and detector. A demodulator simply compared the received symbol to possible ones of the constellation. The results of these comparisons were sent to the detector. The demodulator did the dirty work. The detector took the final decision based on inputs from the demodulator. This was a clever design. Receivers went through a stage in which prototypes matured into products fit for commercial use. In this stage, the demodulator was often stable. The detector, on the other hand, often required adjustments. By separating the two, the detector could be modified without affecting the demodulator.

An optimal detector calculates the probability that a particular symbol was transmitted given the received symbol. It does this for every symbol in the constellation. It then selects the symbol that maximizes this probability. This is often called the _maximum a posteriori probability (MAP)_ criterion since the method requires knowledge of probabilities of source symbols. When source symbols are equally probable, the MAP criterion leads to a simplification. The detector can evaluate the reverse instead—the probability of the received symbol given that a particular symbol was sent. It tries to maximize this new criterion, which is called _maximum likelihood (ML)_ criterion. In most practical implementations, this simplification is used. Maximum likelihood detection is another way of saying that the detector looks for the closest symbol point in the constellation.
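
Written out in symbols (a compact restatement of the reasoning above, in notation the book's prose describes but does not use), with $r$ the received symbol and $s_i$ the candidate transmitted symbols, the MAP criterion and its reduction read:

$$\hat{s} = \arg\max_{s_i} P(s_i \mid r) = \arg\max_{s_i} \frac{P(r \mid s_i)\,P(s_i)}{P(r)}$$

The denominator $P(r)$ is the same for every hypothesis. If, in addition, all the $P(s_i)$ are equal, what remains to be maximized is $P(r \mid s_i)$, which is exactly the ML criterion. For Gaussian noise, maximizing $P(r \mid s_i)$ amounts to minimizing the Euclidean distance between $r$ and $s_i$, which is why ML detection picks the closest constellation point.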

Coherent demodulation is critical for PSK and QAM modulations. It is less important for PAM and FSK, although they do benefit from coherent demodulation, particularly at lower SNR. Noncoherent demodulation of PAM and FSK is used for reasons of cost and simplicity, or in extreme wireless channels in which tracking the carrier's phase becomes tough. Then engineers started to think about eliminating the need for coherent demodulation even in the case of BPSK.

_Differential PSK (DPSK)_ was an innovation that approached the problem from a new perspective. Since carrier phase estimation was so difficult, engineers asked themselves if they could build intelligence into the modulation so that estimation was no longer necessary. The answer came by way of a novel encoding of bits to symbols. Instead of absolute encoding, engineers suggested that each symbol represent a phase change with respect to the previous symbol. For example in BPSK, when the modulator saw a zero it did not change the phase, but if it saw a one it changed the phase by half a cycle. As a result, the demodulator needed only to compare phase changes over two symbols. Coherent BPSK was definitely better since demodulation was done using an accurate estimate of the carrier phase. The performance of DPSK was relatively worse but its simplicity made it attractive. In fact, at a BER of 10^-5, which calls for an SNR of about 10 dB, DPSK requires only 1 dB more than BPSK. This was a small price to pay for reduction in hardware cost. A good quality telephone line has an SNR of about 35 dB or more. It is for this reason that the V.32 modem standard differentially encodes the phase changes within the 16-QAM constellation.
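
The differential idea is easy to sketch. The following Python fragment, an illustration rather than part of any standard, encodes each bit as a phase change and decodes by comparing consecutive symbols. Note that the decoder never needs an absolute phase reference: even if the channel flips every symbol by 180 degrees, the recovered bits come out intact.

```python
def dbpsk_encode(bits):
    # Each input bit is encoded as a phase *change*: 0 keeps the
    # phase, 1 flips it by half a cycle (180 degrees).
    phase, symbols = 0, []
    for b in bits:
        phase ^= b                 # phase is 0 or 1 (0 or 180 degrees)
        symbols.append(1 if phase == 0 else -1)
    return symbols

def dbpsk_decode(symbols, prev=1):
    # Compare each symbol with the previous one; no carrier phase
    # estimate is needed.
    bits = []
    for s in symbols:
        bits.append(0 if s == prev else 1)
        prev = s
    return bits

bits = [1, 0, 1, 1, 0]
assert dbpsk_decode(dbpsk_encode(bits)) == bits
```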

Digital modems are not limited to little boxes that sit between data terminals (fax machines or computers) and telephone lines. Data modems and fax modems came early on in the history of networked data communications. Towards the close of the 1980s, a new system of communication was in its final phase of design. It was digital and it was wireless. Launched in 1991, it was originally named _Groupe Spécial Mobile (GSM)_. Being digital, GSM too required a modem that converted bits to symbols and symbols to bits. Designers of GSM went around looking for a suitable modulation scheme. They found one in an invention that had come almost three decades earlier.

It was in the sixties that people began to look towards improving the spectral behaviour of modulations. Orthogonal M-ary FSK modulation had become a standard for power-limited systems but it was not bandwidth-efficient. Moreover, when frequencies were switched from symbol to symbol, signal power leaked into neighbouring frequencies. This reduced performance and efficiency. The same problem plagued PSK systems, in which transitions of phase occurred suddenly. For example, when bits 00 followed 11 in a QPSK scheme that used Gray coding, the phase changed suddenly by 180 degrees. At the transition, signal amplitude passed quickly from one extreme through zero to the other extreme. These sudden changes caused interference to neighbouring frequencies. Engineers began to look for better modulations that changed the phase in a continuous manner so that the signal bandwidth was contained, so that it was never more than what it needed to be. At the same time, to prevent drastic changes in amplitude, they looked towards modulations that would give a _constant envelope_. From these investigations emerged _Continuous Phase FSK (CPFSK)_ and _Continuous Phase Modulation (CPM)_. The modulation that was chosen for GSM is part of this family of continuous phase and constant envelope modulation methods.

In a patent filing of 1958, Melvin Doelz and Earl Heald introduced the _Minimum Shift Keying (MSK)_ modulation that would one day form the basis of global cellular communications. MSK is just one of many silent workers that perform their magic inconspicuously in the background. A characteristic aspect of all continuous phase modulations is that the phase of the current symbol depends on the phase of the previous symbol. This means that such modulations are endowed with memory. Each symbol is not modulated independently of another. Memory enables the modulator to output waveforms that do not contain phase jumps. The simplest of such modulations was a variant of QPSK in which _I_ and _Q_ components are offset by a symbol period. This simple arrangement, named _Offset QPSK (OQPSK)_, minimized phase jumps. Symbol transitions are limited to either _I_ or _Q_, and not both at the same time. OQPSK may have more frequent phase jumps for a particular sequence of bits but each jump is limited to 90 degrees. Unlike QPSK, it can never have a 180-degree jump. From here, it was a simple matter to construct MSK signals.

Each OQPSK symbol in its _I_ and _Q_ components was weighted by a sinusoidal wave over two symbol periods so that each symbol had a greater absolute weight at its centre that smoothly dropped to zero at its edges. This effectively eliminated phase jumps. While _I_ and _Q_ components were individually sinusoidal, the composite MSK signal had a constant envelope. An MSK signal in the spectral domain is more compact than either QPSK or OQPSK. MSK signal power falls off quickly away from the carrier frequency, thereby causing less interference to neighbouring bands. The modulation that was eventually standardized for GSM was a modification of MSK. Since GSM requirements for adjacent channel interference were stringent, MSK in its plain form wasn't good enough. GSM therefore adopted a Gaussian pulse shaping instead of the sinusoidal shaping of MSK. The modulation of GSM is therefore called _Gaussian MSK (GMSK)_.

Digital Modulation Waveforms

(a) QPSK modulation shows sharp phase transitions of 180 degrees. (b) OQPSK limits such transitions to 90 degrees, though they may occur more often. (c) In-phase of MSK waveform. This is shown along with its input bits. (d) Quadrature-phase of MSK waveform. This is shown along with its input bits. (e) MSK, combined from (c) and (d), has constant envelope and continuous phase transitions.

MSK can be considered not just a form of OQPSK but also a form of CPFSK. In fact, the waveform is seen to toggle between two frequencies around the carrier frequency while maintaining continuous phase changes. This is no coincidence. Frequency is really a change of phase. When MSK signal phase is rising, the higher frequency is in effect. When MSK signal phase is falling, the lower frequency is in effect. The demodulator for its part needs to look at two symbols before it can do its job. This comes naturally as a consequence of modulation methods that have memory. Demodulation of a symbol must be with respect to the previous symbol. MSK is thus one of those modulations that seem to belong happily to two family trees. Ultimately, MSK signals are unambiguously defined but they can be analysed in more than one way.
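
A small numerical sketch in Python illustrates both claims: that the half-sinusoidal weighting of the offset _I_ and _Q_ streams yields a constant envelope, and that the phase changes continuously. The bit values and sampling parameters below are made up for illustration. Note that the _I_ stream is allowed to change sign only where its cosine weighting passes through zero, and the _Q_ stream only where its sine weighting does; this is what keeps the phase continuous.

```python
import numpy as np

T, fs = 1.0, 100                        # symbol period, samples per period
# I bits may flip only at odd multiples of T (zeros of the cosine
# weighting); Q bits only at even multiples (zeros of the sine).
a_i = np.repeat([1, -1, 1, 1, -1], [fs, 2*fs, 2*fs, 2*fs, fs])
a_q = np.repeat([1, -1, -1, 1], [2*fs, 2*fs, 2*fs, 2*fs])
t = np.arange(len(a_i)) / fs * T

i = a_i * np.cos(np.pi * t / (2 * T))   # in-phase branch
q = a_q * np.sin(np.pi * t / (2 * T))   # quadrature branch

phase = np.unwrap(np.angle(i - 1j * q))       # phase of the complex envelope
print(np.allclose(i**2 + q**2, 1.0))          # constant envelope -> True
print(np.max(np.abs(np.diff(phase))) < 0.1)   # no phase jumps -> True
```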

While signal constellation diagrams and Gray coding are useful in PSK and QAM, they are not directly applicable to CPFSK and CPM. This is because CPFSK and CPM are based on phase changes, not phase values per se. Constellation diagrams are still used, but they do not show bit encoding. Their use is primarily to visualize start and end phases, and the paths the modulator takes for phase changes. These diagrams, sometimes appearing in technical journals as decorative geometric patterns, mean a lot more to an engineer than to an artist.

With every generation, engineers kept innovating. New methods of modulation kept coming but the rewards were becoming increasingly meagre. One wondered if they had reached a fundamental limit. One wondered if their efforts to improve things further were in vain. Of course, Shannon had given the answer long ago. Yes, there was a limit and Shannon's noisy channel coding theorem gave the numbers. Despite their best efforts, engineers had managed only a marginal success. They had fared badly against Shannon's expectations. BPSK modulation at a BER of 10^-5 required about 10 dB of Eb/No; but according to the Shannon capacity bound, it should have been only 0 dB. 64-QAM, though more bandwidth-efficient than BPSK, took 8 dB more power than predicted by Shannon. Clearly, something was missing. Where had they gone wrong?



**It was in** 1947 that a mathematician who had joined Bell Labs just the previous year was given the task of performing some heavy-duty computations. Computation time in those days was expensive and limited. The mathematician could put his calculations into the wheelwork only on weekends when the machine became available. Moreover, computers were not the electronic devices of today. They were built out of vacuum tubes and mechanical relays. Needless to say, they were far more unreliable than modern day computers. When any large program was executed, chances were that things went wrong. If things did go wrong, the computer helplessly threw up its hands in the air, trashed the program, and moved on to the next one in the queue. It was therefore not surprising that by the second consecutive Monday back at the office, the mathematician was exasperated,

Two weekends in a row I came in and found that all my stuff had been dumped and nothing was done.... And so I said, "Damn it, if the machine can detect an error, why can't it locate the position of the error and correct it?"

This rather innocuous observation was the start of a spectacular branch of mathematics that would soon serve its grand purpose in the field of communication engineering. The mathematician in question had once contemplated becoming an engineer; but fate had intervened in the form of a scholarship in mathematics. Soon after getting his PhD in mathematics from the University of Illinois at Urbana-Champaign, he worked on the Manhattan Project. When the war ended, he went where many of the brightest minds of the day worked. That mathematician was Richard Hamming. The branch of mathematics he launched is today called _Coding Theory_. Though a mathematical theory, back in 1947, its engineering application far outweighed its mathematical background.

It wasn't long before Hamming came up with a method by which errors could not only be detected but also corrected. He made the computer a bit more intelligent, whatever intelligence meant for non-cognitive mechanical systems. The computer no longer halted when it saw errors. To err is human; to forgive is divine. Researchers like Hamming could now aspire to more than divinity. Why stop at forgiveness when one can correct those errors? Why wait for a second chance when you can recover and move on with the first attempt?

While source coding theorists had removed redundancy from messages to achieve compression, Hamming realized that one must add redundancy to recover from errors. The simplest method of adding redundancy was of course by repeating the symbols. Repeating every bit twice over meant that the sequence 01001 would be transmitted as 000111000000111. This naturally slowed down the overall rate of communication to a third, but the advantage gained was in terms of error protection. In fact, the 1/3 repetition code corrected single errors. Extra bits inserted into a bitstream are called _parity bits_.
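
A sketch of the 1/3 repetition code in Python shows why a single error per group is harmless: the decoder simply takes a majority vote. The function names are, of course, only illustrative.

```python
def repeat3_encode(bits):
    # Send every bit three times: 01001 -> 000 111 000 000 111.
    return [b for b in bits for _ in range(3)]

def repeat3_decode(bits):
    # A majority vote over each group of three corrects any single
    # error within that group.
    return [1 if sum(bits[i:i + 3]) >= 2 else 0
            for i in range(0, len(bits), 3)]

coded = repeat3_encode([0, 1, 0, 0, 1])
coded[4] ^= 1                      # flip one transmitted bit
print(repeat3_decode(coded))       # -> [0, 1, 0, 0, 1]
```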

The idea of parity and purposefully adding redundancy was not due to Hamming. When computers at Bell Labs detected errors, it was parity that enabled detection. Parity was simply a set of redundant bits concatenated with the information bits based on some clever mathematics. Information bits and parity bits taken together must obey certain rules of mathematics. If they did not, the computer flagged an error, like an accountant looking at a balance sheet and raising a red flag when the numbers didn't add up. In the simplest of schemes, parity was a single bit designed to result in an odd or even number of ones based on a block of information bits. For example, if the design required a single odd parity bit for a group of three information bits, then the following combinations of (information, parity) were valid: (000, 1), (001, 0), (010, 0), (011, 1), (100, 0), (101, 1), (110, 1), (111, 0). In each four-bit sequence, the parity bit ensured that there were an odd number of ones. The set of all such valid sequences was called a _code_ and each member of the set was called a _codeword_. If a single error occurred, the computer could detect it because the resulting four bits would not match any of the valid codewords. If two errors occurred in a four-bit sequence, the error sneaked through the computer's primitive detection. In any case, the best the computer could do was to detect errors. It didn't know which bit was in error or how to correct errors.

Many such error-detecting codes were in existence and some were in active use. The relay computers at Bell Labs often used something called the _2-out-of-5 code_. The mathematical rule for constructing this code was simple. In a group of five bits, exactly two ones were allowed to occur. If the computer encountered anything else, it was an error. Programs in those days were on perforated tapes. An extra hole, a blocked hole, or a smudged tape led to errors. The computer detected them. Another popular code was the _van Duuren Code_, otherwise known as the _3-out-of-7 code_. In this code, there were thirty-five valid codewords. This was simply the number of ways in which three ones can be arranged in a group of seven bits. Thirty-five codewords were more than enough to represent all letters of the English alphabet. For computing, the van Duuren Code was better suited than Morse code since each character was of fixed length and its rule of three ones enabled error detection. The van Duuren Code was used, for example, in radiotelegraphy. It is obvious in these early examples that the 2-out-of-5 code used even parity and the 3-out-of-7 code used odd parity.

Today the use of parity for error detection is widespread. Most books carry a number called the International Standard Book Number (ISBN). Most products carry thin strips of black lines for product identification, commonly called the _barcode_, which optical scanners read. ISBNs and barcodes contain parity bits so that scanners can detect errors. Thus, an ISBN of thirteen decimal digits has its last digit solely for parity check.
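
The ISBN-13 check digit is computed by a simple weighted sum, as the Python sketch below shows. The digits used here belong to a commonly quoted specimen ISBN; the function name is illustrative.

```python
def isbn13_check_digit(first12):
    # Digits are weighted alternately by 1 and 3; the check digit
    # then makes the grand total a multiple of ten.
    s = sum(d * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - s % 10) % 10

# The first twelve digits of the specimen ISBN 978-0-306-40615-7.
digits = [9, 7, 8, 0, 3, 0, 6, 4, 0, 6, 1, 5]
print(isbn13_check_digit(digits))   # -> 7, matching the last digit
```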

Applications of Error Detection and Correction

(a) Such barcodes are commonly found on everyday products. In this example, 13 digits are used. The first digit, 5, represents the method of encoding digits as strips of black and white lines. The last digit, 7, is parity. (b) Such QR codes are used to refer users to websites. In this example, the word _Wikipedia_ , while being artistic, destroys some of the original information. Despite this, scanners can read the data correctly because of error correction. Source: Qrc-designer, Wikimedia Creative Commons.

Hamming's invention is part of a family of codes that today go by the name of _linear block codes_. The encoder takes a block of _k_ information bits and then gives out _n_ bits. In the process, it adds _n-k_ parity bits. Such a code is called an ( _n_, _k_ ) code with a code rate of _k_ / _n_. A low code rate meant higher redundancy and better error protection, but a lower information bit rate. The code was linear because the modulo-2 addition of any two codewords resulted in another valid codeword. When Hamming invented his own (7, 4) code, he may not have known about the general case of linear block codes. Compared against the (3, 1) repetition code, Hamming's (7, 4) code was vastly more efficient. The latter required seven bits to transmit four information bits, while the former required twelve bits to transmit the same four information bits.

Because of this improved efficiency, Hamming simply named his code _minimum redundancy code_. Confusing, one might say, because David Huffman in 1952 used the same name for his own code. The difference was that Hamming was adding redundancy while Huffman was removing redundancy. Hamming belonged to the camp of channel coders while Huffman belonged to the camp of source coders. Hamming was trying to correct errors while Huffman was trying to compress data. Hamming didn't need compression to run his program over weekends. What he really needed was error correction.

The manner in which Hamming constructed his (7, 4) code, containing four information bits and three parity bits, is in itself so simple that one feels one doesn't need a PhD in mathematics to perform this feat. He used even parity for all codewords. Ingeniously, he included both information bits and parity bits in the computation. He carefully selected the formulas for computing each parity bit. Every codeword had seven bits that included three parity bits. The three parity bits could represent eight unique values. Almost by magic, the numbers matched his intention perfectly. The eight values of the parity bits could represent an error-free codeword or indicate the exact position of a single error within the seven-bit codeword. It didn't matter if the error occurred to an information bit or a parity bit. The code was designed to locate and correct a single error in any bit position. While earlier engineers had attempted to overcome noise using better electronics, Hamming's invention was novel. It accepted noise as a part of life, even in a digital world. The key was to correct errors when they occurred. Hamming later commented on the context in which the invention took place,

We grew up in the great depression, so we believed we owed the world a living. During the war, we all had to learn things we didn't want to learn to get the war won, so we were all cross-fertilized. We were impatient with conventions and had often had responsible jobs very early.... We were first-class troublemakers. We did unconventional things in unconventional ways and still got valuable results. Thus, management had to tolerate us and let us alone a lot of the time.

At Bell Labs, Hamming shared his office with Shannon for a while. Naturally, they talked about each other's work. Hamming talked about his new (7, 4) code. Shannon talked perhaps a little less about the grand paper soon to be published. Shannon's own approach to error correction was somewhat different. Starting from his typical sequences, Shannon proposed his _random codes_. In other words, given _M_ possible source messages, each message could be mapped to an _n_-bit sequence whose ones and zeros could be chosen at random. So long as the number of possible sequences, 2^n, was much greater than _M_, there was negligible chance that one codeword would clash with another. In fact, each codeword would differ significantly from the rest and this separation enabled error correction and detection. Precisely because of the constraint of large _n_, Shannon was well aware that his method could not easily be reduced to practice. It was therefore left to Hamming to bridge the gap between simple repetition codes and Shannon's unrealizable random codes.

When Shannon's classic paper of 1948 came out, he slipped in Hamming's code as an example. Patent attorneys of Bell Labs clearly had no idea about this. They did manage to prevent Hamming from publishing his invention. Hamming never imagined that something of this nature, a piece of software, a code, a mathematical formula, could be patented. Meanwhile, Shannon's paper had put Hamming's (7, 4) code out in the open. The code, soon to be known simply as _Hamming Code_, came to the attention of Marcel Golay, a Swiss scientist with a doctorate in physics from the University of Chicago. At the time, Golay was with the US Army Signal Corps Engineering Laboratories. Golay realized the brilliance of Hamming Code. It dawned on him that this was a _perfect code_. No parity bit was wasted. No combination of parity bits was wasted. Each combination indicated either a unique bit error or the error-free case. He was intrigued whether other perfect codes existed. This was no longer an engineering problem. It was mathematics delving deep into the mysteries of numbers and their inner secrets.

In June 1949, Golay's brilliant paper appeared in the _Proceedings of the IRE_. The paper was little more than half a page. Golay's (23, 12) code could correct up to three errors. Apart from this perfect binary code, Golay also mentioned a ternary code, that is, one that used not just ones and zeros but also twos, an alphabet that had three possible values. Golay's (11, 6) ternary code could correct up to two errors.

With such a short paper, the details were understandably sparse. It is likely that many took a while to understand it. Most certainly, no one knew how Golay had managed to find these perfect codes involving so many bits. With twelve information bits and eleven parity bits, the calculations were not simple. Years later, Golay himself acknowledged that he had no idea how he had managed it. To ensure that the (23, 12) code obtained its three-error-correcting capability, as many as 2509 combinations had to be verified. When asked, Golay simply said that he verified them by inspection. In the world of mathematics, verification by inspection and proof by enumeration are possibly the worst methods. Mathematicians like elegance and simplicity. If they have to resort to anything more elaborate, it is their failure to unveil the inner beauty of mathematics.

The real task in constructing error-correcting codes is finding the best way to compute the parity bits. It couldn't be arbitrary. Each codeword had to be not only unique but also "far away" from all other codewords. The farther a codeword was from another, the better it was for error protection. If a little noise corrupted an encoded signal, the noisy signal would still hover in the neighbourhood of the original codeword. Within the neighbourhood, the receiver could still figure out the original from the corrupted. This was how error correction worked. It was Hamming who introduced the term _distance_ to signify how far a codeword was from another. While _Euclidean distance_ was what mattered in the design of signal constellations (such as QPSK or 16-QAM), in the design of channel codes what mattered more was _Hamming distance_.

Hamming distance between two codewords was simply the number of bit positions in which they differed. Just as Euclidean distance determined if the demodulator mistook one symbol for another, Hamming distance determined if the channel decoder mistook one codeword for another. In both cases, the decision boundary was in between the two symbols or codewords. What these distances had in common was noise. Noise was the great unifying factor, as it were. Noise, as bad as it was, was also democratic and fair. Communication engineers realized that if they couldn't beat noise, they had to protect against it. Modulation had got them only so far. Perhaps what really was needed was a combination of efficient modulation methods and good error-correcting codes. Once this realization came, coding and modulation became co-conspirators against the fair tyranny of noise.
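
Computing a Hamming distance is trivial, which is part of its appeal. A throwaway Python illustration:

```python
def hamming_distance(a, b):
    # Count the bit positions in which two codewords differ.
    return sum(x != y for x, y in zip(a, b))

# Two 7-bit words differing in three positions.
print(hamming_distance([0, 0, 1, 1, 1, 0, 0],
                       [0, 1, 1, 0, 1, 0, 1]))   # -> 3
```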

For Hamming Code, this protection against noise came due to a Hamming distance of three. In other words, each codeword differed from another in at least three bit positions. This meant that if a single error occurred, the corrupted codeword would be closest only to the original codeword. Hence, the error could be corrected. In the case of the Golay (23, 12) code, the distance was seven. Thus, three errors could be corrected. If four bit errors occurred, the decoder translated that to a wrong codeword. Four or more errors were not correctable. The reason error correction is possible is that some bit patterns are not valid codewords. In the (7, 4) Hamming Code, seven bits can yield 2^7 = 128 possible bit patterns. However, only 2^4 = 16 of these are valid codewords. The remaining 112 patterns are invalid. When the channel decoder encounters invalid patterns, it flags them as errors. The existence of invalid patterns is the real meaning of redundancy in channel coding.
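
Hamming's construction is compact enough to sketch in full. The Python below is a generic illustration of the (7, 4) code with parity bits at positions 1, 2, and 4, which is one common convention among several, not necessarily Hamming's original layout. The three syndrome bits, read together as a binary number, point directly at the position of a single error.

```python
def hamming74_encode(d1, d2, d3, d4):
    # Three even-parity bits, each computed over a chosen subset
    # of the four data bits.
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7; parity bits sit at positions 1, 2, 4.
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    c = list(c)
    # Each syndrome bit re-checks one parity group; together the
    # three bits spell out the position of a single error (0 = none).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1            # correct the erroneous bit
    return [c[2], c[4], c[5], c[6]]

codeword = hamming74_encode(1, 0, 1, 1)
codeword[5] ^= 1                    # corrupt any single bit...
print(hamming74_decode(codeword))   # -> [1, 0, 1, 1] ...recovered
```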

Code Design via Sphere Packing

(a) Without coding, a zero can be misinterpreted as a one and vice versa. No error detection or correction is possible. (b) A (3, 2) block code in which valid codewords are spread as far from each other as possible; any two of them differ in two bits. It can detect single errors. (c) Error correction is possible because many bit patterns fall within the sphere of a valid codeword. Error detection is possible because some patterns are left out as orphans. When noise is too strong, a transmitted codeword may move into the region of a neighbouring sphere. In this case, parts of the original message are lost.

Hamming did get his patent with assistance from his colleague Bernard Holbrook. When he eventually published his code in 1950, it came a year after Golay's own publication. Naturally, there was some debate about who should get credit for what. Those supporting Hamming claimed that he knew all the binary codes even before Golay's paper. Today it is customary to credit Hamming for all the single-error-correcting codes even for non-binary cases. Golay gets credit for all the multiple-error-correcting codes. At this point, coding theorists started to wonder if there were any other perfect codes. They searched and found none. Eventually, proofs showed that no other non-trivial perfect codes exist. The perfect codes found by Hamming and Golay were the only ones in nature.

This then was the state of the art in channel coding for much of the 1950s. Despite the error-correcting capabilities of Hamming and Golay Codes, implementing encoders and decoders for these codes was not trivial. Computers were meant to perform real work, and diverting some of that computation to error correction was expensive. Noise was also not well behaved. Errors sometimes occurred in bursts and affected many bits at once. More powerful yet simpler codes were needed. Three things then happened independently.

In 1954, two researchers invented a class of codes that soon came to be called _Reed-Muller Codes_. In 1959-1960, a new class of linear block codes called _BCH Codes_ came into existence. The name was based on the initials of its three inventors: Bose-Chaudhuri-Hocquenghem. Quick on their heels came the _Reed-Solomon Codes_. The coming of BCH Codes was a watershed in coding theory. After a lull of nearly a decade, the new codes promised better efficiency with simpler implementation. This was also timely. The Soviets had launched Sputnik I in 1957. They had sent the first human to space. Their Luna spacecraft were the first to land on the moon and send back to earth the first pictures of the lunar surface. The US needed to catch up. They did not know how the Soviets had done it. Perhaps communicating from the moon was manageable but deep space communication was in another league altogether. Reliable communication technology meant better channel codes that could correct errors. Error correction in those days had not been important in telephone networks or on terrestrial wireless links. In the worst case, the network made wrong connections. The customer hung up and simply redialled. With space communications, delay was significant. Retransmission of messages was not the ideal solution. It was far better to correct errors. What really opened the floodgates was the discovery of a specialized branch of mathematics invented by a young Frenchman more than a century earlier.

The story of Évariste Galois is one of unfortunate circumstances that someone of an Eastern philosophy might label as fate or _karma_. He was a genius in mathematics who devoured the classical texts of Lagrange and Legendre in a matter of weeks when others would take many months. This gift was equally matched by his lack of discipline. He got demoted in school for failing in classics. He did most calculations in his head because the methods were so obvious to him that he felt embarrassed to write them down. His notes were messy and unreadable. His intuition of mathematics was such that none of his teachers understood him. He attempted twice to enter the École Polytechnique. On both occasions, he failed the entrance examinations. When he did manage to write a paper or two, they seemed to disappear into black holes.

When Galois was only seventeen, he submitted to the Académie des Sciences a paper inspired by a reading of Lagrange's work. The paper reached the desk of Augustin-Louis Cauchy. Cauchy was the first of the great nineteenth-century mathematicians to put calculus and analysis on a more rigorous footing. Cauchy meant to read Galois's paper and present it to the Academy in January 1830 but nothing was heard of it thereafter. The manuscript was lost forever. Galois was persuaded to submit a second manuscript as an entry to the Grand Prix of the Academy, perhaps the most coveted award in all of Europe. The manuscript is said to have reached Jean-Baptiste Fourier, one of the judges for the competition. Fourier supposedly took the manuscript home but as fate would have it, Fourier died in May 1830. No more was heard or seen of this manuscript.

By now, it seemed to Galois that the entire French education system was against him. Denied entrance to the École Polytechnique, Galois settled for the mediocre École Préparatoire. As if to give occasion to vent his frustration, the Second French Revolution erupted. In July 1830, King Charles X dissolved parliament, restricted voting rights, and curtailed freedom of the press. Riots erupted all over Paris and beyond. Nothing of this scale had been seen since 1789. When Galois was prevented by the Préparatoire from taking part in the revolution, it served only to incite more of the revolutionary fire inside him. When he got his chance, he spoke openly in support of the revolution. This got him expelled from the Préparatoire but it gave Galois the freedom he desired. He joined the National Guard. He participated in underground meetings. In the view of hardcore revolutionaries, the July revolution had failed: monarchy had been restored with King Louis-Philippe at the helm. A new revolution was needed, one that would install a true Republican government.

It was in January 1831 that Galois started presenting lectures to the public on some of his mathematics. His lectures were just as abstruse as his papers had been. It was therefore not surprising that his audience dwindled and eventually disappeared altogether. One good thing came out of these lectures. The great mathematician Siméon Denis Poisson convinced Galois to submit once more to the Academy a manuscript detailing his novel ideas. Galois's ideas in mathematics were so revolutionary that no self-respecting mathematician who cared enough about the subject could ignore them. Galois did submit a manuscript. Nothing was heard of it for five long months until Poisson was goaded into action by some bad publicity. One of Galois's friends printed an article in _Le Globe_, condemning the Academy and Poisson's lack of response in particular. Poisson returned the manuscript to its author with some not so positive comments. It was clear that Poisson had understood little of Galois's ideas.

If Galois was to make any impact in mathematics, he had to write down his ideas clearly. He had to lead the reader systematically. In July 1831, Galois was arrested and sentenced to nine months in prison for wearing the banned uniform of the National Guard and carrying weapons. Prison life was a blessing in disguise. Galois started to write up his ideas. He completed his sentence in April 1832 and it was about this time that he fell in love. Little did he realize that the dynamics of love are not as clear-cut and definite as in mathematics. Failure in love led Galois to a duel arranged for the morning of May 30, 1832. The previous night, Galois hastily scribbled some of his mathematical thoughts on a few sheets of paper. The collected works of his writings today fill a meagre sixty pages but they were enough to establish a new branch of mathematics called _Galois Theory_. The next morning he was found bleeding from a bullet wound to his stomach. He was only twenty-one.

Mathematicians of the past had solved quadratic, cubic, and quartic equations. These were equations that contained in them variables raised to the power of two, three, and four respectively. For centuries, mathematicians had battled the problem of solving quintic equations, those with fifth powers. Just before the time of Galois, Niels Abel had shown that no general formula could exist for solving such equations. Galois extended and refined this result by showing the necessary and sufficient conditions in which equations of higher powers could be solved. In the process, he introduced to the world a new form of abstract algebra that built on Cauchy's _Group Theory_.

Galois introduced the concept of _finite fields_. It suffices for us to know that fields obey certain mathematical rules. When operations are carried out with only a limited set of numbers, such fields are termed finite. Everyday algebra is not finite since numbers can take any value. The ideas of Galois were truly abstract. There could be little physical relevance to them. It was not certain if finite fields would have any applications in the real world. The question didn't bother mathematicians. In fact, those who indulge in pure mathematics find nothing more satisfying than spending their entire lives working on ideas as far removed from real applications as possible. They do not do this deliberately as a protest against applied mathematics or engineering. They truly believe that whenever they are discovering new theorems, they are penetrating a deeper layer of reality. That there are only a handful of perfect codes is not an accident of nature. It reveals a structure among numbers that couldn't have been any other way.

So, one can imagine the surprise when coding theorists found in the mid-1950s that Galois theory had direct relevance to what they were doing. This application of Galois theory was what united Reed-Muller (RM) codes, Reed-Solomon (RS) codes, and BCH codes. Irving Reed was a mathematician. His former classmate at Caltech, David Muller, had become a theoretical physicist but he was just as good with mathematics. In fact, Muller's mother was a mathematician and his father, Hermann Muller, had been awarded the Nobel Prize for his work in genetics. As for Gustave Solomon, he was a pure algebraist with a PhD from MIT. He joined Reed at the MIT Lincoln Laboratory. The three inventors of BCH codes were all mathematicians. Therefore, following the early work of Hamming and Golay, the real impetus to the subject came from first-class mathematicians who were persuaded to move away from the abstract into something that could be used in the Space Race against the Soviets.

Galois theory brought structure to the design and analysis of codes. Structure is important because it can be exploited towards simpler implementations of encoders and decoders. The reason all this had become possible was that in the digital world there were only two possibilities: ones and zeros. The design of codes involved only ones and zeros. In the new terminology, error-correcting codes functioned in GF(2), a Galois field of two elements. All arithmetic operations were reduced to this field. Additions were performed in a new form of arithmetic called _modular arithmetic_. In modulo-2 arithmetic, one plus one equalled zero, not two. If Galois had used his finite fields to analyse algebraic equations of higher powers, coding theorists used the same tools to isolate bit errors. In fact, the roots of algebraic equations gave the locations of bit errors.

Mathematics gave coding theorists new perspectives. They saw that linear block codes could often be _cyclic codes_ as well. In other words, a cyclic shift of a codeword by a single bit resulted in another valid codeword. They saw that encoding and decoding operations could be expressed as algebraic operations in GF(2). Alternatively, codewords and the process of computing parity bits could be neatly written down as vectors and matrices. These mathematical tools gave a further layer of abstraction but the results were the same. In the realm of the abstract, there was more than one way to do things. The process of decoding was equated to a typical day at the clinic. A doctor looked at symptoms and prescribed medicines. The task of locating bit errors was somewhat similar and it acquired a fancy term: _syndrome decoding_.

A channel code, no matter how great in its error-correcting capability or its beautiful mathematical structure, is only as great as its best decoder. Just as demodulators called the shots in modems, decoders dictated what was practical in the real world. Therefore, a lot of research effort was expended towards simplifying decoders. If optimal decoding was not possible, engineers compromised on bit errors for simpler decoding. The first decoding algorithms for BCH codes appeared in 1960 due to W. W. Peterson. True simplification came four years later due to R. T. Chien, who made use of the cyclic properties of BCH codes. BCH decoding matured only towards the end of that decade with the coming of _Berlekamp-Massey algorithm_. At least for BCH codes, it was true that decoders benefited a lot from mathematics. Coding theory straddled the boundary between mathematics and communication engineering. Pure mathematics, which was once thought sublime and useless, had become practical.

Reed-Solomon codes, which are really an extension of BCH codes to non-binary alphabets, turned out to be more powerful than BCH codes in particular cases. They came out of a defence contract project at MIT Lincoln Laboratory and were ingeniously simple. If BCH codes added redundancy at the level of bits, the aim of RS codes was to add redundancy at the level of symbols, with each symbol formed from a finite number of bits. While BCH codes corrected bits, RS codes corrected symbols, which translated into correction of many consecutive bits. To give ourselves an idea of this power, the BCH (255, 223) code corrects four bit errors. The RS (255, 223) code corrects sixteen symbol errors, with each symbol being an 8-bit byte. When the errors cluster inside those sixteen symbols, it can correct well over a hundred consecutive bit errors, many times the capability of the equivalent BCH code.

One might argue that this is an unfair comparison since the example BCH code operated on 223 bits but the example RS code operated on a longer sequence of 223 x 8 = 1784 bits. This increased block length only means greater delay and memory requirements. Otherwise, the two codes have the same algorithmic complexity. The second argument could be that the comparison favours the RS code only when bit errors are correlated: sixteen symbol errors amount to many consecutive bit errors only if the corrupted bits crowd into those sixteen symbols. RS codes were better when noise came in short bursts rather than being completely random. Then engineers came up with another invention that was the epitome of simplicity itself and yet, in combination with existing codes, more powerful than any code alone.

Suppose one took a block of bits, rearranged the bits in some known order and then output the scrambled bits to the modulator. Suppose noise came in bursts and caused many consecutive bits to be in error. After the demodulator, the channel decoder did not straightaway commence its job of correcting errors. Instead, the bits were first rearranged back to their original order before the channel decoder took over. In the process, the effect of burst noise was mitigated by spreading out the errors over a block of bits much larger than the length of the noise burst. Bit errors appeared uncorrelated and effectively matched to the capability of decoders. This process of rearranging bits is called _interleaving_. In simple terms, interleaving was all about putting one's eggs in many baskets for transportation. To the end user though, all eggs were rearranged and delivered in a single basket.

Bit Interleaving

(a) Bits coming out of the channel encoder include both information and parity. (b) Bits are rearranged by the interleaver. (c) A burst error corrupts all four bits in a single block of data. (d) At the receiver, de-interleaving scatters bit errors across multiple blocks, which can be more easily corrected by the channel decoder.
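
A block interleaver of the kind pictured above can be sketched in a few lines of Python. The row and column sizes here are arbitrary, chosen only to make the scattering visible.

```python
def interleave(bits, rows, cols):
    # Write the bits row by row into a rows-by-cols table, then
    # read them out column by column.
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    # The inverse operation: swap the roles of rows and columns.
    return interleave(bits, cols, rows)

data = list(range(12))              # stand-ins for coded bits
sent = interleave(data, 3, 4)
sent[0:3] = [-1, -1, -1]            # a burst wipes out 3 consecutive bits
print(deinterleave(sent, 3, 4))     # the errors land 4 positions apart
```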

While RS codes had in-built capability to handle errors in small bursts, interleaving helped when bursts were much longer than the number of symbols they could correct. In addition, a combination of interleavers and multiple RS codes enabled correction of random as well as burst errors. The perfect example of such a system is the audio CD, launched commercially in 1982. It must not be supposed that channel coding is only about communication from one place to another. It is also about storage, that is, communication from one time to another. Storage devices too suffer noise due to mishandling and general wear and tear. To protect against such things as smudges, scratches, and substandard recording equipment, Sony and Philips jointly released in 1980 a coding standard. Within limited circles, this came to be called the _Red Book Standard_ , no doubt in reference to the colour of its binding.

Error protection in audio CDs was achieved through several stages: an RS (28, 24) code, then an interleaver, then an RS (32, 28) code, yet another interleaver, and finally more bit transformations to suit the physical characteristics of optical storage. So although stereo audio operated at about 1.4 Mbps, bits were actually read from storage at thrice that speed. In other words, two-thirds of the data on an audio CD is actually for error protection. The protection is so powerful that a scratch 2.5 mm (4,000 bits) long can be fully recovered. Scratches up to 7.7 mm (12,300 bits) can be partially recovered by interpolating from neighbouring bits. This is certainly a long way from the single-error-correcting capability of Hamming Code. It is not an exaggeration to claim that without RS codes we would not have audio CDs, DVDs, satellite communications, and mobile phone technology. The fact that RS codes operated on symbols meant that they dovetailed well with M-ary FSK modulation for space channels. While these developments triggered by an understanding of Galois fields were happening in the coding world, something interesting happened in the mathematical world.

It almost never happens that engineering and applied mathematics contribute in any major way to pure mathematics. This was all about to change in a spectacular way in the mid-1960s. If algebra had given power to coding theory, geometry was no less successful. In fact, geometric interpretation came first. It was geometry that Hamming had used to explain his perfect codes. The (7, 4) Hamming Code can be seen as a code in a geometry of seven dimensions. Each bit of a codeword represented one dimension. When Hamming defined the distance between two codewords, he was actually referring to the number of dimensions in which two codewords differed. He extended this to define the _Hamming Sphere_. Such a sphere of unit radius had a codeword at its centre and contained on its surface all points at unit distance from the centre. If all such spheres filled the n-dimensional space completely without overlap, we had a perfect code.

To take a specific example, the (7, 4) Hamming Code corrects single errors. The 7-dimensional space has 2^7 = 128 points. Considering single errors and the no-error case, every Hamming sphere in this space contains exactly 7 + 1 = 8 points. This means that the geometry contains 128/8 = 16 non-overlapping Hamming spheres of unit radius. Each Hamming sphere represents exactly one valid 4-bit information sequence. This is the geometric interpretation of perfect codes. The same holds true of the Golay (23, 12) code except that the spheres are of radius three.
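
This counting argument generalizes into what coding theorists call the sphere-packing or Hamming bound, a standard result restated here for concreteness: a code of length $n$ with $k$ information bits that corrects $t$ errors must satisfy

$$\sum_{i=0}^{t} \binom{n}{i} \le 2^{\,n-k},$$

with equality holding precisely for perfect codes. For the (7, 4) Hamming Code with $t = 1$, the left side is $1 + 7 = 8 = 2^{3}$. For the Golay (23, 12) code with $t = 3$, it is $1 + 23 + 253 + 1771 = 2048 = 2^{11}$. The spheres fill the space exactly, with nothing left over.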

The link with pure mathematics was becoming obvious. Better design of a code was simply a question of how tightly one could pack spheres in a given dimensional space. The tighter the packing, the more errors the code could correct in that dimension. The problem of sphere-packing, as it came to be called, had been studied by Johannes Kepler as far back as 1611. When grocers arrange their oranges every morning, they are unconsciously addressing the sphere-packing problem. They attempt to pack as many oranges as possible in a given finite space. While a grocer does this in three dimensions, coding theorists had done it in higher dimensional spaces. Mathematically, the design of linear block codes can be seen simply as a transformation. The Golay (23, 12) code is a transformation from 12-dimensional space to 23-dimensional space. In the process, closely spaced points are stretched out into packed spheres. Points in 12-dimensional space are just information bits but spheres in 23-dimensional space are codewords with the ability to correct errors. In a grocer's lingo, codewords are the best places to place the oranges, assuming we can visualize oranges in a 23-dimensional space!

Golay had not stopped with (23, 12) perfect code. When he added an extra bit to each codeword, he obtained a new code, which today we call the _extended_ Golay (24, 12) code. This code had the additional advantage that it could detect four bit errors because the minimum distance of the code had increased to eight. Likewise, there also exists the extended Hamming (8, 4) code. These formed the starting point that interested British mathematician John Leech in the problem of sphere-packing.

Taking the cue from coding theory, Leech wondered about the best way to pack spheres in higher dimensions. The perfect codes of Hamming and Golay suggested there might be something more special about them than what had been discovered. Indeed, Leech discovered that the best packing methods in 7-dimensional or 23-dimensional spaces were not as good as those in 8-dimensional or 24-dimensional spaces. For example, packing in 23-dimensional space approached 57% of the theoretical upper bound but in 24 dimensions it was almost 80%. More astonishing was the fact that the number of neighbours that each sphere touched matched the theoretical upper bound. Mathematicians call this the _kissing number_. Each 24-dimensional sphere kissed 196,560 neighbours. The centres of these spheres, one for each 24-bit codeword, formed a lattice in geometrical space. It was soon named the _Leech Lattice_.

The beauty of the Leech lattice was its symmetry. One could rotate it or flip it and the properties of the lattice would remain the same. This was right up the alley of group theorists, who deal with symmetries. Leech, however, was not one and he went around scouting for someone who might take interest in his lattice. Because of his lattice's symmetry, Leech suspected that there might be an undiscovered group within its structure.

Over the decades, group theorists had codified all the groups they had discovered. Groups are to group theory something like elements to chemistry. Today we know that there are eighteen families of finite simple groups that neatly fit into a pattern just as elements fit into Mendeleev's periodic table. When French mathematician Émile Mathieu discovered five new groups in the nineteenth century, the pattern was broken. Mathieu's groups didn't seem to fit with the others. The misfits were named _sporadic groups_. So, one can imagine the surprise when more than a hundred years later Croatian mathematician Zvonimir Janko discovered more sporadic groups.

In the summer of 1966 John Conway, a mathematician from Cambridge, attended the International Congress of Mathematicians at Moscow. At this conference, Conway was introduced to the lattice Leech had discovered just two years earlier. Leech seemed to think that there might be a new sporadic group hiding within the lattice, though at this time he had not yet achieved the best packing in the 24-dimensional space. Might Conway be interested in this problem? By then, Conway was in his thirties and he felt he hadn't made any significant contributions to mathematics. The Leech lattice could perhaps be his chance to make a mark on mathematics. He made a plan. He would work every Saturday for twelve hours straight and put in another six hours on Wednesdays. He would keep to this routine week after week until he discovered the group within the Leech lattice. Naturally, his family wasn't too pleased with his decision, but when the mathematician had made up his mind, they didn't have much of a choice.

Thus, it happened that on the first Saturday, he locked himself in promptly at noon and got down to serious study of the Leech lattice. The calculations were tediously long and a discarded roll of wallpaper served the purpose well. By six that evening, he had some idea of a new sporadic group that was beginning to emerge from the lattice. A little later the same evening, he discovered that the group contained sub-symmetries of many other known sporadic groups. Whatever it was, this group was huge. Over the course of another six hours, he worked out the order of the group, that is, the number of symmetries in the group. What had started as a project for many weeks had been accomplished in a single sitting. In fact, Conway discovered not one but three new sporadic groups. The first of these had 8,315,553,613,086,720,000 symmetries.

The discovery of these new sporadic groups by Janko and Conway generated new interest. About the same time as Conway's discovery, German mathematician Bernd Fischer discovered three more sporadic groups. There was a growing belief that there might be more sporadic groups awaiting discovery. Truly enough, more sporadic groups were discovered through the 1970s but the most spectacular of them depended on the Leech lattice for its construction. This incredibly large sporadic group, appropriately named the Monster, exists in 196,883-dimensional space. It has an order of 808,017,424,794,512,875,886,459,904,961,710,757,005,754,368,000,000,000. Some say that the Monster holds the secret to the workings of the universe. It has been linked to String Theory, which operates in multidimensional geometries. The number 196,883 is no coincidence either. It occurs in the coefficients of certain special functions known as modular functions.



**Like any new** technology, RS codes had their share of growing pains. They came in an era when computer speeds and memories were limited and expensive. BCH codes and RS codes were in the same situation of having to deal with their own complexities. The first Mariner spacecraft that flew past Venus and Mars in the early sixties used BPSK modulation without any channel coding. When the question arose of selecting a suitable code for the Mariner missions of 1969, it was not the RS codes that were chosen but the less powerful but simpler RM (32, 6, 16) code, where sixteen is the minimum distance of the codewords. This code was supposed to be the first channel code to venture out boldly beyond the earth's orbit. Unfortunately for the RM (32, 6, 16) code, it was beaten by a code from a rival family that had been quietly making progress without the sophisticated algebra of Galois.

While many researchers driven by the beauty of Galois fields had developed their block codes and cyclic codes, some researchers at MIT began to focus on codes that did not have a strong mathematical background. Peter Elias had proposed the first of these codes way back in 1955. These codes did not easily lend themselves to analysis because they could not be reduced to neat mathematics. Like any other channel code, they too took information bits, computed parity bits in known ways, and appended them to the information. However, their design was quite radical. While block codes took finite blocks of information bits at a time and encoded them, these new codes relaxed the strict definition of blocks. In these new codes, the parity bits depended not just on the current information bits but also on past information bits. These codes remembered the past. They had memory. Memory determined the computation of parity bits. Thus, information bits were better protected by effectively more parity bits than had been possible in block codes. The greater the memory, the better the error protection. These codes were named _Convolutional Codes (CC)_.

In an era when all focus was on linear block codes, Elias dared to think differently. If not for Elias, we might have never obtained convolutional codes, which were soon to play an important role in coding theory. The ideas Elias generated led to first-class PhD theses from many of his students. He introduced new approaches to code design and decoding. These influenced future work that eventually led to performance approaching Shannon's channel capacity bound. His contribution to information theory is perhaps second only to Shannon's.

The memory that a convolutional code possesses is termed its _constraint length_ _K_. The code is usually written down as an ( _n_, _k_, _K_ ) code. With this memory, it is evident that the encoding and decoding processes involve states and paths entering and leaving each state. There are precisely 2^(k(K-1)) states and 2^k paths leaving each state. The code performed better if _K_ was larger, that is, if it had more capacity to remember the past. But increasing _K_ increased computation requirements at an exponential rate. On one hand, large _K_ was good. On the other hand, it was a constraint for implementation. With _K_ = 20, the decoder had to maintain more than half a million states. With such enormous demands, decoding was a problem.
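
The encoder itself is simple; it is the decoder that is hard. The Python sketch below implements a generic rate-1/2 convolutional encoder, a (2, 1, 3) code in the notation above. The generator taps 7 and 5 (octal) are a common textbook choice, not necessarily the ones flown on Pioneer 9.

```python
def cc_encode(bits, K=3, taps=(0b111, 0b101)):
    # Rate-1/2 convolutional encoder with constraint length K = 3.
    # The state register holds the last K input bits; each output
    # bit is the modulo-2 sum of a tapped subset of the register,
    # so every coded bit depends on the past as well as the present.
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)
        for g in taps:
            out.append(bin(state & g).count("1") % 2)
    return out

# Two coded bits emerge for every input bit.
print(cc_encode([1, 0, 1, 1, 0]))
```

With _k_ = 1 and _K_ = 3, this encoder has 2^(K-1) = 4 states and two paths leaving each state, exactly as the counting formula above predicts.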

The simplest decoding method was _threshold decoding_, which looked at the number of errors and figured out the position of those errors. Threshold decoding was adequate for small codes but definitely not suited for space applications that required greater error-correcting capability. J. M. Wozencraft at MIT invented _sequential decoding_ in 1957. Robert Fano subsequently improved it in 1963 under the name of _probabilistic decoding_. Indeed, while block codes had used algebra for decoding, convolutional codes used probability. The decoder operated on the premise that a certain sequence of bits had been transmitted, and this was compared against the incoming sequence of bits. If the comparison fulfilled a certain criterion, it was probable that this was the transmitted sequence. Otherwise, the decoder discarded that path of decoding and tried another possible path that might fulfil the criterion. Essentially, the decoder worked by eliminating paths that were too improbable.

Sequential decoding was the state of the art in the mid-1960s. Decoders implemented in software were available. These didn't cost much compared to the specialized hardware units required for the RM (32, 6, 16) code. Moreover, the CC (2, 1, 20) code selected for the Pioneer 9 mission of 1968 had a coding gain of 3.3 dB while the RM (32, 6, 16) gave only 2.2 dB at a lower code rate. The credit for getting this convolutional code to space first goes to D. R. Lumb of the NASA Ames Research Center. James Massey, associated with the Codex Corporation that implemented some of the decoders for such codes, commented on this development,

After rapid development of the convolutional coding system, Lumb succeeded in getting it aboard Pioneer 9 as _an_ _experiment_. This neatly side-stepped the long approval time that would have been necessary if this coding system had been specified as part of an operational communications system for a spacecraft. The operational communications system for Pioneer 9 was, of course, an uncoded BPSK system. The experimental coding system was activated as soon as Pioneer 9 was launched—it was never turned off!

Convolutional codes brought performance closer to the capacity bound given by Shannon. Given the complexity of decoding, researchers wondered if the bound could ever be reached. In fact, they were so certain of its futility that they defined something called _R_comp, an upper limit on rate determined by computational constraints. For about half a decade, this was thought to be an unbreakable barrier. This was not the only problem with sequential decoding. After all, the decoding procedure was really nothing more than guesswork. In noisy channels, guesses were often wrong. The time to find the right path through the decoding tree was highly variable. The result was that memory buffers could easily overflow. Two things then happened.

First, the Russian Mark Pinsker, a former student of Kolmogorov, shattered the _R_comp barrier in 1965. It was in fact possible to achieve higher rates; it was just that no one had managed to find a better decoder. Once the myth of _R_comp was shattered, there was no looking back. Researchers started searching for decoders that could handle high-rate codes and a great deal of progress was made on this front. Secondly, convolutional codes finally got their optimal decoder, one that could operate at rates above _R_comp. The new decoding algorithm, first published in 1967, did not have the buffer overflow problem of sequential decoding. Its inventor simply named it _probabilistic nonsequential decoding_. Today it is more commonly called the _Viterbi Algorithm (VA)_ in just tribute to its inventor, Andrew J. Viterbi.

Viterbi, an Italian Jew, was only four when he arrived in the US in the summer of 1939. Just five days after his family's arrival in the US, Germany invaded Poland. World War II had begun. As a boy he dreamt of studying at MIT whose portals he often passed by. By twenty-two, he had fulfilled this dream by acquiring a bachelor's and a master's in Electrical Engineering. Working at the Jet Propulsion Laboratory (JPL), he was among the engineers to work on Explorer 1, the first US satellite in earth's orbit, never mind that it came four months after Sputnik 1. For many years, he was deeply involved in missile and deep space programmes at JPL. It was during these years that he obtained a PhD from the University of Southern California. This kindled his interest in academia. In 1963, he became an assistant professor at the University of California, Los Angeles. Teaching was a hard thing to do. Digital communications was a complex subject. Viterbi looked for ways to simplify the teaching of sequential decoding to his students. In the process, almost unexpectedly in March 1966, he invented his now famous algorithm.

The crux of the algorithm is quite easy to explain. Given many possible land routes from Paris to Warsaw, it makes sense to remember only the shortest route if time is at a premium. If the traveller wishes to save his money instead, he needs to remember only the cheapest route. There is no point in remembering all possible routes. Likewise, a decoder need not track all possible paths in the decoding process. It needs to track only those paths that survive a certain criterion. The job of the algorithm is to discard irrelevant paths and keep only one path per state at every stage of the decoding process.
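
The survivor-path idea can be sketched in a few lines of Python. The trellis below is a toy, and its stages and distances are invented purely for illustration.

```python
# A minimal sketch of the Viterbi survivor-path idea: at every stage,
# keep only the best (shortest) path into each state; discard the rest.

def viterbi_shortest(stages):
    """stages: list of dicts mapping (previous_state, state) -> leg cost."""
    best = {None: (0, [])}  # state -> (cost of survivor path, the path itself)
    for edges in stages:
        new_best = {}
        for (prev, state), cost in edges.items():
            if prev not in best:
                continue
            total = best[prev][0] + cost
            # Keep this path only if it beats the current survivor.
            if state not in new_best or total < new_best[state][0]:
                new_best[state] = (total, best[prev][1] + [state])
        best = new_best
    return min(best.values())

# Hypothetical legs, loosely in the spirit of the Paris-Warsaw story:
stages = [
    {(None, "Frankfurt"): 570, (None, "Zurich"): 650},
    {("Frankfurt", "Prague"): 760, ("Zurich", "Prague"): 700},
    {("Prague", "Warsaw"): 815},
]
print(viterbi_shortest(stages))  # -> (2145, ['Frankfurt', 'Prague', 'Warsaw'])
```

At the Prague stage the Zurich path (1350 km) loses to the Frankfurt path (1330 km) and is discarded; only one survivor per state is ever carried forward.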

No one seemed to be enthusiastic about patenting the new invention. Implementations of the algorithm were expected to be expensive and perhaps only the US government would ever buy it. Thus it happened that the algorithm was freely available in the public domain, which directly contributed to its universal adoption. Today the algorithm is used not just in decoding convolutional codes but also in speech recognition, OCR, and DNA sequencing, among other fields. The algorithm is really the optimal way to tackle the maximum likelihood problem: the task of selecting, from many candidates, the one that has the best probability of being the right one. The algorithm is also a perfect approach for demodulating CPM and CPFSK signals. These modulations involve memory and phase transitions just as convolutional codes involve memory and state transitions.

Viterbi himself did not immediately realize that his algorithm was optimal in the sense that no other decoder of a convolutional code could do any better at reducing the BER. While Viterbi's explanation of the algorithm in his own paper of 1967 had been rather verbose and mathematical, David Forney gave a better representation of the decoding process in a paper of 1973. Forney represented the code in terms of a _state transition diagram_. He then combined states along a sequential timeline, resulting in what he called a _trellis diagram_. This gave a powerful visualization of paths and the actual workings of the algorithm. The diagram made it easier for engineers to see that elimination of non-survivors at each stage actually led to optimal decoding. Since then, trellis diagrams have become a standard in the design and understanding of convolutional codes.

While Hamming had launched coding theory in the late 1940s, the sixties were its golden age. This was the age when not just theoretical foundations were strengthened but practical considerations were given due importance. Codex Corporation started with the intent of designing and implementing better error-correcting codes. It became one of the first companies to put information theory and coding theory into practice. Its early market was almost exclusively military and deep space communications. Later, when Codex attempted to sell its work to the wider industry, it saw that error-correcting codes had limited potential there. The company moved into modems and multiplexers. By then, Viterbi had co-founded his own company named Linkabit. Unlike Codex, Linkabit's focus remained on coding. Both companies found success in their own fields. It was also around the sixties that coding theorists came to realize the difference between hard and soft.

The decoder's job is to pry out information bits from bits that come in with errors. When noisy symbols are received, the demodulator does its work and passes on its estimates to the detector. The detector for its part takes a decision; that is, it decides between a one and a zero. This is then passed on to the channel decoder, which attempts to correct errors based on Hamming distances. Engineers who designed decoders then realized they could do much better if the original estimates of the demodulator were available to them. What this meant was that the detector was subsumed into the channel decoder. The older method was termed _hard-decision decoding_ while the new improvement was termed _soft-decision decoding_. Indeed, it gave a coding gain of about 2 dB. Soft-decision decoding employed Euclidean distances rather than Hamming distances. Where hard-decision decoding threw away useful information, soft-decision decoding squeezed the last drop of juice from the code. It could be applied to any type of code, be it a block code or a convolutional code. In fact, most of the coding gain of the RM (32, 6, 16) code used on the Mariner missions of 1969 was due to soft-decision decoding rather than the code itself.
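
A toy Python comparison for a three-bit repetition code sent as BPSK symbols shows why. The received samples here are made up for illustration.

```python
# A minimal sketch of hard versus soft decisions. Codeword bits are
# sent as +1 (one) and -1 (zero); the channel delivers noisy reals.

codewords = {"0": [-1, -1, -1], "1": [+1, +1, +1]}
received = [0.8, -0.2, -0.3]  # one strong positive, two weak negatives

# Hard decision: threshold each sample first, then count disagreements.
hard_bits = [1 if r > 0 else 0 for r in received]
def hamming(bits, word):
    return sum(b != (1 if w > 0 else 0) for b, w in zip(bits, word))
hard_pick = min(codewords, key=lambda c: hamming(hard_bits, codewords[c]))

# Soft decision: compare the raw samples against each codeword directly.
def euclidean(rx, word):
    return sum((r - w) ** 2 for r, w in zip(rx, word))
soft_pick = min(codewords, key=lambda c: euclidean(received, codewords[c]))

print(hard_pick, soft_pick)  # -> 0 1
```

The hard decoder sees two zeros against one one and picks "0"; the soft decoder notices that the lone positive sample is far more confident than the two weak negatives and picks "1". That confidence information is precisely what thresholding throws away.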

The reason why soft-decision decoding had been ignored for so long was a trap unknowingly created by information theorists, including Shannon. Unfortunately, it was a trap into which coding theorists and engineers fell. Shannon and others had talked about a channel model called the _Binary Symmetric Channel (BSC)_. This was a rather simple model of a channel. It served as a starting point for illustrating concepts of information theory. In such a model, a binary input of one or zero was corrupted with a certain fixed probability. The receiver got only ones and zeros, except that some might actually be corrupted data. Coding theorists used the BSC in their own analysis. Hamming distances, which deal with only ones and zeros, were analysed with respect to the BSC. The problem was that real-world channels were nowhere close to being a BSC.

Noisy channels of the real world corrupted signals on a continuous scale. The AWGN channel was an example. In particular, the deep space channel was almost a perfect AWGN channel. In such channels, corrupted signals were real numbers that represented amplitudes and phases. The interpretation as digital bits comes later. Coding theorists, in their enthusiasm to go digital, had disregarded the analogue world. Analogue signals continue to have an important place in digital communication systems. Every time we convert an analogue signal to digital, we lose some information. It is therefore important to be careful in both design and implementation that only redundant information is discarded.

A Simple Illustration of Viterbi Algorithm

A secret agent travels by road from Paris to Warsaw. Enemy spies at Paris, Berlin, Prague, and Warsaw try to estimate the agent's route. They know the distances and hence the travelling times. The spy at Prague spots the agent and estimates that he must have covered 1330 km. He therefore determines that the agent must have taken the Pa-Fr-Mu-Pr route. He discards the other two routes. The spy at Warsaw estimates the distance covered as 815 km. He claims the agent must have taken the Pr-Kr-Wa route. He discards the direct Pr-Wa route. In all, only five alternatives had to be considered. Instead, if the spy at Warsaw had estimated the total distance as 2145 km without intermediate data, he would have selected the same correct route but would have needed to consider nine alternatives. With hard-decision decoding, let's suppose that the spies can estimate distances only to the nearest hundred. The above figures then become 1300, 800, and 2100 km. The spies would then wrongly select the route through Zurich.

By the start of the 1970s, it had become clear that convolutional codes gave superior performance compared to block codes. They were also simpler to implement. While long BCH codes performed poorly, convolutional codes with large constraint lengths performed well. Nonetheless, the Viterbi algorithm for constraint lengths longer than ten was highly demanding. Sequential decoding was preferred for such cases and therefore Fano's algorithm was often used on spacecraft journeying into deep space. This was the case with the Pioneer 10 and 11 missions of the early seventies. A CC (2, 1, 32) code that used 3-bit soft decisions at the decoder gave almost 7 dB gain compared to uncoded BPSK. Pioneer 11 flew past Jupiter in December 1974. Although giving commands to the spacecraft from earth and getting a confirmation back took about ninety minutes, the data rate was healthy at 2048 bps. Five years later, Pioneer 11 reached the neighbourhood of Saturn. From here, Pioneer 11 communicated back to earth at 512 bps. This was reduced to lower rates in the presence of solar interference. After sending back some pictures of Saturn and its satellite Titan, Pioneer 11 bid farewell to earth and ventured out of the Solar System.

The next notable deep space missions with relevance to coding were those of Voyager 1 and 2, launched in 1977. They were to fly past not just Jupiter and Saturn but farther to the orbits of Uranus and Neptune. These missions used Viterbi decoding for a (2, 1, 7) convolutional code. With smaller error-correcting capability, they had less coding gain compared to the Pioneer 10 and 11 missions. This might seem like a step backward but it was not. The convolutional code did not operate alone. It formed a partnership with another code just as David Forney had suggested in 1966. Forney had called his method _concatenated codes_.

Concatenated codes were born out of a need to keep decoding tractable while achieving exponentially decreasing error rates. These codes took two codes and used them back to back. Forney found that this gave much higher coding gains than using a single code of greater complexity. For the Voyager missions, information bits were first put through the RS (255, 223) code that could correct up to sixteen symbol errors. After this, the bitstream passed through the (2, 1, 7) convolutional code. The resulting concatenated code gave a coding gain of 7.3 dB, certainly better than the Pioneer (2, 1, 32) convolutional code. Engineers had come within 2.5 dB of the capacity bound. Such a concatenated code was eventually standardized for use in satellite TV transmissions.

It was in 1610 that Galileo Galilei discovered the satellites of Jupiter. Almost four centuries later the Galileo spacecraft was launched to study Jupiter and its satellites. The Voyager spacecraft had briefly flown past Jupiter and Saturn but Galileo would stay as an uninvited guest in the Jovian system for nearly two years. Galileo reached Jupiter six years after launch, in December 1995, via a touristic route 2.5 billion miles long. From its new home about 600 million miles away, Galileo sent back to earth spectacular high-resolution images. Through these images, we understood a lot more of Jupiter's Great Red Spot, Io's volcanic activities, Europa's icy plains, and Ganymede's extensive ridges. From the initial 129 images, only two were lost. The rest were sent back to earth without errors. Rather, it was convolutional coding that corrected whatever errors might have crept in during transit. The Galileo mission used a (4, 1, 15) code with Viterbi decoding. This would not have been possible a decade earlier but by the time of Galileo's launch in 1989 computing hardware had become more powerful. This enabled the decoder to maintain and evaluate 2^14 = 16,384 states in the trellis. Of course, this was concatenated with the RS (255, 223) code, bringing us to within a single dB of Shannon's capacity bound. In a lecture delivered in 1992, Viterbi summarized these achievements,

In the early '60s block-code designs provided about a 3 dB coding gain, meaning that we could extend the range of space missions by the square root of 2, or reduce ground station antenna diameters (then 25 to 60 metres) proportionately or spacecraft transmitter power by half. In the '70s these were replaced by convolutional codes which brought the gain to 6 dB for double the range, half the antenna diameter or one quarter the transmitter power. The '80s saw further progress in convolutional codes, as well as concatenation with Reed-Solomon block codes, which enhanced the coding gain to almost 8 dB. In the '90s increased processing power will bring that gain to 10 dB, a full factor of 10 in power or more than triple the range. To the theorist the achievement is that this is only 2 dB from the absolute limit set by Shannon's channel theorem.

Did Viterbi's prediction come true? Did we finally cross that last frontier? Indeed, through the 1980s and 1990s newer codes enabled coding theorists to touch the Shannon bound. It was true that the best coding gains were obtained when the communication system used the best of both coding and modulation. But an Austrian-born researcher at IBM, Gottfried Ungerboeck, thought that mere cooperation between coding and modulation was not enough. It ought to be an integration of coding and modulation. Ungerboeck's ideas came as part of IBM's effort to position itself as a leader in voiceband modems in the mid-1970s. Codex Corporation had taken the lead but IBM wanted a share of the digital transmission market. What Ungerboeck suggested was that although constellations such as 8-PSK and 64-QAM were crowded, it was possible to partition a constellation into distinct subsets in a hierarchy. Each subset thus had fewer symbol points and therefore higher Euclidean distances. Some bits selected the subset and others selected a particular symbol within the subset. Gray coding was not relevant in this scheme of things. It was more important that a pair of points with greater Hamming distance also had a greater Euclidean distance.
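
A short Python sketch shows the effect of such partitioning on an 8-PSK constellation: each split leaves fewer points and a larger minimum Euclidean distance between them.

```python
# A minimal sketch of Ungerboeck-style set partitioning on 8-PSK:
# halving the constellation repeatedly increases the minimum
# Euclidean distance among the points that remain together.

import cmath
import math

def min_distance(points):
    return min(abs(a - b) for i, a in enumerate(points) for b in points[i + 1:])

psk8 = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]  # unit-circle 8-PSK

for name, pts in [("all 8 points", psk8),
                  ("subset of 4", psk8[0::2]),
                  ("subset of 2", psk8[0::4])]:
    print(f"{name}: minimum distance = {min_distance(pts):.3f}")
# -> roughly 0.765, 1.414 and 2.000: every split buys more distance
```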

When the new method was presented at the International Symposium on Information Theory in 1976, no one took notice. The code languished for years until Ungerboeck decided to publish its full details in 1982. By then, IBM had released a modem with Ungerboeck's coded modulation. The modem was capable of 14.4 kbps, the first of its kind in the market. IBM, however, was more concerned with selling its computers and little focus was given to peripherals. It was only in the eighties that coding theorists took note of the method. Part of the reason might have been the isolation between those who designed modems and those who designed channel codes. Ungerboeck's method, today referred to as _Trellis Coded Modulation (TCM)_, requires knowledge of both disciplines. While the V.32 modem standard defined 16-QAM, it also defined a TCM mode that gave a coding gain of 4 dB over 16-QAM. TCM enabled modems to reach speeds of 33,600 bps in the 1990s. Digital TV transmissions via cable modems also use TCM.

At a conference in Geneva in 1993, three engineers working in France presented a new code, which they called _Turbo Codes (TC)_. The inventors, Claude Berrou, Alain Glavieux, and Punya Thitimajshima, gave their paper an ambitious title, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo Codes." Of course, no one believed them, and if the audience stopped short of open ridicule, it was only out of politeness. Out of curiosity, some researchers went back to their laboratories and tried out what the inventors had proposed. They couldn't believe what they found. Indeed, just as the inventors had claimed, these apparently simple codes almost achieved the Shannon bound.

The code was simple because it was fundamentally nothing new. The inventors had simply applied the concatenation principle of Forney in a new form. They had shown that by combining simpler elements a more powerful code could be constructed. For a long time researchers attempted to understand why these codes worked so well. It was clear that even the inventors could not explain it. Indisputable, however, were the gains. The code was a combination of an interleaver and two convolutional codes that worked in parallel. One coder worked on the plain information bits while the other worked on the interleaved bits. The real gain came from the decoding technique called _iterative MAP decoding_. Since there were two CC decoders, each updated the other with its results, which in turn served to refine the other's probability calculations for better detection. Typically, for a BER of 10^-5, four iterations are enough at high SNR but at low SNR up to ten iterations may be needed. Though this incurs decoding delay, it is usually tolerable. Turbo codes have been adopted for wireless cellular standards, WiMAX technology, deep space channels, and Digital Video Broadcasting (DVB).

Hot on the heels of turbo codes came the _Low-Density Parity Check (LDPC)_ codes. D. J. C. MacKay and R. M. Neal presented these codes in 1995. This time around, engineers were open to anything. Turbo codes had shattered an old myth that the Shannon capacity bound could never be reached, that coding theory had matured long ago, and that there was nothing more to be done. Interestingly, LDPC turned out to be a code from the sixties. It was in 1962 that Robert Gallager of MIT first published his LDPC codes. He had been guided by Peter Elias, who had invented convolutional codes the previous decade. LDPC is the perfect example of a code far too advanced for its time. The computational requirement of the LDPC encoder and decoder was far too high and computers of the 1960s simply could not supply it. LDPC was promptly ignored until it was resurrected three decades later.

Gallager's idea was to have a code that simplified decoding while sacrificing code rate. LDPC was a linear block code but with a reduced minimum distance because the parity checks were sparse. Though code rate was sacrificed, it could be regained by increasing the block length. Decoding complexity increased only linearly with block length but BER dropped exponentially. It was thus possible to consider enormous block sizes, as large as ten million bits. The decoding was probabilistic and iterative. Today LDPC codes are in the same league as turbo codes. If turbo codes came within 0.5 dB of capacity, LDPC codes have been shown to come within 0.0045 dB. LDPC has been adopted for the Digital Video Broadcasting (DVB-S2), Gigabit Ethernet, Wi-Fi, and WiMAX standards.

It had taken researchers fifty long years to reach the channel capacity limits predicted by Shannon. What was the road ahead? Were coding theorists finally out of jobs? Daniel Costello and David Forney, in a survey of developments in coding theory, gave a suitable answer,

There will always be a place for discipline-driven research that fills out our understanding. Research motivated by issues of performance versus complexity will always be in fashion, and measures of "complexity" are sure to be redefined by future generations of technology. Coding for nonclassical channels, such as multi-user channels, networks, and channels with memory, are hot areas today that seem likely to remain active for a long time. The world of coding research thus continues to be an expanding universe.

# 0111 For Your Eyes Only

**While channel coding** protected information from noise, noise was not the only problem. For millennia, secrecy had been an important element of communication. Most messages were private and intended only for addressed recipients. They were not meant to be read by others. Information was not just a sequence of symbols. Information was a valuable mix of news, plans, ideas, and plots. If it fell into the hands of competitors or enemies, the results could be disastrous. If noise was problematic for communication, human nature was much worse.

The conspirators of the Babington Plot realized the importance of secrecy. The plot was hatched through the months of 1586 by the 24-year-old Anthony Babington and his Catholic supporters. Queen Elizabeth of England had imprisoned her cousin Mary Queen of Scots on rather slim evidence. Queen Elizabeth was a Protestant and Mary Queen of Scots was a Catholic. The Catholics on the Continent took up Mary's cause. The problem was that by the time they managed to smuggle some letters in to the deposed queen, eighteen years had passed and Mary had little hope left. It was about this time that the Babington Plot took shape. The intent was to free Mary, assassinate Queen Elizabeth, and restore the Catholic faith. If the jailer intercepted their messages, it would be the end of Mary. It was therefore important to encode the messages.

Encoding is a general term that means different things in different contexts. Source encoding is about compressing data. Channel encoding is about adding redundancy to correct errors. For data secrecy, encoding is about making messages unintelligible or cryptic to the enemy. In the context of secrecy, encoding has a more specific term— _encrypting_ or _enciphering_. In any case, the concept of all encoding is similar—to replace a given sequence of symbols with another sequence that achieves the desired goals. In the case of encryption, symbol sequences that make up a language of communication are replaced by others that appear gibberish. Encrypted messages may mean next to nothing should the enemy intercept them. Speaking of sequences implies that encryption need not replace one set of symbols with an entirely different set. It is quite acceptable to use the same symbols in encrypted messages so long as the symbols are rearranged intelligently. The mathematical term for rearrangement is _permutation_. For example, the letters of the English alphabet, A to Z, can be used in the encrypted message, but the vowels are no longer vowels, the consonants have been garbled, and any direct reading of encrypted messages results in alien sounds that no one understands. It was to such encryption that Babington and Mary looked for keeping their communication secret. Indeed, it was quite necessary since the courier who moved these letters was a double agent. He worked for Sir Francis Walsingham, Principal Secretary to Queen Elizabeth. Walsingham desired nothing less than Mary's death. Here was a perfect opportunity to implicate her and send her to the gallows.

All letters that came in and out of Mary's prison were first routed to Thomas Phelippes, a polyglot and master cryptanalyst. No one of course knew this. Worse still, the conspirators wrote explicitly, believing that encryption would always render their messages unintelligible to the enemy. Indeed their messages might have remained safe and secure from prying eyes had they only chosen a strong ciphering mechanism. Unfortunately, what they used was a form of simple substitution called a _nomenclator_. In such a scheme, a symbol was always substituted unvaryingly for another. Some symbols represented common words or phrases that were referenced from a codebook, not unlike the codebooks of the Chappe brothers. To confuse the enemy, some null symbols were sprinkled about in the message. The null symbols added nothing to the message. These various elements of the nomenclator, though they did scramble the message to make it unintelligible, were not sufficient to deter the analytical genius of Phelippes.

Simple substitutions had been used even during the time of Julius Caesar. The most popular of these was the _Caesar Shift Cipher_ , in which each letter was replaced with another three places down the line. A was replaced with D, C with F, T with W, and Z with C. Hence, "cat" would be encrypted as "FDW." The former is called the _plaintext_ and the latter, the _ciphertext_. With such a shift cipher, it was all too easy to try out all possible shifts. There are only twenty-five possible shifts. When one of them yielded a message that made sense, the shift value became obvious. It was now possible to decrypt all messages that used that shift.
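
The cipher, and the brute-force attack on it, fit in a few lines of Python:

```python
# A minimal sketch of the Caesar shift cipher and the exhaustive
# attack against it: only twenty-five shifts need to be tried.

def caesar(text, shift):
    return "".join(
        chr((ord(c) - ord("A") + shift) % 26 + ord("A"))
        for c in text.upper() if c.isalpha()
    )

print(caesar("cat", 3))  # -> FDW, as in the example above

for shift in range(1, 26):               # the attack: try every shift
    print(shift, caesar("FDW", -shift))  # and look for readable text (3 -> CAT)
```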

A powerful extension of the Caesar Cipher was to make the transformation arbitrary. Instead of a simple shift in places, a letter in the plaintext could be arbitrarily mapped to another in the ciphertext. For example, A could be replaced with H, H with Y, Y with D, and D with Q. There is no pattern in this replacement and it is clearly not a shift of places. This makes it enormously difficult for the enemy cryptanalyst. While with shift ciphers the cryptanalyst had to figure out only one shift value, he now has to figure out twenty-six unique mappings. More specifically, while with shift ciphers he had to try out only twenty-five possibilities, he now has to slog through the 26! possible mappings, about 4 x 10^26 or 400 trillion trillion of them. The cipher that Mary and Babington had employed was derived from such a scheme of arbitrary mapping. The mapping was not available to the jailer, Walsingham, Phelippes, or anyone else who might intercept the messages. However, Mary and Babington had somehow secretly agreed on the mapping. They assumed their cipher was secure. There was no way anyone could try 400 trillion trillion possibilities. They were of course right but they didn't know what Phelippes knew.
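
Such an arbitrary mapping is easy to sketch in Python; the key is one permutation drawn at random from the 26! possibilities.

```python
# A minimal sketch of a general monoalphabetic substitution cipher:
# the secret key is an arbitrary permutation of the alphabet.

import random
import string

alphabet = string.ascii_uppercase
key = dict(zip(alphabet, random.sample(alphabet, 26)))  # the secret mapping
inverse = {v: k for k, v in key.items()}                # for decryption

def substitute(text, mapping):
    return "".join(mapping.get(c, c) for c in text.upper())

cipher = substitute("MEET ME AT NOON", key)
print(cipher, "->", substitute(cipher, inverse))
```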

Monoalphabetic Substitution Ciphers

(a) The Caesar Shift Cipher is quite trivial. The regularity and predictability of the substitution makes this obvious. (b) A general monoalphabetic substitution cipher requires a lot more work from a cryptanalyst. (c) A plaintext ciphered by the two methods (a) and (b). The message in this case is "Honey, I am coming home with a million dollars in cash and a dead guy in the back seat."

While Europe had languished during the Dark Ages, the Arabs had made a great deal of progress. The Islamic faith had been established in the seventh century. As with any young faith, there was a great deal of interest and enthusiasm to propagate it. Islam and the Arabic culture soon forged ahead into a golden period that lasted from the eighth to the twelfth centuries. They made great strides in mathematics, astronomy, chemistry, calligraphy, and architecture, among others. It is from the Arabs that algebra was imported into Europe in the twelfth century. It is to the Indo-Arabic system that modern mathematics owes the concept of zero as a number. It was the Arabs who gave early cryptanalysts an important technique of analysis.

Eighth-century grammarian al-Khalīl ibn Ahmad was among the earliest of cryptanalysts but the discovery that helped Phelippes was that of al-Kindi. Al-Kindi showed that it was not necessary to use brute computational force. There was a smarter way to analyse any ciphertext. His method relied on the statistical structure of languages, something Claude Shannon used centuries later in entropy calculations of the English language, something Morse and Vail used in the design of the Morse code. Considering English as an example, the letter E occurs most frequently in a typical text. Among the consonants, the letter T is the most frequent. Digram structures too have statistical properties. The pairs TH and ED are common. When Q is seen, almost always a U follows. Moreover, because of the natural redundancy present in most languages, if one could work out part of the original message from the ciphertext, it would be easy guesswork to fill in the rest.

By doing a frequency analysis on the ciphertext, it was possible to figure out the mappings. If B occurred most frequently, there was a good chance that this was actually an E. If NC occurred frequently, there was a good chance that this represented TH. Frequency analysis worked best when the cryptanalyst had lots of ciphertext to work with. This was the case with Phelippes. He had many intercepted letters to work with. He knew all about frequency analysis and he applied it successfully. Walsingham went further than merely reading private communication. He directed Phelippes to forge letters and encrypt them with the same nomenclator to entice the perpetrators into revealing more details. Thus, for want of a strong cipher, Babington and his accomplices went to the gallows in the autumn of 1586. Mary lost her head the following February.
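
A crude version of al-Kindi's method takes only a few lines of Python. The ciphertext below is a toy example; on a text this short the guess is rough, which is exactly why the method needs plenty of ciphertext, and plenty is what Phelippes had.

```python
# A minimal sketch of frequency analysis: rank the cipher letters by
# frequency and pair them with English letters ranked the same way.

from collections import Counter

ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"  # roughly most to least frequent

def frequency_guess(ciphertext):
    counts = Counter(c for c in ciphertext.upper() if c.isalpha())
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))  # cipher letter -> guessed plain letter

toy = "XLMW MW E WIGVIX QIWWEKI"  # 'THIS IS A SECRET MESSAGE' shifted by four
print(frequency_guess(toy))       # a first, crude guess, to be refined by hand
```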

When things don't work, it is time for evolution. When biological viruses fail against effective vaccines, they evolve into something more potent. When computer viruses are captured and neutralized, hackers design more sophisticated ones that can infiltrate under the radar of antivirus programs. Likewise, cryptography has always been a tussle between code makers and codebreakers. With each generation of ciphers, cryptanalysts came up with better methods to crack the ciphers. Cipher designers had to innovate continuously to ensure they were always a step ahead. While simple substitution of the kind in Mary and Babington's nomenclator might have been sufficient before the ninth century, after al-Kindi and his frequency analysis it was hopelessly vulnerable. It was like locking the vaults of Fort Knox with a five-dollar aluminium padlock.

Around the time of Mary's execution, the Frenchman Blaise de Vigenère published a treatise that gave a new direction to cryptography. All earlier ciphers came under the general category of _monoalphabetic substitution_. In other words, once the mapping was agreed upon, each letter of the plaintext was always changed to the same letter in the ciphertext. This one-to-one substitution made it vulnerable to frequency analysis. Vigenère saw that to break the statistical structure, a letter must be substituted by a different letter each time it occurred in the plaintext. Moreover, the substitutions were not to be predetermined and fixed. They must depend on the position of each letter in the message. This came under the category of _polyalphabetic substitution_. In this method, "meet me at noon" might be substituted as "ABRFOPUEWEFT." It is common in ciphertext to leave out spaces and punctuation. The idea is to give the cryptanalyst as few clues as possible to sentence construction and word boundaries. There are three occurrences of E in this example but each one is substituted differently. Moreover, there are two occurrences of E in the ciphertext but each one represents a different plaintext letter. The Vigenère cipher was a many-to-many substitution.
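
A minimal Python sketch of the scheme follows. The keyword "LYRE" matches the disc illustration further below, and the convention used (plain letter plus key letter, modulo 26) is one common variant.

```python
# A minimal sketch of the Vigenere cipher: each letter is shifted by
# the keyword letter at that position, so the same plaintext letter
# encrypts differently at different positions.

def vigenere(text, keyword, decrypt=False):
    letters = [c for c in text.upper() if c.isalpha()]
    out = []
    for i, c in enumerate(letters):
        shift = ord(keyword.upper()[i % len(keyword)]) - ord("A")
        if decrypt:
            shift = -shift
        out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)

cipher = vigenere("meet me at noon", "LYRE")
print(cipher)                                  # -> XCVXXCRXYMFR
print(vigenere(cipher, "LYRE", decrypt=True))  # -> MEETMEATNOON
```

With this convention a plaintext "e" becomes P, C, V, or I depending on which keyword letter lines up with it, just as the disc illustration shows.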

Though polyalphabetic substitution can be traced to the fifteenth century, it was Vigenère's publication that popularized it. Frequency analysis was powerless since many-to-many substitutions smoothed out the frequency of letters in the ciphertext. In time, the Vigenère cipher acquired a reputation of being undecipherable. Indeed, for nearly three centuries no one managed to crack it open until the English scientist and engineer Charles Babbage was challenged one day to give it a go. Only the previous decade, Babbage had achieved fame for ambitiously attempting to automate mathematical calculations using mechanical apparatus. In short, he had attempted to build the world's first computer (Chapter 8). Now in 1854, he turned his attention towards breaking the Vigenère cipher.

The Vigenère Cipher

This is one possible representation of the cipher as concentric discs. The inner disc is for the plaintext, the outer for the ciphertext. In this example, the keyword is "LYRE." The outer disc is rotated through these four positions and in each position the equivalent ciphertext letter is read off. For example, "e" in plaintext can be one of P, C, V, or I.

Ciphers had always fascinated Babbage even from a young age. Deciphering in particular was almost an art. If there were any scientific principles, they were probably of little use. The necessary assumption was that code makers too knew these techniques and had probably taken care to design better ciphers impervious to standard attacks. Deciphering was therefore always an exploration into uncharted territories. Vigenère had published the method of encrypting plaintext and the method hinged on the selection of a secret keyword. The keyword exclusively determined the transformation from plaintext to ciphertext.

Babbage analysed a typical ciphertext and saw that some combinations of letters repeated themselves. Though substitutions were not fixed, the ciphertext seemed to betray repetitive sequences. These came directly from the use of the keyword. If the keyword was, say, five letters long, then any word would be ciphered in one of five different ways depending on where it occurred with respect to the keyword. From such an approach, Babbage first worked out the length of the keyword. Once this was achieved, it was trivial to figure out the letters of the keyword. This was because each letter of the keyword pointed to a monoalphabetic cipher within the Vigenère cipher. For example, if the keyword was five letters long, every fifth letter of the ciphertext could be put through frequency analysis. If the parties had chosen an easily memorable keyword, the task was made easier for the cryptanalyst. By working out a few letters of the keyword, he could complete the rest without further analysis.
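
The heart of this attack can be sketched in a few lines of Python: find repeated fragments in the ciphertext and note the distances between them, which tend to be multiples of the keyword length. The toy ciphertext below is "TO BE OR NOT TO BE" enciphered with the keyword "KEY".

```python
# A minimal sketch of the Babbage/Kasiski idea: repeated plaintext
# enciphered at the same keyword position yields repeated ciphertext,
# so distances between repeats are multiples of the keyword length.

from math import gcd
from functools import reduce

def repeat_distances(ciphertext, length=3):
    seen, distances = {}, []
    for i in range(len(ciphertext) - length + 1):
        frag = ciphertext[i:i + length]
        if frag in seen:
            distances.append(i - seen[frag])
        seen[frag] = i
    return distances

toy = "DSZOSPXSRDSZO"  # 'TOBEORNOTTOBE' under the keyword 'KEY'
dists = repeat_distances(toy)
print(dists, "->", reduce(gcd, dists))  # [9, 9] -> 9; keyword length 3 divides 9
```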

For some reason, Babbage did not make his methods public. It is supposed that he was prevented from doing so. It was the time of the Crimean War. If the Russians used the Vigenère cipher, then thanks to Babbage the British now knew how to read the ciphered messages that passed along telegraph lines. With the coming of telegraphy, spying had become a lot easier. One didn't need to go in hot pursuit of horseback messengers or shoot down pigeons. One simply needed to tap into telegraph lines. The problem with sending ciphered messages using Morse code was that communication was a lot slower. Morse code had been designed optimally for speed of transmission so that more frequent letters used shorter sequences of dots and dashes. This optimality was broken when the Vigenère cipher was applied to messages. It really didn't matter. Secrecy was more important than speed. If glory had bypassed Babbage in his own time, it fell on another. Nine years after Babbage, the German Friedrich Kasiski cracked the Vigenère cipher using similar techniques and published his method.

The same year that Babbage broke the Vigenère cipher, Charles Wheatstone, of electrical telegraph fame, invented a new form of cipher. This later came to be called the _Playfair Cipher_ since it was Lord Playfair who promoted it. The need for secrecy in telegraphy combined with the need to save on cost of transmission saw the emergence of many codebooks for both compression and cryptic encoding of messages. Indeed, cryptography, which had largely been reserved for military purposes and royal machinations, came into popular use with the growth of telegraphy. Edgar Allan Poe and Jules Verne introduced ciphers into their fictional writings and kindled interest among the public. Even the great detective Sherlock Holmes had to get down to studying cryptic messages in order to solve a case. It was in this context that Wheatstone proposed his own method of ciphering.

In Wheatstone's method, the letters of the alphabet were randomly arranged in a 5 x 5 grid, with the letters I and J sharing the same space. Ciphering was done by taking a pair of letters from the plaintext, looking them up in the grid, and substituting another pair based on well-defined rules. These rules depended on the rows and columns of the plaintext letter pairs. Despite its apparent sophistication, it was no more secure than a monoalphabetic cipher. It was a polyalphabetic cipher alright, but because substitutions were made on letter pairs, when the cryptanalyst considered digrams in the ciphertext the cipher was reduced to a monoalphabetic one. The cryptanalyst could make out repetitions that pointed to the more frequent digrams occurring in English—th, he, an, in, er, re, es. The Playfair cipher never attained great popularity though it was supposedly used during the Boer War. Clearly, those who adopted it did not realize its weakness.

Amidst these developments in the nineteenth century, there was one cipher that defied analysis for two centuries. This was the cipher invented by Antoine and Bonaventure Rossignol in the court of King Louis XIV of France. Most letters of the king were encrypted using the Rossignols' method, which came to be known as the _Great Cipher_. The letters perhaps contained the real truth behind the identity of the mysterious Man in the Iron Mask, supposedly a twin brother of the king. When in 1890 a new series of letters was discovered, the task of deciphering them fell upon Étienne Bazeries, then working in the French Army's Cryptographic Department. Bazeries toiled for three years before meeting with success. He figured out that it was a monoalphabetic cipher but instead of operating at the level of letters of the French alphabet, it operated on syllables using letter pairs. The reason the Great Cipher had remained secure for so long was that its mapping of syllables to letter pairs was a novel scheme unknown to cryptanalysts. The deciphered letters revealed that the Man in the Iron Mask was in fact Vivien de Bulonde, an army commander who had abandoned his post and fled the battlefield, thereby compromising the entire campaign at the French-Italian border.

Though new ciphering methods came and went, none were quite as good as the Vigenère cipher. Unfortunately, the best cipher in history was no longer secure following the work of Babbage and Kasiski. It seemed that code makers were at a loss. Codebreakers had managed to gain an upper hand. There was a dire need for new principles if cryptography was to go any further. For the moment, it appeared that every cipher had a weakness unknown to its maker and could be exploited if a codebreaker persevered. Edgar Allan Poe, who himself had solved many simple substitution ciphers, commented,

Few persons can be made to believe that it is not quite an easy thing to invent a method of secret writing that shall baffle investigation. Yet it may be roundly asserted that human ingenuity cannot concoct a cipher which human ingenuity cannot resolve.



**If code makers** were to beat their counterparts at their own game, they needed a more formal approach. One tacit principle of cipher design was that it was more important to guard the secret key than guard the method of ciphering. This was explicitly expressed by Dutchman Auguste Kerckhoffs when he published _La Cryptographie Militaire_ in 1883. Kerckhoffs's ideas are still valid today. Even if the enemy captures ciphering equipment or methods of encryption, it should not be easy to derive the plaintext from the ciphertext without the shared secret key. More importantly, one can even make the ciphering methods public, thereby giving cryptanalysts a chance at locating vulnerabilities early on.

A relevant example comes from Chappe telegraphy. Intermediate relay stations could not interpret messages since they did not possess the codebook. If by chance any of them got hold of the codebook, messages became plain and secrecy was compromised. There was nothing to do but change the codebook, truly a costly affair given the vast network of Chappe telegraphy. Chappe telegraphy used only a codebook. There was no separate secret key. Kerckhoffs's principle proposes a more practical alternative. It is far better to make public the method of encryption, which could be initialized in numerous ways depending on the secret key. Today the principle is embodied in a single statement—security by obscurity just doesn't work. The use of such keys had become commonplace by the start of the Great War of 1914-1918.

It may be said that war serves to goad nations into action. In the humiliation that follows every defeat, the vanquished take great pains to prepare for the next war. It will therefore not surprise us that the French were among the world's best cryptanalysts as Europe entered the Great War. Their heavy defeat in the Franco-Prussian War of 1870-1871 had left them feeling vulnerable in an unfriendly neighbourhood. They wanted to know what their neighbours were up to. By the time of the Great War, there was no problem getting ciphertext. In fact, with the coming of wireless communications, even good old wire-tapping had become unnecessary at times. Enemy communication was readily available and easily intercepted. If anything, there was just too much encrypted data to analyse. It is estimated that the French intercepted as many as a hundred million words during the course of the Great War.

The Germans used a form of monoalphabetic substitution called the _ADFGVX Cipher_. These letters were chosen since in Morse code they are hard to confuse with one another, resulting in as little miskeying as humanly possible. Though it was only a monoalphabetic scheme, the Germans put the ciphertext through a second stage that reordered the letters. The reordering or permutation was based on a secret key shared between the sender and the receiver. The reordering also involved a simple interleaver not unlike those that coding theorists later employed to protect bits against burst errors on the channel. Despite all such sophistication, the fact remained that a single letter in plaintext was always replaced with the same pair of letters in the ciphertext. Reordering did make things difficult for the cryptanalyst but it was not impossible. Indeed, it was Georges Painvin who in June 1918 finally cracked the ADFGVX cipher, and this directly led to the German retreat before they could reach Paris.

A more spectacular example of cryptanalysis during this war had happened a year earlier. The British had intercepted a high-level telegram from German Foreign Minister Arthur Zimmermann to the German ambassador in Washington. The proposal was to assist Mexico in invading American territories. While the Americans were thus kept busy in their own backyard, the Germans would pursue unbridled U-boat attacks on all Allied naval traffic, thus forcing the Allies to surrender before the Americans could come to their assistance. The ciphering was not trivial but when the cryptanalysts managed to uncover the message, it changed the course of the war. America, which had thus far remained neutral, entered the war in support of the Allies.

The Zimmermann Telegram

Decryption of this telegram played a key role in persuading the US to participate in the Great War. Source: The US National Archives, Wikimedia Creative Commons.

Right up to the end of the Great War, the best ciphers were based on nineteenth-century methods, which had been proven vulnerable in one way or another. While code makers of the past knew intuitively that the best way to cipher a message was to make complex substitutions and permutations, they could not hide the natural redundancies and patterns present in languages. In fact, most ciphers were based on substitutions and permutations at the level of language alphabets. This was also the time when teleprinters were becoming common for rapid communication. Teleprinters used the five-bit Baudot Code or variations of it. So for the first time it was possible to conceive of a cipher that worked at the level of bits. Bits were the fundamental units of data representation and any ciphering method that operated at this level provided greater flexibility. Such a cipher would be perfect for securing teletypewriter data.

From such a direct need arose the Vernam cipher of 1918. Its inventor, Gilbert Vernam of AT&T, spoke of the closing and opening of electrical circuits by means of electromechanical relays. The plaintext would be represented as a sequence of electrical pulses. This would be modified by a separate random sequence of electrical pulses. Such a random sequence would be the secret key shared between the sender and the receiver. At the decrypting end, the same random sequence would transform the ciphertext cleanly back to the original plaintext. By such a construction, Vernam established the principle of bit operations using electromechanical relays, though the term bit itself entered the vocabulary only years later, through Claude Shannon's work. Historically, the Vernam cipher is the first digital cipher, operating on just ones and zeros. The essence of his invention, perhaps the greatest in the history of cryptography, was distilled in a single statement that appeared in his US patent application,

For ciphering and deciphering the message the ciphering devices at the opposite ends of the line are provided with identical sections of tape upon which are recorded a series of code signals which are preferably selected at random but if desired may themselves represent a predetermined series of letters or words.

Many key features of modern cryptography are apparent in this statement. Both ends have the secret key. The secret key must be random. Vernam also acknowledges that truly random secret keys are not always practical. In such cases, a compromise must be made. The design of the Vernam cipher comes down to understanding randomness. What does it mean to be random? Can machines generate random sequences? How do humans recognize random events?
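
In modern terms, the Vernam cipher is a bitwise XOR of the plaintext with the key, and XORing again with the same key undoes the encryption. A minimal Python sketch, with the operating system's random source standing in for Vernam's random tape:

```python
# A minimal sketch of the Vernam cipher in modern terms: XOR the
# plaintext bytes with an equally long random key. XORing the
# ciphertext with the same key recovers the plaintext exactly.

import os

def vernam(data: bytes, key: bytes) -> bytes:
    assert len(key) >= len(data), "key must be at least as long as the message"
    return bytes(d ^ k for d, k in zip(data, key))

message = b"MEET ME AT NOON"
key = os.urandom(len(message))  # a fresh random key
cipher = vernam(message, key)
print(vernam(cipher, key))      # -> b'MEET ME AT NOON'
```

Used once and then destroyed, such a key turns this into the one-time pad described shortly.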

The human brain is built to recognize patterns. When things happen around us, it is natural for us to get into causal analysis in order to explain and justify. Anything in the environment that defies human understanding makes us uncomfortable. Even when we are faced with truly random events, we strive to sieve out patterns that may not be any more than a coincidence. This may have something to do with the fact that few everyday events are truly random. So our cognitive abilities have evolved likewise to reject randomness and accept predictability. Predictability makes us feel secure and gives a notion of control. What randomness means is that the occurrence of an event has absolutely no relationship with the past, the future, or even conditions of the present.

Take an unbiased coin for instance. Which sequence of outcomes is more random: TTTHTHTTHTHTHTTHHTHTHTTHTHHT or HHTTHTTTTTTHHHTTTTTTHHTTTHTH? Most people will claim that the first sequence is the more random of the two. The second sequence contains long runs of successive tails. But if the coin is truly unbiased, both sequences are equally likely. When Apple introduced the "shuffle" feature on its iPod, customers complained that songs were not played in random fashion. Customers came to this conclusion because they heard some songs more often than others. The creators of the iPod responded by making the feature less random so that users perceived it as being more random. While this points to a failure in humans to recognize randomness, we should also question the workings of the iPod.

Using deterministic processes, it is not possible to generate true randomness. Software alone cannot generate true random sequences. At best, sequences can approach random behaviour. Engineers term them _pseudorandom sequences_. So it is possible for an iPod to select songs pseudorandomly but it can never achieve true randomness through software alone. True random sequences can be achieved by making use of special physical processes that are by nature random. Radioactive decay in a given time window is a random process. Such decay can be measured and processed to yield a binary random sequence.
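
The distinction is easy to see in Python: a seeded pseudorandom generator is a deterministic algorithm that replays the same "random" sequence every time, while the operating system's entropy pool draws on physical unpredictability.

```python
# A minimal sketch of pseudorandom versus (closer to) true randomness.

import os
import random

a = random.Random(42)  # two generators seeded identically...
b = random.Random(42)
print([a.randint(0, 1) for _ in range(8)])
print([b.randint(0, 1) for _ in range(8)])  # ...produce identical bits

print(os.urandom(4).hex())  # draws on system entropy; differs every run
```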

Vernam had said that for practical reasons both ends should agree on a key beforehand. He knew that if the key were to be truly random there would be a problem of distribution. The only way to share such a key would be for one party to generate the key, then both parties to meet secretly, and exchange it. If this was not possible, as is often the case in wartimes, the alternative was to adopt a common method to generate a pseudorandom key sequence. The method must be kept out of the enemy's reach. In Vernam's time, generating even a pseudorandom key sequence was not trivial. Such sequences have a periodic behaviour whereby the sequence replays exactly after a certain time. This means that randomness is compromised over many periods, a fact exploited by cryptanalysts. From here was born the idea of the world's most secure ciphering mechanism.

The _One Time Pad (OTP)_ is really a Vernam cipher in which the key is infinitely long. In practical terms, when the sender and receiver share a random key, the key is used only once to encrypt the plaintext. Once used, it is immediately discarded. OTP swears by the common adage that since we live only once, we should live life to the full. OTP is a perfect cipher. It can never be broken, now or in the future. In the 1940s, Shannon saw the perfection of the OTP. He explained it using concepts that later became fundamental to information theory. He saw that a random source means that there is no redundancy in the source. It is not possible to compress the sequence any further than what the source emits. The sequence in fact looks just like noise. There are no patterns. There is no predictability. Every symbol coming out of the source is a surprise. Without patterns and predictability, code makers had finally defeated the cryptanalysts. They had grasped the Holy Grail of cryptography. Well, almost.

Early cryptographers had visualized their methods in terms of tables or diagrams. The Vigenère cipher could be implemented either as a grid of 26 x 26 letters or as concentric rotating discs. The Playfair cipher was certainly designed as a grid of 5 x 5 letters. Today these ciphers can be expressed mathematically but their creators probably did not view them so. In fact, while code makers had not used mathematics, codebreakers had done so and had achieved greater success. Vernam was the first to look at encryption in terms of bit encoding and bit operations, though he described these in terms of electromechanical relay operations rather than in the language of mathematics. It was therefore an historic departure from the norm when Lester Hill published in 1929 a new cipher that was based on modular arithmetic and matrix algebra. By doing so, Hill heralded a new way of looking at ciphers that could lend themselves to analysis at the design stage. Although his sample cipher contained twenty-six symbols, he pointed out that if a cipher could be designed within finite algebraic fields, then the power of Galois algebra could be harnessed towards constructing stronger ciphers.

The Hill cipher embodied two of the most important properties of good ciphers. The first was that a single plaintext letter should affect many letters in the ciphertext at the same time. In other words, should we change a single letter in the plaintext, the equivalent ciphertext would look quite different over many letters. The second was that a single ciphertext letter should be formed from many letters of the plaintext. This clearly meant that it was no longer possible to match a ciphertext letter to a unique plaintext letter. The shared secret key was no longer a simple memorable word. It was the transformation matrix, from which the receiver could derive the inverse matrix easily. The Vigenère cipher encrypted plaintext letter by letter. It came under the category of _stream ciphers_. The Hill cipher encrypted a block of letters at a time and hence came under the category of _block ciphers_. Hill recommended a block size of 8 x 8 = 64 letters, although he noted that even something as small as 5 x 5 = 25 should be highly secure from nosy cryptanalysts.
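
A minimal Python sketch of the Hill idea on two-letter blocks follows. The key matrix is a standard small textbook example rather than Hill's own, and the sketch assumes the message has an even number of letters.

```python
# A minimal sketch of the Hill cipher: encryption is multiplication
# by the key matrix modulo 26, block by block. The key must be
# invertible modulo 26 so that decryption is possible.

KEY = [[3, 3], [2, 5]]         # determinant 9; gcd(9, 26) = 1, so invertible
KEY_INV = [[15, 17], [20, 9]]  # the inverse of KEY modulo 26

def hill(text, key):
    nums = [ord(c) - ord("A") for c in text.upper() if c.isalpha()]
    out = []  # assumes an even number of letters (pad if necessary)
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append(chr((key[0][0] * x + key[0][1] * y) % 26 + ord("A")))
        out.append(chr((key[1][0] * x + key[1][1] * y) % 26 + ord("A")))
    return "".join(out)

cipher = hill("HELP", KEY)
print(cipher, "->", hill(cipher, KEY_INV))  # HIAT -> HELP
```

Note how each plaintext letter influences both ciphertext letters of its block, and each ciphertext letter depends on both plaintext letters: exactly the two properties described above.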

This was the state of the art when the Second World War began but neither the Allies nor the Germans adopted the Hill cipher. Once more, as is often seen in the history of technology, here was a method far too advanced for its time. Programmable computers did not exist and the first of these arrived only in 1943 when the British built Colossus, a machine of 1,500 vacuum tubes designed to crack the German Lorenz cipher. The Hill cipher required a series of multiplications and additions in modular arithmetic. The Germans adopted simpler methods that were refinements of nineteenth-century ciphering principles. The Lorenz cipher in particular incorporated elements of the Vernam cipher.

Though the Lorenz cipher was rather secure, the cryptanalyst had one thing in his favour—the old and ever-present difference between science and engineering. Vernam had laid down the principles for perfect ciphering but German engineers made compromises for practical reasons. They built their machines to generate pseudorandom key sequences since truly random sequences were not achievable. This introduced the first loophole, which was subsequently left wide open by lazy, unsuspecting German operational personnel. Sometimes when transmission failed, operators reset their machines to an earlier key and retransmitted the messages. In such a retransmission, they made small changes to the message for brevity or unknowingly introduced errors. The real mistake was in the reuse of earlier keys. This was enough for British cryptanalysts poring over scores of ciphertext. Further statistical analysis of ciphertext revealed the non-random nature of the Lorenz cipher machine. In the process, British cryptanalysts invented the _double-delta_ method of analysis that looked at binary differences between letters in the ciphertext.

Had the Lorenz cipher used more random keys and had the operators never reused the keys, it might not have been broken. The same can be said of the T-52 Geheimschreiber cipher, which was more secure than the Lorenz cipher. The Germans used it mostly on landlines for securing teleprinter communication. Each Geheimschreiber machine weighed a hundred kilos and the entire process was mechanized so that operators never saw the ciphertext. As many as ten interconnected wheels marked with ones and zeros gave an output of ten bits. Five of these transformed each plaintext block of five bits. The other five bits reordered the result. Secure as it was by design, operators sometimes reused initial settings. The manner in which the wheels turned resulted in weakly random bit sequences. The most famous of the war's ciphers was the Enigma. It was used extensively by the Germans. While the Germans made small continuous improvements to Enigma machines, the Allies continued to steadily decipher their messages. Again, it was improper usage involving repetition and reuse that made the Enigma vulnerable. If it was unclear at times who had triumphed and who had failed, it was quite clear that during the Second World War both code makers and codebreakers had mechanized their work. German machines mechanized ciphering and deciphering. The Colossus mechanized cryptanalysis.

German Ciphering Machines

(a) The Geheimschreiber machine, also known as _Sturgeon_. Source: NSA, Wikimedia Creative Commons. (b) The Enigma machine showing the plugboard settings in the front. The front keyboard is used to type plaintext. Lamps at the back light up to indicate the equivalent ciphertext. Source: Swiss Transport Museum, Lucerne, Wikimedia Creative Commons.

The Allies themselves used truly random keys for encrypting the speech samples of SIGSALY. They used noise generators based on vacuum tubes to determine the random keys. These were then stored on phonograph records shared between the two ends of the communication channel. The playback of these records had to be precise to ensure that the two ends did not go out of synchronization. More importantly, the keys were never used more than once. SIGSALY indeed was perfectly secure since it applied OTP the right way.

Though OTP was indeed perfect, its implementation was somewhat impractical. Its use was limited to extreme requirements of the military. In the sixties during the Cold War era, Russian and East German secret agents extensively used OTP. They carried small booklets hidden in secret compartments of their luggage. These were printed with long sequences of digits or letters. Once a key was used for communication, it was destroyed. If they ran out of keys, they had to smuggle in fresh booklets. If booklets were discovered, the keys became useless. Worse still, if a stray copy of a booklet fell into enemy hands all communication became transparent. Cipher designers therefore did not and could not stop with OTP. The world could very well do with less than perfect ciphers that were more practical.

In the post-war period, much of the work done by cryptanalysts remained classified until the seventies. The Colossus itself was destroyed immediately after the war. Meanwhile, mathematicians including Claude Shannon were allowed to publish on cryptography without specific references to the work done during the war. Details of the SIGSALY encryption systems did not appear in the public domain until decades later. Shannon's classic sixty-page paper on cryptography appeared in 1949. Riding on the success of his information theory of the previous year, he laid down the foundations of cryptography in his new mathematical language. He introduced two key terms that defined the essentials of any good cipher.

_Diffusion_ , he said, was the process of spreading out redundancy in a message. The statistical structure was preserved but diffusion hid it over many symbols. This meant that the enemy had to capture loads of ciphertext to figure out the message statistics. _Confusion_ , he said, was the process of complicating the relationship between the secret key and the ciphertext. This made it difficult to reverse engineer the secret key. The enemy may intercept the ciphertext easily but the secret key would remain secret. Shannon's ideas expressed the design goals of all modern ciphers. Diffusion is permutation of plaintext symbols and is usually implemented as _P-boxes_. Confusion is substitution of plaintext symbols and is usually implemented as _S-boxes_. The world was now ready to take the next big leap.

Principles of Diffusion and Confusion

A combined use of multiple stages of P-boxes and S-boxes implements both diffusion and confusion.
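For readers who want to see these ideas in code, here is a minimal sketch of one substitution-permutation round in Python. The S-box and P-box below are arbitrary permutations invented for illustration; they belong to no standardized cipher.

```python
# One toy substitution-permutation round on an 8-bit block.
SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
        0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]   # 4-bit substitution (confusion)
PBOX = [1, 5, 2, 0, 3, 7, 4, 6]                   # bit permutation (diffusion)

def sp_round(byte, round_key):
    byte ^= round_key                                   # mix in the key
    byte = (SBOX[byte >> 4] << 4) | SBOX[byte & 0xF]    # substitute each half
    out = 0
    for i, j in enumerate(PBOX):                        # permute the bits
        out |= ((byte >> j) & 1) << i
    return out

state = 0b10110010
for key in (0x3A, 0xC5, 0x7E):                          # toy round keys
    state = sp_round(state, key)
print(format(state, '08b'))
```

A single round is weak, but stringing several rounds together spreads every input bit's influence across the whole block.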



**In the two** decades that followed the Second World War, cryptography did not undergo a revolution. Cryptographic machines and methods of the war were merely refined and improved. OTP cemented its place as the ultimate ciphering method. Otherwise, there was no groundbreaking research in cryptography. Applications of cryptography were limited. Only governments and large private firms used it. The public had no real need for ciphering machines. Through the fifties and sixties, the research community focused on voice communications, channel coding, and modulation. Data communications was limited to space applications. The use of teleprinters and facsimile remained within particular industries.

It was only towards the end of the sixties that researchers took interest in machine-to-machine communications involving data bits. By then, computing machines were becoming a necessary part of many industries. It was also the time when the precursor to the modern Internet was being conceived. When _Automated Teller Machines (ATMs)_ arrived, they made it possible for customers to withdraw cash at places far removed from any service branch. ATMs communicated with the customer's branch, where the identity of the customer and the validity of transactions were verified. This communication happened over leased lines. Digital modems did the translations between bits that machines understood and signals that these lines carried. This was definitely progress but it came with a real danger. If someone tapped this line and obtained all the relevant information, he could potentially empty another's account.

IBM, then a recognized high-tech company and a supplier of some of the world's most advanced computers, looked to solve the problem of data security. IBM's solution used a mix of S-boxes and P-boxes, concepts that had been derived from Shannon's fundamental principles of confusion and diffusion. The idea was to use these boxes in every round of encryption, a _round_ being the basic building block of the encryption method. Many such rounds were strung together in succession. Each round in itself was not all that secure but many rounds in combination made a formidable defence against any serious cryptanalyst. This was not the only departure from older encryption methods.

IBM's proposal was a block cipher but rather than applying transformations to all bits in like manner, bits were first partitioned into two sets. Bits in each set underwent different transformations and were then mixed with the other set. The sets were then swapped before entering the next round. This was something like a baker working on a piece of dough. He stretches the dough. He then adds garnishing to one half of it and kneads it. He then folds the two halves and kneads the mixture once more. He then commences another round of the entire cycle. For the new ciphering method from IBM, stretching of the dough was like partitioning the bits into two sets. The garnishing was the secret key that was sprinkled across the bits through a combination of P-boxes and S-boxes. The folding and kneading was like the mixing and swapping of bits of the two halves.
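The structure is simple enough to sketch in Python. The round function below is a stand-in invented for illustration; the strength of a real Feistel cipher such as DES lies in its carefully designed round function and key schedule.

```python
# A minimal Feistel network on a 32-bit block, split into 16-bit halves.
def F(half, round_key):
    """A toy mixing function; any function works in a Feistel design."""
    return (half * 31 + round_key) & 0xFFFF

def encrypt(block, keys):
    left, right = block >> 16, block & 0xFFFF
    for k in keys:
        left, right = right, left ^ F(right, k)   # mix one half, then swap
    return (left << 16) | right

def decrypt(block, keys):
    left, right = block >> 16, block & 0xFFFF
    for k in reversed(keys):                      # same F, rounds in reverse
        left, right = right ^ F(left, k), left
    return (left << 16) | right

round_keys = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]
c = encrypt(0xDEADBEEF, round_keys)
assert decrypt(c, round_keys) == 0xDEADBEEF
```

The beauty of the design is that decryption reuses the very same round function, merely applying the round keys in reverse; F itself never needs to be inverted.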

It is also interesting to note that the baker analogy has some resemblance to the _Baker's Map_ from _Chaos Theory_, although in such a map the two halves are kept separate within each iteration. Chaotic systems are non-linear systems in which small deviations in initial conditions can result in huge changes in system behaviour over a period of time. The problem is that we can't accurately measure the initial conditions of real-world systems. Small uncertainties in initial conditions lead to much bigger uncertainties in the future. Weather systems are chaotic systems and therefore accurate weather prediction has been for decades a challenging problem. Chaotic systems, as the name suggests, certainly look random in behaviour but they can be modelled using deterministic equations. An ensemble of many such models can statistically give us better predictions of chaotic system behaviour.

One such model is the Baker's Map. Suppose this model is applied to an image. With each iteration, the image undergoes a transformation, so that within a few iterations the image has no resemblance to the original. The IBM cipher was built on similar principles, transformations that lead to increasing uncertainty, transformations that are deterministic and even known to the eavesdropper. The eavesdropper, however, has no knowledge of the key. Any two keys, even if only slightly different, can lead to completely different results. The keys represent the initial conditions that define the time evolution of chaotic systems. Despite these similarities, direct application of chaos theory to image cryptography did not occur till the late nineties. Though chaos theory had its beginnings in the sixties, serious study of the Baker's Map did not happen until the next decade. So it is unlikely that IBM researchers knowingly applied chaos theory to their novel designs.

Baker's Map Applied to an Image

(a) Baker's Map stretches, slices, and stacks equal parts. (b) Original image of Peppers. Copyright ownership is unknown. (c) Five iterations of Baker's Map render the image unintelligible. The image appears like noise. This has relevance to cryptography to protect private data.
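Readers who like to experiment can scramble an image themselves. Below is one common bijective discretization of the Baker's Map, sketched in Python on a tiny 8 x 8 grid standing in for real pixel data; other discretizations exist.

```python
# A discretized Baker's Map on an N x N grid (N must be even).
def baker(img):
    n = len(img)
    out = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            if x < n // 2:                        # left half: stretch into bottom rows
                nx, ny = 2 * x + (y % 2), y // 2
            else:                                 # right half: stretch into top rows
                nx, ny = 2 * x - n + (y % 2), n // 2 + y // 2
            out[ny][nx] = img[y][x]
    return out

image = [[y for x in range(8)] for y in range(8)]  # eight horizontal bands
for _ in range(5):                                 # five iterations, as in the figure
    image = baker(image)
```

After five iterations the neat horizontal bands are sliced and interleaved beyond easy recognition, though every step remains perfectly reversible.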

The key cryptographer at IBM was Horst Feistel, a German who had migrated to the US in the thirties. During the years of the Second World War, most Germans in the US were under suspicion and Feistel was not spared. His interest in cryptography only made things more difficult for him. Even after the war, the National Security Agency (NSA) hounded him whenever he indulged in cryptographic research. Meanwhile in 1961, IBM inaugurated the Thomas J. Watson Research Center in Yorktown Heights, New York. Research had always been a key driving factor for high-tech companies and IBM was not new to research. Back in 1945, IBM had sponsored pure research with the opening of the Watson Scientific Computing Laboratory at Columbia University. Over the years, its product development had relied a lot on research. With the coming of the Thomas J. Watson Research Center, research became important in its own right besides being an enabler for long-term strategy and product roadmaps. It was under these circumstances that Feistel joined the Thomas J. Watson Research Center. By the early seventies, he had invented new methods of encryption, which have ever since been categorized as _Feistel Ciphers_. Was NSA happy about this?

Yes and no. By the late seventies, the NSA no longer had a grudge against Feistel. He was working for an American company and his research was meant for commercial purposes that would benefit Americans. The National Bureau of Standards (NBS), responding to the private sector's need for data encryption, had initiated the standardization of an encryption method in 1972. IBM's proposal was accepted and it came to be called the _Data Encryption Standard (DES)_. The problem for the NSA was that DES was simply too good. It was so secure that the NSA could not crack it. Security to the NSA meant that data was protected from the prying eyes of everyone except the NSA. The public of course had a different view.

NSA claimed the right to listen in on private conversations, a claim justified under reasons of national security. A strong cipher meant that if enemies adopted it, the NSA would lose its ability to read intercepted messages. Intelligence was important for more than military reasons. Economic reasons mattered just as much since intercepting international messages, whether governmental or corporate, would give critical information on "negotiations against foreign competitors, changes in the prime interest rate, crop forecasts, the availability of critical materials, and developments in advanced technologies." This was the decade of the great privacy debate, a debate that continues today in the face of new age terrorism. How much security screening and body frisking should we allow at airports? Should we insist that Facebook implement stricter privacy policies to protect user data? Should we allow governments to track user movements by legally tapping into cellular networks? If safety is the concern, how much video surveillance should we permit in public toilets, if any at all? Are our data and our lives truly private or are we already living in an Orwellian world?

In George Orwell's _1984_, Big Brother could see, hear, and know everything. Today's world is much more sophisticated. Data collection, analysis, information extraction, and more, are done by tens of thousands of computers all across the globe. There is no single entity to shut down, no single head to cut off, and no single Cyclopean eye to blind. The greater worry amongst the public is that data so collected is not simply reduced to statistics. Individual records are tracked and private lives are tapped. At times, identities are stolen. Part of the problem is that the word privacy itself is greatly misunderstood. It has the connotation of things wrong and illegal. Privacy really is more about separating parts of information that are confidential from those that aren't. The problem occurs when confidential information is shared without the user's consent. Sharing non-confidential information poses little risk and may lead to economic advantages—better marketing campaigns, better products, and lower prices.

The dilemma is that not everyone agrees on confidentiality. In some situations, a person's birthday is confidential and primitive systems use it as a means of authentication. In most other situations, birthdays are non-confidential and are freely shared. People share lots of personal stuff on Facebook, but rarely do they put their credit card information or medical reports on the Web. This view of privacy places the responsibility of sharing information on persons who own that information. But personal stuff available on Facebook can be seen as private when it falls into wrong hands and used in more ways than one had imagined. Most things private relate to family and relationships. It is said that when the Indian sage Vatsyayana wrote his classic text on love, the _Kama Sutra_ , in the fourth century, some of the text dealt with the subject of ciphering love letters. Privacy is certainly not a new thing but its interpretation is not exactly in black and white. One of the popular definitions of privacy sees privacy in terms of information as well as relationships,

Privacy is the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others. Viewed in terms of the relation of the individual to social participation, privacy is the voluntary and temporary withdrawal of a person from the general society through physical or psychological means, either in a state of solitude or small-group intimacy or, when among larger groups, in a condition of anonymity or reserve.

Often withdrawal of private information is not that easy. Once something is in the public domain, it is "out there" and little can be done to erase it. In the age of the Internet, information has a way of replicating itself and taking permanent places in various databases linked to search engines. These engines are capable of locating hard-to-find information in a flash. If forgetting is difficult, why share at all? We share information for the simple reason that it makes life convenient. The real challenge is to keep confidential information that we have shared within responsible entities and secure systems. The idea is associated with the policy of minimum disclosure, that is, only those who need to know it should know it. Thus, when we share credit card information for purchases on the Web, only the credit card provider should know about it. The online store needs to know nothing beyond the fact that the user has been authorized for the transaction and payment has been made. Moreover, systems that perform this verification ought to be physically and electronically secure. Despite modern sophistication with such systems, it has been difficult to attain perfect security. Computer hackers have given us many case studies.

When DES was finally standardized in 1977 after intense public debate, the concerns of the public remained unanswered. NSA got what it wanted. DES, supposedly a highly secure method, was deliberately weakened by adopting a 56-bit key. It was secure enough for most purposes but if someone had sufficient computing resources, like the NSA, they would be able to crack it. It was only years later that it was enhanced to _Triple DES (3-DES)_, which used 168-bit keys and was hence much more secure than the original DES. While this work had been going on through the 1970s, researchers in academia were mentally tinkering with novel ideas of cryptography. They were not trying to fulfil any prophecy. They were attempting simply to address immediate concerns of the time. What came out though was almost prophetic of the way personal transactions would be performed in the future.

In the autumn of 1974, an undergraduate student by the name of Ralph Merkle was taking a course titled "Computer Security and Engineering" at the University of California, Berkeley. During the course, a question arose in his mind. Suppose secure access to a remote computer was compromised, how was one to regain the lost security? This was not a new scenario at all. If OTP keys were stolen from secret agents, the only way to regain secrecy was to obtain new sets of keys. In the world of computers, this was far from an ideal solution. Computer users can't wait for days or even hours for keys to be transported physically in punched cards. The question that Merkle was asking was really about exchanging keys securely over insecure communication channels that linked remote computers. Before Merkle, no one had even thought in this direction. In a matter of days, Merkle worked out a solution.

Since the channel was insecure, anything sent on the channel is clearly visible to an eavesdropper. It was thus obvious that any method to restore security must depend on some random event, something the eavesdropper can never guess. Merkle suggested a set of puzzles. Suppose _A_ wishes to send _B_ some encrypted data. The first step would be to establish a shared secret key. _A_ starts by sending _B_ a set of _N_ puzzles. _B_ randomly selects one of these puzzles and solves it. _B_ then indicates to _A_ the solution. Using the solution, _A_ figures out which puzzle _B_ has selected. The puzzle then becomes the shared secret key. To the eavesdropper, let's say _C_, the problem of getting this key is not trivial. _C_ has to solve all the _N_ puzzles one by one until she finds a match to _B_'s own solution. If each puzzle takes _m_ units of computational time to solve, the order of complexity for _A_ and _B_ is only _m_; but the order of complexity for _C_ is _mN_. The greater the _N_, the more difficult it becomes for the eavesdropper to derive the key. Here was an important principle discovered by Merkle.
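A toy version of Merkle's scheme fits in a page of Python. The makeshift hash-based stream cipher below, the puzzle count, and the marker text are all inventions for illustration; Merkle's proposal prescribed no particular cipher.

```python
import hashlib, os, random

def keystream(key, length):
    """A makeshift hash-based keystream, for illustration only."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

N = 4096            # number of puzzles A publishes
WEAK = 2            # bytes per puzzle key: only 65,536 possibilities each

# A builds N puzzles; each one seals a (puzzle id, session key) pair.
puzzles, table = [], {}
for pid in range(N):
    session_key = os.urandom(16)
    table[pid] = session_key
    plain = b"PUZL" + pid.to_bytes(4, "big") + session_key
    puzzles.append(xor(plain, keystream(os.urandom(WEAK), len(plain))))

# B picks one puzzle and brute-forces its weak key: feasible for one puzzle,
# but an eavesdropper must expect to grind through half of all N of them.
chosen = random.choice(puzzles)
for guess in range(2 ** (8 * WEAK)):
    cand = xor(chosen, keystream(guess.to_bytes(WEAK, "big"), len(chosen)))
    if cand.startswith(b"PUZL"):
        pid, session_key = int.from_bytes(cand[4:8], "big"), cand[8:]
        break

# B announces pid in the clear; A looks up the matching session key.
assert table[pid] == session_key
```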

It was not necessary that the secret key be kept absolutely secret. It was sufficient for it to be relatively secret, relative in the sense of being computationally difficult. The eavesdropper knows how to get the secret key but she usually can't because she has to slog through months or even years of computation to arrive at it. By the time she eventually obtained the key, it would be worthless since the communicating parties would have renegotiated a new key. Merkle proposed his idea as the basis of a term project to Lance Hoffman, the professor who was delivering the course on computer security. Hoffman had absolutely no idea what Merkle was talking about. He repeatedly suggested that Merkle should work on data compression instead. Merkle eventually gave up on the professor and dropped out of the course. He then tried to interest journals in publishing his idea. In October 1975, a reply came from the _Communications of the ACM_ stating politely that,

I also read the paper myself and was particularly bothered by the fact there are no references to the literature. Has anyone else ever investigated this approach?... The paper proposes to describe cryptographic security by transmitting under various unrealistic working assumptions, puzzles conveying key information, a puzzle which is just another word to talk about a cryptosystem.... Experience shows that it is extremely dangerous to transmit key information in the clear. Such practices of the legitimate user open the setup to illegitimate test procedures which only a very strong system could resist.

Einstein's 1905 papers contained only a handful of references and so did Shannon's masterpiece of 1948. Revolutionary ideas often have a tough time finding acceptance and Merkle was certainly not the only one at the receiving end. Not far away, two researchers at Stanford University got started on a similar thought process at about the same time as Merkle.

A few years earlier, Martin Hellman had been working at IBM and in the same department as Feistel. Over many lunches with Feistel, Hellman had come to appreciate the need for public cryptography beyond the usual applications of military and espionage. Later at Stanford, if Hellman largely focused on this problem, it was because he had understood the context and its commercial value. Hellman worked in isolation partly because not many were working on similar ideas. The other reason was that he was not taken seriously. With NSA's mega budgets, there was no way Hellman could ever hope to invent anything in this field. Even if he did, NSA would classify it and the public would never see it. Meanwhile, back at IBM's research centre, a self-taught cryptographer by the name of Whitfield Diffie shared his own ideas with the IBM crowd in 1974. These were still early days and what Diffie presented to his audience was perhaps more a vision for the future than concrete ideas. With a mathematics degree from MIT, it was natural for Diffie to approach the problem mathematically. Then someone suggested to Diffie that he should look up Hellman at Stanford, who was working on similar ideas. Thus began a fruitful partnership between two researchers who thought alike but different from everyone else. Great ideas are often the result of unconventional thinking. In the words of Walter Lippmann, "Where all men think alike, no one thinks very much."

The basic problem Merkle, Hellman, and Diffie were trying to solve was less about ciphering methods and more about key distribution. They knew that when cryptography entered the mass market for protecting data carried on telephone lines, it would be a nightmare for pairs of users to exchange secret keys. The old method of locking keys in a titanium briefcase chained to the wrists of a trusted agent was not going to work. Suppose each user in a group of a hundred had to share a secret key with every other user; close to five thousand keys would be necessary. If anyone made the mistake of using a wrong key, data would become incomprehensible. If keys were stolen, there was nothing to do but commence the difficult process of key agreement all over again.

Diffie joined Stanford as a graduate student and then collaborated with Hellman. They first presented their ideas at the IEEE Information Theory Workshop in June 1975. Reception was as bad as it had been for Merkle not long before. Early in 1976, Merkle became aware of the work going on at Stanford. The turning point came later that year when the classic paper of Diffie and Hellman appeared under the title "New Directions in Cryptography." The paper started with a prophecy and its immediate fulfilment, "We stand today on the brink of a revolution in cryptography."

The paper made it clear that the authors were not trying to beat OTP. Rather, they were attempting to create a more practical form of cryptography. Secondly, it reiterated Merkle's notion of security—that it was sufficient for any method to be computationally secure rather than being unconditionally secure. The proposed concept was called the _public key cryptosystem_. In such a system, every user would generate a pair of keys. One would be held private by the user and the other would be made public, perhaps the way contact numbers are publicly available in telephone directories. Anyone wishing to encrypt messages to that user made use of the public key. Only the user in possession of the private key could decrypt the contents and read the message. Users no longer need to generate and keep track of numerous key pairs. A single pair suffices for all communication. Security is not compromised for the simple and important reason that knowledge of the public key cannot help anyone in deriving the private key. At least, it was computationally infeasible to figure out the private key. In a later paper, Hellman explained this better with a simple analogy,

A public key cryptosystem can be likened to a mathematical strongbox with a new kind of resettable combination lock that has two combinations, one for locking and one for unlocking the lock. (The lock does not lock if merely closed.) By making the locking combination (enciphering key) public anyone can lock up information, but only the intended recipient who knows the unlocking combination (deciphering key) can unlock the box to recover information.

Before this paper appeared in the literature, cryptographers had been using what one calls _symmetric keys_. In other words, the same key is used for encryption as well as decryption. It was therefore necessary to keep the key private just as Kerckhoffs had proposed in the nineteenth century. Diffie and Hellman argued that this was not strictly necessary. One part of the key can be public while the other could be kept secret. The former solved the problem of key distribution while the latter ensured secrecy. From here was born the idea of using _asymmetric keys_. There was another beauty in using such keys.

A private key is like a person's signature. If a document is encrypted with a private key, anyone can decrypt the document and read it using the user's public key. Secrecy is not the focus here. It is about the authenticity of the document. Only the user in possession of a particular private key could have encrypted the document. Asymmetric keys are the basis of modern implementations of digital signatures. These signatures are not digitizations of names scribbled in freehand. Signatures in the digital world are computed from private keys and are nothing more than sequences of ones and zeros.

In the summer of 1976, researchers were intrigued by the fact that Diffie and Hellman had presented only a framework for such a system. They had not actually arrived at any method for generating asymmetric keys. The race was on. Anyone who got there first would carve a name for themselves in the annals of cryptography.

Though Diffie and Hellman had not managed to generate asymmetric keys, they did propose one method by which two parties can agree on symmetric keys over insecure channels. Their proposal was much more practical than the puzzle-solving approach of Merkle. The method relied on the concept of _one-way functions_. These are mathematical operations that are easily computable one way but whose reverse is hard. This is something like saying that it is easy to mix two gases but it takes no less a genius than Maxwell's Demon to separate them out. Diffie and Hellman proposed a one-way function based on the computation of integer powers in modular arithmetic. Specifically, when Y = _q_^X mod _p_, X is kept private. For the eavesdropper, even the public knowledge of _p_, _q_, and Y is not enough to easily compute X. Computation of the _discrete logarithm_ is difficult enough to make this a one-way function. Of course, the numbers must satisfy special properties such as _p_ being prime.

The reason two persons _A_ and _B_—later in the literature replaced with the hypothetical characters Alice and Bob—could exchange keys was because of certain rules of modular arithmetic. Alice computed Ya = _q_^Xa mod _p_. Bob computed Yb = _q_^Xb mod _p_. They subsequently exchanged Ya and Yb. Both could now compute the symmetric key, K = Yb^Xa mod _p_ = Ya^Xb mod _p_. The eavesdropper, usually named Eve (from "eavesdropper", though she also recalls the proverbial temptress from the Garden of Eden), had to compute either Xa or Xb before she could obtain the secret key.
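The whole exchange fits in a few lines of Python. The values _p_ = 23 and _q_ = 5 are classic toy numbers; real systems use primes hundreds of digits long.

```python
import random

p, q = 23, 5                      # toy modulus and base

Xa = random.randrange(2, p - 1)   # Alice's private number
Xb = random.randrange(2, p - 1)   # Bob's private number

Ya = pow(q, Xa, p)                # sent openly to Bob
Yb = pow(q, Xb, p)                # sent openly to Alice

K_alice = pow(Yb, Xa, p)          # Alice's computation of the key
K_bob = pow(Ya, Xb, p)            # Bob's computation of the key
assert K_alice == K_bob           # both arrive at the same secret
```

Eve sees p, q, Ya, and Yb, but to find the key she must first recover Xa or Xb, which is the discrete logarithm problem that makes the function one-way.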

The year 1978 was a watershed in the history of modern cryptography. The long-sought method for implementing asymmetric keys finally arrived. It had been only two years since the landmark paper of Diffie and Hellman but these two years had been long in such exciting times. Researchers working independently came up with distinctly different methods. However, only one of them was destined to become a commercial success. Meanwhile, Merkle's initial paper of 1975 was finally published in April 1978, having gone through at least eight revisions.

Merkle seems to have had a fascination with puzzles. At the IEEE International Symposium on Information Theory, Cornell University, October 1977, Merkle and Hellman presented a method of public key cryptography based on a well-known computational problem called the _knapsack problem_. Suppose Alice decides to go on a short trip to Wonderland but instead of falling through a rabbit hole by accident, she has time to pack. She has in her possession a list of eight items whose weights in grams are thus distributed: {10, 20, 40, 80, 160, 320, 640, 1280}. She packs her knapsack with some of these items and visits Wonderland. In Wonderland, Eve finds that Alice's knapsack weighs 1410 grams. Eve knows all about Alice's possessions and their individual weights. The problem for Eve is to figure out the items in the knapsack. This is one of the most computationally difficult problems in the branch of combinatorial mathematics.

The particular example noted above is in fact quite trivial. Firstly, it has only eight items. Secondly, the weights are such that any weight is greater than the sum of lesser weights. Such a sequence of weights, known as a _superincreasing sequence_ , is easy to solve. In fact, Alice has packed just four items: 10 + 40 + 80 + 1280 = 1410. Suppose Alice had a different set of items to consider for her adventurous trip: {13, 21, 75, 167, 245, 444, 889, 1280}. If Eve weighs the knapsack at the same 1410 grams, finding the solution is a little more challenging. The problem is really difficult for Eve when Alice decides what to pack from a selection of a thousand unique items. It may take even supercomputers years to find the solution. It occurred to Merkle and Hellman that this problem could be useful for public key cryptography.
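How easy is the superincreasing case? A greedy scan from the heaviest item downwards recovers Alice's packing, as this short Python sketch shows; the general, non-superincreasing case admits no such shortcut.

```python
def solve_superincreasing(weights, total):
    """Greedy solver; superincreasing weights make the answer unique."""
    packed = []
    for w in sorted(weights, reverse=True):
        if w <= total:             # this item must be in the knapsack
            packed.append(w)
            total -= w
    return packed if total == 0 else None

print(solve_superincreasing([10, 20, 40, 80, 160, 320, 640, 1280], 1410))
# [1280, 80, 40, 10]
```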

The trick is that an easy superincreasing sequence can be disguised as a hard knapsack problem by simple mathematical operations. Each item in the superincreasing sequence is multiplied by a secret number in modular arithmetic, and the result looks like an ordinary hard sequence. Alice makes public this disguised sequence. Anyone wishing to send encrypted messages to Alice sends numbers that are sums of selected items from the public sequence. Eve cannot figure out the original message since the problem is hard. Alice, on the other hand, knows the secret multiplier and can convert every received number back into the equivalent superincreasing problem. She can easily figure out what Bob or anyone else has sent her. The superincreasing sequence, together with the method of disguising it, is Alice's private key. One calls this a _trapdoor_, which only Alice possesses.

Over the years, researchers discovered ways to crack the knapsack cryptosystem of Merkle and Hellman. Modifications of the original system were proposed. They too in their turn were cracked. Despite these developments, interest in knapsack cryptography is still alive. If more secure trapdoors are discovered, the knapsack problem may yet turn out to be a suitable solution for public key cryptography.

Earlier the same year, a noted coding theorist by the name of R. J. McEliece published a method in a progress report of the Jet Propulsion Laboratory (JPL). His method was a unique application of error-correcting codes to cryptography. Following in the lineage of linear block codes, V. D. Goppa had introduced in 1970 a new class of codes that was a generalization of BCH codes. While long BCH codes were mostly bad, long Goppa codes were mostly good. Goppa codes also had the advantage of fast decoding algorithms. McEliece proposed that one could select a random Goppa code that is capable of correcting _t_ errors. The generator matrix is then pseudorandomly scrambled, this matrix being a mathematical representation of transforming _k_ information bits into _n_ bits that include parity. Evidently, this was an ( _n_ , _k_ , _t_ ) Goppa code. The scrambled generator matrix is made public. The method of scrambling is also known to the decrypting end in advance. The actual transmission of information is rather interesting.

Alice adds redundancy as specified by the scrambled generator matrix. She then adds _t_ errors in randomly selected bit positions. Eve, who intercepts these bits, can do nothing even after knowing the scrambled generator matrix. Eve cannot correct the errors that have been purposely introduced. The intended user, Bob, first unscrambles the bits. He then performs Goppa code decoding that automatically corrects the errors. He obtains Alice's original message and Eve has no clue of what they are talking about. McEliece estimated that with a block length of over a thousand bits, cryptanalysts would have a tough time cracking the system. For Alice and Bob though, the process of encrypting and decrypting was a lot simpler. In fact, it was simpler than what three researchers at MIT proposed in early 1978. Despite this, it was MIT's contribution that would soon become popular.

The trio at MIT were Ron Rivest, Adi Shamir, and Leonard Adleman. Their solution for implementing asymmetric keys came to be called _RSA_. Like knapsack cryptography or the Diffie-Hellman key exchange, it too relied on modular arithmetic. There is something special about modular arithmetic in that it lends itself so well to modern cryptography. Modular arithmetic operates on a finite field. This means that higher numbers fold themselves into the finite field. Thus, 4 x 4 in GF(5) is not 16 but 16 mod 5 = 1. This many-to-one relationship tends to destroy normal arithmetic relationships amongst numbers. At the same time, relationships involving prime numbers are not destroyed but hidden. When it comes to revealing these hidden relationships, the computation necessary is tedious. Prime numbers happen to be the building blocks of mathematics, as electrons are of atoms and bits are of digital technology.

At the heart of a great deal of modular arithmetic is a famous theorem stated by Pierre de Fermat in the seventeenth century. For most of his career, Fermat worked as a King's councillor in the parliament of Toulouse. If he took to studying mathematics it was only out of curiosity. He discovered some of the greatest theorems in history, the most famous of which, called _Fermat's Last Theorem_, remained unproven for over three and a half centuries, till the close of the twentieth century. With relevance to cryptography, _Fermat's Little Theorem_ is a gem of mathematics. It states a simple property of prime numbers: _a_^p mod _p_ = _a_, for any prime number _p_ and any positive integer _a_. This equation comes without conditions or constraints. Its universality is truly remarkable.
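Sceptical readers can verify the theorem by brute force in a couple of lines of Python.

```python
# Fermat's Little Theorem: a^p mod p equals a mod p whenever p is prime.
for p in (2, 3, 5, 7, 11, 13, 17):
    assert all(pow(a, p, p) == a % p for a in range(1, 1000))
print("verified for the first few primes")
```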

The RSA method relies on the difficulty of factoring a large integer, the factors of the integer being two large prime numbers. Specifically, when Bob wants to send a message to Alice, he will use Alice's public key E. The plaintext P gives ciphertext C through a simple equation: C = P^E mod n, where n is the integer that contains two prime factors. The integer n must be as large as three hundred digits. When Alice receives the ciphertext C, she computes the plaintext P = C^D mod n, where D is her private key. Eavesdropper Eve knows C, n, and E but still cannot obtain P since she cannot derive D. The only way to do this would be to factor n but this is a really hard problem. Fermat's Little Theorem may be employed to test if a number is composite (non-prime) but this does not lead to any suitable method to solve the prime factorization problem.
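A toy run of RSA, using the tiny textbook primes 61 and 53, shows the method at work; real keys, as noted, use primes of hundreds of digits.

```python
# Toy RSA key generation, encryption, and decryption.
prime1, prime2 = 61, 53
n = prime1 * prime2                 # 3233, part of the public key
phi = (prime1 - 1) * (prime2 - 1)   # 3120
E = 17                              # public exponent, coprime to phi
D = pow(E, -1, phi)                 # private exponent, 2753 (needs Python 3.8+)

P = 65                              # plaintext coded as a number below n
C = pow(P, E, n)                    # encryption: C = P^E mod n
assert pow(C, D, n) == P            # decryption: P = C^D mod n recovers 65
print(C, D)                         # 2790 2753
```

Eve sees n and E. Had n been a three-hundred-digit number instead of 3233, factoring it to find D would be hopeless.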

E is the one-way function and D is the trapdoor that only Alice has. The RSA system enables Alice to generate a key pair (E, D), the asymmetric key pair that had been the object of so many researchers' quests. She publicizes E along with n. She keeps D as her private key. By such means, Alice is able to receive encrypted messages from anyone without having to maintain multiple keys for each of her contacts. Bob maintains his own key pair. So does Eve to communicate securely with her co-conspirators. This was not all.

Just as Diffie and Hellman had suggested two years earlier, RSA was perfectly suited to implement digital signatures. If Bob signed an electronic document and emailed it to Alice, he cannot later deny that the document originated from his office. Engineers call this _non-repudiation_. The only way Bob could be right was if someone had accessed his computer and stolen his private key. It therefore becomes necessary by law that stolen private keys should be reported immediately to relevant authorities.

Use of RSA in Public Key Cryptosystems

(a) Bob uses Alice's public key to encrypt messages. Only Alice can read them since only she has her personal private key. (b) Alice uses her private key to digitally sign a document. Bob uses Alice's public key to ensure that document is really from Alice and no one else. (c) RSA is used between Alice and Bob to exchange a shared secret key, which may be used subsequently for data encryption.

Once the mathematical basis for achieving privacy and digital signing was in place, commercialization meant that all support systems must be established. If Alice and Bob were close friends, Alice could simply email her public key to Bob upon request. For anything more elaborate, it was better to have a system that enabled immediate retrieval of public keys. Public keys could be stored with a central authority, who must of course be a trusted party. Suppose Bob requests the authority for Alice's public key. Eve, however, has seduced the authority and convinced him to send to Bob her own public key in place of Alice's. Bob encrypts his document using the false key and emails it to Alice. Alice decrypts the document and finds that it makes no sense. She informs Bob. By then it is too late. Eve has already intercepted the message by hacking into the mail server. She decrypts the message easily and learns of Bob's intentions. In this scenario, Eve, clever as she is, has been unable to break into Bob's computer. She therefore identified the authority and the mail server as the weaker links and exploited them with success.

There was yet one more achievement of the year. MIT undergraduate Loren Kohnfelder, working under the guidance of Adleman, put out his bachelor's thesis titled "Towards a Practical Public-key Cryptosystem." It was all right to share public keys through a trusted authority but there was another method that was even better. If the authority's database were compromised, the whole system would collapse grandly. Anything centralized and dealing with perhaps millions of public keys is a logistic and maintenance nightmare. Every transaction needed to query the authority for public keys. It would perhaps be simpler for users to exchange keys themselves but allow an authority to certify those keys.

Kohnfelder described the creation and use of such certificates to remove the burden from a centralized authority. He introduced the concept of _digital certificates_. Such a certificate associated a user with her public key. The user's name and her public key in plaintext, along with various other parameters such as date of expiry, are part of the certificate. What prevents forgery is that all certificates are signed by the authority using his private key. The authority's public key is known to all and this can be used to verify that a certificate is genuine. The use of certificates does not really enhance system security but it does simplify operations and maintenance. This simplification should not be dismissed lightly. History has shown that even the best scientific principles may falter in the field if they are too complex for engineering application. The importance of simplifying operations was summarized in an article in the context of Intelligent Network (IN) rollout in the US,

90% of the effort involved in bringing new services to market is not service software development... it is in operations... though those software may make things less cumbersome for some system engineering, it is impossible to hide these complexities for operations personnel.

It is the use of _Public Key Infrastructure (PKI)_ that makes possible today's e-commerce and secure transactions on the Internet. Suppose one goes to a website and orders a large Hawaiian pizza with generous toppings of ham and pineapple. A little lock may appear on the web browser while the customer is requested to key in his credit card details. This lock indicates that the webpage contains a certificate issued in the name of the pizza vendor. One may click on the lock to satisfy oneself that the certificate is genuine. The browser automatically does this by contacting the Certification Authority (CA) that has signed the certificate. For faster verification, browsers usually maintain an updated list of trusted authorities. This has the added advantage that CA servers may be offline and yet browsers can verify certificates correctly. The same is true for secure communication via email.

What happens if Alice returns from Wonderland and finds that she has lost her house key? The best thing to do would be to call a locksmith, perhaps prove that the house is hers, and get the lock replaced. It is the same with private keys. The entire PKI hinges on the security of private keys. If private keys are compromised, it is important to generate new key-pairs and apply to the CA for a fresh certificate. If private keys are lost, even the owner cannot decrypt messages. Only fresh keys and certificates will help. To ensure that old certificates are not misused, they are revoked and such revocations are published by the CA.

In most cases, trust and verification work in a chain. The certificate of the pizza vendor might have been signed by another certificate issued in the name of the parent company, which in its turn might have been signed by another belonging to a third-party online security agency, and so on. The browser has to track this chain until it gets to a certificate signed by a trusted CA. If the browser does not find a trusted CA in the chain, and in fact certificates can also be self-signed, the browser alerts the user. Any information submitted on such a website is open to prying eyes. To simplify and speed up verification of a chain of certificates, the pizza vendor may supply the browser all the necessary certificates of the chain. PKI is an efficient solution for online transactions but it was not a workable solution for users who wanted to simply exchange emails securely.

In the late 1980s, Phil Zimmermann started working towards simplifying digital security for the common person. It was an era in which personal computers were primitive and expensive, at least by today's standards. The use of RSA for data encryption was simply a luxury that home users could not afford. Zimmermann saw right away that the solution was to use simpler symmetric keys for ciphering data but use RSA for establishing those keys. RSA could also be used for digitally signing the emails. He conceived of a system that would provide authentication and confidentiality. Given that transmission bandwidth was limited, he included data compression as well. In other words, data would first be compressed and then encrypted before being emailed. Interestingly, Zimmermann's system did not require a CA. It operated on a web of mutual trust. If Alice communicates with Bob, they do not require a CA because they trust each other. They may use digital certificates but these are self-signed. What the common man needed was simplicity of use, from key management to certificate generation, from data compression to integration with email systems. Zimmermann called it _Pretty Good Privacy (PGP)_. It was released in June 1991. It was free and it became almost instantly popular.

The rise of PGP and Zimmermann was not without hurdles. As a spin-off from the laboratories of MIT, RSA Data Security Inc. had been founded in 1982. The company had patented RSA and they did not take Zimmermann's infringement lightly. The US government was not happy either. Cryptographic solutions were in the same category as arms. If enemies employed strong ciphers, it would be difficult for the US to intercept and read their intentions. Strong ciphers were good for Americans but should not be available outside the US. The Internet does not care for such rules and knows no boundaries. PGP was available online and users all over the world downloaded it freely. Although Zimmermann was investigated for years, the case was finally dropped in 1996.

In recent years, there has been considerable interest in a new system that seeks to simplify the definition of public keys. Rather than generating public keys using RSA, the recipient's email address is used to implement a public key. The recipient can obtain his private key from a key server. Known as _Identity-Based Encryption (IBE)_ , this is perhaps the system for the future. For now, PKI based on RSA is the system of choice.

With the coming of RSA, perhaps it was time to finally acknowledge that code makers had got the better of codebreakers. Codebreakers could yet break the system but only if they had enormous computing power. Almost certainly, they didn't. If an eavesdropper can't benefit from listening, the next best thing is to throw a spanner in the works. If Eve can't decrypt messages, at least she can prevent Alice and Bob from communicating. She can do this by simply corrupting the data. Worse still, unknown to Alice or Bob, she can introduce software viruses in an attempt to create trapdoors for herself. Data privacy, user authentication, and non-repudiation are not the only goals of secure communication. Data integrity is just as important. Data integrity means that communicating parties should have methods to verify that data has not been tampered with by third parties who might have access to communication channels.

The basic principle that enables data integrity checks is not very different from coding theory. In coding theory we add parity bits and use them to detect and correct errors. For data integrity, the entire message is used to calculate and append a fixed number of bits. These bits are called by various names depending on the exact method employed—_checksum_, _hash_, _Cyclic Redundancy Check (CRC)_, _Longitudinal Redundancy Check (LRC)_, _Message Authentication Code (MAC)_, _Hash-based MAC (HMAC)_, or _Message Digest (MD)_. These are all one-way functions in the sense that one cannot generate the message from its MAC or digest. This is exactly what is required in the context of saving user passwords. Passwords are usually stored not as plaintext but as hash values. As a result, even if the password file is stolen, the passwords cannot be read. This also explains why, when we forget a password, the system generates a new one rather than telling us the forgotten password. Any system that gives us back the old password is probably not secure since it must have stored the password as plaintext or without using a one-way function.
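Python's standard library can compute several of these digests (MD5, of which more shortly, among them), so the one-way behaviour is easy to witness; the input strings below are of course made up.

```python
import hashlib

# A digest is a one-way function: easy to compute, infeasible to invert.
a = hashlib.md5(b"hello world").hexdigest()
print(a)   # 5eb63bbbe01eeed093cb22bb8f5acdc3

# One extra character gives an utterly different digest, which is why a
# stored hash reveals nothing about the password that produced it.
b = hashlib.md5(b"hello world.").hexdigest()
print(a == b)   # False
```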

In the more relevant context of data integrity, Alice sends her message to Bob along with a computed MAC. Since the method of computation is known, Bob computes the MAC on his own. If this matches Alice's MAC, the message is accepted. On the other hand, if Eve has modified the message in any way, Bob's own computation will not match what Alice has sent. The right thing for Bob to do is to reject the message.
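Using a keyed digest from Python's standard library, the whole exchange looks like this; the key and the messages are invented for the sketch.

```python
import hmac, hashlib

key = b"shared secret known only to Alice and Bob"
message = b"Meet at the usual place at nine"

# Alice attaches a MAC computed over the whole message.
tag = hmac.new(key, message, hashlib.sha256).digest()

# Bob recomputes the MAC and accepts the message only on a match.
assert hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest())

# If Eve alters even one character, Bob's check fails.
forged = b"Meet at the usual place at ten!"
print(hmac.compare_digest(tag, hmac.new(key, forged, hashlib.sha256).digest()))  # False
```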

Of the many available methods, MAC is the most secure since it requires the use of a shared secret key by both parties. All others are generally used without keys and provide basic support for integrity checks. One popular scheme on the Web is called MD5, originally invented by Ron Rivest in 1991. The MD5 algorithm breaks up the message into blocks of 512 bits. It puts each block through a series of rounds and iterations that involve many types of substitutions and permutations. The 128-bit output of each block is chained into the processing of the next, resulting in a final 128-bit message digest. There is one easy way by which Eve can break the MD5 hash system.

Whenever Eve modifies the message, she can recompute the MD5 hash and send Bob the updated hash. For example, when users download free software from websites, the websites publish the associated MD5 hash on the web server. Eve can infect the software with a virus during download and also change the hash on the webpage. Even if the web server is really secure, Eve can simply impersonate the server and Bob will never know the difference. This is the classic _man-in-the-middle_ method of attack. In this scenario, there is nothing wrong with MD5 itself. The problem has more to do with network security and impersonation of servers. Where customers are conned into divulging confidential information by such impersonating servers, the process is called _phishing_.

Suppose that instead of MD5, Alice and Bob decide to use a shared secret key and attach a MAC to their messages. They can easily agree on a secret key using RSA or the Diffie-Hellman key exchange method. The problem for Eve is suddenly quite challenging. If Eve tampers with the message, she can't calculate the equivalent MAC since she is not in possession of the secret key. Eve, being as clever as always, tries a novel approach. She argues that if she can find another message whose MAC matches exactly what's been published by Alice, Bob will never know that the message has been corrupted. Such a message certainly exists since messages of any length are always reduced to a MAC of fixed length. The problem facing Eve is really the classic _birthday paradox_.

In the paradox, we ask how many people there should be in a group so that there is a 50% chance that any two people have the same birthday. Someone new to this problem might naïvely suggest the answer of 183 when in fact the answer is only 23. To achieve a probability of 99%, one requires only 57 people in the group. This is directly relevant to the problem facing Eve. The question she asks is this—what is the probability that two randomly selected messages have the same MAC? If this probability is high, she can spoil the party for Alice and Bob. Such an approach is called the _birthday attack_. But Eve is in for a disappointment. With an _m_-bit hash, Eve would have to try about 2^(m/2) messages on average. For example, a hash algorithm called _Whirlpool_ computes 512-bit hash values. Eve has to try out about 2^256 ≈ 1.2 x 10^77 messages, clearly impossible for today's computers. Data integrity today is possible and is quite secure if one uses shared secret keys of sufficient key length.
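The paradox itself is easily confirmed by computing the probability directly.

```python
def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

print(round(shared_birthday_probability(23), 3))   # about 0.507
print(round(shared_birthday_probability(57), 3))   # about 0.990
```

Replace the 365 days with the 2^512 possible Whirlpool hashes and the same formula explains why Eve's birthday attack is doomed.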

Thus we note that both symmetric and asymmetric keys have a place in today's cryptographic systems. Asymmetric keys solve the problem of key distribution but their main fault is that they are heavy on computation. The situation is much worse for Eve but even for Alice and Bob, RSA takes up a lot of processing power. As a result, in environments where processing power is limited, symmetric keys are preferred because their methods of encryption and decryption are simpler. A good example where almost everything falls into place is WiMAX. WiMAX is a broadband wireless access technology. Equipment that connects to WiMAX networks is required by design to produce a digital certificate. These certificates use RSA signing. In addition, RSA enables exchange of symmetric keys once the equipment is authenticated. Encryption of data is done using either DES or its better successor, the _Advanced Encryption Standard (AES)_. Management messages exchanged over the wireless interface are important in managing data connections. These important messages are protected using 160-bit HMAC digests, which use shared secret keys. WiMAX is an example that achieves privacy, authentication, and data integrity.

Goals and Methods of Data Security

There is, however, an important commercial system that does not use RSA and yet achieves all the goals of security. Mobile phones rely on symmetric keys for protecting data. Use of RSA is difficult because such devices run on batteries. The processing available is also limited. Symmetric key distribution in this environment is achieved easily because SIM cards are issued by the operator in a controlled manner. Inside each SIM card is a secret key that is also available with the operator in his _Authentication Centre (AuC)_. No one else has access to this secret key. Moreover, no equipment can read the secret key from the SIM even if physical access to the SIM is possible. The secret key is completely internal to the SIM at the user's end and to the AuC at the operator's end. Privacy in GSM and 3G relies entirely on keeping this key secret.

Despite such sophistication, hackers managed to obtain GSM secret keys and clone SIM cards. This was possible only because some operators used weak algorithms based on examples provided in the standards. Operators didn't have the expertise to develop algorithms on their own and so they took the easy approach. This turned out to be fatal for security. So, although the secret key could not be read from a SIM, it was possible to figure it out by feeding controlled inputs to the SIM and then analysing the outputs. The earliest attack came in 1998 and it exploited a weakness in diffusion. In 2002, engineers at IBM obtained the secret key with only eight queries to the SIM. Once the secret key was known, it was easy to obtain all other keys including the ciphering key. Operators realized their mistake and moved to more secure algorithms. Newer 3G systems are much more secure. One of the fatal flaws in GSM was that a third party could masquerade as the cellular network and fool the mobile into connecting to this false network. This is no longer possible in 3G because the mobile also authenticates the network just as the network authenticates the mobile. SIM cloning, which seems to be possible in Hollywood movies in a matter of seconds, is true only in fiction.

Another popular technology that uses symmetric keys is the Wi-Fi standard. Wi-Fi wireless access technology is widespread today. It is integrated into laptops, tablets, network devices, and modems. It is what we use when we access the Internet at airports or cafes. Early Wi-Fi access mechanisms were found to be insecure. _Wired Equivalent Privacy (WEP)_ is no longer considered secure since its successful hack in 2001. In subsequent years, more vulnerabilities were exploited that led to faster hacks. Despite this, many people continue to use WEP without realizing the danger. The replacement to WEP, _Wi-Fi Protected Access (WPA)_, was released in 2003. German researchers Erik Tews and Martin Beck, working at the Technical University of Darmstadt, made a partial breakthrough with an early version of WPA that used the _Temporal Key Integrity Protocol (TKIP)_ encryption algorithm. They were not only able to obtain the encryption keys but they could also inject packets into the connection and decrypt some encrypted packets into plaintext. They could do all this within only twelve minutes.

The current standard for Wi-Fi security is WPA2, released in 2004. It uses AES as the encryption algorithm of choice. No inherent vulnerabilities have been discovered thus far and it should be the preferred mode of operation for all Wi-Fi connections. If algorithms and key updates themselves are secure, the weakest links are perhaps elsewhere. Indeed, such a weak link was discovered in early 2012. The attack exploited a vulnerability that was purely operational. Most Wi-Fi access points or routers come with a fixed eight-digit PIN displayed on the device. This is part of an easy-to-use feature named _Wi-Fi Protected Setup (WPS)_. Such a number helps users who are not tech-savvy, who want a quick and easy way to get connected. Unfortunately, the security of shared secret keys and strong encryption algorithms is compromised in the process. Any user can enter the network by simply trying possible values of the eight-digit PIN; since the protocol confirms each half of the PIN separately, only about eleven thousand attempts are needed.

During the Second World War, German operators sometimes reused machine settings and thus compromised security. WEP's poor design meant that the same initial values were recycled frequently, giving cryptanalysts an easy entry into the system. WPS followed in this tradition: the real problem was in usage and not in the encryption algorithm itself. The best defence today is to simply turn off the WPS feature, but this is not likely to appeal to users who find technology difficult. It all comes down to a single question for users—do we value security more than anything else or are we willing to sacrifice a little bit for the sake of convenience? The question for designers is more challenging—how do we make technology easy to use without compromising on security?

Let us recall that a great deal of modern cryptography and e-commerce depends on making computation difficult for the eavesdropper. Computers are not static, meaning that a system that's computationally secure today may not be so a few years down the line. When RSA first came into commercial operation, a key length of 512 bits was deemed sufficient. In 1999, researchers factored such a key in about five months using about 300 computers. As computing gets cheaper every month, even without advances in codebreaking, cracking a 512-bit RSA key would now take weeks rather than months. Today's computers are not only powerful but they can also cooperate efficiently for a coordinated parallel attack. Today's RSA public keys are at least 1024 bits long. Back in 2003, researchers estimated that data protected using 1024-bit RSA keys would be secure till the year 2010. In other words, it would take about seven years for codebreakers to succeed. For most online transactions, this is adequate for today's needs but technology is always shifting the landscape in favour of codebreakers. It therefore becomes necessary for cryptographers to be vigilant of advances in computing. While our sights are on computers of the future, it is just as necessary and interesting to know where they came from.
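
Before turning to that story, it is worth seeing in miniature what the codebreaker is up against. The eavesdropper's task is to factor the public modulus, and the cost of factoring explodes with its size. A toy sketch, with absurdly small primes chosen purely for illustration:

```python
from math import isqrt

# Toy RSA with tiny primes. Real keys use primes hundreds of digits long.
p, q = 1009, 1013
n = p * q                             # public modulus
e = 17                                # public exponent
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent, key owner only

msg = 42
cipher = pow(msg, e, n)               # anyone may encrypt with (n, e)

# The codebreaker's job: factor n. Trial division cracks a toy modulus
# instantly; against a 1024-bit modulus it is utterly hopeless.
f = next(k for k in range(2, isqrt(n) + 1) if n % k == 0)
d_cracked = pow(e, -1, (f - 1) * (n // f - 1))
assert pow(cipher, d_cracked, n) == msg    # plaintext recovered
```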

#  1000 In the Land of Ones and Zeros

**The machine was** impressive and futuristic. It stood three feet high and contained a tightly knit array of vertical axles, interlocking gears and wheels marked with decimal digits. Its flesh and bones were made of iron and brass. Its life force was steam. Most importantly, it had a purpose. It was meant to compute. Numbers were to this machine as cotton threads were to a loom. Impressive as it was to look at, its creator and designer had far greater ambitions. It wasn't a hasty creation of six days. Construction had started in 1822, ten years earlier, and it was still far from finished. When completed, the machine would weigh fifteen tons and would contain twenty-five thousand individual parts. The parts would move in perfect coordination to perform work. But this work would be unlike that of the steam engines of Cornish mines or the early railway locomotives. Thermodynamicists had just begun to study the conversions from steam energy to useful work or useless heat. This new miraculous machine produced something different and precious: information.

The computing machine looked complex but its inventor Charles Babbage had envisaged it as a means to simplify human effort. Lengthy and laborious hand computations were to be made automatic. The machine was designed to answer certain specific questions, such as the computation of sines, cosines, and logarithms. Such mathematical tables were of great interest in the nineteenth century. This was the age of steam and the Industrial Revolution. Gone were the days of pure thought. Experiments and observations churned out large amounts of data. These needed to be analysed and matched against predictions from scientific theories and models. From personal experience, Babbage knew that many tables contained too many errors. Humans made errors, but machines, if properly constructed, would be perfect. Unfortunately, the pursuit of perfection comes with its own problems.

Babbage was at the time the Lucasian Professor of Mathematics at Cambridge but he spent most of his time in London. His interest had shifted from an investigation of numbers to the computation of numbers. He had his own lathe. He had hired draftsmen and metal workers for the design and construction of his _Difference Engine_. It was an appropriate name for a machine that had taken shape in the age of steam engines, except that it was not an engine in the traditional sense. It was the world's first ever automatic computer. For all its historical importance, it was never completed. As the years passed, the design evolved into greater complexity. Babbage had lost the plot. He had assumed that a design on paper could be easily translated into a machine that worked to specification. An entire theory can take shape in the mind of a scientific genius but engineering success is almost always achieved through incremental steps. Babbage had attempted a giant leap and found himself stumbling along.

Back in 1822, he had presented to the Royal Society of London a working prototype that did simple computations. Impressed, the government had sanctioned funding. The Difference Engine had far grander ambitions than its predecessor. It was designed to compute polynomials of the sixth degree, that is, expressions containing variables raised to the power of six. The targeted precision for these calculations was twenty decimal places. The fact that machines this complex did not exist meant that Babbage and his team needed to first make handcrafted tools before the machine itself could be built. Then there was the problem of precision. The world of machines was predominantly analogue. No one knew much about building computing machines and much less about digital machines.
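
The engine's name points to how it worked: the method of finite differences, which tabulates a polynomial using nothing but repeated addition, an operation that gears and wheels do well. A minimal sketch in Python, assuming a small cubic for illustration:

```python
def tabulate(seed_values, steps):
    """Extend a polynomial table by addition alone, as the
    Difference Engine did mechanically."""
    # Build difference columns from the first few hand-computed values.
    diffs = [list(seed_values)]
    while len(diffs[-1]) > 1:
        prev = diffs[-1]
        diffs.append([b - a for a, b in zip(prev, prev[1:])])
    # The machine keeps only the leading figure of each column.
    state = [col[0] for col in diffs]
    table = []
    for _ in range(steps):
        table.append(state[0])
        # One turn of the crank: each column absorbs the one below it.
        for i in range(len(state) - 1):
            state[i] += state[i + 1]
    return table

f = lambda x: 2 * x**3 - 3 * x + 5        # an example cubic
print(tabulate([f(x) for x in range(4)], 8))
print([f(x) for x in range(8)])           # identical: additions suffice
```

For a polynomial of the sixth degree, seven columns suffice, since the sixth difference of such a polynomial is constant.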

Computing machines in their earliest forms were simply calculators that performed basic arithmetic. These calculators were grafted from families of astronomical instruments and chronometers. In fact, following the engineering achievement epitomized in the wheel, the chronometer is often claimed to be the next great invention to have had an impact on progress. Watchmakers were renowned for their ingenuity and precision. The pinnacle of their achievement is often represented by the work of John Harrison, whose invention of 1759, the H4, enabled timekeeping at sea. Harrison had dedicated more than two decades of his life to arriving at this compact and portable model. Naturally, other industries benefited from the precision manufacturing and intricate assembly that the watchmakers had introduced in their own field. Among the earliest to apply these to mechanical calculators were Wilhelm Schickard (1623), Blaise Pascal (1642), and Gottfried Leibniz (1671).

These early calculators were decimal, that is, they operated on digits zero to nine. Binary arithmetic was virtually unknown and naturally its advantages were unappreciated. Binary calculating machines are simpler to design and easier to manufacture. Ironically, binary arithmetic had just arrived on the scene through the work of Lobkowitz, but Leibniz had no idea of it. Leibniz himself later got interested in binary arithmetic, conceiving it through the Chinese way of the _yin_ and the _yang_. But this was two decades after he had improved upon Pascal's calculating machine. As for the Chinese, they had been using for centuries a mechanical calculator called the _suanpan_, better known today as the abacus. The Chinese abacus was divided into two sections. The lower section of five beads per digit represented Earth. The upper section of two beads per digit represented Heaven. Together, each digit was decimal in nature. The binary notion that the Chinese carried was at best philosophical and did not apply to arithmetic.

Arithmetic on a Japanese Abacus

Adding 271 to 463, we get 734. This illustration shows the process of addition using a Japanese abacus. Expert users of the abacus can do computations faster than using an electronic calculator.

The result was that early calculators and computers were stuck with the decimal numeral system. Centuries earlier, the Europeans had constrained themselves with the unwieldy Roman numerals. Their liberation came when the decimal system was imported from the Arabs. The decimal system, as good as it was for humans, was unwieldy for machines. Machines had a language of their own but it was up to humans to discover this language and teach it to them. For the period stretching from Pascal to Babbage, machines clumsily attempted to speak the language of humans. The difficulties were quite obvious in the failure of the Difference Engine.

About ten years into the project, Babbage himself lost interest in the Difference Engine because the long-drawn engineering process had shown him a better way of doing things. He conceived of a new machine that he named the _Analytical Engine_. It was an ambitious name, for it suggested the human faculty to process and analyse information. Computers well into the late twentieth century were at best menial servants. They did what they were told to do. The Difference Engine was the first of these mechanical servants. Yet there was something vastly better about the Analytical Engine that marked it apart from the Difference Engine.

The Difference Engine did only a few calculations, limited by design. For anything else, it had to be dismantled and reassembled with the axles, wheels, and gears in new positions. The Analytical Engine, on the other hand, was a revolution in computer design. It incorporated key ideas that found their permanent place in modern day computers, though the terms Babbage used were suggestive of nineteenth-century watermills and barns. Numbers were kept in a _store_ (memory). The actual calculation happened in the _mill_, which corresponds to the modern day _Arithmetic Logic Unit (ALU)_. Operations to be performed by the machine were specified on hole-punched cards that were called _operation cards_ (instructions). These were quite distinct from numbers that might be kept on other cards named _variable cards_ (input data). There would also be a control unit that coordinated all actions.

To take a particular example, if the Difference Engine could compute a function _f_(x) = ax⁶ + bx⁵ + cx⁴ + dx³ + ex² + fx + g, the Analytical Engine could compute any general function _f_(x) = axⁿ + bxⁿ⁻¹ +... In other words, more than just the coefficients, even the degree _n_ of the polynomial became an input to the Analytical Engine. The Difference Engine was confined to one specific problem. The Analytical Engine was extensible by design. It was therefore vastly more powerful. It was more than the adding machine that the Difference Engine had been. How had Babbage achieved this? How could the machine be commanded to do just about anything?

The Analytical Engine's novelty was that it could be commanded by the use of interchangeable operation cards. The machine itself, in all its physical form and metallic constitution, represented the hardware. The sequence of holes punched on paper cards made up the software that specified the exact sequence of operations. Hardware was the name for the machinery that made computation possible—cranks, levers, gears, wheels, axles, pins, and punched cards. Software was the name for the methods of computation. Software looked cryptic because it was in a language of its own, a sequence of holes that made sense only to the machine and its programmer. This separation between hardware and software, which had eluded previous designers, was the key innovation that Babbage brought to calculating machines. This separation was what gave the Analytical Engine the ability to tackle any kind of computation using the same machinery. One didn't need to build a new machine or rearrange its parts. It was sufficient to feed the machine with a different set of cards. In arriving at this architecture, Babbage gave due credit to Joseph Marie Jacquard, who only a few decades earlier had invented the method of punched cards and applied it to automatic looms.

It so happens that weaving shares something innately simple and common with the world of computation. No matter how complex the pattern, weaving involves two simple actions that work together. The weft threads are attached to a shuttle that is thrown across the warp threads attached to the loom's frame. Patterns are created by simply lifting some warp threads while the shuttle is thrown. Thus, warp threads sometimes appear below the weft threads and at other times above. Even if the weft and the warp are of only two colours, a complex pattern can be created. It occurred to Jacquard that whether a warp thread is above or below the weft can be specified easily by the use of hole-punched cards. It was a binary method of input since there were only two positions of warp. The presence or absence of a hole perfectly suited this requirement. This was perhaps the earliest use of binary input to a machine. Actually, it was more than just an input. It was programming of the machine, a concept that Babbage directly applied to the Analytical Engine.

If the machine were to live up to its analytical powers, how did it manage to take decisions? It was understandable that the machine could read cards, add up numbers, and punch out the results, but could it take decisions? In other words, did the machine possess certain mental attributes found in humans? Babbage answered these early doubts with convincing arguments. The machine did not need to take decisions in the ways humans did. No causal or cost-benefit analysis was needed. The machine merely had to be taught to recognize certain conditions, such as the occurrence of zero or infinity in an ongoing computation. When such a condition was met, the cards specified what the machine ought to do. It could skip a bunch of cards and jump ahead; or it could backtrack to a previous card and recompute with a different set of numbers. The machine did not innately possess decision-making skills but it could be taught to make simple choices.

Teaching a machine to recognize zero was one thing but infinity was something else. For centuries, even humans had faced a great deal of difficulty understanding infinity. Babbage explained that for a machine, infinity was only relative. The Analytical Engine was designed to handle fifty decimal places and its store could contain a thousand numbers. No doubt it was finite but this was the machine's understanding of infinity. In fact, computers even today are finite. Computers are discrete devices and can never achieve infinite precision in their computations. In this context, they share with the world of digital communications the same principles of quantization and finite entropy. In his design, Babbage estimated that fifty decimal places were more than adequate for the scientific needs of the time. Yet in principle, he saw that it was possible to achieve greater precision given the luxury of time:

It is impossible to construct machinery occupying unlimited space; but it is possible to construct finite machinery, and to use it through unlimited time. It is this substitution of the _infinity of time_ for the _infinity of space_ which I have made use of, to limit the size of the engine and yet to retain its unlimited power.

Essentially, what Babbage meant was that it was possible to calculate the value of pi to a million decimal places even if the machine was limited to handling only fifty places at a time. It's just that the machine took longer to do this. This extension of power and precision had two aspects to it. Obviously, the first was the aspect of computation itself. The second was in fact more subtle. If the machine could store only a thousand numbers, where would it store the million digits of pi? Babbage saw the answer in the punched cards, which served not only as an input for the program and constants of the problem at hand, but also as a method of output. The cards could store results of a problem permanently, thus extending the in-built storage capacity of the machine. Such storage of computational results had another advantage.

If another program required the value of pi to the accuracy of a million digits, the machine need not necessarily recalculate it all over again. It could simply read previously computed values from the punched cards. This was a classic engineering choice for the programmer who operated the machine. If it was faster to read from cards, it was better to reuse available data than to recalculate them. In other words, entire tables of sines, cosines, or logarithms could be stored on punched cards and called up to service the needs of future programs. In this sense, nothing the machine ever did was wasted. It created information that could be used as building blocks for solutions to more complex problems. No problem seemed out of reach for the Analytical Engine. Understandably, the problems that Babbage concerned himself with were of a mathematical nature. He perhaps did not foresee that such computing machines would one day track stock trends, book airline tickets, or control the opening and closing of a dam.

Construction of the Analytical Engine started in 1833 but it followed alarmingly in the footsteps of the Difference Engine. Every bit of progress seemed to suggest a better design, and a redesign would be ordered. One example is the problem of carry digits, a formidable one even for Babbage. Suppose one adds 1 to 999; a carry digit then occurs thrice in the process of this addition. If the machine implemented carry operations in sequence, computation time would become a limiting factor, particularly when the numbers concerned were fifty digits long. Babbage solved this by inventing what he called _anticipatory carry_. The real issue was that Babbage's design was in a constant state of flux. Actual construction of the Analytical Engine showed little progress. The precision manufacturing of the watchmakers had not fully percolated into other industries. Skilled labour was hard to find. Craftsmen migrated to newer opportunities in America and medieval guild systems showed signs of decay.

Another decade went by and the machine showed no signs of reaching completion. Once more Babbage had reached out farther than he could grasp. The government finally pulled the plug in 1842. Perhaps it was only his mother who still had any faith left. She urged him to complete it even if he should end up living only on bread and cheese.

For all its ingenuity and supposedly supreme power, the Analytical Engine was never completed. Yet it exists, though only on paper, as the world's first programmable computer employing many key ideas found in modern electronic binary computers. One therefore expects that all computers that followed in later decades were evolutions of Babbage's Analytical Engine. One expects someone would have taken up Babbage's cause and taken the machine to completion. Instead, people found that manually prepared tables were good enough for most purposes. There were errors but they were few enough to be insignificant. If at all computations had to be done, slide rules served the purpose well. Slide rules, an invention of the seventeenth century, were deceptively simple, and yet they reigned over the world of computing for almost three centuries.

Babbage's failure had been expensive. The government had spent £17,000, money enough to buy almost two dozen steam locomotives. At best, people constructed machines on the lines of the simpler design of the Difference Engine. If Babbage had not managed to inspire future generations of computer architects, it was perhaps strangely fortunate. Computers now had a chance at a new thread of evolution that would be ideal for their binary disposition.



**The true beginnings** of modern computing lie not in numerical calculations but in telephony. Early telephony was mostly about connecting two parties directly and maintaining that connection while speech was carried along the wires as modulated electrical waveforms. This was the transmission aspect of telephony. There was another aspect of telephony that was just as important: _switching_. As telephony grew, it became obvious that direct connections were simply not viable. A hierarchy emerged. Local exchanges handled call processing and routing in a specific geographical area. Trunk offices emerged at higher levels in the hierarchy. A call from one city to another, perhaps in another country, was routed through many levels and passed through many switches. The job of these switches was to establish the correct electrical circuits from one end to the other. Electrical wires were used on a shared basis. When the call ended, the circuits were relinquished and became available for anyone else who might need them. Switches opened and closed circuits as required. For decades, they were the workhorses of the telephone network, working almost religiously in the background.

Early switches were electromechanical relays built on the principle of electromagnetism. Using such contraptions as coils, springs, plungers, and metallic contacts, switches were designed to close a circuit when the associated electromagnet was energized and to open it when de-energized. There were also earlier switches that were pneumatic valves. Then came electronics, born at the start of the twentieth century. The technology for electronic switches had been available for quite some time thanks to the work of Fleming, de Forest, and Harold Arnold. By the late 1930s, Arnold's thermionic amplifiers had pretty much become standard in telephone networks but their use as switches had a delayed entry. Electromechanical relays had been so effective that there had been no need to seek a replacement. The exact nature of a switch is merely a technical detail. In all cases, a switch implemented a binary state machine. In abstract mathematical terms, an open switch could be seen as a zero and a closed switch as a one.

That a switch could be seen as a physical realization of abstract mathematics was recognized almost simultaneously across continents. A. Nakashima proposed, in a Japanese publication of 1935, a theory for the construction of relay circuits. Three years later, V. I. Shestakov published his dissertation at the Lomonosov State University, Moscow, about how mathematics could be employed in designing electrical networks. That same year at MIT, Claude Shannon, long before he fathered information theory or flirted with cryptography, analysed relay and switching circuits using mathematics. Shannon's dissertation has since been called possibly the most important master's thesis of the century. The diversity of these publications reflected early on that something important was taking shape. A missing link was about to fall into place and guide many engineering disciplines—switching theory, logic design, system theory, and signal processing. In the process, it became obvious that computers should be binary machines.

When all the world's mathematicians, scientists, and engineers had become comfortable with decimal numbers, it was always going to be difficult to move to a different numbering system. When they took note of the binary system, it was only in sudden spurts of interest followed by long periods of indifference. In philosophy, the Chinese adopted the binary interpretation of the universe from which all their metaphysical explanations took shape. When it came to arithmetic, Leibniz proposed the binary system towards the end of the seventeenth century. More than a century later, Galois introduced his algebra of finite fields. Though he did not emphasize binary fields, operations in GF(2) are very much the legacy of Galois. Towards the close of the eighteenth century, binary forms were first used in communications; Murray shutter telegraphy is one example. Quite similar to Murray's was the invention of Louis Braille in the 1820s. Braille's method of writing and reading for the blind was simply a binary encoding of the language alphabet using six raised dots arranged in a rectangular pattern. A decade later, Morse code with its dots and dashes was yet another binary signalling method for versatile communications. Braille code and Morse code made the point that while binary arithmetic was due to binary number representation, binary communication was due to the twin processes of encoding and decoding. What this implied was that not just numbers but also alphabets could be represented in binary. It was about this time, in the mid-nineteenth century, that a new form of algebra was born using the binary system.

In the beginning, Galois algebra was really a tool for understanding algebraic equations of higher orders. Mathematics has always been about numbers and counting. Somewhere along the way, symbols were introduced into mathematics. The ancient Greek geometers had laid down their postulates and proved their theorems using named points and segments. They did not have symbols to work with. Symbols such as =, +, -, and > did not appear until the fifteenth century or later. When they did, thanks in part to algebra, proofs and explanations became terse, readable, and in many cases self-explanatory. Mathematics was not just about numbers but also about symbols. The increasing use of symbols is one reason why the subject could become increasingly abstract and still make sense to mathematicians. In this scheme of things, a mathematics professor at Queen's College, Cork, proposed a new way of looking at Aristotle's logic.

Coming from a humble background, the son of a Lincolnshire cobbler, the professor was George Boole. To Boole, it became clear that mathematics could be used to analyse logic. Aristotelian logic had always been associated with philosophy. What Boole did was to cast logic in the language of mathematics. He corresponded with another English mathematician, Augustus de Morgan, who had been thinking on similar lines. What was logic anyway? It was a system of postulates, which by the application of human thought led to inferences. For example, if all cats are mammals and if a tiger is a cat, it is inferred by Aristotelian logic that a tiger is a mammal. In Boole's proposed method, one could use symbols to represent classes of entities and operations that concerned them: _x_ for mammals, _y_ for cats, and _z_ for tigers. Non-mammals are then represented as (1-_x_), non-cats as (1-_y_), and non-tigers as (1-_z_). To say that all cats are mammals was equivalent to writing _xy_ = _y_. In other words, if from a class of mammals one picks out a cat, it is the same as picking out any individual from the class of cats. The power of mathematics becomes apparent when the same equation is written as _y_(1-_x_) = 0. In words, this says that from a set of cats, one would fail trying to pick out a non-mammal. Zero represented emptiness while one represented completeness. This is the reason why one could write non-mammals symbolically as (1-_x_), by taking away mammals from something that was complete.

The underlying principle is that everything has a complement—true or false, black or white, mammal or non-mammal. The simplest type of choice was binary in nature. It was possible to break down complex decision making into steps that required binary choices. Likewise, it was possible to build complex logic by combining a multiplicity of binary choices. Suppose someone wanted to be in London for a meeting. He would take the evening flight if a ticket was available (_u_). If not, and his wife had not taken the car (_v_), he would drive instead. To formalize the decision making, he could apply Boole's novel method: _u_ + (1-_u_)_v_. Binary choices, simple as they are, can sometimes prove to be difficult, just as Hamlet showed us when he uttered the famous words, "To be or not to be." One can only imagine Hamlet's confusion had he been presented with more choices.
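
Boole's identities can be checked mechanically by letting each symbol take only the values 0 and 1, which is exactly what a modern machine would do. A small sketch:

```python
from itertools import product

# "All cats are mammals" in two equivalent forms: xy = y, and
# y(1 - x) = 0 (one cannot pick a non-mammal out of the cats).
# The two forms agree for every binary assignment of x and y.
for x, y in product((0, 1), repeat=2):
    assert (x * y == y) == (y * (1 - x) == 0)

# The traveller's decision: fly if a ticket is available (u),
# otherwise drive if the car is free (v).
for u, v in product((0, 1), repeat=2):
    goes_to_london = u + (1 - u) * v
    assert goes_to_london == (1 if (u or v) else 0)   # same as u OR v
```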

Since then, such an algebra of binary choices and operations has come to be called _Boolean Algebra_. In the 1930s, Shannon and his contemporaries were among the first to see that this algebra had relevance to the design of relay systems. Relays were binary in nature—open or closed. Algebraic manipulation could be used to simplify complex circuits. Consider a circuit represented by f(_w_, _x_, _y_, _z_) = _x_ + _xyz_ + _yzx'_ + _wx_ + _w'x_ + _x'y_, where _x'_ is the complement or negation of _x_. Engineers generally call such functions _transfer functions_. They represent the manner in which input characteristics are transferred to the output. The rules of Boolean algebra could simplify f(_w_, _x_, _y_, _z_) without affecting its behaviour. In fact, this function is as simple as _x_ + _y_. Inputs _w_ and _z_ have no effect on the output. To realize this circuit, we need to perform only a single addition. With fewer circuit elements, the obvious advantages are better reliability and lower cost.
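
Anyone doubtful of the simplification can let a machine check it by brute force over all sixteen input combinations. A short verification sketch, reading Boolean addition as OR and multiplication as AND:

```python
from itertools import product

# The transfer function above: x + xyz + yzx' + wx + w'x + x'y
def f(w, x, y, z):
    return (x or (x and y and z) or (y and z and not x)
            or (w and x) or ((not w) and x) or ((not x) and y))

# Exhaustive check: the six-term function collapses to x OR y,
# and the inputs w and z never matter.
for w, x, y, z in product((False, True), repeat=4):
    assert f(w, x, y, z) == (x or y)
```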

Boolean operations had circuit equivalents. Addition was an OR operation, meaning that at least one of two inputs had to be true to give an active output. This is something like two switches connected in parallel, where only one of them needs to be closed to turn on the light. Multiplication was an AND operation, meaning that both inputs had to be true to give an active output. In this case, the switches were in series and both must be closed to complete the circuit and turn on the light. The NOT operation was for negation. The XOR (exclusive-OR) operation resulted in an active output if exactly one of the inputs was true. XOR operations are commonly used in cryptography to transform plaintext to ciphertext and ciphertext back to plaintext.
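
XOR earns its place in cryptography because the same operation both encrypts and decrypts: applying the key twice cancels it out, since x XOR k XOR k = x. A toy sketch in the style of a one-time pad:

```python
import os

plaintext = b"attack at dawn"
key = os.urandom(len(plaintext))     # shared secret, as long as the message

# The identical XOR operation performs both directions.
ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
recovered = bytes(c ^ k for c, k in zip(ciphertext, key))
assert recovered == plaintext
```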

Boolean Logic Applied to Basic Electrical Circuits

(a) With AND logic, the lamp lights up only when both switches are closed. (b) With OR logic, it is sufficient for a single switch to be closed to light the lamp. (c) Same as (a) but the switches are normally closed and opened when activated. (d) A more complex circuit involving both series and parallel switches.

It was in 1936 that Shannon entered MIT as a research assistant. His job was to manage and maintain a new type of computer named the _Differential Analyzer_, first completed in 1931. This early computer was certainly not an autonomous unit that could run for days without user intervention. It needed constant attention and Shannon was tasked with providing it. The Differential Analyzer was the brainchild of Vannevar Bush. Ironically, it could neither perform differential calculus nor analyse anything. Nevertheless, it did solve ordinary differential equations of high orders, normally not solvable by analytical methods but amenable to numerical approximation. Such equations are often effective models of real-world systems. They could describe heat flows, population dynamics, and transient behaviours in electrical circuits. Anyone who witnessed the workings of this machine would have found little resemblance to Babbage's calculating engines.

Indeed, the Differential Analyzer had evolved from a different gene pool of machines that operated in the analogue world. Analogue computers were models of real-world systems. Their parts were manufactured to model physical parameters of the system, just as a column of mercury models temperature in a conventional thermometer. Gear ratios, wheel diameters, angular positions of levers, or electrical voltages were the descriptions of analogue machines that represented system variables. What separated such analogue machines from the digital engines of Babbage was that all variables were updated together, since all these parts worked in unison. This was quite different from Babbage's Analytical Engine, which read the punched cards sequentially and executed the commands one at a time. Secondly, parts of the analogue machine always represented the same quantities. In the Analytical Engine, a mechanical part might represent water pressure at the start but a few seconds later could be calculating the rate of flow. Analogue computers were custom-made for specific types of problems. In fact, the very word "analogue" in the context of computing arose mainly because such machines drew an _analogy_ to real-world systems.

The Differential Analyzer solved differential equations without even performing differentiation. The workings of this machine can be traced to the early nineteenth century when land surveyors were finding it difficult to measure areas for the purpose of registration and taxation. Out of necessity, Johann Hermann, a Bavarian land surveyor, invented in 1818 an instrument that could calculate land areas by the process of integration. This instrument evolved over the decades, leading to the popular _wheel-and-disc planimeter_ of Swiss engineer Kaspar Wetli. This mechanism became an important building block of the Differential Analyzer. Such integrators were attached to rotating shafts and interlinked to model differential equations. When a pointer traced curves on a map, the point of contact between the rotating disc and the wheel varied. In the process, integration happened and the map area was calculated. It is one thing to design a mechanism. It is quite another to apply it in novel ways for which it was never intended. The idea of using the planimeter beyond its original purpose of land surveying is due to William Thomson. Apparently, thermodynamics and electromagnetism were not the only fields of his scientific contributions.

Back in 1876, Thomson's purpose was to automate Fourier analysis. His brother, James Thomson, had been working for some time on improving Wetli's planimeter. From a chance discussion with his brother, William Thomson got the idea that mechanical integrators could be used to solve differential equations. In a flurry of activity, the Thomson brothers prepared four papers to be presented before the Royal Society of London. Thomson himself assembled a harmonic analyser for use at the Meteorological Office for tidal data analysis. Though other applications of the integrator were introduced, nothing on a large scale appeared on the scene until Bush's Differential Analyzer.

The Differential Analyzer was a hundred-ton machine with four-foot cabinets housing integrators. Long rotating shafts extended from one end of the room to the other. Tables of moving parts served as input and output. All in all, the setup looked more like part of a factory floor than a research laboratory. Much of it was controlled by the hardy electromechanical relays. Shannon, while working with the Differential Analyzer, made the crucial link between these relays and Boolean algebra.

Evolution of Integrators

(a) Wheel-and-disc planimeter is a mechanical analogue integrator. (b) Electronic analogue integrator built around an operational amplifier. It has no moving parts. (c) Modern electronic digital integrator based on sampling and accumulation of sampled values.
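
The digital integrator of panel (c) is easy to sketch in code: integration reduces to sampling a signal and accumulating the samples. A minimal illustration, integrating f(t) = 2t over the interval [0, 1], where the exact answer is 1:

```python
# Digital integration by sampling and accumulation.
dt = 0.001                                     # sampling interval
samples = (2 * (i * dt) for i in range(1000))  # f(t) = 2t, sampled
area = sum(value * dt for value in samples)    # accumulate sample * dt
print(round(area, 3))                          # 0.999, nearing 1 as dt shrinks
```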

The same year (1937) that Shannon completed his master's thesis, George Stibitz of Bell Labs conceived and built what may be considered the world's first binary digital calculator. It was too trivial to be considered anything more than a proof of concept but it worked. It took two 1-bit numbers and added them up. Naturally, since Bell Labs was into telephony, Stibitz relied on telephone relays to build his 1-bit adder. This was the stepping stone that led Stibitz to demonstrate in September 1940 a machine he called the _Complex Number Calculator_. Among its internal machinery were 450 relays and 10 crossbar switches, both of which were essential parts of telephone switching systems. For the user interface, teletype terminals and printers were used. The calculator could handle arithmetic operations on eight-digit complex numbers. The impressive part of the demonstration was not that the machine could do complex number calculations but that it could do them remotely.

Teletypes were provided at a conference at Hanover, New Hampshire, where this demonstration took place for the American Mathematical Society. More than four hundred kilometres away in New York, the machine waited for commands to arrive over telephone lines. When it received them, it performed the required computations and sent back the results within a minute over the same lines. This was perhaps the world's first instance of remote computing. Under the hood, unknown to unsuspecting users, the computer performed a piece of magic that even the great Leibniz had missed.

Leibniz had built a decimal calculator but in later years he also saw the power of binary arithmetic. The fact that he did not build binary calculators could be because he lost interest in mechanical calculators; or he simply did not see how to reconcile the two worlds. For machines, the binary system was perfect. For humans, decimals were perfect. It would be inconvenient to ask humans to think in bits just for the convenience of machines. Wasn't it meant to be the other way around? Wasn't the whole purpose of calculators to ease the work of humans? Stibitz found the solution to this problem. Let humans work with decimals and let machines work with bits. In between, there was to be a conversion. Moreover, let machines do this conversion automatically. Novel as it may sound, teleprinters had been doing this for years using Baudot Code. Telephone switches since the 1910s had been translating dialled decimal pulses into machine-suitable representations. Stibitz extended the idea to machine computing. The problem was that neither Baudot Code nor alternative encodings were suitable for computing. A new code had to be invented.

The code Stibitz used is today called _Excess-Three BCD_, BCD standing for _Binary Coded Decimal_. In this system, every decimal digit was represented by its 4-bit binary equivalent plus three. Thus, nine was 1001 + 0011 = 1100. Every decimal digit thus acquired a unique 4-bit binary representation. Excess-three BCD had certain properties that suited machine requirements much better than plain BCD. To give an example, 14 + 5 was understood by the machine as 0100 0111 + 1000 = 0100 1111. This result is obviously wrong because 1111 is invalid in excess-three BCD, though valid in pure binary. This happened because when we added two excess-three BCD digits using binary arithmetic, we ended up with an extra three in the result. The result is easily adjusted by subtracting three from it. Thus, we have 0100 1111 - 0011 = 0100 1100, which is the correct result of 19. It was now a simple matter to convert the result into its decimal form and show the user "19" rather than a difficult string of ones and zeros. Users saw numbers in decimal. Teletypes communicated in Baudot Code. The calculator did the computations in excess-three BCD. There was no need to impose a single numbering system for everyone.
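
A hidden virtue of excess-three is that a binary carry out of a digit happens exactly when the decimal sum of that digit reaches ten, so decimal carries come for free. A sketch of the scheme, using the adjustment rule just described (subtract three when there is no carry, add three when there is one):

```python
def to_xs3(n):
    """One 4-bit code per decimal digit: the digit's value plus three."""
    return [int(d) + 3 for d in str(n)]

def from_xs3(digits):
    return int("".join(str(d - 3) for d in digits))

def xs3_add(a, b):
    da, db = to_xs3(a), to_xs3(b)
    while len(da) < len(db): da.insert(0, 3)   # pad with excess-three zeros
    while len(db) < len(da): db.insert(0, 3)
    out, carry = [], 0
    for x, y in zip(reversed(da), reversed(db)):
        s = x + y + carry               # plain binary addition of the codes
        carry = 1 if s >= 16 else 0     # overflow past four bits
        s &= 0b1111
        # each operand carried an extra three; adjust accordingly
        out.append(s - 3 if carry == 0 else s + 3)
    if carry:
        out.append(3 + 1)               # a new leading decimal digit '1'
    return from_xs3(list(reversed(out)))

print(xs3_add(14, 5))    # 19, computed entirely in excess-three codes
```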

This principle is today embodied in all modern computers. When we type documents or write our emails, we see only numbers and alphabets. The computer converts these characters to a code known as the 7-bit _American Standard Code for Information Interchange (ASCII)_, which is a widely accepted standard for character encoding. When we attach binary files to emails, the contents of these attachments are automatically converted by the email software to ASCII. This is done because most traditional email software can transfer data only in ASCII format. This conversion explains why binary attachments grow in size by roughly a third. Modern computers, unlike Stibitz's creation, do their arithmetic in pure binary. The fact that we use ASCII is only for interfacing computers to humans. The necessary conversions are performed by the computer.
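
The usual mechanism behind this conversion in modern email is MIME's base64 encoding, which turns every three bytes of binary into four ASCII characters. A quick sketch of the overhead (the attachment here is just a stand-in):

```python
import base64

binary = bytes(range(256)) * 40            # a stand-in binary attachment
ascii_form = base64.encodebytes(binary)    # base64 with line breaks, as in email

growth = len(ascii_form) / len(binary) - 1
print(f"{growth:.0%}")                     # about 35%: the 4/3 ratio plus newlines
```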

If computing machines had gone into hibernation due to lack of scientific interest or industrial need, the 1930s was the decade when they came out of it. Concepts that had lain in slumber since the time of Babbage were revived. Many computers appeared, sometimes independently, sometimes as successive improvements to older models. Germany's Konrad Zuse started working on a binary computer called the Z1 in 1936 but a reliable version called the Z3 appeared only five years later. Zuse's work was little known outside Germany and Allied bombing destroyed many of his machines during World War II. The Z3 was built out of telephone relays. Inspired by punched cards, it used punched 35-mm movie film. It was programmable by simply feeding in different sets of 35-mm film.

Researchers at Iowa State University completed in 1941 something similar to the Z3 except that it was not programmable. It was designed for a specific purpose—to solve simultaneous linear equations. Since it was built out of vacuum tubes instead of relays, it was faster. Named after its creators, the Atanasoff-Berry Computer (ABC) was perhaps the world's first computer that was both binary and electronic. It also used capacitors attached to rotating drums as its memory, making it one of the earliest departures from punched cards. In other words, memory underwent a transformation from being mechanical to electrical. The seeds of this transformation had been sown in 1918 when two British physicists put out a patent application titled "Improvements in Ionic Relays."

William Eccles and Frank Jordan proposed the use of relays not for switching but for storage. They did not actually use the words memory or storage. They possibly did not foresee the impact their invention would have on the design of computers. Two triodes were connected in a certain configuration with feedback from the output of the second triode to the input of the first. The fact that the circuit was endowed with memory is apparent in the description given by the inventors:

The result of these processes is that a positive stimulus from outside given to the grid of the first tube initiates a chain of changes, which result finally in the plate current of the first tube attaining the highest value possible under the E.M.F. of its battery and the plate current of the second tube falling to its lowest possible value. This condition persists after the disappearance of the initial stimulus.

To return to the initial condition, that is, to clear the memory, it was sufficient to disconnect the two triodes for an instant. Known in technical literature as the _latch_ or _flip-flop_, the invention of Eccles and Jordan is an important milestone in computer design. A flip-flop toggled between two electrical states, which were maintained internally even when external triggers had subsided. For the first time in history, an electronic machine was endowed with memory. The first building block of logic design had arrived. It is easy to miss the significance of this achievement since the word memory is so common in modern computing. Usually, it signifies a hardware module that enables storage. Let us think rather in terms of the ability to remember. A single gear turn of a mechanical digital computer was replaced in the electronic computer by the _clock_. These clocks were generated from the periodic oscillations of quartz crystals. They formed the heartbeat of the computer. Later developments of the flip-flop controlled by clocked pulses enabled computers to automatically count. With memory, heartbeat, and counting abilities, computers were beginning to acquire some attributes that humans possessed. Later history proved that it would take a lot more to make them intelligent.
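
The essence of the Eccles-Jordan circuit, two elements feeding back into each other, is captured by the textbook set-reset latch. A toy simulation, using cross-coupled NOR gates as a stand-in for the original triodes:

```python
# Cross-coupled NOR gates: q = NOR(r, qbar), qbar = NOR(s, q).
# The feedback loop is what remembers.
def step(s, r, q, qbar):
    return (not (r or qbar), not (s or q))

def settle(s, r, q, qbar):
    for _ in range(4):              # iterate the feedback to a fixed point
        q, qbar = step(s, r, q, qbar)
    return q, qbar

q, qbar = settle(s=1, r=0, q=False, qbar=True)   # a momentary set pulse
q, qbar = settle(s=0, r=0, q=q, qbar=qbar)       # the stimulus is gone...
print(q)    # True: the state persists, just as Eccles and Jordan described
```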

These developments had nothing to do with programming the computer in the software sense. The behaviour of the computer was defined in hardware logic. It was an alternative way of design that Babbage had not foreseen. When certain applications demanded faster computation, such hardware implementation would be preferred over purely sequential software implementation.

The flip-flop stood at the forefront of these new developments. It underscored the idea that computers required two kinds of memory. One was of the simpler and well-known kind that enabled input, output, and storage. We are familiar with these in the form of CDs and USB memory sticks in which we carry our data around. But such a memory would not be enough to enable computation. Computing machinery needed memory that would be close at hand and in the same manifestation in which computations were performed. Modern names for such memories are _registers_ and _Random Access Memories (RAM)_. In Babbage's Difference Engine, these were gears and wheels. In electronic machines, these were flip-flops. Registers or RAM in the form of punched cards would be slow and ineffective.

In the late 1930s, Howard Aiken of Harvard University convinced IBM to sponsor the design and construction of a programmable computer. The fact that it could be programmed was its attraction. Programs were fed via perforated paper tapes, not punched cards. The punched card had originated with Jacquard's loom and was adapted into computing by Babbage. What really gave impetus to punched cards was their use in processing the US census data of 1890, thanks to the genius of Herman Hollerith. That was the beginning of automated data processing. Perforated paper tape had a different lineage. Its beginnings were in telegraphy and teletypes. Aiken's computer, named the Harvard Mark I, or more formally the Automatic Sequence Controlled Computer (ASCC), was perhaps the first to use these tapes in computing. Otherwise, the design of the ASCC was somewhat backward. It was decimal, not binary. It used electromechanical relays when researchers at Iowa were already working with vacuum tubes for their ABC. Despite this, the ASCC evolved through many versions and continued to be used actively well into the early fifties.

Paper Tapes and Punched Cards

(a) Paper tape with holes contains characters encoded in Baudot Code. These were commonly used by teleprinters. Source: Ricardo Ferreira de Oliveira, Wikimedia Creative Commons. (b) Hollerith punched card and its punch, used during the 1890s in the US. Source: Indolences, Wikimedia Creative Commons.

World War II saw two famous machines, one of which played a key role in cracking the German Lorenz cipher. The Colossus was binary, used vacuum tubes, and took input via perforated tapes. It could be programmed but only through changes to plugboards, patch cables, and switches. By the end of the war, ten such machines were in operation, all of which were promptly destroyed when the war ended. The other machine, the ENIAC (Electronic Numerical Integrator and Computer), was built at the University of Pennsylvania under the direction of John Mauchly and J. Presper Eckert. The ENIAC was a strange mix of old and new. It used decimals rather than bits but adopted the latest in electronics in place of the older electromechanical relays. It was built from flip-flops. It was programmable in the same way as the Colossus. The ENIAC was commissioned to calculate ballistic trajectories for the war effort. One might say that the machine was set to a specific problem at the start. This no doubt meant that the ENIAC was faster than the ASCC, which read program instructions from perforated tapes. The ENIAC sacrificed flexibility for speed of operation.

The ENIAC's place in history as the world's first electronic computer is somewhat diminished because it wasn't a binary machine. Even so, it was a formidable machine by the standards of the era—18,000 tubes, 30 tons, 150 kilowatts of power, and 200 microseconds for an addition. The design transition from mechanical to electronic meant that it was faster than any of its predecessors. When its inventors applied for a patent in 1947, they explained the ENIAC in 207 pages, of which 91 were hand-drawn figures. One admires a creation designed without the sophisticated tools of the later twentieth century, just as one admires the Pyramids the Egyptians built without sophisticated machinery.

In the post-war years, every new computer was binary, electronic, and programmable. When the war ended, it was almost as if everyone knew exactly how to forge the flesh, bones, and nerves of a computer. The days of exploration were over. One report that pointed the way forward was published by John von Neumann in June 1945. Von Neumann had been associated with the ENIAC project and the difficulties of the ENIAC suggested to him a few improvements. Perhaps he also knew about Babbage's Analytical Engine and was duly inspired by its neat structure. Without concerning himself too much with a computer's construction, von Neumann focused on the logical organization of its parts and how they interacted. In short, his concern was computer architecture. While Babbage had talked about operation cards and variable cards, von Neumann saw that it was best to combine the two in the same memory. Instructions would contain operations while at the same time either containing data or pointing to memory locations that contained data. This would give the computer the speed of the ENIAC while retaining programmability. This was the beginning of the _stored program_, which has ever since been standard in computer architecture.

Most of the world's computers are based on the _von Neumann architecture_. It is binary in nature and performs its computations in modulo-2 arithmetic. It incorporates the essential ideas of the Analytical Engine. It learnt from the tried and tested approaches of the computers that had gone before. Some elements of the important stored program approach were part of Zuse's Z3. The legacy of von Neumann really is that he put down on paper ideas that had been unconsciously in circulation, as if engineers had been testing the waters. Von Neumann formalized these ideas, put them in a solid framework, and thereby charted the future course of modern computing. Among the first notable computers that came out in the von Neumann architecture were the Manchester Mark 1, the EDSAC, and the EDVAC.
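
The stored-program idea fits in a few lines of code: instructions and data sit side by side in one memory, and an instruction may name the address of its data. A minimal sketch with an invented four-instruction machine (the opcodes are illustrative, not any historical instruction set):

```python
# Program and data share one memory, as von Neumann proposed.
memory = [
    ("LOAD", 6),     # 0: copy memory[6] into the accumulator
    ("ADD", 7),      # 1: add memory[7] to it
    ("STORE", 8),    # 2: write the result back into memory
    ("HALT", 0),     # 3: stop
    0, 0,            # 4-5: unused
    20, 22,          # 6-7: data, living alongside the code
    0,               # 8: the result will land here
]

pc, acc = 0, 0                   # program counter and accumulator
while True:
    op, addr = memory[pc]
    pc += 1
    if op == "LOAD":    acc = memory[addr]
    elif op == "ADD":   acc += memory[addr]
    elif op == "STORE": memory[addr] = acc
    elif op == "HALT":  break

print(memory[8])    # 42: changing the program means changing memory, no more
```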

Before computers became digital in the 1930s, it was analogue computers that ruled the world. Some found useful employment as late as the 1970s. The inspiration of Boolean algebra and the extensive use of relays in telephone networks combined to shape the idea of binary digital computers. This then is the birth of the modern computer, coming through two distinct threads of development. One was the development of binary methods from Leibniz to Boole. This we may rightly term the software thread of development. The other was the development of technology by way of precision manufacturing, mechanical integrators, relays, and electronics. This was the hardware thread of development. Though analogue computers contributed to the development of their digital counterparts, the effect of Boolean algebra was much greater. By the late 1940s, computer architecture was more or less established. Then something happened that shook up just about everything in the world of electricity, computing, and communications.



**It is sometimes** said that a bad idea is a good idea ignored; and a lost opportunity is a knock on the door that went unanswered. The relevance of these statements will become apparent as we turn back the clock and witness a chance discovery that Michael Faraday made in 1833. It is a fact that the electrical conductivity of metals decreases as temperature increases. While investigating the properties of silver sulphide, Faraday noticed that the effect was exactly the opposite. He could not explain it and satisfied himself by stating that this was "an extraordinary case."

Four decades later, German physicist Ferdinand Braun found himself probing a galena crystal (lead sulphide) with a piece of thin metal wire. He discovered that the delicate contact between the wire and the crystal caused current to flow in only one direction. Braun had discovered rectification and had invented by chance the world's first rectifier. While John Fleming would invent the rectifier on his own three decades later, Braun's discovery had nothing to do with the Edison Effect. It relied on a completely different phenomenon. Let us recall that in Braun's time, the electron was unknown and electricity could not be explained. It was therefore premature to even aspire to explain the galena crystal's special property.

The discovery of the electron by J. J. Thomson changed everything. Fleming invented his own diode. De Forest invented his audion, which led to Harold Arnold's vacuum tube amplifier. Technology took an obvious path simply because scientists had failed to pursue the glimpses of another world into which Faraday and Braun had peeked. This was the world of semiconductors, materials that sat between electrical conductors and insulators. If only someone had investigated this world in the nineteenth century, today's technology might have been with us half a century earlier. No one did, which meant that vacuum tubes cemented their place as a key enabler of technology. Telephone networks and computers used them by the thousands. The delay, however, did not change the eventual course of history. Semiconductors went on to reinvent technology and replace vacuum tubes almost completely.

The truth is that the discovery of the electron was not enough to understand semiconductors. It was only after quantum physics matured towards the end of the 1920s that there was enough scientific knowledge to begin a study of semiconductor behaviour. Even then, there was considerable doubt about semiconductors. Noted physicist Wolfgang Pauli commented in 1931 that "one shouldn't work on semiconductors, that is a filthy mess; who knows whether any semiconductors exist." That same year, Alan Wilson of Cambridge University explained that semiconductors behaved in their unique ways because they contained impurities. Towards the end of that decade, researchers explained semiconductor rectification in terms of asymmetric charge accumulation on semiconductor surfaces. This accumulation formed a barrier to electron flow. Among these researchers were Nevill Mott of Bristol University and the German Walter Schottky, whom we encountered two decades earlier in relation to electronic noise.

Semiconductor diodes were a good replacement for their vacuum tube counterparts. The problem was that there was no equivalent replacement for the vacuum tube amplifier. Replacing just the diode would not bring improvements of performance or reliability at the system level. Vacuum tubes, though in popular usage, weren't all that good. Engineers used them because there was nothing better. These tubes needed a warmup time of perhaps a minute or more before they became operational. They were so power hungry that Ralph Bown of Bell Labs compared them to "sending a twelve-car freight train, locomotive and all, to carry a pound of butter." The tubes took up lots of space. They often burned out and had to be replaced. Among the thousands of tubes that made up the ENIAC, when any one went dead, there was no easy way to figure out which had to be replaced. Their soft warm glow attracted moths and caused problems. Engineers sometimes had to "de-bug" the hardware of dead moths. All these problems did not go unnoticed by Mervin Kelly at Bell Labs. Given the tremendous growth of telephony, he had the foresight to see that vacuum tubes could not sustain this growth on their own. By the mid-1930s, semiconductor research was underway at Bell Labs.

Semiconductors are part of a general discipline of study called solid-state physics that is built on the foundations of quantum physics. By early 1930s, a few universities were offering courses on the subject. Those who had studied these courses were ideal candidates to join the research team at Bell Labs. William Shockley joined in 1936 with a doctorate from MIT. James Fisk from MIT joined in 1939. During World War II, Shockley and Fisk would be among the first to create a fission reactor, which was promptly classified. The inventors were denied the rights to patent it. Charles Townes was recruited from Caltech the same year as Fisk. If anyone had lingering doubts about the usefulness of quantum physics, Townes would go on to invent the laser and prove otherwise.

Walter Brattain had been part of the Bell Labs team since 1929. Unlike Shockley, he was more of an experimentalist than a theorist. Brattain's skills were of such a class that he could build almost any circuit that an experiment might require. By the time Shockley joined, Brattain and others had been investigating rectifiers using copper oxide. The copper oxide rectifier had been invented in 1926 by L. Grondahl and P. Geiger of a Westinghouse subsidiary, but no one had managed to satisfactorily explain how it actually worked. The team at Bell Labs hoped to shed light on the matter. They hoped it would lead to a reliable semiconductor rectifier that could be used as switches in telephone networks. There was no explicit intention at the time to build a semiconductor amplifier. When Schottky put forward his theory in 1938, it occurred to Shockley that a semiconductor amplifier might be possible. Initial attempts to build one failed. Then came World War II.

During the war years, semiconductor research took a different turn and the application that triggered the change was radar. To locate an enemy aircraft with greater accuracy, it was necessary to move towards higher frequencies. Neither vacuum tubes nor copper oxide semiconductors worked well at high frequencies. Braun's point-contact rectifier worked well except that galena crystals were unsuitable for commercial use. Focus shifted towards silicon and germanium, both semiconductors, which had been ignored earlier simply because manufacturing practices had not been able to fabricate them at the required levels of purity. The demands of war pushed engineers to improve fabrication. Researchers at Purdue University worked on germanium while at Bell Labs the focus was on silicon. The nature of silicon is that if a controlled amount of phosphorus is added to it, the crystal ends up with an excess of electrons. If boron is added instead, the crystal ends up with a deficiency of electrons (the vacancies are called _holes_ ). Silicon crystals of these types were named n-type and p-type respectively.

One day at Bell Labs, purely by chance, Russell Ohl ended up with a silicon crystal that had n-type on one side and p-type on the other. This unique crystal might have gone unnoticed if not for the experiment that Ohl subsequently performed. Ohl shone light on the crystal and measured a sharp jump in current flow. Moreover, different parts of the crystal behaved differently. By careful probing and measurements, Ohl discovered the junction where the n-type and p-type regions met. This was the beginning of the famous _p-n junction diode_. In the process, Ohl had also discovered the basic principle of today's photovoltaic or solar cells. Ohl's discovery of 1940, together with the progress achieved in handling silicon and germanium, no doubt resulted in better radar detectors and radar signal modulation. As a by-product, it set the scene for semiconductor research after the war.

When the war ended, Mervin Kelly reformed the research group at Bell Labs and it was no longer about copper oxide. Germanium was easier to work with than silicon and it became the chosen material. The war had initiated many engineers at Bell Labs into radar and wireless technology. This sort of technical cross-fertilization was essential to basic research. The atmosphere at Bell Labs encouraged open participation. A number of seminars and study groups were organized. Bell Labs after the war was a better research environment than the best universities. Mervin Kelly gave his team free rein. He understood that basic research could at best be guided and not directed. The Solid State Department was organized into five subgroups, one of which was led by Shockley. His team was a mix of metallurgists, chemists, theorists, and experimentalists. Physicist John Bardeen joined the team in 1945 and almost immediately work started in earnest.

Given that Ohl's p-n junction diode looked like a semiconductor equivalent of the vacuum tube diode, there was a real chance that semiconductor amplifiers could be made. One possible approach was to follow the way of the point-contact rectifier. If a grid could be inserted, one might be able to produce amplification. The problem was that the region of contact between metal and semiconductor was as small as a hundredth of a millimetre. Brattain had attempted this in the 1930s and failed. The other approach was to effect a charge separation within a thin slab of semiconductor by applying a transverse electric field. The charge carriers being mobile, changes to the electric field should be reflected as changes in conductance. This was Shockley's idea and came to be called the _field-effect_. What happened next stumped everyone.

Shockley's theoretical predictions appeared right but the first devices built at Bell Labs simply did not work. It was worrying and frustrating, mainly because the failure reminded them of failed attempts before the war. It was clear that there was a discrepancy between Mott-Schottky theory and experimental results. From this failure came the most important refinement to semiconductor theory. Bardeen explained that electrons on the surface could get trapped in what he called _surface states_. These trapped electrons formed a barrier to the electric field and thereby greatly reduced the expected field-effect. This was a significant discovery that had eluded researchers since the earliest experiments of the late 1930s. The team was forced to take a step back. Amplification would have to wait. It was more important right now to understand these surface states and figure out a way to overcome the surface barrier. The study of surface states consumed a good part of research resources and lasted from March 1946 to November 1947. Even proving the existence of the surface barrier that Bardeen predicted was not easy. Brattain did it in April 1947 using the photovoltaic effect. With every experimental result, theory was refined. Bardeen and Brattain formed a synergy, one supplying the theory and the other setting up the experiments. In November 1947, they began the final assault. A semiconductor amplifier was in sight but it would be quite different from Shockley's original field-effect conception.

What happened in the subsequent six weeks was less of a planned attack than a series of experiments, accidents, and meticulous observations. A great deal of discovery happens because accidents are not dismissed but investigated. Theory can go only so far and the rest is experimentation bordering on chance. To start with, water condensation on the semiconductor surface caused problems. To avoid this, Brattain immersed the setup in various electrolytes. To his surprise, he found that this increased the photovoltaic effect. Apparently, positive ions in water attracted electrons in the silicon and thus the surface barrier was overcome. This directly inspired Bardeen to introduce a second circuit for the purpose of amplification instead of just observing the photovoltaic effect. This was really an old idea of Shockley's that had been waiting in the wings; its time had finally arrived. Investigative research of surface behaviour had finally given way to prototyping of an amplifier. Many electrolytes were tried along with methods of attaching them to the silicon surface.

In early December, following a suggestion of Bardeen's, silicon was replaced with high quality germanium, which researchers at Purdue University had been using. There was moderate amplification but it was hardly useful since it didn't amplify anything above 10 Hz. Again by chance, Bardeen and Brattain observed that glycol borate, the electrolyte they had been using, interacted with germanium and formed an oxide layer. They performed a few experiments with such an oxide layer with a deposit of gold on top to be used as an electrode. The results were disappointing. Chance intervened once more when somehow the oxide layer was inadvertently washed away. The gold electrode directly touched the germanium crystal. The gold was supposed to be negatively biased but when Bardeen and Brattain tried a positive bias, to their great surprise, the output was modulated in line with the input at the gold electrode. Gold behaved like the grid of a vacuum tube.

N-type germanium was used in these experiments and it had a p-type inversion layer where holes congregated at the surface near the oxide layer. It was supposed that a positive bias would repel the holes out of the inversion layer and into the n-type bulk. The December experiment proved that gold supplied the holes and resulted in amplification at the output. In n-type germanium, electrons are majority carriers and holes are minority carriers. Everyone had focused on majority carriers for the previous two years. The latest experiment showed that the key to amplification was in minority carriers. No theory had predicted this. Research that had set out with electrolytes, the oxide layer, and a negatively biased gold electrode had led to something else entirely. It was not even field-effect. On December 16, 1947, the team obtained voltage amplification of 15 and power amplification of 1.3 at 1 kHz. A week later, voltage and power gains improved to 100 and 40 respectively. Amplification of audio signals was demonstrated. Thus was born the _point-contact transistor_. In July 1948, the transistor was announced to the outside world just as Shannon's monumental work on information theory appeared.

The point-contact transistor was a delicate device to start with. Two thin strips of gold, separated by half a millimetre on a base of germanium, were held in place by a spring-loaded plastic contraption. Brattain had performed a surgery of sorts to assemble the first working prototype. Shockley, who had been absent during these weeks of hectic progress, now took up the baton. Within a few weeks of the discovery, he worked out an entire theory of semiconductors in which he explained the essential role of minority carriers. He showed how transistors could be built out of p-n junctions rather than point-contacts. He explained that injection of minority carriers is not limited to surface layers, as Bardeen had supposed. They could penetrate bulk germanium provided the base region was very thin. A thin p-type layer within an n-type germanium, an n-p-n transistor, could be used for amplification. It is remarkable that Shockley conceived the entire theory of _bipolar junction transistors (BJT)_ purely from an initial attempt to explain the workings of the point-contact transistor.

Making a junction transistor was no trivial task but technology has a way of spawning necessary supporting technologies. At Bell Labs, Gordon Teal and Morgan Sparks built equipment to grow crystals. They invented the pellet-dropping process to introduce impurities in a controlled manner so as to produce n-p-n germanium transistors. The first samples were out by April 1950. These performed just as Shockley had predicted. About a year later, junction transistors were introduced to the world. That same year William Pfann and Henry Theurer introduced _zone refining_ , a technique that yielded crystals of high purity. With this state of the art, it was possible to achieve impurity levels of just one part in ten billion. The point-contact transistor had been invented independently in France but it had come six months too late. Despite its early use in French telephone networks, it was difficult to achieve reliable operation. Point-contact rectifiers and transistors had this inherent problem of a single point of delicate contact between metal and semiconductor. Moreover, they had no strong theoretical explanation. On the other hand, starting with the work of Shockley, junction transistors were placed on a much firmer theoretical foundation. By the mid-1950s, it was obvious that junction transistors had sidelined point-contact transistors. As for vacuum tubes, they were becoming salvageable parts in second-hand markets.

The transistor occupied a unique position in history. Usually basic research precedes commercial application by many years and even decades. The transistor was a technology for which there was already an established market thanks to vacuum tubes. The transistor entered the market almost immediately after birth, first in low-power portable devices and later as a replacement to the vacuum tube. The cost of skipping childhood showed up visibly as poor reliability and performance. Quite a lot of research through the 1950s was about manufacturing and materials engineering. It took time for the industry to acknowledge reliability issues and longer to fix them.

Though Bell Labs had patented the technology, it was licensed to interested parties from 1952. This was a positive move to foster competition and faster technological progress. By 1952, hearing aids using transistors appeared on the market. A year later, transistors entered computers, first as point-contact transistors and later as junction transistors. For computers, transistors were a boon. Transistors used as switches could operate much faster than vacuum tubes. They consumed far less power and needed no warmup time. Compared to vacuum tubes, they operated far more reliably, running for months without failure.

This was a problem for Patrick Haggerty at Texas Instruments (TI), one of the companies to have licensed transistor technology. If transistors operated so reliably, there would be no failures and no replacements. The market would saturate quickly. Haggerty was a true visionary. He saw straightaway that the key to success was to take the transistor to the common man. He hired Gordon Teal from Bell Labs. Teal teamed up with Willis Adcock in search of ways to work with silicon. While all transistors up to that point were of germanium, silicon was a better choice at high and sub-zero temperatures. The problem with germanium was that its characteristics changed with temperature, which made some people comment that germanium transistors made better thermometers than amplifiers. Meanwhile, Haggerty gave a clear mandate to his engineers. The goal was to make a pocket radio that consumers could carry around with them. A few years earlier this would have seemed impossible but with the transistor around, new possibilities had opened up.

In January 1954, Morris Tanenbaum of Bell Labs made the first silicon-based transistor from modifications of the original technique of Teal and Sparks. Bell Labs did not actively pursue this programme. Three months later, Teal and Adcock at TI crafted one of their own. In a famous demonstration of May 1954, a transistorized radio was dipped in hot oil. The radio continued to play. The fate of germanium was sealed. From then on, silicon was going to be the material of choice. TI's transistorized radio became a hit with consumers. The word _transistor_ , till then confined to technologists, entered common vocabulary. This was the turning point for TI as well. For a few years, it had little competition in silicon transistors. Meanwhile in 1955, Shockley obtained private funding to start his own semiconductor firm. It was based out of the Santa Clara Valley in Northern California. The decision was to work with silicon, not germanium. If not for this decision, we might have had Germanium Valley instead of Silicon Valley. Though Shockley's company did not fare well, it was the starting point for many other Silicon Valley companies—Fairchild Semiconductor, Signetics, General Microelectronics, National Semiconductor, Intel, and Advanced Micro Devices.

While these advances were happening in silicon, the early 1950s saw an explosion of research in logic design. Since Shannon's publication showing the link between Boolean algebra and relay circuits, there had been only marginal progress in this field. The best that had happened was to write down combinations of inputs and their resultant output in the form of a table. These came to be called _truth tables_. Part of the reason was that vacuum tubes did not afford the luxury of designing complex circuits. With more tubes, the probability of failure at the system level was greater unless engineers had taken the trouble to add redundancy in design. With transistors, reliability came naturally. Complex circuits could be built for complex needs. One major issue was in design. Boolean algebra, though it came with clear rules, was more of an art that only an experienced engineer could wield effectively. Theorists started looking for better methods of _minimization_. The idea of minimization is to reduce a switching function to as few elements and operations as possible. This leads to reduced cost and complexity in the electronic circuitry that implements that function.

Early on, it was recognized that every function could be written in canonical forms. It could be either a _sum-of-products (SOP)_ or a _product-of-sums (POS)_. For example, an SOP canonical form would be f( _A_ , _B_ ) = _AB_ ' + _AB_. The same function can be written in POS as f( _A_ , _B_ ) = ( _A_ + _B_ ')( _A_ + _B_ ). The two are considered duals of each other. While canonical forms are not minimal, they form an ideal starting point that can lead to minimization. Because of their definite structure, minimization was possible through the repeated application of known rules of Boolean logic. In fact, the example above can be reduced to f( _A_ , _B_ ) = _A_ , if one applies the rule _X'Y_ \+ _XY_ = _Y_ , since _X'_ \+ _X =_ 1. It was in this spirit that W. V. Quine of Harvard University proposed a tabular method to mechanically minimize logic functions.

Towards Minimization via Canonical Forms

Expressing functions in canonical forms facilitates minimization through automated tools. In the above example, (a) and (b) are duals. Likewise, (c) and (d) are duals. The Boolean logic of (c) and (d) can be represented using symbolic logic gates as shown in (e) and (f) respectively.
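Such reductions can also be verified by brute force over the truth table. The short sketch below, written in Python purely for illustration (the book itself describes no code), confirms that the canonical SOP form above agrees with its minimized form on every input:

```python
from itertools import product

# Canonical sum-of-products form: f(A, B) = AB' + AB
def f_canonical(a, b):
    return (a and not b) or (a and b)

# Minimized form obtained by the rule X'Y + XY = Y
def f_minimized(a, b):
    return a

# Exhaustively compare the two forms on all input combinations.
for a, b in product([False, True], repeat=2):
    assert f_canonical(a, b) == f_minimized(a, b)
print("f(A, B) = AB' + AB reduces to A")
```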

The Quine-McCluskey method was proposed in 1956 and became a useful tool among engineers. While the method was often long and tedious, a computer could be programmed to execute it on behalf of the design engineer. True, computers of the 1940s were behemoths that could not be easily operated, but computers built from the late 1950s used transistors. They were more easily programmable, accessible, and reliable. Applications of computers expanded greatly. Computers that simply computed in the arithmetic sense were primitive machines. Computers could do much more. They could design, control, and automate. Computers became an essential tool of industry. Since then, they have become classic examples of technology that begets future technologies.

There was, however, an alternative to the tabular method. Visualization in science and engineering has a rich history right from the time of the Renaissance. Leonardo da Vinci conceptualized flying machines and bridges through a series of engineering drawings. Sometime in the eighteenth century, Gaspard Monge invented descriptive geometry. He put this to good use in the design and construction of forts. Monge's invention is today essential for architects who make floor plans or for machinists who make precision equipment. More than a century later, American engineer Henry Gantt invented charts to aid in planning and executing projects. These charts are used even today. In all these early examples, it was recognized that drawings were more powerful than words. It was to such visualization, and the ability of humans to recognize patterns, that a Bell Labs engineer turned for the purpose of minimization.

What Maurice Karnaugh did was to represent switching logic as a rectangular grid indexed by input Boolean variables. Each variable had a pair of rows or columns valued at either zero or one. Thus, every part of the grid could be uniquely indexed. Those grid positions that gave an active output were marked as ones. It was now a simple matter to read the logic directly by looking at the ones on the grid. This gave the canonical SOP. It was also possible to read the zeros and invert the logic to obtain the canonical POS. Either way, this visualization gave a new interpretation to canonical forms. SOP could be seen as a union of minimum terms, terms that covered a minimum area of the grid. POS could be seen as an intersection of maximum terms. Karnaugh called this a _map_. Today these are universally called _Karnaugh Maps_ or _K-Maps_.

Application of Karnaugh Map

A woman relies on the latest weather predictions as well as prevailing conditions to decide if she needs to carry an umbrella before going out. She could use K-maps to aid her decision making. (a) She marks all cases in which she needs to carry an umbrella. (b) She adds bit-encoding to the K-map. (c) She groups the cells and simplifies the Boolean logic. She decides that she will carry an umbrella if the skies are cloudy regardless of prediction, or if an experienced weatherman predicts rain even if the skies are clear.

What was really powerful about this visualization technique was that an engineer could see all the ones that were adjacent to one another. She could therefore combine adjacent ones into subgroups that could be described more succinctly with fewer variables and operations. A minimal form could be read visually without recourse to Boolean algebra. Moreover, Karnaugh pointed out certain "don't care" conditions. These were combinations of inputs that never arose during operation and hence outputs for these didn't need to be specified. While Boolean algebra, due to its precision, did not deal with these invalid conditions, map visualization showed how they could be used profitably. The design engineer could at her discretion put a zero or a one for these input conditions so as to obtain better minimization. These "don't care" conditions acted like jokers in a pack of cards. They could take any value in aid of minimization.

In his paper of 1953, Karnaugh gave illustrative examples. Ease of use matched by clear examples made K-maps popular. Even today, students are taught to use them for digital logic circuit design. For that matter, K-maps, being well suited to Boolean logic, had a much wider scope than just circuit design. Karnaugh's method became an alternative for problem analysis that might previously have been done using Venn diagrams. John Venn at Cambridge University had invented these diagrams in 1880 to aid analysis in logic. They subsequently found application in set theory and probability. K-maps could be seen as extensions of Venn diagrams with bit encoding of distinct regions. To illustrate the wide application of K-maps, consider the following scenario quoted from a mathematics journal:

Consider the unlikely situation of Alec, Bill and Cecil being rather unreliable witnesses to a robbery. The three suspects detained for questioning are Slippery Sam, Fingers Fred and Butch Brown. On questioning the witnesses gave conflicting reports. Alec said, "Sam and Fingers did it, Butch is innocent." Bill said, "Sam and Butch did it, Fingers is innocent." Cecil said, "All three are guilty." From their other statements the police suspect that Alec and Bill are each correct about only one man, and that Cecil is definitely a better witness than either Alec or Bill. With these assumptions, who committed the robbery?

It is possible to find the culprits using K-maps. The statements of the witnesses, qualified by their veracity, can be translated into algebraic equations. The equations of possible suspects according to Alec, Bill, and Cecil can be written down respectively as: A = _sf'b_ \+ _s'fb_ \+ _s'f'b'_ , B = _sfb'_ \+ _s'f'b'_ \+ _s'fb_ , and C = _sfb'_ \+ _sf'b_ \+ _s'fb_ \+ _sfb_. Each equation is marked on a K-map. The intersection of all three K-maps ( _s'fb_ ) gives the culprits. It turns out that Sam is innocent; the other two aren't. There was no question that K-maps applied generally to problems well beyond digital circuit design.
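Where Karnaugh would read the answer off three overlaid maps, a modern reader can confirm it by enumerating all eight possibilities. A minimal Python sketch, purely illustrative, with guilt encoded as True:

```python
from itertools import product

def correct_count(claims, truth):
    # How many of the witness's three assertions match the truth.
    return sum(c == t for c, t in zip(claims, truth))

ALEC = (True, True, False)    # "Sam and Fingers did it, Butch is innocent."
BILL = (True, False, True)    # "Sam and Butch did it, Fingers is innocent."
CECIL = (True, True, True)    # "All three are guilty."

for truth in product([False, True], repeat=3):
    a, b, c = (correct_count(w, truth) for w in (ALEC, BILL, CECIL))
    # Alec and Bill are each correct about only one man;
    # Cecil is a better witness than either of them.
    if a == 1 and b == 1 and c > max(a, b):
        sam, fingers, butch = truth
        print("Sam:", sam, "Fingers:", fingers, "Butch:", butch)
```

The single line printed corresponds to _s'fb_ : Fingers and Butch are guilty, Sam is not.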

The problem addressed by Quine and Karnaugh was combinatorial in nature, that is, how best to combine the inputs to give the required logic at the output. Timing was not of the essence in such circuit designs. The circuits had no memory. There were, however, other circuits in which timing control was essential to correct operation. Called _sequential circuits_ , they had memory. The simple flip-flop was an example of such a circuit. Flip-flops involved feedback. What would happen if the feedback signal was delayed while the input had already changed? When flip-flops were part of a counter circuit, would delays like this affect the counting? Sometimes the effects could be disastrous, including an electrical short circuit. Neither truth tables nor K-maps were adequate to describe sequential circuits. New methods of analysis had to be invented. Not long before, David Huffman had done some exceptional work with entropy coding. Huffman's greater contribution came in 1954 when his PhD thesis from MIT was published.

Almost single-handedly, Huffman invented the essential tools for analysing sequential circuits. He showed how sequential circuits could be transformed to combinatorial forms so that standard K-maps could be used to synthesize them. New methods call for new terms. Huffman introduced the _composite transition matrix_ that could be used to identify unstable conditions. He introduced a descriptive tabular method that he named the _flow table_. Flow tables were powerful tools to visualize circuit behaviour. With flow tables, it became easy to see the stable states and the sequence through which unstable states settled down to stable ones. Since sequential circuits had memory, past inputs determined the current state of the circuit. Physically, these states could be represented as states of flip-flops, electromechanical relays, or transistorized switches. The aim of Huffman's methods was to design a safe circuit while using only a minimal number of states.

Unsafe behaviour could occur if, for example, a high-level pulse appeared when the line was supposed to be at the zero level, or vice versa. Engineers call this a _hazard_. One can imagine what might happen if the green man at a traffic light didn't change to red in a timely manner. Safety could be ensured by introducing more states so that operations were properly sequenced. For example, it is better for the green man to blink for a while before changing to red. It is better for all lights to stay red for a while before the lights turn green for vehicles. It is better to introduce amber lights rather than have just red and green lights. Flow tables enabled such safe design.

Each row of the table represented one state of the system. In designing sequential circuits, states are as important as inputs. Sometimes _race conditions_ occurred when two switches competed against each other. Depending on which one won the race, the next state of the system could be different. Introducing new intermediate states gave more control and removed race conditions. Huffman also proposed suitable ways of bit encoding the states. Arbitrary encoding might lead to race conditions. In one of his examples, he showed that Gray coding worked well.
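Gray coding works here because successive values differ in exactly one bit, so no two switches ever have to change at the same instant. A minimal sketch of the standard binary-reflected construction (an illustration, not Huffman's own example):

```python
def gray(n):
    # Binary-reflected Gray code: adjacent values differ in exactly one bit.
    return n ^ (n >> 1)

# Encode eight successive states; no transition flips two bits at once,
# so no race can arise between two switches changing together.
for i in range(8):
    print(i, format(gray(i), '03b'))
```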

No doubt, more states meant greater cost and complexity but the gain was safety. Nonetheless, Huffman presented ways to minimize the number of states without compromising on safety. The extra states he called _redundancy_ , falling back on a familiar term from the world of information theory. One could sometimes use only a few states and still achieve safety but such a system would have a slow response. Alternatively, a high number of states in the system could give both speed and safety of operation. In the process of explaining his methods, Huffman noted that a designer

would like to specify what the circuit output state is to be for all possible sequences of input states, and not for just a single sequence. It is all very well to say what the output states should be for some "normal" sequence of inputs; but if there is even a possibility that other sequences might occur, then circuit action must be specified for these sequences also. A complete problem specification must indicate clearly what happens for each conceivable set of circumstances.

This is one of the cardinal rules of engineering. Every design or implementation must make allowance for unexpected input events or scenarios. Unexpected scenarios can be identified by simple enumeration. If a decimal digit is represented in BCD, only ten of the sixteen 4-bit patterns are allowed. The remaining six are unexpected. Such scenarios should be caught and brought to attention for corrective operator action or automatic recovery procedures. This is important because in some cases invalid scenarios, if allowed to enter the system, may lead to hazards. A familiar example is from the world of mobile telephony. To prevent misuse, a GSM SIM locks up after three wrong PIN entries. In another technology called _Power over Ethernet (PoE)_ , power is delivered to devices such as Wi-Fi access points via Ethernet cables. If such a device is designed to established standards, it will first check if voltages and currents on the electrical lines are within strict limits. If not, the device will simply refuse to operate, for its own safety. Perhaps the best example of handling unexpected inputs is in channel decoding. The essence of a channel decoder is to handle unexpected inputs. It either detects errors and gives a warning, or corrects them where possible.
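The BCD case is easy to make concrete. A short illustrative sketch in Python, enumerating all sixteen 4-bit patterns and flagging the six that a careful design must reject:

```python
def is_valid_bcd_digit(nibble):
    # A 4-bit pattern encodes a decimal digit only if it lies in 0000..1001.
    return 0 <= nibble <= 9

# The remaining six patterns (1010..1111) are the unexpected inputs;
# a careful design catches them instead of letting them enter the system.
for nibble in range(16):
    status = "valid digit" if is_valid_bcd_digit(nibble) else "reject"
    print(format(nibble, '04b'), status)
```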

Following closely on the heels of Huffman, two researchers at Bell Labs, George Mealy and Edward Moore, took up the study of sequential circuits on similar lines but with a different focus. Rather than confine themselves to relay switching or digital circuit design, which obviously had been Huffman's main intent, they took a generalized view of the system. Their focus fell on the system's internal states. Moore, in particular, adopted a novel approach of looking at the system as a black box. In other words, the internal states of the system were unknown. An experimenter fed into it a sequence of inputs and studied the ensuing sequence of outputs, both of which in the simplest case were sequences of ones and zeros. The experimenter then attempted to distinguish the different internal states of the machine. Interestingly, if any two internal states were indistinguishable from each other, it implied that one of them was redundant. It could be removed without affecting system behaviour. Huffman had proposed the same thing by the method of merging equivalent rows within the flow table.

By focusing on system states, both Mealy and Moore analysed systems pictorially using state diagrams. These came to be called _Finite State Machines (FSM)_ and became essential tools for _automata theory_. This theory is of such wide scope that it could be used to model not just computing machines but biological systems as well. Any system can be modelled as a set of states. In each state, the system reacts in certain ways to a set of inputs. This reaction leads the system into a possibly new state while also giving out a set of outputs to its surrounding environment. The Mealy and Moore machines functioned in abstraction with clear rules that connected inputs, outputs, and bit-encoded states. Coming out of abstraction and into the real world, they could be applied easily depending on the definition of the system at hand. The system could be as complex as the Enigma ciphering machine of the Germans or as simple as a coin-operated vending machine. It could be the growth model of a virus or a climate model of the Antarctic.

An essential difference between the Mealy machine and the Moore machine was that outputs in the former depended on present inputs and system state, while in the latter they depended only on the present system state. In the Moore machine, inputs are used to affect the state, not the output. Outputs are derived indirectly from states. In the beginning this difference seemed academic but it was soon realized that Moore machines led to safer designs since they eliminated hazards. Inputs can occur at any time in an uncontrolled manner. By having them affect the outputs directly, Mealy machines did achieve a reduction in the number of states but the design was not necessarily safe. A Mealy machine was something like answering a phone while driving. With a Moore machine, the driver first parked his car (a new state) and then answered the call (the output).

Mealy and Moore Finite State Machines

This illustration demonstrates the use of such machines to model a coin-operated vending machine. It can be seen that Mealy machines give outputs based on current state and input. Moore machines give outputs based on current state alone. Inputs only change states in Moore machines, thereby aiding safer design.
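The distinction is easy to express in code. Below is a minimal Moore-style model of a coin-operated machine in Python; the price and coin values are hypothetical and not taken from the illustration. The output function reads only the state, while coins merely move the machine between states:

```python
PRICE = 15  # cents; an assumed price for illustration

class MooreVendingMachine:
    def __init__(self):
        self.state = 0  # accumulated credit defines the state

    def step(self, coin):
        # Inputs (coins) only change the state, never the output directly.
        self.state = min(self.state + coin, PRICE)

    def output(self):
        # Output depends on the current state alone.
        return "dispense" if self.state >= PRICE else "wait"

machine = MooreVendingMachine()
for coin in (5, 5, 5):
    machine.step(coin)
    print(machine.state, machine.output())  # 5 wait, 10 wait, 15 dispense
```

A Mealy version would compute the output inside step, from the coin and the state together, trading a few states for the risk of an ill-timed input reaching the output directly.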

Huffman, Mealy, and Moore talked about synthesis of sequential circuits but it is well known in engineering circles that synthesis does not stand alone. A lot of synthesis is aided by analysis. The two form a sort of symbiosis that drives technology forward. Analysis is the study of system components and how they interwork with one another. Through analysis, engineers attempt to gain insights into system performance. From such insights, engineers then make incremental changes to make better systems. Putting together components in better ways, or making better components, is the synthesis problem. Synthesis without analysis is rare in technological history. Before sequential circuits could be designed better, it was necessary to analyse inputs, states, and their transitions. Tools of analysis were state diagrams, flow tables, and timing diagrams. Huffman and company did not invent the analysis-synthesis paradigm, which is as old as engineering itself.

The upshot of all this research was that switching networks, digital circuits, and digital computers could do much more. If the transistor had opened up new possibilities, new design techniques completed them. Sputnik I might not have gone to space if not for these developments. PCM technology, which had been in cold storage for two decades, finally found its purpose when PCM/TDM was launched in 1962. Digital transmission of voice had finally become a reality. Decoding of convolutional codes and RS codes might have hibernated too had they not come at a time when the transistor had already made vacuum tubes obsolete. The first transatlantic telephone cable of 1956 might not have operated quite so reliably had it been based on vacuum tube amplifiers. Yet through these sunrise years of digital progress, there was a new problem brewing in the pot. Strangely, it looked very much like the problem telegraphy had faced the previous century.



**Engineers called it** "the tyranny of numbers," "the number barrier," or "the interconnection problem." A typical digital computer built in the fifties consisted of hundreds of transistors and thousands of diodes. When the design had been done and the parts manufactured, the real problem was connecting them together. Parts had to be wired together by hand. It was a time-consuming task and there was no easy way to ensure that the wiring was right. As transistors replaced bulkier power-hungry vacuum tubes, circuits packed more devices in a given space. This meant more wiring connections in smaller dimensions. Although this problem began to bother the industry only towards the late fifties, engineers had foreseen it a decade earlier. In fact, the first solution to this problem had come during World War II but it was to fulfil a completely different need.

The US Army required a radio proximity fuse that would trigger bombs to detonate when they came close to the target. It was therefore necessary to first miniaturize the vacuum tube and then simplify the wiring. From this need was born the first _Printed Circuit Board (PCB)_. The underlying technologies that led to the first PCBs were available in other industries. It was a matter of adapting them to electronic circuitry. The basic idea was to print metallic contacts and interconnections on an insulating base through techniques established in printing and photography. Circuit components could then be soldered to these contacts without worrying about the wiring. Messy wires could be reduced to printed conducting lines. In addition, passive devices such as resistors, capacitors, and inductors could also be printed if the manufacturing process could achieve desired tolerances. Best of all, the process suited mass production. After the war, the technology was declassified and PCBs appeared in hearing aids as early as 1947.

After the birth of the transistor and its early success, there was a greater drive towards miniaturization. PCB technology was good but not good enough. Active devices, the diodes and transistors, could not be printed on it. The goal was clear though the path was not. At a conference in 1952, British engineer Geoffrey Dummer commented,

With the advent of the transistor and the work in semiconductors generally, it seems now possible to envisage electronic equipment in a solid block with no connecting wires. The block may consist of layers of insulating, conducting, rectifying, and amplifying materials, the electrical functions being connected directly by cutting out areas of the various layers.

Dummer's comment coming so early in the history of solid-state electronics was overly optimistic if not wildly futuristic. The techniques for achieving this goal were unknown. It was not even clear if this was what the industry wanted to pursue. The more immediate focus was really on refining the production techniques for junction transistors. The earliest of transistors were of the grown-junction type. Through the fifties, steady progress was made towards better methods: alloy-junction, diffusion, oxide masking, double-diffused mesa, and epitaxy. Each of these was an innovation on its own, the intricacies of which were understood only by those well versed in the art of semiconductor fabrication. These incremental improvements resulted in better control of impurities and ability to operate at higher frequencies.

Among the early attempts to build multiple transistors on the same semiconductor base, and thus achieve a smaller footprint, was that of Werner Jacobi, a German engineer working with Siemens. Jacobi's idea, for which he applied for a patent in 1949, was based on the point-contact transistor. Given the difficulty of working with point-contact transistors, it was an idea that went nowhere. Three years later, Bell Labs engineer Sidney Darlington had better success. By combining two junction transistors in a certain configuration, he obtained higher amplification. Wires sprang out of the semiconductor surface to connect the two transistors. So, although they were in the same piece of semiconductor, the wiring problem had not been addressed.

Perhaps the most notable failure of the early fifties was the approach pursued by the US Navy in a project named _Tinkertoy_. The idea was to design and manufacture modules that contained standard printed wirings. Components would be assembled on each module by automated machines. Many such modules could be stacked up and interconnected by electrically conducting pins. Complex circuits could be built from various combinations of standard modules. This approach solved the wiring problem though it did not lead to smaller circuits for the same functionality. Millions were spent on the project but the single factor that led to its downfall was solid-state electronics. The project had started on the wrong foot by selecting a technology on the decline, the vacuum tube. Tinkertoy became a classic case of how a novel technology can quickly render obsolete systems that are too dependent on an older one.

Better ways of silicon fabrication did not directly lead to miniaturization. Though Darlington and others had managed to put more transistors on a single silicon slice, no one had yet managed to put entire circuits. As early as 1953, Harwick Johnson of RCA thought of fabricating resistors, capacitors, and a transistor on a single germanium slice. Serious pursuit of this goal did not come until 1957. Even then, RCA engineers had only marginal success, achieving prototypes but never a reliable product. Similar prototypes were produced at Bell Labs, IBM, MIT, and Japan's MITI. These were demonstrations of simple circuits—oscillators, shift registers, or ring counters. All these might have been stepping stones to greater glory but all of them stalled at the first milepost. Historians of technology have not been able to offer satisfactory explanations. The simplest explanation is that there was no immediate business need and the future of miniaturization lay beyond the horizon. As late as 1958, the goal articulated by Dummer was viewed as science fiction. It would appear today that researchers had failed to see new possibilities when many elements of the solution were already in place.

Meanwhile, the tyranny of numbers had begun to raise its ugly head. A few companies started to take note of the urgency as well as the opportunity. Miniaturization was more of an issue for the military than for commercial applications. The Bell System, the largest consumer of electronic devices in the commercial space, was more focused on quality and reliability than miniaturization. The first push therefore came from the military. In 1957, Jay Lathrop and James Nall working at the US Army's Diamond Ordnance Fuse Laboratories in Maryland invented photolithography. This enabled them to attach diodes and transistors to printed circuitry without using wires. Using photolithography, they assembled a counter built out of flip-flops. It was battery-powered and attached to a small lamp that blinked periodically. The circuit was the size of a postage stamp. Photolithography is so important that even today it is hard to think of a digital device that does not rely on it.

By 1957, miniaturization had given way to microminiaturization. It was no longer about smaller circuitry using vacuum tubes. Microminiaturization was about solid-state devices. The US Army Signal Corps initiated a programme called _Micro-Module_. It was not fundamentally different from the ill-fated Tinkertoy project except that it did not use vacuum tubes. Moreover, it could benefit from recent progress made by Lathrop and Nall. RCA was the lead contractor but others including TI and Fairchild supplied diodes and transistors. Micro-Module was a multimillion-dollar programme that lasted well into the sixties. It mitigated the wiring problem but the true solution was not far away.

The solution to the wiring problem came from two different sources, each of which was associated with the Micro-Module programme. At TI, Jack Kilby joined in May 1958 with a clear mandate to work on microminiaturization. It occurred to him that the inherent difficulty in using multiple materials and processes put a limit on how much one could miniaturize. Wiring could be reduced but not eliminated. The best way forward was to fabricate all circuit components from semiconductors. This came to be called the _Monolithic Idea_. This was hard for others to accept. Decades of research had informed engineers that nichrome was best for resistors and Mylar for capacitors. Kilby was now suggesting that all that research be discarded in favour of silicon. By September, Kilby managed to demonstrate a working oscillator circuit. Since it was only a prototype, individual parts of silicon were isolated by air and connected using gold wires.

Independently at Fairchild, Jean Hoerni went back to the process of oxide masking and looked at it from a new angle. Oxide masking, which had been invented at Bell Labs in 1955, was used to protect the silicon surface during the diffusion process. After fabrication, it was usually washed away. Hoerni thought that perhaps it was better to leave the oxide layer on top. From this simple line of questioning came one of the greatest inventions in semiconductors. Transistors, which thus far had been three-dimensional in form, were flattened and worked better. In particular, this development was essential for digital devices. Older transistors had the problem of leakage currents, which meant that a one might become a zero. Hoerni's new _planar process_ , with a protective oxide layer on top, produced devices with lower leakage currents. What happened next was even more astonishing.

Robert Noyce of Fairchild took Hoerni's idea to the firm's patent attorney. It became clear that this was an idea that had little prior art. In such a case, the patent application had to be as expansive as possible. The attorney pushed Noyce to think of anything and everything that could be done with the planar process. From such questioning, Noyce conceived the method that finally solved once and for all the wiring problem. Once all the components were laid out in silicon, an oxide layer on top protected the junctions except at those points where devices were to be interconnected. On this oxide layer, one could lay out aluminium lines just the way one would lay out lines on a PCB. The patent application was sent out in July 1959, five months after Kilby's own patent had been filed. About a year later, Fairchild produced the first silicon IC.

This was the beginning of the _Integrated Circuit (IC)_ , without which much of today's digital world would have remained unrealizable. The IC, for all its pervasiveness and complexity, is in fact fundamentally simpler than the transistor itself. The transistor was invented after a long period of study. It required an understanding of quantum mechanics. Even then, the transistor that finally arrived was quite different from the one conceived. It was a product of experimentation, accident, and observation. Explanations of how it worked took longer. The IC, on the other hand, did not rely on basic science. All the problems and solutions that led up to its invention were related to fabrication. The IC did not bring anything new to scientific understanding. This might explain why the initial response was lukewarm. High cost and poor performance were cited against it.

Semiconductor Diode, Transistor, and the Integrated Circuit

(a) A p-n junction diode. (b) An n-p-n transistor can be seen as two p-n junctions. (c) Modern ICs easily contain more than a million transistors. Source: Angeloleithold, Wikimedia Creative Commons.

Back in 1945, Vannevar Bush had proposed a model for R&D. This was to serve as a blueprint for the course of R&D in the US in the post-war era. Bush explained that new technology proceeded from new science. The evidence before him was compelling. It was from nuclear fission that the atomic bomb was born. It was through quantum physics that radar detectors had improved. It was Boolean algebra that had provided the basis for digital computing machines. His own Differential Analyzer from the 1920s, an analogue computer, had diminished in importance. Bush was only partly right. The integrated circuit did not rely on new science. The science that had assisted the transistor was sufficient for the IC. Everything that made the IC was an innovation in technology. It may be rightly said that the IC was a creation of technology rather than of science.

It may also be said that there is much in known science that is as yet unexploited. Engineers need not wait for new science to show the way. They can continuously innovate given the current state of scientific understanding. In his classic work of 1995, Nicholas Negroponte, a pioneer in human-computer interaction, took a more radical view of innovation. He claimed that innovation in the digital age had more to do with human needs. The transistor and the chip were less important to innovation than our need for mobile communications, multimedia, or global connectivity. While this view is perhaps valid for most things that have happened since the eighties, we cannot discount the fact that such applications were only vaguely imagined in the pre-transistor era. It was only when the transistor and the chip arrived that people dared to imagine these applications. The truth is therefore somewhere in between. Technology creates possibilities and builds expectations. Human needs act as feedback for further innovation. This nature of innovation can be glimpsed from that point in history when the digital wheelwork had been set into motion.

It was Shannon who formalized the entire discipline of digital computing and communications. He saw the parallels between Boolean logic and switching networks. At the end of the long road through which digital technology had progressed, telephone switching networks were among the first to reap the benefits. Technology thus came full circle, back to its roots in switching. Apart from the military, the Bell System was for years the largest consumer of transistors and ICs. Transistors enabled PCM/TDM digital transmission in 1962. Over a decade later, ICs enabled electronic digital switching. Digital switching was in itself an engineering achievement involving long years of R&D. It is a story that must be told in its own proper place (Chapter 10).

Today Kilby and Noyce are recognized as inventors of the IC. If Kilby had shown how to integrate components, Noyce had shown how to interconnect them. Others had pursued similar ideas earlier but it was TI and Fairchild who took the IC seriously and sought to commercialize it. Microminiaturization was still a hot topic. Junction transistors were selling well and were essential to financial stability. It was not easy to move resources towards developing ICs. The usual objections against the IC were all valid but ICs solved the interconnection problem. Though tolerances of components within ICs were not as high as those of their standalone ancestors, this was overcome by moving away from analogue to digital architectures. In short, ICs were perfectly suited to the digital world. All that was needed was a market for the first push. This came suddenly in 1961 when President John F. Kennedy announced America's intention to put a man on the moon before the end of the decade. The Space Race was on. The Cold War was heating up. The military and NASA overnight created a market for the IC. For a few years, the only customer for the IC was the government. Cost did not bother them and this provided semiconductor companies much-needed financial support to take the IC forward.

Patrick Haggerty, who had once created a market for TI's transistors, now did the same for the IC. As President of TI, Haggerty saw that demand from the military alone would not sustain growth in the long run. He tasked his engineers, including Kilby, to design a portable calculator. The Pocketronic was launched in April 1971. Its battery ran for three hours. A keyboard took user input. A little printer gave the output on paper through thermal printing. It weighed about a kilo and cost $150. In 1972, five million of these were sold. The IC that entered the humble calculator is today in toys, phones, digital cameras, computers, assembly line robots, satellites, cars, televisions, and every conceivable electronic device of the modern world. It is often referred to by its more popular name, the _chip_. If the chip had enabled digital technology, it was in its turn enabled by another technology that had waited patiently for far too long.

We will recall that the first transistors were conceived around the field-effect. Shockley wrote about it in the 1930s but an earlier work of 1926 can be traced to Julius Lilienfeld, who suggested using copper sulphide. The research team at Bell Labs spent a couple of years just studying surface behaviour. When semiconductor surfaces presented problems, engineers moved into the bulk. Field-effect was thought infeasible and the focus was on the junction transistor. Towards the late fifties, oxide masking gave indications that perhaps surface problems could be overcome. In 1960, M. M. "John" Atalla and Dawon Kahng of Bell Labs were the first to achieve this goal, one that had evaded engineers for three decades. Based on its construction, the device was named the Metal-Oxide-Semiconductor FET (MOSFET). Unfortunately, the Bell System was far too committed to the junction transistor. MOS technology was also perceived as a step back to the field-effect that had been dismissed and forgotten. The first MOS prototypes were slow and unreliable. They simply could not compete against junction transistors.

Work on MOS technology was pursued at the R&D establishments of RCA and Fairchild. While multiple junction transistors fabricated into single ICs were just beginning to appear on the market, RCA prototyped an IC with sixteen MOS transistors. This was a significant milestone that had come almost too easily. The fact was that MOS benefited a great deal from processes and know-how that had been gained with junction transistors. MOS didn't have to reinvent them. It simply piggybacked on the strength of junction transistors and tuned processes to its own needs. There was something superior about MOS that made it attractive.

MOS was simple to fabricate and this meant two things. It was better suited for ICs. On a single IC, one could more easily pack lots of MOS devices than junction devices. In the long run, this would lead to greater speed and performance over junction devices. Simplicity also meant lower cost of production. The real surprise came in 1963 when Frank Wanlass of Fairchild demonstrated a device that combined p-channel and n-channel MOS transistors. This was named Complementary MOS (CMOS). The beauty of CMOS technology was that a device in standby mode drew almost no power. If today we are able to carry our mobile phones and iPads for hours without charging, it is because they use CMOS-based ICs that draw very little power when in standby. With ICs based on junction devices, power dissipation was a problem. Any attempt to increase device density was accompanied by a nagging worry of the package melting in the works. MOS devices, with their low power consumption, did not have this heating problem. MOS was the driving factor behind the eventual success of the IC.

MOS and the IC transformed the entire electronics industry. Old leaders fell behind and new leaders arose. At Fairchild, there was a disconnect between R&D and business units. With the focus firmly on junction transistors that were the cash cows, management did not seek to reorient the company towards MOS. In the end, Fairchild lost its competitive edge. All through the sixties it had been a leader but towards the close of that decade, its decline had commenced. Key players left the company and together they founded Intel in 1968. Intel for its part focused on MOS technology early on. Commenting on Intel, historian Ross Bassett has written,

Intel's public face has changed over time; first it was a memory company, then it was a microprocessor company, and later it called itself an Internet company. Fundamentally, however, since its second year of existence, Intel has been an MOS company.

Indeed, Intel started by making RAM ICs but by 1971 it had released the world's first microprocessor. It had taken a whole lot of steam and moving parts for Babbage's Difference Engine to calculate anything. Intel's 4004 microprocessor did it silently in a single slice of silicon. It worked at the level of electrons and holes. If not for input and output peripherals connected to it, the only evidence that it was doing anything came from the little heat it generated. Between the Difference Engine and the 4004, one company was caught in the sweeping changes.

Back in the sixties, IBM was a giant in computing that knew no equal. Within a decade its position was challenged. By the early nineties, its very survival was in doubt. IBM was a vertically integrated company, meaning that it concerned itself with all parts of the finished product. It had its own R&D. It had its own silicon fabrication facilities, which produced MOS memory ICs. These found their way into IBM computers, which the company designed and marketed on its own. These computers were sometimes sold but more often rented to customers. Sales were done through IBM's own distribution network and sales offices. With such vertical integration, IBM believed that it had greater control of the computer industry. It deluded itself into thinking that it was immune to market dynamics. After all, there were no obvious competitors to its powerful computers, which were called _mainframes_.

Though the IC was born in 1959, it was only a decade later that ICs entered IBM computers. IBM's MOS research programme focused on the more complex n-channel variant while Fairchild adopted the p-channel variant. This was one reason why IBM's research took longer and MOS entered its products quite late in the game. Being vertically integrated, IBM's MOS programme did not have the wide outlook of Intel's. While Intel produced customized designs, memories, and microprocessors, IBM produced only MOS RAM for its computers and nothing else. The rest of the computer stayed the same for years. This didn't cause serious problems in the beginning but as MOS technology matured and more devices could be packed on a single IC, powerful computers built at lesser cost emerged to challenge IBM's leadership.

Although transistors entered the first computers within a few years of their invention, it was only towards the end of the fifties that they became widespread for commercial use. The IBM 1401 and 7090 mainframes that came out in this period used transistors. They were powerful but also huge. They required trained technicians and climate-controlled rooms. Transistors, as we noted earlier, had a ready market as replacements for vacuum tubes. It was soon realized that mere replacement did not represent progress. It was like powering a sixteenth-century triple-masted Spanish galleon using a nuclear reactor, things that worked together but really were not meant for each other. If computers were to tap into the possibilities of the transistor, there would have to be architectural changes. These changes took time. This explains the lag between invention and commercialization on an industrial scale.

Among the few who recognized the need for new architectures was a company founded in 1957, the Digital Equipment Corporation (DEC). Engineers at DEC realized that not everyone required powerful IBM mainframes. The IBM 7090, for example, used 36-bit words, but did a travel agency or a stockbroker need such computational power? As a small start-up company, it was also not possible for DEC to be vertically integrated like IBM. This was in fact a blessing since DEC could focus on architecture and the computing core. It could leave out peripherals and specialized hardware for others to pursue. To make this possible, DEC opened up its interfaces. Manuals were printed and freely distributed. While IBM did not permit its users to modify mainframes, DEC encouraged it for its own line of products. In fact, as customers developed peripherals and gave suggestions, DEC's product line only got better. DEC's open approach allowed others to add, customize, and offer complete products and solutions. The roots of _Original Equipment Manufacturer (OEM)_ are here.

DEC's offering came to be called the _minicomputer_. It was a glamorous name that came in the age of miniskirts and the Morris Mini. Users did not feel intimidated and its appeal was greater than that of the mainframes. The PDP-1 was released in 1959 and one of its key innovations was _Direct Memory Access (DMA)_. DMA allowed input and output peripherals to transfer data directly to memory without involving the _Central Processing Unit (CPU)_. When the transfer was complete, an interrupt was sent to the CPU for further action. In simple terms, certain portions of system tasks were delegated and supervised only when necessary. This architectural change saved precious processing time in the CPU, a change aided by the use of transistors and sophisticated software.

DEC's greatest success, the PDP-8, built out of germanium transistors, was released in 1965. It was value for money. It stood on a desktop and that's all there was to it. Programming was aided with the use of the Model 33 ASR from Teletype Corporation. DEC was smart enough not to develop such peripherals on its own. The beauty of the Model 33 ASR was that it was among the first machines to use ASCII character encoding, which is universal in today's computers. It incorporated control and escape keys, a novelty compared to older teleprinter models. The PDP-8's contribution to architectural innovation was in indirect addressing and paging. With indirect addressing, instructions pointed to locations in memory where the address of the data was stored rather than the data itself. Moreover, memory was divided into blocks or pages. This allowed memory to be increased even with the constraint of finite bit addressing. To be specific, the PDP-8 used 12-bit instructions of which seven bits were used to address memory. Thus, each page contained 128 words but special addresses enabled the computer to select one of 32 pages and thus extend overall memory to 4096 words. It should be pointed out that the constraint of fewer bits in instructions actually pushed engineers to invent complex instructions and change the architecture.
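As a rough illustration of this arithmetic, here is a minimal C sketch of paged addressing. The field layout is simplified rather than a faithful rendering of the PDP-8's instruction format: seven offset bits address a 128-word page, and a page-select bit chooses between page zero and the page in which the instruction itself resides.

```c
#include <stdio.h>

#define PAGE_SIZE 128   /* 2^7 words per page */
#define NUM_PAGES  32   /* 4096 / 128 pages   */

/* Compute the effective address from a 7-bit offset, a page-select
 * bit, and the address (pc) of the instruction doing the referencing. */
unsigned effective_address(unsigned offset7, int current_page_bit,
                           unsigned pc)
{
    unsigned page = current_page_bit ? (pc / PAGE_SIZE) : 0;
    return page * PAGE_SIZE + (offset7 % PAGE_SIZE);
}

int main(void)
{
    /* An instruction at address 300 referencing offset 5 ... */
    printf("%u\n", effective_address(5, 1, 300)); /* current page: 261 */
    printf("%u\n", effective_address(5, 0, 300)); /* page zero:      5 */
    return 0;
}
```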

DEC tapped a market that IBM had ignored. As such, IBM's mainframe leadership was not greatly affected. Mainframes and minicomputers complemented each other in the market place. The first signs of overlap came in the second half of the 1960s. Early minicomputers based on transistors gave way to better designs that relied on the chip. Literally dozens of new companies entered the fray because it was now a lot easier and faster to make minicomputers. Keyboards, printers, and memories were already available. All one had to do was to put them together with incremental improvements. It was only when the chip entered minicomputers that they started to make inroads into the mainframe market. Even IBM replaced its famously successful System/360 with System/370 in 1969. The latter used the chip while the former had relied on an adaptation of micro-circuitry programmes that had preceded the invention of the chip. As for DEC, the chip-based PDP-11 was released in 1970.

It is worth noting that the lag the chip faced for widespread commercial adoption was similar to the lag the transistor had faced a decade earlier. To think that the chip changed the world in a flash is to be naïve to the realities of technology. Every new technology requires time to mature in terms of processes and practices. At the same time, prevailing technology undergoes continuous improvements towards extended life. Economic models dictate that sometimes it is better to squeeze the last drop from an old technology than invest upfront in a new technology. History has thrown up lots of examples. Even when electricity became available, it was long before factory floors moved away from steam power. Even when PCM/TDM came into operation, some old H-carrier systems of the analogue world were still in use. Even when optic fibres promise blazingly fast bit rates, most of the world continues to use humble telephone lines of copper.

Thus, in the years when the transistor reached its peak, the chip was still in its adolescence. Growth and maturity came together for the chip. Speed and reliability improved over time. The chip entered computers and communication systems. The Apollo space programme and the Minuteman missile programme used it extensively. Channel coding and transmission of images from distant planets could not have happened without the chip. It was the chip and the minicomputer that were at the forefront of the Internet's beginnings.

Though the chip was invented in 1959, the microprocessor came only in 1971. Two things had to fall into place to realize the microprocessor. MOS technology had to reach maturity. Computer architecture had to move from a macro scale to a micro scale. This required a new breed of designers who could think in silicon. It was only then that the microprocessor could come into existence. If the fifties saw the rise of the transistor, the sixties saw the rise of the chip. The seventies belonged to the microprocessor. Let us recall that TI released the first chip-based electronic calculator in 1971, the very year that Intel launched the microprocessor. The microprocessor, sometimes called "computer-on-a-chip," brought a new wave of development but it would take more than just hardware to take this to fruition.

#  1001 The Goodness of Being Soft

**Computer engineers are** today surprised when they learn that the world's first computer programmer was a woman. If more and more women are entering the field of programming today, they are in fact following an old tradition. If there are any women still doubtful of their abilities as programmers, they can take inspiration from the historic precedent set in the mid-nineteenth century, an era in which women were even denied a university education.

Lady Ada, Countess of Lovelace, was born in 1816 as Augusta Ada Byron, the daughter of the now famous English Romantic poet Lord Byron. The young Ada never saw her scandalous father, who had separated from her mother and fled to the Continent, never to return to England. If there was any poetry in her blood, it was not the poetry of words but of numbers and logic. The science of the day excited her. Electricity and electromagnetism were new phenomena. Although oil portraits were popular, she sat for daguerreotypes, from which modern film photography later evolved. Augustus de Morgan tutored her by post. Most certainly her interest in logic came from this source. Sometime in 1833, she visited Charles Babbage and saw for the first time the Difference Engine that stood in his salon. She was impressed but also inquisitive. Meanwhile, Babbage had already commenced work on the Analytical Engine.

By 1840, the government was losing confidence in Babbage. So Babbage travelled to Turin and gave a talk on his creation. One of those present in the audience, Luigi Menabrea, wrote a detailed paper on the Analytical Engine. This paper broke new ground by talking less about the hardware and much more about what Ada later called "the science of operations." It explained to the reader how even complex problems could be broken down into elementary operations. This became the starting point for Ada's own contribution to the subject. Her translation of the paper appeared in 1843. It included copious notes that ran to three times the paper's original length. Her notes took up more difficult examples—generic polynomials, multiplication of trigonometric functions, integration, differentiation, and Bernoulli numbers. Ada showed how these could be reduced to a sequence of operations to be programmed as input into the machine. While humans can live with ambiguity and still maintain their sanity, precision is really the essence of machines. Machines need to be told exactly what to do. By necessity, Ada gave precise definitions of concepts that Menabrea had introduced in his paper. She defined _operations_, _objects operated upon_, _results of operations_, _numerical data_, and _symbolic data_.

The Analytical Engine did not care so much about the problem itself. Everything it did was on the level of symbols. It did not _understand_ addition, multiplication, division, or subtraction; but once these operations were reduced to certain symbols, it knew what to do. Everything it did was mechanical and repetitive. The machine saw no meaning in whatever it did. Meaning came by way of human interpretation of the problem and the results. Even its name was really a reflection of the human mind and not of the machine's innate ability. Ada pointed out that,

we may consider the engine as the _material and mechanical representative_ of analysis.... The Analytical Engine has no pretensions whatever to _originate_ anything. It can do whatever we _know how to order it_ to perform. It can _follow_ analysis; but it has no power of _anticipating_ any analytical relations or truths. Its province is to assist us in making _available_ what we are already acquainted with.

Obviously, someone had to tell the machine what to do. Moreover, this had to be done not in conventional languages such as English or Latin but in symbols that the machine understood. If, decades later, switching networks used Boolean logic to perform what they were designed to do, the Analytical Engine operated on decimal numbering and symbols that stood for arithmetic operations. So, programming the machine meant translating the problem into sequences of holes that the machine could interpret according to rules defined a priori. These rules taken together made up a specific _machine language_. This was the language that the machine understood. Its alphabet was binary: a hole or no hole, a one and a zero. Its words were sequences of ones and zeros. Its grammar was combinations of bit sequences that defined operations. Language was indeed an apt name for the manner in which the machine made sense of bit sequences.

Each machine type had its own language. Reading Menabrea's paper, it occurred to Ada that one could talk about programming without worrying about the mechanical construction of the machine. One needed to be aware of the machine's limitations in terms of memory or speed; but beyond that, programs could be written in abstraction as a sequence of holes on cards as determined by the machine language. If Babbage had taken inspiration from Jacquard, Ada in turn derived hers from the elegance of the Analytical Engine. If Babbage was its architect, Ada was its programmer.

In Ada's own time, the very concept of programming was new. As yet, there was no clear understanding of what exactly a programmer was supposed to do. Was she meant to invent algorithms as solutions to problems? Was she to encode these into punched cards? Was she to concern herself with the intricacies of hardware so as to write better algorithms? Was she to do some or all of these tasks in isolation or in cooperation with those who designed the hardware? Concepts of hardware and software were just taking shape. Perhaps it was premature even to ask these questions, let alone answer them.

The idea that was beginning to emerge was that a computing machine could be tasked to solve any desired problem that need not necessarily be of a mathematical nature. It might be a biological problem, such as a study of the diversity of wildlife in the Amazon. The computer could still do it by reducing the problem into symbols and logic operations connecting those symbols. The ideas of Babbage, annotated and explained by Ada, found their torchbearer in the 1930s when a young and brilliant Englishman, Alan Turing, began to consider the problem from a new angle.

Turing had little interest in building a computing machine. He was no engineer and his scientific mind focused on deeper questions about computing. He asked himself if there were numbers that were not computable in the sense that no machine with finite means could arrive at the decimal expansion of the number. In a paper of 1936, he showed that indeed there were such incomputable numbers. In short, computers had limitations. If ever they attempted to tackle incomputable numbers they ended up crunching symbols endlessly without reaching a conclusion. One of the classic problems in this category is to figure out if a sequence of symbols is random. When such a sequence is truly random, no machine can offer definite proof that it is random. At best, the machine can claim that the sequence is probably random. While Turing's incomputable numbers were the essence of his paper, the theoretical machines he introduced in the process were closer in spirit to Babbage's machines.

Turing conceived of a machine that took one of many finite states and processed a finite number of symbols. The same symbols served as input and output. These essentially defined the language alphabet of the machine. Symbols were recorded on a paper tape that could be as long as one desired. The machine read one symbol at a time. The tape could be stepped to the right or left one symbol at a time. The machine had the ability to erase symbols on tape and record new symbols. A combination of states and symbols defined the machine's behaviour. Turing showed that any computable problem could be reduced to states, symbols, and rules of behaviour. It was exactly this kind of abstraction that Babbage had conceived, and Turing formalized it in his analysis of incomputable numbers.
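To give the idea a concrete shape, here is a minimal C sketch of such a machine: a tape of symbols, a head that reads and writes one symbol at a time, and a small table of rules. The machine itself is invented for illustration; it flips every bit on its tape and halts at the first blank.

```c
#include <stdio.h>

enum { RUN, HALT };               /* the machine's finite states */

int main(void)
{
    char tape[] = "1011010_";     /* '_' marks a blank cell      */
    int head = 0, state = RUN;

    while (state != HALT) {
        char sym = tape[head];
        /* Rules: (state, symbol) -> (write, move, next state)   */
        if (sym == '0')      { tape[head] = '1'; head++; }
        else if (sym == '1') { tape[head] = '0'; head++; }
        else                 { state = HALT;             }
    }
    printf("%s\n", tape);         /* prints 0100101_ */
    return 0;
}
```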

Though the _Turing Machine (TM)_ had no ability to read multiple symbols at any given moment, it remembered what it read in its internal state. Even if there were only a finite number of states, any computation was possible by processing more symbols. Even if there were only a finite number of symbols, symbols could be combined to construct a larger symbol space, just as in binary notation any number can be represented with just ones and zeros. In many ways, Turing was inspired by Babbage, who had explained the trade-off between space and time in the design of his own computing machines. Turing's own work in turn inspired von Neumann. Two decades after Turing, his theoretical machine acquired more concrete forms in the Mealy and Moore machines. Though Turing Machines are abstract entities, they found, without deliberate design, real applications in early computers and switching networks.

The problem with a Turing Machine was that it was fixed to a specific problem, like the Difference Engine. Turing easily expanded it to what we today call the _Universal Turing Machine (UTM)_. The UTM did not tie itself down to specific rules or behaviour. Its few rules were simple and of a wider scope. The UTM first read the rules of another Turing Machine. It then emulated that Turing Machine using the latter's rules. In this manner, the UTM could emulate every other Turing Machine. The UTM was the equivalent of the Analytical Engine. The UTM inherited the program of another machine and behaved as if it were that other machine. In the process, Babbage, Ada, and Turing were all saying the same thing—a machine could be programmed in such a way that its behaviour was not tied to its operating parts but really to the program that controlled it. In fact, the program did not tie the machine to anything. If anything, programs brought flexibility and freedom. Programs could be read, acted upon, and forgotten. New programs could enter the machine as required. The machine remained the same, did the same routine operations at the basic level, but at a higher level had the power to solve a diverse range of computational problems.

Despite this early realization, the first computers did not separate hardware and software. There was no concept of software programming. The Differential Analyzer of Bush, being an analogue computer, could not be programmed. The ENIAC was designed for a specific problem. It had to be rewired by hand if a new problem came up. Programming the ENIAC meant hardware reconfiguration. It was not that engineers did not understand the power of Babbage's Analytical Engine or Turing's UTM. Making a generic machine was really a difficult design issue. No one knew how to design one. So, computers took a natural and slow path of evolution. In hindsight, this seems to have been the right approach. Had engineers followed in the footsteps of Babbage, they might have ended up with another half-finished machine that was too complex for its age.

The evolution that computers took was from analogue to digital, from decimal to binary, from fixed to programmable machines. The architecture of von Neumann entered computers in the late 1940s. Once these fundamentals were established, programming started to define itself. Programming was separated from computer design. Hardware designers sought ways to make computers faster and more reliable using the best that electronics could offer. They also designed the computer's language. The computer obeyed instructions written in this language. Programmers for their part sought to reduce any given problem to instructions in this language. Translating a problem to machine language was a difficult task. It was true that electronic computers were more reliable than human computers but a human programmer made his mistakes too. Wrong programming led to wrong results. Moreover, every machine had a language of its own. If they betrayed any similarities, these could be traced to the blueprint drawn up by von Neumann.

Two solutions arose to ease the burden of programming. Programmers started using what came to be known as _flowcharts_. The sequence that a program took was drawn up as discrete steps connected by arrows that indicated the path of the sequence. There were initializations. There were computational tasks. There were steps that made decisions. The flowchart was populated with symbols at every step. These symbols indicated variables, operations, input, and output. Essentially, this visual display of the program's flow clarified the thought process. It enabled programmers to correct or improve their work before it was submitted to the computer for execution. Beyond programming, flowcharts are now standard tools in process analysis that may have little to do with computers.

The second solution was just as important. If programmers found the language of ones and zeros difficult to grasp, it made sense to introduce a layer of abstraction between the machine world and the human world. _Assembly language_ fulfilled this purpose. Programmers wrote down basic operations using acronyms. For example, MOV indicated a move or copy of data from one location to another. ADD indicated addition of two numbers. JMP indicated a jump to another instruction a few places away, just as a Turing Machine moved left or right on its paper tape. As for data, these could be specific numbers (_literals_) or numbers picked up from a register variable. A combination of acronyms, symbols, and literals made up assembly language. Of course, the machine did not understand this new language. It was stubborn as ever, refusing to learn new things. So programmers who designed assembly language also wrote _assemblers_. These were automated translators that took assembly language instructions written by humans and converted them into equivalent machine language. Parallels can be drawn from Menabrea's comments on the Analytical Engine,

When once the engine shall have been constructed, the difficulty will be reduced to the making out of the cards; but as these are merely the translation of algebraical formulae, it will, by means of some simple notations, be easy to consign the execution of them to a workman. Thus the whole intellectual labour will be limited to the preparation of the formulae, which must be adapted for calculation by the engine.

Putting these into the context of the 1950s, the workman that Menabrea referred to was the assembler. The formulae were algorithms or methods of computation. The person who wrote down these formulae was the programmer. She was the one who had an insight into the problem. Adaptation of such formulae for the engine meant writing down computer instructions in assembly language.

Writing assemblers was extra work. It was a catch-22 situation. Assembly languages and assemblers were intended to simplify programming but the assembler itself had to be written in machine language. The real gain came because assemblers needed to be written only once. Any program written in assembly language could then be automatically translated into machine language. Interestingly, the process of creating an assembler was like bootstrapping oneself from machine language and quickly moving on to operate at assembly language. The first assembler had to be written in machine language. Once this was up and running, future assemblers could be written in assembly language. Assemblers were like construction cranes that erected themselves level by level from the ground up. Essentially, assemblers were also programs, albeit special ones, that needed to be converted to machine language. Though computers were unwilling to learn any assembly language, they could be commanded using machine language to do the necessary translation. Assembly languages arrived on the computing scene in the early fifties, with Alick Glennie writing one of the first of these for the Manchester Mark I. Assembly languages were crucial for the growth of computing through the fifties.

The earliest of computers were used for two things—data processing and scientific work. The former can be traced to Hollerith, who processed US census data in 1890. The latter had its beginnings in the Difference Engine, the Differential Analyzer, the Colossus, and the ENIAC. These cracked ciphers, mapped missile trajectories, and ascertained the stability of nuclear chain reactions. Universities that had sufficient capability and funding built their own machines for the purpose of scientific computations. When the need arose to share programs or for programmers to move to new jobs, the limitations of assembly languages became apparent. Every assembly language was tied to a specific machine language. So a program written for one machine could not be run on another. Programmers who had worked on a particular machine for months had to learn a new set of assembly language instructions when they moved to work on a different machine. What was needed was a language that could be standardized across all machines.

To standardize a single assembly language for all machines was not a realizable goal. Every machine was different. Assembly language instructions were far too diverse and far too close to the hardware architecture. The solution that evolved was another layer of abstraction above assembly languages. These came to be called _high-level languages_. Instructions in these languages made no references to hardware whatsoever. To add two numbers _a_ and _b_, they required the programmer to simply write (_a_ + _b_). This mathematical notation was far more universal than using the assembly language acronym ADD, which in its details differed from machine to machine. The added benefit was that programs were easier to write since these new languages were closer to humans. Among the early languages that appeared through the fifties were FORTRAN for scientific purposes, COBOL for data processing, LISP for artificial intelligence, and ALGOL, which evolved from FORTRAN.

If ever there was an attempt to standardize a single high-level language, it was stillborn. Today we have literally dozens of languages in existence. A decent programmer has a good grasp of half a dozen languages. High-level languages evolved simultaneously and often independently through the fifties. It became clear that no single language could fulfil all purposes of computation. Mathematical equations were easier to write in FORTRAN than in COBOL. Sorting, searching, or indexing data was easier in COBOL than in FORTRAN. To have more than one language might seem like a step backward to the days of multiple assembly languages but high-level languages had standard rules of syntax. So long as the programmer followed these rules, she could run her program on any machine.

Human languages have clear syntax and grammar. If programming languages were to be unambiguous, they too needed formal syntax. This formalization was born rather independently in two places: with John Backus at IBM, who had also invented FORTRAN, and with Peter Naur of Denmark. Their formalization of syntax came to be called the _Backus-Naur Form (BNF)_. High-level languages had immense power. With assembly languages, generally a single instruction translated to a single piece of machine code. With high-level languages, a single instruction was a short form for a sequence of machine operations. Basic elements of the language could be used to design complex expressions. To avoid misinterpretation, unambiguous syntax was necessary. For example, how was one to interpret _A_ * _B_ + _C_ / _D_? Without clear rules, this could be interpreted in many ways, each leading to possibly a different result:

(A*B) + (C/D)

(A) * (B+C)/D

(A * (B+C))/D

((A*B) + C)/D
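To see the ambiguity concretely, the small C program below evaluates three of these groupings with arbitrary sample values; each prints a different answer.

```c
#include <stdio.h>

/* Without fixed rules of precedence, A * B + C / D is ambiguous.
 * Three of the groupings above give three different results for
 * the same sample values. */
int main(void)
{
    double A = 2, B = 3, C = 8, D = 4;
    printf("%g\n", (A * B) + (C / D));   /* 8   */
    printf("%g\n", A * (B + C) / D);     /* 5.5 */
    printf("%g\n", ((A * B) + C) / D);   /* 3.5 */
    return 0;
}
```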

The first language to use formal and unambiguous syntax was ALGOL, born in 1958. ALGOL drew inspiration from FORTRAN but most of all, it was instrumental in the formation of many popular languages of the future—SIMULA, PASCAL, and C. Interestingly, the BNF was later discovered to be hardly a novelty of twentieth-century computing. The Indian grammarian Pānini had devised similar rules to codify Sanskrit grammar two millennia earlier. In his classic text _Ashtādhyāyī_, he had described how words could be combined unambiguously into expressions. Sanskrit was a language of fluid expressions and continuous sounds. Unlike in English or Latin, word boundaries were quite rare. Words merged seamlessly to make long uninterrupted poetic sounds. It was indeed the language of poetry but to make sense of it, to know where a word ended and another began, a formal notation of grammar was needed. Pānini provided that. This probably inspired computer scientists to some extent but only after they had discovered the BNF on their own. There was even a suggestion to rename BNF to "Pānini-Backus Form."

A Sample of ALGOL 60's Formal Grammar

This sample shows definitions that contain valid ranges of values. It shows how one entity is related to another, the order in which they should be interpreted, and the set of operations allowed on such entities. Such formalism is necessary for machines to understand programs in an unambiguous manner.

Just like assemblers, high-level languages needed their own tools to do the translation to machine language. These came to be called _compilers_. Compilers benefit directly from the formalization of language syntax. Grace Hopper, a mathematician working with Remington Rand, was among the first to address the programming problem. In 1951, she started working on a compiler for a language she had created and named A-0. When she presented her ideas at a conference the next year, she titled her talk "The Education of a Computer." It suggested that the computer was being taught to understand a high-level language and generate machine language code. Other scientists found it difficult to take Hopper seriously. The consensus was that computers could be tasked to do mathematical operations but not code translations. In fact, with the A-0, Hopper had written the world's first compiler. All the computer had to do was to run it. Though A-0 did not catch on, Hopper later played an important role in the creation of COBOL.

A FORTRAN compiler might have many variants. One variant would do the compilation for an IBM machine, another would do the same for a DEC machine. COBOL, which had been promoted by the US Department of Defense, gives us an early example of program portability and the power of high-level languages. With minimal changes, it was demonstrated in December 1960 that the same program ran on both UNIVAC II and RCA 501, two very different computers. In time, compilers evolved to support _cross-compilation_. What this meant was that the compiler running on an IBM machine could look at COBOL code and generate machine code for any chosen computer. High-level languages were indeed a class apart from assembly languages. They liberated the programmer from the messy details of hardware. The programmer could focus on the problem at hand. The idealized dream of Lady Ada had finally come to fruition with the birth of high-level languages.

Assemblers and Compilers

This example shows high-level C code and its equivalent in assembly and machine codes. It can be seen that high-level C code is easier to read and understand. The other two are more specific to the hardware architecture. Assemblers and compilers do the necessary translations to reconcile the human need for programming convenience and the machine need for simple design.
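As a textual sketch in the same spirit, the small C function below carries a hypothetical, acronym-style assembly rendering in its comments. Real instruction sets differ from machine to machine, which is precisely why assemblers and compilers are needed.

```c
#include <stdio.h>

int add(int a, int b)
{
    return a + b;       /* MOV R0, a   ; copy a into a register    */
                        /* ADD R0, b   ; add b to the register     */
                        /* RET         ; return with the sum in R0 */
}

int main(void)
{
    printf("%d\n", add(2, 3));   /* prints 5 */
    return 0;
}
```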

What had really happened was a slow transformation of boundaries and responsibilities. In the beginning, a problem was tied to the machine. The idea of a programmable computer separated the two. Then came the separation of machine language from programming language. Programming languages were initially tied to specific machines but these too evolved to become independent of hardware. So it was no longer about program independence but about independence of methods of programming. No doubt this independence came at a price. Engineers had to write assemblers and compilers, both of which required new programming skills and inventiveness. While these few engineers rolled up their sleeves and got down into the trenches, their difficult work was for the greater good. Once they had done their jobs, every other programmer benefited. Compilers and assemblers became essential tools in a programmer's kit. Engineering in many ways is first a creation of tools before the products themselves can be built.

Just as minicomputers made their foray into the industry, bringing with them a culture of open interfaces and third-party peripherals, IBM reinvented itself with a product so revolutionary that it kept the company in the lead for the rest of the decade. Released in 1964 as _System/360_, it was built out of solid-state electronics miniaturized using IBM's proprietary _Solid Logic Technology (SLT)_. Memories were based on magnetic tapes and ferrite cores. Punched cards and paper tapes had entered their last phase of life. These changes gave System/360 speed, performance, and reliability that customers desired. The real revolution was that it was not just one computer but a family of computers. The computers in the family were fully compatible with one another. Programs written for one machine could be executed on any other machine of the System/360 family. So a small company could start with a small computer and move to a powerful one as its processing needs grew.

While high-level languages would have solved the program portability problem elegantly without requiring compatible hardware, those were early days when compilers were not mature enough to deliver what they promised. FORTRAN was IBM's creation and there were statements in the language that were specific to IBM's Model 704. As high-level languages were still evolving rapidly, compilers had to keep pace. It certainly helped to have compatible computers and peripherals to ease the job of the programmer.

It had taken IBM years of effort and five billion dollars to develop System/360. It was a risky venture from the start, for it would force every customer to abandon IBM's current products and move to System/360. There was also no clear indication that the industry placed importance on compatibility. In the end, System/360 turned out to be a success that propelled IBM to a global name in computing. With large delays and increasing costs, it was not an overnight success. After its launch, it took two more years of computer engineering to make it a commercial success. IBM's CEO Thomas Watson Jr later commented,

In order for the System/360 to have a consistent personality, hundreds of programmers had to write millions of lines of computer code. Nobody had ever tackled that complex a programming job, and the engineers were under great pressure to get it done.... System/360 was the biggest, riskiest decision I ever made, and I agonized about it for weeks, but deep down I believed there was nothing IBM couldn't do.

System/360 was so good that it rewrote the design rules of computing. Because the design of System/360 was modular, parts of the system interfaced with one another in well-defined ways. It was therefore possible to connect a printer to any computer. Likewise, magnetic tapes for storage could easily be moved from one computer to another as the need arose. Users wrote programs to get their work done but there were some tasks so fundamental that it made sense to provide them in advance along with the hardware. Some of these were built into the hardware and went by the name _microcode_. Otherwise, most of it was compiled code. For example, some programs controlled the execution of other programs. Others simplified the interfacing between computers and peripherals, handling essential tasks such as reading from magnetic tapes or sending data to printers. If a program did something illegal, such as division by zero, the system caught this error. These specialized programs came to be collectively called system software; today they are more commonly known as the _Operating System (OS)_. The OS relieved users from the base details of programming. It sat between hardware and user programs. It provided a framework for program execution so that users could focus on tasks that concerned them directly.

Subsequent evolution of the operating system came via batch processing and time-sharing. Computers, being expensive, were rarely dedicated to a single user. Multiple users submitted their jobs to a single computer through a common queue. When the computer finished one job, it moved on to the next one in the queue. System software coordinated these actions of batch processing—managing the queue, scheduling each job in its turn, catching errors when they occurred, and even recovering from errors as Hamming had shown in 1948. Time-sharing was more complex than batch processing.

Time-sharing systems evolved in the 1960s to make computers more interactive. This came naturally in an era when focus was slowly shifting from centralized mainframes to leaner minicomputers that depended on peripherals to add value. But the first time-sharing system ran on an IBM 709 mainframe. John McCarthy of MIT first conceived the idea of many users sharing a single system, seemingly all at the same time. Taking inspiration from McCarthy's proposal, Fernando Corbató created the first version of such a system in 1961. It soon evolved into a full-fledged system under the name Compatible Time-Sharing System (CTSS).

With batch processing, the programmer submitted his job and could do nothing until the results were printed out. Errors in the code meant resubmission and recomputation from scratch. It was similar to a lecturer running through an entire set of slides before realizing that his students had not understood a word. Would it not be better to solicit feedback at every stage of the computation process? Time-sharing systems did this. They gave all users equal computer time just as a PCM/TDM system multiplexed speech samples from many users in the same transmission frame. Specialized programs called _schedulers_ were responsible for managing all active jobs. A user program might do a few calculations and then wait for the user to give some input. While the program waited for such an input, the scheduler picked up another active job and ran it. Even if programs were not written to be interactive, the scheduler ensured that no single program hogged the system. So if there were ten active programs in the system, all ten of them were given equal slices of time for their computations. Yes, this made each program ten times slower but the essential point was that all ten of them were now made interactive.
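As a sketch of this idea, the toy C program below deals out equal one-unit slices to every active job in turn until all are done. The job names and workloads are invented for illustration; a real scheduler would also handle waiting on input, priorities, and much else.

```c
#include <stdio.h>

/* A toy round-robin scheduler: every active job gets an equal
 * slice of time in its turn, so all jobs appear to make progress
 * simultaneously. */

typedef struct { const char *name; int work_left; } Job;

int main(void)
{
    Job jobs[] = { {"payroll", 3}, {"editor", 5}, {"compile", 4} };
    int n = 3, active = n, slice = 1;

    while (active > 0) {
        for (int i = 0; i < n; i++) {
            if (jobs[i].work_left <= 0) continue;   /* job finished  */
            jobs[i].work_left -= slice;             /* run one slice */
            printf("ran %s, %d units left\n",
                   jobs[i].name, jobs[i].work_left);
            if (jobs[i].work_left <= 0) active--;
        }
    }
    return 0;
}
```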

System/360 came with its own operating system named _OS/360_. It was not as much a success as IBM had hoped. FORTRAN and COBOL being popular, users started to write their own programs in these languages for System/360. Moreover, System/360 had been conceived in the culture of mainframes. Time-sharing was not its strength. DEC's PDP-10, on the other hand, had been designed as a time-sharing system. Time-sharing, a novel concept in computer design, turned out to be a weakness for IBM. Because of System/360's modular design, it was possible for enterprising start-ups to come up with peripherals that connected to System/360 computers. With focus shifting towards peripherals and user interactivity, IBM attempted to keep its interfaces secret. What happened next took IBM on a new path that can be seen only as progress.

IBM had always offered much of its software free along with the computers it sold. Together with IBM's practice of locking competitors out of interfacing with the System/360, this invited the Department of Justice to consider antitrust action. In order to pre-empt such an action, for the first time in its entire history, IBM unbundled its software from the hardware. This came in July 1969, the very month that man landed on the moon. Software, which for decades had played a secondary role in the business of computers, finally acquired commercial value of its own. In the end, IBM had not made a revolutionary decision out of foresight or vision of the future. It had simply reacted to the rising presence of software first launched by minicomputers. It had been motivated by the looming shadow of the government tightening its controls.

The very idea of software emerged only in the sixties. Even then, there was no consensus or precise definition of the term. For some, software was anything that wasn't hardware. For others, it was primarily system software. For some, it was all about programming software that included compilers and assemblers. For others, it was something that interfaced hardware and data. What was really clear was the growing importance of software. User requirements were becoming increasingly demanding and systems were therefore becoming increasingly complex. The transistor and the chip had taken hardware to another level of performance and sophistication. They gave license to higher expectations. There was a greater need for skilled programmers than ever before. Software needed to catch up.

The coming of high-level programming languages facilitated rapid development. Concerns that compilers traded away performance for programming simplicity were put aside when hardware afforded that luxury. The need of the hour was to simplify methods of programming. Through the sixties, more and more of a project's budget went into software. Hardware was perceived as rigid and software as flexible. Faulty hardware had to be either repaired or discarded. The essence of software was upgrade. Hardware provided the platform but it was software that exercised it to do many useful things. Unlike hardware, software was not subject to wear and tear. If software failed after a decade of continuous operation, it was really a fault in design or coding. Software failure was never a case of aging. Software indeed was appealing and businesses did not mind putting more of their resources into it.

The early business of selling software had many facets to it. For some it was about selling system software such as file management and report generation utilities. These attempted to fill voids left by the operating system supplied by the equipment vendor. The Mark IV file management system from Informatics came out in 1967 as packaged software that the industry could use. There were others who looked at user applications instead. The notion that applications could be considered software entered the programming consciousness quite late. For most purposes, applications were either developed in-house or supplied by the equipment vendor. Applications generally took the form of customized solutions and consulting services. When IBM teamed up with American Airlines to create a computerized airline reservation system named SABRE in the early sixties, it was following this model. IBM created a customized operating system for the purpose. There were some who even believed that every industry needed customized solutions from the level of hardware upwards. If airlines needed customized solutions, so did banking, retailing, or healthcare.

Companies that offered such consulting services and created customized programs realized that many of these could be reused across industries. It made sense to package these programs and sell them as readymade solutions. At times, these were offered as complete solutions that included operations and maintenance. They were also offered simply as software to run on the customer's hardware and operated by the customer's personnel. General Electric sold packages to banks. BBN, a company that would soon play a crucial role in the creation of ARPANET, sold packages to architects. Application software thus had its beginnings in the late sixties. Something as simple as payroll processing software had a huge market—there were hundreds of companies in existence and so long as they paid their employees, the software had many potential customers.

It might seem on the surface that software of the sixties enjoyed dramatic growth that came with maturity. One part of this was true, the other part was not. While programming languages had come some distance, programmers had no guidelines as to what made good programs. Software projects often extended beyond deadlines. Programs had bugs and were unreliable. There were no recognized practices to effectively test software. Projects that had taken months to reach completion churned out disappointing software. The software might contain a dozen interesting features but lack essentials that users really needed. These problems found expression at a 1968 conference organized in Germany by NATO. For the first time, the term _software engineering_ was used. Software had always been regarded rather lightly. If a civil engineer made a mistake, a bridge might collapse, endangering lives. If a programmer made a mistake, the dangers were not obvious. Software itself was intangible. It existed only in the arrangement of bits on magnetic tapes. As such, dangers were intangible too.

Software brought the flexibility to change things quickly. If something went wrong, one did not have to wait for replacement of parts or rewiring of patch cables. The engineer could make changes quickly, compile the program, and run it again. Perhaps it was this that prompted programmers to take a relaxed attitude towards programming. An enthusiastic programmer might make last-minute changes to solve a particular problem but she might unknowingly introduce new problems. Quality checks on software did not exist. Software engineering changed all that. The very word engineering came with its connotations of discipline. This discipline included plans, forecasts, reports, and reviews. Software was not just a piece of code but really a process that started at the design stage and continued right into actual deployment at customer premises. Documentation was part of the process. The need for rapid prototyping was no excuse to skip essential steps of this process. One such step was to gather clear and detailed requirements of what the customer wanted. It was no good creating a complex masterpiece when all that the customer needed was a simple solution suited to his limited budget. Software engineering became a major industry concern all through the 1970s.

Surprisingly, software engineering started with quite a different focus. In 1968, Edsger Dijkstra published a short paper titled "Go To Statement Considered Harmful." In many high-level languages "go to" was a statement that allowed the CPU to move from one section of the code to another. For programmers who had been initiated into programming through assembly languages, the use of "go to" was a natural consequence of the habitual use of "jump" statements in assembly languages. These jumps enabled one to implement decisions, branches, and exit conditions. The problem with "go to" was that it made code confusing. Later updates to the code invariably introduced bugs. Years later, computer engineers devised a term for such unstructured code: _spaghetti code_. Dijkstra himself wrote,

For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of "go to" statements in the programs they produce.... our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.

Dijkstra recommended alternatives including the "case" statement that preserved the correspondence between code and process. More importantly, the issue was much broader than being pedantic about a single statement. The message was to write structured programs. Elements of good programming practices evolved hand in hand with evolution of languages. For example, initial versions of FORTRAN allowed only six characters for variable names. To add two numbers, one might usually write _C_ = _A_ + _B_. COBOL allowed long descriptive names. So one could write _ADD INTEREST TO PRINCIPAL GIVING TOTAL_. Such descriptive syntax made a lot of sense since programs written by one programmer could easily be understood by another. The chance of making mistakes was reduced. Despite such facilities, many programmers continued to use cryptic symbols and acronyms that only they understood.

Spaghetti Code and Structured Code

(a) Spaghetti code contains frequent program jumps via "go to" statements. Program analysis and maintenance becomes difficult. (b) Structured code brings order and neatness to a program. Code branches are neatly indented, which brings a visual structure to program flow.
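As a small illustration in the spirit of this figure, the two C functions below compute the same sum of the numbers 0 to 4. The first hops about through "go to" statements; the second expresses the same flow structurally.

```c
#include <stdio.h>

/* (a) Spaghetti style: control hops around through "go to"s. */
void sum_spaghetti(void)
{
    int i = 0, total = 0;
loop:
    if (i >= 5) goto done;
    total += i;
    i++;
    goto loop;
done:
    printf("%d\n", total);
}

/* (b) Structured style: the same computation, with the flow of
 * the text matching the flow of execution. */
void sum_structured(void)
{
    int total = 0;
    for (int i = 0; i < 5; i++)
        total += i;
    printf("%d\n", total);
}

int main(void) { sum_spaghetti(); sum_structured(); return 0; }
```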

Many people in the industry did not even acknowledge that there was a software crisis. The failure of IBM's OS/360 was an example but not everyone was convinced that it was a software failure. Some saw the true causes as organizational and systemic. If ever there was a crisis, as noted at the conference of 1968, the clarion call for change was raised not by engineers and project managers. The call came from researchers and computer scientists. Dijkstra and Naur were both computer scientists. Their approach to structured programming was theoretical and mathematical. Structured programming meant building up complex program flow by combining a few well-understood sequential instructions. It meant the use of abstractions in design and analysis. Indeed, it was important to separate software design from the writing of code. Also, rather than testing software to reveal bugs, the proper approach would be to verify the correctness of a program by placing checks along the way. Formal proofs of program correctness were advocated. Those in industry did not care much for what the academicians thought. Widespread use of structured programming took a lot longer than expected.

One immediate influence of structured programming was seen in the creation of Pascal, released in 1970. The language was named in honour of Blaise Pascal, who had invented an early adding machine in the seventeenth century. Understandably, the language Pascal was created within academia as a tool for teaching programming to students. Its inventor was Niklaus Wirth at the Swiss Federal Institute of Technology in Zurich. Pascal had capabilities normally regarded as unnecessary by industry. It enabled automatic checking of program correctness through the use of _assertions_. It had the "case" statement for structured branching. Variables could be created and destroyed on the fly during program execution. The language also had a small footprint that made it a suitable match for minicomputers. Its designers also developed debugging and editing tools for the language. Pascal's success was short-lived. It had one major fault in that there was no way to define arrays of variable size; an array's length was fixed as part of its type. Wirth attempted to better Pascal by introducing a new language named Modula. By then, another new language had arrived quietly on the scene and taken the world by storm.

Faced with the deficiencies of existing operating systems, Ken Thompson of Bell Labs got hold of a PDP-7 minicomputer and set out to create a more amenable environment for programmers. His work caught the interest of others at Bell Labs, particularly Dennis Ritchie. From their efforts was born a new operating system named _UNIX_. The earliest version of UNIX was up and running by 1970. From the beginning, UNIX was designed as a time-sharing system. The design was minimal and elegant. The desire to conserve memory led to a better design. Ritchie later commented that the design philosophy was "salvation through suffering."

UNIX introduced a hierarchical file system of files, folders, sub-folders, and links. Both files and devices were handled in similar ways from the perspective of the programmer. A layer of abstraction called the _shell_ was introduced to enable users to interact with the operating system. Rather than build large complex system utilities, the approach was to build many small utilities that cooperated through standard interfaces. Each utility did a single specific task and did it very well. So it was possible to chain together many simple commands to create complex behaviour. In the days before the coming of the chip, engineers and system designers had sweated over the hardware interconnection problem. In the realm of software, interfacing different components was a breeze. At the most, they had to agree on low-level semantics such as character encoding. Otherwise, it was just bits coming in and out of interfaces. UNIX designers knew this and therefore focused on the design of interfaces before delving into the components themselves. This made UNIX portable and powerful. It was possible for users to replace components easily if ever they wanted to do so. As successful as UNIX was, Ritchie gave its predecessors due credit,

The success of UNIX lies not so much in new inventions but rather in the full exploitation of a carefully selected set of fertile ideas, and especially in showing that they can be keys to the implementation of a small yet powerful operating system.

Modularity and Interfacing in UNIX

Given a list of words, it is required to get a sorted list of unique words that start with _p_. One can do this easily in UNIX by using in tandem many small and efficient programs, rather than custom-building a large complex program; one might type something like "grep ^p words | sort | uniq". In this example, _grep_ does filtering, _sort_ does alphabetical sorting, and _uniq_ removes duplicates.

Initial versions of UNIX were written in assembly. Dennis Ritchie took inspiration from an existing language named _B_ and started working on a language that came to be called _C_. In 1973, most of UNIX was rewritten in C, except for those parts that needed efficiency or direct hardware access. Two things made C successful. First, it was a high-level language with powerful syntax that appealed to programmers, and it also gave access to low-level operations. It was possible to access memory addresses directly and change data at the level of bits. Programmers loved it. Secondly, AT&T was a telephone company. An operating system such as UNIX was not directly relevant to its business, and antitrust regulations meant that it couldn't sell UNIX to anyone. So from the start, UNIX was available to the industry for a nominal fee. In universities, UNIX found the perfect environment. Students could use the published interfaces and create new tools. Best of all, UNIX code was shared so that other programmers could point out mistakes and contribute to the development.
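As a brief sketch of what such low-level access looks like in C, the example below sets and clears individual bits through a pointer. The "device register" here is just an ordinary variable standing in for a real one.

```c
#include <stdio.h>

int main(void)
{
    unsigned char reg = 0x00;   /* stands in for a device register */
    unsigned char *p = &reg;    /* a pointer holding its address   */

    *p |= 0x04;                 /* set bit 2 through the pointer   */
    *p &= ~0x01;                /* clear bit 0                     */

    printf("0x%02X\n", reg);    /* prints 0x04 */
    return 0;
}
```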

Since UNIX was a success, programmers were tempted to create new programs in C for UNIX and other operating systems. Even Pascal programmers started making the switch. Even today, for reasons of performance and compactness, C is the preferred language in many small microprocessor-based systems. Examples of such systems are data modems, washing machines, handheld gaming devices, and digital cameras. For large projects, C is not the ideal candidate. The growing popularity of C through the seventies turned out to be a step backward for structured programming. While C offered facilities for structured programming, it did not enforce them. It was possible to organize data into neat structures but the language gave the facility to access them in an unstructured manner. The language did not care much about data types. So a programmer could take a real number and assign it to an integer. Programmers saw these as useful and even powerful features. The same features made C a difficult language to master. Those who had been voicing the need for structured programming as a solution for the software crisis were disillusioned by the path the industry had taken. In a 1978 lecture, R. W. Floyd, who had earlier introduced assertions into programming, commented that after ten years of little progress against the software crisis, "software depression" was a more appropriate term.

If ever industry experts talked about a software crisis, they had a different notion of it. How was software to be marketed independently of hardware? How were companies to train their marketing departments to sell software? Another important concern was protecting software. Copying and replicating hardware was not easy, particularly for mainframes. Copying software was trivial. The customer might purchase a single license and then distribute multiple copies in an undisclosed manner. The situation got worse if source code (as opposed to compiled machine code) got leaked. _Source code_ refers to code originally written by the programmer, usually in a high-level language. Access to source code implies that a program can be easily modified, tweaked, and improved. Those who did this without the permission of the original authors might even market the result as their own. In the US, the first software codes were registered for copyright protection in 1964, but it wasn't until 1980 that software copyright was explicitly written into law. Meanwhile, the original goal of software engineering was reinterpreted. It became synonymous with management tools and practices. It was about process maturity with necessary checks and balances. Essentially, software engineering was an earnest attempt by the industry to elevate programming from the level of ad hoc craftsmanship to process-oriented engineering.

A broad view of early software evolution informs us that in the sixties the focus was on small programs with simple control flows. As software complexity grew, so did the problems. New languages were born in an effort to make programming easier. Each new language benefited from the experiences of those gone before. Standardization of languages started in this era. From the late sixties through the seventies, as languages improved, there arose an equal focus on algorithms. In other words, methods of programming became important. Researchers looked for ways to minimize the use of memory or solve problems faster. This was symbolically important for computer engineering. The realization was that hardware alone did not determine performance. Even if the transistor and the chip had brought gains in hardware performance, system performance was finally determined by the software running on it. Hardware provided the baseline but it was up to software to make use of it intelligently. A badly written program might do a lot of work but not get any work done.

It was C. A. R. Hoare, a British computer scientist, who first published one of the famous algorithms that started the trend towards more efficient software. The task at hand was automatic language translation from Russian to English. An entire Russian-English dictionary was stored alphabetically on magnetic tape. The computer looked up the dictionary and did the translation word by word. Since access to magnetic tape is sequential, the computer had to move forward and backward along the tape frequently. Translating a sentence of even a few words took a long time. It occurred to Hoare that if the words in the sentence were first sorted in alphabetical order, the computer would have to make only a single pass through the tape. This prompted him to think of efficient methods of sorting, which led to the invention of the _Quicksort_ algorithm.

Quicksort is a classic example of the recursive paradigm in programming. Programs are generally implemented as code blocks, each a set of instructions that work together towards a specific purpose. When a code block is defined explicitly, it facilitates code reuse. The programmer can call this code from many places in the program without repeating herself. This saves memory, reduces programming effort, improves code maintenance, and supports modular design. Such code blocks have been described by various names—subroutines, procedures, functions, and methods. Calculation of logarithms could be one subroutine. Computation of the parity bits of an RS code could be another. Subroutines are the building blocks of high-level functionality.

For Hoare, Quicksort was a subroutine that took words as inputs and gave a sorted list as output. Quicksort operated in a recursive fashion. It called itself repeatedly until it reached a terminating point at which it could do the actual sorting. Hoare got the idea of a recursive implementation for Quicksort when he discovered that ALGOL 60 supported recursive programming. The underlying idea of all recursive programming is to break up a complex problem into another problem of exactly the same nature but of a smaller scale. This reduction in scale leads to a reduction in complexity. The algorithm then proceeds to higher levels of complexity and solves them based on the solutions obtained at lower levels. While a particular implementation may or may not be recursive, the Quicksort algorithm itself is classified among the _divide-and-conquer algorithms_.
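To make the recursion concrete, here is a minimal sketch of Quicksort in Java. It is an illustrative reconstruction, not Hoare's published ALGOL 60 code; the pivot choice and the sample numbers are assumptions made for the example.

```java
// A minimal Quicksort sketch: partition around a pivot, then recursively
// sort the two smaller halves—the divide-and-conquer idea in miniature.
public class Quicksort {

    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                  // terminating point: 0 or 1 element
        int pivot = a[hi];                     // pick the last element as the pivot
        int i = lo;
        for (int j = lo; j < hi; j++) {        // move everything smaller than the
            if (a[j] < pivot) {                // pivot to the left of position i
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t; // put the pivot in its final place
        quicksort(a, lo, i - 1);               // same problem, smaller scale
        quicksort(a, i + 1, hi);
    }

    public static void main(String[] args) {
        int[] values = {42, 7, 19, 3, 25};
        quicksort(values, 0, values.length - 1);
        System.out.println(java.util.Arrays.toString(values)); // [3, 7, 19, 25, 42]
    }
}
```

Each recursive call works on a strictly smaller slice of the array, which is exactly the reduction in scale described above.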

It was in 1961 that Hoare's Quicksort first appeared in print. Since then many variations of Quicksort and newer algorithms have been invented. Although algorithms had existed even before, Quicksort marked the start of an explosion of interest in them. What is today a classic of computer literature was published in three volumes by Donald Knuth, _The Art of Computer Programming_. The last of these appeared in 1973 and focused exclusively on sorting and searching. The fact that Knuth had qualified programming as art rather than science or engineering gave algorithms an attractive aura. More recently, Knuth has been compiling a fourth volume. Robert Sedgewick published his Stanford PhD thesis on Quicksort in 1975. Though much of this interest in algorithms was academic, the industry picked up the techniques quickly. Unlike innovations in hardware, software algorithms did not come with problems of manufacturing or fabrication. Barriers between research and application were lower. It was possible for programmers to adopt and implement new algorithms without a long gestation period. If ever there was a delay between software research and industrial application, it arose from an increased awareness of good software engineering practices.

With faster hardware, time-sharing operating systems, powerful programming languages, better compilers, and efficient algorithms, almost all the necessary elements were in place for computers to evolve in a major way. The move from mainframes to minicomputers had provided enough traction for the market to evolve and expand. When many basic things are ready, the possibilities of combining them in multiple ways are exponential. All that was needed now was for innovators and entrepreneurs to realize this potential.



**It was in** January 1975 that a magazine named _Popular Electronics_ featured on its cover a new form of minicomputer named Altair 8800. Unlike minicomputers of the day that normally cost many thousands of dollars, the Altair seemed to be value for money. The magazine claimed that it could be had for less than four hundred dollars. It was not a packaged ready-to-use computer. It was a do-it-yourself kit. Customers were required to solder and assemble it from supplied parts. As such, it appealed to hobbyists, who loved nothing more than putting things together. Such hobbyists and the hands-on culture they represented had originated in the days of radio receivers at the turn of the century. The culture had later evolved through the years of vacuum tubes and solid-state electronics. For the greater part of the computer revolution, the hobbyists had been left standing by the sidelines of change. Computers were expensive, beyond their modest wallets. Altair 8800 offered the hobbyists a new gadget that was more than affordable. It was attractive.

Only four years earlier Intel had announced the world's first microprocessor, the 4004. It was not born as a conscious innovation to take the industry in a new direction. Rather, when the Japanese company Busicom asked Intel to build twelve chips for their upcoming electronic calculator, engineers at Intel felt that there could be a better way of doing this. Building twelve customized integrated circuits involved massive design effort. When manufacturing and fabrication were getting cheaper and faster, design would become a major bottleneck. To do customized design for every customer who came through the door was going to be expensive. It was at this point that Intel engineers, led by Ted Hoff, conceived of a general-purpose computing machine in chip form. Such a chip could do any sort of computation, including those that Busicom wanted. Calculations, instead of being wired into hardware logic as Busicom had proposed, would be done by software running on a generic chip. With the coming of the microprocessor, a fundamental change had happened in the industry—the slow and sure shift from hardware to software. Back in the 1940s, computers had become programmable. Now the concept was extended to the micro level, the level of silicon.

Though initially called a minicomputer, Altair 8800 was really a microcomputer because its computational engine was Intel's more powerful 8080 microprocessor, released in 1974. It used state-of-the-art _Large-Scale Integration (LSI)_, which meant that a stamp-sized chip could pack tens of thousands of transistors. It was an 8-bit MOS processor that could address 64 kB of memory. The Altair 8800 only looked like a minicomputer—a box with lots of panel switches and lights. It had no monitor or keyboard. With only 256 bytes of memory, there was not much that could be done with it. Programming the Altair was a messy affair, a toggling of multiple switches by hand. If the program worked, the lights would blink in some determined fashion. But with the Intel 8080 inside it and an ability to add peripherals easily, it was a tiger waiting to be let loose. Those who saw this potential immediately got in touch with the inventors of the Altair.

Altair 8800 was the brainchild of H. Edward Roberts of Micro Instrumentation and Telemetry Systems (MITS). Roberts had recently been put out of the calculator business because dropping costs and intense competition had eaten away profit margins. Altair 8800 was his response. He also knew that to build the business, the company had to build on the Altair's interfaces so that hobbyists could add peripherals. Meanwhile, he received a letter dated January 2, 1975, claiming that there was a BASIC language interpreter already available for the 8080 microprocessor. Would MITS be interested? The letter was signed by Paul G. Allen.

In fact, at that point neither Allen nor his friend Bill Gates had written a single line of BASIC code for the 8080. They did not even have the instruction manual of the processor. Still, they were confident that a BASIC interpreter could be developed quickly. BASIC was a simple high-level language. It was easy to learn and appealed to novice programmers. Its simplicity suited the small memory footprint of the Altair. While a compiler parsed an entire high-level program and generated machine code in one go, an interpreter was more flexible. It translated and executed statements on the fly. Interpreters were more suitable for the Altair.
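To give a flavour of the difference, here is a hypothetical toy interpreter in Java. It handles nothing but sums of whole numbers, such as 2 + 2—a far cry from a real BASIC interpreter—but it shows the on-the-fly principle: each line is read, translated, and executed immediately, with no separate compilation step.

```java
import java.util.Scanner;

// A toy flavour of interpretation: read a statement, translate it, and
// execute it at once, instead of compiling the whole program first.
public class TinyInterpreter {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();   // e.g. "2 + 2"
            if (line.isEmpty()) continue;
            String[] parts = line.split("\\+");   // split on the plus signs
            int sum = 0;
            for (String p : parts) {
                sum += Integer.parseInt(p.trim());
            }
            System.out.println(sum);              // result printed immediately
        }
    }
}
```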

Remarkably, Bill Gates and Paul Allen developed the BASIC interpreter in just six weeks. They did this without access to either the Altair or the 8080 microprocessor. All they needed was the instruction set of the 8080 and some details of interfacing the Altair to its peripherals. It was purely a software effort. They did all their development on a PDP-10 running the TOPS-10 time-sharing operating system. They wrote simulators and debuggers to ensure that the program worked. Best of all, the program was handcrafted almost byte by byte. The entire interpreter occupied only 3,200 bytes, spanning 2,000 lines of code. The acid test came when Paul Allen flew to the MITS office at Albuquerque to give a demonstration. The intellectual property that he carried with him was on a paper tape. An Altair machine was connected to a Teletype machine with monitor, keyboard, and paper tape reader. The programmed paper tape was fed into the Teletype and loaded into the Altair's expanded memory of 7 kB. Allen typed 2 + 2 and back came the reply: 4. A number of things could have gone wrong in this demonstration but the program worked at the very first attempt. This was immediately followed by loading a Lunar Lander program written in BASIC. This too worked smoothly. Thus was born the world's first microcomputer that hobbyists could comfortably purchase, own, and use in their homes. Following in the tradition of log books, pencils, and electronic calculators, computers had become personal devices. Soon microcomputers were called _Personal Computers (PC)_.

Gates and Allen formed a partnership that they named Micro-Soft, a name derived from the microprocessor and software, both enablers of a new shift in computing. In July, they signed a contract with MITS. The BASIC interpreter could be shipped either as part of the Altair hardware or sold separately. It could also be sold to other manufacturers. Roberts had realized that the Altair's hardware alone was not going to bring growth. It needed software so that the computers could do something useful. Surprisingly, by the end of 1975, software was not selling as well as expected. Altair 8800 was no better than a fancy toy that did next to nothing without its software. Despite this limitation, Altair 8800 had sold well while the BASIC interpreter had not. It turned out that hobbyists, raised in a culture of hippies and hackers, copied software freely. Until very recently, even IBM had supplied software without cost. Commercial value was placed on hardware. That software could have intrinsic value was not perceived. It took a while for perceptions to change and for users to pay for software. In the end, Microsoft did earn its revenues. By April 1976, the BASIC interpreter had crossed a million dollars in sales and clocked more than three hundred thousand users worldwide.

What Roberts of MITS didn't realize was that building a microcomputer was not all that difficult. The microprocessor had simplified computer design. Competitors arrived quickly in the marketplace. All that Microsoft needed to do was to adapt its software to every new personal computer. Moreover, Gary Kildall of Digital Research had written an operating system named CP/M that became the de facto industry standard. This simplified Microsoft's work. Their engineers could now port the BASIC interpreter to CP/M just once and use it for multiple vendors with minimal changes. As new players entered, the Altair's market share declined. MITS never fully recovered, but Microsoft prospered from software that suddenly appeared to be universal. Software had finally triumphed over hardware, not just in university laboratories but in the commercial space. The final nail in the coffin for MITS came the next year when the famous "1977 Trinity" entered the market.

The trinity refers to three personal computers that made an impact by offering complete solutions rather than assembly kits for hobbyists. The vast majority of the market is not made of hobbyists. It is made of people who want to get things done without getting their hands dirty or grappling with the complexities of technology. Two of these computers were the Commodore PET and Tandy's TRS-80. Both came with integrated keyboards. The Commodore PET also had a monitor and a cassette recorder that functioned as permanent storage. The TRS-80 could be hooked up to television sets and audio cassette recorders. It cost only $399. The more heavily priced Altair 8800b, the successor of the first Altair, was no match for this new competition. The third of the trinity was priced at $1,298, but by design it was a lot more appealing to users. This was the Apple II.

The first Apple computer had been created in 1975 in a span of a few weeks based on the 8-bit 6502 microprocessor from MOS Technology. It was no more than a kit for hobbyists and only two hundred of these were sold. The engineers who had created Apple, Steve Jobs and Stephen Wozniak, saw the market potential for computers that simplified user operations. To appeal to everyday users, a personal computer had to be packaged properly. The user would be required to do no more than plug it into a power outlet and switch it on to get going. Essentially, it had to be as simple as a television set. It is from such a vision that the Apple II was created.

The Apple II did not come with a monitor but one could be connected to it easily. Unlike the Commodore PET, which was a closed system, the Apple II was an open system. Peripherals could be added easily. Even in design, the Apple II had user appeal while the Commodore PET looked like a glorified calculator. As for the TRS-80, its main appeal was for playing video games. Early versions of the Apple II interfaced to cassette tapes but floppy disks were already available in the market. Even the Altair machines had started using 8-inch floppy disks, first invented at IBM. Apple II designers selected the five-and-a-quarter-inch floppy disk but felt that the controlling circuitry was too complex. Wozniak redesigned it entirely with minimal components, only five chips doing the job of the fifty used previously. This is the kind of simplicity in design, inside and out, that defined Apple's approach to computing. Nothing should be any more complex than it needs to be. The next big thing that happened took the industry by surprise.

Daniel Bricklin, a graduate student at Harvard Business School, noticed the great lengths to which professors and students went to perform complex financial calculations. It occurred to him that spreadsheet software running on a computer could simplify such calculations. Similar software had existed on mainframes, but to have it on personal computers would give power to almost anyone who wanted to manage his finances. It was Apple's fortune that Bricklin had access to an Apple II computer back in 1979. So the entire software, eventually named VisiCalc (Visible Calculator), was created on the Apple II and ran on the Apple II. Using only 25 kB of memory, it was sleek and fast. In days not too long past, users had submitted their formulas to mainframes and minicomputers. Even with the interactivity of time-shared systems, they had to wait a few minutes to get the results back, and longer if the system was overloaded. These system bottlenecks did not plague VisiCalc running on the Apple II, a standalone system. The user had a dedicated machine of her own. Results came back almost instantly. The level of interactivity was far superior. Financial consultants could change formulas, look at predictions, and make corrections in a matter of minutes.

VisiCalc directly contributed to the sales of the Apple II. In other words, people were buying a personal computer on the strength of the software that ran on it. Even for Microsoft, the success of VisiCalc was an eye-opener. Microsoft BASIC was a programming language and interpreter. It was excellent for programmers but alienated non-programmers. VisiCalc was the first of what came to be called _killer applications_. Video games were applications too, but they were not then considered part of mainstream software. What VisiCalc did was to move software from the level of programmers to the level of common users. A dividend discount valuation model would have taken twenty hours of programming with BASIC, but with VisiCalc even a non-programmer could construct the model within fifteen minutes. The success of the Apple II due to VisiCalc is perhaps overrated: only 25,000 of the 130,000 computers bought before September 1980 were bought for VisiCalc. Or is this perhaps a sign of software piracy? After all, floppy disks on the Apple II had simplified the task of copying and distributing programs.

All three of the "1977 Trinity" had one thing in common: Microsoft BASIC. Just as the Altairs had lost the technological race, other personal computers too had their short runs of advantage before they succumbed to market forces. Microsoft survived these turbulent times. In his memoir, Paul Allen noted that "Machines came and went; good software lived on." While Microsoft had provided programmers with a platform for development, the increasing user base of the Apple II was a worrying sign. There were lots of wonderful CP/M-based applications that were not available to Apple II users. Microsoft was keen to tap into this market. It was at this point that Bill Gates reluctantly agreed to get into hardware. A new card named _SoftCard_ was launched by Microsoft in 1980. It could plug directly into an Apple II system. It carried the CP/M OS and with it the support of many popular applications. Allen commented,

In the old world, everyone from IBM to MITS had bundled software as a throw-in with the machine. Now we were bundling a cheaply made piece of hardware to help us sell BASIC and our expensive suite of software. The SoftCard was the razor; our languages were the blades.

The SoftCard gave Microsoft a revenue of $8 million in 1981 alone and maintained strong performance for two more years. This was but a short digression from the company's focus on software. The very year that the SoftCard was launched, a sleeping giant was waking up, except that it had only pretended to be sleeping. IBM had been following market dynamics for some time, but now its management was committed. The market of the future was clear. IBM would enter the personal computer market without delay. Once the decision was taken, IBM moved with surprising speed. The old ways of development were kept aside for the mainframes. New, leaner methods with shorter development cycles were adopted for the personal computer. By August 1981, the IBM Personal Computer (PC) was launched. It came with IBM's brand name and reputation. Programmers started developing software compatible with the IBM PC. This trend spread so quickly that within a few years, any personal computer not compatible with the IBM architecture was forced out of business. The only one that survived was Apple, whose stylish design won over many hardcore fans.

Many things contributed to the success of the IBM PC. To begin with, IBM went against its long-standing policy of vertical integration. Instead of fabricating its own microprocessor, it adopted Intel's new 16-bit 8088. The new microprocessor was a force in itself. The 8080 had been an 8-bit processor. Expansion to a 16-bit architecture meant that the 8088 could address larger memories and do much more in a single instruction. Simply put, advancing MOS technology had enabled some of the power of old mainframes to be consolidated into a single chip. If there is a simple rule for innovation, it is to use the latest in technology for new designs. DEC, a pioneer in minicomputers, entered the personal computer market quite late. When it launched its Rainbow 100 in 1982, it used a dying 8-bit technology running the CP/M OS. DEC never convincingly got into the personal computer market. IBM's early adoption of 16-bit technology was a calculated decision. The problem was that there was no operating system for 16-bit microprocessors.

IBM approached Microsoft to not only supply a 16-bit version of its BASIC but also develop a new 16-bit OS. Building an OS was no trivial task but Microsoft knew that this was a golden opportunity. If they managed to pull this off, the CP/M OS would be out of the door. In fact, Gary Kildall had passed over the opportunity to upgrade CP/M for the Intel 8088. Fortunately for Microsoft, Tim Patterson of Seattle Computer Products (SCP) had already written a basic 16-bit operating system modelled on CP/M, and he was confident that with some effort it could serve the new Intel microprocessor. Microsoft quickly negotiated a licensing deal with SCP, which was later changed to an outright purchase. The operating system was initially named _Quick and Dirty Operating System (QDOS)_, which later evolved into the well-known _Microsoft Disk Operating System (MS-DOS)_. Most importantly, Microsoft retained the rights to market it to manufacturers other than IBM. From its CP/M experience, Microsoft had realized that it was important to control operating systems, price them low, and steer the market towards a single universal standard.

BASIC, as necessary as it had been in the making of personal computers, was a great step backward for structured programming. On the flip side, any structured program would have been too bloated to fit into the small memories of the Altair 8800. But by the early eighties, the market was changing once more. Hardware was continuously becoming better, faster, and cheaper. Personal computers were closing the gap with minicomputers. Applications were many but they did not tap the full potential of hardware capabilities. Technology had become available for a new form of human-computer interaction.

For decades, human-computer interaction had been limited to text. Teletype machines had a keyboard for textual input and a printer for textual output; later terminals displayed text on a CRT. The idea of pointing at a display screen and using graphics can be traced to early US defence systems of the fifties. Operators used "light guns" to locate points on CRT displays that tracked moving targets. Yet such pointing devices and graphical displays took a long time to enter computers as primary forms of interaction. The rationale was quite simple. Computers were meant to compute. When processing power was at a premium, it would be luxurious and even wasteful to expend that power on displays. The focus of computer design right up to the sixties was on functionality and not user interaction. Users were expected to adapt to computers. Only a few users managed to live up to these expectations. The vast majority found computers intimidating. Then came the famous demonstration of 1968, often called "the mother of all demos."

Douglas Engelbart, then working at the Stanford Research Institute, gave a presentation unlike anything done before. Television cameras focused on his face, the movement of his hands, and a display screen. The audience was treated to a multimedia presentation that involved a mix of text, graphics, and live video. Different displays faded in and out on the same CRT screen. When Engelbart moved a small wired box, a pointed arrow moved on the display screen. With this arrow, he could draw attention to specific parts of the display. Engelbart had applied for a patent on the box-device only the previous year, but the idea had germinated as early as 1964. The patent application bore the title "X-Y Position Indicator for a Display System." Today we call such a device the _mouse_, but Engelbart's idea was more about the software than the hardware. The idea of pointing to something brought everyday experience into the world of computers. It was all about interactivity and user experience. Later Engelbart commented that he had been inspired by the planimeter, the mechanical integrator that had been so essential to early analogue computers.

Engelbart's presentation was an inspiration and a wake-up call for Xerox, whose main business was in photocopiers. If users were going to update and share electronic documents, it was in Xerox's business interest to master this new technology. The Palo Alto Research Center (PARC) was established in 1970. The dedicated research that Xerox PARC pursued through that decade laid the foundations of the modern _Graphical User Interface (GUI)_. The shift in emphasis was clear. Sure, computers were meant to compute, but they were also meant to simplify their own interactions with humans. Interfaces were to be aligned to human experiences and behaviour. In the real world, we use our fingers to point. We use our arms to grasp, pick, and place objects. We drag things across the room. We drop letters into mailboxes and waste into trash cans. Life is experiential and full of actions. Words are only descriptions of that experience. So a world of graphics that enabled actions would be far more comfortable than a world of text.

Tiled Windows and Overlapping Windows

(a) Early commercial releases of GUIs implemented only tiling of windows. (b) Soon it was realized that overlapping windows brought huge benefits. Each window could potentially occupy the entire display, while inactive windows could be made to recede into the background.

The problem was that BASIC was too simple a language for GUI programming. Even the more powerful C language wasn't an ideal candidate. C was closer to the hardware than to user interaction. For GUI, designers had to think at a higher level that related to user experiences—objects, actions, object interfaces, behaviours, relationships, and hierarchies. From such a need arose the notion of _Object-Oriented Programming (OOP)_. Alan Kay at PARC coined the term and designed Smalltalk as an object-oriented language as well as a programming environment. To hide details and enable design at a higher level of abstraction was the essence of OOP. When details were hidden, only interfaces mattered, only interactions between objects mattered. Objects handled the details internally while external interactions determined system behaviour. Simple objects could be assembled to create complex systems. As software projects grew in size and complexity, it was better to design in languages that automatically took care of this inevitable complexity. Programmers would then be freed from the messy but necessary details. For example, C programmers have to take care of memory management themselves, whereas some object-oriented languages do this automatically.

Kay used the power of Smalltalk to investigate many aspects of graphics—display resolution, font rendition, animated clips, painting in pixels. He had been inspired by abstract algebra, in which simple algebraic relations could have myriad applications, and by cell biology, in which simple mechanisms lead to complex life-sustaining processes. Before arriving at PARC, Kay had come across two sources that influenced his thinking. Back in 1963, Ivan Sutherland had invented Sketchpad at MIT. If we were to identify the big bang of computer graphics, this would be it. Users could draw line graphics using a "light pen." Sketchpad was the start of vector graphics, in which lines, curves, and shapes were the basic elements of drawing rather than pixels. From here, system simulations could become visual. The IBM 2250, released as part of the System/360 family, was specifically meant for vector graphics. The second idea that came Kay's way was SIMULA, a language that two Norwegian researchers had invented in the sixties. Existing languages had simply been deficient for purposes of simulation. SIMULA was used to model and simulate systems. It was from SIMULA that some of the concepts of OOP were born. Kay saw that beyond surface appearances and definitions, both Sketchpad and SIMULA shared structural and philosophical similarities. These inspired Kay in the creation of Smalltalk. PARC management understood little of Kay's ideas on programming. They were more concerned with industry trends, to which Kay famously retorted,

Look. The best way to predict the future is to invent it. Don't worry about what all those other people might do, this is the century in which almost any clear vision can be made!

In 1973, PARC internally unveiled an experimental computer that supported a GUI via Smalltalk. The computer was named the Alto. Graphics on the Alto were bitmapped; that is, displayed pixel by pixel. The basic elements of any GUI familiar to us today were available on the Alto—application windows that could overlap one another, menus, icons, a file manager, a bitmap graphics editor, and even a document editor named Bravo. Bravo introduced the notion of _What-You-See-Is-What-You-Get (WYSIWYG)_. In other words, documents appeared on the display exactly as they would be printed on paper. Much of this innovative research remained within the company. The first commercial product appeared on the market only in 1981 as the Xerox Star 8010 Document Processor. At $17,000, it was hardly a product for the average computer enthusiast. But the research done at PARC directly set the tone for a revolution in personal computing.

Both Apple and Microsoft saw the potential of the GUI and the user interactivity that came with it. They copied some of PARC's ideas and improved on them. Apple's offering came first, in 1983, as the Lisa computer. It was followed a year later by the more successful and cheaper Macintosh. Microsoft's own GUI, named Windows, did not appear till 1985, though the company did release a word processor named Word in 1983. Understandably, Word ran on MS-DOS with a clearly textual flavour. Since Microsoft was a third-party developer for Apple, Word was available for Apple's products as well. Later versions of Word were clearly better since they were built over the Windows GUI. Early Windows, which had been handicapped by the need to run on MS-DOS, evolved to become a full-fledged OS. Moreover, Microsoft understood that if applications were the key to selling Windows, it ought to publish its interfaces so that third-party vendors could write applications to run on Windows. Such published interfaces are generally termed _Application Programming Interfaces (API)_.

Barely a decade earlier, Microsoft had scurried to port its BASIC interpreter to every 8-bit processor system on the market. With the consolidation of hardware platforms, the industry had moved one level up the value chain. Now applications relied on system APIs that were feature-rich for user interaction and minimal for hardware configuration. Through the 1980s, shrink-wrapped software became a phenomenon. Consumers could walk into a store and buy software to run on whatever compatible hardware they already owned. Most software vendors released their products for Microsoft Windows. With this alone, they could reach most of the market. A few others tapped into Apple's Macintosh or UNIX-based platforms. While many user applications were thus built using system APIs, some vendors built programming platforms and offered them as _middleware_. Such middleware eased GUI programming further.

With Word, Microsoft finally entered the application software business. It had diversified from its origins as a supplier of language interpreters and operating systems. In the years ahead, Microsoft Office, comprising word processing, spreadsheets, presentations, desktop publishing, and database management, became the most popular suite of application software in the world. It displaced and destroyed early leading applications including WordPerfect and Lotus 1-2-3. Neither Windows nor Microsoft Office could have achieved this success had their programmers not migrated to OOP.

Among the more successful of the OOP languages was C++, an offspring of C and SIMULA. Invented at Bell Labs by Bjarne Stroustrup, C++ evolved as a natural response to the increasing complexity of software projects. C was a good programming language but did not help in the design of software. C++ combined the powerful syntax of C with the object-oriented approach of SIMULA. Critical to its success was its evolution from C, which had already established a steady following. Nonetheless, many software engineers who had grown up with C found it difficult to think in C++. It dawned on them that a good programmer was not necessarily a good designer. Using C++ was more than just writing code. It required a considerable grasp of abstract thinking and design skills. If C was like crossing a river in a coracle, C++ was like crossing an ocean in a ship. For complex problems, a ship was necessary and it demanded solid design practices. For nearly a decade, engineers who entered the realm of object-oriented thinking did so via C++. By then, what was to become the world's most popular language had arrived on the scene.

Java was conceived as a means to reconcile programming with the growing diversity of electronic devices. By the close of the 1980s, the industry predicted that computers and consumer devices would soon converge. Microprocessors had become so cheap that they had already started entering mobile phones, cameras, car dashboards, TV set-top boxes, and music systems. Naturally, all these required operating systems and application software. With such a diversity, applications could be written in C++, for example, but the job of the compiler writer would become unmanageable. Many more compilers would have to be written, one for each type of platform. In an attempt to simplify the job of compilers, James Gosling and others at Sun Microsystems came up with Java.

What Java did was to translate high-level code into _byte code_, a new layer that sat between machine code and high-level code. On every different device, there would be a Java run-time environment, which interpreted byte code and translated it to machine code on the fly. The advantage was that one could take the byte code of a program and run it on various devices without doing anything special. Each device needed an installation of its own run-time environment, but once this was in place, convenience came to the programmer, the compiler writer, and the user. Java's approach went well beyond the simple separation between hardware and software that had characterized early software evolution. It was about portability of software across different hardware. Programming was getting complex. Good programmers were always in short supply. It was becoming important to write programs that could be reused in many ways.

Design and code reuse is the essence of all object-oriented programming. Java took this to a mature level of sophistication with tested packages containing well-documented templates that could be reused to create objects easily. For example, if "Shape" represents a class of objects, its characteristics could be inherited by other classes, rather like parents passing on their genes to their children. So new classes of objects such as "Triangle," "Rectangle," or "Pentagon" could be formed from "Shape." This was the idea of code reuse via inheritance. A Java programmer didn't have to build a system from scratch. Particularly with user interfaces, Java came with powerful packages and APIs that simplified programming. A minimal code sketch of this idea follows the figure below.

Inheritance in Object-Oriented Design

Inheritance is all about design and code reuse. An object inherits relevant properties of another to which it is closely related. By such association, it becomes easier to create complex software systems since each component need not be built from scratch.
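Here is that minimal sketch in Java. The Shape, Triangle, and Rectangle classes follow the example in the text; the fields, methods, and sample numbers are illustrative assumptions made for this sketch.

```java
// Common behaviour lives once in Shape; subclasses inherit it and
// supply only what is specific to them.
abstract class Shape {
    private final String name;
    Shape(String name) { this.name = name; }

    abstract double area();                  // each shape computes its own area

    void describe() {                        // written once, inherited by all
        System.out.println(name + " has area " + area());
    }
}

class Rectangle extends Shape {
    private final double w, h;
    Rectangle(double w, double h) { super("Rectangle"); this.w = w; this.h = h; }
    @Override double area() { return w * h; }
}

class Triangle extends Shape {
    private final double base, height;
    Triangle(double base, double height) { super("Triangle"); this.base = base; this.height = height; }
    @Override double area() { return 0.5 * base * height; }
}

public class Shapes {
    public static void main(String[] args) {
        Shape[] shapes = { new Rectangle(3, 4), new Triangle(6, 2) };
        for (Shape s : shapes) s.describe(); // one describe() serves every shape
    }
}
```

The describe() method is defined once in Shape and inherited by every subclass—one small instance of the design and code reuse described above.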

Java began life as Oak in 1992, but it really took off when the Web became popular in the mid-1990s. Java was a timely solution that the industry urgently needed. Users of the Web were on different platforms—UNIX, Windows, Macintosh, and many more. From the perspective of applications, Java unified all of them, which made it popular among programmers. After all, Java's APIs had been openly published and there were no licensing restrictions on using Java. Many of the world's websites today run on Java technology.

With the coming of C++, Java, and other object-oriented languages, structured programming had finally been realized. The eighties was truly the decade when structured programming started making real progress, even though SIMULA and Smalltalk had started the trend long before. This is a case in point that industry often has a commercial focus. Ideas that arrive too far ahead of their time simply have to wait till industry sees a real necessity in adopting them. While it's true that necessity is the mother of invention, it's also true that the perception of necessity differs between research and industry.

Programming Languages

Today there are dozens of programming languages across many levels. Among the popular ones are Java, C, C#, C++, Visual Basic, PHP, and Python.

All through the eighties, the structured approach to software went beyond the confines of programming. Design became structured. Process became structured, and tools evolved to track engineering requirements. Automated testing of software saw great strides at all stages of development. _Computer-Aided Software Engineering (CASE)_ and _Computer-Aided Design and Manufacturing (CAD/CAM)_ became established methodologies. New languages emerged not for programming but for design. Technology had advanced so far that human focus shifted from implementation to design. Once a design was validated, everything else could be automated. In telecom engineering, the _Specification and Description Language (SDL)_ was widely adopted for the design and definition of telecom standards. With SDL, an entire system could be built visually using graphical elements. Tools would then automatically translate the design into C code, compile it, and execute it. Tools would automatically generate test cases to simplify the job of test engineers. A whole new design philosophy by the name of _Unified Modeling Language (UML)_ emerged as part of this shift from coding to design. Powerful _Software Development Kits (SDK)_ that included interactive debugging, colourful syntax highlighting, reference examples, and automatic code indentation were offered in the form of _Integrated Development Environments (IDE)_.

In the end, GUI had come full circle. What had started for visual simulation, and later as an enabler of intuitive human-computer interaction, returned to aid engineers in the design of complex systems. These developments may suggest that software projects were inherently complex and therefore took long development cycles from the late eighties onwards. This was quite true in the beginning, but the industry responded by innovating on process. Software process came to be evaluated on various levels of maturity. The _Software Capability Maturity Model (SW-CMM)_ was set up in 1987 as a formal means to evaluate and certify software processes. In a fast-changing market, no company could afford long development cycles. Shorter cycles did not mean skipping essential steps of the process such as design reviews, interface documentation, or incremental testing. Processes had to evolve to find a common ground between project complexity and shorter time to market. New ideas and processes emerged to address these issues—Extreme Programming (XP), Scrum, and other Agile methodologies.

As microprocessors evolved, so did operating systems and applications. Today's personal computers run on 64-bit architectures. Windows evolved to make optimal use of 32-bit and 64-bit microprocessors. Given the power of newer microprocessors, personal computers could handle multitasking. While batch-processing systems had once upon a time handled users one at a time, multitasking enabled a single user to run multiple applications at the same time on a single microprocessor. This meant that a user could open a graphics program, edit some image details, and copy an image to another running program, while in the background an engineering simulation crunched numbers. The operating system was smart enough to schedule each of these efficiently on the same shared microprocessor. Meanwhile, the giant in computing whose strength had always been in hardware felt the pinch in the personal computer business.

The Evolution from Batch Processing to Multitasking

(a) Users queue up and take turns to acquire computing resource. (b) Multiple users share computing resources, each one getting small slices of dedicated time. (c) With personal computers, technology made it possible for each user to use a dedicated computer. (d) Powerful hardware and system software made it possible for a single user to multitask across multiple applications on a single personal computer.

IBM had acted swiftly in the early eighties and collaborated with multiple vendors to launch its PC quickly into the market. While this enabled IBM to capture the market quickly, IBM executives had not considered the long-term effects this decision might bring. The interfaces were non-proprietary, which meant that others could duplicate them. Peripherals and entire systems could be built as clones of the IBM PC. Compaq and Dell are just two of the numerous companies that built IBM-compatible clones. For every new microprocessor that Intel introduced, IBM had to upgrade the PC to retain its leadership. Even then, IBM's profit margins steadily declined. It turned out that Intel and Microsoft made money from the PC business, but not IBM. This was the final chapter in the triumph of software over hardware. Technology had made hardware so cheap that the real value was in software. Hardware continues to be important in mission-critical and real-time applications. Specialized hardware that comes out of innovative design still has value. For generic hardware, it's only a game of volumes differentiated by software.

IBM eventually sold its PC business to the Chinese manufacturer Lenovo and began to focus increasingly on enterprise software solutions. In a seven-year period beginning in 2003, IBM acquired fifty-seven software companies at an expense of $13 billion. The software services industry too has become competitive in recent times. Indian companies including Wipro and Infosys have eaten into IBM's market share, forcing IBM to focus on specialized software that commands higher margins. Microsoft, on the other hand, had its golden period through the nineties. Its stock value multiplied a hundredfold through the decade. In the same period, a new wave of change swept the industry. Both Apple and Microsoft would be left facing the same searching questions that had troubled IBM not too long before. On this wave would ride a new generation of companies to rival the reigning software giants.

# 1010 Beyond Borders

**Legend has it** that sometime in the late 1880s, an undertaker from Kansas City, Missouri, was losing business to a competitor. Almon Strowger had formerly been a schoolteacher and some years earlier had moved into the funeral business. Strangely, business went to his competitor even when he personally knew the deceased. Strowger suspected foul play. Those were the infant years of telephony. A subscriber could not place a direct call. He had to go through an operator at the local exchange. The operator sat overlooking a plugboard of patch cords with metallic electrical contacts at their ends. The operator spoke to the caller, took down the name or number of the callee, and made the connection manually. In this way, the caller's line was electrically connected to the callee's via the local exchange, like a knot temporarily joining two pieces of string.

The operator's manual intervention was essential. Early on, system engineers had realized that it would be economically infeasible to connect every subscriber to every other subscriber on the network. Suppose in a group of ten subscribers, each one had a dedicated connection to the nine others. Since every pair needs its own circuit, the network would require 10 × 9 / 2 = 45 circuits, certainly a big number for such a small group. One calls this a fully connected mesh network. If such a mesh architecture is imagined for a thousand users, then nearly half a million circuits would be needed. In short, an architecture that may be acceptable for small groups becomes unmanageable for bigger groups. Such a solution is not scalable. A small calculation, sketched below, makes the point.
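The sketch below is purely an illustration of the arithmetic, not anything from the telephone industry; it computes the circuit count for both group sizes mentioned above.

```java
// Why a fully connected mesh does not scale: every pair of subscribers
// needs a dedicated circuit, so n subscribers need n * (n - 1) / 2 circuits.
public class MeshCircuits {
    static long circuits(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(circuits(10));   // 45
        System.out.println(circuits(1000)); // 499500, nearly half a million
    }
}
```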

A subscriber talks to at most one other subscriber at any time, which means that most of the time all but one of a subscriber's circuits lie idle. Since laying copper wires and maintaining them was an expensive affair, it made sense to get the system architecture right. It was from here that telephone switching and exchanges were born. Rather than having multiple wire pairs for each subscriber, it was sufficient to have a single pair per subscriber. The exchange would take care of making the connections when required.

The world's first telephone exchange was established in 1878, barely two years after the birth of the telephone. With this was born the first network that enabled end-to-end connectivity. The telegraph network had come decades earlier but it was a network that didn't need end-to-end connections. In fact, telegraphy is a _connectionless service_. When someone sends a telegram, it is not required that the recipient be present at the same instant at his local telegraph office. It is not even required that an electrical circuit be established between the telegraph offices at the two ends. The telegram works its way through a chain of offices from the sender to the receiver. Intermediate offices store the messages and forward them to the next office along the chain. Chappe telegraphy, Morse telegraphy, and postal services all work on the same principle. Telephony, on the other hand, was different. Both sender and receiver had to be present during the call. Telephony is a _connection-oriented service_ and it requires end-to-end connections. Trained operators made these connections manually because there was no way subscribers could do this on their own.

Strowger suspected that one of the operators in his local exchange was purposely directing all his incoming calls to his competitor. He suspected that this operator was having a romantic relationship with the competitor. Exasperated, Strowger resolved to invent a mechanism by which switching could be done automatically, without human intervention. Clearly, at least in this case, Strowger had greater trust in machines than in humans. Machines could be designed to perform specific tasks and they did them without complaining. Humans embraced something called free will and were often unpredictable. Strowger was among the early inventors to tackle the difficult problem of automatic telephone switching. By 1889, he had invented a system to do just that. Two years later, he obtained a patent. In 1892, the world's first automatic switching office became operational in La Porte, Indiana. Historians later noted that as a boy, Strowger and his brothers often dreamt up devices to automate the daily chores their mother asked them to do. It was this streak of inventiveness that Strowger brought to switching.

A Strowger Switch

This is one part of an early Strowger switch to select an available circuit by stepping through circuits one at a time. Source: Brock Craft, Wikimedia Creative Commons.

Strowger's first switch was crude by modern standards. To dial the digit _9_, the subscriber had to press a button nine times, which resulted in nine pulses travelling towards the exchange. These pulses triggered electromechanical mechanisms in the switch to select one among many sets of contacts. Then a rotary mechanism swept the chosen set to find a free circuit to the callee. It took nearly a decade of incremental improvements to make these switches reliable and scalable for industrial requirements. Rotary dials were first introduced in 1896, out of research carried out by engineers at Strowger's firm. These dials employed the same pulse dialling as earlier equipment but gave users a simplified interface. The equipment took care of generating the correct number of pulses for every dialled digit. While the first Strowger switch could handle only a hundred lines, modular design enabled interconnection of switches so that multilevel switching became possible. Such improved switches could handle 10,000 lines. Strowger switches were introduced into Europe at the turn of the century. Despite later innovations in switching, many Strowger switches were in use well into the 1960s.

There are three essential parts to a telephone network, or to any network for that matter, including the modern Internet. _Signalling_ deals with methods of control over circuits and calls. Among other things, it involves ringing the phone for an incoming call, transmitting a dialled number from the subscriber to the local exchange, encoding symbols, and indicating a busy tone if the callee is unavailable. A good deal of the work of Hooke, Chappe, and Morse was about signalling. _Switching_ deals with the establishment of a circuit path between the parties of a call. A hierarchy of exchanges, local and trunk, handles this. _Transmission_ deals with the actual transport of voice traffic during the call. When traffic is aggregated from multiple circuits, it involves multiplexing—FDM in the analogue world, PCM/TDM in the digital world.

Strowger's invention dealt with both switching and signalling. It came at a time when Alexander Graham Bell's telephone patent was expiring. Independent players entered the industry and targeted the non-urban areas that the Bell System had ignored. Strowger switches became an attractive option over recruiting and training women operators. Strowger himself commented at the 1892 inaugural launch that just as the telephone had made messenger boys redundant not too long before, the automatic switch would do the same to women operators. He might have added how the mechanized looms of the Industrial Revolution had made thousands of cottage weavers redundant.

AT&T was initially reluctant to automate switching, claiming that for large cities with thousands of subscribers, manual switching was faster. Introducing automatic switches would also pose a problem of compatibility with its existing manual switches. Nevertheless, Edward Molina of AT&T brought out a key innovation in 1906 with the method of _indirect control_. While Strowger switches enabled subscribers to directly control switching at the local exchange, indirect control separated the dialled pulses from the switching fabric. A special piece of equipment called the _sender_ processed the dialled pulses, encoded them into forms suitable for machine processing, and independently controlled the switching fabric. The subscriber dialled a number, but switching control remained within the exchange. This brought flexibility, as the addition and rearrangement of trunks became easier. Routing of calls through intermediate offices became easier. In short, the sender was the machine equivalent of a human operator. Two indirect control switches evolved—rotary and panel. Rotary switches were adopted in Europe while panel switches, with their higher capacity, became popular in the US.

AT&T installed the first panel switch in 1921 in Omaha, Nebraska. Soon other panel switches were introduced into many metropolitan areas. For the non-urban areas, Strowger switches were more cost-effective. Realizing this, in 1916 AT&T licensed Strowger's invention for $2.5 million, and the first of these switches were installed in the Bell System three years later. Strowger himself had sold his patent in 1898 for a mere $1,800. Meanwhile, in 1916 William Blauvelt of AT&T invented a method by which subscribers could dial a number even if the callee was at the other end of the continent. This was made possible by introducing two types of digits within a number. The first few digits represented the area code; the rest represented the callee's local number within his area. Although the switching of long-distance calls was automated only in the sixties, Blauvelt's method was crucial in the evolution of switching automation.

The panel switch was a product of the mechanical age. Motors, commutators, brushes, pawls, lugs, shafts, clutches, tubes, and bearings were some of the terms used to describe the construction of such a switch. Electricity played the secondary role of activating the mechanics. The mechanical nature of panel switches implied that they had many moving parts, just like the Strowger switches. They required frequent maintenance. If switching had to keep pace with subscriber growth and the volume of voice traffic, it was desirable to reduce moving parts and to improve the speed and reliability of switching. It was in this spirit that John Reynolds of Western Electric invented the crossbar selector in 1913. The selector was the part of the switch that made the actual connections between lines and trunks. It replaced the manual switchboard of cords, plugs, and jacks. The selector was composed of electromechanical relays. Beyond the opening and closing of relays, nothing else moved. Reynolds's invention was patented two years later and then promptly forgotten. Engineers instead focused on readying the panel switch for commercial deployment. The potential of the crossbar selector was not recognized.

The crossbar selector was resurrected two decades later when engineers realized that beyond the selector, the crossbar principle could be used for many switching elements that needed to perform interconnections. A crossbar-switching element was a grid of horizontal and vertical circuit lines that could be connected at their intersections using relays. In a typical switch, for instance, 20 vertical lines connected to subscriber lines and 10 horizontal lines connected to trunk lines going to other local exchanges or toll switches. By stacking together 10 such crossbar switches and interconnecting two such stacks, it was possible for 200 subscribers to connect to any one of 100 outgoing and 100 incoming calls. In a mesh connection of 200 incoming lines with 200 outgoing lines, 200 × 200 = 40,000 interconnecting lines would be needed within the switch. With the crossbar configuration just described, only 200 × 10 × 2 = 4,000 crosspoints are needed to realize all the interconnections. This simplification comes by sacrificing switch capacity. The exchange can carry only 100 simultaneous calls whereas a full mesh can carry twice as many. This is an acceptable trade-off since the number of crosspoints has been reduced by a factor of ten.

This underscores the basic design philosophy of all switches. The expectation is that not all subscribers will be calling or receiving calls at the same time. The capacity of a switch is provisioned based on the subscriber base and estimated peak-hour traffic. Switch design is governed by statistics. It is therefore possible that at times some calls will not get through the switching network because no circuit is available. Even mobile phone users today face this on the wireless channel when they see a "Network Busy" indication. The probability that a call is denied for this reason is quantified as the _call blocking probability_. Telephone service providers publicize this to subscribers as the _Grade of Service (GoS)_. A GoS of 2% implies that there is a 2% chance that a call will be blocked. This generally happens during peak hours. This also explains why during disasters, when almost everyone is making calls, many calls do not get through. The telephone network is provisioned for peak-hour traffic with an acceptable GoS. It is not provisioned for worst-case scenarios, during which call blocking is much worse than the advertised GoS.
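The text names no formula, but the standard tool in traffic engineering for this calculation is the Erlang B formula, which relates the number of circuits and the offered traffic to the blocking probability. The sketch below is an illustration under that assumption; the circuit count and traffic figures are made up for the example.

```java
// Estimating call blocking with the classic Erlang B formula (assumed
// here; the text itself names no formula). For N circuits and offered
// traffic A in erlangs, blocking follows the recurrence:
//   B(0) = 1,  B(k) = A * B(k-1) / (k + A * B(k-1))
public class ErlangB {
    static double blocking(int circuits, double erlangs) {
        double b = 1.0;
        for (int k = 1; k <= circuits; k++) {
            b = (erlangs * b) / (k + erlangs * b);
        }
        return b;
    }

    public static void main(String[] args) {
        // Hypothetical exchange: 10 circuits offered 5 erlangs of traffic.
        double b = blocking(10, 5.0);
        System.out.printf("Blocking probability: %.4f%n", b); // about 0.018
        // A provider would provision circuits until this figure falls
        // below the advertised GoS, e.g. 0.02 for a 2% grade of service.
    }
}
```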

Switching Fabrics to Handle Eight Calls

(a) Eight incoming lines can be switched to any of the eight outgoing lines. Most connections within the switch will remain unused at any one time. (b) This switching fabric has less flexibility than a fully connected mesh but it is more economical since it contains only 32 lines rather than the original 64. (c) This switch uses three stages of basic 2x2 building blocks.

Crossbar switches allowed great flexibility with interconnections, using the minimum number of crosspoints for the best possible circuit availability and GoS. Effectively, they were an equivalent of the mesh network using minimal lines. It was a mesh network only in the logical sense—any subscriber could connect to any other subscriber. In the physical sense, the subscriber had only one line going to the local exchange. Within the crossbar exchange, lines were assigned to circuits as the requirement arose. Once a call was terminated, the lines were released and made available for other calls. The brain of a crossbar exchange was the _marker_. It interfaced with crossbar switches, searched for available circuits at multiple stages, and reserved a set of lines to complete the desired circuit. Markers operated in fractions of a second and were instrumental in increasing switching speeds. The beauty of the design was that it permitted reuse of control elements once a call circuit was established. Senders, markers, and their controlling relay circuitry didn't need to be tied down for the duration of the call. They were in operation only during call establishment and release. Senders in panel switches had enabled reuse as well. It is from these early automated designs that the power of _common control_ was realized. With a few shared control units, many circuits could be easily managed.

AT&T's first No. 1 crossbar switch was installed in 1938 in New York City. It was such a switch that George Stibitz used to build his Complex Calculator. In 1943, the No. 4 crossbar switch was introduced for long-distance calls. Five years later, the No. 5 crossbar switch made its way into suburban exchanges, which had been witnessing steady growth in subscriber numbers, particularly after the end of the war. These smaller switches found appeal in other countries as well. The French arm of ITT introduced a variant named Pentaconta in 1964, which became popular in many countries. By the 1970s, crossbar switches had become essential for telephone switching. Nevertheless, for decades manual switches existed side by side with panel switches, Strowger switches, and crossbar switches. One of the challenges of the telephone industry has always been compatibility. Any new technology, no matter how superior in design or capability, must remain humble enough to interface with the old guard so that the latter can be retired gracefully. The high installation costs and stringent reliability requirements of the telephone industry imply that new technologies have to work alongside older ones for decades. Introducing new technologies necessarily requires long years of research and manufacturing maturity. New technologies usually enter complex systems in incremental fashion.

Though the transistor was born in 1948, the Bell System took a long time to adopt it into switching. Ironically, the transistor itself had been born out of basic research aimed at switching; and computers, which had been inspired by telephone switching, were among the first to use transistors in their designs. This was understandable. If computers of the 1950s failed, the impact would be localized. Computers were confined to research laboratories and specific industries. On the other hand, if the telephone network failed, almost everyone would be affected. Back then, telephony had a public face while computing did not. Telephony was a network while computing was not. From a sociological perspective, human communication is far more important than machine computation. Free speech is more essential to the individual than cheap computing. Speaking on the telephone was anything but free in the cost sense, but the very fact that one could pick up a handset and call almost anyone underscored the freedom to communicate. It was therefore important that a switching system not incur a service outage of more than a few minutes over a period of forty years. This essential difference between computers and switching networks was stated by engineers:

In fact, the dependability objective represented one of the major challenges of the No. 1 ESS development. Since No. 1 ESS is a large digital information processor, it is a cousin to the general-purpose digital computer. However, the dependability requirement requires that No. 1 ESS be a different kind of system with a much higher level of redundancy.

Launched in 1965 by AT&T, No. 1 ESS marked a fundamental shift in telephone switching. True to its name, _Electronic Switching System (ESS)_, it made the first extensive use of electronics in switching networks. Solid-state diodes and transistors were used. It took Bell Labs more than a decade of R&D to introduce electronics into switching systems. In the early 1950s, the focus of research was on using vacuum tubes and gas-filled tubes, whose advantage over electromechanical relays was speed of operation. Engineers had to invent better designs to compensate for the lower reliability of early electronics. This included adding redundancy to the circuitry so that if one component failed, another could take over. Later in the 1950s, engineers started working on solid-state electronics. These took up less space, operated faster, and were more reliable. The first of these were introduced in an experimental system named _Experimental Solid State Exchange (ESSEX)_. The ESSEX laboratory experiment motivated the first trial of electronic switching in real-world networks.

The town of Morris, Illinois was still on manual switching as late as 1959, when it was decided to install No. 5 crossbar switches at the local exchange. In 1960, the first trial ESS was also installed there and about six hundred customers were moved temporarily to the new system. Simulations have always played an important part in engineering to predict performance and aid design activity. But telephone switching, involving millions of interconnections across exchanges all over the country, was far too complex for any simulation program available at the time. Real-world trials such as the Morris ESS trial were essential to understanding system bottlenecks, design issues, and practical field problems. This trial was instrumental in directing engineers in the design of No. 1 ESS that followed five years later.

Electronics enabled system engineers to scale up operations. More subscriber lines could be switched with fewer common control units. But using electronics within an old switch design is like viewing an old black-and-white movie on an HDTV colour television set. Electronics alone does not reduce the number of interconnections within the switch. To tap into the full power of solid-state diodes and transistors, the switch itself had to undergo a design change. One key design change, present in both the trial ESS and No. 1 ESS, fundamentally altered the basic nature of switching. The switch was redesigned around a few basic control units that could be _programmed_ to operate as desired. Following the evolution computers had taken, switches were transforming themselves from giant structures of hardware logic into smaller reusable hardware components controlled by software programs. No. 1 ESS was the first commercial switch to use _Stored Program Control (SPC)_, a practical realization derived from the early ideas of von Neumann. As many as ninety programs controlled the overall operation of the No. 1 ESS switch.

The second change had more to do with signalling than switching. For decades, pulse dialling had been in place. Signalling functions such as ringing a phone took up large currents and power. Electronics required new signalling solutions that could operate faster, at lower power, and bring convenience to subscribers. Moreover, pulse signalling employed direct currents that were unsuitable for long-distance trunks. Keeping these requirements in view, AT&T introduced the Touch-Tone phone in 1963. Pulse dialling was replaced with tone dialling, whereby combinations of tones or frequencies were used to signal digits from the subscriber to the exchange. In fact, tone signalling had been introduced two decades earlier along with No. 4 crossbar switches and had soon become widely used for trunk signalling. For a long time, the high cost of handsets kept subscribers away from the benefits of tone dialling. It was only with solid-state electronics that it became possible to reduce cost and take tone dialling to the subscriber. Tone dialling has since replaced pulse dialling worldwide. It is tone dialling that we use today on landlines. It is tone dialling that enables us to enter passwords and credit card numbers on our phones.

Pulse Dialling and Tone Dialling

(a) Digits _641_ signalled using pulse dialling. (b) Continuous tone that signals digit _6_. (c) Continuous tone that signals digit _4_. (d) Continuous tone that signals digit _1_. (e) Tone dialling to signal the digits _641_. As can be seen, this is a lot faster than pulse dialling.

Tone signalling used the same frequency band as that used to transmit human voice. Known as _in-band signalling_, this method required no additional bandwidth. Care had to be taken in the design of the tones so that voice would not imitate them and signalling tones would not corrupt voice. _Out-of-band signalling_, on the other hand, was easier in this sense but required coordination between signalling and voice circuits. In addition, it required additional bandwidth that was often difficult to justify on bandwidth-constrained channels. It would take years of research and the advance of electronics to bring out-of-band signalling into practice. Just as common control had become a standard design principle in switching, common channel signalling became standard. The signalling for many different voice circuits was consolidated into shared control channels. The complexity involved in such a design was easily handled by the power and flexibility offered by software processing.
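Tone dialling survives today under the name DTMF, whose standard frequency grid is assumed in the sketch below (the book itself does not list the frequencies). Each digit is signalled by summing one low-group and one high-group tone, both inside the voice band, as for the digits _641_ in the figure:

```python
import math

# Standard DTMF keypad grid: each digit pairs a low tone with a high tone.
LOW  = {'1': 697, '2': 697, '3': 697, '4': 770, '5': 770, '6': 770,
        '7': 852, '8': 852, '9': 852, '*': 941, '0': 941, '#': 941}
HIGH = {'1': 1209, '4': 1209, '7': 1209, '*': 1209,
        '2': 1336, '5': 1336, '8': 1336, '0': 1336,
        '3': 1477, '6': 1477, '9': 1477, '#': 1477}

def dtmf_samples(digit: str, duration_s: float = 0.1, rate_hz: int = 8000):
    """Return samples of the two summed tones for one digit."""
    n = int(duration_s * rate_hz)
    return [math.sin(2 * math.pi * LOW[digit] * t / rate_hz) +
            math.sin(2 * math.pi * HIGH[digit] * t / rate_hz)
            for t in range(n)]

# Dial the digits 641 from the figure: 100 ms of dual tone per digit.
signal = []
for d in "641":
    signal += dtmf_samples(d)
print(len(signal), "samples")   # 2400 samples at an 8 kHz sampling rate
```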

More than the common control principle, tone dialling, and SPC switches, the greatest innovation in switching was perhaps the move from space-division to time-division switching. From the time of manual switches in 1878, through Strowger, panel, crossbar, and No. 1 ESS switches, the nature of switching had been one of _space-division_. An incoming line was connected to an outgoing line to form a circuit that remained unchanged all through the call. Different ongoing calls were separated from one another in space within the switch. In other words, each had a distinct electrical circuit. This was the state of the art in switching when engineer Maurice Deloraine got a flash of insight one night on a rail journey from New York to Chicago.

Deloraine had started his career with Western Electric in London, working on voice transmission over long-distance cables. Later, with the formation of ITT and LCT, he found himself in Paris working on radio transmission. It was during this time that his colleague Alec Reeves invented PCM for multiplexed voice transmission. When Germany occupied Paris during the war, Deloraine fled to the US. While there, it occurred to him that if pulsed modulation could be applied to transmission, it could be applied to switching as well. The switch would first sample analogue voice on the various active subscriber lines. These samples would be stored in a memory such as a cathode ray tube. In modern terminology, one calls this a _memory buffer_. The switch would then read out the pulses from the buffer one at a time and switch them onto outgoing lines. Rather than having multiple space-division circuits for switching, a single buffer implementing _time-division switching_ saves cost and space. The switching fabric is shared across multiple voice circuits. This could be done because switching operates at much higher speeds than the rate at which voice is sampled on each line. In a patent of 1945, Deloraine noted that pulse switching had become possible due to electronic devices that operated much faster than relays.

Deloraine's invention faced the same fate as Reeves's PCM. Technology had to mature from vacuum tubes to solid-state electronics before practical implementations could be constructed. Experimental models and trials of time-division switching were undertaken through the fifties. LCT built a model in 1951 using PAM samples. Bell Labs' ESSEX took time-division switching to a new level by converting PAM samples to PCM encoding. Within the switch, PCM had the added advantage of minimizing crosstalk across voice circuits. ESSEX used _Remote Line Concentrators (RLC)_, whereby multiple subscriber lines could be combined close to the customer premises rather than at the exchange. This saved precious cabling cost since only a single line from the concentrator needed to connect to the exchange. Samples were taken at the concentrator, converted to PCM, switched by time-division, and multiplexed onto the single line. ESSEX proved an essential concept: that it was possible to integrate both switching and transmission in the digital domain. It was time-division that finally enabled digital switching.

When PCM transmission was commercialized in 1962, digital time-division switching was still at the experimental stage. This meant that while PCM/TDM was operational, switches had to convert PCM samples to analogue voice, perform space-division switching, and convert the voice back to PCM before sending it on an outgoing line. Such frequent conversions brought down network performance. It was almost like driving on superb motorways, except that each motorway was connected to another by small bumpy country roads. Travelling on any motorway was fast and convenient but whenever one needed to change to another motorway, the transition was slow and painful. The best way out of this problem was to make the interconnections as efficient as the motorways themselves. This is what ESSEX had demonstrated: avoiding frequent conversions between digital and analogue signals meant better network performance. The same scenario was repeated a few decades later when fibre optics were introduced for transmission while switching was still done electrically. It took many years for engineers to develop network devices that could switch in the optical domain.

In Deloraine's invention and in the ESSEX implementation, a major limitation remained unaddressed. A concentrator could connect to another only on the same time slot. So calls were often blocked because no pair of matching slots was free, even though individual time slots might be available. Japanese engineer Hiroshi Inose finally solved this problem. In a patent of 1960, he described his concept of _Time Slot Interchange (TSI)_, whereby each PCM sample could be delayed by a varying amount so that any free time slot could be used. This eventually led to a clear description of the components that make up a time-division switch.

A T-stage unit worked in the manner proposed by Deloraine and Inose. It took time-multiplexed data on an incoming line, used memory buffers, and copied out voice samples in a different order based on the TSI principle. An S-stage was essentially a space-division switch, but interestingly its crosspoint connections changed every time slot. What ESSEX had done in its concentrators was an implementation of the S-stage. Though ESSEX had achieved the first integration of digital switching and transmission, it was LCT engineers who first demonstrated time-division switching using the more powerful TSI principle. Modern switches use a combination of T-stages and S-stages to implement time-division switching.
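A T-stage can be pictured as a buffer that is written in arrival order and read out in connection order. A minimal sketch, with the slot map invented for illustration:

```python
# Minimal model of a Time Slot Interchange (T-stage): one TDM frame of
# samples is written into a buffer, then read out in the order demanded
# by the current connections. The slot map here is hypothetical.

def time_slot_interchange(frame, slot_map):
    """Reorder one TDM frame.

    frame:    list of samples, one per incoming time slot
    slot_map: slot_map[o] = incoming slot whose sample leaves on slot o
    """
    buffer = list(frame)                      # write phase: store the frame
    return [buffer[i] for i in slot_map]      # read phase: interchange slots

incoming = ['A', 'B', 'C', 'D']               # samples in slots 0..3
# Connection map: outgoing slot 0 carries incoming slot 2, and so on.
print(time_slot_interchange(incoming, [2, 0, 3, 1]))   # ['C', 'A', 'D', 'B']
```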

Despite these incremental successes, No. 1 ESS was released as a space-division switch. Time-division was inherently complex and, without sufficient maturity, too risky for commercial service. Particularly when most switching functions were software-controlled, poor software quality translated to higher risk. In the sixties, software engineering was unheard of and engineers faced enormous challenges in building software for complex systems. When problems arose in early time-division switches, engineers had a tendency to suspect the hardware but more often tracked the problems down to bugs in the software. The result was that time-division switches did not enter commercial public service until the 1970s. At best, they were introduced into simple systems such as _Private Branch Exchanges (PBX)_, used to connect employees with one another in a large office. AT&T's No. 101 ESS was introduced in 1963 as a PBX time-division switch located on customer premises. It was based on PAM samples, which was acceptable since the samples travelled only short distances. In the UK, the 1971 Moorgate exchange in London was operated for two years as a trial of PCM time-division switching based on S-S-T-S-S stages.

In the US, work on commercializing ESS for toll switching started in 1968. When it came to deciding on a suitable system, time-division using PCM samples was selected. It was not until 1976 that No. 4 ESS was introduced into the Bell System for toll switching, first at Chicago, Illinois. Until then, toll switching had been done mostly by No. 4 crossbar switches. The new switch was cheaper and could handle 500,000 calls per hour. Its switching structure was of T-S-S-S-S-T stages. For signalling, it used _Common Channel Interoffice Signalling (CCIS)_ , which later evolved to incorporate elements of CCITT standards _Signalling System 5 & 6 (SS5 & SS6)_. Most importantly, by using PCM, switching had finally become digital.

Since the software of No. 4 ESS was complex, tools and processes had to be better. A new language named _ESS Programming Language (EPL)_ was invented. It was better than assembly languages but lower-level than high-level languages. As such, it gave programmers easy software constructs to work with while not sacrificing the efficiency of assembly. EPL later evolved into _EPL Extra (EPLX)_, which introduced many facilities normally seen in high-level languages. Where performance mattered, programmers could use a mix of EPL and EPLX. Initial installations of No. 4 ESS were not substantially different from the No. 4 crossbar switches that had preceded them. Through the late seventies, these switches were redesigned for improved performance and more advanced features. Changes could be made easily since much of the control had gone into software.

Today telephone switching is fully automated. Even for international calls, the subscriber can directly dial the callee's number. A long line of advancements in switching technology made this possible. Time-division switching had started with Deloraine in 1945 but it took three long decades to enter commercial telephone networks. Through the late seventies and early eighties, more and more switches were converted to time-division. It was MOS technology and the arrival of the microprocessor chip that made this happen. The chip, along with _Very-Large-Scale Integration (VLSI)_, made the transition cost-effective. Remote concentrators had started a trend of pushing PCM/TDM transmission closer to subscribers. Such concentrators evolved into _Remote Switching Units (RSU)_, which could not just aggregate traffic but also switch it on their own. The eighties witnessed this move from centralized switching towards distributed switching.

Just as the golden age of telephone networks was coming to an end, a new age was dawning. Speech was not the only form of human expression. There was music, for which the telephone network was poorly suited. At times, people wanted to write to each other rather than talk on the phone. They wanted to share documents. Data was becoming increasingly important. When people attempted to use telephone networks to send data, new problems arose. Old ways of switching that had worked so well for voice had to be questioned. No one suspected that what followed next would challenge even traditional voice switching, something engineers had almost perfected after a century of blood, sweat, and tears.



**The rise of** telephony through the early decades of the twentieth century quickly displaced telegraphy as the dominant form of communication technology. Telephony established the supremacy of voice over data. As a result, telephone networks expanded faster than telegraph networks; but the public's need to send data was not altogether dead. Facsimile, which had begun even before telephony, was witnessing a renewed interest in the sixties. For decades, stock market updates over telegraph lines had been an essential service to the financial industry. Then there were the teleprinters that were used across many industries for a good part of the century. The evolutions of facsimile and teleprinters show some similarities.

In the beginning, both relied on the telegraph network. Later they made use of the growing network of telephone lines. Facsimile was initially analogue and later migrated to digital modulation along with compression. Fax modems were developed to send digital fax over ordinary telephone lines. As for teleprinters, they had been digital ever since Baudot Code was invented in 1874. When they began using telephone lines, they adopted pulse dialling. Later they adopted ASCII Code and tone signalling. Despite these advances, teleprinters lost the race to facsimile systems by the eighties. But teleprinters left one important legacy of data transmission that remains with us even today.

For each character typed on a teleprinter machine, the machine encoded it into a Baudot Code of five pulses. When many characters were typed in succession, the receiving unit had to process a continuous stream of incoming pulses. The problem for the receiver was to identify correctly the start and end of each character. If the receiver got this wrong, it decoded gibberish. Such a problem had never been faced by voice systems, which were then still in the analogue world. Even later, when voice was digitized by PCM, synchronization didn't arise as a problem because PCM/TDM enforced strict timing of the pulses. T1/E1 lines carried timing information by design. The rise and fall of pulses occurred at defined time periods, what we may term clocked pulsing. Higher-level framing information indicated slot and frame boundaries. PCM/TDM was a _synchronous_ system.

Since timing was so essential to the correct working of T1/E1 lines, clocks had to be accurate. Any drift in timing would cause bit errors. Engineers termed this timing variation _jitter_. Indeed, it was seen that lines were limited to 150 miles, beyond which accumulated jitter made synchronization problematic. The solution was rather easy. By spacing repeaters often enough, the build-up of jitter was avoided. This was really the whole point of going digital. Repeaters reshaped, retimed, and amplified signals before errors could have any detrimental effect on the information they carried.

Teletypewriter traffic was not continuous like T1/E1 lines. An operator might type a few characters, pause to gather her thoughts, and then continue typing. She might wait for a reply from the other end before continuing. When machines read from perforated tapes and sent messages in quick succession, the problem was worse. There was no strict definition of when a character ended and another began. The two ends might start off well but over time they generally lost synchronization. Teletypewriting was an example of an _asynchronous_ system. The Morkrum Company, incorporated in 1907, faced this problem of synchronization on its prototype machines. Engineer Howard Krum, son of the founder Charles Krum, found an easy method to achieve synchronization. The idea was to frame every character with a start space and a stop pulse. This way the receiver would never be confused about character boundaries. By adding such in-band control signalling within the data traffic, synchronization became possible even for asynchronous systems. A modern example of asynchronous data communication that uses start-stop synchronization is infrared transmission by TV remote controls. Remarkably, no one realized in Krum's own time that the underlying idea was present in the seventeenth-century writings of Robert Hooke and in the workings of Chappe telegraphy.
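Krum's start-stop idea is easy to sketch in code. The framing below uses hypothetical 5-bit codes rather than real Baudot, but the start space (0) and stop pulse (1) play exactly the roles described above:

```python
# Start-stop framing: each 5-bit character is preceded by a start bit (0)
# and followed by a stop bit (1), so the receiver can find character
# boundaries in an otherwise asynchronous stream.

def frame(chars, code):
    bits = []
    for c in chars:
        bits += [0] + code[c] + [1]      # start bit, 5 data bits, stop bit
    return bits

def deframe(bits, decode):
    chars = []
    i = 0
    while i + 7 <= len(bits):
        if bits[i] == 0 and bits[i + 6] == 1:       # valid start/stop pair
            chars.append(decode[tuple(bits[i + 1:i + 6])])
            i += 7
        else:
            i += 1                                   # hunt for the next start
    return ''.join(chars)

code = {'H': [1, 0, 1, 0, 0], 'I': [0, 1, 1, 0, 0]}  # hypothetical codes
decode = {tuple(v): k for k, v in code.items()}
line = frame("HI", code)
print(line)                     # 14 bits: start + data + stop per character
print(deframe(line, decode))    # HI
```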

The growth in data communications through the sixties can be attributed to a number of related things—the increasing use of computers, the move from batch processing to time-sharing, the need for interactive computing, and fast, flexible transmission lines such as T1/E1. Fundamentally, the impetus came from the move from analogue to digital. All sorts of media could be reduced to bits. In digital form, information could travel far without significant impairment. Accessing computers remotely through communication lines was an idea that had found its time. Two things were necessary to make this possible—modems and multiplexers. Both were becoming digital. Both saw steady improvements through the decade. Error-correcting codes too were going through their golden era of development.

The idea of connecting computers over telephone lines can be traced to a defence project named the Cape Cod System in the early fifties. This was an experimental system that later led to the Semi-Automatic Ground Environment (SAGE) system. SAGE became fully operational in 1963. It was designed to spread a radar net over American airspace to detect incoming threats. Twenty-three radar sites were interconnected over telephone lines. The transmission rate was only 1,600 bps using FSK modulation. This later inspired SABRE, an airline reservation system that networked 2,000 terminals across sixty locations. But sending data over telephone lines was fraught with problems. The lines were designed for voice, and data had quite different needs.

To start with, loading coils introduced an upper frequency cut-off beyond which line attenuation shot up dramatically. A similar cut-off at the lower end of the band was seen in analogue carrier systems. Loading coils also introduced phase distortion. Human hearing had tolerated such distortions on voice, but data was less forgiving. To counteract this, equalizers had to be added to the lines. Carrier systems that used noncoherent detection had worked well for voice but were detrimental to data transmission. Companders, so common on voice circuits, caused problems for multilevel data transmission. Impulse noise affected data much more than voice, and there had been no significant research on this type of noise. Then there was the problem of echo suppressors on two-wire circuits.

Normally in a voice conversation, only one person talks and the other listens. So when Alice is talking, her return echo is suppressed. When Bob starts talking, the circuitry detects this transition and allows Bob's speech to travel to Alice while enabling echo suppressors in the reverse direction. A circuit in which both parties can talk at the same time is termed _full-duplex_. A conversation in which parties take turns is termed _half-duplex_. There are also systems in which only one party can talk and the other can only listen, such as listening to the news on TV. Such systems are termed _simplex_. What echo suppressors did was to convert a full-duplex circuit into a half-duplex circuit on a voice-switched basis. This worked for humans but was far from ideal for computers and data communication.

AT&T was keen to tap into the growing data market by allowing data transmission over telephone lines; but first, many of these issues had to be studied. Early research indicated that a bit rate of 650 bps was possible on telephone lines. This used a train of Gaussian pulses amplitude modulating a 1,200 Hz carrier. Interestingly, this study did not look at user data at all. It looked at signalling data. Indeed, data transmission on telephone lines is as old as Strowger switches. Every time a caller dials the callee's number, she is actually sending data to the exchange over a normal telephone line. Then came the famous 1960 study of Alexander, Gryb, and Nast.

Just as AT&T was releasing its dial-up modems, it wanted to convince the market that telephone lines were suitable for data. To this end, the study of 1960 sought to inform people of the capabilities of the telephone network. The study showed that transmission at 1,200 bps was possible using FSK modulation. About 70% of the time, only one bit out of 70,000 was in error. At 600 bps, the BER performance was three times better. The study demonstrated the positive effects of equalizers and error-correcting codes. Over long-haul connections, if data didn't get through reliably, the sender might be asked to retransmit. The number of retransmissions could be reduced by using suitable codes that corrected burst errors. Being an engineering study, it did not propose anything substantially new. All it did was to shoehorn data onto voice circuits using switching methods designed for voice. It did not recognize the important fact that machines communicated in ways very different from humans.

The telephone switching network takes a long time to set up a voice call. Today it can be done in a few seconds, but back in the sixties a long-distance call might take half a minute or more to set up. Those were times when fast electronic switching was still in the research phase. Once a call was set up, the circuit was kept alive for the entire duration of the call. Even when both parties were silent, the circuit would be kept alive until someone decided to hang up. The switching fabric might be time-division or space-division; that was internal to the switch. At the level of the call, voice had a dedicated connection end-to-end. Engineers called this _circuit switching_. Because of these long connection setup times, operator-assisted calls normally imposed a minimum call charge. For example, even if one spoke for only a minute, the call would be charged for a minimum of three minutes. Statistically, voice calls on average last between two and three minutes.

So long as the voice call was in progress, its allocated circuit could not be used for any other call. To employ such dedicated resource allocation for data was simply wasteful. Computers often communicated in short bursts of data exchange. Today if one sends a Twitter update of 140 characters, the transmission happens in a flash. Then the user might go idle for long periods of time. Fortunately, telegraphy had been using _message switching_ from the very beginning. Telegraphy didn't need end-to-end connections and therefore did not reserve circuits end-to-end. Telegraph messages, however, occupied storage space at intermediate stations. Messages could be combined with other messages and transmitted together. At a later point, they would be separated and forwarded to their individual destinations. For this reason, message switching is also called _store-and-forward switching_.

The first experiments in sending data over telephone lines used circuit switching. AT&T suggested this approach even though the circuit remained idle during most of the connection. AT&T didn't care so long as subscribers paid their telephone bills. From the perspective of bandwidth efficiency, message switching was a better way to connect computers, but it didn't provide the interactivity that users wanted with time-shared computers. Messages might get buffered at intermediate stations and arrive at destinations leisurely. Message switching might have made better use of communication channels but it wasted computer time. In any case, message switching was not compatible with the telephone network. Clearly, none of the known methods of switching worked well for computers. When old ways don't work in newer times, there is a need to evolve and innovate. This is exactly what happened in the early sixties, and it happened independently in three places.

Paul Baran's first encounter with computers was with the UNIVAC I, completed in 1951. It was a successor of the ENIAC but built on the stored program concept. It was one of the earliest computers to become commercial. Baran analysed the UNIVAC and found that it was too unreliable to become a commercial success. One primary reason was that it was built from thousands of unreliable vacuum tubes. A decade later, now with the RAND Corporation, he focused his attention on the reliability of networks. This was the time of the Cold War and the US government was concerned that a massive attack on the country would paralyse all communication channels. In fact, earlier atomic bomb testing in the Pacific had shown that radiation from the blast disrupted shortwave wireless communication for hours. Baran started thinking about a network that could survive large-scale attacks.

The first problem Baran attacked was the way network nodes were interconnected. A telephone network had a centralized topology since all subscriber lines in an area connected to the same local exchange. Local exchanges connected to primary exchanges one level higher. In this way, the telephone network was centralized and hierarchical. Such a network would not survive an enemy attack: it would suffice to take down a few critical exchanges to disrupt communication. Baran saw that the solution lay in a distributed mesh-like topology in which nodes were interconnected through multiple communication lines. This redundancy meant that connectivity would be maintained even if some lines were destroyed. This change of topology had a bigger implication. It meant that even if individual connections were unreliable on their own, a network of connections built with sufficient redundancy was collectively reliable.

Typical Network Topologies

(a) Office networks involving computers, servers, and printers share a common channel called a _bus_. (b) Satellite networks implement a _star_ topology in which the satellite forms a centralized node for switching. (c) The telephone network is centralized but also hierarchical. (d) With a _mesh_ topology, multiple routes and connections exist within the network. This is the model of the Internet.

It was then natural to ask about the routes messages would take through a network that had multiple paths. The nature of the problem was not any different from message switching in telegraph networks. The difference was that in this case messages should suffer minimal delay or buffering at nodes. Baran introduced what he called _"hot potato" routing_. The idea was that no one can hold a hot potato with bare hands without getting burnt; the hot potato must be quickly tossed over. The analogy to network routing was that any message arriving at a node must be treated like a hot potato. The node must toss the message quickly to another node on the way to its destination. If a particular route was busy, the node should not wait. It should use an alternative route and transmit the message immediately. Equally important was the fact that nodes could learn routes to destinations on their own, without centralized control. Each message carried with it a record of the number of nodes it had crossed on its way from the source. This in-band control information assisted each node in learning about the network. By extensive computer simulations of a network of fifty nodes, Baran showed that nodes could learn the routes in under half a minute. Better still, when nodes or links went down, nodes would relearn the best alternative routes and update themselves. A combination of redundancy, self-learning ability, and decentralized maintenance of routes was the perfect foil to the rigidity of circuit switching.
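The forwarding rule at a single node can be sketched as follows. The link names and route table are hypothetical, and a real node would also update its routes from the hop counts carried by messages:

```python
import random

# Toy "hot potato" forwarding at one node: prefer the best-known link
# toward the destination, but never hold the message - if that link is
# busy, toss the message onto another free link at once.

def hot_potato_forward(message, route_table, busy_links):
    """Pick an outgoing link immediately; assumes at least one link is free."""
    preferred = route_table[message['dest']]     # learned best next hop
    if preferred not in busy_links:
        return preferred
    alternatives = [l for l in route_table.values() if l not in busy_links]
    return random.choice(alternatives)           # toss the hot potato anyway

route_table = {'nodeB': 'link1', 'nodeC': 'link2', 'nodeD': 'link3'}
msg = {'dest': 'nodeB', 'hops': 4}   # the hop count rides with the message
print(hot_potato_forward(msg, route_table, busy_links={'link1'}))
```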

There was an alluring subtlety to the discovery and maintenance of routes. Years ago, Shannon had built a mechanical mouse that could make its way through a maze. When it hit a barrier, it remembered. When it found a path, it remembered. In this way, the mouse kept track of the routes through the maze until it arrived at its goal. Moreover, if the maze was modified, Shannon's mouse could relearn the routes. But Nobel Laureate Dennis Gabor later remarked, "In reality it is the maze which remembers, not the mouse." Since the routes are already laid out by design, Gabor was right in some sense. For network routing, messages don't remember the routes. All a message carries with it is a destination address. Routing information is embedded within the network in a distributed fashion at each node. Routing was almost like postal and telegraph dispatches—the customer needed to specify only the recipient's address while the system took care of the means of delivery. Only one piece of the design was missing and it was to be Baran's greatest contribution.

Messages, used in the general sense of the word, are complete entities of information between communicating parties. An email is a message. An electronic document is a message. A JPEG picture is a message. The problem with messages is that they have no defined length. Many messages are short while a few are extremely long. This means that long messages at nodes can keep short ones waiting, which translates to a loss of interactivity. Computer users would get impatient at their terminals. While a node is servicing a long message, short messages might accumulate and fill up node buffers, much as in the message switching of telegraphy. Baran proposed an innovative way to overcome this:

Traffic to be transmitted is first chopped into small blocks, called Message Blocks, or simply messages. These messages are then relayed from station to station through the network with each station acting as a small "post office" connected to adjacent "post offices." Messages will eventually arrive at the desired destination. In the proposed system, the transmission time and storage time at the nodes is so short that to the user there appears to be a direct link between the originating station and the end station.

Across the Atlantic, at the British National Physical Laboratory, Donald Davies started with a different focus. He wasn't trying to build a robust network that could survive a nuclear holocaust. Recent deployments of digital transmission lines and advances in time-shared computers fed into his thought process. He envisioned a network that could connect time-shared computers, user terminals, synchronous peripheral devices, and asynchronous teleprinters. In other words, devices that connected to the network were quite diverse. The goal was to satisfy each one of them by using a combination of multiplexers, buffers, high-speed digital transmission lines, and network nodes responsible for routing of messages. In Davies' terminology, the basic unit of network transmission was not called a message. He gave it a new name that stays with us today: _packet_.

Essentially, packets were what Baran had named message blocks. In fact, both were defined to be 1,024 bits long, a typical message length in time-shared systems. When devices generated longer messages, these would be broken up into smaller packets. Each packet would take its own route through the network to arrive at the destination. There the packets would be reassembled in the correct order to form the original message. This sort of chopping up and stitching together of packets had many advantages. Each packet found the best route at the point in time when it passed through the network. Being small, no packet held up other packets. This also meant that nodes needed fewer buffers than in message-switched systems. Buffers, when required to meaningfully coordinate asynchronous devices, were moved to the devices themselves or to network interfaces. For example, in some applications there was no point transmitting every character that the user typed. It made sense to buffer the typed characters until the user provided a newline character. Only at that point would computer processing be invoked and network resources allocated.
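The chopping and stitching is simple to sketch. The 1,024-bit packet size follows the text; the header fields here (a sequence number and a total count) are a hypothetical minimum:

```python
import random

PACKET_BITS = 1024   # packet size from the text

def packetize(message: bytes):
    """Chop a message into 1,024-bit (128-byte) packets with sequence numbers."""
    size = PACKET_BITS // 8
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [{'seq': n, 'total': len(chunks), 'payload': c}
            for n, c in enumerate(chunks)]

def reassemble(packets):
    """Put packets back in order; a real receiver would request retransmission."""
    ordered = sorted(packets, key=lambda p: p['seq'])
    assert len(ordered) == ordered[0]['total'], "a packet is missing"
    return b''.join(p['payload'] for p in ordered)

message = b'x' * 500            # a 4,000-bit message
packets = packetize(message)    # becomes 4 packets
random.shuffle(packets)         # each packet takes its own route
assert reassemble(packets) == message
```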

Another important advantage was the error resilience of a packet. Voice was tolerant of errors. When 8,000 samples are taken per second, even if a hundred samples are corrupted by noise, the human ear hardly perceives the loss. Data was different. Every bit mattered, more so if it happened to be control information or compressed data. Loss of critical data can wipe out bank balances, corrupt user identities, and render documents invalid. It was therefore important to receive every bit free of errors. A long message of 10,000 bits was more error-prone than a smaller one of 1,000 bits. If a message got corrupted by noise, there was no alternative but to ask the sender to retransmit the whole thing. Packet transmission overcame this problem elegantly. The use of small packets reduced the probability that a single transmission would go to waste. If it did, it was sufficient to retransmit only the corrupted packet instead of the entire message.
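A rough calculation shows why. Assuming a bit error rate of one in ten thousand (a number chosen only for illustration):

```python
# With a bit error rate p, a block of n bits arrives clean with
# probability (1 - p)**n, and a corrupted block must be resent whole.

p = 1e-4                                  # assumed bit error rate

ok_long  = (1 - p) ** 10_000              # one 10,000-bit message
ok_short = (1 - p) ** 1_000               # one 1,000-bit packet

print(f"10,000-bit message survives: {ok_long:.1%}")    # ~36.8%
print(f" 1,000-bit packet survives:  {ok_short:.1%}")   # ~90.5%

# Expected bits sent per delivered block = n / P(success).
print(f"cost as one message : {10_000 / ok_long:,.0f} bits")
print(f"cost as ten packets : {10 * 1_000 / ok_short:,.0f} bits")
```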

The third person to analyse network behaviour was Leonard Kleinrock, born in Manhattan to Ukrainian Jewish immigrants. Right from the age of eleven, Kleinrock had to work and study at the same time to make ends meet. For his bachelor's, he took evening classes at the City College of New York while holding a regular day job. He soon realized that the evening faculty were often drawn from industry. He appreciated that they saw beyond the veil of theory and knew what really worked in practice. When he got into MIT for his PhD, he was spoilt for choice. The faculty was a stellar cast, the best minds in information theory—Shannon, Wiener, Huffman, Elias, and Wozencraft, among others. He briefly worked under Shannon but the prospect of doing a dissertation on the game of chess didn't appeal to him much. Given the growth of minicomputers and time-sharing, he saw that computers would not remain isolated for very long. It looked likely that large networks of computers were just around the corner. With this view of the future, Kleinrock did some preliminary work and put out a proposal in 1961.

Kleinrock's view of the network had a strong theoretical approach. At its heart is a branch of mathematics called _queueing theory_. The theory is intimately connected to random processes, probability, and statistical analysis. Statistics is a far more powerful weapon than deterministic methods, since the latter are often impossible to apply to a large network of nodes. The process by which terminals generate messages is statistical. Message lengths vary and hence follow some statistical distribution. Messages arrive at nodes along multiple communication channels. Network routers buffer them in queues. When their turns come, they are sent out on the relevant channels. Channels have varying bit rates. The complexity comes because all these different aspects of the network are interrelated. Some reasonable assumptions of independence must be made to reduce the problem to a tractable form. This is where Kleinrock started his research. He adapted prior knowledge in queueing theory to data networks in a manner no one had done before. This is his important contribution to data communications.
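The book gives no formulas, but the flavour of such analysis can be seen in the simplest queueing model, the M/M/1 queue; the result below is a standard textbook one, not a quote from Kleinrock's work:

```python
# An M/M/1 node (Poisson arrivals at rate lam, exponentially distributed
# service at rate mu) has mean time through the node T = 1 / (mu - lam).

def mm1_mean_delay(lam: float, mu: float) -> float:
    assert lam < mu, "queue is unstable: arrivals outpace service"
    return 1.0 / (mu - lam)

mu = 10.0   # the node can service 10 messages per second
for lam in (1.0, 5.0, 9.0, 9.9):
    print(f"load {lam/mu:.0%}: mean delay {mm1_mean_delay(lam, mu)*1000:.0f} ms")
# Delay explodes as load nears 100% - the statistical heart of the design.
```

Then came the famous controversy that surprised everyone.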

Today, almost all data networks run on _packet switching_. Baran first conceived the idea of breaking up messages into smaller blocks in 1962. Davies did the same in 1965 and coined the term _packet_. Kleinrock never explicitly stated this key innovation. His PhD thesis and later publications deal primarily with messages whose lengths vary and follow an exponential distribution. Then in 1996, he made a bold and belated claim that he was the inventor of packet switching since he had grasped the idea as early as 1961. The world of computer networking was immediately divided into two camps—those who supported Kleinrock and those who supported Baran and Davies. Davies himself gave Baran credit for inventing packet switching. Davies viewed Kleinrock's work as essentially message switching. If there is any validity to Kleinrock's claim, it lies in an imaginative interpretation of the way each node processes messages in its queue.

Different Ways of Switching Voice and Data

(a) With circuit switching, each call has a dedicated path through the network. (b) When TDM trunks are used, calls will share the same trunk but will still have dedicated circuits. (c) Message switching is a connectionless service. Messages are stored at intermediate nodes before being dispatched to destinations. (d) With virtual circuit switching, messages are broken up into smaller packets and all packets take the same route. (e) Packet switching uses small packets, each taking its own route through the network.

In general, when a node is ready to offer service, it has to decide which message in its queue should be serviced next. The manner in which a queue is serviced is variously called _queueing discipline_ or _scheduling algorithm_. This is one of those areas of queueing theory that has enormous practical importance. Under reasonable assumptions, it helps managers decide how many service counters should be made available in a post office. It helps them decide if it is better to have separate queues for bill payments and postal services, or combine them into a single queue. The objective is faster service to customers while keeping all the tellers busy. Scheduling algorithms are directly relevant in time-shared computers, multitasking operating systems, and wireless data channels shared by many users in an area. In all these scenarios, the scheduler must decide who gets how much of computing resource or communication bandwidth, without either starving some at the expense of others or making inefficient use of system capacity.

When many messages are in a queue, a simple strategy is to service the first one in the queue. If messages have priorities, high-priority messages are serviced first. With messages of varying lengths, Kleinrock suggested that each message be serviced for a fixed time period. If the message was long, it would be partially serviced and the rest of it returned to the queue for the next round of service. In doing so, Kleinrock extended a concept familiar from time-shared computing, where this idea of partial fixed-time service went under the name of _Round Robin Scheduling_. Davies claimed that Kleinrock's idea of breaking up messages in this manner was limited to a single node. There was no extension of the concept to the level of multiple nodes and its impact on node buffering and routing. Kleinrock retorted that there was no reason for him to consider a single node when his entire research had been on networks.
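A sketch of such round-robin service, with message lengths and the time quantum chosen for illustration:

```python
from collections import deque

# Round-robin service of a message queue: each message gets at most
# `quantum` units of service per turn; any remainder goes back to the
# end of the queue for the next round.

def round_robin(messages, quantum):
    """messages: list of (name, length); returns the order of completion."""
    queue = deque(messages)
    finished = []
    while queue:
        name, remaining = queue.popleft()
        if remaining <= quantum:
            finished.append(name)                      # short messages finish early
        else:
            queue.append((name, remaining - quantum))  # back of the queue
    return finished

# A long message no longer holds up the short ones behind it.
print(round_robin([('long', 10), ('short1', 2), ('short2', 1)], quantum=2))
# ['short1', 'short2', 'long']
```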

Priority of invention is important for researchers because peer recognition is an essential part of their reward system. Researchers are certainly not as altruistic as Harry Truman might have desired. The American President once observed, "It is amazing what you can accomplish if you do not care who gets the credit." Most engineers who read about inventions focus on the context. They wish to get into the minds of scientists and the process of discovery. Priority to them is just an interesting side story. As for users, they usually don't care too much about what's under the hood. They are happy so long as technology works. Thus it happens that all of us may be on the same page, but each one reading the portion of it that suits individual goals.

Packet switching was conceptually proven through simulations but no one had actually built a real network. Davies and his team experimented with a pilot network but, lacking funds, it went nowhere. The British Post Office was a monopoly that believed only in circuit-switched technology. There were doubts whether packet switching would work in the real world. Engineers are sometimes suspicious of new research, partly because it challenges their current knowledge and forces them to adapt. Circuit switching set up a definite path from source to destination at the start of every call. Messages then followed this path without any problem. Packets did not have this luxury. Routing had no centralized control of the kind found in telephone exchanges. Questions were raised. What if nodes got the routes wrong? Could packets go around in circles? What if buffers filled up or nodes locked up due to overload? Without centralized control, how would the system recover? Despite these doubts, the time was ripe for the birth of a packet-switched network.

When Robert Taylor became the director of ARPA's Information Processing Techniques Office (IPTO) in 1966, he noted that he had three different terminals on his desk. Each one connected to a different time-shared computer. IPTO itself had been created only four years earlier with J. C. R. Licklider as its first director. With the growing volume of data, IPTO's objective was to put computers to work on the processing and analysis of data. It was through Licklider's brilliance that research projects were funded in institutions all across the country. One of its great projects was Project MAC, which brought out the CTSS time-sharing system at MIT. Licklider envisioned an "intergalactic network" that would connect millions of computers and make information accessible from anywhere by anyone. As Taylor took over the reins, he too saw that the future lay beyond time-sharing. It was going to be about connecting computers together. When he looked at the three terminals on his desk, he therefore asked why not have a single terminal to access all three computers.

In a now legendary meeting, Taylor walked into the office of ARPA's director, Charles Herzfeld, and proposed the idea of connecting computers together. Within fifteen minutes, Herzfeld sanctioned a million dollars for the project. The first thing Taylor did was to recruit Lawrence Roberts, who was then at MIT. Roberts was quite comfortable at MIT and didn't want to move. In another legendary tale, Taylor got Herzfeld to blackmail the director of MIT Lincoln Laboratory into sending Roberts over; otherwise, ARPA would stop funding the lab. Roberts arrived at ARPA's office in Washington within a couple of months. Only the previous year, Roberts had connected two computers to exchange messages over a leased line. One was at MIT and the other at the System Development Corporation in California. The experiment showed that performance was poor. Importantly, it did not use packet switching, which is somewhat surprising since Roberts knew Kleinrock; both were then at MIT. If this experiment had been only a marginal success, the challenges that Taylor faced were of a different order.

Computers were meant for important data processing. Not everyone wanted to allocate their precious computing resources to managing routes. In fact, very few were even convinced that computers needed to be interconnected. This pushed Roberts to conceive of the _Interface Message Processor (IMP)_. The IMP would be a special device that formed the entry and exit point of the network. It received messages from computers and took care of routing. There was no need to connect every computer to every other computer; such a mesh network is not scalable. Rather, a few interconnected IMPs would do the routing of messages through the network. Computers, which were also called _hosts_, could carry on with their precious computations without worrying about routing. This was the conceptual beginning of ARPANET. It happened in 1967 and would later evolve into the modern Internet. For the present, the immediate task was to build these IMPs.

It is not exactly clear when and how Roberts got the idea of packet switching and incorporated it into the design of ARPANET. Roberts later claimed he got the idea from Kleinrock, who was involved in the initial deployment of ARPANET. Others claim that packet switching got its first public announcement at the Gatlinburg ACM Symposium in October 1967, where NPL researchers presented Davies' idea. The ARPANET team was also present at this symposium but none of their papers gave any hint of packet switching. In any case, IMPs were tasked with the job of segmenting messages into packets on their way out. IMPs connected to destination hosts would also reorder and reassemble packets. Intermediate IMPs would do the routing. Serious work on building IMPs started in 1968 when companies were invited to send in their technical proposals. All they were given were high-level requirements. Big companies, including IBM, chose to ignore the invitation. From the few who bid for the contract, the small firm of Bolt Beranek and Newman (BBN) was picked as the winner.

Robert Kahn at BBN was intimately involved in this project. He started by building a simulation model. It was a lot wiser to make and correct mistakes in simulations than in the real world. The first IMP was delivered in September 1969 to the University of California, Los Angeles (UCLA). It was Kleinrock at UCLA who took delivery of this IMP. A single IMP did not make a network; it had to wait for a partner. The second IMP was installed at the Stanford Research Institute (SRI) a month later. The first host-to-host communication on ARPANET was not without surprises. The intention was to type "login" but when the character _g_ was typed at UCLA, the host at SRI crashed! It took engineers an hour to rectify the problem. With this, the first ever message on ARPANET was transmitted on October 29, 1969. A milestone in the early history of the Internet had been marked. By December, ARPANET consisted of four nodes connected by 50 kbps leased lines.

The fundamental change in the process was that computers were now starting to communicate. It was no longer just about computations. Computers of the past had been using _client-server_ models, sometimes called _host-terminal_ models. One party controlled the interaction and the other obeyed. This was centralized control, and it was not deemed suitable for ARPANET. ARPANET saw all IMPs as equals. They were rigged up together in the network in a _peer-to-peer_ configuration. Responsibilities were shared and no single IMP asserted superiority. When computers start to communicate in this way, they need a common language. IMPs had no trouble talking to one another because they all came from BBN. The problem was with the host computers. Each host was a different computer. Researchers at each site had to write programs to adapt their host interfaces to IMP specifications. Clearly, a standard of communication was required for all hosts across the network.

It was in April 1969 that a graduate student at UCLA, Stephen Crocker, sent out by postal mail a document on the handshaking between an IMP and a host. The document itself was like an invitation, named a _Request for Comments (RFC)_. Crocker's mail immediately set the tone for networking research. From the outset, standardization was to be a collaborative effort. The process was democratic. Comments were always welcome. Authors initiated documents and standards but it was the research community that finally owned them. In a sense, even the process of standardization mirrored the underlying philosophy of ARPANET. Everyone was a peer. Supervisors, if any, would exist only for administrative purposes. By the summer of 1970, the _Network Control Program (NCP)_ was finalized. Any host that implemented this software would be able to talk to another host on ARPANET. NCP thus became a primary enabler of ARPANET's early growth.

By the close of 1971, ARPANET had 15 nodes connecting 23 hosts. A year later, there were 37 nodes. In 1972, a public demonstration took place. The following year, ARPANET and packet switching were proven over the Atlantic. The link from Virginia to Brighton used a mix of satellite links and leased lines.

ARPANET had been designed to enable researchers to access remote resources, share documents, and exchange research findings. What packet switching and NCP had provided was the basic infrastructure to interconnect sites. The missing link was application software to empower users to do what they wished to do. In these early years of the 1970s, two powerful applications emerged. _File Transfer Protocol (FTP)_ enabled users to download or upload files to remote hosts. TELNET enabled users to log into remote hosts and run their programs or commands interactively. The third major application that arrived on the scene surprised everyone. It would soon become the "killer app" of ARPANET.

With the coming of time-shared computers in the sixties, users started leaving messages for one another on the computer they shared. This evolved into the exchange of messages across computers. Often this was little more than transferring files from one computer to another. On the face of it, it appeared to be a modern extension of the personal exchanges and "online chatting" that telegraph operators had been doing for decades. If we are asked to trace the genesis of email in its modern form, perhaps it all began with Ray Tomlinson of BBN. It was Tomlinson who introduced the use of the "@" symbol in email addresses. The symbol separated the recipient's name from her location in the world of computers. At first, email seemed a rather frivolous use of computing and communication resources. In fact, when it went public on ARPANET in 1972, it was piggybacked on FTP. Its importance was not obvious.

An email address created a user identity in cyberspace, that is, in the world of computers. Email brought the feeling that one interacted with other users rather than with the computer itself. This was a new alternative to picking up a telephone or sending a telegram. For the first time, the strict separation between humans and computers was blurred. Humans could exist, as it were, inside computers. With the creation of alternative identities, people realized that there was a deeper reality beyond the world in which we live. There was no requirement that cyber identities have any relation to real-world identities. Users could give themselves fanciful names such as _prettygirl_, _bigbully_, _godman_, or _startwinkle_. Though virtual reality would take many more years to develop, and is often associated with graphics, at least in a broad sense its beginnings are in the simple email. The greater impact of email was the way it revolutionized human interaction. ARPANET had been designed to enable computers to communicate. No one had imagined ARPANET as a tool for human communication. Yet this is exactly what happened. It turned out that to communicate was humanity's basic need.

This need was underscored in September 1973 when Kleinrock and Roberts interactively communicated using a program called TALK. Kleinrock had just returned from a conference in Brighton, England, when he realized that he had forgotten his razor. From his home in Los Angeles, he got in touch with Roberts, who was still at the conference. The two communicated by text across the Atlantic via a satellite link. This became the world's first interactive online chat and it enabled Kleinrock to get his razor back. They weren't talking in the usual telephonic sense but there was a clear sense of personal contact and almost real-time interaction. Only a decade earlier, PCM/TDM had shown that speech could be transmitted in digital form. TALK showed that interactive human conversation could take shape purely in the form of bits, transmitted and consumed as bits. This was the fundamental way in which emails and chats transformed human communication. Telegraph operators had done this in the past but now everyone could communicate this way directly.

About the same time, halfway across the Pacific on the tropical islands of Hawaii, a different type of research was underway. In 1968, Norman Abramson of the University of Hawaii had started a project with the intention of connecting computers across four islands. He was assisted in this endeavour by W. W. Peterson. Both Abramson and Peterson were respected coding theorists. Their books on coding theory published in the early sixties are classics even today. Now in the late sixties, they turned their attention to time-shared computers and computer networks. The difference was that they were attempting to connect computers by wireless links.

With leased lines, every host had a dedicated line going to its IMP. IMPs connected to one another using dedicated lines, though these lines were shared at the level of hosts and applications. The essential problem with wireless is that access itself is shared. It is a _multi-access channel_ in which many users compete for transmission bandwidth. So if both Alice and Bob attempt to transmit at the same time to a receiver, their packets will interfere with each other. This is termed _packet collision_. The receiver may be able to pick out some bits but most bits may be in error. Alice and Bob have to take turns to transmit, but a new mechanism had to be invented to coordinate the actions of remote parties. An easy way would be to timeshare the channel across all users in specific time slots. Another option is to poll every user regularly and reserve time slots when users wish to transmit. Both approaches fall short of the goals of networking. There is reliance on a central controller to perform channel assignments. Fixed reservation does not consider the bursty nature of data communication.

The approach that Abramson chose for the Aloha System was _random access_. When terminals were ready to send packets, they simply sent them on the air. The intended recipient sent an acknowledgement if the packet was received correctly. If not, the sender waited for a random time interval and retransmitted the packet. From the outset, the Aloha System was packetized so that no terminal with a long message would hog the channel. In fact, the system used a modified ARPANET IMP, enhanced with a wireless transceiver subsystem. Aloha packets used CRC for detecting errors, so that the probability of an undetected error was about one in a billion.
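
The flavour of this random-access scheme can be captured in a few lines of Python. The sketch below is purely illustrative and is not the Aloha System's code: the channel is divided into time slots for simplicity, a slot with exactly one sender succeeds, and colliding senders back off for a random number of slots before retrying.

```python
import random

# Toy model of random access on a shared channel (illustrative only).
# Each terminal has one packet to send. If two or more transmit in the
# same slot, the packets collide and each terminal retries after a
# random backoff, exactly the recovery rule described above.

random.seed(1)
pending = {"Alice": 0, "Bob": 0, "Carol": 0}  # terminal -> earliest attempt slot
slot = 0
while pending:
    senders = [t for t, s in pending.items() if s <= slot]
    if len(senders) == 1:
        print(f"slot {slot}: {senders[0]} transmitted successfully")
        del pending[senders[0]]
    elif len(senders) > 1:
        print(f"slot {slot}: collision between {', '.join(senders)}")
        for t in senders:  # no acknowledgement arrives: random backoff
            pending[t] = slot + random.randint(1, 4)
    slot += 1
```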

The Aloha system inspired Robert Metcalfe at Xerox PARC to conceive of a similar system to interconnect computers at PARC. The motivation was that technology had made computers fast and efficient, but computers were quite slow when it came to talking to one another. A printer might print an A4 page in a few seconds, but it took fifteen minutes to send that data from a computer to the printer. It was in this context that Metcalfe wanted to interconnect digital machines efficiently. Unlike Aloha, Metcalfe could use wired media, and coaxial cables fulfilled the role well. These cables had been in use since the 1940s in FDM carrier systems. As undersea cables, they were intercontinental lifelines for the global telephone network. Now they were recast from the world of analogue to the digital world of bits. Coaxial cables offered a high bit rate of almost 3 Mbps, which was much higher than what ARPANET was using at the time. All devices would share the same cable and employ random access. The network itself was in bus topology, whereby all devices tapped into the shared cable. So if any device went down, none of the others would be affected. Each packet would carry the receiver's address and only the addressed recipient would process the packet. Metcalfe summarized the essential change that was happening in networking,

Just as computer networks have grown across continents and oceans to interconnect major computing facilities around the world, they are now growing down corridors and between buildings to interconnect minicomputers in offices and laboratories.

With Metcalfe's networking program, computers at PARC were able to talk to one another. All computers could access a shared printer. Anyone could log into any machine from a single terminal. Sharing documents electronically became easy. This was the starting point for the transformation of the office environment. Metcalfe's system was operational by 1973. This development happened almost at the same time that PARC's Smalltalk was born and Alan Kay unveiled the Alto's GUI. Such a network interconnecting computers within a small area came to be called a _Local Area Network (LAN)_. Metcalfe's system was standardized years later as _Ethernet_, which is today the dominant LAN technology. Wireless LAN evolved from here. Thus, the technology of modern Wi-Fi hotspots can be traced directly to the early work in the islands of Hawaii. In the UK, Maurice Wilkes of Cambridge University developed a popular LAN technology in ring topology, but it remained within universities. There was no drive to standardize it commercially. Ethernet emerged as an international standard simply because DEC, Intel, and Xerox got together and standardized it.

These developments opened the eyes of the ARPANET community. Their original intention had been to create a single network to share resources. Now they were discovering a multitude of networks cropping up in various places. There were satellite networks commonly used by the military. There were ground-based wireless networks such as Aloha. Then there were LANs like the one at PARC. Interconnecting these diverse networks was not trivial. Since IMPs took full responsibility for packet delivery, a heavy burden was placed on the network. Ethernet, on the other hand, placed that responsibility on the devices themselves. Their interconnection was just a coaxial cable. Inspired partly by the way Ethernet did things, ARPANET made a fundamental change. Hosts would do all packet-level processing, including acknowledgements and retransmissions. IMPs would handle the routing. This was like saying that the postal system would deliver any registered mail but only the recipient would sign and return the acknowledgement slip. Post offices along the way would not generate acknowledgements.

By now, Robert Kahn had left BBN and joined the ARPA team. Collaborating with Vinton Cerf, the two formed one of the famous partnerships in technological history, comparable to that of Morse and Vail, or Bell and Watson. What they were looking for was a single protocol that all hosts and IMPs understood. In 1974, they proposed a new protocol for networked devices and named it _Transmission Control Program (TCP)_. TCP was a host-to-host language and it would guarantee packet delivery. While its predecessor NCP had relied on the IMP subnetwork for reliable delivery, TCP placed that responsibility entirely on the hosts. This marked the beginning of the shift from IMP-to-IMP semantics to end-to-end semantics. Packet processing was slowly moving from inside the network to its periphery.

To solve the problem of diversity, Cerf and Kahn proposed a new network entity called the _gateway_. Just as modems straddled analogue and digital worlds, gateways would straddle two different networks. These gateways would receive packets from one network and convert them into a format that the other network understood. The inner contents of each packet, formatted by the rules of TCP, would remain untouched. Thus, TCP semantics persisted host-to-host but various networks en route spoke in their own languages. Gateways acted as necessary translators. The name "Internet" evolved from this idea. Internet was not a single network. It was an interconnection of diverse networks. TCP was almost a gift to ARPANET, which could not have scaled without it. But not everyone was convinced that TCP was the right solution. One of them was Danny Cohen at the University of Southern California (USC).

Robert Kahn had initiated in 1972 a program to carry packetized voice on ARPANET. As in earlier times, this was driven by a need to encrypt voice, something that couldn't be done easily with analogue waveforms. Cohen was part of this project at USC, by then one of the nodes of ARPANET. Cohen saw that TCP was ill suited to the needs of voice. TCP guaranteed packet delivery by way of retransmissions. These retransmissions brought reliability but introduced delay. If the network experienced congestion, TCP throttled its own packet transmission rate. Voice was a real-time application that was not tolerant of delays or congestion control. The unspoken truth wasn't really that data was tolerant of delays and voice wasn't. It's just that computers didn't mind delays but humans lost patience. A telephone line that went silent for ten seconds just because some voice packet was being retransmitted irritated users. TCP worked well for services that required error-free delivery more than immediate delivery. File transfers and emails came in this category. In this context, Cohen

characterized the difference between real-time traffic and reliable data transmission as the difference between milk and wine: you had to deliver the milk quickly before it spoiled even if you spilled some on the way, but you could deliver wine a lot more slowly.

To solve this problem, Cohen set aside TCP and instead created a new protocol that he named _Network Voice Protocol (NVP)_. In August 1974, the first demonstration of packetized voice over ARPANET was carried out between USC and MIT. With a form of delta modulation compressing voice to 16 kbps, the application performed poorly. After all, ARPANET nodes were linked by only 50 kbps leased lines. Fortunately, LPC had already been invented in the previous decade and researchers had subsequently come up with a real-time implementation of it. At 3.5 kbps, LPC gave excellent compression of voice samples. A subsequent demonstration two months later showed that real-time packetized voice could work with good performance. The success of this project woke up the ARPANET community, which felt that NVP should not exist separately from TCP. The two had somehow to be reconciled into a single standard.

As a direct result of this work, TCP was broken into two parts. One part was responsible for reliability, retransmissions, and congestion control. The other part was responsible for addressing and routing. Thus were formed the twin components of the now famous _Transmission Control Protocol/Internet Protocol (TCP/IP)_. This separation was formalized in 1978. As for voice, it would ride on IP but use a new protocol named _User Datagram Protocol (UDP)_, which was explicitly created to handle real-time traffic. Voice riding on top of IP acquired a special name, _Voice over IP (VoIP)_. The lack of commercial solutions to compress voice in real time, the lack of sufficient bandwidth for subscribers, and the generally limited capacity of the early Internet prevented VoIP from becoming an immediate success. Real interest in VoIP started in the mid-1990s. Skype was founded in 2003 to offer VoIP service to customers who had Internet connections. Today, Skype and its competitors offer reasonable voice plus video call services over the Internet. Voice, which had been a circuit-switched service for more than a century, finally became a packet-switched service. If VoIP is perceived as a recent technology, nothing can be farther from the truth. It is VoIP that triggered the birth of TCP/IP. It is almost ironic that ARPANET, which started as a data network, discovered its most important component not from data but from voice.
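
The split survives to this day in the ordinary socket API, and the contrast between the two transports can be sketched in a few lines of Python. The addresses and ports below are placeholders, and the TCP example assumes a listener at the far end; this is a minimal illustration, not production code.

```python
import socket

# UDP: fire-and-forget datagrams, the style suited to real-time voice.
# There is no connection, no acknowledgement, and no retransmission; a
# late voice sample is useless anyway, so none of that machinery is wanted.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"voice sample #42", ("192.0.2.10", 5004))

# TCP: a connection is set up first, delivery is acknowledged, and lost
# segments are retransmitted -- the behaviour that suits files and email.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("192.0.2.10", 2121))
tcp.sendall(b"part of a file that must arrive intact")
tcp.close()
```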

With the break-up of the old TCP into two parts, the power of modular protocol design became obvious to all. Protocols were not to be designed as monolithic monsters of high complexity. They were to be designed in layers, each layer fitting snugly between the layers above and below it. Even with the old ARPANET, protocol engineers had identified these layers. IMP-to-IMP communication was one layer. Host-to-host was another layer. Applications such as FTP formed the top layer. This notion of layered protocol design was formalized and has since become accepted practice for all communication protocols. In the formalized model, applications occupied the top layer and relied on the services of TCP or UDP below them. TCP in turn relied on the IP layer for the routing of packets. Below IP were other layers closely tuned to the requirements of individual networks. Ethernet, for example, used its own layer that sat below IP. The electrical and physical characteristics of coaxial cables formed the bottom layer on which Ethernet packets travelled.

The flexibility that layered design brought was that one layer could be developed independently of another so long as their interfaces were clearly defined. A particular lower layer could offer its services to multiple and diverse higher layers, just as IP offered its services to both TCP and UDP. A highly complex communications system could be broken down into simpler layered components. Layer-to-layer interaction simplified the dynamics of protocol behaviour. Even network devices could be classified based on the primary layer in which they operated. _Routers_ looked at IP packets and therefore they were Layer 3 devices. _Switches_ looked at Ethernet headers and therefore they were Layer 2 devices. _Hubs_ simply repeated incoming packets on to all outgoing lines. They were Layer 1 devices.

Layered design was in some ways similar to office hierarchy. Higher-level managers in different organizations initiated projects. Lower-level engineers jointly coordinated and executed these projects. Once the engineers had completed their tasks, they informed the higher-level managers. Protocol layers interacted in similar fashion. Once IP packets were received, they were passed on to the TCP layer. Once all packets were received, TCP reordered them correctly, reassembled them into a complete message, and passed them on to a higher application such as FTP. FTP did not worry about addressing, routing, or delivery. Since the lower layers did all that, it focused only on transferring files.
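
Seen from the sending side, each layer simply wraps the message handed down to it, like envelopes within envelopes. The toy sketch below illustrates the idea; the header formats are invented for readability, whereas real TCP, IP, and Ethernet headers are binary and carry many more fields.

```python
# Toy illustration of layered encapsulation (header layouts invented).

def tcp_segment(data: bytes, src_port: int, dst_port: int) -> bytes:
    return f"TCP {src_port}->{dst_port}|".encode() + data

def ip_packet(segment: bytes, src_ip: str, dst_ip: str) -> bytes:
    return f"IP {src_ip}->{dst_ip}|".encode() + segment

def ethernet_frame(packet: bytes, src_mac: str, dst_mac: str) -> bytes:
    return f"ETH {src_mac}->{dst_mac}|".encode() + packet

msg = b"RETR report.txt"                       # an FTP command (top layer)
seg = tcp_segment(msg, 1050, 21)               # transport layer adds ports
pkt = ip_packet(seg, "10.0.0.1", "10.0.0.9")   # network layer adds addresses
frm = ethernet_frame(pkt, "aa:01", "bb:02")    # link layer adds MAC addresses
print(frm)  # the receiver peels these wrappers off in reverse order
```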

TCP/IP was only a small part of a _protocol stack_ that may consist of as many as seven layers in a standard model. Competing against TCP/IP, alternatives were defined, mainly at organizations backed by large corporations, governments, and telecom operators. Within IBM, Systems Network Architecture (SNA) emerged with six layers. DEC formulated its own networking protocol of five layers. Years later, in the 1980s, Novell came out with a networking protocol, which was packaged commercially as NetWare. Novell's IPX/SPX was popular for a while in office environments because it simplified network installation and maintenance. In Europe, the Open Systems Interconnection (OSI) Reference Model began to take shape in the early 1980s; closely associated with this world was the X.25 standard. X.25 standardized an important variation of packet switching. Instead of every packet taking its own route through the network, X.25 dictated that a _virtual circuit_ be set up at the start of every call. Unlike circuit switching, end-to-end circuit resources were not reserved. Unlike packet switching, all packets took the same route.

In the end, the virtual circuit approach performed poorly in the face of congestion and network changes. It was only TCP/IP that survived. This was a vindication of the collaborative process that had created it. Packet switching, TCP/IP, and the Internet had all been motivated less by profits and more by what users wanted. They had developed in an open environment that cared little about ownership or personal egos. Many scientists and engineers had worked in an almost philanthropic style, carefully documenting everything they did and producing systems that worked in the real world.

It may appear that TCP/IP was more or less ready by 1978 when it was first formalized, but it took many more years to perfect it. The devil, as they say, is in the details. It was only in 1983 that ARPANET officially adopted TCP/IP. Meanwhile in the late seventies, data communication was happening in a different space and much of this was the work of hobbyists. Microcomputers had made computers affordable for homes and small businesses. When connected to a modem, computers became communicating devices. Dennis Hayes and Dale Heatherington invented a 300 bps modem and offered it at a price home users could afford. Their more famous 1981 Smartmodem simplified dialling and answering by automating the handshaking process. This combination of microcomputers and easy-to-use modems was just what the hobbyists needed.

Users could log into a remote computer and post a message. They could read messages posted by others and type in their own personal comments. This was how the Computer Bulletin Board System (CBBS) came about. It didn't matter to hobbyists that the system used message switching. The use of an old technology in a novel way was in itself exciting; and it came with minimal cost of device ownership and telephone line charges. Just as ARPANET had started a virtual community among researchers, these bulletin boards created communities and interest groups for common people. It is from this system that Fidonets emerged in the early 1980s. Even today, Fidonets continue to exist in remote places where Internet has not managed to penetrate. Internet required network routers and service providers but all that a Fidonet needed was a telephone line.

The timing of ARPANET was almost perfect. It was in the early seventies that both UNIX and C were invented at Bell Labs; and UNIX was almost free for universities. Now that research facilities were getting connected via the ARPANET, UNIX became a common platform on many hosts. The University of California at Berkeley bought a copy of UNIX in 1974. Bill Joy teamed up with a few other students to enhance UNIX. Perhaps the best thing they did was to implement TCP/IP and offer it along with their version of UNIX. Berkeley Software Distribution (BSD) UNIX became so popular that even ARPA recommended it. IBM did not include TCP/IP in its System/370. Neither did DEC in the VMS operating system for its VAX line of computers. Ironically, BSD UNIX was developed and configured to run on a VAX machine.

When the IBM PC was released in 1981, it was not powerful enough to run TCP/IP. In fact, for much of this period PCs were used to access bulletin boards and Fidonets. It was the same with the Macintosh of 1984. The gap was filled by a new breed of computers that were somewhere between the power of minicomputers and the desktop appeal of microcomputers. These were the _workstations_. Among the successful companies making these were Sun Microsystems, Hewlett-Packard, and Silicon Graphics. It was to Sun Microsystems that Bill Joy went upon leaving Berkeley, and he took with him BSD UNIX. This was modified and packaged as SunOS, the forerunner of the later Solaris OS. Naturally, it had built-in TCP/IP. In addition, Sun implemented the Ethernet standard for its workstations. By avoiding in-house proprietary technology and embracing open standards, Sun was making all the right technical decisions. Networking was given due importance from the very beginning. Sun saw that the whole would be greater than the sum of its parts. Even its marketing slogan reflected this: "The network is the computer."

Concepts in Networking

UNIX-based services emerged at about the same time as BBS services. These came to be called Usenet newsgroups. Like BBS, users could post messages and replies. Newsgroups were organized by topics and sub-topics. While ARPA had been maintaining email distribution lists to enable communication within groups, newsgroups changed the model of interaction. They placed control directly in the hands of users, who could choose to subscribe or unsubscribe to any group at will. With data came databases and structured information storage systems. When these needed to be searched, Gopher emerged as a service to query them remotely and find relevant information.

Popular use of UNIX and the rise of Usenet encouraged users, particularly in academic institutions, to share software freely. Against this trend stood the expensive proprietary software of the likes of IBM, Apple, and Microsoft. In a now well-known story, Richard Stallman of MIT found a problem with a new laser printer donated by Xerox. Xerox was not interested in providing a fix. Neither were they ready to release the printer's source code. At this point, Stallman took a philosophical stand never to support proprietary software. In 1983, he set out to develop an entire operating system modelled on UNIX but gave it an interesting acronym, GNU (Gnu's Not UNIX). The core of his philosophy was to create software that was free. Free meant freedom from restrictions rather than zero cost of purchase. Freedom meant "free speech, not free beer."

Stallman explained the concept in terms of cooking recipes. One may obtain such a recipe from a neighbour. During the course of cooking, one may try out slightly different ingredients or modify some steps of the process. Subsequently, others who taste the dish might request the enhanced recipe without even obtaining explicit permission from the original creator of the recipe. Free software meant that people had the right to access source code, were free to make improvements, and then release the improved software back to the community. The essential idea was sharing, not secrecy. Two years after the initiation of the GNU Project, Stallman formally announced the Free Software Foundation (FSF).

There was one loophole in Stallman's vision. There was no system in place to prevent people from obtaining free software, making minor enhancements, and selling the result commercially without releasing the enhanced source code. To fix this, Stallman released the GNU General Public License (GPL) in 1989. This introduced the concept of _copyleft_ rather than the traditional copyright. While copyright prevented unethical copying of original works, copyleft required that derivatives of free software must themselves be free. In other words, GPL protected the freedom to copy, modify, and distribute software.

Over time, GNU software proved efficient, reliable, and compatible across diverse systems. The main problem for the GNU community was the lack of an OS kernel, that core part of an OS controlling most of the critical operations. The GNU community had written many excellent software utilities but had tried and failed to produce a kernel. It was in this context, almost by chance, that _Linux_ was born in 1991 at the University of Helsinki. Its creator was Linus Torvalds. It was not meant to be a commercial OS and Linus had no idea how to prevent people from making it commercial. The GNU GPL therefore proved to be the perfect licensing model and Linus adopted it for future versions of Linux. Since then, the OS has often been known as GNU/Linux.

Neither Stallman's vision of free software nor the open source methodology of Linux would have succeeded in a world of isolated machines or programming teams working for their own commercial goals. They succeeded only because of the Internet. They required a community of programmers willing to dedicate time and effort to create better programs. The Internet established this online community. Like the Internet, free software was really about breaking barriers and building communities.

The success of ARPANET prompted a move towards its commercialization. In 1983, military facilities were separated into a distinct network termed MILNET. The civilian side increasingly came under the stewardship of the National Science Foundation (NSF). A year later, the Domain Name System (DNS) was introduced. This enabled users to remember hosts and servers by memorable names rather than numbers. Now familiar domains were defined: .gov, .com, .org, .net, .edu, and .mil. In 1988, NSF upgraded the network backbone from 50 kbps to a T1 line. Fidonets became accessible through the packet-switched network. By 1989, the number of hosts had crossed 100,000. In 1990, the old ARPANET was taken out of service. By then, it had handed over the baton to an international network, which was being termed the Internet.

It may be rightly said of the Internet that it had no father. Any attempt to name one person from a crowd of many notable contributors is to do injustice to the very nature of processes that created the Internet. It was an amalgamation of innovations and an interworking of diverse ideas. It was the realization of a vision that came naturally with the growth of both computers and communication technologies. If not for the few names now remembered, a few others might have done it in their own ways. As the nineties dawned, a new paradigm shift in networking was in its womb and it had a definite father.



**Tim Berners-Lee working** at the European Organization for Nuclear Research (CERN) in Geneva noticed the problem of information exchange among scientists. CERN was Europe's leading organization in particle physics. While the Internet had made it possible for CERN researchers to exchange information internally and externally, much of this information existed separately. There were multiple protocols for multiple services. There was also no consensus across platforms and operating systems. This resulted in fragmentation and even duplication of information. Berners-Lee saw that the future lay in bringing the expanding universe of information on to a common framework. Though these ideas were first voiced in the mid-1980s, Berners-Lee was not exactly the first to think along these lines.

Way back in 1945, Vannevar Bush had made the point that information was growing at a pace faster than one's ability to assimilate it. Research was becoming increasingly specialized and it was no longer within the reach of one person to understand all threads of development and progress. One possible solution lay in organizing information in a better way. To this end, Bush appealed to the human thought process. He gave a description of the workings of the human mind,

It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.

Bush proposed a system modelled on the workings of the human mind. The system would assist humans to make and record associations. In other words, humans may forget associations over time but the system would remember them. Traditional systems organized information alphabetically or hierarchically. Dictionaries and encyclopedias had done this for centuries. Gopher services of the Internet organized information in categories. Indices allowed one to access information outside these rigid structural formats but indices were seen as a weak implementation of the association concept. After all, indices themselves were arranged in alphabetical order. The way Bush saw it, associations ought to be integral to the flow of information so that readers could seamlessly move across structural boundaries.

What this meant was that someone reading about the history of computers might get curious along the way about the impact of computers on human interaction. While reading about human interaction, she might want to understand more about certain relevant concepts from psychology. She might come full circle and look at the application of computers in the field of psychology. In traditional information systems, these diverse concepts might be organized separately under such broad categories as computing, sociology, and psychology. With the new proposal from Bush, these diverse concepts would be linked to one another by references. The reader is then given the freedom to follow her thought process without being constrained by a rigid organization of information. In other words, it is the reader that determines the flow of information transfer. Any system based on the association of ideas gives power to the reader. Fundamentally, the reader would be able to interact more powerfully with information.

Bush's idea came in the tradition of many ideas in history that identified a genuine need before the technology existed to serve it. It had come at a time when even programming a computer was a novel idea. It was only when Berners-Lee spoke on similar lines in 1989 that some believed it worthwhile to build a prototype system. What Bush had lacked, Berners-Lee had at his disposal. Computers were faster and blessed with lots of memory. GUIs made it easier to interact with computers. Information itself had become digitized. The Internet had opened up geographical boundaries so that information travelled more or less freely. The task for Berners-Lee was to make a few innovations to realize a working system.

For the representation of information, Berners-Lee took inspiration from previous work done at IBM in the 1970s. It was IBM researchers who had invented the concept of _markup languages_. A markup language categorized individual parts of data as headers, quotes, citations, and main text, among others. Berners-Lee saw that markup languages could be used effectively to make associations between related ideas even when they were in different files on different computers. From here was born _HyperText Markup Language (HTML)_. Like text, hypertext contained information content. Unlike text, hypertext included links to other parts of the same document or parts of a different document. What's more, every piece of information could be addressed and linked by a unique identifier called the _Uniform Resource Locator (URL)_. URLs made the concept of association realizable.

With HTML, a writer could add links to other documents that might be references or supplementary material. HTML could be used to present to the reader a high-level summary that also included direct links to individual chapters. This benefited readers who might not want to read the entire summary and would rather jump straight to interesting chapters. Berners-Lee created a tool to edit and browse information via hypertext. With a view to the future, he named this tool _WorldWideWeb_, the world's first HTML browser. This was simple enough to implement on a single computer, but for it to work across computers a new protocol was needed. Since none of the existing protocols suited the requirements, Berners-Lee got down to writing one himself. This he named _HyperText Transfer Protocol (HTTP)_. Over the course of 1991, both HTML and HTTP were announced to the world.

At the heart of this system was the client-server paradigm. Browsers were clients on which users accessed information. Servers processed client requests for particular URLs. Servers retrieved the requested data and sent them to the client browser. The browsers displayed that information to the user. Since these interactions between clients and servers were short, it made no sense to establish connections in the old circuit-switched manner. In fact, the HTTP client-server model is based on individual transactions. Each client request is serviced on its own merit. This frees the server from tying down resources for each client. HTTP implements a connectionless service. At a lower protocol layer, it makes efficient use of network bandwidth by relying on TCP/IP. Layered protocol design showed that it was easy to mix various approaches for optimal performance. IP was a peer-to-peer protocol but HTTP at a higher layer was a client-server protocol. HTTP was a connectionless protocol but TCP beneath it was a connection-oriented protocol.
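
A single HTTP transaction is short enough to reproduce by hand. The sketch below, in Python, opens a TCP connection, sends one plain-text request in the old HTTP/1.0 style, reads the reply, and closes; nothing about the exchange persists at the HTTP level afterwards.

```python
import socket

# One complete HTTP transaction riding on a TCP connection.
sock = socket.create_connection(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")

response = b""
while chunk := sock.recv(4096):   # read until the server closes the connection
    response += chunk
sock.close()

print(response.split(b"\r\n")[0])  # status line, e.g. b'HTTP/1.0 200 OK'
```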

Layered Protocol Design for the Web

Protocols are designed layer by layer. HTTP is used at the top between web servers and clients. To carry them over the network, HTTP messages are broken up and packaged into TCP/IP packets. The receiving end reconstructs the original HTTP messages. Actual user data carried by HTTP can be text, images, video, or audio.

The idea of linking information and using HTTP/HTML really took off in 1993 when CERN decided to share HTTP code free of cost. That same year, the University of Illinois at Urbana-Champaign (UIUC) released a browser named _Mosaic_. Mosaic had good support for multimedia. Information was not limited to text. It could be images or snippets of audio samples. It could even be videos implemented as animated GIF files. Moreover, Mosaic came at a time when the Internet itself was becoming commercial. No longer limited to scientists of the old ARPANET, the Internet was becoming accessible to a wider population. Commercial Internet Service Providers (ISP) had come into existence, with CompuServe and America Online (AOL) handling increasing volumes of traffic. At the same time, the digital transport infrastructure was undergoing an important change.

Up until the mid-1990s, the fastest data modems that consumers could buy operated at 33.6 kbps. This had been sufficient in earlier times when information was mostly text-based. With the coming of the Web and the increasing dominance of multimedia, even the fastest modems were deemed insufficient. Although modem speeds were eventually increased to 56 kbps, this was just about the upper limit for data communications on the circuit-switched network. The reason was that T1/E1 lines were limited to 64 kbps per circuit by design. Since part of this was reserved for signalling, the user could effectively use only 56 kbps for data. Thus this limit had more to do with the way the telephone network was tied to circuit-switched technology than with the subscriber line itself. In any case, engineers started thinking about overhauling the telephone network to include packet-switched data and provide digital lines all the way to the subscriber. Subscribers would interface a diverse set of devices to this digital line. The network would be run by an advanced common control signalling system that was named _SS7_. Telephone, facsimile, video conferencing, and Internet data would all be integrated under a single overarching standard that came to be called _Integrated Services Digital Network (ISDN)._
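
The arithmetic behind these figures is simple enough to spell out. The sketch below treats in-band signalling as costing one bit per sample, which is a simplification of how bit-robbing actually works, but it shows where 64 and 56 kbps come from.

```python
# A digital telephone circuit carries 8000 samples per second, 8 bits each.
samples_per_second = 8000
bits_per_sample = 8
print(samples_per_second * bits_per_sample)   # 64000 bps per circuit

# With one bit per sample effectively lost to in-band signalling,
# only 7 bits remain usable for modem data.
print(samples_per_second * (bits_per_sample - 1))   # 56000 bps
```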

ISDN was first conceived by CCITT in the mid-1970s but the first set of recommendations did not arrive until a decade later. When deployment efforts got underway, the costs were prohibitive. There was no clear business case to prove that users would be willing to spend on upgrades. Standard-compliant devices did not arrive in time for the market. For packet switching, ISDN interfaced with X.25, which in its turn lost the race to TCP/IP. The fact was that ISDN strangled itself by defining a multitude of features that involved complex signalling. With complexity, its time to market was much delayed. This was a disaster given that the market itself was changing rapidly. Since then, ISDN has gone down in history as one of those technologies that attempted to do too many things at once. While ISDN witnessed some success in Europe and Japan, Americans saw it as being more suited to voice than data. The ISDN promise of 128 kbps didn't seem attractive enough for the increasing demands of multimedia interactivity that the Web brought. What's more, ISDN's market entry was clouded by alternative technologies that gave it stiff competition.

The Cable TV network had already established a large subscriber base, with coaxial cables going right into homes. In the nineties, it was recognized that the same cables could be used to carry bidirectional data. Multiple users shared the cable bandwidth, but even then each user could obtain as much as 1 Mbps of downstream bit rate, since the cable could support as much as 30 Mbps on a 6 MHz channel. Upstream bit rate was far lower but this didn't bother subscribers, who downloaded far more than they uploaded. _Cable modems_ emerged in this context to offer subscribers broadband Internet access. On the heels of cable modems, an alternative was also emerging. Rather than relying on Cable TV systems, it found new ways of using the good old telephone network.

The best technologies are sometimes simple ideas that approach a problem from a commercial perspective from the outset. The failure of ISDN was an indication that perhaps engineers must find ways to extend the longevity and usefulness of copper wires of the telephone world. All along, engineers had been improving on data modems in the voiceband. In reality, the greater cost of system deployment was in laying wires to subscriber premises. The cost of network upgrade was itself comparatively smaller. The result of this thought process was that a new breed of data modems was invented to operate outside the voiceband. Since different frequency spectrum bands can exist simultaneously on the same wire pair without interference, it was possible for the same copper wires to carry both voice and data. Data operated at a higher spectrum of 25 kHz and above. Simple devices at both ends of the line could split the spectrum into voice and data, each one taking its independent course through the network. Voice could use circuit switching in the usual manner. Data could use packet switching as it made its way through the Internet.

In the past, this had not been implemented because subscriber lines had been optimally designed for voiceband. In this band, analogue signals travelled for many miles without deterioration. When the need arose for higher data rates, engineers saw that in many urban locations lines were rarely that long. Lines had become shorter with the increasing presence of remote switching units and line concentrators. Along shorter lines, it was possible to transmit on higher frequencies to carry data.

These newer modems brought an additional advantage. With the old voiceband modems, if the line was being used for data, the subscriber could not make a call. With these new modems, it was possible to browse the Web and talk at the same time. These modems came under the category of _Asymmetric Digital Subscriber Line (ADSL)_. It was asymmetric because downstream bit rates were higher than upstream bit rates. Given these emerging alternatives, no one really wanted to put money into ISDN deployment, which was seen as expensive. The modulation used for ADSL was in itself an advancement of communication engineering.

At this higher spectrum, signal bandwidth was large enough to allow broadband bit rates in the order of Mbps. ISI was smartly overcome by treating this wide bandwidth as a summation of multiple narrowband channels. While the frequency response of a wideband channel usually showed variations within the band, each of the narrowband channels into which it was broken up approached the ideal AWGN channel with a uniform frequency response. Information was therefore divided across these narrowband channels, with more information going to those channels that presented the best transmission conditions. The modulation employed in ADSL to do this was known as _Discrete Multitone (DMT)_. Naturally, this increased the cost and complexity of the system, but increasing clock speeds and VLSI made implementations practical.
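
A toy version of DMT-style bit loading is sketched below. The per-tone SNR figures and the implementation margin are invented for illustration; the only point is that tones enjoying better conditions are loaded with more bits per symbol.

```python
import math

snr_db_per_tone = [35, 30, 24, 18, 12, 6]  # hypothetical per-tone SNRs
gap_db = 9.8                               # assumed implementation margin

total = 0
for i, snr_db in enumerate(snr_db_per_tone):
    snr = 10 ** ((snr_db - gap_db) / 10)           # margin-adjusted SNR ratio
    bits = max(0, math.floor(math.log2(1 + snr)))  # bits this tone can carry
    total += bits
    print(f"tone {i}: {bits} bits per symbol")
print(f"{total} bits per DMT symbol across {len(snr_db_per_tone)} tones")
```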

As if ISDN wasn't enough on its own, CCITT went ahead and defined Broadband ISDN (B-ISDN) to compete against ADSL. B-ISDN offered subscribers, at least in concept, rates as high as 155 Mbps. This sort of bandwidth would enable a few streams of high-quality compressed video, plus voice and data, on the same line. The network backbone serving perhaps thousands of such high-rate lines would have to handle traffic at terabits per second (Tbps). This was almost futuristic. Even switching fabrics running at 45 Mbps per line were stretching the limits of technology. Through the late eighties, engineers took up the challenge of pushing these limits into untested territory. When engineers at Bellcore tackled this problem, within two years they managed to switch at over 150 Mbps. Their prototype switch could handle traffic in the order of many Gbps. Commenting on this success years later, project manager David Sincoskie recalled that he had

learned one of the most important lessons of research management. Take talented engineers, add motivation and resources, and ask them to do something just slightly impossible. They'll never fail.

Where B-ISDN went wrong was in trying to sideline IP and packet switching. The underlying transport technology chosen to support the service was _Asynchronous Transfer Mode (ATM)_. In the late 1980s, ATM had emerged as an alternative form of switching to compete directly against IP. In ATM terminology, packets were called _cells_. While each IP packet took its own route through the network, ATM popularized the idea of a virtual circuit. This was somewhere between circuit switching and packet switching. Since all cells took the same route, ATM argued that it was possible to guarantee expected levels of service. Guaranteeing low delay and jitter is critical to the performance of real-time services such as video conferencing and VoIP. IP gave best-effort service without guarantees. As a connection-oriented service, ATM bettered this by reserving resources along the way, though the reservation was not as rigid as in circuit switching. Moreover, in the days when IP routers were not sufficiently fast, peeking into and updating IP headers took time. ATM simplified switching by using fixed-size cells and doing faster cell switching on virtual circuit paths.

ATM had its share of success through the nineties but eventually lost the race to IP. One of its faults was to use small cells of only 53 bytes, of which 5 bytes made up the header. This proved to be an unattractive overhead in the long run. ATM also made inroads into LAN technology by introducing _Local ATM_ in the early nineties. This too enjoyed early but short-lived success that lasted only until Ethernet itself was upgraded from 10 Mbps to 100 Mbps. Compared against Local ATM, Ethernet became the natural choice for most LAN deployments since hosts already had the necessary network interfaces. Moreover, ATM was connection-oriented while IP wasn't, so the job of integrating ATM with TCP/IP proved a lot more difficult than anticipated. In technology at least, evolution finds success more easily than revolution.
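
The "cell tax" is easy to quantify, as the short calculation below shows; it ignores further overheads such as adaptation-layer trailers.

```python
import math

header, cell = 5, 53
payload = cell - header                            # 48 bytes of payload per cell
print(f"per-cell overhead: {header / cell:.1%}")   # about 9.4%

# Carrying a typical 1500-byte IP packet takes ceil(1500/48) = 32 cells,
# i.e. 1696 bytes on the wire for 1500 bytes of packet.
cells = math.ceil(1500 / payload)
print(cells, cells * cell)
```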

In the long run, neither ATM nor B-ISDN survived. ADSL deployments gained pace and outgrew cable modems. IP and packet switching were firmly established as technologies of choice. Routers got faster so that ATM lost its speed advantage. In the period of transition, IP packets were encapsulated into ATM cells so that service providers who had deployed ATM infrastructure for their backbones could continue to use them to transport IP packets. About the same time, at the lowest layer of the protocol stack, a new technology matured to carry traffic that had quickly grown a thousandfold.

When the laser was born at the end of the 1950s, it was not immediately apparent how it would be useful to engineering. The laser is a classic case in which the science was ready long before anyone found a use for it. It has been noted that it "was a solution looking for a problem." Communication engineers saw two things in the laser. First, light signals could be switched on and off incredibly fast, and electronics could be used to detect these light pulses. Second, this on-off nature of light perfectly suited digital transmission. Digital information could be sent directly as light pulses, without messy modulated analogue waveforms. In other words, just as T1/E1 lines were entering service, communication engineers started looking at the laser for direct digital transmission. With large bandwidth, broadband bit rates in the order of many Mbps would become possible.

The very idea of using optical signals rather than electricity to transmit information represented a strange twist of fate, sort of a return to the origins of distant communication. Centuries ago, distant communities had used hilltop beacons. Reflecting mirrors, flag signalling, and Chappe telegraphy were later forms of optical transmission. With laser, optical communication was back in fashion but in a different avatar. It was no longer about light travelling in the atmosphere. It was light triggered at the level of atoms and electrons. One can even say that the progress of civilization has been a shift from outward perspective to inward soul-searching, from visible forms to fundamental constituents. In the beginning, men used the heavens to keep time. Today we use periodic vibrations of the Caesium atom. As radio astronomy has shown, even understanding the heavens is greatly dependent on the analysis of matter at its finest level.

Initial efforts towards commercializing the laser were not promising, since light signals when directed into glass fibres suffered heavy attenuation. In the early sixties, this was as bad as 1000 dB for every kilometre. The initial problem of signal leakage from glass to air was solved by enclosing cylindrical glass cores within a layer of _glass cladding_. The bigger problem was drawing out a glass fibre so pure that attenuation could be reduced to levels practical for digital communications. These were challenging times for optical technology because it was in direct competition with millimetre waveguides. In 1970, scientists at the Corning Glass Works produced fibres with a loss of only 20 dB/km. Today, commercial fibres are available at losses below 0.2 dB/km. Millimetre waveguides lost out in the race to carry broadband data.
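
What such decibel figures mean in practice can be worked out directly: a loss of x dB cuts optical power by a factor of 10^(x/10), so a 20 dB budget corresponds to the signal falling to one hundredth of its power.

```python
# Distance over which a 20 dB loss budget is exhausted, for three eras
# of fibre quality mentioned above.
for loss_db_per_km, label in [(1000, "early-1960s glass"),
                              (20, "Corning fibre, 1970"),
                              (0.2, "modern commercial fibre")]:
    km = 20 / loss_db_per_km
    print(f"{label}: 20 dB consumed after {km:g} km")
# early-1960s glass: 0.02 km; Corning: 1 km; modern fibre: 100 km
```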

Optical fibres initially carried data only within the network. Since the late seventies, fibres have been deployed in telephone networks to carry interoffice traffic. In 1988, the TAT-8 fibre cable was laid across the Atlantic. Barely three decades earlier, TAT-1 had managed to carry 36 calls. TAT-8 carried 16,000 calls. A year later, a similar line spanned the Pacific to connect the US and Japan. It could carry 40,000 calls. Without fibre technology, transport networks never could have scaled up to meet subscriber growth. In the nineties, Cable TV networks used fibres extensively to carry data as close to subscribers as possible. The final leg of the journey to the subscriber was negotiated using coaxial cables. These systems came to be called _hybrid fibre-coaxial (HFC)_ systems. Riding on these, cable operators introduced "triple-play" packages that included digital TV, telephony, and Internet. This move from analogue TV to digital TV partly came as a response to competition from satellite TV. Those cable operators who failed to upgrade to digital were wiped out. Many small players were swallowed by bigger ones.

Through the nineties, fibre capacity increased in response to demand. _Wavelength Division Multiplexing (WDM)_ was invented to allow multiple data streams to share the same fibre, each stream occupying a separate optical channel in the frequency spectrum. As fibres started carrying traffic in the order of many Gbps, error-correcting codes were introduced to protect the bits. The BER requirement for fibre transport is particularly stringent, at 10^-15. VLSI has enabled the implementation of concatenated RS and LDPC codes with soft-decision decoding. Fibres can now carry bits at 100 Gbps, certainly a long way from the days of FDM and TDM lines.

While ADSL has remained a popular technology to deliver digital data to the subscriber in the last decade or so, fibres are being laid out these days straight into homes. Such high-bandwidth infrastructure is of little use if there is nothing significant to be delivered. Fortunately, with the growth of the Internet and the Web, there has been no shortage of content. Even ISPs wanted to capitalize on content and forged ahead with mergers and acquisitions with content providers. AOL acquired Time Warner in 2000. This was partly a response to the limited approach networking companies had taken since the eighties. They had provided nothing more than "fat dumb pipes" of wires, cables, and fibres. Only later did they realize that higher profit margins could be obtained by providing network-driven services.

This controversy between content and connectivity goes far deeper than what meets the eye. In 2004, Comcast proposed a merger with Walt Disney and failed; but back in 1995, Walt Disney itself had acquired the American Broadcasting Corporation (ABC). The real mergers have been those of content and connectivity. Many in the industry contend that content is king. Take television for instance. Even the best HDTV 3D sets are of little value if there are no quality programmes to watch. But the meaning of "quality programmes" is in itself subjective and controversial. While content may wear the crown and hold the sceptre, it is connectivity that collects the money. Connectivity is the royal exchequer. While most of what's on the Web is free, users almost always have to pay for their ADSL connections. The plain truth is that content and connectivity form an important synergy. Neither can survive on its own. In some channels of distribution, such as the Web, connectivity holds the upper hand. In others, such as cinemas, satellite TV, and DVD releases, content is a clear winner.

The importance given to content has a different flavour in the world of computing. Getting functionality and data processing right had been important from the very beginning. Only later did engineers think about interactivity and user-friendly interfaces. The same evolution happened in networking. The Internet was primarily about content. The Web was more about presentation and ease of use than just content delivery or accessibility. When it came to the adoption of a standardized delivery format, HTML was selected. The important aspect of such a markup language is that it categorizes content neatly but falls short of defining how that content ought to be presented. This separation of content from its presentation was in a way quite advantageous from a design perspective. It allowed content creators to focus on content. It allowed various forms of presentation from the same content.

For example, one particular browser might display a heading in Arial font size 16 in bold while another might display it in Verdana font size 18 in bold. For that matter, users might choose the presentation that suited them best. So a teenager might want lines to be spaced normally but an elderly person might want greater spacing for better readability. The real problem was in implementation. Early browsers did not provide adequate support for users to define or select their own preferred presentations. Sometimes documents carried style information with them. This confused matters because when users chose to overlay their own styles, the results were far from pleasing. Most users stayed away from the messy business of defining their own styles.

The original intent had been good but users would rather read webpages that had been formatted and styled properly by the creators themselves. Even creators realized that webpages that incorporated both content and style allowed them to be creative in many ways. They could create pages that represented their personalities. They could create pages for impact, an essential attribute in the online advertising business. Styles allowed them to emphasize content visually and spatially. Above all, style made webpages attractive enough to make users visit them often. These developments came in the early years of the Web. At least on the Web, if content was king, style was the glamorous queen.

Ideas on defining style evolved into the _Cascading Style Sheets (CSS)_ standard. First presented in 1994, it wasn't standardized until 1996. HTML was retained for content but it could import or reference CSS definitions to control the way content was presented. Instead of defining presentational aspects within HTML, using separate CSS files gave flexibility to designers. It was possible to change the look and feel of a website by simply changing CSS references. It enabled the reuse of styles and facilitated harmonious design across all pages of a website. Much of the structuring in CSS was inspired by object-oriented concepts that had gathered support in the previous decade. This gave CSS much of its power. It could apply styles at many levels of an HTML document with minimal programming effort.

Just as emphasis shifted from content to style, a parallel development began looking at content beyond the traditional manner in which data was generated and consumed. In the early years of the Web, data was static. Writers, journalists, and website creators put together content, which was then made available on the web server. Once in a while, these pages might undergo updates and corrections. Even when data was generated by machines, such as periodic temperature readings at a weather station, it would be collected and uploaded by an administrator on a daily basis. It soon became obvious that as services on the Web became increasingly sophisticated, automated systems would be needed. Updating the score of a live Wimbledon tennis match on a webpage had to be done by machines. For humans to do this would be too laborious, error-prone, and inefficient.

There are two parts to managing dynamic content on the Web. The first part is management at the server. Since it is difficult and clumsy to make direct updates to HTML pages, designers came up with a better option. Whenever browsers requested content, it would be packaged into HTML pages generated dynamically. That way, the latest available data could be picked up without manual processing delay. Applications that rely on such real-time data updates include live match scores, stock prices, mountain weather conditions, online community gaming, latest news, remote monitoring systems, and web chatrooms. HTML pages would be the output of programs running on the server. It was in this context that Java found worldwide adoption, because it simplified server-side programming. This thread of development evolved into _Content Management Systems (CMS)_.
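
A minimal sketch of server-side generation, using nothing but Python's standard library, is shown below. The server's clock stands in for a live score; the point is that the HTML does not sit in a file but is produced afresh for every request.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import time

class ScorePage(BaseHTTPRequestHandler):
    def do_GET(self):
        # Build the page at request time, so it always shows the latest value.
        html = (f"<html><body><h1>Live score</h1>"
                f"<p>Updated at {time.ctime()}</p></body></html>")
        body = html.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("localhost", 8080), ScorePage).serve_forever()
```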

As suggested by the name, content in CMS was to be managed separately in a format that was suitable for updating, searching, sorting, categorizing, and archiving. HTML remained the method for delivering content to users but it was no longer a storage format for content. Content went into sophisticated databases. CMS platforms themselves evolved to tie together databases, website design, and server-side programming. This was really packaging to ease the job of programmers. Among the famous open source CMS platforms are WordPress, Joomla!, and Drupal, all of which were introduced in the early years of the twenty-first century.

The second part to managing dynamic content is at the client side. Suppose one were following the latest scores of a Wimbledon final. The webpage may itself have many components, a mix of articles, links to archives, past matches of the tournaments, and images. To download the entire page for one small update is a waste of bandwidth, both in terms of client-server communication as well as server-side processing. To solve this problem, designers invented _Asynchronous JavaScript and XML (AJAX)_. The truth is that no one really invented AJAX. It was a term coined by Jesse James Garrett in 2005 by taking note of the way diverse technologies had come together to handle dynamic content.

JavaScript had existed almost since the start of the Web. One of the Web's earliest browsers, Viola of 1992, had elements of both CSS and JavaScript. The idea was to provide greater user interactivity and data manipulation on the client side. This is particularly useful for applications such as gaming, calculators, and simulators. Just as server-side programs came into being, JavaScript introduced client-side programming. In such cases, the server would provide the program and possibly some initialization data. The browser would do the calculations. For instance, if a user requested the value of pi to a thousand decimal places, the server could calculate it and send the result to the browser. With client-side programming, the server would send the formula instead and the browser would do the calculation on its own. JavaScript enabled this.
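A toy version of that idea in TypeScript: rather than shipping a precomputed value of pi, the server ships only the formula and the browser grinds through it. (Computing a thousand actual decimal places would need arbitrary-precision arithmetic; this sketch uses the slowly converging Leibniz series just to show the principle.)

```typescript
// Hypothetical sketch of client-side computation: the server sends the
// formula, the browser evaluates it. Leibniz series: pi = 4/1 - 4/3 + 4/5 - ...
function computePi(terms: number): number {
  let pi = 0;
  for (let k = 0; k < terms; k++) {
    pi += ((k % 2 === 0 ? 1 : -1) * 4) / (2 * k + 1);
  }
  return pi;
}

// With a million terms the result is accurate to about six decimal places,
// computed entirely on the client with no further data transfer.
console.log(computePi(1_000_000));
```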

JavaScript was also about finding a balance between computing and communication. These twin aspects of digital technology were getting faster and more powerful through the mid-1990s. Designers therefore attempted to leverage these developments. Workstations and PCs had brought lots of cheap computing to common users. Pushing computation to these clients relieved the server of much of its burden, thus decentralizing computing. As for communication bandwidths, these too were on the rise but they lagged behind computing capability. It therefore made sense to give clients mechanisms to compute rather than transfer lots of data over telephone lines. If AJAX evolved a little later from these early developments, it was because it had to wait for faster communication links with minimal packet delays.

The full power of JavaScript was brought out by the _Document Object Model (DOM)_, which imported the hierarchical structure of HTML into the context of programming. With DOM, it became possible to identify, access, and manipulate every single element of a document. JavaScript therefore relied on DOM to program the behaviour of a webpage. JavaScript was formally released by Netscape in 1995 and the company's Navigator 2.0 supported it. With this, browsers became generic environments for interactive program execution rather than being simply about presenting content.
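A small sketch of DOM access (the element id "headline" is invented for illustration): any element of the page can be located from script and both its content and its presentation rewritten in place.

```typescript
// Hypothetical sketch: locating and manipulating one element via the DOM.
const el = document.getElementById("headline");
if (el) {
  el.textContent = "New headline text"; // change only this element's content
  el.style.fontWeight = "bold";         // and, via CSS properties, its look
}
```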

Just as dynamic content was on the rise, _Extensible Markup Language (XML)_ was standardized in 1998. It was designed as a hierarchical format to organize data, just as DOM was a hierarchy to understand a document. Data itself could be qualified using attributes. The format itself had no semantics. Meaning was left to the application using the data. While many CMS platforms used databases to manage data, they could use XML instead if they chose to. The greater advantage of XML was that the format was both human-readable and machine-readable.

AJAX arose as a natural consequence, as a coming together of technologies that jointly enabled dynamic content delivery and client-side processing. Browsers via JavaScript would periodically contact the servers to check for updates. Servers would come back with new data but rather than sending huge HTML pages, they would send short XML-formatted data. JavaScript would process the received data and update only those parts of the webpage that mattered. User experience was vastly improved in the process. The user didn't need to refresh the page or click on any link to get the latest data. AJAX has more or less revolutionized the design of websites. It has become an essential component of the modern Web. It was from these component technologies that HTML evolved into _Dynamic HTML (DHTML)_.
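Put together, the classic AJAX loop looks roughly like the sketch below (the URL, the XML format, and the element id are all invented for illustration): poll the server, receive a short XML fragment, and patch only the affected element.

```typescript
// Hypothetical sketch of the classic AJAX pattern: poll the server,
// receive short XML, and update one element rather than the whole page.
function pollScore(): void {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", "/score.xml", true); // asynchronous request
  xhr.onload = () => {
    // e.g. the server responds with: <match><score>6-4 3-2</score></match>
    const doc = new DOMParser().parseFromString(xhr.responseText, "application/xml");
    const score = doc.querySelector("score")?.textContent ?? "";
    const el = document.getElementById("score");
    if (el) el.textContent = score; // refresh only this part of the page
  };
  xhr.send();
}

setInterval(pollScore, 5000); // check for updates every five seconds
```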

In the early years, Java was also used at the browser side to drive interactivity and animation. This was done in the form of _applets_, little applications that could be embedded into webpages and run on their own on any platform. With the growth of JavaScript and the AJAX paradigm, Java applets have more or less disappeared from the Web.

Those who saw the potential of the Web early on found phenomenal success. Cisco Systems was founded in 1984 and its very first product, launched two years later, was a network router that supported TCP/IP. Since then, Cisco at its core has been a company committed to IP technology. It started by selling its routers to universities and government institutions. Through the late eighties, when large corporations were setting up their own wide area networks, Cisco sold them its networking products. Cisco's real period of boom came in the early nineties, just as the Web was growing up and just as IBM was struggling. Among its strategies was to sell a diverse range of products, from high-end routers to basic affordable units for small businesses, from LANs to WANs. Along the way, it embraced ATM and fibre technology but never lost its focus on IP. It got into video and voice products that interworked with the rest of its networking portfolio. Telephone companies that had been entrenched in circuit-switched technology looked towards Cisco to provide them with packet-switched technology.

Microsoft, while being successful during the same period, achieved that success on the strength of Windows OS and application software. It did not get into networking technology on a big scale. Its first web browser, Internet Explorer, was released only in 1995, almost a year behind Netscape Navigator. These were the years of the "browser wars", when Microsoft realized that people were spending more and more time on the Web using Netscape Navigator than on using its own core products. Microsoft attempted to force users into using its Internet Explorer by bundling it with the OS. Antitrust suits followed, challenging the bundling of the browser with the OS. Either way, both these browsers were available to the public free of cost. Internet Explorer on its own did not bring money but it represented a technical showcase upon which customers might be enticed into buying Microsoft's other products. This was something like Renault or Ferrari competing in the F1 Grand Prix. The problem was that Netscape Navigator, and its open source successor Firefox of 2004, had a large following that couldn't really see what was so superior about Internet Explorer.

What the Web ultimately did was to provide everyone with a new channel for two-way communication, as opposed to one-way radio or TV broadcasting. One didn't need to be an established writer or journalist to give personal opinions. While freedom of speech had been around for decades in many countries, the system had paid only lip service to this freedom. In reality, a writer had to know the right publishers or literary agents. An artist wishing to exhibit his paintings needed financial backing to rent a gallery for an entire week. A professor wanting to give a series of talks usually addressed a small gathering just because her department didn't have a marketing budget. A short film could not be screened because no cinema operator was interested in buying it, although a small crowd of film buffs might pay big bucks to watch it.

The Web established a direct link between producers and consumers. Everyone was potentially a producer of content. This was freedom in the ultimate sense. The Web did not require that content be peer-reviewed or accepted by experts. It did not require grammatical correctness. There was no censorship of expression. The whole point was to communicate openly, passionately, and often without expectation of reward. These were the very traits that had shaped the technological drivers of the Web. Anyone could publish almost anything on the Web. Even if a work had limited scope to interest many readers, it had unlimited reach.

This was the essential point made by writer Chris Anderson in his book _The Long Tail_. Anderson claimed that even if each independent writer or artist appealed to a niche market, the very fact that there were simply so many made them a significant force in the marketplace. There was a time when there were only a few major companies who produced packaged software to run on popular platforms. Today thousands of programmers are writing applications for web-enabled devices. As of October 2012, Apple's App Store had 700,000 applications and the number was matched by Google's Android platform. Microsoft's Windows Phone 8 too had an impressive 120,000 applications. It is therefore quite true that we are living in an "App Economy." With powerful APIs, easy distribution channels, and a large customer base, this economy is certain to keep growing. No one really talks much about a single killer application these days. The focus really is on a multitude of applications catering to an audience of diverse expectations.

It had all started in the days of the Internet with BBS, mailing lists, and newsgroups. Online diaries called _blogs_ enabled common people to post articles, solicit comments, and engage in active dialogue. Soon people started expressing themselves in more than just words. With the coming of affordable digital cameras, photographs and videos found their presence on the Web. Flickr, for example, became a popular website for sharing photographs. YouTube became popular for users to upload and share videos. For knowledge sharing, Wikipedia emerged as the de facto website. In time, Wikipedia challenged even established encyclopedias that had been around for decades.

Sun Microsystems, Netscape, and Cisco are just a few examples of companies that made it big on the strength of the Internet and computer networking in general. Those that provided free online email facilities were among the first to capitalize on the Web. Early names in this context are Hotmail and Yahoo. They established a large customer following and advertising potential. In the e-commerce space, Amazon's online bookstore and eBay's online auction are now recognized services. With the migration of many services to the Web and tens of thousands of content generators at work, information on the Web exploded. When finding relevant and accurate information became problematic, search engines emerged in the mid-nineties. Today only one search engine stands out, Google.

Two decades ago, Microsoft had seen an opportunity in PC software, which IBM had missed. Google, founded in 1998, saw a similar opportunity in a networking service that Microsoft had missed. By constantly scaling up its operations, Google has kept pace with the growing volume of information. It has continued to provide relevant search results to user queries. More than a decade after its inception, its supremacy in online search has been unchallenged. Microsoft resolved not to repeat its mistakes and kept a cautious eye on another technology that had been developing for some time. This new technology would not be a rival to the Web but it would fundamentally change how people used the Web. Microsoft and Google would compete on this new turf but many new players would vie for a share of the pie. Interestingly, computing and networking would once more find themselves going back to the source of digital technology, telephony. Just as interestingly, the new form of telephony would not even need wires.

# 1011 Bits on Wings

**Sometime in the** year 1831, Michael Faraday designed a rather simple experiment. He took a soft iron ring and wound two coils of insulated wire around it. To one coil, he attached a battery. To the other, he connected a galvanometer in an attempt to measure any current that might flow. The hypothesis was that since electricity created magnetism, magnetism would create electricity. The iron ring was used simply to enhance the magnetic effect. Initial experimental results were discouraging. When he closed the circuit of the first coil, there was no sustained current in the second coil. Instead, Faraday saw something else that required explanation. When the circuit was closed or opened, a momentary current flowed in the second coil. Here was something new and unexpected. It was not magnetism itself but a change of magnetism that created electricity. As soon as magnetism stabilized, electricity subsided. This marked the discovery of _electromagnetic induction_ whereby changing magnetism can induce electric currents. Electric generators and microphones operate on this principle.

Across the Atlantic, Joseph Henry discovered exactly the same principle that same year. He did not publish his result until later and credit for the discovery remains with Faraday. Electromagnetic induction is no doubt one of the greatest scientific discoveries of the nineteenth century. It triggered a series of developments in both theory and experimentation. In time, this went on to unravel the deeper truths about electromagnetism. At the forefront of this scientific investigation was Faraday himself.

Born to a blacksmith, Faraday did not have a formal education. At the age of fourteen, he was apprenticed as a bookbinder. This turned out to be a good thing. He had access to books and he loved to read. Science excited his curiosity. When he came across interesting experiments, he repeated them to obtain first-hand confirmation of the results. Essentially, Faraday's approach right from these early days was strongly experimental. He lacked a mathematical background to delve into theorizing or abstract analysis. As a true empiricist, he relied on his intuition to seek explanations. It was in this spirit that he proposed an alternative to Newton's central force theory, which had remained unchallenged since the seventeenth century.

Faraday started thinking about the essential process behind induction. Followers of Newton had long held that forces acted at a distance. The intervening medium, whatever it was, had no important role to play in this force interaction. This was the case with two material bodies acting on each other with mutual gravitational force. The same could be said of electrical and magnetic forces between electric charges and magnetic poles respectively. To Faraday, this view did not satisfactorily explain electromagnetic induction. Induction was not instantaneous. It had a time component whereby it related to rates of change. By the power of physical intuition, Faraday conceived of _lines of force_ that existed in the medium. These lines of force together formed a _field_, be it electric or magnetic.

In Faraday's field theory, the medium took an active part in interactions. To Faraday, the medium linked rather than separated electric charges or magnetic poles. He saw forces as continuous rather than as jumps across discrete points in space. It didn't matter if an electric field was due to electric point charges or due to changing magnetic fields. The essence of the theory was in the field itself rather than the manner in which the fields were produced. Lines of force indicated the direction in which forces acted. The density of these lines indicated the strength of the forces. Two statements could be made about these lines. First, they obeyed the general principle of least action; that is, to move an electrically charged body in an electric field, these lines represented the least amount of work necessary. Second, the lines of force nudged one another so that each one exerted a pressure perpendicular to the force lines. It was also possible to visualize these lines by sprinkling iron filings in the presence of a magnetic field. As the filings got magnetized, they aligned themselves to the lines of force in the field. Lacking mathematical proof, Faraday's ideas were not enthusiastically accepted. Among the few who believed in Faraday's intuitive explanations was James Clerk Maxwell, born the same year that Faraday discovered induction.

Maxwell had a bit of both—a strong physical intuition of how things worked and a mathematical ability to theorize phenomena. It is from Faraday that Maxwell took his inspiration. He not only gave the equations for induction and lines of force, but also put forward the view that light itself was electromagnetic in nature. By then, the speed of light had been directly measured. Kohlrausch and Weber, approaching the subject from the perspective of electricity and magnetism, found that electrical disturbances travelled at the same speed as light. Maxwell proposed that light was really undulations of electric and magnetic fields, the two fields connected by induction. Electric vibrations created magnetic vibrations, which in turn created fresh electric vibrations. More importantly, these vibrations were not localized. Vibrations spilled over to adjacent field lines and consequently propagated through the medium. Taken together, light was an electromagnetic field that varied in time and space.

What this meant was that light was not a mechanical phenomenon but an electromagnetic wave. Electric and magnetic fields were transverse to the direction of the wave's propagation through space. Maxwell's theory lent support to the earlier views of Young and Fresnel that light was not really composed of particles acting at a distance. It was also believed at the time that waves require a medium to propagate. Light from the stars and our own sun reached the earth through what was apparently a vacuum. The proponents of the wave theory postulated that, just as the ancients had argued, there was a fifth element that pervaded all of space. They called this the _luminiferous æther_ or _ether_. Some famous experiments, including the one by Albert Michelson and Edward Morley in 1887, failed to detect this ether. Later developments showed that light sometimes behaved as waves and sometimes as a stream of particles. Einstein's theory of relativity sidestepped the issue by showing that it did not require the existence of ether. While much has been achieved since the time of Archimedes, modern physics has failed to unite these diverse phenomena under a unified theory. Light continues to exist in its dualism. Ether continues to be as elusive as ever. Gravity and electromagnetism continue to exist in their separate worlds that coexist in the same space and time. If the situation seems quite hopeless and muddled today, Maxwell's ideas proved to be the starting point of a new revolution in communications.

The first task was to prove experimentally that light was an electromagnetic wave. Maxwell had given no clues as to how this could be done. Generating light was an easy thing to do by chemical processes. All one had to do was strike a match. This did not serve to prove the electromagnetic nature of light. Static electricity had given one more way to generate light in the form of spark discharges. This was how electricity had begun before the birth of galvanism. For long, it was thought that static discharges were instantaneous. Charged particles flowed in a single direction to electrically neutralize the charged conductors. Among the first to think otherwise was Felix Savary, who observed in 1826 that when steel needles were magnetized by discharging Leyden jars, the needles did not always end up with the same polarity. Savary suspected that charges flowed back and forth before neutrality was achieved. Others too inferred similar behaviour but no one could directly verify this phenomenon. The oscillatory nature of static discharges was not something one could see with the naked eye.

Where experimentation is elusive, scientists take recourse in theory. William Thomson in a classic paper of 1853 derived a theoretical formulation for these discharges. It had been known by then that conductors have both resistance and capacitance. Thomson was the first to see that inductance too had an important effect on charge flow. In fact, the mathematical foundations of the loading coil invention of Pupin and Campbell can be traced to Thomson's early work. Thomson identified the conditions under which discharge would be oscillatory. These oscillations are really a back and forth exchange of energy between inductive elements and capacitive elements of a circuit. The former store energy in magnetic fields while the latter do so in electric fields. Moreover, every circuit has a characteristic frequency at which this energy exchange peaks. This is because capacitive and inductive elements cancel each other completely so that circuit impedance reduces to resistance. Thomson's work was the first theoretical treatment of this important phenomenon called _electrical resonance_.
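In modern notation (not Thomson's own), his result can be stated compactly: a discharge through a circuit of resistance R, inductance L, and capacitance C oscillates only when the resistance is small enough, and the oscillation settles at the circuit's characteristic frequency.

```latex
R^2 < \frac{4L}{C} \quad \text{(condition for oscillatory discharge)}
\qquad
f_0 = \frac{1}{2\pi\sqrt{LC}} \quad \text{(resonant frequency, for small } R\text{)}
```

Halving the capacitance, for instance, raises the resonant frequency by a factor of about 1.4, which is why tuning a circuit to a frequency is a matter of varying its capacitance or inductance.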

Resonance had been known for centuries in the world of sound—vibrating reeds, sympathetic strings, and tuning forks. Thomson extended the concept to the world of electromagnetism. On the strength of this theory, a circuit could be designed to effect resonance at a particular frequency that could perhaps be suitable for observation. Five years after Thomson's publication, Berend Wilhelm Feddersen used revolving mirrors to observe these oscillating sparks for the first time. A year later, he photographed them and even measured the period of oscillation, giving definitive proof of Thomson's theory. When Maxwell's theory of electromagnetic waves followed in the next decade, it was not obvious if these spark discharges were in any way related to Maxwell's theory. The fact was that Maxwell himself had not said anything about invisible electromagnetic waves. He had talked only about the electromagnetic nature of light.

The first person to make the link was George FitzGerald, who in 1880 proposed that while spark discharges dissipate in time due to circuit impedance, part of the energy is also radiated into the surrounding medium. He even gave mathematical equations that related the amount of radiation to current and wavelength. This was the beginning of electromagnetic waves in the general sense, waves that could exist outside the visible spectrum of light. Newton had shown two centuries earlier that white light itself was a mix of colours. Today we know that each colour is defined at a particular frequency. With FitzGerald, the idea was put forward that there were "colours" or frequencies that were invisible to the eye. They too were electromagnetic in nature and propagated as waves. The biggest problem was to see what was invisible. One of Maxwell's followers, Oliver Lodge, commented later on the prevailing scientific thinking in this context,

We did not know that there was any such radiation, nor did Lord Kelvin. We knew, or might have thought, that such radiation was possible, by the analogy of a tuning-fork.... but no one suspected it for a long time; they did not know that the conditions for ether waves would be satisfied by an electric discharge. We had no sense for such waves, and could not tell that they were being emitted, even when we made the experiment. We were in the condition of a deaf person striking a tuning-fork or a bell. If you could not hear the sound emitted by the fork you would not know that there was any; and you would certainly not experiment on the waves, measure their wave-length, and utilize them for purposes of communication.

Such was the dilemma that two decades passed after Maxwell's prediction and no one managed to prove experimentally that electromagnetic waves really existed. Even ardent supporters of Maxwell, including FitzGerald, Lodge, and Heaviside, who came to be collectively called the _Maxwellians_, had little success. Across the English Channel, a young German grew up under the shadow of these developments. At the age of twenty, he was torn between an engineering career and research in pure science. Eventually, his interest in pure science prevailed. Rather than devoting himself to purely theoretical work, he combined mathematical foundation with experimental practice. His clear vision of how he wanted to contribute to physics was reiterated later when he declined a university promotion at Kiel simply because he didn't want to end up doing only theoretical work. It was under these circumstances that he accepted a position at the Technical University of Karlsruhe in 1885. Over the next four years at Karlsruhe, he would show the world that electromagnetic waves do exist and that Maxwell had been right all along. This great experimentalist of the nineteenth century was none other than Heinrich Hertz.

Hertz started his investigations with electrical oscillations of spark discharges. It was easy enough to produce spark discharges using induction coils but by adding capacitors in the circuit, he obtained a more pronounced effect. To this circuit he connected a secondary circuit, which too had a small gap in the circuit path. What happened next was almost magical. When the primary circuit sparked, the secondary circuit sparked too. He then disconnected the secondary circuit completely from the primary. Even without any physical wire connection, the effect remained unchanged. By adjusting the wire length of the secondary circuit, which was nothing more than passive elements of inductors and capacitors, he could make the secondary circuit spark in resonance with the primary circuit. Sparking in the primary circuit had generated electromagnetic waves that travelled to the secondary circuit a few metres away. Induction effect on the secondary circuit allowed it to spark faintly. Miraculously, Hertz had found a method to prove that electromagnetic waves do exist. He could detect them even if he couldn't see them. He had generated an electromagnetic wave from first principles, these principles being variations of electric and magnetic fields interacting with each other. Speaking at a lecture in 1889 Hertz explained,

The method had to be found by experience, for no amount of thought could well have enabled one to predict that it would work satisfactorily. For the sparks are microscopically short, scarcely a hundredth of a millimetre long; they only last about a millionth of a second. It almost seems absurd and impossible that they should be visible; but in a perfectly dark room they are visible to an eye which has been well rested in the dark. Upon this thin thread hangs the success of our undertaking.

Hertz went on to show that these electromagnetic waves exhibited all the classical properties of waves. One distinctive feature was that electric and magnetic fields were transverse to the direction of propagation. Otherwise, these waves exhibited interference, reflection, refraction, polarization, and diffraction. True to Maxwell's predictions, these waves were reflected by conductors. By using metal gratings, Hertz showed that these waves could be polarized so that electric fields were restricted to particular orientations. This was what Faraday had discovered with light four decades earlier. These waves obeyed diffraction so that they could turn corners and propagate behind solid walls. Hertz also measured the wavelengths by moving his receiving resonator to pick up the peaks and troughs of standing waves. By using different circuit configurations, he managed to produce waves from a few metres down to a few centimetres. These electromagnetic waves that carried with them radiated energy were later termed _radio waves_. In the beginning, they were simply known as _Hertzian waves_. Among their first applications was _wireless telegraphy_, also referred to as _radiotelegraphy_.

The Experiments of Heinrich Hertz

(a) Initial experiments involved a secondary circuit connected by an inductive coil to the primary sparking circuit. (b) Later experiments showed that sparking occurred even when the secondary circuit was completely disconnected from the primary. This was a clear indication that sparking produced electromagnetic radiation. (c) Secondary circuits of different shapes and sizes were tried at different distances from the primary circuit. (d) Hertz's own diagram showing how electromagnetic waves disengaged themselves from the primary circuit and travelled outwards. Source: (Hertz 1893, pp. 34, 37, 104, 144).

Maxwell himself did not live to see the glorious verification of his theory but his theory stands today as a pillar of modern science. Even with later developments including relativity and quantum physics, his theory has not lost its validity. As for Hertz, he was not interested in commercializing his discoveries. It was therefore left to engineers to build on the foundations and come up with practical radio transmitters and receivers that would work reliably. Wireless communication applied to telegraphy or telephony was an attractive proposition for the industry. To lay copper wires door to door required a massive financial outlay. The cost was often prohibitive to reach rural communities. Economies of scale and profitability could be obtained only in densely populated areas. This is a problem that remains with us even today. Many rural communities in developing countries are not well connected to telephone networks. Wireless technology has come to play an important role to reach out to them.

The first person to conceive of wireless telegraphy was Mahlon Loomis, who was by profession a dentist. Dentistry had nothing to do with telegraphy or electromagnetism but, just as a few of today's lawyers or doctors may take an interest in computer technology, people of the nineteenth century took a keen interest in anything related to electromagnetism. The remarkable thing about Loomis was that he conceived of wireless transmission even before Maxwell had formalized his views on electromagnetic waves. In hindsight, we find that Loomis clearly lacked an engineering background and his ideas were primitive. He viewed electric charges in the high atmosphere as a medium for current flow and the ground as the return path. In other words, Loomis was talking about completing an electric circuit made of heaven and earth. His single-page US patent of 1872 betrays the simplicity of his ideas. Edison filed a similar patent application in 1885 and nothing came of it. Neither Loomis nor Edison had any notion of radio waves. Real use of wireless for distant communication had to follow in the tradition of Maxwell and Hertz. But first, the primitive equipment with which Hertz had worked had to be improved.

Édouard Branly, teaching at the Institut Catholique de Paris, grew curious about changes to the electrical conductivity of metals when subjected to electromagnetic radiation. Branly was following up partly on the discoveries of D. E. Hughes and later of T. Calzecchi-Onesti, who had seen that conductivity increased on the application of sparks or high voltages. If either of them had pursued this further, they might have secured the honour that finally went to Hertz. Branly too did not know about Hertzian waves until later. Fundamentally, these scientists focused on the receiving device rather than the role of the medium in carrying Hertzian waves. The original device with which Branly experimented in 1890 was a glass tube filled with oxidised zinc filings. When subjected to radiation from a spark gap transmitter, these filings clung together, resulting in a drop in circuit resistance. Once in this state, the effect remained for many hours. It sufficed to tap the tube lightly to restore the filings to their original state of high resistance. Within a year of the first experiments, Branly managed to detect signals at a distance of 80 metres. Branly named his device the _radioconductor_, thus importing into scientific language the first use of the word "radio" as a prefix.

It was Oliver Lodge who gave the radioconductor its more popular name, the _coherer_. Lodge reproduced many of Hertz's experiments using the coherer and thus paved the way for more widespread acceptance of Hertzian waves. Lodge went on to invent an automatic relay mechanism by which the filings would be shaken loose to restore the original high resistance value. This was the _decoherer_. Branly himself did not agree with these names, which implied that a combination of electrostatics and mechanical contacts led to the reduced resistance. He proved that even by limiting movement and maintaining contact at all times, changes in conductivity could be effected. Nonetheless, these names have stayed with us. Without the Branly coherer, wireless communication would have been delayed by at least a decade. Despite being so essential to radio detection, the scientific principle behind the coherer was not understood for well over a hundred years. It was only recently, in the twenty-first century, that the effect was explained in terms of micro-welding.

The coherer was perhaps the final piece of the puzzle. Technology was now ready to enable wireless communication for the first time in history. The binary nature of the coherer, to detect the presence or absence of radio waves, was just what was required for Morse code telegraphy. The real problem with the coherer was its limited sensitivity, which in turn limited the distance. Many parallel developments and incremental improvements all through the last decade of the nineteenth century brought out better detectors and transmission designs.

In the US, Nikola Tesla gave an early demonstration of radio communication. Oliver Lodge in Britain and Alexander Popov in Russia too made progress. Halfway across the world, a brilliant professor at the Presidency College of the University of Calcutta advanced the science of radio communication with very limited resources and poor laboratory equipment. In addition, he suffered racial discrimination and stoically refused to accept a salary for three years until he was paid on equal terms with his British colleagues. This was Jagadis Chandra Bose, who held degrees from both the University of Cambridge and the University of London. He was among the first to produce millimetre waves, which he used to ignite gunpowder or ring a bell at a distance. This demonstration was reminiscent of Joseph Henry's own demonstration of crashing weights in the early years of wired telegraphy. Like Henry, Bose was never interested in commercializing his inventions and didn't bother applying for patents. From this little crowd of great scientists, there was one and only one who combined experimental genius, business acumen, and above all, an unflinching belief that wireless transmission over long distances was possible. This was Guglielmo Marconi.

Despite being born into a wealthy Italian family, Marconi did not go through a formal system of schooling. He was privately tutored in his early years. Later, when he entered school in Florence, he did not enjoy his studies. At a higher level, he took a keen interest in physics but did not obtain any qualification. In short, he ended up a college dropout. As can be expected, he was closer to field practice than theory. Most of what he contributed to radio technology came by way of trial and error. It all started at the age of twenty when he came across the works of Hertz and the Branly coherer. He was immediately inspired to recreate these experiments on his own. Within a year, he had achieved a range of over a mile. In the process, he discovered the importance of an antenna or aerial connected to the coherer. The use of an antenna enhanced the transmission of radio waves, something Branly too had realized in his experiments of 1890. Otherwise, Marconi had achieved nothing scientifically new. His real achievement was that he managed to send Morse code signals over wireless. In other words, he had put science to practical application. Receiving a lukewarm response in his own country, he arrived in London in 1896 with greater hope of commercializing his invention.

The first thing Marconi did upon arriving in London was to apply for a British patent. The first demonstration was between the rooftops of buildings in London. This was soon followed by other demonstrations, each one achieving a greater distance. Marconi improved on the Branly coherer by creating a vacuum within the tube and carefully selecting the composition of the metal filings. His filings were almost a fine powder, a mix of nickel and silver. The improved coherer was a fine piece of engineering, sometimes taking as many as a thousand hours to handcraft a single piece. By 1897, he had achieved ship-to-shore communication to a distance of ten miles. No wired telegraphy could have achieved this. The importance of wireless was immediately grasped and caught the interest of the British Navy. August 1898 marked another milestone when Queen Victoria sent a telegram from her residence on the Isle of Wight to the Prince of Wales on board a royal yacht fourteen miles away. Historically, this is perhaps the first documented wireless text message.

Part of the allure of Marconi's demonstrations was that a foreigner had succeeded where British scientists had failed. The Maxwellians had supplied the theory but they had not been able to achieve transmissions beyond a few miles. In every demonstration, Marconi's transmitter and receiver devices were enclosed in opaque boxes. This only served to elevate Marconi to the status of a magician who came along with his "secret boxes" and communicated over many miles without wires. With every success, Marconi was stretching the limits of available technology. Yet he never felt that any fundamental limit had been reached. He kept a keen eye on everything happening around him and kept innovating at every step. The primary concern in these early years was not speed but distance. Wireless transmission was slow and messages had to be retransmitted many times to be received correctly. The real advantage lay in communicating to mobile receivers without using wires.

One important invention that increased communication distance came from Ferdinand Braun, who in 1898 showed that traditional spark gap transmitters were inefficient. Much of the power was dissipated in the sparks themselves as they oscillated back and forth from the antenna. Braun improved the design by separating the antenna from the spark gap. The two were coupled magnetically using a transformer. This improved design was later adopted by John Fleming, who worked as a consultant to Marconi. Marconi too made improvements of his own to transmitter design and antenna configurations. Much of this work had no theoretical backing. The improvements were discovered by a combination of chance, fortunate mistakes, acute observation, and an intent to explain things. Marconi was not a scientific genius but he didn't need to be. As Edison pointed out later, genius was not a prerequisite for success.

In 1899, the English Channel was spanned wirelessly over a distance of 32 miles. To his surprise, Marconi found that signals could be picked up many miles inland. Apparently, signals were not limited to line-of-sight transmission. The curvature of the earth did not seem to impose a limitation. Marconi became confident that if this was the case, transatlantic communication might be possible. He argued that more powerful transmitters and bigger antennas would increase the distance. He worked out that doubling the antenna height would quadruple the distance. His initial efforts were a failure. The antenna at Poldhu on England's southwest coast was a monster that consumed enormous power. When the giant sparks flew off, static charges remained in the surrounding area for a long period. People went deaf. Then, in a gale, the entire antenna collapsed.
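Marconi's rule of thumb amounts to saying that range grows as the square of antenna height; in symbols:

```latex
d \propto h^2 \qquad \Rightarrow \qquad \frac{d(2h)}{d(h)} = \frac{(2h)^2}{h^2} = 4
```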

Marconi was unfazed by such failures. With Fleming's assistance, a new transmitter design was implemented. Lacking sufficient funds to build a similar structure across the Atlantic, Marconi opted for a simple wire antenna hung from kites and balloons. This meant that at St John's, Newfoundland, Morse signals could only be received. The breakthrough happened in December 1901 when three dots were received at St John's from the gigantic transmitter at Poldhu more than two thousand miles away. The dots signified the letter _S_ and they were detected using a magnetic detector connected to a telephone receiver. Marconi had built such a detector based on an 1895 invention of Ernest Rutherford. This first transatlantic wireless reception marked one of Marconi's greatest moments. No one could explain how signals could travel such great distances. Many thought that Marconi had mistaken static noise for genuine signals.

Two months later, Marconi successfully repeated the experiment with a receiver placed on the _Philadelphia_ on its journey from Southampton to New York. With this, there was no doubt that transatlantic wireless transmission was possible. Marconi also noticed that signals were often clearer at night than during the day. Theorists scrambled to seek an explanation. Oliver Heaviside and Arthur Kennelly independently postulated that this was due to a layer of charged particles in the upper atmosphere. The layer was responsible for guiding radio waves over great distances by wave refraction. It was only in the 1920s that the presence of this layer was confirmed by Edward Appleton. Named the _ionosphere_, it consisted of charged particles created under the influence of solar radiation. At night, charge density drops. As a result, wave absorption is less and reflection is more. Appleton used directional antennas to send a beam of radiation to the ionosphere and measured the time taken for the beam to get back to earth. From such measurements, he figured out the ionosphere's height above the earth's surface. Appleton's techniques played an important role in the later development of what became radar technology.
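The arithmetic behind Appleton's echo-timing method is worth a moment. A pulse travelling at the speed of light that returns after time t has covered twice the layer's height. The 2-millisecond delay below is an illustrative figure assumed for the sake of the example:

```latex
h = \frac{c\,t}{2}
\qquad
t = 2\,\text{ms} \;\Rightarrow\; h = \frac{(3\times10^{5}\,\text{km/s})(0.002\,\text{s})}{2} = 300\,\text{km}
```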

The Branly coherer's time in the limelight was only a decade. For a while, a mercury-iron detector of Bose's invention was in use. Like the magnetic detector of Rutherford, it didn't need a decoherer mechanism. What really signalled the end of these early detectors was Fleming's invention of the rectifying diode, later improved by the audion of de Forest. Vacuum tube electronics took about a decade to become practical. Electronics vastly improved the sensitivity of radio receivers. Sensitivity, however, was not the only problem facing radio receivers.

While sensitivity is all about picking up a weak signal, there was something else about wireless transmission that was not immediately apparent to early practitioners, including Marconi. The fact was that early wireless transmissions splattered signal power over a wide band of frequencies. As such, transmissions were broadband. With all equipment being alike, only one pair of stations could communicate at any one time. If anyone else attempted to use the wireless medium, the problem of interference prevented any intelligible communication. Such a problem had never arisen in earlier systems due to the use of separate wired circuits.

It was in this context that Lodge and Marconi realized the importance of _tuning_ or _selectivity_. They saw that if each transmitter had its own frequency, it would be possible for multiple parties to communicate without interference. Multiple transmissions would use the same wireless medium but occupy different parts of the frequency spectrum. The radio receiver would select a desired frequency band and reject all others. Essentially, this was an extension of the resonance principle to both transmitter and receiver so that out-of-band interference was minimized. Lodge obtained a British patent for selective tuning in 1897, which was eventually bought by the Marconi Company.

Selectivity in radio transmission systems was really a reincarnation of FDM, which had already become standard in multiplexing telegraph transmissions on the same wire. Interestingly, telephony did not initially multiplex voice calls on the same wire pairs. It was frequency multiplexing on the wireless medium that directly inspired George Squier of the US Army Signal Corps to conceive of FDM for telephony. He demonstrated such a system in 1910, which in time was researched and commercialized by AT&T. This is just one example of the precedence that wireless systems sometimes established and wired systems followed. Squier himself named his invention the "wired wireless" system, clearly signifying the pioneering influence of wireless. The general perception that wireless technology was mostly inspired by wired systems is only partially true. The access method employed by ALOHA and later adopted in the design of Ethernet is another example.

Wireless telegraphy developed steadily thereafter, culminating in the international Telex service in the 1930s. Radioteletypes (RTTY) appeared. For decades, these remained the primary means of connectivity to remote communities in developing countries. Even today, a handful of amateur radio operators use RTTY terminals to communicate across the night sky. The essential mode of communication in both telegraphy and telephony was _point-to-point_ or _unicasting_. In other words, there was one sender and one listener. Early telephony had also introduced "party lines" in which multiple subscribers shared the same line to the local exchange. This saved the cost of individual cabling for each subscriber. When an incoming call arrived on a party line, all the subscriber phones would ring. To identify the intended callee, each subscriber was given a unique ring tune. This was perhaps the first instance of the caller tunes that are so prevalent in modern cellular telephony. However, this did not prevent many subscribers from picking up their handsets. Since the line was common, neighbours liked to gossip and eavesdrop.

Party lines had a special ring tune that indicated a community call for everyone to pick up. This was used for advertisements, grain prices, news, and local information. Party lines therefore enabled, for the first time, electrical _point-to-multipoint_ communication, also called _broadcasting_. Decades later, it became possible for multiple subscribers at different locations to talk to one another on a common call. This is formally termed _multicasting_ but generally known as _phone conferencing_. At the start of the twentieth century, wireless telegraphy was point-to-point until someone realized that broadcasting made a lot more sense for the wireless medium.

Unicasting, Multicasting, and Broadcasting

(a) Communicating to a single user in which interaction is usually personal. (b) Communicating to multiple users, usually of a select group. Interaction can be one-way or two-way. (c) Communicating to all users in which interaction is always one-way.

While with the United States Weather Bureau, Canadian engineer Reginald Fessenden started asking himself if wireless could be used to interconnect weather stations. From this simple idea, he experimented and improved on existing radio equipment. Early radio equipment was capable of only Morse code transmissions. Fessenden was the first to attempt wireless transmission of speech. Using more sensitive detectors of his own and a spark gap transmitter, his first experiment of 1900 was only a partial success. The speech was noisy and not smooth due to the intermittent nature of the sparks. Speech, being continuous, needed a fundamentally different transmitter design. Sure enough, Ernst Alexanderson, a designer of Swedish origin then working with General Electric, came along with the _alternator_. This could generate continuous waves, which in turn could be modulated by human speech. Fessenden also improved the alternator to generate higher frequency waves. In a now famous demonstration on Christmas Eve 1906, Fessenden gave a short speech and then played a piece of Handel on his violin. The transmission was heard by receivers more than a hundred miles away. This was the first radio broadcast in telecommunications history.

Radio was one of those technologies that had a huge social impact. Anyone who controlled mass communication controlled information flow and influenced public sentiment. Radio represented power, and television broadcasting followed in its footsteps. Radio marked the transition from unicasting to broadcasting. But in the days of Fessenden, the importance of radio was not immediately recognized. It was only in the 1920s that radio really took off. Launched in 1920, KDKA Pittsburgh became the first radio broadcasting station. In the US, the Radio Corporation of America (RCA) was founded in 1919. By 1923, the US had more than five hundred broadcasting stations. By 1929, there were four million AM radio receivers in the US. In Britain, the British Broadcasting Company (BBC) was formed in 1922. This later absorbed Marconi's broadcasting operations in Britain. While broadcasting was essentially a private service in the US, in Britain and Europe it started as a government monopoly and remained so for many decades.

Real commercial success of radio broadcasting can be attributed to vacuum tube electronics. Even the term radio went beyond its original meaning of wireless and became synonymous with broadcasting. Ironically, when TI introduced the transistor radio in the 1950s, people started referring to this revolutionary new gadget simply as the transistor. In both cases, the underlying technology acquired a larger presence than its implementation. But there was one technology that came too early, remained unexplained, and was ultimately neglected in its own time. This was the early semiconductor technology of Braun.

Although Braun had discovered the rectifying properties of the galena crystal in the 1870s, he did not apply it to radio receivers until 1906. By then, Bose of Calcutta had already successfully applied such crystals to radio detection. Also called the _cat's whisker_, the detector was simply a thin copper wire delicately touching the crystal surface. Essentially, this formed a metal-semiconductor junction, but this was not understood until later research, research that led directly to the invention of the solid-state diode and transistor. Crystal radio sets became popular among hobbyists who could buy the parts and assemble them on their own. Best of all, such receivers operated at low currents and did not even require a battery. By the mid-1920s, the US radio environment was teeming with as many as five hundred broadcasting stations. What were apparently quiet nights were in reality happy hours of news, events, and gossip coming across from stations thousands of miles away.

A Crystal Set Radio Receiver

A Swedish crystal radio is shown here with earphones. The inset shows a point-contact formed between semiconductor base and metal wire. Source: Holger Ellgaard, Wikimedia Creative Commons.

Ever since the Titanic disaster of 1912, amateur radio operators in the US had been slapped with regulations. The ham radio network of these amateurs had little to do with the Titanic disaster but they became convenient scapegoats. Every radio operator was required to obtain a license. Many frequencies were made unavailable to them, so that they were forced to switch to a higher frequency spectrum. This was the shortwave spectrum (about 3-30 MHz) that had not been widely used for wireless transmissions. When these amateurs started using shortwave, they discovered to their surprise that shortwaves travelled much farther than they had ever imagined. This was the effect of the ionosphere, as explained by Appleton soon after.

No single person can be named as the discoverer of the advantages of shortwave. The honour goes to the amateur radio community as a whole. They were the first to achieve speech transmission across the Atlantic. AT&T's first calls across the Atlantic used longwave radio at a 60 kHz carrier. Only later did they adopt shortwave. While AT&T had managed a one-way voice call from Virginia to Paris in 1915, a two-way call became possible only in 1926. Ham radio amateurs had achieved something similar five years earlier. Nonetheless, AT&T's commercial offering of transatlantic telephone service was a great step forward. For the first time, a subscriber in the US could pick up her home telephone and place a long-distance call (with operator assistance) to someone in England. Three long decades later, to cater to the growing volume of traffic, the first transatlantic cable to carry speech was laid. Until then, it was radiotelephony that served to connect the two continents.

One of the key achievements that happened in the process was an implementation of _Single Sideband Amplitude Modulation (AM-SSB)_, which had been invented earlier by J. R. Carson. Raymond Heising at AT&T was attempting to conserve transmission power by removing the carrier, since the carrier itself conveys no information and can be reinserted at the receiver for demodulation. Heising found in AM-SSB a suitable method to remove the carrier. In addition, one of the sidebands was filtered out so that bandwidth was conserved and link capacity nearly doubled. Normally with AM, the carrier is centred between two sidebands but one of them is sufficient to carry the information. Interestingly, Heising's work on AM-SSB is among the earliest to implement two orthogonal carriers in the modulator, that is, the use of in-phase and quadrature-phase carriers. Heising himself probably did not realize this since the mathematical concept was developed in later decades, particularly with digital modulation techniques.
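A single line of trigonometry, in modern notation, shows where the two sidebands come from. Modulating a carrier of frequency ω_c by a pure tone of frequency ω_m gives:

```latex
\cos(\omega_m t)\cos(\omega_c t)
= \tfrac{1}{2}\cos\big((\omega_c-\omega_m)t\big) + \tfrac{1}{2}\cos\big((\omega_c+\omega_m)t\big)
```

The two terms are the lower and upper sidebands, mirror images of each other about the carrier. AM-SSB transmits just one of them, suppresses the carrier, and the receiver can still recover the original tone.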

For decades thereafter, AM-SSB remained the staple modulation in coaxial carrier systems and radio networks. While AM-SSB was preferred for long distances, narrowband FM was used for shorter distances. Its popularity was limited since it was less efficient than AM. It was only later in the 1930s that Edwin Armstrong invented wideband FM to offer better sound quality. But long before Armstrong invented his FM, he had done something so important that wireless communication as we know it today would not have been possible without it.

Armstrong's birth and childhood somewhat follow that of radio itself. He was born in 1890 just as research on the newly discovered Hertzian waves was gaining momentum. By the time the diode, the audion, and crystal sets came into existence, Armstrong was old enough to appreciate these developments and young enough to remain wildly curious. He experimented with these devices and even built large antennas in his backyard. The outcome of these early hands-on investigations was that, while still an undergraduate at Columbia University in New York, he made an important contribution to the sensitivity of radio detectors. He took de Forest's audion and introduced into it positive signal feedback, otherwise known as _regenerative feedback_. Basically, part of the plate's output was fed back into the grid. This resulted in higher amplification and greater sensitivity. The only problem with such a detector was that it could handle signals only over a narrow band. This wasn't really a problem back then since speech occupied a narrow band.

Armstrong applied for a patent in 1913 and even arranged for a demonstration of his invention. Among those present was the original inventor of the audion. De Forest later claimed priority but it was clear that the idea of feedback had never occurred to him. If he had dabbled with feedback, he had clearly not understood its principle. Since the audion itself had been sold to AT&T, it was in AT&T's interest to support de Forest's claim of priority over the audion and positive feedback. Harold Arnold's own patent for the vacuum tube amplifier was based on the audion. Arnold's patent was becoming commercially important for AT&T since it enabled for the first time transcontinental telephony. This was the context for one of the great patent disputes in history, a dispute that ran for two whole decades. Although Armstrong lost this battle, he came up with another great invention. At least for this one, he had clear priority.

Posted to Paris during the Great War, Armstrong was part of the US Army Signal Corps. By now, shortwave communications were becoming increasingly common but the Allies had difficulty detecting enemy signals. Regenerative detectors had trouble separating an audio signal of a few kilohertz from a carrier of many megahertz. Both amplification and selectivity of the incoming signal were problems. One common method of detection was to combine the incoming signal with a local signal at the carrier frequency. This was the direct conversion or _autodyne_ method attributed to British engineer Henry Joseph Round. Armstrong proposed that it would be better to do most of the amplification and signal processing at a lower _intermediate frequency (IF)_ rather than at the carrier _radio frequency (RF)_. The beauty of the design was that different carrier signals could be reduced to the same IF. Receiver equipment could be optimized to work at IF while only the receiver front end focused on RF. Once the signal was properly filtered and amplified at IF, it could be brought down to baseband _audio frequency (AF)_ and sent to the speakers for listening.

Armstrong's principle came to be called _superheterodyne_, short for supersonic heterodyne, a mixing of two different frequencies that produced signals above the audible range. In an early implementation, Armstrong used as many as eight vacuum tubes consisting of signal mixers, amplifiers, and oscillators. While the invention didn't change the course of the war, it came into widespread use immediately thereafter. To this day, Armstrong's design remains essential to the working of FM radio, Wi-Fi devices, satellite links, and cellular phones. In FM radio, for example, the RF spectrum is 88-108 MHz. It doesn't matter to which particular channel a receiver is tuned; the incoming RF signal is amplified with a low-noise amplifier (LNA) and then downconverted to an IF of 10.7 MHz. The desired station's signal is amplified and neighbouring station signals are rejected more easily at IF. The signal is then downconverted to audio, further amplified, and sent to the speakers. Some designs employ multiple stages of IF in the receiver.
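The arithmetic of the mixer is simple. Taking as an assumed example a station at 98.5 MHz, the receiver's local oscillator is set 10.7 MHz above it, and the difference frequency produced by mixing is the fixed IF:

```latex
f_{IF} = \lvert f_{LO} - f_{RF} \rvert
\qquad
f_{RF} = 98.5\,\text{MHz},\;\; f_{LO} = 109.2\,\text{MHz}
\;\Rightarrow\; f_{IF} = 10.7\,\text{MHz}
```

Tuning to a different station only moves the local oscillator; everything after the mixer works at the same 10.7 MHz.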

In this long line of developments from Maxwell to Armstrong, if one is asked to identify the father of radio, the name of Marconi often comes up. Even Armstrong's superheterodyne built on earlier work by Fessenden, Round, Hogan, Schottky, and Lucien Lévy. With radio, history has shown us once more that inventions are often associated with those who make them practical and a commercial success. Ultimately, when the Nobel Prize for Physics was awarded in 1909 for the advancement of radio science, only two names stood out—Ferdinand Braun and Guglielmo Marconi. One was a first-rate scientist with a sound theoretical background; the other was a first-rate engineer who relied more on hands-on experience.



**Wireless communications began** with ground stations fitted with antennas. As technology progressed to higher bands of the frequency spectrum, antenna sizes could be reduced. The fact is that antenna sizes are intimately related to signal wavelengths. When the properties of shortwave were discovered, wireless links could be established across thousands of miles thanks to the ionosphere. Now engineers started looking at the night sky and asked if objects in space could be used to aid in communication. Wireless by its very nature was not limited to terrestrial propagation. If one could send signals to space and relay them back to earth, it would be possible to cover the entire globe wirelessly. This was the beginning of satellite communications and the first satellite that engineers looked at was the moon.

The basic idea was that signals could be sent to the moon where the lunar surface would reflect them back to earth. It was an idea that came about purely from chance observations. By the late 1920s, shortwave communications was well established. Some researchers started looking into transmissions at Very High Frequencies (VHF), say about 50 MHz. They found that sometimes the returning signal came with a very long delay of perhaps a few seconds. Ionosphere reflection generally incurred a delay in the order of milliseconds. This was the first indication that perhaps these VHF frequencies penetrated through the ionosphere and were reflected by the moon. There was little progress with this idea for perhaps a decade. Then the Second World War intervened during which tremendous strides were made in radar technology.

Radar was nothing but sending a focused wireless signal and waiting for echoes. Signal reflection and scattering from moving aircraft generated these echoes. By measuring the time delay between transmitted pulse and received echo, an aircraft's position was revealed. When the war ended, the US Army Signal Corps immediately undertook research into applications of radar in novel ways. Apparently, the Germans had used V2 rockets during the war that could fly 70 miles above the earth. The US wanted to go higher up and have an eagle's capability to intercept such missiles. Under the name _Project Diana_, a radio station was set up in New Jersey and experiments soon commenced on _moon relay communications_ by using reflected power.
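The ranging arithmetic behind radar, and behind the long delays that hinted at the moon, is simple enough to sketch. The figures below are illustrative.

```python
# Radar ranging: an echo delayed by t seconds has travelled to the target
# and back, so the one-way distance is c * t / 2.

C = 299_792_458  # speed of light, m/s

def range_from_echo(delay_s):
    return C * delay_s / 2

# A 1 ms echo corresponds to an aircraft about 150 km away.
print(f"{range_from_echo(1e-3) / 1000:.0f} km")

# The moon, roughly 384,400 km away, returns an echo after ~2.6 seconds,
# which is why delays of seconds pointed beyond the ionosphere.
print(f"{2 * 384_400_000 / C:.2f} s")
```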

From the start, the project faced technical challenges. Existing radar technology had to be substantially adapted to get signals to the moon and back. As much as 3000 watts of power was needed. For increased sensitivity, four stages of Armstrong's heterodyne principle had to be employed. To get a clear reflection, the signal bandwidth was limited to 50 Hz. At such low bandwidth, the use of microwave frequencies was ruled out and a carrier of 110 MHz was selected. Antenna arrays installed on a 100-foot tower had to be designed carefully to obtain necessary gains and SNR. Theoretically, signals radiate into space equally in a spherical pattern. Practical antennas concentrate signals in a particular direction towards the receiver. This concentration is the gain of the antenna. In wireless system design, antenna gain may very well be the single factor between maintaining and losing a link. Then there was the problem of the _Doppler Effect_.

Doppler effect in the audio spectrum is easily experienced when a whistling train is rushing past us. As the train approaches, we experience the whistle at an increasingly higher pitch. As the train recedes, the whistle seems to drop to a lower pitch. The same happens at any part of the frequency spectrum. In the case of earth-moon communications, both objects are in relative motion. So the frequency that's sent out experiences a Doppler shift to another frequency that may be 300 Hz away. Since this shift is variable, some amount of receiver tuning must be done to detect the faint echoes coming back from the moon. Doppler shift had not been a problem for radar during the war because the shift was insignificant in relation to the signal bandwidth. As an additional constraint, the nature of antenna orientation limited the experiments to moonrise and moonset times. This meant a window of only forty minutes to perform the experiments. Despite these challenges, the first echoes from the moon were received successfully in January 1946.
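A back-of-the-envelope sketch of this shift, using the standard approximation for velocities far below the speed of light:

```python
# Doppler shift for radio: a relative radial velocity v shifts a carrier f
# by approximately f * v / c (valid when v is much smaller than c).

C = 299_792_458  # m/s

def doppler_shift(carrier_hz, velocity_ms):
    return carrier_hz * velocity_ms / C

# Project Diana's 110 MHz carrier: a shift of ~300 Hz corresponds to a
# relative radial velocity of under a kilometre per second.
carrier = 110e6
shift = 300.0
print(f"velocity ~ {shift * C / carrier:.0f} m/s")  # ~818 m/s
```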

In the long run, nothing much came out of _Project Diana_. It was not understood why echoes sometimes didn't come back. Many factors such as ionization, cosmic noise, and multiple moon reflections were either unknown or could not be investigated from an engineering perspective alone. Fundamentally, the moon is a passive reflector and hence only a relay station. This means that power entering the receiver is heavily attenuated at the inverse fourth power of the earth-moon distance. Empirically, engineers knew that transmitted power decayed with distance but no one had given an elegant mathematical proof. About the same time, in 1946, Harald Friis of Bell Labs published an important equation linking radio signal attenuation, antenna gain, and frequency. This equation proved to be as important to engineers as his earlier work on noise figures. Friis essentially showed that a radio signal was attenuated at the inverse square of the distance. In reflected power communication, the path loss was much worse.
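Friis' equation can be sketched as follows. The transmit power and antenna gains below are illustrative, not Project Diana's actual figures.

```python
import math

def friis_received_power(pt_w, gt, gr, wavelength_m, distance_m):
    """Friis transmission equation: received power falls with the square
    of distance for a one-way link."""
    return pt_w * gt * gr * (wavelength_m / (4 * math.pi * distance_m)) ** 2

# On a reflected (moon bounce) path the loss compounds: power falls with the
# *fourth* power of distance, so doubling the range costs 6 dB one-way
# but 12 dB on the round trip via a passive reflector.
for d in (1e6, 2e6):
    pr = friis_received_power(3000, 100, 100, 2.7, d)  # ~110 MHz -> lambda ~2.7 m
    print(f"d = {d / 1000:.0f} km: one-way Pr = {pr:.3e} W")
```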

If the moon was too far away, the next best solution was to place a man-made satellite in earth's orbit for the purpose of communications. Newton had suggested that given enough velocity, objects could be made to overcome earth's gravity and journey into space. In 1869, Edward Everett Hale had conceived of using such satellites for communication and navigation. An idea can only go so far as technology will allow it; and technology will arrive only when necessity requires it. The nineteenth century was certainly premature for any form of satellite technology to take shape. It was only after the Second World War that serious efforts got underway.

Despite the usual claim by governments that satellite technology was intended for global communications, from the outset it was always about military might and espionage. The stated goal of peaceful international cooperation was only a front for mutual suspicion. Thus, when the United Nations formed a committee in 1959 to look into issues of outer space, the committee itself did not meet until two years later. When it did, there was no consensus. Established rules of nationality, territorial sovereignty, and international law didn't seem to make sense in outer space. There was no clear demarcation as to where a country's air space ended and where outer space commenced. While international cooperation stalled, the few who had the technology took the lead for their own gains. If satellite technology progressed as fast as it did, it was purely due to the Cold War. The Soviets first led the race but they were soon overtaken by the Americans who made vast strides through the sixties. The first of these satellites were launched into _Low-Earth Orbit (LEO)_.

The SCORE satellite of 1958 may be seen as the first communication satellite. Launched by the Americans, it was more of a publicity stunt and propaganda. It carried on board a recorded message from President Eisenhower, "Peace on earth. Goodwill to men." Two years later, a 100-foot polyester balloon wrapped in aluminium foil went into orbit around the earth. This was appropriately named Echo 1, since all it did was to echo the signal back to earth. Considering that the concept of moon relay communications had hit a dead end, one would have expected future engineers to keep their distance from similar ideas. Surprisingly, mistakes have a way of coming back from the dead. To give engineers the benefit of the doubt, it was felt that active electronic components were as yet unproven in space at that altitude. Hence, an incremental approach using a passive balloon reflector was adopted. Once in orbit, President Eisenhower once more made an official broadcast, except that this time his recorded message was sent up from California, reflected by the orbiting balloon, and received in New Jersey. The brain behind Echo 1 was John Pierce of Bell Labs but the balloon reflector itself was originally conceived by William O'Sullivan of NASA.

In earlier days, Pierce had published a few science fiction stories. Most early science fiction involved robots, men in space, and space travel. The use of unmanned satellites orbiting the earth had not caught writers' fancy. Sometimes the most useful ideas are the simplest and therefore easily mistaken for being too primitive. With Echo 1, space communication was no longer in the realm of science fiction. Besides being a pioneer in satellite communications, Pierce's other claim to fame is in relation to the transistor. Although it was his colleagues at Bell Labs who invented the transistor, it was Pierce who named it. At Bell Labs, working alongside Shannon, he did a theoretical analysis of PCM. Echo 1 used neither the transistor nor PCM technology. It owes its success to its own simplicity and realistic goals. It gave a platform upon which the next generation of communication satellites could be built for commercial use. These were Telstar 1 (1962) and Telstar 2 (1963). The latter could carry multiple voice channels and even a low bandwidth TV channel. With these, satellite communications had finally arrived. Because Telstar and later satellites used on-board signal amplification at microwave frequencies, the design of efficient amplifiers was in itself a challenge, and this is where Pierce's greatest contribution lay.

Although Pierce came into popular public view due to his role in Echo 1, he himself claimed little technical contribution to that project. He saw himself more as an evangelist, an agent to inspire others. Before his involvement in Echo 1, he had spent more than a decade with microwaves. Since the late 1930s, Bell Labs had been actively pursuing microwave research. Microwaves provided high-bandwidth channels and therefore became suitable candidates for FDM of many voice circuits. The problem was that there were no suitable broadband amplifiers. About this time, coaxial cables were just entering active service as the carrier technology of choice but Bell Labs was already thinking ahead. Coaxial cables involved high installation costs. Microwave wireless links would be far cheaper.

Although Pierce conceived of the _Travelling Wave Tube (TWT)_, it was Rudolph Kompfner, an Austrian architect-turned-physicist, who actually built one independently of Pierce. Kompfner himself did not explain how it worked and it was Pierce who provided the theoretical background. It is therefore said that Kompfner invented the TWT and Pierce discovered it. The TWT worked at microwave frequencies and could amplify signals over a wide band. Its workings were a combination of Maxwell's waves and J. J. Thomson's discovery of the electron. An electron beam and an electromagnetic wave interacted with each other so that energy from the electrons was transferred into the wave. This was how the TWT achieved amplification and its gain was impressive. Nonetheless, to avoid signal distortion during amplification it was usually operated at a lower gain. To operate at full gain would mean intercarrier interference perceived in the form of intelligible crosstalk. In other words, a pair of speakers could hear conversations from a neighbouring voice channel. Without the TWT, active satellites in orbit would not have been possible.

The issue with these early satellites was that there was no global or constant coverage. Like the moon, LEO satellites never stayed in one location, which meant that a satellite could not be used for communication at a given location until it appeared again over the horizon hours later. The way out of this problem was to launch a series of LEO satellites to circle the globe. Together, these would form a satellite net over the globe and provide constant coverage. It also meant that while a voice call was in progress with a particular satellite, and if that satellite dipped below the horizon, the call had to be handed over to another satellite in view. The subscriber does not necessarily perceive this handover if it is done perfectly. There are fundamental problems with such a satellite net. To launch and manage perhaps a hundred satellites involves considerable cost. Ground stations would have to constantly track the satellites as they move about in orbit. Handover of live calls is complex. This sort of proposal was not implemented back then. There is one other reason why very few back in the late sixties were enthusiastic about LEO satellites. They knew of a more promising alternative that had been proposed by a British engineer in 1945.

To say that Arthur Clarke was an engineer is to be oblivious to the diversity of his personality and his contributions in many fields. He was also a writer, a futurist, a humanist, an explorer, and an evangelist of space science. He himself wished to be remembered as a writer who stretched the imagination of his readers and inspired them to explore their limits. His writings have given us many memorable quotes that have possibly drawn an entire generation of young minds towards science and engineering. Just as Newton had his three laws, Clarke too had three of his own,

First Law—When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.

Second Law—The only way to discover the limits of the possible is to go beyond them into the impossible.

Third Law—Any sufficiently advanced technology is indistinguishable from magic.

When Clarke published a proposal in the October 1945 issue of _Wireless World_ magazine to cover the entire globe with just three satellites, it was partly perceived as magic. Very few recognized that the proposal was hardly a matter of science fiction. Clarke clearly showed that there was a unique mathematical solution to the problem, that at exactly 22,300 miles above the earth's surface this could be achieved. LEO satellites operated at most a couple of thousand miles above the surface. Clarke's satellites were to orbit at ten times that height. If the difficulty of putting a satellite into such a high orbit could be overcome, the gains would be substantial.
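Clarke's unique altitude follows from Kepler's third law, which fixes the one orbital radius at which a satellite's period matches the earth's rotation. A minimal sketch:

```python
import math

# Kepler's third law: r = (G*M * T^2 / (4*pi^2)) ** (1/3), where T is the
# orbital period. Setting T to one sidereal day gives the geostationary radius.

GM = 3.986004418e14   # earth's gravitational parameter, m^3/s^2
T = 86_164            # sidereal day, s
R_EARTH = 6_378_000   # earth's equatorial radius, m

r = (GM * T**2 / (4 * math.pi**2)) ** (1 / 3)
altitude_km = (r - R_EARTH) / 1000
print(f"GEO altitude ~ {altitude_km:,.0f} km (~{altitude_km / 1.609344:,.0f} miles)")
# ~35,786 km, i.e. close to the 22,300 miles in Clarke's proposal.
```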

Because the satellite is at a higher orbit, it sees more of the earth's surface. This is why just three satellites can provide the global coverage that would otherwise require perhaps a hundred LEO satellites. The real beauty of the proposal was that these satellites would orbit the earth in perfect synchrony with the earth's own rotation, which means that the satellite stays in exactly the same position relative to the earth's surface. Ground stations do not need to track the satellite since there is no relative motion. Antennas on the ground can simply point to the same location in the sky. There is no problem of handover. Equipment and communication protocols become simpler. These were the _Geostationary Earth Orbit (GEO)_ or geosynchronous satellites. The main technical challenge was the distance.

Syncom 2 of 1963 built by Hughes Aircraft became the first successful telecommunication GEO satellite. Being experimental, it was not intended to remain in a stationary orbit. It was quickly followed a year later by Syncom 3, which was truly a GEO satellite. Since Syncom 3 had enough bandwidth to carry a TV channel, it enabled live coverage of the 1964 Tokyo Olympic Games. Much of the design credit for these satellites goes to Harold Rosen. They were lighter than the Telstar satellites, so light that prevailing booster technology became adequate to carry the satellite into high orbit. The satellites were better designed for communication and attitude control. Riding on the success of Syncom, an international organization by the name of Intelsat was formed. The intent was to implement a worldwide satellite network using GEO technology so that, for example, a voice call from Alaska could be relayed via satellites and switched by telephone exchanges to a subscriber in Sri Lanka.

Just as the Soviets were launching Molniya 1 (1965) and Molniya 2 (1971) into highly elliptical orbits, the first of the Intelsat satellites took station over the Atlantic and came into operation in 1965. It could carry 240 voice circuits. Two years later, another satellite came into position over the Pacific. The launch of the Intelsat 3 series began in 1968. By 1969, one of these came into position over the Indian Ocean. This marked a milestone—the entire globe was connected by satellite technology. This came barely weeks before Neil Armstrong's moon landing. While the world watched spellbound mankind's "giant leap," some unsung heroes behind the curtains were the Intelsat 3 satellites. Arthur Clarke's vision had finally come true.

To put things into perspective, intercontinental cables could provide at best a few hundred voice circuits. Intelsat 3 satellites could provide 1,200 voice circuits plus two TV channels. Operating at microwave frequencies, satellites had more bandwidth to work with than coaxial systems. The early developments of satellite technology, stunning as they were on their own, were only a prelude to what was to follow. Ever since the introduction of PCM/TDM in the early sixties, it had become clear to engineers that satellite technology would sooner or later have to leave its analogue world behind. The transition to digital took a long time. While the first experiments were conducted on Intelsat 1, satellites did not go digital until much later. Many technical problems had to be sorted out along the way.

Time division multiplexing evolved into _Time Division Multiple Access (TDMA)_ technology. The difference is rather subtle. With multiplexing, a centralized controller or exchange multiplexes different voice circuits. With TDMA, multiple users attempt to get hold of a circuit they can use. In this case, each circuit lasts for a short burst called the _time slot_. One of the challenges was to maintain this burst intact for thousands of miles between earth and GEO orbit. Research in TDMA happened during the late sixties. This was also the period when ALOHA took shape at the University of Hawaii. ALOHA was essentially a TDMA system. In later implementations, ALOHA too used GEO satellites.
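A toy sketch of the slot discipline may help. The four users and slot count below are purely illustrative:

```python
# A toy TDMA frame: each user transmits only during its assigned slot,
# then stays silent while others take their turns on the shared frequency.

SLOTS_PER_FRAME = 4
users = {0: "Alice", 1: "Bob", 2: "Carol", 3: "Dave"}  # slot -> user (illustrative)

def who_transmits(time_slot_index):
    """The burst on the shared frequency belongs to whoever owns
    the current slot within the repeating frame."""
    return users[time_slot_index % SLOTS_PER_FRAME]

for t in range(8):  # two full frames
    print(f"slot {t}: {who_transmits(t)} sends a burst")
```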

An interesting problem was experienced by Gene Gabbard in the early tests leading to TDMA systems. Gabbard noticed stray radio bursts in the receiver whenever the transmission burst was active. Satellite systems are generally based on _Frequency Division Duplexing (FDD)_ in which transmission and reception happen at different frequencies. So it was strange for Gabbard to see a burst at 4 GHz when the transmission was at 6 GHz. By measuring the delay, Gabbard estimated the point in the waveguide where the problem might originate. When the waveguide was opened up, some nuts and bolts were discovered. These turned out to be the source of the problem. By removing these nuts and bolts, link performance improved by about 2 dB. This example characterizes the difficulties faced by radio engineers. This is true too of engineers who design antennas. For decades, RF engineering was more an art than a science. Maxwell and Hertz had built the science, but engineers struggled quite a bit to reduce that knowledge to commercial practice.

The upshot of this struggle was that in 1986 Intelsat 6 became the first satellite to adopt TDMA technology. Not only was information carried as bits; voice, data, and video could all be carried in the same time burst if so required. TDMA provided the flexibility to juggle capacity allocations to individual earth stations. Moreover, each burst could use full transmit power without worrying about the intercarrier interference that had been a problem in FDM systems. The other advantages of going digital came naturally. HDTV video could be compressed and more channels could be squeezed into the same bandwidth. Redundancy could be introduced easily with sophisticated concatenated codes. This led to adequate BER performance while conserving precious satellite power. Part of this performance was because digital satellites were active devices that operated at baseband. Satellites that operated at RF were little more than repeaters. They received RF signals, amplified them, and sent them back to earth. Digital satellites that worked at baseband did full demodulation down to the level of bits, performed error correction, and used this information to modulate signals returning to earth. This could never have happened quite as effectively in an analogue world.

At least for telephony, the real problem with satellites turned out to be the propagation delay. There was nothing engineers could do to reduce this delay. This was a fundamental limit from the world of physics. Electromagnetic waves could not travel faster than the speed of light. Signals took about a quarter of a second to travel to the GEO orbit and get back to earth, so an echo returned half a second after one spoke. Such long delays made echo suppressors ineffective. Echoes and uncomfortable silences became annoying. Speakers kept interrupting each other. Echoes were ingeniously removed by synthesizing them and subtracting them from the received signal. In other words, echo suppression gave way to _echo cancellation_. But the problem of delay remained.
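The synthesize-and-subtract idea can be sketched with an adaptive filter. The least-mean-squares (LMS) update below is a standard textbook technique standing in for whatever the actual equipment used; the echo path coefficients are invented for illustration.

```python
import numpy as np

# Echo cancellation by synthesis: an adaptive (LMS) filter learns the echo
# path from the far-end signal and subtracts its estimate of the echo from
# what comes back. A sketch of the standard technique.

rng = np.random.default_rng(0)
far_end = rng.standard_normal(5000)          # far-end speech (stand-in: noise)
echo_path = np.array([0.6, 0.3, 0.1])        # unknown channel producing the echo
echo = np.convolve(far_end, echo_path)[:5000]

taps = np.zeros(3)   # adaptive filter estimating the echo path
mu = 0.01            # LMS step size
for n in range(3, 5000):
    x = far_end[n-2:n+1][::-1]       # most recent far-end samples, newest first
    estimate = taps @ x              # synthesized echo
    err = echo[n] - estimate         # residual after subtraction
    taps += mu * err * x             # LMS update nudges taps towards the true path

print("learned echo path:", np.round(taps, 3))  # converges to ~[0.6, 0.3, 0.1]
```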

Beginning in the 1980s, long-distance calls began a slow migration from satellite technology to undersea optic fibre technology. Fibres did not come with a bandwidth limitation since one could simply lay more of them when demand increased. In some sense, a cycle was completed. Long-distance telephony began with wires, went wireless for a while, and eventually returned to its roots. The difference is that in the beginning it was in copper and coaxials, while today it is in fibres. In the late 1990s, Motorola attempted a revival of satellite telephony. Using its Iridium series of 66 LEO satellites, the idea was to contain delays to within a hundred milliseconds. The project went into bankruptcy. It never made commercial sense to consumers and the heavy initial investments could never be recovered.

None of this implies that satellite technology is dead and gone. Satellites may not be the preferred technology for point-to-point voice service but their real strength is in broadcasting. Today's satellites are essential for military and maritime communications. Low-cost earth stations were invented in the early seventies to persuade developing countries to invest in low capacity satellite links. With such stations, capacity could be acquired and reserved on demand. Such reservations, going by the name of _Demand Assignment Multiple Access (DAMA)_, could be made in a decentralized manner. _Single Channel Per Carrier (SCPC)_ technology enabled fixed allocation of low bandwidth pipes at low cost. With increasing demand, many SCPC/FDMA links have been replaced with TDMA links. Part of this shift can be attributed to the move from circuit switching to packet switching. Today, satellites are commonly used for video distribution and broadcasting. In the beginning, TV channels were relayed to cable operators, who then distributed them locally using coaxial cables. No one back then foresaw that cable TV networks would soon face competition in their own backyard.

Since the 1990s, video broadcasting has become a phenomenal success with hundreds of channels reaching home subscribers directly from a GEO satellite. This has been possible partly because satellite receivers have become so cheap that home subscribers can afford to buy them. Satellite TV networks operate in the 12 GHz downlink (satellite-to-ground) and 14 GHz uplink (ground-to-satellite) bands. This means that the downlink wavelength is only 2.5 centimetres, which makes receiver antennas compact for residential installations. In the mid-1960s, this reduction of a satellite receiver for the home subscriber might have been thought impossible. Indeed, many earth stations back then used costly cryogenically cooled amplifiers to reduce the noise figure of their receivers. Steady improvements in semiconductor technology and receiver designs have made advanced technology look simple. Satellite TV has transformed remote communities almost overnight. For example, many isolated communities of the Indian subcontinent are now receiving hundreds of channels from what seems to be another world.
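The wavelength arithmetic is a one-liner, lambda = c / f:

```python
# Antenna dimensions scale with wavelength (lambda = c / f), which is why
# Ku-band dishes can be small enough for a rooftop.

C = 299_792_458  # m/s

for f_ghz in (12, 14):
    wavelength_cm = C / (f_ghz * 1e9) * 100
    print(f"{f_ghz} GHz -> wavelength {wavelength_cm:.1f} cm")
# 12 GHz -> ~2.5 cm downlink, 14 GHz -> ~2.1 cm uplink
```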

The other popular application familiar to most people is GPS. GPS receivers listen to signals from _Medium Earth Orbit (MEO)_ satellites to fix their ground position to an accuracy of a few metres. At any time and place on earth, at least four GPS satellites are in view. Combined with maps, they have become effective navigation devices for road drivers and even mountain hikers. When used in conjunction with Geographical Information Systems (GIS), GPS has enabled researchers to analyse geological fault lines and river tributaries. Another common but important role played by satellites is in meteorology and climate change studies. Centuries ago, men pieced together the world's geography from the explorations of Christopher Columbus, Ferdinand Magellan, Vasco da Gama, and James Cook. More recently, Internet company Google has done it by photographing the earth using satellites in space. In short, satellites have become eagle eyes in the sky. The old debates about individual privacy and international espionage are still with us today. Even if the Gods are asleep, rest assured, we will always have satellites to watch over us.



**Long before satellite** technology matured, the terrestrial space was teeming with wireless signals. Radio broadcasting was followed by TV broadcasting. Radiotelephony using shortwave ground stations came into operation for long-distance calls. Thus, both wireless broadcasting and point-to-point systems existed. Soon after the Second World War, the experience gained with radar was put to good use. FDM carrier systems were introduced at microwave frequencies. They proved to be a cheaper alternative to coaxial cables. About the same time, ideas were put forward to make radiotelephony accessible to everyone.

With broadcasting, there are typically thousands of transmitters and millions of receivers. Translated directly into point-to-point wireless, there is simply not enough bandwidth to enable millions of transmitters. This was the key technical problem that needed to be solved. The first known proposal came in 1947 from two researchers at Bell Labs, Douglas Ring and Rae Young. Their idea was to divide the geographical landscape into smaller units or _cells_. Rather than have a single gigantic transmitter to cover the entire area, these cells with smaller transmitters would together give full coverage. Cells would be interconnected to form a network. Each cell serviced subscribers in its area of coverage but the network as a whole serviced everyone. The beauty of the whole scheme was that engineers were no longer constrained by bandwidth scarcity. Because a particular cell uses a small transmitter, its signal is limited to its immediate vicinity. This means that the same frequency can be reused in another cell sufficiently far away. In normal circumstances, two transmitters using the same frequency would interfere with each other; but if they are separated by distance and the transmission power is contained, this interference is greatly minimized. This was essentially the concept of _frequency reuse_ that comes naturally in a _cellular network_.

The Cellular Concept of Ring and Young

This is an original diagram from the engineers at Bell Labs who first proposed the cellular concept. An entire geographical area can be covered by many small cells. It is also possible to overlay bigger cells on top of them where lower capacity suffices. Source: (Ring 1947, fig. 6). Reprinted with permission of Alcatel-Lucent USA Inc.

The internal Bell Labs memorandum that contained these ideas was not widely circulated and it is not quite clear what inspired them to think of a network of cells. It is possible that they got the idea from telegraphy, telephony, or even the postal services. All these shared one important trait—they were all networks, individual exchanges that served small areas of a few customers but collectively served everyone. In cellular networks, cell transmitters would be interconnected either by cables or microwave links to a hierarchy of serving nodes or exchanges. These exchanges would switch calls not unlike traditional telephone exchanges. They would interface with traditional exchanges as well to switch calls between landline telephones and cellular subscribers. This early conceptual development might suggest that a prototype of such a system would have become available within ten years. In fact, commercial cellular networks did not arrive until three decades later. The challenges that stood in the way were more than just technical.

When the Bell System first introduced a wireless service at St Louis in 1946, it had only six channels. Because selectivity was not very good, the prevailing technology allowed the use of only three channels. A year later, a highway system was introduced between New York and Boston. These early experiments were heavily limited by low capacity. Just as bad, it was a push-to-talk system and the idea was tied to moving vehicles. There was no pocket-sized device that an individual could carry around for voice calls. There was no adequate battery technology to power such a device even if one had existed. Mobility was therefore based around vehicles rather than individuals. Radio systems mounted on these vehicles were meant to be powered by the vehicles' batteries.

Capacity was limited not because there was insufficient bandwidth but because the technology to tap into higher frequency bands was lacking. In fact, bandwidth scarcity is only relative to the technology at hand. Today, UV-rays, X-rays, and cosmic rays are not used to carry digital information, but perhaps in a distant future they may be tamed for our communication needs without their harmful effects on health. Back in the early 1950s, technology was just about maturing to allow modulation and amplification of microwave frequencies. When advances permitted the use of spectrum up to 1 GHz, the FCC in the US took the decision to use large parts of it for television. This made good sense back then when television was on a strong growth curve. The result of the decision was that mobile telephony received as few as a dozen channels in the 150 MHz and 450 MHz bands.

In the mid-fifties, manual switching was adopted for interfacing mobile and fixed telephone systems. When the capacity itself was low, it didn't make sense to automate switching at that point. A decade later, new systems under the names of MJ and MK were introduced. They incorporated automatic switching, whereby a subscriber could do direct dialling. There was as yet no technical standard as to how mobile telephony should operate in the radio space but earlier concepts of cells became useful. Powerful transmitters on large towers catered to many users but only a few could be served at the same time. This was clearly a capacity problem. These transmitters also introduced interference to others that might be using the same frequency. Therefore, care had to be taken to ensure ample separation to limit interference and improve system performance.

Other than these public systems, many private radio systems had been in use since the late 1940s. As early as 1921, Detroit police had used mobile radio systems mounted on their vehicles. These operated at a low 2 MHz band. Later, higher bands were opened up for the use of private dispatch services, individuals, and companies. These systems operated autonomously and were generally not connected to the public telephone network. When the need for public systems was felt, clearly a nationwide standard was needed so that subscribers could use the service irrespective of location or device. The FCC reallocated part of the UHF TV band near 900 MHz and invited proposals for a mobile telephony system. Bell Labs submitted its proposal in 1971 but the FCC did not decide on spectrum allocation until 1974. These delays suggest that mobile telephony was not seen as an urgent public need until the late seventies.

The technical reasons for this delay are obvious to those well versed in the history of technology. Mobile phones could never have arrived without the IC of the early seventies. To complement this were the advances in software. Even the switching systems for cellular networks evolved from ESS. The first mobile phone, the DynaTAC, was due to Martin Cooper of Motorola. From its initial conception, the first prototype took ninety days of engineering effort. The first call from this phone was made in April 1973. It might have weighed two-and-a-half pounds and might have looked like a brick, but otherwise it was the first phone that could be considered handheld, untethered by wires, and usable in an active call while on the move. By then, the old and cumbersome Daniell cells had been reduced to portable and compact batteries. Nonetheless, battery technology is one area that has lagged very much in comparison to communications, microprocessors, memories, and display technologies.

Another technical advance that became practical in the early seventies was the _frequency synthesizer_. It has long been known that for better use of resources, they must be shared across many users. Rather than permanently assigning one frequency to a particular user, many users share a set of frequencies under the realistic premise that at any given time only some of the users will be active. The practical problem here is that a receiver must be able to tune to many frequencies to select an available one. This usually implied separate tuning equipment for each frequency. With the arrival of low-cost frequency synthesizers, it became possible to tune to many frequencies with minimal hardware duplication. This was an essential innovation for mobile devices, which needed to be compact.
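An integer-N synthesizer can be sketched as arithmetic: one stable reference and a programmable divider give every channel. The frequencies below are illustrative, not those of any particular handset.

```python
# An integer-N frequency synthesizer: one stable reference oscillator plus a
# programmable divider N yields many channel frequencies, f_out = N * f_ref.

F_REF_KHZ = 30.0  # reference step chosen equal to the channel spacing

def channel_frequency(n):
    return n * F_REF_KHZ  # kHz

# Tuning to another channel is just reprogramming the divider:
for n in (29_000, 29_001, 29_002):
    print(f"N = {n}: {channel_frequency(n) / 1000:.2f} MHz")
# 870.00, 870.03, 870.06 MHz -- adjacent 30 kHz channels, no extra hardware
```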

Perhaps the simplest and yet most influential innovation from the user perspective was _pre-origination dialling_, which basically means that users can type the digits, make corrections if necessary, store the number, and dial the callee as the final step in the process. This had not been possible with good old rotary dials or even the newer push-button telephone sets of the fixed network. The reason is that the fixed network operates on the electrical principle of "off-hook," which is essential for hearing the dial tone from the exchange. Such a dial tone is unnecessary for the mobile phone, which relies only on wireless resources and accesses them when the need arises. In hindsight, we can say that this innovation in signalling was perhaps the first step towards typing text messages (SMS) before sending the same.

The proposal from Bell Labs contained an essential idea in relation to transmitter positioning. In the traditional conception, each transmitter radiates power equally in all horizontal directions. This means that the cell is a circle, often approximated by a hexagon for the purpose of planning and analysis. Engineers at Bell Labs figured out that if transmitters were to use directional antennas, interference would be reduced further. This leads to increased capacity and reduced cost of deployment due to fewer cells. A typical directional antenna can cover a third of a cell but since it is placed at the intersection of three cells, a transmitter with three such antennas covers the equivalent of a cell. For multiplexing, duplexing, and modulation the proposal was FDMA/FDD/FM. With an allocation of 40 MHz and 30 kHz bandwidth per channel, about 666 full-duplex channels could be supported at the same time. With 7-cell reuse enabled by directional antennas, about 95 channels are possible per cell. This means that 95 simultaneous voice calls are possible in one cell. This was the capacity of the system that came to be called _Advanced Mobile Phone System (AMPS)_.
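The capacity arithmetic of the proposal can be reproduced step by step:

```python
# The AMPS capacity arithmetic, step by step.

total_bandwidth_khz = 40_000   # 40 MHz allocation
channel_khz = 30               # per-channel bandwidth
reuse_factor = 7               # 7-cell reuse pattern

simplex_channels = total_bandwidth_khz // channel_khz      # 1333
duplex_channels = simplex_channels // 2                    # ~666 FDD pairs
channels_per_cell = duplex_channels // reuse_factor        # ~95

print(duplex_channels, "full-duplex channels;",
      channels_per_cell, "simultaneous calls per cell")
```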

Arriving at these system parameters was no trivial task. Propagation of radio waves in urban, suburban, and rural areas had to be studied. Rarely were cells perfect circles or hexagons. There were areas in which adjacent cells overlapped. There were possibilities of blind spots with no coverage. Field measurements had to be taken to achieve suitable cell planning. Engineers had to find the right match across system capital expenditure, capacity, spectrum usage, interference, and cell reuse. This know-how possibly took good part of a decade. One of the pioneers of cellular technology, Richard Frenkiel, recounted in his memoirs the early days of experimental work at Bell Labs,

Reuse distance was just another complex and fascinating question, especially since there was almost no experimental data on radio propagation at this new frequency. We found ourselves poring over old monographs and covering our walls with topographical maps. We stood on ladders, counting hills in northern New Jersey and San Francisco, and trying to figure out what cell sizes and reuse distances would work.

Although AT&T pushed FCC for trials, approval did not come until 1977. By 1978, the system was proven and a full commercial service was ready to be launched. Once more, no approval was received and the system remained in trial form with a limit of no more than two thousand subscribers. The system became commercial only in 1983 but AT&T gained little from this delayed decision. The real reason for these delays was elsewhere. Through the seventies, AT&T was becoming more and more involved in antitrust investigations from the government. The company had been operating for years as a regulated monopoly in the fixed telephony space. No one wanted it to become a monopoly in the cellular space as well. For this reason, there were already in the US many Radio Common Carriers (RCC), who did not have a fixed telephony network but had a presence in the radiotelephony business. Motorola Corporation was one among them. The RCCs and independent equipment manufacturers were keen to keep AT&T from monopolizing cellular telephony. RCCs were generally small players servicing local needs. They certainly would not be able to compete against AT&T should the latter choose to deploy a nationwide cellular system. It was against this backdrop that the introduction of AMPS was delayed.

In January 1984, AT&T was formally broken up into smaller regional companies. The result of this was that each one deployed its own network in its regions of interest. Although AMPS was a standard for all to follow, there was no process to ensure strict conformance. The result was that devices worked in their local regions but when the subscriber moved to another region, problems of interoperability arose. These took the form of poor voice quality, call blocking, or simply service unavailability. There were also no clear billing practices when a subscriber moved from one network to another. The original research of Bell Labs had not considered many of these operational problems. It didn't need to because AMPS was foreseen as a single system. When it came to implementation after the divestiture, the AMPS cellular network was really a patchwork of many networks that had their differences. In the end, the subscriber turned out to be the real loser.

AMPS did inspire networks in other countries—MCS in Japan (1979), C450 in West Germany (1985), and TACS in Britain (1985). Another popular system evolved in Europe to connect many of the Scandinavian countries. The geographic isolation and harshness of the environment were key motivators for these governments to take an early interest in mobile telephony. As in the US, private mobile radios had existed here since the fifties. Then in 1967, Carl-Gosta Asdal proposed to interconnect the mobile network with the fixed telephone network. It was from here that the _Nordic Mobile Telephone (NMT)_ system evolved. NMT was defined by negotiation and consensus across countries. Introduced in 1981, it eventually grew to include nearly thirty countries in Europe. It started operating in the 450 MHz band but later expanded into 900 MHz. Like AMPS, it used FDMA/FDD/FM but with a smaller channel bandwidth. This meant that more channels could be squeezed into the same band, resulting in a higher system capacity.

These early cellular systems were all analogue, what one today calls _First Generation (1G)_. Only signalling was digital in nature and this was processed in software. _Second Generation (2G)_ systems used digital transmission of voice. Coming from the old perspective of fixed telephony, 1G designers did not adopt the superior digital technology even though all the elements were ready by then. In their defence, they had valid reasons for not going digital early on. Processing power on the mobile side was limited back in the late seventies. PCM voice took up more bandwidth and the necessary advances in speech compression did not become available until later. ADPCM, for example, arrived only in 1973 and was not standardized until a decade later. When voice was packetized in relation to the early work on the Internet and IP, real-time voice compression was implemented on minicomputers. As MOS technology progressed, bringing with it the benefits of VLSI, a great deal of the processing could be packed into a single chip. This came to be realized in the early eighties.

In the long run, the intent has been to migrate almost everything from analogue to digital. The only parts of a system that need to remain analogue are human dependent—the way we speak and listen in analogue waveforms. Once speech is sampled it becomes digital. From then on, all processing is done digitally using special processors known as _Digital Signal Processors (DSP)_. Unlike the more versatile microprocessors, DSPs were optimized by design for mathematical operations. With a generic microprocessor, a multiplication might take many clock cycles but a DSP would do it in a single cycle. This meant that DSPs were fast and efficient for baseband processing of signals. Using only ones and zeros, they could do quantization, band filtering, speech compression, encryption, and CRC computation in real time. Beyond their role in communications, DSPs are essential in modern consumer electronics. DSPs power digital cameras, iPods, and DVD players. In a digital camera, for example, DSPs perform JPEG compression for images and MPEG encoding for video, although some models may have dedicated hardware to do the same. Such dedicated hardware goes by the name of _Application Specific Integrated Circuits (ASIC)_. It makes sense to develop and promote an ASIC only if there are enough volumes.
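The workhorse operation of a DSP is the multiply-accumulate, and a finite impulse response (FIR) filter is little more than a long chain of them. A minimal sketch:

```python
# A DSP's bread-and-butter operation is the multiply-accumulate (MAC). An FIR
# filter is one long chain of MACs, which is why a single-cycle hardware
# multiplier makes a DSP so effective at baseband work.

def fir_filter(samples, coefficients):
    """output[n] = sum over k of coefficients[k] * samples[n - k]"""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:
                acc += c * samples[n - k]   # one MAC per tap, per sample
        out.append(acc)
    return out

# A crude 3-tap moving average smoothing a noisy step:
print(fir_filter([0, 0, 3, 3, 3], [1/3, 1/3, 1/3]))  # -> [0, 0, 1, 2, 3]
```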

One notable name in DSP technology is Texas Instruments, the same company that had first given the world the transistor radio and the pocket calculator. Its first commercial offering of a DSP came in 1982, the TMS32010. This represented a delay of a decade since TI's first 4-bit microprocessor. The arrival of DSPs was really a response to a world increasingly moving towards data and becoming digital. It was therefore fitting that 2G cellular systems were designed as digital systems. Speech would be processed digitally within the mobile handset and carried by the network digitally.

The eighties was also the era that saw a keen interest in ISDN. In other words, even within the fixed telephone network there was a drive to carry speech digitally right up to the subscriber. Although ISDN eventually did not succeed, many of its principles survived and inspired mobile telephone networks. The most successful 2G digital cellular system was GSM, which came into operation in 1991. Based on TDMA/FDD, it had eight slots per time frame. In other words, eight voice calls could use the same frequency on a time-shared basis. The system itself had 125 channels, which could be reused by proper cell planning. For voice compression, a form of LPC called _Residual Excited Linear Prediction (RELP)_ was used at 13 kbps. Voice compression provided the necessary bandwidth efficiency while speech bits protected by channel coding brought better speech quality in the presence of channel noise. There was another important reason for going digital.
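The raw arithmetic of this time-and-frequency grid, using standard GSM figures:

```python
# GSM's time-and-frequency grid, using standard figures.

channels = 125            # carriers in the original 900 MHz allocation
slots_per_frame = 8       # one TDMA frame
frame_ms = 4.615          # frame duration

calls = channels * slots_per_frame
slot_us = frame_ms * 1000 / slots_per_frame
print(f"{calls} concurrent calls before reuse planning; "
      f"each burst lasts ~{slot_us:.0f} microseconds")
# 1000 calls; ~577 microsecond bursts, repeating every 4.615 ms
```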

By the late eighties, IP and packet switching were proven technologies. ARPANET was transforming itself into the modern Internet. Communication was no longer limited to just voice. It was natural and sensible to make provision for data in any new cellular standard. The initial data offering in GSM was a mere 9.6 kbps, which was later expanded to 57.6 kbps. This was made possible by sacrificing error protection on good channels as well as using multiple time slots per data connection. The real problem with data on GSM was circuit switching, which was known to be a misfit for bursty traffic. It was only later, in 1997, that the GSM standard was augmented with the _General Packet Radio Service (GPRS)_ to provide data capability using packet switching. What this meant for the wireless channel was that radio resources were reserved temporarily and only when there was data to send. This had not been possible with GSM, which tied up radio resources until the data call was disconnected. GPRS provided as much as 171.2 kbps in the downlink. This was improved by _Enhanced Data Rates for GSM Evolution (EDGE)_. Using better modulation techniques, EDGE was able to offer theoretical download speeds as high as 473.6 kbps. GPRS and EDGE became collectively known as 2.5G technologies.

About the same time that DSPs arrived on the market, a new breed of processors was coming up with a fundamentally different architecture. Its genesis can be traced to John Cocke of IBM. In the seventies, Cocke opined that the Intel and TI microprocessors of the day were getting increasingly complex. These processors had many dozens of instructions. This no doubt resulted in shorter programs but on the flip side, many cycles were needed to complete one instruction. Cocke proposed a new architecture in which processors could do everything with a handful of basic instructions. This no doubt increased program size because many instructions would be needed to perform any given task. The advantage Cocke saw was in speed. Most instructions would require only a few clock cycles. Some could be completed in a single cycle. Overall, the processor would run faster. This came to be called _Reduced Instruction Set Computer (RISC)_, as opposed to the traditional _Complex Instruction Set Computer (CISC)_.

As the eighties dawned, RISC architectures became a topic of research outside of IBM. Though IBM had built a prototype RISC machine, it was never commercialized. The success of System/370 clouded IBM's vision of the future. When researchers at Stanford published positive results in favour of RISC, these were met with scepticism. Then things changed dramatically when Sun Microsystems introduced a RISC processor called SPARC in 1987. Sun's workstations were built around SPARC. On top of that, Sun licensed SPARC technology so that others could build their own workstations. No doubt this trimmed Sun's own profit margins in the workstation business but the move brought faster processing at lower cost. In combination with Berkeley UNIX, SPARC brought new converts from the world of IBM PCs to workstations. Clearly, RISC was here to stay.

Just as TI became renowned for DSPs, a company named Advanced RISC Machines (ARM) came into being in 1990 as a spin-off of Acorn Computers, which had existed since the late seventies. Acorn's first RISC offering came in 1985 and within a few years, ARM had established its supremacy in the world of low-power, high-speed RISC architectures. While SPARC and similar processors catered for powerful computers, ARM catered for small devices. As for its business model, rather than becoming a semiconductor company offering its own chips, ARM adopted a licensing model. Even Intel became one of the licensees, not to mention other big names—Sharp, TI, Hyundai, Philips, Sony, HP, Qualcomm, IBM, Samsung, and many more. Particularly with mobile phones, ARM technology has become indispensable. It has now become standard to combine an ARM core with a powerful DSP, throw in lots of cache memory, all within a single chip. The ARM core typically runs the wireless protocol such as GSM. The DSP typically does all the signal processing. Without either ARM or sophisticated DSPs, it is fair to say that today's mobile phones would be slower and more taxing on battery power. Cellular technology in part owes its success to such advances in mobile phone technology. After all, it is well known that any chain is only as strong as its weakest link.

By 1996, GSM had become so popular that it was in use in more than a hundred countries worldwide. Incidentally, this was also the year when a Finnish cellular network operator first began offering Internet data on mobile phones. When GPRS and EDGE came later, they offered a natural evolution of GSM without requiring major investments or network changes. Many GSM operators worldwide introduced these enhancements, maintaining their priority of voice services but increasingly promoting data services. The original GSM spectrum at 900 MHz was expanded to include 1800 MHz and 1900 MHz. As demand increased, cells were shrunk to increase system capacity. For such smaller cells, the extra spectrum became valuable. With GSM, Europe established its superiority over the US in cellular technology. The Americans had for once lost out on global dominance. Once more, the reasons were not technical.

Part of the reason could be the not-invented-here syndrome. GSM adopted TDMA technology, which had been proven in satellite systems. While some in the US supported TDMA, others attempted to promote an alternative, home-grown technology called _Code Division Multiple Access (CDMA)_. The birth of CDMA can be traced to the famous Hollywood actor Hedy Lamarr. Lamarr was an Austrian Jew trained in both ballet and piano. Married to an arms supplier, she often hosted extravagant parties for the likes of Hitler and Mussolini, not that she supported their cause. She had little choice. Her husband was domineering and the political atmosphere was stifling. In disguise, she fled to Britain and then to the US. When the war broke out, she was determined to contribute to the war effort in any way she could. So between make-up sessions and screen takes, she interacted with engineers. One of the problems she looked at was radio jamming, which interfered directly with the operation of remote-controlled guided torpedoes.

With jamming, radio communication gets disrupted by a rogue high power transmitter within range of the receiver. The frequency on which communication is desired becomes useless. While playing the piano, Lamarr noticed that notes were strung together to make music. To one who looked at the key presses without hearing the music, it might seem that the keys were pressed in some random sequence. Lamarr used this idea to invent the concept of _frequency hopping_. Instead of communicating on a single frequency, both transmitter and receiver use a set of frequencies and randomly hop from one frequency to another. Prior to communication, both parties agree on an algorithm to achieve this pseudorandomness. In other words, the sequence looks random to the enemy but it is completely deterministic to the communicating parties. Since many frequencies are employed, the only way to jam effectively is to transmit on all frequencies at once, which is really a difficult thing to do. What Lamarr had done was to convert a narrowband signal into a wideband signal in which information was spread across many frequencies. This _spreading_ of the signal gave jamming resistance to the communications link. Bandwidth was sacrificed to achieve anti-jamming.
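A minimal sketch of the idea: both parties derive the same "random" hop pattern from a shared secret. The channel list and seed below are invented for illustration.

```python
import random

# Frequency hopping: transmitter and receiver derive the same pseudorandom
# hop sequence from a shared secret seed, so they stay in lockstep while a
# jammer sees apparently random channel changes.

CHANNELS = [2.402, 2.426, 2.450, 2.474, 2.480]  # GHz, illustrative set

def hop_sequence(shared_seed, length):
    prng = random.Random(shared_seed)  # same seed -> same sequence
    return [prng.choice(CHANNELS) for _ in range(length)]

tx = hop_sequence("shared-secret", 6)
rx = hop_sequence("shared-secret", 6)
assert tx == rx  # both parties hop identically
print("hop pattern (GHz):", tx)
# A jammer confined to one channel hits, on average, only 1/5 of the bursts.
```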

Frequency hopping came to be called CDMA because the sequence in which frequencies were hopped was like a code. Each user was dynamically assigned a code and one user's code did not interfere with another's code. Another way to implement something similar was to spread the original signal to a wideband. This made the signal look like noise because its power was now distributed over a wider bandwidth. This was the method of _direct spreading_ for implementing CDMA. The reason why spreading works is because one user's spreading sequence has low correlation to another's sequence. This means that multiple users can share the same frequency at the same time. They are separated only by the code, the manner in which their original signals are spread.
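Direct spreading can be shown in miniature with two orthogonal codes. The codes and data bits below are illustrative:

```python
import numpy as np

# Direct-sequence CDMA in miniature: two users spread their bits with
# orthogonal +/-1 codes, transmit simultaneously on the same frequency,
# and each receiver recovers its own bits by correlating with its code.

code_a = np.array([+1, +1, -1, -1])   # orthogonal Walsh-style codes:
code_b = np.array([+1, -1, +1, -1])   # their dot product is zero

bits_a = np.array([+1, -1, +1])       # user A's data (bits as +/-1)
bits_b = np.array([-1, -1, +1])       # user B's data

# Spread: each bit becomes a run of 4 "chips"; both signals share the air.
channel = np.concatenate([b * code_a for b in bits_a]) \
        + np.concatenate([b * code_b for b in bits_b])

# Despread: correlate each 4-chip block with the wanted user's code. The
# other user's contribution correlates to zero and drops out.
recovered_a = [int(np.sign(channel[i:i+4] @ code_a)) for i in range(0, 12, 4)]
recovered_b = [int(np.sign(channel[i:i+4] @ code_b)) for i in range(0, 12, 4)]
print(recovered_a, recovered_b)  # matches bits_a and bits_b
```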

Methods of Multiple Access

(a) With FDMA, multiple users use different frequency channels and use them for the entire duration of their allocation. (b) With TDMA, multiple users use the same frequency but take turns to use the resource. They are therefore separated in time. (c) With CDMA, multiple users are separated only by code, a special mathematical construct that ensures that one user does not interfere with another.

Lamarr's idea came too early and was not put to use during the war. A decade later, the principle was applied by the military and the police. Then in the eighties, the idea of using CDMA for 2G cellular was promoted by those who believed in its superiority over TDMA. Indeed, TDMA technology as adopted by GSM wasn't exactly the best technology to use. Yet, there was enough experience in the industry to take TDMA forward. Many incremental improvements were introduced to steadily improve its performance and increase capacity. On the other hand, there was limited experience among engineers with CDMA beyond its use in the military. While the technology is the same, commercial aspects are different. Military systems are rarely cost conscious. Their demand for capacity is usually secondary to their need for reliability. In public commercial systems, there is a need for balance among various factors—cost, capacity, efficiency, and reliability. Success of any system in the military is no guarantee of success in commercial application. Such an application may well succeed, but the transition requires both technical and market research. William Erdman of InterDigital Communications Corporation expressed the outcome in the early 1990s,

Today, cellular "boasts" four separate technology standards (two analog and two digital) and no direction from the FCC. As a result, it appears that the cellular industry will implement two digital standards that are incompatible with each other.

The two digital standards that emerged in the US were IS-54 (TDMA) and IS-95 (CDMA). Because IS-54 was compatible with AMPS, it was popularly referred to as Digital AMPS (D-AMPS). The old analogue AMPS not only continued to serve existing customers but also gave rise to _Narrowband AMPS (NAMPS)_, a new analogue standard in a world becoming increasingly digital. IS-54 in turn inspired a similar standard in Japan by the name of _Pacific Digital Cellular (PDC)_. For years after its introduction in the US, CDMA as a technology was adopted by only a handful of countries. Though many countries subsequently adopted CDMA as an alternative to GSM, its penetration is still below 20% of the global market; worldwide, GSM subscribers are some 3.2 billion strong. When GSM entered the US market in the 1900 MHz spectrum, this brought further competition but also excessive fragmentation of the market.

Besides being a cellular standard, CDMA also found its niche applications. In India for example, CDMA was proposed as a means for implementing _Wireless Local Loops (WLL)_. The problem in developing countries is to reach rural communities for which the laying of copper wires is expensive and almost profitless. This is usually known as the _last mile problem_. For decades, even as the rest of the telephone network progressed, the final connectivity to the subscriber remained a bottleneck. WLLs solve this problem by going wireless. For this reason, Reliance Infocom and Tata Teleservices Ltd were given licenses to operate WLLs for local or rural coverage. Unknown to authorities, or perhaps in tacit cooperation, they used the WLL model to deploy a full-fledged cellular network. This was how CDMA came to be widely deployed in India, bringing much-needed connectivity to rural communities.

The key differentiating factor between WLL and cellular is mobility. The service model of WLL says that mobility ought to be limited to the area in which telephone connectivity is desired. It was simply a replacement of plain old copper wires with electromagnetic waves. Cellular networks operated on a different service perspective. In such a network, a subscriber ought to be able to access the network anytime, anywhere. Even when subscribers travel out of their local areas, they should be able to make and receive calls. Cellular networks offer seamless mobility within the boundaries of the network. When these boundaries are crossed, operators generally make agreements with other network providers so that connectivity can be extended globally. These are the _roaming agreements_, which become useful when subscribers roam out of their home networks to foreign networks. Within the network, connectivity to a subscriber is achieved by two essential principles—_paging_ and _handover_.

When a subscriber gets an incoming call, the network must first locate the subscriber. For this reason, the subscriber's mobile regularly informs the network of its location. With an incoming call, the network pages the mobile within a location area comprising a few cells. Paging goes out on all the cells of the mobile's last location area. When the mobile responds, signalling proceeds to complete the call by establishing and reserving radio resources. Tracking the mobile's location and paging it efficiently is one of the great challenges of cellular networks. If the mobile updates its location too often, battery power is drained. If it does so too infrequently, it cannot be paged correctly and becomes unreachable for incoming calls. This is why, when a subscriber is on the move in urban areas, incoming calls sometimes fail because the mobile has not yet informed the network of its new location.
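A toy model makes this trade-off visible. The Python sketch below is purely illustrative, with invented location areas, cell names, and mobile identities: the network pages only the cells of the last reported location area, so a mobile that has moved on without updating is missed.

```python
# Minimal sketch of location updates and paging; all names are hypothetical.
location_areas = {"LA1": ["cell1", "cell2"], "LA2": ["cell3", "cell4"]}
last_known = {}                              # mobile id -> last reported location area

def location_update(mobile, la):
    """The mobile reports its current location area (each report costs battery)."""
    last_known[mobile] = la

def page(mobile, actual_cell):
    """The network broadcasts a page on every cell of the last-known area."""
    la = last_known.get(mobile)
    if la is None:
        return False                          # never registered: unreachable
    return actual_cell in location_areas[la]  # the mobile answers only if it is there

location_update("alice", "LA1")
print(page("alice", "cell2"))                # True: the call can proceed
print(page("alice", "cell3"))                # False: moved without updating, call missed
```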

The other key challenge is to maintain an active call when the subscriber is on the move. In fixed networks, changing line characteristics were taken care of with automatic adaptive equalization techniques. With wireless, the signal path changes more frequently, particularly when the subscriber is moving. As the subscriber moves away from her serving base station (a network entity that connects the subscriber with the network), SNR drops, interference may increase, and call quality suffers. As a result of this dynamism, both the network and the mobile regularly take measurements of signal quality. When it is found that a subscriber is closer to another base station, which may offer a better service, the active call is handed over from the currently serving base station to the new base station. Without handover, calls would be dropped and subscribers would be forced to redial. The only problem is that this handover may take up to a second. The subscriber may perceive this as a temporary break of the call.
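One common decision rule compares the measured signal strengths with a hysteresis margin, as in the Python sketch below; the dBm values and the 3 dB margin are illustrative assumptions, not figures from any particular standard. The margin stops the call from ping-ponging between two cells of nearly equal strength.

```python
def should_hand_over(serving_dbm, neighbour_dbm, hysteresis_db=3.0):
    """Hand over only when the neighbour is convincingly stronger than
    the serving cell; the hysteresis margin prevents ping-ponging."""
    return neighbour_dbm > serving_dbm + hysteresis_db

print(should_hand_over(-95.0, -93.0))   # False: neighbour not better enough yet
print(should_hand_over(-95.0, -90.0))   # True: neighbour clearly stronger, hand over
```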

One of the factors that motivated the selection of Wideband CDMA (WCDMA) as the technology of choice for _Third Generation (3G)_ cellular systems was the need for increased capacity. It was wideband because its signal bandwidth was 5 MHz rather than the 1.25 MHz of the American IS-95. WCDMA also brought an advantage in terms of handover. In GSM, the connection was first broken with the current base station before it was established with a new one. This was "break-before-make" or _hard handover_. WCDMA offered an improvement whereby it was possible to maintain connections with two base stations during the transition before breaking off with the old base station. This was "make-before-break" or _soft handover_. The advantage to the subscriber is that she no longer perceives a break of service during handover.

Standards bodies had attempted earlier to make 2G a universal standard the world over. A single standard brings advantages of economies of scale for research laboratories, manufacturers, operators, and subscribers. Technology matures faster. Subscribers can roam across continents carrying the same phone. Unfortunately, this goal never came to pass. When it came to designing a universal 3G system, the goal was once more pursued with dedication. Even the name of such a system reflected this: _Universal Mobile Telecommunications System (UMTS)_. But history has a way of repeating itself.

In the US, IS-95, more commonly known by its commercial name cdmaOne, evolved to include data at rates comparable to GPRS and EDGE. This led the way to CDMA2000 with variants that could support only data (1xEV-DO) or a combination of data and voice (1xEV-DV). These were 3G standards that evolved separately from the GSM family, although the term 3G is often reserved for WCDMA. Outside the US, CDMA2000 attained notable commercial success in South Korea and Japan. Worldwide success of CDMA technology came only with WCDMA but in the US, it found itself competing against CDMA2000. Verizon Wireless, for example, uses CDMA2000 while AT&T Wireless uses WCDMA. China, keen to assert its influence on the world stage, approved its own version of CDMA called _Time Division Synchronous CDMA (TD-SCDMA)_. One key difference from other 3G systems was that the Chinese standard used _Time Division Duplexing (TDD)_ rather than FDD. In other words, uplink and downlink channels were separated in time rather than in frequency. This translated to extra work for everyone from design to deployment but no one could really ignore a market of more than a billion people.

The fragmented nature of the US market led to one more difference in mobile phone architecture. In GSM, the phone equipment was separate from the user subscription. The latter was realized as a smart chip called the _Subscriber Identity Module (SIM)_. SIM cards represented the service agreement between an operator and the subscriber. This gave subscribers the freedom to choose any mobile phone from the open market and use it with their SIM. For years, this flexibility was not available in the US market where most operators tied subscriptions to phones. Roaming to foreign networks was impossible because CDMA2000 phones would not work on GSM networks. It was only in the mid-2000s that operators started implementing the SIM equivalent for CDMA2000 networks. They called it the _Removable User Identity Module (R-UIM)_. True to its name, it became possible to remove the R-UIM from one CDMA2000 phone and insert it into another. Better still, it was possible to insert it into a GSM phone and connect to GSM networks.

As the industry gears up for the _Fourth Generation (4G)_ of cellular systems, known within telecom circles as _Long-Term Evolution (LTE)_, there is yet another grand attempt to agree on a single worldwide standard. Perhaps this time the industry will succeed in bringing about a single universal standard. An unavoidable diversity will be in the spectrum allocations, which means that only multiband phones will be able to roam seamlessly across countries and continents. Yet another problem is that LTE comes in two variants—Frequency Division LTE (FD-LTE) and Time Division LTE (TD-LTE). While most of the world is aligning itself to FD-LTE, major markets including India and China have chosen the TD-LTE route. So Americans travelling to China will not be able to connect to TD-LTE networks with their FD-LTE phones. They can, however, connect using a TD-LTE phone enabled by their American subscriptions.

The new 4G standard is striving to give users data rates of up to 100 Mbps or more. Since the arrival of the Web, data has grown tremendously. Video, image, and music sharing became possible as ADSL and fibre technology lent support to handle higher data rates. 4G will enable the same in the mobile space. Users are already downloading short video clips and MP3 files over wireless and storing them on their iPads and smartphones. This will evolve to live HDTV streaming on mobile devices. Given the prominence of data, 4G is framing itself as an all-IP architecture. This means that even voice may eventually be carried as VoIP packets using packet switching. Traditional voice may be limited to legacy systems until they decide to upgrade. Overall, 4G aims to migrate Internet users from wired broadband connections to the cellular world. The real issue will be the affordability of these services, the cost of download per byte. While 4G will bring higher capacity, that capacity will nonetheless be finite. Subscription costs will get only as cheap as the available capacity allows. Cost structures will shape themselves to find the optimal match between demand and supply.

The underlying technology that drives 4G is _Orthogonal Frequency Division Multiplexing (OFDM)_. This is a multicarrier technology from which ADSL's own DMT modulation is derived. With OFDM, information is distributed across many subcarriers. This transforms a wideband signal into many narrowband signals, each of which suffers less channel distortion. In fact, a wireless channel suffers from a peculiar loss called _fading_. When mobile devices move around, the signal sometimes drops in quality, or fades. To maintain signal quality and SNR, power control is applied frequently. With wideband signals, the situation is far worse. Different parts of the spectrum are faded by different amounts. This is termed _frequency selective fading_. OFDM transforms this into _frequency flat fading_ within each subcarrier. The end result is that signal distortion and ISI are both reduced.

ISI is an important problem for all channels, particularly for the wireless channel. Wireless channels suffer from multipath propagation, which means that the receiver gets multiple copies of the signal. The signal may arrive directly. Many delayed versions of the signal may arrive after numerous reflections and scattering effects along the way. This is called the _delay spread_ of the signal. It directly leads to ISI since one symbol overlaps in time with another nearby. ISI puts a limit on bit rate unless novel techniques are invented to combat its effects. Equalization is one technique. Another is to treat these multiple versions as a form of redundancy and combine them in intelligent fashion. Redundancy in the wireless channel is usually referred to as _diversity_. Thus, instead of a single receiver antenna, multiple antennas can lead to better system performance. Multiple streams of data are sent independently using multiple antennas at both transmitter and receiver. This gives rise to _Multiple Input Multiple Output (MIMO)_ systems that lead to higher bit rates, lower BER, and improved system performance. A combination of OFDM and MIMO allows 4G to attain bit rates that only a decade ago might have seemed impossible. The real beauty of OFDM, however, lies in its mathematics.
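The gain from diversity can be demonstrated in a few lines. The numpy experiment below is a simplified sketch under stated assumptions: BPSK symbols, independent noise on each of two receive antennas, and plain equal-gain combining rather than full MIMO processing. Even so, it shows how combining two noisy copies lowers the bit error rate.

```python
import numpy as np

rng = np.random.default_rng(1)
symbols = rng.choice([-1.0, +1.0], size=10_000)           # BPSK symbols

# The same symbols arrive at two antennas with independent noise.
rx1 = symbols + rng.normal(0.0, 1.0, symbols.size)
rx2 = symbols + rng.normal(0.0, 1.0, symbols.size)

ber_single = np.mean(np.sign(rx1) != symbols)             # one antenna alone
ber_combined = np.mean(np.sign(rx1 + rx2) != symbols)     # equal-gain combining
print(ber_single, ber_combined)    # combining gives a visibly lower error rate
```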

OFDM can be seen as an evolution of FDM technology in which multiple frequency carriers are used to multiplex many voice calls. Each carrier has its own defined bandwidth. OFDM improves on this by overlapping the sidebands of one carrier with those of its adjacent carriers. This saves valuable bandwidth. To ensure that carriers don't interfere with one another, they are designed to be orthogonal to one another. One way to perform modulation in this system is to feed data into each of these carriers, modulate all of them separately, and then combine them before final amplification and transmission. Ingeniously, the whole process is simplified by a single operation that relies on the Fourier transform. Since processing is done digitally, a discrete version called the _Discrete Fourier Transform (DFT)_ is used. DFT was made practical for real-time implementations by reducing it to the _Fast Fourier Transform (FFT)_. An OFDM receiver applies the FFT while the transmitter performs the inverse operation. FFT relies on properties of symmetry in the computations and the reuse of many intermediate results. By doing so, complexity of the order of _N_² is reduced to _N_·log _N_. FFT was initially thought to be a 1965 invention of James Cooley of IBM and John Tukey. Later it was discovered that the famous mathematician Carl Gauss had used it back in the nineteenth century. Remarkably, Gauss had done this before Fourier himself had published his series expansions.
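The trick can be shown in miniature. The following Python sketch is a bare-bones illustration under idealized assumptions: 64 subcarriers, QPSK symbols, and a perfect channel with no noise, distortion, or cyclic prefix. A single inverse FFT modulates all subcarriers at once, and a single FFT at the receiver recovers them.

```python
import numpy as np

N = 64                                           # number of subcarriers (assumed)
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(N, 2))
qpsk = (2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)  # one QPSK symbol per subcarrier

time_signal = np.fft.ifft(qpsk)      # transmitter: one IFFT instead of N separate mixers
received = np.fft.fft(time_signal)   # receiver: one FFT recovers every subcarrier at once

print(np.allclose(received, qpsk))   # True over this ideal, distortion-free channel
```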

As a complement to the growth of cellular communications, _Wireless LAN (WLAN)_ standards evolved via the Ethernet route. While Ethernet employed a technique to detect packet collisions, such a detection technique did not work on the wireless channel. WLAN therefore employed a pre-emptive technique to avoid rather than detect collisions. Fundamentally, WLAN evolved from the data world while cellular systems evolved in a voice world, which later catered to the demands of data. WLAN was from the beginning about data terminals and computers. Here lies one of the key and subtle differences between the WLAN world and the cellular world.

While the terms wireless and mobile have often been used interchangeably, they are not the same. Wireless aligns itself with the need for connectivity while the term mobile has a closer affiliation with portability. Wireless enables mobility but not all wireless devices are necessarily mobile devices. A Wi-Fi access point and a Bluetooth-enabled printer are clearly not mobile. The focus of WLAN has been connectivity. Mobility has been given secondary consideration in the design of WLAN standards. It is true that Wi-Fi-enabled laptops can be moved around but their movements are limited to low speeds within small areas of operation—cafes, airports, conference venues, and so on. In many other cases, such as wireless surveillance systems, IP cameras mounted at fixed locations connect to a central controller using Wi-Fi. WLL too is more about connectivity than mobility.

On the other hand, cellular mobile phones can be used in high-speed trains or in rural areas where the serving base station may be perhaps thirty miles away. This explains why cellular systems gave plenty of design consideration to system-wide service availability via paging and handover. Mobile devices are designed by keeping in mind portability and battery technology. No one wants to lug around a suitcase or charge the phone every couple of hours. Wireless is a necessary part of being a mobile communication device but mobility comes with its own set of goals. WLAN standards catered only for local wireless access and left higher layer protocols to other systems that interfaced with them. Today we are seeing one such integration whereby cellular mobile phones coming into the range of Wi-Fi hotspots can be handed over to the latter for better service. Thus, the mobility characteristics of cellular systems are being complemented by better connectivity and higher capacity of WLAN systems.

Starting from wireless telegraphy through radio broadcasting and satellites, the waves of Maxwell and Hertz have indeed come a long way. Today we are surrounded by wireless systems both at home and at the office, in both private and public systems. Sensors monitor lighting, humidity, and temperature in intelligent buildings. They relay these measurements wirelessly to a controller that takes decisions to conserve energy. Cordless phones based on standards similar to cellular systems allow a home subscriber to talk from the garden. Contactless tickets and access cards are possible because of wireless. Readers can now borrow library books without going through the old-fashioned stamping process. RFID chips embedded in these books enable borrowing and returning with the use of wireless readers. The same technology is used to track the movement of goods and manage inventory without even unpacking the boxes.

Tangled wires at the back of a computer or a home entertainment system may soon be a thing of the past. Already the computer keyboard and the mouse have become wireless devices that use either Wi-Fi or Bluetooth. Wi-Fi has enabled wireless interconnectivity in offices and homes, so that computers can send a document to the printer over the air. Our mastery of the 2.4 GHz and 5 GHz spectrum has given us technology to support such applications. Going forward, as broadband multimedia becomes pervasive, this is certainly not going to be enough. Wi-Fi standards are therefore quickly making the move towards the 60 GHz spectrum where massive bandwidth is as yet untapped. The spectrum will most likely be effective only over short distances but when the technology is ready, it will change the way we interconnect our devices. HDTV Internet channels can then be relayed from a Wi-Fi-enabled ADSL modem directly to an LED flat screen at the other end of the room. Video games that place gamers in a virtual world will become more real and interactive, without being constrained by wires. In real classrooms, students may have individual display screens but the content would be relayed over the air from a central server. The wireless channel therefore has the potential to replace the traditional computer bus architecture of data exchange.

The future of wireless is simply a network of little devices, sensors, and diverse systems talking to one another, interacting, and taking decisions autonomously without humans ever being involved in the process. Experts call this the _Internet of Things_. Indeed, the future looks promising and threatening at the same time. A world without wires may look neat and tidy but just as Marconi's "secret boxes" were once met with wonder and suspicion, the wireless world of the future will have its own secrets that we may find hard to accept.

# 1100 From Carbon to Silicon

**Something momentous happened** in the Netherlands on December 11, 2006. Analogue TV broadcasting that had been in place for decades was suddenly turned off with the flick of a switch. Over-the-air broadcasting from that day forward was to be in digital form. A barrier was crossed that day. A legacy system had been ousted by a digital upstart. The old guard who had faithfully served and entertained millions had been almost silently retired. There was no standing ovation or medal ceremony. Everything the elderly and experienced analogue knew had already been learnt and bettered by digital. The two had worked side by side for years, ever since digital TV started entering commercial service in the mid-1990s. It had taken more than a decade to begin the transition to an all-digital network.

What happened in the Netherlands was only the start of a wave of TV digitization across the globe. In the UK, the first digital TV signals had been broadcast in 1998 but the transition from analogue to digital did not begin until 2007. Five years later, the last of the analogue signals ceased to exist in the terrestrial airwaves. As for satellite broadcasting, it had already been digitized earlier, in 2001. Even cable TV systems in the UK were digitized in 2012. While the UK had adopted a phased regional approach to the transition, analogue transmissions in the US suddenly went off the air on June 12, 2009. In Australia, the digital switchover started in 2012 and is planned to reach completion in 2013. The biggest cities in India achieved the switchover in 2012 while the rest of the country will slowly move towards digital TV by 2015. As early as 2007, researchers reported that 94% of memory storage was digital and that 99.9% of telecommunication was digital, when in 1986 the latter figure was hardly 20%.

The digital switchover that's happening right now is an historic event. It puts to rest an old technology from which the last drop of juice had been squeezed out long before. It is an acknowledgement of the victory of digital over analogue. Digital brought quality by virtue of the transmission and processing of bits. Digital compression enabled many more TV channels to be squeezed into the same bandwidth previously occupied by a lower quality analogue transmission. Digital technology enabled easy integration of programme information and user interactivity with the content. Encryption of digital content meant that information could not be tampered with easily. The switchover itself was no trivial task. It involved installation of new equipment, upgrade of devices at customer sites, education of subscribers to make the switchover, and planning of frequency allocation. For many years, the switchover was delayed because consumers waited for digital equipment to get cheaper while manufacturers in turn waited for demand to go up. Content providers waited for consumers to make the switch before committing themselves to digitization of content. Every new technology faces this start-up dilemma and only those that demonstrate clear advantages survive.

A similar transition is happening in the world of radio broadcasting, an even older technology than TV broadcasting. Traditional radio broadcasting using AM and FM is being digitized. Under the umbrella of _Digital Radio Mondiale (DRM)_, digital techniques will allow quality radio listening of not just local channels but also international channels broadcast from stations perhaps a thousand miles away. What DRM brings is the nostalgic beauty of AM combined with the quality of digital. Digital radio broadcasting standardized earlier as DAB already gives listeners the benefits of easy tuning, programme schedules, and textual information. In competition with DRM is Internet radio, whereby music is digitized and streamed to listeners worldwide. Internet transcends geographical boundaries. It does not even enforce upon the listener a strict schedule of programmes. Listeners can often access past programmes, download them to their iPods, and listen at leisure to these _podcasts_. DRM, however, is radio in its original form. Even without Internet connectivity, listeners can pick up content from the air.

This tussle between traditional broadcasting and Internet has been a good thing for the consumer, who now has more than one way to access content. After going digital, traditional broadcasting has brought with it greater interactivity, information, and quality. Internet services are offering a mix of real-time broadcast channels and on-demand pay-per-view channels to appeal to almost every user preference and budget. In other words, the same content is now delivered via different technologies. With each passing year, the divide between traditional broadcasting and Internet is blurring. Any difference that remains lingers only in the final delivery to the user. Everywhere else within the network, content is being reduced to bits, packed into IP packets, and carried over optic fibres. One calls this _convergence_ and it's already happening.

Even within consumer electronics, convergence is being experienced vividly. TV programmes are now accessible on the Web. While in the past much of this content had been in low resolutions and sizes, today many web channels are being offered with full HDTV capability. If traditional TV broadcasters are reluctant to make all of their content available on the Web, licensing is one of the reasons. An Internet user need not pay a TV license fee while in the old world, users had to pay a license fee to own and use a TV set. Licensing is one of the things that prevent wide-area Internet broadcasting. For instance, BBC's iPlayer streams video across the UK. Though there is no technical barrier preventing users outside the UK from accessing this content, license regulations prevent it.

Convergence is also happening in the other direction. Delivery of video-on-demand over TV had been tried and abandoned in the nineties. Now that ADSL and fibre broadband connections are more common, it is making a comeback. With this technology, a packet-switched network is used to deliver video content. Going by the name of IPTV, usually offered via attractive triple-play packages, it has become a commercial success in France, where more than a quarter of the population are active subscribers. _BT Vision_ in the UK and _mio TV_ in Singapore are examples of IPTV offerings. With IPTV, viewers can preview, pause, subscribe, or view programmes later without being constrained to a fixed schedule. They can participate in news polls or interactively view sports statistics at their own demand. Given the growing adoption of flat-screen LED displays fitted with Wi-Fi and Ethernet ports, the Internet is now accessible on the same displays whose original role had been limited to viewing satellite TV or digital cable TV. This brings another level of sophistication to user interactivity. Broadcasting on its own makes subscribers passive consumers. Combined with the Internet, it turns users into active critics and commentators. It is now common to see TV programmes include a running tickertape of Twitter comments, particularly on news channels. In a digital world, broadcast content is seamlessly merged with interactive content.

One burning question that has long troubled designers and caretakers of the Internet has been the issue of _Quality of Service (QoS)_. This had not been a problem in the eighties when content was primarily textual and non-real-time. With the coming of the Web, delivery of audio and video needed certain guarantees of delay, jitter, bandwidth, and packet loss. Indeed, services such as video conferencing, live sports broadcasting, remote video surveillance, and VoIP depend on such performance metrics to be effective. There was no point receiving a live video frame after a delay of five seconds or losing a few packets on an important goal in a soccer match. ATM had brought QoS guarantees in its own way but the Internet being primarily based on IP technology needed a solution of its own. Of Internet QoS, it has been said,

First, trying to introduce QoS in IP routed and connectionless networks is indeed a Utopian idea. This is like trying to introduce fine cuisine, gourmet dishes, and a la carte menus in fast food restaurants.... Second, IP was designed without any QoS perspectives and has grown very big.... it is not a trivial task to implement the same QoS mechanisms in every single Internet device around the world. It would be as if, on top of quality, all fast food restaurants around the world must provide identical menus.

Internet has always been a best-effort service. If bandwidth is available, delays are minimal. If enough buffers are available on routers, packets don't get dropped. At the same time, when the network is congested, no one is denied service. Internet QoS was therefore viewed with indifference and even suspicion. Internet is almost the perfect example of a democracy. All users are treated equally. Politician-bits are not prioritized over layman-bits. Rich-bits don't have preference over poor-bits. Middle-aged-men-bits are not ignored at the expense of young-pretty-girl-bits. Asian-bits get the same treatment as European-bits. When it was therefore proposed that ISPs could introduce a two-tiered cost structure, it suggested that those with money could pay to prioritize their personal bits over those of the less fortunate. The debate about Internet QoS was thus fraught with controversy from the start, not so much in relation to technology as to the free-spirit ideology that created the Internet itself.

There is another angle to the debate, approached from the application perspective rather than the user perspective. All users in a democracy are treated fairly but just as there are emergency services in hospitals, some bits require immediate attention over others. If hospitals delay emergency operations, lives are lost. If Internet routers don't apply similar preferential treatment to real-time bits, packets become useless and are eventually discarded. While it is true that all media are reduced to bits, it turns out that not all bits are equal. Real-time bits are given higher priority. Video bits are given more bandwidth than voice bits. E-commerce bits are given greater security than chatroom bits. Bits from deep space probes are given more error protection than LAN bits.

IP, working in cooperation with other protocol layers, made it possible to treat bits differently as the need arose. Packets could be associated with specified priorities. To make things manageable, packets were identified generally as belonging to voice, video, best effort, or background. This differential treatment was due to the requirements of applications, not users. But the Internet, being open and distributed by architecture, could not force operators to implement these techniques on every router and gateway. This meant that QoS guarantees were available on some paths but end-to-end QoS remained unrealizable. A user might pay more to her ISP for faster access to the Internet but she does not buy hard end-to-end QoS guarantees. At best, ISPs could give soft guarantees as part of service objectives. In general, when the network is congested or experiences large-scale failures, QoS guarantees break down.
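The classification idea can be sketched in a few lines of Python. The strict-priority scheduler below is a toy illustration, with invented class names and packet labels: whenever the router picks the next packet to send, voice goes before video, video before best effort, and best effort before background.

```python
import heapq

# Lower number = higher priority, mirroring the four classes in the text.
PRIORITY = {"voice": 0, "video": 1, "best-effort": 2, "background": 3}

queue, seq = [], 0

def enqueue(traffic_class, packet):
    global seq
    # The sequence number preserves arrival order within a class.
    heapq.heappush(queue, (PRIORITY[traffic_class], seq, packet))
    seq += 1

for cls, pkt in [("background", "b1"), ("voice", "v1"),
                 ("best-effort", "e1"), ("voice", "v2")]:
    enqueue(cls, pkt)

while queue:                          # the scheduler always serves voice first
    _, _, packet = heapq.heappop(queue)
    print(packet)                     # prints v1, v2, e1, b1
```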

Key Components in Implementing QoS

The very nature of administering QoS is complex and incurs overheads. In many cases, ISPs have adopted the simpler approach of adding excess capacity to the network. The very fact that most of the time we are able to stream video smoothly is due to this excess capacity. Even if technologies that aspired to achieve end-to-end QoS have failed to do so, they remain important to ISPs for measuring current performance levels and characterizing traffic. One calls this _traffic engineering_, which has enabled ISPs to optimize network performance and upgrade capacity where required. Internet QoS in the strict sense is still a dream but this has not prevented the proliferation of high-bandwidth data. There may be no QoS guarantees but thousands of Internet users make interactive video calls on Skype that connect people across the globe. The video may freeze at times and speech packets may suffer loss. The ease of making long-distance calls at almost no cost means that users are willing to put up with a little inconvenience. Advances in technology continue to drive capacity upwards. The real question is whether this is happening fast enough to cater to the increasing demand.

There are two well-known laws from the world of computing that are at the heart of this demand-supply debate. It was back in 1965 that Gordon Moore, one of the founders of Fairchild Semiconductor and later of Intel, wrote,

The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.

This later became known within industry circles as _Moore's Law_. It was not so much a law of nature as an educated guess. The idea was that by packing more transistors into a single piece of silicon, cost per transistor dropped. Computing power and memories would therefore become faster and cheaper in the process. Time and technology have shown that Moore was uncannily right. The longer-term uncertainty was resolved by Moore himself when he gave semiconductor technology about two years to double the packing density of components on silicon. Remarkably, the law has continued to hold good for almost forty years, though with a more optimistic doubling period of eighteen months. The original transistor of 1947 was large enough to be assembled by hand. Today 100 million of Intel's 22-nanometre transistors can be packed on a pinhead. Some even go on to claim that the law provided researchers a baseline to aim for in their continuing quest for better technology. The law was therefore a self-fulfilling prophecy.
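The arithmetic of compound doubling is easy to check. The Python snippet below starts from the 2,300 transistors of Intel's 4004 of 1971 and assumes a clean two-year doubling period; real chips deviate from this idealization, but it shows how forty years of doubling multiplies the count about a million-fold.

```python
# Transistor count under an idealized Moore's Law, starting from the
# 2,300-transistor Intel 4004 of 1971 and doubling every two years.
def transistors(years, start=2300, doubling_years=2.0):
    return start * 2 ** (years / doubling_years)

for years in (10, 20, 30, 40):
    print(years, f"{transistors(years):,.0f}")
# Forty years of two-year doublings is 2**20: about a million-fold growth.
```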

Technological growth as predicted by Moore's Law was why personal computers could become as powerful as early mainframes. This was why a mobile phone today is a computer in a pocket. They do much more than simply make calls. Digital cameras integrated with these mobile phones take pictures, while phone software does the JPEG compression. Users can manipulate these images using sophisticated image-processing algorithms. Hundreds of MP3 songs can be stored because mobiles are now blessed with memories exceeding gigabyte capacity. With mobiles, users can connect to the Web and manage their online calendars, access emails, post Twitter comments, or upload their latest photographs. This is one more aspect of convergence from the perspective of a mobile phone. A mobile phone will never be a professional photographer's choice or a programmer's choice as a platform for software development. It is precisely because it is not specialized that it appeals to a broad audience. In the mobile phone, we see the convergence of many common tasks that are deeply integrated into modern day digital culture. A single mobile phone is enough to replace a wired phone, a voice recorder, a portable CD player, a digital camera, a digital PDA, a wristwatch, a transistor radio, and an alarm clock.

As IC technology progressed, so did the possibilities of using it in new ways. Text-based communications and user interfaces gave way to graphical interfaces, rich images, and high-definition videos. Content was no longer in kilobytes. Megabytes and gigabytes of data became commonplace. Software became increasingly sophisticated, packing many rich features. Long before these trends appeared on the scene, Cyril Northcote Parkinson wrote in a humorous article of 1955 in _The Economist_ that "work expands so as to fill the time available for its completion." In the world of computers, this has a corollary: data expands to fill available memory. In time, this became known as _Parkinson's Law_. While supply followed Moore's Law, there would always be enough demand to exhaust that supply. Supply would only temporarily be in excess and technology was therefore forced to innovate continuously. Just as computers got faster, programs and processing requirements got more demanding. Just as memories became cheaper per byte, programs got less careful about their memory footprints. The complexity of programming far outweighed the need to save on cheap memory. In the old days, text documents didn't occupy much memory. Images and videos, particularly when uncompressed, tend to fill up memory more quickly. Parkinson's Law essentially made the point that no matter how much memory was available, it never seemed enough either for users or for applications. It was like saying that no matter how much wealth a billionaire possessed, it never seemed enough.

While Moore's Law opened up new possibilities, Parkinson's Law was quick to seize those possibilities. Even in the world of communications, their relevance was seen. The earliest example of remote computing is due to George Stibitz in 1940. The communication requirements for Stibitz's calculator were minimal, non-interactive, and non-real-time. The system operated in what one might call a "command and control" mode. The operator remotely gave commands to the calculator, which executed them without any urgency. When the results became available, they were sent to the operator. Later mainframes that did batch processing evolved on this basis. The focus really was on computing, not communications. Early computers did not communicate or interact. They simply obeyed. They started communicating only when sufficient bandwidth became available.

Moore's Law, while relevant to computing, is less so for communication bandwidths. Network capacity grows much more slowly than computing power. Part of the reason is technology but the greater part is economics. Even when technology is available, it takes years and decades to retire old systems. Replacing the old copper wires that reach subscribers with optic fibres is a massive expense, which can be justified only when either a critical mass is reached or the user is willing to pay for a premium service. When capacity does increase, Parkinson's Law becomes relevant, albeit with different time scales. New services quickly tend to fill up the excess bandwidth. When moderate bandwidths became available, interactive computing emerged in the form of time-shared computers. Later when telephone lines were leveraged to carry data, computers started to communicate via data modems. When more efficient modems arrived, computers started sending compressed images and videos.

If there is a perfect match between demand and supply, everyone is happy. Perhaps then there is no incentive to innovate and technology stands still. In reality, there is always a mismatch. The world always looks forward to something better. For most scientists and engineers, demand is not necessarily the trigger for innovation. The joyous process of discovery and innovation is what drives researchers. Their work creates new forms of supply, which then drives demand. Demand and supply, like varying electric and magnetic fields, have a symbiotic relationship that drives technology forward. This is apparent all through the history of both computing and communications, which have a synergy of their own.

Through minicomputers, workstations, and PCs, computing was brought closer to the user. Computing was decentralized from mainframes. Real power came to this shift when multiple computers were networked together to share resources. Back in 1950, Herb Grosch had established a rule that the gains from a large computer were far more than those that could be obtained from many smaller computers. The justification for this rule was that every computer required a certain amount of basic circuitry and interfaces to peripherals. Multiple computers duplicated this functionality. Their individual processing capabilities could not be easily coordinated. They did not have the advantage of economies of scale during operation. This became known as _Grosch's Law_. With the coming of Ethernet and the Internet, it became possible to interconnect computers, endow some of them with specialized functions, and thereby coordinate their actions. In some applications, the peer-to-peer model worked. In others, the client-server model worked. Either way, the power of networking became established. Grosch's Law was toppled. One of the classic applications of this is the Search for Extraterrestrial Intelligence (SETI), initiated at the University of California at Berkeley.

The UNIX world has always been strongly linked to networking. It is therefore no coincidence that Berkeley, which had given the initial commercial impetus to UNIX, came up with the idea of using computers worldwide to process enormous amounts of data collected from radio telescopes. The idea was to harness idle computing power and put it to the task of data processing. Instead of a single supercomputer, thousands of workstations and PCs would do an even better job. All they needed was some coordination. By the time the SETI@home initiative was launched in 1999, home computers were powerful enough for the task. Best of all, users who chose to contribute to this project did not need to sacrifice control of their own computers. The SETI client software ran in the background without disrupting the user's main tasks.

The processing power of PCs was what triggered the birth of Rich Internet Applications (RIA), whereby user interactivity could be effectively combined with multimedia content on the client side. Technologies that enabled such applications include Adobe's Flash, Microsoft's Silverlight, and AJAX. These were added to a browser's functionality as customized extensions or _plug-ins_. More recently, HTML5 has emerged as a new standard to achieve similar functionality natively within the browser. While it may appear that the trend is to push processing towards user devices, another trend has emerged that is pushing traditional computing out of devices and into Internet servers. Surprisingly, the two trends complement rather than compete against each other.

The fact that Internet connectivity is now almost taken for granted, at least in developed countries, means that there is a fundamental shift in the way the Internet is perceived. Internet is no longer a bunch of interconnections but a universal digital platform. It is as much about computing as about connectivity. In the old days, mainframes did the computations. Later, desktops took over this responsibility and made computing affordable for home users. The home desktop became the user-friendly computational platform. Word-processing suites, spreadsheet utilities, and email clients were some of the packaged off-the-shelf software that could be installed on desktops. With almost continuous connectivity and high bandwidth, the trend is reversing. In popular lingo, one calls this _cloud computing_.

Cloud computing implies a shift in responsibility from end-user devices to the network. The intent is to simplify desktops and trim them down to thin browser-based clients. They are required to do no more than manage connections to the Internet and perform client-side processing of web applications. Users today mostly access and manage their emails online without ever downloading them to their PCs. Music lovers stream songs from sites such as Spotify by paying a monthly subscription without feeling the urge to purchase and store these songs locally. Affordable digital cameras, often integrated into mobile phones, have seen an almost overnight creation of legions of amateur photographers. They can now store high-resolution photographs by the thousands in sites such as Flickr and Picasa. Businesses are migrating their data to secure online data servers, which often operate with sufficient redundancy to guarantee 24/7 availability. This saves many small businesses upfront investment on servers, disk storage, backup systems, firewalls, and antivirus software packages.

More than a decade ago, _Business Process Outsourcing (BPO)_ was conceived to simplify mundane but necessary operations. It proved effective for small and medium businesses and for offshore operations. Today the same thing is happening with digital technology. With cloud computing, medium businesses can operate even without an IT department. This shift in focus is bringing greater mobility. Data is always on the cloud, which is synonymous with data servers. Data can be accessed anytime from anywhere. The cloud offers itself as a platform for heavy-duty computation. Where previously an independent computer scientist might find it difficult to access a powerful supercomputer, today she can rent equivalent computing power from the cloud. This computing power may be implemented as a supercomputer or a network of computers cooperating efficiently. Even traditional desktop applications have begun a slow migration to the cloud. Google was among the first. Formatted documents and spreadsheets can be created and managed online. Microsoft responded to this competition by offering a Web version of its successful Office suite of applications. AJAX and HTML5 become even more relevant. User devices still need to be powerful enough to do client-side processing but otherwise, these web-enabled applications are managed by the servers.

On the surface, cloud computing may represent a reversal of the proven _end-to-end argument_ first stated explicitly in 1984 by Jerome Saltzer, David Reed, and David Clark. The idea had taken shape along with the emergence of TCP/IP that only a year earlier had become universally accepted as the essential protocols for ARPANET. The argument stated that it made little sense to design sophisticated checks and balances at low layers within the network. The network should be as simple as possible. Its design should find a balance across performance, cost, and complexity. It should not strive to achieve perfect reliability. In fact, such a goal is not even feasible since errors could occur at hosts after packets have been delivered by the network. It was therefore important to have end-to-end checks.

These days, a person sending a parcel from a small town in Alaska to a rural community in South Africa can easily track his shipment online. He may be able to see that the parcel's been dispatched from New York, has cleared customs at Cape Town, and has been delivered to the rural post office. But the sender will not be satisfied until he calls his customer and verifies that the parcel's been received, the contents are as expected, and that no one has tampered with the parcel en route. Although there may be value in ensuring correct delivery along the route, this by itself is not sufficient. It was on this principle that IP was designed to be as simple as possible while end-to-end delivery was taken care of by TCP. TCP retransmitted packets end-to-end. IP routers along the way did not do retransmissions.
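In code, the principle comes down to where the verification lives. The Python sketch below is a loose illustration, not a model of TCP itself; the hash-based check and the deliberately flaky hop are invented for the example. Intermediate hops merely move bytes, and only the receiving end verifies and asks for a retransmission.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def flaky_hop(data: bytes) -> bytes:
    """An unreliable link that may corrupt a byte. In the spirit of a
    simple IP router, the hop itself checks nothing."""
    return data[:-1] + b"?"              # simulate corruption on this run

message = b"important payload"
sent_sum = checksum(message)             # computed once, at the sending end

received = flaky_hop(message)            # the network merely moves the bytes
if checksum(received) != sent_sum:       # verified once, at the receiving end
    print("corrupted in transit; the sender must retransmit")
```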

The fundamental idea was to keep the network simple and put responsibility on the end protocols and devices. It may even be claimed that the end-to-end argument made a case for layered protocol design. TCP and IP were two different layers with clearly defined roles and interfaces. Even the idea of RISC computers can be related to this argument. RISC instructions are simple building blocks from which complex systems can be more easily built. When it is difficult to foresee what designers and programmers may need, the simplest reusable elements work best. Rectangular Lego blocks are more versatile than monolithic walls, windows, chimneys, and roofs.

Although conceived for purely technical reasons, the argument has been applied as a defence against new developments affecting the social balance of the Internet. When cable modem companies proposed closing their systems to competing ISPs, Internet observers saw this as a violation of the end-to-end argument. Subscribers would be limited to accessing only those parts of the Internet that their cable companies deemed suitable. Going by the argument, the role of cable modem companies was to provide connectivity but not control content. When therefore the merger of AOL and Time Warner was finally approved by the FCC, it was under the condition that their delivery systems remained open to competing ISPs.

The end-to-end argument, when applied to the new paradigm of cloud computing, takes on a different form. User applications and computing resources are moving out of workstations, desktops, and tablets into the network. With the proliferation of devices in all forms and capabilities, developing applications for this growing diversity is becoming complex. The network therefore becomes the unifying force. Cloud computing is surely about accessing applications on the move over fast and affordable data connections; but it is also about simplifying devices to the lowest common denominator. The end-to-end argument is still valid except that the ends are different.

While new technologies will surely evolve from existing ones, steady and incremental improvements to connectivity and mobility will on their own bring tremendous changes to the way we work, play, and live. Many patterns have been conceived. It only remains for them to expand and mature. It's hard to say exactly what the digital future will bring. A new generation of users born since the turn of the century is a digital generation. They have few links with the analogue past. They have little practice or affiliation with pen and paper, and much more with keyboards and touch screens. They are more comfortable with multimedia than with words alone. Printed books have an uncertain future in a world three decades from now. Already the transition to e-books has commenced. The future will see more e-libraries and e-bookshops. In December 2012, _Newsweek_ issued its last printed copy of the magazine. Since then, the magazine has existed only in the digital world. Books will survive not as repositories of knowledge but as works of art. A 2013 survey showed that traditional public libraries continue to be important, but that may only be because the survey involved participants aged sixteen and above. The digital generation is likely to have quite a different opinion.

Education is going online with flexible class schedules, webinars, online exams, and interactive virtual classrooms. This transformation has made higher education accessible to many, particularly to low income groups. To prove that education need not be expensive, a university in Texas has started providing a four-year bachelor's degree in IT for as little as $9,700. The first generation of these graduates will come out in 2016. In another three decades, universities may exist only as research laboratories. By then, online interactions would include rich multimedia, class applause, and virtual handshakes. Students would not simply connect to a textual stream of content but walk into virtual classrooms, take their seats, and look at virtual blackboards. The classroom itself will become culturally diverse with students from New Zealand to Iceland sitting in the same class. The real question is whether we are ready for such a social change. Traditional universities will resist change. With the rising cost of education, they may be forced to go online.

While much of Internet connectivity is about transcending geography, the use of geography is triggering a new wave of applications. These applications are generally linked to mobile users who access the Internet on the move. Billboard advertisements will begin to target individuals rather than market segments. GPS and cellular technologies are able to locate users and enable _Location-Based Services (LBS)_. Mobile phones and tablets enabled with LBS can help users find the nearest pharmacy or restaurant. Finding oneself stranded in an unfamiliar town late at night need not trigger a panic attack. With a few clicks, one can get a list of nearby hotels with available rooms. It is exciting to find strangers in the neighbourhood who share similar interests. At least in this sense, the Internet is moulding a tribal culture. Users are not forced to put up with opinions they don't like. Not everyone agrees that this is a positive development but most people acknowledge that it is inherent in the democratic nature of the Internet.

Neighbourhood communities have traditionally been successful in forging relationships. Over the decades, this has faded. Families sidelined communities. Individuals in their turn sidelined families. The most cherished ideal of neighbourhoods these days is privacy. Ironically, social networking sites are encouraging users to share personal information. Apparently, users don't mind announcing to the world what they bought over the weekend. The same users resent it if an automated system publicizes that information. It appears that so long as users hold the reins of control, they are happy to share. The Internet is bringing back communities in a new way, though relationships built online are often regarded as superficial. The statistics are yet to prove otherwise. Social networking works far better if users know one another in the real world. This is the real reason why, when a user connects to new friends via Facebook, the system prompts her to indicate whether she knows these people in the real world. Virtual communities may not have had much success with building lasting relationships but for gaming enthusiasts, they are almost as essential as oxygen. The social game Farmville is an example from recent years but an all-time gaming classic is the Multi-User Dungeon (MUD).

Given that network bandwidths tend to increase at a slower pace, it may take quite a few decades before sales of DVDs and Blu-ray discs start hitting a plateau. Whenever that happens, every kind of content will be downloaded or streamed. Home theatre systems might be less about discs and players than about displays and sound. Content itself would be streamed on demand. Existing models of entertainment will not disappear overnight since there is value in giving consumers multiple ways of enjoying content. The real challenges will be commercial—how much of consumer spending should go into the pockets of content providers and how much towards content distributors. The protection of online content against piracy will continue to be an important concern.

E-commerce will continue to grow not just for the latest in consumer products but also for bargains on second-hand goods. Sites similar to eBay will crop up but there will also be channels deeply integrated into social networking sites such as Facebook. Increasing consumerism and the decreasing cost of ownership have led to a use-and-throw culture, particularly in the developed world. Some of this loss will be regained as focus shifts to recycling and reuse. The Internet will be the platform for the car-boot sales and flea markets of the future. Perhaps the greatest change that has already happened is the increasing use of plastic at the expense of notes and coins. Bill payments, flight bookings, bank transactions, and stock trading have already been simplified.

It has become common in many companies to allow employees to work from home. What prevents its widespread adoption are concerns about information security and productivity. More importantly, web interactions are still far from achieving the realism and interactivity of a physical meeting room. As virtual reality matures and bandwidths increase, this is likely to change. Sometime in the future, unless commuting itself is fun, people may no longer need to travel. Unless virtual interactions are deemed inadequate, people may not feel the urge to gather for a family Christmas dinner or a Chinese New Year reunion. Sights and sounds are easy enough to recreate in a virtual world. The real challenge will be to import touch and smell into the digital realm. Developed countries will be forced to innovate along these lines to remain competitive against developing countries. While developing countries have cost advantages, developed countries must rely on innovative technology. The economics of any transformation towards virtual offices are as yet unclear. If nothing else, perhaps increasing pressure on nonrenewable resources will drive society to live lives and maintain relationships through a digital infrastructure. If this sounds scary, the loss will not be felt by a digital generation that has spent most of its childhood and teenage years in cyberspace.

UK viewers watch on average four hours of television per day. An American spends on average 8.5 hours per day interacting through digital devices. More and more of our lives are being lived in cyberspace. Even the demarcation between what's real and what's virtual is blurring. We already have names for people living in cyberspace—netizen, digerati, bitnik, troll, griefer, cybrarian. In MUD for example, multiple gamers undertake perilous journeys through cyberspace, often indulging in role plays quite at odds with their real-world identities. An online personality can be more real to some individuals than their biological existence. Their avatars crafted in bits coursing through miniaturized urban landscapes of silicon live far more adventurous lives than their carbon counterparts. The real world as we know it today will probably exist only for biological sustenance. For everything else, the digital world will mould our thoughts, form our memories, and create our experiences. Commuters will no longer bother to look out of windows at passing landscapes. Their tablets already open up for them windows to new and multiple worlds. Business travellers would rather not ask directions from locals. They would rely instead on GPS navigation to get to a restaurant nearby. They would rather not be adventurous in ordering something exotic unless they have checked the reviews and ratings. It is therefore not surprising that Max Frisch once pointed out, "Technology is a way of organizing the universe so that man doesn't have to experience it."



**It is hardly** surprising that technology, while offering many benefits, comes with its own set of problems. A knife can be used for cutting or killing. Digital technology is no different. The problem is not just about intentional misuse. It is in the inherent dual nature of technology. A knife is generally a useful tool but it also represents a real danger when toddlers get their hands on it. One of the most successful industries on the Web is pornography. Its business model was very clear from the start and brought in profits early on. This was in contrast to many free web services that relied on an advertising model. Hotmail struggled to make profits for years. Despite being the world's second largest email provider, now part of Windows Live, it is a loss-making arm of Microsoft's operations. Pornography faced no such soul-searching difficulties. It has been among the first to commercialize the latest developments in Web technology, from sophisticated syndicate sites to live video streaming. At the same time, such sites pose a real social danger by giving easy and anonymous access.

Bits are easy to store and share. They are also easy to steal, duplicate, and distribute for those involved in digital piracy. Protection of intellectual property online is a big concern for many. Original creative works are protected by copyright laws but the laws allow individuals to rephrase or summarize content. This is part of the license to be creative. With online availability of content, it has become easy for automated systems to obtain content from multiple sources, combine it into a different form, and release it as their own. The law is as yet unclear on how these should be treated. Generally termed _content aggregation_, the practice did not go down well with media magnate Rupert Murdoch. In 2004, Murdoch erected paywalls around his news websites to protect original content.

In the olden days, robbing a bank required masks, guns, and transport. In the Internet era, where security systems are inadequate, it can be done from the comfort of one's own den with nothing more than a laptop hooked up to the Internet. Not long ago, living inside four walls meant privacy. Modern walls are transparent, their boundaries fragile. What we really need are sophisticated firewalls. The game has changed. The players are the same but they have to relearn and rethink in new ways.

Smart programmers create good and useful software. Other smart programmers of a different strain create computer viruses designed to cripple networks and systems. What seems an innocuous web link can lead to theft of private data and tracking of keyboard actions. A little less malicious are junk emails containing unwanted messages and advertisements, often termed _spam_. By some estimates, these represent nearly 70% of the world's email traffic. Sending emails to millions incurs next to no cost. In the mobile space, the same can be said of SMS. These automated systems are the Internet's own way of broadcasting, except that unlike radio and television broadcasting, the user has little choice in what she receives in her mailbox. In the coming years, spam can be expected to make its presence felt through smartphones and multimedia content, consuming more computing resources and communication bandwidth.

The business of viruses and spam runs deeper than what appears to the eye. Antivirus software and firewalls would serve no purpose if the threats didn't exist. Software providers continually urge users to upgrade their software to protect against new viruses. This sounds more like a threat than advice. There is no guarantee that the latest upgrades are better than older software. New code built with additional features and interfaces, if not properly tested, can open up new security flaws. In early 2013, Oracle's Java 7 was shown to introduce serious security holes that had not existed in Java 6. The best advice experts could give users was to revert to Java 6 or disable Java altogether.

One of the great things about software is the ability to upgrade rather than throw away. Software is potentially immortal. This has been one of its strongest selling points against hardware. Since not all users are likely to upgrade at the same time and software often has to run on previously purchased hardware, one of the key design goals is _backward compatibility_. This is important for application software as well as communication protocols. Backward compatibility requires that old functionality be retained even when new features are introduced. Even if the old ways were inefficient, they can't simply be discarded. At best, features can slowly be deprecated so that new software stops using them. Nonetheless, old software lives on.
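
To make the idea concrete, here is a minimal sketch in Python; the function names are invented for illustration. The old function is retained so that existing programs keep working, while every call nudges programmers towards its replacement.

```python
import warnings

def fetch_greeting(name, polite=True):
    # New interface, with an added feature.
    return f"Good day, {name}!" if polite else f"Hi, {name}!"

def greet(name):
    # Old interface, retained for backward compatibility but deprecated.
    warnings.warn("greet() is deprecated; use fetch_greeting() instead",
                  DeprecationWarning, stacklevel=2)
    return fetch_greeting(name)

print(greet("Ada"))                         # old callers still work
print(fetch_greeting("Ada", polite=False))  # new callers get new features
```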

Backward compatibility is generally a good thing but it bloats software code because new code adds to old. It doesn't replace the old. Code maintenance is a pain. Sometimes it's better to start from a clean slate. There are companies still building GSM protocols from scratch because most existing implementations are two decades old. They believe a fresh implementation will bring efficiency and performance. It will also facilitate easy integration with 3G and 4G protocols. The world's first popular browser, Netscape Navigator, initially occupied a little more than 1 MB of storage space. Its successor, Netscape Communicator, integrated an email client, an interactive conferencing facility, an address book, powerful HTML editing software, and more. By version 4.8, Communicator had become unmanageable at a colossal 21 MB. Even when the source code was opened up to the Web community, no one could salvage Communicator. It had become a beast, slow and inefficient. Meanwhile, Internet Explorer captured the bulk of the market. When Mozilla Firefox came out in 2004, it was designed from scratch and very little Communicator code found its way into it. Firefox contained only basic functionality. Anything fancier was left to the user to add as a third-party plug-in. Today, even Firefox is becoming a beast of its own. This is the inevitable consequence of changes to HTML, JavaScript, CSS, DOM, and other technologies that browsers have to understand.

Users realized that software was great and flexible but it was the underlying hardware that ultimately limited it. With modern electronics, a laptop may work for a decade without issues but its resale value is almost nil within a few years. Software development cycles are far shorter, and many upgrades usually become available within a year. With only 128 MB of working memory, Microsoft Office 2003 could run reasonably well on Windows XP. Today, users upgrading to Microsoft Office 2013 are forced to upgrade memory or invest in a new laptop. In fact, Microsoft's official recommendation is to have at least 1 GB of memory for Office 2013. Another famous failure from Microsoft was the introduction of its Vista operating system. Many users found Vista too sophisticated and resource hungry to run on old systems that used to run Windows XP comfortably. These are examples of the classic tussle between Moore's Law and Parkinson's Law that's been going on for decades. It makes a compelling case for cloud computing, which will slow down the need for consumers to upgrade their devices often. Not surprisingly, cloud computing is not popular with manufacturers and vendors of consumer devices and applications. After all, their whole business is based on this constant cycle of upgrades to both software and hardware. Google's embrace of cloud computing means that it is now the fifth largest consumer of Intel's server chips and in direct competition with Dell and HP.

A whole generation of computer engineers has grown up exposed only to software. They know all about high-level objects, APIs, and applications but know nothing about CPU cycles or hardware architectures. Ask them about power consumption and they either give blank stares or state that it's irrelevant. Ask them about hardware requirements and they quote as high a number as possible, knowing that memory and processing are cheap. A software-only approach has made them irreverent towards hardware. They take performance for granted. One way to get the best out of hardware is to write intelligent software, because it is ultimately software that controls hardware. With the proliferation of mobile devices, it has become all the more important for smart software to reduce power consumption as much as possible. If there is something smart about smartphones, most of it is in software that understands the hardware on which it runs. Software engineers who don't abide by this philosophy can rarely get the best out of a system's capability. In a technical magazine, Qualcomm once announced a job opening thus:

We need outstanding software engineers who can write real time signal processing code (usually in "C" or assembler) who aren't allergic to oscilloscopes and spectrum analyzers.

Frequent software upgrades are causing a different problem within the industry. Users accustomed to Microsoft's Windows platform find even the simplest task hard to execute when they migrate to Apple's Macintosh platform. It's not that the latter is inherently a difficult platform. It's just that users have to relearn everything, forget old ways, and form new habits in the world of computing. Diversity of tools and software is giving computer users lots of choice, in fact, too much choice. It is becoming difficult to stick to something for long. Chrome OS has introduced a completely new architecture in the way users interact with computers. Users who tried it suddenly found themselves trapped in a browser world. The familiar operating system with its local file storage, games, and client software disappeared. When Windows 8 was released in 2012, it too rewrote the rules of human-machine interaction. The new Windows was custom-built for touch screen devices and did not go down well with traditional PCs and laptops. These frequent changes are particularly hard on the elderly, who take longer to relearn new things. By the time they manage to grasp something, something supposedly better is already on the horizon. This is one of the reasons why many people believe that the information age has not brought the expected increases in productivity.

Today we have smartphones more powerful than the first computers of the 1940s. We have remote access to information no matter where it resides. By going digital, we are able to process mountains of data and indulge in high-definition entertainment. Yet many sceptics believe that the information age has not truly improved the quality of our lives. Penicillin saved lives. Chlorination brought clean water to millions and stemmed epidemics. Edison's incandescent light bulb is widely known because its impact on society was far-reaching. In the world of communications, telegraphy is still regarded by some as perhaps the greatest invention. This is because a telegram could reach distant places in a matter of minutes when previously messages might have taken days or weeks. Electronics, computers, and the Internet, argue the sceptics, enabled us to do things faster and better, but the improvements have been marginal in comparison to life-changing innovations of the past. The Internet brought services online but many of these services already existed before the Internet. Tom Standage, in his readable work _The Victorian Internet_, argued that the telegraph network had many of the things we are used to today. In short, the sceptics say, the benefits of the modern Internet have been overrated.

Being connected with almost everyone has brought each one of us in contact with too many things all at once. There is so much distraction that our brains are getting used to it rather than countering it. We would rather read a short blog post than a full-length book. The more links a webpage has, the more likely it is that we will not read the article in full. These links entice us to follow our curiosity and our insatiable desire for more information. HTML was designed to put the world's information at our fingertips, but Berners-Lee had not foreseen how badly it would affect our ability to do things sequentially in structured ways.

We don't make the effort to learn something properly because there are simply so many things to learn. So the preferred approach is peck and sample. We would rather walk around an all-you-can-eat buffet than sit down for a three-course meal. Completeness is sacrificed for variety. Rather than learn to use a digital camera the proper way, novice users fiddle with panning, tilting, zooming, digital brightening, and sepia toning without really learning the basics of composition and exposure. It is common for inexperienced designers to pack webpages with distracting images, fancy fonts, and blinking headlines. In the process, the essential message gets buried by the uncontrolled use of multiple conflicting features.

This culture of using more but getting less is plaguing everyone, from content producers to consumers. Television news channels are the perfect examples. While someone is reading the news, a ticker tape at the bottom scrolls along with completely unrelated news. A little above that, Twitter updates keep popping up once in a while. Suddenly a headline reading "Breaking News" flashes, except that it is almost regular and predictable. The viewer hardly knows whether to follow the latest gold prices at the bottom right corner, listen to the news at the top left part of the screen, or watch live feeds from field reporters relayed at the top right corner. Our attention spans don't last long on any particular item. The diversity and richness of multimedia interrupt and overwhelm our senses.

Even in the days before the Internet, some people complained about interruptions. When the telephone rang, people felt compelled to stop whatever they were doing and pick up the handset. Then came a marvellous invention called the answering machine. When email and SMS messaging arrived on the scene, it was exactly in this spirit of both courtesy and non-urgency. People could check their voicemails, mailboxes, and messages at leisure. These were asynchronous services that didn't require immediate attention. Unfortunately, checking emails and messages has today become almost an addiction for many. Now that these asynchronous services are coming directly to smartphones, including Facebook updates and tweets, people once more feel the urge to stop their current tasks and reply immediately. An email notification acts as a neurological stimulant and, like any drug, makes us want more of it. Research has shown that it takes many minutes of focused concentration to get into deep thinking. With more interruptions from our digital devices, we can hardly expect to produce Einsteins and Edisons in future. Thankfully, not everyone is a digital addict and there are still sufficient people who know how to control their consumption patterns.

What everyone consumes in a digital world is information; and there is lots of it. Commonly heard phrases these days are "information overload" and "information fatigue." In hindsight, it is no mystery how we arrived here. Armed with point-and-shoot digital cameras and almost ubiquitous connectivity, everyone today is a potential producer of information. It's not just humans who are generating data. Machines are automatically tweeting. Machines are analysing data and creating data out of data. As of March 2013, 400 million tweets were being created every single day. In 2012, about 2.5 exabytes of data were being created per day, an exabyte being a billion gigabytes. More data passes through the Internet every second than was stored in it two decades ago. Data centres that store and manage such high volumes of data consume 1.5% of the world's electricity. Such large numbers have been impressive enough for scientists to coin the term _Big Data_.

There was a time when thoughts were considered fleeting. In a digital world, every thought is reduced to bits, tweeted around, and permanently stored on a server somewhere. Little consideration is given to whether these thoughts will have value a decade from now. The world as we experience it is certainly analogue in nature. By capturing fleeting thoughts, flashing images, and whispered sounds, we are actually sampling and quantizing our present. At some point in the future these digital samples could perhaps be used to recreate the past. We may do just that and live in a virtual world if the future proves unsatisfactory. Even if this future is fantasy, our present obsession with data is not without reason.

The very fact that there are now more content producers has given a new relevance to data. Social gaming and social networking go beyond interaction. These are platforms for collecting data on user behaviour and preferences. Advertisers can target consumers selectively. The fact that someone in East London likes lemon pie is of limited value to a bakery just starting out. But if the baker learns that 50% of East Londoners aged between fifteen and fifty like lemon pies, he will know where to put his focus. Through what is commonly known as _data analytics_, information on the Web is giving insights into market trends and patterns of consumption. This may one day replace traditional market surveys and questionnaires. The belief is that society is leaving its traces on the Web. Consumers are becoming increasingly expressive of everyday details. Chatrooms, blogs, newsgroups, multimedia uploads, and tweets may contain lots of everyday trivia and even mindless chatter, but at least in the world of advertising they have value. Big data doesn't exist for its own sake. It enables data analytics.
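
In code, such aggregation is straightforward. A toy sketch in Python, with invented records standing in for the millions of traces a real system would mine:

```python
# Invented survey records; a real system would mine millions of such traces.
records = [
    {"area": "East London", "age": 24, "likes": "lemon pie"},
    {"area": "East London", "age": 37, "likes": "scones"},
    {"area": "East London", "age": 45, "likes": "lemon pie"},
    {"area": "West London", "age": 29, "likes": "lemon pie"},
]

# Filter to the target segment, then compute the share who like lemon pie.
segment = [r for r in records
           if r["area"] == "East London" and 15 <= r["age"] <= 50]
share = sum(r["likes"] == "lemon pie" for r in segment) / len(segment)
print(f"{share:.0%} of East Londoners aged 15 to 50 like lemon pie")
```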

While data analytics is about looking at who is doing what, some realized that it is also possible to ask Internet users to collaborate on something. Instead of relying on a few sources of information, there is power in the collective knowledge of the masses. This is called _crowdsourcing_. Wikipedia works on the idea of allowing anyone and everyone to contribute to its storehouse of information. Open source projects do the same for programmers. When it comes to pooling computing resources, SETI@home is a suitable example. Yahoo! Answers allows users to ask questions. There are no nominated experts to give answers. Rather, anyone can supply them. In 2009, the _Guardian_ newspaper in the UK asked the public to check dubious expense claims made by British MPs. In 2010, the Library of Congress in the US relied on the power of crowdsourcing to identify people in old Civil War photographs.

The only problem with crowdsourcing is coordination. Without coordination, crowds can become mobs, and one ends up with a haystack of useless information. Where there are multiple and diverse voices, it is easy for crowds to lose focus. Crowds can get polarized into highly opinionated groups, thereby putting the project at risk. The Internet, the Web, and associated technologies started from the ideas of a few individuals rather than crowds. If Wikipedia has succeeded, it is because of many dedicated volunteers who give advice, point out flaws, mark articles as having inadequate references, and merge duplicate articles into coherent wholes. Because a million voices can generate impossible amounts of data, it is important to provide convenient ways of exchanging information. It is important to have democratic platforms for debate. Only then can the grain be separated from the chaff.

Information overload is not so much being smothered by information as being unable to decide what's important. There are certainly many interesting things happening all over the world but it is simply impossible for anyone to keep track of everything. Since information is anyway accessible with minimal effort, we end up spending hours on the Web hopping from one interesting news item to another fad that's just gone viral. The question that must be asked is whether something is relevant to our goals, our lifestyle, our idea of work and play. The accusation is that we are not actually learning but passively consuming information. Our minds are no longer transforming information into knowledge. Content is enjoyed and promptly forgotten. The fact that much of it is multimedia makes it worse.

In the years immediately after Indian independence, a whole generation of wildlife enthusiasts grew up reading Jim Corbett's descriptions of the jungles of Kumaon. In a time when very few television documentaries were available, Corbett's writings excited the imagination. Despite his detailed descriptions, there was still much imaginative scope for the reader to put the pieces together in whatever manner he wished. He could actually look right into the eyes of a man-eater and feel danger coursing through his veins. Multimedia facilitates learning but its overuse inhibits imagination. What is really needed is the right mix of style and content. Often there is little time to pause and reflect. We only need to consider music videos as examples. These videos are usually collages of short clips pieced together from multiple cameras. If a scene remains in view for more than a couple of seconds, it's a big deal. No doubt, these are works of art and have entertainment value. But if this is all a person watches, the brain adapts to it without regard to the consequences for concentration and reading comprehension.

In the distant past, communication was oral. Knowledge was passed down generations by an oral tradition. Scholars had a prodigious amount of memory to remember and recall things. Then came the scribes who committed words to palm leaves and papyrus. Thoughts and ideas thus acquired a more permanent status, not as permanent as writings on stone but far easier to execute and distribute. Thus began a slow decline of human memory. The trend was not all that bad. Technology helped us to process more information. It helped us to focus on knowledge creation through enquiry and analysis rather than simply remember facts. It helped us to expand our horizons in every direction and at the same time specialize in a few. Digital technology has perhaps taken this too far.

The digital generation cannot recall simple facts without Google's help. A teenager cannot execute simple arithmetic without using a calculator. Engineers in the era of slide rules had an intuitive feel for numbers. When calculators disgorge results, modern engineers sometimes don't see that there is a world of difference between three and thirty feet. Most people no longer remember phone numbers because technology takes care of it. This is a good thing from the perspective of phone design but not otherwise. Today we may be losing our memories, but a few decades from now we may have problems analysing information and generating ideas. We may become adept at searching and finding information but we may not know what to do with it once it is available.

When users attempt to take control of information, they face difficulty in finding relevant information. With so much information around, this search for relevance may be like finding a needle in a haystack. Even with Google, searching is a difficult task. The problem is growth. Information is growing at such an incredible pace that even Google is having difficulty keeping up. Though Google has been the best search engine for more than a decade, most online content remains unsearchable by it. Many websites are protected by passwords and accessible to only a few. Data-driven websites separate the client front end from back-end databases managed by the server. Such data cannot be indexed by search engines since webpages are generated dynamically in response to user queries. Much public content on social networking sites is not available to Google. These parts of the Web are not indexed and hence not searchable. They make up what some call the _Deep Web_.

One focus of Internet research is to make the best of visible and accessible data. To this end, Berners-Lee proposed the idea of a _Semantic Web_ in which a descriptive layer would give meaning to content. Computers are good at manipulating ones and zeros but they don't understand what they are about. Google can search for words and phrases but it doesn't know what they mean. The Semantic Web is an attempt to associate meaning with bits. The descriptive layer is data about data, what experts call _metadata_. Metadata on a webpage would specify such aspects as its date of creation, its author, copyright information, keywords, subject category, and library cataloguing code, among others. To help process such metadata, the Semantic Web defines new languages and tools to establish relationships among keywords. So if someone were to search for "Asian mammals," a search engine would include a webpage on Bengal tigers even if it didn't contain the words "Asian" or "mammal." This is because the Semantic Web would have established the facts that all tigers are mammals and that Bengal is part of Asia. Likewise, if a search were made with the keyword "scale," semantics would inform search engines which pages are about fish scales, which are about music, which are about weighing balances, and which are about maps. This becomes possible even without analysing the content of each webpage.
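
The Semantic Web expresses such relationships in formal languages such as RDF and OWL. The sketch below uses plain Python instead, with a handful of hand-written facts, to show how the "Asian mammals" inference could work; the facts and predicate names are invented for illustration.

```python
# Hand-written facts in subject-predicate-object form, mimicking RDF triples.
facts = {
    ("BengalTiger", "is_a", "Tiger"),
    ("Tiger", "is_a", "Mammal"),
    ("BengalTiger", "lives_in", "Bengal"),
    ("Bengal", "part_of", "Asia"),
}

def is_a(subject, category):
    # Chains is_a facts: a Bengal tiger is a tiger, and a tiger is a mammal.
    if (subject, "is_a", category) in facts:
        return True
    return any(s == subject and p == "is_a" and is_a(o, category)
               for (s, p, o) in facts)

def located_in(subject, region):
    # Chains lives_in with part_of: Bengal is part of Asia.
    if (subject, "lives_in", region) in facts or \
       (subject, "part_of", region) in facts:
        return True
    return any(s == subject and p in ("lives_in", "part_of")
               and located_in(o, region)
               for (s, p, o) in facts)

# A page on Bengal tigers matches a search for "Asian mammals".
print(is_a("BengalTiger", "Mammal") and located_in("BengalTiger", "Asia"))
```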

In the late eighties, Stewart Brand coined the term _broadcatching_. This was before the volume of information on the Web exploded. Broadcasting did not deliver personalized content. Broadcatching was a way to filter out irrelevance according to user preferences and personalize content. Rather than go out and search for relevant content, the idea was to be notified of relevant content as and when it became available. It was in this spirit that _Really Simple Syndication (RSS)_ emerged in 1999. Through links to RSS feeds and keyword filtering, users could track multiple websites that interested them. Another example is the use of _cookies_, which are special bits of information set by a web server on the client browser. With the use of cookies, servers can customize their delivery of information according to user preferences. The Semantic Web took these ideas to the next level of sophistication. If the original Web was about protocols, client-server models, style, and presentation, the Semantic Web shifted the focus to data itself.
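
A broadcatching filter can be sketched in a few lines of Python. The headlines and keywords below are invented; a real reader would pull its entries from RSS feed URLs.

```python
# Invented feed entries; a real reader would fetch these from RSS URLs.
feed = [
    "New undersea cable boosts transatlantic bandwidth",
    "Recipe of the week: grandmother's lemon pie",
    "Semantic Web tools released for digital librarians",
]
interests = {"bandwidth", "semantic"}  # the user's chosen keywords

# Notify the user only of items that match her interests.
for headline in feed:
    if any(word in headline.lower() for word in interests):
        print("New item for you:", headline)
```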

Data on today's Web is largely unstructured. It is also heterogeneous, which makes finding and exchanging relevant data difficult. XML might have brought structure to data, but because it carried no associated meaning, XML data remained strictly tailored to the specific applications that understood it. Data cannot be reused unless humans decide that it's relevant in another context. The Semantic Web aims to solve this problem so that data can transcend systems and applications without human intervention. Computers would see that data generated by forest surveys is relevant to botanists and geologists alike. They would be able to combine census data, police databases, employment records, and online surveys, even though these may reside in various forms across multiple systems.

Right up to the early eighteenth century, biologists were struggling with names. New species were being discovered by the dozen and it was becoming almost impossible to keep up. They didn't want to give random names because that would be unstructured. In their search for structured naming, they noticed that many species were biologically related to others in many ways. They identified vertebrates and invertebrates, cold-blooded and warm-blooded beings, plants and animals, reptiles and amphibians, birds and mammals. From such classification, Carl Linnaeus brought out in 1735 an entire taxonomy by which biological structure and lineage became part of the name. The Semantic Web is attempting to do this for all types of data via metadata and rules expressed in formal languages.

Despite being around for two decades, the Semantic Web is yet to take off in any big way. There are some who believe that it never will. The issue at hand is the complexity of the task. One needs to identify associations and hierarchies. Based on these, every piece of data has to be classified and identified. Only then can computers make use of it. Fundamentally, computers are still dumb. They have to be given sufficient metadata and the means to interpret it. The Semantic Web attempts to bring intelligence to machines in a crude way only because we haven't found a better way to do it. Ultimately, what separates humans and machines is intelligence. What is not clear is how long this separation will last into the future.



**At a conference** in 1956, some of the leading scientists who were attempting to build a bridge between humans and machines coined the phrase _artificial intelligence (AI)_. Machines were taking over the job of numerical computations. Beyond adding numbers, machines aided engineers in their decision-making process. It was therefore natural to ask if machines on their own could make decisions. They were already able to decide between positive and negative numbers. If everything that we do as humans could be reduced to the level of symbols built from ones and zeros, then machines could perhaps achieve intelligence from the manipulation of symbols alone. Logical as it may sound, the very definition of intelligence was fraught with problems.

Whether intelligent machines could be built was a question as old as machines themselves. Ada Lovelace had laid down the detailed internal operations of Babbage's Analytical Engine but did not dare speculate whether the machine could understand analysis. At best, the machine added to mankind's processing power; it did not impart its own knowledge or reasoning. Contemporary with the Analytical Engine, Samuel Butler presented in 1863 an essay titled "Darwin Among the Machines." In this brief essay, Butler portrayed a bleak future in which machines would rule over humans. Machines would somehow evolve to world dominance. In Butler's own time, electricity had given rise to telecommunications. Isolated machines had become interconnected through a vast network of telegraph links. The old steam technology continued to develop. The potential of combining diverse technologies could certainly result in many varieties of machines.

From the perspective of modern computers, among the first to consider the topic was Alan Turing, who wrote in 1950 a paper titled "Computing Machinery and Intelligence." This appeared in the journal _Mind, A Quarterly Review of Psychology and Philosophy_. The paper itself outlined views as diverse as computing, theology, mathematics, consciousness, extra-sensory perception, nervous system functioning, laws of behaviour, and rules of conduct. This diversity underscores the difficulty of the subject. One of the famous tools that Turing introduced was the "imitation game," more commonly known today as the _Turing Test_. An interrogator asks questions of a human subject and evaluates the answers. The test is to see if a computer can successfully replace the human without the interrogator noticing. In other words, a computer that passes the test can answer questions as well as the human. This leads to the naïve conclusion that such a computer is indeed intelligent. The real truth is perhaps evident in the name of the game—the computer is adept at imitating intelligence.

All living beings may be said to possess intelligence, with humans at the apex of the intelligence pyramid. To study AI, the sensible approach was therefore to study the human brain itself. Such a study was motivated more by our quest to understand the mind than to build intelligent machines. Clearly, a scientific model was needed and scientists looked towards machines. Historically, this may be seen as an unfortunate development. It seemed easier to explain things in terms of machines rather than biological processes. We are the creators of machines, and our deeper understanding of life processes came decades later. Thus, instead of studying how the brain functions and seeing how this could be applied to machines, scientists began to model brains as machines. It is interesting that these models kept evolving with the times. During the Renaissance, the brain was modelled as a clock. Later it was compared with steam engines. More recently, it was equated to a telephone exchange, which certainly seemed complex enough for comparison. In Turing's time, the brain was modelled as a computer. It was a computer that replaced a human in his imitation game.

There is an obvious difference between the brain and the mind. One is physical, the other mental. In computer lingo, one says hardware and software, circuitry and program. This analogy with computers influenced the early theories of AI. Little importance was given to hardware. Intelligence resided in software. This implied that intelligence was really a manipulation of symbols based on defined rules. If any problem could be solved in this manner, all it took to create intelligence was to write the right program. The key proponents of this theory were Allen Newell and Herbert Simon, who in 1976 proposed their _Physical Symbol Systems Hypothesis (PSSH)_. To them, the underlying hardware did not matter. Brains were made of carbon. Computers were made of silicon. This difference did not matter because they both did similar symbol processing. In fact, Hans Moravec, coming from the perspective of robotics, suggested that by replacing the neurons in the brain with electronic equivalents, we would change nothing about a person's consciousness. The fact that computers were not yet intelligent simply pointed to primitive programs rather than fundamental limitations of computing machinery. Not everyone agreed.

John Searle started to set things right by giving importance to physical and chemical processes that occur in the brain. Symbols on their own lead to other symbols, like words in a dictionary being explained using other words. A machine may produce a perfect translation of any given text using only symbol mappings, but without understanding anything in the process. A person born blind can never understand colours though she may be taught the words of all seven colours in the rainbow. She would know the symbols but not know what they mean. A symbolic representation of mental cognition supported a dualistic view in which physical and mental realms could be treated separately. In Searle's view, the two realms were intimately related. It is not possible to ignore the brain's construction and still claim intelligence. To put it differently, machines can never be intelligent because their machinery is so different from the way human brains function.

Stevan Harnad expressed the same principle by stating that the meanings we attribute to symbols are derived from our experiences of the world. We know the difference between milk and water because we experience them through our sense organs. We also know by experience that both milk and water share certain characteristics. This experience precedes any understanding that we have about liquids and evaporation. Computers work in the abstract realm of symbols. Unless they can sense the world around them and experience it fully, they will acquire neither understanding nor intelligence. Theories of how we understand the world and derive meaning are closely tied to linguistics. A word is really a representation of an idea. The greater question is whether humans are born with innate ideas or acquire them from our interactions with the world around us. Indeed, linguists are perhaps closer to AI than computer scientists. It is language that truly separates man and machine. Even among animals, humans represent a higher order of intelligence because of language. Language is the external manifestation of any inborn intelligence that might reside in the brain.

Noam Chomsky, an eminent linguist, held that humans are born with linguistic abilities and hence intelligence. He reasoned that our senses are quite limited. They assist in perception but hardly anything else. He called this the "poverty of the stimulus." Sense organs are inadequate to explain human intelligence. Our ability to connect words in new ways and understand new sentences must therefore be an essential part of who we are. The Turing Test is a challenge for computers precisely because they have difficulty comprehending language. They can solve equations but they are unable to deal effectively with words. Even with sophisticated programming, sooner or later they betray their limits in dealing with unforeseen scenarios. Chomsky stated that to ask if computers can think is like asking if submarines can swim. We may be able to digitize words into ones and zeros, but we can't be definite about their meaning. This is what trips computers up.

Such debates on linguistics and AI resemble old ones from philosophy between rationalists and empiricists. One was a symbol-centric approach. The other was a behavioural approach that relied on interactions with the environment but also depended on the nature of the brain itself. If the human brain is essential to intelligence, there is nothing more for AI to do. Perhaps it's best to build a cyborg using a human brain for control and machine parts for everything else. Then came the Internet, and AI took a new turn of enquiry. It was AI pioneer Marvin Minsky who suggested in the eighties that,

you can build a mind from many little parts, each mindless by itself. I call this scheme "Society of Mind," in which each mind is made of many smaller processes. These we'll call _agents_. Each agent by itself can only do some simple thing that needs no mind or thought at all. Yet when we join these agents in societies—in certain very special ways—this leads to true intelligence.

In Turing's days, the brain was likened to a computer, but in Minsky's days it was a network of connected agents, in a way mirroring the Internet. In fact, Minsky's ideas can be traced to his student days in the fifties, but serious comparison of the brain to such networks did not happen until the Internet itself was on the rise. This is perhaps a fitting example of how some ideas do not take hold until, like creepers, they find their host trees. It is from here that _artificial neural networks_ began to be taken seriously for real-world application. The focus of AI turned to understanding the internal workings of the brain and seeing if knowledge gained in the process could be applied to machines. The brain was seen not as a collection of neurons but as a network of neurons with multiple incoming and outgoing connections at each neuron. Connectivity across neurons became as important as the neurons themselves. An idea or an experience could trigger a series of reactions involving select neurons and interconnections. Each neuron on its own might be simple, lacking intelligence, but a network of neurons resulted in intelligent behaviour.
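
A minimal sketch in Python of the idea, with hand-chosen weights. Each artificial neuron below merely thresholds a weighted sum, yet three of them wired together compute the exclusive-OR, something no single such neuron can do.

```python
# A neuron here is just a weighted sum passed through a threshold.
def neuron(inputs, weights, bias):
    return 1 if sum(i * w for i, w in zip(inputs, weights)) + bias > 0 else 0

# No single threshold neuron can compute XOR, but a small network can.
def xor(a, b):
    either = neuron([a, b], [1, 1], -0.5)         # fires if a OR b
    both = neuron([a, b], [1, 1], -1.5)           # fires if a AND b
    return neuron([either, both], [1, -1], -0.5)  # OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(f"xor({a}, {b}) = {xor(a, b)}")
```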

Networks of simple elements lead to something called _emergent behaviour_. This is one way to explain why a flock of birds or a shoal of fish assumes definite shapes and forms. Back in the late sixties, soon after discovering new sporadic groups, John Conway of Cambridge came out with a model called the _Game of Life_. Conway's game started with a few shapes and a set of simple rules for transforming those shapes. These shapes moved on a grid of little squares, interacting with one another. As the game proceeded in discrete steps, what one may call a clock cycle or generation, the shapes evolved. Conway saw that complex shapes could evolve from simple ones after many generations. Even when initial shapes did not have symmetry, symmetric and stable shapes evolved. Some shapes exhibited periodic behaviour, toggling between two alternatives. Conway's model was fundamentally a finite state machine, of the kind Mealy and Moore had formalized in the world of computing. Minsky's theory was that such machines could evolve.
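
Conway's rules are simple enough to state in a dozen lines of Python. The sketch below evolves the "blinker", one of the periodic shapes just mentioned, on an unbounded grid.

```python
from collections import Counter

def step(live):
    # Count how many live neighbours each cell has.
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # Conway's rules: a cell lives in the next generation if it has exactly
    # three live neighbours, or two live neighbours and is alive now.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# The "blinker": three cells in a row, toggling between two shapes.
cells = {(0, 1), (1, 1), (2, 1)}
for generation in range(3):
    print(f"generation {generation}:", sorted(cells))
    cells = step(cells)
```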

It was therefore important to focus on simple structures. A network of simple parts can learn and adapt. It acquires an ability to handle situations it has never encountered before. Evolution is an essential process and if machines are to become intelligent, they certainly cannot skip it. Human intelligence is the result of millions of years of biological evolution. What we regard as common sense is perhaps the most complex aspect of intelligence. It is common because we don't think about it consciously. We take it for granted and often can't explain it. Common sense is exactly what machines lack, and if they have to acquire it, they have to build it from simple parts. Parts should interact and evolve. If machines are to have meaning, understanding, and emotional depth, they have to obtain them through a slow process of interaction and learning. It is with this approach that the iCub project was launched in 2004 under EU funding. Since that time, iCub prototypes of humanoid robots have been interacting with their environment through artificial sense organs. They are now able to crawl around, grasp objects, and even express emotions through facial gestures.

When machines and computers first arrived on the scene, they were primitive. Scientists did not have the time to wait patiently for them to evolve. As a result, they were not designed to be smart. The expectation was that humans should adapt and learn to work with machines, not the other way around. For more than one reason, early computers had only textual interfaces. Except for computer experts, most others dreaded the thought of pressing a wrong key and triggering a series of catastrophic events that proceeded silently in the unseen world of bits. Such events may include deletion of files or emptying of bank accounts. It was only when graphical user interfaces emerged that computers became somewhat easier to use. Modern smartphones have superb user interfaces but they have little to suggest that they are smart in any real sense.

The closest candidate to intelligence is perhaps Siri, a digital agent included by Apple in its iPhone. Siri answers questions that users pose but it is doubtful that it would pass the Turing Test. Siri's responses are short and humorous at best. Often they are empty rhetoric. It may make an excellent digital psychiatrist but it is incapable of giving intelligent answers. Back in 1997, IBM's Deep Blue supercomputer spectacularly defeated world chess champion Garry Kasparov. Deep Blue prided itself on being able to evaluate 200 million moves per second and on heuristics tuned from analysis of Kasparov's own style of play. At best, Deep Blue demonstrated brute-force computing made tractable by heuristics. It was not really intelligent. In 2011, IBM's Watson defeated two of the best players of a TV quiz show called _Jeopardy!_ Where Deep Blue worked with numbers, Watson worked with words. Watson was closer to human interaction because it analysed human language.
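
Stripped to its skeleton, Deep Blue's brute-force search is the classic minimax algorithm: explore the tree of moves as deeply as time allows, then fall back on a heuristic score for positions too deep to examine. A toy sketch in Python; the "game" and its heuristic are invented for illustration, whereas Deep Blue raced through chess positions with finely tuned heuristics on specialized hardware.

```python
def moves(n):
    # Toy game: a position is a number; a move adds one or doubles it.
    return [n + 1, n * 2] if n < 20 else []

def heuristic(n):
    # Positions close to 10 are scored best.
    return -abs(10 - n)

def minimax(pos, depth, maximizing):
    children = moves(pos)
    if depth == 0 or not children:
        return heuristic(pos)   # too deep to search exhaustively: estimate
    scores = [minimax(c, depth - 1, not maximizing) for c in children]
    return max(scores) if maximizing else min(scores)

# Value of the game starting at position 1, looking six moves ahead.
print(minimax(1, 6, True))
```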

Before the game, Watson had devoured something equivalent to a hundred million books of information. This became its storehouse of knowledge, data that was largely unstructured. Where Google might give search results in response to a question, Watson analysed data and gave an answer. Watson did not think as humans do, but it certainly behaved like one by answering questions. Yet it was not intelligent, since it could not answer questions whose answers were missing from its knowledge base. Where Watson succeeded was in processing language with all its confusing subtleties, puns, and wordplays. Ordinary computers might follow Watson in years to come if the Semantic Web becomes widespread. For the moment, only supercomputers such as Watson can do it.

Deep Blue and Watson are examples of research that came decades after the first computers entered homes and businesses. Early machine designs focused on technology with little consideration for usability. The success of the Apple II and the easy-to-use VisiCalc application brought home the realization that usability was perhaps more important if computers were to be used by common people. If engineers couldn't make machines interact in human terms, speak in human tongues, and understand human needs, at least they could design them to be easier for humans to use. Such considerations, which had begun during the years of the Great War, gained traction in the second half of the twentieth century under such fancy terms as _ergonomics_ and _human factors engineering_. At least in implementation, the concepts were misunderstood.

With ergonomics, focus remained on visual design and appeal, not on functionality. Products looked good, were packed with rich features, and no doubt made good first impressions. When users started to use them, they were often confused. They felt helpless. Music systems had a dozen dials and switches for sound control. It was a lot worse when all controls looked alike but stood for different functions. Television controls could not be operated without first reading the manuals. Office phones capable of holding calls, transferring calls to another desk, or retrieving calls through cryptic sequences of numbers confused users. Washing machines had a dozen buttons and it was not clear what would happen if buttons were pressed in the wrong sequence. This was the bane of technology, and users accepted it passively, for technology-centred designers had given them no choice. When they used their gadgets wrongly, they blamed themselves for being stupid. After all, engineers and designers had created the impression that one had to be smart to use sophisticated technology.

For many decades, the design of gadgets remained centred on technology rather than the user. Function was more important than form. Looking at old rotary dials on telephone handsets, we find that most countries used a numbering that ran from 1 to 0, whereas Sweden was unique in adopting a 0 to 9 ordering on the dial. Sweden's method seems more convenient for everyday users. The more common design was motivated by technical requirements rather than user needs. With pulse dialling, the digit 0 triggers a sequence of ten pulses and is therefore placed after the digit 9. For a good part of the twentieth century, engineers rarely gave thought to usability. They rarely succeeded in seeing the user's perspective because they came with the baggage of knowing how the equipment worked internally. Often this was different from what users perceived as the machine's model. This mismatch created confusion.

The Design of Telephone Rotary Dials

(a) In the US, rotary dials incorporated both letters and digits inside the finger holes. The numbering was 1-9 followed by 0. (b) Swedish dials started with 0. Many early models did not have letters on them. (c) Danish rotary dials kept the digits on the rotating disc itself. This had the added advantage that the digit currently being dialled remained visible, a useful feedback to the user.
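
The technical constraint behind the common ordering is easy to state in code. A one-function sketch in Python:

```python
# Pulse dialling: each digit is sent as a train of pulses; zero needs ten,
# which is why it sits after nine on most rotary dials.
def pulses(digit):
    return 10 if digit == 0 else digit

for d in (1, 5, 9, 0):
    print(f"digit {d} -> {pulses(d)} pulse(s)")
```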

The classic case is the layout of digits on an electronic calculator, which is quite different from that on a push-button telephone. Though electronic calculators arrived later, their design followed a convention established by mechanical calculators of an era long before push-button telephones entered commercial service. Such a design was motivated by the need to save on mechanical parts but such considerations made little sense in the world of electronics. Later when push-button telephones were designed, left-to-right and top-to-bottom was seen as a natural layout. Before CCITT standardized this layout, conferences were held in the sixties to discuss issues of usability. After careful consideration, the layout was standardized and this included the addition of two special keys, * and #, for signalling functions. Thus, the digital world remains divided to this day into two factions. Traditional push-button phones and modern mobile phones use one convention. Electronic calculators and modern PCs use another. It is debatable which of the two is better in terms of usability, but if we are to go with CCITT's early studies, telephones have got it right at the expense of computers.

The Design of Push-Button Layouts

(a) Electronic calculators were motivated more by technology than user convenience. This layout has persisted in modern computers. (b) Push-button telephones used quite a different layout that was more user-centric.

Engineers knew how to use the special keys on telephone sets but users could rarely do so without referring to manuals. The best designs are therefore those that are intuitive and self-instructive. They give clues to their usage without needing a user manual. When we pick up a handset, the dial tone indicates that the line is ready for an outgoing call. When we press a digit, a beep indicates that the digit has been dialled. Such feedback is essential to good design. Mobile SIM cards and batteries will fit into their slots in only one way because they are built with physical constraints that exclude the alternatives. These examples of good design are matched by bad examples as well. Only expert computer users know that the F1 key triggers the help system. Engineers, for their love of abstraction, named it F1 so that it could be mapped to any desired function. In reality, the key is hardly ever used for anything other than help. A good design giving consideration to usability would name it the "Help" key rather than the cryptic F1.

In the brief days of ISDN, handsets were packed with features. A single push-button was used for multiple functions via complex signalling. It was complexity that users couldn't handle. Today's mobile phones do many more things besides phone calls. Yet users can switch with ease across phone calls, messaging, gaming, and calendar management. This is because these phones have rich displays. At every point of user interaction, visual feedback is provided. The design allows users to see mistakes and return to a previous step. Perhaps the most important user function in many digital applications is the "Undo" function. This single feature is the reason why users are not afraid to try new things or make mistakes. They know that mistakes can be corrected. Without this function, the usability of many digital applications would be severely limited.
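
Under the hood, an undo facility can be as simple as a stack of previous states. A minimal sketch in Python follows; real editors store compact differences rather than whole copies.

```python
class Editor:
    def __init__(self):
        self.text = ""
        self.history = []          # stack of earlier states

    def type(self, s):
        self.history.append(self.text)  # remember the state before the edit
        self.text += s

    def undo(self):
        if self.history:
            self.text = self.history.pop()  # restore the most recent state

ed = Editor()
ed.type("Hello,")
ed.type(" wrold")   # a mistake...
ed.undo()           # ...safely taken back
ed.type(" world")
print(ed.text)      # Hello, world
```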

Good designers know that users must be given a choice between alternative ways of getting things done. There is no such thing as the average user, since everyone has personal preferences. It is for this reason that many websites provide easy controls to adjust the size of the text. Though such controls are available in most browsers, providing them as part of the webpage improves usability. Increasing the font size does not bring up messy scrollbars. By providing control within the design, elegance is preserved. This is something that e-book readers such as Amazon's Kindle have almost perfected. Users do not have to bother scrolling up and down, or side to side. The application scales to fit the size of the display, be it a smartphone, an iPad, or a workstation.

On any large website, some users may find what they need by searching. Others may start from the main page and navigate via hyperlinks. For example, content aggregators and news websites provide short summaries with links to full articles should the reader be interested. Many users may refer to the menus. All methods have equal value. If a website requires many menu items to choose from, modularity is adopted so that similar functions can be grouped together. Common functions are made more visible than others. Dropdown menus make for cleaner design, but there is nothing worse than hiding an important user interface deep within the menu structure. Design is therefore a trade-off between hardware cost and software complexity, between elegance and usability. Apple has successfully demonstrated that users are willing to pay more for better design. Apple's products are aesthetically pleasing but also easy to use. Apple's designers saw the importance of integrating features seamlessly into a workflow. The idea was to keep the user's attention on the task at hand rather than have him fiddle around with complex menus to access features. For Apple, the user's attention was more valuable than computing resources. But even Apple made design mistakes early on.

Apple's Lisa computer came with a mouse that had a single button. The original mouse of Douglas Engelbart had three buttons, a design incorporated in the mouse built soon after at Xerox PARC for its Alto computer. When Xerox released a simplified version of the Alto in 1981, called the Xerox Star 8010 Document Processor, the Star's mouse was simplified to just two buttons. Designers at Apple felt that even this was unnecessary complexity and reduced the mouse to a single button. In the process, they invented the double-click action. A single click was for selection whereas a double-click was for execution. This was truly a design innovation, because having two buttons to implement these functions would have added complexity for the user. Perhaps Apple simplified the mouse too much. Today the standard design has two buttons with a scrolling wheel in between. The left button is for immediate actions while the right button brings up a menu of common operations.

There was a time in the world of communications when the focus was on the message. This approach may be seen as being centred on information theory. The needs of the user, or how he obtained that message, mattered little. Today we are seeing a shift towards user-centric design. Users are considered first, followed by suitable delivery channels. The way one says something is being given more importance than what is said. A web server may store an image of many megabytes, but when served to a thin client on a mobile browser over a slow channel, a low-resolution image is transmitted. The same image sent to a desktop user over ADSL broadband is transmitted in high resolution. This sort of differentiation already exists on many websites, including Flickr. In the same spirit, Project Gutenberg allows users to download e-books in a variety of formats as preferred by the user. Though email had been around since the early seventies, Research in Motion (RIM) reinvented email by launching BlackBerry pagers in 1999, followed a year later by a similar service on a BlackBerry smartphone. It was a welcome solution for business users who didn't mind typing with their thumbs on small keypads.
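
The image example can be sketched in a few lines of Python; the file names and thresholds are invented for illustration.

```python
# The same photograph stored at three resolutions; names are invented.
VERSIONS = {"low": "photo_320.jpg", "medium": "photo_1024.jpg",
            "high": "photo_4096.jpg"}

def choose_version(device, bandwidth_kbps):
    # Thin clients and slow links get the small image; fast desktops the large.
    if device == "mobile" or bandwidth_kbps < 500:
        return VERSIONS["low"]
    if bandwidth_kbps < 5000:
        return VERSIONS["medium"]
    return VERSIONS["high"]

print(choose_version("mobile", 2000))     # -> photo_320.jpg
print(choose_version("desktop", 20000))   # -> photo_4096.jpg
```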

Machines are beginning to synthesize human speech but they are not quite there yet. Adobe Acrobat Reader can read text aloud, certainly a useful feature for the blind. Its most significant flaw is that it doesn't pause where necessary. Ends of sentences are sometimes read out literally as "dot." The word "instructions" is read out as two separate words—"instruct" and "ions." Other machines are beginning to listen to interviews and generate transcripts, but they have difficulty hearing the difference between "call her" and "caller." They are unable to read the context to tell "fair" from "fare." They are also unforgiving because they require speakers to maintain a single tone of voice. Variations, emotional stress, and ambient noise are treated punitively.

Towards a User-Centric Design

(a) In the past, design was centred on technology. Engineering goals often did not consider the needs of users. (b) User-centric design is gaining popularity. Engineering goals are often aligned to the needs of users. Engineers approach design from a user's perspective.

Elements of good design are necessary only so long as machines remain dumb. If machines evolve into a new order of intelligence, it will not be via the slow process of Darwinism. Machine evolution will happen on a faster path where, more than mutation and natural selection, the hereditary laws of Mendel will have relevance. Machines will simply inherit successful elements found in other machines and thereby enhance their capabilities. Smartphones evolved quickly from traditional landline handsets by inheriting wireless technology from radar and display technology from computers.

If machines are to evolve into something similar to humans, perhaps they have to inherit traits of unpredictability. There should be an element of randomness in their behaviour. This is what information theory has taught us. If machines always come up with predictable answers, they are regurgitating rather than producing knowledge. When we consider each machine as a network of basic interacting elements, there are innumerable ways in which these interactions can take place. When machines are required to produce answers to questions, we should not expect these answers to be as clear-cut as ones and zeros. Answers will be influenced by the machine's particular state of mind, its emotions, its interpretation, and its reasoning. IBM's Watson did not in fact come up with definite answers. It evaluated hundreds of candidate answers, ranked them, and put them through various validity tests. It then selected the most probable answer. Sometimes it preferred to remain silent rather than risk a wrong answer. Yet give the same question to Watson twice and it will produce the same outcome. Machines of the future will be fundamentally different because the processes themselves will be subject to variation.

Make a supercomputer read all the world's books on mathematics and then ask it to perform a simple task: "Write a 5000-word essay on the birth of calculus." Today's computers will fail miserably at this task. Even when much of the information is factual and historical, computers will be unable to connect ideas in a natural progression. They may come up with sentences but will have difficulty framing paragraphs. Give the same task to humans and we can expect every essay to be different in one way or another. Styles will vary from chatty to formal. Vocabulary will vary from simple to complex. Some may write in short sentences while others may prefer long ones. There will be essays detailing the synergy between mathematics and physics. There will be essays focusing more on the lives of its inventors. If computers are to equal humans at such tasks as reasoning, contemplation, interpretation, evaluation, and composition, they would have to encompass such diversity. It may be that each computer would differ in its behaviour from every other computer. Alternatively, like the Universal Turing Machine, each computer could be configured to take on a certain role temporarily. The computer becomes the body in which the spirits of virtual agents can temporarily reside.

The ultimate goal for machines would be to interact with humans naturally. They will be as human as humans. Machines will be able to speak in English and French. They will understand Mandarin. They will accurately translate it into Arabic. They will serve humans as personal assistants who autonomously manage schedules, give reminders, make bookings, and organize holidays. If they are capable of motor functions, they will pick up kids from school, do household chores, and do the weekly shopping. They will possess knowledge not because they were fed mountains of data but because they developed that knowledge through learning. They would therefore be able to synthesize knowledge from what they already know. They will feel, express, and communicate. They will constantly learn and evolve. If machines achieve all of this, what is left for humans to do? It was Sydney Harris who said, "The real danger is not that machines will begin to think like men, but that men will begin to think like machines."

# Acknowledgements

**Sometime during the** summer of 2011, I happened by chance upon a few online biographical notes on Claude Shannon. I had of course known about Shannon since my student days, when information theory was introduced as part of the undergraduate curriculum. The problem with a college curriculum is that there is so much technical material to cover that hardly any time is left to talk about the men and women who contributed behind the scenes. Reading about Shannon, I understood the context in which his theories took shape. I understood the problems communication engineers faced and how Shannon's theoretical framework came to them as a guiding light.

Shannon's ideas had not germinated in isolation. They were influenced by analogue computers as well as by the switching circuits common in telephone networks. Engineering problems faced during the Second World War gave birth to many things; Shannon's ideas, standing at the beginning of digital technology, were among them. It occurred to me that textbooks focus only on subject matter and therefore provide incomplete knowledge. By leaving out the process, by ignoring the engineering, political, and social contexts in which inventions took place, students miss out on quite a lot. My first acknowledgement is therefore to the numerous scientists and engineers who have rekindled in me, and will perhaps rekindle in the readers of this book, a passionate interest in technology, its creation, and its evolution. It is through their eyes that we can truly understand technology.

My thanks are due to a few individuals who took time from their busy schedules to review early drafts. Sudeep Divakaran, Sudhir Kumar, and Clifford Joseph gave many useful comments. Their feedback helped me refine the content and steer clear of technical jargon as much as possible. Pritam Kalaimani wrote MATLAB programs to generate most of the graphs found in this book. Boopathy Srinivasan conceived and executed beautiful cartoons appropriate to each chapter. It was a pleasure working with him in this creative effort. Santosh Hombal made time from his own business venture to create many of the technical drawings.

This book would never have been possible without the Internet itself. The Internet has simplified access to publications, some of which date to the seventeenth century. For example, Project Gutenberg provided me with the 1637 publication _A Discourse on Method_ by René Descartes. The world's oldest peer-reviewed journal, _Philosophical Transactions of the Royal Society_, published since 1665, was made available online in 2011. My thanks are therefore also due to the Royal Society for putting within my easy reach the original works of Isaac Newton, Benjamin Franklin, William Thomson, and Guglielmo Marconi, among many others.

One of the most important journals for my research has been the _Bell System Technical Journal_, which incidentally contains Shannon's seminal paper of 1948. During my student days in the early nineties, I had to borrow these journals in printed form from the libraries. Many issues of the journal were often bundled together as bulky leather-bound books. Bell Labs in its original form no longer exists, but thanks to its current parent, Alcatel-Lucent, the entire journal collection from 1922 to 1983 is now available online. Without such online access, my research would have either fallen short of my own expectations or taken a lot longer.

My thanks are due to the _eResources_ of the National Library Board of Singapore. My membership with the library greatly eased the job of finding relevant papers. Thanks are also due to the libraries of the National University of Singapore, which provided me with a visiting membership for the duration of my stay in Singapore. Though the campus has changed quite a lot since my student days almost two decades ago, campus food, music concerts, and movie screenings brought back fond memories. These are memories as yet impossible to recreate in a virtual world of just ones and zeros.

# Notes

#  Bibliography

Abbott, D., Davis, B. R., Phillips, N. J., and Eshraghian, K. (1996). "Simple Derivation of the Thermal Noise Formula Using Window-Limited Fourier Transforms and Other Conundrums," IEEE Trans. on Education, vol. 39, no. 1, pp. 1-13, February.

Abramson, N. (1963). Information Theory and Coding, New York: McGraw-Hill.

Abramson, N. (1970). "The ALOHA System—Another Alternative for Computer Communications," Tech. Report B70-1, AFOSR, April.

Agassi, Joseph. (1967a). "Ampère's Discovery," History and Theory, Beiheft 2: Towards an Historiography of Science, pp. 20-23, Wesleyan University Press.

Agassi, Joseph. (1967b). "Oersted's Discovery," History and Theory, Beiheft 2: Towards an Historiography of Science, pp. 67-74, Wesleyan University Press.

Ahmed, N., Natarajan, T., and Rao, K. R. (1974). "Discrete Cosine Transform," IEEE Trans. on Computers, vol. 23, no. 1, pp. 90-93, January.

Alexander, A. A., Gryb, R. M., and Nast, D. W. (1960). "Capabilities of the Telephone Network for Data Transmission," Bell Syst. Tech. J., vol. 39, no. 3, pp. 431-476, May.

Allen, Paul. (2012). Idea Man: A Memoir by the Cofounder of Microsoft, New York: Penguin.

Ampère, A. M. (1820). "Mémoire présenté à l'Académie royale des Sciences, le 2 octobre 1820, où se trouve compris le résumé de ce qui avait été lu à la même Académie les 18 et 25 septembre 1820, sur les effets des courans électriques," Annales de chimie et de physique, vol. 15, pp. 59-74, 170-218.

Anderson, J. B. (2005). Digital Transmission Engineering, Second edition, IEEE Press.

Anderson, John B. and Johannesson, R. (2005). Understanding Information Transmission, IEEE Press.

Andrews, E. G. (1963). "Telephone Switching and the Early Bell Laboratories Computers," Bell Syst. Tech. J., vol. 42, no. 2, pp. 341-353, March.

Andrews, F. T. (1989). "The heritage of telegraphy," IEEE Comms. Magazine, vol. 27, no. 8, pp. 12-18, August.

Andrews, F. T. (2011). "Early T-carrier history," IEEE Comms. Magazine, vol. 49, no. 4, pp. 12-17, April.

Ante, Spencer E. (2010). "IBM Turns to Software," Wall Street Journal, May.  http://online.wsj.com/article/SB10001424052748703339304575240213531758410.html. Accessed February 10, 2013.

Armstrong, E. H. (1913). "Wireless Receiving System," US Patent 1,113,149, filed October 1913, granted October 1914.

Armstrong, E. H. (1924). "The Super-Heterodyne—Its Origin, Development, and Some Recent Improvements," Proc. IRE, vol. 12, no. 5, pp. 539-552, October.

Armstrong, E. H. (1936). "A Method of Reducing Disturbances in Radio Signaling by a System of Frequency Modulation," Proc. IRE, vol. 24, no. 5, pp. 689-740, May.

Ata, Bahri. (1997). The Transfer of Telegraph Technology to the Ottoman Empire in the XIXth Century, Boğaziçi University.

Atal, B. S. and Schroeder, M. R. (1970). "Adaptive Predictive Coding of Speech Signals," Bell Syst. Tech. J., vol. 49, no. 8, pp. 1973-1986, October.

Babbage, Charles. (1864). Passages from the Life of a Philosopher. http://www.fourmilab.ch/babbage/lpae.html. Accessed November 24, 2012.

Baker, W. O. (1977). "Language and Logic with Electronics," Proceedings of the American Philos. Society, vol. 121, no. 5, pp. 360-372, American Philosophical Society, October.

Ball, W. W. Rouse. (2010). A Short Account of the History of Mathematics, Project Gutenberg, May.

Baro, A. V. (2003). Interview: Paul Baran-The Beginnings of Packet Switching, pp. 28-32, Stanford, September 27.

Bassett, Ross Knox. (2002). To the Digital Age-Research Labs, Start-up Companies, and the Rise of MOS Technology, Baltimore: Johns Hopkins University Press.

Beauchamp, Christopher. (2010). "Who Invented the Telephone?: Lawyers, Patents, and the Judgments of History," Technology & Culture, vol. 51, no. 4, pp. 854-878, Johns Hopkins University Press, October.

Beaulieu, Y. (2008). "Peirce's Contribution to American Cryptography," Trans. of the Charles S. Peirce Society, vol. 44, no. 2, pp. 263-287, Spring.

Bektas, Y. (2000). "The Sultan's Messenger: Cultural Constructions of Ottoman Telegraphy, 1847-1880," Technology & Culture, vol. 41, no. 4, pp. 669-696, Society for the History of Technology.

Bell, Eric Temple. (1986). Men of Mathematics, Touchstone edition, New York: Simon & Schuster, Inc.

Bell, T. E. (2006). "The Quiet Genius: Andrew J. Viterbi," The Bent of Tau Beta Pi, pp. 17-21, Spring.

Bellos, Alex. (2010). Alex's Adventures in Numberland, London: Bloomsbury.

Bennett, John J. (1866). The Miscellaneous Botanical Works of Robert Brown, vol. 1, Robert Hardwicke for the Ray Society.

Bennett, W. R. (1948). "Spectra of Quantized Signals," Bell Syst. Tech. J., vol. 27, no. 3, pp. 446-472, July.

Bennett, W. R. and Rice, S. O. (1963). "Spectral Density and Autocorrelation Functions Associated with Binary Frequency-Shift Keying," Bell Syst. Tech. J., vol. 42, no. 5, pp. 2355-2385, September.

Biglieri, Ezio. (2005). Coding for Wireless Channels, New York: Springer.

Bischoff, Dr. (1802). "On Galvanism and its Medical Applications," The Medical and Physical Journal, vol. 7, pp. 529-540, London: R. Phillips, January-June.

Black, H. S. (1934). "Stabilized Feedback Amplifiers," Bell Syst. Tech. J., vol. 13, no. 1, pp. 1-18, January.

Black, H. S. (1977). "Inventing the Negative Feedback Amplifier," IEEE Spectrum, vol. 14, pp. 54-60, December.

Blanchard, Julian. (1941). "The History of Electrical Resonance," Bell Syst. Tech. J., vol. 20, no. 4, pp. 415-433, October.

Blumlein, A. D. (1931). "Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing Systems," British Patent 394,325, filed December 1931, granted June 1933.

Boehm, Barry. (2006). "A View of 20th and 21st Century Software Engineering," ICSE'06, Shanghai, China, pp. 12-29, May 20–28.

Boehm, Sharla P. and Baran, Paul. (1964). "On Distributed Communications: II. Digital Simulation of Hot-Potato Routing in a Broadband Distributed Communications Network," Memorandum RM-3103-PR, Santa Monica: Rand Corporation, August.

Boone, J. V. and Peterson, R. R. (2009). Sigsaly-The Start of the Digital Revolution, National Security Agency, January.  http://www.nsa.gov/about/cryptologic_heritage/center_crypt_history/publications/sigsaly_start_digital.shtml. Accessed February 1, 2013.

Bordeau, Sanford P. (1982). Volts to Hertz. The Rise of Electricity, Burgess Publishing Company.

Bowers, B. (2001). "Cooke and Wheatstone, and Morse: a Comparative View," IEEE St. John's Conference on the History of Telecommunications (CHT2001), July 25-27.

Brady, P. T. (1965). "A Technique for Investigating On-Off Patterns of Speech," Bell Syst. Tech. J., vol. 44, no. 1, pp. 1-22, January.

Brady, P. T. (1969). "A Model for Generating On-Off Speech Patterns in Two-Way Conversations," Bell Syst. Tech. J., vol. 48, no. 7, pp. 2445-2472, September.

Brand, T. E. and Sherlock, A. J. (1973). "VK' + V'K? Venn Diagrams or Karnaugh Maps but Not Both. Part 2," Mathematics in School, vol. 2, no. 1, pp. 4-7, January.

Braver, E. R., Lund, A. K., and McCartt, A. T. (2009). Review of "Real-World Personal Conversations Using a Hands-Free Embedded Wireless Device While Driving: Effect On Airbag Deployment Crash Rates" by Richard A. Young and Christopher Schreiner, Arlington, VA: Insurance Institute for Highway Safety, March. http://www.iihs.org/research/topics/pdf/r1120.pdf. Accessed April 18, 2013.

Breen, C. and Dahlbom, C. A. (1960). "Signaling Systems for Control of Telephone Switching," Bell Syst. Tech. J., vol. 39, no. 6, pp. 1381-1444, November.

Bricklin, Dan. (2003). Why We Don't Need QOS: Trains, Cars, and Internet Quality of Service, July 30. http://www.bricklin.com/qos.htm. Accessed March 15, 2013.

Brillouin, L. (1950). "Thermodynamics and Information Theory," American Scientist, vol. 38, no. 4, pp. 594-599, October.

Brittain, James E. (1970). "The Introduction of the Loading Coil: George A. Campbell and Michael I. Pupin," Technology & Culture, vol. 11, no. 1, pp. 36-57, Society for the History of Technology, January.

Brown, James Robert. (1992). "Why Empiricism Won't Work," Proceedings of the Biennial Meeting of the Philosophy of Science Association, pp. 271-279.

Brown, Ralph. (1927). "Transatlantic Radio Telephony," Bell Syst. Tech. J., vol. 6, no. 2, pp. 248-257, April.

Browne, T. E., Wadsworth, D. J., and York, R. K. (1969). "New Time Division Switch Units for No. 101 ESS," Bell Syst. Tech. J., vol. 48, no. 2, pp. 443-476, February.

Brunetti, C. and Curtis, R. W. (1947). "Printed Circuit Techniques," National Bureau of Standards Circular 468, United States Department of Commerce, November.

Brush, Stephen G. (1968). "A History of Random Processes: I. Brownian Movement from Brown to Perrin," Archive for History of Exact Sciences, vol. 5, no. 1, pp. 1-36, Springer, August.

Buckley, Oliver E. (1952). Frank Baldwin Jewett 1879-1949: A Biographical Memoir, National Academy of Sciences.

Bullington, K. and Fraser, J. M. (1959). "Engineering Aspects of TASI," Bell Syst. Tech. J., vol. 38, no. 2, pp. 353-364, March.

Bush, Vannevar. (1945a). "As We May Think," The Atlantic Monthly, July.  http://www.theatlantic.com/magazine/print/1945/07/as-we-may-think/303881. Accessed April 14, 2013.

Bush, Vannevar. (1945b). Science the Endless Frontier: A Report to the President, July.

Butler, Joy R. (1992). "HDTV Demystified: History, Regulatory Options, & The Role of Telephone Companies," Harvard J. of Law & Technology, vol. 6, pp. 155-182, Fall.

Butzer, P. L. and Stens, R. L. (1992). "Sampling Theory for Not Necessarily Band-Limited Functions: A Historical Overview," SIAM Review, vol. 34, no. 1, pp. 40-53, Society for Industrial and Applied Mathematics, March.

Campanella, Angelo J. (2008). Antonio Meucci, The Speaking Telegraph and the First Telephone, July. http://files.meetup.com/1004848/MeucciMarch07.pdf. Accessed February 11, 2013.

Campbell, L. and Garnett, W. (1882). The Life of James Clerk Maxwell, London: Macmillan.

Campbell-Kelly, M. and Aspray, W. (2004). Computer: A History of the Information Machine, Second edition, Boulder, Colorado: Westview Press.

Caneva, Kenneth L. (1980). "Ampère, the Etherians, and the Oersted Connexion," The British J. for the History of Science, vol. 13, no. 2, pp. 121-138, British Society for the History of Science, July.

Carrington, John F. (1949). The Talking Drums of Africa, London: Carey Kingsgate.

Carson, John R. (1928). "The Reduction of Atmospheric Disturbances," Proc. IRE, vol. 16, no. 7, pp. 966-975, July.

Carson, John R. and Fry, Thornton C. (1937). "Variable Frequency Circuit Theory with Application to the Theory of Frequency-Modulation," Bell Syst. Tech. J., vol. 16, no. 4, pp. 513-540, October.

Carus, Paul. (1896). "Chinese Philosophy," The Monist, vol. 6, no. 2, pp. 188-249, Hegeler Institute, January.

Catania, B. (1994). Antonio Meucci: L'inventore e il suo tempo, vol. 1, Rome: Edizioni Seat.

Catania, B. (1996). Antonio Meucci: L'inventore e il suo tempo, vol. 2, Turin: Edizioni Seat.

Ceruzzi, P. E. (2003). A History of Modern Computing, Second edition, Cambridge, MA: MIT Press.

Chang, F., Onohara, K., and Mizuochi, T. (2010). "Forward Error Correction for 100 G Transport Networks," IEEE Comms. Magazine, vol. 48, no. 3, pp. S48-S55, March.

Chapuis, R. J. and Joel, A. E. (1990). Electronics, Computers and Telephone Switching, North-Holland Publishing Company.

Chatfield, Tom. (2011). 50 Digital Ideas You Really Need to Know, London: Quercus Publishing.

Chien, R. T. (1964). "Cyclic Decoding Procedures for Bose-Chaudhuri-Hocquenghem Codes," IEEE Trans. on Information Theory, vol. 10, no. 4, pp. 357-363, October.

Chor, B. and Rivest, R. L. (1985). "A knapsack type public key cryptosystem based on arithmetic in finite fields," Advances in Cryptology: Proceedings of Crypto '84, pp. 54-65, Berlin: Springer-Verlag.

Christensen, Chris. (2007). "Polish Mathematicians Finding Patterns in Enigma Messages," Mathematics Magazine, vol. 80, no. 4, pp. 247-273, Mathematical Association of America, October.

Clark, A. B. (1923). "Telephone Transmission Over Long Cable Circuits," Bell Syst. Tech. J., vol. 2, no. 1, pp. 67-94, January.

Clarke, Arthur C. (1945a). "Peacetime Uses for V2," Wireless World, pp. 58, February.

Clarke, Arthur C. (1945b). "Extra-Terrestrial Relays," Wireless World, pp. 305-308, October.

Clausius, R. (1859). "The Mean Path of Molecules," Philos. Mag, vol. 18, pp. 81-91.

Cohen, L. (2005). "The history of noise [on the 100th anniversary of its birth]," IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 20-45, November.

Colburn, Robert. (2004). "Oral-History: Jacob Ziv," Interview #437 for the IEEE History Center, IEEE, March 25. http://www.ieeeghn.org/wiki/index.php/Oral-History:Jacob_Ziv. Accessed November 17, 2012.

Constable, George and Somerville, Bob. (2003). A Century of Innovation: Twenty Engineering Achievements That Transformed Our Lives, Joseph Henry Press.

Cooley, J. W. (1987). "The Re-Discovery of the Fast Fourier Transform Algorithm," Mikrochimica Acta, vol. III, pp. 33-45, Springer-Verlag.

Cooley, J. W., Lewis, P. A. W., and Welch, P. D. (1967). "Historical Notes on the Fast Fourier Transform," IEEE Trans. Audio and Electroacoust, vol. 15, pp. 76-79, June.

Cooper, Charles. (2012). "Microsoft Windows Phone Store up to 120,000 apps," CNET News, October 29.  http://news.cnet.com/8301-10805_3-57542096-75/microsoft-windows-phone-store-up-to-120000-apps/. Accessed April 16, 2013.

Cooper, Mark N. (2004). Open Architecture as Communications Policy, Stanford Law School.

Cordeschi, Roberto. (2002). The Discovery of the Artificial: Behavior, Mind, and Machines Before and Beyond Cybernetics, Dordrecht, Netherlands: Springer.

Costello, D. J. and Forney, G. D. (2007). "Channel Coding: The Road to Channel Capacity," Proc. IEEE, vol. 95, no. 6, pp. 1150-1177, June.

Craft, E. B., Morehouse, L. F., and Charlesworth, H. P. (1923). "Machine Switching Telephone System for Large Metropolitan Areas," Bell Syst. Tech. J., vol. 2, no. 2, pp. 53-89, April.

Cropper, William H. (1988). "James Joule's Work in Electrochemistry and the Emergence of the First Law of Thermodynamics," Historical Studies in the Physical and Biological Sciences, vol. 19, no. 1, pp. 1-15, University of California Press.

Croxall, L. M. and Stone, R. E. (1978). "Common Channel Interoffice Signaling: No. 4 ESS Application," Bell Syst. Tech. J., vol. 57, no. 2, pp. 361-377, February.

Cutler, C. C. (1950a). "Differential Quantization of Communication Signals," US Patent 2,605,361, filed June 1950, granted July 1952.

Cutler, C. C. (1950b). "Quantized Transmission with Variable Quanta," US Patent 2,724,740, filed June 1950, granted 1955.

Dam, K. W. and Lin, H. S. (Ed.). (1996). "Cryptography's Role in Securing the Information Society," Committee to Study National Cryptography Policy, National Research Council, National Academy of Sciences.

Darrigol, O. (2007). "The acoustic origins of harmonic analysis," Archive for History of Exact Sciences, vol. 61, no. 4, pp. 343-424, Springer, July.

Darrow, K. K. (1942). "Entropy," Bell Syst. Tech. J., vol. 21, no. 1, pp. 51-74, June.

Darwin, Charles. (1893). Charles Darwin: His Life Told in an Autobiographical Chapter and in a Selected Series of His Published Letters, Ed. by Francis Darwin, New York: D. Appleton and Company.

Davidow, William. (2012). "Our Tools Are Using Us," IEEE Spectrum, vol. 49, no. 8, pp. 44-48, August.

Davies, D. W. (2001). "An Historical Study of the Beginnings of Packet Switching," British Comp. Soc. J., vol. 44, no. 3, pp. 152-162.

Davisson, L. D. (1973). "Universal Noiseless Coding," IEEE Trans. on Information Theory, vol. 19, no. 6, pp. 783-795, November.

Deloraine, E. M. (1956). "Pulse Techniques with Particular Reference to Line and Radio Communication," Electrical Communication, vol. 33, no. 3, pp. 183-194, September.

Diffie, W. and Hellman, M. E. (1976). "New Directions in Cryptography," IEEE Trans. Inform. Theory, vol. 22, pp. 644-654, November.

Diffie, W. and Hellman, M. E. (1977). "Exhaustive Cryptanalysis of the NBS Data Encryption Standard," Computer, vol. 10, no. 6, pp. 74-84, June.

Dijkstra, E. (1968). "Go To Statement Considered Harmful," Comms. of the ACM, vol. 11, no. 3, pp. 147-148, March.

Dilhac, J. M. (2009). "Edouard Branly, the Coherer, and the Branly effect," IEEE Comms. Magazine, vol. 47, no. 9, pp. 20-26, September.

Dony, R. D. (2001). "Karhunen-Loève Transform," In The Transform and Data Compression Handbook, Ed. by K. R. Rao and P. C. Yip, CRC Press LLC.

du Sautoy, Marcus. (2008). Symmetry: A Journey into the Patterns of Nature, New York: HarperCollins.

Dyer, J. W. (1980). "Pioneer Saturn," Science, New Series, vol. 207, no. 4429, pp. 400-401, American Association for the Advancement of Science, January.

Eccles, W. H. and Jordan, F. W. (1918). "Improvements in Ionic Relays," GB Patent 148,582, filed June 1918, granted August 1920.

Einstein, A. (1954). Ideas and Opinions, Crown Publishers.

Einstein, A. (1956). Investigations on the Theory of the Brownian Movement, Transl. A. D. Cowper, Dover Publications.

Ekspong, G. (Ed.). (2002). Nobel Lectures, Physics 1996-2000, Singapore: World Scientific.

Erdman, William W. (1993). "Wireless Communications: A Decade of Progress," IEEE Comms. Magazine, pp. 48-51, December.

Espenschied, Lloyd. (1922). "Application of Radio to Wire Transmission Engineering," Bell Syst. Tech. J., vol. 1, no. 2, pp. 117-141, October.

Evenson, A. Edward. (2000). The Telephone Patent Conspiracy of 1876, North Carolina: Mcfarland.

Eyre, Jennifer and Bier, Jeff. (2000). "The Evolution of DSP Processors," White Paper, Berkeley Design Technology, Inc.

Fahie, J. J. (1884). A history of electric telegraphy, to the year 1837, E. & F. N. Spon.

Fahie, J. J. (1899). "Prof. D.E. Hughes's Researches in Wireless Telegraphy," The Electrician, pp. 40-41, May 5.

Falconer, D. (2011). "A history of electric telegraphy, to the year 1837," IEEE Comms. Magazine, vol. 49, no. 10, pp. 42-50, October.

Field, Alexander J. (1994). "French Optical Telegraphy, 1793-1855: Hardware, Software, Administration," Technology & Culture, vol. 35, no. 2, pp. 315-347, Society for the History of Technology, April.

Fluhrer, S., Mantin, I., and Shamir, A. (2001). "Weaknesses in the key scheduling algorithm of RC4," In Selected Areas in Cryptography, Lecture Notes in Computer Science, Serge Vaudenay and Amr M. Youssef, ed, vol. 2259, pp. 1-24, Springer.

Fourier, Joseph. (1878). The Analytical Theory of Heat, Transl. by Alexander Freeman, London: Cambridge University Press.

Frenkiel, R. (2010). "Creating cellular: A history of the AMPS project (1971-1983)," IEEE Comms. Magazine, vol. 48, no. 9, pp. 14-24, September.

Frenkiel, Richard H. (2009). Cellular Dreams and Cordless Nightmares: Life at Bell Laboratories in Interesting Times. http://www.winlab.rutgers.edu/~frenkiel/dreams.

Fry, Thornton C. (1925). "The Theory of the Schroteffekt," J. Franklin Inst, vol. 99, pp. 203-320.

Galambos, L. (1992). "Theodore N. Vail and the Role of Innovation in the Modern Bell System," The Business History Review, vol. 66, no. 1, pp. 95-126, President and Fellows of Harvard College, Spring.

Gallager, Robert G. (2008). Peter Elias 1923-2001: A Biographical Memoir, National Academy of Sciences.

Garber, E. W. (1970). "Clausius and Maxwell's Kinetic Theory of Gases," Historical Studies in the Physical Sciences, vol. 2, pp. 299-319, University of California Press.

Gardner, Martin. (1977). "Mathematical Games-A New Kind of Cipher That Would Take Millions of Years to Break," Scientific American, pp. 120-124.

Gass, Frederick. (1986). "Solving a Jules Verne Cryptogram," Mathematics Magazine, vol. 59, no. 1, pp. 3-11, Mathematical Association of America, February.

Gay, Joshua. (2002). Free Software, Free Society: Selected Essays of Richard M. Stallman, GNU Press.

Glaser, Anton. (1971). History of Binary and Other Nondecimal Numeration, Tomash Publishers. http://www.eipiphiny.org/books/history-of-binary.pdf. Accessed December 4, 2012.

Gleick, J. (2011). The Information, London: Fourth Estate.

Golay, M. J. E. (1949). "Notes on Digital Coding," Proc. IRE, vol. 37, pp. 657, June.

Goldstein, Andrew. (1992). "Oral-History: John Pierce," Interview #141 for the IEEE History Center, IEEE, August 19-21. http://www.ieeeghn.org/wiki/index.php/Oral-History:John_Pierce. Accessed April 14, 2013.

Goldstein, Andrew. (1995). "Oral-History: David Forney," Interview #254 for the IEEE History Center, IEEE, May 10.  http://www.ieeeghn.org/wiki/index.php/Oral-History:G._David_Forney. Accessed April 14, 2013.

Goldstein, Rebecca N. (2010). "What's in a Name?," Chapter 5 in Seeing Further: The Story of Science and the Royal Society, Ed. by Bill Bryson, London: Harper Press.

Golomb, S. W. (1966). "Run-Length Encodings," IEEE Trans. on Information Theory, pp. 399-401, July.

Goodall, W. M. (1947). "Telephony by Pulse Code Modulation," Bell Syst. Tech. J., vol. 26, no. 3, pp. 395-409, July.

Gorman, Paul A. (1969). "Century One. Prologue," Newcomen Address, Newcomen Society, April 17.

Goth, Greg (Ed.). (2008). "Software Leaders Cast Their Votes," IEEE Software, vol. 25, no. 6, pp. 7, 9, 11, 13, November-December.

Gray, A. (1921). Absolute Measurements in Electricity and Magnetism, Second edition, Macmillan.

Gray, F. (1947). "Pulse Code Modulation," US Patent 2,632,058, filed November 1947, granted March 1953.

Gray, Robert M. (2005). "The 1974 origins of VoIP," IEEE Signal Processing Magazine, vol. 22, no. 4, pp. 87-90, July.

Gray, Robert M. (2009a). "A Survey of Linear Predictive Coding: Part I of Linear Predictive Coding and the Internet Protocol," Foundations and Trends in Signal Processing, vol. 3, no. 3, pp. 153-202, now publishers.

Gray, Robert M. (2009b). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol," Foundations and Trends in Signal Processing, vol. 3, no. 4, pp. 203-303, now publishers.

Grieg, D. D., Metzger, S., and Waer, R. (1948). "Considerations of Moon-Relay Communication," Proc. IRE, vol. 36, no. 5, pp. 652-663, May.

Guizzo, E. M. (2003). The Essential Message: Claude Shannon and the Making of Information Theory, MIT, September.

Haigh, T. (2002). "Software in the 1960s as Concept, Service, and Product," IEEE Annals of the History of Computing, pp. 5-13, March.

Haigh, Thomas. (2010). "Crisis, What Crisis? Reconsidering the Software Crisis of the 1960s and the Origins of Software Engineering. Draft Version," To be presented at 2nd Inventing Europe/Tensions Of Europe Conference, Sofia, University of Wisconsin-Milwaukee, June.

Hall, Charles F. (1975). "Pioneer 10 and Pioneer 11," Science, New Series, vol. 188, no. 4187, pp. 445-446, American Association for the Advancement of Science, May.

Hamming, R. W. (1950). "Error Detecting and Error Correcting Codes," Bell Syst. Tech. J., vol. 29, no. 2, pp. 147-160, April.

Hartley, R. V. L. (1928). "Transmission of Information," Bell Syst. Tech. J., vol. 7, no. 3, pp. 535-563, July.

Hayes, J. (2008a). "Telecommunication Memories: 75, 50, and 25 Years Ago," IEEE Comms. Magazine, vol. 46, no. 8, pp. 26-29, August.

Hayes, J. (2008b). "A history of transatlantic cables," IEEE Comms. Magazine, vol. 46, no. 9, pp. 42-48, September.

Haykin, S. (2001). Communication Systems, Fourth edition, New York: John Wiley & Sons.

Heideman, M. T., Johnson, D. H., and Burrus, C. S. (1985). Gauss and the History of the Fast Fourier Transform, Communicated by C. Truesdell, pp. 265-277.

Heising, R. A. (1925). "Production of Single Sideband for Trans-Atlantic Radio Telephony," Proc. IRE, vol. 13, no. 3, pp. 291-312, June.

Hellman, Martin E. (1978). "An Overview of Public Key Cryptography," IEEE Comms. Magazine, vol. 16, no. 6, pp. 24-32, November.

Henry, Joseph. (1831). "On the application of the principle of the galvanic multiplier to electro-magnetic apparatus, and also to the developement of great magnetic power in soft Iron, with a small galvanic element," American J. of Science & Arts, pp. 400-408, April.

Hertz, Heinrich. (1893). Electric Waves, Transl. D. E. Jones, New York: Macmillan and Co.

Hertz, Heinrich. (1896). Miscellaneous Papers, Translated by D.E. Jones and G.A. Schott, New York: Macmillan & Co.

Hilbert, M. and Lopez, P. (2011). "The World's Technological Capacity to Store, Communicate, and Compute Information," Science, vol. 332, pp. 60-65, April.

Hill, Lester S. (1929). "Cryptography in an Algebraic Alphabet," The American Mathematical Monthly, vol. 36, no. 6, pp. 306-312, Mathematical Association of America, June-July.

Hoare, C. A. R. (1961). "Algorithm 64: Quicksort," Comms. of the ACM, vol. 4, no. 7, pp. 321, July.

Hochfelder, David. (2010). "Two controversies in the early history of the telegraph," IEEE Comms. Magazine, vol. 48, no. 2, pp. 28-32, February.

Hochfelder, David. (1999a). "Oral-History: Vinton Cerf," Interview #355 for the IEEE History Center, IEEE, May 17.

Hochfelder, David. (1999b). "Oral-History: Robert Lucky," Interview #361 for the IEEE History Center, IEEE, September 10.

Hoddeson, Lillian. (1981). "The Discovery of the Point-Contact Transistor," Historical Studies in the Physical Sciences, vol. 12, no. 1, pp. 41-76, University of California Press.

Hollingdale, S. H. and Tootill, G. C. (1965). Electronic Computers, Penguin Books.

Holvast, J. (2009). "History of Privacy," In The Future of Identity. V. Matyas et al. (Ed.), IFIP AICT 298, pp. 13-42, International Federation for Information Processing.

Holzmann, G. J. and Pehrson, B. (1995). The Early History of Data Networks, IEEE Computer Society Press.

Hong, Sungook. (1994). "Marconi and the Maxwellians: The Origins of Wireless Telegraphy Revisited," Technology & Culture, vol. 35, no. 4, pp. 717-749, Society for the History of Technology, October.

Hooke, Robert. (1726). "Shewing a way how to communicate one's mind at great distances," Discourse to the Royal Society, May 21, 1684, in Philosophical experiments and observations of Robert Hooke and other eminent virtuoso's in his time, Robert Hooke, pp. 142-150, London: William Derham, Innys, May 21.

Hoover, C. W., Staehler, R. E., and Ketchledge, R. W. (1958). "Fundamental Concepts in the Design of the Flying Spot Store," Bell Syst. Tech. J., vol. 37, no. 5, pp. 1161-1194, September.

Horton, A. W. and Vaughan, H. E. (1955). "Transmission of Digital Information over Telephone Circuits," Bell Syst. Tech. J., vol. 34, no. 3, pp. 511-528, May.

Huffman, D. A. (1952). "A Method for the Construction of Minimum-Redundancy Codes," Proc. IRE, vol. 40, no. 9, pp. 1098-1101, September.

Huffman, D. A. (1954a). "The Synthesis of Sequential Switching Circuits: Part I," J. of the Franklin Institute, vol. 257, no. 3, pp. 161-190, March.

Huffman, D. A. (1954b). "The Synthesis of Sequential Switching Circuits: Part II," J. of the Franklin Institute, vol. 257, no. 4, pp. 275-303, April.

Hutchinson, Jamie. (2001). "Culture, Communication, and an Information Age Madonna," IEEE Professional Communication Society Newsletter, vol. 45, no. 3, pp. 1, 5-7, May-June.

Huurdeman, Anton A. (2003). The Worldwide History of Telecommunications, Wiley-IEEE Press.

Ingerman, Peter Z. (1967). "Panini-Backus Form Suggested," Comms. of the ACM, vol. 10, no. 3, pp. 137, March.

Ismail, I. A., Mohammed, A., and Hossam, D. (2006). "How to repair the Hill cipher," J. of Zhejiang University SCIENCE A, vol. 7, no. 12, pp. 2022-2030.

Itakura, F. and Saito, S. (1968). "Analysis Synthesis Telephony based on the Maximum Likelihood Method," Proceedings of the Sixth International Congress on Acoustics. Tokyo. Paper C-5-5, pp. 17-20.

Jackson, Dugald C. (1938). "Michael Idvorsky Pupin (1858-1935)," Proceedings of the American Academy of Arts and Sciences, vol. 72, no. 10.

Jansky, Karl. (1933). "Electrical Disturbances Apparently of Extraterrestrial Origin," Proc. IRE, vol. 21, no. 10, pp. 1387-1398, October.

Jayant, Nikil. (1993). "High Quality Networking of Audio-Visual Information," IEEE Comms. Magazine, vol. 31, no. 9, pp. 84-95, September.

Jenkin, F. (1862). "Experimental Researches on the Transmission of Electric Signals Through Submarine Cables. Part I. Laws of Transmission through Various Lengths of One Cable," Philos. Trans. of the Royal Society of London, vol. 152, pp. 987-1017, Royal Society.

Joel, A. (2002). "Telecommunications and the IEEE Communications Society," IEEE Comms. Magazine, vol. 40, no. 5, pp. 6-8, 10, 12, 14, 162-167, May.

Joel, A. E. (1956a). "An Experimental Remote Controlled Line Concentrator," Bell Syst. Tech. J., vol. 35, no. 2, pp. 249-293, March.

Joel, A. E. (1956b). "Electronics in Telephone Switching Systems," Bell Syst. Tech. J., vol. 35, no. 5, pp. 991-1018, September.

Joel, A. E. (1984). "The Past 100 Years in Telecommunications Switching," IEEE Comms. Magazine, vol. 22, no. 5, pp. 64-70, May.

Johnson, J. B. (1928). "Thermal agitation of electricity in conductors," Phys. Rev, vol. 32, pp. 97-109.

Johnson, J. B. (1971). "Electronic noise: the first two decades," IEEE Spectrum, vol. 8, pp. 42-46, February.

Johnson, M. E. (1994). "The Uncertain Future of Computer Software Users' Rights in the Aftermath of MAI Systems," Duke Law Journal, vol. 44, pp. 327-356.

Kahane, J. P. and Rieusset, P. G. L. (1995). Fourier Series and Wavelets. Part I-Fourier Series, Luxembourg: Gordon and Breach. http://portail.mathdoc.fr/PMO/PDF/K_KAHANE-70.pdf. Accessed March 26, 2013.

Kahn, David. (1967). The Codebreakers: The Story of Secret Writing, New York: Macmillan.

Kahn, David. (1979). "Cryptology Goes Public," Foreign Affairs, vol. 58, no. 1, pp. 141-159, Council on Foreign Relations, Fall.

Kahn, David. (1980a). "On the Origin of Polyalphabetic Substitution," Isis, vol. 71, no. 1, pp. 122-127, History of Science Society, March.

Kahn, David. (1980b). "Codebreaking in World Wars I and II: The Major Successes and Failures, Their Causes and Their Effects," The Historical J., vol. 23, no. 3, pp. 617-639, Cambridge University Press, September.

Kay, Alan C. (1993). "The Early History of Smalltalk," ACM SIGPLAN Notices, vol. 28, no. 3, pp. 69-95, March.

Keister, W., Ketchledge, R. W., and Vaughan, H. E. (1964). "No. 1 ESS System Organization and Objectives," Bell Syst. Tech. J., vol. 43, no. 5, pp. 1831-1844, September.

Kemper, John D. (1967). The Engineer and His Profession, Holt, Rinehart & Winston.

Kerckhoffs, Auguste. (1883). "La Cryptographie Militaire," Journal des Sciences Militaires, vol. IX, pp. 5-38, 161-191, January-February.

Kernighan, Brian. (2008). "Sometimes the Old Ways are Best," IEEE Software, vol. 25, no. 6, pp. 18-19, November-December.

Ketterling, Hans-Peter A. (2004). Introduction to Digital Professional Mobile Radio, Norwood, MA: Artech House.

Kilby, Jack S. (1967). "Invention of the Integrated Circuit," IEEE Trans. Electron Devices, vol. 23, no. 7, pp. 648-654.

Kirstein, P. (2009). "The early history of packet switching in the UK," IEEE Comms. Magazine, vol. 47, no. 2, pp. 18-26, February.

Klatt, D. H. (1987). "Review of text-to-speech conversion for English," J. of Acoustical Society of America, September.

Kleiner, I. (2001). "History of the Infinitely Small and the Infinitely Large in Calculus," Educational Studies in Mathematics, vol. 48, no. 2/3, pp. 137-174, Springer.

Kline, M. (1972). Mathematical Thought from Ancient to Modern Times, Oxford University Press.

Kline, Ronald. (1993). "Harold Black and the Negative-Feedback Amplifier," IEEE Control Systems, pp. 82-85, August.

Kolmogorov, A. N. (1950). Foundations of the Theory of Probability, Transl. by N. Morrison from Grundbegriffe der Wahrscheinlichkeitsrechnung (1933), New York: Chelsea.

Kotelnikov, V. A. (1947). The Theory of Optimum Noise Immunity, PhD Thesis, Molotov Energy Institute, Moscow; Transl. by R. A. Silverman, McGraw-Hill.

Kraft, L. G. (1949). A Device for Quantizing, Grouping and Coding Amplitude Modulated Pulses, MS Thesis, Electrical Engineering Dept., M.I.T.

Kucar, Andy D. (1991). "Mobile Radio: An Overview," IEEE Comms. Magazine, pp. 72-85, November.

Langdon, G. G. (1984). "An Introduction to Arithmetic Coding," IBM J. Res. Develop, vol. 28, no. 2, pp. 135-149, March.

Laplace, P. S. (1902). A Philosophical Essay on Probabilities, Trans. from the sixth edition by F. W. Truscott and F. L. Emory, John Wiley & Sons.

Lebert, M. (2008). Project Gutenberg (1971-2008), Ebook #27045, Project Gutenberg, October.

Lee, Jan. (1998). "Richard Wesley Hamming: 1915-1998," IEEE Annals of the History of Computing, vol. 20, no. 2, pp. 60-62, April-June.

Levy, Markus. (2005). "The History of The ARM Architecture: From Inception to IPO," ARM IQ, vol. 4, no. 1, pp. 14-19, March.

Levy, Steven. (2012). "Google Throws Open Doors to Its Top-Secret Data Center," Wired, October 17.  http://www.wired.com/wiredenterprise/2012/10/ff-inside-google-data-center/all/. Accessed April 22, 2013.

Lewis, E. E. (2004). Masterworks of Technology-The Story of Creative Engineering, Architecture and Design, New York: Prometheus Books.

Llewellyn, F. B. (1930). "A Study of Noise in Vacuum Tubes and Attached Circuits," Proc. IRE, vol. 18, no. 2, pp. 243-265, February.

Lodge, Oliver. (1931). "A Retrospect of Wireless Communication," The Scientific Monthly, vol. 33, no. 6, pp. 512-521, American Association for the Advancement of Science, December.

Luciano, Dennis and Prichett, Gordon. (1987). "Cryptology: From Caesar Ciphers to Public-Key Cryptosystems," The College Mathematics J., vol. 18, no. 1, pp. 2-17, Mathematical Association of America, January.

Luke, H. D. (1999). "The origins of the sampling theorem," IEEE Comms. Magazine, vol. 37, no. 4, pp. 106-108, April.

MacDonald, V. H. (1979). "Advanced Mobile Phone Service: Cellular Concept," Bell Syst. Tech. J., vol. 58, no. 1, pp. 15-41, January.

MacGregor-Morris, J. T. (1955). "Sir Ambrose Fleming (Jubilee of the Valve)," Notes and Records of the Royal Society of London, vol. 11, no. 2, pp. 134-144, Royal Society, March.

Mahoney, Michael S. (1990). "The Roots of Software Engineering," CWI Quarterly, vol. 3, no. 4, pp. 325-334.  http://info.lnpu.edu.cn/website/jpkc/rjgc/ywzl/THE ROOTS OF SOFTWARE ENGINEERING.pdf. Accessed November 29, 2012.

Malthaner, W. A. and Vaughan, H. Earle. (1952). "An Experimental Electronically Controlled Automatic Switching System," Bell Syst. Tech. J., vol. 31, no. 3, pp. 443-468, May.

Marland, E. A. (1962). "British and American Contributions to Electrical Communications," The British J. for the History of Science, vol. 1, no. 1, pp. 31-48, British Society for the History of Science, June.

Martersteck, K. E. (1981). "No. 4 ESS: Prologue," Bell Syst. Tech. J., vol. 60, no. 6, pp. 1041-1048, July-August.

Massey, J. L. (1992). "Deep-Space Communications and Coding: A Marriage Made in Heaven," In Advanced Methods for Satellite and Deep Space Communications, Ed. by J. Hagenauer, Heidelberg and New York: Springer.

Maxwell, J. C. (1873). A Treatise on Electricity and Magnetism, Oxford: Clarendon Press.

McAfee, A. and Brynjolfsson, E. (2012). "Big Data: The Management Revolution," Harvard Business Review, vol. 90, no. 10, pp. 60-68, October.

McEliece, R. J. (1978). "A Public-Key Cryptosystem Based On Algebraic Coding Theory," JPL Deep Space Network Progress Report 42-44, pp. 114-116, January-February.

McMillan, B. (1956). "Two Inequalities Implied by Unique Decipherability," IRE Trans. Inform. Theory, vol. 2, no. 4, pp. 115-116, December.

Meacham, L. A. and Peterson, E. (1948). "An Experimental Multichannel Pulse Code Modulation System of Toll Quality," Bell Syst. Tech. J., vol. 27, no. 1, pp. 1-43, January.

Meacham, L. A., Power, J. R., and West, F. (1958). "Tone Ringing and Pushbutton Calling," Bell Syst. Tech. J., vol. 37, no. 2, pp. 339-360, March.

Meddeb, Aref. (2010). "Internet QoS: Pieces of the Puzzle," IEEE Comms. Magazine, vol. 48, no. 1, pp. 86-94, January.

Menabrea, L. F. (1842). Sketch of The Analytical Engine Invented by Charles Babbage, Bibliothèque Universelle de Genève, No. 82, With notes upon the Memoir by the Translator Ada Augusta, Countess of Lovelace, October. http://www.fourmilab.ch/babbage/contents.html. Accessed November 12, 2012.

Merkle, R. C. (1978). "Secure Communication over an Insecure Channel," Comms. of the ACM, vol. 21, no. 4, pp. 294-299, April.

Messerschmitt, David G. (1997). "Introduction to the Classic Paper by R. A. Heising," Proc. IEEE, vol. 85, no. 5, pp. 747-751, May.

Metcalfe, Robert M. and Boggs, David R. (1980). "Ethernet: Distributed Packet Switching for Local Computer Networks," Reprint of CSL-75-7 May 1975, Xerox Palo Alto Research Center, February.

Metz, Cade. (2012). "Google's Top Five Data Center Secrets (That Are Still Secret)," Wired, October 18.  http://www.wired.com/wiredenterprise/2012/10/google-data-center-secrets/. Accessed April 22, 2013.

Mills, Harlan D. (1977). "Software Engineering," Science, New Series. Electronics Issue, vol. 195, no. 4283, pp. 1199-1205, American Association for the Advancement of Science, March.

Milovanovic, B. D. (2008). Pupin's Theoretical and Experimental Work on Loaded Telephone Lines Accompanied by Modern Full Wave Matrix Approach. http://2008.telfor.rs/files/specsec/H_3.pdf. Accessed October 3, 2012.

Mindell, D. A. (2000). "Opening Black's Box: Rethinking Feedback's Myth of Origin," Technology & Culture, vol. 41, no. 3, pp. 405-434, Society for the History of Technology, July.

Minsky, Marvin. (1986). "The Society of Mind," Whole Earth Review, Summer.

Mofenson, Jack. (1946). Radar Echoes from the Moon, Evans Signal Laboratory, January. http://www.k3pgp.org/1946eme.htm. Accessed January 19, 2013.

Molina, E. C. (1906). "Translating and Selecting System," US Patent 1,083,456, filed April 1906, granted January 1914.

Moore, Gordon E. (1965). "Cramming more components onto integrated circuits," Electronics, vol. 38, no. 8, pp. 114-117, April.

Morii, M. and Kasahara, M. (1988). "New public key cryptosystem using discrete logarithms over GF(P)," IEICE Trans, vol. J71-D, no. 2, pp. 448-453, February.

Morton, D. (1996). "Jay W. Lathrop: An Interview," Interview #265 for the Center for the History of Electrical Engineering, IEEE, May 1. http://www.ieeeghn.org/wiki/index.php/Oral-History:Jay_Lathrop. Accessed October 23, 2012.

Mumford, W. W. (1949). "A broadband microwave noise source," Bell Syst. Tech. J., vol. 28, no. 4, pp. 608-618, October.

Murakami, Y. and Nasako, T. (2007). "Knapsack public-key cryptosystem using Chinese remainder theorem," IACR Cryptology ePrint Archive 2007: 107. http://eprint.iacr.org/2007/107.pdf. Accessed April 14, 2013.

Myers, Robert and Dorbuck, Tony. (Coordinators). (1977). The Radio Amateur's Handbook, Fifty-Fourth edition, American Radio Relay League.

Nakashima, A. (1935). "A Realization Theory for Relay Circuits," J. Inst. Electrical Communication Engineers of Japan, September.

Narasimhan, T. N. (1999). "Fourier's Heat Conduction Equation: History, Influence and Connections," Reviews of Geophysics, vol. 37, no. 1, pp. 151-172, American Geophysical Union, February.

Naughton, John. (2001). A Brief History of the Future-From Radio Days to Internet Years in a Lifetime, New York: Overlook Press.

Nebeker, F. (2004). "Oral-History: Gottfried Ungerboeck," Interview #445 for the IEEE History Center, IEEE, July 6.

Negroponte, Nicholas. (1995). Being Digital, New York: Alfred A. Knopf.

Newman, James R. (1961). Science and Sensibility, vol. 1, New York: Simon and Schuster.

Northover, M., Kourie, D. G., Boake, A., Gruner, S., and Northover, A. (2008). "Towards a Philosophy of Software Development: 40 Years after the Birth of Software Engineering," J. for General Philosophy of Science, vol. 39, no. 1, pp. 85-113, Springer, September.

Norton, John D. (1996). "Are Thought Experiments Just What You Thought?," Canadian J. of Philosophy, vol. 26, no. 3, pp. 333-366, September.

Norwine, A. C. and Murphy, O. J. (1938). "Characteristic Time Intervals in Telephonic Conversation," Bell Syst. Tech. J., vol. 17, no. 2, pp. 281-291, April.

Nyquist, H. (1924). "Certain Factors Affecting Telegraph Speed," Trans. AIEE, vol. XLIII, pp. 412-422, January.

Nyquist, H. (1928a). "Certain Topics in Telegraph Transmission Theory," Trans. AIEE, vol. 47, no. 2, pp. 617-644, April.

Nyquist, H. (1928b). "Thermal agitation of electric charge in conductors," Phys. Rev, vol. 32, no. 1, pp. 110-113, July.

Nyquist, H. (2002). "Certain Topics in Telegraph Transmission Theory," Reprint, Proc. IEEE, vol. 90, no. 2, pp. 280-305, February.

Oersted, H. C. (1806). "Sur la propagation de l'electricite," J. de physique, de chimie, d'histoire naturelle et des arts, vol. 62, pp. 369-375.

Oersted, John Christian. (1820). "Experiments on the Effect of a Current of Electricity on the Magnetic Needle," Annals of Philosophy, vol. XVI, pp. 273-276.

Oersted, John Christian. (1932). "Magnetism from the Electric Current: A Classic of Science," The Science News-Letter, vol. 21, no. 567, pp. 118-120, Society for Science & the Public, February.

Okwit, S. (1984). "An Historical View of the Evolution of Low-Noise Concepts and Techniques," IEEE Trans. on Microwave Theory and Techniques, vol. 32, no. 9, pp. 1068-1082, September.

Oliver, B. M., Pierce, J. R., and Shannon, C. E. (1948). "The Philosophy of PCM," Proc. IRE, vol. 36, no. 11, pp. 1324-1331, November.

Partridge, Derek. (1981). "Information Theory and Redundancy," Philosophy of Science, vol. 48, no. 2, pp. 308-316, Philosophy of Science Association, June.

Pasupathy, S. (1979). "Minimum Shift Keying: A Spectrally Efficient Modulation," IEEE Comms. Magazine, vol. 17, no. 4, pp. 14-22, July.

Patel, Prachi. (2012). "The $10,000 College Degree," IEEE Spectrum, vol. 49, no. 8, pp. 22, August.

Pelton, J. N. (2010). "The Start of Commercial Satellite Communications," IEEE Comms. Magazine, vol. 48, no. 3, pp. 24-31, March.

Pesic, Peter. (1997). "Secrets, Symbols, and Systems: Parallels between Cryptanalysis and Algebra, 1580-1700," Isis, vol. 88, no. 4, pp. 674-692, History of Science Society, December.

Peterson, W. W. (1960). "Encoding and error-correction procedures for the Bose-Chaudhuri codes," IRE Trans. Inform. Theory, vol. 6, no. 4, pp. 459-470, September.

Peterson, W. W. (1961). Error-Correcting Codes, Cambridge, MA: MIT Press.

Petroski, H. (2010). The Essential Engineer—Why Science Alone Will Not Solve Our Global Problems, New York: Alfred A. Knopf.

Pickard, G. W. (1920). "Static Elimination by Directional Reception," Proc. IRE, vol. 8, no. 5, pp. 358-394, October.

Pierce, J. R. (1990). "Telstar, A History," From Vintage Electrics, Southwest Museum of Engineering, Communications and Computation. http://www.smecc.org/john_pierce1.htm. Accessed March 4, 2013.

Pierce, J. R. (2007). ECHO-America's First Communications Satellite, Southwest Museum of Engineering, Communications and Computation. http://www.smecc.org/john_pierce___echoredo.htm. Accessed March 6, 2013.

Plotnitsky, A. (2011). "On the Reasonable and Unreasonable Effectiveness of Mathematics in Classical and Quantum Physics," Foundations of Physics, vol. 41, no. 3, pp. 466-491, Springer.

Plutte, Jon. (2011a). Ralph Merkle: 2011 Fellows Interview, Public-Key cryptography, Computer History Museum, March 11.

Plutte, Jon. (2011b). Martin Hellman: 2011 Fellows Interview, Public-Key cryptography, Computer History Museum, March 11.

Poole, Ian. (2006). Cellular communications explained: from basics to 3G, Oxford: Elsevier/Newnes.

Prasad, K. V. (2009). "The Hardware-Software Tango," Proc. IEEE, vol. 97, no. 7, pp. 1159-1160, July.

Proakis, J. G. (2001). "Digital Communications," Electrical Engineering Series, Fourth edition, McGraw-Hill.

Puente, J. G. (2010). "The emergence of commercial digital satellite communications," IEEE Comms. Magazine, vol. 48, no. 7, pp. 16-20, July.

Pupin, M. I. (1899). "Art of Reducing Attenuation of Electrical Waves and Apparatus Therefor," US Patent 652,230, filed December 1899, granted June 1900.

Purvis, M. B., Deverall, G. V., and Herriott, D. R. (1959). "Optics and Photography in the Flying Spot Store," Bell Syst. Tech. J., vol. 38, no. 2, pp. 403-424, March.

Rainey, P. M. (1921). "Facsimile Telegraph System," US Patent 1,608,527, filed July 1921, granted November 1926.

Rao, J. R., Rohatgi, P., Scherzer, H., and Tinguely, S. (2002). "Partitioning Attacks: Or How to Rapidly Clone Some GSM Cards," Proceedings of the 2002 IEEE Symposium on Security and Privacy, pp. 31-41, IEEE Computer Society.

Rayleigh, J. W. S. (1876). "On the Application of the Principle of Reciprocity to Acoustics," Proc. of the Royal Society of London, vol. 25, no. 171-178, pp. 118-122, January.

Rees, Mina. (1987). Warren Weaver (1894-1978): A Biographical Memoir, National Academy of Sciences.

Reeves, Alec H. (1965). "The Past, Present and Future of PCM," IEEE Spectrum, pp. 58-63, May. http://tkhf.adaxas.net/cd1/Reeves2.pdf. Accessed October 14, 2012.

Reis, Richard M. (1999). "Making Science Understandable to a Broad Audience," The Chronicle of Higher Education, July 23. http://chronicle.com/article/Making-Science-Understandab/45661. Accessed November 24, 2012.

Ring, D. H. (1947). "Mobile Telephony-Wide Area Coverage," Memorandum. Case 20564, Bell Labs., December. http://www.privateline.com/archive/Ringcellreport1947.pdf. Accessed February 19, 2013.

Rissanen, J. and Langdon, G. G. (1979). "Arithmetic Coding," IBM J. Res. Develop, vol. 23, pp. 149-162.

Rissanen, J. J. (1976). "Generalized Kraft Inequality and Arithmetic Coding," IBM J. Res. Develop, vol. 20, pp. 198-203.

Ritchie, A. E. and Tuomenoksa, L. S. (1977). "No. 4 ESS: System Objectives and Organization," Bell Syst. Tech. J., vol. 56, no. 7, pp. 1017-1027, September.

Ritchie, D. M. and Thompson, K. (1978). "The UNIX Time-Sharing System," Bell Syst. Tech. J., vol. 57, no. 6, pp. 1905-1929, August.

Ronalds, Francis. (1823). Descriptions of an Electrical Telegraph, and of Some Other Electrical Apparatus, London: R. Hunter.

Russo, P. A., Bechard, K., Brooks, E., Corn, R. L., Gove, R., Honig, W. L., and Young, J. (1993). "IN rollout in the United States," IEEE Comms. Magazine, vol. 31, no. 3, pp. 56-63, March.

Ryan, James A. (1996). "Leibniz' Binary System and Shao Yong's Yijing," Philosophy East and West, vol. 46, no. 1, pp. 59-90, January.

Samuelson, Pamela. (2011). "The Uneasy Case for Software Copyrights Revisited," George Washington Law Review, vol. 79, no. 6, pp. 1746-1782, September.

Sarton, George and Oersted, John Christian. (1928). "The Foundation of Electromagnetism (1820)," Isis, vol. 10, no. 2, pp. 435, 437-444, June.

Schiavo, Giovanni E. (1958). Antonio Meucci, Inventor of the Telephone, New York: Vigo Press.

Schick, F. B. (1963). "Space Law and Communication Satellites," The Western Political Quarterly, vol. 16, no. 1, pp. 14-33, Western Political Science Association, March.

Schottky, W. (1926). "On the Origin of the Super-Heterodyne Method," Proc. IRE, vol. 14, no. 5, pp. 695-698, October.

Schwartz, M. (2008). "The origins of carrier multiplexing: Major George Owen Squier and AT&T," IEEE Comms. Magazine, vol. 46, no. 5, pp. 20-24, May.

Schwartz, M. (2009a). "Carrier-wave telephony over power lines: Early history," IEEE Comms. Magazine, vol. 47, no. 1, pp. 14-18, January.

Schwartz, M. (2009b). "Armstrong's invention of noise-suppressing FM," IEEE Comms. Magazine, vol. 47, no. 4, pp. 20-23, April.

Schwartz, M. (2009c). "Improving the noise performance of communication systems: radio and telephony developments of the 1920s," IEEE Comms. Magazine, vol. 47, no. 12, pp. 16-20, December.

Scudder, F. J. and Reynolds, J. N. (1939). "Crossbar Dial Telephone Switching System," Bell Syst. Tech. J., vol. 18, no. 1, pp. 76-118, January.

Seckler, H. N. and Yostpille, J. J. (1958). "Functional Design of a Stored-Program Electronic Switching System," Bell Syst. Tech. J., vol. 37, no. 6, pp. 1327-1382, November.

Sedgewick, Robert. (1975). Quicksort, PhD Thesis, Stanford University.

Sedgewick, Robert. (1978). "Implementing Quicksort Programs," Comms. of the ACM, vol. 21, no. 10, pp. 847-857, October.

Sengupta, D. L. and Sarkar, T. K. (2003). "Maxwell, Hertz, the Maxwellians, and the Early History of Electromagnetic Waves," IEEE Antennas and Propagation Magazine, vol. 45, no. 2, pp. 13-19, April.

Shafer, G. and Vovk, V. (2005). "The origins and legacy of Kolmogorov's Grundbegriffe," The Game-Theoretic Probability and Finance Project, University of London, October.

Shahvar, Soli. (2007). "Iron Poles, Wooden Poles: The Electric Telegraph and the Ottoman: Iranian Boundary Conflict, 1863-1865," British J. of Middle Eastern Studies, vol. 34, no. 1, pp. 23-42, British Society for Middle Eastern Studies, April.

Shamir, A. (1982). "A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystem," Proc. 23rd Annu. Symp. Foundations of Comput. Sci, pp. 145-152, November.

Shankar, Priti. (1997). "Error Correcting Codes: 3. Reed Solomon Codes," Resonance, pp. 33-47, March.

Shannon, C. E. (1938). "A Symbolic Analysis of Relay and Switching Circuits," Trans. AIEE, vol. 57, no. 12, pp. 713-723, December.

Shannon, C. E. (1951). "Prediction and Entropy of Printed English," Bell Syst. Tech. J., vol. 30, no. 1, pp. 50-64, January.

Shannon, C. E. (1956). "The Bandwagon," IRE Trans. Inform. Theory, vol. 2, no. 1, pp. 3, March.

Shannon, C. E. (1948a). "A Mathematical Theory of Communication. Parts I, II," Bell Syst. Tech. J., vol. 27, no. 3, pp. 379-423, July.

Shannon, C. E. (1948b). "A Mathematical Theory of Communication. Parts III-V," Bell Syst. Tech. J., vol. 27, no. 4, pp. 623-656, October.

Shannon, C. E. (1949a). "Communication in the Presence of Noise," Proc. IRE, vol. 37, no. 1, pp. 10-21, January.

Shannon, C. E. (1949b). "Communication Theory of Secrecy Systems," Bell Syst. Tech. J., vol. 28, no. 4, pp. 656-715, October.

Shannon, C. E. (1959a). "Probability of Error for Optimal Codes in a Gaussian Channel," Bell Syst. Tech. J., vol. 38, no. 3, pp. 611-656, May.

Shannon, C. E. (1959b). "Coding Theorems for a Discrete Source with a Fidelity Criterion," IRE National Conv. Rec, no. 4, pp. 142-163.

Shaw, T. and Fondiller, W. (1926). "Development and Application of Loading for Telephone Circuits," Trans. AIEE, vol. XLV, pp. 268-294, January.

Shaw, Thomas. (1951). "The Evolution of Inductive Loading for Bell System Telephone Facilities. Part I-II," Bell Syst. Tech. J., vol. 30, no. 1, pp. 149-204, January.

Sher, Muhammad. (2002). "Error-Control Coding in Satellite Communication," Pakistan J. Applied Sciences, vol. 2, no. 1, pp. 10-16.

Shestakov, V. I. (1938). Some Mathematical Methods for the Construction and Simplification of Two-Terminal Electrical Networks of Class A, PhD Thesis, Moscow: Lomonosov State University.

Shilling, L. M. and Fuller, L. K. (1997). Dictionary of Quotations in Communications, Westport, CT: Greenwood Press.

Shulman, Seth. (2009). The Telephone Gambit: Chasing Alexander Graham Bell's Secret, New York: W.W. Norton & Company.

Shurkin, Joel N. (2008). Broken Genius, New York: Macmillan.

Shustek, Len. (2009). "An Interview with C.A.R. Hoare," Comms. of the ACM, vol. 52, no. 3, pp. 38-41, March.

Silverman, H. F. (2011). "One City-Two Giants: Armstrong and Sarnoff," IEEE Signal Processing Magazine, vol. 28, no. 6, pp. 125-136, November.

Silverman, H. F. (2012). "One City-Two Giants: Armstrong and Sarnoff: Part 2," IEEE Signal Processing Magazine, vol. 29, no. 1, pp. 144-156, January.

Sincoskie, W. D. (2002). "Broadband Packet Switching: A Personal Perspective," IEEE Comms. Magazine, vol. 40, no. 7, pp. 54-66, July.

Singh, Simon. (2000). The Code Book, New York: Anchor Books.

Slepian, David (Ed.). (1974). Key papers in the development of information theory, IEEE Press.

Smith, B. (1957). "Instantaneous Companding of Quantized Signals," Bell Syst. Tech. J., vol. 36, no. 3, pp. 653-709, May.

Smith, C. W. (1924). "Practical Application of the Recently Adopted Transmission Unit," Bell Syst. Tech. J., vol. 3, no. 3, pp. 409-413, July.

Smith, G. K. (1989). "Choice of FDMA/SCPC Access Technique for Aeronautical Satellite Voice System," N89-27909, pp. 23-28, INMARSAT.

Smothers, Ronald. (1998). "Commemorating a Discovery in Radio Astronomy," New York Times, June 9.  http://www.nytimes.com/1998/06/09/nyregion/commemorating-a-discovery-in-radio-astronomy.html. Accessed March 15, 2013.

Standage, T. (1998). The Victorian Internet, New York: Walker.

Stauffer, Robert C. (1957). "Speculation and Experiment in the Background of Oersted's Discovery of Electromagnetism," Isis, vol. 48, no. 1, pp. 33-50, March.

Stern, L. E. (2011). A Closer Look At Joseph Henry's Experimental Electromagnet, Princeton University, May.

Stevenson, Robert Louis. (1901). Memoir of Fleeming Jenkin, New York: Charles Scribner's Sons.

Stix, Gary. (1991). "Encoding the 'Neatness' of Ones and Zeroes, Profile: David A. Huffman," Scientific American, vol. 265, no. 3, pp. 54-58, September. http://www.huffmancoding.com/my-uncle/scientific-american. Accessed November 18, 2012.

Stockman, H. (1948). "Communication by Means of Reflected Power," Proc. IRE, vol. 36, no. 10, pp. 1196-1204, October.

Taylor, A. S. and Vincent, J. (2005). "An SMS History," Chapter 4 in Mobile World: Past, Present and Future, Ed. by Lynne Hamill and Amparo Lasen, pp. 75-91, Springer.

Tews, E., Weinmann, R.-P., and Pyshkin, A. (2007). "Breaking 104 bit WEP in Less than 60 Seconds," Cryptology ePrint Archive, Report 2007/120. http://eprint.iacr.org/2007/120.pdf. Accessed April 15, 2013.

Thompson, K. (1978). "UNIX Time-Sharing System: UNIX Implementation," Bell Syst. Tech. J., vol. 57, no. 6, pp. 1931-1946, August.

Thompson, Thomas M. (1983). From Error-Correcting Codes Through Sphere Packings to Simple Groups, Cambridge University Press. http://www.cambridge.org/aus/catalogue/catalogue.asp?isbn=9780883850374. Accessed January 11, 2013.

Thomson, J. J. (1897). "Cathode Rays," Philos. Mag, vol. 44, no. 295. http://web.lemoyne.edu/~GIUNTA/thomson1897.html. Accessed December 2, 2012.

Thomson, W. (1878). "Official report by Sir William Thomson upon Mr. Graham Bell's Telephone, exhibited at the Centennial Exhibition at Philadelphia in 1876," J. and Proceedings of the Royal Society of New South Wales, vol. XII, pp. 4. http://zapatopi.net/kelvin/papers/report_on_bells_telephone.html. Accessed November 19, 2012.

Tietavainen, Aimo and Perko, A. (1971). "There are no unknown perfect binary codes," Ser. A I, vol. 148, pp. 3-10, Ann. Univ. Turku.

Tietavainen, Aimo. (1973). "On the Nonexistence of Perfect Codes Over Finite Fields," SIAM J. on Applied Mathematics, vol. 24, no. 1, pp. 88-96, Society for Industrial and Applied Mathematics, January.

Torr, J. D. (Ed.) (2003). The Information Age, Greenhaven Press.

Tuller, W. G. (1949). "Theoretical Limitations on the Rate of Transmission of Information," Technical Report No. 114, MIT, April.

Turing, A. M. (1950). "Computing Machinery and Intelligence," Mind, New Series, vol. 59, no. 236, pp. 433-460, Mind Association, October.

Tyne, Gerald E. J. (1977). The Saga of the Vacuum Tube, Indianapolis: Howard W. Sams & Company.

Ulfving, L. and Weierud, F. (1999). "The Geheimschreiber Secret: Arne Beurling and the Success of Swedish Signals Intelligence," Lecture Notes in Computational Science and Engineering, Springer-Verlag.

Ungerboeck, G. (1974). "Adaptive Maximum-Likelihood Receiver for Carrier-Modulated Data-Transmission Systems," IEEE Trans. on Comms, vol. 22, no. 5, pp. 624-636, May.

Ungerboeck, G. (1982). "Channel Coding with Multilevel/Phase Signals," IEEE Trans. on Information Theory, vol. 28, no. 1, pp. 55-67, January.

Usselman, S. W. (2009). "Unbundling IBM: Antitrust and the Incentives to Innovation in American Computing," Chapter 8 in The Challenge of Remaining Innovative: Insights from Twentieth-Century American Business. Edited by Sally H. Clarke, Naomi R. Lamoreaux, and Steven W. Usselman, pp. 249-279, Stanford: Stanford University Press.

Vail, Alfred. (1845). Description of the American Electro Magnetic Telegraph: Now in Operation Between the Cities of Washington and Baltimore, Washington: J. & G.S. Gideon.

van der Pas, Peter W. (1971). "The Discovery of the Brownian Motion," Scientiarum Historia, vol. 13, pp. 27-35.  http://www.physik.uni-augsburg.de/theo1/hanggi/History/VanderPas.pdf. Accessed November 4, 2012.

Vardalas, John. (2004). "Oral-History: Leonard Kleinrock," Interview #434 for the IEEE History Center, IEEE, February 21.

Vaughan, H. E. (1959). "Research Model for Time-Separation Integrated Communication," Bell Syst. Tech. J., vol. 38, no. 4, pp. 909-932, July.

Vermeulen, Dirk J. (1998). "The Remarkable Dr. Hendrik van der Bijl," Proc. IEEE, vol. 86, no. 12, pp. 2445-2454, December.

Vernam, G. S. (1918). "Secret Signaling System," US Patent 1,310,719, filed September 1918, granted July 1919.

Vingron, S. P. (2004). Switching Theory—Insight Through Predicate Logic, Springer.

Viterbi, A. J. (1967). "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. on Information Theory, vol. 13, no. 2, pp. 260-269, April.

Viterbi, A. J. (1992). "Digital Wireless Communication Evolution: What, Why, How, Where and When," Opening Lecture for Digital Wireless Communications Workshop.

von Neumann, J. (1993). "First Draft of a Report on the EDVAC," IEEE Annals of the History of Computing, vol. 15, no. 4, pp. 27-75.

Wallace, G. K. (1992). "The JPEG still picture compression standard," IEEE Trans. Consumer Electronics, vol. 38, no. 1, pp. 18-34, February.

Weaver, A. and Newell, N. A. (1954). "In-Band Single-Frequency Signaling," Bell Syst. Tech. J., vol. 33, no. 6, pp. 1309-1330, November.

Weaver, Warren. (1949). "The Mathematics of Communication," Scientific American, vol. 181, no. 1, pp. 11-15.

Weigmann, Katrin. (2012). "Does intelligence require a body?," EMBO Report, pp. 1-4, European Molecular Biology Organization. http://www.iit.it/images/images/icub-facility/press/emboreports.pdf. Accessed April 22, 2013.

Weinberg, Steven. (1983). "The Discovery of the Electron," The Discovery of Subatomic Particles (Chapter 2), New York: W.H. Freeman.

West, Joel. (2006). Interview: Andrew Viterbi, December 15.

Westin, A. F. (1967). Privacy & Freedom, London: Bodley Head.

Wheeler, David A. (2012). The Most Important Software Innovations. http://www.dwheeler.com/innovation/innovation.html. Accessed April 4, 2013.

Wicker, Stephen B. and Bhargava, Vijay K. (1999). Reed-Solomon Codes and Their Applications, John Wiley & Sons.

Wiegand, Frank H. (1944). "The Marker Principle in Telephone Switching Systems," Engineering and Science Monthly, pp. 9-11, 16, November.

Wiener, N. (1950). Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Application, MIT Press.

Wiener, N. (1956). "What is information theory?," IRE Trans. Inform. Theory, vol. 2, no. 2, pp. 48, June.

Wigner, E. (1960). "The Unreasonable Effectiveness of Mathematics in the Natural Sciences," Comms. on Pure and Applied Mathematics, vol. 13, no. 1, February.

Wimsatt, W. K. (1943). "What Poe Knew about Cryptography," PMLA, vol. 58, no. 3, pp. 754-779, Modern Language Association, September.

Young, W. R. (1979). "Advanced Mobile Phone Service: Introduction, Background, and Objectives," Bell Syst. Tech. J., vol. 58, no. 1, pp. 1-14, January.

Ziv, J. and Lempel, A. (1977). "A universal algorithm for sequential data compression," IEEE Trans. on Information Theory, vol. 23, no. 3, pp. 337-343, May.

###  Preface

A binary digit or bit can take two values: 0 or 1. A byte is composed of 8 bits. A megabyte (MB) is equivalent to a million bytes or sometimes 2^20 bytes. The latter number has been standardized as the mebibyte (MiB). Likewise, a kilobyte is 1,000 bytes (kB) and a kibibyte is 1,024 bytes (KiB). These were standardized in December 1998 as noted at <http://physics.nist.gov/cuu/Units/binary.html>. Accessed April 5, 2013.
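
As a quick illustration of the gap between the decimal and binary prefixes, here is a minimal Python sketch (the constant names are mine, chosen for illustration):

```python
# Decimal (SI) versus binary (IEC) prefixes described above.
KILOBYTE = 10**3   # kB
KIBIBYTE = 2**10   # KiB = 1,024 bytes
MEGABYTE = 10**6   # MB
MEBIBYTE = 2**20   # MiB = 1,048,576 bytes

print(MEBIBYTE - MEGABYTE)  # 48576: a mebibyte is nearly 5% larger than a megabyte
```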

 One recent publication that's biased towards the Internet is (Chatfield 2011). Many aspects of digital technology outside of the Internet are left out of that book.

 Quoted in (Reis 1999).

###  Chapter 1

 Leyden jars are named after Leiden, a Dutch town southwest of Amsterdam. A Dutchman from this town is said to have invented this technique of storing charges back in 1746. A German had independently invented a similar device a year earlier. Details are in _Encyclopaedia Britannica_ , <http://www.britannica.com/EBchecked/topic/338415/Leyden-jar>. Accessed November 24, 2012.

 Noted in (Caneva 1980, pp. 127).

 A summary of Schelling's ideas and influence on Oersted is in (Stauffer 1957, pp. 35-36).

 Quoted in (Caneva 1980, pp. 128) with reference to (Oersted 1806).

 The German publication of 1812 is Ansicht der Chemischen Naturgesetze, durch die neueren Entdeckungen gewonnen. This was translated in 1813 as Recherches sur l'identité des forces chimiques et électriques by Marcel de Serres. These details are noted in (Caneva 1980, pp. 137).

 Quoted in (Caneva 1980, pp. 128).

 That the discovery happened in July 1820 comes from (Agassi 1967b, pp. 74). Oersted's own account of the discovery was communicated to the editor of the _Annals of Philosophy_ in July 1820, which subsequently was translated from Latin to English and published in (Oersted 1820). This is reproduced in (Sarton and Oersted 1928). One source claiming April 1820 as the date is (Stauffer 1957, pp. 45).

 Noted in (Agassi 1967b, pp. 74).

 Noted in (Agassi 1967b, pp. 73). The author claims that this was done during the lecture of April 1820. Only later, during the lecture of July 1820, was the idea of placing the wire in a north-south direction realized.

 The number sixty is mentioned in (Stauffer 1957, pp. 46), which may be discerned from Oersted's paper of October 1820.

 Quote is from an English translation of (Oersted 1820). This is reproduced in (Sarton and Oersted 1928, pp. 276). Another source where this is reproduced is (Oersted 1932, pp. 119).

 The terms _current_ and _circuit_ in the electrical sense were born only in later years.

 The word is introduced in (Ampère 1820). Earlier use of the word is in (Bischoff 1802), but the term was not adopted by others.

 (Andrews 1989) mentions the date 1746 and this agrees with Wikipedia's entry on the "Electrical telegraph," <http://en.wikipedia.org/wiki/Electrical_telegraph>, which cites (Standage 1998) and (Fahie 1884). Accessed November 18, 2012.

 The book is (Carrington 1949). An entertaining account of the drums appears in (Gleick 2011, chap. 1).

 For a brief treatment of Sömmering's system, refer to (Andrews 1989). The work of Francis Ronalds is described at http://www.theiet.org/resources/library/archives/biographies/ronalds.cfm. Accessed November 18, 2012. A more elaborate description is due to (Marland 1962, pp. 33-34), which cites Ronalds' original work of 1823. The original (Ronalds 1823) has been digitized by Google.

 Noted in (Marland 1962, pp. 33). He cites an article of 1753 in Scots Magazine, vol. xv, pp. 73. The anonymous author used initials "C. M." Reference to the same publication is made in passing in (Deloraine 1956, pp. 183).

 Quoted in (Holzmann and Pehrson 1995, chap. 2). An online version of this chapter was referred to at http://people.seas.harvard.edu/~jones/cscie129/papers/Early_History_of_Data_Networks/Chapter_2.pdf. Accessed April 14, 2013.

 Noted in (Holzmann and Pehrson 1995, chap. 2).

 There have been many methods of symbol mapping. The particular one chosen for this illustration is due to <http://www.10bauches.com/photo/art/grande/1840519-2512735.jpg>. Accessed April 22, 2013.

 Quoted from (Hooke 1726). The lecture to the Royal Society was on May 21, 1684.

 The use of 92 symbols for messages is mentioned in (Field 1994, pp. 325). (Holzmann and Pehrson 1995, chap. 2) states that it could be either 92 or 94 symbols, quoting a letter from 1844 by Abraham Chappe, preserved in the Musée de Poste of Nantes. A symbol map showing 92 symbols appears at <http://nikolasschiller.com/images/semaphore_code.jpg>. Accessed April 22, 2013.

 Noted in (Field 1994, pp. 320, 342).

 The details are of experiments 8 and 14 from (Henry 1831). Analysis and modern interpretation of these results are given by (Stern 2011).

 Load is that part of an electric circuit to which the battery supplies power. As examples, an electric lamp or an electric motor is a load. Power delivered to the load is greatest when the load's resistance matches the battery's internal resistance. If the load's resistance is much higher than the battery's, the overall circuit resistance is high, the circuit current is low, and little power is delivered. If the load's resistance is lower than the battery's, raising it to match the battery's delivers more power even though the circuit current is reduced.
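
In modern terms this is the maximum power transfer theorem. A one-line derivation (mine, not the book's), with source voltage $V$, internal resistance $R_s$, and load resistance $R_L$:

$$P_L = I^2 R_L = \frac{V^2 R_L}{(R_s + R_L)^2}, \qquad \frac{dP_L}{dR_L} = 0 \;\Rightarrow\; R_L = R_s.$$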

 Noted in (Bowers 2001, pp. 5). The translation was arranged by Wheatstone.

 Strictly speaking, the original Morse Code was not binary, since pauses also had significance in addition to dots and dashes. For example, characters C and R were coded as ●● ● and ● ●● respectively, differing only in the placement of the pause. Character T was a dash equivalent to two dots, while L was also a dash but spanned the time required to signal four dots. This early code is given in (Vail 1845, pp. 20).
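
A minimal sketch of why the pause mattered (the textual representation is mine, not Vail's): writing a dot as `.` and an intra-character pause as a space, C and R use the same dots and differ only in where the pause falls.

```python
# Early Morse code as a three-symbol alphabet: dot, dash, and pause.
EARLY_CODE = {"C": ".. .", "R": ". .."}

# Stripping the pauses makes the two characters indistinguishable,
# which is what a strictly binary dot/dash reading would do.
assert EARLY_CODE["C"].replace(" ", "") == EARLY_CODE["R"].replace(" ", "")
```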

 Noted in (Marland 1962, pp. 40).

 Noted in (Andrews 1989, pp. 14).

 Noted in (Vail 1845, pp. 19). This book has been digitized by Google.

 (Bektas 2000, pp. 673-676) gives a description of Crimean Telegraphy.

 (Shahvar 2007) gives a detailed account of the border dispute during this period. (Ata 1997) gives an account of Ottoman telegraphy of the nineteenth century.

 Numbers are taken from (Hayes 2008b, pp. 44). (Marland 1962, pp. 45) mentions 2500 miles of cable, which equates to about 2200 nautical miles, in close agreement.

 This evolution in shipbuilding is related in (Lewis 2004).

 Quoted from (Jenkin 1862, pp. 993). A note on Jenkin's contributions is written by Thomson in (Stevenson 1901, appendix I).

 The term is due to Tom Standage who authored (Standage 1998).

 Noted in (Huurdeman 2003, pp. 604).

 The article that mentions this is at  http://www.engadget.com/2005/03/12/nokia-files-patent-for-morse-code-generating-cellphone. Accessed November 18, 2012.

 Although often misquoted, this particular version of the quote is from Bell's family papers, as noted in (Beauchamp 2010, pp. 854).

 The book is (Shulman 2009). Arguments in favour of Gray are given in (Evenson 2000).

 (Campanella 2008) gives a summary of Meucci's contributions. (Schiavo 1958) is a biography on Meucci. A recent multi-volume biography is due to (Catania 1994) and (Catania 1996).

 Quoted from (Thomson 1878, pp. 4). For this demonstration, Bell used a magneto-electric design of his own invention. Though it was a poorer design compared to Gray's liquid transmitter, Bell was conscious of the fact that Gray was present at the exhibition.

###  Chapter 2

 It may be argued that there have been other such instances prior to telegraphy. For example, around 1760 Joseph Black discovered the phenomenon of latent heat. This led James Watt to think about the immense latent heat of steam, which directly led to an improved steam engine that used a separate condenser. This propelled the Industrial Revolution. What science had given here was only an improvement of Thomas Newcomen's steam engines, which had existed since the early eighteenth century.

 Quoted from (Hochfelder 2010, pp. 30).

 Quoted from (Hochfelder 2010, pp. 30). The author quotes this from Henry's statement in court.

 Numbers are from (Bordeau 1982, pp. 278). This is also indicative of the fact that since the early 1790s the French had adopted the decimal metric system while others had stuck to older units.

 Quoted from (Maxwell 1873, pp. 388).

 Although Henry Cavendish had derived Ohm's Law using only static electricity, Cavendish had not published his results. They came to public view a century later through a publication of Maxwell.

 (Narasimhan 1999, pp. 157), mentions that Ohm erroneously used electrostatic force when he should have considered potential (voltage) difference.

 (Gleick 2011, pp. 138) notes that Gauss and Weber used left and right deflections of a single needle to code letters of the alphabet. No original references are quoted in his work.

 For a detailed treatment, refer to (Gray 1921, chap. 1). Resistance expressed in emu units appears on pp. 23.

 These methods are listed in (Gray 1921, pp. 655).

 The fact that an electromagnetic wave does not require a medium was realized only later. A detailed discussion on this comes in chap. 11 of this book.

 Quoted from Maxwell's biography due to (Campbell and Garnett 1882, pp. 82). Page number is due to a second edition digital copy of 1999 from Sonnet Software, Inc.

 Noted in (Stevenson 1901, appendix I).

 Quoted from (Campbell and Garnett 1882, pp. 163).

 Quoted from (Weinberg 1983, pp. 13).

 Quoted from (Thomson 1897).

 The final conclusion is problematic for it leads to some controversy. (Brown 1992, pp. 277) claims that it is obvious and transcends empiricism. (Norton 1996, pp. 343) argues that it is not obvious in Norton's Platonic sense but the conclusion can be reached by extension of the thought experiment.

 This is mentioned in (Goldstein 2010, pp. 120). Perhaps the modern use of laboratory mice can be traced to this statement.

 Quoted from (Einstein 1954, pp. 274). The Herbert Spencer lecture was delivered at Oxford on June 10, 1933.

 Quadratic equations are those in which the variable is raised to the power of two. An example would be x^2 - 1 = 0.

 (Ball 2010, pp. 387) mentions earlier works as far back as 1750. Wessel's work came to notice at the end of the nineteenth century.

 D'Alembert's equation was a partial differential equation with respect to both time and distance along the length of the string. The dependent variable was the displacement of the string from its stationary position. In particular, the displacement had to be continuous in the sense that it could be expressed by a single equation. For example, this old idea of continuous functions excluded a triangular displacement. The reason for this constraint was that the wave equation required the second derivative of the displacement to exist.
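
In modern notation (not the book's), the equation in question is the one-dimensional wave equation, with $u(x, t)$ the displacement of the string and $c$ the wave speed:

$$\frac{\partial^2 u}{\partial t^2} = c^2 \, \frac{\partial^2 u}{\partial x^2}$$

A triangular displacement fails the old constraint precisely because its second derivative does not exist at the apex.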

 A comparative study of their differing views is given in (Darrigol 2007) and (Kahane and Rieusset 1995).

 (Darrigol 2007, pp. 418) quotes this translation of Fourier's reading of " _Extrait du mémoire sur la chaleur_ " on December 21, 1807.

 Quoted in (Petroski 2010, pp. 28). Original source is (Kemper 1967).

###  Chapter 3

 The narrative is based on a reading of Brown's publications of 1828 and 1829. These are collected in (Bennett 1866, pp. 463-486). Brown's original paper was first privately circulated but later published.

 Brown himself mentions names of others who had earlier seen similar microscopic movements. These references are in (Bennett 1866, pp. 463-486). Of interest is that (van der Pas 1971) credits Jan Ingen-Housz as perhaps the first to observe, in 1784, such a motion with charcoal powder and realize that this is true even for lifeless particles.

 Quoted from (Darwin 1893, pp. 36).

 Joule's work in electrochemistry is detailed in (Cropper 1988). On pp. 2, Cropper claims accuracy of Joule's results.

 Mean free path is introduced in (Clausius 1859).

 Copyright 2012, Daniel Harari. Reprinted with permission. Available at  http://www.flickr.com/photos/danielharari/7855702696/in/photostream. Accessed March 10, 2013.

 (Cohen 2005, pp. 26) claims that Clausius initially assumed all molecules to be moving at the same speed. However, (Garber 1970, pp. 302) quotes Clausius from a publication of 1857 in which Clausius says, "it is possible that the actual velocities of the several molecules differ materially from their mean value."

 (Brush 1968, pp. 8-13) traces these developments of interpreting Brownian Motion in terms of kinetic theory of heat.

 Quoted in (Kleiner 2001, pp. 151) with reference to (Kline 1972, pp. 621). This is also mentioned in a short biographical essay of Laplace in (Newman 1961, pp. 134).

 Quoted from (Einstein 1956, pp. 1-2). This translation from 1926 of Einstein's 1905 paper on Brownian Motion is due to A. D. Cowper.

 Two of Einstein's papers from 1914 are noted in (Cohen 2005, pp. 24).

 A trillion is 10^12. The numbers are mentioned in (Espenschied 1922, pp. 117) and (Clark 1923, pp. 78).

 A description of these mechanical repeaters appears in (Buckley 1952, pp. 245).

 This number appears in (Brittain 1970, pp. 36), which cites (Shaw and Fondiller 1926).

 Quoted in (Bell 1986, pp. 308).

 A six-page biography on Pupin is presented in (Jackson 1938).

 Noted in (Brittain 1970, pp. 41).

 Quoted from (Pupin 1899, pp. 3, lines 2-25).

 The view favouring Campbell rather than Pupin is expressed in (Brittain 1970, pp. 48). On pp. 46, Brittain mentions that Pupin initially wrongly suggested adding capacitors instead of inductors. (Milovanovic 2008, pp. 1) credits Pupin with inventing the toroidal inductor and also gives Pupin's experimental data, supporting the view that Pupin's contribution was not just theoretical. (Shaw 1951, pp. 154) credits H. S. Warren for an innovative coil design on a toroidal core as early as 1901.

 These details are in (Brittain 1970, pp. 36, 53, 54).

 Noted in (Buckley 1952, pp. 246). This was the belief as late as 1910. The 1911 milestone is noted in (Shaw 1951, pp. 159).

 (Buckley 1952, pp. 246) quotes from a memorandum of December 10, 1910.

 Quoted in (Constable and Somerville 2003, pp. 51).

 These facts are noted in a biography of Fleming in (MacGregor-Morris 1955, pp. 138, 141).

 This is noted in (Vermeulen 1998, pp. 2448). Wikipedia notes that de Forest was later paid more for full rights to the invention. Wikipedia cites (Tyne 1977). Wikipedia entry at <http://en.wikipedia.org/wiki/Lee_De_Forest>. Accessed March 30, 2013.

 The collage of vacuum tubes is in public domain, taken from Wikimedia Commons at <http://commons.wikimedia.org/wiki/File:6AK5_vacuum_tubes.JPG>. Accessed April 12, 2013.

 This slogan is quoted in many sources. (Mindell 2000, pp. 409) is an example.

 Noted in (Galambos 1992, pp. 123).

 Quoted from (Gorman 1969, pp. 20). A similar point about the high reliability of undersea amplifiers is made in (Hayes 2008b, pp. 45).

 Noted in (Johnson 1971, pp. 44).

 Quoted from (Johnson 1928, pp. 101).

 Quoted in (Schwartz 2009b, pp. 20). The original source is (Carson 1928). The abstract of Carson's paper containing the quote appears in _Bell System Technical Journal_ , vol. 7, no. 4, pp. 808-809.

 The importance of both band selectivity and directional antennas towards reducing static noise is mentioned in (Brown 1927, pp. 251). This paper relates to the development of transatlantic radiotelephony in which radio engineers played an important part.

 Quoted from (Smothers 1998).

 The word decibel has an interesting etymology, noted in (Mindell 2000, pp. 412). Signal attenuation was initially measured as "a mile of standard cable." In the 1920s, a new unit was introduced and named _transmission unit_. This is described in (Smith 1924). This was renamed as _bell_ , in tribute to Graham Bell. This was later reduced by a factor of ten and thus was born the decibel.

 This is derived easily. Noise power in dBm = 10·log10(kTB/1 mW) = 10·log10((1.38 x 10^-23 x 300 x 10 x 10^6)/10^-3) ≈ -103.8 dBm.
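
A quick numeric check of this figure, using the values in the note (T = 300 K, B = 10 MHz); a minimal Python sketch:

```python
import math

k = 1.38e-23   # Boltzmann constant, J/K
T = 300        # temperature, kelvin
B = 10e6       # bandwidth, hertz

dbm = 10 * math.log10(k * T * B / 1e-3)  # power relative to 1 mW
print(round(dbm, 1))  # -103.8
```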

 In the sub-millimetre spectrum and beyond, above 300 GHz, another noise source starts to dominate. This is _quantum noise_. Laser technology suffers from quantum noise. For such systems, low noise receivers are not that useful. When quantum noise was discovered, Nyquist's original equation for thermal noise was generalized by including what we call _zero point noise_.

 (Okwit 1984, pp. 1076) recounts this anecdote of Mumford.

 (Clark 1923, pp. 76-78) gives examples of loading coil and amplifier trade-off with specific reference to the transcontinental cables.

 Noted in (Kline 1993, pp. 83).

 These details are given in (Black 1977, pp. 55) as Black reminisced on his invention fifty years earlier.

 Quoted from (Black 1934, pp. 1-2).

 (Mindell 2000) makes a strong case against Black. He claims that Black did not consider the problem of oscillations.

 Historical background on FM appears in (Armstrong 1936, pp. 689-691) and (Carson and Fry 1937, pp. 513).

 (Armstrong 1936) is the first important paper by the inventor. One of early mathematical treatments of FM is due to (Carson and Fry 1937).

 These dates are mentioned in (Schwartz 2009b, pp. 22). The dispute between RCA (represented by Sarnoff) and Armstrong is detailed in a two-part paper: (Silverman 2011) and (Silverman 2012).

 A useful chronological summary of noise, titled "History of Fluctuation Research," appears in (Abbott et al. 1996, pp. 3).

###  Chapter 4

 Quoted from (Rainey 1921, pp. 3).

 _Teleautograph_ is due to Elisha Gray. Bernhard Meyer uses the term _autographic telegraph_. Rainey's 1926 US patent (No. 1,608,527) uses the term _telephotography_ and so did AT&T a decade later. Edouard Belin named his facsimile invention the _telegraphoscope_. The term _pantelegraph_ is due to Italian physicist Giovanni Caselli. (Huurdeman 2003) talks about _phototelegraphy_ in chap. 18 but early forms of the technology are noted in chap. 9 as _image telegraphy_. Huurdeman also mentions Gustav Grzanna using the term _kopiertelegraph._

 Noted in (Luke 1999, pp. 106).

 The birth of PCM and Reeves' contribution is traced in (Chapuis and Joel 1990, chap. VIII-1, pp. 193-302).

 Strictly speaking, two characters were reserved to differentiate between letters and numbers. Thus the final code had an extended character set that included punctuation and fractions.

 Teleprinter technology was one of those rare inventions that did not happen in Bell Labs. In the early years, it was a service that relied on telegraph lines and not on the Bell System. In 1930, AT&T bought the Teletype Corporation and the next year launched its own Teletype service. _The Teletype Story_, published by the Teletype Corporation in 1959, documents this history. This is available at http://www.samhallas.co.uk/repository/telegraph/teletype_story.pdf. Accessed April 25, 2013.

 A detailed comparison of FM and PCM appears in a classic paper on PCM: (Oliver et al. 1948, pp. 1328-1330).

 Quoted from (Luke 1999, pp. 108). In (Butzer and Stens 1992, pp. 45) the authors do not acknowledge the contribution of E. T. Whittaker to the sampling theorem. They argue that Whittaker's concern was the interpolation problem. His approach was purely mathematical and certainly not directed towards sampling of band-limited electrical signals.

 Noted in (Nyquist 2002, pp. 283). This is a reprint of the original paper from 1928. In (Nyquist 1924) the author gives a possible pulse shape that gives low distortion. Such a pulse is generated by a filter circuit. In (Nyquist 2002) he gives a more detailed mathematical treatment of pulse shaping.

 This is stated in (Shannon 1948b, pp. 627). A sinc pulse when sampled within the main pulse results in basis expansion of only 2 _TW_ dimensions, where _W_ is the Fourier bandwidth and _T_ is the width of the main pulse. A more readable account of the theorem can be found in (Anderson and Johannesson 2005, pp. 63-66).
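
In modern notation (not quoted from Shannon), the cardinal-series expansion behind this count is

$$x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\frac{n}{2W}\right) \operatorname{sinc}(2Wt - n),$$

so a signal confined to bandwidth $W$ and, approximately, to duration $T$ is determined by about $2TW$ numbers.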

 Root-raised cosine has been used in many wireless standards: European TETRA, North American NADC, Japanese PDC, and Japanese PHS. These details are summarized in an _Agilent Application Note 1298_ , 2001, pp. 44-45, "Digital Modulation in Communications Systems—An Introduction."

 One source mentioning this telephone bandwidth is (Alexander et al. 1960, pp. 439).

 These results are reported in (Goodall 1947, pp. 408).

 Among the early publications on companding are (Meacham and Peterson 1948, pp. 7) and (Bennett 1948, pp. 456-459). (Meacham and Peterson 1948, pp. 33-36) also describes the use of a vacuum tube to perform PCM encoding. (Smith 1957, pp. 667) refers to unpublished work of W. R. Bennett from as early as 1944.

 Baudot multiplexing is illustrated in (Deloraine 1956, pp. 185). The rest of the article describes the evolution of pulse transmissions. A figure on PAM/TDM appears on pp. 188. A figure on PCM appears on pp. 189.

 Noted in (Andrews 2011, pp. 15). A similar anecdote is related in (Sincoskie 2002, pp. 57) in relation to packetized video.

 These remarks at SIGSALY's formal opening are reproduced in (Boone and Peterson 2009, appendix A).

 These dates with reference to trials and line coding are in (Andrews 2011, pp. 16).

 One such tube is described in _The Science News Letter_ (1947), vol. 52, no. 15, pp. 229.

 Quoted from (Reeves 1965, pp. 58, 60). Reeves received the Stuart Ballantine Medal in 1965. In 1969, the British GPO issued a stamp that depicted sampling of a waveform.

 (Bellos 2010, chap. 1) describes different base systems including an enumeration of the first twenty numbers used by Lincolnshire shepherds.

 The word _bit_ was first printed in (Shannon 1948a, pp. 380). It was a term suggested by John W. Tukey.

 One such view is expressed in (Ryan 1996). A philosophical perspective is presented in (Carus 1896).

 This is from Blake's poem titled _Auguries of Innocence_ , which may be read online at <http://www.bartleby.com/41/356.html>. Accessed March 14, 2013.

 Quoted from (Hartley 1928, pp. 554).

 These ideas are standard in any textbook on digital communications. Sampling theorem is stated in (Proakis 2001, pp. 70-71). Nyquist criterion is in (Proakis 2001, pp. 556-561). (Tuller 1949, pp. 8-9) describes the technique of equalization. An excellent history of equalization is in (Falconer 2011).

 This was published after the war as (Wiener 1950) by MIT Press.

 The original report was titled "A Mathematical Theory of Cryptography," dated September 1, 1945. It was declassified after the war and published as (Shannon 1949b).

 Noted in (Shafer and Vovk 2005, chap. 1).

 Heisenberg is quoted in (Plotnitsky 2011, pp. 467). Plotnitsky claims that mathematics has been unreasonably effective in quantum mechanics. His work follows in the tradition of (Wigner 1960).

 It was Hungarian physicist Leó Szilárd who first made the link between information and entropy with reference to Maxwell's Demon, although at the time he did not use the word information. Wiener used the term negative entropy, as did L. Brillouin. Negative entropy is discussed in (Brillouin 1950). The difficulty of understanding the thermodynamic version of entropy prompted Karl Darrow to write a 24-page paper on entropy in the _Bell System Technical Journal_: (Darrow 1942).

 Quoted from (Shannon 1948a, pp. 398). Shannon's _symbol_ refers to the source rather than to signalling, which is how we have been using the word elsewhere (as in Chappe telegraphy). Shannon's symbol is what we had previously named character. Within information theory, Shannon's definition is commonly used.

 Quoted from (Shannon 1948a, pp. 388).

 Noted in (Shannon 1951, pp. 50). A readable account of redundancy can be found in (Partridge 1981).

 Noted in (Slepian 1974, frontispiece).

 Quoted from (Shannon 1948a, pp. 409-410).

 (Guizzo 2003, pp. 50) gives the comments of Doob on Shannon's work.

 This work is (Verdú 1998, pp. 2070-2071).

 Sources for these views are (Shannon 1956) and (Wiener 1956, pp. 48).

 Quoted in (Shilling and Fuller 1997).

 Quoted from (Shannon 1948a, pp. 379).

 For a biography of Weaver, refer to (Rees 1987).

 Quoted in (Shilling and Fuller 1997). Kettering perhaps said this in a different context.

###  Chapter 5

 A biographical article on Huffman appears in (Stix 1991).

 In practice, in modern computing, characters are given fixed-length codes by the Unicode Consortium for simplicity. In the basic set of CJK Unified Ideographs, two bytes are used per character. In the extensions, three bytes are used.

 Quoted from (Huffman 1952, pp. 1100).

 Quoted from (Stix 1991).

 Huffman coding was applied to PCM so that the quantization levels could be coded based on source statistics. However, A-Law and μ-Law prevailed due to their simplicity. (Proakis 2001, pp. 111-113) treats Huffman (entropy) coding applied to speech.

 The interview is (Colburn 2004).

 Although BMP format today has provision for compression, most tools and image editors do not support this.

 A readable history of lossless data compression can be found on IEEE Global History Network at  http://www.ieeeghn.org/wiki/index.php/History_of_Lossless_Data_Compression_Algorithms. Accessed November 17, 2012.

 Zeno's paradox is well known. It is discussed in (Bellos 2010, pp. 273-275) and (Ball 2010, pp. 25). Both sources say that Achilles will overtake the tortoise but we must keep in mind that this conclusion is physical and not mathematical.

 Rissanen had published on arithmetic coding three years earlier in (Rissanen 1976). Brief history of arithmetic coding and relevant precedence is found in (Langdon 1984, pp. 148-149).

 Other compression methods not mentioned include the Burrows-Wheeler Transform, which found usage in the bzip2 format popular with UNIX users. Prediction by Partial Matching (PPM) uses statistical modelling. Matt Mahoney created PAQ in 2002. PAQ enhances PPM by smartly combining multiple prediction models to achieve higher compression. Understandably, PAQ is slow but achieves very high compression. PeaZip is one tool for Windows that uses a PAQ variant.

 Quoted from (Golomb 1966, pp. 399-400).

 This limitation comes because SMS uses a low-rate channel. The limit is lower at 70 characters for languages such as Greek and Chinese because each of their symbols would require more than a single byte. A short history of SMS including its social impact is related in (Taylor and Vincent 2005, pp. 75-91).
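
The arithmetic behind the two limits, assuming the standard 140-octet SMS payload that these numbers imply:

$$140 \times 8 = 1120 \text{ bits} = 160 \times 7 \text{ bits} = 70 \times 16 \text{ bits},$$

so the payload holds 160 seven-bit characters or 70 sixteen-bit characters.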

 Quoted from Twitter blog at <http://blog.twitter.com/2012/03/twitter-turns-six.html>. Accessed November 17, 2012.

 The article titled "Girl writes English essay in phone text shorthand" appeared in _The Daily Telegraph_ , March 3, 2003. Online version is at  http://www.telegraph.co.uk/news/uknews/1423572/Girl-writes-English-essay-in-phone-text-shorthand.html. Accessed April 3, 2013.

 Much of this account is due to (Hutchinson 2001). Interestingly, a software glitch in the initial scan meant the loss of a line. A problem with the analogue-to-digital converter resulted in a slightly elongated image. So the top line of the image was corrupted to begin with. Lena's quote is taken from <http://www.lenna.org/playboy_backups2/31a.html> and it probably appeared in the November 1972 issue. Accessed November 17, 2012.

 Standardization was started way back in 1987 as noted in (Wallace 1992, pp. 2). Wallace also gives a good technical introduction to JPEG. JPEG was released as ITU-T Recommendation T.81 in 1992. Two years later it became ISO/IEC 10918-1 standard. Through the years, four more parts of the standard have been published.

 A good and easy-to-read introduction of KLT is in (Dony 2001). Dony also mentions that KLT was anticipated by the work of H. Hotelling in 1933. Hence it is also called _Hotelling Transform_ or _Principal Components Analysis (PCA)_.

 Despite this similarity, there are many underlying differences between JPEG and facsimile standards. JPEG has given additional provision to use arithmetic coding instead of Huffman coding. JPEG does not standardize Huffman codes, which means that the Huffman tables must be included in the JPEG file. While JPEG has no problem handling colour, facsimile standards for colour came only in the 1990s. By then, the Web and Internet fax were on the rise. Standard colour facsimile machines didn't turn out to be all that important.

 If we take the average word length in a typical English text as five letters per word, and since uncompressed text takes a byte per character, a thousand words mean 5 kB. Looking at it another way, a page in a paperback novel containing about 250 words translates to only 1.25 kB.

 This is mentioned in a Project Gutenberg e-book, (Lebert 2008).

 These descendants are H.263, MPEG-4 part 2, and MPEG-4 part 10 (also known as H.264 or AVC). H.263 was intended for video telephony and conferencing at bit rates below 64 kbps. MPEG-4 part 2 is popular with CCTV surveillance systems and includes object-oriented features. H.264 offers better compression than MPEG-2 or H.263 but requires high processing.

 (Butler 1992, pp. 163-164) argues that the lead in digital HDTV was taken by the US, where the approach was terrestrial rather than satellite broadcasting. This motivated US companies to compress data into 6 MHz of terrestrial TV bandwidth and thus led to better compression techniques.

 The lecture titled "2001, A Broadcasting Odyssey" was delivered on November 21, 1985. Flaherty acknowledges Arthur C. Clarke as the source of his thoughts. The quote is taken from a publication titled "A Perspective on Digital TV and HDTV" at <http://www.hdtvmagazine.com/archives/flahertyhist.html>. Accessed November 17, 2012.

 Titled _Winter Portrait_, this image is due to Christofer Andersson. It was downloaded from <http://www.flickr.com/photos/thisyear_/4323595976/>. Accessed March 4, 2013. The portrait was taken on January 31, 2010.

 (Bullington and Fraser 1959, pp. 353) mentions subsequent work from the mid-1940s due to A. C. Dickieson, P. G. Edwards, and others. Experimental model of A. E. Melhose came in 1950. It should be noted that (Brady 1965, pp. 16) treats a talkspurt as speech without pauses and in this case the average talkspurt is only 1.34 seconds.

 This fact is noted in a summary by (Jayant 1993, pp. 89).

 Two well-known papers on this subject are (Brady 1965) and (Brady 1969). Statistical multiplexing applied to speech is often called _Time Assignment Speech Interpolation (TASI)_.

 Quoted from (Cutler 1950a, column 2, lines 15-25).

 The inventor of DPCM, C. C. Cutler, had filed for a patent in June 1950 under the title _Quantized Transmission with Variable Quanta_ , US Patent 2,724,740. However, Cummiskey makes no mention of this in his own paper of 1973. An example of interpreting adaptive as changes to the prediction coefficients is (Atal and Schroeder 1970).

 Quoted in (Shilling and Fuller 1997).

 This example is taken from (Klatt 1987).

 Some such metrics in use today are _Mean Opinion Score (MOS), Perceptual Objective Listening Quality Assessment (POLQA)_ , and _Perceptual Evaluation of Video Quality (PEVQ)_.

###  Chapter 6

A detailed history of Codex Corporation is due to James Pelkey. This is available online at  http://www.historyofcomputercommunications.info/Organizations/Startups/Codex/CodexHome.html. Accessed December 4, 2012. Codex Corporation was acquired by Motorola in 1977.

 This is related by Lucky in an interview in (Hochfelder 1999b).

 Much of the information on Codex Corporation is from (Goldstein 1995) and (Pelkey). Of the latter, specific sections are 1.5, 3.4, and 3.6. These are at  http://www.historyofcomputercommunications.info/Book/BookIndex.html. Accessed December 4, 2012.

 Geometric interpretation can be found in (Shannon 1948b), (Shannon 1949a), and (Kotelnikov 1947).

 A comparison of different modulation schemes is in (Proakis 2001, pp. 282). (Haykin 2001, pp. 402) gives the bandwidth efficiency of M-ary FSK for different values of M. The use of FSK in SIGSALY is due to (Boone and Peterson 2009).

 The inventor's naming of the code is in (Gray 1947, column 3) of the patent text.

 Comparison of DPSK and BPSK can be found in (Proakis 2001, pp. 275). SNR of a telephone line is given in (Anderson 2005, pp. 114). The use of differential encoding in V.32 is mentioned in (Haykin 2001, pp. 421).

 Another modulation that minimizes phase jumps and hence out-of-band interference is π/4-QPSK. When encoding is done differentially, it is called π/4-DQPSK. This is so popular that it has been used by various wireless systems: NADC (North America), PDC (Japan), PHS (Japan/China), and TETRA (Europe).

 Although MSK is spectrally compact, it has a wider main lobe compared to QPSK or OQPSK. The main lobe contains the main signal bandwidth and hence the information. Side lobes around the main lobe are due to harmonics; they fall off faster in MSK. The wider main lobe of MSK implies that it may not be suitable for narrowband satellite links. A good overview of MSK is in (Pasupathy 1979). (Bennett and Rice 1963, pp. 2356) makes the point that spectral density falls off as the inverse fourth power of frequency for CPFSK. If the phase derivative too is continuous, it falls off more drastically, at the inverse sixth power. For MSK, the phase derivative is not continuous.

 GMSK has been adopted by various wireless standards: GSM900 (Europe), CDPD (US), DCS1800 (Europe), and PCS1900 (North America).

 Related by Hamming in an interview of 1977. The quote is taken verbatim as reproduced in (Thompson 1983, pp. 17) who also states that Hamming used a Model V computer. Early computers at Bell Labs were influenced by telephone switching. (Andrews 1963, pp. 349) gives a neat table comparing Model I through Model VI of early Bell Labs computers.

 The use of the van Duuren Code in radiotelegraphy is mentioned in (Hamming 1950, pp. 148). The author also mentions the use of the 2-out-of-5 code in Bell computers on pp. 147-148.

 Wikipedia QR code has been created by Qrc-designer, dated July 28, 2011. It is available at  http://en.wikipedia.org/wiki/File:Extreme_QR_code_to_Wikipedia_mobile_page.png. Accessed April 20, 2013.

 This comment appears in (Lee 1998, pp. 61).

 This is mentioned in (Thompson 1983, pp. 48). Thompson also states that in 1954 Golay gave a geometric rationale for the construction of his (23, 12) code.

 Trivial codes are the repetition codes with only two codewords (all ones or all zeros) of odd length. (Tietavainen and Perko 1971) proves the non-existence of unknown binary perfect codes. (Tietavainen 1973) extends this to the general case that includes non-binary perfect codes. (Tietavainen 1973, pp. 88) gives a short historical account of the progress made by others in this field.

 More detailed account of Galois's brief life can be found in (Bell 1986, chap. 20) and (du Sautoy 2008, chap. 7).

 Incidentally, Peterson is well known for publishing one of the first books on error-correcting codes. This appeared in 1961 from MIT Press. His most important contribution is perhaps the invention of _Cyclic Redundancy Check (CRC)_. Based on cyclic codes, CRC enables error detection. Today it is widespread in all digital communication systems.

 Other than interleaving, there are specific codes to correct burst errors. P. Fire invented his Fire Codes in 1959. Ten years later H. O. Burton came up with his own cyclic codes that could correct burst errors.

 These numbers are noted in (Shankar 1997, pp. 45-46). These are also summarized in a table due to (Wicker and Bhargava 1999, pp. 57).

 Noted in (Thompson 1983, pp. 182). Thompson also gives mathematical details of how Leech achieved this packing.

 The initial packing in 24-dimensional space, from 1964, was not the best. Leech published the densest packing three years later.

 Common notations for RM codes are ( _n_ , _k_ , _d_ ) where _d_ is the minimum distance; or ( _r_ , _m_ ) notation, which is helpful in code construction. In these notations, RM (32, 6, 16) is equivalent to RM (1, 5). RM codes for which _r_ =1 and _d_ = _n_ /2 are called _biorthogonal codes_.
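
For reference, the standard textbook parameters of an RM($r$, $m$) code are

$$n = 2^m, \qquad k = \sum_{i=0}^{r} \binom{m}{i}, \qquad d = 2^{m-r},$$

which for RM(1, 5) gives $n = 32$, $k = 1 + 5 = 6$, and $d = 16$, matching the (32, 6, 16) form above.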

 This view was put forward by one of his students, who wrote an obituary on Elias. This is in (Gallager 2008, pp. 3).

 These numbers are due to (Costello and Forney 2007, pp. 1161).

 Quoted from (Massey 1992, pp. 16).

 This short biographical account of Viterbi is due to (Bell 2006) and an interview of Viterbi in (West 2006).

 That the use of BSC instead of AWGN delayed the coming of soft-decision decoding is supported by (Massey 1992).

 Noted in (Sher 2002, pp. 11).

 The Pioneer 10 and 11 missions are described in (Hall 1975) and (Dyer 1980).

 Noted in (Costello and Forney 2007, pp. 1162).

 Quoted from (Viterbi 1992, pp. 5).

 These views are from an interview of Ungerboeck in (Nebeker 2004).

 This claim about LDPC is made in (Biglieri 2005, pp. 10) with references to a paper of 2001. The same claim is made in (Costello and Forney 2007, pp. 1170).

 Quoted from (Costello and Forney 2007, pp. 1174).

###  Chapter 7

 It is customary to write plaintext in lower case and ciphertext in upper case.

 This is simply 26 x 25 x 24 x ... x 2 x 1, or 26! ≈ 4 x 10^26.
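
A one-line check of this count:

```python
import math
print(math.factorial(26))  # 403291461126605635584000000, about 4 x 10^26
```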

 Here is an important discrepancy that has not been satisfactorily explained by historians. (Singh 2000, chap. 1) claims that the courier Gilbert Gifford was a double agent. If all messages went through Gifford, there could have been no other way for Mary and Babington to exchange the workings of the nomenclator. If they had done this through Gifford, then Phelippes must have been aware of the details of the nomenclator. Phelippes' role as a cryptanalyst is therefore greatly diminished.

 (Pesic 1997, pp. 677) refers to al-Khalīl ibn Ahmad. Reference to al-Kindi is in (Singh 2000, pp. 19).

 (Kahn 1980a, pp. 122) refers to the work of Leon Battista Alberti published in 1466. Alberti's idea was later developed by Johannes Trithemius and Giovanni Porta as noted in (Singh 2000, pp. 46). (Kahn 1980a, pp. 123) notes that a simpler form of one-to-many substitution existed in Europe since the late fourteenth century.

 Examples of such codebooks include _World-Wide Travellers' Cipher Code_ (1901) and _The National Coal Association Telegram Code_ (1918). These are mentioned in <http://www.retro-gram.com/telegramhistory.html>. Accessed December 19, 2012. (Gleick 2011, pp. 253-258) mentions a few other codebooks for telegraphy.

 It is in Arthur Conan Doyle's _The Adventure of the Dancing Men_ that Sherlock Holmes undertakes an interesting cryptanalysis. Jules Verne weaves a cipher into his novel _La Jangada_. The deciphering of this cipher is related in (Gass 1986). Edgar Allan Poe's _The Gold Bug_ involves a ciphered message that leads to a treasure.

 Quoted in (Luciano and Prichett 1987, pp. 2) and partly in (Wimsatt 1943, pp. 776), which refers to its original publication in Graham's Magazine, vol. XIX, no. 33, July 1841.

 The image is available in public domain at  http://commons.wikimedia.org/wiki/File%3AZimmermann_Telegram.jpeg. Accessed March 10, 2013. Original document is dated January 19, 1917. Digital version is dated February 22, 2011.

 Quoted from (Vernam 1918, pp. 8).

 In 1888, French cryptographer Marquis De Viaris expressed polyalphabetic ciphers mathematically. This is noted in (Beaulieu 2008, pp. 270), which refers to (Kahn 1967, pp. 240-242).

 Today it is recognized that the Hill cipher is prone to known plaintext attacks. It has been found to be unsuitable for encrypting images, particularly those with large areas of uniform features. (Ismail et al. 2006) proposes one way to update the keys automatically with every block.

 The Enigma cipher is described along with the work of Polish cryptanalysts in (Christensen 2007). Description of the Geheimschreiber and its cryptanalysis by Sweden's Arne Beurling is in (Ulfving and Weierud 1999).

 The Geheimschreiber image is available in public domain at <http://commons.wikimedia.org/wiki/File:STURGEON.jpg>. Accessed March 10, 2013. Original image is claimed to be from an NSA website. The Enigma image is from  http://commons.wikimedia.org/wiki/File:Enigma_Verkehrshaus_Luzern_cropped.jpg. Accessed March 10, 2013.

 These terms are introduced in (Shannon 1949b, pp. 708-710).

 This context involving the banking industry is noted in (Kahn 1979, pp. 151). IBM's research in this area is noted in (Diffie and Hellman 1977, pp. 74).

 Strictly speaking, the keys used in each round are derived from an original shared secret key, but both parties follow the same algorithm to derive these _round keys_.

 Quoted from (Kahn 1979, pp. 150). Another source of this discussion is (Dam and Lin 1996, appendix E, pp. 414-420).

 The quote is due to Alan Westin in (Westin 1967, pp. 7). A general discussion on the history of privacy is in (Holvast 2009). Different views of privacy are expressed by Simson Garfinkel (pp. 104-111) and Maureen Sirhal (pp. 124-128) in (Torr 2003).

 These comments are from the journal editor and a referee of Merkle's submission. Comments are reproduced from an interview of Merkle conducted by Arnd Weber on May 18, 1995, available at <http://www.itas.fzk.de/mahp/weber/merkle.htm>. Accessed December 20, 2012.

 Hellman expressed these views in an introduction to a reprint of his classic 1978 paper (Hellman 1978). The reprint appeared in May 2002, _IEEE Communications Magazine_ , 50th Anniversary Commemorative Issue. Hellman's association with Feistel and introduction to Diffie is noted in an interview in (Plutte 2011b, pp. 3-4).

 Quoted in (Shilling and Fuller 1997).

 Quoted from (Diffie and Hellman 1976, pp. 644).

 Quoted from (Hellman 1978, pp. 26).

 This is noted in an interview of Merkle conducted by Arnd Weber on May 18, 1995, available at <http://www.itas.fzk.de/mahp/weber/merkle.htm>. Accessed December 20, 2012. The version of the interview transcript is dated January 16, 2002.

 (Shamir 1982) presents an algorithm to break the Merkle-Hellman system. Other knapsack-based cryptosystems are described in (Chor and Rivest 1985), (Morii and Kasahara 1988), and (Murakami and Nasako 2007).

 The RSA method first appeared in a memorandum of April 1977 but reached the public domain only in 1978. The memorandum is noted in (Gardner 1977, pp. 122).

 (Singh 2000, pp. 288) claims that British cryptographers working with the Government Communications Headquarters (GCHQ)—James Ellis, Clifford Cocks, and Malcolm Williamson—had invented something similar to RSA by 1975. They could not reveal their work because it was classified.

 Quoted in (Russo et al. 1993).

 From the perspective of communication protocols, online security is implemented using _Secure Sockets Layer (SSL)_ and its later IETF standardization called _Transport Layer Security (TLS)_. The SSL/TLS security layer sits in between the application and TCP/IP. Web applications use HTTP (http://) protocol. When this is secured via SSL/TLS, we see HTTPS (https://) on our browsers.

 A note on Zimmermann can be found in (Dam and Lin 1996, pp. 164).

 The attack of 1998 is described at <http://www.isaac.cs.berkeley.edu/isaac/gsm-faq.html>. Accessed April 4, 2013. IBM's method is described in (Rao et al. 2002).

 (Tews et al. 2007) introduces the PTW attack for WEP. The original attack is due to (Fluhrer et al. 2001). The Chopchop WEP attack was posted online by KoreK in 2004 at  http://netstumbler.org/unix-linux/chopchop-experimental-wep-attacks-t12489.html. Accessed April 15, 2013.

 This is noted at <http://www.rsa.com/rsalabs/node.asp?id=2098>. Accessed March 12, 2013.

 This is noted at <http://www.rsa.com/rsalabs/node.asp?id=2004>. Accessed March 12, 2013.

###  Chapter 8

 Quoted from (Babbage 1864, chap. VIII): _Of the Analytical Engine_. Quote is reproduced from an online version of the book at <http://www.fourmilab.ch/babbage/lpae.html>. Accessed January 2, 2013.

 This is narrated by Babbage himself in (Babbage 1864, chap. VIII): _Of the Analytical Engine_.

 Swedish engineers Georg and Edvard Scheutz built a version of the Difference Engine through the 1840s and the 1850s. Towards the end of the 1840s, Babbage himself designed a new version of the original Difference Engine that benefited from the design of the Analytical Engine. Difference Engine II was finally completed in 2002 to Babbage's design by the Science Museum of London.

 Elements of switching in telephony are also found in railway networks. Nonetheless, it is telephony that's considered to have directly influenced switching theory and digital computing that followed.

 Some writers including Claude Shannon in (Shannon 1938) adopted the opposite convention of using zero for a closed relay and one for an open relay.

 Details of the complex number calculator are taken from  http://trillian.randomstuff.org.uk/~stephen/history/timeline-NONMECH.html. Accessed January 5, 2013. (Baker 1977, pp. 364-365) also talks about the Complex Calculator. The development of this calculator from telephone switching is covered in (Andrews 1963), which includes pictures of the teletypewriter used and the computing equipment.

 If there is a carry involved, we need to add three. The explanation is that the raw binary sum of two excess-three digits carries an excess of six. When a carry happens, the decimal sum is ten or more; the digit to keep is the sum minus ten, stored in excess-three. Relative to the raw sum, this means subtracting thirteen: the ten that moved into the carry plus the surplus three. Subtracting thirteen is equivalent to subtracting sixteen (accomplished by dropping the carry bit) and adding three. For the advantages of excess-three BCD, refer to (Glaser 1971, pp. 141-143).
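
A minimal sketch of the correction rule for a single pair of digits (the helper function is hypothetical, written for illustration):

```python
def excess3_add_digit(a, b):
    """Add two decimal digits held in excess-3; return (carry, excess-3 digit)."""
    raw = (a + 3) + (b + 3)        # 4-bit binary sum of the two excess-3 codes
    if raw >= 16:                  # carry out of the nibble: decimal sum >= 10
        return 1, (raw - 16) + 3   # drop the carry (subtract 16) and add 3
    return 0, raw - 3              # no carry: result is excess-6, so subtract 3

carry, digit = excess3_add_digit(7, 5)   # 7 + 5 = 12
print(carry, digit - 3)                  # 1 2
```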

 The original Baudot Code was standardized as _International Telegraph Code No. 1_. The popular Murray Code often used in teletypes was standardized as _International Telegraph Code No. 2_. Historically, the distinction was forgotten. Both were simply referred to as Baudot Code.

 Quoted from (Eccles and Jordan 1918, pp. 1).

 Paper tape image is from <http://commons.wikimedia.org/wiki/File:Baudot_Tape.JPG>. Accessed April 25, 2013. It is attributed to Ricardo Ferreira de Oliveira. Hollerith punch machine is from <http://commons.wikimedia.org/wiki/File:CTR_census_machine.JPG>. Accessed April 25, 2013. This photograph is from March 24, 2005. It is available in public domain.

 These numbers are from (Hollingdale and Tootill 1965, pp. 62-63). Weight of the machine is mentioned at <http://en.wikipedia.org/wiki/History_of_computing_hardware>. Accessed December 2, 2012.

 This was von Neumann's first draft titled "First Draft of a Report on the EDVAC." This draft and later revisions are said to contain many typographical errors. Michael D. Godfrey corrected many errors and republished the same as (von Neumann 1993).

  http://www.computerhistory.org/semiconductor/timeline/1931-The-Theory.html. Accessed January 2, 2013.

 Quoted in (Shurkin 2008, pp. 89). The comment is also noted at <http://www.auuuu.com/computereducation/history>. Accessed April 3, 2013.

 Shockley's initial idea came in 1939 and a refinement was sketched in his notebook in April 1945. This is noted in (Hoddeson 1981, pp. 60-62). Julius Lilienfeld patented a field-effect device in 1926. It used copper sulphide but it was not manufactured. Oskar Heil patented a similar device in 1934 but it too was not translated into a working prototype. These earlier developments are noted in  http://www.computerhistory.org/semiconductor/timeline/1926-field.html. Accessed January 2, 2013.

 These numbers are due to (Hoddeson 1981, pp. 74-75). By early 1948, researchers at Purdue University had noted the role of minority carriers but did not observe amplification because the electrodes were too far apart.

 Vacuum tubes continue to survive in specific applications including television display tubes and X-ray radiation tubes.

 A prototype computer at Manchester University is said to be the first to use transistors. Early applications of the transistor are noted in  http://www.computerhistory.org/semiconductor/timeline/1952-Consumer.html and  http://www.computerhistory.org/semiconductor/timeline/1953-transistorized-computers-emerge.html. Accessed January 2, 2013.

 Leonard Kleinrock recalls such a comment from his college days. He mentions this in an interview in (Vardalas 2004).

 (Huffman 1954a, pp. 161-162) refers to G. A. Montgomerie for a 1948 description of a _table of combinations_. (Vingron 2004) credits L. Wittgenstein for inventing truth tables.

 Karnaugh was inspired by the work of E. W. Veitch whose _chart method_ appeared in 1952. However, K-maps endured because they were easier to use.

 This example appears in (Brand and Sherlock 1973, pp. 5-6).

 Quoted from (Huffman 1954a, pp. 174).

 These developments and the difficulties of wiring are noted in (Brunetti and Curtis 1947).

 Quoted in (Kilby 1967, pp. 649). This was at a symposium of the American Institute of Electrical Engineers.

 Details on these developments including names of inventors are noted in an online history of semiconductors at <http://www.computerhistory.org/semiconductor/timeline.html>. Accessed January 2, 2013.

 This view is expressed by Jay Lathrop in an interview in (Morton 1996).

 This is noted in (Kilby 1967, pp. 652). Later in his Nobel lecture of 2000, Kilby mentions the suitability of titanium nitride for resistors and Teflon for capacitors. This is noted in a compilation of Nobel lectures in physics by (Ekspong 2002, pp. 480).

 The magnified internal view of the chip is due to Angeloleithold. Photograph is dated 2004. Obtained from <http://commons.wikimedia.org/wiki/File:IC_Nanotecnology_2400X.JPG>. Accessed May 1, 2013.

 These views are expressed in (Negroponte 1995, pp. 75-76).

 Quoted from (Bassett 2002, pp. 3-4). Bassett has written in detail on the history of MOS technology with focus on Bell Labs, RCA, Fairchild, and IBM.

 These details of PDP-8 are from (Ceruzzi 2003, pp. 129-136).

 (Ceruzzi 2003, pp. 192) gives a neat table listing some major minicomputer vendors in the period from 1965 to 1974.

###  Chapter 9

 The quote is from Ada Lovelace's Note A and Note G in (Menabrea 1842).

 Quoted from (Menabrea 1842).

 This is claimed in (Wheeler 2012).

 These names were acronyms derived from descriptions of the languages—Formula Translation (FORTRAN), Common Business Oriented Language (COBOL), List Processing (LISP), and Algorithmic Language (ALGOL).

 Noted in (Ingerman 1967).

 An earlier definition of a compiler was an automated program that put together common pieces of code to form a complete program that solved a specific problem. This is noted in (Ceruzzi 2003, pp. 85).

 Noted in (Ceruzzi 2003, pp. 92).

 Quoted from the IBM publication _Our History of Progress (1890s to 2001)_ , 2008, pp. 54, 56. Online at <http://www-03.ibm.com/ibm/history/interactive/ibm_ohe_pdf_13.pdf>. Accessed January 23, 2013.

 That System/360 failed as a time-sharing system is noted in (Usselman 2009, pp. 265).

 (Usselman 2009, pp. 266) writes about antitrust action against IBM and its influence on unbundling software.

 These various definitions of software are noted in (Haigh 2002, pp. 5-6).

 Some of these views are expressed in (Haigh 2002), which gives a detailed summary of software evolution through the sixties.

 Quoted from (Dijkstra 1968, pp. 147).

 (Haigh 2010, pp. 16) gives a count of attendees at the conference and notes that most of them were from universities and research groups. Very few were project managers or active programmers. The very definition of software engineering was open to interpretation even into the 1980s as noted in (Mahoney 1990, pp. 327).

 An overview of structured programming is in (Mills 1977, pp. 1203-1204). Important ideas of structured programming were assertions (R. W. Floyd), Hoare logic (C. A. R. Hoare), and predicate transformers (Edsger Dijkstra).

 Quoted from (Ritchie and Thompson 1978, pp. 1927).

 Brian Kernighan, one of the gurus of C programming, comments in (Kernighan 2008) that even though sophisticated GUI-based tools are available these days, he still prefers the powerful and robust tools of the UNIX world for simple tasks.

 Quoted from (Ritchie and Thompson 1978, pp. 1928).

 Thompson noted in 1978 that the UNIX kernel was written with 10,000 lines of C code and another 1,000 lines of assembly code. This is from (Thompson 1978, pp. 1931).

 Quoted in (Mahoney 1990, pp. 334).

 Details are noted at <http://digital-law-online.info/lpdi1.0/treatise17.html>. Accessed April 4, 2013. There is some ambiguity about the inclusion of software under the Copyright Act of 1976. Hence, 1980 is considered the year in which software actually became protected. This is noted in (Johnson 1994, pp. 327) and (Samuelson 2011, pp. 1746).

 These details are in an interview of Hoare in (Shustek 2009, pp. 39).

 (Wheeler 2012) credits Maurice Wilkes, Stanley Gill, and David Wheeler for developing the concept of subroutines back in 1952.

 This is noted on a Stanford University webpage at <http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2008-09/tony-hoare/quicksort.html>. Accessed January 26, 2013.

 BASIC stands for Beginner's All-Purpose Symbolic Instruction Code. It was invented in 1964 by John Kemeny and Thomas Kurtz at Dartmouth College. The letter arrived so early in January because the January issue of _Popular Electronics_ had been in stores since December 1974.

 Noted in (Allen 2012, pp. 77) by Paul Allen, one of the co-founders of Microsoft.

 The concept of personal computers originated with a remark by John W. Mauchly, noted in an article titled "Pocket Computer May Replace Shopping List" that appeared in _The New York Times_ on November 3, 1962. Though early electronic calculators were sometimes termed computers, the modern realization of the personal computer came with the microcomputer of the late 1970s.

 This comparison is made in (Allen 2012, pp. 126).

 (Campbell-Kelly and Aspray 2004, pp. 224) quotes these numbers with a similar argument.

 Quoted from (Allen 2012, pp. 109).

 Quoted from (Allen 2012, pp. 130).

 This is with reference to the Cape Cod System and its evolution to the SAGE system. These are described later in chap. 10. Online references on these systems are at <http://www.ll.mit.edu/about/History/capecodprototype.html> and <http://www.ll.mit.edu/about/History/SAGEairdefensesystem.html>. Accessed April 9, 2013.

 Noted in (Ceruzzi 2003, pp. 260).

 Noted by Alan Kay in a history of Smalltalk in (Kay 1993, pp. 71).

 Quoted from (Kay 1993, pp. 75).

 Barry Boehm gives a short description on SW-CMM in (Goth 2008, pp. 9). Other developments in Software Engineering can be found in (Northover et al. 2008) and (Boehm 2006).

 Mentioned in (Ante 2010) in an online version of _Wall Street Journal_. Accessed December 30, 2012.

###  Chapter 10

 Manual switching is described in (Craft et al. 1923, pp. 53-59).

 For _N_ users, one-to-one dedicated circuits would require _N_(_N_-1)/2 circuits.
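 As a quick illustration, the quadratic growth of this formula can be checked with a few lines of Python (the function name here is merely illustrative):

```python
def circuits_needed(n):
    """Dedicated circuits required to connect every pair of n users."""
    return n * (n - 1) // 2

# Full pairwise interconnection quickly becomes impractical:
for n in (10, 100, 1000):
    print(n, circuits_needed(n))  # 10 -> 45, 100 -> 4950, 1000 -> 499500
```

 This quadratic growth is why shared switching exchanges, rather than dedicated pairwise circuits, became the economical choice.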

 This is noted in Wikipedia: <http://en.wikipedia.org/wiki/Almon_Brown_Strowger>. Accessed February 3, 2013. This is attributed to Katherine Thompson in her history of Penfield, where Strowger was born. A good deal of biographical information on Strowger is found at <http://www.almonstrowger.com>. Accessed February 3, 2013.

 Image was originally uploaded on April 17, 2007. Attributed to Brock Craft, alias Thatbrock. It was obtained from <http://commons.wikimedia.org/wiki/File:Uniselector_Stepper_detail.jpg>. Accessed March 22, 2013.

 (Andrews 1963, pp. 341-342) mentions both panel and rotary switches. He also credits panel switches for encoding decimal pulses to forms more suitable for machine processing. A detailed description of panel switches appears in (Craft et al. 1923, pp. 64-74).

 This example is pictorially depicted in (Scudder and Reynolds 1939, pp. 93). This paper is also a good introduction to the workings of a crossbar exchange.

 (Wiegand 1944) gives an overview of markers within crossbar switches.

 Sheldon Hochheiser notes this in a history of electromechanical switching at <http://www.ieeeghn.org/wiki/index.php/STARS:Electromechanical_Telephone-Switching>. Accessed April 9, 2013.

 Quoted from (Keister et al. 1964, pp. 1841).

 ESSEX is described in (Vaughan 1959). Early work with switching electronics is in (Malthaner and Vaughan 1952) and (Joel 1956b).

 This is noted in (Keister et al. 1964, pp. 1840). An early description of SPC for switching is in (Seckler and Yostpille 1958). Cathode ray tubes used for program storage are described in (Hoover et al. 1958) and (Purvis et al. 1959).

 Work on signalling is described in (Weaver and Newell 1954) and (Breen and Dahlbom 1960). The latter work compares different systems of tone signalling on pp. 1436. Touch Tone is described in (Meacham et al. 1958).

 This is mentioned in a biography of Deloraine by (Chapuis and Joel 1990, pp. 306-307).

 A detailed reading of ESSEX design clarifies this point in (Vaughan 1959). This view is supported by (Chapuis and Joel 1990, pp. 308).

 This view is expressed by Philip C. Richards at <http://www.ieeeghn.org/wiki/index.php/First-Hand:Event_in_Telecom_Switching_Development>. Accessed February 3, 2013.

 No. 101 ESS is described in (Browne et al. 1969). The switching structure of Moorgate exchange is noted in (Chapuis and Joel 1990, pp. 316).

 System overview of No. 4 ESS is in (Ritchie and Tuomenoksa 1977). CCIS is described in (Croxall and Stone 1978).

 (Martersteck 1981, pp. 1044) notes the use of EPL and EPLX.

 In a different context, radio engineers working on the first transatlantic radio telephone link in the 1920s used the same principle to conserve bandwidth. On a voice-switched basis, the same band was used for communication in both directions. This is noted in (Brown 1927, pp. 254).

 These details are documented in (Horton and Vaughan 1955).

 Noted in (Alexander et al. 1960, pp. 461).

 (Ketterling 2004, pp. 74) notes that emergency services clock lower call durations. It is also typical for call centre enquiries to be longer than the average. (Braver et al. 2009, pp. 5) notes that the average call duration from mobiles while driving is about four minutes.

 Baran mentions this in an interview in (Baro 2003, pp. 29).

 Quoted in (Cordeschi 2002, pp. 163).

 Quoted from (Boehm and Baran 1964, pp. 1-2).

 Kleinrock's background and the early years leading to his entry into MIT are noted in an interview in (Vardalas 2004).

 When Davies wrote his paper (Davies 2001), he left explicit instruction that it should be published only after his death. Kleinrock launched a webpage in 1996 in which he defended his own claim to the invention. This is noted in an article in _The New York Times_ dated November 8, 2001, available online at <http://www.nytimes.com/2001/11/08/technology/a-paternity-dispute-divides-net-pioneers.html>. Accessed February 18, 2013.

 An online reference to the quote is at <http://www.quotationspage.com/quote/367.html>. Accessed April 3, 2013.

 Early history of packet switching in the UK is described in (Kirstein 2009).

 This claim has been challenged by Shiva Ayyadurai of the University of Medicine and Dentistry of New Jersey (UMDNJ), who developed a full-fledged email package in 1978. Ayyadurai claims that everything that happened before that was simply message exchange bearing little resemblance to the modern form of email. Details are at <http://www.nethistory.info/History of the Internet/email.html>. Accessed February 20, 2013.

 This event is noted in (Naughton 2001, pp. 140-141).

 Noted in (Abramson 1970, pp. 7).

 Quoted from (Metcalfe and Boggs 1980, pp. 2).

 This is noted in (Kirstein 2009, pp. 24) in the context of LAN development in the UK.

 Strictly speaking, by 1972, ARPA had been renamed the Defence Advanced Research Projects Agency (DARPA). Interestingly, in 1993 it was renamed ARPA and in 1996 back to DARPA. These are noted at <http://www.darpa.mil/About/History/ARPA-DARPA__The_Name_Chronicles.aspx>. Accessed April 4, 2013.

 Quoted from (Gray 2005, pp. 89). The role of voice in the birth of TCP/IP is described in the same article. A more detailed history is in (Gray 2009b).

 The quote occurs in the transcript of a speech at New York University on May 29, 2001. This is reproduced in (Gay 2002, pp. 165).

 This number is taken from a timeline exhibition at the ACM/IEEE CS SC97, Supercomputing 97 Conference. Available online at <http://www.computerhistory.org/internet_history/internet_history_80s.html>. Accessed February 10, 2013.

 Quoted from an online version of (Bush 1945a) from the archives of _The Atlantic Monthly_.

 The term hypertext was coined by Ted Nelson in 1963. Apple's HyperCard of 1987 may be seen as an early example of implementing this concept.

 For the layperson, a description of ISDN appears in (Chapuis and Joel 1990, pp. 506-507). A short chronology of ISDN appears on pp. 506. A view of ISDN in its initial years of deployment is in (Joel 1984, pp. 70).

 These numbers are from (Joel 2002, pp. 14). Downstream refers to the direction from network to subscriber. Upstream is from subscriber to network.

 Quoted from (Sincoskie 2002, pp. 57). Sincoskie wrote this article when he was with Telcordia Technologies.

 Quoted from (Okwit 1984, pp. 1079).

 These numbers are from (Chapuis and Joel 1990, pp. 530). Similar numbers are in (Hayes 2008b, pp. 46) where capacity is quoted in bits per wavelength per optic fibre. Note that TAT-1 initially supported 36 calls, a number that later grew to 48 and then 72 through bandwidth optimization and statistical TDM.

 (Chang et al. 2010, pp. S53) mentions these codes. Typical BER requirements in optical transport networks are on pp. S52.

 These numbers are reported in (Cooper 2012).

###  Chapter 11

 This view is expressed by Hertz in a paper from 1884, reproduced in (Hertz 1896, pp. 274). The paper is titled "On the Relations Between Maxwell's Fundamental Electromagnetic Equations and the Fundamental Equations of the Opposing Electromagnetics."

 A short biography of Maxwell that also introduces electromagnetic wave theory is in (Newman 1961, pp. 139-193). A definitive biography is (Campbell and Garnett 1882).

 The work of Savary is noted in (Blanchard 1941, pp. 415-416).

 A commentary that points out the significance of Thomson's works is in (Lodge 1931, pp. 512-513).

 This view is expressed in (Sengupta and Sarkar 2003, pp. 16). On pp. 17 they credit FitzGerald and Lodge for the generation of electromagnetic waves other than light.

 Quoted from (Lodge 1931, pp. 513-514).

 The statement was made at a meeting of the German Association for the Advancement of Natural Science and Medicine, at Heidelberg, on September 20, 1889. The translation of this lecture is printed in (Hertz 1896, pp. 313-327). Quote is from pp. 322. Lecture is titled "On the Relations Between Light and Electricity."

 Loomis's US patent is No. 129,971. Edison's US patent is No. 465,971.

 D. E. Hughes used the effect to "listen" to the waves on a telephone receiver. He did not publish the results of his 1879 experiments. He gives a first-hand description of the experiments in (Fahie 1899).

 An historical account of Branly's work on the coherer is in (Dilhac 2009). Oliver Lodge credits himself for discovering the coherer principle in (Lodge 1931, pp. 518), but credits Branly for making it practical. (Hong 1994, pp. 725) supports this view. It was Branly who began using it as a radio detector.

 (Hong 1994, pp. 718-719) mentions that Hugh Aitken, in his book _Syntony and Spark: The Origins of Radio_, credited William Crookes and Oliver Lodge for wireless telegraphy. Hong refutes this claim and makes the case in favour of Marconi.

 This development is noted in (Dilhac 2009, pp. 22).

 William Crookes and Nikola Tesla have also been credited for tuned transceiver designs in the early 1890s before the works of Lodge and Marconi.

 (Schwartz 2008) describes the contribution of Squier to carrier multiplexing. The author notes that AT&T did not give due credit to Squier and argues in Squier's favour.

 While broadcasting is an old term, the terms unicasting and multicasting came later in the days of the Internet and IP technology. An example of multicasting is the online web chatroom. A comment made by anyone within the chatroom goes to the server, which then updates all users currently in the chat. Though chatrooms are multicasting at the application level, they work over HTTP's connectionless point-to-point client-server model.
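 As a rough sketch of the contrast, a network-level multicast sender addresses a whole group rather than a single peer; the group address, port, and payload below are hypothetical placeholders:

```python
import socket

# Send one UDP datagram to an IP multicast group. Every host that has
# joined the group receives it, unlike a point-to-point HTTP request
# that reaches exactly one server.
GROUP, PORT = "239.1.2.3", 5007  # hypothetical group address and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Keep the datagram within the local network (TTL of 1).
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
sock.sendto(b"hello, everyone in the group", (GROUP, PORT))
```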

 These numbers are from (Joel 2002, pp. 12).

 The inset is an original creation. The photograph is due to Holger Ellgaard. Image is dated 2008. It was obtained from <http://commons.wikimedia.org/wiki/File:Kristallradio.JPG>. Accessed April 14, 2013.

 The shortwave transatlantic milestone of ham radio operators is noted at <http://www.radioclubofamerica.org/history.php?page=1921.html>. Accessed April 4, 2013. AT&T's milestone is documented at <http://www.corp.att.com/attlabs/reputation/timeline/27atlan.html>. Accessed March 7, 2013. A description of the technologies used in AT&T's transatlantic voice service is in (Brown 1927). Here it is mentioned that the service used a 3 kHz bandwidth on a 60 kHz carrier in half-duplex, voice-switched mode.

 Heising's work is noted in (Messerschmitt 1997). On pp. 749, the author notes that this was the first use of two orthogonal carriers. An early FDM carrier of 1918 between Baltimore and Pittsburgh used AM-SSB modulation as noted in (Joel 2002, pp. 8). Heising's own publication in relation to the transatlantic project is (Heising 1925).

 Armstrong's US patent on positive feedback is No. 1,113,149. An overview of positive feedback and regenerative detectors is in (Myers and Dorbuck 1977, pp. 70, 241).

 History of the superheterodyne architecture is in (Schottky 1926) and (Armstrong 1924).

 One of the engineers who worked on the project gives a technical description in (Mofenson 1946). This work is also related in (Grieg et al. 1948).

 (Stockman 1948) and (Mofenson 1946) note this relationship.

 This is noted in (Schick 1963, pp. 14). This was the Committee on the Peaceful Uses of Outer Space.

 Quoted in (Pelton 2010, pp. 26). SCORE is an acronym for Signal Communications by Orbiting Relay Equipment.

 A first-hand description of ECHO is in (Pierce 2007).

 Telstar is described at first hand in (Pierce 1990). Telstar launch dates are in (Joel 2002, pp. 12).

 Pierce expressed this view in a three-part interview in (Goldstein 1992). A short biography of Pierce published by the National Academies Press appears at <http://www.nap.edu/readingroom.php?book=biomems&page=jpierce.html>. Accessed April 14, 2013.

 The laws have been reproduced from <http://www.quotationspage.com/quotes/Arthur_C._Clarke>. Accessed February 28, 2013. A variation of the second law appears in the same source as, "The only way of finding the limits of the possible is by going beyond them into the impossible." Another variation appears at <http://www.clarkefoundation.org/sample-page/sir-arthurs-quotations> as, "The limits of the possible can only be defined by going beyond them into the impossible." Accessed February 28, 2013. These laws appeared in various essays that were collectively published in later editions of _Profiles of the Future_. The first edition of this book appeared in 1962.

 This context is noted in (Pelton 2010, pp. 29).

 This is narrated in (Puente 2010, pp. 18).

 This fact is noted in (Chapuis and Joel 1990, pp. 531).

 (Smith 1989) describes how in INMARSAT aeronautical systems TDM/TDMA was chosen for data and SCPC for voice.

 (Okwit 1984, pp. 1079) notes the use of cryogenics. Photographs of cryogenically cooled amplifiers used in Intelsat are shown on pp. 1080.

 The term is used in (Young 1979, pp. 11). (Frenkiel 2010, pp. 18) credits Phil Porter of Bell Labs for this innovation and uses a simpler term for it—"dial-then-send."

 These numbers are due to (MacDonald 1979, pp. 32). Higher numbers are quoted in (Kucar 1991, pp. 76) based on 50 MHz of spectrum allocation. This implies that additional spectrum was allocated in later years to AMPS.

 Quoted from (Frenkiel 2009, pp. 33). The quote relates to cellular work that was started in Bell Labs in 1966.

 An evolution of DSPs is given in (Eyre and Bier 2000). This includes the technical parameters that set them apart from generic microprocessors. The earliest DSP was from TRW in 1973 but it was too expensive for its time to become a commercial success.

 After the first GPRS specifications were standardized in 1997, it took a few years for the service to enter commercial networks. Typical data rates experienced by users were usually far lower than the theoretical maximum.

 These developments are noted in (Ceruzzi 2003, pp. 287-290).

 A short history of ARM is in (Levy 2005).

 This fact is noted in (Chatfield 2011, pp. 74).

 Quoted in (Erdman 1993, pp. 48).

 Global CDMA subscriber base as of Q4 2011 was accessed at <http://www.cdg.org/worldwide/cdma_world_subscriber.asp>. Accessed April 15, 2013. Numbers relating to GSM are from an article of October 2012 at <http://www.gsma.com/newsroom/gsma-announces-new-global-research-that-highlights-significant-growth-opportunity-for-the-mobile-industry>. Accessed April 15, 2013.

 An historical summary of cellular standards up to 3G appears in (Poole 2006, pp. 10-11).

 Cooley gives a first-hand account of the history of FFT in (Cooley et al. 1967) and (Cooley 1987). (Heidemann et al. 1985) describes the work of Gauss. The work of Gauss in relation to FFT was first published posthumously in 1866.

 Such a standard is the IEEE 802.11u. IEEE has also introduced standards to cater for greater mobility—802.11r for handovers and 802.11p for fast moving vehicles.

###  Chapter 12

 These are due to (Hilbert and Lopez 2011, pp. 60, 63).

 This is noted on pp. 130 in an Ofcom report "International Communications Market Report 2012," published in December 2012. Report is available at <http://stakeholders.ofcom.org.uk/binaries/research/cmr/cmr12/icmr/ICMR-2012.pdf>. Accessed April 15, 2013.

 Quoted from (Meddeb 2010, pp. 86).

 These traffic categories are part of IEEE 802.11e, which attempts to provide QoS on WLAN. Similar categories exist in other technologies such as Virtual LAN, WiMAX, and 3G. In general, these may be referred to as _Class of Service (CoS)_, whereby QoS is tied to the traffic class rather than the source or destination of the traffic flow. DiffServ, IntServ, RSVP, and MPLS are some of the protocols used in relation to QoS. Even within an IP packet header, the _Type of Service (TOS)_ field caters for service differentiation. DiffServ has reengineered this field for its own purposes.
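 As a minimal sketch, assuming a Linux host (the peer address, port, and payload are placeholders), an application can request class-based treatment by writing a DSCP value into this TOS byte of its outgoing packets:

```python
import socket

# Mark outgoing UDP packets as Expedited Forwarding (DSCP 46), the
# class conventionally used for voice traffic. The 6-bit DSCP value
# occupies the upper bits of the former TOS byte, hence the shift by 2.
DSCP_EF = 46

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)
sock.sendto(b"voice payload", ("192.0.2.1", 5004))  # placeholder peer
```

 Whether routers honour the marking depends on the network's QoS policy; end hosts can only request, not guarantee, differentiated treatment.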

 (Meddeb 2010, pp. 86) and (Bricklin 2003) express this view.

 Quoted from (Moore 1965, pp. 115-116).

 Noted on Intel's website at <http://www.intel.com/content/dam/www/public/us/en/documents/corporate-information/history-moores-law-fun-facts-factsheet.pdf>. Accessed April 4, 2013. Article is titled "Fun Facts: Exactly how small (and cool) is 22 Nanometers?"

 This is quoted from the archives of _The Economist_ as reproduced at <http://www.economist.com/node/14116121?story_id=14116121>. Accessed April 22, 2013. The original article appeared on November 19, 1955.

 (Ceruzzi 2003, pp. 191-192, 305) talks about this law briefly.

 End-to-end argument viewed from this perspective is thoroughly discussed in (Cooper 2004, pp. 41-92). The article by Mark Lemley and Lawrence Lessig in the publication is titled "The End of End-to-End: Preserving the Architecture of the Internet in the Broadband Era."

 This view is expressed by Thomas Fisher, a professor at the University of Minnesota, in an online article at <http://www.esri.com/news/arcnews/fall12articles/place-based-knowledge-in-the-digital-age.html>. Accessed March 14, 2013.

 This survey is described in an article titled "Library Services in the Digital Age" at <http://libraries.pewinternet.org/2013/01/22/library-services>. Accessed March 14, 2013.

 This example is noted in (Patel 2012). The university in question is Texas A&M University—San Antonio.

 UK number is noted on pp. 151 in an Ofcom report "International Communications Market Report 2012," published in December 2012. Report is available at <http://stakeholders.ofcom.org.uk/binaries/research/cmr/cmr12/icmr/ICMR-2012.pdf>. Accessed April 15, 2013. US number is noted in an online book review at <http://www.businessweek.com/magazine/content/10_24/b4182000596077.htm>. Accessed March 13, 2013. The book in question is Nicholas Carr's _The Shallows: What the Internet Is Doing to Our Brains_.

 Quoted in (Shilling and Fuller 1997).

 This average for 2012 is noted at <http://www.securelist.com/en/analysis/204792276/Kaspersky_Security_Bulletin_Spam_Evolution_2012>. Accessed April 4, 2013. The analysis shows that email spam is declining because alternative online channels have become cheaper.

 Such advice is given by Michael Horowitz in an article titled "How to be as safe as possible with Java" at <http://blogs.computerworld.com/cybercrime-and-hacking/21626/how-be-safe-possible-java>. Accessed March 22, 2013.

 (Prasad 2009) supports this view. Official memory requirements from Microsoft are noted at <http://office.microsoft.com/en-in/office-2003-resource-kit/office-2003-licensing-and-system-requirements-HA001140301.aspx> and <http://technet.microsoft.com/en-us/library/ee624351.aspx>. Accessed April 21, 2013.

 This is noted in (Metz 2012).

 This advertisement from Qualcomm appeared in _IEEE Communications Magazine_ , January issue of 1990.

 One report that takes this view is Phillip J. Longman's "The Information Age Has Not Dramatically Improved Everyday Life." This is reproduced in (Torr 2003, pp. 40-44).

 This research finding is noted in (Davidow 2012, pp. 48).

 These numbers are from (McAfee and Brynjolfsson 2012, pp. 62). Energy statistic is noted in (Levy 2012).

 The Deep Web has been described in (Chatfield 2011, pp. 92-95).

 The original imitation game involved two human interviewees with different goals, but Turing does ask what happens when one of them is replaced by a machine in (Turing 1950, pp. 434).

 Quoted from (Minsky 1986, pp. 4). This is an excerpt from the book _Society of Mind_ by the same author.

 (Weigmann 2012) describes the project. In February 2013, it was reported that the iCub robot is able to learn language. This appeared at <http://www.sciencedaily.com/releases/2013/02/130219102649.htm>. Accessed April 22, 2013.

 Different layouts of rotary dials are shown in an illustration in (Breen and Dahlbom 1960, pp. 1434).

 CCITT's involvement in standardizing the layout is noted in (Chapuis and Joel 1990, pp. 501).

 Quoted in (Shilling and Fuller 1997).

