In his video on large language models, or LLMs, OpenAI co-founder and YouTuber Andrej Karpathy likened LLMs to operating systems. Karpathy said, "I see a lot of equivalence between this new LLM OS and operating systems of today." I'm intrigued by this notion. Operating systems are some of the world's most important technologies, with a history spanning 80 years that mirrors the journey of computing in all of its physical forms. In today's video, we look at the evolution of the operating system.

So what does an operating system do? Perhaps unsurprisingly, thanks to that long history, this is hard to pin down. One definition I like says that the OS manages the computer's resources for the user efficiently, reliably, and unobtrusively. Hardware is hard. There's a lot of it: the CPU, main memory, secondary memory, display, keyboard, mouse, and the network. Users and their applications must navigate the idiosyncrasies and pains of that hardware to make it do something useful. Operating systems make this easier by giving the user, or their application programs, a clean, pleasant interface for their task, abstracting away the horrors of hardware.

An operating system is defined by its abstractions, because those are what users interact with on a daily basis. Some have been around for so long that we forget how revolutionary they are. For instance, take the humble file. In the beginning, users dealt with physical memory directly, working with cells and bits. But each memory system type has its own peculiarities, and dealing with all of them is a pain in the butt. You risk one program overwriting data being used by another, causing both to crash. The file throws a blanket on top of all that and just gives you a nice, clean abstraction. You might think that your file is sorted away somewhere as a discrete entity in computer storage, like a book in a bookcase. But this is a fraud. In reality, the file's data is scattered in pieces like Cheetos across wherever the computer happened to have storage. When you open a file, the file system gathers those bits, puts them into the right order, and presents them to you. The OS handles all of that automatically behind the scenes, placing the data in primary or secondary storage as needed. I'll illustrate this with a short code sketch at the end of this passage. Abstractions like this are what let us get our work done. Every day we interact with abstractions built on top of more abstractions, and it all somehow works.

The first computers of the 1940s and 1950s were made to be used by just one user, or group of users, at a time. So they just gave that user every available resource. These devices were expensive, costing millions in today's dollars, and so were rented out to individuals and billed by the hour. However, those individuals found that most of their allotted time was wasted setting up the equipment for the job, costing hundreds of thousands of dollars in lost productivity each month. So in 1956, the General Motors Research Lab realized that they could write software for their IBM 701 mainframe to automatically handle the loading and unloading of each job: batch computing. With batch computing, jobs are transferred from cards to magnetic tape. The computer would then run them all at once, sequentially, with the outputs recorded onto a second tape. Special cards between each job told the computer what resources would be needed to do the jobs. These were called job control languages.
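To make the file abstraction concrete, here is a minimal Python sketch. It is just an illustration under my own assumptions, not anything from the video: the file name notes.txt is made up, and the point is only that the program never needs to know which physical blocks its bytes actually land on.

```python
import os

# Ask the OS for the abstraction: a named file we can write to and read back.
# Where the bytes physically end up (which disk blocks, which device, whether
# part of it sits in an in-memory cache) is entirely the OS's problem.
with open("notes.txt", "w") as f:
    f.write("The file is an abstraction over scattered storage.\n")

# Reading it back, the file system gathers the pieces and hands them to us
# in order, as if the file were one contiguous object.
with open("notes.txt", "r") as f:
    print(f.read())

# The OS will even tell us a little about the underlying reality:
# st_blocks counts the raw storage blocks actually allocated for this file.
info = os.stat("notes.txt")
print("logical size:", info.st_size, "bytes;",
      "blocks allocated:", getattr(info, "st_blocks", "n/a"))
```

On Unix-like systems, st_blocks hints at the reality the abstraction hides; on other platforms the field may not exist, which is why the sketch guards it with getattr.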
There are a few who call these the first operating systems, but the debate among historians over the validity of that claim remains fierce.

The 1960s saw better and pricier hardware: card readers, magnetic tape drives, and other I/O devices. Users realized that not every job used all of the computer's resources, so those expensive resources could be better utilized if different jobs could be run in parallel. Was there some way to take advantage of this? There was. Back in 1956, the UNIVAC 1103A computer had introduced a new concept called the interrupt, which let a piece of peripheral hardware call for the processor's attention. At the same time, there were new innovations in memory capacity. Devices like magnetic drums were giving the processing units more memory than they had before. Together, this let the computer hold and run multiple programs at the same time. While one program is occupied with something like input/output, another can be simultaneously running on the processor. This is known as multiprogramming. There's a small code sketch of the idea at the end of this passage. If you think about it, running multiple programs simultaneously inside a computer is a small step away from having that computer service multiple users simultaneously.

One of the major problems with batch computing was slow development times. That, in turn, was the result of a long edit-compile-run sequence. Big batches took hours or even an entire day to run. If there was a bug somewhere, the day's entire output might be just an error message. Immensely frustrating. So in 1959, the computer and cognitive scientist John McCarthy proposed a possible solution to his colleagues at MIT: an operating system that would substantially reduce the time required to get a problem solved on the machine. The only way a quick response could be provided at bearable cost was by time-sharing. That is, the computer must attend to other customers while one customer is reacting to some output. What this meant was a large central computer connected to what they called terminals, each a monitor and keyboard. This gives each user the illusion that they are the only person using the computer. Very powerful software was needed to coordinate all this and sustain that illusion.

Two years later, in 1961, the MIT team, led by Fernando Corbató, managed to get a prototype working on their IBM 709 machine. Lacking hard disk drives, they used a bunch of tape drives attached to four typewriters. It just barely worked. In 1962, MIT announced the Compatible Time-Sharing System, or CTSS as it was called. A year later, CTSS got a hard disk drive and was offered to large-scale users, though MIT was not allowed to charge for it. Though hints of the feature had been implemented for the military's massive SAGE radar coordination system and other specialized systems of the time, we consider CTSS the first time-sharing system expressly made for the purpose. By 1965, CTSS had hundreds of registered users at MIT and other colleges across New England, handling up to 30 users at once. It also implemented the first mail and mailbox function between users, a spiritual precursor of email.

Several other time-sharing services emerged throughout the late 1960s and early 1970s. One notable system was the Dartmouth Time-Sharing System, on which the BASIC language was developed. An early version of DTSS later powered a popular time-sharing service offered by General Electric, the market leader until the mid-1970s, when competition overwhelmed them.
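Here is a small Python sketch of the multiprogramming idea, that while one job sits waiting on slow I/O, another can use the processor. It is a loose modern analogy rather than how a 1950s machine actually worked; the job names and timings are made up.

```python
import threading
import time

def io_bound_job(name):
    # Stands in for a program stuck waiting on a slow device
    # (card reader, tape drive, network). While it waits, it
    # does not need the processor at all.
    print(f"{name}: waiting on I/O")
    time.sleep(1.0)          # the "device" takes a second to respond
    print(f"{name}: I/O finished")

def cpu_bound_job(name):
    # Stands in for a program that just wants to compute.
    total = sum(i * i for i in range(2_000_000))
    print(f"{name}: computed {total}")

start = time.time()

# Let the I/O-bound job wait in the background while the compute job
# uses the processor: the essence of multiprogramming.
t = threading.Thread(target=io_bound_job, args=("job A",))
t.start()
cpu_bound_job("job B")
t.join()

print(f"overlapped total: {time.time() - start:.2f}s "
      "(versus roughly the sum of both if run back to back)")
```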
Today, we have largely forgotten the phrase time-sharing, though it underpins the idea of what we now call cloud computing. It had a lasting impact on the history of computers and their operating systems. Before the rise of the PC, this was how people experienced the computer.

CTSS's success spurred MIT to create a successor. So in 1964, the CTSS team at MIT joined with Bell Labs and General Electric, the market leader in time-sharing systems, to create new software called the Multiplexed Information and Computing Service, or Multics. Multics' great vision was to enable a time-sharing computer utility capable of handling hundreds of users. Kind of like how water utilities provide cheap, ubiquitous water, Multics would bring computing service to the masses. But the project ballooned as it tried to be everything for everyone, and progress bogged down. Bell Labs finally pulled out in 1969, and things fell apart. MIT eventually got Multics working on their own, and it was later sold to Honeywell, which installed it on a few systems. It gained a cult following and persisted despite Honeywell's determined efforts to kill it. The last site shut down in 2000.

Multics' troubles were reflected in another legendary OS project happening at about the same time. In 1964, IBM announced its historic computer line, the System/360. Ideally, a program written for one 360 computer was supposed to run on all of them. That was the whole shtick. But programmers struggled to write software that could work across all these different hardware environments. Famously, IBM tried to build a single operating system for it, OS/360. Despite a monumental budget and an army bigger than the Romans, OS/360 fell way behind schedule, and in the end they had to split it up anyway. Its project leader, Fred Brooks, later wrote a book based on his learnings from the OS/360 experience: The Mythical Man-Month.

Multics failed as a commercial product, but its groundbreaking ideas, things like security, hierarchical file systems, and a command shell, were incorporated into its spiritual successor, Unix. I already did a video about Unix's development, so I'm not going to reinvent the wheel. But I think it is important to emphasize that Unix was the right thing at the right time. It had many of the revolutionary ideas of Multics and added a few of its own. For instance, the pipeline, which lets you pipe the output of one process into the input of another. It is like the human centipede, but for computer processes. There's a tiny example of this after this passage. Unix and its helpful utilities were written for cheaper, lower-end minicomputers, just as those devices were becoming popular with users beyond the traditional mainframe crowd. And because it was written in the high-level C programming language, Unix could be easily ported to other minicomputers. This, and the weird Bell Labs situation that left it in a copyright limbo, helped Unix gain wide adoption in universities and beyond. Unix's rise was a cultural phenomenon that paved the way for other decentralized software communities, like those around the open-source Linux OS and hobbyist microcomputers.

In the mid-1970s, new semiconductor technologies enabled the creation of integrated circuits with thousands of devices on them. These ICs were powerful enough to serve as general-purpose chips. The first such microprocessor was Intel's 4004, a 4-bit chip originally made for a calculator and released in 1971. Intel later released the updated 8008 in 1972 and then the 8080 in 1974.
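As mentioned above, the pipeline connects the output of one process to the input of another. In a Unix shell that is just ls | wc -l; here is a small Python sketch that wires the same two programs together by hand. It assumes a Unix-like system where ls and wc are available.

```python
import subprocess

# The pipeline idea: the output of one process becomes the input of the next.
# In a shell you would write:  ls | wc -l

ls = subprocess.Popen(["ls"], stdout=subprocess.PIPE)           # producer
wc = subprocess.Popen(["wc", "-l"], stdin=ls.stdout,            # consumer
                      stdout=subprocess.PIPE)
ls.stdout.close()  # let ls receive SIGPIPE if wc exits early

count, _ = wc.communicate()
print("entries in this directory:", count.decode().strip())
```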
Other firms like Zilog and Motorola released their own microprocessors as well. These powerful chips would be the heart of what was then called the microcomputer.

Intel, then very small, hired a computer scientist and professor at the Naval Postgraduate School in Monterey, California, named Gary Kildall, as a consultant to produce certain software for their 8080 chip. Intel needed an 8080-compatible operating system for testing purposes, so they helped Kildall port one he had written while working at the Naval School. The new OS was called CP/M, which originally stood for Control Program/Monitor. Kildall believed that personal computer hardware was getting good enough to compete with existing time-sharing systems as a programming tool. Remember, the great illusion of time-sharing is that every terminal user thinks they are programming on their own computer. What if that were actually true and not an illusion?

Key to achieving this dream would be the memory. Existing microcomputer memory systems, particularly secondary memory, sucked. Things like paper tape and cassettes. None of this was acceptable. So Kildall got interested in a new secondary storage technology, first invented and introduced by IBM, called the floppy drive. It offered far more storage at a relatively cheap price. Oh, and unlike existing paper tape, it was random access. You could just jump to the data you want, rather than spooling through the whole thing sequentially. Kildall got a sample drive from Shugart Associates, at the time just a few miles away from Intel and founded by storage legend Alan Shugart. Shugart Associates would later dominate the 8-inch and 5.25-inch floppy drive markets.

Now what? So there was Kildall in his room with just a naked floppy drive on his desk and a crude Intel CPU microcomputer. So, as you do, he programmed controller software that let a microcomputer running the CP/M OS interface with this floppy disk drive and its data. It might not sound like much, but hardware limitations meant that early microcomputer operating systems were often just that: a file system for organizing and managing files on an external disk, plus the ability to load and run programs from that disk. Disk operating systems, or DOS.

Kildall founded a company called Digital Research and began licensing CP/M to microcomputer end users, who paid him thousands of dollars. By 1981, he had several hundred licensees. CP/M, retroactively renamed to stand for Control Program for Microcomputers, quickly became the dominant operating system for the small and burgeoning microcomputer community. Though it was not alone in the industry. Others included Apple DOS, the OS for the very popular Apple II computer by Apple Computer; a Unix-clone-ish thing called Coherent, which was ported down from minicomputers; and a small thing from Microsoft, MS-DOS. Fatefully, CP/M lost its early lead.

In 1980, a rogue team at IBM began a secret project to make their own microcomputer, the IBM PC. Facing a tight deadline, the team built the machine with parts and software sourced from outside vendors. The PC team licensed a BASIC interpreter from Bill Gates and his company, Microsoft. They were connected to IBM through Gates's mother, who was co-chair of the United Way nonprofit along with John Opel, IBM's CEO. The IBM PC team asked Gates if he knew anyone making a microcomputer OS, and he pointed them to CP/M.
But for reasons that remain unclear today, Kildall did not personally take the meeting with IBM and refused to sign their non-disclosure agreement. So IBM went back to Bill Gates for an OS. Microsoft was then in negotiations with Bell Labs for a Unix license, an effort that would eventually result in the Xenix operating system. But that was not yet done, and there was no time. So Gates went out and bought a DOS from a local computer manufacturer. He then hired its developer, Tim Paterson, to make a few modifications and rebranded it as MS-DOS. Critically, Microsoft did not sell MS-DOS outright to IBM, but rather licensed it to them on a non-exclusive basis. The IBM version that ran on the PC at its release in 1981 was called PC DOS. To protect PC DOS from clones, IBM wrote part of the OS, the BIOS, to a hardware chip and copyrighted it by publishing it in a journal.

The IBM PC, with its iconic name and marketing muscle, quickly became the most popular microcomputer on the market. Its setup became an industry standard, inviting competitors and clones. At the start, MS-DOS was a crude piece of software, about 4,000 lines. Nevertheless, it eventually allowed software vendors to bring packages like VisiCalc onto the IBM PC platform. A bevy of computer makers then managed to work their way around the IBM PC BIOS copyright, kicking off the PC clone industry. Microsoft struck licensing deals with those PC makers, rapidly grabbing market share in the industry. Working directly with PC assemblers, or OEMs, scaled far better than CP/M's approach of going right to end users. Microsoft's MS-DOS overthrew CP/M as the dominant PC OS. By 1983, they had a fifth of the microcomputer operating system market.

Today we might see Microsoft as one and the same with their operating system. But in the early days, Gates and Microsoft saw themselves more as an applications company. Operating systems were important, providing 50% of the company's revenues, but they were seen as a means to an end. Gates' thinking at the time was that with an OS you get just a few points of the machine's price, say $40 on a $2,000 machine. But with an application, you can earn hundreds of dollars. In 1981, their top-selling application was Multiplan, a now somewhat dated-looking spreadsheet application for MS-DOS. It sold a million copies over its lifetime. And for that reason, Microsoft in 1983 was big, but nowhere near the giant we know today. That year they generated $70 million in revenues. Very good, but VisiCorp did $60 million, and Lotus did $48 million. This thinking was why we had the interesting situation of Microsoft offering two operating systems to its customers: MS-DOS for its low-end IBM PC users, and Xenix, the version of Unix that Microsoft had licensed from AT&T, for high-end users. It took time for Microsoft to realize how powerful an asset it really had.

By 1983, semiconductor hardware had gotten good enough that PC operating systems could start incorporating a few much-needed features. One of the most needed was multitasking. Computer work was getting more interrelated and complicated, involving the outputs of several different programs. For example, making a company report might require a painting program, a spreadsheet, and a word processor to be open all at once. With MS-DOS and other single-tasking operating systems of the day, users had to entirely close the one program running in front of them first, which was annoying.
Also, the way people interacted with MS-DOS was through a command line. You had to type in the right prompt to get the computer to do what you wanted. Deviations in the prompt could give unwanted results. Sounds familiar? By 1983, the PC community had narrowed in on the windowing graphical user interface as an elegant solution to these problems. It was first demonstrated by Xerox in the 1970s and later incorporated into the operating systems for the Apple Lisa and Macintosh, sold in 1983 and 1984. Microsoft adopted the windowing GUI for its Windows operating system, first released in late 1985, basically as a shell on top of MS-DOS.

Throughout the late 1980s and 1990s, the PC ecosystem exploded in size. CPUs and other semiconductor hardware advanced in performance like never before. Hardware features once only seen on mainframes quickly made their way to the PC. The PC's modular design encouraged a plethora of hardware peripherals and software drivers. And on the software side, an ecosystem of utilities and applications emerged to suit different environments like the home desktop, the high-performance workstation, and the enterprise server. To handle all of this, Windows evolved a sprawling, modular architecture, with each system function handled by a separate OS component. Multiple software layers added new abstractions to help programmers and users navigate these environments. It took years for Microsoft to get this incredibly complicated piece of software working to its full potential. But they benefited as Windows established itself as the dominant operating system, and the company started bundling adjacent software like Office into it. By 1993, Office had 90% of the productivity market, contributing 50% of Microsoft's revenues. Its low prices, in part due to scale and subsidies from Windows, drove competitors like Lotus and WordPerfect out of business a few years later. Thanks to its grip on the PC universe through Windows, Microsoft became the defining technology company of the 1990s. But the sun don't shine on the same dog's butt every day.

The first mobile computers were the personal digital assistants, or PDAs. These were handheld PCs, popular in the mid-1990s for helping people manage their contacts, addresses, notes, and to-dos. Apple had been one of the pioneers in the industry, releasing the Newton in 1993. It was an ambitious product, but the hardware was not ready yet, making it difficult to fulfill its promises, for instance the ability to recognize handwriting. These early devices were extremely constrained in terms of resources. The original Palm Pilot ran on a 16-megahertz processor and 128 kilobytes of RAM. This made them extremely challenging to build for. You can't just scale down a PC OS. Microsoft initially struggled to bring Windows to the PDA market. Their first offering was Windows CE, later Windows Mobile, which they produced with hardware partners. Released in 1996, CE struggled with bad battery life, OS stability problems, and a very bad interface. Successful companies like Palm built their operating systems from the ground up with these constraints in mind from the start. This meant compromises. For instance, the Palm Pilot lacked a keyboard and handwriting recognition, so users had to use a shorthand system called Graffiti. It did not take a lot of foresight to see that PDAs and mobile phones would eventually collide. Having seen what Microsoft did to the PC industry, in 1998 three of the largest phone makers joined together and bought into an operating system called Symbian.
The Symbian phone OS was produced by a British company of the same name, which had once produced PDA software. Adopted by the phone makers, Symbian became an early leader, with 65% market share and 100 million users at its peak. But Symbian failed to build a powerful and lasting ecosystem around its advantages. None of the handset makers wanted to give up their connection to the user, causing serious fragmentation issues and a whole bunch of different UIs. And since it had to serve so many different hardware environments, Symbian was notoriously hard to develop for. The company struggled to build good tools and distribution channels for their developers. Nokia was the leading Symbian phone maker, driving 80% of its sales, and while they grabbed significant market share in Europe and Asia, they struggled in the United States, in part because of the dominant position of the mobile networks like Verizon and Cingular.

The early 2000s saw more improvements in semiconductor hardware. In addition to faster and more power-efficient processors enabled by the ARM instruction set and its ecosystem, the decade saw the rise of flash memory as a compelling secondary memory option. The only thing now missing was a compelling interface to pull it all together, as well as a company capable of cutting through all the red tape that had turned Symbian into a convoluted mess. Apple made the first breakthrough with the iPhone, famously building its operating system by scaling down the desktop Mac OS. Its multi-touch interface and desktop-class browser instantly connected with users. And because it was based on Mac OS X, Apple was able to port over its ecosystem of passionate developers, developers so passionate that they were hacking the OS to make apps of their own before an official SDK was released. The opening of the App Store in 2008 only poured gasoline on that fire. Google saw the writing on the wall and pivoted their Linux-based Android phone OS in the same direction. By giving Android away free via open source, Android rapidly stole share from the then closed-source Symbian. Those old legacy operating systems are now gone. What we now call iOS has made Apple one of the biggest companies in the world, and Android is the world's most widely used OS, period, and a powerful asset for Google. It is interesting to see how iOS and its deep ties with the App Store have helped drive Apple's massive services division, kind of like how bundled apps like Office made Microsoft king of the tech world in the 1990s.

I have noticed that the story of operating systems across their various form factors shares a bit of a theme. In the beginning, systems were limited by compute. The first devices, mainframes, microcomputers, and mobile PDAs, were not fast enough to run anything other than the most rudimentary programs. Compromises had to be made to get the products to work. Over time, the processors did get fast enough. Then the new limit was memory. Mainframes needed DRAM and the disk drive to manage multiple tasks and users. PCs had a craving for memory that was eventually fulfilled by the floppy disk drive, and hard disk drives too. And mobile OSs could not support bigger programs until flash memory got cheap enough. Then finally, after that, we are limited by input/output, or the interface. We needed new paradigms of communicating and interacting with our computers to get the results we need. For the PC, that was the GUI; for mobile, that was multi-touch. So we cycle back to our original question: are LLMs the next operating system?
I don't know, but I did notice something. It first took breakthroughs in compute to show that larger neural networks had some potential. Then, after that, we leveraged improvements in DRAM memory to really scale up LLM sizes to where they can show economic value. And then, most recently, we needed to find a new paradigm for interacting with these LLMs, with ChatGPT, and off to the races we were. It's fun to ponder the possibilities of an LLM operating system and where the metaphor can take us. What new abstractions and environments for doing work can an LLM OS give us? What might that actually look like? I am not sure about the answers to these questions, like I said, but I look forward to finding out. All right, everyone, that's it for tonight. Thanks for watching. Subscribe to the channel, sign up for the Patreon, and I'll see you guys next time.