Introduction to Unix

Why Unix was Developed

Unix started because two programmers at Bell Labs couldn't get AT&T to buy them a computer with an operating system. They talked the Legal Department into letting them share the legal computer if they would write some word processing software for it.

The legal department computer did not have a multi-user interactive operating system. At that time, one of the very few such systems was MULTICS, developed jointly by MIT, General Electric, and Bell Labs. The two programmers, Ken Thompson and Dennis Ritchie, had worked on the MULTICS project until Bell Labs withdrew from it. They wanted an interactive computer system. Since they were supposed to be doing operating systems research for Bell, they wrote a simple interactive operating system, at first on a little-used PDP-7. This first version was a simple, interactive, single-user system. They called it "Unix" as a pun on "MULTICS."

This first version of an interactive operating system was written in PDP-7 assembly language. It was very small and quite limited. The guiding philosophy of Unix began to emerge when they moved to the PDP-11 they shared with the Legal Department. The new machine was different enough to require a major rewrite. They decided to try writing the system in a higher-level language to simplify porting it to new machines. The first language they tried was a sort of super assembler in use at Bell Labs called "B," a stripped-down descendant of BCPL. This language had only one data type, the machine word, which was used for everything. However, it was close enough to the hardware to do things that normally required assembly language code. The limitations of "B" soon became too serious for writing something like an operating system, and they began adding the needed features to the "B" language. By the time they were through, they had a language they could use to write an operating system, but it didn't look very much like "B." To avoid confusion, they renamed their new variant "C," since it superseded "B."

This was the first "multiuser" Unix. At this time, operating systems were a relatively new concept. Originally, all I/O code was written right into the program. A program was written to run on one machine and only one machine. To move it to a different machine, it had to be rewritten. Programmers soon realized that they were spending excessive time writing the same code over and over again to do things like send a character to the printer. Soon the concept of an "I/O Deck" was established: a set of common subroutines to handle I/O requirements. You merely included the "I/O Deck" cards behind your program-specific cards when you loaded your program. Then you didn't have to rewrite all of the I/O code for each program.

It wasn't long before someone pointed out that the same deck of cards was being added to every program deck that was loaded. It made sense to just leave the "I/O Deck" loaded in a standard place in the machine and let everyone use it without having to carry the cards around all of the time. The last operating monitors of this type were the ones developed for microcomputers in the seventies and early eighties, such as CP/M and MS-DOS. They provide for I/O operations by making what are basically subroutine calls to "known entry points" to perform standard I/O functions. They only allow one process to run at any given time, so the more advanced functions of operating systems, such as scheduling and resource allocation, are not required.

When Unix went "multiuser," the need to consider resource sharing and control suddenly became important. Unix took an unusual approach to these problems. The easy way to determine which process should be granted control of devices like printers and disk drives was simply to give them to a process that was always running, and not allow them to be assigned to any other processes or users! The designers created such processes, which they called "daemons," to control the specific devices on the system. Then, if you wanted to print something, you didn't take control of the printer. Instead, you sent a message to the "printer daemon" along with a copy of the information you wanted it to print. The printer daemon then took care of seeing that your printout was not intermixed with someone else's.
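The printer daemon idea can be sketched in a few lines of C. This is only a single-process illustration under stated assumptions: a real daemon is a separate, always-running process that receives requests from other processes, and the names used here (print_job, spool_submit, spool_run) and the fixed queue sizes are hypothetical, not part of any Unix interface.

```c
#include <stdio.h>
#include <string.h>

#define MAX_JOBS 8      /* hypothetical queue capacity */
#define MAX_TEXT 256    /* hypothetical maximum job size, including the NUL */

/* One queued request: who sent it, and a private copy of the text. */
struct print_job {
    char user[32];
    char text[MAX_TEXT];
};

/* The queue belongs to the "daemon"; clients cannot reach it directly. */
static struct print_job queue[MAX_JOBS];
static int head = 0;    /* index of the oldest waiting job */
static int count = 0;   /* number of jobs waiting */

/* Client side: hand a copy of the data to the daemon.
 * Returns 0 on success, -1 if the queue is full or the text is too long. */
int spool_submit(const char *user, const char *text)
{
    struct print_job *job;

    if (count == MAX_JOBS || strlen(text) >= MAX_TEXT)
        return -1;
    job = &queue[(head + count) % MAX_JOBS];
    snprintf(job->user, sizeof job->user, "%s", user);  /* truncates long names */
    snprintf(job->text, sizeof job->text, "%s", text);
    count++;
    return 0;
}

/* Daemon side: the only code that ever writes to the printer.
 * Each job is printed whole, in arrival order, so output from
 * different users is never intermixed. Returns the number of jobs printed. */
int spool_run(FILE *printer)
{
    int printed = 0;

    while (count > 0) {
        struct print_job *job = &queue[head];
        fprintf(printer, "--- job for %s ---\n%s\n", job->user, job->text);
        head = (head + 1) % MAX_JOBS;
        count--;
        printed++;
    }
    return printed;
}
```

A client would call spool_submit("alice", "some text") and never open the printer itself; only spool_run, standing in for the daemon, is given the printer stream.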

This concept, of running processes that perform special tasks when they receive a request from another process, is now called "client-server" programming, and it has much in common with "object oriented" programming. You can think of the "printer daemon" as a print server that responds to requests from "client" processes. Alternatively, you can consider it a "printer" object that performs a "printer" task when you send it a "perform" message. This paradigm permeates Unix. Whenever a function or task is added to the capabilities of Unix, a "daemon" is created to perform it on request. There have been so many "daemons" added to Unix over the years that there is now a special "daemon master" daemon that receives task requests and then starts the appropriate daemon to take care of each request. Otherwise, there would be so many processes running all of the time that there would be no room on the system for users!

Some other ideas of the designers of Unix were also written into its design. These were the concepts of "filters," "reusable tools," and a consistent appearance and procedure for using I/O objects. These concepts are the primary reason the Unix operating system is favored by the programmers who use it.

The key to the implementation of "filters" and "reusable tools" lies in the concept of a standardized appearance for I/O objects. In Unix the same system calls are used for reading and writing all I/O objects. The specific object knows what to do with these requests. The basic I/O requests are open, close, read, write, and seek. Different I/O devices may respond to these requests in different ways. For example, printers typically ignore requests to "read," while terminals may ignore a "seek" request. The same system calls and syntax are used to put a message on a terminal as to write output to a file on disk. Each device is identified by a pair of numbers, a major device number and a minor device number. The major number selects the driver that handles an entire class of devices, such as all hard disk drives or all floppy disk drives. The minor number identifies a specific instance of the class. The driver translates each request into the appropriate format for the specific device and sends the request on. The system keeps an I/O control table for each class of device. This table points to the appropriate function for each of the I/O requests.
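One way to picture the I/O control table is as a structure of function pointers, one per request. The sketch below is an illustration only and is not taken from any Unix kernel: the structure device_ops, the two toy devices, and dev_put are all hypothetical names, and only read, write, and seek are shown (open and close would be two more entries of the same kind).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical I/O control table: one function pointer per request. */
struct device_ops {
    long (*read)(char *buf, size_t n);
    long (*write)(const char *buf, size_t n);
    long (*seek)(long offset);
};

/* Toy "disk": a small in-memory buffer that honors every request. */
static char disk_data[64];
static size_t disk_pos = 0;

static long disk_read(char *buf, size_t n)
{
    if (n > sizeof disk_data - disk_pos)
        n = sizeof disk_data - disk_pos;    /* stop at the end of the disk */
    memcpy(buf, disk_data + disk_pos, n);
    disk_pos += n;
    return (long)n;
}

static long disk_write(const char *buf, size_t n)
{
    if (n > sizeof disk_data - disk_pos)
        n = sizeof disk_data - disk_pos;    /* stop at the end of the disk */
    memcpy(disk_data + disk_pos, buf, n);
    disk_pos += n;
    return (long)n;
}

static long disk_seek(long offset)
{
    if (offset < 0 || (size_t)offset > sizeof disk_data)
        return -1;
    disk_pos = (size_t)offset;
    return offset;
}

/* Toy "printer": accepts writes, quietly ignores read and seek. */
static size_t printer_chars = 0;    /* count of characters "printed" */

static long printer_read(char *buf, size_t n)
{
    (void)buf; (void)n;
    return 0;                       /* nothing to read from a printer */
}

static long printer_write(const char *buf, size_t n)
{
    (void)buf;
    printer_chars += n;
    return (long)n;
}

static long printer_seek(long offset)
{
    (void)offset;
    return 0;                       /* seek has no meaning here */
}

const struct device_ops disk_dev    = { disk_read, disk_write, disk_seek };
const struct device_ops printer_dev = { printer_read, printer_write, printer_seek };

/* The caller uses the same code no matter which device it is given. */
long dev_put(const struct device_ops *dev, const char *msg)
{
    return dev->write(msg, strlen(msg));
}
```

Here dev_put(&disk_dev, "hello") and dev_put(&printer_dev, "hello") are the same call with a different table, which is the point of giving every I/O object the same appearance.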

The "filter" concept involves writing utilities like a piece of pipe that can be "inserted" between other programs to perform functions upon the data as it passes through the "filter." The filter was enabled by the Unix concept of "standard I/O." Every program in Unix starts out with three I/O channels already open. These are channels (file descriptors) 0, 1, and 2. Channel 0 is "standard in." Channel 1 is "standard out." Channel 2 is "standard error." A Unix program written to function as a filter will use "standard in" for all input and "standard out" for all normal output. Error messages and nonstandard output are routed to "standard error." Normally all three are pointed at the user's terminal. A redirection operator on the command line, or redirection calls in the application, can "redirect" any of these three standard channels to any device or file on a Unix system. In addition, a concept introduced by Unix, called a "pipe," allows the standard output of one process to be connected to the standard input of another. Filters can then be "plumbed" together with "pipes" to perform sequential processes on a data stream without having to rewrite code. It can be done directly on the command line, just by stringing the commands together separated by the "pipe" operator. The traditional example for this concept is a Unix spelling checker that is built like:

cat filename | sed 's/[ \t][ \t]*/\n/g' | sort | uniq | dict

In this line, the "pipe" operator is the "|" symbol. The Unix utilities used in this example are as follows:

cat reads the file "filename" and sends it to standard out.

sed reads standard in, performs the editing command on it, and writes the result to standard out. This editing command changes each run of blanks and tabs to a newline (the \t and \n escapes shown assume GNU sed). This has the result of placing each word in the input on a line by itself.

sort reads standard in, sorts the lines alphabetically, and writes the result to standard out.

uniq reads standard in, eliminates duplicate lines, and writes the result to standard out.

dict reads standard in and looks up each word in the dictionary. It sends those NOT found to standard out.
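To show what a filter looks like from the inside, here is a minimal sketch in C of the word-splitting step of the pipeline. It is an illustration, not the source of any real utility: the function name split_words is hypothetical, and it takes its input and output streams as arguments so that it can be tried on any pair of files.

```c
#include <ctype.h>
#include <stdio.h>

/* Copy "in" to "out", writing each whitespace-separated word on a line
 * by itself. Returns the number of words written. */
long split_words(FILE *in, FILE *out)
{
    long words = 0;
    int in_word = 0;    /* nonzero while we are in the middle of a word */
    int c;

    while ((c = getc(in)) != EOF) {
        if (isspace(c)) {
            if (in_word) {          /* the word just ended: finish its line */
                putc('\n', out);
                in_word = 0;
            }
        } else {
            if (!in_word)           /* first character of a new word */
                words++;
            putc(c, out);
            in_word = 1;
        }
    }
    if (in_word)                    /* input ended without trailing whitespace */
        putc('\n', out);
    return words;
}
```

A complete filter would be a main() that does nothing but call split_words(stdin, stdout) and send any complaint to stderr. Because it touches only the standard channels, such a program could take the place of the sed step in the pipeline without the other stages knowing the difference.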

These two concepts of filters and reusable tools, facilitated by an operating system that makes all devices appear the same to the software, are the heart of the Unix system. These concepts allow a programmer to retain and reuse experience. Every program you write becomes a tool for further program development. The reusable tool concept can be greatly enhanced by setting some standards for development and for computer languages that make it easier to build and reuse these tools.

Some of the standards that I have found useful are given here. All of them are made easier to apply by the use of the "C" language, including its subsequent variants like "C++" and "OOPSC" (pronounced "oopsie") that extend the basic "C" language more gracefully into the object-oriented world.

Never use a global variable unless there is no other way to accomplish the needed access. In every case I have seen so far, I have been able to find a better way, so use globals with great care!
Each function should only do one thing. If more than one thing needs to be done, group several functions. It is allowable, where several functions use the same variables, to make those variables global to only that small group of functions, as long as the variables are not visible outside the group.
Arguments to functions and subroutines should generally be passed by value. Do not pass the location of the actual variable unless it is absolutely necessary. It is better to pass the current value to a function, have the updated value returned, and then have the calling routine, which owns the variable, do the update using the value returned.
All the above rules come under the general heading of "data hiding." To prevent side effects and problems it is always best to minimize the access to data to only those functions that need it. If a number of different functions need the data, or need to be able to change the data, it is usually wise to create a special function to modify that data that can only modify it in the exact ways required. Then the other functions that need that data item can access it by calling or requesting the data control function. Do not create one function that controls all of the data. One function should only control one data item, or at most, one small group of tightly related data items.
Each function should be documented at the front of the function. The documentation should include the list of arguments passed to the function, with their data types and ranges of allowable values, and the values returned by the function for both successful and unsuccessful completion. The error codes should be defined so they can be understood and interpreted correctly by the calling function. The range of valid returns when the function runs correctly should be defined also. A statement defining the relationship between the arguments given and the output generated should be clear and exhaustive. The documentation should be written before the function is coded. The coded function can then be tested against the internal documentation.
Low-level functions should be designed to be as generic as possible to facilitate reusability. Generally, the lower a function sits in a program hierarchy, the more likely it is to be useful in other program situations.
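
Several of these rules can be seen together in a short sketch. Everything in it is a made-up example for illustration (the names balance, account_adjust, account_balance, and clamp_percent are hypothetical): a variable shared by a small group of functions but invisible outside their file, a control function that can change it only in the allowed ways, documentation at the front of each function, and an argument passed by value.

```c
#include <limits.h>

/* Shared by the two account functions below and, because it is static,
 * invisible to every function outside this file. */
static int balance = 0;

/*
 * account_adjust
 *   Arguments:    delta (int, any value). Positive deposits, negative withdraws.
 *   Returns:      0 on success.
 *                 -1 if the change would make the balance negative or
 *                 larger than INT_MAX.
 *   Relationship: on success, new balance = old balance + delta.
 *                 On failure, the balance is unchanged.
 */
int account_adjust(int delta)
{
    if (delta > 0 && balance > INT_MAX - delta)
        return -1;                  /* would overflow */
    if (delta < 0 && balance + delta < 0)
        return -1;                  /* would go negative */
    balance += delta;
    return 0;
}

/*
 * account_balance
 *   Arguments:    none.
 *   Returns:      the current balance, always in the range 0..INT_MAX.
 */
int account_balance(void)
{
    return balance;
}

/*
 * clamp_percent
 *   Arguments:    value (int, any value), passed by value.
 *   Returns:      value limited to the range 0..100.
 *   Relationship: the caller's variable is never touched. The caller,
 *                 which owns the variable, stores the result itself.
 */
int clamp_percent(int value)
{
    if (value < 0)
        return 0;
    if (value > 100)
        return 100;
    return value;
}
```

A caller that owns a variable named level would write level = clamp_percent(level); rather than handing clamp_percent the address of level. Likewise, no code outside this file can set balance to an invalid value, because the only way in is through account_adjust.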

These are a few of the kinds of considerations that facilitate the reusability of program constructs. It is relatively easy to use system tools to create libraries of the functions generated during program development. By pulling the documentation paragraphs out of each function and combining them into a single document, a library dictionary can be generated that provides enough information to readily make use of the functions in the library. This creates positive feedback from a programmer's effort, making it progressively easier to generate new programs and meet changing requirements. That concept is fundamental to the philosophy of Unix as an operating system.
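
As a rough idea of how such a library dictionary might be pulled together, the following sketch copies the text of every /* ... */ comment in a C source stream to a second stream. It is a simplified illustration under stated assumptions: the name extract_docs is hypothetical, and the scanner does not understand string literals or // comments, so it suits only source files that keep their documentation in ordinary block comments.

```c
#include <stdio.h>

/* Copy the text inside each block comment in "src" to "doc", one comment
 * after another, adding a newline after each. Returns the number of
 * comments found. */
int extract_docs(FILE *src, FILE *doc)
{
    enum { CODE, SLASH, COMMENT, STAR } state = CODE;
    int comments = 0;
    int c;

    while ((c = getc(src)) != EOF) {
        switch (state) {
        case CODE:                      /* ordinary program text */
            if (c == '/')
                state = SLASH;
            break;
        case SLASH:                     /* saw '/', is a '*' next? */
            if (c == '*') {
                state = COMMENT;
                comments++;
            } else if (c != '/') {
                state = CODE;
            }
            break;
        case COMMENT:                   /* inside a comment: copy it */
            if (c == '*')
                state = STAR;
            else
                putc(c, doc);
            break;
        case STAR:                      /* saw '*' inside a comment */
            if (c == '/') {
                putc('\n', doc);        /* comment closed */
                state = CODE;
            } else if (c == '*') {
                putc('*', doc);         /* earlier star was just text */
            } else {
                putc('*', doc);
                putc(c, doc);
                state = COMMENT;
            }
            break;
        }
    }
    return comments;
}
```

Used as a filter on each source file in a library, with the results appended to one document, this gives a crude version of the dictionary described above.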

Revised 07/02/2001



	© 2001 Board of Trustees Southern Illinois University Carbondale