BYTE Magazine, August 1981
Learning Research Group
Xerox Palo Alto Research Center
3333 Coyote Hill Rd
Palo Alto CA 94304
The amount of information in a person's brain is truly vast; even the amount accessed in the course of a few hours of thought is vast. This is in contrast to the amount of information in the main memory of a computer, which is minuscule by comparison. The exciting thing about computers, though, is that we can use them to extend and enhance our thought. If a computer is to serve effectively as an aid to thought, it must be able to hold enough information to be useful. However, the memory of the largest computer today is so small that it severely limits what that computer can do. There are so many orders of magnitude between the capacity of the brain and the capacity of a computer that given the question, "How much memory will the computer need?" the answer should always be "As much as possible."
Software for personal computers is just crossing a threshold of usefulness and flexibility. There are tasks, such as revising a draft of a paper, which are tremendously easier to do with a computer than without. Once you have edited with a computer, it seems absurd to edit by hand. The number of tasks for which the computer is essential is growing rapidly, causing a very sharp rise in the demand for storage in each personal computing system. As we design more useful aids to human thought, we will immediately want to access an amount of information closer to the amount in someone's head. Many extraordinary ideas will become software realities in the next few years. And large quantities of memory will be needed to run and store all of that wonderful software.
The practical limit on the size of a computer's memory is cost. Every project, especially a personal computer, has cost limits. The question becomes how to get the most memory for the least cost. Roughly speaking, memory falls into two categories: fast, expensive memory and slow, inexpensive memory. Main memory and core are common names for the fast, semiconductor memory. The slow memory, secondary memory, is almost always a disk. If we bought all slow memory, the processor would continually wait for the disk and would give very poor performance. If we spent all our money on fast memory, we could not get very much of it, and many of the bigger and better programs would not fit in. The game is to buy some fast memory and some slow memory and arrange things so that the processor rarely has to wait for the slow memory. This game, and specifically the mechanism which hides the slow memory from the processor, is called virtual memory.
If there were no way at all to predict which byte of memory the processor might want next, it would be impossible to win the game of virtual memory. However, pieces of data that are used together are often stored together, and program instructions tend to be executed and stored in a sequence. The principle of locality of reference states that the processor is most likely to access a memory location very near the last one it accessed (see reference 2 at the end of this article). The game of virtual memory is based on a trick: when the processor starts to ask for bytes from a block of code or data, it should move that code or data into the fast memory. If the processor continues to access that information, all of the accesses will be to fast memory. When the program moves on to a new activity, it may again be forced to get its information from the slow memory. To win the game, a virtual memory must maintain a situation where most of the processor's accesses are to the fast memory. If the strategy fails and the processor often wants data from the slow memory, the entire system will run very slowly.
The act of moving programs or data between the two kinds of memory is called swapping (see figure 1). The program that the user is running may or may not control the swapping explicitly. Overlays are large groups of subroutines that are moved to and from the disk under control of the user program. In an automatic virtual memory, however, the user program is unaware that swapping is occurring. The programmer does not specify how the program should be divided up into pieces or when swapping should occur.
Figure 1: Main memory, secondary memory, and swapping combine to form a virtual machine that seems to have more memory.
In certain cases, letting the programmer control swapping directly can result in good performance. However, the virtual memory game is very complex and is played very quickly inside the computer. We believe that the programmer should not be burdened with deciding what part of the data to swap and when to do it. Asking the programmer to instruct the virtual memory is like asking a race car designer to write down, for the driver, exactly how to move the steering wheel in some future race.
In this article, we first look at a common type of automatic virtual memory called paging. We then introduce a new type of virtual memory that takes advantage of its knowledge of objects. We describe in detail a specific object-oriented virtual memory for the Smalltalk-76 system and explain how it plays the virtual memory game better than a paging system.
The most common kind of automatic virtual memory is called paging. In paging, the program is cut up arbitrarily into pieces. Each piece is called a page and contains the same number of bytes as every other page--say, 512. There are many more pages than will fit into main memory at once, so most of them stay on the disk. The processor knows only about byte addresses in one large address space called the virtual address space. Every time the processor accesses a byte, the address of the byte is checked. The high-order bits of the address tell which page contains that byte. (The low-order bits tell which byte within the page.) If that page is not in main memory, the user program stops. The virtual memory program starts up, finds an old page, moves it to the disk, and brings the desired page into memory in its place. (We will use the term "memory" to refer to the fast, main memory only.) The act of discovering that a page is needed from the disk and bringing it into memory is called a page fault.
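The address check described above amounts to splitting each virtual address into a page number and an offset. A minimal sketch of that split, written here in modern Python for illustration (using the 512-byte page size from the example in the text):

```python
PAGE_SIZE = 512      # bytes per page, as in the example above
OFFSET_BITS = 9      # because 2**9 == 512

def split_address(virtual_address):
    """Split a virtual address into (page number, byte offset within page)."""
    page = virtual_address >> OFFSET_BITS        # high-order bits
    offset = virtual_address & (PAGE_SIZE - 1)   # low-order bits
    return page, offset
```

The page number is then looked up to see whether that page is resident; if it is not, a page fault occurs.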
An advantage of paging is that it works regardless of the contents of the pages. The mechanism needed to determine whether a given page is in memory is simple. Many computers have special hardware to speed up the translation between an address in the virtual space and a page in memory.
There are problems with paging, however. If the program needs a particular byte, the entire page surrounding that byte must be brought into memory. If no other bytes on that page are useful at the moment, a large amount of main memory is wasted. Since programs are cut up arbitrarily into pages in the first place, it is common that the rest of the page has nothing to do with the part currently wanted. Sometimes a significant fraction of memory is taken up by pages from which the processor wants only a few bytes (see figure 2). These pages crowd out pages containing other parts of the program, causing many pages to be swapped to run the rest of the program. The many accesses to slow secondary memory cause the whole system to be slow.
Figure 2: Virtual memory by paging.
Another problem with paging is that every address of a byte or a word must be a long address. When an object-oriented language is built on a paging system, a pointer to an object is typically the address of the first word of the object. Every pointer must be capable of reaching any word in the entire virtual space, and each one must have enough bits to span the space. Pointers comprise a large fraction of many programs and data structures. If they could be shorter, more of the program could be packed into one page in memory, and the entire program would take fewer pages of memory.
Smalltalk is a system composed of objects. An object is a little package of functionality. It contains the values of a few variables or a small piece of program. The important thing about an object is that its parts belong together. If a program wants a part of an object, it probably wants other parts, too. Different pieces of information were packaged together in that object exactly because they will be used together. Locality of reference is strong inside an object and, in general, weak between objects.
An object-oriented virtual memory swaps individual objects instead of entire pages between disk and main memory. Objects that are brought into memory are packed end to end with the objects already there. Memory is thus entirely filled with useful or likely-to-be-useful data. A larger percentage of memory is actually holding useful information than it would under a paging system. The result is that a larger part of the program can fit into memory at once.
There is a penalty for swapping objects, however. Objects are generally smaller than pages, and there are a lot of them in memory at once. The virtual memory program must keep track of which object is in which place in the memory, and it must be able to find out where each object came from on the disk. Managing individual objects is more complicated than managing pages, but the advantage of packing main memory with useful objects makes up for the time spent managing the objects.
By object-oriented virtual memory, we mean a system that swaps objects which have meaning in the high-level language and which are typically small. Segments in the B5500 (reference 4) and objects in HYDRA (reference 5), while being the units of swapping in their systems, are large. These "objects" require tens or hundreds of bytes of overhead information each. An object-oriented virtual memory, in our sense, gives an object the same swapping freedom as a segment and shrinks the overhead to a few bytes per object.
An object consists of fields, which hold the values of their named and indexed instance variables. Each field contains a numeric value, which can be interpreted as itself or (usually) as a pointer to another object. This number, called the object pointer, is the unique identifier of the other object. Every object has an object pointer. Given an object pointer, the virtual memory must be able to locate that object, whether it is in memory or on the disk (see figure 3).
Figure 3: Objects and object pointers (as seen by casual observers). The "magic" is the unspecified process of translating the value of the object pointer to the actual address at which the object is stored.
Creating, destroying, and moving objects in memory is the job of a storage manager. The virtual memory program takes the place of the storage manager (as described in Glenn Krasner's article, "The Smalltalk-80 Virtual Machine," on page 300 of this issue). It fetches and stores the fields of objects, creates new objects, and collects and manages free space. It also keeps track of the length of each object and the Smalltalk class of each object.
When the interpreter is working on an object that is in memory, the operations of fetching a field and storing a field must run fast. Both the fetch and store operations specify an object by giving its unique object pointer. The translation from the object pointer to the object's location in memory must be fast. The virtual memory spends most of its time doing this translation. A fixed correspondence between object pointers and locations in memory does not work, since almost any combination of objects may be in memory at the same time. The translation from object pointer to memory location must be highly variable.
Once in a while, the interpreter attempts access to an object that is not in memory. The virtual memory must detect the attempt, find the object on the disk, and bring it into memory. This process is called an object fault. Sometimes other objects must first be removed from memory to make room for the incoming object. In order to find an object on the disk, there must be a correspondence between an object pointer and that object's location on the disk. The data needed to hold this correspondence must be compactly represented, as there may be many objects in the system.
In 1975 and 1976, Dan Ingalls and I designed and built OOZE (Object-Oriented Zoned Environment), a virtual memory to support the Smalltalk-74 system. It then became the foundation for the Smalltalk-76 system (reference 3). The combination was very successful, and many interesting projects have been built in it. OOZE serves as an excellent illustration of a usable object-oriented virtual memory implemented entirely in software. At the end of this article, we discuss possible modifications of OOZE for the Smalltalk-80 system.
For OOZE to play the game of virtual memory well, we had to design it to fit the rules. Economics (of our existing hardware) dictated the size of main memory, the size of the disk, and the ratio of their speeds. The rules also included the things that the Smalltalk interpreter expected objects to do. We considered these and decided that in OOZE an object pointer would be 16 bits long, to fit into a machine word. We wanted every combination of 16 bits to be a legal object pointer, giving a total of 64 K objects. With a mean object size of 10 to 20 words, this was a good match to the size of our disk. To guarantee good performance during a fault on an object, we specified that any object can be brought into memory by reading, at most, one place on the disk. We did not allow one disk read to look up the disk address and another disk read to get the actual object.
The design of OOZE centers around the handling of the two important object pointer translations. Finding an object's location in memory from its object pointer must be fast. This mapping must also be flexible, since the exact combination of objects in memory changes from moment to moment. The correspondence between object pointer and memory location is a large hash table, called the Resident Object Table (ROT). Of the 64 K objects on the disk, perhaps 4000 are in memory at once. Each of these has an entry in the ROT. To find the location of an object, the hash routine uses the object pointer to compute where to look in the ROT. If it finds an entry whose object pointer matches, that entry also contains the memory address of the object (see figure 4). If the hash routine finds no match in the few entries it searches, the object is not in memory. The magic puffs of smoke in figure 3 depict the act of hashing an object pointer into the ROT to find its memory address.
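The lookup just described can be sketched as an open-addressed hash table. The following Python fragment is purely illustrative; the table size, the hash function, and the names are assumptions, not the actual OOZE code:

```python
ROT_SIZE = 8192   # number of entries; an assumption for this sketch
EMPTY = None

rot = [EMPTY] * ROT_SIZE   # each entry: (object_pointer, memory_address)

def rot_hash(oop):
    # Illustrative hash; the article does not give the actual OOZE hash.
    return (oop * 2654435761) % ROT_SIZE

def rot_lookup(oop):
    """Return the memory address for oop, or None if the object is not resident."""
    i = rot_hash(oop)
    while rot[i] is not EMPTY:
        entry_oop, addr = rot[i]
        if entry_oop == oop:
            return addr             # hit: the object is in memory
        i = (i + 1) % ROT_SIZE      # probe the next entry in the chain
    return None                     # miss: an object fault is needed

def rot_insert(oop, addr):
    """Claim the first empty entry along oop's chain."""
    i = rot_hash(oop)
    while rot[i] is not EMPTY:
        i = (i + 1) % ROT_SIZE
    rot[i] = (oop, addr)
```

A miss (returning None) is the software analogue of the failed match that triggers an object fault.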
Figure 4: Hashing an object pointer in the Resident Object Table (ROT).
OOZE must maintain the ROT. When an object is brought in from the disk, OOZE hashes its object pointer and looks in the ROT. When it finds an empty entry among the few possibilities, it claims that entry for the new object. Conversely, when an object is removed from memory and put back on the disk, its entry in the ROT is marked empty. Sometimes an object moves in memory, and its memory address in its ROT entry must be updated (as referred to in figure 4).
Hashing object pointers into the ROT to find memory addresses is the highest bandwidth operation in OOZE. If hashing were supported by special-purpose hardware, the hashing operation would not consume much time. (Many machines provide similar hardware support for paging.) In our implementations of the Smalltalk-76 system, the best we were able to do was to write the ROT hashing algorithm in microcode. In spite of this, OOZE spends a large fraction of its time hashing into the ROT. Any hash that can be avoided saves time. We modified the Smalltalk interpreter to remember the memory addresses of certain frequently used objects. During the straight-line execution of a Smalltalk method, the interpreter holds the memory address of the currently executing method, the receiver, and the object on the top of the stack. Smalltalk spends significantly less time in OOZE when hashes of these frequently used objects are circumvented.
Hashing into the ROT is optimized in yet another way. As mentioned before, the hash routine uses the object pointer to compute a series of places to search in the ROT. The entries examined form a chain, with different object pointers having different chains. These chains crisscross throughout the ROT. An entry on one chain is often filled by an object pointer from a different chain that also uses this entry. The hash routine is searching for an entry that matches a certain object pointer. The search will succeed faster if the chain has all its own entries at the beginning and all other chains' entries at the end. The algorithm for deleting an entry from the ROT provides this optimization. After deleting the proper entry, it shuffles the remaining entries and moves them forward in their chains. Because of this strategy, the average number of entries examined to find an object in memory is only 1.8. Typically, the resident object table is 80 percent full.
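The deletion-time shuffle resembles the classic "backward shift" deletion for open-addressed tables: after removing an entry, the entries that follow it are reinserted so each lands as early in its own chain as the remaining layout allows. A toy Python sketch, assuming a tiny table and a trivial hash (neither is from the article):

```python
SIZE = 8
table = [None] * SIZE   # entries: (key, value)

def h(key):
    return key % SIZE   # toy hash, for illustration only

def insert(key, value):
    i = h(key)
    while table[i] is not None:
        i = (i + 1) % SIZE
    table[i] = (key, value)

def delete(key):
    """Delete key, then move later entries in the probe chain forward so
    each surviving entry sits as close to its home slot as possible."""
    i = h(key)
    while table[i] is not None and table[i][0] != key:
        i = (i + 1) % SIZE
    if table[i] is None:
        return                   # key was not present
    table[i] = None              # remove the entry, leaving a hole
    j = (i + 1) % SIZE
    while table[j] is not None:  # re-place everything after the hole
        k, v = table[j]
        table[j] = None
        insert(k, v)             # reinsertion lands as early as its chain allows
        j = (j + 1) % SIZE
```

After a deletion, a surviving entry that had been pushed down its chain by the deleted one moves up, which is what shortens the average search.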
The translation from an object pointer to the disk address of the object is also important. Since a list of the disk addresses of all 64 K objects would easily fill up main memory, OOZE must use a trick. Instead of object pointers being assigned randomly to objects, information is encoded in each object pointer. This is done by dividing the set of object pointers into pseudoclasses. The bits in the upper part of the pointer indicate to which pseudoclass that object belongs. All objects in a pseudoclass have the same Smalltalk class and have the same length. The Pseudoclass Map is a table that is indexed with the pseudoclass number. There OOZE finds the length of the object and its class (see figure 5). A single Smalltalk class may own as many pseudoclasses as it needs to cover all of its instances. Classes whose instances may have indexable variables, such as class String, own a different pseudoclass for each length or range of lengths. The pseudoclass encoding saves space because each object does not use a word to hold its class or a word to hold its length. Objects in memory in OOZE are actually two words shorter than objects in the Smalltalk-80 system.
Figure 5: Information encoded in an object pointer.
The disk address of an object is also found by using its pseudoclass. All objects in a pseudoclass are the same length, and they are stored consecutively on the disk. By knowing which object we want within the pseudoclass, we can compute its offset from the beginning of the pseudoclass. If we know the starting disk address of the pseudoclass, we can add the offset and find the object. The Pseudoclass Map contains the starting disk address of the object's pseudoclass (see figure 5). The low bits of the object's pointer tell which object it is within the pseudoclass. This encoding allows the disk addresses of all 64 K objects to be stored in 512 words of memory. (There are actually two additional levels of translation for the disk address. Tables for these take another 740 words).
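The computation just described is base-plus-offset arithmetic driven entirely by the object pointer. A sketch in Python, where the exact bit split and the map contents are invented for illustration (the article says only that the upper bits select the pseudoclass):

```python
# Hypothetical layout: top 6 bits of a 16-bit object pointer select the
# pseudoclass, low 10 bits select the instance within it.
PSEUDO_BITS = 6
INDEX_MASK = (1 << (16 - PSEUDO_BITS)) - 1

# Pseudoclass Map: pseudoclass number -> (Smalltalk class, object length
# in words, starting disk address of the pseudoclass). Values made up.
pseudoclass_map = {
    3: ("String", 8, 40960),
}

def disk_address(oop):
    """Compute an object's disk address from its pointer alone."""
    pseudoclass = oop >> (16 - PSEUDO_BITS)
    index = oop & INDEX_MASK
    _cls, length, base = pseudoclass_map[pseudoclass]
    return base + index * length   # offset from the start of the pseudoclass
```

No per-object table is consulted: the single small map plus arithmetic locates any of the 64 K objects, which is how the disk addresses fit in a few hundred words.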
By using the Pseudoclass Map, OOZE can find the disk address of any object from its object pointer. If it is in memory, OOZE also finds the object from its object pointer. Thus the same object pointer serves to identify and find an object, no matter where the object is. Because moving an object between disk and memory does not change its pointer, fields that point to the object need not change when the object moves. A field always contains the object pointer of the object to which it refers, regardless of the field's location and regardless of the object's location.
The management of the swapping space has several aspects. Objects are created and destroyed by Smalltalk upon request, and they are also moved in and out of memory. Each of these actions causes insertion or deletion in the ROT and allocation or deallocation in memory. Consider a class that wants to create a new instance: the new instance must receive an object pointer whose pseudoclass is already owned by that class. For this reason, we treat free instances of a class as legitimate objects. They "belong to the class" and can be swapped to and from the disk just like normal instances. Each class keeps a linked list of free instances. The class thinks that there are an infinite number of free instances on the disk waiting to be swapped in. To create a new instance, the class merely pulls the first object off its "free list." If that object is not in memory, a fault brings it in from the disk. When a free list on the disk runs out, OOZE constructs new free instances on the disk as they are requested.
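The free-list scheme above can be sketched in a few lines. This Python fragment is a simplification (names, the counter standing in for "constructing new free instances on the disk", and the fault mechanism are all assumptions):

```python
_next_oop = [100]   # stand-in for OOZE allocating fresh instances on disk

def fabricate_free_instance(cls):
    """Stand-in for OOZE constructing a new free instance on the disk."""
    _next_oop[0] += 1
    return _next_oop[0]

class SmalltalkClass:
    def __init__(self, name):
        self.name = name
        self.free_list = []          # object pointers of free instances

    def new_instance(self):
        # The class behaves as if infinitely many free instances wait on
        # the disk: when the list runs out, more are made on demand.
        if not self.free_list:
            self.free_list.append(fabricate_free_instance(self))
        return self.free_list.pop(0) # reuse the first free instance

    def free_instance(self, oop):
        self.free_list.append(oop)   # a freed object rejoins its class
```

The key property is that a freed instance keeps its object pointer, whose pseudoclass already belongs to the class, so it can be handed out again as a "new" instance.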
We have reduced the problem of managing memory and the ROT to the problem of swapping. Main memory has some free blocks between the areas being used for objects. These free blocks are linked together on free lists according to their size. The ROT also contains unused entries, which are marked as such. During an object fault, OOZE claims a free ROT entry and a proper-sized block of memory for the incoming object. Occasionally, OOZE cannot find a legal ROT entry or a free block of memory that is large enough. The fault routine stops, and OOZE starts purging objects from memory by copying them to their proper places on the disk. It then frees the memory space and ROT entries of the objects it throws out. When the purge routine finishes, the fault routine resumes its work.
The purge routine must decide which objects to throw out of memory. To play the game of virtual memory perfectly, OOZE should keep the objects which will be used soon and throw out those which will not. Since OOZE cannot see into the future, it throws out the least recently used objects. Objects that have been active recently are kept, and inactive ones are tossed out. The purge routine examines objects in memory in an order determined by their object pointers. Consider the space of all object pointers to be a circle. The purge routine tours the circle, keeping objects that have been accessed since the last time around. Objects that have remained unaccessed since the routine last visited them are purged to the disk (see figure 6). Typically, it takes several calls on the purge routine to complete a tour of the circle of object pointers.
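The tour of the circle resembles a clock-style second-chance sweep, with "accessed since the last visit" as the survival test. A sketch, assuming a single touched bit per resident object (names and structure are illustrative, not the OOZE data layout):

```python
resident = {}        # oop -> {"touched": bool}; only in-memory objects appear
hand = 0             # current position on the circle of object pointers
OOP_SPACE = 1 << 16  # 64 K possible object pointers, as in OOZE

def purge(need):
    """Advance around the circle until `need` objects have been evicted."""
    global hand
    evicted = []
    while len(evicted) < need:
        oop = hand
        hand = (hand + 1) % OOP_SPACE
        entry = resident.get(oop)
        if entry is None:
            continue                   # not resident; keep sweeping
        if entry["touched"]:
            entry["touched"] = False   # accessed recently: one more trip
        else:
            del resident[oop]          # unused since last visit: purge it
            evicted.append(oop)
    return evicted
```

Because the hand resumes where it left off, several calls together complete one tour of the circle, just as the text describes.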
Figure 6: The order in which objects are purged from main memory.
Purging objects in order of their object pointers has a very important side effect. Since all objects in a pseudoclass are consecutive on the disk, purge sends out the objects it is purging in the same order that they appear on the disk. This minimizes the movement of the disk head and saves time.
Objects that have not been changed since they came in from the disk do not have to be rewritten. They are correct as they stand on the disk. A single bit in each object in memory tells if it is "dirty" (ie: if it has been changed since it was copied to memory). If an object about to be purged is not dirty, we do not rewrite it on the disk and thus save time. This savings can be enhanced by purging in the background. Normally the purge routine runs in response to an immediate demand for space in memory. A special version of the routine runs when Smalltalk is idle and looks ahead in the circle of object pointers. It writes dirty objects to the disk and marks them as being "clean." A subsequent demand call on the purge routine will run quickly because many of the objects it wants to throw out are already written on the disk.
After each round of purging, the degree of fragmentation of memory is tested. If there are too many small blocks and no big ones, we perform a compaction. All objects are moved to one end of memory and all free blocks are merged into a single block at the other end. Memory addresses are updated in the ROT entries of the objects that have moved. OOZE performs this operation without using additional storage in order to keep a list of which objects have moved to which place in memory.
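Compaction is a single sliding pass: each object moves down to the lowest free address, and its ROT entry is updated as it moves. A simplified Python sketch, using a dictionary as a stand-in for real memory (the in-place, no-extra-storage bookkeeping of the actual OOZE routine is not shown):

```python
def compact(memory, rot):
    """Slide all objects to the low end of memory, updating ROT addresses.
    `memory` maps address -> (oop, length in words); a stand-in for RAM."""
    free = 0
    for addr in sorted(memory):          # visit objects in address order
        oop, length = memory.pop(addr)
        memory[free] = (oop, length)     # slide the object down
        rot[oop] = free                  # fix up its resident-table entry
        free += length
    return free   # start of the single merged free block at the other end
```

After the pass, all the small free blocks have been merged into one block beginning at the returned address.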
As a storage manager, OOZE must detect when an object is no longer being used. Like the storage manager mentioned in Krasner's article, OOZE uses reference counting. When the reference count of an object goes to zero, its object pointer is not in any field of any other object. At this point, it is impossible for that object ever to be accessed by the interpreter. OOZE, therefore, puts the object on its class's free list. Before doing so, however, it decreases the reference count of the object pointed to by each field of this object. In the process, more counts may go to zero, and more objects may get freed. To save space, reference counts are only four bits wide. The few objects with fifteen or more fields pointing at them are noted in a separate overflow reference count table.
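The four-bit counts with an overflow table can be sketched as follows. In this illustration the small count saturates at 15 and the overflow table carries the excess; the exact OOZE bookkeeping may differ:

```python
MAX_COUNT = 15   # four-bit reference counts saturate at 15
counts = {}      # oop -> small count (0..15), standing in for the 4-bit field
overflow = {}    # oop -> full count, for objects with 15 or more references

def inc_ref(oop):
    c = counts.get(oop, 0)
    if c < MAX_COUNT:
        counts[oop] = c + 1
    else:
        # Small field is saturated; track the true count in the overflow table.
        overflow[oop] = overflow.get(oop, MAX_COUNT) + 1

def dec_ref(oop):
    """Return True when the count reaches zero and the object can be freed."""
    if oop in overflow:
        overflow[oop] -= 1
        if overflow[oop] == MAX_COUNT:
            del overflow[oop]        # count fits in four bits again
        return False
    counts[oop] -= 1
    if counts[oop] == 0:
        del counts[oop]
        return True                  # object is unreferenced: free it
    return False
```

When dec_ref returns True, the object would go onto its class's free list, after first decrementing the counts of every object its fields point to.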
An average Smalltalk-76 system contains 40,000 objects and occupies one megabyte of disk space. In main memory, the system uses 96 K words, including 8 K words for the ROT and 40 K words for the objects that are currently swapped in. (We sometimes run with only 64 K words of memory.) On the Alto computer (see reference 1), we implement hashing into the ROT and the allocation of common objects in microcode. Performance is equivalent to a paging system with several times the swapping space. The OOZE virtual memory has allowed the Smalltalk-76 system to grow from an experiment into a system for building large and serious applications.
OOZE was designed in 1975. Several rules of the virtual memory game have changed since then. Here are some ways in which OOZE shows its age at Xerox PARC:
Naturally, our minds have turned to building a virtual memory for the Smalltalk-80 system with even better performance than OOZE. In 1980, a group of us at Xerox PARC designed just such a system, the Large Object-Oriented Memory (LOOM). Individual objects in LOOM carry slightly more overhead than objects in OOZE. (Since users can afford more memory, this is not a problem.) Besides allowing a much larger virtual space and unlimited classes, LOOM provides some new properties:
The goal of the virtual memory game is to make a mixture of fast and slow memory perform almost as well as if it were all fast memory. The strategy is to guess what information the processor will need soon and move it to fast memory. In an object-oriented language such as Smalltalk, the object is an excellent unit for locality of reference. Once an object is accessed, it will most likely be accessed again soon. Recently used objects have a similar degree of locality to recently used pages, and many more objects than pages fit into a given amount of fast memory.
OOZE is the first representative of the new category of object-oriented virtual memories. These systems use a construct in the high-level language, the object, as the unit of swapping. Objects as small as one field in length are swapped individually by the same mechanism used for large strings and arrays. To be a member of this category, a virtual memory must also have automatic control of swapping and automatic creation and freeing of objects. While OOZE is implemented in software, we believe that future systems will be implemented like languages: hardware assist for a few high-bandwidth operations, some microcode, and support code in machine- or high-level language. We expect that mature object-oriented virtual memories will identify groups of objects that are used together and swap them as a unit.
As the virtual memory which supports the Smalltalk-76 system, OOZE is interesting in itself. It provides the ability to address 2^N objects with N-bit pointers. Only currently active objects occupy memory, and they are packed end to end. This provides exceedingly good use of memory. Because the class and length of an object are encoded in the object pointer, that information does not occupy space with each object in memory. Movement of the disk head is reduced because objects are purged to the disk in the order of their disk addresses. OOZE is implemented in software without any special hardware support. It runs in an amazingly sprightly fashion and performs as well as paging systems with several times the swapping space.
The fact that Smalltalk uses objects consistently and completely allows its virtual memory to be radical in design. Object-oriented virtual memories get their power from a close coupling with the high-level languages they serve. The success of OOZE and the changing rules of the virtual memory game have inspired the design of LOOM, a larger and more efficient object-oriented virtual memory.