Assembly language is a wonderful tool for teaching about how computers work. Professor Sevenich explains how it is used at WSU.
The core program of our computer science curriculum includes two assembly language courses, which belong to the hardware-oriented part of the sequence. Although the students do learn to program in this arcane language, the emphasis is on using assembly language as a detective’s tool to learn about the underlying hardware.
Both courses involve the omnipresent Intel 80x86 architecture. However, the first course treats the chip as an 8086/88 and works within the MS-DOS environment. Insofar as is practical within the existing time constraints, we pretend that MS-DOS is not present and try to simulate an embedded systems environment. The essential fact is that MS-DOS puts us in charge of the system resources, i.e., in real address mode. This first course is a prerequisite for our subsequent hardware courses.
The focus of the second course is to examine the architecture elements that support a multitasking, multiuser operating system. For this course we have chosen Linux as the environment. This second course is a prerequisite for subsequent hardware courses as well as for our operating systems sequence. Linux is typically used in the latter. Of course, in such a sophisticated multitasking, multiuser system we no longer have direct control over the hardware resources. It is of central interest to see how the operating system protects itself.
This article discusses the second course. The intended audience consists of those who have an interest in the features of the 80x86 (x >= 3) Intel architecture that support an operating system such as Linux. The two techniques we use for investigating are as follows:
- Write our own assembly language code to probe the architecture.
- Examine assembly language written by others.
These two approaches are discussed in their respective sections later in this article. This article is not an attempt to investigate the Intel architecture (a subject for a large volume), but to describe the tools and resources available to do so.
Virtually all textbooks on the Intel 80x86 architecture assume that the reader is working in a Microsoft environment, usually with the Microsoft Assembler, MASM. Because we are working in a Linux environment, we do not use such traditional textbooks; instead, our primary resource is the Intel486 Processor Family: Programmer’s Reference Manual (1995), Intel Order Number 240486-003. This is a large manual; of special interest are Parts I and II, which deal with application and system programming, respectively. Other useful resources are the on-line Kernel Hacker’s Guide (see http://www.ssc.com/linux/ldp.html), Brennan’s Guide to Inline Assembly (see http://www.rt66.com/~brennan/djgpp/djgpp_asm.html), and the various man pages and info documents available within Linux itself. Using such a set of resources rather than a focused textbook is, of course, typically how a real world software engineer operates.
Why Use Linux? Which Distribution?
Linux is a natural choice for rather obvious reasons:
- It is free.
- It includes a complete set of development and detective tools.
- The source code is available.
- It is an evolving multitasking, multiuser environment making use of the advanced features of the underlying chip architecture.
I recommend Debian GNU/Linux to our students because:
- It is quite stable.
- It can be updated/upgraded nondestructively, in place.
- Various libraries are in the standard locations.
- It is non-commercial, so students can get more seriously involved with maintenance and development later in the curriculum.
- The Debian users and developers are extremely responsive and helpful.
Other distributions such as Red Hat or Craftworks also meet most of these requirements quite well, except for the fourth item (being non-commercial), which is important for our students but perhaps not to others.
Writing Our Own Assembly Language Programs
We have found it convenient and productive to write our assembly language in-line within C source code. Labels can be interjected in the source code at appropriate places to provide breakpoints for the debugger. The primary motivation for writing in-line assembly language is to examine architectural features. The assembly language statements are AT&T style rather than Intel style. The former seems to be the Unix custom.
Listing 1
/* filename: example1.c */
void some_assembly_language() {
  asm("
  bp0:
    # housekeeping - stash flags
    # and eax for later restoration
    pushf
    pushl %eax
    pushf       # put copy of original flags in eax
    popl %eax
  bp1:
    # flip all bits in that copy
    xorl $0xffffffff, %eax
  bp2:
    pushl %eax  # try to write flipped bit version into
                # flags
    popf
    pushf       # puts copy of new flag attempt into eax
    popl %eax
  bp3:
    # housekeeping - restore
    # original flags and eax
    popl %eax
    popf
  bp4:
  ");
}

int main()
{
  some_assembly_language();
  return 0;
}
As a simple example, we’ll exhibit a short program, example1.c (see Listing 1), whose purpose is to examine the flags register which has three types of flag bits: status bits (e.g., the Carry Flag), system flags (e.g., the two bit combination giving the I/O Privilege Level), and a control flag, the Direction Flag. The program does the following:
- Puts a copy of the flags register in the eax register for examination (breakpoint bp1).
- Flips all the bits in that copy (breakpoint bp2).
- Attempts to write that bit-flipped copy into the flags register and then puts a copy of the resulting flags register into eax for examination (breakpoint bp3).
Note how in-line assembly language is supported by the asm keyword, a GCC extension to C.
To compile this into the executable program example1.x, containing necessary information for subsequent use by the debugger, we use the -g switch in the following command:
gcc -g example1.c -o example1.x
The next step is to invoke the debugger. It is convenient to also get a log of the debugger activities via a pipe to the tee command so the command line entry would be:
gdb -silent example1.x | tee example1.log
yielding the gdb prompt
(gdb)
Now gdb is ready to run example1.x, while tee will produce a record of our activity in example1.log. The latter can be printed or examined with an editor.
It is beyond the scope of this article to also be a tutorial on the use of gdb; such documentation is readily available in man page and info format. In addition, for use within a browser, one can find, in html format, the FSF document Debugging with gdb by Stallman and Pesch. One current URL for this is: http://funnelweb.utcc.utk/~harp/gnu/tars.
It might be more efficient to first look at the terse, readable introduction to gdb given in Getting to Know gdb by Loukides and Oram in Issue 29 of Linux Journal (September 1996).
Listing 2
(gdb) break bp1
Breakpoint 1 at 0x107b
(gdb) break bp2
Breakpoint 2 at 0x107e
(gdb) break bp3
Breakpoint 3 at 0x1082
(gdb) run
Starting program: /floppy/example1.x
Breakpoint 1, 0x107b in bp1 () at example1.c:3
3         asm("
(gdb) info reg eax
eax     0x246   582
(gdb) cont
Continuing.
Breakpoint 2, 0x107e in bp2 () at example1.c:3
3         asm("
(gdb) info reg eax
eax     0xfffffdb9      -583
(gdb) cont
Continuing.
Program received Trace/breakpoint signal (SIGTRAP)
0x1081 in bp2 () at example1.c:3
3         asm("
(gdb) cont
Continuing.
Breakpoint 3, 0x1082 in bp3 () at example1.c:3
3         asm("
(gdb) info reg eax
eax     0x244f93        2379667
(gdb) cont
Continuing.
Program exited normally.
(gdb) quit
Having said that, let’s at least show a typical example1.log (see Listing 2) which shows setting breakpoints and then stopping at those breakpoints to examine registers of interest. Lines starting with the (gdb) prompt are commands entered by the user, whereas everything else is information volunteered by the debugger.
The log file tells the following:
- The original value of the flags register was 0x246.
- Our attempt to flip all the bits and write the flipped value back to the flags register was only partially successful, and that attempt generated an exception (signal SIGTRAP).
The investigator might go through a questioning process rather like this:
- What does the original value of the flags register mean in terms of individual bits (e.g., what is the I/O Privilege Level)?
- Which instruction generated an exception and why?
- Which bits could be flipped and which could not? Why?
Interesting facts are then uncovered. For example, in the log file shown, the ID flag (bit 21) was successfully flipped. According to the Intel documentation, this indicates that the processor can execute the CPUID instruction. On the other hand, the bits giving the I/O Privilege Level (bits 12 and 13) could not be modified. That is to be expected: the casual user should not be able to change anything that might help get at the I/O hardware directly.
Examining Assembly Language as Written by Others
Typically, even for device drivers, Linux developers do not use assembly language. Hence, it is particularly revealing to examine those very few parts of the kernel which are written in assembly language. These can be found within the Linux distributions with the command:
find . -name '*.S'
entered from the root directory. Of particular interest are these:
- bootsect.S (Intel style instructions)
- setup.S (Intel style)
- head.S (AT&T style)
These are heavily commented, but additional guidance can be found in the Intel documentation and in Alessandro Rubini’s Tour of the Linux Kernel Source, found in the Kernel Hacker’s Guide. These modules do the first portions of system initialization, a process which is completed by C routines. Once they have been executed, the assembly language routines are done. Another module of interest is entry.S (AT&T style) whose tasks are ongoing. In particular, it contains low level routines for handling system calls and faults.
Conclusion
This material should help interested readers start their own investigations of the Intel 80x86 (x >= 3) architecture and the Linux kernel. Much can then be learned about such topics as operating modes, memory management, and building the various descriptor tables.