Peter’s Programming Notes: Assertions

Understanding assertions and how and when to use them.

By Peter N Lewis, Perth, Australia

Peter N Lewis is a successful shareware author. He founded Stairways Software Pty Ltd in 1995 and specializes in TCP/IP products but has been known to diversify into other areas.

Introduction
Why you should use assertions
What is an assertion
When and where to use assertions
Compiler generated assertions and warnings
Duplicate your code and check your data structures
How to implement assertions
Extending assertions
Conclusion
References

Introduction

Many programmers believe it is impossible to write bug free code. They just assume bugs are a part of life and that beta testers and QA departments (or even end users!) will find and report the bugs which will then (hopefully) be tracked down and resolved. I'm not sure I believe it is possible to write bug free code, but one thing I certainly believe is that it will not happen without a conscious effort on the part of the programmer.

Once you have decided that writing bug free code is a worthwhile goal, the first and most important tool at your disposal is the "Assertion". An assertion is simply a piece of code that validates part of the state of your program, and alerts you if something is wrong. This article describes assertions, why you want to use them, what they are, when and where to use them, and how you implement them. Throughout this article I will use examples in Pascal or C, but the concepts apply to almost any language.

Why you should use assertions

Bugs come in many shapes and sizes, but there is a general rule of thumb that the earlier you detect a bug the less time it takes to fix it:

If you detect it as you are typing, it takes basically no time.
If you detect it during syntax check, compile, or link, it takes only a few seconds.
If you detect it as soon as the program launches, it takes only a minute.
If you detect it in your own testing, you probably will not waste much time, especially if an assertion fires to tell you exactly what went wrong.
If you send it to your beta testers/QA department, you are wasting days (and other people's time).
If you ship it and your end users find the bugs, the results can be arbitrarily bad (imagine your company going out of business because of bad reviews of your buggy product!).

Assertions make it easier to detect bugs earlier. By automatically detecting bugs you can find them before the bug has a chance to cascade and destroy the evidence. The more liberally you use assertions, the more quickly you will find bugs. And because assertions can be "compiled out" of your code (I'll show you how to do this in C and Pascal later), assertions only slow down your beta versions so you are free to use lots of them.

What is an assertion

In its most basic form, an assertion is simply a procedure that takes a boolean parameter and reports (to the programmer) if the boolean is false, for example:

procedure Assert( must: boolean );
begin
  if not must then begin
    DebugStr( 'Assertion failed;sc' );
  end;
end;

You can report the error using any method you like (DebugStr, Alert, writeln/printf, etc.), and can include any extra information you want (such as the source file and line or a text message explaining what happened). I generally use DebugStr since it will work even in interrupt level code, but it does require that you install MacsBug or some other low level debugger. Also, since the Metrowerks Debugger can catch DebugStr and leave you pointing at exactly the place where the assertion failed (once you step out of the Assert function), there is no need to include an explanatory message.
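As a rough C counterpart to the Pascal procedure above, here is a minimal sketch in which the reporting method can be swapped. The names assert_reporter, ReportToStderr and CountingReporter are my own inventions for illustration, and stderr simply stands in for DebugStr.

```c
#include <stdio.h>

/* Hypothetical hook: any function that takes a message can act as the reporter. */
typedef void (*AssertReporter)(const char *message);

/* Default reporter: write to stderr (a stand-in for DebugStr). */
static void ReportToStderr(const char *message)
{
    fprintf(stderr, "%s\n", message);
}

/* Alternative reporter that only counts failures, handy in automated tests. */
static int report_count = 0;
static void CountingReporter(const char *message)
{
    (void)message;
    report_count++;
}

static AssertReporter assert_reporter = ReportToStderr;

/* Report to the programmer if the condition is false. */
static void Assert(int must)
{
    if (!must) {
        assert_reporter("Assertion failed");
    }
}
```

Assigning CountingReporter to assert_reporter swaps the reporting method without touching any of the code that calls Assert.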

It is important to note that assertions are not a form of error checking. Assertions exist to detect programming errors; they are not useful for detecting real life error conditions like disk or network errors. Errors such as these must be detected, handled and reported by error checking code that remains in the shipping version and that reports to the user in a helpful manner. Assertions are there to help you as the programmer; your users should never see them if you do your job properly.
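To make the distinction concrete, here is a small C sketch. CountBytes is a hypothetical routine of my own, and the Assert shown is a simplified stand-in that writes to stderr. A nil parameter is a programming error, so it gets an assertion. A file that cannot be opened is a real life error, so it gets ordinary error handling that stays in the shipping version.

```c
#include <stdio.h>

static int assertion_failures = 0;   /* hypothetical counter, for illustration */

static void Assert(int must)
{
    if (!must) {
        assertion_failures++;
        fprintf(stderr, "Assertion failed\n");
    }
}

/* Counts the bytes in a file. Returns 0 on success, -1 if the file
   cannot be opened. */
static int CountBytes(const char *path, long *count)
{
    FILE *f;
    long n = 0;

    /* Programming errors: a caller must never pass nil pointers. */
    Assert(path != NULL);
    Assert(count != NULL);

    /* Real life error: the file may be missing or unreadable. This check
       is not an assertion, and the caller must report it to the user. */
    f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }

    while (fgetc(f) != EOF) {
        n++;
    }
    fclose(f);

    *count = n;
    return 0;
}
```

Asking for a file that does not exist returns -1 without firing any assertion, because nothing is wrong with the program itself.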

When and where to use assertions

The short answer is to use assertions everywhere. Anywhere you are using facts about your program's state that are not obvious from the preceding lines of code, you should consider using an assertion to confirm that the "facts" really are true. The most important places are:

At the start of each procedure (check that the parameters are acceptable)
At the start of each loop (check that the loop invariants hold true)
At the end of each loop (check that the loop has done its job)
At the end of each procedure (check that the procedure has done its job)
Before using any pointer (check that the pointer is not nil)
Before using any structure (check that the structure is valid)

For example, say we want to write a routine that accepts two string pointers, source and dest, where the source is supposed to be an 8-character string of lowercase letters, and its job is to uppercase the string and store it in dest. With assertions added, we might write it like this:

procedure Uppercase( source: StringPtr; dest: StringPtr );
  const
    required_length = 8;
  var
    i: integer;
begin
  Assert( (source <> nil) & (length(source^) = required_length) );
  Assert( dest <> nil );  // Ideally we would like to test that dest^ is long enough
  Assert( source <> dest );  // Ideally we would like to test that they do not overlap
  dest^[0] := chr(required_length);
  for i := 1 to required_length do begin
    Assert( source^[i] in ['a'..'z'] );
    dest^[i] := UpCase( source^[i] );
  end;
  Assert( EqualString( source^, dest^, false, true ) );
end;

So we start off checking the preconditions (that source is not nil and is the right length and that dest is not nil). We should really also check that source is made up entirely of lowercase letters, but we defer that to the loop where it is easier to check. And then at the end we check that we have done the job. It is a pretty loose test (checking only that dest is case insensitively equal to source), but it at least checks that we have done something like what we said we would do.

We also assert that source must not be the same as dest. It would be easy enough to ensure that the code worked properly in this case, but I don't feel like checking the code for that case, so instead I save myself some work and simply disallow it. If at some future time a programmer tries to use this routine with that case, they will immediately get an assertion. They can then either fix their code to use two strings, or update Uppercase to ensure it works properly when source and dest are the same.

This is another use for assertions: they are a form of self documenting code. If I simply added a comment to the documentation that source and dest are not allowed to overlap, a programmer might not notice and might accidentally use the procedure in this manner. Worse, the code might work sometimes but not always. It is much better to enforce the restriction in the code so that any future user of this routine immediately learns of their mistake.
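The Uppercase example covers preconditions and a final check, but not loop invariants. As an illustration of those, here is a sketch in C of a binary search with assertions at each of the places listed earlier. FindSorted is a hypothetical routine of my own, and Assert is a simplified stand-in that writes to stderr.

```c
#include <stdio.h>

static int assertion_failures = 0;   /* hypothetical counter, for illustration */

static void Assert(int must)
{
    if (!must) {
        assertion_failures++;
        fprintf(stderr, "Assertion failed\n");
    }
}

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
static int FindSorted(const int *a, int n, int key)
{
    int lo = 0;        /* the search range is a[lo..hi-1] */
    int hi = n;
    int result = -1;
    int i;

    /* Start of procedure: check the parameters, including sortedness. */
    Assert(a != NULL);
    Assert(n >= 0);
    for (i = 1; i < n; i++) {
        Assert(a[i - 1] <= a[i]);
    }

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;

        /* Start of each iteration: the loop invariant on the bounds. */
        Assert(0 <= lo && lo <= mid && mid < hi && hi <= n);

        if (a[mid] < key) {
            lo = mid + 1;
        } else if (a[mid] > key) {
            hi = mid;
        } else {
            result = mid;
            break;
        }
    }

    /* End of loop: either we found the key or the range is empty. */
    Assert(result != -1 || lo == hi);

    /* End of procedure: a returned index really does hold the key. */
    Assert(result == -1 || a[result] == key);
    return result;
}
```

The sortedness check costs more than the search itself. That is acceptable only because assertions are meant to be compiled out of the final build.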

Compiler generated assertions and warnings

It is worth noting that the compiler is capable of generating some assertions of its own, and you should take advantage of these whenever possible. Take the time to go through the compiler settings and ensure all possible warnings and checks are enabled. For example, the compiler may have range checking or nil checking options. It may also be able to detect things like unused variables, variables used before they are initialised and functions that do not return results. These warnings and errors can save you a lot of time so turn them on!

Duplicate your code and check your data structures

In the example above, we checked at the end of the routine that dest was case insensitively equal to source. We can actually go further and check that dest is exactly what it should be by duplicating the routine, something like this:

  var
    i: integer;
{$ifc do_debug}
    test_string: Str255;
{$endc}
begin
   
{$ifc do_debug}
  test_string := source^;
  UpperString( test_string, false );
  Assert( dest^ = test_string );
{$endc}
end;

Now when this routine executes we don't have to wonder if it is doing the right thing and hope that we spot the problem if it isn't. We know it works correctly every time because if it ever fails it will immediately notify us of the problem.

Note how the debugging variable test_string is compiled out if do_debug is false. This is for two reasons: first, it avoids the unused variable warning when you build the non-debugging version, but more importantly, it makes it clear that test_string is for debugging purposes only and ensures it is not accidentally used in the "real" code.

Many programs have a single important job that they perform. For example, in a drawing program it might be rendering to the screen, in a spreadsheet it might be the recalculation engine, in a game it might be updating the game state. These all involve taking some input state and mapping it to a new state. You can use assertions to validate your code in two important ways:

by checking that the state is valid before and after the update.
by duplicating the engine and running both and ensuring they get the same results.

Checking the state is generally pretty easy: you just go through each variable in each structure and ensure that it is within acceptable ranges. For each interaction, you ensure the variables are compatible. For example, in a drawing package, you might assert that each object is within range, that each rectangle has four points, that each colour or pattern is valid, that each group is made up of objects that are inside the group, and so forth.
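A state check along those lines might look something like the following C sketch. The structures, limits and names (Box, DrawObject, DrawGroup, kMaxCoord and so on) are all made up for illustration, and Assert is a simplified stand-in that writes to stderr.

```c
#include <stdio.h>

static int assertion_failures = 0;   /* hypothetical counter, for illustration */

static void Assert(int must)
{
    if (!must) {
        assertion_failures++;
        fprintf(stderr, "Assertion failed\n");
    }
}

/* Hypothetical limits and structures for a small drawing package. */
enum { kMaxCoord = 32767, kPatternCount = 16, kMaxGroupObjects = 8 };

typedef struct Box { int left, top, right, bottom; } Box;
typedef struct DrawObject { Box box; int pattern; } DrawObject;
typedef struct DrawGroup {
    Box bounds;
    int count;
    DrawObject objects[kMaxGroupObjects];
} DrawGroup;

/* Each coordinate is within range and the box is not inside out. */
static void ValidateBox(const Box *b)
{
    Assert(0 <= b->left && b->left <= b->right && b->right <= kMaxCoord);
    Assert(0 <= b->top && b->top <= b->bottom && b->bottom <= kMaxCoord);
}

/* Each variable on its own, then the interactions between them. */
static void ValidateGroup(const DrawGroup *g)
{
    int i;

    ValidateBox(&g->bounds);
    Assert(0 <= g->count && g->count <= kMaxGroupObjects);
    for (i = 0; i < g->count && i < kMaxGroupObjects; i++) {
        const DrawObject *obj = &g->objects[i];

        ValidateBox(&obj->box);
        Assert(0 <= obj->pattern && obj->pattern < kPatternCount);

        /* Interaction: every object lies inside its group. */
        Assert(g->bounds.left <= obj->box.left && obj->box.right <= g->bounds.right);
        Assert(g->bounds.top <= obj->box.top && obj->box.bottom <= g->bounds.bottom);
    }
}
```

You would call ValidateGroup before and after each update, so that a corrupted structure is reported at the update that damaged it rather than much later.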

Duplicating the code engine can be a fair amount of work, but it can also be very valuable. Often these engines must be very efficient so they end up being highly optimised. At the start of the project, you might write a very simple engine as a proof of concept - rather than throw this engine away, keep it and execute it in parallel with the new optimised engine you write and then check that the results are identical. For example, in a drawing package, you might do something like this:

procedure UpdateOffscreenWorld( offscreen: GWorldPtr );
  var
    rgn: RgnHandle;
{$ifc do_debug}
    debug_world: GWorldPtr;
{$endc}
begin
  FindChangeRegion( rgn );
  RedrawOnlyChanges( offscreen, rgn );
  DisposeRgn( rgn );
{$ifc do_debug}
  MakeNewOffscreenWorld( debug_world );
  DrawEverything( debug_world );
  Assert( IdenticalBits( offscreen, debug_world ) );
{$endc}
end;
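The same idea can be sketched in C on a much smaller scale. Here a hypothetical spreadsheet-like Sheet keeps a running total that is updated incrementally (the "optimised engine"), while a simple recompute-everything routine (the "proof of concept engine") is kept around purely to check it in debugging builds. All of the names are my own, and Assert is a simplified stand-in that writes to stderr.

```c
#include <stdio.h>

#ifndef do_debug
#define do_debug 1
#endif

static int assertion_failures = 0;   /* hypothetical counter, for illustration */

static void Assert(int must)
{
    if (!must) {
        assertion_failures++;
        fprintf(stderr, "Assertion failed\n");
    }
}

enum { kCellCount = 8 };

typedef struct Sheet {
    int cells[kCellCount];
    long total;            /* kept up to date incrementally */
} Sheet;

#if do_debug
/* The simple engine: recompute the total from scratch. */
static long SlowTotal(const Sheet *s)
{
    long sum = 0;
    int i;

    for (i = 0; i < kCellCount; i++) {
        sum += s->cells[i];
    }
    return sum;
}
#endif

/* The optimised engine: adjust the total by the change in one cell. */
static void SetCell(Sheet *s, int index, int value)
{
    Assert(s != NULL);
    Assert(0 <= index && index < kCellCount);

    s->total += (long)value - s->cells[index];
    s->cells[index] = value;

#if do_debug
    /* Run the simple engine too and check that both agree. */
    Assert(s->total == SlowTotal(s));
#endif
}
```

If the incremental update ever drifts from the true sum, the assertion fires at the first call where the two engines disagree.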

How to implement assertions

As described above, the basic assertion is simply a procedure that takes a boolean and reports if the boolean is false. However, since computing the assertion condition may be computationally expensive and since it does not in any way affect the execution of the program, it is desirable to have them automatically removed from your code before you ship the final version. To do this, we use compiler macros (#define in C, {$definec} in Pascal) like this:

#ifndef do_debug
#define do_debug 1
#endif

#if !do_debug
#define Assert(b)
#else
#define Assert(b) AssertCode(b)
#endif

#if do_debug
void AssertCode( Boolean b );
#endif

{$ifc not defined do_debug}
{$setc do_debug := 1}
{$endc}

{$ifc not do_debug}
{$definec Assert(b)}
{$elsec}
{$definec Assert(b) AssertCode(b)}
{$endc}

{$ifc do_debug}
  procedure AssertCode (b: boolean);
{$endc}

First, we default do_debug to true. Then we define the macro, mapping Assert( condition ) to either nothing at all or to a call to AssertCode. The actual procedure is renamed to AssertCode so that its definition is not mangled by the Assert macro. For final builds you can use a prefix file to set do_debug to false and then recompile all your source.

There are two things you have to be careful with. First, since the macro mechanism effectively removes the Assert lines from your program, you must never use a function with a side effect in your assertion. For example, you might be tempted to do something like this:

Assert( NewGWorld(   ) == noErr );

But when you compile that with do_debug set to false, the line will disappear and the GWorld will not be created. Instead you should write:

err = NewGWorld(   );
Assert( err == noErr );

Second, for this reason and just for general safety, it is important that you set do_debug to false for at least your last few beta builds (after you have resolved all bugs that cause assertions to fire of course!) so that you can get some serious testing with a build that is almost identical to the final build.
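The side effect problem is easy to demonstrate in a self-contained C sketch. Here do_debug is forced to 0 to simulate a final build, and CreateWorld is a hypothetical stand-in for NewGWorld that simply counts how many times it was called (returning 0 to play the role of noErr).

```c
#include <stdio.h>

#define do_debug 0   /* simulate the final build */

#if do_debug
static void AssertCode(int b)
{
    if (!b) {
        fprintf(stderr, "Assertion failed\n");
    }
}
#define Assert(b) AssertCode(b)
#else
#define Assert(b)
#endif

/* Hypothetical stand-in for NewGWorld: counts calls, returns 0 for success. */
static int worlds_created = 0;

static int CreateWorld(void)
{
    worlds_created++;
    return 0;
}

/* Wrong: the call lives inside the assertion and vanishes with it. */
static void WrongWay(void)
{
    Assert(CreateWorld() == 0);
}

/* Right: the call always happens; only the check is compiled out. */
static void RightWay(void)
{
    int err;

    err = CreateWorld();
    Assert(err == 0);
    (void)err;   /* avoid an unused variable warning when Assert is empty */
}
```

With do_debug set to 0, calling WrongWay leaves worlds_created at zero, while RightWay increments it as intended. With do_debug set to 1 both routines behave the same, which is exactly why this kind of bug hides until the final build.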

Extending assertions

Assertions can be as simple or as complicated as you choose to make them. I have described a very simple implementation, but you can expand on the concept in several ways.

You could use a more interesting reporting mechanism than simple DebugStrs such as Alerts or sending the reports out a serial or TCP connection. You could build on what you want to assert, such as asserting that a pointer or file reference or TCP stream is valid. You could also add information to the assertion such as the file name or line number or a message describing the cause of the assertion.
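As one possible extension, here is a C sketch of an assertion macro that records the failed expression, the file name, the line number and a short message. AssertMsg, AssertFailed and last_assert_report are names I have invented for this illustration. The report goes to stderr, but any of the reporting mechanisms mentioned above could be substituted.

```c
#include <stdio.h>

static int assertion_failures = 0;        /* hypothetical counter */
static char last_assert_report[256];      /* most recent report text */

/* Build and emit a report describing where and why the assertion failed. */
static void AssertFailed(const char *expr, const char *file, int line,
                         const char *message)
{
    assertion_failures++;
    snprintf(last_assert_report, sizeof last_assert_report,
             "%s:%d: assertion failed: %s (%s)", file, line, expr, message);
    fprintf(stderr, "%s\n", last_assert_report);
}

/* #b turns the condition into a string; __FILE__ and __LINE__ are filled in
   by the preprocessor at the place where the macro is used. */
#define AssertMsg(b, message) \
    ((b) ? (void)0 : AssertFailed(#b, __FILE__, __LINE__, (message)))
```

A call such as AssertMsg( n == 4, "n must be four" ) then reports the text of the condition along with its location. In a real project this macro would be compiled out under do_debug in the same way as Assert.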

Always keep in mind that you want to use assertions frequently in all your code. This means there may be some constraints on what you can do in the Assert routine if you are writing any low level code like interrupt routines or drivers. You should also avoid making the act of including an assertion overly tedious (that is why I generally don't include an explanatory message in my assertions).

You can also look around for other assertion libraries - CodeWarrior's MSL and PowerPlant both include support for assertions.

Conclusion

The single most important question to ask yourself whenever you find a bug in your code is "How could I have prevented this bug?" or at the very least "How could I have found this bug earlier?". Assertions are one way of finding bugs very early. Steve Maguire's excellent book, Writing Solid Code, describes assertions and many other ways of finding or preventing bugs (including stepping through any new code you write, writing good interfaces, and choosing safe/debuggable implementations).

These techniques really do work. They will save you time and frustration, and they will dramatically increase the level of confidence you have in your code. Where you would previously have said "this routine probably does more or less what I expect" you can say with confidence that it does exactly what it is supposed to do, and if it ever fails, you'll hear about it immediately.

So if you are going to write code (especially if it will end up running on my Mac!) go and read Writing Solid Code, get an attitude adjustment, and start writing bug free code!

References

Writing Solid Code by Steve Maguire. This book is the definitive reference in my opinion. I believe all programmers should read this book. It does a great job of explaining this topic and of motivating the reader to strive to write bug-free code.
Effective C++ by Scott Meyers. C++ has many ways to introduce bugs into your code that are very difficult to debug. Effective C++ describes many of these and how to avoid them.
