Fixed Point Math


So I had actually written a bunch of fixed-point math and support stuff for myself a couple years ago and had figured that I was all done with it. There are many articles and books out there written on the topic, so I do feel like it is very well covered by this point. But I was working on something a couple months ago that highlighted a bit of a gap in my fixed-point math code, and I noticed that none of the books that I personally owned actually discussed this particular issue, nor did any of the articles on the topic that I had bookmarked. Maybe the authors thought that it was "obvious" and didn't need explanation? I dunno.

At any rate, encountering that problem made me decide that I would just go ahead and write an article on fixed-point math with a specific focus on retro-programming (because of course I would do this) and include the solution for the specific issue I had found. Maybe someone else will find this useful.

Floating-Point and x86

Normally in your code, when you want to work with numbers with decimals, such as 3.14159, you would use a floating-point type. For example, in C/C++, this would be float or double. These types use the IEEE-754 representation for holding numbers. Using floating-point data types is of course very common today. You don't even think twice about it when you need to work with non-integer numbers. You just use these types and go on your merry way.

Way back when, CPUs didn't have built-in hardware support for floating-point data types. For example, to get hardware support on x86 CPUs, you needed to have a separate math co-processor, like the Intel 80x87. Having one of these installed would mean that you could use your programming language's built-in floating-point data types, and they would execute floating-point instructions using the math co-processor, which was significantly faster than the alternative of using an emulation library. Many compilers made switching between use of an emulation library or using native 80x87 instructions easy (usually just a compiler switch or something similar... no code changes needed). Of course, if you didn't have a math co-processor installed or you were distributing programs to users who might not have a math co-processor, you had no choice but to rely on some sort of emulation.

The 486 was the first Intel x86 CPU with the math co-processor functionality built in. Well, unless you had an SX, heh. Ok, well at least from the Pentium-onwards, it was all built-in without any Intel-manually-disabling-the-built-in-support-purely-for-money-grubbing-purposes nonsense. Before the 486/Pentium were commonplace, as a software developer, you really couldn't just assume that everyone running your code had a math co-processor available.

As an aside, even today you could still come across modern hardware, such as embedded systems, with no hardware floating-point support. Even back then, once Pentium processors were much more commonplace in desktop computers, you still had video game consoles such as the original Playstation, which had no hardware floating-point support despite the fact that games for that console heavily featured impressive (for the time) 3D graphics. Another somewhat later example of this is the original Nintendo DS.

Enter Fixed-Point

Fixed-point is an alternative to using floating-point that is useful when you need to write code that will run on machines where either no math co-processor will be available, or even if it is, it would be too slow to run the number of floating-point calculations that are going to be needed. Fixed-point will not give you exactly the same levels of precision as floating-point would, but for many purposes, it is "good enough."

Fixed-point works by representing fractional numbers using otherwise normal integer data types (e.g. in C/C++, using an int or long) and arbitrarily dividing the bits into whole number (integer) and decimal (fractional) parts.

For example, if we imagine the bits in a 32-bit integer:

0000000000101101 0010100011110101
  whole number       decimal

The above shows the 16:16 fixed-point representation of the number 45.16. The high word (16 bits) represents the whole number portion (45), while the lower word represents the decimal (0.16). As you can see, with this "16:16" representation we've decided to place an imaginary decimal point between the high and low 16 bits of a 32-bit integer. This means that the decimal point doesn't move around as it can with real floating-point. Instead, it is fixed in place.
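To make the layout a bit more concrete, here is a minimal sketch in C that pulls the two halves back out of that bit pattern. The names fix_whole_part and fix_frac_bits are made up for this illustration, and the whole-part helper assumes a non-negative value.

```c
#include <stdint.h>

typedef int32_t fixed;   // 16:16 fixed-point value stored in a 32-bit integer

// Whole-number part: the high 16 bits (assumes x is not negative).
int fix_whole_part(fixed x)
{
    return (int)(x >> 16);
}

// Fractional part: the low 16 bits, counted in units of 1/65536.
unsigned fix_frac_bits(fixed x)
{
    return (unsigned)(x & 0xFFFF);
}
```

For the bit pattern shown above (0x002D28F5), the whole part comes out as 45 and the fractional bits as 10485, and 10485 / 65536 is roughly 0.16.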

Nothing stops you from shifting the decimal either in your own fixed-point math routines. You might prefer to use say, 24 bits for the whole number portion and only 8 bits for the decimal. It depends on how much precision you need. You could even define multiple different fixed-point types which specify different levels of precision. But the point is that with fixed-point, we're essentially defining a convention for ourselves on where the decimal point will be.

Of course, in order to use some other custom decimal placement, you would need to customize your fixed-point math routines from what I will show in this article.
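As a rough idea of what that customizing might look like, here is a sketch of a 24:8 format where the number of fractional bits lives in a single constant. Everything named here (FIX_FRAC_BITS, FIX_ONE, fixed24_8 and the three helpers) is hypothetical and only for illustration.

```c
#include <stdint.h>

#define FIX_FRAC_BITS 8                        // hypothetical 24:8 split
#define FIX_ONE       (1L << FIX_FRAC_BITS)    // the value 1.0 in this format

typedef int32_t fixed24_8;

// Integer to 24:8 fixed-point. Multiplying has the same effect as the
// left-shift, but is also well-defined for negative inputs.
fixed24_8 fix24_8_from_int(int i)
{
    return (fixed24_8)(i * FIX_ONE);
}

// 24:8 fixed-point back to an integer, truncating toward zero.
int fix24_8_to_int(fixed24_8 x)
{
    return (int)(x / FIX_ONE);
}

// Float to 24:8 fixed-point, dropping anything finer than 1/256.
fixed24_8 fix24_8_from_float(float f)
{
    return (fixed24_8)(f * (float)FIX_ONE);
}
```

Changing FIX_FRAC_BITS moves the imaginary decimal point. The multiplication and division routines would need their shift amounts changed to match.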

This 16:16 representation obviously imposes some limits on the values we can hold in such a fixed-point data type. With 16 bits for the whole number (integer) portion, one of which is the sign bit, we can only hold whole number values from -32768 to 32767. And with 16 bits for the decimal (fractional) portion, the smallest step we can represent is 1/65536, or about 0.000015. Note that we're going to be using signed values (because our programming language's normal floating-point data types all allow us to use signed values, and so we want to do the same because that's the whole point of this ... to get a fast equivalent to floating-point, right?). Because we're using 32-bit integers, the sign bit is going to live in the upper 16 bits.
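These limits can be written down as constants. This is only a sketch; FIX_MAX, FIX_MIN, FIX_EPSILON and fix_to_double are names I am assuming here for illustration.

```c
#include <stdint.h>

typedef int32_t fixed;

#define FIX_MAX     ((fixed)0x7FFFFFFF)         // just under 32768.0
#define FIX_MIN     ((fixed)(-0x7FFFFFFF - 1))  // exactly -32768.0
#define FIX_EPSILON ((fixed)1)                  // smallest step: 1/65536

// Convert to double purely so the limits can be inspected or printed.
double fix_to_double(fixed x)
{
    return x / 65536.0;
}
```

fix_to_double(FIX_EPSILON) is 1/65536, which is where the "about 0.000015" figure comes from.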

Now, before we go any further ... why exactly is it better to use fixed-point on these older x86 CPUs?

As mentioned previously, if your code is to run on a system that does not have 80x87 instruction support at all, either via a math co-processor or by having built-in support (as in 486DX or later x86 CPUs), then you need to rely on some sort of emulation library that your compiler probably provides for you which is definitely going to run slower than the equivalent hardware support would.

But, just because your 286/386/486-equipped computer has 80x87 support, it doesn't mean that using those instructions is necessarily the best idea if you absolutely need raw speed. Until the Pentium rolled in, 80x87 instructions could be somewhat slow even when the hardware was present. Consider the following instruction timing listings (numbers shown are the number of cycles needed):

80x87 Instruction Timings
Operation    387      486      Pentium
fadd         23-34    8-20     3/1*
fsub         26-37    5-17     3/1*
fmul         29-57    16       3/1*
fdiv         88-91    73       39

* = latency/throughput

x86 Instruction Timings (32-bit register operands)
Operation    386      486      Pentium
add          2        1        1
sub          2        1        1
imul         9-38     12-42    10
mul          9-38     12-42    10
idiv         43       43       46
div          38       40       41

Since fixed-point uses normal integer data types, the fixed-point math we will be using just uses x86 instructions like add, sub, imul, idiv, etc. That is where we will gain speed over using 80x87 instructions or (especially) using emulation libraries.

Note that I've omitted 286 or earlier timings from the above tables. Obviously things get even slower the further back you go. I will not be discussing fixed-point math in detail for x86 CPUs prior to the 386 in this article. On those CPUs you lack 32-bit registers (as well as some of the specific assembly instructions I will be using here), which means that you end up either having to settle for something like 8:8 fixed-point, or to use two 16-bit registers in order to provide support for something like 16:16 fixed-point. This obviously complicates the fixed-point math code, but it is certainly doable.

A Fixed-Point Data Type

Let's start by defining our fixed-point data type. For this article, I'm going to go with a 16:16 fixed-point data type, because that's what I use in my own projects that use fixed-point. Obviously it makes sense to use a 32-bit integer:

typedef int fixed;       // if you're using a 32-bit compiler anyway. maybe use int32_t if you have stdint.h

There's our 32-bit fixed-point data type, fixed. Simple. Note that we want to be able to use signed values, as that is usually very useful in applications using floating-point calculations, especially so in games.

Of course, if you were using a 16-bit compiler like Borland Turbo C++, or one of the old DOS Microsoft C/C++ compilers, and you wanted to have a 16:16 fixed-point data type, then you would need to define your fixed type like so:

typedef long fixed;
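If you would rather not pick between the two by hand, one possible approach (a sketch, not something specific to any one compiler) is to choose based on limits.h and add a compile-time size check. The name fixed_must_be_at_least_32_bits is made up for this example.

```c
#include <limits.h>

// Use int when it is at least 32 bits wide, otherwise fall back to long.
#if INT_MAX >= 0x7FFFFFFF
typedef int fixed;
#else
typedef long fixed;
#endif

// Old-school compile-time check: the array size becomes -1 (a compile
// error) if fixed is somehow narrower than 32 bits.
typedef char fixed_must_be_at_least_32_bits[(sizeof(fixed) * CHAR_BIT >= 32) ? 1 : -1];
```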

I use Watcom C/C++ myself, so the rest of this article is going to be oriented towards that compiler.

Converting To And From Fixed-Point

So, obviously, we cannot just do something like this to assign a value to our fixed-point type:

fixed f = 3.14159;

That won't work because fixed is actually an int, so the decimal would be dropped and you won't get the right value. Plus, the integer value that your compiler would assign here will be "wrong" as far as our fixed-point representation is concerned. The bits for the integer 3 would be located in the lower 16 bits, which in our 16:16 representation we've designated as the decimal (fractional) portion.

What we need to do instead is shift the value so the whole number and decimal bits get stored in the right places. This needs to be done differently when assigning integer values to a fixed-point variable than it does when assigning actual floating-point values to a fixed-point variable. This is because you can actually bit-shift an integer value to move the bits to the right spot. But you cannot (meaningfully, at least) bit-shift a floating-point value due to its IEEE representation. Instead, for floating-point values, we just multiply them by the equivalent of the integer bit-shift, and then cast the resulting value to our 32-bit integer type.

Let's start with integer to fixed-point conversions first:

fixed a = 42 << 16;                       // a is now "42.0f" in fixed-point representation

We shift 42 by 16 bits to the left, to place the integer value 42 in the upper 16 bits of the fixed-point value because we've designated the upper 16 bits as the whole number (integer) portion of our fixed-point representation. There is no decimal portion to assign here obviously, so this is all that's needed. The decimal portion that we will be left with here will just be zero, thus the resulting fixed-point equivalent value 42.0f.

The bits of our fixed-point variable a will look like this:

0000000000101010 0000000000000000

Now for floating-point to fixed-point conversions:

fixed b = (fixed)(3.14159f * 65536.0f);   // b is now "3.14159f" in fixed-point representation

Here we multiply 3.14159f by 65536.0f (65536 = 2^16, in other words, the equivalent multiplication to the bit-shift we used for the integer previously) and cast the result to our fixed type (which is just a 32-bit integer). Casting a floating-point value in this way would normally mean dropping the decimal portion, but since we've already shifted the decimal portion up with the multiplication, we don't actually lose it during the cast.

The bits of our fixed-point variable b will look like this:

0000000000000011 0010010000111111

To convert fixed-point numbers back to either integer or floating-point values, we just do the opposite. Here is how you could do a fixed-point to integer conversion:

int i = a >> 16;                          // i is now "42" (converted from "42.0f" fixed-point)

Looking at the previously shown bits for the fixed-point variable a that should be pretty self-explanatory. In this type of conversion, obviously any decimal portion (if there was any) is discarded.

As a little something extra, if we wanted we could do a fixed-point to integer conversion that rounds to the nearest whole number instead of just discarding the decimal portion. We could do this by adding the equivalent of 0.5f in fixed-point, which for our 16:16 format is 0x8000, before doing the right-shift to turn the value back into a normal integer.

int i = (a + 0x8000) >> 16;               // if a was "3.5f" fixed-point, i would now be 4
                                          // however, if a was "3.4f" fixed-point, i would now be 3
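Wrapped up as a function, the rounding conversion could look something like the sketch below. FIX_HALF and fix_round_to_int are hypothetical names. It also assumes that right-shifting a negative value is an arithmetic shift, which is what x86 compilers do in practice even though the C standard leaves it up to the implementation.

```c
#include <stdint.h>

typedef int32_t fixed;

#define FIX_HALF 0x8000    // 0.5 in 16:16 fixed-point

// Round a 16:16 fixed-point value to the nearest integer. Exact halves
// round upward (toward positive infinity), so "-3.5" becomes -3.
// Values within 0.5 of the maximum would overflow the addition.
int fix_round_to_int(fixed x)
{
    return (int)((x + FIX_HALF) >> 16);
}
```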

Now for fixed-point to floating-point conversions. Again, we cannot just do a right-shift. The fixed-point value we have is represented as an integer and to turn it into a real floating-point value we need to get it back into that IEEE representation.

float f = (float)(b / 65536.0f);          // f is now "3.14159f" (converted from "3.14159f" fixed-point)

It is obviously useful to define some helper functions or macros for doing all of these conversions. These are the macros that I use:

#define FTOFIX(f)      ((fixed)((f) * 65536.0f))
#define ITOFIX(i)      ((fixed)((i) << 16))
#define FIXTOF(x)      ((float)((x) / 65536.0f))
#define FIXTOI(x)      ((int)((x) >> 16))
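For comparison, here is a sketch of the same conversions written as plain functions, plus a float conversion that rounds to nearest instead of truncating. All four function names are assumptions made for this example. The integer version multiplies rather than shifts because left-shifting a negative int is technically undefined in ISO C, even though it works as expected on x86 compilers.

```c
#include <stdint.h>

typedef int32_t fixed;

// Same result as ITOFIX, but well-defined for negative inputs too.
fixed fix_from_int(int i)
{
    return (fixed)i * 65536;
}

// Same behaviour as FTOFIX: anything below 1/65536 is truncated.
fixed fix_from_float(float f)
{
    return (fixed)(f * 65536.0f);
}

// Hypothetical variant: round to the nearest representable value.
fixed fix_from_float_rounded(float f)
{
    float scaled = f * 65536.0f;
    return (fixed)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

// Same behaviour as FIXTOF.
float fix_to_float(fixed x)
{
    return (float)x / 65536.0f;
}
```

The difference shows up with small values. 0.000061f scaled by 65536 is about 3.9977, so the truncating conversion stores 3 while the rounding one stores 4.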

Doing Fixed-Point Math

Addition and Subtraction

Luckily, addition and subtraction can be done as normal. Add and subtract away!

fixed a, b, c;

a = FTOFIX(1.2f);
b = FTOFIX(7.7f);

c = a + b;               // c will be "8.9f" fixed-point
c = a - b;               // c will be "-6.5f" fixed-point

One Little Gotcha

One very important consideration when doing fixed-point math is that you need to make sure that you are not unintentionally mixing fixed-point and non-fixed-point values together within the same calculations.

Convert as necessary to make sure you're not mixing them together! If you want a fixed-point result, make sure all the values being used in the calculation are fixed-point. If you want an integer result, make sure any fixed-point values used in the calculation are converted to integers. Same thing goes if you want a floating-point result.

Here's an example to illustrate this point.

int a;
fixed b, c;

a = 10;
b = ITOFIX(20);      // b = "20.0f" in fixed-point

c = b + a;

What is the result left in c? It's not going to be fixed-point 30.0f. It's actually going to be the fixed-point equivalent of 20.000153f.

Why is that? Because b is 20.0f in fixed-point. Meaning that the whole number portion 20 is shifted to the left so that it's in the high 16 bits of b. Then we add a to it. But a is just the plain integer value 10. Because it is just a normal integer value, that means its bits are located in the lower 16 bits of that variable. Adding a and b together has the probably-unintended consequence of treating a as if it were a really, really small fixed-point value (equivalent to a really small fraction in floating-point). When the addition is performed, the value 10 ends up in the resulting fixed-point value c's lower 16 bits, which is where the decimal portion of the fixed-point value lives in our 16:16 representation, while the 20 will be left in c's higher 16-bits.

To hopefully more clearly illustrate, let's look at the bits for a and b after their initial values are assigned:

a = 0000000000000000 0000000000001010
b = 0000000000010100 0000000000000000

As you can probably imagine after seeing this, after the addition is performed the bits in c will look like this:

c = 0000000000010100 0000000000001010

Of course there are a couple ways to fix this. If a really did need to be defined as a normal int, and we really did want to add that integer value to the fixed-point value b, we would rewrite the addition so that everything being added is fixed-point, by doing an integer-to-fixed-point conversion on the spot for a:

c = b + ITOFIX(a);

Which would give us the value we expected, 30.0f in fixed-point.
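Here are both versions side by side as a small sketch. The function names add_int_wrong and add_int_right are made up for this illustration.

```c
#include <stdint.h>

typedef int32_t fixed;

#define ITOFIX(i) ((fixed)((i) << 16))

// Mixes a plain integer into a fixed-point sum: a lands in the fraction bits.
fixed add_int_wrong(fixed b, int a)
{
    return b + a;
}

// Converts the integer first, so both operands are 16:16 fixed-point.
fixed add_int_right(fixed b, int a)
{
    return b + ITOFIX(a);
}
```

With b = ITOFIX(20) and a = 10, the first returns 0x0014000A (about 20.000153) and the second returns 0x001E0000 (30.0).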

Multiplication and Division

Multiplication and division can be done using normal means, but with some extra tricks being needed.


Let's start with a simple multiplication and see what we're dealing with.

fixed a, b, c;

a = ITOFIX(2);
b = ITOFIX(4);
c = a * b;         // will this be "8.0f" in fixed-point?

What ends up in c? Zero! What!? Let's take a look at the bits of our fixed-point values a and b.

a = 0000000000000010 0000000000000000
b = 0000000000000100 0000000000000000

These two values should make sense to you now. The actual underlying 32-bit integer values being multiplied here are 131072 and 262144, the result of which is 34359738368, which is huge. Too big for a 32-bit integer to hold. On x86, a signed 32-bit multiplication is done with an imul instruction. In its one-operand form, imul places the full 64-bit result in two registers, the high double-word in edx and the low double-word in eax. 34359738368 shown as 64 bits looks like this:

0000000000000000 0000000000001000 0000000000000000 0000000000000000
|---- high double-word (edx) ---| |---- low double-word (eax) ----|

The compiler just ended up returning the low double-word value (which is 0) because our code only specified a 32-bit variable to hold the result.

But hold on, why is fixed-point 2.0 multiplied by fixed-point 4.0 such a gigantic result? It's because in order to represent our fixed-point values, we are effectively multiplying them individually by 2^16 = 65536 (left-shift of 16) before using them in the multiplication. When we then multiply them together, this "scaling" of each of the numbers gets factored into the result. The result we got, 34359738368, is actually 8 * 2^32 (essentially this result is in 32:32 fixed-point representation). But we expected to get 8 * 2^16 (16:16 representation) for our fixed-point result.

So you can probably see what the fix here is. We just need to shift the resulting 64-bit value right by the same amount as what our fixed-point shift amount is, 16 in our case for our 16:16 representation. But we have a little wrinkle in our plan still. The result was a 64-bit value spread across two 32-bit registers.

Let's test a little multiplication tweak using our fixed-point values from above and see what we can do:

mov eax, 131072             ; "2.0f" fixed-point represented as a 32-bit int
mov ebx, 262144             ; "4.0f" fixed-point represented as a 32-bit int
imul ebx
shrd eax, edx, 16           ; right-shift the entire 64-bit value edx:eax back by 16 bits into eax

After the shrd instruction, we will have the following bits in eax:

0000000000001000 0000000000000000

Which is 524288, which is 8 * 2^16 ... which is the exact fixed-point value (8.0f) we expected to get from our multiplication. Phew!

Now that we know how to deal with this, we can turn this bit of assembly into a function. With Watcom C/C++:

fixed fix_mul(fixed a, fixed b);
#pragma aux fix_mul =      \
    "imul ebx"             \
    "shrd eax, edx, 16"    \
    parm [eax] [ebx]       \
    modify [eax ebx edx]   \
    value [eax];

// let's try this again ...

fixed a, b, c;

a = ITOFIX(2);
b = ITOFIX(4);
c = fix_mul(a, b);         // this will now be "8.0f" in fixed-point!

// negative numbers work too of course

a = FTOFIX(2.5f);
b = FTOFIX(-6.3f);
c = fix_mul(a, b);         // will be "-15.75f" in fixed-point
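If your compiler has a 64-bit integer type, the same idea can be sketched in plain C without any inline assembly. This is only an illustration under that assumption (fix_mul_c is a hypothetical name), and like the assembly version it does nothing about results that are too large for 16:16.

```c
#include <stdint.h>

typedef int32_t fixed;

// Multiply two 16:16 values. The 64-bit product is effectively 32:32,
// so shifting right by 16 brings it back to 16:16.
// Assumes >> on a negative value is an arithmetic shift.
fixed fix_mul_c(fixed a, fixed b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (fixed)(product >> 16);
}
```

On a 386-class CPU this will most likely be slower than the imul/shrd pair, since the compiler has to emulate the 64-bit arithmetic.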


Division is similarly tricky, but let's walk through it so that we understand exactly what's going on.

Let's start with a simple division and see what we're dealing with.

fixed a, b, c;

a = FTOFIX(4.8f);
b = FTOFIX(2.4f);
c = a / b;         // will this be "2.0f" in fixed-point?

What ends up in c this time? 2! Yay, that was easy! Wait... no, that's wrong! We expected to get 2.0f in fixed-point, not 2 as a plain ol' integer!

The problem here is basically the reverse of what it was with multiplication. Those of you reading this with math degrees probably already expected this. But alas, I am a bit of a math dummy, so all these simple concepts I learnt back in school I had to re-learn again here while exploring fixed-point math.

Our two fixed-point values a and b have been pre-multiplied (left-shifted) in order to turn them into 16:16 fixed-point values. When dividing, this multiplication factor present in both of the values gets cancelled out, and so the divide basically un-shifts our fixed-point values back into integers and divides those integers all at once.

Probably the easiest fix here is to convert the dividend, a, into 32:32 fixed-point using a 64-bit value across edx:eax before the division. This would be un-coincidentally convenient as when using idiv with a 32-bit divisor as we are, it is actually dividing the 64-bit value in edx:eax anyway. The result will be a 32-bit value in eax.

Now, we cannot just left-shift our a value by 16 so that it sits across edx:eax because our fixed-point values are signed. We need to sign-extend our 32-bit value a across 64-bits before we do the left-shift that is necessary to convert our 16:16 fixed-point representation value in a so that it is in proper 32:32 format. And of course, that left-shift needs to occur across eax and into edx. The cdq instruction will do the sign-extension into 64-bits for us, while shld will handle the left-shift into edx from eax.

Let's test this out quickly with a bit of assembly:

mov eax, 0x4cccc            ; our dividend value "4.8f" fixed-point represented as a 32-bit int in hex
mov ebx, 0x26666            ; the divisor value "2.4f" fixed-point represented as a 32-bit int in hex
cdq                         ; sign extend eax into edx:eax
shld edx, eax, 16           ; shift edx left by 16, using bits from eax to fill in on the right of edx on each shift
shl eax, 16                 ; shld leaves eax unmodified though. so shift eax left by 16 also to finish fixing edx:eax
idiv ebx

To understand this a bit better, let's look at the bits of our dividend value 4.8f fixed-point as we're transforming it here before the idiv is finally executed.

After the value is initially set, eax has:

eax = 0000000000000100 1100110011001100

After the cdq is executed, we end up with the following 64-bit value across edx:eax:

edx = 0000000000000000 0000000000000000
eax = 0000000000000100 1100110011001100

Because our dividend value in eax is positive, the sign bit value of 0 is extended all the way through edx. If the dividend value was negative, we would expect to see a bunch of 1s extended all the way through edx. Next we begin shifting our dividend value into 32:32 fixed-point representation with the shld:

edx = 0000000000000000 0000000000000100
eax = 0000000000000100 1100110011001100

shld leaves the second register operand, in this case eax, unaltered. So, we have our edx value shifted in from the high 16 bits of eax now. So edx is now all good for 32:32 fixed-point representation. The whole number portion 4 is there. Now we need to fix eax by shifting it left 16 as well using shl.

edx = 0000000000000000 0000000000000100
eax = 1100110011001100 0000000000000000

Great! Now edx has the correct high 32-bits representing the whole number portion of our original 16:16 fixed-point value, and eax has the correct low 32-bits representing the decimal portion. Sure, the left-shift on eax ended up leaving a bunch of extra zeros in the low 16-bits, but we didn't have any additional bits from our original value anyway, so this is fine and won't affect our division at all.

After the idiv executes, we get the following 32-bit result in eax:

eax = 0000000000000010 0000000000000000

Which is exactly what we want, 2.0f in 16:16 fixed-point representation! No more shifting or futzing about with values is needed. We can return this into our fixed result variable as-is.

Phew! Oh my! Now we can package this assembly up into a function too. For Watcom C/C++:

fixed fix_div(fixed a, fixed b);
#pragma aux fix_div =    \
    "cdq"                \
    "shld edx, eax, 16"  \
    "shl eax, 16"        \
    "idiv ebx"           \
    parm [eax] [ebx]     \
    modify [edx]         \
    value [eax];

// let's give it a try ...

fixed a, b, c;

a = FTOFIX(4.8f);
b = FTOFIX(2.4f);
c = fix_div(a, b);         // this will now be "2.0f" in fixed-point!

// and again, negative numbers work too

a = FTOFIX(16.0f);
b = FTOFIX(-2.0f);
c = fix_div(a, b);         // will be "-8.0f" in fixed-point
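As with multiplication, the same steps can be sketched in portable C if a 64-bit integer type is available. fix_div_c is a hypothetical name and this is not a drop-in replacement for the assembly version: a divisor of zero is still not handled, and a quotient that does not fit in 32 bits gets cut down by the cast instead of raising a CPU exception.

```c
#include <stdint.h>

typedef int32_t fixed;

// Divide two 16:16 values. The dividend is widened to 64 bits and scaled
// by 2^16 first (making it 32:32), so the quotient comes out as 16:16.
// Multiplying by 65536 is used instead of << 16 so that negative
// dividends are well-defined in C.
fixed fix_div_c(fixed a, fixed b)
{
    int64_t dividend = (int64_t)a * 65536;
    return (fixed)(dividend / b);
}
```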

A Big Division Gotcha

These methods are the usual ones that many people have written about, better than I have, over the past decades. I cannot take credit for any of this. However, there is at least one big gotcha with doing division this way with idiv that I have yet to see mentioned in any book or article on fixed-point math.

Imagine for a moment that you are fiddling around with a ray caster just for fun, and also as a means to give your fixed-point math routines a good work out (as I was at some point). Also imagine that at some point as your ray caster test program is running, it ends up executing a division (as mine was). And finally, let us imagine that the values being divided were, say, 2.0f and 0.000061f fixed-point (as eventually happened to me).

fixed a, b, c;

a = FTOFIX(2.0f);
b = FTOFIX(0.000061f);
c = fix_div(a, b);            // will be "32786.8852f" in fixed-point .... right?

And then imagine that your code fails at the idiv operation run by fix_div with a "divide by zero" error.

First off, what the heck? We aren't dividing by zero here! Our divisor value 0.000061f fixed-point is greater than what we figured our smallest precision was (mentioned near the beginning of this article), so it's probably not anything like the value being rounded to zero or something like that. So why would it fail with a "divide by zero" error? Let's trace through our fix_div assembly routine with these inputs and see what's going on.

mov eax, 0x00020000         ; our dividend value "2.0f" fixed-point represented as a 32-bit int in hex
mov ebx, 0x00000003         ; the divisor value "0.000061f" fixed-point represented as a 32-bit int in hex
cdq                         ; sign extend eax into edx:eax
shld edx, eax, 16
shl eax, 16
idiv ebx

Now, let's take a look at our register values when execution reaches the idiv. ebx will of course just have the value 3 in it. But here is how our dividend value looks after it has been sign-extended and shifted into 64-bits across edx:eax:

edx = 0000000000000000 0000000000000010
eax = 0000000000000000 0000000000000000

How does this result in a "divide by zero" error? Let's just manually calculate the result of this division ourselves and see if we can spot the issue. We're dividing the 64-bit value 8589934592 by 3 which is 2863311530. This result does fit into 32-bits, not a problem. So what is wrong? I'll freely admit that I didn't see the problem with this particular result at all and was quite confused until I finally decided to look at the bits for it:

2,863,311,530 = 1010101010101010 1010101010101010

Oooohh, pretty! It's a nice repeating pattern. Ok, yeah, it's pretty I guess, but the real problem is that this result has its most significant bit set to 1 and we're performing signed division with idiv. A positive result has to fit in 31 bits (32 bits minus the sign bit) and this one does not, so idiv raises a "divide error". Most debuggers will present this as a "divide by zero" error, which is kinda confusing.
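The same arithmetic can be checked in C by doing the division at 64-bit width and then testing whether the quotient fits in a signed 32-bit value. This is just a sketch for poking at the problem, with made-up names (fix_div_wide, fits_in_int32), and it assumes a compiler with a 64-bit integer type.

```c
#include <stdint.h>

typedef int32_t fixed;

// The full-width quotient that idiv is being asked to squeeze into eax.
int64_t fix_div_wide(fixed a, fixed b)
{
    return ((int64_t)a * 65536) / b;
}

// Non-zero when v can be held in a signed 32-bit register.
int fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

For a = 0x00020000 and b = 3, fix_div_wide returns 2863311530, and fits_in_int32 reports that it does not fit.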

Ok, so ... fine ... idiv is working as advertised. No problem!

Except that this can be annoying when you don't have a good way to trap this or otherwise handle it in some way that doesn't result in your ray caster crashing just because you were turning around in a full circle and these problematic values ended up in a division by chance (as happened to me). How to handle this properly is going to vary from application to application, but regardless of how you want to handle it you will need a way to catch this error in the first place and return some kind of error code or value that we can detect.

Eventually I discovered that the fixed-point division routine in Allegro has a way of working around this and returning an error result if the division would overflow. After seeing how Allegro was doing it I had an "oh, duh!" moment because it seemed so obvious in hindsight, heh. However, I won't pretend that I came up with this solution on my own and am happy to give full credit to them!

The solution is to just perform the signed division with div instead of idiv, but handle the manipulation of the sign bits yourself, as well as checking for the overflow conditions yourself, and when they arise, return some sort of error code/value of your choosing. This is simple to implement as we know from math classes in school from way back that dividing two positive values together results in a positive value. Dividing two negative values together also results in a positive. But dividing a negative and a positive results in a negative. So we can manipulate the sign bits ourselves based on this understanding. This does end up being less performant than our existing fix_div routine which is unfortunate. If you know that you will never ever be doing divisions with values that will overflow like this, then you need not worry and can continue using the smaller, faster fix_div already shown! I'm not a math genius, so I do not really know off the top of my head if there is even a straight-forward way to just "know" this in advance. That would surely be nice though.
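For what it's worth, a rough pre-check does fall out of the arithmetic. The magnitude of the quotient is (|a| * 2^16) / |b|, and it stops fitting in 31 bits once it reaches 2^31, which happens exactly when |a| >= |b| * 2^15. Here is a sketch of that test, assuming a compiler with a 64-bit integer type. fix_div_would_overflow is a hypothetical name, not something from Allegro or any other library.

```c
#include <stdint.h>

typedef int32_t fixed;

// Returns non-zero if a / b cannot be represented as a 16:16 result whose
// magnitude fits in 31 bits. A zero divisor is reported as overflow too,
// because |a| >= 0 is always true.
int fix_div_would_overflow(fixed a, fixed b)
{
    int64_t mag_a = a < 0 ? -(int64_t)a : (int64_t)a;
    int64_t mag_b = b < 0 ? -(int64_t)b : (int64_t)b;
    return mag_a >= mag_b * 32768;    // 32768 = 2^15
}
```

This is very slightly stricter than idiv itself, which can also produce a quotient of exactly -2^31. Whether a check like this ends up cheaper than the sign handling described above is something I have not measured.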

Let's test this solution out:

    mov eax, 0x00020000         ; our dividend value "2.0f" fixed-point represented as a 32-bit int in hex
    mov ebx, 0x00000003         ; the divisor value "0.000061f" fixed-point represented as a 32-bit int in hex

    xor ecx, ecx                ; ecx counts how many of the two values were negative
    or eax, eax                 ; check if the dividend is negative
    jns checkDivisorSign        ; move on to checking the divisor if the dividend is already positive
    neg eax                     ; dividend is negative. make it positive
    inc ecx                     ; increment counter, indicating that we switched the sign of one of the values
checkDivisorSign:
    or ebx, ebx                 ; check if the divisor is negative
    jns divide                  ; move on to performing the division if the divisor is already positive
    neg ebx                     ; divisor is negative. make it positive
    inc ecx                     ; increment counter, indicating that we switched the sign of one of the values
divide:
    xor edx, edx
    shld edx, eax, 16           ; convert dividend value to 32:32 fixed point. no need to sign extend this time
    shl eax, 16
    cmp edx, ebx                ; one last check for possible division result overflow
    jae error                   ; if found, skip right to returning an error result
    div ebx                     ; divide! using unsigned division this time. result will be positive!
    or eax, eax                 ; test division result's sign bit. if it is set, the division result is too big
    jns restoreSignBit          ; however, if the result's sign bit was not set, the result is fine. skip to return
error:
    mov eax, 0x7fffffff         ; this is our error code result. because why not?
    jmp done                    ; and skip to the end. we just want to return this error value as-is
restoreSignBit:
    cmp ecx, 1                  ; if ecx=1, only one of the values was negative and we should negate the result
    jne done                    ; if ecx=0 or ecx=2, then leave the result positive (neither or both were negative)
    neg eax                     ; negate result, returning a negative value
done:

We check the sign bits of the dividend and divisor separately. If either is negative, we flag it using ecx as a counter for how many negative values we found, and we force both the dividend and divisor to be positive going into the div.

There is an additional last-minute check before the div runs, but after the conversion of our dividend to 32:32 fixed-point representation. We compare the high double-word of the dividend against our divisor to see if it is larger or equal. If it is, the quotient would not fit in 32 bits at all (this also covers a divisor of zero) and the div operation would end up throwing a "divide error". We definitely want to catch this scenario and return an error value instead of letting the CPU crash our application (that's the whole reason that we're writing this monstrosity of a division routine after all!).

Immediately after the div is performed we need to check the sign bit of the result. If it is set, then the division result was too big to fit in a 32-bit signed value (remember div is doing unsigned division, so it will happily produce a result that uses all 32 bits) and we need to set our error result in this case.

Finally, we need to check if we need to manually negate the division result. This is only needed if ecx is 1. The only way that ecx would be 1 is if only one of the values (either the dividend or divisor) was negative. And since a division with a positive and negative value results in a negative, we need to make sure our result is negative in this case (which it will never be unless we set this here).

Phew! For completeness' sake, here is the final Watcom C/C++ routine:

fixed fix_div_safe(fixed a, fixed b);
#pragma aux fix_div_safe =      \
    "    xor ecx, ecx"          \
    "    or eax, eax"           \
    "    jns checkDivisorSign"  \
    "    neg eax"               \
    "    inc ecx"               \
    "checkDivisorSign:"         \
    "    or ebx, ebx"           \
    "    jns divide"            \
    "    neg ebx"               \
    "    inc ecx"               \
    "divide:"                   \
    "    xor edx, edx"          \
    "    shld edx, eax, 16"     \
    "    shl eax, 16"           \
    "    cmp edx, ebx"          \
    "    jae error"             \
    "    div ebx"               \
    "    or eax, eax"           \
    "    jns restoreSignBit"    \
    "error:"                    \
    "    mov eax, 0x7fffffff"   \
    "    jmp done"              \
    "restoreSignBit:"           \
    "    cmp ecx, 1"            \
    "    jne done"              \
    "    neg eax"               \
    "done:"                     \
    parm [eax] [ebx]            \
    modify [ecx edx]            \
    value [eax];

// let's try it out!

fixed a, b, c;

a = FTOFIX(2.0f);
b = FTOFIX(0.000061f);
c = fix_div_safe(a, b);            // will be 0x7fffffff, signalling an error!

// now let's see how it handles successful divisions ...

a = ITOFIX(8);
b = ITOFIX(2);
c = fix_div_safe(a, b);            // will be "4.0f" in fixed-point

a = ITOFIX(-8);
b = ITOFIX(2);
c = fix_div_safe(a, b);            // will be "-4.0f" in fixed-point

a = ITOFIX(8);
b = ITOFIX(-2);
c = fix_div_safe(a, b);            // will be "-4.0f" in fixed-point

a = ITOFIX(-8);
b = ITOFIX(-2);
c = fix_div_safe(a, b);            // will be "4.0f" in fixed-point
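
If you want to sanity-check the logic on a modern compiler, here is a rough portable C sketch of the same checks, in the same order as the assembly. This is not the routine from above, just an illustration. It assumes fixed is a 32-bit signed 16.16 value and that a 64-bit integer type is available. The names fix_div_safe_c and FIX_DIV_ERROR are made up for this example, and the ITOFIX shown is only a stand-in for the macro used earlier in this article.

```c
#include <stdint.h>

typedef int32_t fixed;                        /* assumed: 16.16 fixed-point in a 32-bit int */

#define FIX_DIV_ERROR ((fixed)0x7fffffff)     /* hypothetical name for the same error value */
#define ITOFIX(x)     ((fixed)((x) * 65536))  /* stand-in for the article's ITOFIX macro */

/* Portable sketch of the safe divide: same checks, same order as the assembly. */
fixed fix_div_safe_c(fixed a, fixed b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    int negatives = 0;                        /* plays the role of ecx */
    uint64_t dividend, quotient;

    if (a < 0) { ua = 0u - ua; negatives++; } /* neg eax / inc ecx */
    if (b < 0) { ub = 0u - ub; negatives++; } /* neg ebx / inc ecx */

    dividend = (uint64_t)ua << 16;            /* edx:eax after the shld/shl pair */

    /* cmp edx, ebx / jae error: the quotient would not fit in 32 bits.
       This also rejects ub == 0, i.e. division by zero. */
    if ((uint32_t)(dividend >> 32) >= ub)
        return FIX_DIV_ERROR;

    quotient = dividend / ub;                 /* div ebx */

    /* or eax, eax / jns: bit 31 set means too big for a signed result */
    if (quotient & 0x80000000u)
        return FIX_DIV_ERROR;

    /* cmp ecx, 1 / neg eax: negate only when exactly one input was negative */
    return (negatives == 1) ? -(fixed)quotient : (fixed)quotient;
}
```

Calling fix_div_safe_c(ITOFIX(-8), ITOFIX(2)) gives the same value as ITOFIX(-4), while dividing ITOFIX(2) by a raw value of 3 (roughly what 0.000061 becomes in 16.16) or by 0 returns FIX_DIV_ERROR. On a 486 the 64-bit divide would be far slower than the single div above, so treat this purely as a reference for checking results.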


This ended up being much longer than I had originally imagined. And I have some more things to say about fixed-point math for old retro-computers. There are useful trigonometry functions and other useful math functions, like a sqrt() function, that we will want fixed-point equivalents for. But that will be covered in a future post.

My Custom Socket 7 Build

hardware dos win9x

A couple years ago, I wrote a post about the 486 DX2 computer that I built and figured I would do the same with a Socket 7 build. Looking back at that post I wrote, I regret not discussing some of those hardware choices in more detail, so I will take the opportunity to go into more of those details here. Anyway, I've actually had this computer finished and working for almost the same amount of time as that 486 computer. Some little tweaks were made here and there, but for the past year its hardware has been left unchanged.

So, why build this machine anyway after building my 486 computer? Well, the obvious answer is that it's a more powerful system than a 486 (duh!), while still retaining practically all of the same DOS and Windows 3.1/9x compatibility that I, personally, would care about. So does that mean that I don't need or want my 486 computer anymore? No, actually, truth be told, even though I've had this machine built for almost two years, I still use the 486 quite a lot more. Why? *shrug* The "old-school" charm of it? I cannot really answer that question in a logical way. They're both, to me, fun to use machines for different purposes. The 486 machine I like most for pure DOS, while this other computer is better for late DOS and early Windows 95/98 software.

More than that though, after finishing my 486 build, I had a spare baby AT computer case to either use or toss. The case I used for this Socket 7 build is the one that I originally bought to use for my 486 build. But when the case that I actually really wanted to use for the 486 popped up on eBay, I just had to get it and that left me with this spare case. These old AT cases are only getting harder and harder to find as time goes by, and so just tossing it out seemed silly. Plus the case was in such great shape too.


I had two Socket 7 motherboards on hand in preparation for this build, actually.

On the left, a Tyan S1590S and on the right, an Atrend ATC-5030. The Tyan board is actually a "Super" Socket 7 motherboard using the VIA MVP3 chipset. This supports AGP (though early AGP implementations on these boards were not very good), faster front-side bus (FSB) speeds like 100MHz, and is actually a great fit for AMD K6-2 and K6-III processors. Plus the sheer number of expansion ports available on this motherboard is actually quite odd for the AT form-factor. It is quite a packed motherboard, really. The Atrend motherboard on the other hand, is a mostly "normal" Socket 7 motherboard using an Intel 430TX chipset. Later Socket 7 processors such as the aforementioned AMD ones can still be used with this motherboard, but not at the faster FSB speeds supported by Super Socket 7 motherboards.

What I like about both of these motherboards is the positioning of the PCI and ISA slots with respect to the CPU socket's location. At least a couple of long PCI and ISA cards can be used and won't get in the way of the CPU once a heatsink and fan are installed. This is really important for AT form-factor motherboards as they are a fair bit more constrained for space than the larger ATX form-factor Socket 7 motherboards were. A whole lot of Socket 7 motherboards either cut back on the number of PCI and/or ISA slots and/or have the CPU socket positioned so that you cannot install long / full-length PCI or ISA cards.

The Tyan S1590S is what I ended up going with here as it has the largest number of customization options available, making future "tinkering" and swapping out of different components possible.

It is worth mentioning that the inclusion of an AGP slot on this motherboard is not really that useful to me with the case I am using, due to its position on the motherboard. With the size of a baby AT case, and how all of the I/O connectors are right by the AGP slot, it means that area will be really crowded. The last thing I would want is a graphics card right there with bunched up bits of IDE cables hanging on it. Really, to utilize the AGP slot on this motherboard, you'd probably want to use a slightly bigger case.


The two choices I had for a Socket 7 CPU are an Intel Pentium 233 MMX and an AMD K6-2 400. Most people would naturally go with the AMD K6-2, especially when using a Super Socket 7 motherboard as I am. I may one day use the K6-2 in this build, but for now I went with the Pentium 233 MMX. Why? Mostly nostalgia, really. While growing up, we never had any computers using AMD CPUs until 2001-ish when my dad got an AMD Duron for the computer my brother and I shared at the time. So I just figured I'd stick with something we actually had back then, and a Pentium 233 MMX actually fits that criterion perfectly. Normally these Pentium MMX CPUs would run at FSB speeds of 66MHz, but with the Tyan S1590S motherboard, I can bump that up to 100MHz easily (these Pentium MMX CPUs were well known for being good at overclocking) and run the CPU at a 2.5x multiplier for a final 250MHz clock speed. Not a huge overclock for the CPU itself, but the benefit of running the system bus at 100MHz is nice overall and a noticeable difference in Windows performance.

I did, for fun, try running with a 3.0x multiplier at 100MHz FSB (for a 300MHz CPU clock), just to see if it would work. But alas, the system would not POST with that configuration. Some people back in the day did this with Pentium 233MHz CPUs, but it was not guaranteed that your particular CPU would be able to run like this (I guess it depended on how good the particular manufacturing batch was or something).

At this time, Pentium CPUs were significantly better at floating-point operations than AMD K6 CPUs were. I think that integer performance might have been a bit better with later K6-2 and K6-III CPUs, but I might be wrong. AMD K6-2 CPUs and later supported AMD's 3DNow! extensions which gave developers access to a bunch of extra instructions to help boost performance. These strengths of each CPU are interesting to me and so I can definitely see myself using both at different times in this build in the future.

But for the time being, I went with the Pentium 233 MMX.


Anyone building a Socket 7 build today for fun is almost certainly going to pop one (or more) of 3dfx's Voodoo graphics card(s) into their build. If you knew anything about Socket 7 hardware before beginning to read this post, you were probably already expecting to see a 3dfx Voodoo card make an appearance. And so I shall not disappoint you, and uncoincidentally, have two 3dfx Voodoo2 (Diamond, 12MB) cards on hand.

I can remember reading about these cards in issues of PC Gamer in 1998 as a kid back then. I would've loved to have one (or two) of these. They could be joined together in an SLI configuration to enable higher resolution modes to be used and also to help speed up operation. This is the original, bad-ass, SLI graphics card configuration! Sadly, I would never own one of these. The first 3D graphics accelerator we got back then was a Diamond Stealth III S540, 32MB AGP card for our AMD Duron computer a few years later. And we got that card when it was already old news (primarily because it was a budget card already, so I imagine my dad was able to buy it for quite cheap).

Anything 3dfx-branded is expensive nowadays. And I will not even talk about the ridiculous prices that Voodoo5 cards go for. However, I did not let these prices stop me from getting these two cards. I still kind of want to get an original Voodoo card also, but that can wait. Apparently there are some slight differences in compatibility (some older DOS games that supported 3dfx acceleration worked best with a Voodoo1 from what I've read).

I didn't buy these both at once, figuring that I probably didn't care much about the SLI capability. Eventually though, I was curious and grabbed the second one too. And then I installed them both and had the SLI configuration up and running and tried it out ... and ... meh. No real improvement. This makes sense when you think about it though. The Voodoo2 came out in 1998. The Pentium 233 MMX CPU was the last Socket 7 Intel Pentium desktop CPU and released in 1997. By 1998 (when the Voodoo2 was available), Pentium II CPUs were the latest and greatest. So, it makes sense that for some of the more demanding titles for 1998 (e.g. Quake II, or Unreal), that the Voodoo2 cards would be bottle-necked by the slower CPU. The Voodoo2 processor scaling project by Phil, over at Phil's Computer Lab, definitely shows this. You basically need a faster Pentium II to really see the benefits of SLI. All that an SLI configuration really does for a computer with a Pentium 233 MMX CPU, is allow higher resolutions to be used (like 1024x768). Even a Voodoo2 card maxed out with 12MB was only able to run at 800x600 by itself.

As a result of this, I ended up only going with a single Voodoo2 card in my build. This also has the benefit of keeping the temperature down inside the case while the computer is running. An important consideration when using a baby AT case which does not have a rear exhaust fan.

Voodoo2 cards (like the original Voodoo1 cards) only handled 3D graphics though. They used a VGA passthrough cable that you connected to a second graphics card your computer had to handle 2D graphics. You'd then plug your monitor into the second VGA port on the Voodoo card. I had a few options for the secondary graphics card:

Here we have a Matrox Mystique 220, an S3 ViRGE/DX and an ATI Rage 3D. I've not ever used an ATI card from this era, and I've never really read the greatest things about them. They always seem to have some issues with DOS compatibility. S3 cards on the other hand always seem to have excellent DOS compatibility. However, ViRGE cards in particular seemed to be hit or miss with regards to the VGA output quality. I've heard people complain about the colours not looking as "vibrant" as other cards. However, I had used this particular S3 ViRGE card already and had no complaints about its quality. Matrox cards seem to be generally well-regarded as a choice to pair with a Voodoo and I'd never used one of them, so this was the choice I went with in the end just to try something a bit different. It does have a few DOS compatibility problems, but for 98%+ of DOS games it is totally fine. If I end up running into any issues myself, I will just swap in the S3 card.

Let me just take a moment here again to blab about 3dfx Voodoo graphics just a little bit more. Out of the box, they enable this filtering effect when being used to render 3D scenes (now that I think about it, I am unsure if this is a feature unique to games utilizing 3dfx's Glide API or not). There is a way to turn this filter off, but most people back then would've been playing games with it on. It's a fairly unique effect actually and I personally think it looks best on a CRT monitor.

You can see that there's a kind of scanline type of effect throughout the image, but it's easiest to spot in the darker areas of that screenshot. I'm not sure if that scanline effect was actually intended or if it's just a side-effect of the filtering that the card is doing, but it's definitely a unique look. I never saw a 3dfx Voodoo card in action for myself as a kid; I only saw screenshots in magazines and such. But seeing it for myself, and its little quirks (such as this filtering)... it's kind of grown on me actually.


One of the things that fascinates me about 80's and 90's computing is the wide variety of audio hardware. After the early/mid 2000's the majority of people just use whatever audio hardware is built into their motherboard and it usually sounds excellent and you just don't even give it a second thought. I'm by no means an audiophile, but even still, I do enjoy the abundance of choice for sound cards in older computer builds.

I have quite a few choices available (and this actually isn't even all of the ones I have on hand, heh).

  • Sound Blaster AWE64 CT4500
  • Sound Blaster 16 CT1750 (DSP 4.05)
  • Sound Blaster Pro 2 CT2600
  • Philips PCA750AF YMF719E-S
  • Gravis Ultrasound Classic 3.74
  • Gravis Ultrasound CD3 ("clone" card by Synergy)

Because this build will still be used for DOS games, a Gravis Ultrasound of some sort is a must-include for me. Gravis Ultrasound cards are quite expensive today. I already had one for my 486 build (the red one pictured above), and decided I wanted to get a second one for this build. They are both long cards and so it is worth considering which one fits best in which build so I put them both up for consideration here, and the remaining one will go back into my 486. I ended up putting the Gravis Ultrasound CD3 in this build. It's not quite as long as the Classic card and I liked how that fit in this build's case a bit better. I set the jumpers to disable the CD interfaces though. They aren't useful to me since my motherboard has two IDE ports on-board and they'd probably just end up causing hardware conflicts under DOS if left enabled.

That leaves the choice of Sound Blaster-compatible card. Each of these four has its own pros and cons. The AWE64 is not "true Yamaha OPL" but just an integrated emulation/clone. The Sound Blaster 16 sounds a bit "noisy" and has DSP clipping problems with certain titles (mostly I notice this in Wolfenstein 3D). However this card has the right DSP chip version (4.05) to avoid the infamous "hanging note" bug when also using the card for MPU-401 output. And it has a real Yamaha OPL chip. The Sound Blaster Pro 2 is a great sounding card for DOS titles, but maybe not the best choice for Windows games with higher quality audio assets. Also it does not support MPU-401 output via the gameport. The Philips YMF719 card is actually the best choice in my eyes with the fewest compromises. In fact, I cannot think of any compromises it makes when stacked up against these other choices. It has a real Yamaha OPL chip (the YMF719), which has great DOS and Windows support. It supports MPU-401 output via the gameport. And it does not suffer from the "hanging note" bug. Great all around, so I chose it.

For MPU-401 music, I will be using a Roland Sound Canvas SC-88VL that I was already using with my 486.

It is interesting having so much choice for MIDI music in DOS games. Sound Blaster's Yamaha OPL, Gravis Ultrasound, or Roland Sound Canvas. To my ears (and I imagine, most other people's ears too), the Roland Sound Canvas is the clear winner. But the Gravis Ultrasound also has quite a distinctive sound to it that I like. Plus several DOS games (e.g. One Must Fall 2097, Epic Pinball, Jazz Jackrabbit) support the Ultrasound directly, but do not offer MPU-401 support so an Ultrasound card definitely has its own special place for use with DOS. The classic Yamaha OPL MIDI sound also cannot be forgotten (it is what most people will remember the clearest since it was probably all that they had when they were younger), but I will admit it is not my usual choice nowadays.

Gravis did ship Windows drivers for their Ultrasound cards. I did try this out myself not thinking anything of it, and just installed drivers for both the Philips YMF719 and Ultrasound in Windows 95. This proved problematic as the Ultrasound drivers wanted to control the typical Sound Blaster hardware resources (e.g. port 220h, IRQ 5/7, DMA 1) for use with its own Sound Blaster emulation support. I could not find any way to disable this in the driver, so I was forced to uninstall it. Thankfully I am not interested in using the Ultrasound with Windows-specific software so this is no problem to me. I can still play DOS titles from within Windows and use the Ultrasound for sound/music output just fine (if I really wanted to launch DOS games this way that is), as long as the DOS ULTRINIT tool was run during startup.

Final Components List

  • CPU: Intel Pentium 233 MMX at 100MHz FSB with 2.5x multiplier (= 250MHz)
  • Motherboard: Tyan S1590S
  • RAM: 64MB (2x 32MB) PC100 (100MHz)
  • Graphics: Diamond 3DFX Voodoo2 12MB and Matrox Mystique 220 4MB
  • Audio: Philips PCA750AF YMF719E-S and Gravis Ultrasound CD3
  • Network: 3Com EtherLink III 3C509B-TPO
  • Hard Disk: Maxtor 8.4GB (within a ViPower Super Rack mobile rack enclosure)
  • CD-ROM: Matsushita CR-583-BCQ 8x
  • Floppy: Sony MPF920-E 1.44MB
  • Power: Astec 145W AT
  • OS: Windows 95C

The remaining choices are perhaps not quite as interesting and/or they were more "automatic" for me based on what I had or didn't have on-hand.

64MB of RAM was actually quite a lot for 1997/1998. This was a time when most people's computers were probably running at most 32MB, and probably even still a lot of people with 16-24MB. I could go higher as I have quite a lot of compatible RAM modules on hand and this motherboard supports more than 64MB, but I don't really see the point in putting in more.

For network connectivity, I really like using 3Com cards, 3C509 ISA cards specifically. These are very easy to set up in DOS and Windows 95. Under DOS, you just need a copy of the Crynwr packet driver 3C509.COM (which is available from a number of places online). This packet driver won't help you use things like Microsoft DOS networking tools though. Under Windows 95 it's even easier. The drivers for these cards are included out-of-the-box. You just have to make sure that TCP/IP networking support is included during installation (by default it is not). Just a simple checkbox and you're done.

Internet connectivity is not so useful in general on an old computer like this. The modern web is basically unusable on Internet Explorer 3, or whatever other ancient web browser you might remember using. With newer SSL protocols in common use, and the majority of web sites these days being SSL-only, you'll just get generic connectivity errors in your ancient web browser because it doesn't understand the newer SSL versions. Even if it did that wouldn't help you much. Modern web development practices mean heavier single-page-app (SPA) style of development with megabytes of JavaScript generating the page dynamically in many cases. And your run-of-the-mill ancient web browser will not support all these fancy JavaScript things even if it weren't so ridiculously slow at running it. Google searching is perhaps unsurprisingly still doable on Internet Explorer 3.

No, the real practical use for network connectivity on a "retro" computer such as this is for transferring files on a local network. With a 3C509 ISA network card I can, after a fresh install of Windows 95, begin immediately accessing files stored on the Linux HTPC on my local network running a Samba share (well, as long as authentication is not required) by simply navigating to "Network Neighbourhood" where after a short wait, an icon for my server will pop up once Windows 95 detects it. This can be used to copy across drivers and software of course, and so it is quite useful.

An 8.4GB IDE hard disk is perhaps a bit on the large-ish side for 1997/1998. But it is what I had on hand that was still in working order and is closest to what I feel is era-appropriate. I decided to use a mobile rack enclosure for it to keep it out of the way inside the already cramped baby AT case. And if it dies (due to age), it will be easy to swap out and replace with something like a CF card using a CF-to-IDE adapter (or one of my other remaining larger IDE hard drives... yeah I still have a stack of my old drives).

The particular CD-ROM drive I used is otherwise unremarkable, but was chosen because it looks like the same style as the ones I always remember having back when I was younger. And it really didn't feel right putting in something much faster such as a 48x speed drive (even though those would probably be cheaper than this... once you can call something "vintage" suddenly the price rises, no matter what it is ...).

The Astec PSU I used here is the only AT power supply I had that was not already in use. There are some manufacturers still making brand new AT power supplies though. Such as the Athena AP-AT30 300W AT PSU which I actually use in my 486 computer. This Astec PSU is actually "new old stock" which I picked up for cheap locally when I went to go pick up an old Microsoft PS/2 mouse I had bought off eBay. When I arrived, the guy asked if I was interested in anything else and let me take a look at what he had. I had driven to an address he had provided me which was the address of his father's old business that he was cleaning out finally and had all kinds of unsold merchandise. I had originally bought just a single Microsoft PS/2 mouse, but ended up leaving with the Astec PSU, a second mouse, and a couple games, all of this stuff brand new old stock and he only asked me for $20 extra. Not bad I thought, heh. Anyway, the PSU is still in great working order, though I would like to replace the capacitors on it soon just to be safe (they look fine so far though).

One thing that bothers me, in 2019, when reading retro-computing forums where people share writeups and pictures about their builds is how many people doing Socket 7 builds install Windows 98 instead of Windows 95. On a classic Pentium system, Windows 95 performs quite noticeably faster than Windows 98. And again, on a classic Pentium system, you are not likely running software that will only work on Windows 98. I mean, you could be, but I think your typical retro-computing-forum-visitor today in 2019 who is building a retro computer to play games (by far the most likely use), is probably not going to need Windows 98. I don't understand why people intentionally reduce their performance for no other real noticeable gain. Unless you just really prefer Windows 98 over Windows 95 I guess. *shrug*. I suppose this is being true to the era. I can remember people would do this all the time whenever a new version of Windows would come out, and install it on slower hardware even though it sometimes killed performance (relatively speaking of course).

Anyway, Windows 95C is a great choice for this type of build in my opinion. Performance-wise, it just flies. It is very stable. And it includes early USB support, but I don't really care about that as I'm not using USB in this build at all.


If you've only ever built a PC in say, the past 12-15 years... well, this here is a different beast. Around 1998 or so, ATX was starting to become the standard form-factor for cases, motherboards and power supplies and it made building computers easier in a number of ways. As well, at that time, computer cases were slowly starting to be built to be easier to work with.

Before all of that however, you were working with the AT form-factor. AT cases are annoying to work with. Baby AT cases (the smallest AT form-factor) are especially annoying to work with. I don't even really know if the "old-school" charm of it all can really offset this annoyance. These cases are cramped and this is especially annoying as this is also in the era of expansion cards and/or motherboards which have many jumpers or other little wires or other plugs that need to be connected. Often these connectors are not located in convenient places. And you'll probably end up setting it up wrong the first (and second) time and have to adjust it and then you'll really feel the cramped-ness. But wait, there's more! Your hard drive(s), CD-ROM drive, and floppy drive(s) are all going to be connected with IDE cables which are long and flat and take up a bunch of space and you probably have to twist them around because the other end of the cable wants to be plugged in upside down or something like that. So this all makes the inside of your computer even more cramped. Wow.

The cherry on top of it all is that these cases could practically be considered weapons with how many sharp edges there are on the inside. You'll definitely end up with a couple cuts if you're not too careful.

One example of where you'd configure jumpers on these older computers is for the (admittedly, pretty useless) "CPU MHz speed" LED displays that would be located on the front-panel of the case, usually alongside the power and reset buttons, and almost always accompanied by a "turbo" button. These displays are quite silly actually. They do not have any ability to somehow hook into your CPU and read its current speed. All they do is display a fixed number that you "program in" by setting various jumpers on a large array of pins on the back of the little PCB that the LED display is on. Many of these would also have pins for the turbo button on your case (which would usually have two sets of wires hooked up to it, one you'd connect to your motherboard, and the other to the LED display). When hooked up this way, when the turbo button was pressed the LED display would start reading from a different set of pins on the PCB, allowing the number to change to reflect the current "turbo" state.

Shown in the left photo are the backs of two other spare MHz LED display boards I have. The way that I powered on the 3-digit display that I used in this build, in order to test the different pin arrangements as shown in the right photo, was probably not the best idea in hindsight. But I guess it worked out fine.

There were a great many different variations of these displays produced and you can still find scans for the manuals for a bunch of them out there, but I couldn't find one for the particular one I used here. So, I just used a trial-and-error approach, randomly placing pins until I had a good idea about what pins controlled what segments. Not a big deal.

There's a bit of a myth out there that I've read many times now on various forums, etc, that the "turbo" button found on older computers always worked in reverse to what you would think. That is, that pressing the turbo button in would result in your computer running slower, while when the turbo button was not pressed in, your computer would run at its normal speed. Whether this was true or not for your old computer entirely depended on how the turbo button was wired up to your motherboard. The turbo button connector you'd connect to your motherboard was almost always a three-pin connector (certainly all of the ones I've seen have been) while there would only be two pins on your motherboard to connect it to. Which two of the three pins you connected would determine how the button would behave. On my 486 for example, I specifically connected it so that when the turbo button is pressed in, the computer runs at full speed, as that makes logical sense to me.

This Tyan motherboard I am using here has no pins on the motherboard for turbo button functionality, so it is left disconnected. The turbo LED (not the speed display, but just the LED on the front panel that turns on when the button is pressed in) is instead hooked up to the sleep LED pins on the motherboard. Meh.

Anyway, yeah... many little pins. Actually this build wasn't too bad in that regard once the MHz speed LED was taken care of. And because Socket 7 boards usually (always?) included the I/O connectors for your hard disk, floppy drives, serial ports, etc on board, you didn't need to install any multi-I/O cards which usually came complete with a ton of jumpers. On these later Socket 7 motherboards, the only jumpers you probably had to set would be for overclocking purposes.

Some AT power supplies come with a little connector that you can hook up directly to the MHz display LED to power it. The power supply that I am using does not have this (nor did the one I used in my 486 build). So, what I do is buy these molex-to-3pin connectors (shown in the first photo above) that are really intended to be used with case fans. However, this specific cable that I use here needs to be hacked up a bit first in order to use it with a MHz display LED. As-is, this connector will connect to the 12V line from a standard AT/ATX molex power connector which is too much for a MHz display LED (I may have found that out the hard way one time... thankfully it only resulted in a burnt out resistor that was easily replaced... oops!). So, I just cut the plastic a bit so that the connector will fit in reversed, such that the yellow wire on this cable connects with the red 5V line from the power supply's molex power connector. Finally, the smaller 3-pin connector on the other end of this cable is a little bit too big for all of the MHz display LED boards that I have. So, I just shave a bit of the plastic to make the connector shape smaller and then it fits just fine and works perfectly. I'm sure that there might be some perfect cable out there for connecting power up to these display boards, but after hours of searching I could not find anything.

The remaining pins to be connected are the ones on the motherboard for all of the front-panel buttons (such as the reset button), and the LEDs (power, hard-disk activity, sleep). This motherboard is a bit weird with the pins it provides for this. It appears to simply be missing ground pins for many connectors. For example, there are voltage pins for the power and sleep LEDs, but no ground pin for either of them (as you would expect to find right beside each other when connecting up a 2-pin cable). I can only assume that some cases maybe had different types of connectors for this that this motherboard was intended for? No idea honestly. So I improvised here by buying some cables to borrow the other unused ground pins from elsewhere (there were two others that I was not using for anything else). It just added to the mess of wires inside, but I guess it works.

You might recall that when discussing what motherboard I was going to be using near the beginning of this post, I mentioned how I figured that the AGP slot on this motherboard was not going to be useful to me due to my use of a baby AT case. This photo shows exactly why. You cannot even see the AGP slot because those cables are directly in the way. I'd need to do some really serious cable management here to clear that space to be able to fit an AGP card that did not have these cables lying flat all over the top of it.

That being said, I suppose that I am mostly happy with this cable management. The number of cables is reduced somewhat because I decided to use a master/slave arrangement on just the primary IDE interface, putting both the hard disk and CD-ROM drive on the same IDE channel. Yes, this does impact performance in cases where both the hard disk and CD-ROM are being utilized at the same time, but I'm not really that concerned. That kind of thing doesn't happen all that often. And having one less IDE cable in this small case really made a big difference. The one thing I'd still like to change here is to get a shorter floppy cable. There was too much slack left over on it once I had it all connected and I just had to shove it off into the hard drive bay (which thankfully was empty due to my use of a mobile rack enclosure to put the hard disk in).


That's pretty much it. I am quite happy with this build. I think most people would definitely want to slap an AMD K6-2 or K6-III CPU in there, which as I mentioned above, I will likely do at some point down the road (probably will bounce back and forth between them over time). But even configured as-is with the Pentium 233 MMX, it runs nicely in both DOS and Windows for "general use" and it plays DOS and Windows titles from the early/mid-90's up until 1998 or so and then things start getting a little bit slow. For older speed-sensitive games, disabling CPU caches does help to slow the system down (even if it is annoying to have to do so). For the more high-end games released in 1998 or so, I've also been mostly pleasantly surprised. I did play through the entirety of Unreal on this shortly after building it almost two years ago now, and I had a great time. I found that setting it to a 512x384 screen resolution was a nice balance between framerate and detail, probably owing a fair bit to the 100MHz FSB that I configured this all with. Since I'm using a CRT monitor, I can get away with using these "odd-ball" screen resolutions and not have to worry about how it will look scaled up.

Rebuilding ACK-3D for DOS

code dos

In my last post, I talked about my recent experience in revisiting the book "Amazing 3-D Games Adventure Set" by Lary Myers (published in 1995) for the first time in 20 years. I was never able to build the code back when I first got the book as a kid and it always bothered me. But it didn't stop me from playing with the map editor a bunch and dreaming of what could have been.

When I revisited the code included on the book's CD recently, I had a bunch of trouble building it even though I finally, after all this time, had the required compilers!

As I talked about in my previous post, there were three versions of ACK-3D included on the book's CD. I naturally started with the DOS-based one that was intended to be built with Watcom C++. That one resulted in executables that would crash. However, the pre-built executable included on the book's CD did not exhibit the same crashing behaviour, leading me to believe that the pre-built executables were built with a different version of the code than what was included on the CD. At any rate, that belief would not help me at all, so no sense in dwelling on it. Fortunately, the Windows version of the code and example projects were all buildable with relative ease and did not crash in random ways. So at the very least, I could start with that working code base. As a bonus, according to the book, the Windows version was the latest version of the ACK-3D code anyway (well... the "latest" at the time the book was published, in 1995).

In my last post I decided to try diving into the DOS code that did crash to see if I could fix it. After I wrote that post, I continued playing with that code a bit more but I was not able to track down the apparent memory/linked-list corruption bug that I had found. Eventually, I figured it was probably not worth the effort and that I should instead focus my efforts on porting the Windows version of the code back to DOS. Additionally, comparing the code listings printed in the book with the code included on the CD, it is quite clear that all of the chapters that dissect the ACK-3D code in detail are all doing so with the Windows version.

I'm pleased to say that I have now finished the porting and cleanup/fixing work and am pretty happy with the result, which I have published to a repository on GitHub.

Here's a summary of the changes from what was included on the book's CD:

  • Removal of Windows compatibility. I am not interested in it (currently), and anyone who is can just use the code that was on the book's CD. Adding it back so you could compile a Windows version with Watcom C++ should be fairly simple though.
  • Replacing the build scripts for Lary's custom MK.EXE tool with Watcom makefiles.
  • Copying keyboard and timer interrupt handling from the FDEMO/MALL demo project sources to ACKLIB itself, replacing the existing interrupt handlers. The FDEMO and MALL projects were originally built with Watcom and their interrupt handlers were built for DOS/4GW compatibility, versus what was leftover in the Windows sources for ACKLIB which was intended for the Borland PowerPack for DOS.
  • Re-adding GIF bitmap support. This was talked about in the book and was removed because of GIF patent issues in the '90s. That patent has since expired and fortunately Lary accidentally included a full implementation of ACKGIF.C on the book's CD anyway! Oops. So I just added it back in.
  • Removal of some unneeded source files (e.g. KIT.H), and other minor code cleanups of that sort.
  • Fix MOD player issues in the FDEMO and MALL projects. Background music never worked, even in the pre-built FDEMO and MALL executables included on the book's CD. But it turned out to be fairly simple to fix.
  • Sorted out the mess that was the DOS map editor source files and build scripts. There were actually two versions of the DOS map editor included on the book's CD and the source files for both were jumbled together in the same directory. Ugh.
  • Fixed assumptions about paths to files being loaded by the DOS map editor and BPIC.EXE tool for building ACK-3D resource files. These tools both always assumed everything was in the current working directory, which usually meant you needed to copy the tools around to each project's asset directory (that is also how the files on the book's CD were laid out). Quite annoying. Now you can simply place MAPEDIT.EXE and BPIC.EXE somewhere on your PATH and be done with it.
  • Probably other things that I've now forgotten.

Comparing my re-built version of both the FDEMO and MALL projects with the pre-built executables included on the book's CD, it does feel like the performance was faster with whatever version of the code Lary had used to build them with. It's not a super massive difference, but it is noticeable to me. I have not yet dug into this however. Doing a simple diff between both versions of ACKLIB shows there are a bunch of changes so it will take some time to compare.

One thing I thought would be a fun mini-project for myself to add to this was to build an example project from scratch. Chapters 11 and 14 of the book talk about what is needed to actually write an application from scratch using the ACK-3D engine. Chapter 11 is mostly just about the initialization steps, while chapter 14 is a full-blown "how to build a Windows ACK-3D application" tutorial. The result is nothing special, but it is quite useful. However, it is Windows-specific. I wanted something like this project but for DOS. And actually, it was quite easy to get up and running.

(I drew the blue frame in Deluxe Paint II, excuse my rather simplistic "art", heh).

This project just re-uses assets from the FDEMO, MALL and Station Escape demo applications included on the book's CD. It was kind of fun revisiting the map editor after all this time. Less fun was figuring out how to set up all the configuration files necessary to first, get everything loaded into the map editor, and then to build an ACK-3D resource file (the pics.dtf files that all the demo projects use). It's not the worst thing in the world, and thankfully there is literally a book written on the subject to assist me with it, but it definitely could have been simplified I think. Actually part of that simplification process was the Windows version of the map editor which could read ACK-3D resource files directly. The DOS version of the editor is not capable of this.

Most of the existing bugs in ACK-3D remain. Things like the "push-wall" secret doors (à la Wolfenstein 3D) not rendering correctly in many cases. It looks kind of funky sometimes actually:

The secret doors otherwise function perfectly. It just appears to be solely a rendering glitch. I suspect a problem with the ray casting distance calculation but my initial attempts at fixing it proved unsuccessful. One more reason to re-read the book, as I'm sure with better understanding of the math and the engine's internals it would be simple to fix.

Otherwise, everything is now working great! I'm quite happy with the result (heh, it only took me 20 years...). That all being said, I'm not sure that I would actually want to base any project off this code as-is. As I've spent time dissecting this code I've become somewhat turned-off by it, heh. Tons of global variables, large functions that do a ton of things, messy formatting, etc, etc. It is largely a product of the time it was written honestly, so I cannot fault Lary alone for this. The industry has grown quite a bit in the last 25+ years.

That all being said, now that I can successfully build the code, I would now like to re-read this book from start to finish and use it to write some type of similar engine from scratch using the techniques Lary discusses. I'm not sure when I'll get to this though, as I've been fiddling with Turbo Pascal related things a lot recently (upcoming post on that hopefully sometime soon, maybe).

Amazing 3-D Games Adventure Set

code dos win9x

I wanted to write a bit about a book I've had sitting either on my bookshelf (or, at times, packed away in a box) for the past 22 years.

"Amazing 3-D Games Adventure Set" by Lary L. Myers, published in 1995.

To summarize the contents of the book briefly, it basically walks through a game engine that the author had written called "ACK-3D" (short for "Animation Construction Kit 3D") for building Wolfenstein 3D-like games. Actually, I'd guess it would be more accurate to say it's for building Blake Stone 3D-like games, since his engine also includes features like textured floors and ceilings, and light shading effects, and so on (unlike Wolfenstein 3D). But still at its core, it's the same 2D grid, raycasting approach used by Wolfenstein 3D. The author takes you through the internals of his game engine, explaining the math and how it works along the way. Honestly, having re-read chunks of it recently for the first time in many years, he does a pretty good job of explaining things overall and his writing style is good at not putting you to sleep (unlike many math books I find). It's not anything along the lines of a super in-depth math textbook at any rate, as the math required for a raycaster like this is not that complicated at the end of the day. At the time that this book was published, this style of game engine was already "old news" since Doom had been out for over a year by then.

My grandmother bought this book for me while I was in a bookstore with her sometime in 1997 (possibly 1998?). I noticed it while looking at the computer books section and was totally blown away by the idea of being able to create 3D games. I was 13 or 14 at the time and had only been programming for two years or so (almost entirely QBasic, tiny bit of C by that point). Unbeknownst to me at the time, the level of complexity that the book was going to bring to the table was still a little bit beyond my grasp. But all the pretty screenshots on the back and in the colour gallery pages at the front of the book had me hooked.

Sure enough, after getting home with it and proceeding to read through it, I struggled to understand a lot of the math (math was never my strongest subject and trigonometry was not something that was taught until one or two years later in school where I was). Perhaps most important was that I could not follow along with the code being discussed because I did not own any of the C compilers that were required to build the code included on the CD the book came with. At this point in time, I'd taught myself a little bit of C with the book "C DiskTutor" my dad had bought for me. This book included a stripped down version of Watcom C 8.5, which unfortunately was not suitable for this game engine code.

The demos on the included CD were fun to play with though, and it included a map editor as well with a fairly large library of game artwork to use in your levels.

Over the years I occasionally would pick up this book and flip through it, always feeling disappointed that I couldn't do anything with the code (even as my math knowledge improved and more of the book's explanations started making sense to me).

More recently, since I've fallen into this admittedly somewhat bizarre path of collecting obsolete software development tools, I realized when I picked this book out of a box in my closet that "hey, I could probably do something with this code now after all these years!"

I no longer have the original CD that my copy of the book came with (guessing it was an accidental victim of one of my previous "throw out all the junk" sessions which inevitably end up with my realizing I should not have thrown something out) but thankfully it is available still via Google search.

Anyway the author of the book, Lary, included multiple different versions of his ACK-3D game engine on the CD.

  • DOS version built with Borland C++ 4.0 (compatible with 4.5), found on the CD under /ACK/DOS/BORLAND.
  • DOS version built with Watcom C/C++ 9.5, found on the CD under /ACK/DOS/WATCOM. Additionally, the "FDEMO" and "MALL" demo projects under /ACK/DOS/FDEMO and /ACK/DOS/MALL both have source code intended to be built with Watcom C/C++ under those respective directories.
  • Windows version built with Borland C++ 4.0 (compatible with 4.5), found on the CD under /ACK/WIN. All the Windows projects/code under that directory are for Borland compilers.

As a teenage-hobbyist self-taught programmer in the 90's, you were probably most likely to be using Borland Turbo C++ as your C compiler, and not Borland C++ (sans "Turbo"). This is an important distinction as you would not be able to build the Borland DOS code with Borland Turbo C++. Why? Because Lary (quite understandably) utilized the Borland PowerPack for DOS addon which allowed you to write 32-bit DOS code that ran under Borland's DPMI extender. From what I can tell, this PowerPack addon was not included in Borland C++ (certainly it is nowhere to be found on my Borland C++ 4.02 CD) and instead had to be bought separately. It was also probably not likely that you were using Watcom C/C++ as a hobbyist programmer since it was more expensive (at least before it became Open Watcom around 2000/2001) and was not nearly as user-friendly as the Borland tools were.

So since I like Watcom C/C++ myself, I naturally decided to start with that version of the ACK-3D code. The included "FDEMO" and "MALL" executables worked perfectly fine and clearly were built with Watcom since they displayed the familiar DOS/4GW banner on startup.

For reasons unknown to me, Lary decided to use his own MK.EXE tool (included on the book's CD) instead of normal Makefiles using WMAKE or just MAKE. It's not the end of the world though, as his custom .MAK format is still perfectly readable (if a bit verbose). I was able to build the source code myself with Watcom C/C++ 10.0 completely without a problem, but running the built executable was another story.

Running it through a debugger indicated it was crashing in one of the assembly drawing routines. Great, yet another author that was probably incredibly rushed in the lead up to his book being published and made some mistake along the way in preparing the files for the CD, probably copying the wrong version of the source code over or something. *sigh*

Lary doesn't specifically mention what version of Watcom C/C++ he used anywhere in the book (many parts of the book make it feel like he was primarily developing with Borland and the Watcom version was only a side-venture), but included on the CD in a couple places are compiler .MAP files that included a Watcom version banner indicating he was using version 9.5. So I decided to search around the internet for a copy of that specific version and to try building the code with that. Alas, no such luck, my own executables built with the same version of the compiler Lary used fail in the same way.

Since, as I mentioned above, the book left me with a strong impression that Lary was using Borland C++ primarily in developing ACK-3D, I figured maybe I should give that version of the code a try. I installed Borland C++ 4.02 and the PowerPack for DOS on a Windows 95 Pentium MMX machine I built a while back (probably will write a post about this machine later, maybe) and the ACKLIB and FDEMO projects both compiled. Well, not so fast actually. I had to first fix up some hardcoded paths that were set in the project files (no big deal), but more importantly, I also had to fix some bizarre issue with TASM which seemingly could not be invoked successfully by the Borland C++ IDE. Specifically, this seems to be an issue with TASM32.EXE, as after some experimentation, I found that adding an alternate .asm to .obj "translator" (Borland's terminology for it) using TASMX.EXE worked fine. Trying to manually run TASM32.EXE from the command line seemed to highlight the source of the problem ... it seems that under Windows 95, TASM32.EXE 4.0 (which I am using) is unable to read any files (always fails with an error that it is "unable to locate [file]"). Oh well, at least I have a workaround. The only difference between the two TASM executables is that one can include 32-bit debugging information and the other 16-bit debugging information (at least according to my TASM manual anyway), so it should not matter to me for now.

With build issues out of the way, I was able to successfully build an executable for the FDEMO project. But it failed to run under Windows 95 with an error message "This DPMI32 module cannot be run on Win32." Ok, well, Borland C++ 4.0 (and the PowerPack) was released before Windows 95, so I guess it's perhaps understandable that its DPMI extender might not be totally compatible with it. I rebooted into MS-DOS mode and tried to run again and while it worked, it didn't look exactly right.

As a bonus, it crashes when you use the FDEMO project's keys to toggle ceiling/floor drawing on/off. I didn't experiment with this version of the code much since I knew I was unlikely to pursue using the Borland DOS code if I wanted to build anything off the ACK-3D engine. Perhaps though, there is a simple solution to the problems I saw.

Needless to say, so far this was all not really that encouraging. I'm kind of glad that way back when as a teenager I did not bug my parents to buy me specific versions of compilers so I could build the code for this book... I surely would've been quite disappointed!

The only thing left to try was the Windows version of the code. Back in 1995 this probably would've been pretty exciting as that was before DirectX was a thing (DirectX v1.0 would not have been released until later that year) and writing fast graphics code for Windows was uncommon at best. I imagine most people picking up this book in 1995 would've dug into the Windows code first just out of pure curiosity if nothing else. If you were one of the few game developers who really wanted to write fast graphics code for Windows instead of just writing for DOS as pretty much everyone else was at that time, you would've had pretty much no choice but to turn to WinG, which was a precursor to DirectX. WinG basically gave you fast pixel-level access to Windows DIBs which allowed you to do all the same low-level pixel stuff you would've done under DOS. Lary's Windows version of ACK-3D used WinG.

Once again I fired up Borland C++ 4.0 and this time built the Windows ACKLIB and WIN_EXAM projects (the latter being more-or-less the equivalent of the DOS FDEMO project). Once some path errors were fixed and the same weird TASM building issues under Borland C++ were dealt with, it actually worked!

No crashes that I could find. I like to imagine that my younger teenage self would be so excited at this moment. It is such a shame to me that this is the only version of the ACK-3D code that works properly out of the box from the book's included CD, because in 2019 I have little-to-no interest in building Windows 3.1/95 code... at least right now anyway.

Anyway, the ACK_EDIT project which is the Windows version of Lary's map editor also builds and runs successfully. Also the example game that Lary frequently talks about in his book, "Station Escape" builds and runs flawlessly too.

Now... yes, I am greatly disappointed in the apparent lack of quality control / verification / testing that went into the source code that was included on this book's CD. It seems pretty clear that the perfectly working DOS demo project's pre-built executables were built with some different version of the DOS source code than what was included on the CD. That would seem to be the only explanation. I've spent a lot of time tweaking build settings and whatnot and cannot get my build artifacts to match the ones Lary included on the CD (he included .OBJ files as well as .EXE). They're always off by several KB or more. I thought at first that it was the difference of debug information being included, but that turns out not to be the case. Something else is up and again, my suspicion is that, like many other book authors, Lary was probably rushed during the final preparation of the CD files and perhaps mistakenly copied the wrong files. If you imagine back to the 90's, source code version control tools were not commonplace as they are today. And the source control tools that did exist at that time were usually not very good. Typically for version control you'd just copy files manually (either individually or entire directories) and end up with things like "GAME", "GAME1", "GAMEBAK", "GAMEBAK2", etc. If this was also the kind of thing that Lary was doing, then I could easily imagine the wrong version of the DOS source code being included accidentally. However, that is all just a wild guess... I really have no idea after all!

Since in 2019, I am primarily interested in building for DOS, I still wanted to try to get the DOS version of the ACK-3D code running reliably. So I decided to dig into the Watcom version in a bit more detail.

I've mentioned this before in a past post, but the Watcom C++ compiler, WPP386, catches many more warnings and errors than WCC386 does. I'm not a C/C++ standards expert so maybe there is a logical reason for this, but at any rate, I've often found it useful to temporarily compile a project with WPP386 just as kind of an extra "code linter" even if my project is 100% plain C. Doing this with the ACK-3D sources sure picked up a bunch of stuff, including pointer casting mistakes. Unfortunately fixing all of these did not resolve the crashing. That sure would've been nice and easy though, heh.
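One well-known difference that may account for part of this: C++ has stricter conversion rules than C. For example, C lets a void pointer convert to any other object pointer type implicitly, while C++ requires a cast. The sketch below is only an illustration of that rule. The function names are made up, and I'm not claiming this is the specific kind of warning WPP386 raised on the ACK-3D sources.

```c
#include <stdlib.h>

/* Accepted by a C compiler: void * converts to unsigned char * implicitly.
   A C++ compiler rejects this initialisation unless a cast is added. */
unsigned char *alloc_c_only(size_t n)
{
    unsigned char *p = malloc(n);
    return p;
}

/* Accepted by both C and C++ compilers thanks to the explicit cast. */
unsigned char *alloc_c_and_cpp(size_t n)
{
    return (unsigned char *)malloc(n);
}
```

Both functions behave identically when built as C. The only difference is whether a C++ compiler will accept the source.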

Stepping through the code with a debugger, the first set of problems I encountered was in the AckDrawPage function found in ACKRTN.ASM. I actually happened upon this first before I decided to step through all the code bit by bit, as I was fiddling around with viewport size settings for some reason and noticed that if I set the viewport size to be the maximum screen size,

ae->WinStartX = 0;
ae->WinEndX = 319;
ae->WinStartY = 0;
ae->WinEndY = 199;

then crashes happened far less frequently and never upon immediately running the program. That led me to the aforementioned AckDrawPage function.

This function does a check to determine if it's rendering to a smaller (not maximum screen-sized) viewport, and if so, it runs a separate render loop.

    mov	    eax,[_gWinStartOffset]
    add	    edi,eax
    add	    esi,eax
    movzx   eax,[_gWinStartX]
    add	    edi,eax
    add	    esi,eax
    mov	    dx,[_gWinHeight]   ; <---- PROBLEM #1
    inc	    dx
    movzx   ebx,[_gWinWidth]
    mov	    ebp,320            ; <---- PROBLEM #2
    sub	    ebp,ebx

dp010:
    mov	    ecx,ebx
    shr	    ecx,1
    rep	    movsw
    rcl	    ecx,1
    rep	    movsb
    add	    edi,ebp
    add	    esi,ebp
    dec	    dx
    jnz	    dp010

    pop	    edx
    pop	    ecx
    pop	    ebx
    pop	    edi
    pop	    esi
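For anyone who doesn't read x86 assembly, the listing above amounts to roughly the following C. This is only a sketch of what the loop appears to do, not code from ACK-3D: the function name, its parameters and the SCREEN_WIDTH constant are hypothetical, and it assumes a linear 8-bit buffer that is 320 bytes wide.

```c
#include <stddef.h>
#include <string.h>

#define SCREEN_WIDTH 320   /* bytes per row in a 320x200 8-bit buffer (assumed) */

/* Copies a rectangular viewport from src to dst one row at a time.
   start_offset is the byte offset of the first viewport row, start_x the
   left edge, width the bytes copied per row, rows the number of rows. */
void copy_viewport(unsigned char *dst, const unsigned char *src,
                   size_t start_offset, size_t start_x,
                   size_t width, size_t rows)
{
    size_t row;

    dst += start_offset + start_x;
    src += start_offset + start_x;

    for (row = 0; row < rows; row++) {
        memcpy(dst, src, width);   /* the rep movsw + rep movsb pair */
        dst += SCREEN_WIDTH;       /* width copied plus (320 - width) skipped */
        src += SCREEN_WIDTH;
    }
}
```

The assembly keeps the "320 - width" skip value in EBP and the row counter in DX, which is where the two problems discussed next come from.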

The first problem would only occur at random and would depend on what was in the EDX register in prior code. But what I would see is that sometimes it would have some large 32-bit value in it (some value larger than a 16-bit number) and so the dp010 loop, which uses DX as a loop counter, could end up looping a really large number of times and end up trashing a bunch of memory in the process. Switching the mov to a movzx so the 16-bit _gWinHeight variable overwrites the entire EDX register value solves that nicely. I suspect this might have been a missed bug that was a holdover from ACK-3D's 16-bit roots.
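The difference between the two instructions can be modelled in a few lines of C. This is just an illustration of the register semantics with made-up function names. It is not ACK-3D code.

```c
#include <stdint.h>

/* Models "mov dx, value": only the low 16 bits of the 32-bit register
   change, so whatever was in the upper 16 bits stays there. */
uint32_t write_low16(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Models "movzx edx, value": the 16-bit value is zero-extended, so the
   whole 32-bit register is overwritten. */
uint32_t write_zero_extended(uint16_t value)
{
    return (uint32_t)value;
}
```

With stale data such as 0x12340000 in the register and a height of 200, the first function returns 0x123400C8 while the second returns plain 200. Any code that later reads the full 32-bit register sees very different values in the two cases.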

The second problem was much more serious and always occurred for me. Using EBP as a general purpose register was a common tactic when you were trying to write a tight loop and ran out of registers to use. It's tricky though as it usually means you cannot reference variables on the stack or other locals in your assembly code (your assembler usually replaces those with references to memory using EBP) and also you need to be careful to restore its original value when you are done. In this case, the latter was not being done which always caused a crash when this function returned. Oops. I also see that this bug is present in the Windows version of the ACK-3D code, but all the Windows projects worked fine for me because they all use a maximum-sized viewport (at a guess anyway... I've not returned to look at the Windows code since discovering this bug).

Fixing both of these problems solved the common crash cases. However, it still would occasionally crash at random. Argh!

Turning back to the debugger, the remaining crash always occurs (when it occurs, seemingly at random) while the 2D bitmap object rendering code is walking through the engine's linked-list of Slices in FindObject within ACKVIEW.C.

sa = &Slice[Column];

if (sa->Active) {
    while (sa != NULL) {
        if (j <= sa->Distance) {
            sa2 = sa;
            while (sa2->Next != NULL)
                sa2 = sa2->Next;            // CRASHES HERE

            saNext = sa2->Prev;
            while (sa2 != sa) {
                memmove(sa2, saNext, sizeof(SLICE)-9);
                sa2->Active = saNext->Active;
                sa2 = sa2->Prev;
                saNext = saNext->Prev;
            }

            sa->Distance = distance;
            sa->bNumber  = ObjNum;
            sa->bColumn  = BmpColumn;
            sa->bMap     = omaps;
            sa->Active   = 1;
            sa->Type     = ST_OBJECT;
            sa->Fnc      = WallMaskRtn;
            break;
        }

        if (!sa->Active)
            break;

        sa = sa->Next;
    }
} else {
    if (j <= sa->Distance) {
        sa->Active = 1;
        saNext = sa->Next;
        memmove(saNext, sa, sizeof(SLICE)-9);
        sa->Distance = distance;
        sa->bColumn  = BmpColumn;
        sa->bNumber  = ObjNum;
        sa->bMap     = omaps;
        sa->Type     = ST_OBJECT;
        sa->Fnc      = WallMaskRtn;
        saNext->Active = 0;
    }
}

The first thing here that caught my eye was the memmove calls with the curious size value being used. Looking at the SLICE definition answered that question and one more that I had.

typedef struct _slicer {
    UCHAR     **bMap;
    UCHAR     *mPtr;
    short     bNumber;
    unsigned  short bColumn;
    short     Distance;
    short     mPos;
    unsigned  char Type;
    void      (*Fnc)(void);
    unsigned  char Active;        /* Keep these last 3 fields in this order */
    struct    _slicer *Prev;      /* a memmove is done on sizeof(SLICE)-9 */
    struct    _slicer *Next;      /* to move everything above these fields. */
} SLICE;

I had noticed before that the existing build files for all of the ACK-3D projects (DOS and Windows) specified byte-alignment for structures and I was curious about that, figuring it was just an oversight, but I always left it alone just in case. And this code here explains at least one reason why it's like that. The sizeof(SLICE)-9 wouldn't work otherwise.
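To make the "-9" concrete, here is a small sketch that measures the tail of the structure with and without byte-alignment. It is not the real SLICE: the pointer fields are modelled as uint32_t so that the sizes match a 32-bit DOS build on any host, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Field list mirroring SLICE. Pointers are modelled as uint32_t so the
   sizes match a 32-bit build (an assumption for illustration). */
#define SLICE_FIELDS      \
    uint32_t bMap;        \
    uint32_t mPtr;        \
    int16_t  bNumber;     \
    uint16_t bColumn;     \
    int16_t  Distance;    \
    int16_t  mPos;        \
    uint8_t  Type;        \
    uint32_t Fnc;         \
    uint8_t  Active;      \
    uint32_t Prev;        \
    uint32_t Next;

#pragma pack(push, 1)
typedef struct { SLICE_FIELDS } PackedSlice;   /* byte-aligned, like the ACK-3D builds */
#pragma pack(pop)

typedef struct { SLICE_FIELDS } PaddedSlice;   /* compiler's default alignment */

/* Number of bytes from the Active field to the end of the struct. */
size_t packed_tail(void) { return sizeof(PackedSlice) - offsetof(PackedSlice, Active); }
size_t padded_tail(void) { return sizeof(PaddedSlice) - offsetof(PaddedSlice, Active); }
```

With byte-alignment the tail is exactly 9 bytes (1 for Active plus 4 for each pointer). With default alignment a typical compiler inserts padding after Active, so the tail grows and sizeof(SLICE)-9 no longer stops exactly at Active. Writing offsetof(SLICE, Active) would express the same intent without depending on the packing setting.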

Unfortunately this is where I am currently stuck! This particular crash happens seemingly at random (I'm sure there is a pattern to it, but I cannot tell what it is yet). The links in the global Slice linked-list appear to be getting broken somewhere along the way. Other than the initial memory allocation, this loop here seems to be the only place that directly manipulates the linked-list via the Next and Prev pointers, so unless I've missed some other place (possibly in the assembly code somewhere if anywhere), then it would seem to be likely that some accidental pointer misuse or something is trashing part of the linked-list's memory occasionally. I ensured that all memory allocations that the engine does at startup get zeroed out and in this case the Next pointer it crashes on is definitely not NULL.
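One generic way to narrow down this kind of corruption is to call a consistency check on the list at various points and see where it first fails. The sketch below is a hypothetical helper, not something from ACK-3D. It uses a stripped-down node type with only the two link fields, and it can only detect links that disagree with each other or a chain that is longer than expected. A pointer into unmapped memory would still crash it.

```c
#include <stddef.h>

/* Minimal stand-in for a doubly linked node (only the link fields). */
typedef struct Node {
    struct Node *Prev;
    struct Node *Next;
} Node;

/* Returns the number of nodes reachable from head if every forward link
   is mirrored by the matching back link, or -1 if a Prev/Next pair
   disagrees or more than max_nodes nodes are visited. */
int check_list(const Node *head, int max_nodes)
{
    const Node *n = head;
    int count = 0;

    while (n != NULL) {
        if (++count > max_nodes)
            return -1;      /* chain too long: probably a cycle */
        if (n->Next != NULL && n->Next->Prev != n)
            return -1;      /* forward link not mirrored by back link */
        n = n->Next;
    }
    return count;
}
```

Calling something like this before and after each place that touches the list would at least show which step leaves it in a bad state.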

I intend on continuing to try getting this to work ... I feel like I owe it to my younger self, heh. My last resort is to try back-porting the Windows code to DOS since that version of the ACK-3D engine is newer (based on the timestamps of the files on the CD).

UPDATE: I wrote a follow-up post here.

A Late Postmortem of "Monster Defense"

code rambling

A few months ago, I uploaded the full source code and assets I used in an unfinished project I worked on back in 2011/2012 to GitHub. However, as is typical of me lately, I forgot to take the opportunity to write about it here. This is going to be a very informal postmortem that is probably better described as me writing random things in an unorganized manner about an old project of mine.

"Monster Defense" was to be an arena shoot-em-up / bullet-hell type of arcade game where you are supposed to survive an onslaught of monsters that come at you in waves. Monsters you kill can drop power-ups and other random power-ups will spawn periodically throughout the level as you go. There were two different planned modes of play, "timed" and "endless" which are probably fairly self-explanatory.

The artwork is all placeholder. Because I opted to use Quake 2 MD2 models for reasons discussed later in this post, I found that using the wealth of character models from Egoboo (whose assets are freely available under a GPL3 license) was helpful in getting reasonable placeholder artwork going. The problem is I of course never got "real" artwork in the end to replace the placeholders. D'oh.

Not exactly a project worthy of any awards in unique design, heh. Good thing I don't fancy myself a game designer.

Anyway, I suppose I might as well start with a little background on how this project came to be.

Background and Timeline

For most of my side-projects, especially game projects, I find I tend to have a very strong Not Invented Here syndrome approach where I rather enjoy building up the groundwork myself rather than using some preexisting framework or whatnot that may be available. For game projects, this means that you will probably never find me using something like Unity or Unreal Engine or something like that. Lower-level libraries like SDL are a different story, especially if you want to target multiple platforms. In fact, I think something like SDL is really a sweet spot for people like me in that it provides just enough of an abstraction that I can easily target multiple platforms but build up my own engine from scratch if I so choose. In recent years, I've found myself becoming more accepting of using something for game projects that provides a little bit more abstraction, such as libGDX, while still not going to all-out Unity-levels of abstraction.

But back in 2011, I had been working on my own C/C++ framework for my game projects and it was just based on SDL for desktop PC builds. It was coming along well, and in early 2012 I actually left my job to start working on the project that would become Monster Defense full-time. I had grandiose ideas that I was going to somehow make it as an indie game developer. Heh. Well, that was part of the reason I left my job. The other big part was that I was getting a little bit fed up and noticed that I was really just coasting along at work so I guess this was my own way of shaking things up for myself. I think I knew in the back of my head at the time that there was absolutely no chance I was going to make it as any kind of remotely successful indie game developer. I have barely any artistic talent (at least, not enough to be able to draw my own complete set of game art assets) and as you can tell from the fairly uninspired "design" of Monster Defense described previously, I'm not exactly a game design genius.

Actually, I should point out that I was originally going to work on a completely different game. I even mocked up a promo image for it.

I don't now remember why I switched the project to Monster Defense instead. I think the reason might have been that I thought it would be an easier project to finish or something like that. Of course that wasn't true. Both project ideas were quite involved. But hindsight is (usually) 20/20...

I remember that within two or three months of quitting my job, I had fully accepted the fact that Monster Defense was going to just be a portfolio piece I would use to get a job in the games industry and that there was no way it was going to become a fully polished and published game that would make me money. Didn't really take long for reality to fully set in, right?

I continued working on it and around late fall or so of 2012, I think I had come to realize that I actually didn't want to work in the games industry. Instead, I was fine with game development just being a hobby. This was actually the second time in my life I had come to that same conclusion. The first was in 2005 after I had finished a two year college diploma and was about to commit to moving out west to attend the game development course at the Art Institute of Vancouver (apparently this has all changed significantly in the past decade or so, I don't think it exists in Vancouver anymore?). I re-thought the decision and ended up getting a job as a "normal" software developer instead.

Anyway, during summer and fall of 2012 I had started to notice that I was becoming less enthusiastic about working on Monster Defense as time went on. A big part of it was the realization that I was not ever going to be able to release it as a finished product using placeholder art assets as I was currently using. And I was still unsure exactly what was the best way to solve this problem. You could say that, obviously, recruiting an artist was the solution here. I was not so enthusiastic about that idea because the last time I worked on a game project in a team with an actual artist, the experience left a really bad taste in my mouth.

Around 2006/2007 or so, I was working on a Zombie survival / puzzle game for Nintendo DS (using homebrew development tools, like devkitARM) in a team of friends, one of whom was an artist and two others who also had some varying artistic talents as well. That project ended abruptly with our artist seemingly just disappearing (I actually never met him in person, but he was a friend of a friend and lived in Vancouver). In the end no artwork had actually been produced. At least, none that I ever saw. In fact, the only thing that had ever been produced during this failed project was a fairly basic 3D engine for the Nintendo DS (coded from the ground-up by yours truly) and a very, very overly ambitious game design document that detailed a plan we were honestly never going to come close to completing. The overly ambitious design is a different problem altogether and I think is a very common problem, but the artist randomly disappearing was hugely problematic and ruined whatever trust I may have had in a team composed mostly of friends being reliable. Actually, now that I think about it, this was the second time that I've had an artist disappear on me (the first was for a QBasic game project around 2001/2002 or so). Maybe the problem is me? Heh. *shrugs*

Very early test build of the Nintendo DS project. Bits of this code ended up being used in Monster Defense too!

But regardless, even if I had wanted to try finding an artist again to see how it went, I did not have much money left to pay for one and I was skeptical of being able to find one who would volunteer to help me out on the promise of splitting future profits once the game was ready to be sold. Especially since I no longer really had that end goal in mind anyway. I didn't want to be dishonest, recruiting someone on that premise when I no longer believed it would happen at all.

As a result the project basically ended right there during fall 2012 and in early 2013 I found a new full-time job. I continued working on Monster Defense in bits here and there during early 2013 but more just as "something to do."

The Project Itself and How I Built It

As mentioned earlier, Monster Defense was built from the ground up using SDL for desktop PC builds. I developed it using C/C++, in a "C with classes" coding style, targeting OpenGL ES 2.0 (or the equivalent desktop OpenGL feature-set). I built my own framework and game engine for this project. I also was targeting mobile devices through the use of the now defunct Marmalade SDK (formerly Airplay SDK).

Before SDL 2.0 really stabilized and ironed out their mobile device support, Marmalade SDK was really quite awesome I think. It provided an environment where you could code your game using a quite familiar feeling game loop, C-style, in your main() function. It had some higher-level graphics/audio/input abstraction support but you could also opt to just use OpenGL directly and ignore their own graphics library which is what I did.

I updated my game framework to support different platforms mainly via an abstraction provided in two interfaces, OperatingSystem and GameWindow. I had two implementations, one for SDL for desktop builds, SDLSystem and SDLGameWindow, and the other for Marmalade SDK for mobile builds, MarmaladeSystem and MarmaladeGameWindow. This platform abstraction carried on a little bit further with interfaces like Keyboard, Mouse, Touchscreen, File, and FileSystem.

I was pretty happy with this architecture. Maybe it was not perfect, but it seemed to work well enough for me. The game loop was handled in a base class called BaseGameApp which the game project itself would implement (what I always just called GameApp). The main() function just served to instantiate the OperatingSystem, GameWindow and GameApp classes and then called GameApp::Start() which would get the game loop running.
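To make that arrangement a bit more concrete, here is a minimal sketch of it. This is not the actual Monster Defense code: the FakeDesktopSystem and FakeDesktopWindow classes are hypothetical stand-ins that make no real SDL or Marmalade calls, the interfaces are cut down to one method each, and Start() takes a frame count (an assumption made purely so the sketch terminates).

```cpp
#include <cassert>

// Hypothetical, stripped-down platform interfaces (no real SDL or Marmalade calls).
class OperatingSystem {
public:
    virtual ~OperatingSystem() {}
    virtual bool Initialize() = 0;
};

class GameWindow {
public:
    virtual ~GameWindow() {}
    virtual bool Create(int width, int height) = 0;
};

// Stand-ins for a desktop backend; a real one would wrap the SDL API.
class FakeDesktopSystem : public OperatingSystem {
public:
    bool Initialize() override { return true; }
};

class FakeDesktopWindow : public GameWindow {
public:
    bool Create(int width, int height) override { return width > 0 && height > 0; }
};

// Owns the loop and calls back into the game every frame.
class BaseGameApp {
public:
    BaseGameApp(OperatingSystem *system, GameWindow *window)
        : m_system(system), m_window(window) {
        assert(system != nullptr && window != nullptr);
    }
    virtual ~BaseGameApp() {}

    // Runs a fixed number of frames so the sketch terminates; returns frames run.
    int Start(int frames) {
        if (!m_system->Initialize() || !m_window->Create(640, 480) || !OnInit())
            return 0;
        int frame = 0;
        for (; frame < frames; ++frame) {
            OnUpdate(1.0f / 60.0f);   // all updates happen first...
            OnRender();               // ...then the frame is drawn
        }
        return frame;
    }

protected:
    virtual bool OnInit() = 0;
    virtual void OnUpdate(float delta) = 0;
    virtual void OnRender() = 0;

private:
    OperatingSystem *m_system;
    GameWindow *m_window;
};

// The game project's own subclass; here it only counts callbacks.
class GameApp : public BaseGameApp {
public:
    GameApp(OperatingSystem *system, GameWindow *window)
        : BaseGameApp(system, window), updates(0), renders(0) {}
    int updates;
    int renders;

protected:
    bool OnInit() override { return true; }
    void OnUpdate(float) override { ++updates; }
    void OnRender() override { ++renders; }
};
```

A main() for this would create the two platform objects, pass them to GameApp, and call Start(). Swapping the desktop backend for a mobile one would only change which two classes main() instantiates.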

BaseGameApp provided access to the system objects mentioned above as you can probably imagine.

OperatingSystem* GetOperatingSystem();
GraphicsDevice* GetGraphicsDevice();
Keyboard* GetKeyboard();
Mouse* GetMouse();
Touchscreen* GetTouchscreen();
ContentManager* GetContentManager();

The ContentManager was an idea I took from XNA although I'm sure many other game frameworks use a similar approach. The content manager object took care of caching of loaded assets and loading them if they were not already when assets were requested.

Texture* texture = GetContentManager()->Get<Texture>("assets://textures/mytexture.png");

The above snippet would return a Texture* for the texture in the specified file. If it was not yet loaded, it would be loaded right then and there. Thus, asset file paths were used as unique identifiers. In hindsight perhaps using numeric IDs to identify assets via the ContentManager might have been better. At any rate, what I ended up doing in an attempt to simplify re-use of common assets was to have a singleton class ContentCache that had methods like this on it:

Texture* GetUISkin();
Texture* GetGamePadButtons();
SpriteFont* GetTitleFont();
SpriteFont* GetFont();
SpriteFont* GetSmallFont();
SpriteFont* GetUIFont();
TextureAtlas* GetEnvironment();
TextureAtlas* GetParticles();
TextureAtlas* GetItems();
Texture* GetShadow();

Looking at it now, I feel like the addition of ContentCache highlighted perfectly how ContentManager was not providing the right level of abstraction and I probably should have merged the two. ContentManager just ended up being a fancy general-purpose thing which I did not really need. At least, not in that fully general-purpose way (I definitely did need the caching and re-use of loaded assets obviously).
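The part that definitely was needed, caching keyed by asset path with loading on first request, is small. A minimal sketch of just that idea follows. The TextureCache class, its load counter, and the bare-bones Texture struct are all hypothetical and only illustrate the technique; the real ContentManager was generic over asset types.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical asset type; a real Texture would own GPU resources.
struct Texture {
    std::string path;
};

class TextureCache {
public:
    TextureCache() : m_loadCount(0) {}

    // Returns the cached texture for a path, "loading" it on first request.
    Texture* Get(const std::string &path) {
        auto it = m_loaded.find(path);
        if (it != m_loaded.end())
            return it->second.get();           // already loaded: reuse it

        ++m_loadCount;                          // a real loader would read the file here
        std::unique_ptr<Texture> texture(new Texture{path});
        Texture *raw = texture.get();
        m_loaded[path] = std::move(texture);    // the cache keeps ownership
        return raw;
    }

    int GetLoadCount() const { return m_loadCount; }

private:
    std::map<std::string, std::unique_ptr<Texture>> m_loaded;
    int m_loadCount;
};
```

Requesting the same path twice returns the same pointer and only triggers one load. Named accessors like GetUISkin() could then be thin wrappers that call Get() with a fixed path.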

Anyway, back to the game loop, which was implemented via callbacks in GameApp that were called every frame. Additionally, other important system event callbacks could be invoked for certain things, such as the window being minimized or the app running on the mobile device being paused.

void OnAppGainFocus();
void OnAppLostFocus();
void OnAppPause();
void OnAppResume();
bool OnInit();
void OnLoadGame();
void OnLostContext();
void OnNewContext();
void OnRender();
void OnResize();
void OnUpdate(float delta);

At the time I remember that this style of game loop, implemented via callbacks (in this case, the main callbacks being OnUpdate() and OnRender() of course), seemed "obvious" to me in a "well, duh, why wouldn't you do it like this" kind of way. I had been doing game projects (such as the aforementioned Nintendo DS project) in this style for a while already. Nowadays, I'm not so certain. It might be because of my recent return to "old-school coding" with MS-DOS, but I feel like today I would prefer (if at all possible) to just put everything into a simple while(true) { ... } type of loop. No fully separated update and render operations. I would still separate update and render calls so that updates generally all happen before rendering each frame as that only makes sense. But I feel like sometimes there are some grey areas and having the full, forced, separation makes you sometimes do silly things to work around the 1-5% of the time you need to do something weird. Plus I guess that I just like the simplicity of a simple loop. Meh.

However, one plus of using this game app object callbacks architecture was that it made it pretty simple to plug in a game state/process system which still allowed each state and process to react to the very same types of callbacks/events.

The idea with the game state/process system was that in a typical game there are multiple different states where the player needs to interact with the game in a very different way. For example, they launch the game, and a main menu appears. They make some choices and then the game starts. Now they are playing the game. They can pause the game which maybe brings up a different menu. Or they could open some status/inventory display during game play which requires them to interact with a UI of some sort.

Each of these things can be represented by a different state, and all of the states together are represented by a stack. And that is exactly what StateManager and GameState are for. StateManager has a bunch of useful methods for manipulating the stack of game states.

template<class T> T* Push();
template<class T> T* Push(const stl::string &name);
template<class T> T* Overlay();
template<class T> T* Overlay(const stl::string &name);
template<class T> T* SwapTopWith();
template<class T> T* SwapTopWith(const stl::string &name);
template<class T> T* SwapTopNonOverlayWith();
template<class T> T* SwapTopNonOverlayWith(const stl::string &name);
void Pop();
void PopTopNonOverlay();

The reason for the use of templates in most of these methods is so that the class type of the game state is passed but not the actual object instance itself since I guess I didn't want the caller to have to instantiate the object itself (I honestly don't recall my reasoning for that decision now).


The code in StateManager always felt overly complex to me but I do feel like the end result worked quite well. In addition to normal game states that could be pushed/popped or swapped, you could also use "overlay states." This essentially let you have the current "main" state running in the background while overlaying another state over it to allow some limited processing to continue happening in the "main" state, but where the overlay state was the one that was really the current state. The best example of this is where the player is in game and they open their inventory. You might want to overlay a menu UI over-top of the game screen which still continues to be rendered and even maybe with animations still continuing in the background, but you want to pause game logic (so enemies cannot attack the player while they are in their inventory, for example). This type of thing could be implemented by having the inventory game state be an "overlay state."
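A minimal sketch of the overlay rule, under simplifying assumptions: the StateStack class below is hypothetical, identifies states by name only, and reduces "limited processing" for the background state to "still gets rendered." Only the top entry runs game logic, and rendering starts from the topmost non-overlay state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct StateEntry {
    std::string name;
    bool isOverlay;
};

class StateStack {
public:
    void Push(const std::string &name)    { m_states.push_back(StateEntry{name, false}); }
    void Overlay(const std::string &name) { m_states.push_back(StateEntry{name, true}); }
    void Pop() {
        assert(!m_states.empty());
        m_states.pop_back();
    }

    // Only the top entry runs game logic.
    std::string GetUpdatedState() const {
        assert(!m_states.empty());
        return m_states.back().name;
    }

    // Rendering starts at the topmost non-overlay state and continues up
    // through every overlay sitting on top of it.
    std::vector<std::string> GetRenderedStates() const {
        std::vector<std::string> result;
        if (m_states.empty())
            return result;
        std::size_t first = m_states.size() - 1;
        while (first > 0 && m_states[first].isOverlay)
            --first;
        for (std::size_t i = first; i < m_states.size(); ++i)
            result.push_back(m_states[i].name);
        return result;
    }

private:
    std::vector<StateEntry> m_states;
};
```

With a menu state, then a gameplay state, then an inventory overlay on the stack, the inventory is the one being updated while both gameplay and inventory are drawn. Popping the overlay hands logic updates back to gameplay.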

GameState objects implemented all the same basic callbacks/events that GameApp had in addition to having a few extras to allow them to react to manipulation of the game state stack.

void OnPush();
void OnPop();
void OnPause(BOOL dueToOverlay);
void OnResume(BOOL fromOverlay);
BOOL OnTransition(float delta, BOOL isTransitioningOut, BOOL started);

OnTransition() in particular was neat because it allowed game states to very simply do things like implement their own fading in/out or similar type of effects.

Now, game states were not by themselves enough. At least I didn't think so at the time. Very often you may want to do multiple things each frame while a game state is running and they must run together. The best example of this I think is during a GamePlayState where the player has full control over the gameplay, you probably also want to show some kind of game status (e.g. score, lives, health, etc). You could obviously just call whatever code to show this in your GamePlayState or you could do what I did and have each game state have its own ProcessManager which contains a list of GameProcesses, and then have a StatusUI process do the game status display.

ProcessManager worked quite similarly to StateManager except that instead of a stack of processes, it was just a list. Each GameProcess added was always processed each frame in sequence.
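As a rough illustration of "a list, processed in sequence," here is a small sketch. The ProcessList and TaggedProcess names are made up for this example, and the real GameProcess had many more callbacks than the single OnUpdate() shown.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class GameProcess {
public:
    virtual ~GameProcess() {}
    virtual void OnUpdate(float delta) = 0;
};

// Every process added is updated each frame, in the order it was added.
class ProcessList {
public:
    void Add(std::unique_ptr<GameProcess> process) {
        assert(process != nullptr);
        m_processes.push_back(std::move(process));
    }
    void OnUpdate(float delta) {
        for (auto &process : m_processes)
            process->OnUpdate(delta);
    }

private:
    std::vector<std::unique_ptr<GameProcess>> m_processes;
};

// Hypothetical process that appends a tag to a shared string, so the call
// order can be observed from outside.
class TaggedProcess : public GameProcess {
public:
    TaggedProcess(char tag, std::string *order) : m_tag(tag), m_order(order) {}
    void OnUpdate(float) override { m_order->push_back(m_tag); }

private:
    char m_tag;
    std::string *m_order;
};
```

Two processes added in the order 'a' then 'b' and updated for two frames are called in the order a, b, a, b.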

Circling back to what I was talking about earlier with regards to the game loop and the pros/cons of implementing it via some really simple while(true) { ... } with all the code right there in the body of the loop, or by doing it how I described here using callbacks in GameApp... I think that (at least to me) my StateManager and ProcessManager setup made a pretty strong case for sticking with this approach. In fact I liked this architecture so much that I kept it for the next game project I worked on in 2014, even porting it all to Java/libGDX in the process.

Another really important thing for a game to have is some sort of events system. I'm not actually talking about events from the operating system or input devices, etc, but game logic events. Such as when the timer runs out, the player gets hurt or even dies, the player picks up an item, etc. My framework had support for this too, and this was another aspect that I think worked well in practice but had some little things to clean up in the future maybe.

In my event system, classes can subscribe to events by implementing EventListener and then registering themselves with the EventManager (which was managed by BaseGameApp). I can remember it was very important to me that game states and processes (and, well, really anything else) should be able to subscribe to events via code like the following:


And then unsubscribe from events via StopListeningFor() called in a very similar way. And this is exactly the way it worked. Event handling happened in a Handle() method that any class implementing EventListener would need to provide.

BOOL GamePlayState::Handle(const Event *event)
{
    if (event->Is<QuitGameEvent>())
    {
        // do stuff
    }
    else if (event->Is<IntroFinishedEvent>())
    {
        // do other stuff
    }
    // ...
}

It was a very minor bit of syntax sugar, but I really remember the use of templates here to provide the information about what type of event to subscribe to, or here with the Is() method so it knows what to compare against, was a really important feature I wanted the API to have. *shrug* Looking back at it now, I do have to admit that I quite like the way it reads so I guess it was worth the effort.
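One way to get this kind of template-driven API is to key everything on the event's type. The sketch below does that with std::type_index and RTTI, which is an assumption on my part and not necessarily how the framework identified event types. The Subscribe(), Unsubscribe() and Trigger() names are hypothetical too, and plain bool is used in place of BOOL.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <typeindex>
#include <typeinfo>
#include <vector>

struct Event {
    virtual ~Event() {}
    // True when this event's dynamic type is exactly T.
    template<class T> bool Is() const { return typeid(*this) == typeid(T); }
};
struct QuitGameEvent : Event {};
struct IntroFinishedEvent : Event {};

class EventListener {
public:
    virtual ~EventListener() {}
    virtual bool Handle(const Event *event) = 0;
};

class EventManager {
public:
    template<class T> void Subscribe(EventListener *listener) {
        m_listeners[std::type_index(typeid(T))].push_back(listener);
    }
    template<class T> void Unsubscribe(EventListener *listener) {
        std::vector<EventListener*> &list = m_listeners[std::type_index(typeid(T))];
        list.erase(std::remove(list.begin(), list.end(), listener), list.end());
    }
    // Delivers the event to listeners registered for its dynamic type and
    // returns how many of them reported handling it.
    int Trigger(const Event &event) {
        auto it = m_listeners.find(std::type_index(typeid(event)));
        if (it == m_listeners.end())
            return 0;
        int handled = 0;
        for (EventListener *listener : it->second)
            if (listener->Handle(&event))
                ++handled;
        return handled;
    }

private:
    std::map<std::type_index, std::vector<EventListener*>> m_listeners;
};

// A listener in the style of the Handle() method shown earlier.
class GamePlayState : public EventListener {
public:
    GamePlayState() : quitRequested(false) {}
    bool quitRequested;

    bool Handle(const Event *event) override {
        if (event->Is<QuitGameEvent>()) {
            quitRequested = true;
            return true;
        }
        return false;
    }
};
```

The template parameter does double duty here: it picks the listener list at subscription time and gives Is() something to compare against at handling time, so no event ID constants are needed.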

Events were used quite a bit in Monster Defense. I had defined 35 different event types at the time I stopped working on the project. Definitely there would have been more had it continued. There were events for things like animation updates, player buffs, entity state changes, healing, damage, movement, score, weapon switching, timers, and other things.

Which brings us to entity management. I remember at this time there had been a number of articles written on the subject of "entity/component systems." I was initially intrigued by the idea as I was not liking the idea of going with some object hierarchy approach. I remember reading this article (and a number of others by the same author) and really liking the approach, but it clearly seemed to be missing some necessary extras and what I deemed "basic functionality." As well, I didn't like the idea of using IDs to represent entities and figured I'd instead just use an Entity object (but purely as a code convenience... it still would not be meant to contain any data directly).

class Entity
{
public:
    Entity(EntityManager *entityManager);
    virtual ~Entity();
    template<class T> T* Get() const;
    template<class T> T* Add();
    template<class T> void Remove();
    template<class T> BOOL Has() const;
    template<class T> BOOL WasCreatedUsingPreset() const;
    BOOL WasCreatedUsingPreset(ENTITYPRESET_TYPE type) const;
};

The Get(), Add(), Remove() and Has() methods all operated on Component objects which you attached to entities and these were the things that contained the actual data that belonged to the entity itself.

class Component
{
public:
    virtual ~Component();
    virtual void Reset();
};

As you can see from this, a base Component is incredibly simple and itself contains no data. The specific component types would actually have the data defined. For example, a PositionComponent to hold an entity's position in the game:

class PositionComponent : public Component
{
public:
    void Reset();
    // the actual data this component holds
    Vector3 position;
};

ComponentSystem objects are registered with the EntityManager. Then each time during the game's OnUpdate() and OnRender() callbacks, each component system is called in turn.

When a component system is called to do its processing (either during OnUpdate() or OnRender()), it queries the EntityManager for a list of all entities that contain a component it cares about. For example, the PhysicsSystem would fetch a list of entities that currently have a PhysicsComponent.

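void PhysicsSystem::OnUpdate(float delta)
{
    EntityList list;
    // (the EntityManager query that fills the list with entities having a PhysicsComponent is not shown here)

    for (EntityList::iterator i = list.begin(); i != list.end(); ++i)
    {
        Entity *entity = *i;
        PhysicsComponent *physics = entity->Get<PhysicsComponent>();

        // do stuff
    }
}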

Generally, component systems would do their main processing loop based on entities having one "main" component type that the component system cared about, but component systems could very well retrieve other components from a given entity if needed (for example, the PhysicsSystem will need access to the entity's position via the PositionComponent in addition to whatever current physics state is present in its PhysicsComponent).
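Putting the pieces together, a self-contained sketch of this entity/component arrangement might look like the following. It is an illustration under assumptions, not the Monster Defense implementation: components are stored in a map keyed by std::type_index, the GetAllWith() query name is hypothetical, the system is a free function rather than a ComponentSystem class, and the components hold plain floats instead of Vector3.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <vector>

struct Component { virtual ~Component() {} };
struct PositionComponent : Component { float x = 0.0f, y = 0.0f, z = 0.0f; };
struct PhysicsComponent  : Component { float vx = 0.0f, vy = 0.0f, vz = 0.0f; };

// The entity holds no data itself, only the components attached to it.
class Entity {
public:
    template<class T> T* Add() {
        T *component = new T();
        m_components[std::type_index(typeid(T))].reset(component);
        return component;
    }
    template<class T> T* Get() const {
        auto it = m_components.find(std::type_index(typeid(T)));
        return it == m_components.end() ? nullptr : static_cast<T*>(it->second.get());
    }
    template<class T> bool Has() const { return Get<T>() != nullptr; }
    template<class T> void Remove() { m_components.erase(std::type_index(typeid(T))); }

private:
    std::map<std::type_index, std::unique_ptr<Component>> m_components;
};

class EntityManager {
public:
    Entity* Add() {
        m_entities.push_back(std::unique_ptr<Entity>(new Entity()));
        return m_entities.back().get();
    }
    // Hypothetical query: every entity that currently has a T attached.
    template<class T> std::vector<Entity*> GetAllWith() const {
        std::vector<Entity*> result;
        for (const auto &entity : m_entities)
            if (entity->Has<T>())
                result.push_back(entity.get());
        return result;
    }

private:
    std::vector<std::unique_ptr<Entity>> m_entities;
};

// A "system": loops over its main component type and pulls in others as needed.
void UpdatePhysics(EntityManager &manager, float delta) {
    for (Entity *entity : manager.GetAllWith<PhysicsComponent>()) {
        PhysicsComponent *physics = entity->Get<PhysicsComponent>();
        PositionComponent *position = entity->Get<PositionComponent>();
        if (position == nullptr)
            continue;   // nothing to move without a position
        position->x += physics->vx * delta;
        position->y += physics->vy * delta;
        position->z += physics->vz * delta;
    }
}
```

An entity with both components moves when UpdatePhysics() runs, while one with only a PositionComponent is never even visited. Removing the PhysicsComponent later takes the entity out of the system's query without touching any other code.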

The entity system and event system also were integrated. Component systems were also EventListeners. For example, the PhysicsSystem would respond to JumpEvent and MoveForwardEvent amongst others. These types of events were EntityEvent subclasses which required an Entity object to be set so that the thing that listened to those events (almost always a ComponentSystem) would manipulate the source entity in response to the event.

MoveForwardEvent *moveEvent = new MoveForwardEvent(entity);

Most of the entities that were created and managed in Monster Defense fit into the same categories and were re-used a lot. For example, there were a few types of zombie monsters that were constantly re-used. Same for power-up objects that the player could pick up. To handle this standardized creation of new entities, the addition of new entities via EntityManager needed to be passed off to an EntityPreset.

An EntityPreset had a very important Create() method which would be called by EntityManager when an entity was being Add()ed. It was then up to that Create() method to actually add the entity to the EntityManager and then return it. The MonsterPreset and ZombiePreset together give you an idea of how much manual component adding and manipulation was required to create an entity in Monster Defense (ZombiePreset is a subclass of MonsterPreset so when a zombie monster entity is added, both of their Create()'s get called).

Entity* MonsterPreset::Create(EntityPresetArgs *args)
{
    Entity *entity = GetEntityManager()->Add();
    entity->Add<ColorComponent>()->color = COLOR_WHITE;
    return entity;
}

Entity* ZombiePreset::Create(EntityPresetArgs *args)
{
    Entity *entity = MonsterPreset::Create(args);
    ContentManagerComponent *content = GetEntityManager()->GetGlobalComponent<ContentManagerComponent>();
    KeyframeMesh *mesh = content->content->Get<KeyframeMesh>("assets://characters/zombie.mesh");
        ->AddSequence("idle", *mesh->GetAnimation("IDLE"), 0.05f)
        ->AddSequence("walk", *mesh->GetAnimation("WALK"), 0.02f)
        ->AddSequence("dead", *mesh->GetAnimation("KILLED"), 0.10f)
        ->AddSequence("punch", *mesh->GetAnimation("BASH_LEFT_2"), 0.1f);
        ->Set(ENTITYSTATE_IDLE, "idle")
        ->Set(ENTITYSTATE_WALKING, "walk")
        ->Set(ENTITYSTATE_DEAD, "dead", FALSE, TRUE)
        ->Set(ENTITYSTATE_PUNCHING, "punch", FALSE, TRUE);
        ->bounds.radius = 0.49f;

    PhysicsComponent *physics = entity->Get<PhysicsComponent>();
    physics->friction = FRICTION_NORMAL;
    physics->maxWalkSpeed = 6.0f;
    physics->walkingAcceleration = 4.0f;
        ->offset = Vector3(0.0f, -0.5f, 0.0f);

    KeyframeMeshComponent *keyframeMesh = entity->Add<KeyframeMeshComponent>();
    keyframeMesh->mesh = mesh;

    CanBeAttackedComponent *canBeAttacked = entity->Add<CanBeAttackedComponent>();
    canBeAttacked->byPlayer = TRUE;
    canBeAttacked->byNPC = TRUE;
    return entity;
}

Monster Defense had 65 different component types. Some of these were complex (like PhysicsComponent) and others served no purpose other than as a simple "marker" or type of glorified flag on the entity (such as AffectedByGravityComponent).

Now, a decision I am in hindsight a bit bothered by was that EntityPresets also sometimes subscribed to events and did processing on EntityEvents only when the entity in question was created by that same EntityPreset (when an entity was added using an EntityPreset the EntityManager would automatically attach an EntityPresetComponent to that entity to mark the EntityPreset type that was used to create it). Anyway, EntityPresets subscribed to stuff like DespawnedEvent, HealEvent, HurtEvent and KilledEvent so that common functionality that should occur for entities when these events occurred could be customized per entity-type. For example, when a bullet entity dies (KilledEvent) different things should happen than when a zombie entity dies. In the bullet's case, some particles should be created where it was last located and nothing else. When a zombie dies, some smoke particles are created and a random chance for a power-up entity to be created needs to happen.

Looking back at this, I like the idea of the EntityPreset::Create() but I think that the other additions to EntityPresets made it a bit too complicated in practice. It's hard to put my finger on it exactly though, but I know if I were to start a new project today using this same entity/component system, I would probably spend a good long while thinking about a better way of organizing this.

However, in general I really liked the whole entity/component system concept. It felt very powerful to me. Especially with the physics system I had, I liked how easy it was to add full physics processing to basically anything. Added some smoke particles and didn't like them flying through solid walls? Fixed by adding a PhysicsComponent to it. Want to attach a "blob" (stupid simple black circle) shadow to some entity? Add a BlobShadowComponent. Want to make it so that one entity can push other entities around? Add a PusherComponent.

That all being said, I do feel like this also made the code feel a little bit more like spaghetti in some ways. It could be a bit harder to fully figure out how an entity was going to work once all its components were added because the code for fully processing that entity was now spread out over (at most) 19 different component systems.

I won't be able to talk about every entity subsystem (of which there were many), but I do want to talk about the physics system a bit more because I spent so much freaking time on it. I think this took up at least half of my development time over the entire project, but it is hard to say for sure.

I remember initially investigating things like Bullet Physics and Open Dynamics Engine (ODE) and coming away from each of these feeling incredibly stupid. I never did particularly well in math nor physics classes in high school. I was at best a 'C' student in both subjects overall. Some years I did better than others. Of course, this is why things like the aforementioned two physics libraries exist... to make it possible for math "dummies" like myself to not have to fret over the scary details *as much*. Note the emphasized part of that last sentence. You do still need to know some stuff. Anyway, personally I felt like the documentation and examples for both of these libraries just flew over my head and left me almost clueless. I remember having a weak moment after investigating both of these and coming away empty handed where I almost decided that I really should have gone with something like Unity 3D instead from the very beginning ("Things like this must be one of the major reasons why these frameworks are so popular! I cannot possibly be the only one so stupid!"). I remember reading discussions at the time how the Bullet documentation was not the greatest so at least I didn't feel totally alone there.

Anyway, I decided to pick up a prototype I had worked on a few years earlier for a physics system that was based on the old Peroxide "Improved Collision detection and Response" article from 2003. I had it half working back then and figured I might see if I could fix the remaining problems and integrate it into my engine here for Monster Defense. I think after about a month of futzing about with this I became a victim of the sunk cost fallacy, unable to pull myself away from this and admit defeat. I just had to get it working.

I rather "enjoyed" spending lots of time working on problems with ramps. Of course, the single final level included in Monster Defense doesn't even have ramps in it, but hey, let's just ignore that fact. In particular I remember there was some bug that would only occur sometimes, where an entity that climbed up to the top of the ramp could suddenly disappear (in reality, they would be launched either upward or downward, I forgot which, at incredible velocity). This was due to some floating point inaccuracy that could cause a calculation to result in NaN and propagate into other calculations and cause mayhem. Took me a looong time to narrow that down.
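Which calculation produced the NaN is not recorded here, so the following is only an illustration of one common way it happens in collision code: normalizing a vector whose length has collapsed to zero divides 0 by 0, and every later calculation that touches the result becomes NaN as well. The function names and the epsilon value are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cmath>

struct Vector3 { float x, y, z; };

inline float Length(const Vector3 &v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Naive normalization: a zero-length input computes 0/0, which is NaN
// under IEEE-754 arithmetic.
inline Vector3 NormalizeUnsafe(const Vector3 &v) {
    float length = Length(v);
    return Vector3{ v.x / length, v.y / length, v.z / length };
}

// Guarded version: falls back to a zero vector when the length is too small.
// Writing the test as !(length > epsilon) also rejects a NaN length.
inline Vector3 NormalizeSafe(const Vector3 &v, float epsilon = 1e-6f) {
    float length = Length(v);
    if (!(length > epsilon))
        return Vector3{ 0.0f, 0.0f, 0.0f };
    return Vector3{ v.x / length, v.y / length, v.z / length };
}

inline bool HasNaN(const Vector3 &v) {
    return std::isnan(v.x) || std::isnan(v.y) || std::isnan(v.z);
}

// Moves a position along a direction; a NaN direction poisons the position.
inline Vector3 Advance(const Vector3 &position, const Vector3 &direction, float distance) {
    return Vector3{ position.x + direction.x * distance,
                    position.y + direction.y * distance,
                    position.z + direction.z * distance };
}
```

A check like HasNaN() dropped in after each step of the collision response is a cheap way to find the first calculation that goes bad, rather than the place where the symptom finally shows up.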

Lots and lots of time was spent tweaking the "feel" of the physics system too, from the player's perspective. And I still don't think it's tweaked quite right. I remember reading something from Twitter at the time where an indie developer was saying something like how typically when a game's physics system feels just right, it's usually not the most accurate (real-world accurate that is). Like it's kind of "cartoon physics" in a way or something. I really agree with this. I definitely never nailed this either, heh. A lot of my time was spent on jump physics. Things that you never think of, like how it should react when you jump up and hit a flat surface. Or you jump up and hit the corner of a ceiling edge. Or when you jump up and just as you're falling you catch the edge/lip of the top of a wall (there's a lingering "bug" in Monster Defense where you can double-jump in this scenario if you time it right, but the single level in it doesn't give you any opportunity to test this).

Test level for working on ramps, steps and other physics bugs.

One thing I did learn was that debugging physics problems by overlaying wireframe geometry onto the display was incredibly helpful. Looking at tons of numbers (so many you have to use 2 letter labels as in the above screenshot) is difficult. As well, stepping through with a debugger is not always the most helpful either.

The game world was represented using 3D tiles. Kind of Minecraft-inspired, where the world is subdivided into 16x16x16 TileChunks which are all part of a larger TileMap. Each grid cell in the map is represented by a Tile which had a number of different properties. Special, optimized processing was present for Tiles which were just simple cubes. But it was also possible for tiles to be represented by arbitrary 3D meshes (ramps could be done in this way for example).
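The arithmetic behind a chunked tile map is simple enough to show. In this sketch the chunk size of 16 comes from the description above, but everything else is an assumption: the function names, the restriction to non-negative coordinates, and the flat storage order (x varies fastest, then z, then y) are all just one plausible choice.

```cpp
#include <cassert>

const int CHUNK_SIZE = 16;

struct ChunkCoord {
    int chunk;   // which chunk along this axis
    int local;   // tile offset inside that chunk, 0..15
};

// Splits a world tile coordinate along one axis into chunk index and local
// offset. Assumes non-negative coordinates.
inline ChunkCoord ToChunkCoord(int worldTile) {
    assert(worldTile >= 0);
    return ChunkCoord{ worldTile / CHUNK_SIZE, worldTile % CHUNK_SIZE };
}

// Flat array index of a tile inside a 16x16x16 chunk.
inline int TileIndex(int x, int y, int z) {
    assert(x >= 0 && x < CHUNK_SIZE);
    assert(y >= 0 && y < CHUNK_SIZE);
    assert(z >= 0 && z < CHUNK_SIZE);
    return (y * CHUNK_SIZE + z) * CHUNK_SIZE + x;
}
```

For example, world tile 37 along an axis lands in chunk 2 at local offset 5, and each chunk needs room for 16 * 16 * 16 = 4096 tiles.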

TileChunks needed to be turned into 3D meshes with a ChunkRenderer, of which there was also an optional subclass, LitChunkVertexGenerator, which could apply lighting effects based on tiles that were defined to be light sources and would "spread" light to adjacent tiles (if there was empty space for light to pass through, or a tile was marked as not solid).
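Light "spreading" of this kind is usually done with a breadth-first flood fill, where each step away from the source loses one level of brightness and solid tiles block it. The sketch below shows that general technique on a tiny grid. It is not the lighting code from Monster Defense, and the LightGrid and SpreadLight names, along with the one-level-per-tile falloff, are assumptions.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical light grid: one brightness value and one solid flag per tile.
struct LightGrid {
    int width, height, depth;
    std::vector<bool> solid;
    std::vector<int> light;

    LightGrid(int w, int h, int d)
        : width(w), height(h), depth(d), solid(w * h * d, false), light(w * h * d, 0) {}

    bool InBounds(int x, int y, int z) const {
        return x >= 0 && x < width && y >= 0 && y < height && z >= 0 && z < depth;
    }
    int Index(int x, int y, int z) const {
        assert(InBounds(x, y, z));
        return (y * depth + z) * width + x;
    }
};

struct TilePos { int x, y, z; };

// Breadth-first flood fill from a light source. Light drops by one per tile
// travelled and does not enter solid tiles.
void SpreadLight(LightGrid &grid, TilePos source, int brightness) {
    static const int DIRECTIONS[6][3] = {
        { 1, 0, 0 }, { -1, 0, 0 }, { 0, 1, 0 }, { 0, -1, 0 }, { 0, 0, 1 }, { 0, 0, -1 }
    };

    std::queue<TilePos> open;
    grid.light[grid.Index(source.x, source.y, source.z)] = brightness;
    open.push(source);

    while (!open.empty()) {
        TilePos current = open.front();
        open.pop();

        int next = grid.light[grid.Index(current.x, current.y, current.z)] - 1;
        if (next <= 0)
            continue;   // nothing left to pass on

        for (int i = 0; i < 6; ++i) {
            TilePos n = { current.x + DIRECTIONS[i][0],
                          current.y + DIRECTIONS[i][1],
                          current.z + DIRECTIONS[i][2] };
            if (!grid.InBounds(n.x, n.y, n.z))
                continue;
            int index = grid.Index(n.x, n.y, n.z);
            if (grid.solid[index] || grid.light[index] >= next)
                continue;   // blocked, or already at least this bright
            grid.light[index] = next;
            open.push(n);
        }
    }
}
```

In a five-tile corridor with a brightness-4 source at one end, the light falls off as 4, 3, 2, 1, 0. Put a solid tile in the middle and everything behind it stays dark.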

The full TileMap could be rendered by simply looping through all of the TileChunks in the map and rendering their pre-generated 3D meshes (which would each just be a VBO).

The use of a 3D tilemap for the game world made integration with the physics engine easier. Scanning the world for nearby world geometry to test for potential collisions with was made incredibly simple due to this. Once I did solve the seemingly insurmountable task of getting the physics engine to work decently enough, it really did feel like it all came together well. I remember when I finally got it all working, I just spent a day moving around one of my test worlds, running past the same set of walls and ramps and whatever other objects again and again, marveling at the fact that it actually worked. If I had to choose one single thing in my "programming career" that I worked on to date that gave me the most satisfaction or that I was most proud of, it would be this thing. Not this project (Monster Defense), but just getting this physics engine working as well as it is now, despite whatever bugs remain. I truthfully did not even think I would end up getting it working this well when I first began working on it.
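The reason a tile grid makes the nearby-geometry scan so easy is that finding candidate tiles is just a matter of rounding. A minimal sketch, assuming tiles are 1x1x1 cubes aligned to integer coordinates and the entity is bounded by a sphere (the TileRange and GetTilesNear names are made up for this example):

```cpp
#include <cassert>
#include <cmath>

// Inclusive range of tile coordinates.
struct TileRange {
    int minX, minY, minZ;
    int maxX, maxY, maxZ;

    int Count() const {
        return (maxX - minX + 1) * (maxY - minY + 1) * (maxZ - minZ + 1);
    }
};

// Returns every tile that the box around a bounding sphere overlaps.
// Only these tiles need to be tested for collisions.
inline TileRange GetTilesNear(float x, float y, float z, float radius) {
    assert(radius >= 0.0f);
    TileRange range;
    range.minX = static_cast<int>(std::floor(x - radius));
    range.minY = static_cast<int>(std::floor(y - radius));
    range.minZ = static_cast<int>(std::floor(z - radius));
    range.maxX = static_cast<int>(std::floor(x + radius));
    range.maxY = static_cast<int>(std::floor(y + radius));
    range.maxZ = static_cast<int>(std::floor(z + radius));
    return range;
}
```

An entity with a 0.49 radius standing in the middle of a tile only ever touches that one tile. Move it onto a tile boundary and the range grows to cover both neighbours, and nothing else in the world needs to be looked at.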

There are a bunch of other things in Monster Defense that I could talk about in detail but nothing else feels like it is super worthy of writing lots of words about. One thing I feel that is missing that probably would have been added had I continued working on it was some sort of scripting integration, probably with Lua in particular since it's fairly easy to integrate with C. I think that using it to define entity behaviours and even creation of entities (versus the previously described EntityPreset system) would probably have been the best approach. Assuming of course that I could have arrived at a decent abstraction through Lua for interacting with the entity/component system. I suspect that adding/manipulating entity components through simple maps would have been the best way and is very likely something I will explore in future projects.

The last thing I think I will talk about is the graphics and math engine used in Monster Defense. There was a fair bit of inspiration taken from XNA here.

A GraphicsDevice is used to manage an OpenGL context, and allows Texture, Shader, VertexBuffer and IndexBuffer objects to be bound.

The integration between Shaders, VertexBuffers and IndexBuffers is something I was pretty proud of at the time, but am not sure how well it might hold up today (I've not really kept up to date on graphics APIs so I cannot say honestly).

VertexBuffer and IndexBuffer objects are both subclasses of BufferObject, which handles the OpenGL details of VBOs or client-side buffers. Each of these two buffer subclasses had nice, easy to use methods for programmatically manipulating the buffer's contents, such as Move(relativeOffset), MoveTo(absoluteOffset), MoveNext(), MovePrevious(), SetCurrentPosition3(Vector3), GetCurrentPosition3(), etc. If the buffer was initially set to be backed by a VBO, using these "setter" methods would set a dirty flag and the next time the buffer was to be rendered by GraphicsDevice it would know to upload the buffer's data via OpenGL. A lot of this convenience functionality was definitely not efficient, but damned if it wasn't convenient. For pure performance, you obviously could just pre-set the buffer data and upload it once and then render it many times after that.
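The dirty flag idea can be sketched roughly like this. This is not the real BufferObject, just a minimal stand-in under my own assumptions: SketchBuffer and all of its methods are hypothetical names, and the "upload" only bumps a counter where real code would call something like glBufferData.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a VBO-backed buffer. No OpenGL is used here.
class SketchBuffer
{
public:
    explicit SketchBuffer(size_t numFloats)
        : m_data(numFloats, 0.0f), m_dirty(true), m_numUploads(0) {}

    // Any "setter" changes the CPU-side copy and marks the buffer dirty.
    void Set(size_t index, float value)
    {
        assert(index < m_data.size());
        m_data[index] = value;
        m_dirty = true;
    }

    float Get(size_t index) const
    {
        assert(index < m_data.size());
        return m_data[index];
    }

    // The renderer would call this right before drawing with the buffer.
    void UploadIfDirty()
    {
        if (!m_dirty)
            return;
        ++m_numUploads;   // placeholder for the actual upload to the GPU
        m_dirty = false;
    }

    bool IsDirty() const { return m_dirty; }
    int GetNumUploads() const { return m_numUploads; }

private:
    std::vector<float> m_data;
    bool m_dirty;
    int m_numUploads;
};
```

However many setter calls happen between two draws, the data is uploaded at most once per draw, and not at all if nothing changed.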

Example usage, initializing a VertexBuffer (part of some code to set up a wireframe grid mesh).

// position + color, matching the two Set*() calls used below
VERTEX_ATTRIBS attribs[] = {
    VERTEX_POS_3D,
    VERTEX_COLOR
};

points = new VertexBuffer();
points->Initialize(attribs, 2, width * 2 + 2, BUFFEROBJECT_USAGE_STATIC);

// two vertices (one line) per row of the grid
for (uint i = 0; i < height + 1; ++i)
{
    points->SetPosition3((i * 2), -(width / 2.0f), 0.0f, i - (height / 2.0f));
    points->SetColor((i * 2), 1.0f, 1.0f, 1.0f);
    points->SetPosition3((i * 2) + 1, width / 2.0f, 0.0f, i - (height / 2.0f));
    points->SetColor((i * 2) + 1, 1.0f, 1.0f, 1.0f);
}

And then rendering via GraphicsDevice:

SimpleColorShader *colorShader = graphicsDevice->GetSimpleColorShader();

graphicsDevice->Clear(0.25f, 0.5f, 1.0f, 1.0f); 


graphicsDevice->RenderLines(0, points->GetNumElements() / 2);


You can see here a Shader is used (a preset Shader instance in this case, one of several that the GraphicsDevice provides, but custom shaders can be used too). Missing is some of the boilerplate that you might see in other OpenGL code to map vertex attributes to the shader (like the position and color attributes we previously set up above). This is handled automatically through the VertexBuffer's use of "standard vertex attributes" which Shader objects are aware of and know how to automatically bind. The mapping has to be set up per-shader because the actual GLES shader source code can use completely arbitrary names for the attributes. But it's still easy to set up, like in the constructor of the SimpleColorShader:

SimpleColorShader::SimpleColorShader()
    : StandardShader()
{
    BOOL result = LoadCompileAndLinkInlineSources(m_vertexShaderSource, m_fragmentShaderSource);
    ASSERT(result == TRUE);

    MapAttributeToStandardAttribType("a_position", VERTEX_POS_3D);
    MapAttributeToStandardAttribType("a_color", VERTEX_COLOR);
}

Where a_position and a_color are the actual attribute names used in the GLES source code.
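As a rough illustration of how this kind of lookup could work (a sketch under my own assumptions, not the framework's real implementation), the shader can keep a table from standard attribute type to shader attribute name, and binding a vertex buffer just walks the buffer's layout and looks each entry up. StandardAttrib, SketchShader and ResolveAttribs are hypothetical names; real code would follow the lookup with glGetAttribLocation and glVertexAttribPointer calls.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical "standard" attribute types, for illustration only.
enum StandardAttrib { ATTRIB_POS_3D, ATTRIB_COLOR, ATTRIB_TEXCOORD };

struct SketchShader
{
    // standard attribute type -> attribute name used in the shader source
    std::map<StandardAttrib, std::string> attribNames;

    void MapAttribute(const std::string &name, StandardAttrib type)
    {
        attribNames[type] = name;
    }
};

// Walks the buffer's layout and returns the shader attribute name for each
// entry the shader knows about. Entries the shader has no mapping for are
// skipped, since the shader simply does not use them.
std::vector<std::string> ResolveAttribs(const SketchShader &shader,
                                        const std::vector<StandardAttrib> &bufferLayout)
{
    std::vector<std::string> names;
    for (StandardAttrib attrib : bufferLayout)
    {
        std::map<StandardAttrib, std::string>::const_iterator it = shader.attribNames.find(attrib);
        if (it != shader.attribNames.end())
            names.push_back(it->second);
    }
    return names;
}
```

The nice part is that the calling code never mentions attribute names at all. Only the shader's constructor needs to know them.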

An additional convenience that Shader objects provided was the ability to let you pre-set uniform values (even before the Shader is bound). If a uniform value is set on a shader object that is not currently bound, it is simply cached, and the cache is flushed (uniforms set via OpenGL) when the shader is bound. This is a minor thing, but it ended up being handy in a few places where I suddenly didn't need to care about this kind of thing. Probably not the best idea for performance reasons, but I was not (and still am not) working on anything remotely cutting-edge.
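A minimal sketch of that caching behaviour, assuming float uniforms only. SketchUniformCache and its methods are hypothetical names for this example, and Apply() just records the value where real code would call glUniform1f.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the uniform handling of a shader object.
class SketchUniformCache
{
public:
    SketchUniformCache() : m_bound(false), m_numApplied(0) {}

    void SetUniform(const std::string &name, float value)
    {
        if (m_bound)
            Apply(name, value);        // shader is active: set it right away
        else
            m_pending[name] = value;   // not bound yet: remember the latest value
    }

    // Called when the shader becomes the active one: flush everything cached.
    void OnBind()
    {
        m_bound = true;
        for (std::map<std::string, float>::const_iterator it = m_pending.begin();
             it != m_pending.end(); ++it)
            Apply(it->first, it->second);
        m_pending.clear();
    }

    void OnUnbind() { m_bound = false; }

    int GetNumApplied() const { return m_numApplied; }

    float GetAppliedValue(const std::string &name) const
    {
        std::map<std::string, float>::const_iterator it = m_applied.find(name);
        assert(it != m_applied.end());
        return it->second;
    }

private:
    // Placeholder for the real OpenGL call.
    void Apply(const std::string &name, float value)
    {
        m_applied[name] = value;
        ++m_numApplied;
    }

    bool m_bound;
    int m_numApplied;
    std::map<std::string, float> m_pending;
    std::map<std::string, float> m_applied;
};
```

Setting the same uniform several times before binding only results in one call at bind time, since only the latest cached value is kept.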

While my game framework did have some limited support for skeletal animation with 3D models, it was half-baked and only ever worked with one or two test models. I fell back to vertex interpolation for animation (via good ol' Quake 2 MD2 models) as a result of giving up on trying to extract skeletal animation data out of Autodesk FBX models via their SDK. The FBX file format feels like PDF in that it is a bit of a "kitchen sink" format and as such, it is hard to grasp fully for the uninitiated.
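For anyone unfamiliar with it, vertex interpolation just means storing a full set of vertex positions per keyframe and blending linearly between two neighbouring keyframes. A minimal sketch of the general idea (not the framework's actual code; Vec3 and InterpolateFrames are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector type for illustration only.
struct Vec3 { float x, y, z; };

// Linearly interpolates every vertex between two keyframes.
// t = 0 gives frameA, t = 1 gives frameB. Both frames must have the same
// number of vertices, in the same order.
std::vector<Vec3> InterpolateFrames(const std::vector<Vec3> &frameA,
                                    const std::vector<Vec3> &frameB,
                                    float t)
{
    assert(frameA.size() == frameB.size());

    std::vector<Vec3> out(frameA.size());
    for (size_t i = 0; i < frameA.size(); ++i)
    {
        out[i].x = frameA[i].x + (frameB[i].x - frameA[i].x) * t;
        out[i].y = frameA[i].y + (frameB[i].y - frameA[i].y) * t;
        out[i].z = frameA[i].z + (frameB[i].z - frameA[i].z) * t;
    }
    return out;
}
```

There is no skeleton or skinning data to deal with at all, which is a big part of why it is so much easier to get working than skeletal animation.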

Final Thoughts

So, Monster Defense was of course never finished and probably won't be, at least not in its current form. It was fun working on, but I am disappointed that it was never finished.

Technology-wise, some final notes:

  • While it was fun building everything myself from scratch, it ended up taking a ton of effort and this was probably the single biggest reason it was never finished.
  • Having said that, if I could do it over again, I would still build everything from scratch. I learnt a lot.
  • Inefficiencies galore! I could get away with it here because I was by no means pushing any boundaries, but it's still (in my opinion) important to recognize where you are taking inefficient shortcuts. Some of these have been noted above, but one thing that still bothers me is that I was lazy about memory allocations. In particular I was all too willing to just arbitrarily new() up memory whenever I needed it.
  • Organization of the code, in hindsight after not looking at it for six years, feels messy. I find myself jumping around a lot trying to trace through how things worked.
  • I had some terrible STACK_TRACE macro I used everywhere which is a holdover from the Nintendo DS origins for a lot of this code where the debugging facilities are much more crude. I don't know why I kept this around here in Monster Defense. Also, ASSERT()ing after every allocation was a bit silly (again, another holdover from the Nintendo DS).
  • I will probably get shunned or banished as a programmer for this next point: I would probably consider using global singletons for a lot of "system"-type objects. For example, my ContentManager, StateManager, EntityManager, EventManager ... almost anything with "Manager" in the name I guess. I just don't see the value in having to pass these things around when they are actually global state. I don't see the value in hiding from this fact.
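For the singleton point, this is roughly the shape I have in mind. It is only a sketch assuming C++11 or later, and SketchEventManager is a hypothetical stand-in, not one of the real "Manager" classes.

```cpp
#include <cassert>

// Hypothetical manager exposed as a global singleton.
class SketchEventManager
{
public:
    // Function-local static: constructed on first use, and the same
    // instance is returned on every call.
    static SketchEventManager& Instance()
    {
        static SketchEventManager instance;
        return instance;
    }

    void Post(int eventId)
    {
        m_lastEventId = eventId;
        ++m_numPosted;
    }

    int GetNumPosted() const { return m_numPosted; }
    int GetLastEventId() const { return m_lastEventId; }

    // No copies, so there really is only ever one of these.
    SketchEventManager(const SketchEventManager&) = delete;
    SketchEventManager& operator=(const SketchEventManager&) = delete;

private:
    SketchEventManager() : m_numPosted(0), m_lastEventId(-1) {}

    int m_numPosted;
    int m_lastEventId;
};
```

Any code anywhere can then call SketchEventManager::Instance().Post(...) without the manager having to be passed down through constructors.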

In late 2013 and through 2014 I ported much of this game framework code to Java using libGDX and put together the beginnings of a worthy successor.

I cannot currently release this as-is because the 3D model assets used are paid assets, so throwing it up on GitHub would not be a good idea (I could do it with the code, but you would not be able to run it). However, I did release a sample demo of the underlying engine in 2014.

In this project, I did not resolve all my concerns with this game framework as noted in this post. But I did confirm a lot of them, heh. In particular the spaghetti code feel of a lot of the entity system architecture. It felt a lot worse in this project (but there were a lot more entity types...).

Perhaps there will be a "take 3" on this project in the future. Or I may just bring it back to MS-DOS. Maybe!