I suppose I'm writing this post for my own benefit primarily. I'll likely forget many of these details in a month, and then go and try to write a bunch more assembly and run into problems. So I'll try to proactively solve that future problem for myself. Everything here is better documented in the compiler documentation. However, it is scattered around a bit and of course isn't written with specific examples for using TASM.
One of the performance benefits Watcom brought with it, and a pretty big deal at the time, was that its default calling convention used registers for up to the first 4 arguments to called functions. Past that, the stack is used as per standard C calling conventions.
As mentioned, this calling convention is the default, but it can be changed globally via the CPU instruction code generation compiler switch. For example, `/3r` selects 386 instructions with the register-based calling convention, while `/3s` selects 386 instructions with the stack-based calling convention.
Borland Turbo Assembler (TASM) does not natively support this register-based calling convention among its varied support for language-specific calling conventions. However, it does let you use its "NOLANGUAGE" option (which is the default if no language is specified), and then you can handle all the details yourself.
```asm
ideal
p386
model flat
codeseg
locals

public add_numbers_

; int add_numbers(int a, int b)
; inputs:
;   eax = a
;   edx = b
; return:
;   eax
proc add_numbers_ near
    push ebp
    mov ebp, esp

    add eax, edx

    pop ebp
    ret
endp

end
```
This is pretty normal-looking TASM, complete with the usual assembly prologue and epilogue code. Note that we are intentionally not specifying a language modifier.
So, first off, `add_numbers_` has a trailing underscore to match what Watcom expects by default. If you don't like this for whatever reason, you can rename it to your liking, but you will then need a `#pragma` in your C code to tell Watcom about the different naming convention for this function.
Second, via the magic of the register-based calling convention, Watcom will have our two number arguments all ready for us in `eax` and `edx`. Our return value is expected to be in `eax`, which is exactly where our result ends up, so we're all good.
The great thing is, we don't actually need to do anything fancy to call this function from our C code.
```c
// prototype
int add_numbers(int a, int b);

// usage
int result;
result = add_numbers(10, 20);
```
But that was the simple case.
This register-based calling convention actually places the burden on the called function to clean things up before returning, and that includes preserving some register values. According to the documentation: "All used 80x86 registers must be saved on entry and restored on exit except those used to pass arguments and return values." So, in our `add_numbers_` function, if we had wanted to use `ecx`, we would need to push and pop it in the prologue and epilogue code. But we didn't need to do so for `eax` and `edx`, because those were used to pass arguments and return a value.
As mentioned previously, the stack gets used for arguments once all the argument registers have been used up (by default `eax`, `edx`, `ebx` and `ecx`, in that order). In this case, the called function is responsible for popping those arguments off the stack when it returns. So, if two `int` arguments were passed on the stack, we would need to use `ret 8` to return.
```asm
; For this function, using the default register calling convention, the first 4 arguments
; will be passed in registers eax, edx, ebx and ecx. The last two will be passed on the stack.
; void direct_blit_4(int width4,
;                    int lines,
;                    byte *dest,
;                    byte *src,
;                    int dest_y_inc,
;                    int src_y_inc);
proc direct_blit_4_ near
arg @@dest_y_inc:dword, @@src_y_inc:dword

    push ebp
    mov ebp, esp        ; don't try to be clever and move this elsewhere!
    push edi            ; likewise, don't try to group the pushes all together!
    push esi

    ; code here (that also modifies edi and esi, thus the additional pushes/pops)

    pop esi
    pop edi
    pop ebp
    ret 8
endp
```
Is this all too cumbersome to worry about? Well, I don't really think it's a big deal, but there is a way we can remove ourselves from this burden.
Let's say we didn't want to have to worry about preserving any of `eax`, `ebx`, `ecx`, `edx`, `edi` or `esi`, regardless of how many arguments our function has and what (if any) return value it uses. Also, maybe we don't want to have to worry about popping arguments off the stack ourselves when our assembly functions return.
```c
// define our "asmcall" calling convention
#pragma aux asmcall parm caller \
                    modify [eax ebx ecx edx edi esi];

#pragma aux (asmcall) add_numbers;

int add_numbers(int a, int b);   // no change to the function prototype is necessary
```
What if we actually wanted to use the normal C stack-based calling convention for our assembly functions and ignore this register argument nonsense? Maybe you're using an existing library and it was written for other compilers that don't use this register-based calling convention.
```c
#pragma aux asmstackcall parm caller [] \
                         modify [eax ebx ecx edx edi esi];
```
Watcom also pre-defines the `cdecl` symbol for this same purpose, which you can (and probably should) use instead of defining your own.
The empty brackets `[]` denote an empty register set to be used for parameter passing. That is, we are saying not to use any registers, so the stack is used for all of the arguments instead. With that in mind, we could also expand the set of registers used for parameter passing beyond the default:
```c
#pragma aux asmcallmorereg parm caller [eax edx ebx ecx edi esi] \
                           modify [eax ebx ecx edx edi esi];
```
In this case the `modify` list is redundant and need not be specified.
Of course, saying that your function will use/modify more registers means that the compiler has to work around it before and after calls to your assembly function which may result in less optimal code being generated. There's always a trade off!
None of the above `#pragma`s remove the need for the standard prologue and epilogue code that you've seen a thousand times before:
```asm
push ebp
mov ebp, esp
; ...
pop ebp
```
The only exception is if your assembly function isn't using the stack at all.
There are many details I've left out. For example, passing a `double` means two registers get used for one argument, because a `double` is 8 bytes. But if you only have one register left (maybe you passed three `int`s first), then the `double` will be passed on the stack instead. There are also more details to know when passing or returning `struct`s. But I'm not doing any of this right now, so I haven't looked into it beyond a passing glance.