I'm writing this post primarily for my own benefit. I'll likely forget many of these details in a month, then try to write a bunch more assembly and run into problems, so I'm proactively solving that problem for my future self. Everything here is covered more thoroughly in the compiler documentation, but it is scattered around a bit and, of course, has no examples specific to TASM.

One of the performance benefits Watcom brought with it, and a pretty big deal at the time, was that its default calling convention passes up to the first 4 arguments to a called function in registers. Beyond that, the stack is used, as in the standard C calling convention.

As mentioned, this calling convention is the default, but it can be changed globally via the compiler switch that selects CPU instruction code generation. For example, /3 and /3r both select 386 instructions with the register-based calling convention, while /3s selects 386 instructions with the stack-based calling convention.

Borland Turbo Assembler (TASM) does not natively support this register-based calling convention among its varied support for programming-language-specific calling conventions. However, it does let you use its "NOLANGUAGE" option (which is the default if no language is specified), and then you can handle all the details yourself.

ideal

p386  
model flat  
codeseg

locals

public add_numbers_

; int add_numbers(int a, int b)
; inputs:
;   eax = a
;   edx = b
; return:
;   eax
proc add_numbers_ near  
    push ebp
    mov ebp, esp

    add eax, edx

    pop ebp
    ret
    endp

end  

This is pretty normal-looking TASM, complete with the usual assembly prologue and epilogue code. Note that we are intentionally not specifying a language modifier.

So, first off, add_numbers_ has a trailing underscore to match what Watcom expects by default. If you don't like this for whatever reason, you can change the name here to your liking, but you will then need a #pragma in your C code to tell Watcom about the different naming convention for this function.
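As a rough sketch of what that #pragma might look like, assume the assembly routine was renamed to plain add_numbers (no trailing underscore). In a #pragma aux name pattern, "*" stands for the function name exactly as written in C. The #else branch is purely a stand-in I've added so the snippet compiles with other compilers too; it is not part of the real setup.

```c
/* Sketch: tell Watcom that add_numbers is exported by the assembly
 * module without the default trailing underscore. */
#ifdef __WATCOMC__
#pragma aux add_numbers "*";     /* "*" = the symbol name as written in C */
int add_numbers(int a, int b);   /* implemented in the .asm file */
#else
/* Hypothetical stand-in so this also builds with non-Watcom compilers. */
int add_numbers(int a, int b) { return a + b; }
#endif
```

The call site stays the same either way: result = add_numbers(10, 20);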

Second, via the magic of the register-based calling convention, Watcom will have our two number arguments all ready for us in eax and edx. Our return value is assumed to be in eax, and that is correct in our case so we're all good.

The great thing is, we don't actually need to do anything fancy to call this function from our C code.

// prototype
int add_numbers(int a, int b);

// usage
int result;  
result = add_numbers(10, 20);  

But that was the simple case.

This register-based calling convention actually places the burden on the called function to clean things up before returning. This includes preserving some register values as well. According to the documentation: "All used 80x86 registers must be saved on entry and restored on exit except those used to pass arguments and return values." So, in our add_numbers_ function if we had wanted to use ecx, we would need to push and pop it during the prologue and epilogue code. But we didn't need to do so for eax and edx because those were used to pass arguments and return a value.

As mentioned previously, the stack gets used for arguments once all the argument registers have been used up (by default, eax, edx, ebx, ecx in that order). In this case, the called function is responsible for popping those arguments off the stack when it returns. So, if two int arguments were passed on the stack (4 bytes each), we would need to return with ret 8.
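The arithmetic behind that ret value can be sketched in a few lines of C. This is only an illustration under a simplifying assumption: every argument is a 4-byte int or pointer, so each one takes exactly one register or one stack slot. The helper name ret_bytes is made up for this post.

```c
/* Sketch: bytes a callee pops with `ret N` under the default register
 * convention. Assumes every argument is a 4-byte int or pointer. */
#define REG_ARG_COUNT 4u   /* eax, edx, ebx, ecx */
#define STACK_SLOT    4u   /* bytes per stack argument */

/* ret_bytes is a hypothetical helper name, not part of any toolchain. */
unsigned ret_bytes(unsigned arg_count)
{
    if (arg_count <= REG_ARG_COUNT)
        return 0u;   /* nothing on the stack: a plain `ret` is enough */
    return (arg_count - REG_ARG_COUNT) * STACK_SLOT;
}
```

For a function taking six ints, ret_bytes(6) gives 8, which matches the ret 8 case described above.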

; For this function, using the default register calling convention, the first 4 arguments
; will be passed in registers eax, edx, ebx and ecx. The last two will be passed on the stack.

; void direct_blit_4(int width4,
;                    int lines,
;                    byte *dest,
;                    byte *src,
;                    int dest_y_inc,
;                    int src_y_inc);
proc direct_blit_4_ near  
arg @@dest_y_inc:dword, @@src_y_inc:dword  
    push ebp
    mov ebp, esp  ; don't try to be clever and move this elsewhere!
    push edi      ; likewise, don't try to group the pushes all together!
    push esi

    ; code here (that also modifies edi and esi, thus the additional pushes/pops)

    pop esi
    pop edi
    pop ebp
    ret 8
    endp

Is this all too cumbersome to worry about? Well, I don't really think it's a big deal, but there is a way to relieve ourselves of this burden.

Let's say we didn't want to have to worry about preserving any of eax, ebx, ecx, edx, edi, or esi regardless of how many arguments our function has and what (if any) return value it uses. Also, maybe we don't want to have to worry about popping arguments off the stack ourselves when our assembly functions return.

// define our "asmcall" calling convention
#pragma aux asmcall parm caller \
                    modify [eax ebx ecx edx edi esi];

#pragma aux (asmcall) add_numbers;
int add_numbers(int a, int b);       // no change to the function prototype is necessary  

What if we actually wanted to use the normal C stack-based calling convention for our assembly functions and ignore this register argument nonsense? Maybe you're using an existing library and it was written for other compilers that don't use this register-based calling convention.

#pragma aux asmstackcall parm caller [] \
                         modify [eax ebx ecx edx edi esi];

Watcom also pre-defines the cdecl symbol for this same purpose, which you can and probably should use instead of defining your own.
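A minimal sketch of leaning on that predefined convention, using the __cdecl keyword form on the prototype. The #else branch is again just a stand-in I've added so the snippet builds with other compilers. Keep in mind that the assembly side would have to change to match: with cdecl, Watcom looks for the symbol _add_numbers (leading underscore), the arguments arrive on the stack rather than in eax and edx, and the function returns with a plain ret because the caller cleans up.

```c
/* Sketch: ask for the stack-based C convention on one function only. */
#ifdef __WATCOMC__
int __cdecl add_numbers(int a, int b);   /* both arguments pushed by the caller */
#else
/* Hypothetical stand-in so this also builds with non-Watcom compilers. */
int add_numbers(int a, int b) { return a + b; }
#endif
```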

The empty brackets [] denote an empty set of registers for parameter passing. That is, we are saying not to use any registers, so the stack is used for all arguments instead. With that in mind, we could also go the other way and expand the default set of registers used for parameter passing:

#pragma aux asmcallmorereg parm caller [eax edx ebx ecx edi esi] \
                           modify [eax ebx ecx edx edi esi];

In this case the modify list is redundant and need not be specified.

Of course, saying that your function will use/modify more registers means the compiler has to work around that before and after each call to your assembly function, which may result in less optimal generated code. There's always a trade-off!

None of the above #pragmas remove the need for the standard prologue and epilogue code that you've seen a thousand times before:

push ebp  
mov ebp, esp  
; ...
pop ebp  

The only exception is if your assembly function isn't using the stack at all.

There are many details I've left out. For example, passing double values will mean two registers will get used for one argument because doubles are 8 bytes. But if you only have one register left (maybe you passed 3 ints first), then the double value will get passed on the stack instead. Additionally there are more details to know when passing/returning structs. But I'm not doing any of this right now, so I've not really looked into it beyond a passing glance.