Switching to -mshort

I had wanted to do this for a while, but PeyloW’s great post on Atariscne.org pushed me to finally do it: I decided to update the game to the -mshort compiler flag.

The reason I wanted to switch is that -mshort allows for more efficient code generation on the 68000. Since that CPU has a 16-bit data bus, needlessly passing 32-bit values around on the stack is wasteful and costs extra CPU cycles.

With -mshort, the compiler makes integers (type int) word-length, meaning 16 bits, and generates code that puts 16-bit values on the stack for them. Of course, you can still use the long type for 32-bit values when needed, but in many cases, such as loop counters or small offsets, int is sufficient.

Besides performance gains, -mshort also allows for smaller code size.

Pitfalls

While this worked out quite well for my codebase, I encountered a few pitfalls.

For example, I had to switch libcmini to the short version as well, to make the calling conventions align between the two.

Also, I noticed that controlling fast-forward in Hatari using Natfeats didn’t work properly anymore. Whatever argument I passed (0 or 1), it always enabled fast-forward.

Natfeats is a standard that is implemented by Hatari and it allows you to control certain aspects of the emulator, such as fast-forwarding, from your code. I use this to speed up the startup time of the game during testing, so I can quickly start playing. See Using Natfeats in 68000 Assembly Language for more details on Natfeats.

It turned out that Natfeats expects its arguments as long values on the stack, but with -mshort my code pushed a word instead, so Hatari didn’t find the correct value on the stack.

The reason my code now pushed a word onto the stack was that the literal being passed is an int by default, which is now word-length:

nfOps->call(fastForwardId, 1);

This was fixed by making the literal explicitly long:

nfOps->call(fastForwardId, 1L);
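
To illustrate why the suffix matters, here is a minimal sketch. It assumes the call is variadic (or otherwise has no prototype that would convert the argument for you). The names nf_call_sketch and set_fast_forward_sketch are hypothetical stand-ins, not the real Natfeats API. The receiving side reads a long, so the caller has to pass a long; a small wrapper that casts explicitly makes this hard to get wrong:

```c
#include <stdarg.h>

/* Hypothetical stand-in for a Natfeats-style call (not the real API).
   It reads one long from the variadic arguments, much like the emulator
   reads a 32-bit value from the stack. */
static long nf_call_sketch(long id, ...)
{
    va_list ap;
    long value;

    (void)id;                  /* the id is unused in this sketch */
    va_start(ap, id);
    value = va_arg(ap, long);  /* the caller must really pass a long here */
    va_end(ap);
    return value;
}

/* Hypothetical wrapper: the explicit cast widens the argument to long,
   whatever the size of int happens to be. */
static long set_fast_forward_sketch(int enabled)
{
    return nf_call_sketch(1L, (long)(enabled != 0));
}
```

With a wrapper like this, call sites can pass a plain 0 or 1 and the widening happens in one place.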

Also, I had to change

#define HEIGHT 512
#define SCANLINE_BYTES 160
long offset = HEIGHT * SCANLINE_BYTES;

to:

#define HEIGHT 512
#define SCANLINE_BYTES 160
long offset = (long)HEIGHT * SCANLINE_BYTES;

This is needed because, again, the constants are int by default and therefore words, so the expression 512 * 160 (which is 81920) overflows the signed 16-bit range, whose maximum is 32767.
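
The effect can be reproduced on a machine with 32-bit ints by simulating the 16-bit multiply. This is only a sketch: the helper names are made up for illustration, and since signed overflow is undefined behavior in C, the wrapped value is a plausible outcome under -mshort rather than a guaranteed one.

```c
#include <stdint.h>

#define HEIGHT 512
#define SCANLINE_BYTES 160

/* Simulates a word-sized multiply by keeping only the low 16 bits
   of the product. */
static uint16_t mul_as_word(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) & 0xFFFFu);
}

/* Widens the first operand, so the whole multiply is done in long. */
static long mul_as_long(int a, int b)
{
    return (long)a * b;
}
```

Here mul_as_word(HEIGHT, SCANLINE_BYTES) gives 16384 (81920 minus 65536), while mul_as_long(HEIGHT, SCANLINE_BYTES) gives the intended 81920.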

I could have also written the following:

#define HEIGHT 512L
#define SCANLINE_BYTES 160L
long offset = HEIGHT * SCANLINE_BYTES;

But I think this is less ideal, because it forces the constants to be longs everywhere, while in other places using them as words would suffice (for example, when multiplying by a value that doesn’t cause an overflow).

Finally, I found that calling the C library function itoa didn’t work for me anymore in certain cases. It turned out that I had to use ltoa, which explicitly takes a long argument. itoa had silently changed to take a word, and I was passing in a long.

Depending on your code, you might encounter similar issues when switching to -mshort, so be sure to check the types of your literals and function arguments.

I’m sure there are many more pitfalls that may arise when switching to -mshort, but these are the ones I encountered so far.

Performance gains

Of course, all of this effort wouldn’t be worth it if it didn’t result in a significant performance gain.

I’m using a special benchmarking mode in which the game runs for a short while without user interaction and everything that happens is deterministic. The benchmark measures the number of CPU cycles spent, which allows me to reliably compare the performance of different versions of the code.

While not a very scientifically sound benchmark, it gives a good sense of the performance results:

Without -mshort
Average: 102337086 cycles

With -mshort
Average: 101441200 cycles

The average gain is 0.875%.
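
For reference, the percentage is the cycle difference relative to the run without -mshort. A small sketch of that arithmetic (the function name is just for illustration):

```c
/* Relative gain in percent when going from `before` to `after` cycles. */
static double gain_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

Calling gain_percent(102337086.0, 101441200.0) yields roughly 0.875, which corresponds to 895886 fewer cycles.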

Comparing the code size is a bit harder in this case, because the switch to -mshort required some other changes that also affected the code size slightly. But I can say that the resulting executable was about 1% smaller.

Conclusion

I must admit that I hoped for a bit more, but it was relatively easy to achieve, so I’m happy with it.

Of course, the result depends a lot on the codebase. Mine already has quite a few assembly-based optimizations in place for tight loops with function calls, which would otherwise benefit from -mshort as well.