As mentioned previously I’ve been doing a bit of coding for microcontrollers lately. Coming from the world of desktop and web programming, it’s downright revelatory. With no other code running, and no operating system, I can use every cycle on a 16MHz chip, which suddenly seems blazing fast. You might have to worry about hardware interrupts- in fact I had to swap serial connection libraries out because the one we were using misused interrupts and threw of the timing of my process.

And boy, timing is amazing when you’re the only thing running on the CPU. I was controlling some LEDs and if I just went in a smooth ramp from one brightness level to the other, the output would be ugly steps instead of a smooth fade. I had to use a technique called temporal dithering, which is a fancy way of saying “flicker really quickly” and in this case depended on accurate, sub-microsecond timing. This is all new to me.

Speaking of sub-microsecond timing, or "subus", let's check out Jindra S’s submission. This code also runs on a microcontroller, and for… “performance” or “clock accuracy” is assembly inlined into C.

/*********************** FUNCTION v_Angie_WaitSubus *******************************//**
@brief Busy waits for a defined number of cycles.
The number of needed sys clk cycles depends on the number of flash wait states,
but due to the caching, the flash wait states are not relevant for STM32F4.
4 cycles per u32_Cnt
*******************************************************************************/
__asm void  v_Angie_WaitSubus( uint32_t u32_Cnt )
{
loop
    subs r0, #1
    cbz  r0, loop_exit
    b loop
loop_exit
    bx lr
}

Now, this assembly isn’t the most readable thing, but the equivalent C code is pretty easy to follow: while(--u32_Cnt); In other words, this is your typical busy-loop. Since this code is the only code running on the chip, no problem right? Well, check out this one:

/*********************** FUNCTION v_Angie_IRQWaitSubus *******************************//**
@brief Busy waits for a defined number of cycles.
The number of needed sys clk cycles depends on the number of flash wait states,
but due to the caching, the flash wait states are not relevant for STM32F4.
4 cycles per u32_Cnt
*******************************************************************************/
__asm void  v_Angie_IRQWaitSubus( uint32_t u32_Cnt )
{
IRQloop
    subs r0, #1
    cbz  r0, IRQloop_exit
    b IRQloop
IRQloop_exit
    bx lr
}

What do you know, it’s the same exact code, but called IRQWaitSubus, implying it’s meant to be called inside of an interrupt handler. The details can get fiendishly complicated, but for those who aren’t looking at low-level code on the regular, interrupts are the low-level cousin of event handlers. It allows a piece of hardware (or software, in multiprocessing systems) to notify the CPU that something interesting has happened, and the CPU can then execute some of your code to react to it. Like any other event handler, interrupt handlers should be fast, so they can update the program state and then allow normal execution to continue.

What you emphatically do not do is wait inside of an interrupt handler. That’s bad. Not a full-on WTF, but… bad.

There’s at least three more variations of this function, with slightly different names, scattered across different modules, all of which represent a simple busy loop.

Ugly, sure, but where’s the WTF? Well, among other things, this board needed to output precisely timed signals, like say, a 500Hz square wave with a 20% duty cycle. The on-board CPU clock was a simple oscillator which would drift- over time, with changes in temperature, etc. Also, interrupts could claim CPU cycles, throwing off the waits. So Jindra’s company had placed this code onto some STM32F4 ARM microcontrollers, shipped it into the field, and discovered that outside of their climate controlled offices, stuff started to fail.

The code fix was simple- the STM32-series of processors had a hardware timer which could provide precise timing. Switching to that approach not only made the system more accurate- it also meant that Jindra could throw away hundreds of lines of code which was complicated, buggy, and littered with inline assembly for no particular reason. There was just one problem: the devices with the bad software were already in the field. Angry customers were already upset over how unreliable the system was. And short of going on site to reflash the microcontrollers or shipping fresh replacements, the company was left with only one recourse:

They announced Rev 2 of their product, which offered higher rates of reliability and better performance, and only cost 2% more!

[Advertisement] Forget logs. Next time you're struggling to replicate error, crash and performance issues in your apps - Think Raygun! Installs in minutes. Learn more.