April 7, 2020

Eliminate Useless Timer Interrupts by Coalescing Timers

While learning about RTOSes, I came across the idea of Timer Coalescing which improves performance by eliminating unnecessary context switches. But there is no reason that this technique cannot be applied to bare metal firmware (without RTOS). To test out how well it works, I implemented it using a STM32F4 discovery board.

I wanted to see the improvement factor in a typical use case, so I decided on running 8 software timers on a 1ms timer peripheral. The software timers count down to 0 then reload themselves. From previous experience, typical periods were chosen for these timers of 1000, 200, 125, 50, 18, 27, 40 and 600ms. (If they had a common factor of say 5ms, then it would make more sense to use that as the peripheral timer rate, but it restricts the flexibility for future changes.)

Timer Peripheral Setup

TIM6 was chosen because we only need a Basic 16 bit timer in the STM32F4 architecture. It runs off APB1 at 42 MHz (HCLK at 168 MHz). Prescaler was set to 41, Counter Period ARR set to 1999 for 1ms and auto-reload preload set to Disable. TIM6 interrupts were enabled. Once all this is set up in the Device Configuration Tool, all that is needed to start TIM6 is a call HAL_TIM_Base_Start_IT(&htim6);

Software Timers Setup

Instead of each software timer being a separate variable with its own name, I used an array of 16 bit unsigned values. An array is necessary so that loops can be used, and wouldn't be that much of a change for any existing code since the name can be incorporated into a #define for the array index, for example
uint16_t tmrPIR_TimeOut becomes
swtmrs[TMR_PIR_TIMEOUT_IDX] with #define TMR_PIR_TIMEOUT_IDX 0

The timers are created and initialised like so

volatile uint16_t swtmrs[NUM_SW_TIMERS];
const uint16_t swtmrs_period[NUM_SW_TIMERS] = {1000, 200, 125, 50, 18, 27, 40 , 600};

static void swtmrs_init(void)
{
    uint8_t i;
    for (i=0;i<NUM_SW_TIMERS;i++)
    {
        swtmrs[i] = swtmrs_period[i];
    }
}

Simple, non-coalescing timers Interrupt Handler

Every TIM6 period (1ms), the interrupt fires and we have to decrement all the software timers and see if any have expired. This is the obvious way to do it, but it means many interrupts occur just to decrement the software timers and it is relatively rare for one to expire and flag.

void TIM6_DAC_IRQHandler(void)
{
    /* USER CODE BEGIN TIM6_DAC_IRQn 0 */
    uint8_t i;
    uint16_t tim6_reload;
    uint8_t measuring_tmr_fired = 0;
    HAL_GPIO_WritePin(GPIOD, GPIO_PIN_13, GPIO_PIN_SET);
    /* USER CODE END TIM6_DAC_IRQn 0 */
    HAL_TIM_IRQHandler(&htim6);
    /* USER CODE BEGIN TIM6_DAC_IRQn 1 */

    //naive timer handling, just decrement all of them, if they get to zero, reload. (Would also set a flag for main processing in realistic code).
    for (i=0;i<NUM_SW_TIMERS;i++)
    {
        if (swtmrs[i] > 0 && --swtmrs[i] == 0)  //timers may be stopped elsewhere by clearing them to zero, so they stay stopped.
        {
            //here set flag to say timer fired
            HAL_GPIO_WritePin(GPIOD, GPIO_PIN_15, GPIO_PIN_SET);
            swtmrs[i] = swtmrs_period[i];       //reload
            HAL_GPIO_WritePin(GPIOD, GPIO_PIN_15, GPIO_PIN_RESET);
        }
    }
    
    tim6_reload = TIM6_ARR_1MS - 1;
    TIM6->ARR = tim6_reload;    //reload ARR manually because it is allowed to change between interrupts
    HAL_GPIO_WritePin(GPIOD, GPIO_PIN_13, GPIO_PIN_RESET);
    /* USER CODE END TIM6_DAC_IRQn 1 */
}

Although this code says that the timer is allowed to change between interrupts, in this simple version it does not.

Coalescing Timers Interrupt Handler

The idea is to avoid having interrupts just to decrement the software timers, instead set the timer so that it comes back when there is something to do. Ideally this would mean only running this IRQ when there is a decrement of a software timer, but since there are 16 bit limits on the variables, longer timers may require a few interrupts before a decrement but this is still more efficient than the simple method. If you wanted to always achieve the best efficiency, you would also adjust the preload register.

void TIM6_DAC_IRQHandler(void)
{
    /* USER CODE BEGIN TIM6_DAC_IRQn 0 */
    uint8_t i;
    uint16_t tim6_reload;
    uint8_t measuring_tmr_fired = 0;
    HAL_GPIO_WritePin(GPIOD, GPIO_PIN_13, GPIO_PIN_SET);
    /* USER CODE END TIM6_DAC_IRQn 0 */
    HAL_TIM_IRQHandler(&htim6);
    /* USER CODE BEGIN TIM6_DAC_IRQn 1 */
  
    //timer coalescing is more complex but more time efficient.
    //find when next timer is due and don't interrupt until it is (min value of swtmrs)
    {
        static uint16_t interrupt_ticks;    //needs to be static so that it keeps value from last iteration

        for (i=0;i<NUM_SW_TIMERS;i++)
        {
            if (swtmrs[i] > 0) //timers may be stopped elsewhere by clearing them to zero, so they stay stopped.
            {
                if (swtmrs[i] >= interrupt_ticks)
                {
                    if (swtmrs[i]-interrupt_ticks == 0)
                    {
                        //here set flag to say timer fired
                        HAL_GPIO_WritePin(GPIOD, GPIO_PIN_15, GPIO_PIN_SET);
                        swtmrs[i] = swtmrs_period[i];       //reload
                        HAL_GPIO_WritePin(GPIOD, GPIO_PIN_15, GPIO_PIN_RESET);
                    }
                    else    //not hit zero, step down by same amount
                    {
                        swtmrs[i] -= interrupt_ticks;
                    }
                }
            }
        }

        //calc when to come back again
        interrupt_ticks = min_value_nz_array_u16(swtmrs, NUM_SW_TIMERS);

        //ensure ARR (16 bit) doesn't overflow, may require multiple interrupts in this case for longer software timers. Could be improved by also adjusting prescaler.
        if (interrupt_ticks > (0xFFFF/TIM6_ARR_1MS))
            interrupt_ticks = (0xFFFF/TIM6_ARR_1MS);

        tim6_reload = TIM6_ARR_1MS * interrupt_ticks - 1;
    }

    TIM6->ARR = tim6_reload;    //reload ARR manually because it is allowed to change between interrupts
    HAL_GPIO_WritePin(GPIOD, GPIO_PIN_13, GPIO_PIN_RESET);
    /* USER CODE END TIM6_DAC_IRQn 1 */
}

Calculating when next interrupt is due

We just need to find the minimum value in the array of software timers

uint16_t min_value_nz_array_u16(volatile uint16_t *arr, uint16_t num_entries)
{
    uint16_t i, minV;

    minV = 0xFFFF;

    for (i=0;i<num_entries;i++)
    {
        if (arr[i] < minV)
            minV = arr[i];
    }
    return minV;
}

Improvement made by using Coalescing Timers

Although the coalescing timers IRQ is more complex and takes twice as long to process, it does so less often (88% less often) which means that the average IRQ handling time is reduced by 75%.

The difference in the amount of interrupts is visually striking.

The simple method runs at 1000 interrupts per second, with each averaging 4.0us, so over one second the time spent in interrupts is 4000us.

The coalescing method runs at 116 interrupts per sec, with each averaging 8.32us, so over one second the time spent in interrupts is 965us. Thus 3035us is saved, which is 75%.

In practical applications, the IRQ handler is likely to have a bit more code in it, so the difference in IRQ handler execution times between these two cases will be minimal; the additional time spent in the interrupt due to using coalescing timers will be very small. But the reduction in the number of interrupts will remain, so the improvement factor should be maintained.