RTOS debugging: When the CPU has too much on its hands

Written by Dr. Johan Kraft, CEO and founder of Percepio

In embedded systems using multitasking, you may find yourself in a situation where some of your tasks run slowly or not at all. This is called task starvation (the affected tasks are starved of CPU time) and it can happen for a number of reasons; today we will look at one particular reason – a fixed priority schedule where the task priorities happen to be wrong, or at least unsuitable.

In a scenario with fixed priorities, the developer need to assign a runtime priority for each task when it is created. Obviously, high-priority tasks should always execute before tasks with lower priority but when the low-priority tasks receive too little CPU time to do their jobs, you may have a starvation problem. This can manifest itself in many ways, for example the device may work perfectly most of the time but sometimes it becomes unresponsive.

A naïve fix would be to just raise priorities of the affected tasks, but this kind of thinking would ultimately render prioritization useless, since every task would become “highest priority”. Instead, be very careful about what tasks receive high scheduling priority: reserve it for tasks that are predictable, execute in a recurring pattern like “every x milliseconds”, and ideally also have a short execution time compared to the recurring interval.

It follows that handlers for external events, like the user pressing a button or network traffic, should execute with relatively low priority since these are almost by definition irregular events and the handlers can be invoked at any time. And to no-one’s great surprise, tasks with a long execution time should also execute at low priority.

“But wait,” you may be thinking by now, “I have time-critical interrupt service routines that require a lot of CPU time to move all their data.” The best solution in that case is to use hardware features like DMA to offload the processor core, but otherwise it is often a good idea to split such processing into several tasks and use different priorities at different stages. Isolate the really time-critical stuff in a small, high-priority task and have that task alert a medium-priority task to do the bulk of the processing. You may also have a third stage, implemented in a task that runs at low priority.

It is very important to put tasks to sleep when they have no more work to do. You should avoid polling and other forms of busy wait; after all, that amounts to little more than a waste of processor time. After completion, your task should delay itself or wait for an RTOS event (e.g., a semaphore signal), as this will allow the RTOS to schedule tasks with lower priority while the task is waiting. The sleeping task is revived by the RTOS when suitable.

If you have a really critical system, you may want to look into something called Rate-monotonic scheduling (RMS) (LINK: https://en.wikipedia.org/wiki/Rate-monotonic_scheduling). That is a formal technique for calculating response times resulting from a set of tasks, given their respective execution times. During the analysis, tasks are assigned priorities based on their execution time, so that the quickest task runs with the highest priority.

RMS can give you a formal guarantee that task starvation will not happen or, seen from the other angle, tell you how much processing capacity your system needs to avoid starvation. The flip side is that you need to have really good information about your tasks – execution times, periodicity et cetera – to carry out your analysis.

This is part of a series from Zephyr Project member Percepio. You can find the first blog here.

If you have any questions or comments, please feel free to reach out to the Zephyr community on the Zephyr Discord Channel.