When 32 bits isn’t enough — Porting Zephyr to RISCV64

Written by Nicolas Pitre, Senior Software Engineer at BayLibre

This blog post originally ran on the BayLibre website last month. For more details about BayLibre, visit https://baylibre.com/.

Conventional wisdom says you should normally apply small microcontrollers to dedicated applications with constrained resources. 8-bit microcontrollers with a few kilobytes of memory are still plentiful today. 32-bit microcontrollers with a couple of dozen kilobytes of memory are also very popular. In the latter case, it is typical to rely on a small RTOS to provide basic software interfaces and services.

The Zephyr Project provides such an RTOS. Many ARM-based microcontrollers are supported, along with other architectures including ARC, XTENSA, RISC-V (32-bit) and X86 (32-bit).

Yet some people are designing products whose computing needs are simple enough to be fulfilled by a small RTOS like Zephyr, but whose memory addressing needs are measured not in kilobytes or megabytes but in gigabytes! Still, it came as quite a surprise when BayLibre was asked to port Zephyr to the 64-bit RISC-V architecture.

Where to start

The 64-bit port required a lot of cleanups. Initially, the actual RISCV64 support was far from our main concern. Zephyr supports a virtual “board” configuration that fakes basic hardware on one side and interfaces with a POSIX environment on the other, which allows a Zephyr application to be compiled into a standard Linux process. This has enormous benefits, such as the ability to use native Linux development tools. For example, it allows you to use gdb to look at core dumps without fiddling with a remote debugging setup or emulators such as QEMU.

Until this point, this “POSIX” architecture only created 32-bit executables. We started by only testing the generic Zephyr code in 64-bit mode. It was only a matter of flipping some compiler arguments to attempt a 64-bit build. But unsurprisingly, it failed.

The 32-bit legacy

Since its inception, the Zephyr RTOS targeted 32-bit architectures. The assumption that everything can be represented by an int32_t variable was everywhere. Code patterns like the following were ubiquitous:

static inline void mbox_async_free(struct k_mbox_async *async)
{
        k_stack_push(&async_msg_free, (u32_t)async);
}

Here the async pointer gets truncated on a 64-bit build. Fortunately, the compiler does flag those occurrences:

In function ‘mbox_async_free’:
warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
k_stack_push(&async_msg_free, (u32_t)async);
^

Therefore the actual work started with a simple task: converting all u32_t variables and parameters that may carry pointers into uintptr_t. After several days of work, the Hello_world demo application could finally be built successfully. Yay!
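To make the conversion concrete, here is a minimal sketch of the pattern in standard C. The type and helper names (stack_data_t, ptr_to_slot and friends) are made up for illustration and are not Zephyr's API; the point is only that a uintptr_t slot can hold any object pointer, while a 32-bit slot cannot on a 64-bit build.

```c
#include <stdint.h>

/* Hypothetical word-sized slot type; not Zephyr's actual API. */
typedef uintptr_t stack_data_t;

/* Store a pointer in an integer slot without losing any bits. */
static inline stack_data_t ptr_to_slot(void *p)
{
        return (stack_data_t)p;
}

/* Recover the original pointer from the slot. */
static inline void *slot_to_ptr(stack_data_t v)
{
        return (void *)v;
}

/*
 * What the old pattern effectively did: keep only the low 32 bits.
 * The intermediate uintptr_t cast merely silences the compiler warning;
 * the upper half of a 64-bit pointer is still lost.
 */
static inline uint32_t ptr_to_u32_truncated(void *p)
{
        return (uint32_t)(uintptr_t)p;
}
```

With the uintptr_t slot, the round trip through ptr_to_slot() and slot_to_ptr() gives back the same pointer on both 32-bit and 64-bit builds.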

But attempting to execute it resulted in a segmentation fault. The investigation phase began.

Chasing bugs

While the compiler could identify bad u32_t usage when a cast to or from a pointer was involved, some other cases could be found only by manual code inspection. Still, Zephyr is a significant body of code to review, and catching all issues, especially the subtle ones, couldn’t happen without some code execution tracing in gdb.

One particularly complicated issue involved linked list pointers that ended up referring to non-existent list nodes for no obvious reason, and the bug only occurred after another item was removed from the list. This issue was only noticeable with a subsequent list search that followed the rogue pointer into la-la land. And it didn’t trigger every time.

The header file for list operations starts with this:

#ifdef __LP64__
typedef u64_t unative_t;
#else
typedef u32_t unative_t;
#endif

So one would quickly presume that the code is already 64-bit ready. From a quick glance, it does use unative_t everywhere. What is easily missed is this:

#define SYS_SFLIST_FLAGS_MASK 0x3U

static inline sys_sfnode_t *z_sfnode_next_peek(sys_sfnode_t *node)
{
        return (sys_sfnode_t *)(node->next_and_flags & ~SYS_SFLIST_FLAGS_MASK);
}

Here we return the next pointer after masking out the bottom 2 flag bits. But 0x3U is interpreted by the compiler as an unsigned int and therefore a 32-bit value, meaning that ~0x3U is equal to 0xFFFFFFFC. Because node->next_and_flags is a u64_t, our (unsigned) 0xFFFFFFFC is zero-extended to 0x00000000FFFFFFFC before the AND, effectively truncating the returned pointer to its bottom 32 bits. So everything worked when the next node in the list was allocated in heap memory, which is typically below the 4GB mark, but not for nodes allocated on the stack, which is typically located towards the top of the address space on Linux.

The fix? Turning 0x3U into 0x3UL. The addition of that single character required many hours of debugging, and this is only one example. Other equally difficult bugs were also found.
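The effect is easy to reproduce outside of Zephyr. The sketch below uses hypothetical helper names and plain uint64_t values instead of real list nodes. The fixed variant widens the mask with a cast before inverting it, which has the same effect as the UL suffix on an LP64 target while staying correct where long is only 32 bits wide.

```c
#include <stdint.h>

#define FLAGS_MASK 0x3U /* unsigned int, so ~FLAGS_MASK is 0xFFFFFFFC */

/* Buggy: the 32-bit inverted mask is zero-extended and clears bits 32..63. */
static inline uint64_t strip_flags_buggy(uint64_t next_and_flags)
{
        return next_and_flags & ~FLAGS_MASK;
}

/* Fixed: widen the mask first, then invert, so only bits 0..1 are cleared. */
static inline uint64_t strip_flags_fixed(uint64_t next_and_flags)
{
        return next_and_flags & ~(uint64_t)FLAGS_MASK;
}
```

For a stack-like value such as 0x00007FFFDEADBEEF, the buggy version returns 0x00000000DEADBEEC while the fixed one returns 0x00007FFFDEADBEEC. A value below the 4GB mark comes out identical from both, which is why the bug stayed hidden for heap-allocated nodes.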

The unsuspecting C library

One major change with most 64-bit targets is the width of pointers, but another issue is the change in width of long integer variables. This means that the printf() family of functions has to behave differently when the “l” conversion modifier is provided, as in “%ld”. On a 32-bit only target, all the printf() modifiers can be ignored as they all refer to a 32-bit integer (except for “%lld” but that isn’t supported by Zephyr). For 64-bit, this shortcut can no longer be used.
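As an illustration of why the modifier can no longer be skipped, here is a deliberately tiny formatter that understands only %d and %ld. It is a sketch written for this post's argument, not Zephyr's printf code, and mini_snprintf() and put_long() are hypothetical names. The important detail is that the argument must be fetched with va_arg() using the type the modifier announces.

```c
#include <stdarg.h>
#include <stddef.h>

/* Append the decimal form of v to buf, leaving room for the final NUL. */
static size_t put_long(char *buf, size_t pos, size_t size, long v)
{
        char tmp[24]; /* enough for a 64-bit value plus sign */
        size_t n = 0;
        /* Negate in unsigned arithmetic so the most negative value is safe. */
        unsigned long u = (v < 0) ? 0UL - (unsigned long)v : (unsigned long)v;

        do {
                tmp[n++] = (char)('0' + u % 10);
                u /= 10;
        } while (u != 0);
        if (v < 0) {
                tmp[n++] = '-';
        }
        /* Digits were produced in reverse order; copy them back to front. */
        while (n > 0 && pos + 1 < size) {
                buf[pos++] = tmp[--n];
        }
        return pos;
}

/* Hypothetical mini formatter: handles "%d" and "%ld" only. */
static void mini_snprintf(char *buf, size_t size, const char *fmt, ...)
{
        va_list ap;
        size_t pos = 0;

        if (size == 0) {
                return;
        }
        va_start(ap, fmt);
        while (*fmt != '\0') {
                if (fmt[0] == '%' && fmt[1] == 'd') {
                        /* No modifier: the caller passed an int. */
                        pos = put_long(buf, pos, size, va_arg(ap, int));
                        fmt += 2;
                } else if (fmt[0] == '%' && fmt[1] == 'l' && fmt[2] == 'd') {
                        /* "l" modifier: the caller passed a long. */
                        pos = put_long(buf, pos, size, va_arg(ap, long));
                        fmt += 3;
                } else {
                        if (pos + 1 < size) {
                                buf[pos++] = *fmt;
                        }
                        fmt++;
                }
        }
        va_end(ap);
        buf[pos] = '\0';
}
```

On a 32-bit target both branches fetch a value of the same size, so ignoring the modifier happens to work. On an LP64 target, fetching a long argument as an int would drop its upper 32 bits.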

Alignment considerations are different too. For example, memory allocators must return pointers that are naturally aligned to 64-bit boundaries on 64-bit targets, which has implications for the actual allocator design. The memcpy() implementation can exploit larger memory words to optimize data transfer, but a larger alignment is necessary. Structure unions may need adjustments to remain space efficient in the presence of wider pointers and longs.
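A common way to keep an allocator's granularity tied to the native word size is a pair of small helpers like the ones below. This is a generic sketch with assumed names (ALLOC_ALIGN, align_up, is_aligned), not code taken from Zephyr's allocators. Note that the mask is computed entirely in size_t, which avoids the kind of width mismatch described earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed allocator granule: one native pointer (4 bytes on ILP32, 8 on LP64). */
#define ALLOC_ALIGN sizeof(void *)

/* Round n up to the next multiple of align; align must be a power of two. */
static inline size_t align_up(size_t n, size_t align)
{
        assert(align != 0 && (align & (align - 1)) == 0);
        return (n + align - 1) & ~(align - 1);
}

/* Check whether p sits on an align-byte boundary (align a power of two). */
static inline int is_aligned(const void *p, size_t align)
{
        return ((uintptr_t)p & (align - 1)) == 0;
}
```

Rounding every request with align_up(size, ALLOC_ALIGN) means a 13-byte request consumes 16 bytes on both widths, while a 4-byte request grows from 4 to 8 bytes when moving to a 64-bit build.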

Test, test and test

One great thing about Zephyr is its extensive test suite. Once all the above was dealt with, it was time to find out if the test suite was happy. And of course it wasn’t. In fact, the majority of the tests failed. At least the Hello_world demo application worked at that point.

Writing good tests is difficult. The goal is to exercise code paths that ought to work, but it is even better when tests try to simulate normal failure conditions to make sure the core code returns with proper error codes. That often requires some programming trickery (read: type casting) within test code that is less portable than regular application code. This means that many tests had to be fixed to be compatible with a 64-bit build. And when core code bugs only affecting 64-bit builds were found, fixing them typically improved results in large portions of the tests all at once.

OK, but where does RV64 fit in this story?

We wrote the RV64 support at the very end of this project. In fact, it represented less than 10% of the whole development effort. Once Zephyr reached 64-bit maturity, it was quite easy to abstract register save/restore and pointer accesses in the assembly code to support RV64 within the existing RV32 code thanks to RISC-V’s highly symmetric architecture. Testing was also easy with QEMU since it can be instructed to use either an RV32 or an RV64 core with the same machine model.
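The kind of abstraction involved can be sketched with a few preprocessor definitions shared between C and preprocessed assembly. The macro names below are hypothetical and chosen for illustration; the actual names in the port may differ. The sketch keys off the pointer width so it also compiles on a development host.

```c
#include <stdint.h>

/*
 * Hypothetical macro names for illustration; the real port may differ.
 * Pick the load/store mnemonics and register size from the pointer width.
 */
#if UINTPTR_MAX > 0xFFFFFFFFu
#define REG_LOAD   ld   /* RV64: load doubleword */
#define REG_STORE  sd   /* RV64: store doubleword */
#define REG_SIZE   8    /* bytes per saved register */
#define REG_SHIFT  3    /* log2(REG_SIZE), handy for indexing */
#else
#define REG_LOAD   lw   /* RV32: load word */
#define REG_STORE  sw   /* RV32: store word */
#define REG_SIZE   4
#define REG_SHIFT  2
#endif

/*
 * In a preprocessed assembly file (.S), a context save could then be
 * written once for both widths, for example:
 *
 *     REG_STORE ra, RA_OFFSET(sp)
 *     REG_STORE t0, T0_OFFSET(sp)
 *
 * where RA_OFFSET and T0_OFFSET are assumed frame offsets that are
 * multiples of REG_SIZE.
 */
```

Because RV32 and RV64 share the same register names and instruction formats, swapping the mnemonic and the slot size is enough for code of this shape to serve both widths.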

Taking full advantage of 64-bit RISC-V cores on Zephyr may require additional work depending on the actual target where it would be deployed. For example, Zephyr doesn’t support hardware floating point context switching or SMP with either 32-bit or 64-bit RISC-V flavors yet.

But the groundwork is now done and merged into the mainline Zephyr project repository. Our RV64 port makes Zephyr RTOS 2.0.0 a milestone release — it’s the first Zephyr version to support both 32-bit and 64-bit architectures.