Co-simulating HDL models in Renode with Verilator running on Zephyr RTOS

This blog originally ran on the Antmicro website. For more Zephyr development tips and articles, please visit their blog.

Antmicro’s open source simulation framework, Renode, was built to enable simulating real-life scenarios – which have a tendency to be complex and require hybrid approaches.

That’s why, among other things, the Renode 1.7.1 release introduced an integration layer for Verilator, a well-known, fast and open source HDL simulator, which lets you use hardware implementations written in Verilog within a Renode simulation.

When you are working on ASIC or FPGA IP written in an HDL that forms part of a bigger system with unknowns in both hardware and software, many things can go wrong on multiple levels. That’s why it is ultimately best to test it within the scope of the full system, with drivers and test software, in a real-world use case. Simulating complete platforms with CPUs and all peripherals using actual HDL simulation, however, can be too slow for effective software development (and sometimes downright impossible, e.g. when the entire SoC’s HDL is not available). Renode models give you better speed and more flexibility to experiment with your architectural choices than HDL (as in the security IP development example of our partner Dover Microsystems), but there may still be scenarios where you want to directly use complex peripherals you already have in HDL form before going on to model them in Renode. For these use cases Antmicro has enabled co-simulating HDL in Renode using Verilator. Co-simulating means you are only ‘verilating’ one part of the system, so you can expect a much faster development experience than with an HDL simulation of the whole system.

In the 1.7.1 release of Renode you will find a demo which includes a ‘verilated’ UARTLite model connected via the AXI4-Lite bus to a RISC-V platform running Zephyr.

Integration layer overview

The integration layer was implemented as a plugin for Renode and consists of two parts: C# classes which manage the Verilator simulation process, and an integration library written in C++ that allows you to turn your Verilog hardware models into a Renode ‘verilated’ peripheral.

The ‘verilated’ peripheral is compiled separately and the resulting binary is started by Renode. The interprocess communication is based on sockets.

How to make your own ‘verilated’ peripheral

An example ‘verilated’ UARTLite model is available on Antmicro’s GitHub.

To make your own ‘verilated’ peripheral, in the main cpp file of your verilated model you need to include C++ headers applicable to the bus you are connecting to and the type of external interfaces you want to integrate with Renode – e.g. UART’s rx/tx signals. These headers can be found in the integration library.

// uart.h and axilite.h can be found in Renode's VerilatorPlugin
#include "src/peripherals/uart.h"
#include "src/buses/axilite.h"

Next, you will need to define a function that will call your model’s eval function, and provide it as a callback to the integration library struct, along with bus and peripheral signals.

void eval() {
#if VM_TRACE
  main_time++;
  tfp->dump(main_time);
#endif
  top->eval();
}

void Init() {
  AxiLite* bus = new AxiLite();

  //==========================================
  // Init bus signals
  //==========================================
  bus->clk = &top->clk;
  bus->rst = &top->rst;
  bus->awaddr = (unsigned long *)&top->awaddr;
  bus->awvalid = &top->awvalid;
  bus->awready = &top->awready;
  bus->wdata = (unsigned long *)&top->wdata;
  bus->wstrb = &top->wstrb;
  bus->wvalid = &top->wvalid;
  bus->wready = &top->wready;
  bus->bresp = &top->bresp;
  bus->bvalid = &top->bvalid;
  bus->bready = &top->bready;
  bus->araddr = (unsigned long *)&top->araddr;
  bus->arvalid = &top->arvalid;
  bus->arready = &top->arready;
  bus->rdata = (unsigned long *)&top->rdata;
  bus->rresp = &top->rresp;
  bus->rvalid = &top->rvalid;
  bus->rready = &top->rready;

  //==========================================
  // Init eval function
  //==========================================
  bus->evaluateModel = &eval;

  //==========================================
  // Init peripheral
  //==========================================
  uart = new UART(bus, &top->txd, &top->rxd, prescaler);
}

As the last step, in the main function, you have to call simulate, providing it with the port numbers, which are passed as the first two command-line arguments of the resulting binary.

Init();
uart->simulate(atoi(argv[1]), atoi(argv[2]));
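
Note that atoi() reports no errors, so a malformed argument silently becomes port 0. A minimal sketch of stricter parsing, written so that it compiles as both C and C++, could look like the following. The helper name parse_port is hypothetical and is not part of Renode's integration library.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the integration library):
 * parse a decimal TCP port in the range 1..65535.
 * Returns 0 and stores the port in *out on success, -1 otherwise. */
int parse_port(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return -1;              /* missing or empty argument */
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;              /* overflow or trailing garbage */
    }
    if (value < 1 || value > 65535) {
        return -1;              /* outside the valid port range */
    }
    *out = (int)value;
    return 0;
}
```

With such a helper, main would first check that at least two arguments were given, then call parse_port on argv[1] and argv[2] and pass the results to simulate only if both calls succeed.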

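Now you can compile your project with Verilator. The commands below use Makefile variable syntax, with INTEGRATION_DIR pointing to the root of the integration library; in a plain shell, write ${INTEGRATION_DIR} instead of $(INTEGRATION_DIR):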

verilator -cc top.v --exe -CFLAGS "-Wpedantic -Wall -I$(INTEGRATION_DIR)" sim_main.cpp $(INTEGRATION_DIR)/src/renode.cpp $(INTEGRATION_DIR)/src/buses/axilite.cpp $(INTEGRATION_DIR)/src/peripherals/uart.cpp

make -j 4 -C obj_dir -f Vtop.mk

The resulting simulation can be attached to the Renode platform and used in a .repl file as a ‘verilated’ peripheral.

uart: Verilated.VerilatedUART @ sysbus <0x70000000, +0x100>
  simulationFilePath: @verilated_simulation_file_path
  frequency: 100000000

When you load such a platform in Renode and run a sample application, its output appears in the UART window. Keep in mind that the data displayed there is printed by the verilated peripheral.

You can also enable signal trace dumping by setting the VERILATOR_TRACE=1 variable in your shell. The resulting trace is written to a VCD file and can be viewed in a waveform viewer such as GTKWave.

Renode’s powerful co-simulation capabilities

Whether you are working on a new hardware block or you want to reuse the HDL code you have, Renode’s co-simulation capabilities allow you to test your IP in a broader context than just usual hardware simulation, connecting it to entire RISC-V, ARM or other SoCs even without writing any model.

You can use Renode’s powerful tracing and logging mechanisms to observe your peripheral’s behavior when used by an operating system of your choice, in an environment of your choice – be it a full-blown Linux-capable multi-core system or a small RTOS-ready SoC, or even a mix of those options.

Want to debug your driver via GDB but your target FPGA does not have a debugger connector? Or maybe it is just too small to contain the whole SoC you’d like to run? Perhaps you’d like to run a Python script to create a nice graph on each peripheral access? Renode has got you covered with all these features available out of the box.

If this sounds interesting, you can start using Renode’s co-simulation capabilities today or let us know about your use case directly so that we can potentially help you improve your simulation-driven workflow – all you need to do is get back to us at contact@renode.io.

If you’re new to Zephyr RTOS, please see our Getting Started Guide and check out our Contributor Guide. Or, you can join the conversation and ask questions on our Slack channel or Mailing List and follow #zephyrproject on IRC.

When 32 bits isn’t enough — Porting Zephyr to RISCV64

Written by Nicolas Pitre, Senior Software Engineer at BayLibre

SiFive HiFive Unleashed Board

This blog post originally ran on the BayLibre website last month. For more details about BayLibre, visit https://baylibre.com/.

Conventional wisdom says you should normally apply small microcontrollers to dedicated applications with constrained resources. 8-bit microcontrollers with a few kilobytes of memory are still plentiful today. 32-bit microcontrollers with a couple of dozen kilobytes of memory are also very popular. In the latter case, it is typical to rely on a small RTOS to provide basic software interfaces and services.

The Zephyr Project provides such an RTOS. Many ARM-based microcontrollers are supported, along with other architectures including ARC, Xtensa, RISC-V (32-bit) and x86 (32-bit).

Yet some people are designing products with computing needs that are simple enough to be fulfilled by a small RTOS like Zephyr, but with memory addressing needs that cannot be described by kilobytes or megabytes, but that actually require gigabytes! So it was quite a surprise when BayLibre was asked to port Zephyr to the 64-bit RISC-V architecture.

Where to start

The 64-bit port required a lot of cleanups. Initially, the actual RISCV64 support was the least of our concerns. Zephyr supports a virtual “board” configuration that fakes basic hardware on one side and interfaces with a POSIX environment on the other, which allows a Zephyr application to be compiled into a standard Linux process. This has enormous benefits, such as the ability to use native Linux development tools. For example, it allows you to use gdb to look at core dumps without fiddling with a remote debugging setup or emulators such as QEMU.

Until this point, this “POSIX” architecture only created 32-bit executables. We started by only testing the generic Zephyr code in 64-bit mode. It was only a matter of flipping some compiler arguments to attempt a 64-bit build. But unsurprisingly, it failed.

The 32-bit legacy

Since its inception, the Zephyr RTOS targeted 32-bit architectures. The assumption that everything can be represented by an int32_t variable was everywhere. Code patterns like the following were ubiquitous:

static inline void mbox_async_free(struct k_mbox_async *async)
{
        k_stack_push(&async_msg_free, (u32_t)async);
}

Here the async pointer gets truncated on a 64-bit build. Fortunately, the compiler does flag those occurrences:

In function ‘mbox_async_free’:
warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
k_stack_push(&async_msg_free, (u32_t)async);
^

Therefore the actual work started with a simple task: converting all u32_t variables and parameters that may carry pointers into uintptr_t. After several days of work, the Hello_world demo application could finally be built successfully. Yay!
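
The shape of that conversion can be sketched in isolation. The helper names below are hypothetical and are not Zephyr APIs; the sketch only illustrates that uintptr_t is wide enough to hold an object pointer, whereas a 32-bit integer is not on a 64-bit target.

```c
#include <stdint.h>

/* Hypothetical helpers, not Zephyr APIs. */

/* Store a pointer in an integer word wide enough to hold it. */
static inline uintptr_t ptr_to_word(void *ptr)
{
    return (uintptr_t)ptr;
}

/* Recover the original pointer from the word. */
static inline void *word_to_ptr(uintptr_t word)
{
    return (void *)word;
}

/* What a u32_t cast does in effect: keep only the low 32 bits.
 * On a 64-bit target this discards the upper half of the address. */
static inline uint32_t ptr_to_u32_lossy(void *ptr)
{
    return (uint32_t)(uintptr_t)ptr;
}
```

A pointer passed through ptr_to_word and word_to_ptr comes back unchanged on both 32-bit and 64-bit builds, while ptr_to_u32_lossy is only safe when every address fits in 32 bits.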

But attempting to execute it resulted in a segmentation fault. The investigation phase began.

Chasing bugs

While the compiler could identify bad u32_t usage when a cast to or from a pointer was involved, some other cases could be found only by manual code inspection. Still, Zephyr is a significant body of code to review and catching all issues, especially the subtle ones, couldn’t happen without some code execution tracing in gdb.

A much more complicated issue involved linked list pointers that ended up referring to non-existent list nodes for no obvious reason, and the bug only occurred after another item was removed from the list. This issue was only noticeable with a subsequent list search that followed the rogue pointer into Lalaland. And it didn’t trigger every time.

The header file for list operations starts with this:

#ifdef __LP64__
typedef u64_t unative_t;
#else
typedef u32_t unative_t;
#endif

So one would quickly presume that the code is already 64-bit ready. From a quick glance, it does use unative_t everywhere. What is easily missed is this:

#define SYS_SFLIST_FLAGS_MASK 0x3U

static inline sys_sfnode_t *z_sfnode_next_peek(sys_sfnode_t *node)
{
        return (sys_sfnode_t *)(node->next_and_flags & ~SYS_SFLIST_FLAGS_MASK);
}

Here we return the next pointer after masking out the bottom 2 flag bits. But 0x3U is interpreted by the compiler as an unsigned int and therefore a 32-bit value, meaning that ~0x3U is equal to 0xFFFFFFFC. Because node->next_and_flags is a u64_t, our (unsigned) 0xFFFFFFFC is promoted to 0x00000000FFFFFFFC, effectively truncating the returned pointer to its 32 bottom bits. So everything worked when the next node in the list was allocated in heap memory, which is typically below the 4 GB mark, but not for nodes allocated on the stack, which is typically located towards the top of the address space on Linux.

The fix? Turning 0x3U into 0x3UL. The addition of that single character required many hours of debugging, and this is only one example. Other equally difficult bugs were also found.
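
The effect is easy to reproduce outside Zephyr. The following is a minimal sketch with hypothetical function names; it assumes a 32-bit unsigned int, as on all mainstream ABIs, and widens the mask with a uint64_t cast rather than the UL suffix so that the demonstration does not depend on the width of long on the host.

```c
#include <stdint.h>

#define FLAGS_MASK_BUGGY 0x3U   /* type unsigned int: 32 bits wide */

/* Buggy: ~0x3U is 0xFFFFFFFC as an unsigned int, which zero-extends
 * to 0x00000000FFFFFFFC and wipes the upper 32 bits of the word. */
uint64_t strip_flags_buggy(uint64_t next_and_flags)
{
    return next_and_flags & ~FLAGS_MASK_BUGGY;
}

/* Fixed: widen the mask before inverting it, so the complement is
 * 0xFFFFFFFFFFFFFFFC and only the two flag bits are cleared. */
uint64_t strip_flags_fixed(uint64_t next_and_flags)
{
    return next_and_flags & ~(uint64_t)FLAGS_MASK_BUGGY;
}
```

For a stack-like address such as 0x00007FFFDEADBEE1, the buggy version returns 0xDEADBEE0 while the fixed version returns 0x00007FFFDEADBEE0. For any value below 4 GB the two agree, which is why the bug stayed hidden for heap-allocated nodes.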

The unsuspecting C library

One major change with most 64-bit targets is the width of pointers, but another issue is the change in width of long integer variables. This means that the printf() family of functions have to behave differently when the “l” conversion modifier is provided, as in “%ld”. On a 32-bit only target, all the printf() modifiers can be ignored as they all refer to a 32-bit integer (except for “%lld” but that isn’t supported by Zephyr). For 64-bit, this shortcut can no longer be used.
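
As an illustration of why the length modifier matters, here is a small sketch that uses the host's standard C library rather than Zephyr's own printf() implementation. The helper names are hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit value portably.
 * PRId64 expands to the conversion that matches int64_t on the
 * host, typically "ld" or "lld" depending on the C library. */
int format_i64(char *buf, size_t size, int64_t value)
{
    return snprintf(buf, size, "%" PRId64, value);
}

/* Hypothetical helper: format a long, whose width depends on the
 * target (32 bits on typical 32-bit ABIs, 64 bits on LP64). */
int format_long(char *buf, size_t size, long value)
{
    return snprintf(buf, size, "%ld", value);
}
```

A printf() implementation that ignores the "l" modifier would read only 32 bits of the argument in format_long on an LP64 target, which is exactly the shortcut that stops working in a 64-bit build.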

Alignment considerations are different too. For example, memory allocators must return pointers that are naturally aligned to 64-bit boundaries on 64-bit targets, which has implications for the actual allocator design. The memcpy() implementation can exploit larger memory words to optimize data transfer, but a larger alignment is necessary. Structures and unions may need adjustments to remain space efficient in the presence of wider pointers and longs.
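
The arithmetic behind such alignment requirements can be sketched with two small helpers. Both names are hypothetical and neither is taken from Zephyr's allocator; the alignment is assumed to be a nonzero power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round size up to the next multiple of align.
 * align must be a nonzero power of two. */
size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: check whether ptr is aligned to align bytes. */
int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

An allocator that rounds block sizes with align_up(size, 4) on a 32-bit target would need align_up(size, 8) on a 64-bit one, so a 13-byte request grows to 16 bytes in both cases but a 5-byte request grows from 8 bytes to 8 bytes only because 8 is already a multiple of both. Note that the mask here is computed in size_t, so it has the full pointer width.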

Test, test and test

One great thing about Zephyr is its extensive test suite. Once all the above was dealt with, it was time to find out if the test suite was happy. And of course it wasn’t. In fact, the majority of the tests failed. At least the Hello_world demo application worked at that point.

Writing good tests is difficult. The goal is to exercise code paths that ought to work, but it is even better when tests try to simulate normal failure conditions to make sure the core code returns with proper error codes. That often requires some programming trickery (read: type casting) within test code that is less portable than regular application code. This means that many tests had to be fixed to be compatible with a 64-bit build. And when core code bugs only affecting 64-bit builds were found, fixing them typically improved results in large portions of the tests all at once.

OK, but where does RV64 fit in this story?

We wrote the RV64 support at the very end of this project. In fact, it represented less than 10% of the whole development effort. Once Zephyr reached 64-bit maturity, it was quite easy to abstract register save/restore and pointer accesses in the assembly code to support RV64 within the existing RV32 code thanks to RISC-V’s highly symmetric architecture. Testing was also easy with QEMU since it can be instructed to use either an RV32 or an RV64 core with the same machine model.

Taking full advantage of 64-bit RISC-V cores on Zephyr may require additional work depending on the actual target where it would be deployed. For example, Zephyr doesn’t support hardware floating point context switching or SMP with either 32-bit or 64-bit RISC-V flavors yet.

But the groundwork is now done and merged into the mainline Zephyr project repository. Our RV64 port makes Zephyr RTOS 2.0.0 a milestone release — it’s the first Zephyr version to support both 32-bit and 64-bit architectures.

Zephyr RTOS 2.0 Release Highlights

Written by Ioannis Glaropoulos, Software System Architect at Nordic Semiconductor and active member of the Zephyr Technical Steering Committee

Last month, the Zephyr Project announced the release of Zephyr RTOS 2.0 and we are excited to share the details with you! Zephyr 2.0 is the first release of Zephyr RTOS after the 1.14 release with long-term support (LTS) in April 2019. It is also a huge step up from the 1.14 release, bringing a long list of new features, significant enhancements to existing features, and a large number of new HW platforms and development boards.

On the kernel side, we enhanced compatibility with 64-bit architectures and significantly improved the precision of timeouts by boosting the default tick rate for tickless kernels.
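
To see why a higher tick rate improves timeout precision, consider the following sketch. The helper names are hypothetical, and the tick rates used in the discussion (100 and 10000 ticks per second) are illustrative assumptions rather than figures quoted from the release.

```c
#include <stdint.h>

/* Hypothetical helper: convert a timeout in milliseconds to kernel
 * ticks, rounding up so the timeout never expires early. */
uint64_t ms_to_ticks_ceil(uint64_t ms, uint64_t ticks_per_sec)
{
    return (ms * ticks_per_sec + 999u) / 1000u;
}

/* Hypothetical helper: duration of a number of ticks in microseconds. */
uint64_t ticks_to_us(uint64_t ticks, uint64_t ticks_per_sec)
{
    return (ticks * UINT64_C(1000000)) / ticks_per_sec;
}
```

At 100 ticks per second, a 1 ms timeout rounds up to one tick, which lasts 10000 us. At 10000 ticks per second the same timeout becomes 10 ticks, which last 1000 us, so the requested duration is represented exactly.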

Additionally, we are excited to welcome ARM Cortex-R into the list of architectures supported in Zephyr RTOS.

A major achievement in this release is the stabilization of the Bluetooth Low Energy (BLE) split controller, which is now the default BLE controller in the Zephyr RTOS. The new BLE controller enables support for multi-vendor Bluetooth v5.0 radio hardware with a single controller code-base, thanks to a layered modular architecture, where most of the controller code is hardware agnostic. The new controller also features improved scheduling of continuous scanning and directed advertising, and increased radio time utilization. The latter significantly improves the achievable communication bandwidth – among other use-cases – in BLE Mesh networking.

In the networking area, we introduced support for SOCKS5 proxy, an Internet protocol that exchanges network packets between a client and server through a proxy server. In addition, we added support for 6LoCAN, a 6Lo adaptation layer for Controller Area Networks, and for Point-to-Point Protocol (PPP), which is used to establish a direct connection between two nodes. Finally, we added support for UpdateHub, an end-to-end solution for large scale over-the-air device updates.

A most sincere thank you to the more than 215 developers who contributed to this release. Not only did you add a wealth of new features during the merge window, you also rallied together as a community during the stabilization period across time zones, companies, architectures, and even weekends, to find and fix bugs, to make Zephyr 2.0 yet another great release! This release would not have been possible without your hard work!

To learn more about Zephyr Project please see our Getting Started Guide, join the mailing list or follow #zephyrproject on IRC.

Zephyr RTOS and Cortex-R5 on Zynq UltraScale+

This blog originally appeared on the Antmicro blog.

The UltraScale+, a high-performance FPGA SoC designed for heterogeneous processing with 4 Cortex-A53 cores and 2 Cortex-R5 cores, is often used in Antmicro’s projects. For certain complex devices, the combined processing capabilities of the US+ FPGA SoC’s heterogeneous cores are ideal – with the R5 cores used for real-time processing, the A53s for running Linux with non-critical software, and the FPGA used for dedicated accelerators for large amounts of data, such as high-resolution video. A good example is our fast 3D vision system, X-MINE, currently being deployed in several valuable mineral mines across Europe.

For such AMP applications, however, only FreeRTOS and bare metal are available by default as options to run on the R5 cores. Coming from a software-oriented and standards-driven perspective, Antmicro likes to work with the Linux Foundation-backed, vendor-neutral and scalable Zephyr RTOS (we are a member of the Zephyr Project), so porting Zephyr to the US+ was an obvious choice.

Why Zephyr is a good choice

Suitable for all but the most resource-constrained devices, Zephyr can target a variety of real-time use cases on the US+’s Cortex-R cores.

Zephyr allows for easy handling of multiple configuration options, APIs and external components, and is well suited to structured application development. We have worked with many AMP Linux+RTOS applications on various platforms, including ones executed in TEEs, which makes us especially sensitive to the mixing of programming styles and code architectures; these differ immensely between Zephyr and more traditional RTOSes.

Another benefit of Zephyr is that it targets some very serious protocol and standard implementations, being e.g. the first open source RTOS to introduce TSN support – by way of Antmicro’s contribution. The rising popularity of TSN in automotive and aerospace applications, and just about everywhere else, could be a very important reason to start using Zephyr in your TSN-capable product.

The Zephyr port

Just recently initial support for Cortex-R has been introduced in Zephyr, providing basic context switching and interrupts, as well as adding a testing platform in simulation.

Antmicro’s contribution, released to GitHub today, introduces the support for a first real hardware platform. Our choice was the Enclustra Mercury XU1 system-on-module, which is often used by ourselves and our customers, encapsulating the complexity of the UltraScale+ MPSoC in an easily swappable module. Antmicro has a standard devkit based on this SoM which you could use to recreate this demo, but of course it should be possible to run it with minor tweaks on any Zynq UltraScale+ MPSoC device.

It also helps that Antmicro has been in charge of developing the entire OS-level software stack for all the FPGA SoC modules from Enclustra, based on Buildroot/OpenEmbedded and on our deep cross-area HW/SW/FPGA expertise, which helped build a very easy-to-use interface for the thousands of customers purchasing SoMs from this vendor. For simplicity, we will leverage this building block, called the Enclustra Build Environment, in this note, although of course that’s not a strict necessity.

How to run a demo

Zephyr can be run on Cortex-R5 either from Linux running on Cortex-A53, or using a JTAG adapter. Here we will use Linux to have supervisor control (power on, load firmware, power off) over the remote processor.

Zephyr comes with a number of demos you can run on the Mercury XU1 SoM, and we’ll focus on the philosophers demo here. It implements a solution to the Dining Philosophers problem, which in computer science is considered a classic multi-thread synchronization problem.
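
One classic way to avoid deadlock in this problem is resource ordering: every philosopher picks up the lower-numbered of their two forks first. The sketch below shows only that ordering rule, with a hypothetical helper name. It is not taken from the Zephyr sample, and whether the sample uses exactly this strategy is not stated here.

```c
/* Hypothetical helper: philosopher `id` (0..count-1) sits between
 * fork `id` and fork `(id + 1) % count`. Always taking the
 * lower-numbered fork first imposes a global lock order, which
 * rules out the circular wait needed for a deadlock. */
void fork_order(int id, int count, int *first, int *second)
{
    int left = id;
    int right = (id + 1) % count;

    if (left < right) {
        *first = left;
        *second = right;
    } else {
        *first = right;
        *second = left;
    }
}
```

With five philosophers, philosopher 0 takes fork 0 and then fork 1, while philosopher 4 takes fork 0 and then fork 4 instead of fork 4 and then fork 0. In a real application each fork would be a mutex or semaphore acquired in the returned order.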

Building the Linux environment for Cortex-A53

The setup requires the Enclustra Build Environment, a tool by Antmicro that enables easy and fast builds of Linux with the necessary bootloaders and firmware. It provides a simple ncurses-based GUI and a command line interface to fetch and build U-Boot, Linux and a Buildroot-based root file system. At present, EBE supports 10 modules from two SoC families, Zynq-7000 and Zynq UltraScale+, including the Mercury XU1 module that we used. To get started with EBE, refer to its online documentation.

In order to use the Cortex-R5, you have to load the firmware into the processor’s Tightly Coupled Memory. For this, a few remoteproc-related additions to the original devicetree are required. The necessary devicetree parts can be found in this document (pages 15-16).

A ready to use devicetree file for Mercury XU1 module can be found in the Xilinx Linux repository.

Building the ‘philosophers’ demo in Zephyr

The application was built with Zephyr version 1.14.99 and Zephyr SDK version 0.10.2. The code is available on GitHub.

The philosophers demo can be built using the following bash commands:

Go to the zephyr repository:
cd zephyrproject/zephyr
Set up your build environment:
source zephyr-env.sh
Go to the location of the demo and build it:

Starting the Zephyr app from Linux

The Zephyr app should be copied to /lib/firmware in the root filesystem.
By default, the driver for the remote processor is compiled as a module. It can be loaded into the kernel using the following command:

Load and start the application for Cortex-R5:

The image below presents the result of running the application.

Future development

In the near future, we plan to enable inter-processor communication between Linux running on the A53 cores and Zephyr on the R5 cores. Both Zephyr and Linux support OpenAMP (Open Asymmetric Multi Processing), a platform that implements a homogeneous API for asymmetric multiprocessing.

Currently, Zephyr has only one OpenAMP demo. It targets the LPC54114 SoC which features a Cortex-M4 core. Adding a demo with Zephyr running on R5 and Linux on A53 would be the very first of this kind, so it’s definitely a worthwhile endeavor.

Benefits of AMP on US+

Asymmetric multiprocessing (AMP) can be really useful to get the best of both worlds, allowing you to get predictable, real-time responses where they matter, while keeping the ease-of-use and richness of a standard Linux OS. We have built many FPGA and regular SoC based Linux devices which benefited from e.g. running Web-based control servers and GUIs on Cortex-A cores, while keeping a critical functionality running on another CPU core (be it Cortex-M, A or R) with an RTOS. And in terms of the programming experience, Zephyr is a good match for the Linux you’ll be running on the main application core.

If you want to develop a complex application for Xilinx’s Zynq UltraScale+ MPSoC, and could use HW-SW co-design capabilities of Antmicro, whether it is designing dedicated PCBs, creating well-structured and modern FPGA code and/or integrating this with Linux or Zephyr – or both, don’t hesitate to contact us at contact@antmicro.com.