Running and Testing TensorFlow Lite on Microcontrollers without hardware in Renode

This article originally appeared on the TensorFlow Lite blog. For more content like this, click here.

Every day more and more software developers are exploring the worlds of machine learning, embedded systems, and the Internet of Things. Perhaps one of the most exciting advances to come out of the most recent innovations in these fields is the incorporation of ML at the edge and into smaller and smaller devices – often referred to as TinyML.

In “The Future of Machine Learning is Tiny”, Pete Warden predicted that machine learning would become increasingly available on tiny, low-power devices. Thanks to the work of the TensorFlow community, the power and flexibility of the framework is now also available on fairly resource-constrained devices like Arm Cortex-M MCUs, just as Pete predicted.

Thousands of developers using TensorFlow can now deploy ML models for actions such as keyphrase detection or gesture recognition onto embedded and IoT devices. However, testing software at scale on many small and embedded devices can still be challenging. Whether it’s difficulty sourcing hardware components, incorrectly setting up development environments or running into configuration issues while incorporating multiple unique devices into a multi-node network, sometimes even a seemingly simple task turns out to be complex.

Even experienced embedded developers find themselves trudging through the process of flashing and testing their applications on physical hardware just to accomplish simple test-driven workflows which are now commonplace in other contexts like Web or desktop application development.

The TensorFlow Lite MCU team also faced these challenges: how do you repeatedly and reliably test various demos, models, and scenarios on a variety of hardware without manually re-plugging, re-flashing and waving around a plethora of tiny boards?

To solve these challenges, they turned to Renode, an open source simulation framework from Antmicro that strives to do just that: allow hardware-less, Continuous Integration-driven workflows for embedded and IoT systems.

In this article, we will show you the basics of how to use Renode to run TensorFlow Lite on a virtual RISC-V MCU, without the need for physical hardware (although if you really want to, we’ve also prepared instructions to run the exact same software on a Digilent Arty board).

While this tutorial focuses on a RISC-V-based platform, Renode is able to simulate software targeting many different architectures, like Arm, POWER and others, so this approach can be used with other hardware as well.

What’s the deal with Renode?

At Antmicro, we pride ourselves on our ability to enable our customers and partners to create scalable and sustainable advanced engineering solutions to tackle complex technical challenges. For the last 10 years, our team has worked to overcome many of the same structural barriers and developer tool deficiencies now faced by the larger software developer community. We initially created the Renode framework to meet our own needs, but as proud proponents of open source, in 2015 we decided to release it under a permissive license to expand the reach and make embedded system design flexible, mobile and accessible to everyone.

Renode, which has just released version 1.9, is a development framework which accelerates IoT and embedded systems development by letting you simulate physical hardware systems – including the CPU, peripherals, sensors and environment, as well as – in the case of multi-node systems – the wired or wireless medium between nodes. It’s been called “Docker for embedded” and while the comparison is not fully accurate, it does convey the idea pretty well.

Renode allows you to deterministically simulate entire systems and dynamic environments – including feeding modeled sample data to simulated sensors which can then be read and processed by your custom software and algorithms. The ability to quickly run unmodified software without access to physical hardware makes Renode an ideal platform for developers looking to experiment and build ML-powered applications on embedded and IoT devices with TensorFlow Lite.

Getting Renode and demo software

To get started, you first need to install Renode as detailed in its README file – binaries are available for Linux, Mac and Windows.

Make sure you download the proper version for your operating system to have the renode command available. Upon running the renode command in your terminal you should see the Monitor pop up in front of you, which is Renode’s command-line interface.

Renode Monitor CLI

Once Renode has started, you’re good to go – remember, you don’t need any hardware.

We have prepared all the files you will need for this demo in a dedicated GitHub repository.

Clone this repository with git (remember to get the submodules):

git clone --recurse-submodules https://github.com/antmicro/litex-vexriscv-tensorflow-lite-demo 

We will need a demo binary to run. To simplify things, you can use the precompiled binary from the binaries/magic_wand directory (in Building your own application below we’ll explain how to compile your own, but you only need to do that when you’re ready.)

Running TensorFlow Lite in Renode

Now the fun part! Navigate to the renode directory:

cd renode

The renode directory contains a model of the ADXL345 accelerometer and all necessary scripts and assets required to simulate the Magic Wand demo.

To start the simulation, first run renode with the name of the script to be loaded. Here we use “litex-vexriscv-tflite.resc”, which is a “Renode script” (.resc) file with the relevant commands to create the needed platform and load the application to its memory:

renode litex-vexriscv-tflite.resc
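For reference, a Renode script is simply a sequence of Monitor commands. A minimal sketch of what such a file typically contains (illustrative only – the .repl file name below is assumed, and the file shipped in the demo repository is the authoritative version):

mach create
machine LoadPlatformDescription @litex-vexriscv-tflite.repl
showAnalyzer sysbus.uart
sysbus LoadELF @zephyr.elf

These commands create a machine, load a platform description (.repl) defining the SoC and its peripherals, open a terminal for the UART output and load the Zephyr binary into memory.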

You will see Renode’s CLI, called “Monitor”, from which you can control the emulation. In the CLI, use the start command to begin the simulation:

(machine-0) start

You should see the following output on the simulated device’s virtual serial port (also called UART – which will open as a separate terminal in Renode automatically):

Renode-TFlite-Zephyr-Litex

What just happened?

Renode simulates the hardware (not only the RISC-V CPU but also the I/O and sensors) so that the binary thinks it’s running on the real board. This is achieved by two Renode features: machine code translation and full SoC support.
First, the machine code of the executed application is translated to the native host machine language.

Whenever the application tries to read from or write to any peripheral, the call is intercepted and directed to an appropriate model. Renode models, usually (but not exclusively) written in C# or Python, implement the register interface and aim to be behaviorally consistent with the actual hardware. Thanks to the abstract nature of these models, you can interact with them programmatically from the Renode CLI or from script files.
In our example we feed the virtual sensor with some offline, pre-recorded angle and circle gesture data files:

i2c.adxl345 FeedSample @circle.data

The TF Lite binary running in Renode processes the data and – unsurprisingly – detects the gestures.

This shows another benefit of running in simulation – we can be entirely deterministic should we choose to, or devise more randomized test scenarios, feeding specially prepared generated data, choosing different simulation seeds etc.
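For instance, the gestures can be replayed deterministically from the bundled recordings – circle.data above, and the angle gesture recording mentioned earlier (the file name here is assumed to follow the same convention and should be checked against the repository contents):

i2c.adxl345 FeedSample @angle.data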

Building your own application

If you want to build other applications, or change the provided demos, you can now build them yourself using the repository you have downloaded. You will need to install the following prerequisites (tested on Ubuntu 18.04):

sudo apt update
sudo apt install cmake ninja-build gperf ccache dfu-util device-tree-compiler wget python python3-pip python3-setuptools python3-tk python3-wheel xz-utils file make gcc gcc-multilib locales tar curl unzip

Since the software is running the Zephyr RTOS, you will need to install Zephyr’s prerequisites too:

sudo pip3 install psutil netifaces requests virtualenv

# install Zephyr SDK
wget https://github.com/zephyrproject-rtos/sdk-ng/releases/download/v0.11.2/zephyr-sdk-0.11.2-setup.run
chmod +x zephyr-sdk-0.11.2-setup.run
./zephyr-sdk-0.11.2-setup.run -- -d /opt/zephyr-sdk

Once all necessary prerequisites are in place, go to the repository you downloaded earlier:

cd litex-vexriscv-tensorflow-lite-demo

And build the software with:

cd tensorflow
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=zephyr_vexriscv \
magic_wand_bin

The resulting binary can be found in the tensorflow/lite/micro/tools/make/gen/zephyr_vexriscv_x86_64/magic_wand/CMake/zephyr folder.

Copy it into the root folder with:

TF_BUILD_DIR=tensorflow/lite/micro/tools/make/gen/zephyr_vexriscv_x86_64
cp ${TF_BUILD_DIR}/magic_wand/CMake/zephyr/zephyr.elf ../
cp ${TF_BUILD_DIR}/magic_wand/CMake/zephyr/zephyr.bin ../

You can run it in Renode exactly as before.

To make sure the tutorial keeps working, and to showcase how simulation also enables you to do Continuous Integration easily, we also put together a Travis CI configuration for the demo – that is how the binary in the example is generated.

Travis CI
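To give an idea of what such a CI job does, it boils down to building zephyr.bin and then launching Renode headlessly against the same script – a sketch along these lines (--disable-xwt runs Renode without a GUI; treat the exact invocation as an assumption and see the repository’s CI configuration for the authoritative version):

renode --disable-xwt -e "start @litex-vexriscv-tflite.resc"

Full test suites typically go one step further and use Renode’s Robot Framework integration to assert on the UART output automatically instead of inspecting it by hand.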

We will describe how the TensorFlow Lite team uses Renode for Continuous Integration and how you can do that yourself in a separate note soon – stay tuned for that!

Running on hardware

Now that you have the binaries and you’ve seen them work in Renode, let’s see how the same binary behaves on physical hardware.

You will need a Digilent Arty A7 board and a Pmod ACL2 accelerometer module, connected to the rightmost Pmod connector as in the picture.

Digilent Arty board with Pmod

The system is a SoC-in-FPGA built with LiteX, with a pretty capable RISC-V core (VexRiscv) and various I/O options.

To build the necessary FPGA gateware containing our RISC-V SoC, we will be using LiteX Build Environment, an FPGA-oriented build system that serves as an easy entry into FPGA development on various hardware platforms.

Now initialize the LiteX Build Environment:

cd litex-buildenv
export CPU=vexriscv
export CPU_VARIANT=full
export PLATFORM=arty
export FIRMWARE=zephyr
export TARGET=tf

./scripts/download-env.sh
source scripts/enter-env.sh

Then build the gateware:

make gateware

Once you have built the gateware, load it onto the FPGA with:

make gateware-load

With the FPGA programmed, you can load the Zephyr binary on the device using the flterm program provided inside the environment you just initialized above:

flterm --port=/dev/ttyUSB1 --kernel=zephyr.bin --speed=115200

flterm will open the serial port. Now you can wave the board around and see the gestures being recognized in the terminal. Congratulations! You have now completed the entire tutorial.

Summary

In this post, we have demonstrated how you can use TensorFlow Lite for MCUs without (and with) hardware. In the coming months, we will follow up with a description of how you can proceed from interactive development with Renode to doing Continuous Integration of your Machine Learning code, and then show the advantages of combining the strengths of TensorFlow Lite and the Zephyr RTOS.

You can find the most up to date instructions in the demo repository. The repository links to tested TensorFlow, Zephyr and LiteX code versions via submodules. Travis CI is used to test the guide.

If you’d like to explore more hardware and software with Renode, check the complete list of supported boards. If you encounter problems or have ideas, file an issue on GitHub, and for specific needs, such as enabling TensorFlow Lite and simulation on your platform, you can contact us at contact@renode.io.

Zephyr RTOS and Nordic nRF52-DK: debugging, unit testing, project analysis

This tutorial originally ran on the PlatformIO docs website. You can find it here. For more content like this, click here.

The goal of this tutorial is to demonstrate how simple it is to use VSCode to develop, run and debug a simple Bluetooth project using the Zephyr framework on the Nordic nRF52-DK board.

  • Level: Intermediate
  • Platforms: Windows, Mac OS X, Linux

Requirements:

  • Nordic nRF52-DK development board
  • VSCode with the PlatformIO IDE extension installed

Setting Up the Project

  1. Click on “PlatformIO Home” button on the bottom PlatformIO Toolbar.
  2. Click on “New Project”, select Nordic nRF52-DK as the development board, Zephyr as the framework and a path to the project location (or use the default one).

Adding Code to the Generated Project

  1. Create a new file main.c in the src_dir folder and add the application code (a sketch of it is shown after this list):

  2. By default the Bluetooth feature is disabled; we can enable it by creating a new file prj.conf in the zephyr folder and adding the configuration lines shown below:
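The original listings are not reproduced here, so below is a minimal sketch of the kind of main.c this tutorial uses – a Zephyr Bluetooth beacon modeled on Zephyr’s samples/bluetooth/beacon sample (the advertising payload and device name are illustrative):

#include <zephyr.h>
#include <sys/printk.h>
#include <sys/util.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/hci.h>

/* Advertising payload: BLE-only flag plus a complete device name (illustrative) */
static const struct bt_data ad[] = {
    BT_DATA_BYTES(BT_DATA_FLAGS, BT_LE_AD_NO_BREDR),
    BT_DATA(BT_DATA_NAME_COMPLETE, "Test beacon", 11),
};

static void bt_ready(int err)
{
    if (err) {
        printk("Bluetooth init failed (err %d)\n", err);
        return;
    }
    /* Start non-connectable advertising using the identity address */
    err = bt_le_adv_start(BT_LE_ADV_NCONN_IDENTITY, ad, ARRAY_SIZE(ad), NULL, 0);
    if (err) {
        printk("Advertising failed to start (err %d)\n", err);
        return;
    }
    printk("Beacon started\n");
}

void main(void)
{
    /* Initialize the Bluetooth subsystem; bt_ready runs once it is up */
    int err = bt_enable(bt_ready);
    if (err) {
        printk("Bluetooth init failed (err %d)\n", err);
    }
}

The prj.conf lines, in turn, just switch the Bluetooth subsystem on (the device name entry is optional):

CONFIG_BT=y
CONFIG_BT_DEVICE_NAME="Test beacon"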

Compiling and Uploading the Firmware

  1. To compile the project use one of the following options:
    • Build option from the Project Tasks menu
    • Build button in PlatformIO Toolbar
    • Task Menu Tasks: Run Task... > PlatformIO: Build
    • Command Palette View: Command Palette > PlatformIO: Build
    • Hotkeys cmd-alt-b / ctrl-alt-b
  2. If everything went well, we should see a successful result message in the terminal window.
  3. To upload the firmware to the board we can use the following options:
    • Upload option from the Project Tasks menu
    • Upload button in PlatformIO Toolbar
    • Command Palette View: Command Palette > PlatformIO: Upload
    • Task Menu Tasks: Run Task... > PlatformIO: Upload
    • Hotkeys cmd-alt-u / ctrl-alt-u
  4. Connect the board to your computer and update the default monitor speed to 115200 in the platformio.ini file:

[env:nrf52_dk]
platform = nordicnrf52
board = nrf52_dk
framework = zephyr
monitor_speed = 115200
  5. Open Serial Monitor to observe the output from the board.
  6. If everything went well, the board should be visible as a beacon.

Debugging the Firmware

Since the Nordic nRF52-DK includes an onboard debug probe, we can use the PIO Unified Debugger without any configuration.

  1. To start a debug session we can use the following options:
    • Debug: Start debugging from the top menu
    • Start Debugging option from Quick Access menu
    • Hotkey button F5
  2. We can walk through the code using control buttons, set breakpoints, and add variables to the Watch window.

Writing Unit Tests

Note

Functions setUp and tearDown are used to initialize and finalize test conditions. Implementations of these functions are not required for running tests, but if you need to initialize some variables before you run a test you use the setUp function, and if you need to clean up variables afterwards you use the tearDown function.

For the sake of simplicity, let’s create a small library called calculator, implement several basic functions (add, sub, mul, div) and test them using the PIO Unit Testing engine.

  1. PlatformIO uses a unit testing framework called Unity. Unity is not compatible with the C library implemented in the framework, so let’s enable the standard version of the newlib C library in the prj.conf file using the following config:

CONFIG_NEWLIB_LIBC=y
  2. Create a new folder calculator in the lib folder and add two new files calculator.h and calculator.c with the following contents:

calculator.h:

#ifndef _CALCULATOR_H_
#define _CALCULATOR_H_

#ifdef __cplusplus
extern "C" {
#endif

int add (int a, int b);
int sub (int a, int b);
int mul (int a, int b);
int div (int a, int b);

#ifdef __cplusplus
}
#endif

#endif // _CALCULATOR_H_

calculator.c:

#include "calculator.h"

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
int mul(int a, int b) { return a * b; }
int div(int a, int b) { return a / b; }
  3. Create a new file test_calc.c in the test folder and add basic tests for the calculator library:

#include <calculator.h>
#include <unity.h>

void test_function_calculator_addition(void) {
    TEST_ASSERT_EQUAL(32, add(25, 7));
}

void test_function_calculator_subtraction(void) {
    TEST_ASSERT_EQUAL(20, sub(23, 3));
}

void test_function_calculator_multiplication(void) {
    TEST_ASSERT_EQUAL(50, mul(25, 2));
}

void test_function_calculator_division(void) {
    TEST_ASSERT_EQUAL(32, div(100, 3));
}

void main() {
    UNITY_BEGIN();
    RUN_TEST(test_function_calculator_addition);
    RUN_TEST(test_function_calculator_subtraction);
    RUN_TEST(test_function_calculator_multiplication);
    RUN_TEST(test_function_calculator_division);
    UNITY_END();
}
  4. Let’s run tests on the board and check the results. There should be a problem with the test_function_calculator_division test.
  5. Let’s fix the incorrect expected value (integer division 100 / 3 yields 33) and run the tests again. After processing, the results should be correct.

Project Inspection

For illustrative purposes, let’s imagine we need to find a function with the biggest memory footprint. Also, let’s introduce a bug to our project so PIO Check can report it.

  1. Open PlatformIO Home and navigate to the Inspect section, select the current project and press the Inspect button.
  2. Examine the project statistics.
  3. Identify the biggest function.
  4. Review the possible bugs reported by PIO Check.

Conclusion

Now we have a project template for the Nordic nRF52-DK board that we can use as a boilerplate for future projects.

Renode 1.9 release with new platforms, RISC-V improvements, dual radio & more

This blog originally ran on the Antmicro website. For more blogs and articles like this one, visit https://antmicro.com/blog/.

Developers of IoT and embedded systems often have to deal with the considerable inconvenience of manually re-plugging and re-flashing a number of boards and components to test various scenarios, firmware versions and setups. This requires access to large amounts of physical hardware and is a convoluted process. Facing the exact same hurdles and looking for ways to make our work more efficient, we at Antmicro developed Renode – a framework for simulating physical hardware: from CPUs, peripherals, sensors, environment, to wired or wireless medium between nodes.

Over the years, Renode has matured, extending its functionality with every release and gathering a strong following of developers who have been successfully using it for their purposes. Recently, the framework has reached an important milestone with its 1.9 release, which comes with support for new platforms, a range of RISC-V-related improvements, and a host of other useful additions, fixes and changes.

New RISC-V platforms and improvements

Renode constantly develops its support for the RISC-V ecosystem. The 1.9 release introduces support for Privileged Architecture 1.11 and the Kendryte K210 – an AI-capable, dual-core 64-bit RISC-V SoC – as well as Supervisor level support for the VexRiscv FPGA-optimized RISC-V implementation, better customizability of RISC-V cores, a wide range of improvements to various peripherals in the LiteX SoC builder ecosystem and improved support for a range of platforms.

We have also enhanced the support for the LiteX framebuffer, preparing a dedicated demo of Linux with framebuffer targeting LiteX on the NeTV2 open video development board.
Moreover, the framework’s co-simulation capability has been expanded with new Wishbone support for verilated peripherals. This allows you to integrate your IP within the simulated environment easily, without having to create Renode models. The newest release includes a demo with the Zephyr RTOS running on LiteX/VexRiscv with a verilated UART.

OpenPOWER

The POWER Instruction Set Architecture has joined RISC-V as the second major open source ISA supported by Renode. Our framework now contains a platform and demo based on IBM’s first open source POWER implementation called Microwatt, running MicroPython. The introduction of the POWER ISA support in Renode means that our simulation framework is ready for more open source CPUs based on this architecture that are expected to be released in the future.

Dual radio support

Another interesting development that the 1.9 release comes with is support for Zolertia Firefly – a breakout board featuring two Texas Instruments radios able to bridge networks operating at 2.4 GHz (CC2538, with integrated MCU) and 868 MHz (Sub-GHz – CC1200). It is an interesting IoT platform which showcases Renode’s capability to develop and test complex, multi-protocol systems with ease, and we are planning to publish a separate blog note dedicated to this topic soon.

Open FPGA QuickLogic development boards

Apart from Zolertia Firefly, Microwatt and the Kendryte K210, the set of newly added Renode-supported platforms includes the QuickFeather and Qomu boards featuring the EOS S3 SoC from QuickLogic, which has recently become the first FPGA vendor to embrace open source FPGA development tools by contributing crucial data to SymbiFlow – a collaborative project involving Antmicro, Google and a growing community of developers. Additionally, Antmicro ported the Zephyr RTOS to the EOS S3 chip, with Renode samples provided for QuickLogic’s boards.

QuickFeather board

Online momentum

Renode is generating quite some buzz online, reflected in a few articles that have popped up since its most recent release. Memfault, a company providing firmware diagnostics services, has successfully used our platform and has written about running their firmware on it – an interesting read if you want to learn about Renode from a user’s perspective.

Based on our successful collaboration, PlatformIO added documentation for Renode’s integration with their framework for embedded applications, and Carlos Eduardo de Paula from RedHat wrote an excellent guest note on the Zephyr Project blog about using all three together – Renode, PlatformIO and Zephyr.

Summary of most notable updates

If you already are a Renode user, note that in version 1.9 the Renode configuration directory was moved to another location. To use your previous settings and Monitor history, you will need to start Renode 1.9 and copy your old config folder over the new one. On Linux and macOS the directory has been moved from ~/.renode to ~/.config/renode. On Windows it has been moved from the Documents folder to AppData\Roaming. Those changes are in line with the default locations of the config files in the respective OS.
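On Linux, for example, carrying the old settings over is a simple copy (paths as listed above):

mkdir -p ~/.config/renode
cp -r ~/.renode/. ~/.config/renode/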

List of selected upgrades:

Added:

  • support for RISC-V Privileged Architecture 1.11
  • EOS S3 platform, with QuickFeather and Qomu boards support
  • EFR32MG13 platform support
  • Zolertia Firefly dual radio (CC2538/CC1200) platform support
  • Kendryte K210 platform support
  • NeTV2 with LiteX and VexRiscv platform support
  • EFR32 timer and gpcrc models
  • MAX3421E USB controller model
  • support for Wishbone bus in verilated peripherals, exemplified with the riscv_verilated_liteuart.resc sample
  • scripts to create Conda packages for Linux, Windows and macOS

Changed:

  • VexRiscv now supports Supervisor level interrupts, following the latest changes to this core
  • PolarFire SoC script now has a sample binary, running FreeRTOS with LwIP stack
  • NetworkInterfaceKeywords now support wireless communication

The full list of additions, changes and fixes can be found in Renode’s changelog.

If you are developing IoT systems, you might want to consider using Renode to save time, effort and cost by capitalizing on the framework’s ability to simulate, test and debug complex, multi-node systems of devices. Write to us at contact@renode.io to find out how exactly Renode can transform your embedded software testing and hardware-software co-development process.

Competitive Warehouse Automation with micro-ROS

This blog originally ran on the eProsima website. For more content like this, please visit their website.

Automation is the key to the development within the supply chain and logistics sectors, as it has the potential to improve productivity while reducing costs and protecting workers’ health.

micro-ROS warehouse automation demo

At its most basic, an automated warehouse attempts to cut down on manual tasks that slow down the movement of goods, and to minimize the exposure of human workers to potential hazards. As robotics evolves, ever more warehouses around the world plan to increase their investment in technology, with a focus on automation and scheduling tools.

The demo below is an example of the potential entailed by micro-ROS to act as a first-line player in the next generation of robotics applications in the logistics sector. It showcases a ROS 2 enabled robotic arm (Robotis OpenMANIPULATOR-X) connected to an ST VL53L1X ToF sensor able to measure the distance between a target object and the base of the arm.

The sensor is operated by an Olimex STM32-E407 development board, which features an STM32F407 microcontroller running a micro-ROS app. The app runs on Zephyr, a Real-Time Operating System (RTOS) that is especially convenient thanks to its large collection of available sensor drivers, and is in charge of passing the distance measurements to the software controlling the arm kinematics. This communication is mediated by a Raspberry Pi 4 bridge, through which the arm is instructed to grab the object with millimetric precision and relocate it to a different position.

(micro-ROS warehouse automation demo in times of coronavirus pandemic)

Tasks of this kind can be integrated into a bigger and more complex operations chain, as a building block of a fully automated protocol, relevant to sectors such as that of warehouses discussed above.

WHAT IS MICRO-ROS?

micro-ROS is a Robot Operating System specifically tailored for embedded and resource-constrained platforms, such as microcontrollers.

While inheriting most of its key features and architecture from ROS 2, its better-known elder ‘brother’, micro-ROS seamlessly bridges the ‘macro’ and ‘micro’ robotics worlds.

micro-ROS is open-source software that runs on a Real-Time Operating System (RTOS) and uses the DDS middleware Micro XRCE-DDS, that is, DDS for eXtremely Resource-Constrained Environments. On top of that, it runs the ROS 2 stack with a few microcontroller-specific improvements.

micro-ROS offers powerful developer tools, such as a complete build system for different hardware platforms, and the whole pool of robotic applications available in a ROS 2 environment. Along with these, a rich collection of tutorials is available for users to program their own applications, and out-of-the-box instructions are provided for reproducing compelling use-cases.

WHY USE MICRO-ROS?

One of the advantages of migrating robotics applications towards low-resource technologies is a drastic cost reduction, which makes micro-ROS especially convenient both for competitive industrial mass production and for those who want to take their first steps in robotics on a shoestring budget.


For any questions, please contact info@eprosima.com.

Zephyr 2.3.0 released!

Written by Carles Cufí, TSC member and Open Source software engineer at Nordic Semiconductor

For the last 3 months, we’ve been busy working on the next release – Zephyr 2.3.0. More than 200 contributors have added over 3200 commits to the codebase, in one of our busiest, most feature-packed and secure releases yet.

Looking back at the tenets that underpin the project, security has always been a fundamental objective of Zephyr. A recent report by security firm NCC Group analyzed security vulnerabilities that were found in the Zephyr codebase and reported to Zephyr’s security team ahead of the report’s publication. All the critical and high vulnerabilities found were fixed before the end of the release cycle, and thus the 2.3.0 release contains no known security issues. In keeping with the spirit of open source collaborative development and full transparency, Zephyr now includes a vulnerabilities web page that lists all disclosed, public security issues and the patches that fixed them. This is on top of Zephyr’s comprehensive and detailed security policies, which are also publicly available.

Another milestone for the project is the addition of integration with the Trusted Firmware M open source Trusted Execution Environment framework, which implements Arm’s Platform Security Architecture specification. Zephyr has long included support for Arm’s TrustZone hardware, including being able to target the secure side of the firmware, but by adding integration with the standard Trusted Firmware M project, it now also offers the option to combine TF-M and Zephyr to create a PSA-certified solution.

Another highlight of the release has to do with Zephyr’s extensive use of the devicetree standard to describe the hardware it runs on. The RTOS has used this format for several years, but this release overhauls the mechanism by which drivers and applications can retrieve the information present in the devicetree source files that are processed as inputs.

A powerful new static, macro-based API gives developers the ability to query any information they might require of the nodes and properties that the devicetree source files contain. The new API will be further extended in future releases to allow for even more complex operations that, until now, were only available in systems that compile the devicetree source into a binary blob, such as hierarchical queries and unique device identifiers. All of the in-tree devicetree users have been ported to the new API, leading to a substantial improvement in readability, clarity and structure to drivers and other subsystems.
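To give a flavor of the new API, property access is resolved entirely at compile time via macros from devicetree.h; a minimal sketch (the led0 alias is an assumption about the board’s devicetree):

#include <devicetree.h>

#define LED0_NODE DT_ALIAS(led0)
#define LED0_PIN  DT_GPIO_PIN(LED0_NODE, gpios)

Both macros expand to constants during preprocessing, so there is no run-time lookup or storage cost.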

Digital Signal Processing is also of major importance in some applications that Zephyr supports. Until this release, Zephyr users that compiled the RTOS for Arm-based platforms were required to add and integrate Arm’s complete DSP extension library manually, leading to code duplication, complexity and bugs. This release introduces the mainline integration of CMSIS-DSP into the Zephyr distribution, simplifying enormously the task of Zephyr users who want to benefit from this solid and mature Digital Signal Processing framework. Not only has the actual framework been integrated, but also the comprehensive suite of tests, so that we can make sure that the functionality works correctly on all supported platforms. Users themselves can make use of these tests to verify that the port to their custom board functions properly when enabling the DSP library.
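With the integration in place, pulling CMSIS-DSP into a Zephyr build comes down to a Kconfig switch in prj.conf (option name per the Zephyr module integration; individual function groups can then be enabled with further CMSIS-DSP options):

CONFIG_CMSIS_DSP=y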

The kernel has long presented a simple but established format for timeout parameters. This has served the project well for many years, but it was time to overhaul it in order to be able to later introduce advanced features such as clock source selection, high-precision (including 64-bit) timeouts and even absolute timeout values, which are critical for certain subsystems that need to track time with high precision based on external events, such as radio activity of some sort. The new timeout API encapsulates the parameter into an opaque structure, making for a smooth transition for Zephyr users while at the same time ensuring that the solution is future-proof.
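In practice the change is mostly mechanical for applications: call sites that used to pass raw millisecond values now wrap them in the opaque type via helper macros. A before/after sketch:

/* before 2.3: raw milliseconds */
k_sleep(100);

/* from 2.3 on: opaque k_timeout_t values */
k_sleep(K_MSEC(100));
k_sem_take(&my_sem, K_FOREVER);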

Additional features include:

  • A new CMake package system that reduces the need for the use of environment variables, one of the biggest hurdles when setting up Zephyr for the first time
  • Support for Advertising Extensions in the Bluetooth Low Energy Host, which enables devices to broadcast data and establish connections at long-range
  • A new heap allocator which is substantially more flexible and performant than the existing mem pool one

To top all of the above, more than 900 GitHub issues were closed during this release cycle, including Enhancements, Feature Requests and, of course, hundreds of bugs. Check out the full release notes for a complete list of issues and benefits.

As always we’d like to extend our gratitude to all of the contributors that have made this release possible, no matter whether they are company-sponsored or volunteers contributing their own free time. Zephyr is only made possible by them, and we are thrilled to see the community grow and become even more involved in the future of the project.

Join Us

We invite you to try out Zephyr 2.3.0. You can find our Getting Started Guide here. If you are interested in contributing to the Zephyr Project please see our Contributor Guide. Join the conversation or ask questions on our Mailing List.

Zephyr’s Security Assessment

Written by Joel Stapleton, Technical Product Manager, Nordic Semiconductor, and Zephyr Governing Board Chair

In January 2020, NCC Group notified the Zephyr Project of a number of security issues found as part of their independent research into the security posture of Zephyr. NCC Group, a global expert in cyber security and risk mitigation, initiated independent research into the Zephyr RTOS in response to growing client interest in the Project. They noted that they found Zephyr to be a mature, highly active and growing project with increasing market share. The report, which came out in May 2020, outlines the issues discovered in detail and acknowledges the proactive work of the Security Committee to fix issues and follow up on the recommendations of the report. This blog aims to explain how the Zephyr Project offers an IoT solution that goes beyond an RTOS kernel and source code when it comes to securing end products, what the NCC Group’s report showed the Project, and how it responded.

Since its launch in February 2016, the Zephyr Project has been steered by this vision:

“The Zephyr Project strives to deliver the best-in-class RTOS for connected, resource-constrained devices, built to be secure and safe.”

This vision was created to challenge the status-quo of proprietary or commercial kernels, and commercially-governed open projects that exist today. NCC Group’s report noted that their work to date has found IoT devices are typically built using a “hodgepodge” of chipset vendor board support packages, bootloaders, SDKs, and an RTOS kernel.  In contrast, the Zephyr solution is unique as it is vendor-neutral, with a scope from multi-architecture board support packages, to cloud connectivity for IoT products.  All source code is available and within the Project Continuous Integration (CI) framework.  The Project brings together a community of experts to participate on all aspects of the solution, from the standards to adopt, policies and processes to follow, and methodologies for build, test, maintenance, distribution and incident response.  The aim is to make a solution that developers can trust for the lifecycle of their products.

Zephyr has experienced significant growth in contributors, users, and the number of end-products made with Zephyr. The Zephyr RTOS is becoming established as a robust IoT solution for resource constrained devices where Linux is not an option.  Product and service providers in the IoT space are increasingly working with Zephyr as its maturity and user base increases.  

This was the case for NCC Group according to Technical Director Jeremy Boone: “NCC Group serves as a strategic security advisor to many companies that manufacture Internet-of-Things devices. Recently we have received an increasing number of queries regarding Zephyr. Our clients were primarily concerned with gaining an understanding of Zephyr’s overall security posture, and to better understand the factors that must be considered when designing a secure IoT device that is based on Zephyr. In order to better serve our clients, we decided to invest in a significant research effort to acquire a deep understanding of the Zephyr architecture.”

The Zephyr Project has always recognized the importance of the Security Posture of the solution which goes beyond the correctness of encryption implementations or the minimization of vulnerabilities.  Security Posture incorporates the application of  best practices for secure development and design, and the readiness and ability to notify and respond to security incidents.

The Project established a Security Committee and the role of Security Architect from the beginning. The Security Committee has put in place the relevant policies and procedures followed by the project, and selected industry frameworks such as the Linux Foundation’s (LF) Core Infrastructure Initiative (CII) Best Practices badging program, for which the project has maintained gold status over the last few years. The Zephyr Project is a CVE Numbering Authority (CNA) and adds vulnerabilities to the U.S. National Vulnerability Database (NVD), being able to assign Common Vulnerability and Exposure (CVE) numbers and create notifications directly from the Project.

The first Zephyr Long Term Support (LTS) release (v1.14) in April 2019 was the realization of Zephyr’s strategy to provide LTS releases for applications that require highly stable code bases with bug and security patch support for long periods. Most recently, in 2020, Zephyr’s vulnerability and embargo policies were revised and a system was put in place to allow product makers not participating as a member to receive Security Vulnerability Notifications at the start of embargo periods, so risk mitigation and corrective measures can be taken by users for their customers.

So when NCC Group investigated the Zephyr RTOS and MCUBoot bootloader, what happened and how did we do?

NCC Group informed the Zephyr Project of their research findings and intention to publish a report detailing vulnerabilities that were discovered. An embargo period was agreed and the Security Response process within Zephyr came into action. NCC Group reported 26 issues to Zephyr. Within those 26 issues, 2 were considered critical vulnerabilities and 2 high risk, with the remainder medium, low or informational. 1 low risk issue was reported for MCUBoot, although it was noted that no issues that could undermine the secure boot function were found.

The first step the Security Response team took after the issues were reported was to capture the information in the Zephyr Project’s Jira bug tracking system to begin the issue management lifecycle, limiting access to information on a need-to-know basis. The team then triaged the issues to self-assess the severity and set a priority. At this time, CVE numbers were created (without publishing details) for all critical, high and medium risk issues.

At that time, Zephyr did not have a registry for notification of vulnerabilities to users who are not members represented in the response team. That registry is now in place giving project members and registered product makers need-to-know access to information when CVE numbers are created so those affected can take action.

Following CVE number creation, issues were assigned for fixes. Zephyr has Maintainer roles for major sub-systems which may be assigned and/or delegated to those with the deepest knowledge of the affected code. Once fixes were proposed, they went through a key step in the life cycle of all Zephyr contributions – the Pull Request (PR) & Review.  This step allowed peers to assess the fix for quality and correctness.  Once approved, the PRs were merged resulting in code being changed in Zephyr to resolve the issues for the next Zephyr release, v2.2. 

8 issues in total were prioritized by the Project and fixed in time for v2.2 of Zephyr (March 2020) which was already in the release process at the time. The issues were still under embargo, so only those with a need-to-know had a link between the vulnerability and the PR for each fix.  

While a further 5 lower priority issues were resolved in the Zephyr master branch in preparation for Zephyr v2.3, the security committee further assessed the fixes that should be back-ported to Zephyr v2.2 and the LTS release (v1.14) to maintain those releases for users. 4 fixes were back-ported to create v2.2.1 in May 2020, and 7 fixes for the LTS release to create v1.14.2 in April 2020 (note: some issues did not affect this release).

On May 26, the embargo was lifted and NCC Group published the report. CVE entries were updated with details of the vulnerabilities.

The response effort involved people from many of our Project members, and individual contributors, to collaborate to manage the issues.  NCC Group was able to observe the response of the project prior to publication, and the Zephyr Project is pleased to direct our community to their findings.

It is an old adage that no software is bug free. Likewise, no software is completely secure or vulnerability free. The pursuit of securing an IoT solution is to reduce the risk of vulnerability inclusion by explicitly planning for secure design, security hardening of critical attack surfaces, and ensuring an issue management system is in place, including notification of findings so users are able to take mitigating actions early, and stay ahead of attackers. The work of NCC Group and our community has resulted in security vulnerabilities being fixed, an evaluation of our processes, and measurement of our response capabilities to make Zephyr better today than ever.

Many thanks to the work done by NCC Group; and many thanks to the Zephyr community that have worked to strengthen Zephyr and responded to these issues for the benefit of all users.

For more details about Zephyr’s security initiatives, please visit https://www.zephyrproject.org/security/. Or, feel free to join the Zephyr Project Slack and ask questions and participate in the discussions. 

Nordic Semiconductor offers broad product line support for its short-range and cellular IoT devices on nRF Connect platform including a suite of development tools and open source nRF Connect SDK

This article originally ran on the Nordic Semiconductor website. For more content like this, please visit https://www.nordicsemi.com/News.

The nRF Connect platform now adds support for Nordic’s popular nRF52 Series SoCs to complement existing support for the nRF5340 short-range and nRF9160 cellular IoT products. It’s now possible for product makers to use the same software development environment and tools for short- and long-range applications.

Nordic Semiconductor today announces that the nRF Connect Software Development Kit (SDK) for short-range and cellular IoT products will support the market-leading nRF52 Series Systems-on-Chip (SoCs) from v1.3, with code and documentation already available in the latest master branch of development. This makes it possible to develop with nRF52 Series devices on the same platform as the recently launched flagship nRF5340 SoC and the award-winning nRF9160 System-in-Package (SiP) for cellular IoT applications.


To complement this latest nRF Connect SDK release, the nRF Connect suite of tools has added features to bring even greater simplicity to the development process. The new Toolchain Manager in nRF Connect for Desktop makes setting up an advanced configuration and build environment for nRF Connect SDK straightforward for Windows users, with Linux and macOS support to come.


The nRF Connect SDK has been available for more than a year, offering the toolbox to support cellular IoT development with the nRF9160 SiP. Developers seeking a feature-rich and scalable RTOS, or a powerful enterprise build system, can now look at what is possible for their future short-range application development. As an open source software solution, nRF Connect SDK brings a step change in flexibility and scalability for developing on Nordic products today and in the future.

The SDK incorporates the Zephyr RTOS for constrained, energy-conscious and secure IoT products.

“Development with nRF Connect SDK will allow developers to build highly reliable, efficient, multi-threaded applications that can scale better than ever before. nRF Connect will be a one-stop shop for developing any kind of connectivity product using Nordic technology. This makes things much simpler for developers in the long run and offers many fantastic benefits, with code reuse across platforms being only one.” – Kjetil Holstad, Nordic Semiconductor

The Zephyr RTOS is integral to developing with nRF Connect SDK and is designed to enable everything from very simple applications with one or two threads in a very compact build, occupying a small memory footprint, right up to applications running hundreds of threads safely and securely.

The Zephyr RTOS is a true open source RTOS that is under governance by the Linux Foundation. It has a rapidly expanding developer community and is the most active FLOSS IoT project, eclipsing other RTOS projects, with more than 600 contributors from the community of organizations and individuals over the past 12 months.

The nRF Connect SDK has a broad range of support for short-range applications, including Bluetooth® Low Energy (Bluetooth LE) and Bluetooth mesh. Cellular support for the nRF9160 SiP is also included for applications wanting to take advantage of the burgeoning market using LTE-M, NB-IoT, and GPS. It includes MCUBoot for secure boot and firmware updates, and integrated support for Segger Embedded Studio Nordic Edition, which is available as a free download for Nordic developers. The nRF Connect SDK is hosted on GitHub and available under permissive licensing terms, including Nordic 3- and 5-clause BSD and Apache 2.0.

The true scalability of nRF Connect SDK goes beyond multi-chip support. Being able to build an LTE-M solution targeting an asset tracking application, as well as a Bluetooth LE medical application, using the same driver, library, and RTOS interfaces means product makers can minimize investments to deploy a diverse product range or a multi-node, connected IoT system with a mix of protocols and device types. The advanced build and configuration systems used in nRF Connect SDK make it possible to build for test and production, or even build for different hardware platforms, without code changes.

The nRF Connect suite is a comprehensive set of tools to enable set up, evaluation and development using Nordic connectivity products. The suite consists of nRF Connect for Desktop, nRF Connect for Mobile, nRF Connect SDK, and nRF Connect for Cloud. nRF Connect for Desktop complements Segger Embedded Studio in the development and test phases. nRF Connect for Mobile offers a range of connectivity evaluation and test features on mobile apps. nRF Connect for Cloud supports testing and evaluation across cellular networks with rich user interface features and Cloud service integration options.

“The nRF Connect SDK has been used by our cellular IoT customers for more than a year now,” says Kjetil Holstad, Director of Product Management, Nordic Semiconductor. “There is inevitably a transition period when taking a strategy to offer true open-source software solutions to developers with the features they want to use. It takes some time to arrive at a point where it begins to offer the same levels of functionality as our existing SDKs and SoftDevice Bluetooth LE stacks, but now we are approaching that inflection point.”

“Nordic SoftDevices and supporting SDKs have been integral to our market-leading position today. They will remain available and supported and are the right choice for building many future feature-packed applications. In fact, the DNA of this best-in-class software has made it into the nRF Connect SDK and it is now ready to offer a level of performance, flexibility, and scalability that wasn’t possible previously. nRF Connect will be a one-stop shop for developing any kind of connectivity product using Nordic technology. This makes things much simpler for developers in the long run and offers many fantastic benefits, with code reuse across platforms being only one,” continues Holstad.

nRF Connect SDK v1.2 and associated documentation are available now for download from GitHub and www.nordicsemi.com.

Support for the CivetWeb HTTP server in Zephyr

This blog originally ran on the Antmicro website. For more blogs and articles like this one, visit https://antmicro.com/blog/.

HTTP support in Zephyr

Zephyr has always had a big advantage in the form of its custom-tailored networking stack. As the RTOS continued to grow, more and more networking applications were developed to run on top of it. However, while the networking stack itself proved to be very useful for its original purpose – proving that Zephyr was a robust and stable choice for IoT devices – its custom nature was becoming a burden. As Zephyr finds its way into more and more use cases, not all of which are tiny and wireless, the need to rally around existing standard networking APIs was becoming more obvious, and some time ago the decision was made to base the stack on the well-known BSD sockets API.

The biggest issue with switching to another networking API was the ensuing necessity to rewrite all the applications and libraries which had been using the previous API so that they do not break, as full backwards compatibility was not an option. To make the transition process manageable, the Zephyr networking team decided to temporarily drop support for multiple protocols, including HTTP.

Obviously, that was not an ideal situation, and as a Silver Member of the Zephyr Project with a long history of contributions to it, Antmicro was approached by the Zephyr Project networking community to bring the missing capabilities back fast, so that HTTP-based applications could continue to be built even in the transition period. There were several ways to approach it, the most obvious ones being:

  • doing what had already been done before, that is implementing our own HTTP support from scratch and tightly integrating it with Zephyr;
  • implementing the HTTP support as an application/sample, allowing others to use our code as a starting point for their Zephyr applications that use HTTP;
  • integrating an already existing third-party HTTP library with Zephyr.

Going the third-party route

As is our standard practice, we leaned towards reusing an existing library, and after a discussion with both our Client and in the broader forum we agreed that this route would be a valuable addition to the project. Zephyr is all about integrating with external libraries and frameworks, and one of the primary features of its helper meta-tool, West (yes, it’s a pun, in case you wondered), is multi-repo capability for pooling together code from various sources.

The third-party library route meant we could let the networking stack redesign and reimplementation proceed at its own pace, while we could fast-track to a fully-fledged implementation that had been proven to work before – and test how well Zephyr integrates with quite complex external libraries in the process. Another huge benefit of going that path was the possibility of testing the newly supported BSD sockets API – using a third-party library which had been working with that API for many years was a great way to verify the correctness and completeness of the Zephyr’s implementation.

An additional advantage here is that most HTTP libraries also rely on POSIX APIs, which Zephyr is working to be compliant with as well. The support for the POSIX APIs is still under development, but porting an external application which uses them can serve as a great starting point to improve Zephyr in that area.

CivetWeb turning out to be the best fit

After researching various open-source HTTP implementations, we decided that CivetWeb was the best candidate. CivetWeb’s mission is to provide a permissively licensed, easy-to-use, powerful, C (C/C++) embeddable web server with optional CGI, SSL and Lua support. CivetWeb can be used as a library, adding web server functionality to an existing application, or it can work as a stand-alone web server running on Windows or Linux.

As it turned out, CivetWeb had everything we needed: it can work both as an HTTP client and HTTP server, it can be easily embedded into an already existing application, it can be used as a library and it is highly customizable, so we could remove all the features we didn’t need, which made it easier to use on the resource-constrained devices that Zephyr is targeting. It uses both the BSD sockets API and the POSIX APIs, making it a great real-life test for Zephyr.

Making CivetWeb work with Zephyr

The project required our work on both ends. First we made it possible for CivetWeb to be compiled as a Zephyr library, by preparing a CMake configuration in CivetWeb so it could be included by the Zephyr buildsystem. We also enabled CivetWeb to work on OSes with no filesystem and added several Zephyr-specific modifications.
Then we added it to Zephyr as a West module. The final step was adding a simple sample application which could serve as a quick-start guide for other users.

We used the Microchip SAM E70 Xplained board for development and testing. Running the sample application on it results in the board serving an HTTP page at 10.0.0.111:8080 (or another address, depending on the settings). It serves several URLs which are used to show various capabilities of the server (like serving static text, handling JSON requests or cookie usage). In addition to that, it can also be used to demonstrate the handling of various HTTP errors (like 404 – not found).

Main page of the CivetWeb Zephyr sample

It can be built like any other Zephyr sample, e.g. for the Atmel SAM E70 Xplained board, run:

west build -b sam_e70_xplained samples/net/sockets/civetweb
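Once built, the sample can be deployed to a connected board in the usual West way (assuming a supported debug probe is attached):

west flash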

For more information about the sample refer to the README.

Tapping into Open Source

Zephyr is a popular, multi-purpose, security-focused and robust RTOS which owes its capabilities to active developers and code quality, as well as the open style of governance and flexibility. By turning to standard APIs used in the open-source world, Zephyr was able to harness the functionalities of numerous available software applications, making it even easier to build complex solutions that would not be feasible without the use of third-party libraries.
The ability to integrate with a very complex application like CivetWeb to provide HTTP implementation proves Zephyr’s modularity and versatility.

Antmicro has a long history of integrating great open source projects together – check out our recent work on combining TFLite with Zephyr.

If you have a project which could benefit from using Zephyr’s capabilities with third-party libraries, or are building a product which needs integrating many software components together, feel free to reach out to us at contact@antmicro.com.

First micro-ROS Application on Zephyr RTOS

This tutorial aims to create a new micro-ROS application on the Olimex STM32-E407 evaluation board with the Zephyr RTOS. It originally ran on the micro-ROS website. For more content like this, click here.

Required hardware

This tutorial uses the following hardware:

  • Olimex STM32-E407
  • Olimex ARM-USB-TINY-H
  • USB-Serial Cable Female

What is micro-ROS?

micro-ROS is an open source robotic operating system which bridges extremely resource-constrained platforms to the more complex robotic architectures of ROS 2, the de facto standard robotic framework. It runs on Real-Time Operating Systems (RTOS) and uses the DDS middleware Micro XRCE-DDS, that is, DDS for eXtremely Resource-Constrained Environments.

Adding a new micro-ROS app

First of all, make sure that you have a ROS 2 installation.

TIP: if you are familiar with Docker containers, this image may be useful: ros:dashing

On the ROS 2 installation open a command line and follow these steps:

# Source the ROS 2 installation
source /opt/ros/$ROS_DISTRO/setup.bash

# Create a workspace and download the micro-ROS tools
mkdir microros_ws 
cd microros_ws
git clone -b $ROS_DISTRO https://github.com/micro-ROS/micro-ros-build.git src/micro-ros-build

# Update dependencies using rosdep
sudo apt update && rosdep update
rosdep install --from-path src --ignore-src -y

# Build micro-ROS tools and source them
colcon build
source install/local_setup.bash

Now, let’s create a firmware workspace that targets all the required code and tools for Olimex development board and Zephyr:

# Create step
ros2 run micro_ros_setup create_firmware_ws.sh zephyr olimex-stm32-e407

Now you have all the required tools to cross-compile micro-ROS and Zephyr for the Olimex STM32-E407 development board. At this point, you should know that the micro-ROS build system is a four-step workflow (the corresponding commands are sketched after the list):

  1. Create: retrieves all the required packages for a specific RTOS and hardware platform.
  2. Configure: configures the downloaded packages with options such as the micro-ROS application, the selected transport layer or the micro-ROS agent IP address (in network transports).
  3. Build: generates a binary file ready to be loaded onto the hardware.
  4. Flash: loads the micro-ROS software onto the hardware.

micro-ROS apps for Olimex + Zephyr are located at firmware/zephyr_apps/apps. In order to create a new application, create a new folder containing two files: the app code (inside a src folder) and the RMW configuration.

# Creating a new app
pushd firmware/zephyr_apps/apps
mkdir my_brand_new_app
cd my_brand_new_app
mkdir src
touch src/app.c app-colcon.meta
popd

You will also need some other Zephyr-related files: a CMakeLists.txt that defines the build process and a prj.conf where Zephyr is configured. You have these two files here; for now, it is OK to copy them as-is.

For this example we are going to create a ping pong app, where a node sends a ping message with a unique identifier using a publisher, and that same message is received by a pong subscriber. The node will also answer pings received from other nodes with a pong message:

Diagram of the ping pong application

To start creating this app, let’s configure the RMW with the required static memory. You can read more about RMW and Micro XRCE-DDS Configuration here. The app-colcon.meta should look like:

{
    "names": {
        "rmw_microxrcedds": {
            "cmake-args": [
                "-DRMW_UXRCE_MAX_NODES=1",
                "-DRMW_UXRCE_MAX_PUBLISHERS=2",
                "-DRMW_UXRCE_MAX_SUBSCRIPTIONS=2",
                "-DRMW_UXRCE_MAX_SERVICES=0",
                "-DRMW_UXRCE_MAX_CLIENTS=0",
                "-DRMW_UXRCE_MAX_HISTORY=4",
            ]
        }
    }
}

Meanwhile src/app.c should look like the following code:

#include <rcl/rcl.h>
#include <rcl_action/rcl_action.h>
#include <rcl/error_handling.h>
#include "rosidl_generator_c/string_functions.h"
#include <std_msgs/msg/header.h>

#include <rmw_uros/options.h>

#include <stdio.h>
#include <stdlib.h>  // rand()
#include <string.h>  // strlen(), strcmp()
#include <time.h>    // clock_gettime()
#include <unistd.h>  // usleep()

#include <zephyr.h>

#define STRING_BUFFER_LEN 100

// App main function
void main(void)
{
  //Init RCL options
  rcl_init_options_t options = rcl_get_zero_initialized_init_options();
  rcl_init_options_init(&options, rcl_get_default_allocator());
  
  // Init RCL context
  rcl_context_t context = rcl_get_zero_initialized_context();
  rcl_init(0, NULL, &options, &context);

  // Create a node
  rcl_node_options_t node_ops = rcl_node_get_default_options();
  rcl_node_t node = rcl_get_zero_initialized_node();
  rcl_node_init(&node, "pingpong_node", "", &context, &node_ops);

  // Create a reliable ping publisher
  rcl_publisher_options_t ping_publisher_ops = rcl_publisher_get_default_options();
  rcl_publisher_t ping_publisher = rcl_get_zero_initialized_publisher();
  rcl_publisher_init(&ping_publisher, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/ping", &ping_publisher_ops);

  // Create a best effort pong publisher
  rcl_publisher_options_t pong_publisher_ops = rcl_publisher_get_default_options();
  pong_publisher_ops.qos.reliability = RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT;
  rcl_publisher_t pong_publisher = rcl_get_zero_initialized_publisher();
  rcl_publisher_init(&pong_publisher, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/pong", &pong_publisher_ops);

  // Create a best effort pong subscriber
  rcl_subscription_options_t pong_subscription_ops = rcl_subscription_get_default_options();
  pong_subscription_ops.qos.reliability = RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT;
  rcl_subscription_t pong_subscription = rcl_get_zero_initialized_subscription();
  rcl_subscription_init(&pong_subscription, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/pong", &pong_subscription_ops);

  // Create a best effort ping subscriber
  rcl_subscription_options_t ping_subscription_ops = rcl_subscription_get_default_options();
  ping_subscription_ops.qos.reliability = RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT;
  rcl_subscription_t ping_subscription = rcl_get_zero_initialized_subscription();
  rcl_subscription_init(&ping_subscription, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/ping", &ping_subscription_ops);

  // Create a wait set
  rcl_wait_set_t wait_set = rcl_get_zero_initialized_wait_set();
  rcl_wait_set_init(&wait_set, 2, 0, 0, 0, 0, 0, &context, rcl_get_default_allocator());

  // Create and allocate the pingpong publication message
  std_msgs__msg__Header msg;
  char msg_buffer[STRING_BUFFER_LEN];
  msg.frame_id.data = msg_buffer;
  msg.frame_id.capacity = STRING_BUFFER_LEN;

  // Create and allocate the pingpong subscription message
  std_msgs__msg__Header rcv_msg;
  char rcv_buffer[STRING_BUFFER_LEN];
  rcv_msg.frame_id.data = rcv_buffer;
  rcv_msg.frame_id.capacity = STRING_BUFFER_LEN;

  // Set device id and sequence number
  int device_id = rand();
  int seq_no;
  
  int pong_count = 0;
  struct timespec ts;
  rcl_ret_t rc;

  uint32_t iterations = 0;

  do {
    // Clear and set the waitset
    rcl_wait_set_clear(&wait_set);
    
    size_t index_pong_subscription;
    rcl_wait_set_add_subscription(&wait_set, &pong_subscription, &index_pong_subscription);

    size_t index_ping_subscription;
    rcl_wait_set_add_subscription(&wait_set, &ping_subscription, &index_ping_subscription);
    
    // Run session for 100 ms
    rcl_wait(&wait_set, RCL_MS_TO_NS(100));

    // Check if it is time to send a ping
    if (iterations++ % 50 == 0) {
      // Generate a new random sequence number
      seq_no = rand();
      sprintf(msg.frame_id.data, "%d_%d", seq_no, device_id);
      msg.frame_id.size = strlen(msg.frame_id.data);
      
      // Fill the message timestamp
      clock_gettime(CLOCK_REALTIME, &ts);
      msg.stamp.sec = ts.tv_sec;
      msg.stamp.nanosec = ts.tv_nsec;

      // Reset the pong count and publish the ping message
      pong_count = 0;
      rcl_publish(&ping_publisher, (const void*)&msg, NULL);
      printf("Ping send seq %s\n", msg.frame_id.data);  
    }
    
    // Check if some pong message is received
    if (wait_set.subscriptions[index_pong_subscription]) {
      rc = rcl_take(wait_set.subscriptions[index_pong_subscription], &rcv_msg, NULL, NULL);

      if(rc == RCL_RET_OK && strcmp(msg.frame_id.data,rcv_msg.frame_id.data) == 0) {
          pong_count++;
          printf("Pong for seq %s (%d)\n", rcv_msg.frame_id.data, pong_count);
      }
    }

    // Check if some ping message is received and pong it
    if (wait_set.subscriptions[index_ping_subscription]) {
      rc = rcl_take(wait_set.subscriptions[index_ping_subscription], &rcv_msg, NULL, NULL);

      // Don't pong my own pings
      if(rc == RCL_RET_OK && strcmp(msg.frame_id.data,rcv_msg.frame_id.data) != 0){
        printf("Ping received with seq %s. Answering.\n", rcv_msg.frame_id.data);
        rcl_publish(&pong_publisher, (const void*)&rcv_msg, NULL);
      }
    }
    
    usleep(10000);
  } while (true);
}
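
One note on the code above: the rcl_* calls all return rcl_ret_t, and the sample ignores those return values for brevity. A real application would check each one; a minimal sketch of the usual pattern (the RCCHECK name is our own convention here, not part of the sample):

#include <stdio.h>
#include <rcl/rcl.h>

// Bail out of main() on any failing rcl call, logging where it failed.
#define RCCHECK(fn) {                                                    \
    rcl_ret_t rc_ = (fn);                                                \
    if (rc_ != RCL_RET_OK) {                                             \
      printf("rcl call failed at line %d: %d\n", __LINE__, (int)rc_);    \
      return;                                                            \
    }                                                                    \
  }

// Usage, e.g.:
//   RCCHECK(rcl_init_options_init(&options, rcl_get_default_allocator()));
//   RCCHECK(rcl_node_init(&node, "pingpong_node", "", &context, &node_ops));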

With the app files in place, let's configure our new app with a serial transport on the USB:

# Configure step
ros2 run micro_ros_setup configure_firmware.sh my_brand_new_app --transport serial-usb

When the configuration step ends, just build the firmware:

# Build step
ros2 run micro_ros_setup build_firmware.sh

Once the build has successfully ended, let's power up and connect the board. First, connect the Olimex ARM-USB-TINY-H JTAG programmer to the board's JTAG port.

Make sure that the board power supply jumper (PWR_SEL) is in the 3-4 position in order to power the board from the JTAG connector.

You should see the red LED light up. It is time to flash the board:

# Flash step
ros2 run micro_ros_setup flash_firmware.sh

Running the micro-ROS app

The micro-ROS app is ready to connect to a micro-ROS-Agent and start talking with the rest of the ROS 2 world.

First of all, create and build a micro-ROS agent:

# Download micro-ROS-Agent packages
ros2 run micro_ros_setup create_agent_ws.sh

# Build micro-ROS-Agent packages, this may take a while.
colcon build
source install/local_setup.bash

Then connect the Olimex development board to the computer using the USB OTG 2 connector (the miniUSB connector that is furthest from the Ethernet port).

TIP: Color codes are applicable to this cable. Make sure to match the Olimex Rx with the cable Tx and vice versa. Remember GND!

Then run the agent:

# Run a micro-ROS agent
ros2 run micro_ros_agent micro_ros_agent serial --dev [device]

TIP: you can use this command to find your serial device name: ls /dev/serial/by-id/*. It will probably be something like /dev/serial/by-id/usb-ZEPHYR_Zephyr_microROS_3536510100290035-if00

And finally, let's check that everything is working from another command line. We are going to listen to the ping topic to check whether the Ping Pong node is publishing its own pings:

source /opt/ros/$ROS_DISTRO/setup.bash

# Subscribe to micro-ROS ping topic
ros2 topic echo /microROS/ping

You should see the topic messages published by the Ping Pong node every 5 seconds (the app sends a new ping once every 50 iterations of its wait loop):

user@user:~$ ros2 topic echo /microROS/ping
stamp:
  sec: 20
  nanosec: 867000000
frame_id: '1344887256_1085377743'
---
stamp:
  sec: 25
  nanosec: 942000000
frame_id: '730417256_1085377743'
---

On another command line, let's subscribe to the pong topic:

source /opt/ros/$ROS_DISTRO/setup.bash

# Subscribe to micro-ROS pong topic
ros2 topic echo /microROS/pong

At this point, we know that our app is publishing pings. Let's check if it also answers someone else's pings from a new command line:

source /opt/ros/$ROS_DISTRO/setup.bash

# Send a fake ping
ros2 topic pub --once /microROS/ping std_msgs/msg/Header '{frame_id: "fake_ping"}'

Now, we should see our fake ping on the ping subscriber, along with the board's own pings:

user@user:~$ ros2 topic echo /microROS/ping
stamp:
  sec: 0
  nanosec: 0
frame_id: fake_ping
---
stamp:
  sec: 305
  nanosec: 973000000
frame_id: '451230256_1085377743'
---
stamp:
  sec: 310
  nanosec: 957000000
frame_id: '2084670932_1085377743'
---

And in the pong subscriber, we should see the board’s answer to our fake ping:

user@user:~$ ros2 topic echo /microROS/pong
stamp:
  sec: 0
  nanosec: 0
frame_id: fake_ping
---



Designing a RISC-V CPU in VHDL, Part 19: Adding Trace Dump Functionality

Written by Colin Riley, an Engineer and Writer at Domipheus Labs

This is part of a series of posts detailing the steps and learning undertaken to design and implement a CPU in VHDL. You can find more articles from Colin on his blog at http://labs.domipheus.com/blog/. To read more from this series, click here.

For those who follow me on Twitter, you'll have seen my recent tweets regarding Zephyr OS running on RPU. This was a huge amount of work to get running, most of it debugging on the FPGA itself. For those new to FPGA development, trying to debug on-chip can be a very difficult and frustrating experience. Generally, you want to debug in the simulator, but when potential issues are influenced by external devices such as SD cards and timer interrupts, and only appear hundreds of millions of cycles into the boot process of an operating system, simulators may not be feasible.

Blog posts on the features I added to RPU to enable Zephyr booting, such as proper interrupts, exceptions and timers, are coming, but none of it would have been possible without a feature of the RPU SoC I have not yet discussed.

CPU Tracing

Most real processors have hardware debug features built in, and one of the most useful low-level tools is tracing. Tracing captures, over an arbitrary time slice, low-level details of the inner operation of the core into some buffer, before streaming them elsewhere for later analysis and state reconstruction.

Note that this is a one-way flow of data. It is not interactive, like the debugging most developers know. It is mostly used for performance profiling, but for RPU it would be an ideal debugging aid.

Requirements

For the avoidance of doubt, I'm defining "a trace" to be one block of valid data which is dumped to a host PC for analysis. For us, dumping will mean streaming the data out via UART to a development PC. Multiple traces can be taken, but when the data transfer is initiated, the data needs to be a real representation of what occurred immediately preceding the request to dump the trace. The data contained in a trace is always being captured on the device, so that it is available whenever a request is made.

These requirements call for a circular buffer which is continually recording the state. I'll define exactly what the data is later, but for now it is 64 bits per cycle: plenty for a significant amount of state to be recorded, which will be required in order to perform meaningful analysis. We have a good amount of block RAM on our Spartan 7-50 FPGA, so we can dedicate 32 KB to this circular buffer quite easily. At 64 bits per cycle, 32 KB gives us 4,096 cycles of data. Not that much, you'd think, for a CPU running at over 100 MHz, but you'd be surprised how quickly RPU falls over when it gets into an invalid state!

It goes without saying that our implementation needs to be non-intrusive. I'm not currently using the UART connected to the FTDI USB controller, as our logging output is displayed graphically via a text-mode display over HDMI, so we can use it without impacting existing code. Our CPU core will expose a debug trace bus signal, which carries the data to be captured.

We've mentioned the buffer will be in a block RAM, but one aspect of this is that we must be wary of the observer effect. This is very much an issue for performance profiling, as streaming out data from various devices usually goes through memory subsystems, which increases bandwidth requirements and leads to more latency in the memory operations you are trying to trace. Our trace system should not affect the execution characteristics of the core at all. As we are using a development PC to receive the streamed data, we can completely segregate all data paths for the trace system and remove the block RAM from the memory-mapped area which is currently used for code and data. With this block RAM separate, we can ensure it's set up as a true dual-port RAM with the native 64-bit data width. One port will be for writing data from the CPU, on the CPU clock domain. The second port will be used for reading the data out at a rate dictated by the UART serial baud, which is much, much slower. Doing this ensures tracing will not impact execution of the core at any point, meaning our dumped data is much more valuable.

Lastly, we want to trigger these dumps at a point in time when we think an issue has occurred. Two immediate trigger types come to mind in addition to a manual button.

  1. Memory address
  2. Comparison with the data which is to be dumped, i.e. pipeline status flags combined with instruction types.

Implementation

The implementation is very simple. I've added a debug signal output to the CPU core entity: 64 bits of data, consisting of 32 status bits and a 32-bit data value, as defined below.

This data is always being output by the core, changing every cycle. The data value can be various things; the PC when in a STAGE_FETCH state, the ALU result, the value we’re writing to rD in WRITEBACK, or a memory location during a load/store.
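
The exact bit assignments were given in a diagram in the original post; conceptually, each cycle's trace entry can be thought of as the following C struct (the field names here are ours, for illustration):

#include <stdint.h>

// Conceptual view of one 64-bit trace entry. The real bit assignments
// live in the VHDL; the names here are illustrative only.
typedef struct {
    uint32_t status; // pipeline stage, opcode class, flags like REG_WR/INT_EN
    uint32_t value;  // PC, ALU result, writeback value or load/store address
} trace_entry_t;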

We only need two new processes for the system:

  • trace_streamout: manages the streaming out of bytes from the trace block ram
  • trace_en_check: inspects trigger conditions in order to initiate a trace dump which trace_streamout will handle

The BRAM used as the circular trace buffer is configured with a 64-bit word length and 4,096 addresses. It was created using the Block Memory Generator, and has a read latency of 2 cycles.

We will use a clock cycle counter which already exists to dictate write locations into the BRAM. As it's used as a circular buffer, we simply take the lower 12 bits of the clock counter as the address into the BRAM.
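
In C terms, the write-side addressing is simply the cycle counter masked down to the buffer size; a sketch mirroring the VHDL behaviour:

#include <stdint.h>

// 32 KB of BRAM / 8 bytes per entry = 4,096 entries, hence a 12-bit address.
#define TRACE_ENTRIES 4096

static inline uint32_t trace_write_addr(uint64_t cycle_counter)
{
    // Lower 12 bits of the free-running counter; the address wraps
    // automatically, giving a circular buffer of the last 4,096 cycles.
    return (uint32_t)(cycle_counter & (TRACE_ENTRIES - 1));
}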

Port A of the BRAM is the write port, with its address line tied to the bits noted above. It is enabled by a signal only when the trace_streamout process is idle. This way, when we do stream out the data we want, it is not polluted with new data while our slow streamout to UART is active; that new data is effectively lost. As this port captures the CPU core O_DBG output, it is clocked at the CPU core clock.

Port B is the read port. It's clocked using the 100 MHz reference clock (which also drives the UART, albeit then subsampled via a baud tick). It's enabled when a streamout state is requested, and reads an address dictated by the trace_streamout process.

The trace_streamout process, when the current streamout state is idle, checks for a dump_enable signal. Upon seeing this signal, the last write address is latched from the lower 12 bits of the cycle counter. We also set a streamout location to be that last write address plus one; this location is what is fed into Port B of the BRAM (the circular trace buffer). When we change the read address on Port B, we wait some cycles for the value to properly propagate out. During this preload stall, we also wait for the UART TX to become ready for more data. The transmission runs significantly slower than the clock that trace_streamout runs at, and we cannot write to the TX buffer if it's full.

The UART I'm using is provided by Xilinx and has an internal 16-byte buffer. We wait for a ready signal so that we know writing our 8 bytes of debug data (remember, 64 bits) quickly into the UART TX will succeed. In addition to the 8 bytes of data, I also send 2 bytes of magic number data at the start of every 64-bit packet as an aid to the receiving logic: we can check the first two bytes for these values to ensure we're synced correctly before parsing the data.

After the last byte is written, we increment our streamout location address. If it's not equal to the last write address we latched previously, we move to the preload stall and stream the next 8 bytes of trace data out. Otherwise, we have finished transmitting the entire trace buffer, so we set our state back to idle and re-enable new trace data writes.
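
Expressed as software, the streamout behaviour is roughly the following C-flavoured model. The uart_* and bram_read helpers, the magic values and the signal names are stand-ins for the actual hardware interfaces, which are not spelled out here:

#include <stdint.h>

// Stand-ins for hardware signals and interfaces (assumptions).
extern volatile int dump_enable;        // set by a trigger (see below)
extern volatile uint64_t cycle_counter; // free-running CPU cycle counter
extern int  uart_tx_ready(void);        // UART TX has buffer space
extern void uart_tx_byte(uint8_t b);
extern uint64_t bram_read(uint16_t addr);

#define MAGIC0 0xAA // placeholder sync bytes; the real values live in the HDL
#define MAGIC1 0x55

static enum { IDLE, PRELOAD, SEND } state = IDLE;
static uint16_t last_write, read_addr;

// One step of the trace_streamout state machine, as if clocked.
void trace_streamout_step(void)
{
    switch (state) {
    case IDLE: // BRAM writes are enabled; wait for a dump request
        if (dump_enable) {
            last_write = cycle_counter & 0xFFF;    // latch last write address
            read_addr  = (last_write + 1) & 0xFFF; // oldest entry first
            state = PRELOAD;
        }
        break;
    case PRELOAD: // cover the BRAM read latency and wait for UART TX ready
        if (uart_tx_ready())
            state = SEND;
        break;
    case SEND: { // 2 magic bytes + 8 data bytes per 64-bit entry
        uint64_t entry = bram_read(read_addr);
        uart_tx_byte(MAGIC0);
        uart_tx_byte(MAGIC1);
        for (int i = 7; i >= 0; i--)
            uart_tx_byte((uint8_t)(entry >> (8 * i))); // byte order assumed
        read_addr = (read_addr + 1) & 0xFFF;
        // Finished once we wrap back around to the latched write address.
        state = (read_addr == last_write) ? IDLE : PRELOAD;
        break;
    }
    }
}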

Triggering streamout

Triggering a dump using dump_enable can be done in a variety of ways. I have a physical push-button on my Arty S7 board set to always enable a dump, which is useful for finding out where execution currently is in a program. I also have a trigger on reading a certain memory address. This is good if there is an issue triggering an error which you can reliably track to a branch of code execution: having a memory address in that code branch act as the trigger will dump the cycles leading up to that branch being taken. There is one other type of trigger, relying on the CPU O_DBG signal itself, for example triggering a dump when we encounter a decoder interrupt for an invalid instruction.

I hard-code these triggers in the VHDL currently, but it's feasible that they could be made configurable programmatically. The dump itself could also be triggered via a write to a specific MMIO location.
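
If that MMIO hook existed, triggering a dump from software would be as simple as the sketch below; the register address is entirely hypothetical:

#include <stdint.h>

// Hypothetical trace-trigger register -- not a real RPU SoC address.
#define TRACE_DUMP_TRIGGER ((volatile uint32_t *)0x40001000u)

static inline void trigger_trace_dump(void)
{
    *TRACE_DUMP_TRIGGER = 1; // any write would assert dump_enable
}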

Parsing the data on the Debug PC

The UART TX on the FPGA is connected to the FTDI USB-UART bridge, which means when the FPGA design is active and the board is connected via USB, we can just open the COM port exposed via the USB device.

I made a simple C# command line utility which just dumps the packets in a readable form. It looks like this:

[22:54:19.6133781]Trace Packet, 00000054,  0xC3 40 ,   OPCODE_BRANCH ,     STAGE_FETCH , 0x000008EC INT_EN , :
[22:54:19.6143787]Trace Packet, 00000055,  0xD1 40 ,   OPCODE_BRANCH ,    STAGE_DECODE , 0x04C12083 INT_EN , :
[22:54:19.6153795]Trace Packet, 00000056,  0xE1 40 ,     OPCODE_LOAD ,       STAGE_ALU , 0x00000001 INT_EN , :
[22:54:19.6163794]Trace Packet, 00000057,  0xF1 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6183798]Trace Packet, 00000058,  0x01 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6183798]Trace Packet, 00000059,  0x11 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6193799]Trace Packet, 00000060,  0x20 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6203802]Trace Packet, 00000061,  0x31 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6213808]Trace Packet, 00000062,  0x43 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6213808]Trace Packet, 00000063,  0x51 C0 ,     OPCODE_LOAD , STAGE_WRITEBACK , 0x00001CDC REG_WR  INT_EN , :

You can see some data given by the utility such as timestamps and a packet ID. Everything else is derived from flags in the trace data for that cycle.
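
As a rough sketch of what the utility's receive loop does (written here in C rather than C#, and with placeholder magic values, since the post doesn't state the real ones), syncing and decoding looks like this:

#include <stdint.h>
#include <stdio.h>

#define MAGIC0 0xAA // placeholder values; the real ones live in the HDL
#define MAGIC1 0x55

// Read one 10-byte packet (2 magic + 8 data bytes) from the serial
// stream, resynchronizing on the magic prefix if we join mid-packet.
int read_trace_packet(FILE *uart, uint64_t *out)
{
    int b;
    for (;;) {
        if ((b = fgetc(uart)) == EOF) return -1;
        if (b != MAGIC0) continue;        // hunt for the first magic byte
        if ((b = fgetc(uart)) == EOF) return -1;
        if (b != MAGIC1) continue;        // ...and the second

        uint64_t v = 0;
        for (int i = 0; i < 8; i++) {     // the next 8 bytes are one entry
            if ((b = fgetc(uart)) == EOF) return -1;
            v = (v << 8) | (uint64_t)b;   // byte order is an assumption
        }
        *out = v;                         // 32 status bits + 32-bit value
        return 0;
    }
}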

Later I added some additional functionality, like parsing register destinations and outputting known register/memory values to aid when going over the output.

[22:54:19.6213808]Trace Packet, 00000062,  0x43 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6213808]Trace Packet, 00000063,  0x51 C0 ,     OPCODE_LOAD , STAGE_WRITEBACK , 0x00001CDC REG_WR  INT_EN , :
MEMORY 0x0000476C = 0x00001CDC
REGISTER ra = 0x00001CDC

I have also been working on a Rust-based GUI debugger for these trace files, where you can look at known memory (usually the stack) and register file contents at a given packet by walking the packets up to the point you're interested in. It was an excuse to get to know Rust a bit more, but it's not completely functional yet, and I use the command-line C# version more.

The easiest use for this is the physical button for dumping the traces. When bringing up new software on the SoC, it rarely works the first time and usually ends up in an infinite loop of some sort. Using the STAGE_FETCH packets, which contain the PC, I can look at an objdump and see immediately where we are executing, without impacting the execution of the code itself.

Using the data to debug issues

Now I'll spoil a bit of the upcoming RPU interrupts/Zephyr post with an example of how these traces have helped me, because I think an example of a real problem the trace dumps helped solve is in order.

After implementing external timer interrupts, invalid instruction interrupts and system calls (and fixing a ton of issues), I had the Zephyr Dining Philosophers sample running on RPU in all its threaded, synchronized glory.

Why do I need invalid instruction interrupts? Because RPU does not implement the M RISC-V extension, so multiply and divide hardware does not exist. Sadly, somewhere in the Zephyr build system, there is assembly with mul and div instructions. I needed invalid instruction interrupts in order to trap into an exception handler which could emulate the instruction in software and write the result back into the context, so that when we returned from the interrupt to PC+4, the destination register would hold the new value.

It’s pretty funny to think that for me, implementing that was easier than trying to fix a build system to compile for the architecture intended.

Anyway, I was performing long-running tests of dining philosophers, when I hit the fatal error exception handler for trying to emulate an instruction it didn’t understand. I was able to replicate it, but it could take hours of running before it happened. The biggest issue? The instruction we were trying to emulate was at PC 0x00000010 – the start of the exception handler!

So I set up the CPU trace trigger to activate on the instruction that branches to print that "FATAL: Reg is bad" message, started the FPGA running, and left the C# app to capture any trace dumps. After a few hours the issue occurred, and we had our CPU trace of the 4,096 cycles leading up to the fatal error. Some hundreds of cycles before the dump was initiated, we have the following output.

What on earth is happening here? This is a lesson as to why interrupts have priorities 🙂

I’ve tried to reduce the trace down to minimum and lay it out so it makes sense. There are a few things you need to know about the RPU exception system which have yet to be discussed:

Each core has a Local Interrupt Controller (LINT) which can accept interrupts at any stage of execution, provide the ACK signal to let the requester know the interrupt has been accepted, and then, at a safe point, pass it on to the Control Unit to initiate the transfer of execution to the exception vector. This transfer can only happen after a writeback, hence the STALL stages while it is set up before fetching the first instruction of the exception vector at 0x00000010. If the LINT sees an external interrupt request (EXT_INT, i.e. a timer interrupt) at the same time as a decoder interrupt for an invalid instruction, it will always choose the decoder interrupt over anything else, as that needs to be handled immediately.

And here is what happens above:

  1. We are fetching PC 0x00000328, which happens to be an unsupported instruction which will be emulated by our invalid instruction handler.
  2. As we are fetching, an external timer interrupt fires (Packet 01).
  3. The LINT acknowledges the external interrupt, as there is no higher-priority request pending, and signals to the control unit that an interrupt is pending via LINT_INT (Packet 2).
  4. As we wait for the WRITEBACK phase for the control unit to transfer to the exception vector, PC 0x00000328 decodes as an illegal instruction and DECODER_INT is requested (Packet 5).
  5. The LINT cannot acknowledge the decoder interrupt, as the control unit can only handle a single interrupt at a time, and it's waiting to handle the external interrupt.
  6. The control unit accepts the external LINT_INT, stalls for transfer to the exception vector, and resets the LINT so it can accept new requests (Packet 7).
  7. We start fetching the interrupt vector 0x00000010 (Packet 12).
  8. The LINT sees the DECODER_INT and immediately accepts and acknowledges it.
  9. The control unit accepts the LINT_INT and stalls for transfer to the exception vector, with the PC of the exception being set to 0x00000010 (Packet 20).
  10. Everything breaks: the PC gets set to a value in flux, which just so happened to be in the exception vector (Packet 25).

In short, if an external interrupt fires during the fetch stage of an illegal instruction, the illegal instruction will not be handled correctly and state is corrupted.

This was easily fixed with some further enable logic so that external interrupts are only accepted after fetch and decode. But one hell of an issue to find without the CPU trace dumps!
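
In pseudo-C, the gating fix amounts to something like the following; the names are illustrative, as the real fix is enable logic in the VHDL:

// Illustrative model of the fix: only accept external interrupts once the
// current instruction is past fetch and decode, so a pending decoder
// interrupt for an illegal instruction always wins.
enum stage { STAGE_FETCH, STAGE_DECODE, STAGE_ALU, STAGE_MEMORY, STAGE_WRITEBACK };

int accept_external_int(int ext_int_pending, enum stage s)
{
    return ext_int_pending && s != STAGE_FETCH && s != STAGE_DECODE;
}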

Finishing up

So, as you can see, trace dumps are a great feature to have in RPU. A very simple implementation can yield enough information to work with on problems where the simulator just is not viable. With different trigger options, and the ability to customize the O_DBG signal to further narrow down issues under investigation, it's invaluable. In fact, I'll probably end up putting this system into any similarly complex FPGA project in the future. The HDL will shortly be submitted to the SoC GitHub repo along with the updated core which supports interrupts.