Support for the CivetWeb HTTP server in Zephyr

This blog originally ran on the Antmicro website. For more blogs and articles like this one, visit https://antmicro.com/blog/.

HTTP support in Zephyr

Zephyr has always had a big advantage in the form of its custom-tailored networking stack. As the RTOS continued to grow, more and more networking applications were developed to run on top of it. However, while the networking stack itself proved very useful for its original purpose – proving that Zephyr was a robust and stable choice for IoT devices – its custom nature was becoming a burden. As Zephyr finds its way into more and more use cases, not all of which are tiny and wireless, the need to rally around existing standard networking APIs became increasingly obvious, and some time ago the decision was made to base the stack on the well-known BSD sockets API.

The biggest issue with switching to another networking API was the ensuing necessity to rewrite all the applications and libraries which had been using the previous API so that they would not break, as full backwards compatibility was not an option. To make the transition manageable, the Zephyr networking team decided to temporarily drop support for multiple protocols, including HTTP.

Obviously, that was not an ideal situation, and as a Silver Member of the Zephyr project with a long history of contributions to it, Antmicro was approached by the Zephyr Project Networking community to bring the missing capabilities back fast, so that HTTP-based applications could continue to be built even in the transition period. There were several ways to approach this, the most obvious being:

  • doing what had already been done before, that is implementing our own HTTP support from scratch and tightly integrating it with Zephyr;
  • implementing the HTTP support as an application/sample, allowing others to use our code as a starting point for their Zephyr applications that use HTTP;
  • integrating an already existing third-party HTTP library with Zephyr.

Going the third-party route

As is our standard practice, we leaned towards reusing an existing library, and after discussing it with both our Client and the broader community we agreed that this route would be a valuable addition to the project. Zephyr is all about integrating with external libraries and frameworks, and one of the primary features of its helper meta-tool, West (yes, it’s a pun, in case you wondered), is its multi-repo capability for pooling together code from various sources.

The third-party library route meant we could let the networking stack redesign and reimplementation proceed at its own pace, while we could fast-track to a fully-fledged implementation that had been proven to work before – and test how well Zephyr integrates with quite complex external libraries in the process. Another huge benefit of going down that path was the possibility of testing the newly supported BSD sockets API – using a third-party library which had been working with that API for many years was a great way to verify the correctness and completeness of Zephyr’s implementation.

An additional advantage here is that most HTTP libraries also rely on POSIX APIs, which Zephyr is working to be compliant with as well. Support for the POSIX APIs is still under development, but porting an external application which uses them can serve as a great starting point for improving Zephyr in that area.

CivetWeb turning out to be the best fit

After researching various open-source HTTP implementations, we decided that CivetWeb was the best candidate. CivetWeb’s mission is to provide a permissively licensed, easy-to-use, powerful, C (C/C++) embeddable web server with optional CGI, SSL and Lua support. CivetWeb can be used as a library, adding web server functionality to an existing application, or it can work as a stand-alone web server running on Windows or Linux.

As it turned out, CivetWeb had everything we needed: it can work both as an HTTP client and an HTTP server, it can be easily embedded into an existing application, it can be used as a library, and it is highly customizable, so we could remove all the features we didn’t need – which made it easier to use on the resource-constrained devices that Zephyr is targeting. It uses both the BSD sockets API and the POSIX APIs, making it a great real-life test for Zephyr.

Making CivetWeb work with Zephyr

The project required work on both ends. First, we made it possible for CivetWeb to be compiled as a Zephyr library by preparing a CMake configuration in CivetWeb so it could be included by the Zephyr build system. We also enabled CivetWeb to work on OSes with no filesystem and added several Zephyr-specific modifications.
Then we added it to Zephyr as a West module. The final step was adding a simple sample application which can serve as a quick-start guide for other users.

We used the Microchip SAM E70 Xplained board for development and testing. Running the sample application on it results in the board serving an HTTP page at 10.0.0.111:8080 (or another address, depending on the settings). It serves several URLs which demonstrate various capabilities of the server (like serving static text, handling JSON requests or using cookies). In addition, it can also be used to demonstrate the handling of various HTTP errors (like 404 – not found).

Main page of the CivetWeb Zephyr sample
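
For a feel of what the sample’s handler code involves, here is a minimal sketch of embedding CivetWeb and registering a URL handler. It is illustrative only – the handler name, URL and port are ours, not the sample’s – and error handling is kept to a minimum:

#include "civetweb.h"

// Minimal request handler: replies with a static text body.
static int hello_handler(struct mg_connection *conn, void *cbdata)
{
  mg_printf(conn, "HTTP/1.1 200 OK\r\n"
                  "Content-Type: text/plain\r\n"
                  "Connection: close\r\n\r\n"
                  "Hello from Zephyr!\n");
  return 200; /* non-zero: request was handled; value is the status code */
}

void start_server(void)
{
  /* NULL-terminated list of "key", "value" option pairs */
  const char *options[] = { "listening_ports", "8080", NULL };
  struct mg_callbacks callbacks = { 0 };

  struct mg_context *ctx = mg_start(&callbacks, NULL, options);
  if (ctx != NULL) {
    mg_set_request_handler(ctx, "/hello", hello_handler, NULL);
  }
}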

It can be built like any other Zephyr sample, e.g. for the Atmel SAM E70 Xplained board, run:

west build -b sam_e70_xplained samples/net/sockets/civetweb

For more information about the sample, refer to the README.

Tapping into Open Source

Zephyr is a popular, multi-purpose, security-focused and robust RTOS which owes its capabilities to its active developers and code quality, as well as its open style of governance and flexibility. By turning to standard APIs used in the open source world, Zephyr is able to harness the functionality of numerous existing software packages, making it even easier to build complex solutions that would not be feasible without third-party libraries.
The ability to integrate with a very complex application like CivetWeb to provide an HTTP implementation proves Zephyr’s modularity and versatility.

Antmicro has a long history of integrating great open source projects together – check out our recent work on combining TFLite with Zephyr.

If you have a project which could benefit from using Zephyr’s capabilities with third-party libraries, or are building a product which needs integrating many software components together, feel free to reach out to us at contact@antmicro.com.

First micro-ROS Application on Zephyr RTOS

This tutorial aims to create a new micro-ROS application on Olimex STM32-E407 evaluation board with Zephyr RTOS. It originally ran on the micro-ROS website. For more content like this, click here.

Required hardware

This tutorial uses the following hardware:

  • Olimex STM32-E407
  • Olimex ARM-USB-TINY-H
  • USB-Serial Cable Female

Adding a new micro-ROS app

First of all, make sure that you have a ROS 2 installation.

TIP: if you are familiar with Docker containers, this image may be useful: ros:dashing

On the ROS 2 installation open a command line and follow these steps:

# Source the ROS 2 installation
source /opt/ros/$ROS_DISTRO/setup.bash

# Create a workspace and download the micro-ROS tools
mkdir microros_ws 
cd microros_ws
git clone -b $ROS_DISTRO https://github.com/micro-ROS/micro-ros-build.git src/micro-ros-build

# Update dependencies using rosdep
sudo apt update && rosdep update
rosdep install --from-path src --ignore-src -y

# Build micro-ROS tools and source them
colcon build
source install/local_setup.bash

Now, let’s create a firmware workspace that targets all the required code and tools for Olimex development board and Zephyr:

# Create step
ros2 run micro_ros_setup create_firmware_ws.sh zephyr olimex-stm32-e407

Now you have all the required tools to cross-compile micro-ROS and Zephyr for the Olimex STM32-E407 development board. At this point, you should know that the micro-ROS build system is a four-step workflow:

  1. Create: retrieves all the required packages for a specific RTOS and hardware platform.
  2. Configure: configures the downloaded packages with options such as the micro-ROS application, the selected transport layer or the micro-ROS agent IP address (in network transports).
  3. Build: generates a binary file ready to be loaded onto the hardware.
  4. Flash: loads the micro-ROS software onto the hardware.

micro-ROS apps for Olimex + Zephyr are located at firmware/zephyr_apps/apps. In order to create a new application, create a new folder containing two files: the app code (inside a src folder) and the RMW configuration.

# Creating a new app
pushd firmware/zephyr_apps/apps
mkdir my_brand_new_app
cd my_brand_new_app
mkdir src
touch src/app.c app-colcon.meta
popd

You will also need some other Zephyr-related files: a CMakeLists.txt that defines the build process and a prj.conf where Zephyr is configured. You can find these two files here; for now, it is OK to copy them.

For this example we are going to create a ping pong app where a node sends a ping message with a unique identifier using a publisher, and the same message is received by a pong subscriber. The node will also answer pings received from other nodes with a pong message:

pingpong

To start creating this app, let’s configure the RMW with the required static memory. You can read more about RMW and Micro XRCE-DDS configuration here. The app-colcon.meta file should look like this:

{
    "names": {
        "rmw_microxrcedds": {
            "cmake-args": [
                "-DRMW_UXRCE_MAX_NODES=1",
                "-DRMW_UXRCE_MAX_PUBLISHERS=2",
                "-DRMW_UXRCE_MAX_SUBSCRIPTIONS=2",
                "-DRMW_UXRCE_MAX_SERVICES=0",
                "-DRMW_UXRCE_MAX_CLIENTS=0",
                "-DRMW_UXRCE_MAX_HISTORY=4",
            ]
        }
    }
}

Meanwhile src/app.c should look like the following code:

#include <rcl/rcl.h>
#include <rcl_action/rcl_action.h>
#include <rcl/error_handling.h>
#include "rosidl_generator_c/string_functions.h"
#include <std_msgs/msg/header.h>

#include <rmw_uros/options.h>

#include <stdio.h>
#include <stdlib.h>   // rand()
#include <string.h>   // strlen(), strcmp()
#include <time.h>     // clock_gettime()
#include <unistd.h>   // usleep()

#include <zephyr.h>

#define STRING_BUFFER_LEN 100

// App main function
void main(void)
{
  //Init RCL options
  rcl_init_options_t options = rcl_get_zero_initialized_init_options();
  rcl_init_options_init(&options, rcl_get_default_allocator());
  
  // Init RCL context
  rcl_context_t context = rcl_get_zero_initialized_context();
  rcl_init(0, NULL, &options, &context);

  // Create a node
  rcl_node_options_t node_ops = rcl_node_get_default_options();
  rcl_node_t node = rcl_get_zero_initialized_node();
  rcl_node_init(&node, "pingpong_node", "", &context, &node_ops);

  // Create a reliable ping publisher
  rcl_publisher_options_t ping_publisher_ops = rcl_publisher_get_default_options();
  rcl_publisher_t ping_publisher = rcl_get_zero_initialized_publisher();
  rcl_publisher_init(&ping_publisher, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/ping", &ping_publisher_ops);

  // Create a best effort pong publisher
  rcl_publisher_options_t pong_publisher_ops = rcl_publisher_get_default_options();
  pong_publisher_ops.qos.reliability = RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT;
  rcl_publisher_t pong_publisher = rcl_get_zero_initialized_publisher();
  rcl_publisher_init(&pong_publisher, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/pong", &pong_publisher_ops);

  // Create a best effort pong subscriber
  rcl_subscription_options_t pong_subscription_ops = rcl_subscription_get_default_options();
  pong_subscription_ops.qos.reliability = RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT;
  rcl_subscription_t pong_subscription = rcl_get_zero_initialized_subscription();
  rcl_subscription_init(&pong_subscription, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/pong", &pong_subscription_ops);

  // Create a best effort ping subscriber
  rcl_subscription_options_t ping_subscription_ops = rcl_subscription_get_default_options();
  ping_subscription_ops.qos.reliability = RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT;
  rcl_subscription_t ping_subscription = rcl_get_zero_initialized_subscription();
  rcl_subscription_init(&ping_subscription, &node, ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, Header), "/microROS/ping", &ping_subscription_ops);

  // Create a wait set
  rcl_wait_set_t wait_set = rcl_get_zero_initialized_wait_set();
  rcl_wait_set_init(&wait_set, 2, 0, 0, 0, 0, 0, &context, rcl_get_default_allocator());
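  // (arguments: 2 subscriptions, then 0 guard conditions, timers, clients,
  // services and events, followed by the context and the allocator)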

  // Create and allocate the pingpong publication message
  std_msgs__msg__Header msg;
  char msg_buffer[STRING_BUFFER_LEN];
  msg.frame_id.data = msg_buffer;
  msg.frame_id.capacity = STRING_BUFFER_LEN;
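  // (rosidl C string fields use caller-provided storage: 'data' points at the
  // buffer, 'capacity' is its total size and 'size' the current string length)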

  // Create and allocate the pingpong subscription message
  std_msgs__msg__Header rcv_msg;
  char rcv_buffer[STRING_BUFFER_LEN];
  rcv_msg.frame_id.data = rcv_buffer;
  rcv_msg.frame_id.capacity = STRING_BUFFER_LEN;

  // Set the device ID and sequence number
  int device_id = rand();
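  // Note: rand() is not seeded here, so device_id repeats across reboots;
  // seeding it (e.g. from an entropy source) would make node IDs unique.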
  int seq_no;
  
  int pong_count = 0;
  struct timespec ts;
  rcl_ret_t rc;

  uint32_t iterations = 0;

  do {
    // Clear and set the waitset
    rcl_wait_set_clear(&wait_set);
    
    size_t index_pong_subscription;
    rcl_wait_set_add_subscription(&wait_set, &pong_subscription, &index_pong_subscription);

    size_t index_ping_subscription;
    rcl_wait_set_add_subscription(&wait_set, &ping_subscription, &index_ping_subscription);
    
    // Wait up to 100 ms for incoming messages
    rcl_wait(&wait_set, RCL_MS_TO_NS(100));

    // Check if it is time to send a ping
    if (iterations++ % 50 == 0) {
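      // With an up-to-100 ms wait plus a 10 ms sleep per iteration, 50
      // iterations comes to roughly one ping every 5 seconds when idle.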
      // Generate a new random sequence number
      seq_no = rand();
      sprintf(msg.frame_id.data, "%d_%d", seq_no, device_id);
      msg.frame_id.size = strlen(msg.frame_id.data);
      
      // Fill the message timestamp
      clock_gettime(CLOCK_REALTIME, &ts);
      msg.stamp.sec = ts.tv_sec;
      msg.stamp.nanosec = ts.tv_nsec;

      // Reset the pong count and publish the ping message
      pong_count = 0;
      rcl_publish(&ping_publisher, (const void*)&msg, NULL);
      printf("Ping send seq %s\n", msg.frame_id.data);  
    }
    
    // Check if some pong message is received
    if (wait_set.subscriptions[index_pong_subscription]) {
      rc = rcl_take(wait_set.subscriptions[index_pong_subscription], &rcv_msg, NULL, NULL);

      if(rc == RCL_RET_OK && strcmp(msg.frame_id.data,rcv_msg.frame_id.data) == 0) {
          pong_count++;
          printf("Pong for seq %s (%d)\n", rcv_msg.frame_id.data, pong_count);
      }
    }

    // Check if some ping message is received and pong it
    if (wait_set.subscriptions[index_ping_subscription]) {
      rc = rcl_take(wait_set.subscriptions[index_ping_subscription], &rcv_msg, NULL, NULL);

      // Don't pong our own pings
      if(rc == RCL_RET_OK && strcmp(msg.frame_id.data,rcv_msg.frame_id.data) != 0){
        printf("Ping received with seq %s. Answering.\n", rcv_msg.frame_id.data);
        rcl_publish(&pong_publisher, (const void*)&rcv_msg, NULL);
      }
    }
    
    usleep(10000);
  } while (true);
}

Once the new folder is created, let’s configure our new app with a serial transport on the USB:

# Configure step
ros2 run micro_ros_setup configure_firmware.sh my_brand_new_app --transport serial-usb

When the configuring step ends, just build the firmware:

# Build step
ros2 run micro_ros_setup build_firmware.sh

Once the build has successfully ended, let’s power and connect the board. First, connect the Olimex ARM-USB-TINY-H JTAG programmer to the board’s JTAG port:

Make sure that the board power supply jumper (PWR_SEL) is in the 3-4 position in order to power the board from the JTAG connector:

You should see the red LED light up. It is time to flash the board:

# Flash step
ros2 run micro_ros_setup flash_firmware.sh

Running the micro-ROS app

The micro-ROS app is ready to connect to a micro-ROS-Agent and start talking with the rest of the ROS 2 world.

First of all, create and build a micro-ROS agent:

# Download micro-ROS-Agent packages
ros2 run micro_ros_setup create_agent_ws.sh

# Build micro-ROS-Agent packages, this may take a while.
colcon build
source install/local_setup.bash

Then connect the Olimex development board to the computer using the USB OTG 2 connector (the miniUSB connector that is furthest from the Ethernet port).

TIP: Color codes are applicable to this cable. Make sure to match Olimex Rx with Cable Tx and vice-versa. Remember GND!

Then run the agent:

# Run a micro-ROS agent
ros2 run micro_ros_agent micro_ros_agent serial --dev [device]

TIP: you can use this command to find your serial device name: ls /dev/serial/by-id/*. It will probably be something like /dev/serial/by-id/usb-ZEPHYR_Zephyr_microROS_3536510100290035-if00

Finally, let’s check that everything is working. In another command line, we are going to listen to the ping topic to check whether the Ping Pong node is publishing its own pings:

source /opt/ros/$ROS_DISTRO/setup.bash

# Subscribe to micro-ROS ping topic
ros2 topic echo /microROS/ping

You should see the topic messages published by the Ping Pong node every 5 seconds:

user@user:~$ ros2 topic echo /microROS/ping
stamp:
  sec: 20
  nanosec: 867000000
frame_id: '1344887256_1085377743'
---
stamp:
  sec: 25
  nanosec: 942000000
frame_id: '730417256_1085377743'
---

On another command line, let’s subscribe to the pong topic:

source /opt/ros/$ROS_DISTRO/setup.bash

# Subscribe to micro-ROS pong topic
ros2 topic echo /microROS/pong

At this point, we know that our app is publishing pings. Let’s check if it also answers someone else’s pings, in a new command line:

source /opt/ros/$ROS_DISTRO/setup.bash

# Send a fake ping
ros2 topic pub --once /microROS/ping std_msgs/msg/Header '{frame_id: "fake_ping"}'

Now, we should see our fake ping on the ping subscriber, along with the board’s pings:

user@user:~$ ros2 topic echo /microROS/ping
stamp:
  sec: 0
  nanosec: 0
frame_id: fake_ping
---
stamp:
  sec: 305
  nanosec: 973000000
frame_id: '451230256_1085377743'
---
stamp:
  sec: 310
  nanosec: 957000000
frame_id: '2084670932_1085377743'
---

And in the pong subscriber, we should see the board’s answer to our fake ping:

user@user:~$ ros2 topic echo /microROS/pong
stamp:
  sec: 0
  nanosec: 0
frame_id: fake_ping
---


Designing a RISC-V CPU in VHDL, Part 19: Adding Trace Dump Functionality

Written by Colin Riley, an Engineer and Writer at Domipheus Labs

This is part of a series of posts detailing the steps and learning undertaken to design and implement a CPU in VHDL. You can find more articles from Colin on his blog at http://labs.domipheus.com/blog/. To read more from this series, click here.

For those who follow me on Twitter, you’ll have seen my recent tweets regarding Zephyr OS running on RPU. This was a huge amount of work to get running, with most of the debugging done on the FPGA itself. For those new to FPGA development, trying to debug on-chip can be a very difficult and frustrating experience. Generally, you want to debug in the simulator – but when potential issues are influenced by external devices such as SD cards, timer interrupts, and hundreds of millions of cycles into the boot process of an operating system, simulators may not be feasible.

Blog posts on the features I added to RPU to enable Zephyr booting, such as proper interrupts, exceptions and timers, are coming – but none of it would have been possible without a feature of the RPU SoC I have not yet discussed.

CPU Tracing

Most real processors have hardware debugging features built in, and one of the most useful low-level tools is tracing. This is when, over an arbitrary time slice, low-level details on the inner operation of the core are captured into some buffer, before being streamed elsewhere for analysis and state reconstruction later.

Note that this is a one-way flow of data. It is not interactive, like the debugging most developers know. It is mostly used for performance profiling, but for RPU it would be an ideal debugging aid.

Requirements

For the avoidance of doubt, I’m defining “a trace” to be one block of valid data which is dumped to a host PC for analysis. For us, dumping means streaming the data out via UART to a development PC. Multiple traces can be taken, but when the data transfer is initiated, the data needs to be a real representation of what occurred immediately preceding the request to dump the trace. The data contained in a trace is always being captured on the device, so that if a request is made, the data is available.

These requirements call for a circular buffer which is continually recording the state. I’ll define exactly what the data is later – but for now, the data is defined as 64 bits per cycle. Plenty for a significant amount of state to be recorded, which will be required in order to perform meaningful analysis. We have a good amount of block RAMs on our Spartan-7 50 FPGA, so we can dedicate 32KB to this circular buffer quite easily. 64 bits per entry into 32KB gives us 4,096 cycles of data. Not that much, you’d think, for a CPU running at over 100MHz, but you’d be surprised how quickly RPU falls over when it gets into an invalid state!

It goes without saying that our implementation needs to be non-intrusive. I’m not currently using the UART connected to the FTDI USB controller, as our logging output is displayed graphically via a text-mode display over HDMI. We can use this without impacting existing code. Our CPU core will expose a debug trace bus signal, which will be the data captured.

We’ve mentioned the buffer will be in a block RAM, but one aspect of this is that we must be wary of the observer effect. This is very much an issue for performance profiling, as streaming out data from various devices usually goes through memory subsystems, which increases bandwidth requirements and leads to more latency in the memory operations you are trying to trace. Our trace system should not affect the execution characteristics of the core at all. As we are using a development PC to receive the streamed data, we can completely segregate all data paths for the trace system, and remove the block RAM from the memory-mapped area which is currently used for code and data. With this block RAM separate, we can ensure it’s set up as a true dual-port RAM with the native 64-bit data width. One port will be for writing data from the CPU, on the CPU clock domain. The second port will be used for reading the data out at a rate dictated by the UART serial baud – much, much slower. Doing this ensures tracing will not impact execution of the core at any point, meaning our dumped data is much more valuable.

Lastly, we want to trigger these dumps at a point in time when we think an issue has occurred. Two immediate trigger types come to mind in addition to a manual button.

  1. Memory address
  2. Comparison with the data which is to be dumped; i.e., pipeline status flags combined with instruction types.

Implementation

The implementation is very simple. I’ve added a debug signal output to the CPU core entity. It’s 64 bits of data, consisting of 32 status bits and a 32-bit data value, as defined below.

This data is always being output by the core, changing every cycle. The data value can be various things: the PC when in a STAGE_FETCH state, the ALU result, the value we’re writing to rD in WRITEBACK, or a memory location during a load/store.
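
Conceptually, the debug word splits like this (a C sketch for illustration only; the exact status-bit layout came from a diagram not reproduced here, so the field comments are indicative):

#include <stdint.h>

// Sketch of the 64-bit O_DBG word emitted every cycle.
typedef struct {
  uint32_t status; /* pipeline stage, opcode class, flags such as REG_WR, INT_EN */
  uint32_t data;   /* PC, ALU result, writeback value or memory address */
} rpu_dbg_word_t;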

We only need two new processes for the system:

  • trace_streamout: manages the streaming out of bytes from the trace block ram
  • trace_en_check: inspects trigger conditions in order to initiate a trace dump which trace_streamout will handle

The BRAM used as the circular trace buffer is configured as 64-bits word length, with 4096 addresses. It was created using the Block Memory Generator, and has a read latency of 2 cycles.

We will use a clock cycle counter which already exists to dictate write locations into the BRAM. As it’s used as a circular buffer, we simply take the lower 12 bits of the clock counter as the address into the BRAM.
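
In software terms, the write-side addressing is equivalent to this little C model (illustrative only – the real logic is VHDL):

#include <stdint.h>

#define TRACE_DEPTH 4096u              /* 32KB / 8 bytes per entry */
#define TRACE_MASK  (TRACE_DEPTH - 1u) /* lower 12 bits of the counter */

static uint64_t trace_buf[TRACE_DEPTH];

/* Every cycle: store the 64-bit debug word at (cycle_count mod 4096). */
static inline void trace_capture(uint64_t cycle_count, uint64_t dbg_word)
{
  trace_buf[cycle_count & TRACE_MASK] = dbg_word;
}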

Port A of the BRAM is the write port, with its address line tied to the bits noted above. It is enabled by a signal only when the trace_streamout process is idle. This is so that when we do stream out the data we want, it’s not polluted with new data while our slow streamout to UART is active; that new data is effectively lost. As this port captures the CPU core’s O_DBG output, it’s clocked at the CPU core clock.

Port B is the read port. It’s clocked using the 100MHz reference clock (which also drives the UART – albeit then subsampled via a baud tick). It’s enabled when a streamout state is requested, and reads an address dictated by the trace_streamout process.

The trace_streamout process, when the current streamout state is idle, checks for a dump_enable signal. Upon seeing this signal, the last write address is latched from the lower 12 bits of the cycle counter. We also set a streamout location to be that last write address plus one. This location is what is fed into Port B of the BRAM / circular trace buffer. When we change the read address on Port B, we wait some cycles for the value to properly propagate out. During this preload stall, we also wait for the UART TX to become ready for more data. The transmission is performed significantly slower than the clock that trace_streamout runs at, and we cannot write to the TX buffer if it’s full.

The UART I’m using is provided by Xilinx and has an internal 16-byte buffer. We wait for a ready signal, since then we know that writing our 8 bytes of debug data (remember, 64 bits) quickly into the UART TX will succeed. In addition to the 8 bytes of data, I also send 2 bytes of magic number data at the start of every 64-bit packet as an aid to the receiving logic; we can check the first two bytes for these values to ensure we’re synced correctly in order to parse the data eventually.
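
For illustration, the framing can be parsed on the host with something like the following C sketch (my actual utility is written in C#, and the magic byte values here are placeholders, not the real ones):

#include <stddef.h>
#include <stdint.h>

#define MAGIC0 0xAAu /* placeholder */
#define MAGIC1 0x55u /* placeholder */

/* Scan a received byte stream for 2-byte magic + 8-byte data packets;
 * returns the number of 64-bit debug words decoded into 'out'. */
size_t parse_trace_stream(const uint8_t *buf, size_t len,
                          uint64_t *out, size_t max_out)
{
  size_t n = 0;
  for (size_t i = 0; i + 10 <= len && n < max_out; ) {
    if (buf[i] == MAGIC0 && buf[i + 1] == MAGIC1) {
      uint64_t word = 0;
      for (int b = 0; b < 8; b++) {  /* assemble the 64-bit word */
        word |= (uint64_t)buf[i + 2 + b] << (8 * b);
      }
      out[n++] = word;
      i += 10;                       /* advance past the whole packet */
    } else {
      i++;                           /* not synced: slide to find magic */
    }
  }
  return n;
}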

After the last byte is written, we increment our streamout location address. If it’s not equal to the last write address we latched previously, we move to the preload stall and move the next 8 bytes of trace data out. Otherwise, we are finished transmitting the entire trace buffer, so we set our state back to idle and re-enable new trace data writes.

Triggering streamout

Triggering a dump using dump_enable can be done in a variety of ways. I have a physical push-button on my Arty S7 board set to always enable a dump, which is useful for finding out where execution currently is in a program. I also have a trigger on reading a certain memory address. This is good if there is an issue triggering an error which you can reliably track to a branch of code execution: having a memory address in that code branch used as a trigger will dump the cycles leading up to that branch being taken. There is one other type of trigger – relying on the CPU O_DBG signal itself, for example triggering a dump when we encounter a decoder interrupt for an invalid instruction.

I hard-code these triggers in the VHDL currently, but it’s feasible that they could be made configurable programmatically. The dump itself could also be triggered via a write to a specific MMIO location.

Parsing the data on the Debug PC

The UART TX on the FPGA is connected to the FTDI USB-UART bridge, which means when the FPGA design is active and the board is connected via USB, we can just open the COM port exposed via the USB device.

I made a simple C# command line utility which just dumps the packets in a readable form. It looks like this:

[22:54:19.6133781]Trace Packet, 00000054,  0xC3 40 ,   OPCODE_BRANCH ,     STAGE_FETCH , 0x000008EC INT_EN , :
[22:54:19.6143787]Trace Packet, 00000055,  0xD1 40 ,   OPCODE_BRANCH ,    STAGE_DECODE , 0x04C12083 INT_EN , :
[22:54:19.6153795]Trace Packet, 00000056,  0xE1 40 ,     OPCODE_LOAD ,       STAGE_ALU , 0x00000001 INT_EN , :
[22:54:19.6163794]Trace Packet, 00000057,  0xF1 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6183798]Trace Packet, 00000058,  0x01 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6183798]Trace Packet, 00000059,  0x11 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6193799]Trace Packet, 00000060,  0x20 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6203802]Trace Packet, 00000061,  0x31 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6213808]Trace Packet, 00000062,  0x43 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6213808]Trace Packet, 00000063,  0x51 C0 ,     OPCODE_LOAD , STAGE_WRITEBACK , 0x00001CDC REG_WR  INT_EN , :

You can see some data given by the utility such as timestamps and a packet ID. Everything else is derived from flags in the trace data for that cycle.

Later I added some additional functionality, like parsing register destinations and outputting known register/memory values to aid when going over the output.

[22:54:19.6213808]Trace Packet, 00000062,  0x43 C0 ,     OPCODE_LOAD ,    STAGE_MEMORY , 0x0000476C REG_WR  INT_EN , :
[22:54:19.6213808]Trace Packet, 00000063,  0x51 C0 ,     OPCODE_LOAD , STAGE_WRITEBACK , 0x00001CDC REG_WR  INT_EN , :
MEMORY 0x0000476C = 0x00001CDC
REGISTER ra = 0x00001CDC

I have also been working on a rust-based GUI debugger for these trace files, where you can look at known memory (usually the stack) and register file contents at a given packet by walking the packets up until the point you’re interested in. It was an excuse to get to know Rust a bit more, but it’s not completely functional and I use the command line C# version more.

The easiest use for this is the physical button for dumping the traces. When bringing up some new software on the SoC, it rarely works the first time and ends up in an infinite loop of some sort. Using the STAGE_FETCH packets, which contain the PC, I can look at an objdump and see immediately where we are executing, without impacting the execution of the code itself.

Using the data to debug issues

Now to spoil a bit of the upcoming RPU interrupts/Zephyr post with an example of how these traces have helped me – I think an example of a real problem the trace dumps helped solve is required.

After implementing external timer interrupts, invalid instruction interrupts and system calls – and fixing a ton of issues – I had the Zephyr Dining Philosophers sample running on RPU in all its threaded, synchronized glory.

Why do I need invalid instruction interrupts? Because RPU does not implement the M RISC-V extension. So multiply and divide hardware does not exist. Sadly, somewhere in the Zephyr build system, there is assembly with mul and div instructions. I needed invalid instruction interrupts in order to trap into an exception handler which could software emulate the instruction, write the result back into the context, so that when we returned from the interrupt to PC+4 the new value for the destination register would be written back.

It’s pretty funny to think that for me, implementing that was easier than trying to fix a build system to compile for the architecture intended.

Anyway, I was performing long-running tests of Dining Philosophers when I hit the fatal error exception handler for trying to emulate an instruction it didn’t understand. I was able to replicate it, but it could take hours of running before it happened. The biggest issue? The instruction we were trying to emulate was at PC 0x00000010 – the start of the exception handler!

So, I set up the CPU trace trigger to activate on the instruction that branches to print that “FATAL: Reg is bad” message, started the FPGA running, and left the C# app to capture any trace dumps. After a few hours the issue occurred, and we had our CPU trace of the 4096 cycles leading up to the fatal error. Some hundreds of cycles before the dump initiated, we have the following output.

What on earth is happening here? This is a lesson as to why interrupts have priorities 🙂

I’ve tried to reduce the trace down to a minimum and lay it out so it makes sense. There are a few things you need to know about the RPU exception system which have yet to be discussed:

Each core has a Local Interrupt Controller (LINT) which can accept interrupts at any stage of execution, provide the ACK signal to let the requester know it’s been accepted, and then, at a safe point, pass it on to the Control Unit to initiate the transfer of execution to the exception vector. This transfer can only happen after a writeback, hence the STALL stages as it’s set up before fetching the first instruction of the exception vector at 0x00000010. If the LINT sees an external interrupt request (EXT_INT – timer interrupts) at the same time as a decoder interrupt for an invalid instruction, it will always choose the decoder over anything else – as that needs to be handled immediately.

And here is what happens above:

  1. We are fetching PC 0x00000328, which happens to be an unsupported instruction which will be emulated by our invalid instruction handler.
  2. As we are fetching, an external timer interrupt fires (Packet 01).
  3. The LINT acknowledges the external interrupt, as there is no higher-priority request pending, and signals to the control unit that an interrupt is pending via LINT_INT (Packet 02).
  4. As we wait for the WRITEBACK phase for the control unit to transfer to the exception vector, PC 0x00000328 decodes as an illegal instruction and DECODER_INT is requested (Packet 05).
  5. The LINT cannot acknowledge the decoder interrupt, as the control unit can only handle a single interrupt at a time, and it’s waiting to handle the external interrupt.
  6. The control unit accepts the external LINT_INT, stalls for transfer to the exception vector, and resets the LINT so it can accept new requests (Packet 07).
  7. We start fetching the interrupt vector 0x00000010 (Packet 12).
  8. The LINT sees the DECODER_INT and immediately accepts and acknowledges it.
  9. The control unit accepts the LINT_INT and stalls for transfer to the exception vector, with the PC of the exception being set to 0x00000010 (Packet 20).
  10. Everything breaks: the PC gets set to a value in flux, which just so happened to be in the exception vector (Packet 25).

In short, if an external interrupt fires during the fetch stage of an illegal instruction, the illegal instruction will not be handled correctly and state is corrupted.

Easily fixed with some further enable logic so that external interrupts are only accepted after fetch and decode. But one hell of an issue to find without the CPU trace dumps!
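
For illustration only (the real fix lives in the VHDL), the corrected acceptance rule amounts to something like this C model, with the decoder interrupt keeping top priority:

typedef enum { STAGE_FETCH, STAGE_DECODE, STAGE_ALU,
               STAGE_MEMORY, STAGE_WRITEBACK } stage_t;
typedef enum { INT_NONE, INT_EXTERNAL, INT_DECODER } int_src_t;

/* Decoder interrupts always win; external interrupts are now gated on
 * the pipeline stage, so they cannot race an illegal-instruction fetch. */
static int_src_t lint_select(int decoder_pending, int ext_pending,
                             stage_t stage)
{
  if (decoder_pending) {
    return INT_DECODER;
  }
  if (ext_pending && stage != STAGE_FETCH && stage != STAGE_DECODE) {
    return INT_EXTERNAL;
  }
  return INT_NONE;
}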

Finishing up

So, as you can see, trace dumps are a great feature to have in RPU. A very simple implementation can yield enough information to work with on problems where the simulator just is not viable. With different trigger options, and the ability to customize the O_DBG signal to further narrow down issues under investigation, it’s invaluable. In fact, I’ll probably end up putting this system into any similarly complex FPGA project in the future. The HDL will shortly be submitted to the SoC GitHub repo along with the updated core which supports interrupts.

Microwatt and the POWER ISA support in Renode

This blog originally ran on the Antmicro website. For more blogs and articles like this one, visit https://antmicro.com/blog/.

In August 2019, the IBM-initiated OpenPOWER Foundation open sourced the POWER Instruction Set Architecture (ISA), making it the second major open computer architecture after RISC-V. The decision to open the ISA has created an interesting alternative to proprietary solutions in the server room, especially where security and openness are key, as POWER, with its mainline software support and open firmware approach, has been an established solution in data centers for years.

To start building an open ecosystem around this ISA early, IBM immediately followed up the announcement with an open source softcore implementation of POWER: Microwatt. In its relatively short existence, Microwatt has found its way into many other open source projects, reflecting the community’s excitement about “RISC-V’s older brother”.

OpenPower and Renode logos

At Antmicro we are always looking for opportunities to build bridges between various open source hardware and software communities and projects, so the obvious choice was to implement support for the 64-bit PowerPC (a subset of the POWER ISA) instructions in Renode – our open source, multi-architecture, heterogeneous multi-core capable simulator for software development and software-hardware co-development. Thus, POWER has become the second major open source ISA in Renode’s portfolio after RISC-V. As a result, users are now able to simulate POWER-based nodes in a heterogeneous, complex environment using the powerful debugging and testing capabilities of our framework.

Apart from supporting processors based on the POWER architecture, Renode can now also emulate basic peripherals and platform descriptions for Microwatt. To run the Microwatt demo, all you need to do is install Renode for your OS as described in the README, run it and use the following command: start @scripts/single-node/microwatt.resc. The demo contains everything you need, including the sample binary (which you can later exchange for your own using one command). For a full list of demos with “batteries included”, see the “supported boards” section of our documentation.
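
To make the monitor step concrete, the session looks like this (the (monitor) prompt below is Renode’s own):

(monitor) start @scripts/single-node/microwatt.resc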

MicroWatt demo

Although the demo uses a MicroPython binary, Microwatt also has experimental support for Zephyr RTOS – and since very recently it can run Linux as well! Furthermore, a Chisel version is in the works, leveraging the advantages of the Scala-based HDL framework to simplify and parametrize the code of the POWER core, preparing it to address more scenarios and use cases. It has also recently been integrated with LiteX, an SoC generator we work with and contribute to very frequently – so it’s quickly becoming part of Antmicro’s standard swiss-army-knife toolset of open tools, IPs, hardware and software for FPGAs.

In the broader context, implementing POWER support in Renode means that developers can now test their applications based on this ISA before running them on actual hardware, experiment with their complete POWER-based SoC implementations before committing to RTL, or co-simulate between an ISS and an FPGA or Verilator. IBM’s Microwatt is the first open source implementation of the ISA, but the community is now expecting more activity in this area from the OpenPOWER Foundation and more open source CPU releases in the future. Our simulation framework is now ready for those developments, and we at Antmicro will be watching this space with a lot of interest as well.

If you’re interested in creating an open FPGA design using the POWER ISA, building a custom POWER based product, or accelerating your workflow with simulation and CI, reach out to us at contact@antmicro.com – we are sure we can help you.

Bluetooth Mesh Developer Study Guide v2.0

Written by Martin Woolley, Senior Developer Relations Manager, Bluetooth Developer Relations Team

This blog originally ran on the Bluetooth website. For more content like this, visit the Bluetooth SIG blog.

The Bluetooth® mesh specification was adopted in the summer of 2017 and has already been qualified in almost five hundred products.

Bluetooth mesh allows networks of tens of thousands of Bluetooth devices to be created so that, for example, every device and system in a large building can be monitored, controlled, and participate in automation scenarios.

To help developers learn about Bluetooth mesh networking, we created the Bluetooth Mesh Developer Study Guide. Study guides are self-paced educational resources which cover both the theory and practical steps involved in developing Bluetooth software.

Version 2.0 of the Bluetooth Mesh Developer Study Guide has been released.

Version 2.0 Highlights

The Bluetooth Mesh Developer Study Guide uses the Zephyr RTOS for coding exercises and to illustrate what tends to be involved when developers implement Bluetooth mesh. See https://www.zephyrproject.org/.

Bluetooth Mesh Developer Study Guide 2.0 has been upgraded to use version 1.14.1 of Zephyr, which has Long Term Support (LTS) status and, most importantly, includes a qualified version of the Bluetooth mesh profile. See https://launchstudio.bluetooth.com/ListingDetails/95153.

One of the advantages of using Zephyr is that hundreds of developer boards are supported by the OS and SDK. Users of Bluetooth Mesh Developer Study Guide V2.0 are now free to choose the boards they prefer to use, although we do provide a bill of materials which reflects the equipment which was used in creating and testing the exercises.

The use cases and Bluetooth mesh models covered now include switching lights on or off using the generic on/off mesh models and changing the colour of lights using the light HSL mesh models. If you don’t know what a mesh model is, don’t worry – it’s all explained in the theory part of the study guide!

Provisioning is covered in detail in this new release. Guided at every step, developers implement the device code which makes it possible to securely provision their mesh node using a suitable smartphone application.

The Bluetooth Mesh Proxy Study Guide

Web, desktop, and mobile application developers who want to know how to create GUI applications with which to monitor or control devices in a Bluetooth mesh network should download the companion resource, the Bluetooth Mesh Proxy Study Guide.

The Bluetooth Mesh Developer Study Guide is available for download from the Resources section of the bluetooth.com website.

Open hardware FPGA platform for multi-camera systems

This blog originally ran on the Antmicro website. For more blogs and articles like this one, visit https://antmicro.com/blog/.

Advanced multi-camera systems often require the low latency, high bandwidth and energy-efficiency that FPGA solutions can provide.

The deep control over hardware and software working in tandem offered by FPGAs can be a great fit for applications such as real-time object detection and tracking, signal conversion, stereovision, as well as image compression, overlays and ISP processing.

The challenge with FPGA development however is that much of the ecosystem revolves around proprietary, vendor-specific technologies, platforms and tools. At Antmicro, we are working to change that and push FPGAs towards a more open source and software-driven approach, and with that, more widespread adoption, one step at a time.

Zynq Video Board – a carrier board not only for Zynq

Antmicro’s Zynq Video Board is an open hardware carrier board we released on GitHub to drive high-end video applications. Despite the Zynq-centric name, the board supports all current – and future – Mars FPGA and FPGA SoC modules offered by our partner Enclustra.

ZVB front

We have used the Zynq Video Board and many similar platforms to build advanced custom video streaming and processing products for our clients, including gate logic and corresponding custom Zephyr and Linux drivers. We released the Zynq Video Board on a permissive license to push towards an open source ecosystem that can make high-end camera projects simpler to kickstart.

The board joins a long list of open source PCB contributions from Antmicro, such as our NVIDIA Jetson Nano / Xavier NX baseboard, Google Coral baseboard, or our own chiplet-based GEM ASIC development platform.

Board features

The default module supported by the board is the Mars ZX3, with the popular Zynq-7020 FPGA SoC – a flexible combination of a dual-core ARM Cortex-A9 processor and 85K Artix-7 programmable logic cells.

The Zynq Video Board break-routes a typical set of I/O interfaces from the Mars module:

  • Gigabit Ethernet
  • microSD
  • USB 2.0 host with on-board hub with two regular USB connectors for downstream ports
  • JTAG connector supporting Xilinx Platform Cable
  • two Digilent Pmod connectors for external accessories
  • eight general purpose LEDs and four buttons for debugging and testing

ZVB overview

The board also exposes a debug Micro USB providing access to a quad-channel USB/serial converter – FTDI FT4232H. One of the channels is dedicated to controlling and managing the operating system running on the ARM core inside the Zynq-7020 using a serial debug console, whereas the other three channels are connected to the FPGA fabric inside the Zynq, enabling a serial console and a JTAG to be interfaced with the soft-cores in the FPGA. As all the debug interfaces are available to the host PC platform over a single USB connection, software debugging gets hugely simplified.

A 50-pin FFC connector with a unified pin-out present on the Zynq Video Board matches a variety of video accessories designed by Antmicro, including the OV9281 Camera Board, recently released as an open hardware design. In its current configuration, the camera interface allows connecting up to two video sources, with signals transmitted over a 2-lane MIPI CSI-2 interface. The CSI data lanes are connected to the differential I/Os of the Mars module through a dedicated resistor net implementing a D-PHY interface, enabling CSI signals to be received by IP cores in the FPGA.
A standard HDMI connector break-routes a set of Mars I/O differential signals and ensures ESD protection and signal conditioning. By default, the HDMI interface connector allows implementing HDMI output from IP cores in the FPGA.

FPGA-oriented services and use cases

Antmicro’s open hardware Zynq Video Board is a flexible evaluation platform suitable for a variety of applications, such as machine vision and the development of advanced multi-core and heterogeneous processing systems, data encoding and advanced real-time control tasks.

Our FPGA-related services focus on creating custom hardware, FPGA gateware and software for advanced products our customers want to build, and the ZVB is ideal as a starting point.

If an Artix-7 FPGA is used in place of the Zynq, we can offer designs based on a configurable soft RISC-V SoC builder which can run Zephyr or Linux, with relevant I/O drivers contributed by Antmicro.

ZVB setup

For Zynq/UltraScale+, we have created the Enclustra Build Environment, a configurable build system for all Enclustra FPGA SoC modules – and offer Linux customization services including interfacing hardware and software via dedicated Linux drivers. We have also built many asymmetric multi-processing (AMP) solutions for Zynq and UltraScale+ for use cases which need to combine real-time operation with complex Linux applications.

Our experience and use of software-driven, open source-based methodologies in the FPGA area makes us uniquely positioned to assist our customers in completing advanced projects involving machine vision and other industrial IoT applications. If you are about to start one, do not hesitate to reach out to us at contact@antmicro.com.

Zephyr Project marks remarkable milestone, proving we’re on to something big

John Round, Software R&D Fellow, NXP Semiconductors, and Zephyr Project Governing Board Member

At the end of April, the Zephyr Project passed a huge and significant milestone, hitting 40K commits to the project since its inception. There can be no greater sign of support for a community-driven project than to have global developers invest their time and effort to both collaborate and contribute to the mission of delivering a truly open source and open governance RTOS. We’re on to something big here!

In just a few short years, the Zephyr Project has grown to leverage the collective expertise of several hundred developers intent on delivering an SoC-neutral, class-leading, commercially deployable RTOS platform. The Zephyr RTOS is designed for use in any embedded application where configurability, modularity, and high functionality ‘out of the box’ are required. For example, current Zephyr RTOS capabilities include an LTS release along with support for Bluetooth (4.2 and 5.0) and USB, further demonstrating that Zephyr is not ‘just another’ RTOS kernel.

As we look to the future of the Zephyr Project, we see literally endless potential. Active project committees are researching, planning and driving developments in areas such as security and functional safety, and ultimately expanding its reach to an ever wider range of end markets.

I have been on the Zephyr Project Board for a few years now and can attest that the sustained commitment and energy from the member companies is quite incredible. We are fortunate to have the guidance and assistance of the Linux Foundation, which ensures both open governance and high integrity within the management of the project, holding all members accountable to a high standard. Many software projects and their providers claim to be open source, but as we all know, that can be a claim of convenience. Without a commitment to a strict open governance structure that drives a fundamental difference in the quality, longevity and inclusiveness of the project, a project falls well short of being truly open source.

Participating in the creation and development of the Zephyr Project will be a legacy that all involved should be rightly proud of. For years to come, we will point to next generation products enabled with the Zephyr RTOS, the truly open source, openly auditable, high functionality RTOS platform that supports all major SoC architectures and delivers security and high functionality for customers.

Congratulations to the teams and community involved in the delivery of such a significant milestone, and I look forward to seeing the expansion of our thriving developer community in the future.  Let’s keep the momentum going.

If you are interested in contributing to the Zephyr Project, please see our Contributor Guide or our Getting Started Guide here. Join the conversation or ask questions on our Slack channel or Mailing List.

Can Smart PPE help fight COVID-19 by providing Worker Contact Tracing and Social Distancing?

Written by Mathieu Destrian, CEO of Intellinium

This blog originally ran on the Intellinium website. For more content like this, visit https://intellinium.io/news/.

Intellinium, in collaboration with BIANCO (a French construction company), is currently working on a new smart PPE feature (based on Intellinium’s Safety Pods technology, which is powered by Nordic Semiconductor and Zephyr RTOS) that would help protect construction workers against the COVID-19 coronavirus pandemic. The first tests in the field will start this week.

Safety Pod attached to shoes or boots enabling smart PPE features

The principle is to alert a worker through a vibrating signal on his smart safety shoes when he’s at risk of breaching social distancing (someone is nearby, at a distance of about 1 to 3 m). When the worker gets this signal, he has two choices: either he can put on a mask (an FFP2 mask as standardized by AFNOR, for example) or he can move away from the other worker(s) nearby. Based on worker feedback, we have found that wearing an FFP2 mask during a full working day is not possible. Another impractical solution would be to add to the hard hat a face protection layer that would be worn the full day.

It would also be possible to tell a given worker whether he has been exposed to someone (anonymously) suffering from the COVID-19 virus during the last 2 weeks (which is the state-of-the-art knowledge about incubation time). This will depend entirely on a voluntary declaration from the infected worker. Based on that information, the worker can increase his alertness to potential COVID-19 symptoms such as fever (but we know that temperature is not the only symptom).

For employers (public or private sector), the benefit is to ensure that worker protection is effective even when workers are back at work after the lock-down period. It’s also important for employers to provide innovative tools to fight this pandemic crisis. And smart PPE can really be helpful in this situation, as it brings smart and connected algorithms that traditional PPE cannot.

For the community, it’s also critical, as the lock-down period kills economies like the virus kills people. The human and financial burden is terrible, and both issues shall be addressed simultaneously if possible. Thousands, if not millions, of companies will probably go bankrupt, and it’s critical to keep on working while protecting workers. Unemployment was already a problem before the crisis, and we must ensure that the unemployment level does not reach a threshold that could add political turmoil to the current crisis. Europe knows better than anyone else what the human price of extreme social and political crises can be, and it’s much scarier than COVID-19…

Data privacy protection has always been a top priority for Intellinium, and an innovative COVID-19 solution cannot compromise on it (the solution shall also meet the requirements of the European GDPR law). Therefore, this COVID-19 feature has been developed with privacy-by-design and security-by-design paradigms. For instance, there is no location or end-user name associated with this feature, and all data are totally and definitively anonymised after 1 month. Each worker must give his explicit consent. For those wishing not to participate, an anonymous ID is available which can be associated with a specific team within a given organisation.

There are some public initiatives such as the Stop Covid mobile app on smartphones (based on an EU coalition of tech organisations backing a ‘privacy-preserving’ standard for COVID-19 contact tracing called PEPP-PT, or the ROBERT protocol?), or even worse, Google & Apple contact tracing (…). Unfortunately, there are several problems with this initiative (and all other mobile-app-based COVID-19 applications):

  1. The solution is based on Bluetooth Low Energy (so you need a BLE-enabled smartphone), and of course you need a smartphone with the right Android or iOS version. Does anyone know how many smartphones fit these conditions in France?
  2. What about foreigners? If someone enters France and does not have the mobile app, then the solution is pretty useless because it won’t help protect anyone. The person using this mobile app will feel safe reading the results every day, but she might have been infected by someone not using the app.
  3. As BLE will probably be always-on, the smartphone battery might be severely impacted. Will citizens be informed about that?
  4. What about the data plan? Who is going to pay for the extra data charges? Will it be supported by citizens? By the way, how much data transfer is required/expected per week?
  5. A huge database will be created, potentially covering millions of citizens. It might attract hackers and be a potential threat to privacy rights. The problem is not really gathering data, nor ensuring the system is safe, but creating, in a couple of days, a unique app gathering millions of pieces of sensitive data.
  6. Does anyone know about the BLE advertising packet encryption? Said differently, is there a way for a hacker to spoof another app’s ID? If it’s possible to fake an ID, then the cure would be worse than the disease.
  7. Another potential issue is that BLE distance measuring is far from being 10 cm accurate… especially when you cannot ensure that the device is always worn the same way on the same place of the human body (BLE doesn’t like water, which the body is mainly made of). So, can we trust a BLE-based distance application on a mobile phone to reach a 1 to 3 m detection range? Not sure.
  8. Last but not least, some mobile apps use heat or some questions to assess whether you are potentially infected. Those apps should be forbidden, as they are not only dangerous to people anxious by nature but totally inefficient, as infected people can be asymptomatic (no external symptoms). The false positive rate can break the trust in the overall system.

There are other alternatives based on BLE wearables and we believe that those products can better fit the requirements.

The main advantages of our approach are:

  1. We leverage an existing device designed to resist the harsh working environment.
  2. We offer not only one feature but a full safety bundle through an all-in-1 smart PPE (Panic Button, Man-Down…).
  3. We encrypt our BLE advertising messages for message integrity and authentication. That way, no one can break the trust.
  4. Our Safety Pod is attached to (safety) shoes or boots. That way, you ensure that the worker alway wears his protective device and for the worker he’s confident about not forgetting his protection. As we know where the device is, it’s slightly easier to ensure distance measurement (but of course far from being 1 cm range).
  5. As a first safety-barrier reminder, we immediately notify the end-user each time social distancing is not respected. Vibration is a discreet, unobtrusive way to alert someone, and vibrations on the shoe are reliably felt.
  6. We also provide a simple way for the end-user to acknowledge the social-distancing signal or to temporarily deactivate the system, for instance to indicate that he is wearing the right protective equipment (mask…).
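Point 3 above deserves a concrete illustration. Intellinium has not published its on-air format, so the following is only a sketch of one common technique for authenticating advertisements, using mbedTLS’s AES-CMAC: a truncated MAC over an anonymous ID and a monotonic counter lets receivers reject spoofed or replayed frames. The struct layout, field sizes, and key handling are all assumptions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <mbedtls/cipher.h>
#include <mbedtls/cmac.h>

/* Hypothetical advertising payload: an anonymous rotating ID and a
 * monotonic counter, protected by a truncated AES-128-CMAC tag. */
struct adv_payload {
    uint8_t  dev_id[4];   /* anonymous rotating identifier */
    uint32_t counter;     /* monotonic, thwarts replay */
    uint8_t  tag[4];      /* truncated AES-128-CMAC */
};

static int adv_sign(struct adv_payload *p, const uint8_t key[16])
{
    uint8_t full_tag[16];
    const mbedtls_cipher_info_t *info =
        mbedtls_cipher_info_from_type(MBEDTLS_CIPHER_AES_128_ECB);

    if (info == NULL) {
        return -1;
    }

    /* The MAC covers everything except the tag field itself;
     * note that the key length is given in bits. */
    int ret = mbedtls_cipher_cmac(info, key, 128,
                                  (const uint8_t *)p,
                                  offsetof(struct adv_payload, tag),
                                  full_tag);
    if (ret == 0) {
        memcpy(p->tag, full_tag, sizeof(p->tag));
    }
    return ret;
}
```

A receiver would recompute the CMAC over the same bytes, compare it against the received tag (ideally in constant time), and drop frames whose counter does not advance.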

Although our Safety Pod currently focuses on construction and industrial sites, it could also be applied to warehouses to help protect workers in very tight areas (a real concern for Amazon, for instance). It could also be used by seniors and vulnerable adults exposed to COVID-19. The main issue for us now is scaling up production while lead times and production capacity are under pressure everywhere…

Introduction of Coding Guidelines for Zephyr RTOS

Written by Amber Mary Hibberd, PhD, Software Engineering Manager at Intel Corporation and member of the Zephyr Project Technical Steering Committee

The Zephyr Project consistently develops with the integrity of the code in mind, as highlighted by the many safety and security initiatives ongoing in the Project. We are excited to announce that we are taking the next step towards improving the robustness of our codebase with the introduction of official Zephyr Project Coding Guidelines, which will complement the existing Coding Style Guidelines.

For the past several months, the Zephyr Project Safety Working Group, in collaboration with the Technical Steering Committee, has been busy defining a set of rules that are relevant to our code and are intended to increase reliability, readability, and maintainability, as well as avoid undefined behavior. We surveyed nearly 300 published rules from existing coding standards such as MISRA C:2012, SEI CERT C, and JPL. These guidelines have been referenced for decades to minimize systematic faults in safety-critical systems such as robotic spacecraft developed for NASA [1]. The proposed guidelines are expected to be ratified by the Zephyr Technical Steering Committee by the end of the month, and will be published as part of the Project collaboration guidelines.
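The finalized Zephyr rule set is not reproduced in this post, but MISRA C:2012 Rule 17.7 (“the value returned by a function having non-void return type shall be used”) is representative of the kind of rule surveyed; whether it made the ratified list is not stated here. A hypothetical before/after in C:

```c
#include <stdio.h>

/* Non-compliant with MISRA C:2012 Rule 17.7: snprintf()'s return
 * value is silently discarded, so truncation and encoding errors
 * go unnoticed. */
static void format_id_bad(char *buf, size_t len, int id)
{
    snprintf(buf, len, "id=%d", id);
}

/* Compliant: the return value is checked, making failure explicit. */
static int format_id_good(char *buf, size_t len, int id)
{
    int written = snprintf(buf, len, "id=%d", id);

    if ((written < 0) || ((size_t)written >= len)) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```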

Many of the rules in the proposed Zephyr Coding Guidelines are also safety-specific requirements, for example mandating that all code be traceable to documented requirements. This means establishing traceability from functional requirements, to implementation, to test cases and test results. For pre-existing software, we have the added challenge of retroactively defining functional requirements to cover the gaps, and then establishing the traceability linking requirements to tests. Intel architect Anas Nashif has developed a methodology to achieve this required tracing, and ultimately to demonstrate 100% coverage, using tooling that is free and widely available – staying true to the open source philosophy.
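The post does not detail the annotation format or tooling, but the idea can be sketched: each test carries a machine-readable reference to the functional requirement it verifies, so a script can sweep the tree and compute requirement coverage. The requirement ID, the “@verifies” tag convention, and the test below are all hypothetical:

```c
#include <zephyr.h>
#include <ztest.h>
#include <stdint.h>

/*
 * @verifies ZEP-REQ-KMEM-042 (hypothetical ID): k_malloc() shall
 * return NULL when the heap cannot satisfy the request.
 *
 * A traceability tool can grep for "@verifies" tags, join them with
 * the requirements database, and report requirements lacking tests.
 * (Assumes a kernel heap is configured via CONFIG_HEAP_MEM_POOL_SIZE.)
 */
static void test_k_malloc_exhaustion(void)
{
    void *p = k_malloc(SIZE_MAX); /* deliberately unsatisfiable */

    zassert_is_null(p, "allocation unexpectedly succeeded");
}

void test_main(void)
{
    ztest_test_suite(traceability_demo,
                     ztest_unit_test(test_k_malloc_exhaustion));
    ztest_run_test_suite(traceability_demo);
}
```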

We hope adhering to a rigorous coding guideline will increase Zephyr community and customer adoption, especially among customers that require safety-compliant code for their applications. As announced last year, the Zephyr Project is working towards achieving SIL 3 (SC3) per IEC 61508. Our finalized architecture scope for the initial certification can be seen in Figure 1 below. We are trending towards basing our certified codebase on our next LTS release in Spring 2021. Safety certification presents challenging goals for open source software, and as we prioritize quality and the stringency with which we will deliver “safe” code over timelines, the specific release is a moving target. Stay tuned for future updates.


Figure 1: Indicated in green (components with thicker borders) is the architecture scope for the Zephyr Project initial safety certification.

[1] https://trs.jpl.nasa.gov/bitstream/handle/2014/43875/11-2798_A1b.pdf?sequence=1

You can find the Zephyr Getting Started Guide here. If you are interested in contributing to the Zephyr Project, please see our Contributor Guide. Join the conversation or ask questions on our Slack channel or Mailing List.

Security Fixes for IoT Products

Written by Kate Stewart, Senior Director of Strategic Programs at the Linux Foundation

“Device makers, especially consumer-focused ones, have been the Achilles’ heel of IoT security. These vendors have often viewed proper security implementations as extra cost, complexity, and time-to-market burdens with an unclear payoff.” – Maciej Kranz *

“New security loopholes are constantly popping up because of wireless networking. The cat-and-mouse game between hackers and system administrators is still in full swing.” — Kevin Mitnick **

When the Zephyr project launched in February of 2016, the security committee was one of the first working groups formed. We wanted Zephyr to be used in IoT products, and we knew that we would not always be able to anticipate all the vulnerabilities that would be discovered over time in such a dynamic environment for each release, let alone after a release was out. To help Zephyr become a trusted RTOS for use in products, the security committee has been actively looking for best practices, so we can be a responsible upstream for product makers to depend on.

One of the early inspirations for the project came from the Linux Foundation’s (LF) Core Infrastructure Initiative (CII) Best Practices badging program. This program identifies best practices that open source software (OSS) projects should follow, based on practices observed in well-run OSS projects; following them increases the likelihood of better quality and security, and the criteria were designed so that any OSS project could qualify. Challenge accepted! Zephyr’s security committee worked towards the identified goals, achieving passing status, then silver status, and finally the very rare gold status over the last few years.

One of the widely accepted ways of communicating vulnerability information about products is through Common Vulnerabilities and Exposures (CVE) numbers in the U.S. National Vulnerability Database (NVD). Since the Zephyr Project is under neutral governance, it didn’t make sense to rely on any specific company to triage potential vulnerabilities. The security team decided it would be best for the Zephyr Project to become a CVE Numbering Authority (CNA) and directly manage vulnerabilities detected and reported to the project, rather than relying on member companies’ product security incident response teams (PSIRTs). To further this goal, the project applied to become a CNA in 2017, and has been one ever since.

Last year, we released the first Zephyr Long Term Support release, 1.14.0. With this release, the project committed to backporting significant bug and security fixes to this code base for two years, alongside the main development branch and the latest release. Since then, community members have reached out with issues they’ve found, and we are very appreciative of their efforts to help us improve Zephyr! When appropriate, the Zephyr project has issued CVEs to track these vulnerabilities and has ensured that they are documented in the release notes. The recent LTS release of Zephyr 1.14.2 last week referenced some CVE numbers that were still under embargo at that point. Once the embargo period ends, the release information will be updated with further details, as will the associated records in the NVD.

In 2018, we started seeing evidence of products based on Zephyr appearing in the marketplace, including:

 As we move into 2020, we are hearing, through word of mouth and the community, about new products every day. While we are able to share embargo information with project members who are part of the security team, we have lacked a way to reach product makers who are not members and let them know when a vulnerability has been fixed. As a sign of the Zephyr project’s commitment to product makers, we have created a form through which product makers who are not currently members of the Zephyr project can request notification, during the embargo window, of vulnerabilities that may impact their products. To be eligible, a product maker must have a publicly available product based on Zephyr posted on their web site and must provide valid contact information. Zephyr project members who participate in the security committee already receive this information.

To learn more about the Zephyr security program for product makers see:

To understand more about how the Zephyr security team handles vulnerabilities see:

https://docs.zephyrproject.org/latest/security/security-overview.html

If you’re a product maker who meets the criteria and would like to apply to be notified of vulnerabilities during the embargo period, please fill out the form at:

* https://www.brainyquote.com/quotes/maciej_kranz_898966

** https://www.inspiringquotes.us/quotes/zrdP_oXdCwbsY