Planet Zephyr

Exploring printf on Cortex-M

September 19, 2023

The C printf function is a staple of embedded development. It’s a simple way
to get logs or debug statements off the system and into a terminal on the host.
This article explores the various ways to get printf on Cortex-M.



Ah, the first and last of debugging techniques!

Let’s first explore what happens when printf runs. Here’s a simplified flowchart
showing what happens when printf is called:

blockdiag {
  A [label = "printf"];
  B [label = "vprintf"];
  C [label = "_write syscall"];
  A -> B -> C;
}

Going call by call:

  1. printf is called by the application, for example:

    printf("hello world! today is %d-%d-%d\n", 2021, 2, 16);
  2. printf calls vprintf, the variant of printf that takes a va_list
    (simplifying a bit for this example). vprintf processes the format string
    and consumes the “variadic args” (here the date components), and then calls
    _write to output the formatted data.

  3. _write is a libc function responsible for writing bytes to a file
    descriptor. It’s called with the file descriptor for stdout (1), and a
    pointer to the formatted string. It executes the necessary system call to
    write the data to the appropriate file descriptor. On hosted platforms (eg
    Linux, macOS, Windows), this is usually a write syscall. On embedded
    platforms, this might be a call to a UART driver.
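
The call chain above can be mimicked in a few lines of C. This is a toy sketch rather than newlib's actual implementation: toy_printf, toy_vprintf, toy_write and the toy_sink capture buffer are hypothetical names, and the formatting itself is delegated to the standard vsnprintf.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Stand-in for the output device: bytes "written" are appended here.
char toy_sink[128];
size_t toy_sink_len;

// Plays the role of _write: push already-formatted bytes to the device.
int toy_write(int fd, const void *buf, size_t len) {
  (void)fd;  // a real implementation would pick a device per descriptor
  size_t room = sizeof(toy_sink) - 1 - toy_sink_len;  // keep space for NUL
  if (len > room) {
    len = room;
  }
  memcpy(&toy_sink[toy_sink_len], buf, len);
  toy_sink_len += len;
  toy_sink[toy_sink_len] = '\0';
  return (int)len;
}

// Plays the role of vprintf: expand the format string, then hand off.
int toy_vprintf(const char *fmt, va_list args) {
  char formatted[64];
  int n = vsnprintf(formatted, sizeof(formatted), fmt, args);
  if (n < 0) {
    return n;
  }
  if ((size_t)n >= sizeof(formatted)) {
    n = (int)sizeof(formatted) - 1;  // output was truncated to the buffer
  }
  return toy_write(1 /* stdout */, formatted, (size_t)n);
}

// Plays the role of printf: collect the variadic args into a va_list.
int toy_printf(const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  int n = toy_vprintf(fmt, args);
  va_end(args);
  return n;
}
```

The only part that differs between a hosted platform and a bare-metal one is the body of the function at the bottom of the chain, which is why redirecting printf usually comes down to replacing _write.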

Using printf in a bare-metal application

On unhosted platforms (eg bare-metal or minimal RTOS), you may not have built-in
support for printf/scanf/fprintf. Fear not, you can get some printf in
your life by using newlib-nano’s support:

# add these to your linker command line
LDFLAGS += --specs=nano.specs --specs=nosys.specs

That adds libc and “nosys” support to your application, which provide the
implementations for the various libc functions like printf, sprintf, etc.
This means you can add printf calls and your application will link! But we’re
not quite done.

Redirecting _write

_write() is the function eventually called by printf/fprintf, responsible
for emitting the bytes formatted by those functions to the appropriate file
descriptor.

If you want to redirect the output of your printf/fprintf calls, you might
want to implement your own copy of _write(). By default, newlib libc doesn’t
provide a working implementation for _write(): you might see a warning like
this when linking:

arm-none-eabi/lib/thumb/v7e-m/nofp/libg_nano.a(libc_a-writer.o): in function `_write_r':
newlib/libc/reent/writer.c:49: warning: _write is not implemented and will always fail

See a thorough explanation here.

The minimal implementation might look like this:

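// stdout and stderr output is written via the 'uart_write' function
#include <stddef.h>  // for size_t
#include <unistd.h>  // for STDOUT_FILENO, STDERR_FILENO

// provided elsewhere by your UART driver
extern void uart_write(const void *ptr, size_t len);

int _write (int file, const void * ptr, size_t len) {
  if (file == STDOUT_FILENO || file == STDERR_FILENO) {
    uart_write(ptr, len);
  }

  // return the number of bytes passed (all of them)
  return len;
}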

Note that the libc setbuf/setvbuf API controls I/O buffering for newlib. By
default _write will be called for every byte. You might want to instead buffer
up to line ends, by:

setvbuf(stdout, NULL, _IOLBF, 0);

See man setvbuf for details.
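
To make the effect of line buffering concrete, here is a toy model of it. It is not newlib's stdio code; lb_putc, lb_puts, lb_flush and the counting sink_write are hypothetical names, used only to show how buffering up to a newline cuts down the number of low-level write calls.

```c
#include <stddef.h>

#define LB_CAPACITY 32

size_t sink_calls;  // how many times the low-level write ran
size_t sink_bytes;  // how many bytes it was handed in total

// Stand-in for _write(): on target this would feed the UART.
void sink_write(const char *buf, size_t len) {
  (void)buf;
  sink_calls++;
  sink_bytes += len;
}

char lb_buf[LB_CAPACITY];
size_t lb_len;

// Hand everything buffered so far to the sink in a single call.
void lb_flush(void) {
  if (lb_len > 0) {
    sink_write(lb_buf, lb_len);
    lb_len = 0;
  }
}

// Buffer one character; flush on newline or when the buffer is full.
void lb_putc(char c) {
  lb_buf[lb_len++] = c;
  if (c == '\n' || lb_len == LB_CAPACITY) {
    lb_flush();
  }
}

// Feed a whole string through the line buffer.
void lb_puts(const char *s) {
  while (*s != '\0') {
    lb_putc(*s++);
  }
}
```

With this model, a six-character line costs one sink call instead of six, which matters when each call has fixed overhead (a semihosting trap, for example).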


Reentrancy

Fancy word for whether a function can be safely preempted and called
simultaneously from multiple (interrupt) contexts.

Some libc functions in newlib share a context structure that contains global
data structures. If you want to safely use them in multiple threads, you have a
couple of options:

  • only use these functions in one thread in your program
  • disable interrupts when using non-reentrant functions
  • only use the reentrant versions of the functions (denoted by _xxx_r(), eg
    _printf_r in newlib)
  • recompile newlib with __DYNAMIC_REENT__ to have the non-reentrant functions
    call a user-defined function to get the current thread’s reentrancy structure
  • override the default global reentrancy structure as necessary
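
As a toy illustration of why the shared context matters (this is not newlib code; u32_to_dec and u32_to_dec_r are hypothetical names that merely follow the _xxx_r() naming convention): the plain function keeps its result in one hidden static buffer, so a second call, for instance from an interrupt handler, overwrites what the first caller is still using. The _r variant has the caller supply the storage instead.

```c
#include <stdint.h>
#include <stdio.h>

struct dec_ctx {
  char text[11];  // enough for "4294967295" plus the terminator
};

// Reentrant: all state lives in the caller-provided context.
const char *u32_to_dec_r(struct dec_ctx *ctx, uint32_t value) {
  snprintf(ctx->text, sizeof(ctx->text), "%lu", (unsigned long)value);
  return ctx->text;
}

// Non-reentrant: every caller shares one hidden static context.
const char *u32_to_dec(uint32_t value) {
  static struct dec_ctx shared;
  return u32_to_dec_r(&shared, value);
}
```

Two back-to-back calls to u32_to_dec() return the same pointer, so the first result is silently replaced by the second; two calls to u32_to_dec_r() with separate contexts do not interfere.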

See the header file here for some detailed information:

And great explanations here- especially if using FreeRTOS, there’s built-in
support for managing reentrancy structures:

Let’s go on a small side adventure into these reentrancy approaches.

One thread only

It might not be necessary to use the non-reentrant functions in small interrupt
handlers (and is probably a reasonable idea to prohibit them).

Here’s an approach that will cause a runtime error if a non-reentrant function
is called in an interrupt context. Unfortunately it requires rebuilding newlib,
so it’s really only for curiosity or the very paranoid.

First, rebuild newlib (or build it as part of your project build; this is
probably only practical if you’re using a compiler cache like ccache), setting
__DYNAMIC_REENT__ when building.

Then, when building your application source, define a __getreent() function
with the following contents:

#include <reent.h>   // for struct _reent, _impure_ptr
#include <stdint.h>  // for uint32_t

// Function to check if in interrupt context
int isInInterruptContext(void) {
    // Read the IPSR register: it holds the active exception number,
    // and reads as 0 in thread mode
    uint32_t ipsr;
    __asm__ volatile ("MRS %0, IPSR" : "=r" (ipsr));

    return ipsr != 0;
}

struct _reent * __getreent(void) {
  if (isInInterruptContext()) {
    __asm__ ("bkpt 0");
  }
  // return the built in global reentrancy context
  return _impure_ptr;
}

Also be sure to define __DYNAMIC_REENT__ in your CFLAGS when building your
application source.

This example trips a software breakpoint if a non-reentrant function is called
from an interrupt context, to catch the error condition. Alternatively, the
function could return NULL, and libc calls using the reentrancy structure
should return with no effect, but in practice this may not always be handled
in newlib.

Disable interrupts

This approach requires that every lower-priority thread enclose any
non-reentrant calls in a critical section.
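
A minimal sketch of such a critical section follows. The names irq_save, irq_restore and printf_locked are hypothetical; on an M-profile target the helpers use PRIMASK, and everywhere else a fake_primask variable stands in so the sketch can be compiled and run on a host. If your project uses CMSIS, its __disable_irq()/__enable_irq() intrinsics are the more common way to do this.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
// Cortex-M: save PRIMASK, then mask interrupts.
static inline uint32_t irq_save(void) {
  uint32_t primask;
  __asm__ volatile("mrs %0, primask\n\tcpsid i" : "=r"(primask) : : "memory");
  return primask;
}
// Put PRIMASK back the way irq_save() found it.
static inline void irq_restore(uint32_t primask) {
  __asm__ volatile("msr primask, %0" : : "r"(primask) : "memory");
}
#else
// Host stand-in so the sketch can be compiled and run off-target.
uint32_t fake_primask;
static inline uint32_t irq_save(void) {
  uint32_t previous = fake_primask;
  fake_primask = 1;
  return previous;
}
static inline void irq_restore(uint32_t primask) { fake_primask = primask; }
#endif

// printf wrapped in a critical section. Nesting is safe because the previous
// interrupt state is restored rather than unconditionally re-enabled.
int printf_locked(const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  uint32_t state = irq_save();
  int n = vprintf(fmt, args);
  irq_restore(state);
  va_end(args);
  return n;
}
```

Saving and restoring the previous state, instead of blindly enabling interrupts afterwards, keeps the wrapper correct when it is called from code that already runs with interrupts masked.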


This can add latency to interrupts (on cortex parts, typically interrupts will
pend until enabled after the call), which is particularly bad if your printf
output is relatively slow, eg a 115200 baud UART.

It also requires you to be diligent about where you are using non-reentrant
functions.

Only use reentrant libc functions

This approach requires only calling the reentrant versions of the libc
functions. See an example below.


#include <reent.h>  // for struct _reent, _REENT_INIT
#include <stdio.h>  // for _printf_r
// call printf from an interrupt without breaking other threads
void Foo_Interrupt_Handler(void) {
  struct _reent my_impure_data = _REENT_INIT(my_impure_data);
  _printf_r(&my_impure_data, "hey!\n");
}

This is a pretty manual technique but it is relatively simple, and doesn’t
require disabling interrupts or knowing the relative preemption priority of the
current call site.

Dynamic reentrancy

See the section above about building newlib; we need to enable
__DYNAMIC_REENT__ in the library build to have the non-reentrant functions
call __getreent() (i.e. enable it in the crosstool-ng configuration if
you’re building a whole toolchain). Then we can implement a version that
provides the correct reentrancy structure depending on our thread context:

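#include <reent.h>  // for struct _reent, _REENT_INIT, _impure_ptr

// this system only needs 1 reentrancy context
static struct _reent isr_impure_structs[] = {
  _REENT_INIT(isr_impure_structs[0]),
};

struct _reent * __getreent (void) {
  struct _reent *isr_impure_ptr;

  // this might use NVIC calls to figure out which ISR is active, or if there is
  // an OS, return current executing thread
  int thread_id = get_thread_id();

  switch (thread_id) {
    case 0:
      isr_impure_ptr = &isr_impure_structs[0];
      break;
    default:
      // crash
      __asm__("bkpt 0");
      // fall back to the global context if execution continues
      isr_impure_ptr = _impure_ptr;
      break;
  }

  return isr_impure_ptr;
}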

Note that if you’re using an RTOS, this may already be handled in some way or
there might be a hook when switching thread contexts (see above)
where we can move _impure_ptr, and we don’t even need to recompile newlib!

For example, Zephyr RTOS
handles logging from interrupts
or multiple preemptible thread contexts, so there’s no need to fiddle with the
low-level libc reentrancy structures.

Overriding _impure_ptr

Another option is to override the global reentrancy structure when inside
interrupts; a little trickier but can be useful in certain systems.

The global reentrant structure is referenced by the pointer _impure_ptr,
defined in newlib/libc/reent/impure.c.

You can instantiate your own copy of the structure:

struct _reent my_impure_data = _REENT_INIT(my_impure_data);

And then temporarily override _impure_ptr:

struct _reent *saved_impure_ptr = _impure_ptr;
_impure_ptr = &my_impure_data;
// ... call non-reentrant libc functions here ...
// restore the previous reentrancy struct
_impure_ptr = saved_impure_ptr;

This makes it safe to call the above snippet in an interrupt handler, because it
won’t stomp on the global reentrancy data if it preempted an in-progress
non-reentrant function!

Note that you’ll need to do the same thing in every preemption context, i.e. any
call site that can interrupt another thread that is using non-reentrant
functions. So this approach would be primarily useful in cases with only a small
number of threads, or a well-considered preemption hierarchy where we know
exactly where to move the pointer and where we don’t need to.
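
The save/swap/restore pattern can be tried out off-target with a toy model. Everything here is hypothetical and only mimics the shape of the real thing: toy_reent stands in for struct _reent, toy_impure_ptr for _impure_ptr, and toy_libc_step for a non-reentrant libc function.

```c
// Toy stand-in for struct _reent: state a "libc" call keeps between steps.
struct toy_reent {
  int scratch;
};

struct toy_reent toy_global_data;
struct toy_reent *toy_impure_ptr = &toy_global_data;  // like _impure_ptr

// A non-reentrant "libc" function: it works through the global pointer.
void toy_libc_step(int value) { toy_impure_ptr->scratch = value; }

// ISR-style code: swap in a private context, use the library, swap back.
void toy_isr(void) {
  struct toy_reent isr_data = {0};
  struct toy_reent *saved = toy_impure_ptr;
  toy_impure_ptr = &isr_data;
  toy_libc_step(99);  // lands in isr_data, not in the interrupted context
  toy_impure_ptr = saved;
}
```

If the "thread" has stored 7 through toy_libc_step() and toy_isr() then runs, the thread's value is still 7 afterwards, because the handler only ever touched its own private structure.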

Redirecting stdio

This section describes a few approaches for redirecting libc standard
input/output functions, and compares the tradeoffs each one makes.

Method      | Advantages | Disadvantages
UART        | simple; no special host HW needed | requires spare UART + pins on target; can be slow
Semihosting | no extra HW requirements on target | requires a debug probe for the host to use; prohibitively slow, interrupts target
SWO         | requires only a single (often specific) spare pin on the target; relatively fast | usually needs a debug probe for the host to use; output only
RTT         | no extra HW requirements on target; relatively fast | requires a debug probe for the host to use; license may be problematic


UART

UART: asynchronous serial. Most embedded microcontrollers will have at least
one.


Pros

  • simple and widely available protocol (lots of available software and hardware
    tools interface to it)
  • doesn’t require an attached debugger; you can use it in PROD 😀


Cons

  • requires extra hardware to interface to the PC host, typically (USB to
    asynchronous serial adapter, like the FTDI or CP2102 adapters)
  • requires configuring and using the UART peripheral (if your microcontroller
    doesn’t have many UARTs, might be a problem using it for printf)

The biggest downside is you’re spending one of your UART peripherals for printf,
which might be needed for your actual application.

Also, configuring a UART might require managing clock configuration, chip
power/sleep modes, figuring out buffering the data (DMA?), making sure
everything is interrupt safe, etc.
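
One common way to handle the buffering is a small ring buffer sitting between _write and the UART transmit interrupt. The sketch below is hardware-agnostic and makes several assumptions: tx_ring_put and tx_ring_get are hypothetical names, the actual UART register access is left out, and the lock-free scheme only holds with a single producer context and a single consuming ISR on a single core.

```c
#include <stddef.h>
#include <stdint.h>

#define TX_RING_SIZE 64u  // must be a power of two for the index masking

static uint8_t tx_ring[TX_RING_SIZE];
static volatile uint32_t tx_head;  // advanced by the producer (_write)
static volatile uint32_t tx_tail;  // advanced by the consumer (TX ISR)

// Producer side: queue as many bytes as fit and report how many were taken.
size_t tx_ring_put(const void *data, size_t len) {
  const uint8_t *bytes = data;
  size_t queued = 0;
  while (queued < len && (tx_head - tx_tail) < TX_RING_SIZE) {
    tx_ring[tx_head & (TX_RING_SIZE - 1u)] = bytes[queued++];
    tx_head++;
  }
  return queued;
}

// Consumer side: called from the "TX register empty" interrupt. Returns 1
// and stores the next byte in *out, or 0 when there is nothing left to send.
int tx_ring_get(uint8_t *out) {
  if (tx_head == tx_tail) {
    return 0;
  }
  *out = tx_ring[tx_tail & (TX_RING_SIZE - 1u)];
  tx_tail++;
  return 1;
}
```

In this arrangement _write would call tx_ring_put() and then enable the transmit interrupt, and the ISR would call tx_ring_get() until it returns 0. What to do when the buffer is full (drop, block, or report a short write) is a policy choice that depends on how much you value the log output versus timing.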

And finally, you’ll usually need some kind of adapter dongle to interface the
UART with your PC (eg USB-to-TTL or USB-to-asynchronous serial).

Because the interface is contained entirely on the microcontroller, it can be
used without a debug probe, which can make it useful when not actively debugging
your system (eg you can use it to write log statements to an attached PC!).

Worth noting, if your microcontroller has support for USB, you might be able to
skip the need for an external FTDI-like adapter by implementing CDC
(Communications Device Class) on-chip. That adds a lot more software to your
system, but can be useful. Out of scope of this FAQ.


Semihosting

Next up the food chain, semihosting is a protocol that proxies the “hosting”
(meaning libc hosting, eg printf/scanf/open/close etc) over an attached
debug probe to the PC.

Some good descriptions:

And the reference documentation can be found in:

ARM Developer Suite (ADS) v1.2 Debug Target Guide, Chapter 5. Semihosting

This describes “RDIMON” (“Remote Debug Interface Monitor”) in detail.

TLDR, semihosting operates by the target executing the breakpoint instruction
with a special breakpoint id, bkpt 0xab, with the I/O operation number stored
in r0 and a pointer to whatever data the operation needs stored in r1.

The debug probe executes the I/O operation based on the opcode and data, then
resumes the microcontroller.
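
The exchange can be sketched as a small wrapper around the trap. The names semihost_call and semihost_write are hypothetical; the M-profile branch issues the real bkpt 0xAB, while the other branch is a host stand-in that only records the request so the parameter-block layout can be inspected off-target. The handle is assumed to be one the debug host already handed out.

```c
#include <stddef.h>
#include <stdint.h>

#define SEMIHOST_SYS_WRITE 0x05u  // operation number for "write to handle"

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
// Cortex-M: operation number in r0, parameter block pointer in r1, then
// bkpt 0xAB. The debug host leaves the result in r0.
static inline uintptr_t semihost_call(uintptr_t op, const void *params) {
  register uintptr_t r0 __asm__("r0") = op;
  register const void *r1 __asm__("r1") = params;
  __asm__ volatile("bkpt 0xAB" : "+r"(r0) : "r"(r1) : "memory");
  return r0;
}
#else
// Host stand-in: record the request instead of trapping to a debugger.
uintptr_t last_op;
uintptr_t last_params[3];
static inline uintptr_t semihost_call(uintptr_t op, const void *params) {
  const uintptr_t *words = params;
  last_op = op;
  last_params[0] = words[0];
  last_params[1] = words[1];
  last_params[2] = words[2];
  return 0;  // SYS_WRITE reports the number of bytes NOT written
}
#endif

// Write len bytes to a semihosting file handle; returns bytes written.
size_t semihost_write(uintptr_t handle, const void *data, size_t len) {
  uintptr_t params[3] = {handle, (uintptr_t)data, (uintptr_t)len};
  uintptr_t not_written = semihost_call(SEMIHOST_SYS_WRITE, params);
  return (not_written <= len) ? len - (size_t)not_written : 0;
}
```

Note the slightly unusual return convention: for SYS_WRITE the host reports how many bytes it did not write, so 0 means success.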

To use it, the simplest approach is to use newlib’s rdimon library spec:

# add these to your linker command line
LDFLAGS += --specs=nano.specs --specs=rdimon.specs

Then add this call somewhere in your system init, prior to calling any libc I/O
functions:

// this executes some exchanges with the debug monitor to set up the correct
// file handles for stdout, stderr, and stdin
extern void initialise_monitor_handles(void);
initialise_monitor_handles();

printf and friends should work after that! Be sure to check the user manual
for your debug probe on how it exposes the semihosting interface. For example,
segger jlink requires running some commands to enable semihosting:

# from gdb client
monitor semihosting enable
monitor semihosting ioclient 2


For pyocd, you can pass the --semihosting argument when starting the gdb
server.

As an extra bonus, see below for a simplified semihosting solution based on the
newlib one that doesn’t require linking against rdimon.specs, and saves some
code space. Note that some debug probes require the setup steps run by
initialise_monitor_handles(), so those may need to be included as well
(depending on your setup).

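#include <stdint.h>  // for uint8_t

// Minimal semihosting _write: issue SYS_WRITE (operation 0x05) with r1
// pointing at a parameter block of {file handle, data pointer, length}
int _write(int file, uint8_t *ptr, int len) {
  int data[3] = {file, (int) ptr, len};
  __asm__ volatile("mov r0, %0 \n mov r1, %1 \n bkpt #0xAB"
          : : "r" (5), "r" (&data) : "r0", "r1", "memory");
  return len;
}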


Pros

  • only requires SWD connection (SWCLK + SWDIO)
  • built into most debug servers (pyocd, openocd, blackmagic, segger jlink)
  • basic implementation is provided by newlib (no extra user code) and is dead
    simple to use
  • can do lots more than just printf; open, fwrite, read from stdin!


Cons

  • slowwww. really slow. tens to hundreds of milliseconds per transfer, depending
    on debug probe and host
  • doesn’t work when debugger is disconnected (target will be stuck on bkpt
    instruction; either a forever hang or a crash depending on cortex-m
    architecture set)
  • newlib implementation is relatively code-heavy (8k or bigger!)

Serial Wire Output (SWO)

A dedicated pin that can be used essentially as a UART for outputting data. It
can be pretty fast, with the baud rate sometimes reaching 1/1 of the CPU core
clock, since it uses the TPIU.

It is however an optional hardware feature, so might not be available on every
chip (most common chips like STM32’s and nRF chips tend to include it), and
requires an extra pin- typically routed to the 10-pin ARM debug header.

It’s output only, and there’s a small bit of code necessary on the target to
enable routing printf to SWO.
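
A rough sketch of that target-side code is shown here. It assumes the debug probe has already configured the TPIU and enabled the ITM, and it writes through ITM stimulus port 0. The names itm_port and swo_write are hypothetical, and the registers are passed in as pointers so that a fake set can be substituted off-target; on an ARMv7-M part the real addresses are 0xE0000000 (stimulus port 0), 0xE0000E00 (trace enable) and 0xE0000E80 (trace control). CMSIS ships a similar ready-made helper, ITM_SendChar().

```c
#include <stddef.h>
#include <stdint.h>

// Pointers to the three ITM registers this sketch touches.
struct itm_port {
  volatile uint32_t *stim0;  // stimulus port 0
  volatile uint32_t *ter;    // trace enable register (bit 0 = port 0)
  volatile uint32_t *tcr;    // trace control register (bit 0 = ITM enable)
};

#define SWO_SPIN_LIMIT 1000u

// Send bytes out of stimulus port 0; returns how many were accepted.
size_t swo_write(const struct itm_port *itm, const void *data, size_t len) {
  const uint8_t *bytes = data;
  // Nothing to do unless the ITM and stimulus port 0 are both enabled.
  if ((*itm->tcr & 1u) == 0u || (*itm->ter & 1u) == 0u) {
    return 0;
  }
  for (size_t i = 0; i < len; i++) {
    uint32_t spins = 0;
    // Reading the port returns non-zero once it can accept more data.
    while (*itm->stim0 == 0u) {
      if (++spins > SWO_SPIN_LIMIT) {
        return i;  // give up rather than hang if the port never frees up
      }
    }
    *(volatile uint8_t *)itm->stim0 = bytes[i];  // 8-bit write, 1 byte out
  }
  return len;
}
```

A _write implementation could forward its buffer to swo_write(). The bounded spin is a design choice of this sketch: it trades possibly lost output for never stalling the application.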

A good tutorial, including background information and an example implementation,
can be found here:

The debug adapter and software do need to support the protocol. Most debug
adapters do support it, however (ST-Link’s, JLINK, DAPLink, etc).


Pros

  • generally pretty fast, similar performance to UART (speed is both target and
    probe hardware dependent)
  • simple implementation


Cons

  • output only
  • uses a pin on the target
  • not all chips or probes support it

Real Time Transfer (RTT)

Segger’s RTT reads/writes into a buffer in RAM while the chip is running. It can
be pretty fast compared to even a fast UART, and is much more efficient than
Semihosting without requiring any extra pins or hardware.

It also supports both output (target -> host) and input (host -> target).

Segger provides the RTT target implementation, as well as a few tools for using
RTT on the host. OpenOCD and PyOCD also support the protocol.

You can download the implementation here:

Zephyr also maintains a copy here, if you’re curious about perusing it (or want
to use it as a git submodule):

The downside is it requires a debug adapter to operate, since it relies on
reading and writing to memory.

It’s also important to understand the implications of enabling this in
production; be sure your device doesn’t get into a bad state if there’s no debug
adapter connected (i.e. when no debugger is attached, drop data instead of
backing up the RTT buffers).
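
The "drop instead of backing up" policy can be illustrated with a toy target-to-host buffer. This is not Segger's implementation; up_buffer, up_free and up_write_or_drop are hypothetical names, and the behaviour is only similar in spirit to RTT's non-blocking skip mode (SEGGER_RTT_MODE_NO_BLOCK_SKIP).

```c
#include <stddef.h>
#include <stdint.h>

#define UP_BUF_SIZE 16u

// Toy "up" (target -> host) buffer: the target advances wr, the debug probe
// advances rd by poking target RAM while the CPU keeps running.
struct up_buffer {
  uint8_t data[UP_BUF_SIZE];
  volatile uint32_t wr;
  volatile uint32_t rd;
};

// Free space; one slot stays empty so that wr == rd always means "empty".
static uint32_t up_free(const struct up_buffer *b) {
  return (b->rd + UP_BUF_SIZE - b->wr - 1u) % UP_BUF_SIZE;
}

// Non-blocking write: if the whole message does not fit, drop it and
// return 0 instead of waiting for a host that may never read.
size_t up_write_or_drop(struct up_buffer *b, const void *msg, size_t len) {
  const uint8_t *bytes = msg;
  if (len > up_free(b)) {
    return 0;
  }
  uint32_t wr = b->wr;
  for (size_t i = 0; i < len; i++) {
    b->data[wr] = bytes[i];
    wr = (wr + 1u) % UP_BUF_SIZE;
  }
  b->wr = wr;  // publish only after the payload is in place
  return len;
}
```

With no debugger attached, rd never moves, so after the buffer fills once every later message is dropped immediately and the application keeps running at full speed.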

I’m personally a big fan of RTT over semihosting- it’s usually going to be a
strict improvement if you don’t need file I/O.


Pros

  • fast!
  • simple on-device implementation
  • supported out of the box on Zephyr RTOS, among other platforms


Cons

  • requires a debugger with support for RTT


That’s all for now! Hopefully this article provided a little window into the
various ways printf can be used on small microcontrollers (or other unhosted
environments). It’s a surprisingly subtle corner of embedded development, and
there are a few pitfalls that should be watched for, especially around
concurrency or performance.

The links below should provide a lot of additional information if you’re curious
about exploring the topic further.

And as always, if you have any questions or comments, feel free to reach out-
either in the comments here or in the Interrupt community Slack!



Benjamin Cabé