FOUNDRIES.IO INSIGHTS

Zephyr Logging Survival Guide, Part 2: Logger Subsystem

By Marti Bolivar | August 14, 2018

An overview of the new logging system merged for v1.13, or: Can’t Get Enough of your Logs, Zephyr.

In part one of this series, we described Zephyr’s venerable SYS_LOG logging API. While it has (mostly) served us well, there’s a brand new alternative: a “Logger” subsystem was recently merged into master, and will be part of the v1.13 release. We’ll explore it in detail in this second and final part, comparing and contrasting it with SYS_LOG.

Console output is from Zephyr 55327a183 on the nrf52840_pca10056 board. Complete application source code for all examples is available.

Contents:

Basic Usage

At first blush, using Logger isn’t that different from SYS_LOG:

  1. set a log level (one of LOG_LEVEL_NONE, LOG_LEVEL_DBG, LOG_LEVEL_INF, ..._WRN, ..._ERR)
  2. set a “module” name (more on this later)
  3. include <logging/log.h>
  4. perform some module-specific initialization
  5. log the logs!

Logger has printf()-like macros LOG_DBG, LOG_INF, LOG_WRN, and LOG_ERR, which work the way you’d expect.

Example code (full source on GitHub):

/* Display all messages, including debugging ones, in this file: */
#define LOG_LEVEL LOG_LEVEL_DBG

/* Set the "module" for log messages in this file: */
#define LOG_MODULE_NAME foundries_io_basics

#include <logging/log.h>

/* Initialize module-specific magic state (once per module name): */
LOG_MODULE_REGISTER();

void main(void)
{
	LOG_DBG("verbose debug %d", 3);
	LOG_INF("everything is fine, mask=0x%x", 0xa1);
	LOG_WRN("warning: %s was seen", "something bad");
	LOG_ERR("error %d", 3);
}

Output screenshot:

The output is formatted as follows:

[TIMESTAMP] <LEVEL> <MODULE>: formatted_message + \n

Like SYS_LOG, the Logger subsystem’s output (by default):

  • goes to the console UART
  • includes the formatted message
  • automatically appends a newline
  • prints the “module” (which is somewhat similar to a SYS_LOG domain)

Unlike SYS_LOG, however:

  • By default, output includes timestamps and is colorized (hurray!).
  • The name of the function which logged the message is not included in the output. You can of course add it yourself, e.g. with LOG_INF("%s: hi there", __func__).
  • Module names must be valid C identifiers (they can also begin with a digit), rather than arbitrary strings: so foundries_io_basics is OK, but foundries.io/basics will cause a cryptic compilation error. This is a “gotcha” for SYS_LOG users, since upstream Zephyr uses a slash (/) for namespacing in SYS_LOG domains.

Yawn… OK, so what’s the big deal? That’s not a huge difference; why introduce a whole new subsystem?

So Many Fancy Features

Logger has several features not present in the more basic SYS_LOG. We’ll dig into these in particular:

  • Deferred logging
  • Timestamps
  • Binary data dumping
  • Module filtering

We’ll describe Logger’s quirks and sharp edges as well.

There are a few Logger features we won’t evaluate:

  • Instance logging: we’re curious to kick the tires on this feature, but weren’t able to get to it due to time constraints.
  • Multiple backends: at time of writing, only UART logging is supported. Additional logging destinations (like network or flash logging) may be added later on – this is an opportunity to contribute to Zephyr!
  • Multi-domain/multi-processor readiness: the Logger subsystem was designed for use with those newfangled multi-core environments Zephyr has been adding support for lately. We’ve only got Zephyr running on humble single-core MCUs on our desk, though. And anyways, multi-core is for running Linux, right?
  • Logger as a printk() backend: enabling CONFIG_LOG_PRINTK redirects printk() output to the Logger. Good to know, but we’re happy with printk() as a nice, simple way to write to the UART console.

Interlude: Who Will Time the Timestamps Themselves?

If you’ll permit the digression.

Before we dig any deeper into Logger’s other features, we’ll analyze its basic behavior further, comparing it with an equivalent SYS_LOG example from part 1.

This will help explain things, promise.

Logger Timestamps vs. Message Print Times

First, let’s take a close look at the timestamps Logger printed:

Wait… the four messages all “happened” in the very same microsecond? (Readers who need further convincing regarding the decimal separator format are referred to log_output.c.)

The above screenshot is of output from a UART console at the usual 115200 baud, 8N1 setting. It takes at least 86 microseconds to output a single character at that speed, so the LOG_XXX macros definitely aren’t sitting around polling the UART, the way their SYS_LOG equivalents do: the timestamp calculation has to be happening before anything gets printed.

Readers who are playing along at home will also have noticed a significant delay between Zephyr’s boot banner and the appearance of the first log message. This can be seen by capturing the serial output using grabserial in timestamp mode, like so:

$ grabserial -t -b 115200 -d /dev/serial-port-device

On this system, the output was (with ASCII art notes on the grabserial timestamps):

 +-- Rough timestamp since first character, in seconds
 |
 |        +--- Relative timestamp since last line, in seconds
 |        |
 v        v
[0.000001 0.000001] ***** Booting Zephyr OS v1.12.0-1181-g55327a183 *****
          +--- Huh?! Over a second elapses between boot banner and first log!
          |
          v
[1.007872 1.007871] [00:00:00.003,662] <dbg> foundries_io_basics: verbose debug 3

          +--- Next log messages appear quickly thereafter.
          |
          v
[1.012974 0.005102] [00:00:00.003,662] <inf> foundries_io_basics: everything is fine, mask=0xa1
[1.019813 0.006839] [00:00:00.003,662] <wrn> foundries_io_basics: warning: something bad was seen
[1.027559 0.007746] [00:00:00.003,662] <err> foundries_io_basics: error 3

What’s up with that entire extra second of delay after Booting Zephyr OS but before the first log message? Is Logger doing some heavy-duty initialization, or is something else going on?

Comparison with SYS_LOG

Though SYS_LOG doesn’t print timestamps, we can capture print times with grabserial. Sure enough, SYS_LOG messages start printing right away:

[0.000001 0.000001] ***** Booting Zephyr OS v1.12.0-1181-g55327a183 *****
          +--- No significant delay here.
          |
          v
[0.004089 0.004088] [foundries.io/basics] [DBG] main: verbose debug 3
[0.008852 0.004763] [foundries.io/basics] [INF] main: everything is fine, mask=0xa1
[0.014059 0.005207] [foundries.io/basics] [WRN] main: warning: something bad was seen
[0.020125 0.006066] [foundries.io/basics] [ERR] main: error 3

On Screen!

Let’s see the default timing in detail by generating some pulses on a GPIO pin at the same time as doing some Logger calls:

  • One pulse with a few NOPs in between, to get a baseline
  • Two consecutive pulses wrapping a LOG_DBG("") invocation, for rough figures of merit

This is the overall result:

Pulse generation code (full source on GitHub) follows.

The first pulse is done with a few NOPs in between, just to get a rough idea of pulse behavior:

NRF_P0->OUTSET = 1U << PIN;
__NOP();
__NOP();
__NOP();
NRF_P0->OUTCLR = 1U << PIN;

The next two pulses wrap LOG_DBG("") calls. This may underestimate the true cost to log a message, as the compiler could move some code outside of the OUTSET/OUTCLR lines, but it’s a good-enough approximation:

NRF_P0->OUTSET = 1U << PIN;
LOG_DBG("");
NRF_P0->OUTCLR = 1U << PIN;

NRF_P0->OUTSET = 1U << PIN;
LOG_DBG("");
NRF_P0->OUTCLR = 1U << PIN;

(Generously) accounting for slew rate, the first pulse is in the neighborhood of 100 ns, which puts pulse overhead in the noise:

Subsequent pulses are a bit over 4 microseconds:

So, at least on this board, there are only about 4 microseconds of overhead to prepare and buffer a basic log message. Sweet! That’ll work great in our IRQs.

… Wait a minute, though: the messages’ logged timestamps agree to the microsecond in the UART output, but they should be about 5 microseconds apart in real time. Gotcha:

Logger timestamps are printed to the microsecond, but are (probably) not accurate to the microsecond.

The nRF52840 SoC we’re using for our testing has a timer driver with maximum resolution of about 30 microseconds. Since we observed 4-5 microseconds of overhead to prepare each Logger message, it makes sense that four back-to-back messages could have the same timestamp:

(5 μsec / message) * (4 messages) = 20 μsec < 30 μsec

(Incidentally, further testing showed that just getting the timestamp itself takes a little bit more than a microsecond on this part.)

On other ARM Cortex-M SoCs, timestamp accuracy is likely to be limited by the SysTick peripheral’s resolution. Elsewhere, your mileage will vary.

We’ll briefly discuss Logger’s mechanisms for using more accurate timers, if your hardware has them, below.

Where’s the UART output? Taking out the logic analyzer, sure enough, it appears about a second later (the near-invisibly-thin line on channel 0 is the sequence of three pulses from earlier):

What the Heck?

Unlike SYS_LOG, Logger captures timestamps and prepares log messages for later printing in another thread.

This is called deferred logging in the official documentation, which further reads:

Logger is designed to be thread safe and minimizes time needed to log the message. Time consuming operations like string formatting or access to the transport are not performed by default when logger API is called. When logger API is called a message is created and added to the list. Dedicated, configurable pool of log messages is used.

In other words, macros like LOG_ERR() don’t print anything; instead, they make a note of information to be printed at some point later on. The timing above is from Logger’s default configuration for what “later on” means. (This can be tuned, as we’ll see… later on.)

Deferred Logging

The default deferred logging behavior shown above can be fine-tuned with some Go Faster knobs, and also has an Emergency We’re On Fire Print Everything Right Now mode. If you need further control, there are escape hatches to allow the application to determine all details of when log messages are handled.

Going Faster: Trigger Threshold and Sleep Time

Let’s talk about the Go Faster sliders first, as pulling the fire alarm has serious consequences.

By default, Logger creates a low-priority background thread that actually prints messages, and starts it up at boot time. This thread wakes up and processes messages when either of the following things happens:

These options exist for users to tweak the background thread’s behavior.

Setting CONFIG_LOG_PROCESS_TRIGGER_THRESHOLD=2 in prj.conf and using grabserial, we see a shorter delay (full source on GitHub):

[0.003972 0.003970] main() started
[0.005874 0.001902] [00:00:00.005,065] <dbg> foundries_io_threshold: verbose debug 3

(Here, we’ve switched to a “main() started” printk() call to help disambiguate time spent by Zephyr between printing the boot banner and calling main(), now that we’re paying closer attention to timing.)

Likewise, setting CONFIG_LOG_PROCESS_THREAD_SLEEP_MS=500 also decreases the delay to just over half a second (full source on GitHub):

[66.731052 0.004132] main() started
[67.235140 0.504088] [00:00:00.005,126] <dbg> foundries_io_sleep: verbose debug 3

Pulling the Fire Alarm: Panic Mode

As discussed in part 1, one benefit to SYS_LOG’s simplicity is that a polling UART implementation is exceedingly difficult to break. This is extremely useful for printing last gasp messages in fatal error handlers, if, say, the scheduler’s memory gets hopelessly corrupted and deferring logs to another thread wouldn’t work.

To cover situations like this, Logger has a “panic mode”. If you call the LOG_PANIC() macro (defined in <logging/log_ctrl.h>):

  • all pending messages are flushed
  • deferred logging is permanently disabled; all further Logger calls are handled synchronously

A couple of notes:

  • the upstream documentation currently covers the log_panic() function instead of LOG_PANIC(), but the macro has the nice property of compiling away when logging is disabled, so we recommend using it instead.
  • Since LOG_PANIC() flushes all pending messages, among other things, the call itself can take a long time to print existing messages out on the UART.

Escape Hatch: Processing Messages Yourself

Finally, if none of the above work for you, you can disable CONFIG_LOG_PROCESS_THREAD entirely and take control over when messages are printed.

The <logger/log_ctrl.h> APIs allow this. Pointers for getting started:

  • Call log_init() during system initialization, to finish logger setup.
  • Call log_process() regularly to dequeue and print messages.
  • If you want to support trigger thresholds as described above, you should also call log_thread_set() with the thread ID of your processing thread. The logger core will wake it from sleep when the threshold is crossed.

Timestamps

Logger prints timestamps down to the microsecond level. By default, it uses k_cycle_get_32() to fetch the timestamps themselves. Despite the name, this is often not a cycle-accurate timer – you’ll need to check the implementation on your platform to be sure.

As a rule of thumb, it’s best to ignore the pieces of the timestamp below the millisecond range unless you know what you’re doing.

If you need higher-resolution timestamps than the default setting and have an accurate timer, you can use log_set_timestamp_func() to override Logger’s default settings.

However, it’s probably not worth going crazy trying to achieve microsecond accuracy in most cases, since preparing even simple log messages already takes a few microseconds on a microcontroller. If you truly need sub-microsecond accuracy, then system trace peripherals, GPIOs, etc. remain your friends.

Binary Data Dumping

Sometimes, it’s the little things.

SYS_LOG doesn’t have a way to dump binary data. Use it for long enough, and you’ll end up rolling your own hexdump wrappers, which you have to copy/paste around, especially if you’re debugging upstream code. A minor inconvenience, but one nonetheless.

Thankfully, Logger has an answer for this (full source on GitHub):

unsigned char data[] = {0xde, 0xad, 0xbe, 0xef,
			'd', 'e', 'a', 'd',
			'b', 'e', 'e', 'f',
			0xde, 0xad, 0xbe, 0xef};

LOG_HEXDUMP_INF(data, sizeof(data));

Output (note hex and ASCII are both present):

[00:00:00.004,638] <inf> foundries_io_dumping:  de ad be ef 64 65 61 64|....dead
                                                62 65 65 66 de ad be ef|beef....

There are LOG_HEX_DUMP_ERR(), ..._WRN(), and ..._DBG() variants as well, of course.

When using this feature, it’s important to be aware that data are copied into log message buffers until they can be processed. This memory comes from the same pool as all other messages, so logging long blobs can cause you to run out of space for other messages.

We’ll discuss what happens in cases of overflow in the Quirks and Gotchas section below.

Module Filtering

Logger supports compile and run time filtering of each module’s logs.

What is a module, you may ask? When you see module, think of modular programming. Potential examples:

  • blocks on your block diagram
  • the part of your application that captures sensor data
  • a subsystem (like USB)
  • the core kernel

If you’re unsure of how to break your application into modules, here are some suggestions to get started:

  • one module per file (named for the file) or subdirectory of src/ (named for the directory)
  • one module for your entire application (named for your app)

Compile Time

Logger can do compile-time filtering of log calls by verbosity level on a per-file basis (or per-module basis, if you set it up nicely) through use of the LOG_LEVEL macro.

It’s usually worth taking the time to set up a Kconfig option to control a module’s log level. This is especially true if your module is going into upstream Zephyr, or if it’s split into multiple files.

Example code for how to set up and use a log level Kconfig option follows. As usual, you can find full source on GitHub, and that’s especially worth checking out here, since you also need to make some CMakeLists.txt and Kconfig file changes as documented in the Application Development Primer.

In module_main_file.c:

#define LOG_LEVEL CONFIG_FOUNDRIES_IO_MOD_LOG_LEVEL
#define LOG_MODULE_NAME foundries_io_mod
#include <logging/log.h>
LOG_MODULE_REGISTER();

void module_init(void)
{
	LOG_INF("initialized; getting help.");
	module_get_help();
}

In module_helper_file.c:

#define LOG_LEVEL CONFIG_FOUNDRIES_IO_MOD_LOG_LEVEL
#define LOG_MODULE_NAME foundries_io_mod
#include <logging/log.h>

/*
 * Note LOG_MODULE_DECLARE() is used instead of LOG_MODULE_REGISTER().
 * Exactly one file registers the module. The other files in the
 * module still need to declare module-specific state before using
 * logger APIs.
 */
LOG_MODULE_DECLARE();

void module_get_help()
{
	LOG_DBG("i'm trying to help!");
}

Default output:

[00:00:00.003,753] <inf> foundries_io_mod: initialized; getting help.
[00:00:00.003,753] <dbg> foundries_io_mod: i'm trying to help!

If we set CONFIG_FOUNDRIES_IO_MOD_LOG_LEVEL=3 (INF) in prj.conf, rebuild, and reflash, the <dbg> message disappears and the output is:

[00:00:00.003,753] <inf> foundries_io_mod: initialized; getting help.

If we set CONFIG_FOUNDRIES_IO_MOD_LOG_LEVEL=0 (NONE), no logs are printed from this module.

Run Time (Advanced)

Logger also supports run time filtering. One use for this could be to enable more verbose logs in response to some input or condition, to collect detailed reports from misbehaving devices without cluttering the logs on devices that are performing correctly.

Run time filtering is a somewhat advanced topic, with a few sharp edges:

  • Since it increases program size and involves a performance penalty, it is disabled by default, and must be turned on by enabling CONFIG_LOG_RUNTIME_FILTERING before use. Otherwise, attempts to change filter settings are silently ignored. (It would be nice if these APIs returned errors instead in that case.)
  • The log_filter_set() routine, which is used to control run time filter settings, requires digging into some currently unused/internal state in the Logger subsystem. (We’ve hidden these details in a helper; it would be nice if Logger itself had a simple wrapper for the basic “single core, this module” case.)
  • Changing a module’s filter settings at runtime is orthogonal to its outstanding messages actually getting processed. This can lead to surprising results depending on the times when: 1. messages are logged, 2. filters are changed, and 3. the background processing thread actually runs. (We work around this by waiting for all logs to get flushed before changing the filter.)

(Maybe you’ll improve this situation and contribute your patches upstream?)

Example (full source on GitHub)

void func()
{
	/* Log a couple of messages. */
	LOG_INF("for your information");
	LOG_DBG("no bugs here!");

	/* Wait for them to get processed. */
	wait_for_log_processing();
}

void main(void)
{
	printk("All logs disabled:\n");
	set_my_log_level(LOG_LEVEL_NONE);
	func();

	printk("INF and up enabled:\n");
	set_my_log_level(LOG_LEVEL_INF);
	func();

	printk("All logs enabled:\n");
	set_my_log_level(LOG_LEVEL_DBG);
	func();
}

Output:

***** Booting Zephyr OS v1.12.0-1181-g55327a183 *****
All logs disabled:
INF and up enabled:
[00:00:00.007,263] <inf> foundries_io_mod_runtime: for your information
All logs enabled:
[00:00:01.015,960] <inf> foundries_io_mod_runtime: for your information
[00:00:01.015,960] <dbg> foundries_io_mod_runtime: no bugs here!

Quirks and Gotchas

Grab Bag

The Logger subsystem has a few surprises (especially if you’re used to SYS_LOG), some of which we’ve mentioned above:

  • Module names: Logger’s macro metaprogramming uses module names in C identifiers, which means they can’t be arbitrary strings. Sticking to valid C identifiers is best, though we’ve found that leading digits didn’t pose a problem.
  • Timestamp accuracy: Logger won’t replace your scope. Timestamps are displayed to the microsecond, but as we’ve seen earlier, there’s no guarantee they are accurate to the microsecond. Be aware of this before relying on them for precise measurements.
  • Processing delays: By default, Logger takes up to a second before processing logs. You can tune this, but the defaults can surprise.
  • Complex run time filtering: the run time filtering mechanism has some sharp edges, as discussed in above sections.
  • Starvation: By default, the Logger background thread is low priority, so it doesn’t interfere with your system. If the system is under heavy load, however, that means the thread could be starved and logs might not print, the pool of messages might become empty, causing lost messages, etc.

A couple of additional quirks deserve sections of their own.

Logging Strings

Unlike the binary data dumping routines, logging strings which can change between log creation time and processing time doesn’t work the way you’d expect. Logger doesn’t copy string data when buffering log messages (since it doesn’t inspect format strings at all, to save time), so it’s not safe to change a string after you’ve passed a pointer to it to Logger.

If you really need to log a mutable string, consider using the binary data dumping routines described above as a hackaround, since they copy the data and include ASCII output.

Overflow

Since logger has a fixed-size pool for buffering log messages, it’s possible to run out of room. When that happens, Logger has two choices:

  1. throw away old messages to make room to buffer new ones
  2. throw away new messages until old ones are processed

The default is option 1, which can be enforced by setting CONFIG_LOG_MODE_OVERFLOW=y. The alternative can be chosen by setting CONFIG_LOG_MODE_NO_OVERFLOW=y.

If you have space, you can also increase the pool size by increasing CONFIG_LOG_MSG_SIZE. You can also try to use compile or run time filtering to decrease the number of logged messages you don’t care about (though be aware that just enabling run time filtering carries a footprint penalty of its own).

Summary

The Logger subsystem has a tiered architecture with a fairly complex set of features designed to allow logs to be created and buffered with minimum latency, and processed later, with a sophisticated set of filtering and tuning knobs.

This extra complexity has important benefits:

  • Low Latency: log messages can be buffered in a matter of microseconds, making them much safer to use in interrupts, latency-sensitive threads, etc.
  • Thread safety: log messages are created from a thread-safe buffer pool and reference counted, so there’s no risk that logging from two threads will cause garbled output, and memory is returned to the pool when no longer in use.
  • Run time filtering: This feature, absent from SYS_LOG, is enabled in Logger through explicitly managing extra state associated with modules and instances.
  • Backend support: Logger was designed for expansion, so network and nonvolatile storage logging backends are sure to make an appearance in the future.
  • Multi-processor ready: Logger will enable safe logs in multi-processor Zephyr systems.

It also has important drawbacks compared to the simplicity of SYS_LOG:

  • More complex APIs: the introduction of “source” and “domain” IDs, necessary to make Logger work, can make for clumsy usage in simple cases.
  • Memory overhead: Logger’s use of a message pool which copies log data is a space for speed tradeoff.
  • Scheduling considerations: SYS_LOG messages go out at the call site, no questions asked. Deferred logging implies making sure the processing thread neither starves or adds too much system overhead.
  • Extra quirks: Logger can be something of a leaky abstraction, with a few quirks to be aware of as discussed throughout the article.

Additional Resources

Upstream Zephyr Logger documentation:

comments powered by Disqus