Never Trust the Software Guys: Only developers can prevent Header Path pollution

Shamelessly paraphrased from the Smokey Bear Wildfire Prevention campaign slogan: “Only You Can Prevent Wildfires”. Only developers can prevent Header Path pollution. What is Header Path pollution? Header Path pollution is when your build scripts have to specify dozens of directory paths so that the C/C++ compiler can find the header files named in your source code’s #include statements. Here are two examples:

I worked on an embedded project using a microcontroller from Nordic Semiconductor and the Zephyr RTOS. For this project, 20 different header directory paths had to be specified to compile against the microcontroller’s SDK and Zephyr.
The Raspberry Pi Pico C/C++ SDK requires 40 different directory paths to be specified to compile against the SDK.

The above examples do not include any application header directory paths or additional paths required for third-party packages. In both examples, the total number of header paths in the build scripts exceeded the numbers listed above. Ultimately this does not scale. Yes, it is just the build scripts, but someone has to maintain them. I have yet to encounter a developer who enjoyed maintaining build scripts, myself included. And if your day job requires you to develop on a Windows box, there is a hard constraint as well: the Windows command prompt (cmd.exe) limits a command line to 8191 characters.
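To make the command-line limit concrete, here is a minimal sketch that estimates how many characters the header search paths add to a single compiler invocation. It assumes GCC/Clang-style `-I<path>` flags separated by one space. The function names `include_flags_length` and `fits_on_cmd_line` are hypothetical helpers, not part of any SDK or build tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Command-line limit of the Windows command prompt (cmd.exe), in characters. */
#define CMD_EXE_LIMIT 8191u

/* Returns the number of characters the header search paths add to one
 * compiler invocation, assuming each path is passed as "-I<path>" followed
 * by a single separating space (an assumption about the build script). */
size_t include_flags_length(const char *const paths[], size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        assert(paths[i] != NULL);
        total += strlen("-I") + strlen(paths[i]) + 1; /* +1 for the space */
    }
    return total;
}

/* Returns 1 if the include flags plus the rest of the command
 * (compiler name, other options, source file) fit within the limit,
 * otherwise 0. */
int fits_on_cmd_line(size_t include_len, size_t other_len)
{
    return include_len + other_len <= CMD_EXE_LIMIT;
}
```

Feeding this the real list of header directories from a build script gives a quick check of how close a project is to the limit. The deeper the SDK is installed in the file system, the longer every absolute path becomes.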

File Organization is a Strategic Best Practice

Source code file organization is very often done organically. That is, developers start writing code and organizing their files based on their immediate needs or what they did on their last project. Consequently, it is only later in the project, when there are new requirements or when the code is being reused on another project, that the consequences of unplanned file organization become an issue. Changing rigid source code directory structures can be very painful, so in my experience it is rarely done.

Thinking long term about your file organization before you start creating files is especially important when it comes to header files. Let’s examine a time-honored practice in the C programming world: the public header files (.h) are located in an include/ directory, separate from the source (.c) files, which are located in a source/ directory. Separating the public header files from the source files provides clear, compiler-enforceable semantics: an application only has access to the library’s intended public interface. This is a good example of strategic thinking.
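As an illustration of the include/ versus source/ split, here is a minimal sketch of a hypothetical `counter` library. The two parts are shown in one listing for brevity. In a real project the first part would live in include/counter.h and the second in source/counter.c. All names here are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- would live in include/counter.h (public interface) ---- */
typedef struct counter counter_t;   /* opaque type: layout is not published */
counter_t *counter_create(int start);
void       counter_increment(counter_t *c);
int        counter_value(const counter_t *c);
void       counter_destroy(counter_t *c);

/* ---- would live in source/counter.c (private implementation) ---- */
struct counter {
    int value;                      /* only visible inside the library */
};

counter_t *counter_create(int start)
{
    counter_t *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
    }
    return c;                       /* NULL if allocation failed */
}

void counter_increment(counter_t *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const counter_t *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_destroy(counter_t *c)
{
    free(c);                        /* free(NULL) is a no-op */
}
```

When the two parts are split across the directories and only include/ is on the application's header search path, application code that tries to access `c->value` directly fails to compile, because the struct layout is never visible to it. That is the compiler-enforced boundary the directory layout buys you.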

Zephyr Example

Let’s take the case of the Zephyr RTOS. Zephyr has a single include/ directory that contains all of the public header files for the core Zephyr kernel. Below is a snippet of Zephyr’s directory structure:

zephyr
+---arch
+---boards
+---drivers
+---include
|   +---app_memory
|   +---arch
|   +---audio
|   +---bluetooth
|   +---canbus
|   +---console
|   +---crypto
|   +---data
|   +---debug
|   +---devicetree
|   +---dfu
|   +---disk
|   +---display
|   +---drivers
...
|   +---task_wdt
|   +---timing
|   +---toolchain
|   +---tracing
|   +---usb
|   \---zephyr
+---kernel
...

If Zephyr has all of its public header files under a single include/ directory, why did my project require 20 header directory paths to be added to the build scripts? This is because the Zephyr RTOS is really a mix of the core kernel, a collection of modules, and third-party packages. The Nordic nRF SDK is an example of a third-party package. These extra modules and packages are managed separately in their own Git repositories. The breakdown of the project’s header paths is as follows:

1 include path for the core Zephyr kernel
1 include path for the auto generated header files produced during the build
2 include paths for the ARM architecture of the MCU
1 include path for supporting the libc alternative newlib
2 include paths for Mbed TLS cryptographic library
12 include paths specific to the Nordic nRF SDK.

All of the above seems fairly reasonable, except perhaps for one question: why does the Nordic nRF SDK require 12 different header directories? The short answer is: because that is how Nordic implemented their SDK. However, as a consumer of the SDK, I can’t help but wonder why their header include strategy was not as well thought out as Zephyr’s, especially when you consider that the nRF SDK is Nordic’s next generation SDK. Overall, Zephyr gets high marks for preventing Header Path pollution. I am on the fence about Nordic’s nRF SDK.

Raspberry Pi Pico

IMO the Pi Pico C/C++ SDK is an example of a best practice gone wrong. Now before I go further, let me say two things: 1) This is not just a Raspberry Pi Pico issue; it is just my most recent encounter. I have used many other SDKs that have the same issue, and 2) I positively enjoy working with the Pi Pico.

Below is a snippet of the SDK directory structure. I am only showing the include directories that have to be added to the build scripts.

src
+---boards
|   \---include
+---common
|   +---boot_picoboot
|   |   \---include
|   +---boot_uf2
|   |   \---include
|   +---pico_base
|   |   \---include
|   +---pico_binary_info
|   |   \---include
|   +---pico_bit_ops
|   |   \---include
|   +---pico_divider
|   |   \---include
|   +---pico_stdlib
|   |   \---include
|   +---pico_sync
|   |   \---include
|   +---pico_time
|   |   \---include
|   +---pico_usb_reset_interface
|   |   \---include
|   \---pico_util
|       \---include
+---host
|   +---hardware_divider
|   |   \---include
|   +---hardware_gpio
|   |   \---include
|   +---hardware_sync
|   |   \---include
|   +---hardware_timer
|   |   \---include
|   +---hardware_uart
|   |   \---include
|   +---pico_multicore
|   |   \---include
|   +---pico_platform
|   |   \---include
|   \---pico_stdio
|       \---include
+---rp2040
|   +---hardware_regs
|   |   \---include
|   \---hardware_structs
|       \---include
\---rp2_common
    +---boot_stage2
    |   +---asminclude
    |   \---include
    +---cmsis
    |   +---include
    |   \---stub
    |       \---CMSIS
    |           +---Core
    |           |   \---Include
    |           \---Device
    |               \---RaspberryPi
    |                   \---RP2040
    |                       \---Include
    +---hardware_adc
    |   \---include
    +---hardware_base
    |   \---include
    +---hardware_claim
    |   \---include
    +---hardware_clocks
    |   \---include
    +---hardware_divider
    |   \---include
    +---hardware_dma
    |   \---include
    +---hardware_exception
    |   \---include
    +---hardware_flash
    |   \---include
    +---hardware_gpio
    |   \---include
    +---hardware_i2c
    |   \---include
    +---hardware_interp
    |   \---include
    +---hardware_irq
    |   \---include
    +---hardware_pio
    |   \---include
    +---hardware_pll
    |   \---include
    +---hardware_pwm
    |   \---include
    +---hardware_resets
    |   \---include
    +---hardware_rtc
    |   \---include
    +---hardware_spi
    |   \---include
    +---hardware_sync
    |   \---include
    +---hardware_timer
    |   \---include
    +---hardware_uart
    |   \---include
    +---hardware_vreg
    |   \---include
    +---hardware_watchdog
    |   \---include
    +---hardware_xosc
    |   \---include
    +---pico_bootrom
    |   \---include
    +---pico_cyw43_arch
    |   \---include
    +---pico_double
    |   \---include
    +---pico_fix
    |       \---include
    +---pico_float
    |   \---include
    +---pico_int64_ops
    |   \---include
    +---pico_lwip
    |   \---include
    +---pico_malloc
    |   \---include
    +---pico_mem_ops
    |   \---include
    +---pico_multicore
    |   \---include
    +---pico_platform
    |   \---include
    +---pico_printf
    |   \---include
    +---pico_runtime
    |   \---include
    +---pico_stdio
    |   \---include
    +---pico_stdio_semihosting
    |   \---include
    +---pico_stdio_uart
    |   \---include
    +---pico_stdio_usb
    |   \---include
    \---pico_unique_id
        \---include

This is Header Path pollution. When compiling one of the SDK’s Wi-Fi example projects on my laptop, the compiler command for a single file is 5577 bytes long, of which 4151 bytes (roughly 74%) are used to specify the header search paths.

Final thoughts

Project and company-wide file organization planning that is done early and deliberately is a strategic best practice. This is especially critical for packages (e.g. SDKs, middleware, stacks, etc.) that will be included in other projects. And there is more to file organization than simply where the header files are located: there is also file naming, for example, how to prevent header file name collisions. As the memory footprint for microcontrollers gets larger, embedded projects tend to have more and more third-party code. A common C practice is to not use path information in #include statements. This means that the header files for your entire application all have to have unique file names. But this is a discussion for another day.
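To show why unqualified header names collide, here is a simplified model of how a compiler walks its header search paths: directories are tried in order and the first match wins. This is a sketch, not a real preprocessor. It ignores details such as the directory of the including file, and the `header_entry_t` type and `resolve_include` function are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One header file known to the model: the search directory it lives in
 * and its name relative to that directory. */
typedef struct {
    const char *dir;
    const char *name;
} header_entry_t;

/* Returns the first directory, in search order, that provides `name`,
 * or NULL if no directory does. Because the first match wins, a header
 * with the same name in a later directory is silently shadowed. */
const char *resolve_include(const char *const search_dirs[], size_t dir_count,
                            const header_entry_t headers[], size_t header_count,
                            const char *name)
{
    assert(name != NULL);
    for (size_t d = 0; d < dir_count; d++) {
        for (size_t h = 0; h < header_count; h++) {
            if (strcmp(headers[h].dir, search_dirs[d]) == 0 &&
                strcmp(headers[h].name, name) == 0) {
                return search_dirs[d];
            }
        }
    }
    return NULL;
}
```

With search directories app/include and vendor/include (in that order) and a uart.h in both, a lookup of uart.h always resolves to app/include, so the vendor's own source files may pick up the wrong header. If the vendor instead ships its header as vendor/uart.h relative to its include directory, the path-qualified name resolves unambiguously.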

So be like Smokey Bear and plan your file organization before you create the first file.

Patterns in the Machine