Every developer has opinions on coding standards. I am no exception. If you are looking for the one Coding Convention to rule them all – this is not that post. Instead, I am going to discuss my thought process when defining or reviewing a Coding Convention.
I have authored more than one Coding Convention document over my career. And as an author, I have had to campaign (convince) team members that the proposed conventions are the right ones (at least for some values of right). Everyone has different expectations of what makes up coding conventions. For me, coding conventions break down into the categories listed below. I find it helpful to keep these categories in mind when defining/deciding/negotiating what is or is not in a coding conventions document. For example, does it make sense to fight long and hard for tabs instead of spaces, or for restricting how the heap is used?
Coding Standards
Coding standards define rules that affect the reliability of the code. These rules should be scoped to coding practices that bear directly on the reliability of code, i.e. violating a coding standard puts your code in jeopardy of error. For example, rules dealing with dynamic memory or type casting.
Naming
Naming rules are further sub-divided into the naming of code constructs such as variables, functions, classes, structs, preprocessor symbols, etc.; and file/directory naming. The first concern for the naming rules is to provide scalability as the size of the code base increases, i.e. to prevent name collisions. Failure to comply with naming rules is not going to reduce the quality of your code, but it can cause pain or undesirable work-arounds in the future when there are name collisions.
Style
Style rules provide formatting consistency and readability of the source code. This is where the whole tabs vs. spaces, curly brace notation, etc. comes into play. The goal here is to make the code base as readable as possible, since more time is spent reading source code than authoring it. Another argument for Style rules is that visually pretty code is easier to comprehend.
Below is a list of what I consider essential conventions. The list is far from complete – even for a minimum set of Conventions. My intent was to provide a starting point and to get the reader thinking in terms of the why and the value of the rules, not just the syntax.
Coding Standards
Specify the language standard
For C/C++ projects, define the language standard(s) your source code must conform to, e.g. C99 ISO/IEC 9899:1999, C++11 ISO/IEC 14882:2011. This is typically dictated by what cross-compiler(s) are available for your target platform. This is especially important when your initial work is done with a functional simulator, because the default compilers for Windows/Linux usually support the latest language standard, but your cross compiler for the target may not. You want to make sure that the compiler(s) used for the unit tests and the functional simulator are held to the same language standard as your target cross compiler.
At least for the embedded space, I recommend that you do not use the latest and greatest language standard. Case in point: on one project I worked on (circa 2018), we specified C++11 as the language standard. We thought we were being overly conservative by picking a standard that had been out for 7 years. However, later in the project we were asked to build a stripped-down version – for laboratory research – that ran on the legacy platform. The legacy platform was running WinCE 5.0. The build tool for WinCE 5.0 is Visual Studio 2008 – which does not support C++11. Obviously, a problem. Fortunately, only a subset of our code base needed to be built for the legacy platform. But we still had to go back and refactor this code to be C++03 compliant.
Note: Specifying a language standard also means that you should not use any compiler specific features, extensions, #pragma directives, etc. (I am talking about you, Mr. gcc). The obvious exception to this rule is your BSP code, which by definition is specific to a hardware platform and a compiler. However, make the BSP as small as possible and do not expose hardware or compiler dependencies in its interfaces to the application.
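The real enforcement of the language standard belongs in your build scripts (e.g. pinning the compiler's -std option and turning off vendor extensions). As a belt-and-suspenders measure, a compile-time guard can catch a build configuration that forgot to do so. Below is a minimal sketch (the file name is hypothetical); note that some compilers, older MSVC in particular, misreport __cplusplus unless explicitly configured, so verify this against your own tool chain.

// LanguageGuard.h - a sketch of a compile-time check on the agreed standard
#ifndef LanguageGuard_h_
#define LanguageGuard_h_

#if !defined(__cplusplus) || __cplusplus < 201103L
#error "This code base must be compiled as C++11 or later"
#endif

#endif  // end header latch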
No Compiler warnings
The compiler is your best and first defense against errors. When it warns you that something is not kosher, you need to pay attention. In practice, 99 out of 100 compiler warnings are nuisances. However, if your code is riddled with compiler warnings, trying to find that one meaningful warning in the haystack of warnings is nigh impossible. A better alternative is to always turn on the treat-warnings-as-errors compiler option along with enabling a reasonable warning level (e.g. for gcc use -Wall -Wextra; for MSVC use /W3 or higher) and prevent the warnings from the beginning.
Your build scripts should be configured to treat warnings as errors from day one. Anything else is just incurring technical debt.
Some of the most overlooked warnings – which are real errors – are sign mismatches and loss of precision (e.g. truncating a 32-bit value into a 16-bit variable). The moral here is to actually resolve the warnings – not just throw in an explicit cast to make them go away.
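Here is a minimal sketch (the function and names are hypothetical) of what I mean by resolving a truncation warning rather than silencing it:

#include <stdint.h>
#include <assert.h>

// Packs a payload length into a 16-bit wire-format field
uint16_t encodeLength( uint32_t numBytes )
{
    // Wrong 'fix':  return (uint16_t) numBytes;   // silences the warning, hides the bug
    // Better: defend against the overflow, then make the narrowing explicit
    assert( numBytes <= UINT16_MAX );
    return static_cast<uint16_t>( numBytes );
}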
Avoid type casting
Strict type checking is your friend – take full advantage of it! The more type checking done by the compiler the better off your code will be. Explicit casting defeats the compiler’s ability to detect type errors. When using C++ there is less need (versus C code) for type casting. When you do need to do type casting – try to restrict the casting such that it is not part of a public interface. Or said another way, you don’t want clients/consumers of a module’s interface having to perform type casting when using the module.
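As a sketch of that last point (all names here are hypothetical), the one cast needed to talk to a byte-oriented driver lives inside the module, so clients of the interface never cast anything:

#include <stdint.h>
#include <stddef.h>

/// Typed record that clients work with - no casting required on their side
struct TemperatureRecord
{
    int32_t  centiDegreesC;   //!< Temperature in 1/100 degrees Celsius
    uint32_t timestampMs;     //!< Sample time in milliseconds
};

/// Hypothetical low-level driver that only understands raw bytes
static void driverWrite( const uint8_t* srcData, size_t numBytes )
{
    // stub for illustration only
    (void) srcData;
    (void) numBytes;
}

/// Public interface: fully typed, no void* or casting for the caller
void storeRecord( const TemperatureRecord& record )
{
    // The single, contained cast lives in the implementation
    driverWrite( reinterpret_cast<const uint8_t*>( &record ), sizeof( record ) );
}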
Do not use recursion
Recursion is unbounded dynamic memory allocation in disguise. Yes, technically there is an exception to this rule: when the worst case stack usage can be determined for all usages of the function AND that stack usage fits within the allocated stack space for all thread(s) that the code executes in. But verifying all of the exception criteria is challenging and error prone (i.e. what about tomorrow's usage?). It is best to just make this a no-exception rule.
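When you are tempted to recurse, an explicit, fixed-size work list makes the worst-case memory usage visible and enforceable. The sketch below is hypothetical; MAX_TREE_DEPTH stands in for whatever bound your application can actually justify:

#include <stddef.h>

struct Node
{
    long  value;
    Node* left;
    Node* right;
};

static const size_t MAX_TREE_DEPTH = 32;    //!< Application-defined worst case

/// Sums the tree iteratively using a fixed-size stack.  Returns false
/// (instead of silently blowing the call stack) if the bound is exceeded.
bool sumTree( const Node* root, long& sum )
{
    const Node* workStack[MAX_TREE_DEPTH];
    size_t      depth = 0;
    sum               = 0;

    if ( root )
    {
        workStack[depth++] = root;
    }
    while ( depth > 0 )
    {
        const Node* node = workStack[--depth];
        sum += node->value;
        if ( node->left )
        {
            if ( depth >= MAX_TREE_DEPTH ) { return false; }
            workStack[depth++] = node->left;
        }
        if ( node->right )
        {
            if ( depth >= MAX_TREE_DEPTH ) { return false; }
            workStack[depth++] = node->right;
        }
    }
    return true;
}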
Use more than one Compiler
You would think that writing code to the language standard would mean that your code is compiler independent – wrong! Yes, the code compiles – but the execution is not the same, or is incorrect. Just google C++ undefined behavior.
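Here is a toy example (of my own, not from any particular project) where two perfectly standard-conforming compilers can legitimately produce different output, because the evaluation order of function arguments is unspecified:

#include <stdio.h>

static int counter = 0;
static int next() { return ++counter; }

int main()
{
    // The evaluation order of function arguments is unspecified, so a
    // conforming compiler may print "1 2" or "2 1" - both are legal.
    printf( "%d %d\n", next(), next() );
    return 0;
}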
I have built and run the same unit test with 3 different compilers (VC, MinGW, gcc) only to have the unit test fail at run time when built with one of the compilers, but run successfully when built by the other two. In most cases where this happened, there was a bug in the unit test or the code under test that was the ultimate root cause of the failure. So having to get a passing grade from three different compilers was a good thing.
And yes, I realize that if you only build for a single target platform this rule sounds silly. However, I have worked on more than one project where we had to change Microcontroller vendors part way through the project. And changing the Microcontroller vendor required a different compiler. This is where my sales pitch comes in for having a functional simulator or automated unit tests that run on a Windows/Linux box. Not only is it a best practice, it is great insurance for when your target platform changes unexpectedly, i.e. you already know that a large part of the code base is platform independent.
Naming
Name collisions are bad. My latest name collision story comes from a project where we were using a third-party RTOS and a third-party JSON library. The RTOS package defined a macro named _current and the JSON library had classes with a member variable named _current. This was not a problem until we had a module that had #include statements for both the RTOS and the JSON header files. The final fix for this was to always include the JSON header file before any RTOS header files. Unfortunately, this imposed an artificial dependency: the RTOS package was now dependent on the JSON library. You can argue about which third-party package was at fault – but the impact was to our project! Fortunately, this naming collision was a compile time error. I have had another experience – involving yet another RTOS and poorly named macros – where the naming collision did not cause a compile/link time failure, but was a run time failure. Needless to say, that error took a considerable amount of time to find.
Note: Your naming rules will only protect your code base from introducing naming collisions in the future. Unfortunately, these rules cannot eliminate the naming collisions that are introduced by third-party packages. Maybe some day good practices around naming will be as common as don't use gotos.
Header File Content
To avoid naming collisions in your source files, you need rules that guarantee that all symbols (macros, class names, function names, etc.) in header files are globally unique.
For C++ code the easiest way to do this is to require the use of namespaces, i.e. require that all classes, functions, and variables are scoped within a namespace (and not the std:: namespace). I also recommend that you organize your directory structure to map 1-to-1 with your namespaces. This approach accounts for all of the header file symbols being unique except for preprocessor symbols. Yes, I know that die-hard C++ programmers do not use the preprocessor – but for those few/some/most C++ programmers that do use the preprocessor, there need to be rules for ensuring that all preprocessor symbols in header files are unique. My recommendation for the preprocessor symbols is to prefix the basic symbol name with the directory structure (since you took my suggestion to have directories map 1-to-1 with namespaces). Here is an example for the file: src/Cpl/Type/Guid.h
#ifndef Cpl_Type_Guid_h_
#define Cpl_Type_Guid_h_

#include <stdint.h>

/// The number of binary bytes in a GUID
#define CPL_TYPE_GUID_BINARY_LENGTH     16

///
namespace Cpl {
///
namespace Type {

/// This structure defines a type for a 16 Byte GUID/UUID.
struct Guid_T
{
    /// Value as single memory block, and/or internal structure
    union
    {
        uint8_t block[CPL_TYPE_GUID_BINARY_LENGTH];
        struct
        {
            uint8_t time_low[4];          //!< Low 32 bits of the time
            uint8_t time_mid[2];          //!< Middle 16 bits of the time
            uint8_t time_hi_version[2];   //!< Version, time
            uint8_t clock_seq_hi_lo[2];   //!< Variant, Clock sequence
            uint8_t node[6];              //!< Node ID (48 bits)
        };
    };
};

…

}       // end namespaces
}
#endif  // end header latch
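A hypothetical consumer of the header above might look like the following. Because everything in Guid.h is scoped by the Cpl::Type namespace or the CPL_TYPE_ prefix, these names cannot collide with some other package's Guid_T or BINARY_LENGTH symbols:

#include "Cpl/Type/Guid.h"
#include <string.h>

/// Zeroes out a GUID instance
static void clearGuid( Cpl::Type::Guid_T& guid )
{
    memset( guid.block, 0, CPL_TYPE_GUID_BINARY_LENGTH );
}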
What about C code, since the C language does not have namespaces? The short answer is: the same as C++. Use namespaces – which in this scenario are directory names – and prefix all symbols with the file's directory path. This is the same rule as the C++ preprocessor rule above. Here is an example for the file: src/Cpl/Container/cslist.h
#ifndef Cpl_Container_cslist_h_
#define Cpl_Container_cslist_h_

#include "Cpl/Container/citem.h"

//
#ifdef __cplusplus
extern "C" {
#endif

/// This structure defines the internal structure of a Singly Linked List.
typedef struct CplContainerSList_T
{
    CplContainerItemSListLinkage_T* headPtr;    //!< Pointer to the first item
    CplContainerItemSListLinkage_T* tailPtr;    //!< Pointer to the last item
} CplContainerSList_T;

/// Initializes the list.
void Cpl_Container_SList_initialize( CplContainerSList_T* listToInitialize );

/// Moves the content of 'srcList' to the 'dstList'.
void Cpl_Container_SList_move( CplContainerSList_T* dstList,
                               CplContainerSList_T* srcList );

…

//
#ifdef __cplusplus
}       // end extern "C"
#endif
#endif  // end header latch
Files
There are two gotchas when it comes to file naming. The first is case sensitivity, or the lack of case sensitivity, in the host file system (thanks Microsoft). If your code base is only developed on a Linux box, or only on a Windows box, and will never be compiled on a different host OS, then you can ignore this. Did I mention that my rule #11 for development in the Patterns in the Machine book is: Never believe never?
I strongly recommend that you define specific requirements around the case of file and directory names. And do not allow two files in the same directory to have names that differ only by case, e.g. do not allow both Item.cpp and item.cpp.
The second file naming issue is how to guarantee that all header file names are unique. This issue only gets compounded when you start using third-party packages that may or may not be well behaved with respect to their file naming. I will take a somewhat silly example. Let's say you want to have the header file src/MyWidget/time.h. If you follow traditional C/C++ conventions you would have the following statement: #include "time.h". Which is obviously a problem because the name conflicts with the C standard library. However, #include "MyWidget/time.h" is fine and there is no name collision. While this example was a collision with the standard C library, it is just as easy to have a collision within your code base or with a third-party package header file. My recommendation is to require that directory path information be part of #include statements. For example:
#ifndef Cpl_TShell_Processor_h_
#define Cpl_TShell_Processor_h_
#include "Cpl/TShell/Context_.h"
#include "Cpl/System/Mutex.h"
#include "Cpl/System/Api.h"
#include "Cpl/Text/FString.h"
#include "Cpl/Text/Frame/StreamEncoder.h"
#include "Cpl/Text/Frame/StreamDecoder.h"
…
#endif // end header latch
Style
My first recommendation for Style rules is to reinforce that the goal of the Style rules is to make the code more readable by the entire team (and future team members). Avoid the temptation to go full OCD[1] with your Style rules.
Next: life is better for everyone when you do not have multiple options for rule X. For example, define the use of tabs or spaces – don't allow both. Specify a specific curly brace notation (Allman, Whitesmiths, etc.) – don't allow multiple notations.
Be brave, take a stand. Appeasement just causes confusion.
If you use a tool like Doxygen, you should also define Style rules for the tool. In addition, have a no-Doxygen-warnings-allowed rule. The no-Doxygen-warnings rule can be enforced as part of your CI builds, i.e. run Doxygen on every CI build and fail the build if there are any Doxygen warnings. My experience with Doxygen is that it can be a little too robust in continuing after it encounters warnings. What I mean by this is that the final Doxygen output can have errors in it – but the error is the result of an 'upstream' warning, and not anything wrong with the source code/formatting where the error shows up.
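As a sketch of the kind of Doxygen Style rules I have in mind (the exact tags and layout are up to your team – the point is to pick one format and stick with it):

/** Reads up to 'maxBytes' from the device.

    @param dstBuffer   Where the received data is stored.  Must not be null.
    @param maxBytes    Maximum number of bytes to read.

    @returns The number of bytes actually read, or -1 if an error occurred.
 */
int read( void* dstBuffer, int maxBytes );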
[1] For the curious reader, I am a tabs, indent 4, Allman brace notation kind of guy.
Enforcement
Coding Conventions require discipline on the part of the developers to be followed. The more onerous an organization's Conventions are, the less likely it is that the developers will faithfully follow them. Process, and its sibling documentation, are some of the first casualties of schedule pressure. The moral of the story is to keep your Coding Conventions to a manageable set – especially in times of crisis. Or said another way, less is more with respect to Coding Conventions. The exceptions to this recommendation are when:
- You have access to (and the resources to configure and maintain) automated tools that verify Coding Convention compliance (e.g. PCLint). This removes the burden of compliance verification and allows the verification to be done as part of the Continuous Integration process.
- You have requirements to be compliant with Industry Standards (e.g. MISRA C, CERT C Coding Standard, etc.)
Summary
Software development is a team sport. It only makes sense to define and document a minimum set of best practices that individuals on the team are expected to follow. It is lost time when team members are continually debating (okay, arguing) over basic best practices instead of working on deliverables.
Every project should have documented Coding Conventions. And the Conventions should not be overly burdensome to comply with or to enforce.