erenon.hu

Assume you want to create a preprocessor macro that takes a macro F and a variable number of arguments. When expanded, it invokes F with each argument:

#define FOREACH(F, ...) //...

FOREACH(G, a, b, c) // expands to: G(a) G(b) G(c)

Such a macro can be useful for example for a log library. There are several ways to implement it. Here, we compare three different implementations, with regard to complexity, ease of change, and compile time.

The complete source code used for testing is available at erenon/pp-foreach-comparison.

Contenders

The first implementation is taken from binlog. It consists of a macro that counts the number of arguments, and N foreach macros, one for each arity. It is generated by a python script, binlog.py. N is a parameter of the generator script, here, N=512. This version is slightly different from the original one used in binlog, to match the capabilities of the other contenders, for a fair comparison.

import sys

N = 512

print('#define CAT(a, b) CAT_I(a, b)')
print('#define CAT_I(a, b) a ## b')

sys.stdout.write('#define COUNT(...) COUNT_I(__VA_ARGS__')
for i in reversed(range(N)):
    sys.stdout.write(f',{i}')
print(')')

sys.stdout.write('#define COUNT_I(')
for i in range(1, N+1):
    sys.stdout.write(f'a{i},')
print(f'...) a{N}')

print('#define FOREACH(F, ...) CAT(FOREACH_, COUNT(__VA_ARGS__)) (F, __VA_ARGS__)')

for i in range(1, N+1):
    sys.stdout.write(f'#define FOREACH_{i}(F')
    for j in range(i):
        sys.stdout.write(f',a{j}')
    sys.stdout.write(')')
    for j in range(i):
        sys.stdout.write(f' F(a{j})')
    sys.stdout.write("\n")

For N=3, it looks like this:

#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b
#define COUNT(...) COUNT_I(__VA_ARGS__,2,1,0)
#define COUNT_I(a1,a2,a3,...) a3
#define FOREACH(F, ...) CAT(FOREACH_, COUNT(__VA_ARGS__)) (F, __VA_ARGS__)
#define FOREACH_1(F,a0) F(a0)
#define FOREACH_2(F,a0,a1) F(a0) F(a1)
#define FOREACH_3(F,a0,a1,a2) F(a0) F(a1) F(a2)

Disclaimer: I’m the author of binlog.

The second implementation is taken from swansontec. It is a complex, but documented solution. It works with up to N=364 arguments. The idea can be expanded to support more arguments. The entry point of the solution is renamed to FOREACH in the snippet below:

/* Created by William Swanson in 2012. Public Domain. */

#define EVAL0(...) __VA_ARGS__
#define EVAL1(...) EVAL0(EVAL0(EVAL0(__VA_ARGS__)))
#define EVAL2(...) EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL3(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL4(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL(...)  EVAL4(EVAL4(EVAL4(__VA_ARGS__)))

#define MAP_END(...)
#define MAP_OUT
#define MAP_COMMA ,

#define MAP_GET_END2() 0, MAP_END
#define MAP_GET_END1(...) MAP_GET_END2
#define MAP_GET_END(...) MAP_GET_END1
#define MAP_NEXT0(test, next, ...) next MAP_OUT
#define MAP_NEXT1(test, next) MAP_NEXT0(test, next, 0)
#define MAP_NEXT(test, next)  MAP_NEXT1(MAP_GET_END test, next)

#define MAP0(f, x, peek, ...) f(x) MAP_NEXT(peek, MAP1)(f, peek, __VA_ARGS__)
#define MAP1(f, x, peek, ...) f(x) MAP_NEXT(peek, MAP0)(f, peek, __VA_ARGS__)

#define MAP_LIST_NEXT1(test, next) MAP_NEXT0(test, MAP_COMMA next, 0)
#define MAP_LIST_NEXT(test, next)  MAP_LIST_NEXT1(MAP_GET_END test, next)

#define MAP_LIST0(f, x, peek, ...) f(x) MAP_LIST_NEXT(peek, MAP_LIST1)(f, peek, __VA_ARGS__)
#define MAP_LIST1(f, x, peek, ...) f(x) MAP_LIST_NEXT(peek, MAP_LIST0)(f, peek, __VA_ARGS__)

/**
 * Applies the function macro `f` to each of the remaining parameters.
 */
#define FOREACH(f, ...) EVAL(MAP1(f, __VA_ARGS__, ()()(), ()()(), ()()(), 0))

/**
 * Applies the function macro `f` to each of the remaining parameters and
 * inserts commas between the results.
 */
#define MAP_LIST(f, ...) EVAL(MAP_LIST1(f, __VA_ARGS__, ()()(), ()()(), ()()(), 0))

The third implementation is taken from hobbes. It is a complex solution. It works with up to N=513 arguments, probably it can be changed to support more. The entry point of the solution is renamed to FOREACH in the snippet below:

// Copyright 2016 Morgan Stanley
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// https://github.com/morganstanley/hobbes
#define HOBPP_FIRST(a, ...) a
#define HOBPP_SECOND(a, b, ...) b
#define HOBPP_CAT(a,b) a ## b
#define HOBPP_IS_PROBE(...) HOBPP_SECOND(__VA_ARGS__, 0)
#define HOBPP_PROBE() ~, 1
#define HOBPP_NOT(x) HOBPP_IS_PROBE(HOBPP_CAT(HOBPP_SNOT_, x))
#define HOBPP_SNOT_0 HOBPP_PROBE()
#define HOBPP_BOOL(x) HOBPP_NOT(HOBPP_NOT(x))
#define HOBPP_IF_ELSE(condition) HOBPP_SIF_ELSE(HOBPP_BOOL(condition))
#define HOBPP_SIF_ELSE(condition) HOBPP_CAT(HOBPP_SIF_, condition)
#define HOBPP_SIF_1(...) __VA_ARGS__ HOBPP_SIF_1_ELSE
#define HOBPP_SIF_0(...)             HOBPP_SIF_0_ELSE
#define HOBPP_SIF_1_ELSE(...)
#define HOBPP_SIF_0_ELSE(...) __VA_ARGS__
#define HOBPP_EMPTY()
#define HOBPP_EVAL(...) HOBPP_EVAL256(__VA_ARGS__)
#define HOBPP_EVAL256(...) HOBPP_EVAL128(HOBPP_EVAL128(__VA_ARGS__))
#define HOBPP_EVAL128(...) HOBPP_EVAL64(HOBPP_EVAL64(__VA_ARGS__))
#define HOBPP_EVAL64(...) HOBPP_EVAL32(HOBPP_EVAL32(__VA_ARGS__))
#define HOBPP_EVAL32(...) HOBPP_EVAL16(HOBPP_EVAL16(__VA_ARGS__))
#define HOBPP_EVAL16(...) HOBPP_EVAL8(HOBPP_EVAL8(__VA_ARGS__))
#define HOBPP_EVAL8(...) HOBPP_EVAL4(HOBPP_EVAL4(__VA_ARGS__))
#define HOBPP_EVAL4(...) HOBPP_EVAL2(HOBPP_EVAL2(__VA_ARGS__))
#define HOBPP_EVAL2(...) HOBPP_EVAL1(HOBPP_EVAL1(__VA_ARGS__))
#define HOBPP_EVAL1(...) __VA_ARGS__
#define HOBPP_DEFER2(m) m HOBPP_EMPTY HOBPP_EMPTY()()
#define HOBPP_HAS_PARGS(...) HOBPP_BOOL(HOBPP_FIRST(HOBPP_SEOAP_ __VA_ARGS__)())
#define HOBPP_SEOAP_(...) HOBPP_BOOL(HOBPP_FIRST(HOBPP_SEOA_ __VA_ARGS__)())
#define HOBPP_SEOA_() 0
#define FOREACH(f, VS...) HOBPP_EVAL(HOBPP_MAPP(f, VS))
#define HOBPP_MAPP(f, H, T...)        \
  f(H)                                \
  HOBPP_IF_ELSE(HOBPP_HAS_PARGS(T))(  \
    HOBPP_DEFER2(HOBPP_SMAPP)()(f, T) \
  )(                                  \
  )
#define HOBPP_SMAPP() HOBPP_MAPP

Compile Time Comparison

To measure the time it takes to preprocess a solution, for each solution, a source file is generated, that includes the implementation, and invokes FOREACH with M arguments, 32 times. Then the time the compiler spends in user space, while preprocessing the file is measured. Finally, measured time is plotted against the number of arguments:

Plot showing how much time different solutions take to preprocess with gcc, as the number of arguments increases

Plot showing how much time different solutions take to preprocess with clang, as the number of arguments increases

The measurement was done using gcc 11.1.0 and clang 13.0.0, on Linux 5.16, running on a x86_64 AMD machine. From the plots we can infer that:

Preprocessing binlog takes a small, constant amount of time, regardless the number of FOREACH arguments.
The preprocessing time of swansontec and hobbes linearly increases as the number of arguments increases. The hobbes solution is roughly two times slower than swansontec.
There is no meaningful case where swansontec or hobbes would be faster than binlog.
clang is almost three times slower to preprocess these files, than gcc.

Implementation Comparison

The three solutions are also different implementation wise. I’m giving here a comparison along a mix of subjective and objective axes:

	binlog	swansontec	hobbes
Complexity	low	high	high
Extensibility	easy	difficult	difficult
Documentation	no	yes	yes
File Size (N<=512)	1.6 MB	2 kB	2 kB
Generated Code	yes	no	no
License	Apache 2	Public Domain	Apache 2

By Complexity, I mean the effort required to understand and verify the correctness of the implementation. By Extensibility, I mean the effort required to change the behavior of the implementation, e.g: support 0 arguments, pass along a constant data value, increase N, inject a token between F calls, etc.

Verdict

Interestingly, less code doesn’t mean faster to compile (preprocess) code. Despite that the generated source code of the binlog solution is almost a 1000 times bigger, it is still favourable compile time performance wise. Furthermore, the straightforward, simple solution is easier to understand and modify. Therefore, if code generation or having large source files is not an issue, I see no reason to use a more complicated solution.