

Benchmarking a small piece of generic code such as this library is challenging because no benchmark can represent every actual use case. What looks good on paper might perform poorly in practice, and vice versa. Nor is comparing similar solutions trivial. It is obvious that devector excels when it can take advantage of its small buffer optimization. However, comparing a small-buffer-enabled devector to a standard vector merely measures the performance of the memory allocator.
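To make the allocator point concrete, the following minimal sketch counts how many heap allocations a plain std::vector performs while elements are pushed back without a prior reserve(). The names counting_allocator and allocations_for_push_back are hypothetical helpers written for this illustration and are not part of the library. The exact counts depend on the standard library's growth policy.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper, not part of the library: a minimal C++11 allocator
// that counts every call to allocate() in a caller-owned counter.
template <typename T>
struct counting_allocator {
    using value_type = T;

    std::size_t* count;  // counter owned by the caller

    explicit counting_allocator(std::size_t* c) noexcept : count(c) {}

    template <typename U>
    counting_allocator(const counting_allocator<U>& other) noexcept
        : count(other.count) {}

    T* allocate(std::size_t n) {
        ++*count;
        return std::allocator<T>().allocate(n);
    }

    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const counting_allocator<T>& a,
                const counting_allocator<U>& b) noexcept {
    return a.count == b.count;
}

template <typename T, typename U>
bool operator!=(const counting_allocator<T>& a,
                const counting_allocator<U>& b) noexcept {
    return !(a == b);
}

// Returns the number of heap allocations std::vector performs while
// n elements are pushed back one by one.
inline std::size_t allocations_for_push_back(std::size_t n) {
    std::size_t count = 0;
    {
        counting_allocator<int> alloc(&count);
        std::vector<int, counting_allocator<int>> v(alloc);
        for (std::size_t i = 0; i < n; ++i) {
            v.push_back(1);
        }
    }
    return count;
}
```

A container that keeps its first few elements in an in-object buffer would skip these allocations for small sizes, so a timing comparison at such sizes mostly reflects the cost of the allocator calls counted here.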

An attempt was made to compare equivalent functionality of different containers, but the measurements gave surprising results. [1]

Instead of disclosing those unreliable performance numbers, the actual compiled code is presented here to show that the quality of the generated code matches that of the equivalent standard implementations. Below is a loop that pushes back a constant value into a std::vector and into a devector. The code was compiled using GCC 5.2 with -O2 -march=native on an AMD FX 8320 processor.

Table 1.1. Assembly code of v.push_back(1) loop

std::vector

  0: test   %rax,%rax
     je     <1>
     movl   $0x1,(%rax)
  1: add    $0x4,%rax
     mov    %rax,0x68(%rsp)
     inc    %rbx
     cmp    %rbp,%rbx
     jae    <reallocate>
     mov    0x68(%rsp),%rax
     movl   $0x1,0x3c(%rsp)
     cmp    0x70(%rsp),%rax
     jne    <0>

devector

  0: mov    %ebx,%eax
     lea    (%r14,%rax,4),%rax
     test   %rax,%rax
     je     <1>
     movl   $0x1,(%rax)
  1: inc    %ebx
     inc    %r13
     mov    %ebx,0xb4(%rsp)
     cmp    0x8(%rsp),%r13
     jae    <reallocate>
     cmp    %ebx,%r12d
     jne    <0>
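
The source of the measured loop is not reproduced in this section. As a rough sketch, a loop of the kind described, pushing back the constant 1, might look like the following. The function name push_back_ones and the loop bound n are assumptions made for illustration. The template should accept any container that provides push_back, so it can be instantiated for std::vector and, presumably, for devector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of the measured loop; the original
// benchmark source is not shown in this section.
template <typename Container>
void push_back_ones(Container& v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(1);  // the call whose generated code is listed above
    }
}
```

Calling push_back_ones on an empty std::vector<int> with n = 5 leaves five elements, each equal to 1. Compiling such a function with -O2 and inspecting the output of objdump -d is one way to obtain a listing comparable to the table.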
