Monday, November 20, 2017

Avoid Static Initializers

First of all, let's try to define what a static initializer is:
A static initializer is a small piece of code that the system's dynamic linker runs when it loads a shared library or executable into memory. This code runs just after the binary has been relocated, and before dlopen() or LoadLibrary() returns or, for an executable, before main() is entered.
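Here is a minimal sketch of a global object whose constructor the compiler must turn into a static initializer. The names Tracer and g_initializer_ran are hypothetical. Because the constructor runs before main() is entered, the flag is already true by the time main() starts.

```cpp
#include <cstdio>

// Hypothetical flag used only to observe when the initializer runs.
// It is constant-initialized to false before any dynamic initialization.
static bool g_initializer_ran = false;

// A type with a user-provided (non-constexpr) constructor: a global of this
// type cannot be initialized at compile time.
struct Tracer {
    Tracer() {
        g_initializer_ran = true;
        std::puts("static initializer: Tracer constructed");
    }
};

// The compiler emits a static initializer for this object, and it is run
// before control reaches main().
Tracer g_tracer;
```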

C++ compilers generate static initializers for the following cases:
- Non-POD objects that must be initialized by calling a constructor.
   e.g.: static std::string hello("hello");
- POD objects whose initial value cannot be computed at compile time.
   e.g.:
     extern const int my_external_constant;
     const int my_local_constant { my_external_constant + 2 };
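One quick way to tell the two kinds apart, assuming a C++11 compiler, is to try declaring the variable constexpr. The compiler must then evaluate the initializer at compile time, so no static initializer can be needed. If that is impossible, the compiler rejects the declaration. In the sketch below, kAnswer and answer() are hypothetical names. The second case above is shown commented out, because it does not compile as constexpr.

```cpp
// Constant-initialized: 40 + 2 is a constant expression, so the value is
// stored directly in the binary and no initializer code is generated.
constexpr int kAnswer = 40 + 2;
static_assert(kAnswer == 42, "kAnswer is computed at compile time");

// Rejected if uncommented: my_external_constant is defined in another
// translation unit, so its value is not a constant expression here.
// extern const int my_external_constant;
// constexpr int my_local_constant = my_external_constant + 2;

// Accessor so callers can use the constant like any other value.
int answer() { return kAnswer; }
```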

In these cases the compiler generates the initialization code in a special hidden function, places it in the object file, and uses it to initialize these variables for you. Then, when the static linker assembles all the object files, it also builds a list of the addresses of these static initializers, which the dynamic linker reads later at runtime.

Why should we avoid them

Static C++ variable initializers make loading libraries slower, sometimes much slower (because of page faults):
- They make loading libraries slower because they must run before dlopen() or LoadLibrary() returns.
- They are small pieces of code, but they tend to be scattered randomly throughout the final binary, so running each one forces a whole memory page to be paged in from disk. On a cold startup this is terrible.
- They also touch global variables that are spread randomly over the library's .data section, dirtying those pages and increasing PSS (Proportional Set Size) usage, i.e. private physical pages that cannot be shared between processes and must go to swap if they are evicted.
- They can make library unloading slower, because the destructors of all global variables they initialized must run before dlclose() returns or the program exits.
- Finally, they prevent global constant variables from being placed in the read-only part of the final library (the .text/.rodata segment, which can be shared between all processes and never needs swapping). Instead, these variables are put into mutable storage (e.g. the .data section) and consume private, non-shared memory.

In large projects, removing static initializers has resulted in load-time savings of an order of magnitude.

Putting things in code (just a little bit)

class Foo {
   public:
     Foo(const int value) : my_value{ value } {}
     int get() const { return my_value; }
     int add(const Foo& other) { my_value += other.get(); return my_value; }
     static Foo fromInt(const int value) { return Foo(value); }
   private:
     int my_value;
};

Now some global variables:

Foo foo_val_1(10);
Foo foo_val_2 = Foo::fromInt(2);

Because the types of these variables have a user-provided constructor, the compiler has to generate a static initializer for this source file: a function that initializes foo_val_1 and foo_val_2 before anything else runs.

Without optimizations, this might look like:

void __global_initializer_for_foo.cpp_debug() {
   foo_val_1.Foo::Foo(10); // Call Foo constructor on foo_val_1
   Foo temp(2); // Create temp object with value 2
   foo_val_2.Foo::Foo(temp); // Call Foo's implicitly generated copy constructor
}

With optimizations something like:

void __global_initializer_for_foo.cpp_release() {
   foo_val_1.my_value = 10; // Inlined constructor
   foo_val_2.my_value = 2; // Inlined construction of the temporary plus copy
}

In this specific case, the function only contains trivial assignments, so the compiler could remove it completely and write the initial values directly into the object file's .data section.

Unfortunately, this is very often not the case: even for trivial uses, the compiler will often still generate the static initializer function.
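One way to take the decision away from the optimizer, assuming a C++11 or later compiler, is to make the constructor (and the factory) constexpr. When a global is initialized by a constexpr constructor with constant arguments, the standard requires constant initialization, so the values can be emitted directly into .data without an initializer function. This is a sketch of such a variant of Foo, not a rule that works for every type. It only applies when every member can be computed at compile time and the destructor is trivial. kCheck is a hypothetical name used for the compile-time check.

```cpp
// Variant of the Foo class above whose constructor and factory are
// constexpr, so globals built from constant arguments can be
// constant-initialized instead of needing a static initializer.
class Foo {
public:
    constexpr Foo(const int value) : my_value{ value } {}
    constexpr int get() const { return my_value; }
    int add(const Foo& other) { my_value += other.get(); return my_value; }
    static constexpr Foo fromInt(const int value) { return Foo(value); }
private:
    int my_value;
};

// Same globals as before; both are initialized through constexpr calls with
// constant arguments, so their values can be emitted directly into .data.
Foo foo_val_1(10);
Foo foo_val_2 = Foo::fromInt(2);

// A constexpr copy proves the factory is really evaluated at compile time.
constexpr Foo kCheck = Foo::fromInt(2);
static_assert(kCheck.get() == 2, "fromInt() is evaluated at compile time");
```

The globals stay mutable (add() still works on them). Only their initial values are fixed at compile time.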

Static locals

One way to avoid static initializers is to move the variables into accessor functions as function-local statics. The C++ specification mandates that such objects are constructed only when control first passes through their declaration:

const Value& get_constant() {
   static const Value magical_constant(complex_computation());
   return magical_constant;
}

No static initializer is generated here, but this function will contain special extra code to ensure that the magical_constant object is only constructed on the first call to get_constant().
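Here is a small sketch of that behavior. Value, complex_computation() and the call counter are hypothetical stand-ins. Repeated calls return the same object, and the computation runs only once.

```cpp
// Hypothetical stand-in for the constant's type.
struct Value {
    int payload;
};

// Counts how many times the expensive computation actually runs.
static int g_computation_calls = 0;

// Hypothetical stand-in for an expensive computation.
Value complex_computation() {
    ++g_computation_calls;
    return Value{42};
}

const Value& get_constant() {
    // Constructed the first time control passes through this line;
    // later calls skip the initialization and reuse the same object.
    static const Value magical_constant(complex_computation());
    return magical_constant;
}
```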

But there are two problems with this technique:
- Local static construction is not guaranteed to be thread-safe with every compiler. C++11 requires it, but older compilers, like Visual Studio before VS 2015, implement it in a way that makes it unsafe in multi-threaded code.
- Under the hood, it turns any constant into a mutable variable, which can make your code larger, slower, and more memory-hungry.

It is also worth mentioning that some compilers can disable the thread-safe local static initialization code (GCC, for example, has -fno-threadsafe-statics), which makes the generated code behave the same as on platforms that lack thread-safe statics.

Finding static initializers

Telling code that generates static initializers apart from code that does not can be hard and error-prone. It is more reliable to inspect the generated binary. There are a few ways to do that:

- Object files contain a specially named function holding the initialization code. GCC names it with a _GLOBAL__sub_I_ prefix (older versions used _GLOBAL__I_), followed by a name derived from the translation unit, e.g. _GLOBAL__sub_I_foo.cpp. Note the dot, which is not possible in an ordinary C++ function name.
- Shared libraries contain entries for their static initializers, and where they live varies by platform. Older x86 and x86-64 toolchains use the .ctors section; ARM, and most current toolchains, use the .init_array section. These sections contain a list of function entry points in the binary.

Summary

Prefer POD global variables over non-POD ones. e.g.:
   char name[128];
instead of:
   std::string name;
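One way to check that a global's type needs neither a constructor call nor an exit-time destructor is with <type_traits>. This sketch asserts at compile time that char[128] is trivial while std::string is not trivially destructible. It also shows a bounded copy into the fixed buffer. The set_name() helper is hypothetical.

```cpp
#include <cstring>
#include <string>
#include <type_traits>

// char[128] is a trivial type: a global of this type is zero-filled when the
// binary is loaded and needs neither a constructor call nor a destructor.
static_assert(std::is_trivial<char[128]>::value, "char[128] is trivial");

// std::string has a non-trivial destructor, so a global std::string always
// needs code to run at exit or dlclose(), in addition to its constructor.
static_assert(!std::is_trivially_destructible<std::string>::value,
              "std::string is not trivially destructible");

char name[128];  // lives in .bss, all zeros before any code runs

// Hypothetical helper: copies value into the fixed buffer, truncating it if
// needed and always leaving the result null-terminated.
void set_name(const char* value) {
    std::strncpy(name, value, sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';
}
```

The trade-off is a fixed capacity: set_name() silently truncates anything longer than 127 characters.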
