Monday, March 7, 2011

Tail-padding reuse in GCC

Today, let me take you on a very deep dive into a corner of the C++ language.

First, have a look at the following program. Think about it before you read the rest of this article. You might even want to load it up in your compiler, but before you do so: what do you think it will print?


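#include <stdio.h>

class Super {
    short s;    // members of a class are private by default
    char c1;
};

struct Sub : public Super
{
    char c2;
};

int main()
{
    // sizeof yields a size_t, which printf expects as %zu rather than %d
    printf("Size of Super is %zu\n", sizeof(Super));
    printf("Size of Sub is %zu\n", sizeof(Sub));
    return 0;
}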



Do you see what the program is doing? It declares a sub-class that extends the super-class and adds some state of its own, then computes the memory footprint of an instance of each class.

So, armed with that knowledge, and armed with the knowledge that a short generally requires 2 bytes, and a char generally requires 1 byte, what do you think the program prints?

The answer, at least on the various versions of GCC that I've tried on various platforms, is that the program prints:

Size of Super is 4
Size of Sub is 4


You might find this a surprising result; at least, I did. The sub-class Sub adds additional state to its super-class Super, so how can the two classes have the same sizeof?

The mystery may start to clear up in your mind a little bit if we add a couple more lines to the program, so that it now looks like this:


#include <stdio.h>

class Super {
    short s;    // members of a class are private by default
    char c1;
};

struct Sub : public Super
{
    char c2;
};

int main()
{
    Sub sub;

    // sizeof yields a size_t (%zu); a pointer difference is a ptrdiff_t (%td)
    printf("Size of Super is %zu\n", sizeof(Super));
    printf("Size of Sub is %zu\n", sizeof(Sub));
    printf("Offset of c2 in Sub is %td\n", (char *)&sub.c2 - (char *)&sub);
    return 0;
}


Now what do you think it prints?

For me, it prints:

Size of Super is 4
Size of Sub is 4
Offset of c2 in Sub is 3


This, too, is quite startling behavior! How can the offset of a field in the sub-class be a smaller value than the size of the super-class? Doesn't the first field in the sub-class always have to be laid out in memory strictly after the memory space used by the super-class?

As far as I can tell, this behavior is called "tail-padding reuse". It dates back to GCC version 3.2 and its adoption of a specification called the C++ Application Binary Interface, which specifies

the Application Binary Interface for C++ programs, that is, the object code interfaces between user C++ code and the implementation-provided system and libraries. This includes the memory layout for C++ data objects, including both predefined and user-defined data types, as well as internal compiler generated objects such as virtual tables.


The C++ ABI is a document intended for authors of compilers. Unfortunately, even for them, reading and understanding it is tricky business.

But the bottom line is that, for reasons of memory efficiency, in cases such as this sample program, the compiler is allowed to (in fact, is actually encouraged to) pack the data members of the sub-class closely together with those of the super-class. The first sub-class member lands in what would otherwise have been the super-class's tail padding, which shrinks the memory footprint of the resulting objects.
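One detail is worth a sketch. The C++ ABI that GCC follows only reuses the tail padding of a base class that is not "POD for the purpose of layout". In the sample program, Super falls into that category because `class` makes its members private. The sketch below contrasts it with a base whose members are public. `PodSuper`, `PodSub`, `offset_of_c2` and `reuses_tail_padding` are names I made up for illustration, and the results assume GCC (or another compiler following the same ABI) with a 2-byte short.

```cpp
#include <cstddef>

// Same shape as the article's Super: `class` makes s and c1 private.
class Super {
    short s;
    char c1;
};

struct Sub : public Super {
    char c2;
};

// Hypothetical contrast case: identical members, but public, so the base is POD.
struct PodSuper {
    short s;
    char c1;
};

struct PodSub : public PodSuper {
    char c2;
};

// Byte offset of the member `c2` inside an object of the derived type D.
template <typename D>
std::ptrdiff_t offset_of_c2() {
    D d{};
    return reinterpret_cast<const char*>(&d.c2) -
           reinterpret_cast<const char*>(&d);
}

// True when D's c2 sits inside the bytes that sizeof(B) attributes to the base B.
template <typename B, typename D>
bool reuses_tail_padding() {
    return offset_of_c2<D>() < static_cast<std::ptrdiff_t>(sizeof(B));
}
```

On the x86-64 Linux GCC setups that produce the output shown earlier, I would expect `offset_of_c2<Sub>()` to be 3 with `sizeof(Sub)` equal to 4, while `offset_of_c2<PodSub>()` is 4 and `sizeof(PodSub)` grows to 6. Other ABIs may lay these types out differently.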

Sounds great, doesn't it?

I think that, in general, it is great. Unless, that is, your code

  • assumes that when a sub-class adds state to a super-class, the size of the sub-class will always exceed the size of the super-class, or

  • assumes that the offset of the first field in the sub-class will be no less than the size of the super-class, or

  • assumes that you could safely write a bit of code such as:

    sub.c2 = 'b';
    memset(&sub, '\0', sizeof(Super));
    if( sub.c2 == 'b' ) { ... }

    and expect that the body of the if statement would be executed.



For most C/C++ programmers that I know, these are reasonable and widely-held assumptions.

But, clearly, they are not correct assumptions.
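The third assumption in the list above is easy to check directly. The sketch below repeats the memset snippet inside a function and, as one possible alternative, resets the base part of the object by assignment instead of by raw bytes. The function names are hypothetical, and the behavior described afterwards assumes GCC or another compiler that lays out Sub with c2 at offset 3.

```cpp
#include <cstring>

class Super {
    short s;
    char c1;
};

struct Sub : public Super {
    char c2;
};

// Repeats the article's snippet: zero sizeof(Super) bytes at the start of a Sub
// and report whether c2 still holds the value stored beforehand.
bool c2_survives_memset() {
    Sub sub{};
    sub.c2 = 'b';
    // The void* cast only silences compiler warnings about raw writes to a class object.
    std::memset(static_cast<void*>(&sub), '\0', sizeof(Super));
    return sub.c2 == 'b';
}

// Alternative sketch: reset the base subobject by assignment rather than by raw bytes.
// Assignment copies Super's own members, so c2 is not part of what gets written.
bool c2_survives_base_assignment() {
    Sub sub{};
    sub.c2 = 'b';
    static_cast<Super&>(sub) = Super();
    return sub.c2 == 'b';
}
```

With tail-padding reuse in effect, `c2_survives_memset()` returns false, because the fourth byte cleared by memset is c2 itself. `c2_survives_base_assignment()` returns true, since the language defines the assignment in terms of Super's members only.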

So, be careful out there, now that you know about tail-padding reuse in GCC!
