disclaimer: this is an old post i took from my uni-days but before i remove everything i thought i’d repost some stuff here

A link i found by casually strolling through Hacker News in the train: http://mattwarren.org/2018/06/15/Tools-for-Exploring-.NET-Internals/

I liked https://github.com/SergeyTeplyakov/ObjectLayoutInspector in particular, because it shows you how inefficient your class/struct layout probably really is. Because padding is still an “issue” if you want to call it that. I learned this in the last semester in my computer architecture course: compilers don’t magically optimize your memory layout. I wish i would’ve learned that earlier, since it’s not something people tell you. If you’re using a struct consisting of a sole single byte data type and your allocated memory-block has a width of 8 byte, then only 1/8th of the actual block is used. The remaining 7 byte are padded when the “next” object has a width larger than the remaining space, e.g. the remainder is filled with zero-values. In our example with only one type shows the nature of this problem, but not the implications.

Let’s say you have a struct with some more types. The fitting type is always first-fit, not best-fit. This is obvious if you had a hand at stack smashing (you might actually want some information strictly after some other information.) - another neat example can be found on the english wikipedia page for data structure alignment. You will see that the “remaining bits” of a word will always be padded if the following type does not fit in the block. So you have a word-width of 4byte, and some objects that use (1,4,2) byte (remember: the order is immutable.) each of these objects are stored in a seperate block since they don’t fit in the same one. so the first and the third objects are padded, although they would’ve fit in the same block. Lets say those are simple structs:

type example struct {
	var first *verySimpleType // 1 byte
	var second *notSoSimpleType // 4 byte
	var third *kindofSimpleType // 2 byte
}

This would compile into:

type example struct {
	var first *verySimpleType // 1 byte
	var padding[0] // 3 byte
	var second *notSoSimpleType // 4 byte
	var third *kindofSimpleType // 2 byte
	var padding[1] // 2 byte
}

This could be easily improved by switching the second and the third variables:

type example struct {
	var first *verySimpleType // 1 byte
	var third *kindofSimpleType // 2 byte
	var second *notSoSimpleType // 4 byte
}

This would compile into:

type example struct {
	var first *verySimpleType // 1 byte
	var third *kindofSimpleType // 2 byte
	var padding[0] // 1 byte
	var second *notSoSimpleType // 4 byte
}

Holy hell, we would save a whole 4byte block. You think that’s not much? Let’s say we’re using a lot of data, like five million objects - in memory. That would be five million blocks with a width of 4 byte. That’s a whooping ~19 MiB, with just one block won. Consider an optimization of about 20 blocks, suddenly we are talking about ~400 MiB of memory. And now let’s say we are using the XMM registers, where a word has a length of 16 byte - that’s 2 GiB of memory.

While this is all theoretical improvement many people will never have to consider, it is - for my liking - a good example for things we have to optimize manually. And we shouldn’t rely on compilers, linters and code conventions more than we need to know about how computers actually work. Because “just get more memory” is an argument that i don’t ever want to use.