
I was trying to figure out why I could write a ring buffer in Go and it would perform so much faster than the equivalent C# implementation.

This is part of the answer. Despite an attempt to align a struct to leverage the cache, it was getting rearranged.

Of course, there is also less inlining, and the scheduler behind async really doesn't like it when you have more active pseudo-threads than cores.

In the end, the TPL Dataflow implementation was the only one that was roughly in the same ballpark.
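
For context, here is a minimal Go sketch of the kind of cache-oriented layout described above: the ring buffer's head and tail counters are padded so that each sits on its own cache line. It is not the original code. The 64-byte line size and the names paddedCounter and ringIndices are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cacheLine is an assumed cache line size; 64 bytes is common but not universal.
const cacheLine = 64

// paddedCounter (hypothetical name) pads one counter out to a full cache line
// so that two counters never share a line.
type paddedCounter struct {
	value uint64
	_     [cacheLine - 8]byte // filler after the 8-byte counter
}

// ringIndices (hypothetical name) holds the producer and consumer positions.
type ringIndices struct {
	head paddedCounter // advanced by the producer
	tail paddedCounter // advanced by the consumer
}

// counterSize reports the size of one padded counter in bytes.
func counterSize() uintptr { return unsafe.Sizeof(paddedCounter{}) }

// tailOffset reports where tail starts inside ringIndices.
func tailOffset() uintptr {
	var r ringIndices
	return unsafe.Offsetof(r.tail)
}

func main() {
	fmt.Println(counterSize(), tailOffset()) // prints "64 64"
}
```

Go keeps struct fields in declaration order, so the padding stays where it was written. This only guarantees the spacing between the two counters; whether the struct itself starts on a cache-line boundary depends on the allocator.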



Yeah the .NET JITter is free to do what it wants with the field layout, mostly to pack/align things for better perf.

BTW have you looked at Disruptor-net[0], the .NET port of Disruptor? It's a high-speed ring buffer written in .NET that might contain some tricks to get your perf closer to the Go version.

[0] https://github.com/disruptor-net/Disruptor-net


Forgive me. What do you mean by "pack/align things for better perf"? Specifically, in what use cases would this yield better performance? And by what measurements?


It's related to how the CPU accesses memory; there's a nice wiki article[0] that goes into it a bit more. [0] also mentions that some SSE instructions only work with 128-bit aligned data, so sometimes alignment is a requirement. Alignment can make memory access more efficient, but it means the data can take up more space. Packing is the opposite: rearranging fields so that they take up the smallest space possible.

So the 2 things are a trade-off.

[0] http://en.wikipedia.org/wiki/Data_structure_alignment


Data needs to be aligned to a multiple of its size on most CPU architectures to be accessed without performance degradation (or at all, even). Reordering members for a smaller object size mostly makes sense so that more objects fit into cache.
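
To make the trade-off concrete, here is a small Go sketch (Go does not reorder fields, so declaration order is visible in the size). The struct names Loose and Tight are made up for illustration, and the sizes in the comments assume a typical 64-bit target where int64 is 8-byte aligned.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Loose (hypothetical) declares its fields in an order that forces padding.
type Loose struct {
	A bool  // 1 byte, followed by padding so B is aligned
	B int64 // 8 bytes
	C bool  // 1 byte, followed by trailing padding
}

// Tight (hypothetical) holds the same fields, largest first.
type Tight struct {
	B int64 // 8 bytes
	A bool  // 1 byte
	C bool  // 1 byte, followed by trailing padding
}

// looseSize and tightSize report the in-memory size of each layout.
func looseSize() uintptr { return unsafe.Sizeof(Loose{}) }
func tightSize() uintptr { return unsafe.Sizeof(Tight{}) }

func main() {
	// On a typical 64-bit target this reports 24 and 16 bytes.
	fmt.Println("Loose:", looseSize(), "Tight:", tightSize())
}
```

Same fields, same alignment guarantees, but with the reordered layout four objects fit in a 64-byte cache line instead of two and a bit.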



