Rendered at 19:35:56 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
someonebaggy 3 hours ago [-]
This is a great demonstration of how compilers, good as they are, are a leaky abstraction, just like all abstravtions. And sometimes you need to peer below the abstraction.
im3w1l 5 hours ago [-]
So one thing I thought of when reading this is that very often it is known ahead of time (at runtime or even compile-time) how many push_backs will be done. The programmer could make a reserve call but doesn't bother since the efficiency gain is minimal.
The gain is minimal for doing this optimization at one location. But doing it everywhere, that could matter. Pushing back in a loop could maybe be optimized to a single allocation and a memcopy.
tialaramex 5 hours ago [-]
In C++ programmers are often taught not to use their reservation API for this purpose because it's designed in such a way that if you don't have perfect foresight you can destroy amortization and thus get much worse performance.
For example Bjarne Stroustrup suggests you should use reservation for "avoiding invalidation of iterators" instead.
oneshtein 4 hours ago [-]
API should have reserve_at_least(n) and reserve_exactly(n), instead of one reserve(n), which works as reserve_exactly(n), but described as reserve_at_least(n).
tialaramex 3 hours ago [-]
I agree that this is the better design. Rust calls these Vec::reserve and Vec::reserve_exact
Zig calls the analagous methods ensureTotalCapacity and ensureTotalCapacityPrecise
It's worth mentioning that talk of "exact" and "precise" may be misleading because it probably makes sense to (and despite that word exact Vec does explicitly say it might) discern that the actual allocation was big enough for slightly more items and increase the capacity accordingly.
Right now, many (most?) allocators if asked for enough room to store 7 Doodads, each 7 bytes in length with one byte alignment (thus total 49 bytes), may give you 64 bytes because it was easier, but can't or won't tell you about that. Rust's GlobalAlloc can't do better, nor can C's malloc family. But Rust's (unstable) Allocator trait and many modern malloc-descendents can tell you hey, here are 64 bytes instead. With that fixed your growable array type should consider this, after all instead of 7 Doodads, 64 bytes is enough for 9 Doodads, so it's free capacity.
dzaima 2 hours ago [-]
Note that glibc does provide a malloc_usable_size to query the size of a malloc'd block; not standard though of course.
A problem with just directly exposing such is it makes precise sanitizing impossible, as you'd have to tolerate some out-of-intended-bounds reads/writes. (and making the sanitizer always give exact-size allocations would also be bad as that'd end up not testing code paths that may break when they're not)
im3w1l 2 hours ago [-]
Sorry I completely messed up in communicating my point: That it would be nice if the compiler would do it for us automatically in the many simple cases. That's what I tried to say, but didn't say, when I said it could matter fixing it everywhere.
dzaima 6 hours ago [-]
The crappiness of shrink-wrapping in gcc and clang (but especially clang) annoys me a lot. It feels like there should be a quite decent amount of general performance to be gained from properly pushing more into slow paths (or, not necessarily even slow paths, but generally paths with high register pressure / uninlined function calls), never mind calling conventions in general.
On the push impl in the article - for non-x86 (and perhaps even on x86 for performance, though not size/instruction count) it'd be better to allow the size increment to reuse the size read done by the capacity check; with C++'s lack of suitable aliasing information, the interleaved memcpy/store prevents the compiler from deciding this itself.
RossBencina 6 hours ago [-]
> prevents the compiler from deciding this itself.
Interesting. I understand why it does that, but it makes me realise that I usually think "the compiler will reuse the loaded value/perform CSE" without considering the cases where it won't. Are there tools that will detect and warn/indicate this situation? e.g. "warning: could not reuse previously loaded value of 'foo' due to aliasing hazard 'memcpy' at line 234."
dzaima 6 hours ago [-]
Not that I know of; and such would necessarily have false-positives (...or, rather, entirely consist of potential false-positives) because you may actually want the re-read.
The gain is minimal for doing this optimization at one location. But doing it everywhere, that could matter. Pushing back in a loop could maybe be optimized to a single allocation and a memcopy.
For example Bjarne Stroustrup suggests you should use reservation for "avoiding invalidation of iterators" instead.
Zig calls the analagous methods ensureTotalCapacity and ensureTotalCapacityPrecise
It's worth mentioning that talk of "exact" and "precise" may be misleading because it probably makes sense to (and despite that word exact Vec does explicitly say it might) discern that the actual allocation was big enough for slightly more items and increase the capacity accordingly.
Right now, many (most?) allocators if asked for enough room to store 7 Doodads, each 7 bytes in length with one byte alignment (thus total 49 bytes), may give you 64 bytes because it was easier, but can't or won't tell you about that. Rust's GlobalAlloc can't do better, nor can C's malloc family. But Rust's (unstable) Allocator trait and many modern malloc-descendents can tell you hey, here are 64 bytes instead. With that fixed your growable array type should consider this, after all instead of 7 Doodads, 64 bytes is enough for 9 Doodads, so it's free capacity.
A problem with just directly exposing such is it makes precise sanitizing impossible, as you'd have to tolerate some out-of-intended-bounds reads/writes. (and making the sanitizer always give exact-size allocations would also be bad as that'd end up not testing code paths that may break when they're not)
On the push impl in the article - for non-x86 (and perhaps even on x86 for performance, though not size/instruction count) it'd be better to allow the size increment to reuse the size read done by the capacity check; with C++'s lack of suitable aliasing information, the interleaved memcpy/store prevents the compiler from deciding this itself.
Interesting. I understand why it does that, but it makes me realise that I usually think "the compiler will reuse the loaded value/perform CSE" without considering the cases where it won't. Are there tools that will detect and warn/indicate this situation? e.g. "warning: could not reuse previously loaded value of 'foo' due to aliasing hazard 'memcpy' at line 234."