The optimization exists in MSVC, LLVM, and GCC. It's called "shrink wrapping". M...

leni536 · on July 12, 2021

None of gcc, clang or msvc deallocate the frame or part of the frame in `baz2` before calling `bar` the first time here:

https://godbolt.org/z/rvx36sM74

What I would like to see is something like the code generated for `call_foo1` substituted verbatim into the generated code of `baz1` in place of the function call. That way, at the point of calling `bar`, much less stack space is allocated, minimizing stack usage.

But maybe this would pessimize other things, or for some weird reason is actually incorrect.

gcc at least deallocates the stack before tail-calling `baz`; I don't know whether that is "shrink wrapping" or just plain TCO.
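For readers who can't open the link, here is a rough sketch of the kind of pattern being compared. It is not the code behind the godbolt link: the bodies of `bar`, `call_foo1`, `baz1` and `baz2`, the helper `foo`, and the 4 KiB buffer size are all assumptions made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical stand-in for `bar`: just counts how often it is called.
static int bar_calls = 0;
void bar() { ++bar_calls; }

// Hypothetical consumer of a large stack buffer.
int foo(const char* buf, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += buf[i];
    return sum;
}

// The large buffer lives only in this function's frame, so it is
// released when the function returns.
NOINLINE int call_foo1(int x) {
    char buf[4096];  // assumed size, chosen only to make the frame large
    std::memset(buf, x, sizeof buf);
    return foo(buf, sizeof buf);
}

// baz1: the big frame belongs to call_foo1 and is gone before bar() runs.
int baz1(int x) {
    int r = call_foo1(x);
    bar();
    return r;
}

// baz2: same logic written inline. The buffer's scope ends before bar(),
// but it is part of baz2's own frame, which (per the comment above) the
// compilers do not shrink before the call.
int baz2(int x) {
    int r;
    {
        char buf[4096];
        std::memset(buf, x, sizeof buf);
        r = foo(buf, sizeof buf);
    }
    bar();
    return r;
}
```

Both functions compute the same result; the only difference of interest is how much stack is still reserved at the point where `bar` is called, which you would have to check in the generated assembly for your compiler and flags.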

ndesaulniers · on July 12, 2021

You can get into weird cases with DWARF-based unwinders when two paths through a function with different stack depths make it impossible to reliably unwind.

LLVM has bugs in this regard with calls to variadic functions, since after a certain number of arguments have been passed in registers, you start pushing parameters on the stack (ABI dependent).
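As a small illustration of the register-then-stack behavior (not a reproduction of any LLVM bug), here is a sketch with a hypothetical variadic helper `sum_ints`. The register counts in the comments assume the x86-64 System V ABI; other ABIs split the arguments differently.

```cpp
#include <cstdarg>

// Hypothetical helper: sums `count` int arguments.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

int call_with_many_args() {
    // Assuming x86-64 System V: the first six integer arguments go in
    // registers (here `count` plus 1..5), so 6, 7 and 8 are passed on the
    // stack, and the caller has to adjust the stack around this call.
    return sum_ints(8, 1, 2, 3, 4, 5, 6, 7, 8);
}
```

The stack adjustment made for the extra arguments is the kind of mid-function depth change that the unwind information then has to describe correctly.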