What I would like to see is something like the generated code of `call_foo1` substituted verbatim into the generated code of `baz1` in place of the function call. That way, at the point of calling `bar`, much less stack space is allocated, minimizing stack usage.
But maybe this would pessimize other things, or would, for some weird reason, actually be incorrect.
gcc, at least, deallocates the stack before tail-calling `baz`; I don't know whether that is "shrink wrapping" or just plain TCO.
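A small C sketch that separates the two effects, assuming GCC or Clang at `-O2` on x86-64. `helper`, `finish` and the two demo functions are hypothetical stand-ins, not the code from this thread, and `__attribute__((noinline))` is a GCC/Clang extension used only to keep the calls real. In `shrink_wrap_demo`, only the slow path needs a callee-saved register (to keep `extra` alive across the call), so a shrink-wrapping compiler may place the prologue after the early return. In `tail_call_demo`, the final call is in tail position, so the compiler may restore the saved register and then `jmp` to `finish` instead of `call`/`ret`.

```c
/* Hypothetical callees; noinline keeps them as real calls so the
   caller's prologue/epilogue placement is visible in the assembly. */
__attribute__((noinline)) int helper(int v) { return v * 3; }
__attribute__((noinline)) int finish(int v) { return v + 1; }

/* Shrink-wrapping candidate: the early return needs no frame; only
   the slow path must preserve `extra` across the call to helper. */
int shrink_wrap_demo(int key, int extra) {
    if (key < 0)
        return -1;              /* fast path: nothing to save */
    int x = helper(key);        /* `extra` must survive this call */
    return x * extra;           /* work after the call: not a tail call */
}

/* Tail-call candidate: `b` must survive the first call, but the
   second call is the last thing the function does, so the saved
   state can be torn down before jumping to finish. */
int tail_call_demo(int a, int b) {
    int x = helper(a);          /* non-tail call */
    return finish(x + b);       /* tail position */
}
```

Compiling with `gcc -O2 -S` and comparing against `-fno-shrink-wrap` or `-fno-optimize-sibling-calls` should show which transformation is responsible for what, though the exact assembly varies with compiler version and target.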
You can run into weird cases with DWARF-based unwinders when two paths through a function reach the same point with different stack depths, which makes it impossible to unwind reliably.
LLVM has bugs in this regard with calls to variadic functions, since once a certain number of arguments have been passed in registers, the remaining parameters are pushed on the stack (ABI-dependent).
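A minimal C sketch of a variadic call that spills arguments onto the stack, assuming the x86-64 System V ABI, where the first six integer arguments go in `rdi`, `rsi`, `rdx`, `rcx`, `r8` and `r9` (Windows x64 uses only four register arguments). `sum_ints` and `call_with_stack_args` are hypothetical names. Here `count` plus the first five variadic ints fill the six registers, so the caller has to place the last four on the stack, changing the stack depth around the call site.

```c
#include <stdarg.h>

/* Sum `count` int arguments passed through the variadic part. */
long sum_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);
    long total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* count + 1..5 use the six integer argument registers;
   6, 7, 8 and 9 are passed on the stack by the caller. */
long call_with_stack_args(void) {
    return sum_ints(9, 1, 2, 3, 4, 5, 6, 7, 8, 9);
}
```

Whether the caller uses `push` instructions or stores into preallocated outgoing-argument space depends on the compiler and flags, so it is worth checking the assembly and the emitted CFI around the call.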
See https://github.com/gcc-mirror/gcc/blob/master/gcc/shrink-wra... or https://llvm.org/doxygen/ShrinkWrap_8cpp_source.html.