The XZ project’s build system is and was hermetic. The exploit was right there in the source tarball. It was just hidden away inside a checked-in binary file that masqueraded as a test for handling of invalid compressed files.
(The ostensibly autotools-built files in the tarball did not correspond to the source repository, admittedly, but that’s another question, and I’m of two minds about that one. I know that’s not a popular take, but I believe Autotools has a point with its approach to source distributions.)
I thought the exploit was not injected into the Git repository on GitHub at all, but only into the release tarballs, and that, due to how Autoconf & co. work, it is common for tarballs of Autoconf projects to include extra files not in the Git repository (like the configure script). I thought the attacker exploited the fact that differences between the release tarball and the repository were not considered particularly suspicious by downstream redistributors, in order to make the attack less discoverable.
First of all, even if that were true, it wouldn’t have much to do with hermetic builds as I understand the term. You could take the release tarball and build it on an air-gapped machine, and (assuming the backdoor liked the build environment on the machine) you would get a backdoored artifact. Fetching assets from the Internet (as is fashionable in the JavaScript, Go, Rust, and to some extent Python ecosystems) does not enter the equation; you just need the legitimate build dependencies.
Furthermore, that’s not quite true[1]. The differences only concerned the exploit’s (very small) bootstrapper and were isolated to the generated configure script and one of the (non-XZ-specific) M4 scripts that participated in its generation, none of which are in the XZ Git repo to begin with—both are put there, and are supposed to be put there, by (one of the tools invoked by) autoreconf when building the release tarball. By contrast, the actual exploit binary that bootstrapper injected was inside the Git repo all along, disguised as a binary test input (as I’ve said above) and identical to the one in the tarball.
To detect the tampering, the distro maintainers would have needed to notice the difference between the M4 file in the XZ release tarball and its supposed original in one of the Autotools repos. Even then, the attacker could instead have shipped an unmodified M4 script but a configure script built with the malicious one. Then the maintainers would have needed to run autoreconf and note that the resulting configure script differed from the one shipped in the tarball, which would have caused a ton of false positives, because avoiding those means using the exact same versions of the Autotools components as the upstream maintainer. Unconditionally autoreconfing things would be better, but risks breakage, because the backwards-compatibility story in Autotools has historically not been good; they’re not supposed to be used that way.
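To make the shape of that check concrete, here’s a toy sketch with invented file names and contents (nothing here is from the real xz tarball): regenerate configure locally and diff it against the shipped copy. In reality the “regenerated” side would come from running autoreconf against the repo, which is exactly where the version-skew false positives come from.

```shell
# Toy illustration with made-up files: compare a locally regenerated
# 'configure' against the copy shipped in a release tarball.
mkdir -p regenerated tarball
printf 'echo checking for a C compiler\n' > regenerated/configure
# The shipped copy differs by one extra, easy-to-overlook line:
printf 'echo checking for a C compiler\n. ./injected-stage2\n' > tarball/configure
if diff -u regenerated/configure tarball/configure; then
  echo "shipped configure matches regenerated sources"
else
  echo "MISMATCH: shipped configure differs from regenerated one"
fi
```

With the real files, a one-line sourcing of a second stage looks exactly this boring in a diff, which is why nobody was running this comparison routinely.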
(Couldn’t you just check in the generated files and run autoreconf in a commit hook? You could. Glibc does that. I once tried to backport some patches—that included changes to configure.ac—to an old version of it. It sucked, because the actual generated configure file was the result of several merges and such and thus didn’t correspond to the output of autoreconf from any Autotools install in existence.)
It’s easy to dismiss this as autotools being horrible. I don’t believe it is; I believe Autotools have a point. By putting things in the release tarball that aren’t in the maintainer’s source code (meaning, nowadays, the project’s repo, but that wasn’t necessarily the case for a lot of their existence), they ensure that the source tarball can be built with the absolute bare minimum of tools: a POSIX shell with a minimal complement of utilities, the C compiler, and a POSIX make. The maintainer can introduce further dependencies, but that’s on them.
Compare this with, for example, CMake, which technically will generate a Makefile for you, but you can’t ship it to anybody unless they have the exact same CMake version as you, because that Makefile will turn around and invoke CMake some more. Similarly, you can’t build a Meson project without the correct Python environment to run Meson and the build system’s Python code; just having make or ninja is not enough. And so on.
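A toy demonstration of that failure mode, with a stand-in script playing the role of cmake (all names invented): the “generated” Makefile calls back into its generator, so shipping the Makefile alone buys you nothing.

```shell
# 'bin/gen' stands in for cmake/meson: the generated Makefile re-invokes it.
mkdir -p demo/bin
printf '#!/bin/sh\necho "(re)generating build files"\n' > demo/bin/gen
chmod +x demo/bin/gen
printf 'all:\n\t@bin/gen\n\t@echo building\n' > demo/Makefile
(cd demo && make) && echo "build OK while the generator is present"
rm demo/bin/gen
(cd demo && make) 2>/dev/null || echo "build FAILS once the generator is gone"
```

An Autotools configure script, by contrast, never calls autoconf again; once generated, shell and make are the only interpreters it needs.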
This is why I’m saying I’m of two minds about this (bootstrapper) part of the backdoor. We see the downsides of the Autotools approach in the XZ backdoor, but in the normal case I would much rather build a release of an Autotools-based project than a CMake- or Meson-based one. I can’t even say that the problem is the generated configure script being essentially an uninspectable binary, because the M4 file that generated it in XZ wasn’t, and the change was very subtle. The best I can imagine here is maintaining two branches of the source tree, a clean one and a release one, where each release commit is notionally a merge of the previous release commit and the current clean commit, and the release tarball is identical to the release commit’s tree (I think the uacme project does something like that?); but that still feels insufficient.
> Unconditionally autoreconfing things would be better, but risks breakage, because the backwards-compatibility story in Autotools has historically not been good; they’re not supposed to be used that way.
Yes and no. "make -f Makefile.cvs" has been a supported workflow for decades. It's not what the "build from source" instructions will tell you to do, but those instructions are aimed primarily at end users building from source who may not have M4 etc. installed; developers are expected to use the Makefile.cvs workflow and I don't think it would be too unreasonable to expect dedicated distro packagers/build systems (as distinct from individual end users building for their own systems) to do the same.
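For the avoidance of doubt, here is a toy sketch of that convention (the stub contents are invented; a real Makefile.cvs would invoke autoreconf/automake rather than write out a stub):

```shell
# Toy Makefile.cvs: its only job is to (re)generate the generated build
# files -- here a stub 'configure' standing in for autoreconf's output.
printf 'all:\n\t@printf "#!/bin/sh\\necho configured\\n" > configure\n\t@chmod +x configure\n' > Makefile.cvs
make -f Makefile.cvs    # developer/packager step: rebuild the build system
./configure             # end-user step runs the regenerated script
```

The point of the convention is that the developer-facing entry point is a makefile checked into the repo, so a packager can always regenerate the build system instead of trusting the tarball’s copy.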
Focusing on the technical angle is imo already a step too far. This was first and foremost a social engineering exercise, and only secondarily a technical one.
This is very true, and it honestly troubles me that it’s been flagged.
Even I’m guilty of focusing on the technical aspects, but the truth is that the social campaign was significantly more difficult to understand and unpick, and is so much more problematic.
We can have all the defences we want in the world, but all it takes is to oust a handful of individuals (or, in this case, just one), or to bribe or blackmail them, and then nobody is going to be reviewing, because everybody believes that it has been reviewed.
I mean, we all just accept whatever the project believes is normal right?
It’s not like we’re pushing our ideas of transparency on the projects… and even if we were, it’s not like we are reviewing them either; they will have their own reviewers, and the only people left are package maintainers, who are arguably more dangerous.
There is an existential nihilism that I’ve just been faced with when it comes to security.
Unless projects become easier to reproduce and we have multiple parties involved in auditing, I’m a bit concerned.
> I mean, we all just accept whatever the project believes is normal right?
Not in this thread we don’t? The whole thing has been about the fact that it wasn’t easy for a distro maintainer to detect the suspicious code even if they looked. Whether anyone actually does look is a worthy question, but it’s not orthogonal to making the process of looking not suck.
Of course, if we trust the developer to put software on our machine with no intermediaries, the whole thing goes out the window. Don’t do that[1]. (Oh hi Flatpak, Snap. Please go away. Also hi NPM, Go, Cargo, PyPI; no, being a “modern programming language” is not an excuse.)