I think you are not sufficiently familiar with this field. Dijkstra wrote "Discipline" years before the computational-complexity results on program analysis were known, and I think you misunderstand the tremendous gap between what works on small examples in the lab and what is scalable enough to be applied widely in the field, i.e. the difference between something that might be possible in principle and something that is affordably feasible in practice. We know more than Dijkstra did, the techniques have improved, and still no "ordinary" (non-niche) real-world software of any significant size has been written using formal refinement to date. If you claim this is possible today -- go ahead and do it for a medium-sized program of, say, at least 100KLOC. A formalism like TLA+ is well suited for this task, so go ahead and try.
I am not American, I don't like making generalisations about Americans or any other nationality, and, as you could see from my blog, I am very much in favour of employing maths in system design -- that's what TLA+ is all about. But confusing "mathematical object" with "can be feasibly reasoned about" is just ignoring computational complexity. I can give you an example of two subroutines, each only a few lines long and each very simple to understand, and yet you won't be able to answer a most basic question about their composition. Complexity theory tells us that mathematical problems have an inherent hardness, and the fact you can write them does not mean you can feasibly solve them. There are no simple answers here. In some carefully constructed ways formal methods can help in some specific circumstances, different methods helping in different use-cases, and you cannot just say that it's "better" overall. For example, while programs of millions of lines of code are routinely tested, no one has ever managed to formally reason about deep properties of programs of this size.
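To make the two-subroutine point concrete, here is one well-known such pair (a sketch; both routines are obvious in isolation):

    # Two tiny subroutines, each trivial to understand on its own.
    def step(n: int) -> int:
        return n // 2 if n % 2 == 0 else 3 * n + 1

    def run(n: int) -> None:
        while n != 1:      # a most basic question about their composition:
            n = step(n)    # does this loop terminate for every positive n?
                           # That is the open Collatz problem; nobody knows.

    run(27)  # happens to terminate, after 111 applications of step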
I take it that means you've never read Discipline or Predicate Calculus and Program Semantics.
I don't doubt your expertise in TLA+, as I hope you can see from my studious agreement with you on the items that are clearly within your expertise, but on this point you're not even wrong. You are attempting to refute the claim that a human programmer can derive a program that has desired properties more effectively with mathematical reasoning than without it by offering the objection that mechanical verification has high computational complexity. That is sheer nonsense. It's like saying that because factoring products of primes is hard, it's in no way feasible to multiply them rigorously, so you should just use an estimate instead.
As an aside, the avionics software for the Space Shuttle was over 250KLOC and was proved correct from the bottom up. And it functioned flawlessly. This was done before the computational-complexity results on program analysis were known, but that lack of knowledge was observably irrelevant to building correct software. Sadly, the hardware left a bit to be desired.
No ordinary, real-world software has ever been fully and affordably formally proven. There have been specialised 1MLOC programs checked with a model checker, but the techniques don't generalise. I would be interested in a source for your claim about the Space Shuttle software, because even though that's both high-cost and niche software, I think you might have the details wrong. Anyway, I encourage you to try to affordably write ordinary software in this way. This is certain to win you some high-profile award, as no one has been able to do it so far. Either you succeed or you fail, but there's really no point in insisting it can be done without pointing to anyone who has done it.
This discussion would be more fruitful if you carefully read my responses. I'm not actually disagreeing with you, but you're missing my point. I don't mean to be rude, but you really appear to be suffering from man-with-a-hammer syndrome here. As I've already shown, what can be mechanically verified with a model checker has no bearing on a programmer's ability to formally derive totally correct software. It would be extremely surprising if it were otherwise, since that would mean that one of the most fundamental theorems in computing science is incorrect.
> Anyway, I encourage you to try to affordably write ordinary software in this way
I already stated in an earlier response that I've been doing this professionally. And I have been for over a decade.
Regarding the Shuttle, you'll have to hit a library; the only online source I can find is Feynman's minority report, and it doesn't go into great detail. It does however show how properly decomposing a complex system into component parts is necessary to keep it intellectually manageable. And bear in mind that while Feynman is best known as a physicist, he knew enough about computing science to write a book on it.
And my point is that discussion of hypothetical ability is not what I am talking about. I can show you why there are good reasons to doubt the programmer's ability to reason about a large program in light of the computational complexity results accumulated in the last twenty years (and you are simply wrong about the bearing of model checking on the situation), but instead of arguing over hypotheticals, I'm merely pointing out that the ability you believe exists has never, to the best of my knowledge, actually been realised in a scalable, repeatable, affordable way in the field. If you claim you could achieve it, please try.
> And I have been for over a decade.
What exactly have you done for over a decade?
I am quite familiar with predicate transformers, but the distance between using Hoare logic and the like on code units -- which is relatively easy and pretty widely done (e.g. with SPARK) -- and formally (which means mechanically checkable, even if it is human directed) deriving a large program from a specification is gargantuan. Proving/deriving code units from contracts and the like is not the problem I'm talking about. Neither Dijkstra nor anyone else has come remotely close to achieving it, and achieving it is a guaranteed Turing award.
> I can show you why there are good reasons to doubt the programmer's ability to reason about a large program in light of the computational complexity results accumulated in the last twenty years (and you are simply wrong about the bearing of model checking on the situation)
OK. Show me.
This is why it's important to write the specification with an eye toward keeping the implementation manageable. This is probably a drawback of using a tool like TLA+ to check a specification. The tool makes it relatively easy to write a terribly structured specification, check it with TLA+, declare victory, and leave the programmer in a hopeless situation. A well designed specification will unsurprisingly mirror well designed code[1]. What I have empirically tested is that keeping my functions/methods simple enough to be easily specified also keeps them easy to formally derive correct program text for. The problem of composition is treated entirely separately. You don't need to understand the fully concatenated predicate transformers for each function/method; you just need to know what wp (or another appropriate predicate transformer) is for each and be sure to satisfy it.
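A toy sketch of what I mean (everything here is invented for illustration): each unit carries its own contract, and composing them only requires that the first unit's postcondition imply the second's precondition -- nobody ever expands the concatenated predicate transformer of the whole pipeline.

    from math import log2

    def normalize(xs: list[float]) -> list[float]:
        # Pre:  xs is non-empty and every element is > 0.
        # Post: result has the same length, every element is in (0, 1],
        #       and the elements sum to 1 (up to rounding).
        total = sum(xs)
        return [x / total for x in xs]

    def entropy(ps: list[float]) -> float:
        # Pre:  ps is non-empty and every element is in (0, 1].
        # Post: result >= 0.
        return -sum(p * log2(p) for p in ps)

    # The composition is justified contract-to-contract: normalize's
    # postcondition implies entropy's precondition, and that is the
    # only thing we have to check.
    print(entropy(normalize([1.0, 1.0, 2.0])))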
> What exactly have you done for over a decade?
Please understand this is a greatly abridged response. My general habit is to look at the requirements and infer the implied specification, since a formal specification is sadly almost always missing. Thankfully most business code is theoretically trivial, so postulating a reasonable specification from the requirements is easy. I then formally document my assumptions about what I believe the customer is actually asking for. Before I write code I think about the postcondition that I want that code to establish and determine what preconditions, if any, must hold. I put this as comments in the test modules. Then I do a more or less standard TDD approach, with the tests trivially derived from the specification[2], because management likes tests and they're paying the bills. Then I derive the code to make the tests pass from the specification, using predicate transformers from the precondition to the desired postcondition. And then I run the tests and move on to the next properly isolated concern.
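In miniature, one iteration of that loop looks like this (the spec and code are invented for illustration):

    # Specification, written down before any code:
    #   Pre:  xs is a list of integers sorted in ascending order.
    #   Post: returns an index i with xs[i] == x, or -1 if x is not in xs.

    def test_find():
        # The tests are read straight off the specification's cases.
        assert find([], 3) == -1
        assert find([1, 3, 5], 3) == 1
        assert find([1, 3, 5], 4) == -1

    def find(xs, x):
        # Derived against the loop invariant:
        #   x not in xs[:lo]  and  x not in xs[hi:]
        lo, hi = 0, len(xs)
        while lo < hi:
            mid = (lo + hi) // 2
            if xs[mid] < x:
                lo = mid + 1   # maintains: x not in xs[:lo]
            elif xs[mid] > x:
                hi = mid       # maintains: x not in xs[hi:]
            else:
                return mid     # xs[mid] == x establishes the postcondition
        return -1              # invariant plus lo == hi: x is not in xs

    test_find()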
Obviously no human discipline is infallible and I still make mistakes, but I make far fewer mistakes than I would using the typical sloppy approach most programmers use. One time I got chewed out by my manager for allowing one bug to make it into production in nine months, because I had raised expectations of the quality of my code so high. And by the way, I've always just presented my methodology as TDD because of the mathematical phobia that's so prevalent in American industry. If I tell management "this code is easy to maintain, you just need programmers proficient in the predicate calculus", they will dismiss me. If on the other hand I say "look, I have well over 90% branch coverage", I'll get an attaboy and management will be happy.
By this time I hope it's clear that my thoughts on the subject were not formed in the lab, but at a Big Tech Company You've Definitely Heard Of. I hope we can agree that the general tendency in industry to ignore or even deride what computing science can teach us is unfortunate.
> formally (which means mechanically checkable, even if it is human directed)
Dubious. A mathematical formalism does not require algorithmic automated checking to exist. And even if said checking is decidable, which it mostly isn't, it may not run in a practical time on any mechanical computer we could conceivably build. However if you mean like any other formal mathematics, where "the symbols do the work" so long as you follow the appropriate rules of derivation, then I am in complete agreement.
[1] Proper separation of concerns and such.
[2] Obviously no tests are going to exhaustively check a non-trivial specification, but my approach is still strictly better than the typical practice of primarily optimizing for satisfying coverage tools.
Here is a simple specification: write a program that, given an even integer greater than 2, returns two prime integers whose sum is the input, and never fails to do so. Start with a formal statement of the requirement, and derive a program that provably satisfies it. Decompose your program however you like.
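To save you a step, here is one way to state the requirement formally (f being the program to be derived):

    ∀n. (n > 2 ∧ even(n)) ⇒ ∃p, q. prime(p) ∧ prime(q) ∧ p + q = n ∧ f(n) = (p, q)

with the totality requirement that f(n) terminates and returns such a pair for every such n.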
> Then I derive the code to make the tests pass from the specification
Do you use predicate transformers all the way from a high-level spec to each and every line in a 1MLOC distributed and interactive system, with management consoles, reports, and ten thousand users per server?
> A mathematical formalism does not require algorithmic automated checking to exist.
It most certainly does. That is the literal meaning of "formal" in formal logic/method/synthesis/verification. A formal logic has a proof theory made of inference rules. It means that conclusions are reached by manipulation of the syntactic form -- hence formal -- without use of any mathematical intuition. Alan Turing then showed that a proof system can always be described as a Turing machine, and that formal = mechanical. There is a huge gap between informal mathematical reasoning and formal reasoning. I believe I've seen someone mention an exponential blowup in proof size of formal proofs compared to mathematical proofs.
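To illustrate just how mechanical "formal" is, here is a toy checker for modus-ponens-only proofs (invented for illustration, not any real tool); it compares syntactic forms and nothing else:

    # Formulas are tuples: ('var', 'p') or ('->', A, B).
    def check(proof, axioms):
        derived = []
        for line in proof:
            ok = line in axioms
            # modus ponens: from A and ('->', A, line), conclude line
            ok = ok or any(('->', a, line) in derived for a in derived)
            if not ok:
                return False
            derived.append(line)
        return True

    # From p and p -> q, the purely syntactic check accepts q.
    p, q = ('var', 'p'), ('var', 'q')
    assert check([p, ('->', p, q), q], axioms={p, ('->', p, q)})

No intuition anywhere: acceptance depends only on the shapes of the symbols.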
> Obviously no human discipline is infallible and I still make mistakes, but I make far fewer mistakes than I would using the typical sloppy approach most programmers use.
There's a very, very wide gap between "disciplined" and formal. I am all for disciplined and for using formal methods in various places. End-to-end formal derivation/verification, i.e. from high-level spec to code, in a scalable, feasible manner, is yet to be achieved.
> Here is a simple specification: write a program that, given an even integer greater than 2, returns two prime integers whose sum is the input, and never fails to do so.
I'm not going to solve Goldbach's conjecture on hackernews or anywhere else. I wish I could, I really do, but my incapability there doesn't invalidate my arguments. I'm not even annoyed though, that was a cute trick and I appreciate your cleverness. Fortunately Fields Medal tier mathematical problems have never come up in my professional programming career.
> It most certainly does. That is the literal meaning of "formal" in formal logic/method/synthesis/verification. A formal logic has a proof theory made of inference rules. It means that conclusions are reached by manipulation of the syntactic form -- hence formal -- without use of any mathematical intuition.
Once again, I completely agree with this. Let the symbols do the work. But we keep coming back to the same misunderstanding. A program can feasibly be derived that can't feasibly be verified without access to the discrete documented steps of the derivation[1]. Showing the work of the derivation is sufficient to convince any honest interlocutor of the correctness of the proof. It may well be enough to allow an algorithm to perform the verification too; I'll defer to your expertise on that. In any event, as I said before using the analogy of primes, verification is rather a lot harder than construction.
Edit: Also I want you to know I'm finding this a pleasant discussion and am thankful that you're taking the time to help develop my thoughts and hopefully yours too.
[1] By analogy, factoring the product of two primes is easy when in addition to being given the product you are also given the multiplier and multiplicand.
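Concretely (small, well-known primes standing in for cryptographically large ones):

    p, q = 104723, 104729   # the 9999th and 10000th primes
    n = p * q               # n = 10967535067
    assert p * q == n       # verifying with the witness: one multiplication
    # Given only n, recovering p and q means factoring -- the hard direction.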
> Fortunately Fields Medal tier mathematical problems have never come up in my professional programming career.
Ah, but you see, I used this problem only because it's a well-known one, so you could easily be convinced of its difficulty: others have tried solving it, unsuccessfully. But hard problems come up all the time, especially when concurrency is involved. You just don't know they're hard at first glance. For example, the specification that a database be kept on multiple computers to offer redundancy without loss of data is a staple of distributed computing, but finding an algorithm that satisfies it (perhaps together with some latency requirements) is extremely hard, and it took many years to find the first one that could.
That easy specifications are easy to derive doesn't help us, because those may be exactly the programs that are also easy to test or whose correctness is easy to establish in some other way. The whole point of formal methods is to help us with the hard cases, which, sadly, are very frequent.
> A program can feasibly be derived that can't feasibly be verified without access to the discrete documented steps of the derivation.
Not if it's done formally. A formal derivation guarantees proof, and, in fact, the problem of verification is easily reducible to derivation, i.e., derivation is no easier. Here is the proof: X is a program, and so, also a specification, and P(X) is some property that you want to verify for X. A formal derivation of a program from the specification X ∧ P(X) will, therefore, prove P(X). This is the "trick" I used with the "Goldbach program", except it's not a trick, but a basic result in the study of the hardness of program analysis, a subject that's been extensively studied for a few decades now. The claim that it's easier to derive than to prove is just mathematically false. With regards to claims about informal "proofs" or "derivations" it's hard to say anything, because the problem is not well-stated.
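Spelled out (treating the program X as an exact specification of its own behaviour): let S ≜ X ∧ P(X). Any program derived from S behaves like X and satisfies P, so a sound and complete derivation procedure succeeds on S exactly when X satisfies P. Deciding P(X) is therefore one call to the deriver -- verification is no harder than derivation.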
> finding an algorithm that does it (perhaps together with some other latency requirements) is extremely hard and it took many years to find the first one that could do it.
I agree. And correctly implementing those algorithms is much easier using a formal derivation with a mathematical tool such as the predicate calculus than it is by just eyeballing it and hoping your tests catch all errors.
> But hard problems come up all the time, especially when concurrency is involved
Predicate transformers are well suited for reasoning about nondeterminism and concurrency.
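Dijkstra's rule for the nondeterministic guarded if, for instance, handles every enabled branch at once:

    wp(if g1 -> S1 [] g2 -> S2 fi, R) = (g1 ∨ g2) ∧ (g1 ⇒ wp(S1, R)) ∧ (g2 ⇒ wp(S2, R))

Some guard must be enabled, and since any enabled branch may be taken, each must establish R; demonic nondeterminism is built into the calculus.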
> Not if it's done formally
Program derivation with predicate transformers is done formally. It's a chain of equivalences, implications, and consequences, as appropriate to the problem. The program text is literally developed as a proof: each transformation of the code must be justified by an axiom, theorem, or lemma, and at the end of the derivation all properties of the specification hold. You can't possibly be claiming that no formal mathematics was done before the invention of automated checkers, because that would be both ridiculous and observably false.
> Here is the proof: X is a program, and so, also a specification,
I disagree. A specification maps to either an empty or countably infinite set of programs that satisfy it. Also all programs satisfy countably infinitely many specifications. They are not isomorphic or even dual. In this case that conflation doesn't seem to impair our understanding too much though.
> and P(X) is some property that you want to verify for X.
Right, and the task of the programmer is to write a program text that provably establishes that P(X) holds everywhere in the program state space.
> A formal derivation of a program from the specification X ∧ P(X) will, therefore, prove P(X)
Yes this is what I meant when I said you should use the same techniques to derive the program text that you would to derive any other mathematical proof.
> it's not a trick, but a basic result in the study of the hardness of program analysis
It's a trick because it's a clever way of saying "write a program for which there is no known algorithm". As of now nobody is going to be able to do that regardless of their methodology. I get the impression that you mean this to be a compelling point, but I'm not grasping how I'm better off not using mathematical reasoning to derive the programs that I do know how to write.
> The claim that it's easier to derive than to prove is just mathematically false
It's not even false; it's incoherent. But I didn't make that claim. Both derivation and verification are ways to prove a program satisfies a specification.
As you say, a derivation of a program that satisfies a specification is a proof. Specifically, a proof by construction. The claim is that it's considerably easier to derive a program that satisfies some property than it is to verify whether an arbitrary program satisfies that property. For example, say we have a specification S: "returns a product of two primes" and two programs. P1: "Return (2^82589933 - 1) * (2^77232917 - 1)" and P2: "Return 2^8258993365541237147 - 1". The proof of the correctness of P1 is trivial because I constructed it to be so. As for P2 I have no idea if it satisfies S, but I am certain that it'll be considerably harder to convince myself and others that it does than it was for P1.
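The same point in miniature, with exponents small enough to check by hand (illustrative stand-ins for the Mersenne exponents above):

    p1 = (2**5 - 1) * (2**7 - 1)   # 31 * 127: both factors are known primes,
    assert p1 == 3937              # so P1-style correctness is by construction

    p2 = 2**12 - 1                 # 4095: just a number; nothing in its text
                                   # says whether it is a product of exactly
                                   # two primes. Someone has to factor it to
                                   # find out (4095 = 3^2 * 5 * 7 * 13: it isn't).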
> With regards to claims about informal "proofs" or "derivations" it's hard to say anything, because the problem is not well-stated.
> And correctly implementing those algorithms is much easier using a formal derivation
Again, no one in the history of humanity has managed to do this affordably for programs of significant size. If you have, you are pretty much guaranteed a Turing award.
> You can't possibly be claiming that no formal mathematics was done before the invention of automated checkers, because that would be both ridiculous and observably false.
Formal mathematics is not necessarily checked by a computer, but it is necessarily checkable by a computer.
> I disagree. A specification maps to either an empty or countably infinite set of programs that satisfy it.
It's not a matter of disagreement. This is the proof. A program is a specification, but it is not necessarily the only implementation of this specification. There are mathematical definitions of what it means to implement a specification (it's called the abstraction-refinement relation, and one definition is that of "simulation": https://en.wikipedia.org/wiki/Simulation_preorder). The issue of the hardness of program analysis and synthesis is not a matter of opinions, but has been studied quite extensively for decades. You can read papers on the subject by people like Philippe Schnoebelen, Rajeev Alur and quite a few others. I can't give an introduction to the subject here.
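For reference, the one-line version of simulation: a relation R between implementation states and specification states such that whenever (s, t) ∈ R and s can take a step labelled a to s', the specification can match it, i.e. there is a t' with t stepping by a to t' and (s', t') ∈ R; the implementation refines the specification when their initial states are so related.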
> I'm better off not using mathematical reasoning to derive the programs that I do know how to write.
That's not the point. The point of formal methods is not to do the things we know how to do, but to do things we're not sure we know how to do. Formal methods try to help us specify some properties and then write programs that satisfy them, even if -- and especially if -- doing so is hard or uncertain to be possible at all. I'm not yet clear on what it is that you do, but I am explaining what formal program synthesis/refinement means in the context of formal methods, and what it is that hasn't been achieved. For example, taking the distributed-system specification and deriving a program that implements something like Paxos. We know that easy things are easy. Formal methods are mostly for the cases that are particularly tricky.
> The claim is that it's considerably easier to derive a program that satisfies some property than it is to verify if an arbitrary program satisfies that property.
Your claim is mathematically disproven by the proof I've given. This isn't an argument, but a mathematical result: program verification is reducible to synthesis.
> Again, no one in the history of humanity has managed to do this affordably for programs of significant size. If you have, you are pretty much guaranteed a Turing award.
I offer that perhaps the issue here is an ignorance of the history of humanity, at least with respect to computing?
You have heard of decomposing a large program into smaller programs, right? That's how it is done and has been done.
Dijkstra did do what you claim is impossible several times in his career, for example the THE multiprogramming system. And he did win a Turing award for doing so. He also had more than a few colleagues and students over the years, and at least some learned what he was awarded for.
Of course if you're going to move the goalposts such that it having been done means the problem isn't of "significant size" then I'm happy to dismiss your argument as sophistry.
> Formal mathematics is not necessarily checked by a computer, but it is necessarily checkable by a computer.
Irrelevant to my point. And human programmers routinely derive programs that provoke pathological behavior in checkers, which is to say programs that are impracticable to check mechanically.
> That's not the point. The point of formal methods is not to do the things we know how to do, but to do things we're not sure we know how to do.
This is a foolish self-imposed constraint. The point of formal mathematical reasoning is to be unambiguous, to reduce cognitive load by letting the symbols do as much of the work as possible, and to be confident in the result arrived at. This is just as useful for solving known problems[1] as it is for researching novel algorithms. And as I'm sure you'd agree there are plenty of concurrency problems that are solved in the literature but are still tricky to implement correctly even given a formal specification. Formal mathematical reasoning, for example using predicate transformers to derive a program text that satisfies the specification, is a good way to keep that trickiness within the capabilities of those of us who are not Von Neumann tier intellectual powerhouses, but are at least bright enough to manage the predicate calculus.
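A worked micro-example of what that looks like (the standard textbook one): to establish R ≡ m ≥ x ∧ m ≥ y ∧ (m = x ∨ m = y), i.e. m = max(x, y), try the assignment m := x. Calculating

    wp(m := x, R) ≡ x ≥ x ∧ x ≥ y ∧ (x = x ∨ x = y) ≡ x ≥ y

hands you the guard for free: x ≥ y -> m := x. Symmetrically y ≥ x -> m := y, the guards' disjunction is everywhere true, and the resulting guarded if is totally correct by construction. No guessing, no checking after the fact; the calculation did the work.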
> Your claim is mathematically disproven by the proof I've given. This isn't an argument, but a mathematical result: program verification is reducible to synthesis.
You didn't at all prove that, but I trust you that reducibility of verification to synthesis has been proved[2]. It's still irrelevant to my claim about it being easier for a programmer to derive programs that meet an arbitrary specification than to test that an arbitrary program text satisfies it. Human programmers aren't computers. Please stop misrepresenting my claims. Are you one of those philosophical types who believes that human beings are Turing machines? That would certainly explain your confusion and the persistent disconnect in our views.
[1] Most working programmers can't even implement binary search correctly, to say nothing of writing a correct lock-free queue or similar.
[2] That's less interesting than it sounds. Given a working verifier we can just enumerate all strings and return the first one the verifier accepts. Obviously the complexity of the verification dominates, and the synthesis falls within the same computational complexity class.
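That construction, in runnable miniature (toy problem invented for illustration; the "programs" enumerated are just candidate constants):

    from itertools import count

    def verifier(candidate):
        # Toy spec: the program must output a number whose square is 1764.
        return candidate * candidate == 1764

    def synthesize():
        for candidate in count(0):     # enumerate all candidate "programs"
            if verifier(candidate):    # the verifier's cost dominates
                return candidate

    assert synthesize() == 42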
> I offer that perhaps the issue here is an ignorance of the history of humanity, at least with respect to computing?
Perhaps, but as no one has offered a counterexample, I have no way of knowing.
> Dijkstra did do what you claim is impossible several times in his career
No, he did not. The meaning of a technique not being scalable is that while it does work for small instances it does not for large ones.
We have learned so much since Dijkstra, and can apply even more powerful end-to-end verification/synthesis than he had at his disposal and for larger programs, but we still can't scale it, just as he couldn't, and, indeed, didn't.
> You have heard of decomposing a large program into smaller programs, right? That's how it is done and has been done.
No, it has not been done, but unlike 50 years ago, in the decades we've studied the "hardness" of program analysis/synthesis we have an idea as to why. There are now complexity theory results that show that decomposition cannot always work. In addition, we now know -- what we didn't 50 years ago -- that there are essential barriers to scaling. Things that are easy when small can be absolutely impossible when large.
> This is a foolish self-imposed constraint.
It's not a constraint so much as saying that the claim that you have a technique that can more effectively yield programs we already know how to get more-or-less right and the claim that you have a technique that can effectively, feasibly and scalably produce programs that are otherwise very hard to get right are two very different claims. It's like the difference between improving a polynomial algorithm by a constant and finding a polynomial algorithm for NP-complete problems. The first might get you published; the second will make you world-famous.
> It's still irrelevant to my claim about it being easier for a programmer to derive programs that meet an arbitrary specification than to test that an arbitrary program text satisfies it.
The proof is relevant because it precisely proves that your claim, as stated, is, in fact, false. Your claim might be made about "common cases" etc., but that would require a much more careful wording, as well as a careful empirical study rather than insistence.
> No, he did not. The meaning of a technique not being scalable is that while it does work for small instances it does not for large ones.
I see, so a complete and correct multiprogramming operating system running on unreliable hardware is a "small instance" according to you. This is the sophistry I promised to dismiss. Consider it done.
> There are now complexity theory results that show that decomposition cannot always work.
Obviously decomposition can't always work. Primitives can't be decomposed by definition. Yet another trivial truth.
>It's not a constraint so much as saying that the claim that you have a technique that can more effectively yield programs we already know how to get more-or-less right and the claim that you have a technique that can effectively, feasibly and scalably produce programs that are otherwise very hard to get right are two very different claims. It's like the difference between improving a polynomial algorithm by a constant and finding a polynomial algorithm for NP-complete problems. The first might get you published; the second will make you world-famous.
This word-salad has nothing to do with anything I've written.
> The proof is relevant because it precisely proves that your claim, as stated, is, in fact, false. Your claim might be made about "common cases" etc., but that would require a much more careful wording, as well as a careful empirical study rather than insistence.
This is more interesting, but also misleading. Obviously every single program ever written falls within the class of "common cases," since, asymptotically speaking, very nearly all program texts have never been written, let alone studied. Look ma, I can do sophistry too!
You are talking about high-assurance code, which is roughly a statistical classification regarding failure rate. Pron is talking about formal verification, which proves something to be correct according to a mathematical model.
Formal verification is not required to produce code with a very low failure rate. I have personally known about unpatched vulnerabilities worth a small fortune which were not exploited, because most people who choose to become programmers aren't assholes, or at least not criminally so.
Formal verification is not a magical certification that software is bug-free, as there can be an error in the specification itself or in how the specification relates to the hardware. This is astonishingly rare, however; I think Coq has had a single major error in its TCB.
The Space Shuttle, various nuclear accidents, and plain old manufacturing are examples of the limits to which engineering can prevent failure: we just can't shove the real world into statistical models.
You two are in agreement, but seem to be talking past each other :(