Relational databases performed a task that didn't need doing (1991) (pipeline.com)
91 points by blasdel on Nov 23, 2009 | 49 comments


> Biological systems follow the rule "ontogeny recapitulates phylogeny", which states that every higher-level organism goes through a developmental history which mirrors the evolutionary development of the species itself.

I learned that that was incorrect in high school.

http://en.wikipedia.org/wiki/Recapitulation_theory


True, but the phrase you quote is merely rhetorical anyway. The OP wasn't basing any of his argument on that broken thesis.


No, but it still takes something away from the whole when you know it's flat out wrong. It's like starting out "Since the earth is flat, we believe files should be too".


That's a bad illustration, since in your example the false fact is used as part of the argument itself.

I find bad spelling a better analogy. If the OP's spelling and grammar were insufferable, it would have "taken something away from the whole" and still not be worth commenting on.

That said, no big deal. I only felt compelled to reply to your comment because I found it strange that it was the highest ranked.


That theory was rejected one year after this letter.


Every organization I've ever worked at has huge data quality issues. Also, a company's data is one of its most valuable assets, providing financial reporting, forecasting, market analysis, and transactional functions.

The question I pose, then: is the data available within corporations today of higher consistency and quality than it would be if all organizations used only key-value stores, document stores, or CSV files?

I'm going to venture to say despite the generally abysmal quality of data in large organizations today, it'd be much worse without the tools made available through RDBMS's.


I'm not so sure. It seems that the shoehorning that is done to get data into the tables causes some problems. I am not sure if it is inherent in the model or just a function of the standard two-team arrangement (dev & DBA). The rigid implementation requirements and the mismatch with the languages developers use seem to be a problem.


The two-team arrangement has caused me no end of problems over the past 10 years of my life, beyond shoehorning. The separation of the database team from the development team introduces communication problems, priority mismatches, and fixes made on one side that should have been made on the other. If you are reading this and you have separate database and development teams, merge them now, or at the least, put control of the schema in the hands of the development team.


I'm not so sure. It seems that the shoehorning that is done to get data into the tables causes some problems.

Possibly. But I shudder to think of what it would be like if there were no powerful standard tool like SQL for essential ad-hoc queries and fixes. That is what you'd get with custom file storage.

Also the "shoehorning", if done right (normal forms and all that) prevents duplicating information, and all attendant problems.


People probably don't realize who Henry Baker is.

http://en.wikipedia.org/wiki/Henry_Baker_(computer_scientist...

Yes, he's trolling here a bit, but he's brilliant (if quirky).

(Disclaimer: I knew him when I worked at MIT lo these many decades ago.)


I think this would be more appropriately titled, "I think RDBMS has set the industry back by 10 years".

Some of the more interesting tidbits:

   I can categorically state that relational databases 
   set the commercial data processing industry back at 
   least ten years and wasted many of the billions of 
   dollars

   --

   Virtually all commercial applications in the 1960's
   were based on files of fixed-length records of 
   multiple fields, which were selected and merged. 
   Codd's relational theory dressed up these concepts 
   with the trappings of mathematics (wow, we lowly 
   Cobol programmers are now mathematicians!) by 
   calling files relations, records rows, fields 
   domains, and merges joins. 
I would love to hear HN's thoughts on this.


SQL being notoriously poor at hierarchical data has been known for quite a while; he's obviously very correct there. Oracle goes so far as to offer a non-standard CONNECT BY clause for tree traversal. And there's not a whole lot of sophistication to relational theory at the ground level. Optimizing it is hard, but that's always been the case.
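For the curious, Oracle's syntax looks roughly like this (a sketch, assuming a hypothetical employees(employee_id, manager_id, name) table):

  -- Walk an org chart from the root down; LEVEL is Oracle's
  -- pseudocolumn giving the depth of each row in the tree.
  SELECT LEVEL, employee_id, manager_id, name
  FROM employees
  START WITH manager_id IS NULL          -- root(s) of the tree
  CONNECT BY PRIOR employee_id = manager_id;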

To a large degree we have designed things so that the data we try to store is as relational as possible, never mind the domain implications; then, the mismatch is covered up. A shopping cart is a perfect stupid problem easily solved by relational databases, until you start introducing the real world: special discounts, group packages, "buy these items together and pay less than you would buying them individually", etc. are usually handled through gross hackery.


SQL being notoriously poor at hierarchical data has been known for quite a while; he's obviously very correct there.

While SQL's difficulty with handling hierarchical data is well-known, he asserts that it is somehow a fundamental problem with relational databases, which I think vastly overstates the magnitude of the problem.

Oracle goes so far as to offer a non-standard CONNECT BY clause for tree traversal.

Standard SQL added support for recursive queries via WITH RECURSIVE more than 10 years ago, and most modern databases support it; Oracle is just anachronistic here.
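The standard equivalent, sketched against the same kind of hypothetical employees(employee_id, manager_id, name) table:

  -- SQL:1999 recursive CTE: everything under employee 42,
  -- in one statement, no vendor extensions.
  WITH RECURSIVE subtree AS (
    SELECT employee_id, manager_id, name
    FROM employees
    WHERE employee_id = 42
    UNION ALL
    SELECT e.employee_id, e.manager_id, e.name
    FROM employees e
    JOIN subtree s ON e.manager_id = s.employee_id
  )
  SELECT * FROM subtree;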


A database's job is to handle data, and a relational database can do that just fine. When you want to handle really complex relationships you need to write code, because the rules are not abstract. You can create a horribly complex view / stored procedure and pretend it's the database's job, or you can accept that some rules are stored in the database but not implemented using the database. Edit: None of the "database alternatives" help you solve the checkout problem.

PS: It's like complaining that HS level Calculus does not let you solve a complex differential equation. Useful abstractions always have their limits because they are abstractions.


Data is not simple rows. Data is complex, human-generated stuff that has both implicit and explicit structure, and it does not collapse down well to any single abstraction, not even object graphs. Fine, so we have to pick some abstraction. That doesn't mean the abstraction we pick is necessarily appropriate. Memory is an abstraction; we could just dump all of memory out to a file and call that a database (and folks used to, and still do sometimes). It has issues for persistent databases, like brittleness and pointer swizzling, so we adopt a higher-level one.

The relational view of data is a higher-level abstraction, but that doesn't mean it's an especially useful one. Hierarchy is something that comes up constantly in the real world, and it is something very awkward to represent in a relational database. You can try to use Codd's adjacency list, but reassembling the links is costly and drilling down a hierarchy requires a database query at every level. If you don't believe me, here's a use case: I want everything there is about a sub-part of a Bill of Materials with one query and without having to grope through the entire index. There's a reason why Oracle implemented CONNECT BY, which is abjectly non-relational.
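To spell out the pain, a sketch with a hypothetical bom table (not from Baker's letter):

  -- Adjacency list: each part points at its parent part.
  CREATE TABLE bom (
    part_id   INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES bom(part_id),
    name      VARCHAR(100)
  );

  -- Pure relational drill-down: one round trip per level.
  SELECT * FROM bom WHERE parent_id = 1;                 -- level 1
  SELECT * FROM bom WHERE parent_id IN
    (SELECT part_id FROM bom WHERE parent_id = 1);       -- level 2
  -- ...and so on, another query for every further level.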

Also, the sheer amount of research and work that goes into making even a simplistic relational database responsive is enormous; when Baker wrote this it was not uncommon to lose two orders of magnitude of throughput by going to a relational database from a custom-written one, and if you were very lucky you could claw back an order of magnitude with indexes. People today complain loudly about the inefficiency of row-structured storage when you're usually interested in the columns. Even today the way to get maximum throughput from a database is to denormalize, and in doing so you virtually set the schema in stone.

A 1960s-era hierarchical database could traverse a hierarchy as fast as the computer could request data from memory. The models were brittle and didn't always represent the real world well, but you could run an entire country's airlines in the amount of computing power your watch has now. Banks today still rely on IBM's IMS, which is hierarchical.


As I said "Useful abstractions always have their limits" you can represent any tree using the relational model. The basic problem is recursive data structures need a higher level of abstraction. Consider the following situation:

  Bob is managed by Ted
  Ted is managed by Bob
If that situation is allowed, then all recursive queries need to deal with it. On the other hand, if it is NOT allowed, then every commit needs to deal with it. However, while the relational model does not understand those issues, just as it does not understand HTML, it still stores the data just fine. Generally it's a fairly moot point because it's a program talking to the database and not a person, so that program can provide that level of abstraction just fine, granted it's less efficient than a model that understood the abstraction.
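For instance, a recursive query that tolerates the Bob/Ted cycle has to carry its own bookkeeping, something like this sketch (hypothetical manages(employee, manager) table, Postgres-style string concatenation):

  -- Refuse to revisit anyone already on the path walked so far.
  WITH RECURSIVE chain (employee, manager, path) AS (
    SELECT employee, manager, '/' || employee || '/'
    FROM manages
    WHERE employee = 'Bob'
    UNION ALL
    SELECT m.employee, m.manager, c.path || m.employee || '/'
    FROM manages m
    JOIN chain c ON m.employee = c.manager
    WHERE c.path NOT LIKE '%/' || m.employee || '/%'
  )
  SELECT * FROM chain;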

As to speed, relational databases are the x86 chips of the database world. There are plenty of custom systems that are faster at specific things, but relational databases sacrifice that speed for adaptability and backwards compatibility. For example: a Core 2 chip's instruction decoder and an SQL database's query optimizer do basically the same thing. With a custom solution you can just skip that step and directly say what to do, but the abstraction is normally worth it so you can upgrade the hardware for years without changing the program.

So the relational model makes the basic assumption that speed is not the primary concern, or you would be writing at a more basic level. Yet people want more speed, so there are plenty of real-world hacks to help things along, like CONNECT BY and stored procedures, and all those new x86 instructions. So, 99.9% of the time in the real world an off-the-shelf database running on an x86 chip is plenty fast if used correctly, and it's also fairly cheap compared to the total cost of the other solutions.


If you added up all the things that have supposedly "set the industry back by 10 years," the industry would have to have been born before Galileo.


Setting an industry back has no relation to when it started.



Henry Baker = Hans Boehm + Zed Shaw.


Oh SNAP


People act like non-relational data storage is some kind of new idea, but it's not. Even if you discount flat files, there's still BerkeleyDB (early '90s) and the things it was meant to improve on (earlier). Why, then, are relational databases popular?

I find it hard to believe the author's claim that it's because people wanted to claim they were doing fancy math with their data storage, over all other considerations. If this were the case, Lisp would have been the programming lingua franca through the 70s and 80s, not C. Lisp is obviously closer to its mathematical roots.

Maybe it's because relational databases got the nice consistency features first; Interbase certainly predates BerkeleyDB. I'm more inclined to think it's a combination of two factors: First, the database can do much of the heavy lifting for you. This is nice for application programmers who are stuck with the likes of C, FORTRAN, COBOL, and Pascal; it's less interesting with functional abstractions available.

Second, you get the SQL prompt. It's a dangerous tool, and should almost never be used for updates, but it's really nice to be able to answer unanticipated questions by simply writing up a query. Think of what the Django admin buys you; this is like a miniature version of that.
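E.g., the kind of one-off question that would otherwise mean writing and deploying a report program (a sketch against hypothetical customers/orders tables):

  -- Which customers placed more than ten orders last month?
  SELECT c.name, COUNT(*) AS order_count
  FROM customers c
  JOIN orders o ON o.customer_id = c.customer_id
  WHERE o.ordered_on >= DATE '2009-10-01'
    AND o.ordered_on <  DATE '2009-11-01'
  GROUP BY c.name
  HAVING COUNT(*) > 10;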


His point is not that "people wanted to claim they were doing fancy math with their data storage, over all other considerations" -- but that they wanted to reify their extant degenerate patterns as theory.


You're right, that does make more sense. It would still seem that Lisp would have been more prominent if the ability to claim good theory (regardless of what you're actually doing) was a primary consideration. The average rank-and-file programmer (or manager) today just doesn't care about theory.


Agreed. Most development organizations, especially those in large enterprises (the primary users of RDBMS technology when this article was written), are more concerned with coming up with the correct solution at the fastest clip. As far as they're concerned, theory be damned. Relations provide a simple, easy-to-understand data modeling concept, and the query language is relatively quick to pick up for most developers. RDBMS are like the Java of data containers. You'll never get fired for going with an RDBMS.


Lisp started from mathematical roots and grew towards practical applications. OP asserts that data processing grew from practical applications and claims that it then went for mathematical theory to try to make itself feel and appear more respectable.

These are two quite different trajectories. Additionally, different disciplines often have quite different cultures and politics, which influences how their technologies get adopted.


I think this is a very poignant essay given the place in history we're at. In 1991 this essay looked like sour grapes over forgotten history, but I'm more interested in what lessons we can learn from previous systems. It's interesting to hear about a time before SQL and relational systems, and how early developers dealt with persistence problems. The historical account of the controversy over relational technology rings true today, given that we are struggling with it as well.

I heard someone say that when the industry was moving away from big iron to mini-computers and PCs, the programs they wrote were considered worthless because they couldn't be reused on the next platform. It was the data that was the jewel. Relational technology embraced separating data from the program to enable reusing the data in another program or on different hardware. Prior to that, data and program were tied together. They were fixed and non-portable. Relational technology came to embody the separation of data and program to facilitate data portability.

What's changed is the way in which we write programs. We aren't tied to machine architecture like we once were. We use Python, Ruby, or Java, which are portable across many platforms. What was once trash is now treasure, because we aren't constrained to one architecture.

Does that mean let's go back to flat files? No. I think it's a combination of the two. Features like data separation are important. Network access to data is important. Query languages are important. Hierarchy has always been important, but we've probably downplayed how important. Scalability and speed are ever more important.

What's not as important anymore is the relational part. Those other features can be separated from it.


"With the recent arrival of object-oriented databases, the industry may finally achieve some of the promises which were made 20 years ago about the capabilities of computers to automate and improve organizations."

Um... wow... this one sentence discredits the entire email. Object-oriented databases were a HUGE failure. I can't be sure, but PostgreSQL might be the only surviving one, and it survived because it went relational. Just my 2 cents.


GemStone was beloved of most who used it, and it is poised to make a comeback through MagLev.

Meanwhile, ORMs (which attempt to implement an OODB on top of an RDB) are ubiquitous.


This article becomes interesting if you realize that (1) it was written in 1991 and (2) it was written by a distinguished computer scientist. I find it amusing to see how smart people were thinking back then.

I built a system with an OODB back in the late '90s. It was a pleasant experience in a way; I could work around the inflexibility of an RDBMS. An RDBMS works great if your data domain can be straightforwardly represented in the relational model. But if your application is inherently working on a graph, it becomes awkward to map it onto the relational model. With an OODB, the graphs formed by objects can be stored directly.

OTOH, I had difficulty with the lack of a good abstraction; ironically, the implementation of application data structures was too tightly coupled with the data storage model. Application-level implementation details could affect data storage unnecessarily, or optimization techniques on the db side could affect the application. The relational model and SQL do give a good abstraction barrier at a sweet spot, at least for some types of applications.

I think graph databases will give us this nice abstraction barrier for the network-of-objects type of application domain. Let's see how it turns out 20 years from now...


No it doesn't... OK, early '90s, people thought object-oriented everything was the way forward.

But his main point, that blasting through a flat file without a ton of row-locking and transactional overhead could satisfy a number of needs better than the actual DBMS, is still true.

If you need to be transactional, on the other hand, that's a different story.


PostgreSQL was not an object-oriented database; it was an object-relational database. Despite not living up to some people's expectations that they would replace the relational database, object-oriented databases were not a "huge failure". They are useful for applications where relational databases are not suitable, for example CAD (as mentioned in the article).


What's the difference between object oriented and document oriented databases?

Aren't all of the recent key/value databases lately pretty much the same concept as an object database?

Sorry if that's a dumb question...


OODBs tend to use navigation of object graphs (see http://c2.com/cgi/wiki?NavigationalDatabase ) as the key operation, while document or key-value DBs rely more on queries.


Thanks for the link. Dug around a bit, and from what I can tell they are pretty similar. Object databases have been referred to as hierarchical databases (i.e., XML data), and many of the key-value databases being touted in the past couple of years seem hierarchical to me, only with some additional indexing and a query language packaged on top (which you could add yourself with Lucene or Sphinx, I would assume).

I must be missing something, but I'll keep digging.


This:

> Biological systems follow the rule "ontogeny recapitulates phylogeny", which states that every higher-level organism goes through a developmental history which mirrors the evolutionary development of the species itself.

is not true. And it had been known to not be true since 1922, and possibly even 1890.


It was brought up as possibly-true in my 1980s high school, so I guess it was probably something he learned in high school or college in the 1950s or 1960s and never unlearned. People forget that so much more of what we thought we knew before google, wikipedia, and snopes was wrong. You have to work harder to be wrong on stuff like that, nowadays, since it's so easy to look up. In 1991, you had to go to or have a decent library to find out that that was well-known to be wrong, so a non-specialist might never question what they learned in long-ago school. Now, it's so easy to look up minor details as you write an essay that it seems willful if someone doesn't, but if you apply that standard to twentieth century essays, you'll come away with a view of the writers that's often unwarranted...

Next up: spellcheck. ;)


The task they did do, that needed doing, was causing a shift in thinking akin to the shift you undergo when you learn Prolog, having only been exposed to Assembler.

By abstracting away the storage model, you can reuse the same database calls in the browser (SQLite), on one machine (MySQL/Postgres), or on a cluster, and you're free to think about more business-specific logic.


I had great difficulty in controlling my mirth while I read the self-congratulatory article "Database Systems: Achievements and Opportunities" in the October, 1991, issue of the Communications,...

I had little difficulty in controlling my mirth when I realized that in 18 years, some things haven't changed. Linkbait, drama, and trolling all still look the same.

As a designer of commercial manufacturing applications on IBM mainframes in the late 1960's and early 1970's...

If you're going to wave your resume, make sure it's "wavable". If I had had a hand as a designer of commercial manufacturing applications on IBM mainframes in the late 1960's and early 1970's, I sure wouldn't brag about it. They were an excellent example of what not to do: so expensive, so difficult to deploy and use, and so ineffective, that the whole world rushed out to write better apps on mini-computers, and eventually, PCs. Ironically, the one thing they did do well was their relational database storage systems. If you owned a multi-million dollar AMAPS installation in 1978, the COBOL apps were soon worthless. The only thing salvageable was the TOTAL DBMS.

I can categorically state that relational databases set the commercial data processing industry back at least ten years and wasted many of the billions of dollars that were spent on data processing.

This conclusion is based upon what data? Maybe in 1991 you could bullshit the ACM without supporting data, but 2009 readers demand citations. Wikis & Google have exposed the posers.

Unfortunately, relational databases performed a task that didn't need doing; e.g., these databases were orders of magnitude slower than the "flat files" they replaced,

Again, based upon what data? From what planet? Just because someone overnormalized a commercial database doesn't make it the fault of the underlying technology. That would be like saying, "That webpage sucks, therefore HTML sucks."

Why were relational databases such a Procrustean bed? Because organizations, budgets, products, etc., are hierarchical...

Organizations, budgets, products, etc. are data sources and sinks and can be structured any number of ways, including hierarchically. But the lifeblood of any business is its order flow and business processes, which are almost always ideally suited to be relational; they're "linked" to almost everything else. Not everything has to be in 4th normal form, but flat files and hierarchical databases are almost always a poor stepchild to RDBMS for business flow.

These databases could also respond quickly to "real-time" requests for information, because the data was readily accessible through pointers and hash tables--without performing "joins".

I guess it's not really fair to "debate" with an OP from a generation ago. Even with Moore's Law, he would have had a hard time wrapping his head around where the real bottlenecks would be today. But one thing really hasn't changed that much: throughput has rarely been on the critical path. Why sacrifice data integrity, adherence to business rules, and effective delivery of user needs for a few microseconds? I remember routinely witnessing subsecond intercontinental response time on massive relational database installations as early as 1981. Why didn't OP?

Oh, and RDBMS with pointers and hash tables have been around since 1965:

http://en.wikipedia.org/wiki/Pick_operating_system

and now also support object oriented technology:

http://en.wikipedia.org/wiki/InterSystems_Cach%C3%A9

I shudder to think about the large number of man-years that were devoted during the 1970's and 1980's to "optimizing" relational databases to the point where they could remotely compete in the marketplace.

I shudder to think about the large number of man-years lost by PHBs who read drivel like this and waste the time of people who do real work with initiatives derived from these drama-based conclusions.

Database research has produced a number of good results, but the relational database is not one of them.

From someone who has built a career rescuing so many manufacturers and distributors from flat file systems with relational database technology to whoever posted this: thanks for the laugh. I really needed it on a tough Monday.


Arguably not a single statement that you have made in your overly hasty post is true. To address a few:

Henry Baker is no troll. His letter was written to the ACM in 1991. Baker was an established and respected computer scientist and iconoclast. To treat the letter as if it were written yesterday is foolish and disrespectful. In any case, had Baker's remarks been misguided or irrelevant, the ACM would not have published them.

Baker was always in demand. No need to compare resumes: you likely would not rank well against him.

"Flat files" were used often for batch processing. For sequential processing and merging/splitting/sorting of data they are usually faster than RDBMS. During that processing, temporary files might be created that are indexed, hashed, or sequential.

Today network-style databases remain faster by at least an order of magnitude (and sometimes two) than relational databases. Ask any knowledgeable mainframer about the relative speeds of their network-style vs relational databases. The same holds for smaller computers too.

Baker would have no problem "wrapping his head around" almost any problem. An archive of some of his research papers:

http://home.pipeline.com/~hbaker1/

You saw "subsecond intercontinental response time on massive relational database installations as early as 1981", when IIRC the only relational databases available was Oracle and possibly Ingres. My memories are that Oracle's early versions were performance dogs by any standard. But perhaps you can refresh our memories by citing performance statistics and showing us benchmarks.

PICK was not a relational database.

TOTAL was not a relational database.

Your remarks remind me of how so often when an excellent programmer leaves a shop, the remaining developers diss his work and argue for rewriting all of his (excellent working) code.

But to miss on almost every statement? Oh, my!


What bugs me is that even though Baker states in the comment that he proved a particular deficiency in the relational algebra (the expression of transitive closures), the assertions he uses to lead into this seem extremely vague:

they [relational databases] made trivial problems obviously trivial, but did nothing to solve the really hard data processing problems...

organizations, budgets, products, etc., are hierarchical; hierarchies require transitive closures for their "explosions"...

He never describes concretely what an "explosion" is in a real situation - not in the comment, at least. I'll give the benefit of the doubt that perhaps he brings it up in some paper.

But my understanding is that the relational algebra is useful primarily because of its linguistic properties - instead of every problem needing a customized data model and customized graph-traversal solution, it describes how to use a generic query syntax and joins everywhere.

While this limits the theoretical power of the system and introduces order-of-magnitude slowdowns, it still solves a major subset of business problems, and it does it with a more compact description than the equivalent graph solution. And that, in turn, means that the theoretical limitations aren't necessarily going to matter for shipping and maintaining a solution, especially when working with a short-term view and small scale.

Essentially, he banks on his credentials to turn an "academic" oriented statement into a "industry" one. But from the 70s onward he was writing papers on garbage collection, concurrency and PL theory, not going out and building systems in the industry.

So my view is that he's being overly biased towards theoretical purity above practicality, a tendency that has led many an academic astray.

(This is not a slam to academics and their pursuit of purity, though. It's a great way to advance the computing arts, and complements well the messy practicality of hackers.)


> So my view is that he's being overly biased towards theoretical purity above practicality, a tendency that has led many an academic astray.

Not so fast. He contrasts working systems in practice with pure relational technology. He is directly saying relational technology was an exercise in purity, dressed up with mathematics to make it seem more rigorous and battle-tested, where we already had a perfectly practical solution that worked well and performed better. I don't see his academic purity leading him away from practicality at all.

I think his academic resume is making you think he must be all about purity, but in fact he's being very practical.


Arguably not a single statement that you have made in your overly hasty post is true.

You can tell from the post times that I spent over an hour (off & on while doing other work) on my post. This article was so absurd that I wanted to make sure I addressed it exactly the way I wanted to (which I did).

Baker was an established and respected computer scientist and iconoclast.

Therefore, everything he ever wrote or said must be true, according to you.

To treat the letter as if it were written yesterday is foolish and disrespectful

No letter that begins with "I had great difficulty in controlling my mirth..." deserves respect. He was begging for rebuttal. I simply complied.

"Flat files" were used often for batch processing.

See, there's one of the many differences between sitting in an ivory tower and actually being on the ground doing the work. This was written in 1991, not 1961. Batch processing had long been surpassed by on-line transaction processing. CICS had taken over for batch COBOL, and mini-computers and PCs were always on-line oriented. Dismissing an entire technology because it performed less well in an area that accounts for maybe 5% of your processing sounds like something said by someone who doesn't have to do the work and doesn't have a clue how things really work.

Baker was always in demand. No need to compare resumes: you likely would not rank well against him,

My work is my resume. AFAIC, the only thing that matters is delivering results to customers. If I have relational database technology in my back pocket and my competitor doesn't, I'll kick his ass every time. (This is not a judgement on anyone's ability; it's an assessment of the technology under discussion.)

Today network-style databases remain faster by at least an order of magnitude (and sometimes two) than relational databases.

What difference does it make if you lose data integrity? This commonly happens in flat file environments. Doing the wrong thing faster is meaningless. There are many times when an RDBMS makes much more sense.

My memories are that Oracle's early versions were performance dogs by any standard.

Oracle was a dog. Therefore relational database technology performed a task that didn't need doing. Just like the internet performs a task that doesn't need doing because ie6 is a dog. Right.

PICK was not a relational database. TOTAL was not a relational database.

Let's see. Both had columns, rows, joins, indexes, stored procedures, and triggers. What else did they need, certification by Codd himself?

Your remarks remind me of how so often when an excellent programmer leaves a shop,

Based on OP's original article (the data at hand), he most certainly would not make an excellent programmer in any shop I've ever been in. He sounds like the ivory tower guru who pontificates B.S. while the rest of us got all the work done. Again, the difference between theory and practice.


Arguably not a single statement that you have made in your overly hasty second post is true:

This article was so absurd that I wanted to make sure I addressed it exactly the way I wanted to (which I did).

The letter was on-point at the time: the ACM believed it to be relevant.

He was begging for rebuttal. I simply complied.

You're 18 years too late! May as well rebuke Isaac Newton for missing relativity!

What difference does it make if you lose data integrity?

The common network-style (CODASYL) and hierarchical databases enforce referential integrity and are full ACID.

Both [ed.: PICK, TOTAL] had columns, rows, joins, indexes, stored procedures, and triggers.

Indexes, stored procedures and triggers are implementation details and are not part of relational database theory per se. None are required for a database to be relational: none guarantee that a database is relational. And to be pedantic, neither are "rows" and "columns" part of relational theory: the proper nomenclature is "tuple" and "attribute".


PhDs are funny. Let's go shopping!


I wonder if relational databases would not be as popular had easy-to-use transactional filesystems been commonly available at an earlier time.

To gain transactional integrity (ACID properties) in that day, one had to either purchase a database management system (usually hierarchical or network) and work within that system or be a heckuva programmer and write it oneself.

Had transactional filesystems been commonly available, then file processing with those filesystems would have been akin to working with a relational database management system (RDBMS).

But that leaves referential integrity (RI) as a programmer task. Unfortunately most sites I've worked at do not seem to believe that RI should be strictly enforced. So maybe it wouldn't make much difference.


Seems to me, messing with immense flat files is going to be faster iff you have a computer scientist of equal caliber to the ones who wrote <insert professional SQL DB here> optimizing your indexing and storage formats and disk access patterns and whatnot. That is, at the very least, going to be expensive.

It may be simpler to just throw Oracle/Hadoop and hardware at the problem.


What?


> With the recent arrival of object-oriented databases, the industry may finally achieve some of the promises which were made 20 years ago about the capabilities of computers to automate and improve organizations.

Oh no! I thought NoSQL/key-value databases were the future



