> Too often, web tiers are full of boilerplate that does nothing except convert a result set into JSON.
Yeah. This is the problem: we've abandoned the hypermedia architecture of the web for a dumb-data, RPC model. I suppose if you are going to go that direction and things are simple enough, you can jam it in the DB and get smoking perf.
But as things get more complicated, where does the business logic live? Maybe in the database as stored procedures? It's less crazy than it sounds.
An alternative is to cut out the other middleman, JSON: ditch heavy front-end JavaScript and return HTML instead, going back to the hypermedia approach. The big problem with that is a step back in UX functionality, but there are tools[1][2][3] for addressing that.
I did this for the very first external web app built for Bankers Trust Company back in the day. SQL Server back end with (classic) ASP on the front end. Even if someone had gained access to the web server, they wouldn't have had access to anything they shouldn't have. I built web apps, an online store, and a learning/video website using the same tech around the same time and they worked without issue for well over a decade. So, there is something to be said for this approach, though I wouldn't build something like this today.
A place I worked at had an ASP app in vbscript + one stored proc that dynamically executed raw sql strings from a database table. It worked so much better than it ever should've.
> I did this for the very first external web app built for Bankers Trust Company back in the day. SQL Server back end with (classic) ASP on the front end.
What, "(classic) ASP on the front end"? I thought ASP was a server technology; wasn't your actual front end, like, Internet Explorer?
Anyway, I did that too, twenty years ago, using a couple of Oracle PL/SQL packages. Weird dual-language meta-programming, writing SQL that spits out JavaScript that in turn calls the next PL/SQL-generated page...
I still believe the stored procedure route has merit. It's harder since most available libraries are built around the idea that the frontend has full access to the database. A frontend in (insert favorite frontend lang/framework here) and a backend in Postgres using stored procedures could work quite well.
Unfortunately, the 'meta' is all about frontends with full backend access, so that's what I'm learning alongside other stuff.
Not sure if you've looked into either of postgrest or postgraphile, but I feel like those tools may be right up your alley. Rather than have your app layer generate SQL via an ORM, these tools instead generate your app layer based on your database schema, functions, etc. Postgraphile (generates graphql APIs) is particularly friendly towards frontend-heavy web apps.
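To make that concrete, here's a minimal sketch (schema, table and function names are invented) of the kind of database objects PostgREST or Postgraphile turn into endpoints, with no hand-written app-layer code:

    -- Hypothetical api/app schemas: PostgREST would expose these as
    -- GET /active_users and POST /rpc/register_user.
    create view api.active_users as
      select id, name, email
      from app.users
      where deleted_at is null;

    create function api.register_user(_name text, _email text)
    returns app.users
    language sql
    as $$
      insert into app.users (name, email)
      values (_name, _email)
      returning *;
    $$;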
I meant more that I wouldn't use VB/ASP and SQL Server.
The company I currently work for still actively uses classic ASP in production and they still actively develop that product. Currently trying to get away asap, because I don't wanna touch spaghetti vbscript from '99 with a ten foot pole.
Lots of it is boilerplate, but eventually you'll need more logic, then some CPU-heavy tasks, then a memory-heavy task, then a task suited to a specific language, then a task that needs a GPU, then a distributed memory cache.
Web servers are easy to horizontally scale. Even minute to minute if needed. Scaling your database hits limits and is much harder. Keep the database dumb.
That’s reasonable advice for growth stage companies aiming for huge user bases.
I work at an Alexa top10k, and our database server is quite small and sits at about 10% load most of the time.
You can grow huge with vertical scaling. Not all the way to “household name” size, but still truly huge. Servers with tens of TB of RAM are available by the hour.
> That’s reasonable advice for growth stage companies aiming for huge user bases.
Not really, that's the result of hard-earned lessons that pay off from day 1.
Think about it: you have data in a database, and you put up an API to allow access to it.
Your API needs change, and now you need to make available a resource with/without an extra attribute. You need to provide it as a new API version in parallel with the current API version you're supporting.
What will you do then? Reupdate the whole db with a new migration? Put up two databases? Or are you going to tweak the DTO mapper and be done with it?
> I use a database that was designed by someone who understands one sometimes wants to run a migration. It’s not hard to add new fields.
We're not talking about adding attributes to a database. We're talking about one of the most basic cases in API design: changing an interface and serving two or more versions.
If the mapping step between getting data from the db and outputting it in a response is expected to be eliminated by getting the DB to dump JSON, how is this issue solved?
And how exactly is a full blown db migration preferable to mapping and serializing a DTO?
The way you are supposed to use Postgrest or Postgraphile is to provide only views and stored procedures as the API. These views and stored procedures are layered over your data schema and should be the only way to query or change the data. This way you can make API changes without requiring migrations.
The point is that instead of having queries that return tables that you then map to DTOs for serialization, you have queries that return JSON directly.
Another way to accomplish the same thing, if you don't want views or stored procedures, is to use a library like jOOQ that does much the same thing in your Java middleware.
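As a rough sketch of what that looks like in practice (schema and column names invented), the API surface is a set of views, so a contract change becomes a new view rather than a data migration:

    create schema api_v1;
    create schema api_v2;

    create view api_v1.customer as
      select id, first_name, last_name
      from app.customer;

    -- v2 reshapes the same base table and returns the resource as JSON directly
    create view api_v2.customer as
      select id,
             json_build_object(
               'fullName', first_name || ' ' || last_name
             ) as body
      from app.customer;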
Or use an API layer such as GraphQL that doesn’t require versioning for such a minor change since additive changes (or deprecations) to the API do not affect existing queries - each query states exactly what it needs.
Backwards incompatible changes will eventually be required by the business - at least once that business is oriented such that technical concerns do not overrule all other concerns.
I worked on a stored-procedure driven finance engine once. There was an internal language to build procedures quicker. Swear to god I had errors running them through the SQL IDE because they were too large. Once had to debug a 1.6m line SQL file. These things can really get out of hand if people are left unchecked!
Just not sure if it was worth the effort upfront to learn something other than simple Express (node.js) servers/middleware functions/router controllers with database client access. That paradigm just feels infinitely more "in control" and "extensible" to me.
You might like http://postgraphile.org, which is "postgrest for graphql" but where you can override anything with normal Node code – it's just an express plugin. (The docs don't make it obvious how powerful/flexible the tool is). Of course, you can also use pl/v8, pl/pgsql, etc and any other advanced postgres feature as well.
Once the client is the MVC execution environment with the client-server interaction used mostly for data replication, plus some extra invokable server-side behaviours written in isomorphic code, we can congratulate ourselves on having more-or-less reinvented Lotus Notes.
> But as things get more complicated, where does the business logic live? Maybe in the database as stored procedures? It's less crazy than it sounds.
We do this today with SQLite, but we don't use stored procedures. Assuming you load all required facts into appropriate tables, there is no logical condition or projection of information that you cannot determine with SQL.
We also leverage user/application defined functions to further enhance our abilities throughout. It's pretty amazing how much damage you can do with SQLite when you bind things as basic as .NET's DateTime ToString() or TryParse() into it. There's also no reason you cannot bind application methods that have side effects if you want to start mutating the outside world with SQL.
Putting business logic into stored procedures is an old idea, and very widely used from what I can tell. Especially with Oracle's PL/SQL and SQL Server's T-SQL.
One huge advantage is that the DB has a cache that is automatically utilized, instead of having to build your own cache in an application layer.
The PostgreSQL userbase seems to be behind when it comes to utilizing stored procedures; they apparently weren't a priority for the dev team for a long time.
Other advantages:
- Round-trips avoided.
- Security.
- Data integrity.
The last two points rely on building a public API out of stored procedures / functions / views and disallowing direct access to base tables. That ensures no one can access anything without proper authorization, and no one can change the data except in accordance with what the stored procedures allow.
It's just a shame that procedural extensions of SQL (T-SQL, PL/SQL and brethren) seem stuck in the '80s when it comes to developer experience.
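A minimal sketch of that public-API pattern in Postgres (role, schema and table names are made up): the web role can only execute the function, never touch the base table.

    create role web_api login;
    grant usage on schema api to web_api;
    revoke all on app.accounts from web_api;

    -- security definer: the function runs with its owner's rights, so callers
    -- never need privileges on app.accounts itself
    create function api.transfer(_from bigint, _to bigint, _amount numeric)
    returns void
    language plpgsql
    security definer
    as $$
    begin
      if _amount <= 0 then
        raise exception 'amount must be positive';
      end if;
      update app.accounts set balance = balance - _amount where id = _from;
      update app.accounts set balance = balance + _amount where id = _to;
    end;
    $$;

    grant execute on function api.transfer(bigint, bigint, numeric) to web_api;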
I am doing this at work: I have an application with all the business logic in the database. It is a huge hassle though... I am versioning a bunch of scripts with git, but it really isn't convenient.
The thing is, a database already stores data, but all the schema and DDL stuff would be much more conveniently kept as descriptive plain text.
One thing I particularly dislike about SQL is DDL and all the system-specific extensions (e.g. for access control). I don't really get why there is a separate thing like "ALTER TABLE foo DROP COLUMN bar;" if you could have "DELETE FROM schema.foo WHERE name = 'bar';". 90 % of DDL is just duplicating basic DML functionality, Except For That Special Kind Of Object Which Totally Smells Like It's Relational But It's Not. Especially for access control - why is that just not a table? I'm sure it literally is one.
This makes SQL needlessly complex and tools to work with schemas far more complicated than they actually have to be.
In Lisp, most things are a list/pair. In SQL, most things should have been a table.
In many RDBMS this is in fact the case. Some take it up a notch, for example in postgresql even types are represented as a relation in the system catalog. However, a system schema is necessarily very close to the database’s architecture and capabilities, making it implementation specific; commonality of underlying structure doesn’t really emerge from relational algebra because it is a) primarily a query paradigm and b) SQL is only a very loose adherent. In practice, woe betide the DBA that writes “insert into pg_enum” or “insert into pg_database” without knowing all the extra stuff that CREATE <anything> actually does.
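To that point, the catalog really is queryable like any other set of tables; something like this (the table name is invented) lists a table's columns straight from pg_catalog, even though you'd still let DDL do the writing:

    select a.attname  as column_name,
           t.typname  as type_name,
           a.attnotnull
    from pg_attribute a
    join pg_class c on c.oid = a.attrelid
    join pg_type  t on t.oid = a.atttypid
    where c.relname = 'my_table'
      and a.attnum > 0
      and not a.attisdropped;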
The DDL is also implementation specific but there’s a much higher degree of practical commonality. When writing application db migrations, and especially when supporting multiple backend databases, I’d definitely rather compose/generate DDL than engage with the minutiae of each.
You might find some eminent agreement[1], though to be fair he was talking about relational DBMS, not SQL DBMS. See rules 0 and 1:
> Even table names, column names and domain names are represented as character strings in some tables. Tables containing such names are normally part of the built-in system catalog. The catalog is accordingly a relational data base itself — one that is dynamic and active and represents the metadata (data describing the rest of the data in the system).
Curious how far you’ve gone with this. You could have your CI/CD sync your stored procedures. But you would probably want to go further than that and manage them with a tool like db-migrate so you can keep your stored procedures tracking with your schema changes rather than just versioning them. At the end of the day, can you really separate your stored procedures from the data they operate on? I would say no you can’t.
Reminds me of a job back in the early 2000s which was Oracle PL/SQL stored procedures fronted by JSP with inline JDBC requests.
It was awful. Not intrinsically because of the architecture, but because of the piss-poor implementation. Piles of badly written PL/SQL that should have been on dailywtf, with even worse JSP. It was all written by IBM contractors and was obviously a garbage rush job.
Of course the response from mgmt and the internal dev team was to rewrite everything in the "modern" J2EE style, but keep the old database partially because large parts of the existing logic were just not understood well enough to rewrite safely. So the rot kind of just continued.
After I and others left I believe everything was finally rewritten from scratch in RoR or something.
> But as things get more complicated, where does the business logic live? Maybe in the database as stored procedures? It's less crazy than it sounds.
I've worked with a bunch of systems like this before. One was just a collection of PHP scripts that would trigger SQL queries. Another one was all lambdas and Cron jobs on top of mssql stored procedures.
If you have a decent team and you use version control as intensely as you would with code, I have no reason to believe this cannot work. To be fair, those conditions hold regardless of tech and architecture.
What you cannot do easily though is pivot, hire, and scale. This is what bit those teams I worked with and why those specific systems are no longer around.
I used to avoid things like stored procedures like the plague, because of my fear that crucial business logic would end up outside of version control.
These days I'm comfortable enough driving EVERY database change through a version controlled migration system (usually Django migrations) that I'm not concerned about this any more. It's not hard to enforce version controlled migrations for this kind of thing, provided you have a good migration system in place.
> One was just a collection of PHP scripts that would trigger SQL queries.
Yes, I've had to build a system like that (Java, but same diff.)
This model is popular in particular with banks, because they can split sensitive responsibilities in a way that makes sense to them.
It works fine if you have good communications between developers and DBAs. If you don't... well, I won't have to suggest finding another gig, you already want to.
This is just my experience but I've found that business logic usually ends up living before the query - most pages (whether read or update) follow a sort of pattern like...
1. Receive, route and validate (for form) the request
2. Validate the request for function (business rule validation - can such a user do such on a Tuesday and the like)
3. Compute business attributes around the form of the response or update
4. Execute the query against the database
5. Send the results to the user
I strongly agree that step 5 there doesn't really need to involve very much stuff happening outside of the DB. In our particular platform we have a very smooth rowset -> JSON translator in the application framework that includes support for mapping functions over the rows as they come out of the DB, so we pretty much stop executing application code as soon as we send a query to the DB. While we do still delay the actual JSON encoding until the application layer, it's thin, dumb and done in a streamed manner so that we don't have to hold all that junk in memory, and it comes with mode switches to either redirect the big DB blob to an HTML template, echo it as a CSV, or even run a PDF template to render it into what the user wants to see.
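Not the poster's framework, but the general shape of that final step in plain Postgres looks something like this (table and columns invented): the database hands back one JSON document and the application only streams it onward, or into an HTML/CSV/PDF template.

    -- $1 = customer id, bound by the app's db driver
    select coalesce(json_agg(row_to_json(o)), '[]'::json) as body
    from (
      select id, status, total
      from orders
      where customer_id = $1
      order by created_at desc
    ) o;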
The problem with this route is that as you scale everything breaks. You end up putting the events or activities table on DynamoDB or you have to store user presence in Redis or you have to vertically or horizontally shard and have to move all that business logic that was in your DB back into the application layer anyway.
Also, it's just harder to debug stored procedures when they do the wrong thing than it is with a typical problem in application code on a stack like Rails or Django.
Maybe this is why we can’t ever speak in generalities in this industry. I was going to say the exact opposite - almost all business logic I write comes _after_ the query is made. But I prefer a more DDD-like approach where logic is encoded in models, so there would be no way to do anything useful with raw data from a query.
I built pretty much the same thing for Django for a private project: HTML attributes to issue AJAX requests and update the DOM, coupled with middleware that would detect those requests and run the same Django views but extract only what was needed, making it an isomorphic app.
Thanks for the link. htmx looks really interesting - and the size is right too (1.3kb minified + gzipped). My biggest question would be how you make an offline capable site with this but I'm guessing service workers could take care of it.
I honestly think we don't do enough work on the backend these days; there's a trend for the client to get sent very raw data and to do a lot of processing on it just to render. If we can move more to the backend while still giving customers those instant SPA-style page transitions, it's a win in my books.
The application I work with has billions flow through it annually with hundreds of stored SQL procedures. It works well, but can be a bit difficult to follow at times. Everything is database driven, from the displays to the data in files exported for the number crunching apps, which then send result data back to the database.
Taking the concept to the extreme, the entire crt.sh site is served from postgres. It literally constructs the html in stored procedures, and uses some small apache httpd extensions to send requests to the db, and use the response (including letting the DB response dictate specific HTTP headers when required).
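Not crt.sh's actual code, but to give the flavour of it, a stored function can build the whole HTML body itself (schema, table and column names invented; a real version would HTML-escape its inputs):

    create function web.certificates_page(_domain text) returns text
    language sql stable as $$
      select '<html><body><h1>' || _domain || '</h1><ul>'
             || coalesce(string_agg(
                  '<li>' || c.serial_number || ' (' || c.not_after::date || ')</li>',
                  '' order by c.not_after desc), '')
             || '</ul></body></html>'
      from certificates c
      where c.domain = _domain;
    $$;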
Okay, yeah, we can serve UI over the wire. It's possible to make it work. But the experience will never be the same as a native application that's designed to handle offline functionality, interactivity and optimistic updates.
The Hey email client is a great example; hotwired.dev was built for Hey.
Guess what? It kind of sucks. It's buggy and slow. Randomly it stops working. When the internet goes down, random things work, random things don't work. If it weren't for Hey's unique features like the screener, I would much rather use a native app.
There's a ton we can do to make the developer experience of rendering on the client side better, but there's only so much we can do to make the user experience of serving UI over the wire better. When the wire breaks or slows down, the UI renderer stops working.
We built an internal tool for our team we call "restless", and it lets us write server side code in our project, and import it from the client side like a normal functional call, with full typescript inference on both ends. It's a dream. No thinking about JSON, just return your data from the function and use the fully typed result in your code.
We combine that with some tools using React.Suspense for data loading and our code looks like classic synchronous PHP rendering on the server, but we're rendering on the client so we can build much more interactive UIs by default. We don't need to worry about designing APIs, GraphQL, etc.
Of course, we still need to make sure that the data we return to the client is safe, so we can't use the active record approach of fetching all data for a row and sending that to the client. We built a SQL client that generates queries like the ones in the OP for us. As a result, our endpoints are shockingly snappy because we don't need to make round trip requests to the db server (solving the n+1 ORM problem)
And our tool takes this code and generates a single query to fetch exactly the data we need in one shot. These queries are easy to write because it's all fully typed as well.
> An alternative is to cut out the other middle man: JSON, ditch heavy front-end javascript and return HTML instead, going back to the hypermedia approach.
You are incorrectly assuming all JSON APIs are consumed by HTML sites on the same domain.
APIs are for machine consumption across the web.
Your solution of "just make static HTML" doesn't even begin to address the problem. Unless you're saying API consumers should go back to scraping HTML instead of having made for the purpose APIs.
REST over HTTP works well. You get easy load balancing, caching etc.
JSON seems to have been the message format of choice due to easy interoperability with the browser. However, with this, you end up writing serialisers on the server and deserialisers on the client. That's the majority of the job: working out what JSON fields you need to send down and, on the client, what JSON fields to read.
Using unstructured JSON has led to OpenAPI (previously Swagger). You put a bunch of metadata in your server API code to generate a loosely typed schema describing your service so clients can generate deserialisers etc. This is still unstructured JSON and seems like a kludge.
I've recently had success using Protobuf as the message format for HTTP APIs. I still get easy load balancing, CDN caching, all the benefits of HTTP, which I'd lose if I went gRPC; Protobuf has a smaller size than JSON, and adding Brotli compression reduces page sizes further.
The excellent thing about using Protobuf over HTTP is it does what OpenAPI attempts to do without the kludge. You publish your Protobuf schemas, the server generates strongly typed objects to use, the clients with Typescript generate strongly typed objects. There's no more serialising JSON or manually reading JSON. You get typed parsers generated from a schema definition, so you call `encode` or `decode` on them.
Protobuf works across a large number of languages. My server side code has become trivial, my client side code has become trivial. As a side effect, I spend less on network bandwidth, pay less in compute for de/serialisation, and clients see a speed increase due to reduced page load sizes. Protobufs also reduce the need for OpenAPI. gRPC entirely removes the need for OpenAPI/contract-based testing, but going full gRPC on the client side in the browser needs more setup with a special proxy, and you lose the load balancing/caching of HTTP.
TL;DR: stop using JSON for web APIs; use something such as Protobuf with a schema. JSON is the equivalent of running your app in debug mode.
The good reason being that JSON is trivial to parse in the browser and web apps exploded in use? It's a valid reason, but that doesn't mean something with a strong schema and a good user experience is bad.
There's nothing fundamentally wrong with XML + Schemas; it's just that they are a hassle to deal with, being verbose and more effort to generate/parse. XML still has a place, it's just forgotten and not cool anymore. Personally I like XML: maybe not as a web app message format, but its properties still hold today.
Protobuf / Thrift etc. are probably an evolution of XML + Schemas on the web using a binary format. They are less verbose, and JavaScript isn't the JavaScript of 15 years ago, so they're also easy to work with in the browser.
It seems my original comment is downvoted, which is fair enough. I've been generally surprised at how much better an experience all round it has been switching to Protobuf instead of using JSON. My code is easier, there's less of it, it's more performant, it's easy to share across the backend/frontend layers, and it has a long list of languages it can be used from with zero effort.
I was a bit pessimistic about it, but I had an API I needed to save bandwidth on for end users, so I tried switching from JSON as an experiment. I expected it to be a hassle but it turned into the complete opposite. For any REST service I create going forward, JSON won't be my first choice. Protobuf is simpler to work with in the browser (no manual parsing/typing the wrong field) and server side, and it's also far more efficient.
Clients get evolution support and can download prebuilt parsers/deserialisers or create their own from published schema definitions. I see it as a win-win all round with no drawbacks.
> Yeah. This is the problem: we've abandoned the hypermedia architecture of the web for a dumb-data, RPC model.
I'm not sure that's the issue at all.
The article complains about boilerplate code used to convert results to JSON. To me those complaints make absolutely no sense at all, because that boilerplate is used for two things: validate data, and ensure that an interface adheres to its contract.
It makes absolutely no sense at all to just dump JSON from the outside world into a db without validating or mapping. And vice versa. That's what the so-called boilerplate does.
What's more astonishing is that nowadays the so called boilerplate is pretty close to zero, thanks to the popularity of mapping and validation frameworks. Take .NET Core for example, and modules like AutoMapper and FluentValidation. They map and project and validate all the DTOs with barely any noticeable code. Why is this a problem?
I've implemented a large system managing billions of records using exactly this approach of cutting out a lot of boilerplate in the application layer.
The most important thing is to be pragmatic. This approach works great for CRUD as well as many types of complex analytical queries. However, for cases where the complexity or performance of the SQL was unacceptable, we also decoded data into application data structures for further processing.
When done well, you can get the best of both worlds as needed.
It is a really, really bad idea to put business logic in your database and there are people in this thread downvoting everyone who points it out. Please do not do this in production - I've had to maintain systems that married business logic and database and it was a nightmare to maintain and cost a fortune to fix.
> It is a really, really bad idea to put business logic in your database and there are people in this thread downvoting everyone who points it out
Your database is your business logic. From constraints to the queries you write, it's all business logic.
Regardless, you misunderstood the article. This is about moving formatting into the db layer. Letting Postgres return JSON, as opposed to middleware code to translate rows to JSON.
Which is like the opposite of business logic code.
Like everything, it depends. Some business logic like "this field should never be negative" is probably fine.
More complex logic is a lot harder to maintain. You don't want to stress your DB's CPU if you can avoid it. You still need some kind of connection pooler anyway. There's a lot of reasons it doesn't make sense to put all your business logic in the DB. SQL is not the easiest language to maintain (not enough abstraction).
IMO the line to draw is just enough to keep data sanity.
I like this distinction from wikipedia [1] "Anything that is a process or procedure is business logic, and anything that is neither a process nor a procedure is a business rule."
Meaning constraints and the model represent business rules.
Why are they less maintainable? It’s just code you can write in a code editor, version control in a versioning system, have automatic tests for, have automatic deployment scripts for, etc. In the end the database is just an application server executing your code close to the data. Only thing you have to ask yourself is how much your code needs to scale. But it can be much more efficient to run business logic in the database as stored procedures.
I personally have found it's much more difficult to partially mock out concepts when writing logic in the DB layer, though my tooling may have been the weak point. In a nice application language I can easily isolate specific units of code for direct testing, while every test I write over complex database logic feels like composing a symphony where I need to make sure the horns are coming in on the right pitch and the wind section isn't too early, just to properly simulate a relatively easy error scenario.
Since the application layer is much more naturally aligned with small well isolated chunks of logic it is easier to minimize the volume of code and logic in scope when testing a particular attribute - when that goes into the DB I've always seen things get more complex rapidly.
Using a migration utility like ActiveRecord, Flyway, or go-pg, SQL definitions can also live in version control, have automatic tests, and deploy automatically. Who has a database in production without these things?
That's perhaps true of stored procedures, but SQL views, constraints, and triggers written in a declarative manner are far more maintainable and far less buggy than imperative application code, in my experience.
I have no words for "triggers" and "maintainable" in the same sentence - enjoy your deadlocks and having no clue why a single write says "UPDATED x records" y times without having to dig down through an ancient UI.
I am actually "good" at this stuff - writing logic in SQL through triggers, which call functions (at which SQL Server is way better than PG is) and stored procs, etc, and it's just a horrible, horrible practice.
Business logic goes into the code, data goes into the database.
Sometimes business logic is implemented as rules, in which case the rules (configuration) can go into either configuration files or a database. But that doesn't make it data...
Data is just crystallized business logic. Anemic databases aren’t a big deal when the database is just a persistence layer for a single app but, when you’re rewriting the application code or allowing BI queries directly against the database, designing a database as a service whose interface is stored procedures starts making data integrity much easier.
Once you start having multiple data sources, business logic implemented in application code might be easier to reuse than business logic implemented in one of the data sources. For example: start with Postgres, then add ClickHouse to the mix. It might be easier to let the application handle access to both DBs, rather than write RLS policies in Postgres and then try to apply them to ClickHouse.
I think this is slightly reductive. The majority of your business logic, I agree, should reside in application code, but lightweight business logic in the form of database constraints are powerful at disallowing bad data. For example, you may have a rental booking application that shouldn't have overlapping date ranges. The database can give you this guarantee out of the box and will work for ALL clients and applications.
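That rental-booking guarantee is a nice concrete case; in Postgres it can be expressed as an exclusion constraint (table and column names invented here), so overlapping stays for the same room are rejected no matter which client writes the row:

    -- btree_gist lets the plain room_id equality take part in the GiST constraint
    create extension if not exists btree_gist;

    create table booking (
      id      bigserial primary key,
      room_id bigint not null,
      stay    daterange not null,
      exclude using gist (room_id with =, stay with &&)
    );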
The boundary between business logic and database validation logic is a fuzzy one. At one extreme everything goes in stored procedures inside an RDBMS; at the other extreme you write everything on top of a no-frills key-value store.
I would say yes, 10 years ago. Nowadays, though, with the way Postgres is developing, with support for Python/V8 and with PostgREST, it just makes sense to do DB-intensive stuff in a database like Postgres.
how old were the systems? And have you ever had to maintain systems that were that many years old that weren't a nightmare to maintain and cost a fortune to fix?
The obligatory comment to these posts is always "check out PostgREST". We use it for all frontend APIs. Combined with Node or any other middleware, we add additional code if everything can't be done in plpgsql, but requests are usually just passed straight through Node.
"Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should."
While interesting, the advice offered in this post is generally bad, or at least incomplete. OK, so, you "cut out the middle tier"... what now? Are web clients connecting to Postgres directly? Will Postgres handle connection authentication and request authorization? Logging that a particular user looked at particular data? Can the client cache this data? If so, for how long? Even taking the author's premise, this is not a good pattern: a simple JSON GET request still comes with a bunch of other stuff that this doesn't bother to address.
But the premise is wrong - few applications just spit out JSON like this, they also have to handle updates, and there's business logic involved in that. Data needs to be structured in a way that's reasonable for clients to consume, which isn't always a row of data (maybe it should be collapsing all values in a column, so that in the JSON the client just gets a JSON array of the values).
The performance was a big enough improvement that, had we been bottlenecked on this particular operation, we would have shipped this (I'm afraid I don't have notes on benchmarks).
We only decided not to ship it because of concerns about the ongoing maintenance overhead, since we had other Python code generating GeoJSON that we would have still used - so we would have ended up maintaining multiple implementations.
The more things change, the more they stay the same.
If we lived in the XHTML 2.0 universe instead of the HTML5 universe, then that would be the way you do things. Not with SQL and JSON, mind you, but with XML everywhere.
I had a glimpse into that world and it looked pretty good at the time. XML databases + XQuery can produce whatever you want as long as it's XML. And XML could be many things. Many horrible things too, made by people with arms growing out of places they shouldn't.
If you are using Rails and ActiveModelSerializers, I wrote a gem to take your serializer definition and build the whole JSON result in Postgres: [0] It hasn't gotten much use, but I know at least one company that has used it to get 100x performance boosts on #index endpoints.
It generates jsonapi right now, since that is standard for Ember projects, but it'd be pretty easy to add an adapter for some other format.
I've done this manually as an optimization on particularly hairy (and large) reports, retrieving a JSON string from the postgres query, then interpolating it into a boilerplate string response.
In a few, particularly pathological cases (that nonetheless were fairly common to run into in prod), the response time of the endpoint went from literal minutes to a few hundred milliseconds on a test server with unlimited response timeout (prod would have timed out after 60s). Most of the performance increase came from not requiring Ruby to parse strings into fat objects, then turn it back into a long string to go out the wire.
I made a side project to experiment with this approach (and rust) a few years ago. It worked pretty well but it made simple queries more complex[1]. I was a little bit worried that complex queries would become a lot more complex so I'm not sure how well this approach would scale.
It removed a crazy amount of boilerplate code [2] though and made the project much more focused on the actual data which is always a good thing.
Ever since, I've been very curious to try edgedb [3] since it promises to do similar things without making the queries more complex.
I think it’s more common to switch application language rather than the Database. I’m not implying that it’s not done but I suspect it occurs less often. Unless you’re going from SQL to NoSQL or vice versa.
Databases are providing greater native functionalities which will reduce our efforts in writing boilerplate code.
But the underlying SQL gets complex and it gets harder to maintain high-performance code - or hire good engineers.
This has made me interested in EdgeDB [1]. They are abstracting the high-performance SQL bit from the engineer by introducing a new query language, EdgeQL [2].
You can think of EdgeQL and EdgeDB as the next step in abstraction, just like how we have many programming languages that did just that for machine code and beyond. You may be able to write assembly by hand aiming for high performance, but it's tiresome and may not actually deliver it. Also, a better UX for the engineer (e.g. Kotlin on the JVM) improves time to market and is overall better for the ecosystem.
Shameless plug: I made bson-to-json for more or less the same purpose with MongoDB: converting the raw BSON response from the DB directly to JSON strings. Saves lots of CPU cycles and GC pressure.
Is there a strict implementation of that in JS? I've looked and there seems to be a bunch of libraries doing JSONPath, but none that are strict implementation of PGs variant. I'd like to be able to use it for previews etc.
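For reference, Postgres's own jsonpath (Postgres 12+) distinguishes lax and strict modes on the SQL side, which is the behaviour a strict JS port would need to match; a couple of illustrative calls:

    -- lax (the default): structural mismatches are smoothed over
    select jsonb_path_query(
      '{"items": [{"price": 10}, {"price": 20}]}'::jsonb,
      'lax $.items[*].price ? (@ > 15)');   -- returns 20

    -- strict: the same wildcard on a non-array raises an error instead of
    -- being auto-wrapped the way lax mode would
    select jsonb_path_query(
      '{"items": {"price": 10}}'::jsonb,
      'strict $.items[*].price');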
To me, the thing that excites me about this is more efficient ORM querying of data. Previously you either had to make multiple queries to fill parent and child relationships (whether on demand or by explicitly prefilling a cache), or make a larger join where a lot of data may be duplicated in 1:N relationships, making it inefficient (a few levels of chained 1:N relationships leads to extremely inefficient data transfer). Now you could actually just get back a JSON data structure with exactly what each field should be. There are inefficiencies at scale here as well with duplicated key names, but that can possibly be worked around to a degree with arrays and an array-type index of some sort.
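As a sketch of that (table and column names invented), nested aggregation hands back the parent rows with their children already embedded, in a single round trip and without duplicating parent data:

    select json_agg(
             json_build_object(
               'id',    a.id,
               'name',  a.name,
               'books', coalesce(bk.books, '[]'::json)
             )
           ) as authors
    from authors a
    left join lateral (
      select json_agg(
               json_build_object('id', b.id, 'title', b.title)
               order by b.title) as books
      from books b
      where b.author_id = a.id
    ) bk on true;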
I have a question about performance that I have often wondered about.
Discussions like these encourage me to put more work into the DB layer however only recently have SQL databases like PostgreSQL become horizontally scalable (I know read replicas are a thing and maybe they are the answer).
So why would I put more work and effort into the less scalable data storage over the horizontally scalable, stateless app layer?
You wouldn’t. If you’re planning on doing anything with a large user count(millions+), putting it all in the db is guaranteed terrible pain later on. Replicas are not the answer.
But if you’re doing a LoB app for hundreds of users you get away with it. (I still wouldn’t.)
I did the approach with nesting jsonb_agg queries; it worked for me but was a bit hacky and sometimes complex. I hoped for Graph Query Language (the one supposed to be part of the SQL standard) to make it easier to retrieve structured JSON from flat tables, but it doesn't look ready yet.
Does it scale well? I feel like doing all business logic in procedures would require the company to use one huge database in order to intersect various domains.
Also exposes the underlying dataclasses as graphql types and capable of converting them to django models via decorators. I've tested that basic migrations work.
The schema export as json in the original article can be used to generate the schema for fquery.
It's a bad idea and I'd like to see how people actually test using this model. Now you have to run PG locally every time you need to unit test something? Also PG returns some JSON, but if the object is not exactly what you want to send back, do you create another object and merge the two? Just don't do it, it's a terrible design. It "works" for simple APIs but for anything serious this is wrong.
You misunderstood the entire article. This is about formatting the data from Postgres. It doesn't change anything about your testing. If you're mocking data, you mock the JSON instead of raw rows. If you were doing integration tests, nothing changes anyway.
> Also PG returns some JSON but if the object is not exactly what you want to send back you create another object and merge the two
Sure. Literally nothing changes. Just as if you had to assemble a JSON blob from two Postgres queries and merge them.
> Just don't do it, it's a terrible design. It "works" for simple API but for anything serious this is wrong
I think you completely misunderstood the article. Returning a JSON object from a database query is not "wrong", and I've used this pattern with great success. It's so much easier to call a json_agg to turn a subquery into an array, than to try and manually parse out rows yourself in the middle layer.
Please just no, do not do this. Even the dumbest most simple system there is will evolve beyond simple CRUD access to tables. The second you put your schema on the wire, that schema can now never change. That's just such an insane commitment to make.
The application layer is typically just a view that maps the data layer to the expectations of the view layer. If you are set on eliminating the application layer surely you could just implement that view in the data layer using... a view.
the database schema is an API like everything else, and you can solve API evolution problems the same way as you would with a web API, using eg. views, stored procedures and versioning.
There's no reason not to expose parts of your database schema directly to users as long as you treat it like you would any other API you provide.
this is a really interesting way to think about it and something that I've given a fair amount of thought to as we embark on a data mesh journey - how does a domain team serve up a schema like an API, how does it evolve, and how do we think about versions.
I find that there are very few things related to programming that don't constitute an interface in some way, so it feels natural to me to think about everything in terms of the interface it presents to users, both explicitly in what you can express via whatever language you're using, and implicitly via conditions that can't be enforced but are required for correctness.
And when you have an interface, you really ought to put some thought into its design regardless of how "private" it is.
An API is an interface used by a program in some way, and any user-exposed database objects definitely qualify under that definition.
that's what a Schema Registry is for... protobuf is a little more polished around managing forwards and backwards compatibility than avro but they both work.
> The second you put your schema on the wire, that schema can now never change. That's just such an insane commitment to make.
If you change your schema, you can just write a sql expression to convert between schemas 'at runtime'. Like if you have a table with first name and last name and then you decide you really just need the name on the frontend, just change your SELECT statement to concat the two columns, or vice versa.
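A concrete sketch of that (schema and table names invented): if what's exposed is a view, the SELECT does the conversion and the wire format survives the rename.

    create or replace view api.person as
      select id,
             first_name || ' ' || last_name as name
      from app.person;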
[1] - https://htmx.org (my library)
[2] - https://unpoly.com
[3] - https://hotwired.dev