> Too often, web tiers are full of boilerplate that does nothing except convert a result set into JSON.
Yeah. This is the problem: we've abandoned the hypermedia architecture of the web for a dumb-data, RPC model. I suppose if you are going to go that direction and things are simple enough, you can jam it in the DB and get smoking perf.
But as things get more complicated, where does the business logic live? Maybe in the database as stored procedures? It's less crazy than it sounds.
An alternative is to cut out the other middleman, JSON: ditch heavy front-end JavaScript and return HTML instead, going back to the hypermedia approach. The big problem with that is a step back in UX functionality, but there are tools[1][2][3] for addressing that.
I did this for the very first external web app built for Bankers Trust Company back in the day. SQL Server back end with (classic) ASP on the front end. Even if someone had gained access to the web server, they wouldn't have had access to anything they shouldn't have. I built web apps, an online store, and a learning/video website using the same tech around the same time and they worked without issue for well over a decade. So, there is something to be said for this approach, though I wouldn't build something like this today.
A place I worked at had an ASP app in vbscript + one stored proc that dynamically executed raw sql strings from a database table. It worked so much better than it ever should've.
> I did this for the very first external web app built for Bankers Trust Company back in the day. SQL Server back end with (classic) ASP on the front end.
What, "(classic) ASP on the front end"? I thought ASP was a server technology; wasn't your actual front end, like, Internet Explorer?
Anyway, I did that too, twenty years ago, using a couple of Oracle PL/SQL packages. Weird dual-language meta-programming, writing SQL that spits out JavaScript that in turn calls the next PL/SQL-generated page...
I still believe the stored procedure route has merit. It's harder since most available libraries are built around the idea that the frontend has full access to the database. A frontend in (insert favorite frontend lang/framework here) and a backend in Postgres using stored procedures could work quite well.
Unfortunately, the 'meta' is all about frontends with full backend access, so that's what I'm learning alongside other stuff.
Not sure if you've looked into either of postgrest or postgraphile, but I feel like those tools may be right up your alley. Rather than have your app layer generate SQL via an ORM, these tools instead generate your app layer based on your database schema, functions, etc. Postgraphile (generates graphql APIs) is particularly friendly towards frontend-heavy web apps.
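To make that concrete, here's a minimal sketch (schema, table and function names are invented) of the kind of database objects PostgREST or Postgraphile turn into endpoints, with no hand-written app-layer code:

    -- Hypothetical api/app schemas: PostgREST would expose these as
    -- GET /active_users and POST /rpc/register_user.
    create view api.active_users as
      select id, name, email
      from app.users
      where deleted_at is null;

    create function api.register_user(_name text, _email text)
    returns app.users
    language sql
    as $$
      insert into app.users (name, email)
      values (_name, _email)
      returning *;
    $$;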
I meant more that I wouldn't use VB/ASP and SQL Server.
The company I currently work for still actively uses classic ASP in production and they still actively develop that product. Currently trying to get away asap, because I don't wanna touch spaghetti vbscript from '99 with a ten foot pole.
Lots of it is boilerplate, but eventually you'll need more logic, then some CPU-heavy tasks, then a memory-heavy task, then a task suited to a specific language, then a task that needs a GPU, then a distributed memory cache.
Web servers are easy to horizontally scale. Even minute to minute if needed. Scaling your database hits limits and is much harder. Keep the database dumb.
That’s reasonable advice for growth stage companies aiming for huge user bases.
I work at an Alexa top10k, and our database server is quite small and sits at about 10% load most of the time.
You can grow huge with vertical scaling. Not all the way to “household name” size, but still truly huge. Servers with tens of TB of RAM are available by the hour.
> That’s reasonable advice for growth stage companies aiming for huge user bases.
Not really, that's the result of hard-earned lessons that pay off from day 1.
Think about it: you have data in a database, and you put up an API to allow access to it.
Your API needs change, and now you need to make available a resource with/without an extra attribute. You need to provide it as a new API version in parallel with the current API version you're supporting.
What will you do then? Reupdate the whole db with a new migration? Put up two databases? Or are you going to tweak the DTO mapper and be done with it?
> I use a database that was designed by someone who understands one sometimes wants to run a migration. It’s not hard to add new fields.
We're not talking about adding attributes to a database. We're talking about one of the most basic cases in API design: changing an interface and serving two or more versions.
If the mapping step between getting data from the db and outputting it in a response is expected to be eliminated by getting the DB to dump JSON, how is this issue solved?
And how exactly is a full blown db migration preferable to mapping and serializing a DTO?
The way you are supposed to use Postgrest or Postgraphile is to provide only views and stored procedures as the API. These views and stored procedures are layered over your data schema and should be the only way to query or change the data. This way you can make API changes without requiring migrations.
The point is that instead of having queries that return tables that you then map to DTOs for serialization, you have queries that return JSON directly.
Another way to accomplish the same thing, if you don't want views or stored procedures, is to use a library like jOOQ that does much the same thing in your Java middleware.
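As a rough sketch of what that looks like in practice (schema and column names invented), the API surface is a set of views, so a contract change becomes a new view rather than a data migration:

    create schema api_v1;
    create schema api_v2;

    create view api_v1.customer as
      select id, first_name, last_name
      from app.customer;

    -- v2 reshapes the same base table and returns the resource as JSON directly
    create view api_v2.customer as
      select id,
             json_build_object(
               'fullName', first_name || ' ' || last_name
             ) as body
      from app.customer;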
Or use an API layer such as GraphQL that doesn’t require versioning for such a minor change since additive changes (or deprecations) to the API do not affect existing queries - each query states exactly what it needs.
Backwards incompatible changes will eventually be required by the business - at least once that business is oriented such that technical concerns do not overrule all other concerns.
I worked on a stored-procedure driven finance engine once. There was an internal language to build procedures quicker. Swear to god I had errors running them through the SQL IDE because they were too large. Once had to debug a 1.6m line SQL file. These things can really get out of hand if people are left unchecked!
Just not sure if it was worth the effort upfront to learn something other than simple Express (node.js) servers/middleware functions/router controllers with database client access. That paradigm just feels infinitely more "in control" and "extensible" to me.
You might like http://postgraphile.org, which is "postgrest for graphql" but where you can override anything with normal Node code – it's just an express plugin. (The docs don't make it obvious how powerful/flexible the tool is). Of course, you can also use pl/v8, pl/pgsql, etc and any other advanced postgres feature as well.
Once the client is the MVC execution environment with the client-server interaction used mostly for data replication, plus some extra invokable server-side behaviours written in isomorphic code, we can congratulate ourselves on having more-or-less reinvented Lotus Notes.
> But as things get more complicated, where does the business logic live? Maybe in the database as stored procedures? It's less crazy than it sounds.
We do this today with SQLite, but we don't use stored procedures. Assuming you load all required facts into appropriate tables, there is no logical condition or projection of information that you cannot determine with SQL.
We also leverage user/application defined functions to further enhance our abilities throughout. It's pretty amazing how much damage you can do with SQLite when you bind things as basic as .NET's DateTime ToString() or TryParse() into it. There's also no reason you cannot bind application methods that have side effects if you want to start mutating the outside world with SQL.
Putting business logic into stored procedures is an old idea, and very widely used from what I can tell. Especially with Oracle's PL/SQL and SQL Server's T-SQL.
One huge advantage is that the DB has a cache that is automatically utilized, instead of having to build your own cache in an application layer.
The PostgreSQL userbase seems to be behind when it comes to utilizing stored procedures; they apparently weren't a priority for the dev team for a long time.
Other advantages:
- Round-trips avoided.
- Security.
- Data integrity.
The last two points rely on building a public API out of stored procedures / functions / views and disallowing direct access to base tables. That ensures no one can access anything without proper authorization, and no one can change the data except in accordance with what the stored procedures allow.
It's just a shame that procedural extensions of SQL (T-SQL, PL/SQL and brethren) seem stuck in the '80s when it comes to developer experience.
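A minimal sketch of that public-API pattern in Postgres (role, schema and table names are made up): the web role can only execute the function, never touch the base table.

    create role web_api login;
    grant usage on schema api to web_api;
    revoke all on app.accounts from web_api;

    -- security definer: the function runs with its owner's rights, so callers
    -- never need privileges on app.accounts itself
    create function api.transfer(_from bigint, _to bigint, _amount numeric)
    returns void
    language plpgsql
    security definer
    as $$
    begin
      if _amount <= 0 then
        raise exception 'amount must be positive';
      end if;
      update app.accounts set balance = balance - _amount where id = _from;
      update app.accounts set balance = balance + _amount where id = _to;
    end;
    $$;

    grant execute on function api.transfer(bigint, bigint, numeric) to web_api;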
I am doing this at work: I have an application with all the business logic in the database. It is a huge hassle though... I am versioning a bunch of scripts with git, but it really isn't convenient.
The thing is, a database already stores data, but all the schema and DDL stuff would be much more conveniently kept as descriptive plain text.
One thing I particularly dislike about SQL is DDL and all the system-specific extensions (e.g. for access control). I don't really get why there is a separate thing like "ALTER TABLE foo DROP COLUMN bar;" if you could have "DELETE FROM schema.foo WHERE name = 'bar';". 90 % of DDL is just duplicating basic DML functionality, Except For That Special Kind Of Object Which Totally Smells Like It's Relational But It's Not. Especially for access control - why is that just not a table? I'm sure it literally is one.
This makes SQL needlessly complex and tools to work with schemas far more complicated than they actually have to be.
In Lisp, most things are a list/pair. In SQL, most things should have been a table.
In many RDBMS this is in fact the case. Some take it up a notch, for example in postgresql even types are represented as a relation in the system catalog. However, a system schema is necessarily very close to the database’s architecture and capabilities, making it implementation specific; commonality of underlying structure doesn’t really emerge from relational algebra because it is a) primarily a query paradigm and b) SQL is only a very loose adherent. In practice, woe betide the DBA that writes “insert into pg_enum” or “insert into pg_database” without knowing all the extra stuff that CREATE <anything> actually does.
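To that point, the catalog really is queryable like any other set of tables; something like this (the table name is invented) lists a table's columns straight from pg_catalog, even though you'd still let DDL do the writing:

    select a.attname  as column_name,
           t.typname  as type_name,
           a.attnotnull
    from pg_attribute a
    join pg_class c on c.oid = a.attrelid
    join pg_type  t on t.oid = a.atttypid
    where c.relname = 'my_table'
      and a.attnum > 0
      and not a.attisdropped;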
The DDL is also implementation specific but there’s a much higher degree of practical commonality. When writing application db migrations, and especially when supporting multiple backend databases, I’d definitely rather compose/generate DDL than engage with the minutiae of each.
You might find some eminent agreement[1], though to be fair he was talking about relational DBMS, not SQL DBMS. See rules 0 and 1:
> Even table names, column names and domain names are represented as character strings in some tables. Tables containing such names are normally part of the built-in system catalog. The catalog is accordingly a relational data base itself — one that is dynamic and active and represents the metadata (data describing the rest of the data in the system).
Curious how far you’ve gone with this. You could have your CI/CD sync your stored procedures. But you would probably want to go further than that and manage them with a tool like db-migrate so you can keep your stored procedures tracking with your schema changes rather than just versioning them. At the end of the day, can you really separate your stored procedures from the data they operate on? I would say no you can’t.
Reminds me of a job back in the early 2000s which was Oracle PL/SQL stored procedures fronted by JSP with inline JDBC requests.
It was awful. Not intrinsically because of the architecture, but because of the piss-poor implementation. Piles of badly written PL/SQL that should have been on dailywtf, with even worse JSP. It was all written by IBM contractors and was obviously a garbage rush job.
Of course the response from mgmt and the internal dev team was to rewrite everything in the "modern" J2EE style, but keep the old database partially because large parts of the existing logic were just not understood well enough to rewrite safely. So the rot kind of just continued.
After I and others left I believe everything was finally rewritten from scratch in RoR or something.
> But as things get more complicated, where does the business logic live? Maybe in the database as stored procedures? It's less crazy than it sounds.
I've worked with a bunch of systems like this before. One was just a collection of PHP scripts that would trigger SQL queries. Another one was all lambdas and Cron jobs on top of mssql stored procedures.
If you have a decent team and you use version control as intensely as you would with code, I have no reason to believe this cannot work. To be fair, those conditions hold regardless of tech and architecture.
What you cannot do easily though is pivot, hire, and scale. This is what bit those teams I worked with and why those specific systems are no longer around.
I used to avoid things like stored procedures like the plague, because of my fear that crucial business logic would end up outside of version control.
These days I'm comfortable enough driving EVERY database change through a version controlled migration system (usually Django migrations) that I'm not concerned about this any more. It's not hard to enforce version controlled migrations for this kind of thing, provided you have a good migration system in place.
> One was just a collection of PHP scripts that would trigger SQL queries.
Yes, I've had to build a system like that (Java, but same diff.)
This model is popular in particular with banks, because they can split sensitive responsibilities in a way that makes sense to them.
It works fine if you have good communications between developers and DBAs. If you don't... well, I won't have to suggest finding another gig, you already want to.
This is just my experience but I've found that business logic usually ends up living before the query - most pages (whether read or update) follow a sort of pattern like...
1. Receive, route and validate (for form) the request
2. Validate the request for function (business rule validation - can such a user do such on a Tuesday and the like)
3. Compute business attributes around the form of the response or update
4. Execute the query against the database
5. Send the results to the user
I strongly agree that step 5 there doesn't really need to involve very much stuff happening outside of the DB. In our particular platform we have a very smooth rowset -> JSON translator in the application framework that includes support for mapping functions over the rows as they come out of the DB, so we pretty much stop executing application code as soon as we send a query to the DB. While we do still delay the actual JSON encoding until the application layer, it's thin, dumb and done in a streamed manner so that we don't have to hold all that junk in memory, and it comes with mode switches to either redirect the big DB blob to an HTML template, echo it as a CSV, or even run a PDF template to render it into what the user wants to see.
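Not the poster's framework, but the general shape of that final step in plain Postgres looks something like this (table and columns invented): the database hands back one JSON document and the application only streams it onward, or into an HTML/CSV/PDF template.

    -- $1 = customer id, bound by the app's db driver
    select coalesce(json_agg(row_to_json(o)), '[]'::json) as body
    from (
      select id, status, total
      from orders
      where customer_id = $1
      order by created_at desc
    ) o;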
The problem with this route is that as you scale everything breaks. You end up putting the events or activities table on DynamoDB or you have to store user presence in Redis or you have to vertically or horizontally shard and have to move all that business logic that was in your DB back into the application layer anyway.
Also, it's just harder to debug stored procedures when they do the wrong thing than it is with a typical problem in application code on a stack like Rails or Django.
Maybe this is why we can’t ever speak in generalities in this industry. I was going to say the exact opposite - almost all business logic I write comes _after_ the query is made. But I prefer a more DDD-like approach where logic is encoded in models, so there would be no way to do anything useful with raw data from a query.
I built pretty much the same thing for Django for a private project: HTML attributes to issue AJAX requests and update the DOM, coupled with middleware that would detect those requests and run the same Django views but extract only what was needed, making it an isomorphic app.
Thanks for the link. htmx looks really interesting - and the size is right too (1.3kb minified + gzipped). My biggest question would be how you make an offline capable site with this but I'm guessing service workers could take care of it.
I honestly think we don't do enough work on the backend these days; there's a trend for the client to get sent very raw data and to do a lot of processing on it just to render. If we can move more to the backend while still giving customers those instant SPA-style page transitions, it's a win in my books.
The application I work with has billions flow through it annually with hundreds of stored SQL procedures. It works well, but can be a bit difficult to follow at times. Everything is database driven, from the displays to the data in files exported for the number crunching apps, which then send result data back to the database.
Taking the concept to the extreme, the entire crt.sh site is served from postgres. It literally constructs the html in stored procedures, and uses some small apache httpd extensions to send requests to the db, and use the response (including letting the DB response dictate specific HTTP headers when required).
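Not crt.sh's actual code, but to give the flavour of it, a stored function can build the whole HTML body itself (schema, table and column names invented; a real version would HTML-escape its inputs):

    create function web.certificates_page(_domain text) returns text
    language sql stable as $$
      select '<html><body><h1>' || _domain || '</h1><ul>'
             || coalesce(string_agg(
                  '<li>' || c.serial_number || ' (' || c.not_after::date || ')</li>',
                  '' order by c.not_after desc), '')
             || '</ul></body></html>'
      from certificates c
      where c.domain = _domain;
    $$;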
Okay, yeah, we can serve UI over the wire. It's possible to make it work. But the experience will never be the same as a native application that's designed to handle offline functionality, interactivity and optimistic updates.
The Hey email client is a great example; hotwired.dev was built for Hey.
Guess what? It kind of sucks. It's buggy and slow. Randomly it stops working. When the internet goes down, random things work, random things don't work. If it weren't for Hey's unique features like the screener, I would much rather use a native app.
There's a ton we can do to make the developer experience of rendering on the client side better, but there's only so much we can do to make the user experience of serving UI over the wire better. When the wire breaks or slows down, the UI renderer stops working.
We built an internal tool for our team we call "restless", and it lets us write server side code in our project, and import it from the client side like a normal functional call, with full typescript inference on both ends. It's a dream. No thinking about JSON, just return your data from the function and use the fully typed result in your code.
We combine that with some tools using React.Suspense for data loading and our code looks like classic synchronous PHP rendering on the server, but we're rendering on the client so we can build much more interactive UIs by default. We don't need to worry about designing APIs, GraphQL, etc.
Of course, we still need to make sure that the data we return to the client is safe, so we can't use the active record approach of fetching all data for a row and sending that to the client. We built a SQL client that generates queries like the ones in the OP for us. As a result, our endpoints are shockingly snappy because we don't need to make round trip requests to the db server (solving the n+1 ORM problem)
And our tool takes this code and generates a single query to fetch exactly the data we need in one shot. These queries are easy to write because it's all fully typed as well.
> An alternative is to cut out the other middle man: JSON, ditch heavy front-end javascript and return HTML instead, going back to the hypermedia approach.
You are incorrectly assuming all JSON APIs are consumed by HTML sites on the same domain.
APIs are for machine consumption across the web.
Your solution of "just make static HTML" doesn't even begin to address the problem. Unless you're saying API consumers should go back to scraping HTML instead of having made for the purpose APIs.
REST over HTTP works well. You get easy load balancing, caching etc.
JSON seems to have been the message format of choice due to easy interoperability with the browser. However, with this, you end up writing serialisers on the server and deserialisers on the client. That's the majority of the job: working out what JSON fields you need to send down and, on the client, what JSON fields to read.
Using unstructured JSON has led to OpenAPI (previously Swagger). You put a bunch of metadata in your server API code to generate a loosely typed schema describing your service so clients can generate deserialisers etc. This is still unstructured JSON and seems like a kludge.
I've recently had success using Protobuf as the message format for HTTP APIs. I still get easy load balancing, CDN caching, all the benefits of HTTP, which I'd lose if I went gRPC; Protobuf has a smaller size than JSON, and adding Brotli compression reduces page sizes further.
The excellent thing about using Protobuf over HTTP is it does what OpenAPI attempts to do without the kludge. You publish your Protobuf schemas, the server generates strongly typed objects to use, the clients with Typescript generate strongly typed objects. There's no more serialising JSON or manually reading JSON. You get typed parsers generated from a schema definition, so you call `encode` or `decode` on them.
Protobuf works across a large number of languages. My server side code has become trivial, my client side code has become trivial. As a side effect, I spend less on network bandwidth, pay less in compute for de/serialisation, and clients see a speed increase due to reduced page load sizes. Protobufs also reduce the need for OpenAPI. gRPC entirely removes the need for OpenAPI/contract-based testing, but going full gRPC on the client side in the browser needs more setup with a special proxy, and you lose the load balancing/caching of HTTP.
TL;DR: stop using JSON for web APIs; use something such as Protobuf with a schema. JSON is the equivalent of running your app in debug mode.
The good reason being that JSON is trivial to parse in the browser and web apps exploded in use? It's a valid reason, but that doesn't mean something with a strong schema and a good user experience is bad.
There's nothing fundamentally wrong with XML + Schemas; it's just that they are a hassle to deal with, being verbose and more effort to generate/parse. XML still has a place, it's just forgotten and not cool anymore. Personally I like XML: maybe not as a web app message format, but its properties still hold today.
Protobuf / Thrift etc. are probably an evolution of XML + Schemas on the web using a binary format. They are less verbose, and JavaScript isn't the JavaScript of 15 years ago, so they're also easy to work with in the browser.
It seems my original comment is downvoted, which is fair enough. I've been generally surprised at how much better an experience all round it has been switching to Protobuf instead of using JSON. My code is easier, there's less of it, it's more performant, it's easy to share across the backend/frontend layers, and it has a long list of languages it can be used from with zero effort.
I was a bit pessimistic about it, but I had an API I needed to save bandwidth on for end users, so I tried switching from JSON as an experiment. I expected it to be a hassle but it turned into the complete opposite. For any REST service I create going forward, JSON won't be my first choice. Protobuf is simpler to work with in the browser (no manual parsing/typing the wrong field) and server side, and it's also far more efficient.
Clients get evolution support and can download prebuilt parsers/deserialisers or create their own from published schema definitions. I see it as a win-win all round with no drawbacks.
> Yeah. This is the problem: we've abandoned the hypermedia architecture of the web for a dumb-data, RPC model.
I'm not sure that's the issue at all.
The article complains about boilerplate code used to convert results to JSON. To me those complaints make absolutely no sense at all, because that boilerplate is used for two things: validate data, and ensure that an interface adheres to its contract.
It makes absolutely no sense at all to just dump JSON from the outside world into a db without validating or mapping. And vice versa. That's what the so-called boilerplate does.
What's more astonishing is that nowadays the so called boilerplate is pretty close to zero, thanks to the popularity of mapping and validation frameworks. Take .NET Core for example, and modules like AutoMapper and FluentValidation. They map and project and validate all the DTOs with barely any noticeable code. Why is this a problem?
I've implemented a large system managing billions of records using exactly this approach of cutting out a lot of boilerplate in the application layer.
The most important thing is to be pragmatic. This approach works great for CRUD as well as many types of complex analytical queries. However, for cases where the complexity or performance of the SQL was unacceptable, we also decoded data into application data structures for further processing.
When done well, you can get the best of both worlds as needed.
It is a really, really bad idea to put business logic in your database and there are people in this thread downvoting everyone who points it out. Please do not do this in production - I've had to maintain systems that married business logic and database and it was a nightmare to maintain and cost a fortune to fix.
> It is a really, really bad idea to put business logic in your database and there are people in this thread downvoting everyone who points it out
Your database is your business logic. From constraints to the queries you write, it's all business logic.
Regardless, you misunderstood the article. This is about moving formatting into the db layer. Letting Postgres return JSON, as opposed to middleware code to translate rows to JSON.
Which is like the opposite of business logic code.
Like everything, it depends. Some business logic like "this field should never be negative" is probably fine.
More complex logic is a lot harder to maintain. You don't want to stress your DB's CPU if you can avoid it. You still need some kind of connection pooler anyway. There's a lot of reasons it doesn't make sense to put all your business logic in the DB. SQL is not the easiest language to maintain (not enough abstraction).
IMO the line to draw is just enough to keep data sanity.
I like this distinction from wikipedia [1] "Anything that is a process or procedure is business logic, and anything that is neither a process nor a procedure is a business rule."
Meaning constraints and the model represent business rules.
Why are they less maintainable? It’s just code you can write in a code editor, version control in a versioning system, have automatic tests for, have automatic deployment scripts for, etc. In the end the database is just an application server executing your code close to the data. Only thing you have to ask yourself is how much your code needs to scale. But it can be much more efficient to run business logic in the database as stored procedures.
I personally have found it's much more difficult to partially mock out concepts when writing logic in the DB layer, though my tooling may have been the weak point. In a nice application language I can easily isolate specific units of code for direct testing, while every test I write over complex database logic feels like composing a symphony where I need to make sure the horns are coming in on the right pitch and the wind section isn't too early, just to properly simulate a relatively easy error scenario.
Since the application layer is much more naturally aligned with small well isolated chunks of logic it is easier to minimize the volume of code and logic in scope when testing a particular attribute - when that goes into the DB I've always seen things get more complex rapidly.
Using a migration utility like ActiveRecord, Flyway, or go-pg, SQL definitions can also live in version control, have automatic tests, and deploy automatically. Who has a database in production without these things?
That's perhaps true of stored procedures, but SQL views, constraints, and triggers written in a declarative manner are far more maintainable and far less buggy than imperative application code, in my experience.
I have no words for "triggers" and "maintainable" in the same sentence - enjoy your deadlocks and having no clue why a single write says "UPDATED x records" y times without having to dig down through an ancient UI.
I am actually "good" at this stuff - writing logic in SQL through triggers, which call functions (at which SQL Server is way better than PG is) and stored procs, etc, and it's just a horrible, horrible practice.
Business logic goes into the code, data goes into the database.
Sometimes business logic is implemented as rules, in which case the rules (configuration) can go into either configuration files or a database. But that doesn't make it data...
Data is just crystallized business logic. Anemic databases aren’t a big deal when the database is just a persistence layer for a single app but, when you’re rewriting the application code or allowing BI queries directly against the database, designing a database as a service whose interface is stored procedures starts making data integrity much easier.
Once you start having multiple data sources, business logic implemented in application code might be easier to reuse than business logic implemented in one of the data sources. For example: start with Postgres, then add ClickHouse to the mix. It might be easier to let the application handle access to both DBs, rather than write RLS policies in Postgres and then try to apply them to ClickHouse.
I think this is slightly reductive. The majority of your business logic, I agree, should reside in application code, but lightweight business logic in the form of database constraints are powerful at disallowing bad data. For example, you may have a rental booking application that shouldn't have overlapping date ranges. The database can give you this guarantee out of the box and will work for ALL clients and applications.
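That rental-booking guarantee is a nice concrete case; in Postgres it can be expressed as an exclusion constraint (table and column names invented here), so overlapping stays for the same room are rejected no matter which client writes the row:

    -- btree_gist lets the plain room_id equality take part in the GiST constraint
    create extension if not exists btree_gist;

    create table booking (
      id      bigserial primary key,
      room_id bigint not null,
      stay    daterange not null,
      exclude using gist (room_id with =, stay with &&)
    );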
The boundary between business logic and database validation logic is a fuzzy one. At one extreme everything goes in stored procedures inside an RDBMS; at the other extreme you write everything on top of a no-frills key-value store.
I would say yes, 10 years ago. Nowadays, though, with the way Postgres is developing, with support for Python/V8 and with PostgREST, it just makes sense to do DB-intensive stuff in a database like Postgres.
how old were the systems? And have you ever had to maintain systems that were that many years old that weren't a nightmare to maintain and cost a fortune to fix?
The obligatory comment to these posts is always "check out PostgREST". We use it for all frontend APIs. Combined with Node or any other middleware, we add additional code if everything can't be done in plpgsql, but requests are usually just passed straight through Node.
"Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should."
While interesting, the advice offered in this post is generally bad, or at least incomplete. OK, so, you "cut out the middle tier"... what now? Are web clients connecting to Postgres directly? Will Postgres handle connection authentication and request authorization? Logging that a particular user looked at particular data? Can the client cache this data? If so, for how long? Even taking the author's premise, this is not a good pattern: a simple JSON GET request still comes with a bunch of other stuff that this doesn't bother to address.
But the premise is wrong - few applications just spit out JSON like this, they also have to handle updates, and there's business logic involved in that. Data needs to be structured in a way that's reasonable for clients to consume, which isn't always a row of data (maybe it should be collapsing all values in a column, so that in the JSON the client just gets a JSON array of the values).
The performance was a big enough improvement that, had we been bottlenecked on this particular operation, we would have shipped this (I'm afraid I don't have notes on benchmarks).
We only decided not to ship it because of concerns about the ongoing maintenance overhead, since we had other Python code generating GeoJSON that we would have still used - so we would have ended up maintaining multiple implementations.
The more things change, the more they stay the same.
If we lived in the XHTML 2.0 universe instead of the HTML5 universe, then that would be the way you do things. Not with SQL and JSON, mind you, but with XML everywhere.
I had a glimpse into that world and it looked pretty good at the time. XML databases + XQuery can produce whatever you want as long as it's XML. And XML could be many things. Many horrible things too, made by people with arms growing out of places they shouldn't.
If you are using Rails and ActiveModelSerializers, I wrote a gem to take your serializer definition and build the whole JSON result in Postgres: [0] It hasn't gotten much use, but I know at least one company that has used it to get 100x performance boosts on #index endpoints.
It generates jsonapi right now, since that is standard for Ember projects, but it'd be pretty easy to add an adapter for some other format.
I've done this manually as an optimization on particularly hairy (and large) reports, retrieving a JSON string from the postgres query, then interpolating it into a boilerplate string response.
In a few, particularly pathological cases (that nonetheless were fairly common to run into in prod), the response time of the endpoint went from literal minutes to a few hundred milliseconds on a test server with unlimited response timeout (prod would have timed out after 60s). Most of the performance increase came from not requiring Ruby to parse strings into fat objects, then turn it back into a long string to go out the wire.
I made a side project to experiment with this approach (and rust) a few years ago. It worked pretty well but it made simple queries more complex[1]. I was a little bit worried that complex queries would become a lot more complex so I'm not sure how well this approach would scale.
It removed a crazy amount of boilerplate code [2] though and made the project much more focused on the actual data which is always a good thing.
Ever since, I've been very curious to try edgedb [3] since it promises to do similar things without making the queries more complex.
I think it’s more common to switch application language rather than the Database. I’m not implying that it’s not done but I suspect it occurs less often. Unless you’re going from SQL to NoSQL or vice versa.
Databases are providing greater native functionalities which will reduce our efforts in writing boilerplate code.
But the underlying SQL gets complex and it gets harder to maintain high-performance code - or hire good engineers.
This has made me interested in EdgeDB [1]. They are abstracting the high-performance SQL bit from the engineer by introducing a new query language, EdgeQL [2].
You can think of EdgeQL and EdgeDB as the next step in abstraction, just like how we have many programming languages that did just that for machine code and beyond. You may be able to write assembly by hand aiming for high performance, but it's tiresome and may not actually deliver it. Also, a better UX for the engineer (e.g. Kotlin on the JVM) improves time to market and is overall better for the ecosystem.
Shameless plug: I made bson-to-json for more or less the same purpose with MongoDB: converting the raw BSON response from the DB directly to JSON strings. Saves lots of CPU cycles and GC pressure.
Is there a strict implementation of that in JS? I've looked and there seems to be a bunch of libraries doing JSONPath, but none that are strict implementation of PGs variant. I'd like to be able to use it for previews etc.
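For reference, Postgres's own jsonpath (Postgres 12+) distinguishes lax and strict modes on the SQL side, which is the behaviour a strict JS port would need to match; a couple of illustrative calls:

    -- lax (the default): structural mismatches are smoothed over
    select jsonb_path_query(
      '{"items": [{"price": 10}, {"price": 20}]}'::jsonb,
      'lax $.items[*].price ? (@ > 15)');   -- returns 20

    -- strict: the same wildcard on a non-array raises an error instead of
    -- being auto-wrapped the way lax mode would
    select jsonb_path_query(
      '{"items": {"price": 10}}'::jsonb,
      'strict $.items[*].price');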
To me, the thing that excites me about this is more efficient ORM querying of data. Previously you either had to make multiple queries to fill parent and child relationships (whether on demand or by explicitly prefilling a cache), or make a larger join where a lot of data may be duplicated in 1:N relationships, making it inefficient (a few levels of chained 1:N relationships leads to extremely inefficient data transfer). Now you could actually just get back a JSON data structure with exactly what each field should be. There are inefficiencies at scale here as well with duplicated key names, but that can possibly be worked around to a degree with arrays and an array-type index of some sort.
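As a sketch of that (table and column names invented), nested aggregation hands back the parent rows with their children already embedded, in a single round trip and without duplicating parent data:

    select json_agg(
             json_build_object(
               'id',    a.id,
               'name',  a.name,
               'books', coalesce(bk.books, '[]'::json)
             )
           ) as authors
    from authors a
    left join lateral (
      select json_agg(
               json_build_object('id', b.id, 'title', b.title)
               order by b.title) as books
      from books b
      where b.author_id = a.id
    ) bk on true;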
I have a question about performance that I have often wondered about.
Discussions like these encourage me to put more work into the DB layer however only recently have SQL databases like PostgreSQL become horizontally scalable (I know read replicas are a thing and maybe they are the answer).
So why would I put more work and effort into the less scalable data storage over the horizontally scalable, stateless app layer?
You wouldn’t. If you’re planning on doing anything with a large user count(millions+), putting it all in the db is guaranteed terrible pain later on. Replicas are not the answer.
But if you’re doing a LoB app for hundreds of users you get away with it. (I still wouldn’t.)
I did the approach with nesting jsonb_agg queries; it worked for me but was a bit hacky and sometimes complex. I hoped for Graph Query Language (the one supposed to be part of the SQL standard) to make it easier to retrieve structured JSON from flat tables, but it doesn't look ready yet.
Does it scale well? I feel like doing all business logic in procedures would require the company to use one huge database in order to intersect various domains.
Also exposes the underlying dataclasses as graphql types and capable of converting them to django models via decorators. I've tested that basic migrations work.
The schema export as json in the original article can be used to generate the schema for fquery.
It's a bad idea and I'd like to see how people actually test using this model. Now you have to run PG locally every time you need to unit test something? Also PG returns some JSON, but if the object is not exactly what you want to send back, do you create another object and merge the two? Just don't do it, it's a terrible design. It "works" for simple APIs but for anything serious this is wrong.
You misunderstood the entire article. This is about formatting the data from Postgres. It doesn't change anything about your testing. If you're mocking data, you mock the JSON instead of raw rows. If you were doing integration tests, nothing changes anyway.
> Also PG returns some JSON but if the object is not exactly what you want to send back you create another object and merge the two
Sure. Literally nothing changes. Just as if you had to assemble a JSON blob from two Postgres queries and merge them.
> Just don't do it, it's a terrible design. It "works" for simple API but for anything serious this is wrong
I think you completely misunderstood the article. Returning a JSON object from a database query is not "wrong", and I've used this pattern with great success. It's so much easier to call a json_agg to turn a subquery into an array, than to try and manually parse out rows yourself in the middle layer.
Please just no, do not do this. Even the dumbest most simple system there is will evolve beyond simple CRUD access to tables. The second you put your schema on the wire, that schema can now never change. That's just such an insane commitment to make.
The application layer is typically just a view that maps the data layer to the expectations of the view layer. If you are set on eliminating the application layer surely you could just implement that view in the data layer using... a view.
the database schema is an API like everything else, and you can solve API evolution problems the same way as you would with a web API, using eg. views, stored procedures and versioning.
There's no reason not to expose parts of your database schema directly to users as long as you treat it like you would any other API you provide.
this is a really interesting way to think about it and something that I've given a fair amount of thought to as we embark on a data mesh journey - how does a domain team serve up a schema like an API, how does it evolve, and how do we think about versions.
I find that there are very few things related to programming that don't constitute an interface in some way, so it feels natural to me to think about everything in terms of the interface it presents to users, both explicitly in what you can express via whatever language you're using, and implicitly via conditions that can't be enforced but are required for correctness.
And when you have an interface, you really ought to put some thought into its design regardless of how "private" it is.
An API is an interface used by a program in some way, and any user-exposed database objects definitely qualify under that definition.
that's what a Schema Registry is for... protobuf is a little more polished around managing forwards and backwards compatibility than avro but they both work.
> The second you put your schema on the wire, that schema can now never change. That's just such an insane commitment to make.
If you change your schema, you can just write a sql expression to convert between schemas 'at runtime'. Like if you have a table with first name and last name and then you decide you really just need the name on the frontend, just change your SELECT statement to concat the two columns, or vice versa.
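A concrete sketch of that (schema and table names invented): if what's exposed is a view, the SELECT does the conversion and the wire format survives the rename.

    create or replace view api.person as
      select id,
             first_name || ' ' || last_name as name
      from app.person;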
[1] - https://htmx.org (my library)
[2] - https://unpoly.com
[3] - https://hotwired.dev