280 points by escot 19 hours ago | 134 comments
KronisLV 6 hours ago
CREATE TABLE ... WITH SOFT DELETES
Where the regular DELETE wouldn't get rid of the data for real, but you could still query the deleted records, with timestamps for everything as a built-in low-level feature, instead of handling this with a bunch of ORMs and having to remember to put AND deleted_at IS NULL in all of your custom views.
If we like to talk about in-database processing so much, why don't we just put the actual common features in the DB, so that toggling them on or off doesn't take a bunch of code changes in the app, and so you could even add soft deletes to a legacy app that knows nothing of the concept, on a per-table basis or whatever.
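For contrast, this is roughly what the manual version looks like today; a Postgres-flavoured sketch with made-up names:

    ALTER TABLE orders ADD COLUMN deleted_at timestamptz;

    -- "delete" becomes an update the app has to remember to do:
    UPDATE orders SET deleted_at = now() WHERE id = 42;

    -- and every read path needs the filter, or a view that hides it:
    CREATE VIEW live_orders AS
      SELECT * FROM orders WHERE deleted_at IS NULL;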
moritzwarhier 4 hours ago
"Soft deletes" is just a name for a regular write operation, with specific semantics.
Adding a layer of magic to the DB for this doesn't seem right to me.
And applications could have many different requirements for soft deletes, like the article points out. For example, the simplest version would be just a boolean "deleted" column, but it could also be "deleted_at", "deleted_by", etc.
All of these cases require a bunch of code changes anyway, and the more complex ones could interfere with an implementation of this feature at the database level: such a transparent implementation couldn't access app-specific concerns such as user data, for example.
Adding soft deletes to a legacy app that knows nothing about them would only work for a boolean flag and maybe a date-time value, unless the DBMS also offered triggers for soft deletes, etc.?
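For example, deleted_by only works if the application passes the acting user down somehow; a sketch of the plumbing the app still ends up owning (assuming Postgres session settings, made-up names):

    BEGIN;
    -- the database has no idea who "the user" is unless the app tells it:
    SET LOCAL app.current_user = 'alice';

    UPDATE invoices
       SET deleted_at = now(),
           deleted_by = current_setting('app.current_user')
     WHERE id = 42;
    COMMIT;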
Seems to me that this capability would make a DBMS much more complicated.
hnthrow90348765 3 hours ago
As an aside, I've never found this to be worth it since you have to change too much and re-test everything for minimal user benefit and time savings. The effort is way worse if the code is not great in the first place. It can be a great decision to make before everything is written.
Maybe it's worth it for files which are hard to reproduce, but you can also rely on DB backups to get those back. If people are regularly deleting things they're not supposed to, you're better off removing the user-facing delete actions, limiting the action to specific users, etc.
Spivak 3 hours ago
PaulHoule 2 hours ago
... WITH SOFT UPDATES
and it adds to the table definition as well as to the schema, which will cause subsequent statements to be rewritten. There's a lot of interesting logic (in the literal sense) in SQL that is hidden by a strange, irregular syntax and that is more obvious in other approaches to databases such as Datalog. I think it was little appreciated outside the hardcore semantic web community that you could compile SPARQL + OWL to SQL and get powerful inference facilities. SQL is a great target for metaprogramming precisely because it is not Turing complete and a code generator doesn't have to think at all about the order in which events are sequenced. It's kind of sad that metaprogramming tools for SQL are almost all pre-Chomsky, pre-dragon-book internal DSLs like jOOQ and SQLAlchemy, which have their charms (jOOQ's excellent integration with Java IDEs) but fall short of what could be done with SQL-to-SQL and SQL-to-X transformations.
Stored procedures are great, but many shops don't use them for various reasons. It doesn't help that they look like a mix of FORTRAN and COBOL and come in a few variations, from the (better) set-based PL/SQL of Oracle to the (worse) Transact-SQL-style stored procs of Microsoft SQL Server and PostgreSQL. The other day I talked with Krisztián Szabó, who developed a compiler that writes stored procs that do database synchronization.
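PostgreSQL's rule system already gives a crude taste of that kind of statement rewriting; a rough sketch with made-up names, turning DELETE on a view into a soft delete:

    CREATE VIEW customers AS
      SELECT * FROM customers_raw WHERE deleted_at IS NULL;

    -- rewrite any DELETE against the view into an UPDATE on the base table:
    CREATE RULE customers_soft_delete AS
      ON DELETE TO customers
      DO INSTEAD
      UPDATE customers_raw SET deleted_at = now() WHERE id = OLD.id;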
On the other hand, if you've got access to the internals of the frickin' database, I think you can do something better than ordinary application-level soft updates. For instance, a "customer record" might very well be not just a row in one table but 15 rows across four tables, inserted in one transaction, and you want to be able to undelete them as a unit.
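A sketch of that at the application level, assuming a made-up schema: tag everything touched by one logical delete with a shared id so it can be undeleted as a unit.

    BEGIN;
    INSERT INTO deletions (id, deleted_by) VALUES (123, 'alice');

    UPDATE customers SET deletion_id = 123 WHERE id = 42;
    UPDATE addresses SET deletion_id = 123 WHERE customer_id = 42;
    UPDATE orders    SET deletion_id = 123 WHERE customer_id = 42;
    COMMIT;

    -- undelete the whole unit by clearing deletion_id = 123 in each table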
crazygringo 3 hours ago
Different products will handle soft deletes differently. Which queries need to include soft-deleted rows and which don't? What about different levels of soft deletes, e.g. done by the user (can be undone by user) vs. done by an admin (can't be undone by user)?
Implementing soft deletes yourself isn't hard. Yes you'll have to make a bunch of decisions about how they work in every circumstance, but that's the point.
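E.g. one of many possible answers to the "levels" question, as a Postgres-flavoured sketch with made-up names:

    ALTER TABLE posts
      ADD COLUMN deleted_by_user_at  timestamptz,  -- user can undo
      ADD COLUMN deleted_by_admin_at timestamptz;  -- user cannot undo

    -- what a regular user gets to see:
    SELECT * FROM posts
     WHERE deleted_by_user_at IS NULL
       AND deleted_by_admin_at IS NULL;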
marcosdumay 2 hours ago
If they did this, nobody would use it. They already do lots of more useful things that people don't use because they're not portable.
There's a sibling comment about temporal databases. Those solve a very bothersome problem, so a few people use them. That means that there's a chance soft deletes get adopted as a side effect of a much more complex standard.
hypeatei 4 hours ago
0: https://learn.microsoft.com/en-us/sql/relational-databases/t...
rjbwork 3 hours ago
ajuc 25 minutes ago
It's one of those things that are available but that most people ignore and implement manually with created_at, updated_at, deleted_at columns, etc. I think one reason for this is lack of support in ORMs and lack of standardization between RDBMSes.
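For reference, the built-in version looks roughly like this in SQL Server's system-versioned temporal tables (a sketch with a made-up table):

    CREATE TABLE dbo.Customer (
        Id        INT PRIMARY KEY,
        Name      NVARCHAR(100) NOT NULL,
        ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));

    -- point-in-time query, including rows that have since been updated or deleted:
    SELECT * FROM dbo.Customer FOR SYSTEM_TIME AS OF '2024-01-01';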
crazygringo 14 hours ago
They need to be surfaced to the product owner to decide. There may very well be reasons pieces of data should not be stored. And all of this adds complexity, more things to go wrong.
If the product owner wants to start tracking every change and by who, that can completely change your database requirements.
So have that conversation properly. Then either decide it's not worth it and don't add any of these "extra" fields you "might" need, or decide it is, and fully spec it out, including how much additional time and effort it will take to do it as a proper feature. But don't do it as some half-built just-in-case "favor" to a future programmer who may very well have to rip it out.
On a personal project, do whatever you want. But on something professional, this stuff needs to be specced out and accounted for. This isn't a programming decision, it's a product decision.
Extasia785 6 hours ago
Also not sure what you mean by additional effort? created_at, updated_at, or soft deletes are part of most proper frameworks. In Spring all you need is an annotation; I've been using those in major projects, and the implementation cost is a few seconds, with zero maintenance effort so far in years of development. At least those fields are solved problems.
crazygringo 5 minutes ago
I've been part of teams where features had to be totally thrown out and rebuilt because developers made big assumptions that turned out to be wrong, because they didn't think it was worth checking with the product owner. Because they assumed it was only a "technical decision".
This doesn't mean checking with product around each line of your code, obviously. But deciding what information gets stored in the database, what level of event tracking you do, whether deletes are hard or soft -- these have massive product implications, and potentially legal ones.
And it is additional effort. Now you have to write tests for all those things. Are the timestamps being stored correctly? Are the permission bits being stored correctly? Is "created_by" coming from the right user? Is "updated_at" actually being updated on every row change? And are we making sure "updated_at" is not getting changed when we import data from a separate table? How often do we remove soft-deleted data, with what cron job, and who maintains that? Where do alerts go if the cron job fails? How do we comply with privacy regulations where deleted data actually needs to be deleted? I could go on and on and on.
So that's what I mean by additional effort. It's not "around a few seconds". Because it's not just a technical question, it's a product one.
exe34 5 hours ago
the hypothetical future programmer is you in two weeks.
hamandcheese 12 hours ago
hombre_fatal 6 hours ago
Being unable to even call the shot of whether a database table should have an updated_at or soft-delete sounds like a Dilbertian hellscape to me.
bayindirh 6 hours ago
dogleash 2 hours ago
Sure, there were people whose job was to offload as much compliance work as possible from everyone else, by turning it into internal requirements, participating in design discussions, and specializing in ensuring compliance. But trying to isolate the development team from it is just asking for micromanagers.
bayindirh 2 hours ago
Think before you act. The machine has no brain. Use yours.
> Part of every development job is learning the product domain.
Yes.
> In that case devs become comfortable with reading standard/law/regulations and anticipating when software implementation might interact with the areas covered.
This is what I'm saying, too. A developer needs to think about whether what they are doing is OK under the regulations they're operating under. They need to ask for permission by asking themselves, "wait, is this OK under the regulation I'm trying to comply with?"
> But trying to isolate the development team from it is just asking for micromanagers.
Nope, I'm all for taking initiative, and against micromanagement. However, I'm also against the "I need no permission because I'm doing something amazing" attitude. So own your craft; "code responsibly".
dogleash 2 hours ago
bayindirh 2 hours ago
IanCal 6 hours ago
danielPort9 6 hours ago
PaulHoule 2 hours ago
On the other hand, in some shops there is a dedicated DBA who is in charge of database schemas and possibly everything else. Before it became fashionable to create a "service layer" where people access the database (now database(s)) through web endpoints, some organizations would put all the database access into a set of stored procedures managed by the DBA. Maybe that's extreme, but in the real world product owners come and go, while the database is forever and deserves to have somebody speaking out for its interests.
chii 10 hours ago
in other words - YAGNI !
arialdomartini 11 hours ago
Adding
- updated_at
- deleted_at (soft deletes)
- created_by etc
- permission used during CRUD
to every table is a solution weaker than having a separate audit log table.
I feel that mixing audit fields with transactional data in the same table is a violation of the separation of concerns principle.
In the proposed solution, updated_at captures only the last change, a problem that a separate audit log table doesn't have.
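Something along these lines, as a sketch (Postgres-flavoured, made-up columns):

    CREATE TABLE audit_log (
      id         bigserial   PRIMARY KEY,
      table_name text        NOT NULL,
      row_id     bigint      NOT NULL,
      action     text        NOT NULL,  -- 'INSERT' | 'UPDATE' | 'DELETE'
      changed_by text        NOT NULL,
      changed_at timestamptz NOT NULL DEFAULT now(),
      old_data   jsonb,
      new_data   jsonb
    );

Every change appends a row here, so the full history is kept without touching the transactional tables.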
grey-area 10 hours ago
wodenokoto 7 hours ago
Put your documentation in doc strings where the function is defined - don’t have a separate file in a separate folder for that. It might separate concerns, but no one is looking there.
Similarly, if those fields aren't nullable, someone trying to add new rows will have to fill in something for those metadata fields, and that something will very likely be what's needed, rather than nothing being pushed to the audit table.
Obviously your app can outgrow these simple columns, but you’re getting value now.
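i.e. something like this sketch (made-up table), where the missing default on created_by is the point: whoever inserts has to say who they are.

    CREATE TABLE invoices (
      id         bigserial   PRIMARY KEY,
      amount     numeric     NOT NULL,
      created_at timestamptz NOT NULL DEFAULT now(),
      created_by text        NOT NULL  -- no default on purpose: the caller must supply it
    );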
motorest 11 hours ago
nine_k 9 hours ago
motorest 6 hours ago
The absolute basics are to support snapshots and event replay. This is hardly rocket science.
nine_k 5 hours ago
Let's assume we want to remove every message related to user A.
A photo by user B got to be the best of the day because it collected the most upvotes. Without A's vote, it's no longer so. The photo also became the best of the month, because it was later voted top among the best-of-the-day entries, and received a prize. Should we now replay the message stream without A's upvote, things are going to end up radically different, or end in a processing error.
User B was able to send a message to user C, and thus start a long thread, because user A had introduced them. With user A removed, the message replay chokes on B's attempt to communicate with C.
One way is to ignore the inconsistencies; it deprives you of most of the benefits of event sourcing.
Another way is anonymizing: replace messages about user A with messages about some null user, representing the removed users. This can lead to more paradoxes and message replay inconsistencies.
motorest 4 hours ago
That's not how snapshots work. You record the state of your system at a point in time, and then you keep all events that occurred after that point. This means you retain the ability to rebuild the current state from that snapshot by replaying all events. I.e., event sourcing's happy flow.
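In the smallest possible form, a sketch (made-up tables) of snapshot plus replay:

    CREATE TABLE events (
      seq     bigserial PRIMARY KEY,
      type    text      NOT NULL,
      payload jsonb     NOT NULL
    );

    CREATE TABLE snapshots (
      id        bigserial PRIMARY KEY,
      as_of_seq bigint    NOT NULL,  -- last event folded into this snapshot
      state     jsonb     NOT NULL
    );

    -- rebuild = load the latest snapshot, then replay everything after it:
    SELECT * FROM events
     WHERE seq > (SELECT coalesce(max(as_of_seq), 0) FROM snapshots)
     ORDER BY seq;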
> User B was able to send a message to user C, and thus start a long thread, because user A had introduced them. With user A removed, the message replay chokes at the attempt of B to communicate with C.
Not really. That's just your best attempt at reasoning about how the system could work. In the meantime, depending on whether you have a hard requirement to retain messages from removed users, you can either keep them assigned to a generic deleted user or replace them with deleted-message placeholders. This is not a problem caused by event sourcing; it's a problem caused by failing to design a system that meets its requirements.
arialdomartini 10 hours ago
motorest 6 hours ago
alganet 14 hours ago
Do the long walk:
Make the schema fully auditable (one record per edit) and the tables normalized (it will feel weird). Then suffer with it, and discover that normalization leads to a performance decrease.
Then discover that pruned auditing records are a good middle ground. Just the last edit and by whom is often enough (ominous foreshadowing).
Fail miserably by discovering that a single missing auditing record can cost a lot.
Blame database engines for making you choose. Adopt an experimental database with full auditing history. Maybe do incremental backups. Maybe both, since you have grown paranoid by now.
Discover that it is not enough again. Find that no silver bullet exists for auditing.
Now you can make a conscious choice about it. Then you won't need acronyms to remember stuff!
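The "fully auditable, one record per edit" first step usually ends up looking something like this (a sketch with made-up columns):

    CREATE TABLE customer_versions (
      id          bigserial   PRIMARY KEY,
      customer_id bigint      NOT NULL,
      name        text        NOT NULL,
      edited_by   text        NOT NULL,
      edited_at   timestamptz NOT NULL DEFAULT now(),
      is_current  boolean     NOT NULL DEFAULT true
    );

    -- the current state is just a view over the latest versions:
    CREATE VIEW customers AS
      SELECT * FROM customer_versions WHERE is_current;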
lud_lite 13 hours ago
If you never use it, that data can be dumped to s3 glacier periodically (e.g. after 90 days).
By losing the foreign key you gain flexibility in what you audit. Maybe audit the operation and not the 20 writes it causes.
yes_man 9 hours ago
So like OP said, no silver bullets exist for auditing.
lud_lite 7 hours ago
klysm 14 hours ago
datadrivenangel 14 hours ago
klysm 13 hours ago
alganet 14 hours ago
It is the point where you give up modeling the audit as part of the system's tables.
The drawbacks of this choice are often related to retrieval. It depends on the engine.
I once maintained a system that kept a fully working log-replicated instance, delayed by 24h, ready for retrieval queries, in addition to regular disk backups (slow, costly retrieval).
I am more developer than DBA, so I can probably speak more about modeling solutions than infra-centric solutions.
klysm 13 hours ago
globular-toast 10 hours ago
awesome_dude 13 hours ago
alganet 13 hours ago
I'm not saying databases are blameless. It's just that experiencing the issues they have by yourself is rewarding!
There is also a walk before the long walk of databases. Store things in text files and use basic tools (cat, sed, sh...).
The event driven stuff (like Kafka) reminds me of that. I am not very familiar with it though, just played a little bit with it once or twice.
awesome_dude 12 hours ago