375 points by virtualwhys 3 days ago | 137 comments
hn_throwaway_99 3 days ago
It seems like one of the biggest fundamental flaws is that Postgres chose the O2N approach for tracking row versions instead of N2O. While switching to N2O wouldn't solve all problems (e.g. the article also talks about how Postgres stores full row copies and not just diffs), from an "80/20 rule" perspective, it seems like it would get rid of most of the downsides of the current implementation. For example, I'd assume that the vast majority of the time transactions want the latest row version, so using the N2O ordering means you could probably do away with storing each row version in an index: you'd only need to traverse the linked list if you needed an older version, which should be much less common.
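To make the trade-off concrete, here is a minimal sketch in Go (hypothetical types, not Postgres's actual C structures) of the two chain orderings: with N2O the index can point at the chain head and the common "latest version" read never walks the chain, while with O2N it must walk to the end.

    package main

    import "fmt"

    // Version is one row version in a chain; Next points along the chain.
    // Under O2N the chain runs oldest -> newest; under N2O, newest -> oldest.
    type Version struct {
        Value string
        Next  *Version
    }

    // N2O: the head is already the newest version, so the common
    // "read the latest row" case does no traversal at all.
    func latestN2O(head *Version) *Version { return head }

    // O2N: the reader must walk the whole chain to reach the newest version.
    func latestO2N(head *Version) *Version {
        v := head
        for v.Next != nil {
            v = v.Next
        }
        return v
    }

    func main() {
        // Three versions of the same row, oldest first (O2N)...
        o2n := &Version{"v1", &Version{"v2", &Version{"v3", nil}}}
        // ...and the same versions, newest first (N2O).
        n2o := &Version{"v3", &Version{"v2", &Version{"v1", nil}}}

        fmt.Println(latestO2N(o2n).Value) // "v3", after walking the chain
        fmt.Println(latestN2O(n2o).Value) // "v3", no walk needed
    }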
fweimer 3 days ago
Every application is a bit different, but it's not that the PostgreSQL design is a loser in all regards. It's not like bubble sort.
winternewt 2 days ago
When doing game development in the early 2000s I learned that bubble sort is not a loser in all regards. It performs well when a list is usually almost sorted. One situation when this is the case is in 3D rendering, which sorts objects by their distance from the camera. As you move the camera around or rotate it, bubble sort works very well for re-sorting the objects given the order they had in the previous frame.
To prevent bad worst-case scenarios you can count the number of comparisons that failed on the last pass and the number of passes you have performed so far, then switch to a different sort algorithm after reaching a threshold.
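A sketch of that scheme in Go (hypothetical; it simplifies "count the comparisons that failed on the last pass" down to a swapped flag, and maxPasses is an illustrative threshold, not a tuned value):

    package main

    import (
        "fmt"
        "sort"
    )

    // adaptiveSort bubble-sorts a nearly-sorted slice, but gives up and
    // falls back to the standard library's O(n log n) sort once it has
    // spent maxPasses passes without finishing.
    func adaptiveSort(a []float64, maxPasses int) {
        for pass := 0; pass < maxPasses; pass++ {
            swapped := false
            for i := 1; i < len(a); i++ {
                if a[i-1] > a[i] {
                    a[i-1], a[i] = a[i], a[i-1]
                    swapped = true
                }
            }
            if !swapped {
                return // already sorted: the common frame-to-frame case
            }
        }
        sort.Float64s(a) // too far from sorted; switch algorithms
    }

    func main() {
        // Object depths sorted last frame, slightly perturbed by camera motion.
        depths := []float64{1.0, 1.1, 1.05, 2.0, 2.2, 3.0}
        adaptiveSort(depths, 3)
        fmt.Println(depths) // [1 1.05 1.1 2 2.2 3]
    }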
15155 2 days ago
Bubble sort is great in hardware and for mostly-sorted sets.
indulona 3 days ago
The article literally says that PG's MVCC design is from the '90s and that no one does it like that any more; that is technology outdated by over 30 years. I'd say that doesn't make it a loser in all regards, but it does in the most important aspects.
mikeocool 3 days ago
I’d prefer not to be the first person running up against a limit or discovering a bug in my DB software.
mannyv 3 days ago
Flat files have also been reliably used in production for decades. That doesn't mean they're ideal... although, amusingly enough, S3 and its equivalent of flat files is what we've migrated to as a data store.
simne 2 days ago
Many people consider this the most expensive bug in history: on the first flight of Ariane 5, the rocket entered a velocity range that was strictly prohibited in the Ariane 4 software, which raised a software exception, and about a billion dollars crashed.
Honestly, they could have re-checked all the ranges, but they decided it would cost about as much as writing new software, so to save money the decision was made to reuse the old software without additional checks.
j16sdiz 2 days ago
Actually, it is the 1980s. The article:
> Its design is a relic of the 1980s and before the proliferation of log-structured system patterns from the 1990s.
kunley 3 days ago
So, newer does not always mean better, just saying.
nine_k 3 days ago
While persisting key architectural ideas certainly has benefits, so does evolving their implementations.
kunley 21 hours ago
Having said that, I need to add that I am not enough of an expert to say whether MVCC is good enough to be considered on par with other write-concurrency mechanisms in SQL databases. My example was just meant as a caution when judging, especially since the original counterexample mentioned notoriously bad architectures (hello, MySQL...)
gregw2 2 days ago
Linus began work on it in April 1991: https://groups.google.com/g/comp.os.minix/c/dlNtH7RRrGA/m/_R...
throwawayie6 3 days ago
Are you referring to C++? That was actually created by a Danish guy, who was in turn inspired by the object-oriented Simula language created in the '60s.
OrvalWintermute 3 days ago
> In the 2000s, the conventional wisdom selected MySQL because rising tech stars like Google and Facebook were using it. Then in the 2010s, it was MongoDB because non-durable writes made it "webscale". In the last five years, PostgreSQL has become the Internet's darling DBMS. And for good reasons! It's dependable, feature-rich, extensible, and well-suited for most operational workloads.
Smart engineers were choosing Postgres not because of the argumentum ad populum fallacy, but for the following reasons:
- Data safety (not MyISAM)
- ACID
- Similarity to Oracle
- MVCC
- SQL standards adherence
- The Postgres team
- A helpful, awesome community
- Data types
- High performance
- BSD license flexibility
Above are the reasons I selected Postgres while at AT&T in the early 2000s, and our Oracle DBA found it a very easy transition. While MySQL went through rough transitions, PG has gone from strength to strength on an ever-improving path.
I think Bruce Momjian is a big part of this success; they truly have an excellent community. <3
andruby 2 days ago
The preference kept growing thanks to data safety, DDLs in transactions, etc.
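For anyone unfamiliar with that feature, here is a minimal Go sketch (hypothetical table and connection string; lib/pq picked arbitrarily as the driver) of what DDL-in-transactions buys you in Postgres: a failed migration rolls back the schema change together with the data change.

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq" // Postgres driver (illustrative choice)
    )

    func main() {
        // Hypothetical connection string for a local database.
        db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        tx, err := db.Begin()
        if err != nil {
            log.Fatal(err)
        }
        // DDL runs inside the transaction in Postgres; on error the
        // schema change is rolled back along with everything else.
        if _, err := tx.Exec(`ALTER TABLE accounts ADD COLUMN note text`); err != nil {
            tx.Rollback()
            log.Fatal(err)
        }
        if _, err := tx.Exec(`UPDATE accounts SET note = ''`); err != nil {
            tx.Rollback() // also undoes the ALTER TABLE above
            log.Fatal(err)
        }
        if err := tx.Commit(); err != nil {
            log.Fatal(err)
        }
    }

In MySQL, by contrast, most DDL statements cause an implicit commit, so a half-finished migration cannot be rolled back the same way.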
mxey 2 days ago
You know, I was wondering something regarding this write amplification. It's true that MySQL doesn't need to update its indexes like that. However, MySQL replication relies on the binlog, where every change has to be written in addition to the database itself (InnoDB redo log and so on). So it seems to me that MySQL, if used in a cluster, has a different kind of write amplification, one that PostgreSQL does not have because it reuses its WAL for replication.
In addition, on the receiving side, MySQL first writes the incoming binlog to the relay log. The relay log is then consumed by the applier threads, creating more InnoDB writes and (by default) more binlog.
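As an illustration of that asymmetry, here is a hedged tally in Go (simplified assumptions, not measured behavior) of the per-change write paths each design performs across a primary and one replica:

    package main

    import "fmt"

    func main() {
        // Simplified per-row-change write paths; real behavior varies
        // with index count, HOT updates, and binlog/redo settings.
        writes := map[string][]string{
            "PostgreSQL": {
                "primary: heap tuple (new row version)",
                "primary: index entries for the new version (unless HOT applies)",
                "primary: WAL record (also shipped to the replica as-is)",
                "replica: WAL replay writes to heap and index pages",
            },
            "MySQL/InnoDB": {
                "primary: clustered index (row) + changed secondary indexes",
                "primary: InnoDB redo log",
                "primary: binlog (separate from the redo log)",
                "replica: relay log (incoming binlog copied to disk)",
                "replica: redo log + table writes from the applier threads",
                "replica: its own binlog (log_replica_updates, on by default in 8.0)",
            },
        }
        for system, paths := range writes {
            fmt.Printf("%s: %d write paths\n", system, len(paths))
            for _, p := range paths {
                fmt.Println("  -", p)
            }
        }
    }

Which side amplifies more depends on index count, HOT applicability, and binlog settings; the point is only that both designs pay for replication somewhere.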