82 points by lambrospetrou 6 days ago | 59 comments
samsquire 3 days ago
And the pattern of including a "check" transaction item is how we manually maintain data integrity (a characteristic of Atomic in DBMS).
And we know which transactions are writing because they told us they wanted to write in the prepare phase (the part that the transaction manager handles separate from the one shot transaction information perspective from the client with its own communication between the transaction manager and storage nodes)
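For readers who haven't seen the "check" item pattern, here is a minimal sketch of what such a request might look like. It only builds the request dictionary in the shape the low-level DynamoDB `TransactWriteItems` call expects. The table name, key attribute `pk`, and the customer/order entities are assumptions made up for illustration, not something from the thread.

```python
from typing import Any, Dict


def order_with_customer_check(table: str, customer_id: str, order_id: str) -> Dict[str, Any]:
    """Build a TransactWriteItems request: write an order only if the customer exists.

    The table layout (a single string key named "pk") is hypothetical.
    """
    return {
        "TransactItems": [
            {
                # The "check" item: writes nothing, but fails the whole
                # transaction if the customer row is missing.
                "ConditionCheck": {
                    "TableName": table,
                    "Key": {"pk": {"S": f"customer#{customer_id}"}},
                    "ConditionExpression": "attribute_exists(pk)",
                }
            },
            {
                # The actual write; it also refuses to overwrite an existing order.
                "Put": {
                    "TableName": table,
                    "Item": {
                        "pk": {"S": f"order#{order_id}"},
                        "customer": {"S": customer_id},
                    },
                    "ConditionExpression": "attribute_not_exists(pk)",
                }
            },
        ]
    }
```

With boto3 this would presumably be sent as `client.transact_write_items(**order_with_customer_check("shop", "c1", "o1"))`; either both items succeed or neither is applied.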
I implemented a toy DynamoDB that is a trie in front of a hash map; it handles the "begins with" query style.
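A rough sketch of the idea, not the commenter's actual code: a hash map holds the values and a character trie over the keys answers prefix queries. The class and method names (`ToyTable`, `begins_with`) are invented for this example.

```python
from typing import Any, Dict, Iterator, Tuple


class _TrieNode:
    __slots__ = ("children", "is_key")

    def __init__(self) -> None:
        self.children: Dict[str, "_TrieNode"] = {}
        self.is_key = False  # True if a stored key ends at this node


class ToyTable:
    """Hash map for point lookups, trie over the keys for prefix lookups."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}
        self._root = _TrieNode()

    def put(self, key: str, value: Any) -> None:
        node = self._root
        for ch in key:
            node = node.children.setdefault(ch, _TrieNode())
        node.is_key = True
        self._items[key] = value

    def get(self, key: str) -> Any:
        return self._items.get(key)

    def begins_with(self, prefix: str) -> Iterator[Tuple[str, Any]]:
        # Walk down to the node that represents the prefix.
        node = self._root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return  # no stored key starts with this prefix
        # Depth-first walk of the subtree; result order is not sorted.
        stack = [(prefix, node)]
        while stack:
            key, current = stack.pop()
            if current.is_key:
                yield key, self._items[key]
            for ch, child in current.children.items():
                stack.append((key + ch, child))
```

Usage: after `t.put("user#1#order#1", "a")` and `t.put("user#2#order#1", "c")`, iterating `t.begins_with("user#1")` yields only the first pair. Real DynamoDB applies `begins_with` to the sort key within one partition and returns results in sort-key order, which this toy does not attempt.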
XorNot 3 days ago
I couldn't really find any compelling reason to use it though: an RDBMS would've been way easier.
smashedtoatoms 3 days ago
davidjfelix 3 days ago
pdhborges 3 days ago
cldcntrl 3 days ago
Here's what happens most of the time out in the real world:
- Low-cardinality partition key leading to hot keys, trashing capacity utilization.
- Bad key design means access patterns are off the table forever, as nobody wants to take on data migration with BatchWriteItem.
- Read/write spikes causing throttling errors. The capacity concept is difficult - people don't understand how capacity relates to partitions and object sizes, or wrongly assume "On-Demand Capacity" means throttling is impossible, or that Provisioned Capacity Autoscaling is instant.
- Multiple GSIs to cover multiple access patterns = "why is our bill so high?".
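To make the capacity point concrete, here is a back-of-the-envelope sketch. It assumes the figures as I understand them from the AWS documentation: one read unit covers a strongly consistent read of up to 4 KB (half that for eventually consistent), one write unit covers a write of up to 1 KB, and a single partition tops out around 3,000 read units and 1,000 write units per second. The function names are made up, and the constants should be checked against current docs.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # assumed: 1 RCU = one strongly consistent read up to 4 KB
WRITE_UNIT_BYTES = 1024      # assumed: 1 WCU = one write up to 1 KB
PARTITION_MAX_WCU = 1000     # assumed per-partition write ceiling per second


def read_capacity_units(item_bytes: int, strongly_consistent: bool = True) -> float:
    """Read units consumed by reading one item of the given size."""
    units = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    return float(units) if strongly_consistent else units / 2


def write_capacity_units(item_bytes: int) -> int:
    """Write units consumed by writing one item of the given size."""
    return max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))


def max_writes_per_second_single_key(item_bytes: int) -> int:
    """Rough ceiling on writes/second to one partition key.

    A single key lives in a single partition, so table-level capacity
    does not raise this limit.
    """
    return PARTITION_MAX_WCU // write_capacity_units(item_bytes)
```

Under these assumptions a 3.5 KB item costs 4 write units, so one hot key saturates at roughly 250 writes per second no matter how much capacity the table as a whole has. Each GSI that projects the item adds its own write cost on top.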
I've seen these issues over and over again while working with real organizations.
Of course it's impressive technology, it's just so littered with traps that I've stopped recommending it except in very specific cases.
tbarbugli 4 days ago
eknkc 3 days ago
- In a SaaS API service we used DynamoDB to look up API keys and track their daily usage data. It is fast enough to look up k/v pairs (API key => key info) and to aggregate small sets (we'd sum up call counts for the current month and check whether the API key had enough credits). This meant that the API itself did not need our RDBMS to function. We also had a PostgreSQL instance for all relational data: subscriptions, user info, etc. A trigger pushed any API key / subscription change to DynamoDB. In case of RDS issues, things kept chugging along.
- Working on a large BuzzFeed-like social media / news site in my country, we needed to store a lot of counters (reactions to articles, poll answers, etc.). They all went into DynamoDB and were looked up from there, with no hits on the actual RDBMS. There was a lot of traffic, and DynamoDB made scaling and keeping RDS from melting easy for this kind of non-critical data.
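A sketch of how such counters are commonly incremented, under assumptions: it builds the keyword arguments for boto3's `Table.update_item` using an `ADD` update expression, which increments a numeric attribute atomically and creates it if it is missing. The key layout (`pk = "article#<id>"`) and the function name are hypothetical, not taken from the comment.

```python
from typing import Any, Dict


def counter_increment_request(article_id: str, reaction: str, amount: int = 1) -> Dict[str, Any]:
    """Build update_item kwargs that atomically add `amount` to one counter attribute."""
    return {
        "Key": {"pk": f"article#{article_id}"},  # hypothetical key schema
        # ADD on a number attribute increments it in place; a missing
        # attribute is treated as 0, so no read-before-write is needed.
        "UpdateExpression": "ADD #reaction :amount",
        # The attribute name goes through a placeholder in case it
        # collides with a DynamoDB reserved word.
        "ExpressionAttributeNames": {"#reaction": reaction},
        "ExpressionAttributeValues": {":amount": amount},
        "ReturnValues": "UPDATED_NEW",
    }
```

With a boto3 table resource this would be used roughly as `table.update_item(**counter_increment_request("42", "likes"))`. Note that a very popular article makes its counter item a hot key, which ties back to the partition limits discussed elsewhere in the thread.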
I'd not build an entire thing on DynamoDB but for specific use cases, I just loved it.
rad_gruchalski 3 days ago
Wouldn't doing it right there in postgres limit your footprint?
eknkc 2 days ago
Needed a pretty high uptime guarantee, so we decided that as long as the AWS region is up and running, the API would also be available, by using only completely managed AWS services like DynamoDB, Lambda, etc. We also had a bunch of beefy servers at other providers (Hetzner, online.net, etc.) handling the actual work. They did not have any other dependencies either.
narmiouh 3 days ago
eknkc 3 days ago
We used it extensively on the second project I mentioned and a couple of other projects for caching / rate limiting and distributed locking needs. Never enabled the persistence layer (which I believe is pretty durable), so we only treated it as an ephemeral data store, lowering the architectural complexity of things significantly. Otherwise you need to think about backups, testing backups, clustering in case of scaling needs, and I have no idea how persistence works with clustering... DynamoDB is fully managed and solid.
mejutoco 3 days ago
ndr 3 days ago
guiriduro 3 days ago
mrkeen 3 days ago
My items are not relations, and I don't see the point in transforming them to and from relational form. And if I did, each row would have like 5 columns set to NULL, in addition to a catch-all string 'data' column where I put the actual stuff I really need. Which is how you slow down an SQL database. So RDBMS is no good for me, and I'm no good for RDBMS.
RDBMS offers strong single-node consistency guarantees (which people leave off by default by using an isolation level of 'almost'!). But even without microservices, there are too many nodes: the DB, the backend, external partner integrations, the frontend, the customer's brain. You can't do if-this-then-that from the frontend, since 'this' will no longer be true when 'that' happens. So even if I happen to have a fully-ACID DB, I still lean into events & eventual consistency to manage state across the various nodes.
I'm using more data than a naive CRUD/SQL app would (by storing events for state replication), and my data is stringy enough to kill my (and others') performance. So what's the solution? Make my read-writes completely independent from other read-writes: no joins, no foreign keys, etc.
The thing that would put me off using DynamoDB is the same reason I wouldn't use any other tech - can I download it? For this reason I'd probably reach for Cassandra first. That said I haven't looked at the landscape in a while and there might be much better tools.
But it also wouldn't matter what I want to use instead of DynamoDB, because the DevOps team of wherever I work will just choose whatever's native&managed by their chosen cloud provider.
throwaway82452 3 days ago
Amazon provides a downloadable version for development. I don't know how close it is to the real thing, but it makes it easier to do local dev.
LocalStack also supports it in their paid version.
dygd 3 days ago
tempworkac 3 days ago
plandis 3 days ago
snapcaster 2 days ago
Lapapapaja 3 days ago
You can get really far with an RDBMS before event sourcing etc. is needed, the benefit being that both your dev and user experience are going to be much simpler and easier.
If you already know your problem domain and scaling concerns up front, sure. But otherwise, starting with a scalable pattern like this is a premature optimization and will just slow you down.
mrkeen 3 days ago
You can manage up to 0 partners easily. Once you go above that threshold, you're into "2-Generals" territory. At that point you're either inconsistent, eventually-consistent, or you're just bypassing your own database and using theirs directly.
> dev and user experience are going to be much simpler and easier.
I have objects, not relations. I'm not going to do the work of un-nesting a fat json transaction to store it in a single relation (or worse, normalise it into rows across multiple tables).
mulmen 3 days ago
mrkeen 2 days ago
SQL now (for dev experience) && no-SQL later (for scaling)
to: no-SQL initially (for *much better* dev experience) && no-SQL later (for scaling)
I can get behind that.
> When your objects are inconsistently shaped something has to fix them
They have one schema (the class file) instead of two (the class file and the SQL migrations).
mulmen 2 days ago
But what happens when that schema-defining class file needs to change? You put all your migration code there? How is that different from SQL migrations?
mike_hearn 3 days ago
njitbew 3 days ago
Most of these arguments probably don't outweigh the benefits. If you're in need of a managed, highly-consistent, highly-scalable, distributed database, and you're already an AWS customer, what would you use instead?
oweiler 3 days ago
belter 3 days ago
andrewstuart 2 days ago