The AWS Data Ecosystem Grand Tour - Ledger Databases
Written by Alex Rasmussen on January 24, 2020
This article is part of a series. Here are the rest of the articles in that series:
- Where Your AWS Data Lives
- Block Storage
- Object Storage
- Relational Databases
- Data Warehouses
- Data Lakes
- Key/Value and "NoSQL" Stores
- Graph Databases
- Time-Series Databases
- Ledger Databases
- SQL on S3 and Federated Queries
- Streaming Data
- File Systems
- Data Ingestion
- Data Interfaces
- Training Data for Machine Learning
- Data Security
- Business Intelligence
In the last couple of articles, we've been looking at special-purpose databases that perform well in domains where relational databases struggle. The service we'll cover in this article doesn't quite fit into that category. Its primary function has long been performed by traditional relational databases. In fact, its core functionality predates computers and is almost as old as civilization itself! We're of course talking about ledgers.
Ledgers have their origins in accounting, where they serve as a permanent master record of economic transactions. As a rule, ledgers are immutable and append-only. You aren't ever allowed to erase an entry from a ledger, edit an existing entry, or go back and insert an entry in the middle; you're only allowed to add new records to the end. Corrections to the ledger are made by adding correcting entries that compensate for any mistakes. In principle, this prevents someone from cooking the books and letting company money disappear, usually into the book-cooker's pocket.
In the world before computers, these ledgers were typewritten or hand-written books or cards or tablets that were stored in a safe place. Some of the oldest known examples of human writing are ledgers, written in cuneiform on clay tablets by some ancient Mesopotamian accountant to track quantities of traded goods.
Today, ledgers are largely stored digitally, but the same concerns about someone cooking the books still apply. Financial institutions care a great deal about having an immutable, append-only, tamper-proof record of money changing hands. Companies with supply chain management problems also want to track the supply chain for the components that make up their products in a way that ensures (at least in theory) that nobody on the supply chain can lie about what they're shipping or where it came from.
This is where ledger databases, and AWS's Amazon Quantum Ledger Database (QLDB) in particular, come in.
Wait, Is This Blockchain?
Before we talk about QLDB, it's worth addressing the false conflation of digital ledgers with blockchain technologies, one of the hot technological topics of the last couple of years. To summarize as quickly as I can, blockchain technologies are trying to provide a distributed and decentralized ledger that no single entity controls, where the parties using the ledger don't have to trust each other. QLDB's ledgers are neither distributed nor decentralized; instead, they're fully controlled and operated by AWS. This means that at some level you have to trust AWS not to tamper with your QLDB ledgers, but in all but the most security-sensitive environments that's probably fine. If that's not something you can live with, AWS has a managed blockchain solution called Amazon Managed Blockchain, but we won't cover it here.
Rather than make users interact with ledgers directly, QLDB exposes them as tables. Each table contains an unordered bag of documents. Creating tables and writing documents are transparently mapped to append operations on the underlying ledger. In addition to normal CRUD operations on documents, users can also query the revision history for a table to see which documents have changed over time and how they've changed.
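To make the mapping from table operations to ledger appends concrete, here is a minimal Python sketch. It is not the QLDB API: LedgerTable, Revision, and every method name are hypothetical, and the sketch assumes each document is identified by a caller-supplied ID. It only illustrates the idea that inserts, updates, and deletes all become appended entries, and that revision history falls out of the journal for free.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """One immutable journal entry: a new document version, or a deletion."""
    doc_id: str
    version: int
    data: Optional[dict]  # None marks a deletion


@dataclass
class LedgerTable:
    """Hypothetical table view over an append-only journal (not the QLDB API)."""
    journal: list = field(default_factory=list)

    def _append(self, doc_id, data):
        # Entries are only ever added to the end; nothing is edited or removed.
        version = sum(1 for r in self.journal if r.doc_id == doc_id)
        self.journal.append(Revision(doc_id, version, data))

    def insert(self, doc_id, data):
        self._append(doc_id, dict(data))

    def update(self, doc_id, changes):
        current = self.get(doc_id)
        if current is None:
            raise KeyError(doc_id)
        # An update is a new revision that carries the merged document.
        self._append(doc_id, {**current, **changes})

    def delete(self, doc_id):
        # A delete is also an append: it records that the document is gone.
        self._append(doc_id, None)

    def get(self, doc_id):
        # The current state is whatever the most recent revision says.
        for rev in reversed(self.journal):
            if rev.doc_id == doc_id:
                return dict(rev.data) if rev.data is not None else None
        return None

    def history(self, doc_id):
        # Every past revision is still in the journal, oldest first.
        return [r for r in self.journal if r.doc_id == doc_id]
```

Calling insert, then update, then delete on the same ID leaves three entries in the journal, and history still returns all three even though get reports that the document no longer exists.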
At its core, QLDB is an append-only journal that records every change to the ledger's data. The journal is composed of a linked list of blocks that contain information about a change. Each block has a digest that can be used to determine if a block has been tampered with. For the first block in the chain, that digest is just a function of the block's contents. All subsequent blocks' digests are a function of both the block's contents and the digest of the previous block in the list. This makes both the block's data and its linkage to the rest of the list tamper-resistant.
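The digest chaining described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than QLDB's actual implementation: the use of SHA-256, the JSON serialization, and the function names block_digest, append_block, and verify_chain are all choices made for this example.

```python
import hashlib
import json


def block_digest(contents, previous_digest=None):
    """Digest of a block: its contents plus the previous block's digest, if any."""
    payload = json.dumps(contents, sort_keys=True).encode("utf-8")
    hasher = hashlib.sha256()  # assumed hash function, chosen for illustration
    if previous_digest is not None:
        hasher.update(previous_digest.encode("ascii"))
    hasher.update(payload)
    return hasher.hexdigest()


def append_block(chain, contents):
    """Append a block whose digest is chained to the current last block."""
    previous = chain[-1]["digest"] if chain else None
    chain.append({"contents": contents, "digest": block_digest(contents, previous)})


def verify_chain(chain):
    """Recompute every digest in order; any mismatch means tampering."""
    previous = None
    for block in chain:
        if block["digest"] != block_digest(block["contents"], previous):
            return False
        previous = block["digest"]
    return True
```

Because each digest depends on the one before it, changing the contents of an early block invalidates that block's digest, and recomputing that digest to cover the change invalidates every block after it.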
Documents are written in AWS's Ion document format, a superset of JSON with a richer type system and read-optimized binary representation. Tables are queried using PartiQL, AWS's extension of SQL that supports semi-structured data. Interestingly, PartiQL isn't specific to QLDB; it's also the query language used for AWS services like Redshift Spectrum that query data in a data lake or data federated across multiple systems, which we'll look at further when we discuss federated queries.
There are a few restrictions on what you can do with QLDB tables versus what you might be used to in relational databases. You can declare indexes, but you can only create 5 indexes per table, can only create them when the table is empty, and can't drop them after they're created. Indexes are only used to speed up equality predicates, so you shouldn't expect SELECT ... WHERE fieldName < 12 to go any faster if you have an index on fieldName. You can do joins, but only inner joins are currently supported.
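One way to build intuition for why an index might help equality predicates but not range predicates is to picture a hash-style index. The Python sketch below assumes that model purely for illustration; the article doesn't say how QLDB implements its indexes, and build_index, find_equal, and find_less_than are hypothetical helpers.

```python
from collections import defaultdict


def build_index(documents, field_name):
    """Map each distinct field value to the positions of documents that have it."""
    index = defaultdict(list)
    for position, doc in enumerate(documents):
        if field_name in doc:
            index[doc[field_name]].append(position)
    return index


def find_equal(documents, index, value):
    """Equality predicate: one index lookup, no scan of the table."""
    return [documents[i] for i in index.get(value, [])]


def find_less_than(documents, field_name, bound):
    """Range predicate: a value-to-positions map can't help, so scan everything."""
    return [d for d in documents if field_name in d and d[field_name] < bound]
```

In this model, a lookup for fieldName = 12 touches only the matching documents, while fieldName < 12 still has to examine every document in the table.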
Given these restrictions, you might wonder why you shouldn't just do all this work in an existing relational or document database. The short answer is that you totally can, but you'd probably end up re-implementing a lot of the same stuff that QLDB already has. Still, given the higher-than-usual potential for vendor lock-in with QLDB, building it yourself may well be the right choice for you.
QLDB is serverless, so you're not managing any instances directly. Instead, you pay for read and write requests, billed per million requests. You also pay for index and journal storage by the GB-month.
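As a rough illustration of how those charges combine, here is a small Python sketch of a monthly estimate. The function name and the price keys are hypothetical, and the prices are passed in as arguments rather than hard-coded because the article doesn't state them; check AWS's current pricing page for real numbers.

```python
def estimate_monthly_cost(read_requests, write_requests, journal_gb, index_gb, prices):
    """Estimate a monthly bill from request counts and stored gigabytes.

    `prices` is a caller-supplied dict of placeholder rates:
    per_million_reads, per_million_writes, journal_gb_month, index_gb_month.
    """
    # Requests are billed per million.
    request_cost = (
        read_requests / 1_000_000 * prices["per_million_reads"]
        + write_requests / 1_000_000 * prices["per_million_writes"]
    )
    # Storage is billed per GB-month, separately for journal and indexes.
    storage_cost = (
        journal_gb * prices["journal_gb_month"]
        + index_gb * prices["index_gb_month"]
    )
    return request_cost + storage_cost
```

Since the journal is append-only, its storage component only grows over time, so it's worth modeling expected write volume before committing.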
Next: Join the Federation
In this article, we took a look at AWS's ledger database offering, which defines a SQL-like interface on top of a domain-specific data structure. In the next article, we'll look at systems that support SQL-like interfaces on top of data that's not always relational, and not always in one data store.