Recent Writing

Converging on Declarative Data Materialization

Workflow schedulers are a bad fit for data materialization, and something better is badly needed. This article examines a potential solution that will feel familiar, its benefits, and the challenges facing its implementors.

It's Time to Retire the CSV

Despite its ubiquity and ease of access, CSV is a wretched way to exchange data. It is long past time to retire CSV and replace it with something better.

S3 Intelligent-Tiering: What It Takes To Actually Break Even

When does it make sense for an object to be in Amazon S3’s Intelligent-Tiering ("S3-IT") storage class? The answer, unfortunately, is "it depends". (Published on the Duckbill Group blog.)

The AWS Data Ecosystem Grand Tour

In late 2019 and early 2020, I wrote a series of articles offering a whirlwind tour of the portions of AWS's vast service ecosystem dedicated to data management.

  1. Introduction
  2. Where Your AWS Data Lives
  3. Block Storage
  4. Object Storage
  5. Relational Databases
  6. Data Warehousing
  7. Data Lakes
  8. Key/Value and "NoSQL" Stores
  9. Graph Databases
  10. Time-Series Databases
  11. Ledger Databases
  12. SQL on S3 and Federated Queries
  13. Search
  14. Streaming Data
  15. File Systems
  16. Data Ingestion
  17. ETL
  18. Processing
  19. Data Interfaces
  20. Training Data for Machine Learning
  21. Data Security
  22. Business Intelligence