Posts
All the articles I've posted.
What is Apache Arrow? Erasing the Serialization Tax
Published: at 12:00 PM
If you pull a million records from a database into a Python notebook, the query runs instantly, but the transfer feels endlessly slow. Your compute en...
What is Apache Iceberg? The Table Format Revolution
Published: at 12:00 PM
If you drop ten thousand Parquet files into an S3 bucket, you have a data swamp. You do not have a database. To run SQL queries against those files sa...
What is Apache Parquet? Columns, Encoding, and Performance
Published: at 12:00 PM
If you ask a data analyst to calculate the average transaction amount for the month of July using a massive CSV file, the compute engine must read eve...