A Brief History: From Ledgers to Big Data
In the 1970s and 80s, databases were designed to mimic physical financial ledgers. If a customer visited a bank, the teller needed to locate and update a specific row of data (containing their name, address, and balance). This was the origin of row-based Relational Database Management Systems (RDBMS).
Systems like Oracle and MySQL dominated the industry because they were highly efficient for OLTP (Online Transactional Processing)—handling thousands of small transactions quickly. However, by the 2000s, demands shifted. We began asking questions like, “What is the total sales trend across 50 million transactions?” This is where the row-based format started to struggle, paving the way for columnar technology (such as Snowflake, BigQuery, or ClickHouse).
Row-Based Storage: A Complete but Heavy Package
Imagine a row-based database as a grocery store that only sells giant, pre-packaged bundle boxes. Even if you only need a single egg, you are forced to buy a massive box that contains rice, cooking oil, milk, and meat all at once.
- How It Works: Data gets written to the disk horizontally. In a 100-column table, every single piece of information for “User A” is packed tightly together in one physical data block.
- The Pros: It’s lightning-fast for writing data. Tacking on a new customer is just a quick “tap” at the very end of the file.
- The Enemy of Dashboards: The moment your dashboard tries to calculate “Total Profit,” the system has to sift through every unrelated column—like Name, Address, and Phone Number—just to get to that “Profit” number. This triggers a massive I/O bottleneck by loading 90% of data you have absolutely no use for.
Columnar Storage: The Smart, Specialized Rack
Columnar storage flips this logic on its head. Instead of bundling data by individual records, it groups data by category. All ‘Price’ data is kept in one place, while all ‘Date’ data is stored in another
How This Makes Dashboards Faster?
Technically speaking, here is what happens under the hood of a columnar engine:
- Selective Reading (Projection): If your chart only requires “Date” and “Revenue,” the database performs a seek operation strictly on the physical blocks of those specific columns, completely ignoring hundreds of other columns.
- High-Level Compression: Since a single column contains the exact same data type, the database utilizes highly efficient algorithms:
- Run-Length Encoding (RLE): It writes “Indonesia x 1000” instead of repeating the same word a thousand times.
- Dictionary Encoding: It replaces long string values with short numeric codes (e.g., swapping “Smartphone” with 1).
- Data Skipping (Predicate Pushdown): The engine stores metadata (such as Min/Max values) for each data block. If you query for “Year 2024,” the engine automatically skips—a process known as data pruning—millions of rows whose metadata indicates a range of 2020-2022, without reading them at all.
- Vectorized Execution (SIMD): Unlike row-based systems that process records one by one, columnar engines leverage SIMD (Single Instruction, Multiple Data) technology. This allows the CPU to fetch 1,000 data points simultaneously and compute them in a single clock cycle.
| Feature | Row-based (OLTP) | Column-oriented (OLAP) |
| Analogy | Grocery Bundle | Specialized Rack |
| Write Speed | Slow for large aggregations | Super fast for analytics |
| Disk Efficiency | Very fast (Update/Insert) | Slower (Batch load) |
| Efisiensi Disk | Inefficient (Uncompressed data) | Highly efficient (High compression) |
| Best Suited For | Web Applications, ATM Transactions | Dashboards, BI, Data Science |
Conclusion: The Cure for Slow Dashboards
If your dashboard is running slow, chances are you are trying to perform heavy analytical computations on top of a transactional, row-based engine.
Switching to a columnar database is not just a “quick fix”—it is a paradigm shift from a heavy cargo truck to a super-fast bullet train. By leveraging I/O efficiency and parallel processing power, no matter how massive your data is, it ceases to be a bottleneck and transforms into an asset that can be queried instantly.