What is a Columnar Database?

What is a Columnar Database?


Hi I’m Jared Hillam, A couple of years ago I was presented to something
called a Columnar database. It was accompanied by a demo where the presenter queried several
billions of rows of data in less than 3 seconds. I waited patiently for magic ferries to fly
out of the server. But soon after I realized that this solution was simply taking a far
more logical approach to acquiring large sums of data. Today I’m going to share a simple exercise
which will help you see the power of using a Columnar database for certain tasks. To illustrate this exercise, I’m going to
use a good old excel spreadsheet, populated with some baseball Statistics. Let’s imagine that you’re going to play
the role of the Traditional Database. Before I have you play this role though, you need
to know that there’s a rule. You have to read the data from left to right starting
from the beginning of each row, as you go from row to row, kind of like reading a book. So, let’s begin. Traditional Database, please
retrieve for me all of the American League Teams… OK I’ve asked for two columns of data, Leagues
and Teams. Here goes, as you read the data notice that you’re retrieving the League
and Team on each pass of every row, and eventually you’ll acquire all the American Leagues
and their associated Teams. Now servers do this very quickly, however what if my request
has several BILLIONS of rows, you can start to see that reading each row just to grab
the league and team columns could take a while. And this is where a columnar database comes
in handy. We’re going to redo the exercise one more time, but instead of giving you Traditional
Database rules, I’m going to give you columnar database rules. This means you’re going
to read the data from top to bottom, and you’re only going to read the columns I ask you to. OK, Columnar Database, please retrieve for
me all of the American League Teams… Let’s begin, Notice that this method skips
all that data that isn’t related to what you’re looking for. Once we’re done getting
that information we start with the next column, skipping the columns that you don’t need. But wait a second, how does the computer know
which League to assign each Team it’s retrieving. The way it does this is that the Columnar
Databases assigns a number to each row of data, allowing it to quickly pair up the many
columns that it retrieves. You can see how this really comes in handy
when you start reading the Teams column. All it needs to know is which number the American
League values ended at, in this case row 258. Meaning I just need Teams from 1 to 258 Now you probably noticed how the columnar
database was repeating American over and over. This is actually another advantage of a columnar
database. By using the numbering system in columnar databases, algorithms can be used
to simplify the retrieval of data. And you’ll find that each Columnar Database system has
highly sophisticated methods of gaining further performance measures. Knowing what I know now, I realize that the
presentation I sat through a few years ago wasn’t reading billions of rows of data.
It was just reading a few long columns of data. Columnar Databases can be a very straight
forward way of addressing the mounds of data corporations have to wade through. They’re
not a replacement for Traditional Databases, but certainly a powerful way of doing highly
aggregated analysis. Intricity specializes in providing leading
edge solutions for today’s complex data problems. I’d encourage you to visit the
Intricity website, and talk with one of our Specialists. And I commit to you that we’ll
live up to our slogan of simplifying complexity.

Leave a Reply

Your email address will not be published. Required fields are marked *