SQL Server Licensing: Big Data Clusters | Data Exposed

[MUSIC]

>> Hi, my name is Amit Banerjee. I'm a Group Program Manager with the SQL Server product group, and my day job is to take care of the database engine that runs your beloved SQL Server. In today's episode, we'll talk about how the container licensing we covered in the previous episode applies to the SQL Server big data clusters we introduced in SQL Server 2019.

A big data cluster lets you join your relational data, that is, any database on your SQL Server instance, with data sitting inside HDFS or in any other external data source outside the big data cluster. There are a few scenarios we enable with the SQL Server 2019 big data cluster. The first is data virtualization. The second is caching: if your data needs to be cached inside SQL Server, you can create caches of the data sitting in your storage and data pools and build data marts on top of them. The last is that you can run AI and ML experiments on top of it using Spark, Python, Java, or R. Most of these are capabilities we already had in SQL Server 2017 and below; the Java and Spark support is new.
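As a flavor of that last scenario, here is a minimal PySpark sketch of the kind of experiment you might run against data in the storage pool's HDFS. The path and column name are hypothetical, purely to show the shape of the workflow:

    # Hypothetical sketch: read data from HDFS inside a big data cluster
    # and run a simple Spark aggregation over it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bdc-sketch").getOrCreate()

    # The HDFS path below is made up for illustration.
    df = spark.read.parquet("hdfs://namenode-svc/data/clickstream")
    df.groupBy("region").count().show()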
So let's see what the architecture of a SQL Server big data cluster looks like. You have something called the master, which is a SQL Server Enterprise Edition or Standard Edition instance, and then you have the other components around it. These are essentially a bunch of containers running the compute pool, the data pool, and the storage pool. All of these together are what we refer to as the cluster nodes. What the cluster nodes do for you is let you expand horizontally: if you're running out of compute capacity, you can extend the compute pool, the data pools, and the storage pools sideways.

Now, if you have a SQL Server 2019 or 2017 instance that is not part of a big data cluster (I'm going to use BDC as an abbreviation here), you can take a database and restore it into the master. The BDC master can host a SQL Server database, and not just a 2019 database: you can restore one from 2008, from 2012, or higher. A backup and restore just works.

The other thing you can do is use data virtualization to read from external data sources. If you have another RDBMS, or another HDFS source sitting in your enterprise environment, you can pull data from these and read from them directly. So this gives you a bunch of different business scenarios you can enable with a SQL Server 2019 big data cluster.
Now the question is: how do you actually license all of this? Let's say the master instance in my scenario is using Enterprise Edition of SQL Server with 10 Enterprise Edition cores, and your big data cluster nodes, the compute pool, the data pool, and the storage pool, have 10 cores each, just to keep the calculation simple. In this example, you have 30 cores of big data cluster. So you would need to license 10 cores of Enterprise Edition and 30 cores of big data cluster, and these are priced differently: each big data cluster node is licensed by cores, at $200 per core per year. What that does for you is let you scale out your big data cluster environment and your master environment independently.
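To make that arithmetic concrete, here is a minimal Python sketch of the example above. The $200-per-core-per-year big data cluster price is the figure from this episode; the Enterprise Edition per-core price is deliberately left out, since it isn't quoted here:

    # Sketch of the licensing example above. BDC_CORE_PRICE is the
    # $200/core/year figure from the episode; the Enterprise Edition
    # rate is not quoted here -- check Microsoft's current price list.
    BDC_CORE_PRICE = 200           # USD per core per year

    master_ee_cores = 10           # Enterprise Edition cores on the master
    bdc_cores = 10 + 10 + 10       # compute + data + storage pools

    print(bdc_cores * BDC_CORE_PRICE)   # 30 cores -> $6,000/year
    # The 10 master cores are licensed separately at the EE per-core rate.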
Now, what happens when you build something like a data cache and you need to expand? In this scenario you can, for the first time, expand horizontally: your big data cluster nodes could go from 10 to 20 to 30 to 40 cores, and keep increasing as your enterprise's needs grow. Your 30 cores could become 40 without affecting the number of cores required in the master to run them. As long as you have the compute capacity to run all of these scenarios, you only need to license what's required for the master plus the big data cluster cores inside the compute, data, and storage pools.
Now the interesting part: what happens when you actually need more? Do you need to go buy more licenses? The simple answer is yes. But if you have Software Assurance, a licensing benefit you can enjoy with enterprise agreements with Microsoft, then for each Enterprise Edition core in the master you get an entitlement of eight big data cores, and for each Standard Edition core in the master you get one big data core. So each master core earns you additional entitlement to run this environment.
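As a rough sketch of that entitlement rule, assuming Software Assurance, eight big data cores per Enterprise Edition master core and one per Standard Edition master core:

    # Sketch of the Software Assurance entitlement described above.
    def entitled_bdc_cores(ee_cores, std_cores=0):
        # 8 big data cores per EE master core, 1 per Standard core.
        return ee_cores * 8 + std_cores

    def billable_bdc_cores(bdc_cores, ee_cores, std_cores=0):
        # Only big data cores beyond the entitlement need a license.
        return max(0, bdc_cores - entitled_bdc_cores(ee_cores, std_cores))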
Now take this scenario: say I started with 10 cores in my master and 30 cores in my BDC, and I have Software Assurance. I'm essentially running this at no cost. It's an entitlement, because those 10 cores now give me 80 cores of big data cluster for free, so until I exceed 80 cores, I don't need to license anything. In a PoC scenario, if I took this entire environment and grew it the way I drew below, you would go from 10 to 20 cores all the way up to 80 at no cost at all. You're running this absolutely free, even in production. Then when you went to 90 or 100 cores, you would start needing to license the 10 or 20 additional big data cores at $200 per core per year, while the master licensing stays the same.

Now suppose your database has grown and your master needs to go from 10 to 20 cores. With 20 EE cores, you get another 80 entitled cores: previously you were getting 80 because you had 10 cores in the master, and now you get another 80, so that is 80 times two, or 160 big data cores.
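Running the numbers from this walkthrough through that sketch reproduces them:

    print(entitled_bdc_cores(10))             # 80: the 30 BDC cores are free
    print(billable_bdc_cores(30, 10))         # 0
    print(billable_bdc_cores(90, 10) * 200)   # 10 extra cores -> $2,000/year
    print(billable_bdc_cores(100, 10) * 200)  # 20 extra cores -> $4,000/year
    print(entitled_bdc_cores(20))             # 160 after growing the master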
So essentially, in this entire architecture, if your big data workload increases, then as long as you don't exceed your entitlement and you have Software Assurance, you can keep growing and you only pay the difference. If your master workload keeps increasing, you will need to keep growing the master, but as a side benefit that gives you more headroom to scale your big data nodes horizontally in the future. Running a PoC is a lot easier, and moving that PoC workload into production without incurring additional cost for your environment becomes even easier with SQL Server 2019 and Software Assurance.

If you want to learn more about these capabilities, we have earlier videos in this series where we talked about SQL Server 2019 and the capabilities and architecture of the big data cluster. Thank you for joining this one. [MUSIC]
