How to use Jupyter Notebooks in Azure Data Studio | Azure Friday

Hey friends, you know the SQL Server team is going all in on Jupyter notebooks, with support for SQL, PowerShell, and Python notebooks in Azure Data Studio. And now the SQL Server team is shipping the internal troubleshooting guides used by actual Microsoft engineers as Jupyter Books, making them available to the general public. Vicky Harp is here to show me what's new, today on Azure Friday.

Hi, I'm Scott Hanselman, and this is Azure Friday, here with Vicky Harp. There's a lot of stuff going on in SQL Server, and apparently I'm out of the loop; the last time I was an expert in SQL Server was 10 or 15 years ago. Well, we just announced the GA of SQL Server 2019 back at Ignite [2019], so there is a lot going on, both in SQL Server as a platform, in the cloud, on-prem, and hybrid, and in our tooling approaches to it as well. So what was the tool that you used back then? We used SQL Server Management Studio. It was kind of a Visual Studio looking thing, and we would basically just have folders full of .sql files. We had a bunch that we'd built up as a group, and our shared knowledge was usually a readme.txt and a SQL file, or maybe a SQL file with some comments. And where were you running your tooling and your SQL Server?
We were running it on a Windows Server in a data center on the third floor that you could go down and visit; you could touch it, it was a real computer. And we were doing all the work on our local machines, with connection strings that were probably emailed around. It was not very sophisticated. Right, so the world has changed for SQL Server users and for developers using the data platform. We've got SQL Server in the cloud, SQL Server on-prem, SQL Server on Linux, in containers, on Kubernetes. We've got hybrid approaches, and we've announced SQL Database Edge for edge devices. With all of that, SSMS is a wonderful tool, and my team works on SSMS and continues to work in SSMS, but it is a Windows-only tool and it's not going to be flexible enough for all of these new ways of working with data. As well, what you mentioned about having SQL files that you're just kind of passing around also doesn't really scale to the way that people work these days.
So to that end, we've been working on some new tooling approaches, and one of the most important things to come out of our tooling group lately has been Azure Data Studio. Azure Data Studio is a downstream fork of VS Code, so it's cross-platform with all of the goodness that comes with that. It's a downstream fork that we integrate on a daily basis and release every month, and to the VS Code shell it adds a lot of connectivity to data services like SQL Server, all the different editions of SQL Server, Postgres, and so on. So it is a tool that you install in your dev environment, but then you're able to connect to all of your servers wherever they are, and you can do it from Mac, from Linux, and from Windows. Azure Data Studio is the next evolution of data management on every platform now, not just Windows. So it will do all the things I'm used to doing: it will connect to SQL Server, run .sql files, and do the stuff that I use SQL Server Management Studio for. Plus a whole lot more that we're going to learn about today.
Yes, exactly. So the thing that I really want to talk about today was, as you mentioned, Jupyter notebooks and how we're using them in Azure Data Studio. You're familiar with Jupyter notebooks? I am; my team works on .NET Interactive, and Jupyter, I know, has this concept of a kernel. Even though I think of Jupyter as Python, there's this kernel, this runner, and right now we've got .NET with C# and F#, so you can make an interactive paper that mixes prose and code. Exactly, and this is something that we saw as really addressing the problems and needs of our data users, because if you think about the way that people deal with their troubleshooting scripts, they were just sending around SQL files and CSV files, and these things were all disconnected from each other. So if I may, I'll show you a little bit of what our support for this looks like in the tool.
Here in Azure Data Studio, we have the ability to create both queries and notebooks. If I just do a new query, this is probably what you're used to: I'm going to connect, pick my localhost, this machine, and I can do SELECT * FROM sys.databases, run that, and I get my results. This is the SSMS way, the way that most people are used to working. But when I save this, if I hit Ctrl+S and save it, what I'm getting is a SQL file, and it has no comments unless I've actually added them; it's literally just a text file.
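For reference, the query in the demo looks roughly like this; the explicit column list is just one reasonable way to shape the output instead of SELECT *.

```sql
-- List the databases on the connected instance
SELECT name, database_id, create_date, state_desc
FROM sys.databases;
```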
Then in supporting SQL Server 2019, we had a need to add something more robust for SQL Server 2019 Big Data Clusters, which include Spark, so you had a need to run Python. The first thing that we did was add a traditional, non-SQL notebook that supported Python. This is just a little lightweight one that I've created that uses the Dog API to pull down a random image of a corgi, so this is using a Python kernel. You can see here we've got a number of other kernels available: PySpark, Scala, R, Python, PowerShell, and SQL are shipping in the product. So what's that SQL kernel for? It's something that we've added in Azure Data Studio to give you, inside a Jupyter notebook, the full fidelity of the SQL experience that you get in the query editor. Let me close that one out and show you a comparative experience.
Let me start a little blocking scenario here, a five-minute block. I've got two transactions; one of them is updating the same table as another, so they're going to be blocking each other.
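The blocking setup being described can be reproduced with something like the following sketch; the table and column names here are hypothetical, purely for illustration.

```sql
-- Session 1: open a transaction and hold a lock by not committing
BEGIN TRANSACTION;
UPDATE dbo.Orders SET Status = 'Processing' WHERE OrderId = 42;
-- ...no COMMIT yet, so the updated row stays locked

-- Session 2 (a separate connection): this statement now waits on session 1's lock
UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderId = 42;
```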
So let's say that my DBA has this detect-blocking .sql file. They run that, I pick the server to run it against, run it, and now I've got this output in front of me.
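A detect-blocking script of this kind typically leans on the dynamic management views. A minimal sketch, not the exact script shown in the demo, might look like this:

```sql
-- Any request with a non-zero blocking_session_id is currently being blocked
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```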
But if I want to share this with somebody else, I'm probably having to type an email to say, this is the query I ran. You want to share the context, right: where did it run, when did it run, what was the situation, what were the results? An email with a bunch of attachments is very disjointed; the number of attachments could be one, could be twenty, plus a bunch of prose, and then what is the format? The email, right, exactly. And then let's say somebody wanted to continue working on that; now they're taking those disparate pieces and reconstructing them into something they can work with. Compare that instead to a notebook. This is a SQL notebook, and what I've got here, I've actually collapsed all the cells, so let me expand them, and you can see I've got all the SQL that I need, including that diagnostic query, interleaved with some light descriptions of what it is.
So if I run this same thing here, I've now got a number of other pieces of information embedded in this artifact that I can send to somebody. Let me scroll up, and I can see, OK, this is the uptime information for the server, these are the waits, and here is some other information about the stored procedures that are running, the query statistics, and so on.
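As a rough illustration of the diagnostics such a notebook collects (a sketch, not the exact queries used in the demo), uptime and top waits can both be read from the DMVs:

```sql
-- Server uptime, derived from the instance start time
SELECT sqlserver_start_time,
       DATEDIFF(MINUTE, sqlserver_start_time, SYSDATETIME()) AS uptime_minutes
FROM sys.dm_os_sys_info;

-- Top waits accumulated since the instance started
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```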
I can save this, and if I hit Ctrl+S, save it, and close it, then go over here and reopen the diagnostic queries notebook, it's going to have all of that information still in there. So the next person I send it to, well, let's see if I double-clicked wrong, the next person I send it to is going to be able to open it and immediately start working on it. So this is where the connection strings come in at that point? The connection strings work just like anything else in SQL Server: you choose to connect. Connections are a thing here just like they are in SQL Server Management Studio. So you can look at the prose and you can look at the queries, but the connection string isn't shipped along with the book? Correct, and that's partly a security consideration.
That makes sense, because the book can be generic as well, not specific to one server. Exactly. And so one of the things that we started hearing about was the desire for people to create collections of these. Say you have your very senior DBAs, and they have all of their special scripts; they can combine them into notebooks and then make them available to the rest of the department, to say, when this happens do X, when this happens do Y. Are these then source-control-appropriate artifacts? Yes, you can check these into git, check them into your source control, as the .ipynb files.
And then as we talked to users and they were excited about this concept, we realized something kind of interesting, which is that that's what we at Microsoft do. Our engineers within SQL Server also have all of these special troubleshooting books, as we're calling them now, but they used to be troubleshooting SQL files, and they were kind of floating around here and there, in repos, in emails, in OneNote. These are the things where, when we got an escalation from a customer, somebody would find this kind of magic script that lives somewhere and make sure that it got back into the support case. It was kind of dribs and drabs, and it wasn't shared universally. So what we've decided to do is use this open-source concept called a Jupyter Book and actually ship those scripts that we have internal to Microsoft to customers, within the product. So a notebook is prose plus runnable cells of a language, in this case SQL, and a Jupyter Book is a collection of notebooks? Correct, it's a collection of notebooks with a table of contents and cross-linking between them.
So let me bring up the command palette, Ctrl+Shift+P, and go to the Jupyter Book SQL Server 2019 Guide. What that's doing is, if I go over here to my books viewer, I've now got this table of contents and I've got my list here, and these are the actual troubleshooting guides that we use internally at Microsoft. If I go to the troubleshooters here, there are troubleshooters of all kinds. Here I've got the big data cluster, which is the new SQL Server 2019 with Spark, and I can go to the troubleshooter for that. It's got a notebook here, and this notebook has suggestions of other notebooks that I might want to click into. So here I'm going to go to this one, say, what's the version information? If I look at each of these cells, it's actually got the Python code, in this case, that would be run if you opened a support case with Microsoft. So you can run this yourself and see if you can get what you need out of it, and if not, you've got everything you need to open a case or bring it to the community.
And so that's how we're trying to get this information out there; we figure there's no reason to keep it internal. The other thing that we're doing is we're actually building this with SQL Server. You asked whether these were source-code-type artifacts: the Python that we're writing in these notebooks is actually being used as tests of SQL Server. When you run a build of SQL Server, it's running all these notebooks as gated check-in tests, so this isn't just a random bit of documentation that you don't care about and have thrown over the wall; this is the product itself. It's integrated with SQL Server and now shipped with Azure Data Studio? Correct, and with it being shipped in Azure Data Studio, we have the ability to push updates to it as we have changes. Let's say from build to build the DMVs change, or you're going to a different version of SQL Server; we can make sure that the troubleshooting is kept up to date. So that's how we're sharing it with the community, and we're going to be doing it through GitHub as well.
We've got a number of other examples: our SQL Server samples repo, for instance, is now investing in putting demos of how to use new features into notebooks, so that you can choose to run them and try things out. If you think about how samples have always worked before, you could say, I don't want to run it, I just want to read through and see what the results were when they ran it. Now I can actually see the results as they were run in our lab, take a look, and say, oh, that's what I'd get if I tried this feature, and you can either run it in your local environment or not. So that's how we're getting Jupyter notebooks really embedded into the way that we want to communicate about new features and troubleshooting, and into helping users create their own artifacts of that kind. So we want people to understand Jupyter notebooks; it's not just for big data anymore, right? It's not just Python; it's any number of languages that you have a Jupyter kernel for, in this case SQL. And this is not an experiment.
This isn't a little thing you're trying out; we said "all in" during the introduction, that SQL Server is all in on Jupyter notebooks. This is the way to do things? Absolutely. I mean, we are now using it in our own build pipeline, so we're all in on it even for our own internal purposes, and certainly for external customers. We are committed to this; we view notebooks as both a way of doing data work and, really, as a new interactivity method. As an example of that, I want to show you another way that we're using notebooks.
The idea is a new deployment experience. Here I want to do a deployment of SQL Server in a container, so I'm going to pick SQL Server 2019, hit Select, put in a default password, put it on a port that I know is not in use, and hit Open Notebook. So rather than going through a full wizard experience, we can now spit out the code that you need to operationalize something into a notebook, and let you run it, make changes to it, and do everything you need, instead of hiding everything behind a click-through wizard. So this is the nice thing about the new Microsoft: not everything is necessarily hidden behind a wizard. Right, and at this point it's just a text file, and ultimately you can go down and see what's really going on. There's nothing hidden.
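Once the generated notebook finishes and the container is up, a quick sanity check from a SQL kernel cell might look like the sketch below; this is an illustration, not part of the generated notebook, and assumes you connect to localhost on the port chosen in the wizard.

```sql
-- Confirm the new instance is reachable and which build is running
SELECT @@VERSION                      AS sql_version,
       SERVERPROPERTY('Edition')      AS edition,
       SERVERPROPERTY('ProductLevel') AS product_level;
```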
And we have this now; everyone can do this. So people who are watching and have never heard of this: if they have Azure Data Studio, they probably have all of these things available to them today? Yes, all that they really need to do is go and download and install Azure Data Studio. It's available; if you just search on Bing you can find it, or you can go over to our GitHub repo, and from there you can also get access to the insiders build, similar to what you would do with VS Code. Very cool. All in on Jupyter notebooks in Azure Data Studio. Buckle up, data friends, there's a lot of great stuff happening, and I'm learning about it today on Azure Friday.
