How to Secure and Protect Your Data in Cloud Storage (Cloud Next ’19)

Welcome, everyone. One of the common questions
that I hear, especially from customers who
are considering moving their data to the
cloud for the first time, is: what about my security? So security, I think, is
one of the main concerns that customers have when
they’re thinking of moving their data to the cloud. I’m Subhasish, a product manager
for Google Cloud Storage. I also have with me
my colleague Risham, who’s also from the product
management team in Google Cloud Storage. We also have with us our friends
from Twitter, Chad and Chris, from the Twitter’s
public cloud team. And in this session,
we are going to address just that question. How do you secure and
protect your data in cloud with Google Cloud
Storage or GCS? So let’s take a
look at the agenda. We are going to cover a range
of different things today. I’ll begin by telling
you all about GCS, what is Google Cloud Storage
from a product perspective. Then we are going to take a
look at what are all the cutting edge security primitives
and capabilities that you get by
design and default when you use Google Cloud
Storage to save your data. Then we’ll also take a look
at some of the recent features that we have launched
over the last year or so and how our customers
are using those features. Then we are going to have some
exciting new announcements. We are going to have some
brand new product announcements for the first time in
this session today. We’re also going to have a demo
where we’ll take a look at how it’s very easy for you to
use some of the security capabilities. Then we are going to
hear from our friends at Twitter, who will talk about
how they have architected their security posture
for all their data in Google Cloud Storage. And we’ll end with
an audience question-and-answer session. Sounds good? So without further ado,
let’s jump into what is GCS. So GCS or Google
Cloud Storage is Google Cloud’s unified object
storage or blob storage offering. So whether you want to do
AI or ML, whether you want to do analytics– maybe
you want to run your BigQuery pipeline or your
Dataproc pipeline, whether you want to do
Kubernetes orchestration or you want to use
Anthos, most likely you would start with GCS. GCS is a place where
you store your data. It really acts like a key,
which can unlock the entire data platform that is
Google Cloud for you. And even if you have
not directly used GCS, you may have indirectly
used GCS because of the fact that, whether you use Gmail
attachments, or YouTube, or Google Photos, or
Drive, they’re all powered by the same
underlying storage stack and
infrastructure, which is Google Cloud Storage. So what is Google
Cloud Storage then? Google Cloud Storage
offers a unified API and consistent latencies across
all the different storage classes for the various
different use cases that you may have. The fact that we have unified
API and very consistent latencies across all
different storage classes helps a great deal when you use
GCS for different use cases. So with us you don’t have to
use a different API if you are talking to the
cold storage tier, or you don’t have
to think of how do I architect a system
for offline latencies, because for us every
storage class has online latencies– consistent
sub-second latencies. GCS is very reliable. GCS is very cost effective
across all different storage classes. And last but not
least– and in this session we are going to touch
on that in great detail– GCS has a whole bunch of
really smart inbuilt security capabilities. So let’s take a brief
moment to look at some of the key features of GCS. So I talked about the
unified experience. The unified experience is great
because, whether you are using GCS for analytics or
machine learning, or you are using GCS
for backup or archival, across the whole
range of different use cases with the unified APIs, you
do not have to redo your code, or you don’t have to
re-architect your systems because GCS has the same API
across all storage classes. And it has the same latencies. Another great feature that GCS
has is this strong consistency. GCS is a strongly
consistent system. We are not eventually
consistent. Right. So what that means
is, if you are using GCS to store
your data in a region and you are writing to
certain files in one region, now, in a very different
part of the geography– say in the exact opposite
place on the globe– if you try to list
GCS, you are going to find the entire listing
is strongly consistent. So this has
tremendous benefits. In fact, GCS is the only
planet-scale storage system with such strong consistency. So if you are not
using GCS, most likely to solve an analytics pipeline
use case like this, what you would do is you would have
an object storage, which is your data lake. But you’d also use a
key value database, which you are using for
your metadata consistency. But, if you use GCS, you don’t
need to use any other systems. GCS takes care of it. GCS is also
geo-redundant. Depending on what
is your use case, you can use storage
classes which gives you regional redundancy, or a
redundancy within a country, or you can even choose with
multi regional redundancy, which is across an
entire continent. And lastly, GCS is
scalable to exabytes. You don’t have to worry
about scalability at all if you are dealing with GCS. So sometimes new customers
coming from on-prem, they ask me: I’m going
to bring hundreds of terabytes or petabytes– is it going to scale? So you don’t have any
such worries with GCS. In fact, in GCS we
have single customers who have more than
an exabyte of data. So, depending on what
is your use case, you are going to use
different storage classes or you’re going to use
the different storage classes in conjunction. Now if your use case is
content distribution, if you are streaming
video, or multimedia, or you are hosting
website information, or if you have a business
continuity disaster recovery use case,
multi-regional is great for you. One thing to remember
about multi-regional is that the multi-regional storage
class has smart caching and smart routing
built into the storage class itself. So you don’t have to
necessarily use a CDN at all. If, on the other hand, your
workload is very compute intensive, so what
you are trying to do is basically do a lot
of number crunching, you may be running
an analytics job, or you may be doing genomic
processing, or maybe just general compute– if that’s what you
want to do, what you are looking for is your
storage and your compute to be co-located. And that is where
regional storage classes are really useful,
because you predictably know this is exactly
where my storage is. You can spin up your compute
instances right there. Now recently we also
launched a storage class in beta, which is
called dual regional. And dual regional
really marries the best of multi-regional and regional. So dual regional– what
dual regional is it is a pair of two
specific regions. So you exactly know
where your data is so you can
co-locate your compute. So it’s great for analytics or
ML or other compute use cases. But it also has the
benefits of multi-regional in that it is geodistributed
across two different regions so it gives you the protection
for disaster recovery. Nearline and Coldline are
great if what you are looking for is backup and archive. And for those of
you who are really on top of your game and you have
read our blog post this morning “What is cooler than cold?” we also announced today an
archive storage class which is even colder than Coldline. But the beauty of our
cold storage architecture is at the same time we
support even that storage class with the same API and
the same consistent subsecond latency. So, across all these
different use cases, we have a lot of
customers who use GCS in a very versatile manner. I touched on this
point initially when I started talking about GCS,
but it is worth keeping in mind. The way most people
use GCS, they don’t think of GCS as a
point storage product. From the earlier days of
on prem enterprise storage, you might be conditioned
to thinking about storage in a very particular way. But GCS is very versatile. We see very often GCS
being used as a data lake. GCS is the first place
your data resides when you think of doing
analytics, or machine learning, or whatever really
you want to do. So GCS is your entry
to the entire data platform that is Google. Now let’s start
talking about security. So why security by
design and default? So, at Google, we don’t
think about security as an afterthought. We don’t think
security is something which can be bolted on later. So, even as we
design our systems and design our products
from the ground up, we design security side by side. This is reflected in
various different dimensions. So Google has the
largest private network of any public cloud
provider out there. We literally have thousands
of edge node locations and hundreds of pop node
locations across the globe. So chances are, wherever
you are, wherever you are writing from,
we have an edge or a pop right next to you. So what happens is
your data packets enter Google’s
private network at the closest point of entry and leave
it at the latest possible point of exit. So this has tremendous
performance benefits because it’s really fast. It’s a private network. It has tremendous
cost benefits for you, because you are not paying
much for network egress outside of our networks. But, most importantly, it has
tremendous security benefits as well. So for those of
you who are really experienced with security, you
know security and performance do not always go together. This is one place where they do. So the fact that we have such
fabulous private networking, you can get benefits
of performance and you can get
benefits of security. So consider this example. Say you are trying
to write your data or read your data from
your region to two different regions,
region A and region B. So with any other public cloud
network, what would happen is your data packets
are going to flow through the public internet for
most of the distance, and only when you hit your
specific region do your data packets
go through your cloud vendor’s network. With Google, on the other hand,
what happens is regardless of where your data is stored–
your data may be stored in a faraway region– you are within Google’s
pop at the nearest point which is close to you. So this gives you absolutely
minimum exposure of your data packets on the public internet. So, when we talk about
security or privacy, one of the most important
aspects of that is encryption. When you think about your
data in cloud, for example, you think of data at
rest and data in flight. Now, with GCS, you
do not have to worry. Whether it is your data
at rest or data in flight, your data is always encrypted. We do not believe in this
notion of default encryption as an option in the UI. We feel default and option
are sort of oxymoronic. So, all your data, whether
at rest or in motion, is always encrypted. We also have multiple layers
of encryption in our stack. So that goes with our
concept of defense in depth. Now, one caveat I would
like to mention here is for multi-cloud
interoperability, especially S3
compatibility, because S3 supports HTTP endpoints. For the S3-compatible API, we
also support unencrypted modes of data transfer. But, if you’re
using GCS natively, all your data is
always encrypted. Now, what about
encryption key management? We have a range of
choices for you. If you want to manage and
host the encryption keys by yourself, you could do that. In which case, we wouldn’t
know what is the encryption key you are using. In that case, you would just
specify the encryption key to us. It is called customer
specified encryption key. If you want us to host the
encryption key management service for you but you
want complete control in managing the
encryption key, we support customer managed
encryption key for you. There is also this
other range of use case where you want us to
help you with end to end encryption key management. For that, we have Google
managed encryption, which is everything is
automagical for you. We completely take
care of the fact that we have sophisticated
cutting edge encryption key management policies implemented. Now, auditing is another
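To make the customer-supplied option concrete, here is a minimal sketch in Python. The helper name is mine, but the `x-goog-encryption-*` headers follow the documented CSEK REST convention: you send the base64-encoded AES-256 key and its SHA-256 hash with every request, and Google never stores the key.

```python
import base64
import hashlib
import os

def csek_headers(raw_key: bytes) -> dict:
    """Build the request headers GCS expects for a customer-supplied key."""
    if len(raw_key) != 32:  # AES-256 requires a 256-bit key
        raise ValueError("CSEK must be exactly 32 bytes")
    return {
        "x-goog-encryption-algorithm": "AES256",
        "x-goog-encryption-key": base64.b64encode(raw_key).decode("ascii"),
        "x-goog-encryption-key-sha256": base64.b64encode(
            hashlib.sha256(raw_key).digest()
        ).decode("ascii"),
    }

key = os.urandom(32)         # you generate and safeguard this key yourself
headers = csek_headers(key)  # attach these headers to each read and write
```

Because the key is never stored server-side, losing it means losing access to the objects; the hash header only lets GCS confirm you presented the right key.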
Now, auditing is another important aspect of security, because
it is how you handle your regulatory and compliance needs. But auditing is
also fundamentally how you prove to yourself,
and keep track, that your security
posture is really working the way you intended it to. And, again, depending on
what is your use case, we have various choices for you. If you use multiple
Google Cloud products and you want to do auditing
in a very uniform way, you can use our platform
level cloud audit logging. If you are, on the other hand,
a very savvy and very heavy GCS user and you
want to find the Nth level of detail
about how you are using the service– what are
your latencies, what’s the throughput, lots
of other information, you can use GCS’ custom
access and storage logs. Now, you may be a company
with your own threat intelligence and log
monitoring system, and what you are looking for
is a programmatic way to consume all those audit
logs and channel them into your system. For that, you can use Stackdriver
Monitoring to consume all our audit logs programmatically.
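As a sketch of what that programmatic consumption can look like: each Cloud Audit Logs entry carries a `protoPayload` that records who did what to which resource. The entry below is made up, but the field names follow the audit log format.

```python
def summarize_audit_entry(entry: dict) -> str:
    """Reduce a Cloud Audit Logs entry to a one-line summary."""
    proto = entry["protoPayload"]
    who = proto["authenticationInfo"]["principalEmail"]
    what = proto["methodName"]
    target = proto["resourceName"]
    return f"{who} called {what} on {target}"

# A made-up example entry, shaped like a GCS data-access audit record.
entry = {
    "protoPayload": {
        "authenticationInfo": {"principalEmail": "analyst@example.com"},
        "methodName": "storage.objects.get",
        "resourceName": "projects/_/buckets/demo-bucket/objects/report.csv",
    }
}
print(summarize_audit_entry(entry))
# → analyst@example.com called storage.objects.get on projects/_/buckets/demo-bucket/objects/report.csv
```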
With all these audit logs, you probably want to ensure that you comply
with certain third-party audits and regulations. And we at Google definitely
help you with that: we can support you with
a bunch of third-party audits and regulations. And this is a continual pattern
of investment from our side that you would see. For example, in the last year
or so we launched Bucket Lock. And, with that, when
configured correctly, we can help you with FINRA,
SEC, CFTC, and CFR regulations. Another prime example of designing
systems where security is by design and
default is DLP– Cloud Data Loss Prevention. So I’ll tell you how
our customers commonly use this special feature,
which is out there. So say you are a Financial
Services Institute and you have a data set which
you want to move to GCS. You want to use
GCS as a data lake. So you want to run your
analytics pipeline on top. But, before you run
your analytics pipeline, you want to ensure
that all the PII that you have in
your data– you might have first name, last
names, email addresses. You might have social security
numbers, credit card numbers. So you want to ensure before you
run the analytics pipeline, all the data is redacted, right? It is desensitized. So for those of you who
attended the earlier session by our friends from
Scotiabank, you would have seen that DLP now
has a single-pane-of-glass, literally one-page,
few-clicks ability where you can specify an end-to-end
analytics pipeline with GCS. And you can completely
eliminate all the PII in your pipeline using DLP. Now DLP can definitely help
you with data redaction
and desensitization. But even if the fundamental
thing that you want is a very good
understanding of your PII– to visualize and manage it– you can use DLP for that too. So these are all great examples
of security built in for you by design and default.
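To make the redaction step concrete, here is a hedged sketch: the first helper is a toy regex stand-in so the effect is visible, and the second shows roughly how the Cloud DLP client’s `deidentify_content` request is shaped (imported lazily, since it needs the `google-cloud-dlp` library and credentials to actually run).

```python
import re

def naive_redact(text: str) -> str:
    # Toy stand-in for DLP: mask anything shaped like an email address.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)

def dlp_redact(project: str, text: str) -> str:
    # Rough shape of the real call; needs google-cloud-dlp and credentials.
    from google.cloud import dlp_v2
    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request={
            "parent": f"projects/{project}",
            "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
            "deidentify_config": {
                "info_type_transformations": {
                    "transformations": [{
                        "primitive_transformation": {
                            "replace_with_info_type_config": {}
                        }
                    }]
                }
            },
            "item": {"value": text},
        }
    )
    return response.item.value

print(naive_redact("PII like jane@example.com gets masked"))
# → PII like [EMAIL] gets masked
```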
But, at the same time, from a customer use case
perspective and requirement perspective, also you need us
to build certain capabilities. So I want to briefly touch up on
a few customer-focused security capabilities that we
launched recently. In last year’s Next, we
announced Cloud KMS in beta. Currently Cloud KMS is in GA. Cloud KMS allows you
to have complete end to end control for your
encryption key management. In Chicago Cloud Summit,
we launched Bucket Lock. And what Bucket
Lock basically is, is WORM storage– write once, read many. So it is immutable storage. So if your use case is I
want to write some data and specify a retention period
during which this data should be unmodified– I shouldn’t be
able to modify it. I shouldn’t be able
to delete this data– then bucket lock is the
right product for you. VPC service controls. So if your main concern is
data exfiltration, right? So say if you are
coming from on-prem, or you have your own private
data centers, so private cloud, and you want to ensure you
carry the same security posture as you
come to Google, VPC Service Controls is the right product for you. And this is a
testament to the fact that we innovated when our
customers had requirements. In a very short span of time
we have over 5,000 projects using these features. We have over 5
billion operations. We have tens of petabytes
of data, which is already using these features. And we have over 100
petabytes of data which is planned for migration
using these features, right? So this is a continual
pattern that you would see, where we at Google want to
ensure that we innovate exactly where you need us
to innovate, where you have your requirements. And to continue in
this vein, I would now like to invite my
colleague Risham, who’s got a bunch of exciting
new announcements for you. Enjoy. [APPLAUSE] RISHAM DHILLON:
Thank you, Subhasish. Hello, everyone. Welcome. Today we’re going to announce
four new features related to GCS security. So let’s get started. The first feature
that we’re announcing is VPC Service Controls,
which is now in GA. With VPC Service
Controls you can make your own virtual
private cloud within GCP. So customers that come from
traditional enterprise, on premise environments, may
find it challenging sometimes to re-architect their network
perimeters and public clouds. And this is exactly the problem
that VPC Service Controls is going to help you solve. So, with VPC Service
Controls, you can very easily set up a network
perimeter or network firewall, keep certain resources within
that perimeter– for example, your GCS instance, or
your BigQuery instance– allow those resources to be
able to talk with each other, and block access to resources
outside of the perimeter. So, in this image here, you can
see that the green represents the virtual private cloud,
whereas the red represents services that don’t have access. Now, with VPC
Service Controls, you can also configure and extend
the firewall as you prefer. You can whitelist certain
IP addresses, as well. VPC Service Controls really
gives you a strong foundation to build your own
virtual private cloud, and we hope you give
the feature a try. Next we’d like to announce
that bucket policy only is now in beta. Bucket policy only
makes the process of managing permissions
for enterprise companies a lot easier. So, with bucket policy
only, all you have to do is set bucket level IAM policy,
and then any object or data that you upload to your bucket
will adhere to the policy that you set at
the bucket level. So, if you take a
step back for a moment and imagine what it’s like
to have thousands of buckets with millions of
objects in them, or even hundreds of
millions of objects, with millions of
object ACLs in them, it can become quite apparent
that managing object-level ACLs
is hard at scale. With bucket policy only, you no
longer have to do that. You can just set a
bucket level IAM policy, and rest assured knowing that
everything in your bucket adheres to those permissions. So with bucket policy only
you get this added benefit of preventing data exposure–
accidental data exposure. That is
because now you no longer have to worry about both
object level ACL and IAM permissions granting
access, and accidentally over exposing data that way. Additionally, with
bucket policy only, you can use this in conjunction
with another GCP feature called domain restricted sharing. And that’ll help you achieve
the use case of keeping your data within your domain. We’ve seen that,
since we launched the beta, over a million
buckets have already enabled bucket policy only. And we hope that you give
this feature a try as well. Next we’d like to announce two
new features that really help our customers that are
focused on interoperability and multi cloud use cases. The first of this feature
is service account HMAC, which will be in beta shortly. Now the problem
that we’re trying to help solve with
service account HMAC, is that tying HMAC credentials
to service accounts is a lot better than tying HMAC
credentials to user accounts. And if you step
back for a moment, and imagine a use case– or a scenario rather– where you’ve tied HMAC
credentials to user accounts, it may become apparent that when
something happens to that user account– like, the employee
that is tied to that user account leaves, or the user
account becomes disabled, for example– you may risk an outage or
you may risk loss to the data that those credentials provided. With service account
HMAC, you can prevent this from happening by instead
tying your HMAC credentials to service accounts. Now with this feature,
also offers you the ability to completely manage
the lifecycle of HMAC keys. Next, we’d like to announce
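A rough sketch of what service-account HMAC key creation looks like with the Python client library; the helper names are mine, but `create_hmac_key` is the client call, and its secret is returned exactly once.

```python
def is_service_account(email: str) -> bool:
    # Service-account emails end with this Google-managed suffix.
    return email.endswith(".gserviceaccount.com")

def create_sa_hmac_key(service_account_email: str):
    # Requires google-cloud-storage and credentials at call time,
    # hence the lazy import.
    from google.cloud import storage
    if not is_service_account(service_account_email):
        raise ValueError("tie HMAC keys to a service account, not a user")
    client = storage.Client()
    metadata, secret = client.create_hmac_key(
        service_account_email=service_account_email
    )
    # The secret is shown only once; store it in a secret manager.
    return metadata.access_id, secret
```

Rotation then becomes create-new-key, deactivate-old-key, with no user account in the loop.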
V4 signature support is now in beta as well. So again, this is for our
multi-cloud users out there. With V4 signature support, you
can make very minimal changes to your code base and
access multiple object stores with V4 signing. We support this for both HMAC
keys and Google RSA keys, and we hope you try
this out as well.
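To sketch why the code changes are minimal: V4 signing boils down to an HMAC-SHA256 over a canonical string, the same pattern other object stores use. `sign_v4_string` below is deliberately simplified (the real canonicalization covers headers, scope, and dates, with a derived key), while `signed_download_url` uses the client library’s supported V4 method.

```python
import hashlib
import hmac

def sign_v4_string(string_to_sign: str, secret: str) -> str:
    # Simplified illustration of the signing primitive only.
    return hmac.new(secret.encode(), string_to_sign.encode(),
                    hashlib.sha256).hexdigest()

def signed_download_url(bucket_name: str, blob_name: str) -> str:
    # Requires google-cloud-storage and credentials at call time.
    from datetime import timedelta
    from google.cloud import storage
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    # V4 signed URLs are valid for at most 7 days.
    return blob.generate_signed_url(version="v4",
                                    expiration=timedelta(minutes=15))
```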
The features I just announced, and the ones that Subhasish
mentioned earlier on, are just a few of the
features that demonstrate our commitment to security. I’m also going to go ahead and
demo some of these features, and show you how easy it is to
actually use these and put them in action. So for this demo, let’s imagine
that I’m a financial security administrator, and I’ve been
tasked with three things. First, I’ve been tasked
with keeping the data that I store within GCS encrypted. And as Subhasish mentioned,
everything in GCS is already encrypted, but if
you wanted to manage encryption, you could do so with KMS. And we’ll go through a quick
demo that shows you how you do that very easily. Next, let’s say that as this
financial services security administrator, I’ve been told
that, hey, we need to simplify permissions management. And instead of using
object level ACLs, we’ll show you how you
can use bucket-level IAM policies through bucket policy
only to make that happen. Lastly, let’s say that you
have a retention policy that wants you to keep your data
secure, or unmodifiable or immutable for x number of years. Well, bucket lock will help you
do that for any amount of time that you set up the
retention policy. And we’ll go ahead
and we’ll demo how easy it is to use bucket
lock as well to do this. So let me go ahead and
switch over to the demo here. So, as you can see, the first
thing that we’re going to demo is how you can use KMS,
or encryption keys. Whatever data
you upload to a GCS bucket, you can pass that key with it. And then it’s very easy to do
this entire end to end process. So we’ll go ahead. We’re in the cryptographic
keys panel over here. We’ll click Create key ring. And we’ll just
create a key ring. Let’s just name
it something fun. OK. And then we’ll go ahead–
we’ll choose a location. Let’s say we’re
choosing us-central1. I’m going to– well, now that
I’ve created the key ring, it’s asking me to create a key. So I’ll go ahead and
I’ll create a key. As you can tell, I’m pretty
creative in my key names. It’s the exact same thing as the
key ring, except with a key1. So we’ll click Create here. And now that we’ve created
this key, we’ll go ahead and we’ll use it to encrypt
data that we upload into GCS. So now I’ve just switched
over to my browser panel here. So this is the
view that you would see if you were trying to
manage your buckets in GCS. I already have one
existing bucket. But let’s go ahead
and create a bucket. So I’ll create a bucket. I’ll give it a name. OK. We’ll choose a
regional location. We’ll choose us-central1
as this regional location. And if you see the Show
Advanced Settings over here, it’ll ask you to choose
a retention policy. So that’s not–
that’s the next demo. But it’ll ask you for
encryption as well. So you can go ahead
and you can choose customer managed key over here. And then you can select the
encryption key that you want. And this is the encryption
key that I want– 0401. So I’ll go ahead and
I’ll grant it access. So, now that I’ve granted this
bucket access to that key, I’ll click Create. And now that I have
this bucket, let’s go ahead and upload some
data to that bucket. So we’ll upload the CSV file. And, once the CSV
file is uploaded, you can see that
for the encryption it says customer managed key. And that’s the key
that we created– the key that we passed. So, with just a few
clicks, it’s very easy to take your KMS key
and upload data into GCS and encrypt it with that key. So, as the storage administrator
for this financial services company, I can rest assured
knowing that this is very easy for me to go ahead and do.
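The same console flow can be scripted. The helper names here are mine, but `default_kms_key_name` is the client-library property behind the console’s customer-managed key choice, and the key string follows the standard Cloud KMS resource-name format.

```python
def kms_key_name(project: str, location: str, ring: str, key: str) -> str:
    # Standard Cloud KMS resource-name format.
    return (f"projects/{project}/locations/{location}"
            f"/keyRings/{ring}/cryptoKeys/{key}")

def create_cmek_bucket(bucket_name: str, key_name: str) -> None:
    # Requires google-cloud-storage and credentials at call time; the KMS
    # key and the bucket should live in the same location.
    from google.cloud import storage
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    bucket.default_kms_key_name = key_name  # new objects use this CMEK
    bucket.create(location="us-central1")

print(kms_key_name("my-proj", "us-central1", "demo-ring", "demo-key-1"))
# → projects/my-proj/locations/us-central1/keyRings/demo-ring/cryptoKeys/demo-key-1
```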
The next thing I was tasked with was
trying to simplify permissions management. So let’s take a step
back and imagine that you have thousands
of buckets with millions of objects in them, and
you’re using object level ACLs to grant access. So we’ll click this
bucket that I’ve already set up before the demo, called
bucket with lots of ACLs. And when you open
this bucket, you’ll see that it has a
bunch of objects. Now, if this was a
real enterprise bucket, I’m sure it would have a
lot more objects than I have for this demo– like millions. So there are these
bunch of objects. And if we go ahead and
we click Edit Permissions on these objects, you can see
that each of these objects have their own set of
permissions, or ACLs, tied to them. So let’s say that
for this use case, I wanted to make sure that
every object in this bucket, I always had owner access to and
my colleague, Subhasish, always had reader access to. Well, one thing I could do, is
I could go through and confirm that this is true, one by one. So I can click on
the second object, and it seems like,
yep, this is true. I have owner access. He has reader access. Looks good. I can click on the third one. And I could sit here and
manually go through these one by one. And then, if I ever discover–
like here, for example– that I’ve given him the
incorrect permissions– like he has owner permissions,
but I wanted him to have reader permissions– I could– one thing
I could do is I could go ahead and change this. But, if I was to change
that, it might take me a really long time to go
through other objects as well, and change them one by one. So I don’t want
to sit here and go through and do all that. And I want to make sure
that anytime I upload data into the bucket,
Subhasish always has reader or viewer access. Well, bucket policy
only helps you do that. It helps you set a
bucket-level IAM policy. That means that
you no longer have to manage object ACLs at scale. So let’s navigate over here
to the Permissions tab. And, now that we’re in this
Permissions tab, what you’ll see is that it says you
can simplify access control with bucket policy only. Great. This is exactly
what we want to do. We’ll click Enable. It’ll give you
some insight here, that you won’t be using ACLs
anymore, which is awesome. I’ll click Enable. And, now that I’ve enabled this,
let’s go back into objects. And, as you can see, I can no
longer edit the ACL permissions on these objects one by one. So, if I go back to
Permissions and I want to give Subhasish viewer
access to these objects, I can do so really easily. I can click here on Add Members. And then I can
add him over here. And I can select the appropriate
role that I want to grant him– in this case, storage
object viewer. And click Save. So, now that I’ve done
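Scripted, those two steps look roughly like this. The bucket and member names are made up, and note that “bucket policy only” is surfaced in today’s client library under its later name, uniform bucket-level access.

```python
def viewer_binding(member: str) -> dict:
    # One IAM binding granting read access to all objects in the bucket.
    return {"role": "roles/storage.objectViewer", "members": {member}}

def enforce_bucket_policy_only(bucket_name: str, member: str) -> None:
    # Requires google-cloud-storage and credentials at call time.
    from google.cloud import storage
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    # 1. Stop evaluating per-object ACLs entirely.
    bucket.iam_configuration.uniform_bucket_level_access_enabled = True
    bucket.patch()
    # 2. Grant access once, at the bucket level.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(viewer_binding(member))
    bucket.set_iam_policy(policy)
```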
Now, if you take a look
at these objects over here, essentially, what I’ve set up
is the ability for Subhasish to have viewer access
to these objects. Access for him is not
granted through ACLs anymore. And I can rest
assured, making sure that any time some data is
uploaded to this bucket, Subhasish will always
have viewer access, and he’ll have the appropriate
permissions I want. Now, this is really
useful, again, if you have buckets with lots
and lots and lots of objects, and you have lots
of users, and you don’t want to go
through the object ACL granularity overhead. It’s also worth
mentioning that you don’t have to take
an existing bucket, and then turn on
bucket policy only. You can do this from the start
by clicking Create bucket. And here in the
access control model, you’ll see that you can
either set permissions uniformly at the bucket
level, through bucket policy only, or set object level
and bucket level permissions. Great. So as the storage
administrator, I have accomplished
my second task, which is simplifying
permissions management. Now, the last thing that
I’ve been tasked with is keeping data that
I upload into GCS unmodifiable or immutable
for x number of years. So you can imagine that
you’ve been handed a retention policy that says, hey, upload
data into this GCS bucket, but keep it immutable
for 10 years. GCS makes this really,
really easy to do, and we’ll show you how. So we’ll click
Create bucket here. And I’ll just name it that. I’ll choose the regional
storage location that I’ve always been
choosing– us-central1. And if I go into Show
Advanced Settings, you’ll see that
there’s a retention policy over here, which is
exactly what we want to do. We want to set a
retention policy. So I can set this for 10 years,
which means that every time I upload data into
this bucket, I won’t be able to change anything
about it for 10 years. I won’t be able to modify it,
won’t be able to delete it. But because I’m sure
none of you want to stay here until 2029 to
prove that this feature works, I will instead– for the purposes of this demo– change this to 30 seconds.
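Scripted, the retention setup is just a couple of calls. `lock_bucket` is a hypothetical wrapper, but `retention_period`, `patch`, and `lock_retention_policy` are the client-library pieces behind this console flow; the lock step is the irreversible part that Bucket Lock adds.

```python
def retention_seconds(days: int) -> int:
    # GCS retention periods are expressed in seconds.
    return days * 24 * 60 * 60

def lock_bucket(bucket_name: str, days: int) -> None:
    # Requires google-cloud-storage and credentials at call time.
    from google.cloud import storage
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    bucket.retention_period = retention_seconds(days)
    bucket.patch()                  # objects now immutable for the period
    bucket.lock_retention_policy()  # irreversible: locks the policy itself

print(retention_seconds(3650))  # roughly 10 years
# → 315360000
```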
We’re going from 10 years to 30 seconds here, but I think you
will get the point. So essentially, if I click
Create on this bucket, what I’ve done is, I’ve created
a bucket where every time I upload some data into
it– like this CSV file– after upload, I won’t be able to
overwrite or delete that file. So if I click Delete
here, it’ll say, hey, you can’t delete
this object yet. You won’t be able to do it until
this retention policy is over– until the 30 seconds
have passed by. So you can imagine that for
a storage administrator, this is very easy to set up. If someone says, hey, we
need to hold this data for 10 years, 15 years, whatever
your security policy says, or whatever regulation
requires– then, with a few clicks,
you can make that happen. And after those–
after that time is up– so in this scenario, after
the 30 seconds are up or the 10 years are
up– you can go ahead and do whatever you’d like
to do with the objects. So I’m assuming 30 seconds are
up, so we’ll click Delete here. And as you can see, after that
retention time period is over, it’s very easy to go ahead
and modify that object. So in conclusion, the
things that we just demoed– bucket policy only, KMS,
and also bucket lock– are just three of
the many security features that GCS provides. Everything is meant to be very
intuitive and easy to use. And we hope that this
gives you a sense of the level of commitment
we have to keeping your data secure on GCS. With that, I’d like
to turn it over to our friends from
Twitter, Chad and Chris, who will talk about how GCS is
being used at Twitter securely. And you can switch over to the– right. CHAD HODGES: Thank you, Risham. [APPLAUSE] Great. Thank you very much
for the warm welcome. Let me get to our slides. Wonderful. All that preparation to
say hi, I’m Chad Hodges. CHRISTOPHER ULHERR:
And I’m Chris Ulherr, and we’re from the public
cloud services team at Twitter. And we’re here
today to talk to you about how Twitter
uses Google Cloud, and specifically, how we’re
using some of the features talked about earlier
in the slides to help improve our security and
meet our security objectives. CHAD HODGES: So just
to start off here. Hopefully, some of
you are tweeting this. But Twitter’s purpose, and
how we position ourselves, is that we’re here to serve
the public conversation. And we want people to stay
informed, inform others, and talk about what
matters in a way that helps society progress. For our part, being part of
the platform organization, there are some
quick facts I want to share with you about
Twitter, if you didn’t know. 321 million monthly
active users, a peak of 143,199
tweets per second. We have a significant
on-premises footprint, which is
maybe a strange thing to say at a cloud conference. But Twitter has hundreds of
thousands of physical machines. When you talk about
the companies that are doing what we’re
doing at our scale, it’s a handful, at most. But at the same time, we’re
migrating targeted workloads to the Google Cloud. That’s the partly cloudy
t-shirt I’m actually wearing. Hopefully, some of you got to
take advantage of the panel yesterday about that. And when I say targeted,
that might imply small, but we’re actually
moving 300 petabytes worth of Hadoop data to GCS. And that session was
yesterday, as I mentioned. And we’re clearly
going to be hybrid. We spent a lot of
time, a lot of energy, analyzing that decision. And, if you’re
interested in that, it’s not something
for this session, but there’s a great video
of Derek Lyon from Google Next in London last year. You can check that out. And I would encourage you
to do so because a lot of the methodologies we
used– a lot of things that informed our decision– are
things you should be looking at as well. So, for our team, which is the
public cloud services team, we have some high level
pieces to our mission. And against the backdrop
of that massive migration, and the recognition that there
are other teams in addition to Hadoop who can get value
from the Google Cloud, how do we engage with the
other teams at Twitter? And it’s really
important to recognize that we have to scale. We can’t be gatekeepers. There are a lot
of smart, capable, and inventive people
at Twitter that can come up with amazing ways to
use the cloud to take advantage of Google Cloud. And we have to make
sure that that happens. And that it’s done with as
little friction as possible. And then, second,
because of our scale and because of our
history, we sometimes need support or change
from Google in order to use their offerings. And, to be blunt, I’m delighted
with the offerings that have been discussed here today. They’ve made our
jobs a lot easier. They've made me a
lot more comfortable with what we’re doing. And ultimately our goals are to
make sure that our customers– that the internal developers
and users at Twitter– can leverage what
Google offers in a way that’s secure, reliable,
supportable, and repeatable. And again, we need
to be able to do that without falling into the trap
of toil, for those of you with the SRE background. And part of this is
that we’re weighing convenience versus security. So Twitter is a public
platform as you’re all aware. Twitter data is unique, right? It’s a public conversation. We make pieces of that
available via APIs, but we also have to
make sure that we’re taking into account privacy
and data protection, and we’re adhering to the trust
that our users place in us. If developers
violate our policies, we’re going to take
appropriate action. We also, in our role,
have to make sure that we have access
controls and compliance, and things don’t
happen by accident. And to do that, we're leveraging
VPC Service Controls, domain restricted sharing,
and bucket policies– pardon me– bucket policy only. And these are pieces of our security policy. Obviously, there are
other things that we do. But this has greatly simplified the approach. It's made it a lot easier
to be comfortable that we’re doing the right thing. And it’s also made it so
that we’re secure by default. And a large part of the
Hadoop migration project is making sure that
we’ve done that. And we’ve developed
several services– we touched on those yesterday– the demigod services–
in that panel. And as we move other
workloads to the cloud, we’ll continue to
architect our environment to make it easy to
do the right thing. CHRISTOPHER ULHERR: Yeah. As Chad mentioned, we’re
hyper focused on security, and regardless of intent,
a misconfigured bucket is one of the easiest
ways to expose data. And it’s something we
audit for, and something that we strive to prevent. And a key piece
of achieving that is using the tools
that Google provides. So we’ve leveraged both
the VPC Service Controls, and the domain
restricted sharing, and provided feedback
to Google in the process to help achieve that. So with VPC Service Controls, we established a perimeter so that we know exactly who can access the data that's in the storage buckets. And then with domain
restricted sharing– we can say, for example, that if somebody attempts to share a BigQuery data set, they can't share it outside of our domain.
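As a sketch of how such guardrails are configured– using hypothetical organization, project, customer, and access-policy IDs, and only standard gcloud commands (the perimeter commands were in beta around the time of this talk):

```shell
# Domain restricted sharing: an org policy that allows IAM grants
# only to identities from an approved directory customer
# (hypothetical G Suite customer ID and org ID).
cat > policy.yaml <<'EOF'
constraint: constraints/iam.allowedPolicyMemberDomains
listPolicy:
  allowedValues:
  - C0abc123x
EOF
gcloud resource-manager org-policies set-policy policy.yaml \
  --organization=123456789012

# VPC Service Controls: a perimeter around a project so its GCS
# data can only be reached from inside the perimeter.
gcloud access-context-manager perimeters create storage_perimeter \
  --title="storage perimeter" \
  --resources=projects/111111111111 \
  --restricted-services=storage.googleapis.com \
  --policy=987654321
```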
So it helps put that control on top of it, and those help increase our security posture. CHAD HODGES: And
with that, we’ll hand it back to Subhasish. SUBHASISH CHAKRABORTY:
Thank you, Chad and Chris. [APPLAUSE] I think this was
very instructive, to understand how a company
like Twitter, with so much PII, and so much really
important information, uses GCS and, at the
same time, ensures that their data is secure. So in conclusion, I would like
to touch upon a few themes that we have explored
in this session. First of all,
Google Cloud Storage is the unified blob storage,
or object storage platform in which you can store
your data to unlock the entire power of Google
Cloud as a data platform. We have planet-scale scalability, but at the same time, it's
a very consistent, unified and simple system. Our customers use GCS for a
range of different use cases. When you are using GCS,
one of the biggest benefits that you get is there is
a lot of security, which is built in by
design and default. So you do not have to think of
building every single security feature from scratch. There is a lot that we
already take care of, right? And then in addition to having
very good default controls, we also expose a lot
of custom controls, based on your use case
where it makes sense. And in this session,
as you have seen, we have talked about
some features we have launched over the last year. And then you also saw some fresh
features today and some demos. You can see this
continuing theme that our main focus
from the product team is to listen to you– our most important
constituency, the customers– and to innovate where it
really matters for you. So this is no surprise
then that there are a lot of large
customers and enterprises who use GCS to do a lot of
amazing things with their data while, at the same time, ensuring that the data is fully secure. So with that, I hope
you guys continue to have an amazing NEXT. And for those of you who
haven’t tried GCS yet, this would be a good
time to give GCS a try. Thank you. Thank you everyone for
attending this session. Really appreciate it. [APPLAUSE] [MUSIC PLAYING]
