Automated Data Discovery for Amazon Macie

Default Header Image

Automated Data Discovery for Amazon Macie

Today, we announce automated data discovery for Amazon Macie. This new capability allows you to gain visibility into where your sensitive data resides on Amazon Simple Storage Service (Amazon S3) at a fraction of the cost of running a full data inspection across all your S3 buckets.

At AWS, security is our first priority. The security of the infrastructure itself, but also the security of your data. We give you access to services to manage identities and access, to protect the network and your applications, to detect suspicious activities, to protect your data, and to report on and monitor your compliance status.

Amazon Macie is a data security service that discovers sensitive data using machine learning and pattern matching and enables visibility and automated protection against data security risks. You use Amazon Macie to protect your data in S3 by scanning for the presence of sensitive data, such as names, addresses, and credit card numbers, and continually monitoring for properly configured preventative controls, such as encryption and access policies. Amazon Macie generates alerts when it detects publicly accessible buckets, unencrypted buckets, or buckets shared with an AWS account outside of your organization. You may also configure Amazon Macie to scan your S3 to run full sensitive data discovery scans on your S3 buckets to provide visibility into where sensitive data resides.

But customers operating at scale told us it is difficult to know where to start. When employees and applications add new buckets and generate petabytes of data on a daily basis, what should be scanned first?

Automated data discovery automates the continual discovery of sensitive data and potential data security risks across your entire set of buckets aggregated at AWS Organizations level.

When you enable automated discovery in the console, Macie starts to evaluate the level of sensitivity of each of your buckets and highlights any data security risks. Automated data discovery introduces intelligent and fully managed data sampling to provide an optimized sample rate that meaningfully reduces the amount of data that needs to be analyzed. This reduces the cost of discovering S3 buckets containing sensitive data compared to the cost of full data inspection.

You can tune automated data discovery to only identify the types of sensitive data that are relevant for your use case by choosing from over 100 managed sensitive data types, such as personally identifiable information (PII) and financial records with specific formats for multiple countries. For example, you can enable detection of Spanish or Swedish driving license numbers and choose to ignore US Social Security numbers, depending on your use cases. When the specific type of data you manage is not on our list, you can create custom data types that may be unique to your business, such as employee or patient identification numbers.

Let’s See It in Action
Automated data discovery is on by default for all new Amazon Macie customers, and existing Macie customers can enable it with one click in the AWS Management Console of the Amazon Macie administrator account. There is a 30-day free trial, and you can always opt out at the administrator level.

I can enable or disable the capability from the Automated discovery entry–under Settings–on the left side navigation menu. The Status section reveals the current status.

On the same page, I can configure the list of managed data identifiers. I can turn on or off individual types of data among more than one hundred managed data identifier types. I can also configure new ones. I select Edit on the Managed data identifiers section to include or exclude additional data identifiers.

If I have some buckets with lots of objects and others with a few, Macie won’t spend all its time inspecting one really large bucket at the expense of other smaller ones. Macie also prioritizes buckets that it knows the least about. For example, if it looked at the majority of objects in a small bucket, that bucket will be deprioritized compared to larger buckets where it has seen proportionally fewer objects.

Automated data discovery can provide an interactive data map of sensitive data distribution in S3 buckets within days of the feature being enabled. This data map refreshes daily as it intelligently picks and scans S3 objects in buckets and spreads the scan effort across the entire S3 estate in a given month.

Here is the Summary section of the Amazon Macie page. It looks like my set of buckets is secured. I have no bucket with public access, and 31 of my buckets might contain sensitive data.

When selecting the S3 buckets section of the navigation menu on the left side, I can see a data map of my buckets. The more red the squares are, the more sensitive data are detected in the buckets. The squares in blue represent buckets with no sensitive data detected so far. From there, I can drill down at bucket level to investigate the details.

Pricing and Availability
When you are new to Amazon Macie, automated data discovery is enabled by default. When you already use Amazon Macie in your organization, you can enable automatic data discovery with one click in the Management Console of the Amazon Macie administrator account.

There is a 30-day free trial period when you enable automatic data discovery on your AWS account. After the evaluation period, we charge based on the total quantity of S3 objects in your account as well as the bytes scanned for sensitive content. Charges are prorated per day. You can disable this capability at any time. The pricing page has all the details.

This new capability is now available in all 21 commercial AWS Regions where Macie is available.

Go and enable Amazon Macie automated data discovery today!

— seb

To top