What is S3

S3 (Simple Storage Service) was one of the first AWS services, launched in 2006. It is an object storage (also called blob storage) service that lets users store any kind of file in buckets. It is important to remember that S3 is not a file system. Unlike a typical computer filesystem, S3 does not support real directory hierarchies or symbolic links, and unlike block storage, it does not expose sectors or tracks. A good way to think about S3 is as a store of key-value (KV) pairs. Each bucket has a globally unique name and forms its own namespace, within which users create objects. An object in S3 is a “blob” that is opaque to S3: the service only knows its metadata and its name (the "key"). Even though S3 provides a way to create “folders” within buckets, these are simply key prefixes: the full, structured key uniquely identifies each object (the "value").
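To make the "folders are just keys" idea concrete, here is a minimal sketch that models a bucket as a plain Python dictionary. It is a toy, not the S3 API: the `bucket` contents and the `list_keys` helper are made up for illustration, although the prefix and delimiter idea mirrors how S3 listings group keys into folder-like entries.

```python
# A toy model of an S3 bucket: a flat mapping from key to opaque bytes.
# There are no directories here, only keys that happen to contain "/".
bucket = {
    "readme.txt": b"hello",
    "photos/2023/bird.jpg": b"...",
    "photos/2024/cat.jpg": b"...",
    "photos/2024/dog.jpg": b"...",
}


def list_keys(bucket, prefix="", delimiter="/"):
    """Return (keys, common_prefixes) for keys that start with `prefix`.

    Keys that contain the delimiter after the prefix are rolled up into
    a "common prefix", which is what a console would display as a folder.
    """
    keys, common_prefixes = [], set()
    for key in sorted(bucket):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter looks like a folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return keys, sorted(common_prefixes)


print(list_keys(bucket))                    # top level of the bucket
print(list_keys(bucket, prefix="photos/"))  # "inside" the photos folder
```

Deleting the key `photos/2023/bird.jpg` from the dictionary would make the `photos/2023/` "folder" disappear as well, because it never existed as a separate entity.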

<aside> ⚠️

When we say buckets partition the S3 namespace globally, we really do mean globally. S3 bucket names are unique worldwide, regardless of region. Nowadays the AWS web console reminds users about this upon bucket creation, and in fact defaults to suggesting a name suffixed with some random text, to avoid polluting the global namespace. The reason behind this design is that S3 buckets can be made public, and their names then become part of their FQDNs, which we will discuss further below.

</aside>
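Because the bucket name ends up inside a hostname, it has to look like a DNS label. The sketch below shows a simplified name check and how a virtual-hosted-style object URL is assembled. The helper names and the example bucket `hyf-demo-bucket-x7k2` are hypothetical, and the regular expression covers only the basic rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit), not every restriction AWS applies.

```python
import re
from urllib.parse import quote

# Simplified check, assuming only the basic naming rules described above.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name):
    """Return True if `name` passes the simplified naming rules."""
    return bool(_BUCKET_NAME.fullmatch(name))


def object_url(bucket, region, key):
    """Build a virtual-hosted-style URL: the bucket name is part of the host."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


print(is_plausible_bucket_name("hyf-demo-bucket-x7k2"))  # passes the check
print(is_plausible_bucket_name("My_Bucket"))             # uppercase and "_" fail
print(object_url("hyf-demo-bucket-x7k2", "eu-west-1", "photos/cat 1.jpg"))
```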

Uploading objects and interacting with S3 via Console and CLI

Like every other AWS service, S3 is accessible through the web-based Console as well as through the API and CLI. In fact, AWS provides neat tutorials on their website that showcase how to create buckets with the console:

Amazon S3 Bucket Creation Tutorial - AWS

The console provides a comprehensive set of controls that let users create and delete buckets, upload multiple files concurrently, and set bucket policies along with all other bucket properties.

As you may expect, the same can be done with the AWS CLI, which we covered in the previous section. In fact, there are two CLIs available for S3:

the high-level aws s3 CLI, and
the lower-level, more granular aws s3api CLI.

The aws s3 CLI lets users create, list, and delete buckets, and copy, move, list, and delete objects. It’s the CLI that many tools and scripts use to back up files or archive directories, for example. The lower-level aws s3api, on the other hand, maps closely to the underlying S3 API and provides commands for things like manipulating bucket access control lists, bucket policies, and more granular, less frequently used properties.
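As a rough illustration of the difference between the two CLIs, the sketch below only builds and prints command lines; it does not run them. The bucket name `hyf-demo-bucket` and the file names are placeholders. The subcommands shown (`aws s3 mb`, `aws s3 cp`, `aws s3 ls`, `aws s3api get-bucket-acl`, `aws s3api get-bucket-policy`) are standard AWS CLI commands.

```python
import shlex


def s3_make_bucket_cmd(bucket):
    """High-level CLI: create a bucket."""
    return ["aws", "s3", "mb", f"s3://{bucket}"]


def s3_copy_cmd(local_path, bucket, key):
    """High-level CLI: upload a local file to s3://bucket/key."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


def s3_list_cmd(bucket):
    """High-level CLI: list the top level of a bucket."""
    return ["aws", "s3", "ls", f"s3://{bucket}"]


def s3api_get_acl_cmd(bucket):
    """Low-level CLI: read the bucket's access control list."""
    return ["aws", "s3api", "get-bucket-acl", "--bucket", bucket]


def s3api_get_policy_cmd(bucket):
    """Low-level CLI: read the bucket policy as JSON."""
    return ["aws", "s3api", "get-bucket-policy", "--bucket", bucket]


for cmd in (
    s3_make_bucket_cmd("hyf-demo-bucket"),
    s3_copy_cmd("backup.tar.gz", "hyf-demo-bucket", "backups/backup.tar.gz"),
    s3_list_cmd("hyf-demo-bucket"),
    s3api_get_acl_cmd("hyf-demo-bucket"),
    s3api_get_policy_cmd("hyf-demo-bucket"),
):
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(cmd))
```

Note how the high-level commands address objects with `s3://` URIs, while the s3api commands take explicit flags such as `--bucket`.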

S3 access control

S3 bucket policies are JSON policy documents attached to a bucket that define who can access it and what actions they can perform, such as s3:GetObject or s3:PutObject. They work alongside IAM policies and can grant access to specific AWS principals or even anonymous users, but they should be written carefully to avoid unintentionally making data public. The Public Access Block settings are an extra safety layer that can override permissive bucket policies and ACLs by preventing public access at the bucket or account level, which helps ensure that “public by accident” configurations are blocked even if a policy is misconfigured.
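To show what such a policy document looks like, here is a minimal sketch that builds a read-only bucket policy and a Public Access Block configuration as Python dictionaries. The function name, the `Sid`, and the bucket name `hyf-demo-bucket` are illustrative assumptions; the field names follow the standard policy and Public Access Block formats.

```python
import json


def read_only_policy(bucket, principal="*"):
    """Build a bucket policy that only allows reading objects.

    principal="*" means anyone, including anonymous users; pass e.g.
    {"AWS": "arn:aws:iam::123456789012:root"} (a placeholder account ID)
    to grant access to a specific AWS account instead.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadOnly",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject"],
                # The trailing /* targets the objects, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# The four Public Access Block switches; with all of them on, a public
# policy like the one above is not allowed to take effect.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

policy = read_only_policy("hyf-demo-bucket")
print(json.dumps(policy, indent=2))
```

The printed JSON is the kind of document that can be pasted into the console's bucket policy editor or saved to a file for the CLI.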

S3 storage classes

Every object put into an S3 bucket resides in one of several storage classes.

The most common is the Standard storage class. It’s the class every object gets by default when no other class is specified, and AWS designs it for 99.99% availability. It’s a good class for objects that need to be accessed relatively frequently, for example as part of providing blob content for a website or a mobile app.

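A cheaper alternative is the IA class, or “infrequent access.” It offers the same low latency as Standard but slightly lower availability (designed for 99.9% rather than 99.99%), and it charges a retrieval fee in exchange for a lower storage price. IA comes in two flavors: Standard-IA, which stores data redundantly across multiple Availability Zones within a region, and One Zone-IA, which keeps data in a single Availability Zone and suits data that isn’t as critical or that is itself a redundant copy.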

A cheaper still alternative is the Glacier family of storage classes. These are significantly cheaper and designed for data that is rarely accessed. Glacier comes in multiple flavors: Instant Retrieval still serves objects immediately, while Flexible Retrieval and Deep Archive do not offer instant access, so an object must be restored before it can be read. For Deep Archive, a standard restore can take up to 12 hours, so it cannot be used by APIs that expect to perform CRUD operations on S3 data immediately, but it is perfect for archiving large files.

Users set the storage class on the object level during upload; there is no bucket-wide storage class setting. More importantly, S3 buckets support lifecycle rules, which can apply to all objects in a bucket or to a subset selected by prefix or tag. This means users can decide that, e.g., objects that have been in a particular bucket for a certain amount of time (say, a month) automatically transition to a different storage class. It’s a useful mechanism for reducing cost.
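A lifecycle rule is ultimately a small structured document. The sketch below builds one that moves objects to Standard-IA after 30 days, matching the "say, a month" example. The helper names, the rule ID, and the bucket name are hypothetical. The `apply_lifecycle` function assumes the third-party boto3 library and valid AWS credentials, so it is defined but deliberately not called here.

```python
def transition_rule(rule_id, days, storage_class, prefix=""):
    """Build one lifecycle rule; an empty prefix matches every object."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


lifecycle_configuration = {
    "Rules": [transition_rule("to-ia-after-30-days", 30, "STANDARD_IA")]
}


def apply_lifecycle(bucket, configuration):
    """Send the configuration to S3 (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )


print(lifecycle_configuration)
# apply_lifecycle("hyf-demo-bucket", lifecycle_configuration)  # not run here
```

Changing the arguments to `90` and `"GLACIER"` would produce the kind of rule asked for in the practice section.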

Static website hosting on S3

S3 can also serve a static website by hosting users’ HTML, CSS, JavaScript, and other assets directly from a bucket. Users upload the site files, enable Static website hosting on the bucket, and point a DNS name (often via Route 53) at the bucket’s website endpoint or, more commonly, put CloudFront in front for HTTPS, caching, and custom domains. Access is controlled with bucket policies and Public Access Block settings, and S3 is a good fit because it is durable, inexpensive, and does not require running servers.
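Enabling static website hosting mostly comes down to telling S3 which object to serve as the index page and which as the error page, and making sure each file is stored with a sensible content type. The sketch below shows both as plain Python. The file names and the `content_type_for` helper are assumptions for illustration, and the fallback type is simply a common generic choice.

```python
import mimetypes

# Website configuration in the shape S3 expects: an index document
# suffix and an error document key (both file names are placeholders).
website_configuration = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}


def content_type_for(path):
    """Guess a Content-Type for a site file, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


for path in ("index.html", "css/style.css"):
    # Browsers rely on the content type to render the file correctly.
    print(path, "->", content_type_for(path))
```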

Practice

create an S3 bucket and upload some files to it
set a lifecycle rule on the S3 bucket that transitions all the objects to Glacier after 90 days
set a bucket policy on the S3 bucket that makes its objects available as read only

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

