Index: third_party/gsutil/boto/docs/source/s3_tut.rst |
diff --git a/third_party/gsutil/boto/docs/source/s3_tut.rst b/third_party/gsutil/boto/docs/source/s3_tut.rst |
new file mode 100644 |
index 0000000000000000000000000000000000000000..fc75e108b97984b47fcdd56e48184e914bc1084c |
--- /dev/null |
+++ b/third_party/gsutil/boto/docs/source/s3_tut.rst |
@@ -0,0 +1,450 @@ |
+.. _s3_tut: |
+ |
+====================================== |
+An Introduction to boto's S3 interface |
+====================================== |
+ |
+This tutorial focuses on the boto interface to the Simple Storage Service |
+from Amazon Web Services. This tutorial assumes that you have already |
+downloaded and installed boto. |
+ |
+Creating a Connection |
+--------------------- |
+The first step in accessing S3 is to create a connection to the service. |
+There are two ways to do this in boto. The first is: |
+ |
+>>> from boto.s3.connection import S3Connection |
+>>> conn = S3Connection('<aws access key>', '<aws secret key>') |
+ |
+At this point the variable conn will point to an S3Connection object. In |
+this example, the AWS access key and AWS secret key are passed to the |
+constructor explicitly. Alternatively, you can set the environment variables: |
+ |
+* `AWS_ACCESS_KEY_ID` - Your AWS Access Key ID |
+* `AWS_SECRET_ACCESS_KEY` - Your AWS Secret Access Key |
+ |
+and then call the constructor without any arguments, like this: |
+ |
+>>> conn = S3Connection() |
+ |
+There is also a shortcut function in the boto package, called connect_s3 |
+that may provide a slightly easier means of creating a connection:: |
+ |
+ >>> import boto |
+ >>> conn = boto.connect_s3() |
+ |
+In either case, conn will point to an S3Connection object which we will |
+use throughout the remainder of this tutorial. |
+ |
+Creating a Bucket |
+----------------- |
+ |
+Once you have a connection established with S3, you will probably want to |
+create a bucket. A bucket is a container used to store key/value pairs |
+in S3. A bucket can hold an unlimited amount of data so you could potentially |
+have just one bucket in S3 for all of your information. Or, you could create |
+separate buckets for different types of data. You can figure all of that out |
+later; first, let's just create a bucket. That can be accomplished like this:: |
+ |
+ >>> bucket = conn.create_bucket('mybucket') |
+ Traceback (most recent call last): |
+ File "<stdin>", line 1, in ? |
+ File "boto/connection.py", line 285, in create_bucket |
+ raise S3CreateError(response.status, response.reason) |
+ boto.exception.S3CreateError: S3Error[409]: Conflict |
+ |
+Whoa. What happened there? Well, the thing you have to know about |
+buckets is that they are kind of like domain names. It's one flat |
+namespace that everyone who uses S3 shares. So, someone has already created |
+a bucket called "mybucket" in S3 and that means no one else can grab that |
+bucket name. You have to come up with a name that hasn't been taken yet. |
+For example, something that uses a unique string as a prefix. Your |
+AWS_ACCESS_KEY (NOT YOUR SECRET KEY!) could work but I'll leave it to |
+your imagination to come up with something. I'll just assume that you |
+found an acceptable name. |
+ |
+The create_bucket method will create the requested bucket if it does not |
+exist or will return the existing bucket if it does exist. |
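+ |
+For example, a name built from a unique prefix of your own (the bucket name |
+below is only a placeholder; substitute something globally unique):: |
+ |
+ >>> bucket = conn.create_bucket('mybucket-20120701-demo') |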
+ |
+Creating a Bucket In Another Location |
+------------------------------------- |
+ |
+The example above assumes that you want to create a bucket in the |
+standard US region. However, it is possible to create buckets in |
+other locations. To do so, first import the Location object from the |
+boto.s3.connection module, like this:: |
+ |
+ >>> from boto.s3.connection import Location |
+ >>> print '\n'.join(i for i in dir(Location) if i[0].isupper()) |
+ APNortheast |
+ APSoutheast |
+ APSoutheast2 |
+ DEFAULT |
+ EU |
+ SAEast |
+ USWest |
+ USWest2 |
+ |
+As you can see, the Location object defines a number of possible locations. By |
+default, the location is the empty string, which is interpreted as the US |
+Classic Region, the original S3 region. However, by specifying another |
+location at the time the bucket is created, you can instruct S3 to create the |
+bucket in that location. For example:: |
+ |
+ >>> conn.create_bucket('mybucket', location=Location.EU) |
+ |
+will create the bucket in the EU region (assuming the name is available). |
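+ |
+To confirm where a bucket lives, you can read its location constraint back |
+with ``get_location`` (a quick sketch; the bucket name is a placeholder, and |
+buckets in the US Classic Region report an empty string):: |
+ |
+ >>> eu_bucket = conn.create_bucket('mybucket-eu-demo', location=Location.EU) |
+ >>> eu_bucket.get_location() |
+ 'EU' |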
+ |
+Storing Data |
+------------ |
+ |
+Once you have a bucket, presumably you will want to store some data |
+in it. S3 doesn't care what kind of information you store in your objects |
+or what format you use to store it. All you need is a key that is unique |
+within your bucket. |
+ |
+The Key object is used in boto to keep track of data stored in S3. To store |
+new data in S3, start by creating a new Key object:: |
+ |
+ >>> from boto.s3.key import Key |
+ >>> k = Key(bucket) |
+ >>> k.key = 'foobar' |
+ >>> k.set_contents_from_string('This is a test of S3') |
+ |
+The net effect of these statements is to create a new object in S3 with a |
+key of "foobar" and a value of "This is a test of S3". To validate that |
+this worked, quit out of the interpreter and start it up again. Then:: |
+ |
+ >>> import boto |
+ >>> c = boto.connect_s3() |
+ >>> b = c.create_bucket('mybucket') # substitute your bucket name here |
+ >>> from boto.s3.key import Key |
+ >>> k = Key(b) |
+ >>> k.key = 'foobar' |
+ >>> k.get_contents_as_string() |
+ 'This is a test of S3' |
+ |
+So, we can definitely store and retrieve strings. A more interesting |
+example may be to store the contents of a local file in S3 and then retrieve |
+the contents to another local file. |
+ |
+:: |
+ |
+ >>> k = Key(b) |
+ >>> k.key = 'myfile' |
+ >>> k.set_contents_from_filename('foo.jpg') |
+ >>> k.get_contents_to_filename('bar.jpg') |
+ |
+There are a couple of things to note about this. When you send data to |
+S3 from a file or filename, boto will attempt to determine the correct |
+mime type for that file and send it as a Content-Type header. The boto |
+package uses the standard mimetypes package in Python to do the mime type |
+guessing. The other thing to note is that boto does stream the content |
+to and from S3 so you should be able to send and receive large files without |
+any problem. |
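+ |
+If you want feedback while a large file is transferred, both |
+``set_contents_from_filename`` and ``get_contents_to_filename`` accept a |
+progress callback through the ``cb`` and ``num_cb`` arguments. A minimal |
+sketch (the callback and file names here are just examples):: |
+ |
+ >>> def percent_cb(so_far, total): |
+ ...     print '%d bytes transferred out of %d' % (so_far, total) |
+ ... |
+ >>> k = Key(b) |
+ >>> k.key = 'myfile' |
+ >>> k.set_contents_from_filename('foo.jpg', cb=percent_cb, num_cb=10) |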
+ |
+Accessing A Bucket |
+------------------ |
+ |
+Once a bucket exists, you can access it with the get_bucket method. For example:: |
+ |
+ >>> mybucket = conn.get_bucket('mybucket') # Substitute in your bucket name |
+ >>> mybucket.list() |
+ <listing of keys in the bucket> |
+ |
+By default, this method tries to validate the bucket's existence. You can |
+override this behavior by passing ``validate=False``.:: |
+ |
+ >>> nonexistent = conn.get_bucket('i-dont-exist-at-all', validate=False) |
+ |
+If the bucket does not exist, an ``S3ResponseError`` will typically be raised. If |
+you'd rather not deal with any exceptions, you can use the ``lookup`` method.:: |
+ |
+ >>> nonexistent = conn.lookup('i-dont-exist-at-all') |
+ >>> if nonexistent is None: |
+ ... print "No such bucket!" |
+ ... |
+ No such bucket! |
+ |
+Deleting A Bucket |
+----------------- |
+ |
+Removing a bucket can be done using the ``delete_bucket`` method. For example:: |
+ |
+ >>> conn.delete_bucket('mybucket') # Substitute in your bucket name |
+ |
+The bucket must be empty of keys or this call will fail and an exception will |
+be raised. You can remove a non-empty bucket by doing something like:: |
+ |
+ >>> full_bucket = conn.get_bucket('bucket-to-delete') |
+ # It's full of keys. Delete them all. |
+ >>> for key in full_bucket.list(): |
+ ... key.delete() |
+ ... |
+ # The bucket is empty now. Delete it. |
+ >>> conn.delete_bucket('bucket-to-delete') |
+ |
+.. warning:: |
+ |
+ This method can cause data loss! Be very careful when using it. |
+ |
+ Additionally, be aware that using the above method for removing all keys |
+ and deleting the bucket involves a request for each key. As such, it's not |
+ particularly fast & is very chatty. |
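+ |
+If your copy of boto supports the S3 multi-object delete operation, one way to |
+cut down on requests is to batch the deletions with ``delete_keys`` before |
+removing the bucket (a sketch, assuming ``delete_keys`` is available in your |
+boto version):: |
+ |
+ >>> full_bucket = conn.get_bucket('bucket-to-delete') |
+ >>> result = full_bucket.delete_keys([key.name for key in full_bucket.list()]) |
+ >>> conn.delete_bucket('bucket-to-delete') |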
+ |
+Listing All Available Buckets |
+----------------------------- |
+In addition to accessing specific buckets via the create_bucket and |
+get_bucket methods, you can also get a list of all available buckets |
+that you have created. |
+ |
+:: |
+ |
+ >>> rs = conn.get_all_buckets() |
+ |
+This returns a ResultSet object (see the SQS Tutorial for more info on |
+ResultSet objects). The ResultSet can be used as a sequence or list type |
+object to retrieve Bucket objects. |
+ |
+:: |
+ |
+ >>> len(rs) |
+ 11 |
+ >>> for b in rs: |
+ ... print b.name |
+ ... |
+ <listing of available buckets> |
+ >>> b = rs[0] |
+ |
+Setting / Getting the Access Control List for Buckets and Keys |
+-------------------------------------------------------------- |
+The S3 service provides the ability to control access to buckets and keys |
+within S3 via the Access Control List (ACL) associated with each object in |
+S3. There are two ways to set the ACL for an object: |
+ |
+1. Create a custom ACL that grants specific rights to specific users. At the |
+ moment, the users that are specified within grants have to be registered |
+ users of Amazon Web Services so this isn't as useful or as general as it |
+ could be. |
+ |
+2. Use a "canned" access control policy. There are four canned policies |
+ defined: |
+ |
+ a. private: Owner gets FULL_CONTROL. No one else has any access rights. |
+   b. public-read: Owner gets FULL_CONTROL and the anonymous principal is granted READ access. |
+ c. public-read-write: Owner gets FULL_CONTROL and the anonymous principal is granted READ and WRITE access. |
+ d. authenticated-read: Owner gets FULL_CONTROL and any principal authenticated as a registered Amazon S3 user is granted READ access. |
+ |
+To set a canned ACL for a bucket, use the set_acl method of the Bucket object. |
+The argument passed to this method must be one of the four permissible |
+canned policies named in the list CannedACLStrings contained in acl.py. |
+For example, to make a bucket readable by anyone: |
+ |
+>>> b.set_acl('public-read') |
+ |
+You can also set the ACL for Key objects, either by passing an additional |
+argument to the above method: |
+ |
+>>> b.set_acl('public-read', 'foobar') |
+ |
+where 'foobar' is the key of some object within the bucket b, or you can |
+call the set_acl method of the Key object: |
+ |
+>>> k.set_acl('public-read') |
+ |
+You can also retrieve the current ACL for a Bucket or Key object using the |
+get_acl method. This method parses the AccessControlPolicy response sent |
+by S3 and creates a set of Python objects that represent the ACL. |
+ |
+:: |
+ |
+ >>> acp = b.get_acl() |
+ >>> acp |
+ <boto.acl.Policy instance at 0x2e6940> |
+ >>> acp.acl |
+ <boto.acl.ACL instance at 0x2e69e0> |
+ >>> acp.acl.grants |
+ [<boto.acl.Grant instance at 0x2e6a08>] |
+ >>> for grant in acp.acl.grants: |
+ ... print grant.permission, grant.display_name, grant.email_address, grant.id |
+ ... |
+ FULL_CONTROL <boto.user.User instance at 0x2e6a30> |
+ |
+The Python objects representing the ACL can be found in the acl.py module |
+of boto. |
+ |
+Both the Bucket object and the Key object also provide shortcut |
+methods to simplify the process of granting individuals specific |
+access. For example, if you want to grant an individual user READ |
+access to a particular object in S3 you could do the following:: |
+ |
+ >>> key = b.lookup('mykeytoshare') |
+ >>> key.add_email_grant('READ', 'foo@bar.com') |
+ |
+The email address provided should be the one associated with the user's |
+AWS account. There is a similar method called add_user_grant that accepts the |
+canonical id of the user rather than the email address. |
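+ |
+For example, a sketch using ``add_user_grant`` with a placeholder canonical |
+user id (substitute the real canonical id of the AWS account you want to |
+grant access to):: |
+ |
+ >>> key = b.lookup('mykeytoshare') |
+ >>> key.add_user_grant('READ', '<canonical-user-id>') |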
+ |
+Setting/Getting Metadata Values on Key Objects |
+---------------------------------------------- |
+S3 allows arbitrary user metadata to be assigned to objects within a bucket. |
+To take advantage of this S3 feature, you should use the set_metadata and |
+get_metadata methods of the Key object to set and retrieve metadata associated |
+with an S3 object. For example:: |
+ |
+ >>> k = Key(b) |
+ >>> k.key = 'has_metadata' |
+ >>> k.set_metadata('meta1', 'This is the first metadata value') |
+ >>> k.set_metadata('meta2', 'This is the second metadata value') |
+ >>> k.set_contents_from_filename('foo.txt') |
+ |
+This code associates two metadata key/value pairs with the Key k. To retrieve |
+those values later:: |
+ |
+ >>> k = b.get_key('has_metadata') |
+ >>> k.get_metadata('meta1') |
+ 'This is the first metadata value' |
+ >>> k.get_metadata('meta2') |
+ 'This is the second metadata value' |
+ >>> |
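+ |
+Note that user metadata is stored with the object when its contents are |
+written, so it cannot be changed in place afterwards. If you need different |
+metadata on an existing key, one approach is to copy the key onto itself with |
+new metadata (a sketch; this rewrites the object and replaces its metadata):: |
+ |
+ >>> k = b.get_key('has_metadata') |
+ >>> k = k.copy(b.name, k.name, metadata={'meta1': 'An updated value'}, preserve_acl=True) |
+ >>> b.get_key('has_metadata').get_metadata('meta1') |
+ 'An updated value' |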
+ |
+Setting/Getting/Deleting CORS Configuration on a Bucket |
+------------------------------------------------------- |
+ |
+Cross-origin resource sharing (CORS) defines a way for client web |
+applications that are loaded in one domain to interact with resources |
+in a different domain. With CORS support in Amazon S3, you can build |
+rich client-side web applications with Amazon S3 and selectively allow |
+cross-origin access to your Amazon S3 resources. |
+ |
+To create a CORS configuration and associate it with a bucket:: |
+ |
+ >>> from boto.s3.cors import CORSConfiguration |
+ >>> cors_cfg = CORSConfiguration() |
+ >>> cors_cfg.add_rule(['PUT', 'POST', 'DELETE'], 'https://www.example.com', allowed_header='*', max_age_seconds=3000, expose_header='x-amz-server-side-encryption') |
+ >>> cors_cfg.add_rule('GET', '*') |
+ |
+The above code creates a CORS configuration object with two rules. |
+ |
+* The first rule allows cross-origin PUT, POST, and DELETE requests from |
+  the https://www.example.com origin. The rule also allows all headers |
+  in a preflight OPTIONS request through the Access-Control-Request-Headers |
+ header. In response to any preflight OPTIONS request, Amazon S3 will |
+ return any requested headers. |
+* The second rule allows cross-origin GET requests from all origins. |
+ |
+To associate this configuration with a bucket:: |
+ |
+ >>> import boto |
+ >>> c = boto.connect_s3() |
+ >>> bucket = c.lookup('mybucket') |
+ >>> bucket.set_cors(cors_cfg) |
+ |
+To retrieve the CORS configuration associated with a bucket:: |
+ |
+ >>> cors_cfg = bucket.get_cors() |
+ |
+And, finally, to delete all CORS configurations from a bucket:: |
+ |
+ >>> bucket.delete_cors() |
+ |
+Transitioning Objects to Glacier |
+-------------------------------- |
+ |
+You can configure objects in S3 to transition to Glacier after a period of |
+time. This is done using lifecycle policies. A lifecycle policy can also |
+specify that an object should be deleted after a period of time. Lifecycle |
+configurations are assigned to buckets and require these parameters: |
+ |
+* The object prefix that identifies the objects you are targeting. |
+* The action you want S3 to perform on the identified objects. |
+* The date (or time period) when you want S3 to perform these actions. |
+ |
+For example, given a bucket ``s3-glacier-boto-demo``, we can first retrieve the |
+bucket:: |
+ |
+ >>> import boto |
+ >>> c = boto.connect_s3() |
+ >>> bucket = c.get_bucket('s3-glacier-boto-demo') |
+ |
+Then we can create a lifecycle object. In our example, we want all objects |
+under ``logs/*`` to transition to Glacier 30 days after the object is created. |
+ |
+:: |
+ |
+ >>> from boto.s3.lifecycle import Lifecycle, Transition, Rule |
+ >>> to_glacier = Transition(days=30, storage_class='GLACIER') |
+ >>> rule = Rule('ruleid', 'logs/', 'Enabled', transition=to_glacier) |
+ >>> lifecycle = Lifecycle() |
+ >>> lifecycle.append(rule) |
+ |
+.. note:: |
+ |
+ For API docs for the lifecycle objects, see :py:mod:`boto.s3.lifecycle` |
+ |
+We can now configure the bucket with this lifecycle policy:: |
+ |
+ >>> bucket.configure_lifecycle(lifecycle) |
+ True |
+ |
+You can also retrieve the current lifecycle policy for the bucket:: |
+ |
+ >>> current = bucket.get_lifecycle_config() |
+ >>> print current[0].transition |
+ <Transition: in: 30 days, GLACIER> |
+ |
+When an object transitions to Glacier, the storage class will be |
+updated. This can be seen when you **list** the objects in a bucket:: |
+ |
+ >>> for key in bucket.list(): |
+ ... print key, key.storage_class |
+ ... |
+ <Key: s3-glacier-boto-demo,logs/testlog1.log> GLACIER |
+ |
+You can also use the prefix argument to the ``bucket.list`` method:: |
+ |
+ >>> list(bucket.list(prefix='logs/testlog1.log'))[0].storage_class |
+ u'GLACIER' |
+ |
+ |
+Restoring Objects from Glacier |
+------------------------------ |
+ |
+Once an object has been transitioned to Glacier, you can restore the object |
+back to S3. To do so, you can use the :py:meth:`boto.s3.key.Key.restore` |
+method of the key object. |
+The ``restore`` method takes an integer that specifies the number of days |
+to keep the object in S3. |
+ |
+:: |
+ |
+ >>> import boto |
+ >>> c = boto.connect_s3() |
+ >>> bucket = c.get_bucket('s3-glacier-boto-demo') |
+ >>> key = bucket.get_key('logs/testlog1.log') |
+ >>> key.restore(days=5) |
+ |
+It takes about 4 hours for a restore operation to make a copy of the archive |
+available for you to access. While the object is being restored, the |
+``ongoing_restore`` attribute will be set to ``True``:: |
+ |
+ |
+ >>> key = bucket.get_key('logs/testlog1.log') |
+ >>> print key.ongoing_restore |
+ True |
+ |
+When the restore is finished, this value will be ``False`` and the expiry |
+date of the object will no longer be ``None``:: |
+ |
+ >>> key = bucket.get_key('logs/testlog1.log') |
+ >>> print key.ongoing_restore |
+ False |
+ >>> print key.expiry_date |
+ "Fri, 21 Dec 2012 00:00:00 GMT" |
+ |
+ |
+.. note:: If there is no restore operation either in progress or completed, |
+ the ``ongoing_restore`` attribute will be ``None``. |
+ |
+Once the object is restored you can then download the contents:: |
+ |
+ >>> key.get_contents_to_filename('testlog1.log') |