Chromium Code Reviews

Side by Side Diff: third_party/gsutil/boto/docs/source/s3_tut.rst

Issue 12042069: Scripts to download files from google storage based on sha1 sums (Closed) Base URL: https://chromium.googlesource.com/chromium/tools/depot_tools.git@master
Patch Set: Review fixes, updated gsutil Created 7 years, 10 months ago
1 .. _s3_tut:
2
3 ======================================
4 An Introduction to boto's S3 interface
5 ======================================
6
7 This tutorial focuses on the boto interface to the Simple Storage Service
8 from Amazon Web Services. This tutorial assumes that you have already
9 downloaded and installed boto.
10
11 Creating a Connection
12 ---------------------
13 The first step in accessing S3 is to create a connection to the service.
14 There are two ways to do this in boto. The first is:
15
16 >>> from boto.s3.connection import S3Connection
17 >>> conn = S3Connection('<aws access key>', '<aws secret key>')
18
19 At this point the variable conn will point to an S3Connection object. In
20 this example, the AWS access key and AWS secret key are passed in to the
21 constructor explicitly. Alternatively, you can set the environment variables:
22
23 * `AWS_ACCESS_KEY_ID` - Your AWS Access Key ID
24 * `AWS_SECRET_ACCESS_KEY` - Your AWS Secret Access Key
25
26 and then call the constructor without any arguments, like this:
27
28 >>> conn = S3Connection()
29
30 There is also a shortcut function in the boto package, called connect_s3
31 that may provide a slightly easier means of creating a connection::
32
33 >>> import boto
34 >>> conn = boto.connect_s3()
35
36 In either case, conn will point to an S3Connection object which we will
37 use throughout the remainder of this tutorial.
38
39 Creating a Bucket
40 -----------------
41
42 Once you have a connection established with S3, you will probably want to
43 create a bucket. A bucket is a container used to store key/value pairs
44 in S3. A bucket can hold an unlimited amount of data so you could potentially
45 have just one bucket in S3 for all of your information. Or, you could create
46 separate buckets for different types of data. You can figure all of that out
47 later; first, let's just create a bucket. That can be accomplished like this::
48
49 >>> bucket = conn.create_bucket('mybucket')
50 Traceback (most recent call last):
51 File "<stdin>", line 1, in ?
52 File "boto/connection.py", line 285, in create_bucket
53 raise S3CreateError(response.status, response.reason)
54 boto.exception.S3CreateError: S3Error[409]: Conflict
55
56 Whoa. What happened there? Well, the thing you have to know about
57 buckets is that they are kind of like domain names. It's one flat name
58 space that everyone who uses S3 shares. So, someone has already created
59 a bucket called "mybucket" in S3 and that means no one else can grab that
60 bucket name. So, you have to come up with a name that hasn't been taken yet.
61 For example, something that uses a unique string as a prefix. Your
62 AWS_ACCESS_KEY (NOT YOUR SECRET KEY!) could work but I'll leave it to
63 your imagination to come up with something. I'll just assume that you
64 found an acceptable name.
65
66 The create_bucket method will create the requested bucket if it does not
67 exist or will return the existing bucket if it does exist.
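
Putting this together, here is a minimal sketch of creating a bucket with a
(hopefully) unique name. The prefix below is purely hypothetical; substitute
your own unique string::

    >>> unique_name = 'jsmith-tutorial-bucket'  # hypothetical; pick your own prefix
    >>> bucket = conn.create_bucket(unique_name)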
68
69 Creating a Bucket In Another Location
70 -------------------------------------
71
72 The example above assumes that you want to create a bucket in the
73 standard US region. However, it is possible to create buckets in
74 other locations. To do so, first import the Location object from the
75 boto.s3.connection module, like this::
76
77 >>> from boto.s3.connection import Location
78 >>> print '\n'.join(i for i in dir(Location) if i[0].isupper())
79 APNortheast
80 APSoutheast
81 APSoutheast2
82 DEFAULT
83 EU
84 SAEast
85 USWest
86 USWest2
87
88 As you can see, the Location object defines a number of possible locations. By
89 default, the location is the empty string, which is interpreted as the US
90 Classic Region, the original S3 region. However, by specifying another
91 location at the time the bucket is created, you can instruct S3 to create the
92 bucket in that location. For example::
93
94 >>> conn.create_bucket('mybucket', location=Location.EU)
95
96 will create the bucket in the EU region (assuming the name is available).
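
If you want to confirm where a bucket ended up, the Bucket object's
``get_location`` method returns the bucket's location constraint (the empty
string for the US Classic Region). A short sketch, again assuming the name is
available::

    >>> eu_bucket = conn.create_bucket('mybucket', location=Location.EU)
    >>> print eu_bucket.get_location()
    EU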
97
98 Storing Data
99 ------------
100
101 Once you have a bucket, presumably you will want to store some data
102 in it. S3 doesn't care what kind of information you store in your objects
103 or what format you use to store it. All you need is a key that is unique
104 within your bucket.
105
106 The Key object is used in boto to keep track of data stored in S3. To store
107 new data in S3, start by creating a new Key object::
108
109 >>> from boto.s3.key import Key
110 >>> k = Key(bucket)
111 >>> k.key = 'foobar'
112 >>> k.set_contents_from_string('This is a test of S3')
113
114 The net effect of these statements is to create a new object in S3 with a
115 key of "foobar" and a value of "This is a test of S3". To validate that
116 this worked, quit out of the interpreter and start it up again. Then::
117
118 >>> import boto
119 >>> c = boto.connect_s3()
120 >>> b = c.create_bucket('mybucket') # substitute your bucket name here
121 >>> from boto.s3.key import Key
122 >>> k = Key(b)
123 >>> k.key = 'foobar'
124 >>> k.get_contents_as_string()
125 'This is a test of S3'
126
127 So, we can definitely store and retrieve strings. A more interesting
128 example may be to store the contents of a local file in S3 and then retrieve
129 the contents to another local file.
130
131 ::
132
133 >>> k = Key(b)
134 >>> k.key = 'myfile'
135 >>> k.set_contents_from_filename('foo.jpg')
136 >>> k.get_contents_to_filename('bar.jpg')
137
138 There are a couple of things to note about this. When you send data to
139 S3 from a file or filename, boto will attempt to determine the correct
140 mime type for that file and send it as a Content-Type header. The boto
141 package uses the standard mimetypes package in Python to do the mime type
142 guessing. The other thing to note is that boto does stream the content
143 to and from S3 so you should be able to send and receive large files without
144 any problem.
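
If the guessed mime type is not what you want, the ``set_contents_from_*``
methods also accept a ``headers`` dictionary, so you can supply the
Content-Type yourself. A small sketch (the filename and key name here are just
examples)::

    >>> k = Key(b)
    >>> k.key = 'mydata'
    >>> k.set_contents_from_filename('data.bin',
    ...     headers={'Content-Type': 'application/octet-stream'})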
145
146 Accessing A Bucket
147 ------------------
148
149 Once a bucket exists, you can access it with the get_bucket method. For example::
150
151 >>> mybucket = conn.get_bucket('mybucket') # Substitute in your bucket name
152 >>> mybucket.list()
153 <listing of keys in the bucket>
154
155 By default, this method tries to validate the bucket's existence. You can
156 override this behavior by passing ``validate=False``::
157
158 >>> nonexistent = conn.get_bucket('i-dont-exist-at-all', validate=False)
159
160 If the bucket does not exist, an ``S3ResponseError`` will typically be raised. If
161 you'd rather not deal with any exceptions, you can use the ``lookup`` method::
162
163 >>> nonexistent = conn.lookup('i-dont-exist-at-all')
164 >>> if nonexistent is None:
165 ... print "No such bucket!"
166 ...
167 No such bucket!
168
169 Deleting A Bucket
170 -----------------
171
172 Removing a bucket can be done using the ``delete_bucket`` method. For example::
173
174 >>> conn.delete_bucket('mybucket') # Substitute in your bucket name
175
176 The bucket must be empty of keys or this call will fail & an exception will be
177 raised. You can remove a non-empty bucket by doing something like::
178
179 >>> full_bucket = conn.get_bucket('bucket-to-delete')
180 # It's full of keys. Delete them all.
181 >>> for key in full_bucket.list():
182 ... key.delete()
183 ...
184 # The bucket is empty now. Delete it.
185 >>> conn.delete_bucket('bucket-to-delete')
186
187 .. warning::
188
189 This method can cause data loss! Be very careful when using it.
190
191 Additionally, be aware that using the above method for removing all keys
192 and deleting the bucket involves a request for each key. As such, it's not
193 particularly fast & is very chatty.
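
If your version of boto exposes the Multi-Object Delete API, the bucket's
``delete_keys`` method can remove keys in batches rather than one request per
key, which is considerably faster. A sketch, assuming that method is available
(the returned result object reports which keys were deleted and any errors)::

    >>> full_bucket = conn.get_bucket('bucket-to-delete')
    >>> result = full_bucket.delete_keys([key.name for key in full_bucket.list()])
    >>> result.errors
    []
    >>> conn.delete_bucket('bucket-to-delete')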
194
195 Listing All Available Buckets
196 -----------------------------
197 In addition to accessing specific buckets via the create_bucket method,
198 you can also get a list of all available buckets that you have created.
199
200 ::
201
202 >>> rs = conn.get_all_buckets()
203
204 This returns a ResultSet object (see the SQS Tutorial for more info on
205 ResultSet objects). The ResultSet can be used as a sequence or list type
206 object to retrieve Bucket objects.
207
208 ::
209
210 >>> len(rs)
211 11
212 >>> for b in rs:
213 ... print b.name
214 ...
215 <listing of available buckets>
216 >>> b = rs[0]
217
218 Setting / Getting the Access Control List for Buckets and Keys
219 --------------------------------------------------------------
220 The S3 service provides the ability to control access to buckets and keys
221 within S3 via the Access Control List (ACL) associated with each object in
222 S3. There are two ways to set the ACL for an object:
223
224 1. Create a custom ACL that grants specific rights to specific users. At the
225 moment, the users that are specified within grants have to be registered
226 users of Amazon Web Services so this isn't as useful or as general as it
227 could be.
228
229 2. Use a "canned" access control policy. There are four canned policies
230 defined:
231
232 a. private: Owner gets FULL_CONTROL. No one else has any access rights.
233 b. public-read: Owner gets FULL_CONTROL and the anonymous principal is granted READ access.
234 c. public-read-write: Owner gets FULL_CONTROL and the anonymous principal is granted READ and WRITE access.
235 d. authenticated-read: Owner gets FULL_CONTROL and any principal authenticated as a registered Amazon S3 user is granted READ access.
236
237 To set a canned ACL for a bucket, use the set_acl method of the Bucket object.
238 The argument passed to this method must be one of the four permissible
239 canned policies named in the list CannedACLStrings contained in acl.py.
240 For example, to make a bucket readable by anyone:
241
242 >>> b.set_acl('public-read')
243
244 You can also set the ACL for Key objects, either by passing an additional
245 argument to the above method:
246
247 >>> b.set_acl('public-read', 'foobar')
248
249 where 'foobar' is the key of some object within the bucket b, or you can
250 call the set_acl method of the Key object:
251
252 >>> k.set_acl('public-read')
253
254 You can also retrieve the current ACL for a Bucket or Key object using the
255 get_acl method. This method parses the AccessControlPolicy response sent
256 by S3 and creates a set of Python objects that represent the ACL.
257
258 ::
259
260 >>> acp = b.get_acl()
261 >>> acp
262 <boto.acl.Policy instance at 0x2e6940>
263 >>> acp.acl
264 <boto.acl.ACL instance at 0x2e69e0>
265 >>> acp.acl.grants
266 [<boto.acl.Grant instance at 0x2e6a08>]
267 >>> for grant in acp.acl.grants:
268 ... print grant.permission, grant.display_name, grant.email_address, grant.id
269 ...
270 FULL_CONTROL <boto.user.User instance at 0x2e6a30>
271
272 The Python objects representing the ACL can be found in the acl.py module
273 of boto.
274
275 Both the Bucket object and the Key object also provide shortcut
276 methods to simplify the process of granting individuals specific
277 access. For example, if you want to grant an individual user READ
278 access to a particular object in S3 you could do the following::
279
280 >>> key = b.lookup('mykeytoshare')
281 >>> key.add_email_grant('READ', 'foo@bar.com')
282
283 The email address provided should be the one associated with the user's
284 AWS account. There is a similar method called add_user_grant that accepts the
285 canonical id of the user rather than the email address.
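
For example, a sketch of granting READ access by canonical user id rather than
by email address (the id shown is a placeholder, not a real account)::

    >>> key = b.lookup('mykeytoshare')
    >>> key.add_user_grant('READ', '<canonical-user-id>')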
286
287 Setting/Getting Metadata Values on Key Objects
288 ----------------------------------------------
289 S3 allows arbitrary user metadata to be assigned to objects within a bucket.
290 To take advantage of this S3 feature, you should use the set_metadata and
291 get_metadata methods of the Key object to set and retrieve metadata associated
292 with an S3 object. For example::
293
294 >>> k = Key(b)
295 >>> k.key = 'has_metadata'
296 >>> k.set_metadata('meta1', 'This is the first metadata value')
297 >>> k.set_metadata('meta2', 'This is the second metadata value')
298 >>> k.set_contents_from_filename('foo.txt')
299
300 This code associates two metadata key/value pairs with the Key k. To retrieve
301 those values later::
302
303 >>> k = b.get_key('has_metadata')
304 >>> k.get_metadata('meta1')
305 'This is the first metadata value'
306 >>> k.get_metadata('meta2')
307 'This is the second metadata value'
308 >>>
309
310 Setting/Getting/Deleting CORS Configuration on a Bucket
311 -------------------------------------------------------
312
313 Cross-origin resource sharing (CORS) defines a way for client web
314 applications that are loaded in one domain to interact with resources
315 in a different domain. With CORS support in Amazon S3, you can build
316 rich client-side web applications with Amazon S3 and selectively allow
317 cross-origin access to your Amazon S3 resources.
318
319 To create a CORS configuration and associate it with a bucket::
320
321 >>> from boto.s3.cors import CORSConfiguration
322 >>> cors_cfg = CORSConfiguration()
323 >>> cors_cfg.add_rule(['PUT', 'POST', 'DELETE'], 'https://www.example.com', allowed_header='*', max_age_seconds=3000, expose_header='x-amz-server-side-encryption')
324 >>> cors_cfg.add_rule('GET', '*')
325
326 The above code creates a CORS configuration object with two rules.
327
328 * The first rule allows cross-origin PUT, POST, and DELETE requests from
329 the https://www.example.com origin. The rule also allows all headers
330 in a preflight OPTIONS request through the Access-Control-Request-Headers
331 header. In response to any preflight OPTIONS request, Amazon S3 will
332 return any requested headers.
333 * The second rule allows cross-origin GET requests from all origins.
334
335 To associate this configuration with a bucket::
336
337 >>> import boto
338 >>> c = boto.connect_s3()
339 >>> bucket = c.lookup('mybucket')
340 >>> bucket.set_cors(cors_cfg)
341
342 To retrieve the CORS configuration associated with a bucket::
343
344 >>> cors_cfg = bucket.get_cors()
345
346 And, finally, to delete all CORS configurations from a bucket::
347
348 >>> bucket.delete_cors()
349
350 Transitioning Objects to Glacier
351 --------------------------------
352
353 You can configure objects in S3 to transition to Glacier after a period of
354 time. This is done using lifecycle policies. A lifecycle policy can also
355 specify that an object should be deleted after a period of time. Lifecycle
356 configurations are assigned to buckets and require these parameters:
357
358 * The object prefix that identifies the objects you are targeting.
359 * The action you want S3 to perform on the identified objects.
360 * The date (or time period) when you want S3 to perform these actions.
361
362 For example, given a bucket ``s3-glacier-boto-demo``, we can first retrieve the
363 bucket::
364
365 >>> import boto
366 >>> c = boto.connect_s3()
367 >>> bucket = c.get_bucket('s3-glacier-boto-demo')
368
369 Then we can create a lifecycle object. In our example, we want all objects
370 under ``logs/*`` to transition to Glacier 30 days after the object is created.
371
372 ::
373
374 >>> from boto.s3.lifecycle import Lifecycle, Transition, Rule
375 >>> to_glacier = Transition(days=30, storage_class='GLACIER')
376 >>> rule = Rule('ruleid', 'logs/', 'Enabled', transition=to_glacier)
377 >>> lifecycle = Lifecycle()
378 >>> lifecycle.append(rule)
379
380 .. note::
381
382 For API docs for the lifecycle objects, see :py:mod:`boto.s3.lifecycle`
383
384 We can now configure the bucket with this lifecycle policy::
385
386 >>> bucket.configure_lifecycle(lifecycle)
387 True
388
389 You can also retrieve the current lifecycle policy for the bucket::
390
391 >>> current = bucket.get_lifecycle_config()
392 >>> print current[0].transition
393 <Transition: in: 30 days, GLACIER>
394
395 When an object transitions to Glacier, the storage class will be
396 updated. This can be seen when you **list** the objects in a bucket::
397
398 >>> for key in bucket.list():
399 ... print key, key.storage_class
400 ...
401 <Key: s3-glacier-boto-demo,logs/testlog1.log> GLACIER
402
403 You can also use the prefix argument to the ``bucket.list`` method::
404
405 >>> print list(bucket.list(prefix='logs/testlog1.log'))[0].storage_class
406 u'GLACIER'
407
408
409 Restoring Objects from Glacier
410 ------------------------------
411
412 Once an object has been transitioned to Glacier, you can restore the object
413 back to S3. To do so, you can use the :py:meth:`boto.s3.key.Key.restore`
414 method of the key object.
415 The ``restore`` method takes an integer that specifies the number of days
416 to keep the object in S3.
417
418 ::
419
420 >>> import boto
421 >>> c = boto.connect_s3()
422 >>> bucket = c.get_bucket('s3-glacier-boto-demo')
423 >>> key = bucket.get_key('logs/testlog1.log')
424 >>> key.restore(days=5)
425
426 It takes about 4 hours for a restore operation to make a copy of the archive
427 available for you to access. While the object is being restored, the
428 ``ongoing_restore`` attribute will be set to ``True``::
429
430
431 >>> key = bucket.get_key('logs/testlog1.log')
432 >>> print key.ongoing_restore
433 True
434
435 When the restore is finished, this value will be ``False`` and the expiry
436 date of the object will no longer be ``None``::
437
438 >>> key = bucket.get_key('logs/testlog1.log')
439 >>> print key.ongoing_restore
440 False
441 >>> print key.expiry_date
442 "Fri, 21 Dec 2012 00:00:00 GMT"
443
444
445 .. note:: If there is no restore operation either in progress or completed,
446 the ``ongoing_restore`` attribute will be ``None``.
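
Putting those three states together, a small, purely illustrative sketch of
inspecting a key before downloading it (it only combines the attributes shown
above)::

    >>> key = bucket.get_key('logs/testlog1.log')
    >>> if key.ongoing_restore is None:
    ...     print "No restore has been requested"
    ... elif key.ongoing_restore:
    ...     print "Restore still in progress"
    ... else:
    ...     print "Restored until", key.expiry_date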
447
448 Once the object is restored you can then download the contents::
449
450 >>> key.get_contents_to_filename('testlog1.log')