.. _s3_tut:

======================================
An Introduction to boto's S3 interface
======================================

This tutorial focuses on the boto interface to the Simple Storage Service
from Amazon Web Services.  This tutorial assumes that you have already
downloaded and installed boto.

Creating a Connection
---------------------
The first step in accessing S3 is to create a connection to the service.
There are two ways to do this in boto.  The first is:

>>> from boto.s3.connection import S3Connection
>>> conn = S3Connection('<aws access key>', '<aws secret key>')

At this point the variable conn will point to an S3Connection object.  In
this example, the AWS access key and AWS secret key are passed explicitly
to the constructor.  Alternatively, you can set the environment variables:

* ``AWS_ACCESS_KEY_ID`` - Your AWS Access Key ID
* ``AWS_SECRET_ACCESS_KEY`` - Your AWS Secret Access Key

and then call the constructor without any arguments, like this:

>>> conn = S3Connection()

There is also a shortcut function in the boto package, called ``connect_s3``,
that may provide a slightly easier means of creating a connection::

    >>> import boto
    >>> conn = boto.connect_s3()

In either case, conn will point to an S3Connection object which we will
use throughout the remainder of this tutorial.

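If you would rather not rely on environment variables, the ``connect_s3``
shortcut also accepts your credentials directly; a minimal sketch, with
placeholder keys that you would substitute with your own::

    >>> import boto
    >>> conn = boto.connect_s3('<aws access key>', '<aws secret key>')
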
Creating a Bucket
-----------------

Once you have a connection established with S3, you will probably want to
create a bucket.  A bucket is a container used to store key/value pairs
in S3.  A bucket can hold an unlimited amount of data so you could potentially
have just one bucket in S3 for all of your information.  Or, you could create
separate buckets for different types of data.  You can figure all of that out
later; first, let's just create a bucket.  That can be accomplished like this::

    >>> bucket = conn.create_bucket('mybucket')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "boto/connection.py", line 285, in create_bucket
        raise S3CreateError(response.status, response.reason)
    boto.exception.S3CreateError: S3Error[409]: Conflict

Whoa.  What happened there?  Well, the thing you have to know about
buckets is that they are kind of like domain names.  It's one flat namespace
that everyone who uses S3 shares.  So, someone has already created
a bucket called "mybucket" in S3 and that means no one else can grab that
bucket name.  So, you have to come up with a name that hasn't been taken yet.
For example, something that uses a unique string as a prefix.  Your
AWS_ACCESS_KEY (NOT YOUR SECRET KEY!) could work but I'll leave it to
your imagination to come up with something.  I'll just assume that you
found an acceptable name.

The create_bucket method will create the requested bucket if it does not
exist or will return the existing bucket if it does exist.

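For example, a minimal sketch that builds a unique bucket name using a
``uuid`` suffix (just one way to get a unique string) and handles the
conflict error if the name is somehow already taken::

    >>> import uuid
    >>> from boto.exception import S3CreateError
    >>> try:
    ...     bucket = conn.create_bucket('mybucket-' + str(uuid.uuid4()))
    ... except S3CreateError:
    ...     print "That name is taken; try another one."
    ...
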
Creating a Bucket In Another Location
-------------------------------------

The example above assumes that you want to create a bucket in the
standard US region.  However, it is possible to create buckets in
other locations.  To do so, first import the Location object from the
boto.s3.connection module, like this::

    >>> from boto.s3.connection import Location
    >>> print '\n'.join(i for i in dir(Location) if i[0].isupper())
    APNortheast
    APSoutheast
    APSoutheast2
    DEFAULT
    EU
    SAEast
    USWest
    USWest2

As you can see, the Location object defines a number of possible locations.  By
default, the location is the empty string, which is interpreted as the US
Classic Region, the original S3 region.  However, by specifying another
location at the time the bucket is created, you can instruct S3 to create the
bucket in that location.  For example::

    >>> conn.create_bucket('mybucket', location=Location.EU)

will create the bucket in the EU region (assuming the name is available).

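If you want to check where an existing bucket lives, the ``get_location``
method of the Bucket object returns the bucket's location constraint as a
string (the empty string for the US Classic Region).  For the EU bucket
created above, this should look something like::

    >>> bucket = conn.get_bucket('mybucket')  # substitute your bucket name
    >>> bucket.get_location()
    'EU'
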
Storing Data
------------

Once you have a bucket, presumably you will want to store some data
in it.  S3 doesn't care what kind of information you store in your objects
or what format you use to store it.  All you need is a key that is unique
within your bucket.

The Key object is used in boto to keep track of data stored in S3.  To store
new data in S3, start by creating a new Key object::

    >>> from boto.s3.key import Key
    >>> k = Key(bucket)
    >>> k.key = 'foobar'
    >>> k.set_contents_from_string('This is a test of S3')

The net effect of these statements is to create a new object in S3 with a
key of "foobar" and a value of "This is a test of S3".  To validate that
this worked, quit out of the interpreter and start it up again.  Then::

    >>> import boto
    >>> c = boto.connect_s3()
    >>> b = c.create_bucket('mybucket') # substitute your bucket name here
    >>> from boto.s3.key import Key
    >>> k = Key(b)
    >>> k.key = 'foobar'
    >>> k.get_contents_as_string()
    'This is a test of S3'

So, we can definitely store and retrieve strings.  A more interesting
example may be to store the contents of a local file in S3 and then retrieve
the contents to another local file.

::

    >>> k = Key(b)
    >>> k.key = 'myfile'
    >>> k.set_contents_from_filename('foo.jpg')
    >>> k.get_contents_to_filename('bar.jpg')

There are a couple of things to note about this.  When you send data to
S3 from a file or filename, boto will attempt to determine the correct
mime type for that file and send it as a Content-Type header.  The boto
package uses the standard mimetypes package in Python to do the mime type
guessing.  The other thing to note is that boto does stream the content
to and from S3, so you should be able to send and receive large files without
any problem.

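If you need more control over an upload, ``set_contents_from_filename``
accepts optional arguments.  Here is a hedged sketch showing a progress
callback and an explicit Content-Type header; ``percent_cb`` is just an
illustrative name, and the callback receives the bytes transferred so far
and the total size::

    >>> def percent_cb(so_far, total):
    ...     print '%d bytes of %d transferred' % (so_far, total)
    ...
    >>> k = Key(b)
    >>> k.key = 'myfile'
    >>> k.set_contents_from_filename('foo.jpg',
    ...                              headers={'Content-Type': 'image/jpeg'},
    ...                              cb=percent_cb, num_cb=10)
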
Accessing A Bucket
------------------

Once a bucket exists, you can access it by getting the bucket. For example::

    >>> mybucket = conn.get_bucket('mybucket') # Substitute in your bucket name
    >>> mybucket.list()
    <listing of keys in the bucket>

By default, this method tries to validate the bucket's existence. You can
override this behavior by passing ``validate=False``::

    >>> nonexistent = conn.get_bucket('i-dont-exist-at-all', validate=False)

If the bucket does not exist, an ``S3ResponseError`` will commonly be raised. If
you'd rather not deal with any exceptions, you can use the ``lookup`` method::

    >>> nonexistent = conn.lookup('i-dont-exist-at-all')
    >>> if nonexistent is None:
    ...     print "No such bucket!"
    ...
    No such bucket!

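If you would rather handle the error from ``get_bucket`` yourself, a minimal
sketch that catches the exception directly::

    >>> from boto.exception import S3ResponseError
    >>> try:
    ...     bucket = conn.get_bucket('i-dont-exist-at-all')
    ... except S3ResponseError:
    ...     print "No such bucket!"
    ...
    No such bucket!
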
Deleting A Bucket
-----------------

Removing a bucket can be done using the ``delete_bucket`` method. For example::

    >>> conn.delete_bucket('mybucket') # Substitute in your bucket name

The bucket must be empty of keys or this call will fail & an exception will be
raised. You can remove a non-empty bucket by doing something like::

    >>> full_bucket = conn.get_bucket('bucket-to-delete')
    # It's full of keys. Delete them all.
    >>> for key in full_bucket.list():
    ...     key.delete()
    ...
    # The bucket is empty now. Delete it.
    >>> conn.delete_bucket('bucket-to-delete')

.. warning::

    This method can cause data loss! Be very careful when using it.

    Additionally, be aware that using the above method for removing all keys
    and deleting the bucket involves a request for each key. As such, it's not
    particularly fast & is very chatty.

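If your version of boto exposes the S3 multi-object delete API through
``Bucket.delete_keys``, you can remove keys in batches rather than one
request per key; a hedged sketch::

    >>> full_bucket = conn.get_bucket('bucket-to-delete')
    >>> result = full_bucket.delete_keys([key.name for key in full_bucket.list()])
    >>> result.errors   # any keys that could not be deleted
    []
    >>> conn.delete_bucket('bucket-to-delete')
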
Listing All Available Buckets
-----------------------------
In addition to accessing specific buckets via the create_bucket method,
you can also get a list of all available buckets that you have created.

::

    >>> rs = conn.get_all_buckets()

This returns a ResultSet object (see the SQS Tutorial for more info on
ResultSet objects).  The ResultSet can be used as a sequence or list type
object to retrieve Bucket objects.

::

    >>> len(rs)
    11
    >>> for b in rs:
    ...     print b.name
    ...
    <listing of available buckets>
    >>> b = rs[0]

Setting / Getting the Access Control List for Buckets and Keys
--------------------------------------------------------------
The S3 service provides the ability to control access to buckets and keys
within S3 via the Access Control List (ACL) associated with each object in
S3.  There are two ways to set the ACL for an object:

1. Create a custom ACL that grants specific rights to specific users.  At the
   moment, the users that are specified within grants have to be registered
   users of Amazon Web Services so this isn't as useful or as general as it
   could be.

2. Use a "canned" access control policy.  There are four canned policies
   defined:

   a. private: Owner gets FULL_CONTROL.  No one else has any access rights.
   b. public-read: Owner gets FULL_CONTROL and the anonymous principal is
      granted READ access.
   c. public-read-write: Owner gets FULL_CONTROL and the anonymous principal
      is granted READ and WRITE access.
   d. authenticated-read: Owner gets FULL_CONTROL and any principal
      authenticated as a registered Amazon S3 user is granted READ access.

To set a canned ACL for a bucket, use the set_acl method of the Bucket object.
The argument passed to this method must be one of the four permissible
canned policies named in the list CannedACLStrings contained in acl.py.
For example, to make a bucket readable by anyone:

>>> b.set_acl('public-read')

You can also set the ACL for Key objects, either by passing an additional
argument to the above method:

>>> b.set_acl('public-read', 'foobar')

where 'foobar' is the key of some object within the bucket b, or you can
call the set_acl method of the Key object:

>>> k.set_acl('public-read')

You can also retrieve the current ACL for a Bucket or Key object using the
get_acl method.  This method parses the AccessControlPolicy response sent
by S3 and creates a set of Python objects that represent the ACL.

::

    >>> acp = b.get_acl()
    >>> acp
    <boto.acl.Policy instance at 0x2e6940>
    >>> acp.acl
    <boto.acl.ACL instance at 0x2e69e0>
    >>> acp.acl.grants
    [<boto.acl.Grant instance at 0x2e6a08>]
    >>> for grant in acp.acl.grants:
    ...   print grant.permission, grant.display_name, grant.email_address, grant.id
    ...
    FULL_CONTROL <boto.user.User instance at 0x2e6a30>

The Python objects representing the ACL can be found in the acl.py module
of boto.

Both the Bucket object and the Key object also provide shortcut
methods to simplify the process of granting individuals specific
access.  For example, if you want to grant an individual user READ
access to a particular object in S3 you could do the following::

    >>> key = b.lookup('mykeytoshare')
    >>> key.add_email_grant('READ', 'foo@bar.com')

The email address provided should be the one associated with the user's
AWS account.  There is a similar method called add_user_grant that accepts the
canonical id of the user rather than the email address.

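A hedged sketch of that variant; the canonical user id below is just a
placeholder that you would replace with a real id::

    >>> key = b.lookup('mykeytoshare')
    >>> key.add_user_grant('READ', '<canonical user id>')
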
Setting/Getting Metadata Values on Key Objects
----------------------------------------------
S3 allows arbitrary user metadata to be assigned to objects within a bucket.
To take advantage of this S3 feature, you should use the set_metadata and
get_metadata methods of the Key object to set and retrieve metadata associated
with an S3 object.  For example::

    >>> k = Key(b)
    >>> k.key = 'has_metadata'
    >>> k.set_metadata('meta1', 'This is the first metadata value')
    >>> k.set_metadata('meta2', 'This is the second metadata value')
    >>> k.set_contents_from_filename('foo.txt')

This code associates two metadata key/value pairs with the Key k.  To retrieve
those values later::

    >>> k = b.get_key('has_metadata')
    >>> k.get_metadata('meta1')
    'This is the first metadata value'
    >>> k.get_metadata('meta2')
    'This is the second metadata value'
    >>>

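Note that user metadata has to be in place before the contents are uploaded.
To change the metadata of an object that already exists, one approach is to
copy the key onto itself with new metadata; a hedged sketch using the Key
object's ``copy`` method (keep in mind that S3 replaces all of the object's
user metadata during such a copy)::

    >>> k = b.get_key('has_metadata')
    >>> k = k.copy(b.name, k.name, metadata={'meta1': 'A new value'},
    ...            preserve_acl=True)
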
Setting/Getting/Deleting CORS Configuration on a Bucket
-------------------------------------------------------

Cross-origin resource sharing (CORS) defines a way for client web
applications that are loaded in one domain to interact with resources
in a different domain. With CORS support in Amazon S3, you can build
rich client-side web applications with Amazon S3 and selectively allow
cross-origin access to your Amazon S3 resources.

To create a CORS configuration and associate it with a bucket::

    >>> from boto.s3.cors import CORSConfiguration
    >>> cors_cfg = CORSConfiguration()
    >>> cors_cfg.add_rule(['PUT', 'POST', 'DELETE'], 'https://www.example.com',
    ...                   allowed_header='*', max_age_seconds=3000,
    ...                   expose_header='x-amz-server-side-encryption')
    >>> cors_cfg.add_rule('GET', '*')

The above code creates a CORS configuration object with two rules.

* The first rule allows cross-origin PUT, POST, and DELETE requests from
  the https://www.example.com/ origin.  The rule also allows all headers
  in a preflight OPTIONS request through the Access-Control-Request-Headers
  header.  In response to any preflight OPTIONS request, Amazon S3 will
  return any requested headers.
* The second rule allows cross-origin GET requests from all origins.

To associate this configuration with a bucket::

    >>> import boto
    >>> c = boto.connect_s3()
    >>> bucket = c.lookup('mybucket')
    >>> bucket.set_cors(cors_cfg)

To retrieve the CORS configuration associated with a bucket::

    >>> cors_cfg = bucket.get_cors()

And, finally, to delete all CORS configurations from a bucket::

    >>> bucket.delete_cors()

Transitioning Objects to Glacier
--------------------------------

You can configure objects in S3 to transition to Glacier after a period of
time.  This is done using lifecycle policies.  A lifecycle policy can also
specify that an object should be deleted after a period of time.  Lifecycle
configurations are assigned to buckets and require these parameters:

* The object prefix that identifies the objects you are targeting.
* The action you want S3 to perform on the identified objects.
* The date (or time period) when you want S3 to perform these actions.

For example, given a bucket ``s3-glacier-boto-demo``, we can first retrieve the
bucket::

    >>> import boto
    >>> c = boto.connect_s3()
    >>> bucket = c.get_bucket('s3-glacier-boto-demo')

Then we can create a lifecycle object.  In our example, we want all objects
under ``logs/*`` to transition to Glacier 30 days after the object is created.

::

    >>> from boto.s3.lifecycle import Lifecycle, Transition, Rule
    >>> to_glacier = Transition(days=30, storage_class='GLACIER')
    >>> rule = Rule('ruleid', 'logs/', 'Enabled', transition=to_glacier)
    >>> lifecycle = Lifecycle()
    >>> lifecycle.append(rule)

.. note::

  For API docs for the lifecycle objects, see :py:mod:`boto.s3.lifecycle`

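As mentioned above, a lifecycle policy can also delete objects after a period
of time.  Assuming your version of boto provides
``boto.s3.lifecycle.Expiration``, a hedged sketch that appends such a rule to
the same lifecycle (the ``tmp/`` prefix and rule id are just examples)::

    >>> from boto.s3.lifecycle import Expiration
    >>> expire_rule = Rule('expirerule', 'tmp/', 'Enabled',
    ...                    expiration=Expiration(days=365))
    >>> lifecycle.append(expire_rule)
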
We can now configure the bucket with this lifecycle policy::

    >>> bucket.configure_lifecycle(lifecycle)
    True

You can also retrieve the current lifecycle policy for the bucket::

    >>> current = bucket.get_lifecycle_config()
    >>> print current[0].transition
    <Transition: in: 30 days, GLACIER>

When an object transitions to Glacier, the storage class will be
updated.  This can be seen when you **list** the objects in a bucket::

    >>> for key in bucket.list():
    ...   print key, key.storage_class
    ...
    <Key: s3-glacier-boto-demo,logs/testlog1.log> GLACIER

You can also use the prefix argument to the ``bucket.list`` method::

    >>> list(bucket.list(prefix='logs/testlog1.log'))[0].storage_class
    u'GLACIER'

Restoring Objects from Glacier
------------------------------

Once an object has been transitioned to Glacier, you can restore the object
back to S3.  To do so, you can use the :py:meth:`boto.s3.key.Key.restore`
method of the key object.
The ``restore`` method takes an integer that specifies the number of days
to keep the object in S3.

::

    >>> import boto
    >>> c = boto.connect_s3()
    >>> bucket = c.get_bucket('s3-glacier-boto-demo')
    >>> key = bucket.get_key('logs/testlog1.log')
    >>> key.restore(days=5)

It takes about 4 hours for a restore operation to make a copy of the archive
available for you to access.  While the object is being restored, the
``ongoing_restore`` attribute will be set to ``True``::

    >>> key = bucket.get_key('logs/testlog1.log')
    >>> print key.ongoing_restore
    True

When the restore is finished, this value will be ``False`` and the expiry
date of the object will no longer be ``None``::

    >>> key = bucket.get_key('logs/testlog1.log')
    >>> print key.ongoing_restore
    False
    >>> print key.expiry_date
    "Fri, 21 Dec 2012 00:00:00 GMT"

.. note:: If there is no restore operation either in progress or completed,
  the ``ongoing_restore`` attribute will be ``None``.

Once the object is restored, you can then download the contents::

    >>> key.get_contents_to_filename('testlog1.log')