| 1 .. _s3_tut: |
| 2 |
| 3 ====================================== |
| 4 An Introduction to boto's S3 interface |
| 5 ====================================== |
| 6 |
| 7 This tutorial focuses on the boto interface to the Simple Storage Service |
| 8 from Amazon Web Services. This tutorial assumes that you have already |
| 9 downloaded and installed boto. |
| 10 |
| 11 Creating a Connection |
| 12 --------------------- |
| 13 The first step in accessing S3 is to create a connection to the service. |
| 14 There are two ways to do this in boto. The first is: |
| 15 |
| 16 >>> from boto.s3.connection import S3Connection |
| 17 >>> conn = S3Connection('<aws access key>', '<aws secret key>') |
| 18 |
| 19 At this point the variable conn will point to an S3Connection object. In |
| 20 this example, the AWS access key and AWS secret key are passed in to the |
| 21 method explicitly. Alternatively, you can set the environment variables:
| 22 |
| 23 * `AWS_ACCESS_KEY_ID` - Your AWS Access Key ID |
| 24 * `AWS_SECRET_ACCESS_KEY` - Your AWS Secret Access Key |
| 25 |
| 26 and then call the constructor without any arguments, like this: |
| 27 |
| 28 >>> conn = S3Connection() |
| 29 |
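The variables can be exported in your shell before starting Python or, as a
minimal illustrative sketch, set from within Python before the connection
object is created::

    >>> import os
    >>> os.environ['AWS_ACCESS_KEY_ID'] = '<aws access key>'      # placeholder value
    >>> os.environ['AWS_SECRET_ACCESS_KEY'] = '<aws secret key>'  # placeholder value
    >>> conn = S3Connection()  # credentials are read from the environment
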
| 30 There is also a shortcut function in the boto package, called connect_s3 |
| 31 that may provide a slightly easier means of creating a connection:: |
| 32 |
| 33 >>> import boto |
| 34 >>> conn = boto.connect_s3() |
| 35 |
| 36 In either case, conn will point to an S3Connection object which we will |
| 37 use throughout the remainder of this tutorial. |
| 38 |
| 39 Creating a Bucket |
| 40 ----------------- |
| 41 |
| 42 Once you have a connection established with S3, you will probably want to |
| 43 create a bucket. A bucket is a container used to store key/value pairs |
| 44 in S3. A bucket can hold an unlimited amount of data, so you could potentially
| 45 have just one bucket in S3 for all of your information. Or, you could create
| 46 separate buckets for different types of data. You can figure all of that out
| 47 later; for now, let's just create a bucket. That can be accomplished like this::
| 48 |
| 49 >>> bucket = conn.create_bucket('mybucket') |
| 50 Traceback (most recent call last): |
| 51 File "<stdin>", line 1, in ? |
| 52 File "boto/connection.py", line 285, in create_bucket |
| 53 raise S3CreateError(response.status, response.reason) |
| 54 boto.exception.S3CreateError: S3Error[409]: Conflict |
| 55 |
| 56 Whoa. What happened there? Well, the thing you have to know about
| 57 buckets is that they are kind of like domain names. It's one flat name
| 58 space that everyone who uses S3 shares. So, someone has already created
| 59 a bucket called "mybucket" in S3, and that means no one else can grab that
| 60 bucket name. So, you have to come up with a name that hasn't been taken yet.
| 61 For example, something that uses a unique string as a prefix. Your |
| 62 AWS_ACCESS_KEY (NOT YOUR SECRET KEY!) could work, but I'll leave it to
| 63 your imagination to come up with something (one possibility is sketched
| 64 below). I'll just assume that you found an acceptable name.
| 65 |
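One illustrative approach (purely a sketch; the ``mybucket-`` prefix is a
placeholder) is to combine a fixed prefix with a random UUID::

    >>> import uuid
    >>> bucket_name = 'mybucket-' + str(uuid.uuid4())  # e.g. 'mybucket-<random uuid>'
    >>> bucket = conn.create_bucket(bucket_name)
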
| 66 The create_bucket method will create the requested bucket if it does not |
| 67 exist or will return the existing bucket if it does exist. |
| 68 |
| 69 Creating a Bucket In Another Location |
| 70 ------------------------------------- |
| 71 |
| 72 The example above assumes that you want to create a bucket in the |
| 73 standard US region. However, it is possible to create buckets in |
| 74 other locations. To do so, first import the Location object from the |
| 75 boto.s3.connection module, like this:: |
| 76 |
| 77 >>> from boto.s3.connection import Location |
| 78 >>> print '\n'.join(i for i in dir(Location) if i[0].isupper()) |
| 79 APNortheast |
| 80 APSoutheast |
| 81 APSoutheast2 |
| 82 DEFAULT |
| 83 EU |
| 84 SAEast |
| 85 USWest |
| 86 USWest2 |
| 87 |
| 88 As you can see, the Location object defines a number of possible locations. By |
| 89 default, the location is the empty string which is interpreted as the US |
| 90 Classic Region, the original S3 region. However, by specifying another |
| 91 location at the time the bucket is created, you can instruct S3 to create the |
| 92 bucket in that location. For example:: |
| 93 |
| 94 >>> conn.create_bucket('mybucket', location=Location.EU) |
| 95 |
| 96 will create the bucket in the EU region (assuming the name is available). |
| 97 |
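If you want to confirm where a bucket ended up, the Bucket object's
``get_location`` method returns the bucket's location constraint as a string
(the empty string for the US Classic Region). For example, with a placeholder
bucket name::

    >>> eu_bucket = conn.create_bucket('<your-unique-bucket-name>', location=Location.EU)
    >>> eu_bucket.get_location()   # location constraint string, e.g. 'EU'
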
| 98 Storing Data |
| 99 ------------
| 100 |
| 101 Once you have a bucket, presumably you will want to store some data |
| 102 in it. S3 doesn't care what kind of information you store in your objects |
| 103 or what format you use to store it. All you need is a key that is unique |
| 104 within your bucket. |
| 105 |
| 106 The Key object is used in boto to keep track of data stored in S3. To store |
| 107 new data in S3, start by creating a new Key object:: |
| 108 |
| 109 >>> from boto.s3.key import Key |
| 110 >>> k = Key(bucket) |
| 111 >>> k.key = 'foobar' |
| 112 >>> k.set_contents_from_string('This is a test of S3') |
| 113 |
| 114 The net effect of these statements is to create a new object in S3 with a |
| 115 key of "foobar" and a value of "This is a test of S3". To validate that |
| 116 this worked, quit out of the interpreter and start it up again. Then:: |
| 117 |
| 118 >>> import boto |
| 119 >>> c = boto.connect_s3() |
| 120 >>> b = c.create_bucket('mybucket') # substitute your bucket name here |
| 121 >>> from boto.s3.key import Key |
| 122 >>> k = Key(b) |
| 123 >>> k.key = 'foobar' |
| 124 >>> k.get_contents_as_string() |
| 125 'This is a test of S3' |
| 126 |
| 127 So, we can definitely store and retrieve strings. A more interesting |
| 128 example may be to store the contents of a local file in S3 and then retrieve |
| 129 the contents to another local file. |
| 130 |
| 131 :: |
| 132 |
| 133 >>> k = Key(b) |
| 134 >>> k.key = 'myfile' |
| 135 >>> k.set_contents_from_filename('foo.jpg') |
| 136 >>> k.get_contents_to_filename('bar.jpg') |
| 137 |
| 138 There are a couple of things to note about this. When you send data to |
| 139 S3 from a file or filename, boto will attempt to determine the correct |
| 140 MIME type for that file and send it as a Content-Type header. The boto
| 141 package uses the standard mimetypes package in Python to do the MIME type
| 142 guessing. The other thing to note is that boto does stream the content
| 143 to and from S3, so you should be able to send and receive large files without
| 144 any problem. |
| 145 |
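If the guessed Content-Type is not what you want, you can supply the header
yourself. A minimal sketch (the key name, file name and type used here are
just placeholders)::

    >>> k = Key(b)
    >>> k.key = 'mydata'
    >>> k.set_contents_from_filename('mydata.bin',
    ...     headers={'Content-Type': 'application/octet-stream'})
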
| 146 Accessing A Bucket |
| 147 ------------------ |
| 148 |
| 149 Once a bucket exists, you can access it with the get_bucket method. For example::
| 150 |
| 151 >>> mybucket = conn.get_bucket('mybucket') # Substitute in your bucket name |
| 152 >>> mybucket.list() |
| 153 <listing of keys in the bucket>
| 154 |
| 155 By default, this method tries to validate the bucket's existence. You can |
| 156 override this behavior by passing ``validate=False``::
| 157 |
| 158 >>> nonexistent = conn.get_bucket('i-dont-exist-at-all', validate=False) |
| 159 |
| 160 If the bucket does not exist, an ``S3ResponseError`` will be raised. If
| 161 you'd rather not deal with any exceptions, you can use the ``lookup`` method::
| 162 |
| 163 >>> nonexistent = conn.lookup('i-dont-exist-at-all') |
| 164 >>> if nonexistent is None: |
| 165 ... print "No such bucket!" |
| 166 ... |
| 167 No such bucket! |
| 168 |
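If you would rather catch the exception yourself, a minimal sketch (for a
missing bucket the status is typically 404, though other errors are possible)::

    >>> from boto.exception import S3ResponseError
    >>> try:
    ...     bucket = conn.get_bucket('i-dont-exist-at-all')
    ... except S3ResponseError, e:
    ...     print 'No such bucket (HTTP %s)' % e.status
    ...
    No such bucket (HTTP 404)
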
| 169 Deleting A Bucket |
| 170 ----------------- |
| 171 |
| 172 Removing a bucket can be done using the ``delete_bucket`` method. For example:: |
| 173 |
| 174 >>> conn.delete_bucket('mybucket') # Substitute in your bucket name |
| 175 |
| 176 The bucket must be empty of keys or this call will fail & an exception will be |
| 177 raised. You can remove a non-empty bucket by doing something like:: |
| 178 |
| 179 >>> full_bucket = conn.get_bucket('bucket-to-delete') |
| 180 # It's full of keys. Delete them all. |
| 181 >>> for key in full_bucket.list(): |
| 182 ... key.delete() |
| 183 ... |
| 184 # The bucket is empty now. Delete it. |
| 185 >>> conn.delete_bucket('bucket-to-delete') |
| 186 |
| 187 .. warning:: |
| 188 |
| 189 This method can cause data loss! Be very careful when using it. |
| 190 |
| 191 Additionally, be aware that using the above method for removing all keys |
| 192 and deleting the bucket involves a request for each key. As such, it's not |
| 193 particularly fast & is very chatty. |
| 194 |
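Newer versions of boto also expose S3's multi-object delete API via the
bucket's ``delete_keys`` method, which removes keys in batches (S3 accepts up
to 1,000 keys per request) and is much faster for large buckets. A sketch,
assuming your boto version provides ``delete_keys``::

    >>> full_bucket = conn.get_bucket('bucket-to-delete')
    >>> result = full_bucket.delete_keys([key.name for key in full_bucket.list()])
    >>> result.errors        # keys that could not be deleted, if any
    []
    >>> conn.delete_bucket('bucket-to-delete')
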
| 195 Listing All Available Buckets |
| 196 ----------------------------- |
| 197 In addition to accessing specific buckets via the create_bucket and get_bucket
| 198 methods, you can also get a list of all available buckets that you have created.
| 199 |
| 200 :: |
| 201 |
| 202 >>> rs = conn.get_all_buckets() |
| 203 |
| 204 This returns a ResultSet object (see the SQS Tutorial for more info on |
| 205 ResultSet objects). The ResultSet can be used as a sequence or list type |
| 206 object to retrieve Bucket objects. |
| 207 |
| 208 :: |
| 209 |
| 210 >>> len(rs) |
| 211 11 |
| 212 >>> for b in rs: |
| 213 ... print b.name |
| 214 ... |
| 215 <listing of available buckets> |
| 216 >>> b = rs[0] |
| 217 |
| 218 Setting / Getting the Access Control List for Buckets and Keys |
| 219 -------------------------------------------------------------- |
| 220 The S3 service provides the ability to control access to buckets and keys |
| 221 within S3 via the Access Control List (ACL) associated with each object in
| 222 S3. There are two ways to set the ACL for an object: |
| 223 |
| 224 1. Create a custom ACL that grants specific rights to specific users. At the |
| 225 moment, the users that are specified within grants have to be registered |
| 226 users of Amazon Web Services so this isn't as useful or as general as it |
| 227 could be. |
| 228 |
| 229 2. Use a "canned" access control policy. There are four canned policies |
| 230 defined: |
| 231 |
| 232 a. private: Owner gets FULL_CONTROL. No one else has any access rights. |
| 233 b. public-read: Owner gets FULL_CONTROL and the anonymous principal is granted READ access.
| 234 c. public-read-write: Owner gets FULL_CONTROL and the anonymous principal is granted READ and WRITE access.
| 235 d. authenticated-read: Owner gets FULL_CONTROL and any principal authenticated as a registered Amazon S3 user is granted READ access.
| 236 |
| 237 To set a canned ACL for a bucket, use the set_acl method of the Bucket object. |
| 238 The argument passed to this method must be one of the four permissible
| 239 canned policies named in the list CannedACLStrings contained in acl.py. |
| 240 For example, to make a bucket readable by anyone: |
| 241 |
| 242 >>> b.set_acl('public-read') |
| 243 |
| 244 You can also set the ACL for Key objects, either by passing an additional |
| 245 argument to the above method: |
| 246 |
| 247 >>> b.set_acl('public-read', 'foobar') |
| 248 |
| 249 where 'foobar' is the key of some object within the bucket b, or you can
| 250 call the set_acl method of the Key object: |
| 251 |
| 252 >>> k.set_acl('public-read') |
| 253 |
| 254 You can also retrieve the current ACL for a Bucket or Key object using the |
| 255 get_acl method. This method parses the AccessControlPolicy response sent
| 256 by S3 and creates a set of Python objects that represent the ACL. |
| 257 |
| 258 :: |
| 259 |
| 260 >>> acp = b.get_acl() |
| 261 >>> acp |
| 262 <boto.acl.Policy instance at 0x2e6940> |
| 263 >>> acp.acl |
| 264 <boto.acl.ACL instance at 0x2e69e0> |
| 265 >>> acp.acl.grants |
| 266 [<boto.acl.Grant instance at 0x2e6a08>] |
| 267 >>> for grant in acp.acl.grants: |
| 268 ... print grant.permission, grant.display_name, grant.email_address, grant.id
| 269 ... |
| 270 FULL_CONTROL <boto.user.User instance at 0x2e6a30> |
| 271 |
| 272 The Python objects representing the ACL can be found in the acl.py module |
| 273 of boto. |
| 274 |
| 275 Both the Bucket object and the Key object also provide shortcut |
| 276 methods to simplify the process of granting individuals specific |
| 277 access. For example, if you want to grant an individual user READ |
| 278 access to a particular object in S3 you could do the following:: |
| 279 |
| 280 >>> key = b.lookup('mykeytoshare') |
| 281 >>> key.add_email_grant('READ', 'foo@bar.com') |
| 282 |
| 283 The email address provided should be the one associated with the user's
| 284 AWS account. There is a similar method called add_user_grant that accepts the |
| 285 canonical id of the user rather than the email address. |
| 286 |
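For example, a minimal sketch of add_user_grant (the canonical user ID below
is a placeholder you would replace with a real one)::

    >>> key = b.lookup('mykeytoshare')
    >>> key.add_user_grant('READ', '<canonical-user-id>')
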
| 287 Setting/Getting Metadata Values on Key Objects |
| 288 ---------------------------------------------- |
| 289 S3 allows arbitrary user metadata to be assigned to objects within a bucket. |
| 290 To take advantage of this S3 feature, you should use the set_metadata and |
| 291 get_metadata methods of the Key object to set and retrieve metadata associated |
| 292 with an S3 object. For example:: |
| 293 |
| 294 >>> k = Key(b) |
| 295 >>> k.key = 'has_metadata' |
| 296 >>> k.set_metadata('meta1', 'This is the first metadata value') |
| 297 >>> k.set_metadata('meta2', 'This is the second metadata value') |
| 298 >>> k.set_contents_from_filename('foo.txt') |
| 299 |
| 300 This code associates two metadata key/value pairs with the Key k. To retrieve |
| 301 those values later:: |
| 302 |
| 303 >>> k = b.get_key('has_metadata') |
| 304 >>> k.get_metadata('meta1') |
| 305 'This is the first metadata value' |
| 306 >>> k.get_metadata('meta2') |
| 307 'This is the second metadata value' |
| 308 >>> |
| 309 |
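Note that user metadata is sent along with the upload, so it must be set
before you call one of the set_contents_* methods. S3 does not allow metadata
to be edited in place once the object exists; a common workaround is to copy
the object over itself with replacement metadata. A sketch of that idiom (the
keyword arguments assume a reasonably recent boto version)::

    >>> k = b.get_key('has_metadata')
    >>> k = k.copy(b.name, k.name, metadata={'meta1': 'A new value'},
    ...            preserve_acl=True)

When ``metadata`` is supplied, the copy replaces all of the object's user
metadata, so include every key/value pair you want to keep.
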
| 310 Setting/Getting/Deleting CORS Configuration on a Bucket |
| 311 ------------------------------------------------------- |
| 312 |
| 313 Cross-origin resource sharing (CORS) defines a way for client web |
| 314 applications that are loaded in one domain to interact with resources |
| 315 in a different domain. With CORS support in Amazon S3, you can build |
| 316 rich client-side web applications with Amazon S3 and selectively allow |
| 317 cross-origin access to your Amazon S3 resources. |
| 318 |
| 319 To create a CORS configuration and associate it with a bucket:: |
| 320 |
| 321 >>> from boto.s3.cors import CORSConfiguration |
| 322 >>> cors_cfg = CORSConfiguration() |
| 323 >>> cors_cfg.add_rule(['PUT', 'POST', 'DELETE'], 'https://www.example.com', allowed_header='*', max_age_seconds=3000, expose_header='x-amz-server-side-encryption')
| 324 >>> cors_cfg.add_rule('GET', '*') |
| 325 |
| 326 The above code creates a CORS configuration object with two rules. |
| 327 |
| 328 * The first rule allows cross-origin PUT, POST, and DELETE requests from |
| 329 the https://www.example.com/ origin. The rule also allows all headers |
| 330 in a preflight OPTIONS request through the Access-Control-Request-Headers
| 331 header. In response to any preflight OPTIONS request, Amazon S3 will |
| 332 return any requested headers. |
| 333 * The second rule allows cross-origin GET requests from all origins. |
| 334 |
| 335 To associate this configuration with a bucket:: |
| 336 |
| 337 >>> import boto |
| 338 >>> c = boto.connect_s3() |
| 339 >>> bucket = c.lookup('mybucket') |
| 340 >>> bucket.set_cors(cors_cfg) |
| 341 |
| 342 To retrieve the CORS configuration associated with a bucket:: |
| 343 |
| 344 >>> cors_cfg = bucket.get_cors() |
| 345 |
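If you want to inspect what came back, the configuration object can be
rendered back to XML as a quick sanity check::

    >>> print cors_cfg.to_xml()   # prints the raw <CORSConfiguration> document
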
| 346 And, finally, to delete all CORS configurations from a bucket:: |
| 347 |
| 348 >>> bucket.delete_cors() |
| 349 |
| 350 Transitioning Objects to Glacier |
| 351 -------------------------------- |
| 352 |
| 353 You can configure objects in S3 to transition to Glacier after a period of |
| 354 time. This is done using lifecycle policies. A lifecycle policy can also |
| 355 specify that an object should be deleted after a period of time. Lifecycle |
| 356 configurations are assigned to buckets and require these parameters: |
| 357 |
| 358 * The object prefix that identifies the objects you are targeting. |
| 359 * The action you want S3 to perform on the identified objects. |
| 360 * The date (or time period) when you want S3 to perform these actions. |
| 361 |
| 362 For example, given a bucket ``s3-glacier-boto-demo``, we can first retrieve the |
| 363 bucket:: |
| 364 |
| 365 >>> import boto |
| 366 >>> c = boto.connect_s3() |
| 367 >>> bucket = c.get_bucket('s3-glacier-boto-demo') |
| 368 |
| 369 Then we can create a lifecycle object. In our example, we want all objects |
| 370 under ``logs/*`` to transition to Glacier 30 days after the object is created. |
| 371 |
| 372 :: |
| 373 |
| 374 >>> from boto.s3.lifecycle import Lifecycle, Transition, Rule |
| 375 >>> to_glacier = Transition(days=30, storage_class='GLACIER') |
| 376 >>> rule = Rule('ruleid', 'logs/', 'Enabled', transition=to_glacier) |
| 377 >>> lifecycle = Lifecycle() |
| 378 >>> lifecycle.append(rule) |
| 379 |
| 380 .. note:: |
| 381 |
| 382 For API docs for the lifecycle objects, see :py:mod:`boto.s3.lifecycle` |
| 383 |
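As mentioned above, a lifecycle policy can also expire (delete) objects after
a period of time. For example, to additionally expire anything under a
hypothetical ``tmp/`` prefix 90 days after creation, you could append a second
rule to the same configuration::

    >>> from boto.s3.lifecycle import Expiration
    >>> expire_rule = Rule('expirerule', 'tmp/', 'Enabled',
    ...                    expiration=Expiration(days=90))
    >>> lifecycle.append(expire_rule)
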
| 384 We can now configure the bucket with this lifecycle policy:: |
| 385 |
| 386 >>> bucket.configure_lifecycle(lifecycle) |
| 387 True |
| 388 |
| 389 You can also retrieve the current lifecycle policy for the bucket:: |
| 390 |
| 391 >>> current = bucket.get_lifecycle_config() |
| 392 >>> print current[0].transition |
| 393 <Transition: in: 30 days, GLACIER> |
| 394 |
| 395 When an object transitions to Glacier, the storage class will be |
| 396 updated. This can be seen when you **list** the objects in a bucket:: |
| 397 |
| 398 >>> for key in bucket.list(): |
| 399 ... print key, key.storage_class |
| 400 ... |
| 401 <Key: s3-glacier-boto-demo,logs/testlog1.log> GLACIER |
| 402 |
| 403 You can also use the prefix argument to the ``bucket.list`` method:: |
| 404 |
| 405 >>> print list(bucket.list(prefix='logs/testlog1.log'))[0].storage_class
| 406 u'GLACIER' |
| 407 |
| 408 |
| 409 Restoring Objects from Glacier |
| 410 ------------------------------ |
| 411 |
| 412 Once an object has been transitioned to Glacier, you can restore the object |
| 413 back to S3. To do so, you can use the :py:meth:`boto.s3.key.Key.restore` |
| 414 method of the key object. |
| 415 The ``restore`` method takes an integer that specifies the number of days |
| 416 to keep the object in S3. |
| 417 |
| 418 :: |
| 419 |
| 420 >>> import boto |
| 421 >>> c = boto.connect_s3() |
| 422 >>> bucket = c.get_bucket('s3-glacier-boto-demo') |
| 423 >>> key = bucket.get_key('logs/testlog1.log') |
| 424 >>> key.restore(days=5) |
| 425 |
| 426 It takes about 4 hours for a restore operation to make a copy of the archive |
| 427 available for you to access. While the object is being restored, the |
| 428 ``ongoing_restore`` attribute will be set to ``True``:: |
| 429 |
| 431 >>> key = bucket.get_key('logs/testlog1.log') |
| 432 >>> print key.ongoing_restore |
| 433 True |
| 434 |
| 435 When the restore is finished, this value will be ``False`` and the expiry |
| 436 date of the object will no longer be ``None``::
| 437 |
| 438 >>> key = bucket.get_key('logs/testlog1.log') |
| 439 >>> print key.ongoing_restore |
| 440 False |
| 441 >>> print key.expiry_date |
| 442 "Fri, 21 Dec 2012 00:00:00 GMT" |
| 443 |
| 444 |
| 445 .. note:: If there is no restore operation either in progress or completed, |
| 446 the ``ongoing_restore`` attribute will be ``None``. |
| 447 |
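If you want to block until the restore completes, a simple polling sketch
(the sleep interval is arbitrary)::

    >>> import time
    >>> key = bucket.get_key('logs/testlog1.log')
    >>> while key.ongoing_restore:
    ...     time.sleep(60)                              # wait before checking again
    ...     key = bucket.get_key('logs/testlog1.log')   # re-fetch to refresh status
    ...
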
| 448 Once the object is restored you can then download the contents:: |
| 449 |
| 450 >>> key.get_contents_to_filename('testlog1.log') |