Chromium Code Reviews

Side by Side Diff: third_party/gsutil/boto/docs/source/s3_tut.rst

Issue 12042069: Scripts to download files from google storage based on sha1 sums (Closed) Base URL: https://chromium.googlesource.com/chromium/tools/depot_tools.git@master
Patch Set: Review fixes, updated gsutil Created 7 years, 10 months ago
1 .. _s3_tut:
2
3 ======================================
4 An Introduction to boto's S3 interface
5 ======================================
6
7 This tutorial focuses on the boto interface to the Simple Storage Service
8 from Amazon Web Services. This tutorial assumes that you have already
9 downloaded and installed boto.
10
11 Creating a Connection
12 ---------------------
13 The first step in accessing S3 is to create a connection to the service.
14 There are two ways to do this in boto. The first is:
15
16 >>> from boto.s3.connection import S3Connection
17 >>> conn = S3Connection('<aws access key>', '<aws secret key>')
18
19 At this point the variable conn will point to an S3Connection object. In
20 this example, the AWS access key and AWS secret key are passed in to the
21 constructor explicitly. Alternatively, you can set the environment variables:
22
23 * `AWS_ACCESS_KEY_ID` - Your AWS Access Key ID
24 * `AWS_SECRET_ACCESS_KEY` - Your AWS Secret Access Key
25
26 and then call the constructor without any arguments, like this:
27
28 >>> conn = S3Connection()
29
30 There is also a shortcut function in the boto package, called connect_s3
31 that may provide a slightly easier means of creating a connection::
32
33 >>> import boto
34 >>> conn = boto.connect_s3()
35
36 In either case, conn will point to an S3Connection object which we will
37 use throughout the remainder of this tutorial.
38
39 Creating a Bucket
40 -----------------
41
42 Once you have a connection established with S3, you will probably want to
43 create a bucket. A bucket is a container used to store key/value pairs
44 in S3. A bucket can hold an unlimited amount of data so you could potentially
45 have just one bucket in S3 for all of your information. Or, you could create
46 separate buckets for different types of data. You can figure all of that out
47 later; first, let's just create a bucket. That can be accomplished like this::
48
49 >>> bucket = conn.create_bucket('mybucket')
50 Traceback (most recent call last):
51 File "<stdin>", line 1, in ?
52 File "boto/connection.py", line 285, in create_bucket
53 raise S3CreateError(response.status, response.reason)
54 boto.exception.S3CreateError: S3Error[409]: Conflict
55
56 Whoa. What happened there? Well, the thing you have to know about
57 buckets is that they are kind of like domain names. It's one flat name
58 space that everyone who uses S3 shares. So, someone has already created
59 a bucket called "mybucket" in S3 and that means no one else can grab that
60 bucket name. So, you have to come up with a name that hasn't been taken yet.
61 For example, something that uses a unique string as a prefix. Your
62 AWS_ACCESS_KEY (NOT YOUR SECRET KEY!) could work but I'll leave it to
63 your imagination to come up with something. I'll just assume that you
64 found an acceptable name.
65
66 The create_bucket method will create the requested bucket if it does not
67 exist or will return the existing bucket if it does exist.
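
Putting this together, here is a minimal sketch of creating a bucket with a
(hopefully) unique name. The prefix below is purely hypothetical; substitute
your own unique string::

    >>> unique_name = 'jsmith-tutorial-bucket'  # hypothetical; pick your own prefix
    >>> bucket = conn.create_bucket(unique_name)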
68
69 Creating a Bucket In Another Location
70 -------------------------------------
71
72 The example above assumes that you want to create a bucket in the
73 standard US region. However, it is possible to create buckets in
74 other locations. To do so, first import the Location object from the
75 boto.s3.connection module, like this::
76
77 >>> from boto.s3.connection import Location
78 >>> print '\n'.join(i for i in dir(Location) if i[0].isupper())
79 APNortheast
80 APSoutheast
81 APSoutheast2
82 DEFAULT
83 EU
84 SAEast
85 USWest
86 USWest2
87
88 As you can see, the Location object defines a number of possible locations. By
89 default, the location is the empty string, which is interpreted as the US
90 Classic Region, the original S3 region. However, by specifying another
91 location at the time the bucket is created, you can instruct S3 to create the
92 bucket in that location. For example::
93
94 >>> conn.create_bucket('mybucket', location=Location.EU)
95
96 will create the bucket in the EU region (assuming the name is available).
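
If you want to confirm where a bucket ended up, the Bucket object's
``get_location`` method returns the bucket's location constraint (the empty
string for the US Classic Region). A short sketch, again assuming the name is
available::

    >>> eu_bucket = conn.create_bucket('mybucket', location=Location.EU)
    >>> print eu_bucket.get_location()
    EU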
97
98 Storing Data
99 ------------
100
101 Once you have a bucket, presumably you will want to store some data
102 in it. S3 doesn't care what kind of information you store in your objects
103 or what format you use to store it. All you need is a key that is unique
104 within your bucket.
105
106 The Key object is used in boto to keep track of data stored in S3. To store
107 new data in S3, start by creating a new Key object::
108
109 >>> from boto.s3.key import Key
110 >>> k = Key(bucket)
111 >>> k.key = 'foobar'
112 >>> k.set_contents_from_string('This is a test of S3')
113
114 The net effect of these statements is to create a new object in S3 with a
115 key of "foobar" and a value of "This is a test of S3". To validate that
116 this worked, quit out of the interpreter and start it up again. Then::
117
118 >>> import boto
119 >>> c = boto.connect_s3()
120 >>> b = c.create_bucket('mybucket') # substitute your bucket name here
121 >>> from boto.s3.key import Key
122 >>> k = Key(b)
123 >>> k.key = 'foobar'
124 >>> k.get_contents_as_string()
125 'This is a test of S3'
126
127 So, we can definitely store and retrieve strings. A more interesting
128 example may be to store the contents of a local file in S3 and then retrieve
129 the contents to another local file.
130
131 ::
132
133 >>> k = Key(b)
134 >>> k.key = 'myfile'
135 >>> k.set_contents_from_filename('foo.jpg')
136 >>> k.get_contents_to_filename('bar.jpg')
137
138 There are a couple of things to note about this. When you send data to
139 S3 from a file or filename, boto will attempt to determine the correct
140 mime type for that file and send it as a Content-Type header. The boto
141 package uses the standard mimetypes package in Python to do the mime type
142 guessing. The other thing to note is that boto does stream the content
143 to and from S3 so you should be able to send and receive large files without
144 any problem.
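
If the guessed mime type is not what you want, the ``set_contents_from_*``
methods also accept a ``headers`` dictionary, so you can supply the
Content-Type yourself. A small sketch (the filename and key name here are just
examples)::

    >>> k = Key(b)
    >>> k.key = 'mydata'
    >>> k.set_contents_from_filename('data.bin',
    ...     headers={'Content-Type': 'application/octet-stream'})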
145
146 Accessing A Bucket
147 ------------------
148
149 Once a bucket exists, you can access it with the get_bucket method. For example::
150
151 >>> mybucket = conn.get_bucket('mybucket') # Substitute in your bucket name
152 >>> mybucket.list()
153 <listing of keys in the bucket>
154
155 By default, this method tries to validate the bucket's existence. You can
156 override this behavior by passing ``validate=False``::
157
158 >>> nonexistent = conn.get_bucket('i-dont-exist-at-all', validate=False)
159
160 If the bucket does not exist, an ``S3ResponseError`` will typically be raised. If
161 you'd rather not deal with any exceptions, you can use the ``lookup`` method::
162
163 >>> nonexistent = conn.lookup('i-dont-exist-at-all')
164 >>> if nonexistent is None:
165 ... print "No such bucket!"
166 ...
167 No such bucket!
168
169 Deleting A Bucket
170 -----------------
171
172 Removing a bucket can be done using the ``delete_bucket`` method. For example::
173
174 >>> conn.delete_bucket('mybucket') # Substitute in your bucket name
175
176 The bucket must be empty of keys or this call will fail & an exception will be
177 raised. You can remove a non-empty bucket by doing something like::
178
179 >>> full_bucket = conn.get_bucket('bucket-to-delete')
180 # It's full of keys. Delete them all.
181 >>> for key in full_bucket.list():
182 ... key.delete()
183 ...
184 # The bucket is empty now. Delete it.
185 >>> conn.delete_bucket('bucket-to-delete')
186
187 .. warning::
188
189 This method can cause data loss! Be very careful when using it.
190
191 Additionally, be aware that using the above method for removing all keys
192 and deleting the bucket involves a request for each key. As such, it's not
193 particularly fast & is very chatty.
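
If your version of boto exposes the Multi-Object Delete API, the bucket's
``delete_keys`` method can remove keys in batches rather than one request per
key, which is considerably faster. A sketch, assuming that method is available
(the returned result object reports which keys were deleted and any errors)::

    >>> full_bucket = conn.get_bucket('bucket-to-delete')
    >>> result = full_bucket.delete_keys([key.name for key in full_bucket.list()])
    >>> result.errors
    []
    >>> conn.delete_bucket('bucket-to-delete')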
194
195 Listing All Available Buckets
196 -----------------------------
197 In addition to accessing specific buckets via the create_bucket method,
198 you can also get a list of all available buckets that you have created.
199
200 ::
201
202 >>> rs = conn.get_all_buckets()
203
204 This returns a ResultSet object (see the SQS Tutorial for more info on
205 ResultSet objects). The ResultSet can be used as a sequence or list type
206 object to retrieve Bucket objects.
207
208 ::
209
210 >>> len(rs)
211 11
212 >>> for b in rs:
213 ... print b.name
214 ...
215 <listing of available buckets>
216 >>> b = rs[0]
217
218 Setting / Getting the Access Control List for Buckets and Keys
219 --------------------------------------------------------------
220 The S3 service provides the ability to control access to buckets and keys
221 within S3 via the Access Control List (ACL) associated with each object in
222 S3. There are two ways to set the ACL for an object:
223
224 1. Create a custom ACL that grants specific rights to specific users. At the
225 moment, the users that are specified within grants have to be registered
226 users of Amazon Web Services so this isn't as useful or as general as it
227 could be.
228
229 2. Use a "canned" access control policy. There are four canned policies
230 defined:
231
232 a. private: Owner gets FULL_CONTROL. No one else has any access rights.
233 b. public-read: Owner gets FULL_CONTROL and the anonymous principal is granted READ access.
234 c. public-read-write: Owner gets FULL_CONTROL and the anonymous principal is granted READ and WRITE access.
235 d. authenticated-read: Owner gets FULL_CONTROL and any principal authenticated as a registered Amazon S3 user is granted READ access.
236
237 To set a canned ACL for a bucket, use the set_acl method of the Bucket object.
238 The argument passed to this method must be one of the four permissible
239 canned policies named in the list CannedACLStrings contained in acl.py.
240 For example, to make a bucket readable by anyone:
241
242 >>> b.set_acl('public-read')
243
244 You can also set the ACL for Key objects, either by passing an additional
245 argument to the above method:
246
247 >>> b.set_acl('public-read', 'foobar')
248
249 where 'foobar' is the key of some object within the bucket b, or you can
250 call the set_acl method of the Key object:
251
252 >>> k.set_acl('public-read')
253
254 You can also retrieve the current ACL for a Bucket or Key object using the
255 get_acl method. This method parses the AccessControlPolicy response sent
256 by S3 and creates a set of Python objects that represent the ACL.
257
258 ::
259
260 >>> acp = b.get_acl()
261 >>> acp
262 <boto.acl.Policy instance at 0x2e6940>
263 >>> acp.acl
264 <boto.acl.ACL instance at 0x2e69e0>
265 >>> acp.acl.grants
266 [<boto.acl.Grant instance at 0x2e6a08>]
267 >>> for grant in acp.acl.grants:
268 ... print grant.permission, grant.display_name, grant.email_address, grant.id
269 ...
270 FULL_CONTROL <boto.user.User instance at 0x2e6a30>
271
272 The Python objects representing the ACL can be found in the acl.py module
273 of boto.
274
275 Both the Bucket object and the Key object also provide shortcut
276 methods to simplify the process of granting individuals specific
277 access. For example, if you want to grant an individual user READ
278 access to a particular object in S3 you could do the following::
279
280 >>> key = b.lookup('mykeytoshare')
281 >>> key.add_email_grant('READ', 'foo@bar.com')
282
283 The email address provided should be the one associated with the user's
284 AWS account. There is a similar method called add_user_grant that accepts the
285 canonical id of the user rather than the email address.
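
For example, a sketch of granting READ access by canonical user id rather than
by email address (the id shown is a placeholder, not a real account)::

    >>> key = b.lookup('mykeytoshare')
    >>> key.add_user_grant('READ', '<canonical-user-id>')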
286
287 Setting/Getting Metadata Values on Key Objects
288 ----------------------------------------------
289 S3 allows arbitrary user metadata to be assigned to objects within a bucket.
290 To take advantage of this S3 feature, you should use the set_metadata and
291 get_metadata methods of the Key object to set and retrieve metadata associated
292 with an S3 object. For example::
293
294 >>> k = Key(b)
295 >>> k.key = 'has_metadata'
296 >>> k.set_metadata('meta1', 'This is the first metadata value')
297 >>> k.set_metadata('meta2', 'This is the second metadata value')
298 >>> k.set_contents_from_filename('foo.txt')
299
300 This code associates two metadata key/value pairs with the Key k. To retrieve
301 those values later::
302
303 >>> k = b.get_key('has_metadata')
304 >>> k.get_metadata('meta1')
305 'This is the first metadata value'
306 >>> k.get_metadata('meta2')
307 'This is the second metadata value'
308 >>>
309
310 Setting/Getting/Deleting CORS Configuration on a Bucket
311 -------------------------------------------------------
312
313 Cross-origin resource sharing (CORS) defines a way for client web
314 applications that are loaded in one domain to interact with resources
315 in a different domain. With CORS support in Amazon S3, you can build
316 rich client-side web applications with Amazon S3 and selectively allow
317 cross-origin access to your Amazon S3 resources.
318
319 To create a CORS configuration and associate it with a bucket::
320
321 >>> from boto.s3.cors import CORSConfiguration
322 >>> cors_cfg = CORSConfiguration()
323 >>> cors_cfg.add_rule(['PUT', 'POST', 'DELETE'], 'https://www.example.com', allowed_header='*', max_age_seconds=3000, expose_header='x-amz-server-side-encryption')
324 >>> cors_cfg.add_rule('GET', '*')
325
326 The above code creates a CORS configuration object with two rules.
327
328 * The first rule allows cross-origin PUT, POST, and DELETE requests from
329 the https://www.example.com origin. The rule also allows all headers
330 in a preflight OPTIONS request through the Access-Control-Request-Headers
331 header. In response to any preflight OPTIONS request, Amazon S3 will
332 return any requested headers.
333 * The second rule allows cross-origin GET requests from all origins.
334
335 To associate this configuration with a bucket::
336
337 >>> import boto
338 >>> c = boto.connect_s3()
339 >>> bucket = c.lookup('mybucket')
340 >>> bucket.set_cors(cors_cfg)
341
342 To retrieve the CORS configuration associated with a bucket::
343
344 >>> cors_cfg = bucket.get_cors()
345
346 And, finally, to delete all CORS configurations from a bucket::
347
348 >>> bucket.delete_cors()
349
350 Transitioning Objects to Glacier
351 --------------------------------
352
353 You can configure objects in S3 to transition to Glacier after a period of
354 time. This is done using lifecycle policies. A lifecycle policy can also
355 specify that an object should be deleted after a period of time. Lifecycle
356 configurations are assigned to buckets and require these parameters:
357
358 * The object prefix that identifies the objects you are targeting.
359 * The action you want S3 to perform on the identified objects.
360 * The date (or time period) when you want S3 to perform these actions.
361
362 For example, given a bucket ``s3-glacier-boto-demo``, we can first retrieve the
363 bucket::
364
365 >>> import boto
366 >>> c = boto.connect_s3()
367 >>> bucket = c.get_bucket('s3-glacier-boto-demo')
368
369 Then we can create a lifecycle object. In our example, we want all objects
370 under ``logs/*`` to transition to Glacier 30 days after the object is created.
371
372 ::
373
374 >>> from boto.s3.lifecycle import Lifecycle, Transition, Rule
375 >>> to_glacier = Transition(days=30, storage_class='GLACIER')
376 >>> rule = Rule('ruleid', 'logs/', 'Enabled', transition=to_glacier)
377 >>> lifecycle = Lifecycle()
378 >>> lifecycle.append(rule)
379
380 .. note::
381
382 For API docs for the lifecycle objects, see :py:mod:`boto.s3.lifecycle`
383
384 We can now configure the bucket with this lifecycle policy::
385
386 >>> bucket.configure_lifecycle(lifecycle)
387 True
388
389 You can also retrieve the current lifecycle policy for the bucket::
390
391 >>> current = bucket.get_lifecycle_config()
392 >>> print current[0].transition
393 <Transition: in: 30 days, GLACIER>
394
395 When an object transitions to Glacier, the storage class will be
396 updated. This can be seen when you **list** the objects in a bucket::
397
398 >>> for key in bucket.list():
399 ... print key, key.storage_class
400 ...
401 <Key: s3-glacier-boto-demo,logs/testlog1.log> GLACIER
402
403 You can also use the prefix argument to the ``bucket.list`` method::
404
405 >>> print list(bucket.list(prefix='logs/testlog1.log'))[0].storage_class
406 u'GLACIER'
407
408
409 Restoring Objects from Glacier
410 ------------------------------
411
412 Once an object has been transitioned to Glacier, you can restore the object
413 back to S3. To do so, you can use the :py:meth:`boto.s3.key.Key.restore`
414 method of the key object.
415 The ``restore`` method takes an integer that specifies the number of days
416 to keep the object in S3.
417
418 ::
419
420 >>> import boto
421 >>> c = boto.connect_s3()
422 >>> bucket = c.get_bucket('s3-glacier-boto-demo')
423 >>> key = bucket.get_key('logs/testlog1.log')
424 >>> key.restore(days=5)
425
426 It takes about 4 hours for a restore operation to make a copy of the archive
427 available for you to access. While the object is being restored, the
428 ``ongoing_restore`` attribute will be set to ``True``::
429
430
431 >>> key = bucket.get_key('logs/testlog1.log')
432 >>> print key.ongoing_restore
433 True
434
435 When the restore is finished, this value will be ``False`` and the expiry
436 date of the object will no longer be ``None``::
437
438 >>> key = bucket.get_key('logs/testlog1.log')
439 >>> print key.ongoing_restore
440 False
441 >>> print key.expiry_date
442 "Fri, 21 Dec 2012 00:00:00 GMT"
443
444
445 .. note:: If there is no restore operation either in progress or completed,
446 the ``ongoing_restore`` attribute will be ``None``.
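
Putting those three states together, a small, purely illustrative sketch of
inspecting a key before downloading it (it only combines the attributes shown
above)::

    >>> key = bucket.get_key('logs/testlog1.log')
    >>> if key.ongoing_restore is None:
    ...     print "No restore has been requested"
    ... elif key.ongoing_restore:
    ...     print "Restore still in progress"
    ... else:
    ...     print "Restored until", key.expiry_date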
447
448 Once the object is restored you can then download the contents::
449
450 >>> key.get_contents_to_filename('testlog1.log')