|
|
Created:
7 years, 11 months ago by Ryan Tseng Modified:
7 years, 3 months ago CC:
chromium-reviews, Dirk Pranke, cmp+cc_chromium.org, M-A Ruel, iannucci+depot_tools_chromium.org, szager1, kjellander_chromium Base URL:
https://chromium.googlesource.com/chromium/tools/depot_tools.git@master Visibility:
Public. |
DescriptionScripts to download files from google storage based on sha1 sums
continuation of: https://codereview.chromium.org/11664024
Moved it from chrome/trunk/src/build to depot_tools/
BUG=153360
TEST=two unittests included in tests/
For end-to-end testing, check out a large directory. Run
find . -name .svn -prune -o -size +1000k -type f -print0 | upload_to_google_storage.py -b chrome-artifacts -0 -
(replacing chrome-artifacts with an upload-able bucket)
to test upload
run "find . -name .svn -prune -o -size +1000k -type f -print0 | xargs -0 rm" to remove the files uploaded. Check that the large binary files have been removed
run "download_from_google_storage.py -r -d -b chrome-artifacts ." to download the files again.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=187951
Patch Set 1 #Patch Set 2 : #Patch Set 3 : Added gsutil #Patch Set 4 : bug fixes #Patch Set 5 : Removed gsutil/tests and gsutil/docs #
Total comments: 65
Patch Set 6 : Review fixes, updated gsutil #
Total comments: 33
Patch Set 7 : Removed gsutil into a separate patch #Patch Set 8 : Review fixes #Patch Set 9 : Added some unittests #
Total comments: 23
Patch Set 10 : Review fixes #Patch Set 11 : Removed wrappers #Patch Set 12 : Test fix #
Total comments: 16
Patch Set 13 : Review fixes #
Total comments: 5
Patch Set 14 : Removed gstools.py, added more error messages #
Total comments: 6
Patch Set 15 : Review changes, fixed race condition #
Total comments: 7
Patch Set 16 : Small review fixes #Patch Set 17 : Move printing from main thread to printer thread #
Total comments: 12
Patch Set 18 : Added exception types, renamed variables #
Total comments: 21
Patch Set 19 : Review fixes #
Total comments: 6
Patch Set 20 : More review fixes #
Total comments: 20
Patch Set 21 : Review fixes #
Total comments: 2
Patch Set 22 : Split tests, fixed ret_code seeding #
Total comments: 9
Patch Set 23 : Bug fix in gsutil wrapper #Patch Set 24 : Updated messages, switched main thread to use stdout_queue #
Total comments: 22
Patch Set 25 : Review fixes, mostly in unittests #
Total comments: 12
Patch Set 26 : Style fixes #Patch Set 27 : #
Total comments: 2
Messages
Total messages: 44 (0 generated)
This includes all of gsutil except for docs/ and tests/ inside third_party/ I put this in depot_tools so that it can be used both in chrome/ and chrome-internal/ Reimplementing gsutil would take another long while and would be buggier, and I'd like to get this in sooner than later so we can start migrating large binary blobs to google storage.
https://codereview.chromium.org/12042069/diff/20105/common.py File common.py (right): https://codereview.chromium.org/12042069/diff/20105/common.py#newcode4 common.py:4: # common.py: Utility functions common to the Google storage scripts. docstring https://codereview.chromium.org/12042069/diff/20105/common.py#newcode7 common.py:7: import hashlib sort https://codereview.chromium.org/12042069/diff/20105/common.py#newcode11 common.py:11: 2 lines https://codereview.chromium.org/12042069/diff/20105/common.py#newcode13 common.py:13: """A convenience class to call gsutil with some predefined settings.""" """Calls gsutil ... https://codereview.chromium.org/12042069/diff/20105/common.py#newcode18 common.py:18: Remove empty line https://codereview.chromium.org/12042069/diff/20105/common.py#newcode69 common.py:69: def CheckSHA1(sha1_sum, filename): I don't think this function is super useful. https://codereview.chromium.org/12042069/diff/20105/common.py#newcode73 common.py:73: def GetMD5(filename, lock, use_md5): Why is this function so different/asymetric than GetSHA1? https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:6: """Script to download files from Google Storage.""" """Downloads .. We know it's a script. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:19: 2 lines https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:20: # Assuming depot_tools/third_party/gsutil exists. Remove comment, it's fine. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:31: out_q.put('Thread %d is done' % thread_num) That's a race condition. You should enqueue None when done and have the workers quit on None items. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:53: # Main logic of the script goes here. Remove. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:140: parser.add_option('-o', '--output', default=None, Same comments than on the other file. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:6: """Script to upload files to Google Storage.""" """Uploads files to Google Storage.""" https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:21: GSUTIL_DEFAULT_PATH = os.path.join(os.path.dirname(os.path.normpath(__file__)), s/normpath/abspath/ But you can't commit this as-is, unless you move this file into a subdirectory. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:41: Remove one line https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:53: remote_md5 = etag_match.groups()[0] remote_md5 = etag_match.group(1) https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:67: 2 lines https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:70: parser.add_option('-b', '--bucket', default='chrome-artifacts', I prefer no default, at least not if this file is going to live in depot_tools https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:72: parser.add_option('-e', '--boto', default=None, No need for default=None https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:74: parser.add_option('-f', '--force', action='store_true', default=False, No need for default=False, same below. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:76: parser.add_option('-g', '--gsutil_path', default=GSUTIL_DEFAULT_PATH, Why this argument at all if gsutil is included in depot_tools? https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:83: parser.add_option('-s', '--skip_hashing', action='store_true', default=False, Why not the default? Same for --use_md5. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:96: input_filenames = [line for line in sys.stdin.read().split('\0')] input_filenames = sys.stdin.read().split('\0') https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:98: input_filenames = [line.strip() for line in sys.stdin.readlines()] Technically, you would want to have it work with a file ending it a whitespace, you'd want input_files = sys.stdin.read().splitlines() https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:125: # We want to hash everything in a single thread since its faster. I don't understand why it'd be faster since it's CPU bound. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:139: with open(filename + '.sha1', 'w') as f: 'wb' https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:148: # We only want one MD5 calculation happening at a time. Why? https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:153: t = threading.Thread(target=_upload_worker, args=[thread_num, Don't split arguments like that https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:167: two lines
https://codereview.chromium.org/12042069/diff/20105/common.py File common.py (right): https://codereview.chromium.org/12042069/diff/20105/common.py#newcode4 common.py:4: # common.py: Utility functions common to the Google storage scripts. On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > docstring Done. https://codereview.chromium.org/12042069/diff/20105/common.py#newcode7 common.py:7: import hashlib On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > sort Done. https://codereview.chromium.org/12042069/diff/20105/common.py#newcode11 common.py:11: On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > 2 lines Done. https://codereview.chromium.org/12042069/diff/20105/common.py#newcode13 common.py:13: """A convenience class to call gsutil with some predefined settings.""" On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > """Calls gsutil ... Done. https://codereview.chromium.org/12042069/diff/20105/common.py#newcode18 common.py:18: On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Remove empty line Done. https://codereview.chromium.org/12042069/diff/20105/common.py#newcode69 common.py:69: def CheckSHA1(sha1_sum, filename): On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > I don't think this function is super useful. Removed. https://codereview.chromium.org/12042069/diff/20105/common.py#newcode73 common.py:73: def GetMD5(filename, lock, use_md5): On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Why is this function so different/asymetric than GetSHA1? There is an option to cache the md5 sum into a file, whereas GetSHA1 doesn't deal with caching. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:6: """Script to download files from Google Storage.""" On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > """Downloads .. > > We know it's a script. Done. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:19: On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > 2 lines Done. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:20: # Assuming depot_tools/third_party/gsutil exists. On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Remove comment, it's fine. Done. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:31: out_q.put('Thread %d is done' % thread_num) On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > That's a race condition. You should enqueue None when done and have the workers > quit on None items. Done. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:53: # Main logic of the script goes here. On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Remove. Done. https://codereview.chromium.org/12042069/diff/20105/download_from_google_stor... download_from_google_storage.py:140: parser.add_option('-o', '--output', default=None, On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Same comments than on the other file. Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:6: """Script to upload files to Google Storage.""" On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > """Uploads files to Google Storage.""" Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:21: GSUTIL_DEFAULT_PATH = os.path.join(os.path.dirname(os.path.normpath(__file__)), On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > s/normpath/abspath/ > > But you can't commit this as-is, unless you move this file into a subdirectory. Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:41: On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Remove one line Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:53: remote_md5 = etag_match.groups()[0] On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > remote_md5 = etag_match.group(1) Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:67: On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > 2 lines Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:70: parser.add_option('-b', '--bucket', default='chrome-artifacts', On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > I prefer no default, at least not if this file is going to live in depot_tools Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:72: parser.add_option('-e', '--boto', default=None, On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > No need for default=None Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:74: parser.add_option('-f', '--force', action='store_true', default=False, On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > No need for default=False, same below. Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:76: parser.add_option('-g', '--gsutil_path', default=GSUTIL_DEFAULT_PATH, Removed. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:83: parser.add_option('-s', '--skip_hashing', action='store_true', default=False, On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Why not the default? Same for --use_md5. I'm avoiding the situation where you modify the file, but forget to remove the .sha1 file. This way the default always ensure that the .sha1 file is correct. The flag should only be turned on if the user is confident that the .sha1 files accurate reflect the latest changes. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:96: input_filenames = [line for line in sys.stdin.read().split('\0')] On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > input_filenames = sys.stdin.read().split('\0') Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:98: input_filenames = [line.strip() for line in sys.stdin.readlines()] On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Technically, you would want to have it work with a file ending it a whitespace, > you'd want > input_files = sys.stdin.read().splitlines() Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:125: # We want to hash everything in a single thread since its faster. On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > I don't understand why it'd be faster since it's CPU bound. We are most definitely IO bound at harddrive read speed when calculating MD5 sums. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:139: with open(filename + '.sha1', 'w') as f: On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > 'wb' Done. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:148: # We only want one MD5 calculation happening at a time. On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Why? Harddrive IO bound. The harddrive read head jitters if we try to read two files at a time. I found that reading two files at a time doesn't actually increase read speed. In fact running 10 threads is significantly slower than 1 thread at this stage. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:153: t = threading.Thread(target=_upload_worker, args=[thread_num, On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > Don't split arguments like that ??? What did I do? https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:167: On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > two lines Done.
Can you split this CL in two? - First CL: all the third_party. What's the exact size BTW? - Second CL: the helper scripts. From what I understand, they'll be used in the CI? If so, they need unit tests. https://codereview.chromium.org/12042069/diff/20105/common.py File common.py (right): https://codereview.chromium.org/12042069/diff/20105/common.py#newcode73 common.py:73: def GetMD5(filename, lock, use_md5): On 2013/02/22 02:38:00, Ryan T. wrote: > On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > > Why is this function so different/asymetric than GetSHA1? > > There is an option to cache the md5 sum into a file, whereas GetSHA1 doesn't > deal with caching. Then wrap the caching logic into a separate function. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:125: # We want to hash everything in a single thread since its faster. On 2013/02/22 02:38:00, Ryan T. wrote: > On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > > I don't understand why it'd be faster since it's CPU bound. > > We are most definitely IO bound at harddrive read speed when calculating MD5 > sums. Err right, sorry. https://codereview.chromium.org/12042069/diff/20105/upload_to_google_storage.... upload_to_google_storage.py:153: t = threading.Thread(target=_upload_worker, args=[thread_num, On 2013/02/22 02:38:00, Ryan T. wrote: > On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > > Don't split arguments like that > > ??? What did I do? t = threading.Thread( target=_upload_worker, args=[ thread_num, upload_queue, base_url, gsutil.clone(), options, md5_lock]) or as you prefer, but don't split an argument on two list without first putting it first on the line. https://codereview.chromium.org/12042069/diff/11143/common.py File common.py (right): https://codereview.chromium.org/12042069/diff/11143/common.py#newcode10 common.py:10: import subprocess2 It's not a stdlib, so put it in a subgroup after, e.g. import hashlib import os import re import sys import subprocess2 class Gsutil(object): ... https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... download_from_google_storage.py:87: dirs.remove(item) You can't mutate a list while enumerating it. Did you test the code? if not options.recursive: for item in dirs[:]: dirs.remove(item) else: for exclude in ('.svn', '.git'): if exclude in dirs: dirs.remove(exclude) https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... download_from_google_storage.py:91: with open(full_path) as f: with open(full_path, 'rb') as f: https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... download_from_google_storage.py:94: work_queue.put((sha1_match.groups(1)[0], work_queue.put( (sha1_match.groups(1)[0], full_path.replace('.sha1', ''))) otherwise it's hard to glance it there's one or 2 arguments. Likewise using number written as word and numbers in the same sentence is harder to read quickly. :) https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... download_from_google_storage.py:106: output_queue = Queue.Queue() # For printing out to stdio. optional nit: Then name it stdout_queue? It'll be slightly more self-explanative and the comment will not be needed.. https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... download_from_google_storage.py:108: t = threading.Thread(target=_downloader_worker_thread, args=[thread_num, don't split args. https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... download_from_google_storage.py:113: work_queue.put((None, None)) # Used to tell worker threads to stop. Technically, you should start the threads first, then iterate, then send the (None, None) items to stop the threads. It would reduce the latency significantly. https://codereview.chromium.org/12042069/diff/11143/download_from_google_stor... download_from_google_storage.py:165: parser.add_option('-g', '--gsutil_path', default=GSUTIL_DEFAULT_PATH, Either align all at +4 or all at (. I prefer +4 but tolerate at ( because everybody disagree with me. https://codereview.chromium.org/12042069/diff/11143/gsdl File gsdl (right): https://codereview.chromium.org/12042069/diff/11143/gsdl#newcode8 gsdl:8: PYTHONDONTWRITEBYTECODE=1 exec python "$base_dir/download_from_google_storage.py" "$@" Are these needed? The question is: does devs will use them at the command line? If not, it's probably better to not include these wrappers. https://codereview.chromium.org/12042069/diff/11143/third_party/gsutil/README File third_party/gsutil/README (right): https://codereview.chromium.org/12042069/diff/11143/third_party/gsutil/README... third_party/gsutil/README:1: This directory contains the Python command line tool gsutil, which Google Add README.chromium with the standard template. You can find it in src/third_party. https://codereview.chromium.org/12042069/diff/11143/third_party/gsutil/boto/t... File third_party/gsutil/boto/tests/__init__.py (right): https://codereview.chromium.org/12042069/diff/11143/third_party/gsutil/boto/t... third_party/gsutil/boto/tests/__init__.py:1: # Copyright (c) 2006-2011 Mitch Garnaat http://garnaat.org/ Don't include tests. Note this fact in the README.chromium file. https://codereview.chromium.org/12042069/diff/11143/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/11143/upload_to_google_storage.... upload_to_google_storage.py:16: from common import Gsutil Replace with: import common https://codereview.chromium.org/12042069/diff/11143/upload_to_google_storage.... upload_to_google_storage.py:47: if gsutil.check_call('ls', file_url)[0] == 0 and not options.force: I don't see gsutil being defined anywhere, did you test this code? https://codereview.chromium.org/12042069/diff/11143/upload_to_google_storage.... upload_to_google_storage.py:77: parser.add_option('-m', '--use_md5', action='store_true', default=False, Remove default=False everywhere, it's unnecessary. https://codereview.chromium.org/12042069/diff/11143/upload_to_google_storage.... upload_to_google_storage.py:90: if len(args) < 1: if not args: https://codereview.chromium.org/12042069/diff/11143/upload_to_google_storage.... upload_to_google_storage.py:113: # Check if we have permissions to the Google Storage bucket. Can you split the rest of this code into its separate function? I'm a big fan of splitting argument processing from actual "main processing". https://codereview.chromium.org/12042069/diff/11143/upload_to_google_storage.... upload_to_google_storage.py:155: t = threading.Thread(target=_upload_worker, args=[thread_num, Argument alignement Start the threads before enqueuing.
+iannucci and szager Robbie, can you take a look at Ryan's CL?
One general question before I start... Why have two scripts for interacting with google storage instead of one (with a parameter)?
On 2013/02/26 18:03:33, iannucci (DO NOT WANT) wrote: > One general question before I start... Why have two scripts for interacting with > google storage instead of one (with a parameter)? Also, common.py seems like a really generic module name to have in the depot_tools root. My guess is that it'll become a dumping ground for 'stuff', which I'd like to avoid. Why not just have it all in one script called gstool or something?
Regarding the two scripts - Its because one is meant to be called by a hook (downloading) while the other one is meant to be called by the developer (uploading). It seems less ambiguous about whether you're uploading or downloading this way. But mostly because it made sense at the time, since that way in the deps file you'd write ["download_from_google_storage.py", "blah......", "options...."] which is a lot more self-explanatory than ["gstool", "download", "blah..."] common.py has been renamed to gstools.py A few unittests have been added. https://chromiumcodereview.appspot.com/12042069/diff/20105/common.py File common.py (right): https://chromiumcodereview.appspot.com/12042069/diff/20105/common.py#newcode73 common.py:73: def GetMD5(filename, lock, use_md5): On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > On 2013/02/22 02:38:00, Ryan T. wrote: > > On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > > > Why is this function so different/asymetric than GetSHA1? > > > > There is an option to cache the md5 sum into a file, whereas GetSHA1 doesn't > > deal with caching. > > Then wrap the caching logic into a separate function. Done. https://chromiumcodereview.appspot.com/12042069/diff/20105/upload_to_google_s... File upload_to_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/20105/upload_to_google_s... upload_to_google_storage.py:153: t = threading.Thread(target=_upload_worker, args=[thread_num, On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > On 2013/02/22 02:38:00, Ryan T. wrote: > > On 2013/02/22 01:15:56, Marc-Antoine Ruel wrote: > > > Don't split arguments like that > > > > ??? What did I do? > > > t = threading.Thread( > target=_upload_worker, > args=[ > thread_num, upload_queue, base_url, > gsutil.clone(), options, md5_lock]) > > or as you prefer, but don't split an argument on two list without first putting > it first on the line. Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/common.py File common.py (right): https://chromiumcodereview.appspot.com/12042069/diff/11143/common.py#newcode10 common.py:10: import subprocess2 On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > It's not a stdlib, so put it in a subgroup after, e.g. > > import hashlib > import os > import re > import sys > > import subprocess2 > > > class Gsutil(object): > ... Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... download_from_google_storage.py:87: dirs.remove(item) On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > You can't mutate a list while enumerating it. Did you test the code? > > if not options.recursive: > for item in dirs[:]: > dirs.remove(item) > else: > for exclude in ('.svn', '.git'): > if exclude in dirs: > dirs.remove(exclude) Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... download_from_google_storage.py:91: with open(full_path) as f: On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > with open(full_path, 'rb') as f: Done, but why binary? I'm expecting an ascii file. https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... download_from_google_storage.py:94: work_queue.put((sha1_match.groups(1)[0], On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > work_queue.put( > (sha1_match.groups(1)[0], full_path.replace('.sha1', ''))) > > otherwise it's hard to glance it there's one or 2 arguments. > > Likewise using number written as word and numbers in the same sentence is harder > to read quickly. :) Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... download_from_google_storage.py:106: output_queue = Queue.Queue() # For printing out to stdio. On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > optional nit: Then name it stdout_queue? It'll be slightly more self-explanative > and the comment will not be needed.. Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... download_from_google_storage.py:108: t = threading.Thread(target=_downloader_worker_thread, args=[thread_num, On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > don't split args. Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... download_from_google_storage.py:113: work_queue.put((None, None)) # Used to tell worker threads to stop. On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > Technically, you should start the threads first, then iterate, then send the > (None, None) items to stop the threads. It would reduce the latency > significantly. Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/download_from_goog... download_from_google_storage.py:165: parser.add_option('-g', '--gsutil_path', default=GSUTIL_DEFAULT_PATH, On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > Either align all at +4 or all at (. I prefer +4 but tolerate at ( because > everybody disagree with me. Oops this should've been removed anyways. https://chromiumcodereview.appspot.com/12042069/diff/11143/gsdl File gsdl (right): https://chromiumcodereview.appspot.com/12042069/diff/11143/gsdl#newcode8 gsdl:8: PYTHONDONTWRITEBYTECODE=1 exec python "$base_dir/download_from_google_storage.py" "$@" On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > Are these needed? The question is: does devs will use them at the command line? > If not, it's probably better to not include these wrappers. I added these in per xusydoc@'s suggestion. download_from_google_storage is mean to be used within a DEPS file or in submodules as a post checkout hook. upload_to_google_storage is mean to be called by the dev manually. I could leave gsul in and remove gsdl, but then it'll feel lopsided. https://chromiumcodereview.appspot.com/12042069/diff/11143/upload_to_google_s... File upload_to_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/11143/upload_to_google_s... upload_to_google_storage.py:16: from common import Gsutil On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > Replace with: > import common Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/upload_to_google_s... upload_to_google_storage.py:47: if gsutil.check_call('ls', file_url)[0] == 0 and not options.force: On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > I don't see gsutil being defined anywhere, did you test this code? The gsutil object is initialized in main() and passed into the worker thread as an argument. https://chromiumcodereview.appspot.com/12042069/diff/11143/upload_to_google_s... upload_to_google_storage.py:77: parser.add_option('-m', '--use_md5', action='store_true', default=False, On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > Remove default=False everywhere, it's unnecessary. Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/upload_to_google_s... upload_to_google_storage.py:90: if len(args) < 1: On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > if not args: Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/upload_to_google_s... upload_to_google_storage.py:113: # Check if we have permissions to the Google Storage bucket. On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > Can you split the rest of this code into its separate function? I'm a big fan of > splitting argument processing from actual "main processing". Done. https://chromiumcodereview.appspot.com/12042069/diff/11143/upload_to_google_s... upload_to_google_storage.py:155: t = threading.Thread(target=_upload_worker, args=[thread_num, On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > Argument alignement > Start the threads before enqueuing. Done.
https://chromiumcodereview.appspot.com/12042069/diff/11143/gsdl File gsdl (right): https://chromiumcodereview.appspot.com/12042069/diff/11143/gsdl#newcode8 gsdl:8: PYTHONDONTWRITEBYTECODE=1 exec python "$base_dir/download_from_google_storage.py" "$@" On 2013/02/27 02:06:56, Ryan T. wrote: > On 2013/02/25 15:15:06, Marc-Antoine Ruel wrote: > > Are these needed? The question is: does devs will use them at the command > line? > > If not, it's probably better to not include these wrappers. > > I added these in per xusydoc@'s suggestion. > download_from_google_storage is mean to be used within a DEPS file or in > submodules as a post checkout hook. > upload_to_google_storage is mean to be called by the dev manually. > > I could leave gsul in and remove gsdl, but then it'll feel lopsided. Then I prefer neither. I don't think it's the kind of tool used enough to warrant a wrapper. Wrappers can be added later if wanted. https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... download_from_google_storage.py:40: sha1_match = re.search('^([A-Za-z0-9]{40})$', f.read(1024)) s/search/match/ ? You may want to .rstrip() what you read(). https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... download_from_google_storage.py:92: work_queue.put((None, None)) # Used to tell worker threads to stop. You have to do this after enqueuing the items. I don't think this code was ever run. https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... download_from_google_storage.py:100: for t in all_threads: Use t.join() Do not busy loop. https://chromiumcodereview.appspot.com/12042069/diff/12249/gstools.py File gstools.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/gstools.py#newcode57 gstools.py:57: 2 lines https://chromiumcodereview.appspot.com/12042069/diff/12249/tests/gstools_unit... File tests/gstools_unittest.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/tests/gstools_unit... tests/gstools_unittest.py:26: 2 lines https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... File upload_to_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:43: if filename is None: if not filename: break optional style nit: Personally, I wouldn't keep the comment. https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:62: code = gsutil.call('cp', '-q', filename, file_url) Does this library throws exceptions when receiving an HTTP 500, TCP RST, etc? https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:71: elif len(args) == 1 and args[0] == '-': s/elif/if/ and add an empty line above. It's clearer. https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:74: input_filenames = sys.stdin.read().split('\0') return sys.stdin.read().split('\0') https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:76: input_filenames = sys.stdin.read().splitlines() return https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:77: else: return args remove the "else:" line, not needed.
Wrappers removed https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... download_from_google_storage.py:40: sha1_match = re.search('^([A-Za-z0-9]{40})$', f.read(1024)) On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > s/search/match/ ? > You may want to .rstrip() what you read(). Done. https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... download_from_google_storage.py:92: work_queue.put((None, None)) # Used to tell worker threads to stop. On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > You have to do this after enqueuing the items. I don't think this code was ever > run. Doh, moved down. Added another test. https://chromiumcodereview.appspot.com/12042069/diff/12249/download_from_goog... download_from_google_storage.py:100: for t in all_threads: On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > Use t.join() Do not busy loop. It shouldn't' busy loop. It should block on the get() line in 105. I can't join otherwise I wouldn't be able to receive stdout lines from the stdout_queue. https://chromiumcodereview.appspot.com/12042069/diff/12249/gstools.py File gstools.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/gstools.py#newcode57 gstools.py:57: On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > 2 lines Done. https://chromiumcodereview.appspot.com/12042069/diff/12249/tests/gstools_unit... File tests/gstools_unittest.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/tests/gstools_unit... tests/gstools_unittest.py:26: On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > 2 lines Done. https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... File upload_to_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:43: if filename is None: On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > if not filename: > break > > optional style nit: Personally, I wouldn't keep the comment. Done. https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:62: code = gsutil.call('cp', '-q', filename, file_url) On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > Does this library throws exceptions when receiving an HTTP 500, TCP RST, etc? It doesn't throw an exception, it just returns a non-zero code. It can be made to throw exceptions for non-whitelisted errors though, is that preferable? https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:71: elif len(args) == 1 and args[0] == '-': On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > s/elif/if/ > and add an empty line above. It's clearer. Done. https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:74: input_filenames = sys.stdin.read().split('\0') On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > return sys.stdin.read().split('\0') Done. https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:76: input_filenames = sys.stdin.read().splitlines() On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > return Done. https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:77: else: On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > return args > remove the "else:" line, not needed. Done.
https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... File upload_to_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/12249/upload_to_google_s... upload_to_google_storage.py:62: code = gsutil.call('cp', '-q', filename, file_url) On 2013/02/27 23:34:41, Ryan T. wrote: > On 2013/02/27 21:52:07, Marc-Antoine Ruel wrote: > > Does this library throws exceptions when receiving an HTTP 500, TCP RST, etc? > > It doesn't throw an exception, it just returns a non-zero code. > It can be made to throw exceptions for non-whitelisted errors though, is that > preferable? No, my point was more that occasionally, exceptions like IOError can leak through a library even if they try to handle "most" of them. So it's fine to leave it as-is and look for issues in practice. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:25: def enumerate_work_queue(input_filename, work_queue, options): Please pass directory, recursive, output as 3 separate arguments. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:26: work_queue_size = 0 if not directory: work_queue.put((input_filename, options.output)) return 1 then the rest of the code. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:46: print >> sys.stderr, 'No sha1 sum found in %s.' % filename Is this an error or something to safely ignore? https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:61: out_q.put('File %s exists and SHA1 sum (%s) matches. Skipping.' % ( out_q.put( 'File ... https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:78: def download_from_google_storage(input_filename, base_url, gsutil, options): Accept each option individually instead. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:99: while True: while not any(t.is_alive() for t in all_threads) and not stdout_queue.empty(): print stdout_queue.get() https://chromiumcodereview.appspot.com/12042069/diff/38002/upload_to_google_s... File upload_to_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/38002/upload_to_google_s... upload_to_google_storage.py:64: print >> sys.stderr, gsutil.stderr This won't affect this process' exit code? https://chromiumcodereview.appspot.com/12042069/diff/38002/upload_to_google_s... upload_to_google_storage.py:111: sha1_sum = gstools.GetSHA1(filename) So you may end up doing both a md5 and sha1 calculation simultaneously? Note that "uploading" a file or "calculating hash" are both using a lot of I/O. The issue at stake here is that uploading is usually _more bound_ to network I/O, but in some case the ratio isn't obvious, some VMs (especially cloud compute engine) could get a clear 1gbit of network I/O so it wouldn't much network bound in practice. I don't know the best way to optimize that automatically, I'm just noting this fact.
https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:25: def enumerate_work_queue(input_filename, work_queue, options): On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > Please pass directory, recursive, output as 3 separate arguments. Done. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:26: work_queue_size = 0 On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > if not directory: > work_queue.put((input_filename, options.output)) > return 1 > > then the rest of the code. Done. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:46: print >> sys.stderr, 'No sha1 sum found in %s.' % filename On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > Is this an error or something to safely ignore? Hm. This should probably throw an error and stop the program. But I can see people wanting to ignore this error. I'll add a flag. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:61: out_q.put('File %s exists and SHA1 sum (%s) matches. Skipping.' % ( On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > out_q.put( > 'File ... Done. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:78: def download_from_google_storage(input_filename, base_url, gsutil, options): On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > Accept each option individually instead. Done. https://chromiumcodereview.appspot.com/12042069/diff/38002/download_from_goog... download_from_google_storage.py:99: while True: On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > while not any(t.is_alive() for t in all_threads) and not stdout_queue.empty(): > print stdout_queue.get() Done. That's a useful keyword :O Added some stuff because there was a race condition where the "Thread finished" gets queued, printed, and the looped continued, checked that the thread was alive, blocked on get(), and then the last thread died after that. https://chromiumcodereview.appspot.com/12042069/diff/38002/upload_to_google_s... File upload_to_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/38002/upload_to_google_s... upload_to_google_storage.py:64: print >> sys.stderr, gsutil.stderr On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > This won't affect this process' exit code? It probably should. Updated. https://chromiumcodereview.appspot.com/12042069/diff/38002/upload_to_google_s... upload_to_google_storage.py:111: sha1_sum = gstools.GetSHA1(filename) On 2013/02/28 14:53:56, Marc-Antoine Ruel wrote: > So you may end up doing both a md5 and sha1 calculation simultaneously? > > Note that "uploading" a file or "calculating hash" are both using a lot of I/O. > The issue at stake here is that uploading is usually _more bound_ to network > I/O, but in some case the ratio isn't obvious, some VMs (especially cloud > compute engine) could get a clear 1gbit of network I/O so it wouldn't much > network bound in practice. > > I don't know the best way to optimize that automatically, I'm just noting this > fact. gsutil doesn't maximize network transfer bandwidth for a single file for some reason, which is the main reason this is multi-threaded. So we don't get the full benefit of the 1Gbps unless we open like 10 connections :)
https://chromiumcodereview.appspot.com/12042069/diff/43002/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/43002/download_from_goog... download_from_google_storage.py:28: with open(input_filename, 'rb') as f: is it fine if it throws an exception when the file is not present or not readable? https://chromiumcodereview.appspot.com/12042069/diff/43002/download_from_goog... download_from_google_storage.py:30: if sha1_match: It's awkward that you silently ignore invalid .sha1 files. https://chromiumcodereview.appspot.com/12042069/diff/43002/gstools.py File gstools.py (right): https://chromiumcodereview.appspot.com/12042069/diff/43002/gstools.py#newcode1 gstools.py:1: # Copyright (c) 2012 The Chromium Authors. All rights reserved. Do you mind putting the content of this file in one of two command scripts? This is to reduce the clutter in depot_tools.
gstools.py has been merged back into download_from_google_storage https://chromiumcodereview.appspot.com/12042069/diff/43002/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/43002/download_from_goog... download_from_google_storage.py:30: if sha1_match: On 2013/03/03 02:13:14, Marc-Antoine Ruel wrote: > It's awkward that you silently ignore invalid .sha1 files. Errors added. Ignorable using the -i flag. https://chromiumcodereview.appspot.com/12042069/diff/43002/gstools.py File gstools.py (right): https://chromiumcodereview.appspot.com/12042069/diff/43002/gstools.py#newcode1 gstools.py:1: # Copyright (c) 2012 The Chromium Authors. All rights reserved. On 2013/03/03 02:13:14, Marc-Antoine Ruel wrote: > Do you mind putting the content of this file in one of two command scripts? This > is to reduce the clutter in depot_tools. Moved
https://chromiumcodereview.appspot.com/12042069/diff/37015/download_from_goog... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/37015/download_from_goog... download_from_google_storage.py:102: def GetMD5(filename, lock): This functions is not used in this file, please move to upload. https://chromiumcodereview.appspot.com/12042069/diff/37015/download_from_goog... download_from_google_storage.py:114: def GetMD5Cached(filename, lock): This function is not used in this file, please move to upload https://chromiumcodereview.appspot.com/12042069/diff/37015/download_from_goog... download_from_google_storage.py:229: while not work_queue.empty() or any(t.is_alive() for t in all_threads): There's a race condition in there; - The last thread is exiting but t.is_alive() is still true. - Nothing is going to be queued. This will result in a hang at line 230.
I think switching it to "and" will solve the race condition. What do you think? https://codereview.chromium.org/12042069/diff/37015/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/37015/download_from_google_stor... download_from_google_storage.py:102: def GetMD5(filename, lock): On 2013/03/05 02:04:08, Marc-Antoine Ruel (Google) wrote: > This functions is not used in this file, please move to upload. Done. https://codereview.chromium.org/12042069/diff/37015/download_from_google_stor... download_from_google_storage.py:114: def GetMD5Cached(filename, lock): On 2013/03/05 02:04:08, Marc-Antoine Ruel (Google) wrote: > This function is not used in this file, please move to upload Done. https://codereview.chromium.org/12042069/diff/37015/download_from_google_stor... download_from_google_storage.py:229: while not work_queue.empty() or any(t.is_alive() for t in all_threads): On 2013/03/05 02:04:08, Marc-Antoine Ruel (Google) wrote: > There's a race condition in there; > - The last thread is exiting but t.is_alive() is still true. > - Nothing is going to be queued. > > This will result in a hang at line 230. changed or -> and * If the queue not empty, then there will be one or more threads that will queue into stdout_queue. Meaning 230 will not hang since there will guaranteed to be one or more items in queue, or to be queued later. * If the queue is empty, we exit this loop to avoid any race conditions between queueing stdouts and thread exits.
https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... download_from_google_storage.py:77: # Check if we have permissions to the Google Storage bucket. You can remove this comment. https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... download_from_google_storage.py:202: while not work_queue.empty() and any(t.is_alive() for t in all_threads): There's a possibility that the code would get there and no entry would have been appended to work_queue yet. It's possible on a single-cpu system with a single large file. Then this check would immediately fall through line 204 and could possibly miss items. This need to be reworked to be race-condition free. https://codereview.chromium.org/12042069/diff/48001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/48001/upload_to_google_storage.... upload_to_google_storage.py:43: def GetMD5(filename, lock): I don't think it's useful to pass 'lock' to this function, you should wrap the calls instead, it'd make more sense.
https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... download_from_google_storage.py:77: # Check if we have permissions to the Google Storage bucket. On 2013/03/06 19:49:59, Marc-Antoine Ruel wrote: > You can remove this comment. Done. https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... download_from_google_storage.py:202: while not work_queue.empty() and any(t.is_alive() for t in all_threads): On 2013/03/06 19:49:59, Marc-Antoine Ruel wrote: > There's a possibility that the code would get there and no entry would have been > appended to work_queue yet. It's possible on a single-cpu system with a single > large file. Then this check would immediately fall through line 204 and could > possibly miss items. This need to be reworked to be race-condition free. isn't appending to a Queue() synchronous? If we pass lines 195 and 199 can't we guarantee that the work queue isn't empty? https://codereview.chromium.org/12042069/diff/48001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/48001/upload_to_google_storage.... upload_to_google_storage.py:43: def GetMD5(filename, lock): On 2013/03/06 19:49:59, Marc-Antoine Ruel wrote: > I don't think it's useful to pass 'lock' to this function, you should wrap the > calls instead, it'd make more sense. Done.
https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/48001/download_from_google_stor... download_from_google_storage.py:202: while not work_queue.empty() and any(t.is_alive() for t in all_threads): On 2013/03/06 20:19:08, Ryan T. wrote: > On 2013/03/06 19:49:59, Marc-Antoine Ruel wrote: > > There's a possibility that the code would get there and no entry would have > been > > appended to work_queue yet. It's possible on a single-cpu system with a single > > large file. Then this check would immediately fall through line 204 and could > > possibly miss items. This need to be reworked to be race-condition free. > > isn't appending to a Queue() synchronous? If we pass lines 195 and 199 can't we > guarantee that the work queue isn't empty? oh nevermind I see what you're saying.
Moved printing to different thread, how does that look?
https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:71: def CheckBucketPermissions(bucket, gsutil): Since it's new code; either use CamelCaps for functions or underscore_separator. I prefer the later for new code. https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:109: raise Exception('%s not found.' % input_filename) Use a proper exception type. You can create a new class for that. https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:116: print >> sys.stderr, 'No sha1 sum found in %s.' % input_filename You want to print after the raise statement so that code that raises doesn't print; it's the code handling the exception that should print. https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:290: raise NotImplementedError('Unreachable state.') parser.error(). It's not useful to raise from main(). https://codereview.chromium.org/12042069/diff/59002/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/59002/upload_to_google_storage.... upload_to_google_storage.py:143: hash_timer = time.time() # For timing statistics. Remove the comment on that line. hashing_start https://codereview.chromium.org/12042069/diff/59002/upload_to_google_storage.... upload_to_google_storage.py:158: hash_time = time.time() - hash_timer hashing_duration
https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:71: def CheckBucketPermissions(bucket, gsutil): On 2013/03/07 16:22:46, Marc-Antoine Ruel wrote: > Since it's new code; either use CamelCaps for functions or underscore_separator. > I prefer the later for new code. Done. https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:109: raise Exception('%s not found.' % input_filename) On 2013/03/07 16:22:46, Marc-Antoine Ruel wrote: > Use a proper exception type. You can create a new class for that. Done. https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:116: print >> sys.stderr, 'No sha1 sum found in %s.' % input_filename On 2013/03/07 16:22:46, Marc-Antoine Ruel wrote: > You want to print after the raise statement so that code that raises doesn't > print; it's the code handling the exception that should print. Done. https://codereview.chromium.org/12042069/diff/59002/download_from_google_stor... download_from_google_storage.py:290: raise NotImplementedError('Unreachable state.') On 2013/03/07 16:22:46, Marc-Antoine Ruel wrote: > parser.error(). It's not useful to raise from main(). Done. https://codereview.chromium.org/12042069/diff/59002/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/59002/upload_to_google_storage.... upload_to_google_storage.py:143: hash_timer = time.time() # For timing statistics. On 2013/03/07 16:22:46, Marc-Antoine Ruel wrote: > Remove the comment on that line. > hashing_start Done. https://codereview.chromium.org/12042069/diff/59002/upload_to_google_storage.... upload_to_google_storage.py:158: hash_time = time.time() - hash_timer On 2013/03/07 16:22:46, Marc-Antoine Ruel wrote: > hashing_duration Done.
https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:162: out_q.put('Thread %d is done' % thread_num) I'd prefer you to prefix all the messages with '%d>' % thread_num and then use a constant formatting, e.g. the file name then the message, so something like: '%d> %s: file existed' % (thread_num, output_filename) I don't think it's useful to print when a thread is done, the user doesn't care. https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:167: 'File %s exists and SHA1 sum (%s) matches. Skipping.' % ( I don't think it's useful to print the hash, it's a 40 characters string.. https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:178: code, _, err = gsutil.check_call('cp', '-q', file_url, output_filename) Will it fail if the file was already present? https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:187: # Its pausible we want to print empty lines. plausible https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:217: 3 lines -> 1 line https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:219: # Wait for all downloads to finish. You should start this thread because starting to enumerate. https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:55: def get_md5_cached(filename): BTW, I'm fine with using a lock here if you prefer, so that the lock is only used when the md5 is calculated. An option is to merge this function back into upload_worker. https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:80: _, out, _ = gsutil.check_call('ls', '-L', file_url) If the error is 403, it will still try uploading? Do you think it could happen? I don't think it's a big deal, I just want to confirm. https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:178: for ret_code, message in ret_codes.queue: That works? https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:183: if not max_ret_code: I don't think this block is useful, the error messages were printing at line 181.
https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:162: out_q.put('Thread %d is done' % thread_num) On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > I'd prefer you to prefix all the messages with > '%d>' % thread_num > and then use a constant formatting, e.g. the file name then the message, so > something like: > '%d> %s: file existed' % (thread_num, output_filename) > > I don't think it's useful to print when a thread is done, the user doesn't care. Removed then https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:167: 'File %s exists and SHA1 sum (%s) matches. Skipping.' % ( On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > I don't think it's useful to print the hash, it's a 40 characters string.. Done, an error code is queued in the return code queue so it fails. https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:178: code, _, err = gsutil.check_call('cp', '-q', file_url, output_filename) On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > Will it fail if the file was already present? It won't fail, "gsutil cp" will just overwrite the file. If the file is already present and correct, line 167 would skip the file. https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:187: # Its pausible we want to print empty lines. On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > plausible Done. https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:217: On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > 3 lines -> 1 line Done. https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:219: # Wait for all downloads to finish. On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > You should start this thread because starting to enumerate. Done. https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:55: def get_md5_cached(filename): On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > BTW, I'm fine with using a lock here if you prefer, so that the lock is only > used when the md5 is calculated. An option is to merge this function back into > upload_worker. Either's okay. This actually looks cleaner, and also prevents the (minor/almost non-existant) issue of thrashing caused by reading md5 hash files while calculating the md5 of another file. https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:80: _, out, _ = gsutil.check_call('ls', '-L', file_url) On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > If the error is 403, it will still try uploading? Do you think it could happen? > I don't think it's a big deal, I just want to confirm. Hm.. if it 403s, it'll still try to upload, and fail on line 96 when it actually does the upload. Gsutil doesn't actually seem have a good way to check for permissions before uploading. The permission checks before are really just a quick sanity check to see if we can even see the bucket. It should be okay since cp will fail anyways. https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:178: for ret_code, message in ret_codes.queue: On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > That works? ret_codes.queue returns an iterable queue. I believe it works. Wrote a test and confirmed it. https://codereview.chromium.org/12042069/diff/62001/upload_to_google_storage.... upload_to_google_storage.py:183: if not max_ret_code: On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > I don't think this block is useful, the error messages were printing at line > 181. Doen: Removed the else clause, and made success sound more exciting.
https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/62001/download_from_google_stor... download_from_google_storage.py:178: code, _, err = gsutil.check_call('cp', '-q', file_url, output_filename) On 2013/03/07 20:35:18, Ryan T. wrote: > On 2013/03/07 19:41:22, Marc-Antoine Ruel wrote: > > Will it fail if the file was already present? > > It won't fail, "gsutil cp" will just overwrite the file. > > If the file is already present and correct, line 167 would skip the file. Perfect, I just wanted to make sure you asserted that. https://codereview.chromium.org/12042069/diff/63013/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/63013/download_from_google_stor... download_from_google_storage.py:46: if self.boto_path is not None: if self.boto_path: https://codereview.chromium.org/12042069/diff/63013/download_from_google_stor... download_from_google_storage.py:54: if self.boto_path is not None: if self.boto_path: https://codereview.chromium.org/12042069/diff/63013/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/63013/upload_to_google_storage.... upload_to_google_storage.py:151: upload_queue.put((filename, open('%s.sha1' % filename).read())) with open(filename + '.sha1', 'rb') as f: upload_queue.put((filename, f.read()) otherwise the handles could accumulate. Also, you never validate the content of the .sha1 file unlike how you do with .md5 files.
https://codereview.chromium.org/12042069/diff/63013/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/63013/download_from_google_stor... download_from_google_storage.py:46: if self.boto_path is not None: On 2013/03/07 22:26:56, Marc-Antoine Ruel wrote: > if self.boto_path: Done. https://codereview.chromium.org/12042069/diff/63013/download_from_google_stor... download_from_google_storage.py:54: if self.boto_path is not None: On 2013/03/07 22:26:56, Marc-Antoine Ruel wrote: > if self.boto_path: Done. https://codereview.chromium.org/12042069/diff/63013/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/63013/upload_to_google_storage.... upload_to_google_storage.py:151: upload_queue.put((filename, open('%s.sha1' % filename).read())) On 2013/03/07 22:26:56, Marc-Antoine Ruel wrote: > with open(filename + '.sha1', 'rb') as f: > upload_queue.put((filename, f.read()) > > otherwise the handles could accumulate. > > Also, you never validate the content of the .sha1 file unlike how you do with > .md5 files. Done. Also theres sha1 calculations on line 154. It only skips validation if --skip_hashing is passed in. If the skip_hashing flag is used, its up to the user to ensure that the binary files to upload are valid with the .sha1 hashes, otherwise they could screw up the CAS bucket. Tradeoff being they don't have to wait for the sha1s to finish calculating. The other way is to remove the skip_hashing flag, and always have it validate.
https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:67: elif ('You are attempting to access protected data with ' s/elif/if/ in all these since each is returning. So no need for the else: at 72. https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:242: return max_ret_code You understand that the function will always return >= 1 so this program will always act like it failed? https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:288: elif options.recursive and not options.directory: s/elif/if/ since the if conditions exit. https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:292: else: else is not needed. https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:309: raise parser.error('Unreachable state.') raise? https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:318: if os.path.exists(GSUTIL_DEFAULT_PATH): if not os.path... parser.error() https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:59: with open('%s.md5' % filename) as f: I prefer 'rb' for consistency with the other calls https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:71: thread_num, q, base_url, gsutil, md5_lock, force, s/q/upload_queue/ https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:152: upload_queue.put((filename, f.read())) you should limit the read size so the script doesn't end up reading a 1mb .sha1 file by error. Not that it'll likely happen but still. And by "verification", I mean ensuring it's 40 hex characters. https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:165: printer_thread = threading.Thread(target=printer_worker, args=[stdout_queue]) Start it earlier.
https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:67: elif ('You are attempting to access protected data with ' On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > s/elif/if/ in all these since each is returning. So no need for the else: at 72. Done. https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:242: return max_ret_code On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > You understand that the function will always return >= 1 so this program will > always act like it failed? Only if a non-zero return code has been queue? This should only happen when "gsutil ls gs://file/I/want" or "gsutil cp" fails. Or am I missing something here? https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:288: elif options.recursive and not options.directory: On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > s/elif/if/ since the if conditions exit. Done. https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:292: else: On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > else is not needed. Done. https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:309: raise parser.error('Unreachable state.') On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > raise? Done, derp. https://codereview.chromium.org/12042069/diff/76001/download_from_google_stor... download_from_google_storage.py:318: if os.path.exists(GSUTIL_DEFAULT_PATH): On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > if not os.path... > parser.error() Done. https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:59: with open('%s.md5' % filename) as f: On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > I prefer 'rb' for consistency with the other calls Done. https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:71: thread_num, q, base_url, gsutil, md5_lock, force, On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > s/q/upload_queue/ Done. https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:152: upload_queue.put((filename, f.read())) On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > you should limit the read size so the script doesn't end up reading a 1mb .sha1 > file by error. Not that it'll likely happen but still. > > And by "verification", I mean ensuring it's 40 hex characters. Oops, yes it should definitely verify that its a 40 char hex string. It should also throw an error and exit since something is almost certainly wrong. https://codereview.chromium.org/12042069/diff/76001/upload_to_google_storage.... upload_to_google_storage.py:165: printer_thread = threading.Thread(target=printer_worker, args=[stdout_queue]) On 2013/03/08 02:07:00, Marc-Antoine Ruel wrote: > Start it earlier. Done.
https://codereview.chromium.org/12042069/diff/52002/tests/gstools_unittest.py File tests/gstools_unittest.py (right): https://codereview.chromium.org/12042069/diff/52002/tests/gstools_unittest.py... tests/gstools_unittest.py:392: download_from_google_storage.download_from_google_storage( You don't check the return code here. Split this file in two, one for each script.
https://codereview.chromium.org/12042069/diff/52002/tests/gstools_unittest.py File tests/gstools_unittest.py (right): https://codereview.chromium.org/12042069/diff/52002/tests/gstools_unittest.py... tests/gstools_unittest.py:392: download_from_google_storage.download_from_google_storage( On 2013/03/08 13:27:41, Marc-Antoine Ruel wrote: > You don't check the return code here. > > Split this file in two, one for each script. Done.
https://codereview.chromium.org/12042069/diff/88001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/88001/download_from_google_stor... download_from_google_storage.py:74: def clone(self): Technically, you don't need that. You can use an object across threads in python if you use it in a read-only way, e.g. you never assign to a member of self. So you can: - remove this function. - add a note in the docstring that this object is immutable. - remove clone in the mock. FYI only since I don't think it's a good idea to use it here, Gsutil is simple enough to not need any special enforcement. This is possible to enforce that an object is immutable, I had written a class to enforce that: http://git.chromium.org/gitweb/?p=chromium/tools/commit-queue.git;a=blob;f=mo... https://codereview.chromium.org/12042069/diff/88001/download_from_google_stor... download_from_google_storage.py:200: download_timer = time.time() download_start https://codereview.chromium.org/12042069/diff/88001/download_from_google_stor... download_from_google_storage.py:204: ret_codes.put((0, None)) Good! https://codereview.chromium.org/12042069/diff/88001/tests/download_from_googl... File tests/download_from_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/88001/tests/download_from_googl... tests/download_from_google_storage_unittests.py:45: return self.expected.pop(0)[0] You need a lock for all the function; you could get two call() calls simultaneously, and both poping from self.expected. self.expected is not thread safe itself so an outher lock is needed. https://codereview.chromium.org/12042069/diff/88001/tests/upload_to_google_st... File tests/upload_to_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/88001/tests/upload_to_google_st... tests/upload_to_google_storage_unittests.py:43: upload_to_google_storage.upload_to_google_storage( Check the return value of the function at all the calls.
https://codereview.chromium.org/12042069/diff/88001/download_from_google_stor... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/88001/download_from_google_stor... download_from_google_storage.py:74: def clone(self): On 2013/03/09 12:41:13, Marc-Antoine Ruel wrote: > Technically, you don't need that. You can use an object across threads in python > if you use it in a read-only way, e.g. you never assign to a member of self. So > you can: > - remove this function. > - add a note in the docstring that this object is immutable. > - remove clone in the mock. > > FYI only since I don't think it's a good idea to use it here, Gsutil is simple > enough to not need any special enforcement. This is possible to enforce that an > object is immutable, I had written a class to enforce that: > http://git.chromium.org/gitweb/?p=chromium/tools/commit-queue.git;a=blob;f=mo... > Done. https://codereview.chromium.org/12042069/diff/88001/download_from_google_stor... download_from_google_storage.py:200: download_timer = time.time() On 2013/03/09 12:41:13, Marc-Antoine Ruel wrote: > download_start Done. https://codereview.chromium.org/12042069/diff/88001/tests/download_from_googl... File tests/download_from_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/88001/tests/download_from_googl... tests/download_from_google_storage_unittests.py:45: return self.expected.pop(0)[0] On 2013/03/09 12:41:13, Marc-Antoine Ruel wrote: > You need a lock for all the function; you could get two call() calls > simultaneously, and both poping from self.expected. self.expected is not thread > safe itself so an outher lock is needed. Done, moved lock to call() function https://codereview.chromium.org/12042069/diff/88001/tests/upload_to_google_st... File tests/upload_to_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/88001/tests/upload_to_google_st... tests/upload_to_google_storage_unittests.py:43: upload_to_google_storage.upload_to_google_storage( On 2013/03/09 12:41:13, Marc-Antoine Ruel wrote: > Check the return value of the function at all the calls. Done.
Bug fix in check_call, updated messages to be more terse (doesn't print out md5 or sha1 hashes unless theres an error), main thread needs to use stdout_queue now because worker threads are started before file enumeration. Updated usage string.
https://codereview.chromium.org/12042069/diff/102001/download_from_google_sto... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/102001/download_from_google_sto... download_from_google_storage.py:53: def check_call(self, *args): BTW, this is slightly confusing since the return value is different from subprocess.check_call(), the python 2.7 version. I don't know if this is what you intended to replicate. I don't mind, just noting. https://codereview.chromium.org/12042069/diff/102001/download_from_google_sto... download_from_google_storage.py:244: '(default) a sha1 sum ([A-Za-z0-9]{40}).\n(-s or --sha1_file) a ' Maybe you'd want to preface each enumeration entry by one or two whitespace. The text would be slightly easier to read if there was one line per line, limited at ~72 cols or so. https://codereview.chromium.org/12042069/diff/102001/download_from_google_sto... download_from_google_storage.py:249: help='Specify the output file name. Defaults to:\n' '\n' are ignored in help; verified locally. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... File tests/download_from_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:55: return (0, '', '') Interesting, usually I keep a reference to the TestCase instance and call test.fail(). I don't mind much. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:65: self.assertEquals(gsutil.path, GSUTIL_DEFAULT_PATH) Please use assertEqual() for new code, as per: http://docs.python.org/2/library/unittest.html#deprecated-aliases https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:102: os.remove(lorem_ipsum2_md5) Then use a temporary directory so no files are leaked in the checkout. Use tempfile.mkdtemp(prefix='gsutil') in setUp and shutil.rmtree(self.base_path) in tearDown. Copy the necessary files. Otherwise, you could mock the necessary file system functions, as you prefer. But don't write in the checkout directory during a unit test. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:119: os.path.dirname(os.path.abspath(__file__)), Use a file-level variable. This code would break if any os.chdir() occurred. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:153: result = list(self.queue.queue) self.assertEqual(sorted(expected_queue), sorted(self.queue.queue)) One line instead of 4. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:165: # pylint: disable=W0212 You can disable it at file level. It'll save a few lines in each unit test. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:274: 2 lines https://codereview.chromium.org/12042069/diff/102001/tests/upload_to_google_s... File tests/upload_to_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/102001/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:129: upload_to_google_storage.get_targets([], self.parser, False) Usually I do: try: upload_to_google_storage.get_targets([], self.parser, False) self.fail() except SystemExit, e: self.assertEquals(e.code, 2) which take care of all the possible cases. https://codereview.chromium.org/12042069/diff/102001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/102001/upload_to_google_storage... upload_to_google_storage.py:66: with open('%s.md5' % filename, 'w') as f: 'wb'
https://codereview.chromium.org/12042069/diff/102001/download_from_google_sto... File download_from_google_storage.py (right): https://codereview.chromium.org/12042069/diff/102001/download_from_google_sto... download_from_google_storage.py:244: '(default) a sha1 sum ([A-Za-z0-9]{40}).\n(-s or --sha1_file) a ' On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > Maybe you'd want to preface each enumeration entry by one or two whitespace. The > text would be slightly easier to read if there was one line per line, limited at > ~72 cols or so. Done. https://codereview.chromium.org/12042069/diff/102001/download_from_google_sto... download_from_google_storage.py:249: help='Specify the output file name. Defaults to:\n' On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > '\n' are ignored in help; verified locally. Done, removed \n's https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... File tests/download_from_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:65: self.assertEquals(gsutil.path, GSUTIL_DEFAULT_PATH) On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > Please use assertEqual() for new code, as per: > http://docs.python.org/2/library/unittest.html#deprecated-aliases Done. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:102: os.remove(lorem_ipsum2_md5) On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > Then use a temporary directory so no files are leaked in the checkout. > > Use tempfile.mkdtemp(prefix='gsutil') in setUp and shutil.rmtree(self.base_path) > in tearDown. Copy the necessary files. > > Otherwise, you could mock the necessary file system functions, as you prefer. > But don't write in the checkout directory during a unit test. Done. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:119: os.path.dirname(os.path.abspath(__file__)), On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > Use a file-level variable. This code would break if any os.chdir() occurred. Done. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:153: result = list(self.queue.queue) On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > self.assertEqual(sorted(expected_queue), sorted(self.queue.queue)) > > One line instead of 4. Done. Keeping the queue_size check since its calculated somewhat manually. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:165: # pylint: disable=W0212 On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > You can disable it at file level. It'll save a few lines in each unit test. Done. https://codereview.chromium.org/12042069/diff/102001/tests/download_from_goog... tests/download_from_google_storage_unittests.py:274: On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > 2 lines Done. https://codereview.chromium.org/12042069/diff/102001/tests/upload_to_google_s... File tests/upload_to_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/102001/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:129: upload_to_google_storage.get_targets([], self.parser, False) On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > Usually I do: > > try: > upload_to_google_storage.get_targets([], self.parser, False) > self.fail() > except SystemExit, e: > self.assertEquals(e.code, 2) > > which take care of all the possible cases. Works for me, Done. https://codereview.chromium.org/12042069/diff/102001/upload_to_google_storage.py File upload_to_google_storage.py (right): https://codereview.chromium.org/12042069/diff/102001/upload_to_google_storage... upload_to_google_storage.py:66: with open('%s.md5' % filename, 'w') as f: On 2013/03/11 19:52:01, Marc-Antoine Ruel wrote: > 'wb' Done.
https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... File tests/upload_to_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:25: 2 lines https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:30: os.path.dirname(os.path.abspath(__file__)), 'gstools') Use a global variable, otherwise this call could be affected by os.chdir(). https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:63: if os.path.exists(output_filename): This is not necessary. https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:105: one line https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:152: sys.stdin = StringIO.StringIO('\0'.join(inputs)) Note that this could affect the following test case. It's usually better to un-mock the globals you mocked. https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:158: 2 lines. When I say it in one file, I won't necessarily repeat the comment for the other files. It's your responsibility to ensure all the files match their style.
https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... File tests/upload_to_google_storage_unittests.py (right): https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:25: On 2013/03/11 22:48:58, Marc-Antoine Ruel wrote: > 2 lines Done. https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:30: os.path.dirname(os.path.abspath(__file__)), 'gstools') On 2013/03/11 22:48:58, Marc-Antoine Ruel wrote: > Use a global variable, otherwise this call could be affected by os.chdir(). Done. https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:63: if os.path.exists(output_filename): On 2013/03/11 22:48:58, Marc-Antoine Ruel wrote: > This is not necessary. Removed. https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:105: On 2013/03/11 22:48:58, Marc-Antoine Ruel wrote: > one line Done. https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:152: sys.stdin = StringIO.StringIO('\0'.join(inputs)) On 2013/03/11 22:48:58, Marc-Antoine Ruel wrote: > Note that this could affect the following test case. It's usually better to > un-mock the globals you mocked. Added sys.stdin = sys.__stdin__ in cleanUp(). https://codereview.chromium.org/12042069/diff/104003/tests/upload_to_google_s... tests/upload_to_google_storage_unittests.py:158: On 2013/03/11 22:48:58, Marc-Antoine Ruel wrote: > 2 lines. When I say it in one file, I won't necessarily repeat the comment for > the other files. It's your responsibility to ensure all the files match their > style. Done.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/hinoka@google.com/12042069/110001
Message was sent while issue was closed.
Change committed as 187951
Message was sent while issue was closed.
https://chromiumcodereview.appspot.com/12042069/diff/110001/download_from_goo... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/110001/download_from_goo... download_from_google_storage.py:70: return (403, out, err) These lines appear to silence a server error messages and cause the script to incorrectly report receiving a 403. (std_error is not printed in case on line 83 below) Can I ask why we did it this way?
Message was sent while issue was closed.
https://chromiumcodereview.appspot.com/12042069/diff/110001/download_from_goo... File download_from_google_storage.py (right): https://chromiumcodereview.appspot.com/12042069/diff/110001/download_from_goo... download_from_google_storage.py:70: return (403, out, err) On 2013/09/19 12:25:16, Isaac wrote: > These lines appear to silence a server error messages and cause the script to > incorrectly report receiving a 403. (std_error is not printed in case on line > 83 below) > > Can I ask why we did it this way? Its to replace GSResponseError: status=403, code=AccessDenied, reason="Forbidden", message="Access denied.", detail="..." With a more better looking error without losing information. That error message isn't very friendly looking and doesn't give much of a resolution (In our case the most common failure is due to lack of credentials, in which case it should ask the user to run the script with the "config" argument). Of course if we're losing information we should fix that.
Message was sent while issue was closed.
Could we add something like: Server error: "<msg>" Henrik and I were only able to file b/10837616 after he added this as a local diff on the webrtc bots. :-\
Yep https://codereview.chromium.org/23484061/ On Thu, Sep 19, 2013 at 10:47 AM, <ilevy@chromium.org> wrote: > Could we add something like: > > Server error: "<msg>" > > Henrik and I were only able to file b/10837616 after he added this as a > local > diff on the webrtc bots. :-\ > > https://chromiumcodereview.**appspot.com/12042069/<https://chromiumcodereview... > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org. |