|
|
Chromium Code Reviews
DescriptionClient side implementation of new /content-gs isolate protocol.
Replaces existing /content protocol. Also refactor Storage to
use Item class to represent uploaded files instead of tuples.
BUG=289670
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=226265
Patch Set 1 #
Total comments: 7
Patch Set 2 : PendingPush stuff #
Total comments: 6
Patch Set 3 : introduce Item class #
Total comments: 2
Patch Set 4 : simplified a bit #
Total comments: 12
Patch Set 5 : tests #
Total comments: 12
Patch Set 6 : #
Messages
Total messages: 18 (0 generated)
That's a completion of https://codereview.chromium.org/24578004/ with clean diff.
https://codereview.chromium.org/25093003/diff/1/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/1/isolateserver.py#newcode529 isolateserver.py:529: self._use_zip = is_namespace_with_compression(namespace) No need to do this right now; I'd like it to be a pair of compression/decompression generators so it could be more easily extended. https://codereview.chromium.org/25093003/diff/1/isolateserver.py#newcode538 isolateserver.py:538: 'protocol_version': ISOLATE_PROTOCOL_VERSION, I think it's better to sort the keys https://codereview.chromium.org/25093003/diff/1/isolateserver.py#newcode632 isolateserver.py:632: assert push_urls and len(push_urls) == 2 I'm confused. push_urls defaults to None. I'd like to hide push_urls away. I feel the object should cache that and reuse the data without the caller having to know. Otherwise this class is incompatible with the other classes. One option is: - Refuse push() if the object wasn't provided to a contains() call first. For consistency, we should probably enforce it on every implementations. - Have this object cache the push_urls for each contains() calls inside the object. The caller doesn't care. wdyt? https://codereview.chromium.org/25093003/diff/1/tests/isolateserver_smoke_tes... File tests/isolateserver_smoke_test.py (right): https://codereview.chromium.org/25093003/diff/1/tests/isolateserver_smoke_tes... tests/isolateserver_smoke_test.py:33: # protocol for validity check and fetches. I guess it should just use "isolateserver.py download" to confirm the download than that's it. I agree it can be done in a follow up CL.
https://codereview.chromium.org/25093003/diff/1/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/1/isolateserver.py#newcode529 isolateserver.py:529: self._use_zip = is_namespace_with_compression(namespace) On 2013/09/28 01:42:58, M-A Ruel wrote: > No need to do this right now; I'd like it to be a pair of > compression/decompression generators so it could be more easily extended. It's literally 3 lines of code, that are gonna be needed in some form anyway in next CL. And current tests assume that if namespace doesn't ends in '-gzip', then it's not a compressed one (which is correct). I'd rather not change this. FYI I have a CL that moves async fetch from DiskCache into Storage (and makes whole Storage/StorageApi/Cache classes a lot more symmetric in terms of push/fetch ops) 80% ready. https://codereview.chromium.org/25093003/diff/1/isolateserver.py#newcode538 isolateserver.py:538: 'protocol_version': ISOLATE_PROTOCOL_VERSION, On 2013/09/28 01:42:58, M-A Ruel wrote: > I think it's better to sort the keys Done. https://codereview.chromium.org/25093003/diff/1/isolateserver.py#newcode632 isolateserver.py:632: assert push_urls and len(push_urls) == 2 On 2013/09/28 01:42:58, M-A Ruel wrote: > I'm confused. push_urls defaults to None. It's stupid pylinter :) It requires method override to have same signature as in base class. > I'd like to hide push_urls away. I feel the object should cache that and reuse > the data without the caller having to know. Otherwise this class is incompatible > with the other classes. What other classes?.. > > One option is: > - Refuse push() if the object wasn't provided to a contains() call first. For > consistency, we should probably enforce it on every implementations. What about replacing 'hash + metadata + expected_size' tuple with an actual object that has some internal state (like 'self.push_urls', 'self.already_uploaded') that is getting updated in 'contains' call? Or make 'contains' call to return such kind of objects. I.e. 'push' will only accept objects returned by 'contains'. I'll sketch later approach in this CL. > - Have this object cache the push_urls for each contains() calls inside the > object. The caller doesn't care. I don't like hidden internal state here. StorageApi should be dumb and stateless, as protocol itself. > > wdyt?
I like the idea of changing the API so that there's only two functions; .fetch() .contains() and .contains() returns an object, where the user can call .push() on it. Does this fits what you had in mind? That said, I don't mind committing _something_ right away and iterating on a follow up CL to iron out the actual python implementation (vs the new interface with the server)
I'll wait on your feedback on PendingPush stuff before finishing it. Don't want to invest too much time on bad decision :)
On 2013/09/30 19:14:35, M-A Ruel wrote: > I like the idea of changing the API so that there's only two functions; > .fetch() > .contains() > > and .contains() returns an object, where the user can call .push() on it. > > Does this fits what you had in mind? > I don't like it. It'll make batch uploads more difficult, imho. And it'll split actual Storage API implementation between two classes (StoageApi and file object that implements 'push'). And it breaks nice symmetry between push\upload. > That said, I don't mind committing _something_ right away and iterating on a > follow up CL to iron out the actual python implementation (vs the new interface > with the server) I'll finish PendingPush stuff then (probably renaming PendingPush into something else). Let's see how it looks when done.
design is fine but it looks like there's 2 bugs in there. https://codereview.chromium.org/25093003/diff/7001/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/7001/isolateserver.py#newcode516 isolateserver.py:516: class PendingPush(object): I prefer the class to be before Storage. https://codereview.chromium.org/25093003/diff/7001/isolateserver.py#newcode517 isolateserver.py:517: """A file that were not found on the server and that should be uploaded. """A file that was not ... https://codereview.chromium.org/25093003/diff/7001/isolateserver.py#newcode519 isolateserver.py:519: Returned by StorageApi's 'contains' call, passed to StorageApi's 'push'. Returned by StorageApi.contains(), to be passed to StorageApi.push(). https://codereview.chromium.org/25093003/diff/7001/isolateserver.py#newcode554 isolateserver.py:554: 'protocol_version': ISOLATE_PROTOCOL_VERSION, Sort here too https://codereview.chromium.org/25093003/diff/7001/isolateserver.py#newcode732 isolateserver.py:732: IsolateServer.PendingIsolatePush(files[0], files[1], push_urls) You probably meant files[i][0], files[i][1] :) https://codereview.chromium.org/25093003/diff/7001/isolateserver.py#newcode761 isolateserver.py:761: def push(self, item, expected_size, content_generator, push_urls=None): You forgot to update it
I started to refine PendingPush. And it looks ugly...
I decided to pursue another path:
class File:
path
metadata
push_urls
...
class StorageApi:
def contains(files):
"""List of Files -> list of missing Files (mutated a bit to include
push_urls)."""
def push(file):
"""A File object."""
class Storage:
def upload_tree(....):
files = [File(...) for input args]
# ... and then operate only with File objects, never a plain dicts.
On 2013/09/30 19:43:32, Vadim Sh. wrote: > I started to refine PendingPush. And it looks ugly... > > I decided to pursue another path: > > class File: > path > metadata > push_urls > ... > > class StorageApi: > def contains(files): > """List of Files -> list of missing Files (mutated a bit to include > push_urls).""" I like that. I hate the current tuple thing. I did this because I was lazy. Then each File instance can be stateful if needed. > def push(file): > """A File object.""" > > class Storage: > def upload_tree(....): > files = [File(...) for input args] > # ... and then operate only with File objects, never a plain dicts.
Please take a look. I will update tests when isolateserver.py passes code review. https://codereview.chromium.org/25093003/diff/18001/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/18001/isolateserver.py#newcode288 isolateserver.py:288: class Item(object): I decided to call it Item instead of File for two reasons: 1) We can upload byte strings, not a real files. Or in fact anything that can provide a generator and can tell its hash in advance. 2) 'for file in files' overrides builtin 'file' function, while 'for item in items' is fine :)
Simplified it a bit: 1) added back 'priority' to async_push, since it's a property of the 'push' action, not an Item's property. 2) Converted push_state to simple attribute (it was a property will trivial getter and almost trivial setter). Sorry for frequent CLs...
https://codereview.chromium.org/25093003/diff/18001/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/18001/isolateserver.py#newcode288 isolateserver.py:288: class Item(object): On 2013/09/30 21:35:21, Vadim Sh. wrote: > I decided to call it Item instead of File for two reasons: > 1) We can upload byte strings, not a real files. Or in fact anything that can > provide a generator and can tell its hash in advance. > 2) 'for file in files' overrides builtin 'file' function, while 'for item in > items' is fine :) I like not using File but Item is a tad generic, but not too generic, since it's inside isolateserver. StorageItem could be an option but Item is probably better after all. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode374 isolateserver.py:374: # TODO(vadimsh): Introduce Item as a part of the public interface? I think it'd be better if the caller created the instances, |infiles| type (a dict of dicts) is hard to grok. There's no point in this code to know what a hardlink is for example. I'm fine with the current code in the meantime, we can refactor later. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode526 isolateserver.py:526: # TODO(vadimsh): Make 'fetch' use Item abstraction as well. I don't think it's necessary. Personally I'd remove the TODO. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode634 isolateserver.py:634: method='POST') optional nit: method is not strictly necessary https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode692 isolateserver.py:692: # TODO(maruel): Support large files. This would require streaming support. I think you can remove the todo. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode704 isolateserver.py:704: # A cheese way to avoid memcpy of (possibly huge) file, until streaming s/cheese/cheezy/ :) https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode726 isolateserver.py:726: # TODO(vadimsh): Calculate MD5 sum while uploading a file and send it to IIRC, Google Storage only calculates the MD5 for smaller files and non-composite objects, you can't assume it will always be available, in particular if we enable parallel chunked upload; https://developers.google.com/storage/docs/reference-headers#xgooghash and https://developers.google.com/storage/docs/composite-objects
Ok, added tests. I think it more or less ready. Unfortunately canary waterfall has a hard time right now, especially linux part. I'm not sure why... https://codereview.chromium.org/25093003/diff/22001/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode374 isolateserver.py:374: # TODO(vadimsh): Introduce Item as a part of the public interface? On 2013/09/30 22:36:33, M-A Ruel wrote: > I think it'd be better if the caller created the instances, |infiles| type (a > dict of dicts) is hard to grok. There's no point in this code to know what a > hardlink is for example. I'm fine with the current code in the meantime, we can > refactor later. Yes. I agree. I left it for future CLs. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode526 isolateserver.py:526: # TODO(vadimsh): Make 'fetch' use Item abstraction as well. On 2013/09/30 22:36:33, M-A Ruel wrote: > I don't think it's necessary. Personally I'd remove the TODO. Done. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode634 isolateserver.py:634: method='POST') On 2013/09/30 22:36:33, M-A Ruel wrote: > optional nit: method is not strictly necessary I keep it here for uniformity with rest of calls in this class. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode692 isolateserver.py:692: # TODO(maruel): Support large files. This would require streaming support. On 2013/09/30 22:36:33, M-A Ruel wrote: > I think you can remove the todo. Done. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode704 isolateserver.py:704: # A cheese way to avoid memcpy of (possibly huge) file, until streaming On 2013/09/30 22:36:33, M-A Ruel wrote: > s/cheese/cheezy/ :) Done. https://codereview.chromium.org/25093003/diff/22001/isolateserver.py#newcode726 isolateserver.py:726: # TODO(vadimsh): Calculate MD5 sum while uploading a file and send it to On 2013/09/30 22:36:33, M-A Ruel wrote: > IIRC, Google Storage only calculates the MD5 for smaller files and non-composite > objects, you can't assume it will always be available, in particular if we > enable parallel chunked upload; > https://developers.google.com/storage/docs/reference-headers#xgooghash > and > https://developers.google.com/storage/docs/composite-objects Hmm... Docs say CRC32 is available for all objects (including composite ones). CRC32 would be enough to verify integrity of the file. I'll update TODO to mention it.
Ping...
lgtm with optinal nits https://codereview.chromium.org/25093003/diff/28001/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode304 isolateserver.py:304: def content(self, chunk_size): Maybe yield_content ? Just to make it clear at call sites. I don't mind much. https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode554 isolateserver.py:554: Can also mutated state of the items by assigning an opaque implementation Mutates |items| by assigning opaque implement specific object to Item's push_state attribute on missing entries in the datastore. https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode573 isolateserver.py:573: """A state of push operation carried along with Item in push_state.""" """State needed to call .push(), to be stored in Item.push_state.""" You may want to call the class _PushState, as you prefer. https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode591 isolateserver.py:591: def generate_handshake_request(): _generate_handshake_request ? https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode602 isolateserver.py:602: def validate_handshake_response(caps): _validate_handshake_response ? https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode613 isolateserver.py:613: def server_capabilities(self): _server_capabilities?
https://codereview.chromium.org/25093003/diff/28001/isolateserver.py File isolateserver.py (right): https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode304 isolateserver.py:304: def content(self, chunk_size): On 2013/10/01 17:57:41, M-A Ruel wrote: > Maybe yield_content ? > Just to make it clear at call sites. I don't mind much. I prefer single word. And beside it's not necessarily a generator. I adjusted doc string to reflect this. https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode554 isolateserver.py:554: Can also mutated state of the items by assigning an opaque implementation On 2013/10/01 17:57:41, M-A Ruel wrote: > Mutates |items| by assigning opaque implement specific object to Item's > push_state attribute on missing entries in the datastore. Done. Though strictly speaking there's no datastore here yet. Datastore is a concept of a specific implementation of this interface (IsolateServer). https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode573 isolateserver.py:573: """A state of push operation carried along with Item in push_state.""" On 2013/10/01 17:57:41, M-A Ruel wrote: > """State needed to call .push(), to be stored in Item.push_state.""" > > You may want to call the class _PushState, as you prefer. Done. https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode591 isolateserver.py:591: def generate_handshake_request(): On 2013/10/01 17:57:41, M-A Ruel wrote: > _generate_handshake_request ? Done. https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode602 isolateserver.py:602: def validate_handshake_response(caps): On 2013/10/01 17:57:41, M-A Ruel wrote: > _validate_handshake_response ? Done. https://codereview.chromium.org/25093003/diff/28001/isolateserver.py#newcode613 isolateserver.py:613: def server_capabilities(self): On 2013/10/01 17:57:41, M-A Ruel wrote: > _server_capabilities? Done.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vadimsh@chromium.org/25093003/38001
Message was sent while issue was closed.
Change committed as 226265 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
