|
|
Chromium Code Reviews|
Created:
4 years, 2 months ago by Sharu Jiang Modified:
4 years, 1 month ago CC:
chromium-reviews, infra-reviews+infra_chromium.org, inferno Target Ref:
refs/heads/master Project:
infra Visibility:
Public. |
Description[Findit] Add skeleton code for delta test script.
Skeleton code for delta test, the logic is the same as delta test in clusterfuzz.
BUG=605369
Committed: https://chromium.googlesource.com/infra/infra/+/6cb32efbba33bf0bd0fe76a14f80a7d5172a3bb5
Patch Set 1 #
Total comments: 15
Patch Set 2 : Address comments. #Patch Set 3 : seperate --date arguments and some renames. #
Total comments: 32
Patch Set 4 : Address comments. #Patch Set 5 : Rebase. #
Total comments: 28
Patch Set 6 : Address comments. #
Total comments: 3
Patch Set 7 : Update doc strings. #Patch Set 8 : Rename run-azalea to run-predator #Patch Set 9 : Rebase. #Messages
Total messages: 41 (17 generated)
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Description was changed from ========== [Findit] Add skeleton code for delta test script. BUG=605369 ========== to ========== [Findit] Add skeleton code for delta test script. Skeleton code for delta test, the logic is the same as delta test in clusterfuzz. BUG=605369 ==========
katesonia@chromium.org changed reviewers: + mbarbella@google.com, stgao@chromium.org, wrengr@chromium.org
katesonia@chromium.org changed reviewers: + chanli@chromium.org, lijeffrey@chromium.org
katesonia@chromium.org changed reviewers: + mbarbella@chromium.org - mbarbella@google.com
PTAL :)
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/common... File appengine/findit/common/cache_decorator.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/common... appengine/findit/common/cache_decorator.py:140: func (function): An abitrary function. was correct before: "arbitrary" https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): What's the purpose of this? Since _fields is a private member of namedtuple, it make sense to leave it private so that it's clear when callers are depending on the internals of this class in ways they shouldn't be (i.e., which are fragile). https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/util_s... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/util_s... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:59: def ToString(self): Should also set the __str__ method https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/util_s... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:62: def IsEmpty(self): Should probably also set the __bool__ method
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/common... File appengine/findit/common/cache_decorator.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/common... appengine/findit/common/cache_decorator.py:140: func (function): An abitrary function. On 2016/10/11 22:57:43, wrengr wrote: > was correct before: "arbitrary" Done. https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): On 2016/10/11 22:57:43, wrengr wrote: > What's the purpose of this? Since _fields is a private member of namedtuple, it > make sense to leave it private so that it's clear when callers are depending on > the internals of this class in ways they shouldn't be (i.e., which are fragile). This is used in delta test, and later may used to drop empty field in Culprit. https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/util_s... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/util_s... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:59: def ToString(self): On 2016/10/11 22:57:43, wrengr wrote: > Should also set the __str__ method Done. https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/util_s... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:62: def IsEmpty(self): On 2016/10/11 22:57:43, wrengr wrote: > Should probably also set the __bool__ method Done.
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): On 2016/10/12 01:18:20, sharu jiang wrote: > On 2016/10/11 22:57:43, wrengr wrote: > > What's the purpose of this? Since _fields is a private member of namedtuple, > it > > make sense to leave it private so that it's clear when callers are depending > on > > the internals of this class in ways they shouldn't be (i.e., which are > fragile). > > This is used in delta test, and later may used to drop empty field in Culprit. I still don't understand. Why can't delta use _fields directly? If all we need is to compare one Culprit to another, then we should provide the appropriate comparison functions rather than having client code rely on internals. This class serves as the API for the library to pass information to the application; thus, it needs to serve the purposes of the library first and foremost. I assume you want to drop fields because of serializing to JSON, but that's an application concern not a library concern. The library shouldn't have to care at all about how the application decides to serialize things.
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): On 2016/10/12 20:15:17, wrengr wrote: > On 2016/10/12 01:18:20, sharu jiang wrote: > > On 2016/10/11 22:57:43, wrengr wrote: > > > What's the purpose of this? Since _fields is a private member of namedtuple, > > it > > > make sense to leave it private so that it's clear when callers are depending > > on > > > the internals of this class in ways they shouldn't be (i.e., which are > > fragile). > > > > This is used in delta test, and later may used to drop empty field in Culprit. > > I still don't understand. Why can't delta use _fields directly? If all we need > is to compare one Culprit to another, then we should provide the appropriate > comparison functions rather than having client code rely on internals. > > This class serves as the API for the library to pass information to the > application; thus, it needs to serve the purposes of the library first and > foremost. I assume you want to drop fields because of serializing to JSON, but > that's an application concern not a library concern. The library shouldn't have > to care at all about how the application decides to serialize things. shouldn't _fields supposed to be private and not used directly? This is not for comparison. For "that's an application concern not a library concern", in this case, I think the serialization part should be in clients' code (findit_for_*) instead of Culprit, to do that, the "fields" may still be needed.
https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:120: # TODO(katesoina): Implement run-azalea.py. Is the TODO still valid? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:121: command = 'python %s %s' % ('run-azalea.py', result_path) + ( what's the current working dir? Will it matter here? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:122: ' -v' if verbose else '') Can we use a argument list instead of a string? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:130: p.communicate(input=json.dumps(crashes)) Should we pass information through a file instead? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:135: logging.info('Fail to get results.') Should it be an error or info? Same for other usage of logging. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:139: return pickle.load(f) Just curious: why we use pickle instead of json? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:183: for git_hash in [git_hash1, git_hash2]: So for each crash, we switch the checkout twice. If we have 100 crashes, we have to switch checkout 200 times. Can we avoid such an overhead? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:56: '--after', same as the other CL, why two different names? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:88: # If in verbose mode, prints debug infos about Gqlquery. Just Gqlquery? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:117: logging.info('Delta results already existed in\n%s', delta_path) For this case, what's the expected action by the user?
Patchset #3 (id:80001) has been deleted
https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:120: # TODO(katesoina): Implement run-azalea.py. On 2016/10/14 01:40:40, stgao wrote: > Is the TODO still valid? run-azalea.py is not finished yet, however, right, better add Todo there instead of here. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:121: command = 'python %s %s' % ('run-azalea.py', result_path) + ( On 2016/10/14 01:40:40, stgao wrote: > what's the current working dir? Will it matter here? In line 114, we make sure the current working dir is the delta test dir. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:122: ' -v' if verbose else '') On 2016/10/14 01:40:40, stgao wrote: > Can we use a argument list instead of a string? Done. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:130: p.communicate(input=json.dumps(crashes)) On 2016/10/14 01:40:40, stgao wrote: > Should we pass information through a file instead? We can, we can cache the CrashIterator results(batches of crashes) and let run-azalea read those cache, but as I stated in the design doc https://docs.google.com/document/d/1kfu_HIJFCSwOWQleRpSkOH4t_D0UFY5vDSSausaZu..., this may caused some disk space since whenever args changed, we need to cache those crash_infos (it's relatively big compared to culprit result) again. I can do this in another cl. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:135: logging.info('Fail to get results.') On 2016/10/14 01:40:40, stgao wrote: > Should it be an error or info? Same for other usage of logging. Done. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:139: return pickle.load(f) On 2016/10/14 01:40:40, stgao wrote: > Just curious: why we use pickle instead of json? The results are a general concept, it can be a object like Culprit object. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:183: for git_hash in [git_hash1, git_hash2]: On 2016/10/14 01:40:40, stgao wrote: > So for each crash, we switch the checkout twice. If we have 100 crashes, we have > to switch checkout 200 times. > > Can we avoid such an overhead? This is not for each crash, it is for each batch of 1000 crashes (batch_run=True). So I think the overhead should be ok. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:56: '--after', On 2016/10/14 01:40:40, stgao wrote: > same as the other CL, why two different names? As answered in the other cl, this is similar to the args of 'git log', have two options for people to remember :) https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:88: # If in verbose mode, prints debug infos about Gqlquery. On 2016/10/14 01:40:40, stgao wrote: > Just Gqlquery? Oops, needs to be updated. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:117: logging.info('Delta results already existed in\n%s', delta_path) On 2016/10/14 01:40:40, stgao wrote: > For this case, what's the expected action by the user? In this case, user can check out the csv result file.
Ping
Patchset #5 (id:140001) has been deleted
Look good after comments are addressed. I'm fine with such util scripts having no tests, but you may want to test it out before committing. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:121: command = 'python %s %s' % ('run-azalea.py', result_path) + ( On 2016/10/15 01:24:47, sharu jiang wrote: > On 2016/10/14 01:40:40, stgao wrote: > > what's the current working dir? Will it matter here? > > In line 114, we make sure the current working dir is the delta test dir. Is this new process run in the same shell as that process in #115 above? Did the `cd DIR` really change the current working dir? Have you tried in a command line? https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:130: p.communicate(input=json.dumps(crashes)) On 2016/10/15 01:24:47, sharu jiang wrote: > On 2016/10/14 01:40:40, stgao wrote: > > Should we pass information through a file instead? > > We can, we can cache the CrashIterator results(batches of crashes) and let > run-azalea read those cache, but as I stated in the design doc > https://docs.google.com/document/d/1kfu_HIJFCSwOWQleRpSkOH4t_D0UFY5vDSSausaZu..., > this may caused some disk space since whenever args changed, we need to cache > those crash_infos (it's relatively big compared to culprit result) again. I don't quite understand this. Maybe we could chat in person. > > I can do this in another cl. If this is the plan, let's add a TODO with a bug. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:139: return pickle.load(f) On 2016/10/15 01:24:47, sharu jiang wrote: > On 2016/10/14 01:40:40, stgao wrote: > > Just curious: why we use pickle instead of json? > > The results are a general concept, it can be a object like Culprit object. Acknowledged. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:183: for git_hash in [git_hash1, git_hash2]: On 2016/10/15 01:24:47, sharu jiang wrote: > On 2016/10/14 01:40:40, stgao wrote: > > So for each crash, we switch the checkout twice. If we have 100 crashes, we > have > > to switch checkout 200 times. > > > > Can we avoid such an overhead? > > This is not for each crash, it is for each batch of 1000 crashes > (batch_run=True). So I think the overhead should be ok. sg https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:56: '--after', On 2016/10/15 01:24:47, sharu jiang wrote: > On 2016/10/14 01:40:40, stgao wrote: > > same as the other CL, why two different names? > > As answered in the other cl, this is similar to the args of 'git log', have two > options for people to remember :) The other was updated. Why not here? Two names for the same argument is uncommon. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:117: logging.info('Delta results already existed in\n%s', delta_path) On 2016/10/15 01:24:47, sharu jiang wrote: > On 2016/10/14 01:40:40, stgao wrote: > > For this case, what's the expected action by the user? > > In this case, user can check out the csv result file. Why not add that the the message? It is not clear at least to me. And what do you mean by "check out the csv result file"? https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:126: p.communicate(input=json.dumps(crashes)) Just a note: not sure if the passing a big chunk of data through stdio would be a problem or not. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:134: with open(result_path) as f: As the result_path is passed in, why this function want to read the file and return the content? https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:192: dev_null_handle = open(os.devnull, 'w') Should we do a with clause here? Otherwise a file handle is leaking. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/run-azalea.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-azalea.py:19: def GetCulprits(crashes, client, verbose=False): What's the verbose for? If not used, could we remove for now?
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/common... File appengine/findit/common/cache_decorator.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/common... appengine/findit/common/cache_decorator.py:140: func (function): An abitrary function. nit: the previous spelling was correct https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:31: """Dict representation of delta.""" is it possible to include an example of what the resulting dict should look like? https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:41: value2 = value2.ToDict() does value2 need to be checked if callable(value2.ToDict)? https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:72: nit: include in the docstring that set1 and set2 are dicts (?) https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:81: if crash_id not in set2: nit: set2.keys()? https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:111: logging.info('Switch to git %s', git_hash) nit: Switching? https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:131: logging.error('Fail to get results.') nit: Failed https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py:34: return GIT_HASH_PATTERN.match(str(revision)) or revision == 'master' nit: just to be safe, use revision.lower() == 'master' https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py:47: logging.error('Fail to parse git hash for %s\nStacktrace:\n%s', nit: Failed https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py:83: logging.info('Writing delta diff to %s\n', file_path) nit: move this logging line closer to where the write call is actually being made
Patchset #6 (id:180001) has been deleted
Sure, the script has been tested. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:121: command = 'python %s %s' % ('run-azalea.py', result_path) + ( On 2016/10/20 01:40:20, stgao wrote: > On 2016/10/15 01:24:47, sharu jiang wrote: > > On 2016/10/14 01:40:40, stgao wrote: > > > what's the current working dir? Will it matter here? > > > > In line 114, we make sure the current working dir is the delta test dir. > > Is this new process run in the same shell as that process in #115 above? Did the > `cd DIR` really change the current working dir? Have you tried in a command > line? Yes, the run-azalea.py is in the same dir as this file(delta test dir). I tested the skeleton delta test in command line, it works. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:130: p.communicate(input=json.dumps(crashes)) On 2016/10/20 01:40:20, stgao wrote: > On 2016/10/15 01:24:47, sharu jiang wrote: > > On 2016/10/14 01:40:40, stgao wrote: > > > Should we pass information through a file instead? > > > > We can, we can cache the CrashIterator results(batches of crashes) and let > > run-azalea read those cache, but as I stated in the design doc > > > https://docs.google.com/document/d/1kfu_HIJFCSwOWQleRpSkOH4t_D0UFY5vDSSausaZu..., > > this may caused some disk space since whenever args changed, we need to cache > > those crash_infos (it's relatively big compared to culprit result) again. > > I don't quite understand this. Maybe we could chat in person. > > > > > I can do this in another cl. > > If this is the plan, let's add a TODO with a bug. Done. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:56: '--after', On 2016/10/20 01:40:20, stgao wrote: > On 2016/10/15 01:24:47, sharu jiang wrote: > > On 2016/10/14 01:40:40, stgao wrote: > > > same as the other CL, why two different names? > > > > As answered in the other cl, this is similar to the args of 'git log', have > two > > options for people to remember :) > > The other was updated. Why not here? > Two names for the same argument is uncommon. Done. https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-delta-test.py:117: logging.info('Delta results already existed in\n%s', delta_path) On 2016/10/20 01:40:20, stgao wrote: > On 2016/10/15 01:24:47, sharu jiang wrote: > > On 2016/10/14 01:40:40, stgao wrote: > > > For this case, what's the expected action by the user? > > > > In this case, user can check out the csv result file. > > Why not add that the the message? It is not clear at least to me. And what do > you mean by "check out the csv result file"? I meant open the csv file to see the results. Modify the code so, even we already have csv result, we still print the delta to the screen. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:31: """Dict representation of delta.""" On 2016/10/20 15:52:14, lijeffrey wrote: > is it possible to include an example of what the resulting dict should look > like? Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:41: value2 = value2.ToDict() On 2016/10/20 15:52:14, lijeffrey wrote: > does value2 need to be checked if callable(value2.ToDict)? The result1 and result2 should be the same kind, so no need to check value2. Added a Note of that in Delta class doc. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:72: On 2016/10/20 15:52:14, lijeffrey wrote: > nit: include in the docstring that set1 and set2 are dicts (?) Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:81: if crash_id not in set2: On 2016/10/20 15:52:14, lijeffrey wrote: > nit: set2.keys()? This is valid usage for dict. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:111: logging.info('Switch to git %s', git_hash) On 2016/10/20 15:52:14, lijeffrey wrote: > nit: Switching? Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:126: p.communicate(input=json.dumps(crashes)) On 2016/10/20 01:40:20, stgao wrote: > Just a note: not sure if the passing a big chunk of data through stdio would be > a problem or not. It works fine for the largest batch, and the crash data won't be largely expanded, so it should be ok. Added a todo, I will address this potential problem by caching the crashes data and let the run-azalea read corresponding cache file to get input. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:131: logging.error('Fail to get results.') On 2016/10/20 15:52:14, lijeffrey wrote: > nit: Failed Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:134: with open(result_path) as f: On 2016/10/20 01:40:20, stgao wrote: > As the result_path is passed in, why this function want to read the file and > return the content? We can read the file outside this function however I think encapsulate this part in GetResults would be cleaner. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:192: dev_null_handle = open(os.devnull, 'w') On 2016/10/20 01:40:20, stgao wrote: > Should we do a with clause here? Otherwise a file handle is leaking. Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py:34: return GIT_HASH_PATTERN.match(str(revision)) or revision == 'master' On 2016/10/20 15:52:14, lijeffrey wrote: > nit: just to be safe, use revision.lower() == 'master' Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py:47: logging.error('Fail to parse git hash for %s\nStacktrace:\n%s', On 2016/10/20 15:52:15, lijeffrey wrote: > nit: Failed Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_util.py:83: logging.info('Writing delta diff to %s\n', file_path) On 2016/10/20 15:52:14, lijeffrey wrote: > nit: move this logging line closer to where the write call is actually being > made Done. https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/run-azalea.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/run-azalea.py:19: def GetCulprits(crashes, client, verbose=False): On 2016/10/20 01:40:20, stgao wrote: > What's the verbose for? If not used, could we remove for now? Done.
lgtm https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:134: with open(result_path) as f: On 2016/10/20 22:39:06, sharu jiang wrote: > On 2016/10/20 01:40:20, stgao wrote: > > As the result_path is passed in, why this function want to read the file and > > return the content? > > We can read the file outside this function however I think encapsulate this part > in GetResults would be cleaner. Then why we pass the result_path into this function? That's not encapsulation. https://codereview.chromium.org/2400283003/diff/200001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/200001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:136: args = ['python', 'run-azalea.py', result_path, '--client', client_id] do we still want to go with "azalea"?
https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/160001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:134: with open(result_path) as f: On 2016/10/21 01:41:39, stgao wrote: > On 2016/10/20 22:39:06, sharu jiang wrote: > > On 2016/10/20 01:40:20, stgao wrote: > > > As the result_path is passed in, why this function want to read the file and > > > return the content? > > > > We can read the file outside this function however I think encapsulate this > part > > in GetResults would be cleaner. > > Then why we pass the result_path into this function? That's not encapsulation. The result path is needed so we can check whether we already have results to determine whether we should call run-azalea(run-predator) or not. https://codereview.chromium.org/2400283003/diff/200001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/200001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:136: args = ['python', 'run-azalea.py', result_path, '--client', client_id] On 2016/10/21 01:41:39, stgao wrote: > do we still want to go with "azalea"? The name will be changed in following cl, so I just leave it as it is for now :)
Patchset #8 (id:240001) has been deleted
https://codereview.chromium.org/2400283003/diff/200001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/200001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:136: args = ['python', 'run-azalea.py', result_path, '--client', client_id] On 2016/10/21 18:45:53, sharu jiang wrote: > On 2016/10/21 01:41:39, stgao wrote: > > do we still want to go with "azalea"? > > The name will be changed in following cl, so I just leave it as it is for now :) Renamed it anyway, better sooner than later :)
The CQ bit was checked by katesonia@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from stgao@chromium.org Link to the patchset: https://codereview.chromium.org/2400283003/#ps260001 (title: "Rename run-azalea to run-predator")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by katesonia@chromium.org
https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:183: for git_hash in [git_hash1, git_hash2]: On 2016/10/15 01:24:47, sharu jiang wrote: > On 2016/10/14 01:40:40, stgao wrote: > > So for each crash, we switch the checkout twice. If we have 100 crashes, we > have > > to switch checkout 200 times. > > > > Can we avoid such an overhead? > > This is not for each crash, it is for each batch of 1000 crashes > (batch_run=True). So I think the overhead should be ok. It'd still be good (and easy!) to avoid. I can add it in another CL once this one finally lands.
Patchset #9 (id:280001) has been deleted
The CQ bit was checked by katesonia@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from stgao@chromium.org Link to the patchset: https://codereview.chromium.org/2400283003/#ps300001 (title: "Rebase.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): On 2016/10/12 22:58:27, sharu jiang wrote: > On 2016/10/12 20:15:17, wrengr wrote: > > On 2016/10/12 01:18:20, sharu jiang wrote: > > > On 2016/10/11 22:57:43, wrengr wrote: > > > > What's the purpose of this? Since _fields is a private member of > namedtuple, > > > it > > > > make sense to leave it private so that it's clear when callers are > depending > > > on > > > > the internals of this class in ways they shouldn't be (i.e., which are > > > fragile). > > > > > > This is used in delta test, and later may used to drop empty field in > Culprit. > > > > I still don't understand. Why can't delta use _fields directly? If all we need > > is to compare one Culprit to another, then we should provide the appropriate > > comparison functions rather than having client code rely on internals. > > > > This class serves as the API for the library to pass information to the > > application; thus, it needs to serve the purposes of the library first and > > foremost. I assume you want to drop fields because of serializing to JSON, but > > that's an application concern not a library concern. The library shouldn't > have > > to care at all about how the application decides to serialize things. > > shouldn't _fields supposed to be private and not used directly? This is not for > comparison. > > For "that's an application concern not a library concern", in this case, I think > the serialization part should be in clients' code (findit_for_*) instead of > Culprit, to do that, the "fields" may still be needed. (FWIW, whenever I talk about "client code" or "clients (of X)", I mean any code that uses X; as distinct from "users", i.e. humans, who use X. I do not specifically mean the AppEngine applications. I believe this has been a source of confusion, so I will avoid the word "client" below.) If we're going to be turning Culprit objects into anonymous dicts, then that should be done in/by the Culprit class itself. That way, code using the class isn't forced to muck around with the innards of the class— thereby becoming fragile to changes in/to the class. If code using the class can simply call a ToDict method (or similar) and expect that the method will return some sort of JSON-serializable dictionary, then the code using the class becomes independent of how the class is implemented; which in turn frees us up to change the internal representation of Culprit objects without needing to change all the code that uses the Culprit class. If we're serializing Culprit objects into JSON, then we surely want to serialize them in the same way regardless of which AppEngine application happens to be working with that serialized format. Thus, serialization is not application-specific, so it should not be implemented by/in application code. It is much less fragile for application code to assume that objects already know how to serialize *themselves*, and then just ask them to do so (by calling the ToDict method) whenever necessary. Yes, in terms of execution control flow, the application code is where the request for serialization is made, but that doesn't mean that's where the implementation of serialization should live.
Message was sent while issue was closed.
Description was changed from ========== [Findit] Add skeleton code for delta test script. Skeleton code for delta test, the logic is the same as delta test in clusterfuzz. BUG=605369 ========== to ========== [Findit] Add skeleton code for delta test script. Skeleton code for delta test, the logic is the same as delta test in clusterfuzz. BUG=605369 Committed: https://chromium.googlesource.com/infra/infra/+/6cb32efbba33bf0bd0fe76a14f80a... ==========
Message was sent while issue was closed.
Committed patchset #9 (id:300001) as https://chromium.googlesource.com/infra/infra/+/6cb32efbba33bf0bd0fe76a14f80a...
Message was sent while issue was closed.
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): On 2016/10/24 18:09:06, wrengr wrote: > On 2016/10/12 22:58:27, sharu jiang wrote: > > On 2016/10/12 20:15:17, wrengr wrote: > > > On 2016/10/12 01:18:20, sharu jiang wrote: > > > > On 2016/10/11 22:57:43, wrengr wrote: > > > > > What's the purpose of this? Since _fields is a private member of > > namedtuple, > > > > it > > > > > make sense to leave it private so that it's clear when callers are > > depending > > > > on > > > > > the internals of this class in ways they shouldn't be (i.e., which are > > > > fragile). > > > > > > > > This is used in delta test, and later may used to drop empty field in > > Culprit. > > > > > > I still don't understand. Why can't delta use _fields directly? If all we > need > > > is to compare one Culprit to another, then we should provide the appropriate > > > comparison functions rather than having client code rely on internals. > > > > > > This class serves as the API for the library to pass information to the > > > application; thus, it needs to serve the purposes of the library first and > > > foremost. I assume you want to drop fields because of serializing to JSON, > but > > > that's an application concern not a library concern. The library shouldn't > > have > > > to care at all about how the application decides to serialize things. > > > > shouldn't _fields supposed to be private and not used directly? This is not > for > > comparison. > > > > For "that's an application concern not a library concern", in this case, I > think > > the serialization part should be in clients' code (findit_for_*) instead of > > Culprit, to do that, the "fields" may still be needed. > > (FWIW, whenever I talk about "client code" or "clients (of X)", I mean any code > that uses X; as distinct from "users", i.e. humans, who use X. I do not > specifically mean the AppEngine applications. I believe this has been a source > of confusion, so I will avoid the word "client" below.) > > If we're going to be turning Culprit objects into anonymous dicts, then that > should be done in/by the Culprit class itself. That way, code using the class > isn't forced to muck around with the innards of the class— thereby becoming > fragile to changes in/to the class. If code using the class can simply call a > ToDict method (or similar) and expect that the method will return some sort of > JSON-serializable dictionary, then the code using the class becomes independent > of how the class is implemented; which in turn frees us up to change the > internal representation of Culprit objects without needing to change all the > code that uses the Culprit class. > > If we're serializing Culprit objects into JSON, then we surely want to serialize > them in the same way regardless of which AppEngine application happens to be > working with that serialized format. Thus, serialization is not > application-specific, so it should not be implemented by/in application code. It > is much less fragile for application code to assume that objects already know > how to serialize *themselves*, and then just ask them to do so (by calling the > ToDict method) whenever necessary. Yes, in terms of execution control flow, the > application code is where the request for serialization is made, but that > doesn't mean that's where the implementation of serialization should live. +1 for Wren's suggestion here
Message was sent while issue was closed.
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): On 2016/10/24 18:17:26, stgao wrote: > On 2016/10/24 18:09:06, wrengr wrote: > > On 2016/10/12 22:58:27, sharu jiang wrote: > > > On 2016/10/12 20:15:17, wrengr wrote: > > > > On 2016/10/12 01:18:20, sharu jiang wrote: > > > > > On 2016/10/11 22:57:43, wrengr wrote: > > > > > > What's the purpose of this? Since _fields is a private member of > > > namedtuple, > > > > > it > > > > > > make sense to leave it private so that it's clear when callers are > > > depending > > > > > on > > > > > > the internals of this class in ways they shouldn't be (i.e., which are > > > > > fragile). > > > > > > > > > > This is used in delta test, and later may used to drop empty field in > > > Culprit. > > > > > > > > I still don't understand. Why can't delta use _fields directly? If all we > > need > > > > is to compare one Culprit to another, then we should provide the > appropriate > > > > comparison functions rather than having client code rely on internals. > > > > > > > > This class serves as the API for the library to pass information to the > > > > application; thus, it needs to serve the purposes of the library first and > > > > foremost. I assume you want to drop fields because of serializing to JSON, > > but > > > > that's an application concern not a library concern. The library shouldn't > > > have > > > > to care at all about how the application decides to serialize things. > > > > > > shouldn't _fields supposed to be private and not used directly? This is not > > for > > > comparison. > > > > > > For "that's an application concern not a library concern", in this case, I > > think > > > the serialization part should be in clients' code (findit_for_*) instead of > > > Culprit, to do that, the "fields" may still be needed. > > > > (FWIW, whenever I talk about "client code" or "clients (of X)", I mean any > code > > that uses X; as distinct from "users", i.e. humans, who use X. I do not > > specifically mean the AppEngine applications. I believe this has been a source > > of confusion, so I will avoid the word "client" below.) > > > > If we're going to be turning Culprit objects into anonymous dicts, then that > > should be done in/by the Culprit class itself. That way, code using the class > > isn't forced to muck around with the innards of the class— thereby becoming > > fragile to changes in/to the class. If code using the class can simply call a > > ToDict method (or similar) and expect that the method will return some sort of > > JSON-serializable dictionary, then the code using the class becomes > independent > > of how the class is implemented; which in turn frees us up to change the > > internal representation of Culprit objects without needing to change all the > > code that uses the Culprit class. > > > > If we're serializing Culprit objects into JSON, then we surely want to > serialize > > them in the same way regardless of which AppEngine application happens to be > > working with that serialized format. Thus, serialization is not > > application-specific, so it should not be implemented by/in application code. > It > > is much less fragile for application code to assume that objects already know > > how to serialize *themselves*, and then just ask them to do so (by calling the > > ToDict method) whenever necessary. Yes, in terms of execution control flow, > the > > application code is where the request for serialization is made, but that > > doesn't mean that's where the implementation of serialization should live. > > +1 for Wren's suggestion here I agree on what you said about the serializing culprit object into JSON, however I still have one question, should I just use _fields directly in delta test? Shouldn't _smth be private?
Message was sent while issue was closed.
https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py (right): https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:183: for git_hash in [git_hash1, git_hash2]: On 2016/10/24 17:30:36, wrengr wrote: > On 2016/10/15 01:24:47, sharu jiang wrote: > > On 2016/10/14 01:40:40, stgao wrote: > > > So for each crash, we switch the checkout twice. If we have 100 crashes, we > > have > > > to switch checkout 200 times. > > > > > > Can we avoid such an overhead? > > > > This is not for each crash, it is for each batch of 1000 crashes > > (batch_run=True). So I think the overhead should be ok. > > It'd still be good (and easy!) to avoid. I can add it in another CL once this > one finally lands. Can we discuss the solution offline then? The main reasons I want to get delta result batch by batch are 1) we can print intermediate delta result to users (will print that in another cl), especially when users query a long period of crash analyses, it may take long time to see the final results. 2) we can also avoid caching a very big chunk of crashes in memory (or writing to disks), especially when users query a long period of crash analyses.
Message was sent while issue was closed.
On 2016/10/24 19:14:32, sharu jiang wrote: > https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... > File appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py > (right): > > https://codereview.chromium.org/2400283003/diff/100001/appengine/findit/util_... > appengine/findit/util_scripts/crash_queries/delta_test/delta_test.py:183: for > git_hash in [git_hash1, git_hash2]: > On 2016/10/24 17:30:36, wrengr wrote: > > On 2016/10/15 01:24:47, sharu jiang wrote: > > > On 2016/10/14 01:40:40, stgao wrote: > > > > So for each crash, we switch the checkout twice. If we have 100 crashes, > we > > > have > > > > to switch checkout 200 times. > > > > > > > > Can we avoid such an overhead? > > > > > > This is not for each crash, it is for each batch of 1000 crashes > > > (batch_run=True). So I think the overhead should be ok. > > > > It'd still be good (and easy!) to avoid. I can add it in another CL once this > > one finally lands. > > Can we discuss the solution offline then? Definitely. I think I may have misinterpreted where the duplication was coming from (hence the easy patch I had in mind may not be applicable). Still, it'd be good to fix (or at the very least open a feature-request ticket, so it doesn't get lost). Let's talk about it at the meeting tomorrow.
Message was sent while issue was closed.
https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... File appengine/findit/crash/findit_for_chromecrash.py (right): https://codereview.chromium.org/2400283003/diff/40001/appengine/findit/crash/... appengine/findit/crash/findit_for_chromecrash.py:28: def fields(self): On 2016/10/24 18:50:20, sharu jiang wrote: > On 2016/10/24 18:17:26, stgao wrote: > > On 2016/10/24 18:09:06, wrengr wrote: > > > On 2016/10/12 22:58:27, sharu jiang wrote: > > > > On 2016/10/12 20:15:17, wrengr wrote: > > > > > On 2016/10/12 01:18:20, sharu jiang wrote: > > > > > > On 2016/10/11 22:57:43, wrengr wrote: > > > > > > > What's the purpose of this? Since _fields is a private member of > > > > namedtuple, > > > > > > it > > > > > > > make sense to leave it private so that it's clear when callers are > > > > depending > > > > > > on > > > > > > > the internals of this class in ways they shouldn't be (i.e., which > are > > > > > > fragile). > > > > > > > > > > > > This is used in delta test, and later may used to drop empty field in > > > > Culprit. > > > > > > > > > > I still don't understand. Why can't delta use _fields directly? If all > we > > > need > > > > > is to compare one Culprit to another, then we should provide the > > appropriate > > > > > comparison functions rather than having client code rely on internals. > > > > > > > > > > This class serves as the API for the library to pass information to the > > > > > application; thus, it needs to serve the purposes of the library first > and > > > > > foremost. I assume you want to drop fields because of serializing to > JSON, > > > but > > > > > that's an application concern not a library concern. The library > shouldn't > > > > have > > > > > to care at all about how the application decides to serialize things. > > > > > > > > shouldn't _fields supposed to be private and not used directly? This is > not > > > for > > > > comparison. > > > > > > > > For "that's an application concern not a library concern", in this case, I > > > think > > > > the serialization part should be in clients' code (findit_for_*) instead > of > > > > Culprit, to do that, the "fields" may still be needed. > > > > > > (FWIW, whenever I talk about "client code" or "clients (of X)", I mean any > > code > > > that uses X; as distinct from "users", i.e. humans, who use X. I do not > > > specifically mean the AppEngine applications. I believe this has been a > source > > > of confusion, so I will avoid the word "client" below.) > > > > > > If we're going to be turning Culprit objects into anonymous dicts, then that > > > should be done in/by the Culprit class itself. That way, code using the > class > > > isn't forced to muck around with the innards of the class— thereby becoming > > > fragile to changes in/to the class. If code using the class can simply call > a > > > ToDict method (or similar) and expect that the method will return some sort > of > > > JSON-serializable dictionary, then the code using the class becomes > > independent > > > of how the class is implemented; which in turn frees us up to change the > > > internal representation of Culprit objects without needing to change all the > > > code that uses the Culprit class. > > > > > > If we're serializing Culprit objects into JSON, then we surely want to > > serialize > > > them in the same way regardless of which AppEngine application happens to be > > > working with that serialized format. Thus, serialization is not > > > application-specific, so it should not be implemented by/in application > code. > > It > > > is much less fragile for application code to assume that objects already > know > > > how to serialize *themselves*, and then just ask them to do so (by calling > the > > > ToDict method) whenever necessary. Yes, in terms of execution control flow, > > the > > > application code is where the request for serialization is made, but that > > > doesn't mean that's where the implementation of serialization should live. > > > > +1 for Wren's suggestion here > > I agree on what you said about the serializing culprit object into JSON, however > I still have one question, should I just use _fields directly in delta test? > Shouldn't _smth be private? Things with a single underscore are generally "protected" in the Java terminology, whereas things with two underscores are "private" in the Java sense (though double underscore is also used to refer to builtins, so there's some confusion here. Also, I don't know whether Google follows the one- vs two-underscore tradition of the Python community at large). As far as the Delta class goes, there are two salient approaches to take. We can either make the class specific to being a delta of Culprit objects, or we can try to make it generic for some large class of values (e.g., JSON-serializable things). If we make it Culprit-specific, then the Delta class is a "friend" in the C++ terminology, so the fact that it accesses the internals of the Culprit class isn't unexpected. Whenever we update the Culprit class we'll have to make sure Delta is kept in sync; but if Delta is the only "friend" of Culprit, then it's cleaner to have Delta depend directly on the innards of Culprit rather than setting up a bunch of boilerplate for other potential "friend" classes which don't actually exist. On the other hand, if we want Delta to be generic over a large class of values, then it makes more sense IMO to define Delta to operate over dict (and list, set, etc; i.e., Python built-in types, or JSON-serializable types, or something similar). In this case we'd call ToDict before computing the delta, and so Delta doesn't need to know what fields the Culprit object itself has (it can query the dict for its keys). Of course, if we only ever use Delta for the deltas of Culprits, then being generic over all dicts(etc) may make things unnecessarily complicated. (I'm guessing in our case it wouldn't actually be too complicated, but the concern is still worth mentioning.) (A third option is to be generic over types which provide certain helper functions. This would allow avoiding the conversion to Dict, while still letting the Delta class remain agnostic to the internals of the classes it operates over. The idea here is to generalize the notion of "delta" or "equality" to the notion of "zipping" two structures together. My MSE thesis was on how to do this cleanly and efficiently, so if we want to pursue this path I already know how to go about it; but that may be like using a cannon as a flyswatter.) |
