Description

First version of benchmark health report.
The report consists of a backend that gathers the data and three Polymer UI
elements. It currently shows two UIs:
* A listing of benchmarks by master (defaults to ChromiumPerf):
https://dev-sullivan-86772197-dot-chromeperf.appspot.com/benchmark_health_report
* A simple report on the benchmark, including a list of recent alerts and which
bot it is running on:
https://dev-sullivan-86772197-dot-chromeperf.appspot.com/benchmark_health_report?benchmark=v8.browsing_desktop_ignition&master=ChromiumPerf&num_days=30
There are a lot of improvements we can make on the basic health report,
but for this CL I'd like to focus on getting a base so it's easy for
people to shard work and add new features as needed. I added server-side
unit tests and basic instantiation tests for the UI, which should hopefully
make it fast to iterate on.
Will file bugs for additional features and issues, including:
* Getting bisect quality stats into the report
* Getting bug triage stats into the report
* Handling tests that have multiple triage rotations
* Adding warnings about missing data
BUG=catapult:#3224
Review-Url: https://codereview.chromium.org/2704663003
Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapult/+/5720370545f8df69d23bcfbf221f86a282f79ded
Patch Set 1 #
Total comments: 20
Patch Set 2 : addressed review comments #
Total comments: 10
Patch Set 3 : Addressed review comments #
Patch Set 4 : Fix typo in test #

Messages
Total messages: 24 (9 generated)
sullivan@chromium.org changed reviewers: + eakuefner@chromium.org, simonhatch@chromium.org
https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:47: response_values = self._GetResponseValuesForMaster(master)
Current style of the dashboard is to have different responses based on parameters. I think it is okay in this case, but I actually didn't implement any bug/bisect stats yet because those don't map well onto this structure. Will file a bug for that and we can discuss there?

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:74: def _GetSheriffForBenchmark(self, benchmark, master, benchmarks):
This doesn't work if there are multiple sheriffs on a benchmark. An example that's super confusing: the power sheriff rotation is the only thing listed for the system_health benchmark:
https://dev-sullivan-7c48f1f5-dot-chromeperf.appspot.com/benchmark_quality_re...
It shouldn't be too hard to extend it to grab multiple sheriffs, but I'd prefer to do it in a follow-up to make reviews easier.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:87: benchmarks = update_test_suites.FetchCachedTestSuites()
We already have tons of information cached about the tests (which bots they have run on, what is monitored, which things are marked "deprecated" after long periods of no data). This endpoint re-uses that. Interested to hear how Simon and Ethan feel about this after the problems with rampant re-use of the supposedly-cached subtest dict.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
File dashboard/dashboard/elements/benchmark-quality-report-details.html (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
dashboard/dashboard/elements/benchmark-quality-report-details.html:45: extra-columns="{{extraColumns}}"></alerts-table>
This element does the XHR and fills in details for a specific benchmark. You can grok the meat of what it's supposed to do on lines 33-45.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
dashboard/dashboard/elements/benchmark-quality-report-details.html:66: extraColumns: {
This makes alerts show as performance regression alerts (there are other fields for data stoppage alerts). Later on, we could consider customizing this, for example to show a column listing bisect results.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
File dashboard/dashboard/elements/benchmark-quality-report-list.html (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
dashboard/dashboard/elements/benchmark-quality-report-list.html:42: </ul>
This element does the XHR and populates the UI for a simple list of benchmarks. You can see the core HTML here is just that.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
File dashboard/dashboard/elements/benchmark-quality-report-page-test.html (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
dashboard/dashboard/elements/benchmark-quality-report-page-test.html:30: test('instantiation_list', function() {
I put both tests in a single file; I found it was easier to hack on this way. Open to differing opinions.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
File dashboard/dashboard/elements/benchmark-quality-report-page.html (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/element...
dashboard/dashboard/elements/benchmark-quality-report-page.html:25: </template>
This element parses the URI parameters and shows either a benchmark list or a benchmark details element based on that. This way we only do the URI param parsing once, but the code that makes XHRs and populates the UI with their responses is separated per endpoint.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/static/...
File dashboard/dashboard/static/benchmark_quality_report.html (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/static/...
dashboard/dashboard/static/benchmark_quality_report.html:1: <!DOCTYPE html>
This is just boilerplate. I think long-term we could have an elegant solution around cloud endpoints + Polymer app-route, but the one-time boilerplate cost is not too bad in the short term considering how few endpoints we add.
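The list-vs-details split the page element implements can be sketched roughly like this. This is a hypothetical standalone function, not the actual Polymer element code; the parameter names (benchmark, master, num_days) and the defaults (ChromiumPerf, 30 days) are taken from the URLs and description in this CL, and the element names match the patch set 1 filenames:

```javascript
// Hedged sketch: route to the list element or the details element based
// on the URI query parameters, parsing them only once as the review
// comment above describes.
function chooseView(queryString) {
  const params = new URLSearchParams(queryString);
  if (params.has('benchmark')) {
    // A benchmark was specified: show the per-benchmark details element.
    return {
      view: 'benchmark-quality-report-details',
      benchmark: params.get('benchmark'),
      master: params.get('master') || 'ChromiumPerf',
      numDays: Number(params.get('num_days') || 30),
    };
  }
  // No benchmark: show the listing of benchmarks for the master.
  return {
    view: 'benchmark-quality-report-list',
    master: params.get('master') || 'ChromiumPerf',
  };
}

console.log(chooseView('?benchmark=v8.browsing_desktop_ignition&master=ChromiumPerf&num_days=30'));
console.log(chooseView(''));
```

The point is only that the parameter parsing lives in one place, while each child element owns its own XHR and rendering.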
On 2017/02/17 16:30:13, sullivan wrote:
> [inline review comments quoted in full; see the message above]

Just a naming bikeshed: can we use the term "benchmark health" instead of benchmark quality? :P The term "health" makes it clear that it needs constant maintenance to avoid getting into the "unhealthy" state, unlike "quality", which could be a one-time thing.
https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:47: response_values = self._GetResponseValuesForMaster(master)
On 2017/02/17 16:30:13, sullivan wrote:
> Current style of the dashboard is to have different responses based on parameters. [...] Will file a bug for that and we can discuss there?

sgtm

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:74: def _GetSheriffForBenchmark(self, benchmark, master, benchmarks):
On 2017/02/17 16:30:13, sullivan wrote:
> This doesn't work if there are multiple sheriffs on a benchmark. [...] It shouldn't be too hard to extend it to grab multiple sheriffs, but I'd prefer to do it in a follow up to make reviews easier.

Ah yeah, was wondering why some of the pages returned fewer alerts than expected. Add a TODO just to note it?

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:87: benchmarks = update_test_suites.FetchCachedTestSuites()
On 2017/02/17 16:30:13, sullivan wrote:
> We already have tons of information cached about the tests [...] Interested to hear how Simon and Ethan feel about this after the problems with rampant re-use of the supposedly-cached subtest dict.

Hadn't seen this, didn't know how bad it was. Don't really like this subtest-dict-like pattern of storing some massive monolithic object with all the data in it and then retrieving it just to get at some subset. This thing seems to need to loop, querying 2000 objects at a time, and does this twice in the overall update. Poking through the datastore shows the externally visible version is broken into 3 parts, which means it's somewhere between 2-3 MB? And all you're interested in here is a list of benchmark names for a master. Will file a bug to investigate the different uses more and potentially break this down like we did for the subtest dict.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report_test.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report_test.py:115: 'num_days': '30',
Maybe add some anomalies outside of the num_days range to get coverage of that param?
I updated the name to benchmark health report as Ned suggested and addressed the review comments. PTAL?

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:74: def _GetSheriffForBenchmark(self, benchmark, master, benchmarks):
On 2017/02/17 18:03:47, shatch wrote:
> Ah yeah was wondering why some of the pages returned fewer alerts than expected. Add TODO just to note it?

Done.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:87: benchmarks = update_test_suites.FetchCachedTestSuites()
On 2017/02/17 18:03:47, shatch wrote:
> Hadn't seen this, didn't know how bad it was. [...] Will file a bug to investigate the different uses more and potentially break this down like we did for subtest dict.

Sounds good! This might turn into a good test case for how we could do things differently, both in the current design and with a different data store.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report_test.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report_test.py:115: 'num_days': '30',
On 2017/02/17 18:03:47, shatch wrote:
> Maybe add some anomalies outside of the num-days range to get coverage of that param?

It's covered below in testPost_BenchmarkArgumentNumDaysArgument_ListsCorrectAlerts.
lgtm from my end, really want to start hacking on this

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report_test.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report_test.py:115: 'num_days': '30',
On 2017/02/17 18:28:43, sullivan wrote:
> It's covered below in testPost_BenchmarkArgumentNumDaysArgument_ListsCorrectAlerts

Oops :p
The CQ bit was checked by sullivan@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:47: response_values = self._GetResponseValuesForMaster(master)
On 2017/02/17 at 18:03:47, shatch wrote:
> sgtm

Caveat that I'm having a little bit of trouble parsing this, but what about doing something like we do in /list_tests, where there's a |type| request parameter and we expect different parameters based on that?

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:87: benchmarks = update_test_suites.FetchCachedTestSuites()
On 2017/02/17 at 18:28:43, sullivan wrote:
> Sounds good! This might turn into a good test case for how we could do things differently, both in the current design and with a different data store.

I agree with both of you that this is a place where we could probably do something better. Caching is a nice performance optimization, but ideally we would do something fast enough that caching is not _required_ for functionality, and ideally we would be able to enumerate the situations where we need a list of suites, and why. I'm as iffy about the ambiguity of what this dict is actually for as I was about subtest dicts, specifically because of how frequently this thing is used throughout add_point. So I'm in favor of an analysis like Simon proposes. Maybe I'll take that on after I've closed out the subtest dict refactor.

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ben...
File dashboard/dashboard/benchmark_health_report.py (right):

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ben...
dashboard/dashboard/benchmark_health_report.py:31: num_days: With benchmark, number of days to get data for.
It looks like this is true based on the code below, but does this mean the number of days from now backwards?

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ben...
dashboard/dashboard/benchmark_health_report.py:58: query = query.filter(anomaly.Anomaly.is_improvement == False)
Jessi is trying to do similar stuff in speed releasing (querying for Anomaly entities by what rotation/what benchmark/etc.). It seems like this being performant depends on the many indices we have for Anomalies -- is that the status quo where we want to be long term?

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele...
File dashboard/dashboard/elements/benchmark-health-report-details.html (right):

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele...
dashboard/dashboard/elements/benchmark-health-report-details.html:109: return alerts.map(a => a.bug_id).reduce(function(acc, val) {
I think this would be slightly more readable if you used a for..of loop.

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele...
dashboard/dashboard/elements/benchmark-health-report-details.html:131: // Using != and == here and below since bugId could be null or undefined
Does it worry us that bugId could be null _or_ undefined? When is it either?

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele...
dashboard/dashboard/elements/benchmark-health-report-details.html:145: function(response) {
nit: I think you might be able to avoid the whole .bind(this) thing by writing this as an arrow with braces, like response => {...}
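The map().reduce()-to-for..of suggestion from this review can be sketched as follows. The alert shape ({bug_id: ...}) is inferred from the surrounding discussion; the function name and the dedupe-and-skip-null behavior are assumptions for illustration, not the element's actual code:

```javascript
// Hedged sketch: collect the distinct bug IDs from an alerts array with
// a for..of loop instead of chaining map() and reduce().
function uniqueBugIds(alerts) {
  const seen = new Set();
  for (const alert of alerts) {
    // bug_id may be null (untriaged) or missing entirely; skip those.
    if (alert.bug_id != null) {
      seen.add(alert.bug_id);
    }
  }
  return [...seen];
}

const alerts = [
  {bug_id: 123}, {bug_id: null}, {bug_id: 123}, {bug_id: 456},
];
console.log(uniqueBugIds(alerts));  // [ 123, 456 ]
```

The loop body makes the skip-and-dedupe intent explicit, which is the readability gain the comment is after.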
The CQ bit was unchecked by eakuefner@chromium.org
Sorry, looks like my comments were mailed after the box was ticked. I've unticked the box.
Didn't quite get to all the comments; new patchset not yet posted.

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
File dashboard/dashboard/benchmark_quality_report.py (right):

https://codereview.chromium.org/2704663003/diff/1/dashboard/dashboard/benchma...
dashboard/dashboard/benchmark_quality_report.py:47: response_values = self._GetResponseValuesForMaster(master)
On 2017/02/17 19:11:05, eakuefner wrote:
> Caveat that I'm having a little bit of trouble parsing this, but what about doing something like we do in /list_tests, where there's a |type| request parameter and we expect different parameters based on that?

I've been meaning to write up a bug/doc for discussion; I just wrote out https://github.com/catapult-project/catapult/issues/3237

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ben...
File dashboard/dashboard/benchmark_health_report.py (right):

https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ben...
dashboard/dashboard/benchmark_health_report.py:58: query = query.filter(anomaly.Anomaly.is_improvement == False)
On 2017/02/17 19:11:06, eakuefner wrote:
> Jessi is trying to do similar stuff in speed releasing (querying for Anomaly entities by what rotation/what benchmark/etc.). It seems like this being performant depends on the many indices we have for Anomalies -- is that status quo where we want to be long term?

A couple of things are wrapped up in here:
* We want to show only Anomalies for a given master. Specifically, there is tons of ChromeOS data we'd want to exclude for our analyses, and also v8 data with the same benchmark name but run in a totally different context (outside the browser).
* We do want to show multiple sheriffs, for things like system_health.common_desktop or v8 tests that do have them.

Indexes are complicated in datastore! In this case, they would be a clear win because:
* Unlike Row, Anomaly/Alert has few enough entities that there wouldn't be a huge stored-data cost (adding an index to Row is multiple terabytes of additional stored data).
* The largest cost of a query is the number of entities returned. It's true that each additional index adds overhead, but Row and Test are the biggest places where that matters, because we generally want to be able to query more, faster. For Anomaly, if we could just grab the data for a benchmark, it'd be faster overall to use an index because fewer entities would be returned.

The problem with using an index here is simply that the Anomaly does not store the master and benchmark, which are what we want to query on. I was not planning on adding this, but now that you mention it I think it would speed up this particular query quite a bit, and also remove the complexity around querying by sheriff and filtering. I'll do this in a follow-up: https://github.com/catapult-project/catapult/issues/3238

What type of query is Jessi hitting that does not have indexes?
https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ben... File dashboard/dashboard/benchmark_health_report.py (right): https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ben... dashboard/dashboard/benchmark_health_report.py:58: query = query.filter(anomaly.Anomaly.is_improvement == False) On 2017/02/17 at 20:01:02, sullivan wrote: > On 2017/02/17 19:11:06, eakuefner wrote: > > Jessi is trying to do similar stuff in speed releasing (querying for Anomaly > > entities by what rotation/what benchmark/etc.). It seems like this being > > performant depends on the many indices we have for Anomalies -- is that status > > quo where we want to be long term? > > A couple of things wrapped up in here: > * We want to actually only show Anomalies for a given master. Specifically there is tons of ChromeOS data we'd want to exclude for our analyses, and also v8 data with the same benchmark name but run in a totally different context (outside the browser). > * We do want to show multiple sheriffs, for things like system_health.common_desktop or v8 tests that do have them. > > Indexes are complicated in data store! In this case, they would be a clear win because: > * Unlike Row, Anomaly/Alert has few enough entities that there wouldn't be a huge stored data cost (adding an index to Row is multiple terabytes of additional stored data) > * The largest cost of a query is the number of entities returned. It's true that each additional index adds overhead, but Row and Test are the biggest places where that matters because we generally want to be able to query more faster. For Anomaly, if we could just grab the data for a benchmark it'd be faster overall to use an index because fewer entities would be returned. > > The problem with using an index here is simply that the Anomaly does not store the master and benchmark, which are what we want to query on. 
> I was not planning on adding this, but now that you mention it I think it would speed up this particular query quite a bit, and also remove the complexity around querying by sheriffing and filtering. I'll do this in a follow up: https://github.com/catapult-project/catapult/issues/3238 > What type of query is Jessi hitting that does not have indexes? There's a doc with some notes here https://docs.google.com/document/d/1p5Yn3cE5sjnwR18lgH3nAj3KLEdVpirLuXLqkK2nm... (internal, but could be Chromium in principle; there's nothing sensitive) Anyway, SGTM to do what you proposed in a followup.
Still addressing comments... https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele... File dashboard/dashboard/elements/benchmark-health-report-details.html (right): https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele... dashboard/dashboard/elements/benchmark-health-report-details.html:131: // Using != and == here and below since bugId could be null or undefined On 2017/02/17 19:11:06, eakuefner wrote: > Does it worry us that bugId could be null _or_ undefined? When is it either? So basically we do an XHR to /benchmark_health_report which does a query and uses alerts.py to process. Currently if there is no bug, bug_id would be set to None in alerts.py and then JSON-serialized to null. But if there was a change to functionality, the bug_id parameter entry in the dict could be missing and then it would be undefined. It's hard to guarantee we won't have that change in behavior. It's also strange to me to have a rule that says we should have strict equality, when we also have code that says stuff like: if (bugId) // Just checks if it's falsey if (bugId >= 0) // this casts to Number, so if bugId is null it's checking if 0 >= 0 so it's true Why the restrictive checks if a statement happens to have == or != in it, but not in these other cases?
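The null-vs-undefined distinction sullivan describes above can be sketched as follows (a standalone illustration, not the actual dashboard code; the `hasBug` helper and the `bugId` values are hypothetical):

```javascript
// A loose `!= null` check matches both null (bug_id explicitly None in the
// JSON response) and undefined (bug_id key missing from the dict entirely),
// while still accepting a legitimate bug id of 0.
function hasBug(bugId) {
  return bugId != null;
}

// The other checks mentioned in the thread behave differently:
//   if (bugId)       -> truthiness: a bug id of 0 would count as "no bug"
//   if (bugId >= 0)  -> null coerces to 0 in numeric comparison, so null passes
console.log(hasBug(null), hasBug(undefined), hasBug(0));  // false false true
console.log(null >= 0);       // true, because null is coerced to 0
console.log(undefined >= 0);  // false, because undefined becomes NaN
```

This is the asymmetry the comment points at: `>=` and truthiness checks already do implicit coercion, so a blanket rule against `==`/`!=` would be stricter than the surrounding code.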
Okay, this is ready for another look, all review comments should be addressed. https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele... File dashboard/dashboard/elements/benchmark-health-report-details.html (right): https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele... dashboard/dashboard/elements/benchmark-health-report-details.html:109: return alerts.map(a => a.bug_id).reduce(function(acc, val) { On 2017/02/17 19:11:06, eakuefner wrote: > I think this would be slightly more readable if you use a for..of loop. Done. https://codereview.chromium.org/2704663003/diff/20001/dashboard/dashboard/ele... dashboard/dashboard/elements/benchmark-health-report-details.html:145: function(response) { On 2017/02/17 19:11:06, eakuefner wrote: > nit: I think you might be able to avoid the whole .bind(this) thing by writing > this as an arrow with braces, like response => {...} Thanks! Done!
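The reduce-to-for..of rewrite discussed above might look like this (a sketch only: the `bug_id` field name comes from the quoted line 109 snippet, but the dedup logic is assumed since the element's actual body isn't shown in the thread):

```javascript
// Collect the distinct bug ids from a list of alerts, skipping alerts
// that have no bug (`!= null` covers both null and undefined bug_id).
function uniqueBugIds(alerts) {
  const bugIds = [];
  for (const alert of alerts) {
    if (alert.bug_id != null && !bugIds.includes(alert.bug_id)) {
      bugIds.push(alert.bug_id);
    }
  }
  return bugIds;
}

console.log(uniqueBugIds([{bug_id: 1}, {bug_id: null}, {bug_id: 1}, {}]));
// [1]
```

The second nit works the same way: rewriting the XHR callback as `response => {...}` closes over the enclosing `this` lexically, so the explicit `.bind(this)` is no longer needed.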
lgtm thanks
The CQ bit was checked by sullivan@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from simonhatch@chromium.org, eakuefner@chromium.org Link to the patchset: https://codereview.chromium.org/2704663003/#ps60001 (title: "Fix typo in test")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 60001, "attempt_start_ts": 1487370068634530, "parent_rev": "dd96d49660300073d31db5f009e9cd1cb08dacbf", "commit_rev": "5720370545f8df69d23bcfbf221f86a282f79ded"}
Message was sent while issue was closed.
Description was changed from ========== First version of benchmark health report. The report consists of a backend which gathers data and 3 polymer UI elements. It currently shows 2 UIs: * A listing of benchmarks by master (defaults to ChromiumPerf): https://dev-sullivan-86772197-dot-chromeperf.appspot.com/benchmark_health_report * A simple report on the benchmark, including a list of recent alerts and which bot it is running on: https://dev-sullivan-86772197-dot-chromeperf.appspot.com/benchmark_health_rep... There are a lot of improvements we can make on the basic health report, but for this CL I'd like to focus on getting a base so it's easy for people to shard work and add new features as needed. I added server-side unit tests and basic instantiation tests for the UI, which should hopefully make it fast to iterate on. Will file bugs for additional features and issues, including: * Getting bisect quality stats into the report * Getting bug triage stats into the report * Handling tests that have multiple triage rotations * Adding warnings about missing data BUG=catapult:#3224 ========== to ========== First version of benchmark health report. The report consists of a backend which gathers data and 3 polymer UI elements. It currently shows 2 UIs: * A listing of benchmarks by master (defaults to ChromiumPerf): https://dev-sullivan-86772197-dot-chromeperf.appspot.com/benchmark_health_report * A simple report on the benchmark, including a list of recent alerts and which bot it is running on: https://dev-sullivan-86772197-dot-chromeperf.appspot.com/benchmark_health_rep... There are a lot of improvements we can make on the basic health report, but for this CL I'd like to focus on getting a base so it's easy for people to shard work and add new features as needed. I added server-side unit tests and basic instantiation tests for the UI, which should hopefully make it fast to iterate on. 
Will file bugs for additional features and issues, including: * Getting bisect quality stats into the report * Getting bug triage stats into the report * Handling tests that have multiple triage rotations * Adding warnings about missing data BUG=catapult:#3224 Review-Url: https://codereview.chromium.org/2704663003 Committed: https://chromium.googlesource.com/external/github.com/catapult-project/catapu... ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as https://chromium.googlesource.com/external/github.com/catapult-project/catapu... |