dashboard/dashboard/pinpoint/models/job.py - Issue 3013153002: [pinpoint] Use _JobState.Differences() in _JobState.Explore().

Side by Side Diff: dashboard/dashboard/pinpoint/models/job.py

Issue 3013153002: [pinpoint] Use _JobState.Differences() in _JobState.Explore(). (Closed)

Patch Set: Created 3 years, 3 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

OLD	NEW
1 # Copyright 2016 The Chromium Authors. All rights reserved.	1 # Copyright 2016 The Chromium Authors. All rights reserved.

2 # Use of this source code is governed by a BSD-style license that can be	2 # Use of this source code is governed by a BSD-style license that can be

3 # found in the LICENSE file.	3 # found in the LICENSE file.

4	4

5 import collections	5 import collections

6 import logging	6 import logging

7 import os	7 import os

8 import traceback	8 import traceback

9	9

10 from google.appengine.api import taskqueue	10 from google.appengine.api import taskqueue

(...skipping 219 matching lines...) Expand 10 before \| Expand all \| Expand 10 after Loading...
230 self._changes.append(change)	230 self._changes.append(change)

231	231

232 self._attempts[change] = []	232 self._attempts[change] = []

233 for _ in xrange(self._repeat_count):	233 for _ in xrange(self._repeat_count):

234 self.AddAttempt(change)	234 self.AddAttempt(change)

235	235

236 def Explore(self):	236 def Explore(self):

237 """Compare Changes and bisect by adding additional Changes as needed.	237 """Compare Changes and bisect by adding additional Changes as needed.

238	238

239 For every pair of adjacent Changes, compare their results as probability	239 For every pair of adjacent Changes, compare their results as probability

240 distributions. If more information is needed to establish statistical	240 distributions. If the results are different, find the midpoint of the

241 confidence, add an additional Attempt. If the results are different, find	241 Changes and add it to the Job.

242 the midpoint of the Changes and add it to the Job.

243	242

244 The midpoint can only be added if the second Change represents a commit that	243 The midpoint can only be added if the second Change represents a commit that

245 comes after the first Change. Otherwise, this method won't explore further.	244 comes after the first Change. Otherwise, this method won't explore further.

246 For example, if Change A is repo@abc, and Change B is repo@abc + patch,	245 For example, if Change A is repo@abc, and Change B is repo@abc + patch,

247 there's no way to pick additional Changes to try.	246 there's no way to pick additional Changes to try.

248 """	247 """

249 # Compare every pair of Changes.	248 # This loop adds Changes to the _changes list while looping through it.
	perezju 2017/09/19 13:54:52 Actually, it doesn't. Because of "tuple" you're no Actually, it doesn't. Because of "tuple" you're now iterating through a copy of the original list. Which is fine! perezju 2017/09/19 13:59:04 Ah, ignore both my previous comments. I just noted Show quoted text On 2017/09/19 13:54:52, perezju wrote: > Actually, it doesn't. Because of "tuple" you're now iterating through a copy of > the original list. Which is fine! Ah, ignore both my previous comments. I just noted why you need to yield the index, and your comment here is indeed important. dtu 2017/09/19 18:21:53 Yep, the index is needed for the insertion. Since Show quoted text On 2017/09/19 13:59:04, perezju wrote: > On 2017/09/19 13:54:52, perezju wrote: > > Actually, it doesn't. Because of "tuple" you're now iterating through a copy > of > > the original list. Which is fine! > > Ah, ignore both my previous comments. I just noted why you need to yield the > index, and your comment here is indeed important. Yep, the index is needed for the insertion. Since inserting elements simultaneously uses and modifies the array indices, it's important to go backwards.
250 # TODO: The list may Change while iterating through it.	249 # However, the loop index goes in reverse order and Changes are only added

251 # TODO: Use JobState.Differences().	250 # after the loop index, so the loop never encounters the modified items.

252 for index in xrange(1, len(self._changes)):	251 for index, change_b in reversed(tuple(self.Differences())):

253 change_a = self._changes[index - 1]	252 change_a = self._changes[index - 1]

254 change_b = self._changes[index]

255	253

256 comparison_result = self._Compare(change_a, change_b)	254 try:

257 if comparison_result == _DIFFERENT:	255 midpoint = change_module.Change.Midpoint(change_a, change_b)

258 # Different: Bisect and add an additional Change to the job.	256 except change_module.NonLinearError:

259 try:

260 midpoint = change_module.Change.Midpoint(change_a, change_b)

261 except change_module.NonLinearError:

262 continue

263 logging.info('Adding Change %s.', midpoint)

264 self.AddChange(midpoint, index)

265 elif comparison_result == _SAME:

266 # The same: Do nothing.

267 continue	257 continue

268 elif comparison_result == _UNKNOWN:	258

269 # Unknown: Add an Attempt to the Change with the fewest Attempts.	259 logging.info('Adding Change %s.', midpoint)

270 change = min(change_a, change_b, key=lambda c: len(self._attempts[c]))	260 self.AddChange(midpoint, index)
	perezju 2017/09/19 13:59:04 Do we really need AddChange to be a public method? Do we really need AddChange to be a public method? dtu 2017/09/19 18:21:53 One of the original interactivity feature plans wa Show quoted text On 2017/09/19 13:59:04, perezju wrote: > Do we really need AddChange to be a public method? One of the original interactivity feature plans was for the user to be able to guide the course of the bisect by saying "please run this commit specifically." While we might not expose something as low-level as AddChange() to the user, we will probably want to use it to implement that kind of functionality.
271 self.AddAttempt(change)

272	261

273 def ScheduleWork(self):	262 def ScheduleWork(self):

274 work_left = False	263 work_left = False

275 for attempts in self._attempts.itervalues():	264 for attempts in self._attempts.itervalues():

276 for attempt in attempts:	265 for attempt in attempts:

277 if attempt.completed:	266 if attempt.completed:

278 continue	267 continue

279	268

280 attempt.ScheduleWork()	269 attempt.ScheduleWork()

281 work_left = True	270 work_left = True

282	271

283 return work_left	272 return work_left

284	273

285 def Differences(self):	274 def Differences(self):

	275 """Compares every pair of Changes and yields ones with different results.

	276

	277 This method loops through every pair of adjacent Changes. If they have

	278 statistically different results, this method yields the latter one (which is

	279 assumed to have caused the difference).

	280

	281 Yields:

	282 Tuples of (change_index, Change).

	283 """

286 for index in xrange(1, len(self._changes)):	284 for index in xrange(1, len(self._changes)):

287 change_a = self._changes[index - 1]	285 change_a = self._changes[index - 1]

288 change_b = self._changes[index]	286 change_b = self._changes[index]

289 if self._Compare(change_a, change_b) == _DIFFERENT:	287 if self._Compare(change_a, change_b) == _DIFFERENT:

290 yield index, change_b	288 yield index, change_b
	perezju 2017/09/19 13:54:52 Looking at what clients actually want, maybe it wo Looking at what clients actually want, maybe it would be neater if you yield (change_a, change_b)? dtu 2017/09/19 18:21:53 We will also need the index for AsDict(), below on Show quoted text On 2017/09/19 13:54:52, perezju wrote: > Looking at what clients actually want, maybe it would be neater if you yield > (change_a, change_b)? We will also need the index for AsDict(), below on line 316.
291	289

292 def AsDict(self):	290 def AsDict(self):

293 comparisons = []	291 comparisons = []

294 for index in xrange(1, len(self._changes)):	292 for index in xrange(1, len(self._changes)):

295 change_a = self._changes[index - 1]	293 change_a = self._changes[index - 1]

296 change_b = self._changes[index]	294 change_b = self._changes[index]

297 comparisons.append(self._Compare(change_a, change_b))	295 comparisons.append(self._Compare(change_a, change_b))

298	296

299 # result_values is a 3D array. result_values[change][quest] is a list of	297 # result_values is a 3D array. result_values[change][quest] is a list of

300 # all the result values for that Change and Quest.	298 # all the result values for that Change and Quest.

(...skipping 70 matching lines...) Expand 10 before \| Expand all \| Expand 10 after Loading...
371	369

372 try:	370 try:

373 p_value = mann_whitney_u.MannWhitneyU(values_a, values_b)	371 p_value = mann_whitney_u.MannWhitneyU(values_a, values_b)

374 except ValueError:	372 except ValueError:

375 return _UNKNOWN	373 return _UNKNOWN

376	374

377 if p_value < _SIGNIFICANCE_LEVEL:	375 if p_value < _SIGNIFICANCE_LEVEL:

378 return _DIFFERENT	376 return _DIFFERENT

379 else:	377 else:

380 return _UNKNOWN	378 return _UNKNOWN

OLD	NEW

« no previous file with comments | « no previous file | no next file » | no next file with comments »