Created: 8 years, 9 months ago by stuartmorgan
Modified: 8 years, 7 months ago
CC: chromium-reviews, MAD
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description
Add a minimum byte count to metrics log serialization
Add a minimum number of bytes when serializing logs so that a large number of very small logs won't be capped the same way as a large number of large logs. This ensures that a reasonable amount of metrics history will be kept even when logs can potentially be cut very frequently, such as on Android.
In the short term this may slightly increase the number of logs that are retained, since the number of logs kept by this method is the same or greater than what was kept before.
BUG=None
TEST=None
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=138081
Patch Set 1 #
Total comments: 2
Patch Set 2 : Skip large individual logs #
Total comments: 1
Patch Set 3 : Use the byte count as a minimum #
Total comments: 21
Patch Set 4 : Address review comments #
Patch Set 5 : Change limit terminology, |start| decrement logic #
Total comments: 2
Patch Set 6 : Re-add safety net return, change constant names to avoid min/max #
Messages
Total messages: 26 (0 generated)
Jim, I thought we'd had an email conversation about this at some point, but I can't find it, so I may have discussed it with someone else. If that's the case, and you feel like moving away from counts is a bad idea, we should discuss offline to figure out how to handle the many-small-logs case.

+Ilya since this touches on the XML vs. proto logic a bit.

Background on the 300,000 byte number is that I logged some XML log sizes in a local build and came up with the following:
- An initial log was ~13,000 bytes
- A standard (30 min) ongoing log with moderate usage (but with a very low tab count) for the whole half hour was ~30,000 bytes

Padding the initial log a bit to ~15,000 (since I wasn't using the browser for the first minute, and had no stored session) and multiplying by the count limit of 20 gives 300,000. Assuming my 30,000 was a reasonable value to work from, 30,000 * the count of 8 = 240,000. Since that was in the same ballpark, I just combined them into a single number. Worth noting is that the single-log max is currently 50,000, so we'd always be able to store at least 6 logs of each type.

I'm not attached to these numbers, so if we want to revise down (or up) for one or both log types, just say the word.
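[Editor's note: a quick sketch of the arithmetic described above, for reference only. The constant names are invented for illustration and are not the ones in the CL.]

#include <stddef.h>

// Illustrative constants; the values come from the measurements above.
const size_t kApproxInitialLogBytes = 15000;  // ~13,000 measured, padded up.
const size_t kApproxOngoingLogBytes = 30000;  // ~30 minutes of moderate use.
const size_t kInitialLogCountLimit = 20;      // existing count cap (initial logs).
const size_t kOngoingLogCountLimit = 8;       // existing count cap (ongoing logs).
// 15,000 * 20 = 300,000 and 30,000 * 8 = 240,000 are close enough to share
// one number, so 300,000 is proposed for both log types. With the existing
// 50,000-byte single-log cap, 300,000 / 50,000 = 6 logs of each type can
// always be stored.
const size_t kStorageByteLimitPerLogType = 300000;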
LGTM (but please make sure that Jim is also happy with this change).

https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
chrome/browser/metrics/metrics_log_serializer.cc:135: // have reached this point, so at least one should always be written.
nit: This comment makes it sound like the line below should be DCHECK_LT.
https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
chrome/browser/metrics/metrics_log_serializer.cc:135: // have reached this point, so at least one should always be written.
On 2012/03/01 22:58:21, Ilya Sherman wrote:
> nit: This comment makes it sound like the line below should be DCHECK_LT.

In fact... I don't see why you are sure this is true. The limit you set is not enforced anywhere else (the constant is local to this file), so if I set it to (for example) 1, then I'm pretty sure your comment would be violated.

IF you have an expectation that some other constant is related to your 300000 value, you should use that constant, and not have an independently defined value. To put it another way: I should be able to change values of the constants, and things should just work (I shouldn't have to adjust several things in sync).
On 2012/03/02 17:47:00, jar wrote:
> In fact... I don't see why you are sure this is true. The limit you set is not enforced anywhere else (the constant is local to this file), so if I set it to (for example) 1, then I'm pretty sure your comment would be violated.

The purpose of the DCHECK is to make it apparent to someone testing a new value that they have chosen poorly. In your example, yes, setting the limit to 1 byte would fail the DCHECK. And I also don't think someone should check that in, and thus I don't see it as a bug that it would DCHECK.

> IF you have an expectation that some other constant is related to your 300000 value, you should use that constant, and not have an independently defined value. To put it another way: I should be able to change values of the constants, and things should just work (I shouldn't have to adjust several things in sync).

I feel like if our logs become enormous and someone decides to address that by changing the 50,000 per-log cap to 500,000, they should revisit the 300,000 value too. I disagree that any two combinations of the per-log constant and total constant are meaningful and should therefore "just work". I don't think that just multiplying the max individual log value by some constant to get the max value is the behavior we want either though, so I don't think we should have one constant instead of two.

If you'd rather that having nonsensical combinations not DCHECK though, I can remove it.
I guess I didn't convey my point well.

The comment, as shown, is not valid, and there is no reason to believe it is valid. As such, it is deceptive to readability to state it.

The DCHECK() can be made to fire by my extreme example, which only demonstrates that the DCHECK is not an invariant implied by other surrounding/distant code. As a result, other code should be changed to make the DCHECK true (since it is clear you want it to be true, as it increases simplicity of this area of code).

The DCHECK will almost never fire in any case, and debug builds just about never run with UMA uploading turned on. IF a developer puts in a "bad value," there is a significant chance it won't push them across the border, and even fire the DCHECK for them :-(. As a result, you get almost no test coverage from it, and hence I doubt that it is serving the purpose you intended. It is only adding a deception.

IMO, DCHECKs should be machine verifiable comments. In this case, the comment, and the DCHECK are wrong (although the statements are IMO desirable).

Why can't we change the code so that the DCHECK is indeed a guaranteed invariant, implied by the code?

On 2012/03/05 09:25:59, stuartmorgan wrote:
> On 2012/03/02 17:47:00, jar wrote:
> > In fact... I don't see why you are sure this is true. The limit you set is not enforced anywhere else (the constant is local to this file), so if I set it to (for example) 1, then I'm pretty sure your comment would be violated.
>
> The purpose of the DCHECK is to make it apparent to someone testing a new value that they have chosen poorly. In your example, yes, setting the limit to 1 byte would fail the DCHECK. And I also don't think someone should check that in, and thus I don't see it as a bug that it would DCHECK.
>
> > IF you have an expectation that some other constant is related to your 300000 value, you should use that constant, and not have an independently defined value. To put it another way: I should be able to change values of the constants, and things should just work (I shouldn't have to adjust several things in sync).
>
> I feel like if our logs become enormous and someone decides to address that by changing the 50,000 per-log cap to 500,000, they should revisit the 300,000 value too. I disagree that any two combinations of the per-log constant and total constant are meaningful and should therefore "just work". I don't think that just multiplying the max individual log value by some constant to get the max value is the behavior we want either though, so I don't think we should have one constant instead of two.
>
> If you'd rather that having nonsensical combinations not DCHECK though, I can remove it.
On 2012/03/06 03:58:52, jar wrote: > I guess I didn't convey my point well. > > The comment as shown, is not valid, and there is no reason to believe it is > valid. As such, it is deceptive to readability to state it. > > The DCHECK() can be made to fire by my extreme example, which only demonstrates > that the DCHECK is not an invariant implied by other surrounding/distant code. > As a result, other code should be changed to make the DCHECK true (since it is > clear you want it to be true, as it increases simplicity of this area of code). > > The DCHECK will almost never fire in any case, and debug builds just about never > run with UMA uploading turned on. IF a developer puts in a "bad value," there > is a significant chance it won't push tem across the border, and even fire the > DCHECK for them :-(. As a result, you get almost no test coverage from it, and > hence I doubt that it is serving the purpose you intended. It is only adding a > deception. > > IMO, DCHECKs should be machine verifiable comments. In this case, the comment, > and the DCHECK are wrong (although the statements are IMO desirable). > > Why can't we change the code so that the DCHECK is indeed a guaranteed > invariant, implied by the code? How about a COMPILE_ASSERT() to enforce the relationship between the two constants? > On 2012/03/05 09:25:59, stuartmorgan wrote: > > On 2012/03/02 17:47:00, jar wrote: > > > In fact... I don't see why you are sure this is true. The limit you set is > > not > > > enforced anywhere else (the constant is local to this file), so if I set it > to > > > (for example) 1, then I'm pretty sure your comment would be violated. > > > > The purpose of the DCHECK is to make it apparent to someone testing a new > value > > that they have chosen poorly. In your example, yes, setting the limit to 1 > byte > > would fail the DCHECK. And I also don't think someone should check that in, > and > > thus I don't see it as a bug that it would DCHECK. > > > > > IF you have an expectation that some other constant is related to your > 300000 > > > value, you should use that constant, and not have an independently defined > > > value. To put it another way: I should be able to change values of the > > > constants, and things should just work (I shouldn't have to adjust several > > > things in sync). > > > > I feel like if our logs become enormous and someone decides to address that by > > changing the 50,000 per-log cap to 500,000, they should revisit the 300,000 > > value too. I disagree that any two combinations of the per-log constant and > > total constant are meaningful and should therefore "just work". I'm don't > think > > that just multiplying the max individual log value by some constant to get the > > max value is the behavior we want either though, so I don't think we should > have > > one constant instead of two. > > > > If you'd rather that having nonsensical combinations not DCHECK though, I can > > remove it.
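[Editor's note: a hypothetical sketch of the compile-time check being suggested here, assuming both constants were visible in one translation unit (the reply below notes they currently are not). The constant names are invented for illustration.]

#include <stddef.h>
#include "base/basictypes.h"  // COMPILE_ASSERT (Chromium, circa 2012).

// Hypothetical constants; in the real code they live in different files.
const size_t kMaxSingleLogBytes = 50000;            // per-log upload cap.
const size_t kStorageByteLimitPerLogType = 300000;  // serialization budget.

// Fails to compile if the storage budget could no longer hold even one
// maximum-sized log, so the two values cannot drift apart silently.
COMPILE_ASSERT(kStorageByteLimitPerLogType >= kMaxSingleLogBytes,
               storage_byte_limit_smaller_than_one_log);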
On 2012/03/06 08:10:41, Ilya Sherman wrote:
> How about a COMPILE_ASSERT() to enforce the relationship between the two constants?

Currently there's nothing that can see both constants. Two options that come to mind:
1) Restructure things so that both constants live in MetricsService and are pushed into the respective classes with setters. Somewhat uglier interface-wise, but arguably easier to maintain changes if we want them to relate to each other.
2) Just make this code discard any individual log larger than the serialization constant, so that one enormous log (if the other cap is removed) won't cause the rest to be lost. If that happens, DLOG (it's still unlikely to be seen, but it would no longer appear to document something that was no longer true; it would just document that if it happens it's a bad sign).

My vote would be 2. Thoughts?
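[Editor's note: a minimal sketch of what "option 2" could look like, with an invented helper name; this is illustrative only, not the CL's actual code.]

#include <stddef.h>
#include <string>
#include <vector>

#include "base/logging.h"  // DLOG.

// Copies serialized logs into |out|, skipping any single log bigger than
// |byte_limit| so that one oversized log cannot evict everything else.
void CopyLogsSkippingOversized(const std::vector<std::string>& logs,
                               size_t byte_limit,
                               std::vector<std::string>* out) {
  for (size_t i = 0; i < logs.size(); ++i) {
    if (logs[i].size() > byte_limit) {
      // This should have been prevented upstream by the per-log cap; note it
      // in debug builds rather than silently dropping data.
      DLOG(WARNING) << "Dropping oversized log (" << logs[i].size()
                    << " bytes).";
      continue;
    }
    out->push_back(logs[i]);
  }
}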
Losing data is scary. It bothered me historically that when we couldn't get through (unable to transmit), we could lose data. There is always a chance that the *reason* for not getting through is a problem in the browser, and then we'd lose all of our signals about a problem :-(. We only added a restriction on the size of uploads because we had restrictions server side. In addition, when we discard a log, we record (in the next log) the fact that we did a discard. As a result, I *think* we tend to have at least the potential for reporting a problem.

I'm wary of having a simple aggregate size being the basis for discarding logs. Your comment had (IMO) at its heart the belief that we would not discard everything, but when a large log comes along, that appears to be the very plausible result.

I suspect that the simple change to "not hold more than a certain volume" may be problematic... but I'm not sure. At first glance, maintaining a cumulative limit seems "nicer" in terms of ensuring we won't overrun our local (client) resource limits. On the other hand, it means that a single large log can cause disposal of other smaller logs.

We've seen bugs where event-generating-loops caused a log to overflow. Hence a "large log" is a very viable failure scenario. [Also note: As folks add more histograms, we may hit some limits on sizes <sigh>.]

I have a feeling that the right answer is a compromise. We need to be very cautious about individual sizes, but I think we should never discard all logs because the size of one log is "too" large. The current discard algorithm in this CL doesn't (I think) have that property.

We should also be cautious to send signals when we do have to do a discard... and try to make it more likely that the failure signals can get through. Perhaps, for example, when we are "discarding" logs based on size, we should discard the largest first, rather than the oldest first.

It would probably be good to think through what the failure scenarios are, and understand how any algorithmic improvement(?) would handle those scenarios.

Jim

On 2012/03/06 12:06:56, stuartmorgan wrote:
> On 2012/03/06 08:10:41, Ilya Sherman wrote:
> > How about a COMPILE_ASSERT() to enforce the relationship between the two constants?
>
> Currently there's nothing that can see both constants. Two options that come to mind:
> 1) Restructure things so that both constants live in MetricsService and are pushed into the respective classes with setters. Somewhat uglier interface-wise, but arguably easier to maintain changes if we want them to relate to each other.
> 2) Just make this code discard any individual log larger than the serialization constant, so that one enormous log (if the other cap is removed) won't cause the rest to be lost. If that happens, DLOG (it's still unlikely to be seen, but it would no longer appear to document something that was no longer true; it would just document that if it happens it's a bad sign).
>
> My vote would be 2. Thoughts?
On 2012/03/06 21:59:32, jar wrote:
> I'm wary of having a simple aggregate size being the basis for discarding logs. Your comment had (IMO) at its heart the belief that we would not discard everything, but when a large log comes along, that appears to be the very plausible result.

My belief is based on actual code in MetricsService that discards large logs before they reach the serialization stage. So it's plausible only in a world where we remove that limit.

> We've seen bugs where event-generating-loops caused a log to overflow. Hence a "large log" is a very viable failure scenario.

Except we also have separate code for discarding all events in a log that has more than N events. So there are already two separate layers of protection preventing that from getting to this section of code. I'm all for defense in depth, but I think there are limits to the amount of complicated engineering that we should do in one layer just to prevent a case that can only happen if all the other layers undo all of their protections against oversized logs.

> I have a feeling that the right answer is a compromise. We need to be very cautious about individual sizes, but I think we should never discard all logs because the size of one log is "too" large.
>
> The current discard algorithm in this CL doesn't (I think) have that property.

My suggestion 2 has that property, and is simple. Can you comment on it? I'm not sure if your different suggestion is because you object to 2, or just to add another option to the list.

> We should also be cautious to send signals when we do have to do a discard... and try to make it more likely that the failure signals can get through.

We could wire up a pref that gets incremented, and then read back out into logs being closed, as a histogram of the number of logs discarded since last upload. Of course we have no way of knowing *that* log won't itself be discarded, and it's more likely that it will if we're already in a log-discarding situation. Note that we've never kept information about discarding that happens when we go over the existing count-based limit, so while I see the value in getting a reliable signal here if we can, it seems like that could be a different bug+CL, since it's not a new problem being introduced by this CL.

> Perhaps, for example, when we are "discarding" logs based on size, we should discard the largest first, rather than the oldest first.

I'd much rather we do something simple like 2 than create a system which causes the expected overflow case (lots of logs of mostly similar sizes) to discard logs in an essentially random order (since it would come down to differences of potentially a handful of bytes) just to slightly improve the behavior of a case that is currently *impossible*, and will only become possible if someone makes a poorly-thought-through change to individual log sizing limits at some point in the future.

> It would probably be good to think through what the failure scenarios are, and understand how any algorithmic improvement(?) would handle those scenarios.

The discard scenario I expect to be common is someone using Android, on 3G, for an extended period of time across multiple sessions. Eventually they will create enough logs that some will be discarded. This isn't a "failure", it's the expected outcome of the set of policies that have been chosen (no uploading on 3G, some upper limit to how much log data we are willing to keep locally). Currently, the limit is based on a number of logs.
And every session generates a log (for reasons we've discussed previously). So currently, 8 consecutive 5-second, open-Chrome-just-to-glance-at-something sessions will cause all previous stored ongoing logs to be thrown away. That's clearly undesirable, so I want to make it so that we keep a reasonable amount of UMA data no matter how it happens to have been sliced into logs.

I do think a simple safety net (like option 2) makes sense, but I feel like we're getting bogged down in the question of the theoretical future handling of a currently impossible case (a single huge log reaching this layer) at the expense of significantly improving the normal handling of log retention for Android.
Since I haven't heard anything, I went ahead and implemented option 2 (and added unit tests for various large-log cases).
On 2012/04/19 14:03:45, stuartmorgan wrote:
> Since I haven't heard anything, I went ahead and implemented option 2 (and added unit tests for various large-log cases).

jar: This CL specifically is very important for us to get into M20, for which the branch point is now a week away. Please prioritize this review over the other changes I have outstanding.
http://codereview.chromium.org/9562007/diff/10001/chrome/browser/metrics/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

http://codereview.chromium.org/9562007/diff/10001/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:123: // Keep the most recent logs, up to the size limit.
Although your proposal makes a lot of sense for mobile, where memory is constrained, and there is a great potential for users to restart regularly (generating small logs), I don't think this is a good solution at all for other platforms. The problem is that logs continue to rise in size, as does memory, and this system seems guaranteed to impact desktop reporting IMO. You may (currently) be willing to live with that loss... but I don't think we can do that to our current user base.

I'd rather see an ifdef style of discard patching, which will not impact the current approach. I wouldn't mind changing things to only discard if both thresholds are exceeded (both too many files, and too many bytes). That would indeed seem to be a nice improvement for non-mobile, as well as satisfying your needs for mobile (at least in the use case you described, with many, many little logs).

I do however suspect that you really want to just discard data when your memory footprint is exceeded (too many bytes), so you may choose to do so on your platform. Am I missing something?
On 2012/05/05 01:09:31, jar wrote:
> I wouldn't mind changing things to only discard if both thresholds are exceeded (both too many files, and too many bytes). That would indeed seem to be a nice improvement for non-mobile, as well as satisfying your needs for mobile (at least in the use case you described, with many, many little logs).

New version is up that implements that approach instead of an upper limit of bytes.

> I do however suspect that you really want to just discard data when your memory footprint is exceeded (too many bytes), so you may choose to do so on your platform.

I'm primarily concerned about the case I mentioned (excessive data loss with many small logs), rather than the exact details of the upper limit of storage. If that changes later, it can easily be achieved by ifdefing just the length constants in this approach.
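[Editor's note: a rough sketch of the keep-until-both-limits-are-satisfied approach discussed here, with invented names and simplified comparisons; the exact operators and the pref-writing details in the CL differ.]

#include <stddef.h>
#include <string>
#include <vector>

// Walking back from the newest log, keep adding logs until BOTH the length
// limit and the byte limit have been satisfied; everything older than the
// returned index would be trimmed. (The real method also DCHECKs that at
// least one limit is nonzero, and then writes the kept logs to a pref list.)
size_t FirstLogIndexToKeep(const std::vector<std::string>& logs,
                           size_t list_length_limit,
                           size_t byte_limit) {
  size_t start = logs.size();
  size_t bytes_used = 0;
  while (start > 0) {
    if (bytes_used >= byte_limit &&
        (logs.size() - start) >= list_length_limit) {
      break;  // Both limits reached; older logs are dropped.
    }
    bytes_used += logs[start - 1].size();
    --start;
  }
  return start;
}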
http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (left):

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:139: start = local_list.size() - max_list_size;
IMO, it would be much easier to read (not to mention more efficient... but that is not significant here) if you only did the iteration sum when this condition was true (and you had a chance to truncate the save set).

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
This should be max_bytes.

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
nit: I think you're using a 0 to mean "don't check." I'd suggest using something more like std::max<size_t>, so that the choice is automatically ignored by the logic of the code.

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:162: if ((min_bytes == 0 || bytes_used > min_bytes) &&
The test for ==0 seemed superfluous, and made this a tad more confusing. bytes_used is always positive, and so when min_bytes is 0, then bytes_used > min_bytes.

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
The DCHECK makes it clear that we should never have a start value > local_list.size(), but safe coding suggests it is better to use >= in such comparisons (as in line 141 of the original file).
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (left):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:139: start = local_list.size() - max_list_size;
On 2012/05/10 16:49:40, jar wrote:
> IMO, it would be much easier to read (not to mention more efficient... but that is not significant here) if you only did the iteration sum when this condition was true (and you had a chance to truncate the save set).

Done.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
On 2012/05/10 16:49:40, jar wrote:
> This should be max_bytes.

Why? In the new model it behaves as a minimum, not a maximum. See the method-level comment in the header. Calling them both maximums makes the model much harder to describe and understand (I tried that first, and it was terrible).

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
On 2012/05/10 16:49:40, jar wrote:
> nit: I think you're using a 0 to mean "don't check." I'd suggest using something more like std::max<size_t>, so that the choice is automatically ignored by the logic of the code.

It doesn't mean "don't check". Passing both arguments as 0 should mean no logs would be stored, which isn't what any sane caller would want; thus it indicates programmer error, and is DCHECK'd.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:162: if ((min_bytes == 0 || bytes_used > min_bytes) &&
On 2012/05/10 16:49:40, jar wrote:
> The test for ==0 seemed superfluous, and made this a tad more confusing. bytes_used is always positive, and so when min_bytes is 0, then bytes_used > min_bytes.

Done.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
On 2012/05/10 16:49:40, jar wrote:
> The DCHECK makes it clear that we should never have a start value > local_list.size(), but safe coding suggests it is better to use >= in such comparisons (as in line 141 of the original file).

Actually, the whole if check goes against Chromium style, which says that we shouldn't handle DCHECK failures gracefully, so the early return shouldn't have been there in the first place. I've removed it entirely.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
Both lines 142 and 143 represent limits. When those limits are exceeded, then some affirmative action is possibly taken. Hence they should both have the same sense: both be max, or both be min.

When I thought about it, going back and forth, I decided that the question that was being asked was: Have I exceeded the byte count AND have I exceeded the log count? As a result, I think that both should be max values, with the flexibility to ignore a violation if we don't have both counts violated. In any case, they are the "same sort" of violation.

On 2012/05/11 09:16:40, stuartmorgan wrote:
> On 2012/05/10 16:49:40, jar wrote:
> > This should be max_bytes.
>
> Why? In the new model it behaves as a minimum, not a maximum. See the method-level comment in the header.
>
> Calling them both maximums makes the model much harder to describe and understand (I tried that first, and it was terrible).

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
If you don't want to persist unsent logs, I'd expect the values to be zero. Why is that outlawed here?

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:161: --start;
It surprised me a bit that you decremented start here, instead of after you've validated with line 162.

Consider the case where we have max_list_length == 0, and min_bytes == 1. I think that in such a case you don't want to persist any logs, but you always persist at least one. Is that really desired here?

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
I don't know where you heard such, as it is not in our style guide, nor our philosophy. Some folks have argued against DCHECKs, suggesting that there should "only be CHECKs," but that political faction has not been accepted. The policy is to identify problems with DCHECKs, and if you can't respond gracefully to the situation, only then should you instead CHECK or induce a crash. In the case where you can respond gracefully (as in this case), you should do so.

On 2012/05/11 09:16:40, stuartmorgan wrote:
> On 2012/05/10 16:49:40, jar wrote:
> > The DCHECK makes it clear that we should never have a start value > local_list.size(), but safe coding suggests it is better to use >= in such comparisons (as in line 141 of the original file).
>
> Actually, the whole if check goes against Chromium style, which says that we shouldn't handle DCHECK failures gracefully, so the early return shouldn't have been there in the first place. I've removed it entirely.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
On 2012/05/11 21:36:50, jar wrote:
> I don't know where you heard such, as it is not in our style guide

Not only is it in our style guide, it's one of the only bold things in it:
http://dev.chromium.org/developers/coding-style#TOC-CHECK-vs-DCHECK-and-NOTRE...

> In the case where you can respond gracefully (as in this case), you should do so.

That's exactly the opposite of what the style guide says.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
On 2012/05/11 21:36:50, jar wrote:
> Both lines 142 and 143 represent limits. When those limits are exceeded, then some affirmative action is possibly taken. Hence they should both have the same sense: both be max, or both be min.

Good point; I'll change them both to limit (see below).

> When I thought about it, going back and forth, I decided that the question that was being asked was:
>
> Have I exceeded the byte count AND have I exceeded the log count?

Agreed.

> As a result, I think that both should be max values, with the flexibility to ignore a violation if we don't have both counts violated.

I reach the opposite conclusion from your clear statement of the goal. There are two numbers, and we don't stop until we have reached or exceeded both numbers. Thus the numbers are minimums from my perspective. The limiting factor at the upper end is that we stop when we've satisfied both minimums.

Since we see the question the same but the terms differently, I've chosen a neutral word--"limit"--and described the function's behavior without using "minimum" or "maximum". Hopefully that will make it understandable no matter which angle someone comes at it from. Let me know if the method comment works for you this way.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
On 2012/05/11 21:36:50, jar wrote:
> If you don't want to persist unsent logs, I'd expect the values to be zero. Why is that outlawed here?

I would expect someone who doesn't want to persist unsent logs not to call a method named "WriteLogsToPrefList". I think it's extremely implausible that we would ever deliberately call with 0 and 0, and much more plausible that someone might accidentally do it. Making accidents easier just to make what I see as a nonsensical call ("Please write logs, but don't write any") seems like the wrong tradeoff to me, which is why I opted for making it an error.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:161: --start;
On 2012/05/11 21:36:50, jar wrote:
> It surprised me a bit that you decremented start here, instead of after you've validated with line 162.

Since the method is currently designed such that it's always going to write a log, I thought it was clearer. I've moved it though; it's more future-proof that way.

> Consider the case where we have max_list_length == 0, and min_bytes == 1. I think that in such a case you don't want to persist any logs, but you always persist at least one.
>
> Is that really desired here?

Since writing no logs would take 0 bytes, writing no logs when called with a min of 1 byte would violate the documented behavior of the method in this version. So yes, it was intentional. That still works with the decrementing and byte counting moved after the break though.

That is a change from the original CL where I was doing a max number of bytes, and trying not to go over it. But with the hybrid approach, that became harder to understand conceptually, so I went to a more straightforward lower bound.
I can go to the "soft-max", but ultimately it doesn't really matter in practice, since the byte limit only kicks in when there are a bunch of small logs. The case where soft-max vs. min would be a non-trivial size difference is: <a bunch of small logs summing to byte_limit - 1>, enormous log But we could just as easily have: enormous log 1, enormous log 2, ..., enormous log (length_limit - 1) which would be much more space. So I went for the easier-to-understand model since the practical effects are pretty much the same.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
I'm fine with the word "limit." I think the difference in the read is that I ask: "Are we going to truncate anything?" and you ask: "Are we going to save all?"

On 2012/05/15 15:57:05, stuartmorgan wrote:
> On 2012/05/11 21:36:50, jar wrote:
> > Both lines 142 and 143 represent limits. When those limits are exceeded, then some affirmative action is possibly taken. Hence they should both have the same sense: both be max, or both be min.
>
> Good point; I'll change them both to limit (see below).
>
> > When I thought about it, going back and forth, I decided that the question that was being asked was:
> >
> > Have I exceeded the byte count AND have I exceeded the log count?
>
> Agreed.
>
> > As a result, I think that both should be max values, with the flexibility to ignore a violation if we don't have both counts violated.
>
> I reach the opposite conclusion from your clear statement of the goal. There are two numbers, and we don't stop until we have reached or exceeded both numbers. Thus the numbers are minimums from my perspective. The limiting factor at the upper end is that we stop when we've satisfied both minimums.
>
> Since we see the question the same but the terms differently, I've chosen a neutral word--"limit"--and described the function's behavior without using "minimum" or "maximum". Hopefully that will make it understandable no matter which angle someone comes at it from. Let me know if the method comment works for you this way.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
I went around on this with Darin. The style guide commentary in this area is focused on user-visible elements, and error handling. This area, surrounding UMA, which is not user-visible, is more questionable with regard to that proposed pattern.

Given the ease with which this bug can be stimulated, I'd rather see the error handling code than the DCHECK()... so if you insist on only one, I'd like that one. This will allow folks to set bounds at a very low level, and often (not always) have zero logs persisted. For some low-memory users, that may indeed be preferable.

On 2012/05/11 22:04:37, stuartmorgan wrote:
> On 2012/05/11 21:36:50, jar wrote:
> > I don't know where you heard such, as it is not in our style guide
>
> Not only is it in our style guide, it's one of the only bold things in it:
> http://dev.chromium.org/developers/coding-style#TOC-CHECK-vs-DCHECK-and-NOTRE...
>
> > In the case where you can respond gracefully (as in this case), you should do so.
>
> That's exactly the opposite of what the style guide says.

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:164: (local_list.size() - start) >= list_length_limit)
You probably want ">" rather than ">=" now. Currently, if list_length_limit == 0, then we could break on the first pass... and get to line 170, and violate the DCHECK. I guess this gets back to the meaning of "limit".
The alternative (in my other comment) is to remove the DCHECK, and allow a call where we sometimes choose to not persist any logs. I think that is a very plausible scenario on a low memory system anyway.
I'll post a new version with the failsafe error handling tomorrow, but I want to clarify whether you also want a change in the algorithm, to avoid another review round-trip.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
On 2012/05/15 18:27:39, jar wrote:
> Given the ease with which this bug can be stimulated, I'd rather see the error handling code than the DCHECK()... so if you insist on only one, I'd like that one.

I can re-add the error-handling. (I'm not clear how it's easy to trigger this bug though; someone would have to call the function with both values 0, which is a very strange thing to do, DCHECKs immediately, and is documented as being disallowed.)

> This will allow folks to set bounds at a very low level, and often (not always) have zero logs persisted. For some low-memory users, that may indeed be preferable.

That's not possible as currently written. Does this mean you want me to rewrite this using what I was calling "soft-max" instead of what it currently does? As I said before, I think the current version is somewhat easier to understand, and I'm having a hard time imagining that much memory constraint on a device that can run Chrome (see below).

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:164: (local_list.size() - start) >= list_length_limit)
On 2012/05/15 18:27:40, jar wrote:
> You probably want ">" rather than ">=" now.
>
> Currently, if list_length_limit == 0, then we could break on the first pass... and get to line 170, and violate the DCHECK.

No, because it's an &&. To satisfy both conditions on the first pass, both limits would have to be 0, and that's a DCHECK at the start of the method.

> The alternative (in my other comment) is to remove the DCHECK, and allow a call where we sometimes choose to not persist any logs. I think that is a very plausible scenario on a low memory system anyway.

Since we always have at least one log to save, this would guarantee loss of metrics on every single shutdown (or app termination on mobile). I'm having a hard time imagining a device with so little persistent storage that we would consider always throwing away metrics in normal use, so I'd rather not design for it.
New version up with the early return re-added, and the constant names and constants scrubbed of min/max. Still need an answer on whether it's okay to stop as soon as both limits are exceeded (current code) or I need to change to stopping just before exceeding the second.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/stuartmorgan@chromium.org/9562007/28002
Try job failure for 9562007-28002 (retry) on linux_rel for step "browser_tests". It's a second try, previously, step "browser_tests" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=linux_rel&...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/stuartmorgan@chromium.org/9562007/28002
Change committed as 138081 |