Created: 8 years, 9 months ago by stuartmorgan
Modified: 8 years, 7 months ago
CC: chromium-reviews, MAD
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description
Add a minimum byte count to metrics log serialization
Add a minimum number of bytes when serializing logs so that a large number of very small logs won't be capped the same way as a large number of large logs. This ensures that a reasonable amount of metrics history will be kept even when logs can potentially be cut very frequently, such as on Android.
In the short term this may slightly increase the number of logs that are retained, since the number of logs kept by this method is the same or greater than what was kept before.
BUG=None
TEST=None
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=138081
Patch Set 1 #
Total comments: 2
Patch Set 2 : Skip large individual logs #
Total comments: 1
Patch Set 3 : Use the byte count as a minimum #
Total comments: 21
Patch Set 4 : Address review comments #
Patch Set 5 : Change limit terminology, |start| decrement logic #
Total comments: 2
Patch Set 6 : Re-add safety net return, change constant names to avoid min/max #
Messages
Total messages: 26 (0 generated)
Jim, I thought we'd had an email conversation about this at some point, but I can't find it, so I may have discussed it with someone else. If that's the case, and you feel like moving away from counts is a bad idea, we should discuss offline to figure out how to handle the many-small-logs case.

+Ilya since this touches on the XML vs. proto logic a bit.

Background on the 300,000 byte number is that I logged some XML log sizes in a local build and came up with the following:
- An initial log was ~13,000 bytes
- A standard (30 min) ongoing log with moderate usage (but with a very low tab count) for the whole half hour was ~30,000 bytes

Padding the initial log a bit to ~15,000 (since I wasn't using the browser for the first minute, and had no stored session) and multiplying by the count limit of 20 gives 300,000. Assuming my 30,000 was a reasonable value to work from, 30,000 * the count of 8 = 240,000. Since that was in the same ballpark, I just combined them into a single number. Worth noting is that the single-log max is currently 50,000, so we'd always be able to store at least 6 logs of each type.

I'm not attached to these numbers, so if we want to revise down (or up) for one or both log types, just say the word.
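[Editor's note: a quick sketch of the arithmetic described above, for reference only. The constant names are invented for illustration and are not the ones in the CL.]

#include <stddef.h>

// Illustrative constants; the values come from the measurements above.
const size_t kApproxInitialLogBytes = 15000;  // ~13,000 measured, padded up.
const size_t kApproxOngoingLogBytes = 30000;  // ~30 minutes of moderate use.
const size_t kInitialLogCountLimit = 20;      // existing count cap (initial logs).
const size_t kOngoingLogCountLimit = 8;       // existing count cap (ongoing logs).
// 15,000 * 20 = 300,000 and 30,000 * 8 = 240,000 are close enough to share
// one number, so 300,000 is proposed for both log types. With the existing
// 50,000-byte single-log cap, 300,000 / 50,000 = 6 logs of each type can
// always be stored.
const size_t kStorageByteLimitPerLogType = 300000;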
LGTM (but please make sure that Jim is also happy with this change).

https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
chrome/browser/metrics/metrics_log_serializer.cc:135: // have reached this point, so at least one should always be written.
nit: This comment makes it sound like the line below should be DCHECK_LT.
https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/1/chrome/browser/metrics/...
chrome/browser/metrics/metrics_log_serializer.cc:135: // have reached this point, so at least one should always be written.
On 2012/03/01 22:58:21, Ilya Sherman wrote:
> nit: This comment makes it sound like the line below should be DCHECK_LT.

In fact... I don't see why you are sure this is true. The limit you set is not enforced anywhere else (the constant is local to this file), so if I set it to (for example) 1, then I'm pretty sure your comment would be violated.

IF you have an expectation that some other constant is related to your 300000 value, you should use that constant, and not have an independently defined value. To put it another way: I should be able to change values of the constants, and things should just work (I shouldn't have to adjust several things in sync).
On 2012/03/02 17:47:00, jar wrote:
> In fact... I don't see why you are sure this is true. The limit you set is not enforced anywhere else (the constant is local to this file), so if I set it to (for example) 1, then I'm pretty sure your comment would be violated.

The purpose of the DCHECK is to make it apparent to someone testing a new value that they have chosen poorly. In your example, yes, setting the limit to 1 byte would fail the DCHECK. And I also don't think someone should check that in, and thus I don't see it as a bug that it would DCHECK.

> IF you have an expectation that some other constant is related to your 300000 value, you should use that constant, and not have an independently defined value. To put it another way: I should be able to change values of the constants, and things should just work (I shouldn't have to adjust several things in sync).

I feel like if our logs become enormous and someone decides to address that by changing the 50,000 per-log cap to 500,000, they should revisit the 300,000 value too. I disagree that any two combinations of the per-log constant and total constant are meaningful and should therefore "just work". I don't think that just multiplying the max individual log value by some constant to get the max value is the behavior we want either though, so I don't think we should have one constant instead of two.

If you'd rather that having nonsensical combinations not DCHECK though, I can remove it.
I guess I didn't convey my point well.

The comment, as shown, is not valid, and there is no reason to believe it is valid. As such, it is deceptive to readability to state it.

The DCHECK() can be made to fire by my extreme example, which only demonstrates that the DCHECK is not an invariant implied by other surrounding/distant code. As a result, other code should be changed to make the DCHECK true (since it is clear you want it to be true, as it increases simplicity of this area of code).

The DCHECK will almost never fire in any case, and debug builds just about never run with UMA uploading turned on. IF a developer puts in a "bad value," there is a significant chance it won't push them across the border, and even fire the DCHECK for them :-(. As a result, you get almost no test coverage from it, and hence I doubt that it is serving the purpose you intended. It is only adding a deception.

IMO, DCHECKs should be machine verifiable comments. In this case, the comment, and the DCHECK are wrong (although the statements are IMO desirable).

Why can't we change the code so that the DCHECK is indeed a guaranteed invariant, implied by the code?

On 2012/03/05 09:25:59, stuartmorgan wrote:
> On 2012/03/02 17:47:00, jar wrote:
> > In fact... I don't see why you are sure this is true. The limit you set is not enforced anywhere else (the constant is local to this file), so if I set it to (for example) 1, then I'm pretty sure your comment would be violated.
>
> The purpose of the DCHECK is to make it apparent to someone testing a new value that they have chosen poorly. In your example, yes, setting the limit to 1 byte would fail the DCHECK. And I also don't think someone should check that in, and thus I don't see it as a bug that it would DCHECK.
>
> > IF you have an expectation that some other constant is related to your 300000 value, you should use that constant, and not have an independently defined value. To put it another way: I should be able to change values of the constants, and things should just work (I shouldn't have to adjust several things in sync).
>
> I feel like if our logs become enormous and someone decides to address that by changing the 50,000 per-log cap to 500,000, they should revisit the 300,000 value too. I disagree that any two combinations of the per-log constant and total constant are meaningful and should therefore "just work". I don't think that just multiplying the max individual log value by some constant to get the max value is the behavior we want either though, so I don't think we should have one constant instead of two.
>
> If you'd rather that having nonsensical combinations not DCHECK though, I can remove it.
On 2012/03/06 03:58:52, jar wrote: > I guess I didn't convey my point well. > > The comment as shown, is not valid, and there is no reason to believe it is > valid. As such, it is deceptive to readability to state it. > > The DCHECK() can be made to fire by my extreme example, which only demonstrates > that the DCHECK is not an invariant implied by other surrounding/distant code. > As a result, other code should be changed to make the DCHECK true (since it is > clear you want it to be true, as it increases simplicity of this area of code). > > The DCHECK will almost never fire in any case, and debug builds just about never > run with UMA uploading turned on. IF a developer puts in a "bad value," there > is a significant chance it won't push tem across the border, and even fire the > DCHECK for them :-(. As a result, you get almost no test coverage from it, and > hence I doubt that it is serving the purpose you intended. It is only adding a > deception. > > IMO, DCHECKs should be machine verifiable comments. In this case, the comment, > and the DCHECK are wrong (although the statements are IMO desirable). > > Why can't we change the code so that the DCHECK is indeed a guaranteed > invariant, implied by the code? How about a COMPILE_ASSERT() to enforce the relationship between the two constants? > On 2012/03/05 09:25:59, stuartmorgan wrote: > > On 2012/03/02 17:47:00, jar wrote: > > > In fact... I don't see why you are sure this is true. The limit you set is > > not > > > enforced anywhere else (the constant is local to this file), so if I set it > to > > > (for example) 1, then I'm pretty sure your comment would be violated. > > > > The purpose of the DCHECK is to make it apparent to someone testing a new > value > > that they have chosen poorly. In your example, yes, setting the limit to 1 > byte > > would fail the DCHECK. And I also don't think someone should check that in, > and > > thus I don't see it as a bug that it would DCHECK. > > > > > IF you have an expectation that some other constant is related to your > 300000 > > > value, you should use that constant, and not have an independently defined > > > value. To put it another way: I should be able to change values of the > > > constants, and things should just work (I shouldn't have to adjust several > > > things in sync). > > > > I feel like if our logs become enormous and someone decides to address that by > > changing the 50,000 per-log cap to 500,000, they should revisit the 300,000 > > value too. I disagree that any two combinations of the per-log constant and > > total constant are meaningful and should therefore "just work". I'm don't > think > > that just multiplying the max individual log value by some constant to get the > > max value is the behavior we want either though, so I don't think we should > have > > one constant instead of two. > > > > If you'd rather that having nonsensical combinations not DCHECK though, I can > > remove it.
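[Editor's note: a hypothetical sketch of the compile-time check being suggested here, assuming both constants were visible in one translation unit (the reply below notes they currently are not). The constant names are invented for illustration.]

#include <stddef.h>
#include "base/basictypes.h"  // COMPILE_ASSERT (Chromium, circa 2012).

// Hypothetical constants; in the real code they live in different files.
const size_t kMaxSingleLogBytes = 50000;            // per-log upload cap.
const size_t kStorageByteLimitPerLogType = 300000;  // serialization budget.

// Fails to compile if the storage budget could no longer hold even one
// maximum-sized log, so the two values cannot drift apart silently.
COMPILE_ASSERT(kStorageByteLimitPerLogType >= kMaxSingleLogBytes,
               storage_byte_limit_smaller_than_one_log);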
On 2012/03/06 08:10:41, Ilya Sherman wrote:
> How about a COMPILE_ASSERT() to enforce the relationship between the two constants?

Currently there's nothing that can see both constants. Two options that come to mind:
1) Restructure things so that both constants live in MetricsService and are pushed into the respective classes with setters. Somewhat uglier interface-wise, but arguably easier to maintain changes if we want them to relate to each other.
2) Just make this code discard any individual log larger than the serialization constant, so that one enormous log (if the other cap is removed) won't cause the rest to be lost. If that happens, DLOG (it's still unlikely to be seen, but it would no longer appear to document something that was no longer true; it would just document that if it happens it's a bad sign).

My vote would be 2. Thoughts?
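[Editor's note: a minimal sketch of what "option 2" could look like, with an invented helper name; this is illustrative only, not the CL's actual code.]

#include <stddef.h>
#include <string>
#include <vector>

#include "base/logging.h"  // DLOG.

// Copies serialized logs into |out|, skipping any single log bigger than
// |byte_limit| so that one oversized log cannot evict everything else.
void CopyLogsSkippingOversized(const std::vector<std::string>& logs,
                               size_t byte_limit,
                               std::vector<std::string>* out) {
  for (size_t i = 0; i < logs.size(); ++i) {
    if (logs[i].size() > byte_limit) {
      // This should have been prevented upstream by the per-log cap; note it
      // in debug builds rather than silently dropping data.
      DLOG(WARNING) << "Dropping oversized log (" << logs[i].size()
                    << " bytes).";
      continue;
    }
    out->push_back(logs[i]);
  }
}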
Losing data is scary. It bothered me historically that when we couldn't get through (unable to transmit), we could lose data. There is always a chance that the *reason* for not getting through is a problem in the browser, and then we'd lose all of our signals about a problem :-(. We only added a restriction on the size of uploads because we had restrictions server side. In addition, when we discard a log, we record (in the next log) the fact that we did a discard. As a result, I *think* we tend to have at least the potential for reporting a problem.

I'm wary of having a simple aggregate size being the basis for discarding logs. Your comment had (IMO) at its heart the belief that we would not discard everything, but when a large log comes along, that appears to be the very plausible result.

I suspect that the simple change to "not hold more than a certain volume" may be problematic... but I'm not sure. At first glance, maintaining a cumulative limit seems "nicer" in terms of ensuring we won't overrun our local (client) resource limits. On the other hand, it means that a single large log can cause disposal of other smaller logs.

We've seen bugs where event-generating-loops caused a log to overflow. Hence a "large log" is a very viable failure scenario. [Also note: As folks add more histograms, we may hit some limits on sizes <sigh>.]

I have a feeling that the right answer is a compromise. We need to be very cautious about individual sizes, but I think we should never discard all logs because the size of one log is "too" large. The current discard algorithm in this CL doesn't (I think) have that property.

We should also be cautious to send signals when we do have to do a discard... and try to make it more likely that the failure signals can get through. Perhaps, for example, when we are "discarding" logs based on size, we should discard the largest first, rather than the oldest first.

It would probably be good to think through what the failure scenarios are, and understand how any algorithmic improvement(?) would handle those scenarios.

Jim

On 2012/03/06 12:06:56, stuartmorgan wrote:
> On 2012/03/06 08:10:41, Ilya Sherman wrote:
> > How about a COMPILE_ASSERT() to enforce the relationship between the two constants?
>
> Currently there's nothing that can see both constants. Two options that come to mind:
> 1) Restructure things so that both constants live in MetricsService and are pushed into the respective classes with setters. Somewhat uglier interface-wise, but arguably easier to maintain changes if we want them to relate to each other.
> 2) Just make this code discard any individual log larger than the serialization constant, so that one enormous log (if the other cap is removed) won't cause the rest to be lost. If that happens, DLOG (it's still unlikely to be seen, but it would no longer appear to document something that was no longer true; it would just document that if it happens it's a bad sign).
>
> My vote would be 2. Thoughts?
On 2012/03/06 21:59:32, jar wrote:
> I'm wary of having a simple aggregate size being the basis for discarding logs. Your comment had (IMO) at its heart the belief that we would not discard everything, but when a large log comes along, that appears to be the very plausible result.

My belief is based on actual code in MetricsService that discards large logs before they reach the serialization stage. So it's plausible only in a world where we remove that limit.

> We've seen bugs where event-generating-loops caused a log to overflow. Hence a "large log" is a very viable failure scenario.

Except we also have separate code for discarding all events in a log that has more than N events. So there are already two separate layers of protection preventing that from getting to this section of code. I'm all for defense in depth, but I think there are limits to the amount of complicated engineering that we should do in one layer just to prevent a case that can only happen if all the other layers undo all of their protections against oversized logs.

> I have a feeling that the right answer is a compromise. We need to be very cautious about individual sizes, but I think we should never discard all logs because the size of one log is "too" large.
>
> The current discard algorithm in this CL doesn't (I think) have that property.

My suggestion 2 has that property, and is simple. Can you comment on it? I'm not sure if your different suggestion is because you object to 2, or just to add another option to the list.

> We should also be cautious to send signals when we do have to do a discard... and try to make it more likely that the failure signals can get through.

We could wire up a pref that gets incremented, and then read back out into logs being closed, as a histogram of the number of logs discarded since last upload. Of course we have no way of knowing *that* log won't itself be discarded, and it's more likely that it will if we're already in a log-discarding situation. Note that we've never kept information about discarding that happens when we go over the existing count-based limit, so while I see the value in getting a reliable signal here if we can, it seems like that could be a different bug+CL, since it's not a new problem being introduced by this CL.

> Perhaps, for example, when we are "discarding" logs based on size, we should discard the largest first, rather than the oldest first.

I'd much rather we do something simple like 2 than create a system which causes the expected overflow case (lots of logs of mostly similar sizes) to discard logs in an essentially random order (since it would come down to differences of potentially a handful of bytes) just to slightly improve the behavior of a case that is currently *impossible*, and will only become possible if someone makes a poorly-thought-through change to individual log sizing limits at some point in the future.

> It would probably be good to think through what the failure scenarios are, and understand how any algorithmic improvement(?) would handle those scenarios.

The discard scenario I expect to be common is someone using Android, on 3G, for an extended period of time across multiple sessions. Eventually they will create enough logs that some will be discarded. This isn't a "failure", it's the expected outcome of the set of policies that have been chosen (no uploading on 3G, some upper limit to how much log data we are willing to keep locally). Currently, the limit is based on a number of logs.
And every session generates a log (for reasons we've discussed previously). So currently, 8 consecutive 5-second, open-Chrome-just-to-glance-at-something sessions will cause all previous stored ongoing logs to be thrown away. That's clearly undesirable, so I want to make it so that we keep a reasonable amount of UMA data no matter how it happens to have been sliced into logs.

I do think a simple safety net (like option 2) makes sense, but I feel like we're getting bogged down in the question of the theoretical future handling of a currently impossible case (a single huge log reaching this layer) at the expense of significantly improving the normal handling of log retention for Android.
Since I haven't heard anything, I went ahead and implemented option 2 (and added unit tests for various large-log cases).
On 2012/04/19 14:03:45, stuartmorgan wrote:
> Since I haven't heard anything, I went ahead and implemented option 2 (and added unit tests for various large-log cases).

jar: This CL specifically is very important for us to get into M20, for which the branch point is now a week away. Please prioritize this review over the other changes I have outstanding.
http://codereview.chromium.org/9562007/diff/10001/chrome/browser/metrics/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

http://codereview.chromium.org/9562007/diff/10001/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:123: // Keep the most recent logs, up to the size limit.
Although your proposal makes a lot of sense for mobile, where memory is constrained, and there is a great potential for users to restart regularly (generating small logs), I don't think this is a good solution at all for other platforms. The problem is that logs continue to rise in size, as does memory, and this system seems guaranteed to impact desktop reporting IMO. You may (currently) be willing to live with that loss... but I don't think we can do that to our current user base.

I'd rather see an ifdef style of discard patching, which will not impact the current approach. I wouldn't mind changing things to only discard if both thresholds are exceeded (both too many files, and too many bytes). That would indeed seem to be a nice improvement for non-mobile, as well as satisfying your needs for mobile (at least in the use case you described, with many, many little logs).

I do however suspect that you really want to just discard data when your memory footprint is exceeded (too many bytes), so you may choose to do so on your platform. Am I missing something?
On 2012/05/05 01:09:31, jar wrote:
> I wouldn't mind changing things to only discard if both thresholds are exceeded (both too many files, and too many bytes). That would indeed seem to be a nice improvement for non-mobile, as well as satisfying your needs for mobile (at least in the use case you described, with many, many little logs).

New version is up that implements that approach instead of an upper limit of bytes.

> I do however suspect that you really want to just discard data when your memory footprint is exceeded (too many bytes), so you may choose to do so on your platform.

I'm primarily concerned about the case I mentioned (excessive data loss with many small logs), rather than the exact details of the upper limit of storage. If that changes later, it can easily be achieved by ifdefing just the length constants in this approach.
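[Editor's note: a rough sketch of the keep-until-both-limits-are-satisfied approach discussed here, with invented names and simplified comparisons; the exact operators and the pref-writing details in the CL differ.]

#include <stddef.h>
#include <string>
#include <vector>

// Walking back from the newest log, keep adding logs until BOTH the length
// limit and the byte limit have been satisfied; everything older than the
// returned index would be trimmed. (The real method also DCHECKs that at
// least one limit is nonzero, and then writes the kept logs to a pref list.)
size_t FirstLogIndexToKeep(const std::vector<std::string>& logs,
                           size_t list_length_limit,
                           size_t byte_limit) {
  size_t start = logs.size();
  size_t bytes_used = 0;
  while (start > 0) {
    if (bytes_used >= byte_limit &&
        (logs.size() - start) >= list_length_limit) {
      break;  // Both limits reached; older logs are dropped.
    }
    bytes_used += logs[start - 1].size();
    --start;
  }
  return start;
}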
http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (left):

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:139: start = local_list.size() - max_list_size;
IMO, it would be much easier to read (not to mention more efficient... but that is not significant here) if you only did the iteration sum when this condition was true (and you had a chance to truncate the save set).

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
This should be max_bytes.

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
nit: I think you're using a 0 to mean "don't check." I'd suggest using something more like std::max<size_t>, so that the choice is automatically ignored by the logic of the code.

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:162: if ((min_bytes == 0 || bytes_used > min_bytes) &&
The test for ==0 seemed superfluous, and made this a tad more confusing. bytes_used is always positive, and so when min_bytes is 0, then bytes_used > min_bytes.

http://codereview.chromium.org/9562007/diff/18002/chrome/browser/metrics/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
The DCHECK makes it clear that we should never have a start value > local_list.size(), but safe coding suggests it is better to use >= in such comparisons (as in line 141 of the original file).
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (left):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:139: start = local_list.size() - max_list_size;
On 2012/05/10 16:49:40, jar wrote:
> IMO, it would be much easier to read (not to mention more efficient... but that is not significant here) if you only did the iteration sum when this condition was true (and you had a chance to truncate the save set).

Done.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
On 2012/05/10 16:49:40, jar wrote:
> This should be max_bytes.

Why? In the new model it behaves as a minimum, not a maximum. See the method-level comment in the header. Calling them both maximums makes the model much harder to describe and understand (I tried that first, and it was terrible).

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
On 2012/05/10 16:49:40, jar wrote:
> nit: I think you're using a 0 to mean "don't check." I'd suggest using something more like std::max<size_t>, so that the choice is automatically ignored by the logic of the code.

It doesn't mean "don't check". Passing both arguments as 0 should mean no logs would be stored, which isn't what any sane caller would want; thus it indicates programmer error, and is DCHECK'd.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:162: if ((min_bytes == 0 || bytes_used > min_bytes) &&
On 2012/05/10 16:49:40, jar wrote:
> The test for ==0 seemed superfluous, and made this a tad more confusing. bytes_used is always positive, and so when min_bytes is 0, then bytes_used > min_bytes.

Done.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
On 2012/05/10 16:49:40, jar wrote:
> The DCHECK makes it clear that we should never have a start value > local_list.size(), but safe coding suggests it is better to use >= in such comparisons (as in line 141 of the original file).

Actually, the whole if check goes against Chromium style, which says that we shouldn't handle DCHECK failures gracefully, so the early return shouldn't have been there in the first place. I've removed it entirely.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
Both lines 142 and 143 represent limits. When those limits are exceeded, then some affirmative action is possibly taken. Hence they should both have the same sense: both be max, or both be min.

When I thought about it, going back and forth, I decided that the question that was being asked was: Have I exceeded the byte count AND have I exceeded the log count? As a result, I think that both should be max values, with the flexibility to ignore a violation if we don't have both counts violated. In any case, they are the "same sort" of violation.

On 2012/05/11 09:16:40, stuartmorgan wrote:
> On 2012/05/10 16:49:40, jar wrote:
> > This should be max_bytes.
>
> Why? In the new model it behaves as a minimum, not a maximum. See the method-level comment in the header.
>
> Calling them both maximums makes the model much harder to describe and understand (I tried that first, and it was terrible).

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
If you don't want to persist unsent logs, I'd expect the values to be zero. Why is that outlawed here?

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:161: --start;
It surprised me a bit that you decremented start here, instead of after you've validated with line 162.

Consider the case where we have max_list_length == 0, and min_bytes == 1. I think that in such a case you don't want to persist any logs, but you always persist at least one. Is that really desired here?

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
I don't know where you heard such, as it is not in our style guide, nor our philosophy. Some folks have argued against DCHECKs, suggesting that there should "only be CHECKs," but that political faction has not been accepted. The policy is to identify problems with DCHECKs, and if you can't respond gracefully to the situation, only then should you instead CHECK or induce a crash. In the case where you can respond gracefully (as in this case), you should do so.

On 2012/05/11 09:16:40, stuartmorgan wrote:
> On 2012/05/10 16:49:40, jar wrote:
> > The DCHECK makes it clear that we should never have a start value > local_list.size(), but safe coding suggests it is better to use >= in such comparisons (as in line 141 of the original file).
>
> Actually, the whole if check goes against Chromium style, which says that we shouldn't handle DCHECK failures gracefully, so the early return shouldn't have been there in the first place. I've removed it entirely.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
On 2012/05/11 21:36:50, jar wrote:
> I don't know where you heard such, as it is not in our style guide

Not only is it in our style guide, it's one of the only bold things in it:
http://dev.chromium.org/developers/coding-style#TOC-CHECK-vs-DCHECK-and-NOTRE...

> In the case where you can respond gracefully (as in this case), you should do so.

That's exactly the opposite of what the style guide says.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
On 2012/05/11 21:36:50, jar wrote:
> Both lines 142 and 143 represent limits. When those limits are exceeded, then some affirmative action is possibly taken. Hence they should both have the same sense: both be max, or both be min.

Good point; I'll change them both to limit (see below).

> When I thought about it, going back and forth, I decided that the question that was being asked was:
>
> Have I exceeded the byte count AND have I exceeded the log count?

Agreed.

> As a result, I think that both should be max values, with the flexibility to ignore a violation if we don't have both counts violated.

I reach the opposite conclusion from your clear statement of the goal. There are two numbers, and we don't stop until we have reached or exceeded both numbers. Thus the numbers are minimums from my perspective. The limiting factor at the upper end is that we stop when we've satisfied both minimums.

Since we see the question the same but the terms differently, I've chosen a neutral word--"limit"--and described the function's behavior without using "minimum" or "maximum". Hopefully that will make it understandable no matter which angle someone comes at it from. Let me know if the method comment works for you this way.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:146: DCHECK(max_list_length > 0 || min_bytes > 0);
On 2012/05/11 21:36:50, jar wrote:
> If you don't want to persist unsent logs, I'd expect the values to be zero. Why is that outlawed here?

I would expect someone who doesn't want to persist unsent logs not to call a method named "WriteLogsToPrefList". I think it's extremely implausible that we would ever deliberately call with 0 and 0, and much more plausible that someone might accidentally do it. Making accidents easier just to make what I see as a nonsensical call ("Please write logs, but don't write any") seems like the wrong tradeoff to me, which is why I opted for making it an error.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:161: --start;
On 2012/05/11 21:36:50, jar wrote:
> It surprised me a bit that you decremented start here, instead of after you've validated with line 162.

Since the method is currently designed such that it's always going to write a log, I thought it was clearer. I've moved it though; it's more future-proof that way.

> Consider the case where we have max_list_length == 0, and min_bytes == 1. I think that in such a case you don't want to persist any logs, but you always persist at least one.
>
> Is that really desired here?

Since writing no logs would take 0 bytes, writing no logs when called with a min of 1 byte would violate the documented behavior of the method in this version. So yes, it was intentional. That still works with the decrementing and byte counting moved after the break though.

That is a change from the original CL where I was doing a max number of bytes, and trying not to go over it. But with the hybrid approach, that became harder to understand conceptually, so I went to a more straightforward lower bound.
I can go to the "soft-max", but ultimately it doesn't really matter in practice, since the byte limit only kicks in when there are a bunch of small logs. The case where soft-max vs. min would be a non-trivial size difference is: <a bunch of small logs summing to byte_limit - 1>, enormous log But we could just as easily have: enormous log 1, enormous log 2, ..., enormous log (length_limit - 1) which would be much more space. So I went for the easier-to-understand model since the practical effects are pretty much the same.
https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:143: size_t min_bytes,
I'm fine with the word "limit." I think the difference in the read is that I ask: "Are we going to truncate anything?" and you ask: "Are we going to save all?"

On 2012/05/15 15:57:05, stuartmorgan wrote:
> On 2012/05/11 21:36:50, jar wrote:
> > Both lines 142 and 143 represent limits. When those limits are exceeded, then some affirmative action is possibly taken. Hence they should both have the same sense: both be max, or both be min.
>
> Good point; I'll change them both to limit (see below).
>
> > When I thought about it, going back and forth, I decided that the question that was being asked was:
> >
> > Have I exceeded the byte count AND have I exceeded the log count?
>
> Agreed.
>
> > As a result, I think that both should be max values, with the flexibility to ignore a violation if we don't have both counts violated.
>
> I reach the opposite conclusion from your clear statement of the goal. There are two numbers, and we don't stop until we have reached or exceeded both numbers. Thus the numbers are minimums from my perspective. The limiting factor at the upper end is that we stop when we've satisfied both minimums.
>
> Since we see the question the same but the terms differently, I've chosen a neutral word--"limit"--and described the function's behavior without using "minimum" or "maximum". Hopefully that will make it understandable no matter which angle someone comes at it from. Let me know if the method comment works for you this way.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
I went around on this with Darin. The style guide commentary in this area is focused on user-visible elements, and error handling. This area, surrounding UMA, which is not user-visible, is more questionable with regard to that proposed pattern.

Given the ease with which this bug can be stimulated, I'd rather see the error handling code than the DCHECK()... so if you insist on only one, I'd like that one. This will allow folks to set bounds at a very low level, and often (not always) have zero logs persisted. For some low-memory users, that may indeed be preferable.

On 2012/05/11 22:04:37, stuartmorgan wrote:
> On 2012/05/11 21:36:50, jar wrote:
> > I don't know where you heard such, as it is not in our style guide
>
> Not only is it in our style guide, it's one of the only bold things in it:
> http://dev.chromium.org/developers/coding-style#TOC-CHECK-vs-DCHECK-and-NOTRE...
>
> > In the case where you can respond gracefully (as in this case), you should do so.
>
> That's exactly the opposite of what the style guide says.

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:164: (local_list.size() - start) >= list_length_limit)
You probably want ">" rather than ">=" now. Currently, if list_length_limit == 0, then we could break on the first pass... and get to line 170, and violate the DCHECK. I guess this gets back to the meaning of "limit".
The alternative (in my other comment) is to remove the DCHECK, and allow a call where we sometimes choose to not persist any logs. I think that is a very plausible scenario on a low memory system anyway.
I'll post a new version with the failsafe error handling tomorrow, but I want to clarify whether you also want a change in the algorithm, to avoid another review round-trip.

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/18002/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:167: if (start == local_list.size())
On 2012/05/15 18:27:39, jar wrote:
> Given the ease with which this bug can be stimulated, I'd rather see the error handling code than the DCHECK()... so if you insist on only one, I'd like that one.

I can re-add the error-handling. (I'm not clear how it's easy to trigger this bug though; someone would have to call the function with both values 0, which is a very strange thing to do, DCHECKs immediately, and is documented as being disallowed.)

> This will allow folks to set bounds at a very low level, and often (not always) have zero logs persisted. For some low-memory users, that may indeed be preferable.

That's not possible as currently written. Does this mean you want me to rewrite this using what I was calling "soft-max" instead of what it currently does? As I said before, I think the current version is somewhat easier to understand, and I'm having a hard time imagining that much memory constraint on a device that can run Chrome (see below).

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
File chrome/browser/metrics/metrics_log_serializer.cc (right):

https://chromiumcodereview.appspot.com/9562007/diff/19007/chrome/browser/metr...
chrome/browser/metrics/metrics_log_serializer.cc:164: (local_list.size() - start) >= list_length_limit)
On 2012/05/15 18:27:40, jar wrote:
> You probably want ">" rather than ">=" now.
>
> Currently, if list_length_limit == 0, then we could break on the first pass... and get to line 170, and violate the DCHECK.

No, because it's an &&. To satisfy both conditions on the first pass, both limits would have to be 0, and that's a DCHECK at the start of the method.

> The alternative (in my other comment) is to remove the DCHECK, and allow a call where we sometimes choose to not persist any logs. I think that is a very plausible scenario on a low memory system anyway.

Since we always have at least one log to save, this would guarantee loss of metrics on every single shutdown (or app termination on mobile). I'm having a hard time imagining a device with so little persistent storage that we would consider always throwing away metrics in normal use, so I'd rather not design for it.
New version up with the early return re-added, and the constant names and constants scrubbed of min/max. Still need an answer on whether it's okay to stop as soon as both limits are exceeded (current code) or I need to change to stopping just before exceeding the second.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/stuartmorgan@chromium.org/9562007/28002
Try job failure for 9562007-28002 (retry) on linux_rel for step "browser_tests". It's a second try, previously, step "browser_tests" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=linux_rel&...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/stuartmorgan@chromium.org/9562007/28002
Change committed as 138081 |