|
|
Created:
4 years, 2 months ago by brettw Modified:
4 years, 1 month ago CC:
chromium-reviews, mlamouri+watch-content_chromium.org, cbentzel+watch_chromium.org, extensions-reviews_chromium.org, grt+watch_chromium.org, jam, darin-cc_chromium.org, chromium-apps-reviews_chromium.org, pam+watch_chromium.org Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionReduce buggy usage of the registry controlled domain service.
GetRegistryLength for host names canonicalizes the input for the caller, but
then returns the length in the canonicalized input, which is not necessarily
the same as the length in the original string. As a result, computations
performed by the caller based on this value can be wrong (see the bug for
more).
All callers of this function were audited and changed to use on of the
following:
- Many callers don't need the offsets. A new function
HostHasRegistryControlledDomain is added to check for the presence of
a R.C.D. without the risk of returning incorrect string lengths.
- Many callers already have guaranteed-canonical strings (they came out of
a GURL or KURL object soon before the call) These were changed to use a
new GetCanonicalHostRegistryLength function. A further advantage is that
these calls will be faster.
- A new Permissive function is added that handles cases where the input
is necessarily non-canonical.
Adds an IDN test case to the unit tests.
Removes checking for IP addresses in the already-known-canonical cases.
This requires a separate full canonicalization and IP addresses should
never match the R.C.D. list.
BUG=657199
Committed: https://crrev.com/060f6a0de7706cc43f9d773ae9ce2cb36bc9964d
Cr-Commit-Position: refs/heads/master@{#427545}
Patch Set 1 #Patch Set 2 : Merge #Patch Set 3 : . #
Total comments: 3
Patch Set 4 : Fix typo #Patch Set 5 : Handle more cases #Patch Set 6 : . #Patch Set 7 : Fix #
Total comments: 90
Patch Set 8 : Review comments #
Total comments: 13
Patch Set 9 : Review comments #Messages
Total messages: 78 (47 generated)
Description was changed from ========== Rename BUG= ========== to ========== GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetRegistryLengthForCanonicalHost function. A further advantage is that these calls will be faster. - There are two remaining calls that are wrong and it's not obvious how to fix them. The existing function call was renamed to BuggyGetHostRegistryLength and these callers use it with a TODO. BUG=657199 ==========
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...) win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
Merge
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...)
Description was changed from ========== GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetRegistryLengthForCanonicalHost function. A further advantage is that these calls will be faster. - There are two remaining calls that are wrong and it's not obvious how to fix them. The existing function call was renamed to BuggyGetHostRegistryLength and these callers use it with a TODO. BUG=657199 ========== to ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetRegistryLengthForCanonicalHost function. A further advantage is that these calls will be faster. - There are two remaining calls that are wrong and it's not obvious how to fix them. The existing function call was renamed to BuggyGetHostRegistryLength and these callers use it with a TODO. BUG=657199 ==========
.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetRegistryLengthForCanonicalHost function. A further advantage is that these calls will be faster. - There are two remaining calls that are wrong and it's not obvious how to fix them. The existing function call was renamed to BuggyGetHostRegistryLength and these callers use it with a TODO. BUG=657199 ========== to ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetRegistryLengthForCanonicalHost function. A further advantage is that these calls will be faster. - There are two remaining calls that are wrong and it's not obvious how to fix them. The existing function call was renamed to BuggyGetHostRegistryLength and these callers use it with a TODO. Adds an IDN test case to the unit tests. BUG=657199 ==========
brettw@chromium.org changed reviewers: + pkasting@chromium.org
https://codereview.chromium.org/2433583002/diff/40001/components/google/core/... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2433583002/diff/40001/components/google/core/... components/google/core/browser/google_util.cc:91: const GURL& base_url(CommandLineGoogleBaseURL()); This is separated out from below to avoid double-canonicalizing in the GURL case (IsGoogleDomainUrl). https://codereview.chromium.org/2433583002/diff/40001/extensions/common/manif... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/40001/extensions/common/manif... extensions/common/manifest_handlers/externally_connectable.cc:143: url::CanonHostInfo host_info; This is changed around a bit because it wanted to check for the behavior of "invalid host" and I didn't want to change the semantics. So it explicitly canonicalizes first, checks for success, then uses the bool version later for clarity. This could have used the Buggy* version without being buggy since it only cares about 0, npos, and >0, but that seemed potentially confusing.
https://codereview.chromium.org/2433583002/diff/40001/net/base/registry_contr... File net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc (right): https://codereview.chromium.org/2433583002/diff/40001/net/base/registry_contr... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:278: EXPECT_EQ(10U, BuggyGetHostRegistryLength("foo.xn--fiqs8s", This has a bug but I'm not uploading the fixed version yet to avoid clobbering the comments I added.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
Fix typo
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
PS4 fixes a typo, you may want to review PS3 since I left a couple of comments.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I may be slow reviewing this; I just got placed on unscheduled sheriff duty for the next two days.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Actually hold off on this, I think I found a solution to do it correctly.
Description was changed from ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetRegistryLengthForCanonicalHost function. A further advantage is that these calls will be faster. - There are two remaining calls that are wrong and it's not obvious how to fix them. The existing function call was renamed to BuggyGetHostRegistryLength and these callers use it with a TODO. Adds an IDN test case to the unit tests. BUG=657199 ========== to ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetCanonicalHostRegistryLength function. A further advantage is that these calls will be faster. - A new Permissive function is added that handles cases where the input is necessarily non-canonical. Adds an IDN test case to the unit tests. BUG=657199 ==========
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetCanonicalHostRegistryLength function. A further advantage is that these calls will be faster. - A new Permissive function is added that handles cases where the input is necessarily non-canonical. Adds an IDN test case to the unit tests. BUG=657199 ========== to ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetCanonicalHostRegistryLength function. A further advantage is that these calls will be faster. - A new Permissive function is added that handles cases where the input is necessarily non-canonical. Adds an IDN test case to the unit tests. Removes checking for IP addresses in the already-known-canonical cases. This requires a separate full canonicalization and IP addresses should never match the R.C.D. list. BUG=657199 ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
.
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Okay, this is ready now, this actually fixes the bug now rather than leaving the several uses cases with a "Buggy' version of the function marked.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...)
Fix
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/android... File chrome/browser/android/history_report/delta_file_commons.cc (right): https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/android... chrome/browser/android/history_report/delta_file_commons.cc:19: using net::registry_controlled_domains::GetCanonicalHostRegistryLength; Nit: I'd just qualify the name directly below, but whatever (same comment applies in other files) https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/supervi... File chrome/browser/supervised_user/supervised_user_url_filter.cc (right): https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/supervi... chrome/browser/supervised_user/supervised_user_url_filter.cc:284: const std::string& canonical_host, Nit: Update param name in header as well. I suggest renaming the function CanonicalHostMatchesPattern() too if you're really keen to emphasize that everywhere. https://codereview.chromium.org/2433583002/diff/120001/components/google/core... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/google/core... components/google/core/browser/google_util.cc:95: return IsValidHostName(canonical_host, "google", subdomain_permission); Nit: Simpler: return (base_url.is_valid() && (canonical_host == base_url.host_piece()) || IsValidHostName(canonical_host, "google", subdomain_permission); https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... File components/omnibox/browser/history_quick_provider.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... components/omnibox/browser/history_quick_provider.cc:156: } Nit: Simpler: // Internet hosts get one score, intranet hosts get another. const bool internet_host = net::registry_controlled_domains::HostHasRegistryControlledDomain( host, net::registry_controlled_domains::EXCLUDE_UNKNOWN_REGISTRIES, net::registry_controlled_domains::EXCLUDE_PRIVATE_REGISTRIES); url_what_you_typed_match_score = internet_host ? HistoryURLProvider::kScoreForWhatYouTypedResult : HistoryURLProvider::kScoreForUnvisitedIntranetResult; https://codereview.chromium.org/2433583002/diff/120001/components/ssl_errors/... File components/ssl_errors/error_classification.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/ssl_errors/... components/ssl_errors/error_classification.cc:281: bool IsHostNameKnownTLD(const std::string& host_name) { Nit: Bad form to ask you to change this while here, but I feel like this name is misleading (it's not saying the host _is_ a known TLD, but rather that it _has_ one), and it would be nice to rename it HostHasKnownTLD(), or even HostHasRegistryControlledDomain(). https://codereview.chromium.org/2433583002/diff/120001/content/renderer/webpu... File content/renderer/webpublicsuffixlist_impl.cc (right): https://codereview.chromium.org/2433583002/diff/120001/content/renderer/webpu... content/renderer/webpublicsuffixlist_impl.cc:17: net::registry_controlled_domains::GetCanonicalHostRegistryLength( Nit: Kinda makes me nervous that this assumes |host| is canonical. At the least we should probably rename |host| to |canonical_host| and write that it must be canonical in the declaration comment? I wish OriginAccessEntry had more comments to indicate that it was only expected to be constructed with canonicalized hosts or something? https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; This violates the style guide by handling NOTREACHED. I think we should just delete this block entirely. If you are super-uncomfortable doing it in this CL I can write a separate CL to replace the whole original block with a DCHECK_NE(std::string::npos, registry_length), but then this CL would still want to rip out that DCHECK to make use of HostHasRegistryControlledDomain() anyway, so that's just make-work that doesn't buy anything. So let's just remove it. https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:157: canonical_host, Nit: Seems like using pattern.host() here is fine regardless of whether we rip the block above out, since this function will try to re-canonicalize anyway. https://codereview.chromium.org/2433583002/diff/120001/extensions/common/url_... File extensions/common/url_pattern.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/url_... extensions/common/url_pattern.cc:463: base::StringPrintf("notatld.%s", host_.c_str()), Nit: Replace this StringPrintf() call with just "notatld" + host_, https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (left): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:244: if (gurl.HostIsIPAddress()) I assume deleting this is not a behavior change because presumably we never have TLDs that are numeric. (I haven't thought about if IPv6 addresses cause problems.) Is it an efficiency change? I'd hate to lose efficiency for IP addresses passed here (but if we gain efficiency, yay?). https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:208: std::string canonical_host; // No not modify outside of canon_output. Nit: No -> Do ? https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:209: canonical_host.reserve(host.size()); Nit: The rest of this file, and the majority of Chrome code I've seen, seems to prefer length() to size() for strings. (Several places) https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:215: while (current < host.size()) { Nit: Shorter, narrows scope of |current|: for (size_t current = 0; current < host.size(); ++current) { This also allows deleting the "current++" line near the bottom of the loop. (It's OK to increment |current| unconditionally since if we already hit the end of the string, one past will still terminate the loop.) https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:216: // Advance to next "." or end. Nit: Technically, this comment belongs on the while() loop; |begin| is not related. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:217: size_t begin = current; Nit: Can be const https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:219: current++; Nit: Shorter: current = std::min(host.find('.', begin), host.length()); https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:224: mapping.canonical_begin = static_cast<size_t>(canon_output.length()); Nit: If you save this in a temp here instead of creating a whole MappedHostComponent: size_t canonical_begin = static_cast<size_t>(canon_output.length()); Then you can remove the rest of this block and do something like this below: components.push_back({begin, current, canonical_begin, static_cast<size_t>(canon_output.length())}); https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:226: // Append the canonicalized version of this component. Nit: Append -> Try to append ? https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:231: // Failure to canonicalize this component, append as-is. Nit: Failure -> Failed; comma -> semicolon https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:233: canon_output.push_back(host[i]); Nit: Shorter and more efficient: canon_output.Append(&host[begin], static_cast<int>(current - begin)); (Probably makes sense to pull the "static_cast<int>(current - begin)" part out to a temp given that this will be the second time you've done it) https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:260: if (canonical_rcd_begin < mapping.canonical_end) { Nit: If you change this to this, at the top of the loop: if (canonical_rcd_begin >= mapping.canonical_end) continue; Then not only can you unindent this whole remaining section, but you probably get a small efficiency win, since you'll be doing one comparison instead of two for every skipped component. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:276: for (int current_try = static_cast<int>(mapping.original_end) - 1; Can't we binary-search this instead of linear-searching it, using the length of the canonicalized output? (This also implies, "can't we just compare string lengths and not full strings below".) https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:277: current_try >= static_cast<int>(mapping.original_begin); Seems like we could get by with ">" instead of ">=" since we already checked the == case above, right? https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:278: current_try--) { Nit: Predecrement https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:368: private_filter) > 0; "> 0" doesn't seem right; we want to return false for npos, don't we? https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:370: return GetRegistryLengthImpl(host, unknown_filter, private_filter) > 0; Same question. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:373: return false Semicolon https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.h (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:233: // registry-controlled portion. Nit: How about this last sentence: "Also returns true for an invalid host name like "*.google.com", as long as it has a valid registry-controlled portion." https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:240: // a GURL. Prefer the GURL version or HasRegistryControlledSuffix to eliminate Nit: HasRegistryControlledSuffix -> HostHasRegistryControlledDomain https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:259: // Invalid components are skipped only when they are dot-separated. Nit: I'm confused. Is this paragraph simply trying to say: Like HostHasRegistryControlledDomain, this can accept hostnames that are otherwise invalid, as long as they end with a dot followed by a valid registry-controlled portion. ...Because if not, the bit about "non-canonical dot" has me completely lost. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:263: // string will return std::string::npos like GetRegistryLength(). Nit: Is the last sentence necessary? I mean, we already said this function was like GetRegistryLength(). https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:57: UnknownRegistryFilter unknown_filter) { Nit: This function is rarely passed anything other than EXCLUDE_UNKNOWN_REGISTRIES, so you could eliminate this param and save a bunch of verbiage below, and use GetRegistryLength() directly for the remaining calls (or, I suppose, write an "IncludingBoth" helper). A similar comment applies to GetCanonicalHostRegistryLength(), and possibly to GetRegistryLengthFromURL(), though that has a few more INCLUDE_UNKNOWN_REGISTRIES calls. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:64: UnknownRegistryFilter unknown_filter) { Nit: These two functions are never passed anything other than EXCLUDE_UNKNOWN_REGISTRIES, so you can just eliminate this param and save a bunch of verbiage below. A similar comment applies to GetCanonicalHostRegistryLengthIncludingPrivate(). https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:288: // IDN case. This domain is: Unicode = U+4E2D, U+56FD; IDN = xn--fiqs8s Nit: Why do we care what the Unicode domain is? https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:548: // Trailing spaces are counted as part of the hostname so make it not match. Nit: "hostname so make it not match" -> "hostname, meaning this no longer has a known registry" https://codereview.chromium.org/2433583002/diff/120001/net/cert/cert_verify_p... File net/cert/cert_verify_proc.cc (left): https://codereview.chromium.org/2433583002/diff/120001/net/cert/cert_verify_p... net/cert/cert_verify_proc.cc:540: if (registry_len == 0) We don't need to worry about a behavioral change due to how this used to not check for npos, do we? https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h File url/url_canon.h (right): https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h#newcod... url/url_canon.h:390: // is more common because all numbers by themselves are considered IP Nit: "more common" than what? More common than one might think reading this comment? https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h#newcod... url/url_canon.h:393: // Be careful, unless the input is guaranteed ASCII, it's not possible to split Nit: first comma -> semicolon https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h#newcod... url/url_canon.h:395: // substring as a unit. Nit: Maybe this paragraph should be more direct: "Because Punycode works on each dot-separated substring as a unit, you should only pass this function substrings that represent complete dot-separated subcomponents of the original host." (Do we really need to put in an ASCII caveat?) https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_host.cc File url/url_canon_host.cc (right): https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_host.cc#... url/url_canon_host.cc:327: return success; Nit: Slightly simpler: if (has_non_ascii || has_escaped) { return DoComplexHost(&spec[host.begin], host.len, has_non_ascii, has_escaped, output); } const bool success = DoSimpleHost(&spec[host.begin], host.len, output, &has_non_ascii); DCHECK(!has_non_ascii); return success; https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_host.cc#... url/url_canon_host.cc:345: if (!DoHostSubstring<CHAR, UCHAR>(spec, host, output)) { Nit: Reverse conditional and arms, so "else" does not mentally read like a double-negative https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_unittest.cc File url/url_canon_unittest.cc (right): https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_unittest... url/url_canon_unittest.cc:920: EXPECT_EQ("", out_str); Nit: Since |out_str| is a std::string I tend to prefer EXPECT_EQ(std::string(), out_str) (or EXPECT_TRUE(out_str.empty())), but whatever
Review comments
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
new snap up. https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/android... File chrome/browser/android/history_report/delta_file_commons.cc (right): https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/android... chrome/browser/android/history_report/delta_file_commons.cc:19: using net::registry_controlled_domains::GetCanonicalHostRegistryLength; I left as-is. I wouldn't personally have done the usings, but the call is pretty icky with them so I'm inclined not to mess with it. https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/supervi... File chrome/browser/supervised_user/supervised_user_url_filter.cc (right): https://codereview.chromium.org/2433583002/diff/120001/chrome/browser/supervi... chrome/browser/supervised_user/supervised_user_url_filter.cc:284: const std::string& canonical_host, I renamed the param but not the function. I think I will have to do more to this one in a followup for handling *'s. https://codereview.chromium.org/2433583002/diff/120001/components/google/core... File components/google/core/browser/google_util.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/google/core... components/google/core/browser/google_util.cc:95: return IsValidHostName(canonical_host, "google", subdomain_permission); Not done. https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... File components/omnibox/browser/history_quick_provider.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... components/omnibox/browser/history_quick_provider.cc:156: } Not done. https://codereview.chromium.org/2433583002/diff/120001/components/ssl_errors/... File components/ssl_errors/error_classification.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/ssl_errors/... components/ssl_errors/error_classification.cc:281: bool IsHostNameKnownTLD(const std::string& host_name) { Seems like a good improvement, done. https://codereview.chromium.org/2433583002/diff/120001/content/renderer/webpu... File content/renderer/webpublicsuffixlist_impl.cc (right): https://codereview.chromium.org/2433583002/diff/120001/content/renderer/webpu... content/renderer/webpublicsuffixlist_impl.cc:17: net::registry_controlled_domains::GetCanonicalHostRegistryLength( I added the comment, there is no parameter name because it's WebKit :( https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; I was also unhappy with this function when I was writing it. I kind of wanted to just check rcd::HostHasRegistryControlledDomain and delete this call altogether. But I don't actually understand the calling code very well or how strong the guarantees actually are above, and it seemed like it might be a potentially security-sensitive check (these patterns are coming from extensions manifests which makes me paranoid). I didn't want to fall through to doing something if it actually means "malicious extension manifest" so I defaulted to exactly preserving the previous behavior. https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:157: canonical_host, If we're keeping the code above (which I think we should), it will tend to be faster to canonicalize an already-canonicalized host. https://codereview.chromium.org/2433583002/diff/120001/extensions/common/url_... File extensions/common/url_pattern.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/url_... extensions/common/url_pattern.cc:463: base::StringPrintf("notatld.%s", host_.c_str()), Done, I didn't notice how dumb the old code was. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (left): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:244: if (gurl.HostIsIPAddress()) On 2016/10/22 05:04:19, Peter Kasting wrote: > I assume deleting this is not a behavior change because presumably we never have > TLDs that are numeric. (I haven't thought about if IPv6 addresses cause > problems.) > > Is it an efficiency change? I'd hate to lose efficiency for IP addresses passed > here (but if we gain efficiency, yay?). I did this because it can never match, and decoding the IP address is equivalent to recanonicalizing the entire host again. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:209: canonical_host.reserve(host.size()); I've always written size() but will use length() for file consistency. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:215: while (current < host.size()) { On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: Shorter, narrows scope of |current|: > > for (size_t current = 0; current < host.size(); ++current) { > > This also allows deleting the "current++" line near the bottom of the loop. > (It's OK to increment |current| unconditionally since if we already hit the end > of the string, one past will still terminate the loop.) Done. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:216: // Advance to next "." or end. On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: Technically, this comment belongs on the while() loop; |begin| is not > related. Done. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:217: size_t begin = current; On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: Can be const I prefer not to mark random local variables const just because they can be. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:219: current++; On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: Shorter: > > current = std::min(host.find('.', begin), host.length()); This seems less readable to me. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:224: mapping.canonical_begin = static_cast<size_t>(canon_output.length()); On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: If you save this in a temp here instead of creating a whole > MappedHostComponent: > > size_t canonical_begin = static_cast<size_t>(canon_output.length()); > > Then you can remove the rest of this block and do something like this below: > > components.push_back({begin, current, canonical_begin, > static_cast<size_t>(canon_output.length())}); This feels icky to me, I don't like implicitly depending on the order of 4 ints in the struct which is surprising to somebody changing the code. My current structure felt cleaner than writing a constructor (which I would normally do for this kind of thing) because it was only used in one place and that would be even more code). https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:226: // Append the canonicalized version of this component. On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: Append -> Try to append ? Done. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:233: canon_output.push_back(host[i]); Actually this old code was wrong in the UTF-16 case (not sure why it compiled). I had to write a separate function to do this. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:260: if (canonical_rcd_begin < mapping.canonical_end) { On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: If you change this to this, at the top of the loop: > > if (canonical_rcd_begin >= mapping.canonical_end) > continue; > > Then not only can you unindent this whole remaining section, but you probably > get a small efficiency win, since you'll be doing one comparison instead of two > for every skipped component. Done. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:276: for (int current_try = static_cast<int>(mapping.original_end) - 1; That doesn't work. Canonicalization can make the result longer or shorter than the original. Canonicalizing the last 3 characters of "%43om" gives "3om, but canonicalizing the last 5 gives "com" https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:278: current_try--) { On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: Predecrement I don't understand this comment. Are you asking because you prefer predecrement generally (in which I case I think your comment is not helpful) or is there a logic problem that I don't see? https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:368: private_filter) > 0; On 2016/10/22 05:04:19, Peter Kasting wrote: > "> 0" doesn't seem right; we want to return false for npos, don't we? Doth done, thanks for catching. I added some unit test cases for the empty string cases. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.h (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:240: // a GURL. Prefer the GURL version or HasRegistryControlledSuffix to eliminate On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: HasRegistryControlledSuffix -> HostHasRegistryControlledDomain Done. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:259: // Invalid components are skipped only when they are dot-separated. On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: I'm confused. > > Is this paragraph simply trying to say: > > Like HostHasRegistryControlledDomain, this can accept hostnames that are > otherwise invalid, as long as they end with a dot followed by a valid > registry-controlled portion. > > ...Because if not, the bit about "non-canonical dot" has me completely lost. It actually handles more cases than your sentence, I expanded on this. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:263: // string will return std::string::npos like GetRegistryLength(). I expanded on this a bit. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:64: UnknownRegistryFilter unknown_filter) { On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: These two functions are never passed anything other than > EXCLUDE_UNKNOWN_REGISTRIES, so you can just eliminate this param and save a > bunch of verbiage below. > > A similar comment applies to GetCanonicalHostRegistryLengthIncludingPrivate(). Done. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:288: // IDN case. This domain is: Unicode = U+4E2D, U+56FD; IDN = xn--fiqs8s Done, the previous patch had a bunch more stuff here. https://codereview.chromium.org/2433583002/diff/120001/net/cert/cert_verify_p... File net/cert/cert_verify_proc.cc (left): https://codereview.chromium.org/2433583002/diff/120001/net/cert/cert_verify_p... net/cert/cert_verify_proc.cc:540: if (registry_len == 0) On 2016/10/22 05:04:20, Peter Kasting wrote: > We don't need to worry about a behavioral change due to how this used to not > check for npos, do we? No, because the name is canonical (it would be slightly faster to call the canonical version and get the length, but I think this is more readable). https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h File url/url_canon.h (right): https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h#newcod... url/url_canon.h:395: // substring as a unit. I mostly did your comment with some addition. https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_host.cc File url/url_canon_host.cc (right): https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_host.cc#... url/url_canon_host.cc:327: return success; On 2016/10/22 05:04:20, Peter Kasting wrote: > Nit: Slightly simpler: > > if (has_non_ascii || has_escaped) { > return DoComplexHost(&spec[host.begin], host.len, has_non_ascii, > has_escaped, output); > } > > const bool success = > DoSimpleHost(&spec[host.begin], host.len, output, &has_non_ascii); > DCHECK(!has_non_ascii); > return success; Done. https://codereview.chromium.org/2433583002/diff/120001/url/url_canon_host.cc#... url/url_canon_host.cc:345: if (!DoHostSubstring<CHAR, UCHAR>(spec, host, output)) { On 2016/10/22 05:04:20, Peter Kasting wrote: > Nit: Reverse conditional and arms, so "else" does not mentally read like a > double-negative Done.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Have not yet re-reviewed, just replying to a few things. Fast work! This was a hairy patch with a lot of stuff to get through :) https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... File components/omnibox/browser/history_quick_provider.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... components/omnibox/browser/history_quick_provider.cc:156: } On 2016/10/24 21:45:23, brettw (ping on IM after 24h) wrote: > Not done. I'm OK with this not being done, but I was slightly worried about the possibility of the original code here being modified later to not set |url_what_you_typed_match_score| sometimes, and I was trying to avoid the existing comments, which aren't quite accurate. ("Probably an intranet host" is incorrect, since this is known to be an intranet host at this point.) Instead of trying to wordsmith these, I was hoping to just write the code so that the comments were no longer needed. I dunno if there is a variant that addresses those concerns better. IMO the original code here is really ugly, and my proposed rewrite is somewhat ugly, and both have to do with deep nesting and long, qualified names. :/ https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; On 2016/10/24 21:45:23, brettw (ping on IM after 24h) wrote: > I was also unhappy with this function when I was writing it. I kind of wanted to > just check rcd::HostHasRegistryControlledDomain and delete this call altogether. > > But I don't actually understand the calling code very well or how strong the > guarantees actually are above, and it seemed like it might be a potentially > security-sensitive check (these patterns are coming from extensions manifests > which makes me paranoid). I didn't want to fall through to doing something if it > actually means "malicious extension manifest" so I defaulted to exactly > preserving the previous behavior. Pretty sure we need to handle malicious input here, but that this code is literally saying control flow can't reach here due to the parsing above. It made more sense in the old world since the old code called a function that had a return value that we were basically saying couldn't be returned. But in the new world you're now adding explicit code to go out of its way and check for this case. This raises my hackles because there's no way someone reading the new version could think anything other than that this is, in fact, possible to hit, and we need this code (and the NOTREACHED is the part that's broken). In other words, I think you're reinforcing the precise concerns you had :) I suggest looping in an OWNER of this code to sanity-check that my interpretation and proposed change are correct. If possible I'd like to avoid adding this code just to spin up a separate parallel review to rip it back out, though I also don't want to block your fix if for some reason correct resolution of this turns out to be hard. https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:157: canonical_host, On 2016/10/24 21:45:23, brettw (ping on IM after 24h) wrote: > If we're keeping the code above (which I think we should), it will tend to be > faster to canonicalize an already-canonicalized host. Seems fair, I mostly said this because I'm convinced the above code must go away and I wanted to be clear (to myself) we didn't need to canonicalize this. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:217: size_t begin = current; On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > On 2016/10/22 05:04:19, Peter Kasting wrote: > > Nit: Can be const > > I prefer not to mark random local variables const just because they can be. There's no Chromium-wide pattern, so that's fine. However, note that both the Google Style Guide and Effective C++ recommend const for local variables when it's possible to use it. Consider changing your personal patterns over time :) https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:219: current++; On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > On 2016/10/22 05:04:19, Peter Kasting wrote: > > Nit: Shorter: > > > > current = std::min(host.find('.', begin), host.length()); > > This seems less readable to me. What about: current = host.find('.', begin); if (current == std::string::npos) current = host.length(); This is longer, but I can tell immediately what it's trying to do, as opposed to the while loop. Although maybe now that the comment is moved down the existing code is good enough. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:276: for (int current_try = static_cast<int>(mapping.original_end) - 1; On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > That doesn't work. Canonicalization can make the result longer or shorter than > the original. Canonicalizing the last 3 characters of "%43om" gives "3om, but > canonicalizing the last 5 gives "com" Yeah, you're right. I thought about this some but not deeply enough. Might be good to add a comment like "Note that the length of the canonical output may both increase and decrease as we proceed through the string, so we can't binary-search based on the output length; we need to do an exhaustive linear search." https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:278: current_try--) { On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > On 2016/10/22 05:04:19, Peter Kasting wrote: > > Nit: Predecrement > > I don't understand this comment. Are you asking because you prefer predecrement > generally (in which I case I think your comment is not helpful) or is there a > logic problem that I don't see? There's no logic problem, I was suggesting using predecrement because you don't need postdecrement form. I thought the Google Style Guide mandated this, which it does, but only for non-scalar types; it says you can use either form for scalar types. Pretty much all the rest of the Chromium code I've worked with uses prefix forms for scalars as well for consistency, which seems wise to me (postfix form leaps out as "I'm doing something clever!"), so I suggest (but do not require) using it here as well for that reason. (Or in other words "there are weak reasons to prefer prefix form here, are there weak reasons to prefer postfix form?")
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_clang_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_clan...)
LGTM, but if possible, please try to get sufficient approval from a relevant OWNER of externally_connectable.cc to avoid adding the new code there to handle the NOTREACHED() case before landing. I think everything else from my last message + this one is style debates, comment improvements, small efficiency wins, and the like. Thanks for writing this patch. Checking the callers to see which could really be doing something dangerous can't have been thrilling, and the Permissive version of the registry length getter was hairy. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:277: current_try >= static_cast<int>(mapping.original_begin); On 2016/10/22 05:04:19, Peter Kasting wrote: > Seems like we could get by with ">" instead of ">=" since we already checked the > == case above, right? Still wondering about this. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:57: UnknownRegistryFilter unknown_filter) { On 2016/10/22 05:04:19, Peter Kasting wrote: > Nit: This function is rarely passed anything other than > EXCLUDE_UNKNOWN_REGISTRIES, so you could eliminate this param and save a bunch > of verbiage below, and use GetRegistryLength() directly for the remaining calls > (or, I suppose, write an "IncludingBoth" helper). > > A similar comment applies to GetCanonicalHostRegistryLength(), and possibly to > GetRegistryLengthFromURL(), though that has a few more > INCLUDE_UNKNOWN_REGISTRIES calls. Did you elect to reject this suggestion? Seems like it could reduce a lot of noise. https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h File url/url_canon.h (right): https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h#newcod... url/url_canon.h:390: // is more common because all numbers by themselves are considered IP On 2016/10/22 05:04:20, Peter Kasting wrote: > Nit: "more common" than what? More common than one might think reading this > comment? (Still confused about this) https://codereview.chromium.org/2433583002/diff/140001/components/ssl_errors/... File components/ssl_errors/error_classification.cc (right): https://codereview.chromium.org/2433583002/diff/140001/components/ssl_errors/... components/ssl_errors/error_classification.cc:76: !(HostNameHasKnownTLD(dns_name))) { Nit: Remove extra parens https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:390: return rcd_length > 0; Nit: Shorter and seems slightly simpler?: return (rcd_length != 0) && (rcd_length != std::string::npos); https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.h (right): https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:232: // host names like "*.google.com" as long as it has a valid registry-controlled Nit: Change "for invalid host names" to "for an invalid host name" (plurality agreement). https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:258: // (which they won't match). Nit: I don't totally know what "passed through for suffix lookup" means (and don't know what improvement to suggest). https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:273: // because invalid portions are skipped, it won't return npos in any other case. Nit: OK, after reading this a couple times, I think I understand what it wants to say. My confusion arose because the comments in the third paragraph talk about piecewise canonicalization, but the comments above don't actually talk about that, so I was a bit context-free. It might make sense to say something in the first paragraph like: "Like GetRegistryLength for a potentially non-canonicalized hostname. This splits the input into substrings at '.' characters, then attempts to piecewise-canonicalize the substrings. After finding the registry length of the concatenated piecewise string, it then maps back to the corresponding length in the original input string." With that stage set, much of the rest of the comment is clearer to me. (I might also change paragraph three from "But since the input is split on literal dots only before being piecewise-canonicalized, the dot won't get canonicalized, ..." to "But since the %2E will be in the same substring as the %00, the substring will fail to canonicalize, the %2E will be left escaped, ...") https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc (right): https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:293: EXPECT_FALSE(HostHasRegistryControlledDomain("", EXCLUDE_UNKNOWN_REGISTRIES, Nit: "" -> std::string() (general Chromium pattern, saves a conversion) https://codereview.chromium.org/2433583002/diff/140001/third_party/WebKit/pub... File third_party/WebKit/public/platform/WebPublicSuffixList.h (right): https://codereview.chromium.org/2433583002/diff/140001/third_party/WebKit/pub... third_party/WebKit/public/platform/WebPublicSuffixList.h:41: virtual size_t getPublicSuffixLength(const WebString&) = 0; Nit: In WebKit style, it's OK to add a param name here as long as the name is not obvious/redundant with the function name. Here, I think it's perfectly legal to call this canonical_hostname or whatever. https://codereview.chromium.org/2433583002/diff/140001/url/url_canon.h File url/url_canon.h (right): https://codereview.chromium.org/2433583002/diff/140001/url/url_canon.h#newcod... url/url_canon.h:395: // dot-separated subcomponents of the original host. Even if you has ASCII Nit: has -> have https://codereview.chromium.org/2433583002/diff/140001/url/url_canon.h#newcod... url/url_canon.h:396: // input, percent-escaped characters will have different meaning if split in Nit: meaning -> meanings
asargent@chromium.org changed reviewers: + asargent@chromium.org
https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; On 2016/10/24 23:04:32, Peter Kasting wrote: > On 2016/10/24 21:45:23, brettw (ping on IM after 24h) wrote: > > I was also unhappy with this function when I was writing it. I kind of wanted > to > > just check rcd::HostHasRegistryControlledDomain and delete this call > altogether. > > > > But I don't actually understand the calling code very well or how strong the > > guarantees actually are above, and it seemed like it might be a potentially > > security-sensitive check (these patterns are coming from extensions manifests > > which makes me paranoid). I didn't want to fall through to doing something if > it > > actually means "malicious extension manifest" so I defaulted to exactly > > preserving the previous behavior. > > Pretty sure we need to handle malicious input here, but that this code is > literally saying control flow can't reach here due to the parsing above. It > made more sense in the old world since the old code called a function that had a > return value that we were basically saying couldn't be returned. But in the new > world you're now adding explicit code to go out of its way and check for this > case. This raises my hackles because there's no way someone reading the new > version could think anything other than that this is, in fact, possible to hit, > and we need this code (and the NOTREACHED is the part that's broken). In other > words, I think you're reinforcing the precise concerns you had :) > > I suggest looping in an OWNER of this code to sanity-check that my > interpretation and proposed change are correct. If possible I'd like to avoid > adding this code just to spin up a separate parallel review to rip it back out, > though I also don't want to block your fix if for some reason correct resolution > of this turns out to be hard. For context, this check is less about security and more about our philosophy of the web platform and how web pages can rely on extensions to provide additional capabilities. The externally_connectable feature lets an extension specify an explicit set of url match patterns that are allowed to open a messaging pipe to the extension, and when the feature was developed there was a strong desire from the web platform folks to limit this to some finite set of websites (ideally something like a website companion extension developed by the same team as the website itself) to avoid having extensions that become de facto providers of additional API surface to the entire web platform. The guarantee we want is that each url match pattern we put into |matches| is roughly "a particular site", e.g. not matching all hosts or all urls under an "eTLD"/"public suffix". It's ok (definitely not something we should crash on) if an extension accidentally provides an invalid entry; we should just generate an install warning and keep processing. As for the question of the NOTREACHED, it seems like we're guaranteed that pattern.host() will be some non-empty string, but isn't it possible that (eg due to bugs in URLPattern) that it will be some nonempty but nonsensical string that will end up failing the canonicalization process in CanonicalizeHost? I didn't trace beyond the body of CanonicalizeHost in net/base/url_util.cc, but it sure looks like there is a possibility canon_host can end up empty. So my vote would be to just remove the NOTREACHED and change the code to treat this as an install warning and keep going like the other cases here (instead of setting *error and returning a null ExternallyConnectableInfo).
On 2016/10/25 18:20:49, Antony Sargent wrote: > https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... > File extensions/common/manifest_handlers/externally_connectable.cc (right): > > https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... > extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() > << *it; > On 2016/10/24 23:04:32, Peter Kasting wrote: > > On 2016/10/24 21:45:23, brettw (ping on IM after 24h) wrote: > > > I was also unhappy with this function when I was writing it. I kind of > wanted > > to > > > just check rcd::HostHasRegistryControlledDomain and delete this call > > altogether. > > > > > > But I don't actually understand the calling code very well or how strong the > > > guarantees actually are above, and it seemed like it might be a potentially > > > security-sensitive check (these patterns are coming from extensions > manifests > > > which makes me paranoid). I didn't want to fall through to doing something > if > > it > > > actually means "malicious extension manifest" so I defaulted to exactly > > > preserving the previous behavior. > > > > Pretty sure we need to handle malicious input here, but that this code is > > literally saying control flow can't reach here due to the parsing above. It > > made more sense in the old world since the old code called a function that had > a > > return value that we were basically saying couldn't be returned. But in the > new > > world you're now adding explicit code to go out of its way and check for this > > case. This raises my hackles because there's no way someone reading the new > > version could think anything other than that this is, in fact, possible to > hit, > > and we need this code (and the NOTREACHED is the part that's broken). In > other > > words, I think you're reinforcing the precise concerns you had :) > > > > I suggest looping in an OWNER of this code to sanity-check that my > > interpretation and proposed change are correct. If possible I'd like to avoid > > adding this code just to spin up a separate parallel review to rip it back > out, > > though I also don't want to block your fix if for some reason correct > resolution > > of this turns out to be hard. > > > For context, this check is less about security and more about our philosophy of > the web platform and how web pages can rely on extensions to provide additional > capabilities. The externally_connectable feature lets an extension specify an > explicit set of url match patterns that are allowed to open a messaging pipe to > the extension, and when the feature was developed there was a strong desire from > the web platform folks to limit this to some finite set of websites (ideally > something like a website companion extension developed by the same team as the > website itself) to avoid having extensions that become de facto providers of > additional API surface to the entire web platform. > > The guarantee we want is that each url match pattern we put into |matches| is > roughly "a particular site", e.g. not matching all hosts or all urls under an > "eTLD"/"public suffix". It's ok (definitely not something we should crash on) if > an extension accidentally provides an invalid entry; we should just generate an > install warning and keep processing. > > As for the question of the NOTREACHED, it seems like we're guaranteed that > pattern.host() will be some non-empty string, but isn't it possible that (eg due > to bugs in URLPattern) that it will be some nonempty but nonsensical string that > will end up failing the canonicalization process in CanonicalizeHost? I didn't > trace beyond the body of CanonicalizeHost in net/base/url_util.cc, but it sure > looks like there is a possibility canon_host can end up empty. So my vote would > be to just remove the NOTREACHED and change the code to treat this as an install > warning and keep going like the other cases here (instead of setting *error and > returning a null ExternallyConnectableInfo). Thanks, I'll remove the NOTREACHED and keep the error message.
https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; On 2016/10/25 18:20:49, Antony Sargent wrote: > As for the question of the NOTREACHED, it seems like we're guaranteed that > pattern.host() will be some non-empty string, but isn't it possible that (eg due > to bugs in URLPattern) that it will be some nonempty but nonsensical string that > will end up failing the canonicalization process in CanonicalizeHost? Will it? If so, then handling this makes sense, but can you give a sample input which would cause this, so we can stick it in a motivating comment here? I want to make sure we're only handling a real case that can actually get here, and not handling something that could only ever arise if other code was not obeying its contracts. If, for example, URLPattern::Parse() should never return PARSE_SUCCESS on a non-canonical host, we should definitely not handle failing that here. (If it shouldn't do this but currently does, then we should separately fix that issue and then make sure we don't handle that here.) Whereas if it's OK for Parse() to succeed on something that cannot be canonicalized, then handling this is correct, and we should provide a sample such input.
https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... File components/omnibox/browser/history_quick_provider.cc (right): https://codereview.chromium.org/2433583002/diff/120001/components/omnibox/bro... components/omnibox/browser/history_quick_provider.cc:156: } I removed "Probably" from the comment. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:219: current++; On 2016/10/24 23:04:32, Peter Kasting wrote: > On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > > On 2016/10/22 05:04:19, Peter Kasting wrote: > > > Nit: Shorter: > > > > > > current = std::min(host.find('.', begin), host.length()); > > > > This seems less readable to me. > > What about: > > current = host.find('.', begin); > if (current == std::string::npos) > current = host.length(); I can live with this. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:276: for (int current_try = static_cast<int>(mapping.original_end) - 1; On 2016/10/24 23:04:32, Peter Kasting wrote: > On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > > That doesn't work. Canonicalization can make the result longer or shorter than > > the original. Canonicalizing the last 3 characters of "%43om" gives "3om, but > > canonicalizing the last 5 gives "com" > > Yeah, you're right. I thought about this some but not deeply enough. > > Might be good to add a comment like "Note that the length of the canonical > output may both increase and decrease as we proceed through the string, so we > can't binary-search based on the output length; we need to do an exhaustive > linear search." Done. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:277: current_try >= static_cast<int>(mapping.original_begin); On 2016/10/25 01:33:32, Peter Kasting wrote: > On 2016/10/22 05:04:19, Peter Kasting wrote: > > Seems like we could get by with ">" instead of ">=" since we already checked > the > > == case above, right? > > Still wondering about this. You're correct, but in the context of a loop counting down ">" jumps out as a bug to me so I think it's more clear to keep it in there. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:278: current_try--) { On 2016/10/24 23:04:32, Peter Kasting wrote: > On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > > On 2016/10/22 05:04:19, Peter Kasting wrote: > > > Nit: Predecrement > > > > I don't understand this comment. Are you asking because you prefer > predecrement > > generally (in which I case I think your comment is not helpful) or is there a > > logic problem that I don't see? > > There's no logic problem, I was suggesting using predecrement because you don't > need postdecrement form. > > I thought the Google Style Guide mandated this, which it does, but only for > non-scalar types; it says you can use either form for scalar types. Pretty much > all the rest of the Chromium code I've worked with uses prefix forms for scalars > as well for consistency, which seems wise to me (postfix form leaps out as "I'm > doing something clever!"), so I suggest (but do not require) using it here as > well for that reason. I always write postfix for scalars because I think prefix looks weird. There are 21K instances of "i++" and 27K instances of "++i" in the code base, so "pretty much all" is an incorrect impression. Please be more careful about how you write these comments, I actually spent some time questioning myself and reading about increments in loops to make sure there wasn't something I was missing. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain_unittest.cc:57: UnknownRegistryFilter unknown_filter) { On 2016/10/25 01:33:32, Peter Kasting wrote: > On 2016/10/22 05:04:19, Peter Kasting wrote: > > Nit: This function is rarely passed anything other than > > EXCLUDE_UNKNOWN_REGISTRIES, so you could eliminate this param and save a bunch > > of verbiage below, and use GetRegistryLength() directly for the remaining > calls > > (or, I suppose, write an "IncludingBoth" helper). > > > > A similar comment applies to GetCanonicalHostRegistryLength(), and possibly to > > GetRegistryLengthFromURL(), though that has a few more > > INCLUDE_UNKNOWN_REGISTRIES calls. > > Did you elect to reject this suggestion? Seems like it could reduce a lot of > noise. I fixed the other ones where it wasn't used. I agree the test setup in this file isn't great, but felt like I'd done enough tweaking of it for this patch. https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h File url/url_canon.h (right): https://codereview.chromium.org/2433583002/diff/120001/url/url_canon.h#newcod... url/url_canon.h:390: // is more common because all numbers by themselves are considered IP On 2016/10/25 01:33:32, Peter Kasting wrote: > On 2016/10/22 05:04:20, Peter Kasting wrote: > > Nit: "more common" than what? More common than one might think reading this > > comment? > > (Still confused about this) Done. https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:390: return rcd_length > 0; On 2016/10/25 01:33:32, Peter Kasting wrote: > Nit: Shorter and seems slightly simpler?: > > return (rcd_length != 0) && (rcd_length != std::string::npos); Done. https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.h (right): https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:258: // (which they won't match). I clarified. https://codereview.chromium.org/2433583002/diff/140001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.h:273: // because invalid portions are skipped, it won't return npos in any other case. On 2016/10/25 01:33:32, Peter Kasting wrote: > Nit: OK, after reading this a couple times, I think I understand what it wants > to say. My confusion arose because the comments in the third paragraph talk > about piecewise canonicalization, but the comments above don't actually talk > about that, so I was a bit context-free. > > It might make sense to say something in the first paragraph like: > > "Like GetRegistryLength for a potentially non-canonicalized hostname. This > splits the input into substrings at '.' characters, then attempts to > piecewise-canonicalize the substrings. After finding the registry length of the > concatenated piecewise string, it then maps back to the corresponding length in > the original input string." > > With that stage set, much of the rest of the comment is clearer to me. (I might > also change paragraph three from "But since the input is split on literal dots > only before being piecewise-canonicalized, the dot won't get canonicalized, ..." > to "But since the %2E will be in the same substring as the %00, the substring > will fail to canonicalize, the %2E will be left escaped, ...") Done. https://codereview.chromium.org/2433583002/diff/140001/third_party/WebKit/pub... File third_party/WebKit/public/platform/WebPublicSuffixList.h (right): https://codereview.chromium.org/2433583002/diff/140001/third_party/WebKit/pub... third_party/WebKit/public/platform/WebPublicSuffixList.h:41: virtual size_t getPublicSuffixLength(const WebString&) = 0; On 2016/10/25 01:33:33, Peter Kasting wrote: > Nit: In WebKit style, it's OK to add a param name here as long as the name is > not obvious/redundant with the function name. Here, I think it's perfectly > legal to call this canonical_hostname or whatever. Done.
Review comments
The CQ bit was checked by brettw@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... File net/base/registry_controlled_domains/registry_controlled_domain.cc (right): https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:277: current_try >= static_cast<int>(mapping.original_begin); On 2016/10/25 20:28:17, brettw (ping on IM after 24h) wrote: > On 2016/10/25 01:33:32, Peter Kasting wrote: > > On 2016/10/22 05:04:19, Peter Kasting wrote: > > > Seems like we could get by with ">" instead of ">=" since we already checked > > the > > > == case above, right? > > > > Still wondering about this. > > You're correct, but in the context of a loop counting down ">" jumps out as a > bug to me so I think it's more clear to keep it in there. Maybe write '>' with a comment saying we already checked the == case above? It won't make any functional difference, but it really confused me while reading it because I started going back over to see whether the "==" this loop would be checking was functionally different than the "==" we were checking above. I never did completely convince myself one way or the other. I just wouldn't want anyone else to read this and go through the same process. https://codereview.chromium.org/2433583002/diff/120001/net/base/registry_cont... net/base/registry_controlled_domains/registry_controlled_domain.cc:278: current_try--) { On 2016/10/25 20:28:17, brettw (ping on IM after 24h) wrote: > On 2016/10/24 23:04:32, Peter Kasting wrote: > > On 2016/10/24 21:45:24, brettw (ping on IM after 24h) wrote: > > > On 2016/10/22 05:04:19, Peter Kasting wrote: > > > > Nit: Predecrement > > > > > > I don't understand this comment. Are you asking because you prefer > > predecrement > > > generally (in which I case I think your comment is not helpful) or is there > a > > > logic problem that I don't see? > > > > There's no logic problem, I was suggesting using predecrement because you > don't > > need postdecrement form. > > > > I thought the Google Style Guide mandated this, which it does, but only for > > non-scalar types; it says you can use either form for scalar types. Pretty > much > > all the rest of the Chromium code I've worked with uses prefix forms for > scalars > > as well for consistency, which seems wise to me (postfix form leaps out as > "I'm > > doing something clever!"), so I suggest (but do not require) using it here as > > well for that reason. > > I always write postfix for scalars because I think prefix looks weird. There are > 21K instances of "i++" and 27K instances of "++i" in the code base, This surprised me so I tried to look myself: https://cs.chromium.org/search/?q=i%5C%2B%5C%2B+-file:third_party+-file:v8+fi... https://cs.chromium.org/search/?q=%5C%2B%5C%2Bi+-file:third_party+-file:v8+fi... ~1500 vs. ~5500, which is more postfix than I expected, but not so even as the numbers above. Did I do my searches wrong? > Please be more careful about how you write > these comments, I actually spent some time questioning myself and reading about > increments in loops to make sure there wasn't something I was missing. I'm sorry; I didn't even consider that you might interpret my original comment as saying you had a logic bug, or I would have tried to be clearer. (I also would have tried to be clearer had I not been misremembering the Google style guide as mandating this, but I figured since it did you would know what I was referring to.)
https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; On 2016/10/25 20:28:17, Peter Kasting wrote: > On 2016/10/25 18:20:49, Antony Sargent wrote: > > As for the question of the NOTREACHED, it seems like we're guaranteed that > > pattern.host() will be some non-empty string, but isn't it possible that (eg > due > > to bugs in URLPattern) that it will be some nonempty but nonsensical string > that > > will end up failing the canonicalization process in CanonicalizeHost? > > Will it? If so, then handling this makes sense, but can you give a sample input > which would cause this, so we can stick it in a motivating comment here? > > I want to make sure we're only handling a real case that can actually get here, > and not handling something that could only ever arise if other code was not > obeying its contracts. If, for example, URLPattern::Parse() should never return > PARSE_SUCCESS on a non-canonical host, we should definitely not handle failing > that here. (If it shouldn't do this but currently does, then we should > separately fix that issue and then make sure we don't handle that here.) > > Whereas if it's OK for Parse() to succeed on something that cannot be > canonicalized, then handling this is correct, and we should provide a sample > such input. Taking a quick look at the code for URLPattern::Parse, it doesn't seem to use any of the GURL canonicalization utilities, in favor of just doing a bunch of "manual" string parsing, so I would not be surprised if there was such a case (or some future refactoring accidentally created one). It's definitely possible for some inputs to URLPattern::Parse to end up with a non-canonical host, so having a canonical host is definitely not part of the contract. This code doesn't execute very frequently, so I would err on the side of extra checks for defense in depth even if the conditions they test for are unlikely.
https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; On 2016/10/25 20:57:13, Antony Sargent wrote: > On 2016/10/25 20:28:17, Peter Kasting wrote: > > On 2016/10/25 18:20:49, Antony Sargent wrote: > > > As for the question of the NOTREACHED, it seems like we're guaranteed that > > > pattern.host() will be some non-empty string, but isn't it possible that (eg > > due > > > to bugs in URLPattern) that it will be some nonempty but nonsensical string > > that > > > will end up failing the canonicalization process in CanonicalizeHost? > > > > Will it? If so, then handling this makes sense, but can you give a sample > input > > which would cause this, so we can stick it in a motivating comment here? > > > > I want to make sure we're only handling a real case that can actually get > here, > > and not handling something that could only ever arise if other code was not > > obeying its contracts. If, for example, URLPattern::Parse() should never > return > > PARSE_SUCCESS on a non-canonical host, we should definitely not handle failing > > that here. (If it shouldn't do this but currently does, then we should > > separately fix that issue and then make sure we don't handle that here.) > > > > Whereas if it's OK for Parse() to succeed on something that cannot be > > canonicalized, then handling this is correct, and we should provide a sample > > such input. > > Taking a quick look at the code for URLPattern::Parse, it doesn't seem to use > any of the GURL canonicalization utilities, in favor of just doing a bunch of > "manual" string parsing, so I would not be surprised if there was such a case > (or some future refactoring accidentally created one). It's definitely possible > for some inputs to URLPattern::Parse to end up with a non-canonical host, so > having a canonical host is definitely not part of the contract. > > This code doesn't execute very frequently, so I would err on the side of extra > checks for defense in depth even if the conditions they test for are unlikely. OK. I'm fine with that outcome, as long as we have comments that make this clear (as opposed to the old comments that implied the opposite, or no comments at all). Do you think it would make sense to use more of the GURL/url_util parsing and canonicalization code in the future instead of roll-your-own string parsing? If so perhaps there should be a separate bug for that?
The CQ bit was unchecked by brettw@chromium.org
The CQ bit was checked by brettw@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from pkasting@chromium.org Link to the patchset: https://codereview.chromium.org/2433583002/#ps160001 (title: "Review comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... File extensions/common/manifest_handlers/externally_connectable.cc (right): https://codereview.chromium.org/2433583002/diff/120001/extensions/common/mani... extensions/common/manifest_handlers/externally_connectable.cc:149: NOTREACHED() << *it; On 2016/10/25 21:05:19, Peter Kasting wrote: > On 2016/10/25 20:57:13, Antony Sargent wrote: > > On 2016/10/25 20:28:17, Peter Kasting wrote: > > > On 2016/10/25 18:20:49, Antony Sargent wrote: > > > > As for the question of the NOTREACHED, it seems like we're guaranteed that > > > > pattern.host() will be some non-empty string, but isn't it possible that > (eg > > > due > > > > to bugs in URLPattern) that it will be some nonempty but nonsensical > string > > > that > > > > will end up failing the canonicalization process in CanonicalizeHost? > > > > > > Will it? If so, then handling this makes sense, but can you give a sample > > input > > > which would cause this, so we can stick it in a motivating comment here? > > > > > > I want to make sure we're only handling a real case that can actually get > > here, > > > and not handling something that could only ever arise if other code was not > > > obeying its contracts. If, for example, URLPattern::Parse() should never > > return > > > PARSE_SUCCESS on a non-canonical host, we should definitely not handle > failing > > > that here. (If it shouldn't do this but currently does, then we should > > > separately fix that issue and then make sure we don't handle that here.) > > > > > > Whereas if it's OK for Parse() to succeed on something that cannot be > > > canonicalized, then handling this is correct, and we should provide a sample > > > such input. > > > > Taking a quick look at the code for URLPattern::Parse, it doesn't seem to use > > any of the GURL canonicalization utilities, in favor of just doing a bunch of > > "manual" string parsing, so I would not be surprised if there was such a case > > (or some future refactoring accidentally created one). It's definitely > possible > > for some inputs to URLPattern::Parse to end up with a non-canonical host, so > > having a canonical host is definitely not part of the contract. > > > > This code doesn't execute very frequently, so I would err on the side of extra > > checks for defense in depth even if the conditions they test for are unlikely. > > > OK. I'm fine with that outcome, as long as we have comments that make this > clear (as opposed to the old comments that implied the opposite, or no comments > at all). > > Do you think it would make sense to use more of the GURL/url_util parsing and > canonicalization code in the future instead of roll-your-own string parsing? If > so perhaps there should be a separate bug for that? Hmm, the tricky part about doing that in general is that a bunch of patterns that are legal are not valid urls (a "*" can appear in various places), so I wouldn't expect our canonicalizer to work on them: https://developer.chrome.com/extensions/match_patterns But it looks like at least for the host, we could enforce that it canonicalizes properly after parsing, as long as it is non-empty. I've filed crbug.com/659375 for this.
Message was sent while issue was closed.
Description was changed from ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetCanonicalHostRegistryLength function. A further advantage is that these calls will be faster. - A new Permissive function is added that handles cases where the input is necessarily non-canonical. Adds an IDN test case to the unit tests. Removes checking for IP addresses in the already-known-canonical cases. This requires a separate full canonicalization and IP addresses should never match the R.C.D. list. BUG=657199 ========== to ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetCanonicalHostRegistryLength function. A further advantage is that these calls will be faster. - A new Permissive function is added that handles cases where the input is necessarily non-canonical. Adds an IDN test case to the unit tests. Removes checking for IP addresses in the already-known-canonical cases. This requires a separate full canonicalization and IP addresses should never match the R.C.D. list. BUG=657199 ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001)
Message was sent while issue was closed.
Description was changed from ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetCanonicalHostRegistryLength function. A further advantage is that these calls will be faster. - A new Permissive function is added that handles cases where the input is necessarily non-canonical. Adds an IDN test case to the unit tests. Removes checking for IP addresses in the already-known-canonical cases. This requires a separate full canonicalization and IP addresses should never match the R.C.D. list. BUG=657199 ========== to ========== Reduce buggy usage of the registry controlled domain service. GetRegistryLength for host names canonicalizes the input for the caller, but then returns the length in the canonicalized input, which is not necessarily the same as the length in the original string. As a result, computations performed by the caller based on this value can be wrong (see the bug for more). All callers of this function were audited and changed to use on of the following: - Many callers don't need the offsets. A new function HostHasRegistryControlledDomain is added to check for the presence of a R.C.D. without the risk of returning incorrect string lengths. - Many callers already have guaranteed-canonical strings (they came out of a GURL or KURL object soon before the call) These were changed to use a new GetCanonicalHostRegistryLength function. A further advantage is that these calls will be faster. - A new Permissive function is added that handles cases where the input is necessarily non-canonical. Adds an IDN test case to the unit tests. Removes checking for IP addresses in the already-known-canonical cases. This requires a separate full canonicalization and IP addresses should never match the R.C.D. list. BUG=657199 Committed: https://crrev.com/060f6a0de7706cc43f9d773ae9ce2cb36bc9964d Cr-Commit-Position: refs/heads/master@{#427545} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/060f6a0de7706cc43f9d773ae9ce2cb36bc9964d Cr-Commit-Position: refs/heads/master@{#427545}
Message was sent while issue was closed.
wychen@chromium.org changed reviewers: + wychen@chromium.org
Message was sent while issue was closed.
Could this be the cause of net_unittests failure? https://build.chromium.org/p/chromium.android/builders/Android%20Cronet%20Bui...
Message was sent while issue was closed.
A revert of this CL (patchset #9 id:160001) has been created in https://codereview.chromium.org/2454553002/ by wychen@chromium.org. The reason for reverting is: net_unittests failure on Cronet builders.
Message was sent while issue was closed.
Failed builders: https://uberchromegw.corp.google.com/i/chromium.android/builders/Android%20Cr... https://uberchromegw.corp.google.com/i/chromium.android/builders/Android%20Cr... https://uberchromegw.corp.google.com/i/chromium.android/builders/Android%20Cr... |