Created: 8 years, 2 months ago by Bart N.
Modified: 8 years, 1 month ago
CC: chromium-reviews, James Su, H Fung
Visibility: Public.
Description: Initial implementation of deduping the search provider's URLs.
There are two main sources of such URLs:
(1) search provider itself
(2) history URL providers (after visiting a previously suggested URL).
A search provider URL may contain time/position-specific CGI params which,
if left in place, may prevent deduping.
BUG=146551
TEST=AutocompleteResultTest::SortAndCullDuplicateSearchURLs
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=164790
Patch Set 1
Patch Set 2: Fix a minor compilation error.
Total comments: 13
Patch Set 3: Addressed comments.
Total comments: 18
Patch Set 4: Addressed comments.
Patch Set 5
Patch Set 6: Fixed a broken test. Added a test search engine (foo).
Patch Set 7: Fix failing tests by making sure that TemplateURLService is loaded when needed.
Patch Set 8
Patch Set 9: Added missing initialization of country code necessary for Android unit tests.
Patch Set 10: Add a missing include.

Messages
Total messages: 44 (0 generated)
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:366: // like aqs/oq/aq. I would not be opposed to adding a NormalizeURL to TemplateURL to perform that operation. Wonder what pkasting would think. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). This may confuse an "image search" with a regular search, since these only differ in their query parameters. I don't know if it's an issue here.
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/19 14:25:34, beaudoin wrote: > This may confuse an "image search" with a regular search, since these only > differ in their query parameters. Uuuuugggghhh. This is the sort of reason why I get really unhappy about the idea of having all these totally different result sets (images, news, maps, whatever) just be query params on a single host + path. I don't know how to solve this problem. The only thing I can think of is to pass a "strict" bool arg to the TemplateURL param extraction code, and then somehow instruct the system on which params can be allowed to differ in "strict" mode and which can't. (E.g. we let aqs and oq differ in strict mode and say that anything else is important.) This is ugly, but I don't know how else to do this. Note that for parsing queries out to place in the local search history, we still want non-strict matching. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); In the original bug we proposed replacing the actual destination URL, not just the one used for de-duping, with the idea that this would prevent us from repeatedly visiting some history URL that had embedded garbage in the aqs/oq params if that were the highest-scoring match. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.h (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.h:191: // URLs that might have been generated by the default search provider. We should not pass this in. If the match has an associated TemplateURL already, then clearly that's the TemplateURL to use for normalization. Otherwise, we should use the TemplateURL machinery to dynamically calculate the matching TemplateURL, if any, for the destination URL. (This handles matches from the history providers.) The result of this is that we'll de-dupe properly for non-DSPs as well, and the callers will be insulated from all this complexity to boot. I don't know whether GetTemplateURL() can be changed to do this -- you'd have to check the callers to see whether returning a non-NULL template URL for a "URL from history" result. My suspicion is that it can, but maybe we don't want to unless we make it cache the calculation result. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_result.h (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_result.h:76: Profile* profile); It doesn't really make sense to use a single AutocompleteResult on multiple different Profiles, so let's supply the Profile* at construction time. However, I don't think we should be passing in a Profile* at all, see AutocompleteMatch comments.
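A minimal sketch (not the CL's actual code) of the stripping idea being discussed here, assuming the 2012-era TemplateURL API named in this thread (ExtractSearchTermsFromURL, TemplateURLRef::SearchTermsArgs); signatures are approximate:

  // Reduce a search URL to a canonical form that keeps only the search
  // terms, so that copies differing only in params like aqs/oq compare
  // equal for de-duping purposes.
  GURL StripToSearchTerms(const GURL& url, const TemplateURL* template_url) {
    string16 search_terms;
    if (template_url &&
        template_url->ExtractSearchTermsFromURL(url, &search_terms) &&
        !search_terms.empty()) {
      return GURL(template_url->url_ref().ReplaceSearchTerms(
          TemplateURLRef::SearchTermsArgs(search_terms)));
    }
    return url;  // Not a recognized search URL; leave it untouched.
  }

As the comments above point out, a sketch like this would also collapse an image search and a web search for the same terms, since those differ only in query params.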
Peter - we essentially have to agree on two high level issues raised so far: (1) how to handle image search vs search (2) whether to fix destination URL see my comments inline. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/19 22:49:33, Peter Kasting wrote: > On 2012/10/19 14:25:34, beaudoin wrote: > > This may confuse an "image search" with a regular search, since these only > > differ in their query parameters. > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about the > idea of having all these totally different result sets (images, news, maps, > whatever) just be query params on a single host + path. > > I don't know how to solve this problem. The only thing I can think of is to > pass a "strict" bool arg to the TemplateURL param extraction code, and then > somehow instruct the system on which params can be allowed to differ in "strict" > mode and which can't. (E.g. we let aqs and oq differ in strict mode and say > that anything else is important.) I agree this is a problem here. Even with your proposal, we may not be safe (what if oq/aqs are actually important for image search?). I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is aware of non-search properties, like Image Search or simply by adding another output param which could indicate what corpora it extracted query terms for (or simply bool* is_search). Then, we would be safe to de-dup iff we are on Search. > > This is ugly, but I don't know how else to do this. Note that for parsing > queries out to place in the local search history, we still want non-strict > matching. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); On 2012/10/19 22:49:33, Peter Kasting wrote: > In the original bug we proposed replacing the actual destination URL, not just > the one used for de-duping, with the idea that this would prevent us from > repeatedly visiting some history URL that had embedded garbage in the aqs/oq > params if that were the highest-scoring match. Yup, I'm totally aware of that, in fact it could have been me who proposed that, but I don't remember. However, I intentionally didn't implement it in first shot, because it seems more risky to me (e.g. look at the image search example). Would you be OK if I added it as a follow-up item rather than do it in this CL?
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/22 18:20:24, Bart N. wrote: > On 2012/10/19 22:49:33, Peter Kasting wrote: > > On 2012/10/19 14:25:34, beaudoin wrote: > > > This may confuse an "image search" with a regular search, since these only > > > differ in their query parameters. > > > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about the > > idea of having all these totally different result sets (images, news, maps, > > whatever) just be query params on a single host + path. > > > > I don't know how to solve this problem. The only thing I can think of is to > > pass a "strict" bool arg to the TemplateURL param extraction code, and then > > somehow instruct the system on which params can be allowed to differ in > "strict" > > mode and which can't. (E.g. we let aqs and oq differ in strict mode and say > > that anything else is important.) > I agree this is a problem here. Even with your proposal, we may not be safe > (what if oq/aqs are actually important for image search?). > > I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is aware > of non-search properties, like Image Search or simply by adding another output > param which could indicate what corpora it extracted query terms for (or simply > bool* is_search). Then, we would be safe to de-dup iff we are on Search. The problem is that ExtractSearchTermsFromURL is "Google-agnostic". That is, it uses only information from the TemplateURL. I guess we could add some fields in that TemplateURL to specify "important" query parameters but the exact way to do this and the associated logic are relatively complex if we want to keep it generic. I think this is where PK is headed with his "strict" bool. > > This is ugly, but I don't know how else to do this. Note that for parsing > > queries out to place in the local search history, we still want non-strict > > matching. More generally, I wonder if we should extend TemplateURL to allow search engines to provide a "URL specification", mapping important ref and query parameters to some canonical field known to Chrome. For example: "google": { "name": "Google", "keyword": "google.com", "favicon": "http://www.google.com/favicon.ico", ... "params": { "query": [ {"loc": "ref", "param": "q"}, {"loc": "query", "param": "q"} ], "corpus": [ {"loc": "query", "param": "tbm"} ], "language": [ {"loc": "query", "param": "hl"} ] } } Then we could have a TemplateURL::ParseURL(const GURL& url) returning a Dictionary value with the known fields filled-in. This in turn could be used to de-dupe (instead of the current approach of reconstructing a URL). In some sense it would be a generalized ExtractSearchTermsFromURL that would recognize that the query is made up of more than the search term. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); On 2012/10/22 18:20:24, Bart N. 
wrote: > On 2012/10/19 22:49:33, Peter Kasting wrote: > > In the original bug we proposed replacing the actual destination URL, not just > > the one used for de-duping, with the idea that this would prevent us from > > repeatedly visiting some history URL that had embedded garbage in the aqs/oq > > params if that were the highest-scoring match. > Yup, I'm totally aware of that, in fact it could have been me who proposed that, > but I don't remember. > > However, I intentionally didn't implement it in first shot, because it seems > more risky to me (e.g. look at the image search example). Would you be OK if I > added it as a follow-up item rather than do it in this CL? With the approach above we could maybe even reconstruct the cleaned-up URL from the extracted structure.
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/22 18:43:23, beaudoin wrote: > On 2012/10/22 18:20:24, Bart N. wrote: > > On 2012/10/19 22:49:33, Peter Kasting wrote: > > > On 2012/10/19 14:25:34, beaudoin wrote: > > > > This may confuse an "image search" with a regular search, since these only > > > > differ in their query parameters. > > > > > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about the > > > idea of having all these totally different result sets (images, news, maps, > > > whatever) just be query params on a single host + path. > > > > > > I don't know how to solve this problem. The only thing I can think of is to > > > pass a "strict" bool arg to the TemplateURL param extraction code, and then > > > somehow instruct the system on which params can be allowed to differ in > > "strict" > > > mode and which can't. (E.g. we let aqs and oq differ in strict mode and say > > > that anything else is important.) > > I agree this is a problem here. Even with your proposal, we may not be safe > > (what if oq/aqs are actually important for image search?). > > > > I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is aware > > of non-search properties, like Image Search or simply by adding another output > > param which could indicate what corpora it extracted query terms for (or > simply > > bool* is_search). Then, we would be safe to de-dup iff we are on Search. > > The problem is that ExtractSearchTermsFromURL is "Google-agnostic". That is, it > uses only information from the TemplateURL. I guess we could add some fields in > that TemplateURL to specify "important" query parameters but the exact way to do > this and the associated logic are relatively complex if we want to keep it > generic. I think this is where PK is headed with his "strict" bool. > > > > This is ugly, but I don't know how else to do this. Note that for parsing > > > queries out to place in the local search history, we still want non-strict > > > matching. > > More generally, I wonder if we should extend TemplateURL to allow search engines > to provide a "URL specification", mapping important ref and query parameters to > some canonical field known to Chrome. For example: > > "google": { > "name": "Google", > "keyword": "google.com", > "favicon": "http://www.google.com/favicon.ico", > ... > "params": { > "query": [ {"loc": "ref", "param": "q"}, {"loc": "query", "param": "q"} ], > "corpus": [ {"loc": "query", "param": "tbm"} ], > "language": [ {"loc": "query", "param": "hl"} ] > } > } > > Then we could have a TemplateURL::ParseURL(const GURL& url) returning a > Dictionary value with the known fields filled-in. This in turn could be used to > de-dupe (instead of the current approach of reconstructing a URL). In some sense > it would be a generalized ExtractSearchTermsFromURL that would recognize that > the query is made up of more than the search term. Yes, totally makes sense. I didn't mean to make ExtractSearchTermsFromURL Google specific; I was more thinking in line with your proposal (register CGI params such as it's clear for TemplateURL whether it's OK to drop given param for de-dupping/destination. 
For instance, we could indicate that a given param needs to be updated before serving the URL again. I like your proposal above. I wonder what Peter thinks. Anyway, it sounds like a bigger refactoring though. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); On 2012/10/22 18:43:23, beaudoin wrote: > On 2012/10/22 18:20:24, Bart N. wrote: > > On 2012/10/19 22:49:33, Peter Kasting wrote: > > > In the original bug we proposed replacing the actual destination URL, not > just > > > the one used for de-duping, with the idea that this would prevent us from > > > repeatedly visiting some history URL that had embedded garbage in the aqs/oq > > > params if that were the highest-scoring match. > > Yup, I'm totally aware of that, in fact it could have been me who proposed > that, > > but I don't remember. > > > > However, I intentionally didn't implement it in first shot, because it seems > > more risky to me (e.g. look at the image search example). Would you be OK if I > > added it as a follow-up item rather than do it in this CL? > > With the approach above we could maybe even reconstruct the cleaned-up URL from > the extracted structure. Agreed. It would be clean and Google-agnostic.
I think you guys have the right sorts of concerns in mind in general. I don't really relish the idea of teaching TemplateURLs about the concept of a "corpus" (and even more than that, a way to tell whether you're "on the search corpus"). Primarily I worry that over time this idea can mutate such that it no longer makes sense. Years ago, for example, being on the "search corpus" would have meant you never got any image results, whereas now Universal has changed that (on Google at least). Also this sort of concept may map poorly across different search providers. My hope with the suggestion of "important vs. unimportant" was to draw the most general possible line in hopes that it would be something with broad applicability across both time and different providers. It is true that potentially this doesn't work as-is, if for example a parameter is sometimes "important" and sometimes not. But practically speaking I'm betting that doesn't happen -- I suspect that something like "aqs" or "oq" can always be changed without changing the true meaning of the URL, as opposed to being malleable based on the presence of yet another query param. It's not instantly obvious to me whether the right way to go is to single out "important" parameters (e.g. "q") and consider everything else unimportant, or vice versa. It probably depends on (a) how many parameters we know fall into each category and (b) how we think that will change over time. Right now, the implementation basically considers everything other than "whichever param has the %s" as unimportant, so the minimum behavioral change would be to keep that as the default view and just mark a few other parameters as "also important". I think the format proposed by Philippe above is too generic in any case. It may be necessary long-term but I don't want us to over-architect this thing too much. Let's write the simplest and most basic syntax we can get away with and extend over time.
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/22 19:19:42, Bart N. wrote: > On 2012/10/22 18:43:23, beaudoin wrote: > > On 2012/10/22 18:20:24, Bart N. wrote: > > > On 2012/10/19 22:49:33, Peter Kasting wrote: > > > > On 2012/10/19 14:25:34, beaudoin wrote: > > > > > This may confuse an "image search" with a regular search, since these > only > > > > > differ in their query parameters. > > > > > > > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about > the > > > > idea of having all these totally different result sets (images, news, > maps, > > > > whatever) just be query params on a single host + path. > > > > > > > > I don't know how to solve this problem. The only thing I can think of is > to > > > > pass a "strict" bool arg to the TemplateURL param extraction code, and > then > > > > somehow instruct the system on which params can be allowed to differ in > > > "strict" > > > > mode and which can't. (E.g. we let aqs and oq differ in strict mode and > say > > > > that anything else is important.) > > > I agree this is a problem here. Even with your proposal, we may not be safe > > > (what if oq/aqs are actually important for image search?). > > > > > > I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is > aware > > > of non-search properties, like Image Search or simply by adding another > output > > > param which could indicate what corpora it extracted query terms for (or > > simply > > > bool* is_search). Then, we would be safe to de-dup iff we are on Search. > > > > The problem is that ExtractSearchTermsFromURL is "Google-agnostic". That is, > it > > uses only information from the TemplateURL. I guess we could add some fields > in > > that TemplateURL to specify "important" query parameters but the exact way to > do > > this and the associated logic are relatively complex if we want to keep it > > generic. I think this is where PK is headed with his "strict" bool. > > > > > > This is ugly, but I don't know how else to do this. Note that for parsing > > > > queries out to place in the local search history, we still want non-strict > > > > matching. > > > > More generally, I wonder if we should extend TemplateURL to allow search > engines > > to provide a "URL specification", mapping important ref and query parameters > to > > some canonical field known to Chrome. For example: > > > > "google": { > > "name": "Google", > > "keyword": "google.com", > > "favicon": "http://www.google.com/favicon.ico", > > ... > > "params": { > > "query": [ {"loc": "ref", "param": "q"}, {"loc": "query", "param": "q"} > ], > > "corpus": [ {"loc": "query", "param": "tbm"} ], > > "language": [ {"loc": "query", "param": "hl"} ] > > } > > } > > > > Then we could have a TemplateURL::ParseURL(const GURL& url) returning a > > Dictionary value with the known fields filled-in. This in turn could be used > to > > de-dupe (instead of the current approach of reconstructing a URL). In some > sense > > it would be a generalized ExtractSearchTermsFromURL that would recognize that > > the query is made up of more than the search term. > Yes, totally makes sense. 
I didn't mean to make ExtractSearchTermsFromURL Google > specific; I was more thinking in line with your proposal (register CGI params > such as it's clear for TemplateURL whether it's OK to drop given param for > de-dupping/destination. For instance, we could indicate that a given param needs > to be updated before serving the URL again. > > I like your proposal above. I wonder what Peter thinks. Anyway, it sounds like a > bigger refactoring though. Totally. It would probably also have to wait after the move of template_url_prepopulate_data to JSON which I'm working on. For the moment, I would be inclined to say dedupping-that-confuses-corpuses is better than no-dedupping-at-all and would recommend accepting that CL with the current design, which is probably the best we can do without a major refactoring.
On 2012/10/22 21:08:01, Peter Kasting wrote: > I think you guys have the right sorts of concerns in mind in general. > > I don't really relish the idea of teaching TemplateURLs about the concept of a > "corpus" (and even more than that, a way to tell whether you're "on the search > corpus"). Primarily I worry that over time this idea can mutate such that it no > longer makes sense. Years ago, for example, being on the "search corpus" would > have meant you never got any image results, whereas now Universal has changed > that (on Google at least). Also this sort of concept may map poorly across > different search providers. Actually my main intent was to expose a function which would reliably tell whether a given URL is produced by a search provider or not, and do it in a Google-agnostic way. One approach would be to modify ExtractSearchTerms function to never return query terms unless URL matches the search provider's template, or have a separate function. > > My hope with the suggestion of "important vs. unimportant" was to draw the most > general possible line in hopes that it would be something with broad > applicability across both time and different providers. It is true that > potentially this doesn't work as-is, if for example a parameter is sometimes > "important" and sometimes not. But practically speaking I'm betting that > doesn't happen -- I suspect that something like "aqs" or "oq" can always be > changed without changing the true meaning of the URL, as opposed to being > malleable based on the presence of yet another query param. > > It's not instantly obvious to me whether the right way to go is to single out > "important" parameters (e.g. "q") and consider everything else unimportant, or > vice versa. It probably depends on (a) how many parameters we know fall into > each category and (b) how we think that will change over time. Right now, the > implementation basically considers everything other than "whichever param has > the %s" as unimportant, so the minimum behavioral change would be to keep that > as the default view and just mark a few other parameters as "also important". From my personal experience, it's much safer to explicitly drop something you know about rather than keeping important params and dropping everything else. However, for dedupping purpose, either approach is fine as long as we are 100% sure we are dealing with the proper URL (see my comment above). > > I think the format proposed by Philippe above is too generic in any case. It > may be necessary long-term but I don't want us to over-architect this thing too > much. Let's write the simplest and most basic syntax we can get away with and > extend over time. I'm fine with a simpler format, but I'm still not sure what the best way would be. You can think of either having a regex (which could be be default derived from template by replacing all params with ".*" or have it defined for each provider. However, in case of image search, which is a superset of Google search, it's probably important to indicate which params must not be present (unless you are fine risking that at some point image search URL may have aqs, because it already has "oq"). Also, it strikes me that given two entries "image search" and "web search", it would be totally fine just to retain one of them, whichever is scored higher. If so, this CL in its current form would work as desired, disregarding the destination URL fix. Let me think more about it.
On 2012/10/22 21:36:48, beaudoin wrote: > For the moment, I would be inclined to say dedupping-that-confuses-corpuses is > better than no-dedupping-at-all and would recommend accepting that CL with the > current design, which is probably the best we can do without a major > refactoring. Sorry for the cross-post, but I essentially suggested a similar approach - simply ignoring different corpora for deduping purposes.
BTW, I chatted with Harry, and one idea we could implement is to piggyback on the domain check and have it indicate what params are important/not-important, etc. This would give us flexibility to control it on the server side, yet make it Google-agnostic.
On 2012/10/22 21:39:13, Bart N. wrote: > On 2012/10/22 21:08:01, Peter Kasting wrote: > > I don't really relish the idea of teaching TemplateURLs about the concept of a > > "corpus" (and even more than that, a way to tell whether you're "on the search > > corpus"). Primarily I worry that over time this idea can mutate such that it > no > > longer makes sense. > > Actually my main intent was to expose a function which would reliably tell > whether a given URL is produced by a search provider or not, and do it in a > Google-agnostic way. One approach would be to modify ExtractSearchTerms function > to never return query terms unless URL matches the search provider's template, > or have a separate function. I don't think I'm following you. The primary problem I thought we were trying to solve is that the conceptual "same search" can be represented by an arbitrary number of distinct URLs in history. What's important is not whether those URLs were generated via the Chrome search provider or via typing into google.com itself, but the fact that they "mean the same". Leaving the search provider out entirely, we don't want history to return two URL results that are for identical searches save for a different "aqs"; and in fact we want to strip or recalculate that param in the URL the user ultimately navigates to. > From my personal experience, it's much safer to explicitly drop something you > know about rather than keeping important params and dropping everything else. Sure, that's obviously the more cautious approach. My worry is that it unnecessarily restricts Chrome's capabilities when other teams at Google can and do add/remove/modify the URL params all the time. We want to match as many URLs as we can even if some team decides to add an &foo=bar argument that doesn't really mean much to the user. > > Let's write the simplest and most basic syntax we can get away with and > > extend over time. > > I'm fine with a simpler format, but I'm still not sure what the best way would > be. You can think of either having a regex (which could be be default derived > from template by replacing all params with ".*" or have it defined for each > provider. I never imagined we could define this any other way than per-provider. I am keen to avoid regexes for a variety of reasons. > Also, it strikes me that given two entries "image search" and "web search", it > would be totally fine just to retain one of them, whichever is scored higher. If > so, this CL in its current form would work as desired, disregarding the > destination URL fix. Maybe. I'm not 100% sure.
On 2012/10/22 21:53:57, Bart N. wrote: > BTW, I chatted with Harry and one idea we could implement is to piggy back on > domain check and have it indicated what params are important/not-important etc. > This would give us flexibility to control it on the server side, yet make it > Google-agnostic. Let's not do this. searchdomaincheck is the opposite of Google-agnostic and this is really unrelated data that IMO isn't the province of that service.
On 2012/10/22 21:55:54, Peter Kasting wrote: > On 2012/10/22 21:39:13, Bart N. wrote: > > On 2012/10/22 21:08:01, Peter Kasting wrote: > > > I don't really relish the idea of teaching TemplateURLs about the concept of > a > > > "corpus" (and even more than that, a way to tell whether you're "on the > search > > > corpus"). Primarily I worry that over time this idea can mutate such that > it > > no > > > longer makes sense. > > > > Actually my main intent was to expose a function which would reliably tell > > whether a given URL is produced by a search provider or not, and do it in a > > Google-agnostic way. One approach would be to modify ExtractSearchTerms > function > > to never return query terms unless URL matches the search provider's template, > > or have a separate function. > > I don't think I'm following you. > > The primary problem I thought we were trying to solve is that the conceptual > "same search" can be represented by an arbitrary number of distinct URLs in > history. What's important is not whether those URLs were generated via the > Chrome search provider or via typing into http://google.com itself, but the fact that > they "mean the same". Leaving the search provider out entirely, we don't want > history to return two URL results that are for identical searches save for a > different "aqs"; and in fact we want to strip or recalculate that param in the > URL the user ultimately navigates to. Yes, we are on the same page except I'm proposing a different approach to solving it. What I'm saying is this: given a set of URLs regardless if they were generated by Chrome search provider, were manually entered in the box or were simply links followed from a page, I want to de-dup them. The way I want to do is simple: ask the only authority (in this case search provider) to normalize the URLs such as they are comparable for dedupping purpose. Only search provider will know if a URL is search related or not. History URL provider has no idea if the URL relates to search or not, as it is simply search agnostic. If you believe this should be done in a different way, then please let me know. I still believe that search provider (or its template) is the best place to find out if two URLs mean the same. Please note that there is no reliable way to tell if two URLs are same unless you understand a particular site they refer two. While it may be possible to detect and drop CGI params that refer to an HTTP session (e.g. jsessionid), in general it may not be possible to generalize the deduping concept for any History URL. Thus, I focused on the source we know (search).
You don't want SearchProvider, you want TemplateURLService/TemplateURL/TemplateURLRef. TemplateURL* aren't going to be dropping parameters from any-old-URL, they'll only work on well-defined TemplateURLs. If I visit http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we won't be doing any kind of mangling of the URL.
On 2012/10/22 22:10:53, Peter Kasting wrote: > You don't want SearchProvider, you want > TemplateURLService/TemplateURL/TemplateURLRef. > > TemplateURL* aren't going to be dropping parameters from any-old-URL, they'll > only work on well-defined TemplateURLs. If I visit > http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we won't > be doing any kind of mangling of the URL. That is something I didn't follow entirely, because I simply don't know how it works. In this case, I have to blindly trust you :)
On 2012/10/22 22:15:36, Bart N. wrote: > On 2012/10/22 22:10:53, Peter Kasting wrote: > > You don't want SearchProvider, you want > > TemplateURLService/TemplateURL/TemplateURLRef. > > > > TemplateURL* aren't going to be dropping parameters from any-old-URL, they'll > > only work on well-defined TemplateURLs. If I visit > > http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we > won't > > be doing any kind of mangling of the URL. > > That is something I didn't follow entirely, because I simply don't know how it > works. In this case, I have to blindly trust you :) It's worth understanding this. TemplateURL is the class that represents "a search engine". Each line you see in Chrome's "manage search engines" dialog corresponds to one TemplateURL. We ship out-of-the-box with a few TemplateURLs defined for each locale as specified in template_url_prepopulate_data.cc. TemplateURL can do something like "given this search string, give me back the navigation URL". The TemplateURLService is the internal class that manages the known TemplateURLs and can tell you things like "given this keyword string, give me back the TemplateURL it corresponds to". The SearchProvider and KeywordProvider are both users of TemplateURLs. Given your input, they transform it into destination URLs by using TemplateURLs. The SearchProvider mainly deals with whichever TemplateURL is set as your default search provider, while the KeywordProvider gets TemplateURLs by lookup after parsing your input string to extract a keyword. A random Chrome user may have from a few to a few hundred TemplateURLs defined. Of these, only the ones we ship in the prepopulate data would have fields we'd add listing things like "additional query params that cannot be safely dropped". If that didn't answer your confusion, then I didn't understand what you were saying :)
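A short illustrative sketch of the lookups described above, assuming the TemplateURLService API of the time (null checks omitted); signatures are approximate:

  TemplateURLService* service =
      TemplateURLServiceFactory::GetForProfile(profile);
  // "Given this keyword string, give me back the TemplateURL it
  // corresponds to."
  TemplateURL* by_keyword =
      service->GetTemplateURLForKeyword(ASCIIToUTF16("google.com"));
  // The TemplateURL the SearchProvider mainly deals with: the user's
  // default search provider.
  TemplateURL* default_search = service->GetDefaultSearchProvider();
  // "Given this search string, give me back the navigation URL."
  GURL destination(default_search->url_ref().ReplaceSearchTerms(
      TemplateURLRef::SearchTermsArgs(ASCIIToUTF16("kittens"))));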
On 2012/10/22 22:22:09, Peter Kasting wrote: > On 2012/10/22 22:15:36, Bart N. wrote: > > On 2012/10/22 22:10:53, Peter Kasting wrote: > > > You don't want SearchProvider, you want > > > TemplateURLService/TemplateURL/TemplateURLRef. > > > > > > TemplateURL* aren't going to be dropping parameters from any-old-URL, > they'll > > > only work on well-defined TemplateURLs. If I visit > > > http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we > > won't > > > be doing any kind of mangling of the URL. > > > > That is something I didn't follow entirely, because I simply don't know how it > > works. In this case, I have to blindly trust you :) > > It's worth understanding this. That's a great explanation and thanks for that (and dealing with me). > > TemplateURL is the class that represents "a search engine". Each line you see > in Chrome's "manage search engines" dialog corresponds to one TemplateURL. We > ship out-of-the-box with a few TemplateURLs defined for each locale as specified > in template_url_prepopulate_data.cc. TemplateURL can do something like "given > this search string, give me back the navigation URL". The TemplateURLService is > the internal class that manages the known TemplateURLs and can tell you things > like "given this keyword string, give me back the TemplateURL it corresponds > to". > > The SearchProvider and KeywordProvider are both users of TemplateURLs. Given > your input, they transform it into destination URLs by using TemplateURLs. The > SearchProvider mainly deals with whichever TemplateURL is set as your default > search provider, while the KeywordProvider gets TemplateURLs by lookup after > parsing your input string to extract a keyword. > > A random Chrome user may have from a few to a few hundred TemplateURLs defined. > Of these, only the ones we ship in the prepopulate data would have fields we'd > add listing things like "additional query params that cannot be safely dropped". I still don't understand what's wrong with my current code: TemplateURL* template_url = TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile); why is this not going to work? My understanding is that this method will give me back the "pre-populated" template I need as the name says. Or is it because I pass non-empty profile, in which case could be fixed as simply passing NULL? > > If that didn't answer your confusion, then I didn't understand what you were > saying :)
(Jumping back a little, sorry...) The question here is: "Given url_a and url_b will the search engine return the same results?" One way to find out is to ask the server. This, I believe, is the server-side service Bart talks about. (Reminds me of URL canonicalization -- only applied to search URLs.) I'm not a big fan for a variety of reasons: - Adds latency - Google-agnostic in theory but in practice only Google will support it - Chrome cannot learn anything on the query without pinging the server -- makes it less practical for deduping, say, history. OTOH, I agree with PK that "corpus" is not a concept we want to teach Chrome itself. What I was proposing is that ParseURL return a schema-less structure. The search provider is free to define any fields it wants, as long as it maps to a query or ref parameter. Internally, Chrome can grok some typical fields ("search_terms", "language"), and it can use the entire extracted structure for comparison purposes. In short, the contract between the search provider and Chrome is: - If the Dict are equals, the URLs will return the same search queries. The benefit of having field name internally in Chrome may be, for example, to allow it to reconstruct URLs from one of these structures. For example, the TemplateURL might have: "url": "http://mysearch.com?q={field:searchTerms}&hl={field:language}&corpus={field:corpus}"
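A sketch of how the hypothetical TemplateURL::ParseURL() contract above might be used for de-duping; neither the method nor the {field:...} URL syntax exists in Chromium, so all names here are illustrative only:

  // Two URLs would be considered duplicates iff the extracted structures
  // compare equal.
  scoped_ptr<DictionaryValue> a(template_url->ParseURL(url_a));
  scoped_ptr<DictionaryValue> b(template_url->ParseURL(url_b));
  bool same_search = a.get() && b.get() && a->Equals(b.get());
  // Going the other way, a cleaned-up destination URL could be rebuilt by
  // substituting the recognized fields back into the provider's URL
  // template.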
On 2012/10/22 22:32:16, Bart N. wrote: > I still don't understand what's wrong with my current code: > > TemplateURL* template_url = > TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile); What you want to know is "what TemplateURLs, if any, correspond to the URLs these matches will navigate to" (and subsequently, "what is the normalized form of each such URL"). What you're actually asking is "what TemplateURL is the out-of-the-box default for this locale". (Which, as it happens, is always Google.) Note that even fixing this to ask "what is the current default search provider for this user" is still wrong; we want proper de-duping and URL normalization for URLs that come from TemplateURLs other than the default search provider too.
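A minimal sketch of the distinction drawn here, with approximate 2012-era signatures:

  // What the current code asks for: the out-of-the-box default for this
  // locale (always Google, as noted above).
  TemplateURL* prepopulated =
      TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile);

  // What is actually wanted: whichever TemplateURL, if any, the match's
  // destination URL belongs to, e.g. looked up by host via the service.
  TemplateURL* matching =
      TemplateURLServiceFactory::GetForProfile(profile)->
          GetTemplateURLForHost(match.destination_url.host());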
On 2012/10/22 22:40:44, beaudoin wrote: > (Jumping back a little, sorry...) > > The question here is: "Given url_a and url_b will the search engine return the > same results?" One way to find out is to ask the server. This, I believe, is the > server-side service Bart talks about. (Reminds me of URL canonicalization -- > only applied to search URLs.) I'm not a big fan for a variety of reasons: > - Adds latency > - Google-agnostic in theory but in practice only Google will support it > - Chrome cannot learn anything on the query without pinging the server -- makes > it less practical for deduping, say, history. AFAIK, Chrome already performs this check for various reasons. I just wanted to piggy back on that. The advantage of doing so is that whenever we add new param, we don't need to change Chrome binary (and we are not limited by the release cycle). For instance, as of now, search URL has also param gs_l which contains time sensitive data. This should be dropped. This param is generated by searchbox implementation and is subject to change. > > OTOH, I agree with PK that "corpus" is not a concept we want to teach Chrome > itself. What I was proposing is that ParseURL return a schema-less structure. > The search provider is free to define any fields it wants, as long as it maps to > a query or ref parameter. Internally, Chrome can grok some typical fields > ("search_terms", "language"), and it can use the entire extracted structure for > comparison purposes. In short, the contract between the search provider and > Chrome is: > - If the Dict are equals, the URLs will return the same search queries. > > The benefit of having field name internally in Chrome may be, for example, to > allow it to reconstruct URLs from one of these structures. For example, the > TemplateURL might have: > "url": > "http://mysearch.com?q={field:searchTerms}&hl={field:language}&corpus={field:corpus}" While this looks fine, how are you going to say - please don't drop CGI param X? A good example is image search where we MUST NOT drop "site=imghp&tbm=isch"
On 2012/10/22 22:40:44, beaudoin wrote: > The question here is: "Given url_a and url_b will the search engine return the > same results?" Correct. Or perhaps correct if we say "...return similar enough results for our purposes". (Which makes us have to define "similar enough", which I'm not sure on.) Note however that there is also another question, which is "what is the correct form of this historical URL if I want to re-navigate to it now?" The idea here is that old values of things like "aqs" and "oq" are probably wrong and need to be stripped or recalculated. That does suggest to me one other way to tackle this -- we could try to "normalize" URLs before they're added to history, so that anything we pull up will be missing "bogus" (in the sense of their values being out of date) params like these. This might make life slightly easier. But we'd have to run a migration pass on the user's history :(. My suggestions have been made with an eye toward solving both these problems. > What I was proposing is that ParseURL return a schema-less structure. > The search provider is free to define any fields it wants, as long as it maps to > a query or ref parameter. Internally, Chrome can grok some typical fields > ("search_terms", "language"), and it can use the entire extracted structure for > comparison purposes. In short, the contract between the search provider and > Chrome is: > - If the Dict are equals, the URLs will return the same search queries. > > The benefit of having field name internally in Chrome may be, for example, to > allow it to reconstruct URLs from one of these structures. For example, the > TemplateURL might have: > "url": > "http://mysearch.com?q={field:searchTerms}&hl={field:language}&corpus={field:corpus}" We already have some pieces of this, in the form of the OSDD fields that we understand (like "search terms" and "language" and "input encoding" and the like). The rest of the fields seem to break down into "args we should preserve unchanged when normalizing" and "args we don't care about and can strip out". Hence my suggestion of a simple bit that marks something as "important" or not.
On 2012/10/22 22:45:16, Peter Kasting wrote: > On 2012/10/22 22:32:16, Bart N. wrote: > > I still don't understand what's wrong with my current code: > > > > TemplateURL* template_url = > > TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile); > > What you want to know is "what TemplateURLs, if any, correspond to the URLs > these matches will navigate to" (and subsequently, "what is the normalized form > of each such URL"). What you're actually asking is "what TemplateURL is the > out-of-the-box default for this locale". (Which, as it happens, is always > Google.) Ah, I understand now, thank you. Yes, I wanted to get the user's default search provider. > > Note that even fixing this to ask "what is the current default search provider > for this user" is still wrong; we want proper de-duping and URL normalization > for URLs that come from TemplateURLs other than the default search provider too. Yes, you expressed this as well, but the goal of this CL, as the description reads, is to focus on search provider's URL. We can of course extend it, in which case you are right and we are on the same page.
On 2012/10/22 22:52:01, Bart N. wrote: > On 2012/10/22 22:40:44, beaudoin wrote: > > The question here is: "Given url_a and url_b will the search engine return the > > same results?" One way to find out is to ask the server. This, I believe, is > the > > server-side service Bart talks about. (Reminds me of URL canonicalization -- > > only applied to search URLs.) I'm not a big fan for a variety of reasons: > > - Adds latency > > - Google-agnostic in theory but in practice only Google will support it > > - Chrome cannot learn anything on the query without pinging the server -- > makes > > it less practical for deduping, say, history. > > AFAIK, Chrome already performs this check for various reasons. But it doesn't, always. For example, if Google isn't your default search provider we may not ever check at all. > The advantage of doing so is that whenever we add new param, > we don't need to change Chrome binary (and we are not limited by the release > cycle). True, but I don't see the release cycle as a big deal given the kinds of problems we're talking about. If people aren't getting gs_l de-duped for a few weeks, I don't see that as a major issue. OTOH, building this into searchdomaincheck has major ramifications that I don't like given that searchdomaincheck will never ever be implemented outside Google and is already super-compleex on the Chrome side.
On 2012/10/22 22:55:20, Bart N. wrote: > On 2012/10/22 22:45:16, Peter Kasting wrote: > > Note that even fixing this to ask "what is the current default search provider > > for this user" is still wrong; we want proper de-duping and URL normalization > > for URLs that come from TemplateURLs other than the default search provider > > too. > > Yes, you expressed this as well, but the goal of this CL, as the description > reads, is to focus on search provider's URL. We can of course extend it, in > which case you are right and we are on the same page. I think we should try and solve the general case now. My original review comments were aimed at doing this, and also avoid some plumbing and conceptual dependencies that make me uncomfortable (like the idea of an AutocompleteResult method taking a Profile* or knowing the user's default search provider).
I agree with solving the general case (it's not too much harder) but still think we should not worry about corpus-confusion right now. This is a much more involved change as you'll need at least that "important" bit in the TemplateURL and, as I've found out recently, it causes ripples all over Chrome. If/when you need that "important bit" feel free to ping me or assign it to me.
On 2012/10/22 23:08:15, beaudoin wrote: > I agree with solving the general case (it's not too much harder) but still think > we should not worry about corpus-confusion right now. This is a much more > involved change as you'll need at least that "important" bit in the TemplateURL > and, as I've found out recently, it causes ripples all over Chrome. > > If/when you need that "important bit" feel free to ping me or assign it to me. Sounds good. Peter, I'll probably need your advice regarding the performance impact - when I initially implemented my change, I wanted to use the template URL associated with the autocomplete match, but I found it to be NULL for the history URLs (is this expected?). You mentioned something about dynamically finding the template a given URL matches using TemplateURLService, but you also mentioned that this may not be cached. Also, is it possible that a given URL will match more than one template? If so, which one should we use?
On 2012/10/22 23:31:05, Bart N. wrote: > On 2012/10/22 23:08:15, beaudoin wrote: > > I agree with solving the general case (it's not too much harder) but still > think > > we should not worry about corpus-confusion right now. This is a much more > > involved change as you'll need at least that "important" bit in the > TemplateURL > > and, as I've found out recently, it causes ripples all over Chrome. > > > > If/when you need that "important bit" feel free to ping me or assign it to me. > Sounds good. > > Peter, I'll probably need your advice regarding performance impact - when I > initially implemented my change, I wanted to use template URL associated with > the autocomplete match, but I found it to be NULL for the history URLs (is this > expected?). You mentioned something about dynamically finding template a given > URL matches using TemplateURLService, but you also mentioned that this may not > be cached. What I was worried about was the impact of putting this in AutocompleteMatch::GetTemplateURL() but not caching it, given that that function may be called repeatedly. But I was honestly worried more about the correctness of doing that than the performance. > Also, is it possible that a given URL will match more than one template? If so, > which one should we use? In practice the TemplateURLService functions which return matching TemplateURLs for your queries all just return single URLs, even if multiple ones could in theory match. I don't remember which match gets chosen -- I think it's the one with the lowest ID, which means the one that was added earliest -- but it's not generally a big deal. I wouldn't worry about this.
Peter/Philippe, I've partially addressed your comments and I'd like to submit it in this form (modulo your comments). Since we are approaching the M24 cutoff, I'd like to have at least the dedup fixed, as it's a more prominent issue if AQS is enabled. I still want to fix destination URL normalization, but I prefer to do it in a separate CL.

What is supported now:
- Deduping will work fine across all search engines (it's not limited to the default search provider as intended before).

What is not supported yet, but will be addressed in the following CLs:
- The destination URL for autocomplete matches from non-search providers won't be properly normalized, e.g. from the history search provider. This can be done by (a) ensuring that search_terms_args are properly initialized for non-search-provider matches (e.g. history search); (b) having GetTemplateURL pass true in the use_host argument.

To address Peter's comments in particular:
- I still need to pass a Profile to the result/match, because GetTemplateURL requires it and without it, you wouldn't get the user's search engine overrides, would you?
- It wasn't safe to modify GetTemplateURL, because it is used in a few places where it could break; for instance in AutocompleteController, when we update AQS we use the search terms from the autocomplete match; in the case of the history provider, they may not be properly initialized.

PTAL
lgtm LGTM
LGTM http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:367: // term substitutions like aqs/oq/aq. Nit: URL -> the URL, purpose -> purposes. Don't refer to "non-essential term substitutions" (or specific params) since this code removes _everything_ parametrized. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:369: // we pick a template that matches the destination URL's host. Nit: This part can be omitted as it's documented in the GetTemplateURL() declaration. Maybe just: "If the destination URL looks like it was generated from a TemplateURL, remove all substitutions other than the search terms. This allows us to eliminate cases like past search URLs from history that differ only by some obscure query param from each other or from the search/keyword provider matches." http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:379: // is sufficient from de-dupping perspective). Nit: I'd remove this comment entirely; the suggested comment I gave above makes this clear enough IMO. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.h (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:193: // data that would otherwise prevent from proper dedupping. Nit: dedupping -> deduping (and in other files) I'd also replace "time/position sensitive data" with "query args other than the search terms themselves" since that's more accurate (even though the prior phrase is the core motivation here). Finally, if we need to pass a Profile here and in GetTemplateURL(), we should consider simply forcing AutocompleteMatch() to take a Profile* and store it as a member. Optionally, perhaps the right fix is to leave these signatures but make AutocompleteResult take a Profile*? I'm not sure without trying both. It would be fine to do this as a follow up. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:223: // Simply delegates to GetTemplateURL(profile, false) below. Nit: Having two functions here is just another way of writing default arguments, which are banned by the style guide. Remove this function and make all callers use the two-arg form. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:227: // |destination_url|'s host if |check_host| is true. Nit: Be more clear. Something like "If |use_host| is true and the keyword does not map to a valid TemplateURL, we'll then check for a TemplateURL that corresponds to the destination_url's hostname." You may also want to rename the arg |allow_fallback_to_destination_host| or something. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result.cc:13: #include "chrome/browser/search_engines/template_url_prepopulate_data.h" Nit: Not necessary? 
http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result_unittest.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:53: scoped_ptr<TestingProfile> profile_; Nit: Does this need to be heap-allocated? Seems like just a TestingProfile member would work. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:248: matches[4].destination_url = GURL("http://www.google.com/"); This seems like it relies on the test data having a TemplateURL for google.com. I'm not sure that's a good assumption to make. Perhaps this test should manually add a TemplateURL for some fake search engine, then use that.
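A sketch of the suggestion above (and of what patch set 6 apparently does with its "foo" engine): register a fake TemplateURL in the test instead of relying on a prepopulated google.com entry. This assumes the 2012-era search_engines API and a TestingProfile member named profile_ (per the nit above); the exact test setup may differ:

  // In the test fixture setup:
  TemplateURLData data;
  data.short_name = ASCIIToUTF16("foo");
  data.SetKeyword(ASCIIToUTF16("foo"));
  data.SetURL("http://www.foo.com/s?q={searchTerms}");
  TemplateURLServiceFactory::GetForProfile(&profile_)->Add(
      new TemplateURL(&profile_, data));  // The service takes ownership.

  // The test can then point matches at the fake engine:
  matches[4].destination_url = GURL("http://www.foo.com/s?q=foo");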
(Also please look into test failures)
Thanks for the review. I'm still running tests to make sure all looks good before I commit. FYI: I'll have a followup CL early next week. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:367: // term substitutions like aqs/oq/aq. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: URL -> the URL, purpose -> purposes. > > Don't refer to "non-essential term substitutions" (or specific params) since > this code removes _everything_ parametrized. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:369: // we pick a template that matches the destination URL's host. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: This part can be omitted as it's documented in the GetTemplateURL() > declaration. Maybe just: "If the destination URL looks like it was generated > from a TemplateURL, remove all substitutions other than the search terms. This > allows us to eliminate cases like past search URLs from history that differ only > by some obscure query param from each other or from the search/keyword provider > matches." Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:379: // is sufficient from de-dupping perspective). On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: I'd remove this comment entirely; the suggested comment I gave above makes > this clear enough IMO. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.h (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:193: // data that would otherwise prevent from proper dedupping. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: dedupping -> deduping (and in other files) Done. > > I'd also replace "time/position sensitive data" with "query args other than the > search terms themselves" since that's more accurate (even though the prior > phrase is the core motivation here). > > Finally, if we need to pass a Profile here and in GetTemplateURL(), we should > consider simply forcing AutocompleteMatch() to take a Profile* and store it as a > member. Optionally, perhaps the right fix is to leave these signatures but make > AutocompleteResult take a Profile*? I'm not sure without trying both. > > It would be fine to do this as a follow up. Yes, I'll have a follow-up CL to support proper destination URL normalization. Deferred. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:223: // Simply delegates to GetTemplateURL(profile, false) below. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Having two functions here is just another way of writing default arguments, > which are banned by the style guide. Remove this function and make all callers > use the two-arg form. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:227: // |destination_url|'s host if |check_host| is true. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Be more clear. 
Something like "If |use_host| is true and the keyword does > not map to a valid TemplateURL, we'll then check for a TemplateURL that > corresponds to the destination_url's hostname." You may also want to rename the > arg |allow_fallback_to_destination_host| or something. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result.cc:13: #include "chrome/browser/search_engines/template_url_prepopulate_data.h" On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Not necessary? Good catch, yes. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result_unittest.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:53: scoped_ptr<TestingProfile> profile_; On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Does this need to be heap-allocated? Seems like just a TestingProfile > member would work. Yup. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:248: matches[4].destination_url = GURL("http://www.google.com/"); On 2012/10/26 22:13:49, Peter Kasting wrote: > This seems like it relies on the test data having a TemplateURL for http://google.com. > I'm not sure that's a good assumption to make. > > Perhaps this test should manually add a TemplateURL for some fake search engine, > then use that. Yes - I'll update it in the followup CL.
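To make the autocomplete_match.cc discussion above concrete, here is a rough sketch of the de-duping normalization being described. It is not the patch's code; ExtractSearchTermsFromURL and the exact TemplateURLRef calls are assumptions about the TemplateURL API of this period:

#include "base/string16.h"
#include "chrome/browser/autocomplete/autocomplete_match.h"
#include "chrome/browser/search_engines/template_url.h"
#include "googleurl/src/gurl.h"

// Returns |match|'s destination URL rebuilt from its search terms alone, so
// time/position-specific CGI params (aqs, oq, ...) no longer keep two
// equivalent searches from being merged.
GURL StrippedURLForDeduping(const AutocompleteMatch& match, Profile* profile) {
  // Keyword lookup first; with the second arg true, fall back to the engine
  // whose search URL matches destination_url's host (the
  // |allow_fallback_to_destination_host| behavior discussed above).
  const TemplateURL* template_url = match.GetTemplateURL(profile, true);
  string16 search_terms;
  if (!template_url || !template_url->SupportsReplacement() ||
      !template_url->ExtractSearchTermsFromURL(match.destination_url,
                                               &search_terms) ||
      search_terms.empty())
    return match.destination_url;  // Not a recognizable search URL.
  return GURL(template_url->url_ref().ReplaceSearchTerms(
      TemplateURLRef::SearchTermsArgs(search_terms)));
}

The de-duping pass in AutocompleteResult can then treat two matches as duplicates whenever their stripped URLs compare equal, regardless of which provider produced them.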
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bartn@chromium.org/11198074/44015
Presubmit check for 11198074-44015 failed and returned exit status 1.
Running presubmit commit checks ...
** Presubmit ERRORS **
Missing LGTM from an OWNER for files in these directories:
  chrome
  chrome/browser/instant
Presubmit checks took 2.1s to calculate.
c/b/instant lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bartn@chromium.org/11198074/44015
Presubmit check for 11198074-44015 failed and returned exit status 1.
Running presubmit commit checks ...
** Presubmit ERRORS **
Missing LGTM from an OWNER for files in these directories:
  chrome
Presubmit checks took 2.4s to calculate.
Scott - do you mind stamping this CL for chrome/browser/*? I'm missing an OWNER's approval.
LGTM
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bartn@chromium.org/11198074/44015
On 2012/10/29 20:23:43, I haz the power (commit-bot) wrote: > CQ is trying da patch. Follow status at > https://chromium-status.appspot.com/cq/bartn%40chromium.org/11198074/44015 Hey guys - it looks like the last remaining failing test is related to some flaky tests on mac_rel: http://chromium-build-logs.appspot.com/gtest_query?gtest_query=ExtensionApiTe... and is unrelated to my CL. What's the procedure for such a case? Should I notify the Chromium sheriff?
On 2012/10/29 22:24:19, Bart N. wrote: > On 2012/10/29 20:23:43, I haz the power (commit-bot) wrote: > > CQ is trying da patch. Follow status at > > https://chromium-status.appspot.com/cq/bartn%2540chromium.org/11198074/44015 > Hey guys - it looks like the last remaining failing test is related to some > flaky tests on mac_rel: > > http://chromium-build-logs.appspot.com/gtest_query?gtest_query=ExtensionApiTe... > > and is unrelated to my CL. What's the procedure for such a case? Should I notify > the Chromium sheriff? The commit queue usually retries failed tests. If it's flaky, the retry may pass. If the retry doesn't pass, and you still believe that particular test is flaky, you can either try commit-queueing it again or landing it manually. --mark
> The commit queue usually retries failed tests. If it's flaky, the retry may > pass. OK, I'll give it some time (it shows mac_rel in gray in addition to mac_rel in red). > If the retry doesn't pass, and you still believe that particular test is flaky, > you can either try commit-queueing it again or landing it manually. I'm certain the test is flaky: it passed once (patch set 8), and, as you can see in the two links below, it's been failing for other people too: http://chromium-build-logs.appspot.com/gtest_query?gtest_query=PPAPINaClGLibc... and http://chromium-build-logs.appspot.com/gtest_query?gtest_query=ExtensionApiTe...
Change committed as 164790