Created: 8 years, 2 months ago by Bart N.
Modified: 8 years, 1 month ago
CC: chromium-reviews, James Su, H Fung
Visibility: Public.
Description: Initial implementation of deduping the search provider's URLs.
There are two main sources of such URLs:
(1) search provider itself
(2) history URL providers (after visiting a previously suggested URL).
A search provider URL may contain time/position-specific CGI params which,
if left in place, may prevent deduping.
BUG=146551
TEST=AutocompleteResultTest::SortAndCullDuplicateSearchURLs
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=164790
Patch Set 1
Patch Set 2: Fix a minor compilation error.
Total comments: 13
Patch Set 3: Addressed comments.
Total comments: 18
Patch Set 4: Addressed comments.
Patch Set 5
Patch Set 6: Fixed a broken test. Added a test search engine (foo).
Patch Set 7: Fix failing tests by making sure that TemplateURLService is loaded when needed.
Patch Set 8
Patch Set 9: Added missing initialization of country code necessary for Android unit tests.
Patch Set 10: Add a missing include.

Messages
Total messages: 44 (0 generated)
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:366: // like aqs/oq/aq. I would not be opposed to adding a NormalizeURL to TemplateURL to perform that operation. Wonder what pkasting would think. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). This may confuse an "image search" with a regular search, since these only differ in their query parameters. I don't know if it's an issue here.
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/19 14:25:34, beaudoin wrote: > This may confuse an "image search" with a regular search, since these only > differ in their query parameters. Uuuuugggghhh. This is the sort of reason why I get really unhappy about the idea of having all these totally different result sets (images, news, maps, whatever) just be query params on a single host + path. I don't know how to solve this problem. The only thing I can think of is to pass a "strict" bool arg to the TemplateURL param extraction code, and then somehow instruct the system on which params can be allowed to differ in "strict" mode and which can't. (E.g. we let aqs and oq differ in strict mode and say that anything else is important.) This is ugly, but I don't know how else to do this. Note that for parsing queries out to place in the local search history, we still want non-strict matching. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); In the original bug we proposed replacing the actual destination URL, not just the one used for de-duping, with the idea that this would prevent us from repeatedly visiting some history URL that had embedded garbage in the aqs/oq params if that were the highest-scoring match. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.h (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.h:191: // URLs that might have been generated by the default search provider. We should not pass this in. If the match has an associated TemplateURL already, then clearly that's the TemplateURL to use for normalization. Otherwise, we should use the TemplateURL machinery to dynamically calculate the matching TemplateURL, if any, for the destination URL. (This handles matches from the history providers.) The result of this is that we'll de-dupe properly for non-DSPs as well, and the callers will be insulated from all this complexity to boot. I don't know whether GetTemplateURL() can be changed to do this -- you'd have to check the callers to see whether returning a non-NULL template URL for a "URL from history" result. My suspicion is that it can, but maybe we don't want to unless we make it cache the calculation result. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_result.h (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_result.h:76: Profile* profile); It doesn't really make sense to use a single AutocompleteResult on multiple different Profiles, so let's supply the Profile* at construction time. However, I don't think we should be passing in a Profile* at all, see AutocompleteMatch comments.
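A minimal sketch (not the CL's actual code) of the stripping idea being discussed here, assuming the 2012-era TemplateURL API named in this thread (ExtractSearchTermsFromURL, TemplateURLRef::SearchTermsArgs); signatures are approximate:

  // Reduce a search URL to a canonical form that keeps only the search
  // terms, so that copies differing only in params like aqs/oq compare
  // equal for de-duping purposes.
  GURL StripToSearchTerms(const GURL& url, const TemplateURL* template_url) {
    string16 search_terms;
    if (template_url &&
        template_url->ExtractSearchTermsFromURL(url, &search_terms) &&
        !search_terms.empty()) {
      return GURL(template_url->url_ref().ReplaceSearchTerms(
          TemplateURLRef::SearchTermsArgs(search_terms)));
    }
    return url;  // Not a recognized search URL; leave it untouched.
  }

As the comments above point out, a sketch like this would also collapse an image search and a web search for the same terms, since those differ only in query params.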
Peter - we essentially have to agree on two high level issues raised so far: (1) how to handle image search vs search (2) whether to fix destination URL see my comments inline. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/19 22:49:33, Peter Kasting wrote: > On 2012/10/19 14:25:34, beaudoin wrote: > > This may confuse an "image search" with a regular search, since these only > > differ in their query parameters. > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about the > idea of having all these totally different result sets (images, news, maps, > whatever) just be query params on a single host + path. > > I don't know how to solve this problem. The only thing I can think of is to > pass a "strict" bool arg to the TemplateURL param extraction code, and then > somehow instruct the system on which params can be allowed to differ in "strict" > mode and which can't. (E.g. we let aqs and oq differ in strict mode and say > that anything else is important.) I agree this is a problem here. Even with your proposal, we may not be safe (what if oq/aqs are actually important for image search?). I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is aware of non-search properties, like Image Search or simply by adding another output param which could indicate what corpora it extracted query terms for (or simply bool* is_search). Then, we would be safe to de-dup iff we are on Search. > > This is ugly, but I don't know how else to do this. Note that for parsing > queries out to place in the local search history, we still want non-strict > matching. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); On 2012/10/19 22:49:33, Peter Kasting wrote: > In the original bug we proposed replacing the actual destination URL, not just > the one used for de-duping, with the idea that this would prevent us from > repeatedly visiting some history URL that had embedded garbage in the aqs/oq > params if that were the highest-scoring match. Yup, I'm totally aware of that, in fact it could have been me who proposed that, but I don't remember. However, I intentionally didn't implement it in first shot, because it seems more risky to me (e.g. look at the image search example). Would you be OK if I added it as a follow-up item rather than do it in this CL?
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/22 18:20:24, Bart N. wrote: > On 2012/10/19 22:49:33, Peter Kasting wrote: > > On 2012/10/19 14:25:34, beaudoin wrote: > > > This may confuse an "image search" with a regular search, since these only > > > differ in their query parameters. > > > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about the > > idea of having all these totally different result sets (images, news, maps, > > whatever) just be query params on a single host + path. > > > > I don't know how to solve this problem. The only thing I can think of is to > > pass a "strict" bool arg to the TemplateURL param extraction code, and then > > somehow instruct the system on which params can be allowed to differ in > "strict" > > mode and which can't. (E.g. we let aqs and oq differ in strict mode and say > > that anything else is important.) > I agree this is a problem here. Even with your proposal, we may not be safe > (what if oq/aqs are actually important for image search?). > > I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is aware > of non-search properties, like Image Search or simply by adding another output > param which could indicate what corpora it extracted query terms for (or simply > bool* is_search). Then, we would be safe to de-dup iff we are on Search. The problem is that ExtractSearchTermsFromURL is "Google-agnostic". That is, it uses only information from the TemplateURL. I guess we could add some fields in that TemplateURL to specify "important" query parameters but the exact way to do this and the associated logic are relatively complex if we want to keep it generic. I think this is where PK is headed with his "strict" bool. > > This is ugly, but I don't know how else to do this. Note that for parsing > > queries out to place in the local search history, we still want non-strict > > matching. More generally, I wonder if we should extend TemplateURL to allow search engines to provide a "URL specification", mapping important ref and query parameters to some canonical field known to Chrome. For example: "google": { "name": "Google", "keyword": "google.com", "favicon": "http://www.google.com/favicon.ico", ... "params": { "query": [ {"loc": "ref", "param": "q"}, {"loc": "query", "param": "q"} ], "corpus": [ {"loc": "query", "param": "tbm"} ], "language": [ {"loc": "query", "param": "hl"} ] } } Then we could have a TemplateURL::ParseURL(const GURL& url) returning a Dictionary value with the known fields filled-in. This in turn could be used to de-dupe (instead of the current approach of reconstructing a URL). In some sense it would be a generalized ExtractSearchTermsFromURL that would recognize that the query is made up of more than the search term. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); On 2012/10/22 18:20:24, Bart N. 
wrote: > On 2012/10/19 22:49:33, Peter Kasting wrote: > > In the original bug we proposed replacing the actual destination URL, not just > > the one used for de-duping, with the idea that this would prevent us from > > repeatedly visiting some history URL that had embedded garbage in the aqs/oq > > params if that were the highest-scoring match. > Yup, I'm totally aware of that, in fact it could have been me who proposed that, > but I don't remember. > > However, I intentionally didn't implement it in first shot, because it seems > more risky to me (e.g. look at the image search example). Would you be OK if I > added it as a follow-up item rather than do it in this CL? With the approach above we could maybe even reconstruct the cleaned-up URL from the extracted structure.
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/22 18:43:23, beaudoin wrote: > On 2012/10/22 18:20:24, Bart N. wrote: > > On 2012/10/19 22:49:33, Peter Kasting wrote: > > > On 2012/10/19 14:25:34, beaudoin wrote: > > > > This may confuse an "image search" with a regular search, since these only > > > > differ in their query parameters. > > > > > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about the > > > idea of having all these totally different result sets (images, news, maps, > > > whatever) just be query params on a single host + path. > > > > > > I don't know how to solve this problem. The only thing I can think of is to > > > pass a "strict" bool arg to the TemplateURL param extraction code, and then > > > somehow instruct the system on which params can be allowed to differ in > > "strict" > > > mode and which can't. (E.g. we let aqs and oq differ in strict mode and say > > > that anything else is important.) > > I agree this is a problem here. Even with your proposal, we may not be safe > > (what if oq/aqs are actually important for image search?). > > > > I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is aware > > of non-search properties, like Image Search or simply by adding another output > > param which could indicate what corpora it extracted query terms for (or > simply > > bool* is_search). Then, we would be safe to de-dup iff we are on Search. > > The problem is that ExtractSearchTermsFromURL is "Google-agnostic". That is, it > uses only information from the TemplateURL. I guess we could add some fields in > that TemplateURL to specify "important" query parameters but the exact way to do > this and the associated logic are relatively complex if we want to keep it > generic. I think this is where PK is headed with his "strict" bool. > > > > This is ugly, but I don't know how else to do this. Note that for parsing > > > queries out to place in the local search history, we still want non-strict > > > matching. > > More generally, I wonder if we should extend TemplateURL to allow search engines > to provide a "URL specification", mapping important ref and query parameters to > some canonical field known to Chrome. For example: > > "google": { > "name": "Google", > "keyword": "google.com", > "favicon": "http://www.google.com/favicon.ico", > ... > "params": { > "query": [ {"loc": "ref", "param": "q"}, {"loc": "query", "param": "q"} ], > "corpus": [ {"loc": "query", "param": "tbm"} ], > "language": [ {"loc": "query", "param": "hl"} ] > } > } > > Then we could have a TemplateURL::ParseURL(const GURL& url) returning a > Dictionary value with the known fields filled-in. This in turn could be used to > de-dupe (instead of the current approach of reconstructing a URL). In some sense > it would be a generalized ExtractSearchTermsFromURL that would recognize that > the query is made up of more than the search term. Yes, totally makes sense. I didn't mean to make ExtractSearchTermsFromURL Google specific; I was more thinking in line with your proposal (register CGI params such as it's clear for TemplateURL whether it's OK to drop given param for de-dupping/destination. 
For instance, we could indicate that a given param needs to be updated before serving the URL again. I like your proposal above. I wonder what Peter thinks. Anyway, it sounds like a bigger refactoring though. http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:376: TemplateURLRef::SearchTermsArgs(search_terms))); On 2012/10/22 18:43:23, beaudoin wrote: > On 2012/10/22 18:20:24, Bart N. wrote: > > On 2012/10/19 22:49:33, Peter Kasting wrote: > > > In the original bug we proposed replacing the actual destination URL, not > just > > > the one used for de-duping, with the idea that this would prevent us from > > > repeatedly visiting some history URL that had embedded garbage in the aqs/oq > > > params if that were the highest-scoring match. > > Yup, I'm totally aware of that, in fact it could have been me who proposed > that, > > but I don't remember. > > > > However, I intentionally didn't implement it in first shot, because it seems > > more risky to me (e.g. look at the image search example). Would you be OK if I > > added it as a follow-up item rather than do it in this CL? > > With the approach above we could maybe even reconstruct the cleaned-up URL from > the extracted structure. Agreed. It would be clean and Google-agnostic.
I think you guys have the right sorts of concerns in mind in general. I don't really relish the idea of teaching TemplateURLs about the concept of a "corpus" (and even more than that, a way to tell whether you're "on the search corpus"). Primarily I worry that over time this idea can mutate such that it no longer makes sense. Years ago, for example, being on the "search corpus" would have meant you never got any image results, whereas now Universal has changed that (on Google at least). Also this sort of concept may map poorly across different search providers. My hope with the suggestion of "important vs. unimportant" was to draw the most general possible line in hopes that it would be something with broad applicability across both time and different providers. It is true that potentially this doesn't work as-is, if for example a parameter is sometimes "important" and sometimes not. But practically speaking I'm betting that doesn't happen -- I suspect that something like "aqs" or "oq" can always be changed without changing the true meaning of the URL, as opposed to being malleable based on the presence of yet another query param. It's not instantly obvious to me whether the right way to go is to single out "important" parameters (e.g. "q") and consider everything else unimportant, or vice versa. It probably depends on (a) how many parameters we know fall into each category and (b) how we think that will change over time. Right now, the implementation basically considers everything other than "whichever param has the %s" as unimportant, so the minimum behavioral change would be to keep that as the default view and just mark a few other parameters as "also important". I think the format proposed by Philippe above is too generic in any case. It may be necessary long-term but I don't want us to over-architect this thing too much. Let's write the simplest and most basic syntax we can get away with and extend over time.
http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/3002/chrome/browser/autocomplete... chrome/browser/autocomplete/autocomplete_match.cc:373: // is sufficient from de-dupping perspective). On 2012/10/22 19:19:42, Bart N. wrote: > On 2012/10/22 18:43:23, beaudoin wrote: > > On 2012/10/22 18:20:24, Bart N. wrote: > > > On 2012/10/19 22:49:33, Peter Kasting wrote: > > > > On 2012/10/19 14:25:34, beaudoin wrote: > > > > > This may confuse an "image search" with a regular search, since these > only > > > > > differ in their query parameters. > > > > > > > > Uuuuugggghhh. This is the sort of reason why I get really unhappy about > the > > > > idea of having all these totally different result sets (images, news, > maps, > > > > whatever) just be query params on a single host + path. > > > > > > > > I don't know how to solve this problem. The only thing I can think of is > to > > > > pass a "strict" bool arg to the TemplateURL param extraction code, and > then > > > > somehow instruct the system on which params can be allowed to differ in > > > "strict" > > > > mode and which can't. (E.g. we let aqs and oq differ in strict mode and > say > > > > that anything else is important.) > > > I agree this is a problem here. Even with your proposal, we may not be safe > > > (what if oq/aqs are actually important for image search?). > > > > > > I'd be inclined into fixing/modifying ExtractSearchTermsFromURL so it is > aware > > > of non-search properties, like Image Search or simply by adding another > output > > > param which could indicate what corpora it extracted query terms for (or > > simply > > > bool* is_search). Then, we would be safe to de-dup iff we are on Search. > > > > The problem is that ExtractSearchTermsFromURL is "Google-agnostic". That is, > it > > uses only information from the TemplateURL. I guess we could add some fields > in > > that TemplateURL to specify "important" query parameters but the exact way to > do > > this and the associated logic are relatively complex if we want to keep it > > generic. I think this is where PK is headed with his "strict" bool. > > > > > > This is ugly, but I don't know how else to do this. Note that for parsing > > > > queries out to place in the local search history, we still want non-strict > > > > matching. > > > > More generally, I wonder if we should extend TemplateURL to allow search > engines > > to provide a "URL specification", mapping important ref and query parameters > to > > some canonical field known to Chrome. For example: > > > > "google": { > > "name": "Google", > > "keyword": "google.com", > > "favicon": "http://www.google.com/favicon.ico", > > ... > > "params": { > > "query": [ {"loc": "ref", "param": "q"}, {"loc": "query", "param": "q"} > ], > > "corpus": [ {"loc": "query", "param": "tbm"} ], > > "language": [ {"loc": "query", "param": "hl"} ] > > } > > } > > > > Then we could have a TemplateURL::ParseURL(const GURL& url) returning a > > Dictionary value with the known fields filled-in. This in turn could be used > to > > de-dupe (instead of the current approach of reconstructing a URL). In some > sense > > it would be a generalized ExtractSearchTermsFromURL that would recognize that > > the query is made up of more than the search term. > Yes, totally makes sense. 
I didn't mean to make ExtractSearchTermsFromURL Google > specific; I was more thinking in line with your proposal (register CGI params > such as it's clear for TemplateURL whether it's OK to drop given param for > de-dupping/destination. For instance, we could indicate that a given param needs > to be updated before serving the URL again. > > I like your proposal above. I wonder what Peter thinks. Anyway, it sounds like a > bigger refactoring though. Totally. It would probably also have to wait after the move of template_url_prepopulate_data to JSON which I'm working on. For the moment, I would be inclined to say dedupping-that-confuses-corpuses is better than no-dedupping-at-all and would recommend accepting that CL with the current design, which is probably the best we can do without a major refactoring.
On 2012/10/22 21:08:01, Peter Kasting wrote: > I think you guys have the right sorts of concerns in mind in general. > > I don't really relish the idea of teaching TemplateURLs about the concept of a > "corpus" (and even more than that, a way to tell whether you're "on the search > corpus"). Primarily I worry that over time this idea can mutate such that it no > longer makes sense. Years ago, for example, being on the "search corpus" would > have meant you never got any image results, whereas now Universal has changed > that (on Google at least). Also this sort of concept may map poorly across > different search providers. Actually my main intent was to expose a function which would reliably tell whether a given URL is produced by a search provider or not, and do it in a Google-agnostic way. One approach would be to modify ExtractSearchTerms function to never return query terms unless URL matches the search provider's template, or have a separate function. > > My hope with the suggestion of "important vs. unimportant" was to draw the most > general possible line in hopes that it would be something with broad > applicability across both time and different providers. It is true that > potentially this doesn't work as-is, if for example a parameter is sometimes > "important" and sometimes not. But practically speaking I'm betting that > doesn't happen -- I suspect that something like "aqs" or "oq" can always be > changed without changing the true meaning of the URL, as opposed to being > malleable based on the presence of yet another query param. > > It's not instantly obvious to me whether the right way to go is to single out > "important" parameters (e.g. "q") and consider everything else unimportant, or > vice versa. It probably depends on (a) how many parameters we know fall into > each category and (b) how we think that will change over time. Right now, the > implementation basically considers everything other than "whichever param has > the %s" as unimportant, so the minimum behavioral change would be to keep that > as the default view and just mark a few other parameters as "also important". From my personal experience, it's much safer to explicitly drop something you know about rather than keeping important params and dropping everything else. However, for dedupping purpose, either approach is fine as long as we are 100% sure we are dealing with the proper URL (see my comment above). > > I think the format proposed by Philippe above is too generic in any case. It > may be necessary long-term but I don't want us to over-architect this thing too > much. Let's write the simplest and most basic syntax we can get away with and > extend over time. I'm fine with a simpler format, but I'm still not sure what the best way would be. You can think of either having a regex (which could be be default derived from template by replacing all params with ".*" or have it defined for each provider. However, in case of image search, which is a superset of Google search, it's probably important to indicate which params must not be present (unless you are fine risking that at some point image search URL may have aqs, because it already has "oq"). Also, it strikes me that given two entries "image search" and "web search", it would be totally fine just to retain one of them, whichever is scored higher. If so, this CL in its current form would work as desired, disregarding the destination URL fix. Let me think more about it.
On 2012/10/22 21:36:48, beaudoin wrote: > For the moment, I would be inclined to say dedupping-that-confuses-corpuses is > better than no-dedupping-at-all and would recommend accepting that CL with the > current design, which is probably the best we can do without a major > refactoring. Sorry for the cross-post, but I essentially suggested a similar approach - simply ignoring different corpora for deduping purposes.
BTW, I chatted with Harry, and one idea we could implement is to piggyback on the domain check and have it indicate what params are important/not-important, etc. This would give us flexibility to control it on the server side, yet make it Google-agnostic.
On 2012/10/22 21:39:13, Bart N. wrote: > On 2012/10/22 21:08:01, Peter Kasting wrote: > > I don't really relish the idea of teaching TemplateURLs about the concept of a > > "corpus" (and even more than that, a way to tell whether you're "on the search > > corpus"). Primarily I worry that over time this idea can mutate such that it > no > > longer makes sense. > > Actually my main intent was to expose a function which would reliably tell > whether a given URL is produced by a search provider or not, and do it in a > Google-agnostic way. One approach would be to modify ExtractSearchTerms function > to never return query terms unless URL matches the search provider's template, > or have a separate function. I don't think I'm following you. The primary problem I thought we were trying to solve is that the conceptual "same search" can be represented by an arbitrary number of distinct URLs in history. What's important is not whether those URLs were generated via the Chrome search provider or via typing into google.com itself, but the fact that they "mean the same". Leaving the search provider out entirely, we don't want history to return two URL results that are for identical searches save for a different "aqs"; and in fact we want to strip or recalculate that param in the URL the user ultimately navigates to. > From my personal experience, it's much safer to explicitly drop something you > know about rather than keeping important params and dropping everything else. Sure, that's obviously the more cautious approach. My worry is that it unnecessarily restricts Chrome's capabilities when other teams at Google can and do add/remove/modify the URL params all the time. We want to match as many URLs as we can even if some team decides to add an &foo=bar argument that doesn't really mean much to the user. > > Let's write the simplest and most basic syntax we can get away with and > > extend over time. > > I'm fine with a simpler format, but I'm still not sure what the best way would > be. You can think of either having a regex (which could be be default derived > from template by replacing all params with ".*" or have it defined for each > provider. I never imagined we could define this any other way than per-provider. I am keen to avoid regexes for a variety of reasons. > Also, it strikes me that given two entries "image search" and "web search", it > would be totally fine just to retain one of them, whichever is scored higher. If > so, this CL in its current form would work as desired, disregarding the > destination URL fix. Maybe. I'm not 100% sure.
On 2012/10/22 21:53:57, Bart N. wrote: > BTW, I chatted with Harry and one idea we could implement is to piggy back on > domain check and have it indicated what params are important/not-important etc. > This would give us flexibility to control it on the server side, yet make it > Google-agnostic. Let's not do this. searchdomaincheck is the opposite of Google-agnostic and this is really unrelated data that IMO isn't the province of that service.
On 2012/10/22 21:55:54, Peter Kasting wrote: > On 2012/10/22 21:39:13, Bart N. wrote: > > On 2012/10/22 21:08:01, Peter Kasting wrote: > > > I don't really relish the idea of teaching TemplateURLs about the concept of > a > > > "corpus" (and even more than that, a way to tell whether you're "on the > search > > > corpus"). Primarily I worry that over time this idea can mutate such that > it > > no > > > longer makes sense. > > > > Actually my main intent was to expose a function which would reliably tell > > whether a given URL is produced by a search provider or not, and do it in a > > Google-agnostic way. One approach would be to modify ExtractSearchTerms > function > > to never return query terms unless URL matches the search provider's template, > > or have a separate function. > > I don't think I'm following you. > > The primary problem I thought we were trying to solve is that the conceptual > "same search" can be represented by an arbitrary number of distinct URLs in > history. What's important is not whether those URLs were generated via the > Chrome search provider or via typing into http://google.com itself, but the fact that > they "mean the same". Leaving the search provider out entirely, we don't want > history to return two URL results that are for identical searches save for a > different "aqs"; and in fact we want to strip or recalculate that param in the > URL the user ultimately navigates to. Yes, we are on the same page except I'm proposing a different approach to solving it. What I'm saying is this: given a set of URLs regardless if they were generated by Chrome search provider, were manually entered in the box or were simply links followed from a page, I want to de-dup them. The way I want to do is simple: ask the only authority (in this case search provider) to normalize the URLs such as they are comparable for dedupping purpose. Only search provider will know if a URL is search related or not. History URL provider has no idea if the URL relates to search or not, as it is simply search agnostic. If you believe this should be done in a different way, then please let me know. I still believe that search provider (or its template) is the best place to find out if two URLs mean the same. Please note that there is no reliable way to tell if two URLs are same unless you understand a particular site they refer two. While it may be possible to detect and drop CGI params that refer to an HTTP session (e.g. jsessionid), in general it may not be possible to generalize the deduping concept for any History URL. Thus, I focused on the source we know (search).
You don't want SearchProvider, you want TemplateURLService/TemplateURL/TemplateURLRef. TemplateURL* aren't going to be dropping parameters from any-old-URL, they'll only work on well-defined TemplateURLs. If I visit http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we won't be doing any kind of mangling of the URL.
On 2012/10/22 22:10:53, Peter Kasting wrote: > You don't want SearchProvider, you want > TemplateURLService/TemplateURL/TemplateURLRef. > > TemplateURL* aren't going to be dropping parameters from any-old-URL, they'll > only work on well-defined TemplateURLs. If I visit > http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we won't > be doing any kind of mangling of the URL. That is something I didn't follow entirely, because I simply don't know how it works. In this case, I have to blindly trust you :)
On 2012/10/22 22:15:36, Bart N. wrote: > On 2012/10/22 22:10:53, Peter Kasting wrote: > > You don't want SearchProvider, you want > > TemplateURLService/TemplateURL/TemplateURLRef. > > > > TemplateURL* aren't going to be dropping parameters from any-old-URL, they'll > > only work on well-defined TemplateURLs. If I visit > > http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we > won't > > be doing any kind of mangling of the URL. > > That is something I didn't follow entirely, because I simply don't know how it > works. In this case, I have to blindly trust you :) It's worth understanding this. TemplateURL is the class that represents "a search engine". Each line you see in Chrome's "manage search engines" dialog corresponds to one TemplateURL. We ship out-of-the-box with a few TemplateURLs defined for each locale as specified in template_url_prepopulate_data.cc. TemplateURL can do something like "given this search string, give me back the navigation URL". The TemplateURLService is the internal class that manages the known TemplateURLs and can tell you things like "given this keyword string, give me back the TemplateURL it corresponds to". The SearchProvider and KeywordProvider are both users of TemplateURLs. Given your input, they transform it into destination URLs by using TemplateURLs. The SearchProvider mainly deals with whichever TemplateURL is set as your default search provider, while the KeywordProvider gets TemplateURLs by lookup after parsing your input string to extract a keyword. A random Chrome user may have from a few to a few hundred TemplateURLs defined. Of these, only the ones we ship in the prepopulate data would have fields we'd add listing things like "additional query params that cannot be safely dropped". If that didn't answer your confusion, then I didn't understand what you were saying :)
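A short illustrative sketch of the lookups described above, assuming the TemplateURLService API of the time (null checks omitted); signatures are approximate:

  TemplateURLService* service =
      TemplateURLServiceFactory::GetForProfile(profile);
  // "Given this keyword string, give me back the TemplateURL it
  // corresponds to."
  TemplateURL* by_keyword =
      service->GetTemplateURLForKeyword(ASCIIToUTF16("google.com"));
  // The TemplateURL the SearchProvider mainly deals with: the user's
  // default search provider.
  TemplateURL* default_search = service->GetDefaultSearchProvider();
  // "Given this search string, give me back the navigation URL."
  GURL destination(default_search->url_ref().ReplaceSearchTerms(
      TemplateURLRef::SearchTermsArgs(ASCIIToUTF16("kittens"))));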
On 2012/10/22 22:22:09, Peter Kasting wrote: > On 2012/10/22 22:15:36, Bart N. wrote: > > On 2012/10/22 22:10:53, Peter Kasting wrote: > > > You don't want SearchProvider, you want > > > TemplateURLService/TemplateURL/TemplateURLRef. > > > > > > TemplateURL* aren't going to be dropping parameters from any-old-URL, > they'll > > > only work on well-defined TemplateURLs. If I visit > > > http://crazysite.com/blah?x=y, there won't be a matching TemplateURL so we > > won't > > > be doing any kind of mangling of the URL. > > > > That is something I didn't follow entirely, because I simply don't know how it > > works. In this case, I have to blindly trust you :) > > It's worth understanding this. That's a great explanation and thanks for that (and dealing with me). > > TemplateURL is the class that represents "a search engine". Each line you see > in Chrome's "manage search engines" dialog corresponds to one TemplateURL. We > ship out-of-the-box with a few TemplateURLs defined for each locale as specified > in template_url_prepopulate_data.cc. TemplateURL can do something like "given > this search string, give me back the navigation URL". The TemplateURLService is > the internal class that manages the known TemplateURLs and can tell you things > like "given this keyword string, give me back the TemplateURL it corresponds > to". > > The SearchProvider and KeywordProvider are both users of TemplateURLs. Given > your input, they transform it into destination URLs by using TemplateURLs. The > SearchProvider mainly deals with whichever TemplateURL is set as your default > search provider, while the KeywordProvider gets TemplateURLs by lookup after > parsing your input string to extract a keyword. > > A random Chrome user may have from a few to a few hundred TemplateURLs defined. > Of these, only the ones we ship in the prepopulate data would have fields we'd > add listing things like "additional query params that cannot be safely dropped". I still don't understand what's wrong with my current code: TemplateURL* template_url = TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile); why is this not going to work? My understanding is that this method will give me back the "pre-populated" template I need as the name says. Or is it because I pass non-empty profile, in which case could be fixed as simply passing NULL? > > If that didn't answer your confusion, then I didn't understand what you were > saying :)
(Jumping back a little, sorry...) The question here is: "Given url_a and url_b will the search engine return the same results?" One way to find out is to ask the server. This, I believe, is the server-side service Bart talks about. (Reminds me of URL canonicalization -- only applied to search URLs.) I'm not a big fan for a variety of reasons: - Adds latency - Google-agnostic in theory but in practice only Google will support it - Chrome cannot learn anything on the query without pinging the server -- makes it less practical for deduping, say, history. OTOH, I agree with PK that "corpus" is not a concept we want to teach Chrome itself. What I was proposing is that ParseURL return a schema-less structure. The search provider is free to define any fields it wants, as long as it maps to a query or ref parameter. Internally, Chrome can grok some typical fields ("search_terms", "language"), and it can use the entire extracted structure for comparison purposes. In short, the contract between the search provider and Chrome is: - If the Dict are equals, the URLs will return the same search queries. The benefit of having field name internally in Chrome may be, for example, to allow it to reconstruct URLs from one of these structures. For example, the TemplateURL might have: "url": "http://mysearch.com?q={field:searchTerms}&hl={field:language}&corpus={field:corpus}"
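A sketch of how the hypothetical TemplateURL::ParseURL() contract above might be used for de-duping; neither the method nor the {field:...} URL syntax exists in Chromium, so all names here are illustrative only:

  // Two URLs would be considered duplicates iff the extracted structures
  // compare equal.
  scoped_ptr<DictionaryValue> a(template_url->ParseURL(url_a));
  scoped_ptr<DictionaryValue> b(template_url->ParseURL(url_b));
  bool same_search = a.get() && b.get() && a->Equals(b.get());
  // Going the other way, a cleaned-up destination URL could be rebuilt by
  // substituting the recognized fields back into the provider's URL
  // template.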
On 2012/10/22 22:32:16, Bart N. wrote: > I still don't understand what's wrong with my current code: > > TemplateURL* template_url = > TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile); What you want to know is "what TemplateURLs, if any, correspond to the URLs these matches will navigate to" (and subsequently, "what is the normalized form of each such URL"). What you're actually asking is "what TemplateURL is the out-of-the-box default for this locale". (Which, as it happens, is always Google.) Note that even fixing this to ask "what is the current default search provider for this user" is still wrong; we want proper de-duping and URL normalization for URLs that come from TemplateURLs other than the default search provider too.
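A minimal sketch of the distinction drawn here, with approximate 2012-era signatures:

  // What the current code asks for: the out-of-the-box default for this
  // locale (always Google, as noted above).
  TemplateURL* prepopulated =
      TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile);

  // What is actually wanted: whichever TemplateURL, if any, the match's
  // destination URL belongs to, e.g. looked up by host via the service.
  TemplateURL* matching =
      TemplateURLServiceFactory::GetForProfile(profile)->
          GetTemplateURLForHost(match.destination_url.host());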
On 2012/10/22 22:40:44, beaudoin wrote: > (Jumping back a little, sorry...) > > The question here is: "Given url_a and url_b will the search engine return the > same results?" One way to find out is to ask the server. This, I believe, is the > server-side service Bart talks about. (Reminds me of URL canonicalization -- > only applied to search URLs.) I'm not a big fan for a variety of reasons: > - Adds latency > - Google-agnostic in theory but in practice only Google will support it > - Chrome cannot learn anything on the query without pinging the server -- makes > it less practical for deduping, say, history. AFAIK, Chrome already performs this check for various reasons. I just wanted to piggy back on that. The advantage of doing so is that whenever we add new param, we don't need to change Chrome binary (and we are not limited by the release cycle). For instance, as of now, search URL has also param gs_l which contains time sensitive data. This should be dropped. This param is generated by searchbox implementation and is subject to change. > > OTOH, I agree with PK that "corpus" is not a concept we want to teach Chrome > itself. What I was proposing is that ParseURL return a schema-less structure. > The search provider is free to define any fields it wants, as long as it maps to > a query or ref parameter. Internally, Chrome can grok some typical fields > ("search_terms", "language"), and it can use the entire extracted structure for > comparison purposes. In short, the contract between the search provider and > Chrome is: > - If the Dict are equals, the URLs will return the same search queries. > > The benefit of having field name internally in Chrome may be, for example, to > allow it to reconstruct URLs from one of these structures. For example, the > TemplateURL might have: > "url": > "http://mysearch.com?q={field:searchTerms}&hl={field:language}&corpus={field:corpus}" While this looks fine, how are you going to say - please don't drop CGI param X? A good example is image search where we MUST NOT drop "site=imghp&tbm=isch"
On 2012/10/22 22:40:44, beaudoin wrote: > The question here is: "Given url_a and url_b will the search engine return the > same results?" Correct. Or perhaps correct if we say "...return similar enough results for our purposes". (Which makes us have to define "similar enough", which I'm not sure on.) Note however that there is also another question, which is "what is the correct form of this historical URL if I want to re-navigate to it now?" The idea here is that old values of things like "aqs" and "oq" are probably wrong and need to be stripped or recalculated. That does suggest to me one other way to tackle this -- we could try to "normalize" URLs before they're added to history, so that anything we pull up will be missing "bogus" (in the sense of their values being out of date) params like these. This might make life slightly easier. But we'd have to run a migration pass on the user's history :(. My suggestions have been made with an eye toward solving both these problems. > What I was proposing is that ParseURL return a schema-less structure. > The search provider is free to define any fields it wants, as long as it maps to > a query or ref parameter. Internally, Chrome can grok some typical fields > ("search_terms", "language"), and it can use the entire extracted structure for > comparison purposes. In short, the contract between the search provider and > Chrome is: > - If the Dict are equals, the URLs will return the same search queries. > > The benefit of having field name internally in Chrome may be, for example, to > allow it to reconstruct URLs from one of these structures. For example, the > TemplateURL might have: > "url": > "http://mysearch.com?q={field:searchTerms}&hl={field:language}&corpus={field:corpus}" We already have some pieces of this, in the form of the OSDD fields that we understand (like "search terms" and "language" and "input encoding" and the like). The rest of the fields seem to break down into "args we should preserve unchanged when normalizing" and "args we don't care about and can strip out". Hence my suggestion of a simple bit that marks something as "important" or not.
On 2012/10/22 22:45:16, Peter Kasting wrote: > On 2012/10/22 22:32:16, Bart N. wrote: > > I still don't understand what's wrong with my current code: > > > > TemplateURL* template_url = > > TemplateURLPrepopulateData::GetPrepopulatedDefaultSearch(profile); > > What you want to know is "what TemplateURLs, if any, correspond to the URLs > these matches will navigate to" (and subsequently, "what is the normalized form > of each such URL"). What you're actually asking is "what TemplateURL is the > out-of-the-box default for this locale". (Which, as it happens, is always > Google.) Ah, I understand now, thank you. Yes, I wanted to get the user's default search provider. > > Note that even fixing this to ask "what is the current default search provider > for this user" is still wrong; we want proper de-duping and URL normalization > for URLs that come from TemplateURLs other than the default search provider too. Yes, you expressed this as well, but the goal of this CL, as the description reads, is to focus on search provider's URL. We can of course extend it, in which case you are right and we are on the same page.
On 2012/10/22 22:52:01, Bart N. wrote: > On 2012/10/22 22:40:44, beaudoin wrote: > > The question here is: "Given url_a and url_b will the search engine return the > > same results?" One way to find out is to ask the server. This, I believe, is > the > > server-side service Bart talks about. (Reminds me of URL canonicalization -- > > only applied to search URLs.) I'm not a big fan for a variety of reasons: > > - Adds latency > > - Google-agnostic in theory but in practice only Google will support it > > - Chrome cannot learn anything on the query without pinging the server -- > makes > > it less practical for deduping, say, history. > > AFAIK, Chrome already performs this check for various reasons. But it doesn't, always. For example, if Google isn't your default search provider we may not ever check at all. > The advantage of doing so is that whenever we add new param, > we don't need to change Chrome binary (and we are not limited by the release > cycle). True, but I don't see the release cycle as a big deal given the kinds of problems we're talking about. If people aren't getting gs_l de-duped for a few weeks, I don't see that as a major issue. OTOH, building this into searchdomaincheck has major ramifications that I don't like given that searchdomaincheck will never ever be implemented outside Google and is already super-compleex on the Chrome side.
On 2012/10/22 22:55:20, Bart N. wrote: > On 2012/10/22 22:45:16, Peter Kasting wrote: > > Note that even fixing this to ask "what is the current default search provider > > for this user" is still wrong; we want proper de-duping and URL normalization > > for URLs that come from TemplateURLs other than the default search provider > > too. > > Yes, you expressed this as well, but the goal of this CL, as the description > reads, is to focus on search provider's URL. We can of course extend it, in > which case you are right and we are on the same page. I think we should try and solve the general case now. My original review comments were aimed at doing this, and also avoid some plumbing and conceptual dependencies that make me uncomfortable (like the idea of an AutocompleteResult method taking a Profile* or knowing the user's default search provider).
I agree with solving the general case (it's not too much harder) but still think we should not worry about corpus-confusion right now. This is a much more involved change as you'll need at least that "important" bit in the TemplateURL and, as I've found out recently, it causes ripples all over Chrome. If/when you need that "important bit" feel free to ping me or assign it to me.
On 2012/10/22 23:08:15, beaudoin wrote: > I agree with solving the general case (it's not too much harder) but still think > we should not worry about corpus-confusion right now. This is a much more > involved change as you'll need at least that "important" bit in the TemplateURL > and, as I've found out recently, it causes ripples all over Chrome. > > If/when you need that "important bit" feel free to ping me or assign it to me. Sounds good. Peter, I'll probably need your advice regarding the performance impact - when I initially implemented my change, I wanted to use the template URL associated with the autocomplete match, but I found it to be NULL for the history URLs (is this expected?). You mentioned something about dynamically finding the template a given URL matches using TemplateURLService, but you also mentioned that this may not be cached. Also, is it possible that a given URL will match more than one template? If so, which one should we use?
On 2012/10/22 23:31:05, Bart N. wrote: > On 2012/10/22 23:08:15, beaudoin wrote: > > I agree with solving the general case (it's not too much harder) but still > think > > we should not worry about corpus-confusion right now. This is a much more > > involved change as you'll need at least that "important" bit in the > TemplateURL > > and, as I've found out recently, it causes ripples all over Chrome. > > > > If/when you need that "important bit" feel free to ping me or assign it to me. > Sounds good. > > Peter, I'll probably need your advice regarding performance impact - when I > initially implemented my change, I wanted to use template URL associated with > the autocomplete match, but I found it to be NULL for the history URLs (is this > expected?). You mentioned something about dynamically finding template a given > URL matches using TemplateURLService, but you also mentioned that this may not > be cached. What I was worried about was the impact of putting this in AutocompleteMatch::GetTemplateURL() but not caching it, given that that function may be called repeatedly. But I was honestly worried more about the correctness of doing that than the performance. > Also, is it possible that a given URL will match more than one template? If so, > which one should we use? In practice the TemplateURLService functions which return matching TemplateURLs for your queries all just return single URLs, even if multiple ones could in theory match. I don't remember which match gets chosen -- I think it's the one with the lowest ID, which means the one that was added earliest -- but it's not generally a big deal. I wouldn't worry about this.
Peter/Philippe, I've partially addressed your comments and I'd like to submit it in this form (modulo your comments). Since we are approaching the M24 cutoff, I'd like to have at least the dedup fixed, as it's a more prominent issue if AQS is enabled. I still want to fix destination URL normalization, but I prefer to do it in a separate CL.

What is supported now:
- Deduping will work fine across all search engines (it's not limited to the default search provider as intended before).

What is not supported yet, but will be addressed in the following CLs:
- The destination URL for autocomplete matches from non-search providers won't be properly normalized, e.g. from the history search provider. This can be done by (a) ensuring that search_terms_args are properly initialized for non-search-provider matches (e.g. history search); (b) having GetTemplateURL pass true in the use_host argument.

To address Peter's comments in particular:
- I still need to pass a Profile to the result/match, because GetTemplateURL requires it and without it, you wouldn't get the user's search engine overrides, would you?
- It wasn't safe to modify GetTemplateURL, because it is used in a few places where it could break; for instance in AutocompleteController, when we update AQS we use the search terms from the autocomplete match; in the case of the history provider, they may not be properly initialized.

PTAL
lgtm LGTM
LGTM http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:367: // term substitutions like aqs/oq/aq. Nit: URL -> the URL, purpose -> purposes. Don't refer to "non-essential term substitutions" (or specific params) since this code removes _everything_ parametrized. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:369: // we pick a template that matches the destination URL's host. Nit: This part can be omitted as it's documented in the GetTemplateURL() declaration. Maybe just: "If the destination URL looks like it was generated from a TemplateURL, remove all substitutions other than the search terms. This allows us to eliminate cases like past search URLs from history that differ only by some obscure query param from each other or from the search/keyword provider matches." http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:379: // is sufficient from de-dupping perspective). Nit: I'd remove this comment entirely; the suggested comment I gave above makes this clear enough IMO. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.h (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:193: // data that would otherwise prevent from proper dedupping. Nit: dedupping -> deduping (and in other files) I'd also replace "time/position sensitive data" with "query args other than the search terms themselves" since that's more accurate (even though the prior phrase is the core motivation here). Finally, if we need to pass a Profile here and in GetTemplateURL(), we should consider simply forcing AutocompleteMatch() to take a Profile* and store it as a member. Optionally, perhaps the right fix is to leave these signatures but make AutocompleteResult take a Profile*? I'm not sure without trying both. It would be fine to do this as a follow up. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:223: // Simply delegates to GetTemplateURL(profile, false) below. Nit: Having two functions here is just another way of writing default arguments, which are banned by the style guide. Remove this function and make all callers use the two-arg form. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:227: // |destination_url|'s host if |check_host| is true. Nit: Be more clear. Something like "If |use_host| is true and the keyword does not map to a valid TemplateURL, we'll then check for a TemplateURL that corresponds to the destination_url's hostname." You may also want to rename the arg |allow_fallback_to_destination_host| or something. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result.cc:13: #include "chrome/browser/search_engines/template_url_prepopulate_data.h" Nit: Not necessary? 
http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result_unittest.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:53: scoped_ptr<TestingProfile> profile_; Nit: Does this need to be heap-allocated? Seems like just a TestingProfile member would work. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:248: matches[4].destination_url = GURL("http://www.google.com/"); This seems like it relies on the test data having a TemplateURL for google.com. I'm not sure that's a good assumption to make. Perhaps this test should manually add a TemplateURL for some fake search engine, then use that.
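A sketch of the suggestion above (and of what patch set 6 apparently does with its "foo" engine): register a fake TemplateURL in the test instead of relying on a prepopulated google.com entry. This assumes the 2012-era search_engines API and a TestingProfile member named profile_ (per the nit above); the exact test setup may differ:

  // In the test fixture setup:
  TemplateURLData data;
  data.short_name = ASCIIToUTF16("foo");
  data.SetKeyword(ASCIIToUTF16("foo"));
  data.SetURL("http://www.foo.com/s?q={searchTerms}");
  TemplateURLServiceFactory::GetForProfile(&profile_)->Add(
      new TemplateURL(&profile_, data));  // The service takes ownership.

  // The test can then point matches at the fake engine:
  matches[4].destination_url = GURL("http://www.foo.com/s?q=foo");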
(Also please look into test failures)
Thanks for the review. I'm still running tests to make sure all looks good before I commit. FYI: I'll have a followup CL early next week. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:367: // term substitutions like aqs/oq/aq. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: URL -> the URL, purpose -> purposes. > > Don't refer to "non-essential term substitutions" (or specific params) since > this code removes _everything_ parametrized. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:369: // we pick a template that matches the destination URL's host. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: This part can be omitted as it's documented in the GetTemplateURL() > declaration. Maybe just: "If the destination URL looks like it was generated > from a TemplateURL, remove all substitutions other than the search terms. This > allows us to eliminate cases like past search URLs from history that differ only > by some obscure query param from each other or from the search/keyword provider > matches." Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.cc:379: // is sufficient from de-dupping perspective). On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: I'd remove this comment entirely; the suggested comment I gave above makes > this clear enough IMO. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_match.h (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:193: // data that would otherwise prevent from proper dedupping. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: dedupping -> deduping (and in other files) Done. > > I'd also replace "time/position sensitive data" with "query args other than the > search terms themselves" since that's more accurate (even though the prior > phrase is the core motivation here). > > Finally, if we need to pass a Profile here and in GetTemplateURL(), we should > consider simply forcing AutocompleteMatch() to take a Profile* and store it as a > member. Optionally, perhaps the right fix is to leave these signatures but make > AutocompleteResult take a Profile*? I'm not sure without trying both. > > It would be fine to do this as a follow up. Yes, I'll have a follow-up CL to support proper destination URL normalization. Deferred. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:223: // Simply delegates to GetTemplateURL(profile, false) below. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Having two functions here is just another way of writing default arguments, > which are banned by the style guide. Remove this function and make all callers > use the two-arg form. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_match.h:227: // |destination_url|'s host if |check_host| is true. On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Be more clear. 
Something like "If |use_host| is true and the keyword does > not map to a valid TemplateURL, we'll then check for a TemplateURL that > corresponds to the destination_url's hostname." You may also want to rename the > arg |allow_fallback_to_destination_host| or something. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result.cc:13: #include "chrome/browser/search_engines/template_url_prepopulate_data.h" On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Not necessary? Good catch, yes. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... File chrome/browser/autocomplete/autocomplete_result_unittest.cc (right): http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:53: scoped_ptr<TestingProfile> profile_; On 2012/10/26 22:13:49, Peter Kasting wrote: > Nit: Does this need to be heap-allocated? Seems like just a TestingProfile > member would work. Yup. Done. http://codereview.chromium.org/11198074/diff/28001/chrome/browser/autocomplet... chrome/browser/autocomplete/autocomplete_result_unittest.cc:248: matches[4].destination_url = GURL("http://www.google.com/"); On 2012/10/26 22:13:49, Peter Kasting wrote: > This seems like it relies on the test data having a TemplateURL for http://google.com. > I'm not sure that's a good assumption to make. > > Perhaps this test should manually add a TemplateURL for some fake search engine, > then use that. Yes - I'll update it in the followup CL.
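To make the autocomplete_match.cc discussion above concrete, here is a rough sketch of the de-duping normalization being described. It is not the patch's code; ExtractSearchTermsFromURL and the exact TemplateURLRef calls are assumptions about the TemplateURL API of this period:

#include "base/string16.h"
#include "chrome/browser/autocomplete/autocomplete_match.h"
#include "chrome/browser/search_engines/template_url.h"
#include "googleurl/src/gurl.h"

// Returns |match|'s destination URL rebuilt from its search terms alone, so
// time/position-specific CGI params (aqs, oq, ...) no longer keep two
// equivalent searches from being merged.
GURL StrippedURLForDeduping(const AutocompleteMatch& match, Profile* profile) {
  // Keyword lookup first; with the second arg true, fall back to the engine
  // whose search URL matches destination_url's host (the
  // |allow_fallback_to_destination_host| behavior discussed above).
  const TemplateURL* template_url = match.GetTemplateURL(profile, true);
  string16 search_terms;
  if (!template_url || !template_url->SupportsReplacement() ||
      !template_url->ExtractSearchTermsFromURL(match.destination_url,
                                               &search_terms) ||
      search_terms.empty())
    return match.destination_url;  // Not a recognizable search URL.
  return GURL(template_url->url_ref().ReplaceSearchTerms(
      TemplateURLRef::SearchTermsArgs(search_terms)));
}

The de-duping pass in AutocompleteResult can then treat two matches as duplicates whenever their stripped URLs compare equal, regardless of which provider produced them.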
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bartn@chromium.org/11198074/44015
Presubmit check for 11198074-44015 failed and returned exit status 1.
Running presubmit commit checks ...
** Presubmit ERRORS **
Missing LGTM from an OWNER for files in these directories:
  chrome
  chrome/browser/instant
Presubmit checks took 2.1s to calculate.
c/b/instant lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bartn@chromium.org/11198074/44015
Presubmit check for 11198074-44015 failed and returned exit status 1.
Running presubmit commit checks ...
** Presubmit ERRORS **
Missing LGTM from an OWNER for files in these directories:
  chrome
Presubmit checks took 2.4s to calculate.
Scott - do you mind stamping this CL for chrome/browser/*? I'm missing an OWNER's approval.
LGTM
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bartn@chromium.org/11198074/44015
On 2012/10/29 20:23:43, I haz the power (commit-bot) wrote: > CQ is trying da patch. Follow status at > https://chromium-status.appspot.com/cq/bartn%40chromium.org/11198074/44015 Hey guys - it looks like the last remaining failing test is related to some flaky tests on mac_rel: http://chromium-build-logs.appspot.com/gtest_query?gtest_query=ExtensionApiTe... and is unrelated to my CL. What's the procedure for such a case? Should I notify the Chromium sheriff?
On 2012/10/29 22:24:19, Bart N. wrote: > On 2012/10/29 20:23:43, I haz the power (commit-bot) wrote: > > CQ is trying da patch. Follow status at > > https://chromium-status.appspot.com/cq/bartn%2540chromium.org/11198074/44015 > Hey guys - it looks like the last remaining failing test is related to some > flaky tests on mac_rel: > > http://chromium-build-logs.appspot.com/gtest_query?gtest_query=ExtensionApiTe... > > and is unrelated to my CL. What's the procedure for such a case? Should I notify > the Chromium sheriff? The commit queue usually retries failed tests. If it's flaky, the retry may pass. If the retry doesn't pass, and you still believe that particular test is flaky, you can either try commit-queueing it again or landing it manually. --mark
> The commit queue usually retries failed tests. If it's flaky, the retry may > pass. OK, I'll give it some time (it shows mac_rel in gray in addition to mac_rel in red). > If the retry doesn't pass, and you still believe that particular test is flaky, > you can either try commit-queueing it again or landing it manually. I'm certain the test is flaky: it passed once (patch set 8), and, as you can see in the two links below, it's been failing for other people too: http://chromium-build-logs.appspot.com/gtest_query?gtest_query=PPAPINaClGLibc... and http://chromium-build-logs.appspot.com/gtest_query?gtest_query=ExtensionApiTe...
Change committed as 164790