|
|
DescriptionMinor SimpleCacheBackend improvements.
- Use a base::hash_map instead of a std::map for the
SimpleBackendImpl::EntryMap type, to reduce heap usage.
This saves 3 pointers/node on Android, since there is no
need for an ordered map here.
- Avoid creating a new closure on every call to
SimpleIndex::PostponeWritingToDisk. Each base::Bind() call
creates a new heap-allocated reference-counted object.
Given that this function is called for every mutating
operation on the index, it's a good idea to get rid of it.
Since the called function and its argument never change,
create the closure only once and reuse it on every
write_to_disk_timer_.Start() call.
- Avoid performing one extra lookup and one extra call
to PostponeWritingToDisk() in SimpleIndex::Remove().
- Remove un-necessary temporary string creations and a tiny
typo.
R=felipeg@chromium.org, gavinp@chromium.org, pasko@chromium.org, pliard@chromium.org
BUG=
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=204035
Patch Set 1 #
Total comments: 10
Patch Set 2 : Made failed Open and Create an UNINITIALIZED, added a test. #Patch Set 3 : Made failed Open and Create an UNINITIALIZED, added a test. #Patch Set 4 : rebase #
Total comments: 10
Patch Set 5 : simple rebase #Patch Set 6 : nits #
Total comments: 2
Patch Set 7 : Use sizeof("_0") - 1' #
Messages
Total messages: 15 (0 generated)
https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_backend_impl.h (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_backend_impl.h:88: typedef base::hash_map<uint64, base::WeakPtr<SimpleEntryImpl> > EntryMap; This change cannot make any significant improvement: the amount of active entries is not more than 300 at all times. Even worse, it can degrade performance if a 'malicious' website attacks our hash function making buckets for google.com main resource very lengthy, for example. I would rather leave it as is. https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_index.cc (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_index.cc:269: UpdateEntryIteratorSize(&it, 0); I am guessing the main purpose of this is to remove extra PostponeWritingToDisk() and StartEvictionIfNeeded(). And also make less searches through the small collection. I think it is premature: * all these operations are cheap in this case * makes it harder to read The abstraction that UpdateEntryIteratorSize presents is a bit subtle: it updates both object state (cache_size_) and the entry pointed at, which is hard to reflect in the function name. It leaves a few second thoughts on the iterator nature in the different context: whether it is safe to update the value the iterator points to. One way to simplify it: SimpleIndex::UpdateSizeForEntrySizeChange(const disk_cache::Entry& entry, uint64 entry_size) and entry size modification would happen separately near the spot where we created the iterator. But then it's only 2 lines and a DCHECK. Is it worth it? https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_index.cc:388: FROM_HERE, base::TimeDelta::FromMilliseconds(delay), write_to_disk_cb_); nice https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_index.cc:502: base_name.end() - kFileSuffixLength); nice https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_index.h (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_index.h:199: base::Closure write_to_disk_cb_; nit: please separate the declaration of last_write_to_disk_ from the next two declarations by one empty line.
https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_backend_impl.h (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_backend_impl.h:88: typedef base::hash_map<uint64, base::WeakPtr<SimpleEntryImpl> > EntryMap; I feel that these statements are contradicting each other :) If there are no more than 300 entries at all times, then a degenerate map will result in a linked list with 300 items in it. Even a mobile CPU should be able to scan such a list and perform 64-bit comparisons in each node in a _very_ reasonable amount of time, so the effect of such denial of service would probably not be noticeable. Moreover, I though we were using a cryptographic hash like SHA-1 to make this DOS attack extremely difficult to pull. I.e. I don't think there is an easy way to produce hundreds or even thousands of input strings that all hash to the same SHA-1. Even if there was, this would require several hundreds / thousands concurrent active requests, right? If you still think a std::map() is better, I propose adding a comment on top of this typedef explaining why this container type is preferred. https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_index.cc (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_index.cc:269: UpdateEntryIteratorSize(&it, 0); On 2013/05/17 10:31:33, pasko wrote: > I am guessing the main purpose of this is to remove extra > PostponeWritingToDisk() and StartEvictionIfNeeded(). And also make less searches > through the small collection. > Yes. > I think it is premature: > * all these operations are cheap in this case Unfortunately, due to the way our timers are implemented, PostponeWritingToDisk() is costly. Everytime you call Start() or Reset() on a timer, a new (internal/hidden) task gets posted on the message loop. Said task will always execute, but may or may end up calling the timer's callback (the time implementation ensures that only the last scheduled task will do that). Since the current code calls Start() twice in a row, this results in at least one completely un-necessary task every time. Each one of them takes time and heap to setup / operate / disappear, and pollute the message loop for no good reason. In other words, I think it's valuable to avoid the repetition here. I agree that the lookup and StartEvictionIfNeeded() call are cheap though. I wish our message loops implemented timers natively instead, but that's not currently the way it works. > * makes it harder to read > Probably, any suggestion welcome :) > The abstraction that UpdateEntryIteratorSize presents is a bit subtle: it > updates both object state (cache_size_) and the entry pointed at, which is hard > to reflect in the function name. It leaves a few second thoughts on the iterator > nature in the different context: whether it is safe to update the value the > iterator points to. > > One way to simplify it: > SimpleIndex::UpdateSizeForEntrySizeChange(const disk_cache::Entry& entry, uint64 > entry_size) > > and entry size modification would happen separately near the spot where we > created the iterator. But then it's only 2 lines and a DCHECK. Is it worth it? I'll try to come up with something. I agree it'd be good to clarify this behaviour. https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_index.h (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_index.h:199: base::Closure write_to_disk_cb_; On 2013/05/17 10:31:33, pasko wrote: > nit: please separate the declaration of last_write_to_disk_ from the next two > declarations by one empty line. > Done.
https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_backend_impl.h (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_backend_impl.h:88: typedef base::hash_map<uint64, base::WeakPtr<SimpleEntryImpl> > EntryMap; On 2013/05/17 14:36:18, digit1 wrote: > I feel that these statements are contradicting each other :) > > If there are no more than 300 entries at all times, then a degenerate map will > result in a linked list with 300 items in it. Even a mobile CPU should be able > to scan such a list and perform 64-bit comparisons in each node in a _very_ > reasonable amount of time, so the effect of such denial of service would > probably not be noticeable. > > Moreover, I though we were using a cryptographic hash like SHA-1 to make this > DOS attack extremely difficult to pull. I.e. I don't think there is an easy way > to produce hundreds or even thousands of input strings that all hash to the same > SHA-1. Even if there was, this would require several hundreds / thousands > concurrent active requests, right? > > If you still think a std::map() is better, I propose adding a comment on top of > this typedef explaining why this container type is preferred. You are right, there is practically no difference. Keep it whatever you like, just remove "include <map>" if you decide to stay with hash_map. https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... File net/disk_cache/simple/simple_index.cc (right): https://chromiumcodereview.appspot.com/14877019/diff/1/net/disk_cache/simple/... net/disk_cache/simple/simple_index.cc:269: UpdateEntryIteratorSize(&it, 0); On 2013/05/17 14:36:18, digit1 wrote: > On 2013/05/17 10:31:33, pasko wrote: > > I am guessing the main purpose of this is to remove extra > > PostponeWritingToDisk() and StartEvictionIfNeeded(). And also make less > searches > > through the small collection. > > > Yes. > > > I think it is premature: > > * all these operations are cheap in this case > > Unfortunately, due to the way our timers are implemented, > PostponeWritingToDisk() is costly. Everytime you call Start() or Reset() on a > timer, a new (internal/hidden) task gets posted on the message loop. Said task > will always execute, but may or may end up calling the timer's callback (the > time implementation ensures that only the last scheduled task will do that). > > Since the current code calls Start() twice in a row, this results in at least > one completely un-necessary task every time. Each one of them takes time and > heap to setup / operate / disappear, and pollute the message loop for no good > reason. In other words, I think it's valuable to avoid the repetition here. Thanks! After this overview I am a little scared: there are tons of these timers across the codebase. > I agree that the lookup and StartEvictionIfNeeded() call are cheap though. > > I wish our message loops implemented timers natively instead, but that's not > currently the way it works. Would it be a viable optimization to to make it a single PostTask? Messing up with alarm() and signal handling scares me off as well, do we have a relevant benchmark to prove the approach before actually making it working properly? Probably not.. Sad face. > > * makes it harder to read > > > Probably, any suggestion welcome :) > > > The abstraction that UpdateEntryIteratorSize presents is a bit subtle: it > > updates both object state (cache_size_) and the entry pointed at, which is > hard > > to reflect in the function name. It leaves a few second thoughts on the > iterator > > nature in the different context: whether it is safe to update the value the > > iterator points to. > > > > One way to simplify it: > > SimpleIndex::UpdateSizeForEntrySizeChange(const disk_cache::Entry& entry, > uint64 > > entry_size) > > > > and entry size modification would happen separately near the spot where we > > created the iterator. But then it's only 2 lines and a DCHECK. Is it worth it? > > I'll try to come up with something. I agree it'd be good to clarify this > behaviour. Thanks.
Something weird has happened to this CL... I don't understand the two latest uploads.
On 2013/05/21 16:39:15, gavinp wrote: > Something weird has happened to this CL... I don't understand the two latest > uploads. please ignore them, I accidentally uploaded to the wrong issue, twice. Rietveld produces a 403 error when uploading to other people's issues, but creates an extra patchset. I have no idea how it happened in my client, I did not reuse the branch, but was rewriting my branch history, and could probably take some values from 'master', where I patched this CL once before.
Spooky, I've rebased and re-uploaded the patch. By the way, I had the following message from "git cl upload": """" The last upload made from this repository was patchset #1 but the most recent patchset on the server is #14003. Uploading will still work, but if you've uploaded to this issue from another machine or branch the patch you're uploading now might not include those changes. About to upload; enter to confirm. """" latest patch seems ok. Go figure.
On 2013/05/21 19:51:14, digit wrote: > Spooky, I've rebased and re-uploaded the patch. By the way, I had the following > message from "git cl upload": > > """" > The last upload made from this repository was patchset #1 but the most recent > patchset on the server is #14003. > Uploading will still work, but if you've uploaded to this issue from another > machine or branch the patch you're uploading now might not include those > changes. > About to upload; enter to confirm. > """" > > latest patch seems ok. Go figure. You can probably safely delete patch sets 2 and 3, so future generations don't get confused. I'll go through this now.
https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... File net/disk_cache/simple/simple_index.cc (right): https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:170: write_to_disk_cb_ = base::Bind(&SimpleIndex::WriteToDisk, AsWeakPtr()); Why not place this in the ctor list? https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:282: entries_set_.find(simple_util::GetEntryHashKey(key)) != I don't think this is an improvement. count() is a lot easier to read than find. https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:348: bool SimpleIndex::UpdateEntrySize(const std::string& key, uint64 entry_size) { Is there any way to lose this? https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:392: uint64 entry_size) { Please add a thread checker. https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:394: DCHECK(cache_size_ - (*it)->second.GetEntrySize() <= cache_size_); DCHECK_GE(cache_size, ...);
https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... File net/disk_cache/simple/simple_index.cc (right): https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:170: write_to_disk_cb_ = base::Bind(&SimpleIndex::WriteToDisk, AsWeakPtr()); On 2013/05/21 19:59:21, gavinp wrote: > Why not place this in the ctor list? Done. https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:282: entries_set_.find(simple_util::GetEntryHashKey(key)) != count() is actually slowed than find() != end() with STLport, but I agree the difference should be minimal, so I've reverted this. https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:348: bool SimpleIndex::UpdateEntrySize(const std::string& key, uint64 entry_size) { On 2013/05/21 19:59:21, gavinp wrote: > Is there any way to lose this? The only place where it's called is SimpleEntryImpl::SetSynchronousData() (and a unit test). I'm not sure what this does so didn't try to touch it. https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:392: uint64 entry_size) { On 2013/05/21 19:59:21, gavinp wrote: > Please add a thread checker. Done. https://codereview.chromium.org/14877019/diff/19001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:394: DCHECK(cache_size_ - (*it)->second.GetEntrySize() <= cache_size_); On 2013/05/21 19:59:21, gavinp wrote: > DCHECK_GE(cache_size, ...); Done.
https://codereview.chromium.org/14877019/diff/30001/net/disk_cache/simple/sim... File net/disk_cache/simple/simple_index.cc (right): https://codereview.chromium.org/14877019/diff/30001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:352: UpdateEntryIteratorSize(&it, entry_size); Somehow adding this function looks less awkward to me today. Naming suggestion: UpdateEntrySize(). We are in the Index, so the name does not need to say that we are updating the index. There is no need to say "Iterator" in the name because we are passing the iterator in anyway. Still I would prefer to do it->second.SetEntrySize(entry_size) outside this function, with that the reader is not forced to think whether updating the iterator is done in the function, and in which way.
lgtm https://codereview.chromium.org/14877019/diff/30001/net/disk_cache/simple/sim... File net/disk_cache/simple/simple_index.cc (right): https://codereview.chromium.org/14877019/diff/30001/net/disk_cache/simple/sim... net/disk_cache/simple/simple_index.cc:489: const int kFileSuffixLength = strlen("_0"); As long as we're playing this game, you could make this compile time with sizeof("_0") - 1, right?
On 2013/05/30 13:50:40, gavinp wrote: > As long as we're playing this game, you could make this compile time with > sizeof("_0") - 1, right? Yes, I've done that, trying the latest patch now :)
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/digit@chromium.org/14877019/41001
Message was sent while issue was closed.
Change committed as 204035 |