Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(27)

Issue 2291773002: The function CopyCharsUnsigned(uint8_t* dest, const uint8_t* src, size_t chars) is now a specializa…

Created:
4 years, 3 months ago by octavian.soldea
Modified:
3 years, 10 months ago
CC:
v8-reviews_googlegroups.com, ofrobots
Target Ref:
refs/pending/heads/master
Project:
v8
Visibility:
Public.

Description

The function CopyCharsUnsigned(uint8_t* dest, const uint8_t* src, size_t chars) is now a specialization of a template function in which I prefer using MemCopy, a function which translates to AVX for the x64 architecture (on Intel platforms). The submitted patch is a modification in file utils.h. I would like to ask for a review from Jochen Eisinger jochen@chromium.org. The submission is not affecting practically the V8 performance (I runned the V8 benchmarks suite). However, this submission is important due to the fact that I have seen improvement of 11% in the http parsing benchmark of Node.js. In this context, I would like to mention that the current submission was first submitted to Node.js under pull request number 7898 (see https://github.com/nodejs/node/pull/7898) and it was redirected to V8. BUG=v8:5395

Patch Set 1 #

Total comments: 1

Patch Set 2 : Changing kMinComplexMemCopy to 8 on x64 architecture. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+25 lines, -2 lines) Patch
M src/utils.h View 1 3 chunks +25 lines, -2 lines 0 comments Download

Messages

Total messages: 23 (8 generated)
octavian.soldea
4 years, 3 months ago (2016-08-29 23:11:40 UTC) #4
jochen (gone - plz use gerrit)
hi, thanks for sending this patch Could you please format the CL description according to ...
4 years, 3 months ago (2016-08-30 11:46:18 UTC) #7
Jakob Kummerow
Sorry, I'm not happy with this patch in its current form. See one comment on ...
4 years, 3 months ago (2016-08-30 13:29:28 UTC) #8
Michael Hablich
octavian.soldea can you please create an issue on V8's issue tracker with the data and ...
4 years, 3 months ago (2016-08-30 13:51:18 UTC) #9
Michael Hablich
On 2016/08/30 13:51:18, Hablich wrote: > octavian.soldea can you please create an issue on V8's ...
4 years, 3 months ago (2016-08-30 13:55:21 UTC) #12
octavian.soldea
On 2016/08/30 13:55:21, Hablich wrote: > On 2016/08/30 13:51:18, Hablich wrote: > > octavian.soldea can ...
4 years, 3 months ago (2016-08-30 18:00:40 UTC) #13
octavian.soldea
On 2016/08/30 18:00:40, octavian.soldea wrote: > On 2016/08/30 13:55:21, Hablich wrote: > > On 2016/08/30 ...
4 years, 3 months ago (2016-09-15 14:07:39 UTC) #14
octavian.soldea
On 2016/09/15 14:07:39, octavian.soldea wrote: > On 2016/08/30 18:00:40, octavian.soldea wrote: > > On 2016/08/30 ...
4 years, 2 months ago (2016-10-18 21:01:40 UTC) #16
Jakob Kummerow
How well does this alternative proposal work for you? https://chromiumcodereview.appspot.com/2438583002 I would prefer not to ...
4 years, 2 months ago (2016-10-19 16:16:22 UTC) #17
octavian.soldea
On 2016/10/19 16:16:22, Jakob Kummerow wrote: > How well does this alternative proposal work for ...
4 years, 1 month ago (2016-11-02 18:33:48 UTC) #18
octaviansoldea
On 2016/11/02 18:33:48, octavian.soldea wrote: > On 2016/10/19 16:16:22, Jakob Kummerow wrote: > > How ...
4 years ago (2016-11-24 00:50:54 UTC) #19
Jakob Kummerow
My point still stands: the explicit specialization you provide doesn't do anything. It is the ...
4 years ago (2016-11-24 10:25:20 UTC) #20
octaviansoldea
On 2016/11/24 10:25:20, Jakob Kummerow wrote: > My point still stands: the explicit specialization you ...
4 years ago (2016-11-24 14:33:18 UTC) #21
octavian.soldea
On 2016/11/24 14:33:18, octaviansoldea wrote: > On 2016/11/24 10:25:20, Jakob Kummerow wrote: > > My ...
3 years, 11 months ago (2017-01-10 03:40:29 UTC) #22
octaviansoldea
3 years, 10 months ago (2017-02-14 18:18:32 UTC) #23
On 2017/01/10 03:40:29, octavian.soldea wrote:
> On 2016/11/24 14:33:18, octaviansoldea wrote:
> > On 2016/11/24 10:25:20, Jakob Kummerow wrote:
> > > My point still stands: the explicit specialization you provide doesn't do
> > > anything. It is the same as the generic implementation. If you see a
> > difference,
> > > then either you're measuring noise (or artifacts like alignment), or your
> > > compiler is really bad.
> > > 
> > > Look: Here's the original, generic, platform-independent implementation:
> > > 
> > > template <typename sourcechar, typename sinkchar>
> > > void CopyCharsUnsigned(sinkchar* dest, const sourcechar* src, size_t
chars)
> {
> > >   sinkchar* limit = dest + chars;
> > >   if ((sizeof(*dest) == sizeof(*src)) &&
> > >       (chars >= static_cast<int>(kMinComplexMemCopy / sizeof(*dest)))) {
> > >     MemCopy(dest, src, chars * sizeof(*dest));
> > >   } else {
> > >     while (dest < limit) *dest++ = static_cast<sinkchar>(*src++);
> > >   }
> > > }
> > > 
> > > If we request an automatic specialization for uint8_t, what the compiler
> > > generates internally as a first step is:
> > > 
> > > void CopyCharsUnsigned(uint8_t* dest, const uint8_t* src, size_t chars) {
> > >   uint8_t* limit = dest + chars;
> > >   if ((sizeof(*dest) == sizeof(*src)) &&
> > >       (chars >= static_cast<int>(kMinComplexMemCopy / sizeof(*dest)))) {
> > >     MemCopy(dest, src, chars * sizeof(*dest));
> > >   } else {
> > >     while (dest < limit) *dest++ = static_cast<uint8_t>(*src++);
> > >   }
> > > }
> > > 
> > > Since the compiler knows that sizeof(uint8_t) == 1, and static_cast from
> > uint8_t
> > > to uint8_t is a no-op, this simplifies further to:
> > > 
> > > void CopyCharsUnsigned(uint8_t* dest, const uint8_t* src, size_t chars) {
> > >   sinkchar* limit = dest + chars;
> > >   if ((1 == 1) &&
> > >       (chars >= static_cast<int>(kMinComplexMemCopy / 1))) {
> > >     MemCopy(dest, src, chars * 1);
> > >   } else {
> > >     while (dest < limit) *dest++ = *src++;
> > >   }
> > > }
> > > 
> > > Now if the trivial expressions "1 == 1", "/ 1", "* 1" are constant-folded
> > away,
> > > you end up exactly with the specialization you provided manually. (With
one
> > > minor exception: you still include "/ sizeof(*dest)" in your code, even
> though
> > > you know that dest has type uint8_t so the division is a no-op.) 
> > > 
> > > Am I missing something?
> > > 
> > > Which leads me to conclude that if the manual specialization does indeed
> > produce
> > > different code than automatic instantiation of the templatized version,
then
> > > it's a bug in your compiler.
> > > 
> > > Further, we strongly dislike platform-specific code because it's a
> maintenance
> > > nightmare. If you want to add any "#ifdef *_ARCH_*" code, you need a
pretty
> > > strong reason that the given platform really benefits a lot. Copying
> > > platform-independent code to a platform-specific #ifdef does not count as
> > > significant benefit.
> > > 
> > > So it all seems to boil down to you saying that kMinComplexMemCopy = 8 is
> > better
> > > than kMinComplexMemCopy = 16. Fine. It's still a one-liner then.
> > > 
> > > (I'd also still be curious to hear if there are any other scenarios than
the
> > > bench-parser.js microbenchmark that benefit from this change. It's easy to
> > > "prove" almost anything with a microbenchmark. It's also easy to make the
> > wrong
> > > engineering decisions based on microbenchmarks -- maybe a bigger
> > > kMinComplexMemCopy constant is better in real workloads? That's the
primary
> > > reason why I'd tend to be careful with changing it; 128 -> 16 seems safer
> than
> > > 128 -> 8 just because it's a smaller step.)
> > 
> > Dear Reviewer
> > 
> > Thank you for your message. I really appreciate and like your explanations.
In
> > this context, I am preparing an answer and come back asap.
> > 
> > Best regards,
> >   Octavian
> 
> Dear Reviewer, Hi Jakob,
> 
> Thank you for your feedback provided in your your previous answers.
> 
> Taking into account many constraints, I understand the problems
> related to specialization and accept these comments. While 
> kMinComplexMemCopy = 16 presents better performance than the baseline,
> I would prefer to set kMinComplexMemCopy = 8. Here is an experiment that
> analyzes the performance of kMinComplexMemCopy = 8, 16, and baseline.
> 
> When benchmarking the versions with kMinComplexMemCopy = 8, 16, and baseline,
> using parameters n=1000000 and fields 4, 256, and 512, I received
> the following requests per seconds. Each line represents a constant field. In
> each line, the first number represents kMinComplexMemCopy = 8, the second
> is kMinComplexMemCopy = 16, and the third is the baseline.
> 
>             K8      K16     Baseline																			
> fields=4    811,728 786,777 784,156																			
> fields=256   34,844  33,364  29,990																			
> fields=512   17,616  16,865  15,397		
> 
> Therefore, I would like to ask to approve kMinComplexMemCopy = 8.
> 
> I am looking forward to hearing from you.																
> 
> Wishing you A Happy New Year!
>               Octavian

Hi Jakob, Dear Reviewer

In a trial to give a more complete answer, I would like to come back with
several arguments.

The compiler I have used is gcc version 5. In order to analyze the behavior of
the code I performed a comparative analysis
of the assembly code. When expanding the function CopyCharsUnsigned, in the
cases kMinComplexMemCpy=128 (baseline), 8, and 16,
the very first assembly instruction is cmp $0x[value]. The only difference I
have seen, is that the argument [value] of the cmp
instruction differs, i.e. [value] equals 7f, 7, and f respectively. Therefore,
the difference consists of when the memcpy
function is called. At this point CopyCharsUnsigned is calling the memcpy
version that is implemented
in the standard C library, i.e. it calls __mempy_avx_unaligned, which is highly
optimized with AVX instructions, on IA64.
The more we use memcpy the better. Therefore, there is an advantage in using it.
Furthermore, when renouncing to the kMinComplexMemCpy
constant, the assembly code expands in a direct call to memcpy, without a test
of size. This is the best, in my humble opinion.

Could you please take a decision regarding this change?
I am looking forward to hearing from you.

Best regards,
  Octavian

Powered by Google App Engine
This is Rietveld 408576698