Issue 21703003: Minor sk_memset{16|32}_SSE2 optimization. (Closed)

Created:
7 years, 4 months ago by f(malita)

Modified:
7 years, 4 months ago

Reviewers:
Stephen White, reed1

CC:
skia-review_googlegroups.com

Base URL:
https://skia.googlecode.com/svn/trunk

Visibility:
Public.

Description

Minor sk_memset{16|32}_SSE2 optimization.

Using explicitly indexed references allows some compilers to generate more efficient loops. For gcc 4.6.3:

  613c18: 83 ea 10          sub    $0x10,%edx
  613c1b: 66 0f 7f 07       movdqa %xmm0,(%rdi)
  613c1f: 66 0f 7f 47 10    movdqa %xmm0,0x10(%rdi)
  613c24: 66 0f 7f 47 20    movdqa %xmm0,0x20(%rdi)
  613c29: 66 0f 7f 47 30    movdqa %xmm0,0x30(%rdi)
  613c2e: 48 83 c7 40       add    $0x40,%rdi
  613c32: 83 fa 0f          cmp    $0xf,%edx
  613c35: 7f e1             jg     613c18 <_Z16sk_memset32_SSE2Pjji+0x38>

vs. previous:

  613c18: 83 ea 10          sub    $0x10,%edx
  613c1b: 66 0f 7f 07       movdqa %xmm0,(%rdi)
  613c1f: 66 0f 7f 47 10    movdqa %xmm0,0x10(%rdi)
  613c24: 66 0f 7f 47 20    movdqa %xmm0,0x20(%rdi)
  613c29: 48 83 c7 40       add    $0x40,%rdi
  613c2d: 83 fa 0f          cmp    $0xf,%edx
  613c30: 66 0f 7f 47 f0    movdqa %xmm0,-0x10(%rdi)
  613c35: 7f e1             jg     613c18 <_Z16sk_memset32_SSE2Pjji+0x38>

This yields a 0.2% - 1% improvement with the memset micro benchmarks, presumably due to avoiding a stall on the next store after the %rdi increment.

R=reed@google.com,senorblanco@chromium.org

Committed: http://code.google.com/p/skia/source/detail?r=10545
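
For readers without the diff at hand, the kind of loop described above can be sketched roughly as follows. This is not the actual Skia source: the function name memset32_sse2_sketch, the alignment prologue, and the scalar tail are assumptions made for illustration. Only the inner loop shape (four indexed 16-byte stores followed by a single pointer bump, with the count reduced by 16 per iteration) is taken from the description and the disassembly.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <stdint.h>

// Hypothetical stand-in for sk_memset32_SSE2 (illustration only).
// Fills `count` 32-bit values starting at `dst` with `value`.
void memset32_sse2_sketch(uint32_t* dst, uint32_t value, int count) {
    assert(dst != nullptr && count >= 0);

    // Scalar stores until dst is 16-byte aligned; _mm_store_si128
    // (movdqa) requires an aligned address.
    while (count > 0 && (reinterpret_cast<uintptr_t>(dst) & 15) != 0) {
        *dst++ = value;
        --count;
    }

    __m128i* d = reinterpret_cast<__m128i*>(dst);
    const __m128i v = _mm_set1_epi32(static_cast<int>(value));

    // Unrolled loop: 4 stores x 16 bytes = 16 uint32_t per iteration.
    // The stores are indexed off an unchanged base pointer and the
    // pointer is advanced once at the end, rather than after each store.
    while (count >= 16) {
        _mm_store_si128(&d[0], v);
        _mm_store_si128(&d[1], v);
        _mm_store_si128(&d[2], v);
        _mm_store_si128(&d[3], v);
        d += 4;
        count -= 16;
    }

    // Scalar tail for the remaining 0..15 values.
    dst = reinterpret_cast<uint32_t*>(d);
    while (count > 0) {
        *dst++ = value;
        --count;
    }
}
```

Whether a given compiler emits the first or the second instruction sequence for such a loop depends on the compiler and its version; the listings above are for gcc 4.6.3 only.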

Patch Set 1 #

Created: 7 years, 4 months ago

	M	src/opts/SkUtils_opts_SSE2.cpp	2 chunks	+10 lines, -8 lines	0 comments

Messages

Total messages: 4 (0 generated)

f(malita)

7 years, 4 months ago (2013-08-04 18:01:43 UTC) #1

Memset bench numbers (sorted) for 10 runs of

  nice -n -20 taskset 0x00000001 bench --timers c --repeat 10 --config NONRENDERING --match memset

It would be nice to test this with clang also.

	ORIG	PATCH	DELTA	DELTA%

memset16_1_600
	13.26	12.97	0.29	2.19%
	13.29	13	0.29	2.18%
	13.3	13	0.30	2.26%
	13.3	13	0.30	2.26%
	13.31	13.01	0.30	2.25%
	13.32	13.01	0.31	2.33%
	13.36	13.01	0.35	2.62%
	13.36	13.01	0.35	2.62%
	13.38	13.02	0.36	2.69%
	15.2	13.02	2.18	14.34%
				
memset16_600_800
	6.68	6.64	0.04	0.60%
	6.69	6.65	0.04	0.60%
	6.7	6.65	0.05	0.75%
	6.7	6.65	0.05	0.75%
	6.71	6.65	0.06	0.89%
	6.72	6.65	0.07	1.04%
	6.72	6.65	0.07	1.04%
	6.74	6.66	0.08	1.19%
	6.75	6.66	0.09	1.33%
	7.66	6.67	0.99	12.92%
				
memset16_800_1000
	7.68	7.67	0.01	0.13%
	7.68	7.67	0.01	0.13%
	7.68	7.67	0.01	0.13%
	7.69	7.67	0.02	0.26%
	7.7	7.67	0.03	0.39%
	7.71	7.67	0.04	0.52%
	7.71	7.68	0.03	0.39%
	7.71	7.69	0.02	0.26%
	7.82	7.69	0.13	1.66%
	8.78	7.81	0.97	11.05%
				
memset16_1000_2000
	55.85	55.56	0.29	0.52%
	55.92	55.57	0.35	0.63%
	55.92	55.57	0.35	0.63%
	55.96	55.58	0.38	0.68%
	56.03	55.59	0.44	0.79%
	56.13	55.67	0.46	0.82%
	56.14	55.68	0.46	0.82%
	56.18	55.7	0.48	0.85%
	56.27	55.72	0.55	0.98%
	63.78	56.85	6.93	10.87%
				
memset16_2000_3000
	85.65	85.36	0.29	0.34%
	85.67	85.39	0.28	0.33%
	85.71	85.4	0.31	0.36%
	85.73	85.44	0.29	0.34%
	85.88	85.47	0.41	0.48%
	86.03	85.49	0.54	0.63%
	86.06	85.52	0.54	0.63%
	86.09	85.54	0.55	0.64%
	86.57	85.54	1.03	1.19%
	98.13	85.58	12.55	12.79%
				
memset16_3000_4000
	116.02	115.76	0.26	0.22%
	116.04	115.79	0.25	0.22%
	116.05	115.79	0.26	0.22%
	116.08	115.82	0.26	0.22%
	116.2	115.85	0.35	0.30%
	116.39	115.87	0.52	0.45%
	116.44	115.88	0.56	0.48%
	116.49	115.91	0.58	0.50%
	117.15	115.93	1.22	1.04%
	132.87	119	13.87	10.44%
				
memset16_4000_5000
	160.49	157.18	3.31	2.06%
	160.98	157.62	3.36	2.09%
	161.66	157.66	4.00	2.47%
	161.82	157.8	4.02	2.48%
	162.05	157.98	4.07	2.51%
	162.88	158.5	4.38	2.69%
	163.01	158.58	4.43	2.72%
	164.32	159.83	4.49	2.73%
	164.76	160.01	4.75	2.88%
	167.53	168.12	-0.59	-0.35%
				
memset32_1_600
	8.85	8.7	0.15	1.69%
	8.86	8.71	0.15	1.69%
	8.86	8.71	0.15	1.69%
	8.86	8.71	0.15	1.69%
	8.87	8.71	0.16	1.80%
	8.88	8.72	0.16	1.80%
	8.89	8.72	0.17	1.91%
	8.9	8.72	0.18	2.02%
	8.9	8.72	0.18	2.02%
	8.91	8.74	0.17	1.91%
				
memset32_600_800
	4.95	4.91	0.04	0.81%
	4.96	4.91	0.05	1.01%
	4.96	4.91	0.05	1.01%
	4.96	4.92	0.04	0.81%
	4.96	4.92	0.04	0.81%
	4.97	4.92	0.05	1.01%
	4.97	4.93	0.04	0.80%
	4.97	4.93	0.04	0.80%
	4.99	4.93	0.06	1.20%
	5.01	4.94	0.07	1.40%
				
memset32_800_1000
	6.16	6.14	0.02	0.32%
	6.17	6.15	0.02	0.32%
	6.18	6.15	0.03	0.49%
	6.18	6.15	0.03	0.49%
	6.19	6.15	0.04	0.65%
	6.2	6.15	0.05	0.81%
	6.21	6.16	0.05	0.81%
	6.21	6.16	0.05	0.81%
	6.23	6.16	0.07	1.12%
	6.23	6.16	0.07	1.12%
				
memset32_1000_2000
	49.08	49	0.08	0.16%
	49.12	49.02	0.10	0.20%
	49.15	49.03	0.12	0.24%
	49.2	49.03	0.17	0.35%
	49.21	49.03	0.18	0.37%
	49.26	49.05	0.21	0.43%
	49.3	49.08	0.22	0.45%
	49.3	49.09	0.21	0.43%
	49.3	49.12	0.18	0.37%
	49.32	49.22	0.10	0.20%
				
memset32_2000_3000
	79.48	79.4	0.08	0.10%
	79.52	79.41	0.11	0.14%
	79.55	79.41	0.14	0.18%
	79.55	79.42	0.13	0.16%
	79.56	79.43	0.13	0.16%
	79.75	79.43	0.32	0.40%
	79.79	79.44	0.35	0.44%
	79.8	79.45	0.35	0.44%
	79.81	79.51	0.30	0.38%
	79.87	79.51	0.36	0.45%
				
memset32_3000_4000
	109.89	109.84	0.05	0.05%
	109.93	109.85	0.08	0.07%
	110.02	109.86	0.16	0.15%
	110.25	109.88	0.37	0.34%
	110.27	109.88	0.39	0.35%
	110.35	109.91	0.44	0.40%
	110.38	109.95	0.43	0.39%
	110.4	109.95	0.45	0.41%
	110.9	110.03	0.87	0.78%
	111.27	111.1	0.17	0.15%
				
memset32_4000_5000
	140.37	140.28	0.09	0.06%
	140.38	140.3	0.08	0.06%
	140.42	140.31	0.11	0.08%
	140.44	140.31	0.13	0.09%
	140.47	140.31	0.16	0.11%
	140.82	140.33	0.49	0.35%
	140.83	140.36	0.47	0.33%
	140.85	140.36	0.49	0.35%
	140.99	140.36	0.63	0.45%
	155.79	140.37	15.42	9.90%
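
Because the two columns are sorted independently and then paired row by row, a single slow ORIG run (for example 15.2 vs. 13.02 in memset16_1_600) shows up as a double-digit DELTA%. One way to summarize a group without that distortion is to compare medians. The helper below is only a sketch of that idea and is not part of the bench tool; the function names are made up for illustration, and the sample data is the memset16_1_600 group copied from the table above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a set of timings. Takes a copy so the caller's data is
// not reordered.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Improvement of PATCH over ORIG in percent, computed from the medians
// as (median(orig) - median(patch)) / median(orig) * 100.
double median_improvement_pct(const std::vector<double>& orig,
                              const std::vector<double>& patch) {
    const double mo = median(orig);
    return 100.0 * (mo - median(patch)) / mo;
}

// memset16_1_600 timings from the table (ORIG and PATCH columns).
const std::vector<double> kOrig = {13.26, 13.29, 13.30, 13.30, 13.31,
                                   13.32, 13.36, 13.36, 13.38, 15.20};
const std::vector<double> kPatch = {12.97, 13.00, 13.00, 13.00, 13.01,
                                    13.01, 13.01, 13.01, 13.02, 13.02};
```

For this group, median_improvement_pct(kOrig, kPatch) comes out at about 2.3%, in line with the typical rows and well below the 14.34% of the outlier row.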

Stephen White

LGTM. Hard to believe the compiler can't figure that out.

7 years, 4 months ago (2013-08-05 13:56:55 UTC) #2

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://skia-tree-status.appspot.com/cq/fmalita@chromium.org/21703003/1

7 years, 4 months ago (2013-08-05 20:21:13 UTC) #3

Message was sent while issue was closed.

Change committed as 10545
