OLD | NEW |
1 IMPORTANT NOTE FOR 64-BIT USERS | 1 IMPORTANT NOTE FOR 64-BIT USERS |
2 ------------------------------- | 2 ------------------------------- |
3 There are known issues with some perftools functionality on x86_64 | 3 There are known issues with some perftools functionality on x86_64 |
4 systems. See 64-BIT ISSUES, below. | 4 systems. See 64-BIT ISSUES, below. |
5 | 5 |
6 | 6 |
7 TCMALLOC | 7 TCMALLOC |
8 -------- | 8 -------- |
9 Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of | 9 Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of |
10 tcmalloc -- a replacement for malloc and new. See below for some | 10 tcmalloc -- a replacement for malloc and new. See below for some |
11 environment variables you can use with tcmalloc, as well. | 11 environment variables you can use with tcmalloc, as well. |
12 | 12 |
13 tcmalloc functionality is available on all systems we've tested; see | 13 tcmalloc functionality is available on all systems we've tested; see |
14 INSTALL for more details. See README_windows.txt for instructions on | 14 INSTALL for more details. See README_windows.txt for instructions on |
15 using tcmalloc on Windows. | 15 using tcmalloc on Windows. |
16 | 16 |
17 NOTE: When compiling with programs with gcc, that you plan to link | 17 NOTE: When compiling with programs with gcc, that you plan to link |
18 with libtcmalloc, it's safest to pass in the flags | 18 with libtcmalloc, it's safest to pass in the flags |
19 | 19 |
20 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free | 20 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free |
21 | 21 |
22 when compiling. gcc makes some optimizations assuming it is using its | 22 when compiling. gcc makes some optimizations assuming it is using its |
23 own, built-in malloc; that assumption obviously isn't true with | 23 own, built-in malloc; that assumption obviously isn't true with |
24 tcmalloc. In practice, we haven't seen any problems with this, but | 24 tcmalloc. In practice, we haven't seen any problems with this, but |
25 the expected risk is highest for users who register their own malloc | 25 the expected risk is highest for users who register their own malloc |
26 hooks with tcmalloc (using google/malloc_hook.h). The risk is lowest | 26 hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is |
27 for folks who use tcmalloc_minimal (or, of course, who pass in the | 27 lowest for folks who use tcmalloc_minimal (or, of course, who pass in |
28 above flags :-) ). | 28 the above flags :-) ). |
29 | 29 |
30 | 30 |
31 HEAP PROFILER | 31 HEAP PROFILER |
32 ------------- | 32 ------------- |
33 See doc/heap-profiler.html for information about how to use tcmalloc's | 33 See doc/heap-profiler.html for information about how to use tcmalloc's |
34 heap profiler and analyze its output. | 34 heap profiler and analyze its output. |
35 | 35 |
36 As a quick-start, do the following after installing this package: | 36 As a quick-start, do the following after installing this package: |
37 | 37 |
38 1) Link your executable with -ltcmalloc | 38 1) Link your executable with -ltcmalloc |
(...skipping 181 matching lines...)
220 Its likeliness increases the more dlopen() commands an executable has. | 220 Its likeliness increases the more dlopen() commands an executable has. |
221 Most executables don't have any, though several library routines like | 221 Most executables don't have any, though several library routines like |
222 getgrgid() call dlopen() behind the scenes. | 222 getgrgid() call dlopen() behind the scenes. |
223 | 223 |
224 2) On x86-64 64-bit systems, while tcmalloc itself works fine, the | 224 2) On x86-64 64-bit systems, while tcmalloc itself works fine, the |
225 cpu-profiler tool is unreliable: it will sometimes work, but sometimes | 225 cpu-profiler tool is unreliable: it will sometimes work, but sometimes |
226 cause a segfault. I'll explain the problem first, and then some | 226 cause a segfault. I'll explain the problem first, and then some |
227 workarounds. | 227 workarounds. |
228 | 228 |
229 Note that this only affects the cpu-profiler, which is a | 229 Note that this only affects the cpu-profiler, which is a |
230 google-perftools feature you must turn on manually by setting the | 230 gperftools feature you must turn on manually by setting the |
231 CPUPROFILE environment variable. If you do not turn on cpu-profiling, | 231 CPUPROFILE environment variable. If you do not turn on cpu-profiling, |
232 you shouldn't see any crashes due to perftools. | 232 you shouldn't see any crashes due to perftools. |
233 | 233 |
234 The gory details: The underlying problem is in the backtrace() | 234 The gory details: The underlying problem is in the backtrace() |
235 function, which is a built-in function in libc. | 235 function, which is a built-in function in libc. |
236 Backtracing is fairly straightforward in the normal case, but can run | 236 Backtracing is fairly straightforward in the normal case, but can run |
237 into problems when having to backtrace across a signal frame. | 237 into problems when having to backtrace across a signal frame. |
238 Unfortunately, the cpu-profiler uses signals in order to register a | 238 Unfortunately, the cpu-profiler uses signals in order to register a |
239 profiling event, so every backtrace that the profiler does crosses a | 239 profiling event, so every backtrace that the profiler does crosses a |
240 signal frame. | 240 signal frame. |
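To make the pattern described above concrete, here is a minimal sketch of a sampling handler that calls backtrace() from inside a SIGPROF signal handler, so every stack walk starts on a signal frame. This is not the cpu-profiler's actual code; the names OnProfTick, CollectSamples, g_samples and g_last_depth are hypothetical, and the sketch assumes Linux with glibc's <execinfo.h>.

```cpp
#include <execinfo.h>   // backtrace() (glibc-specific)
#include <signal.h>     // sigaction(), SIGPROF
#include <sys/time.h>   // setitimer(), ITIMER_PROF
#include <atomic>

namespace {
constexpr int kMaxFrames = 32;
constexpr unsigned long kMaxSpins = 2000000000UL;  // safety cap on the busy loop
std::atomic<int> g_samples{0};      // hypothetical: number of ticks handled
std::atomic<int> g_last_depth{0};   // hypothetical: depth of the latest stack walk
}  // namespace

// Runs on the signal frame. backtrace() is not formally async-signal-safe,
// which is part of why this pattern is fragile.
static void OnProfTick(int /*signo*/) {
  void* frames[kMaxFrames];
  int depth = backtrace(frames, kMaxFrames);  // unwinds across the signal frame
  g_last_depth.store(depth);
  g_samples.fetch_add(1);
}

// Burns CPU until `wanted` profiling ticks have been handled (or the spin cap
// is hit). Returns the number of samples seen, or -1 on setup failure.
int CollectSamples(int wanted) {
  g_samples.store(0);

  // Call backtrace() once up front so any lazy loading of the unwinder
  // happens here rather than inside the signal handler.
  void* warmup[4];
  backtrace(warmup, 4);

  struct sigaction sa = {};
  sa.sa_handler = OnProfTick;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  if (sigaction(SIGPROF, &sa, nullptr) != 0) return -1;

  itimerval timer = {};
  timer.it_interval.tv_usec = 1000;   // request a tick every 1 ms of CPU time
  timer.it_value = timer.it_interval;
  if (setitimer(ITIMER_PROF, &timer, nullptr) != 0) return -1;

  volatile unsigned long sink = 0;
  for (unsigned long spins = 0;
       g_samples.load() < wanted && spins < kMaxSpins; ++spins) {
    sink = sink + 1;                  // consume CPU so ITIMER_PROF fires
  }

  itimerval off = {};
  setitimer(ITIMER_PROF, &off, nullptr);  // disarm the timer
  return g_samples.load();
}
```

Calling CollectSamples(3) from main() should return once three ticks have been handled. The warm-up call is a design choice of this sketch: it keeps first-use initialization out of the handler, but it does not make backtrace() safe in general.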
(...skipping 15 matching lines...)
256 your code, rather than setting CPUPROFILE. This will profile only | 256 your code, rather than setting CPUPROFILE. This will profile only |
257 those sections of the codebase. Though we haven't done much testing, | 257 those sections of the codebase. Though we haven't done much testing, |
258 in theory this should reduce the chance of crashes by limiting the | 258 in theory this should reduce the chance of crashes by limiting the |
259 signal generation to only a small part of the codebase. Ideally, you | 259 signal generation to only a small part of the codebase. Ideally, you |
260 would not use ProfilerStart()/ProfilerStop() around code that spawns | 260 would not use ProfilerStart()/ProfilerStop() around code that spawns |
261 new threads, or is otherwise likely to cause a call to | 261 new threads, or is otherwise likely to cause a call to |
262 pthread_mutex_lock! | 262 pthread_mutex_lock! |
263 | 263 |
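As an illustration of the ProfilerStart()/ProfilerStop() workaround above, here is a minimal sketch that profiles a single function rather than the whole run. ScopedCpuProfile, HotLoop, the USE_GPERFTOOLS macro and the output file name are hypothetical. The stub definitions exist only so the sketch compiles without the library installed; with gperftools present, define USE_GPERFTOOLS and link with -lprofiler.

```cpp
#include <cstdio>

#if defined(USE_GPERFTOOLS)          // hypothetical build macro
#include <gperftools/profiler.h>     // real ProfilerStart() / ProfilerStop()
#else
// Stand-ins with the same shape as the gperftools calls, for illustration only.
static int ProfilerStart(const char* fname) {
  std::printf("[stub] ProfilerStart(%s)\n", fname);
  return 1;  // nonzero means profiling started
}
static void ProfilerStop() { std::printf("[stub] ProfilerStop()\n"); }
#endif

// Starts profiling on construction and stops it on destruction, so the
// profiled region is exactly the enclosing scope.
class ScopedCpuProfile {
 public:
  explicit ScopedCpuProfile(const char* path)
      : active_(ProfilerStart(path) != 0) {}
  ~ScopedCpuProfile() {
    if (active_) ProfilerStop();
  }
  ScopedCpuProfile(const ScopedCpuProfile&) = delete;
  ScopedCpuProfile& operator=(const ScopedCpuProfile&) = delete;

 private:
  bool active_;
};

// Only this function is profiled. It creates no threads, in line with the
// advice above about keeping thread creation out of the profiled region.
unsigned long HotLoop(unsigned long n) {
  ScopedCpuProfile profile("hot_loop.prof");  // hypothetical output file
  unsigned long sum = 0;
  for (unsigned long i = 0; i < n; ++i) sum += i * i;
  return sum;
}
```

Keeping the scoped object inside a small, single-threaded function narrows the window in which profiling signals are delivered, which is the effect the workaround aims for.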
264 --- | 264 --- |
265 17 May 2011 | 265 17 May 2011 |