Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(220)

Side by Side Diff: third_party/cherrypy/lib/cpstats.py

Issue 9368042: Add CherryPy to third_party. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/tools/build/
Patch Set: '' Created 8 years, 10 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch | Annotate | Revision Log
« no previous file with comments | « third_party/cherrypy/lib/covercp.py ('k') | third_party/cherrypy/lib/cptools.py » ('j') | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
Property Changes:
Added: svn:eol-style
+ LF
OLDNEW
(Empty)
1 """CPStats, a package for collecting and reporting on program statistics.
2
3 Overview
4 ========
5
6 Statistics about program operation are an invaluable monitoring and debugging
7 tool. Unfortunately, the gathering and reporting of these critical values is
8 usually ad-hoc. This package aims to add a centralized place for gathering
9 statistical performance data, a structure for recording that data which
10 provides for extrapolation of that data into more useful information,
11 and a method of serving that data to both human investigators and
12 monitoring software. Let's examine each of those in more detail.
13
14 Data Gathering
15 --------------
16
17 Just as Python's `logging` module provides a common importable for gathering
18 and sending messages, performance statistics would benefit from a similar
19 common mechanism, and one that does *not* require each package which wishes
20 to collect stats to import a third-party module. Therefore, we choose to
21 re-use the `logging` module by adding a `statistics` object to it.
22
23 That `logging.statistics` object is a nested dict. It is not a custom class,
24 because that would 1) require libraries and applications to import a third-
25 party module in order to participate, 2) inhibit innovation in extrapolation
26 approaches and in reporting tools, and 3) be slow. There are, however, some
27 specifications regarding the structure of the dict.
28
29 {
30 +----"SQLAlchemy": {
31 | "Inserts": 4389745,
32 | "Inserts per Second":
33 | lambda s: s["Inserts"] / (time() - s["Start"]),
34 | C +---"Table Statistics": {
35 | o | "widgets": {-----------+
36 N | l | "Rows": 1.3M, | Record
37 a | l | "Inserts": 400, |
38 m | e | },---------------------+
39 e | c | "froobles": {
40 s | t | "Rows": 7845,
41 p | i | "Inserts": 0,
42 a | o | },
43 c | n +---},
44 e | "Slow Queries":
45 | [{"Query": "SELECT * FROM widgets;",
46 | "Processing Time": 47.840923343,
47 | },
48 | ],
49 +----},
50 }
51
52 The `logging.statistics` dict has four levels. The topmost level is nothing
53 more than a set of names to introduce modularity, usually along the lines of
54 package names. If the SQLAlchemy project wanted to participate, for example,
55 it might populate the item `logging.statistics['SQLAlchemy']`, whose value
56 would be a second-layer dict we call a "namespace". Namespaces help multiple
57 packages to avoid collisions over key names, and make reports easier to read,
58 to boot. The maintainers of SQLAlchemy should feel free to use more than one
59 namespace if needed (such as 'SQLAlchemy ORM'). Note that there are no case
60 or other syntax constraints on the namespace names; they should be chosen
61 to be maximally readable by humans (neither too short nor too long).
62
63 Each namespace, then, is a dict of named statistical values, such as
64 'Requests/sec' or 'Uptime'. You should choose names which will look
65 good on a report: spaces and capitalization are just fine.
66
67 In addition to scalars, values in a namespace MAY be a (third-layer)
68 dict, or a list, called a "collection". For example, the CherryPy StatsTool
69 keeps track of what each request is doing (or has most recently done)
70 in a 'Requests' collection, where each key is a thread ID; each
71 value in the subdict MUST be a fourth dict (whew!) of statistical data about
72 each thread. We call each subdict in the collection a "record". Similarly,
73 the StatsTool also keeps a list of slow queries, where each record contains
74 data about each slow query, in order.
75
76 Values in a namespace or record may also be functions, which brings us to:
77
78 Extrapolation
79 -------------
80
81 The collection of statistical data needs to be fast, as close to unnoticeable
82 as possible to the host program. That requires us to minimize I/O, for example,
83 but in Python it also means we need to minimize function calls. So when you
84 are designing your namespace and record values, try to insert the most basic
85 scalar values you already have on hand.
86
87 When it comes time to report on the gathered data, however, we usually have
88 much more freedom in what we can calculate. Therefore, whenever reporting
89 tools (like the provided StatsPage CherryPy class) fetch the contents of
90 `logging.statistics` for reporting, they first call `extrapolate_statistics`
91 (passing the whole `statistics` dict as the only argument). This makes a
92 deep copy of the statistics dict so that the reporting tool can both iterate
93 over it and even change it without harming the original. But it also expands
94 any functions in the dict by calling them. For example, you might have a
95 'Current Time' entry in the namespace with the value "lambda scope: time.time()" .
96 The "scope" parameter is the current namespace dict (or record, if we're
97 currently expanding one of those instead), allowing you access to existing
98 static entries. If you're truly evil, you can even modify more than one entry
99 at a time.
100
101 However, don't try to calculate an entry and then use its value in further
102 extrapolations; the order in which the functions are called is not guaranteed.
103 This can lead to a certain amount of duplicated work (or a redesign of your
104 schema), but that's better than complicating the spec.
105
106 After the whole thing has been extrapolated, it's time for:
107
108 Reporting
109 ---------
110
111 The StatsPage class grabs the `logging.statistics` dict, extrapolates it all,
112 and then transforms it to HTML for easy viewing. Each namespace gets its own
113 header and attribute table, plus an extra table for each collection. This is
114 NOT part of the statistics specification; other tools can format how they like.
115
116 You can control which columns are output and how they are formatted by updating
117 StatsPage.formatting, which is a dict that mirrors the keys and nesting of
118 `logging.statistics`. The difference is that, instead of data values, it has
119 formatting values. Use None for a given key to indicate to the StatsPage that a
120 given column should not be output. Use a string with formatting (such as '%.3f')
121 to interpolate the value(s), or use a callable (such as lambda v: v.isoformat())
122 for more advanced formatting. Any entry which is not mentioned in the formatting
123 dict is output unchanged.
124
125 Monitoring
126 ----------
127
128 Although the HTML output takes pains to assign unique id's to each <td> with
129 statistical data, you're probably better off fetching /cpstats/data, which
130 outputs the whole (extrapolated) `logging.statistics` dict in JSON format.
131 That is probably easier to parse, and doesn't have any formatting controls,
132 so you get the "original" data in a consistently-serialized format.
133 Note: there's no treatment yet for datetime objects. Try time.time() instead
134 for now if you can. Nagios will probably thank you.
135
136 Turning Collection Off
137 ----------------------
138
139 It is recommended each namespace have an "Enabled" item which, if False,
140 stops collection (but not reporting) of statistical data. Applications
141 SHOULD provide controls to pause and resume collection by setting these
142 entries to False or True, if present.
143
144
145 Usage
146 =====
147
148 To collect statistics on CherryPy applications:
149
150 from cherrypy.lib import cpstats
151 appconfig['/']['tools.cpstats.on'] = True
152
153 To collect statistics on your own code:
154
155 import logging
156 # Initialize the repository
157 if not hasattr(logging, 'statistics'): logging.statistics = {}
158 # Initialize my namespace
159 mystats = logging.statistics.setdefault('My Stuff', {})
160 # Initialize my namespace's scalars and collections
161 mystats.update({
162 'Enabled': True,
163 'Start Time': time.time(),
164 'Important Events': 0,
165 'Events/Second': lambda s: (
166 (s['Important Events'] / (time.time() - s['Start Time']))),
167 })
168 ...
169 for event in events:
170 ...
171 # Collect stats
172 if mystats.get('Enabled', False):
173 mystats['Important Events'] += 1
174
175 To report statistics:
176
177 root.cpstats = cpstats.StatsPage()
178
179 To format statistics reports:
180
181 See 'Reporting', above.
182
183 """
184
185 # -------------------------------- Statistics -------------------------------- #
186
187 import logging
188 if not hasattr(logging, 'statistics'): logging.statistics = {}
189
190 def extrapolate_statistics(scope):
191 """Return an extrapolated copy of the given scope."""
192 c = {}
193 for k, v in list(scope.items()):
194 if isinstance(v, dict):
195 v = extrapolate_statistics(v)
196 elif isinstance(v, (list, tuple)):
197 v = [extrapolate_statistics(record) for record in v]
198 elif hasattr(v, '__call__'):
199 v = v(scope)
200 c[k] = v
201 return c
202
203
204 # --------------------- CherryPy Applications Statistics --------------------- #
205
206 import threading
207 import time
208
209 import cherrypy
210
211 appstats = logging.statistics.setdefault('CherryPy Applications', {})
212 appstats.update({
213 'Enabled': True,
214 'Bytes Read/Request': lambda s: (s['Total Requests'] and
215 (s['Total Bytes Read'] / float(s['Total Requests'])) or 0.0),
216 'Bytes Read/Second': lambda s: s['Total Bytes Read'] / s['Uptime'](s),
217 'Bytes Written/Request': lambda s: (s['Total Requests'] and
218 (s['Total Bytes Written'] / float(s['Total Requests'])) or 0.0),
219 'Bytes Written/Second': lambda s: s['Total Bytes Written'] / s['Uptime'](s),
220 'Current Time': lambda s: time.time(),
221 'Current Requests': 0,
222 'Requests/Second': lambda s: float(s['Total Requests']) / s['Uptime'](s),
223 'Server Version': cherrypy.__version__,
224 'Start Time': time.time(),
225 'Total Bytes Read': 0,
226 'Total Bytes Written': 0,
227 'Total Requests': 0,
228 'Total Time': 0,
229 'Uptime': lambda s: time.time() - s['Start Time'],
230 'Requests': {},
231 })
232
233 proc_time = lambda s: time.time() - s['Start Time']
234
235
236 class ByteCountWrapper(object):
237 """Wraps a file-like object, counting the number of bytes read."""
238
239 def __init__(self, rfile):
240 self.rfile = rfile
241 self.bytes_read = 0
242
243 def read(self, size=-1):
244 data = self.rfile.read(size)
245 self.bytes_read += len(data)
246 return data
247
248 def readline(self, size=-1):
249 data = self.rfile.readline(size)
250 self.bytes_read += len(data)
251 return data
252
253 def readlines(self, sizehint=0):
254 # Shamelessly stolen from StringIO
255 total = 0
256 lines = []
257 line = self.readline()
258 while line:
259 lines.append(line)
260 total += len(line)
261 if 0 < sizehint <= total:
262 break
263 line = self.readline()
264 return lines
265
266 def close(self):
267 self.rfile.close()
268
269 def __iter__(self):
270 return self
271
272 def next(self):
273 data = self.rfile.next()
274 self.bytes_read += len(data)
275 return data
276
277
278 average_uriset_time = lambda s: s['Count'] and (s['Sum'] / s['Count']) or 0
279
280
281 class StatsTool(cherrypy.Tool):
282 """Record various information about the current request."""
283
284 def __init__(self):
285 cherrypy.Tool.__init__(self, 'on_end_request', self.record_stop)
286
287 def _setup(self):
288 """Hook this tool into cherrypy.request.
289
290 The standard CherryPy request object will automatically call this
291 method when the tool is "turned on" in config.
292 """
293 if appstats.get('Enabled', False):
294 cherrypy.Tool._setup(self)
295 self.record_start()
296
297 def record_start(self):
298 """Record the beginning of a request."""
299 request = cherrypy.serving.request
300 if not hasattr(request.rfile, 'bytes_read'):
301 request.rfile = ByteCountWrapper(request.rfile)
302 request.body.fp = request.rfile
303
304 r = request.remote
305
306 appstats['Current Requests'] += 1
307 appstats['Total Requests'] += 1
308 appstats['Requests'][threading._get_ident()] = {
309 'Bytes Read': None,
310 'Bytes Written': None,
311 # Use a lambda so the ip gets updated by tools.proxy later
312 'Client': lambda s: '%s:%s' % (r.ip, r.port),
313 'End Time': None,
314 'Processing Time': proc_time,
315 'Request-Line': request.request_line,
316 'Response Status': None,
317 'Start Time': time.time(),
318 }
319
320 def record_stop(self, uriset=None, slow_queries=1.0, slow_queries_count=100,
321 debug=False, **kwargs):
322 """Record the end of a request."""
323 resp = cherrypy.serving.response
324 w = appstats['Requests'][threading._get_ident()]
325
326 r = cherrypy.request.rfile.bytes_read
327 w['Bytes Read'] = r
328 appstats['Total Bytes Read'] += r
329
330 if resp.stream:
331 w['Bytes Written'] = 'chunked'
332 else:
333 cl = int(resp.headers.get('Content-Length', 0))
334 w['Bytes Written'] = cl
335 appstats['Total Bytes Written'] += cl
336
337 w['Response Status'] = getattr(resp, 'output_status', None) or resp.stat us
338
339 w['End Time'] = time.time()
340 p = w['End Time'] - w['Start Time']
341 w['Processing Time'] = p
342 appstats['Total Time'] += p
343
344 appstats['Current Requests'] -= 1
345
346 if debug:
347 cherrypy.log('Stats recorded: %s' % repr(w), 'TOOLS.CPSTATS')
348
349 if uriset:
350 rs = appstats.setdefault('URI Set Tracking', {})
351 r = rs.setdefault(uriset, {
352 'Min': None, 'Max': None, 'Count': 0, 'Sum': 0,
353 'Avg': average_uriset_time})
354 if r['Min'] is None or p < r['Min']:
355 r['Min'] = p
356 if r['Max'] is None or p > r['Max']:
357 r['Max'] = p
358 r['Count'] += 1
359 r['Sum'] += p
360
361 if slow_queries and p > slow_queries:
362 sq = appstats.setdefault('Slow Queries', [])
363 sq.append(w.copy())
364 if len(sq) > slow_queries_count:
365 sq.pop(0)
366
367
368 import cherrypy
369 cherrypy.tools.cpstats = StatsTool()
370
371
372 # ---------------------- CherryPy Statistics Reporting ---------------------- #
373
374 import os
375 thisdir = os.path.abspath(os.path.dirname(__file__))
376
377 try:
378 import json
379 except ImportError:
380 try:
381 import simplejson as json
382 except ImportError:
383 json = None
384
385
386 missing = object()
387
388 locale_date = lambda v: time.strftime('%c', time.gmtime(v))
389 iso_format = lambda v: time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(v))
390
391 def pause_resume(ns):
392 def _pause_resume(enabled):
393 pause_disabled = ''
394 resume_disabled = ''
395 if enabled:
396 resume_disabled = 'disabled="disabled" '
397 else:
398 pause_disabled = 'disabled="disabled" '
399 return """
400 <form action="pause" method="POST" style="display:inline">
401 <input type="hidden" name="namespace" value="%s" />
402 <input type="submit" value="Pause" %s/>
403 </form>
404 <form action="resume" method="POST" style="display:inline">
405 <input type="hidden" name="namespace" value="%s" />
406 <input type="submit" value="Resume" %s/>
407 </form>
408 """ % (ns, pause_disabled, ns, resume_disabled)
409 return _pause_resume
410
411
412 class StatsPage(object):
413
414 formatting = {
415 'CherryPy Applications': {
416 'Enabled': pause_resume('CherryPy Applications'),
417 'Bytes Read/Request': '%.3f',
418 'Bytes Read/Second': '%.3f',
419 'Bytes Written/Request': '%.3f',
420 'Bytes Written/Second': '%.3f',
421 'Current Time': iso_format,
422 'Requests/Second': '%.3f',
423 'Start Time': iso_format,
424 'Total Time': '%.3f',
425 'Uptime': '%.3f',
426 'Slow Queries': {
427 'End Time': None,
428 'Processing Time': '%.3f',
429 'Start Time': iso_format,
430 },
431 'URI Set Tracking': {
432 'Avg': '%.3f',
433 'Max': '%.3f',
434 'Min': '%.3f',
435 'Sum': '%.3f',
436 },
437 'Requests': {
438 'Bytes Read': '%s',
439 'Bytes Written': '%s',
440 'End Time': None,
441 'Processing Time': '%.3f',
442 'Start Time': None,
443 },
444 },
445 'CherryPy WSGIServer': {
446 'Enabled': pause_resume('CherryPy WSGIServer'),
447 'Connections/second': '%.3f',
448 'Start time': iso_format,
449 },
450 }
451
452
453 def index(self):
454 # Transform the raw data into pretty output for HTML
455 yield """
456 <html>
457 <head>
458 <title>Statistics</title>
459 <style>
460
461 th, td {
462 padding: 0.25em 0.5em;
463 border: 1px solid #666699;
464 }
465
466 table {
467 border-collapse: collapse;
468 }
469
470 table.stats1 {
471 width: 100%;
472 }
473
474 table.stats1 th {
475 font-weight: bold;
476 text-align: right;
477 background-color: #CCD5DD;
478 }
479
480 table.stats2, h2 {
481 margin-left: 50px;
482 }
483
484 table.stats2 th {
485 font-weight: bold;
486 text-align: center;
487 background-color: #CCD5DD;
488 }
489
490 </style>
491 </head>
492 <body>
493 """
494 for title, scalars, collections in self.get_namespaces():
495 yield """
496 <h1>%s</h1>
497
498 <table class='stats1'>
499 <tbody>
500 """ % title
501 for i, (key, value) in enumerate(scalars):
502 colnum = i % 3
503 if colnum == 0: yield """
504 <tr>"""
505 yield """
506 <th>%(key)s</th><td id='%(title)s-%(key)s'>%(value)s</td>""" % vars( )
507 if colnum == 2: yield """
508 </tr>"""
509
510 if colnum == 0: yield """
511 <th></th><td></td>
512 <th></th><td></td>
513 </tr>"""
514 elif colnum == 1: yield """
515 <th></th><td></td>
516 </tr>"""
517 yield """
518 </tbody>
519 </table>"""
520
521 for subtitle, headers, subrows in collections:
522 yield """
523 <h2>%s</h2>
524 <table class='stats2'>
525 <thead>
526 <tr>""" % subtitle
527 for key in headers:
528 yield """
529 <th>%s</th>""" % key
530 yield """
531 </tr>
532 </thead>
533 <tbody>"""
534 for subrow in subrows:
535 yield """
536 <tr>"""
537 for value in subrow:
538 yield """
539 <td>%s</td>""" % value
540 yield """
541 </tr>"""
542 yield """
543 </tbody>
544 </table>"""
545 yield """
546 </body>
547 </html>
548 """
549 index.exposed = True
550
551 def get_namespaces(self):
552 """Yield (title, scalars, collections) for each namespace."""
553 s = extrapolate_statistics(logging.statistics)
554 for title, ns in sorted(s.items()):
555 scalars = []
556 collections = []
557 ns_fmt = self.formatting.get(title, {})
558 for k, v in sorted(ns.items()):
559 fmt = ns_fmt.get(k, {})
560 if isinstance(v, dict):
561 headers, subrows = self.get_dict_collection(v, fmt)
562 collections.append((k, ['ID'] + headers, subrows))
563 elif isinstance(v, (list, tuple)):
564 headers, subrows = self.get_list_collection(v, fmt)
565 collections.append((k, headers, subrows))
566 else:
567 format = ns_fmt.get(k, missing)
568 if format is None:
569 # Don't output this column.
570 continue
571 if hasattr(format, '__call__'):
572 v = format(v)
573 elif format is not missing:
574 v = format % v
575 scalars.append((k, v))
576 yield title, scalars, collections
577
578 def get_dict_collection(self, v, formatting):
579 """Return ([headers], [rows]) for the given collection."""
580 # E.g., the 'Requests' dict.
581 headers = []
582 for record in v.itervalues():
583 for k3 in record:
584 format = formatting.get(k3, missing)
585 if format is None:
586 # Don't output this column.
587 continue
588 if k3 not in headers:
589 headers.append(k3)
590 headers.sort()
591
592 subrows = []
593 for k2, record in sorted(v.items()):
594 subrow = [k2]
595 for k3 in headers:
596 v3 = record.get(k3, '')
597 format = formatting.get(k3, missing)
598 if format is None:
599 # Don't output this column.
600 continue
601 if hasattr(format, '__call__'):
602 v3 = format(v3)
603 elif format is not missing:
604 v3 = format % v3
605 subrow.append(v3)
606 subrows.append(subrow)
607
608 return headers, subrows
609
610 def get_list_collection(self, v, formatting):
611 """Return ([headers], [subrows]) for the given collection."""
612 # E.g., the 'Slow Queries' list.
613 headers = []
614 for record in v:
615 for k3 in record:
616 format = formatting.get(k3, missing)
617 if format is None:
618 # Don't output this column.
619 continue
620 if k3 not in headers:
621 headers.append(k3)
622 headers.sort()
623
624 subrows = []
625 for record in v:
626 subrow = []
627 for k3 in headers:
628 v3 = record.get(k3, '')
629 format = formatting.get(k3, missing)
630 if format is None:
631 # Don't output this column.
632 continue
633 if hasattr(format, '__call__'):
634 v3 = format(v3)
635 elif format is not missing:
636 v3 = format % v3
637 subrow.append(v3)
638 subrows.append(subrow)
639
640 return headers, subrows
641
642 if json is not None:
643 def data(self):
644 s = extrapolate_statistics(logging.statistics)
645 cherrypy.response.headers['Content-Type'] = 'application/json'
646 return json.dumps(s, sort_keys=True, indent=4)
647 data.exposed = True
648
649 def pause(self, namespace):
650 logging.statistics.get(namespace, {})['Enabled'] = False
651 raise cherrypy.HTTPRedirect('./')
652 pause.exposed = True
653 pause.cp_config = {'tools.allow.on': True,
654 'tools.allow.methods': ['POST']}
655
656 def resume(self, namespace):
657 logging.statistics.get(namespace, {})['Enabled'] = True
658 raise cherrypy.HTTPRedirect('./')
659 resume.exposed = True
660 resume.cp_config = {'tools.allow.on': True,
661 'tools.allow.methods': ['POST']}
662
OLDNEW
« no previous file with comments | « third_party/cherrypy/lib/covercp.py ('k') | third_party/cherrypy/lib/cptools.py » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698