third_party/cherrypy/lib/cpstats.py - Issue 9368042: Add CherryPy to third_party.

Side by Side Diff: third_party/cherrypy/lib/cpstats.py

Issue 9368042: Add CherryPy to third_party. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/tools/build/

Patch Set: '' Created 8 years, 10 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch | Annotate | Revision Log

Property Changes:

Added: svn:eol-style
+ LF

OLD	NEW
(Empty)
	1 """CPStats, a package for collecting and reporting on program statistics.

	2

	3 Overview

	4 ========

	5

	6 Statistics about program operation are an invaluable monitoring and debugging

	7 tool. Unfortunately, the gathering and reporting of these critical values is

	8 usually ad-hoc. This package aims to add a centralized place for gathering

	9 statistical performance data, a structure for recording that data which

	10 provides for extrapolation of that data into more useful information,

	11 and a method of serving that data to both human investigators and

	12 monitoring software. Let's examine each of those in more detail.

	13

	14 Data Gathering

	15 --------------

	16

	17 Just as Python's `logging` module provides a common importable for gathering

	18 and sending messages, performance statistics would benefit from a similar

	19 common mechanism, and one that does not require each package which wishes

	20 to collect stats to import a third-party module. Therefore, we choose to

	21 re-use the `logging` module by adding a `statistics` object to it.

	22

	23 That `logging.statistics` object is a nested dict. It is not a custom class,

	24 because that would 1) require libraries and applications to import a third-

	25 party module in order to participate, 2) inhibit innovation in extrapolation

	26 approaches and in reporting tools, and 3) be slow. There are, however, some

	27 specifications regarding the structure of the dict.

	28

	29 {

	30 +----"SQLAlchemy": {

	31 \| "Inserts": 4389745,

	32 \| "Inserts per Second":

	33 \| lambda s: s["Inserts"] / (time() - s["Start"]),

	34 \| C +---"Table Statistics": {

	35 \| o \| "widgets": {-----------+

	36 N \| l \| "Rows": 1.3M, \| Record

	37 a \| l \| "Inserts": 400, \|

	38 m \| e \| },---------------------+

	39 e \| c \| "froobles": {

	40 s \| t \| "Rows": 7845,

	41 p \| i \| "Inserts": 0,

	42 a \| o \| },

	43 c \| n +---},

	44 e \| "Slow Queries":

	45 \| [{"Query": "SELECT * FROM widgets;",

	46 \| "Processing Time": 47.840923343,

	47 \| },

	48 \| ],

	49 +----},

	50 }

	51

	52 The `logging.statistics` dict has four levels. The topmost level is nothing

	53 more than a set of names to introduce modularity, usually along the lines of

	54 package names. If the SQLAlchemy project wanted to participate, for example,

	55 it might populate the item `logging.statistics['SQLAlchemy']`, whose value

	56 would be a second-layer dict we call a "namespace". Namespaces help multiple

	57 packages to avoid collisions over key names, and make reports easier to read,

	58 to boot. The maintainers of SQLAlchemy should feel free to use more than one

	59 namespace if needed (such as 'SQLAlchemy ORM'). Note that there are no case

	60 or other syntax constraints on the namespace names; they should be chosen

	61 to be maximally readable by humans (neither too short nor too long).

	62

	63 Each namespace, then, is a dict of named statistical values, such as

	64 'Requests/sec' or 'Uptime'. You should choose names which will look

	65 good on a report: spaces and capitalization are just fine.

	66

	67 In addition to scalars, values in a namespace MAY be a (third-layer)

	68 dict, or a list, called a "collection". For example, the CherryPy StatsTool

	69 keeps track of what each request is doing (or has most recently done)

	70 in a 'Requests' collection, where each key is a thread ID; each

	71 value in the subdict MUST be a fourth dict (whew!) of statistical data about

	72 each thread. We call each subdict in the collection a "record". Similarly,

	73 the StatsTool also keeps a list of slow queries, where each record contains

	74 data about each slow query, in order.

	75

	76 Values in a namespace or record may also be functions, which brings us to:

	77

	78 Extrapolation

	79 -------------

	80

	81 The collection of statistical data needs to be fast, as close to unnoticeable

	82 as possible to the host program. That requires us to minimize I/O, for example,

	83 but in Python it also means we need to minimize function calls. So when you

	84 are designing your namespace and record values, try to insert the most basic

	85 scalar values you already have on hand.

	86

	87 When it comes time to report on the gathered data, however, we usually have

	88 much more freedom in what we can calculate. Therefore, whenever reporting

	89 tools (like the provided StatsPage CherryPy class) fetch the contents of

	90 `logging.statistics` for reporting, they first call `extrapolate_statistics`

	91 (passing the whole `statistics` dict as the only argument). This makes a

	92 deep copy of the statistics dict so that the reporting tool can both iterate

	93 over it and even change it without harming the original. But it also expands

	94 any functions in the dict by calling them. For example, you might have a

	95 'Current Time' entry in the namespace with the value "lambda scope: time.time()" .

	96 The "scope" parameter is the current namespace dict (or record, if we're

	97 currently expanding one of those instead), allowing you access to existing

	98 static entries. If you're truly evil, you can even modify more than one entry

	99 at a time.

	100

	101 However, don't try to calculate an entry and then use its value in further

	102 extrapolations; the order in which the functions are called is not guaranteed.

	103 This can lead to a certain amount of duplicated work (or a redesign of your

	104 schema), but that's better than complicating the spec.

	105

	106 After the whole thing has been extrapolated, it's time for:

	107

	108 Reporting

	109 ---------

	110

	111 The StatsPage class grabs the `logging.statistics` dict, extrapolates it all,

	112 and then transforms it to HTML for easy viewing. Each namespace gets its own

	113 header and attribute table, plus an extra table for each collection. This is

	114 NOT part of the statistics specification; other tools can format how they like.

	115

	116 You can control which columns are output and how they are formatted by updating

	117 StatsPage.formatting, which is a dict that mirrors the keys and nesting of

	118 `logging.statistics`. The difference is that, instead of data values, it has

	119 formatting values. Use None for a given key to indicate to the StatsPage that a

	120 given column should not be output. Use a string with formatting (such as '%.3f')

	121 to interpolate the value(s), or use a callable (such as lambda v: v.isoformat())

	122 for more advanced formatting. Any entry which is not mentioned in the formatting

	123 dict is output unchanged.

	124

	125 Monitoring

	126 ----------

	127

	128 Although the HTML output takes pains to assign unique id's to each <td> with

	129 statistical data, you're probably better off fetching /cpstats/data, which

	130 outputs the whole (extrapolated) `logging.statistics` dict in JSON format.

	131 That is probably easier to parse, and doesn't have any formatting controls,

	132 so you get the "original" data in a consistently-serialized format.

	133 Note: there's no treatment yet for datetime objects. Try time.time() instead

	134 for now if you can. Nagios will probably thank you.

	135

	136 Turning Collection Off

	137 ----------------------

	138

	139 It is recommended each namespace have an "Enabled" item which, if False,

	140 stops collection (but not reporting) of statistical data. Applications

	141 SHOULD provide controls to pause and resume collection by setting these

	142 entries to False or True, if present.

	143

	144

	145 Usage

	146 =====

	147

	148 To collect statistics on CherryPy applications:

	149

	150 from cherrypy.lib import cpstats

	151 appconfig['/']['tools.cpstats.on'] = True

	152

	153 To collect statistics on your own code:

	154

	155 import logging

	156 # Initialize the repository

	157 if not hasattr(logging, 'statistics'): logging.statistics = {}

	158 # Initialize my namespace

	159 mystats = logging.statistics.setdefault('My Stuff', {})

	160 # Initialize my namespace's scalars and collections

	161 mystats.update({

	162 'Enabled': True,

	163 'Start Time': time.time(),

	164 'Important Events': 0,

	165 'Events/Second': lambda s: (

	166 (s['Important Events'] / (time.time() - s['Start Time']))),

	167 })

	168 ...

	169 for event in events:

	170 ...

	171 # Collect stats

	172 if mystats.get('Enabled', False):

	173 mystats['Important Events'] += 1

	174

	175 To report statistics:

	176

	177 root.cpstats = cpstats.StatsPage()

	178

	179 To format statistics reports:

	180

	181 See 'Reporting', above.

	182

	183 """

	184

	185 # -------------------------------- Statistics -------------------------------- #

	186

	187 import logging

	188 if not hasattr(logging, 'statistics'): logging.statistics = {}

	189

	190 def extrapolate_statistics(scope):

	191 """Return an extrapolated copy of the given scope."""

	192 c = {}

	193 for k, v in list(scope.items()):

	194 if isinstance(v, dict):

	195 v = extrapolate_statistics(v)

	196 elif isinstance(v, (list, tuple)):

	197 v = [extrapolate_statistics(record) for record in v]

	198 elif hasattr(v, '__call__'):

	199 v = v(scope)

	200 c[k] = v

	201 return c

	202

	203

	204 # --------------------- CherryPy Applications Statistics --------------------- #

	205

	206 import threading

	207 import time

	208

	209 import cherrypy

	210

	211 appstats = logging.statistics.setdefault('CherryPy Applications', {})

	212 appstats.update({

	213 'Enabled': True,

	214 'Bytes Read/Request': lambda s: (s['Total Requests'] and

	215 (s['Total Bytes Read'] / float(s['Total Requests'])) or 0.0),

	216 'Bytes Read/Second': lambda s: s['Total Bytes Read'] / s['Uptime'](s),

	217 'Bytes Written/Request': lambda s: (s['Total Requests'] and

	218 (s['Total Bytes Written'] / float(s['Total Requests'])) or 0.0),

	219 'Bytes Written/Second': lambda s: s['Total Bytes Written'] / s['Uptime'](s),

	220 'Current Time': lambda s: time.time(),

	221 'Current Requests': 0,

	222 'Requests/Second': lambda s: float(s['Total Requests']) / s['Uptime'](s),

	223 'Server Version': cherrypy.__version__,

	224 'Start Time': time.time(),

	225 'Total Bytes Read': 0,

	226 'Total Bytes Written': 0,

	227 'Total Requests': 0,

	228 'Total Time': 0,

	229 'Uptime': lambda s: time.time() - s['Start Time'],

	230 'Requests': {},

	231 })

	232

	233 proc_time = lambda s: time.time() - s['Start Time']

	234

	235

	236 class ByteCountWrapper(object):

	237 """Wraps a file-like object, counting the number of bytes read."""

	238

	239 def __init__(self, rfile):

	240 self.rfile = rfile

	241 self.bytes_read = 0

	242

	243 def read(self, size=-1):

	244 data = self.rfile.read(size)

	245 self.bytes_read += len(data)

	246 return data

	247

	248 def readline(self, size=-1):

	249 data = self.rfile.readline(size)

	250 self.bytes_read += len(data)

	251 return data

	252

	253 def readlines(self, sizehint=0):

	254 # Shamelessly stolen from StringIO

	255 total = 0

	256 lines = []

	257 line = self.readline()

	258 while line:

	259 lines.append(line)

	260 total += len(line)

	261 if 0 < sizehint <= total:

	262 break

	263 line = self.readline()

	264 return lines

	265

	266 def close(self):

	267 self.rfile.close()

	268

	269 def __iter__(self):

	270 return self

	271

	272 def next(self):

	273 data = self.rfile.next()

	274 self.bytes_read += len(data)

	275 return data

	276

	277

	278 average_uriset_time = lambda s: s['Count'] and (s['Sum'] / s['Count']) or 0

	279

	280

	281 class StatsTool(cherrypy.Tool):

	282 """Record various information about the current request."""

	283

	284 def __init__(self):

	285 cherrypy.Tool.__init__(self, 'on_end_request', self.record_stop)

	286

	287 def _setup(self):

	288 """Hook this tool into cherrypy.request.

	289

	290 The standard CherryPy request object will automatically call this

	291 method when the tool is "turned on" in config.

	292 """

	293 if appstats.get('Enabled', False):

	294 cherrypy.Tool._setup(self)

	295 self.record_start()

	296

	297 def record_start(self):

	298 """Record the beginning of a request."""

	299 request = cherrypy.serving.request

	300 if not hasattr(request.rfile, 'bytes_read'):

	301 request.rfile = ByteCountWrapper(request.rfile)

	302 request.body.fp = request.rfile

	303

	304 r = request.remote

	305

	306 appstats['Current Requests'] += 1

	307 appstats['Total Requests'] += 1

	308 appstats['Requests'][threading._get_ident()] = {

	309 'Bytes Read': None,

	310 'Bytes Written': None,

	311 # Use a lambda so the ip gets updated by tools.proxy later

	312 'Client': lambda s: '%s:%s' % (r.ip, r.port),

	313 'End Time': None,

	314 'Processing Time': proc_time,

	315 'Request-Line': request.request_line,

	316 'Response Status': None,

	317 'Start Time': time.time(),

	318 }

	319

	320 def record_stop(self, uriset=None, slow_queries=1.0, slow_queries_count=100,

	321 debug=False, **kwargs):

	322 """Record the end of a request."""

	323 resp = cherrypy.serving.response

	324 w = appstats['Requests'][threading._get_ident()]

	325

	326 r = cherrypy.request.rfile.bytes_read

	327 w['Bytes Read'] = r

	328 appstats['Total Bytes Read'] += r

	329

	330 if resp.stream:

	331 w['Bytes Written'] = 'chunked'

	332 else:

	333 cl = int(resp.headers.get('Content-Length', 0))

	334 w['Bytes Written'] = cl

	335 appstats['Total Bytes Written'] += cl

	336

	337 w['Response Status'] = getattr(resp, 'output_status', None) or resp.stat us

	338

	339 w['End Time'] = time.time()

	340 p = w['End Time'] - w['Start Time']

	341 w['Processing Time'] = p

	342 appstats['Total Time'] += p

	343

	344 appstats['Current Requests'] -= 1

	345

	346 if debug:

	347 cherrypy.log('Stats recorded: %s' % repr(w), 'TOOLS.CPSTATS')

	348

	349 if uriset:

	350 rs = appstats.setdefault('URI Set Tracking', {})

	351 r = rs.setdefault(uriset, {

	352 'Min': None, 'Max': None, 'Count': 0, 'Sum': 0,

	353 'Avg': average_uriset_time})

	354 if r['Min'] is None or p < r['Min']:

	355 r['Min'] = p

	356 if r['Max'] is None or p > r['Max']:

	357 r['Max'] = p

	358 r['Count'] += 1

	359 r['Sum'] += p

	360

	361 if slow_queries and p > slow_queries:

	362 sq = appstats.setdefault('Slow Queries', [])

	363 sq.append(w.copy())

	364 if len(sq) > slow_queries_count:

	365 sq.pop(0)

	366

	367

	368 import cherrypy

	369 cherrypy.tools.cpstats = StatsTool()

	370

	371

	372 # ---------------------- CherryPy Statistics Reporting ---------------------- #

	373

	374 import os

	375 thisdir = os.path.abspath(os.path.dirname(__file__))

	376

	377 try:

	378 import json

	379 except ImportError:

	380 try:

	381 import simplejson as json

	382 except ImportError:

	383 json = None

	384

	385

	386 missing = object()

	387

	388 locale_date = lambda v: time.strftime('%c', time.gmtime(v))

	389 iso_format = lambda v: time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(v))

	390

	391 def pause_resume(ns):

	392 def _pause_resume(enabled):

	393 pause_disabled = ''

	394 resume_disabled = ''

	395 if enabled:

	396 resume_disabled = 'disabled="disabled" '

	397 else:

	398 pause_disabled = 'disabled="disabled" '

	399 return """

	400 <form action="pause" method="POST" style="display:inline">

	401 <input type="hidden" name="namespace" value="%s" />

	402 <input type="submit" value="Pause" %s/>

	403 </form>

	404 <form action="resume" method="POST" style="display:inline">

	405 <input type="hidden" name="namespace" value="%s" />

	406 <input type="submit" value="Resume" %s/>

	407 </form>

	408 """ % (ns, pause_disabled, ns, resume_disabled)

	409 return _pause_resume

	410

	411

	412 class StatsPage(object):

	413

	414 formatting = {

	415 'CherryPy Applications': {

	416 'Enabled': pause_resume('CherryPy Applications'),

	417 'Bytes Read/Request': '%.3f',

	418 'Bytes Read/Second': '%.3f',

	419 'Bytes Written/Request': '%.3f',

	420 'Bytes Written/Second': '%.3f',

	421 'Current Time': iso_format,

	422 'Requests/Second': '%.3f',

	423 'Start Time': iso_format,

	424 'Total Time': '%.3f',

	425 'Uptime': '%.3f',

	426 'Slow Queries': {

	427 'End Time': None,

	428 'Processing Time': '%.3f',

	429 'Start Time': iso_format,

	430 },

	431 'URI Set Tracking': {

	432 'Avg': '%.3f',

	433 'Max': '%.3f',

	434 'Min': '%.3f',

	435 'Sum': '%.3f',

	436 },

	437 'Requests': {

	438 'Bytes Read': '%s',

	439 'Bytes Written': '%s',

	440 'End Time': None,

	441 'Processing Time': '%.3f',

	442 'Start Time': None,

	443 },

	444 },

	445 'CherryPy WSGIServer': {

	446 'Enabled': pause_resume('CherryPy WSGIServer'),

	447 'Connections/second': '%.3f',

	448 'Start time': iso_format,

	449 },

	450 }

	451

	452

	453 def index(self):

	454 # Transform the raw data into pretty output for HTML

	455 yield """

	456 <html>

	457 <head>

	458 <title>Statistics</title>

	459 <style>

	460

	461 th, td {

	462 padding: 0.25em 0.5em;

	463 border: 1px solid #666699;

	464 }

	465

	466 table {

	467 border-collapse: collapse;

	468 }

	469

	470 table.stats1 {

	471 width: 100%;

	472 }

	473

	474 table.stats1 th {

	475 font-weight: bold;

	476 text-align: right;

	477 background-color: #CCD5DD;

	478 }

	479

	480 table.stats2, h2 {

	481 margin-left: 50px;

	482 }

	483

	484 table.stats2 th {

	485 font-weight: bold;

	486 text-align: center;

	487 background-color: #CCD5DD;

	488 }

	489

	490 </style>

	491 </head>

	492 <body>

	493 """

	494 for title, scalars, collections in self.get_namespaces():

	495 yield """

	496 <h1>%s</h1>

	497

	498 <table class='stats1'>

	499 <tbody>

	500 """ % title

	501 for i, (key, value) in enumerate(scalars):

	502 colnum = i % 3

	503 if colnum == 0: yield """

	504 <tr>"""

	505 yield """

	506 <th>%(key)s</th><td id='%(title)s-%(key)s'>%(value)s</td>""" % vars( )

	507 if colnum == 2: yield """

	508 </tr>"""

	509

	510 if colnum == 0: yield """

	511 <th></th><td></td>

	512 <th></th><td></td>

	513 </tr>"""

	514 elif colnum == 1: yield """

	515 <th></th><td></td>

	516 </tr>"""

	517 yield """

	518 </tbody>

	519 </table>"""

	520

	521 for subtitle, headers, subrows in collections:

	522 yield """

	523 <h2>%s</h2>

	524 <table class='stats2'>

	525 <thead>

	526 <tr>""" % subtitle

	527 for key in headers:

	528 yield """

	529 <th>%s</th>""" % key

	530 yield """

	531 </tr>

	532 </thead>

	533 <tbody>"""

	534 for subrow in subrows:

	535 yield """

	536 <tr>"""

	537 for value in subrow:

	538 yield """

	539 <td>%s</td>""" % value

	540 yield """

	541 </tr>"""

	542 yield """

	543 </tbody>

	544 </table>"""

	545 yield """

	546 </body>

	547 </html>

	548 """

	549 index.exposed = True

	550

	551 def get_namespaces(self):

	552 """Yield (title, scalars, collections) for each namespace."""

	553 s = extrapolate_statistics(logging.statistics)

	554 for title, ns in sorted(s.items()):

	555 scalars = []

	556 collections = []

	557 ns_fmt = self.formatting.get(title, {})

	558 for k, v in sorted(ns.items()):

	559 fmt = ns_fmt.get(k, {})

	560 if isinstance(v, dict):

	561 headers, subrows = self.get_dict_collection(v, fmt)

	562 collections.append((k, ['ID'] + headers, subrows))

	563 elif isinstance(v, (list, tuple)):

	564 headers, subrows = self.get_list_collection(v, fmt)

	565 collections.append((k, headers, subrows))

	566 else:

	567 format = ns_fmt.get(k, missing)

	568 if format is None:

	569 # Don't output this column.

	570 continue

	571 if hasattr(format, '__call__'):

	572 v = format(v)

	573 elif format is not missing:

	574 v = format % v

	575 scalars.append((k, v))

	576 yield title, scalars, collections

	577

	578 def get_dict_collection(self, v, formatting):

	579 """Return ([headers], [rows]) for the given collection."""

	580 # E.g., the 'Requests' dict.

	581 headers = []

	582 for record in v.itervalues():

	583 for k3 in record:

	584 format = formatting.get(k3, missing)

	585 if format is None:

	586 # Don't output this column.

	587 continue

	588 if k3 not in headers:

	589 headers.append(k3)

	590 headers.sort()

	591

	592 subrows = []

	593 for k2, record in sorted(v.items()):

	594 subrow = [k2]

	595 for k3 in headers:

	596 v3 = record.get(k3, '')

	597 format = formatting.get(k3, missing)

	598 if format is None:

	599 # Don't output this column.

	600 continue

	601 if hasattr(format, '__call__'):

	602 v3 = format(v3)

	603 elif format is not missing:

	604 v3 = format % v3

	605 subrow.append(v3)

	606 subrows.append(subrow)

	607

	608 return headers, subrows

	609

	610 def get_list_collection(self, v, formatting):

	611 """Return ([headers], [subrows]) for the given collection."""

	612 # E.g., the 'Slow Queries' list.

	613 headers = []

	614 for record in v:

	615 for k3 in record:

	616 format = formatting.get(k3, missing)

	617 if format is None:

	618 # Don't output this column.

	619 continue

	620 if k3 not in headers:

	621 headers.append(k3)

	622 headers.sort()

	623

	624 subrows = []

	625 for record in v:

	626 subrow = []

	627 for k3 in headers:

	628 v3 = record.get(k3, '')

	629 format = formatting.get(k3, missing)

	630 if format is None:

	631 # Don't output this column.

	632 continue

	633 if hasattr(format, '__call__'):

	634 v3 = format(v3)

	635 elif format is not missing:

	636 v3 = format % v3

	637 subrow.append(v3)

	638 subrows.append(subrow)

	639

	640 return headers, subrows

	641

	642 if json is not None:

	643 def data(self):

	644 s = extrapolate_statistics(logging.statistics)

	645 cherrypy.response.headers['Content-Type'] = 'application/json'

	646 return json.dumps(s, sort_keys=True, indent=4)

	647 data.exposed = True

	648

	649 def pause(self, namespace):

	650 logging.statistics.get(namespace, {})['Enabled'] = False

	651 raise cherrypy.HTTPRedirect('./')

	652 pause.exposed = True

	653 pause.cp_config = {'tools.allow.on': True,

	654 'tools.allow.methods': ['POST']}

	655

	656 def resume(self, namespace):

	657 logging.statistics.get(namespace, {})['Enabled'] = True

	658 raise cherrypy.HTTPRedirect('./')

	659 resume.exposed = True

	660 resume.cp_config = {'tools.allow.on': True,

	661 'tools.allow.methods': ['POST']}

	662

OLD	NEW

« no previous file with comments | « third_party/cherrypy/lib/covercp.py ('k') | third_party/cherrypy/lib/cptools.py » ('j') | no next file with comments »