OLD | NEW |
(Empty) | |
| 1 """Request body processing for CherryPy. |
| 2 |
| 3 .. versionadded:: 3.2 |
| 4 |
| 5 Application authors have complete control over the parsing of HTTP request |
| 6 entities. In short, :attr:`cherrypy.request.body<cherrypy._cprequest.Request.bod
y>` |
| 7 is now always set to an instance of :class:`RequestBody<cherrypy._cpreqbody.Requ
estBody>`, |
| 8 and *that* class is a subclass of :class:`Entity<cherrypy._cpreqbody.Entity>`. |
| 9 |
| 10 When an HTTP request includes an entity body, it is often desirable to |
| 11 provide that information to applications in a form other than the raw bytes. |
| 12 Different content types demand different approaches. Examples: |
| 13 |
| 14 * For a GIF file, we want the raw bytes in a stream. |
| 15 * An HTML form is better parsed into its component fields, and each text field |
| 16 decoded from bytes to unicode. |
| 17 * A JSON body should be deserialized into a Python dict or list. |
| 18 |
| 19 When the request contains a Content-Type header, the media type is used as a |
| 20 key to look up a value in the |
| 21 :attr:`request.body.processors<cherrypy._cpreqbody.Entity.processors>` dict. |
| 22 If the full media |
| 23 type is not found, then the major type is tried; for example, if no processor |
| 24 is found for the 'image/jpeg' type, then we look for a processor for the 'image' |
| 25 types altogether. If neither the full type nor the major type has a matching |
| 26 processor, then a default processor is used |
| 27 (:func:`default_proc<cherrypy._cpreqbody.Entity.default_proc>`). For most |
| 28 types, this means no processing is done, and the body is left unread as a |
| 29 raw byte stream. Processors are configurable in an 'on_start_resource' hook. |
| 30 |
| 31 Some processors, especially those for the 'text' types, attempt to decode bytes |
| 32 to unicode. If the Content-Type request header includes a 'charset' parameter, |
| 33 this is used to decode the entity. Otherwise, one or more default charsets may |
| 34 be attempted, although this decision is up to each processor. If a processor |
| 35 successfully decodes an Entity or Part, it should set the |
| 36 :attr:`charset<cherrypy._cpreqbody.Entity.charset>` attribute |
| 37 on the Entity or Part to the name of the successful charset, so that |
| 38 applications can easily re-encode or transcode the value if they wish. |
| 39 |
| 40 If the Content-Type of the request entity is of major type 'multipart', then |
| 41 the above parsing process, and possibly a decoding process, is performed for |
| 42 each part. |
| 43 |
| 44 For both the full entity and multipart parts, a Content-Disposition header may |
| 45 be used to fill :attr:`name<cherrypy._cpreqbody.Entity.name>` and |
| 46 :attr:`filename<cherrypy._cpreqbody.Entity.filename>` attributes on the |
| 47 request.body or the Part. |
| 48 |
| 49 .. _custombodyprocessors: |
| 50 |
| 51 Custom Processors |
| 52 ================= |
| 53 |
| 54 You can add your own processors for any specific or major MIME type. Simply add |
| 55 it to the :attr:`processors<cherrypy._cprequest.Entity.processors>` dict in a |
| 56 hook/tool that runs at ``on_start_resource`` or ``before_request_body``. |
| 57 Here's the built-in JSON tool for an example:: |
| 58 |
| 59 def json_in(force=True, debug=False): |
| 60 request = cherrypy.serving.request |
| 61 def json_processor(entity): |
| 62 \"""Read application/json data into request.json.\""" |
| 63 if not entity.headers.get("Content-Length", ""): |
| 64 raise cherrypy.HTTPError(411) |
| 65 |
| 66 body = entity.fp.read() |
| 67 try: |
| 68 request.json = json_decode(body) |
| 69 except ValueError: |
| 70 raise cherrypy.HTTPError(400, 'Invalid JSON document') |
| 71 if force: |
| 72 request.body.processors.clear() |
| 73 request.body.default_proc = cherrypy.HTTPError( |
| 74 415, 'Expected an application/json content type') |
| 75 request.body.processors['application/json'] = json_processor |
| 76 |
| 77 We begin by defining a new ``json_processor`` function to stick in the ``process
ors`` |
| 78 dictionary. All processor functions take a single argument, the ``Entity`` insta
nce |
| 79 they are to process. It will be called whenever a request is received (for those |
| 80 URI's where the tool is turned on) which has a ``Content-Type`` of |
| 81 "application/json". |
| 82 |
| 83 First, it checks for a valid ``Content-Length`` (raising 411 if not valid), then |
| 84 reads the remaining bytes on the socket. The ``fp`` object knows its own length,
so |
| 85 it won't hang waiting for data that never arrives. It will return when all data |
| 86 has been read. Then, we decode those bytes using Python's built-in ``json`` modu
le, |
| 87 and stick the decoded result onto ``request.json`` . If it cannot be decoded, we |
| 88 raise 400. |
| 89 |
| 90 If the "force" argument is True (the default), the ``Tool`` clears the ``process
ors`` |
| 91 dict so that request entities of other ``Content-Types`` aren't parsed at all. S
ince |
| 92 there's no entry for those invalid MIME types, the ``default_proc`` method of ``
cherrypy.request.body`` |
| 93 is called. But this does nothing by default (usually to provide the page handler
an opportunity to handle it.) |
| 94 But in our case, we want to raise 415, so we replace ``request.body.default_proc
`` |
| 95 with the error (``HTTPError`` instances, when called, raise themselves). |
| 96 |
| 97 If we were defining a custom processor, we can do so without making a ``Tool``.
Just add the config entry:: |
| 98 |
| 99 request.body.processors = {'application/json': json_processor} |
| 100 |
| 101 Note that you can only replace the ``processors`` dict wholesale this way, not u
pdate the existing one. |
| 102 """ |
| 103 |
| 104 try: |
| 105 from io import DEFAULT_BUFFER_SIZE |
| 106 except ImportError: |
| 107 DEFAULT_BUFFER_SIZE = 8192 |
| 108 import re |
| 109 import sys |
| 110 import tempfile |
| 111 try: |
| 112 from urllib import unquote_plus |
| 113 except ImportError: |
| 114 def unquote_plus(bs): |
| 115 """Bytes version of urllib.parse.unquote_plus.""" |
| 116 bs = bs.replace(ntob('+'), ntob(' ')) |
| 117 atoms = bs.split(ntob('%')) |
| 118 for i in range(1, len(atoms)): |
| 119 item = atoms[i] |
| 120 try: |
| 121 pct = int(item[:2], 16) |
| 122 atoms[i] = bytes([pct]) + item[2:] |
| 123 except ValueError: |
| 124 pass |
| 125 return ntob('').join(atoms) |
| 126 |
| 127 import cherrypy |
| 128 from cherrypy._cpcompat import basestring, ntob, ntou |
| 129 from cherrypy.lib import httputil |
| 130 |
| 131 |
| 132 # -------------------------------- Processors -------------------------------- # |
| 133 |
| 134 def process_urlencoded(entity): |
| 135 """Read application/x-www-form-urlencoded data into entity.params.""" |
| 136 qs = entity.fp.read() |
| 137 for charset in entity.attempt_charsets: |
| 138 try: |
| 139 params = {} |
| 140 for aparam in qs.split(ntob('&')): |
| 141 for pair in aparam.split(ntob(';')): |
| 142 if not pair: |
| 143 continue |
| 144 |
| 145 atoms = pair.split(ntob('='), 1) |
| 146 if len(atoms) == 1: |
| 147 atoms.append(ntob('')) |
| 148 |
| 149 key = unquote_plus(atoms[0]).decode(charset) |
| 150 value = unquote_plus(atoms[1]).decode(charset) |
| 151 |
| 152 if key in params: |
| 153 if not isinstance(params[key], list): |
| 154 params[key] = [params[key]] |
| 155 params[key].append(value) |
| 156 else: |
| 157 params[key] = value |
| 158 except UnicodeDecodeError: |
| 159 pass |
| 160 else: |
| 161 entity.charset = charset |
| 162 break |
| 163 else: |
| 164 raise cherrypy.HTTPError( |
| 165 400, "The request entity could not be decoded. The following " |
| 166 "charsets were attempted: %s" % repr(entity.attempt_charsets)) |
| 167 |
| 168 # Now that all values have been successfully parsed and decoded, |
| 169 # apply them to the entity.params dict. |
| 170 for key, value in params.items(): |
| 171 if key in entity.params: |
| 172 if not isinstance(entity.params[key], list): |
| 173 entity.params[key] = [entity.params[key]] |
| 174 entity.params[key].append(value) |
| 175 else: |
| 176 entity.params[key] = value |
| 177 |
| 178 |
| 179 def process_multipart(entity): |
| 180 """Read all multipart parts into entity.parts.""" |
| 181 ib = "" |
| 182 if 'boundary' in entity.content_type.params: |
| 183 # http://tools.ietf.org/html/rfc2046#section-5.1.1 |
| 184 # "The grammar for parameters on the Content-type field is such that it |
| 185 # is often necessary to enclose the boundary parameter values in quotes |
| 186 # on the Content-type line" |
| 187 ib = entity.content_type.params['boundary'].strip('"') |
| 188 |
| 189 if not re.match("^[ -~]{0,200}[!-~]$", ib): |
| 190 raise ValueError('Invalid boundary in multipart form: %r' % (ib,)) |
| 191 |
| 192 ib = ('--' + ib).encode('ascii') |
| 193 |
| 194 # Find the first marker |
| 195 while True: |
| 196 b = entity.readline() |
| 197 if not b: |
| 198 return |
| 199 |
| 200 b = b.strip() |
| 201 if b == ib: |
| 202 break |
| 203 |
| 204 # Read all parts |
| 205 while True: |
| 206 part = entity.part_class.from_fp(entity.fp, ib) |
| 207 entity.parts.append(part) |
| 208 part.process() |
| 209 if part.fp.done: |
| 210 break |
| 211 |
| 212 def process_multipart_form_data(entity): |
| 213 """Read all multipart/form-data parts into entity.parts or entity.params.""" |
| 214 process_multipart(entity) |
| 215 |
| 216 kept_parts = [] |
| 217 for part in entity.parts: |
| 218 if part.name is None: |
| 219 kept_parts.append(part) |
| 220 else: |
| 221 if part.filename is None: |
| 222 # It's a regular field |
| 223 value = part.fullvalue() |
| 224 else: |
| 225 # It's a file upload. Retain the whole part so consumer code |
| 226 # has access to its .file and .filename attributes. |
| 227 value = part |
| 228 |
| 229 if part.name in entity.params: |
| 230 if not isinstance(entity.params[part.name], list): |
| 231 entity.params[part.name] = [entity.params[part.name]] |
| 232 entity.params[part.name].append(value) |
| 233 else: |
| 234 entity.params[part.name] = value |
| 235 |
| 236 entity.parts = kept_parts |
| 237 |
| 238 def _old_process_multipart(entity): |
| 239 """The behavior of 3.2 and lower. Deprecated and will be changed in 3.3.""" |
| 240 process_multipart(entity) |
| 241 |
| 242 params = entity.params |
| 243 |
| 244 for part in entity.parts: |
| 245 if part.name is None: |
| 246 key = ntou('parts') |
| 247 else: |
| 248 key = part.name |
| 249 |
| 250 if part.filename is None: |
| 251 # It's a regular field |
| 252 value = part.fullvalue() |
| 253 else: |
| 254 # It's a file upload. Retain the whole part so consumer code |
| 255 # has access to its .file and .filename attributes. |
| 256 value = part |
| 257 |
| 258 if key in params: |
| 259 if not isinstance(params[key], list): |
| 260 params[key] = [params[key]] |
| 261 params[key].append(value) |
| 262 else: |
| 263 params[key] = value |
| 264 |
| 265 |
| 266 |
| 267 # --------------------------------- Entities --------------------------------- # |
| 268 |
| 269 |
| 270 class Entity(object): |
| 271 """An HTTP request body, or MIME multipart body. |
| 272 |
| 273 This class collects information about the HTTP request entity. When a |
| 274 given entity is of MIME type "multipart", each part is parsed into its own |
| 275 Entity instance, and the set of parts stored in |
| 276 :attr:`entity.parts<cherrypy._cpreqbody.Entity.parts>`. |
| 277 |
| 278 Between the ``before_request_body`` and ``before_handler`` tools, CherryPy |
| 279 tries to process the request body (if any) by calling |
| 280 :func:`request.body.process<cherrypy._cpreqbody.RequestBody.process`. |
| 281 This uses the ``content_type`` of the Entity to look up a suitable processor |
| 282 in :attr:`Entity.processors<cherrypy._cpreqbody.Entity.processors>`, a dict. |
| 283 If a matching processor cannot be found for the complete Content-Type, |
| 284 it tries again using the major type. For example, if a request with an |
| 285 entity of type "image/jpeg" arrives, but no processor can be found for |
| 286 that complete type, then one is sought for the major type "image". If a |
| 287 processor is still not found, then the |
| 288 :func:`default_proc<cherrypy._cpreqbody.Entity.default_proc>` method of the |
| 289 Entity is called (which does nothing by default; you can override this too). |
| 290 |
| 291 CherryPy includes processors for the "application/x-www-form-urlencoded" |
| 292 type, the "multipart/form-data" type, and the "multipart" major type. |
| 293 CherryPy 3.2 processes these types almost exactly as older versions. |
| 294 Parts are passed as arguments to the page handler using their |
| 295 ``Content-Disposition.name`` if given, otherwise in a generic "parts" |
| 296 argument. Each such part is either a string, or the |
| 297 :class:`Part<cherrypy._cpreqbody.Part>` itself if it's a file. (In this |
| 298 case it will have ``file`` and ``filename`` attributes, or possibly a |
| 299 ``value`` attribute). Each Part is itself a subclass of |
| 300 Entity, and has its own ``process`` method and ``processors`` dict. |
| 301 |
| 302 There is a separate processor for the "multipart" major type which is more |
| 303 flexible, and simply stores all multipart parts in |
| 304 :attr:`request.body.parts<cherrypy._cpreqbody.Entity.parts>`. You can |
| 305 enable it with:: |
| 306 |
| 307 cherrypy.request.body.processors['multipart'] = _cpreqbody.process_multi
part |
| 308 |
| 309 in an ``on_start_resource`` tool. |
| 310 """ |
| 311 |
| 312 # http://tools.ietf.org/html/rfc2046#section-4.1.2: |
| 313 # "The default character set, which must be assumed in the |
| 314 # absence of a charset parameter, is US-ASCII." |
| 315 # However, many browsers send data in utf-8 with no charset. |
| 316 attempt_charsets = ['utf-8'] |
| 317 """A list of strings, each of which should be a known encoding. |
| 318 |
| 319 When the Content-Type of the request body warrants it, each of the given |
| 320 encodings will be tried in order. The first one to successfully decode the |
| 321 entity without raising an error is stored as |
| 322 :attr:`entity.charset<cherrypy._cpreqbody.Entity.charset>`. This defaults |
| 323 to ``['utf-8']`` (plus 'ISO-8859-1' for "text/\*" types, as required by |
| 324 `HTTP/1.1 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_
), |
| 325 but ``['us-ascii', 'utf-8']`` for multipart parts. |
| 326 """ |
| 327 |
| 328 charset = None |
| 329 """The successful decoding; see "attempt_charsets" above.""" |
| 330 |
| 331 content_type = None |
| 332 """The value of the Content-Type request header. |
| 333 |
| 334 If the Entity is part of a multipart payload, this will be the Content-Type |
| 335 given in the MIME headers for this part. |
| 336 """ |
| 337 |
| 338 default_content_type = 'application/x-www-form-urlencoded' |
| 339 """This defines a default ``Content-Type`` to use if no Content-Type header |
| 340 is given. The empty string is used for RequestBody, which results in the |
| 341 request body not being read or parsed at all. This is by design; a missing |
| 342 ``Content-Type`` header in the HTTP request entity is an error at best, |
| 343 and a security hole at worst. For multipart parts, however, the MIME spec |
| 344 declares that a part with no Content-Type defaults to "text/plain" |
| 345 (see :class:`Part<cherrypy._cpreqbody.Part>`). |
| 346 """ |
| 347 |
| 348 filename = None |
| 349 """The ``Content-Disposition.filename`` header, if available.""" |
| 350 |
| 351 fp = None |
| 352 """The readable socket file object.""" |
| 353 |
| 354 headers = None |
| 355 """A dict of request/multipart header names and values. |
| 356 |
| 357 This is a copy of the ``request.headers`` for the ``request.body``; |
| 358 for multipart parts, it is the set of headers for that part. |
| 359 """ |
| 360 |
| 361 length = None |
| 362 """The value of the ``Content-Length`` header, if provided.""" |
| 363 |
| 364 name = None |
| 365 """The "name" parameter of the ``Content-Disposition`` header, if any.""" |
| 366 |
| 367 params = None |
| 368 """ |
| 369 If the request Content-Type is 'application/x-www-form-urlencoded' or |
| 370 multipart, this will be a dict of the params pulled from the entity |
| 371 body; that is, it will be the portion of request.params that come |
| 372 from the message body (sometimes called "POST params", although they |
| 373 can be sent with various HTTP method verbs). This value is set between |
| 374 the 'before_request_body' and 'before_handler' hooks (assuming that |
| 375 process_request_body is True).""" |
| 376 |
| 377 processors = {'application/x-www-form-urlencoded': process_urlencoded, |
| 378 'multipart/form-data': process_multipart_form_data, |
| 379 'multipart': process_multipart, |
| 380 } |
| 381 """A dict of Content-Type names to processor methods.""" |
| 382 |
| 383 parts = None |
| 384 """A list of Part instances if ``Content-Type`` is of major type "multipart"
.""" |
| 385 |
| 386 part_class = None |
| 387 """The class used for multipart parts. |
| 388 |
| 389 You can replace this with custom subclasses to alter the processing of |
| 390 multipart parts. |
| 391 """ |
| 392 |
| 393 def __init__(self, fp, headers, params=None, parts=None): |
| 394 # Make an instance-specific copy of the class processors |
| 395 # so Tools, etc. can replace them per-request. |
| 396 self.processors = self.processors.copy() |
| 397 |
| 398 self.fp = fp |
| 399 self.headers = headers |
| 400 |
| 401 if params is None: |
| 402 params = {} |
| 403 self.params = params |
| 404 |
| 405 if parts is None: |
| 406 parts = [] |
| 407 self.parts = parts |
| 408 |
| 409 # Content-Type |
| 410 self.content_type = headers.elements('Content-Type') |
| 411 if self.content_type: |
| 412 self.content_type = self.content_type[0] |
| 413 else: |
| 414 self.content_type = httputil.HeaderElement.from_str( |
| 415 self.default_content_type) |
| 416 |
| 417 # Copy the class 'attempt_charsets', prepending any Content-Type charset |
| 418 dec = self.content_type.params.get("charset", None) |
| 419 if dec: |
| 420 self.attempt_charsets = [dec] + [c for c in self.attempt_charsets |
| 421 if c != dec] |
| 422 else: |
| 423 self.attempt_charsets = self.attempt_charsets[:] |
| 424 |
| 425 # Length |
| 426 self.length = None |
| 427 clen = headers.get('Content-Length', None) |
| 428 # If Transfer-Encoding is 'chunked', ignore any Content-Length. |
| 429 if clen is not None and 'chunked' not in headers.get('Transfer-Encoding'
, ''): |
| 430 try: |
| 431 self.length = int(clen) |
| 432 except ValueError: |
| 433 pass |
| 434 |
| 435 # Content-Disposition |
| 436 self.name = None |
| 437 self.filename = None |
| 438 disp = headers.elements('Content-Disposition') |
| 439 if disp: |
| 440 disp = disp[0] |
| 441 if 'name' in disp.params: |
| 442 self.name = disp.params['name'] |
| 443 if self.name.startswith('"') and self.name.endswith('"'): |
| 444 self.name = self.name[1:-1] |
| 445 if 'filename' in disp.params: |
| 446 self.filename = disp.params['filename'] |
| 447 if self.filename.startswith('"') and self.filename.endswith('"')
: |
| 448 self.filename = self.filename[1:-1] |
| 449 |
| 450 # The 'type' attribute is deprecated in 3.2; remove it in 3.3. |
| 451 type = property(lambda self: self.content_type, |
| 452 doc="""A deprecated alias for :attr:`content_type<cherrypy._cpreqbody.En
tity.content_type>`.""") |
| 453 |
| 454 def read(self, size=None, fp_out=None): |
| 455 return self.fp.read(size, fp_out) |
| 456 |
| 457 def readline(self, size=None): |
| 458 return self.fp.readline(size) |
| 459 |
| 460 def readlines(self, sizehint=None): |
| 461 return self.fp.readlines(sizehint) |
| 462 |
| 463 def __iter__(self): |
| 464 return self |
| 465 |
| 466 def __next__(self): |
| 467 line = self.readline() |
| 468 if not line: |
| 469 raise StopIteration |
| 470 return line |
| 471 |
| 472 def next(self): |
| 473 return self.__next__() |
| 474 |
| 475 def read_into_file(self, fp_out=None): |
| 476 """Read the request body into fp_out (or make_file() if None). Return fp
_out.""" |
| 477 if fp_out is None: |
| 478 fp_out = self.make_file() |
| 479 self.read(fp_out=fp_out) |
| 480 return fp_out |
| 481 |
| 482 def make_file(self): |
| 483 """Return a file-like object into which the request body will be read. |
| 484 |
| 485 By default, this will return a TemporaryFile. Override as needed. |
| 486 See also :attr:`cherrypy._cpreqbody.Part.maxrambytes`.""" |
| 487 return tempfile.TemporaryFile() |
| 488 |
| 489 def fullvalue(self): |
| 490 """Return this entity as a string, whether stored in a file or not.""" |
| 491 if self.file: |
| 492 # It was stored in a tempfile. Read it. |
| 493 self.file.seek(0) |
| 494 value = self.file.read() |
| 495 self.file.seek(0) |
| 496 else: |
| 497 value = self.value |
| 498 return value |
| 499 |
| 500 def process(self): |
| 501 """Execute the best-match processor for the given media type.""" |
| 502 proc = None |
| 503 ct = self.content_type.value |
| 504 try: |
| 505 proc = self.processors[ct] |
| 506 except KeyError: |
| 507 toptype = ct.split('/', 1)[0] |
| 508 try: |
| 509 proc = self.processors[toptype] |
| 510 except KeyError: |
| 511 pass |
| 512 if proc is None: |
| 513 self.default_proc() |
| 514 else: |
| 515 proc(self) |
| 516 |
| 517 def default_proc(self): |
| 518 """Called if a more-specific processor is not found for the ``Content-Ty
pe``.""" |
| 519 # Leave the fp alone for someone else to read. This works fine |
| 520 # for request.body, but the Part subclasses need to override this |
| 521 # so they can move on to the next part. |
| 522 pass |
| 523 |
| 524 |
| 525 class Part(Entity): |
| 526 """A MIME part entity, part of a multipart entity.""" |
| 527 |
| 528 # "The default character set, which must be assumed in the absence of a |
| 529 # charset parameter, is US-ASCII." |
| 530 attempt_charsets = ['us-ascii', 'utf-8'] |
| 531 """A list of strings, each of which should be a known encoding. |
| 532 |
| 533 When the Content-Type of the request body warrants it, each of the given |
| 534 encodings will be tried in order. The first one to successfully decode the |
| 535 entity without raising an error is stored as |
| 536 :attr:`entity.charset<cherrypy._cpreqbody.Entity.charset>`. This defaults |
| 537 to ``['utf-8']`` (plus 'ISO-8859-1' for "text/\*" types, as required by |
| 538 `HTTP/1.1 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_
), |
| 539 but ``['us-ascii', 'utf-8']`` for multipart parts. |
| 540 """ |
| 541 |
| 542 boundary = None |
| 543 """The MIME multipart boundary.""" |
| 544 |
| 545 default_content_type = 'text/plain' |
| 546 """This defines a default ``Content-Type`` to use if no Content-Type header |
| 547 is given. The empty string is used for RequestBody, which results in the |
| 548 request body not being read or parsed at all. This is by design; a missing |
| 549 ``Content-Type`` header in the HTTP request entity is an error at best, |
| 550 and a security hole at worst. For multipart parts, however (this class), |
| 551 the MIME spec declares that a part with no Content-Type defaults to |
| 552 "text/plain". |
| 553 """ |
| 554 |
| 555 # This is the default in stdlib cgi. We may want to increase it. |
| 556 maxrambytes = 1000 |
| 557 """The threshold of bytes after which point the ``Part`` will store its data |
| 558 in a file (generated by :func:`make_file<cherrypy._cprequest.Entity.make_fil
e>`) |
| 559 instead of a string. Defaults to 1000, just like the :mod:`cgi` module in |
| 560 Python's standard library. |
| 561 """ |
| 562 |
| 563 def __init__(self, fp, headers, boundary): |
| 564 Entity.__init__(self, fp, headers) |
| 565 self.boundary = boundary |
| 566 self.file = None |
| 567 self.value = None |
| 568 |
| 569 def from_fp(cls, fp, boundary): |
| 570 headers = cls.read_headers(fp) |
| 571 return cls(fp, headers, boundary) |
| 572 from_fp = classmethod(from_fp) |
| 573 |
| 574 def read_headers(cls, fp): |
| 575 headers = httputil.HeaderMap() |
| 576 while True: |
| 577 line = fp.readline() |
| 578 if not line: |
| 579 # No more data--illegal end of headers |
| 580 raise EOFError("Illegal end of headers.") |
| 581 |
| 582 if line == ntob('\r\n'): |
| 583 # Normal end of headers |
| 584 break |
| 585 if not line.endswith(ntob('\r\n')): |
| 586 raise ValueError("MIME requires CRLF terminators: %r" % line) |
| 587 |
| 588 if line[0] in ntob(' \t'): |
| 589 # It's a continuation line. |
| 590 v = line.strip().decode('ISO-8859-1') |
| 591 else: |
| 592 k, v = line.split(ntob(":"), 1) |
| 593 k = k.strip().decode('ISO-8859-1') |
| 594 v = v.strip().decode('ISO-8859-1') |
| 595 |
| 596 existing = headers.get(k) |
| 597 if existing: |
| 598 v = ", ".join((existing, v)) |
| 599 headers[k] = v |
| 600 |
| 601 return headers |
| 602 read_headers = classmethod(read_headers) |
| 603 |
| 604 def read_lines_to_boundary(self, fp_out=None): |
| 605 """Read bytes from self.fp and return or write them to a file. |
| 606 |
| 607 If the 'fp_out' argument is None (the default), all bytes read are |
| 608 returned in a single byte string. |
| 609 |
| 610 If the 'fp_out' argument is not None, it must be a file-like object that |
| 611 supports the 'write' method; all bytes read will be written to the fp, |
| 612 and that fp is returned. |
| 613 """ |
| 614 endmarker = self.boundary + ntob("--") |
| 615 delim = ntob("") |
| 616 prev_lf = True |
| 617 lines = [] |
| 618 seen = 0 |
| 619 while True: |
| 620 line = self.fp.readline(1<<16) |
| 621 if not line: |
| 622 raise EOFError("Illegal end of multipart body.") |
| 623 if line.startswith(ntob("--")) and prev_lf: |
| 624 strippedline = line.strip() |
| 625 if strippedline == self.boundary: |
| 626 break |
| 627 if strippedline == endmarker: |
| 628 self.fp.finish() |
| 629 break |
| 630 |
| 631 line = delim + line |
| 632 |
| 633 if line.endswith(ntob("\r\n")): |
| 634 delim = ntob("\r\n") |
| 635 line = line[:-2] |
| 636 prev_lf = True |
| 637 elif line.endswith(ntob("\n")): |
| 638 delim = ntob("\n") |
| 639 line = line[:-1] |
| 640 prev_lf = True |
| 641 else: |
| 642 delim = ntob("") |
| 643 prev_lf = False |
| 644 |
| 645 if fp_out is None: |
| 646 lines.append(line) |
| 647 seen += len(line) |
| 648 if seen > self.maxrambytes: |
| 649 fp_out = self.make_file() |
| 650 for line in lines: |
| 651 fp_out.write(line) |
| 652 else: |
| 653 fp_out.write(line) |
| 654 |
| 655 if fp_out is None: |
| 656 result = ntob('').join(lines) |
| 657 for charset in self.attempt_charsets: |
| 658 try: |
| 659 result = result.decode(charset) |
| 660 except UnicodeDecodeError: |
| 661 pass |
| 662 else: |
| 663 self.charset = charset |
| 664 return result |
| 665 else: |
| 666 raise cherrypy.HTTPError( |
| 667 400, "The request entity could not be decoded. The following
" |
| 668 "charsets were attempted: %s" % repr(self.attempt_charsets)) |
| 669 else: |
| 670 fp_out.seek(0) |
| 671 return fp_out |
| 672 |
| 673 def default_proc(self): |
| 674 """Called if a more-specific processor is not found for the ``Content-Ty
pe``.""" |
| 675 if self.filename: |
| 676 # Always read into a file if a .filename was given. |
| 677 self.file = self.read_into_file() |
| 678 else: |
| 679 result = self.read_lines_to_boundary() |
| 680 if isinstance(result, basestring): |
| 681 self.value = result |
| 682 else: |
| 683 self.file = result |
| 684 |
| 685 def read_into_file(self, fp_out=None): |
| 686 """Read the request body into fp_out (or make_file() if None). Return fp
_out.""" |
| 687 if fp_out is None: |
| 688 fp_out = self.make_file() |
| 689 self.read_lines_to_boundary(fp_out=fp_out) |
| 690 return fp_out |
| 691 |
| 692 Entity.part_class = Part |
| 693 |
| 694 try: |
| 695 inf = float('inf') |
| 696 except ValueError: |
| 697 # Python 2.4 and lower |
| 698 class Infinity(object): |
| 699 def __cmp__(self, other): |
| 700 return 1 |
| 701 def __sub__(self, other): |
| 702 return self |
| 703 inf = Infinity() |
| 704 |
| 705 |
| 706 comma_separated_headers = ['Accept', 'Accept-Charset', 'Accept-Encoding', |
| 707 'Accept-Language', 'Accept-Ranges', 'Allow', 'Cache-Control', 'Connection', |
| 708 'Content-Encoding', 'Content-Language', 'Expect', 'If-Match', |
| 709 'If-None-Match', 'Pragma', 'Proxy-Authenticate', 'Te', 'Trailer', |
| 710 'Transfer-Encoding', 'Upgrade', 'Vary', 'Via', 'Warning', 'Www-Authenticate'
] |
| 711 |
| 712 |
| 713 class SizedReader: |
| 714 |
| 715 def __init__(self, fp, length, maxbytes, bufsize=DEFAULT_BUFFER_SIZE, has_tr
ailers=False): |
| 716 # Wrap our fp in a buffer so peek() works |
| 717 self.fp = fp |
| 718 self.length = length |
| 719 self.maxbytes = maxbytes |
| 720 self.buffer = ntob('') |
| 721 self.bufsize = bufsize |
| 722 self.bytes_read = 0 |
| 723 self.done = False |
| 724 self.has_trailers = has_trailers |
| 725 |
| 726 def read(self, size=None, fp_out=None): |
| 727 """Read bytes from the request body and return or write them to a file. |
| 728 |
| 729 A number of bytes less than or equal to the 'size' argument are read |
| 730 off the socket. The actual number of bytes read are tracked in |
| 731 self.bytes_read. The number may be smaller than 'size' when 1) the |
| 732 client sends fewer bytes, 2) the 'Content-Length' request header |
| 733 specifies fewer bytes than requested, or 3) the number of bytes read |
| 734 exceeds self.maxbytes (in which case, 413 is raised). |
| 735 |
| 736 If the 'fp_out' argument is None (the default), all bytes read are |
| 737 returned in a single byte string. |
| 738 |
| 739 If the 'fp_out' argument is not None, it must be a file-like object that |
| 740 supports the 'write' method; all bytes read will be written to the fp, |
| 741 and None is returned. |
| 742 """ |
| 743 |
| 744 if self.length is None: |
| 745 if size is None: |
| 746 remaining = inf |
| 747 else: |
| 748 remaining = size |
| 749 else: |
| 750 remaining = self.length - self.bytes_read |
| 751 if size and size < remaining: |
| 752 remaining = size |
| 753 if remaining == 0: |
| 754 self.finish() |
| 755 if fp_out is None: |
| 756 return ntob('') |
| 757 else: |
| 758 return None |
| 759 |
| 760 chunks = [] |
| 761 |
| 762 # Read bytes from the buffer. |
| 763 if self.buffer: |
| 764 if remaining is inf: |
| 765 data = self.buffer |
| 766 self.buffer = ntob('') |
| 767 else: |
| 768 data = self.buffer[:remaining] |
| 769 self.buffer = self.buffer[remaining:] |
| 770 datalen = len(data) |
| 771 remaining -= datalen |
| 772 |
| 773 # Check lengths. |
| 774 self.bytes_read += datalen |
| 775 if self.maxbytes and self.bytes_read > self.maxbytes: |
| 776 raise cherrypy.HTTPError(413) |
| 777 |
| 778 # Store the data. |
| 779 if fp_out is None: |
| 780 chunks.append(data) |
| 781 else: |
| 782 fp_out.write(data) |
| 783 |
| 784 # Read bytes from the socket. |
| 785 while remaining > 0: |
| 786 chunksize = min(remaining, self.bufsize) |
| 787 try: |
| 788 data = self.fp.read(chunksize) |
| 789 except Exception: |
| 790 e = sys.exc_info()[1] |
| 791 if e.__class__.__name__ == 'MaxSizeExceeded': |
| 792 # Post data is too big |
| 793 raise cherrypy.HTTPError( |
| 794 413, "Maximum request length: %r" % e.args[1]) |
| 795 else: |
| 796 raise |
| 797 if not data: |
| 798 self.finish() |
| 799 break |
| 800 datalen = len(data) |
| 801 remaining -= datalen |
| 802 |
| 803 # Check lengths. |
| 804 self.bytes_read += datalen |
| 805 if self.maxbytes and self.bytes_read > self.maxbytes: |
| 806 raise cherrypy.HTTPError(413) |
| 807 |
| 808 # Store the data. |
| 809 if fp_out is None: |
| 810 chunks.append(data) |
| 811 else: |
| 812 fp_out.write(data) |
| 813 |
| 814 if fp_out is None: |
| 815 return ntob('').join(chunks) |
| 816 |
| 817 def readline(self, size=None): |
| 818 """Read a line from the request body and return it.""" |
| 819 chunks = [] |
| 820 while size is None or size > 0: |
| 821 chunksize = self.bufsize |
| 822 if size is not None and size < self.bufsize: |
| 823 chunksize = size |
| 824 data = self.read(chunksize) |
| 825 if not data: |
| 826 break |
| 827 pos = data.find(ntob('\n')) + 1 |
| 828 if pos: |
| 829 chunks.append(data[:pos]) |
| 830 remainder = data[pos:] |
| 831 self.buffer += remainder |
| 832 self.bytes_read -= len(remainder) |
| 833 break |
| 834 else: |
| 835 chunks.append(data) |
| 836 return ntob('').join(chunks) |
| 837 |
| 838 def readlines(self, sizehint=None): |
| 839 """Read lines from the request body and return them.""" |
| 840 if self.length is not None: |
| 841 if sizehint is None: |
| 842 sizehint = self.length - self.bytes_read |
| 843 else: |
| 844 sizehint = min(sizehint, self.length - self.bytes_read) |
| 845 |
| 846 lines = [] |
| 847 seen = 0 |
| 848 while True: |
| 849 line = self.readline() |
| 850 if not line: |
| 851 break |
| 852 lines.append(line) |
| 853 seen += len(line) |
| 854 if seen >= sizehint: |
| 855 break |
| 856 return lines |
| 857 |
| 858 def finish(self): |
| 859 self.done = True |
| 860 if self.has_trailers and hasattr(self.fp, 'read_trailer_lines'): |
| 861 self.trailers = {} |
| 862 |
| 863 try: |
| 864 for line in self.fp.read_trailer_lines(): |
| 865 if line[0] in ntob(' \t'): |
| 866 # It's a continuation line. |
| 867 v = line.strip() |
| 868 else: |
| 869 try: |
| 870 k, v = line.split(ntob(":"), 1) |
| 871 except ValueError: |
| 872 raise ValueError("Illegal header line.") |
| 873 k = k.strip().title() |
| 874 v = v.strip() |
| 875 |
| 876 if k in comma_separated_headers: |
| 877 existing = self.trailers.get(envname) |
| 878 if existing: |
| 879 v = ntob(", ").join((existing, v)) |
| 880 self.trailers[k] = v |
| 881 except Exception: |
| 882 e = sys.exc_info()[1] |
| 883 if e.__class__.__name__ == 'MaxSizeExceeded': |
| 884 # Post data is too big |
| 885 raise cherrypy.HTTPError( |
| 886 413, "Maximum request length: %r" % e.args[1]) |
| 887 else: |
| 888 raise |
| 889 |
| 890 |
| 891 class RequestBody(Entity): |
| 892 """The entity of the HTTP request.""" |
| 893 |
| 894 bufsize = 8 * 1024 |
| 895 """The buffer size used when reading the socket.""" |
| 896 |
| 897 # Don't parse the request body at all if the client didn't provide |
| 898 # a Content-Type header. See http://www.cherrypy.org/ticket/790 |
| 899 default_content_type = '' |
| 900 """This defines a default ``Content-Type`` to use if no Content-Type header |
| 901 is given. The empty string is used for RequestBody, which results in the |
| 902 request body not being read or parsed at all. This is by design; a missing |
| 903 ``Content-Type`` header in the HTTP request entity is an error at best, |
| 904 and a security hole at worst. For multipart parts, however, the MIME spec |
| 905 declares that a part with no Content-Type defaults to "text/plain" |
| 906 (see :class:`Part<cherrypy._cpreqbody.Part>`). |
| 907 """ |
| 908 |
| 909 maxbytes = None |
| 910 """Raise ``MaxSizeExceeded`` if more bytes than this are read from the socke
t.""" |
| 911 |
| 912 def __init__(self, fp, headers, params=None, request_params=None): |
| 913 Entity.__init__(self, fp, headers, params) |
| 914 |
| 915 # http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1 |
| 916 # When no explicit charset parameter is provided by the |
| 917 # sender, media subtypes of the "text" type are defined |
| 918 # to have a default charset value of "ISO-8859-1" when |
| 919 # received via HTTP. |
| 920 if self.content_type.value.startswith('text/'): |
| 921 for c in ('ISO-8859-1', 'iso-8859-1', 'Latin-1', 'latin-1'): |
| 922 if c in self.attempt_charsets: |
| 923 break |
| 924 else: |
| 925 self.attempt_charsets.append('ISO-8859-1') |
| 926 |
| 927 # Temporary fix while deprecating passing .parts as .params. |
| 928 self.processors['multipart'] = _old_process_multipart |
| 929 |
| 930 if request_params is None: |
| 931 request_params = {} |
| 932 self.request_params = request_params |
| 933 |
| 934 def process(self): |
| 935 """Process the request entity based on its Content-Type.""" |
| 936 # "The presence of a message-body in a request is signaled by the |
| 937 # inclusion of a Content-Length or Transfer-Encoding header field in |
| 938 # the request's message-headers." |
| 939 # It is possible to send a POST request with no body, for example; |
| 940 # however, app developers are responsible in that case to set |
| 941 # cherrypy.request.process_body to False so this method isn't called. |
| 942 h = cherrypy.serving.request.headers |
| 943 if 'Content-Length' not in h and 'Transfer-Encoding' not in h: |
| 944 raise cherrypy.HTTPError(411) |
| 945 |
| 946 self.fp = SizedReader(self.fp, self.length, |
| 947 self.maxbytes, bufsize=self.bufsize, |
| 948 has_trailers='Trailer' in h) |
| 949 super(RequestBody, self).process() |
| 950 |
| 951 # Body params should also be a part of the request_params |
| 952 # add them in here. |
| 953 request_params = self.request_params |
| 954 for key, value in self.params.items(): |
| 955 # Python 2 only: keyword arguments must be byte strings (type 'str')
. |
| 956 if sys.version_info < (3, 0): |
| 957 if isinstance(key, unicode): |
| 958 key = key.encode('ISO-8859-1') |
| 959 |
| 960 if key in request_params: |
| 961 if not isinstance(request_params[key], list): |
| 962 request_params[key] = [request_params[key]] |
| 963 request_params[key].append(value) |
| 964 else: |
| 965 request_params[key] = value |
OLD | NEW |