Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(741)

Unified Diff: third_party/gsutil/boto/docs/source/cloudsearch_tut.rst

Issue 12317103: Added gsutil to depot tools (Closed) Base URL: https://chromium.googlesource.com/chromium/tools/depot_tools.git@master
Patch Set: added readme Created 7 years, 10 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
Index: third_party/gsutil/boto/docs/source/cloudsearch_tut.rst
diff --git a/third_party/gsutil/boto/docs/source/cloudsearch_tut.rst b/third_party/gsutil/boto/docs/source/cloudsearch_tut.rst
new file mode 100644
index 0000000000000000000000000000000000000000..7172a47dd7b57684bc44099938d214e49ef4161f
--- /dev/null
+++ b/third_party/gsutil/boto/docs/source/cloudsearch_tut.rst
@@ -0,0 +1,411 @@
+.. cloudsearch_tut:
+
+===============================================
+An Introduction to boto's Cloudsearch interface
+===============================================
+
+This tutorial focuses on the boto interface to AWS' Cloudsearch_. This tutorial
+assumes that you have boto already downloaded and installed.
+
+.. _Cloudsearch: http://aws.amazon.com/cloudsearch/
+
+Creating a Connection
+---------------------
+The first step in accessing CloudSearch is to create a connection to the service.
+
+The recommended method of doing this is as follows::
+
+ >>> import boto.cloudsearch
+ >>> conn = boto.cloudsearch.connect_to_region("us-east-1", aws_access_key_id= '<aws access key'>, aws_secret_access_key='<aws secret key>')
+
+At this point, the variable conn will point to a CloudSearch connection object
+in the us-east-1 region. Currently, this is the only region which has the
+CloudSearch service. In this example, the AWS access key and AWS secret key are
+passed in to the method explicitly. Alternatively, you can set the environment
+variables:
+
+* `AWS_ACCESS_KEY_ID` - Your AWS Access Key ID
+* `AWS_SECRET_ACCESS_KEY` - Your AWS Secret Access Key
+
+and then simply call::
+
+ >>> import boto.cloudsearch
+ >>> conn = boto.cloudsearch.connect_to_region("us-east-1")
+
+In either case, conn will point to the Connection object which we will use
+throughout the remainder of this tutorial.
+
+Creating a Domain
+-----------------
+
+Once you have a connection established with the CloudSearch service, you will
+want to create a domain. A domain encapsulates the data that you wish to index,
+as well as indexes and metadata relating to it.
+
+ >>> from boto.cloudsearch.domain import Domain
+ >>> domain = Domain(conn, conn.create_domain('demo'))
+
+This domain can be used to control access policies, indexes, and the actual
+document service, which you will use to index and search.
+
+Setting access policies
+-----------------------
+
+Before you can connect to a document service, you need to set the correct access properties.
+For example, if you were connecting from 192.168.1.0, you could give yourself access as follows:
+
+ >>> our_ip = '192.168.1.0'
+
+ >>> # Allow our IP address to access the document and search services
+ >>> policy = domain.get_access_policies()
+ >>> policy.allow_search_ip(our_ip)
+ >>> policy.allow_doc_ip(our_ip)
+
+You can use the allow_search_ip() and allow_doc_ip() methods to give different
+CIDR blocks access to searching and the document service respectively.
+
+Creating index fields
+---------------------
+
+Each domain can have up to twenty index fields which are indexed by the
+CloudSearch service. For each index field, you will need to specify whether
+it's a text or integer field, as well as optionaly a default value.
+
+ >>> # Create an 'text' index field called 'username'
+ >>> uname_field = domain.create_index_field('username', 'text')
+
+ >>> # Epoch time of when the user last did something
+ >>> time_field = domain.create_index_field('last_activity', 'uint', default=0)
+
+It is also possible to mark an index field as a facet. Doing so allows a search
+query to return categories into which results can be grouped, or to create
+drill-down categories
+
+ >>> # But it would be neat to drill down into different countries
+ >>> loc_field = domain.create_index_field('location', 'text', facet=True)
+
+Finally, you can also mark a snippet of text as being able to be returned
+directly in your search query by using the results option.
+
+ >>> # Directly insert user snippets in our results
+ >>> snippet_field = domain.create_index_field('snippet', 'text', result=True)
+
+You can add up to 20 index fields in this manner:
+
+ >>> follower_field = domain.create_index_field('follower_count', 'uint', default=0)
+
+Adding Documents to the Index
+-----------------------------
+
+Now, we can add some documents to our new search domain. First, you will need a
+document service object through which queries are sent:
+
+ >>> doc_service = domain.get_document_service()
+
+For this example, we will use a pre-populated list of sample content for our
+import. You would normally pull such data from your database or another
+document store.
+
+ >>> users = [
+ {
+ 'id': 1,
+ 'username': 'dan',
+ 'last_activity': 1334252740,
+ 'follower_count': 20,
+ 'location': 'USA',
+ 'snippet': 'Dan likes watching sunsets and rock climbing',
+ },
+ {
+ 'id': 2,
+ 'username': 'dankosaur',
+ 'last_activity': 1334252904,
+ 'follower_count': 1,
+ 'location': 'UK',
+ 'snippet': 'Likes to dress up as a dinosaur.',
+ },
+ {
+ 'id': 3,
+ 'username': 'danielle',
+ 'last_activity': 1334252969,
+ 'follower_count': 100,
+ 'location': 'DE',
+ 'snippet': 'Just moved to Germany!'
+ },
+ {
+ 'id': 4,
+ 'username': 'daniella',
+ 'last_activity': 1334253279,
+ 'follower_count': 7,
+ 'location': 'USA',
+ 'snippet': 'Just like Dan, I like to watch a good sunset, but heights scare me.',
+ }
+ ]
+
+When adding documents to our document service, we will batch them together. You
+can schedule a document to be added by using the add() method. Whenever you are
+adding a document, you must provide a unique ID, a version ID, and the actual
+document to be indexed. In this case, we are using the user ID as our unique
+ID. The version ID is used to determine which is the latest version of an
+object to be indexed. If you wish to update a document, you must use a higher
+version ID. In this case, we are using the time of the user's last activity as
+a version number.
+
+ >>> for user in users:
+ >>> doc_service.add(user['id'], user['last_activity'], user)
+
+When you are ready to send the batched request to the document service, you can
+do with the commit() method. Note that cloudsearch will charge per 1000 batch
+uploads. Each batch upload must be under 5MB.
+
+ >>> result = doc_service.commit()
+
+The result is an instance of `cloudsearch.CommitResponse` which will
+make the plain dictionary response a nice object (ie result.adds,
+result.deletes) and raise an exception for us if all of our documents
+weren't actually committed.
+
+After you have successfully committed some documents to cloudsearch, you must
+use :py:meth:`clear_sdf
+<boto.cloudsearch.document.DocumentServiceConnection.clear_sdf>`, if you wish
+to use the same document service connection again so that its internal cache is
+cleared.
+
+Searching Documents
+-------------------
+
+Now, let's try performing a search. First, we will need a SearchServiceConnection:
+
+ >>> search_service = domain.get_search_service()
+
+A standard search will return documents which contain the exact words being
+searched for.
+
+ >>> results = search_service.search(q="dan")
+ >>> results.hits
+ 2
+ >>> map(lambda x: x['id'], results)
+ [u'1', u'4']
+
+The standard search does not look at word order:
+
+ >>> results = search_service.search(q="dinosaur dress")
+ >>> results.hits
+ 1
+ >>> map(lambda x: x['id'], results)
+ [u'2']
+
+It's also possible to do more complex queries using the bq argument (Boolean
+Query). When you are using bq, your search terms must be enclosed in single
+quotes.
+
+ >>> results = search_service.search(bq="'dan'")
+ >>> results.hits
+ 2
+ >>> map(lambda x: x['id'], results)
+ [u'1', u'4']
+
+When you are using boolean queries, it's also possible to use wildcards to
+extend your search to all words which start with your search terms:
+
+ >>> results = search_service.search(bq="'dan*'")
+ >>> results.hits
+ 4
+ >>> map(lambda x: x['id'], results)
+ [u'1', u'2', u'3', u'4']
+
+The boolean query also allows you to create more complex queries. You can OR
+term together using "|", AND terms together using "+" or a space, and you can
+remove words from the query using the "-" operator.
+
+ >>> results = search_service.search(bq="'watched|moved'")
+ >>> results.hits
+ 2
+ >>> map(lambda x: x['id'], results)
+ [u'3', u'4']
+
+By default, the search will return 10 terms but it is possible to adjust this
+by using the size argument as follows:
+
+ >>> results = search_service.search(bq="'dan*'", size=2)
+ >>> results.hits
+ 4
+ >>> map(lambda x: x['id'], results)
+ [u'1', u'2']
+
+It is also possible to offset the start of the search by using the start argument as follows:
+
+ >>> results = search_service.search(bq="'dan*'", start=2)
+ >>> results.hits
+ 4
+ >>> map(lambda x: x['id'], results)
+ [u'3', u'4']
+
+
+Ordering search results and rank expressions
+--------------------------------------------
+
+If your search query is going to return many results, it is good to be able to sort them
+You can order your search results by using the rank argument. You are able to
+sort on any fields which have the results option turned on.
+
+ >>> results = search_service.search(bq=query, rank=['-follower_count'])
+
+You can also create your own rank expressions to sort your results according to
+other criteria:
+
+ >>> domain.create_rank_expression('recently_active', 'last_activity') # We'll want to be able to just show the most recently active users
+
+ >>> domain.create_rank_expression('activish', 'text_relevance + ((follower_count/(time() - last_activity))*1000)') # Let's get trickier and combine text relevance with a really dynamic expression
+
+ >>> results = search_service.search(bq=query, rank=['-recently_active'])
+
+Viewing and Adjusting Stemming for a Domain
+-------------------------------------------
+
+A stemming dictionary maps related words to a common stem. A stem is
+typically the root or base word from which variants are derived. For
+example, run is the stem of running and ran. During indexing, Amazon
+CloudSearch uses the stemming dictionary when it performs
+text-processing on text fields. At search time, the stemming
+dictionary is used to perform text-processing on the search
+request. This enables matching on variants of a word. For example, if
+you map the term running to the stem run and then search for running,
+the request matches documents that contain run as well as running.
+
+To get the current stemming dictionary defined for a domain, use the
+``get_stemming`` method of the Domain object.
+
+ >>> stems = domain.get_stemming()
+ >>> stems
+ {u'stems': {}}
+ >>>
+
+This returns a dictionary object that can be manipulated directly to
+add additional stems for your search domain by adding pairs of term:stem
+to the stems dictionary.
+
+ >>> stems['stems']['running'] = 'run'
+ >>> stems['stems']['ran'] = 'run'
+ >>> stems
+ {u'stems': {u'ran': u'run', u'running': u'run'}}
+ >>>
+
+This has changed the value locally. To update the information in
+Amazon CloudSearch, you need to save the data.
+
+ >>> stems.save()
+
+You can also access certain CloudSearch-specific attributes related to
+the stemming dictionary defined for your domain.
+
+ >>> stems.status
+ u'RequiresIndexDocuments'
+ >>> stems.creation_date
+ u'2012-05-01T12:12:32Z'
+ >>> stems.update_date
+ u'2012-05-01T12:12:32Z'
+ >>> stems.update_version
+ 19
+ >>>
+
+The status indicates that, because you have changed the stems associated
+with the domain, you will need to re-index the documents in the domain
+before the new stems are used.
+
+Viewing and Adjusting Stopwords for a Domain
+--------------------------------------------
+
+Stopwords are words that should typically be ignored both during
+indexing and at search time because they are either insignificant or
+so common that including them would result in a massive number of
+matches.
+
+To view the stopwords currently defined for your domain, use the
+``get_stopwords`` method of the Domain object.
+
+ >>> stopwords = domain.get_stopwords()
+ >>> stopwords
+ {u'stopwords': [u'a',
+ u'an',
+ u'and',
+ u'are',
+ u'as',
+ u'at',
+ u'be',
+ u'but',
+ u'by',
+ u'for',
+ u'in',
+ u'is',
+ u'it',
+ u'of',
+ u'on',
+ u'or',
+ u'the',
+ u'to',
+ u'was']}
+ >>>
+
+You can add additional stopwords by simply appending the values to the
+list.
+
+ >>> stopwords['stopwords'].append('foo')
+ >>> stopwords['stopwords'].append('bar')
+ >>> stopwords
+
+Similarly, you could remove currently defined stopwords from the list.
+To save the changes, use the ``save`` method.
+
+ >>> stopwords.save()
+
+The stopwords object has similar attributes defined above for stemming
+that provide additional information about the stopwords in your domain.
+
+
+Viewing and Adjusting Stopwords for a Domain
+--------------------------------------------
+
+You can configure synonyms for terms that appear in the data you are
+searching. That way, if a user searches for the synonym rather than
+the indexed term, the results will include documents that contain the
+indexed term.
+
+If you want two terms to match the same documents, you must define
+them as synonyms of each other. For example:
+
+ cat, feline
+ feline, cat
+
+To view the synonyms currently defined for your domain, use the
+``get_synonyms`` method of the Domain object.
+
+ >>> synonyms = domain.get_synonyms()
+ >>> synonyms
+ {u'synonyms': {}}
+ >>>
+
+You can define new synonyms by adding new term:synonyms entries to the
+synonyms dictionary object.
+
+ >>> synonyms['synonyms']['cat'] = ['feline', 'kitten']
+ >>> synonyms['synonyms']['dog'] = ['canine', 'puppy']
+
+To save the changes, use the ``save`` method.
+
+ >>> synonyms.save()
+
+The synonyms object has similar attributes defined above for stemming
+that provide additional information about the stopwords in your domain.
+
+Deleting Documents
+------------------
+
+ >>> import time
+ >>> from datetime import datetime
+
+ >>> doc_service = domain.get_document_service()
+
+ >>> # Again we'll cheat and use the current epoch time as our version number
+
+ >>> doc_service.delete(4, int(time.mktime(datetime.utcnow().timetuple())))
+ >>> service.commit()

Powered by Google App Engine
This is Rietveld 408576698