Index: utils/apidoc/mdn/README.txt |
diff --git a/utils/apidoc/mdn/README.txt b/utils/apidoc/mdn/README.txt |
new file mode 100644 |
index 0000000000000000000000000000000000000000..6288fc060b67c1326b78b088a5b5b0d6ac2de4e5 |
--- /dev/null |
+++ b/utils/apidoc/mdn/README.txt |
@@ -0,0 +1,48 @@ |
+Here's a rough walkthrough of how this works. The ultimate output file is |
+database.filtered.json. |
nweiz
2012/02/01 00:10:39
After reading this, I'm having trouble understandi
|
+ |
+search.js |
+- read data/domTypes.json |
nweiz
2012/02/01 00:10:39
What's in this file? Where does it come from?
|
+- for each dom type: |
+ - search for page on www.googleapis.com |
+ - write search results to output/search/<type>.json |
+ . this is a list of search results and urls to pages |
+ |
+crawl.js |
+- read data/domTypes.json |
+- for each dom type: |
+ - for each output/search/<type>.json: |
nweiz
2012/02/01 00:10:39
Isn't there only one of these files for each type?
|
+ - for each result in the file: |
+ - try to scrape that cached MDN page from webcache.googleusercontent.com |
+ - write mdn page to output/crawl/<type><index of result>.html |
+- write output/crawl/cache.json |
+ . it maps types -> search result page urls and titles |
+ |
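The file layout described for search.js and crawl.js can be sketched as follows. This is a minimal Python sketch, not the actual search.js or crawl.js code. Only the path patterns come from the walkthrough; the function names, the `url`/`title` record keys, and the exact shape of cache.json are assumptions made for illustration.

```python
import json
from pathlib import Path


def search_path(root, dom_type):
    # One search-result file per DOM type: <root>/search/<type>.json
    return Path(root) / "search" / f"{dom_type}.json"


def crawl_path(root, dom_type, index):
    # One scraped page per search result: <root>/crawl/<type><index>.html
    return Path(root) / "crawl" / f"{dom_type}{index}.html"


def build_cache(root, dom_types):
    """Map each type to the urls and titles of its search results.

    Assumes (hypothetically) that each search file holds a JSON list of
    objects that carry at least "url" and "title" keys.
    """
    cache = {}
    for dom_type in dom_types:
        path = search_path(root, dom_type)
        if not path.exists():
            continue  # no search results were written for this type
        results = json.loads(path.read_text())
        cache[dom_type] = [
            {"url": r["url"], "title": r["title"]} for r in results
        ]
    return cache
```

Under these assumptions there is exactly one search file per type, while the crawl step produces one HTML file per search result, distinguished by the result index.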
+extract.sh |
nweiz
2012/02/01 00:10:39
Should probably mention which directory this needs
|
+- compile extract.dart to js |
+- run extractRunner.js |
nweiz
2012/02/01 00:10:39
Is this the same as the compiled extract.dart? If
|
+ - read data/domTypes.json |
+ - read output/crawl/cache.json |
+ - read data/dartIdl.json |
nweiz
2012/02/01 00:10:39
What's in this file? Where does it come from?
|
+ - for each scraped search result page: |
+ - create a cleaned up html page in output/extract/<type><index>.html that |
+ contains the scraped content + a script tag that includes extract.dart.js. |
+ - create an args file in output/extract/<type><index>.html.json with some |
+ data on how that file should be processed |
nweiz
2012/02/01 00:10:39
s/that file/the HTML file/
What sort of data? Wha
|
+ - invoke dump render tree on that file |
nweiz
2012/02/01 00:10:39
Make it more explicit that this invokes it in a he
|
+ - when that returns, parse the console output and add it to database.json |
nweiz
2012/02/01 00:10:39
Does this mean output/database.json?
|
+ - add any errors to output/errors.json |
+ - save output/database.json |
nweiz
2012/02/01 00:10:39
Somewhat confusing given that you just said you we
|
+ |
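The bookkeeping at the end of the extractRunner.js steps (parse the console output, add it to the database, record errors, save) might look roughly like the sketch below. It is illustrative Python, not the real runner. It assumes the console output for each page is a single JSON object, which the walkthrough does not state, and `record_result` and `save_outputs` are hypothetical names.

```python
import json
from pathlib import Path


def record_result(database, errors, page_name, console_output):
    """Add one page's parsed output to the in-memory database.

    Assumption: console_output is a JSON document. Anything that fails
    to parse is recorded in the errors list instead.
    """
    try:
        entry = json.loads(console_output)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        errors.append({"page": page_name, "error": str(exc)})
        return False
    database[page_name] = entry
    return True


def save_outputs(root, database, errors):
    # Write both files once, after every page has been processed.
    out = Path(root)
    out.mkdir(parents=True, exist_ok=True)
    (out / "database.json").write_text(json.dumps(database, indent=2))
    (out / "errors.json").write_text(json.dumps(errors, indent=2))
```

In this sketch the database is accumulated in memory per page and written to disk a single time at the end, which is one way to read the "add it to database.json" and "save output/database.json" steps together.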
+extract.dart |
nweiz
2012/02/01 00:10:39
Is this run within extractRunner.js? How is its fu
|
+- xhr output/extract/<type><index>.html.json |
nweiz
2012/02/01 00:10:39
Is this different than the "read *.json" you're do
|
+- all sorts of shenanigans to actually pull the content out of the html |
+- build a JSON object with the results |
+- do a postmessage with that object so extractRunner.js can pull it out |
+ |
+- run postProcess.dart |
nweiz
2012/02/01 00:10:39
Is this run via DumpRenderTree? On the VM? On Frog
|
+ - go through the results for each type looking for the best match |
nweiz
2012/02/01 00:10:39
Mention what files you're using here.
|
+ - write output/database.html |
+ - write output/examples.html |
+ - write output/obsolete.html |
nweiz
2012/02/01 00:10:39
What are all these files for? Why are they in HTML
|
+ - write output/database.filtered.json which is the best matches |
nweiz
2012/02/01 00:10:39
Is this just a mapping of type names to the conten
|