Here's a rough walkthrough of how this works. The final output file is
output/database.filtered.json.

search.js
- read data/domTypes.json
- for each DOM type:
  - search for its MDN page via www.googleapis.com
  - write the search results to output/search/<type>.json
    . this is a list of search results with the URLs of the matching pages
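A minimal sketch of the per-type search step follows. It assumes search.js uses the Google Custom Search JSON API (https://www.googleapis.com/customsearch/v1) and that queries are restricted to developer.mozilla.org. The function names, the site: filter, and the { title, url } entry shape are illustrative assumptions, not taken from search.js.

```javascript
// Sketch of search.js's per-type step (HTTP request and file writing omitted).
// ASSUMPTION: Google Custom Search JSON API; apiKey/engineId are placeholders.
function buildSearchUrl(domType, apiKey, engineId) {
  const url = new URL("https://www.googleapis.com/customsearch/v1");
  url.searchParams.set("key", apiKey);
  url.searchParams.set("cx", engineId);
  // ASSUMPTION: restrict hits to MDN, since crawl.js expects MDN pages.
  url.searchParams.set("q", `${domType} site:developer.mozilla.org`);
  return url.toString();
}

// Reduce an API response to the entries saved in output/search/<type>.json.
// Custom Search responses list hits in `items`, each with `title` and `link`.
function toSearchResults(response) {
  return (response.items || []).map((item) => ({ title: item.title, url: item.link }));
}

// Where the results for one type are written.
function searchResultsPath(domType) {
  return `output/search/${domType}.json`;
}
```

The actual request (for example with fetch) and the file write are left out so the sketch stays self-contained.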

crawl.js
- read data/domTypes.json
- for each DOM type:
  - read output/search/<type>.json
  - for each result in that file:
    - try to scrape Google's cached copy of that MDN page from
      webcache.googleusercontent.com
    - write the MDN page to output/crawl/<type><index>.html, where <index> is
      the result's position in the search results
- write output/crawl/cache.json
  . it maps each type to the URLs and titles of its search result pages
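The crawl loop can be sketched as a pure planning step that pairs each search result with a cache URL and an output path. This assumes Google's cache URL form https://webcache.googleusercontent.com/search?q=cache:<url>, 0-based result indices, and a cache.json shape of type -> [{ title, url }]; these, and the function names, are hypothetical.

```javascript
// Sketch of crawl.js's bookkeeping; the actual fetching is omitted.
// ASSUMPTION: Google cache URLs take the form .../search?q=cache:<page url>.
function cacheUrl(pageUrl) {
  return "https://webcache.googleusercontent.com/search?q=cache:" + encodeURIComponent(pageUrl);
}

// ASSUMPTION: <index> is the 0-based position of the result.
function crawlPath(domType, index) {
  return `output/crawl/${domType}${index}.html`;
}

// One job per search result: where to scrape from and where to write.
function planCrawl(searchResultsByType) {
  const jobs = [];
  for (const [domType, results] of Object.entries(searchResultsByType)) {
    results.forEach((result, index) => {
      jobs.push({ from: cacheUrl(result.url), to: crawlPath(domType, index) });
    });
  }
  return jobs;
}

// Hypothetical shape of output/crawl/cache.json: type -> [{ title, url }].
function buildCache(searchResultsByType) {
  const cache = {};
  for (const [domType, results] of Object.entries(searchResultsByType)) {
    cache[domType] = results.map((r) => ({ title: r.title, url: r.url }));
  }
  return cache;
}
```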

extract.sh
- compile extract.dart to JavaScript (extract.dart.js)
- run extractRunner.js
  - read data/domTypes.json
  - read output/crawl/cache.json
  - read data/dartIdl.json
  - for each scraped search result page:
    - create a cleaned-up HTML page in output/extract/<type><index>.html that
      contains the scraped content plus a script tag that loads extract.dart.js
    - create an args file, output/extract/<type><index>.html.json, with data
      on how that page should be processed
    - run DumpRenderTree on that page
    - when it returns, parse its console output and add the result to
      database.json
    - add any errors to output/errors.json
  - save output/database.json
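How extractRunner.js finds the result in DumpRenderTree's console output is not described here, so this sketch assumes a hypothetical EXTRACT_RESULT: prefix followed by JSON on a single line. The marker, recordPage, and the error-entry shape are illustrative only.

```javascript
// HYPOTHETICAL marker: assumes the extracted JSON appears on one console line
// after this prefix. The real format used by extract.dart may differ.
const RESULT_MARKER = "EXTRACT_RESULT:";

// Return the parsed JSON from the first line carrying the marker.
function parseExtractOutput(stdout) {
  for (const line of stdout.split("\n")) {
    const at = line.indexOf(RESULT_MARKER);
    if (at !== -1) {
      return JSON.parse(line.slice(at + RESULT_MARKER.length));
    }
  }
  throw new Error("no extract result found in output");
}

// Fold one page's output into the database; failures (missing marker or
// bad JSON) go to the errors list, which would be saved as output/errors.json.
function recordPage(database, errors, domType, index, stdout) {
  try {
    const result = parseExtractOutput(stdout);
    (database[domType] = database[domType] || []).push(result);
  } catch (e) {
    errors.push({ type: domType, index, message: e.message });
  }
}
```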

extract.dart
- fetch output/extract/<type><index>.html.json with an XHR
- use a variety of ad-hoc heuristics to pull the actual content out of the HTML
- build a JSON object with the results
- send that object with postMessage so extractRunner.js can pick it up
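The control flow of extract.dart can be sketched in JavaScript (the real file is Dart). Browser APIs are passed in as parameters so the flow runs outside a browser; runExtraction, the args.type field, and the stubbed extractContent are assumptions, and the scraping heuristics themselves are not reproduced.

```javascript
// Sketch of extract.dart's flow: load args, extract, post the result.
// loadJson, extractContent and post are injected stand-ins for browser code.
async function runExtraction(argsUrl, loadJson, extractContent, post) {
  const args = await loadJson(argsUrl);     // the <type><index>.html.json args file
  const content = extractContent(args);     // stand-in for the HTML heuristics
  // ASSUMPTION: the args file carries the DOM type name as `type`.
  const result = { type: args.type, ...content };
  post(result);                             // hand the object back to the runner
  return result;
}
```

In a page, loadJson could be (u) => fetch(u).then((r) => r.json()) and post could be (msg) => window.postMessage(msg, "*").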

- run postProcess.dart
  - go through the results for each type and pick the best match
  - write output/database.html
  - write output/examples.html
  - write output/obsolete.html
  - write output/database.filtered.json, which contains only the best matches
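The README doesn't say how the best match is chosen, so this sketch uses a made-up score: a title equal to the type name counts heavily, then the number of extracted members breaks ties. The title and members fields and the weights are assumptions, not postProcess.dart's actual logic.

```javascript
// HYPOTHETICAL scoring; the heuristic in postProcess.dart may differ.
function score(domType, entry) {
  let s = 0;
  if (entry.title === domType) s += 10;     // exact title match weighs most
  s += (entry.members || []).length;        // then prefer richer pages
  return s;
}

// Keep only the highest-scoring entry per type (first one wins on ties);
// types with no entries are dropped.
function filterBestMatches(database) {
  const filtered = {};
  for (const [domType, entries] of Object.entries(database)) {
    let best = null;
    let bestScore = -Infinity;
    for (const entry of entries) {
      const s = score(domType, entry);
      if (s > bestScore) {
        best = entry;
        bestScore = s;
      }
    }
    if (best !== null) filtered[domType] = best;
  }
  return filtered;
}
```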