As I’ve said before, VuGen makes a great content scraping tool for cases when you want a quick and dirty script to save specific data from multiple webpages.
In this example, I wanted to create a list of all the WordPress plugins available from http://wordpress.org/extend/plugins/ (currently there are 4,245), and save the metadata for each plugin:
- Number of downloads
- Version number
// Only download content from "wordpress.org"
web_add_auto_filter("Action=Include", "Host=wordpress.org", LAST);
Load the next "browse by popularity" page, then load each plugin page in turn.
Save version and statistics data for each plugin to a file.
There are 332 pages, so set the number of iterations to 332 in Runtime Settings.
http://wordpress.org/extend/plugins/browse/popular/page/2 - http://wordpress.org/extend/plugins/browse/popular/page/332
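To illustrate how one page URL per iteration could be derived, here is a minimal sketch in plain standard C rather than VuGen-specific code. The helper name `build_page_url` is hypothetical, and the sketch assumes page 1 is served from the bare "popular" URL, since the range above only starts at page 2.

```c
#include <assert.h>
#include <stdio.h>

#define FIRST_PAGE 1
#define LAST_PAGE  332   /* page count quoted in the text above */

/* Assumption: page 1 lives at the bare "popular" URL; pages 2..332
 * use the /page/N suffix shown in the URL range above. */
static const char *BASE_URL =
    "http://wordpress.org/extend/plugins/browse/popular/";

/* Hypothetical helper: writes the URL for the given page into buf.
 * Returns 0 on success, -1 if the page is out of range or buf is too small. */
int build_page_url(int page, char *buf, size_t len)
{
    int written;

    assert(buf != NULL);
    if (page < FIRST_PAGE || page > LAST_PAGE)
        return -1;

    if (page == FIRST_PAGE)
        written = snprintf(buf, len, "%s", BASE_URL);
    else
        written = snprintf(buf, len, "%spage/%d", BASE_URL, page);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

In a VuGen script, the current iteration number would supply `page`, and the resulting string would be passed to the request function.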
char* file = "C:\\TEMP\\output.txt";
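Saving one line per plugin to that file could look roughly like the sketch below. It is plain standard C, not the original script: the `plugin_record` struct, its `name` field, and the `append_record` helper are all hypothetical, and the tab-separated layout is just one reasonable choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout; the name field is an assumption added
 * so that each row can be tied back to a plugin. */
struct plugin_record {
    const char *name;      /* plugin name (assumed field) */
    const char *version;   /* version number as text */
    long downloads;        /* number of downloads */
};

/* Hypothetical helper: appends one tab-separated line to the file at path.
 * Returns 0 on success, -1 on failure. */
int append_record(const char *path, const struct plugin_record *rec)
{
    FILE *fp;
    int ok;

    assert(path != NULL && rec != NULL);

    /* Tabs or newlines inside a field would corrupt the row layout. */
    if (strpbrk(rec->name, "\t\n") || strpbrk(rec->version, "\t\n"))
        return -1;

    fp = fopen(path, "a");   /* append mode, so every iteration adds rows */
    if (fp == NULL)
        return -1;

    ok = fprintf(fp, "%s\t%s\t%ld\n",
                 rec->name, rec->version, rec->downloads) > 0;
    if (fclose(fp) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Opening in append mode and closing after each write keeps the file usable even if the run stops partway through the 332 iterations, at the cost of some extra I/O.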
For those who would like a copy of the raw data, it is available here (904 KB).