I recently came across PragPub and noticed they had an archive of back issues available for perusal. Before subscribing, I thought I’d download the entire archive to see what’s already been published. Since there are nearly 50 issues available, this was a great opportunity for web scraping. My first instinct was to Google whether anyone had already come up with a solution. Indeed, there were a couple of scripts!
Then it hit me…
All I needed to do was check the download link. If it was semantic, meaning each issue’s URL followed a predictable pattern, then wget would be all I needed. Upon inspection, the URLs were perfect for this situation, so wget alone would do the job.
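As a rough sketch of the idea, and not the exact command I ran: when the links follow a predictable numbering scheme, the shell can generate every URL and hand the list to wget. The `example.com` address, the `issue-NN.pdf` file naming, and the `issue_urls` helper are all hypothetical placeholders, so substitute whatever pattern the real download links show.

```shell
#!/bin/sh
# Hypothetical base URL and file naming; replace with the pattern
# observed on the real download links.
BASE_URL="https://example.com/archive"

# Print one URL per issue, numbered 1..N, where N is the first argument.
issue_urls() {
    i=1
    while [ "$i" -le "$1" ]; do
        # %02d zero-pads the issue number (01, 02, ... 50); this padding
        # is an assumption about the naming scheme.
        printf '%s/issue-%02d.pdf\n' "$BASE_URL" "$i"
        i=$((i + 1))
    done
}
```

To download, pipe the list into wget, which reads URLs from standard input when given `-i -`. For example, `issue_urls 50 | wget --wait=1 -i -` would fetch fifty issues, pausing a second between requests to go easy on the server.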
Now compare that to the results Google turned up. The first script is a Perl script that requires the use of several CPAN modules.
The second is a Python script that actually uses
Now, I should say that the best tool for a job is often the one that you have on hand or even the one that you are most familiar with. I don’t mean to criticize the authors of these scripts—I’m sure the scripts worked for them (and that’s why I haven’t named the authors). What I want to point out is that knowing the capabilities of the tools available on most modern systems can be of great use in situations like these.
It is worthwhile to keep in mind the simplicity favored by our UNIX forefathers.