Juneau Empire Paper Scraper
Monday, February 11th, 2008 by Pat
I’ve been reading the Juneau Empire less and less since the website redesign at the end of last month. It’s so ridiculously slow to load and painful to navigate that it must be part of some grand and imperceptible strategy. Perhaps they hope to boost their print circulation by intentionally creating the least accessible website ever?
I don’t know, it remains a mystery.
I thought I would try to figure out some way to fix the site, but I didn’t get very far. I tried writing a Greasemonkey script to remove all the offending items, but I got frustrated with the slow load times. Greasemonkey does a good job of cleaning things up once they load, but it can’t do anything about the download itself: weighing in at over 1.2 megabytes, the Empire site loads like honey through a straw.
What I’d rather build, and what I know how to build, is a scraper. Something to harvest the data and spit it out in a nice clean format complete with RSS feeds. The problem there is legality, since I’d basically be republishing copyrighted works.
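For the curious, here is roughly what I mean by a scraper: a minimal Python sketch using only the standard library, not a finished tool. It assumes headlines are marked up as `<a class="headline" href="...">` links, which is a made-up placeholder and not the Empire’s actual markup; the class and function names and the example.com feed link are hypothetical too. It only pulls titles and links out of a page you have already downloaded and wraps them in a bare-bones RSS 2.0 feed.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class HeadlineParser(HTMLParser):
    """Collect (title, link) pairs from <a class="headline"> tags.

    The "headline" class is an assumed placeholder, not real site markup.
    """

    def __init__(self):
        super().__init__()
        self.items = []      # finished (title, link) pairs
        self._href = None    # href of the headline link being read, if any
        self._text = []      # text chunks seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "headline":
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a headline link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.items.append((title, self._href))
            self._href = None


def build_rss(items, feed_title, feed_link):
    """Wrap (title, link) pairs in a minimal RSS 2.0 document (as a string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped headlines"
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


def scrape_to_rss(html, feed_title="Headlines", feed_link="http://example.com/"):
    """Parse an HTML string and return an RSS feed of its headline links."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return build_rss(parser.items, feed_title, feed_link)
```

Feed `scrape_to_rss` the HTML of a page as a string and you get the feed back as a string. Fetching the page and working out the real selectors are the parts this sketch skips.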
I might be missing something obvious, but it seems like the best route here has to be a client-side solution, like a custom Firefox extension or a combination of existing extensions.
I wonder how they navigate their own site? Maybe they use AdBlock and Greasemonkey filters? Do they put up with it because they’re being paid by the hour? What a pain in the ass.




