I'm Chris.

Indexing the unindexed tumblrs

August 4th, 2013

After purchasing tumblr, it didn't take Yahoo long to start messing it up. Yahoo allegedly adopted some fairly strict content filtering, so depending on what's on your blog, it could be excluded from the site's internal search as well as from external (i.e. google) searches. I built an application that crawls tumblr, builds a list of the unindexed blogs, and makes that list searchable. To be honest, I don't really understand tumblr, and I can't tell the difference between spam and not-spam, since everyone is just reposting everyone else's (mostly nsfw) posts anyway. Yahoo, for its part, claims this is mostly a spam filtering strategy. Nevertheless, it was a policy change that got people worked up, and it was sort of a fun Sunday evening project.

(link removed, I took the server down)

Stuff used: scrapy, django, backbone.js, elasticsearch
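
For the curious, here is a rough sketch of what the "is this blog unindexed?" check in a crawler like this could look like. The post doesn't say how the original application detected unindexed blogs, so this is an assumption: it looks for the standard robots meta tag with a `noindex` (or `none`) directive in a page's HTML. The names `RobotsMetaParser` and `is_unindexed` are made up for illustration, and fetching the page is left out on purpose.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def is_unindexed(html):
    """Return True if the page asks search engines not to index it.

    Assumption: the signal is a robots meta tag. Other signals
    (robots.txt rules, X-Robots-Tag headers) are not checked here.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives
```

In a real crawl you would feed each blog's front page HTML into `is_unindexed` and store the blogs that return `True` in whatever backs the search.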