Some components for the "other" information infrastructure on the internet (RSS feeds etc.)

Disclaimer: this started essentially as a note-to-self listing a few interesting projects, to spare me another internet search session.

RSS feeds (and their twin, Atom feeds) are ubiquitous on the internet, making it easy to get a summary of the latest publications of a given website.

Interestingly, a huge number of websites produce this kind of feed (most blogs, obviously, but also sites like Twitter), and from this point of view the RSS format is quite lively.

But on the consumer side, I'm pretty disappointed with the "offer" in terms of RSS readers. Over time I've tested several well-known desktop readers (Liferea, RSSOwl, Thunderbird...) and most of them ended up synchronizing with Google Reader. This one has consequently become my main newsreader, and it appears to me to clearly dominate the world of internet-based newsreaders. However, such predominance is not really a good sign ((a quick glimpse at HackerNews shows that people are regularly trying to reinvent the newsreader service, so there is hope I guess)).

There are a lot of things I like in Google Reader, especially its good user experience (the clean UI, the key bindings, the responsiveness), but it still doesn't help me find the interesting bits in the hundreds of posts that come down the feed every day ((Even worse, it seems to be designed to recommend me more subscriptions, argh!)).

Hence the idea of looking for alternative solutions, and maybe libraries and components to build my own personal news reader.

Some open source and pythonic components found on the net:

  • Atomisator by Tarek Ziadé: the good idea here is to use natural language processing toolkits to triage the articles ((a good idea from 2008...))
  • pyf (on bitbucket) offers a framework for dataflow programming, along with a re-implementation of Yahoo Pipes' UI. What's more, the site looks nice, with a lot of docs that I'd like to dive into to get some more details.
  • pypes (on bitbucket) apparently does the same as pyf. The code's structure and the code samples look a bit more understandable to me, but the whole thing requires a specific implementation of Python (Stackless). Too bad?
  • pipe2py ((if you look long enough you'll realize that my current post looks like a great exercise in plagiarism of one of the comments)) (on github) translates Yahoo Pipes instances into Python code and could have been of great use to me during my previous experiments with feeds.
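
To make the dataflow idea behind these pipe-like tools a bit more concrete, here is a minimal sketch using only the Python standard library. It does not use the API of pyf, pypes or pipe2py (I haven't checked how theirs look); the sample feed and the stage names `parse_items`, `keep_matching` and `newest_first` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed, inlined so that the sketch runs offline.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Python dataflow tricks</title>
      <link>http://example.org/1</link>
      <pubDate>Mon, 02 Jan 2012 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Gardening notes</title>
      <link>http://example.org/2</link>
      <pubDate>Tue, 03 Jan 2012 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Stackless Python explained</title>
      <link>http://example.org/3</link>
      <pubDate>Wed, 04 Jan 2012 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Source stage: yield one dict per <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 dates use the RFC 822 format, which email.utils parses.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }


def keep_matching(entries, keyword):
    """Filter stage: pass through entries whose title contains the keyword."""
    keyword = keyword.lower()
    for entry in entries:
        if keyword in entry["title"].lower():
            yield entry


def newest_first(entries):
    """Sort stage: consume the whole stream and return it newest first."""
    return sorted(entries, key=lambda e: e["published"], reverse=True)


# Stages are chained like pipes: source -> filter -> sort.
pipeline = newest_first(keep_matching(parse_items(SAMPLE_RSS), "python"))
titles = [entry["title"] for entry in pipeline]
print(titles)
```

Each stage only knows about the stream of entries it receives, which is what makes this kind of component easy to rearrange. A real reader would also have to cope with Atom, missing dates and network errors, all ignored here.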

Given all these components (and I've only searched for Python ones) I wonder why we don't have better news reading solutions :)

All the more so if we add to that the good advice from Dave Winer, one of the founding fathers of this "feed system":

  • The RSS community wakes up, with 2 recommendations: make subscription easier with a centralised "yellow pages" of RSS feeds, and enable notifications in RSS (making it work in "push" mode as well as in "pull")
  • RSS is supposed to be really simple: stating that the RSS reader of the future should be Twitter-like

I mostly agree with all that, though not 100% with the last point. For my use case, one of the points of a news reader is not to miss any publication from a given selection of websites, even though for other sources of information (a high volume of articles, among which a lot of "noise") I wouldn't mind missing (more than) a few. Today both categories are shown and processed in the same way by the readers I know, and that's maybe a place where some improvement could be sought (and, no, I won't create two separate Google accounts :) ).
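
As a rough sketch of that two-category idea, here is how a reader might treat the two kinds of sources differently. Everything in it is hypothetical: the `POLICY` mapping, the source names, the `INTERESTING` word list and the naive word-overlap score are placeholders, not how any existing reader works.

```python
from collections import namedtuple

Entry = namedtuple("Entry", "source title")

# Hypothetical per-source policy: "all" = never drop anything,
# "best" = a noisy source where missing entries is acceptable.
POLICY = {"friend-blog": "all", "firehose": "best"}

# Hypothetical list of words I care about, used by the naive score below.
INTERESTING = {"python", "rss", "feed"}


def score(entry):
    """Count how many interesting words appear in the title."""
    words = set(entry.title.lower().split())
    return len(words & INTERESTING)


def triage(entries, policy, max_noisy=2):
    """Keep every entry of "all" sources, only the best few of the others."""
    must_read = [e for e in entries if policy.get(e.source) == "all"]
    noisy = [e for e in entries if policy.get(e.source) != "all"]
    relevant = [e for e in noisy if score(e) > 0]
    best = sorted(relevant, key=score, reverse=True)[:max_noisy]
    return must_read, best


entries = [
    Entry("friend-blog", "Holiday pictures"),
    Entry("firehose", "Ten celebrity diets"),
    Entry("firehose", "Parsing an RSS feed in Python"),
    Entry("firehose", "Python release notes"),
    Entry("friend-blog", "My new job"),
    Entry("firehose", "RSS feed validators compared"),
]

must_read, best = triage(entries, POLICY)
print([e.title for e in must_read])
print([e.title for e in best])
```

The point is only the split: nothing from the "all" sources is ever dropped, while the noisy ones are ranked and truncated. The scoring itself is where something like the natural language processing approach of Atomisator would come in.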