reader 3.15 released – Retry-After

Hi there!

I'm happy to announce version 3.15 of reader, a Python feed reader library.

What's new? #

Here are the highlights since reader 3.13.

Retry-After #

Now that it supports scheduled updates, reader can honor the Retry-After HTTP header sent with 429 Too Many Requests or 503 Service Unavailable responses.

Adding this required an extensive rework of the parser internal API, but I'd say it was worth it, since we're getting quite close to it becoming stable.

Next up in HTTP compliance is to do more on behalf of the user: bump the update interval on repeated throttling, and handle gone and redirected feeds accordingly.
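For context, the Retry-After header mentioned above can carry either a number of seconds or an HTTP date. Here is a minimal sketch of turning it into a "do not retry before" time, using only the standard library. This is not reader's actual implementation; `parse_retry_after` is a hypothetical helper, and it assumes the caller passes a timezone-aware `now`.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: datetime) -> Optional[datetime]:
    """Return the earliest time to retry, or None if the value is unusable.

    Hypothetical helper for illustration; `now` must be timezone-aware.
    """
    value = value.strip()
    if value.isascii() and value.isdigit():
        # delay-seconds form, e.g. "Retry-After: 120"
        return now + timedelta(seconds=int(value))
    try:
        # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Neither form matched; older Pythons raise TypeError here.
        return None
    if when.tzinfo is None:
        # Assumption: treat dates without a usable zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry whenever", so never return before now.
    return max(when, now)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
in_two_minutes = parse_retry_after("120", now)
at_one_am = parse_retry_after("Mon, 01 Jan 2024 01:00:00 GMT", now)
invalid = parse_retry_after("soon", now)
```

A scheduler could then skip the feed until the returned time, and fall back to its normal update interval when the result is None.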

Faster tag filters, feed slugs #

OR-only tag filters like get_feeds(tags=[['one', 'two']]) now use an index.

This is useful for maintaining a reverse mapping to feeds/entries, like the feed slugs recipe does to add support for user-defined short URLs:

>>> url = 'https://death.andgravity.com/_feed/index.xml'
>>> reader.set_feed_slug(url, 'andgravity')
>>> reader.get_feed_by_slug('andgravity')
Feed(url='https://death.andgravity.com/_feed/index.xml', ...)

(Interested in adopting this recipe as a real plugin? Submit a pull request!)
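To make "OR-only" concrete: as I understand the tag filter syntax, top-level items are ANDed together, and a nested list is an OR over its items. The pure-Python sketch below only illustrates those semantics against a set of tag names; `matches` is a hypothetical function, not part of reader, and it leaves out the other filter forms (negation, booleans).

```python
from typing import List, Set, Union

# A filter is a list whose items are either a tag name or a list of tag names.
TagFilter = List[Union[str, List[str]]]


def matches(tags: Set[str], tag_filter: TagFilter) -> bool:
    """Check a set of tag names against a simplified tag filter."""
    for item in tag_filter:
        # A bare string is treated as a one-element OR group.
        alternatives = [item] if isinstance(item, str) else item
        # Every top-level item must be satisfied (AND) ...
        if not any(tag in tags for tag in alternatives):
            # ... by at least one of its alternatives (OR).
            return False
    return True


or_only = [["one", "two"]]  # one OR two
and_only = ["one", "two"]   # one AND two

or_hit = matches({"one"}, or_only)
and_miss = matches({"one"}, and_only)
```

With a single nested list, a resource matches if it has any one of the listed tags, which is the lookup a slug-to-feed reverse mapping needs.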

enclosure_tags improvements #

The enclosure_tags plugin fixes ID3 tags for MP3 enclosures like podcasts.

I've changed the implementation to rewrite tags on the fly, instead of downloading the entire file, rewriting tags, and then sending it to the user; this should allow browsers to display accurate download progress.

Some other, smaller improvements:

Set genre to Podcast if the feed has any tag containing "podcast".
Prefer feed user title to feed title if available.
Use feed title as artist, instead of author.

Using the installed feedparser #

Because feedparser makes PyPI releases at a lower cadence, reader has been using a vendored version of feedparser's develop branch for some time. It is now possible to opt out of this behavior and make reader use the installed feedparser package.

Python versions #

reader 3.14 (released back in July) adds support for Python 3.13.

Upcoming changes #

Replacing Requests with HTTPX #

reader relies on Requests to retrieve feeds from the internet; among other things, it replaces feedparser's use of urllib to make it easier to write plugins.

However, Requests has a few issues that may never get fixed because it is in a feature-freeze – mainly the lack of default timeouts, underpowered response hooks, and no request hooks, all of which I had to work around in reader code.

So, I've been looking into using HTTPX instead.

Some reasons to use HTTPX:

largely Requests-compatible API and feature set
while the ecosystem is probably not comparable, it is actively maintained, popular enough, and the basics (mocking, auth) are there
strict timeouts by default (and more kinds than Requests)
request/response hooks
URL normalization (needed by the parser)

Bad reasons to move away from Requests:

lack of async support – I have no plan to use async in reader at this point
lack of HTTP/2 support – coming soon in urllib3 (and by extension, Requests?); also, reader makes infrequent requests to many different hosts, so I'm not sure it would benefit all that much from HTTP/2
lack of Brotli/Zstandard compression support – urllib3 already supports them

Reasons to not move to HTTPX:

not 1.0 yet (but coming soon)
not as battle-tested as Requests (but can use urllib3 as transport)

So, when is this happening? Nothing's actually burning, so soon™, but not that soon; watch #360 if you're interested in this.

That's it for now. For more details, see the full changelog.

Want to contribute? Check out the docs and the roadmap.

Learned something new today? Share this with others, it really helps!

What is reader? #

reader takes care of the core functionality required by a feed reader, so you can focus on what makes yours different.

reader allows you to:

retrieve, store, and manage Atom, RSS, and JSON feeds
mark articles as read or important
add arbitrary tags/metadata to feeds and articles
filter feeds and articles
full-text search articles
get statistics on feed and user activity
write plugins to extend its functionality

...all these with:

a stable, clearly documented API
excellent test coverage
fully typed Python

To find out more, check out the GitHub repo and the docs, or give the tutorial a try.

Why use a feed reader library? #

Have you been unhappy with existing feed readers and wanted to make your own, but:

never knew where to start?
it seemed like too much work?
you don't like writing backend code?

Are you already working with feedparser, but:

want an easier way to store, filter, sort and search feeds and entries?
want to get back type-annotated objects instead of dicts?
want to restrict or deny file-system access?
want to change the way feeds are retrieved by using Requests?
want to also support JSON Feed?
want to support custom information sources?

... while still supporting all the feed types feedparser does?

If you answered yes to any of the above, reader can help.

The reader philosophy #

reader is a library
reader is for the long term
reader is extensible
reader is stable (within reason)
reader is simple to use; API matters
reader features work well together
reader is tested
reader is documented
reader has minimal dependencies

Why make your own feed reader? #

So you can:

have full control over your data
control what features it has or doesn't have
decide how much you pay for it
make sure it doesn't get closed while you're still using it
really, it's easier than you think

Obviously, this may not be your cup of tea, but if it is, reader can help.