Description

For a few years now I have been using mutt for handling my private emails and Thunderbird for work-related stuff, mainly because the former is less than stellar at handling mailing lists and the latter is at least bearable for them. These days my private email address is on a lot of mailing lists, too, and this arrangement may yet force me to use Thunderbird for my private email as well.

This project is about coming up with a solution that lets me continue to use mutt where it is sensible (i.e. INBOX) and enables handling mailing lists through something designed for the purpose: an NNTP newsreader such as gnus, KNode, pan, slrn, tin, ... To this end I plan on cobbling together a Maildir backed local NNTP server with some plumbing to...

  • ...expose IMAP folders as newsgroups
  • ...turn postings to these newsgroups into emails sent to the corresponding list.

Bonus points: Hack up the server side component and one or more news readers to support NNTP through a Unix Domain Socket, to make the whole thing suitable for multi user machines where other users would have access to 127.0.0.1.
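
A minimal sketch of what serving an NNTP-style protocol over a Unix Domain Socket could look like in Python. It is not papercut code: the handler only sends a greeting and answers QUIT, and the names `GreetingHandler`, `serve_on_unix_socket` and `demo` are made up for illustration. The point is that access is governed by file permissions on the socket rather than by who can reach a loopback port.

```python
import os
import socket
import socketserver
import tempfile
import threading


class GreetingHandler(socketserver.StreamRequestHandler):
    """Sends an NNTP-style greeting and answers QUIT; anything else gets 500."""

    def handle(self):
        self.wfile.write(b"200 hypothetical news server ready\r\n")
        for raw in self.rfile:
            command = raw.strip().upper()
            if command == b"QUIT":
                self.wfile.write(b"205 closing connection\r\n")
                break
            self.wfile.write(b"500 command not recognized\r\n")


def serve_on_unix_socket(path):
    """Bind to a socket file only its owner may access; return the server."""
    old_umask = os.umask(0o077)  # socket file is created without group/other access
    try:
        server = socketserver.UnixStreamServer(path, GreetingHandler)
    finally:
        os.umask(old_umask)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    """Start a server in a private temp directory and talk to it once."""
    path = os.path.join(tempfile.mkdtemp(), "nntp.sock")
    server = serve_on_unix_socket(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        reader = client.makefile("rb")
        greeting = reader.readline()
        client.sendall(b"QUIT\r\n")
        goodbye = reader.readline()
    server.shutdown()
    server.server_close()
    return greeting, goodbye
```

A newsreader would still need its own patch to connect to a socket path instead of a host and port, which is the other half of the bonus task.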

Building blocks

I already dug up a few things that might be handy in building this sort of thing:

NNTP side

  • https://github.com/bingos/poe-component-server-nntp/ - A perl module for building NNTP servers
  • https://github.com/jpm/papercut - A modular NNTP server that supports adding storage plugins (current favourite since it probably won't need a lot of modification)

Email side

  • http://www.offlineimap.org/ - synchronizes an IMAP account with a local folder (bidirectional, i.e. it will send changes in the maildir to the IMAP account as well)
  • esmtp - lightweight mail transfer agent (accepts emails on STDIN, will send them to the smarthost in its configuration)
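
As an illustration of the "accepts emails on STDIN" part, here is a small Python sketch that serializes a message and pipes it to an external command. The helper names are hypothetical, and the command line is left to the caller; something like `["esmtp", "-t"]` is an assumption based on esmtp's sendmail-style interface and should be checked against its manual.

```python
import subprocess
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Create a plain-text email with the usual headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def hand_to_mta(msg, command):
    """Write the serialized message to the command's standard input.

    Returns whatever the command printed on standard output and raises
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        command, input=msg.as_bytes(), stdout=subprocess.PIPE, check=True
    )
    return result.stdout
```

Keeping the command configurable makes it easy to substitute a harmless program for the MTA while testing.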

This project is part of:

Hack Week 14


Comments

  • jgrassler
    about 3 years ago by jgrassler

    I opted for using papercut as the NNTP component now. It will require fairly substantial changes:

    • It's currently not packaged in any sort of way, so I will add the boilerplate necessary to turn Papercut into a pypi package installable through git and adjust the library parts accordingly to make it installable in the system (as opposed to residing in /path/to/where/papercut/will/run). As part of that I will also:
      • ...polish up the main executable a bit so it can take parameters such as --config (to allow for multiple Papercut instances with different configurations).
      • ...add code for handling YAML formatted configuration files (as opposed to the simple settings.py file it comes with right now).
      • Add more flexible configuration for the plugins, namely a way to configure papercut by group hierarchy in a way where something like "use the mbox plugin with files ~/Mail/chameleons/{green,yellow} for the alt.binaries.pictures.chameleons.* hierarchy" is possible.
      • I may not find the time to adapt the forum plugins to these changes (I'm not very interested in these). If somebody is interested in these I'd be happy to accept pull requests, though.
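
    A minimal sketch of what per-hierarchy plugin configuration could look like once it has been loaded from YAML into a dictionary. The key names (`backend`, `files`, `path`), the `lists.*` entry and the function `backend_for` are all hypothetical, and "longest matching pattern wins" is just one possible rule for resolving overlaps:

```python
from fnmatch import fnmatchcase

# Hypothetical configuration, shaped like what a YAML file might load into.
HIERARCHIES = {
    "alt.binaries.pictures.chameleons.*": {
        "backend": "mbox",
        "files": ["~/Mail/chameleons/green", "~/Mail/chameleons/yellow"],
    },
    "lists.*": {"backend": "maildir", "path": "~/Mail/lists"},
}


def backend_for(group, hierarchies=HIERARCHIES):
    """Return the settings of the longest pattern matching the group name."""
    matches = [pattern for pattern in hierarchies if fnmatchcase(group, pattern)]
    if not matches:
        return None
    return hierarchies[max(matches, key=len)]
```

    With this table, `backend_for("alt.binaries.pictures.chameleons.green")` selects the mbox settings, while a group outside every configured hierarchy yields `None`.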

    Beyond that I will need to add code to:

    • Turn posts into emails sent to the proper mailing lists
    • Change message ID handling to use the messages' existing IDs
    • I might also add a patch enabling Papercut to speak NNTP over a Unix Domain Socket (for secure operation on multiuser machines)
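
    A rough sketch of the first two points, assuming a hypothetical table that maps group names to list addresses. It rewrites a posted article into an email addressed to the list and keeps the Message-ID the newsreader supplied, generating one only if the header is missing. None of the names below come from papercut.

```python
from email import message_from_string
from email.utils import make_msgid

# Hypothetical mapping from newsgroup name to list posting address.
GROUP_TO_LIST = {"lists.openstack-dev": "openstack-dev@lists.example.org"}


def post_to_email(raw_post, group_to_list=GROUP_TO_LIST):
    """Turn a posted article into an email addressed to the matching lists."""
    msg = message_from_string(raw_post)
    groups = [g.strip() for g in msg.get("Newsgroups", "").split(",") if g.strip()]
    recipients = [group_to_list[g] for g in groups if g in group_to_list]
    if not recipients:
        raise ValueError("no mailing list configured for: %s" % ", ".join(groups))
    del msg["Newsgroups"]  # the list does not need the NNTP routing header
    del msg["To"]          # deleting a missing header is a no-op
    msg["To"] = ", ".join(recipients)
    if msg["Message-ID"] is None:
        msg["Message-ID"] = make_msgid()
    return msg
```

    Reusing the existing Message-ID means the copy that later arrives from the list can be matched against the article that was posted.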

    The whole thing will be a hard fork. I talked this over with Joao Prado Maia, the original author, and we agreed to do it this way rather than through an avalanche of pull requests (he is no longer maintaining papercut). For those who would like to have a look and/or contribute, you will find the current sources here:

    https://github.com/jgrassler/papercut

    I will try to only merge reasonably stable stuff into master.

  • jgrassler
    about 3 years ago by jgrassler

    Setuptools packaging is done to the point where it installs and runs. If anyone would like to try it:

    sudo zypper install libmysqlclient-devel
    git clone --branch pypi https://github.com/jgrassler/papercut.git /tmp/papercut
    virtualenv /tmp/ppcut
    . /tmp/ppcut/bin/activate
    pip install /tmp/papercut 
    papercut --config /tmp/papercut/etc/papercut/mbox.yaml 
    

    That should give you an NNTP server on localhost, port 1119.

    If this ceases to work at some point just clone the master branch (I'll rebase and merge the pypi branch some time tomorrow). So far I've only tested the mbox storage plugin and that one works.

  • jgrassler
    about 3 years ago by jgrassler

    Fork/pypi packaging done, you can test-drive it without the --branch option now:

    sudo zypper install libmysqlclient-devel
    git clone --branch pypi https://github.com/jgrassler/papercut.git /tmp/papercut
    virtualenv /tmp/ppcut
    . /tmp/ppcut/bin/activate
    pip install /tmp/papercut 
    papercut --config /tmp/papercut/etc/papercut/mbox.yaml 
    

  • jgrassler
    about 3 years ago by jgrassler

    ...i.e. clone it with this command (EPASTE in the previous post):

    git clone https://github.com/jgrassler/papercut.git /tmp/papercut

  • jgrassler
    about 3 years ago by jgrassler

    Now where do I get a maildir from?

    This is only peripherally related to the task at hand but it needed doing: I do not have a maildir since I am using Thunderbird, so where do I get a realistic test case for the code I'm messing with? Or to put it differently, how do I convert Thunderbird's storage to a maildir offlineimap/papercut can handle? Surprisingly this took a considerable amount of effort.

    Starting Point: Thunderbird's Email Storage

    Thunderbird defaults to storing its emails in mbox format. While it does have halfhearted maildir support, that was not an option for me: I started out using the default (mbox) and there is no way to convert existing mbox storage to maildir in Thunderbird itself.

    Battle Plan: mb2md...

    Luckily there is a fairly sophisticated conversion tool: mb2md. Looks like we're all set...

    ...Almost Survives Contact With The Enemy

    At first mb2md looked like the solution to my problem. A few hours of hacking later I had finished the actual solution, a sizable shell script that calls mb2md near the end.

    Invoked as follows it should produce a maildir suitable for offlineimap's consumption:

    source=~/.thunderbird/*.default/ImapMail/mail.example.com dest=~/mail/example.com tb2maildir
    

    In my case it sort of worked. Apparently there were a few conversion errors in the mb2md step and I ended up with a few additional headerless phantom emails after I finally worked up the courage to run offlineimap. But the synchronization speedup was worth it: offlineimap usually had about half the emails in any given folder to work with (Thunderbird did not have everything available locally).

    On the whole I probably would not do the whole thing again but just run offlineimap over night (I initially didn't think the conversion would be this much effort). That being said, it's nice to know there is a bandwidth saving alternative, and the tb2maildir conversion tool may be useful to someone else intending to migrate Thunderbird's local mail storage into a Maildir :-)
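
    For comparison, the core mbox to Maildir step can also be sketched with Python's standard `mailbox` module. This is not what mb2md or tb2maildir do internally, and it ignores everything Thunderbird-specific (folder hierarchy, `.msf` index files, status flags); the function name is made up for illustration.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            converted = mailbox.MaildirMessage(message)
            converted.set_subdir("cur")  # file as already-seen mail, not "new"
            dest.add(converted)
            copied += 1
    finally:
        dest.close()
        source.close()
    return copied
```

    Running something like this once per Thunderbird folder file would still leave the folder naming and any malformed messages to deal with, which is presumably where most of the effort goes.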

  • jgrassler
    about 3 years ago by jgrassler

    Maildir support tested and working

    It took a few little tweaks to the modified codebase, but basic maildir functionality is now there:

    http://btw23.de/johannes/tmp/slrn.png

    It will still require some optimizing, though. Retrieving all posts from a group based on a large folder containing a few months' worth of openstack-dev posts (papercut.maildir.INBOX.lists.openst in the screenshot) takes a long time. I interrupted it when it was still going on after about two minutes.

    Owing to the maildir format's one-file-per-message layout there is a lot of I/O required to get just the metadata (a cursory strace look at the papercut process confirmed this). I will probably come up with a header caching scheme for the maildir storage plugin, but if somebody has a better suggestion I'd be happy to try it instead :-)

    (One exception to that: I'd rather not use papercut's existing caching, since it caches method return values at the top level, regardless of plugin. I will need to cache some stuff at the plugin level anyway (Message-IDs for one), and using both would leave me with two independent caches.)
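
    A minimal sketch of one possible header caching scheme, assuming the cache lives in memory inside the storage plugin. It reads each message file only up to the blank line that ends the headers, keeps the overview fields, and indexes them by Message-ID. The class layout and the chosen header set are assumptions, not the code in the papercut fork.

```python
import mailbox
from email.parser import BytesHeaderParser

OVERVIEW_HEADERS = ("Subject", "From", "Date", "Message-ID", "References")


def read_header_block(handle):
    """Read lines up to (not including) the blank line that ends the headers."""
    lines = []
    for line in handle:
        if line in (b"\n", b"\r\n"):
            break
        lines.append(line)
    return b"".join(lines)


class HeaderCache:
    """Hypothetical in-memory cache of overview headers for one Maildir."""

    def __init__(self, path):
        self.box = mailbox.Maildir(path, create=False)
        self.by_key = {}        # maildir key -> dict of overview headers
        self.key_by_msgid = {}  # Message-ID -> maildir key

    def refresh(self):
        """Parse headers of messages not seen before; forget deleted ones."""
        parser = BytesHeaderParser()
        current = set(self.box.keys())
        for key in set(self.by_key) - current:
            entry = self.by_key.pop(key)
            self.key_by_msgid.pop(entry["Message-ID"], None)
        for key in current - set(self.by_key):
            with self.box.get_file(key) as handle:
                headers = parser.parsebytes(read_header_block(handle))
            entry = {name: str(headers.get(name, "")) for name in OVERVIEW_HEADERS}
            self.by_key[key] = entry
            if entry["Message-ID"]:
                self.key_by_msgid[entry["Message-ID"]] = key

    def message_by_id(self, message_id):
        """Return the cached headers for a Message-ID, or None if unknown."""
        key = self.key_by_msgid.get(message_id)
        return None if key is None else self.by_key[key]
```

    After the first `refresh()`, overview requests can be answered from the dictionaries, and later refreshes only touch files that were added or removed.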

  • jgrassler
    about 3 years ago by jgrassler

    The screenshot URL in the previous post is wrong. Here's the correct one:

    slrn talking to papercut

  • jgrassler
    about 3 years ago by jgrassler

    Caching is partially implemented (see branch cache). Still a few bugs to be ironed out. I'm calling it a day for now - I'm at the point where I will no longer fix bugs, only introduce new ones :-)

  • jgrassler
    about 3 years ago by jgrassler

    Optimizing the cache

    I did some further optimizations to the cache this morning. I got rid of all unnecessary I/O but XOVER was still slow. I added some debugging output to get_XOVER and thus narrowed the problem down to something in the iteration over the article IDs taking a long time. The prime suspect was the HeaderCache.message_byid() calls, of course.

    That turned out to be the case indeed, and this humble commit finally solved the problem. If you'd like to have a go at debugging the issue yourself, check out the last broken commit. Enjoy :-)

    I'll keep this branch around indefinitely (I won't rebase it either, at least not in place) as a reminder of the incremental improvements and blind alleys the performance optimization process involves.

  • jgrassler
    about 3 years ago by jgrassler

    Cache is a bit more solid now (I think I dealt with most opportunities for it to go stale now, but feel free to point out the ones I haven't thought of...). Also, article retrieval by Message-ID is now supported for the maildir plugin. Next stop: Configurable hierarchies, each with their own maildir backend instance.

  • jgrassler
    about 3 years ago by jgrassler

    Quick status update:

    Cache continues to be stable if a bit slow to build (I think I'll add a mechanism to dump it to/load it from disk at some stage). I'm currently in the middle of converting papercut_nntp.py to multi backend operation. All the fluff for making this possible (configuration handling for multiple hierarchies, backend loading in papercut_nntp.py) is in place. Now all the handler methods for NNTP commands need to be converted to multi backend operation. As of now multi backend operation for the following NNTP commands has been implemented and tested: NEWGROUPS, GROUP, NEWNEWS, LIST, STAT and ARTICLE.

    The conversion is taking place in the hierarchies branch. If someone is interested in contributing (there are still plenty of operations left...) I'd be happy to accept pull requests against that branch :-)

  • jgrassler
    about 3 years ago by jgrassler

    Multi backend mode works for maildir now, including posting to the maildir backed groups. So this is mostly finished now. What still needs doing now:

    • A hook for inserting a hierarchy or even group specific posting script that turns messages into emails and saves them to the Drafts folder
    • A proper changelog writeup of everything I did since the last changelog entry
    • Possibly some small nice-to-have kind of stuff I can't think of right now.

    Let's see if I get around to adding these little bits over the weekend... :-)

  • jgrassler
    almost 2 years ago by jgrassler

    TODO list for Hackweek 16:

    • Persistent cache (to speed up startup)
    • Hooks to insert scripts for posting to groups/entire hierarchies (see previous comment)
    • Publication on PyPI and RPM package
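
    For the persistent cache item, one simple option would be to dump the header cache to a JSON file on shutdown and load it again at startup. The sketch below assumes the cache is a plain dictionary of JSON-serializable values; the function names and the file format are assumptions, not a description of what was implemented.

```python
import json
import os
import tempfile


def save_cache(entries, path):
    """Write the cache atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(entries, handle)
        os.replace(tmp_path, path)  # atomic rename within one directory
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_cache(path):
    """Return the saved entries, or an empty dict if no usable file exists."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
```

    A loaded cache would still have to be checked against the maildir at startup, since messages may have arrived or been deleted while the server was not running.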
