RichyHBM

Software engineer with a focus on game development and scalable backend development

Simple Email Archival and Indexing

One of the main pain points with self-hosting tends to be email. Whilst setting up a mail server isn’t especially hard, the maintenance is what causes most issues, in particular keeping your self-hosted mail servers off spam lists. That’s why I generally leave that to third parties; however, I still like to keep a local copy of my mail, mostly for backup purposes.

A simple solution would be to run something like Thunderbird or Outlook. However, these only run while they are open, so if I forgot to open one for a while I wouldn’t have the latest copy of all my emails. Instead, I want something that runs 24/7 in the background without me needing to interact with it, with bonus points if it can also restore emails to a different provider.

There are a number of tools that let you download your emails and save them locally, either as files or in a database. However, when looking into this, the general advice is to then spin up some sort of IMAP server on top to serve them to an email client. When all you want is to be able to look up emails, this gets quite complicated, and it also means the email data you have locally can end up different from what is on the server.

Eventually I found some command-line tools that index emails so that you can then search them. Luckily, a couple of projects provide a GUI on top of these tools, letting you access your mail via a web browser. First, though, I had to set up a mechanism to download emails and find a suitable storage format.

Mbsync

There are a number of different tools that can fetch emails, but a lot of them only work with specific providers. For example, several tools work specifically with Gmail; however, I have email accounts with a few different providers, so I was looking for something built on standard protocols like IMAP. After a little searching I came across mbsync (part of the isync project), which can fetch from any IMAP email provider and stores the emails in the simple maildir format, which for my purposes was perfect!
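For anyone unfamiliar with maildir: each folder is a directory with `cur`, `new` and `tmp` subdirectories, and every message is stored as its own file, in `new` when freshly delivered and in `cur` once a client has processed it (`tmp` only holds deliveries still in progress). The sketch below is a hypothetical helper, not part of mbsync, that counts the messages under a maildir tree by counting the files in `cur` and `new`, which can be a quick sanity check after a sync.

```shell
#!/bin/sh
# Hypothetical helper: count the messages stored in a maildir tree.
# Every maildir folder has cur/, new/ and tmp/ subdirectories and each message
# is a separate file; tmp/ only holds in-progress deliveries, so it is skipped.
count_maildir_messages() {
  find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) | wc -l | tr -d ' '
}

# Usage, with the local path from the mbsync config in this post:
#   count_maildir_messages /mail/gmail.com/foobar
```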

The configuration of mbsync all happens in a config file, for which I am using /config/mbsync.rc. The first section defines the remote account, in this case a Gmail one, along with its authentication details. For Gmail you will need to use an app password, as your main account will most likely have 2FA enabled. How you secure this is out of scope for this post, but options include a tool like HashiCorp Vault, or simply using gpg to keep the password encrypted inside the container.

IMAPAccount foobar.gmail.com
Host imap.gmail.com
User foobar@gmail.com
PassCmd "decrypt_stored_password()" 
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
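The `PassCmd` value above is only a placeholder: mbsync runs the given command through a shell and reads the password from its standard output. As one possible gpg-based approach (an assumption, not necessarily what I run), a small script could decrypt a stored app password and print it. The script path, the `SECRETS_DIR` location and the `<account>.gpg` naming are all hypothetical, and it assumes the private key is usable inside the container without an interactive passphrase prompt.

```shell
#!/bin/sh
# Hypothetical /config/decrypt_stored_password.sh
# Each app password is assumed to have been encrypted once beforehand, e.g.:
#   printf '%s\n' 'app-password' | gpg --encrypt --recipient you@example.com \
#     --output /config/secrets/foobar.gmail.com.gpg

SECRETS_DIR="${SECRETS_DIR:-/config/secrets}"

# Path of the encrypted password file for a given account name.
secret_file() {
  printf '%s/%s.gpg\n' "$SECRETS_DIR" "$1"
}

# Decrypt non-interactively so that only the password reaches stdout.
decrypt_stored_password() {
  gpg --quiet --batch --decrypt "$(secret_file "$1")"
}

# Only decrypt when given an account name, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  decrypt_stored_password "$1"
fi
```

The account definition would then use something like `PassCmd "sh /config/decrypt_stored_password.sh foobar.gmail.com"`.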

The next section then defines a remote store (in mbsync terminology) and associates the account defined above with it.

IMAPStore foobar.gmail.com-remote
Account foobar.gmail.com

Next up we define the local store: where to store the data, and how to treat specific labels/folders.

MaildirStore foobar.gmail.com-local
SubFolders Verbatim
Path /mail/gmail.com/foobar/
Inbox /mail/gmail.com/foobar/Inbox

Lastly, set up a channel, which tells mbsync to synchronise the local store with the remote one in a pull-only direction, without touching anything on the remote side.

Channel foobar.gmail.com
Far :foobar.gmail.com-remote:
Near :foobar.gmail.com-local:
Patterns *
Create Near
Expunge None
Remove None
CopyArrivalDate yes
Sync Pull
SyncState *

Wrapping this in a container is very easy, as mbsync simply performs the synchronisation each time it is run and then exits once finished. That means we can just add it as an hourly cron job (more or less often depending on your preference), and for good measure also run it once on startup.

FROM alpine
RUN apk add --no-cache isync gpg curl
# No .sh suffix: busybox run-parts skips file names containing a dot
RUN echo '#!/bin/sh' > /etc/periodic/hourly/fetch_mail
RUN echo 'mbsync -V -a -c /config/mbsync.rc' >> /etc/periodic/hourly/fetch_mail
RUN chmod +x /etc/periodic/hourly/fetch_mail
# Sync once on startup, then keep crond in the foreground for the hourly job
CMD /etc/periodic/hourly/fetch_mail; /usr/sbin/crond -f
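One detail that is easy to trip over here: Alpine’s crond runs everything in /etc/periodic/hourly through busybox `run-parts`, which only executes files whose names consist of letters, digits, underscores and hyphens, so a script called `fetch_mail.sh` would be silently skipped. The function below is my own approximation of that rule (not run-parts’ actual code) and could be used to sanity-check a job name before baking it into an image.

```shell
#!/bin/sh
# Approximates run-parts' default filename rule: only names made up entirely
# of letters, digits, underscores and hyphens are executed.
runs_under_run_parts() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;  # empty, or contains any other character
    *) return 0 ;;
  esac
}

for name in fetch_mail fetch_mail.sh update-mail; do
  if runs_under_run_parts "$name"; then
    echo "$name: will run"
  else
    echo "$name: skipped"
  fi
done
```

Running it reports `fetch_mail` and `update-mail` as runnable and `fetch_mail.sh` as skipped.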

NotMuch/NetViel

For searching and indexing emails I just wanted something simple. A lot of online guides suggest spinning up an IMAP server like Dovecot, but all I wanted for my simple system was a way to view and search emails via a quick and easy web interface. Enter netviel: a small web interface on top of notmuch, which is itself a Linux command-line tool for indexing and searching emails stored in maildir format. This seemed perfect for my simple needs.

The notmuch configuration is very simple: all you need to do is point it at the maildir directory and set your primary email address, as well as any others you may have (separated by semicolons).

[database]
path=/app/mail

[user]
name=Richy
primary_email=foobar@gmail.com
other_email=other@gmail.com;foobar@hotmail.com;

[new]
[search]
[maildir]

The Dockerfile for it is also very simple: it just installs notmuch and netviel on top of Alpine and sets the config location for notmuch to find.

FROM python:3.11-alpine3.17
RUN apk update && apk add --no-cache notmuch curl rsync
RUN pip install notmuch netviel gunicorn
WORKDIR /app
VOLUME /app/mail
ENV NOTMUCH_CONFIG /app/notmuch-config
COPY ./docker/run.sh /app/run.sh
CMD sh /app/run.sh

For my particular use case I wanted to work on a copy of my maildir, so that any changes made through netviel are fully contained. To do this, the container copies the maildir using rsync (also syncing any deletions) and then runs notmuch to index any new emails that may have been pulled. Both steps run once on startup and then on an hourly basis.

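#!/bin/sh

# Copy the read-only maildir into the container. -a keeps timestamps so
# unchanged messages are skipped on later runs, and .notmuch is excluded so
# --delete doesn't wipe the index, which lives in its own volume.
rsync -a --delete --exclude=.notmuch /mail/ /app/mail
notmuch new

# Hourly job; no .sh suffix, as busybox run-parts skips names containing a dot
echo '#!/bin/sh' > /etc/periodic/hourly/update-mail
echo 'rsync -a --delete --exclude=.notmuch /mail/ /app/mail' >> /etc/periodic/hourly/update-mail
echo 'notmuch new' >> /etc/periodic/hourly/update-mail
chmod +x /etc/periodic/hourly/update-mail

# busybox crond daemonises by default, so the hourly job runs in the background
crond

exec gunicorn -b 0.0.0.0:5000 netviel.wsgi:app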
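Building the periodic script out of separate echo lines makes it easy to truncate the file by accident (a single `>` where `>>` was meant) or to leave it non-executable (`chmod +X` only adds execute permission to directories and to files that are already executable for someone). A heredoc writes the whole script in one go, which avoids the first pitfall. The sketch below shows the idea; the function name is hypothetical, and the target path is a parameter so it can be tried outside the container.

```shell
#!/bin/sh
# Hypothetical helper: write the hourly re-index job in a single step.
# The quoted 'EOF' delimiter stops the shell expanding anything in the body.
# .notmuch is excluded so --delete leaves the separately mounted index alone.
write_update_job() {
  cat > "$1" <<'EOF'
#!/bin/sh
rsync -a --delete --exclude=.notmuch /mail/ /app/mail
notmuch new
EOF
  chmod +x "$1"  # lowercase x: always adds execute permission
}

# In run.sh this could replace the echo lines (no .sh suffix, because busybox
# run-parts skips names containing a dot):
#   write_update_job /etc/periodic/hourly/update-mail
```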

Docker Setup

Finally, it’s just a case of putting all of this into a Docker Compose file, which mainly simplifies the storage locations. The key detail is that the mail directory mounted into netviel is read-only, so that any edits made within netviel can’t make their way back to the original maildir backup directory.

---
version: '3.7'
services:
  mbsync:
    build:
      context: .
      dockerfile: docker/mbsync.Dockerfile
    labels:
      - traefik.enable=false
    container_name: mbsync
    volumes:
      - ./config/mbsync/mbsync.rc:/config/mbsync.rc:ro
      - ./storage/mail:/mail
    environment:
      - TZ=UTC
      - PUID=1000
      - PGID=1000
    restart: unless-stopped

  netviel:
    build:
      context: .
      dockerfile: docker/netviel.Dockerfile
    labels:
      - traefik.enable=true
      - traefik.http.services.netviel.loadbalancer.server.port=5000
      - traefik.http.services.netviel.loadbalancer.server.scheme=http
    container_name: netviel
    environment:
      - TZ=UTC
      - PUID=1000
      - PGID=1000
    volumes:
      - ./config/notmuch-config.toml:/app/notmuch-config
      - ./storage/mail:/mail:ro
      - ./storage/notmuch:/app/mail/.notmuch
    restart: unless-stopped