The Arcology Garden

Asynchronous and Offline

I've made it a goal of mine to spend less time directly online, opting instead for asynchronous, offline things: caching code documentation locally (preferably somewhere Emacs can reach it), making my mail and feeds available offline, and generally spending less time wading in the sewage that is The Orange Site and The Robot Site to pass the time. That is to say, I'm trying to compute a bit more like rms. I've settled on a Rix-signature bullshit setup involving a bunch of moving parts and sawblades, and I'll talk about some of the sharper pieces now.

Back in 2014, I wrote about setting up Gnus to do adaptive scoring of mailboxes, a way of training Gnus to bubble emails you care about towards the top of your inbox and push mails you're less likely to care about downwards, based on who sent the message or what its subject is. You can also define manual rules across any of the mail's fields, which lets me triage multiple weeks of PTO emails down to 15 or so action items when I return. It's an incredibly powerful tool, and it's flexible, given that I can just reach in and define my own scoring rules based on any header or even text in the body. All of my emails are on my local machine, so I can inspect exactly why my mail engine thinks a message is important or unimportant, and modify that behavior. It's not a magic ML model; it's a brain-dead text-matching engine, and that's perfect.
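To make "brain-dead text-matching engine" concrete, here is a toy sketch of the idea in Python. It is not how Gnus stores or applies its scores: the Scorer class, its method names and the point values are all made up for illustration, and messages are plain dicts of header values. The part worth noticing is explain, which lists exactly which rules fired for a message.

```python
from collections import defaultdict


class Scorer:
    """Toy header scorer: each rule maps (header, lowercase substring) to points."""

    def __init__(self):
        self.rules = defaultdict(int)

    def add_rule(self, header, needle, points):
        # A manual rule, the kind you would otherwise write by hand.
        self.rules[(header, needle.lower())] += points

    def observe(self, message, was_read):
        # Adaptive step (hypothetical values): reward the sender of mail that
        # got read, penalise the sender of mail that was dismissed unread.
        self.add_rule("from", message["from"], 1 if was_read else -1)

    def explain(self, message):
        # Every rule that fired, so a ranking can always be traced back.
        return [
            (header, needle, points)
            for (header, needle), points in self.rules.items()
            if needle in message.get(header, "").lower()
        ]

    def score(self, message):
        return sum(points for _, _, points in self.explain(message))


scorer = Scorer()
scorer.add_rule("subject", "[build]", -50)

boss = {"from": "boss@example.com", "subject": "Planning"}
bot = {"from": "ci@example.com", "subject": "[build] nightly failed"}
scorer.observe(boss, was_read=True)
scorer.observe(bot, was_read=False)

# Highest score first, which is the order an inbox would be shown in.
inbox = sorted([bot, boss], key=scorer.score, reverse=True)
```

Nothing here is smarter than a substring match and a running total, which is the whole appeal.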

Lately I've begun stretching this system to work with more sources of information, pushing more of what I read through it and letting the computer more or less manage that flow for me. Key to this is a piece of software called the Universal Aggregator, a small constellation of tools which lets you take feeds of messages and store them in Maildir format, which is how I currently consume my email. UA is designed to work with RSS, but its composable nature means that you can wire other things up to it pretty easily. The most impressive part of UA, in my opinion, is its ability to inline images from the source HTML as MIME multipart, which means your mail client can load the images from weird tumblr art blogs.
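For anyone who hasn't seen what "inline images as MIME multipart" looks like, here is a rough sketch using Python's standard email package. This is not ua-inline's code, just the general technique as I understand it: the image rides along as a related part, and the HTML points at it with a cid: URL instead of the remote one. The inline_image function, the subject line and the feeds.invalid domain are placeholders of mine.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def inline_image(html, src, image_bytes, subtype="png"):
    """Return a message whose HTML part shows image_bytes in place of src."""
    # make_msgid returns something like <...@feeds.invalid>; the Content-ID
    # header keeps the angle brackets, the cid: URL drops them.
    cid = make_msgid(domain="feeds.invalid")
    msg = EmailMessage()
    msg["Subject"] = "Feed entry"
    msg.set_content(html.replace(src, "cid:" + cid[1:-1]), subtype="html")
    # add_related turns the message into multipart/related and attaches the
    # image as an inline part carrying the matching Content-ID.
    msg.add_related(image_bytes, maintype="image", subtype=subtype, cid=cid)
    return msg
```

A real inliner would also have to fetch each image and work out its type; this only shows the shape of the resulting message.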

Gnus already has an RSS backend, but I don't use it. It's slow to fetch (it has to do HTTP requests every time you open Gnus or load the group), and I prefer to have things like that batched into a single group in Gnus without having to fight Virtual Groups. Universal Aggregator provides a simple shell-script configuration interface which, horrifyingly enough, is actually treated as a shell script within the ggs process manager in Universal Aggregator.

default_timeout=30

rss() {
    command 2000 "rss2json \"$1\" | ua-inline | maildir-put -cache /data/ua-cache -root /data/Maildir-feeds -folder \"$2\""
}

rss "https://usesthis.com/feed/" TechBlog
rss "http://jeff-vogel.blogspot.com/feeds/posts/default?alt=rss" TechBlog
rss "http://randsinrepose.com/feed/" Blogs

It's that simple. You define commands, and then run those commands on URLs. The commands rss2json, ua-inline and maildir-put all do what they say on the tin: together they form a shell pipeline which turns an RSS feed into entries in a Maildir. I run this inside of a Docker container with my personal Maildir mounted on /data, and Bob's your uncle.
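If the last stage of that pipeline feels like magic, here is roughly what "put an entry in a Maildir" amounts to, sketched with Python's standard mailbox module. It is not maildir-put itself: put_entry is a name I made up, the entry keys are borrowed from the JSON my tweet script (further down) emits, the feeds.invalid domain is a placeholder, and there is no equivalent of the -cache deduplication here. Python also stores subfolders Maildir++ style (as .TechBlog), which may not match how maildir-put lays things out.

```python
import mailbox
from email.message import EmailMessage
from email.utils import formataddr


def put_entry(root, folder, entry):
    """File one feed entry (a dict) as a message under root/folder."""
    box = mailbox.Maildir(root, create=True)
    try:
        target = box.get_folder(folder)
    except mailbox.NoSuchMailboxError:
        target = box.add_folder(folder)

    msg = EmailMessage()
    msg["From"] = formataddr((entry["author"], entry["authorEmail"]))
    msg["Subject"] = entry["title"]
    msg["Date"] = entry["date"]
    msg["Message-ID"] = "<%s@feeds.invalid>" % entry["id"]
    msg.set_content(entry["body"], subtype="html")

    # add() writes the message into the folder and returns its key.
    return target.add(msg)
```

Once the entry is a file in a Maildir, anything that reads mail can read it, Gnus included.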

Expanding on this, I've begun to look at which of the other data sources I consume would benefit from the offline, scored system that my RSS feeds and mail already enjoy. The obvious low-hanging fruit was Twitter. I spent an afternoon whipping up this simple Python script, which uses Tweepy to push out JSON in the format Universal Aggregator's maildir-put expects.

import calendar
import json
from email import utils

import click
import tweepy

# Credentials for the application registered on dev.twitter.com.
auth = tweepy.OAuthHandler("XXX", "XXX")
auth.set_access_token("XXX", "XXX")
api = tweepy.API(auth)

@click.group()
def cli():
    pass

def make_2822_date(dt):
    # created_at is in UTC, so convert with timegm; mktime would read it as local time.
    return utils.formatdate(calendar.timegm(dt.utctimetuple()))

def render_tweet(status):
    # Build one entry in the JSON shape that gets piped to maildir-put.
    references = None
    body = '<a href="https://twitter.com/{twuser}/status/{twid}">{twuser}</a>: {twbody}'.format(
        twuser=status.user.screen_name,
        twid=status.id_str,
        twbody=status.text
    )
    # Append attached media as <img> tags so they can be inlined downstream.
    for medium in status.entities.get("media") or []:
        body += '<br/><img src="{twimg}"/>'.format(
            twimg=medium["media_url_https"]
        )
    if status.in_reply_to_status_id:
        body += '<br/> <a href="https://twitter.com/{twuser}/status/{twid}">in reply to {twuser}</a>'.format(
            twuser=status.in_reply_to_screen_name,
            twid=status.in_reply_to_status_id_str
        )
        references = [status.in_reply_to_status_id_str]
    return {
        'author': status.author.name,
        'title': status.text,
        'id': status.user.screen_name + "_" + str(status.id),
        'date': make_2822_date(status.created_at),
        'body': body,
        'references': references,
        'authorEmail': status.user.screen_name + "@twitter.com"
    }

@cli.command()
def home():
    for tweet in api.home_timeline():
        print(json.dumps(render_tweet(tweet)))

# The subcommand is still called "list"; the function is renamed so it
# doesn't shadow the list builtin.
@cli.command(name="list")
@click.option('--owner', type=str)
@click.option('--slug', type=str)
def list_timeline(owner, slug):
    for tweet in api.list_timeline(owner_screen_name=owner, slug=slug):
        print(json.dumps(render_tweet(tweet)))

if __name__ == '__main__':
    cli()

I'm a pretty heavy user of Twitter lists, so the script supports rendering your home timeline and any list your account has access to. Register an application on dev.twitter.com, put the credentials in, and you're off to the races.

twitter_list() {
    command 1800 "python /usr/local/bin/tweets.py list --owner \"$1\" --slug \"$2\" | ua-inline | maildir-put -cache /data/ua-cache -root /data/Maildir-feeds -folder \"$3\""
}

twitter_home() {
    command 1800 "python /usr/local/bin/tweets.py home | ua-inline | maildir-put -cache /data/ua-cache -root /data/Maildir-feeds -folder \"$1\""
}

twitter_home Twitter
twitter_list rrrrrrrix artists Twitter
twitter_list rrrrrrrix not-sad-twitter Twitter
twitter_list rrrrrrrix work-peeps Twitter

Overall it's proven to be a very effective way to keep up with current events and the weird Twitter art bots that I enjoy following. Gnus's Adaptive Scoring is weighted towards things that post a lot, given there's no age-based burnoff for the scores, so accounts like @tweegeemee and @archillect, which post automatically, always bubble up to the top of my "inbox." I make aggressive use of Gnus Limiting to cut down the number of posts by date (only show me posts from less than 0.25 days ago[1]), by score (only show me posts that pass muster), or by the author of the post. I get roughly 2000 tweets into this system while I'm asleep, and I can pretty readily "catch up" on them while I'm on the bus on the way in to work, focusing on the 400-500 that Gnus thinks are relevant to me, or I can limit it further when I'm drinking my coffee at work.
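The limiting itself is nothing fancy. As a rough illustration of what those limits do (this is Python, not what Gnus runs, and the limit function and the timestamp and score keys are invented for the example), an age limit in fractional days plus a score floor comes down to:

```python
import time

SECONDS_PER_DAY = 86400.0


def limit(articles, max_age_days=None, min_score=None, now=None):
    """Keep articles younger than max_age_days that score at least min_score."""
    now = time.time() if now is None else now
    kept = []
    for article in articles:
        # 0.25 days works out to six hours.
        age_days = (now - article["timestamp"]) / SECONDS_PER_DAY
        if max_age_days is not None and age_days >= max_age_days:
            continue
        if min_score is not None and article["score"] < min_score:
            continue
        kept.append(article)
    return kept
```

Calling limit(articles, max_age_days=0.25, min_score=0) would leave only the last six hours of posts that haven't been scored down.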

This script currently lacks the context of quoted tweets, doesn't OCR images, and so on, but it's a surprisingly effective way to manage these things, and it actually works, unlike the "In case you missed it" that Twitter provides. If I open up twitter dot com I can treat it as a slice of "current time", ignoring anything below the fold, knowing that the good good good #content that I desire is there waiting for me on my laptop. When I'm on a device that doesn't have access to my Maildir, say if I'm out walking or I didn't bring my laptop to work with me, I have no excuse not to pick up a book or open up my Pocket queue instead of enjoying this junk food. I uninstalled Twitter and Facebook from my phone; let's see how it treats me.

Attachments


  1. gnus-summary-limit-to-age only supports days, but thankfully works with floats. 💯