Delete old tweets selectively using Python and Tweepy

For some time I’ve used an online service to delete tweets that are more than one week old. I do this because I use Twitter for levity, for throwaway comments and retweets on issues of the day, and I don’t really want those saved for posterity. Thanks to search crawlers and caches I can never be certain that tweets are gone forever, but this is a small step in that direction.

When I joined Keybase I discovered that I needed to prevent my ‘proof’ tweet from being deleted, and the simple method used by the online deletion service was no longer an option. My solution uses an exception list containing the IDs of the tweets I wish to save, and these are ignored when their contemporaries are merged with the infinite.

I’ve written a Python script that uses Tweepy to scan the contents of my timeline and delete any tweet that meets two criteria: it is more than seven days old and it is not in my exception list. It’s very simple and there are probably better ways of doing it (please let me know), but it works well for me as a nightly cron job.
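The two criteria boil down to a single test per tweet. Here is a minimal sketch of that test as a standalone function. The name `should_delete` and its parameters are my own, for illustration only, and it assumes `created_at` and `now` are naive UTC `datetime` values, which is what Tweepy 3.x returns for `created_at`.

```python
from datetime import datetime, timedelta


def should_delete(tweet_id, created_at, now, save_ids, days_to_keep=7):
    """Return True if a tweet is older than the cutoff and not in save_ids.

    created_at and now are assumed to be naive UTC datetimes.
    """
    cutoff = now - timedelta(days=days_to_keep)
    return tweet_id not in save_ids and created_at < cutoff


# a set gives constant-time membership checks for the exception list
save_ids = {573245340398170114}
now = datetime(2015, 3, 20, 12, 0)

print(should_delete(1, datetime(2015, 3, 1), now, save_ids))   # True: old, not saved
print(should_delete(2, datetime(2015, 3, 19), now, save_ids))  # False: too recent
print(should_delete(573245340398170114, datetime(2015, 3, 1), now, save_ids))  # False: saved
```

A tweet posted exactly at the cutoff is kept, because the comparison is strictly "older than".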

Please note that because I’ve been deleting my old tweets this way for some time, each nightly run only has a handful of tweets to remove, and I’ve never had issues with the Twitter API rate limits. Every deletion is an API call, though, so if you have many tweets you may need to limit how many are processed in the first few runs by passing a limit to the .items() method. This is demonstrated in the Tweepy cursor tutorial.
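If you are starting with a large backlog, another way to stay under the limits is to cap how many deletions a single run makes and let the nightly job work through the rest over several nights. The sketch below does that with `itertools.islice`; `cap_deletions` is a hypothetical helper, and the tweets are stand-in `(id, created_at)` tuples rather than Tweepy objects. With Tweepy itself, `tweepy.Cursor(api.user_timeline).items(200)` caps the number of tweets fetched rather than deleted, and `tweepy.API(auth, wait_on_rate_limit=True)` asks Tweepy to wait out rate limits it detects instead of failing.

```python
from datetime import datetime
from itertools import islice


def cap_deletions(tweets, save_ids, cutoff, max_deletions):
    """Return an iterator over at most max_deletions tweet IDs to delete.

    tweets is any iterable of (tweet_id, created_at) pairs, newest first,
    as a timeline would return them.
    """
    candidates = (
        tweet_id
        for tweet_id, created_at in tweets
        if tweet_id not in save_ids and created_at < cutoff
    )
    # islice stops pulling from the generator (and so from the timeline)
    # as soon as the cap is reached
    return islice(candidates, max_deletions)


cutoff = datetime(2015, 3, 13)
timeline = [
    (105, datetime(2015, 3, 19)),  # too recent: kept
    (104, datetime(2015, 3, 10)),
    (103, datetime(2015, 3, 9)),   # in the exception list: kept
    (102, datetime(2015, 3, 8)),
    (101, datetime(2015, 3, 7)),
]
print(list(cap_deletions(timeline, {103}, cutoff, 2)))  # [104, 102]
```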

To get the required authentication keys you will need to register a Twitter application.
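Once you have the keys, you may prefer not to paste them into the script itself, particularly if it lives in version control. A minimal sketch, assuming you export them as environment variables; the variable names and the `load_keys` helper are my own, not anything Twitter or Tweepy defines.

```python
import os

# hypothetical variable names; choose whatever suits your cron environment
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_keys(environ=os.environ):
    """Return the four keys in the order the script assigns them.

    Raises KeyError naming every missing variable, so a cron run fails loudly.
    """
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(environ[name] for name in KEY_NAMES)
```

The result would slot into the script as `consumer_key, consumer_secret, access_token, access_token_secret = load_keys()`. Bear in mind that cron runs jobs with a minimal environment, so the variables need to be set in the crontab or in a wrapper script.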

Update

Since my initial post I’ve added functionality to unfavor (or ‘unfavorite’) tweets, too. I’ve included the full script below.

#!/usr/bin/env python
# Written for Tweepy 3.x; Tweepy 4 renamed api.favorites to api.get_favorites.
from __future__ import print_function

import tweepy
from datetime import datetime, timedelta

# options
test_mode = False    # report what would be deleted without deleting anything
verbose = False      # print each tweet as it is processed
delete_tweets = True
delete_favs = True
days_to_keep = 7

tweets_to_save = [
    573245340398170114, # keybase proof
    573395137637662721, # a tweet to this very post
]
favs_to_save = [
    362469775730946048, # tony this is icac
]

# auth and api
consumer_key = 'XXXXXXXX'
consumer_secret = 'XXXXXXXX'
access_token = 'XXXXXXXX'
access_token_secret = 'XXXXXXXX'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# set cutoff date, use utc to match twitter
cutoff_date = datetime.utcnow() - timedelta(days=days_to_keep)

# delete old tweets
if delete_tweets:
    # get all timeline tweets
    print("Retrieving timeline tweets")
    timeline = tweepy.Cursor(api.user_timeline).items()
    deletion_count = 0
    ignored_count = 0

    for tweet in timeline:
        # where tweets are not in save list and older than cutoff date
        if tweet.id not in tweets_to_save and tweet.created_at < cutoff_date:
            if verbose:
                print("Deleting %d: [%s] %s" % (tweet.id, tweet.created_at, tweet.text))
            if not test_mode:
                api.destroy_status(tweet.id)

            deletion_count += 1
        else:
            ignored_count += 1

    print("Deleted %d tweets, ignored %d" % (deletion_count, ignored_count))
else:
    print("Not deleting tweets")

# unfavor old favorites
if delete_favs:
    # get all favorites
    print("Retrieving favorite tweets")
    favorites = tweepy.Cursor(api.favorites).items()
    unfav_count = 0
    kept_count = 0

    for tweet in favorites:
        # where tweets are not in save list and older than cutoff date
        # (created_at is when the tweet was posted, not when it was favorited)
        if tweet.id not in favs_to_save and tweet.created_at < cutoff_date:
            if verbose:
                print("Unfavoring %d: [%s] %s" % (tweet.id, tweet.created_at, tweet.text))
            if not test_mode:
                api.destroy_favorite(tweet.id)

            unfav_count += 1
        else:
            kept_count += 1

    print("Unfavored %d tweets, ignored %d" % (unfav_count, kept_count))
else:
    print("Not unfavoring tweets")
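The two loops in the script differ only in their source, their exception list and the destroy call, so they could be folded into one function. Here is a sketch of that refactor, with hypothetical names (`purge`, `destroy`, `Tweet`). The stand-in tweets are namedtuples carrying the `id` and `created_at` attributes the script reads from Tweepy’s status objects.

```python
from collections import namedtuple
from datetime import datetime


def purge(items, save_ids, cutoff, destroy, test_mode=False):
    """Destroy items older than cutoff that are not in save_ids.

    destroy is called with each item's id (e.g. api.destroy_status or
    api.destroy_favorite); in test mode it is never called, but the
    counts are still reported. Returns (destroyed, ignored).
    """
    destroyed = ignored = 0
    for item in items:
        if item.id not in save_ids and item.created_at < cutoff:
            if not test_mode:
                destroy(item.id)
            destroyed += 1
        else:
            ignored += 1
    return destroyed, ignored


# stand-in for Tweepy status objects
Tweet = namedtuple("Tweet", "id created_at")

cutoff = datetime(2015, 3, 13)
tweets = [
    Tweet(1, datetime(2015, 3, 1)),   # old: destroyed
    Tweet(2, datetime(2015, 3, 19)),  # recent: ignored
    Tweet(3, datetime(2015, 3, 2)),   # old but saved: ignored
]
deleted_ids = []
print(purge(tweets, {3}, cutoff, deleted_ids.append))  # (1, 2)
```

In the script the calls would become something like `purge(tweepy.Cursor(api.user_timeline).items(), tweets_to_save, cutoff_date, api.destroy_status, test_mode)`, and likewise for favorites with `api.destroy_favorite`.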

Favourite songs of 2014

My top thirty tracks of the year. It starts with my top ten; the rest are ordered by (Spotify) track length.

Spotify playlist
YouTube playlist

“I’m wearing Win Butler’s hair
There’s a scalpless singer of a Montreal rock band somewhere
And he’s all right”

  1. Jessica Lea Mayfield – Do I Have The Time
  2. Röyksopp & Robyn – Monument (The Inevitable End Version)
  3. Happyness – Montreal Rock Band Somewhere
  4. St. Vincent – Birth In Reverse
  5. La Roux – Kiss And Not Tell
  6. Bombay Bicycle Club – Luna
  7. Alvvays – Archie, Marry Me
  8. Eleanor Dunlop – Disguise
  9. The Preatures – Better Than It Ever Could Be
  10. Bertie Blackman – War Of One
  • Bad//Dreems – Dumb Ideas
  • The Bohicas – XXX
  • SBTRKT – NEW DORP. NEW YORK
  • Broods – Mother & Father
  • East – Your Ghost
  • St. Vincent – Digital Witness
  • The Griswolds – Beware The Dog
  • The Preatures – Somebody’s Talking
  • Highasakite – Darth Vader
  • Ecca Vandal – White Flag
  • Kimbra – 90s Music
  • First Aid Kit – My Silver Lining
  • Jack White – Lazaretto
  • CHVRCHES – Bela Lugosi’s Dead
  • The Babe Rainbow – Secret Enchanted Broccoli Forest
  • Superfood – Right On Satellite
  • Interpol – All The Rage Back Home
  • City Calm Down – Pavement
  • Glass Animals – Pools
  • Lana Del Rey – Shades Of Cool

Honourable mention:

My top thirty are those I can play over and over again, and although this one doesn’t qualify on that count, it’s the most entertaining song of 2014. It also did us a favour by allowing us to listen to the catchy tune of the original without being subjected to its lyrics.

Use Getflix or Unblock-Us servers selectively with Dnsmasq

I subscribe to Getflix, which is quite similar to Unblock-Us in that it allows users to access geo-blocked content. The basic way to use these services is to set one’s device to use their DNS servers, but this sends all DNS requests their way. I wanted to use their DNS servers only to resolve specific geo-blocked domains.

There are a couple of reasons you might want to do this – you may be concerned about yet another party being privy to your site visits, and in my case I wanted to retain the faster, closer DNS servers provided by my ISP for the majority of my web requests.
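Put another way, what I wanted was a rule of the form "these domains go to Getflix, everything else goes to my ISP". Dnsmasq expresses that with `server=/domain/ip` lines, which match a domain and all of its subdomains, with more specific domains taking precedence. The toy model below illustrates that matching in Python. It is not Dnsmasq’s code: the ISP address is a documentation-range placeholder, and `pick_upstream` and `RULES` are hypothetical names.

```python
GETFLIX = "54.252.183.4"
ISP = "192.0.2.53"  # placeholder for the ISP's resolver (documentation range)

# domain -> upstream, in the spirit of dnsmasq's server=/domain/ip lines
RULES = {
    "netflix.com": GETFLIX,
    "hulu.com": GETFLIX,
    "bbc.co.uk": GETFLIX,
}


def pick_upstream(hostname, rules=RULES, default=ISP):
    """Return the upstream server for hostname.

    A rule for example.com also covers www.example.com; the most
    specific matching rule wins, and unmatched names use the default.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # try the full name first, then drop one leading label at a time
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in rules:
            return rules[suffix]
    return default


print(pick_upstream("www.netflix.com"))  # 54.252.183.4
print(pick_upstream("example.org"))      # 192.0.2.53
```

Matching whole labels rather than using a plain `endswith` check matters: `notnetflix.com` should not be sent to Getflix just because it ends in `netflix.com`.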

Dnsmasq is included in several flavours of custom firmware available for many consumer routers, but since that wasn’t an option for me I’ve set it up on my NAS, which runs Ubuntu Server. There are many guides for setting up Dnsmasq on many systems (for me it was as easy as “sudo apt-get install dnsmasq”), so I’ll just stick to explaining why I’ve configured it as I have.

Here is my Dnsmasq configuration file. Much of it isn’t necessary for this goal, but I’ve kept it intact for context. I’ll explain why I made certain decisions, in case it helps someone else.

# /etc/dnsmasq.conf

# regular dns servers (IPs redacted)
server=x.x.x.x
server=x.x.x.x
server=x.x.x.x
server=x.x.x.x

# getflix primary dns
server=/getflix.com.au/54.252.183.4
server=/netflix.com/54.252.183.4
server=/watchafl.afl.com.au/rightster.com/54.252.183.4
server=/hulu.com/a248.e.akamai.net/54.252.183.4
server=/pbs.org/54.252.183.4
server=/bbc.co.uk/cp143012-i.akamaihd.net/54.252.183.4
server=/itv.com/54.252.183.4
server=/channel4.com/54.252.183.4

# getflix secondary dns
server=/getflix.com.au/54.252.183.5
server=/netflix.com/54.252.183.5
server=/watchafl.afl.com.au/rightster.com/54.252.183.5
server=/hulu.com/a248.e.akamai.net/54.252.183.5
server=/pbs.org/54.252.183.5
server=/bbc.co.uk/cp143012-i.akamaihd.net/54.252.183.5
server=/itv.com/54.252.183.5
server=/channel4.com/54.252.183.5

# settings
interface=em1       # accept requests from the em1 interface
bogus-priv          # don't forward reverse lookups for private (local) IP ranges
domain-needed       # don't forward incomplete hostnames (names without dots)
no-resolv           # don't read /etc/resolv.conf to get upstream servers
all-servers         # query all servers at once, use the first reply
#strict-order       # query servers in the order they appear
domain=local        # set the domain name of this network
local=/local/       # set selected domains to only resolve locally
expand-hosts        # add our domain name to our local hostnames
cache-size=10000    # increase the cache to 10k records
no-hosts            # don't use the regular hosts file
addn-hosts=/etc/dnsmasq.hosts   # use alternate hosts file

# dhcp: set range, netmask and lease time for unidentified clients
dhcp-range=192.168.1.100,192.168.1.199,255.255.255.0,168h
read-ethers                     # read the /etc/ethers file for static assignment
dhcp-option=3,192.168.1.1       # set the gateway (router)

# logging
log-facility=/var/log/dnsmasq   # log file
#log-queries                    # log dns queries
#log-dhcp                       # log dhcp activity

# disable a bunch of windows stuff
filterwin2k                     # block certain unnecessary windows requests
dhcp-option=19,0                # set ip-forwarding off
dhcp-option=44,0.0.0.0          # set netbios-over-TCP/IP (WINS) nameserver(s)
dhcp-option=45,0.0.0.0          # netbios datagram distribution server
dhcp-option=46,8                # netbios node type
dhcp-option=252,"\n"            # tell windows not to ask for proxy info
dhcp-option=vendor:MSFT,2,1i    # tell windows to release lease on shutdown

The upstream DNS servers were chosen for their speed from my location (according to namebench). Farther down I’ve also set the “all-servers” flag, which means that every query is sent to all of the upstream servers configured for it, and the first response is accepted. Like this fellow, I found that this resulted in a tremendous increase in resolution speed. It’s a terrible setting for a big network because of the extra traffic, but since I’m just a home user and I’m caching my requests it’s not such a big deal. Were I not using it I might have gone for the “strict-order” option, to ensure that the faster servers I’ve listed at the top are tried first.
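The difference between “all-servers” and “strict-order” is easy to model: one races every upstream and takes the first good answer, the other tries them in turn. The sketch below simulates that with `concurrent.futures` and fake resolvers that just sleep. The server names, delays and the `race`/`in_order` helpers are invented for illustration, and this models the behaviour only, not how Dnsmasq is implemented.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def fake_resolver(name, delay, answer="203.0.113.7"):
    """Return a stand-in resolver that answers after a fixed delay."""
    def resolve(hostname):
        time.sleep(delay)
        return name, answer
    return resolve


def dead_resolver(hostname):
    """A stand-in for a server that never answers."""
    raise OSError("timed out")


def race(resolvers, hostname):
    """all-servers style: ask every resolver at once, keep the first good reply."""
    pool = ThreadPoolExecutor(max_workers=len(resolvers))
    pending = {pool.submit(resolve, hostname) for resolve in resolvers}
    try:
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    return future.result()
        raise OSError("no upstream server answered")
    finally:
        pool.shutdown(wait=False)  # don't wait for the slower servers


def in_order(resolvers, hostname):
    """strict-order style: ask each resolver in turn until one answers."""
    for resolve in resolvers:
        try:
            return resolve(hostname)
        except OSError:
            continue
    raise OSError("no upstream server answered")


servers = [fake_resolver("slow", 0.3), fake_resolver("fast", 0.05)]
print(race(servers, "example.com")[0])      # fast
print(in_order(servers, "example.com")[0])  # slow
```

The race wins on latency at the cost of one query per upstream, which is exactly the extra traffic that makes “all-servers” a poor fit for a big network.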

The Getflix server block defines which domains are to be resolved via the Getflix servers, using some domains I found here, plus a few more that hadn’t been added to that list at the time of writing. Each server line tells Dnsmasq to resolve the listed domains, and their subdomains, using the given DNS server. I could have put all of them on one line, but preferred to separate them according to the service being accessed. I have repeated this whole block for the secondary Getflix DNS server.
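Because the same block is repeated for each Getflix server, the two copies can drift apart when a domain is added to only one of them. One option is to generate both blocks from a single list. This is a small Python sketch under that assumption (`SERVICES` and `server_lines` are my own names); it emits lines in the same `server=/domain/.../ip` form as the config above.

```python
GETFLIX_SERVERS = ["54.252.183.4", "54.252.183.5"]

# one entry per service, mirroring the grouping in /etc/dnsmasq.conf
SERVICES = [
    ["getflix.com.au"],
    ["netflix.com"],
    ["watchafl.afl.com.au", "rightster.com"],
    ["hulu.com", "a248.e.akamai.net"],
    ["pbs.org"],
    ["bbc.co.uk", "cp143012-i.akamaihd.net"],
    ["itv.com"],
    ["channel4.com"],
]


def server_lines(services, servers):
    """Yield one server=/d1/d2/.../ip line per service per upstream server."""
    for ip in servers:
        for domains in services:
            yield "server=/%s/%s" % ("/".join(domains), ip)


if __name__ == "__main__":
    for line in server_lines(SERVICES, GETFLIX_SERVERS):
        print(line)
```

The output could be pasted into /etc/dnsmasq.conf in place of the two hand-written blocks.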

I’ve commented the settings, but a few are worth mentioning. I have specified the interface to listen on even though there’s only one point of entry on my network, because recent versions of Dnsmasq refused my queries when nothing was specified here, which is the opposite of their previous behaviour.

I’ve specified that Dnsmasq is not to read nameservers from the /etc/resolv.conf file and not to read hostnames from the /etc/hosts file. Both of these are used by the system for other purposes as well, and I wanted to keep Dnsmasq ‘clean’. I’ve specified my own hosts file specifically for Dnsmasq instead. It looks something like this:

# /etc/dnsmasq.hosts
192.168.1.1     red
192.168.1.2     green
192.168.1.10    blue
192.168.1.20    yellow
192.168.1.30    purple
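With “expand-hosts” and “domain=local” set, a bare name such as blue in this file should answer both as blue and as blue.local, while names that already contain a dot are left alone. The sketch below models that lookup table in Python; `parse_hosts` is a hypothetical helper, not part of Dnsmasq, and it only handles the simple "IP name [name...]" lines shown here.

```python
def parse_hosts(text, domain=None):
    """Map hostnames to IPs from hosts-file text.

    If domain is given, simple names (no dot) are also added as
    name.domain, modelling dnsmasq's expand-hosts.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue  # skip blank and comment-only lines
        ip, *names = line.split()
        for name in names:
            table[name] = ip
            if domain and "." not in name:
                table["%s.%s" % (name, domain)] = ip
    return table


HOSTS = """\
# /etc/dnsmasq.hosts
192.168.1.1     red
192.168.1.2     green
192.168.1.10    blue
"""

table = parse_hosts(HOSTS, domain="local")
print(table["blue.local"])  # 192.168.1.10
```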

Dnsmasq is also being used as a DHCP server, so I’m specifying my gateway (the router) and an IP range to be used for unidentified clients. The range includes a netmask, which is required because my router is acting as a DHCP relay. Thanks to the “read-ethers” option I can specify clients requiring static IPs in the /etc/ethers file, which looks a little like this:

# /etc/ethers
28:91:4a:2b:0b:21 192.168.1.2
20:c7:d0:9b:db:7f 192.168.1.10
08:50:6e:e7:3e:95 192.168.1.20
84:35:35:3f:15:78 192.168.1.30
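When mixing “read-ethers” static assignments with a “dhcp-range” pool, it’s common practice (not a Dnsmasq requirement, as far as I know) to keep the static addresses inside the subnet but out of the dynamic range, so the two never compete. A quick Python check of that, using the range and netmask from the config above; `check_ethers` is a hypothetical helper, and it assumes every entry uses an IP address rather than a hostname.

```python
import ipaddress

# from the config: dhcp-range=192.168.1.100,192.168.1.199,255.255.255.0,168h
POOL_START = ipaddress.IPv4Address("192.168.1.100")
POOL_END = ipaddress.IPv4Address("192.168.1.199")
SUBNET = ipaddress.IPv4Network("192.168.1.0/255.255.255.0")


def check_ethers(text):
    """Return a list of problems found in /etc/ethers-style text."""
    problems = []
    seen = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mac, ip_text = line.split()
        ip = ipaddress.IPv4Address(ip_text)
        if ip not in SUBNET:
            problems.append("%s: %s is outside %s" % (mac, ip, SUBNET))
        elif POOL_START <= ip <= POOL_END:
            problems.append("%s: %s is inside the dynamic pool" % (mac, ip))
        if ip in seen:
            problems.append("%s: %s already assigned to %s" % (mac, ip, seen[ip]))
        seen[ip] = mac
    return problems


ETHERS = """\
28:91:4a:2b:0b:21 192.168.1.2
20:c7:d0:9b:db:7f 192.168.1.10
"""
print(check_ethers(ETHERS))  # []
```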

While troubleshooting my setup I was logging DHCP and DNS activity on top of the standard Dnsmasq reporting, but I’ve turned both off now. The final block of the config turns off a bunch of Windows-related features; I do have Windows clients, but my network is so small that these features are pointless overheads.

That’s about it! Let me know if you have any questions about my configuration, or if you can help me improve upon it. My thanks to these articles, which pointed me in the right direction:

Updates

17 May 2014: Since posting this I’ve changed routers, and the new one doesn’t support DHCP relaying, so I’m now doing DHCP on the router itself and simply using Dnsmasq for DNS. I have commented out all of the DHCP lines in /etc/dnsmasq.conf and therefore no longer use /etc/ethers, but everything still works as before.