Since FriendFeed was bought by Facebook, it has been doomed, both for itself and for its users. Yes, there is still a team maintaining it and fixing issues, but it has become a zombie, I would say. I stayed until two months ago, when I removed all the services I had added and removed it from yjl.im. Zombie that it is, it kept grabbing my stuff from a few sources for a few weeks afterwards. I didn’t report it because I didn’t care.
A few days ago, I decided to remove some entries. I had wanted to do that for a long time, ever since I saw many links from FriendFeed (to this blog) reported in Webmaster Tools. Of course I would have those links: I added my blog to FriendFeed. It’s the same reason I often don’t link to my blog posts from the screenshots I upload to Flickr. I felt I was spamming myself. Time to fix it.
I am not trying to remove all entries, only those that were never commented on or liked. I also don’t remove FriendFeed-native entries, for example an entry you write, or files or images you upload, directly on FriendFeed. I want to keep everything original intact. Blog entries, YouTube favorites, Last.fm favorites, and so on will still be intact on the source websites after I remove them from my FriendFeed account. But likes and comments on FriendFeed are original content, so I keep the entries that have them.
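Put as a rule, the decision made for each entry boils down to a small predicate. This is a sketch extracted from the script below; should_delete is just an illustrative name:

def should_delete(entry):
    if entry['service']['id'] == 'internal':
        return False  # written or uploaded directly on FriendFeed: keep
    if entry['comments'] or entry['likes']:
        return False  # has original FriendFeed activity: keep
    return True       # a mere echo of another site: safe to delete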
Unfortunately, there is an API rate limit on deletion (or on write operations generally). Of course I use the API; you don’t expect me to delete 9,430 entries (out of 10,091 in my account) by mouse clicks, do you? I don’t know the exact rate; it seems to be 100 requests per a few hours, or per day, I am not sure. Conservatively speaking, it’s a 100-day job. I wrote an email to the API team asking for details about the rate limit, but I haven’t gotten a response.
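Taking the conservative reading of 100 deletions per day, the estimate is simple arithmetic (numbers from my account above):

>>> 9430 / 100.0   # entries to delete / assumed daily quota
94.3

Call it roughly a 100-day job.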
Since this is a one-time script, I’ll just post the code here.
#!/usr/bin/env python

import datetime
import getpass
import re
import shelve
import sys
import time
from urllib2 import HTTPError

from friendfeed import FriendFeed

NUM = 100                # entries fetched per feed request
DELETION_INTERVAL = 30   # seconds between deletion requests
RE_TITLE = re.compile('(.*) - <a rel.*')


def print_eta(n, extra=0, est_only=False):
    # Silenced on purpose: without exact rate-limit information there is
    # no meaningful ETA to print (see the note below the script).
    return
    eta = 3600 * n / 100 + extra
    est = datetime.datetime.now() + datetime.timedelta(seconds=eta)
    if est_only:
        print '[%s]' % est
    else:
        print 'Estimated time to complete: %d seconds, at %s' % (eta, est)


def main():
    ff = FriendFeed()
    nickname = raw_input('Your FriendFeed Nickname: ')
    # data['entries'] maps entry id -> (marked_for_deletion, deleted)
    data = shelve.open('%s.data' % nickname)
    if 'start' not in data:
        start = 0
    else:
        start = data['start']
    if start == -1:
        # Stage two: retrieval already finished, start deleting
        entries = data['entries']
        marked = len([True for v in entries.values() if v[0]])
        total = len(entries)
        del_queue = [entry for entry, value in entries.items()
                     if value[0] and not value[1]]
        print '%d out of %d entries marked for deletion.' % (marked, total)
        print '%d deleted, %d left to delete.' % (marked - len(del_queue),
                                                  len(del_queue))
        print
        if not del_queue:
            return
        print 'You can find your Remote Key at http://friendfeed.com/remotekey'
        print
        remote_key = getpass.getpass('Please enter your remote key [no echo]: ')
        ff = FriendFeed(nickname, remote_key)
        print
        print_eta(len(del_queue), extra=5)
        print 'Starting deletion (every %d seconds a request) in 5 seconds...' % DELETION_INTERVAL
        print
        time.sleep(5)
        del_count = 0
        try:
            while del_count < len(del_queue):
                e_id = del_queue[del_count]
                try:
                    result = ff._fetch('/api/entry/delete', {'entry': e_id})
                except HTTPError, e:
                    data['entries'] = entries
                    data.sync()
                    if e.code == 403 and 'limit-exceeded' in e.read():
                        print
                        print 'Failed to delete [%s], reached the rate limit.' % e_id
                        print_eta(len(del_queue) - del_count, extra=10*60)
                        print 'Sleeping for 10 minutes...'
                        time.sleep(10 * 60)
                        print
                        continue
                    raise e
                if result['success']:
                    entries[e_id] = (True, True)
                else:
                    print
                    print 'Failed to delete [%s]: ' % e_id, result
                    print 'Continue, anyway.'
                sys.stdout.write('#')
                sys.stdout.flush()
                del_count += 1
                if del_count % 50 == 0 or del_count == len(del_queue):
                    sys.stdout.write(' %d \n' % del_count)
                    print_eta(len(del_queue) - del_count, extra=10*60)
                    data['entries'] = entries
                    data.sync()
                time.sleep(DELETION_INTERVAL)
        except Exception, e:
            # Save progress before bailing out, so the script can resume
            data['entries'] = entries
            data.sync()
            raise e
        print 'Done.'
    else:
        # Stage one: retrieve entries and mark candidates for deletion
        entries = data.get('entries', {})
        while True:
            feed = ff.fetch_user_feed(nickname, start=start, num=NUM,
                                      maxcomments=1, maxlikes=1, hidden=1)
            ids = [entry['id'] for entry in feed['entries']]
            for e_id in ids:
                if e_id not in entries:
                    break
            else:
                # All already in entries
                print 'Retrieval is done.'
                break
            for entry in feed['entries']:
                if entry['id'] in entries:
                    continue
                if entry['service']['id'] == 'internal':
                    # Don't delete FriendFeed stuff
                    entries[entry['id']] = (False, False)
                #elif 'comments' not in entry and 'likes' not in entry:
                elif len(entry['comments']) + len(entry['likes']) == 0:
                    entries[entry['id']] = (True, False)
                else:
                    entries[entry['id']] = (False, False)
            print 'start=%d, entries=%d' % (start, len(entries))
            start += NUM
            data['entries'] = entries
            data['start'] = start
            data.sync()
            time.sleep(5)
        data['start'] = -1
        data.sync()
        marked = len([True for v in entries.values() if v[0]])
        total = len(entries)
        print '%d out of %d entries marked for deletion.' % (marked, total)
    data.close()


if __name__ == '__main__':
    main()
(I silenced print_eta() with an early return: since I don’t have exact rate-limit information, I couldn’t give a meaningful ETA.)
It’s a two-stage design. The first run collects entries and marks those that should be deleted; you will only be asked for your FriendFeed nickname. The collected data is stored in a nickname.data file.
% ./remove_lonely_entries.py
Your FriendFeed Nickname: livibetter
start=0, entries=100
start=100, entries=200
start=200, entries=300
start=300, entries=400
[...]
start=9600, entries=9691
start=9700, entries=9791
start=9800, entries=9891
start=9900, entries=9991
start=10000, entries=10091
Retrieval is done.
9430 out of 10091 entries marked for deletion.
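If you are curious about the saved state between runs, nickname.data is just a shelve file, so a quick interactive peek works (filename from the run above):

import shelve

data = shelve.open('livibetter.data')
print data['start']        # -1 once retrieval has finished
entries = data['entries']  # maps entry id -> (marked, deleted)
print len([1 for marked, deleted in entries.values() if marked and not deleted])
data.close()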
The second stage deletes the entries; this time you will also be asked for your remote key. I still use API v1: API v2 uses OAuth, and I am not sure whether it supports three-legged OAuth, so there is no need to trouble myself with that. Once you enter the key, deletion starts in five seconds. The script sends one deletion request every 30 seconds; if it gets a rate-limit-exceeded response, it sleeps for 10 minutes, then tries again.
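Stripped of the progress bookkeeping, that retry logic is just this loop (same constants and calls as in the script above):

while del_count < len(del_queue):
    try:
        result = ff._fetch('/api/entry/delete', {'entry': del_queue[del_count]})
    except HTTPError, e:
        if e.code == 403 and 'limit-exceeded' in e.read():
            time.sleep(10 * 60)  # rate limit hit: back off, retry same entry
            continue
        raise
    del_count += 1
    time.sleep(DELETION_INTERVAL)  # pace ourselves: one request per 30 seconds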
It should be safe to interrupt this script at any time; it will pick up where you forced it to leave off. There is no option to tell the script which stage to perform: it knows from the saved data. Simply run it without options and follow the instructions, and it will be fine.
Meh, I still have 8,887 entries to delete!
It gives these errors. What should I do to run it?
Something looks wrong with json in your Python 2.7. I wasn’t running it with Python 2.7; could you try 2.6 with simplejson?
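If you want to stay on 2.7, a fallback import near the top of friendfeed.py might also do it (untested guess on my side):

# Untested: prefer simplejson, fall back to the stdlib json module.
try:
    import simplejson as json
except ImportError:
    import json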
It's working with 2.6 and simplejson. Thank you.
One last question: how should I edit the script to make it delete all entries?
I am guessing you are not familiar with Python programming, so the easiest way is to modify the two lines that read:
entries[entry['id']] = (False, False)
to
entries[entry['id']] = (True, False)
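For clarity, after that change the classification block in the script looks like this (every branch now marks the entry):

if entry['service']['id'] == 'internal':
    entries[entry['id']] = (True, False)  # was (False, False)
elif len(entry['comments']) + len(entry['likes']) == 0:
    entries[entry['id']] = (True, False)
else:
    entries[entry['id']] = (True, False)  # was (False, False)

Note that you will also need to delete the nickname.data file first, otherwise the already-recorded entries will keep their old marks.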