I wanted to plot this blog's posting activity by month as a bar chart. I considered using the Blogger API, but I didn't want to write JavaScript (which is what I'd prefer if I used the API), and I didn't want to make multiple requests when there are more than 500 posts. So I decided to work from the exported XML file instead.

Here are the current statistics for this blog:

% ./b.py blog-01-11-2011.xml
2008-09  18 ###################
2008-10  25 ##########################
2008-11  57 ############################################################
2008-12  51 ######################################################
2009-01  32 ##################################
2009-02  13 #############
2009-03  27 ############################
2009-04  41 ###########################################
2009-05  14 ##############
2009-06   1 #
2009-07   1 #
2009-08   1 #
2009-09   1 #
2009-10  27 ############################
2009-11  18 ###################
2009-12  16 #################
2010-01  16 #################
2010-02   3 ###
2010-03   2 ##
2010-04   9 #########
2010-05  19 ####################
2010-06   2 ##
2010-07   1 #
2010-08  53 ########################################################
2010-09  62 ##################################################################
2010-10   8 ########
2010-11  51 ######################################################
2010-12  10 ##########
2011-01   3 ###

Total:  582 posts

% ./b.py blog-01-11-2011.xml comment
2008-09   1 #
2008-10   6 ######
2008-11   6 ######
2008-12  38 ######################################
2009-01  23 #######################
2009-02   8 ########
2009-03  16 ################
2009-04  10 ##########
2009-05   3 ###
2009-06   3 ###
2009-07   1 #
2009-08   2 ##
2009-09   0
2009-10   6 ######
2009-11   9 #########
2009-12   8 ########
2010-01   3 ###
2010-02   4 ####
2010-03   0
2010-04   1 #
2010-05   0
2010-06   0
2010-07   0
2010-08   0
2010-09  66 ##################################################################
2010-10   4 ####
2010-11  12 ############
2010-12   9 #########

Total:  239 comments

The Python code:

#!/usr/bin/env python
# ./script.py export.xml [<post|comment> [$(tput cols)]]

from __future__ import print_function  # print() works on Python 2 and 3
import sys

from pyquery import PyQuery as pq

kind = 'post' if len(sys.argv) < 3 else sys.argv[2]
# 12 columns are taken by the "YYYY-MM NNN " prefix of each row.
width = (78 if len(sys.argv) < 4 else int(sys.argv[3])) - 12

# Read bytes: lxml refuses a text string that carries an encoding declaration.
with open(sys.argv[1], 'rb') as f:
  # Rename the feed's default xmlns attribute so the Atom namespace does not
  # get in the way of plain selectors such as 'entry' and 'published'.
  b = pq(f.read().replace(b'feed xmlns=', b'feed xmlblahblahblah='), parser='xml')

entries = b('entry')
m_count = {}
for idx in range(len(entries)):
  entry = entries.eq(idx)
  # The kind category's term ends in '#post', '#comment', '#settings', etc.
  term = entry('category[scheme$=kind]').eq(0).attr('term')
  if not term or term.split('#')[1] != kind:
    continue

  month = entry('published').text()[:7]  # 'YYYY-MM'
  m_count[month] = m_count.get(month, 0) + 1

min_year, min_month = map(int, min(m_count).split('-'))
max_year, max_month = map(int, max(m_count).split('-'))
max_count = max(m_count.values())
total_count = sum(m_count.values())

# Walk every month from the first to the last, printing 0 for empty months.
for year in range(min_year, max_year+1):
  for month in range(1, 12+1):
    if year == min_year and month < min_month:
      continue
    if year == max_year and month > max_month:
      break
    key = '%d-%02d' % (year, month)
    count = m_count.get(key, 0)
    # Integer division keeps the bar length an int on Python 3 too.
    print('%s %3d %s' % (key, count, '#'*(width*count//max_count)))
print()
print('Total: %4d %ss' % (total_count, kind))

It requires pyquery.

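It can be done without pyquery, but it would have been a real pain in the ass for me. I also ran into an XML namespace problem (there is a related bug report), which is why the script renames the feed's default xmlns attribute before parsing.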
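For comparison, here is a minimal sketch of the counting step with only the standard library's xml.etree.ElementTree, which handles the Atom namespace through a prefix map instead of renaming the attribute. It assumes the export is an Atom feed in the http://www.w3.org/2005/Atom namespace and uses the same kind categories the pyquery script matches; count_by_month is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the Blogger export is an Atom feed in this namespace.
NS = {'atom': 'http://www.w3.org/2005/Atom'}


def count_by_month(xml_data, kind='post'):
    """Count entries of the given kind ('post' or 'comment') per 'YYYY-MM'.

    The namespace prefix map lets ElementTree match Atom elements
    without editing the feed's xmlns attribute.
    """
    root = ET.fromstring(xml_data)
    counts = Counter()
    for entry in root.findall('atom:entry', NS):
        # Pick the category whose scheme ends in 'kind', mirroring the
        # pyquery selector category[scheme$=kind].
        term = ''
        for cat in entry.findall('atom:category', NS):
            if cat.get('scheme', '').endswith('kind'):
                term = cat.get('term', '')
                break
        if term.split('#')[-1] != kind:
            continue
        published = entry.findtext('atom:published', default='', namespaces=NS)
        counts[published[:7]] += 1  # 'YYYY-MM'
    return counts
```

Passing it the file's bytes (open the export with 'rb') lets the parser honor the XML encoding declaration.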

The script must be given the exported XML file. An optional second argument selects the kind of entry to plot: post (the default) or comment. An optional third argument sets the text width; the default is 78, and you can pass $(tput cols) or $COLUMNS to use the current terminal width where available. Twelve of those columns go to the month and count, so the longest bar is the width minus 12 characters.
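The same command line could be described with argparse instead of indexing sys.argv by hand. This is a minimal sketch, assuming the argument order and defaults described above; parse_args, PREFIX_WIDTH and the bar_width attribute are hypothetical names chosen for illustration.

```python
import argparse

# Each row starts with 'YYYY-MM NNN ' ('%s %3d %s' -> 7 + 1 + 3 + 1 = 12
# characters), so the longest bar gets the text width minus 12.
PREFIX_WIDTH = 12


def parse_args(argv=None):
    """Parse: export.xml [post|comment [columns]]."""
    parser = argparse.ArgumentParser(
        description='Plot Blogger export activity by month as a text chart.')
    parser.add_argument('export', help='exported Blogger XML file')
    parser.add_argument('kind', nargs='?', default='post',
                        choices=['post', 'comment'],
                        help='kind of entry to count (default: post)')
    parser.add_argument('columns', nargs='?', type=int, default=78,
                        help='text width, e.g. $(tput cols) (default: 78)')
    args = parser.parse_args(argv)
    # Space left for the bar itself after the date/count prefix.
    args.bar_width = args.columns - PREFIX_WIDTH
    return args
```

As with the original script, positional arguments are filled in order, so the width can only be given together with the kind. The choices list rejects a typo such as posts with a usage error instead of failing later on an empty count.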

I decided to use a text chart; it's good enough for me to see the results.
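The text chart boils down to two steps: walk every month between the first and last one (so empty months show up as 0), and scale each bar so the busiest month fills the available width. A minimal sketch of just that part, pulled into two hypothetical helpers, month_range and render_chart, that take a plain dict of 'YYYY-MM' counts, might look like this; it uses a while loop over (year, month) tuples in place of the script's nested loops with continue and break.

```python
def month_range(first, last):
    """Yield every 'YYYY-MM' key from first to last, inclusive."""
    year, month = map(int, first.split('-'))
    end = tuple(map(int, last.split('-')))
    while (year, month) <= end:
        yield '%d-%02d' % (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def render_chart(counts, bar_width=66):
    """Return chart rows; the busiest month's bar is bar_width characters long.

    Months missing from counts are shown with a count of 0 and no bar.
    """
    peak = max(counts.values())
    rows = []
    for key in month_range(min(counts), max(counts)):
        n = counts.get(key, 0)
        # Integer scaling, as in the script: width * count // max_count.
        row = '%s %3d %s' % (key, n, '#' * (bar_width * n // peak))
        rows.append(row.rstrip())  # drop the trailing space on empty rows
    return rows
```

For example, print('\n'.join(render_chart({'2010-09': 62, '2010-11': 51}))) prints three rows: 2010-09 with a 66-character bar, 2010-10 with a count of 0, and 2010-11 with a 54-character bar, the same layout as the script's output.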