I wanted to plot posting activity by month as a bar chart. I thought about using the Blogger API, but I didn’t want to write JavaScript (which is what I’d prefer if I were using the API) or make multiple requests if there are more than 500 posts. So I decided to do it with the exported XML file.
Here are the current stats of this blog:
% ./b.py blog-01-11-2011.xml
2008-09  18 ###################
2008-10  25 ##########################
2008-11  57 ############################################################
2008-12  51 ######################################################
2009-01  32 ##################################
2009-02  13 #############
2009-03  27 ############################
2009-04  41 ###########################################
2009-05  14 ##############
2009-06   1 #
2009-07   1 #
2009-08   1 #
2009-09   1 #
2009-10  27 ############################
2009-11  18 ###################
2009-12  16 #################
2010-01  16 #################
2010-02   3 ###
2010-03   2 ##
2010-04   9 #########
2010-05  19 ####################
2010-06   2 ##
2010-07   1 #
2010-08  53 ########################################################
2010-09  62 ##################################################################
2010-10   8 ########
2010-11  51 ######################################################
2010-12  10 ##########
2011-01   3 ###

Total:  582 posts

% ./b.py blog-01-11-2011.xml comment
2008-09   1 #
2008-10   6 ######
2008-11   6 ######
2008-12  38 ######################################
2009-01  23 #######################
2009-02   8 ########
2009-03  16 ################
2009-04  10 ##########
2009-05   3 ###
2009-06   3 ###
2009-07   1 #
2009-08   2 ##
2009-09   0
2009-10   6 ######
2009-11   9 #########
2009-12   8 ########
2010-01   3 ###
2010-02   4 ####
2010-03   0
2010-04   1 #
2010-05   0
2010-06   0
2010-07   0
2010-08   0
2010-09  66 ##################################################################
2010-10   4 ####
2010-11  12 ############
2010-12   9 #########

Total:  239 comments
The Python code:
#!/usr/bin/env python
# ./script.py export.xml [<post|comment> [$(tput cols)]]

from pyquery import PyQuery as pq
import sys

kind = 'post' if len(sys.argv) < 3 else sys.argv[2]
width = (78 if len(sys.argv) < 4 else int(sys.argv[3])) - 12

with open(sys.argv[1]) as f:
    b = pq(f.read().replace('feed xmlns=', 'feed xmlblahblahblah='), parser='xml')

entries = b('entry')
m_count = {}
for idx in range(len(entries)):
    if entries.eq(idx)('category[scheme$=kind]').eq(0).attr('term').split('#')[1] != kind:
        continue
    month = entries.eq(idx)('published').text()[:7]
    if month not in m_count:
        m_count[month] = 0
    m_count[month] += 1

m_count_keys = m_count.keys()
m_min = min(m_count_keys).split('-')
m_max = max(m_count_keys).split('-')
min_year, min_month = int(m_min[0]), int(m_min[1])
max_year, max_month = int(m_max[0]), int(m_max[1])
del m_count_keys, m_min, m_max

m_count_values = m_count.values()
max_count, min_count = max(m_count_values), min(m_count_values)
total_count = sum(m_count_values)
del m_count_values

for year in range(min_year, max_year+1):
    for month in range(1, 12+1):
        if year == min_year and month < min_month:
            continue
        if year == max_year and month > max_month:
            break
        key = '%d-%02d' % (year, month)
        count = m_count[key] if key in m_count else 0
        print '%s %3d %s' % (key, count, '#'*(width*count/max_count))
print
print 'Total: %4d %ss' % (total_count, kind)
It requires pyquery.
It can be done without pyquery, but that would be a real pain in the ass for me. I ran into an XML namespace problem, which is why the script renames the feed’s xmlns attribute before parsing; there is a related bug report.
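For anyone curious what the pyquery-free route might look like, here is a rough sketch using only the standard library's xml.etree.ElementTree. It is not what I ran: it is written for Python 3 (the script above is Python 2), the names ATOM, SAMPLE and count_by_month are made up for illustration, and the sample feed with its category URLs is an assumption about what the export looks like, based on what the selector in the script matches. The namespace is handled by spelling it out in each tag name instead of renaming the attribute.

```python
# Python 3, standard library only. A sketch, not the script used for this post.
import xml.etree.ElementTree as ET
from collections import Counter

# Atom namespace in ElementTree's "{uri}tag" notation.
# Assumption: the exported file is an Atom <feed>.
ATOM = '{http://www.w3.org/2005/Atom}'

# Made-up miniature export for illustration; the real file is far larger.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#post"/>
    <published>2008-09-03T10:00:00.000-07:00</published>
  </entry>
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#post"/>
    <published>2008-09-20T08:30:00.000-07:00</published>
  </entry>
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#comment"/>
    <published>2008-10-01T12:00:00.000-07:00</published>
  </entry>
</feed>"""


def count_by_month(xml_text, kind='post'):
    """Return a Counter mapping 'YYYY-MM' to the number of entries of `kind`."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for entry in root.iter(ATOM + 'entry'):
        # Same idea as the selector category[scheme$=kind]: keep the
        # categories whose scheme attribute ends with "kind".
        terms = [c.get('term', '') for c in entry.findall(ATOM + 'category')
                 if c.get('scheme', '').endswith('kind')]
        if not terms or terms[0].split('#')[-1] != kind:
            continue
        published = entry.findtext(ATOM + 'published', default='')
        counts[published[:7]] += 1  # first 7 characters are 'YYYY-MM'
    return counts


if __name__ == '__main__':
    print(dict(count_by_month(SAMPLE)))
    print(dict(count_by_month(SAMPLE, 'comment')))
```

On a real export you would pass the file contents instead of SAMPLE, for example count_by_month(open('blog-01-11-2011.xml', 'rb').read()).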
The script must be given the XML file. An optional second argument selects the kind of entry to plot, which must be either post or comment. An optional third argument sets the text width. The default is 78; you can use $(tput cols) or $COLUMNS to get the current terminal width, if available.
I decided to use a textual chart; it’s good enough for me to see the results.
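To make the width argument a bit more concrete, here is a small Python 3 sketch of how the text width turns into bar lengths. It follows the same arithmetic as the script (12 columns are reserved for the "YYYY-MM NNN " label, and the busiest month fills the rest), but the function names bar, months_between and render are hypothetical and only for illustration.

```python
# Python 3 sketch of the bar scaling; function names are illustrative only.

def bar(count, max_count, width=78, label_width=12):
    """Return the '#' bar for `count`, scaled so `max_count` fills the line."""
    usable = width - label_width  # columns left after the "YYYY-MM NNN " label
    return '#' * (usable * count // max_count)


def months_between(first, last):
    """Yield 'YYYY-MM' keys from `first` to `last`, inclusive."""
    year, month = map(int, first.split('-'))
    last_year, last_month = map(int, last.split('-'))
    while (year, month) <= (last_year, last_month):
        yield '%d-%02d' % (year, month)
        year, month = (year, month + 1) if month < 12 else (year + 1, 1)


def render(counts, width=78):
    """Return chart lines, with months missing from `counts` shown as 0."""
    max_count = max(counts.values())
    lines = []
    for key in months_between(min(counts), max(counts)):
        count = counts.get(key, 0)
        lines.append('%s %3d %s' % (key, count, bar(count, max_count, width)))
    return lines


if __name__ == '__main__':
    for line in render({'2010-11': 2, '2011-01': 1}, width=16):
        print(line)
```

With the default width of 78, a month with 18 posts against a maximum of 62 gets 66 * 18 // 62 = 19 characters, which matches the first line of the post chart above.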