YJL: ‘abcdefghijklmnopqrstuvwxyz’

Nope, this is not about the alphabet song at all. It’s about how fast Python code can generate 'abcdefghijklmnopqrstuvwxyz'.

The answer is already there: 'abcdefghijklmnopqrstuvwxyz'. Still don’t see it? Alright, let me give you the full Python code:

#!/usr/bin/env python
# A program that demonstrates how to generate 'abcdefghijklmnopqrstuvwxyz' QUICKLY.


atoz = 'abcdefghijklmnopqrstuvwxyz'

That’s the complete code. Get it?

You might ask, “Seriously, you are writing a post about this?” Yes, I am.

I have found many people trying to generate such a string in different ways: some more Pythonic, some that look smart or genius, and some that look like hell. But sometimes the direct approach is the simplest and the easiest to understand, not only for programmers but also for people who don’t know programming. The following list is what I have seen on the Internet; there must be more.

atoz = 'abcdefghijklmnopqrstuvwxyz'
from string import ascii_lowercase
atoz = map(chr, xrange(97, 123))
atoz = map(chr, xrange(ord('a'), ord('z') + 1))
atoz = map(chr, range(97, 123))
atoz = [chr(i) for i in xrange(97, 123)]
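
As a quick sanity check, here is a minimal sketch, assuming Python 3 (where xrange is gone and map returns an iterator), showing that these approaches all yield the same 26 letters. The variable names are chosen only for illustration.

```python
from string import ascii_lowercase

literal = 'abcdefghijklmnopqrstuvwxyz'

# In Python 3, map() returns a lazy iterator, so wrap it in list().
via_map = list(map(chr, range(97, 123)))
via_ord = list(map(chr, range(ord('a'), ord('z') + 1)))
via_comp = [chr(i) for i in range(97, 123)]

# The list-based variants hold the same letters as the string.
same = (literal == ascii_lowercase
        and list(literal) == via_map == via_ord == via_comp)
print(same)
```

Note that the literal and ascii_lowercase are strings, while the map and list comprehension variants are lists of one-character strings.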

Which one is the best in your mind?

Contents

1 Runtime Profiling
2 Memory Profiling
3 Conclusion

1 Runtime Profiling

I did some profiling, yep, I did. Here is the code for profiling:

#!/usr/bin/env python

import sys
import timeit

def main():

  # Each entry: (statement to time, setup code, Python major version or 0 for any).
  m = [
      ("atoz = 'abcdefghijklmnopqrstuvwxyz'", '', 0),
      ("from string import ascii_lowercase", '', 0),
      ("atoz = map(chr, xrange(97, 123))", '', 2),
      ("atoz = map(chr, xrange(ord('a'), ord('z') + 1))", '', 2),
      ("atoz = list(map(chr, range(97, 123)))", '', 3),
      ("atoz = list(map(chr, range(ord('a'), ord('z') + 1)))", '', 3),
      ("atoz = [chr(i) for i in xrange(97, 123)]", '', 2),
      ("atoz = [chr(i) for i in range(97, 123)]", '', 3),
      ('atoz[13]', "atoz = 'abcdefghijklmnopqrstuvwxyz'", 0),
      ('atoz[13]', "atoz = map(chr, xrange(97, 123))", 2),
      ('atoz[13]', "atoz = list(map(chr, range(97, 123)))", 3),
      ]
  max_len = max(len(stat) for stat, _, _ in m)

  for stat, setup, v in m:
    if v != 0 and v != sys.version_info[0]:
      continue
    if setup:
      sys.stdout.write('%s\n' % setup)
    else:
      setup = 'pass'
    sys.stdout.write(('%%-%ds -> ' % max_len) % stat)
    sys.stdout.write('%12.6f us\n\n' % (min(timeit.Timer(stat, setup).repeat(10000, 100)) * 1000000))

if __name__ == '__main__':
  main()

At first I used cProfile, but it only resolves down to milliseconds, so I switched to timeit. Each statement is run for 10,000 sessions, each session accumulates 100 runs of the code (the arguments to repeat(10000, 100)), and the smallest session time is picked as the result. That represents how fast the code might be able to execute. But there is a catch: the code might actually run faster than that, and timeit simply cannot tell because of the timer’s precision.

Here are the results (the vvv and ^^^ marks point to the row holding the equivalent code for the other Python version):
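
The measurement idea described above can be sketched as follows. This is a minimal Python 3 sketch, not the script that produced the table; the helper name best_time_us and the much smaller repeat count are assumptions made to keep the example quick.

```python
import timeit

def best_time_us(stmt, setup='pass', repeat=5, number=100):
    """Return the fastest of `repeat` sessions, in microseconds.

    Each session runs `stmt` `number` times and reports the total time.
    """
    sessions = timeit.repeat(stmt, setup, repeat=repeat, number=number)
    return min(sessions) * 1000000

literal_us = best_time_us("atoz = 'abcdefghijklmnopqrstuvwxyz'")
comp_us = best_time_us("atoz = [chr(i) for i in range(97, 123)]")
print('%12.6f us' % literal_us)
print('%12.6f us' % comp_us)
```

Taking the minimum rather than the mean filters out sessions that were slowed down by whatever else the machine was doing at the time.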

CODE                                                       Python 2.5.4    Python 2.6.5    Python 3.1.2
atoz = 'abcdefghijklmnopqrstuvwxyz'                  ->     4.768372 us     4.768372 us     3.814697 us

from string import ascii_lowercase                   ->   186.920166 us   177.860260 us   258.922577 us

atoz = map(chr, xrange(97, 123))                     ->   584.840775 us   560.998917 us   vvvvvvvvvvvvv
atoz = map(chr, range(97, 123))                      ->   618.934631 us   577.926636 us   vvvvvvvvvvvvv
atoz = list(map(chr, range(97, 123)))                ->   ^^^^^^^^^^^^^   ^^^^^^^^^^^^^   775.814056 us

atoz = map(chr, xrange(ord('a'), ord('z') + 1))      ->   609.874725 us   594.854355 us   vvvvvvvvvvvvv
atoz = list(map(chr, range(ord('a'), ord('z') + 1))) ->   ^^^^^^^^^^^^^   ^^^^^^^^^^^^^   811.100006 us

atoz = [chr(i) for i in xrange(97, 123)]             ->   942.945480 us   875.949860 us   vvvvvvvvvvvvv
atoz = [chr(i) for i in range(97, 123)]              ->   ^^^^^^^^^^^^^   ^^^^^^^^^^^^^   946.044922 us

atoz = 'abcdefghijklmnopqrstuvwxyz'
atoz[13]                                             ->    12.874603 us     9.775162 us     7.867813 us

atoz = map(chr, xrange(97, 123))
atoz[13]                                             ->    10.967255 us     7.867813 us   vvvvvvvvvvvvv

atoz = list(map(chr, range(97, 123)))
atoz[13]                                             ->   ^^^^^^^^^^^^^   ^^^^^^^^^^^^^     7.867813 us

As you can see, using the constant from the string module, map, or a list comprehension is very slow. Yes, each is a one-time setup, but so is the direct approach. And all of those, except ascii_lowercase, would take anyone some time to read and understand.
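
To put "very slow" in proportion, the following sketch divides some of the Python 2.6.5 figures from the table by the figure for the string literal. The dictionary keys are shorthand labels chosen only for this sketch, and because the literal’s timing sits close to the timer’s precision, the ratios are rough.

```python
# Python 2.6.5 column of the results table, in microseconds per session.
results_us = {
    'literal': 4.768372,
    'import ascii_lowercase': 177.860260,
    'map + xrange': 560.998917,
    'list comprehension + xrange': 875.949860,
}

baseline = results_us['literal']
ratios = {name: us / baseline for name, us in results_us.items()}

# Print the approaches from fastest to slowest, relative to the literal.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print('%-28s %6.1fx' % (name, ratio))
```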

2 Memory Profiling

I also did some memory profiling with Guppy-PE, using this code:

#!/usr/bin/env python

from guppy import hpy

hp = hpy()

hp.setrelheap()
atoz = 'abcdefghijklmnopqrstuvwxyz'
print hp.heap()
print

hp.setrelheap()
from string import ascii_lowercase
print hp.heap()
print

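hp.setrelheap()
# Note: heap() is printed before the assignment here, so the third
# result below contains no list.
print hp.heap()
atoz = map(chr, xrange(97, 123))
print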

hp.setrelheap()
atoz = map(chr, xrange(ord('a'), ord('z') + 1))
print hp.heap()
print

hp.setrelheap()
atoz = map(chr, range(97, 123))
print hp.heap()
print

hp.setrelheap()
atoz = [chr(i) for i in xrange(97, 123)]
print hp.heap()
print

The results:

=== Python 2.5.4 ===

Partition of a set of 1 object. Total size = 656 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      656 100       656 100 types.FrameType

Partition of a set of 1 object. Total size = 560 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      560 100       560 100 types.FrameType

Partition of a set of 1 object. Total size = 656 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      656 100       656 100 types.FrameType

Partition of a set of 2 objects. Total size = 888 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      560  63       560  63 types.FrameType
     1      1  50      328  37       888 100 list

Partition of a set of 2 objects. Total size = 984 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      656  67       656  67 types.FrameType
     1      1  50      328  33       984 100 list

Partition of a set of 2 objects. Total size = 888 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      560  63       560  63 types.FrameType
     1      1  50      328  37       888 100 list

=== Python 2.6.5 ===

Partition of a set of 1 object. Total size = 448 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      448 100       448 100 types.FrameType

Partition of a set of 1 object. Total size = 448 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      448 100       448 100 types.FrameType

Partition of a set of 1 object. Total size = 448 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      448 100       448 100 types.FrameType

Partition of a set of 2 objects. Total size = 776 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      448  58       448  58 types.FrameType
     1      1  50      328  42       776 100 list

Partition of a set of 2 objects. Total size = 776 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      448  58       448  58 types.FrameType
     1      1  50      328  42       776 100 list

Partition of a set of 2 objects. Total size = 776 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1  50      448  58       448  58 types.FrameType
     1      1  50      328  42       776 100 list

Guppy isn’t compatible with Python 3, hence there are no results for Python 3.1.2.

As expected, the approaches that use map or a list comprehension produce a list and require more memory, but that doesn’t really affect how you use the result. Usually you just access it like atoz[10], which works for both list and string types. The memory results tell you that the string type uses less memory; however, if you look at the runtime results above, you will see that on Python 2.5.4 and 2.6.5 accessing a list element is faster than indexing a string, while on Python 3.1.2 the two take the same time.
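
For readers on Python 3, where the Guppy script above does not run, a rough comparison is possible with the standard library’s sys.getsizeof. This is a swapped-in technique, not what produced the numbers above: it reports only the shallow size of a single object, and the exact byte counts depend on the interpreter build.

```python
import sys

as_str = 'abcdefghijklmnopqrstuvwxyz'
as_list = list(map(chr, range(97, 123)))

# Shallow sizes only: the list figure does not count the 26
# one-character strings that the list refers to.
str_size = sys.getsizeof(as_str)
list_size = sys.getsizeof(as_list)
print(str_size, list_size)

# Indexing works the same way on both types.
print(as_str[10], as_list[10])
```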

3 Conclusion

My conclusion is wgasa.
