Printing unicode characters in Python strings


Are you tired of printing strings like this:

print 'The volume is {0} Angstrom^3'.format(125)
The volume is 125 Angstrom^3

Wish you could get Å in your string? That is the unicode character U+212B. We can get that to print in Python, but we have to create it in a unicode string, and print the string properly encoded. Let us try it out.

print u'\u212B'.encode('utf-8')
Å

We use u'' to indicate a unicode string. Note that we have to encode the string to print it; otherwise, when Python falls back to the ascii codec for output, we get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u212b' in position 0: ordinal not in range(128)
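The same failure can be reproduced without involving the terminal at all. This is only a sketch, and it assumes Python 3, where every string is already unicode and the u'' prefix is optional; it is not the Python 2 session shown above. It encodes the Angstrom sign to utf-8 bytes, then asks the ascii codec to do the same job.

```python
# Assumption: Python 3, where str is unicode text and bytes is encoded data.
angstrom = u'\u212B'  # ANGSTROM SIGN

# utf-8 represents this character with three bytes.
encoded = angstrom.encode('utf-8')
print(encoded)  # b'\xe2\x84\xab'

# The ascii codec only covers code points 0-127, so this raises.
try:
    angstrom.encode('ascii')
except UnicodeEncodeError as err:
    message = str(err)
    print(message)
```

The message printed in the except branch should name the same 'ascii' codec as the traceback above.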

Do more, do more, we wish we could! Unicode also supports some superscripted and subscripted numbers (http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts). Let us see that in action.

print u'\u212B\u00B3'.encode('utf-8')
Å³

Pretty sweet. The code is not all that readable if you aren't fluent in unicode, but if it was buried in some library it would just print something nice looking. We can use this to print chemical formulas too.

print u'''The chemical formula of water is H\u2082O.
Water dissociates into H\u207A and OH\u207B'''.encode('utf-8')

The chemical formula of water is H₂O.
Water dissociates into H⁺ and OH⁻
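Typing the escapes by hand gets tedious, so this is the kind of thing that could be buried in a small helper. The sketch below assumes Python 3, and subscript_digits is a hypothetical name, not a function from any library. It relies on the subscript digits being contiguous in Unicode, from U+2080 (subscript zero) to U+2089 (subscript nine).

```python
# Hypothetical helper (not from any library); assumes Python 3.
# Map each ASCII digit to the matching subscript digit U+2080..U+2089.
SUBSCRIPTS = {ord(str(d)): 0x2080 + d for d in range(10)}


def subscript_digits(formula):
    """Return formula with every ASCII digit replaced by its subscript."""
    return formula.translate(SUBSCRIPTS)


water = subscript_digits('H2O')
glucose = subscript_digits('C6H12O6')
print(water)
print(glucose)
```

This is deliberately naive: it subscripts every digit, so a leading coefficient such as the 2 in 2H2O would be subscripted as well.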

There are other encodings too, such as latin-1. See more symbols here: http://en.wikipedia.org/wiki/Number_Forms

print u'1/4 or \u00BC'.encode('latin-1')
1/4 or ¼

That seems like:

print u'A good idea\u00AE'.encode('latin-1')
A good idea®

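I cannot tell how you are supposed to know exactly which encoding to use; it has to match whatever the terminal or display expects. If you use utf-8 in the example above, you get a stray character in front of the desired registered trademark symbol, because utf-8 encodes ® as two bytes and a display that expects latin-1 shows each byte as its own character. Still, it is interesting that you can get prettier symbols!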
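The stray character can be reproduced by encoding with one codec and decoding with the other. This is a sketch that assumes Python 3 and a display that interprets bytes as latin-1; the variable names are only illustrative.

```python
# Assumption: Python 3, and a display that interprets bytes as latin-1.
registered = '\u00AE'  # REGISTERED SIGN

latin1_bytes = registered.encode('latin-1')
utf8_bytes = registered.encode('utf-8')
print(latin1_bytes)  # b'\xae'
print(utf8_bytes)    # b'\xc2\xae'

# Reading the two utf-8 bytes as latin-1 gives two characters:
# U+00C2 (A with circumflex) followed by the intended U+00AE.
misread = utf8_bytes.decode('latin-1')
print(misread)
```

The extra byte 0xC2 is the stray character. Encoding and decoding with the same codec avoids it.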

Copyright (C) 2014 by John Kitchin. See the License for information about copying.
