Python Programming Glossary: unicodedata

Convert Unicode to String in Python (containing extra symbols)

http://stackoverflow.com/questions/1207457/convert-unicode-to-string-in-python-containing-extra-symbols

u"Klüft skräms inför på fédéral électoral große"; import unicodedata; unicodedata.normalize('NFKD', title).encode('ascii', 'ignore') returns 'Kluft skrams..
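The NFKD-then-encode idiom in the excerpt above can be sketched as follows. This is a minimal sketch rather than the answer's original code: it assumes Python 3 (so encode() returns bytes instead of a Python 2 str), and the helper name to_ascii is hypothetical.

```python
import unicodedata


def to_ascii(text):
    """Decompose characters (NFKD), then drop whatever is still non-ASCII.

    Hypothetical helper name; Python 3 assumed, so the result is bytes.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore")


title = "Klüft skräms inför på fédéral électoral große"
print(to_ascii(title))
# b'Kluft skrams infor pa federal electoral groe'
```

Note that "ß" has no decomposition, so it is dropped silently ("große" becomes "groe"); the approach is lossy for such characters.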

latin-1 to ascii

http://stackoverflow.com/questions/1382998/latin-1-to-ascii

in other answers: # coding: utf-8 import codecs; import unicodedata; x = u"Wikipédia, le projet d’encyclopédie"; xtd = {ord(u'’'): u"'", ord(u'é'): .. def ae(): return x.encode('ascii', 'asciify'); def ud(): return unicodedata.normalize('NFKD', x).encode('ASCII', 'ignore'); def tr(): return x.translate.. le projet d'encyclopedie, showing clearly that the unicodedata-based approach, while it does have the convenience of not needing..
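The three approaches named in the excerpt (a custom codec error handler, unicodedata, and str.translate) can be compared with a sketch like the one below. It assumes Python 3; the body of the asciify handler and the full contents of the xtd table are not visible in the excerpt, so both are assumptions made for illustration.

```python
import codecs
import unicodedata

x = "Wikipédia, le projet d\u2019encyclopédie"

# Assumed translation table: code point -> ASCII replacement.
xtd = {ord("\u2019"): "'", ord("é"): "e"}


def asciify(error):
    # Assumed handler body: replace the first offending character using xtd
    # and resume right after it. A character missing from xtd raises KeyError.
    return xtd[ord(error.object[error.start])], error.start + 1


codecs.register_error("asciify", asciify)


def ae():
    return x.encode("ascii", "asciify")


def ud():
    return unicodedata.normalize("NFKD", x).encode("ascii", "ignore")


def tr():
    return x.translate(xtd)


print(ae())  # b"Wikipedia, le projet d'encyclopedie"
print(ud())  # b'Wikipedia, le projet dencyclopedie'
print(tr())  # Wikipedia, le projet d'encyclopedie
```

The unicodedata variant needs no hand-built table, but it drops the typographic apostrophe entirely because that character has no decomposition.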

How to implement Unicode string matching by folding in python

http://stackoverflow.com/questions/1410308/how-to-implement-unicode-string-matching-by-folding-in-python

it is much nicer to edit that way. # encoding: UTF-8 import unicodedata; from unicodedata import normalize, category; def _folditems(): _folding_table = { # general.. the accents: def strip_accents(s): return ''.join(c for c in unicodedata.normalize('NFD', unicode(s)) if unicodedata.category(c) != 'Mn'); strip_accents..
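The strip_accents function from the excerpt translates to Python 3 roughly as shown here. The comparison helper folded_equal and its use of str.casefold() are not from the source; they are assumptions added to illustrate how accent stripping could feed a folded comparison.

```python
import unicodedata


def strip_accents(s):
    """Decompose to NFD, then drop nonspacing marks (category 'Mn')."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def folded_equal(a, b):
    # Hypothetical helper: compare ignoring accents and case.
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(strip_accents("être"))               # etre
print(folded_equal("Fédéral", "federal"))  # True
```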

Character reading from file in Python

http://stackoverflow.com/questions/147741/character-reading-from-file-in-python

few special cases such as this particular example. Use the unicodedata module's normalize() and the string.encode() method to convert as.. ascii using python: teststr = u'I don\xe2\x80\x98t like this'; unicodedata.normalize('NFKD', teststr).encode('ascii', 'ignore') returns 'I donat like..
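The stray "a" in 'I donat like' can be reproduced with the sketch below. It assumes Python 3, and it assumes that the three characters \xe2\x80\x98 are the UTF-8 bytes of a left single quotation mark (U+2018) that were decoded one byte per character; the repair step is an illustration of that assumption, not part of the quoted answer.

```python
import unicodedata


def to_ascii(text):
    # Hypothetical helper: NFKD-decompose, then drop non-ASCII characters.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore")


# Three separate characters: a-circumflex plus two C1 control characters.
teststr = "I don\xe2\x80\x98t like this"
print(to_ascii(teststr))   # b'I donat like this'

# Assumed repair: reinterpret the characters as UTF-8 bytes.
repaired = teststr.encode("latin-1").decode("utf-8")
print(ascii(repaired))     # 'I don\u2018t like this'
print(to_ascii(repaired))  # b'I dont like this'
```

The "a" is the base letter left over after the circumflex is stripped from the first of the three characters.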

How do I convert a file's format from Unicode to ASCII using Python?

http://stackoverflow.com/questions/175240/how-do-i-convert-a-files-format-from-unicode-to-ascii-using-python

a straight ASCII equivalent. This blog recommends the unicodedata module, which seems to take care of roughly converting characters.. lectoral groe, which is pretty wrong. However, using the unicodedata module, the result can be much closer to the original text: import unicodedata; unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')..

SQLite, python, unicode, and non-utf data

http://stackoverflow.com/questions/2392732/sqlite-python-unicode-and-non-utf-data

'ó' from latin-1 to utf-8 and not mangle it. repr() and unicodedata.name() are your friends when it comes to debugging such problems.. '\xf3'; print repr(oacute_unicode) prints u'\xf3'; import unicodedata; unicodedata.name(oacute_unicode) returns 'LATIN SMALL LETTER O WITH ACUTE'; print..
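As a rough illustration of using unicodedata.name for this kind of debugging, the sketch below names each character of a correct and a mangled string. Python 3 is assumed, and the helper name describe and the mangling step (UTF-8 bytes misread as Latin-1) are hypothetical choices for the example.

```python
import unicodedata


def describe(text):
    """Return the Unicode name of every character (hypothetical helper)."""
    return [unicodedata.name(ch, "<no name>") for ch in text]


oacute = "\xf3"
print(describe(oacute))
# ['LATIN SMALL LETTER O WITH ACUTE']

# Simulated mangling: the UTF-8 bytes of the character read as Latin-1.
mangled = oacute.encode("utf-8").decode("latin-1")
print(describe(mangled))
# ['LATIN CAPITAL LETTER A WITH TILDE', 'SUPERSCRIPT THREE']
```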

Why does Python print unicode characters when the default encoding is ASCII?

http://stackoverflow.com/questions/2596714/why-does-python-print-unicode-characters-when-the-default-encoding-is-ascii

and is just sent to the terminal. On my system: import unicodedata as ud; import sys; sys.stdout.encoding is 'cp437'; ud.name(u'\xe9').. '\xe9'.decode('cp437'): 'GREEK CAPITAL LETTER THETA'; import unicodedata as ud; ud.name(u'\xe9') returns 'LATIN SMALL LETTER E WITH ACUTE'; '\xe9'.decode..
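The point of the excerpt, that the same byte names a different character depending on the codec used to decode it, can be shown with a short Python 3 sketch. The helper name name_under is hypothetical, and Latin-1 is chosen here only as a contrasting codec.

```python
import unicodedata as ud


def name_under(raw, codec):
    """Decode a single byte with the given codec and return its Unicode name."""
    return ud.name(raw.decode(codec))


raw = b"\xe9"
print(name_under(raw, "cp437"))    # GREEK CAPITAL LETTER THETA
print(name_under(raw, "latin-1"))  # LATIN SMALL LETTER E WITH ACUTE
```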

How to read Unicode input and compare Unicode strings in Python?

http://stackoverflow.com/questions/477061/how-to-read-unicode-input-and-compare-unicode-strings-in-python

être être; print a1 == a2 prints False. So you might want to use the unicodedata.normalize method: import unicodedata as ud; ud.normalize('NFC', a1) returns u'\xeatre'; ud.normalize('NFC', a2)..
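A minimal Python 3 sketch of the comparison problem: two strings that render identically but differ in code points compare unequal until both are normalized. The exact value of a2 is not visible in the excerpt, so the decomposed spelling used here is an assumption.

```python
import unicodedata as ud

a1 = "\xeatre"     # precomposed e-circumflex
a2 = "e\u0302tre"  # assumed: plain "e" followed by a combining circumflex

print(a1 == a2)                                            # False
print(ud.normalize("NFC", a1) == ud.normalize("NFC", a2))  # True
```

Normalizing both sides to the same form (NFC here) before comparing is what makes the check reliable.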

In Python, how to list all characters matched by POSIX extended regex `[:space:]`?

http://stackoverflow.com/questions/8921365/in-python-how-to-list-all-characters-matched-by-posix-extended-regex-space

u'\u202f', u'\u205f', u'\u3000' What is all that stuff? unicodedata.name is your friend: from unicodedata import name; for c in re.findall(r'\s', chrs, re.UNICODE): ... print..
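The listing technique from the excerpt might look like this in Python 3, where str patterns are Unicode-aware by default, so the re.UNICODE flag is not needed. How chrs was built is not shown in the excerpt; constructing it from every code point is an assumption, as is the "<no name>" fallback for characters without a name.

```python
import re
import sys
import unicodedata

# Assumed construction: one string containing every code point.
chrs = "".join(chr(i) for i in range(sys.maxunicode + 1))

# Every character that \s matches for a str pattern.
whitespace = re.findall(r"\s", chrs)

for c in whitespace:
    # Control characters such as tab have no name, hence the fallback.
    print("U+%04X" % ord(c), unicodedata.name(c, "<no name>"))
```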

Stripping non printable characters from a string in python

http://stackoverflow.com/questions/92438/stripping-non-printable-characters-from-a-string-in-python

You just have to build the character class yourself. The unicodedata module is quite helpful for this, especially the unicodedata.category function. See Unicode Character Database for descriptions of the categories. import unicodedata, re; all_chars = (unichr(i) for i in xrange(0x110000)); control_chars..
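Building the character class from unicodedata.category could be sketched as below. Python 3 is assumed (chr and range instead of unichr and xrange). The excerpt is cut off at control_chars, so the choice of category "Cc" and the helper name remove_control_chars are assumptions for illustration.

```python
import re
import unicodedata

all_chars = (chr(i) for i in range(0x110000))

# Assumed filter: keep only characters in category "Cc" (control).
control_chars = "".join(c for c in all_chars if unicodedata.category(c) == "Cc")

# Character class matching any single control character.
control_char_re = re.compile("[%s]" % re.escape(control_chars))


def remove_control_chars(s):
    """Hypothetical helper: delete every control character from s."""
    return control_char_re.sub("", s)


print(remove_control_chars("a\x00b\x07c\x9fd"))  # abcd
```

Tab and newline are also category "Cc", so they are removed too; exclude them from control_chars if they should be kept.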