Python Programming Glossary: unicodedata.category
Python unicode regular expression matching failing with some unicode characters -bug or mistake? http://stackoverflow.com/questions/12746458/python-unicode-regular-expression-matching-failing-with-some-unicode-characters re_ assert re_.search(r'^\w+$', word, flags=re_.UNICODE) print [unicodedata.category(cp) for cp in word] print .join(ch for ch in regex.findall(r'\X', word))..
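To illustrate the kind of check the excerpt above performs, here is a minimal sketch that prints the category of each code point in a word and tests "wordness" by category instead of by `\w`. The sample word and the helper names `categories` and `is_word_by_category` are assumptions for illustration and do not come from the question; the sketch uses only the standard library, not the third-party `regex` module.

```python
import re
import unicodedata

# Sample word (an assumption, not taken from the question), written with
# escapes so the code points are unambiguous: Devanagari letters and signs.
WORD = "\u0939\u093f\u0928\u094d\u0926\u0940"


def categories(text):
    """Return the general category of every code point in text."""
    return [unicodedata.category(cp) for cp in text]


def is_word_by_category(text):
    """Treat letters (L*), marks (M*), numbers (N*) and '_' as word characters."""
    return bool(text) and all(
        cp == "_" or unicodedata.category(cp)[0] in "LMN" for cp in text
    )


if __name__ == "__main__":
    print(categories(WORD))
    # The outcome of this match depends on how the interpreter's re module
    # defines \w, in particular whether combining marks count.
    print(bool(re.search(r"^\w+$", WORD)))
    print(is_word_by_category(WORD))
```

Looking at the categories first usually shows which code points a pattern is likely to trip over, since combining marks (`Mn`, `Mc`) are not letters.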
How to implement Unicode string matching by folding in python http://stackoverflow.com/questions/1410308/how-to-implement-unicode-string-matching-by-folding-in-python (c for c in unicodedata.normalize('NFD', unicode(s)) if unicodedata.category(c) != 'Mn') strip_accents(u'Östblocket') 'Ostblocket'..
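The accent-stripping idea in the excerpt above can be sketched for Python 3 as follows, where `str` is already Unicode and no `unicode()` call is needed. This is a sketch of the decompose-then-filter approach, not necessarily the exact code from the answer.

```python
import unicodedata


def strip_accents(text):
    """Decompose to NFD, then drop nonspacing marks (category 'Mn')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


if __name__ == "__main__":
    print(strip_accents("\u00d6stblocket"))  # U+00D6 is O with diaeresis
```

Letters that have no canonical decomposition, such as `ø`, pass through unchanged, so this folds accents rather than transliterating to ASCII.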
Playing around with Devanagari characters http://stackoverflow.com/questions/6805311/playing-around-with-devanagari-characters by looking up the Unicode category for each code point: map(unicodedata.category, a) ['Lo', 'Mc', 'Lo', 'Mn', 'Lo', 'Lo', 'Zs', 'Lo', 'Mn', 'Lo', 'Mc', 'Zs'.. None virama = u'\N{DEVANAGARI SIGN VIRAMA}' for c in s: cat = unicodedata.category(c)[0] if cat == 'M' or cat == 'L' and last == virama: cluster += c else: if..
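The cluster-building loop in the excerpt above is truncated, so the following is one plausible completion in Python 3, not the answer's verbatim code. The function name `split_clusters` and the way the loop is finished after the `else:` branch are assumptions. A code point joins the current cluster if it is a mark, or if it is a letter that directly follows a virama.

```python
import unicodedata

VIRAMA = "\N{DEVANAGARI SIGN VIRAMA}"


def split_clusters(text):
    """Yield rough Devanagari clusters based on general categories."""
    cluster = ""
    last = None
    for c in text:
        major = unicodedata.category(c)[0]  # first letter: 'L', 'M', 'Z', ...
        if major == "M" or (major == "L" and last == VIRAMA):
            cluster += c          # mark, or consonant joined by a virama
        else:
            if cluster:
                yield cluster     # previous cluster is complete
            cluster = c
        last = c
    if cluster:
        yield cluster


if __name__ == "__main__":
    word = "\u0939\u093f\u0928\u094d\u0926\u0940"
    print(list(split_clusters(word)))
```

This is a heuristic and not the Unicode grapheme cluster algorithm; other characters such as spaces simply become clusters of their own.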
removing accent and special characters [duplicate] http://stackoverflow.com/questions/8694815/removing-accent-and-special-characters ''.join(x for x in unicodedata.normalize('NFKD', data) if unicodedata.category(x)[0] == 'L').lower() Is there any better way to do this? python diacritics..
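The one-liner in the excerpt above can be wrapped in a function to make its behaviour easier to see. This is a small sketch; the name `letters_only` is hypothetical. After NFKD decomposition only code points whose category starts with `L` are kept, so accents, digits, spaces, and punctuation are all dropped.

```python
import unicodedata


def letters_only(data):
    """Keep only letters after NFKD decomposition, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", data)
    kept = (x for x in decomposed if unicodedata.category(x)[0] == "L")
    return "".join(kept).lower()


if __name__ == "__main__":
    print(letters_only("Cr\u00e8me Br\u00fbl\u00e9e 2!"))
```

Because whitespace is removed as well, word boundaries are lost; that may or may not be what a caller wants.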
Stripping non printable characters from a string in python http://stackoverflow.com/questions/92438/stripping-non-printable-characters-from-a-string-in-python module is quite helpful for this, especially the unicodedata.category function. See Unicode Character Database for descriptions of.. 0x110000 control_chars = ''.join(c for c in all_chars if unicodedata.category(c) == 'Cc') # or equivalently and much more efficiently control_chars..
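A Python 3 sketch of the control-character idea in the excerpt above follows. The excerpt is cut off before showing how `control_chars` is used, so the removal step here uses `str.translate`, which is a substitution of my own and not necessarily what the answer does. The hard-coded ranges rely on category `Cc` covering U+0000 to U+001F and U+007F to U+009F.

```python
import unicodedata
from itertools import chain

# Code points in category 'Cc': the C0 controls, DEL, and the C1 controls.
CONTROL_CODEPOINTS = list(chain(range(0x00, 0x20), range(0x7F, 0xA0)))

# Mapping a code point to None makes str.translate delete it.
_DELETE_CONTROLS = dict.fromkeys(CONTROL_CODEPOINTS)


def remove_control_chars(text):
    """Return text with every 'Cc' character removed."""
    return text.translate(_DELETE_CONTROLS)


if __name__ == "__main__":
    # Sanity check against the Unicode database for the listed ranges.
    assert all(unicodedata.category(chr(cp)) == "Cc" for cp in CONTROL_CODEPOINTS)
    print(repr(remove_control_chars("a\x00b\tc\x7fd")))
```

Note that tab and newline are control characters too, so they are removed along with the rest; callers that want to keep them would need to exclude those code points from the table.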