Java Programming Glossary: codepoint

http://stackoverflow.com/questions/1029897/comparing-a-char-to-a-code-point

Comparing a code point to a Java character. For example: int codepoint = String.codePointAt(0); char token = '\n'; I know I can probably do if (codepoint == (int) token) ... but this code looks fragile. Is there a formal API method for comparing codepoints to chars, or converting the char up to a codepoint for comparison ...
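
No special API is needed for this comparison in Java: a char is a 16-bit UTF-16 code unit, and == widens it to int automatically, so the explicit cast is harmless but redundant. A minimal sketch, assuming only non-surrogate (BMP) characters are meant to match; the class and the helper name matches are hypothetical, not a JDK API:

```java
// Sketch: comparing an int code point with a char.
// Hypothetical helper class; not part of the JDK.
final class CodePointCompare {

    // True if the code point equals the character.
    // The char is widened to int in ==, so no explicit cast is required.
    // Surrogate code units are only halves of supplementary characters,
    // so this sketch treats them as never matching.
    static boolean matches(int codePoint, char token) {
        if (Character.isSurrogate(token)) {
            return false;
        }
        return codePoint == token;
    }

    public static void main(String[] args) {
        String text = "\nhello";
        int codepoint = text.codePointAt(0);
        char token = '\n';
        System.out.println(matches(codepoint, token)); // true
        System.out.println(codepoint == token);        // true, same comparison without the helper
    }
}
```

To compare against a supplementary character, compare with its int code point (for example 0x1F600) instead of a char, since no single char can hold it.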

Converting UTF-8 to ISO-8859-1 in Java

http://stackoverflow.com/questions/1273986/converting-utf-8-to-iso-8859-1-in-java

... Character.UnicodeBlock.BASIC_LATIN) out.append(ch); else { int codepoint = Character.codePointAt(sequence, i); /* handle supplementary range chars */ i += Character.charCount(codepoint) - 1; /* emit entity */ out.append("&#x"); out.append(Integer.toHexString(codepoint)); out.append(...); } ... return out; Example usage: String foo = "This is Cyrillic ..."
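
A self-contained version of the loop in the excerpt, as a minimal sketch: Basic Latin characters are copied as-is, and every other code point (including supplementary ones, which occupy two chars) is written as a hexadecimal character reference. The class and method names are hypothetical, and the ';' terminator is an assumption because the excerpt's final append is cut off. Note that this also escapes U+0080..U+00FF, which ISO-8859-1 could represent directly.

```java
// Sketch: escape everything outside Basic Latin (U+0000..U+007F)
// as "&#x...;" character references. Names are hypothetical.
final class EntityEscaper {

    static String escapeNonBasicLatin(CharSequence sequence) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            char ch = sequence.charAt(i);
            if (Character.UnicodeBlock.of(ch) == Character.UnicodeBlock.BASIC_LATIN) {
                out.append(ch);
            } else {
                int codepoint = Character.codePointAt(sequence, i);
                // For a surrogate pair, skip the low surrogate; the loop's
                // i++ then moves past the whole code point.
                i += Character.charCount(codepoint) - 1;
                // Emit a hexadecimal character reference (";" terminator assumed).
                out.append("&#x").append(Integer.toHexString(codepoint)).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonBasicLatin("This is Cyrillic: \u0416"));
        // This is Cyrillic: &#x416;
    }
}
```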

UTF-16 to ASCII conversion in Java

http://stackoverflow.com/questions/1490218/utf-16-to-ascii-conversion-in-java

... to represent codes > 0x00FFFF. In other words, a Unicode codepoint > 0x00FFFF is actually represented in UTF-16 as two characters ... of this admittedly esoteric point. In fact, dealing with codepoints > 0x00FFFF in Java is rather tricky in general. This stems from ...
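
The arithmetic behind "two characters": subtract 0x10000 from the code point, leaving a 20-bit value, then split it into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). A minimal sketch; the class and method names are hypothetical, and in real code Character.toChars does the same job:

```java
// Sketch: manual UTF-16 encoding of a supplementary code point,
// equivalent to Character.toChars for 0x10000..0x10FFFF.
final class SurrogatePairs {

    static char[] toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        int offset = codePoint - 0x10000;              // 20 significant bits
        char high = (char) (0xD800 + (offset >>> 10)); // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        String s = new String(toSurrogatePair(0x1F600));
        System.out.println(s.length());                      // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
        System.out.println(Integer.toHexString(s.charAt(0)) + " " + Integer.toHexString(s.charAt(1)));
        // d83d de00
    }
}
```

This is why String.length() counts chars rather than characters once supplementary code points are involved.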

How can I iterate through the Unicode codepoints of a Java String?

http://stackoverflow.com/questions/1527856/how-can-i-iterate-through-the-unicode-codepoints-of-a-java-string

... can I iterate through the Unicode codepoints of a Java String? So I know about String#codePointAt(int), but it's indexed by the char offset, not by the codepoint offset. I'm thinking about trying something like using String#charAt ... range; if so, use String#codePointAt(int) to get the codepoint and increment the index by 2; if not, use the given char value ...
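
Rather than testing for a surrogate and incrementing by 2 by hand, Character.charCount reports how many chars a code point occupies, and since Java 8 String.codePoints() returns the code points as an IntStream. A minimal sketch; codePointsOf is a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: two ways to walk a String by code point instead of by char.
final class CodePointIteration {

    // Index-based loop: codePointAt reads a whole code point (one or two
    // chars), and Character.charCount says how far to advance.
    static List<Integer> codePointsOf(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        for (int cp : codePointsOf(s)) {
            System.out.println(Integer.toHexString(cp)); // 61, 1f600, 62
        }
        // Java 8+: the same sequence as an IntStream.
        s.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

An unpaired surrogate is returned as itself with a charCount of 1, so the loop never gets stuck on malformed input.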

How do I detect Unicode characters in a Java string?

http://stackoverflow.com/questions/1673544/how-do-i-detect-unicode-characters-in-a-java-string

... to loop through every character of the String and test whether its codepoint is covered by the ISO 8859 charset. You can also ...
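
A minimal sketch of that loop, assuming "ISO 8859" means ISO-8859-1 (Latin-1), whose 256 characters coincide with U+0000..U+00FF. A second variant asks the JDK's ISO-8859-1 encoder through CharsetEncoder.canEncode, which works for other members of the ISO 8859 family if you swap the charset. Class and method names are hypothetical:

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: does a String contain anything outside ISO-8859-1?
final class Latin1Check {

    // Loop over code points; ISO-8859-1 maps exactly onto U+0000..U+00FF.
    static boolean isLatin1ByCodePoint(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFF);
    }

    // Let a fresh charset encoder decide instead of hard-coding the range.
    static boolean isLatin1ByEncoder(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isLatin1ByCodePoint("caf\u00E9")); // true  (e-acute is U+00E9)
        System.out.println(isLatin1ByEncoder("5 \u20AC"));    // false (euro sign is U+20AC)
    }
}
```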

How to unescape a Java string literal in Java

http://stackoverflow.com/questions/3537706/howto-unescape-a-java-string-literal-in-java

NB: proper Unicode never needs more than 6 hex digits, as the highest valid codepoint is 0x10FFFF, not maxint 0xFFFFFFFF ... Lame Java escape ... IDIOT JAVA ... A control character is what you get when you XOR its codepoint with '@' (64). This only makes sense for ASCII and may not yield ...
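
Both remarks fit in a few lines. A minimal sketch; the helper names controlChar and fromHexEscape are hypothetical and not taken from the original answer. The first applies the XOR-with-64 rule (so 'A' gives U+0001 and '[' gives ESC, U+001B), and the second accepts at most six hex digits and rejects values above 0x10FFFF via Character.isValidCodePoint:

```java
// Sketch: helpers for two escape forms discussed above. Names are hypothetical.
final class EscapeHelpers {

    // Control-character escape: XOR the code point with '@' (64).
    // Only meaningful for ASCII, so anything above 0x7F is rejected.
    static char controlChar(char c) {
        if (c > 0x7F) {
            throw new IllegalArgumentException("not ASCII: " + (int) c);
        }
        return (char) (c ^ '@');
    }

    // Hex escape body such as "1F600": at most 6 digits are ever needed,
    // because the highest valid code point is 0x10FFFF.
    static String fromHexEscape(String hex) {
        if (hex.isEmpty() || hex.length() > 6) {
            throw new IllegalArgumentException("expected 1..6 hex digits: " + hex);
        }
        int cp = Integer.parseInt(hex, 16); // NumberFormatException on non-hex input
        if (!Character.isValidCodePoint(cp)) {
            throw new IllegalArgumentException("not a valid code point: " + hex);
        }
        return new String(Character.toChars(cp));
    }

    public static void main(String[] args) {
        System.out.println((int) controlChar('A')); // 1
        System.out.println((int) controlChar('?')); // 127 (DEL)
        System.out.println(fromHexEscape("416"));   // the Cyrillic letter U+0416
    }
}
```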

Reading File from Windows and Linux yields different results (character encoding?)

http://stackoverflow.com/questions/6366912/reading-file-from-windows-and-linux-yields-different-results-character-encoding

... 0xEF 0xBF 0xBD, and is the UTF-8 representation of the Unicode codepoint 0xFFFD. The codepoint in itself is the replacement character for illegal UTF-8 sequences. ... a UTF-8 sequence. Since 0x89 on its own is not a valid codepoint in the 7-bit ASCII range (see the UTF-8 encoding scheme), it cannot ...
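
The byte sequence described above is easy to reproduce: decoding a lone 0x89 as UTF-8 yields U+FFFD, and re-encoding that character gives 0xEF 0xBF 0xBD. A minimal sketch with hypothetical helper names; it also shows a strict decoder that reports the malformed byte instead of replacing it. Passing an explicit charset, as here, avoids depending on the platform default charset, which is one common reason the same file reads differently on Windows and Linux.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: where the EF BF BD bytes come from. Names are hypothetical.
final class ReplacementCharDemo {

    // Lenient: new String(bytes, UTF_8) replaces malformed input with U+FFFD.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: report malformed input as a CharacterCodingException instead.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] lone = {(byte) 0x89}; // a UTF-8 continuation byte with no lead byte
        String decoded = decodeLenient(lone);
        System.out.println(Integer.toHexString(decoded.charAt(0))); // fffd
        for (byte b : decoded.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);                   // EF BF BD
        }
        System.out.println();
    }
}
```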