======================================================================
Special characters on Windows
On Windows, characters were stored in 8 bits, which left the values
128 - 255 free for so-called "special characters". Depending on the
language region, the system formerly used different encodings. The
most widespread was Windows-1252 Western European, an extension of
the ISO-8859-1 encoding, better known as Latin-1.
+----------------+-----------------+----------------+----------------+
| | | | |
| 0x80 / 128 € | 0xa0 / 160 nbsp | 0xc0 / 192 À | 0xe0 / 224 à |
| | 0xa1 / 161 ¡ | 0xc1 / 193 Á | 0xe1 / 225 á |
| 0x82 / 130 ‚ | 0xa2 / 162 ¢ | 0xc2 / 194 Â | 0xe2 / 226 â |
| 0x83 / 131 ƒ | 0xa3 / 163 £ | 0xc3 / 195 Ã | 0xe3 / 227 ã |
| 0x84 / 132 „ | 0xa4 / 164 ¤ | 0xc4 / 196 Ä | 0xe4 / 228 ä |
| 0x85 / 133 … | 0xa5 / 165 ¥ | 0xc5 / 197 Å | 0xe5 / 229 å |
| 0x86 / 134 † | 0xa6 / 166 ¦ | 0xc6 / 198 Æ | 0xe6 / 230 æ |
| 0x87 / 135 ‡ | 0xa7 / 167 § | 0xc7 / 199 Ç | 0xe7 / 231 ç |
| 0x88 / 136 ˆ | 0xa8 / 168 ¨ | 0xc8 / 200 È | 0xe8 / 232 è |
| 0x89 / 137 ‰ | 0xa9 / 169 © | 0xc9 / 201 É | 0xe9 / 233 é |
| 0x8a / 138 Š | 0xaa / 170 ª | 0xca / 202 Ê | 0xea / 234 ê |
| 0x8b / 139 ‹ | 0xab / 171 « | 0xcb / 203 Ë | 0xeb / 235 ë |
| 0x8c / 140 Œ | 0xac / 172 ¬ | 0xcc / 204 Ì | 0xec / 236 ì |
| | 0xad / 173 shy | 0xcd / 205 Í | 0xed / 237 í |
| 0x8e / 142 Ž | 0xae / 174 ® | 0xce / 206 Î | 0xee / 238 î |
| | 0xaf / 175 ¯ | 0xcf / 207 Ï | 0xef / 239 ï |
| | 0xb0 / 176 ° | 0xd0 / 208 Ð | 0xf0 / 240 ð |
| 0x91 / 145 ‘ | 0xb1 / 177 ± | 0xd1 / 209 Ñ | 0xf1 / 241 ñ |
| 0x92 / 146 ’ | 0xb2 / 178 ² | 0xd2 / 210 Ò | 0xf2 / 242 ò |
| 0x93 / 147 “ | 0xb3 / 179 ³ | 0xd3 / 211 Ó | 0xf3 / 243 ó |
| 0x94 / 148 ” | 0xb4 / 180 ´ | 0xd4 / 212 Ô | 0xf4 / 244 ô |
| 0x95 / 149 • | 0xb5 / 181 µ | 0xd5 / 213 Õ | 0xf5 / 245 õ |
| 0x96 / 150 – | 0xb6 / 182 ¶ | 0xd6 / 214 Ö | 0xf6 / 246 ö |
| 0x97 / 151 — | 0xb7 / 183 · | 0xd7 / 215 × | 0xf7 / 247 ÷ |
| 0x98 / 152 ˜ | 0xb8 / 184 ¸ | 0xd8 / 216 Ø | 0xf8 / 248 ø |
| 0x99 / 153 ™ | 0xb9 / 185 ¹ | 0xd9 / 217 Ù | 0xf9 / 249 ù |
| 0x9a / 154 š | 0xba / 186 º | 0xda / 218 Ú | 0xfa / 250 ú |
| 0x9b / 155 › | 0xbb / 187 » | 0xdb / 219 Û | 0xfb / 251 û |
| 0x9c / 156 œ | 0xbc / 188 ¼ | 0xdc / 220 Ü | 0xfc / 252 ü |
| | 0xbd / 189 ½ | 0xdd / 221 Ý | 0xfd / 253 ý |
| 0x9e / 158 ž | 0xbe / 190 ¾ | 0xde / 222 Þ | 0xfe / 254 þ |
| 0x9f / 159 Ÿ | 0xbf / 191 ¿ | 0xdf / 223 ß | 0xff / 255 ÿ |
| | | | |
+----------------+-----------------+----------------+----------------+
When a line of text holds more characters than the display has room
for, the line is "wrapped": the word that runs past the end of the
line is moved to the next line. Character 160 is the so-called
non-breaking space (nbsp). It is rendered as a blank just like
character 32, but it is not treated as a gap between words, so no
line break occurs at it.
Some text programs can additionally split words with a hyphen.
Character 173 is the so-called soft hyphen (shy), which is displayed
only when a word is actually split at that position. Otherwise the
character is not displayed.
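As a rough illustration of the wrapping rule, the following Python
sketch uses the standard textwrap module, which treats only ASCII
whitespace as a place to break a line. The sample text and the width
of 8 are arbitrary assumptions chosen for the demonstration.

```python
import textwrap
import unicodedata

NBSP = "\u00a0"  # character 160, non-breaking space
SHY = "\u00ad"   # character 173, soft hyphen

# With an ordinary space (character 32) the wrapper may break between
# "10" and "km".
with_space = textwrap.wrap("about 10 km", width=8)

# With a non-breaking space, "10<nbsp>km" is treated as a single word.
with_nbsp = textwrap.wrap("about 10" + NBSP + "km", width=8)

print(with_space)  # ['about 10', 'km']
print(with_nbsp)   # ['about', '10\xa0km']

# Both characters are described in the Unicode character database.
print(unicodedata.name(NBSP))     # NO-BREAK SPACE
print(unicodedata.name(SHY))      # SOFT HYPHEN
print(unicodedata.category(SHY))  # Cf (format character)
```

Note that textwrap does not hyphenate at soft hyphens; the last lines
only show how the character is classified.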
The encodings Windows-1252 Western European and ISO-8859-1 differ in
the characters 128 - 159, for which the ISO 8859 standards define no
printable characters. Because these characters are rarely needed, the
two encodings were often mistakenly mixed up.
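The difference can be made visible with a short Python sketch that
decodes the same bytes with both codecs. The byte values are arbitrary
examples; "cp1252" and "latin-1" are Python's codec names for
Windows-1252 and ISO-8859-1.

```python
# Bytes 0x80 - 0x9f are where the two encodings disagree.
raw = bytes([0x80, 0x93, 0x94])

as_cp1252 = raw.decode("cp1252")   # Windows-1252
as_latin1 = raw.decode("latin-1")  # ISO-8859-1

# Euro sign and two curly quotes under Windows-1252 ...
print(ascii(as_cp1252))  # '\u20ac\u201c\u201d'
# ... but three invisible control characters under ISO-8859-1.
print(ascii(as_latin1))  # '\x80\x93\x94'

# From 0xa0 upwards both encodings agree.
upper = bytes(range(0xA0, 0x100))
assert upper.decode("cp1252") == upper.decode("latin-1")

# Python's cp1252 codec leaves five positions undefined and rejects them.
UNDEFINED = [0x81, 0x8D, 0x8F, 0x90, 0x9D]
for value in UNDEFINED:
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        print(f"0x{value:02x} is not defined in Windows-1252")
```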
The label ISO-8859-1, or Latin-1, is nowadays far more common than
Windows-1252, and its assignment of the characters 128 - 255 was even
adopted by Unicode. Nevertheless, many files are still encoded in
Windows-1252, which is why the HTML5 standard requires the encoding
label ISO-8859-1 to be interpreted as Windows-1252. To avoid this
ambiguity, use the UTF-8 encoding or named HTML character references
as listed in the following table:
+----------------+-----------------+----------------+----------------+
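| € &euro;       | nbsp &nbsp;     | À &Agrave;     | à &agrave;     |
|                | ¡    &iexcl;    | Á &Aacute;     | á &aacute;     |
| ‚ &sbquo;      | ¢    &cent;     | Â &Acirc;      | â &acirc;      |
| ƒ &fnof;       | £    &pound;    | Ã &Atilde;     | ã &atilde;     |
| „ &bdquo;      | ¤    &curren;   | Ä &Auml;       | ä &auml;       |
| … &hellip;     | ¥    &yen;      | Å &Aring;      | å &aring;      |
| † &dagger;     | ¦    &brvbar;   | Æ &AElig;      | æ &aelig;      |
| ‡ &Dagger;     | §    &sect;     | Ç &Ccedil;     | ç &ccedil;     |
| ˆ &circ;       | ¨    &uml;      | È &Egrave;     | è &egrave;     |
| ‰ &permil;     | ©    &copy;     | É &Eacute;     | é &eacute;     |
| Š &Scaron;     | ª    &ordf;     | Ê &Ecirc;      | ê &ecirc;      |
| ‹ &lsaquo;     | «    &laquo;    | Ë &Euml;       | ë &euml;       |
| Œ &OElig;      | ¬    &not;      | Ì &Igrave;     | ì &igrave;     |
|                | shy  &shy;      | Í &Iacute;     | í &iacute;     |
| Ž              | ®    &reg;      | Î &Icirc;      | î &icirc;      |
|                | ¯    &macr;     | Ï &Iuml;       | ï &iuml;       |
|                | °    &deg;      | Ð &ETH;        | ð &eth;        |
| ‘ &lsquo;      | ±    &plusmn;   | Ñ &Ntilde;     | ñ &ntilde;     |
| ’ &rsquo;      | ²    &sup2;     | Ò &Ograve;     | ò &ograve;     |
| “ &ldquo;      | ³    &sup3;     | Ó &Oacute;     | ó &oacute;     |
| ” &rdquo;      | ´    &acute;    | Ô &Ocirc;      | ô &ocirc;      |
| • &bull;       | µ    &micro;    | Õ &Otilde;     | õ &otilde;     |
| – &ndash;      | ¶    &para;     | Ö &Ouml;       | ö &ouml;       |
| — &mdash;      | ·    &middot;   | × &times;      | ÷ &divide;     |
| ˜ &tilde;      | ¸    &cedil;    | Ø &Oslash;     | ø &oslash;     |
| ™ &trade;      | ¹    &sup1;     | Ù &Ugrave;     | ù &ugrave;     |
| š &scaron;     | º    &ordm;     | Ú &Uacute;     | ú &uacute;     |
| › &rsaquo;     | »    &raquo;    | Û &Ucirc;      | û &ucirc;      |
| œ &oelig;      | ¼    &frac14;   | Ü &Uuml;       | ü &uuml;       |
|                | ½    &frac12;   | Ý &Yacute;     | ý &yacute;     |
| ž              | ¾    &frac34;   | Þ &THORN;      | þ &thorn;      |
| Ÿ &Yuml;       | ¿    &iquest;   | ß &szlig;      | ÿ &yuml;       |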
+----------------+-----------------+----------------+----------------+
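Named references like those in the table above can also be produced
programmatically. The following Python sketch relies on the standard
html.entities.codepoint2name mapping, which covers the HTML 4 names;
the helper to_named_entities is a hypothetical name introduced only
for this illustration, and it deliberately leaves ASCII characters
(including "&") untouched.

```python
import html
from html.entities import codepoint2name

def to_named_entities(text):
    """Replace every non-ASCII character by an HTML character reference."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            parts.append(ch)  # plain ASCII stays as it is
        elif code in codepoint2name:
            parts.append("&" + codepoint2name[code] + ";")  # named reference
        else:
            parts.append("&#" + str(code) + ";")  # numeric fallback
    return "".join(parts)

original = "Gr\u00fc\u00dfe \u20ac5"
encoded = to_named_entities(original)
print(encoded)  # Gr&uuml;&szlig;e &euro;5

# Decoding the references restores the original text.
assert html.unescape(encoded) == original

# html.unescape applies the HTML5 rule for numeric references 128 - 159:
# they are read as Windows-1252 positions, so &#128; yields the euro sign.
assert html.unescape("&#128;") == "\u20ac"
```

Characters without an HTML 4 name, such as Ž, fall back to a numeric
reference in this sketch.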
Besides the standard ISO 8859 (no hyphen) there is also the character
set ISO-8859-1 (with hyphen) registered with IANA. It likewise defines
no printable characters for 128 - 159, but fills these positions with
additional control characters, which are listed in the following
table:
+------------+--------+----------------------------------------------+
| 0x80 / 128 | PAD | Padding character |
| 0x81 / 129 | HOP | High octet preset |
| 0x82 / 130 | BPH | Break permitted here |
| 0x83 / 131 | NBH | No break here |
| 0x84 / 132 | IND | Index |
| 0x85 / 133 | NEL | Next line |
| 0x86 / 134 | SSA | Start of selected area |
| 0x87 / 135 | ESA | End of selected area |
| 0x88 / 136 | HTS | Character tabulation set |
| 0x89 / 137 | HTJ | Character tabulation with justification |
| 0x8a / 138 | VTS | Line tabulation set |
| 0x8b / 139 | PLD | Partial line forward |
| 0x8c / 140 | PLU | Partial line backward |
| 0x8d / 141 | RI | Reverse line feed |
| 0x8e / 142 | SS2 | Single-shift two |
| 0x8f / 143 | SS3 | Single-shift three |
| 0x90 / 144 | DCS | Device control string |
| 0x91 / 145 | PU1 | Private use one |
| 0x92 / 146 | PU2 | Private use two |
| 0x93 / 147 | STS | Set transmit state |
| 0x94 / 148 | CCH | Cancel character |
| 0x95 / 149 | MW | Message waiting |
| 0x96 / 150 | SPA | Start of guarded area |
| 0x97 / 151 | EPA | End of guarded area |
| 0x98 / 152 | SOS | Start of string |
| 0x99 / 153 | SGCI | Single graphic character introducer |
| 0x9a / 154 | SCI | Single character introducer |
| 0x9b / 155 | CSI | Control sequence introducer |
| 0x9c / 156 | ST | String terminator |
| 0x9d / 157 | OSC | Operating system command |
| 0x9e / 158 | PM | Privacy message |
| 0x9f / 159 | APC | Application program command |
+------------+--------+----------------------------------------------+
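To show what these control characters mean in practice, the following
Python sketch decodes one byte sequence in both ways. The sample bytes
are an assumption made up for the demonstration; byte 0x85 is NEL in
the table above and the ellipsis in Windows-1252.

```python
import unicodedata

raw = b"line one\x85line two"

as_latin1 = raw.decode("latin-1")
as_cp1252 = raw.decode("cp1252")

# Under ISO-8859-1, byte 0x85 becomes U+0085, the control character NEL,
# which Python's str.splitlines() treats as a line boundary.
print(unicodedata.category(as_latin1[8]))  # Cc
print(as_latin1.splitlines())              # ['line one', 'line two']

# Under Windows-1252 the same byte is a printable character,
# so the text stays on a single line.
print(unicodedata.name(as_cp1252[8]))      # HORIZONTAL ELLIPSIS
print(len(as_cp1252.splitlines()))         # 1
```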
Finally, a list of the characters defined in Windows-1252, together
with their conversion to Unicode and their Unicode names:
+--------------------+---+-------------------------------------------+
| Hex /Dec /UTF-8 | | Unicode Name |
| | | |
| 0x20ac/8364/e282ac | € | EURO SIGN |
| | | |
| 0x201a/8218/e2809a | ‚ | SINGLE LOW-9 QUOTATION MARK |
| 0x0192/0402/c692 | ƒ | LATIN SMALL LETTER F WITH HOOK |
| 0x201e/8222/e2809e | „ | DOUBLE LOW-9 QUOTATION MARK |
| 0x2026/8230/e280a6 | … | HORIZONTAL ELLIPSIS |
| 0x2020/8224/e280a0 | † | DAGGER |
| 0x2021/8225/e280a1 | ‡ | DOUBLE DAGGER |
| 0x02c6/0710/cb86 | ˆ | MODIFIER LETTER CIRCUMFLEX ACCENT |
| 0x2030/8240/e280b0 | ‰ | PER MILLE SIGN |
| 0x0160/0352/c5a0 | Š | LATIN CAPITAL LETTER S WITH CARON |
| 0x2039/8249/e280b9 | ‹ | SINGLE LEFT-POINTING ANGLE QUOTATION MARK |
| 0x0152/0338/c592 | Œ | LATIN CAPITAL LIGATURE OE |
| | | |
| 0x017d/0381/c5bd | Ž | LATIN CAPITAL LETTER Z WITH CARON |
| | | |
+--------------------+---+-------------------------------------------+
| | | |
| 0x2018/8216/e28098 | ‘ | LEFT SINGLE QUOTATION MARK |
| 0x2019/8217/e28099 | ’ | RIGHT SINGLE QUOTATION MARK |
| 0x201c/8220/e2809c | “ | LEFT DOUBLE QUOTATION MARK |
| 0x201d/8221/e2809d | ” | RIGHT DOUBLE QUOTATION MARK |
| 0x2022/8226/e280a2 | • | BULLET |
| 0x2013/8211/e28093 | – | EN DASH |
| 0x2014/8212/e28094 | — | EM DASH |
| 0x02dc/0732/cb9c | ˜ | SMALL TILDE |
| 0x2122/8482/e284a2 | ™ | TRADE MARK SIGN |
| 0x0161/0353/c5a1 | š | LATIN SMALL LETTER S WITH CARON |
| 0x203a/8250/e280ba | › | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK |
| 0x0153/0339/c593 | œ | LATIN SMALL LIGATURE OE |
| | | |
| 0x017e/0382/c5be | ž | LATIN SMALL LETTER Z WITH CARON |
| 0x0178/0376/c5b8 | Ÿ | LATIN CAPITAL LETTER Y WITH DIAERESIS |
+--------------------+---+-------------------------------------------+
| 0x00a0/0160/c2a0 nbsp | NO-BREAK SPACE |
| 0x00a1/0161/c2a1 | ¡ | INVERTED EXCLAMATION MARK |
| 0x00a2/0162/c2a2 | ¢ | CENT SIGN |
| 0x00a3/0163/c2a3 | £ | POUND SIGN |
| 0x00a4/0164/c2a4 | ¤ | CURRENCY SIGN |
| 0x00a5/0165/c2a5 | ¥ | YEN SIGN |
| 0x00a6/0166/c2a6 | ¦ | BROKEN BAR |
| 0x00a7/0167/c2a7 | § | SECTION SIGN |
| 0x00a8/0168/c2a8 | ¨ | DIAERESIS |
| 0x00a9/0169/c2a9 | © | COPYRIGHT SIGN |
| 0x00aa/0170/c2aa | ª | FEMININE ORDINAL INDICATOR |
| 0x00ab/0171/c2ab | « | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK |
| 0x00ac/0172/c2ac | ¬ | NOT SIGN |
| 0x00ad/0173/c2ad shy | SOFT HYPHEN |
| 0x00ae/0174/c2ae | ® | REGISTERED SIGN |
| 0x00af/0175/c2af | ¯ | MACRON |
+--------------------+---+-------------------------------------------+
| 0x00b0/0176/c2b0 | ° | DEGREE SIGN |
| 0x00b1/0177/c2b1 | ± | PLUS-MINUS SIGN |
| 0x00b2/0178/c2b2 | ² | SUPERSCRIPT TWO |
| 0x00b3/0179/c2b3 | ³ | SUPERSCRIPT THREE |
| 0x00b4/0180/c2b4 | ´ | ACUTE ACCENT |
| 0x00b5/0181/c2b5 | µ | MICRO SIGN |
| 0x00b6/0182/c2b6 | ¶ | PILCROW SIGN |
| 0x00b7/0183/c2b7 | · | MIDDLE DOT |
| 0x00b8/0184/c2b8 | ¸ | CEDILLA |
| 0x00b9/0185/c2b9 | ¹ | SUPERSCRIPT ONE |
| 0x00ba/0186/c2ba | º | MASCULINE ORDINAL INDICATOR |
| 0x00bb/0187/c2bb | » | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK |
| 0x00bc/0188/c2bc | ¼ | VULGAR FRACTION ONE QUARTER |
| 0x00bd/0189/c2bd | ½ | VULGAR FRACTION ONE HALF |
| 0x00be/0190/c2be | ¾ | VULGAR FRACTION THREE QUARTERS |
| 0x00bf/0191/c2bf | ¿ | INVERTED QUESTION MARK |
+--------------------+---+-------------------------------------------+
| 0x00c0/0192/c380 | À | LATIN CAPITAL LETTER A WITH GRAVE |
| 0x00c1/0193/c381 | Á | LATIN CAPITAL LETTER A WITH ACUTE |
| 0x00c2/0194/c382 | Â | LATIN CAPITAL LETTER A WITH CIRCUMFLEX |
| 0x00c3/0195/c383 | Ã | LATIN CAPITAL LETTER A WITH TILDE |
| 0x00c4/0196/c384 | Ä | LATIN CAPITAL LETTER A WITH DIAERESIS |
| 0x00c5/0197/c385 | Å | LATIN CAPITAL LETTER A WITH RING ABOVE |
| 0x00c6/0198/c386 | Æ | LATIN CAPITAL LETTER AE |
| 0x00c7/0199/c387 | Ç | LATIN CAPITAL LETTER C WITH CEDILLA |
| 0x00c8/0200/c388 | È | LATIN CAPITAL LETTER E WITH GRAVE |
| 0x00c9/0201/c389 | É | LATIN CAPITAL LETTER E WITH ACUTE |
| 0x00ca/0202/c38a | Ê | LATIN CAPITAL LETTER E WITH CIRCUMFLEX |
| 0x00cb/0203/c38b | Ë | LATIN CAPITAL LETTER E WITH DIAERESIS |
| 0x00cc/0204/c38c | Ì | LATIN CAPITAL LETTER I WITH GRAVE |
| 0x00cd/0205/c38d | Í | LATIN CAPITAL LETTER I WITH ACUTE |
| 0x00ce/0206/c38e | Î | LATIN CAPITAL LETTER I WITH CIRCUMFLEX |
| 0x00cf/0207/c38f | Ï | LATIN CAPITAL LETTER I WITH DIAERESIS |
+--------------------+---+-------------------------------------------+
| 0x00d0/0208/c390 | Ð | LATIN CAPITAL LETTER ETH |
| 0x00d1/0209/c391 | Ñ | LATIN CAPITAL LETTER N WITH TILDE |
| 0x00d2/0210/c392 | Ò | LATIN CAPITAL LETTER O WITH GRAVE |
| 0x00d3/0211/c393 | Ó | LATIN CAPITAL LETTER O WITH ACUTE |
| 0x00d4/0212/c394 | Ô | LATIN CAPITAL LETTER O WITH CIRCUMFLEX |
| 0x00d5/0213/c395 | Õ | LATIN CAPITAL LETTER O WITH TILDE |
| 0x00d6/0214/c396 | Ö | LATIN CAPITAL LETTER O WITH DIAERESIS |
| 0x00d7/0215/c397 | × | MULTIPLICATION SIGN |
| 0x00d8/0216/c398 | Ø | LATIN CAPITAL LETTER O WITH STROKE |
| 0x00d9/0217/c399 | Ù | LATIN CAPITAL LETTER U WITH GRAVE |
| 0x00da/0218/c39a | Ú | LATIN CAPITAL LETTER U WITH ACUTE |
| 0x00db/0219/c39b | Û | LATIN CAPITAL LETTER U WITH CIRCUMFLEX |
| 0x00dc/0220/c39c | Ü | LATIN CAPITAL LETTER U WITH DIAERESIS |
| 0x00dd/0221/c39d | Ý | LATIN CAPITAL LETTER Y WITH ACUTE |
| 0x00de/0222/c39e | Þ | LATIN CAPITAL LETTER THORN |
| 0x00df/0223/c39f | ß | LATIN SMALL LETTER SHARP S |
+--------------------+---+-------------------------------------------+
| 0x00e0/0224/c3a0 | à | LATIN SMALL LETTER A WITH GRAVE |
| 0x00e1/0225/c3a1 | á | LATIN SMALL LETTER A WITH ACUTE |
| 0x00e2/0226/c3a2 | â | LATIN SMALL LETTER A WITH CIRCUMFLEX |
| 0x00e3/0227/c3a3 | ã | LATIN SMALL LETTER A WITH TILDE |
| 0x00e4/0228/c3a4 | ä | LATIN SMALL LETTER A WITH DIAERESIS |
| 0x00e5/0229/c3a5 | å | LATIN SMALL LETTER A WITH RING ABOVE |
| 0x00e6/0230/c3a6 | æ | LATIN SMALL LETTER AE |
| 0x00e7/0231/c3a7 | ç | LATIN SMALL LETTER C WITH CEDILLA |
| 0x00e8/0232/c3a8 | è | LATIN SMALL LETTER E WITH GRAVE |
| 0x00e9/0233/c3a9 | é | LATIN SMALL LETTER E WITH ACUTE |
| 0x00ea/0234/c3aa | ê | LATIN SMALL LETTER E WITH CIRCUMFLEX |
| 0x00eb/0235/c3ab | ë | LATIN SMALL LETTER E WITH DIAERESIS |
| 0x00ec/0236/c3ac | ì | LATIN SMALL LETTER I WITH GRAVE |
| 0x00ed/0237/c3ad | í | LATIN SMALL LETTER I WITH ACUTE |
| 0x00ee/0238/c3ae | î | LATIN SMALL LETTER I WITH CIRCUMFLEX |
| 0x00ef/0239/c3af | ï | LATIN SMALL LETTER I WITH DIAERESIS |
+--------------------+---+-------------------------------------------+
| 0x00f0/0240/c3b0 | ð | LATIN SMALL LETTER ETH |
| 0x00f1/0241/c3b1 | ñ | LATIN SMALL LETTER N WITH TILDE |
| 0x00f2/0242/c3b2 | ò | LATIN SMALL LETTER O WITH GRAVE |
| 0x00f3/0243/c3b3 | ó | LATIN SMALL LETTER O WITH ACUTE |
| 0x00f4/0244/c3b4 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX |
| 0x00f5/0245/c3b5 | õ | LATIN SMALL LETTER O WITH TILDE |
| 0x00f6/0246/c3b6 | ö | LATIN SMALL LETTER O WITH DIAERESIS |
| 0x00f7/0247/c3b7 | ÷ | DIVISION SIGN |
| 0x00f8/0248/c3b8 | ø | LATIN SMALL LETTER O WITH STROKE |
| 0x00f9/0249/c3b9 | ù | LATIN SMALL LETTER U WITH GRAVE |
| 0x00fa/0250/c3ba | ú | LATIN SMALL LETTER U WITH ACUTE |
| 0x00fb/0251/c3bb | û | LATIN SMALL LETTER U WITH CIRCUMFLEX |
| 0x00fc/0252/c3bc | ü | LATIN SMALL LETTER U WITH DIAERESIS |
| 0x00fd/0253/c3bd | ý | LATIN SMALL LETTER Y WITH ACUTE |
| 0x00fe/0254/c3be | þ | LATIN SMALL LETTER THORN |
| 0x00ff/0255/c3bf | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS |
+--------------------+---+-------------------------------------------+
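The columns of the table above can be recomputed from a Windows-1252
byte value. The following Python sketch is one way to do that with the
standard library; the function name table_row is a hypothetical helper
chosen for this illustration.

```python
import unicodedata

def table_row(byte_value):
    """Return (hex, decimal, UTF-8 hex, Unicode name) for a Windows-1252 byte."""
    ch = bytes([byte_value]).decode("cp1252")  # byte -> Unicode character
    code = ord(ch)                             # Unicode code point
    return (
        f"0x{code:04x}",                # code point, hexadecimal
        f"{code:04d}",                  # code point, decimal
        ch.encode("utf-8").hex(),       # UTF-8 byte sequence
        unicodedata.name(ch),           # official Unicode name
    )

for value in (0x80, 0x8A, 0xE4):
    hex_code, dec_code, utf8_hex, name = table_row(value)
    print(f"{hex_code}/{dec_code}/{utf8_hex} {name}")
# 0x20ac/8364/e282ac EURO SIGN
# 0x0160/0352/c5a0 LATIN CAPITAL LETTER S WITH CARON
# 0x00e4/0228/c3a4 LATIN SMALL LETTER A WITH DIAERESIS
```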
======================================================================
----------------------------------------------------------------------
(c) Tobias Stamm, manderby.com