The Extended SMS Converter uses the lossy mechanism to extend the alphabet of the standard converter. It maps a wider set of input character codes, including commonly used Eastern European Unicode characters, to the standard 7-bit alphabet. This section describes the Extended SMS Converter and the alphabet it supports.
Languages supported by this converter include Croatian, Czech, Estonian, Hungarian, Icelandic, Latvian, Lithuanian, Polish, Romanian, Serbian, Slovak, Slovenian, Turkish, Portuguese and Spanish. This converter is identified by the KCharacterSetIdentifierExtendedSms7Bit UID, which is defined in the charconv.h file.
Any undefined Unicode character is converted to a question mark (?), GSM code 0x3F. Any code outside the GSM range 0x00 to 0x7F is converted to the Unicode replacement character, 0xFFFD.
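The two fallback rules above can be sketched in a few lines of Python. This is only an illustration, not the converter's implementation: the table `UNICODE_TO_GSM` is a hypothetical four-entry subset, and the function names are assumptions made for this example.

```python
GSM_QUESTION_MARK = 0x3F        # GSM 7-bit code for '?'
UNICODE_REPLACEMENT = "\uFFFD"  # Unicode replacement character

# Hypothetical partial table (Unicode character -> GSM 7-bit code).
# A real converter carries the complete alphabet.
UNICODE_TO_GSM = {"A": 0x41, "a": 0x61, "?": 0x3F, " ": 0x20}
GSM_TO_UNICODE = {code: ch for ch, code in UNICODE_TO_GSM.items()}


def encode_char(ch):
    """Unicode -> GSM: characters with no mapping become '?' (0x3F)."""
    return UNICODE_TO_GSM.get(ch, GSM_QUESTION_MARK)


def decode_code(code):
    """GSM -> Unicode: codes outside 0x00..0x7F become U+FFFD."""
    if not 0x00 <= code <= 0x7F:
        return UNICODE_REPLACEMENT
    if code not in GSM_TO_UNICODE:
        # In range, but missing from this sketch's small subset.
        raise LookupError(f"code 0x{code:02X} is not in the sample table")
    return GSM_TO_UNICODE[code]
```

For example, `encode_char("\u4E2D")` falls back to `0x3F` because the character has no entry in the table, and `decode_code(0x80)` yields the replacement character because the code lies outside the 7-bit range.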
The highlighted boxes in Figure 1 illustrate the alphabet of the Extended SMS Converter:
GSM 7-bit default alphabet
GSM 7-bit default alphabet extension table
Extra lossy conversions, excluding the 9 characters listed in Table 2
Extended lossy conversions, shown as Lossy Characters 2 in Figure 1
Figure 1
Table 1 lists the extra lossy conversions supported by this converter in addition to those supported by the standard converter.
Table 1
Character | Unicode | GSM | Converted Character |
Ώ GREEK CAPITAL LETTER OMEGA WITH TONOS | U+038F | 0x15 | Ω GREEK CAPITAL LETTER OMEGA |
(NO-BREAK SPACE) | U+00A0 | 0x20 | (SPACE) |
« LEFT-POINTING DOUBLE ANGLE QUOTATION MARK * | U+00AB | 0x22 | " QUOTATION MARK |
» RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK * | U+00BB | 0x22 | " QUOTATION MARK |
` GRAVE ACCENT | U+0060 | 0x27 | ' APOSTROPHE |
´ ACUTE ACCENT | U+00B4 | 0x27 | ' APOSTROPHE |
΄ GREEK TONOS | U+0384 | 0x27 | ' APOSTROPHE |
΅ GREEK DIALYTIKA TONOS | U+0385 | 0x27 | ' APOSTROPHE |
× MULTIPLICATION SIGN | U+00D7 | 0x2A | * ASTERISK |
¸ CEDILLA | U+00B8 | 0x2C | , COMMA |
SOFT HYPHEN | U+00AD | 0x2D | - HYPHEN-MINUS |
· MIDDLE DOT | U+00B7 | 0x2E | . FULL STOP |
÷ DIVISION SIGN | U+00F7 | 0x2F | / SOLIDUS |
¹ SUPERSCRIPT ONE | U+00B9 | 0x31 | 1 DIGIT ONE |
² SUPERSCRIPT TWO | U+00B2 | 0x32 | 2 DIGIT TWO |
³ SUPERSCRIPT THREE | U+00B3 | 0x33 | 3 DIGIT THREE |
; GREEK QUESTION MARK (Erotimatiko) | U+037E | 0x3B | ; SEMICOLON |
Ā LATIN CAPITAL LETTER A WITH MACRON | U+0100 | 0x41 | A LATIN CAPITAL LETTER A |
Ă LATIN CAPITAL LETTER A WITH BREVE | U+0102 | 0x41 | A LATIN CAPITAL LETTER A |
Ą LATIN CAPITAL LETTER A WITH OGONEK | U+0104 | 0x41 | A LATIN CAPITAL LETTER A |
Ć LATIN CAPITAL LETTER C WITH ACUTE | U+0106 | 0x43 | C LATIN CAPITAL LETTER C |
Ĉ LATIN CAPITAL LETTER C WITH CIRCUMFLEX | U+0108 | 0x43 | C LATIN CAPITAL LETTER C |
Ċ LATIN CAPITAL LETTER C WITH DOT ABOVE | U+010A | 0x43 | C LATIN CAPITAL LETTER C |
Č LATIN CAPITAL LETTER C WITH CARON | U+010C | 0x43 | C LATIN CAPITAL LETTER C |
Ð LATIN CAPITAL LETTER ETH (Icelandic) | U+00D0 | 0x44 | D LATIN CAPITAL LETTER D |
Ď LATIN CAPITAL LETTER D WITH CARON | U+010E | 0x44 | D LATIN CAPITAL LETTER D |
Đ LATIN CAPITAL LETTER D WITH STROKE | U+0110 | 0x44 | D LATIN CAPITAL LETTER D |
Ē LATIN CAPITAL LETTER E WITH MACRON | U+0112 | 0x45 | E LATIN CAPITAL LETTER E |
Ĕ LATIN CAPITAL LETTER E WITH BREVE | U+0114 | 0x45 | E LATIN CAPITAL LETTER E |
Ė LATIN CAPITAL LETTER E WITH DOT ABOVE | U+0116 | 0x45 | E LATIN CAPITAL LETTER E |
Ę LATIN CAPITAL LETTER E WITH OGONEK | U+0118 | 0x45 | E LATIN CAPITAL LETTER E |
Ě LATIN CAPITAL LETTER E WITH CARON | U+011A | 0x45 | E LATIN CAPITAL LETTER E |
Ĝ LATIN CAPITAL LETTER G WITH CIRCUMFLEX | U+011C | 0x47 | G LATIN CAPITAL LETTER G |
Ğ LATIN CAPITAL LETTER G WITH BREVE | U+011E | 0x47 | G LATIN CAPITAL LETTER G |
Ġ LATIN CAPITAL LETTER G WITH DOT ABOVE | U+0120 | 0x47 | G LATIN CAPITAL LETTER G |
Ģ LATIN CAPITAL LETTER G WITH CEDILLA | U+0122 | 0x47 | G LATIN CAPITAL LETTER G |
Ĥ LATIN CAPITAL LETTER H WITH CIRCUMFLEX | U+0124 | 0x48 | H LATIN CAPITAL LETTER H |
Ħ LATIN CAPITAL LETTER H WITH STROKE | U+0126 | 0x48 | H LATIN CAPITAL LETTER H |
Ĩ LATIN CAPITAL LETTER I WITH TILDE | U+0128 | 0x49 | I LATIN CAPITAL LETTER I |
Ī LATIN CAPITAL LETTER I WITH MACRON | U+012A | 0x49 | I LATIN CAPITAL LETTER I |
Ĭ LATIN CAPITAL LETTER I WITH BREVE | U+012C | 0x49 | I LATIN CAPITAL LETTER I |
Į LATIN CAPITAL LETTER I WITH OGONEK | U+012E | 0x49 | I LATIN CAPITAL LETTER I |
İ LATIN CAPITAL LETTER I WITH DOT ABOVE | U+0130 | 0x49 | I LATIN CAPITAL LETTER I |
Ĵ LATIN CAPITAL LETTER J WITH CIRCUMFLEX | U+0134 | 0x4A | J LATIN CAPITAL LETTER J |
Ķ LATIN CAPITAL LETTER K WITH CEDILLA | U+0136 | 0x4B | K LATIN CAPITAL LETTER K |
Ĺ LATIN CAPITAL LETTER L WITH ACUTE | U+0139 | 0x4C | L LATIN CAPITAL LETTER L |
Ļ LATIN CAPITAL LETTER L WITH CEDILLA | U+013B | 0x4C | L LATIN CAPITAL LETTER L |
Ľ LATIN CAPITAL LETTER L WITH CARON | U+013D | 0x4C | L LATIN CAPITAL LETTER L |
Ŀ LATIN CAPITAL LETTER L WITH MIDDLE DOT | U+013F | 0x4C | L LATIN CAPITAL LETTER L |
Ł LATIN CAPITAL LETTER L WITH STROKE | U+0141 | 0x4C | L LATIN CAPITAL LETTER L |
Ń LATIN CAPITAL LETTER N WITH ACUTE | U+0143 | 0x4E | N LATIN CAPITAL LETTER N |
Ņ LATIN CAPITAL LETTER N WITH CEDILLA | U+0145 | 0x4E | N LATIN CAPITAL LETTER N |
Ň LATIN CAPITAL LETTER N WITH CARON | U+0147 | 0x4E | N LATIN CAPITAL LETTER N |
Ŋ LATIN CAPITAL LETTER ENG (Sami) | U+014A | 0x4E | N LATIN CAPITAL LETTER N |
Ō LATIN CAPITAL LETTER O WITH MACRON | U+014C | 0x4F | O LATIN CAPITAL LETTER O |
Ŏ LATIN CAPITAL LETTER O WITH BREVE | U+014E | 0x4F | O LATIN CAPITAL LETTER O |
Œ LATIN CAPITAL LIGATURE OE | U+0152 | 0x4F | O LATIN CAPITAL LETTER O |
Ŕ LATIN CAPITAL LETTER R WITH ACUTE | U+0154 | 0x52 | R LATIN CAPITAL LETTER R |
Ŗ LATIN CAPITAL LETTER R WITH CEDILLA | U+0156 | 0x52 | R LATIN CAPITAL LETTER R |
Ř LATIN CAPITAL LETTER R WITH CARON | U+0158 | 0x52 | R LATIN CAPITAL LETTER R |
Ś LATIN CAPITAL LETTER S WITH ACUTE | U+015A | 0x53 | S LATIN CAPITAL LETTER S |
Ŝ LATIN CAPITAL LETTER S WITH CIRCUMFLEX | U+015C | 0x53 | S LATIN CAPITAL LETTER S |
Ş LATIN CAPITAL LETTER S WITH CEDILLA * | U+015E | 0x53 | S LATIN CAPITAL LETTER S |
Š LATIN CAPITAL LETTER S WITH CARON | U+0160 | 0x53 | S LATIN CAPITAL LETTER S |
Þ LATIN CAPITAL LETTER THORN (Icelandic) | U+00DE | 0x54 | T LATIN CAPITAL LETTER T |
Ţ LATIN CAPITAL LETTER T WITH CEDILLA * | U+0162 | 0x54 | T LATIN CAPITAL LETTER T |
Ť LATIN CAPITAL LETTER T WITH CARON | U+0164 | 0x54 | T LATIN CAPITAL LETTER T |
Ŧ LATIN CAPITAL LETTER T WITH STROKE | U+0166 | 0x54 | T LATIN CAPITAL LETTER T |
Ũ LATIN CAPITAL LETTER U WITH TILDE | U+0168 | 0x55 | U LATIN CAPITAL LETTER U |
Ū LATIN CAPITAL LETTER U WITH MACRON | U+016A | 0x55 | U LATIN CAPITAL LETTER U |
Ŭ LATIN CAPITAL LETTER U WITH BREVE | U+016C | 0x55 | U LATIN CAPITAL LETTER U |
Ů LATIN CAPITAL LETTER U WITH RING ABOVE | U+016E | 0x55 | U LATIN CAPITAL LETTER U |
Ų LATIN CAPITAL LETTER U WITH OGONEK | U+0172 | 0x55 | U LATIN CAPITAL LETTER U |
Ŵ LATIN CAPITAL LETTER W WITH CIRCUMFLEX | U+0174 | 0x57 | W LATIN CAPITAL LETTER W |
Ŷ LATIN CAPITAL LETTER Y WITH CIRCUMFLEX | U+0176 | 0x59 | Y LATIN CAPITAL LETTER Y |
Ÿ LATIN CAPITAL LETTER Y WITH DIAERESIS | U+0178 | 0x59 | Y LATIN CAPITAL LETTER Y |
Ź LATIN CAPITAL LETTER Z WITH ACUTE | U+0179 | 0x5A | Z LATIN CAPITAL LETTER Z |
Ż LATIN CAPITAL LETTER Z WITH DOT ABOVE | U+017B | 0x5A | Z LATIN CAPITAL LETTER Z |
Ž LATIN CAPITAL LETTER Z WITH CARON | U+017D | 0x5A | Z LATIN CAPITAL LETTER Z |
Ö LATIN CAPITAL LETTER O WITH DIAERESIS | U+00D6 | 0x5C | Ö LATIN CAPITAL LETTER O WITH DIAERESIS |
Ő LATIN CAPITAL LETTER O WITH DOUBLE ACUTE | U+0150 | 0x5C | Ö LATIN CAPITAL LETTER O WITH DIAERESIS |
Ű LATIN CAPITAL LETTER U WITH DOUBLE ACUTE | U+0170 | 0x5E | Ü LATIN CAPITAL LETTER U WITH DIAERESIS |
ā LATIN SMALL LETTER A WITH MACRON | U+0101 | 0x61 | a LATIN SMALL LETTER A |
ă LATIN SMALL LETTER A WITH BREVE | U+0103 | 0x61 | a LATIN SMALL LETTER A |
ą LATIN SMALL LETTER A WITH OGONEK | U+0105 | 0x61 | a LATIN SMALL LETTER A |
ª FEMININE ORDINAL INDICATOR | U+00AA | 0x61 | a LATIN SMALL LETTER A |
ć LATIN SMALL LETTER C WITH ACUTE | U+0107 | 0x63 | c LATIN SMALL LETTER C |
ĉ LATIN SMALL LETTER C WITH CIRCUMFLEX | U+0109 | 0x63 | c LATIN SMALL LETTER C |
ċ LATIN SMALL LETTER C WITH DOT ABOVE | U+010B | 0x63 | c LATIN SMALL LETTER C |
č LATIN SMALL LETTER C WITH CARON | U+010D | 0x63 | c LATIN SMALL LETTER C |
¢ CENT SIGN | U+00A2 | 0x63 | c LATIN SMALL LETTER C |
© COPYRIGHT SIGN | U+00A9 | 0x63 | c LATIN SMALL LETTER C |
ð LATIN SMALL LETTER ETH (Icelandic) | U+00F0 | 0x64 | d LATIN SMALL LETTER D |
ď LATIN SMALL LETTER D WITH CARON | U+010F | 0x64 | d LATIN SMALL LETTER D |
đ LATIN SMALL LETTER D WITH STROKE | U+0111 | 0x64 | d LATIN SMALL LETTER D |
ē LATIN SMALL LETTER E WITH MACRON | U+0113 | 0x65 | e LATIN SMALL LETTER E |
ĕ LATIN SMALL LETTER E WITH BREVE | U+0115 | 0x65 | e LATIN SMALL LETTER E |
ė LATIN SMALL LETTER E WITH DOT ABOVE | U+0117 | 0x65 | e LATIN SMALL LETTER E |
ę LATIN SMALL LETTER E WITH OGONEK | U+0119 | 0x65 | e LATIN SMALL LETTER E |
ě LATIN SMALL LETTER E WITH CARON | U+011B | 0x65 | e LATIN SMALL LETTER E |
ĝ LATIN SMALL LETTER G WITH CIRCUMFLEX | U+011D | 0x67 | g LATIN SMALL LETTER G |
ğ LATIN SMALL LETTER G WITH BREVE | U+011F | 0x67 | g LATIN SMALL LETTER G |
ġ LATIN SMALL LETTER G WITH DOT ABOVE | U+0121 | 0x67 | g LATIN SMALL LETTER G |
ģ LATIN SMALL LETTER G WITH CEDILLA | U+0123 | 0x67 | g LATIN SMALL LETTER G |
ĥ LATIN SMALL LETTER H WITH CIRCUMFLEX | U+0125 | 0x68 | h LATIN SMALL LETTER H |
ħ LATIN SMALL LETTER H WITH STROKE | U+0127 | 0x68 | h LATIN SMALL LETTER H |
ĩ LATIN SMALL LETTER I WITH TILDE | U+0129 | 0x69 | i LATIN SMALL LETTER I |
ī LATIN SMALL LETTER I WITH MACRON | U+012B | 0x69 | i LATIN SMALL LETTER I |
ĭ LATIN SMALL LETTER I WITH BREVE | U+012D | 0x69 | i LATIN SMALL LETTER I |
į LATIN SMALL LETTER I WITH OGONEK | U+012F | 0x69 | i LATIN SMALL LETTER I |
ı LATIN SMALL LETTER DOTLESS I | U+0131 | 0x69 | i LATIN SMALL LETTER I |
ĵ LATIN SMALL LETTER J WITH CIRCUMFLEX | U+0135 | 0x6A | j LATIN SMALL LETTER J |
ķ LATIN SMALL LETTER K WITH CEDILLA | U+0137 | 0x6B | k LATIN SMALL LETTER K |
ĸ LATIN SMALL LETTER KRA (Greenlandic) | U+0138 | 0x6B | k LATIN SMALL LETTER K |
ĺ LATIN SMALL LETTER L WITH ACUTE | U+013A | 0x6C | l LATIN SMALL LETTER L |
ļ LATIN SMALL LETTER L WITH CEDILLA | U+013C | 0x6C | l LATIN SMALL LETTER L |
ľ LATIN SMALL LETTER L WITH CARON | U+013E | 0x6C | l LATIN SMALL LETTER L |
ŀ LATIN SMALL LETTER L WITH MIDDLE DOT | U+0140 | 0x6C | l LATIN SMALL LETTER L |
ł LATIN SMALL LETTER L WITH STROKE | U+0142 | 0x6C | l LATIN SMALL LETTER L |
ń LATIN SMALL LETTER N WITH ACUTE | U+0144 | 0x6E | n LATIN SMALL LETTER N |
ņ LATIN SMALL LETTER N WITH CEDILLA | U+0146 | 0x6E | n LATIN SMALL LETTER N |
ň LATIN SMALL LETTER N WITH CARON | U+0148 | 0x6E | n LATIN SMALL LETTER N |
ʼn LATIN SMALL LETTER N PRECEDED BY APOSTROPHE | U+0149 | 0x6E | n LATIN SMALL LETTER N |
ŋ LATIN SMALL LETTER ENG (Sami) | U+014B | 0x6E | n LATIN SMALL LETTER N |
ō LATIN SMALL LETTER O WITH MACRON | U+014D | 0x6F | o LATIN SMALL LETTER O |
ŏ LATIN SMALL LETTER O WITH BREVE | U+014F | 0x6F | o LATIN SMALL LETTER O |
° DEGREE SIGN | U+00B0 | 0x6F | o LATIN SMALL LETTER O |
º MASCULINE ORDINAL INDICATOR | U+00BA | 0x6F | o LATIN SMALL LETTER O |
œ LATIN SMALL LIGATURE OE | U+0153 | 0x6F | o LATIN SMALL LETTER O |
ŕ LATIN SMALL LETTER R WITH ACUTE | U+0155 | 0x72 | r LATIN SMALL LETTER R |
ŗ LATIN SMALL LETTER R WITH CEDILLA | U+0157 | 0x72 | r LATIN SMALL LETTER R |
ř LATIN SMALL LETTER R WITH CARON | U+0159 | 0x72 | r LATIN SMALL LETTER R |
® REGISTERED SIGN | U+00AE | 0x72 | r LATIN SMALL LETTER R |
ś LATIN SMALL LETTER S WITH ACUTE | U+015B | 0x73 | s LATIN SMALL LETTER S |
ŝ LATIN SMALL LETTER S WITH CIRCUMFLEX | U+015D | 0x73 | s LATIN SMALL LETTER S |
ş LATIN SMALL LETTER S WITH CEDILLA * | U+015F | 0x73 | s LATIN SMALL LETTER S |
š LATIN SMALL LETTER S WITH CARON | U+0161 | 0x73 | s LATIN SMALL LETTER S |
þ LATIN SMALL LETTER THORN (Icelandic) | U+00FE | 0x74 | t LATIN SMALL LETTER T |
ţ LATIN SMALL LETTER T WITH CEDILLA * | U+0163 | 0x74 | t LATIN SMALL LETTER T |
ť LATIN SMALL LETTER T WITH CARON | U+0165 | 0x74 | t LATIN SMALL LETTER T |
ŧ LATIN SMALL LETTER T WITH STROKE | U+0167 | 0x74 | t LATIN SMALL LETTER T |
ũ LATIN SMALL LETTER U WITH TILDE | U+0169 | 0x75 | u LATIN SMALL LETTER U |
ū LATIN SMALL LETTER U WITH MACRON | U+016B | 0x75 | u LATIN SMALL LETTER U |
ŭ LATIN SMALL LETTER U WITH BREVE | U+016D | 0x75 | u LATIN SMALL LETTER U |
ů LATIN SMALL LETTER U WITH RING ABOVE | U+016F | 0x75 | u LATIN SMALL LETTER U |
ų LATIN SMALL LETTER U WITH OGONEK | U+0173 | 0x75 | u LATIN SMALL LETTER U |
µ MICRO SIGN | U+00B5 | 0x75 | u LATIN SMALL LETTER U |
ŵ LATIN SMALL LETTER W WITH CIRCUMFLEX | U+0175 | 0x77 | w LATIN SMALL LETTER W |
ŷ LATIN SMALL LETTER Y WITH CIRCUMFLEX | U+0177 | 0x79 | y LATIN SMALL LETTER Y |
ź LATIN SMALL LETTER Z WITH ACUTE | U+017A | 0x7A | z LATIN SMALL LETTER Z |
ż LATIN SMALL LETTER Z WITH DOT ABOVE | U+017C | 0x7A | z LATIN SMALL LETTER Z |
ž LATIN SMALL LETTER Z WITH CARON | U+017E | 0x7A | z LATIN SMALL LETTER Z |
ő LATIN SMALL LETTER O WITH DOUBLE ACUTE | U+0151 | 0x7C | ö LATIN SMALL LETTER O WITH DIAERESIS |
ű LATIN SMALL LETTER U WITH DOUBLE ACUTE | U+0171 | 0x7E | ü LATIN SMALL LETTER U WITH DIAERESIS |
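As a rough illustration of how the rows of Table 1 behave, the following Python sketch copies a handful of them into a dictionary and applies them to a string. The names `EXTRA_LOSSY` and `to_gsm` are hypothetical, and the handling of characters outside the sample is a simplification made for this example rather than the converter's actual behaviour.

```python
# A few rows copied from Table 1: Unicode character -> GSM 7-bit code.
EXTRA_LOSSY = {
    "\u00A0": 0x20,  # NO-BREAK SPACE -> SPACE
    "\u010C": 0x43,  # C WITH CARON (capital) -> C
    "\u0141": 0x4C,  # L WITH STROKE (capital) -> L
    "\u0150": 0x5C,  # O WITH DOUBLE ACUTE (capital) -> O WITH DIAERESIS
    "\u0161": 0x73,  # s WITH CARON (small) -> s
    "\u017E": 0x7A,  # z WITH CARON (small) -> z
    "\u0151": 0x7C,  # o WITH DOUBLE ACUTE (small) -> o WITH DIAERESIS
}


def to_gsm(text):
    """Return the GSM 7-bit codes for text, using only the sample rows."""
    codes = []
    for ch in text:
        if ch in EXTRA_LOSSY:
            codes.append(EXTRA_LOSSY[ch])
        elif ch.isascii() and ch.isalpha():
            # Basic Latin letters have the same code in GSM 7-bit and ASCII.
            codes.append(ord(ch))
        else:
            # Simplification: anything else is treated as undefined here.
            codes.append(0x3F)
    return codes


# "Čeština" written with escapes so the source stays ASCII-only.
codes = to_gsm("\u010Ce\u0161tina")
print(bytes(codes).decode("ascii"))  # prints "Cestina"
```

The accented letters lose their diacritics, which is what makes the conversion lossy: the original text cannot be recovered from the GSM codes.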
Table 2 lists the 9 characters in Lossy Characters 1 that the Standard SMS Converter supports but the Extended SMS Converter does not.
Table 2
Character | Unicode | GSM | Converted Character |
ϕ GREEK PHI SYMBOL | U+03D5 | 0x12 | Φ GREEK CAPITAL LETTER PHI |
Ω OHM SIGN | U+2126 | 0x15 | Ω GREEK CAPITAL LETTER OMEGA |
∏ N-ARY PRODUCT | U+220F | 0x16 | Π GREEK CAPITAL LETTER PI |
∑ N-ARY SUMMATION | U+2211 | 0x18 | Σ GREEK CAPITAL LETTER SIGMA |
ϑ GREEK THETA SYMBOL | U+03D1 | 0x19 | Θ GREEK CAPITAL LETTER THETA |
ϐ GREEK BETA SYMBOL | U+03D0 | 0x42 | B LATIN CAPITAL LETTER B |
ϒ GREEK UPSILON WITH HOOK SYMBOL | U+03D2 | 0x59 | Y LATIN CAPITAL LETTER Y |
ϓ GREEK UPSILON WITH ACUTE AND HOOK SYMBOL | U+03D3 | 0x59 | Y LATIN CAPITAL LETTER Y |
ϔ GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL | U+03D4 | 0x59 | Y LATIN CAPITAL LETTER Y |
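The difference between the two converters for the characters in Table 2 can be sketched as follows. The GSM codes are taken from the table; the assumption that the Extended SMS Converter treats these characters as undefined, and so produces a question mark (0x3F), is an inference from the fallback rule described earlier, not something this section states directly. The names `STANDARD_ONLY` and `convert_table2_char` are hypothetical.

```python
# Table 2: Unicode code point -> GSM code used by the Standard SMS Converter.
STANDARD_ONLY = {
    0x03D5: 0x12,  # GREEK PHI SYMBOL
    0x2126: 0x15,  # OHM SIGN
    0x220F: 0x16,  # N-ARY PRODUCT
    0x2211: 0x18,  # N-ARY SUMMATION
    0x03D1: 0x19,  # GREEK THETA SYMBOL
    0x03D0: 0x42,  # GREEK BETA SYMBOL
    0x03D2: 0x59,  # GREEK UPSILON WITH HOOK SYMBOL
    0x03D3: 0x59,  # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
    0x03D4: 0x59,  # GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
}


def convert_table2_char(ch, extended):
    """Convert one Table 2 character with the standard or extended rules."""
    code_point = ord(ch)
    if code_point not in STANDARD_ONLY:
        raise ValueError("character is not listed in Table 2")
    if extended:
        # Assumption: unsupported, so it falls back to '?' (0x3F).
        return 0x3F
    return STANDARD_ONLY[code_point]
```

With this sketch, `convert_table2_char("\u2126", extended=False)` returns `0x15` (capital omega), while the same call with `extended=True` returns `0x3F`.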