Wrapl, The Programming Language

Libraries:Gtk:Glib:GUnicode

Functions

CanonicalDecomposition(ch @ Std.Integer.SmallT, result_len @ Std.Object.T) : Std.Object.T

Warning

CanonicalDecomposition has been deprecated since version 2.30 and should not be used in newly-written code. Use the more flexible g_unichar_fully_decompose() instead.



CanonicalOrdering(string @ Std.Object.T, len @ Std.Integer.SmallT) : Std.Object.T

Computes the canonical ordering of a string in-place. This rearranges decomposed characters in the string according to their combining classes. See the Unicode manual for more information.

string a UCS-4 encoded string.
len the maximum length of string to use.
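
The reordering described above can be sketched in Python using the standard unicodedata module. This is only an illustration of the idea, not the Wrapl binding: the helper name canonical_order is hypothetical, and it returns a new string rather than reordering a UCS-4 buffer in place.

```python
import unicodedata

def canonical_order(text):
    """Reorder each run of combining marks by canonical combining class."""
    chars = list(text)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) == 0:
            i += 1
            continue
        # Find the end of this run of characters with a non-zero class.
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal class keep their order.
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)

# U+0301 (class 230) typed before U+0323 (class 220) gets moved after it.
reordered = canonical_order("a\u0301\u0323")
```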


GetCharset(charset @ Std.Object.T) : Std.Symbol.T

Obtains the character set for the current locale; you might use this character set as an argument to g_convert(), to convert from the current locale's encoding to some other encoding. (Frequently g_locale_to_utf8() and g_locale_from_utf8() are nice shortcuts, though.)

On Windows the character set returned by this function is the so-called system default ANSI code-page. That is the character set used by the "narrow" versions of C library and Win32 functions that handle file names. It might be different from the character set used by the C library's current locale.

The return value is TRUE if the locale's encoding is UTF-8; in that case you can perhaps avoid calling g_convert().

The string returned in charset is not allocated, and should not be freed.

charset return location for character set name
Returns TRUE if the returned charset is UTF-8


Ucs4ToUtf16(str @ Std.Object.T, len @ Std.Integer.SmallT, items_read @ Std.Object.T, items_written @ Std.Object.T, error @ Std.Object.T) : Std.Object.T

Convert a string from UCS-4 to UTF-16. A 0 character will be added to the result after the converted text.

str a UCS-4 encoded string
len the maximum length (number of characters) of str to use. If len < 0, then the string is nul-terminated.
items_read location to store number of characters read, or NULL. If an error occurs then the index of the invalid input is stored here.
items_written location to store number of gunichar2 written, or NULL. The value stored here does not include the trailing 0.
error location to store the error occurring, or NULL to ignore errors. Any of the errors in Gtk.Glib.GConvertError.T other than Gtk.Glib.GConvertError.NoConversion may occur.
Returns a pointer to a newly allocated UTF-16 string. This value must be freed with g_free(). If an error occurs, NULL will be returned and error set.
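
As a rough illustration of what this conversion involves, the following Python sketch turns a list of code points into UTF-16 code units, splitting characters above U+FFFF into surrogate pairs. The function name ucs4_to_utf16_units is hypothetical, and the sketch raises an exception where the binding would report an error through its error parameter.

```python
def ucs4_to_utf16_units(code_points):
    """Convert an iterable of Unicode code points to UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a valid Unicode scalar value: %r" % cp)
        if cp < 0x10000:
            # Characters in the Basic Multilingual Plane map to one unit.
            units.append(cp)
        else:
            # Everything else is split into a high and a low surrogate.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
    return units

units = ucs4_to_utf16_units([0x41, 0x1F600])
```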


Ucs4ToUtf8(str @ Std.Object.T, len @ Std.Integer.SmallT, items_read @ Std.Object.T, items_written @ Std.Object.T, error @ Std.Object.T) : Std.String.T

Convert a string from a 32-bit fixed-width representation (UCS-4) to UTF-8. The result will be terminated with a 0 byte.

str a UCS-4 encoded string
len the maximum length (number of characters) of str to use. If len < 0, then the string is nul-terminated.
items_read location to store number of characters read, or NULL.
items_written location to store number of bytes written, or NULL. The value stored here does not include the trailing 0 byte.
error location to store the error occurring, or NULL to ignore errors. Any of the errors in Gtk.Glib.GConvertError.T other than Gtk.Glib.GConvertError.NoConversion may occur.
Returns a pointer to a newly allocated UTF-8 string. This value must be freed with g_free(). If an error occurs, NULL will be returned and error set. In that case, items_read will be set to the position of the first invalid input character.


UnicharBreakType(c @ Std.Integer.SmallT) : Gtk.Glib.GUnicodeBreakType.T

Determines the break type of c. c should be a Unicode character (to derive a character from UTF-8 encoded text, use Utf8GetChar). The break type is used to find word and line breaks ("text boundaries"). Pango implements the Unicode boundary resolution algorithms, so normally you would use a function such as pango_break() instead of handling break types yourself.

c a Unicode character
Returns the break type of c


UnicharCombiningClass(uc @ Std.Integer.SmallT) : Std.Integer.SmallT

Determines the canonical combining class of a Unicode character.

uc a Unicode character
Returns the combining class of the character


UnicharDigitValue(c @ Std.Integer.SmallT) : Std.Integer.SmallT

Determines the numeric value of a character as a decimal digit.

c a Unicode character
Returns the numeric value of c if it is a decimal digit (according to UnicharIsdigit); otherwise -1.
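
The same lookup can be sketched with Python's unicodedata module, which also covers decimal digits outside ASCII. The helper name digit_value is hypothetical; this is not the Wrapl function itself.

```python
import unicodedata

def digit_value(ch):
    """Return the decimal digit value of ch, or -1 if it is not a digit."""
    return unicodedata.decimal(ch, -1)

ascii_seven = digit_value("7")
arabic_three = digit_value("\u0663")   # ARABIC-INDIC DIGIT THREE
not_a_digit = digit_value("x")
```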


UnicharGetMirrorChar(ch @ Std.Integer.SmallT, mirrored_ch @ Std.Object.T) : Std.Symbol.T

In Unicode, some characters are mirrored. This means that their images are mirrored horizontally in text that is laid out from right to left. For instance, "(" would become its mirror image, ")", in right-to-left text.

If ch has the Unicode mirrored property, and there is another Unicode character whose glyph is typically the mirror image of ch's glyph, and mirrored_ch is set, that character is stored at the address pointed to by mirrored_ch. Otherwise the original character is stored.

ch a Unicode character
mirrored_ch location to store the mirrored character
Returns TRUE if ch has a mirrored character, FALSE otherwise


UnicharGetScript(ch @ Std.Integer.SmallT) : Gtk.Glib.GUnicodeScript.T

Looks up the Gtk.Glib.GUnicodeScript.T for a particular character (as defined by Unicode Standard Annex #24). No check is made for ch being a valid Unicode character; if you pass in an invalid character, the result is undefined.

This function is equivalent to Gtk.Pango.Global.ScriptForUnichar and the two are interchangeable.

ch a Unicode character
Returns the Gtk.Glib.GUnicodeScript.T for the character.


UnicharIsalnum(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is alphanumeric. Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is an alphanumeric character


UnicharIsalpha(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is alphabetic (i.e. a letter). Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is an alphabetic character


UnicharIscntrl(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is a control character. Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is a control character


UnicharIsdefined(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines if a given character is assigned in the Unicode standard.

c a Unicode character
Returns TRUE if the character has an assigned value


UnicharIsdigit(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is numeric (i.e. a digit). This covers ASCII 0-9 and also digits in other languages/scripts. Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is a digit


UnicharIsgraph(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is printable and not a space (returns FALSE for control characters, format characters, and spaces). UnicharIsprint is similar, but returns TRUE for spaces. Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is printable unless it's a space


UnicharIslower(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is a lowercase letter. Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is a lowercase letter


UnicharIsmark(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is a mark (non-spacing mark, combining mark, or enclosing mark in Unicode speak). Given some UTF-8 text, obtain a character value with Utf8GetChar.

Note: in most cases where isalpha characters are allowed, ismark characters should be allowed too, as they are essential for writing most European languages as well as many non-Latin scripts.

c a Unicode character
Returns TRUE if c is a mark character


UnicharIsprint(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is printable. Unlike UnicharIsgraph, returns TRUE for spaces. Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is printable


UnicharIspunct(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is punctuation or a symbol. Given some UTF-8 text, obtain a character value with Utf8GetChar.

c a Unicode character
Returns TRUE if c is a punctuation or symbol character


UnicharIsspace(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines whether a character is a space, tab, or line separator (newline, carriage return, etc.). Given some UTF-8 text, obtain a character value with Utf8GetChar.

(Note: don't use this to do word breaking; the algorithm is fairly complex, so you have to use Pango or an equivalent to get word breaking right.)

c a Unicode character
Returns TRUE if c is a space character


UnicharIstitle(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines if a character is titlecase. Some characters in Unicode which are composites, such as the DZ digraph, have three case variants instead of just two. The titlecase form is used at the beginning of a word where only the first letter is capitalized. The titlecase form of the DZ digraph is U+01F2 LATIN CAPITAL LETTER D WITH SMALL LETTER Z.

c a Unicode character
Returns TRUE if the character is titlecase


UnicharIsupper(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines if a character is uppercase.

c a Unicode character
Returns TRUE if c is an uppercase character


UnicharIswide(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines if a character is typically rendered in a double-width cell.

c a Unicode character
Returns TRUE if the character is wide


UnicharIswideCjk(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines if a character is typically rendered in a double-width cell under legacy East Asian locales. If a character is wide according to UnicharIswide, then it is also reported wide with this function, but the converse is not necessarily true. See the Unicode Standard Annex #11 for details.

Note that some characters may pass both this test and UnicharIszerowidth.

c a Unicode character
Returns TRUE if the character is wide in legacy East Asian locales


UnicharIsxdigit(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines if a character is a hexadecimal digit.

c a Unicode character.
Returns TRUE if the character is a hexadecimal digit


UnicharIszerowidth(c @ Std.Integer.SmallT) : Std.Symbol.T

Determines if a given character typically takes zero width when rendered. The return value is TRUE for all non-spacing and enclosing marks (e.g., combining accents), format characters, and zero-width space, but not for U+00AD SOFT HYPHEN.

A typical use of this function is with one of UnicharIswide or UnicharIswideCjk to determine the number of cells a string occupies when displayed on a grid display (terminals). However, note that not all terminals support zero-width rendering of zero-width marks.

c a Unicode character
Returns TRUE if the character has zero width
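
The cell-counting use described above might look like the following Python sketch. It approximates the zero-width and wide tests with Unicode general categories and East Asian Width values from the unicodedata module; the helper names are hypothetical and the results may differ from the bindings in edge cases.

```python
import unicodedata

def is_zero_width(ch):
    # Approximation: non-spacing marks (Mn), enclosing marks (Me) and
    # format characters (Cf), excluding U+00AD SOFT HYPHEN.
    if ch == "\u00ad":
        return False
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")

def is_wide(ch):
    # "W" (wide) and "F" (fullwidth) occupy two cells on a grid display.
    return unicodedata.east_asian_width(ch) in ("W", "F")

def cell_width(text):
    """Estimate the number of terminal cells a string occupies."""
    total = 0
    for ch in text:
        if is_zero_width(ch):
            continue
        total += 2 if is_wide(ch) else 1
    return total
```

For example, cell_width("e\u0301") counts the base letter only, while each CJK ideograph counts as two cells.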


UnicharToUtf8(c @ Std.Integer.SmallT, outbuf @ Std.String.T) : Std.Integer.SmallT

Converts a single character to UTF-8.

c a Unicode character code
outbuf output buffer, must have at least 6 bytes of space. If NULL, the length will be computed and returned and nothing will be written to outbuf.
Returns number of bytes written
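
For illustration, the bit layout of the encoding can be sketched in Python as below. The function name unichar_to_utf8 is hypothetical, it returns the bytes instead of writing into a buffer, and it only handles code points up to U+10FFFF (at most 4 bytes), whereas the 6-byte buffer mentioned above leaves room for the longer historical forms.

```python
def unichar_to_utf8(cp):
    """Encode a single code point as UTF-8 and return the bytes."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range: %r" % cp)
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

euro = unichar_to_utf8(0x20AC)   # three bytes: E2 82 AC
```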


UnicharTolower(c @ Std.Integer.SmallT) : Std.Integer.SmallT

Converts a character to lower case.

c a Unicode character.
Returns the result of converting c to lower case. If c is not an uppercase or titlecase character, or has no lowercase equivalent, c is returned unchanged.


UnicharTotitle(c @ Std.Integer.SmallT) : Std.Integer.SmallT

Converts a character to titlecase.

c a Unicode character
Returns the result of converting c to titlecase. If c is not an uppercase or lowercase character, c is returned unchanged.


UnicharToupper(c @ Std.Integer.SmallT) : Std.Integer.SmallT

Converts a character to uppercase.

c a Unicode character
Returns the result of converting c to uppercase. If c is not a lowercase or titlecase character, or has no uppercase equivalent, c is returned unchanged.


UnicharType(c @ Std.Integer.SmallT) : Gtk.Glib.GUnicodeType.T

Classifies a Unicode character by type.

c a Unicode character
Returns the type of the character.


UnicharValidate(ch @ Std.Integer.SmallT) : Std.Symbol.T

Checks whether ch is a valid Unicode character. Some possible integer values of ch will not be valid. 0 is considered a valid character, though it's normally a string terminator.

ch a Unicode character
Returns TRUE if ch is a valid Unicode character


UnicharXdigitValue(c @ Std.Integer.SmallT) : Std.Integer.SmallT

Determines the numeric value of a character as a hexadecimal digit.

c a Unicode character
Returns the numeric value of c if it is a hex digit (according to UnicharIsxdigit); otherwise -1.


Utf16ToUcs4(str @ Std.Object.T, len @ Std.Integer.SmallT, items_read @ Std.Object.T, items_written @ Std.Object.T, error @ Std.Object.T) : Std.Object.T

Convert a string from UTF-16 to UCS-4. The result will be nul-terminated.

str a UTF-16 encoded string
len the maximum length (number of gunichar2) of str to use. If len < 0, then the string is nul-terminated.
items_read location to store number of words read, or NULL. If NULL, then Gtk.Glib.GConvertError.PartialInput will be returned in case str contains a trailing partial character. If an error occurs then the index of the invalid input is stored here.
items_written location to store number of characters written, or NULL. The value stored here does not include the trailing 0 character.
error location to store the error occurring, or NULL to ignore errors. Any of the errors in Gtk.Glib.GConvertError.T other than Gtk.Glib.GConvertError.NoConversion may occur.
Returns a pointer to a newly allocated UCS-4 string. This value must be freed with g_free(). If an error occurs, NULL will be returned and error set.


Utf16ToUtf8(str @ Std.Object.T, len @ Std.Integer.SmallT, items_read @ Std.Object.T, items_written @ Std.Object.T, error @ Std.Object.T) : Std.String.T

Convert a string from UTF-16 to UTF-8. The result will be terminated with a 0 byte.

Note that the input is expected to be already in native endianness; an initial byte-order-mark character is not handled specially. g_convert() can be used to convert a byte buffer of UTF-16 data of ambiguous endianness.

Further note that this function does not validate the result string; it may e.g. include embedded NUL characters. The only validation done by this function is to ensure that the input can be correctly interpreted as UTF-16, i.e. it doesn't contain things such as unpaired surrogates.

str a UTF-16 encoded string
len the maximum length (number of gunichar2) of str to use. If len < 0, then the string is nul-terminated.
items_read location to store number of words read, or NULL. If NULL, then Gtk.Glib.GConvertError.PartialInput will be returned in case str contains a trailing partial character. If an error occurs then the index of the invalid input is stored here.
items_written location to store number of bytes written, or NULL. The value stored here does not include the trailing 0 byte.
error location to store the error occurring, or NULL to ignore errors. Any of the errors in Gtk.Glib.GConvertError.T other than Gtk.Glib.GConvertError.NoConversion may occur.
Returns a pointer to a newly allocated UTF-8 string. This value must be freed with g_free(). If an error occurs, NULL will be returned and error set.


Utf8Casefold(str @ Std.String.T, len @ Std.Integer.SmallT) : Std.String.T

Converts a string into a form that is independent of case. The result will not correspond to any particular case, but can be compared for equality or ordered with the results of calling Utf8Casefold on other strings.

Note that calling Utf8Casefold followed by Utf8Collate is only an approximation to the correct linguistic case insensitive ordering, though it is a fairly good one. Getting this exactly right would require a more sophisticated collation function that takes case sensitivity into account. GLib does not currently provide such a function.

str a UTF-8 encoded string
len length of str, in bytes, or -1 if str is nul-terminated.
Returns a newly allocated string, that is a case independent form of str.
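
Python's str.casefold() performs the same kind of Unicode case folding, so it can serve as a small stand-in to show why folding differs from lowercasing. The helper name same_ignoring_case is hypothetical.

```python
def same_ignoring_case(a, b):
    """Compare two strings for equality after Unicode case folding."""
    return a.casefold() == b.casefold()

# Case folding maps the German sharp s to "ss"; lowercasing does not.
folded_equal = same_ignoring_case("Stra\u00dfe", "STRASSE")
lowered_equal = "Stra\u00dfe".lower() == "STRASSE".lower()
```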


Utf8Collate(str1 @ Std.String.T, str2 @ Std.String.T) : Std.Integer.SmallT

Compares two strings for ordering using the linguistically correct rules for the current locale. When sorting a large number of strings, it will be significantly faster to obtain collation keys with Utf8CollateKey and compare the keys with strcmp() when sorting instead of sorting the original strings.

str1 a UTF-8 encoded string
str2 a UTF-8 encoded string
Returns < 0 if str1 compares before str2, 0 if they compare equal, > 0 if str1 compares after str2.


Utf8CollateKey(str @ Std.String.T, len @ Std.Integer.SmallT) : Std.String.T

Converts a string into a collation key that can be compared with other collation keys produced by the same function using strcmp().

The results of comparing the collation keys of two strings with strcmp() will always be the same as comparing the two original strings with Utf8Collate.

Note that this function depends on the current locale.

str a UTF-8 encoded string.
len length of str, in bytes, or -1 if str is nul-terminated.
Returns a newly allocated string. This string should be freed with g_free() when you are done with it.
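
The relationship between comparing strings directly and comparing precomputed keys can be sketched with Python's locale module, where locale.strcoll and locale.strxfrm play the roles of Utf8Collate and Utf8CollateKey. This is an analogy under that assumption, not a call into the bindings, and the resulting order depends on the active locale.

```python
import functools
import locale

words = ["pear", "Apple", "banana", "apple"]

# Slow path: every comparison made during the sort calls strcoll().
by_compare = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# Fast path: compute one collation key per string, then compare the keys.
keys = {w: locale.strxfrm(w) for w in words}
by_key = sorted(words, key=keys.__getitem__)
```

Both paths produce the same order; the second one transforms each string only once.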


Utf8CollateKeyForFilename(str @ Std.String.T, len @ Std.Integer.SmallT) : Std.String.T

Converts a string into a collation key that can be compared with other collation keys produced by the same function using strcmp().

In order to sort filenames correctly, this function treats the dot '.' as a special case. Most dictionary orderings seem to consider it insignificant, thus producing the ordering "event.c" "eventgenerator.c" "event.h" instead of "event.c" "event.h" "eventgenerator.c". Also, we would like to treat numbers intelligently so that "file1" "file10" "file5" is sorted as "file1" "file5" "file10".

Note that this function depends on the current locale.

str a UTF-8 encoded string.
len length of str, in bytes, or -1 if str is nul-terminated.
Returns a newly allocated string. This string should be freed with g_free() when you are done with it.
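
The number handling described above can be approximated with a simple sort key in Python. This sketch is locale-independent and compares the non-numeric parts by code point, so it only illustrates the idea; the name filename_sort_key is hypothetical.

```python
import re

def filename_sort_key(name):
    """Split a name into text and number chunks so numbers sort by value."""
    parts = re.split(r"(\d+)", name)
    # With one capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

numbered = sorted(["file1", "file10", "file5"], key=filename_sort_key)
dotted = sorted(["eventgenerator.c", "event.h", "event.c"],
                key=filename_sort_key)
```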


Utf8FindNextChar(p @ Std.String.T, end @ Std.String.T) : Std.String.T

Finds the start of the next UTF-8 character in the string after p.

p does not have to be at the beginning of a UTF-8 character. No check is made to see if the character found is actually valid other than it starts with an appropriate byte.

p a pointer to a position within a UTF-8 encoded string
end a pointer to the byte following the end of the string, or NULL to indicate that the string is nul-terminated.
Returns a pointer to the found character or NULL
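
A minimal Python sketch of the scan, working on byte indices instead of pointers: it skips continuation bytes (those of the form 10xxxxxx) until it reaches a byte that can start a character. The name find_next_char and the use of None for NULL are assumptions made for this illustration.

```python
def find_next_char(data, pos):
    """Return the index of the next character start after pos, or None."""
    i = pos + 1
    while i < len(data):
        # Continuation bytes have the top two bits set to 10.
        if (data[i] & 0xC0) != 0x80:
            return i
        i += 1
    return None

data = "a\u00e9\u20acb".encode("utf-8")   # 1 + 2 + 3 + 1 bytes
```

Here find_next_char(data, 2) starts in the middle of the two-byte character and still lands on index 3, the start of the next one.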


Utf8FindPrevChar(str @ Std.String.T, p @ Std.String.T) : Std.String.T

Given a position p within a UTF-8 encoded string str, find the start of the previous UTF-8 character starting before p. Returns NULL if no UTF-8 characters are present in str before p.

p does not have to be at the beginning of a UTF-8 character. No check is made to see if the character found is actually valid other than it starts with an appropriate byte.

str pointer to the beginning of a UTF-8 encoded string
p pointer to some position within str
Returns a pointer to the found character or NULL.


Utf8GetChar(p @ Std.String.T) : Std.Integer.SmallT

Converts a sequence of bytes encoded as UTF-8 to a Unicode character. If p does not point to a valid UTF-8 encoded character, results are undefined. If you are not sure that the bytes are complete valid Unicode characters, you should use Utf8GetCharValidated instead.

p a pointer to Unicode character encoded as UTF-8
Returns the resulting character


Utf8GetCharValidated(p @ Std.String.T, max_len @ Std.Integer.SmallT) : Std.Integer.SmallT

Convert a sequence of bytes encoded as UTF-8 to a Unicode character. This function checks for incomplete characters, for invalid characters such as characters that are out of the range of Unicode, and for overlong encodings of valid characters.

p a pointer to Unicode character encoded as UTF-8
max_len the maximum number of bytes to read, or -1, for no maximum or if p is nul-terminated
Returns the resulting character. If p points to a partial sequence at the end of a string that could begin a valid character (or if max_len is zero), returns (gunichar)-2; otherwise, if p does not point to a valid UTF-8 encoded Unicode character, returns (gunichar)-1.
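
The checks listed above (incomplete sequences, out-of-range values, overlong encodings) can be sketched in Python as follows. The function name get_char_validated is hypothetical, it returns plain -1 and -2 rather than (gunichar) casts, and its handling of unusual lead bytes may differ from the bindings.

```python
def get_char_validated(data):
    """Decode the first UTF-8 character of data.

    Returns the code point, -2 for a truncated sequence, -1 if invalid.
    """
    if not data:
        return -2
    first = data[0]
    if first < 0x80:
        return first
    if (first & 0xE0) == 0xC0:
        extra, cp, minimum = 1, first & 0x1F, 0x80
    elif (first & 0xF0) == 0xE0:
        extra, cp, minimum = 2, first & 0x0F, 0x800
    elif (first & 0xF8) == 0xF0:
        extra, cp, minimum = 3, first & 0x07, 0x10000
    else:
        return -1          # stray continuation byte or unsupported lead
    for i in range(1, extra + 1):
        if i >= len(data):
            return -2      # input ends in the middle of the character
        if (data[i] & 0xC0) != 0x80:
            return -1
        cp = (cp << 6) | (data[i] & 0x3F)
    if cp < minimum:
        return -1          # overlong encoding
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return -1          # outside Unicode or a surrogate
    return cp
```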


Utf8Normalize(str @ Std.String.T, len @ Std.Integer.SmallT, mode @ Gtk.Glib.GNormalizeMode.T) : Std.String.T

Converts a string into canonical form, standardizing such issues as whether a character with an accent is represented as a base character and combining accent or as a single precomposed character. The string has to be valid UTF-8, otherwise NULL is returned. You should generally call Utf8Normalize before comparing two Unicode strings.

The normalization mode Gtk.Glib.GNormalizeMode.Default only standardizes differences that do not affect the text content, such as the above-mentioned accent representation. Gtk.Glib.GNormalizeMode.All also standardizes the "compatibility" characters in Unicode, such as SUPERSCRIPT THREE to the standard forms (in this case DIGIT THREE). Formatting information may be lost but for most text operations such characters should be considered the same.

Gtk.Glib.GNormalizeMode.DefaultCompose and Gtk.Glib.GNormalizeMode.AllCompose are like Gtk.Glib.GNormalizeMode.Default and Gtk.Glib.GNormalizeMode.All, but return a result with composed forms rather than a maximally decomposed form. This is often useful if you intend to convert the string to a legacy encoding or pass it to a system with less capable Unicode handling.

str a UTF-8 encoded string.
len length of str, in bytes, or -1 if str is nul-terminated.
mode the type of normalization to perform.
Returns a newly allocated string, that is the normalized form of str, or NULL if str is not valid UTF-8.
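
The four modes correspond to the standard Unicode normalization forms (Default is NFD, DefaultCompose is NFC, All is NFKD, AllCompose is NFKC). The following Python sketch uses unicodedata.normalize to show the effect of each; the MODE_TO_FORM table and the normalize helper are hypothetical names used only for this illustration.

```python
import unicodedata

# Mode names as plain strings, mapped to the matching normalization form.
MODE_TO_FORM = {
    "Default": "NFD",
    "DefaultCompose": "NFC",
    "All": "NFKD",
    "AllCompose": "NFKC",
}

def normalize(text, mode="Default"):
    """Normalize text using the form that corresponds to the given mode."""
    return unicodedata.normalize(MODE_TO_FORM[mode], text)

decomposed = normalize("\u00e9")                   # e + combining acute
composed = normalize("e\u0301", "DefaultCompose")  # single precomposed e-acute
compat = normalize("\u00b3", "All")                # SUPERSCRIPT THREE -> "3"
```

Normalizing both sides the same way before comparing makes "\u00e9" and "e\u0301" compare equal.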


Utf8OffsetToPointer(str @ Std.String.T, offset @ Std.Integer.SmallT) : Std.String.T

Converts from an integer character offset to a pointer to a position within the string.

Since 2.10, this function accepts a negative offset to step backwards. It is usually worth stepping backwards from the end instead of forwards if offset is in the last fourth of the string, since moving forward is about 3 times faster than moving backward.

Note

This function doesn't abort when reaching the end of str. Therefore you should be sure that offset is within string boundaries before calling this function. Call Utf8Strlen when unsure.

This limitation exists as this function is called frequently during text rendering and therefore has to be as fast as possible.

str a UTF-8 encoded string
offset a character offset within str
Returns the resulting pointer
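
As a sketch of the forward case, the following Python function converts a character offset into a byte index within UTF-8 data. Unlike the function described above, it checks bounds and rejects negative offsets; the name offset_to_byte_index is hypothetical.

```python
def offset_to_byte_index(data, offset):
    """Return the byte index of the character at the given offset."""
    if offset < 0:
        raise ValueError("negative offsets are not handled in this sketch")
    i = 0
    for _ in range(offset):
        if i >= len(data):
            raise IndexError("offset is past the end of the string")
        # Step over the lead byte, then over any continuation bytes.
        i += 1
        while i < len(data) and (data[i] & 0xC0) == 0x80:
            i += 1
    return i

data = "a\u00e9\u20acb".encode("utf-8")   # 1 + 2 + 3 + 1 bytes
```

With this data, offsets 0 through 4 map to byte indices 0, 1, 3, 6 and 7.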



Utf8PointerToOffset(str @ Std.String.T, pos @ Std.String.T) : Std.Integer.SmallT

Converts from a pointer to a position within a string to an integer character offset.

Since 2.10, this function allows pos to be before str, and returns a negative offset in this case.

str a UTF-8 encoded string
pos a pointer to a position within str
Returns the resulting character offset


Utf8PrevChar(p @ Std.String.T) : Std.String.T

Finds the previous UTF-8 character in the string before p.

p does not have to be at the beginning of a UTF-8 character. No check is made to see if the character found is actually valid other than it starts with an appropriate byte. If p might be the first character of the string, you must use Utf8FindPrevChar instead.

p a pointer to a position within a UTF-8 encoded string
Returns a pointer to the found character.


Utf8Strchr(p @ Std.String.T, len @ Std.Integer.SmallT, c @ Std.Integer.SmallT) : Std.String.T

Finds the leftmost occurrence of the given Unicode character in a UTF-8 encoded string, while limiting the search to len bytes. If len is -1, allow unbounded search.

p a nul-terminated UTF-8 encoded string
len the maximum length of p
c a Unicode character
Returns NULL if the string does not contain the character, otherwise, a pointer to the start of the leftmost occurrence of the character in the string.


Utf8Strdown(str @ Std.String.T, len @ Std.Integer.SmallT) : Std.String.T

Converts all Unicode characters in the string that have a case to lowercase. The exact manner that this is done depends on the current locale, and may result in the number of characters in the string changing.

str a UTF-8 encoded string
len length of str, in bytes, or -1 if str is nul-terminated.
Returns a newly allocated string, with all characters converted to lowercase.


Utf8Strlen(p @ Std.String.T, max @ Std.Integer.SmallT) : Std.Integer.SmallT

Computes the length of the string in characters, not including the terminating nul character.

p pointer to the start of a UTF-8 encoded string
max the maximum number of bytes to examine. If max is less than 0, then the string is assumed to be nul-terminated. If max is 0, p will not be examined and may be NULL.
Returns the length of the string in characters


Utf8Strncpy(dest @ Std.String.T, src @ Std.String.T, n @ Std.Integer.SmallT) : Std.String.T

Like the standard C strncpy() function, but copies a given number of characters instead of a given number of bytes. The src string must be valid UTF-8 encoded text. (Use Utf8Validate on all text before trying to use UTF-8 utility functions with it.)

dest buffer to fill with characters from src
src UTF-8 encoded string
n character count
Returns dest


Utf8Strrchr(p @ Std.String.T, len @ Std.Integer.SmallT, c @ Std.Integer.SmallT) : Std.String.T

Find the rightmost occurrence of the given Unicode character in a UTF-8 encoded string, while limiting the search to len bytes. If len is -1, allow unbounded search.

p a nul-terminated UTF-8 encoded string
len the maximum length of p
c a Unicode character
Returns NULL if the string does not contain the character, otherwise, a pointer to the start of the rightmost occurrence of the character in the string.


Utf8Strreverse(str @ Std.String.T, len @ Std.Integer.SmallT) : Std.String.T

Reverses a UTF-8 string. str must be valid UTF-8 encoded text. (Use Utf8Validate on all text before trying to use UTF-8 utility functions with it.)

This function is intended for programmatic uses of reversed strings. It pays no attention to decomposed characters, combining marks, byte order marks, directional indicators (LRM, LRO, etc) and similar characters which might need special handling when reversing a string for display purposes.

Note that unlike Gtk.Glib.GStrfuncs.Strreverse, this function returns newly-allocated memory, which should be freed with g_free() when no longer needed.

str a UTF-8 encoded string
len the maximum length of str to use, in bytes. If len < 0, then the string is nul-terminated.
Returns a newly-allocated string which is the reverse of str.


Utf8Strup(str @ Std.String.T, len @ Std.Integer.SmallT) : Std.String.T

Converts all Unicode characters in the string that have a case to uppercase. The exact manner that this is done depends on the current locale, and may result in the number of characters in the string increasing. (For instance, the German ess-zet will be changed to SS.)

str a UTF-8 encoded string
len length of str, in bytes, or -1 if str is nul-terminated.
Returns a newly allocated string, with all characters converted to uppercase.


Utf8ToUcs4(str @ Std.String.T, len @ Std.Integer.SmallT, items_read @ Std.Object.T, items_written @ Std.Object.T, error @ Std.Object.T) : Std.Object.T

Convert a string from UTF-8 to a 32-bit fixed width representation as UCS-4. A trailing 0 character will be added to the string after the converted text.

str a UTF-8 encoded string
len the maximum length of str to use, in bytes. If len < 0, then the string is nul-terminated.
items_read location to store number of bytes read, or NULL. If NULL, then Gtk.Glib.GConvertError.PartialInput will be returned in case str contains a trailing partial character. If an error occurs then the index of the invalid input is stored here.
items_written location to store number of characters written, or NULL. The value stored here does not include the trailing 0 character.
error location to store the error occurring, or NULL to ignore errors. Any of the errors in Gtk.Glib.GConvertError.T other than Gtk.Glib.GConvertError.NoConversion may occur.
Returns a pointer to a newly allocated UCS-4 string. This value must be freed with g_free(). If an error occurs, NULL will be returned and error set.


Utf8ToUcs4Fast(str @ Std.String.T, len @ Std.Integer.SmallT, items_written @ Std.Object.T) : Std.Object.T

Convert a string from UTF-8 to a 32-bit fixed width representation as UCS-4, assuming valid UTF-8 input. This function is roughly twice as fast as Utf8ToUcs4 but does no error checking on the input. A trailing 0 character will be added to the string after the converted text.

str a UTF-8 encoded string
len the maximum length of str to use, in bytes. If len < 0, then the string is nul-terminated.
items_written location to store the number of characters in the result, or NULL.
Returns a pointer to a newly allocated UCS-4 string. This value must be freed with g_free().


Utf8ToUtf16(str @ Std.String.T, len @ Std.Integer.SmallT, items_read @ Std.Object.T, items_written @ Std.Object.T, error @ Std.Object.T) : Std.Object.T

Convert a string from UTF-8 to UTF-16. A 0 character will be added to the result after the converted text.

str a UTF-8 encoded string
len the maximum length (number of bytes) of str to use. If len < 0, then the string is nul-terminated.
items_read location to store number of bytes read, or NULL. If NULL, then Gtk.Glib.GConvertError.PartialInput will be returned in case str contains a trailing partial character. If an error occurs then the index of the invalid input is stored here.
items_written location to store number of gunichar2 written, or NULL. The value stored here does not include the trailing 0.
error location to store the error occurring, or NULL to ignore errors. Any of the errors in Gtk.Glib.GConvertError.T other than Gtk.Glib.GConvertError.NoConversion may occur.
Returns a pointer to a newly allocated UTF-16 string. This value must be freed with g_free(). If an error occurs, NULL will be returned and error set.


Utf8Validate(str @ Std.String.T, max_len @ Std.Integer.SmallT, end @ Agg.List.T) : Std.Symbol.T

Validates UTF-8 encoded text. str is the text to validate; if str is nul-terminated, then max_len can be -1, otherwise max_len should be the number of bytes to validate. If end is non-NULL, then the end of the valid range will be stored there (i.e. the start of the first invalid character if some bytes were invalid, or the end of the text being validated otherwise).

Note that Utf8Validate returns FALSE if max_len is positive and NUL is met before max_len bytes have been read.

Returns TRUE if all of str was valid. Many GLib and GTK+ routines require valid UTF-8 as input; so data read from a file or the network should be checked with Utf8Validate before doing anything else with it.

str a pointer to character data
max_len max bytes to validate, or -1 to go until NUL
end return location for end of valid data, or NULL
Returns TRUE if the text was valid UTF-8
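
The pattern of validating input and locating the end of the valid range can be sketched with Python's own UTF-8 decoder. The name utf8_validate is hypothetical, the function returns a pair instead of filling an end parameter, and Python's decoder accepts embedded NUL bytes, so it does not reproduce the max_len behaviour noted above.

```python
def utf8_validate(data):
    """Return (is_valid, end), where end is the length of the valid prefix."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that failed to decode.
        return False, exc.start
    return True, len(data)

ok, end = utf8_validate(b"ab\xffcd")
valid_prefix = b"ab\xffcd"[:end]
```

Data read from a file or the network can be checked this way before it is passed to routines that require valid UTF-8.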