Unicode

Library (rnrs unicode (6))

[R6RS] The procedures exported by the (rnrs unicode (6))library provide access to some aspects of the Unicode semantics for characters and strings: category information, case-independent comparisons, case mappings, and normalization.

Some of the procedures that operate on characters or strings ignore the difference between upper case and lower case. These procedures have "-ci" (for "case insensitive") embedded in their names.

Characters

[R6RS] These procedures take a character argument and return a character result. If the argument is an upper-case or title-case character, and if there is a single character that is its lower-case form, then char-downcasereturns that character. If the argument is a lower-case or title-case character, and there is a single character that is its upper-case form, then char-upcasereturns that character. If the argument is a lower-case or upper-case character, and there is a single character that is its title-case form, then char-titlecasereturns that character. If the argument is not a title-case character and there is no single character that is its title-case form, then char-titlecasereturns the upper-case form of the argument. Finally, if the character has a case-folded character, then char-foldcase returns that character. Otherwise the character returned is the same as the argument. For Turkic characters İ (#\x130) and ı (#\x131), char-foldcase behaves as the identity function; otherwise char-foldcase is the same as char-downcase composed with char-upcase.

[R6RS] These procedures are similar to char=?, etc., but operate on the case-folded versions of the characters.

[R6RS] These procedures return #t if their arguments are alphabetic, numeric, whitespace, upper-case, lower-case, or title-case characters, respectively; otherwise they return #f.

A character is alphabetic if it has the Unicode "Alphabetic" property. A character is numeric if it has the Unicode "Numeric" property. A character is whitespace if has the Unicode "White_Space" property. A character is upper case if it has the Unicode "Uppercase" property, lower case if it has the "Lowercase" property, and title case if it is in the Lt general category.

[R6RS] Returns a symbol representing the Unicode general category of char, one of Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Ps, Pe, Pi, Pf, Pd, Pc, Po, Sc, Sm, Sk, So, Zs, Zp, Zl, Cc, Cf, Cs, Co, or Cn.

Strings

[R6RS+][SRFI-13] These procedures take a string argument and return a string result. They are defined in terms of Unicode's locale-independent case mappings from Unicode scalar-value sequences to scalar-value sequences. In particular, the length of the result string can be different from the length of the input string. When the specified result is equal in the sense of string=? to the argument, these procedures may return the argument instead of a newly allocated string.

The string-upcase procedure converts a string to upper case; string-downcase converts a string to lower case.

The string-foldcase procedure converts the string to its case-folded counterpart, using the full case-folding mapping, but without the special mappings for Turkic languages.

The string-titlecase procedure converts the first cased character of each word via char-titlecase, and downcases all other cased characters.

If the optional argument start and end are given, these must be exact integer and the procedures will first substring the given string with range start and end then convert it.

[R6RS] These procedures are similar to string=?, etc., but operate on the case-folded versions of the strings.

[R6RS] These procedures take a string argument and return a string result, which is the input string normalized to Unicode normalization form D, KD, C, or KC, respectively. When the specified result is equal in the sense of string=?to the argument, these procedures may return the argument instead of a newly allocated string.