Bytevectors

Many applications deal with blocks of binary data by accessing them in various ways, such as extracting signed or unsigned numbers of various sizes. Therefore, the (rnrs bytevectors (6)) library provides a single type for blocks of binary data with multiple ways to access that data. It deals with integer and floating-point representations in various sizes with specified endianness.

Bytevectors are objects of a disjoint type. Conceptually, a bytevector represents a sequence of 8-bit bytes. The description of bytevectors uses the term byte for an exact integer object in the interval {-128, ..., 127} and the term octet for an exact integer object in the interval {0, ..., 255}. A byte corresponds to its two's complement representation as an octet.

The length of a bytevector is the number of bytes it contains. This number is fixed. A valid index into a bytevector is an exact, non-negative integer object less than the length of the bytevector. The first byte of a bytevector has index 0; the last byte has an index one less than the length of the bytevector.

Generally, the access procedures come in different flavors according to the size of the represented integer and the endianness of the representation. The procedures also distinguish signed and unsigned representations. The signed representations all use two's complement.

Like string literals, literals representing bytevectors do not need to be quoted:

#vu8(12 23 123) => #vu8(12 23 123)

Library (rnrs bytevectors (6))

[R6RS] This library provides a single type for blocks of binary data with multiple ways to access that data.

General operations

[R6RS] Symbol must be a symbol describing an endianness. (endianness symbol) evaluates to the symbol named symbol. Whenever one of the procedures operating on bytevectors accepts an endianness as an argument, that argument must be one of these symbols. It is a syntax violation for symbol to be anything other than an endianness symbol supported by Sagittarius.

Currently, Sagittarius supports these symbols: big, little, and native.

[R6RS] Returns the endianness symbol associated with the platform's endianness. This is either the symbol big or the symbol little.
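As a small illustration of the two forms described above, the following sketch binds endianness symbols and queries the platform byte order. The variable names be, le, and host are illustrative only.

```scheme
(import (rnrs))

;; (endianness big) evaluates to the symbol big, (endianness little) to little.
(define be (endianness big))
(define le (endianness little))

;; native-endianness takes no arguments and reports the platform byte order.
(define host (native-endianness))
```

Because the endianness form is checked at expansion time, a misspelled symbol is reported as a syntax violation rather than failing later at run time.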

[R6RS] Returns #t if obj is a bytevector, otherwise returns #f.

[R6RS] Returns a newly allocated bytevector of k bytes.

If the fill argument is missing, the initial contents of the returned bytevector are 0.

If the fill argument is present, it must be an exact integer object in the interval {-128, ..., 255} that specifies the initial value for the bytes of the bytevector: If fill is non-negative, it is interpreted as an octet; if it is negative, it is interpreted as a byte.
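The octet and byte interpretations of fill can be seen side by side in a minimal sketch. It always passes fill explicitly; the names zeros, ff-octet, and ff-byte are illustrative.

```scheme
(import (rnrs))

;; Four bytes, each initialized to 0.
(define zeros (make-bytevector 4 0))

;; 255 is read as an octet, -1 as a byte; both produce the bit pattern 11111111.
(define ff-octet (make-bytevector 2 255))
(define ff-byte  (make-bytevector 2 -1))

;; The two bytevectors therefore compare equal.
(define same-bits? (bytevector=? ff-octet ff-byte))
```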

[R6RS] Returns, as an exact integer object, the number of bytes in bytevector.

[R6RS] Returns #t if bytevector1 and bytevector2 are equal, that is, if they have the same length and equal bytes at all valid indices. It returns #f otherwise.

[R6RS+] The fill argument is as in the description of the make-bytevector procedure. The bytevector-fill! procedure stores fill in every element of bytevector and returns unspecified values. Analogous to vector-fill!.

If the optional arguments start or end are given, the procedure fills only the range from index start (inclusive) to index end (exclusive) of bytevector. When end is omitted, the length of the given bytevector is used.
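A short sketch of filling and comparing, using only the two-argument R6RS form of bytevector-fill! so that it does not depend on the optional range arguments. The names bv, same?, and shorter-equal? are illustrative.

```scheme
(import (rnrs))

(define bv (make-bytevector 3 0))

;; Overwrite every element with 7; bv now holds the bytes 7 7 7.
(bytevector-fill! bv 7)

;; Equal length and equal contents.
(define same? (bytevector=? bv #vu8(7 7 7)))

;; Different length, so not equal even though the shared prefix matches.
(define shorter-equal? (bytevector=? bv #vu8(7 7)))
```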

[R6RS] Source and target must be bytevectors. Source-start, target-start, and k must be non-negative exact integer objects that satisfy

0 <= source-start <= source-start + k <= source-length
0 <= target-start <= target-start + k <= target-length

where source-length is the length of source and target-length is the length of target.

The bytevector-copy! procedure copies the bytes from source at indices

source-start, ... source-start + k - 1

to consecutive indices in target starting at target-start.

This returns unspecified values.
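The index arithmetic above can be checked with a small sketch. Here source-start is 1, target-start is 2, and k is 3, which satisfies both constraints for two 5-byte bytevectors. The names source and target mirror the parameter names and are otherwise illustrative.

```scheme
(import (rnrs))

(define source #vu8(1 2 3 4 5))
(define target (make-bytevector 5 0))

;; Copy 3 bytes: source indices 1, 2, 3 go to target indices 2, 3, 4.
(bytevector-copy! source 1 target 2 3)

;; target now holds the bytes 0 0 2 3 4.
```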

(bytevector-copy bytevector :optional (start 0) (end -1))

[R6RS+] Returns a newly allocated copy of bytevector.

If the optional argument start is given, the procedure copies from the given start index (inclusive).

If the optional argument end is given, the procedure copies up to the given end index (exclusive).
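The following sketch shows that the result is a fresh object: mutating the copy leaves the original untouched. It uses only the one-argument form, so it does not exercise the optional start and end arguments. The names original and copy are illustrative.

```scheme
(import (rnrs))

(define original (u8-list->bytevector '(10 20 30)))
(define copy (bytevector-copy original))

;; Changing the copy does not affect the original.
(bytevector-u8-set! copy 0 99)
```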

Operation on bytes and octets

[R6RS] K must be a valid index of bytevector.

The bytevector-u8-ref procedure returns the byte at index k of bytevector, as an octet.

The bytevector-s8-ref procedure returns the byte at index k of bytevector, as a (signed) byte.

[R6RS] K must be a valid index of bytevector.

The bytevector-u8-set! procedure stores octet in element k of bytevector.

The bytevector-s8-set! procedure stores the two's-complement representation of byte in element k of bytevector.

Both procedures return unspecified values.
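To make the byte and octet views concrete, this sketch stores the same bit pattern through both setters and reads it back through both accessors. The value 200 as an octet and -56 as a byte share the pattern 11001000, since 200 - 256 = -56. The names are illustrative.

```scheme
(import (rnrs))

(define bv (make-bytevector 2 0))

(bytevector-u8-set! bv 0 200)   ; stored as an octet
(bytevector-s8-set! bv 1 -56)   ; stored as a byte, same bits as 200

(define first-as-octet  (bytevector-u8-ref bv 0))  ; 200
(define first-as-byte   (bytevector-s8-ref bv 0))  ; -56
(define second-as-octet (bytevector-u8-ref bv 1))  ; 200
```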

[R6RS] List must be a list of octets.

The bytevector->u8-list procedure returns a newly allocated list of the octets of bytevector in the same order.

The u8-list->bytevector procedure returns a newly allocated bytevector whose elements are the elements of list, in the same order. It is analogous to list->vector.

Operations on integers of arbitrary size

[R6RS] Size must be a positive exact integer object. K, ..., k + size - 1 must be valid indices of bytevector.

The bytevector-uint-ref procedure retrieves the exact integer object corresponding to the unsigned representation of size size and specified by endianness at indices k, ..., k + size - 1.

The bytevector-sint-ref procedure retrieves the exact integer object corresponding to the two's-complement representation of size size and specified by endianness at indices k, ..., k + size - 1.

For bytevector-uint-set!, n must be an exact integer object in the interval {0, ..., 256^size - 1}. The bytevector-uint-set! procedure stores the unsigned representation of n, of size size and specified by endianness, into bytevector at indices k, ..., k + size - 1.

For bytevector-sint-set!, n must be an exact integer object in the interval {-256^size / 2, ..., 256^size / 2 - 1}. The bytevector-sint-set! procedure stores the two's-complement representation of n, of size size and specified by endianness, into bytevector at indices k, ..., k + size - 1.

The ...-set! procedures return unspecified values.
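A worked sketch of the arbitrary-size accessors, using a size of 3 bytes to show that sizes need not be powers of two. The names bv, out, and the result variables are illustrative.

```scheme
(import (rnrs))

(define bv #vu8(1 2 3))

;; Big-endian: 1*65536 + 2*256 + 3 = 66051.
(define big-value (bytevector-uint-ref bv 0 (endianness big) 3))

;; Little-endian: 3*65536 + 2*256 + 1 = 197121.
(define little-value (bytevector-uint-ref bv 0 (endianness little) 3))

;; The bytes 255 254 read as big-endian: 65534 unsigned, -2 in two's complement.
(define unsigned-value (bytevector-uint-ref #vu8(255 254) 0 (endianness big) 2))
(define signed-value   (bytevector-sint-ref #vu8(255 254) 0 (endianness big) 2))

;; Storing -2 as a 2-byte big-endian value yields the same bytes.
(define out (make-bytevector 2 0))
(bytevector-sint-set! out 0 -2 (endianness big) 2)
```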

[R6RS] Size must be a positive exact integer object. For uint-list->bytevector, list must be a list of exact integer objects in the interval {0, ..., 256^size - 1}. For sint-list->bytevector, list must be a list of exact integer objects in the interval {-256^size / 2, ..., 256^size / 2 - 1}. The length of bytevector or, respectively, of list must be divisible by size.

These procedures convert between lists of integer objects and their consecutive representations according to size and endianness in the bytevector objects in the same way as bytevector->u8-list and u8-list->bytevector do for one-byte representations.
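The list conversions can be sketched with 2-byte elements. The variable names are illustrative.

```scheme
(import (rnrs))

;; Two big-endian 16-bit values: bytes 0 1 give 1, bytes 0 2 give 2.
(define unsigned-list
  (bytevector->uint-list #vu8(0 1 0 2) (endianness big) 2))

;; The same list written as little-endian 16-bit values: bytes 1 0 2 0.
(define little-bytes
  (uint-list->bytevector '(1 2) (endianness little) 2))

;; Bytes 255 255 are -1 in two's complement; bytes 0 1 are 1.
(define signed-list
  (bytevector->sint-list #vu8(255 255 0 1) (endianness big) 2))
```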

Operation on 16-bit integers

[R6RS] K must be a valid index of bytevector; so must k + 1. For bytevector-u16-set! and bytevector-u16-native-set!, n must be an exact integer object in the interval {0, ..., 2^16 - 1}. For bytevector-s16-set! and bytevector-s16-native-set!, n must be an exact integer object in the interval {-2^15, ..., 2^15 - 1}.

These retrieve and set two-byte representations of numbers at indices k and k + 1, according to the endianness specified by endianness. The procedures with u16 in their names deal with the unsigned representation; those with s16 in their names deal with the two's-complement representation.

The procedures with native in their names employ the native endianness, and work only at aligned indices: k must be a multiple of 2.

The ...-set! procedures return unspecified values.
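A minimal sketch of the 16-bit accessors. The native variants are shown only as a round trip at an aligned index, because the bytes they produce depend on the platform. The names are illustrative.

```scheme
(import (rnrs))

(define bv #vu8(1 2 255 254))

;; Bytes 1 2: 1*256 + 2 = 258 big-endian, 2*256 + 1 = 513 little-endian.
(define u16-big    (bytevector-u16-ref bv 0 (endianness big)))
(define u16-little (bytevector-u16-ref bv 0 (endianness little)))

;; Bytes 255 254 as a signed big-endian value: 65534 - 65536 = -2.
(define s16-big (bytevector-s16-ref bv 2 (endianness big)))

;; Native round trip at index 0, which is a multiple of 2.
(define out (make-bytevector 2 0))
(bytevector-u16-native-set! out 0 513)
(define native-round-trip (bytevector-u16-native-ref out 0))
```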

Operation on 32-bit integers

[R6RS] K must be a valid index of bytevector; so must k + 3. For bytevector-u32-set! and bytevector-u32-native-set!, n must be an exact integer object in the interval {0, ..., 2^32 - 1}. For bytevector-s32-set! and bytevector-s32-native-set!, n must be an exact integer object in the interval {-2^31, ..., 2^31 - 1}.

These retrieve and set four-byte representations of numbers at indices k, ..., k + 3, according to the endianness specified by endianness. The procedures with u32 in their names deal with the unsigned representation; those with s32 in their names deal with the two's-complement representation.

The procedures with native in their names employ the native endianness, and work only at aligned indices: k must be a multiple of 4.

The ...-set! procedures return unspecified values.
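The effect of endianness on a four-byte value can be sketched as follows. The names bv, after-first-set, as-little, and as-unsigned are illustrative.

```scheme
(import (rnrs))

(define bv (make-bytevector 4 0))

;; #x01020304 stored big-endian gives the bytes 1 2 3 4.
(bytevector-u32-set! bv 0 #x01020304 (endianness big))
(define after-first-set (bytevector-copy bv))

;; Reading the same bytes as little-endian gives #x04030201.
(define as-little (bytevector-u32-ref bv 0 (endianness little)))

;; -1 in two's complement is all ones, which reads back unsigned as 2^32 - 1.
(bytevector-s32-set! bv 0 -1 (endianness big))
(define as-unsigned (bytevector-u32-ref bv 0 (endianness big)))
```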

Operation on 64-bit integers

[R6RS] K must be a valid index of bytevector; so must k + 7. For bytevector-u64-set! and bytevector-u64-native-set!, n must be an exact integer object in the interval {0, ..., 2^64 - 1}. For bytevector-s64-set! and bytevector-s64-native-set!, n must be an exact integer object in the interval {-2^63, ..., 2^63 - 1}.

These retrieve and set eight-byte representations of numbers at indices k, ..., k + 7, according to the endianness specified by endianness. The procedures with u64 in their names deal with the unsigned representation; those with s64 in their names deal with the two's-complement representation.

The procedures with native in their names employ the native endianness, and work only at aligned indices: k must be a multiple of 8.

The ...-set! procedures return unspecified values.
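A brief sketch of the 64-bit accessors at the edges of their ranges, where results are exact integers that typically exceed the fixnum range. The names are illustrative.

```scheme
(import (rnrs))

(define bv (make-bytevector 8 0))

;; -1 is stored as eight bytes of 255, so the unsigned reading is 2^64 - 1.
(bytevector-s64-set! bv 0 -1 (endianness little))
(define all-ones (bytevector-u64-ref bv 0 (endianness big)))

;; 1 stored big-endian puts the 1 in the last byte; read as little-endian,
;; that last byte is the most significant one, giving 256^7 = 2^56.
(bytevector-u64-set! bv 0 1 (endianness big))
(define swapped (bytevector-u64-ref bv 0 (endianness little)))
```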

Operation on IEEE-754 representations

[R6RS] K, …, k + 3 must be valid indices of bytevector. For bytevector-ieee-single-native-ref, k must be a multiple of 4.

These procedures return the inexact real number object that best represents the IEEE-754 single-precision number represented by the four bytes beginning at index k.

[R6RS] K, …, k + 7 must be valid indices of bytevector. For bytevector-ieee-double-native-ref, k must be a multiple of 8.

These procedures return the inexact real number object that best represents the IEEE-754 double-precision number represented by the eight bytes beginning at index k.

[R6RS] K, …, k + 3 must be valid indices of bytevector. For bytevector-ieee-single-native-set!, k must be a multiple of 4.

These procedures store an IEEE-754 single-precision representation of x into elements k through k + 3 of bytevector, and return unspecified values.

[R6RS] K, …, k + 7 must be valid indices of bytevector. For bytevector-ieee-double-native-set!, k must be a multiple of 8.

These procedures store an IEEE-754 double-precision representation of x into elements k through k + 7 of bytevector, and return unspecified values.
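As an illustration of the IEEE-754 accessors, this sketch stores 1.0 in both precisions using big-endian order and reads it back. The value 1.0 is exactly representable, with the well-known encodings 3F800000 (single) and 3FF0000000000000 (double). The names single-bytes, double-bytes, and the result variables are illustrative.

```scheme
(import (rnrs))

(define single-bytes (make-bytevector 4 0))
(define double-bytes (make-bytevector 8 0))

;; Argument order: bytevector, index, value, endianness.
(bytevector-ieee-single-set! single-bytes 0 1.0 (endianness big))
(bytevector-ieee-double-set! double-bytes 0 1.0 (endianness big))

;; single-bytes holds 63 128 0 0; double-bytes holds 63 240 0 0 0 0 0 0.

(define single-value (bytevector-ieee-single-ref single-bytes 0 (endianness big)))
(define double-value (bytevector-ieee-double-ref double-bytes 0 (endianness big)))
```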

Operation on strings

This section describes procedures that convert between strings and bytevectors containing Unicode encodings of those strings. When decoding bytevectors, encoding errors are handled as with the replace semantics of textual I/O: If an invalid or incomplete character encoding is encountered, then the replacement character U+FFFD is appended to the string being generated, an appropriate number of bytes are ignored, and decoding continues with the following bytes.

[R6RS+] [R7RS] Returns a newly allocated (unless empty) bytevector that contains the UTF-8 encoding of the given string.

If the optional argument start is given, the procedure converts the given string from the start index (inclusive).

If the optional argument end is given, the procedure converts the given string up to the end index (exclusive).

These optional arguments must be fixnums if they are given.
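A minimal sketch of UTF-8 encoding, using only the one-argument form. The second string is written with the R6RS escape \x3bb; for the Greek letter lambda (U+03BB), whose UTF-8 encoding is the two bytes CE BB. The names are illustrative.

```scheme
(import (rnrs))

;; ASCII characters encode to one byte each: 97 98 99.
(define ascii-bytes (string->utf8 "abc"))

;; U+03BB encodes to two bytes: 206 187.
(define lambda-bytes (string->utf8 "\x3bb;"))
```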

[R6RS] If endianness is specified, it must be the symbol big or the symbol little. The string->utf16 procedure returns a newly allocated (unless empty) bytevector that contains the UTF-16BE or UTF-16LE encoding of the given string (with no byte-order mark). If endianness is not specified or is big, then UTF-16BE is used. If endianness is little, then UTF-16LE is used.

[R6RS] If endianness is specified, it must be the symbol big or the symbol little. The string->utf32 procedure returns a newly allocated (unless empty) bytevector that contains the UTF-32BE or UTF-32LE encoding of the given string (with no byte-order mark). If endianness is not specified or is big, then UTF-32BE is used. If endianness is little, then UTF-32LE is used.
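The default and explicit endianness choices can be compared in a short sketch that encodes the single character A (U+0041). The names are illustrative.

```scheme
(import (rnrs))

;; Default is big-endian, and no byte-order mark is written.
(define utf16-default (string->utf16 "A"))                      ; bytes 0 65
(define utf16-little  (string->utf16 "A" (endianness little)))  ; bytes 65 0

(define utf32-default (string->utf32 "A"))                      ; bytes 0 0 0 65
(define utf32-little  (string->utf32 "A" (endianness little)))  ; bytes 65 0 0 0
```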

[R6RS] Returns a newly allocated (unless empty) string whose character sequence is encoded by the given bytevector.

If the optional argument start is given, the procedure converts the given bytevector from the start index (inclusive).

If the optional argument end is given, the procedure converts the given bytevector up to the end index (exclusive).

These optional arguments must be fixnums if they are given.
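A small sketch of decoding with the one-argument form. The second call feeds in the byte 255, which can never occur in valid UTF-8; under the replace semantics described at the start of this section, the result is expected to contain U+FFFD in its place, though the exact number of replacement characters for longer invalid runs is left unasserted here. The names are illustrative.

```scheme
(import (rnrs))

;; Valid input round-trips to the original string.
(define decoded (utf8->string #vu8(97 98 99)))

;; Invalid input: the byte 255 sits between the encodings of a and b.
(define repaired (utf8->string #vu8(97 255 98)))
```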

[R6RS] Endianness must be the symbol big or the symbol little. The utf16->string procedure returns a newly allocated (unless empty) string whose character sequence is encoded by the given bytevector. Bytevector is decoded according to UTF-16BE or UTF-16LE: If endianness-mandatory? is absent or #f, utf16->string determines the endianness according to a UTF-16 BOM at the beginning of bytevector if a BOM is present; in this case, the BOM is not decoded as a character. Also in this case, if no UTF-16 BOM is present, endianness specifies the endianness of the encoding. If endianness-mandatory? is a true value, endianness specifies the endianness of the encoding, and any UTF-16 BOM in the encoding is decoded as a regular character.

[R6RS] Endianness must be the symbol big or the symbol little. The utf32->string procedure returns a newly allocated (unless empty) string whose character sequence is encoded by the given bytevector. Bytevector is decoded according to UTF-32BE or UTF-32LE: If endianness-mandatory? is absent or #f, utf32->string determines the endianness according to a UTF-32 BOM at the beginning of bytevector if a BOM is present; in this case, the BOM is not decoded as a character. Also in this case, if no UTF-32 BOM is present, endianness specifies the endianness of the encoding. If endianness-mandatory? is a true value, endianness specifies the endianness of the encoding, and any UTF-32 BOM in the encoding is decoded as a regular character.
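The three BOM cases for utf16->string can be sketched as follows; utf32->string behaves analogously with four-byte units. The bytes 255 254 form a little-endian UTF-16 BOM and 254 255 a big-endian one. The names are illustrative.

```scheme
(import (rnrs))

;; No BOM: the endianness argument decides. Bytes 0 65 decode to "A".
(define no-bom (utf16->string #vu8(0 65) (endianness big)))

;; Little-endian BOM present and endianness-mandatory? absent: the BOM wins
;; over the big argument and is not included in the result.
(define bom-wins (utf16->string #vu8(255 254 65 0) (endianness big)))

;; endianness-mandatory? is true: the BOM is decoded as the ordinary
;; character U+FEFF, so the result has two characters.
(define bom-kept (utf16->string #vu8(254 255 0 65) (endianness big) #t))
```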