CCString
Basic String Utils
make n c
is a string of length n
with each index holding the character c
.
init n f
is a string of length n
with index i
holding the character f i
(called in increasing index order).
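As a small illustration of make and init, the sketch below uses the Stdlib String module, on the assumption that CCString exposes the same two functions unchanged. The names dashes and alphabet are only for this example.

```ocaml
(* make repeats a single character n times. *)
let dashes = String.make 3 '-'

(* init builds each character from its index: 0 -> 'a', 1 -> 'b', ... *)
let alphabet = String.init 5 (fun i -> Char.chr (Char.code 'a' + i))

let () =
  print_endline dashes;
  print_endline alphabet
```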
get s i
is the character at index i
in s
. This is the same as writing s.[i]
.
Return a new string that contains the same bytes as the given byte sequence.
Return a new byte sequence that contains the same bytes as the given string.
Note. The Stdlib.(^)
binary operator concatenates two strings.
concat sep ss
concatenates the list of strings ss
, inserting the separator string sep
between each.
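A short sketch of concat next to the (^) operator, written against Stdlib's String on the assumption that CCString re-exports concat as documented here. The binding names are illustrative.

```ocaml
(* The separator goes between elements only, never at the ends. *)
let csv = String.concat ", " [ "a"; "b"; "c" ]

(* (^) joins exactly two strings. *)
let greeting = "Hello, " ^ "world"

(* Concatenating an empty list gives the empty string. *)
let empty = String.concat "-" []
```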
starts_with
~prefix s
is true
if and only if s
starts with prefix
.
ends_with
~suffix s
is true
if and only if s
ends with suffix
.
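The two predicates can be tried as follows. This is a sketch against Stdlib's String, assuming OCaml 4.13 or later (where starts_with and ends_with are available); is_ocaml_source and is_hidden are hypothetical helper names.

```ocaml
(* True when the file name ends with the ".ml" extension. *)
let is_ocaml_source name = String.ends_with ~suffix:".ml" name

(* True when the file name starts with a dot. *)
let is_hidden name = String.starts_with ~prefix:"." name
```

Note that the empty string is a prefix and a suffix of every string.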
contains_from s start c
is true
if and only if c
appears in s
after position start
.
rcontains_from s stop c
is true
if and only if c
appears in s
before position stop+1
.
sub s pos len
is a string of length len
, containing the substring of s
that starts at position pos
and has length len
.
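A minimal sketch of sub using Stdlib's String. The second binding relies on the Stdlib behaviour of raising Invalid_argument when pos and len do not designate a valid range, which is not stated in the text above.

```ocaml
(* Five characters starting at position 6. *)
let word = String.sub "hello world" 6 5

(* An out-of-range request raises Invalid_argument; it is caught here. *)
let out_of_range =
  try Some (String.sub "abc" 2 5) with Invalid_argument _ -> None
```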
map f s
is the string resulting from applying f
to all the characters of s
in increasing order.
mapi f s
is like map
but the index of the character is also passed to f
.
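For instance, map and mapi can be used as below (Stdlib's String, assumed to behave the same as the CCString functions described here; binding names are illustrative).

```ocaml
(* Apply the same function to every character. *)
let shouted = String.map Char.uppercase_ascii "abc"

(* Use the index to treat the first character differently. *)
let titled =
  String.mapi (fun i c -> if i = 0 then Char.uppercase_ascii c else c) "hello"
```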
fold_left f x s
computes f (... (f (f x s.[0]) s.[1]) ...) s.[n-1]
, where n
is the length of the string s
.
fold_right f s x
computes f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...))
, where n
is the length of the string s
.
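The difference in association order between the two folds can be seen in a small sketch. It assumes OCaml 4.13 or later, where Stdlib's String provides fold_left and fold_right.

```ocaml
(* fold_left visits characters from index 0 upward, threading an accumulator. *)
let digit_sum =
  String.fold_left (fun acc c -> acc + (Char.code c - Char.code '0')) 0 "1234"

(* fold_right combines from the last character backward, so consing
   rebuilds the characters in their original order. *)
let chars = String.fold_right (fun c acc -> c :: acc) "abc" []

(* Prepending in a left fold reverses the string. *)
let reversed = String.fold_left (fun acc c -> String.make 1 c ^ acc) "" "abc"
```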
trim s
is s
without leading and trailing whitespace. Whitespace characters are: ' '
, '\x0C'
(form feed), '\n'
, '\r'
, and '\t'
.
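A quick sketch of trim with Stdlib's String; only leading and trailing whitespace is removed, inner whitespace is kept.

```ocaml
let cleaned = String.trim " \t hello world \r\n"

(* Nothing to strip: the content is unchanged. *)
let untouched = String.trim "a b"
```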
escaped s
is s
with special characters represented by escape sequences, following the lexical conventions of OCaml.
All characters outside the US-ASCII printable range [0x20;0x7E] are escaped, as well as backslash (0x5C) and double-quote (0x22).
The function Scanf.unescaped
is a left inverse of escaped
, i.e. Scanf.unescaped (escaped s) = s
for any string s
(unless escaped s
fails).
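The round trip through escaped and Scanf.unescaped can be sketched as follows, using Stdlib's String and Scanf. The sample string raw is made up for the example.

```ocaml
(* Contains a tab, two double quotes and one backslash. *)
let raw = "tab\there \"quoted\" back\\slash"

(* Each special character becomes a two-character escape sequence. *)
let esc = String.escaped raw

(* unescaped undoes the escaping. *)
let round_trip = Scanf.unescaped esc
```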
uppercase_ascii s
is s
with all lowercase letters translated to uppercase, using the US-ASCII character set.
lowercase_ascii s
is s
with all uppercase letters translated to lowercase, using the US-ASCII character set.
capitalize_ascii s
is s
with the first character set to uppercase, using the US-ASCII character set.
uncapitalize_ascii s
is s
with the first character set to lowercase, using the US-ASCII character set.
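A brief sketch of the four ASCII case functions, using Stdlib's String. Bytes outside the ASCII letters are left as they are, which the last binding illustrates with the UTF-8 encoding of "é".

```ocaml
let upper = String.uppercase_ascii "Hello"
let lower = String.lowercase_ascii "Hello"
let capitalized = String.capitalize_ascii "hello world"
let uncapitalized = String.uncapitalize_ascii "Hello"

(* Non-ASCII bytes are not letters of the US-ASCII set, so nothing changes. *)
let non_ascii = String.uppercase_ascii "\xC3\xA9"
```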
iteri
is like iter
, but the function is also given the corresponding character index.
index_from s i c
is the index of the first occurrence of c
in s
after position i
.
index_from_opt s i c
is the index of the first occurrence of c
in s
after position i
(if any).
rindex_from s i c
is the index of the last occurrence of c
in s
before position i+1
.
rindex_from_opt s i c
is the index of the last occurrence of c
in s
before position i+1
(if any).
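The index functions can be compared on one input. This sketch uses Stdlib's String; as far as the Stdlib implementation goes, the search of index_from starts at position i itself, and rindex_from searches backward starting at position i.

```ocaml
let s = "a,b,c"

(* Search forward starting at position 2: the comma at index 3 is found. *)
let forward = String.index_from s 2 ','

(* Position 1 holds a comma, so it is returned. *)
let at_start = String.index_from s 1 ','

(* No comma at or after position 4. *)
let missing = String.index_from_opt s 4 ','

(* Search backward from position 2: the comma at index 1 is found. *)
let backward = String.rindex_from s 2 ','
```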
rindex_opt s c
is String.rindex_from_opt
s (length s - 1) c
.
val to_seqi : t -> (int * char) Stdlib.Seq.t
to_seqi s
is like to_seq
but also tuples the corresponding index.
val get_utf_8_uchar : t -> int -> Stdlib.Uchar.utf_decode
get_utf_8_uchar b i
decodes a UTF-8 character at index i
in b
.
val is_valid_utf_8 : t -> bool
is_valid_utf_8 b
is true
if and only if b
contains valid UTF-8 data.
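A minimal sketch of UTF-8 decoding and validation, assuming OCaml 4.14 or later and Stdlib's String and Uchar modules. The input is the two-byte encoding of U+00E9.

```ocaml
(* "\xC3\xA9" is the UTF-8 encoding of U+00E9. *)
let e_acute = "\xC3\xA9"

(* Decode the character that starts at byte index 0. *)
let decoded = String.get_utf_8_uchar e_acute 0

let is_ok = Uchar.utf_decode_is_valid decoded
let byte_len = Uchar.utf_decode_length decoded
let code_point = Uchar.to_int (Uchar.utf_decode_uchar decoded)

(* A lone lead byte is a truncated sequence, hence invalid. *)
let truncated_is_valid = String.is_valid_utf_8 "\xC3"
```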
val get_utf_16be_uchar : t -> int -> Stdlib.Uchar.utf_decode
get_utf_16be_uchar b i
decodes a UTF-16BE character at index i
in b
.
val is_valid_utf_16be : t -> bool
is_valid_utf_16be b
is true
if and only if b
contains valid UTF-16BE data.
val get_utf_16le_uchar : t -> int -> Stdlib.Uchar.utf_decode
get_utf_16le_uchar b i
decodes a UTF-16LE character at index i
in b
.
val is_valid_utf_16le : t -> bool
is_valid_utf_16le b
is true
if and only if b
contains valid UTF-16LE data.
The functions in this section decode binary-encoded integers from strings.
All following functions raise Invalid_argument
if the characters needed at index i
to decode the integer are not available.
Little-endian (resp. big-endian) encoding means that least (resp. most) significant bytes are stored first. Big-endian is also known as network byte order. Native-endian encoding is either little-endian or big-endian depending on Sys.big_endian
.
32-bit and 64-bit integers are represented by the int32
and int64
types, which can be interpreted either as signed or unsigned numbers.
8-bit and 16-bit integers are represented by the int
type, which has more bits than the binary encoding. These extra bits are sign-extended (or zero-extended) for functions which decode 8-bit or 16-bit integers and represent them with int
values.
get_uint8 b i
is b
's unsigned 8-bit integer starting at character index i
.
get_int8 b i
is b
's signed 8-bit integer starting at character index i
.
get_uint16_ne b i
is b
's native-endian unsigned 16-bit integer starting at character index i
.
get_uint16_be b i
is b
's big-endian unsigned 16-bit integer starting at character index i
.
get_uint16_le b i
is b
's little-endian unsigned 16-bit integer starting at character index i
.
get_int16_ne b i
is b
's native-endian signed 16-bit integer starting at character index i
.
get_int16_be b i
is b
's big-endian signed 16-bit integer starting at character index i
.
get_int16_le b i
is b
's little-endian signed 16-bit integer starting at character index i
.
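Sign extension and byte order for the 8-bit and 16-bit decoders can be seen in a short sketch. It assumes OCaml 4.13 or later, where Stdlib's String provides these functions; the sample bytes are arbitrary.

```ocaml
(* The single byte 0xFF read as unsigned and as signed. *)
let u8 = String.get_uint8 "\xFF" 0
let i8 = String.get_int8 "\xFF" 0

(* The same two bytes read with both byte orders. *)
let bytes = "\x01\x02"
let be16 = String.get_uint16_be bytes 0
let le16 = String.get_uint16_le bytes 0

(* 0xFFFE is 65534 unsigned and -2 once sign-extended. *)
let u16 = String.get_uint16_be "\xFF\xFE" 0
let i16 = String.get_int16_be "\xFF\xFE" 0
```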
get_int32_ne b i
is b
's native-endian 32-bit integer starting at character index i
.
val seeded_hash : int -> t -> int
A seeded hash function for strings, with the same output value as Hashtbl.seeded_hash
. This function allows this module to be passed as argument to the functor Hashtbl.MakeSeeded
.
get_int32_be b i
is b
's big-endian 32-bit integer starting at character index i
.
get_int32_le b i
is b
's little-endian 32-bit integer starting at character index i
.
get_int64_ne b i
is b
's native-endian 64-bit integer starting at character index i
.
get_int64_be b i
is b
's big-endian 64-bit integer starting at character index i
.
get_int64_le b i
is b
's little-endian 64-bit integer starting at character index i
.
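The 32-bit and 64-bit decoders return int32 and int64 values. The sketch below assumes OCaml 4.13 or later and Stdlib's String, and also shows the Invalid_argument case when too few characters are available.

```ocaml
let four = "\x01\x00\x00\x00"

(* Least significant byte first: the value is 1. *)
let le32 = String.get_int32_le four 0

(* Most significant byte first: the value is 0x01000000. *)
let be32 = String.get_int32_be four 0

let le64 = String.get_int64_le "\x01\x00\x00\x00\x00\x00\x00\x00" 0

(* Only two characters are available, so decoding four of them fails. *)
let too_short =
  try Some (String.get_int32_be "ab" 0) with Invalid_argument _ -> None
```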
val length : t -> int
length s
returns the length (number of characters) of the given string s
.
val blit : t -> int -> Stdlib.Bytes.t -> int -> int -> unit
blit src src_pos dst dst_pos len
copies len
characters from string src
starting at character index src_pos
, to the Bytes sequence dst
starting at character index dst_pos
. Like String.blit
. Compatible with the -safe-string
option.
val fold : ('a -> char -> 'a) -> 'a -> t -> 'a
fold f init s
folds on chars by increasing index. Computes f (… (f (f init s.[0]) s.[1]) …) s.[n-1]
.
val foldi : ('a -> int -> char -> 'a) -> 'a -> t -> 'a
foldi f init s
is just like fold
, but it also passes the index of each char as the second argument to the folded function f
.
val to_seq : t -> char Stdlib.Seq.t
to_seq s
returns the Seq.t
of characters contained in the string s
. Renamed from to_std_seq
since 3.0.
val to_list : t -> char list
to_list s
returns the list
of characters contained in the string s
.
val pp_buf : Stdlib.Buffer.t -> t -> unit
pp_buf buf s
prints s
to the buffer buf
. Renamed from pp
since 2.0.
val pp : Stdlib.Format.formatter -> t -> unit
pp f s
prints the string s
within quotes to the formatter f
. Renamed from print
since 2.0.
compare s1 s2
compares the strings s1
and s2
and returns an integer that indicates their relative position in the sort order.
pad ~side ~c n s
ensures that the string s
is at least n
bytes long, and pads it on the side
with c
if it's not the case.
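The padding behaviour described above can be sketched with the standard library alone. pad_sketch is a hypothetical stand-in, not the library's code, and its defaults (padding on the left with a space) are assumptions made for this example.

```ocaml
(* Hypothetical re-implementation of the documented behaviour. *)
let pad_sketch ?(side = `Left) ?(c = ' ') n s =
  let len = String.length s in
  if len >= n then s (* already long enough: returned unchanged *)
  else
    let fill = String.make (n - len) c in
    match side with
    | `Left -> fill ^ s
    | `Right -> s ^ fill
```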
val of_gen : char gen -> string
of_gen gen
converts a gen
of characters to a string.
val of_iter : char iter -> string
of_iter iter
converts an iter
of characters to a string.
val of_seq : char Stdlib.Seq.t -> string
of_seq seq
converts a seq
of characters to a string. Renamed from of_std_seq
since 3.0.
to_array s
returns the array of characters contained in the string s
.
find ~start ~sub s
returns the starting index of the first occurrence of sub
within s
or -1
.
val find_all : ?start:int -> sub:string -> string -> int gen
find_all ~start ~sub s
finds all occurrences of sub
in s
, including overlapping instances, and returns them in a generator gen
.
find_all_l ~sub s
finds all occurrences of sub
in s
and returns them in a list.
mem ~start ~sub s
is true
iff sub
is a substring of s
.
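To make the contract of find and mem concrete, here is a naive stdlib-only sketch. find_sketch and mem_sketch are hypothetical names, the default start of 0 is an assumption, and the library itself may use a different search algorithm.

```ocaml
(* Try every starting position from [start]; return -1 when none matches. *)
let find_sketch ?(start = 0) ~sub s =
  let n = String.length s and m = String.length sub in
  let rec go i =
    if i + m > n then -1
    else if String.sub s i m = sub then i
    else go (i + 1)
  in
  go start

(* A substring test is a search that succeeded. *)
let mem_sketch ?start ~sub s = find_sketch ?start ~sub s >= 0
```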
rfind ~sub s
finds sub
in string s
from the right, returns its first index or -1
. Should only be used with very small sub
.
replace ~which ~sub ~by s
replaces some occurrences of sub
by by
in s
.
is_sub ~sub ~sub_pos s ~pos ~sub_len
returns true
iff the substring of sub
starting at position sub_pos
and of length sub_len
is a substring of s
starting at position pos
.
chop_prefix ~pre s
removes pre
from s
if pre
really is a prefix of s
, returns None
otherwise.
chop_suffix ~suf s
removes suf
from s
if suf
really is a suffix of s
, returns None
otherwise.
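The option-returning shape of chop_prefix and chop_suffix can be sketched with Stdlib functions, assuming OCaml 4.13 or later. Both functions below are hypothetical stand-ins written for illustration.

```ocaml
(* Some rest when [pre] is a prefix of [s], None otherwise. *)
let chop_prefix_sketch ~pre s =
  if String.starts_with ~prefix:pre s then
    let k = String.length pre in
    Some (String.sub s k (String.length s - k))
  else None

(* Some rest when [suf] is a suffix of [s], None otherwise. *)
let chop_suffix_sketch ~suf s =
  if String.ends_with ~suffix:suf s then
    Some (String.sub s 0 (String.length s - String.length suf))
  else None
```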
val lines_gen : string -> string gen
lines_gen s
returns the gen
of the lines of s
(splits along '\n').
val lines_iter : string -> string iter
lines_iter s
returns the iter
of the lines of s
(splits along '\n').
val lines_seq : string -> string Stdlib.Seq.t
lines_seq s
returns the Seq.t
of the lines of s
(splits along '\n').
val concat_gen : sep:string -> string gen -> string
concat_gen ~sep gen
concatenates all strings of gen
, separated with sep
.
val concat_seq : sep:string -> string Stdlib.Seq.t -> string
concat_seq ~sep seq
concatenates all strings of seq
, separated with sep
.
val concat_iter : sep:string -> string iter -> string
concat_iter ~sep iter
concatenates all strings of iter
, separated with sep
.
val unlines_gen : string gen -> string
unlines_gen gen
concatenates all strings of gen
, separated with '\n'.
val unlines_iter : string iter -> string
unlines_iter iter
concatenates all strings of iter
, separated with '\n'.
val unlines_seq : string Stdlib.Seq.t -> string
unlines_seq seq
concatenates all strings of seq
, separated with '\n'.
set s i c
creates a new string which is a copy of s
, except for index i
, which becomes c
.
iter f s
applies function f
on each character of s
. Alias to String.iter
.
filter_map f s
calls (f a0) (f a1) … (f an)
where a0 … an
are the characters of s. It returns the string of characters ci
such that f ai = Some ci
(when f
returns None
, the corresponding element of s
is discarded).
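A stdlib-only sketch of this behaviour, using a Buffer to collect the kept characters. filter_map_sketch is a hypothetical name and this is not necessarily how the library implements it.

```ocaml
(* Keep the image of each character for which [f] returns Some. *)
let filter_map_sketch f s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match f c with
      | Some c' -> Buffer.add_char buf c'
      | None -> ())
    s;
  Buffer.contents buf

(* Drop spaces and uppercase everything else. *)
let compact =
  filter_map_sketch
    (fun c -> if c = ' ' then None else Some (Char.uppercase_ascii c))
    "a b c"
```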
filter f s
discards characters of s
not satisfying f
.
uniq eq s
removes consecutive duplicate characters in s
.
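Only adjacent duplicates are affected, as the following stdlib-only sketch shows. uniq_sketch is a hypothetical stand-in for the documented function.

```ocaml
(* Keep a character unless it equals, according to [eq], the one just before it. *)
let uniq_sketch eq s =
  let buf = Buffer.create (String.length s) in
  String.iteri
    (fun i c -> if i = 0 || not (eq s.[i - 1] c) then Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* The final 'a' survives because it is not adjacent to the first run. *)
let squeezed = uniq_sketch Char.equal "aabbbca"
```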
flat_map ~sep f s
maps each char of s
to a string, then concatenates them all.
for_all f s
is true
iff all characters of s
satisfy the predicate f
.
exists f s
is true
iff some character of s
satisfies the predicate f
.
drop_while f s
discards any characters of s
starting from the left, up to the first character c
not satisfying f c
.
rdrop_while f s
discards any characters of s
starting from the right, up to the first character c
not satisfying f c
.
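Both directions can be sketched with the standard library. The two functions below are hypothetical stand-ins that follow the description above.

```ocaml
(* Skip leading characters while [f] holds, keep the rest. *)
let drop_while_sketch f s =
  let n = String.length s in
  let rec first i = if i < n && f s.[i] then first (i + 1) else i in
  let i = first 0 in
  String.sub s i (n - i)

(* Skip trailing characters while [f] holds, keep the beginning. *)
let rdrop_while_sketch f s =
  let rec last i = if i > 0 && f s.[i - 1] then last (i - 1) else i in
  String.sub s 0 (last (String.length s))
```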
iter2 f s1 s2
iterates on pairs of chars.
iteri2 f s1 s2
iterates on pairs of chars with their index.
fold2 f init s1 s2
folds on pairs of chars.
for_all2 f s1 s2
returns true
iff all pairs of chars satisfy the predicate f
.
exists2 f s1 s2
returns true
iff a pair of chars satisfies the predicate f
.
Those functions are deprecated in String
since 4.03, so we provide a stable alias for them even in older versions.
equal_caseless s1 s2
compares s1
and s2
for equality, ignoring ASCII case.
Same as of_hex
but fails harder.
A relatively efficient algorithm for finding sub-strings.
module Find : sig ... end
module Split : sig ... end
split_on_char by s
splits the string s
along the given char by
.
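A short sketch with Stdlib's String.split_on_char, which is assumed to match the function described here. Consecutive separators produce empty fields.

```ocaml
let fields = String.split_on_char ',' "a,b,,c"

(* Without any separator the result is a single-element list. *)
let single = String.split_on_char ',' "abc"
```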
split ~by s
splits the string s
along the given string by
. Alias to Split.list_cpy
.
compare_versions s1 s2
compares version strings s1
and s2
, considering that numbers are above text.
compare_natural s1 s2
is the Natural Sort Order, comparing chunks of digits as natural numbers. https://en.wikipedia.org/wiki/Natural_sort_order
edit_distance ~cutoff s1 s2
is the edit distance between the two strings s1
and s2
. This satisfies the classical distance axioms: it is always non-negative, symmetric, and satisfies the triangle inequality distance s1 s2 + distance s2 s3 >= distance s1 s3
.
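For reference, the classic two-row dynamic programming computation of the Levenshtein distance is sketched below. It assumes that the distance meant here is the Levenshtein distance with unit costs, and it ignores the cutoff parameter; levenshtein is a hypothetical name, not the library's function.

```ocaml
let levenshtein s1 s2 =
  let n = String.length s1 and m = String.length s2 in
  (* prev.(j): distance between the first i-1 chars of s1 and the first j of s2. *)
  let prev = Array.init (m + 1) (fun j -> j) in
  let cur = Array.make (m + 1) 0 in
  for i = 1 to n do
    cur.(0) <- i;
    for j = 1 to m do
      let cost = if s1.[i - 1] = s2.[j - 1] then 0 else 1 in
      (* Cheapest of deletion, insertion and substitution (or match). *)
      cur.(j) <-
        min (min (prev.(j) + 1) (cur.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit cur 0 prev 0 (m + 1)
  done;
  prev.(m)
```

On small inputs the axioms listed above can be checked directly, for example symmetry on a pair of words.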
module Infix : sig ... end