License | BSD-style |
---|---|
Maintainer | Vincent Hanquez <vincent@snarc.org> |
Stability | experimental |
Portability | portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Opaque packed String encoded in UTF8.
The type is an instance of IsString and IsList, which allow OverloadedStrings
for string literal, and fromList
to convert a [Char] (Prelude String) to a packed
representation
{-# LANGUAGE OverloadedStrings #-} s = "Hello World" :: String
s = fromList ("Hello World" :: Prelude.String) :: String
Each unicode code point is represented by a variable encoding of 1 to 4 bytes,
For more information about UTF8: https://en.wikipedia.org/wiki/UTF-8
Synopsis
- data String
- data Encoding
- = ASCII7
- | UTF8
- | UTF16
- | UTF32
- | ISO_8859_1
- fromBytes :: Encoding -> UArray Word8 -> (String, Maybe ValidationFailure, UArray Word8)
- fromBytesLenient :: UArray Word8 -> (String, UArray Word8)
- fromBytesUnsafe :: UArray Word8 -> String
- toBytes :: Encoding -> String -> UArray Word8
- data ValidationFailure
- lines :: String -> [String]
- words :: String -> [String]
- upper :: String -> String
- lower :: String -> String
- replace :: String -> String -> String -> String
- indices :: String -> String -> [Offset8]
- toBase64 :: String -> String
- toBase64URL :: Bool -> String -> String
- toBase64OpenBSD :: String -> String
- breakLine :: String -> Either Bool (String, String)
Documentation
Opaque packed array of characters in the UTF8 encoding
Instances
Various String Encoding that can be use to convert to and from bytes
Instances
Data Encoding | |
Defined in Basement.String gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> Encoding -> c Encoding # gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c Encoding # toConstr :: Encoding -> Constr # dataTypeOf :: Encoding -> DataType # dataCast1 :: Typeable t => (forall d. Data d => c (t d)) -> Maybe (c Encoding) # dataCast2 :: Typeable t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c Encoding) # gmapT :: (forall b. Data b => b -> b) -> Encoding -> Encoding # gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> Encoding -> r # gmapQr :: forall r r'. (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> Encoding -> r # gmapQ :: (forall d. Data d => d -> u) -> Encoding -> [u] # gmapQi :: Int -> (forall d. Data d => d -> u) -> Encoding -> u # gmapM :: Monad m => (forall d. Data d => d -> m d) -> Encoding -> m Encoding # gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> Encoding -> m Encoding # gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> Encoding -> m Encoding # | |
Bounded Encoding | |
Enum Encoding | |
Show Encoding | |
Eq Encoding | |
Ord Encoding | |
Defined in Basement.String |
fromBytes :: Encoding -> UArray Word8 -> (String, Maybe ValidationFailure, UArray Word8) Source #
Convert a ByteArray to a string assuming a specific encoding.
It returns a 3-tuple of:
- The string that has been succesfully converted without any error
- An optional validation error
- The remaining buffer that hasn't been processed (either as a result of an error, or because the encoded sequence is not fully available)
Considering a stream of data that is fetched chunk by chunk, it's valid to assume that some sequence might fall in a chunk boundary. When converting chunks, if the error is Nothing and the remaining buffer is not empty, then this buffer need to be prepended to the next chunk
fromBytesLenient :: UArray Word8 -> (String, UArray Word8) Source #
Convert a UTF8 array of bytes to a String.
If there's any error in the stream, it will automatically insert replacement bytes to replace invalid sequences.
In the case of sequence that fall in the middle of 2 chunks, the remaining buffer is supposed to be preprended to the next chunk, and resume the parsing.
fromBytesUnsafe :: UArray Word8 -> String Source #
Convert a Byte Array representing UTF8 data directly to a string without checking for UTF8 validity
If the input contains invalid sequences, it will trigger runtime async errors when processing data.
In doubt, use fromBytes
toBytes :: Encoding -> String -> UArray Word8 Source #
Convert a String to a bytearray in a specific encoding
if the encoding is UTF8, the underlying buffer is returned without extra allocation or any processing
In any other encoding, some allocation and processing are done to convert.
data ValidationFailure Source #
Possible failure related to validating bytes of UTF8 sequences.
Instances
Exception ValidationFailure | |
Defined in Basement.UTF8.Types | |
Show ValidationFailure | |
Defined in Basement.UTF8.Types showsPrec :: Int -> ValidationFailure -> ShowS # show :: ValidationFailure -> String # showList :: [ValidationFailure] -> ShowS # | |
Eq ValidationFailure | |
Defined in Basement.UTF8.Types (==) :: ValidationFailure -> ValidationFailure -> Bool # (/=) :: ValidationFailure -> ValidationFailure -> Bool # |
lines :: String -> [String] Source #
Split lines in a string using newline as separation.
Note that carriage return preceding a newline are also strip for maximum compatibility between Windows and Unix system.
replace :: String -> String -> String -> String Source #
Replace all the occurrencies of needle
with replacement
in
the haystack
string.
indices :: String -> String -> [Offset8] Source #
Finds where are the insertion points when we search for a needle
within an haystack
.
toBase64URL :: Bool -> String -> String Source #
Transform string src
to URL-safe base64 binary representation.
The result will be either padded or unpadded, depending on the boolean
padded
argument.
toBase64OpenBSD :: String -> String Source #
Transform string src
to OpenBSD base64 binary representation.
breakLine :: String -> Either Bool (String, String) Source #
Same as break but cut on a line feed with an optional carriage return.
This is the same operation as 'breakElem LF' dropping the last character of the string if it's a CR.
Also for efficiency reason (streaming), it returns if the last character was a CR character.