split-0.2.5: Combinator library for splitting lists.
Copyright: (c) Brent Yorgey, Louis Wasserman 2008-2012
License: BSD-style (see LICENSE)
Maintainer: Brent Yorgey <byorgey@gmail.com>
Stability: stable
Portability: Haskell 2010
Safe Haskell: Safe-Inferred
Language: Haskell2010

Data.List.Split

Description

The Data.List.Split module contains a wide range of strategies for splitting lists with respect to some sort of delimiter, mostly implemented through a unified combinator interface. The goal is to be flexible yet simple. See below for usage, examples, and detailed documentation of all exported functions. If you want to learn about the implementation, see Data.List.Split.Internals.

A git repository containing the source (including a module with over 40 QuickCheck properties) can be found at https://github.com/byorgey/split.

Getting started

To get started, take a look at the functions splitOn, splitOneOf, splitWhen, endBy, chunksOf, splitPlaces, and the other functions listed in the next two sections. These functions implement various common splitting operations, and one of them will probably do the job in the large majority of cases. For example:

>>> splitOn "x" "axbxc"
["a","b","c"]
>>> splitOn "x" "axbxcx"
["a","b","c",""]
>>> endBy ";" "foo;bar;baz;"
["foo","bar","baz"]
>>> splitWhen (<0) [1,3,-4,5,7,-9,0,2]
[[1,3],[5,7],[0,2]]
>>> splitOneOf ";.," "foo,bar;baz.glurk"
["foo","bar","baz","glurk"]
>>> chunksOf 3 ['a'..'z']
["abc","def","ghi","jkl","mno","pqr","stu","vwx","yz"]

If you want more flexibility, however, you can use the combinator library on top of which these functions are defined. For more information, see the section labeled "Splitting combinators".

The goal of this library is to be flexible yet simple. It does not implement any particularly sophisticated list-splitting methods, nor is it tuned for speed. If you find yourself wanting something more complicated or optimized, it probably means you should use a real parsing or regular expression library.

Convenience functions

These functions implement some common splitting strategies. Note that all of the functions in this section drop delimiters from the final output, since that is the more common use case. If you wish to keep the delimiters in some form, see the "Splitting combinators" section.

splitOn :: Eq a => [a] -> [a] -> [[a]]

Split on the given sublist. Equivalent to split . dropDelims . onSublist.

>>> splitOn ":" "12:35:07"
["12","35","07"]
>>> splitOn "x" "axbxc"
["a","b","c"]
>>> splitOn "x" "axbxcx"
["a","b","c",""]
>>> splitOn ".." "a..b...c....d.."
["a","b",".c","","d",""]

In some parsing combinator frameworks this is also known as sepBy.

Note that this is the right inverse of the intercalate function from Data.List, that is,

intercalate x . splitOn x === id

splitOn x . intercalate x is the identity on certain lists, but it is tricky to state the precise conditions under which this holds. (For example, it is not enough to say that x does not occur in any elements of the input list. Working out why is left as an exercise for the reader.)
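The two compositions can be spot-checked with a small sketch. The names roundTrips and reSplit are hypothetical helpers introduced here for illustration; they are not part of the library.

```haskell
import Data.List (intercalate)
import Data.List.Split (splitOn)

-- Hypothetical helper: split on the delimiter, then join with it again.
-- The documentation states that this composition is the identity.
roundTrips :: String -> String -> Bool
roundTrips delim s = intercalate delim (splitOn delim s) == s

-- Hypothetical helper: join with the delimiter, then split on it again.
-- This composition is not the identity in general.
reSplit :: String -> [String] -> [String]
reSplit delim = splitOn delim . intercalate delim
```

For instance, reSplit "," ["a","b"] gives back ["a","b"], whereas reSplit "," [] yields [""] rather than [], which is one simple input on which the second composition is not the identity.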

splitOneOf :: Eq a => [a] -> [a] -> [[a]]

Split on any of the given elements. Equivalent to split . dropDelims . oneOf.

>>> splitOneOf ";.," "foo,bar;baz.glurk"
["foo","bar","baz","glurk"]

splitWhen :: (a -> Bool) -> [a] -> [[a]]

Split on elements satisfying the given predicate. Equivalent to split . dropDelims . whenElt.

>>> splitWhen (<0) [1,3,-4,5,7,-9,0,2]
[[1,3],[5,7],[0,2]]
>>> splitWhen (<0) [1,-2,3,4,-5,-6,7,8,-9]
[[1],[3,4],[],[7,8],[]]

endBy :: Eq a => [a] -> [a] -> [[a]]

Split into chunks terminated by the given subsequence. Equivalent to split . dropFinalBlank . dropDelims . onSublist.

>>> endBy ".;" "foo.;bar.;baz.;"
["foo","bar","baz"]

Note also that the lines function from Data.List is equivalent to endBy "\n".

endByOneOf :: Eq a => [a] -> [a] -> [[a]]

Split into chunks terminated by one of the given elements. Equivalent to split . dropFinalBlank . dropDelims . oneOf.

>>> endByOneOf ";," "foo;bar,baz;"
["foo","bar","baz"]

wordsBy :: (a -> Bool) -> [a] -> [[a]]

Split into "words", with word boundaries indicated by the given predicate. Satisfies words === wordsBy isSpace; equivalent to split . dropBlanks . dropDelims . whenElt.

>>> wordsBy (`elem` ",;.?! ") "Hello there, world! How?"
["Hello","there","world","How"]
>>> wordsBy (=='x') "dogxxxcatxbirdxx"
["dog","cat","bird"]

linesBy :: (a -> Bool) -> [a] -> [[a]]

Split into "lines", with line boundaries indicated by the given predicate. Satisfies lines === linesBy (=='\n'); equivalent to split . dropFinalBlank . dropDelims . whenElt.

>>> linesBy (==';') "foo;bar;;baz;"
["foo","bar","","baz"]
>>> linesBy (=='x') "dogxxxcatxbirdxx"
["dog","","","cat","bird",""]
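The equivalences with the Prelude functions words and lines stated for wordsBy, linesBy and endBy can be spot-checked on sample inputs. This is only a sketch; wordsAgree and linesAgree are hypothetical names, not library functions.

```haskell
import Data.Char (isSpace)
import Data.List.Split (endBy, linesBy, wordsBy)

-- Hypothetical helper: compare Prelude's words with wordsBy isSpace.
wordsAgree :: String -> Bool
wordsAgree s = words s == wordsBy isSpace s

-- Hypothetical helper: compare Prelude's lines with both
-- linesBy (== '\n') and endBy "\n".
linesAgree :: String -> Bool
linesAgree s = lines s == linesBy (== '\n') s && lines s == endBy "\n" s
```

Both helpers return True on inputs such as "  hello   there world " and "foo\nbar\n\nbaz\n" respectively.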

Other splitting methods

Other useful splitting methods which are not implemented using the combinator framework.

chunksOf :: Int -> [e] -> [[e]]

chunksOf n splits a list into length-n pieces. The last piece will be shorter if n does not evenly divide the length of the list. If n <= 0, chunksOf n l returns an infinite list of empty lists.

>>> chunksOf 3 [1..12]
[[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
>>> chunksOf 3 "Hello there"
["Hel","lo ","the","re"]
>>> chunksOf 3 ([] :: [Int])
[]

Note that chunksOf n [] is [], not [[]]. This is intentional, and satisfies the property that

chunksOf n xs ++ chunksOf n ys == chunksOf n (xs ++ ys)

whenever n evenly divides the length of xs.
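A small sketch of this property, and of why the divisibility condition matters. The helper name chunksDistribute is hypothetical and introduced only for this illustration.

```haskell
import Data.List.Split (chunksOf)

-- Hypothetical helper: does chunking distribute over (++) for these inputs?
chunksDistribute :: Eq a => Int -> [a] -> [a] -> Bool
chunksDistribute n xs ys =
  chunksOf n xs ++ chunksOf n ys == chunksOf n (xs ++ ys)
```

chunksDistribute 2 "abcd" "efg" is True because 2 divides the length of "abcd". chunksDistribute 2 "abc" "de" is False: the left side is ["ab","c","de"] while the right side is ["ab","cd","e"]. With xs empty, as in chunksDistribute 3 "" "xy", the property holds only because chunksOf 3 "" is [] and not [[]].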

splitPlaces :: Integral a => [a] -> [e] -> [[e]]

Split a list into chunks of the given lengths.

>>> splitPlaces [2,3,4] [1..20]
[[1,2],[3,4,5],[6,7,8,9]]
>>> splitPlaces [4,9] [1..10]
[[1,2,3,4],[5,6,7,8,9,10]]
>>> splitPlaces [4,9,3] [1..10]
[[1,2,3,4],[5,6,7,8,9,10]]

If the input list is longer than the total of the given lengths, then the remaining elements are dropped. If the list is shorter than the total of the given lengths, then the result may contain fewer chunks than requested, and the last chunk may be shorter than requested.

splitPlacesBlanks :: Integral a => [a] -> [e] -> [[e]]

Split a list into chunks of the given lengths. Unlike splitPlaces, the output list will always be the same length as the first input argument. If the input list is longer than the total of the given lengths, then the remaining elements are dropped. If the list is shorter than the total of the given lengths, then the last several chunks will be shorter than requested or empty.

>>> splitPlacesBlanks [2,3,4] [1..20]
[[1,2],[3,4,5],[6,7,8,9]]
>>> splitPlacesBlanks [4,9] [1..10]
[[1,2,3,4],[5,6,7,8,9,10]]
>>> splitPlacesBlanks [4,9,3] [1..10]
[[1,2,3,4],[5,6,7,8,9,10],[]]

Notice the empty list in the output of the third example, which differs from the behavior of splitPlaces.

chop :: ([a] -> (b, [a])) -> [a] -> [b]

A useful recursion pattern for processing a list to produce a new list, often used for "chopping" up the input list. Typically chop is called with some function that will consume an initial prefix of the list and produce a value and the rest of the list.

For example, many common Prelude functions can be implemented in terms of chop:

group :: (Eq a) => [a] -> [[a]]
group = chop (\ xs@(x:_) -> span (==x) xs)

words :: String -> [String]
words = filter (not . null) . chop (break isSpace . dropWhile isSpace)
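The same pattern extends to functions whose results are not sublists. As a sketch, here is a run-length encoder built on chop; runLengths is a hypothetical name, and the comment about empty input assumes that chop stops as soon as the remaining list is empty, which matches the recursion pattern described above.

```haskell
import Data.List.Split (chop)

-- Hypothetical example: run-length encode a list with chop.
-- Each step consumes one run of equal elements and returns
-- (element, run length) together with the unconsumed rest.
runLengths :: Eq a => [a] -> [(a, Int)]
runLengths = chop step
  where
    step xs@(x:_) =
      let (run, rest) = span (== x) xs
      in ((x, length run), rest)
    -- Assumed unreachable: chop is expected to stop on an empty list.
    step [] = error "runLengths: unexpected empty input to step"
```

For example, runLengths "aaabccdd" evaluates to [('a',3),('b',1),('c',2),('d',2)].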

divvy :: Int -> Int -> [a] -> [[a]]

Divides up an input list into a set of sublists, according to the n and m arguments you provide. Each sublist will have n items, and the start of each sublist will be offset by m items from the previous one.

>>> divvy 5 5 [1..15]
[[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15]]
>>> divvy 5 2 [1..15]
[[1,2,3,4,5],[3,4,5,6,7],[5,6,7,8,9],[7,8,9,10,11],[9,10,11,12,13],[11,12,13,14,15]]

In the case where a source list's trailing elements do not fill an entire sublist, those trailing elements will be dropped.

>>> divvy 5 2 [1..10]
[[1,2,3,4,5],[3,4,5,6,7],[5,6,7,8,9]]

As an example, you can generate a moving average over a list of prices:

type Prices = [Float]
type AveragePrices = [Float]

average :: [Float] -> Float
average xs = sum xs / (fromIntegral $ length xs)

simpleMovingAverage :: Prices -> AveragePrices
simpleMovingAverage = map average . divvy 20 1
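To try the moving average on a small input, the window size can be made a parameter. This is a sketch: movingAverage is a hypothetical generalisation of simpleMovingAverage, and average is repeated so that the block stands alone.

```haskell
import Data.List.Split (divvy)

-- Mean of a non-empty list (same definition as in the example above).
average :: [Float] -> Float
average xs = sum xs / fromIntegral (length xs)

-- Hypothetical generalisation: moving average over a caller-chosen window,
-- advancing one element at a time.
movingAverage :: Int -> [Float] -> [Float]
movingAverage window = map average . divvy window 1
```

movingAverage 3 [1,2,3,4,5] averages the windows [1,2,3], [2,3,4] and [3,4,5], giving [2.0,3.0,4.0]. Because divvy drops incomplete trailing windows, an input shorter than the window produces an empty result, and average is never applied to an empty list.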

Splitting combinators

The core of the library is the Splitter type, which represents a particular list-splitting strategy. All of the combinators revolve around constructing or transforming Splitter objects; once a suitable Splitter has been created, it can be run with the split function. For example:

>>> split (dropBlanks . condense $ whenElt (<0)) [1,2,4,-5,-6,4,9,-19,-30]
[[1,2,4],[-5,-6],[4,9],[-19,-30]]

data Splitter a

A splitting strategy.

defaultSplitter :: Splitter a

The default splitting strategy: keep delimiters in the output as separate chunks, don't condense multiple consecutive delimiters into one, keep initial and final blank chunks. Default delimiter is the constantly false predicate.

Note that defaultSplitter should normally not be used; use oneOf, onSublist, or whenElt instead, which are the same as the defaultSplitter with just the delimiter overridden.

The defaultSplitter strategy with any delimiter gives a maximally information-preserving splitting strategy, in the sense that (a) taking the concat of the output yields the original list, and (b) given only the output list, we can reconstruct a Splitter which would produce the same output list again given the original input list. This default strategy can be overridden to allow discarding various sorts of information.
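Property (a) can be illustrated with a short sketch. The names preservesInput and lossy are hypothetical and exist only for this example.

```haskell
import Data.List.Split (Splitter, split, dropDelims, oneOf)

-- Hypothetical helper: does concatenating the chunks give back the input?
preservesInput :: Eq a => Splitter a -> [a] -> Bool
preservesInput strategy xs = concat (split strategy xs) == xs

-- Hypothetical contrasting strategy: dropping delimiters discards
-- information, so the input can no longer be recovered by concat.
lossy :: Splitter Char
lossy = dropDelims (oneOf ",")
```

preservesInput (oneOf ",;") "hi;there,,world;" is True, since the basic strategy keeps delimiters and blank chunks. preservesInput lossy "a,b" is False, because the chunks concatenate to "ab".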

split :: Splitter a -> [a] -> [[a]]

Split a list according to the given splitting strategy. This is how to "run" a Splitter that has been built using the other combinators.
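As a sketch of building and running a strategy, the block below defines a whitespace field splitter and re-derives splitOn from combinators. The names fieldSplitter, fields and splitOn' are hypothetical; splitOn' follows the equivalence given in the documentation for splitOn.

```haskell
import Data.List.Split (Splitter, split, dropDelims, condense, oneOf, onSublist)

-- Hypothetical strategy: fields separated by spaces or tabs, where a run
-- of separators counts as one delimiter and delimiters are not kept.
fieldSplitter :: Splitter Char
fieldSplitter = dropDelims . condense $ oneOf " \t"

-- Run the strategy with split.
fields :: String -> [String]
fields = split fieldSplitter

-- splitOn written out with combinators, per its documented equivalence.
splitOn' :: Eq a => [a] -> [a] -> [[a]]
splitOn' delim = split (dropDelims $ onSublist delim)
```

fields "a  b\tc" gives ["a","b","c"]. A leading separator still produces an initial blank chunk, so fields " a b" gives ["","a","b"]; composing dropInitBlank into the strategy would suppress it.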

Basic strategies

All these basic strategies have the same parameters as the defaultSplitter except for the delimiter.

oneOf :: Eq a => [a] -> Splitter a

A splitting strategy that splits on any one of the given elements.

>>> split (oneOf ",;") "hi;there,world"
["hi",";","there",",","world"]
>>> split (oneOf "xyz") "aazbxyzcxd"
["aa","z","b","x","","y","","z","c","x","d"]

onSublist :: Eq a => [a] -> Splitter a

A splitting strategy that splits on the given list, when it is encountered as an exact subsequence.

>>> split (onSublist "xyz") "aazbxyzcxd"
["aazb","xyz","cxd"]

Note that splitting on the empty list is a special case, which splits just before every element of the list being split.

>>> split (onSublist "") "abc"
["","","a","","b","","c"]
>>> split (dropDelims . dropBlanks $ onSublist "") "abc"
["a","b","c"]

However, if you want to break a list into singleton elements like this, you are better off using chunksOf 1, or better yet, map (:[]).
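The three ways of producing singleton lists can be placed side by side. This is a sketch with hypothetical names; note that only the split-based version needs an Eq constraint, because onSublist requires one.

```haskell
import Data.List.Split (chunksOf, split, dropDelims, dropBlanks, onSublist)

-- Hypothetical: singletons via splitting on the empty sublist.
singletonsViaSplit :: Eq a => [a] -> [[a]]
singletonsViaSplit = split (dropDelims . dropBlanks $ onSublist [])

-- Hypothetical: singletons via fixed-size chunks.
singletonsViaChunks :: [a] -> [[a]]
singletonsViaChunks = chunksOf 1

-- Hypothetical: singletons via plain map, with no library needed.
singletonsViaMap :: [a] -> [[a]]
singletonsViaMap = map (: [])
```

All three return ["a","b","c"] for the input "abc".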

whenElt :: (a -> Bool) -> Splitter a

A splitting strategy that splits on any elements that satisfy the given predicate.

>>> split (whenElt (<0)) [2,4,-3,6,-9,1 :: Int]
[[2,4],[-3],[6],[-9],[1]]

Strategy transformers

Functions for altering splitting strategy parameters.

dropDelims :: Splitter a -> Splitter a

Drop delimiters from the output (the default is to keep them).

>>> split (oneOf ":") "a:b:c"
["a",":","b",":","c"]
>>> split (dropDelims $ oneOf ":") "a:b:c"
["a","b","c"]

keepDelimsL :: Splitter a -> Splitter a

Keep delimiters in the output by prepending them to adjacent chunks.

>>> split (keepDelimsL $ oneOf "xyz") "aazbxyzcxd"
["aa","zb","x","y","zc","xd"]

keepDelimsR :: Splitter a -> Splitter a

Keep delimiters in the output by appending them to adjacent chunks.

>>> split (keepDelimsR $ oneOf "xyz") "aazbxyzcxd"
["aaz","bx","y","z","cx","d"]

condense :: Splitter a -> Splitter a

Condense multiple consecutive delimiters into one.

>>> split (condense $ oneOf "xyz") "aazbxyzcxd"
["aa","z","b","xyz","c","x","d"]
>>> split (dropDelims $ oneOf "xyz") "aazbxyzcxd"
["aa","b","","","c","d"]
>>> split (condense . dropDelims $ oneOf "xyz") "aazbxyzcxd"
["aa","b","c","d"]

dropInitBlank :: Splitter a -> Splitter a

Don't generate a blank chunk if there is a delimiter at the beginning.

>>> split (oneOf ":") ":a:b"
["",":","a",":","b"]
>>> split (dropInitBlank $ oneOf ":") ":a:b"
[":","a",":","b"]

dropFinalBlank :: Splitter a -> Splitter a

Don't generate a blank chunk if there is a delimiter at the end.

>>> split (oneOf ":") "a:b:"
["a",":","b",":",""]
>>> split (dropFinalBlank $ oneOf ":") "a:b:"
["a",":","b",":"]

dropInnerBlanks :: Splitter a -> Splitter a

Don't generate blank chunks between consecutive delimiters.

>>> split (oneOf ":") "::b:::a"
["",":","",":","b",":","",":","",":","a"]
>>> split (dropInnerBlanks $ oneOf ":") "::b:::a"
["",":",":","b",":",":",":","a"]

mapSplitter :: (b -> a) -> Splitter a -> Splitter b

Split over a different type of element by performing a preprocessing step.

>>> split (mapSplitter snd $ oneOf "-_") $ zip [0..] "a-bc_d"
[[(0,'a')],[(1,'-')],[(2,'b'),(3,'c')],[(4,'_')],[(5,'d')]]
>>> import Data.Char (toLower)
>>> split (mapSplitter toLower $ dropDelims $ whenElt (== 'x')) "abXcxd"
["ab","c","d"]

Derived combinators

Combinators which can be defined in terms of other combinators, but are provided for convenience.

dropBlanks :: Splitter a -> Splitter a

Drop all blank chunks from the output, and condense consecutive delimiters into one. Equivalent to dropInitBlank . dropFinalBlank . condense.

>>> split (oneOf ":") "::b:::a"
["",":","",":","b",":","",":","",":","a"]
>>> split (dropBlanks $ oneOf ":") "::b:::a"
["::","b",":::","a"]
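The stated equivalence for dropBlanks can be written out and compared against the library function. This is a sketch; dropBlanks' is a hypothetical name for the spelled-out composition.

```haskell
import Data.List.Split
  (Splitter, split, oneOf, dropBlanks, dropInitBlank, dropFinalBlank, condense)

-- Hypothetical: dropBlanks spelled out from its documented equivalence.
-- condense merges adjacent delimiters, so no inner blank chunks arise;
-- the other two transformers remove blanks at either end.
dropBlanks' :: Splitter a -> Splitter a
dropBlanks' = dropInitBlank . dropFinalBlank . condense

-- Hypothetical helper: do the two strategies agree on this input?
agreesWithDropBlanks :: String -> Bool
agreesWithDropBlanks s =
  split (dropBlanks' $ oneOf ":") s == split (dropBlanks $ oneOf ":") s
```

On the input "::b:::a" both strategies produce ["::","b",":::","a"].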

startsWith :: Eq a => [a] -> Splitter a

Make a strategy that splits a list into chunks that all start with the given subsequence (except possibly the first). Equivalent to dropInitBlank . keepDelimsL . onSublist.

>>> split (startsWith "app") "applyapplicativeapplaudapproachapple"
["apply","applicative","applaud","approach","apple"]

startsWithOneOf :: Eq a => [a] -> Splitter a

Make a strategy that splits a list into chunks that all start with one of the given elements (except possibly the first). Equivalent to dropInitBlank . keepDelimsL . oneOf. For example:

>>> split (startsWithOneOf ['A'..'Z']) "ACamelCaseIdentifier"
["A","Camel","Case","Identifier"]

endsWith :: Eq a => [a] -> Splitter a

Make a strategy that splits a list into chunks that all end with the given subsequence, except possibly the last. Equivalent to dropFinalBlank . keepDelimsR . onSublist.

>>> split (endsWith "ly") "happilyslowlygnarlylily"
["happily","slowly","gnarly","lily"]

endsWithOneOf :: Eq a => [a] -> Splitter a

Make a strategy that splits a list into chunks that all end with one of the given elements, except possibly the last. Equivalent to dropFinalBlank . keepDelimsR . oneOf.

>>> split (condense $ endsWithOneOf ".,?! ") "Hi, there!  How are you?"
["Hi, ","there!  ","How ","are ","you?"]