Skip to content

@dcsobral /gist:1112401

Embed URL


Subversion checkout URL

You can clone with
Download ZIP
Faster char split
// Since this link got passed around some, I'm going to paste here the
// versions made by sirthias. If you want to see the old ones,
// look up the history.
// Well, actually, these are non-pimped versions. See the comments to
// a link where they are provided as pimped methods on String.
import scala.annotation.tailrec
def fastSplit(s: String, delimiter: Char): List[String] = {
def split(end: Int, elements: List[String]): List[String] = {
val ix = s.lastIndexOf(delimiter, end - 1)
if (ix < 0)
s.substring(0, end) :: elements
split(ix, s.substring(ix + 1, end) :: elements)
split(s.length, Nil)
// And this one produces a Stream, so you don't need to split any more segments than you need.
def lazySplit(s: String, delimiter: Char): Stream[String] = { // based on an implemented by Jed Wesley-Smith
def split(start: Int): Stream[String] = {
val ix = s.indexOf(delimiter, start)
if (ix < 0)
Stream.cons(s.substring(start), Stream.Empty)
Stream.cons(s.substring(start, ix), split(ix + 1))
// This produces an Iterator, which is faster than Stream
def splitIterator(s: String, delimiter: Char): Iterator[String] = new Iterator[String] {
private var hasNextFlag = true
var start = 0
def hasNext = hasNextFlag
def next =
if (hasNext) {
val ix = s.indexOf(delimiter, start)
val result = if (ix < 0) {
hasNextFlag = false
} else s.substring(start, ix)
start = ix + 1
} else throw new java.util.NoSuchElementException("next on empty iterator")

Daniel, why not a Stream?

def split(s: String, c: Char): Stream[String] = {
def stream(i: Int): Stream[String] = {
val p = s indexOf (c, i)
if (p < 0) Stream.cons(s.substring(i), Stream.Empty)
else Stream.cons(s.substring(i, p), { println(p); stream(p + 1) })


You know you can fork the gist and then produce your version?

Great idea, actually! Once I replaced a split in Perl with a regex match, and reduced an execution time from 30 minutes to 7 seconds, precisely because I did not need to split the whole thing, just the first 100 or so fields.


didn't realise about the forking actually, nice!

yeah, this one can have nicer memory usage particularly if you don't iterate across all elements


You can make the eager version of this even cleaner and more efficient by searching for the delimiter from the back of the string rather than the front.
I just committed a polished version of these two implementations (with added tests) to spray:


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.