Skip to content

Embed URL

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Faster char split
// Since this link got passed around some, I'm going to paste here the
// versions made by sirthias. If you want to see the old ones,
// look up the history.
// Well, actually, these are non-pimped versions. See the comments to
// a link where they are provided as pimped methods on String.
import scala.annotation.tailrec
def fastSplit(s: String, delimiter: Char): List[String] = {
@tailrec
def split(end: Int, elements: List[String]): List[String] = {
val ix = s.lastIndexOf(delimiter, end - 1)
if (ix < 0)
s.substring(0, end) :: elements
else
split(ix, s.substring(ix + 1, end) :: elements)
}
split(s.length, Nil)
}
// And this one produces a Stream, so you don't need to split any more segments than you need.
def lazySplit(s: String, delimiter: Char): Stream[String] = { // based on an implemented by Jed Wesley-Smith
def split(start: Int): Stream[String] = {
val ix = s.indexOf(delimiter, start)
if (ix < 0)
Stream.cons(s.substring(start), Stream.Empty)
else
Stream.cons(s.substring(start, ix), split(ix + 1))
}
split(0)
}
// This produces an Iterator, which is faster than Stream
def splitIterator(s: String, delimiter: Char): Iterator[String] = new Iterator[String] {
private var hasNextFlag = true
var start = 0
def hasNext = hasNextFlag
def next =
if (hasNext) {
val ix = s.indexOf(delimiter, start)
val result = if (ix < 0) {
hasNextFlag = false
s.substring(start)
} else s.substring(start, ix)
start = ix + 1
result
} else throw new java.util.NoSuchElementException("next on empty iterator")
}
@jedws

Daniel, why not a Stream?

def split(s: String, c: Char): Stream[String] = {
def stream(i: Int): Stream[String] = {
val p = s indexOf (c, i)
if (p < 0) Stream.cons(s.substring(i), Stream.Empty)
else Stream.cons(s.substring(i, p), { println(p); stream(p + 1) })
}
stream(0)
}

@dcsobral
Owner

You know you can fork the gist and then produce your version?

Great idea, actually! Once I replaced a split in Perl with a regex match, and reduced an execution time from 30 minutes to 7 seconds, precisely because I did not need to split the whole thing, just the first 100 or so fields.

@jedws

didn't realise about the forking actually, nice!

yeah, this one can have nicer memory usage particularly if you don't iterate across all elements

@sirthias

You can make the eager version of this even cleaner and more efficient by searching for the delimiter from the back of the string rather than the front.
I just committed a polished version of these two implementations (with added tests) to spray:

spray/spray@77b962a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.