@Vermeille
Created December 24, 2016 10:49
A Good, Idiomatic, STL-like split algorithm?
#include <algorithm>
#include <iostream>
#include <string>
#include <tuple>
#include <utility>
/*
This split implementation has a nice prototype and isn't callback /
continuation oriented. However, it has two major drawbacks:
1) Iterating over it is a bit painful. See main() below.
2) If the first split-range is empty, it's skipped, whereas later empty
   ranges are not. In main() below, the empty range where "e" would be
   (in "d,,f") produces an iteration, while the one where "g" would be
   (in ",h,i") doesn't. That's not consistent, and I see no way to fix this
   with this interface.
*/
template <class FwdIt, class Separator>
std::pair<FwdIt, FwdIt> split(FwdIt b, FwdIt e, const Separator& sep) {
    if (b == e) {
        return std::make_pair(e, e);
    }
    if (*b == sep) {
        ++b;
    }
    return std::make_pair(b, std::find(b, e, sep));
}

int main() {
    std::string csv =
        "a,b,c\n"
        "d,,f\n"
        ",h,i\n";
    auto line = split(csv.begin(), csv.end(), '\n');
    while (line.first != csv.end()) {
        std::cout << "line: " << std::string(line.first, line.second) << "\n";
        auto val = split(line.first, line.second, ',');
        while (val.first != line.second) {
            std::cout << " val: " << std::string(val.first, val.second)
                      << "\n";
            val = split(val.second, line.second, ',');
        }
        line = split(line.second, csv.end(), '\n');
    }
}
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <utility>
/*
This implementation is consistent and works. However:
1) Continuation-passing style (CPS) isn't very STL-idiomatic.
2) It also forces you to visit every split. You can certainly play with the
   b/e iterators to discard some, but that sucks.
*/
template <class FwdIt, class T, class F>
void split(FwdIt b, FwdIt e, const T& x, F&& f) {
    while (true) {
        auto found = std::find(b, e, x);
        f(b, found);
        if (found == e) {
            return;
        }
        // std::next works for any forward iterator; found + 1 would need
        // random access.
        b = std::next(found);
    }
}

int main() {
    std::string csv =
        "a,b,c\n"
        "d,,f\n"
        ",h,i\n";
    split(csv.begin(), csv.end(), '\n', [&](auto lb, auto le) {
        std::cout << "line: " << std::string(lb, le) << "\n";
        split(lb, le, ',', [&](auto tb, auto te) {
            std::cout << " val: " << std::string(tb, te) << "\n";
        });
    });
}
@joboccara

Hi Guillaume,

What do you think of the split view adaptor in range-v3 to solve this problem?
std::string input = "split me into words";
auto splitRange = view::split(input, [](char c){ return c == ' '; });
which produces a view on the initial range over which you can iterate.
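
To make the "view" idea concrete without depending on range-v3, here is a minimal standard-library-only sketch of a lazy split range. It is not how range-v3 implements its adaptor, and it splits on a separator value rather than a predicate; the names `split_range`, `make_split` and `pieces` are hypothetical. Each element is a pair of iterators into the original range, and an explicit "done" flag lets empty pieces be reported wherever they occur, including the first one.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not range-v3's implementation.
template <class FwdIt, class T>
class split_range {
public:
    class iterator {
    public:
        iterator(FwdIt b, FwdIt e, const T* sep, bool done)
            : b_(b), e_(e), found_(done ? e : std::find(b, e, *sep)),
              sep_(sep), done_(done) {}

        // The current piece, as a half-open range into the original input.
        std::pair<FwdIt, FwdIt> operator*() const { return {b_, found_}; }

        iterator& operator++() {
            if (found_ == e_) {
                // The piece just visited was the last one.
                b_ = e_;
                done_ = true;
            } else {
                // Step over the separator and locate the next one.
                b_ = std::next(found_);
                found_ = std::find(b_, e_, *sep_);
            }
            return *this;
        }

        // done_ distinguishes "a trailing empty piece at e" from "past the end".
        bool operator!=(const iterator& o) const {
            return done_ != o.done_ || b_ != o.b_;
        }

    private:
        FwdIt b_, e_, found_;
        const T* sep_;
        bool done_;
    };

    split_range(FwdIt b, FwdIt e, T sep) : b_(b), e_(e), sep_(std::move(sep)) {}
    iterator begin() const { return iterator(b_, e_, &sep_, false); }
    iterator end() const { return iterator(e_, e_, &sep_, true); }

private:
    FwdIt b_, e_;
    T sep_;
};

template <class FwdIt, class T>
split_range<FwdIt, T> make_split(FwdIt b, FwdIt e, T sep) {
    return split_range<FwdIt, T>(b, e, std::move(sep));
}

// Convenience helper: copy every piece of s into its own string.
std::vector<std::string> pieces(const std::string& s, char sep) {
    std::vector<std::string> out;
    for (auto piece : make_split(s.begin(), s.end(), sep)) {
        out.emplace_back(piece.first, piece.second);
    }
    return out;
}
```

With this shape the caller can stop iterating whenever it wants, and `pieces(",h,i", ',')` yields an empty first piece followed by `h` and `i`, the same way the callback version reports it.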

If you want to keep the results and destroy the initial string, you can copy splitRange, or alternatively use Boost's split:
std::string input = "split me into words";
std::vector<std::string> results;
boost::split(results, input, [](char c){ return c == ' '; });

Although I think the clarity of Boost's interface could be improved by returning the results by value:
std::string input = "split me into words";
std::vector<std::string> results = split(input, [](char c){ return c == ' '; });
which shouldn't impact performance thanks to move semantics.
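
For illustration, a by-value split along those lines could look roughly like the sketch below. It uses only the standard library, and the name `split_copy` is hypothetical (it is not a Boost or range-v3 function). The local vector is returned by value, so it is moved or elided rather than copied.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: splits input wherever is_sep(c) is true and returns
// the pieces by value. Empty pieces are kept.
template <class Pred>
std::vector<std::string> split_copy(const std::string& input, Pred is_sep) {
    std::vector<std::string> results;
    auto b = input.begin();
    const auto e = input.end();
    while (true) {
        auto found = std::find_if(b, e, is_sep);
        results.emplace_back(b, found);  // copy the piece [b, found)
        if (found == e) {
            break;
        }
        b = found + 1;  // std::string iterators are random access
    }
    return results;  // moved or elided, not deep-copied
}
```

Calling `split_copy(input, [](char c){ return c == ' '; })` on "split me into words" gives the four words as separate strings.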

References:
split view adaptor in range-v3:
https://github.com/ericniebler/range-v3/blob/ca997df10962c482274e6be37fdbe39add8664c9/test/view/split.cpp
Boost split:
http://www.boost.org/doc/libs/1_57_0/doc/html/string_algo/usage.html#idp430824992
