Created
June 14, 2019 09:20
Demonstration of regex and string_view problems
/*
    Demonstration of regex and string_view problems

    $ g++ -std=c++17 -Wall -Wextra regex.cpp -o test
    $ ./test
    wrong()
    - fixed
    - fi
    - fixed
    correct()
    - kitten
    - 42
    - fixed

    The output of wrong() is the result of undefined behaviour, so it may
    differ between compilers, standard libraries, and runs.
*/

#include <cstddef>
#include <regex>
#include <string_view>
#include <iostream>
#include <vector>

std::regex re{"name=([a-z]+), number=([0-9]+), fixed=(fixed)"};
std::string_view input{"name=kitten, number=42, fixed=fixed"};

std::vector<std::string_view> wrong()
{
    std::vector<std::string_view> result;
    std::match_results<std::string_view::const_iterator> match;
    if (std::regex_match(input.cbegin(), input.cend(), match, re)) {
        for (std::size_t i=1; i < match.size(); i++)
            // Intentional bug: str() returns a temporary std::string, so the
            // string_view stored here dangles as soon as the statement ends.
            result.push_back(match[i].str());
    }
    return result;
}

std::vector<std::string_view> correct()
{
    std::vector<std::string_view> result;
    std::match_results<std::string_view::const_iterator> match;
    if (std::regex_match(input.cbegin(), input.cend(), match, re)) {
        for (std::size_t i=1; i < match.size(); i++) {
            // Relies on std::string_view::const_iterator being const char*,
            // which holds for libstdc++ but is implementation-defined.
            const char* first = match[i].first;
            const char* last = match[i].second;
            // The view points into `input`; no temporary string is created.
            result.push_back({first, static_cast<std::size_t>(last - first)});
        }
    }
    return result;
}

void dump(const std::vector<std::string_view>& v) {
    for (const auto& sv: v)
        std::cout << "- " << sv << '\n';
}

int main() {
    auto r1 = wrong();
    std::cout << "wrong()" << '\n';
    dump(r1);

    auto r2 = correct();
    std::cout << "correct()" << '\n';
    dump(r2);
}
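A note on why wrong() misbehaves: match[i].str() returns a temporary std::string, and pushing it into a vector of std::string_view stores a view of that temporary, which is destroyed at the end of the statement. correct() avoids the temporary but assigns the sub_match iterators to const char*, which is not guaranteed to compile with every standard library. Below is a minimal C++17 sketch of a variant that avoids both issues by slicing the original view with match.position(i) and match.length(i). The function name capture_views is hypothetical and not part of the gist.

```cpp
#include <cstddef>
#include <regex>
#include <string_view>
#include <vector>

// Hypothetical helper (not from the gist): returns views of all capture
// groups, or an empty vector when the whole input does not match.
// The returned views point into the buffer behind `input`, so they are
// valid only as long as that buffer is.
std::vector<std::string_view> capture_views(std::string_view input,
                                            const std::regex& re)
{
    std::vector<std::string_view> result;
    std::match_results<std::string_view::const_iterator> match;
    if (std::regex_match(input.cbegin(), input.cend(), match, re)) {
        for (std::size_t i = 1; i < match.size(); ++i) {
            if (!match[i].matched) {
                // Group did not participate in the match: store an empty view.
                result.emplace_back();
                continue;
            }
            // position() and length() are offsets relative to the start of
            // the searched range, so no iterator-to-pointer conversion is needed.
            const auto pos = static_cast<std::size_t>(match.position(i));
            const auto len = static_cast<std::size_t>(match.length(i));
            result.push_back(input.substr(pos, len));
        }
    }
    return result;
}
```

Called as capture_views(input, re) with the globals from the gist, this should yield the same three views as correct(), without depending on how the library defines string_view iterators.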
@mmatrosov Thanks for the C++20 pointer; I wasn't aware of it.
Glad you found this snippet helpful. I wasted a lot of time trying to figure out why a simple regex produced crazy results. :)
Thanks for the gist. The definitions of first and last are not necessary, and one can also use std::vector::emplace_back:
std::for_each(match.begin()+1, match.end(), [&result](const auto& m) {result.emplace_back(m.first, m.second);});
Or, if you prefer an explicit loop:
for (size_t i = 1; i < match.size(); ++i)
result.emplace_back(match[i].first, match[i].second);
@wich Thank you for the comment. Your version uses the C++20 constructor of string_view, which accepts iterators. The snippet was written in the good old days of C++17 :)
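To make the C++17/C++20 difference concrete, here is a sketch of a small conversion helper. The names to_view and SvSubMatch are hypothetical and do not appear in the gist. Under C++20 it uses the iterator-pair constructor of std::string_view mentioned above; otherwise it falls back to a pointer-and-length construction, assuming only that string_view iterators are contiguous.

```cpp
#include <cstddef>
#include <regex>
#include <string_view>

// Hypothetical alias for the sub_match type produced when matching
// over a std::string_view.
using SvSubMatch = std::sub_match<std::string_view::const_iterator>;

// Hypothetical helper: converts one sub_match into a view of the original
// input without creating a temporary std::string.
std::string_view to_view(const SvSubMatch& m)
{
    // Unmatched or empty groups become an empty view; this also avoids
    // dereferencing a possibly past-the-end iterator in the fallback below.
    if (!m.matched || m.first == m.second)
        return {};
#if __cplusplus >= 202002L
    // C++20: string_view can be constructed from a pair of contiguous iterators.
    return std::string_view(m.first, m.second);
#else
    // C++17: take the address of the first character and pass the length.
    return std::string_view(&*m.first, static_cast<std::size_t>(m.length()));
#endif
}
```

With such a helper, the loop body reduces to result.push_back(to_view(match[i])) and compiles under either standard.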
Thanks for this useful example! Note that in C++20 you can also use this simpler correct form: