| void RemoveWhitespace(cString &szString) | |
| { | |
| // Remove leading whitespace | |
| size_t nFirstIndex = szString.find_first_not_of(_L(' ')); | |
| if(nFirstIndex != cString::npos) | |
| { | |
| szString = szString.substr(nFirstIndex); | |
| } | |
| // Remove trailing newlines | |
| size_t nLastIndex = szString.find_last_not_of(_L('\n')); | |
| while(nLastIndex != szString.length()-1) | |
| { | |
| szString.erase(nLastIndex+1,1); | |
| nLastIndex = szString.find_last_not_of(_L('\n')); | |
| }; | |
| // Tabs | |
| nLastIndex = szString.find_last_not_of(_L('\t')); | |
| while(nLastIndex != szString.length()-1) | |
| { | |
| szString.erase(nLastIndex+1,1); | |
| nLastIndex = szString.find_last_not_of(_L('\t')); | |
| }; | |
| // Spaces | |
| nLastIndex = szString.find_last_not_of(_L(' ')); | |
| while(nLastIndex != szString.length()-1) | |
| { | |
| szString.erase(nLastIndex+1,1); | |
| nLastIndex = szString.find_last_not_of(_L(' ')); | |
| }; | |
| } |
For what it's worth, this code is used to read INI-like text files, and what I actually did to fix it was to stop copying strings around. My replacement function just takes a pair of iterators ("start" inclusive, "end" exclusive) and modifies it to strip whitespace:
static bool iswhite(int ch)
{
return ch == _L(' ') || ch == _L('\t') || ch == _L('\n');
}
template<typename T>
static void RemoveWhitespace(T& start, T& end)
{
while (start < end && iswhite(*start))
++start;
while (end > start && iswhite(*(end - 1)))
--end;
}
The code still eventually copies the extracted values into strings, but there's no reason to be creating tons of temporary strings in the meantime. For what it's worth, this is from a program that spent a significant amount of its (loading) time in RemoveWhitespace and its immediate callers, doing extremely inefficient text file parsing.
It's probably not the topic, but C++ has a very elegant way of solving this problem. Here's how simple it is:
bool iswhite(int ch)
{
return ch == ' ' || ch == '\t' || ch == '\n';
}
void RemoveWhitespace(std::string &str)
{
str.erase(std::remove_if(str.begin(), str.end(), iswhite),
str.end());
}
@dariomanesku: Your code removes all whitespace, not just leading and trailing.
@Poita: Yes, that's exactly what it does, isn't that what's necessary? If not, I'm sorry, I haven't quite taken my time to look carefully at the code, I was just passing by. Also, then, the function name could be a little different.
What about all the other whitespace characters? This targets only a rather small subset. :)
Ok, I cannot take it anymore. must not correct* ok I'll do it.... That code is horrible. Lots of veterans are laughing in horror, but unfortunately laughing won't make junior coders 'get it', so I'll proceed to explain. This comment aims to those young ones. Hopefully it will help someone:
First, erase has up to O( N ) complexity because it has to move everything that is after nLastIndex one place, because of that removal.
Worst case scenario (i.e. a string filled with tabs), this bit of code would need to shift N characters, then shift N-1 characters on the next iteration, then N-2, then N-3 and so forth.
The worst case scenario can be summarized as O( N * (N+1) / 2 ) which is very bad.
Add to this that the code tries to do it three times (newlines, tabs & spaces), you can see why it gets awful.
If the particular template implementation happens to be that after too many erase() calls it tries to "shrink to fit", it will have to reallocate memory, which will only trash memory and waste cycles here.
Second, find_last_not_of is a very useful function, but it's general use case is not optimized for this kind of abuse (searching for just one character inside a loop!). It should not be used here, specially if it's performance critical code.
Third, depending on the template implementation, there's a 99% chance substr will return a copy. This means if the string " started containing spaces at the beginning", the amount of memory used will be doubled. Ouch.
The memory being used itself is not much of a problem (unless the string is very large) but rather the waste of memory: using more memory can pollute & trash the cpu caches, allocating & deallocating memory isn't free, and it certainly hurts in a multithreaded environment unless the memory allocator is grabbing memory from a thread-local pool.
The following snippet is possible, much more efficient version:
This snippet is O( N ) in all scenarios, and doesn't waste ram at all, not a single reallocation needed; unless erase triggers a realloc, which isn't bad anyway (because it will be an intended shrink to fit)
We erase up until the end, so it takes O( 1 ), because there is nothing that comes after to move.
Much better than O( N * (N+1) / 2 ) multiple times, plus memory & cache trashing.
There are possibly more approaches to optimize it further.
I hope this serves as an example.
rygorous, feel free to correct me wherever you believe I'm wrong.
I've tested this code with multiple strings (empty strings, with & w/out leading spaces, w/ & w/out trailing spaces; 5 types of strings total) it compiles & works
I'm releasing this snippet to the public domain (where public domain is void, just use it as you like, commercial or purpose or not, whatever.). Released "AS IS" with no guarantees of any kind.
Credit / attribution is appreciated but not required.
IMPORTANT: My implementation is NOT functionally equivalent, because in the original snippet the strings:
RemoveWhitespace( "test\n\t " ) will output "test\n\t"
RemoveWhitespace( "test \t\n " ) will output "test"
while my snippet in all cases the result will be "test", which I presume was the intended behavior
Cheers
Matias Goldberg