[Disclaimer : All benchmarks were done in debug mode on an i7 920 system with no compiler optimizations]

After browsing some cool C++ util functions today on StackOverflow, one of the cool util functions was for trimming strings, so I became intrigued and decided to compare them against my own, as they used some cool C++ std library tricks that I've never seen before. After doing benchmark tests, I was excited to find my own implementation appeared to blow his out of the water. After 1,000,000 tests, his averaged about 27 microseconds, while mine at first averaged less than 5 microseconds. After tweaking mine a bit, my trim_trailingWhitespace function averaged close to 4 microseconds after averaging 1,000,000 samples. Also to note, his function works directly on the original string, while mine returns a new copy; so it's even more impressive.

Here's another benchmark using 1,000,000 samples. Using the following string: " " His: 37.88 microseconds Mine: 4.03 microseconds

Benchmark 3 using the same # of samples as before. Using the following string: " The world ends with you! " His: 30.37 microseconds Mine: 6.26 microseconds

The reason for this post, is because I was excited to find out my implementation is extremely fast, and I felt like sharing it with the world to see if it's possible to do even better! I'm still thinking of ways to improve my functions, but please let me know if you can best me in any way, I'm here to learn!


The only header you should need is the following, with the proper using statements too of course:

#include <iostream>

using std::basic_string;

This is the trim function used by my trim_trailingWhitespace

template<typename CharType>
static basic_string<CharType> trim(const basic_string<CharType>& str, size_t index, size_t len)
{
    size_t length = str.size();
    if(index >= length)
    {
        return basic_string<CharType>();
    }

    length -= index;
    length = length > len ? len : length;

    return str.substr(index, length);
}

And this is my function to trim the whitespace at the end of the string:

template<typename CharType>
static basic_string<CharType> trim_trailingWhitespace(const basic_string<CharType>& str)
{
    size_t length = str.size();
    if(length == 0)
    {
        return str;
    }

    register size_t index = length - 1;

    while(' ' == str[index])
    {
        if(index != 0)
        {
            --index;
        }
        else
        {
            return basic_string<CharType>();
        }
    }

    return trim(str, 0, ++index);
}

You'll notice I'm using some funky template stuff here, and basic_string; this will allow you to use whatever kind of string encoding you want, 1Byte ASCII characters (string), or 2Byte unicode characters (wstring), or whatever. And the beauty of it is, if you pass in the proper string type, like string or wstring, you don't need to define the template parameters; your compiler will automatically fill them in for you so you only just need to call the function name without worrying about the template details!

I haven't gotten a chance to rewrite my trim_leadingWhitespace function, so that will be coming soon; and yes, I will rename these once I decide upon a better name.


After doing some further research I stumbled across this implementation. I realized my above comparison to the first algorithm wasn't entirely fair as the implementation wasn't even competitive, even though it worked in-place. So, here's the source for an actually efficient implementation of rear trimming white space:

void rtrim(string& str, const locale& loc)
{
  string::size_type pos = str.size();
  while (pos > 0 && isspace(str[pos - 1], loc)) pos--;
  str.erase(pos);
}

By replacing the isspace with ' ' == you can make this function much faster, though obviously this works with our locale, and may be different across other languages. Another possible reason this isn't as fast as it could be is because of the str[pos - 1], he's doing an extra calculation per loop besides the pos--. His solution gets around the problem of using an unsigned int (size_t) that mine originally had, but lucky for me my rewrite made mine even more efficient because of this trick he had to do.

And here's the details on isspace. I think my check works fine for both string and wstring and the like; if not, please let me know. [Edit] You could pass the locale space/seperator in as a parameter to the function if that's an issue.

As the above solution works in place, I decided to tweak my original function to work in place on a string, and am excited to say that it beats out this obviously improved implementation. Here's my new in-place implementation:

template<typename CharType>
static void trim_trailingWhitespace(basic_string<CharType>& str)
{
    size_t length = str.size();
    if(length == 0)
    {
        return;
    }

    register size_t index = length - 1;

    while(' ' == str[index])
    {
        if(index != 0)
        {
            --index;
        }
        else
        {
            str.clear();
            return;
        }
    }
    str.erase(++index);
}

So, my implementation provides shortcuts for an empty string, and for the case the entire string contains spaces (where you see str.clear()). It also provides an interesting way to loop through the string compared to the other implementation, allowing mine to be a bit quicker, probably because I only have to use str[index] (besides the above mentioned shortcuts).

Benchmark 1: String: " " His: 2.97 microseconds His + my tweak: 2.68 microseconds Mine:2.54 microseconds

Benchmark 2: String: " The world ends with you! " His: 3.93 microseconds His + my tweak: 3.68 microseconds Mine: 3.65 microseconds

As noted at the header, the above tests were done in debug mode with no optimizations. If you turn on optimizations you'll notice the two algorithms at the bottom here are neck and neck, it doesn't really matter what you use. Heck, if you look at the algorithm from StackOverflow that's not nearly as efficient in debug, suddenly becomes nearly comparable. Though, it's still nice to have a code that runs fast in debug too; but it does raise the question whether it was worth the time to do this research; as they say, don't optimize until you have to. It's been a nice educational endeavor so far, so I don't really care if it might be considered wasted time, as I've learned more than a thing or two. Thoughts?

And here's the link to the StackOverflow page of useful C++ utilities: Most Useful C++ Utils.

Comments

There are no comments yet.

Next Post Previous Post