Saturday, March 15, 2008

How to: Optimize string comparisons

While researching on the web, I found some useful tips on comparing strings efficiently in the .NET Framework.

Equals() vs ==

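strA.Equals(strB) may be marginally faster than strA == strB, because == is an overloaded operator that forwards the call to the static String.Equals(string, string) method. In practice the difference is negligible, and == has one advantage: it handles a null strA, whereas strA.Equals(strB) throws a NullReferenceException. According to the class library design guidelines, an overloaded == operator should behave the same as Equals().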

Both perform an ordinal (case-sensitive, culture-insensitive) comparison, equivalent to:

C# .NET

String.Equals(string1, string2, StringComparison.Ordinal);


and both start with a reference comparison.
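A minimal sketch of this behaviour, using a hypothetical helper class EqualityDemo written only for illustration. For non-null strings all three forms give the same ordinal result; only the instance call fails when the left-hand string is null.

```csharp
using System;

// Hypothetical helper class: illustrates the equality forms discussed above.
public static class EqualityDemo
{
    // == forwards to the static String.Equals(string, string): ordinal and null-safe.
    public static bool WithOperator(string strA, string strB)
    {
        return strA == strB;
    }

    // Instance Equals is also ordinal, but throws NullReferenceException if strA is null.
    public static bool WithInstanceEquals(string strA, string strB)
    {
        return strA.Equals(strB);
    }

    // The explicit form that both of the above are equivalent to.
    public static bool WithExplicitOrdinal(string strA, string strB)
    {
        return String.Equals(strA, strB, StringComparison.Ordinal);
    }
}
```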

String.Empty vs “”

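String.Empty is often recommended over "" on the grounds that "" creates a new string object instance. It does not: string literals are interned by the runtime, so "" refers to a shared instance rather than allocating a new string at each use, and the differences between the four forms below are marginal at most. What does differ is how they treat a null strA.

C# .NET

strA.Equals(String.Empty);   // throws if strA is null
strA == String.Empty;        // null-safe
strA.Equals("");             // throws if strA is null
strA == "";                  // null-safe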
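When the goal is only to test whether a string is empty, the comparison can be skipped altogether. The sketch below uses a hypothetical class EmptyCheckDemo: checking Length == 0 throws for null, just like strA.Equals(String.Empty), while String.IsNullOrEmpty() treats null and "" alike.

```csharp
using System;

// Hypothetical helper class: emptiness checks that avoid a string comparison.
public static class EmptyCheckDemo
{
    // Reads the length only; throws NullReferenceException if s is null.
    public static bool IsEmptyByLength(string s)
    {
        return s.Length == 0;
    }

    // Returns true for both null and ""; never throws.
    public static bool IsNullOrEmptyCheck(string s)
    {
        return String.IsNullOrEmpty(s);
    }
}
```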


StringComparison enum

This enumeration is new in the .NET Framework 2.0.

C# .NET

StringComparison.Ordinal;                       // fastest
StringComparison.OrdinalIgnoreCase;             // fast
StringComparison.InvariantCulture;              // slow
StringComparison.InvariantCultureIgnoreCase;    // slow
StringComparison.CurrentCulture;                // slow
StringComparison.CurrentCultureIgnoreCase;      // slow

Ordinal is the fastest option because it compares the numeric values of the chars directly. OrdinalIgnoreCase adds a simple case mapping and is somewhat slower, but both are considerably faster than the culture-sensitive options, which apply linguistic rules.
CurrentCulture-based and InvariantCulture-based comparisons have almost the same performance.

Use StringComparison.CurrentCulture-based string operations when displaying the output to the user.

Don’t use StringComparison.InvariantCulture-based string operations in most cases; one of the few exceptions is persisting data that is linguistically meaningful but culturally agnostic. Elsewhere, prefer Ordinal-based comparisons to InvariantCulture-based ones.

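Avoid overloads of string operations that don't explicitly specify the comparison rules (for example String.Compare(string, string) or String.StartsWith(string)); call the overloads that take a StringComparison argument instead.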
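The guidelines above can be sketched as follows, with every call naming its comparison rule explicitly. The class ExplicitComparisonDemo and the file-extension scenario are hypothetical examples, not part of the original tips.

```csharp
using System;

// Hypothetical helper class: each call states its StringComparison explicitly.
public static class ExplicitComparisonDemo
{
    // Non-linguistic identifiers such as file extensions: ordinal, case-insensitive.
    public static bool HasExtension(string fileName, string extension)
    {
        return fileName.EndsWith(extension, StringComparison.OrdinalIgnoreCase);
    }

    // Exact, case-sensitive match of an internal key: ordinal.
    public static bool IsSameKey(string a, string b)
    {
        return String.Equals(a, b, StringComparison.Ordinal);
    }

    // Ordering of text shown to the user: culture-sensitive.
    public static int CompareForDisplay(string a, string b)
    {
        return String.Compare(a, b, StringComparison.CurrentCulture);
    }
}
```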

ToUpperInvariant vs ToLowerInvariant

These methods are new in the .NET Framework 2.0.

Use ToUpperInvariant rather than ToLowerInvariant when normalizing strings for comparison. It is often described as faster, but the reason Microsoft's code analysis rule CA1308 gives is correctness: a small group of characters cannot make a round trip when converted to lowercase.

Better still, use comparison methods that take a StringComparison argument instead of comparing ToUpper()/ToLower() results: the case-conversion methods create new string instances, which costs both execution time and memory.
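A minimal sketch contrasting the two approaches to a case-insensitive equality test, using a hypothetical class CaseInsensitiveDemo. For ordinary ASCII text both return the same result, but the first one allocates two uppercase copies before comparing.

```csharp
using System;

// Hypothetical helper class: two ways to test equality while ignoring case.
public static class CaseInsensitiveDemo
{
    // Creates two new uppercase strings, then compares them ordinally.
    public static bool EqualsViaUpper(string a, string b)
    {
        return a.ToUpperInvariant() == b.ToUpperInvariant();
    }

    // Compares directly with ordinal, case-insensitive rules; no uppercase copies are made.
    public static bool EqualsIgnoreCase(string a, string b)
    {
        return String.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```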

Switch blocks

C# .NET

switch (myString)
{
    case "StringA":
        // ...
        break;
    case "StringB":
        // ...
        break;
    default:
        // ...
        break;
}


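In a switch block on a string, each case label is matched with an ordinal, case-sensitive equality check. For a few cases the compiler simply emits calls to the overloaded == operator; for larger switches it may first narrow the candidates with a lookup table or a hash. Either way, the final match uses the same comparison as String.Equals(string, string), which is equivalent to: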

C# .NET

String.Equals(string1, string2, StringComparison.Ordinal);
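A runnable variant of the switch above, using a hypothetical class SwitchDemo whose return values are illustrative stand-ins for the case bodies. It shows that matching is ordinal and case-sensitive ("stringa" falls through to default) and that a null string also goes to default rather than throwing.

```csharp
// Hypothetical helper class: a string switch with illustrative return values.
public static class SwitchDemo
{
    public static int Classify(string myString)
    {
        // Each label is compared ordinally and case-sensitively.
        switch (myString)
        {
            case "StringA":
                return 1;
            case "StringB":
                return 2;
            default:
                // Also reached for null and for differently-cased input.
                return 0;
        }
    }
}
```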


Interned strings

String comparisons based on Equals() first check whether both references point to the same object and whether the two strings have the same length. Only when two distinct strings have equal length are their chars compared one by one; as soon as a differing char is found, Equals() returns false and skips the remaining chars.
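A simplified sketch of the steps just described. The class OrdinalEqualsSketch is hypothetical: it illustrates the logic only and is not the actual .NET Framework source, which is optimized differently.

```csharp
// Hypothetical sketch of ordinal string equality; not the real BCL implementation.
public static class OrdinalEqualsSketch
{
    public static bool AreEqual(string a, string b)
    {
        // 1. Same object (for example two references to one interned literal): equal.
        if (object.ReferenceEquals(a, b))
            return true;

        // 2. Exactly one side is null: not equal.
        if ((object)a == null || (object)b == null)
            return false;

        // 3. Different lengths: not equal, without examining any char.
        if (a.Length != b.Length)
            return false;

        // 4. Same length: compare char by char, stopping at the first mismatch.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}
```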

As a result, comparing many long strings of equal length that differ only near their ends becomes slow, because almost every char has to be examined. To optimize this scenario, we can use the string interning mechanism.

To do this, convert code like this:

C# .NET

if (myString == "VERY LONG STRING #1")
{
    // ...
}
else if (myString == "VERY LONG STRING #2")
{
    // ...
}
else if (myString == "VERY LONG STRING #3")
{
    // ...
}
// ...
else
{
    // ...
}


into something like this:

C# .NET

if (Object.ReferenceEquals(myString, "VERY LONG STRING #1"))
{
    // ...
}
else if (Object.ReferenceEquals(myString, "VERY LONG STRING #2"))
{
    // ...
}
else if (Object.ReferenceEquals(myString, "VERY LONG STRING #3"))
{
    // ...
}
// ...
else
{
    // ...
}


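This uses only a reference comparison (a memory address comparison), which is much faster. The catch is that myString must itself be an interned string. String literals such as "VERY LONG STRING #1" are interned automatically, but a string built at run time (read from input, concatenated, and so on) is a separate instance, so Object.ReferenceEquals() returns false for it even when the chars match. Such a string has to be passed through String.Intern() first, and String.Intern() itself has to look the string up by value, so the technique pays off mainly when a string is interned once and then compared many times.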
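A minimal sketch of the pattern, assuming each incoming string is interned once (hypothetical method Canonicalize) and then compared by reference as often as needed. The class InterningDemo and its members are hypothetical. The sketch also reproduces the catch: an equal string built at run time is not the same instance as the literal until it has been interned.

```csharp
using System;
using System.Text;

// Hypothetical helper class illustrating interning plus reference comparison.
public static class InterningDemo
{
    // String literals are interned automatically by the runtime.
    public const string LongLiteral = "VERY LONG STRING #1";

    // Builds an equal string at run time: a new, non-interned instance.
    public static string BuildAtRunTime()
    {
        return new StringBuilder().Append("VERY LONG ").Append("STRING #1").ToString();
    }

    // Call once per incoming string: looks the value up in the intern pool
    // (this step does compare contents) and returns the shared instance.
    public static string Canonicalize(string input)
    {
        return String.Intern(input);
    }

    // Cheap check; only valid for strings returned by Canonicalize or for literals.
    public static bool IsLongLiteral(string canonical)
    {
        return Object.ReferenceEquals(canonical, LongLiteral);
    }
}
```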


3 comments:

Hosam Aly said...

I am not convinced by many of these suggestions. Do you have actual measurements or references on which these recommendations were based?

Also why don't you write your name on this blog?!

Anonymous said...

I agree with Hosam Aly: every one of these tips seems wrong. (At least the first 3-4, I stopped reading after that.)

Anonymous said...

The last suggestion is bad:

String s1 = "MyTest";
String s2 = new StringBuilder().Append("My").Append("Test").ToString();
String s3 = String.Intern(s2);
Console.WriteLine(Object.ReferenceEquals(s1, s2)); // false
Console.WriteLine(Object.ReferenceEquals(s1, s3)); // true
Console.WriteLine(s2 == s1); // true