Friday, March 28, 2008

How to: Optimize memory usage with strings

The System.String type represents a sequence of Unicode (UTF-16) characters. Its most important properties are:
  • It is immutable. Once a string is created, it cannot be modified. "Updating" a string's value actually creates a new string object with the updated content. The old object is reclaimed by the GC (Garbage Collector) once nothing references it anymore.
  • It is a reference type. Because strings are immutable, many people assume that String is a value type. It is actually a reference type, so a string variable can be null. Being a reference type is a good thing, because we can save memory by letting long strings with the same content share one object reference. Note that a null string is not equivalent to an empty string!
  • It overloads the == operator. The == operator calls the Equals() method, which first checks whether the compared strings are the same object. If they are, Equals() skips the content check and returns True. If the two strings refer to different objects, a character-by-character comparison follows. The first case is therefore much faster. You can read more about comparing strings here: How to: Optimize the strings’ comparison
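The three properties can be observed in a few lines. This is a minimal sketch, assuming a C# 9 (or later) top-level program; the variable names and sample values are illustrative and not part of the original examples.

```csharp
using System;

// Immutability: ToUpperInvariant cannot change 'original'; it returns a new string.
string original = "hello";
string upper = original.ToUpperInvariant();

// Reference type: a string variable can be null, and null is not the same as "".
string missing = null;
string empty = "";

// Two strings with equal content, each allocated separately at run-time.
string a = new string('x', 3);
string b = new string('x', 3);

Console.WriteLine(original);                     // hello
Console.WriteLine(upper);                        // HELLO
Console.WriteLine(missing == empty);             // False
Console.WriteLine(a == b);                       // True: == compares content
Console.WriteLine(Object.ReferenceEquals(a, b)); // False: two distinct objects
```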

Preserve memory

System.String is used in every .NET application. We store names, addresses, descriptions, error messages, warnings and even application settings as strings, and every application has to create, compare or format string data. Because strings are immutable and any object can be converted to a string, a large share of the available memory can end up occupied by unnecessary string duplicates or by string objects that have not been reclaimed yet. Now let's see how strings should be handled to preserve memory.

String literals

Using literals guarantees that strings with the same content refer to the same string object.

C# .NET

string literal1 = "STRING";
string literal2 = "STRING";

Console.WriteLine("literal1 = {0}", literal1);
Console.WriteLine("literal2 = {0}", literal2);

if (Object.ReferenceEquals(literal1, literal2))
{
    // Both variables share the same string object.
}


This is where an often overlooked technique called string interning comes into play. The CLR maintains an intern pool, which is in essence a table of unique strings. The string literals referenced by your code are added to this pool, so every occurrence of the same literal resolves to the same object. Since many literals in a program tend to appear in multiple places, this conserves memory.
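One way to observe the pool is String.IsInterned(), which returns the pooled reference for a given content, or null when no such string is pooled. The sketch below assumes a C# 9 (or later) top-level program; the StringBuilder is only used as a convenient way to produce an equal string at run-time.

```csharp
using System;
using System.Text;

// A literal: the runtime places "STRING" in the intern pool.
string literal = "STRING";

// The same characters assembled at run-time: a separate, non-pooled object.
string built = new StringBuilder("STR").Append("ING").ToString();

// IsInterned looks up equal content in the pool and returns the pooled reference.
string pooled = String.IsInterned(built);

Console.WriteLine(Object.ReferenceEquals(literal, built));  // False
Console.WriteLine(Object.ReferenceEquals(literal, pooled)); // True
```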

Concatenated literals use the intern pool too, because the compiler joins a concatenation of literals into a single literal at compile time:

C# .NET

// Both variables share the same object!
string string1 = "My" + " " + "STRING";
string string2 = "My STRING";
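This folding only applies to constant expressions. In the minimal sketch below (a C# 9 or later top-level program; the variable prefix is illustrative), a non-constant operand makes the compiler emit a String.Concat call, and the result is a new, non-pooled object with the same content.

```csharp
using System;

// Constant expression: the compiler folds it into the single literal "My STRING".
string folded = "My" + " " + "STRING";
string literal = "My STRING";

// A non-constant operand forces a String.Concat call at run-time.
string prefix = "My";
string runtime = prefix + " STRING";

Console.WriteLine(Object.ReferenceEquals(folded, literal));  // True
Console.WriteLine(Object.ReferenceEquals(runtime, literal)); // False
Console.WriteLine(runtime == literal);                       // True
```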


String constants

String constants give you the same effect, because the compiler replaces every reference to a constant with the string literal it defines.
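A minimal sketch of this effect, assuming a C# 9 (or later) top-level program with local constants (the names Prefix and Greeting are illustrative). A concatenation of constants is itself a constant expression, so it is folded into a literal as well.

```csharp
using System;

// Local constants: the compiler substitutes their values at every use.
const string Prefix = "My";
const string Greeting = Prefix + " STRING"; // still a constant, folded at compile time

string literal = "My STRING";

Console.WriteLine(Greeting);                                  // My STRING
Console.WriteLine(Object.ReferenceEquals(Greeting, literal)); // True
```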

String.Empty vs ""

Use String.Empty rather than "". This is more about speed than memory usage, but it is a useful tip. "" is a literal, so it acts like any other literal: it is interned, and only one instance of "" is stored in memory no matter how many times it is used. There is no memory penalty here. The difference is in how the reference is obtained: a literal has to be looked up in the intern pool, while String.Empty is a static read-only field that already holds a reference to the empty string, and it points to the same object in VB.NET and C# applications. On current runtimes the lookup for a literal happens once and its result is cached, rather than on every use, so the speed gain is small; String.Empty still skips the lookup and states the intent explicitly.

C# .NET

// These may or may not refer to the same object, depending on the runtime version;
// their content is always equal.
string empty1 = "";
string empty2 = String.Empty;


String = String

If a string variable is initialized with an existing string, both variables share the same object.

C# .NET

// Both variables share the same object!
string string1 = "STRING";
string string2 = string1;


Updating either of them leaves you with two different string objects:

C# .NET

string1 = "UPDATED STRING";


Now string1 points to the newly created string ("UPDATED STRING"), while string2 still points to the old string object ("STRING").
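The two steps above can be run together to confirm the behavior. A minimal sketch, assuming a C# 9 (or later) top-level program and reusing the article's variable names:

```csharp
using System;

string string1 = "STRING";
string string2 = string1;   // copies the reference, not the characters

Console.WriteLine(Object.ReferenceEquals(string1, string2)); // True

string1 = "UPDATED STRING"; // string1 now refers to a different object

Console.WriteLine(string2);                                  // STRING
Console.WriteLine(Object.ReferenceEquals(string1, string2)); // False
```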

The String.Concat() method

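The String.Concat() method creates a new string object on each call (apart from trivial cases such as concatenating with an empty string, where an argument can be returned as-is). So strings created by this method will not share the same object, even if they have the same content: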

C# .NET

// Two different objects are created!
string concat1 = String.Concat("My", " ", "String");
string concat2 = String.Concat("My", " ", "String");


The StringBuilder class

The same is true when using the StringBuilder class:

C# .NET

StringBuilder stringBuilder1 = new StringBuilder();
stringBuilder1.Append("String");

StringBuilder stringBuilder2 = new StringBuilder();
stringBuilder2.Append("String");

// Two different objects are created!
string sb1 = stringBuilder1.ToString();
string sb2 = stringBuilder2.ToString();
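The claims for String.Concat() and StringBuilder can be checked by comparing both content (==) and identity (Object.ReferenceEquals). A minimal sketch, assuming a C# 9 (or later) top-level program:

```csharp
using System;
using System.Text;

string concat1 = String.Concat("My", " ", "String");
string concat2 = String.Concat("My", " ", "String");

string sb1 = new StringBuilder().Append("String").ToString();
string sb2 = new StringBuilder().Append("String").ToString();

// Same content...
Console.WriteLine(concat1 == concat2); // True
Console.WriteLine(sb1 == sb2);         // True

// ...but each call allocated its own object.
Console.WriteLine(Object.ReferenceEquals(concat1, concat2)); // False
Console.WriteLine(Object.ReferenceEquals(sb1, sb2));         // False
```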


Strings created at run-time

Strings created at run-time don't share objects, even when their content is equal:

C# .NET

// Two different objects are created!
string runTime1 = Char.ConvertFromUtf32(200);
string runTime2 = Char.ConvertFromUtf32(200);


The String.Intern() method

Strings created at run-time can behave like literals if the String.Intern() method is used. The Intern method uses the intern pool to search for a string equal to the value of its argument. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to the argument is added to the intern pool, and then that reference is returned. Note that searching for a string in the intern pool can be expensive, depending on how many strings are in the pool at that time.

C# .NET

// Both variables share the same object!
string interned1 = String.Intern(Char.ConvertFromUtf32(200));
string interned2 = String.Intern(Char.ConvertFromUtf32(200));
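The sketch below shows both branches of String.Intern(): the first call adds the argument itself to the pool, and the second call, made with an equal but separate string, finds it. It assumes a C# 9 (or later) top-level program and uses Guid.NewGuid() only as a convenient way to obtain a string that is almost certainly not pooled yet.

```csharp
using System;

// A run-time string that is almost certainly not in the intern pool yet.
string first = Guid.NewGuid().ToString();
// An equal copy, allocated as a separate object.
string second = new string(first.ToCharArray());

Console.WriteLine(String.IsInterned(first) == null); // True: not pooled yet

string pooled1 = String.Intern(first);  // not found: 'first' itself is added and returned
string pooled2 = String.Intern(second); // found: the pooled 'first' is returned

Console.WriteLine(Object.ReferenceEquals(pooled1, first)); // True
Console.WriteLine(Object.ReferenceEquals(pooled2, first)); // True
Console.WriteLine(Object.ReferenceEquals(second, first));  // False
```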


Keep in mind that interning a string has two unwanted side effects:
  • The memory allocated for interned String objects is not likely to be released until the Common Language Runtime (CLR) terminates. The reason is that the CLR's reference to an interned String object can persist after your application, or even your application domain, terminates.
  • To intern a string, you must first create it. The memory used by that String object must still be allocated, even though it will eventually be garbage collected.
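Where the shared strings are only needed for a limited time, one alternative (not from the original article) is a private pool kept in a Dictionary, so that the shared instances become collectible once the pool itself is no longer referenced. A minimal sketch, assuming a C# 9 (or later) top-level program; Dedupe is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical private "intern pool": maps each content to its first instance.
// Unlike String.Intern, the entries become collectible once 'pool' is unreachable.
var pool = new Dictionary<string, string>(StringComparer.Ordinal);

string Dedupe(string value)
{
    if (value == null) return null;       // Dictionary keys cannot be null
    if (pool.TryGetValue(value, out string existing))
        return existing;                  // reuse the first instance seen with this content
    pool.Add(value, value);
    return value;
}

string a = Dedupe(new string('x', 3));
string b = Dedupe(new string('x', 3));

Console.WriteLine(Object.ReferenceEquals(a, b)); // True
Console.WriteLine(pool.Count);                   // 1
```

Because pool is an ordinary object, dropping the last reference to it releases every entry it holds. The second side effect listed above still applies: each candidate string is allocated before Dedupe can look it up.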



8 comments:

Anonymous said...

Dude, lay off the Ctrl-B. Makes it very hard to read.

John said...

Great article! Thanks for the insight. (and I like the bold anonymous)

Anonymous said...

This is a fascinating article. Where did you acquire your knowledge of the intricacies of the .NET framework, such as string interning? Could you recommend any further reading on this topic, or similar topics related to the nuts and bolts of the .NET framework?

In what situations would String.Intern() be advisable? Perhaps if you have a really, really large string that you anticipate will be used in the intern pool?

Keep up the awesome work -- I'll look forward to more posts of this caliber.

I don't mind the bold =)

Anonymous said...

Thanks for sharing this
nice article

regards

Hopeto said...

Thanks for the helpful post.

Andrei Rinea said...

@DotNetYuppie : Read CLR via C# by Jeffrey Richter (hardcover or ebook) and you'll find all these and much more ;)

Ranganatha said...

This is a very good article... very informative.

alpee said...

very well explained...