Summary: in this tutorial, you’ll learn about the C# string type and the basic string operations.
Introduction to the C# string
C# uses the string keyword to represent the string type. The string keyword is an alias for the System.String type. Therefore, string and String are equivalent.
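As a quick check of that equivalence, the following sketch compares the two names with typeof. The variable names are illustrative only, and the snippet assumes a project that supports top-level statements (C# 9 or later):

```csharp
using System;

// string (the keyword) and String (System.String) name the same type.
string a = "Hi";
String b = "Hi";

bool sameType = typeof(string) == typeof(String);
Console.WriteLine(sameType);             // True
Console.WriteLine(a.GetType().FullName); // System.String
Console.WriteLine(a == b);               // True
```

Because both names refer to one type, the choice between them is a matter of style.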
Declare a string
The following example declares a string variable without initializing it:
string message;
Code language: C# (cs)
After declaring the string variable, you can assign it a string literal. To form a string literal, you place the string text inside double quotes ("..."). For example:
message = "Hi";
Code language: C# (cs)
The following example declares and initializes the string using one statement:
string message = "Hi";
Code language: C# (cs)
To create a zero-length string, you use the String.Empty field like this:
string message = String.Empty;
Code language: C# (cs)
It’s equivalent to the following:
string message = "";
Code language: C# (cs)
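The two forms can be compared directly. This is a minimal sketch with made-up variable names; it also uses string.IsNullOrEmpty(), a standard helper that the text above does not mention:

```csharp
using System;

string fromField = String.Empty;
string fromLiteral = "";

Console.WriteLine(fromField == fromLiteral);        // True
Console.WriteLine(fromField.Length);                // 0
Console.WriteLine(string.IsNullOrEmpty(fromField)); // True
```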
Get the length of a string
A string has the Length property that returns the length of a string. To access the Length property, you use the dot operator (.) like this:
string message = "Hello";
Console.WriteLine(message.Length);
Code language: C# (cs)
Output:
5
Code language: C# (cs)
Concatenate two strings
To concatenate two strings into one, you use the + operator. For example:
string message = "Good" + " Morning";
Console.WriteLine(message);
Code language: C# (cs)
Output:
Good Morning
Code language: C# (cs)
To append a string to another, you can also use the += operator. For example:
string message = "Good";
message += " Morning!";
Console.WriteLine(message);
Code language: C# (cs)
Output:
Good Morning!
Code language: C# (cs)
Besides the + operator, you can use the String.Concat() method to concatenate two or more strings into one string.
The String type also provides the Join() method, which concatenates two or more strings into a single string using a separator.
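The following sketch shows how these two methods might be called. The sample values and variable names are illustrative only:

```csharp
using System;

// Concat() joins its arguments with nothing in between.
string viaConcat = String.Concat("Good", " ", "Morning");
Console.WriteLine(viaConcat); // Good Morning

// Join() places the separator between each pair of values.
string viaJoin = String.Join(", ", "red", "green", "blue");
Console.WriteLine(viaJoin);   // red, green, blue

// Join() also accepts an array of strings.
string[] parts = { "2024", "01", "15" };
string date = String.Join("-", parts);
Console.WriteLine(date);      // 2024-01-15
```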
C# string is immutable
A C# string is immutable. This means that you cannot modify a string after creating it; any operation that appears to change a string returns a new string. For example:
string message = "C#";
message += " string";
Console.WriteLine(message);
Code language: C# (cs)
Output:
C# string
Code language: C# (cs)
In this example:
- First, define the message string variable and initialize it to the string literal "C#".
- Second, concatenate the message string variable with another string literal " string".
- Third, show the message string to the console.
When concatenating message with " string", C# doesn’t change the original string. Instead, it creates a new string that holds the concatenated text and assigns that new string to the message variable.
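One way to observe this behavior is to keep a second reference to the original string before concatenating. The sketch below uses hypothetical variable names (original, alias):

```csharp
using System;

string original = "C#";
string alias = original;   // both variables refer to the same string object

original += " string";     // creates a new string and reassigns original

Console.WriteLine(alias);    // C#
Console.WriteLine(original); // C# string
Console.WriteLine(object.ReferenceEquals(alias, original)); // False
```

The alias variable still sees "C#", which shows that the first string object was left untouched.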
Accessing individual characters
Internally, C# stores a string as a collection of read-only characters. To access an individual character in a string, you use the square bracket notation [] with an index:
s[index]
Code language: C# (cs)
The first character has an index of 0. The second character has an index of 1, and so on. For example:
string message = "Hello";
Console.WriteLine(message[0]); // H
Code language: C# (cs)
Output:
H
Code language: C# (cs)
Because a string is immutable, you can only read individual characters from it.
The following example results in a compilation error because it attempts to change the first character of a string:
string name = "Jill";
name[0] = 'B';
Code language: C# (cs)
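If you need a string with a different first character, you create a new string instead. The sketch below shows two possible approaches, neither of which is prescribed by this tutorial; the variable names are illustrative:

```csharp
using System;

string name = "Jill";

// Option 1: combine a new first character with the rest of the string.
string viaSubstring = "B" + name.Substring(1);

// Option 2: copy the characters to an array (arrays are mutable),
// change the copy, then build a new string from it.
char[] chars = name.ToCharArray();
chars[0] = 'B';
string viaArray = new string(chars);

Console.WriteLine(viaSubstring); // Bill
Console.WriteLine(viaArray);     // Bill
Console.WriteLine(name);         // Jill
```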
Escape sequences
A string literal can contain special characters, such as tabs and newlines, written with a backslash (\). These are called escape sequences. For example:
string header = "id\tname";
Console.WriteLine(header);
Code language: C# (cs)
Output:
id name
Code language: C# (cs)
The header string contains the \t escape sequence, which represents the tab character. So when we display it on the console, the output has a tab character between id and name.
If a string literal contains double quotes, you need to use the backslash character \ to escape them. For example:
string message = "\"C# is awesome\". They said.";
Console.WriteLine(message);
Code language: C# (cs)
Output:
"C# is awesome". They said.
Code language: C# (cs)
In this example, the literal string contains two double quotes:
"C# is awesome". They said.
Code language: C# (cs)
Therefore, we use the backslash character (\) to escape each of them:
"\"C# is awesome\". They said."
Code language: C# (cs)
If a string contains the backslash character as a literal character, you need to use another backslash character to escape it like this:
string path = "C:\\users\\";
Console.WriteLine(path);
Code language: C# (cs)
Output:
C:\users\
Code language: C# (cs)
In this example, the directory path “C:\users\” contains backslashes. Therefore, we need to escape each of them with another backslash.
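Each escape sequence counts as a single character in the resulting string. The following sketch, with illustrative variable names, checks this for the escape sequences used so far plus \n for a newline:

```csharp
using System;

string twoLines = "first\nsecond"; // \n is a newline
string quoted = "say \"hi\"";      // \" is a double quote
string slash = "a\\b";             // \\ is a single backslash

Console.WriteLine(twoLines.Split('\n').Length); // 2
Console.WriteLine(quoted.Length);               // 8
Console.WriteLine(slash.Length);                // 3
```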
Verbatim string
If a string contains backslashes, you can escape them using backslashes. But double backslashes make the string difficult to read.
To fix this, you can turn a string literal into a verbatim string by prefixing it with the @ symbol. A verbatim string disables escape sequences so that a backslash is just a backslash. For example:
string path = @"C:\users\";
Console.WriteLine(path);
Code language: C# (cs)
Output:
C:\users\
Code language: C# (cs)
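A verbatim string and its escaped equivalent produce the same value, which the sketch below checks. It also shows one detail not covered above: because the backslash has no special meaning in a verbatim string, a double quote is written by doubling it (""). The variable names are illustrative:

```csharp
using System;

string escaped = "C:\\users\\";
string verbatim = @"C:\users\";

Console.WriteLine(escaped == verbatim); // True
Console.WriteLine(verbatim.Length);     // 9

// Inside a verbatim string, "" stands for one double quote.
string quoted = @"say ""hi""";
Console.WriteLine(quoted == "say \"hi\""); // True
```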
Because verbatim strings preserve newline characters as part of the string text, you can use them to create multiline strings. For example:
string content = @"I'm a multiline
string that spans multiple
lines";
Console.WriteLine(content);
Code language: C# (cs)
Output:
I'm a multiline
string that spans multiple
lines
Code language: C# (cs)
Interpolated string
Suppose you have a variable called name:
string name = "Joe";
Code language: C# (cs)
And you want to embed the variable in a string literal.
To do that, you prefix the string literal with the $ character and place the variable inside the curly braces {}:
string name = "Joe";
string greeting = $"Hello {name}!";
Console.WriteLine(greeting);
Code language: C# (cs)
Output:
Hello Joe!
Code language: C# (cs)
A string literal with the $ prefix is an interpolated string.
When the compiler encounters the $ prefix, it generates code that replaces {name} with the value of the name variable when the program runs. This feature is called string interpolation.
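The curly braces can hold any expression, not just a variable name. The following sketch uses made-up variables (apples, oranges) to illustrate this, and shows that doubled braces print a literal brace:

```csharp
using System;

string name = "Joe";
int apples = 2;
int oranges = 3;

// The expression inside the braces is evaluated, then converted to text.
string summary = $"{name} has {apples + oranges} pieces of fruit.";
Console.WriteLine(summary); // Joe has 5 pieces of fruit.

// Double the braces to print a literal brace.
string braces = $"{{name}} is replaced by {name}";
Console.WriteLine(braces);  // {name} is replaced by Joe
```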
UTF-8 strings
The web uses UTF-8 as the character encoding. In UTF-8, each character takes 1 to 4 bytes.
But in .NET, the string type always uses UTF-16. In UTF-16, each character takes either 2 or 4 bytes.
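To make the size difference concrete, the sketch below counts the bytes that a few sample characters need in each encoding. It relies on Encoding.Unicode (the .NET name for UTF-16) and Encoding.UTF8.GetByteCount(); the sample characters are arbitrary choices:

```csharp
using System;
using System.Text;

string ascii = "A";
string accented = "\u00E9";  // é
string emoji = "\U0001F600"; // a grinning face emoji

foreach (string s in new[] { ascii, accented, emoji })
{
    int utf16Bytes = Encoding.Unicode.GetByteCount(s);
    int utf8Bytes = Encoding.UTF8.GetByteCount(s);
    Console.WriteLine($"Length={s.Length}, UTF-16 bytes={utf16Bytes}, UTF-8 bytes={utf8Bytes}");
}
// Length=1, UTF-16 bytes=2, UTF-8 bytes=1
// Length=1, UTF-16 bytes=2, UTF-8 bytes=2
// Length=2, UTF-16 bytes=4, UTF-8 bytes=4
```

Note that the Length property counts UTF-16 code units, so the emoji reports a length of 2 even though it is displayed as one character.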
If you use C# to process characters for the web, you need to convert UTF-16 to UTF-8.
Note that if you use ASP.NET Core, the framework does the conversion for you automatically.
To convert a string in UTF-16 to UTF-8, you use the following:
var utf8 = Encoding.UTF8.GetBytes("Hello WWW"); // byte[]; requires using System.Text;
Code language: C# (cs)
This manual conversion happens every time the program runs, so it adds overhead and can slow down the program.
To solve this issue, C# 11 introduced UTF-8 string literals. A UTF-8 string literal has the u8 suffix, and its type is ReadOnlySpan<byte> rather than string:
var utf8 = "Hello WWW"u8;
Code language: C# (cs)
The UTF-8 string literal syntax is not only more concise but also more efficient than converting a string from UTF-16 to UTF-8 at run time, because the compiler encodes the bytes when it compiles the program.
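The following sketch compares the two approaches and checks that they yield the same bytes. It assumes a project that targets C# 11 or later, and the variable names are illustrative:

```csharp
using System;
using System.Text;

// Run-time conversion from a UTF-16 string to UTF-8 bytes.
byte[] converted = Encoding.UTF8.GetBytes("Hello WWW");

// Compile-time UTF-8 literal (C# 11 or later).
ReadOnlySpan<byte> literal = "Hello WWW"u8;

// View the array as a span so the two can be compared byte by byte.
ReadOnlySpan<byte> convertedSpan = converted;
bool sameBytes = literal.SequenceEqual(convertedSpan);

Console.WriteLine(converted.Length); // 9
Console.WriteLine(literal.Length);   // 9
Console.WriteLine(sameBytes);        // True
```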
Note that C# 11 also introduced the concept of raw strings that we will cover in another tutorial.
Summary
- C# uses the string keyword to represent the string type.
- The string keyword is an alias for the System.String type. Therefore, string and String are the same.
- C# strings are immutable.
- Use the Length property to get the length of the string.
- Use the + operator to concatenate two strings and return a new string.
- Use the square bracket with an index to access an individual character in a string.
- Use a verbatim string with the @ prefix to disable the escape character so that backslashes have no special meaning.
- Use an interpolated string with the $ prefix to embed a variable in a literal string.
- Use the u8 suffix to create a string literal with UTF-8 encoding.