Summary: in this tutorial, you will learn how to use character classes in regular expressions using C#.
Introduction to the C# Regex Character Classes
Character classes define a set of characters. For example:
- Digits (0 to 9)
- Letters (a to z)
- Whitespace (tab, space, …)
Character classes allow you to match characters from a specified set of characters.
\d: digit character class
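The \d character class matches any single decimal digit. By default, .NET treats \d as any Unicode decimal digit, which covers 0 to 9 as well as digits from other scripts; with the RegexOptions.ECMAScript option, it matches only 0 to 9. The following example uses the \d character class to find all digits in a string: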
using System.Text.RegularExpressions;
using static System.Console;
var text = "3 Awesome New Features in C# 12";
var pattern = @"\d";
var matches = Regex.Matches(text, pattern);
foreach (var match in matches)
{
    WriteLine(match);
}
Output:
3
1
2
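One caveat is worth illustrating: in .NET, \d is Unicode-aware by default, so it can match more than 0 to 9. The sketch below uses a made-up sample string that contains an Arabic-Indic digit (written as the escape \u0663) and compares the default behavior with RegexOptions.ECMAScript and with the explicit set [0-9]. The variable names are illustrative only.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

// Hypothetical sample: "7" is an ASCII digit, \u0663 is ARABIC-INDIC DIGIT THREE.
var sample = "Room 7 or \u0663";

// Default behavior: \d matches any Unicode decimal digit.
var unicodeCount = Regex.Matches(sample, @"\d").Count;

// ECMAScript mode restricts \d to the ASCII digits 0-9.
var ecmaCount = Regex.Matches(sample, @"\d", RegexOptions.ECMAScript).Count;

// An explicit range is another way to accept only 0-9.
var rangeCount = Regex.Matches(sample, @"[0-9]").Count;

WriteLine($"default: {unicodeCount}, ECMAScript: {ecmaCount}, [0-9]: {rangeCount}");
```

If the input must contain only ASCII digits, for example when validating an ID, either of the last two forms is probably the safer choice.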
To match two consecutive digits, you use \d\d like this:
using System.Text.RegularExpressions;
using static System.Console;
var text = "3 Awesome New Features in C# 12";
var pattern = @"\d\d";
var matches = Regex.Matches(text, pattern);
foreach (var match in matches)
{
    WriteLine(match);
}
Output:
12
In this example, \d\d matches 12 but not 3, because 3 is a single digit.
Note that quantifiers, which you'll learn about later, let you write this pattern more concisely as \d{2}.
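As a quick check of that equivalence, the following sketch runs both patterns against the same string used above. The variable names repeated and quantified are illustrative.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

var text = "3 Awesome New Features in C# 12";

// The same pattern written two ways: repeated class vs. quantifier.
var repeated = Regex.Matches(text, @"\d\d");
var quantified = Regex.Matches(text, @"\d{2}");

WriteLine(repeated[0].Value);    // 12
WriteLine(quantified[0].Value);  // 12

// Both patterns find the same number of matches.
WriteLine(repeated.Count == quantified.Count);  // True
```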
\w: the word character class
The \w character class matches any single word character: a letter, a digit, or an underscore (_). By default, .NET matches Unicode letters and digits, not only ASCII ones; with the RegexOptions.ECMAScript option, \w is equivalent to [a-zA-Z_0-9].
The following example shows how to use the \w character class to match all word characters in a string:
using System.Text.RegularExpressions;
using static System.Console;
var text = "C# is awesome";
var pattern = @"\w";
var matches = Regex.Matches(text, pattern);
foreach (var match in matches)
{
    WriteLine(match);
}
Output:
C
i
s
a
w
e
s
o
m
e
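The output above skips # and the spaces because they are not word characters. To show what \w does accept, here is a small sketch with a made-up sample string that contains an underscore and a non-ASCII letter (\u00EF, the letter ï). It assumes the default .NET behavior and compares it with RegexOptions.ECMAScript.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

// Hypothetical sample: letters, one non-ASCII letter (\u00EF), an underscore, and digits.
var sample = "na\u00EFve_id #42";

// Default: Unicode letters, digits, and the underscore all count as word characters.
var unicodeCount = Regex.Matches(sample, @"\w").Count;

// ECMAScript mode limits \w to [a-zA-Z_0-9], so the non-ASCII letter is skipped.
var ecmaCount = Regex.Matches(sample, @"\w", RegexOptions.ECMAScript).Count;

WriteLine($"default: {unicodeCount}, ECMAScript: {ecmaCount}");
```

In both modes the space and the # are ignored; only the treatment of the non-ASCII letter differs.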
\s: whitespace character class
The \s character class matches any single whitespace character, such as a space, tab, vertical tab, form feed, carriage return, or newline. The following example uses the \s character class to match whitespace characters in a string:
using System.Text.RegularExpressions;
using static System.Console;
var text = "C# is awesome!";
var pattern = @"\s";
var matches = Regex.Matches(text, pattern);
WriteLine($"{matches.Count} matches found");
foreach (var match in matches)
{
    WriteLine(match);
}
Output:
2 matches found
The pattern finds two matches, one for each space in the string. The loop also prints each matched space on its own line, so those lines look blank.
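Because a matched whitespace character is invisible when printed, it can help to print its position and code point instead. The following sketch does that for a made-up string that mixes a tab, a newline, and a space. The sample string and output format are assumptions chosen for illustration.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

// Hypothetical sample containing a tab, a newline, and a regular space.
var sample = "C#\tis\nawesome !";

var matches = Regex.Matches(sample, @"\s");
WriteLine($"{matches.Count} matches found");

foreach (Match m in matches)
{
    // Print where the whitespace was found and which character it is.
    WriteLine($"index {m.Index}: U+{(int)m.Value[0]:X4}");
}
```

Here the three matches are the tab (U+0009), the newline (U+000A), and the space (U+0020).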
Inverse character classes
Inverse character classes are also called negated character classes. They allow you to match any character that is not in a specified set of characters. For example, the inverse of the digit character class matches any single character except for a digit.
The following table displays character classes and their inverse versions:
Character class | Inverse character class | Description |
---|---|---|
\d | \D | Match any character, excluding digits |
\w | \W | Match any character that is not a word character |
\s | \S | Match any character, excluding whitespaces |
For example, the following uses the \D character class to match the non-digit characters of a phone number:
using System.Text.RegularExpressions;
using static System.Console;
var phone = "+1-(408)-555-6666";
var pattern = @"\D";
var matches = Regex.Matches(phone, pattern);
foreach (var match in matches)
{
    WriteLine(match);
}
Output:
+
-
(
)
-
-
To convert the phone number from the +1-(408)-555-6666 format to 14085556666, you use the Replace() method of the Regex class:
using System.Text.RegularExpressions;
using static System.Console;
var phone = "+1-(408)-555-6666";
var pattern = @"\D";
var result = Regex.Replace(phone, pattern, "");
WriteLine(result);
Output:
14085556666
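The other two inverse classes from the table, \W and \S, work the same way. As a minimal sketch, the following counts their matches in the string used earlier in this tutorial and checks that \s and \S together account for every character. The variable names are illustrative.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

var text = "C# is awesome!";

// \W matches anything that is not a word character: '#', two spaces, and '!'.
var nonWord = Regex.Matches(text, @"\W");

// \S matches anything that is not whitespace.
var nonSpace = Regex.Matches(text, @"\S");

WriteLine($"\\W: {nonWord.Count} matches");
WriteLine($"\\S: {nonSpace.Count} matches");

// A class and its inverse never overlap, so their counts add up to the string length.
var spaces = Regex.Matches(text, @"\s").Count;
WriteLine(spaces + nonSpace.Count == text.Length);
```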
Summary
- A character class defines a set of characters.
- \d represents the digit character class.
- \w represents the word character class.
- \s represents the whitespace character class.
- \D, \W, and \S are the inverse character classes of \d, \w, and \s. Each matches any single character that is not in the corresponding set.