<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Martijn's C# Programming Blog &#187; regex</title>
	<atom:link href="http://www.dijksterhuis.org/tag/regex/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dijksterhuis.org</link>
	<description>Information, news about programming in C#</description>
	<lastBuildDate>Fri, 07 Aug 2009 21:26:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Advanced Regular Expressions in C#</title>
		<link>http://www.dijksterhuis.org/regular-expressions-advanced/</link>
		<comments>http://www.dijksterhuis.org/regular-expressions-advanced/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 10:44:23 +0000</pubDate>
		<dc:creator>Martijn</dc:creator>
				<category><![CDATA[Regular Expressions]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://www.dijksterhuis.org/?p=846</guid>
		<description><![CDATA[In this third and for now last post on using regular expressions we look at some advanced topics. When your expressions become more complicated they also become harder to understand so documenting them can help. And isn&#8217;t standard string replacement a little bit too basic? We also look at how speeding things up can improve [...]<p>This is a post from <a href="http://www.dijksterhuis.org">Martijn's C# Coding Blog</a>. </p>
]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.dijksterhuis.org/wp-content/uploads/2009/03/advanced.jpg" alt="Regular Expressions in C# - Advanced Topics" title="Regular Expressions in C# - Advanced Topics" width="570" height="281" class="alignright size-full wp-image-857" /><br />
<em>In this third and for now last post on using regular expressions we look at some advanced topics. When your expressions become more complicated they also become harder to understand so documenting them can help. And isn&#8217;t  standard string replacement a little bit too basic? We also look at how speeding things up can improve your code&#8217;s efficiency.<br />
</em><br />
In this post we look at three topics: </p>
<ol>
<li>Improving your code&#8217;s readability by documenting regular expressions</li>
<li>Creating conditional string replacement by using MatchEvaluators</li>
<li>Speeding up regular expressions by compiling them, caching them in memory and pre-compiling them to their own DLL.</li>
</ol>
<p>If you are new to regular expressions in C#, have a look at the theory of regular expressions in <a href="http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/">Regular Expressions : The Basics</a>. The second post, <a href="http://www.dijksterhuis.org/regular-expressions-csharp-practical-use/">Regular Expressions in C#: Practical Usage</a>, introduced the most common uses of regular expressions. </p>
<p><span id="more-846"></span></p>
<h3>Documenting your Regular Expressions</h3>
<p>Regular expressions can make for fine alphabet soup. The following expression validates an e-mail address, and it does a good job of it. It is also very intimidating at first. Imagine rereading your code after a few weeks: what is going on in there?</p>
<pre>
string validEmail = @"\b([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})\b";
</pre>
<p>With a little squinting you see that I would like to extract two groups: the username part, and the domain name part. C# allows us to name each group to make things a little easier to read. We can use the <i>?&lt;groupname&gt;</i> pattern to name each group. </p>
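<p>A little rewrite can make our expression a lot easier to read. When whitespace in the pattern is ignored, the regular expression engine treats everything from a &#8220;#&#8221; character to the end of the line as a comment, so we can document our expressions in line.</p>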
<pre class="brush: c#">
       static string validEmail = @&quot;\b    # Find a word boundary
                       (?&lt;Username&gt;       # Begin group: Username
                       [a-zA-Z0-9._%+-]+  #  Characters allowed in username, 1 or more
                       )                  # End group: Username
                       @                  # The e-mail &#039;@&#039; character
                       (?&lt;Domainname&gt;     # Begin group: Domain name
                       [a-zA-Z0-9.-]+     #  Domain name(s), we include a dot so that
                                          #  mail.dijksterhuis is also possible
                       \.[a-zA-Z]{2,4}    #  A literal dot, then a top level domain of 2 to 4 letters
                                          #  So .info works, .telephone doesn&#039;t.
                       )                  # End group: Domain name
                       \b                 # Ending on a word boundary
                       &quot;;
</pre>
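<p>Because we have added a lot of spaces and new lines to our expression we need to tell Regex to ignore them by specifying the <em>RegexOptions.IgnorePatternWhitespace</em> option. The code below also passes <em>RegexOptions.Multiline</em>; that option only changes where ^ and $ match, so it has no effect on this pattern.</p>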
<pre class="brush: c#">
          string testEmail = &quot;martijn@dijksterhuis.org&quot;;
          Regex TestValidEmail = new Regex(validEmail,RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

           // Test the e-mail address
           Match TestResult = TestValidEmail.Match(testEmail);

           if (TestResult.Success)
           {
                Console.WriteLine(&quot;E-mail is: {0}@{1}&quot;,TestResult.Groups[&quot;Username&quot;].Value,
                                                         TestResult.Groups[&quot;Domainname&quot;].Value);
           }
</pre>
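<p>If you would rather keep the option together with the pattern itself, the inline <em>(?x)</em> construct switches on the same whitespace and comment handling from inside the expression. The following is only a minimal sketch: the class name, the date pattern and its group names are made up for illustration.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class InlineOptionSketch
    {
        // (?x) at the start has the same effect as RegexOptions.IgnorePatternWhitespace
        // (hypothetical pattern, used for illustration only)
        static string datePattern = @&quot;(?x)
                       (?&lt;Day&gt;\d{1,2})        # Day of the month, 1 or 2 digits
                       -                      # Separator
                       (?&lt;Month&gt;[A-Za-z]{3})  # Abbreviated month name
                       -                      # Separator
                       (?&lt;Year&gt;\d{4})         # Four digit year
                       &quot;;

        // Returns a readable description of the first date found, or null if there is none
        public static string Describe(string input)
        {
            Match m = Regex.Match(input, datePattern); // no RegexOptions needed
            if (!m.Success)
                return null;
            return String.Format(&quot;Day={0} Month={1} Year={2}&quot;, m.Groups[&quot;Day&quot;].Value,
                                                               m.Groups[&quot;Month&quot;].Value,
                                                               m.Groups[&quot;Year&quot;].Value);
        }

        public static void Main(string[] args)
        {
            Console.WriteLine(Describe(&quot;13-Jan-2006&quot;));
        }
    }
}
</pre>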
<h3>Conditional string replacement</h3>
<p>The <em>Regex.Replace</em> method allows you to use substitution patterns to change the original content around. In a previous post we looked at how we could swap two words around by using grouped patterns and the $1 and $2 substitution patterns.</p>
<pre class="brush: c#">
Regex Replacer = new Regex(@&quot;(\w*) (\w*)&quot;);
string Input = &quot;Molly Malone&quot;;
string Output = Replacer.Replace(Input,&quot;$2 $1&quot;);
Console.WriteLine(Output);
</pre>
<p>That is sufficient if you just want to move the data around a little, but it would be nice if you could make a replacement depend on some external condition. The <em>Regex.Replace</em> method allows you to specify a <em>MatchEvaluator</em> which does just that. <em>MatchEvaluator</em> is a delegate which takes a <em>Match</em> as its parameter and returns the replacement string.</p>
<p>This is handy, for example, if you are cleaning up a mailing list and want to update some, but not all, e-mail addresses. In the following code example we know that <em>mail.dijksterhuis.org</em> is now served by <em>smtp.dijksterhuis.org</em>, so we want to move all those users to the new domain name and leave all other e-mail addresses the same.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
	class MainClass
	{

    	static string validEmail = @&quot;\b   			# Find a word boundary
							  (?&lt;Username&gt;			# Begin group: Username
							  [a-zA-Z0-9._%+-]+     #  Characters allowed in username, 1 or more
							  )                     # End group: Username
							  @					    # The e-mail &#039;@&#039; character
							  (?&lt;Domainname&gt;        # Begin group: Domain name
							  [a-zA-Z0-9.-]+        #  Domain name(s), we include a dot so that
                                                    #  mail.dijksterhuis is also possible
							  \.[a-zA-Z]{2,4}       #  A literal dot, then a top level domain of 2 to 4 letters
													#  So .info works, .telephone doesn&#039;t.
							  )                     # End group: Domain name
                              \b
							  &quot;;

		public static string UpdateDomainNames(Match match)
		{
			if (match.Groups[&quot;Domainname&quot;].Value==&quot;mail.dijksterhuis.org&quot;)
			 return match.Groups[&quot;Username&quot;].Value + &quot;@&quot; + &quot;smtp.dijksterhuis.org&quot;;
			return match.Groups[0].Value; // The original
		}

		public static void Main(string[] args)
		{

		   Regex TestValidEmail = new Regex(validEmail,RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

		   string[] MailingList = new string[] { &quot;martijn@dijksterhuis.org&quot;,
												 &quot;user@mail.dijksterhuis.org&quot;,
												 &quot;willy@wortel.org&quot;};

		   foreach(string email in MailingList)
		   {
				// Conditionally replace e-mail addresses
				Console.WriteLine( TestValidEmail.Replace(email,UpdateDomainNames) );
		   }

		}
	}
}
</pre>
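<p>Since a <em>MatchEvaluator</em> is an ordinary delegate, a lambda expression (C# 3.0 or later) can be passed to <em>Regex.Replace</em> instead of a named method. The sketch below assumes a shortened, undocumented version of the e-mail pattern; the class, constant and method names are invented for this example.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class LambdaEvaluatorSketch
    {
        // Shortened e-mail pattern (an assumption made for brevity)
        const string SimpleEmail = @&quot;(?&lt;Username&gt;[a-zA-Z0-9._%+-]+)@(?&lt;Domainname&gt;[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})&quot;;

        public static string MoveToSmtp(string email)
        {
            // The lambda is converted to a MatchEvaluator by the compiler
            return Regex.Replace(email, SimpleEmail, match =&gt;
                match.Groups[&quot;Domainname&quot;].Value == &quot;mail.dijksterhuis.org&quot;
                    ? match.Groups[&quot;Username&quot;].Value + &quot;@smtp.dijksterhuis.org&quot;
                    : match.Value); // leave every other address untouched
        }

        public static void Main(string[] args)
        {
            Console.WriteLine(MoveToSmtp(&quot;user@mail.dijksterhuis.org&quot;));
            Console.WriteLine(MoveToSmtp(&quot;willy@wortel.org&quot;));
        }
    }
}
</pre>
<p>For a short rule like this one the lambda keeps the condition next to the call; a named method remains the better choice once the logic grows.</p>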
<h3>Speeding up regular expressions by compiling them</h3>
<p>Regular expressions can be quite slow and in another post I found that a simple string replacement routine <a href="http://www.dijksterhuis.org/manipulating-strings-in-csharp-replacing-part-string/">was some 40 times faster</a> than the equivalent regular expression.  Often you will want to stick with the regular expression as it will save you many lines of coding. </p>
<p>As the <em>Regex</em> class encounters your expressions it compiles them to an internal format. It steps through this internal format each time you query the expression. It is also possible to compile your expression directly to MSIL (the byte code to which C# is compiled). In the best possible scenario the Just-In-Time compiler then translates this MSIL code to machine code, giving another speed boost to your expression.</p>
<p>A note of caution: According to the MSDN team<a href="http://blogs.msdn.com/bclteam/archive/2004/11/12/256783.aspx"> the increase in speed can be up to 30%</a> which is nice but certainly isn&#8217;t amazing.</p>
<p>You can do this by setting the <em>RegexOptions.Compiled</em> option when you create a new Regex:</p>
<blockquote><p>Regex theExpression = new Regex(thePattern,RegexOptions.Compiled);</p></blockquote>
<p>The penalty for this is the time it takes to compile the expression, which can add significantly to your application&#8217;s start-up time. So although &#8220;compiled&#8221; might sound faster it might actually be slower. This is best applied if you frequently use the expression and it has a very long lifetime.</p>
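<p>Whether the trade-off pays off is easiest to judge by measuring it. The following is a rough sketch only: the class and method names, the input string and the iteration count are arbitrary choices, and the timings depend entirely on your machine and runtime.</p>
<pre class="brush: c#">
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class CompiledTimingSketch
    {
        // Runs IsMatch repeatedly and returns the elapsed time in milliseconds
        public static long TimeMatches(Regex expression, string input, int iterations)
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i &lt; iterations; i++)
            {
                expression.IsMatch(input);
            }
            watch.Stop();
            return watch.ElapsedMilliseconds;
        }

        public static void Main(string[] args)
        {
            string pattern = @&quot;(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})&quot;;
            string input = &quot;The server lives at 10.0.0.6 today&quot;;

            // Time the construction as well: this is where the compile cost is paid
            Stopwatch build = Stopwatch.StartNew();
            Regex compiled = new Regex(pattern, RegexOptions.Compiled);
            build.Stop();

            Regex interpreted = new Regex(pattern);

            Console.WriteLine(&quot;Creating the compiled Regex: {0} ms&quot;, build.ElapsedMilliseconds);
            Console.WriteLine(&quot;Interpreted: {0} ms&quot;, TimeMatches(interpreted, input, 500000));
            Console.WriteLine(&quot;Compiled:    {0} ms&quot;, TimeMatches(compiled, input, 500000));
        }
    }
}
</pre>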
<p><b>The expression cache</b></p>
<p>If you use many regular expressions the Regex cache is also an important factor in how quickly your code executes. Each time you use a pattern the library needs to parse it. Patterns passed to the static Regex methods (such as <em>Regex.IsMatch</em> or <em>Regex.Replace</em>) are kept in a cache, so a small set of frequently used expressions is not parsed over and over again. By default .NET caches the 15 most recently used expressions; any more and it has to parse them again as it encounters them. Regex objects that you construct yourself are not cached, so hold on to those and reuse them.</p>
<p>It is possible to expand the size of the cache by setting the <em>Regex.CacheSize</em> property to a higher value. This is probably best done after you made an overview of how many expressions are used by your code.</p>
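<p>A minimal sketch of raising the limit is shown below. The value 50 and the class and method names are arbitrary assumptions for the example; pick a size based on the number of distinct patterns your own code passes to the static Regex methods.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class CacheSizeSketch
    {
        // Uses a static Regex method, so the parsed pattern can be served from the cache
        public static bool ContainsIPv4Like(string input)
        {
            return Regex.IsMatch(input, @&quot;(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})&quot;);
        }

        public static void Main(string[] args)
        {
            // Set once at start-up, before the expressions are used (50 is a made-up value)
            Regex.CacheSize = 50;

            Console.WriteLine(ContainsIPv4Like(&quot;The server lives at 10.0.0.6&quot;));
        }
    }
}
</pre>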
<p><b>Compiling to an assembly</b></p>
<p>For compiling a regular expression to MSIL you need to pay a hefty price. But with your project about to ship it might be worthwhile to investigate taking your most frequently used regular expressions and putting them pre-compiled into a new assembly. The <em>Regex.CompileToAssembly</em> method performs this function. You will have to write a separate program to do the actual compilation, but once done you can link in the regular expression like any other assembly to your main application.</p>
<p>You can use the following class to create your own set of regular expressions and save them to a new assembly: </p>
<pre class="brush: c#">
using System;
using System.Collections;
using System.Text.RegularExpressions;

namespace CompileExpression
{
	class MainClass
	{
		// Add the expressions to the hash table
	 	public static Hashtable TheExpressions = new Hashtable();

		// CompileExpressions
		public static void CompileExpressions(string AssemblyName)
		{
			// Reserve space for each expression
			RegexCompilationInfo[] CI = new RegexCompilationInfo[TheExpressions.Count];

			int Cnt = 0;
        	foreach(DictionaryEntry de in TheExpressions)
        	{
				CI[Cnt++] = new RegexCompilationInfo((string)de.Value,		  // the reg. ex pattern
				                                     RegexOptions.Compiled,   // Options to specify
				                                     (string)de.Key,		  // name of the pattern
				                                     &quot;TheRegularExpressions&quot;, // name space name
				                                     true );                  // Public?
        	}

		   // Create a new assembly name structure
		   System.Reflection.AssemblyName aName = new System.Reflection.AssemblyName( );

		   // Assign the name
  		   aName.Name = AssemblyName;

		   // Compile all the regular expressions into the assembly
  		   Regex.CompileToAssembly(CI, aName);
		}

		public static void Main(string[] args)
		{
			// Add two expressions to the collection
			TheExpressions.Add(&quot;FindHTML&quot;,@&quot;(&lt;\/?[^&gt;]+&gt;)&quot;);
			TheExpressions.Add(&quot;FindTCPIP&quot;, @&quot;(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})&quot;);

			// Compile them to my new assembly called &quot;RegEx&quot;
			CompileExpressions(&quot;RegEx&quot;);
		}
	}
}
</pre>
<p>This will create a file called &#8220;RegEx.dll&#8221; in the current directory of your program. The next step is to verify that it works as advertised. Create a new project in Visual Studio and add a reference: in the Solution Explorer right-click the name of the new project, click &#8220;Add Reference&#8230;&#8221; and navigate to where the RegEx.dll file is located.</p>
<p>The following class will load the FindTCPIP expression from the DLL and execute it: </p>
<pre class="brush: c#">
using System;

namespace TCPSolution
{
   class Program
   {
       static void Main(string[] args)
       {
           TheRegularExpressions.FindTCPIP MatchTCP = new TheRegularExpressions.FindTCPIP();

           if (MatchTCP.Match(&quot;10.0.0.6&quot;).Success)
           {
               Console.WriteLine(&quot;This works!&quot;);
           }
       }
   }
}
</pre>
<h3>Regular Expressions and Mono</h3>
<p>I tested, prodded and played with the code for these regular expression posts on MonoDevelop and Mono, and everything ran with the exception of the final &#8220;Compile to DLL&#8221; example. The code for that example compiles, but on execution it throws a &#8220;Not Implemented&#8221; exception in <em>Regex.CompileToAssembly</em>. </p>
<h3>The end</h3>
<p>This ends the mini series of three posts on regular expressions. I hope you have enjoyed them. The previous posts in this series are: </p>
<ul>
<li><a href="http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/">Regular Expressions : The Basics</a>. The theory behind regular expressions. </li>
<li><a href="http://www.dijksterhuis.org/regular-expressions-csharp-practical-use/">Regular Expressions in C#: Practical Usage</a>. Examples of common usage. </li>
</ul>
<p>Image through Flickr by <a rel="nofollow" href="http://www.flickr.com/photos/djenan/">Djenan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dijksterhuis.org/regular-expressions-advanced/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Regular Expressions in C# &#8211; Practical Usage</title>
		<link>http://www.dijksterhuis.org/regular-expressions-csharp-practical-use/</link>
		<comments>http://www.dijksterhuis.org/regular-expressions-csharp-practical-use/#comments</comments>
		<pubDate>Tue, 10 Mar 2009 07:02:10 +0000</pubDate>
		<dc:creator>Martijn</dc:creator>
				<category><![CDATA[Regular Expressions]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://www.dijksterhuis.org/?p=808</guid>
		<description><![CDATA[This is the second post in the C# regular expression series and it follows up on &#8220;Regular Expressions in C# &#8211; The Basics&#8221; which explained the theory behind Regular expressions in C#. In this post we look at how to make practical use of regular expressions in our C# code. This post touches on four [...]<p>This is a post from <a href="http://www.dijksterhuis.org">Martijn's C# Coding Blog</a>. </p>
]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.dijksterhuis.org/wp-content/uploads/2009/03/lions1.jpg" alt="Regular Expression - Practical Usage" title="Regular Expression - Practical Usage" width="570" height="253" class="alignright size-full wp-image-826" /></p>
<p><i>This is the second post in the C# regular expression series and it follows up on &#8220;<a href="http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/">Regular Expressions in C# &#8211; The Basics</a>&#8221; which explained the theory behind Regular expressions in C#. In this post we look at how to make practical use of regular expressions in our C# code.</i> </p>
<p>This post touches on four major regular expression subjects:</p>
<ul>
<li><strong>String Comparison</strong> &#8211; does a string contain a particular sub-string?</li>
<li><strong>Splitting a string into segments</strong> &#8211; we will take an IPv4 address and retrieve its dotted components</li>
<li><strong>Replacement</strong> &#8211; modifying an input string</li>
<li><strong>Stricter input validation</strong> &#8211; how to harden your expressions</li>
</ul>
<p><span id="more-808"></span></p>
<h3>String Comparison &#8211; finding valid HTML tags</h3>
<p>One of the essential functions of regular expressions is finding out whether one string is contained inside another. The <strong>Regex.Matches</strong> method returns every place where the pattern matches the input string. </p>
<p>We start with a simple example: finding out where the letter &#8220;a&#8221; is mentioned in a sentence:</p>
<pre class="brush: c#">
            string Input = &quot;apples make for great party accessories&quot;;
            Regex FindA = new Regex(&quot;a&quot;);

            foreach(Match Tag in FindA.Matches(Input))
            {
                Console.WriteLine(&quot;Found &#039;a&#039; at {0}&quot;,Tag.Index);
            }
</pre>
<p>That was almost too easy. Regular expressions really shine if you don&#8217;t know exactly what you are looking for but you can describe it. In the following example we will look for all valid HTML tags in an input string.</p>
<p>What is a valid HTML tag? &lt;code&gt;, &lt;/code&gt;, &lt;b&gt;, &lt;img src=&quot;&quot;&gt; and &lt;br /&gt; are all valid HTML tags.</p>
<blockquote><p>Regex HTMLTag = new Regex(@&quot;(&lt;\/?[^&gt;]+&gt;)&quot;);</p></blockquote>
<p>To break this down:</p>
<ol>
<li>All valid HTML tags start with a &#8220;&lt;&#8221;</li>
<li>They may or may not have a forward slash: \/? (escaping the forward slash is not required in .NET, but it is harmless)</li>
<li>There are one or more characters which are not &#8220;&gt;&#8221;</li>
<li>The tag ends with a &#8220;&gt;&#8221;</li>
</ol>
<p>The following code example searches for all valid HTML tags in the input string:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            Regex HTMLTag = new Regex(@&quot;(&lt;\/?[^&gt;]+&gt;)&quot;);

            string Input = &quot;&lt;b&gt;&lt;i&gt;&lt;a href=&#039;http://apple.com&#039;&gt;Ipod News&lt;/a&gt;&lt;/b&gt;&lt;/i&gt;&quot;;

            foreach(Match Tag in HTMLTag.Matches(Input))
            {
                Console.WriteLine(&quot;Found {0}&quot;,Tag.Value);
            }
        }
    }
}
</pre>
<p>Resulting in: </p>
<div style="margin-left: 40px;">Found &lt;b&gt;<br />
Found &lt;i&gt;<br />
Found &lt;a href=&#039;http://apple.com&#039;&gt;<br />
Found &lt;/a&gt;<br />
Found &lt;/b&gt;<br />
Found &lt;/i&gt;</div>
<h3>Splitting a string into parts</h3>
<p>Parentheses () not only allow you to group your expressions into parts, they also allow you to split a single string into multiple segments which we can inspect individually.  To demonstrate we will use a regular expression to split an IPv4 address into its components. </p>
<p>A decimal IPv4 address looks like <b>XXX.XXX.XXX.XXX</b> with each X being a decimal digit. Each column has at least 1 digit, and a maximum of 3. So a single column can be described as &#8220;<b>(\d{1,3})</b>&#8221;. There are four columns, each separated by a dot. The dot (.) has a special meaning in regex so we need to escape it: <b>\.</b></p>
<p>The <b>Regex.Match</b> method returns a new <b>Match</b> instance. We can now test <b>Match.Success</b> to see if the input string matched the IP address pattern. Through the <b>Match.Groups</b> property we can then extract each of the four IP address columns. The zero entry in the Groups property is always the complete match, in this case &#8220;10.0.0.6&#8221;. The [1] entry contains the first group&#8217;s contents, [2] the second, etc. </p>
<pre class="brush: c#">
            string IPMatchExp = @&quot;(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})&quot;;
            Match theMatch  = Regex.Match(&quot;10.0.0.6&quot;,IPMatchExp);
            if (theMatch.Success)
            {
                Console.WriteLine(&quot;{0}.{1}.{2}.{3}&quot;,theMatch.Groups[1].Value,
                                                      theMatch.Groups[2].Value,
                                                      theMatch.Groups[3].Value,
                                                      theMatch.Groups[4].Value);
            }
</pre>
<h3>String Replacement</h3>
<p>It is often useful to manipulate a string by replacing the matched pattern with something new. The <b>Regex.Replace</b> method allows us to specify a pattern to look for and a replacement string. </p>
<p>The following example matches the last character and space following each word and replaces it with &#8220;b_&#8221;. </p>
<pre class="brush: c#">
            Regex Replacer = new Regex(@&quot;\w &quot;); // A single word character (letter, digit or underscore) followed by a space
            string Input  = &quot;ax bx sax dam pom&quot;;
            string Output = Replacer.Replace(Input,&quot;b_&quot;); // Replace all items found with a b and underscore
            Console.WriteLine(Output);
</pre>
<p><b>Substitution Patterns</b></p>
<p>What to do if you would like to flip parts of a string? C# offers several substitution patterns for this. Substitution patterns can only be used in a replacement string, and are used in combination with grouping. </p>
<p>They are useful if you would like to format the results of the match. A common task is to flip two words around.  In the below example we flip the name &#8220;Molly Malone&#8221; into &#8220;Malone Molly&#8221;: </p>
<pre class="brush: c#">
            Regex Replacer = new Regex(@&quot;(\w*) (\w*)&quot;);
            string Input  = &quot;Molly Malone&quot;;
            string Output = Replacer.Replace(Input,&quot;$2 $1&quot;);
            Console.WriteLine(Output);
</pre>
<p>The regular expression is defined as two groups of words (\w*) separated by a space. Each group can be referred to with a substitution pattern. $1 refers to the first group, $2 to the second (and if we had defined more $3 would be the third etc).</p>
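<p>Numbered groups become hard to follow once an expression has more than a few of them. Named groups can be used in a replacement string as well, through the <em>${name}</em> substitution pattern. The following is a minimal sketch; the class, method and group names are made up for this example.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class NamedSubstitutionSketch
    {
        public static string LastNameFirst(string name)
        {
            // Two named groups of word characters separated by a single space
            Regex Replacer = new Regex(@&quot;(?&lt;First&gt;\w+) (?&lt;Last&gt;\w+)&quot;);

            // ${Last} and ${First} refer to the groups by name instead of by number
            return Replacer.Replace(name, &quot;${Last}, ${First}&quot;);
        }

        public static void Main(string[] args)
        {
            Console.WriteLine(LastNameFirst(&quot;Molly Malone&quot;)); // prints: Malone, Molly
        }
    }
}
</pre>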
<h3>Input validation &#8211; we have to be more strict</h3>
<p>Often we need to check whether data entered by a user or read from a file matches a definition, so that we know it is valid. For this to work we need to ensure that our expressions only match valid input. Many expressions of convenience are defined too loosely. If we are to use them for input validation we need to harden them. </p>
<p>The pattern we used in an earlier example neatly broke down a valid IP address. But it wasn&#8217;t very strict and there are many combinations that would have matched that aren&#8217;t valid IP addresses. <b>999.999.999.999</b> is not a valid IPv4 address but it would have matched our pattern (<b>@&quot;(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})&quot;</b>). So we couldn&#8217;t have used it for testing for a valid IP address.</p>
<p>So what is a valid match? We need to define this first.</p>
<p>A valid IP address range is from <b>0.0.0.0</b> to <b>255.255.255.255</b> (with each column being represented by a byte).</p>
<p>At this point there are two things we can do: we can validate the results returned by our expression with a few additional lines of C# code, or we can modify our regular expression to become stricter. As this post is about regular expressions we will modify our expression to match only valid IP addresses.</p>
<p>How do we define valid ? 0,9,10,19,100,199,200,249,255 are all valid inputs for each column. 300 isn&#8217;t valid, and neither is 299. To keep things simple, we don&#8217;t allow 09 as a valid input. </p>
<ul>
<li>Single digit: 0 &#8211; 9 :&nbsp;&nbsp; [0-9]</li>
<li>Double digit: 10 &#8211; 99: [1-9][0-9]</li>
<li>Triple digit 1:&nbsp; 100 &#8211; 199:&nbsp; 1[0-9]{2}</li>
<li>Triple digit 2: 200 &#8211; 249:&nbsp; 2[0-4][0-9]</li>
<li>Triple digit 3: 250 &#8211; 255:&nbsp; 25[0-5]</li>
</ul>
<p>The single ([0-9]) and double digit ([1-9][0-9]) combinations can be combined into: <b>[1-9]?[0-9]</b>. (Read as: the first [1-9] is optional, it occurs 0 or 1 time.)</p>
<p>So a single column can be defined as:&nbsp;<b>(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.)</b> Note the escaped &#8220;.&#8221; at the end.</p>
<p>On the final column we do not need a &#8220;dot&#8221;. We can save some space by repeating the first expression three times with {3}, but we need to write out the fourth in full. Thus our expression becomes: <b>(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])</b></p>
<p>Not exactly easy to read, but let&#8217;s test to see if it works as expected. The following example program tries the numbers from 0 to 999, using the same number in all four columns.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            string IPTestExp = @&quot;(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])&quot;;

            for (int Lp = 0; Lp &lt;= 999; Lp++)
            {
                string IPAddress = String.Format(&quot;{0}.{0}.{0}.{0}&quot;,Lp);

                if (Regex.Match(IPAddress,IPTestExp).Success)
                    Console.WriteLine(&quot;{0} is valid&quot;,IPAddress);
                else
                {
                    Console.WriteLine(&quot;{0} is invalid&quot;,IPAddress);
                    break;
                }
            }
          }
    }
}
</pre>
<p>For brevity the program ends at the first invalid combination. If we had let it run it would have shown 256-999 as invalid.</p>
<div style="margin-left: 40px;">0.0.0.0 is valid<br />
1.1.1.1 is valid<br />
2.2.2.2 is valid<br />
&#8230;<br />
254.254.254.254 is valid<br />
255.255.255.255 is valid<br />
256.256.256.256 is invalid</div>
<p>This took a bit of work but we now have a single line test to see if a string is a valid IPv4 address.</p>
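<p>One caveat: <em>Regex.Match</em> looks for the pattern anywhere in the input, so the expression above also succeeds on a string that merely contains an address, such as &#8220;1.2.3.4567&#8221; (it matches the &#8220;1.2.3.45&#8221; part). If the whole string has to be an address, the expression can be anchored with ^ and $. The following is only a sketch of that idea; the class, field and method names are invented for this example.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class StrictIPv4Sketch
    {
        // One column, 0 - 255, as derived above
        const string Column = @&quot;([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])&quot;;

        // The expression from this post, without anchors
        const string Unanchored = &quot;(&quot; + Column + @&quot;\.){3}&quot; + Column;

        // ^ and $ force the match to cover the whole input. Note that $ still
        // accepts a single trailing newline; use \z in its place to rule that out.
        static readonly Regex Anchored = new Regex(&quot;^&quot; + Unanchored + &quot;$&quot;);

        public static bool ContainsIPv4(string input)
        {
            return Regex.IsMatch(input, Unanchored);
        }

        public static bool IsValidIPv4(string input)
        {
            return Anchored.IsMatch(input);
        }

        public static void Main(string[] args)
        {
            string[] Tests = new string[] { &quot;10.0.0.6&quot;, &quot;1.2.3.4567&quot;, &quot;256.256.256.256&quot; };

            foreach (string Test in Tests)
            {
                Console.WriteLine(&quot;{0}: contains = {1}, valid = {2}&quot;,
                                  Test, ContainsIPv4(Test), IsValidIPv4(Test));
            }
        }
    }
}
</pre>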
<p><b>Concluding</b></p>
<p>This ends the second post in this series. In the next post I will look at some advanced regular expression topics. </p>
<p>If you would like to read more on the theory behind regular expressions have a look at the first post in the series: <a href="http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/">Regular Expressions in C# &#8211; The Basics</a></p>
<p>Image credit: <a rel="nofollow" href="http://www.flickr.com/photos/tambako/">Tambako</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dijksterhuis.org/regular-expressions-csharp-practical-use/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Regular Expressions in C# &#8211; The Basics</title>
		<link>http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/</link>
		<comments>http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/#comments</comments>
		<pubDate>Mon, 09 Mar 2009 03:49:22 +0000</pubDate>
		<dc:creator>Martijn</dc:creator>
				<category><![CDATA[Regular Expressions]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://www.dijksterhuis.org/?p=789</guid>
		<description><![CDATA[One of the most common coding tasks is to take an input, munch it around and turn it into something different altogether. Are you looking for FedEx numbers in a text file? Do you want to replace &#8220;love&#8221; with &#8220;hate&#8221; in your source files? Is a string a valid e-mail address? Problems like these can [...]<p>This is a post from <a href="http://www.dijksterhuis.org">Martijn's C# Coding Blog</a>. </p>
]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.dijksterhuis.org/wp-content/uploads/2009/03/expression.jpg" alt="Regular Expressions in C#" title="Regular Expressions in C#" width="580" height="206" class="alignright size-full wp-image-800" /></p>
<p><em>One of the most common coding tasks is to take an input, munch it around and turn it into something different altogether. Are you looking for FedEx numbers in a text file? Do you want to replace &#8220;love&#8221; with &#8220;hate&#8221; in your source files? Is a string a valid e-mail address? Problems like these can be solved by applying regular expressions, or &#8220;regex&#8221; for short. </em><br />
<span id="more-789"></span></p>
<h3>Introduction</h3>
<p>This post explores the basic theory of expressions. If you are already familiar with them and want to know how to use them in your own C# programs have a look at the next post: &#8220;<a href="http://www.dijksterhuis.org/regular-expressions-csharp-practical-use/">Regular Expressions in C# &#8211; Practical Usage</a>&#8221;. </p>
<p>Expressions offer a method of describing and testing for particular combinations of characters in a string. A simple regular expression can often save you from having to write many lines of regular code.</p>
<ul>
<li>Are you looking for the characters &#8220;car&#8221;  in &#8220;cartoon&#8221;, &#8220;carbonate&#8221; or  &#8220;carton&#8221; ?</li>
<li>Do you want to only match when the word &#8220;car&#8221; is standing by itself as in  &#8220;car sales for 2009&#8221; ?</li>
<li>Or only return true when the car is red or blue ? &#8220;blue car&#8221;/ &#8220;red car&#8221; / &#8220;green car&#8221;</li>
</ul>
<p>In C# expressions are provided by the <em>Regex</em> class in the <em>System.Text.RegularExpressions</em> namespace.</p>
<p>The expressions themselves are more or less standard between computer languages. You can often take an expression from another language and with little or no work apply it to your C# code. If you are not familiar with them yet you should consider learning to use them.</p>
<div id="attachment_797" class="wp-caption alignright" style="width: 160px"><a href="http://xkcd.com/208/"><img src="http://www.dijksterhuis.org/wp-content/uploads/2009/03/3005983191_41ca486eec-150x150.jpg" alt="Regular Expressions to the rescue" title="Regular Expressions to the rescue" width="150" height="150" class="size-thumbnail wp-image-797" /></a><p class="wp-caption-text">Regular Expressions to the rescue</p></div>
<p><strong>What can you use regular expressions for?</strong></p>
<ul>
<li><strong>Data capture</strong>: split a string into multiple fields which you can manipulate. 13-Jan-2006 becomes (day,month,year)</li>
<li><strong>Data input validation</strong>: Check if the input followed the required formatting rules. For example test if a valid telephone number was entered.</li>
<li><strong>String comparison</strong>: Does A exist in B?</li>
<li><strong>String replacement</strong>: Replace &#8220;foo&#8221; with &#8220;bar&#8221;</li>
<li><strong>Code size reduction</strong>: One line of regular expression code can replace a large amount of dedicated code</li>
</ul>
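<p>As a rough illustration of the data capture case, the sketch below splits the date from the list above into its three fields. The class and method names (<em>DateCaptureDemo</em>, <em>SplitDate</em>) are made up for this example, and the pattern assumes a three-letter month name:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class DateCaptureDemo
{
        // Hypothetical helper: returns { day, month, year }, or null if the input has another format
        public static string[] SplitDate(string input)
        {
                Match m = Regex.Match(input, @&quot;^(\d{1,2})-([A-Za-z]{3})-(\d{4})$&quot;);
                if (!m.Success) return null;
                return new string[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
        }

        static void Main()
        {
                string[] parts = SplitDate(&quot;13-Jan-2006&quot;);
                Console.WriteLine(&quot;{0} / {1} / {2}&quot;, parts[0], parts[1], parts[2]); // Writes: 13 / Jan / 2006
        }
}
</pre>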
<p><strong>When not to use regular expressions?</strong></p>
<p>Don&#8217;t use them when <strong>speed</strong> is of the essence. Expressions have a serious drawback in that they can be slow to execute. If you are concerned about optimizing a part of your code it can be worthwhile to write your own replacement. In <a id="s-38" title="a previous post" href="../manipulating-strings-in-csharp-replacing-part-string/">a previous post</a> I noticed that a simple string replacement routine was 40 times faster than the regular expression equivalent.</p>
<h3>The basics</h3>
<p>To understand expressions we need a little bit of theory. This bit explains all the main operators and how to use them.</p>
<p><strong>Literal characters</strong></p>
<p>The most basic expression contains a single character. If we define &#8220;c&#8221; as the expression and test it against &#8220;car company&#8221; it will match against the &#8220;c&#8221; in &#8220;car&#8221;. If we ask the Regex class to search again it will match against the &#8220;c&#8221; in &#8220;company&#8221;.</p>
<p>Several characters have a special meaning: ?, +, *, |, \, [, ], (, ), {, }, . (dot), ^ and $</p>
<p>If we want to include them we need to escape them first using a backslash:</p>
<ul>
<li>10 * 10 = 100 <strong><em style="color: #ff0000;">wrong</em></strong></li>
<li><em>10 \* 10 = 100 <strong><span style="color: #00ff00;">ok</span></strong></em></li>
</ul>
<p><em>Normally the C# compiler processes escape sequences such as \n and \r inside a string literal. Regular expressions usually contain many backslashes of their own. Prefixing the literal with &#8220;@&#8221; makes it a verbatim string, so the compiler leaves the backslashes alone and passes them to the regular expression unchanged.<br />
</em></p>
<div style="margin-left: 40px;">string exampleLiteral = @&quot;10 \* 10 = 100&quot;;</div>
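<p>To see the difference escaping makes, here is a small sketch. The <em>ContainsLiteral</em> helper is a name invented for this example; it relies on <em>Regex.Escape</em>, which adds the backslashes for you when you only want to search for literal text:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
        // Hypothetical helper: searches for literal text, letting Regex.Escape add the backslashes
        public static bool ContainsLiteral(string input, string literal)
        {
                return Regex.IsMatch(input, Regex.Escape(literal));
        }

        static void Main()
        {
                string input = &quot;10 * 10 = 100&quot;;
                Console.WriteLine(Regex.IsMatch(input, @&quot;10 \* 10 = 100&quot;));  // True: verbatim string
                Console.WriteLine(Regex.IsMatch(input, &quot;10 \\* 10 = 100&quot;));  // True: regular string, backslash doubled
                Console.WriteLine(Regex.IsMatch(input, &quot;10 * 10 = 100&quot;));    // False: &quot; *&quot; means zero or more spaces
                Console.WriteLine(ContainsLiteral(input, &quot;10 * 10&quot;));        // True
        }
}
</pre>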
<p><strong>Character Sets</strong></p>
<p>Character sets allow us to limit the characters that can match. Say for example we want to use just the numbers 0-9: <strong>[0-9]</strong> , or the characters a-z &amp; A-Z: <strong>[a-zA-Z]</strong>. A character set matches exactly one character, so &#8220;c[a-z]kie&#8221; matches &#8220;cokie&#8221; but not &#8220;cookie&#8221;.</p>
<p>You can also define your own sets. If you are matching a date, the separator can be defined as a space, dash or slash: <strong>[ /-]</strong>. The dash goes last because a dash between two characters denotes a range: <strong>[ -/]</strong> would match every character from the space up to the slash.</p>
<p>Many character sets are used so often that they have been given their own shorthands:</p>
<ul>
<li>\w matches any word character: a letter, a digit or the underscore [a-zA-Z0-9_]</li>
<li>\s matches any whitespace (space, tab, line break)</li>
<li>\d matches against any digit [0-9]</li>
</ul>
<p>For a longer list of the available shorthands have a look at my <a id="tv6h" title="C# Regular Expression Cheat Sheet" href="../csharp-regular-expression-operator-cheat-sheet/">C# Regular Expression Cheat Sheet</a>.</p>
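<p>A short sketch of sets and shorthands in action. The helper names are invented for this example, and the ^ and $ anchors (covered under Anchoring further down) are only there to force the whole input to match:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class CharacterSetDemo
{
        // A set stands for exactly one character between &quot;c&quot; and &quot;kie&quot;
        public static bool IsCookieLike(string s) { return Regex.IsMatch(s, &quot;^c[a-z]kie$&quot;); }

        // \w+ : one or more letters, digits or underscores
        public static bool IsSingleWord(string s) { return Regex.IsMatch(s, @&quot;^\w+$&quot;); }

        static void Main()
        {
                Console.WriteLine(IsCookieLike(&quot;cokie&quot;));   // True
                Console.WriteLine(IsCookieLike(&quot;cookie&quot;));  // False: two characters where the set allows one
                Console.WriteLine(IsSingleWord(&quot;R2_D2&quot;));   // True
                Console.WriteLine(IsSingleWord(&quot;R2-D2&quot;));   // False: the dash is not a word character
        }
}
</pre>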
<p><strong>The Dot is special</strong></p>
<p>The dot &#8220;.&#8221; matches against any character, except for line breaks. You should use it sparingly as it can introduce unwanted results. Often it is better to be more specific, using \w or \d, or a character set that limits the set of possible characters.</p>
<ul>
<li><strong>&#8220;g..gle&#8221;</strong> matches &#8220;google&#8221;, &#8220;gaagle&#8221;,&#8221;g%$gle&#8221; and much more.</li>
<li><strong>&#8220;\d\d.\d\d.\d\d&#8221;</strong> matches a valid date such as &#8220;12-08-99&#8221; and &#8220;12/08/99&#8221; but also an invalid date: &#8220;12508799&#8221;</li>
</ul>
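<p>The date example can be checked with a few lines of code. In this sketch (method names are hypothetical) the second pattern swaps the dot for a small character set, which rejects the run of digits:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class DotDemo
{
        // The dot accepts any separator, including another digit
        public static bool LooseDate(string s) { return Regex.IsMatch(s, @&quot;^\d\d.\d\d.\d\d$&quot;); }

        // [-/] accepts only a dash or a slash as separator
        public static bool StrictDate(string s) { return Regex.IsMatch(s, @&quot;^\d\d[-/]\d\d[-/]\d\d$&quot;); }

        static void Main()
        {
                Console.WriteLine(LooseDate(&quot;12-08-99&quot;));   // True
                Console.WriteLine(LooseDate(&quot;12508799&quot;));   // True, although this is not a date
                Console.WriteLine(StrictDate(&quot;12/08/99&quot;));  // True
                Console.WriteLine(StrictDate(&quot;12508799&quot;));  // False
        }
}
</pre>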
<p><strong>Creating alternatives using the boolean &#8220;or&#8221;</strong></p>
<p>A vertical bar (|) separates alternatives. Alternation binds more loosely than anything else, so &#8220;red|blue car&#8221; means &#8220;red&#8221; or &#8220;blue car&#8221;. To match either a red or a blue car, put the alternatives in parentheses: &#8220;(red|blue) car&#8221;. Written in C# code: </p>
<p>if (Regex.Match(&quot;blue car&quot;,&quot;(blue|red) car&quot;).Success)<br />
Console.WriteLine(&quot;Matches!&quot;);</p>
<p>You can add as many alternatives as you like, so &#8220;(red|blue|purple|yellow) car&#8221; is possible too.</p>
<p><strong>Grouping with parentheses ()</strong></p>
<p>Parentheses () group parts of an expression together. To match either &#8220;color&#8221; or &#8220;colour&#8221; you could write one of:</p>
<ul>
<li>col(o|ou)r</li>
<li>(color|colour)</li>
</ul>
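<p>The sketch below shows how alternation and grouping interact. Without parentheses the vertical bar splits the whole pattern in two; with them it only applies inside the group. The helper names are made up for the example:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
        // The group limits the alternation to the colour
        public static bool IsRedOrBlueCar(string s) { return Regex.IsMatch(s, &quot;(red|blue) car&quot;); }

        // Either spelling of the word, and nothing else
        public static bool IsColor(string s) { return Regex.IsMatch(s, &quot;^col(o|ou)r$&quot;); }

        static void Main()
        {
                // Ungrouped: the alternatives are &quot;red&quot; and &quot;blue car&quot;
                Console.WriteLine(Regex.IsMatch(&quot;red bicycle&quot;, &quot;red|blue car&quot;)); // True
                Console.WriteLine(IsRedOrBlueCar(&quot;red bicycle&quot;));  // False
                Console.WriteLine(IsRedOrBlueCar(&quot;blue car&quot;));     // True
                Console.WriteLine(IsRedOrBlueCar(&quot;green car&quot;));    // False
                Console.WriteLine(IsColor(&quot;colour&quot;));              // True
        }
}
</pre>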
<p><strong>Repetition</strong></p>
<p>A repetition quantifier specifies how often a preceding element is allowed to repeat.</p>
<table border="0" cellspacing="1" cellpadding="1">
<tbody>
<tr style="vertical-align: top;">
<td style="width: 15px;"><code><strong>?</strong></code></td>
<td>A question mark indicates <em>zero or one</em> of the preceding element. For example &#8220;S?DRAM&#8221; matches &#8220;SDRAM&#8221; and &#8220;DRAM&#8221;</td>
</tr>
<tr style="vertical-align: top;">
<td><code><strong>*</strong></code></td>
<td>The asterisk indicates there are <em>zero or more</em> of the preceding element. For example, <code>ab*c</code> matches &#8220;<em>ac</em>&#8220;, &#8220;<em>abc</em>&#8220;, &#8220;<em>abbc</em>&#8220;, &#8220;<em>abbbc</em>&#8220;, and so on.</td>
</tr>
<tr style="vertical-align: top;">
<td><code><strong>+</strong></code></td>
<td>The plus sign indicates that there is <em>one or more</em> of the preceding element. For example, <code>ab+c</code> matches &#8220;<em>abc</em>&#8220;, &#8220;<em>abbc</em>&#8220;, &#8220;<em>abbbc</em>&#8220;, and so on, but not &#8220;<em>ac</em>&#8220;.</td>
</tr>
<tr style="vertical-align: top;">
<td>{n}<br />{n,}<br />{n,m}</td>
<td>To match an exact number of times use <em>{n}</em>; for at least <em>n</em> times use <em>{n,}</em>. For at least <em>n</em> but no more than <em>m</em> times use <em>{n,m}</em></td>
</tr>
</tbody>
</table>
<p>To give some examples: </p>
<ul>
<li>\d{1,3} reads as &#8220;a decimal digit (0-9)&#8221;, minimum of 1, maximum of 3</li>
<li>[a-z]+ reads as &#8220;one or more of a-z&#8221;: &#8220;abc&#8221; matches, and so does &#8220;axxxz&#8221;</li>
</ul>
<p>In the following example &#8220;aab&#8221; matches, but so does &#8220;aaab&#8221;.</p>
<div style="margin-left: 40px;">// a{2,3}b reads as: 2 or 3 times a, followed by a b<br />
if (Regex.Match(&quot;aab&quot;,&quot;a{2,3}b&quot;).Success)<br />
Console.WriteLine(&quot;Matches!&quot;);<br />
else<br />
Console.WriteLine(&quot;No Match!&quot;);</div>
<p>Repetition is useful for testing if an input matches a required pattern. If you need to test for a telephone number formatted as : <strong>XXX-XXXX</strong> you could write this as <strong>\d{3}[-]\d{4}</strong>.</p>
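<p>A minimal sketch of that telephone check, assuming the whole input has to be the number (hence the ^ and $ anchors, which are described under Anchoring below). <em>IsPhoneNumber</em> is a hypothetical helper name:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class RepetitionDemo
{
        // Exactly three digits, a dash, exactly four digits
        public static bool IsPhoneNumber(string s)
        {
                return Regex.IsMatch(s, @&quot;^\d{3}-\d{4}$&quot;);
        }

        static void Main()
        {
                Console.WriteLine(IsPhoneNumber(&quot;555-1234&quot;));   // True
                Console.WriteLine(IsPhoneNumber(&quot;55-1234&quot;));    // False: only two leading digits
                Console.WriteLine(IsPhoneNumber(&quot;555-12345&quot;));  // False: five trailing digits
        }
}
</pre>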
<p><strong>Lazy and Greedy matching</strong></p>
<p>All the above repetition operators are &#8220;greedy&#8221;: they match the longest possible string they can find.</p>
<ul>
<li><strong>a[b-z]+z </strong>against &#8220;<strong>abcbzcdze</strong>&#8221; returns &#8220;<strong>abcbzcdz</strong>&#8221;</li>
<li><tt class="regex"><strong>&lt;a.+&gt;</strong></tt> against &#8220;<strong>&lt;a href='index.php'&gt;Beginning&lt;/a&gt;</strong>&#8221; matches everything, instead of just the opening <tt class="regex"><strong>&lt;a href='index.php'&gt;</strong></tt>.</li>
</ul>
<p>To avoid this we can apply &#8220;lazy&#8221; matching instead. A lazy quantifier matches as few characters as possible: as soon as the rest of the expression can match, the parser stops and returns the result. You can make a quantifier lazy by simply adding a question mark after it:</p>
<ul>
<li><strong>a[b-z]+?z </strong>against &#8220;<strong>abcbzcdze</strong>&#8221; returns &#8220;<strong>abcbz</strong>&#8221;</li>
<li><tt class="regex"><strong>&lt;a.+?&gt;</strong></tt> against &#8220;<strong>&lt;a href='index.php'&gt;Beginning&lt;/a&gt;</strong>&#8221; returns <tt class="regex"><strong>&lt;a href='index.php'&gt;</strong></tt>.</li>
</ul>
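<p>The anchor tag example translates directly into code. In this sketch the two (hypothetically named) helpers differ only in the question mark after the plus sign:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class GreedyLazyDemo
{
        // Greedy: .+ grabs as much as it can before the final &gt;
        public static string MatchGreedy(string html) { return Regex.Match(html, &quot;&lt;a.+&gt;&quot;).Value; }

        // Lazy: .+? stops at the first &gt; it can reach
        public static string MatchLazy(string html) { return Regex.Match(html, &quot;&lt;a.+?&gt;&quot;).Value; }

        static void Main()
        {
                string html = &quot;&lt;a href='index.php'&gt;Beginning&lt;/a&gt;&quot;;
                Console.WriteLine(MatchGreedy(html)); // Writes: &lt;a href='index.php'&gt;Beginning&lt;/a&gt;
                Console.WriteLine(MatchLazy(html));   // Writes: &lt;a href='index.php'&gt;
        }
}
</pre>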
<p><strong>Anchoring</strong></p>
<p>All the above examples didn&#8217;t care where in the string the match was made. You could also use them repeatedly to find more instances of the match in the input string. Anchoring allows you to match only those strings that sit at the beginning and/or the end of the input.</p>
<ul>
<li><strong>^string </strong>reads as: only match if &#8220;string&#8221; is at the beginning of the input. The &#8220;^&#8221; indicates the beginning. So &#8220;string of wool&#8221; matches, but &#8220;woolly string&#8221; doesn&#8217;t.</li>
<li><strong>string$ </strong>reads as: only match if &#8220;string&#8221; is at the end of the input. Here the &#8220;$&#8221; indicates the end. In this case &#8220;string of wool&#8221; can&#8217;t match, but &#8220;woolly string&#8221; can.</li>
<li><strong>^string$</strong> reads as: only match if &#8220;string&#8221; is the whole input. The &#8220;s&#8221; comes as the first character, and the &#8220;g&#8221; as the last. So only &#8220;string&#8221; can match this pattern.</li>
</ul>
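<p>The three anchored patterns can be tried out with the sketch below; the helper names are invented for the example:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
        public static bool AtStart(string s) { return Regex.IsMatch(s, &quot;^string&quot;); }
        public static bool AtEnd(string s)   { return Regex.IsMatch(s, &quot;string$&quot;); }
        public static bool Whole(string s)   { return Regex.IsMatch(s, &quot;^string$&quot;); }

        static void Main()
        {
                Console.WriteLine(AtStart(&quot;string of wool&quot;)); // True
                Console.WriteLine(AtStart(&quot;woolly string&quot;));  // False
                Console.WriteLine(AtEnd(&quot;woolly string&quot;));    // True
                Console.WriteLine(Whole(&quot;string&quot;));           // True
                Console.WriteLine(Whole(&quot;string of wool&quot;));   // False
        }
}
</pre>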
<p>This ends the theoretical introduction to Regular Expressions &#8212; see also the next post &#8220;<a href="http://www.dijksterhuis.org/regular-expressions-csharp-practical-use/">Regular Expressions in C# &#8211; Practical Applications</a>&#8221; .</p>
<p>Image credit: <a rel="nofollow" href="http://www.flickr.com/photos/sarae/2082776106/">Sarae</a></p>
<p>This is a post from <a href="http://www.dijksterhuis.org">Martijn's C# Coding Blog</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.dijksterhuis.org/regular-expressions-in-csharp-the-basics/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>C# Regular Expression Cheat Sheet</title>
		<link>http://www.dijksterhuis.org/csharp-regular-expression-operator-cheat-sheet/</link>
		<comments>http://www.dijksterhuis.org/csharp-regular-expression-operator-cheat-sheet/#comments</comments>
		<pubDate>Fri, 06 Mar 2009 05:38:31 +0000</pubDate>
		<dc:creator>Martijn</dc:creator>
				<category><![CDATA[Learn C#]]></category>
		<category><![CDATA[Regular Expressions]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://www.dijksterhuis.org/?p=769</guid>
		<description><![CDATA[I have been doing quite a bit with regular expressions recently and to avoid having to look them up again and again I made myself a little table with the most important C# regular expression operators and stuck it on the wall. This post contains the C# regular expression operators as used by the .NET [...]<p>This is a post from <a href="http://www.dijksterhuis.org">Martijn's C# Coding Blog</a>. </p>
]]></description>
			<content:encoded><![CDATA[<p>I have been doing quite a bit with regular expressions recently and to avoid having to look them up again and again I made myself a little table with the most important C# regular expression operators and stuck it on the wall. This post contains the C# regular expression operators as used by the .NET regular expression classes such as <em>Regex</em>.</p>
<p>If you would like to print this, click here for a <a href='http://www.dijksterhuis.org/wp-content/uploads/2009/03/regular_expressions_in_.html'>pure HTML version</a>. </p>
<p><span id="more-769"></span></p>
<h3>Escape Characters</h3>
<table width="100%" border="1" cellpadding="0" cellspacing="0">
<tr>
<td width="20%"><b>Character</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td>ordinary characters</td>
<td>Characters other than . $ ^ { [ ( | ) * + ? \ match<br />
themselves.</td>
</tr>
<tr>
<td>. (dot)</td>
<td>Matches any character except \n (with the Singleline option it matches \n as well)</td>
</tr>
<tr>
<td>\w</td>
<td>Matches any word character. </td>
</tr>
<tr>
<td>\W</td>
<td>The negation of \w</td>
</tr>
<tr>
<td>\s</td>
<td>Matches any white-space character.</td>
</tr>
<tr>
<td>\S</td>
<td>Matches any non-white-space character. </td>
</tr>
<tr>
<td>\d</td>
<td>Matches any decimal digit. </td>
</tr>
<tr>
<td>\D</td>
<td>Matches any character that is not a decimal digit.</td>
</tr>
<tr>
<td><b>\a</b></td>
<td>Matches a bell (alarm) \u0007.</td>
</tr>
<tr>
<td><b>\b</b></td>
<td>Matches a backspace \u0008 if in a [] character class</td>
</tr>
<tr>
<td><b>\t</b></td>
<td>Matches a tab</td>
</tr>
<tr>
<td><b>\r</b></td>
<td>Carriage return</td>
</tr>
<tr>
<td><b>\v</b></td>
<td>Vertical tab</td>
</tr>
<tr>
<td><b>\f</b></td>
<td>Form feed</td>
</tr>
<tr>
<td><b>\n</b></td>
<td>New line</td>
</tr>
<tr>
<td><b>\e</b></td>
<td>Matches an escape</td>
</tr>
<tr>
<td><b>\040</b></td>
<td>Matches an ASCII character as octal (up to three digits);</td>
</tr>
<tr>
<td><b>\x20</b></td>
<td>Matches an ASCII character using hexadecimal representation<br />
(exactly two digits).</td>
</tr>
<tr>
<td><b>\cC</b></td>
<td>Matches an ASCII control character; for example, \cC is<br />
control-C.</td>
</tr>
<tr>
<td><b>\u0020</b></td>
<td>Matches a Unicode character using hexadecimal representation<br />
(exactly four digits).</td>
</tr>
<tr>
<td><b>\</b></td>
<td>When followed by a character that is not recognized as an<br />
escaped character, matches that character. For example, <b>\*</b><br />
is the same as <b>\x2A</b>.</td>
</tr>
</table>
<p></p>
<h3>Alternation</h3>
<table width="100%" cellpadding="0" cellspacing="0" border="1">
<tbody>
<tr>
<th width="20%"><b>Alternation</b></th>
<th><b>Definition</b></th>
</tr>
<tr>
<td><b>|</b></td>
<td>Matches any one of the terms separated by the | (vertical bar)<br />
character; for example, <span>cat|dog|tiger</span>. The leftmost<br />
successful match wins.</td>
</tr>
<tr>
<td><b>(?(</b><i>expression</i><b>)yes|no)</b></td>
<td>Matches the &#8220;yes&#8221; part if the expression matches at this point;<br />
otherwise, matches the &#8220;no&#8221; part.&nbsp;</td>
</tr>
<tr>
<td><b>(?(</b><i>name</i><b>)yes|no)</b></td>
<td>Matches the &#8220;yes&#8221; part if the named capture string has a match;<br />
otherwise, matches the &#8220;no&#8221; part.</td>
</tr>
</tbody>
</table>
<p></p>
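<p>A small sketch of both kinds of alternation. The first call shows that the first alternative that succeeds at a position wins, even when a later one would match more. The <em>IsAreaCode</em> helper is a made-up example of the <b>(?(</b><i>name</i><b>)yes|no)</b> form with an empty &#8220;no&#8221; part: a closing parenthesis is required only if the group named <em>open</em> captured an opening one.</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class ConditionalDemo
{
        // Hypothetical validator: three digits, optionally wrapped in a matching pair of parentheses
        public static bool IsAreaCode(string s)
        {
                return Regex.IsMatch(s, @&quot;^(?&lt;open&gt;\()?\d{3}(?(open)\))$&quot;);
        }

        static void Main()
        {
                Console.WriteLine(Regex.Match(&quot;catalog&quot;, &quot;cat|catalog&quot;).Value); // Writes: cat
                Console.WriteLine(IsAreaCode(&quot;(555)&quot;)); // True
                Console.WriteLine(IsAreaCode(&quot;555&quot;));   // True
                Console.WriteLine(IsAreaCode(&quot;(555&quot;));  // False: opening parenthesis without a closing one
                Console.WriteLine(IsAreaCode(&quot;555)&quot;));  // False: closing parenthesis without an opening one
        }
}
</pre>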
<h3>Substitutions</h3>
<table width="100%" border="1" cellpadding="0" cellspacing="0">
<tr>
<td width="20%"><b>Character</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td><b>$</b><i>number</i></td>
<td>Substitutes the last substring matched by group number<br />
<i>number</i> (decimal).</td>
</tr>
<tr>
<td><b>${</b><i>name</i><b>}</b></td>
<td>Substitutes the last substring matched by a<br />
(?&lt;<i>name</i>&gt; ) group.</td>
</tr>
<tr>
<td><b>$$</b></td>
<td>Substitutes a single &#8220;$&#8221; literal.</td>
</tr>
<tr>
<td><b>$&amp;</b></td>
<td>Substitutes a copy of the entire match itself.</td>
</tr>
<tr>
<td><b>$`</b></td>
<td>Substitutes all the text of the input string before the<br />
match.</td>
</tr>
<tr>
<td><b>$&#39;</b></td>
<td>Substitutes all the text of the input string after the<br />
match.</td>
</tr>
<tr>
<td><b>$+</b></td>
<td>Substitutes the last group captured.</td>
</tr>
<tr>
<td><b>$_</b></td>
<td>Substitutes the entire input string.</td>
</tr>
</table>
<p></p>
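<p>Substitutions are used in the replacement string of <em>Regex.Replace</em>. The sketch below tries three of them; the helper names and sample inputs are made up for the example:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class SubstitutionDemo
{
        // $1, $2, $3 refer to the groups by number
        public static string Reorder(string date)
        {
                return Regex.Replace(date, @&quot;(\d+)-(\w+)-(\d+)&quot;, &quot;$3 $2 $1&quot;);
        }

        // ${name} refers to a named group
        public static string ReorderNamed(string date)
        {
                return Regex.Replace(date, @&quot;(?&lt;day&gt;\d+)-(?&lt;month&gt;\w+)-(?&lt;year&gt;\d+)&quot;, &quot;${month} ${day}, ${year}&quot;);
        }

        // $$ is a literal dollar sign, $&amp; is the whole match
        public static string AddDollar(string text)
        {
                return Regex.Replace(text, @&quot;\d+&quot;, &quot;$$$&amp;&quot;);
        }

        static void Main()
        {
                Console.WriteLine(Reorder(&quot;13-Jan-2006&quot;));      // Writes: 2006 Jan 13
                Console.WriteLine(ReorderNamed(&quot;13-Jan-2006&quot;)); // Writes: Jan 13, 2006
                Console.WriteLine(AddDollar(&quot;price: 5&quot;));       // Writes: price: $5
        }
}
</pre>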
<h3>Word boundaries</h3>
<table width="100%" border="1" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td width="20%"><b>Assertion</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td><b>^</b></td>
<td>Specifies that the match must occur at the beginning of the<br />
string or the beginning of the line.</td>
</tr>
<tr>
<td><b>$</b></td>
<td>Specifies that the match must occur at the end of the string,<br />
before <b>\n</b> at the end of the string, or at the end of the<br />
line.</td>
</tr>
<tr>
<td><b>\A</b></td>
<td>Specifies that the match must occur at the beginning of the<br />
string (ignores the <b>Multiline</b> option).</td>
</tr>
<tr>
<td><b>\Z</b></td>
<td>Specifies that the match must occur at the end of the string or<br />
before <b>\n</b> at the end of the string (ignores the<br />
<b>Multiline</b> option).</td>
</tr>
<tr>
<td><b>\z</b></td>
<td>Specifies that the match must occur at the end of the string<br />
(ignores the <b>Multiline</b> option).</td>
</tr>
<tr>
<td><b>\G</b></td>
<td>Specifies that the match must occur at the point where the<br />
previous match ended. When used with Match.NextMatch(), this<br />
ensures that matches are all contiguous.</td>
</tr>
<tr>
<td><b>\b</b></td>
<td>Specifies that the match must occur on a boundary between<br />
<b>\w</b> (alphanumeric) and <b>\W</b> (nonalphanumeric)<br />
characters. The match must occur on word boundaries (that is, at<br />
the first or last characters in words separated by any<br />
nonalphanumeric characters). The match can also occur on a word<br />
boundary at the end of the string.</td>
</tr>
<tr>
<td><b>\B</b></td>
<td>Specifies that the match must not occur on a <b>\b</b><br />
boundary.</td>
</tr>
</tbody>
</table>
<p></p>
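<p>Two of these assertions in a short sketch: <b>\b</b> to find a word standing by itself, and <b>^</b> with and without the <b>Multiline</b> option. The helper names are hypothetical:</p>
<pre class="brush: c#">
using System;
using System.Text.RegularExpressions;

static class BoundaryDemo
{
        // \b on both sides: &quot;car&quot; must not be part of a longer word
        public static bool HasWordCar(string s) { return Regex.IsMatch(s, @&quot;\bcar\b&quot;); }

        // Counts the places where a word starts right at a ^ position
        public static int CountLineStarts(string s, RegexOptions options)
        {
                return Regex.Matches(s, @&quot;^\w+&quot;, options).Count;
        }

        static void Main()
        {
                Console.WriteLine(HasWordCar(&quot;car sales for 2009&quot;)); // True
                Console.WriteLine(HasWordCar(&quot;cartoon&quot;));            // False
                Console.WriteLine(CountLineStarts(&quot;one\ntwo&quot;, RegexOptions.None));      // 1: ^ is the start of the string only
                Console.WriteLine(CountLineStarts(&quot;one\ntwo&quot;, RegexOptions.Multiline)); // 2: ^ is also the start of each line
        }
}
</pre>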
<h3>Quantifiers</h3>
<table width="100%" border="1" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td width="20%">*</td>
<td>Matches the preceding element zero or more times. It is<br />
equivalent to <b>{0,}</b>. <span>*</span> is a greedy quantifier<br />
whose non-greedy equivalent is <span class="input">*?</span>.</td>
</tr>
<tr>
<td>+</td>
<td>Matches the preceding element one or more times. It is<br />
equivalent to <span>{1,}</span>. <span class="input">+</span> is a<br />
greedy quantifier whose non-greedy equivalent is<br />
<span>+?</span>.</td>
</tr>
<tr>
<td>?</td>
<td>Matches the preceding element zero or one time. It is<br />
equivalent to <span>{0,1}</span>. <span class="input">?</span> is a<br />
greedy quantifier whose non-greedy equivalent is<br />
<span>??</span>.</td>
</tr>
<tr>
<td>{n}</td>
<td>Matches the preceding element exactly <i>n</i> times.<br />
<span>{n}</span> is a greedy quantifier whose non-greedy equivalent<br />
is <span>{n}?</span>.</td>
</tr>
<tr>
<td>{n,}</td>
<td>Matches the preceding element at least <i>n</i> times.<br />
<span>{n,}</span> is a greedy quantifier whose non-greedy<br />
equivalent is <span>{n,}?</span>.</td>
</tr>
<tr>
<td>{<i>n</i>,<i>m</i>}</td>
<td>Matches the preceding element at least <i>n</i>, but no more<br />
than <i>m</i>, times. <span>{n,m}</span> is a greedy quantifier<br />
whose non-greedy equivalent is <span class="input">{n,m}?</span>.</td>
</tr>
<tr>
<td>*?</td>
<td>Matches the preceding element zero or more times, but as few<br />
times as possible. It is a lazy quantifier that is the counterpart<br />
to the greedy quantifier <span>*</span>.</td>
</tr>
<tr>
<td>+?</td>
<td>Matches the preceding element one or more times, but as few<br />
times as possible. It is a lazy quantifier that is the counterpart<br />
to the greedy quantifier <span>+</span>.</td>
</tr>
<tr>
<td>??</td>
<td>Matches the preceding element zero or one time, but as few<br />
times as possible. It is a lazy quantifier that is the counterpart<br />
to the greedy quantifier <span>?</span>.</td>
</tr>
<tr>
<td>{<i>n</i>}?</td>
<td>Matches the preceding element exactly <span class="parameter">n</span> times. It is a lazy quantifier that is the<br />
counterpart to the greedy quantifier <span class="input">{n}</span>.</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://www.dijksterhuis.org/csharp-regular-expression-operator-cheat-sheet/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Manipulating Strings in C# - Replacing part of a string / Replacing all occurrences of a sub-string</title>
		<link>http://www.dijksterhuis.org/manipulating-strings-in-csharp-replacing-part-string/</link>
		<comments>http://www.dijksterhuis.org/manipulating-strings-in-csharp-replacing-part-string/#comments</comments>
		<pubDate>Fri, 13 Feb 2009 02:04:05 +0000</pubDate>
		<dc:creator>Martijn</dc:creator>
				<category><![CDATA[Beginner]]></category>
		<category><![CDATA[Learn C#]]></category>
		<category><![CDATA[c#]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[strings]]></category>

		<guid isPermaLink="false">http://www.dijksterhuis.org/?p=637</guid>
		<description><![CDATA[Very often you need to change part of a string, maybe just once, or many times over. Strings in .NET/C# are immutable, so we cannot actually change a string in-place. But we are able to work on copies. The code example below attaches two new methods to the C# string class. The ReplaceFirst method replaces the [...]<p>This is a post from <a href="http://www.dijksterhuis.org">Martijn's C# Coding Blog</a>. </p>
]]></description>
			<content:encoded><![CDATA[<p>Very often you need to change part of a string, maybe just once, or many times over. Strings in .NET/C# are immutable, so we cannot actually change a string in-place. But we are able to work on copies. The code example below attaches two new methods to the C# string class.</p>
<ul>
<li>The ReplaceFirst method replaces the first occurrence of &#8220;needle&#8221; in a string with &#8220;replacement&#8221;.</li>
<li>The ReplaceAll method is similar: it steps through the string and replaces every occurrence of &#8220;needle&#8221; it finds. To avoid a possible infinite loop it first checks whether &#8220;needle&#8221; is equal to &#8220;replacement&#8221;.</li>
</ul>
<p><span id="more-637"></span></p>
<pre class="brush: c#">
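using System;

namespace StringItems
{
        static class StringExt
        {
                // Replaces the first occurrence of needle; returns haystack unchanged if there is none
                public static string ReplaceFirst(this string haystack, string needle, string replacement)
                {
                        int pos = haystack.IndexOf(needle);
                        if (pos &lt; 0) return haystack;

                        return haystack.Substring(0,pos) + replacement + haystack.Substring(pos+needle.Length);
                }

                // Replaces every occurrence of needle
                public static string ReplaceAll(this string haystack, string needle, string replacement)
                {
                        // Avoid a possible infinite loop: an empty needle matches everywhere
                        if (needle.Length == 0 || needle == replacement) return haystack;
                        int pos = 0;
                        // &gt;= 0 so that a match at the very start of the string counts too
                        while((pos = haystack.IndexOf(needle, pos)) &gt;= 0)
                        {
                                haystack = haystack.Substring(0,pos) + replacement + haystack.Substring(pos+needle.Length);
                                // Continue after the inserted text, so a replacement that contains needle cannot loop forever
                                pos += replacement.Length;
                        }
                        return haystack;
                }

        }
}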
</pre>
<p>Both methods are implemented as extension methods. (For more on creating extension methods see also <a href="../manipulating-strings-in-csharp-finding-all-occurrences-of-a-string-within-another-string/">Finding all occurrences of a string within another string</a>.) After you include these methods in your project you can call them directly on any string instance:</p>
<blockquote><p>string myString = &quot;Hello World&quot;;<br />
string myModifiedString = myString.ReplaceFirst(&quot;World&quot;,&quot;People&quot;);<br />
Console.WriteLine(&quot;{0}&quot;,myModifiedString); // Writes: &quot;Hello People&quot;</p></blockquote>
<p>An example use of the ReplaceAll method:</p>
<blockquote><p>string myString = &quot;boo foo is not foo boo or foo boo foo&quot;;<br />
string myModifiedString = myString.ReplaceAll(&quot;boo&quot;,&quot;goo&quot;);<br />
Console.WriteLine(&quot;{0}&quot;,myModifiedString); // Writes: &quot;goo foo is not foo goo or foo goo foo&quot;</p></blockquote>
<p><strong>Why not just use a regular expression?</strong></p>
<p>If you are familiar with the Regex class in C# you can easily write a regular expression to achieve the same string replacement result:</p>
<blockquote><p>using System.Text.RegularExpressions;<br />
Regex regex = new Regex(&quot;boo&quot;);<br />
string result = regex.Replace(&quot;boo foo is not foo boo or foo boo foo&quot;, &quot;goo&quot;);</p></blockquote>
<p>Regular expressions are flexible, and for anything more complex than a basic string replacement they are often the most practical choice. But they come at a hefty performance price. Before a regular expression can run, its pattern has to be parsed and converted into an internal form, which is then executed against the input. The .NET runtime caches the expression for performance, but using a regular expression for plain string replacement is still much slower.</p>
<p><strong>How much slower are regular expressions for string replacement?</strong></p>
<p>In an earlier post I described <a href="http://www.dijksterhuis.org/timing-function-performance-stopwatch-class/">the Stopwatch class in System.Diagnostics</a>. It is ideal for a little benchmark testing &#8212; so let&#8217;s compare my string replacement methods with the built-in regular expression library:</p>
<pre class="brush: c#">
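// Requires: using System.Diagnostics; using System.Text.RegularExpressions;
Regex regex = new Regex(&quot;boo&quot;);
string haystack = &quot;boo foo is not foo boo or foo boo foo&quot;;
string result;
Stopwatch sw = Stopwatch.StartNew();
// Repeat the replacement 100,000 times so the elapsed time is measurable
for (int Lp = 0; Lp &lt; 100000; Lp++)
        result = regex.Replace(haystack, &quot;goo&quot;);
sw.Stop();
Console.WriteLine(&quot;Time used (float): {0} ms&quot;,sw.Elapsed.TotalMilliseconds);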
</pre>
<p><span>And the same for the string replacement functions:</span></p>
<pre class="brush: c#">
string haystack = &quot;boo foo is not foo boo or foo boo foo&quot;;
string result;
Stopwatch sw = Stopwatch.StartNew();
for(int Lp = 0; Lp &lt; 100000; Lp++)
result = haystack.ReplaceAll(&quot;boo&quot;,&quot;goo&quot;);
sw.Stop();
Console.WriteLine(&quot;Time used (float): {0} ms&quot;,sw.Elapsed.TotalMilliseconds);
</pre>
<p>The regular expression code needed <strong>1100ms </strong>, whereas the string replacement code needed just<strong> 27ms</strong>. So for this particular example, the string replacement was <strong>40 times faster</strong> than a regular expression.</p>
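<p>If you want to repeat the comparison yourself, the two fragments can be combined into one program along these lines. This is only a sketch: <em>ReplacePlain</em> is a stand-in for the ReplaceAll extension method (it assumes the search resumes after each replacement), the class and method names are made up, and the timings will differ per machine and .NET version.</p>
<pre class="brush: c#">
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ReplaceBenchmark
{
        // Stand-in for ReplaceAll: plain string search, resuming after each replacement
        public static string ReplacePlain(string haystack, string needle, string replacement)
        {
                if (needle.Length == 0) return haystack;
                int pos = 0;
                while ((pos = haystack.IndexOf(needle, pos, StringComparison.Ordinal)) &gt;= 0)
                {
                        haystack = haystack.Substring(0, pos) + replacement + haystack.Substring(pos + needle.Length);
                        pos += replacement.Length;
                }
                return haystack;
        }

        // The regular expression equivalent, using a Regex that was constructed once
        public static string ReplaceRegex(Regex regex, string haystack, string replacement)
        {
                return regex.Replace(haystack, replacement);
        }

        static void Main()
        {
                const string haystack = &quot;boo foo is not foo boo or foo boo foo&quot;;
                const int runs = 100000;
                Regex regex = new Regex(&quot;boo&quot;);
                string result = null;

                Stopwatch sw = Stopwatch.StartNew();
                for (int i = 0; i &lt; runs; i++)
                        result = ReplaceRegex(regex, haystack, &quot;goo&quot;);
                sw.Stop();
                Console.WriteLine(&quot;Regex:  {0} ms, result: {1}&quot;, sw.Elapsed.TotalMilliseconds, result);

                sw = Stopwatch.StartNew();
                for (int i = 0; i &lt; runs; i++)
                        result = ReplacePlain(haystack, &quot;boo&quot;, &quot;goo&quot;);
                sw.Stop();
                Console.WriteLine(&quot;String: {0} ms, result: {1}&quot;, sw.Elapsed.TotalMilliseconds, result);
        }
}
</pre>
<p>Printing the result next to each timing makes it easy to confirm that both approaches really produced the same string before comparing the numbers.</p>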
]]></content:encoded>
			<wfw:commentRss>http://www.dijksterhuis.org/manipulating-strings-in-csharp-replacing-part-string/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

