My Notes on Regular Expressions
Non-Technical Meaning:
Regular: happening frequently
Expression: a word or group of words
Each character in a regular expression is one of two types:
Normal Characters
Special Characters
Normal Characters:
Normal characters are all characters except the metacharacters [\^$.|?*+(){
The regular expression <<cat>> contains three normal characters. It matches those three letters in sequence wherever they appear, e.g. inside “concatenate” and “the cat ran away”.
By default, regular expressions are case-sensitive: uppercase and lowercase letters are treated as different characters, so <<cat>> does not match “Cat”.
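A quick check of both points, using Python's built-in `re` module (Python is my choice for these sketches; the notes themselves are not tied to a language): <<cat>> is found inside longer text, and case matters unless you explicitly ask otherwise.

```python
import re

texts = ["concatenate", "the cat ran away", "Cat"]

# re.search scans the whole string and returns the first match object, or None
found = [re.search(r"cat", t) is not None for t in texts]
print(found)  # [True, True, False]: "Cat" fails because matching is case-sensitive

# Case-insensitive matching has to be requested with a flag
found_ci = re.search(r"cat", "Cat", re.IGNORECASE) is not None
print(found_ci)  # True
```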
Special Characters:
Certain characters are reserved for special use:
[ \ ^ $ . | ? * + ( ) {
These are also called metacharacters
To use one as a normal character, escape it with a backslash, e.g. <<1\+1=2>> matches “1+1=2”
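Here is a small Python sketch of why escaping matters (again assuming Python's `re`; the `r"..."` raw strings stop Python itself from interpreting the backslashes). Unescaped, `+` acts as a quantifier, so the pattern no longer matches the literal text. `re.escape` adds the backslashes for you when a pattern comes from data; the exact output shown assumes Python 3.7 or later.

```python
import re

text = "1+1=2"

# Unescaped, '+' is a quantifier: r"1+1=2" means "one or more 1s, then 1=2"
unescaped = re.search(r"1+1=2", text)
print(unescaped)  # None

# Escaped, '\+' is a literal plus sign
escaped = re.search(r"1\+1=2", text)
print(escaped.group())  # 1+1=2

# re.escape backslash-escapes the metacharacters in a literal string
print(re.escape("1+1=2"))  # 1\+1=2
```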
Character Classes (Character Sets)
Square brackets [ ] group characters into a character class
A class matches exactly one character, which may be any character from the set
E.g. <<gr[ae]y>> matches “gray” or “grey”
The order of characters inside a character class does not matter
Use a hyphen (‘-’) to specify a range of characters
E.g.: <<[0-9]>> matches a single digit between 0 and 9
E.g.: <<[0-9a-fA-F]>> matches a single hexadecimal digit
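A minimal Python sketch of the examples above, assuming Python's `re.findall`, which returns every non-overlapping match as a list. Note that a class stands for exactly one character, so “graey” does not match <<gr[ae]y>>, and that swapping the order inside the brackets changes nothing.

```python
import re

text = "gray grey griy graey"

# [ae] matches exactly one character: either 'a' or 'e'
colors = re.findall(r"gr[ae]y", text)
print(colors)  # ['gray', 'grey']

# Order inside the class does not matter
same = re.findall(r"gr[ea]y", text)
print(same == colors)  # True

# Ranges inside a class: one hexadecimal digit per match
hex_digits = re.findall(r"[0-9a-fA-F]", "0x1F!z")
print(hex_digits)  # ['0', '1', 'F']
```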
Negated Character Classes
A caret (‘^’) immediately after the opening bracket negates the character class
It matches any single character that is not in the class, e.g. <<q[^u]>> matches a “q” followed by any character except “u”
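One detail worth checking in code (Python `re` assumed): a negated class still has to consume one character. So <<q[^u]>> does not match “Iraq”, because nothing follows the final “q”.

```python
import re

# q[^u]: a 'q' followed by exactly one character that is NOT 'u'
pattern = re.compile(r"q[^u]")

results = {}
for word in ["Iraq", "Iraqi", "quit", "Qatar"]:
    m = pattern.search(word)
    results[word] = m.group() if m else None

print(results)
# {'Iraq': None, 'Iraqi': 'qi', 'quit': None, 'Qatar': None}
```

“Qatar” fails for a different reason: it has no lowercase “q”, and matching is case-sensitive.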
Shorthand character classes
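\d -> [0-9] (a digit)
\w -> [A-Za-z0-9_] (a word character)
\s -> [ \t\r\n\f] (whitespace: space, tab, carriage return, newline, form feed)
\D -> [^\d] (any non-digit)
\W -> [^\w] (any non-word character)
\S -> [^\s] (any non-whitespace character)
The exact sets vary between flavors: most also include the vertical tab in \s, and Unicode-aware engines also match non-ASCII digits, letters and spaces.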
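A sketch of the shorthand classes in Python's `re` (my assumption for these examples). In Python 3, `str` patterns are Unicode-aware by default; the `re.ASCII` flag restricts \d, \w and \s to their ASCII meanings, which is one concrete case of the flavor differences mentioned above.

```python
import re

text = "id_7:\tx9 42"

digits = re.findall(r"\d", text)
words = re.findall(r"\w+", text)
spaces = re.findall(r"\s", text)
non_spaces = re.findall(r"\S+", text)
print(digits)      # ['7', '9', '4', '2']
print(words)       # ['id_7', 'x9', '42']
print(spaces)      # ['\t', ' ']
print(non_spaces)  # ['id_7:', 'x9', '42']

# Unicode vs ASCII: U+0663 is ARABIC-INDIC DIGIT THREE
arabic_three = "\u0663"
unicode_default = re.match(r"\d", arabic_three) is not None
ascii_only = re.match(r"\d", arabic_three, re.ASCII) is not None
print(unicode_default, ascii_only)  # True False
```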
Dot (.) Character
The most commonly used metacharacter
Matches any single character, without caring what that character is
The exception is the newline ‘\n’, which the dot does not match by default
So, by default, the dot is shorthand for <<[^\n]>>
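In Python's `re` (assumed here), the dot behaves as described, and the `re.DOTALL` flag is how you make it match a newline too. Many other flavors have an equivalent “single-line” or “dot matches all” option, under different names.

```python
import re

text = "cat\ncot"

# '.' matches any single character except a newline by default
same_line = re.findall(r"c.t", text)
print(same_line)  # ['cat', 'cot']

# Between 't' and the next 'c' there is only '\n', so this fails by default
across = re.search(r"t.c", text)
print(across)  # None

# The re.DOTALL flag lets '.' match '\n' as well
across_dotall = re.search(r"t.c", text, re.DOTALL)
print(repr(across_dotall.group()))  # 't\nc'
```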
Anchors
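“^” matches at the beginning of the string (or of each line, in multi-line mode)
“$” matches at the end of the string (or of each line, in multi-line mode)
Anchors match a position, not a character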
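A short Python sketch (Python `re` assumed) showing the difference between whole-string anchors and multi-line mode, which Python enables with the `re.MULTILINE` flag.

```python
import re

text = "cat\ndog"

# Without flags, ^ and $ anchor to the start and end of the whole string
first = re.findall(r"^\w+", text)
last = re.findall(r"\w+$", text)
print(first, last)  # ['cat'] ['dog']

# With re.MULTILINE they also match at the start/end of every line
starts = re.findall(r"^\w+", text, re.MULTILINE)
ends = re.findall(r"\w+$", text, re.MULTILINE)
print(starts, ends)  # ['cat', 'dog'] ['cat', 'dog']
```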
Alternation
The vertical bar (‘|’) matches any one of several alternative regular expressions
E.g.: <<cat|dog>> matches either “cat” or “dog”
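A Python sketch of alternation (Python `re` assumed). It also shows a common trap: ‘|’ has the lowest precedence of all regex operators, so <<^cat|dog$>> means “^cat” or “dog$”. Grouping the alternatives with parentheses makes the anchors apply to both.

```python
import re

pets = re.findall(r"cat|dog", "cats and dogs")
print(pets)  # ['cat', 'dog']

# Lowest precedence: ^cat|dog$ is (^cat) OR (dog$), so "hotdog" matches
loose = re.search(r"^cat|dog$", "hotdog") is not None
print(loose)  # True

# Grouped: the whole string must be exactly "cat" or "dog"
strict = re.search(r"^(cat|dog)$", "hotdog") is not None
print(strict)  # False
```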
Repetition
* -> match the preceding token (a character, character class, or group) zero or more times
+ -> match the preceding token one or more times
{m} -> exactly m repetitions
{m,n} -> between m and n repetitions (inclusive)
The question mark (?) makes the preceding token optional (zero or one time)
E.g.: <<colou?r>> matches both “colour” and “color”
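The quantifiers above can be compared side by side with a small Python sketch. It assumes Python's `re.fullmatch` (Python 3.4+), which requires the pattern to cover the whole string; the helper name `full` is hypothetical, introduced only for this example.

```python
import re

def full(pattern, s):
    """Hypothetical helper: True if `pattern` matches the entire string `s`."""
    return re.fullmatch(pattern, s) is not None

star = [full(r"ab*c", s) for s in ["ac", "abc", "abbbc"]]
plus = [full(r"ab+c", s) for s in ["ac", "abc", "abbbc"]]
exact = [full(r"\d{3}", s) for s in ["12", "123", "1234"]]
ranged = [full(r"\d{2,3}", s) for s in ["1", "12", "123", "1234"]]
optional = [full(r"colou?r", s) for s in ["color", "colour", "colouur"]]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(exact)     # [False, True, False]
print(ranged)    # [False, True, True, False]
print(optional)  # [True, True, False]

# A quantifier applies to the whole preceding token, here a character class
print(full(r"[0-9a-fA-F]+", "DEADbeef"))  # True
```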