The Regular Expressions
This tutorial covers regular expressions in Python. A regular expression is a special sequence of characters that describes a search pattern, which you can use to find or match other strings or sets of strings. Regular expressions are widely used in the UNIX world. In Python, full support is provided by the re module.
We will start with the two main functions of the re module. Before we begin, note that regular expressions in Python are usually written as raw strings, in the form r'expression', so that backslashes reach the re module without first being interpreted as Python string escapes.
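To see why raw strings matter, here is a minimal sketch comparing a normal string literal with a raw one. The sample text and the variable names are illustrative assumptions, not part of the original tutorial.

```python
import re

# In a normal string literal, "\b" is the backspace character (0x08),
# so the regex engine never sees a word-boundary escape.
plain = "\bcat\b"

# In a raw string the backslash is kept, so re receives \b (word boundary).
raw = r"\bcat\b"

text = "the cat sat"
plain_result = re.search(plain, text)   # None: the text has no backspace characters
raw_result = re.search(raw, text)       # match object for "cat"

print(len(plain), len(raw))             # the two literals differ in length
print(plain_result, raw_result.group())
```

The two literals look almost identical in source code, yet they describe different patterns, which is why the r prefix is the usual habit for regular expressions.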
The Match Function
The match function attempts to match a regular expression pattern, with optional flags, at the beginning of a string. Its syntax is shown below.
re.match(pattern, string, flags=0)
The parameters of this function are described in the table below.
Serial Number | Parameters | Descriptions
1 | pattern | The regular expression to be matched
2 | string | The string that is searched for the pattern, starting at the beginning of the string
3 | flags | Optional modifiers. You can combine several flags using bitwise OR (|). The available flags are listed in the option flags table later in this tutorial
The re.match function returns a match object on success and None on failure. To retrieve the matched text, use the group(num) or groups() methods of the match object, described in the table below.
Serial Number | Match Object Method | Descriptions
1 | group(num=0) | Returns the entire match (num=0) or the specific subgroup num
2 | groups() | Returns all matching subgroups in a tuple, which is empty if the pattern contains no groups
An example is shown below.
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# Group 1 is greedy, group 2 is nongreedy; re.I ignores case
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")
Running this code produces the following output.
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
The Search Function
The search function scans through a string and returns the first location where the regular expression pattern, with optional flags, produces a match. Its syntax is shown below.
re.search(pattern, string, flags=0)
The parameters of this function are described in the table below.
Serial Number | Parameters | Descriptions
1 | pattern | The regular expression to be matched
2 | string | The string that is searched for the pattern, which may occur anywhere in the string
3 | flags | Optional modifiers. You can combine several flags using bitwise OR (|). The available flags are listed in the option flags table later in this tutorial
The re.search function returns a match object on success and None on failure. To retrieve the matched text, use the group(num) or groups() methods of the match object. An example is shown below.
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# No leading space in the pattern: the line starts directly with "Cats"
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() :", searchObj.group())
    print("searchObj.group(1) :", searchObj.group(1))
    print("searchObj.group(2) :", searchObj.group(2))
else:
    print("Nothing found!")
Running this code produces the following output.
searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter
Matching vs. Searching
Python offers two different primitive operations based on regular expressions: match and search. The match operation checks for a match only at the beginning of the string, while search checks for a match anywhere in the string, which is what Perl does by default. An example is shown below.
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# match only succeeds at the beginning of the string
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")

# search scans the whole string
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() :", searchObj.group())
else:
    print("Nothing found!!")
Running this code produces the following output.
No match!!
search --> searchObj.group() : dogs
The Search and Replace
One of the most important methods of the re module is sub, which performs search and replace.
Syntax
re.sub(pattern, repl, string, count=0, flags=0)
This method replaces occurrences of the RE pattern in string with repl. It substitutes every occurrence unless count is provided, in which case at most count replacements are made. It returns the modified string.
For example
#!/usr/bin/python3
import re

phone = "2004-959-559 # This is a phone number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num :", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num :", num)
Running this code produces the following output.
Phone Num : 2004-959-559
Phone Num : 2004959559
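The effect of limiting the number of substitutions can be sketched as follows. This is a small illustration that reuses the phone number from the example above; the variable names are assumptions made for this sketch.

```python
import re

text = "2004-959-559"

# With the default count=0, every hyphen is replaced.
all_replaced = re.sub(r'-', '', text)

# With count=1, only the first (leftmost) hyphen is replaced.
first_only = re.sub(r'-', '', text, count=1)

print(all_replaced)   # 2004959559
print(first_only)     # 2004959-559
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with the flags argument.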
The Option Flags: The Regular Expression Modifier
Regular expression functions accept optional modifiers that control various aspects of matching. The modifiers are specified as option flags, and you can provide several of them by combining them with bitwise OR (|), as shown in the previous sections.
The available modifiers are listed in the table below.
Serial Number | Modifiers | Descriptions
1 | re.I | Performs case-insensitive matching
2 | re.L | Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 it can only be used with bytes patterns
3 | re.M | Makes $ match at the end of each line, not just at the end of the string, and makes ^ match at the start of each line, not just at the start of the string
4 | re.S | Makes a period (dot) match any character, including a newline
5 | re.U | Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns
6 | re.X | Permits more readable ("verbose") regular expression syntax. It ignores whitespace (except inside a set [ ] or when escaped by a backslash) and treats an unescaped # as a comment marker
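The flags in the table can be tried out with a short sketch like the one below. The sample text and variable names are illustrative assumptions, and re.findall (which returns all non-overlapping matches as a list) is used here for brevity even though the tutorial has not introduced it.

```python
import re

text = "Python is fun\npython is popular"

# re.I ignores case; re.M lets ^ match at the start of every line.
line_starts = re.findall(r'^python', text, re.I | re.M)

# Without re.S the dot stops at the newline; with re.S it crosses it.
without_dotall = re.search(r'fun.python', text)
with_dotall = re.search(r'fun.python', text, re.S)

# re.X allows whitespace and comments inside the pattern.
phone_pattern = re.compile(r"""
    (\d{4})   # first block of digits
    -         # separator
    (\d{3})   # second block of digits
""", re.X)
phone_parts = phone_pattern.match("2004-959-559").groups()

print(line_starts)      # ['Python', 'python']
print(without_dotall)   # None
print(phone_parts)      # ('2004', '959')
```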
The Regular Expression Patterns
Except for the special characters (metacharacters) + ? . * ^ $ ( ) [ ] { } | \, all characters match themselves. You can escape a special character by preceding it with a backslash. The table below lists the regular expression pattern syntax. Parts of it come from Perl-style engines, and the rows note where Python's re module behaves differently.
Serial Number | Patterns | Descriptions
1 | ^ | Matches the beginning of the string, and of each line when re.M is set
2 | $ | Matches the end of the string (or just before a trailing newline), and the end of each line when re.M is set
3 | . | Matches any single character except newline. Using the re.S flag allows it to match newline as well
4 | […] | Matches any single character in brackets
5 | [^…] | Matches any single character not in brackets
6 | re* | Matches 0 or more occurrences of the preceding expression
7 | re+ | Matches 1 or more occurrences of the preceding expression
8 | re? | Matches 0 or 1 occurrence of the preceding expression
9 | re{n} | Matches exactly n occurrences of the preceding expression
10 | re{n,} | Matches n or more occurrences of the preceding expression
11 | re{n,m} | Matches at least n and at most m occurrences of the preceding expression
12 | a|b | Matches either a or b
13 | (re) | Groups regular expressions and remembers the matched text
14 | (?imx) | Turns on the i, m, or x options. Python applies them to the whole pattern, and Python 3.11 and later require this construct to appear at the start of the pattern
15 | (?-imx) | Turns off the i, m, or x options in Perl-style engines. Python's re does not accept this standalone form; use the scoped form (?-imx:re) instead
16 | (?:re) | Groups regular expressions without remembering the matched text
17 | (?imx:re) | Turns on the i, m, or x options within the parentheses only (Python 3.6 and later)
18 | (?-imx:re) | Turns off the i, m, or x options within the parentheses only (Python 3.6 and later)
19 | (?#...) | Comment
20 | (?=re) | Positive lookahead: asserts that re matches at the current position without consuming any text
21 | (?!re) | Negative lookahead: asserts that re does not match at the current position, without consuming any text
22 | (?>re) | Matches an independent pattern without backtracking (atomic group; supported by Python's re from version 3.11)
23 | \w | Matches word characters
24 | \W | Matches nonword characters
25 | \s | Matches whitespace; for ASCII text, equivalent to [ \t\n\r\f\v]
26 | \S | Matches nonwhitespace
27 | \d | Matches digits; for ASCII text, equivalent to [0-9]
28 | \D | Matches nondigits
29 | \A | Matches the beginning of the string
30 | \Z | Matches the end of the string. In Python it matches only at the very end; in Perl it also matches just before a trailing newline
31 | \z | Perl-style anchor for the very end of the string. Python's re has historically not accepted it, so use \Z instead
32 | \G | Matches the point where the last match finished in Perl-style engines. It is not supported by Python's re
33 | \b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets
34 | \B | Matches nonword boundaries
35 | \n, \t, etc. | Matches newlines, carriage returns, tabs, etc.
36 | \1…\9 | Matches the nth grouped subexpression
37 | \10 | Matches the 10th grouped subexpression. In Python, a number that starts with 0 or has three octal digits (for example \012) is read as an octal character code instead
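Escaping special characters with a backslash, as described at the start of this section, can be sketched as follows. The sample text and variable names are assumptions for illustration; re.escape is a helper from the re module that adds the backslashes for you.

```python
import re

price_text = "Total: $5.00 (approx.)"

# Unescaped, "$" is an end-of-string anchor and "." matches any character,
# so this pattern cannot match the literal price.
unescaped = re.search(r'$5.00', price_text)

# Preceding each special character with a backslash makes it literal.
escaped = re.search(r'\$5\.00', price_text)

# re.escape builds the escaped form of an arbitrary literal string.
auto_escaped = re.search(re.escape("(approx.)"), price_text)

print(unescaped)              # None
print(escaped.group())        # $5.00
print(auto_escaped.group())   # (approx.)
```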
The Regular Expression Examples
The Literal Characters
A literal-character example is shown in the table below.
Serial Number | Examples | Descriptions
1 | python | Match “python”
The Character Classes
The table for the character classes is mentioned below.
Serial Number | Examples | Descriptions
1 | [Pp]ython | Match “Python” or “python”
2 | rub[ye] | Match “ruby” or “rube”
3 | [aeiou] | Match any one lowercase vowel
4 | [0-9] | Match any digit; same as [0123456789]
5 | [a-z] | Match any lowercase ASCII letter
6 | [A-Z] | Match any uppercase ASCII letter
7 | [a-zA-Z0-9] | Match any of the above
8 | [^aeiou] | Match anything other than a lowercase vowel
9 | [^0-9] | Match anything other than a digit
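A few of these character classes can be tried with a short sketch. The sample strings and variable names are illustrative assumptions, and re.findall is used to list every match.

```python
import re

sample = "Python, python, ruby, rube"

either_case = re.findall(r'[Pp]ython', sample)    # both capitalizations
ruby_or_rube = re.findall(r'rub[ye]', sample)     # "ruby" and "rube"
vowels = re.findall(r'[aeiou]', "regex")          # lowercase vowels only
non_digits = re.findall(r'[^0-9]', "a1b22c")      # everything except digits

print(either_case)    # ['Python', 'python']
print(ruby_or_rube)   # ['ruby', 'rube']
print(vowels)         # ['e', 'e']
print(non_digits)     # ['a', 'b', 'c']
```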
The Special Character Classes
The special character classes are listed in the table below.
Serial Number | Examples | Descriptions
1 | . | Match any character except newline
2 | \d | Match a digit: [0-9]
3 | \D | Match a nondigit: [^0-9]
4 | \s | Match a whitespace character: [ \t\r\n\f]
5 | \S | Match nonwhitespace: [^ \t\r\n\f]
6 | \w | Match a single word character: [A-Za-z0-9_]
7 | \W | Match a nonword character: [^A-Za-z0-9_]
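The shorthand classes can be sketched on a small sample. The record string and variable names below are assumptions made for illustration.

```python
import re

record = "id_42 \t costs 7 units\n"

digits = re.findall(r'\d', record)       # single digits
words = re.findall(r'\w+', record)       # runs of word characters
no_space = re.sub(r'\s', '', record)     # strip every whitespace character
non_word = re.findall(r'\W', "a-b c")    # characters that are not word characters

print(digits)     # ['4', '2', '7']
print(words)      # ['id_42', 'costs', '7', 'units']
print(no_space)   # id_42costs7units
print(non_word)   # ['-', ' ']
```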
The Repetition Cases
The repetition cases are listed in the table below.
Serial Number | Examples | Descriptions
1 | ruby? | Match “rub” or “ruby”: the y is optional
2 | ruby* | Match “rub” plus 0 or more ys
3 | ruby+ | Match “rub” plus 1 or more ys
4 | \d{3} | Match exactly 3 digits
5 | \d{3,} | Match 3 or more digits
6 | \d{3,5} | Match 3, 4, or 5 digits
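The repetition operators in the table can be compared side by side with a sketch like this one. The test strings and variable names are illustrative assumptions.

```python
import re

words = "rub ruby rubyy"

optional_y = re.findall(r'ruby?', words)     # at most one y is consumed
zero_or_more = re.findall(r'ruby*', words)   # any number of ys
one_or_more = re.findall(r'ruby+', words)    # at least one y is required

three_digits = re.findall(r'\d{3}', "12 345 67890")
three_to_five = re.findall(r'\d{3,5}', "12 345 67890 1234567")

print(optional_y)      # ['rub', 'ruby', 'ruby']
print(zero_or_more)    # ['rub', 'ruby', 'rubyy']
print(one_or_more)     # ['ruby', 'rubyy']
print(three_digits)    # ['345', '678']
print(three_to_five)   # ['345', '67890', '12345']
```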
The Nongreedy Repetitions
A nongreedy repetition matches the smallest possible number of repetitions, whereas a greedy repetition matches as many as possible. The table below compares the two.
Serial Number | Examples | Descriptions
1 | <.*> | Greedy repetition: matches “<python>perl>”
2 | <.*?> | Nongreedy: matches “<python>” in “<python>perl>”
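The difference between the two forms can be checked directly with the string from the table. The variable names are assumptions for this sketch.

```python
import re

tagged = "<python>perl>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.search(r'<.*>', tagged).group()

# Nongreedy: .*? stops at the first ">" it can.
nongreedy = re.search(r'<.*?>', tagged).group()

print(greedy)      # <python>perl>
print(nongreedy)   # <python>
```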
Grouping with Parentheses
Examples of grouping with parentheses are listed in the table below.
Serial Number | Examples | Descriptions
1 | \D\d+ | No group: + repeats \d
2 | (\D\d)+ | Grouped: + repeats \D\d pair
3 | ([Pp]ython(, )?)+ | Match “Python”, “Python, python, python”, etc.
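The effect of grouping on a repetition operator can be sketched as follows. The input strings and variable names are illustrative assumptions.

```python
import re

# Without a group, + applies only to \d.
no_group = re.match(r'\D\d+', "a12b3").group()

# With a group, + repeats the whole \D\d pair.
grouped = re.match(r'(\D\d)+', "a1b2c3").group()

# A repeated group remembers only its last repetition.
last_pair = re.match(r'(\D\d)+', "a1b2c3").group(1)

listed = re.match(r'([Pp]ython(, )?)+', "Python, python, python").group()

print(no_group)    # a12
print(grouped)     # a1b2c3
print(last_pair)   # c3
print(listed)      # Python, python, python
```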
The Backreferences
Backreferences match the same text that an earlier group matched. Examples are listed in the table below.
Serial Number | Examples | Descriptions
1 | ([Pp])ython&\1ails | Match python&pails or Python&Pails
2 | (['"]).*?\1 | Single or double-quoted string: \1 requires the same quote character that opened the string. In general, \1 matches whatever the 1st group matched, \2 matches whatever the 2nd group matched, etc.
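A short sketch of backreferences in action follows. The quoted-string pattern here adds a second group to capture the text between the quotes; that extra group, the sample sentences, and the variable names are assumptions for illustration.

```python
import re

pattern = r'([Pp])ython&\1ails'

lower = re.search(pattern, "python&pails")   # \1 must repeat "p"
upper = re.search(pattern, "Python&Pails")   # \1 must repeat "P"
mixed = re.search(pattern, "Python&pails")   # None: "p" does not repeat "P"

# Group 1 is the opening quote, group 2 the quoted text; \1 closes
# the string with the same quote character that opened it.
quoted = re.findall(r'''(['"])(.*?)\1''', '''say "hi" or 'bye' now''')

print(lower.group(), upper.group(), mixed)
print(quoted)   # [('"', 'hi'), ("'", 'bye')]
```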
The Alternatives
Examples of alternatives are listed in the table below.
Serial Number | Examples | Descriptions
1 | python|perl | Match “python” or “perl”
2 | rub(y|le) | Match “ruby” or “ruble”
3 | Python(!+|\?) | “Python” followed by one or more ! or one ?
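The alternation patterns from the table can be exercised with a sketch like this. The input strings and variable names are illustrative assumptions.

```python
import re

langs = re.findall(r'python|perl', "perl and python")

ruby = re.search(r'rub(y|le)', "a ruby").group()
ruble = re.search(r'rub(y|le)', "one ruble").group()

# One or more "!" or exactly one "?" after "Python".
excited = re.search(r'Python(!+|\?)', "Python!!!").group()
question = re.search(r'Python(!+|\?)', "Python??").group()

print(langs)      # ['perl', 'python']
print(ruby)       # ruby
print(ruble)      # ruble
print(excited)    # Python!!!
print(question)   # Python?
```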
The Anchors
Anchors specify the position at which a match must occur. Examples are listed in the table below.
Serial Number | Examples | Descriptions
1 | ^Python | Match “Python” at the start of a string, or of an internal line when re.M is set
2 | Python$ | Match “Python” at the end of a string, or of a line when re.M is set
3 | \APython | Match “Python” at the start of a string
4 | Python\Z | Match “Python” at the end of a string
5 | \bPython\b | Match “Python” at a word boundary
6 | \brub\B | \B is nonword boundary: match “rub” in “rube” and “ruby” but not alone
7 | Python(?=!) | Match “Python”, if followed by an exclamation point
8 | Python(?!!) | Match “Python”, if not followed by an exclamation point
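The anchors in the table can be checked with a sketch along these lines. The sample strings and variable names are assumptions made for illustration.

```python
import re

text = "Python rocks\nI like Python"

starts = re.search(r'^Python', text)                      # start of the string
line_start = re.search(r'^Python', "x\nPython")           # None without re.M
line_start_m = re.search(r'^Python', "x\nPython", re.M)   # start of line 2
string_start = re.search(r'\APython', "x\nPython", re.M)  # None: \A ignores re.M
ends = re.search(r'Python$', text)                        # end of the string

whole_word = re.search(r'\bPython\b', "Pythonic")   # None: no boundary after "Python"
prefix_only = re.search(r'\brub\B', "ruby")         # "rub" inside "ruby"
alone = re.search(r'\brub\B', "rub")                # None: "rub" ends at a boundary

followed = re.search(r'Python(?=!)', "Python!")       # matches "Python", not the "!"
not_followed = re.search(r'Python(?!!)', "Python!")   # None

print(starts.start(), line_start, line_start_m.start())
print(prefix_only.group(), alone, followed.group(), not_followed)
```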
Special Syntax with Parentheses
Examples of special syntax with parentheses are listed in the table below.
Serial Number | Examples | Descriptions
1 | R(?#comment) | Matches “R”. All the rest is a comment
2 | R(?i)uby | Case-insensitive while matching “uby” in Perl-style engines. Python's re requires (?i) at the start of the pattern (enforced since Python 3.11), so use the scoped form in the next row
3 | R(?i:uby) | Case-insensitive while matching “uby”
4 | rub(?:y|le) | Group only without creating \1 backreference
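The constructs that Python supports from this table can be sketched as follows. The scoped flag form (?i:...) assumes Python 3.6 or later, and the sample strings and variable names are illustrative.

```python
import re

# (?#...) is an inline comment and matches nothing.
with_comment = re.search(r'R(?#a comment)uby', "Ruby").group()

# (?i:...) makes only the enclosed part case-insensitive.
scoped_hit = re.search(r'R(?i:uby)', "RUBY")
scoped_miss = re.search(r'R(?i:uby)', "rUBY")   # None: the "R" stays case-sensitive

# (?:...) groups without capturing, so no \1 is created.
non_capturing = re.search(r'rub(?:y|le)', "ruble")
capturing = re.search(r'rub(y|le)', "ruble")

print(with_comment)             # Ruby
print(scoped_hit.group())       # RUBY
print(scoped_miss)              # None
print(non_capturing.groups())   # ()
print(capturing.groups())       # ('le',)
```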
This concludes the regular expressions part of our Python tutorial.