Python regular expression
A regular expression is a special sequence of characters that helps you easily check if a string matches a pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives the Python language full regular expression functionality.
The compile function generates a regular expression object from a pattern string and optional flag parameters. This object has a set of methods for regular expression matching and replacement.
The re module also provides module-level functions that do the same work as these methods and take a pattern string as their first argument.
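To illustrate how the module-level functions relate to the methods of a compiled object, here is a minimal sketch. The sample string and variable names are assumptions chosen for illustration; both calls should produce the same match.

```python
import re

text = "order 66 shipped"  # hypothetical sample string

# Module-level function: the pattern string is the first argument.
m1 = re.search(r'\d+', text)

# Compiled pattern object: the same search, called as a method.
pattern = re.compile(r'\d+')
m2 = pattern.search(text)

print(m1.group(), m1.span())
print(m2.group(), m2.span())
```

Compiling first is mostly useful when the same pattern is reused many times.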
This chapter focuses on regular expression processing functions commonly used in Python.
re.match function
re.match attempts to match a pattern at the beginning of the string. If the pattern does not match at the start position, match() returns None.
Function syntax:
re.match(pattern, string, flags=0)
Function parameter description:
Parameter | Description |
pattern | The regular expression to match. |
string | The string to match against. |
flags | Flags that control how the regular expression is matched, such as case sensitivity and multi-line matching. See: Regular Expression Modifiers - Optional Flags |
If the match succeeds, re.match returns a match object; otherwise it returns None.
We can use the match object's group(num) or groups() methods to get the matched text.
Match object method | Description |
group(num=0) | Returns the string matched by the entire expression. group() can also take several group numbers at once, in which case it returns a tuple of the corresponding group values. |
groups() | Returns a tuple containing the strings of all groups, from group 1 up to the last group. |
Instance
import re
print(re.match('www', 'www.welookups.com').span())
print(re.match('com', 'www.welookups.com'))
The above example produces the following output:
(0, 3)
None
Instance
import re
line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")
The above example execution results are as follows:
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
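The group() and groups() methods described in the table above can also return several groups in one call. A minimal sketch, reusing the same sentence and pattern (the variable name m is an assumption for illustration):

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r'(.*) are (.*?) .*', line)

if m:
    # Several group numbers at once: returns a tuple of those groups.
    print(m.group(1, 2))
    # groups() returns every capture group, starting from group 1.
    print(m.groups())
    # group(0) is the whole match.
    print(m.group(0))
```

Here both group(1, 2) and groups() should give ('Cats', 'smarter'), since the pattern has exactly two groups.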
re.search method
re.search scans the entire string and returns the first successful match.
Function syntax:
re.search(pattern, string, flags=0)
Function parameter description:
Parameter | Description |
pattern | The regular expression to match. |
string | The string to search. |
flags | Flags that control how the regular expression is matched, such as case sensitivity and multi-line matching. |
If the match succeeds, re.search returns a match object; otherwise it returns None.
We can use the match object's group(num) or groups() methods to get the matched text.
Match object method | Description |
group(num=0) | Returns the string matched by the entire expression. group() can also take several group numbers at once, in which case it returns a tuple of the corresponding group values. |
groups() | Returns a tuple containing the strings of all groups, from group 1 up to the last group. |
Instance
import re
print(re.search('www', 'www.welookups.com').span())
print(re.search('com', 'www.welookups.com').span())
The above example produces the following output:
(0, 3)
(14, 17)
Instance
import re
line = "Cats are smarter than dogs"
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)
if searchObj:
    print("searchObj.group() :", searchObj.group())
    print("searchObj.group(1) :", searchObj.group(1))
    print("searchObj.group(2) :", searchObj.group(2))
else:
    print("Nothing found!!")
The above example execution results are as follows:
searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter
The difference between re.match and re.search
re.match only matches at the beginning of the string: if the pattern does not match there, the match fails and the function returns None. re.search scans the entire string until it finds a match.
Instance
import re
line = "Cats are smarter than dogs"
# re.match only tries the start of the string, so 'dogs' is not found
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")
# re.search scans the whole string and finds 'dogs' at the end
matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")
The above example produces the following output:
No match!!
search --> matchObj.group() : dogs
Search and Replace
Python's re module provides re.sub for replacing matches in strings.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Parameters:
- pattern : The regular expression pattern string.
- repl : The replacement string; it can also be a function.
- string : The original string to search and replace in.
- count : The maximum number of substitutions to make. The default, 0, replaces all matches.
- flags : Optional flags that control matching, as for re.match.
Instance
import re
phone = "2004-959-559 # This is a foreign phone number"
# Remove the Python-style comment at the end of the string
num = re.sub(r'#.*$', "", phone)
print("phone number is:", num)
# Remove everything that is not a digit
num = re.sub(r'\D', "", phone)
print("phone number is :", num)
The above example execution results are as follows:
phone number is: 2004-959-559
phone number is : 2004959559
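The count parameter described above is not used in that example. A minimal sketch of its effect, using a made-up sample string:

```python
import re

s = "a-b-c-d"  # hypothetical sample string

# count=0 (the default) replaces every match.
print(re.sub(r'-', '+', s))
# count=2 replaces only the first two matches.
print(re.sub(r'-', '+', s, count=2))
```

The first call should print a+b+c+d and the second a+b+c-d.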
The repl argument as a function
The following example multiplies each number in the string by 2:
Instance
import re
def double(matched):
    # Convert the matched digits to an int and double the value
    value = int(matched.group('value'))
    return str(value * 2)
s = 'A23G4HFD567'
print(re.sub(r'(?P<value>\d+)', double, s))
Execution output is:
A46G8HFD1134
re.compile function
The compile function compiles a regular expression and produces a regular expression (Pattern) object, whose match() and search() methods can then be used.
The syntax is:
re.compile(pattern, flags=0)
Parameters:
- pattern : A regular expression in the form of a string.
- flags : Optional; sets the matching mode, such as ignoring case or multi-line mode. The available flags are:
  - re.I : ignore case
  - re.L : make the special character classes \w, \W, \b, \B, \s, \S depend on the current locale
  - re.M : multi-line mode
  - re.S : make . match any character, including line breaks (by default . does not match line breaks)
  - re.U : make the special character classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database
  - re.X : verbose mode; for readability, whitespace and comments after # in the pattern are ignored
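The effect of the most commonly used flags listed above can be sketched as follows. The two-line sample text and the patterns are assumptions chosen only to make each flag's behavior visible.

```python
import re

text = "Hello\nworld"  # hypothetical two-line sample

# re.I: case-insensitive matching.
print(re.search(r'hello', text, re.I).group())

# re.M: ^ and $ also match at the start and end of each line.
print(re.findall(r'^\w+$', text, re.M))

# re.S: . also matches the newline character.
print(re.search(r'Hello.world', text, re.S) is not None)

# re.X: whitespace and # comments inside the pattern are ignored.
pattern = re.compile(r'''
    (\w+)   # first word
    \s+     # whitespace, here the newline
    (\w+)   # second word
''', re.X)
print(pattern.match(text).groups())
```

Several flags can be combined with the | operator, for example re.M | re.I.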
Instance
>>> import re
>>> pattern = re.compile(r'\d+')                     # match one or more digits
>>> m = pattern.match('one12twothree34four')         # try at the start: no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 2, 10)  # start at 'e': no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 3, 10)  # start at '1': match
>>> print(m)
<re.Match object; span=(3, 5), match='12'>
>>> m.group(0)
'12'
>>> m.start(0)
3
>>> m.end(0)
5
>>> m.span(0)
(3, 5)
In the above, a Match object is returned when the match succeeds, where:
- group([group1, ...]) returns the string matched by one or more groups; to get the entire matched substring, use group() or group(0);
- start([group]) returns the start position, within the whole string, of the substring matched by the group (the index of its first character); the argument defaults to 0;
- end([group]) returns the end position, within the whole string, of the substring matched by the group (the index of its last character plus 1); the argument defaults to 0;
- span([group]) returns (start(group), end(group)).
Look at an example:
Instance
>>> import re
>>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)  # re.I ignores case
>>> m = pattern.match('Hello World Wide Web')
>>> print(m)
<re.Match object; span=(0, 11), match='Hello World'>
>>> m.group(0)
'Hello World'
>>> m.span(0)
(0, 11)
>>> m.group(1)
'Hello'
>>> m.span(1)
(0, 5)
>>> m.group(2)
'World'
>>> m.span(2)
(6, 11)
>>> m.groups()
('Hello', 'World')
>>> m.group(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: no such group
findall
Find all substrings matched by the regular expression in the string and return a list, or an empty list if no match is found.
Note: match and search return a single match, while findall returns all matches.
The syntax is:
pattern.findall(string[, pos[, endpos]])
Parameters:
- string : The string to be matched.
- pos : An optional parameter that specifies the position in the string at which to start. The default is 0.
- endpos : An optional parameter that specifies the position in the string at which to stop. The default is the length of the string.
Find all the numbers in the string:
Instance
import re
pattern = re.compile(r'\d+')
result1 = pattern.findall('welookups 123 google 456')
result2 = pattern.findall('welook88ups123google456', 0, 10)
print(result1)
print(result2)
Output results:
['123', '456']
['88']
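One detail worth keeping in mind is that findall returns the captured groups, not the whole match, when the pattern contains groups. A minimal sketch, using the module-level re.findall and a made-up sample string:

```python
import re

text = "width=120 height=80"  # hypothetical sample string

# No groups: each item is the whole match.
print(re.findall(r'\w+=\d+', text))
# Two groups: each item is a tuple of the groups.
print(re.findall(r'(\w+)=(\d+)', text))
```

The first call should give ['width=120', 'height=80'] and the second [('width', '120'), ('height', '80')].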
re.finditer
Like findall, finditer finds all substrings that the regular expression matches in the string, but it returns them as an iterator of match objects rather than a list.
re.finditer(pattern, string, flags=0)
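A minimal sketch of iterating over the result, assuming a made-up sample string:

```python
import re

text = "12a32bc43jf3"  # hypothetical sample string

# Each item yielded by finditer is a match object, so both the
# matched text and its position are available.
for m in re.finditer(r'\d+', text):
    print(m.group(), m.span())
```

This should print 12, 32, 43 and 3, each followed by its span. Because the matches are produced lazily, finditer can be preferable to findall on long strings.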