Python regular expressions

A regular expression is a special sequence of characters that helps you easily check if a string matches a pattern.

Python has included the re module since version 1.5. It provides Perl-style regular expression patterns.

The re module gives Python full regular expression functionality.

The compile function builds a regular expression object from a pattern string and optional flags. This object has a set of methods for regular expression matching and replacement.

The re module also provides module-level functions that do the same work as these methods; they take the pattern string as their first argument.
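To illustrate the two styles, here is a minimal sketch (the sample text and pattern are only illustrative) showing that a module-level function and the same method on a compiled pattern object give the same result:

```python
import re

text = "Cats are smarter than dogs"

# Module-level function: the pattern string is the first argument.
m1 = re.search(r"smart\w*", text)

# Equivalent: compile the pattern once, then call the method on the object.
pattern = re.compile(r"smart\w*")
m2 = pattern.search(text)

print(m1.group())  # smarter
print(m2.group())  # smarter
```

Compiling is mostly a convenience when a pattern is reused; the re module also caches recently used patterns internally, so the module-level functions do not recompile the pattern on every call.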

This chapter focuses on regular expression processing functions commonly used in Python.


re.match function

re.match tries to match the pattern at the beginning of the string. If the pattern does not match at the start, match() returns None.

Function syntax:

re.match(pattern, string, flags=0)

Function parameter description:

Parameter   Description
pattern     The regular expression to match.
string      The string to match against.
flags       Flags that control how the pattern is matched, such as case-insensitive or multi-line matching. See: Regular Expression Modifiers - Optional Flags

On success, re.match returns a match object; otherwise it returns None.

Use the match object's group(num) or groups() methods to retrieve the matched text.

Match object method   Description
group(num=0)          Returns the string matched by the whole expression (num=0) or by group num. group() also accepts several group numbers at once, in which case it returns a tuple of the corresponding values.
groups()              Returns a tuple of all the group strings, from group 1 up to the last group.
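A short sketch of these methods on a match with three groups (the pattern and input are made up for illustration):

```python
import re

m = re.match(r"(\w+)-(\d+)-(\w+)", "abc-123-xyz")

print(m.group())      # abc-123-xyz  (group 0: the whole match)
print(m.group(1))     # abc
print(m.group(1, 3))  # ('abc', 'xyz')  several numbers give a tuple
print(m.groups())     # ('abc', '123', 'xyz')
```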

Example

import re

print(re.match('www', 'www.welookups.com').span())  # matches at the start of the string
print(re.match('com', 'www.welookups.com'))         # not at the start, so no match (None)

The example above prints:

(0, 3)
None

Example

import re

line = "Cats are smarter than dogs"

matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")

The example above prints:

matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter

re.search method

re.search scans the entire string and returns the first successful match.

Function syntax:

re.search(pattern, string, flags=0)

Function parameter description:

Parameter   Description
pattern     The regular expression to match.
string      The string to search.
flags       Flags that control how the pattern is matched, such as case-insensitive or multi-line matching.

On success, re.search returns a match object; otherwise it returns None.

Use the match object's group(num) or groups() methods to retrieve the matched text.

Match object method   Description
group(num=0)          Returns the string matched by the whole expression (num=0) or by group num. group() also accepts several group numbers at once, in which case it returns a tuple of the corresponding values.
groups()              Returns a tuple of all the group strings, from group 1 up to the last group.

Example

import re

print(re.search('www', 'www.welookups.com').span())  # match at the start of the string
print(re.search('com', 'www.welookups.com').span())  # not at the start, but search still finds it

The example above prints:

(0, 3)
(14, 17)

Example

import re

line = "Cats are smarter than dogs"

searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)
if searchObj:
    print("searchObj.group() :", searchObj.group())
    print("searchObj.group(1) :", searchObj.group(1))
    print("searchObj.group(2) :", searchObj.group(2))
else:
    print("Nothing found!!")

The example above prints:

searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter

The difference between re.match and re.search

re.match only looks for a match at the beginning of the string; if the beginning does not match the regular expression, the match fails and the function returns None. re.search scans through the whole string until it finds a match.

Example

import re

line = "Cats are smarter than dogs"

matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")

matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")

The example above prints:

No match!!
search --> matchObj.group() : dogs
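One related detail: even with re.M, re.match only tries the very start of the string, while re.search combined with ^ and re.M can match at the start of any line. A minimal sketch with an illustrative two-line string:

```python
import re

text = "first line\nsecond line"

# re.match only tries position 0, even in multi-line mode.
match_result = re.match(r"second", text, re.M)
print(match_result)                # None

# With re.M, ^ in re.search also matches at the start of each line.
search_multiline = re.search(r"^second", text, re.M)
print(search_multiline.group())    # second

# Without re.M, ^ only matches at the very start of the string.
search_plain = re.search(r"^second", text)
print(search_plain)                # None
```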

Search and Replace

Python's re module provides re.sub for replacing matches in strings.

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

Parameters:

  • pattern: the regular expression pattern string.
  • repl: the replacement, which can be a string or a function.
  • string: the original string to search and replace in.
  • count: the maximum number of replacements to make. The default, 0, replaces all matches.
  • flags: matching flags, as for re.match.
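A minimal sketch of count and flags (the input strings are illustrative):

```python
import re

s = "a1b2c3"

all_replaced = re.sub(r"\d", "#", s)         # count=0: replace every match
first_two = re.sub(r"\d", "#", s, count=2)   # stop after two replacements

# flags passed by keyword: case-insensitive replacement
ignore_case = re.sub(r"c", "#", "cC", flags=re.I)

print(all_replaced)  # a#b#c#
print(first_two)     # a#b#c3
print(ignore_case)   # ##
```

Passing count and flags by keyword avoids a common mistake: in re.sub(pattern, repl, string, re.I) the flag value would be taken as count.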

Example

import re

phone = "2004-959-559 # This is a foreign phone number"

# Remove the "#" comment (and the whitespace before it) from the string
num = re.sub(r'\s*#.*$', "", phone)
print("phone number is:", num)

# Remove every non-digit character
num = re.sub(r'\D', "", phone)
print("phone number is:", num)

The example above prints:

phone number is: 2004-959-559
phone number is: 2004959559
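A replacement string can also refer back to groups of the match: \1, \2, ... for numbered groups and \g<name> for named groups. A sketch with illustrative inputs:

```python
import re

date = "2024-01-31"

# \1, \2, \3 in the replacement refer to the numbered groups of the match.
us_style = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", date)

# \g<name> refers to a named group.
swapped = re.sub(r"(?P<first>\w+) (?P<second>\w+)", r"\g<second> \g<first>", "Hello World")

print(us_style)  # 01/31/2024
print(swapped)   # World Hello
```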

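The repl argument as a function

When repl is a function, it is called once for each match with the Match object as its argument, and its return value is used as the replacement string. The following example doubles every number in the string: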

Example

import re

# Double the matched number
def double(matched):
    value = int(matched.group('value'))
    return str(value * 2)

s = 'A23G4HFD567'
print(re.sub(r'(?P<value>\d+)', double, s))

The example above prints:

A46G8HFD1134

re.compile function

The compile function compiles a regular expression into a regular expression (Pattern) object, whose methods, such as match() and search(), can then be used for matching.

The syntax is:

re.compile(pattern, flags=0)

Parameters:

  • pattern: a regular expression, as a string

  • flags: optional; the matching mode, such as ignoring case or multi-line mode. The available flags include:

    1. re.I: ignore case
    2. re.L: make \w, \W, \b, \B and case-insensitive matching depend on the current locale (in Python 3, allowed only with bytes patterns)
    3. re.M: multi-line mode, in which ^ and $ also match at the start and end of each line
    4. re.S: make . match any character, including a newline (by default, . does not match a newline)
    5. re.U: make \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database (the default for str patterns in Python 3)
    6. re.X: verbose mode; whitespace in the pattern is ignored and # starts a comment, for readability
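A minimal sketch of combining flags with | and of the re.X and re.S modes (the patterns and inputs are illustrative):

```python
import re

# Flags are combined with the bitwise OR operator |.
m = re.search(r"^world", "hello\nWORLD", re.I | re.M)
print(m.group())  # WORLD

# re.X (verbose): whitespace in the pattern is ignored and # starts a comment.
phone_re = re.compile(r"""
    (\d{3})   # first three digits
    -
    (\d{4})   # last four digits
""", re.X)
print(phone_re.match("555-1234").groups())  # ('555', '1234')

# re.S: . also matches a newline.
dot_default = re.match(r"a.b", "a\nb")      # None: . does not match \n by default
dot_all = re.match(r"a.b", "a\nb", re.S)    # matches "a\nb"
```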

Example

>>> import re
>>> pattern = re.compile(r'\d+')                     # matches one or more digits
>>> m = pattern.match('one12twothree34four')         # tries the start of the string: no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 2, 10)  # starts at 'e': no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 3, 10)  # starts at '1': match
>>> print(m)                                         # a Match object is returned
<re.Match object; span=(3, 5), match='12'>
>>> m.group(0)   # the 0 can be omitted
'12'
>>> m.start(0)   # the 0 can be omitted
3
>>> m.end(0)     # the 0 can be omitted
5
>>> m.span(0)    # the 0 can be omitted
(3, 5)

As shown above, a successful match returns a Match object, which provides these methods:

  • group([group1, ...]) returns the string matched by one or more groups. To get the entire matched substring, use group() or group(0).
  • start([group]) returns the position in the whole string where the group's match starts (the index of its first character). The argument defaults to 0.
  • end([group]) returns the position in the whole string where the group's match ends (the index of its last character plus 1). The argument defaults to 0.
  • span([group]) returns (start(group), end(group)).
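Because end() is one past the last matched character, slicing the original string with start() and end() reproduces group(). A small sketch using the string from the example above:

```python
import re

text = "one12twothree34four"
m = re.search(r"\d+", text)

# end() is exclusive, so this slice is exactly the matched text.
print(text[m.start():m.end()])         # 12
print(m.start(), m.end(), m.span())    # 3 5 (3, 5)
```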

Another example:

Example

>>> import re
>>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)  # re.I: ignore case
>>> m = pattern.match('Hello World Wide Web')
>>> print(m)      # successful match: a Match object
<re.Match object; span=(0, 11), match='Hello World'>
>>> m.group(0)    # the entire matched substring
'Hello World'
>>> m.span(0)     # indices of the entire matched substring
(0, 11)
>>> m.group(1)    # substring matched by the first group
'Hello'
>>> m.span(1)     # indices of the first group's match
(0, 5)
>>> m.group(2)    # substring matched by the second group
'World'
>>> m.span(2)     # indices of the second group's match
(6, 11)
>>> m.groups()    # equivalent to (m.group(1), m.group(2), ...)
('Hello', 'World')
>>> m.group(3)    # there is no third group
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group

findall

Find all substrings matched by the regular expression in the string and return a list, or an empty list if no match is found.

Note: match and search return at most one match, whereas findall returns all matches.

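The syntax is:

re.findall(pattern, string, flags=0)
pattern.findall(string[, pos[, endpos]])

Parameters (pos and endpos are available only on the compiled-pattern method):

  • pattern, flags: as for re.match (module-level form only).
  • string: the string to search.
  • pos: optional; the index at which to start searching. The default is 0.
  • endpos: optional; the index at which to stop searching (the character at endpos is not included). The default is the length of the string.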

Find all the numbers in the string:

Example

import re

pattern = re.compile(r'\d+')   # matches runs of digits
result1 = pattern.findall('welookups 123 google 456')
result2 = pattern.findall('welook88ups123google456', 0, 13)  # searches only indices 0 to 12
print(result1)
print(result2)

Output:

['123', '456']
['88', '12']
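One point worth knowing: if the pattern contains groups, findall returns the groups rather than the whole match. A sketch with an illustrative input:

```python
import re

text = "alice=30, bob=25"

# No groups: each list item is the whole match.
no_groups = re.findall(r"\w+=\d+", text)
print(no_groups)    # ['alice=30', 'bob=25']

# One group: each item is that group's text.
one_group = re.findall(r"(\w+)=\d+", text)
print(one_group)    # ['alice', 'bob']

# Several groups: each item is a tuple of the groups.
two_groups = re.findall(r"(\w+)=(\d+)", text)
print(two_groups)   # [('alice', '30'), ('bob', '25')]
```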

re.finditer

Like findall, finditer finds all substrings that the regular expression matches in the string, but it returns them as an iterator of Match objects instead of a list.

re.finditer(pattern, string, flags=0)
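A minimal sketch, reusing the sample string from the findall example: each item produced by finditer is a Match object, so the positions are available as well as the matched text.

```python
import re

text = "welookups 123 google 456"

# Collect the matched text and its (start, end) span for each match.
matches = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

for value, span in matches:
    print(value, span)
# 123 (10, 13)
# 456 (21, 24)
```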





welookups is optimized for learning.© welookups .
All Right Reserved and you agree to have read and accepted our term and condition