String Methods
The str data type is a built-in class whose instances provide more than 30 methods for parsing, transforming, splitting, and joining their content. Here is a quick reference to the most commonly used string methods.
Parsing
The count() method returns the number of non-overlapping occurrences of the specified substring within the string.
>>> s = "Hello world"
>>> s.count("Hello")
1
>>> s.count("o")
2
>>> s.count("x")
0
The find() and index() methods return the position (starting at 0) of the first occurrence of the specified substring.
>>> s.find("world")
6
>>> s.index("world")
6
They differ in that the latter raises ValueError when the argument is not found, while the former returns -1.
>>> s.find("mundo")
-1
>>> s.index("mundo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
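The difference matters when the substring may be missing. The following is a minimal sketch, assuming a hypothetical helper named locate() that is not part of the str API, of how the exception from index() might be handled:

```python
def locate(text, needle):
    """Return the index of needle in text, or None when it is absent.

    locate() is a hypothetical helper written for this example.
    """
    try:
        return text.index(needle)
    except ValueError:
        # index() raises instead of returning -1.
        return None


s = "Hello world"
print(locate(s, "world"))  # 6
print(locate(s, "mundo"))  # None

# When only presence matters, the "in" operator is usually enough.
print("world" in s)  # True
```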
In both methods the search runs from left to right. To search for a substring starting from the end of the string, use rfind() and rindex() in the same way.
>>> s = "C:/path/to/something"
>>> s.find("/")  # Returns the first occurrence.
2
>>> s.rfind("/")  # Returns the last one.
10
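A typical use of rfind() is cutting a string at its last separator. The sketch below assumes a hypothetical split_path() helper and forward slashes only; it illustrates the method and is not a replacement for the os.path or pathlib modules.

```python
def split_path(path):
    """Split path at its last "/" into (directory, name).

    split_path() is a hypothetical helper written for this example.
    """
    last_slash = path.rfind("/")
    if last_slash == -1:
        # No separator at all: the whole string is the name.
        return "", path
    return path[:last_slash], path[last_slash + 1:]


print(split_path("C:/path/to/something"))  # ('C:/path/to', 'something')
print(split_path("something"))             # ('', 'something')
```

Checking for -1 first is important here: using it directly as a slice index would silently drop the last character instead of failing.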
startswith() and endswith() tell whether the string begins or ends with the substring passed as argument.
>>> s = "Hello world"
>>> s.startswith("Hello")
True
>>> s.endswith("world")
True
>>> s.endswith("mundo")
False
These methods are preferred over comparing a slice, which requires getting the length right:
# startswith() is preferred.
>>> s[:5] == "Hello"
True
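Another reason to prefer these methods is that both accept a tuple of candidates, which a slice comparison cannot express in a single step. A short sketch, where the helper names and the lists of prefixes and suffixes are assumptions made for the example:

```python
def is_web_url(text):
    """Tell whether text starts with one of two URL schemes."""
    return text.startswith(("http://", "https://"))


def is_image(filename):
    """Tell whether filename ends with a known image extension."""
    return filename.lower().endswith((".png", ".jpg", ".gif"))


print(is_web_url("https://example.com"))  # True
print(is_web_url("ftp://example.com"))    # False
print(is_image("Photo.JPG"))              # True
```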
The isdigit(), isnumeric(), and isdecimal() methods determine whether all characters in the string are digits, numeric characters, or decimal characters (that is, characters used to write base-ten numbers), respectively.
>>> "1234".isnumeric()
True
>>> "1234".isdecimal()
True
>>> "abc123".isdigit()
False
isdigit() accepts decimal characters as well as digits that need special handling, such as superscript digits. isnumeric() is broader, as it also includes characters with a numeric value that are not necessarily digits (for example, a fraction). Thus:
>>> "⅓".isnumeric()
True
>>> "⅓".isdigit()
False
The isdecimal() method is the most restrictive, as it only accepts characters that can be used to write base-ten numbers: the digits 0 to 9 and their counterparts in other scripts.
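The three checks nest inside each other, which is easier to see side by side. The following sketch uses three sample characters chosen for the example (written as escapes so the snippet stays ASCII-only) and a hypothetical classify() helper:

```python
samples = {
    "ASCII digit five": "5",
    "superscript two": "\u00b2",
    "fraction one third": "\u2153",
}


def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for a single character."""
    return ch.isdecimal(), ch.isdigit(), ch.isnumeric()


for label, ch in samples.items():
    print(label.ljust(20), classify(ch))

# ASCII digit five     (True, True, True)
# superscript two      (False, True, True)
# fraction one third   (False, False, True)
```

When the goal is to validate input before calling int(), isdecimal() is usually the safest of the three, since int() rejects characters such as the superscript two.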
The following six parsing methods are pretty self-explanatory:
# Determines if all characters are alphanumeric.
>>> "abc123".isalnum()
True
# Determines if all characters are alphabetic.
>>> "abcdef".isalpha()
True
>>> "abc123".isalpha()
False
# Determines if all letters are lowercase.
>>> "abcdef".islower()
True
# Determines if all letters are uppercase.
>>> "ABCDEF".isupper()
True
# Determines if all characters are printable (a tab is not).
>>> "Hello \t world!".isprintable()
False
# Determines if the string contains only whitespace.
>>> "Hello world".isspace()
False
>>> " ".isspace()
True
Transforming
Remember that strings in Python are immutable. Therefore, none of the methods below modifies the original object; each returns a new string.
capitalize() returns a copy of the string with its first character capitalized and the rest lowercased.
>>> "hello world".capitalize()
'Hello world'
encode() encodes the string using the specified character encoding and returns a bytes instance.
>>> "Hello world".encode("utf-8")
b'Hello world'
The original string can be recovered with the bytes.decode() method.
>>> b'Hello world'.decode("utf-8")
'Hello world'
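With plain ASCII text the encoded bytes look just like the string, so the effect of the encoding only becomes visible with other characters. A minimal sketch, using a sample word chosen for the example and written with escapes so the snippet stays ASCII-only:

```python
text = "\u00f1and\u00fa"  # Five characters, two of them non-ASCII.

utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                  # b'\xc3\xb1and\xc3\xba'
print(len(text), len(utf8_bytes))  # 5 7

# The round trip gives back the original string.
print(utf8_bytes.decode("utf-8") == text)  # True

# An encoding that cannot represent a character raises an error...
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")

# ...unless an error handler is given as the second argument.
print(text.encode("ascii", errors="replace"))  # b'?and?'
```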
The center(), ljust(), and rjust() methods align a string to the center, left, or right, respectively. They take one argument that specifies the width of the returned string. For example, if the original string has 5 characters and the requested width is 11, the remaining 6 positions are filled with spaces.
>>> "Hello".center(11)
'   Hello   '
>>> "Hello".ljust(11)
'Hello      '
>>> "Hello".rjust(11)
'      Hello'
These methods are especially useful when printing in table format, in order to align each value. An optional second argument indicates the character used to fill the remaining space (" " by default).
>>> "Hello".center(11, "*")
'***Hello***'
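As an illustration of the table use case, here is a small sketch that pads a name column to the left and a quantity column to the right. The rows, the column widths, and the format_row() helper are all made up for the example.

```python
rows = [("Apples", 3), ("Bananas", 12), ("Kiwis", 125)]  # Sample data.


def format_row(name, quantity):
    """Left-align the name in 10 columns, right-align the number in 5."""
    return name.ljust(10, ".") + str(quantity).rjust(5)


for name, quantity in rows:
    print(format_row(name, quantity))

# Apples....    3
# Bananas...   12
# Kiwis.....  125
```

Note that these methods never truncate: a string longer than the requested width is returned unchanged, so a very long name would push its row out of alignment.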
lower() and upper() return a copy of the string with all its letters in lowercase or uppercase, respectively.
>>> "HeLlO wOrLd!".lower()
'hello world!'
>>> "HeLlO wOrLd!".upper()
'HELLO WORLD!'
swapcase(), in turn, changes uppercase to lowercase and vice versa:
>>> "HeLlO wOrLd!".swapcase()
'hElLo WoRlD!'
The strip(), lstrip(), and rstrip() methods remove the whitespace that precedes and/or follows the string, if any.
>>> s = " Hello world! "
>>> s.strip()
'Hello world!'
>>> s.rstrip()  # Only right spaces.
' Hello world!'
>>> s.lstrip()  # Left spaces.
'Hello world! '
New lines (\n), tabs (\t), and carriage returns (\r) are also removed by these stripping methods.
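The three methods also accept an optional argument with the characters to remove instead of whitespace. The sketch below, with sample strings chosen for the example, shows that the argument is treated as a set of individual characters and not as a prefix or suffix:

```python
title = "== Parsing =="
print(title.strip("= "))   # Parsing

print("0042".lstrip("0"))  # 42

# Every trailing character found in the set is removed, so this
# strips more than the ".txt" suffix one might expect.
print("report.txt".rstrip(".tx"))  # repor
```

To remove an exact prefix or suffix, Python 3.9 and later provide removeprefix() and removesuffix().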
Finally, the widely used replace() method replaces every occurrence of a substring with another string.
>>> s = "Hello world!"
>>> s.replace("world", "mundo")
'Hello mundo!'
>>> s.replace("o", "_")  # Each occurrence is replaced.
'Hell_ w_rld!'
>>> s  # Remember! The original string remains the same.
'Hello world!'
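replace() also takes an optional third argument that limits how many occurrences are replaced, and since it returns a new string, calls can be chained. A brief sketch with sample strings chosen for the example:

```python
s = "Hello world!"

# Replace only the first occurrence.
first_only = s.replace("o", "_", 1)
print(first_only)  # Hell_ world!

# Chain calls to apply several replacements in sequence.
cleaned = "snake_case-and-dashes".replace("_", " ").replace("-", " ")
print(cleaned)  # snake case and dashes
```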
Splitting and Joining
The most commonly used method for splitting a string by a separator is split(). When no separator is given, the string is split on runs of whitespace: spaces, new lines (\n), tabs (\t), and carriage returns (\r).
>>> "Hello world!\nFrom\tPython!".split()
['Hello', 'world!', 'From', 'Python!']
The separator can be specified as an argument.
>>> "Hello world!\nFrom\tPython!".split(" ")
['Hello', 'world!\nFrom\tPython!']
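Passing the separator explicitly changes more than which character is used: consecutive separators are no longer merged, so empty strings can appear in the result. A short sketch with sample strings chosen for the example:

```python
text = "a  b   c"  # Two spaces, then three spaces.

# Without an argument, runs of whitespace count as one separator.
print(text.split())     # ['a', 'b', 'c']

# With an explicit separator, every single space splits.
print(text.split(" "))  # ['a', '', 'b', '', '', 'c']

# The same applies to other separators, which keeps empty fields.
print("1,2,,4".split(","))  # ['1', '2', '', '4']
```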
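There is also splitlines(), which behaves much like split("\n") but also recognizes other line endings, such as \r\n, and does not return a trailing empty string when the text ends with a line break:
>>> "Hello world!\nFrom\tPython!".splitlines()
['Hello world!', 'From\tPython!']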
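The two calls give different results once the text mixes line endings or ends with a line break. A minimal sketch, using a sample string chosen for the example:

```python
text = "one\ntwo\r\nthree\n"

print(text.splitlines())   # ['one', 'two', 'three']
print(text.split("\n"))    # ['one', 'two\r', 'three', '']

# keepends=True preserves each line ending in the result.
print(text.splitlines(keepends=True))  # ['one\n', 'two\r\n', 'three\n']
```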
A second argument to split() indicates the maximum number of splits to perform (the default, -1, means no limit).
# The string is split only at the first two spaces.
>>> "Hello world from Python!".split(" ", 2)
['Hello', 'world', 'from Python!']
The less commonly used partition() method returns a tuple of three elements: the text before the first occurrence of the separator, the separator itself, and the text after it.
>>> "Hello world from Python!".partition(" ")
('Hello', ' ', 'world from Python!')
rpartition() works the same way, but searches from right to left.
>>> "Hello world from Python!".rpartition(" ")
('Hello world from', ' ', 'Python!')
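Because partition() always returns three elements, it unpacks cleanly even when the separator is missing, in which case the second and third elements are empty strings. The following sketch relies on that to parse "key = value" lines; parse_setting() and the sample lines are hypothetical.

```python
def parse_setting(line):
    """Split "key = value" at the first "=".

    Returns (key, value), or None when the line has no "=".
    parse_setting() is a hypothetical helper written for this example.
    """
    key, separator, value = line.partition("=")
    if not separator:
        return None
    return key.strip(), value.strip()


print(parse_setting("name = Ada"))        # ('name', 'Ada')
print(parse_setting("formula = a = b"))   # ('formula', 'a = b')
print(parse_setting("no separator"))      # None
```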
Finally, the extremely useful join() method is called on the string that acts as the separator, and joins the elements of a list or any other iterable of strings.
>>> " ".join(["Hello", "world"])
'Hello world'
>>> ", ".join(["C", "C++", "Python", "Java"])
'C, C++, Python, Java'
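join() only accepts strings, so other values have to be converted first. A brief sketch with a sample list of numbers chosen for the example:

```python
numbers = [1, 2, 3]

# Passing non-string items raises TypeError.
try:
    ", ".join(numbers)
except TypeError:
    print("join() needs strings")

# Convert each item to str before joining.
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3
```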
As can be seen, split() and join() are inverse operations when both use the same explicit separator:
>>> sep = " "
>>> sep.join("Hello world!".split(sep))
'Hello world!'
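When split() is called without a separator, the round trip no longer reproduces the input, because runs of whitespace collapse. That makes the pair a common idiom for normalizing spacing. A minimal sketch, with a hypothetical normalize_spaces() helper:

```python
def normalize_spaces(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())


messy = "  Hello   world!\nFrom\tPython!  "
print(normalize_spaces(messy))  # Hello world! From Python!
```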