There are three main ways to remove specific characters from a string in Python, and I’ve grouped them as follows: built-in string methods, regular expressions, and removal by position.
Each approach goes about the task differently, so we will explore each with examples to illustrate which might suit your use case best.
Remove Characters Using Built-In String Methods
The most popular way of removing specific characters from a string in Python is through the use of two groups of string methods:
- strip, lstrip, rstrip
- replace
The caveat with either of the above is that the variable being operated on must be of type str (string). If you are operating on something else, you need to convert the variable to a Python string first.
Here’s an example where the replace string method does not work, because it is called on a variable of a non-string data type:
>>> float_type = 12.3456
>>> type(float_type)
<class 'float'>
>>> result = float_type.replace("56", "")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'replace'
As the above example shows, calling the replace method on a variable that is not a string will not work, and you will need to convert the variable to a string first.
To do so with the same example, use the built-in str() function as follows:
>>> float_type = 12.3456
>>> float_str = str(float_type)
>>> type(float_str)
<class 'str'>
>>> result = float_str.replace("56", "")
>>> print(result)
12.34
Keep the above in mind as we continue to explore the use of these built-in string methods when removing characters.
How to Use strip, lstrip, rstrip Methods
The strip method removes characters from the ends of a string and, when called with no arguments, the characters it targets are whitespace. With it we can remove whitespace from both the front and back of a string, or just the front, or just the back.
If you’re looking for a quick way to remove whitespace characters from a string, then you will want to use the strip method, or one of its cousins rstrip or lstrip, depending on which end of the string you want to strip whitespace from.
Here’s a demonstration of its use:
>>> s = " hello world "
>>> s.strip()
# "hello world"
>>> s.rstrip()
# " hello world"
>>> s.lstrip()
# "hello world "
So as you can see, the strip methods can help you remove characters from a string in Python. However, by default the characters removed are whitespace (an optional argument can name other characters), and in every case they need to be at either end of the string.
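As a small sketch of stripping something other than whitespace, the strip methods accept an optional argument that is treated as a set of characters rather than as an exact prefix or suffix. The sample value below is made up for illustration:

```python
# Assumed sample value, invented for this illustration.
padded = "--==hello world==--"

# The argument "-=" means "strip any run of '-' or '=' characters",
# not "strip the exact text '-='".
both_ends = padded.strip("-=")
left_only = padded.lstrip("-=")
right_only = padded.rstrip("-=")

print(both_ends)   # hello world
print(left_only)   # hello world==--
print(right_only)  # --==hello world
```

Characters in the middle of the string are left alone, which is the same limitation noted above for whitespace.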
What if you wanted to remove more than just whitespace? This is where the handy replace method comes in.
How To Use replace Method
The easiest and most frequently used way to remove characters from a string is the standard replace method.
The replace method has the following parameters:
str.replace(old, new[, count])
The first parameter is the substring we wish to find and remove from the original string. To remove it, the second argument needs to be an empty string, written as "".
The third parameter (count) is optional. If it is not set, the replacement is performed on every occurrence in the string. If a number is given, only that many occurrences are replaced, starting from the left.
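To make the effect of count concrete, here is a minimal sketch using an invented sample string:

```python
# Assumed sample value, invented for this illustration.
dashed = "2024-01-15-draft"

# No count: every occurrence of "-" is removed.
all_removed = dashed.replace("-", "")

# count=1: only the leftmost occurrence is removed.
first_only = dashed.replace("-", "", 1)

print(all_removed)  # 20240115draft
print(first_only)   # 202401-15-draft
```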
It is important to note that the first argument must exactly match a substring of the string being operated on. For example, if I wanted to remove a set of characters from a phone number string (such as all the characters that are not digits), I cannot pass in a list or set of the characters I want removed:
>>> ugly_phone = "(02) 9412-345 678"
>>> ugly_phone.replace("()- ", "")
'(02) 9412-345 678'
Why didn’t it remove the brackets, dash and space characters? Because the first argument did not exactly match anything contained in the string: there is no occurrence of "()- " as a whole.
If we wanted to remove that set of characters from the phone number string using the replace method, we would need to daisy chain the replace calls individually, like so:
>>> ugly_phone = "(02) 9412-345 678"
>>> ugly_phone.replace("(","").replace(")","").replace("-","").replace(" ","")
'029412345678'
Notice how we can chain the replace method to remove one character at a time. When doing this, be mindful of the order of the calls: the result of one call becomes the input to the next, so one call may change what a later call matches.
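The phone number example gives the same result in any order, so here is a contrived sketch (the string and the substrings are invented) where swapping the calls changes the outcome:

```python
# Assumed sample value, invented to show that call order can matter.
text = "abcc"

# Removing "bc" first leaves "ac", which the second call then removes.
bc_then_ac = text.replace("bc", "").replace("ac", "")

# Removing "ac" first finds nothing, so only "bc" is removed afterwards.
ac_then_bc = text.replace("ac", "").replace("bc", "")

print(repr(bc_then_ac))  # ''
print(repr(ac_then_bc))  # 'ac'
```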
Notice also that the replace method does not mutate the original string; it returns a new one:
>>> ugly_phone = "(02) 9412-345 678"
>>> ugly_phone.replace("(","").replace(")","").replace("-","").replace(" ","")
'029412345678'
>>> print(ugly_phone)
(02) 9412-345 678
However, this process can get quite tedious. What if we find users have input other characters into the phone number field, such as a letter of the alphabet?
Is there a quicker way to remove a set of characters from the string in a single call?
Yes!
This involves using a regular expression, which lets us remove multiple characters from the string in one call.
Remove Characters Using Regex
Python’s regular expression module (re) can be imported to help remove characters from your string, especially when there are multiple characters to remove and chaining replace methods becomes too tedious.
To continue with our phone number example, all we wanted to keep was the digits, and a regular expression that matches every non-digit character is \D.
Let’s try that by importing the re module first and using its sub (substitute) function:
>>> import re
>>> ugly_phone = "(02) 9412-345 678"
>>> re.sub(r"\D", "", ugly_phone)
'029412345678'
Notice how elegant that solution is compared to daisy chaining a multitude of replace methods?
While this solution is succinct, it does require a little knowledge of how to write regular expressions that match the characters you want removed from your strings.
Another benefit of using regular expressions is that you can provide several alternative substrings for removal in one pattern, whereas the replace method can only remove one substring per call:
>>> import re
>>> s = "to be or not to be, I do not know"
>>> s.replace("to", "").replace("be", "")
'  or not  , I do not know'
>>> re.sub(r"(to|be)", "", s)
'  or not  , I do not know'
By placing all the strings within parentheses, separated by the pipe character, you can list the exact substrings to remove.
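When the things to remove are single characters, a character class in square brackets is a common alternative to the pipe form. As a sketch, reusing the phone number from earlier and removing only the brackets, dash and space rather than every non-digit:

```python
import re

ugly_phone = "(02) 9412-345 678"

# Character class: match any single "(", ")", "-" or space.
# The dash is escaped so it is not read as a range inside the brackets.
cleaned = re.sub(r"[()\- ]", "", ugly_phone)

print(cleaned)  # 029412345678
```

Unlike \D, this leaves any other unexpected characters (such as letters) in place, which may or may not be what you want.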
As you can see the regular expression substitute method is a very handy and powerful feature, and we haven’t even begun to scratch the surface!
Remove Characters By Position
One other technique that may prove useful is removing characters by position. I have found this handy when parsing a series of strings that all share the same pattern and length, where I wish to remove the same characters according to where they are in the string.
Using our common telephone number example, say each phone number was formatted correctly, but I wanted to remove the area code, with a sample of our data looking like the following:
(02) 1234 5678
(03) 1234 5679
(04) 1234 5670
I could use the replace string method, by writing something like this:
s.replace("(02)", "").replace("(03)", "").replace("(04)", "")
But again, this would get very ugly very quickly the more unique area codes we have in our data set.
If we performed this using regular expression patterns, we could write something like this:
import re
re.sub(r"\(\d+\)", "", s)
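As a rough sketch, here is that idea applied to the three sample numbers. The variable names are my own, and the extra \s* in the pattern is an assumption added to also swallow the space that follows the area code:

```python
import re

# The sample data from above, held in a list (the list itself is an assumption).
phone_numbers = ["(02) 1234 5678", "(03) 1234 5679", "(04) 1234 5670"]

# Match "(", one or more digits, ")", then any whitespace that follows.
area_code = re.compile(r"\(\d+\)\s*")

without_area_code = [area_code.sub("", number) for number in phone_numbers]

print(without_area_code)  # ['1234 5678', '1234 5679', '1234 5670']
```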
While this again is quite succinct, there is an even more succinct way using position, as follows:
>>> s = "(02) 1234 5678"
>>> s[4:]
' 1234 5678'
The expression used here slices the original string, starting at index 4 as the first character to keep (this is the 5th character, because indexing starts at 0 for the 1st character), and then captures all characters to the end (as no index number was provided after the ":" character).
If we only wanted to capture a specific range of characters, we would add a second index number to mark where the slice ends, being aware that the character at that end index is not captured, as shown here:
>>> s = "(02) 1234 5678"
>>> s[4:9]
' 1234'
By giving both a start and an end index in the slice, we keep only the characters up to, but excluding, the character at index 9. This is why the result in the above example does not include the space after the number 4: that space is the character at index 9 in the string.
This type of removal is fast and easy if we want to keep characters within a string according to their position.
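The slices above keep a range of characters. To remove a range from the middle instead, one option is to join the slice before it to the slice after it. A minimal sketch, using the same sample string:

```python
s = "(02) 1234 5678"

# Keep everything before index 4 and everything from index 9 onward,
# which drops the characters at indexes 4 through 8 (" 1234").
removed_middle = s[:4] + s[9:]

print(removed_middle)  # (02) 5678
```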
Conclusion
There are several ways in Python to remove characters from a string. The built-in replace string method is perhaps the best known and easiest to use when you want to remove a specific character, and it can be chained if needed, while the regular expression substitute function is the most versatile.
Finally, we looked at removing characters from a string by position, which works well if we know for certain the position of each character within the string being operated on.