Check For Word In String: Python 1 Liner

How can you check if a word is found in a string using Python?

Recently, I wanted to exclude strings from a list if they contained a certain word. I thought Python's familiar in operator, which most Python users know, would be enough.

But the problem turned out to be a little harder than that. For example, what if you want to exclude the term “word”, but not when it appears inside other words, like “sword”?

>>> 'word' in 'sword'
True

I then thought I could fix this by adding spaces around the word I was searching for:

>>> needle = 'word'
>>> haystack = 'sword'
>>> f' {needle} ' in haystack
False

But what if the word is at the beginning or end of the haystack phrase?

>>> needle = 'word'
>>> haystack = 'word is here'
>>> f' {needle} ' in haystack
False

One fix is to wrap the haystack in spaces as well:

>>> needle = 'word'
>>> haystack = 'word in here'
>>> f' {needle} ' in f' {haystack} '
True
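Another way to get the same start-and-end handling without padding, assuming words are separated only by whitespace, is to split the haystack into a list of words and test membership in that list. The function name word_in_tokens is hypothetical, and like the padded check it still misses a word with punctuation attached:

```python
def word_in_tokens(needle: str, haystack: str) -> bool:
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so first and last words need no padding
    return needle in haystack.split()

print(word_in_tokens('word', 'word is here'))   # True
print(word_in_tokens('word', 'sword'))          # False
print(word_in_tokens('word', 'is it a word?'))  # False: 'word?' is a different token
```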

Finding Word In String With Punctuation

But then I ran into another issue: what if the haystack contained punctuation like commas, colons, semicolons, full stops, question marks and exclamation marks?

>>> needle = 'word'
>>> haystack = 'are you not a word?'
>>> f' {needle} ' in f' {haystack} '
False

As you can see, “word” is in the sentence, but because it is followed by a question mark rather than a space, it isn't recognised.

One approach is to remove all non-alphanumeric characters from the haystack string, and the easiest way to do this is with Python's built-in re (regular expression) module.

To remove everything except lowercase letters, digits and whitespace from the haystack using re.sub (note that this pattern also deletes uppercase letters, so it assumes a lowercase haystack):

>>> import re
>>> needle = 'word'
>>> haystack = 'are you not a word?'
>>> alpha_haystack = re.sub(r'[^a-z0-9\s]', '', haystack)
>>> f' {needle} ' in f' {alpha_haystack} '
True
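If you'd rather avoid regex, a sketch using only the standard library's string.punctuation constant is to split the haystack into words and strip punctuation from the ends of each one. The helper name is hypothetical, and it differs from the re.sub version in two ways: it keeps uppercase letters (so the match is case-sensitive), and string.punctuation only covers ASCII punctuation, so curly quotes stay attached to a word:

```python
import string

def word_in_tokens_no_punct(needle: str, haystack: str) -> bool:
    # Strip punctuation only from the ends of each whitespace-separated token;
    # characters inside a token (such as hyphens) are left alone
    return needle in (token.strip(string.punctuation) for token in haystack.split())

print(word_in_tokens_no_punct('word', 'are you not a word?'))                             # True
print(word_in_tokens_no_punct('word', 'did you hear this from word-of-mouth?'))           # False
print(word_in_tokens_no_punct('word-of-mouth', 'did you hear this from word-of-mouth?'))  # True
```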

But what if the word is a hyphenated word?

>>> import re
>>> needle = 'word'
>>> haystack = 'did you hear this from word-of-mouth?'
>>> alpha_haystack = re.sub(r'[^a-z0-9\s]', '', haystack)
>>> f' {needle} ' in f' {alpha_haystack} '
False
>>> print(alpha_haystack)
did you hear this from wordofmouth

With the above code, the hyphenated “word-of-mouth” becomes wordofmouth. Whether hyphenated words should keep their hyphens depends on your use case.

Let’s look at how you can incorporate hyphenated words into your search.
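If you'd rather have each part of a hyphenated word count as a separate word, one option (a hypothetical variant, not the approach used in the rest of this article) is to replace hyphens with spaces before stripping the remaining punctuation, so “word-of-mouth” becomes “word of mouth”:

```python
import re

def word_in_string_hyphen_as_space(needle: str, haystack: str) -> bool:
    # Hypothetical variant: turn hyphens into spaces so each part of a
    # hyphenated word becomes a separate word
    spaced = haystack.replace('-', ' ')
    # Same clean-up as above: keep lowercase letters, digits and whitespace
    alpha_haystack = re.sub(r'[^a-z0-9\s]', '', spaced)
    return f' {needle} ' in f' {alpha_haystack} '

print(word_in_string_hyphen_as_space('word', 'did you hear this from word-of-mouth?'))  # True
print(word_in_string_hyphen_as_space('word', 'sword'))                                  # False
```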

Finding Hyphenated Word In String

What if the needle search term was a hyphenated word?

If I were searching for a hyphenated word, I’d need to stop my regex pattern from removing hyphens. That’s as simple as adding the hyphen to the negated character class in re.sub, like so:

>>> import re
>>> needle = 'word'
>>> haystack = 'did you hear this from word-of-mouth?'
>>> alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack)
>>> f' {needle} ' in f' {alpha_haystack} '
False
>>> print(alpha_haystack)
did you hear this from word-of-mouth

But now the needle could be bordered by either a space or a hyphen, so the padded-space check no longer works. Instead, use a regex character class on each side of the needle variable to match either one, like so:

>>> import re
>>> needle = 'word'
>>> haystack = 'did you hear this from word-of-mouth?'
>>> alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack)
>>> re.findall(r'[\s-]' + needle + r'[\s-]', alpha_haystack)
[' word-']

As you can see, re.findall successfully finds the word in the haystack sentence.
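The same pattern also works when the needle itself is hyphenated, which is the question this section started with. The sketch below assumes two small additions not in the snippet above: re.escape() around the needle, so any characters with a special meaning in regex are matched literally, and spaces padded around the haystack, so a needle at the very start or end still has a character on each side to match:

```python
import re

needle = 'word-of-mouth'
haystack = 'did you hear this from word-of-mouth?'
# Keep lowercase letters, digits, whitespace and hyphens
alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack)
# re.escape() makes the needle literal; the padding lets matches at either end succeed
matches = re.findall(r'[\s-]' + re.escape(needle) + r'[\s-]', f' {alpha_haystack} ')
print(matches)  # [' word-of-mouth ']
```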

To wrap this all up into a function, here’s what I created:

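import re

def word_in_string(needle: str, haystack: str) -> bool:
    # Keep only lowercase letters, digits, whitespace and hyphens
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack)
    # Pad with spaces so a needle at the start or end of the haystack still matches
    return len(re.findall(r'[\s-]' + re.escape(needle) + r'[\s-]', f' {alpha_haystack} ')) > 0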
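For comparison, the regex word-boundary anchor \b does a similar job in a single call. This is a swapped-in technique rather than the one used above, and it behaves slightly differently: \b treats any character other than a letter, digit or underscore as a boundary, so punctuation and hyphens don't need to be stripped first, but “word_count” would not match “word” because the underscore counts as part of a word. The function name is hypothetical:

```python
import re

def word_in_string_boundary(needle: str, haystack: str) -> bool:
    # \b matches where a word character (letter, digit or underscore) meets a
    # non-word character or the start/end of the string, so punctuation and
    # hyphens next to the needle don't have to be stripped first.
    # re.escape() makes any regex metacharacters in the needle literal.
    return re.search(rf'\b{re.escape(needle)}\b', haystack) is not None

print(word_in_string_boundary('word', 'are you not a word?'))                    # True
print(word_in_string_boundary('word', 'sword'))                                  # False
print(word_in_string_boundary('word', 'did you hear this from word-of-mouth?'))  # True
```

Passing flags=re.IGNORECASE to re.search would make it case-insensitive. One caveat: if the needle itself starts or ends with a non-word character (for example “c++”), \b no longer lines up with the edges of the needle, so this version suits plain words best.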

To use this function, simply call it as follows:

>>> needle = 'word'
>>> haystack = 'what is the word for today?'
>>> word_in_string(needle, haystack)
True

Finding Word In String With Capitalization

The last hurdle to overcome was handling capitalization. What is the difference between “word” and “Word” in a sentence? The latter could be referring to Microsoft Word, the word processor.

To handle this particular case, an easy option is to call the .lower() method on the haystack variable, modifying the word_in_string function like so:

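import re

def word_in_string(needle: str, haystack: str) -> bool:
    # Lowercase first so "Word" and "word" are treated the same, then keep
    # only lowercase letters, digits, whitespace and hyphens
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack.lower())
    # Pad with spaces so a needle at the start or end of the haystack still matches
    return len(re.findall(r'[\s-]' + re.escape(needle) + r'[\s-]', f' {alpha_haystack} ')) > 0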

>>> needle = 'word'
>>> haystack = 'Do you use Word?'
>>> word_in_string(needle, haystack)
True

However, this can't tell whether you’re hunting for the plain “word” or the product “Word”. Here are some false positives from the code above:

>>> needle = 'word'
>>> haystack_1 = 'Do you use Microsoft Word?'
>>> word_in_string(needle, haystack_1)
True
>>> haystack_2 = 'Yes. Word is great for processing'
>>> word_in_string(needle, haystack_2)
True

The examples above show the shortcoming of calling .lower() on the haystack string: the function can no longer tell the ordinary word from the product name. And if the needle is at the beginning of a sentence, there is no easy way to distinguish whether it’s a proper noun or the needle.

Handling this properly may require keeping the original capitalization and manually adding conditions to the function, such as:

Is the word found at the beginning of the haystack?
Is the word found immediately after a full stop?
Is the word found at the start of dialog, for example: Simon said, "Word is awesome!"? And then you have all the nuances of the many different characters used for apostrophes and quotes.
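The first two conditions above could be sketched as follows. This is a hypothetical heuristic, not a complete solution: it lowercases the needle, accepts that lowercase form anywhere in the haystack, and only accepts a capitalized form at the start of the haystack or after sentence-ending punctuation (a full stop, question mark or exclamation mark), where capitalization is expected anyway. It doesn't handle the dialog case, and a sentence like “Yes. Word is great” remains ambiguous, so it is still matched:

```python
import re

def word_in_string_case_aware(needle: str, haystack: str) -> bool:
    # Hypothetical heuristic: the lowercase needle may appear anywhere
    word = re.escape(needle.lower())
    if re.search(rf'\b{word}\b', haystack):
        return True
    # A capitalized needle only counts at the start of the haystack or right
    # after '.', '!' or '?' plus whitespace, i.e. at the start of a sentence
    capitalized = re.escape(needle.capitalize())
    return re.search(rf'(?:^|[.!?]\s+){capitalized}\b', haystack) is not None

print(word_in_string_case_aware('word', 'Do you use Microsoft Word?'))         # False
print(word_in_string_case_aware('word', 'Yes. Word is great for processing'))  # True
print(word_in_string_case_aware('word', 'what is the word for today?'))        # True
```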

For my particular use case, keeping everything lowercase was sufficient, and hopefully it suits yours too. There are more complications to consider when searching for a word in a string, and capitalization is likely the hardest to tackle.

Summary

Using Python to search for a word in a string is a relatively simple exercise, but one that needs some additional thought depending upon your use case.

If the haystack needs no modification, that is, the word will never be attached to punctuation or hyphens, a simple one-liner will do:

f' {needle} ' in f' {haystack} '

If the haystack is likely to contain punctuation, capitalization and hyphens (and capitalization doesn’t matter), then you might want to define a function along these lines:

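import re

def word_in_string(needle: str, haystack: str) -> bool:
    # Lowercase, then keep only lowercase letters, digits, whitespace and hyphens
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack.lower())
    # Pad with spaces so a needle at the start or end of the haystack still matches
    return len(re.findall(r'[\s-]' + re.escape(needle) + r'[\s-]', f' {alpha_haystack} ')) > 0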
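To check a function like this against the examples from this article, a quick assertion loop can help. The expected values below are the results discussed above, including the known false positive for “Microsoft Word”; the cases list is just an illustration you can extend with your own strings:

```python
import re

def word_in_string(needle: str, haystack: str) -> bool:
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack.lower())
    return len(re.findall(r'[\s-]' + re.escape(needle) + r'[\s-]', f' {alpha_haystack} ')) > 0

# (needle, haystack, expected) triples taken from the examples in this article
cases = [
    ('word', 'sword', False),
    ('word', 'word is here', True),
    ('word', 'are you not a word?', True),
    ('word', 'did you hear this from word-of-mouth?', True),
    ('word', 'what is the word for today?', True),
    ('word', 'Do you use Word?', True),
    ('word', 'Do you use Microsoft Word?', True),  # known false positive
]

for needle, haystack, expected in cases:
    assert word_in_string(needle, haystack) == expected, (needle, haystack)
print('all cases pass')
```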
