
Is Word In String? Python One-Liner & Custom Function

Recently, I had an issue where I wanted to exclude strings from a list if they contained a certain word. I thought I could use the following common code, familiar to most Python users:

if 'word' in 'how to find word in string':
    # do something
    pass
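As a minimal sketch of the original task (excluding strings from a list when they contain a word), the naive `in` check can sit inside a list comprehension. The `phrases` list is made-up sample data:

```python
# Hypothetical sample data for illustration
phrases = ['how to find word in string', 'a sharp sword', 'nothing to see']

# Keep only the phrases that do NOT contain the substring 'word'
kept = [p for p in phrases if 'word' not in p]
print(kept)
# ['nothing to see']
```

Notice that 'a sharp sword' is excluded as well, because `in` matches substrings rather than whole words.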

But the problem ended up being a little more difficult than that. For example, what if you want to exclude strings containing the term word, but not strings where word only appears inside another word, like sword?

'word' in 'sword'
# True

I then thought this could be achieved by adding spaces around my searched word:

needle = 'word'
haystack = 'sword'
f' {needle} ' in haystack
# False

But what if the word is at the beginning or end of the haystack phrase?

needle = 'word'
haystack = 'word is here'
f' {needle} ' in haystack
# False

One method is to wrap the haystack in spaces as well, like so:

needle = 'word'
haystack = 'word in here'
f' {needle} ' in f' {haystack} '
# True
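To check the padded approach against every case covered so far, here is a small sketch. The helper name `padded_in` is hypothetical; it simply wraps the one-liner above:

```python
def padded_in(needle: str, haystack: str) -> bool:
    # Pad both strings with spaces so a match must be a whole, space-delimited token
    return f' {needle} ' in f' {haystack} '

print(padded_in('word', 'find the word here'))  # True: word in the middle
print(padded_in('word', 'word is here'))        # True: word at the start
print(padded_in('word', 'the last word'))       # True: word at the end
print(padded_in('word', 'sword'))               # False: word inside another word
```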

Finding Word In String With Punctuation

But then I ran into another issue: what if the haystack contained punctuation like commas, colons, semi-colons, full stops, question marks and exclamation marks?

needle = 'word'
haystack = 'are you not a word?'
f' {needle} ' in f' {haystack} '
# False

I’d have to remove all non-alphanumeric characters from the haystack string (except whitespace):

import re

needle = 'word'
haystack = 'are you not a word?'
alpha_haystack = re.sub(r'[^a-z0-9\s]', '', haystack)
f' {needle} ' in f' {alpha_haystack} '
# True
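One thing worth checking: the character class `[a-z]` only covers lowercase ASCII letters, so the negated class `[^a-z0-9\s]` strips capital letters along with the punctuation. A quick sketch of the effect, and of lowercasing the haystack first:

```python
import re

haystack = 'Are you not a Word?'

# [^a-z0-9\s] matches anything that is NOT a lowercase letter, digit or whitespace,
# so the uppercase 'A' and 'W' are removed along with the '?'
print(re.sub(r'[^a-z0-9\s]', '', haystack))
# re you not a ord

# Lowercasing first keeps the letters and only strips the punctuation
print(re.sub(r'[^a-z0-9\s]', '', haystack.lower()))
# are you not a word
```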

This does cause issues for hyphenated words:

import re

needle = 'word'
haystack = 'did you hear this from word-of-mouth?'
alpha_haystack = re.sub(r'[^a-z0-9\s]', '', haystack)
f' {needle} ' in f' {alpha_haystack} '
# False
print(alpha_haystack)
# did you hear this from wordofmouth

With the above code, the hyphenated word “word-of-mouth” becomes wordofmouth. Whether hyphenated words should keep their hyphens depends on your use case.
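If you would rather treat the parts of a hyphenated word as separate words, one alternative (an assumption about your use case, not the approach taken in the rest of this article) is to replace hyphens with spaces before stripping the remaining punctuation:

```python
import re

needle = 'word'
haystack = 'did you hear this from word-of-mouth?'

# Turn hyphens into spaces first, so 'word-of-mouth' becomes 'word of mouth'
spaced = haystack.replace('-', ' ')
# Then strip everything that is not a lowercase letter, digit or whitespace
alpha_haystack = re.sub(r'[^a-z0-9\s]', '', spaced)
print(alpha_haystack)
# did you hear this from word of mouth
print(f' {needle} ' in f' {alpha_haystack} ')
# True
```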

Finding Hyphenated Word In String

What if the needle search term was a hyphenated word?

If I was searching for a hyphenated word, then I’d need to add the hyphen to the negated character class in my regex pattern so that hyphens are kept, like so:

import re

needle = 'word-of-mouth'
haystack = 'did you hear this from word-of-mouth?'
alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack)
f' {needle} ' in f' {alpha_haystack} '
# True
print(alpha_haystack)
# did you hear this from word-of-mouth

To wrap this all up into a function, here’s what I created:

import re

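def word_in_string(needle: str, haystack: str) -> bool:
    # Remove everything except lowercase letters, digits, whitespace and hyphens
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack)
    # Pad both strings with spaces so only whole words match
    return f' {needle} ' in f' {alpha_haystack} '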

To use this function, simply call it as follows:

needle = 'word'
haystack = 'what is the word for today?'
word_in_string(needle, haystack)
# True
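Going back to the original problem of excluding strings from a list, the function can be dropped into a list comprehension. This sketch repeats the function so it runs on its own, and the sample phrases are made up for illustration:

```python
import re

def word_in_string(needle: str, haystack: str) -> bool:
    # Remove everything except lowercase letters, digits, whitespace and hyphens
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack)
    return f' {needle} ' in f' {alpha_haystack} '

# Hypothetical list of phrases to filter
phrases = [
    'what is the word for today?',
    'a sharp sword',
    'did you hear this from word-of-mouth?',
]

# Exclude phrases where 'word' appears as a whole word
kept = [p for p in phrases if not word_in_string('word', p)]
print(kept)
# ['a sharp sword', 'did you hear this from word-of-mouth?']
```

The hyphenated phrase is kept because, with hyphens retained, word-of-mouth counts as a single word.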

Finding Word In String With Capitalisation

The last hurdle to overcome was handling capitalisation. What is the difference between “word” and “Word” in a sentence? The latter could be talking about the handy software Microsoft Word.

To handle this particular case, an easy way is to call the .lower() method on the haystack variable, modifying the word_in_string function like so:

import re

def word_in_string(needle: str, haystack: str) -> bool:
    # Lowercase the haystack before stripping unwanted characters
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack.lower())
    return f' {needle} ' in f' {alpha_haystack} '

needle = 'word'
haystack = 'Do you use Word?'
word_in_string(needle, haystack)
# True

However, this doesn’t help distinguish whether you’re hunting for just “word” or “Word”. Here are some false positives using the above code:

needle = 'word'
haystack_1 = 'Do you use Microsoft Word?'
word_in_string(needle, haystack_1)
# True
haystack_2 = 'Yes. Word is great for processing'
word_in_string(needle, haystack_2)
# True

The example above shows the shortcoming of calling the .lower() method on the haystack string. If the needle word is at the beginning of a sentence, there is no easy way to distinguish whether it’s a proper noun or the needle.

If capitalisation is retained, conditions like these may need to be handled manually in the function:

  • Is the word found at the beginning of the haystack?
  • Is the word found right after a full stop?
  • Is the word found at the start of dialogue, for example: Simon said, "Word is awesome!"? This brings in all the nuances of the 15 different character types for apostrophes and quotes.
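As a rough sketch of the first two conditions (a heuristic, not a complete solution), the function below accepts the lowercase needle anywhere as a whole word, but accepts a capitalised needle only at the very start of the haystack or after a full stop, question mark or exclamation mark. The function name is hypothetical, and it swaps the space padding for regex lookarounds so punctuation next to the word does not need stripping first:

```python
import re

def word_in_string_heuristic(needle: str, haystack: str) -> bool:
    """Hypothetical heuristic: match the lowercase needle anywhere, but accept a
    capitalised needle only at the start of the haystack or after ., ? or !"""
    word = re.escape(needle.lower())
    capitalised = re.escape(needle.capitalize())
    # Lowercase needle not touching letters, digits, underscores or hyphens
    if re.search(rf'(?<![\w-]){word}(?![\w-])', haystack):
        return True
    # Capitalised needle at the very start, or after sentence-ending punctuation
    return re.search(rf'(?:^|[.?!]\s+){capitalised}(?![\w-])', haystack) is not None

print(word_in_string_heuristic('word', 'Word of the day'))                    # True
print(word_in_string_heuristic('word', 'Do you use Microsoft Word?'))         # False
print(word_in_string_heuristic('word', 'Yes. Word is great for processing'))  # True
```

Dialogue such as Simon said, "Word is awesome!" still returns False here, since an opening quote is not one of the characters this heuristic checks for.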

For my particular use case, keeping everything lower case was sufficient, and hopefully this article has covered your use case too. There are more complications to consider when searching for a word in a string, and capitalisation is certainly the hardest to tackle.
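If your use case does need to tell “word” and “Word” apart, the simplest variant is a flag that skips .lower() and keeps capital letters in the character class. The case_sensitive parameter is hypothetical, added here for illustration:

```python
import re

def word_in_string(needle: str, haystack: str, case_sensitive: bool = False) -> bool:
    # case_sensitive is a hypothetical parameter added for this sketch
    if case_sensitive:
        # Keep both upper- and lowercase letters so 'Word' and 'word' stay distinct
        alpha_haystack = re.sub(r'[^A-Za-z0-9\s-]', '', haystack)
    else:
        # Lowercase the haystack (and, in this sketch, the needle too)
        needle = needle.lower()
        alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack.lower())
    return f' {needle} ' in f' {alpha_haystack} '

print(word_in_string('word', 'Do you use Microsoft Word?'))                       # True
print(word_in_string('word', 'Do you use Microsoft Word?', case_sensitive=True))  # False
print(word_in_string('word', 'Find the word, please.', case_sensitive=True))      # True
```

The trade-off is that with case_sensitive=True, a common-noun word that happens to start a sentence (as in "Word to the wise") no longer matches.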

Summary

Using Python to search for a word in a string is a relatively simple exercise, but one that needs some additional thought depending upon your use case.

A simple one-liner can be used if no modification is needed on the haystack string:

f' {needle} ' in f' {haystack} '

If the haystack string needs modifying, then you might want to define a function along these lines (this function retains hyphens in words):

import re

def word_in_string(needle: str, haystack: str) -> bool:
    # Lowercase, then strip everything except letters, digits, whitespace and hyphens
    alpha_haystack = re.sub(r'[^a-z0-9\s-]', '', haystack.lower())
    return f' {needle} ' in f' {alpha_haystack} '
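An alternative worth knowing about (a different technique from the one in this article) is a regular expression with word boundaries, `\b`. It handles punctuation and the start or end of the string without any padding, and `re.escape` guards against needles that contain special regex characters. The function name is hypothetical:

```python
import re

def word_in_string_re(needle: str, haystack: str) -> bool:
    # \b matches where a word character (\w) meets a non-word character or string edge
    return re.search(rf'\b{re.escape(needle)}\b', haystack) is not None

print(word_in_string_re('word', 'are you not a word?'))                    # True
print(word_in_string_re('word', 'sword'))                                  # False
print(word_in_string_re('word', 'did you hear this from word-of-mouth?'))  # True
```

Note that `\b` treats a hyphen as a boundary, so word matches inside word-of-mouth here, unlike the hyphen-retaining function above. Passing `flags=re.IGNORECASE` to `re.search` gives the same case-insensitive behaviour as lowercasing.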