Checking whether two strings are anagrams of each other is a fairly common coding challenge or kata. At a local meetup recently, we looked at an extended version of the challenge: not just to to check whether two words are anagrams, but also to find all possible valid anagrams within a language reference. This is a more difficult version of the common “is it an anagram” coding challenge.
There are several approaches you could take for this challenge, and some are more efficient than others. This is one of the great reasons for doing these kinds of exercise. It’s not just about finding a solution, but about thinking about different approaches to a problem, and reasoning as to why one approach may be better than another in a specific context.
An initial idea which might present itself when looking for anagrams is the use of permutations – ordered combinations of the available characters. The trouble with permutations-based solutions though is that they tend to be very time consuming, as the number of permutations of n
object grows very fast as n
grows. In fact the number can be calculated as n!
or “n factorial” which means n x (n-1) x (n-2) x ... 2 x 1
. This is because there are n possible choices for the first item, and once this choice has been made, for each of the initial choices, there are n-1
remaining options, and so on. Due to the generally inefficient nature of permutations-based solutions, we did not try using this approach for this challenge.
Python anagrams challenge using sorting
A potentially more efficient solution, which may seem obvious in hindsight, but which is really quite a clever idea for whoever first thought of it, is the sort the words to be compared. For example, if we have the word “arts”, and wish to see whether it is an anagram of “star”, we can sort both words alphabetically, and it becomes clear that both contain the same characters, as they both become “arst.”
>>> sorted("arts")
['a', 'r', 's', 't']
>>> sorted("star")
['a', 'r', 's', 't']
Have a go now at implementing you own solution to this challenge based on sorting using the boilerplate Python code given below.
Some notes about the boilerplate code:
- I am using a “dictionary” (as in word list) provided on a URL referenced in Al Sweigart’s excellent book “Cracking Codes With Python: An Introduction To Building And Breaking Ciphers” linked below.
- Before we begin looking for anagrams the word list is processed by importing using
urllib.request
, and converting to upper case usingmap
. - Since the words list is now upper case, don’t forget to account for this in your solution function.
-
Good luck!
import urllib.request LANGUAGE_REFERENCE_URL = "https://inventwithpython.com/dictionary.txt" def find_anagrams(input_str): pass with urllib.request.urlopen(LANGUAGE_REFERENCE_URL) as response: words_list = response.read().decode('utf-8').split("\n") # Convert words_list to upper case. Needed if source has mixed case words_list = list(map(lambda x: x.upper(), words_list)) # Create a version of word_list with each entry sorted sorted_words_list = list(map(lambda x: "".join(sorted(x)), words_list)) print(find_anagrams("rats")) print(find_anagrams("RatS")) print(find_anagrams("Goldfish")) print(find_anagrams("SteaL"))
Expected output:
['ARTS', 'RATS', 'STAR']
['ARTS', 'RATS', 'STAR']
['GOLDFISH']
['LEAST', 'SLATE', 'STALE', 'STEAL', 'TALES']
Cracking Codes with Python
Check out this fun book on cracking codes with Python. Develop your Python coding skills while learning about cryptography.
Please note, as an Amazon Associate I earn from qualifying purchases.
Python Anagrams Challenge Solutions using Sorting.
Hopefully you had a good go at the challenge on your own before getting here. Even if you didn’t fully succeed, you will have gained valuable insight into the problem and which aspects of your own knowledge may need developing.
Here’s my solution:
import urllib.request
import time
LANGUAGE_REFERENCE_URL = "https://inventwithpython.com/dictionary.txt"
def find_anagrams_sorting(input_str):
input_str = input_str.upper()
sorted_input_str = "".join(sorted(input_str))
results = []
for idx, word in enumerate(sorted_words_list):
if word == sorted_input_str:
results.append(words_list[idx])
return results
with urllib.request.urlopen(LANGUAGE_REFERENCE_URL) as response:
words_list = response.read().decode('utf-8').split("\n")
words_list = list(map(lambda x: x.upper(), words_list)) # Needed if source has mixed case
sorted_words_list = list(map(lambda x: "".join(sorted(x)), words_list))
# Time execution for sorting version
start = time.perf_counter()
print(find_anagrams_sorting("rats"))
print(find_anagrams_sorting("RatS"))
print(find_anagrams_sorting("Goldfish"))
print(find_anagrams_sorting("SteaL"))
end = time.perf_counter()
print(f"Checked all anagrams in language reference using sorting. Seconds taken: {end - start:.7f}")
Python anagrams challenge using hash tables
A common way to get a fast solution to a coding problem is to use a hash table. These are collections of (key, data)
pairs which allow fast access by performing a transformation on the key which determines the data’s storage location. This same process can then be used to quickly retrieve the data from it’s location in memory.
Python Counters Example
A specific type of hash table which is very useful in solving the anagrams challenge is a Counter
. The idea is that if you keep a record of how many times each character in a word appears, an anagram of the same word will have the same number of each letter. So we can compare our “counters” to see whether two words are anagrams of each other.
Here’s a basic example of a Python counter:
from collections import Counter
word = "drink"
count_me = Counter(word)
print(count_me)
Output:
Counter({'d': 1, 'r': 1, 'i': 1, 'n': 1, 'k': 1})
Python Anagrams Challenge Solution using Hash Tables.
Here’s my solution using Python Counter
s which are a type of hash table. Notice that the code is quite similar to the above solution based on sorting.
import urllib.request
import time
from collections import Counter
LANGUAGE_REFERENCE_URL = "https://inventwithpython.com/dictionary.txt"
def find_anagrams_hash_table(input_str):
input_str = input_str.upper()
input_str_counter = Counter(input_str)
results = []
for idx, word in enumerate(word_counters):
if word == input_str_counter:
results.append(words_list[idx])
return results
with urllib.request.urlopen(LANGUAGE_REFERENCE_URL) as response:
words_list = response.read().decode('utf-8').split("\n")
words_list = list(map(lambda x: x.upper(), words_list)) # Needed if source has mixed case
word_counters = list(map(Counter, words_list))
# Time execution
start = time.perf_counter()
print(find_anagrams_hash_table("rats"))
print(find_anagrams_hash_table("RatS"))
print(find_anagrams_hash_table("Goldfish"))
print(find_anagrams_hash_table("SteaL"))
end = time.perf_counter()
print(f"Checked all anagrams in language reference using dictionary and counters. \
Seconds taken: {end - start:.7f}")
Comparing efficiency of sorting vs hashing solutions to the Python algorithms challenge
To compare the efficiency of the sorting vs hashing solutions to the Python algorithms challenge, I used perf_counter()
from the time
module and got the following results:
Hash Table Version Execution Time
Checked all anagrams in language reference using counters.
Seconds taken: 0.2034073
Sorting Version Execution Time
Checked all anagrams in language reference using sorting. Seconds taken: 0.0113170
These results are perhaps surprising, as generally hash tables are considered to be very fast (average case Big-O(1) for lookup), whereas sorting is often Big-O(n log n). A possible explanation that for such small words Big-O(n log n) is so small that anything else can quite easily drag down the run-time. Also, comparing counter objects involves comparing string keys and corresponding integer values. IN this context, this approach turns out to be less efficient than the sorting-based solution.
Code for timing execution
The code for this timing looked like this, with the function calls replaced with the appropriate function name for the sorting version:
Don’t for get to import time
at the top of you file if you want to try this yourself.
# Time execution for hash table version
start = time.perf_counter()
print(find_anagrams_hash_table("rats"))
print(find_anagrams_hash_table("RatS"))
print(find_anagrams_hash_table("Goldfish"))
print(find_anagrams_hash_table("SteaL"))
end = time.perf_counter()
print(f"Checked all anagrams in language reference using counters. Seconds taken: {end - start:.7f}")
This article has explored a less well-known variant of the “anagrams” coding challenge, where all anagrams of a given input within a language reference are to be found. Different approaches have been covered and their performance compared. I hope you found it interesting and helpful.
Happy Computing!