Scraping And Finding Ordered Words

What are ordered words?

An ordered word is a word whose letters appear in alphabetical order (repeated letters are allowed), for example "abbey" and "dirt". Any other word is unordered, for example "geeks".
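
The definition can be checked directly with a minimal sketch like the one below. The helper name is_ordered is hypothetical, and the sketch assumes lowercase input, so that comparing characters matches alphabetical order.

```python
def is_ordered(word: str) -> bool:
    """Return True if the letters of `word` are already in alphabetical order."""
    # A word is ordered exactly when sorting its letters changes nothing.
    return list(word) == sorted(word)


print(is_ordered("abbey"))  # True
print(is_ordered("dirt"))   # True
print(is_ordered("geeks"))  # False
```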

The task at hand

This task is taken from Rosetta Code, and it is less mundane than the description above suggests. To get a large number of words, we will use the online dictionary at http://www.puzzlers.org/pub/wordlists/unixdict.txt, which contains about 25,000 words. Since we are using Python, we can scrape the dictionary directly instead of downloading it as a text file and then performing file-handling operations on it.

Requirements:

pip install requests

Code


Scraping

  1. Use the Python library requests to fetch the data from the given URL
  2. Store the content fetched from the URL as a string
  3. Convert the long string of content into a list of words
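
The scraping steps above can be sketched as follows. The function names parse_word_list and fetch_words are hypothetical, and the timeout value and the status check are assumptions added for robustness rather than part of the original program. The parsing step is kept separate so it can be tried on sample bytes without a network connection.

```python
def parse_word_list(raw: bytes) -> list:
    """Decode UTF-8 bytes and split them on whitespace into a list of words."""
    return raw.decode("utf-8").split()


def fetch_words(url: str, timeout: float = 10.0) -> list:
    """Download the word list at `url` and return it as a list of words."""
    # Imported here so that parse_word_list can be used without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail early on an HTTP error status
    return parse_word_list(response.content)


# Offline example using a small sample in place of the downloaded content
print(parse_word_list(b"10th\n1st\nabbey\r\ndirt\n"))
```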

Finding the ordered words

  1. Traverse the list of words
  2. Compare the ASCII values of every pair of adjacent characters in each word
  3. If any character is greater than the one after it, the word is not ordered; otherwise, print the ordered word
# Python program to find ordered words
import requests

# Scrapes the words from the URL below and stores
# them in a list
def getWords():

	# unixdict.txt contains about 25,000 words
	url = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"
	fetchData = requests.get(url)

	# extracts the content of the webpage
	wordList = fetchData.content

	# decodes the UTF-8 encoded text and splits the
	# string to turn it into a list of words
	wordList = wordList.decode("utf-8").split()

	return wordList


# fetches the word list and prints every ordered word
# that has three or more letters
def isOrdered():

	# fetching the wordList
	collection = getWords()

	# since the first few elements of the dictionary
	# are not plain words (for example numbers),
	# getting rid of them by slicing off the first
	# 16 elements
	collection = collection[16:]
	word = ''

	for word in collection:
		result = 'Word is ordered'
		i = 0
		l = len(word) - 1

		if (len(word) < 3): # skips the 1 and 2 lettered strings
			continue

		# traverses through all characters of the word in pairs
		while i < l:		
			if (ord(word[i]) > ord(word[i+1])):
				result = 'Word is not ordered'
				break
			else:
				i += 1

		# only printing the ordered words, as "word: result"
		if (result == 'Word is ordered'):
			print(word, result, sep=': ')


# execute isOrdered() function
if __name__ == '__main__':
	isOrdered()
Output:
aau: Word is ordered
abbe: Word is ordered
abbey: Word is ordered
abbot: Word is ordered
abbott: Word is ordered
abc: Word is ordered
abe: Word is ordered
abel: Word is ordered
abet: Word is ordered
abo: Word is ordered
abort: Word is ordered
accent: Word is ordered
accept: Word is ordered
...........................
...........................
...........................
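
As a cross-check on the loop above, the same filtering can be written more compactly. This is only a sketch: the function name ordered_words and the min_length parameter are hypothetical, and it runs on a small hand-picked sample instead of the downloaded dictionary.

```python
def ordered_words(words, min_length=3):
    """Return the words of at least `min_length` letters whose letters never decrease."""
    return [
        w for w in words
        # zip(w, w[1:]) yields every pair of adjacent characters in the word
        if len(w) >= min_length and all(a <= b for a, b in zip(w, w[1:]))
    ]


sample = ["abbey", "dirt", "geeks", "ab", "accent"]
for w in ordered_words(sample):
    print(f"{w}: Word is ordered")
```

Comparing characters with <= is equivalent to comparing their ord() values, so this applies the same pairwise test as the program above.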