Classifying Handwriting Using Feedforward Neural Networks

Girik Narang
14 min read · Dec 16, 2022

Photo by Jason Brownlee on Machine Learning Mastery

Take a minute to try to read out the numbers in the picture above.

Seems simple enough, right?

What if I told you this was actually a challenging task… for a computer?

It turns out that, as intelligent and complex as these machines can be, there are some things computers just can’t do, at least not as well as humans.

An untrained computer would struggle to recognize these digits because no two handwritten images contain exactly the same pixels. In other words, unless two digits are EXACT copies of one another, a program that simply compares pixels will miss the underlying similarities and, as a result, fail to group the two together.

Since Each “3” Is Written Differently, Computers May Have a Hard Time Classifying Them

Attempting to convert written sentences to typed form can get even more complicated. The computer now has to factor in additional variables humans wouldn’t even think twice about, including the start of new words, letter spacing, capitals vs lowercase, and messy handwriting.

Interestingly enough, that’s where machine learning can be applied.

Machine learning, or ML, is a subfield of artificial intelligence (AI) that analyses and develops algorithms that can learn from data, identify patterns, and make decisions/predictions with minimal human intervention.

Machine learning algorithms can be divided into four types:

  • Supervised learning algorithms use labelled data to learn how to classify or predict a specific output.
  • Unsupervised learning algorithms use unlabeled data to identify patterns and hidden relationships between data points.
  • Semisupervised learning algorithms use both labelled and unlabeled data to produce more accurate models.
  • Reinforcement learning algorithms use rewards and punishments to learn how to optimize an action or decision.
Breakdown of Machine Learning Types

Neural Networks

Neural networks are one of AI’s most widely used buzzwords. These learning algorithms are loosely modelled after the human brain and are used to solve complex problems that are generally too demanding for traditional algorithms. Each “neuron” is connected to other neurons and passes information along weighted connections that play the role of “synapses,” much like neurons in the brain pass electrical signals to one another.

Neural networks learn from data by recognizing patterns and then making decisions based on those patterns. They adjust their parameters as new data is presented, which lets them perform better as they see more examples. This process of adjusting parameters from data is known as “training”; the term “deep learning” refers specifically to training networks that have more than one hidden layer.

The above diagram illustrates the typical architecture of a neural network. It starts with an input layer that takes in certain data from a specific dataset and ends with an output layer that predicts the final outcome. In between, hidden layers perform complex computations in the network to get from the input to the output layer. If there is more than one hidden layer, the neural network is considered “deep.”

Going back to our original digit classifying dilemma, we can use a specific type of neural network, called feedforward neural networks, to train an algorithm that simulates the activity in our brains by using patterns to identify the characteristics of a particular digit instead of simply trying to memorize it.

In a feedforward neural network, information only moves in one direction: forward, from the input layer through the hidden layers to the output layer, with no loops that feed outputs back into earlier layers. During training, one complete pass through all of the training data is called an epoch. The network’s weights are adjusted along the way by propagating the prediction error backward, and running several epochs generally improves accuracy.
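To make the one-directional flow concrete, here is a minimal sketch of a single forward pass written in NumPy. The layer sizes (784 inputs, 50 hidden neurons, 26 outputs) mirror the network built later in this article, but the random weights, the ReLU and softmax choices, and the function name `forward` are illustrative assumptions rather than a description of what any particular library does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: 784 pixels in, 50 hidden neurons, 26 letters out
n_inputs, n_hidden, n_outputs = 784, 50, 26

# Random, untrained weights and zero biases (assumption: for demonstration only)
W1 = rng.normal(0, 0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(x):
    """Pass one flattened image through the network: input -> hidden -> output."""
    hidden = np.maximum(0, x @ W1 + b1)         # ReLU activation in the hidden layer
    scores = hidden @ W2 + b2                   # one raw score per letter
    exp_scores = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp_scores / exp_scores.sum()        # softmax: scores -> probabilities

fake_image = rng.random(n_inputs)               # stand-in for a 28x28 image, values in [0, 1)
probabilities = forward(fake_image)
print(probabilities.shape)                      # (26,)
print(round(float(probabilities.sum()), 6))     # 1.0
```

Nothing in `forward` ever sends a value back to an earlier layer, which is all that “feedforward” means. Training is what adjusts `W1`, `b1`, `W2`, and `b2` so that the highest probability lands on the right letter.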

Now that we have this information, we can actually employ these concepts to develop a handwriting classifier using Google Colaboratory (Colab). This free cloud-based programming environment allows users to write code, develop machine learning models, and collaborate with other developers.

If you’re new to programming or Python, don’t worry; I’ll keep this section as straightforward as possible while still breaking all the code down.

Finally, we will need to take the following steps to ensure this project is a success:

  1. Find or create a labelled data set to train our neural network
  2. Create the neural network
  3. Train, test, and modify our code until we feel like our model is accurate
  4. Test our model with unique data and analyze the results

Building The Project

Step #1: Importing Libraries

The first thing we’ll do is install and import the EMNIST library so that we have access to the training set.

!git clone https://github.com/sorki/python-mnist
!./python-mnist/bin/mnist_get_data.sh
!pip3 install emnist
from emnist import extract_training_samples

print("Imported the EMNIST libraries we need!")

Step #2: Extracting Training Samples From the Imported Libraries

We need to access data from the imported libraries to prepare for our model. Specifically, we will access the EMNIST “letters” dataset, which consists of 145,600 28x28 pixel images of letters. Each pixel is a grayscale value between 0 and 255. To provide a more efficient input for the neural network, we divide each value by 255 and transform it into a number between 0 and 1. This approach is referred to as pre-processing and may vary depending on the data type.

This dataset contains labels for the images our neural network will learn from. We will use the first 60,000 images as the training set and the following 10,000 images as the testing set to evaluate how well the network learned. This is an arbitrary choice, but it is a large enough sample to produce an accurate model. Remember to keep the training and testing sets separate if you want to try different samples.

# Grab the data using the emnist package (it is downloaded the first time this runs)
# X will be our images and y will be the labels
X, y = extract_training_samples('letters')

# Make sure that every pixel in all of the images is a value between 0 and 1
X = X / 255.

# Use the first 60000 instances as training and the next 10000 as testing
X_train, X_test = X[:60000], X[60000:70000]
y_train, y_test = y[:60000], y[60000:70000]

# There is one other thing we need to do: flatten each 28x28 image into
# a single row of 784 pixel values, which is the input shape the MLP expects
X_train = X_train.reshape(60000,784)
X_test = X_test.reshape(10000,784)

print("Extracted our samples and divided our training and testing data sets")

It might take a little time to run this cell — remember that this code essentially downloads 70,000 pictures!
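Before moving on, it can help to confirm that the arrays have the shapes and value ranges we expect. The sketch below uses randomly generated stand-in data (an assumption, so that it runs without downloading EMNIST) and a hypothetical helper named `check_split`; on the real data you would pass in `X_train`, `y_train`, `X_test`, and `y_test` from the cell above.

```python
import numpy as np

def check_split(X_train, y_train, X_test, y_test, n_pixels=784):
    """Return a list of problems found in a train/test split (empty if none)."""
    problems = []
    for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        if X.shape != (len(y), n_pixels):
            problems.append(f"{name}: expected shape ({len(y)}, {n_pixels}), got {X.shape}")
        if X.min() < 0 or X.max() > 1:
            problems.append(f"{name}: pixel values fall outside [0, 1]")
    return problems

# Stand-in data: 8-bit "images" and labels 1..26, like the EMNIST letters set
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(700, 28, 28))
labels = rng.integers(1, 27, size=700)

X = raw / 255.
X_train, X_test = X[:600].reshape(600, 784), X[600:700].reshape(100, 784)
y_train, y_test = labels[:600], labels[600:700]

print(check_split(X_train, y_train, X_test, y_test))        # []
print(check_split(X_train * 255, y_train, X_test, y_test))  # flags the unscaled training pixels
```

An empty list means the split looks consistent; forgetting the division by 255 is caught by the second check.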

Step #3: Verify That the Data Was Downloaded Correctly

import matplotlib.pyplot as plt

img_index = 14000 # <<<<< You can update this value to look at other images
img = X_train[img_index]
print("Image Label: " + str(chr(y_train[img_index]+96)))
plt.imshow(img.reshape((28,28)))

Let’s double-check the data we downloaded by examining the photos and labels in the set. Swap the img_index with various numbers and run the code to check other training examples.

We can easily recognize some of the letters in the dataset (for example, 14000 is an apparent “M”), though others can be more difficult to decipher.

It is crucial that we can be sure that the data was downloaded successfully, thus enabling us to move forward with the next steps.
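The expression `chr(y_train[img_index]+96)` in the cell above works because the labels in the EMNIST letters set run from 1 to 26 and the character code of a lowercase “a” is 97. As a small sketch, the same conversion can be wrapped in two helpers; the names `label_to_letter` and `letter_to_label` are hypothetical and not part of any library.

```python
def label_to_letter(label):
    """Convert an EMNIST letters label (1-26) to a lowercase letter."""
    if not 1 <= label <= 26:
        raise ValueError(f"label must be between 1 and 26, got {label}")
    return chr(label + 96)   # ord('a') == 97, so label 1 -> 'a'

def letter_to_label(letter):
    """Convert a letter (either case) back to its EMNIST letters label."""
    return ord(letter.lower()) - 96

print(label_to_letter(1))     # a
print(label_to_letter(13))    # m
print(letter_to_label("M"))   # 13
```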

Step #4: Building Our MLP Neural Network

Having studied the data, let’s construct a multi-layer perceptron (MLP) neural network to accept images as inputs and output the correct letter. To expedite the process, we’ll leverage an existing library that provides the MLP components rather than writing the entire network from scratch. This approach is often employed when tackling AI challenges, as existing libraries can be tweaked and adapted to suit our needs.

To try this out, we’ll create a neural network — an MLP classifier — with one hidden layer that has 50 neurons in it. We’ll have it run through the training data 20 times (that’s max_iter=20) so it doesn't take too long.

We also set various other parameters for teaching the MLP. Some of these, such as the learning rate, may be familiar to you. By altering these parameters, we can affect how the MLP behaves in the training and testing phase. I recommend starting with the default settings and then working on it from there.

# This line imports the ML library we need
from sklearn.neural_network import MLPClassifier

# This creates our first MLP with 1 hidden layer with 50 neurons and sets it to run through the data 20 times
mlp1 = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, alpha=1e-4,
                     solver='sgd', verbose=10, tol=1e-4, random_state=1,
                     learning_rate_init=.1)

print("Created our first MLP network")
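To get a feel for what `hidden_layer_sizes` controls, we can count how many adjustable numbers (weights and biases) a fully connected network contains. This is a rough sketch that assumes 784 input pixels and 26 output letters; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Count weights and biases in a fully connected feedforward network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out   # one weight per connection between the two layers
        total += fan_out            # one bias per neuron in the receiving layer
    return total

print(count_parameters(784, (50,), 26))     # 40576
print(count_parameters(784, (50, 50), 26))  # 43126 (a second hidden layer of 50 neurons)
```

Even this small network has tens of thousands of parameters to tune, which is why training takes a noticeable amount of time.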

Step #5: Training the MLP

Now that we have created the MLP, let’s train it. Use the “fit” command to start the training process, then check how well the network classifies handwritten letters. It could take a moment for the training to complete. During the training, the program will print out status updates about the learning process. As the procedure progresses, the loss value should decrease, which means the MLP is becoming better at recognizing the examples it is being taught.

mlp1.fit(X_train, y_train)
print("Training set score: %f" % mlp1.score(X_train, y_train))
print("Test set score: %f" % mlp1.score(X_test, y_test))
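Besides reading the printed status updates, you can inspect the loss after training: scikit-learn stores one value per epoch in the `loss_curve_` attribute when the `sgd` solver is used. The sketch below trains on scikit-learn’s small bundled digits dataset, which is an assumption made so that the example runs quickly without EMNIST; the variable name `demo_mlp` is hypothetical.

```python
import warnings
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small stand-in dataset: 1,797 images of 8x8 digits with pixel values from 0 to 16
X, y = load_digits(return_X_y=True)
X = X / 16.

demo_mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, solver='sgd',
                         random_state=1, learning_rate_init=.1)

# Stopping at max_iter before full convergence raises a warning we can ignore here
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    demo_mlp.fit(X, y)

# loss_curve_ holds the training loss recorded after each epoch
for epoch, loss in enumerate(demo_mlp.loss_curve_, start=1):
    print(f"epoch {epoch:2d}: loss = {loss:.4f}")
```

If the printed losses stop shrinking or start bouncing around, that is usually a hint to lower the learning rate or train for longer.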

Step #6: Analyze Training + Testing Accuracy

Each call to fit starts the network from fresh weights. Because we set random_state=1, those starting weights are the same on every run, so the scores should be reproducible. If you change or remove random_state, the weights start from different random values each time, and you should observe that the accuracy and loss stay in the same range but with some variation. Usually, you will observe a training dataset accuracy of around 89% and a testing dataset accuracy of 84%, as long as you haven’t modified anything involving the MLP structure.

That’s actually pretty good — it means that the network is guessing correctly 84% of the time on images it’s never been exposed to before. We could further examine the dataset to determine what went wrong and improve the results.

# First let's initialize a list with all the predicted values from the test set
y_pred = mlp1.predict(X_test)

# Now let's visualize the errors between the predictions and the actual labels using a confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
plt.matshow(cm)

Step #7: Identify Commonly Confused Letters

The brightness of each cell in the confusion matrix indicates the number of elements in that cell, with brighter colours indicating more elements. The matrix rows correspond to the correct label, while the matrix columns correspond to the predicted label. The numbers on the x-axis and y-axis of the matrix refer to the 26 letters in the alphabet (the matrix indices run from 0 to 25, with 0=A, 1=B, etc., while the EMNIST labels themselves run from 1 to 26). For example, the brightness of cell (0,0) in the matrix shows the number of times that our network correctly classified an “A” as an “A.”

The diagonal line of cells in the matrix represents the instances in which the predicted label was correct. It is desirable to see a bright diagonal line in the confusion matrix, which indicates a high number of correct predictions.

We can investigate further if any cells (off the diagonal) appear significantly brighter than the others.
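Instead of judging brightness by eye, the largest off-diagonal cells can be listed directly. This is a minimal sketch on a made-up 4x4 matrix; the helper name `most_confused_pairs` is hypothetical. On the real `cm`, row or column index `i` would correspond to the letter `chr(i + 97)`, assuming all 26 letters appear in the test labels.

```python
import numpy as np

def most_confused_pairs(cm, top_n=3):
    """Return (count, actual_index, predicted_index) for the largest off-diagonal cells."""
    off_diagonal = cm.copy()
    np.fill_diagonal(off_diagonal, 0)            # ignore the correct predictions
    flat_order = np.argsort(off_diagonal, axis=None)[::-1][:top_n]
    rows, cols = np.unravel_index(flat_order, off_diagonal.shape)
    return [(int(off_diagonal[r, c]), int(r), int(c)) for r, c in zip(rows, cols)]

# Made-up confusion matrix (rows = actual class, columns = predicted class)
toy_cm = np.array([
    [50,  2,  0,  1],
    [ 3, 40,  9,  0],
    [ 0, 12, 45,  0],
    [ 1,  0,  0, 48],
])

for count, actual, predicted in most_confused_pairs(toy_cm):
    print(f"actual class {actual} predicted as class {predicted}: {count} times")
# actual class 2 predicted as class 1: 12 times
# actual class 1 predicted as class 2: 9 times
# actual class 1 predicted as class 0: 3 times
```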

It can be easy to mix up letters such as “I” and “L” (“U” and “V” are also examples of this), so let’s review any cases where this occurs. You can modify the code and run it as many times as you wish to observe examples of these mistakes.

# You can change this to any letters that you think the neural network may have confused...
predicted_letter = 'l'
actual_letter = 'i'


# This code counts all mistakes for the letters above
mistake_list = []
for i in range(len(y_test)):
    if (y_test[i] == (ord(actual_letter) - 96) and y_pred[i] == (ord(predicted_letter) - 96)):
        mistake_list.append(i)
print("There were " + str(len(mistake_list)) + " times that the letter " + actual_letter + " was predicted to be the letter " + predicted_letter + ".")

# Once we know how many mistakes were made, we can change this to see an image of a particular one
mistake_to_show = 4 # <<< e.g., change this to 3 if you want to see the 4th mistake

# This code checks that the number mistake you asked for can be shown and if so, displays an image of it
if (len(mistake_list)> mistake_to_show):
    img = X_test[mistake_list[mistake_to_show]]
    plt.imshow(img.reshape((28,28)))
else:
    print("Couldn't show mistake number " + str(mistake_to_show + 1) + " because there were only " + str(len(mistake_list)) + " mistakes to show!")

Step #8: Alter Variables + Increase Iterations (Improve Accuracy)

On EMNIST, state-of-the-art machine learning techniques get higher than 96% accuracy, so there is room for our network to improve on 84%. Let’s experiment and see what we can do.

To begin with, we can have more hidden layers and more neurons in the hidden layers. For example, to add another hidden layer of 50 neurons, you can do the following: hidden_layer_sizes=(50,50,).

We can also try training for more epochs (or iterations). For example, we can change our max_iter=30.

Experimenting with different structures can be fun; however, remember that increasing layers and epochs will mean a more extended training period for the network.

Let us start with 5 layers of 100 neurons each and train for 50 epochs.

# Change some of the values in the below statement and re-run to see how they 
# affect performance!
mlp2 = MLPClassifier(hidden_layer_sizes=(100,100,100,100,100,), max_iter=50, alpha=1e-4,
                     solver='sgd', verbose=10, tol=1e-4, random_state=1,
                     learning_rate_init=.1)
mlp2.fit(X_train, y_train)
print("Training set score: %f" % mlp2.score(X_train, y_train))
print("Test set score: %f" % mlp2.score(X_test, y_test))
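If you want to compare several structures side by side rather than one at a time, a loop like the following can help. It is only a sketch: it uses scikit-learn’s small bundled digits dataset as a stand-in (an assumption, so that it finishes in seconds), the names `candidate` and `results` are hypothetical, and the numbers it prints will not match what you get on EMNIST.

```python
import warnings
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: 8x8 digit images, scaled so that every pixel is between 0 and 1
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X / 16., y, test_size=0.25, random_state=1)

results = {}
for layers in [(50,), (50, 50), (100, 100)]:
    candidate = MLPClassifier(hidden_layer_sizes=layers, max_iter=30, alpha=1e-4,
                              solver='sgd', tol=1e-4, random_state=1,
                              learning_rate_init=.1)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        candidate.fit(X_tr, y_tr)
    # Store (training accuracy, testing accuracy) for this structure
    results[layers] = (candidate.score(X_tr, y_tr), candidate.score(X_te, y_te))

for layers, (train_acc, test_acc) in results.items():
    print(f"{layers}: train = {train_acc:.3f}, test = {test_acc:.3f}")
```

A training score that is much higher than the testing score suggests the network is memorizing the training examples rather than learning patterns that generalize.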

Step #9: Grab New Data To Retest the Network

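To see how the network handles handwriting it has never seen, we’ll use a set of scanned handwritten letters adapted from Crash Course AI. You can find it in the GitHub repository that the code below clones:

# Pulls the scanned data set from GitHub
!git clone https://github.com/crash-course-ai/lab1-neural-networks.git
# Updates the clone if it already exists (each "!" line runs in its own shell,
# so we point git at the repository folder explicitly)
!git -C lab1-neural-networks pull
!ls lab1-neural-networks/letters_mod
!cd /content/lab1-neural-networks/letters_mod && pwd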

# Puts all the data in the "files" variable
import os
path, dirs, files = next(os.walk("/content/lab1-neural-networks/letters_mod/"))
files.sort()

Step #10: Registering the Images

We will go through each image, add it to a list, and print it out to ensure it has been read correctly. You can change the index if you want to view different letters.

# These libraries let us import the letters, resize them, and print them out
import cv2
import matplotlib.pyplot as plt

# This code processes all the scanned images and adds them to the handwritten_story
handwritten_story = []
for i in range(len(files)):
    img = cv2.imread("/content/lab1-neural-networks/letters_mod/"+files[i],cv2.IMREAD_GRAYSCALE)
    handwritten_story.append(img)

print("Imported the scanned images.")

plt.imshow(handwritten_story[4]) #<--- Change this index to see different letters

Step #11: Testing With the Handwritten Sentence

We’ll put the handwritten sentence into our trained MLP and see how it performs.

# These are libraries we need to do some math on the image
# to be able to give it to the MLP in the right format and to resize it to 28x28 pixels
import numpy
import cv2

typed_story = ""
for letter in handwritten_story:
    letter = cv2.resize(letter, (28,28), interpolation = cv2.INTER_CUBIC)
    single_item_array = (numpy.array(letter)).reshape(1,784)
    prediction = mlp2.predict(single_item_array)
    typed_story = typed_story + str(chr(prediction[0]+96))

print("Conversion to typed story complete!")
print(typed_story)

Step #12: Adding Blank Spaces

Understandably, the output is nonsensical.

We did not teach our model to recognize spaces, so it tries to turn every blank image into a letter. However, we can detect blank images ourselves before asking for a prediction. We will add up the pixel values of each image, and if the total is close to zero, we will add a space to the sentence instead of a predicted letter.

# This is a library we need to do some math on the image to be able to give it to the MLP in the right format
import numpy

typed_story = ""
for letter in handwritten_story:
    letter = cv2.resize(letter, (28,28), interpolation = cv2.INTER_CUBIC)

    # This bit of code checks whether the image is just a blank space by summing all of its pixel values
    # (letter.sum() adds them up in a wide integer type, so the 8-bit pixel values cannot overflow)
    total_pixel_value = letter.sum()
    if total_pixel_value < 20:
        typed_story = typed_story + " "
    else: # if it is NOT a blank, it actually runs the prediction algorithm on it
        single_item_array = (numpy.array(letter)).reshape(1,784)
        prediction = mlp2.predict(single_item_array)
        typed_story = typed_story + str(chr(prediction[0]+96))

print("Conversion to typed story complete!")
print(typed_story)
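The blank-space check can also be pulled out into a small helper, which makes it easy to try on its own. This is a sketch with a hypothetical function name, `is_blank`; the default threshold of 20 is simply the value used in the code above, and the two test images are made up.

```python
import numpy as np

def is_blank(image, threshold=20):
    """Return True if the summed pixel values fall below the threshold."""
    # Converting to float64 first keeps 8-bit images from overflowing while summing
    total = float(np.asarray(image, dtype=np.float64).sum())
    return total < threshold

blank = np.zeros((28, 28), dtype=np.uint8)
blank[3, 5] = 7                      # a speck of scanner noise
stroke = np.zeros((28, 28), dtype=np.uint8)
stroke[10:18, 14] = 255              # a short vertical pen stroke

print(is_blank(blank))    # True
print(is_blank(stroke))   # False
```

Keep in mind that a sensible threshold depends on the pixel scale: a value that suits images with pixels from 0 to 255 may be far too generous for images scaled to between 0 and 1.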

Step #13: Processing the Scanned Images

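There is a noticeable improvement here, as there are separate words, and the first word is “The.” But there’s still something going wrong.

The network was trained on EMNIST images, which were all prepared in a consistent way, and our scans don’t look like that yet. To match them, we need to make the strokes more blurry, put the letter at the very center of the image, crop out the rest, resize each one to 28x28 pixels, and scale the pixel values to between 0 and 1.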

Once the processing has been completed, we can print out a single letter to evaluate the result. We can also modify the index in the code to display a different letter if desired.

# These steps process the scanned images to be in the same format and have the same properties as the EMNIST images
# They are described by the EMNIST authors in detail here: https://arxiv.org/abs/1702.05373v1
processed_story = []

for img in handwritten_story:
    #step 1: Apply Gaussian blur filter
    img = cv2.GaussianBlur(img, (7,7), 0)

    #steps 2 and 3: Extract the Region of Interest in the image and center in square
    points = cv2.findNonZero(img)
    x, y, w, h = cv2.boundingRect(points)
    if (w > 0 and h > 0):
        if w > h:
            y = y - (w-h)//2
            img = img[y:y+w, x:x+w]
        else:
            x = x - (h-w)//2
            img = img[y:y+h, x:x+h]

    #step 4: Resize and resample to be 28 x 28 pixels
    img = cv2.resize(img, (28,28), interpolation = cv2.INTER_CUBIC)

    #step 5: Normalize pixels and reshape before adding to the new story array
    img = img/255
    img = img.reshape((28,28))
    processed_story.append(img)

print("Processed the scanned images.")

import matplotlib.pyplot as plt
plt.imshow(processed_story[4]) #<<< change this index if you want to see a different letter from the story
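To see what “extract the region of interest and center it in a square” means without OpenCV, here is a NumPy-only sketch on a tiny made-up image. Note that it is a deliberately different variant from the code above: it pads the cropped letter with zeros rather than widening the slice, and the function name `center_in_square` is hypothetical.

```python
import numpy as np

def center_in_square(image):
    """Crop to the non-zero region, then pad with zeros to make it square."""
    rows = np.flatnonzero(image.any(axis=1))   # rows that contain ink
    cols = np.flatnonzero(image.any(axis=0))   # columns that contain ink
    if rows.size == 0:
        return image                           # blank image: nothing to crop
    cropped = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    h, w = cropped.shape
    side = max(h, w)
    top = (side - h) // 2                      # equal padding above and below
    left = (side - w) // 2                     # equal padding left and right
    square = np.zeros((side, side), dtype=image.dtype)
    square[top:top + h, left:left + w] = cropped
    return square

# A tall, thin stroke near the corner of a 10x12 image
img = np.zeros((10, 12), dtype=np.uint8)
img[1:7, 2:4] = 255                            # 6 rows tall, 2 columns wide

result = center_in_square(img)
print(result.shape)                            # (6, 6)
print(np.flatnonzero(result.any(axis=0)))      # [2 3]
```

The stroke ends up in the middle two columns of a 6x6 square, ready to be resized to 28x28 pixels.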

Step #14: Processing the Handwritten Sentence One Last Time

Now we can process the sentence with our neural network one last time!

# This is a library we need to do some math on the image to be able to give it to the MLP in the right format
import numpy

typed_story = ""
for letter in processed_story:
    # This bit of code checks whether the image is just a blank space by summing all of its pixel values
    total_pixel_value = 0
    for j in range(28):
        for k in range(28):
            total_pixel_value += letter[j,k]
    if total_pixel_value < 20:
        typed_story = typed_story + " "
    else: # if it is NOT a blank, it actually runs the prediction algorithm on it
        single_item_array = (numpy.array(letter)).reshape(1,784)
        prediction = mlp2.predict(single_item_array)
        typed_story = typed_story + str(chr(prediction[0]+96))

print("Conversion to typed story complete!")
print(typed_story)
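If you know what the handwritten sentence was supposed to say, you can put a number on how close the network got. The sketch below compares two strings character by character; the function name `character_accuracy` is hypothetical, and the two example strings are made up rather than taken from the network’s actual output.

```python
def character_accuracy(predicted, expected):
    """Fraction of positions where two equal-length strings agree."""
    if len(predicted) != len(expected):
        raise ValueError("strings must have the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)

# Made-up example strings, not the actual output of the network
expected_text = "the quick fox"
predicted_text = "tne quick fax"

print(round(character_accuracy(predicted_text, expected_text), 3))  # 0.846
```

On your own run, you would pass `typed_story` as the first argument and the sentence you expected as the second.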

Although the result isn’t wholly accurate, I think it emphasizes the true power of computers and their ability to learn quickly. Just think about how you could leverage machine learning to go from being completely unable to recognize similar letters to just under 90% accurate in fifteen minutes.

Imagine how much more we could accomplish with AI and ML.

So how did it go for you? Were you able to follow along successfully?

If you’re seeking more clarification behind the code or want to see the result, check out the original Crash Course video, where they go step-by-step with you throughout the whole project.

This project was modified from Crash Course AI’s How to make an AI read your handwriting tutorial. See the original project above.

For quick access to all the code, here’s the complete Colab notebook:

Let’s Connect!

If you enjoyed reading this article or learned something new, I’d love to connect on LinkedIn. If you’d like to stay updated on my recent articles or projects, you can subscribe to my monthly newsletter here!
