This document (an R-Markdown file) is made for the R part of the MathematicaVsR project “Handwritten digits classification by matrix factorization”.
The main goal of this document is to demonstrate how to do the following in R:

- ingest the images from the binary files of the MNIST database of handwritten digit images,
- build a classifier using matrix factorization, and
- evaluate the classifier by computing accuracy and F-scores.
The matrix factorization methods used are Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF).
The concrete steps taken are as follows:

1. Make a linear vector space representation of the images by simple unfolding.
2. For each digit, find the corresponding representation matrix and factorize it.
3. Store the matrix factorization results in a suitable data structure. (These results comprise the classifier training.)
4. For a given test image (and its linear vector space representation), find the basis that approximates it best. The corresponding digit is the classifier's prediction for that test image.
5. Evaluate the classifier(s) over all test images and compute accuracy, F-scores, and other measures.
More details about the classification algorithm are given in the blog post “Classification of handwritten digits”, [2]. The method and algorithm are described in Chapter 10 of [3].
library(plyr)
library(ggplot2)
library(Matrix)
library(irlba)
library(MASS)
library(data.table)
library(doParallel)
library(devtools)
source_url("https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/R/NonNegativeMatrixFactorization.R")
## SHA-1 hash of file is e8ee968e4c02573c76a248e9835bd151c80a5cce
In order to use Non-negative Matrix Factorization (NMF) instead of SVD, the classification phase has to be modified since the obtained bases are not orthogonal. See below for theoretical and algorithmic details.
First we download the files from the MNIST database site [1]:
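(A minimal download sketch follows; it assumes the files are fetched from the MNIST database site [1] and decompressed into the directory ~/Datasets/MNIST used by the ingestion code below. The package R.utils, used for the decompression, is an extra dependency not loaded above.)

## Download and decompress the four MNIST files into ~/Datasets/MNIST.
baseURL <- "http://yann.lecun.com/exdb/mnist"
mnistFileNames <- c( "train-images-idx3-ubyte", "train-labels-idx1-ubyte",
                     "t10k-images-idx3-ubyte", "t10k-labels-idx1-ubyte" )
dir.create( "~/Datasets/MNIST", recursive = TRUE, showWarnings = FALSE )
for ( fn in mnistFileNames ) {
  ## Download the gzipped file and decompress it in place.
  gzName <- file.path( "~/Datasets/MNIST", paste0( fn, ".gz" ) )
  download.file( paste( baseURL, paste0( fn, ".gz" ), sep = "/" ), gzName, mode = "wb" )
  R.utils::gunzip( gzName, overwrite = TRUE )
}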
The following ingestion code very closely follows the R code for reading MNIST data written by Brendan O’Connor (brendano).
ReadMNISTImages <- function( fileName, .progress = "none" ) {
  toRead <- file( fileName, "rb" )
  ## The IDX header: magic number, number of images, number of rows, number of columns;
  ## all big-endian 4-byte integers.
  readInfo <- as.list( readBin( toRead, 'integer', n = 4, size = 4, endian = "big" ) )
  names(readInfo) <- c( "MNum", "NImages", "NRows", "NColumns" )
  ## Read each image as a matrix of unsigned bytes and flip the columns
  ## so the images plot upright with image().
  images <-
    llply( 1:readInfo$NImages, function(i) {
      mat <- matrix( readBin( toRead, 'integer', size = 1,
                              n = readInfo$NRows * readInfo$NColumns, endian = "big", signed = F ),
                     readInfo$NRows, readInfo$NColumns )
      mat[, nrow(mat):1]
    }, .progress = .progress )
  close(toRead)
  images
}
ReadMNISTImageLabels <- function( fileName ) {
  toRead <- file( fileName, "rb" )
  ## The IDX header: magic number and number of items.
  readLabelsInfo <- as.list( readBin( toRead, 'integer', n = 2, size = 4, endian = "big" ) )
  names(readLabelsInfo) <- c( "MNum", "NImages" )
  ## The labels are unsigned bytes in the range 0-9.
  labels <- readBin( toRead, 'integer', n = readLabelsInfo$NImages, size = 1, signed = F )
  close(toRead)
  labels
}
if ( FALSE || !exists("trainImages") ) {
  cat( "\tTime to read training images :", system.time( {
    trainImages <- ReadMNISTImages( "~/Datasets/MNIST/train-images-idx3-ubyte" )
    trainImagesLabels <- ReadMNISTImageLabels( "~/Datasets/MNIST/train-labels-idx1-ubyte" )
  } ), "\n" )
  cat( "\tTime to read test images :", system.time( {
    testImages <- ReadMNISTImages( "~/Datasets/MNIST/t10k-images-idx3-ubyte" )
    testImagesLabels <- ReadMNISTImageLabels( "~/Datasets/MNIST/t10k-labels-idx1-ubyte" )
  } ), "\n" )
  names(trainImages) <- paste( "train", 1:length(trainImages), sep = "-" )
  names(trainImagesLabels) <- paste( "train", 1:length(trainImagesLabels), sep = "-" )
  names(testImages) <- paste( "test", 1:length(testImages), sep = "-" )
  names(testImagesLabels) <- paste( "test", 1:length(testImagesLabels), sep = "-" )
}
Number of images per digit for the training set:
count(trainImagesLabels)
## x freq
## 1 0 5923
## 2 1 6742
## 3 2 5958
## 4 3 6131
## 5 4 5842
## 6 5 5421
## 7 6 5918
## 8 7 6265
## 9 8 5851
## 10 9 5949
Visualize the first \(100\) images of the training set together with their labels:
par( mfrow = c(10,10), mai = c(0,0,0,0) )
for ( i in 1:100 ) {
  image( trainImages[[i]], axes = FALSE, col = gray( 0:255 / 255 ) )
  text( 0.2, 0, trainImagesLabels[[i]], cex = 1.4, col = 2, pos = c(3,4) )
}
(In order to make the plot above with inverted colors use ‘col = gray( 255:0 / 255 )’.)
Number of images per digit for the testing set:
count(testImagesLabels)
## x freq
## 1 0 980
## 2 1 1135
## 3 2 1032
## 4 3 1010
## 5 4 982
## 6 5 892
## 7 6 958
## 8 7 1028
## 9 8 974
## 10 9 1009
Visualize the first \(100\) images of the testing set together with their labels:
par( mfrow = c(10,10), mai = c(0,0,0,0) )
for ( i in 1:100 ) {
  image( testImages[[i]], axes = FALSE, col = gray( 0:255 / 255 ) )
  text( 0.2, 0, testImagesLabels[[i]], cex = 1.4, col = 2, pos = c(3,4) )
}
(In order to make the plot above with inverted colors use ‘col = gray( 255:0 / 255 )’.)
Each image is flattened (unfolded) and placed as a row in a matrix.
if ( FALSE || !exists("trainImagesMat") ) {
  cat( "\tTime to make the training images matrix :", system.time(
    trainImagesMat <- laply( trainImages, function(im) as.numeric( im ), .progress = "none" )
  ), "\n" )
  cat( "\tTime to make the testing images matrix :", system.time(
    testImagesMat <- laply( testImages, function(im) as.numeric( im ), .progress = "none" )
  ), "\n" )
  classLabels <- sort( unique( trainImagesLabels ) )
}
Note that here we also computed the class labels from the data. (We expect them to be all the digits from 0 to 9.)
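As a quick sanity check of the unfolding, each matrix should have \(28 \times 28 = 784\) columns and one row per image (60000 training images and 10000 testing images):

dim(trainImagesMat)  ## expected: 60000 784
dim(testImagesMat)   ## expected: 10000 784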
The following code takes the sub-matrix for each digit and factorizes it using SVD. The factorization results are put in a list with named elements, the element names being the digits.
cat( "\tTime to SVD factorize the training images sub-matrices :", system.time( {
svdRes <-
llply( classLabels, function(cl) {
inds <- trainImagesLabels == cl
smat <- trainImagesMat[ inds, ]
smat <- as( smat, "sparseMatrix")
irlba( A = smat, nv = 40, nu = 40, maxit = 100, tol = 1E-6 )
}, .progress = "none" )
names(svdRes) <- classLabels
}), "\n")
## Time to SVD factorize the training images sub-matrices : 5.353 0.454 5.809 0 0
This multi-panel plot shows the singular values for each SVD factorization:
diagDF <-
  ldply( names(svdRes), function(x)
    data.frame( Digit = x,
                SingularValue = rev( sort( svdRes[[x]]$d ) ),
                Index = 1:length( svdRes[[x]]$d ),
                stringsAsFactors = FALSE ) )
ggplot(diagDF) + geom_point( aes( x = Index, y = SingularValue) ) + facet_wrap( ~ Digit, ncol = 5 )
Here is what the basis for \(5\) looks like:
dlbl <- "5"
U <- svdRes[[dlbl]]$U; V <- svdRes[[dlbl]]$v; D <- svdRes[[dlbl]]$d
par( mfrow = c(5,8), mai = c(0,0,0,0))
for(i in 1:40){
image( matrix(V[,i], 28, 28), axes = FALSE, col = gray( 0:255 / 255 ) )
text( 0.2, 0, i, cex = 1.2, col = 2, pos = c(3,4))
}
par(mfrow=c(1,1), mai = c(1,1,1,1))
With SVD the first basis vector (the one corresponding to the largest singular value) is a baseline image for a digit sub-set, and the rest are corrections to be added to or subtracted from that baseline image. Compare with the NMF basis below, in which all basis vectors are interpretable as handwritten digit images.
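Since the columns of the factor \(V\) are orthonormal, the classification reduces to a projection-residual computation: for a test image vector \(v\) and the basis \(V_d\) obtained for digit \(d\), the residual is

\[ r_d(v) = \left\lVert v - V_d V_d^{T} v \right\rVert_2 , \]

and the predicted label is the digit \(d\) with the smallest \(r_d(v)\). The function below computes exactly these residuals.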
## Euclidean norm of a vector.
vnorm <- function(x) sqrt( sum(x^2) )

#' @description Classify by representation over an SVD basis.
SVDClassifyImageVector <- function( factorizationsRes, vec ) {
  residuals <-
    laply( factorizationsRes, function(x) {
      ## Center the vector if the factorization was computed over centered data.
      if ( is.null(x$mean) ) { lv <- vec } else { lv <- vec - x$mean }
      ## Residual of projecting onto the subspace spanned by the columns of x$v.
      rv <- lv - x$v %*% ( t(x$v) %*% lv )
      vnorm(rv)
    } )
  names(residuals) <- names(factorizationsRes)
  list( Label = names(factorizationsRes)[ which.min(residuals) ], Residuals = residuals )
}
Here is an example classification with this function:
testImagesLabels[[123]]
## [1] 7
SVDClassifyImageVector( factorizationsRes = svdRes, vec = testImagesMat[123,] )
## $Label
## [1] "7"
##
## $Residuals
## 0 1 2 3 4 5 6
## 1214.8866 991.6803 1087.7770 955.3925 958.6550 1040.5932 1305.7998
## 7 8 9
## 544.9251 1041.5995 779.4606
The following command runs the classification function over each row (image) of the testing set matrix representation.
if(!exists("svdClRes")) {
cat( "\tTime to compute the SVD classifier evalution :", system.time( {
svdClRes <-
ldply( 1:nrow(testImagesMat), function(i) {
res <- SVDClassifyImageVector( factorizationsRes = svdRes, vec = testImagesMat[i,] )
data.frame( Actual = testImagesLabels[[i]], Predicted = res$Label, stringsAsFactors = FALSE )
}, .progress = "none")
}), "\n")
}
mean( svdClRes$Actual == svdClRes$Predicted )
## [1] 0.957
svdClResDT <- data.table( svdClRes )
svdClResDT[ , .( .N, Accuracy = mean(Actual == Predicted) ), by = Actual]
## Actual N Accuracy
## 1: 7 1028 0.9377432
## 2: 2 1032 0.9437984
## 3: 1 1135 0.9894273
## 4: 0 980 0.9928571
## 5: 4 982 0.9725051
## 6: 9 1009 0.9474727
## 7: 5 892 0.9316143
## 8: 6 958 0.9665971
## 9: 3 1010 0.9326733
## 10: 8 974 0.9507187
xtabs(~ Actual + Predicted, svdClRes)
## Predicted
## Actual 0 1 2 3 4 5 6 7 8 9
## 0 973 0 0 0 0 0 2 1 4 0
## 1 0 1123 6 1 0 0 1 0 4 0
## 2 9 6 974 4 4 0 1 10 23 1
## 3 7 1 13 942 0 15 0 4 23 5
## 4 1 2 3 0 955 0 5 1 3 12
## 5 4 2 2 17 0 831 6 2 25 3
## 6 8 3 0 0 1 15 926 0 5 0
## 7 1 6 14 1 4 1 0 964 4 33
## 8 5 2 6 17 2 4 1 3 926 8
## 9 5 7 2 7 11 2 0 9 10 956
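The goals stated at the beginning include F-score calculation. Here is a minimal sketch that derives per-digit precision, recall, and F1 from the classification results. (The helper name ComputeFScores is ours, not part of the original code.)

## Per-digit precision, recall, and F1 computed from the confusion matrix.
ComputeFScores <- function( clRes ) {
  ct <- xtabs( ~ Actual + Predicted, clRes )
  precision <- diag(ct) / colSums(ct)
  recall <- diag(ct) / rowSums(ct)
  data.frame( Digit = rownames(ct),
              Precision = precision,
              Recall = recall,
              F1 = 2 * precision * recall / ( precision + recall ),
              stringsAsFactors = FALSE )
}
ComputeFScores( svdClRes )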
I tried using the CRAN package NMF but it was too slow. Here we use the NMF implementation from MathematicaForPrediction at GitHub (sourced above).
The following code takes the sub-matrix for each digit and factorizes it using NMF. The factorization results are put in a list with named elements, the element names being the digits.
if(!exists("nnmfRes")) {
cat( "\tTime to compute the NMF of training images sub-matrices :", system.time( {
nnmfRes <-
llply( classLabels, function(cl) {
inds <- trainImagesLabels == cl
smat <- trainImagesMat[ inds, ]
smat <- as( smat, "sparseMatrix")
res <- NNMF( V = smat, k = 40, maxSteps = 20, tolerance = 1E-4, regularizationParameter = 0.1 )
res <- NNMFNormalizeMatrixProduct( W = res$W, H = res$H, normalizeLeft = TRUE )
bres <- NNMFNormalizeMatrixProduct( W = res$W, H = res$H, normalizeLeft = FALSE )
bres$D <- aaply( as.matrix(res$H), 1, vnorm )
bres$invH <- ginv( as.matrix( bres$H ) )
bres$Wrn <- as( aaply( as.matrix(bres$W), 1, function(x) x / vnorm(x) ), "sparseMatrix")
bres$M <- smat
bres
}, .progress = "none" )
names(nnmfRes) <- classLabels
}), "\n")
}
The following multi-panel plot shows the significance of the obtained NMF basis vectors for each digit.
diagDF <-
  ldply( names(nnmfRes), function(x)
    data.frame( Digit = x,
                SingularValue = nnmfRes[[x]]$D,
                Index = 1:length( nnmfRes[[x]]$D ),
                stringsAsFactors = FALSE ) )
ggplot(diagDF) + geom_point( aes( x = Index, y = SingularValue) ) + facet_wrap( ~ Digit, ncol = 5 )
Here is what the basis for \(5\) looks like:
dlbl <- "5"
H <- nnmfRes[[dlbl]]$H
par( mfrow = c(5,8), mai = c(0,0,0,0))
for(i in 1:40){
image( matrix(H[i,], 28, 28), axes = FALSE, col = gray( 0:255 / 255 ) )
text( 0.2, 0, i, cex = 1.2, col = 2, pos = c(3,4))
}
par(mfrow=c(1,1), mai = c(1,1,1,1))
As expected, with NMF the basis vectors are interpretable as handwritten digit image “topics”.
Given a matrix \(M \in \mathbf{R}^{n \times m}\) comprised of \(n\) image vectors, the classification process for NMF is more complicated than that with SVD because the rows of the factor \(H\) of the factorization \(M = W H\) (the new basis) are not orthogonal to each other. Instead, for an image vector \(v \in \mathbf{R}^{m}\) we look for the nearest neighbors among the rows of \(W \in \mathbf{R}^{n \times k}\) of the vector \(v H^{+} \in \mathbf{R}^{k}\), where \(H^{+}\) denotes the Moore–Penrose pseudo-inverse of \(H\) (\(H\) is not square, so \(H^{-1}\) does not exist; the code uses ginv). The training images corresponding to those nearest neighbors are used to predict the label of \(v\).
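Concretely, for each digit class the function below (i) computes the coordinates \(c = v H^{+}\), (ii) finds the numberOfTopBasisVectors rows of \(W\) nearest to \(c\), (iii) sums the corresponding rows of the training sub-matrix \(M\) into an approximation \(a\), and (iv) takes the residual between the normalized vectors:

\[ r = \left\lVert \frac{v}{\lVert v \rVert_2} - \frac{a}{\lVert a \rVert_2} \right\rVert_2 . \]

The predicted digit is the class with the smallest residual \(r\).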
#' @description Classify by representation over an NMF basis.
NNMFClassifyImageVector <- function( factorizationsRes, vec, numberOfTopBasisVectors = 4, distanceFunction = "euclidean" ) {
  residuals <-
    laply( factorizationsRes, function(x) {
      ## Coordinates of vec with respect to the NMF basis (via the pseudo-inverse of H).
      approxVec <- matrix( vec, 1 ) %*% x$invH
      if ( distanceFunction == "euclidean" ) {
        ## Nearest rows of W by Euclidean distance.
        dmat <- as.matrix(x$W) - matrix( rep( approxVec, nrow(x$W) ), nrow = nrow(x$W), byrow = TRUE )
        dmat <- rowSums( dmat^2 )
        inds <- order(dmat)[ 1:numberOfTopBasisVectors ]
      } else {
        ## Nearest rows of W by cosine similarity, using the row-normalized Wrn;
        ## decreasing order selects the most similar rows.
        approxVec <- approxVec[1,] / vnorm( approxVec[1,] )
        inds <- order( x$Wrn %*% approxVec, decreasing = TRUE )[ 1:numberOfTopBasisVectors ]
      }
      ## Approximate vec with the sum of the nearest training images and take the residual.
      approxVec <- colSums( x$M[ inds, , drop = F ] )
      rv <- vec / vnorm(vec) - approxVec / vnorm( approxVec )
      vnorm(rv)
    }, .progress = "none" )
  names(residuals) <- names(factorizationsRes)
  list( Label = names(factorizationsRes)[ which.min(residuals) ], Residuals = residuals )
}
Here is an example classification with this function:
testImagesLabels[[123]]
## [1] 7
NNMFClassifyImageVector( factorizationsRes = nnmfRes, vec = testImagesMat[123,] )
## $Label
## [1] "7"
##
## $Residuals
## 0 1 2 3 4 5 6
## 0.9986632 0.9178749 1.0210067 0.8368899 0.9584976 0.9944164 1.1281940
## 7 8 9
## 0.5303352 1.0370188 0.7755566
The classification process for NMF is slower, so in order to speed it up we use parallel computation.
if ( !exists("nnmfClRes") ) {
if ( FALSE ) {
cat( "\tTime to sequentielly compute the NMF classifier evalution :", system.time( {
nnmfClRes <-
ldply( 1:nrow(testImagesMat), function( i ) {
res <- NNMFClassifyImageVector( factorizationsRes = nnmfRes, vec = testImagesMat[i,], numberOfTopBasisVectors = 20, distanceFunction = "euclidean" )
data.frame( Actual = testImagesLabels[[i]], Predicted = res$Label, stringsAsFactors = FALSE )
}, .progress = "time" )
}), "\n")
} else {
# Function definition for slicing a vector.
Slice <- function(x, n) split(x, as.integer((seq_along(x) - 1) / n))
if ( !exists("cl") || !("SOKcluster" %in% class(cl)) ) {
mcCores <- 4
cl <- makeCluster( mcCores )
registerDoParallel( cl )
}
slicedIndsList <- Slice( 1:nrow(testImagesMat), ceiling( nrow(testImagesMat) / mcCores ) )
startTime <- Sys.time()
nnmfClRes <-
foreach( parInds = slicedIndsList, .combine = rbind, .packages = c("Matrix") ) %dopar% {
ldply( parInds, function( i ) {
res <- NNMFClassifyImageVector( factorizationsRes = nnmfRes, vec = testImagesMat[i,], numberOfTopBasisVectors = 20, distanceFunction = "euclidean" )
data.frame( Actual = testImagesLabels[[i]], Predicted = res$Label, stringsAsFactors = FALSE )
} )
}
endTime <- Sys.time()
cat("\n\t\tNMF classification time in parallel on", mcCores, "cores is :", difftime( endTime, startTime, units = "secs"), "\n" )
}
}
mean( nnmfClRes$Actual == nnmfClRes$Predicted )
## [1] 0.9683
nnmfClResDT <- data.table( nnmfClRes )
nnmfClResDT[ , .( .N, Accuracy = mean(Actual == Predicted) ), by = Actual]
## Actual N Accuracy
## 1: 7 1028 0.9640078
## 2: 2 1032 0.9767442
## 3: 1 1135 0.9947137
## 4: 0 980 0.9877551
## 5: 4 982 0.9663951
## 6: 9 1009 0.9435084
## 7: 5 892 0.9506726
## 8: 6 958 0.9832985
## 9: 3 1010 0.9435644
## 10: 8 974 0.9681725
xtabs(~ Actual + Predicted, nnmfClRes)
## Predicted
## Actual 0 1 2 3 4 5 6 7 8 9
## 0 968 0 3 0 0 2 1 2 3 1
## 1 0 1129 2 0 1 1 0 1 1 0
## 2 4 0 1008 2 0 0 1 9 7 1
## 3 0 0 8 953 0 14 0 9 24 2
## 4 1 1 1 0 949 0 1 5 2 22
## 5 2 0 2 19 1 848 6 1 11 2
## 6 7 3 0 0 2 3 942 0 1 0
## 7 1 5 9 0 7 0 0 991 3 12
## 8 1 0 0 16 2 5 0 3 943 4
## 9 4 3 2 2 21 5 0 15 5 952
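The corresponding F-scores, using the ComputeFScores sketch defined above:

ComputeFScores( nnmfClRes )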
[1] Yann LeCun et al., THE MNIST DATABASE of handwritten digits, http://yann.lecun.com/exdb/mnist/ .
[2] Anton Antonov, “Classification of handwritten digits”, (2013), blog post at MathematicaForPrediction at WordPress.
[3] Lars Eldén, Matrix Methods in Data Mining and Pattern Recognition, 2007, SIAM.