In this lecture we will implement the PCA algorithm in Python. We will also see how to reduce the number of features in a data set.
About the MNIST Data Set
The MNIST dataset (Modified National Institute of Standards and Technology database) is a large dataset of handwritten digits that is commonly used for training various image processing systems. It is available on Kaggle (https://www.kaggle.com/c/digit-recognizer/data).
The database is also widely used for training and testing in the field of machine learning.
The dataset consists of pairs of a handwritten digit image and a label. The digits range from 0 to 9, meaning 10 patterns in total.
- handwritten digit image: a grayscale image of size 28 x 28 pixels.
- label: the actual digit that the handwritten digit image represents, an integer from 0 to 9.
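To make the image/label pairing concrete, here is a small sketch of how one row could be split into its label and its 28 x 28 image. It assumes that the label sits in the first position of the row and the 784 pixel values follow it; the row itself is random stand-in data, not a real MNIST digit.

```python
import numpy as np

# Hypothetical stand-in for one row of the data set (not a real MNIST digit):
# the first value is the label, the remaining 784 values are pixel intensities (0-255).
rng = np.random.default_rng(0)
row = np.concatenate(([7], rng.integers(0, 256, size=784)))

label = int(row[0])               # the digit this image represents
image = row[1:].reshape(28, 28)   # 784 flat pixel values -> 28 x 28 grid

print(label, image.shape)
```

Flattening each 28 x 28 image into a single row is what turns every pixel into its own feature column.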
This data set has around 42,000 rows and 784 pixel columns (28 x 28 = 784, one column per pixel). We will try to reduce the number of features from 784, so that we keep fewer features while retaining as much of the information as possible.
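As a rough preview of the idea, the sketch below reduces a 784-column matrix with PCA computed from a singular value decomposition in NumPy. It is only an illustration under stated assumptions: the data is synthetic (500 random rows built from 20 hidden factors, standing in for the real digits), the function name pca_reduce and the parameter variance_to_keep are hypothetical, and the 95% variance threshold is an arbitrary choice for the example.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components that retain
    at least `variance_to_keep` of the total variance."""
    # PCA works on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    cumulative = np.cumsum(explained_variance / explained_variance.sum())
    # Smallest k whose cumulative explained variance reaches the threshold.
    k = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(S))
    X_reduced = X_centered @ Vt[:k].T
    return X_reduced, k, cumulative

# Synthetic stand-in for the pixel matrix (assumption: NOT real MNIST data).
# 784 columns are generated from only 20 hidden factors plus a little noise,
# so most columns are redundant, much like neighbouring pixels in an image.
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 20))
mixing = rng.normal(size=(20, 784))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 784))

X_reduced, k, cumulative = pca_reduce(X, variance_to_keep=0.95)
print("original features:", X.shape[1])
print("features kept:", k)
print("reduced shape:", X_reduced.shape)
```

On the real digits the number of components kept will differ, since it depends on how much of the variance the leading components capture.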
Let’s explore the concept in a Jupyter notebook.