Archive 2020

Comparison Operators

In this lecture we will be learning about Comparison Operators in Python. These operators will allow us to compare variables and output a Boolean value (True or False).

If you have any sort of background in math, these operators should be very straightforward.

First we’ll present a table of the comparison operators and then work through some examples:

Table of Comparison Operators 

In the table below, a=3 and b=4.

Operator | Description | Example
== | If the values of two operands are equal, then the condition becomes true. | (a == b) is not true.
!= | If the values of two operands are not equal, then the condition becomes true. | (a != b) is true.
> | If the value of the left operand is greater than the value of the right operand, then the condition becomes true. | (a > b) is not true.
< | If the value of the left operand is less than the value of the right operand, then the condition becomes true. | (a < b) is true.
>= | If the value of the left operand is greater than or equal to the value of the right operand, then the condition becomes true. | (a >= b) is not true.
<= | If the value of the left operand is less than or equal to the value of the right operand, then the condition becomes true. | (a <= b) is true.

Let’s now work through quick examples of each of these in a Jupyter notebook.
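As a quick check of the table, here is a minimal sketch that evaluates each operator with the same values, a = 3 and b = 4. The dictionary name results is only an illustrative choice.

```python
# Same values as in the table: a = 3 and b = 4
a = 3
b = 4

# Each comparison evaluates to a Boolean value (True or False)
results = {
    "a == b": a == b,
    "a != b": a != b,
    "a > b": a > b,
    "a < b": a < b,
    "a >= b": a >= b,
    "a <= b": a <= b,
}

for expression, value in results.items():
    print(expression, "->", value)
```

Every value printed is a Boolean, which is what lets these expressions serve as conditions in if statements and while loops.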

Python Statements

In this post we will be doing a quick overview of Python Statements. This post will emphasize differences between Python and other languages such as C++.

There are two reasons we take this approach for learning the context of Python Statements:

  • If you are coming from a different language this will rapidly accelerate your understanding of Python.
  • Learning about statements will allow you to read other languages more easily in the future.

Python vs Other Languages

Let’s create a simple statement that says: “If a is greater than b, assign 2 to a and 4 to b”

Take a look at these two if statements (we will learn about building out if statements soon).

Version 1 (Other Languages)

if (a>b){
a = 2;
b = 4;
}

Version 2 (Python)

if a>b:
    a = 2
    b = 4

You’ll notice that Python is less cluttered and much more readable than the first version. How does Python manage this?

Let’s walk through the main differences:

Python gets rid of () and {} by relying on two things: a colon and whitespace. The if line ends with a colon, and indentation (whitespace) marks the block of code that runs when the condition is true.

Another major difference is the lack of semicolons in Python. Semicolons are used to denote statement endings in many other languages, but in Python, the end of a line is the same as the end of a statement.

Lastly, to end this brief overview of differences, let’s take a closer look at indentation syntax in Python vs other languages:

Indentation

Here is some pseudo-code to indicate the use of whitespace and indentation in Python:

Other Languages

if (x)
    if(y)
        code-statement;
else
    another-code-statement;

Python

if x:
    if y:
        code-statement
else:
    another-code-statement

Note how Python is so heavily driven by code indentation and whitespace. This means that code readability is a core part of the design of the Python language.

Now let’s start diving deeper by coding these sorts of statements in Python!

Time to code!

if, elif, else Statements

if statements in Python allow us to tell the computer to perform alternative actions based on a certain set of results.

Verbally, we can imagine we are telling the computer:

“Hey if this case happens, perform some action”

We can then expand the idea further with elif and else statements, which allow us to tell the computer:

“Hey if this case happens, perform some action. Else, if another case happens, perform some other action. Else, if none of the above cases happened, perform this action.”

Let’s go ahead and look at the syntax format for if statements to get a better idea of this:

if case1:
    perform action1
elif case2:
    perform action2
else: 
    perform action3

Let’s explore more examples in a Jupyter notebook.
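To make the syntax format above concrete, here is a minimal sketch. The function name classify and its messages are hypothetical, chosen only for illustration.

```python
def classify(number):
    """Return a short description of a number using if, elif and else."""
    if number > 0:        # case1
        return "positive"
    elif number < 0:      # case2, checked only if case1 was False
        return "negative"
    else:                 # runs only if none of the cases above matched
        return "zero"

for value in (5, -2, 0):
    print(value, "is", classify(value))
```

Only the first branch whose condition is true runs; the remaining branches are skipped.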

Python for Loops

A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).

This is less like the for keyword in other programming languages, and works more like an iterator method as found in other object-oriented programming languages.

With the for loop we can execute a set of statements, once for each item in a list, tuple, set etc.

Here’s the general format for a for loop in Python:

for item in object:
    statements to do stuff

The variable name used for the item is completely up to the coder, so use your best judgment for choosing a name that makes sense and you will be able to understand when revisiting your code. This item name can then be referenced inside your loop, for example if you wanted to use if statements to perform checks.

Let’s go ahead and work through several examples of for loops using a variety of data object types. We’ll start simple and build more complexity later on.
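As a first illustration, the sketch below loops over a list, a string and a dictionary. The variable names and the sample data are assumptions made up for this example.

```python
# Loop over a list and use an if statement to perform a check on each item
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = []
for num in numbers:
    if num % 2 == 0:
        even_numbers.append(num)
print(even_numbers)

# Loop over a string, one character at a time
letters = []
for letter in "abc":
    letters.append(letter.upper())
print(letters)

# Loop over a dictionary's key/value pairs
prices = {"apple": 3, "pear": 2}
total = 0
for name, price in prices.items():
    total += price
print(total)
```

The item names (num, letter, name, price) are chosen freely, as described above; the loop body simply refers to them.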

Python while Loops

The while statement in Python is one of the most general ways to perform iteration. A while statement will repeatedly execute a single statement or group of statements as long as the condition is true. The reason it is called a ‘loop’ is that the code statements are looped through over and over again until the condition is no longer met.

The general format of a while loop is:

while test:
    code statements
else:
    final code statements

Let’s look at a few simple while loops in action.
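Here is a minimal sketch of the general format above, including the optional else block. The names x, visited and finished are illustrative only.

```python
x = 0
visited = []

while x < 3:
    visited.append(x)  # runs as long as the test x < 3 is true
    x += 1             # without this update the loop would never end
else:
    # Runs once, when the test becomes false (skipped if the loop exits via break)
    finished = True

print(visited, x, finished)
```

The update x += 1 is what eventually makes the test false; forgetting it is the most common way to write an infinite loop.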

Python List Comprehensions

Python is famous for allowing you to write code that’s elegant, easy to write, and almost as easy to read as plain English. One of the language’s most distinctive features is the list comprehension, which you can use to create powerful functionality within a single line of code. However, many developers struggle to fully leverage the more advanced features of a list comprehension in Python. Some programmers even use them too much, which can lead to code that’s less efficient and harder to read.

By the end of this tutorial, you’ll understand the full power of Python list comprehensions and how to use their features comfortably.

Benefits of Using List Comprehensions

  • One main benefit of using a list comprehension in Python is that it’s a single tool that you can use in many different situations.
  • In addition to standard list creation, list comprehensions can also be used for mapping and filtering. You don’t have to use a different approach for each scenario.
  • List comprehensions are also more declarative than loops, which often makes them easier to read and understand.

Every list comprehension in Python includes three elements, shown here for the comprehension [i * i for i in range(10)]:

  1. expression is the member itself, a call to a method, or any other valid expression that returns a value. In this example, the expression i * i is the square of the member value.
  2. member is the object or value in the list or iterable. In this example, the member value is i.
  3. iterable is a list, set, sequence, generator, or any other object that can return its elements one at a time. In this example, the iterable is range(10).

Let’s explore the concept in a Jupyter notebook.
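The three elements can be seen in a small sketch. The variable names are assumptions for illustration, and the loop version is included only for comparison.

```python
# expression: i * i, member: i, iterable: range(10)
squares = [i * i for i in range(10)]

# An optional condition at the end filters the members (here: even values only)
even_squares = [i * i for i in range(10) if i % 2 == 0]

# The equivalent for loop builds the same list step by step
squares_loop = []
for i in range(10):
    squares_loop.append(i * i)

print(squares)
print(even_squares)
print(squares == squares_loop)
```

The first comprehension is an example of mapping and the second of filtering, the two uses mentioned in the benefits list.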

Python Functions

A function is a block of code that runs only when it is called. Python allows us to divide a large program into basic building blocks known as functions.

A function contains a set of statements grouped in an indented block under a def line; Python does not use {} to delimit the body. A function can be called multiple times, which provides reusability and modularity to the Python program.

Python provides various built-in functions such as range() and print(). The user can also create their own functions, which are called user-defined functions.

Advantages of Functions in Python -

Python functions have the following advantages:

  • By using functions, we can avoid rewriting the same logic/code again and again in a program.
  • We can call Python functions any number of times in a program and from any place in a program.
  • We can track a large Python program easily when it is divided into multiple functions.
  • Reusability is the main benefit of Python functions.
  • However, every function call adds some overhead in a Python program.

Creating a function –

In Python, we use the def keyword to define a function. The syntax is given below.

def my_function():
    function-suite
    return <expression>

Function calling –

In Python, a function must be defined before it is called; otherwise the Python interpreter gives an error. Once the function is defined, we can call it from another function or from the Python prompt. To call the function, use the function name followed by parentheses.

A simple function that prints the message “hello world” is given below.

def hello_world():  
    print("hello world")  
hello_world()   

output-
hello world

Parameters in function –

Information can be passed into functions as parameters. The parameters are specified in the parentheses. We can give any number of parameters, but we have to separate them with commas.

Consider the following example, which defines a function that accepts two numbers as parameters and returns their sum.

Example-

# Python function to calculate the sum of two variables
# Defining the function (named add so it does not shadow the built-in sum)
def add(a, b):
    return a + b

# Taking values from the user
a = int(input("Enter a: "))
b = int(input("Enter b: "))

# Printing the sum of a and b
print("Sum =", add(a, b))

Output-

Enter a: 10
Enter b: 20
Sum = 30

The return Statement –

The statement return [expression] exits a function, optionally passing back an expression to the caller. A return statement with no arguments is the same as return None.

The hello_world example above does not return a value. You can return a value from a function as follows:

# Function definition is here
def add(arg1, arg2):
    # Add both the parameters and return the total.
    total = arg1 + arg2
    print("Inside the function : ", total)
    return total

# Now you can call the add function
total = add(10, 20)
print("Outside the function : ", total)

Output-
Inside the function :  30
Outside the function :  30

Types of arguments in the function –

There may be several types of arguments which can be passed at the time of function calling.

  1. Required arguments
  2. Keyword arguments
  3. Default arguments
  4. Variable-length arguments

Required Arguments-

So far, we have learned about function calling in Python. We can also provide arguments at the time of the function call. Required arguments are the arguments that must be passed in the function call, in positions that exactly match their positions in the function definition.

If a required argument is not provided in the function call, the Python interpreter raises an error. If the positions of the arguments are changed, the values are bound to the wrong parameters, which usually produces an error or an incorrect result.

Consider the following example.

# The argument name is a required argument of the function func
def func(name):
    message = "Hi " + name
    return message

name = input("Enter the name?")
print(func(name))

Output-
Enter the name?John
Hi John
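To see what happens when a required argument is missing or misplaced, here is a small sketch. The function describe and its sample values are hypothetical, added only for illustration.

```python
def func(name):
    return "Hi " + name

# Calling func without its required argument raises a TypeError
try:
    func()
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)

# Hypothetical two-parameter function to show the effect of swapped positions
def describe(name, city):
    return name + " lives in " + city

# No error is raised here, but the values are bound to the wrong parameters
swapped = describe("Paris", "John")
print(swapped)
```

A missing argument is reported immediately, while swapped positions can go unnoticed, which is one reason keyword arguments are useful.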

Keyword arguments –

Python allows us to call a function with keyword arguments. This kind of function call lets us pass the arguments in any order.

The names of the arguments are treated as keywords and matched between the function call and the function definition. When a match is found, the value of the argument is assigned to the parameter of the same name.

Consider the following example.

#function func is called with the name and message as the keyword arguments  
def func(name,message):  
    print("printing the message with",name,"and ",message)  
func(name = "John",message="hello") #name and message is copied with the   values John and hello respectively  

Output-
printing the message with John and  hello

Variable length Arguments –

In large projects, we sometimes do not know in advance the number of arguments to be passed. In such cases, Python lets us pass comma-separated values in the function call, which are collected into a tuple inside the function.

However, in the function definition, we have to prefix the parameter name with * (star), as in *<variable_name>.

Consider the following example.

def printme(*names):  
    print("type of passed argument is ",type(names))  
    print("printing the passed arguments...")  
    for name in names:  
        print(name)  
printme("john","David","smith","nick")  

Output:
type of passed argument is  <class 'tuple'>
printing the passed arguments...
john
David
smith
nick

Default Arguments –

Python allows us to give parameters default values in the function definition. If the value of such an argument is not provided at the time of the function call, the parameter takes the default value given in the definition.

Example-

def printme(name,age=22):  
    print("My name is",name,"and age is",age)  
printme(name = "john") #the variable age is not passed into the function however the default value of age is considered in the function 

Output:
My name is john and age is 22

Scope of variables –

The scopes of the variables depend upon the location where the variable is being declared. The variable declared in one part of the program may not be accessible to the other parts.

In Python, variables are defined with one of two types of scope.

  1. Global variables
  2. Local variables

Variables that are defined inside a function body have a local scope, and those defined outside have a global scope.

This means that local variables can be accessed only inside the function in which they are declared, whereas global variables can be accessed throughout the program body by all functions. When you call a function, the variables declared inside it are brought into scope. Following is a simple example –

total = 0  # This is a global variable.

# Function definition is here
def add(arg1, arg2):
    # Add both the parameters and return the total.
    total = arg1 + arg2  # Here total is a local variable.
    print("Inside the function local total : ", total)
    return total

# Now you can call the add function
add(10, 20)
print("Outside the function global total : ", total)

When the above code is executed, it produces the following result −

Inside the function local total :  30
Outside the function global total :  0

Recursion:

Python also accepts function recursion, which means a defined function can call itself.

Recursion is a common mathematical and programming concept. It means that a function calls itself. This has the benefit of meaning that you can loop through data to reach a result.

The developer should be very careful with recursion as it can be quite easy to slip into writing a function which never terminates, or one that uses excess amounts of memory or processor power. However, when written correctly recursion can be a very efficient and mathematically-elegant approach to programming.

In this example, tri_recursion() is a function that we have defined to call itself (“recurse”). We use the k variable as the data, which decrements (-1) every time we recurse. The recursion ends when k is no longer greater than 0 (i.e. when it is 0).

For a new developer it can take some time to work out exactly how this works; the best way to find out is by testing and modifying it.

Example –

def tri_recursion(k):
    if k > 0:
        result = k + tri_recursion(k - 1)
        print(result)
    else:
        result = 0
    return result

print("\n\nRecursion Example Results")
tri_recursion(6)

Output-

Recursion Example Results
1
3
6
10
15
21

In the next chapter we will go through several function examples.

Reference

W3Schools

Tutorialspoint

Javatpoint

Principal Component Analysis (PCA) - Theory

In real-world scenarios, data analysis tasks often involve multi-dimensional data. We analyse the data and try to find patterns in it.

Here, the dimensions are the features that describe each data point x. As the number of dimensions increases, it becomes harder to visualize the data and to perform computations on it. So, how can we reduce the dimensions of a data set?

  • Remove the redundant dimension
  • Only keep the most important dimension

To reduce the dimensions of the data we use principal component analysis. Before we dive deep into how PCA works, let’s understand some key terminology that we will use later.

Variance:

It is a measure of variability: it simply measures how spread out the data set is. Mathematically, it is the average squared deviation from the mean. The variance of x is denoted var(x).

Covariance: It is a measure of the extent to which corresponding elements from two sets of ordered data move in the same direction. The covariance of x and y is denoted cov(x, y).

Here, xi is the i-th value of x, and x bar and y bar denote the corresponding mean values.
Covariance can be read as a measure of how interrelated two data sets are.
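For reference, the standard definitions that match the descriptions above can be written as follows. This assumes the population form, which divides by n; the sample form divides by n − 1 instead. Here n is the number of values, and the index i runs over them.

```latex
\operatorname{var}(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

Note that cov(x, x) = var(x), so variance is the special case of covariance of a variable with itself.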

Positive, negative and zero covariance:

Positive covariance means X and Y are positively related, i.e. as X increases Y also tends to increase. Negative covariance depicts the opposite relation. Zero covariance means X and Y have no linear relationship.

Eigenvectors and Eigenvalues:

To better understand these concepts, let’s consider the following situation. We are provided with 2-dimensional vectors v1, v2, …, vn. If we apply a linear transformation T (a 2×2 matrix) to these vectors, we obtain new vectors b1, b2, …, bn.

Some vectors, though, have a very interesting property: when the transformation T is applied to them, they may change length but they stay on the same line through the origin. (There are at most as many linearly independent vectors of this kind as there are features.) Those vectors are called eigenvectors, and the scalar by which an eigenvector is multiplied is called its eigenvalue.

Thus, each eigenvector has a corresponding eigenvalue.
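The property can be checked by hand with a small sketch. The matrix T and the vectors below are hypothetical values chosen so that the arithmetic is easy to follow, and the helper apply is written out only for this illustration.

```python
def apply(matrix, vector):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-element vector."""
    return [
        matrix[0][0] * vector[0] + matrix[0][1] * vector[1],
        matrix[1][0] * vector[0] + matrix[1][1] * vector[1],
    ]

# Hypothetical transformation, chosen for illustration
T = [[2, 1],
     [1, 2]]

v1 = [1, 1]    # eigenvector with eigenvalue 3
v2 = [1, -1]   # eigenvector with eigenvalue 1
w = [1, 0]     # not an eigenvector of T

b1 = apply(T, v1)   # 3 times v1: same line, three times longer
b2 = apply(T, v2)   # 1 times v2: unchanged
bw = apply(T, w)    # points in a new direction

print(b1, b2, bw)
```

For v1 and v2 the result is a scalar multiple of the input, while w is moved off its original line.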

When should I use PCA:

  1. If you want to reduce the number of variables but cannot identify which variables to remove from consideration completely.
  2. If you want to ensure your variables are uncorrelated with each other.
  3. To help reduce overfitting in your model.
  4. If you are comfortable making your independent variables less interpretable.

Background:

  • PCA is an unsupervised statistical technique used to examine the interrelations among a set of variables in order to identify the underlying structure of those variables.
  • It is sometimes also described as a form of general factor analysis.
  • Where regression determines a line of best fit to a data set, factor analysis determines several orthogonal lines of best fit to the data set.
  • Orthogonal means “at right angles”.
    • Actually the lines are perpendicular to each other in n-dimensional space.
  • N-dimensional Space is the variable sample space.
    • There are as many dimensions as there are variables, so in a data set with 4 variables the sample space is 4-dimensional.
  • Here we have some data plotted along two features, x and y.
  • We can add an orthogonal line. Now we can begin to understand the components!
  • Components are a linear transformation that chooses a variable system for the data set such that the greatest variance of the data set comes to lie on the first axis.
  • The second greatest variance on the second axis, and so on.
  • This process allows us to reduce the number of variables used in an analysis.
  • We can continue this analysis into higher dimensions.
  • If we use this technique on a data set with a large number of variables, we can compress the amount of explained variation to just a few components.
  • The most challenging part of PCA is interpreting the components.

For our work with Python, we’ll walk through an example of how to perform PCA with scikit-learn. We usually want to standardize our data to a common scale before PCA, so we’ll cover how to do this as well.

PCA Algorithm

  • Calculate the covariance matrix of the data points.
  • Calculate its eigenvectors and corresponding eigenvalues.
  • Sort the eigenvectors according to their eigenvalues in decreasing order.
  • Choose the first k eigenvectors; these will be the new k dimensions.
  • Transform the original n-dimensional data points into k dimensions.
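The steps above can be sketched in pure Python for the simplest case: 2-dimensional data reduced to k = 1 dimension. The data points are invented for illustration, and the closed-form eigenvalue formula used here only works for a symmetric 2×2 matrix; for real data a library such as NumPy or scikit-learn would normally do this work.

```python
import math

# Hypothetical 2-D data set (x, y), invented for illustration
points = [(2.0, 1.0), (3.0, 3.0), (4.0, 3.0), (5.0, 5.0), (6.0, 5.0), (7.0, 7.0)]
n = len(points)

# Center the data by subtracting the mean of each feature
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Step 1: covariance matrix [[sxx, sxy], [sxy, syy]] (population form, divide by n)
sxx = sum(x * x for x, _ in centered) / n
syy = sum(y * y for _, y in centered) / n
sxy = sum(x * y for x, y in centered) / n

# Steps 2-3: eigenvalues of a symmetric 2x2 matrix, already in decreasing order
trace = sxx + syy
det = sxx * syy - sxy * sxy
gap = math.sqrt(trace * trace / 4 - det)
eigenvalues = [trace / 2 + gap, trace / 2 - gap]

# Eigenvector of the largest eigenvalue (this form assumes sxy != 0), normalized
vx, vy = sxy, eigenvalues[0] - sxx
length = math.hypot(vx, vy)
first_component = (vx / length, vy / length)

# Steps 4-5: keep k = 1 component and project every centered point onto it
projected = [x * first_component[0] + y * first_component[1] for x, y in centered]

# Share of the total variance kept by the first component
explained = eigenvalues[0] / sum(eigenvalues)
print(first_component, explained)
```

The variance of the projected values equals the largest eigenvalue, which is why sorting by eigenvalue selects the directions that keep the most information.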

Advantages of PCA

  1. Removes Correlated Features: In real-world scenarios, it is very common to get thousands of features in a dataset. Running an algorithm on all of them can hurt its performance, and it is not easy to visualize that many features in any kind of graph, so you often need to reduce the number of features. To do that, you need to find the correlations among the features (correlated variables). Finding correlations manually among thousands of features is nearly impossible, frustrating and time-consuming. PCA does this for you efficiently.
  2. Improves Algorithm Performance: With so many features, the performance of your algorithm can degrade drastically. PCA is a very common way to speed up a machine learning algorithm by getting rid of correlated variables that contribute little to decision making. Training time drops significantly with fewer features. So, if the input dimensionality is too high, using PCA to speed up the algorithm is a reasonable choice.
  3. Improves Visualization: It is very hard to visualize and understand data in high dimensions. PCA transforms high-dimensional data into low-dimensional data (for example, 2 dimensions) so that it can be visualized easily. We can use a 2D scree plot to see which principal components capture the most variance and have more impact than the other principal components.

Disadvantages of PCA

  1. Independent variables become less interpretable: After applying PCA to the dataset, your original features turn into principal components. Principal components are linear combinations of your original features, and they are not as readable and interpretable as the original features.
  2. Data standardization is a must before PCA: You should standardize your data before applying PCA; otherwise PCA will not be able to find the optimal principal components. For instance, if a feature set has data expressed in units of kilograms, light years, or millions, the variance scales differ hugely across features in the training set. If PCA is applied to such a feature set, the resulting loadings for features with high variance will also be large. Hence, the principal components will be biased towards features with high variance, leading to misleading results.
  3. Information Loss: Although principal components try to cover the maximum variance among the features in a dataset, if we don’t select the number of principal components with care, we may lose some information compared to the original list of features.

Reference:

Medium

PCA with python

In this lecture we will implement PCA algorithm through Python. We will also see how to reduce features in the data set.

About the MNIST Data Set

The MNIST dataset (Modified National Institute of Standards and Technology database) is a large dataset of handwritten digits that is commonly used for training various image processing systems. It is available on Kaggle (https://www.kaggle.com/c/digit-recognizer/data).

The database is also widely used for training and testing in the field of machine learning.

  • The dataset consists of pairs of a “handwritten digit image” and a “label”. Digits range from 0 to 9, meaning 10 patterns in total.
  • handwritten digit image: This is a grayscale image of size 28 x 28 pixels.
  • label: This is the actual digit that the handwritten digit image represents. It is a number from 0 to 9.

Our Objective

In this data set, around 42,000 rows and 784 columns are available (784 = 28 x 28 pixels). We will try to reduce the number of features from 784, so that we keep fewer features while retaining as much information as possible.

Let’s explore the concept in a Jupyter notebook.