Python for Data Science

Nov 2, 2019

NumPy-Indexing and Selection

By Datasciencelovers in Python for Data Science Tag data analysis, indexing and selection, numpy, numpy indexing, numpy slicing

Indexing and slicing are important operations to be familiar with when working with NumPy arrays. Use them whenever you want to work with a subset of an array. This tutorial takes you through indexing and slicing on multi-dimensional arrays.
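
As a minimal sketch of these operations (the array below is made up purely for illustration), element access, row and column selection, a two-dimensional slice and a boolean mask might look like this:

```python
import numpy as np

# A small 3x4 array used only for illustration:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
arr = np.arange(12).reshape(3, 4)

element = arr[1, 2]        # single element at row 1, column 2
first_row = arr[0]         # the whole first row
second_col = arr[:, 1]     # every row, column 1
block = arr[:2, 1:3]       # rows 0-1 and columns 1-2 (stop index is exclusive)
evens = arr[arr % 2 == 0]  # boolean mask: keep only the even values
```

Basic slices such as block are views on the original array, so assigning into them also changes arr; call .copy() when an independent array is needed.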

Please refer to the following .ipynb file for the NumPy implementation in Python.

Nov 2, 2019

NumPy-Operations

By Datasciencelovers in Python for Data Science Tag data analysis, numpy, numpy operations

In this chapter we will see the various operations we can perform on NumPy arrays, such as addition, subtraction, multiplication and division of two matrices.
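
A short sketch of these operations on two small, made-up matrices follows. Note that the arithmetic operators work element by element; the @ operator is shown as well because it performs the true matrix product, which is an addition to what the text lists.

```python
import numpy as np

# Two small 2x2 matrices chosen only for illustration.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0]])

total = a + b           # element-wise addition
difference = b - a      # element-wise subtraction
product = a * b         # element-wise multiplication (not a matrix product)
quotient = b / a        # element-wise division
matrix_product = a @ b  # matrix product (rows of a times columns of b)
```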

Please go through the .ipynb below; it will give you a better idea of how to perform these operations with NumPy in Python.

Nov 3, 2019

Pandas-Introduction

By Datasciencelovers in Python for Data Science Tag data analysis, data science, pandas, python

What is pandas?

Pandas is an open-source Python library that lets you perform data manipulation, analysis and cleaning. It is built on top of NumPy and is one of the most important libraries for data science.

According to Wikipedia, “Pandas is derived from the term ‘panel data’, an econometrics term for data sets that include observations over multiple time periods for the same individuals.”

Why Pandas?

Pandas offers the following advantages for data scientists.

It handles missing data easily.
It provides an efficient way to slice and wrangle data.
It helps to merge, concatenate or reshape data.
It includes powerful time series tools.

How to install Pandas?

To install pandas, go to the command line/terminal and type “pip install pandas”. If you have Anaconda installed on the system, type “conda install pandas” instead. Once the installation is complete, open your IDE (Jupyter) and import it by typing “import pandas as pd”.

In the next chapter we will learn about the pandas Series.

Nov 3, 2019

Pandas–Series

By Datasciencelovers in Python for Data Science Tag data analysis, data science, pandas, pandas series

The first main data type we will learn about for pandas is the Series data type.

A Series is a one-dimensional data structure. It is very similar to a NumPy array (in fact it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric location. It also doesn’t need to hold numeric data; it can hold any arbitrary Python object.

For example, a Series might hold the values 10, 23, 56, 17, 52, 61, 73, 90, 26 and 72.

The important points to remember about a pandas Series are:

Homogeneous data
Size immutable
Values of data mutable

Let’s import pandas and explore the Series object with the help of Python.
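
As a minimal sketch, a Series can be built from a few of the values shown earlier. The labels "a" to "e" are hypothetical, chosen only to show label-based access next to position-based access.

```python
import pandas as pd

values = [10, 23, 56, 17, 52]
labels = ["a", "b", "c", "d", "e"]  # hypothetical labels, for illustration only

s = pd.Series(values, index=labels)

by_label = s["c"]        # look up by axis label
by_position = s.iloc[0]  # look up by integer position
s["b"] = 99              # values are mutable: an existing entry can be overwritten
```

After the assignment the Series still has five entries; only the value stored under the label "b" has changed.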

Nov 3, 2019

Pandas-DataFrame

By Datasciencelovers in Python for Data Science Tag data analysis, data science, dataframe, pandas

A data frame is a standard way to store data, with the data aligned in a tabular fashion in rows and columns.

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let us assume that we are creating a data frame with students’ data; it will look something like this.

A pandas DataFrame can be created using the following constructor:

pandas.DataFrame(data, index, columns, dtype, copy)

data – The input data. It can take various forms such as an ndarray, Series, map, list, dict, constant or another DataFrame.
index – The row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
columns – The column labels. Optional; defaults to np.arange(n) if no column labels are passed.
dtype – The data type of each column.
copy – A flag that controls whether the data is copied from the input. The default is False.

Creating a DataFrame:

A pandas DataFrame can be created from various inputs such as a list, dict, Series, NumPy ndarray or another DataFrame.

Let’s explore DataFrame with Python in a Jupyter notebook.
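
The sketch below builds DataFrames from three of these input types. The student names, scores and the labels r1 to r3, x and y are invented for illustration.

```python
import numpy as np
import pandas as pd

# From a dict of lists: keys become column labels, rows get the default 0..n-1 index.
students = pd.DataFrame({"name": ["Asha", "Ben", "Chen"], "score": [82, 74, 91]})

# From a list of dicts: each dict is one row.
from_records = pd.DataFrame([
    {"name": "Asha", "score": 82},
    {"name": "Ben", "score": 74},
])

# From a NumPy ndarray, passing explicit row and column labels.
from_array = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["r1", "r2", "r3"],
    columns=["x", "y"],
)
```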

Nov 3, 2019

Pandas-Data input and Output

By Datasciencelovers in Python for Data Science Tag data analysis, data science, dataframe, pandas, read file, write file

To do data analysis successfully, a data analyst should know how to read and write different file formats such as .CSV, .XLS, .HTML and JSON.

Pandas provides reader and writer functions. The reader functions (such as pd.read_csv) let you read different data formats, while the writer methods (such as DataFrame.to_csv) let you save the data in a particular format.

Below is a table containing available readers and writers.

The following notebook is the reference code for data input and output. Pandas can read a variety of file types using its pd.read_ methods. Let’s take a look at the most common data types:
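
As a small sketch of the reader/writer pairing, the example below round-trips a made-up table through CSV and JSON. It uses in-memory io.StringIO buffers in place of files so that it runs without touching the disk; with real files you would pass a path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nAsha,82\nBen,74\n"

# Reader: parse CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Writer: save the DataFrame as CSV, leaving out the row index.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the written text back gives an equal DataFrame.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))

# The same pairing exists for JSON: to_json writes, read_json reads.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text))
```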

Nov 4, 2019

pandas-Indexing and selecting Data

By Datasciencelovers in Python for Data Science Tag data frame, indexing and selection, pandas

Indexing in pandas means selecting particular rows and columns of data from a DataFrame. You can select all the rows and all the columns, or only some of the rows and a few columns. It is also known as subset selection.

Let’s explore the concept through an example.
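
A minimal sketch of subset selection on a made-up DataFrame follows. The row labels s1 to s4 and the column names are assumptions for illustration; loc selects by label and iloc selects by integer position.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen", "Dev"],
        "city": ["Pune", "Leeds", "Wuhan", "Delhi"],
        "score": [82, 74, 91, 68],
    },
    index=["s1", "s2", "s3", "s4"],  # hypothetical row labels
)

one_column = df["score"]                             # a single column, as a Series
two_columns = df[["name", "score"]]                  # several columns, as a DataFrame
by_label = df.loc["s2", "score"]                     # one cell, by row and column label
label_slice = df.loc["s1":"s3", ["name", "score"]]   # label slices include the end label
by_position = df.iloc[0:2, 0:2]                      # position slices exclude the end
high_scores = df[df["score"] > 80]                   # rows that satisfy a condition
```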

Nov 4, 2019

Pandas-Groupby

By Datasciencelovers in Python for Data Science Tag data analysis, dataframe, groupby, pandas

Groupby splits the data into different groups according to given criteria.

A groupby operation involves the following steps on the original object:

Splitting the object
Applying a function
Combining the results

Let’s understand groupby through the following Python code.
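
The three steps above can be sketched on a small, invented sales table (the column names region and amount are assumptions):

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "amount": [100, 150, 200, 50, 75],
})

# Split: group the rows by the value in the "region" column.
grouped = sales.groupby("region")

# Apply and combine: sum "amount" within each group and collect
# the per-group results into one Series indexed by region.
totals = grouped["amount"].sum()

# Several functions can be applied at once; the result has one column per function.
summary = grouped["amount"].agg(["mean", "max"])
```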

Nov 6, 2019

Pandas-Merging/Joining

By Datasciencelovers in Python for Data Science Tag data analysis, merge, pandas join, pandas merging

Merging two datasets is the process of bringing two datasets together into one and aligning the rows from each based on common attributes or columns.

Data Merging:

The pandas.merge() method joins two data frames and aligns their rows by a “key” variable, typically one that contains unique values.

Pandas has separate “merge” and “join” methods, but both do similar work.

With pandas.merge(), you can only combine 2 data frames at a time. If you have more than 2 data frames to merge, you will have to use this method multiple times.

Let’s see pandas.merge() and some of the available arguments to pass. Here is the general structure and the recommended bare minimum arguments to pass.

pandas.merge(left_data_frame, right_data_frame, on= , how= )

left is one of the data frames.
right is the other data frame.
on is the variable, a.k.a. the column, on which you want to merge. This is the key variable and has to have the same name in both data frames.
If the data frames have different column names for the merge variables, you can use left_on and right_on instead.
- left_on is the variable name in the left data frame to be merged on.
- right_on is the variable name in the right data frame to be merged on.

how specifies the type of merge. The options are:

“inner”, where only the observations with matching values in the “on” column of both data frames are kept.
“left”, where all observations from the left data frame are kept, whether or not they have a match in the right data frame. Unmatched left observations get NaN in the columns that come from the right data frame, and right observations with no match in the left data frame are discarded.
“right”, where all observations from the right data frame are kept, whether or not they have a match in the left data frame. Unmatched right observations get NaN in the columns that come from the left data frame, and left observations with no match in the right data frame are discarded.
“outer”, where all observations from both data frames are kept, with NaN filling the columns that have no match.
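
To make the four options concrete, here is a minimal sketch on two tiny, made-up data frames that share a user_id key. The frames and their column names are invented for illustration and are not the CSV files named in this post.

```python
import pandas as pd

usage = pd.DataFrame({"user_id": [1, 2, 3], "monthly_mb": [500, 1200, 300]})
devices = pd.DataFrame({"user_id": [2, 3, 4], "device": ["A", "B", "C"]})

inner = pd.merge(usage, devices, on="user_id", how="inner")  # users 2 and 3 only
left = pd.merge(usage, devices, on="user_id", how="left")    # users 1, 2, 3
right = pd.merge(usage, devices, on="user_id", how="right")  # users 2, 3, 4
outer = pd.merge(usage, devices, on="user_id", how="outer")  # users 1, 2, 3, 4

# When the key columns have different names, use left_on and right_on.
devices_renamed = devices.rename(columns={"user_id": "uid"})
mixed = pd.merge(usage, devices_renamed, left_on="user_id", right_on="uid", how="inner")
```

In the left merge, user 1 has no device row, so its device value is NaN; in the right merge, user 4 has no usage row, so its monthly_mb value is NaN.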

Now let’s understand the concepts with an example.

We are going to use the following data sets.

user_usage.csv – This data set contains users’ monthly mobile usage details.
user_device.csv – This data set contains details of an individual “use” of the system, with dates and device information.
android_devices.csv – This data set contains device and manufacturer data, listing all Android devices and their model codes.

Let’s understand the concept with code.

Nov 6, 2019

Pandas-concat() method

By Datasciencelovers in Python for Data Science Tag concatination, data analysis, data frame, pandas concat

The pandas.concat() method combines two data frames by stacking them on top of each other. If one of the data frames lacks a column that the other has (or a row label, when concatenating side by side), the observations from that data frame are filled with NaN values there.

new_concat_dataframe = pd.concat([dataframe1, dataframe2], ignore_index=True)

Note – If you wish for a new index starting at 0, pass the ignore_index argument as True (the Python boolean, not the string “true”).

Let’s understand the concat() function through code.
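
A minimal sketch with two invented data frames shows both the NaN filling and the effect of ignore_index. The column names id, name and score are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "score": [9.5, 7.0]})

# Default behaviour: each frame keeps its own index, so labels repeat (0, 1, 0, 1).
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index.
reindexed = pd.concat([df1, df2], ignore_index=True)

# df1 has no "score" column and df2 has no "name" column,
# so those cells are filled with NaN in the combined frame.
missing_scores = reindexed["score"].isna().sum()
missing_names = reindexed["name"].isna().sum()
```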
