Tag dataframe

Pandas-DataFrame

A data frame is a standard way to store data and data is aligned in a tabular fashion in rows and columns.

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index Let us assume that we are creating a data frame with student’s data, it will look something like this.

A pandas DataFrame can be created using the following constructor

pandas.DataFrame( data, index, columns, dtype, copy)

  • Data –  data takes various forms like ndarray, series, map, lists, dict, constants and also another DataFrame.
  • Index – For the row labels, the Index to be used for the resulting frame is Optional Default np.arrange(n) if no index is passed.
  • Columns – For column labels, the optional default syntax is – np.arrange(n). This is only true if no index is passed.
  • dtype – Data type of each column.
  • Copy – This command (or whatever it is) is used for copying of data, if the default is False.

Creations of DataFrame:

A pandas DataFrame can be created using various inputs like list, dict, series, numpy ndarray, another dataframe.

Let’s explore DataFrame with python in jupyter notebook.

Pandas-Data input and Output

To do data analysis successfully, a Data analyst should know how to read and write different file format such as .CSV, .XLS, .HTML, JASON etc.

DataFrame has a Reader and a Writer function. The Reader function allows you to read the different data formats while the Writer function enables you to save the data in a particular format.

Below is a table containing available readers and writers.

Following notebook is the reference code for getting input and output, pandas can read a variety of file types using it’s pd.read_ methods. Let’s take a look at the most common data types:

Pandas–Operations

There are various useful pandas operation available which is really handy in data analysis.

In this lecture we are going to cover following topics:

  • How to find unique values.
  • How to select data with multiple conditions?
  • How to apply function on a particular column?
  • How to remove column?
  • How to get column and index name?
  • Sorting by column
  • Checking null value
  • Filling in NaN values with something else
  • Pivot table creation
  • Change column name in pre-existing data frame.
  • .map() and .apply() function
  • Get column name in the data frame
  • Change order of column in the data frame
  • Add new column in existing data frame
  • Data type conversion
  • Date and time conversion

Let’s see all these operations in python