The tabular structures in the Python pandas library are called DataFrames.
These structures label their rows and columns. The set of row labels is called the index, and the column labels are called the column index or headers.
DataFrames are often created by filtering and manipulating larger datasets. These processed DataFrames keep the index labels of the rows they came from, so the index may no longer run consecutively. This calls for resetting the index of the DataFrame.
Resetting the index is also helpful:
- In the pre-processing stage, when dropping missing values or filtering data. Besides making the DataFrame smaller, these operations leave gaps in the index.
- When the index labels don't provide much information about the data.
- When the index needs to be treated as a common DataFrame column.
You can use the reset_index() method in pandas to reset the index in a DataFrame. Doing this converts the original index of the DataFrame into a column.
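To make the pre-processing case concrete, here is a minimal sketch. The `scores` data is made up for illustration, and `drop=True` (one of the method's parameters) is used so that the old labels are discarded rather than kept as a column.

```python
import pandas as pd

# Hypothetical marks data; None stands for a missing score.
scores = pd.DataFrame({"marks": [44, None, 50, 17, None]})

# Dropping the missing rows keeps the surviving labels 0, 2 and 3,
# so the index now has gaps.
cleaned = scores.dropna()
labels_before = list(cleaned.index)

# drop=True throws the old labels away instead of adding them as a column.
renumbered = cleaned.reset_index(drop=True)
labels_after = list(renumbered.index)

print(labels_before)  # [0, 2, 3]
print(labels_after)   # [0, 1, 2]
```

Without `drop=True`, the old labels 0, 2 and 3 would be kept in a new column named "index".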
We'll walk you through using the method to reset pandas DataFrames in this post.
Basics of DataFrame.reset_index
Besides resetting the index of a DataFrame to a default one, the reset_index() method is also helpful in removing one or more levels of a DataFrame with a MultiIndex.
The syntax of the method is:
DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Where the parameters have the following meanings:
| Parameter | Acceptable Data Types | Default Value | Description |
| --- | --- | --- | --- |
| level | int, str, tuple, list | None | Removes all levels from the index by default. If levels are given, only those are removed. |
| drop | bool | False | Adds the old index to the DataFrame as a column by default. The old index is discarded if the value is changed to True. |
| inplace | bool | False | When True, carries out the changes in the current DataFrame object instead of returning a new one. |
| col_level | int, str | 0 | Determines which level the labels are inserted into (if the columns have multiple levels). The labels are inserted into the first level (0) by default. |
| col_fill | object | '' | Determines how the other levels are named if the columns have multiple levels. If the value is None, the index name is repeated. |
The .reset_index() method returns None if inplace = True. Otherwise, the DataFrame with the new index is returned.
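The difference in return values is easy to check directly. The following is a small sketch using a two-row DataFrame invented for the purpose:

```python
import pandas as pd

df = pd.DataFrame({"Politics": [31, 43]}, index=["Leonard", "Brayan"])

# Default call: a new DataFrame comes back and df itself is untouched.
returned = df.reset_index()
columns_after_default_call = list(df.columns)

# inplace=True: df is modified and the call itself returns None.
result = df.reset_index(inplace=True)

print(type(returned).__name__)     # DataFrame
print(columns_after_default_call)  # ['Politics']
print(result)                      # None
print(list(df.columns))            # ['index', 'Politics']
```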
Resetting the Index with .reset_index()
Using the .reset_index() method to reset the index is as simple as chaining the method to the DataFrame object.
You must begin by creating a DataFrame, like so:
import pandas as pd

# We're making a DataFrame with an initial index. It represents marks out of 50.
df = pd.DataFrame(
    {
        'Global Finance': [44, 29, 50, 17, 36],
        'Politics': [31, 43, 21, 42, 17],
        'Family Enterprise': [30, 30, 16, 46, 41],
    },
    index=['Leonard', 'Brayan', 'Wendy', 'Nathaniel', 'Edwin'],
)
df
In tabular form, the DataFrame would look like this:
| | Global Finance | Politics | Family Enterprise |
| --- | --- | --- | --- |
| Leonard | 44 | 31 | 30 |
| Brayan | 29 | 43 | 30 |
| Wendy | 50 | 21 | 16 |
| Nathaniel | 17 | 42 | 46 |
| Edwin | 36 | 17 | 41 |
Resetting the index is now simply a matter of calling .reset_index(), like so:
df.reset_index()
The original index moves into the DataFrame as a new column named "index," and a new default integer index takes its place. The new index begins with zero and continues along the length of the DataFrame. It'll look like this:
| | index | Global Finance | Politics | Family Enterprise |
| --- | --- | --- | --- | --- |
| 0 | Leonard | 44 | 31 | 30 |
| 1 | Brayan | 29 | 43 | 30 |
| 2 | Wendy | 50 | 21 | 16 |
| 3 | Nathaniel | 17 | 42 | 46 |
| 4 | Edwin | 36 | 17 | 41 |
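The same result can be inspected programmatically. This sketch rebuilds the marks DataFrame from above and looks at where the old labels and the new index ended up:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Global Finance": [44, 29, 50, 17, 36],
        "Politics": [31, 43, 21, 42, 17],
        "Family Enterprise": [30, 30, 16, 46, 41],
    },
    index=["Leonard", "Brayan", "Wendy", "Nathaniel", "Edwin"],
)

reset = df.reset_index()

# The old labels now live in an ordinary column called "index".
print(reset["index"].tolist())

# The new index is the default integer range.
print(reset.index.tolist())  # [0, 1, 2, 3, 4]

# The DataFrame the method was called on still has its name-based index.
print(df.index.tolist())
```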
Persisting the Change to the DataFrame
The output you see above shows a DataFrame with a new index. However, .reset_index() returns a new DataFrame and leaves the original untouched, so if you run "df" again, you will see that the changes don't persist: the output still has the original name-based index.
If you want the changes to persist, you can use the "inplace" parameter, setting its value to "True." Here's what running it looks like:
df.reset_index(inplace=True)
df
The output of this code will be:
| | index | Global Finance | Politics | Family Enterprise |
| --- | --- | --- | --- | --- |
| 0 | Leonard | 44 | 31 | 30 |
| 1 | Brayan | 29 | 43 | 30 |
| 2 | Wendy | 50 | 21 | 16 |
| 3 | Nathaniel | 17 | 42 | 46 |
| 4 | Edwin | 36 | 17 | 41 |
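Assigning the returned DataFrame back to the same name is another way to keep the change. The sketch below uses a shortened, made-up version of the marks data and is offered as an alternative, not as the approach this walkthrough depends on:

```python
import pandas as pd

df = pd.DataFrame(
    {"Politics": [31, 43, 21]},
    index=["Leonard", "Brayan", "Wendy"],
)

# Rebind the name df to the DataFrame that reset_index() returns.
df = df.reset_index()

print(list(df.columns))  # ['index', 'Politics']
print(df.index.tolist())  # [0, 1, 2]
```

Reassignment keeps the call chainable with other methods, which `inplace=True` does not allow because it returns None.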
Resetting an Index in a DataFrame with a Named Index
If a DataFrame's index has a name, resetting the index turns that name into the name of the new column, instead of the default "index."
Let's see how this works by first creating a DataFrame with a named index:
# Creating a series and giving it a name
namedIndex = pd.Series(
    ['Leonard', 'Brayan', 'Wendy', 'Nathaniel', 'Edwin'],
    name='initial_index',
)

# Creating the DataFrame, then passing the named series as the index
df = pd.DataFrame(
    {
        'Global Finance': [44, 29, 50, 17, 36],
        'Politics': [31, 43, 21, 42, 17],
        'Family Enterprise': [30, 30, 16, 46, 41],
    },
    index=namedIndex,
)
df
This DataFrame would look like this:
| | Global Finance | Politics | Family Enterprise |
| --- | --- | --- | --- |
| initial_index | | | |
| Leonard | 44 | 31 | 30 |
| Brayan | 29 | 43 | 30 |
| Wendy | 50 | 21 | 16 |
| Nathaniel | 17 | 42 | 46 |
| Edwin | 36 | 17 | 41 |
Executing df.reset_index() turns "initial_index" into a column name in the DataFrame, like so:
| | initial_index | Global Finance | Politics | Family Enterprise |
| --- | --- | --- | --- | --- |
| 0 | Leonard | 44 | 31 | 30 |
| 1 | Brayan | 29 | 43 | 30 |
| 2 | Wendy | 50 | 21 | 16 |
| 3 | Nathaniel | 17 | 42 | 46 |
| 4 | Edwin | 36 | 17 | 41 |
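A quick way to confirm where the name goes is to compare `df.index.name` before the reset with the column list after it. This is a sketch on a shortened version of the data:

```python
import pandas as pd

named_index = pd.Series(["Leonard", "Brayan", "Wendy"], name="initial_index")
df = pd.DataFrame({"Politics": [31, 43, 21]}, index=named_index)

# The index carries the name of the series it was built from.
print(df.index.name)  # initial_index

reset = df.reset_index()

# After the reset, the name labels the new column and the index is unnamed.
print(list(reset.columns))  # ['initial_index', 'Politics']
print(reset.index.name)     # None
```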
Resetting a Multi-Level Index in a DataFrame
Let's take a look at a multi-level index in a DataFrame:
# Creating a multi-level index
newIndex = pd.MultiIndex.from_tuples(
    [
        ('BBA', 'Leonard'),
        ('BBA', 'Brayan'),
        ('MBA', 'Wendy'),
        ('MBA', 'Nathaniel'),
        ('BSC', 'Edwin'),
    ],
    names=['Branch', 'Name'],
)

# Creating multi-level columns
columns = pd.MultiIndex.from_tuples(
    [
        ('subject1', 'Global Finance'),
        ('subject2', 'Politics'),
        ('subject3', 'Family Enterprise'),
    ]
)

df = pd.DataFrame(
    [
        (45, 31, 30),
        (29, 21, 30),
        (50, 21, 16),
        (17, 42, 46),
        (36, 17, 41),
    ],
    index=newIndex,
    columns=columns,
)
df
The output is:
| | | subject1 | subject2 | subject3 |
| --- | --- | --- | --- | --- |
| | | Global Finance | Politics | Family Enterprise |
| Branch | Name | | | |
| BBA | Leonard | 45 | 31 | 30 |
| | Brayan | 29 | 21 | 30 |
| MBA | Wendy | 50 | 21 | 16 |
| | Nathaniel | 17 | 42 | 46 |
| BSC | Edwin | 36 | 17 | 41 |
The index has two levels, Branch and Name, which makes it a multi-level index (MultiIndex). Applying the .reset_index() method moves every level into the DataFrame as a column.
So, running "df.reset_index()" will do this:
| | Branch | Name | subject1 | subject2 | subject3 |
| --- | --- | --- | --- | --- | --- |
| | | | Global Finance | Politics | Family Enterprise |
| 0 | BBA | Leonard | 45 | 31 | 30 |
| 1 | BBA | Brayan | 29 | 21 | 30 |
| 2 | MBA | Wendy | 50 | 21 | 16 |
| 3 | MBA | Nathaniel | 17 | 42 | 46 |
| 4 | BSC | Edwin | 36 | 17 | 41 |
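Because the columns are themselves a MultiIndex, each index level becomes a column whose label is a tuple. The sketch below, on a shortened version of the data, prints those tuples; with the default `col_level=0` and `col_fill=''`, the level name goes in the first column level and the second is left as an empty string.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("BBA", "Leonard"), ("BBA", "Brayan"), ("MBA", "Wendy")],
    names=["Branch", "Name"],
)
columns = pd.MultiIndex.from_tuples(
    [("subject1", "Global Finance"), ("subject2", "Politics")]
)
df = pd.DataFrame([(45, 31), (29, 21), (50, 21)], index=index, columns=columns)

flat = df.reset_index()

# Column labels are (level 0, level 1) tuples.
print(flat.columns.tolist())

# Both index levels are gone, replaced by the default integer index.
print(flat.index.tolist())  # [0, 1, 2]
```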
You can also reset the index at the Branch level with the level parameter like so:
df.reset_index(level='Branch')
This produces the following output, where Branch becomes a column and Name remains the index:
| | Branch | subject1 | subject2 | subject3 |
| --- | --- | --- | --- | --- |
| | | Global Finance | Politics | Family Enterprise |
| Name | | | | |
| Leonard | BBA | 45 | 31 | 30 |
| Brayan | BBA | 29 | 21 | 30 |
| Wendy | MBA | 50 | 21 | 16 |
| Nathaniel | MBA | 17 | 42 | 46 |
| Edwin | BSC | 36 | 17 | 41 |
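The `col_level` and `col_fill` parameters control where the Branch label lands in multi-level columns. The following sketch, on a shortened version of the data, compares the default placement with `col_level=1`; the fill value "info" is an arbitrary name chosen for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("BBA", "Leonard"), ("BBA", "Brayan"), ("MBA", "Wendy")],
    names=["Branch", "Name"],
)
columns = pd.MultiIndex.from_tuples(
    [("subject1", "Global Finance"), ("subject2", "Politics")]
)
df = pd.DataFrame([(45, 31), (29, 21), (50, 21)], index=index, columns=columns)

# Default: "Branch" goes into column level 0, level 1 is an empty string.
by_name = df.reset_index(level="Branch")
print(by_name.columns.tolist()[0])  # ('Branch', '')

# col_level=1 puts "Branch" in the second level; col_fill names the first.
placed = df.reset_index(level="Branch", col_level=1, col_fill="info")
print(placed.columns.tolist()[0])  # ('info', 'Branch')

# In both cases only the Name level is left in the index.
print(by_name.index.tolist())
```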