There are some operations in pandas that I keep forgetting. This site exists for exactly that: to remind me how to perform these tasks. In this post, I’ll cover a few very basic ones: dropping columns, reordering columns, and changing the names of columns.

Pandas

I won’t waste time explaining pandas 🐼, the powerful Python library for data processing. Pandas can handle many complex operations and is a must-have tool for anyone working with data.

At the time of writing this, I am much more proficient with R than pandas, which might explain why I repeatedly forget these simple operations.

Our Base DataFrame

import pandas as pd

items = {
    "color": ["red", "blue", "yellow", "black", "white"],
    "name": ["rose", "sky", "yolk", "ebony", "snow"],
    "price": [50, 1200, 3, 40, 2],
    "date_created": ["1999-10-30 05:00:00", 
                     "2002-08-21 08:23:00", 
                     "2003-05-15 14:19:00", 
                     "2006-11-29 11:21:00", 
                     "2018-02-12 23:23:00"
                     ],
    "useless_column": ["data", "data", "data", "data", "data"],
    "id": ["1999-ROS-01",
           "2002-SKY-01",
           "2003-YOL-03",
           "2006-EBO-01",
           "2018-SNO-03"
           ]
}

items_df = pd.DataFrame(items)

Our base dataframe is a simple table containing miscellaneous data about items. There’s even an unnecessary column, which we’ll remove shortly.
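To double-check the starting point, here is a minimal sketch that rebuilds the same dataframe and prints its column names and its shape (rows, columns). It relies only on the standard columns and shape attributes of a DataFrame.

```python
import pandas as pd

# Same data as above, rebuilt so this snippet runs on its own
items = {
    "color": ["red", "blue", "yellow", "black", "white"],
    "name": ["rose", "sky", "yolk", "ebony", "snow"],
    "price": [50, 1200, 3, 40, 2],
    "date_created": ["1999-10-30 05:00:00", "2002-08-21 08:23:00",
                     "2003-05-15 14:19:00", "2006-11-29 11:21:00",
                     "2018-02-12 23:23:00"],
    "useless_column": ["data", "data", "data", "data", "data"],
    "id": ["1999-ROS-01", "2002-SKY-01", "2003-YOL-03",
           "2006-EBO-01", "2018-SNO-03"],
}
items_df = pd.DataFrame(items)

# Column order follows the dictionary's key order
print(items_df.columns.tolist())
# 5 rows, 6 columns
print(items_df.shape)
```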

Dropping Columns

Let’s get rid of the useless column. Passing inplace=True modifies items_df directly instead of returning a new dataframe. In that case drop returns None, so don’t assign its result back to items_df.

items_df.drop(columns=["useless_column"], inplace=True)
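A small sketch, using a hypothetical two-column frame, that contrasts drop with and without inplace=True, plus the errors="ignore" option for a column that may already be gone:

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({"name": ["rose", "sky"], "useless_column": ["data", "data"]})

# Without inplace, drop returns a new DataFrame and leaves df unchanged
trimmed = df.drop(columns=["useless_column"])
print(df.columns.tolist())       # ['name', 'useless_column']
print(trimmed.columns.tolist())  # ['name']

# With inplace=True, drop mutates df and returns None
result = df.drop(columns=["useless_column"], inplace=True)
print(result)                    # None
print(df.columns.tolist())       # ['name']

# Dropping a missing column raises KeyError by default;
# errors="ignore" skips labels that are not present
safe = df.drop(columns=["useless_column"], errors="ignore")
print(safe.columns.tolist())     # ['name']
```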

Reordering Columns

Sometimes, I need to export dataframes to tables or CSV files where the ordering of columns matters for end users. Reordering columns is as simple as indexing the dataframe with a list of column names in the desired order:

name_order = ["id", "date_created", "name", "color", "price"]
items_df = items_df[name_order]
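When only one column needs to move, typing out the full list gets tedious. The sketch below, on a hypothetical three-column frame, moves id to the front and keeps the rest in their existing order. It also shows that columns left out of the list disappear from the result and that a misspelled name raises KeyError.

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({
    "name": ["rose", "sky"],
    "price": [50, 1200],
    "id": ["1999-ROS-01", "2002-SKY-01"],
})

# Put "id" first, then every other column in its current order
front = ["id"] + [col for col in df.columns if col != "id"]
df = df[front]
print(df.columns.tolist())      # ['id', 'name', 'price']

# Columns not listed are dropped from the selection
subset = df[["price", "id"]]
print(subset.columns.tolist())  # ['price', 'id']

# A name that doesn't exist raises KeyError
try:
    df[["id", "prce"]]
except KeyError as err:
    print("KeyError:", err)
```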

Changing the Names of Columns

Another operation I often need is renaming columns. To rename columns, pass a dictionary in the format {"old_name": "new_name"} to the columns parameter of the rename method. Columns not mentioned in the dictionary keep their names.

items_df.rename(columns={"id": "sku", "color": "Color"}, inplace=True)
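A sketch on a hypothetical one-row frame showing two rename behaviors worth remembering: dictionary keys that match no column are silently ignored unless errors="raise" is passed, and columns also accepts a function, such as str.upper, that is applied to every column name.

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({"id": ["1999-ROS-01"], "color": ["red"], "price": [50]})

# A typo in the key ("colour") is ignored by default: nothing is renamed
unchanged = df.rename(columns={"colour": "Color"})
print(unchanged.columns.tolist())  # ['id', 'color', 'price']

# errors="raise" turns that silent miss into a KeyError
try:
    df.rename(columns={"colour": "Color"}, errors="raise")
except KeyError as err:
    print("KeyError:", err)

# A function is applied to every column name
upper = df.rename(columns=str.upper)
print(upper.columns.tolist())  # ['ID', 'COLOR', 'PRICE']
```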