Aggregation, grouping, and filtering data with Pandas

Aggregation, grouping, and filtering are essential operations in data analysis. Pandas provides several functions and methods to perform them, such as sum(), mean(), groupby(), and boolean indexing.

Here are some examples:

Aggregation

import pandas as pd

# load the data into a Pandas DataFrame
df = pd.read_csv('sales_data.csv')

# calculate the revenue of each row, then sum it to get the total revenue
revenue = df['quantity'] * df['price']
total_revenue = revenue.sum()
print(total_revenue)

# calculate the average price per product type
avg_price = df.groupby('product_type')['price'].mean()
print(avg_price)
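When several statistics are needed at once, agg() can compute them in a single call. The following is a minimal sketch rather than a definitive recipe: the rows are made-up sample data standing in for sales_data.csv, and the output column names total_quantity and avg_price are chosen only for illustration.

```python
import pandas as pd

# made-up sample rows standing in for sales_data.csv (an assumption)
sales = pd.DataFrame({
    'product_type': ['book', 'book', 'toy', 'toy'],
    'quantity': [2, 3, 1, 4],
    'price': [10.0, 20.0, 5.0, 15.0],
})

# several summary statistics of one column in a single call
price_summary = sales['price'].agg(['min', 'max', 'mean'])
print(price_summary)

# named aggregation: each keyword becomes an output column,
# defined by an (input column, function) pair
summary_by_type = sales.groupby('product_type').agg(
    total_quantity=('quantity', 'sum'),
    avg_price=('price', 'mean'),
)
print(summary_by_type)
```

Named aggregation keeps the result columns flat and self-describing, which tends to be easier to read than a dictionary of functions.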

Grouping

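# load the data into a Pandas DataFrame, parsing order_date as a datetime column
# (the .dt accessor used below only works on datetime values)
df = pd.read_csv('sales_data.csv', parse_dates=['order_date'])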

# group the data by product type and calculate the total revenue
# (revenue is quantity * price per row; summing prices alone would not give revenue)
revenue_by_type = (df['quantity'] * df['price']).groupby(df['product_type']).sum()
print(revenue_by_type)

# group the data by month and calculate the average quantity
avg_quantity_by_month = df.groupby(df['order_date'].dt.month)['quantity'].mean()
print(avg_quantity_by_month)
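Grouping also works with more than one key. The sketch below groups by calendar month and product type together; the rows are made-up sample data, and the names orders, revenue, and month are assumptions introduced for the example.

```python
import pandas as pd

# made-up sample rows (an assumption), with dates stored as text
orders = pd.DataFrame({
    'order_date': ['2023-01-05', '2023-01-20', '2023-02-03', '2023-02-17'],
    'product_type': ['book', 'toy', 'book', 'book'],
    'quantity': [2, 1, 3, 5],
    'price': [10.0, 5.0, 20.0, 10.0],
})

# convert the text dates to datetime so the .dt accessor is available
orders['order_date'] = pd.to_datetime(orders['order_date'])
orders['revenue'] = orders['quantity'] * orders['price']

# group by two keys at once: calendar month and product type
month = orders['order_date'].dt.month.rename('month')
revenue_by_month_and_type = orders.groupby([month, 'product_type'])['revenue'].sum()
print(revenue_by_month_and_type)
```

The result is indexed by (month, product_type) pairs. Note that dt.month pools the same month across different years, so data spanning several years may need the year as an additional key.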

Filtering

# load the data into a Pandas DataFrame
df = pd.read_csv('sales_data.csv')

# filter the data to include only rows with a quantity greater than 10
df_high_quantity = df[df['quantity'] > 10]

# filter the data to include only rows with a price between $10 and $20 (inclusive)
df_mid_price = df[(df['price'] >= 10) & (df['price'] <= 20)]
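Boolean masks are not the only way to filter. As a hedged sketch on made-up sample data, the block below shows between() and isin() for row-level conditions, and groupby().filter() for keeping or dropping whole groups. The threshold of 10 and all variable names are assumptions for illustration.

```python
import pandas as pd

# made-up sample rows standing in for sales_data.csv (an assumption)
sales = pd.DataFrame({
    'product_type': ['book', 'book', 'toy', 'toy', 'game'],
    'quantity': [12, 3, 1, 4, 20],
    'price': [10.0, 20.0, 5.0, 15.0, 30.0],
})

# between() is inclusive on both ends by default
in_price_range = sales[sales['price'].between(10, 20)]

# isin() keeps rows whose value appears in the given list
books_and_games = sales[sales['product_type'].isin(['book', 'game'])]

# groupby().filter() keeps every row of each group that passes a group-level test;
# here: product types whose total quantity exceeds 10
big_groups = sales.groupby('product_type').filter(lambda g: g['quantity'].sum() > 10)
print(big_groups)
```

Unlike a row-level mask, groupby().filter() decides per group, so the low-quantity book row is kept because its group as a whole passes the test.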

These are just a few of the aggregation, grouping, and filtering techniques available in Pandas. Which ones to use depends on the characteristics of the data and the goals of the analysis.