In the modern programming landscape, few libraries have revolutionized the work of developers as much as pandas. If you work in data analysis, machine learning, automation, or simply manipulating structured information, it's almost impossible not to come across its fundamental concept: the DataFrame.
In this article, I'll explain what it is, how it works, and why it has become an indispensable tool in the world of IT.
📌 What is a DataFrame?
A DataFrame is a two-dimensional data structure, similar to an Excel spreadsheet or SQL table. Think of it as a data array organized into rows and columns, where each column can contain a different data type (strings, numbers, dates, Booleans, etc.).
In the world of pandas, a DataFrame is powerful because it combines:
- the ease of use of tables
- the speed of NumPy-backed data structures implemented in optimized C
- the flexibility of Python.
import pandas as pd
df = pd.DataFrame({
"Name": ["Anna", "Mark", "Luke"],
"Age": [23, 31, 28],
"City": ["Rome", "Milan", "Turin"]
})
# Display the DataFrame in the console
print(df)
Output:
Name Age City
0 Anna 23 Rome
1 Mark 31 Milan
2 Luke 28 Turin
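To make the "each column can contain a different data type" point concrete, here is a minimal sketch that inspects the same example DataFrame. The variable name `ages` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Mark", "Luke"],
    "Age": [23, 31, 28],
    "City": ["Rome", "Milan", "Turin"],
})

# Number of rows and columns as a (rows, columns) tuple: (3, 3)
print(df.shape)

# Data type of each column (text columns and integer columns differ)
print(df.dtypes)

# A single column is a pandas Series
ages = df["Age"]
print(type(ages).__name__)
```

The exact dtype reported for text columns can vary between pandas versions, so treat the printed names as indicative.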
✅ Can rows be named?
Yes — rows in a pandas DataFrame can have labels: this is the index. By default, the index is numeric (0, 1, 2, ...), but you can set and manipulate it as you like. Below are some practical examples.
1. Set custom names on creation
import pandas as pd
df = pd.DataFrame(
{
"Name": ["Anna", "Mark", "Luke"],
"Age": [23, 31, 28],
},
index=["row_1", "row_2", "row_3"]
)
# Display the DataFrame in the console
print(df)
Output:
Name Age
row_1 Anna 23
row_2 Mark 31
row_3 Luke 28
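Once rows have labels, you can look them up by name. The sketch below, which reuses the DataFrame above, contrasts label-based access (`loc`) with position-based access (`iloc`). The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Anna", "Mark", "Luke"], "Age": [23, 31, 28]},
    index=["row_1", "row_2", "row_3"],
)

# Label-based lookup: row "row_2", column "Age"
age_by_label = df.loc["row_2", "Age"]

# Position-based lookup: second row, second column
age_by_position = df.iloc[1, 1]

print(age_by_label, age_by_position)  # both are 31
```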
2. Rename rows after creation
# Replace the whole index
df.index = ["a", "b", "c"]
print(df)
# Or rename selected index labels
df.rename(index={"a": "first", "b": "second"}, inplace=True)
print(df)
Output after full replacement:
Name Age
a Anna 23
b Mark 31
c Luke 28
Output after rename:
Name Age
first Anna 23
second Mark 31
c Luke 28
3. Give the index itself a name
# Give a name to the index axis
df.index.name = "ID"
print(df)
Output:
        Name  Age
ID
first   Anna   23
second  Mark   31
c       Luke   28
4. Set the index from a column (e.g. from CSV/Excel/SQL)
# Suppose you read a CSV that has an 'id' column you want to use as index
# df = pd.read_csv("data.csv")
# Or set index from existing DataFrame column:
df2 = pd.DataFrame({
"id": ["u1", "u2", "u3"],
"Name": ["Alice", "Bob", "Carol"],
"Age": [29, 34, 41]
})
df2.set_index("id", inplace=True)
print(df2)
Output:
     Name  Age
id
u1  Alice   29
u2    Bob   34
u3  Carol   41
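When the data comes from a file, the index can also be chosen while reading. This is a small sketch that uses an in-memory string in place of a real `data.csv` (the CSV content is made up for illustration) and then shows `reset_index`, which reverses the operation.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (hypothetical data)
csv_text = "id,Name,Age\nu1,Alice,29\nu2,Bob,34\nu3,Carol,41\n"

# index_col makes the 'id' column the index while reading
df_csv = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(df_csv.loc["u2", "Name"])

# reset_index turns the index back into an ordinary column
df_flat = df_csv.reset_index()
print(list(df_flat.columns))
```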
5. Multi-index (hierarchical indexes)
If you want labels on multiple levels (useful for hierarchical data), you can create a MultiIndex:
arrays = [
["group1", "group1", "group2", "group2"],
["a", "b", "a", "b"]
]
index = pd.MultiIndex.from_arrays(arrays, names=("Group", "Label"))
df_multi = pd.DataFrame(
{"Value": [10, 20, 30, 40]},
index=index
)
print(df_multi)
Output:
              Value
Group  Label
group1 a         10
       b         20
group2 a         30
       b         40
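A hierarchical index pays off when you select data. As a rough sketch on the same `df_multi`, here are three common ways to pick rows by level. The result names `group1`, `single` and `label_b` are just for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["group1", "group1", "group2", "group2"], ["a", "b", "a", "b"]],
    names=("Group", "Label"),
)
df_multi = pd.DataFrame({"Value": [10, 20, 30, 40]}, index=index)

# All rows of the outer level "group1" (the result is indexed by Label)
group1 = df_multi.loc["group1"]

# A single cell, addressed by a (Group, Label) tuple
single = df_multi.loc[("group2", "a"), "Value"]

# Cross-section: every row whose inner level is "b"
label_b = df_multi.xs("b", level="Label")

print(group1, single, label_b, sep="\n\n")
```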
In summary: you can set row labels on creation, replace them, rename them, derive them from existing columns, and even use hierarchical indexes.
🔍 Why is this so important?
DataFrames have become the standard for data manipulation in Python thanks to:
✔ Ease of manipulation
# df is the first DataFrame of the article (columns Name, Age, City)
filtered_df = df[df["Age"] > 25]
print(filtered_df)
Output:
Name Age City
1 Mark 31 Milan
2 Luke 28 Turin
✔ Fast and optimized operations
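"Fast" here mostly means vectorized: you write one expression over whole columns instead of a Python loop. A minimal sketch, with made-up `Price` and `Quantity` columns, comparing the two styles:

```python
import pandas as pd

df = pd.DataFrame({"Price": [10.0, 20.0, 30.0], "Quantity": [3, 1, 2]})

# Vectorized: one expression over entire columns
df["Total"] = df["Price"] * df["Quantity"]

# Equivalent explicit Python loop, shown only for comparison
loop_totals = [p * q for p, q in zip(df["Price"], df["Quantity"])]

print(df["Total"].tolist())
print(df["Total"].tolist() == loop_totals)
```

Both approaches give the same numbers. On large tables the vectorized form is usually much faster, although the actual gain depends on the data and the operation.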
✔ Support for heterogeneous formats
📄 1. CSV
Reading:
df = pd.read_csv("data.csv")
print(df)
Writing:
df.to_csv("output.csv", index=False)
📊 2. Excel
Reading:
df = pd.read_excel("data.xlsx")
print(df)
Writing:
df.to_excel("output.xlsx", index=False)
🧱 3. JSON
Reading:
df = pd.read_json("data.json")
print(df)
Writing:
df.to_json("output.json", orient="records", indent=2)
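If you want to try the JSON round trip without touching the disk, the following sketch keeps everything in memory. It assumes a small two-row DataFrame invented for the example.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Anna", "Mark"], "Age": [23, 31]})

# With no path argument, to_json returns the JSON text as a string
json_text = df.to_json(orient="records")
print(json_text)

# Read it back; StringIO wraps the string as a file-like object
df_back = pd.read_json(io.StringIO(json_text))
print(df_back)
```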
🗄️ 4. SQL
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine("sqlite:///mydb.sqlite")
# Read a table from SQL database
df = pd.read_sql("SELECT * FROM users", engine)
print(df)
# Write to SQL
df.to_sql("users_output", engine, index=False, if_exists="replace")
Example Output:
id name age city
0 1 Anna 23 Rome
1 2 Mark 31 Milan
2 3 Luke 28 Turin
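The example above expects an existing `mydb.sqlite` file with a `users` table. For a quick experiment, a self-contained variant can use Python's built-in `sqlite3` module with an in-memory database. This is only a sketch: the table content is invented and SQLAlchemy is deliberately left out.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database, so no file or server is required
conn = sqlite3.connect(":memory:")

users = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Anna", "Mark", "Luke"],
    "age": [23, 31, 28],
})

# Write the DataFrame to a table, then query it back with a filter
users.to_sql("users", conn, index=False)
adults = pd.read_sql(
    "SELECT name, age FROM users WHERE age > 25 ORDER BY age", conn
)
print(adults)

conn.close()
```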
✔ Seamless integration with machine learning
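In practice, "integration" usually means handing columns to a numerical library as arrays. The sketch below uses plain NumPy least squares as a stand-in for a real machine learning library. The dataset and the column names `Experience` and `Salary` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: years of experience and salary (in thousands)
df = pd.DataFrame({"Experience": [1, 2, 3, 4], "Salary": [30, 35, 40, 45]})

# Feature matrix (2-D) and target vector (1-D) as NumPy arrays
X = df[["Experience"]].to_numpy(dtype=float)
y = df["Salary"].to_numpy(dtype=float)

# Add a column of ones for the intercept and solve ordinary least squares
X_design = np.column_stack([X, np.ones(len(X))])
(slope, intercept), *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(X.shape, y.shape)
print(round(slope, 2), round(intercept, 2))
```

Dedicated libraries typically accept the same kind of feature matrix and target vector, so the DataFrame-to-array step stays the same.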
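🔧 The most common operations
The examples below assume df is the first DataFrame of the article, with the columns Name, Age and City.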
1. Column selection
print(df["Name"])
2. Filtering
print(df[df["City"] == "Rome"])
3. Sorting
print(df.sort_values("Age"))
4. Adding new columns
df["BirthYear"] = 2025 - df["Age"]
print(df)
5. Grouping (Group by)
print(df.groupby("City")["Age"].mean())
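These operations are often chained. As an illustrative sketch, the example below filters, groups, aggregates and sorts in one expression. The extra row ("Sara") and the repeated cities are invented so that the groups contain more than one person.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "Mark", "Luke", "Sara"],
    "Age": [23, 31, 28, 35],
    "City": ["Rome", "Milan", "Rome", "Milan"],
})

# Filter, then group, then compute two aggregates per city
summary = (
    df[df["Age"] > 25]
    .groupby("City")
    .agg(mean_age=("Age", "mean"), people=("Name", "count"))
    .sort_values("mean_age", ascending=False)
)
print(summary)
```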
📈 When to use DataFrames?
DataFrames shine in all contexts where structured data is manipulated:
- Statistical analysis
- Data cleaning
- ETL (Extract, Transform, Load)
- Automated reporting
- Machine learning
- Rapid prototyping of data pipelines
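To give a feel for the data cleaning and ETL cases, here is a deliberately tiny sketch. The raw CSV text is invented, and filling a missing age with the median is just one possible choice, not a recommendation.

```python
import io
import pandas as pd

# Extract: hypothetical raw CSV with a duplicate row and a missing age
raw = (
    "name,age,city\n"
    "Anna,23,Rome\n"
    "Anna,23,Rome\n"
    "Mark,,Milan\n"
    "Luke,28,Turin\n"
    "Sara,31,Naples\n"
)
df = pd.read_csv(io.StringIO(raw))

# Transform: drop duplicates, fill the missing age with the median, fix dtype
clean = df.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median()).astype(int)

# Load: serialize the cleaned table (to a string here instead of a file)
csv_out = clean.to_csv(index=False)
print(csv_out)
```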
Whether you're a developer, an analyst, a data scientist, or simply an enthusiast, DataFrames allow you to transform raw data into useful information, quickly and elegantly.
🚀 Conclusion
The pandas DataFrame is more than just a data structure: it is a true ally in data analysis and manipulation.
Its mix of simplicity, speed, and versatility makes it an indispensable standard in the world of data science and beyond.
Follow me #techelopment
Official site: www.techelopment.it
