The task:
Emulate SQL
SELECT
with a pandas dataframe
E.g.
SELECT *
FROM table
WHERE column_name = some_value;
The code in this lesson was written in a Juypter Notebook. This means it can be run sequentially using IPython. If you wish to use non-interactive Python you can create a .py
file and run it as you normally would, omitting any special directives such as %load_ext nb_black
.
# Optional auto-formatting. Installation required (`pip install nb_black`)
%load_ext nb_black
# Import Pandas
import pandas as pd
UK Best Selling Cars of 2021 up to February 5th
Source: https://www.autoexpress.co.uk/news/94280/best-selling-cars-2021
# Create a dictionary containing our data
cars = {
"Model": [
"Vauxhall Corsa",
"Kia Sportage",
"Nissan Qashqai",
"Ford Fiesta",
"Volvo XC40",
],
"Sales": [3078, 2986, 2835, 2594, 2201],
}
# Create a pandas dataframe from the data in `cars`
df = pd.DataFrame(cars, columns=["Model", "Sales"])
# Display the dataframe
print(df)
Model Sales
0 Vauxhall Corsa 3078
1 Kia Sportage 2986
2 Nissan Qashqai 2835
3 Ford Fiesta 2594
4 Volvo XC40 2201
Practice some SELECT
statement equivalents
# Use df.loc for label-based indexing
df.loc[df["Model"] == "Kia Sportage"]
Model | Sales | |
---|---|---|
1 | Kia Sportage | 2986 |
df.loc[df["Sales"] > 2500]
Model | Sales | |
---|---|---|
0 | Vauxhall Corsa | 3078 |
1 | Kia Sportage | 2986 |
2 | Nissan Qashqai | 2835 |
3 | Ford Fiesta | 2594 |
# Search for a string
df.loc[df["Model"].str.contains("V")]
Model | Sales | |
---|---|---|
0 | Vauxhall Corsa | 3078 |
4 | Volvo XC40 | 2201 |
# Multiple conditions
# Use `&`, `|`, `~` for and , or, not
# Parentheses required due to precedence of logic and comparision operators
df.loc[(df["Model"].str.contains("V")) | (df["Sales"] == 2986)]
Model | Sales | |
---|---|---|
0 | Vauxhall Corsa | 3078 |
1 | Kia Sportage | 2986 |
4 | Volvo XC40 | 2201 |
That is how you can query pandas dataframes in a similar way to using SQL SELECT
statements.
Happy Computing!