Practical Data Science with Python – Selecting Data from a DataFrame

The task:

Emulate SQL SELECT with a pandas dataframe

E.g.

SELECT *
FROM table
WHERE column_name = some_value;

The code in this lesson was written in a Juypter Notebook. This means it can be run sequentially using IPython. If you wish to use non-interactive Python you can create a .py file and run it as you normally would, omitting any special directives such as %load_ext nb_black.

# Optional auto-formatting. Installation required (`pip install nb_black`)
%load_ext nb_black

# Import Pandas
import pandas as pd

UK Best Selling Cars of 2021 up to February 5th

Source: https://www.autoexpress.co.uk/news/94280/best-selling-cars-2021

# Create a dictionary containing our data
cars = {
    "Model": [
        "Vauxhall Corsa",
        "Kia Sportage",
        "Nissan Qashqai",
        "Ford Fiesta",
        "Volvo XC40",
    ],
    "Sales": [3078, 2986, 2835, 2594, 2201],
}

# Create a pandas dataframe from the data in `cars`
df = pd.DataFrame(cars, columns=["Model", "Sales"])

# Display the dataframe
print(df)
            Model  Sales
0  Vauxhall Corsa   3078
1    Kia Sportage   2986
2  Nissan Qashqai   2835
3     Ford Fiesta   2594
4      Volvo XC40   2201

Practice some SELECT statement equivalents

# Use df.loc for label-based indexing
df.loc[df["Model"] == "Kia Sportage"]
Model Sales
1 Kia Sportage 2986
df.loc[df["Sales"] > 2500]
Model Sales
0 Vauxhall Corsa 3078
1 Kia Sportage 2986
2 Nissan Qashqai 2835
3 Ford Fiesta 2594
# Search for a string
df.loc[df["Model"].str.contains("V")]
Model Sales
0 Vauxhall Corsa 3078
4 Volvo XC40 2201
# Multiple conditions
# Use `&`, `|`, `~` for and , or, not
# Parentheses required due to precedence of logic and comparision operators
df.loc[(df["Model"].str.contains("V")) | (df["Sales"] == 2986)]
Model Sales
0 Vauxhall Corsa 3078
1 Kia Sportage 2986
4 Volvo XC40 2201

That is how you can query pandas dataframes in a similar way to using SQL SELECT statements.

Happy Computing!

Sharing is caring!

Leave a Reply

Your email address will not be published. Required fields are marked *