Exploring 5 Techniques for Applying Conditional Statements in Pandas DataFrames

Applying an IF condition in a Pandas DataFrame

Now let’s review the following 5 cases:

IF condition – a set of numbers

Let’s say you create a DataFrame in Python with 10 numbers (from 1 to 10). Then, you’ll apply the following IF conditions:

  • If the number is equal to or less than 4, the assignment is ‘True’
  • Otherwise, if the number is greater than 4, the assignment is ‘False’

Here’s the general structure you can use to create an IF condition:

df.loc[df['column name'] condition, 'new column name'] = 'value if condition is met'

For our example, the Python code looks like this:

import pandas as pd

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(numbers,columns=['set_of_numbers'])

df.loc[df['set_of_numbers'] <= 4, 'equal_or_lower_than_4?'] = 'True'
df.loc[df['set_of_numbers'] > 4, 'equal_or_lower_than_4?'] = 'False'

print(df)

Here’s what you’ll get in Python:

   set_of_numbers   equal_or_lower_than_4?
0               1                     True
1               2                     True
2               3                     True
3               4                     True
4               5                    False
5               6                    False
6               7                    False
7               8                    False
8               9                    False
9              10                    False
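
The two df.loc assignments above can also be collapsed into a single statement. The following is a minimal sketch that swaps in numpy.where, which is not used elsewhere in this article; it assumes NumPy is installed alongside Pandas and reuses the same column names as the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# np.where(condition, value_if_true, value_if_false) checks the condition
# for every row and picks one of the two values accordingly.
df['equal_or_lower_than_4?'] = np.where(df['set_of_numbers'] <= 4, 'True', 'False')

print(df)
```

Note that, as in the original example, 'True' and 'False' are stored here as strings rather than as Boolean values.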

IF condition – a set of numbers and lambdas

You’ll now see how to get the same results as in Case 1 by using a lambda function, where the conditions are:

  • If the number is equal to or less than 4, the assignment is ‘True’
  • Otherwise, if the number is greater than 4, the assignment is ‘False’

Here’s a generic structure you can apply in Python:

df['new column name'] = df['column name'].apply(lambda x: 'value if condition is met' if x condition else 'value if condition is not met')

For our example:

import pandas as pd

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(numbers,columns=['set_of_numbers'])

df['equal_or_lower_than_4?'] = df['set_of_numbers'].apply(lambda x: 'True' if x <= 4 else 'False')

print(df)

Here’s what you’ll get, matching Case 1:

   set_of_numbers   equal_or_lower_than_4?
0               1                     True
1               2                     True
2               3                     True
3               4                     True
4               5                    False
5               6                    False
6               7                    False
7               8                    False
8               9                    False
9              10                    False
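
A lambda fits a single if/else comfortably. When there are more than two outcomes, a named function passed to apply may be easier to read. The sketch below is illustrative only: the helper size_label, the new column 'size', and the thresholds 4 and 7 are hypothetical and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

def size_label(x):
    # Hypothetical helper with assumed thresholds, not a Pandas function.
    if x <= 4:
        return 'low'
    elif x <= 7:
        return 'medium'
    return 'high'

# apply calls size_label once per value in the column.
df['size'] = df['set_of_numbers'].apply(size_label)

print(df)
```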

IF condition – string

Now, let’s create a DataFrame that contains only strings/text with 4 names: Jon, Bill, Maria, and Emma. The conditions are:

  • If the name is equal to ‘Bill’, the value ‘Match’ is assigned
  • Otherwise, if the name is not ‘Bill’, the value ‘Mismatch’ is assigned

import pandas as pd

names = {'first_name': ['Jon','Bill','Maria','Emma']}
df = pd.DataFrame(names,columns=['first_name'])

df.loc[df['first_name'] == 'Bill', 'name_match'] = 'Match'
df.loc[df['first_name'] != 'Bill', 'name_match'] = 'Mismatch'

print(df)

After running the Python code above, you’ll see:

  first_name   name_match
0        Jon     Mismatch
1       Bill        Match
2      Maria     Mismatch
3       Emma     Mismatch
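
Since the second condition is simply the opposite of the first, the comparison can be computed once and reused. Here is a small sketch of that idea; the variable name is_bill is an assumption introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# Boolean Series with one True/False entry per row.
is_bill = df['first_name'] == 'Bill'

df.loc[is_bill, 'name_match'] = 'Match'
# The ~ operator inverts the mask, selecting every row that is not 'Bill'.
df.loc[~is_bill, 'name_match'] = 'Mismatch'

print(df)
```

Reusing one mask keeps the two assignments in sync: if the condition changes later, it only needs to be edited in one place.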

IF conditions – strings and lambdas

With lambda, you’ll get the same result as in Case 3:

import pandas as pd

names = {'first_name': ['Jon','Bill','Maria','Emma']}
df = pd.DataFrame(names,columns=['first_name'])

df['name_match'] = df['first_name'].apply(lambda x: 'Match' if x == 'Bill' else 'Mismatch')

print(df)

Here’s the output from Python:

  first_name   name_match
0        Jon     Mismatch
1       Bill        Match
2      Maria     Mismatch
3       Emma     Mismatch
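
If you’d like to confirm that the df.loc approach and the lambda approach really produce the same column, you can build both and compare them. This is a quick sanity-check sketch; the names via_loc, via_apply, and same are assumptions made up for this illustration.

```python
import pandas as pd

names = ['Jon', 'Bill', 'Maria', 'Emma']

# Approach from Case 3: two df.loc assignments.
via_loc = pd.DataFrame({'first_name': names})
via_loc.loc[via_loc['first_name'] == 'Bill', 'name_match'] = 'Match'
via_loc.loc[via_loc['first_name'] != 'Bill', 'name_match'] = 'Mismatch'

# Approach from Case 4: apply with a lambda.
via_apply = pd.DataFrame({'first_name': names})
via_apply['name_match'] = via_apply['first_name'].apply(
    lambda x: 'Match' if x == 'Bill' else 'Mismatch'
)

# Compare the resulting columns value by value.
same = via_loc['name_match'].tolist() == via_apply['name_match'].tolist()
print(same)  # True
```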

IF condition with OR

In the last case, let’s apply the following condition:

  • If the name is ‘Bill’ or ‘Emma’, the value ‘Match’ is assigned
  • Otherwise, if the name is neither ‘Bill’ nor ‘Emma’, the value ‘Mismatch’ is assigned

Here’s the Python code for this example, where | means OR and & means AND:

import pandas as pd

names = {'first_name': ['Jon','Bill','Maria','Emma']}
df = pd.DataFrame(names,columns=['first_name'])

df.loc[(df['first_name'] == 'Bill') | (df['first_name'] == 'Emma'), 'name_match'] = 'Match'
df.loc[(df['first_name'] != 'Bill') & (df['first_name'] != 'Emma'), 'name_match'] = 'Mismatch'

print(df)

Run the Python code and you’ll get the following result:

  first_name   name_match
0        Jon     Mismatch
1       Bill        Match
2      Maria     Mismatch
3       Emma        Match
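
When the OR condition tests one column against several allowed values, the chained comparisons can get long. As a hedged alternative, the sketch below uses Series.isin, a Pandas method not used elsewhere in this article; the variable name matches is an assumption.

```python
import pandas as pd

df = pd.DataFrame({'first_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# isin returns True for every row whose value appears in the list.
matches = df['first_name'].isin(['Bill', 'Emma'])

df.loc[matches, 'name_match'] = 'Match'
df.loc[~matches, 'name_match'] = 'Mismatch'

print(df)
```

Adding a third name then only requires extending the list, rather than adding another comparison to each condition.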

Applying an IF condition under an existing DataFrame column

By now, you’ve learned how to apply an IF condition by creating a new column. Alternatively, you can store the results under an existing DataFrame column.

For example, let’s say you create a DataFrame with 12 numbers, where the last two numbers are zeros:

'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,0,0]

You can then apply the following IF conditions and store the results under the existing ‘set_of_numbers’ column:

  • If the number is equal to 0, change the value to 999
  • If the number is equal to 5, change the value to 555

import pandas as pd

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,0,0]}
df = pd.DataFrame(numbers,columns=['set_of_numbers'])
print(df)

df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555

print(df)

Here are the before and after results. Under the existing “set_of_numbers” column, “5” becomes “555” and “0” becomes “999”.

Before:

    set_of_numbers
0                1
1                2
2                3
3                4
4                5
5                6
6                7
7                8
8                9
9               10
10               0
11               0

After:

    set_of_numbers
0                1
1                2
2                3
3                4
4              555
5                6
6                7
7                8
8                9
9               10
10             999
11             999
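
For simple value-for-value substitutions like these, the same result can be reached without writing one condition per value. The following sketch uses Series.replace with a dictionary, which is a different technique from the df.loc assignments shown above; it is offered as one possible alternative rather than a recommended replacement.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]})

# Each dictionary key is a value to look for; each dictionary value
# is what it should be changed to.
df['set_of_numbers'] = df['set_of_numbers'].replace({0: 999, 5: 555})

print(df)
```

The df.loc form remains the more general tool, since its condition can be any comparison (for example, greater than or less than), while replace only handles exact matches.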

In another instance, you might have a DataFrame that contains NaN values. You can then apply an IF condition to replace these values with zeros, as shown in the following example:

import pandas as pd
import numpy as np

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,np.nan,np.nan]}
df = pd.DataFrame(numbers,columns=['set_of_numbers'])
print(df)

df.loc[df['set_of_numbers'].isnull(), 'set_of_numbers'] = 0
print(df)

Before, you’ll see the NaN values; after, you’ll see the zero values:

Before:

    set_of_numbers
0              1.0
1              2.0
2              3.0
3              4.0
4              5.0
5              6.0
6              7.0
7              8.0
8              9.0
9             10.0
10             NaN
11             NaN

After:

    set_of_numbers
0              1.0
1              2.0
2              3.0
3              4.0
4              5.0
5              6.0
6              7.0
7              8.0
8              9.0
9             10.0
10             0.0
11             0.0
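
Replacing NaN values is common enough that Pandas provides a dedicated method for it. As a brief sketch of that alternative, the code below uses Series.fillna in place of the isnull condition; the data is the same as in the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, np.nan, np.nan]})

# fillna returns a new Series in which every NaN is replaced by the given value.
df['set_of_numbers'] = df['set_of_numbers'].fillna(0)

print(df)
```

The column remains a float column in both approaches, which is why the results are displayed as 0.0 rather than 0.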

Conclusion

You’ve just seen how to apply IF conditions in Pandas DataFrames. There are multiple ways to apply such a condition in Python: you can get the same results by using a lambda function or by sticking with Pandas’ df.loc. In the end, it comes down to using the method that best suits your needs. Finally, you may want to check the Pandas documentation for additional information about DataFrames.