Apply an IF condition in a Pandas DataFrame
Now let’s review the following 5 cases:
Case 1: IF condition – a set of numbers
Let’s say you create a DataFrame in Python with 10 numbers (from 1 to 10). Then, you’ll apply the following IF conditions:
- If the number is equal to or less than 4, the assignment is ‘True’
- Otherwise, if the number is greater than 4, the assignment is ‘False’
Here’s the general structure you can use to create an IF condition:
df.loc[df['column name'] condition, 'new column name'] = 'value if condition is met'
For our example, the Python code looks like this:
import pandas as pd
numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])
df.loc[df['set_of_numbers'] <= 4, 'equal_or_lower_than_4?'] = 'True'
df.loc[df['set_of_numbers'] > 4, 'equal_or_lower_than_4?'] = 'False'
print(df)
Here’s what you’ll get in Python:
set_of_numbers equal_or_lower_than_4?
0 1 True
1 2 True
2 3 True
3 4 True
4 5 False
5 6 False
6 7 False
7 8 False
8 9 False
9 10 False
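The example above stores the strings 'True' and 'False'. If you'd rather have real Boolean values, a minimal sketch is to assign the comparison itself, because comparing a column against a value already returns a Boolean Series. This assumes a Boolean column suits your use case; the column names are reused from the example above.

```python
import pandas as pd

numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])

# The comparison yields one True/False value per row (dtype bool)
df['equal_or_lower_than_4?'] = df['set_of_numbers'] <= 4

print(df)
```

A Boolean column can be used directly as a filter, e.g. df[df['equal_or_lower_than_4?']], whereas the string version would need an explicit comparison against 'True'.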
Case 2: IF condition – a set of numbers and lambdas
You’ll now see how to get the same results as in Case 1 by using a lambda function. The conditions are:
- If the number is equal to or less than 4, the assignment is ‘True’
- Otherwise, if the number is greater than 4, the assignment is ‘False’
Here’s a generic structure you can apply in Python:
df['new column name'] = df['column name'].apply(lambda x: 'value if condition is met' if x condition else 'value if condition is not met')
For our example:
import pandas as pd
numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])
df['equal_or_lower_than_4?'] = df['set_of_numbers'].apply(lambda x: 'True' if x <= 4 else 'False')
print(df)
Here’s what you’ll get, matching Case 1:
set_of_numbers equal_or_lower_than_4?
0 1 True
1 2 True
2 3 True
3 4 True
4 5 False
5 6 False
6 7 False
7 8 False
8 9 False
9 10 False
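Since .apply() calls the lambda once for every row, it can be slow on large DataFrames. A vectorized alternative, sketched here with NumPy's np.where (not part of the original example), evaluates the condition on the whole column at once and picks one of two values for each row:

```python
import numpy as np
import pandas as pd

numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])

# np.where(condition, value_if_true, value_if_false) works element by element
df['equal_or_lower_than_4?'] = np.where(df['set_of_numbers'] <= 4, 'True', 'False')

print(df)
```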
Case 3: IF condition – strings
Now, let’s create a DataFrame that contains only text: the four names Jon, Bill, Maria and Emma. The conditions are:
- If the name is equal to ‘Bill’, the value ‘Match’ is assigned
- Otherwise, if the name is not ‘Bill’, the value ‘Mismatch’ is assigned
import pandas as pd
names = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(names, columns=['first_name'])
df.loc[df['first_name'] == 'Bill', 'name_match'] = 'Match'
df.loc[df['first_name'] != 'Bill', 'name_match'] = 'Mismatch'
print(df)
After running the Python code above, you’ll see:
first_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Mismatch
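When there are only two outcomes, you don't have to write the opposite condition at all. A minimal sketch: fill the whole column with the "otherwise" value first, then overwrite only the matching rows with .loc:

```python
import pandas as pd

names = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(names, columns=['first_name'])

# Start with the "otherwise" value for every row
df['name_match'] = 'Mismatch'

# Overwrite only the rows where the condition holds
df.loc[df['first_name'] == 'Bill', 'name_match'] = 'Match'

print(df)
```

This also guarantees that no row is left as NaN, which can happen with the two-step .loc approach when the two conditions don't cover every row.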
Case 4: IF condition – strings and lambdas
With a lambda, you’ll get the same result as in Case 3:
import pandas as pd
names = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(names, columns=['first_name'])
df['name_match'] = df['first_name'].apply(lambda x: 'Match' if x == 'Bill' else 'Mismatch')
print(df)
Here’s the output from Python:
first_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Mismatch
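For exact string matches, another option is Series.map() with a dictionary, followed by fillna() for the names that aren't in the dictionary. This is a sketch of an alternative, not the method used above: names missing from the dictionary map to NaN, which fillna() then replaces with 'Mismatch'.

```python
import pandas as pd

names = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(names, columns=['first_name'])

# Names found in the dictionary get its value; all other names become NaN
df['name_match'] = df['first_name'].map({'Bill': 'Match'}).fillna('Mismatch')

print(df)
```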
Case 5: IF condition with OR
In the last case, let’s apply the following condition:
- If the name is ‘Bill’ or ‘Emma’, the value ‘Match’ is assigned
- Otherwise, if the name is neither ‘Bill’ nor ‘Emma’, the assignment is ‘Mismatch’
Here’s how to express this with .loc, combining the comparisons with | (OR) and & (AND):
import pandas as pd
names = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(names, columns=['first_name'])
df.loc[(df['first_name'] == 'Bill') | (df['first_name'] == 'Emma'), 'name_match'] = 'Match'
df.loc[(df['first_name'] != 'Bill') & (df['first_name'] != 'Emma'), 'name_match'] = 'Mismatch'
print(df)
Run the Python code and you’ll get the following result:
first_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Match
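With more than a couple of names, chaining | and & gets long. A shorter sketch uses Series.isin() to build the OR mask once, and the ~ operator to negate it in place of the AND of two != comparisons:

```python
import pandas as pd

names = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(names, columns=['first_name'])

# True where the name is 'Bill' OR 'Emma'
is_match = df['first_name'].isin(['Bill', 'Emma'])

df.loc[is_match, 'name_match'] = 'Match'
# ~ inverts the mask: the name is neither 'Bill' nor 'Emma'
df.loc[~is_match, 'name_match'] = 'Mismatch'

print(df)
```

In the original example, the parentheses around each comparison are required, because | and & bind more tightly than == and != in Python.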
Apply an IF condition to an existing DataFrame column
So far, you’ve applied IF conditions by creating a new column. Alternatively, you can store the results in an existing DataFrame column. For example, let’s say you create a DataFrame with 12 numbers, where the last two values are zeros: ‘set_of_numbers’: [1,2,3,4,5,6,7,8,9,10,0,0]. You can then apply the following IF conditions and store the results in the existing “set_of_numbers” column:
- If the number is equal to 0, change the value to 999
- If the number is equal to 5, change the value to 555
import pandas as pd
numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])
print(df)
df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555
print(df)
Here are the before and after results. Under the existing “set_of_numbers” column, 5 becomes 555 and 0 becomes 999.
Before:
set_of_numbers
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 0
11 0
After:
set_of_numbers
0 1
1 2
2 3
3 4
4 555
5 6
6 7
7 8
8 9
9 10
10 999
11 999
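For a straight value-for-value substitution like this, Series.replace() with a dictionary is a compact alternative to separate .loc assignments. A minimal sketch, assuming the same 12 numbers; each dictionary key is an old value and each dictionary value is its replacement:

```python
import pandas as pd

numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])

# Map old values to new values: 0 -> 999 and 5 -> 555
df['set_of_numbers'] = df['set_of_numbers'].replace({0: 999, 5: 555})

print(df)
```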
In another instance, you might have a DataFrame that contains NaN values. You can then apply an IF condition to replace these values with zeros:
import pandas as pd
import numpy as np
numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, np.nan, np.nan]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])
print(df)
df.loc[df['set_of_numbers'].isnull(), 'set_of_numbers'] = 0
print(df)
Before the change you’ll see the NaN values, and after it you’ll see zeros:
Before:
set_of_numbers
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
10 NaN
11 NaN
After:
set_of_numbers
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
10 0.0
11 0.0
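Replacing missing values is common enough that pandas has a dedicated method for it, fillna(). This sketch produces the same result as the .loc assignment above:

```python
import numpy as np
import pandas as pd

numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, np.nan, np.nan]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])

# Replace every NaN in the column with 0 (the column stays float64)
df['set_of_numbers'] = df['set_of_numbers'].fillna(0)

print(df)
```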
Conclusion
You’ve just seen how to apply IF conditions in Pandas DataFrames. There are multiple ways to apply such a condition in Python: you can get the same results with .loc and a Boolean condition, or with .apply() and a lambda function. In the end, it comes down to the method that best suits your needs.