Introduction to installing and using Python Pandas
Pandas is an open-source Python library that is primarily used for data analysis. The collection of tools in the Pandas package is an important resource for preparing, transforming, and aggregating data in Python.
The Pandas library is built on the NumPy package and is compatible with a large number of existing modules. It introduces two data structures, Series and DataFrame, that give users features similar to those found in relational databases or spreadsheets.
This article will show you how to install Python Pandas and introduce you to basic Pandas commands.
How to install Python Pandas
The popularity of Python has led to the creation of many distros and packages. Package managers are efficient tools for automating the installation process, managing upgrades, configuring, and removing Python packages and dependencies.
Note: Python 3.6.1 or later is a prerequisite for Pandas installation. Use our detailed guide to check your current Python version. If you don’t have the required version of Python, you can use one of the following detailed guides:
- How to install Python 3.8 on Ubuntu 18.04 or Ubuntu 20.04
- How to install Python 3 on Windows 10
- How to install the latest version of Python 3 on CentOS 7
Install Pandas using Anaconda
The Anaconda package already includes the Pandas library. Check the current Pandas version by typing the following command into the terminal:
conda list pandas
The output confirms the Pandas version and build.
If you do not have Pandas on your system, you can use the conda tool to install it:
conda install pandas
Anaconda manages the entire transaction by installing a set of modules and dependencies.
Install Pandas using pip
The Python Package Index (PyPI) is the software repository that hosts the latest releases of Python packages. Use pip, the package installer for PyPI, to install Pandas:
pip3 install pandas
The download and installation process will take some time to complete.
Install Pandas on Linux
Installing a pre-packaged solution may not always be preferred. You can install Pandas on any Linux distribution using the same method as other modules. For example, install the basic Pandas module on Ubuntu 20.04 with the following command:
sudo apt install python3-pandas -y
Keep in mind that packages in Linux repositories often don’t contain the latest available version.
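Whichever installation method you choose, you may want to confirm from Python itself that the package is visible in the current environment. The following is a minimal sketch that uses only the standard library; the helper name installed_version is an assumption made for illustration and is not part of Pandas:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing.

    installed_version is a hypothetical helper written for this example.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pandas")
if version is None:
    print("pandas is not installed in this environment")
else:
    print(f"pandas {version} is installed")
```

If the message reports that pandas is missing, the interpreter you are running is probably not the one the package was installed into.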
Use Python Pandas
The flexibility of Python lets you use Pandas in a variety of environments, including a basic Python code editor, the Python shell in a terminal, and editors or IDEs such as Spyder, PyCharm, and Atom. The examples and commands in this tutorial are shown in Jupyter Notebooks.
Import the Python Pandas library
To analyze and process data, you’ll need to import the Pandas library in a Python environment. Start a Python session and import Pandas with the following command:
import pandas as pd
import numpy as np
It is considered good practice to import Pandas as pd and the NumPy scientific library as np. These aliases let you type pd or np in commands. Otherwise, you will need to enter the full module name each time.
The Pandas library must be imported every time you start a new Python session.
Series and DataFrames
Python Pandas uses Series and DataFrames to structure data and prepare it for a variety of analytical operations. These two data structures are the backbone of the versatility Pandas offers. Users who are already familiar with relational databases will find the basic Pandas concepts and commands easy to follow.
Pandas Series
A Series is a one-dimensional object in the Pandas library. It gives structure to simple one-dimensional datasets by pairing each data element with a label. A Series consists of two arrays – a main array that holds the data and an index array that holds the paired labels.
Use the following example to create a basic Series. In this example, the Series holds car sales figures indexed by manufacturer:
s = pd.Series([10.8, 10.7, 10.3, 7.4, 0.25],
              index=['VW', 'Toyota', 'Renault', 'KIA', 'Tesla'])
After running the command, type s to see the Series you just created. The results list the manufacturers in the order in which they were entered.
You can perform a complex and diverse set of functions on a series, including math functions, data manipulations, and arithmetic operations between series. A full list of Pandas parameters, properties, and methods is available on the official Pandas page.
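As an illustration of the kinds of operations mentioned here, the sketch below reuses the car sales figures from the example Series. The variable names (toyota_sales, above_ten, and so on) are assumptions chosen for this example:

```python
import pandas as pd

# Same car sales figures (in millions) as the example Series.
s = pd.Series([10.8, 10.7, 10.3, 7.4, 0.25],
              index=['VW', 'Toyota', 'Renault', 'KIA', 'Tesla'])

toyota_sales = s['Toyota']   # look up one value by its label
above_ten = s[s > 10]        # keep only the values greater than 10
in_thousands = s * 1000      # arithmetic is applied to every element
total = s.sum()              # reduce the whole Series to one number

print(above_ten)
print(total)
```

Each operation returns a new object and leaves the original Series s unchanged.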
Pandas DataFrames
A DataFrame adds a second dimension to the Series data structure. In addition to the index array, a DataFrame has an ordered set of columns, which gives it a table-like structure. Each column can store a different data type. Try manually creating a dict object named data with the same car sales data:
data = { 'Company' : ['VW','Toyota','Renault','KIA','Tesla'],
'Cars Sold (millions)' : [10.8,10.7,10.3,7.4,0.25],
'Best Selling Model' : ['Golf','RAV4','Clio','Forte','Model 3']}
Pass the data object to the pd.DataFrame() constructor:
frame = pd.DataFrame(data)
Type the name of the DataFrame, frame, to display it:
frame
The resulting DataFrame formats the values into rows and columns.
The DataFrame structure allows you to select and filter values based on columns and rows, assign new values, and pivot data. Like the Series, the official Pandas page provides a complete list of DataFrame parameters, properties, and methods.
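The selecting, filtering, and assigning described here can be sketched as follows, using the same car sales dictionary. The added column name 'Cars Sold (thousands)' and the variable names are assumptions made for illustration:

```python
import pandas as pd

data = {'Company': ['VW', 'Toyota', 'Renault', 'KIA', 'Tesla'],
        'Cars Sold (millions)': [10.8, 10.7, 10.3, 7.4, 0.25],
        'Best Selling Model': ['Golf', 'RAV4', 'Clio', 'Forte', 'Model 3']}
frame = pd.DataFrame(data)

# Select a single column by name; the result is a Series.
models = frame['Best Selling Model']

# Filter rows: keep companies that sold more than 10 million cars.
big_sellers = frame[frame['Cars Sold (millions)'] > 10]

# Assign a new column computed from an existing one.
frame['Cars Sold (thousands)'] = frame['Cars Sold (millions)'] * 1000

# Use the Company column as the row index so rows can be looked up by name.
by_company = frame.set_index('Company')

print(big_sellers)
print(by_company.loc['KIA', 'Best Selling Model'])
```

Setting Company as the index is optional; it simply makes label-based lookups with loc more convenient.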
Use Pandas to read and write
With Series and DataFrames, Pandas introduces a set of features that enable users to import text files, complex binary formats, and information stored in databases. The syntax for reading and writing data in Pandas is simple:
- pd.read_filetype(filename or path) – imports data from other formats into Pandas.
- df.to_filetype(filename or path) – exports data from Pandas to other formats.
The most common formats include CSV, XLSX, JSON, HTML, and SQL.
read | write |
---|---|
pd.read_csv('filename.csv') | df.to_csv('filename or path') |
pd.read_excel('filename.xlsx') | df.to_excel('filename or path') |
pd.read_json('filename.json') | df.to_json('filename or path') |
pd.read_html('filename.htm') | df.to_html('filename or path') |
pd.read_sql('table name', connection) | df.to_sql('table name', connection) |
In this example, the nz_population.csv file contains population data for New Zealand for the last 10 years. Use the following command to import the CSV file into Pandas:
pop_df = pd.read_csv('nz_population.csv')
The user is free to define the name of the DataFrame (pop_df). Type the name of the newly created DataFrame to display the data array:
pop_df
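If you do not have the nz_population.csv file at hand, you can still try the read and write functions. The sketch below reads CSV text from an in-memory buffer and writes it back out. The column names year, population, migration, and percent, and every value in the sample, are placeholders assumed for illustration; they are not real New Zealand statistics:

```python
import io

import pandas as pd

# Placeholder rows for illustration only, NOT real population data.
csv_text = """year,population,migration,percent
2001,1000,10,0.25
2002,1010,20,0.5
2003,1030,30,0.75
"""

# read_csv accepts a path or any file-like object, such as StringIO.
pop_df = pd.read_csv(io.StringIO(csv_text))

# Write the DataFrame back to CSV text; index=False omits the row labels.
buffer = io.StringIO()
pop_df.to_csv(buffer, index=False)

# Read the exported text again to confirm that nothing was lost.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip)
```

Replacing the StringIO objects with file paths gives the same behavior on disk.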
Common Pandas commands
Once you’ve imported your files into the Pandas library, you can explore and manipulate the dataset using a simple set of commands.
Basic DataFrame commands
Enter the following command to retrieve an overview of the pop_df DataFrame from the previous example:
pop_df.info()
The output provides the number of entries, the name and data type of each column, and the memory usage of the DataFrame.
Use the pop_df.head() command to display the first 5 rows of the DataFrame.
Use the pop_df.tail() command to display the last 5 rows of the pop_df DataFrame.
Use the column name and the iloc property to select specific rows and columns. Select a single column by placing its name in square brackets:
pop_df['population']
The iloc property allows you to retrieve a subset of rows and columns by position. Specify rows before the comma and columns after the comma. Positions start at 0 and the end of each slice is excluded, so the following command retrieves the rows at positions 6 through 14 and the columns at positions 2 and 3:
pop_df.iloc[6:15, 2:4]
The colon (:) marks a range of positions, and Pandas returns the entire specified subset.
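Because positional slices are zero-based and exclude their end position, it can help to check the behavior on a small table first. This is a minimal sketch on a hypothetical 5x4 DataFrame whose cell values encode their own row and column position:

```python
import pandas as pd

# Hypothetical table: the value in each cell is 10 * row + column.
df = pd.DataFrame(
    [[10 * r + c for c in range(4)] for r in range(5)],
    columns=['a', 'b', 'c', 'd'],
)

subset = df.iloc[1:3, 2:4]   # rows 1 and 2, columns 2 and 3
all_rows = df.iloc[:, 0]     # a bare colon selects every row
single = df.iloc[4, 3]       # one cell: row 4, column 3

print(subset)
print(single)
```

The slice 1:3 returns two rows, not three, which is the same rule that applies to Python lists.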
Conditional expressions
You can select rows based on conditional expressions. Conditions are defined in square brackets [].
The following command keeps only the rows whose percent column value is greater than 0.50:
pop_df[pop_df['percent'] > 0.50]
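A condition inside the square brackets is evaluated to a Series of True and False values, often called a boolean mask, and only the rows marked True are kept. The sketch below shows the mask explicitly and combines two conditions; the year and percent values are placeholders assumed for illustration:

```python
import pandas as pd

# Placeholder values for illustration only.
pop_df = pd.DataFrame({
    'year': [2001, 2002, 2003, 2004],
    'percent': [0.25, 0.75, 0.5, 1.0],
})

# The comparison produces one True or False value per row.
mask = pop_df['percent'] > 0.50
high_growth = pop_df[mask]

# Combine conditions with & (and) or | (or).
# Each condition must be wrapped in its own parentheses.
recent_high = pop_df[(pop_df['percent'] > 0.50) & (pop_df['year'] >= 2003)]

print(mask.tolist())
print(recent_high)
```

Use the & and | operators here rather than the Python keywords and and or, which do not work element by element on a Series.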
Data aggregation
Aggregate functions perform a calculation over an entire array and produce a single result. The square brackets [] also allow you to select a single column, which Pandas returns as a Series. The following command creates a new total_migration Series from the migration column in pop_df:
total_migration = pop_df['migration']
Verify the data by examining the first 5 rows:
total_migration.head()
Use the sum() function to calculate the net migration into New Zealand:
total_migration = total_migration.sum()
total_migration
The output is a single value that represents the sum of the values in the migration column.
Some of the more common aggregate functions include:
- df.mean() – the average of the values.
- df.median() – the median of the values.
- df.describe() – a summary of descriptive statistics.
- df.min() / df.max() – the minimum and maximum values in the dataset.
- df.idxmin() / df.idxmax() – the index labels of the minimum and maximum values.
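The aggregate functions listed above can be tried on a small Series. In this sketch the migration figures and the year labels are placeholders assumed for illustration, not real data:

```python
import pandas as pd

# Placeholder migration figures indexed by year, for illustration only.
migration = pd.Series([10, 40, 20, 30],
                      index=[2001, 2002, 2003, 2004],
                      name='migration')

average = migration.mean()       # 25.0
middle = migration.median()      # 25.0
lowest = migration.min()         # 10
highest = migration.max()        # 40
worst_year = migration.idxmin()  # 2001, the label of the smallest value
best_year = migration.idxmax()   # 2002, the label of the largest value

# describe() gathers count, mean, std, min, quartiles, and max in one Series.
summary = migration.describe()
print(summary)
```

Note that idxmin() and idxmax() return index labels (here, years) rather than the values themselves.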
These basic features represent only a small fraction of the functions and actions that Pandas has to offer.
Conclusion
You have successfully installed Python Pandas and learned how to manage simple data structures. The examples and command sequences outlined in this tutorial show you how to prepare, process, and aggregate data in Python Pandas.