Boston Housing Price Prediction using Linear Regression

Boston Housing Data: This dataset is taken from the StatLib library, which is maintained at Carnegie Mellon University. It concerns housing prices in the Boston area and has 506 instances and 13 features.

The dataset description is taken from

Let’s build a linear regression model to predict house prices.

Import the libraries and the dataset.

# Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
   
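# Importing Data
# Note: load_boston was removed in scikit-learn 1.2, so this import
# needs an older scikit-learn release (below 1.2).
from sklearn.datasets import load_boston
boston = load_boston()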

Check the shape of the Boston data and get the feature names.

boston.data.shape
boston.feature_names

Convert the data from an ndarray to a DataFrame and use the feature names as column labels.

data = pd.DataFrame(boston.data)
data.columns = boston.feature_names
  
data.head( 10 )

Add a Price column to the dataset

# Adding 'Price' (target) column to the data 
boston.target.shape
data[ 'Price' ] = boston.target
data.head()

Description of the Boston dataset

data.describe()

Information about the Boston dataset (column types and non-null counts)

data.info()

Get input and output data and further split the data into training and test datasets.

# Input Data
x = boston.data
   
# Output Data
y = boston.target
   
   
# Splitting data into training and testing datasets.
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection.)
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.2 , random_state = 0 )
   
print ( "xtrain shape : " , xtrain.shape)
print ( "xtest shape  : " , xtest.shape)
print ( "ytrain shape : " , ytrain.shape)
print ( "ytest shape  : " , ytest.shape)
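
To see what the split does to the shapes, here is a minimal sketch on synthetic data with the same dimensions as the dataset (506 rows, 13 features). The random arrays are stand-ins, not the Boston data, and the names x_demo and y_demo are purely illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Boston data (an assumption
# made so the sketch runs without the dataset itself).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(506, 13))
y_demo = rng.normal(size=506)

# test_size=0.2 holds out 20% of the rows; random_state fixes the shuffle.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

print("train:", x_tr.shape, y_tr.shape)
print("test :", x_te.shape, y_te.shape)
```

With 506 rows, a 20% test split rounds up to 102 test rows and leaves 404 rows for training.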

Fit a linear regression model to the training set and predict prices for the test set.

# Fitting a multiple linear regression model to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(xtrain, ytrain)
   
# predicting the test set results
y_pred = regressor.predict(xtest)
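
Ordinary least squares, the method behind linear regression, chooses the intercept and coefficients that minimise the sum of squared residuals. The following is a minimal sketch of that idea using NumPy's least-squares solver. The synthetic data, true_coef, and true_intercept are made up for illustration and have nothing to do with the Boston features.

```python
import numpy as np

rng = np.random.default_rng(0)
true_coef = np.array([2.0, -1.0, 0.5])   # hypothetical coefficients
true_intercept = 4.0                     # hypothetical intercept

# Synthetic features and a target that is linear in them, plus small noise.
X = rng.normal(size=(200, 3))
noise = 0.01 * rng.normal(size=200)
y = X @ true_coef + true_intercept + noise

# Append a column of ones so the last fitted weight acts as the intercept.
X_aug = np.column_stack([X, np.ones(len(X))])
weights, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
coef, intercept = weights[:-1], weights[-1]

print("coefficients:", coef)
print("intercept   :", intercept)
```

Because the noise is small, the fitted values should land very close to the coefficients used to generate the data.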

A scatter plot shows the predictions: true values (ytest) vs. predicted values (y_pred).

# Plotting a scatter graph to show the prediction
# results - 'ytest' value vs 'y_pred' value
plt.scatter(ytest, y_pred, c = 'green' )
plt.xlabel( "Price: in $1000's" )
plt.ylabel( "Predicted value" )
plt.title( "True value vs predicted value : Linear Regression" )
plt.show()

The result of the linear regression, i.e., the mean squared error on the test set.

# Results of Linear Regression.
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(ytest, y_pred)
print ( "Mean Square Error : " , mse)
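
To make the metric concrete, here is a small worked sketch in plain Python on made-up numbers (y_true and y_hat are illustrative values, not model output). It computes the mean squared error by hand, along with two related quantities: the root mean squared error, which is in the same units as the target, and R², which is a unitless measure of fit.

```python
import math

y_true = [3.0, 5.0, 7.0, 9.0]   # illustrative true values
y_hat = [2.0, 5.0, 8.0, 9.0]    # illustrative predictions
n = len(y_true)

# Mean squared error: average of the squared residuals.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_hat)) / n

# Root mean squared error: same units as the target.
rmse = math.sqrt(mse)

# R^2: 1 - (residual sum of squares / total sum of squares).
mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print("MSE :", mse)    # 0.5
print("RMSE:", rmse)
print("R^2 :", r2)     # 0.9
```

In scikit-learn, the R² of a fitted model on held-out data is available through regressor.score(xtest, ytest) or sklearn.metrics.r2_score.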

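The mean squared error is measured in squared units of the target (here, thousands of dollars squared). It is an error, not a percentage accuracy, and lower is better; a relative measure such as R² is better suited to a percentage-style statement. Based on the results, the fitted model is not very good at predicting house prices. Many other machine learning algorithms and techniques could be used to improve the predictions.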
