GridSearchCV is a scikit-learn class that performs an exhaustive search over a specified parameter grid for an estimator (model). It is commonly used to find the best combination of hyperparameters for a machine learning model.

GridSearchCV

Here is a description of how GridSearchCV works:

  1. The search is initialized with an estimator, a parameter grid, and a scoring function.
  2. The training data is split into cross-validation folds (controlled by the cv parameter).
  3. For each combination of hyperparameters in the grid, the estimator is trained on the training folds.
  4. Each combination is evaluated on the held-out validation fold using the scoring function, and the scores are averaged across folds.
  5. The combination of hyperparameters with the best mean cross-validated score is returned (and, by default, the estimator is refit on the full training data using those values).
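The steps above can be sketched by hand with scikit-learn's ParameterGrid and cross_val_score. This is a toy illustration of the logic, not GridSearchCV's actual implementation; the dataset here is synthetic.

```python
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy data for illustration; in practice this would be your training set
X, y = make_classification(n_samples=200, random_state=0)

param_grid = {'kernel': ['linear', 'rbf'], 'C': [0.1, 1, 10]}

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):          # step 3: every combination
    # steps 3-4: fit and score each candidate via cross-validation
    scores = cross_val_score(SVC(**params), X, y, scoring='accuracy', cv=5)
    if scores.mean() > best_score:                # step 5: keep the best
        best_score, best_params = scores.mean(), params

print(best_params, round(best_score, 3))
```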

Here is an example of how GridSearchCV might be used to tune the hyperparameters of a support vector machine (SVM) model:

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Set up the parameter grid
param_grid = {'kernel': ['linear', 'rbf'], 'C': [0.1, 1, 10]}

# Initialize the estimator and the grid search
estimator = SVC()
grid_search = GridSearchCV(estimator, param_grid, scoring='accuracy')

# Fit the model to the data
grid_search.fit(X_train, y_train)

# Get the best combination of hyperparameters
best_params = grid_search.best_params_

In this example, the grid search will train and evaluate an SVM model for each of the six combinations (a linear or RBF kernel, crossed with C=0.1, 1, or 10), and it will return the combination of kernel and C that achieved the highest mean cross-validated accuracy.

More generally, for any grid you specify, GridSearchCV trains the estimator for every combination of values, evaluates each fitted model with the scoring function, and returns the combination of hyperparameters that performed best.

GridSearchCV can be used with any estimator that has a fit and predict method, and it is particularly useful for fine-tuning the hyperparameters of complex models. It is important to note that GridSearchCV can be computationally expensive, especially for large datasets or when searching over a large parameter grid.
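One way to gauge that cost up front: the number of model fits is the product of the grid's value counts, multiplied by the number of cross-validation folds. A quick sketch (the gamma values here are an illustrative addition, not from the example above):

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'kernel': ['linear', 'rbf'],
              'C': [0.1, 1, 10],
              'gamma': ['scale', 'auto']}

n_candidates = len(ParameterGrid(param_grid))  # 2 * 3 * 2 = 12 combinations
cv = 5
total_fits = n_candidates * cv                 # plus one final refit by default
print(n_candidates, total_fits)
```

Doubling the number of values for each of three hyperparameters multiplies the work by eight, which is why large grids get expensive quickly.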

Dataset: https://www.kaggle.com/datasets/nabin96nabin/gridsearchcv?select=Social_Network_Ads.csv
# Importing libraries
import pandas as pd
import numpy as np

# Reading the dataset
df = pd.read_csv('Social_Network_Ads.csv')

# Displaying a random sample of 5 rows
df.sample(5)

# Replacing categorical data with numerical values
df['Gender'] = df['Gender'].replace({'Male': 0, 'Female': 1})

# Splitting the data into features (x) and target variable (y)
x = df.iloc[:, 1:4].values
y = df.iloc[:, -1].values

# Standardizing the features using StandardScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x = scaler.fit_transform(x)

# Displaying the standardized features
x

# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=13)

# Creating a Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)

# Making predictions on the test set
y_pred = clf.predict(x_test)

# Calculating the accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

# Performing GridSearchCV for hyperparameter tuning
params_dist = {
    "criterion": ["gini", "entropy"],
    "max_depth": [1, 2, 3, 4, 5, 6, 7, None]
}

from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(clf, param_grid=params_dist, cv=10, n_jobs=-1)
grid.fit(x_train, y_train)

# Displaying the best estimator, best score, and best parameters
grid.best_estimator_
grid.best_score_
grid.best_params_
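Because GridSearchCV refits the best estimator on the whole training set by default (refit=True), the fitted grid object can be used directly to score the held-out test set. The sketch below is self-contained, so synthetic data stands in for the Social_Network_Ads features; with the walkthrough's own variables, only the last two lines are needed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the three standardized features
X, y = make_classification(n_samples=400, n_features=3, n_informative=3,
                           n_redundant=0, random_state=13)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13)

params_dist = {"criterion": ["gini", "entropy"],
               "max_depth": [1, 2, 3, 4, 5, 6, 7, None]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=13),
                    param_grid=params_dist, cv=10, n_jobs=-1)
grid.fit(x_train, y_train)

# With refit=True (the default), grid itself acts as the best estimator
test_accuracy = accuracy_score(y_test, grid.predict(x_test))
print(grid.best_params_, round(test_accuracy, 3))
```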

Explanation:

  • The code reads a dataset, displays a random sample, and replaces categorical data with numerical values.
  • It then standardizes the features using StandardScaler and splits the data into training and testing sets.
  • A Decision Tree Classifier is trained on the training set, and predictions are made on the test set.
  • The accuracy score is calculated to evaluate the model’s performance.
  • GridSearchCV is used for hyperparameter tuning, exploring different combinations of criterion and max_depth.
  • The best estimator, best score, and best parameters are displayed after the grid search.
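Beyond the three best_* attributes, the full results of the search live in cv_results_, a dict of arrays that is convenient to inspect as a DataFrame. A small self-contained sketch on synthetic data (the single max_depth grid here is illustrative):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid={"max_depth": [2, 4, None]}, cv=5)
grid.fit(X, y)

# One row per hyperparameter combination, with per-fold and mean scores
results = pd.DataFrame(grid.cv_results_)
print(results[["param_max_depth", "mean_test_score", "rank_test_score"]])
```

This view makes it easy to see not just the winner but how close the runners-up were, which matters when a simpler model scores almost as well as the best one.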
