What is Gradient Descent?

  • Definition: Gradient Descent is an optimization algorithm used to minimize a function iteratively.
  • Objective: Find the minimum point (minimum cost) of a given function.
  • Application: Frequently used in machine learning for adjusting parameters in models to minimize prediction errors.
# Gradient Descent Algorithm (simple linear regression: y ≈ m*x + b)
def gradient_descent(X, y, learning_rate, epochs):
    # Initialize parameters
    b = 0.0
    m = 0.0
    n = len(X)

    # Gradient Descent Iterations
    for epoch in range(epochs):
        gradient_b = 0.0
        gradient_m = 0.0

        # Accumulate the MSE gradient over all data points
        for i in range(n):
            error = y[i] - (m * X[i] + b)
            gradient_b += -2 * error
            gradient_m += -2 * error * X[i]

        # Update parameters (average the gradients over n so the step
        # matches the mean squared error, not the summed error)
        b -= learning_rate * gradient_b / n
        m -= learning_rate * gradient_m / n

    return b, m
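
As a quick sanity check (a hypothetical toy dataset, not from the original post), running the function on points that lie exactly on y = 2x + 1 should recover the slope and intercept:

# Toy data generated from y = 2x + 1 (hypothetical example)
X = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b, m = gradient_descent(X, y, learning_rate=0.05, epochs=1000)
print(b, m)  # should approach b ≈ 1.0 and m ≈ 2.0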

Intuition:

  • Analogy: Imagine standing on a hilly terrain (representing the cost function) and wanting to find the lowest point (minimum cost).
  • Strategy: You take steps downhill, following the slope, to reach the lowest valley (minimum cost) gradually.

Mathematical Formulation:

  • Function: Consider a function (cost function) that depends on some parameters (weights and biases in the context of machine learning).
  • Goal: Adjust the parameters to minimize the function (reduce the prediction error).
  • Gradient: Calculate the gradient (partial derivatives) of the function with respect to each parameter. The gradient points in the direction of the steepest increase.
  • Update Parameters: Move in the opposite direction of the gradient to descend towards the minimum.
  • Update Rule: Parameters_new = Parameters_old - (learning_rate * Gradient), written out below for the linear model used in this post.
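
For the mean squared error cost J(m, b) = (1/n) * Σ (y_i - (m * x_i + b))^2, the two partial derivatives are:

∂J/∂b = (-2/n) * Σ (y_i - (m * x_i + b))
∂J/∂m = (-2/n) * Σ (y_i - (m * x_i + b)) * x_i

These are exactly the gradient_b and gradient_m terms accumulated in the code above, after averaging over the n data points.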

Code from Scratch:

  • Initialization: Start with initial parameters (weights and biases).
  • Iteration: In a loop, calculate the gradient of the cost function with respect to each parameter.
  • Update Parameters: Adjust parameters using the gradient and a small step size called the learning rate.
  • Repeat: Continue this process until the algorithm converges (parameters stabilize).
# Example Code (for a simple linear regression)
b, m = 0.0, 0.0   # initial parameters
n = len(dataset)  # dataset is an iterable of (x, y) pairs

for epoch in range(epochs):
    gradient_b = 0.0
    gradient_m = 0.0
    for x, actual_y in dataset:
        error = actual_y - (m * x + b)
        gradient_b += -2 * error
        gradient_m += -2 * error * x
    b -= learning_rate * gradient_b / n
    m -= learning_rate * gradient_m / n
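
The loop above runs for a fixed number of epochs. As a sketch of the "repeat until the algorithm converges" idea from the list, one can instead stop once the updates become negligible (the tolerance below is an arbitrary choice, not from the original post):

# Early stopping once the parameter updates stabilize
tolerance = 1e-8  # assumed threshold; tune per problem
for epoch in range(epochs):
    gradient_b = sum(-2 * (ay - (m * x + b)) for x, ay in dataset)
    gradient_m = sum(-2 * (ay - (m * x + b)) * x for x, ay in dataset)
    step_b = learning_rate * gradient_b / n
    step_m = learning_rate * gradient_m / n
    b -= step_b
    m -= step_m
    if max(abs(step_b), abs(step_m)) < tolerance:
        break  # updates are negligible: treat as converged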

Visualization:

  • Plotting the Cost Function: Visualize the cost function in 3D, with the parameters on the axes.
  • Observation: Observe how the cost changes based on different parameter values. The goal is to find the combination that minimizes the cost.
import numpy as np
import matplotlib.pyplot as plt

# Visualizing the Cost Function as a 3D scatter over (b, m) pairs
def plot_cost_function(b_values, m_values, z_values):
    fig = plt.figure(figsize=(10, 8))
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(b_values, m_values, z_values, c='r', marker='o')
    ax.set_xlabel('Intercept (b)')
    ax.set_ylabel('Slope (m)')
    ax.set_zlabel('Cost')
    plt.show()

# Example Usage: evaluate the MSE cost on a grid of (b, m) values
# (assumes X and y are 1-D NumPy arrays of the training data)
b_grid, m_grid = np.meshgrid(np.linspace(-10, 10, 25), np.linspace(-10, 10, 25))
b_values, m_values = b_grid.ravel(), m_grid.ravel()
z_values = [np.mean((y - (m * X + b)) ** 2) for b, m in zip(b_values, m_values)]
plot_cost_function(b_values, m_values, z_values)

Effect of Learning Rate:

  • Learning Rate: A hyperparameter that controls the step size during optimization.
  • Impact: A learning rate that is too small makes convergence very slow, while one that is too large can overshoot the minimum and cause divergence.
  • Optimal Learning Rate: Experimentation is needed to find a balance for efficient convergence.
# Experimenting with Learning Rate
learning_rates = [0.001, 0.01, 0.1]

for lr in learning_rates:
    b_optimal, m_optimal = gradient_descent(X, y, learning_rate=lr, epochs=100)
    print(f"lr={lr}: b={b_optimal:.4f}, m={m_optimal:.4f}")

Adding m into the Equation:

  • Multiple Parameters: Extend the concept to functions with more parameters, for example a linear regression with the two parameters m and b, or a model with a whole vector of weights.
  • Update: Update each parameter based on its respective gradient, as in the vectorized sketch below.
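
A minimal sketch of the multi-parameter case, assuming NumPy and a feature matrix X of shape (n, d); this vectorized version is an illustration, not code from the original post:

import numpy as np

# Gradient descent for y ≈ X @ w + b with any number of features
def gradient_descent_multi(X, y, learning_rate=0.01, epochs=100):
    n, d = X.shape
    w = np.zeros(d)  # one weight per feature
    b = 0.0
    for epoch in range(epochs):
        residual = y - (X @ w + b)            # shape (n,)
        grad_w = (-2 / n) * (X.T @ residual)  # one gradient entry per weight
        grad_b = (-2 / n) * residual.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b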

Effect of Loss Function:

  • Loss Function: A measure of how well the model fits the data, indicating the prediction error.
  • Choice of Loss Function: Different problems may require different loss functions.
  • Impact: The choice of the loss function affects how the algorithm adjusts parameters to minimize the error.
# Different Loss Functions for Regression
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))
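
To illustrate how the loss choice changes the update (a sketch assuming 1-D NumPy arrays; this helper is not from the original post), the MAE gradient replaces the residual with its sign, since d|e|/de = sign(e):

# One gradient step under the MAE loss for the model y_pred = m*x + b
def mae_gradient_step(X, y, m, b, learning_rate):
    sign = np.sign(y - (m * X + b))  # subgradient of |error|
    grad_b = -np.mean(sign)
    grad_m = -np.mean(sign * X)
    return m - learning_rate * grad_m, b - learning_rate * grad_b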

Effect of Data:

  • Dataset: Different datasets may lead to different optimal parameters.
  • Generalization: The goal is to find parameters that generalize well to new, unseen data, not just fit the training data.
# Different Datasets
from sklearn.datasets import make_regression

X1, y1 = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)
X2, y2 = make_regression(n_samples=100, n_features=1, noise=50, random_state=42)
X1, X2 = X1.ravel(), X2.ravel()  # flatten to 1-D for the single-feature code above

# Compare the impact of different datasets on the optimal parameters
b1, m1 = gradient_descent(X1, y1, learning_rate=0.01, epochs=100)
b2, m2 = gradient_descent(X2, y2, learning_rate=0.01, epochs=100)
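
One way to probe the generalization point above (a sketch assuming scikit-learn's train_test_split; the split sizes are arbitrary choices) is to fit on a training split and measure the error on held-out data:

from sklearn.model_selection import train_test_split

# Fit on 80% of the data, evaluate on the held-out 20%
X_train, X_test, y_train, y_test = train_test_split(X1, y1, test_size=0.2, random_state=42)
b, m = gradient_descent(X_train, y_train, learning_rate=0.01, epochs=100)
print("held-out MSE:", mean_squared_error(y_test, m * X_test + b))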

In summary, Gradient Descent is a fundamental optimization algorithm used in machine learning to iteratively minimize a cost function by adjusting model parameters. Understanding its components and experimenting with hyperparameters is crucial for successful model training.

The provided code snippets cover the basics of Gradient Descent, including the algorithm itself, visualization of the cost function, experimenting with learning rates, handling multiple parameters, using different loss functions, and considering the impact of different datasets.
