Building and Deploying a Machine Learning Pipeline with Python
Introduction to Machine Learning Pipelines
This guide will walk you through creating a complete machine learning pipeline and deploying it as a web API using the FastAPI framework in Python.
Learning Objectives
By the end of this tutorial, you will be able to:
- Construct an end-to-end machine learning pipeline using PyCaret.
- Understand what model deployment entails.
- Develop an API with FastAPI to generate predictions for unseen data.
Understanding PyCaret
PyCaret is an open-source, low-code machine learning library written in Python that automates machine learning workflows. Its popularity stems from its user-friendly interface and the speed with which it lets you build and deploy end-to-end ML prototypes.
To get started with PyCaret, you can install it using the following command:
pip install pycaret
Exploring FastAPI
FastAPI is a modern and efficient web framework for building APIs with Python 3.6 or newer, relying on standard Python type hints. Its key attributes include:
- Speed: High performance comparable to NodeJS and Go (thanks to Starlette and Pydantic).
- Efficiency: Accelerates feature development by approximately 200-300%.
- Simplicity: Designed to be user-friendly, minimizing the time spent on documentation.
To install FastAPI, run:
pip install fastapi
Workflow Overview: PyCaret and FastAPI
Business Scenario
In this tutorial, we will reference a popular case study from the Darden School of Business. The story revolves around Greg, who aims to propose to Sarah with a diamond ring. To ensure Sarah's satisfaction, Greg gathers data on 6,000 diamonds, including attributes like price, cut, and color.
Dataset
The objective is to predict diamond prices based on various features such as carat weight, cut, and color. The dataset can be accessed from the PyCaret library.
from pycaret.datasets import get_data
data = get_data('diamond')
Exploratory Data Analysis
To visualize the relationship between independent features (like weight, cut, color, clarity) and the target variable (Price), we can create scatter plots and histograms.
import plotly.express as px
fig = px.scatter(data, x='Carat Weight', y='Price', facet_col='Cut', opacity=0.25, template='plotly_dark', trendline='ols', title='SARAH GETS A DIAMOND - A CASE STUDY')
fig.show()
Analyzing Price Distribution
To examine the target variable's distribution, we can plot histograms.
fig = px.histogram(data, x='Price', template='plotly_dark', title='Histogram of Price')
fig.show()
Given that the price distribution is right-skewed, applying a log transformation may help in achieving a more normal distribution.
import numpy as np
data_copy = data.copy()
data_copy['Log_Price'] = np.log(data['Price'])
fig = px.histogram(data_copy, x='Log_Price', title='Histogram of Log Price', template='plotly_dark')
fig.show()
Data Preparation
The setup function in PyCaret initializes the experiment and establishes the transformation pipeline based on parameters provided. This function must be executed prior to any other function calls.
from pycaret.regression import *
s = setup(data, target='Price', transform_target=True)
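Conceptually, transform_target fits the model against a transformed version of Price and maps predictions back to the original scale. The sketch below illustrates the idea with a plain log/exp pair on hypothetical prices; PyCaret's actual transformer (Box-Cox by default) differs in detail:

```python
import numpy as np

# Hypothetical prices on the original (right-skewed) scale
prices = np.array([500.0, 1200.0, 11600.0, 25000.0])

# The model is trained against the transformed target
log_prices = np.log(prices)

# Pretend the model predicted the transformed target perfectly,
# then invert the transform to get predictions back in dollars
preds = np.exp(log_prices)
print(preds)  # recovers the original prices
```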
Model Training and Evaluation
Once the data is prepared, we can start training with the compare_models function, which trains all available estimators, evaluates them with cross-validation, and ranks them by performance.
best = compare_models()
The CatBoost Regressor emerged as the best model based on Mean Absolute Error (MAE), achieving an MAE of $543 against an average diamond value of $11,600. That is an impressive result for the small amount of code involved.
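MAE is simply the average absolute difference between predicted and actual prices. A quick check with hypothetical numbers:

```python
import numpy as np

# Hypothetical actual and predicted diamond prices
y_true = np.array([10000.0, 12000.0, 11500.0])
y_pred = np.array([10500.0, 11400.0, 11800.0])

# Mean Absolute Error: average of the absolute errors
mae = np.abs(y_true - y_pred).mean()
print(mae)  # (500 + 600 + 300) / 3 ≈ 466.67
```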
plot_model(best, plot='residuals_interactive')
plot_model(best, plot='feature')
Finalizing and Saving the Pipeline
Next, we will finalize the best model by training it on the complete dataset and saving the pipeline as a pickle file.
final_best = finalize_model(best)
save_model(final_best, 'diamond-pipeline')
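Under the hood, save_model serializes the whole pipeline to disk with pickle (as diamond-pipeline.pkl) so it can be restored later with load_model. A minimal stand-in showing the round trip with the standard library, using a plain dict in place of the real fitted pipeline object:

```python
import os
import pickle
import tempfile

# Stand-in for the fitted pipeline (the real object is a scikit-learn-style Pipeline)
pipeline = {"model": "CatBoostRegressor", "steps": ["impute", "encode", "estimate"]}

path = os.path.join(tempfile.gettempdir(), "diamond-pipeline.pkl")

# Serialize the object to disk
with open(path, "wb") as f:
    pickle.dump(pipeline, f)

# Restore it, as load_model would at serving time
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == pipeline)  # True
```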
Model Deployment
Deploying machine learning models involves making them accessible in production environments where web applications and APIs can utilize them. Predictions can be generated through batch processing or in real-time.
This section will demonstrate how to create an API using the FastAPI framework.
The initial lines of code involve basic imports, followed by initializing an app with FastAPI and loading the trained model.
from fastapi import FastAPI
import pandas as pd
from pycaret.regression import load_model, predict_model

app = FastAPI()

# load_model restores the saved pipeline (save_model appends the .pkl extension)
model = load_model('diamond-pipeline')

@app.post("/predict")
def predict(data: dict):
    # Turn the incoming JSON payload into a one-row DataFrame and score it with PyCaret
    input_df = pd.DataFrame([data])
    predictions = predict_model(model, data=input_df)
    # The prediction column is named 'prediction_label' in PyCaret 3.x ('Label' in 2.x)
    return {"prediction": float(predictions["prediction_label"].iloc[0])}
You can run this script using the following command in your command prompt (ensuring your script is in the same directory as the model):
uvicorn main:app --reload
This will start an API service on localhost, by default at http://localhost:8000. FastAPI also serves interactive documentation for the endpoint at http://localhost:8000/docs, which you can open in your web browser.
Utilizing the API
To make predictions from the API, you can use the requests library in Python. In under 25 lines of code, we have trained multiple models and deployed a machine learning pipeline as an API, which showcases how straightforward both PyCaret and FastAPI are.
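A client sketch using requests. The feature names below match the diamond dataset's columns; the values are illustrative, and the URL assumes the uvicorn server from the previous section is running locally:

```python
import requests

def get_prediction(payload, url="http://localhost:8000/predict"):
    # POST the feature payload as JSON and return the decoded response
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()

# Example payload; values are illustrative
payload = {
    "Carat Weight": 1.1,
    "Cut": "Ideal",
    "Color": "H",
    "Clarity": "SI1",
    "Polish": "VG",
    "Symmetry": "EX",
    "Report": "GIA",
}

# With the API running: print(get_prediction(payload))
```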
About the Author
I write about data science, machine learning, and PyCaret. For updates, feel free to follow me on social media platforms.
Chapter 2: Video Resources
Deploying ML Models in Production: An Overview
This video provides an insightful overview of deploying machine learning models in production environments.
How to Deploy Machine Learning Models into Production
In this video, you will learn practical steps to deploy your machine learning models effectively.