A Hands-On Introduction to cuML for GPU-Accelerated Machine Learning Workflows

In this article, you'll learn what cuML is and how it can significantly speed up the training of machine learning models through GPU acceleration.

Topics we will cover include:

The aim and distinctive features of cuML.
How to prepare datasets and train a machine learning model for classification with cuML in a scikit-learn-like fashion.
How to easily compare results with an equivalent conventional scikit-learn model, in terms of classification accuracy and training time.

Let's not waste any more time.

Image by Editor | ChatGPT

Introduction

This article offers a hands-on Python introduction to cuML, a Python library from RAPIDS AI (an open-source suite within NVIDIA) for GPU-accelerated machine learning workflows across widely used models. In conjunction with its data science-oriented sibling, cuDF, cuML has gained popularity among practitioners who need scalable, production-ready machine learning solutions.

The hands-on tutorial below uses cuML together with cuDF for GPU-accelerated dataset management in a DataFrame format. For an introduction to cuDF, check out this related article.

About cuML: An "Accelerated Scikit-Learn"

RAPIDS cuML (short for CUDA Machine Learning) is an open-source library that accelerates scikit-learn-style machine learning on NVIDIA GPUs. It provides drop-in replacements for many popular algorithms, often reducing training and inference times on large datasets, without major code changes or a steep learning curve for those familiar with scikit-learn.

Its three most distinctive features are:

cuML follows a scikit-learn-like API, easing the transition from CPU to GPU for machine learning with minimal code changes
It covers a broad set of techniques, all GPU-accelerated, including regression, classification, ensemble methods, clustering, and dimensionality reduction
Through tight integration with the RAPIDS ecosystem, cuML works hand-in-hand with cuDF for data preprocessing, as well as with related libraries to facilitate end-to-end, GPU-native pipelines
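The API parity in the first point can be sketched as a small backend picker. This is an illustrative sketch rather than anything cuML ships: the helper names `pick_backend` and `load_logistic_regression` are hypothetical, and the only assumption about the libraries is that both `cuml.linear_model` and `sklearn.linear_model` expose a `LogisticRegression` class, as the imports later in this article do.

```python
import importlib
import importlib.util


def pick_backend():
    """Return 'cuml' if it is installed, else 'sklearn' if installed, else None."""
    for name in ("cuml", "sklearn"):
        # find_spec only locates the package; it does not import it
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_logistic_regression(backend):
    """Import LogisticRegression from the chosen backend's linear_model module."""
    module = importlib.import_module(f"{backend}.linear_model")
    return module.LogisticRegression


backend = pick_backend()
print("Selected backend:", backend)

# Usage (hypothetical): the same constructor call works for either backend.
# LogReg = load_logistic_regression(backend)
# model = LogReg(max_iter=1000)
```

Because the class name and the `max_iter` argument are shared, the rest of a training script can often stay unchanged when the backend switches, although individual hyperparameters and defaults may still differ between the two libraries.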

Hands-On Introductory Example

To illustrate the basics of cuML for building GPU-accelerated machine learning models, we will use a fairly large, yet easily accessible, dataset available via public URL in Jason Brownlee's repository: the adult income dataset. This is a slightly class-unbalanced dataset intended for binary classification tasks, specifically predicting whether an adult's income level is high (above $50K) or low ($50K or below) based on a set of demographic and socio-economic features. Therefore, we aim to build a binary classification model.
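Since the dataset is class-unbalanced, accuracy is easier to interpret next to the score of a trivial model that always predicts the most frequent class. The sketch below computes that majority-class baseline. The function name is hypothetical and the labels are made-up toy values, not the real class distribution of the adult income dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Toy labels (made up for illustration): 3 high-income, 7 low-income
toy_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
baseline = majority_baseline_accuracy(toy_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")  # 0.70
```

A trained classifier is only adding value to the extent that its test accuracy exceeds this baseline.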

IMPORTANT: To run the code below on Google Colab or a similar notebook environment, make sure you change the runtime type to GPU; otherwise, a warning will be raised indicating that cuDF cannot find the CUDA driver library it needs.

We start by importing the necessary libraries for our scenario:
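Before importing cuDF or cuML, it can help to confirm that the runtime actually exposes an NVIDIA GPU. The following is a best-effort sketch, assuming the `nvidia-smi` command-line tool is on the `PATH` of GPU runtimes. The helper name `nvidia_gpu_visible` is hypothetical, and a `False` result only means the tool was not found or reported no devices.

```python
import shutil
import subprocess


def nvidia_gpu_visible():
    """Best-effort check: True if `nvidia-smi -L` runs and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.SubprocessError):
        return False
    # `nvidia-smi -L` prints one line per device, each starting with "GPU <index>:"
    return result.returncode == 0 and "GPU" in result.stdout


print("NVIDIA GPU visible:", nvidia_gpu_visible())
```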

import cudf
import cuml
from cuml.model_selection import train_test_split as gpu_train_test_split
from cuml.linear_model import LogisticRegression as cuLogReg

from IPython.display import display

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

import time

Note that, in addition to the cuML modules and functions used to split the dataset and train a logistic regression classifier, we have also imported their classical scikit-learn counterparts. They are not required for using cuML, which works independently of plain scikit-learn; we import them only for the sake of comparison in the rest of the example.

Next, we load the dataset into a cuDF DataFrame optimized for GPU usage:

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/adult-all.csv"

# Column names (they are not included in the dataset's CSV file we will read)
cols = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country", "income"
]

df = cudf.read_csv(url, header=None, names=cols)
display(df.head())

Once the data is loaded, we identify the target variable and convert it into binary (1 for high income, 0 for low income):

# Remove stray whitespace, then map ">50K" to 1 and everything else to 0
df["income"] = df["income"].str.strip()
df["income"] = (df["income"] == ">50K").astype("int32")
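To see why the label is stripped before the comparison, here is a plain-Python sketch of the same two steps. The sample strings are made up for illustration and the helper name `binarize_income` is hypothetical. The point is that an exact string comparison silently mislabels any value carrying stray whitespace.

```python
def binarize_income(raw_labels, positive=">50K"):
    """Map raw income strings to 1 (high income) or 0 (low income)."""
    return [1 if label.strip() == positive else 0 for label in raw_labels]


# Made-up sample values, some with stray whitespace
sample = [">50K", " <=50K", " >50K ", "<=50K"]

without_strip = [1 if label == ">50K" else 0 for label in sample]
with_strip = binarize_income(sample)

print("Without strip:", without_strip)  # [1, 0, 0, 0]
print("With strip:   ", with_strip)     # [1, 0, 1, 0]
```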

This dataset combines numeric features with a slight predominance of categorical ones. Most scikit-learn models, including decision trees and logistic regression, don't natively handle string-valued categorical features, so they require encoding. The same applies to cuML; hence, we will select a small number of features to train our classifier and one-hot encode the categorical ones.

# Feature selection (for example, based on domain expertise!)
features = ["age", "education_num", "hours_per_week", "workclass", "occupation", "sex"]
X = df[features]
y = df["income"]

# One-hot encode categorical features
X_enc = cudf.get_dummies(X, drop_first=True)
print("Encoded feature shape:", X_enc.shape)
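The effect of `drop_first=True` can be illustrated with a small plain-Python sketch. This is a conceptual illustration, not how cuDF or pandas implement `get_dummies`: the helper `one_hot_drop_first` is hypothetical and the rows are made up. With k categories, only k-1 indicator columns are kept, and the dropped category is represented by all indicators being 0.

```python
def one_hot_drop_first(rows, column):
    """One-hot encode `column` in a list of dicts, dropping the first category.

    Categories are sorted; the first one becomes the implicit baseline
    (all of its indicator columns are 0).
    """
    categories = sorted({row[column] for row in rows})
    kept = categories[1:]
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in kept:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded


rows = [
    {"age": 39, "sex": "Male"},
    {"age": 28, "sex": "Female"},
    {"age": 52, "sex": "Male"},
]
for encoded_row in one_hot_drop_first(rows, "sex"):
    print(encoded_row)
# {'age': 39, 'sex_Male': 1}
# {'age': 28, 'sex_Male': 0}
# {'age': 52, 'sex_Male': 1}
```

Dropping one level per categorical feature removes a redundant column, which avoids perfectly collinear inputs in linear models such as logistic regression.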

So far, we have used cuDF and cuML much as we would use Pandas and classical scikit-learn.

Now comes the interesting part. We will split the dataset into training and test sets and train a logistic regression classifier twice: once on the GPU with cuML and once with standalone scikit-learn. We will then compare both the classification accuracy and the time taken to train each model. Here's the complete code for the model training and comparison:

# MODEL 1: GPU (cuML) train-test split and training
t0 = time.time()
X_train, X_test, y_train, y_test = gpu_train_test_split(X_enc, y, test_size=0.2, random_state=42)

model_gpu = cuLogReg(max_iter=1000)
model_gpu.fit(X_train, y_train)
gpu_time = time.time() - t0

acc_gpu = model_gpu.score(X_test, y_test)
print(f"cuML Logistic Regression accuracy: {acc_gpu:.4f}, time: {gpu_time:.3f} sec")

# MODEL 2: Scikit-learn and Pandas-driven train-test split and model training
df_pd = pd.read_csv(url, header=None, names=cols)
df_pd["income"] = df_pd["income"].str.strip()
df_pd["income"] = (df_pd["income"] == ">50K").astype("int32")

X_pd = df_pd[features]
y_pd = df_pd["income"]
X_pd = pd.get_dummies(X_pd, drop_first=True)

t0 = time.time()
X_train_pd, X_test_pd, y_train_pd, y_test_pd = train_test_split(X_pd, y_pd, test_size=0.2, random_state=42)

model_cpu = LogisticRegression(max_iter=1000)
model_cpu.fit(X_train_pd, y_train_pd)
cpu_time = time.time() - t0

acc_cpu = model_cpu.score(X_test_pd, y_test_pd)
print(f"scikit-learn Logistic Regression accuracy: {acc_cpu:.4f}, time: {cpu_time:.3f} sec")

The results are quite interesting. They should look something like:

cuML Logistic Regression accuracy: 0.8014, time: 0.428 sec
scikit-learn Logistic Regression accuracy: 0.8097, time: 15.184 sec

As we can observe, the model trained with cuML achieved very similar classification performance to its classical scikit-learn counterpart, but it trained over an order of magnitude faster: about 0.4 seconds compared to roughly 15 seconds for the scikit-learn classifier. Note that each timed region covers both the train-test split and the model fit. Your exact numbers will vary with hardware, drivers, and library versions.
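The comparison can be made concrete with a little arithmetic on the sample output shown above. The sketch below hard-codes those reported figures (they are from the example run, not universal benchmarks) and uses a hypothetical helper named `speedup`.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline time to accelerated time (higher means faster)."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds


# Figures copied from the sample output above
cpu_time, gpu_time = 15.184, 0.428
acc_cpu, acc_gpu = 0.8097, 0.8014

ratio = speedup(cpu_time, gpu_time)
acc_gap_points = (acc_cpu - acc_gpu) * 100

print(f"Speedup: {ratio:.1f}x")                                  # 35.5x
print(f"Accuracy gap: {acc_gap_points:.2f} percentage points")   # 0.83
```

In this run, a roughly 35x reduction in split-plus-fit time came with an accuracy difference of under one percentage point.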

Wrapping Up

This article provided a gentle, hands-on introduction to the cuML library for GPU-accelerated construction of machine learning models for classification, regression, clustering, and more. Through a simple comparison, we showed how cuML can help build effective models with significantly shorter training times.
