Machine Learning & Research

Guide to Ray for Scalable AI and Machine Learning Applications

By Charlotte Li | April 18, 2025 (Updated: April 29, 2025) | 22 Mins Read


Ray has emerged as a powerful framework for distributed computing in AI and ML workloads, enabling researchers and practitioners to scale their applications from laptops to clusters with minimal code changes. This guide provides an in-depth exploration of Ray's architecture, capabilities, and applications in modern machine learning workflows, complete with a practical project implementation.

Learning Objectives

• Understand Ray's architecture and its role in distributed computing for AI/ML.
• Leverage Ray's ecosystem (Train, Tune, Serve, Data) for end-to-end ML workflows.
• Compare Ray with alternative distributed computing frameworks.
• Design distributed training pipelines for large language models.
• Optimize resource allocation and debug distributed applications.

This article was published as a part of the Data Science Blogathon.

    Introduction to Ray and Distributed Computing

Ray is an open-source unified framework for scaling AI and Python applications, providing a simple, universal API for building distributed applications that can scale from a laptop to a cluster. Developed initially at UC Berkeley's RISELab and now maintained by Anyscale, Ray has gained significant traction in the AI community, becoming the backbone for training and deploying some of the most advanced AI models today.

The growing importance of distributed computing in AI stems from several factors:

• Increasing model sizes: Modern AI models, especially large language models (LLMs), have grown exponentially in size, with billions or even trillions of parameters.
• Expanding datasets: Training data continues to grow in volume, often exceeding what can be processed on a single machine.
• Computational demands: Complex algorithms and training procedures require more computational resources than individual machines can provide.
• Deployment challenges: Serving models at scale requires distributed infrastructure to handle varying workloads efficiently.

Traditional distributed computing frameworks often require significant rewrites of existing code and present a steep learning curve. Ray differentiates itself by offering a simple, intuitive API that makes the transition from single-machine to multi-machine computation straightforward, often requiring just a few decorator changes to existing Python code.

The Challenge of Scaling Python Applications

Python has become the lingua franca of data science and machine learning, but it wasn't designed with distributed computing in mind. When practitioners need to scale their Python applications, they traditionally face several challenges:

• Low-level distribution concerns: Managing worker processes, load balancing, and fault tolerance.
• Data movement: Efficiently transferring data between machines.
• Resource management: Allocating and monitoring CPU, GPU, and memory resources across a cluster.
• Code complexity: Rewriting algorithms to work in a distributed fashion.

Ray addresses these challenges by providing a unified framework that abstracts away much of the complexity while still allowing fine-grained control when needed.

    Ray Framework

The Ray framework is structured into three main components:

• Ray AI Libraries: A collection of Python-based, domain-specific libraries that gives machine learning engineers, data scientists, and researchers a scalable toolkit tailored to a variety of ML applications.
• Ray Core: Serving as the foundation, Ray Core is a general-purpose distributed computing library that empowers Python developers to parallelize and scale applications, thereby accelerating machine learning workloads.
• Ray Clusters: Comprising multiple worker nodes connected to a central head node, Ray clusters can be configured with a fixed size or set to dynamically adjust resources based on the demands of the running applications.

This modular design enables users to efficiently build and manage distributed applications without requiring in-depth expertise in distributed systems.

Getting Started with Ray

Before diving into the advanced applications, it's important to set up your Ray environment and understand the basics of getting started.

Ray can be installed using pip. To install the latest stable version, run:

    # For machine learning applications
    pip install -U "ray[data,train,tune,serve]"

    ## For reinforcement learning support, install RLlib instead.
    ## pip install -U "ray[rllib]"

    # For general Python applications
    pip install -U "ray[default]"

    ## If you don't need the Ray Dashboard or Cluster Launcher, install Ray with minimal dependencies instead.
    ## pip install -U "ray"

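After installation, a quick way to confirm that Ray is working is to start it locally and inspect the resources it detects. The following is a minimal sketch (the exact resource keys reported depend on your machine):

    import ray

    # Start a local Ray instance and print the resources it detected
    ray.init()
    print(ray.cluster_resources())  # e.g. {'CPU': 8.0, 'memory': ..., 'object_store_memory': ...}

    # Shut down when finished
    ray.shutdown()
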
Ray's Programming Model: Tasks and Actors

Ray's programming model revolves around two primary abstractions:

• Tasks: Functions that execute remotely and asynchronously. Tasks are stateless computations that can be scheduled on any worker in the cluster.
• Actors: Classes that maintain state and execute methods remotely. Actors encapsulate state and provide an object-oriented approach to distributed computing.

These abstractions allow developers to express different types of parallelism naturally:

    import ray

    # Initialize Ray
    ray.init()

    # Define a remote task
    @ray.remote
    def process_data(data_chunk):
        # Process the data and return the results
        return processed_result

    # Define an actor class
    @ray.remote
    class Counter:
        def __init__(self):
            self.count = 0

        def increment(self):
            self.count += 1
            return self.count

        def get_count(self):
            return self.count

    # Execute tasks in parallel
    data_chunks = [data_1, data_2, data_3, data_4]
    result_refs = [process_data.remote(chunk) for chunk in data_chunks]
    results = ray.get(result_refs)  # Wait for all tasks to complete

    # Create an actor instance
    counter = Counter.remote()
    counter.increment.remote()  # Execute a method on the actor
    count = ray.get(counter.get_count.remote())  # Get the actor's state

Ray's programming model makes it easy to transform sequential Python code into distributed applications with minimal changes. Tasks are ideal for stateless, embarrassingly parallel workloads, while actors are well suited to maintaining state or implementing services.

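To make the task/actor distinction concrete, the short sketch below combines both: stateless tasks compute partial results in parallel while a single actor accumulates a running total. The chunk values and the square_sum helper are illustrative placeholders rather than part of the original example.

    import ray

    ray.init()

    @ray.remote
    def square_sum(chunk):
        # Stateless task: can run on any worker in the cluster
        return sum(x * x for x in chunk)

    @ray.remote
    class Accumulator:
        # Stateful actor: keeps a running total across method calls
        def __init__(self):
            self.total = 0

        def add(self, value):
            self.total += value
            return self.total

    chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    partial_refs = [square_sum.remote(c) for c in chunks]

    acc = Accumulator.remote()
    for ref in partial_refs:
        acc.add.remote(ray.get(ref))

    print(ray.get(acc.add.remote(0)))  # Final total across all chunks
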
Ray Cluster Architecture

A Ray cluster consists of several key components:

• Head Node: The central coordination point of the cluster, hosting the Global Control Store (GCS), which maintains cluster metadata.
• Worker Nodes: Machines that run worker processes, which execute tasks and host actors.
• Driver Process: The process running the user's program, responsible for submitting tasks to the cluster.
• Object Store: A distributed, shared-memory object store for efficient data sharing between tasks and actors.
• Scheduler: Responsible for assigning tasks to workers based on resource availability and constraints.
• Resource Management: Ray's system for allocating and monitoring CPU, GPU, and custom resources across the cluster (a minimal example of requesting resources follows this list).

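As a concrete illustration of resource management, a task or actor can declare the resources it needs directly in its decorator, or override them at call time with .options(); the scheduler then only places it on a node with those resources free. A minimal sketch (train_shard is a hypothetical task):

    import ray

    ray.init()

    # This task will only be scheduled on a node with 2 free CPUs and 1 free GPU
    @ray.remote(num_cpus=2, num_gpus=1)
    def train_shard(shard):
        ...

    # Override the resource request for a lighter-weight call
    cpu_only_ref = train_shard.options(num_cpus=1, num_gpus=0).remote("small-shard")
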
A Ray cluster can be set up in several ways:

• Locally on a single machine
• On a private cluster using Ray's cluster launcher
• On cloud providers such as AWS, GCP, or Azure
• Using managed services such as Anyscale

    # Starting Ray on a single machine (head node)
    ray start --head --port=6379

    # Joining a worker node to the cluster
    ray start --address=<head-node-address>:6379

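Once the head and worker nodes are up, a driver program can attach to the cluster from Python and inspect it. A minimal sketch, assuming the defaults above:

    import ray

    # Connect to the running cluster ("auto" discovers a local head node;
    # alternatively pass "<head-node-address>:6379")
    ray.init(address="auto")

    print(ray.cluster_resources())  # Aggregate CPUs/GPUs/memory across all nodes
    print(len(ray.nodes()))         # Number of nodes currently in the cluster
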
Ray Object Store and Memory Management

Ray includes a distributed object store that enables efficient sharing of objects between tasks and actors. Objects in the store are immutable and can be accessed by any worker in the cluster.

    import ray
    import numpy as np

    ray.init()

    # Store an object in the object store
    data = np.random.rand(1000, 1000)
    data_ref = ray.put(data)  # Returns a reference to the object

    # Pass the reference to a remote task
    @ray.remote
    def process_matrix(matrix):
        # Ray resolves the reference automatically, so the task receives
        # the matrix itself from the object store
        return np.sum(matrix)

    result_ref = process_matrix.remote(data_ref)
    result = ray.get(result_ref)

The object store optimizes data transfer by:

• Avoiding unnecessary data copying: Objects are shared by reference whenever possible.
• Spilling to disk: Automatically moving objects to disk when memory is limited.
• Distributed references: Tracking object references across the cluster.

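To see reference-based sharing in practice, the same object reference can be handed to many tasks; workers on the same node read the array directly from shared memory rather than receiving their own copy. A small sketch (the column_mean task is illustrative):

    import ray
    import numpy as np

    ray.init()

    matrix_ref = ray.put(np.random.rand(1000, 1000))  # Stored once in the object store

    @ray.remote
    def column_mean(matrix, col):
        # Workers on the same node read `matrix` from shared memory (no copy)
        return matrix[:, col].mean()

    means = ray.get([column_mean.remote(matrix_ref, c) for c in range(10)])
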
    Ray for AI and ML Workloads

Ray provides a comprehensive ecosystem of libraries designed for different aspects of AI and ML workflows:

Ray Train for Distributed Model Training with PyTorch

Ray Train simplifies distributed deep learning with a unified API across different frameworks.

For reference, the final code will look something like the following:

    import os
    import tempfile

    import torch
    from torch.nn import CrossEntropyLoss
    from torch.optim import Adam
    from torch.utils.data import DataLoader
    from torchvision.models import resnet18
    from torchvision.datasets import FashionMNIST
    from torchvision.transforms import ToTensor, Normalize, Compose

    import ray.train.torch

    def train_func():
        # Model, Loss, Optimizer
        model = resnet18(num_classes=10)
        model.conv1 = torch.nn.Conv2d(
            1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
        )
        # [1] Prepare the model.
        model = ray.train.torch.prepare_model(model)
        # model.to("cuda")  # This is handled by `prepare_model`
        criterion = CrossEntropyLoss()
        optimizer = Adam(model.parameters(), lr=0.001)

        # Data
        transform = Compose([ToTensor(), Normalize((0.28604,), (0.32025,))])
        data_dir = os.path.join(tempfile.gettempdir(), "data")
        train_data = FashionMNIST(root=data_dir, train=True, download=True, transform=transform)
        train_loader = DataLoader(train_data, batch_size=128, shuffle=True)
        # [2] Prepare the dataloader.
        train_loader = ray.train.torch.prepare_data_loader(train_loader)

        # Training
        for epoch in range(10):
            if ray.train.get_context().get_world_size() > 1:
                train_loader.sampler.set_epoch(epoch)

            for images, labels in train_loader:
                # This is handled by `prepare_data_loader`!
                # images, labels = images.to("cuda"), labels.to("cuda")
                outputs = model(images)
                loss = criterion(outputs, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # [3] Report metrics and checkpoint.
            metrics = {"loss": loss.item(), "epoch": epoch}
            with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
                torch.save(
                    model.module.state_dict(),
                    os.path.join(temp_checkpoint_dir, "model.pt")
                )
                ray.train.report(
                    metrics,
                    checkpoint=ray.train.Checkpoint.from_directory(temp_checkpoint_dir),
                )
            if ray.train.get_context().get_world_rank() == 0:
                print(metrics)

    # [4] Configure scaling and resource requirements.
    scaling_config = ray.train.ScalingConfig(num_workers=2, use_gpu=True)

    # [5] Launch the distributed training job.
    trainer = ray.train.torch.TorchTrainer(
        train_func,
        scaling_config=scaling_config,
        # [5a] If running in a multi-node cluster, this is where you
        # should configure the run's persistent storage that is accessible
        # across all worker nodes.
        # run_config=ray.train.RunConfig(storage_path="s3://..."),
    )
    result = trainer.fit()

    # [6] Load the trained model.
    with result.checkpoint.as_directory() as checkpoint_dir:
        model_state_dict = torch.load(os.path.join(checkpoint_dir, "model.pt"))
        model = resnet18(num_classes=10)
        model.conv1 = torch.nn.Conv2d(
            1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
        )
        model.load_state_dict(model_state_dict)

Ray Train provides:

• Multi-node and multi-GPU training capabilities
• Support for popular frameworks (PyTorch, TensorFlow, Horovod)
• Checkpointing and fault tolerance
• Integration with hyperparameter tuning

    Ray Tune for Hyperparameter Optimization

Hyperparameter tuning is crucial for AI and ML model performance. Ray Tune provides scalable hyperparameter optimization.

To run the example, install the following:

    pip install "ray[tune]"

    from ray import tune
    from ray.tune.schedulers import ASHAScheduler

    # Define the objective function to optimize
    def objective(config):
        model = build_model(config)
        for epoch in range(100):
            # Train the model
            loss = train_epoch(model)
            tune.report(loss=loss)  # Report metrics to Tune

    # Configure the search space
    search_space = {
        "learning_rate": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([16, 32, 64, 128]),
        "hidden_layers": tune.randint(1, 5)
    }

    # Run hyperparameter optimization
    analysis = tune.run(
        objective,
        config=search_space,
        scheduler=ASHAScheduler(metric="loss", mode="min"),
        num_samples=100
    )

    # Get the best configuration
    best_config = analysis.get_best_config(metric="loss", mode="min")

Ray Tune offers:

• A variety of search algorithms (grid search, random search, Bayesian optimization); a minimal sketch follows this list
• Adaptive resource allocation
• Early stopping of underperforming trials
• Integration with ML frameworks

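As an example of swapping in a different search algorithm, the sketch below plugs a model-based (TPE) search into the same tune.run call. This assumes the hyperopt package is installed and a recent Ray release where the import path is ray.tune.search.hyperopt (older releases used ray.tune.suggest.hyperopt):

    from ray import tune
    from ray.tune.search.hyperopt import HyperOptSearch  # needs `pip install hyperopt`

    hyperopt_search = HyperOptSearch(metric="loss", mode="min")

    analysis = tune.run(
        objective,
        config=search_space,         # reuses the search space defined above
        search_alg=hyperopt_search,  # model-based search instead of random sampling
        num_samples=50
    )
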
Ray Serve for Model Deployment

Ray Serve is designed for deploying ML models at scale.

Install Ray Serve and its dependencies, then define and deploy a model:

    import ray
    from ray import serve
    from starlette.requests import Request
    import torch
    import json

    # Start Ray Serve
    serve.start()

    # Define a deployment for our model
    @serve.deployment(route_prefix="/predict", num_replicas=2)
    class ModelDeployment:
        def __init__(self, model_path):
            self.model = torch.load(model_path)
            self.model.eval()

        async def __call__(self, request: Request):
            data = await request.json()
            input_tensor = torch.tensor(data["input"])

            with torch.no_grad():
                prediction = self.model(input_tensor).tolist()

            return {"prediction": prediction}

    # Deploy the model
    model_deployment = ModelDeployment.deploy("./trained_model.pt")

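Once the deployment above is running, it can be queried over HTTP like any other web service. A minimal client sketch, assuming Serve's default local address and the /predict route defined above (the input shape is illustrative):

    import requests

    response = requests.post(
        "http://localhost:8000/predict",
        json={"input": [[0.1, 0.2, 0.3, 0.4]]}  # shape must match what the model expects
    )
    print(response.json())  # e.g. {"prediction": [...]}
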
Ray Serve enables:

• Model composition and microservices
• Horizontal scaling
• Traffic splitting and A/B testing
• Batching for performance optimization (see the sketch after this list)

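For the batching point, Ray Serve can transparently group concurrent requests into a single model call via its @serve.batch decorator. The sketch below is a minimal illustration under that assumption; the BatchedModel class, batch size, and timeout are illustrative choices, not part of the original example:

    from ray import serve
    from starlette.requests import Request
    import torch

    @serve.deployment
    class BatchedModel:
        def __init__(self, model_path):
            self.model = torch.load(model_path)
            self.model.eval()

        @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.05)
        async def predict_batch(self, inputs):
            # `inputs` is a list of per-request feature vectors
            batch = torch.tensor(inputs)
            with torch.no_grad():
                return self.model(batch).tolist()  # one result per input

        async def __call__(self, request: Request):
            data = await request.json()
            return {"prediction": await self.predict_batch(data["input"])}
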
Ray Data for ML-Optimized Data Processing

Ray Data provides distributed data processing capabilities optimized for ML workloads:

    import ray

    # Initialize Ray
    ray.init()

    # Create a dataset from a file or data source
    ds = ray.data.read_csv("s3://bucket/path/to/data.csv")

    # Apply transformations in parallel
    def preprocess_batch(batch):
        # Apply preprocessing to the batch
        return processed_batch

    transformed_ds = ds.map_batches(preprocess_batch)

    # Split into training and validation sets
    train_ds, val_ds = transformed_ds.train_test_split(test_size=0.2)

    # Create a loader for an ML framework (e.g., PyTorch)
    train_loader = train_ds.to_torch(batch_size=32, shuffle=True)

Ray Data offers:

• Parallel data loading and transformation
• Integration with ML training
• Support for a variety of data formats and sources
• Optimizations for ML workflows

Distributed Fine-tuning of a Large Language Model with Ray

Let's implement a complete project that demonstrates how to use Ray for fine-tuning a large language model (LLM) using distributed computing resources. We'll use GPT-J-6B as our base model and Ray Train with DeepSpeed for efficient distributed training.

In this project, we will:

• Set up a Ray cluster for distributed training
• Prepare a dataset for fine-tuning the LLM
• Configure DeepSpeed for memory-efficient training
• Implement distributed training using Ray Train
• Evaluate the model and deploy it with Ray Serve

Environment Setup

First, let's set up the environment with the required dependencies:

    # Install required packages
    !pip install "ray[train]" transformers datasets accelerate deepspeed torch evaluate

    Ray Cluster Configuration

For this project, we'll configure a Ray cluster with multiple GPUs:

    import ray
    import os

    # Configuration
    model_name = "EleutherAI/gpt-j-6B"  # We'll use GPT-J-6B as our base model
    use_gpu = True
    num_workers = 16  # Number of training workers (adjust based on available GPUs)
    cpus_per_worker = 8  # CPUs per worker

    # Initialize Ray
    ray.init(
        runtime_env={
            "pip": [
                "transformers==4.26.0",
                "accelerate==0.18.0",
                "datasets",
                "evaluate",
                "deepspeed==0.12.3",
                "torch>=1.12.0"
            ]
        }
    )

This initialization creates a local Ray cluster. In a production environment, you would typically connect to an existing Ray cluster instead.

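For reference, pointing the same driver code at an already-running cluster is mostly a matter of changing the ray.init call. The addresses below are placeholders, and the Ray Client port (10001) assumes the default cluster configuration:

    import ray

    # Option 1: run the driver on a node that is part of the cluster
    ray.init(address="auto")

    # Option 2: connect from outside the cluster via Ray Client
    # ray.init(address="ray://<head-node-address>:10001")
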
Data Preparation

For fine-tuning our language model, we'll prepare a text dataset:

    from datasets import load_dataset
    from transformers import AutoTokenizer

    # Load the tokenizer for our model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # GPT models don't have a pad token by default

    # Load a text dataset (example using a subset of wikitext)
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

    # Define a preprocessing function for tokenization
    def preprocess_function(examples):
        return tokenizer(
            examples["text"],
            truncation=True,
            max_length=512,
            padding="max_length",
            return_tensors="pt"
        )

    # Tokenize the dataset in parallel using Ray Data
    import ray.data
    ray_dataset = ray.data.from_huggingface(dataset)
    tokenized_dataset = ray_dataset.map_batches(
        preprocess_function,
        batch_format="pandas",
        batch_size=100
    )

    # Convert back to Hugging Face dataset format
    train_dataset = tokenized_dataset.train.to_huggingface()
    eval_dataset = tokenized_dataset.validation.to_huggingface()

DeepSpeed Configuration for Memory-Efficient Training

Training large models like GPT-J-6B requires memory optimization techniques. DeepSpeed is a deep learning optimization library that enables efficient training.

Let's configure it for our distributed training:

    # DeepSpeed configuration
    deepspeed_config = {
        "fp16": {
            "enabled": True
        },
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {
                "device": "cpu"
            },
            "allgather_bucket_size": 5e8,
            "reduce_bucket_size": 5e8
        },
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": 4,
        "gradient_accumulation_steps": "auto",
        "optimizer": {
            "type": "AdamW",
            "params": {
                "lr": 5e-5,
                "weight_decay": 0.01
            }
        }
    }

    # Save the config to a file
    import json
    with open("deepspeed_config.json", "w") as f:
        json.dump(deepspeed_config, f)

This configuration uses several optimization techniques:

• FP16 precision to reduce memory usage
• ZeRO stage 2 optimization to partition optimizer states
• CPU offloading to move some data from GPU to CPU memory
• Automatic batch size and gradient accumulation configuration

Implementing Distributed Training

Define the training function and use Ray Train to distribute it across the cluster:

    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
    import torch
    import torch.distributed as dist
    from ray.train.huggingface import HuggingFaceTrainer
    from ray.train import ScalingConfig

    # Define the training function to be executed on each worker
    def train_func(config):
        # Initialize the process group for distributed training
        dist.init_process_group(backend="nccl")

        # Load the pre-trained model
        model = AutoModelForCausalLM.from_pretrained(
            config["model_name"],
            revision="float16",
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True
        )

        # Set up training arguments
        training_args = TrainingArguments(
            output_dir="./output",
            per_device_train_batch_size=config["batch_size"],
            per_device_eval_batch_size=config["batch_size"],
            evaluation_strategy="epoch",
            num_train_epochs=config["epochs"],
            fp16=True,
            report_to="none",
            deepspeed="deepspeed_config.json",
            save_strategy="epoch",
            load_best_model_at_end=True,
            logging_steps=10
        )

        # Initialize the Trainer
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=config["train_dataset"],
            eval_dataset=config["eval_dataset"],
        )

        # Train the model
        trainer.train()

        # Save the final model
        trainer.save_model("./final_model")

        return {"loss": trainer.state.best_metric}

    # Configure the distributed training
    scaling_config = ScalingConfig(
        num_workers=num_workers,
        use_gpu=use_gpu,
        resources_per_worker={"CPU": cpus_per_worker, "GPU": 1}
    )

    # Create the Ray Train trainer
    trainer = HuggingFaceTrainer(
        train_func,
        scaling_config=scaling_config,
        train_loop_config={
            "model_name": model_name,
            "train_dataset": train_dataset,
            "eval_dataset": eval_dataset,
            "batch_size": 4,
            "epochs": 3
        }
    )

    # Start the distributed training
    result = trainer.fit()

This code sets up distributed training across multiple GPUs using Ray Train. The train_func is executed on each worker, with Ray handling the distribution of the workload.

Model Evaluation

After training, we'll evaluate the model's performance:

    from transformers import pipeline

    # Load the fine-tuned model
    model_path = "./final_model"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)

    # Create a text generation pipeline
    text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)

    # Example prompts for evaluation
    prompts = [
        "Artificial intelligence is",
        "The future of distributed computing",
        "Machine learning models can"
    ]

    # Generate text for each prompt
    for prompt in prompts:
        generated_text = text_generator(prompt, max_length=100, num_return_sequences=1)[0]["generated_text"]
        print(f"Prompt: {prompt}")
        print(f"Generated: {generated_text}")
        print("---")
    

Deploying the Model with Ray Serve

Finally, we'll deploy the fine-tuned model for inference using Ray Serve:

    import ray
    from ray import serve
    from starlette.requests import Request
    import torch
    import json
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

    # Start Ray Serve
    serve.start()

    # Define a deployment for our model
    @serve.deployment(route_prefix="/generate", num_replicas=2, ray_actor_options={"num_gpus": 1})
    class TextGenerationModel:
        def __init__(self, model_path):
            self.tokenizer = AutoTokenizer.from_pretrained(model_path)
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                torch_dtype=torch.float16,
                device_map="auto"
            )
            self.pipeline = pipeline(
                "text-generation",
                model=self.model,
                tokenizer=self.tokenizer
            )

        async def __call__(self, request: Request) -> dict:
            data = await request.json()
            prompt = data.get("prompt", "")
            max_length = data.get("max_length", 100)

            generated_text = self.pipeline(
                prompt,
                max_length=max_length,
                num_return_sequences=1
            )[0]["generated_text"]

            return {"generated_text": generated_text}

    # Deploy the model
    model_deployment = TextGenerationModel.deploy("./final_model")

    # Example client code to query the deployed model
    import requests

    response = requests.post(
        "http://localhost:8000/generate",
        json={"prompt": "Artificial intelligence is", "max_length": 100}
    )
    print(response.json())
    

This deployment uses Ray Serve to create a scalable inference service. Ray Serve handles the complexity of scaling, load balancing, and resource management, allowing us to focus on the application logic.

Real-World Applications and Case Studies of Ray

Ray has gained significant traction across industries thanks to its ability to scale AI/ML workloads efficiently. Here are some notable real-world applications and case studies:

Large-Scale AI Model Training (OpenAI, Uber, and Meta)

• OpenAI used Ray to scale reinforcement learning for training AI agents such as its Dota 2 bots.
• Uber's Michelangelo leverages Ray for distributed hyperparameter tuning and model training at scale.
• Meta (Facebook) employs Ray to optimize large-scale deep learning workflows.

Financial Services and Fraud Detection (Ant Group, JP Morgan, and Goldman Sachs)

• Ant Group (Alibaba's fintech arm) integrates Ray for real-time fraud detection and risk assessment.
• JP Morgan and Goldman Sachs use Ray to accelerate financial modeling, risk analysis, and algorithmic trading strategies.

Autonomous Vehicles and Robotics (NVIDIA, Waymo, and Tesla)

• NVIDIA uses Ray for reinforcement-learning-based autonomous driving simulations.
• Waymo and Tesla employ Ray to train self-driving car models with large-scale sensor data processing.

Healthcare and Drug Discovery (DeepMind, Genentech, and AstraZeneca)

• DeepMind leverages Ray for protein folding simulations and AI-driven medical research.
• Genentech and AstraZeneca use Ray in AI-driven drug discovery, accelerating computational biology and genomics research.

Large-Scale Recommendation Systems (Netflix, TikTok, and Amazon)

• Netflix employs Ray to power personalized content recommendations and A/B testing.
• TikTok scales recommendation models with Ray to improve video suggestions in real time.
• Amazon enhances its recommendation algorithms and e-commerce search using Ray's distributed computing capabilities.

Cloud & AI Infrastructure (Google Cloud, AWS, and Microsoft Azure)

• Google Cloud Vertex AI integrates Ray for scalable machine learning model training.
• AWS SageMaker supports Ray for distributed hyperparameter tuning.
• Microsoft Azure uses Ray to optimize AI and machine learning services.

Ray at OpenAI: Powering Large Language Models

One of the most notable users of Ray is OpenAI, which has leveraged the framework for training its large language models, including ChatGPT. According to reports, Ray was key to enabling OpenAI to train large models efficiently.

Before adopting Ray, OpenAI used a collection of custom tools to develop its early models. However, as the limitations of this approach became apparent, the company switched to Ray. OpenAI's president, Greg Brockman, highlighted this transition at the Ray Summit.

The key advantage that Ray provides for LLM training is the ability to run the same code on both a developer's laptop and a massive distributed cluster. This capability becomes increasingly important as models grow in size and complexity.

Advanced Ray Features and Best Practices

Let us now explore advanced Ray features and best practices.

Memory Management in Distributed Applications

Efficient memory management is crucial when working with large-scale ML workloads:

• Object Spilling: Ray automatically spills objects to disk when memory pressure is high. Configure the object store size appropriately for your workload:

    ray.init(
        object_store_memory=10 * 10**9,  # 10 GB
        _memory_monitor_refresh_ms=100,  # Check memory usage every 100 ms
    )

• Reference Management: Explicitly delete references to large objects when they are no longer needed:

    # Create a large object
    data_ref = ray.put(large_dataset)

    # Use the reference
    result_ref = process_data.remote(data_ref)
    result = ray.get(result_ref)

    # Delete the reference when done
    del data_ref

• Streaming Data Processing: For very large datasets, use Ray Data's streaming capabilities instead of loading everything into memory:

    import ray
    dataset = ray.data.read_csv("s3://bucket/large_dataset/*.csv")

    # Process the dataset in batches without loading everything at once
    for batch in dataset.iter_batches():
        # Process each batch
        process_batch(batch)
    

Debugging Distributed Applications

Debugging distributed applications can be challenging. Ray provides several tools to help:

• Ray Dashboard: Provides visibility into task execution, actor states, and resource usage:

    # Start Ray with the dashboard enabled
    ray.init(dashboard_host="0.0.0.0")
    # Access the dashboard at http://<head-node-ip>:8265

• Detailed Logging: Use Ray's logging utilities to capture logs from all workers:

    import ray
    import logging

    # Configure logging
    ray.init(logging_level=logging.INFO)

    @ray.remote
    def task_with_logging():
        logger = logging.getLogger("ray")
        logger.info("This message will be captured in Ray's logs")
        return "Task completed"

• Exception Handling: Ray propagates exceptions from remote tasks back to the driver:

    @ray.remote
    def task_that_might_fail(x):
        if x < 0:
            raise ValueError("x must be non-negative")
        return x * x

    # This will raise the ValueError in the driver
    try:
        result = ray.get(task_that_might_fail.remote(-1))
    except ValueError as e:
        print(f"Caught exception: {e}")
    

Ray vs. Other Distributed Computing Frameworks

Let us now compare Ray with other distributed computing frameworks.

Ray vs. Dask

Both Ray and Dask are Python-native distributed computing frameworks, but they have different focuses:

• Programming Model: Ray's task and actor model provides more flexibility than Dask's task-graph approach.
• ML/AI Focus: Ray has specialized libraries for ML (Train, Tune, Serve), while Dask focuses more on data processing.
• Data Processing: Dask has deeper integration with the PyData ecosystem (NumPy, Pandas).
• Performance: Ray typically shows better performance for fine-grained tasks and dynamic workloads.

When to choose Ray over Dask:

• For ML-specific workloads (training, hyperparameter tuning, model serving)
• When you need the actor programming model for stateful computation
• For highly dynamic task graphs that change during execution

Ray vs. Apache Spark

Ray and Apache Spark serve different primary use cases:

• Language Support: Ray is Python-first, while Spark is JVM-based with Python bindings.
• Use Cases: Spark excels at batch data processing, while Ray is designed for ML/AI workloads.
• Iteration Speed: Ray offers faster iteration for ML experiments than Spark.
• Programming Model: Ray's model is more flexible than Spark's RDD/DataFrame abstractions.

When to choose Ray over Spark:

• For Python-native ML workflows
• When you need fine-grained task scheduling
• For interactive development and fast iteration cycles
• When building complex applications that mix batch and online processing

Ray vs. Kubernetes + Custom ML Code

While Kubernetes can be used to orchestrate ML workloads:

• Abstraction Level: Ray provides higher-level abstractions specific to ML/AI than Kubernetes.
• Development Experience: Ray offers a more seamless development experience without requiring knowledge of containers and YAML.
• Integration: Ray can run on Kubernetes, combining the strengths of both systems.

When to choose Ray over raw Kubernetes:

• To avoid the complexity of container orchestration
• For a more integrated ML development experience
• When you want to focus on algorithms rather than infrastructure

    Reference: Ray docs

    Conclusion

Ray has emerged as a critical tool for scaling AI and ML workloads, from research prototypes to production systems. Its intuitive programming model, combined with specialized libraries for training, tuning, and serving, makes it an attractive choice for organizations looking to scale their AI efforts efficiently. Ray offers a path to scale that doesn't require rewriting existing code or mastering complex distributed systems concepts.

By understanding Ray's core concepts, libraries, and best practices outlined in this guide, developers and data scientists can leverage distributed computing to tackle problems that would be infeasible on a single machine, opening up new possibilities in AI and ML development.

Whether you're training large language models, optimizing hyperparameters, serving models at scale, or processing massive datasets, Ray provides the tools and abstractions to make distributed computing accessible and productive. As the field continues to advance, Ray is positioned to play an increasingly important role in enabling the next generation of AI applications.

    Key Takeaways

• Ray simplifies distributed computing for AI/ML by enabling seamless scaling from a single machine to a cluster with minimal code modifications.
• Ray's ecosystem (Train, Tune, Serve, Data) provides end-to-end solutions for distributed training, hyperparameter tuning, model serving, and data processing.
• Ray's task- and actor-based programming model makes parallelization intuitive, transforming Python applications into scalable distributed workloads.
• Ray optimizes resource management through efficient scheduling, memory management, and automatic scaling across CPU/GPU clusters.
• Real-world AI applications run on Ray at scale, including LLM fine-tuning, reinforcement learning, and large-scale data processing.

Frequently Asked Questions

Q1. What is Ray, and why is it used?

A. Ray is an open-source framework for distributed computing, enabling Python applications to scale across multiple machines with minimal code changes. It is widely used for AI/ML workloads, reinforcement learning, and large-scale data processing.

Q2. How does Ray simplify distributed computing?

A. Ray abstracts the complexities of parallelization by providing a simple task- and actor-based programming model. Developers can distribute workloads across multiple CPUs and GPUs without managing low-level infrastructure.

Q3. How does Ray compare to other distributed frameworks like Spark?

A. While Spark is optimized for batch data processing, Ray is more flexible, supporting dynamic, interactive, and AI/ML-specific workloads. Ray also has built-in support for deep learning and reinforcement learning applications.

Q4. Can Ray run on cloud platforms?

A. Yes, Ray supports deployment on major cloud providers (AWS, GCP, Azure) and integrates with Kubernetes for scalable orchestration.

Q5. What types of workloads benefit from Ray?

A. Ray is ideal for distributed AI/ML model training, hyperparameter tuning, large-scale data processing, reinforcement learning, and serving AI models in production.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

