You can conduct machine learning (ML) data experiments in data environments such as Snowflake using the Snowpark library. However, tracking these experiments across various environments can be challenging because of the difficulty of maintaining a central repository to monitor experiment metadata, parameters, hyperparameters, models, results, and other pertinent information. In this post, we demonstrate how to integrate Amazon SageMaker managed MLflow as a central repository to log these experiments and provide a unified system for tracking their progress.
Amazon SageMaker managed MLflow offers a fully managed service for experiment tracking, model packaging, and model registry. The SageMaker Model Registry streamlines model versioning and deployment, facilitating seamless transitions from development to production. Additionally, integration with Amazon S3, AWS Glue, and SageMaker Feature Store enhances data management and model traceability. The key benefit of using MLflow with SageMaker is that it enables organizations to standardize ML workflows, improve collaboration, and accelerate artificial intelligence (AI)/ML adoption on a more secure and scalable infrastructure. In this post, we show how to integrate Amazon SageMaker managed MLflow with Snowflake.
Snowpark enables Python, Scala, or Java code to create custom data pipelines for efficient data manipulation and preparation when training data is stored in Snowflake. Users can conduct experiments in Snowpark and track them in Amazon SageMaker managed MLflow. This integration allows data scientists to run transformations and feature engineering in Snowflake and use the managed infrastructure of SageMaker for training and deployment, facilitating more seamless workflow orchestration and more secure data handling.
Solution overview
The integration uses Snowpark for Python, a client-side library that enables Python code to interact with Snowflake from Python kernels, such as SageMaker's Jupyter notebooks. One workflow might include data preparation in Snowflake, with feature engineering and model training performed through Snowpark. Amazon SageMaker managed MLflow can then be used for experiment tracking and a model registry integrated with the capabilities of SageMaker.
Figure 1: Architecture diagram
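As a sketch of the data-preparation half of this workflow, feature engineering with Snowpark pushes the computation down into the Snowflake warehouse. The session is assumed to be created elsewhere (for example, in a Snowflake Notebook), and the table and column names below are hypothetical, not from this post:

```python
# Sketch only: assumes an existing Snowpark session; table and column
# names are illustrative placeholders.

def qualified_name(database: str, schema: str, table: str) -> str:
    """Build a fully qualified Snowflake table name."""
    return f"{database}.{schema}.{table}"

def prepare_features(session, database: str, schema: str, table: str):
    """Run feature engineering inside Snowflake via Snowpark (lazy, pushed down)."""
    from snowflake.snowpark.functions import col  # requires snowflake-snowpark-python

    df = session.table(qualified_name(database, schema, table))
    # Derive a simple feature; this executes in the warehouse only when the
    # result is collected or written, so the data never leaves Snowflake.
    return df.with_column("PRICE_PER_UNIT", col("TOTAL_PRICE") / col("QUANTITY"))
```

Because Snowpark DataFrames are lazy, the transformation runs inside Snowflake when the result is consumed, which keeps the data within the governed environment.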
Capture key details with MLflow Tracking
MLflow Tracking is central to the integration between SageMaker, Snowpark, and Snowflake, providing a centralized environment for logging and managing the entire machine learning lifecycle. As Snowpark processes data from Snowflake and trains models, MLflow Tracking can be used to capture key details including model parameters, hyperparameters, metrics, and artifacts. This allows data scientists to monitor experiments, compare different model versions, and verify reproducibility. With MLflow's versioning and logging capabilities, teams can seamlessly trace results back to the specific dataset and transformations used, making it easier to track model performance over time and maintain a transparent and efficient ML workflow.
This approach offers several benefits. It provides a scalable, managed MLflow tracking server in SageMaker, while using the processing capabilities of Snowpark for model inference within the Snowflake environment, creating a unified data system. The workflow stays within the Snowflake environment, which strengthens data security and governance. Additionally, this setup helps reduce cost by using the elastic compute of Snowflake for inference without maintaining separate infrastructure for model serving.
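To make the tracking step concrete, the sketch below shows one way a training run might log its parameters, hyperparameters, and metrics. The helper and its names are ours, not from this post; MLflow expects a flat parameter map, so nested hyperparameter dictionaries are flattened first:

```python
def flatten_params(params: dict, prefix: str = "") -> dict:
    """Flatten nested hyperparameter dicts into the flat key/value map MLflow expects."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

def log_run(tracking_uri: str, params: dict, metrics: dict) -> None:
    """Log one experiment run to the given MLflow tracking server (sketch)."""
    import mlflow  # installed via `pip install sagemaker-mlflow`

    mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run():
        mlflow.log_params(flatten_params(params))
        for name, value in metrics.items():
            mlflow.log_metric(name, value)
```

Called from the Snowflake notebook, `log_run` would record the run on the SageMaker tracking server, where it can later be compared against other runs.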
Prerequisites
Create or configure the following resources and confirm access to them before setting up Amazon SageMaker managed MLflow:
- A Snowflake account
- An Amazon Simple Storage Service (Amazon S3) bucket to store MLflow experiment artifacts
- An Amazon SageMaker Studio domain
- An AWS Identity and Access Management (IAM) role that serves as an Amazon SageMaker domain execution role in the AWS account
- A new user with permission to access the S3 bucket created above; follow these steps
- Confirm access to an AWS account through the AWS Management Console and AWS Command Line Interface (AWS CLI). The IAM user must have permissions to make the necessary AWS service calls and manage the AWS resources mentioned in this post. When granting permissions to the IAM user, follow the principle of least privilege.
- Configure access to the Amazon S3 bucket created above by following these steps.
- Follow these steps to set up external access for Snowflake Notebooks.
Steps to call SageMaker's MLflow Tracking Server from Snowflake
We now set up the Snowflake environment and connect it to the Amazon SageMaker MLflow Tracking Server.
- Follow these steps to create an Amazon SageMaker managed MLflow Tracking Server in Amazon SageMaker Studio.
- Log in to Snowflake as an admin user.
- Create a new Notebook in Snowflake
- Projects > Notebooks > +Notebook
- Switch to a non-admin role
- Give it a name, select a database (DB), schema, and warehouse, and choose 'Run on container'

- Notebook settings > External access > toggle on to allow all integrations
- Install libraries
!pip install sagemaker-mlflow
- Run the MLflow code, replacing the ARN value in the code below:
Figure 3: Install the sagemaker-mlflow library
Figure 4: Configure MLflow and run experiments
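A minimal sketch of this step is shown below. The tracking-server ARN is a placeholder you must replace with your own, and the experiment, run, parameter, and metric names are illustrative; the `sagemaker-mlflow` plugin allows MLflow to accept the tracking server ARN directly as a tracking URI:

```python
import re

# Placeholder -- replace with the ARN of the tracking server created earlier.
TRACKING_SERVER_ARN = (
    "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/snowpark-demo"
)

def looks_like_tracking_server_arn(arn: str) -> bool:
    """Loose sanity check on the ARN format before handing it to MLflow."""
    return bool(
        re.match(r"^arn:aws:sagemaker:[a-z0-9-]+:\d{12}:mlflow-tracking-server/.+$", arn)
    )

def run_demo_experiment(arn: str) -> None:
    """Point MLflow at the SageMaker tracking server and log a toy run."""
    import mlflow  # requires `pip install sagemaker-mlflow`

    mlflow.set_tracking_uri(arn)
    mlflow.set_experiment("snowflake-snowpark-demo")
    with mlflow.start_run(run_name="baseline"):
        mlflow.log_param("model_type", "linear")
        mlflow.log_metric("rmse", 0.42)
```

Calling `run_demo_experiment(TRACKING_SERVER_ARN)` from the Snowflake notebook requires the external access integration and AWS credentials configured in the prerequisites.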
After a successful run, the experiment can be tracked in Amazon SageMaker:
Figure 5: Track experiments in SageMaker MLflow
To view the details of an experiment, choose the respective "Run name":
Figure 6: Explore detailed experiment insights
Clean up
Follow these steps to clean up the resources we configured in this post to help avoid ongoing costs.
- Delete the SageMaker Studio domain by following these steps; this also deletes the MLflow tracking server
- Delete the S3 bucket and its contents
- Drop the Snowflake notebook
- Verify that the Amazon SageMaker domain is deleted
Conclusion
In this post, we explored how Amazon SageMaker managed MLflow can provide a comprehensive solution for managing the machine learning lifecycle. The integration with Snowflake through Snowpark further enhances this solution, helping enable seamless data processing and model deployment workflows.
To get started, follow the step-by-step instructions above to set up an MLflow Tracking Server in Amazon SageMaker Studio and integrate it with Snowflake. Remember to follow AWS security best practices by implementing proper IAM roles and permissions and securing all credentials appropriately.
The code samples and instructions in this post serve as a starting point; they can be adapted to specific use cases and requirements while maintaining security and scalability best practices.
About the authors
Ankit Mathur is a Solutions Architect at AWS focused on modern data platforms, AI-driven analytics, and AWS–Partner integrations. He helps customers and partners design secure, scalable architectures that deliver measurable business outcomes.
Mark Hoover is a Senior Solutions Architect at AWS, where he is focused on helping customers build their ideas in the cloud. He has partnered with many enterprise clients to translate complex business strategies into innovative solutions that drive long-term growth.

