Simplify access control and auditing for Amazon SageMaker Studio using trusted identity propagation

AWS supports trusted identity propagation, a feature that allows AWS services to securely propagate a user’s identity across service boundaries. With trusted identity propagation, you have fine-grained access controls based on a physical user’s identity rather than relying on IAM roles. This integration allows for the implementation of access control through services such as Amazon S3 Access Grants and maintains detailed audit logs of user actions across supported AWS services such as Amazon EMR. Furthermore, it supports long-running user background sessions for training jobs, so you can log out of your interactive ML application while the background job continues to run.

Amazon SageMaker Studio now supports trusted identity propagation, offering a powerful solution for enterprises seeking to enhance their ML system security. By integrating trusted identity propagation with SageMaker Studio, organizations can simplify access management by granting permissions to existing AWS IAM Identity Center identities.

In this post, we explore how to enable and use trusted identity propagation in SageMaker Studio, demonstrating its benefits through practical use cases and implementation guidelines. We walk through the setup process, discuss key considerations, and showcase how this feature can transform your organization’s approach to security and access controls.

Solution overview

In this section, we review the architecture for the proposed solution and the steps to enable trusted identity propagation for your SageMaker Studio domain.

The following diagram shows the interaction between the different components that allow the user’s identity to propagate from their identity provider and IAM Identity Center to downstream services such as Amazon EMR and Amazon Athena.

With a trusted identity propagation-enabled SageMaker Studio domain, users can access data across supported AWS services using their end user identity and group membership, in addition to access allowed by their domain or user execution role. In addition, API calls from SageMaker Studio notebooks and supported AWS services and Amazon SageMaker AI features log the user identity in AWS CloudTrail. For a list of supported AWS services and SageMaker AI features, see Trusted identity propagation architecture and compatibility. In the following sections, we show how to enable trusted identity propagation for your domain.

This solution applies to SageMaker Studio domains set up using IAM Identity Center as the method of authentication. If your domain is set up using IAM, see Implement user-level access control for multi-tenant ML platforms on Amazon SageMaker AI for best practices on managing and scaling access control.

Prerequisites

To follow along with this post, you must have the following:

An AWS account with an organization instance of IAM Identity Center configured through AWS Organizations
Administrator permissions (or elevated permissions allowing modification of IAM principals, and SageMaker administrator access to create and update domains)

Create or update the SageMaker execution role

For trusted identity propagation to work, the SageMaker execution role (domain and user profile execution role) should allow the sts:SetContext permission, in addition to sts:AssumeRole, in its trust policy. For a new SageMaker AI domain, create a domain execution role by following the instructions in Create execution role. For existing domains, follow the instructions in Get your execution role to find the user or domain’s execution role.

Next, to update the trust policy for the role, complete the following steps:

In the navigation pane of the IAM console, choose Roles.
In the list of roles in your account, choose the domain or user execution role.
On the Trust relationships tab, choose Edit trust policy.
Update the trust policy with the following statement:

{
  "Version": "2012-10-17",
  "Statement": [
     .....
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "sagemaker.amazonaws.com"
        ]
      },
      "Action": [
        "sts:AssumeRole",
        "sts:SetContext"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": ""
        }
      }
    }
  ]
}

Choose Update policy to save your changes.
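
The same trust policy change could also be scripted instead of made in the console. The following is a minimal sketch under stated assumptions, not the method this post prescribes: it uses the Boto3 IAM calls get_role and update_assume_role_policy, and the helper names add_set_context and update_role_trust_policy are hypothetical. It only adds the two STS actions to statements that already trust the SageMaker service principal and leaves conditions untouched.

```python
import copy
import json

REQUIRED_ACTIONS = ("sts:AssumeRole", "sts:SetContext")


def add_set_context(trust_policy: dict,
                    service: str = "sagemaker.amazonaws.com") -> dict:
    """Return a copy of trust_policy in which every Allow statement that
    trusts `service` permits both sts:AssumeRole and sts:SetContext."""
    policy = copy.deepcopy(trust_policy)
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example a "*" principal; not handled here
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if statement.get("Effect") != "Allow" or service not in services:
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in REQUIRED_ACTIONS:
            if action not in actions:
                actions.append(action)
        statement["Action"] = actions
    return policy


def update_role_trust_policy(role_name: str) -> None:
    """Fetch the role's trust policy, add sts:SetContext, and write it back."""
    import boto3  # imported lazily so add_set_context has no AWS dependency

    iam = boto3.client("iam")
    current = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(add_set_context(current)),
    )
```

Keeping the policy transformation separate from the API call makes it possible to review the resulting document before writing it to the role.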

Trusted identity propagation only works for private spaces at the time of launch.

Create a SageMaker AI domain with trusted identity propagation enabled

SageMaker AI domains using IAM Identity Center for authentication can only be set up in the same AWS Region as the IAM Identity Center instance. To create a new SageMaker domain, follow the steps in Use custom setup for Amazon SageMaker AI. For Trusted identity propagation, select Enable trusted identity propagation for all users on this domain, and continue with the rest of the setup to create a domain and assign users and groups, choosing the role you created in the previous step.

Update an existing SageMaker AI domain

You can also update your existing SageMaker AI domain to enable trusted identity propagation. You can enable trusted identity propagation even while the domain or user has active SageMaker Studio applications. However, for the changes to be applied, the active applications must be restarted. You can use the EffectiveTrustedIdentityPropagationStatus field in the response to the DescribeApp API for running applications to determine if the application has trusted identity propagation enabled.
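
As a rough illustration of that check, the following sketch calls the Boto3 SageMaker describe_app operation and reads the EffectiveTrustedIdentityPropagationStatus field named above. The helper names are hypothetical, and the "ENABLED" value is an assumption to confirm against the DescribeApp API reference.

```python
def tip_enabled(describe_app_response: dict) -> bool:
    """Return True if a DescribeApp response reports trusted identity
    propagation as enabled for the application."""
    status = describe_app_response.get("EffectiveTrustedIdentityPropagationStatus")
    # "ENABLED" is an assumed value; confirm it in the API reference.
    return status == "ENABLED"


def check_app(domain_id: str, space_name: str,
              app_type: str = "JupyterLab", app_name: str = "default") -> bool:
    """Describe a running app in a private space and report its status."""
    import boto3  # imported lazily so tip_enabled has no AWS dependency

    sagemaker = boto3.client("sagemaker")
    response = sagemaker.describe_app(
        DomainId=domain_id,
        SpaceName=space_name,
        AppType=app_type,
        AppName=app_name,
    )
    return tip_enabled(response)
```

An application that still reports the feature as not enabled after the domain update likely needs to be restarted, as described above.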

To enable trusted identity propagation for the domain using the SageMaker AI console, choose Edit under Authentication and permissions on the Domain settings tab.

For Trusted identity propagation, select Enable trusted identity propagation for all users on this domain, and choose Submit to save the changes.

(Optional) Update user background session configuration in IAM Identity Center

IAM Identity Center now supports running user background sessions, and the session duration is set by default to 7 days. With background sessions, users can launch long-running SageMaker training jobs that assume the user’s identity context along with the SageMaker execution role. As an administrator, you can enable or disable user background sessions, and modify the session duration for user background sessions. As of the time of writing, the maximum session duration that you can set for user background sessions is 90 days. The user’s session is stopped at the end of the specified duration, and consequently, the training job will also fail at the end of the session duration.

To disable or update the session duration, navigate to the IAM Identity Center console, choose Settings in the navigation pane, and choose Configure under Session duration.

For User background sessions, select Enable user background sessions and use the dropdown to change the session duration. If user background sessions are disabled, the user must be logged in for the duration of the training job; otherwise, the training job will fail once the user logs out. Updating this configuration doesn’t affect existing running sessions and only applies to newly created user background sessions. Choose Save to save your settings.

Use cases

Imagine you’re an enterprise with hundreds or even thousands of users, each requiring varying levels of access to data across multiple teams. You’re responsible for maintaining an AI/ML system on SageMaker AI and managing access permissions across various data sources such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and AWS Lake Formation. Traditionally, this has involved maintaining complex IAM policies for users, services, and resources, including bucket policies where applicable. This approach is not only tedious but also makes it challenging to track and audit data access without maintaining a separate role for each user.

This is precisely the scenario that trusted identity propagation aims to address. With trusted identity propagation support, you can now maintain service-specific roles with minimal permissions, such as s3:GetDataAccess or LakeFormation:GetDataAccess, along with additional permissions to start jobs, view job statuses, and perform other necessary tasks. For data access, you can assign fine-grained policies directly to individual users. For instance, Jane might have read access to customer data and full access to sales and pricing data, whereas Laura might only have read access to sales trends. Both Jane and Laura can assume the same SageMaker AI role to access their SageMaker Studio applications, while maintaining separate data access permissions based on their individual identities.

In the following sections, we explore how this can be achieved for common use cases, demonstrating the power and flexibility of trusted identity propagation in simplifying data access management while maintaining strong security and auditability.

Scenario 1: Experiment with Amazon S3 data in notebooks

S3 Access Grants provide a simplified way to manage data access at scale. Unlike traditional IAM roles and policies that require a detailed knowledge of IAM concepts, and frequent policy updates as new resources are added, with S3 Access Grants, you can define access to data based on familiar database-like grants that automatically scale with your data. This approach significantly reduces the operational overhead of managing thousands of IAM policies and bucket policies, and overcomes the limitations of IAM permissions, while strengthening security through access patterns. If you don’t have S3 Access Grants set up, see Create an S3 Access Grant instance to get started. For detailed architecture and use cases, you can also refer to Scaling data access with Amazon S3 Access Grants. After you have set up S3 Access Grants, you can grant access to your datasets to users based on their identity in IAM Identity Center.
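
To make the idea of a grant tied to an IAM Identity Center identity more concrete, here is a hedged sketch of how an administrator might create one with the Boto3 s3control create_access_grant operation. The helper names are hypothetical, and the account ID, location ID, prefix, and user ID are placeholders to be supplied by the caller.

```python
VALID_PERMISSIONS = ("READ", "WRITE", "READWRITE")


def build_grant_request(account_id: str, location_id: str, sub_prefix: str,
                        identity_center_user_id: str,
                        permission: str = "READ") -> dict:
    """Build the keyword arguments for s3control.create_access_grant for a
    single IAM Identity Center user."""
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"permission must be one of {VALID_PERMISSIONS}")
    return {
        "AccountId": account_id,
        "AccessGrantsLocationId": location_id,
        "AccessGrantsLocationConfiguration": {"S3SubPrefix": sub_prefix},
        "Grantee": {
            # DIRECTORY_USER targets an IAM Identity Center user;
            # DIRECTORY_GROUP would target a group instead.
            "GranteeType": "DIRECTORY_USER",
            "GranteeIdentifier": identity_center_user_id,
        },
        "Permission": permission,
    }


def create_grant(**kwargs) -> dict:
    """Create the grant; requires administrator credentials for the account."""
    import boto3  # imported lazily so build_grant_request has no AWS dependency

    s3control = boto3.client("s3control")
    return s3control.create_access_grant(**build_grant_request(**kwargs))
```

Granting to a group rather than to individual users usually keeps the number of grants smaller as teams grow.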

To use S3 Access Grants from SageMaker Studio, update the following IAM roles with policies and trust policies.

For the domain or user execution role, add the following inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDataAccessAPI",
            "Effect": "Allow",
            "Action": [
                "s3:GetDataAccess"
            ],
            "Resource": [
                "arn:aws:s3:::access-grants/default"
            ]
        },
        {
            "Sid": "RequiredForTIP",
            "Effect": "Allow",
            "Action": "sts:SetContext",
            "Resource": "arn:aws:iam:::role/"
        }
    ]
}

Make sure that the S3 Access Grants role’s trust policy allows the sts:SetContext action in addition to sts:AssumeRole. The following is a sample trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "access-grants.s3.amazonaws.com"
                ]
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetContext"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceArn": "arn:aws:s3:::access-grants/default"
                }
            }
        }
    ]
}

Now, the user can access the data as allowed by S3 Access Grants for their user identity by calling the GetDataAccess API to return temporary credentials, and by using the temporary credentials to read or write to their prefixes. For example, the following code shows how to use Boto3 to get temporary credentials and use them to access Amazon S3 locations that are allowed by S3 Access Grants:

import boto3

def get_access_grant_credentials(account_id: str, target: str,
                                 permission: str = "READ"):
    # Request temporary credentials scoped to the target by S3 Access Grants
    s3control = boto3.client('s3control')
    response = s3control.get_data_access(
        AccountId=account_id,
        Target=target,
        Permission=permission
    )
    return response['Credentials']

def create_s3_client_from_credentials(credentials):
    # Build an S3 client that signs requests with the temporary credentials
    return boto3.client(
        's3',
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken']
    )

# Create client
credentials = get_access_grant_credentials('',
                                           "s3:////")
s3 = create_s3_client_from_credentials(credentials)

# Will succeed (prefix covered by the grant)
s3.list_objects(Bucket="", Prefix="")

# Will fail (prefix not covered by the grant)
s3.list_objects(Bucket="", Prefix="")
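
The credentials returned by GetDataAccess are temporary, so a notebook that reuses them for a long time may want to check whether they are about to expire before making further calls. The following is a small sketch of that check; the helper name credentials_expired and the five-minute safety margin are assumptions, and it relies on Boto3 returning the Expiration field as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def credentials_expired(credentials: dict,
                        margin: timedelta = timedelta(minutes=5),
                        now: datetime = None) -> bool:
    """Return True if the temporary credentials expire within `margin`.

    `credentials` is the 'Credentials' dict from get_data_access, whose
    'Expiration' value is a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return credentials["Expiration"] - margin <= now
```

When this returns True, calling GetDataAccess again and rebuilding the S3 client gives the notebook a fresh set of credentials.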

Scenario 2: Access Lake Formation through Athena

Lake Formation provides centralized governance and fine-grained access control management for data stored in Amazon S3 and metadata in the AWS Glue Data Catalog. The Lake Formation permission model operates in conjunction with IAM permissions, offering granular controls at the database, table, column, row, and cell levels. This dual-layer security model provides comprehensive data governance while maintaining flexibility in access patterns.

Data governed by Lake Formation can be accessed through various AWS analytics services. In this scenario, we demonstrate using Athena, a serverless query engine that integrates seamlessly with Lake Formation’s permission model. For other services like Amazon EMR on EC2, make sure the resource is configured to support trusted identity propagation, including setting up security configurations and making sure the EMR cluster is configured with IAM roles that support trusted identity propagation.

The following instructions assume that you have already set up Lake Formation. If not, see Set up AWS Lake Formation and follow the AWS Lake Formation tutorials to set up Lake Formation and bring in your data.

Complete the following steps to access your governed data in trusted identity propagation-enabled SageMaker Studio notebooks using Athena:

Integrate Lake Formation with IAM Identity Center by following the instructions in Integrating IAM Identity Center. At a high level, this includes creating an IAM role that allows creating and updating application configurations in Lake Formation and IAM Identity Center, and providing the single sign-on (SSO) instance ID.
Grant permissions to the IAM Identity Center user on the relevant resources (database, table, row or column) using Lake Formation. See the instructions in Granting permissions on Data Catalog resources.
Create an Athena workgroup that supports trusted identity propagation by following the instructions in Create a workgroup and choosing IAM Identity Center as the method of authentication. Make sure that the user has access to write to the query results location provided here using S3 Access Grants, because Athena uses access grants by default when you choose IAM Identity Center as the authentication method.
Update the Athena workgroup’s IAM role with the following trust policy (add sts:SetContext to the existing trust policy). You can find the IAM role by choosing the workgroup you created earlier and looking for Role name.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AthenaTrustPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "athena.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetContext"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": ""
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:athena:::workgroup/"
                }
            }
        }
    ]
}

The setup is now complete. You can now launch SageMaker Studio using an IAM Identity Center user, launch a JupyterLab or Code Editor application, and query the database. See the following example code to get started:

import time
import boto3
import pandas as pd

athena_client = boto3.client("athena")

database = ""
table = ""
query = f"SELECT * FROM {database}.{table}"
output_location = "s3:///queries"  # bucket name and location from Step 3

response = athena_client.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': database},
    ResultConfiguration={'OutputLocation': output_location}
)

# Get the query execution ID
query_execution_id = response['QueryExecutionId']

# Wait for the query to complete
while True:
    query_status = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
    status = query_status['QueryExecution']['Status']['State']
    if status in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
        break
    time.sleep(1)

# If the query succeeded, fetch and display results
if status == 'SUCCEEDED':
    results = athena_client.get_query_results(QueryExecutionId=query_execution_id)

    # Extract column names and data
    columns = [col['Name'] for col in results['ResultSet']['ResultSetMetadata']['ColumnInfo']]
    data = []
    for row in results['ResultSet']['Rows'][1:]:  # Skip the header row
        data.append([field.get('VarCharValue', '') for field in row['Data']])

    # Create a pandas DataFrame
    df = pd.DataFrame(data, columns=columns)

    # Show the first few rows
    print(df.head())
else:
    print(f"Query failed with status: {status}")
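
A single GetQueryResults call returns at most 1,000 rows, so the example above sees only the first page of a larger result. The following sketch, offered as one possible extension rather than part of the original walkthrough, flattens every page using the Boto3 get_query_results paginator. The helper names rows_from_pages and fetch_all are hypothetical.

```python
def rows_from_pages(pages):
    """Flatten GetQueryResults pages into (columns, rows).

    For a SELECT query the header row appears only as the first row of the
    first page, so it is dropped there and nowhere else.
    """
    columns, rows = [], []
    for page_number, page in enumerate(pages):
        result_set = page["ResultSet"]
        if not columns:
            columns = [col["Name"]
                       for col in result_set["ResultSetMetadata"]["ColumnInfo"]]
        page_rows = result_set["Rows"]
        if page_number == 0:
            page_rows = page_rows[1:]  # skip the header row
        for row in page_rows:
            rows.append([field.get("VarCharValue", "") for field in row["Data"]])
    return columns, rows


def fetch_all(athena_client, query_execution_id: str):
    """Read every page of results for a completed query."""
    paginator = athena_client.get_paginator("get_query_results")
    return rows_from_pages(paginator.paginate(QueryExecutionId=query_execution_id))
```

The returned columns and rows can be passed to pd.DataFrame(rows, columns=columns) in the same way as in the single-page example.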

Scenario 3: Create a training job supported with user background sessions

For a trusted identity propagation-enabled domain, a user background session is a session that continues to run even when the end user has logged out of their interactive session, such as a JupyterLab application in SageMaker Studio. For example, the user can initiate a training job from their SageMaker Studio space, and the job can run in the background for days or weeks regardless of the user’s activity, and use the user’s identity to access data and log audit trails. If your domain doesn’t have trusted identity propagation enabled, you can continue to run training jobs and processing jobs as before; however, if trusted identity propagation is enabled, make sure your user background session time is updated to reflect the duration of your training jobs, because the default is set automatically to 7 days. If you have enabled user background sessions, update your SageMaker Studio domain or user’s execution role with the following permissions to provide a seamless experience for data scientists:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDataAccessAPI",
            "Effect": "Allow",
            "Action": [
                "s3:GetDataAccess",
                "s3:GetAccessGrantsInstanceForPrefix"
            ],
            "Resource": [
                "arn:aws:s3:::access-grants/default"
            ]
        },
        {
            "Sid": "RequiredForTIP",
            "Effect": "Allow",
            "Action": "sts:SetContext",
            "Resource": "arn:aws:iam:::role/"
        }
    ]
}

With this setup, a data scientist can use an Amazon S3 location that they have access to through S3 Access Grants. SageMaker automatically looks for data access using S3 Access Grants and falls back to the job’s IAM role otherwise. For example, in the following SDK call to create the training job, the user provides the Amazon S3 URI where the data is stored, they have access to it through S3 Access Grants, and they can run this job without additional setup:

response = sm.create_training_job(
    TrainingJobName=training_job_name,
    AlgorithmSpecification={
        'TrainingImage': '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04',
        'TrainingInputMode': 'File',
        # ...
    },
    RoleArn='arn:aws:iam:::role/tip-domain-role',
    InputDataConfig=[
        {
            'ChannelName': 'training',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': 's3:///',
                    'S3DataDistributionType': 'FullyReplicated'
                }
            },
            'CompressionType': 'None',
            'RecordWrapperType': 'None'
        },
        # ...
    ],
    # ...
)
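
Because a training job fails when its user background session ends, it can help to compare a job's maximum runtime with the configured session duration before submitting it. The following is a simple sketch of that arithmetic. The function name fits_in_session is hypothetical, the 7-day default and 90-day maximum come from the session settings described earlier, and the check ignores any time the session has already been running before the job starts.

```python
from datetime import timedelta

DEFAULT_SESSION_DURATION = timedelta(days=7)   # default described in this post
MAX_SESSION_DURATION = timedelta(days=90)      # maximum at the time of writing


def fits_in_session(max_runtime_seconds: int,
                    session_duration: timedelta = DEFAULT_SESSION_DURATION) -> bool:
    """Return True if a job limited to max_runtime_seconds (for example the
    StoppingCondition MaxRuntimeInSeconds value) can finish within the user
    background session duration."""
    if session_duration > MAX_SESSION_DURATION:
        raise ValueError("session duration exceeds the 90-day maximum")
    return timedelta(seconds=max_runtime_seconds) <= session_duration
```

For example, a job capped at 10 days does not fit in the default 7-day session, so an administrator would need to raise the session duration before the job is launched.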

(Optional) View and manage user background sessions on IAM Identity Center

When training jobs are run as user background sessions, you can view these sessions as user background sessions on IAM Identity Center. The administrator can view a list of all user background sessions and optionally stop a session if the user has left the team, for example. When the user background session is ended, the training job subsequently fails.

To view a list of all user background sessions, on the IAM Identity Center console, choose Users and choose the user you want to view the user background sessions for. Choose the Active sessions tab to view a list of sessions. The user background session can be identified by the Session type column, which shows if the session is interactive or a user background session. The list also shows the job’s Amazon Resource Name (ARN) under the Used by column.

To end a session, select the session and choose End sessions.

You will be prompted to confirm the action. Enter confirm to confirm that you want to end the session and choose End sessions to stop the user background session.

Scenario 4: Auditing using CloudTrail

After trusted identity propagation is enabled for your domain, you can now track the user that performed specific actions through CloudTrail. To try this out, log in to SageMaker Studio, and create and open a JupyterLab space. Open a terminal and enter aws s3 ls to list the available buckets in your Region.

On the CloudTrail console, choose Event history in the navigation pane. Update the Lookup attributes to Event name and in the search box, enter ListBuckets. You should see a list of events, as shown in the following screenshot (it might take up to 5 minutes for the logs to be available in CloudTrail).

Choose the event to view its details (verify the user name is SageMaker if you have also listed buckets through the AWS console or APIs). In the event details, you should be able to see an additional field called onBehalfOf that has the user’s identity.

Supported services and SageMaker AI features called from a trusted identity propagation-enabled SageMaker Studio domain will have the onBehalfOf field in CloudTrail.
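
The same lookup can be done programmatically. The following is a hedged sketch that uses the Boto3 CloudTrail lookup_events operation and parses each event record for the onBehalfOf field. The helper names are hypothetical, and reading the field from the userIdentity element is an assumption based on the CloudTrail record format, so verify it against an event from your own account.

```python
import json


def on_behalf_of(cloudtrail_event: str):
    """Return the onBehalfOf element of a CloudTrail event record (a JSON
    string), or None if the call carried no propagated user identity."""
    record = json.loads(cloudtrail_event)
    # Assumption: onBehalfOf is nested under userIdentity in the record.
    return record.get("userIdentity", {}).get("onBehalfOf")


def list_bucket_callers(max_results: int = 50) -> list:
    """Look up recent ListBuckets events and return their onBehalfOf values."""
    import boto3  # imported lazily so on_behalf_of has no AWS dependency

    cloudtrail = boto3.client("cloudtrail")
    response = cloudtrail.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "ListBuckets"}
        ],
        MaxResults=max_results,
    )
    return [on_behalf_of(event["CloudTrailEvent"]) for event in response["Events"]]
```

Events that return None here were likely made without trusted identity propagation, for example calls made directly from the AWS console.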

Clean up

If you have created a SageMaker Studio domain for the purposes of trying out trusted identity propagation, delete the domain and its associated Amazon Elastic File System (Amazon EFS) volume to avoid incurring additional charges. Before deleting a domain, you must delete all the users and their associated spaces and applications. For detailed instructions, see Stop and delete your Studio running applications and spaces.

If you created SageMaker training jobs, they are ephemeral, and the compute is shut down automatically when each job is complete.

Athena is a serverless analytics service that bills per query. No cleanup is necessary, but as a best practice, delete the workgroup to remove unused resources.

Conclusion

In this post, we showed you how to enable trusted identity propagation for SageMaker AI domains that use IAM Identity Center as the mode of authentication. With trusted identity propagation, administrators can manage user authorization to other AWS services through the user’s physical identity in conjunction with IAM roles. Administrators can streamline permissions management by maintaining a single domain execution role and manage granular access to other AWS services and data sources through the user’s identity. In addition, trusted identity propagation supports auditing, so administrators can track user activity without the need for managing a role for each user profile.

To learn more about enabling this feature and its use cases, see Trusted identity propagation use cases and Trusted identity propagation with Studio. This post covered a subset of supported applications; we encourage you to check out the documentation and choose the services that best serve your use case and share your feedback!

About the authors

Amit Shyam Jaisinghani is a Software Engineer on the SageMaker Studio team at Amazon Web Services, and he earned his Master’s degree in Computer Science from Rochester Institute of Technology. Since joining Amazon in 2019, he has built and enhanced several AWS services, including AWS WorkSpaces and Amazon SageMaker Studio. Outside of work, he explores hiking trails, plays with his two cats, Missy and Minnie, and enjoys playing Age of Empire.

Durga Sury is a Senior Solutions Architect at Amazon SageMaker, where she helps enterprise customers build secure and scalable AI/ML systems. When she’s not architecting solutions, you can find her enjoying sunny walks with her dog, immersing herself in murder mystery books, or catching up on her favorite Netflix shows.

Khushboo Srivastava is a Senior Product Manager for Amazon SageMaker. She enjoys building products that simplify machine learning workflows for customers, and loves playing with her 1-year old daughter.

Krishnan Manivannan is a Senior Software Engineer at Amazon Web Services and a founding member of the SageMaker AI API team. He has 8 years of experience in the architecture and security of large-scale machine learning services. His specialties include API design, service scalability, identity and access management, and inventing new approaches for building and operating distributed systems. Krishnan has led multiple engineering efforts from design through global launch, delivering reliable and secure systems for customers worldwide.
