Automating AWS S3 Cross-Account Replication: A Comprehensive Guide
Introduction
In a multi-account AWS environment, managing data across accounts can be a complex task. AWS S3 Cross-Account Replication simplifies this by automatically replicating objects from a bucket in one AWS account to a bucket in another. This guide provides an overview of the concepts involved and presents a solution using Python scripts to automate the setup process.
The Problem
Ensuring data redundancy and disaster recovery in a multi-account setup can be challenging. Manual data synchronization is error-prone and inefficient, leading to potential data loss and extended recovery times. Cross-account replication addresses these issues by automating the process, ensuring data consistency and availability across accounts.
The Solution
Cross-account S3 replication uses IAM roles and bucket policies to securely replicate objects between S3 buckets in different AWS accounts. This approach not only automates data synchronization but also enhances data redundancy and disaster recovery capabilities.
For the successful replication of objects between the source and destination S3 buckets, the following requirements must be met:
- Enable AWS Regions: The source bucket owner must have the source and destination AWS Regions enabled for their account. Similarly, the destination bucket owner must have the destination Region enabled for their account. Enabling or disabling an AWS Region can be managed using the AWS Management Console or APIs.
- Enable Versioning: Both the source and destination buckets must have versioning enabled. Versioning keeps track of multiple versions of an object in the bucket, ensuring data integrity and facilitating the replication process. You can enable versioning for a bucket through the AWS Management Console or via the AWS CLI/APIs.
- Set Up Replication Permissions: Amazon S3 must have the necessary permissions to replicate objects from the source bucket to the destination bucket(s) on your behalf. These permissions involve configuring appropriate access policies and roles.
- Set Up Bucket Policy Permissions: The owner of the destination buckets must grant the owner of the source bucket the necessary permissions to replicate objects. This can be achieved by configuring a bucket policy that allows the source bucket owner to replicate objects to the destination bucket.
- Create a Batch Job to Copy Existing Objects: Replication rules apply only to objects created after the rules take effect. To replicate objects that already exist, a batch job is created in the source account; this job uses the defined replication rules to copy all eligible objects.
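The requirements above, and the scripts that follow, refer to buckets and roles by ARN throughout. As a small sketch (the account ID and names here are illustrative, not real resources), the three ARN formats involved can be built consistently like this:

```python
def bucket_arn(bucket_name):
    """ARN for the bucket itself (used in bucket-level permissions)."""
    return f"arn:aws:s3:::{bucket_name}"

def bucket_objects_arn(bucket_name):
    """ARN matching every object in a bucket (used in object-level permissions)."""
    return f"arn:aws:s3:::{bucket_name}/*"

def role_arn(account_id, role_name):
    """ARN of an IAM role in a given account (used as the replication role)."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"

# Illustrative values only
print(bucket_arn("my-source-bucket"))                  # arn:aws:s3:::my-source-bucket
print(role_arn("111122223333", "s3-replication-role"))  # arn:aws:iam::111122223333:role/s3-replication-role
```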
Concepts and Implementation
Let’s explore the key concepts and the Python script used to automate the setup of S3 cross-account replication.
Enabling Versioning
Versioning must be enabled on both source and destination buckets to support replication. This ensures that all versions of objects are replicated.
def enable_versioning(bucket_name, profile_name):
    s3_client = get_s3_client(profile_name)
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={'Status': 'Enabled'}
    )
Creating IAM Role for Replication in the Source AWS Account
An IAM role in the source account is required to handle replication tasks. This role needs specific permissions to interact with S3 buckets. The assume role policy document defines the trusted entities that can assume the role; in this case, they are the Amazon S3 service and the S3 Batch Operations service.
def create_iam_role(role_name, profile_name):
    iam_client = get_iam_client(profile_name)
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": ["s3.amazonaws.com", "batchoperations.s3.amazonaws.com"]
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        })
    )
Attaching Policies to IAM Role in the Source AWS Account
Policies need to be attached to the IAM role to grant it the necessary permissions for accessing and replicating objects. The role additionally needs access to the completion_report_bucket, where the results of the batch operations are stored.
def attach_policy_to_iam_role(role_name, policy_name, destination_bucket, source_bucket, completion_report_bucket, profile_name):
    print(f"Attaching policy to the IAM role in {profile_name}")
    iam_client = get_iam_client(profile_name)
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetReplicationConfiguration",
                    "s3:ListBucket",
                    "s3:PutInventoryConfiguration"
                ],
                "Resource": f"arn:aws:s3:::{source_bucket}"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                    "s3:InitiateReplication"
                ],
                "Resource": f"arn:aws:s3:::{source_bucket}/*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ReplicateTags"
                ],
                "Resource": f"arn:aws:s3:::{destination_bucket}/*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject"
                ],
                "Resource": f"arn:aws:s3:::{completion_report_bucket}/*"
            }
        ]
    }
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document)
    )
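Before attaching a policy like the one above, it can be useful to sanity-check it locally. The following is a sketch, not an AWS API call: it checks only exact action/resource matches and does not evaluate wildcards, conditions, or Deny statements the way IAM does.

```python
def policy_allows(policy_document, action, resource):
    """Return True if any Allow statement grants `action` on `resource` (exact match only)."""
    for statement in policy_document.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if action in actions and resource in resources:
            return True
    return False

# Example with an illustrative bucket name
doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:ReplicateObject", "s3:ReplicateDelete"],
        "Resource": "arn:aws:s3:::replicated-my-bucket/*",
    }],
}
print(policy_allows(doc, "s3:ReplicateObject", "arn:aws:s3:::replicated-my-bucket/*"))  # True
```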
Adding Bucket Policies to the Destination Bucket
Bucket policies grant the source account’s IAM role the necessary permissions to access the destination bucket.
from botocore.exceptions import ClientError

def add_bucket_policy(bucket_name, role_name, profile_name, source_account_id):
    print(f"Adding bucket policy to the destination bucket {bucket_name} in {profile_name}...")
    s3_client = get_s3_client(profile_name)
    # Get the current bucket policy; start from an empty one if none exists
    try:
        current_policy = json.loads(s3_client.get_bucket_policy(Bucket=bucket_name)['Policy'])
    except ClientError:
        current_policy = {'Version': '2012-10-17', 'Statement': []}
    # New policy statements
    new_statements = [
        {
            "Sid": "Permissions on objects",
            "Effect": "Allow",
            "Principal": {
                "AWS": f"arn:aws:iam::{source_account_id}:role/{role_name}"
            },
            "Action": [
                "s3:ReplicateDelete",
                "s3:ReplicateObject",
                "s3:ReplicateTags"
            ],
            "Resource": f"arn:aws:s3:::{bucket_name}/*"
        },
        {
            "Sid": "Permissions on bucket",
            "Effect": "Allow",
            "Principal": {
                "AWS": f"arn:aws:iam::{source_account_id}:role/{role_name}"
            },
            "Action": [
                "s3:List*",
                "s3:GetBucketVersioning",
                "s3:PutBucketVersioning"
            ],
            "Resource": f"arn:aws:s3:::{bucket_name}"
        }
    ]
    # Add the new statements to the existing policy
    current_policy['Statement'].extend(new_statements)
    # Apply the updated policy to the bucket
    s3_client.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(current_policy)
    )
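One caveat with merging statements this way: running the setup twice appends duplicate statements to the bucket policy. A small helper (a sketch; the Sid values match the ones used in this script) can drop same-Sid statements before extending, making the update idempotent:

```python
def merge_statements(current_policy, new_statements):
    """Replace any existing statements that share a Sid with the new ones, then append."""
    new_sids = {s["Sid"] for s in new_statements if "Sid" in s}
    kept = [s for s in current_policy.get("Statement", [])
            if s.get("Sid") not in new_sids]
    current_policy["Statement"] = kept + new_statements
    return current_policy

# Re-running with the same Sids does not duplicate statements
policy = {"Statement": [{"Sid": "Permissions on objects", "Effect": "Allow"}]}
merged = merge_statements(policy, [
    {"Sid": "Permissions on objects", "Effect": "Allow"},
    {"Sid": "Permissions on bucket", "Effect": "Allow"},
])
print(len(merged["Statement"]))  # 2
```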
Creating and Applying Replication Configuration in Source AWS Account
A replication configuration defines the rules for how objects are replicated between the source and destination buckets.
def apply_replication_configuration(bucket_name, destination_bucket, profile_name):
    print("Applying replication configuration to the source bucket")
    s3_client = get_s3_client(profile_name)
    s3_client.put_bucket_replication(
        Bucket=bucket_name,
        ReplicationConfiguration={
            'Role': f"arn:aws:iam::{SOURCE_AWS_ACCOUNT_ID}:role/{ROLE_NAME}",
            'Rules': [
                {
                    'ID': 'ReplicationRule',
                    'Status': 'Enabled',
                    'Prefix': '',
                    'Destination': {
                        'Bucket': f"arn:aws:s3:::{destination_bucket}",
                        'StorageClass': 'STANDARD_IA'
                    }
                }
            ]
        }
    )
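The same rule can also be built as a standalone dictionary, which is handy if you prefer to keep it in a versioned file such as the replication-config.json referenced in the master script below (a sketch; the account ID, role, and bucket names are placeholders):

```python
import json

def build_replication_config(source_account_id, role_name, destination_bucket,
                             storage_class="STANDARD_IA"):
    """Build the ReplicationConfiguration dict expected by put_bucket_replication."""
    return {
        "Role": f"arn:aws:iam::{source_account_id}:role/{role_name}",
        "Rules": [{
            "ID": "ReplicationRule",
            "Status": "Enabled",
            "Prefix": "",
            "Destination": {
                "Bucket": f"arn:aws:s3:::{destination_bucket}",
                "StorageClass": storage_class,
            },
        }],
    }

config = build_replication_config("111122223333", "s3-replication-role", "replicated-my-bucket")
print(json.dumps(config, indent=2))
```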
Creating Batch Job for Existing Objects
To replicate existing objects, a batch job is created. This job uses the defined replication rules to copy all eligible objects.
def create_batch_job_to_copy_existing_objects(profile_name, source_aws_account_id, source_bucket, report_bucket, role_name):
    report_folder = "report/"
    token = str(uuid.uuid4())
    boto3_session = boto3.Session(profile_name=profile_name)
    s3control_client = boto3_session.client('s3control', region_name=boto3_session.region_name)
    response = s3control_client.create_job(
        AccountId=source_aws_account_id,
        ConfirmationRequired=False,
        Operation={
            'S3ReplicateObject': {}
        },
        Report={
            'Bucket': f"arn:aws:s3:::{report_bucket}",
            'Format': 'Report_CSV_20180820',
            'Enabled': True,
            'Prefix': report_folder,
            'ReportScope': 'AllTasks'
        },
        ClientRequestToken=token,
        ManifestGenerator={
            'S3JobManifestGenerator': {
                'ExpectedBucketOwner': source_aws_account_id,
                'SourceBucket': f"arn:aws:s3:::{source_bucket}",
                'ManifestOutputLocation': {
                    'ExpectedManifestBucketOwner': source_aws_account_id,
                    'Bucket': f"arn:aws:s3:::{report_bucket}",
                    'ManifestPrefix': 'manifest/',
                    'ManifestEncryption': {
                        'SSES3': {}
                    },
                    'ManifestFormat': 'S3InventoryReport_CSV_20211130'
                },
                'Filter': {
                    'EligibleForReplication': True
                },
                'EnableManifestOutput': True
            }
        },
        Priority=1,
        RoleArn=f"arn:aws:iam::{source_aws_account_id}:role/{role_name}"
    )
    print(response)
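Once the job finishes, the completion report lands in the report bucket as CSV. The following sketch tallies task outcomes from such a report; it assumes the Report_CSV_20180820 layout with no header row and the task status in the fourth column, so verify the column order against an actual report before relying on it.

```python
import csv
import io
from collections import Counter

def summarize_completion_report(report_csv):
    """Tally task outcomes from a batch-operations completion report.

    Assumes the Report_CSV_20180820 layout: no header row, task status
    in the fourth column. Verify against your actual report files.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(report_csv)):
        if len(row) >= 4:
            counts[row[3]] += 1
    return counts

# Illustrative rows in the assumed layout
sample = (
    "my-bucket,photos/a.png,3HL4kqtJlcpXroDTDmJ,succeeded,,200,\n"
    "my-bucket,photos/b.png,3HL4kqtJlcpXroDTDmK,failed,PermanentFailure,403,Access Denied\n"
)
print(summarize_completion_report(sample))  # Counter({'succeeded': 1, 'failed': 1})
```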
Automating the Setup with a Master Script
To further simplify the process, we can create a master script that sets up replication for all buckets in the source account. This script dynamically retrieves bucket names and applies the necessary steps for each.
Master Script
Here’s a Python script that automates the setup of S3 cross-account replication for all buckets in the source account.
import boto3
import json
import uuid
# Variables
SOURCE_AWS_PROFILE = "<SOURCE_AWS_PROFILE>"
DESTINATION_AWS_PROFILE = "<DESTINATION_AWS_PROFILE>"
DESTINATION_AWS_ACCOUNT_ID = "<DESTINATION_AWS_ACCOUNT_ID>"
SOURCE_AWS_ACCOUNT_ID = "<SOURCE_AWS_ACCOUNT_ID>"
ROLE_NAME = "s3-replication-role"
POLICY_NAME = "s3-replication-policy"
COMPLETION_REPORT_BUCKET = "replication-completion-reports"
REPLICATION_CONFIG = "replication-config.json"
def get_s3_client(profile_name):
    return boto3.Session(profile_name=profile_name).client('s3')

def get_iam_client(profile_name):
    return boto3.Session(profile_name=profile_name).client('iam')

def list_buckets(profile_name):
    s3_client = get_s3_client(profile_name)
    response = s3_client.list_buckets()
    return [bucket['Name'] for bucket in response['Buckets']]
def create_destination_bucket(bucket_name, profile_name):
    s3_client = get_s3_client(profile_name)
    region = boto3.Session(profile_name=profile_name).region_name
    # us-east-1 rejects an explicit LocationConstraint
    if region == 'us-east-1':
        s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={'LocationConstraint': region}
        )
def check_role_exists(role_name, profile_name):
    iam_client = get_iam_client(profile_name)
    try:
        iam_client.get_role(RoleName=role_name)
        return True
    except iam_client.exceptions.NoSuchEntityException:
        return False
def main():
    source_buckets = list_buckets(SOURCE_AWS_PROFILE)
    for bucket in source_buckets:
        destination_bucket = f"replicated-{bucket}"
        create_destination_bucket(destination_bucket, DESTINATION_AWS_PROFILE)
        enable_versioning(bucket, SOURCE_AWS_PROFILE)
        enable_versioning(destination_bucket, DESTINATION_AWS_PROFILE)
        if not check_role_exists(ROLE_NAME, SOURCE_AWS_PROFILE):
            create_iam_role(ROLE_NAME, SOURCE_AWS_PROFILE)
        attach_policy_to_iam_role(ROLE_NAME, POLICY_NAME, destination_bucket, bucket, COMPLETION_REPORT_BUCKET, SOURCE_AWS_PROFILE)
        add_bucket_policy(destination_bucket, ROLE_NAME, DESTINATION_AWS_PROFILE, SOURCE_AWS_ACCOUNT_ID)
        apply_replication_configuration(bucket, destination_bucket, SOURCE_AWS_PROFILE)
        create_batch_job_to_copy_existing_objects(SOURCE_AWS_PROFILE, SOURCE_AWS_ACCOUNT_ID, bucket, COMPLETION_REPORT_BUCKET, ROLE_NAME)
    print("Cross-account S3 replication setup complete for all buckets.")

if __name__ == "__main__":
    main()
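One refinement worth considering: list_buckets returns every bucket in the source account, including the completion-report bucket and any replica buckets created by a previous run. A small filter (a sketch, reusing the naming conventions from this script) keeps those out of the loop:

```python
def replication_candidates(bucket_names,
                           report_bucket="replication-completion-reports",
                           replica_prefix="replicated-"):
    """Drop the completion-report bucket and previously created replica buckets."""
    return [name for name in bucket_names
            if name != report_bucket and not name.startswith(replica_prefix)]

buckets = ["app-data", "replicated-app-data", "replication-completion-reports"]
print(replication_candidates(buckets))  # ['app-data']
```

In main(), this would wrap the list_buckets call so only genuine source buckets are processed.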
Conclusion
Automating AWS S3 cross-account replication ensures that your data is consistently replicated across accounts, enhancing data redundancy and disaster recovery. This guide has provided an overview of the key concepts and a practical implementation using Python scripts. By using the provided master script, you can efficiently set up replication for all buckets in your source account, ensuring a robust and reliable data management solution.
References:
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication-walkthrough-2.html
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/setting-repl-config-perm-overview.html#setting-repl-config-crossacct
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-iam-role-policies.html#batch-ops-batch-replication-policy