Setting Up IAM Identity Center with Terraform and Terragrunt

& DevOps Practitioner

12 minute read
We're building Hive, a platform whose backend needs to check domain availability and SSL certificate status. The backend makes read-only API calls to Route 53 (ListHostedZones, ListResourceRecordSets) and ACM (ListCertificates, DescribeCertificate). Nothing exotic, just four API actions. But those calls need credentials, and the golden rule is zero long-lived credentials. No access keys in .env files, no secrets baked into Docker images, nothing committed to source control.
We wanted short-lived, automatically rotated credentials through AWS IAM Identity Center (formerly AWS SSO). This post walks through everything we built (all the Terraform, all the Terragrunt, how the pieces connect) and the five consecutive deployment failures we hit before it worked.
The existing setup
Our infrastructure lives in a modular Terraform and Terragrunt repo. Each AWS resource type gets its own module under modules/, and each environment gets a Terragrunt config under environments/dev/. A shared root.hcl in the environment directory configures the S3 backend and AWS provider, and every module's terragrunt.hcl includes it. A single gitignored file, env.dev.hcl, holds all the sensitive, environment-specific values like account IDs, IP addresses, and hosted zone IDs. The committed counterpart, env-example.dev.hcl, is a template with placeholder values so new developers know what to fill in.
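For readers who haven't seen a shared root config, here is a minimal sketch of what a root.hcl of this kind might contain. It is not our actual file: the bucket name is a placeholder, and reading the region from aws_region in env.dev.hcl is an assumption based on the template shown later in this post.

```hcl
# environments/dev/root.hcl (illustrative sketch, not the post's actual file)

locals {
  # Same lookup the per-module configs use
  env_vars = read_terragrunt_config(find_in_parent_folders("env.dev.hcl"))
  region   = local.env_vars.inputs.aws_region
}

# Generate backend.tf in every module that includes this file
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket  = "example-terraform-state" # hypothetical bucket name
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = local.region
    encrypt = true
  }
}

# Generate provider.tf so the modules need no provider block of their own
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.region}"
}
EOF
}
```

Because the state key is derived from path_relative_to_include(), each module directory gets its own state file without any per-module backend configuration.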
The directory structure after adding the two new modules looks like this:
modules/
  iam-policy/
    main.tf
    variables.tf
    outputs.tf
  iam-identity-center/
    main.tf
    variables.tf
    outputs.tf
environments/
  dev/
    root.hcl
    iam-policy/
      terragrunt.hcl
    iam-identity-center/
      terragrunt.hcl
    vpc/
      terragrunt.hcl
    ...other modules...
env.dev.hcl            # gitignored: real values
env-example.dev.hcl    # committed: placeholder template
We needed to slot IAM and SSO into this pattern without disrupting the other modules.
Module 1: the IAM policy
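The first module creates a standalone least-privilege IAM policy granting only the four read-only actions the Hive backend needs. route53:ListHostedZones and acm:ListCertificates don't support resource-level restrictions, so they require Resource: "*". route53:ListResourceRecordSets and acm:DescribeCertificate can be scoped to specific hosted zone or certificate ARNs, but we apply Resource: "*" to the whole statement to keep it simple. The policy is still tightly scoped because it only allows four specific read-only actions.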
Here's modules/iam-policy/variables.tf:
variable "infra_name" {
  description = "Infrastructure name prefix"
  type        = string
}

variable "env" {
  description = "Environment name (e.g. dev, staging, prod)"
  type        = string
}

variable "iac" {
  description = "IaC tool identifier for tagging"
  type        = string
}

variable "policy_name_suffix" {
  description = "Suffix appended to the IAM policy name (e.g. 'hive-domain-readonly')"
  type        = string
  default     = "hive-domain-readonly"
}

variable "policy_description" {
  description = "Description for the IAM policy"
  type        = string
  default     = "Read-only access to Route 53 hosted zones/records and ACM certificates"
}

variable "route53_actions" {
  description = "List of Route 53 actions to allow"
  type        = list(string)
  default = [
    "route53:ListHostedZones",
    "route53:ListResourceRecordSets",
  ]
}

variable "acm_actions" {
  description = "List of ACM actions to allow"
  type        = list(string)
  default = [
    "acm:ListCertificates",
    "acm:DescribeCertificate",
  ]
}

variable "additional_policy_statements" {
  description = "Additional IAM policy statements to include"
  type = list(object({
    sid       = string
    effect    = string
    actions   = list(string)
    resources = list(string)
  }))
  default = []
}

variable "tags" {
  description = "Additional tags to apply to resources"
  type        = map(string)
  default     = {}
}
The defaults mean this module works out of the box for our use case: you only need to pass infra_name, env, and iac. The route53_actions and acm_actions variables exist so you can extend or narrow the permissions without forking the module.
And modules/iam-policy/main.tf:
data "aws_iam_policy_document" "this" {
  statement {
    sid    = "DomainAvailabilityReadOnly"
    effect = "Allow"
    actions = concat(
      var.route53_actions,
      var.acm_actions,
    )
    resources = ["*"]
  }

  dynamic "statement" {
    for_each = var.additional_policy_statements
    content {
      sid       = statement.value.sid
      effect    = statement.value.effect
      actions   = statement.value.actions
      resources = statement.value.resources
    }
  }
}

resource "aws_iam_policy" "this" {
  name        = "${var.infra_name}-${var.env}-${var.policy_name_suffix}"
  description = var.policy_description
  policy      = data.aws_iam_policy_document.this.json

  tags = merge(
    {
      Name        = "${var.infra_name}-${var.env}-${var.policy_name_suffix}"
      Environment = var.env
      IaC         = var.iac
    },
    var.tags,
  )
}
We use aws_iam_policy_document as a data source rather than writing raw JSON: Terraform validates the structure at plan time, and the dynamic block makes it easy to bolt on extra statements later without changing the core policy. The concat of Route 53 and ACM actions means you could swap out just one set if your use case differs.
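To make the dynamic block concrete, here is a sketch of how an extra statement could be passed in from the Terragrunt side. The statement itself is hypothetical (we don't use it in Hive), and the hosted zone ID is a placeholder. It also shows that an added statement can carry a narrower resource list than the core one.

```hcl
# environments/dev/iam-policy/terragrunt.hcl (inputs only; illustrative)
inputs = merge(
  local.env_vars.inputs,
  {
    additional_policy_statements = [
      {
        sid     = "HostedZoneDetailReadOnly" # hypothetical statement
        effect  = "Allow"
        actions = ["route53:GetHostedZone"]
        # Placeholder zone ID; GetHostedZone can be scoped to a hosted zone ARN
        resources = ["arn:aws:route53:::hostedzone/Z0123456789EXAMPLE"]
      },
    ]
  }
)
```

Each object in the list becomes one statement in the rendered policy, after the core DomainAvailabilityReadOnly statement.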
The outputs in modules/iam-policy/outputs.tf expose the ARN, name, ID, and the rendered JSON:
output "policy_arn" {
  description = "ARN of the IAM policy"
  value       = aws_iam_policy.this.arn
}

output "policy_name" {
  description = "Name of the IAM policy"
  value       = aws_iam_policy.this.name
}

output "policy_id" {
  description = "ID of the IAM policy"
  value       = aws_iam_policy.this.id
}

output "policy_json" {
  description = "The rendered JSON policy document"
  value       = data.aws_iam_policy_document.this.json
}
The policy_json output is the important one: the identity center module consumes it as an inline policy on the SSO permission set.
Module 2: IAM Identity Center
This is the larger module. It reads the existing IAM Identity Center instance, creates a permission set, attaches the policy, optionally creates a group in the Identity Store, and assigns the permission set to principals in the target AWS account.
Here's modules/iam-identity-center/variables.tf:
variable "infra_name" {
  description = "Infrastructure name prefix"
  type        = string
}

variable "env" {
  description = "Environment name (e.g. dev, staging, prod)"
  type        = string
}

variable "iac" {
  description = "IaC tool identifier for tagging"
  type        = string
}

variable "enabled" {
  description = "Set to true to provision SSO resources. Requires IAM Identity Center to be enabled in your AWS organization first."
  type        = bool
  default     = false
}

variable "permission_set_name" {
  description = "Name for the SSO permission set (e.g. 'HiveDomainReadOnly')"
  type        = string
  default     = "HiveDomainReadOnly"
}

variable "permission_set_description" {
  description = "Description for the permission set"
  type        = string
  default     = "Read-only access to Route 53 and ACM for domain availability checks"
}

variable "session_duration" {
  description = "Maximum session duration in ISO 8601 format (e.g. PT8H for 8 hours)"
  type        = string
  default     = "PT8H"
}

variable "inline_policy_json" {
  description = "JSON policy document to attach as an inline policy on the permission set. If empty, the module creates a default Route53+ACM read-only policy."
  type        = string
  default     = ""
}

variable "managed_policy_arns" {
  description = "List of AWS managed policy ARNs to attach to the permission set"
  type        = list(string)
  default     = []
}

variable "account_assignments" {
  description = <<-EOT
    List of account assignments for this permission set. Each entry assigns
    the permission set to a principal (GROUP or USER) in a given AWS account.
    Example:
      account_assignments = [
        {
          account_id     = "123456789012"
          principal_type = "GROUP"
          principal_name = "HiveBackend"
        }
      ]
  EOT
  type = list(object({
    account_id     = string
    principal_type = string
    principal_name = string
  }))
  default = []
}

variable "create_group" {
  description = "Whether to create a new group in the IAM Identity Center identity store"
  type        = bool
  default     = false
}

variable "group_name" {
  description = "Display name of the group to create (only used when create_group = true)"
  type        = string
  default     = "HiveBackend"
}

variable "group_description" {
  description = "Description for the new group"
  type        = string
  default     = "Hive application backend - read-only Route 53 and ACM access"
}

variable "tags" {
  description = "Additional tags to apply to resources"
  type        = map(string)
  default     = {}
}
The enabled variable defaults to false (more on why shortly). The account_assignments variable takes a list of objects so you can assign the same permission set to multiple accounts, groups, or users in one go.
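One thing the variable as written does not do is reject a principal_type other than GROUP or USER. Such an entry would match neither lookup in main.tf and would surface later as an invalid index on the principal ID map. A validation block is one way to catch it at plan time. This is a sketch of an optional addition, not part of the module as we deployed it:

```hcl
variable "account_assignments" {
  description = "List of account assignments for this permission set"
  type = list(object({
    account_id     = string
    principal_type = string
    principal_name = string
  }))
  default = []

  # Optional guard (not in the deployed module): fail early on unknown types
  validation {
    condition = alltrue([
      for a in var.account_assignments :
      contains(["GROUP", "USER"], a.principal_type)
    ])
    error_message = "Each principal_type must be either GROUP or USER."
  }
}
```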
Now modules/iam-identity-center/main.tf. This is the full file as it ended up after all the fixes:
# -------------------------------------------------------------------
# IAM Identity Center (SSO): permission set + account assignments
# -------------------------------------------------------------------

# ---------------------------
# Data: SSO instance
# ---------------------------
data "aws_ssoadmin_instances" "this" {
  count = var.enabled ? 1 : 0
}

locals {
  sso_instance_arn  = var.enabled ? tolist(data.aws_ssoadmin_instances.this[0].arns)[0] : ""
  identity_store_id = var.enabled ? tolist(data.aws_ssoadmin_instances.this[0].identity_store_ids)[0] : ""

  # Default inline policy: Route 53 + ACM read-only
  default_inline_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "DomainAvailabilityReadOnly"
        Effect = "Allow"
        Action = [
          "route53:ListHostedZones",
          "route53:ListResourceRecordSets",
          "acm:ListCertificates",
          "acm:DescribeCertificate",
        ]
        Resource = "*"
      }
    ]
  })

  inline_policy = var.inline_policy_json != "" && var.inline_policy_json != "{}" ? var.inline_policy_json : local.default_inline_policy
}

# ---------------------------
# Permission set
# ---------------------------
resource "aws_ssoadmin_permission_set" "this" {
  count = var.enabled ? 1 : 0

  name             = var.permission_set_name
  description      = var.permission_set_description
  instance_arn     = local.sso_instance_arn
  session_duration = var.session_duration

  tags = merge(
    {
      Name        = "${var.infra_name}-${var.env}-${var.permission_set_name}"
      Environment = var.env
      IaC         = var.iac
    },
    var.tags,
  )
}

# Inline policy on the permission set
resource "aws_ssoadmin_permission_set_inline_policy" "this" {
  count = var.enabled ? 1 : 0

  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.this[0].arn
  inline_policy      = local.inline_policy
}

# Optional: attach AWS managed policies
resource "aws_ssoadmin_managed_policy_attachment" "this" {
  for_each = var.enabled ? toset(var.managed_policy_arns) : toset([])

  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.this[0].arn
  managed_policy_arn = each.value
}

# ---------------------------
# Optional: Identity Store group
# ---------------------------
resource "aws_identitystore_group" "this" {
  count = var.enabled && var.create_group ? 1 : 0

  identity_store_id = local.identity_store_id
  display_name      = var.group_name
  description       = var.group_description
}

# ---------------------------
# Account assignments
# ---------------------------
# Look up each principal in the identity store so we can get its ID.
# Groups:
data "aws_identitystore_group" "assignments" {
  for_each = var.enabled ? {
    for a in var.account_assignments : "${a.account_id}-${a.principal_type}-${a.principal_name}" => a
    if a.principal_type == "GROUP" && !var.create_group
  } : {}

  identity_store_id = local.identity_store_id

  alternate_identifier {
    unique_attribute {
      attribute_path  = "DisplayName"
      attribute_value = each.value.principal_name
    }
  }
}

# Users:
data "aws_identitystore_user" "assignments" {
  for_each = var.enabled ? {
    for a in var.account_assignments : "${a.account_id}-${a.principal_type}-${a.principal_name}" => a
    if a.principal_type == "USER"
  } : {}

  identity_store_id = local.identity_store_id

  alternate_identifier {
    unique_attribute {
      attribute_path  = "UserName"
      attribute_value = each.value.principal_name
    }
  }
}

locals {
  # Build a map of assignment key -> principal_id
  principal_ids = merge(
    {
      for k, v in data.aws_identitystore_group.assignments :
      k => v.group_id
    },
    {
      for k, v in data.aws_identitystore_user.assignments :
      k => v.user_id
    },
    # If we created the group ourselves, wire it in
    var.enabled && var.create_group ? {
      for a in var.account_assignments :
      "${a.account_id}-${a.principal_type}-${a.principal_name}" =>
      aws_identitystore_group.this[0].group_id
      if a.principal_type == "GROUP" && a.principal_name == var.group_name
    } : {},
  )
}

resource "aws_ssoadmin_account_assignment" "this" {
  for_each = var.enabled ? {
    for a in var.account_assignments :
    "${a.account_id}-${a.principal_type}-${a.principal_name}" => a
  } : {}

  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.this[0].arn
  target_id          = each.value.account_id
  target_type        = "AWS_ACCOUNT"
  principal_type     = each.value.principal_type
  principal_id       = local.principal_ids[each.key]
}
There's a lot going on here, so let me call out the key design decisions.
The count = var.enabled ? 1 : 0 guard (or the equivalent for_each ternary) on every resource and data source means the entire module is a no-op when enabled is false. The local.inline_policy line has a dual check, != "" and != "{}", which is a defensive measure against mock outputs (explained below in the deployment section). The principal_ids local merges three sources: groups looked up by name, users looked up by name, and groups created by this module. That merge means the account assignment resource doesn't need to care whether the group already existed or was just created; it gets the ID from the same map either way.
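One edge case in the group lookup is worth flagging. With create_group = true, the !var.create_group filter skips the lookup for every GROUP assignment, but principal_ids only wires in the one group whose name matches group_name. An assignment to a second, pre-existing group would therefore have no entry in the map. We only assign one group, so we never hit this. A tighter filter might look like the sketch below, which assumes the same variables and key format as main.tf and uses a hypothetical local named existing_group_assignments:

```hcl
locals {
  # GROUP assignments that must already exist in the Identity Store:
  # everything except the one group this module creates itself.
  existing_group_assignments = {
    for a in var.account_assignments :
    "${a.account_id}-${a.principal_type}-${a.principal_name}" => a
    if a.principal_type == "GROUP" && !(var.create_group && a.principal_name == var.group_name)
  }
}

data "aws_identitystore_group" "assignments" {
  for_each = var.enabled ? local.existing_group_assignments : {}

  identity_store_id = local.identity_store_id

  alternate_identifier {
    unique_attribute {
      attribute_path  = "DisplayName"
      attribute_value = each.value.principal_name
    }
  }
}
```

With this filter, the created group still comes from the resource and every other group is looked up by display name, so the merge in principal_ids covers all GROUP assignments.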
The outputs in modules/iam-identity-center/outputs.tf are all gated on the enabled flag:
output "permission_set_arn" {
  description = "ARN of the SSO permission set"
  value       = var.enabled ? aws_ssoadmin_permission_set.this[0].arn : ""
}

output "permission_set_name" {
  description = "Name of the SSO permission set"
  value       = var.enabled ? aws_ssoadmin_permission_set.this[0].name : ""
}

output "sso_instance_arn" {
  description = "ARN of the IAM Identity Center instance"
  value       = local.sso_instance_arn
}

output "identity_store_id" {
  description = "ID of the Identity Store"
  value       = local.identity_store_id
}

output "sso_start_url" {
  description = "The SSO start URL (portal URL); use this in ~/.aws/config"
  value       = var.enabled ? "https://${tolist(data.aws_ssoadmin_instances.this[0].identity_store_ids)[0]}.awsapps.com/start" : ""
}

output "group_id" {
  description = "ID of the created Identity Store group (empty if create_group = false)"
  value       = var.enabled && var.create_group ? aws_identitystore_group.this[0].group_id : ""
}

output "account_assignment_ids" {
  description = "Map of account assignment keys to their IDs"
  value = {
    for k, v in aws_ssoadmin_account_assignment.this : k => v.id
  }
}
When enabled is false, everything returns empty strings or empty maps so other modules that might reference these outputs don't blow up.
Wiring it up with Terragrunt
With the modules written, we need Terragrunt configs to deploy them and an environment config to feed in the values.
The IAM policy Terragrunt config
environments/dev/iam-policy/terragrunt.hcl is straightforward: it includes the shared root config, reads the environment variables, and points at the module:
include "root" {
  path = find_in_parent_folders("root.hcl")
}

locals {
  env_vars = read_terragrunt_config(find_in_parent_folders("env.dev.hcl"))
}

terraform {
  source = "../../../modules/iam-policy"
}

inputs = merge(
  local.env_vars.inputs,
  {
    # Override defaults if needed:
    # policy_name_suffix = "hive-domain-readonly"
    # route53_actions    = ["route53:ListHostedZones", "route53:ListResourceRecordSets"]
    # acm_actions        = ["acm:ListCertificates", "acm:DescribeCertificate"]
  }
)
The defaults in the module's variables handle everything, so the inputs block just passes through the shared environment variables (infra_name, env, iac).
The IAM Identity Center Terragrunt config
environments/dev/iam-identity-center/terragrunt.hcl is more interesting because it has a cross-module dependency:
include "root" {
  path = find_in_parent_folders("root.hcl")
}

locals {
  env_vars = read_terragrunt_config(find_in_parent_folders("env.dev.hcl"))
}

dependency "iam_policy" {
  config_path = "../iam-policy"

  # Use mock outputs when iam-policy has no state yet.
  # During run-all apply, iam-policy is applied first (dependency ordering),
  # then iam-identity-center reads its real outputs.
  mock_outputs = {
    policy_json = ""
    policy_arn  = "arn:aws:iam::123456789012:policy/mock"
  }
}

terraform {
  source = "../../../modules/iam-identity-center"
}

inputs = merge(
  local.env_vars.inputs,
  {
    # Enable only after IAM Identity Center is active in your AWS org
    enabled = local.env_vars.inputs.sso_enabled

    # Permission set configuration
    permission_set_name = "HiveDomainReadOnly"
    session_duration    = "PT8H"

    # Use the policy from the iam-policy module
    inline_policy_json = dependency.iam_policy.outputs.policy_json

    # Sensitive values pulled from env.dev.hcl (gitignored)
    account_assignments = local.env_vars.inputs.sso_account_assignments
    create_group        = local.env_vars.inputs.sso_create_group
    group_name          = local.env_vars.inputs.sso_group_name
    group_description   = local.env_vars.inputs.sso_group_description
  }
)
The dependency block establishes two things: ordering (Terragrunt applies iam-policy before iam-identity-center during run-all) and data flow (dependency.iam_policy.outputs.policy_json passes the rendered policy JSON into the identity center module). The mock_outputs block provides fallback values when iam-policy hasn't been applied yet; without it, terragrunt plan would fail on a fresh checkout.
The environment config
All the sensitive, environment-specific values live in env.dev.hcl which is gitignored. The committed env-example.dev.hcl serves as a template. Here's what the SSO section of the template looks like:
inputs = {
  infra_name = "your-infra-name"
  aws_region = "your-region"
  env        = "dev"
  iac        = "terragrunt"

  # ...other existing config (VPC, bastion, Route 53, etc.)...

  # -----------------------------------------------------------------
  # IAM Identity Center (SSO) Configuration
  # -----------------------------------------------------------------
  # Set to true only after enabling IAM Identity Center in your AWS org
  sso_enabled = false

  # Your AWS account ID (used for SSO account assignments)
  sso_account_id = "your-aws-account-id"

  # Which groups/users should receive the SSO permission set
  sso_account_assignments = [
    {
      account_id     = "your-aws-account-id"
      principal_type = "GROUP"
      principal_name = "YourAppName"
    },
  ]

  # Set to true to create the group in Identity Center via Terraform
  sso_create_group      = false
  sso_group_name        = "YourAppName"
  sso_group_description = "Application backend - read-only Route 53 and ACM access"
}
New contributors copy this file to env.dev.hcl, fill in their real values, and git never sees them.
Deploying it (and everything that went wrong)
With all the code in place, we ran terragrunt run-all plan. It failed five times before it worked, and each failure taught us something about how Terragrunt resolves dependencies and how AWS validates inputs. Here's what happened, and what we changed each time.
Failure 1: IAM Identity Center wasn't enabled yet
Error: Invalid index
on main.tf line 25, in locals:
25: sso_instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]
data.aws_ssoadmin_instances.this.arns is empty list of string
The aws_ssoadmin_instances data source queries the IAM Identity Center instance in your account. We hadn't enabled it yet, so the data source returned an empty list, and indexing into [0] blew up. This is what motivated the enabled variable you saw in the code above. The original version didn't have the count guards; every resource and data source was unconditional. We added count = var.enabled ? 1 : 0 to every resource and data source, and wrapped the locals in ternaries:
data "aws_ssoadmin_instances" "this" {
  count = var.enabled ? 1 : 0
}

locals {
  sso_instance_arn  = var.enabled ? tolist(data.aws_ssoadmin_instances.this[0].arns)[0] : ""
  identity_store_id = var.enabled ? tolist(data.aws_ssoadmin_instances.this[0].identity_store_ids)[0] : ""
}
This pattern, a boolean gate on the entire module, is something we already use elsewhere in the codebase. The bastion host has bastion_enabled, the NAT gateway has nat_enabled. Consistency matters.
We then went into the AWS console and enabled IAM Identity Center. A message appeared recommending that you don't store resources in the management account. That's the ideal multi-account architecture β a management account for Organizations and IAM Identity Center, with separate member accounts for workloads. It's a recommendation, not a requirement. For a single development environment with one read-only permission set, a single account is perfectly fine. We set sso_enabled = true in env.dev.hcl and ran plan again.
Failure 2: the group didn't exist
ResourceNotFoundException: GROUP not found.
with data.aws_identitystore_group.assignments["123456789012-GROUP-HiveBackend"]
We had sso_create_group = false in the config, which told the module to look up an existing group called "HiveBackend" in the Identity Store. But we'd just enabled IAM Identity Center, so the Identity Store was empty. Look at this section from the module's main.tf:
data "aws_identitystore_group" "assignments" {
  for_each = var.enabled ? {
    for a in var.account_assignments : "${a.account_id}-${a.principal_type}-${a.principal_name}" => a
    if a.principal_type == "GROUP" && !var.create_group
  } : {}
  ...
}
When create_group is false, this data source tries to look up the group by name. When create_group is true, the !var.create_group condition filters it out and the aws_identitystore_group resource creates the group instead. A one-line change in env.dev.hcl fixed it:
sso_create_group = true
Failure 3: dependency had no outputs
./iam-policy/terragrunt.hcl is a dependency of ./iam-identity-center/terragrunt.hcl
but detected no outputs. Either the target module has not been applied yet, or the
module has no outputs.
This one took two attempts to fix. The original Terragrunt config for iam-identity-center had both a dependencies block and a dependency block pointing at iam-policy:
# This was the problem: strict check, no mock fallback
dependencies {
  paths = ["../iam-policy"]
}

# This was fine: has mock_outputs
dependency "iam_policy" {
  config_path  = "../iam-policy"
  mock_outputs = { ... }
  mock_outputs_allowed_terraform_commands = ["validate", "plan", "init"]
}
We first blamed the dependencies block (plural), reading it as a strict check with no mock fallback. Terragrunt documents that block as ordering-only for run-all: it does not read outputs, so it has no mocks to offer. The dependency block (singular) already establishes the same ordering relationship and also provides mock outputs, so having both was redundant. We removed the dependencies block entirely. That alone did not clear the error, as the next failure shows.
Failure 4: mocks still not available
Same error. The dependency block had mock_outputs_allowed_terraform_commands = ["validate", "plan", "init"], restricting mock usage to only those specific Terraform commands. As far as we could tell, Terragrunt's dependency resolution ran before that command-level check kicked in, so it failed at the Terragrunt level. We removed mock_outputs_allowed_terraform_commands entirely. The final dependency block, the one you see in the Terragrunt config above, has no command restrictions:
dependency "iam_policy" {
  config_path = "../iam-policy"

  mock_outputs = {
    policy_json = ""
    policy_arn  = "arn:aws:iam::123456789012:policy/mock"
  }
}
Now mocks are available whenever the dependency has no state, regardless of which command is running. The dependency relationship still ensures iam-policy is applied first during terragrunt run-all apply. Once it has real state, real outputs replace the mocks automatically.
Failure 5: empty policy rejected by AWS
ValidationException: Value of input 'inlinePolicy' failed to satisfy constraint:
Member must have length greater than or equal to 1
This was the most subtle failure: an interaction between Terragrunt's mock outputs and our plan -out=tfplan / apply tfplan workflow. When we ran terragrunt run-all plan -out=tfplan, the iam-policy module hadn't been applied yet, so the mock output for policy_json was baked into the saved plan file. The original mock was "{}", and the original check in the module was just var.inline_policy_json != "". But "{}" is not an empty string: it passed the check, got used as the inline policy, and AWS rejected it because "{}" isn't a valid IAM policy document.
Two changes fixed it. First, we changed the mock output from "{}" to "" so it correctly triggers the fallback to the module's built-in default policy. Second, we added a belt-and-suspenders check in local.inline_policy:
inline_policy = var.inline_policy_json != "" && var.inline_policy_json != "{}" ? var.inline_policy_json : local.default_inline_policy
Both of these changes are already reflected in the code shown above: the module and the Terragrunt config you see earlier in this post are the final working versions.
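The string comparison only catches the two placeholder values we happened to use. A more general variant would parse the input and fall back whenever it isn't a policy document with at least one statement. The sketch below is an alternative we did not deploy; has_inline_policy is a hypothetical local, and it relies on Terraform's built-in try and jsondecode functions:

```hcl
locals {
  # True only when the input parses as JSON and has a non-empty Statement.
  # Both "" (invalid JSON) and "{}" (no Statement attribute) make the inner
  # expression fail, so try() falls through to false.
  has_inline_policy = try(length(jsondecode(var.inline_policy_json).Statement) > 0, false)

  inline_policy = local.has_inline_policy ? var.inline_policy_json : local.default_inline_policy
}
```

This still treats any mock as "use the default", but it no longer depends on the mock being one of two specific strings.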
It works
After all five fixes, terragrunt run-all apply completed successfully:
[iam-identity-center] Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
[iam-identity-center] Outputs:
permission_set_name = "HiveDomainReadOnly"
sso_start_url = "https://<your-identity-store-id>.awsapps.com/start"
group_id = "a1b2c3d4-0001-70f7-4533-abc123def456"
The Hive backend can now assume the HiveDomainReadOnly permission set and make read-only Route 53 and ACM API calls using short-lived SSO credentials. No access keys, no secrets in config files.
On naming
An early draft used "HiveDevelopers" as the group name, with descriptions like "Developers who need read-only Route 53 and ACM access." We corrected this: the developers don't need AWS access. The Hive application backend needs access, and it decides internally which of its own users can trigger those API calls. The naming now reflects this: HiveBackend, with descriptions about the application, not people. Small thing, but naming that reflects the actual architecture saves confusion later.
Things to keep in mind
Gate new modules behind an enabled flag. Not every AWS service is active in every account, and a boolean gate lets the module coexist peacefully in run-all workflows without blocking other modules.
Don't mix dependencies and dependency blocks for the same target. The dependency block with mock_outputs is strictly more capable: it handles ordering and provides a fallback. The dependencies block only adds ordering, so it is redundant once a dependency block points at the same path.
Don't restrict mock_outputs_allowed_terraform_commands unless you have a specific reason. Terragrunt's dependency resolution can run outside the context of any specific Terraform command, so mocks need to be available unconditionally.
Mock values must be valid for the downstream consumer. A mock of "{}" for a policy JSON broke the AWS API. The mock should be "" so the module falls back to its built-in default.
The plan -out=tfplan then apply tfplan pattern bakes dependency values into the plan. On first deployment, mocks get frozen into the plan file. For initial deployments with fresh dependencies, consider running terragrunt run-all apply without a saved plan so each step resolves real outputs in sequence.
Keep sensitive values in one gitignored file. Account IDs, group names, and feature flags all go in env.dev.hcl. The committed example file is just a template.