AWS Cognito + AppSync: Field-Level Authorization at Scale

How we secured 5+ GraphQL services with AWS Cognito, supporting 10K+ users with fine-grained permissions and zero token management overhead

Note: Company-specific details have been anonymized. "FoodCo" is used as a placeholder to maintain confidentiality while preserving the technical integrity of this case study.

Introduction

Here's a common enterprise challenge: You need to secure multiple GraphQL APIs with fine-grained authorization where different user roles (admins, operators, viewers) have different permissions, all while managing thousands of users across development, staging, and production environments.

The naive solution? Roll your own JWT auth with custom middleware for each service. Result: token management nightmare, inconsistent authorization logic, manual user provisioning, no SSO support, and security gaps.

This is what I built with the Cognito Auth Stack: a centralized authentication service that powers 5+ GraphQL APIs (pricing, products, menu config, KDS, discounts) with field-level authorization, supporting 10K+ users, SSO integration, and automated group management, all with zero token management code.

We'll explore AWS Cognito User Pools, AppSync @aws_auth directives for field-level authorization, pre-signup Lambda triggers for SSO user linking, multi-client auth flows, and CDK patterns for deploying auth at scale.

graph TD
    A[User] --> B[Cognito User Pool]
    B --> C[App Clients: OAuth, Admin, SSO]
    C --> D[AppSync GraphQL APIs]
    D --> E[Field-Level @aws_auth]

The Problem: Rolling Your Own Auth Doesn't Scale

Summary: Custom auth leads to inconsistency and maintenance hell.

Real-World Requirements

A production authentication system for microservices must handle:

  1. Authentication Flows:

    • User/password authentication (internal tools)
    • OAuth 2.0 authorization code flow (Postman, external apps)
    • SSO integration (enterprise SAML providers)
    • Machine-to-machine (admin password flow for batch jobs)
  2. Authorization Patterns:

    • Field-level permissions in GraphQL
    • Role-based access control (RBAC)
    • Group-based permissions (e.g., pricing.admin, product.read)
    • Consistent authorization across multiple services
  3. Scale Requirements:

    • 10K+ users across 5+ services
    • Sub-100ms token validation
    • Multi-environment (dev, qa, staging, prod)
    • SSO federation with external identity providers
    • Automatic user provisioning

The Custom Auth Anti-Pattern

Attempt 1: Custom JWT with middleware

// Custom middleware for each service (anti-pattern)
async function authenticateRequest(req: Request): Promise<User> {
  const token = req.headers.authorization?.split(" ")[1];

  // Manual JWT validation
  const decoded = jwt.verify(token, SECRET_KEY);

  // Check user exists in database
  const user = await db.query("SELECT * FROM users WHERE id = ?", [
    decoded.sub,
  ]);

  // Check user groups
  if (!user.groups.includes("pricing.admin")) {
    throw new ForbiddenError("Insufficient permissions");
  }

  return user;
}

⚠️ Pitfall: Every service duplicates logic, risking inconsistencies.

Problems:

  • Every service reimplements auth logic (inconsistent)
  • Manual token rotation and secret management
  • No SSO support (users manage separate passwords)
  • User provisioning requires custom admin UI
  • Token validation hits database (slow)
  • No field-level authorization (all-or-nothing access)
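
| Approach          | SSO Support | Field-Level Auth | Consistency | Cost            |
|-------------------|-------------|------------------|-------------|-----------------|
| Custom JWT        | No          | No               | No          | High (Dev Time) |
| Cognito + AppSync | Yes         | Yes              | Yes         | Low             |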

Solution: AWS Cognito + AppSync Integration

Summary: Centralized auth with native validation.

Core Architecture

Centralized Cognito User Pool with multiple app clients:

Cognito User Pool (tb-menu-prod-cognito-pool)
├── DefaultClient (OAuth, Postman testing)
├── InternalClient (admin password flow, batch jobs)
├── ExternalClient (user/password, REST APIs)
└── RetoolClient (OAuth + SSO, internal tooling)

↓ Federated to ↓

5+ AppSync GraphQL APIs
├── Pricing Service (@aws_auth directives)
├── Product Service (@aws_auth directives)
├── Menu Config Service (@aws_auth directives)
├── KDS Service (@aws_auth directives)
└── Discount Service (@aws_auth directives)

Why this works:

  • Single source of truth for users/groups
  • AppSync validates tokens automatically (no custom code)
  • Field-level authorization with @aws_auth directive
  • SSO federation via SAML identity provider
  • Automatic group propagation to all services
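
Group propagation works because Cognito puts group membership into the token itself, in the `cognito:groups` claim, so every service sees the same groups without a database lookup. The sketch below only decodes a token payload to show where that claim lives; it does not verify the signature (AppSync does the real validation), and the helper names are illustrative rather than part of any AWS SDK.

```typescript
// Illustrative only: decodes (does NOT verify) a JWT payload to show the claim
// that carries Cognito group membership. Never trust an unverified token.

interface CognitoClaims {
  sub: string;
  "cognito:groups"?: string[];
  [claim: string]: unknown;
}

function decodeBase64Url(segment: string): string {
  // Convert base64url to standard base64 and restore the stripped padding
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return atob(padded); // fine for ASCII claims; non-ASCII needs a UTF-8 decode
}

function readClaims(token: string): CognitoClaims {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Expected a JWT with three dot-separated segments");
  }
  return JSON.parse(decodeBase64Url(parts[1])) as CognitoClaims;
}

// Users who belong to no group simply have no cognito:groups claim
function groupsOf(token: string): string[] {
  return readClaims(token)["cognito:groups"] ?? [];
}
```

For a user in `pricing.admin` and `product.read`, `groupsOf(token)` returns both names, which is the list AppSync compares against each `@aws_auth` directive.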

[Figure: pie chart "Auth Components": User Pool Management (40), AppSync Validation (30), SSO Federation (30)]

Implementation: Real Production Code

Summary: Multi-client pools and triggers enable flexible auth.

1. Cognito User Pool with Multiple Clients

Creating the user pool with CDK and configuring different auth flows:

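// cognito/lib/stack.ts
import * as cdk from "aws-cdk-lib";
import {
  AccountRecovery,
  ClientAttributes,
  UserPool,
  UserPoolClientIdentityProvider,
} from "aws-cdk-lib/aws-cognito";
import { StringParameter } from "aws-cdk-lib/aws-ssm";
import { Construct } from "constructs";
// TbStackProps and isProtectedStage are project-specific helpers (not shown)
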
export class Stack extends cdk.Stack {
  readonly userPool: UserPool;

  constructor(scope: Construct, id: string, props: TbStackProps) {
    super(scope, id, props);

    // Create user pool with email/username login
    this.userPool = new UserPool(this, "UserPool", {
      userPoolName: `tb-menu-${props.stage}-${props.serviceName}-pool`,
      signInAliases: {
        username: true,
        email: true,
      },
      selfSignUpEnabled: false, // Admin-provisioned only
      passwordPolicy: {
        minLength: 12,
        requireLowercase: true,
        requireUppercase: true,
        requireDigits: true,
        requireSymbols: true,
      },
      accountRecovery: AccountRecovery.EMAIL_ONLY,
      removalPolicy: cdk.RemovalPolicy.RETAIN, // Never delete user data
    });

    // Default client: OAuth flow for Postman/testing
    this.userPool.addClient("DefaultClient", {
      userPoolClientName: "default",
      authFlows: {
        userSrp: true, // Secure Remote Password
      },
      oAuth: {
        callbackUrls: [
          "https://oauth.pstmn.io/v1/callback", // Postman
          "https://oauth.pstmn.io/v1/browser-callback",
          "https://oauth.pstmn.io/v1/vscode-callback",
        ],
        flows: {
          authorizationCodeGrant: true,
        },
      },
      refreshTokenValidity: cdk.Duration.days(1),
      preventUserExistenceErrors: true, // Security: don't leak user existence
      writeAttributes: new ClientAttributes().withStandardAttributes({
        email: true,
      }),
      enableTokenRevocation: true,
    });

    // Internal client: Admin password flow for batch jobs
    const internalAppClient = this.userPool.addClient("InternalClient", {
      userPoolClientName: "internal",
      generateSecret: true, // Client secret for machine-to-machine
      authFlows: {
        adminUserPassword: true, // Allow backend to auth users
      },
      disableOAuth: true,
      supportedIdentityProviders: [UserPoolClientIdentityProvider.COGNITO],
      refreshTokenValidity: cdk.Duration.days(1),
      preventUserExistenceErrors: true,
      writeAttributes: new ClientAttributes().withStandardAttributes({
        fullname: true,
      }),
      enableTokenRevocation: true,
    });

    // Store client ID in SSM for other services
    new StringParameter(this, "InternalAppClientIdParameter", {
      parameterName: `/${props.stage}/${props.serviceName}/internal-app-client-id`,
      stringValue: internalAppClient.userPoolClientId,
    });

    // External client: User/password for REST APIs
    this.userPool.addClient("ExternalClient", {
      userPoolClientName: "external",
      generateSecret: true,
      authFlows: {
        userSrp: true,
      },
      disableOAuth: true,
      supportedIdentityProviders: [UserPoolClientIdentityProvider.COGNITO],
      refreshTokenValidity: cdk.Duration.days(1),
      preventUserExistenceErrors: true,
      writeAttributes: new ClientAttributes().withStandardAttributes({
        fullname: true,
      }),
      enableTokenRevocation: true,
    });

    // Retool client: OAuth + SSO for internal tooling
    this.userPool.addClient("RetoolClient", {
      userPoolClientName: "retool",
      generateSecret: true,
      authFlows: {
        userSrp: true,
      },
      oAuth: {
        callbackUrls: [
          "https://tbretool-dev.tblandingpage.com/oauth/user/oauthcallback",
          "https://tbretool.tblandingpage.com/oauth/user/oauthcallback",
        ],
        flows: {
          authorizationCodeGrant: true,
        },
      },
      supportedIdentityProviders: [
        isProtectedStage(props.stage)
          ? UserPoolClientIdentityProvider.custom("YumSSO") // SSO in prod
          : UserPoolClientIdentityProvider.COGNITO, // Local auth in dev
      ],
      refreshTokenValidity: cdk.Duration.days(1),
      preventUserExistenceErrors: true,
      writeAttributes: new ClientAttributes().withStandardAttributes({
        email: true,
      }),
      enableTokenRevocation: true,
    });

    // Export user pool ARN for other services
    new StringParameter(this, "UserPoolArnParameter", {
      parameterName: `/${props.stage}/${props.serviceName}/user-pool-arn`,
      stringValue: this.userPool.userPoolArn,
    });
  }
}

💡 Key patterns: Client per use case, SSO conditional per stage.
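
The stack above relies on an `isProtectedStage` helper that the article does not show. As a rough sketch of the "SSO conditional per stage" pattern, the version below assumes that `staging` and `prod` are the protected stages; that list, and the plain strings standing in for CDK's identity provider objects, are assumptions for illustration.

```typescript
// Hypothetical stand-in for the project's isProtectedStage helper.
// Which stages count as "protected" is an assumption, not from the article.
const PROTECTED_STAGES: ReadonlySet<string> = new Set(["staging", "prod"]);

function isProtectedStage(stage: string): boolean {
  return PROTECTED_STAGES.has(stage);
}

// Returns the name of the identity provider a client should accept for a stage:
// the SSO provider in protected stages, Cognito's built-in directory elsewhere.
function identityProviderFor(stage: string, ssoProviderName = "YumSSO"): string {
  return isProtectedStage(stage) ? ssoProviderName : "COGNITO";
}
```

Keeping the decision in one function means dev environments never depend on the external IdP, while protected stages cannot silently fall back to local passwords.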

2. Pre-Signup Lambda Trigger for SSO Linking

Preventing user duplication during SSO signups:

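// pre-signup-trigger.ts (Lambda code)
import {
  AdminLinkProviderForUserCommand,
  CognitoIdentityProviderClient,
  ListUsersCommand,
} from "@aws-sdk/client-cognito-identity-provider";
import type { PreSignUpTriggerHandler } from "aws-lambda";
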
export const handler: PreSignUpTriggerHandler = async (event) => {
  const { userName, request, triggerSource } = event;

  if (triggerSource === "PreSignUp_ExternalProvider") {
    const userPoolId = event.userPoolId;
    const cognito = new CognitoIdentityProviderClient({ region: event.region });

    // Check if Cognito user exists with same email
    const listUsersCommand = new ListUsersCommand({
      UserPoolId: userPoolId,
      Filter: `email = "${request.userAttributes.email}"`,
    });

    const { Users } = await cognito.send(listUsersCommand);

    if (Users && Users.length > 0) {
      const cognitoUser = Users[0];
      const cognitoUserName = cognitoUser.Username;

      // Link SSO identity to existing Cognito user
      const adminLinkProviderForUserCommand = new AdminLinkProviderForUserCommand({
        UserPoolId: userPoolId,
        DestinationUser: {
          ProviderName: "Cognito",
          ProviderAttributeValue: cognitoUserName,
        },
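        SourceUser: {
          // userName looks like "<provider>_<providerUserId>"; split on the first
          // underscore only, because the provider user ID may itself contain "_"
          ProviderName: userName.slice(0, userName.indexOf("_")), // e.g., "YumSSO"
          // "Cognito_Subject" asks Cognito to match on the IdP's subject identifier
          ProviderAttributeName: "Cognito_Subject",
          ProviderAttributeValue: userName.slice(userName.indexOf("_") + 1),
        },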
      });

      await cognito.send(adminLinkProviderForUserCommand);

      // Auto-confirm SSO user
      event.response.autoConfirmUser = true;
      event.response.autoVerifyEmail = true;
    }
  }

  return event;
};

⚠️ Pitfall: Without linking, SSO creates duplicate users.
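
Two small pieces of string handling in the trigger are easy to get wrong: splitting the federated username and interpolating the email into the `ListUsers` filter. The helpers below are a minimal sketch of both, assuming the username has the form `<provider>_<providerUserId>` and that the provider name contains no underscore; the function names are hypothetical.

```typescript
interface FederatedIdentity {
  providerName: string;
  providerUserId: string;
}

// Split on the FIRST underscore only: the provider's user ID may contain "_"
function parseFederatedUserName(userName: string): FederatedIdentity {
  const separator = userName.indexOf("_");
  if (separator <= 0 || separator === userName.length - 1) {
    throw new Error(`Not a federated username: ${userName}`);
  }
  return {
    providerName: userName.slice(0, separator),
    providerUserId: userName.slice(separator + 1),
  };
}

// Escape backslashes and double quotes so an unusual email address cannot
// break out of the quoted value in the filter expression
function buildEmailFilter(email: string): string {
  const escaped = email.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `email = "${escaped}"`;
}
```

With these, `parseFederatedUserName("YumSSO_john_doe")` keeps `john_doe` intact as the provider user ID, where a plain `split("_")[1]` would return only `john`.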

3. Field-Level Authorization in GraphQL Schema

# schema.graphql
type Query {
  getProduct(id: ID!): Product
    @aws_auth(cognito_groups: ["product.read"])

  adminStats: Stats
    @aws_auth(cognito_groups: ["admin"])
}

type Mutation {
  # Writes belong on Mutation, not Query
  updatePrice(id: ID!, price: Float!): Price
    @aws_auth(cognito_groups: ["pricing.admin"])
}

Why? Least privilege: users can reach only the fields their groups authorize.

4. AppSync Authorization Config

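// pricing-service/lib/stack.ts
import * as appsync from "aws-cdk-lib/aws-appsync";

// importedUserPool is the shared user pool imported from the Cognito stack (import not shown)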
const api = new appsync.GraphqlApi(this, "Api", {
  definition: appsync.Definition.fromFile("schema.graphql"),
  authorizationConfig: {
    defaultAuthorization: {
      authorizationType: appsync.AuthorizationType.USER_POOL,
      userPoolConfig: {
        userPool: importedUserPool,
        defaultAction: appsync.UserPoolDefaultAction.DENY, // Secure by default
      },
    },
  },
});

💡 Default DENY: Forces explicit @aws_auth on fields.
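
To make the interaction between `defaultAction` and `@aws_auth` concrete, here is a simplified model of the decision for a single field. It is a sketch of the behavior described above, not AppSync's actual implementation, and the type and function names are illustrative.

```typescript
type DefaultAction = "ALLOW" | "DENY";

interface FieldRule {
  // Groups listed in @aws_auth(cognito_groups: [...]); undefined = no directive
  cognitoGroups?: string[];
}

function canAccessField(
  rule: FieldRule,
  userGroups: string[],
  defaultAction: DefaultAction
): boolean {
  // No directive on the field: the pool-level default decides
  if (rule.cognitoGroups === undefined) {
    return defaultAction === "ALLOW";
  }
  // Directive present: membership in any one listed group is enough
  return rule.cognitoGroups.some((group) => userGroups.includes(group));
}
```

Under `DENY`, a field someone forgot to annotate is unreachable rather than open, which is why a missing directive shows up as a failed request in testing instead of a silent exposure in production.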

5. Machine-to-Machine Authentication (Batch Jobs)

Using admin password flow for automated systems:

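// pricing-service/src/functions/priceMigrationProcessor.ts
import * as crypto from "node:crypto";
import {
  AdminInitiateAuthCommand,
  CognitoIdentityProviderClient,
} from "@aws-sdk/client-cognito-identity-provider";
import { SSMClient } from "@aws-sdk/client-ssm";
// getParameter (SSM lookup helper) and MutationPriceChangeCreateArgs (generated
// GraphQL type) are project-specific and not shown here.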

interface AuthParameters {
  userPoolId: string;
  clientId: string;
  clientSecret: string;
  username: string;
  password: string;
}

async function fetchAuthParameters(): Promise<AuthParameters> {
  const client = new SSMClient({ region: "us-east-1" });

  // Fetch all parameters in parallel
  const [userPoolId, clientId, clientSecret, username, password] =
    await Promise.all([
      getParameter(client, `/dev/cognito/user-pool-id`),
      getParameter(client, `/dev/cognito/internal-app-client-id`),
      getParameter(client, `/dev/pricing-service/internal-client-secret`),
      getParameter(client, `/dev/pricing-migration/username`),
      getParameter(client, `/dev/pricing-migration/password`),
    ]);

  return { userPoolId, clientId, clientSecret, username, password };
}

async function authenticateAdmin(credentials: AuthParameters): Promise<string> {
  const cognito = new CognitoIdentityProviderClient({});

  // Calculate SECRET_HASH (required for clients with secrets)
  const secretHash = crypto
    .createHmac("sha256", credentials.clientSecret)
    .update(credentials.username + credentials.clientId)
    .digest("base64");

  const command = new AdminInitiateAuthCommand({
    UserPoolId: credentials.userPoolId,
    ClientId: credentials.clientId,
    AuthFlow: "ADMIN_USER_PASSWORD_AUTH",
    AuthParameters: {
      USERNAME: credentials.username,
      PASSWORD: credentials.password,
      SECRET_HASH: secretHash,
    },
  });

  const response = await cognito.send(command);
  const accessToken = response.AuthenticationResult?.AccessToken;

  if (!accessToken) {
    throw new Error("Failed to authenticate admin user");
  }

  return accessToken;
}

async function sendGraphQLRequest(
  graphUrl: string,
  variables: MutationPriceChangeCreateArgs,
  token: string,
  mutation: string
): Promise<Response> {
  return fetch(graphUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`, // Cognito access token
    },
    body: JSON.stringify({
      query: mutation,
      variables,
    }),
  });
}

💡 Why admin flow: Programmatic auth for batches, no UI.
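
One thing `sendGraphQLRequest` leaves to the caller: a GraphQL endpoint can answer with HTTP 200 and still report failures, including authorization failures, in an `errors` array in the body. A batch job should therefore inspect the parsed body, not just the status code. The helper below is a minimal sketch; the `errorType` field and the sample message reflect the shape AppSync typically returns and should be treated as assumptions.

```typescript
interface GraphQLError {
  message: string;
  errorType?: string; // e.g. "Unauthorized" when a group check fails
  path?: (string | number)[];
}

interface GraphQLResponseBody<T> {
  data?: T | null;
  errors?: GraphQLError[];
}

// Returns the data payload, or throws with a summary of every reported error
function unwrapGraphQL<T>(body: GraphQLResponseBody<T>): T {
  if (body.errors && body.errors.length > 0) {
    const summary = body.errors
      .map((error) => `${error.errorType ?? "Error"}: ${error.message}`)
      .join("; ");
    throw new Error(`GraphQL request failed: ${summary}`);
  }
  if (body.data === undefined || body.data === null) {
    throw new Error("GraphQL response contained no data");
  }
  return body.data;
}
```

A caller would parse the response first, for example `unwrapGraphQL(await response.json())`, so that a migration user missing the `pricing.admin` group fails loudly on the first record instead of appearing to succeed.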


Performance & Production Results

Summary: Sub-100ms validation at scale with low cost.

Metrics (12 months in production)

Scale:

  • 10K+ users across 5 services
  • 5+ GraphQL APIs secured with Cognito
  • 4 auth flows supported (OAuth, admin, SSO, user/password)
  • 50K+ token validations/day

Performance:

  • Token validation: <5ms (AppSync native)
  • Pre-signup trigger: 200ms average
  • SSO federation: 500ms (external IdP latency)
  • Zero auth-related outages in 12 months
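
| Metric           | Value | Notes                                          |
|------------------|-------|------------------------------------------------|
| Token Validation | <5ms  | AppSync built-in                               |
| Users            | 10K+  | Multi-service                                  |
| Cost/Month       | $5    | For 10K+ users (within the 50K MAU free tier)  |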

Security:

  • Multi-factor authentication (MFA): Supported (not enforced)
  • Token revocation: Enabled on all clients
  • Password policy: 12+ chars, symbols, uppercase, lowercase, digits
  • Zero credential leaks in 12 months

Cost:

  • Cognito MAU pricing (at the time of writing): first 50K MAUs free, then $0.0055/MAU
  • Lambda triggers: $2/month (pre-signup processing)
  • Total: ~$5/month for 10K+ users
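
The MAU portion of that bill is simple to estimate. The sketch below uses only the two figures quoted above (50K free MAUs, $0.0055 per MAU after that); real pricing depends on the Cognito tier, on whether users are federated, and on when you read this, so treat the constants as assumptions.

```typescript
// Constants taken from the figures quoted in this article, not from a live price list
const FREE_MAUS = 50000;
const PRICE_PER_MAU_USD = 0.0055;

// Estimated monthly charge for monthly active users beyond the free tier
function cognitoMauCostUsd(monthlyActiveUsers: number): number {
  const billableUsers = Math.max(0, monthlyActiveUsers - FREE_MAUS);
  return billableUsers * PRICE_PER_MAU_USD;
}
```

At 10K users the MAU charge is zero, so the monthly total is made up of the Lambda trigger and other small charges. At 60K users, the 10K billable users would add roughly $55.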

[Figure: bar chart "Performance Metrics": Token Validation 5 ms, Pre-Signup 200 ms, SSO Federation 500 ms, Outages 0]

Lessons Learned

Summary: Fine-grained and multi-client designs enhance security.

1. Field-Level Authorization > API-Level Authorization

Bad (all-or-nothing):

# User needs access to entire API or nothing
type Query {
  products: [Product!]!
  pricing: [Price!]!
  admin: AdminQueries!
}

Good (fine-grained):

# User can access products but not admin queries
type Query {
  products: [Product!]! @aws_auth(cognito_groups: ["product.read"])
  pricing: [Price!]! @aws_auth(cognito_groups: ["pricing.read"])
  admin: AdminQueries! @aws_auth(cognito_groups: ["admin"])
}

Benefits:

  • Principle of least privilege
  • Easier auditing (groups in CloudWatch logs)
  • Simpler permission management (add/remove groups)

2. Multiple App Clients for Different Use Cases

Pattern:

  • DefaultClient: OAuth for testing (Postman, local dev)
  • InternalClient: Admin password for batch jobs
  • ExternalClient: User/password for REST APIs
  • RetoolClient: OAuth + SSO for internal tools

Why?

  • Different security requirements per client
  • Separate token revocation per use case
  • Clear audit trail (which client issued token)
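
The audit trail comes from the token claims: Cognito access tokens carry the issuing app client in `client_id`, and ID tokens carry it in `aud`. A minimal sketch of mapping already-verified claims back to a client name follows; the client IDs in the usage note are made up, and the registry is a hypothetical lookup you would populate from your own stack outputs.

```typescript
interface TokenClaims {
  client_id?: string; // present on access tokens
  aud?: string; // present on ID tokens
}

// Map from app client ID to a human-readable client name
type ClientRegistry = Record<string, string>;

function issuingClientName(claims: TokenClaims, registry: ClientRegistry): string {
  const clientId = claims.client_id ?? claims.aud;
  if (clientId === undefined) {
    return "unknown";
  }
  return registry[clientId] ?? "unknown";
}
```

For example, with a registry of `{ "abc123": "internal", "def456": "retool" }`, a batch job's access token with `client_id: "abc123"` resolves to `internal`, which is what you would write to the audit log.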

3. Pre-Signup Triggers Prevent User Duplication

Without trigger:

  • SSO creates new user: YumSSO_john.doe
  • Cognito user exists: john.doe
  • Result: Two users with same email (nightmare)

With trigger:

  • SSO login → Pre-signup trigger fires
  • Check if john.doe exists
  • Link YumSSO_john.doe → john.doe
  • Result: Single user, multiple identity providers

4. Group Namespacing Prevents Permission Collisions

Bad (global groups):

admin
read
write

Good (namespaced groups):

pricing.admin
pricing.read
product.admin
product.read

Why?

  • Services can have different admin permissions
  • Clear ownership (which service does group belong to)
  • Easier RBAC management
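
The naming convention can be enforced in code so that a typo such as `pricing-admin` never reaches Cognito. The sketch below uses a TypeScript template literal type; the `Service` and `Role` unions contain only the names that appear in this article, and you would extend them for your own services.

```typescript
type Service = "pricing" | "product";
type Role = "admin" | "read";
type GroupName = `${Service}.${Role}`;

// Build a group name from known parts; invalid parts fail at compile time
function groupName(service: Service, role: Role): GroupName {
  return `${service}.${role}` as GroupName;
}

// Parse an arbitrary string (e.g. from a token) into its namespace parts.
// Returns undefined for global names such as "admin" that lack a namespace.
function parseGroupName(value: string): { service: string; role: string } | undefined {
  const match = /^([a-z][a-z0-9-]*)\.([a-z][a-z0-9-]*)$/.exec(value);
  return match ? { service: match[1], role: match[2] } : undefined;
}
```

`groupName("pricing", "admin")` yields `pricing.admin`, while `parseGroupName("admin")` returns `undefined`, which makes leftover global groups easy to flag during a migration.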

5. SSM Parameter Store for Cross-Service Configuration Sharing

Pattern:

// Cognito stack exports
new StringParameter(this, "UserPoolArnParameter", {
  parameterName: `/${stage}/cognito/user-pool-arn`,
  stringValue: userPool.userPoolArn,
});

// Product service imports
const userPoolArn = StringParameter.fromStringParameterName(
  this,
  "UserPoolArn",
  `/${stage}/cognito/user-pool-arn`
).stringValue;

Benefits:

  • Single source of truth
  • Automatic propagation to all services
  • No hardcoded ARNs in code
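
This pattern only works if the exporting and importing stacks build exactly the same parameter name. One way to guarantee that is a small shared helper; the one below is a sketch that follows the `/${stage}/${serviceName}/${key}` layout used in the stack code above, with a hypothetical function name and a key list limited to the two parameters this article exports.

```typescript
type ParameterKey = "user-pool-arn" | "internal-app-client-id";

// Build the SSM parameter name shared by the exporting and importing stacks
function parameterName(stage: string, serviceName: string, key: ParameterKey): string {
  for (const segment of [stage, serviceName]) {
    // Reject empty segments and characters that would change the path shape
    if (!/^[A-Za-z0-9_.-]+$/.test(segment)) {
      throw new Error(`Invalid path segment: "${segment}"`);
    }
  }
  return `/${stage}/${serviceName}/${key}`;
}
```

Both sides would then call `parameterName(stage, "cognito", "user-pool-arn")` instead of repeating the template string, so renaming a path becomes a one-line change.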

Takeaways for Developers

Summary: Cognito simplifies auth for GraphQL at scale.

When to Use Cognito + AppSync

Perfect for:

  • GraphQL APIs with multiple services
  • Field-level authorization requirements
  • SSO integration (SAML, OIDC)
  • Serverless architectures (no server management)
  • 10K+ users with diverse permission requirements

Not ideal for:

  • Simple single-service auth (overkill)
  • No SSO requirements (simpler solutions exist)
  • Custom token claims (Cognito has limitations)
  • Very high token validation throughput (>1M/day, consider caching)

Key Patterns

  1. Multiple app clients for different use cases
  2. Field-level @aws_auth for fine-grained permissions
  3. Pre-signup Lambda triggers for SSO user linking
  4. Namespace groups by service (e.g., pricing.admin)
  5. SSM Parameter Store for cross-service discovery
  6. defaultAction: DENY for secure by default

Quick Start Guide

1. Create Cognito User Pool:

const userPool = new UserPool(this, "UserPool", {
  signInAliases: { email: true, username: true },
  selfSignUpEnabled: false,
  passwordPolicy: { minLength: 12 },
});

2. Create app client with OAuth:

userPool.addClient("DefaultClient", {
  authFlows: { userSrp: true },
  oAuth: {
    flows: { authorizationCodeGrant: true },
    callbackUrls: ["https://oauth.pstmn.io/v1/callback"],
  },
});

3. Configure AppSync:

new appsync.GraphqlApi(this, "Api", {
  definition: appsync.Definition.fromFile("schema.graphql"),
  authorizationConfig: {
    defaultAuthorization: {
      authorizationType: appsync.AuthorizationType.USER_POOL,
      userPoolConfig: {
        userPool: userPool,
        defaultAction: appsync.UserPoolDefaultAction.DENY,
      },
    },
  },
});

4. Add field-level authorization:

type Query {
  products: [Product!]! @aws_auth(cognito_groups: ["product.read"])
}

Conclusion

AWS Cognito + AppSync transformed our authentication from custom JWT spaghetti into a centralized, secure, and scalable auth system supporting 10K+ users across 5 services with zero token management code.

The impact:

  • 95% reduction in auth-related code
  • 100% consistent authorization across services
  • Zero credential leaks in 12 months
  • <5ms token validation (AppSync native)
  • $5/month for 10K+ users

But the real win? Developers can add new services without rebuilding auth. No JWT libraries, no custom middleware, no token rotation nightmares.

If you're building GraphQL microservices with complex permission requirements, Cognito + AppSync is worth the investment in CDK setup. Your security team will thank you.


Originally published on [your blog/medium] • 15 min read