This document starts with a list of concepts, mindset and foundations, followed by jobs-to-be-done.
- Application Lifecycle Management (ALM)
- Architecturally significant requirements criteria: business value/risk, stakeholder concern, quality level, external dependencies, cross-cutting, first-of-a-kind, source of problems on past projects.
- Architectural decision records (ADRs): Records that support team alignment, document strategic directions for a project or product, and reduce recurring and time-consuming decision-making efforts.
- Continuous Configuration: values often fall into two groups: those that modify the operational behavior of an application (such as throttling, limits, connection limits, or logging verbosity) and those that control FAC (Feature Access Control), including feature flags, A/B testing, and user allow/deny lists.
- Coupling: Coupling describes the independent variability of connected systems, i.e., whether a change in System A has an effect on System B. If it does, A and B are coupled.
- Coupling facets (video, post): 1/ Technology (Java vs. C++, Kubernetes, PostgreSQL) 2/ Location (IP addresses, DNS) 3/ Data Format (Binary, XML, JSON, protobuf, Avro) 4/ Data Type (int16, int32, string, UTF-8, null, empty) 5/ Semantic (Name, Middlename, ZIP) 6/ Temporal (sync, async) 7/ Interaction Style (messaging, RPC, query, GraphQL) 8/ Conversation (pagination, caching, retries).
- Declarative provisioning is not equal to declarative language (video, slides)
- Event-Driven Architecture patterns: 1/ Event Notification 2/ Event-carried State Transfer 3/ Event Sourcing 4/ Command and Query Responsibility Segregation.
- Feature Flags
- GitOps: 1/ Declarative 2/ Versioned and Immutable 3/ Pulled Automatically 4/ Continuously Reconciled.
- Platform: a set of standardized elements that provide value but do not presuppose all problems.
- SaaS Architecture Fundamentals
- Software Boundaries or "Fracture Planes": 1/ Business Domain Bounded Context 2/ Regulatory Compliance 3/ Change Cadence 4/ Team Location 5/ Risk 6/ Performance Isolation 7/ Technology 8/ User Personas.
- Four key metrics of software delivery performance: 1/ Cycle Time (Change Lead Time) 2/ Deployment Frequency 3/ Change Failure Rate (CFR) 4/ Mean Time to Recovery (MTTR).
- Frugal Architecture: 1/ Make Cost a Non-functional Requirement 2/ Systems that Last Align Cost to Business 3/ Architecting is a Series of Trade-offs 4/ Unobserved Systems Lead to Unknown Costs 5/ Cost Aware Architectures Implement Cost Controls 6/ Cost Optimization is Incremental 7/ Unchallenged Success Leads to Assumptions.
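
The four key software delivery metrics listed above can be derived from a log of deployments. The following is a minimal sketch, not a definitive implementation: the `Deployment` record, its field names, and the `four_key_metrics` function are hypothetical, and it assumes that lead time is measured from commit to production deploy and that recovery time is measured from the failed deploy to service restoration.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # whether it caused a production failure
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deployments: List[Deployment], period_days: float) -> dict:
    """Compute the four key metrics over a reporting period of `period_days`."""
    if not deployments:
        raise ValueError("at least one deployment is required")
    if period_days <= 0:
        raise ValueError("period_days must be positive")

    # 1/ Change lead time: commit -> production, averaged, in hours.
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]

    # 3/ Change failure rate: share of deployments that caused a failure.
    failures = [d for d in deployments if d.failed]

    # 4/ Mean time to recovery: failed deploy -> restored, averaged, in hours.
    recovery_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "change_lead_time_hours": mean(lead_hours),
        # 2/ Deployment frequency: deployments per day in the period.
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": mean(recovery_hours) if recovery_hours else None,
    }
```

In practice, teams often report medians or percentiles rather than means, since a single slow change can skew an average; the mean is used here only to keep the sketch short.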
- Abstraction in the cloud: a service with higher-level vocabulary that shields from the complexity, security and operations of the underlying implementation (Alex Pulver)
- A system is only evolvable if you can easily understand it and you can safely change it. (Rebecca Parsons)
- Cloud automation tools like AWS CDK and Pulumi allow you to implement an application’s architecture and logic using the same programming language. (Alex Pulver)
- Build thoughtfully, and fail fast. (Nadiya Amlani)
- Don’t look for a great Idea, find a good Problem to solve. (Someone)
- Feature branching implies a lower bound to the size of a change-set - you can't be smaller than a cohesive feature. (Martin Fowler)
- Increasing the frequency of integration is an important reason to reduce the size of features. (Martin Fowler)
- Infrastructure libraries provide compositions, not abstractions, because consumers own security and operations of the underlying implementation. (Alex Pulver)
- Requirements are the things that you should discover before starting to build your product. Discovering the requirements during construction, or worse, when your client starts using your product, is so expensive and so inefficient, that we will assume that no right-thinking person would do it, and will not mention it again. (Suzanne and James Robertson)
- To achieve modularity we need to constantly watch our system as it grows and tend it in a more modular direction. Refactoring is the key to achieving this, and refactoring requires high-frequency integration. Modularity and rapid integration thus support each other in a healthy codebase. (Martin Fowler)
- You won’t be agile by focusing on agile frameworks. Agility requires changing everything we do, beginning with engineering our systems for lower delivery friction. (Bryan Finster)
- Architecture Independent Value Streams
- Domain-Driven Cloud: Aligning Your Cloud Architecture to Your Business Model
- Modernizing Technology and Mindset with ‘Enabling Teams’
- SaaS Cost Attribution: How to Align Technology with Business
- Start Your Architecture Modernization with Domain-Driven Discovery
- Strategies for investment in Tech Debt vs Product Debt when building new software products
- The Builder’s Guide to Better Mousetraps
- Using domain analysis to model microservices
- Why I Never Want to Build Another MVP
- B2B SaaS benchmarks: What metrics do VCs look at for signs of product-market fit?
- Product metrics that matter the most: A flywheel framework for cloud business leaders
- SaaS and the Rule of 40: Keys to the critical value creation metric
- Aligning SaaS and Service Planes Definitions
- Architect multitenant solutions on Azure
- Are you integrating or building distributed applications? (video, slides)
- AWS Decision Guides
- Building ClickHouse Cloud From Scratch in a Year
- Cloud Automation à la DDD: From stringly typed to affordances
- Cloud design patterns, architectures, and implementations
- Choreography vs Orchestration in the land of serverless
- Failing successfully: The AWS approach to resilient design
- How we ended up with microservices
- I’m sorry, but the way you adopt serverless is wrong
- Introducing the Journey to SaaS Guide to Help You Build, Launch, and Operate SaaS Solutions on AWS
- Kubernetes as a platform vs. Kubernetes as an API
- Minimizing Design Time Coupling in a Microservice Architecture
- Modern cloud applications: Do they lock you in? (video, slides)
- Monoliths are not dinosaurs
- On Designing and Deploying Internet-Scale Services
- SaaS architecture patterns: From concept to implementation (video, slides)
- Serverless or Kubernetes on AWS
- Takeaways of building a business-critical low-latency microservice at scale
- The Serverless Illusion
- You Want Modules, Not Microservices
- Application Design Framework (ADF)
- AWS Well-Architected Framework pillars: 1/ Operational excellence 2/ Security 3/ Reliability 4/ Performance efficiency 5/ Cost optimization
- Operational Readiness Reviews (ORR)
- SaaS Lens for the AWS Well-Architected Framework
- 7 tell-tale signs of fake DevOps
- Agile Rehab: Replacing Process Dogma with Engineering to Achieve True Agility
- DevOps at Amazon: A Look at Our Tools and Processes
- DevOps Topologies
- Fireside Chat: DevOps at Amazon with Ken Exner, GM of AWS Developer Tools - AWS Online Tech Talks
- Leadership Session: Developer Tools on AWS (video, slides)
- Linking Modular Architecture to Development Teams
- Pattern-based process for making design decisions
- Seven Shipping Principles
- Software Architecture: the Hard Parts
- Team Interaction Modeling with Team Topologies
- The Away Team Model at Amazon
- The problems with MVPs in legacy replacement (Part 1, Part 2)
- Two-pizza teams: Organizing for innovation (video, slides)
- Would you like architects with your architecture?
- Building Infrastructure Platforms
- Integrating Backstage at DAZN
- Platform Product Management Versus Platform Engineering
- The Magic of Platforms • Gregor Hohpe • PlatformCon 2022
- How Detailed Should a User Story Be?
- Product Backlog Building Canvas
- Product requirements: User/actor, Functional, Non-Functional, Technical (not usually in the story)
- Product Requirements Document
- Product requirements documents, downsized
- Shape Up: Mapping the Scopes
- Story types:
- User Story – “As a [type of user] I [want this thing] so that [I can accomplish this goal]”. Example: “As a site visitor, I want to see new content when I come to the site, so I come back more often”.
- Job Story – “When [situation], I want to [motivation], so I can [expected outcome]”. Example: “When it’s dinner time tonight, I want to have pizza so I can easily feed my friends”.
- Feature-Driven Development (FDD) – “[action] the [result] [by|for|of|to] a(n) [object]”. Example: “Generate a unique identifier for a transaction”.
- Why the Three-Part User Story Template Works So Well
- Box’s Aaron Levie on navigating SaaS’ several stages of growth
- Managing growth and value creation in SaaS: An interview with a software leader
- Amazon’s Not So Secret Weapon - The magic of Working Backwards: a real-world case study
- HEY Bubble Up: From kickoff to launch
- A Rails Multi-Tenant Strategy That's ~30 Lines and "Just Works"
- Amazon CodeWhisperer Customizations architecture case study
- Building Multi-Tenant Solutions with Amazon OpenSearch Service
- How to implement SaaS tenant isolation with ABAC and AWS IAM
- How to secure CI/CD roles without burning production to the ground
- Implementing SaaS Tenant Isolation Using Amazon SageMaker Endpoints and IAM
- Performance isolation in a multi-tenant database environment
- SaaS tenant isolation with ABAC using AWS STS support for tags in JWT
- Secure data movement across Amazon S3 and Amazon Redshift using role chaining and ASSUMEROLE
- Securing Multi-Tenant Kubernetes Clusters at Scale
- Solving large-scale data access challenges with Amazon S3 (video, slides)
- Architecture patterns for consuming private APIs cross-account
- Best practices for working with the Apache Velocity Template Language in Amazon API Gateway
- How Netflix Scales its API with GraphQL Federation (Part 1, Part 2)
- Should you use a Lambda Monolith, aka Lambdalith, for your API?
- A day in the life of a billion requests (slides, video)
- DynamoDB now supports resource-based policies. But is that a good idea?
- Edge Authentication and Token-Agnostic Identity Propagation
- Enhancing Amazon DynamoDB single-table design with AWS AppSync access and security features
- Entitlements: Architecting Authorization
- How to Persist JWT Tokens for Your SaaS Application
- Amazon DocumentDB (with MongoDB compatibility) user-defined roles for access control
- JSON Web Token (JWT) Profile for OAuth 2.0 Access Tokens
- On The Nature of OAuth2’s Scopes
- Amazon CI/CD Practices for Software Development Teams (video, slides)
- Amazon's approach to high-availability deployment (video, slides)
- Automate rollbacks for Amazon ECS rolling deployments with CloudWatch alarms
- Automating safe, hands-off deployments (video, article, podcast)
- Best practices for CI/CD using AWS Fargate and Amazon ECS (video, slides)
- Best practices for CI/CD with AWS Lambda and Amazon API Gateway
- Building a Continuous Integration Workflow with Step Functions and AWS CodeBuild
- Building a cross-account continuous delivery pipeline for database migrations
- Building and testing polyglot applications using AWS CodeBuild
- CDK Pipelines: Continuous delivery for AWS CDK applications
- Continuous Delivery: Anatomy of the Deployment Pipeline
- Continuous Delivery of Amazon EKS Clusters Using AWS CDK and CDK Pipelines
- Deploying GitOps with Weave Flux and Amazon EKS
- Deployment Pipelines Reference Architecture and Reference Implementations
- Ensuring rollback safety during deployments
- Migrating Critical Traffic At Scale with No Downtime (Part 1, Part 2)
- My CI/CD pipeline is my release captain
- Overview of Deployment Options on AWS
- Parallel and dynamic SaaS deployments with AWS CDK Pipelines
- Practicing Continuous Integration and Continuous Delivery on AWS
- Releasing Mission-Critical Software at Amazon (video, slides)
- Rolling Forward and other Deployment Myths
- Seamless branch deploys with Kubernetes
- Serverless CI/CD for the Enterprise on AWS
- The Scary Thing About Automating Deploys
- Using AWS Step Functions State Machines to Handle Workflow-Driven AWS CodePipeline Actions
- Validating AWS CodeCommit Pull Requests with AWS CodeBuild and AWS Lambda
- Applying the Twelve-Factor App Methodology to Serverless Applications
- Branch by Abstraction for major changes that take time
- Building production-ready prototypes (video, slides)
- Deploy AWS Organizations resources by using CloudFormation
- IAC Adoption Monitor
- Include CloudFormation templates in the CDK
- Managing resources using AWS CloudFormation Resource Types
- Running bash commands in AWS CloudFormation templates
- The Twelve-Factor App
- This is why you should keep stateful and stateless resources together
- Trunk-Based Development
- AWS KMS: How many keys do I need?
- Control Access to Your Data with Slack Enterprise Key Management and AWS KMS
- Architecture patterns for consuming private APIs cross-account
- Implementing the transactional outbox pattern with Amazon EventBridge Pipes
- Starbucks Does Not Use Two-Phase Commit
- Engineering Practices for LLM Application Development
- How to scale machine learning inference for multi-tenant SaaS use cases
- Implement Multi-Region Serverless (and Functionless) WebSocket Pub/Sub APIs with AWS AppSync and Amazon EventBridge
- Ten tips for multi-tenant, multi-Region object replication in Amazon S3
- Addressing latency and data transfer costs on EKS using Istio
- Building the Next Evolution of Cloud Networks at Slack – A Retrospective
- Designing hyperscale Amazon VPC networks
- How FactSet handles networking for 1000+ AWS accounts
- VPC sharing: key considerations and best practices
- Amazon CloudWatch Now Includes Contributor Insights - in Preview
- AWS X-Ray (see also Integrating AWS X-Ray with Other AWS Services)
- AWS X-Ray Now Supports Amazon API Gateway and New Sampling Rules API
- Container monitoring for Amazon ECS, EKS, and Kubernetes is now available in Amazon CloudWatch
- Debugging with Amazon CloudWatch Synthetics and AWS X-Ray
- One observability workshop
- Using Prometheus Metrics in Amazon CloudWatch
- Visualize and Monitor Highly Distributed Applications with Amazon CloudWatch ServiceLens
- Accounting for the Basecamp 3 outage on June 27, 2022
- Amazon’s approach to failing successfully (video, slides)
- Building dashboards for operational visibility
- Changing the Wheels on a Moving Bus — Spotify’s Event Delivery Migration
- Kubernetes cluster upgrade: the blue-green deployment strategy
- Resolve IT Incidents Faster with Incident Manager, a New Capability of AWS Systems Manager
- Towards Operational Excellence blog post series:
- ZEN and the art of Reliability
- Decomposing the GitLab backend database:
- E-Commerce at Scale: Inside Shopify's Tech Stack - Stackshare.io
- Herding elephants: Lessons learned from sharding Postgres at Notion
- Improve performance and manageability of large PostgreSQL tables by migrating to partitioned tables on Amazon Aurora and Amazon RDS
- Partitioning GitHub’s relational databases to handle scale
- Scaling Datastores at Slack with Vitess
- Scaling Etsy Payments with Vitess: