Planning to go “FaaS Serverless” all-in? Here is what you need to consider

Eran Levy
6 min read · Feb 5, 2024

TL;DR

FaaS Serverless architecture offers scalability, cost savings, and operational simplicity, but it brings challenges such as complex call chains and the need for specific tooling for development and observability. Effective CI/CD practices, thoughtful observability strategies, and the right development tools are essential for managing “serverless” functions, ensuring a smooth developer experience and maintaining production systems. See below for more on the following: Function vs Service, Cost, Developer Experience, CI/CD, Observability.

In the rapidly evolving landscape of software engineering, Function as a Service (FaaS) Serverless architecture offers an enticing promise: to focus purely on code, leaving the burdens of infrastructure to the cloud providers. This promise has led many engineering organizations to consider, and in some cases, fully commit to going all-in with FaaS Serverless architectures. As an engineer, you can quite easily select a serverless function service (such as AWS Lambda, Azure Functions, or GCP Cloud Functions) from your cloud provider, initiate a new function, connect it to your API gateway or queue service, and start handling traffic immediately.

Obviously, FaaS isn’t a new capability; it has been around for years, and cloud providers continue to invest in this area, introducing new runtimes, features, etc. In addition to these FaaS services, cloud providers are also releasing “serverless” services that enable you, as a cloud-native engineer, to focus on developing your logic and delivering impact, rather than managing the infrastructure.

While the benefits of FaaS — such as scalability, cost savings, and operational simplicity — are widely celebrated, embracing this capability in your architecture introduces a set of challenges.

At Zesty, our architecture has extensively incorporated AWS Lambda and other “serverless” technologies alongside Kubernetes, especially in the initial phases. This approach was chosen to tackle growth challenges and delegate the management of our infrastructure to AWS. Having now worked for the first time with an architecture that heavily utilizes AWS Lambda, I think this is a valuable opportunity to share some insights and ‘gotchas’ I’ve encountered that are not clearly addressed in existing blogs:

  • Function vs Service — A “serverless” function is merely a single function and does not constitute a service by itself. A collection of “serverless” functions can, however, be combined to form a service. This can get very tricky, and the boundaries blur easily. If you decide to compose your services from a set of “serverless” functions, ask yourself the following while designing a feature: How do you build a domain context on top of a set of functions? How do you define the domain context boundaries? How do you make sure the domain contract is well defined? How do you make sure the rules of thumb around version control are in place and won’t break as your engineering organization grows? (See the first sketch after this list for one way to keep handlers thin and the domain contract shared.)
  • Cost — If you haven’t been completely out of the loop, you’ve probably come across the post from May 2023 detailing how the Amazon Prime Video team transitioned one of their service architectures from serverless components, like AWS Lambda and AWS Step Functions, to EC2 and ECS, achieving a cost reduction of 90%. It serves as a reminder that cost is a crucial non-functional design aspect to consider, alongside observability and more. Despite their significant benefits, distributed systems inherently incur higher costs, not only financially but also in terms of engineering effort. I published a post on that topic in the past (Let’s Take Our Conversations about Microservices to the Next Level). “Serverless” functions, inherently designed to perform specific, well-defined tasks, prompt a critical decision: should these tasks be executed in a separate process for each execution, or could they simply run as an additional thread within an existing process (i.e. your Kubernetes pod) that’s already handling the request from start to finish (or whatever other service orchestration is in use)? When considering building your services around “serverless” components, ask yourself several key questions: How often does this function trigger? What is its expected duration? What are its resource (memory/storage) allocations? What value do I get from running it in a dedicated process? (A back-of-the-envelope cost sketch follows this list.)
  • Developer Experience — Navigating cloud-native engineering is a complex task. You are developing a service that is part of a larger chain of calls, and the development environment depends on many things: your company size, existing architecture, development tools, and more. It’s crucial to allocate additional DevOps and engineering resources to refine your development setup. Without this investment, there’s a real risk of engineers defaulting to local code execution without adherence to any standards, which can slow down development, degrade engineering velocity, and lead to a cluttered codebase full of commented-out or “if/else” workarounds. Moreover, excuses such as “it worked on my machine” can quickly become common as each environment accumulates its own quirks. How you implement the Twelve-Factor App methodology also varies with where your workload runs: injecting secrets, setting environment variables, and more won’t look the same on your Kubernetes cluster vs. your AWS Lambda functions. There are many questions to ask yourself: How do you run locally? How do you debug a function? How do I connect to peripheral services such as other APIs or databases? Do I use the cloud provider tooling (such as AWS SAM) or a framework such as the Serverless Framework? And there is much more to consider… (See the local-run sketch after this list.)
  • CI/CD — Developing a CI/CD pipeline for “serverless” functions, such as AWS Lambda, requires a slightly different approach than traditional CI/CD processes, such as those used for applications deployed on Kubernetes. Give thoughtful consideration to how your “serverless” functions are organized in your Git repository and to how you package your function artifacts, whether as ZIP files or as container images on AWS. Each packaging approach comes with its own advantages and disadvantages, and some cloud providers offer a layering mechanism that can assist with such packaging strategies and affects the actual runtimes. One more foundational question before we move on: which deployment tool are you going to use? The cloud provider tooling (such as AWS SAM), a framework such as the Serverless Framework, or something else? This ties back to the developer experience in the previous bullet, as these deployment tools will obviously impact your engineers as well. Having established the foundations, let’s move on to the general CI/CD practices, which require slight changes for “serverless” functions: How do you test your function? How do you run integration tests? How do you inject environment variables and secrets? Does your DevOps tooling (such as your security scanners) integrate well with “serverless” functions? How do you roll out a “serverless” function gradually and roll back if your tests fail? (A gradual-rollout sketch follows this list.)
  • Observability — Every company has its own set of tools for system observability. I would like to emphasize two critical points here: (a) developing “serverless” functions expands your call chain significantly, making it challenging to monitor without end-to-end tracing capabilities; (b) you will need to adapt how you dispatch logs and metrics, as it will differ from methods like exposing a /metrics endpoint on your Kubernetes containers for Prometheus to scrape. These points are crucial because they require dedicating time to implement appropriate tooling if you want to understand what is going on in your production system. Now that a much larger chain of calls, with more queues in the middle, is needed to accomplish a piece of business logic, you should have the right tooling to view the end-to-end trace, alert on misbehaving spans, etc. In addition, how do you dispatch your metrics to your metrics storage, such as AWS CloudWatch? Do you use a dedicated SDK such as AWS Lambda Powertools, or do you do it differently? Do you just log everything to your cloud provider’s monitoring tools, such as AWS CloudWatch, and ship them onward from there? (See the Powertools sketch after this list.)
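
To make the “Function vs Service” point concrete, here is a minimal Python sketch of one way to keep the domain contract in a single shared module while each Lambda stays a thin entry point. The `orders` domain and all its names are hypothetical, and in a real repository the domain module would live in a shared package or Lambda layer rather than in one file:

```python
# Hypothetical "orders" domain, sketched in one file for brevity.
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    amount_cents: int

def create_order(order_id: str, amount_cents: int) -> Order:
    """The domain contract lives here, versioned and reviewed as a unit."""
    if amount_cents <= 0:
        raise ValueError("amount_cents must be positive")
    return Order(order_id=order_id, amount_cents=amount_cents)

def handler(event, context):
    """One Lambda function: a thin adapter over the shared domain."""
    order = create_order(
        order_id=event["order_id"],
        amount_cents=int(event["amount_cents"]),
    )
    return {"statusCode": 201, "body": order.order_id}
```

The benefit of this split is that the contract lives in one reviewable place, so adding a second handler (say, a hypothetical cancel_order) doesn’t blur the domain boundary.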
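For the cost question, a back-of-the-envelope calculation often settles the “dedicated process vs. extra thread” debate quickly. The sketch below uses illustrative us-east-1 Lambda prices that may be outdated; treat them as assumptions and plug in your own numbers:

```python
# Rough monthly Lambda cost: GB-seconds of compute plus a per-request fee.
# Both prices are illustrative assumptions; check your provider's pricing.
GB_SECOND_PRICE = 0.0000166667    # USD per GB-second
REQUEST_PRICE = 0.20 / 1_000_000  # USD per request

def lambda_monthly_cost(invocations: int, avg_duration_s: float,
                        memory_gb: float) -> float:
    compute = invocations * avg_duration_s * memory_gb * GB_SECOND_PRICE
    requests = invocations * REQUEST_PRICE
    return compute + requests

# Example: 10M invocations/month, 200 ms average duration, 512 MB memory.
print(f"${lambda_monthly_cost(10_000_000, 0.2, 0.5):.2f} per month")
# Compare the result against the flat cost of a pod or instance that is
# already running and could absorb the same work as an extra thread.
```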
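On developer experience, one cheap habit is writing every handler so it can be invoked locally with a canned event before reaching for heavier tooling such as AWS SAM or the Serverless Framework. A minimal sketch, where the event shape and the GREETING variable are assumptions for illustration:

```python
import json
import os

def handler(event, context):
    # Read config from environment variables rather than literals, so
    # local runs and deployed runs stay Twelve-Factor compatible.
    greeting = os.environ.get("GREETING", "hello")
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({greeting: name})}

if __name__ == "__main__":
    # Local smoke test: a canned event, no deploy and no cloud needed.
    os.environ.setdefault("GREETING", "hi")
    print(handler({"name": "local-dev"}, context=None))
```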
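For gradual rollout on AWS, one common pattern is to publish each deployment as an immutable function version and shift a small share of an alias’s traffic onto it using Lambda’s weighted-alias routing. Services like AWS CodeDeploy can automate this, but the underlying boto3 calls look roughly like the sketch below; the function and alias names are hypothetical:

```python
import boto3

lam = boto3.client("lambda")
FUNCTION = "orders-create"  # hypothetical function name
ALIAS = "live"              # hypothetical alias serving production traffic

# 1. Publish the freshly deployed code as an immutable version.
new_version = lam.publish_version(FunctionName=FUNCTION)["Version"]

# 2. Canary: keep the alias on the current version, send 10% to the new one.
lam.update_alias(
    FunctionName=FUNCTION,
    Name=ALIAS,
    RoutingConfig={"AdditionalVersionWeights": {new_version: 0.10}},
)

# 3. If the canary looks healthy, promote fully; on failure, update the
#    alias again without the new version to send all traffic back.
lam.update_alias(
    FunctionName=FUNCTION,
    Name=ALIAS,
    FunctionVersion=new_version,
    RoutingConfig={"AdditionalVersionWeights": {}},
)
```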
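Finally, for observability, this is roughly what instrumenting a handler with Powertools for AWS Lambda (Python) looks like: structured JSON logs, custom metrics flushed in CloudWatch EMF format, and an X-Ray trace segment per invocation. The service name and metrics namespace are placeholders:

```python
from aws_lambda_powertools import Logger, Metrics, Tracer
from aws_lambda_powertools.metrics import MetricUnit

logger = Logger(service="orders")     # structured JSON logs
metrics = Metrics(namespace="MyApp")  # placeholder metrics namespace
tracer = Tracer()                     # X-Ray tracing

@logger.inject_lambda_context
@metrics.log_metrics          # flushes metrics as CloudWatch EMF on return
@tracer.capture_lambda_handler
def handler(event, context):
    logger.info("processing order", extra={"order_id": event.get("order_id")})
    metrics.add_metric(name="OrdersProcessed", unit=MetricUnit.Count, value=1)
    return {"statusCode": 200}
```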

Beyond the points mentioned, there’s a wealth of additional considerations, with numerous blogs and resources available that delve into architecture, best practices, and more. A simple web search will lead you to some valuable reads.

Happy coding ;)
