Building & Debugging Custom Container Images for Lambda
Jun 07, 2021
So I've got a side-project that's one of those somewhat frustrating endeavours. It's just useful enough that I don't want to shut it down, but not important enough to justify much of my focus. Which in itself is fine. But inevitably something happens that means it suddenly requires my somewhat urgent focus: a CVE in a dependency that might expose the data I've stored in it, or the decommissioning of a service I'm dependent on. It's then that I inevitably discover that the bulk of what I'm running is no longer officially supported. Heroku has stopped supporting the version of Ruby I'm using. The Postgres version I'm using is woefully out of date. All manner of local development build chain is apparently deprecated.
And so what seemed like it would be a relatively trivial version bump means upgrading all of the things.
Taking ownership of erosion resistance
One of the propositions of Heroku was that it was "erosion resistant". Which to their credit they've done a commendable job of honouring; I've had joke apps still running there almost a decade later. But they don't make any promises about supporting a particular stack forever. Nor do I think they should. But I also don't love the upgrade-everything-at-once dynamic that happens when I do need to fix something.
It's been a long-time desire to refactor some of these smaller apps to use custom container images on AWS Lambda. For my fairly low-traffic and sporadic loads I don't need permanently provisioned infrastructure. I like the idea of knowing the stack I'm using will be usable for as long as I'm willing to support it. And for my current problem there's a path to make my upgrade hell easier by isolating the offending function into its own function/container/service and leaving the rest of the app untouched.
The best laid plans…
A quick skim of some AWS documentation, blog post announcements, and some GitHub READMEs and I'm ready to go. This should be easy. Some minor refactoring and a few Docker builds later and we get:
```
START RequestId: 5a0e91fc-7ae2-480e-af2d-260de6cebf90 Version: $LATEST
time="2021-05-17T07:50:54.186" level=warning msg="Cannot list external agents" error="open /opt/extensions: no such file or directory"
time="2021-05-17T07:50:54.186" level=warning msg="Couldn't find valid bootstrap(s)" bootstrapPathsChecked="[aws_lambda_ric]"
time="2021-05-17T07:50:54.186" level=warning msg="First fatal error stored in appctx: Runtime.InvalidEntrypoint"
```
And here I'm stuck. For hours. Actually, days. I slept on it and came back no wiser to what I'd done wrong. I get it, it needs a bootstrap script. But I've got one. In fact in my fumbling frustrations I've got dozens, just in case I'd misconfigured or misnamed them. Every permutation and interpretation I could think of is in my container. It's still not working.
It's at this point I should probably abandon this plan and just go make the change on Heroku like I ordinarily would. But... that's not how I do things. It's time to pull this thing apart and understand it properly.
So, dear reader, here's hoping you get value from this bottom-up dive into how custom Docker images for Lambda work.
The Steve Austin approach
It's ok, this isn't going to cost $6M. But we do have the technology. We can rebuild this. So it's time to start with the most basic and bare-bones container possible, and incrementally step our way toward something closer to the developer ergonomics I'm after. Each step solidifies understanding of how and why things work a certain way before adapting them to my particular needs.
We'll start by using the barebones shell examples from the AWS docs, which consists of a bootstrap script:
```sh
#!/bin/sh
set -euo pipefail

# Initialization - load function handler
source $LAMBDA_TASK_ROOT/"$(echo $_HANDLER | cut -d. -f1).sh"

# Processing
while true
do
  HEADERS="$(mktemp)"
  # Get an event. The HTTP request will block until one is received
  EVENT_DATA=$(curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
  # Extract request ID by scraping response headers received above
  REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)
  # Run the handler function from the script
  RESPONSE=$($(echo "$_HANDLER" | cut -d. -f2) "$EVENT_DATA")
  # Send the response
  curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$RESPONSE"
done
```
The Lambda container will use the script above when it is instantiated to bootstrap itself and fetch any work to do. The `$_HANDLER` environment variable is set at runtime and will contain the name of the function that contains our code. The `while true` loop ensures that we keep looping around to process events for as long as the container is alive. The `curl` command will block until we have an event to process, so we won't DoS ourselves by looping constantly when there is nothing to do.
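To make the header-scraping step concrete, here's the same extraction pipeline from the bootstrap run against a hand-written header dump (the request ID below is a stand-in value, not a real invocation):

```sh
# Simulate the header file that `curl -D` would have written. Note the \r\n
# line endings HTTP uses -- that's why the bootstrap strips all whitespace.
HEADERS="$(mktemp)"
printf 'Lambda-Runtime-Aws-Request-Id: 5a0e91fc-7ae2-480e-af2d-260de6cebf90\r\n' > "$HEADERS"

# Same pipeline as the bootstrap script: find the header, strip whitespace
# (including the trailing \r), then take everything after the colon.
REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)
echo "$REQUEST_ID"
# → 5a0e91fc-7ae2-480e-af2d-260de6cebf90
```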
Dockerfile

```dockerfile
FROM alpine:3.7

RUN apk add curl

ENV LAMBDA_TASK_ROOT=/var/task
ENV LAMBDA_RUNTIME_DIR=/var/runtime

COPY bootstrap ${LAMBDA_RUNTIME_DIR}/bootstrap
RUN chmod 755 ${LAMBDA_RUNTIME_DIR}/bootstrap

COPY function.sh ${LAMBDA_TASK_ROOT}/function.sh
RUN chmod 755 ${LAMBDA_TASK_ROOT}/function.sh

CMD [ "function.handler" ]
```
function.sh
```sh
function handler () {
  EVENT_DATA=$1
  echo "$EVENT_DATA" 1>&2;
  RESPONSE="Echoing request: '$EVENT_DATA'"
  echo $RESPONSE
}
```
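Because the handler is plain shell, we can sanity-check it without any container involved; here it's defined inline rather than sourced from `function.sh`:

```sh
# Same handler as function.sh, defined inline for a quick local check
handler () {
  EVENT_DATA=$1
  echo "$EVENT_DATA" 1>&2
  RESPONSE="Echoing request: '$EVENT_DATA'"
  echo $RESPONSE
}

# The event is echoed to stderr, the response to stdout
handler '{}' 2>/dev/null
# → Echoing request: '{}'
```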
```sh
docker build -t aws-lambda-custom-image-example .
```
Local testing
Install the Lambda Runtime Interface Emulator:
```sh
mkdir -p ~/.aws-lambda-rie && \
curl -Lo ~/.aws-lambda-rie/aws-lambda-rie \
  https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie && \
chmod +x ~/.aws-lambda-rie/aws-lambda-rie
```
Now you can mount the emulator into your container at runtime, and adjust the entrypoint to use it:
```sh
docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
  --entrypoint /aws-lambda/aws-lambda-rie \
  aws-lambda-custom-image-example /var/runtime/bootstrap function.handler
```
Test it:
```sh
curl -v -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
```
You'll see a `200` response and the literal string from our function: `Echoing request: '{}'`. Change the payload (i.e., the `{}` in the `-d '{}'` argument) and the response will echo it back instead.
Unpacking how and why this all works
Now that we've got a basic working implementation we can use it to understand all of the moving parts. Once there's a solid understanding of all of the fundamentals it'll be easier to rebuild it to do exactly what we need.
The Lambda Runtime Interface
The Lambda Runtime Interface is just an API contract your container needs to adhere to. The Lambda platform has a certain set of baseline expectations about how it can communicate with your container, and how your container will communicate back. If you don't honour the contract, nothing will work.
Our first contact with that is the URL we provided to the `curl` command. The `localhost:9000` host and port combo is how we target the Docker container we're running, but what about that long path of `/2015-03-31/functions/function/invocations`? Where did that come from? Enter the magic of the Lambda Runtime Interface Emulator we downloaded, and specifically the `aws-lambda-rie` command. We mounted the emulator into the container at runtime as a new volume. We then updated the entrypoint of the container to be `aws-lambda-rie`, which we then passed two additional arguments: the name of our bootstrap script and the name of our function, `/var/runtime/bootstrap` and `function.handler` respectively. Standard naming conventions with Lambda handlers expect the handler to follow the format of `filename_without_extension.method_name`. In our case we're calling the `handler()` method within the `function.sh` file. You'll see in the `bootstrap` script we append the `.sh` to make sure we source the correct file.
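A quick illustration of that split, using hypothetical stand-in values for the variables the platform sets at runtime:

```sh
# Stand-ins for what Lambda (or the emulator) provides at runtime
LAMBDA_TASK_ROOT=/var/task
_HANDLER="function.handler"

# Filename half: which file the bootstrap sources (with .sh appended)
SOURCE_FILE="$LAMBDA_TASK_ROOT/$(echo "$_HANDLER" | cut -d. -f1).sh"
# Method half: which shell function gets invoked per event
METHOD="$(echo "$_HANDLER" | cut -d. -f2)"

echo "$SOURCE_FILE"  # → /var/task/function.sh
echo "$METHOD"       # → handler
```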
The `aws-lambda-rie` command is what does all the magic of making our function callable via HTTP at the appropriate path. It's also helping us manage and process events and state for our invocations. Within the `bootstrap` script you may have noticed we're actually making requests to get an initial event to process, and then sending a POST request with the result when we're done processing. Our container is handling those requests too (i.e., we're querying an API within the same container). That API is all magically available and does what it needs to thanks to the Lambda Runtime Interface Emulator. On AWS these requests would go back out to the platform itself, but for the purposes of local testing this is more than sufficient.
Retrying our custom Ruby container
So here's the `Dockerfile` for my own custom Ruby 2.5 container image, which in theory should work based on how I understood the documentation:
```dockerfile
FROM alpine:3.7

ENV RUBY_MAJOR 2.5
ENV RUBY_VERSION 2.5.9
ENV RUBY_DOWNLOAD_SHA256 a87f2fa901408cc77652c1a55ff976695bbe54830ff240e370039eca14b358f0
ENV RUBYGEMS_VERSION 3.1.4
ENV LAMBDA_TASK_ROOT=/var/task
ENV LAMBDA_RUNTIME_DIR=/var/runtime

RUN mkdir -p /usr/local/etc \
    && { \
        echo 'install: --no-document'; \
        echo 'update: --no-document'; \
    } >> /usr/local/etc/gemrc

RUN set -ex \
    \
    && apk add --no-cache --virtual .ruby-builddeps \
        autoconf \
        bison \
        bzip2 \
        bzip2-dev \
        ca-certificates \
        coreutils \
        dpkg-dev dpkg \
        gcc \
        gdbm-dev \
        glib-dev \
        libc-dev \
        libffi-dev \
        libxml2-dev \
        libxslt-dev \
        linux-headers \
        make \
        ncurses-dev \
        openssl \
        openssl-dev \
        procps \
        readline-dev \
        ruby \
        tar \
        yaml-dev \
        zlib-dev \
        xz

RUN wget -O ruby.tar.xz "https://cache.ruby-lang.org/pub/ruby/${RUBY_MAJOR%-rc}/ruby-$RUBY_VERSION.tar.xz" \
    && echo "$RUBY_DOWNLOAD_SHA256 *ruby.tar.xz" | sha256sum -c - \
    \
    && mkdir -p /usr/src/ruby \
    && tar -xJf ruby.tar.xz -C /usr/src/ruby --strip-components=1 \
    && rm ruby.tar.xz \
    \
    && cd /usr/src/ruby \
    \
# hack in "ENABLE_PATH_CHECK" disabling to suppress:
#   warning: Insecure world writable dir
    && { \
        echo '#define ENABLE_PATH_CHECK 0'; \
        echo; \
        cat file.c; \
    } > file.c.new \
    && mv file.c.new file.c \
    \
    && autoconf \
    && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
# the configure script does not detect isnan/isinf as macros
    && export ac_cv_func_isnan=yes ac_cv_func_isinf=yes \
    && ./configure \
        --build="$gnuArch" \
        --disable-install-doc \
        --enable-shared \
    && make -j "$(nproc)" \
    && make install \
    \
    && runDeps="$( \
        scanelf --needed --nobanner --recursive /usr/local \
            | awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
            | sort -u \
            | xargs -r apk info --installed \
            | sort -u \
    )" \
    && apk add --virtual .ruby-rundeps $runDeps \
        bzip2 \
        ca-certificates \
        libffi-dev \
        openssl-dev \
        yaml-dev \
        procps \
        zlib-dev \
    && apk del .ruby-builddeps \
    && cd / \
    && rm -r /usr/src/ruby \
    \
    && gem update --system "$RUBYGEMS_VERSION"

ENV BUNDLER_VERSION 2.2.11

RUN gem install bundler --version "$BUNDLER_VERSION"

# install things globally, for great justice
# and don't create ".bundle" in all our apps
ENV GEM_HOME /usr/local/bundle
ENV BUNDLE_PATH="$GEM_HOME" \
    BUNDLE_BIN="$GEM_HOME/bin" \
    BUNDLE_SILENCE_ROOT_WARNING=1 \
    BUNDLE_APP_CONFIG="$GEM_HOME"
ENV PATH $BUNDLE_BIN:$PATH
RUN mkdir -p "$GEM_HOME" "$BUNDLE_BIN" \
    && chmod 777 "$GEM_HOME" "$BUNDLE_BIN"

RUN gem install aws_lambda_ric

WORKDIR ${LAMBDA_TASK_ROOT}
RUN mkdir -p ${LAMBDA_TASK_ROOT}
COPY app.rb ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/usr/local/bin/aws_lambda_ric"]
CMD ["app.App::Handler.process"]
```
And for the sake of completeness, here's the `app.rb` that serves as our function:
```ruby
module App
  class Handler
    def self.process(event:, context:)
      "Hello World!"
    end
  end
end
```
Towards the end of the `Dockerfile` we install `aws_lambda_ric`, which is the AWS Lambda Ruby Runtime Interface Client. Think of it like the emulator, but only implementing the API endpoints required for a production deployment (i.e., not the Lambda platform API for getting the next event or posting results back). It also serves as a replacement for our bootstrap script and handles the polling required to hand out additional work.

All the way back at the start of this post it was this script that was causing our error of "Couldn't find valid bootstrap(s)", even though the script exists. Was it not setting the right values somewhere? Is there something else a `bootstrap` needed to do to announce that the container was indeed bootstrapped?
Let's use what we've learnt from the minimal approach to debug this thing. Time to mount this container just like we did before with the emulator and post an event to it. We'll launch the new container, with the new bootstrap (i.e., `aws_lambda_ric` instead of our script), with the new handler:
```sh
docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
  --entrypoint /aws-lambda/aws-lambda-rie \
  lambda-ruby2.5 \
  /usr/local/bin/aws_lambda_ric app.App::Handler.process
```
Post an event to it just like before and... 502 Bad Gateway. Again. Looking at the logs, once again:
```
time="2021-06-07T06:51:55.195" level=error msg="Init failed" InvokeID= error="Couldn't find valid bootstrap(s): [/usr/local/bin/aws_lambda_ric]"
time="2021-06-07T06:51:58.162" level=warning msg="Couldn't find valid bootstrap(s)" bootstrapPathsChecked="[/usr/local/bin/aws_lambda_ric]"
```
Hrm. As we learnt before, the `bootstrap` is pretty simple; it just needs to sit there in a loop looking for work. Maybe I've been inferring more than I should have here from the "valid bootstrap" terminology? Maybe this isn't about `aws_lambda_ric` being invalid, maybe it can't find that script at all! Time to connect to the container via an interactive shell and test that theory:
```sh
$ docker run --entrypoint="" -it lambda-ruby2.5 /bin/sh
$ find / -name "aws_lambda_ric"
/usr/local/bundle/bin/aws_lambda_ric
/usr/local/bundle/gems/aws_lambda_ric-1.0.2/lib/aws_lambda_ric
/usr/local/bundle/gems/aws_lambda_ric-1.0.2/bin/aws_lambda_ric
```
Yep! Definitely a user problem here! It turns out all of the various base images I'd tried had been installing `aws_lambda_ric` into `/usr/local/bundle/bin/` (note the extra `/bundle` sub-directory). Update the `ENTRYPOINT` in my `Dockerfile` (and in the command to test locally) and it works!
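Concretely, the fix is just pointing the entrypoint at where the gem's executable actually lives; a sketch of the final lines of the corrected `Dockerfile`:

```dockerfile
# Before (broken): the gem's binstub is NOT in /usr/local/bin
# ENTRYPOINT ["/usr/local/bin/aws_lambda_ric"]

# After: use the path that find revealed inside the container
ENTRYPOINT ["/usr/local/bundle/bin/aws_lambda_ric"]
CMD ["app.App::Handler.process"]
```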
And there you have it. Use `aws_lambda_ric` (it's available in multiple languages, not just Ruby) and you can turn any custom image into a Lambda-compatible container.
Previously I led the Terraform product team @ HashiCorp, where we launched Terraform Cloud and set the stage for a successful IPO. Prior to that I was part of the Startup Team @ AWS, and earlier still an early employee @ Heroku. I've also invested in a couple of dozen early stage startups.