Integrating AWS Lambda with Your Majestic Monolith

Date: May 22, 2020
Time: 1:00 PM ET/10:00 AM PT
Duration: 1 hr

Speaker: Luke Closs, Founder, ReCollect Systems

Resources:
TechTalk Registration
AWS Lambda and the Serverless Framework - Hands-On Learning! (O’Reilly Video, Free for ACM Members)
AWS Lambda in Action: Event-Driven Serverless Applications (Skillsoft Book, Free for ACM Members)
AWS Lambda Quick Start Guide (O’Reilly Book, Free for ACM Members)
Amazon Web Services for Mobile Developers: Building Apps with AWS (Skillsoft Book, Free for ACM Members)
Cloud Computing, Second Edition (Science Direct Book, Free for ACM Members)
Moving to Serverless with AWS Lambda (O’Reilly Learning Path, Free for ACM Members)
Programming AWS Lambda (O’Reilly Book, Free for ACM Members)

Following the recent ACM Tech Talk, Luke Closs was kind enough to answer some additional questions we were not able to get to during the live event. The questions and answers are presented below:

I heard that API Gateway is costly. Is it cheaper to write your own HTTP trigger endpoint in EC2?

It really depends on usage! I can imagine scenarios where it makes financial sense not to use API Gateway. But then you're also trading off ops work. Part of the value is in the platform.

This type of question is pretty easy to answer and graph, as the pricing is transparent.

As a CTO I ask: do I want to be managing more infrastructure? Increasingly, the answer is no. So there is value in leveraging API Gateway, though other solutions are possible and reasonable.

So, it doesn’t matter what language you write the Lambda in, with respect to speed? One language will execute just as fast as another?

Of course Lambda is not magic. It's just other people's servers running your code. So each language has its own performance characteristics, and you'll need to assess what is appropriate for your team. Increasingly, though, the performance of the language matters less than developer efficiency. Sometimes, however, you really do care about performance, and then it will matter!

What is great about Lambda is that it lets you leverage these different languages! If you need to do some integration with a third-party API, maybe Node.js is nice because it's quick and easy to write in and build tests for. Maybe another project needs intense concurrency, where Go is a better choice. Perhaps you have existing Java code that would be handy to integrate with. You can do all of that.

What approach do you use if your processing has to exceed the Lambda max time length? Launch and terminate EC2 instances as needed?

Right - Lambda has a maximum processing time; it's designed for shorter tasks. For longer processing, I would look at other "serverless" technologies with different tradeoffs. For instance, I'm looking at Amazon Elastic Container Service, which is heavier weight to spin up but can process bigger items. Lambda isn't the right solution for every problem.

There is a limit on the size of the zipped package (Python package or JAR) you can put on Lambda (around 50 MB). This seems like a big bottleneck when you want to expose a rich TensorFlow ML app through an endpoint, because TensorFlow, PyTorch, and other such libraries are themselves pretty heavy. How would you approach this problem?

Yes - today there are size limitations on AWS Lambda packages. I've observed that these limits have slowly grown over time as the technology matures. One technique is to use AWS Lambda Layers, which increase the limit to about 250 MB unzipped.

For this kind of heavier app, perhaps Lambda is not the best fit, and I'd look at other serverless technologies like ECS + Fargate.

If you wanted to construct part or all of an API using Lambda, would you want one Lambda function per API endpoint, or would that be overkill (especially for a large API)?

I would start with a single Lambda for the whole API. If that Lambda started to get unwieldy, or started doing two or more independent things, then I'd consider breaking it out.

Maybe you’d like it to be separate lambdas for financial transparency? Maybe some API endpoints need vastly more memory (read: more expensive), whereas other API calls are light. You could upload the same code as 2 different lambdas that each had different performance, and then route to each.

But in general we should Keep It Simple.
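To make the single-Lambda approach concrete, here is a minimal Python sketch (the routes and handler names are hypothetical) of one Lambda dispatching on the API Gateway method and resource path:

```python
import json

# Hypothetical handlers for two endpoints of the same API.
def list_items(event):
    return {"items": ["a", "b"]}

def get_item(event):
    item_id = (event.get("pathParameters") or {}).get("id")
    return {"item": item_id}

# One Lambda, one route table: dispatch on HTTP method + resource path.
ROUTES = {
    ("GET", "/items"): list_items,
    ("GET", "/items/{id}"): get_item,
}

def handler(event, context):
    route = ROUTES.get((event.get("httpMethod"), event.get("resource")))
    if route is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps(route(event))}
```

If one route later needed its own memory setting or release cadence, it could be split out into its own function without changing the handler code much.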

Is there a wizard for generating the JSON for Step Functions?

Google will turn up some ideas that could be useful for you. I found it easy enough to craft by hand; we check the JSON in as part of our code and deploy it using the Terraform tools.
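For reference, that hand-crafted JSON is the Amazon States Language. A minimal two-state machine might look something like this (the function ARNs are placeholders):

```json
{
  "Comment": "Minimal example state machine (illustrative)",
  "StartAt": "Validate",
  "States": {
    "Validate": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate",
      "Next": "Process"
    },
    "Process": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process",
      "End": true
    }
  }
}
```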

How would you compare Google Firebase and AWS Lambda? (I know you had AWS EC2 experience and continued with AWS, but what can you say about this comparison?) As an instructor, finding that Google offers a complete mobile stack (including Android) - MaaS, IaaS, PaaS, serverless, and many ML/AI/authentication services - sometimes makes me favor it, though I do some EC2 and Lambda. What should students learn?

I see lots of students using and learning Google Cloud. That's cool. If it works and you can teach your curriculum with it, great. My possibly controversial observation is that I see far less usage of Google Cloud in industry than AWS. So I'd want students to be aware of both platforms; they each have pros and cons.

In my serverless function, can I invoke a Docker container?

AWS Lambda does not use containers, but other "serverless" technologies do work with containers, such as Google Cloud Run.

So is "serverless" technology in general compatible with Docker containers? Sure!

Do you reuse your microservices for other projects?

One example is that a couple years back we carved off our image serving APIs to a separate subsystem that uses Fastly CDN, Lambda, DynamoDB and S3. This has become some solid infrastructure that we can build on in any of our other projects and products.

But we’re mostly a monolith app. I wouldn’t really say we’re microservices, really. You can imagine very large companies that have thousands of microservices (Uber is the canonical example here) that all have complex interdependencies.

I mentioned the book Building Evolutionary Architectures, which talks about some of the challenges with this model, and how to cope with them.

I am English, I really care where the tea came from!

Excellent! So now you and I can point to our Wardley Map and debate the merits of using your artisanal tea vendor in our business. We can talk about the value it provides and how it impacts our business!

Please explain the Wardley Map example here as it applies to this talk. For instance, how does AWS (with serverless and small-piece programming replacing server-based designs) help us with these design approaches?

I showed the Wardley Maps to help provide some historical context for this technology. I tried to tell the story of Scale Up (Legacy IT) -> Scale Out (DevOps) -> PaaS/Serverless. Today people may dismiss serverless/Lambda as a toy. I think that's a mistake.

If you reuse a microservice, do you document this in a library of microservices? How do you organize and publish the catalog so developers can find these microservices?

In our team, I wouldn’t say we have a huge library of microservices - we’re mostly a monolith, but we increasingly take advantage of serverless. Our team is also quite small. We have a wiki where we document our Product Technical Architecture, and that may link to separate pages for different subsystems. Eg: our Images System, our Integrations Architecture and our PDF Generation all have separate wiki pages that try to cover our team’s knowledge.

For more information on microservices, I highly recommend Susan Fowler's book Production-Ready Microservices. She has an amazing checklist that anyone using microservices should use.

How do you see developers shifting to include finance in their decisions? That seems like a really big paradigm shift from "this is more efficient, let's make it work."

So in some ways this is new, but in some ways it is already part of Legacy and DevOps practices. It used to be "How big of a server should we buy?", then it became "How many servers should we buy?" In serverless it becomes "How much do I think this function will cost me?" (Or better: "How much will this collection of components cost me?")

I think to a developer, it’s just one more parameter to optimize for! But today our infrastructure often has an opacity that makes these costs disconnected from the developer.

Today, I look at the line items on my AWS bill once or twice a quarter. Where I notice trends or questions, I flag them for myself or my team to investigate. It's very easy to imagine the tooling here continuing to improve, with ML insights into what is trending upward or what could be optimized.

What are the main differences between Heroku and AWS, for someone who doesn't know Heroku at all?

Heroku is one of the canonical examples of a PaaS (Platform as a Service). For certain types of apps that fit the Heroku mold, it can be very easy to host your app there. Some apps may outgrow the limits of Heroku and need to look at other platforms. I have heard that Heroku can be expensive once you start scaling up very large apps.

Are you using serverless for handling incoming API calls from third parties? If so, what are challenges, etc and can you support SLAs with this kind of architecture?

Yes, for example in my talk I mentioned that we’re using serverless at the edge CDN to map requests. That’s a simple example of “serverless”. The alternative may be running my own HAProxy or Nginx or something.

Another example is our Image rendering API. For these applications, the SLAs that are provided by API Gateway and Lambda are sufficient for our needs.

Of course there are cases where extremely fast response times are needed, and the performance characteristics of AWS Lambda (specifically as a technology) may not fit.

Historically one criticism of AWS Lambda was that your function was not kept “hot” - so the first time it’s invoked, AWS needs to spin it up, and the initial response will be slower. In our experience, this has not been an issue in practice, for our requirements.

AWS Lambda did recently announce a new feature (in the past six months - this technology is still evolving!) called Provisioned Concurrency, which lets you pay to keep your Lambda hot. Again, depending on your team and your situation, this tech may or may not make sense relative to other solutions.

How do we mitigate the cold-start issue of Lambda functions?

Check out the recent Provisioned Concurrency AWS Lambda feature, which lets you pay to keep your lambda hot. It’s not cheap, but if it is a critical requirement for your app you can consider it.

Comments, thoughts, pros/cons on SageMaker?

Not really! I tried it for my first machine learning project last year, and it was easy and fun to learn. As someone new to ML, it was easy to get started - I didn't have to do any setup or maintenance, and I just paid for what I used, so my project could use big hardware while staying cheap!

I saw the word FINOPS next to DEVOPS on a slide. Luke, could you please give us a clue about what that is? Thanks!

Yeah, what IS IT? Finance Operations? DevOps + Finance? FinDevOps? I’m trying to point at a concept so don’t get hung up on the specific terms.

Simon Wardley calls it JEFF:

JEFF is just the placeholder for the meme that will eventually describe the co-evolved practices that emerge from serverless.

So, effectively you’re saying infrastructure as code is no longer a concern, just the application code?

I don’t think I would say that. Infrastructure as code came from the DevOps movement. I think the serverless best practices will incorporate Infrastructure as Code as a practice, it’s just not as relevant as in DevOps world.

Like in my team’s DevOps - we have our terraform and ansible - and it sets up EC2 servers and installs packages or builds docker images - all this kind of thing. And then my application sits on that.

In serverless, I still have some of that, just a lot less. Here, my Infrastructure as Code looks more like a serverless.yaml file that handles the deployment. It's there; it's just simpler and less of a concern because the infrastructure is more abstracted away.
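As an illustration, a minimal serverless.yaml for the Serverless Framework might look something like this (the service and handler names are made up):

```yaml
service: hello-api
provider:
  name: aws
  runtime: python3.8
functions:
  hello:
    handler: handler.hello   # hello() in handler.py
    events:
      - http:
          path: hello
          method: get
```

That one file declares the function, its runtime, and the HTTP trigger - the servers, scaling, and routing underneath are the platform's problem.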

What languages will continue to be used in the future? Should we start future-proofing our skills?

My answer is always to look at the skills and needs of your team and your business. Having said that, JavaScript will probably last 100 years. (Worse is better.)

As more and more serverless components exist, you can start to imagine more reuse at the component level. And those components will be written in many different languages. So perhaps being a polyglot is the future-proof way to live. Also, I'm a Perl hacker and I'm teaching you about serverless, so you do you. :)

Is it easy (reasonable) to build distributed (data-oriented) applications with lambdas?

Is it easy (reasonable) to build distributed (data-oriented) applications at all? Building distributed applications is hard, regardless of execution environment. There is no specific answer here - it depends on your context, your problem domain, your business, and your team. I believe that over the next 5-10 years, most newly built applications will be on serverless/PaaS platforms.

As you think about maintainability of Lambdas, and picking the best tool (language) for the job, how important is it to restrict ourselves to a small number of languages so that the team retains the skills to maintain them long-term?

I think this is a big concern! You really should look at the skills of your team.

A developer wants to build a critical component in a new language that nobody else on the team knows. Is that a good idea? It's probably a terrible idea.

In our team, we’ve got ~3 languages we use: Javascript, Perl, C# and some Golang. In hindsight while it’s cool to learn & build it golang, i probably would have used JS, as it’s already in our wheelhouse and we could invest in getting even better. We don’t do much Python, but some folks on our team have experience. If they wanted to build a project in python we’d be open to it, but I probably would challenge them to see if JS was appropriate, to keep our technology consistent.

I’m coming from the Academic space - I really like your use of wiki documentation - suggestions on good examples and/or how to ingrain this concept in new developers?

I highly recommend Susan Fowler's book Production-Ready Microservices. She has an amazing checklist that anyone using microservices should use. I'd love to see students exposed to the depth this checklist provides.

How do you connect big data with serverless?

Remember that "serverless" basically just means "you don't have to run the computers." I'm familiar with AWS, so I'll give you those examples. You could use AWS SageMaker for serverless ML. You could use AWS Glue for "serverless" batch or streaming ETL. You could load up an S3 bucket full of time-series data, use AWS Glue to process it into the efficient Parquet format, and then use AWS Athena to run serverless SQL queries on the data.

So maybe AWS Lambda isn’t the specific tool for your serverless use-case. That’s ok, it’s not for everything.

PROBLEM: Have you seen the costs of some of these serverless services (specifically thinking of ML)? So much! Will everyone be able to afford these services, or will they be forced to set up their own environments in IaaS to afford them?

It’s all tradeoffs, right? In our business we prefer to have a smaller team of developers, and to take advantage of managing less infrastructure. Other teams may find that it’s reasonable for them manage their own environments themselves. That’s not bad. It’s all tradeoffs. Serverless isn’t the only solution, and it’s not the cheapest solution for all problems.

When you pay for serverless and then the financial transparency hits - that can be surprising - wow this system costs this much to run!

When your developers build a custom solution that is 30% cheaper in terms of AWS costs, but it took them 2 months to build and will take their ongoing effort to keep working and upgrade… that cost is much more hidden and doesn’t show up on your AWS bill.

Main implication of building apps on cloud with deeper and deeper Platform-as-a-Service layers (eg. Spring on OpenShift on Kubernetes on RHEL on …) is that developers have much more complex dependencies on what’s ‘Under the Covers’. Are you also creating new Standards to keep this manageable (& fast) for devs?

I don’t agree with the premise. For the most part, I don’t care what version of Linux is running the AWS Lambda that is executing my code. Apps and developers can care less about the deeper layers with PaaS.

But as for how to keep things manageable for devs - I think different team sizes and cultures have different approaches.

I highly recommend Susan Fowler's book Production-Ready Microservices. Susan talks about how Uber tracks all the dependencies and SLAs on large interdependent projects. It's a fantastic resource.

Given that many EC2 instances are over-sized, are lambda functions a good cost reduction technique?

I think in many situations, yes! It's very possible to save money migrating from an EC2 instance, with roughly fixed costs for variable usage, to a serverless technology where you only pay by use.

As the pricing is very transparent, you could gather data on your current system - e.g., the number of API calls per month. You could do a short investigation into how resource-intensive your requests would be on Lambda (or, if you're lazy like me, assume the most expensive Lambda cost) and come up with an estimated cost to operate it serverlessly. Keep in mind you'll need to price in all the components you'd need.

Depending on your scale and the costs involved, you may decide it is or isn’t worth the engineering effort to make the change.
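That estimate is easy to sketch in code. The rates below are the published us-east-1 Lambda prices at the time of writing (an assumption - verify current pricing), and this covers Lambda compute only; remember to add API Gateway and any other components:

```python
# Back-of-the-envelope Lambda compute cost (illustrative rates; verify current pricing).
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per invocation
PRICE_PER_GB_SECOND = 0.0000166667     # USD per GB-second

def monthly_lambda_cost(requests_per_month, avg_duration_ms, memory_mb):
    # Compute is billed in GB-seconds: duration multiplied by allocated memory.
    gb_seconds = requests_per_month * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return requests_per_month * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

# 5M requests/month at 200 ms average on 512 MB:
print(round(monthly_lambda_cost(5_000_000, 200, 512), 2))  # ~9.33 USD
```

Comparing a number like that against the monthly cost of an over-sized EC2 instance makes the tradeoff concrete.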

Should a beginner select an old-fashioned server? If not, where should they begin?

I would start with the Serverless Hello World example.

Does AWS provide a uniform data format that enables Lambda functions written in different languages to communicate with each other?

Yep! Lambdas receive JSON and return JSON. When you invoke a Lambda, you invoke it with an event parameter (a hash/dictionary).
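As a tiny sketch in Python (the event fields here are made up), the handler receives the decoded JSON event as a dict, and whatever it returns is serialized back to JSON for the caller - which is what lets Lambdas in different languages interoperate:

```python
import json

def handler(event, context):
    # `event` arrives as a plain dict, already decoded from the JSON payload.
    name = event.get("name", "world")
    # The return value is serialized to JSON for the caller.
    return {"greeting": f"hello, {name}"}

# Simulate an invocation with a JSON payload:
payload = json.loads('{"name": "ACM"}')
print(handler(payload, None))  # {'greeting': 'hello, ACM'}
```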

AWS Step Functions does a beautiful job of choreographing this data in service of operating a State Machine.