End-to-End Cloud Engineering


This showcase describes how we implemented a solution for our customer from zero to production in just 4 months using AWS serverless services while keeping AWS running costs under $50 per month.

arch.cloud develops cloud-native solutions for customers who want to gain a market advantage through low-cost, highly scalable, leading-edge technology. We use the best available cloud for the job: AWS, GCP or Azure, depending on customer needs.

In this showcase we demonstrate how we used the AWS and Particle clouds to implement an end-to-end cloud-native solution for our customer.

Business Requirements

Our customer, ENFusion Energy LLC, is a manufacturing company located in Colorado, US. It produces devices that use solar energy to power motors, pumps and compressors.

With thousands of such devices installed across the US, ENFusion needed to operate and monitor the devices centrally and to give third-party partners API-based access to them.

Each ENFusion motor controller can be connected to the cloud through ENStratus Comm, which is based on Boron IoT devices developed by Particle.

Particle cloud provides an overall IoT solution that ENFusion uses to establish secure connectivity between its field devices and the Particle cloud. Using Particle webhooks and APIs we are able to establish two-way communication between field devices and AWS. Even though Particle cloud comes with many useful features, in our project we use Particle only as a communication component between ENFusion devices and AWS. All data processing and management is done in AWS.
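To make the two-way communication more concrete, here is a minimal Python sketch of both directions. It is an illustration, not the project's actual code. It assumes that the Particle webhook posts its default JSON payload (the fields event, data, published_at and coreid) to an HTTPS endpoint that invokes a Lambda function, and that the device publishes its status as a JSON string. The function names parse_particle_event, handler and build_function_call are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_particle_event(body: str) -> dict:
    """Normalize a Particle webhook body (default payload fields assumed)."""
    payload = json.loads(body)
    published = datetime.strptime(
        payload["published_at"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "device_id": payload["coreid"],
        "event": payload["event"],
        # Integer milliseconds since the epoch, without float rounding.
        "timestamp_ms": (published - EPOCH) // timedelta(milliseconds=1),
        # Assumption: the device sends its status as a JSON string.
        "status": json.loads(payload["data"]),
    }


def handler(event, context):
    """Device -> AWS: hypothetical Lambda entry point behind an HTTPS endpoint."""
    try:
        record = parse_particle_event(event["body"])
    except (KeyError, TypeError, ValueError):
        return {"statusCode": 400, "body": "malformed webhook payload"}
    # A real handler would store the record here.
    return {"statusCode": 200, "body": json.dumps({"device_id": record["device_id"]})}


def build_function_call(device_id: str, function: str, arg: str, token: str):
    """AWS -> device: build (but do not send) a Particle Cloud API function call."""
    url = f"https://api.particle.io/v1/devices/{device_id}/{function}"
    data = urllib.parse.urlencode({"arg": arg}).encode()
    return urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Passing the request returned by build_function_call to urllib.request.urlopen would send the command to the device. The sketch leaves that step out so that it can be read and tested without network access or a real access token.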

Main application features:

device management
configurable charts showing solar and grid power usage
device geolocation
daily scheduling of device controls
history of device events
support for different user personas: admins, partners, dealers, customers

Project Roadmap

The project, named ENStratus Cloud, was divided into three phases with an estimated overall duration of 4 months. For arch.cloud the only unknown was the connectivity between the AWS and Particle clouds. We had not used Particle before, so we decided to start by understanding this part. In parallel we did the usual "AWS first steps": setting up the account structure, creating a landing zone, establishing access controls and setting up infrastructure monitoring tools.

Once the analysis of business requirements was finished, we were able to design the solution using AWS serverless services, start designing the user experience and script the whole AWS environment in the infrastructure as code tool CDK (Cloud Development Kit).

After the first phase was completed, we moved to a regular iterative development approach with weekly sprints. About a month into development we had the first business review session with the ENFusion sales and marketing teams, where some new requirements were added and some existing ones were refined.

The last phase of development was a continuation of the iterative approach until all to-dos were removed from our kanban board.

Accounts

The first step is always to set up a proper account structure according to AWS best practices. We use AWS Organizations to create Organizational Units, accounts, Service Control Policies and Single Sign-On access. Using the delegated administration feature, we set up all security tools (Security Hub, GuardDuty, IAM Access Analyzer and Config) in the Security account. Logs from all accounts (CloudTrail, Config) are shipped to the Log account, where we use Athena to analyze and inspect them. The Networking account is used to set up DNS delegation (Route 53), and the Backup account is where DynamoDB, Cognito and S3 data are backed up. The Backup account stores data in a different region (us-east-1) from the one where the application itself is running (us-east-2).

(If you want to learn more about AWS account management and security, take a look at our 2-day training AWS Account Management)

Solution Architecture

The following diagram shows the final solution architecture that we developed for ENStratus Cloud.

There are three areas of interest in the architecture:

Static Web Site hosting - a public website hosted on S3 behind a CloudFront distribution, with Certificate Manager used to enable TLS encryption.
Operator Dashboard - a private admin console that requires authentication. Authentication is done via the Cognito service, the front-end UI is developed with the React framework, and dynamic interaction with the backend uses GraphQL operations managed by the AppSync service.
API Access - API access to ENStratus Cloud for third-party industrial partners, implemented with a REST API in API Gateway.

Backend business logic is implemented entirely with serverless Lambda functions coded in Python. Data is stored in DynamoDB using a single-table design, which we cover later on.

Interesting bits:

No VPC, No Servers - all services used are serverless, AWS-managed services. Scaling, availability, reliability and responding to failures are all handled automatically by AWS. The operational effort of maintaining such an environment is reduced to a minimum, which was one of the initial business requirements.
CloudWatch as data store - ENFusion devices send status data to AWS every minute as a JSON document. Initially we stored these device events in the Amazon Timestream database, but we quickly realized that at a scale of 5,000 devices the cost of Timestream would reach a few thousand dollars per month. That would hurt our customer, so we searched for an alternative. We calculated the cost of using DynamoDB and Aurora Serverless instead, but neither was good enough. In the end we decided to use CloudWatch as a data store. CloudWatch was by far the cheapest option, and we stored the JSON data events just like logs in log groups. Retrieving them was straightforward with CloudWatch Logs Insights, where we could define the timeframe and size of the data events to search for. Each device had its own log group, created automatically when the device was registered. After 30 days the logs were copied to S3 for retention. This solution reduced the cost from several hundred dollars for Timestream (for the first batch of 100 devices) to just a few dollars for CloudWatch. To inspect CloudWatch logs locally we used Lola, a desktop tool that simplifies navigation through different log groups.
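The CloudWatch approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the production code. The logs argument is assumed to be a boto3 CloudWatch Logs client (boto3.client("logs")). The log group naming scheme, the stream name "status" and all function names are hypothetical.

```python
import json


def log_group_for(device_id: str) -> str:
    # Hypothetical naming scheme: one log group per device.
    return f"/enstratus/devices/{device_id}"


def register_device(logs, device_id: str) -> str:
    """Create the per-device log group and a stream to write into."""
    group = log_group_for(device_id)
    logs.create_log_group(logGroupName=group)
    logs.create_log_stream(logGroupName=group, logStreamName="status")
    return group


def store_status(logs, device_id: str, status: dict, timestamp_ms: int) -> None:
    """Store one JSON status document as a single log event."""
    logs.put_log_events(
        logGroupName=log_group_for(device_id),
        logStreamName="status",
        logEvents=[{"timestamp": timestamp_ms, "message": json.dumps(status)}],
    )


def start_status_query(logs, device_id: str, start_s: int, end_s: int, limit: int = 100) -> str:
    """Start a Logs Insights query over a time window given in epoch seconds."""
    response = logs.start_query(
        logGroupName=log_group_for(device_id),
        startTime=start_s,
        endTime=end_s,
        queryString="fields @timestamp, @message | sort @timestamp desc",
        limit=limit,
    )
    return response["queryId"]


def rows_from_results(response: dict) -> list:
    """Flatten a get_query_results response into one dict per returned event."""
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

Logs Insights queries run asynchronously, so a caller would poll logs.get_query_results(queryId=...) until the returned status is "Complete" and then pass the response to rows_from_results.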

(If you want to learn more about AWS services and development, take a look at our 2-day training AWS For IT Teams)

Infrastructure as Code

Automation for us is job zero. Every piece of the solution has been scripted from the beginning. Nothing is created manually through the Console or CLI.

For setting up infrastructure we used AWS CDK, the AWS-native infrastructure as code tool. We used TypeScript to define every service of our solution in CDK. Writing the infrastructure components in a general-purpose programming language such as TypeScript was much closer to our developer mindset than using declarative configuration languages such as YAML for CloudFormation or HCL for Terraform.

We used multiple CDK applications to enable different teams to work on different pieces of code. The infrastructure team wrote the CDK application that deals with common infrastructure services such as CloudFront, S3, SNS, Cognito etc. The following directory listing shows the services defined by the infrastructure CDK application.

Other development teams had their own CDK applications. The team working on the REST API Gateway had the "api_lambda" CDK application, where all Lambdas for third-party partners were defined. The frontend team developing AppSync Lambda resolvers had its own "appsync_lambda" CDK application.

With such automation in place we can create a new environment in a new account in about 45 minutes. If we are not happy with how the solution looks, we can tear it down equally fast.

Development

As for code organization, we decided to go with a multi-repo approach where each part of the application had its own CodeCommit source code repository and its own CodePipeline.

All developers needed to do was "git push" from their local machine. They never needed to go to the AWS environment to deploy something. The pipeline attached to each repository would detect new code and automatically apply the changes.

The code is initially pushed to the DEV account, where the CodeCommit repositories are located. The pipelines build the code and deploy it to the DEV, TEST and PROD accounts sequentially, with manual approval steps in between. This is another benefit of using AWS CDK: with the CDK Pipelines L3 construct it becomes trivial to deploy the code across different AWS accounts.

Single Table Design

To enable practically unlimited scaling with near-constant performance, we decided to use the single-table design approach for DynamoDB. That means all of our data is stored in a single NoSQL table and every relevant piece of data can be reached with a single DynamoDB operation.

We used the primary key overloading technique to place all entities in a single table. Global Secondary Indexes (GSIs) were used on particular attributes to enable data search operations.

In creating a single NoSQL table, the first steps are to understand the relational model and the data access patterns. The corresponding relational model with access functions (queries and mutations) is the following:
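As an illustration of primary key overloading, the following Python sketch keeps a few entities in one list that stands in for the table. The entity types, the key formats (CUSTOMER#..., DEVICE#..., SCHEDULE#...) and the index attributes GSI1PK and GSI1SK are hypothetical and are not the actual ENStratus Cloud data model. The query function is an in-memory stand-in for a DynamoDB Query with a begins_with condition on the sort key.

```python
def make_item(pk: str, sk: str, **attrs) -> dict:
    """Build one table item; PK and SK hold different entity keys per item type."""
    return {"PK": pk, "SK": sk, **attrs}


# One "table" holding three entity types (all values are made up).
TABLE = [
    make_item("CUSTOMER#c1", "PROFILE", name="Example Farms"),
    make_item("CUSTOMER#c1", "DEVICE#d1", GSI1PK="DEVICE#d1", GSI1SK="CUSTOMER#c1"),
    make_item("CUSTOMER#c1", "DEVICE#d2", GSI1PK="DEVICE#d2", GSI1SK="CUSTOMER#c1"),
    make_item("DEVICE#d1", "SCHEDULE#2024-01-01", action="on"),
]


def query(items, pk, sk_prefix="", pk_attr="PK", sk_attr="SK"):
    """Return items with the given partition key whose sort key starts with sk_prefix.

    Passing pk_attr="GSI1PK" and sk_attr="GSI1SK" emulates querying a
    Global Secondary Index; items without those attributes are not in the index.
    """
    matches = [
        item
        for item in items
        if item.get(pk_attr) == pk
        and sk_attr in item
        and item[sk_attr].startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda item: item[sk_attr])


# Access pattern 1: all devices of a customer, in one query on the base table.
devices = query(TABLE, "CUSTOMER#c1", "DEVICE#")

# Access pattern 2: find the owner of a device, in one query on the index.
owner = query(TABLE, "DEVICE#d1", pk_attr="GSI1PK", sk_attr="GSI1SK")
```

Because the customer profile and its devices share a partition key, a query without a sort key prefix returns the whole customer "folder" in one request, which is the main reason to overload the key in the first place.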

Customer Feedback

Takeaways

Average application response time is about 20 ms per page with 500 concurrent users.
A remotely distributed team of 5 people was able to deliver the project in 4 months. Team composition: 1 AWS Solution Architect, 1 front-end React developer, 2 Python backend developers and 1 embedded systems engineer.
Only 1 person is currently operating the environment and making sure that things are running smoothly.
Using AWS managed services, we have never experienced any failures or downtime in our solution.
Team satisfaction with AWS serverless services has gone through the roof.