What is a Cloud-Native Application?

January 24, 2023

The term Cloud Native goes beyond moving applications from on-premises hosting (that is, running directly in our own data centre) to the infrastructure of a Cloud provider, whether public or private.

What is known as "lift & shift" is nothing more than a process in which our applications stop running on the on-premises infrastructure of our data centre and start running on a Cloud infrastructure, often without any redesign of their architecture or of the practices used to build, deploy and operate them.

Obviously, we can take advantage of some basic "out of the box" benefits by using infrastructure with greater redundancy, with backup facilities, updated with the latest security patches, etc.

But we have to bear in mind that our application is not going to become Cloud Native just because we deploy it in the Cloud: if you have a system that is a chestnut and you deploy it on a Kubernetes cluster in AWS EKS... you have a 'kubernetised' chestnut!

Our application is not going to become Cloud Native just because we deploy it in the Cloud

The Cloud has changed the rules of the game

Not so many years ago, we had to study carefully the capacity (compute, network and storage) that our system would need to offer a guaranteed service, and then place an order with one or several suppliers to cover that capacity, which could take months to be ready. From time to time, we had to assess the potential growth of the system and buy more hardware if we didn't want our clients to leave us when it stopped working.

Today it is possible to do all this with a couple of clicks on an administration console or, better yet, with a call to an API (Application Programming Interface) that allows us to automate the process. The Cloud has turned compute, network, storage and other more advanced services (databases, message queuing systems, data analytics, etc.) into software-defined abstractions, giving rise to what is known as Cloud Computing.

Cloud Computing, in short, is the set of computing resources that we need to build our systems (CPU, storage, network, etc.), made available over the network and consumed on demand, offering cost efficiency and scalability never seen before.

Cloud Native is a consequence of the need to scale up

It is clear, then, that the technology on which we build and deploy our systems has changed, but what is the reason? Giving a single reason would perhaps be a bit risky, but what we can say is that Cloud Computing solves a scalability problem.

In recent years, digital solutions (gaming platforms, video streaming, music, social networks, etc.) have increasingly been consumed from different devices, not just PCs. We are talking about mobile phones, tablets, Smart TVs and even IoT (Internet of Things) devices, all of which have created different scenarios for accessing our systems.

In these scenarios the number of requests and the volume of data keep changing, so our system must continue to function correctly as the demand for resources changes. In short, these scenarios require our system to be “scalable”.

However, this does not come for free. Managing this scalability in our services is becoming more complex and the traditional methods are no longer useful, so we need another way of doing things. Something similar happened when PCs emerged back in the day: with them came the client/server architectures needed to take advantage of their computing capacity, relieving the old "mainframes" from doing all the work of our systems.

Along with this technological change that we call Cloud Computing, and in order to manage this scalability, new architectural patterns and new practices for operating our systems have emerged, giving rise to the term Cloud Native.

The goal of Cloud Native applications is to take advantage of the benefits of the cloud to improve scalability, performance and efficiency.

So, when we say that a system or an application is Cloud Native, we are not really referring to whether it runs in the Cloud, but to how it has been designed to run and operate correctly on Cloud Computing technology while also benefiting from the advantages of that technology.

Cloud Native systems are designed so that they can grow and shrink dynamically and can be updated, all without loss of service, which is known as "zero downtime". Zero downtime does not mean perfect uptime; it means meeting a goal whereby users perceive no interruption of service throughout the operation of an application^[1].
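To make the difference between "zero downtime" and perfect uptime a little more concrete, here is a minimal sketch of how an availability goal translates into tolerated downtime. The function name, the 30-day window and the targets in the loop are illustrative assumptions, not figures from any particular provider or agreement.

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of downtime that still meet an availability target.

    `target` is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in the range (0, 1]")
    window_minutes = window_days * 24 * 60
    # Whatever the target does not promise is the downtime we can tolerate.
    return (1.0 - target) * window_minutes


# Illustrative targets only: each extra "nine" divides the budget by ten.
for target in (0.99, 0.999, 0.9999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%} over 30 days allows about {minutes:.1f} minutes")
```

With a 99.9% target, for instance, roughly 43 minutes of interruption per 30 days still meet the goal, which is why the aim is an interruption that users do not perceive rather than literal perfection.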

Cloud Native according to CNCF (Cloud Native Computing Foundation)

Today's users do not have - or rather, we do not have - the same patience as we did years ago when it was totally normal for a website to take a few seconds to load, or for a streaming video to have some latency, or even to stop from time to time.

The level of scalability provided by the Cloud makes it possible for social networking, instant messaging, and video or audio streaming applications to allow millions of users to chat, upload photos and videos, watch movies or listen to music, all at the same time.

Users do not want failures; they need the services to be working properly all the time, and this is complicated in an environment as changeable as the Cloud.

The CNCF talks about Cloud Native as a set of technologies that allow us to build and run scalable applications in modern, dynamic environments such as public, private and hybrid Clouds.

It also tells us that the resulting systems are loosely coupled, resilient, manageable and observable, and that, combined with good automation, they allow us to introduce frequent and predictable changes with little effort.

Cloud Native attributes for building reliable and resilient systems

In an environment as fast-changing as the Cloud, we need to design our systems so that they can react to errors that might otherwise cause failures. If we can ensure that our systems have the attributes that the CNCF defines for Cloud Native applications (scalable, loosely coupled, resilient, manageable and observable), we will be able to keep our clients satisfied by providing them with systems that work on a continuous basis.

If we consider each of these attributes, we can see how they help us to make our systems reliable and run virtually uninterrupted.

Scalability: if we design our systems to operate statelessly, any instance can serve any request, so we can add or remove instances as needed. This makes our systems scalable and therefore able to adapt to unexpected growth in demand for resources, a form of “failure prevention”.

Loose coupling: if our systems are loosely coupled, the components or services (or microservices) that need to do so can evolve and scale independently. This means avoiding shared dependencies, which is what happens when we design a system based on microservices and end up with a distributed monolith, where changes in one microservice cannot be made without changes in others. Loose coupling also prevents the failures that arise when a change has to be made across multiple coupled components.

Resilience: through the redundancy of components or the application of certain patterns that avoid the cascade propagation of failures, we can make our systems more resilient and therefore be able to continue functioning even when certain failures occur, i.e. we will make our system fault tolerant.

Manageable: if we design our systems to be easily configurable, we will be able to change certain system behaviours without the need to deploy a new version of the system, and we may even be able to eliminate possible errors that may have arisen.

Observable: finally, we should take measurements (metrics) of different indicators of our systems that we can observe continuously to be able to predict errors or undesired behaviour and act before they occur.
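As an illustration of the resilience attribute, one commonly used pattern for avoiding the cascade propagation of failures is the circuit breaker: after a number of consecutive failures, calls to an unhealthy dependency fail fast for a while instead of piling up. The following is a minimal sketch under stated assumptions, not a production implementation. The class name, the parameter names and the default thresholds are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without trying the dependency."""


class CircuitBreaker:
    """Fail fast after repeated failures, then retry after a cool-down."""

    def __init__(
        self,
        max_failures: int = 3,        # hypothetical default
        reset_after: float = 30.0,    # cool-down in seconds, hypothetical default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: let one trial call through ("half-open").
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, opens the circuit.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each request to a remote service in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fall back to a cached or degraded response. The thresholds are natural candidates for external configuration, which ties in with the manageable attribute above.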

Cloud Native allows us to manage all the complexity that comes with the almost infinite capacity provided by Cloud Computing

By applying design patterns and operating practices, we make our systems even more reliable than the Cloud infrastructure on which they run (for example, a failover between two regions of a Cloud) and at the same time the user has full confidence in the operation of our system.
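The failover example can be backed with some simple arithmetic. The sketch below assumes that regions fail independently and that failover is instantaneous and always works. Both assumptions are optimistic, so the result is an upper bound rather than a promise. The function name and the 99.9% figure are illustrative, not taken from any provider.

```python
def combined_availability(region_availability: float, regions: int = 2) -> float:
    """Availability of a service that is up while at least one region is up.

    Assumes independent region failures and perfect failover (a simplification).
    """
    if not 0.0 <= region_availability <= 1.0:
        raise ValueError("region_availability must be in the range [0, 1]")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # The service is down only when every region is down at the same time.
    return 1.0 - (1.0 - region_availability) ** regions


single = 0.999  # hypothetical availability of one region
print(f"one region:  {single:.4%}")
print(f"two regions: {combined_availability(single, 2):.4%}")
```

Under these assumptions, two regions at 99.9% each give a combined figure of about 99.9999%, which is how a system can end up more reliable than any single piece of infrastructure beneath it.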

* * *

^[1] On the ability to plan based on the user's perception of service quality, there are a number of books written by Google engineers, known as SREs or Site Reliability Engineers, which extend the concept of the SLA (Service Level Agreement) while adding new ones such as the SLI (Service Level Indicator) and the SLO (Service Level Objective).

Featured photo: Shahadat Rahman / Unsplash