Advanced topics in cloud computing

16. 8. 2021

What does it mean to start using cloud services efficiently? From a business perspective, the most important thing is to gain added value compared to on-premises solutions. That’s what my last post was about. In this post, I will discuss several areas that are important from a technical perspective.

The continuous development of the cloud

The cloud is a new technology, and it is far from able to do everything. If we start using the cloud as a platform to develop our own solutions, we will quickly come across areas that aren’t suited to our specific needs. These can include networking configuration options, ready-made software configurations such as Hadoop clusters, the capabilities of pretrained AI models or the capabilities of internal data catalogs. The crucial decision at that point is whether it pays to invest in overcoming these limitations and, if so, how much.

Cloud providers are in the same situation as you. They try to create the components and services that users need and find useful. And they are in a fundamentally better position than you are. They know what cloud users are most concerned about, they can invest significantly more in development, and because they are familiar with the internal technology, they can design and build better solutions. It is highly probable that after you have spent six months developing a piece of functionality, a button will appear in the service configuration offering an equivalent of what you went out of your way to build.

For example, in 2020, Amazon Web Services introduced 2,300 (two thousand three hundred) innovations, expansions and improvements to its services. That’s more than six per day. With numbers like these, there is a good chance that what you need, or something that could help you, will appear.

I have two good pieces of advice for this situation. First, don’t develop anything that the cloud provider could do better. Use cloud services as they are, in the simplest recommended way. Build solutions quickly according to user needs and anticipate that they may become obsolete quickly. In two years, you can build the solution again, and build it better. And if you design the solution sensibly, the transition will be easy.

Second, if you’ve decided to develop something yourself, ask what the provider’s plans are for the affected service. Providers are very open in this regard, and the answer may surprise you. You may learn that the functionality you need exists but is difficult to find or has a different name, you may learn that you just need to wait a few months, or you may even learn that the service is on the list of deprecated features and that you should use something else or do things differently.

Surveillance and monitoring

When you move an application to the cloud, you are transferring it to a more complex environment. The technology stack will include your application, operating systems, containers, virtual servers and host hardware as well as disk arrays and other data storage. The network environment will be richer, with many firewalls, gateways and various other elements. You will probably use multiple internal messaging services. All of these components will give you tons of information about their behaviour and status. But because you won’t have direct access to many of the components, it’s difficult to interpret this volume of information and to get a sense of what the reported values say about other layers of the technology stack, and especially what they say about your application’s performance. And that’s the only thing you’re interested in.

The key issue turns out to be selecting and monitoring the right things at the right level. This is particularly true when a component fails or is overloaded. Then, it is too late to figure out how the various layers of infrastructure interact or to wade through the values of hundreds of counters and measurement points, often in different applications and unrelated in time.
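As an illustration of monitoring at the application level, here is a minimal Python sketch. It assumes you can collect a latency and a success flag for each request. The names Request and application_health and the threshold values are hypothetical choices made for this example, not part of any provider’s tooling.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record for this sketch)."""
    latency_ms: float
    ok: bool


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def application_health(requests, max_p95_ms=500.0, max_error_rate=0.01):
    """Summarize what users experience rather than what each component reports.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if not requests:
        return {"status": "unknown", "p95_ms": None, "error_rate": None}
    p95 = percentile([r.latency_ms for r in requests], 95)
    error_rate = sum(1 for r in requests if not r.ok) / len(requests)
    healthy = p95 <= max_p95_ms and error_rate <= max_error_rate
    return {
        "status": "ok" if healthy else "degraded",
        "p95_ms": p95,
        "error_rate": error_rate,
    }
```

Two or three indicators of this kind, agreed on in advance, are usually easier to read during an incident than hundreds of per-component counters.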

On the other hand, creating monitoring dashboards with useful reporting value is extremely expensive. Building them takes a lot of time, and the testing is time-consuming and functionally complicated. Furthermore, it’s difficult to push through investment in their development. They aren’t directly needed for the business function of the application. And if everything works as expected, they won’t even be used very much. Not neglecting this area is a big challenge.

Using multiple cloud service providers

Independence from the provider is a much-discussed requirement. Suppose we use a single isolated service with a simple interface, such as a server with an operating system or existing services for speech, image or video processing. In this case, maintaining independence and switching between providers is relatively easy. But as you start to connect services on one platform, start using internal messaging, internal monitoring services or even platform services such as AWS Glue, Azure SQL Edge or Google Bigtable, the transfer of functionality to another platform becomes illusory.
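For the simple case, one possible approach is to keep application code behind a narrow interface. The Python sketch below is only an illustration. BlobStore, InMemoryBlobStore and archive_report are hypothetical names, and the in-memory class stands in for an adapter that would wrap a real provider’s storage service.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical narrow interface that the application depends on."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> bytes:
        ...


class InMemoryBlobStore(BlobStore):
    """Stand-in adapter; a real one would wrap a provider's storage SDK."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


def archive_report(store: BlobStore, report_id: str, text: str) -> str:
    """Application code: knows only BlobStore, not the provider behind it."""
    key = f"reports/{report_id}.txt"
    store.put(key, text.encode("utf-8"))
    return key
```

Switching providers then means writing one new adapter. This works only as long as the application needs nothing beyond what the interface offers; once it relies on platform-specific services, the abstraction no longer helps.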

There are systems that aim to manage multiple platforms transparently. For example, Terraform can manage AWS, Azure, Oracle Cloud, Google Cloud and Terraform Cloud. In practice, this means that you have to master both the general interface of the tool and how it handles the specialities of each provider. And these specialities are essential for efficient service utilization.

Completely adhering to one provider is also not a recommended strategy. Instead, it is worthwhile to use different cloud providers for different areas or functions. For example, analytics in Azure, primary systems in AWS and web services in Google Cloud. Or finance in AWS, CRM in Salesforce and sales and warehouses in Azure. Maintaining the knowledge and ability to use multiple platforms is the third recommendation.

Upgrades and bugs are constants

Moving to the cloud reduces administration requirements. This is a frequently cited advantage, and it’s certainly true. The cloud has a lot of support for many administrative activities. However, the range of administrative activities and the scope of the necessary competencies remain unchanged.

The cloud can provide patches or updates for operating systems, database servers, and other components. However, it can’t guarantee that these interventions won’t affect your code. It can’t even test the impacts of the changes on your code. Likewise, the cloud is not an error-free environment. You still have to deal with error response, high availability and disaster recovery processes. The cloud offers many resources that can help with solutions and that, otherwise, you would have to build yourself. Their use and efficiency are ultimately up to you.
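Error response remains your responsibility even for small things, such as a call to a service that fails occasionally. A common technique is to retry with exponential backoff and random jitter. The Python sketch below is a minimal illustration. TransientError and call_with_retry are hypothetical names, and the default delays are assumptions rather than recommended values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                    sleep=time.sleep):
    """Call operation(), retrying on TransientError with jittered backoff.

    The upper bound of the wait doubles after each failure, capped at
    max_delay. The last failure is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            upper = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries from many clients.
            sleep(random.uniform(0, upper))
```

The sleep parameter is injectable so that the behaviour can be tested without waiting. Deciding which errors count as transient still depends on your application.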

Administration can take less time after moving to the cloud, but it requires administrators with a greater range of knowledge and competencies.

Cloud use leads us to another instance of a fundamental IT question that has been with us in various forms for decades: “Is it possible to get a simpler and potentially cheaper solution by adding another system or another layer of technology?”

The answer is still the same. NO, it’s impossible. What we can get is a solution with additional added value.

The previous sections are examples of what we have learned from customer projects over the last five years. They are examples of what needs to be done to maximize added value.

Author: Ondřej Zýka

Data and information are assets that need to be properly cared for. Ondřej Zýka has been promoting this idea for over fifteen years as Information Management Principal Consultant at Profinit and has been guided by it in projects involving data warehouse development, data quality improvement, metadata management and master data management deployment. It is also one of his lecture topics at universities, where he lectures on database systems.