In our latest panel discussion, organised by our TechOps team, we brought together a panel of Cloud Engineering leaders to share experiences from their Cloud journeys and to discuss whether Cloud adoption or Cloud Agnosticism is the right route for all businesses.
A huge thank you to our panel for sharing their insight:
Nayana Shetty, Principal Engineer at the Financial Times
Mark Pashby, Platform Engineering Manager at Cazoo
Chris West, Head of Cloud Engineering at Toyota Connected
David Stockton, Senior Engineering Manager at Confluent
Abz Mungul, Head of Site Reliability Engineering at Zoopla
What are the benefits of moving to the cloud? Why should businesses consider it?
Abz: Reducing total cost of ownership (TCO) is one of the factors. But the main factor would be around competitive advantage, innovation, and the ability to move fast. By moving to the Cloud, a lot of the undifferentiated heavy lifting is taken away from you and you have access to a huge technology palette of services that you can consume without having to build from scratch.
Mark: It allows the product teams to concentrate on writing features and developing the products, rather than worrying about the platform and the infrastructure.
What are your thoughts on cloud agnosticism? Is this something that businesses should be investing in?
Nayana: At the FT we've decided not to go cloud agnostic and have heavily invested in AWS, but we have back-up tools and frameworks in place that can help us if one day AWS decided it wouldn't be hosting anymore. The way we're doing this is by thinking about building or buying. Do we actually build a product, or can we just buy something off the shelf? Instead of us hosting our own cloud solutions, we've used solutions in the past like Heroku. That’s how we've tried to balance it out, rather than over engineering.
David: Every cloud operates slightly differently, so a lot of the tools you can use to become more cloud agnostic, such as Terraform, still have provider-specific attributes and are often completely cloud-specific in practice. It’s a better idea to lock into a large provider such as AWS or GCP, which are very unlikely to go anywhere anytime soon, rather than a smaller vendor that may offer you the potential for something else but still requires an awful lot of engineering work. Have some disaster recovery plans in case all else fails.
Abz: My view is really based on risk profile and business context, and that should help you decide whether to go truly cloud agnostic or not. There’s a layer of complexity that you need to add if you're going to be truly cloud agnostic, so ask yourself whether it is really worth it. If you're looking at an authentication service, rather than building your own on top of a cloud provider, maybe look at a SaaS provider that can provide that capability, or a mixture of cloud providers. But the caveat to that is, you need to have a cloud Centre of Excellence, so you need to get the mindset around using cloud right first, before starting to look at how you can be multi cloud.
What makes a successful cloud transition?
Abz: Successful migration to cloud is really centred around culture and mindset. It’s about making everyone understand the value, but also providing the skills to help with that mindset change. A lot of that could be centred around reading books such as The Phoenix Project, which is about DevOps culture and how to work towards a DevOps mindset. The major focus should be on culture and upskilling your workforce to understand how to use Cloud and get the best value out of it because, fundamentally, the move to Cloud is there to benefit the business and add business value so that you can start being more competitive and move faster.
Mark: It’s really important to measure before migration happens: have as many KPIs and metrics as possible and look at all the different data you have across your organisation. If you don't have a baseline, then you've got nothing to measure against moving forward. The sort of metrics that you want to be looking at are application performance, reliability and stability. It’s not just the tech metrics; you need to look at the business side as well. Communicate with product and customer teams to find out if you’re losing any customers.
Chris: There’s a quote attributed to Peter Drucker who said that ‘culture eats strategy for breakfast.’ In a business that has heavily invested in an IT service delivery culture and has large teams looking after that on-premise infrastructure, it can be very hard to sell to those people that everything should be outsourced off-site to a cloud provider. A major part of any cloud transition is the whole mindset shift and culture of moving away from the traditional way of hosting and making people see that the privacy of your data sets is much more important than where it's actually living.
Nayana: When we decided to go cloud only at the FT, the main criteria for us were:
- Having clarity around what we were trying to do and how it would affect people.
- Making sure we had continuous communication with our people so they didn't feel left out of the migration, and communicating what their role would look like once we had migrated.
- And for me, the most important thing to get right was empathy. We used theories and processes that work for us. One was nudge theory, where instead of forcing people to make this change, we showed the team how things are going to get better, how this will help you innovate and improve your experience.
We also used the EAST acronym quite a lot:
What's the end goal of a cloud transition and what does success look like?
Abz: For me success looks like the ability to focus on business value. You’re seeing the cloud as a commodity which allows you to focus on the business value rather than things such as the infrastructure. If you look at the book Accelerate, there's heavy research around how a DevOps culture and tight cohesion and collaboration allows you to innovate and move faster, and how that can be applied to cloud. The DORA metrics can be one of the measures of looking at velocity and frequency of deployments, and how the health of your development lifecycle is being applied on the cloud.
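As a rough illustration of the DORA-style measures Abz mentions, here is a minimal Python sketch. The deployment records, field layout and function names are all hypothetical and not something the panel described; in practice the data would come from your CI/CD tooling.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records: (commit time, deploy time, caused_failure).
deployments = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 15, 0), False),
    (datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 3, 10, 0), True),
    (datetime(2024, 1, 4, 8, 0), datetime(2024, 1, 4, 12, 0), False),
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 11, 0), False),
]


def deployment_frequency(records, period_days):
    """Average number of deployments per day over the observed period."""
    return len(records) / period_days


def median_lead_time_hours(records):
    """Median time from commit to deployment, in hours."""
    hours = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed, _ in records
    ]
    return median(hours)


def change_failure_rate(records):
    """Share of deployments that led to a failure in production."""
    failures = sum(1 for _, _, failed in records if failed)
    return failures / len(records)


if __name__ == "__main__":
    print("Deployments per day:", deployment_frequency(deployments, period_days=5))
    print("Median lead time (h):", median_lead_time_hours(deployments))
    print("Change failure rate:", change_failure_rate(deployments))
```

Tracking these numbers before and after a migration gives you a like-for-like view of whether the move to cloud has actually improved delivery.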
Chris: Given that we have limited engineering skills, we need to focus those skills on building things that will enable the business to make more money. Outsourcing the infrastructure to your Cloud service provider gives your Engineers more time to focus on the really valuable stuff.
David: Moving to the Cloud makes costs much easier to view. Whether they turn out to be higher or lower, it's much simpler to attribute costs across different areas of the business. Whether you're tagging instances, or you move to Kubernetes and you're tagging pods, you can identify costs at much more granular levels, and visibility is king.
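To make the tagging point concrete, here is a small Python sketch of attributing costs by tag. The line items, the "team" tag key and the function name are made up for illustration; real data would come from your provider's billing export.

```python
from collections import defaultdict

# Hypothetical billing line items, each carrying the tags set on the resource.
line_items = [
    {"resource": "i-0a1", "cost": 120.0, "tags": {"team": "search"}},
    {"resource": "i-0b2", "cost": 80.0, "tags": {"team": "payments"}},
    {"resource": "pod/api-7f", "cost": 45.5, "tags": {"team": "search"}},
    {"resource": "vol-9c3", "cost": 10.0, "tags": {}},
]


def cost_by_tag(items, tag_key, untagged_label="untagged"):
    """Sum cost per value of tag_key; untagged resources get their own bucket."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    for owner, total in sorted(cost_by_tag(line_items, "team").items()):
        print(f"{owner}: {total:.2f}")
```

Keeping an explicit "untagged" bucket is a deliberate choice here: it shows how much spend you cannot yet attribute, which is often the first thing worth fixing.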
Audience Q&A...
00:10 - Why do we need to develop in a Cloud Agnostic way?
04:03 - How do you take Finance and Commercial stakeholders on the Cloud Transformation journey?
08:26 - What’s the correct definition of DevOps? I’ve heard the term thrown around and I worry that what DevOps is to me could mean something else to another person.
14:03 - Is there such a thing as too much innovation? How do you decide when to stop?
How much is distributed cloud different from multi cloud?
David: Distributed cloud is typically defined as selecting the best features from each cloud for different purposes. For example, your customer-facing website may run on AWS EKS, using AWS Shield to protect it and AWS RDS for backend storage. Multi-cloud is used to mean a lot of things but typically means using different cloud providers for parts of the same overall system. A simplistic example might be serving your front-end assets with AWS CloudFront/S3 while API requests are processed by components on a GKE cluster.
Cloud is very elastic, how do you secure a hugely distributed environment at scale with huge amounts of realtime data?
David: This is a huge topic. For example, DDoS security could mean Cloudflare, AWS Shield or similar. Encryption at rest could involve KMS keys encrypting EBS or S3 data. Encryption in flight could mean enabling Istio on a k8s cluster (ProTip: I wouldn’t recommend this if you’re really talking about “huge” amounts of data). Then there’s how you connect/interact with the systems - you might want to look at projects like https://www.boundaryproject.io/ from HashiCorp.
I’d consider whether you’re really talking about huge, though. I work on systems processing hundreds of gigabytes/sec of traffic and petabytes of storage, but when I interview candidates from FAANG companies I often need to reset my expectations on what huge really is.
How can you mitigate the reliance on a single vendor? It’s an extreme example, but I’m thinking of Parler being kicked off AWS.
David: I think it’s fair to say there’s a solution to any problem - whether it’s cost effective/viable is another thing. There are several companies operating large platforms (particularly in Ad-Tech) which operate multicast networks to front their traffic. You can even “bring your own IP” to the three major cloud providers. The complexity of your stack will also factor into how viable this is - for example, avoid using proprietary vendor protocols/technologies if this is a concern of yours. One option is to run everything (stateful and stateless) within k8s - then you’re fairly agnostic as to the underlying provider (including being able to go bare-metal). Remember that you’re only as strong as your weakest link… if all your DNS nameservers are hosted with the same company then you could have issues there too.
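As a quick sanity check on the DNS point, the sketch below groups a list of nameserver hostnames by their registered-looking domain and flags when they all sit with one company. The function names and example hostnames are hypothetical, and the "last two labels" heuristic is deliberately naive: it mishandles suffixes such as .co.uk, and some providers spread their nameservers across several domains, so treat the result as a prompt to look closer rather than a verdict.

```python
def nameserver_providers(nameservers):
    """Group nameserver hostnames by their last two DNS labels.

    This is only a rough proxy for "which company hosts this nameserver".
    """
    providers = set()
    for host in nameservers:
        labels = host.rstrip(".").lower().split(".")
        providers.add(".".join(labels[-2:]))
    return providers


def has_single_dns_provider(nameservers):
    """True when every nameserver appears to belong to the same provider."""
    return len(nameserver_providers(nameservers)) <= 1


if __name__ == "__main__":
    # Hypothetical NS records for a zone.
    same_vendor = ["ns1.example-dns.com.", "ns2.example-dns.com."]
    mixed_vendors = ["ns1.example-dns.com.", "ns1.other-dns.net."]
    print(has_single_dns_provider(same_vendor))
    print(has_single_dns_provider(mixed_vendors))
```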
A massive thank you to our brilliant panel for their time and valuable insights. To be kept in the loop about all of our upcoming tech events, subscribe to our monthly tech newsletter here.