ShipTalk - SRE, DevOps, Platform Engineering, Software Delivery

A Time Before, During, and After Kubernetes - Santiago - GumGum

Santiago Campuzano Season 1 Episode 13

In this episode of ShipTalk we are joined by Santiago Campuzano. Santiago is a Lead DevOps Engineer at GumGum and has seen his fair share of distributed systems.  Santiago used to build distributed Java/JEE applications for the financial sector and has gone through the early days of containerization leveraging technologies such as Apache Mesos and Apache Marathon before leveraging Kubernetes. 

In the episode, we go from the time right before containerization became popular through the journey that Santiago and firms have been on ever since.  Santiago has also published an excellent blog post describing concerns and different points of view when implementing K8s at scale.

Blog:
https://medium.com/gumgum-tech/implementing-kubernetes-the-hidden-part-of-the-iceberg-part-1-76c3e9684d49 

Ravi Lachhman  00:06
Well hi everybody, and welcome back to another episode of ShipTalk. This is our first episode after {unscripted}. And I'm very honored to be talking to Santiago today. So Santiago, for the listeners who don't know who you are, maybe tell us a little bit about yourself, and where do you work?

Santiago Campuzano  00:21
First of all, thanks so much, Ravi, for having me here. My name is Santiago Campuzano. I am a Lead DevOps Engineer at GumGum. So again, GumGum is a computer vision advertising ad tech company, based in Santa Monica, California. Well, we as a company, we've been around for almost 12 years. I've been working at GumGum for almost three years. And yeah, it's really exciting. It's really exciting to be here to talk about my experiences with Kubernetes, with containers, you know, with all the infra and all the technology that led ultimately to Kubernetes and to the container explosion that we are living right now. So thanks again, Ravi, for having me here.

Ravi Lachhman  01:07
Yeah, absolutely. I'm quite excited to talk to Santiago. He actually produced a pretty excellent blog post, which we'll link to, and he actually talked about some of his early experience. I was like, ah-ha! Santiago has seen the entire evolution. Because a lot of times when we talk about Kubernetes, we're so focused on it. But especially with a person like Santiago with lots of experience, you kind of see that there are things that are different and things that are the same, right? So, like, the evolution wasn't as ground shattering as it appears to be if you were just starting out. So let's go on a little journey. Let's go back to the 2000s in the time machine. And so let's go to right before Linux Containers became popular, right. So like, these are like the 2010s. But Santiago, before the podcast, was telling me he remembers the days of racking and stacking and running the networking wires; let's not go back that far. But we can say, you know what, let's talk about 2010. What was the technology landscape looking like, in your world?

Santiago Campuzano  02:19
Okay, yeah, back in that time, I mean, I used to be like a consultant for many companies in Colombia. I was especially focused on middleware software, I mean, Java middleware. So I remember working with, like, Oracle WebLogic. Even IBM Application Server. So yeah, back in that time, the hyped technology was obviously VMware and the virtual machines, you know. Virtual machines came as a technology that was gonna consolidate, you know, even further, that was gonna save companies a lot of money, that was going to speed up the provisioning of new servers and new applications. Because before that, you just had to wait for, you know, the provisioning part of your company to acquire the servers and to install the servers. So yeah, back in that time, we had VMware. The other hype I remember back in those days was the SOA, the service oriented architecture hype. So I remember actually being a Service Oriented Architecture consultant. To be honest, for me, that was kind of a hype because, you know, application integrations have existed for many years. So it was just like a new architecture, trying to solve a lot of problems that, you know, companies were facing when integrating services. But probably a lot of companies got the concept wrong. Probably they were trying to force the service oriented architecture rhetoric onto the companies. So I think, actually, that was kind of the beginning of microservices, you know, because when you start working with Service Oriented Architecture, you have to think of your business and your applications, you know, as small functions, and functions with, you know, proper business boundaries, like perfectly defined business boundaries. I think that was like the earliest stage, actually, of microservices. I mean, correct me if I'm wrong, but from my point of view, I think that was like the preview of microservices.

Ravi Lachhman  04:27
Awesome. I think we're related now, because in the same time period, I was doing a lot of JEE development in the 2010s. So it's like, I'm lucky that I know Santiago a little bit; we have very similar career paths. Actually, I started out in Java/J2EE. And then I did a lot of Mesos stuff and then a lot of Kubernetes stuff. Pretty similar path. So let's start building up to the container lifestyle. So having a clustered application of more than one node is not something new. Right? So this is something that we did a lot in Java. I did a lot of multi-node applications, or distributed applications, in Java. And so, like, each instance, I remember using WebSphere ND. So we would have, like, nine copies.

Santiago Campuzano  05:23
Which, by the way, was super complex to install and configure. You needed a PhD, you know, to install and properly configure WebSphere ND. I mean, if you wanted to do it right.

Ravi Lachhman  05:36
Yeah, it felt like a lot of that. So like, there are a lot of concepts in an application server that mimic this, you know, just to call out the web container. Right? When we say container now, we think of a Linux Container like Docker. But we were building distributed apps, fairly complicated distributed apps, back in the day; I'm sure you were too. But what's changed is the application infrastructure, or how we go about installing and distributing and deploying to it. So there used to be the mystic middleware engineer, and who knows where that person is now? They're missing.

Santiago Campuzano  06:15
They're looking for jobs right now. 

Ravi Lachhman  06:16
Yeah. Or they're just DevOps engineers. And so, let's talk about what containerization actually did. So you and I used to build WARs and JARs and hand them off to somebody, or we might have been that person, so we handed them to ourselves and used ND to deploy stuff. But let's talk about, okay, what are some benefits that you think just dumping something in a container brought?

Santiago Campuzano  06:48
I think there are a lot of benefits to moving an application to work in containers, or like microservices, whatever you want to name that. Actually, I remember, like, the very first project that I worked on. I mean, I used to work for a very big Colombian company, like an insurance company. So the core application of this company, which was kind of huge, was entirely written in Java and PL/SQL. So they wanted to break up this application into small pieces, namely, say, microservices. So I remember that I started working on that project as some sort of DevOps engineer. So at that time, we wanted to actually use Docker; Docker was, like, in its earliest stages. And back in that time, you know, Mesos and Marathon were in their earliest stages as well; it was not like a mature product to use. But the main benefit that we saw back in that time for using containers was, well, mainly resource allocation, you know, resource isolation between specific applications. Because back in that time, if you had, like, you know, 10 Java applications running in the same VM, they were going to compete with each other for resources, I mean, mostly for CPU. Because, well, at some point you could isolate the memory, you know, setting up the Xmx and Xms for the applications, but they were going to compete with each other for CPU, and networking, and I/O. So that was, like, the very first benefit that we gained from that, like isolating resources from each other. The other benefit that we gained was to be able to scale up different pieces of that microservice, you know, like scaling up. For example, if you have issues with the messaging part of your application, you could scale up those microservices, instead of scaling up the entire monolith. Because, as I just mentioned, that core application of that insurance company was like a huge monolith.
You know, actually, I remember that the WAR was like one gigabyte or even more than one gigabyte, so it was like a huge monolith. So actually starting that application used to take like 15-17 minutes, you know, because it was like an entire monolith starting up and connecting to the hundreds of components, backends, databases. Right after that, we broke up the monolith into probably 12 or 15 different microservices. So obviously, the startup times for those microservices, instead of being minutes, now they were seconds. And because of that, obviously, we could have focused teams working separately on each microservice. So instead of having 100 developers working on the monolith, you know, you could have small teams of maybe six or seven developers, focused on only a single microservice. Obviously, that has challenges as well; obviously, there are going to be drawbacks to breaking up a monolith. But yeah, we can discuss that later. So yeah, for me, that was like the biggest benefit of starting to work with microservices or containers.
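
The resource isolation point above can be sketched in Python. This is a minimal illustration under stated assumptions, not what the team in the episode actually ran: the `ServiceLimits` type, the service name, the image name, and all numbers are hypothetical. It pairs the JVM heap flags (`-Xms`, `-Xmx`), which only bound memory, with Docker's `--cpus` and `--memory` options, which also bound CPU for each container.

```python
import shlex
from dataclasses import dataclass


@dataclass
class ServiceLimits:
    """Hypothetical per-microservice resource budget (illustrative values only)."""
    name: str
    image: str
    cpus: float      # CPU cores the container may use
    memory_mb: int   # hard memory limit for the whole container
    heap_mb: int     # JVM heap size, kept below the container limit


def docker_run_args(svc: ServiceLimits) -> list[str]:
    """Build a `docker run` argument list that caps both CPU and memory."""
    if svc.heap_mb >= svc.memory_mb:
        # The JVM needs memory beyond the heap (metaspace, thread stacks, ...).
        raise ValueError("JVM heap must leave headroom under the container limit")
    return [
        "docker", "run", "--detach",
        "--name", svc.name,
        f"--cpus={svc.cpus}",            # CPU cap, which heap flags cannot provide
        f"--memory={svc.memory_mb}m",    # memory cap for the whole container
        "--env", f"JAVA_TOOL_OPTIONS=-Xms{svc.heap_mb}m -Xmx{svc.heap_mb}m",
        svc.image,
    ]


if __name__ == "__main__":
    messaging = ServiceLimits("messaging", "example/messaging:1.0", 1.5, 1024, 768)
    # Print a shell-quoted command line instead of executing it.
    print(shlex.join(docker_run_args(messaging)))
```

With limits like these, ten Java services on one host each get their own CPU and memory budget instead of competing for the whole machine.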

Ravi Lachhman  10:16
Awesome. Yeah, that's great. Like, really, I like your explanation that, you know, a lot of times you're actually CPU constrained and you don't realize it, because Java is pretty good about respecting its memory limits, but not the CPU. Right. Now we're getting closer; we're certainly continuing on the journey towards Kubernetes. Right? So we're getting there, slowly. So one foot in, one foot out of the JEE stuff now. So when we would deploy, let's say, to ND or some sort of cluster deployment, it was pretty standard, right? Like, we know how to load balance, right? So our application might have eight nodes, but, you know, we have some sort of software load balancer in front of that, that will figure it out; you know, we'll connect to either IBM's, Oracle's, or Red Hat's, right, like their particular routing mechanisms. But now we're having containers, right, so we might not get to use an application server anymore. And so with some of the stuff that we like, you know, if you just have several containers, you're wiring it yourself again, right? You're using, right, Apache mod_jk or something by yourself. And now this is what we're gonna start introducing. The other concept is container sprawl. Right? So if you thought virtual machine sprawl was bad, it is easy to spin up a container, and they also die. But let's talk about why you would even have a container orchestrator, right, so let's go to that. I'm just kind of leading Santiago here to his first experience with Apache Mesos and Apache Marathon. Kind of the precursor to Kubernetes was these two projects that were job schedulers, actually; then it became a Mesosphere product. But why don't you talk about the road to Mesos? Or why it was even needed?

Santiago Campuzano  12:08
Right, so yeah, obviously, as I just mentioned, we were kind of managing between 13 and 15 microservices after we broke up this huge monolith. Obviously, that led us to having some challenges. Like, for example, if you spin up a Docker container, right, that Docker container will be exposing a particular port, right, because inside the container, you will have port 80, right, the HTTP port. But if you want to expose that port to your host, you know, that's going to be probably a random port. Because if you want to spin up 10 containers for the same microservice, obviously, you cannot expose the same port. So you will have random ports exposed to the host. So that's the first challenge, you know: how the load balancer, or whatever resource you have for, you know, load balancing the incoming requests, is going to be aware of the running containers and what ports are exposed on those containers, right? Because obviously, the idea of using containers is to have changing loads. So obviously, you will scale up and scale down your pods, or sorry, your containers. And because you have different machines or different servers, you don't know where those new containers are going to be, you know, spun up; or you probably know, but you don't want to control that. So actually, back in those days, even before doing that, I remember that I wrote a shell script, actually, to manage that. But it was super difficult to maintain that shell script, because in that shell script I had hard coded, like, the names of the containers; like, I had to statically define the exposed ports for every single container, like the minimum and maximum number of containers for that specific microservice. So it was, like, super, super manual. And actually the shell script worked somehow; you know, the shell script even had an interaction with our load balancer, which, I think, if I remember right, was like a Cisco load balancer.
So this script called the load balancer and was able to attach and detach the containers, but it was all the shell script magic. So it was not an easy thing to maintain. So that was the time when Mesos and Marathon appeared, in their earliest stages. So obviously, Mesos and Marathon, they were a solution that tried to solve that issue, you know: how to orchestrate containers, how to schedule containers, you know, how to properly allocate resources for those containers, how to know the amount of CPU and memory available in your entire cluster or in a single node. You know, so it was like a good solution, you know, for the struggles that we were facing. But obviously there were other problems that weren't solved by Mesos and Marathon, like security, like logging, like monitoring. Like, what else? Service discovery. So, like, Mesos and Marathon, they covered the basic problem of scheduling. But, you know, working with microservices and containers, it is not the only problem that you have; you have more problems than just the scheduling. So that was, like, the predecessor, or the precursor, of Kubernetes, so to speak. So yeah, actually, our experience with Mesos and Marathon was kind of really short. It lasted for a while; actually, I left that company right after I implemented this microservice architecture with Docker, Mesos, and Marathon. But yeah, that was pretty much my experience with Mesos and Marathon. And to be honest, I kind of liked it. Because it was an open source tool. It was pretty well maintained, pretty well designed. But yeah, obviously, it was missing a lot of things.
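
The port and load balancer problem described above can be sketched in Python. This is a toy model under stated assumptions, not the shell script from the episode and not how Mesos or Marathon are implemented: the `Registry`, `start_container`, and `reconcile` names, the host names, and the port range are all hypothetical. It shows the idea of keeping a registry of the host and port pairs behind each microservice, so a load balancer can read its current backends instead of relying on hard-coded ports.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Backend:
    host: str
    port: int  # host port mapped to the container's internal port (e.g. 80)


@dataclass
class Registry:
    """Tracks which host:port pairs currently serve each microservice."""
    backends: dict[str, set[Backend]] = field(default_factory=dict)

    def register(self, service: str, backend: Backend) -> None:
        self.backends.setdefault(service, set()).add(backend)

    def deregister(self, service: str, backend: Backend) -> None:
        self.backends.get(service, set()).discard(backend)

    def upstreams(self, service: str) -> list[str]:
        """Addresses a load balancer would route to, in stable order."""
        return sorted(f"{b.host}:{b.port}" for b in self.backends.get(service, set()))


def start_container(hosts: list[str], used: set[tuple[str, int]],
                    rng: random.Random) -> Backend:
    """Pretend to start a container: pick a host and a free random host port."""
    host = rng.choice(hosts)
    while True:
        port = rng.randint(31000, 32000)  # arbitrary illustrative port range
        if (host, port) not in used:
            used.add((host, port))
            return Backend(host, port)


def reconcile(registry: Registry, service: str, desired: int, hosts: list[str],
              used: set[tuple[str, int]], rng: random.Random) -> None:
    """Scale a service up or down until the registry holds `desired` backends."""
    current = registry.backends.setdefault(service, set())
    while len(current) < desired:
        registry.register(service, start_container(hosts, used, rng))
    while len(current) > desired:
        victim = next(iter(current))
        registry.deregister(service, victim)
        used.discard((victim.host, victim.port))  # the port becomes free again


if __name__ == "__main__":
    rng = random.Random(0)
    registry, used = Registry(), set()
    reconcile(registry, "messaging", 3, ["node-a", "node-b"], used, rng)
    print(registry.upstreams("messaging"))
    reconcile(registry, "messaging", 1, ["node-a", "node-b"], used, rng)
    print(registry.upstreams("messaging"))
```

The design choice worth noting is that nothing about a container's name or port is written down in advance; the desired count is the only static input, and the load balancer follows whatever the registry reports.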

Ravi Lachhman  16:15
And that's pretty true. I really like that journey. You know, a lot of people don't know that struggle now, right? But when we went to containerization, every one of the things you said is correct. Like, you need an orchestrator because of scheduling, placement. Even just, your containers are made to die, unlike a WebSphere node. Like, those die, but, you know, they'll come back with the help of several people; containers will die all the time, right, and you place them again. So let's fast forward now. So I think the audience, you know, they might be familiar with good old K8s, Kubernetes. But let's talk about your blog piece. So what really impressed me about Santiago was, he was actually talking about, quoting the blog, "the hidden part of the iceberg": a lot of the things that you don't think about with Kubernetes. So Kubernetes is kind of ubiquitous now. Like, I think most people, you know, have at least heard of it. But I'll leverage your expertise now. So tell me, what are some things that are just the tip of the iceberg, right, like some concerns that, you know, you had to work through that were more or less housing that you mentioned in the blog post?

Santiago Campuzano  17:33
So actually, the idea of writing that blog post, it was like a revelation to me, kind of a revelation. I mean, I've been working at GumGum, as I just said, for almost three years. And during those three years, I've been working on the implementation of Kubernetes. So before working at GumGum, I worked for other companies trying to implement Kubernetes as well, but here at GumGum was like my first serious production grade implementation of Kubernetes. So, I mean, Kubernetes is amazing. I'm gonna say that; I'm gonna start with that. But most people kind of overlook a lot of aspects of Kubernetes. I mean, like, the tip of the iceberg is those amazing things about Kubernetes, right? Like service discovery, like container orchestration, like, what else, probably resource management, right? That's the tip of the iceberg, because those are the things that are going to sell Kubernetes to most of the developers, right. So it's going to be the idea of write it once and run anywhere, right. That's the idea that most developers are gonna buy for Kubernetes. But I say it's an iceberg because, you know, below the surface, there are a lot of challenges, a lot of problems that you have to suffer for real when you start implementing Kubernetes for real, for a company, for an enterprise like GumGum. I mean, GumGum is not a huge company; it's a mid level company. But right now, we have probably 30 Kubernetes clusters running in production, and probably a bunch of applications. So yeah, that was the idea for writing the blog post, you know, describing precisely, I mean, with as much detail as possible, all those challenges, all those problems that we have faced implementing Kubernetes, and how we solved them. I mean, how we solved those problems. I mean, I am not saying that it is the only way of solving those problems.
But that was our approach, you know, for solving all those gaps or those problems or limitations that Kubernetes has. As I mentioned, Kubernetes doesn't have, I mean, doesn't include some batteries, as I say in the article as well. You have to provide some batteries, you know, to Kubernetes, like logging, like monitoring, like security, like volume management. So yeah, that's what the article is all about. Actually, I am preparing the second part of the article, because the article was long enough. So I'm gonna have, like, a part two mentioning some other aspects of that hidden part of the iceberg.

Ravi Lachhman  20:23
Awesome. I'm excited for the part two. I think one thing people who, let's say, use, like, Minikube on their desktop, you know, what they don't anticipate is how pluggable Kubernetes is. So, like, you go through the article, and you're changing certain providers, or you're changing certain opinions in Kubernetes. And that's really hard, right? Take, like, the difference in approach, I think. You're a software engineer, right, so like you started as a software engineer, so did I. Some of the tension in organizations, I think, comes up if someone started as a system engineer. They're used to, if you remember those, like, runbooks, like an IBM Redbook: this is exactly how you do something. Right? Yeah, they're used to that. But as software engineers, you and I were used to trying things over and over until we get it right. And so Kubernetes is this awkward bridge. You know, it's a piece of infrastructure where, as software engineers, the first thing is, you know, we can change things. Don't worry, if you don't like it, we'll replug it; we'll use Istio now, instead of Calico, or something, just making something up there. But as a system engineer, they don't like that. They're like, we need to know the best practice. And you said very eloquently that a lot of Kubernetes is an emerging practice. Right? So best practice is not quite there. It's just the tip of the iceberg. And so I kind of like wrapping up with two questions here for the podcast. Where do you see the ecosystem going? You're very close to it. Like, where would you say there's the most room for improvement in the entire Kubernetes ecosystem, if you had a magic wand?

Santiago Campuzano  22:07
I think, yeah. So just as you just said, the Kubernetes ecosystem is huge. Well, the thing with the Kubernetes architecture is that, well, the architecture is pluggable. So you can plug in a lot of components, like CNI plugins for networking, like DNS plugins, like Ingress Controllers. So Kubernetes is super pluggable, and the ecosystem is huge. Actually, I mentioned that in the article: if you go to the CNCF landscape for Kubernetes, it's overwhelming, you know, at some point. So if you try to look up, you know, an Ingress Controller, you could probably find 20 or 30 different Ingress Controllers. So you will be like, man, what's the right one? I mean, what should I use? Or you try to look up volume management providers; there's going to be probably, I don't know, 10 or 20 different volume management providers. So I think probably in the short term, you know, that landscape is going to, like, stabilize; at some point it is going to be more stable. I think some products, or some providers for Kubernetes, are going to be more mature, are going to be more used, and probably some others won't, you know. Because right now, we are living that hype, I would say. So probably a lot of people, you know, a lot of companies and vendors are trying to sell products for Kubernetes. So that hype is making that landscape bigger and bigger. But I think that's probably going to settle down. And probably we will have more stable products and providers for Kubernetes. Meaning that you will have options, obviously, but they will be fewer options and more mature options that you can implement into Kubernetes. Because I remember working with a colleague at GumGum, and she wanted to implement, like, a beta version of a networking provider, and it was for a production Kubernetes cluster. I was like, nah, I mean, it's a nice product, but it's not mature enough.
You know, it was in an alpha or beta version. And actually, that project right now, I think, is being deprecated because of that, because it was just, you know, starting up. So I think that's going to happen with the Kubernetes ecosystem. A lot of companies, you know, are trying to make money out of Kubernetes right now. Everyone is trying to sell anything for Kubernetes.

Ravi Lachhman  24:43
We should make a Kubernetes air freshener for your car. 

Santiago Campuzano  24:47
Right, that's gonna sell millions, actually.

Ravi Lachhman  24:56
Yeah. Yeah, awesome answer. I mean, that's as eloquent as any answer; it's the answer from experience. It's like any other technology, right? Like any fast moving technology: maturity, we'll call it. And this is coming from a seasoned architect like Santiago; you've seen multiple waves in it, from, you know, the Java world, the container world, the Kubernetes world. So I really enjoyed our conversation. I always end the podcast with one question. This is our final question for the podcast. And it's kind of an intrinsic question, so a philosophical question. So let's say that you were walking down the street where you live today, and you ran into a younger Santiago who just graduated university or school. What would be something you would tell your younger self? It could be anything, like, don't go to jail, don't eat that, you know; it could be any piece of advice you would tell your younger self.

Santiago Campuzano  25:52
I would say, Santiago, don't study computer science; please study philosophy or something different. I'm just kidding. I'm just kidding. Probably I'd say to my younger self: yeah, be more skeptical. Be more patient. Be more, I don't know. Enjoy your career. You're on the right path. Continue like that.

Ravi Lachhman  26:23
There you go man.

Santiago Campuzano  26:25
I really love my profession. To be honest, I really love my profession, my career, and I've been so fortunate. Actually, I am going to reveal this in this podcast. But my dream when I was a computer engineering student at the college was to work for a very big American company. And that dream came true. I am working as a Lead DevOps Engineer. I never imagined managing a team. I am kind of doing it right now. So yeah, thanks so much, Ravi, for having me on your podcast. Actually, I have to confess that this is my first podcast in English, because English is not my first language. So yeah, thanks so much for that. I thought there were gonna be more nerves, but there weren't.

Ravi Lachhman  27:20
This was great. You speak better English than I do. And I only know English. Well, thank you so much, Santiago, for your time on the podcast. I really enjoyed it. The listeners are gonna find this great. And yeah, again, Santiago, thank you so much for being on ShipTalk.

Santiago Campuzano  27:40
Thanks so much, Ravi. Again, and guys, whoever is listening to this podcast, please try Harness; Harness is amazing, by the way.
