Handling language stack deprecations: Part 1: Virtual Machine infrastructure

This post is a continuation of this tweet thread: "Language deprecation for stacks can be a task if you are on VMs, added to that the confusion on what version of that stack runs, in your inventory if it's not small. Summarizing what we ended up doing to bring visibility & giving people the ability to migrate themselves (1/n)" (Tasdik Rahman, @tasdikrahman, January 29, 2021). Compute VMs: Given the nature of VMs and how they are run and created in our compute infrastructure environment, managing, upgrading and adding fixes to them becomes a task in itself. Since there is no control plane to control the lifecycle of these VMs, the task is manual at best, even though there is automation to delete and create VMs on demand (more on the VM creation API which we created in a different post). ...

February 2, 2021 · 7 min · Tasdik Rahman

Maintaining aptly - The debian package manager

This post is a continuation of this tweet thread: "Sad to see aptly slowly https://t.co/zkXgsruAGi rotting, but works really well till the last 1.4.0 build as a debian package repository for your needs. (1/n)" (Tasdik Rahman, @tasdikrahman, November 11, 2020). Aptly is a Debian package repository. We use it to publish application-specific Debian packages, which are then pulled onto the app boxes when deploying a new SHA/version of the application. More on this in another post. This post concentrates on a few things we discovered while maintaining aptly, where the stored packages ended up consuming multiple TBs of storage. ...

December 23, 2020 · 5 min · Tasdik Rahman

A few notes on GKE kubernetes upgrades

This post was originally published on Gojek's engineering blog; this is a cross-post of the same. It is more of a continuation of this tweet: "A few notes on @kubernetes cluster upgrades on GKE (1/n)" (Tasdik Rahman, @tasdikrahman, July 21, 2020). If you are running Kubernetes on GKE, chances are that you are already doing some form of upgrades for your clusters, given that the upstream release cycle is quarterly, which means a minor version bump every quarter. That is a really high velocity for version releases, but it is not the focus of this post; the focus is on how you can attempt to keep up with this release cycle. ...

July 22, 2020 · 15 min · Tasdik Rahman

Our learnings from Istio’s networking APIs while running it in production

I originally published this on Gojek’s engineering blog; this post is a repost. We at Gojek have been running Istio 1.4 with a multi-cluster setup for some time now, on top of which we have been piloting a few reasonably high-throughput services in production, serving customer-facing traffic. One of these services hits ~195k requests/minute. In this post, we’ll dive deep into what we have learnt and observed by using Istio’s networking APIs. ...

June 17, 2020 · 10 min · Tasdik Rahman

Specifying scheduling rules for your pods on kubernetes

This is more of an extended version of this tweet: "If you haven't had a look at pod-affinity and anti-affinity, it's a great way which one can use to distribute the pods of their service across zones. https://t.co/iqhbyhruD8 (1/n)" (Tasdik Rahman, @tasdikrahman, February 23, 2020). PodAntiAffinity/PodAffinity were released in beta back in 2017, in the 1.6 release of k8s, along with node affinity/anti-affinity, taints and tolerations, and custom scheduling. ...

May 6, 2020 · 5 min · Tasdik Rahman

A Few Notes on Etcd Maintenance

I originally published this on Gojek’s engineering blog; this post is a repost. If you have worked on managing Kubernetes clusters on your own infrastructure, instead of going with a managed version provided by cloud providers, chances are that you are already managing an etcd cluster. In case you are new to it, this post is for you. We’ll get the basics out of the way first, and define what etcd is. ...

April 24, 2020 · 7 min · Tasdik Rahman

Introducing Kingsly — The Cert Manager

I originally published this on Gojek’s engineering blog; this post is a repost. There’s one thing all devices connected to the Internet have in common: they rely on protocols called SSL/TLS to protect information in transit. SSL/TLS are cryptographic protocols designed to provide secure communication over insecure infrastructure. Any communication over the public internet should be encrypted, for which we need SSL certificates. There are many cases of public communication in GOJEK as well. Some of them are listed below: ...

April 22, 2020 · 7 min · Tasdik Rahman

Route missing in kubernetes node with kuberouter as the CNI

For anyone evaluating a networking solution for their Kubernetes cluster that does not add a lot of moving parts, kuberouter provides pod networking, the ability to enforce network policies, and an IPVS/LVS service proxy, among other things. The problem we faced while running it in our clusters was missing routes upon restart of a node, or sometimes when a node was joining the cluster as a worker node. ...

January 5, 2020 · 2 min · Tasdik Rahman

Various ways of enabling canary deployments in kubernetes

Update: I gave a quick lightning talk on the same topic at DevopsDays India, 2019; the slides can be found below. What canary can be: Shaping the traffic so that we can direct a % of traffic to the new pods, then promoting the same deployment to a full scale-out while gradually phasing out the older release. Why canary? Testing on staging doesn’t weed out all the possible reasons for something failing, and doing the final testing for a feature on some part of the traffic is not unheard of. Canary can also be a precursor to enabling full blue-green deployments. ...

September 12, 2019 · 4 min · Tasdik Rahman

Handling signals for applications running in kubernetes

When the power goes off on a device running a Linux-based system, one can think of ways in which this event can be handled by the applications running on it. One thing to note is that when you pull the power cable out, the power doesn’t really go off immediately. But processes need to be notified of this so that they can handle the event and save the state of the application (if any). ...

April 24, 2019 · 6 min · Tasdik Rahman