The Uprise team decided to build the new product on a microservices architecture, running on Docker with Mesos and Marathon. They expected this combination to reduce the time needed to deliver the product and to give the development team the autonomy it needed to iterate quickly and release software updates.
The team quickly set up the new infrastructure and started integrating it with the development workflow. Although the team enjoyed fast release cycles, they faced stability issues, such as stuck deployments and application crashes, a few times a week. These issues risked delaying the release of the product’s alpha version to customers.
Within a few months, frustration had set in: time spent debugging platform issues was taking resources away from product development. The team also had to invest about 20% of their time in building custom tooling and integrations to supply functionality the platform didn’t provide out of the box.
These problems led to a deep lack of trust among the developers in the new tools and methodologies, and risked putting the entire Docker migration initiative on hold.
Mesos was not reliable. Leonid helped us migrate to Kubernetes, a mature, easy-to-use platform. The migration allowed us to focus on R&D work rather than worrying about deployments and infrastructure troubleshooting.
Rotem Fogel, Development Manager
The Uprise team sought out Leonid for help stabilizing the new platform and for his expertise in Docker-based infrastructure design.
After evaluating the existing Mesos and Marathon installation, we decided to concentrate all efforts on rebuilding the platform on Kubernetes, which better met the team’s requirements.
The Uprise product required low network latency. We started the project by experimenting with a few Kubernetes networking drivers and eventually chose Calico as the driver that could provide the best performance.
We decided to automate Kubernetes cluster creation and maintenance using Puppet. Because the company owns multiple datacenters, we built the automation modules with that in mind and made sure that creating new Kubernetes clusters in the future would be easy.
The team had already used Jenkins in other projects, so we decided to keep using it to minimize the onboarding needed for new software. We integrated Jenkins with Kubernetes and used it to automate the build and deployment of all the microservices.
Finally, before launching the product on the new platform, we trained the development team members on the Kubernetes CLI and terminology. It was important that the team be able to operate the system themselves.
Leonid’s calm and professional approach to problems is inspiring. He manages to concentrate on the results instead of being distracted by the daily noise. His approach helped us launch the product on time and made our developers happy working with the new infrastructure.
Doron Ben-David, CTO
Within a week of launching the product on the new Kubernetes cluster, the team felt that development had gotten past its major bottleneck, and within a few weeks the alpha version was released to production.
We saw positive results immediately. The number of infrastructure-related problems dropped from a few incidents a week to almost no incidents in months, which contributed to higher overall uptime for the new product.
The new platform also reduced the number of manual touch points between the developers and the DevOps team. This allowed both teams to focus on their tasks instead of constantly being in crisis-repair mode, which in turn increased productivity by at least 30%.
Kubernetes is now the de facto platform for every new project within the company. Legacy applications are also being ported to Kubernetes in an effort to standardize how services are deployed and maintained across all of the company’s products.
Developers now have greater trust in the new platform, and the time needed to launch a new service to production has dropped from a few weeks to days.