
Chatwork Uses Amazon EKS to Increase Operational Efficiency, Support Over 6.5 Billion Messages

2021

Chatwork has been using Amazon Web Services (AWS) to power its popular business chat tool "Chatwork" since March 2011, when the AWS Asia Pacific (Tokyo) Region first became available, and has been evolving its architecture ever since. The company moved to a container-based architecture in 2016 and was self-managing Kubernetes on Amazon Elastic Compute Cloud (Amazon EC2). However, Chatwork found that the administrative and operational load of performing Kubernetes version upgrades every 3 months was too heavy. This prompted the company to fully migrate to Amazon Elastic Kubernetes Service (Amazon EKS) over the course of 2 years, from 2018 to 2020. As a result of the migration, the time it takes to roll back or fail over after a release issue has decreased by 95 percent, and the operational work required to perform a release has decreased by 90 percent.


“By adopting Amazon EKS as the platform for our business chat tool ‘Chatwork,’ we offloaded the work of configuring and building Kubernetes clusters, which reduced the stress on our engineers and accelerated development. We will continue to meet the needs of our customers and offer innovative working styles through our services.”

Shigetoshi Kasuga
CTO and VP of Product, Chatwork Co., Ltd.

Chatwork's "Small-Start" Release Model and Move to Containers as Number of Users Increased

As part of its mission of "making work more fun and creative," Chatwork continually aims to provide more innovative ways to work. The company is a pioneer of business chat, having released the Chatwork app in March 2011. Today, more than 304,000* companies use the Chatwork service (*data as of the end of February 2021). The app features a user-friendly design, an open platform that lets users communicate both inside and outside their company with one account, and flexible pricing options starting from a no-cost "freemium" plan. With the COVID-19 pandemic increasing the necessity for remote work, the number of users increased dramatically starting in April 2020, with over 36.5 percent year-over-year growth in registered users in 2020.

Chatwork had used Amazon EC2 and Amazon Simple Storage Service (Amazon S3) for certain parts of its business since 2007, which led to the decision to continue using AWS as the company released the Chatwork service. Shigetoshi Kasuga, Chatwork's CTO and VP of Product, remarks, “We were interested in the AWS concept of usage, which can be likened to turning on the tap to get water when you need it, and so we became one of the first companies in Japan to adopt those services.”

When the Chatwork service was released in 2011, it consisted of a simple LAMP stack composed of open-source software. By 2014, more than 50,000 companies were using the service, and the data layer load soared as the number of chat messages increased. In response, Chatwork implemented several measures to improve its scalability and quality of service between 2014 and 2015, including adopting Amazon CloudSearch for message search functionality. In 2016, with more than 100,000 enterprises now using Chatwork, the company adopted a distributed system for its message databases and containerized its applications with Kubernetes running on Amazon EC2.

“We containerized our platform to keep up with the pace of business,” explains Kasuga. “The Chatwork service was becoming social infrastructure at this point, and any service failure or outage would severely impact our customers’ business. To ensure that our customers could use our service reliably, we knew that, in addition to offering a high Service Level Agreement (SLA), we also had to continually improve the end-user experience. For these reasons, we proceeded to begin decoupling our service platform, adopted containers in order to allow for fast, short, iterative deployment cycles, and chose Kubernetes as the orchestration layer to take advantage of its significant ecosystem of open-source tools.”

Migrating to Amazon EKS Reduces Upgrade Process Workload

Due to issues with the tooling used at the onset of Kubernetes adoption, the operational load to manage each cluster was high, and upgrading Kubernetes versions became a major problem. To tackle these issues, the company decided to migrate to Amazon EKS as soon as it became generally available in the Asia Pacific (Tokyo) Region in 2018.

Ryo Sakamoto from Chatwork’s Product Department SRE Division explains, “We had been using open-source tools to manage our Kubernetes clusters. This meant we had to do everything ourselves, from building to managing and operating the control plane. As the system grew in scope, the AWS CloudFormation specification files became increasingly complicated, and the operational load increased. Additionally, the OS we were using to host those open-source tools was approaching the end of support. For all these reasons, we decided to adopt Amazon EKS.”

Chatwork migrated its messaging application platform and its PHP web application platform to Amazon EKS. The company began migrating its messaging application in late 2019 and the web application in July 2020, completing both projects by the end of August 2020.

The migration was done with two goals in mind: the clusters would be upgraded in a timely manner in accordance with quarterly Kubernetes releases, and failover risk during deployments would be kept to a minimum by building new clusters to ensure safe failover. Chatwork also adopted the open-source command line tool eksctl to create Amazon EKS Kubernetes clusters, as well as the Kubernetes package manager Helm. Operations are automated wherever possible in order to avoid dependency on any particular individual.
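The approach of building a fresh cluster per Kubernetes release can be illustrated with a small sketch that generates an eksctl-style cluster definition. This is a hedged illustration, not Chatwork's actual configuration: the base name "chat-platform", the node group name, the instance type, the node count, and the version "1.18" are all hypothetical, and only the Tokyo Region (ap-northeast-1) comes from the case study. The config is printed as JSON for simplicity, on the assumption that it would be converted to or accepted as YAML before use.

```python
import json


def build_cluster_config(base_name: str, k8s_version: str,
                         region: str = "ap-northeast-1") -> dict:
    """Return an eksctl ClusterConfig-shaped dict for one Kubernetes version.

    Encoding the version in the cluster name lets an old and a new cluster
    exist side by side, so traffic can move to the new one and move back
    if a problem appears. All concrete values here are illustrative.
    """
    suffix = k8s_version.replace(".", "-")
    return {
        "apiVersion": "eksctl.io/v1alpha5",
        "kind": "ClusterConfig",
        "metadata": {
            "name": f"{base_name}-{suffix}",
            "region": region,
            "version": k8s_version,  # eksctl expects the version as a string
        },
        "managedNodeGroups": [
            # Hypothetical node group; size and instance type are assumptions.
            {"name": "workers", "instanceType": "m5.large", "desiredCapacity": 3}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_cluster_config("chat-platform", "1.18"), indent=2))
```

A file of this shape would typically be passed to eksctl with `eksctl create cluster -f <file>`. Keeping the definition in code is one way to make cluster creation repeatable and independent of any one engineer.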

“Because of this, migrations to Amazon EKS are now managed automatically, and code version control, which up until this point had been done inside Amazon EC2 instances, could now be deployed and rolled back on a per-image basis,” says Kota Ozaki of Chatwork’s Product Department SRE Division. “We've shortened deployment times from 20 minutes to around 5 minutes. If you exclude the time it takes for all the pods to be replaced, deployments now only take about 35 seconds.”
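The figures in this quote can be checked with a little arithmetic. The sketch below uses only the numbers quoted above (20 minutes, about 5 minutes, about 35 seconds); the function names are illustrative, and the results are approximate because the source figures are rounded.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a duration shrank from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


def pod_replacement_seconds(total_minutes: int, other_seconds: int) -> int:
    """Time left for pod replacement once the rest of the deployment is removed."""
    return total_minutes * 60 - other_seconds


if __name__ == "__main__":
    # Deployment time: 20 minutes before, about 5 minutes after.
    print(percent_reduction(20, 5))        # 75.0
    # Of the 5-minute deployment, about 35 seconds is not pod replacement.
    print(pod_replacement_seconds(5, 35))  # 265
```

In other words, the deployment time fell by roughly 75 percent, and most of the remaining 5 minutes (about 265 seconds) is spent waiting for pods to be replaced.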

Shorter Failover Times During Deployments Free Developers From Unnecessary Stress

As a result of migrating to Amazon EKS, configuration changes to the Chatwork service infrastructure can be made more easily and with greater flexibility, and Chatwork can now adopt an SLA-based phased release strategy. The time it takes to perform failovers when release issues occur has been cut by 95 percent, and the overall operational workload during releases has dropped by 90 percent. Additionally, the system now has the operational resilience and flexibility to withstand usage surges resulting from events such as marketing campaigns.
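A phased release with fast failover can be sketched as a loop that raises the share of traffic sent to a new environment step by step and returns everything to the old one as soon as a check fails. The case study does not describe Chatwork's actual mechanism, so the traffic weights, the `is_healthy` callback, and the function name below are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def phased_rollout(steps: Iterable[int],
                   is_healthy: Callable[[int], bool]) -> Tuple[int, List[int]]:
    """Shift traffic to a new environment in phases.

    `steps` holds the percentage of traffic sent to the new environment at
    each phase. `is_healthy` stands in for an SLA check run after each shift.
    Returns the final percentage and the history of weights applied.
    """
    history: List[int] = []
    for weight in steps:
        history.append(weight)
        if not is_healthy(weight):
            # Fail over: send all traffic back to the old environment.
            history.append(0)
            return 0, history
    final = history[-1] if history else 0
    return final, history


if __name__ == "__main__":
    # Every phase passes its check, so the rollout completes.
    print(phased_rollout([10, 50, 100], lambda w: True))
    # The check fails at 50 percent, so traffic returns to the old environment.
    print(phased_rollout([10, 50, 100], lambda w: w < 50))
```

The design point is that the rollback path is a single, cheap step, which is what makes frequent releases less stressful for the people running them.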

“For our developers, the greatest benefit is reduced stress. We make deployments almost every day to improve our service, so ensuring that critical infrastructure isn't interrupted puts a lot of pressure on developers,” Kasuga explains. “Managing the Kubernetes control plane is a lot of work, but now that we've offloaded this work to Amazon EKS, engineers can focus on improving applications with declarative Kubernetes deployments and fail over quickly when issues do occur. We can now respond quickly to customer requests.”

As an additional, unexpected benefit, the flexible management APIs of Amazon EKS have allowed Chatwork to optimize system costs. The company expects to cut Amazon EC2 costs by up to 80 percent by integrating Amazon EC2 Spot Instances into its infrastructure.

Adopting a Microservice Architecture and Reorganizing Teams Accelerates Development

Now that almost all of its applications have finished migrating to Amazon EKS, Chatwork is seeking to further optimize operations in its Kubernetes environment. “We’re planning to adopt an Argo CD and Flux hybrid GitOps structure for CI/CD. We’re also aiming to automate software delivery during cluster updates and expect to further cut down deployment times by doing so,” says Sakamoto.

Ozaki adds, “Because our service holds critical customer data, we’re constantly strengthening security through penetration tests, security scans, and introducing security visualization tools for container environments.”

The company plans to continue enhancing its service platform architecture and speed up development times by shifting all of its applications to a microservices architecture and reorganizing teams according to those service functions. On the business side, Chatwork wants to analyze user trends using machine learning and promote sales measures to attract more paying users.

“For Chatwork, AWS is core to our system infrastructure and is indispensable to the services we provide,” says Kasuga. “AWS Solution Architects understand our services, so we’re looking forward to their ongoing support as we expand our capabilities.”


About Chatwork

Based in Tokyo, Chatwork is the developer and operator of the "Chatwork" group chat app for global teams, which includes secure messaging, video chat, task management and file sharing functionality. 

Benefits of AWS

  • Failback time at release cut by 95%
  • Operating costs for release cut by 90%
  • Flexible operation that withstands surges in traffic
  • Spot Instances reduce Amazon EC2 costs by up to 80%
  • Developers freed from psychological pressures
  • Automated app delivery for cluster updates is under consideration

AWS Services Used

Amazon EKS

Amazon Elastic Kubernetes Service (Amazon EKS) gives you the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises.

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. 

AWS CloudFormation

AWS CloudFormation gives you an easy way to model a collection of related AWS and third-party resources, provision them quickly and consistently, and manage them throughout their lifecycles, by treating infrastructure as code.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. 

