Mattermost v10 to the power 5: How our platform scales to 100k users

A scalable collaboration solution is essential in mission-critical enterprise environments. Whether the goal is to avoid communication breakdowns with an out-of-band communication platform or to centralize workflows and ensure data sovereignty across the organization, Mattermost customers depend on us to support their scale with a high level of reliability.

The Mattermost Server has always outperformed its scale. Built using Go, it is light on both installation dependencies and CPU and memory usage. We also recognized at the very start that being self-hosted and clustering the application for high availability were foundational to reaching performance and reliability at scale. In those early years, we proved the server would easily scale past 25k users when the supporting components were correctly sized.

Since the early days, we’ve added a ton of functionality to Mattermost. Threaded messages, persistent drafts, message priority, and acknowledgments are a few of the many highlights. With even more coming soon, we knew we needed to revisit our testing methodology, move beyond release performance testing, and expand into what we internally called our “Ceiling Test Program.”

A new model for testing

This new “Ceiling Test Program” was a heavy lift, a first-principles level of heavy lift. The program required updating the load testing tool and a deep analysis of the usage patterns of the current Mattermost releases.

Customers shared their data with us, allowing us to review the ways in which real users interacted with their systems. We analyzed team counts and switching, the number of channels, threaded messaging, and even the frequency of users logging on/off. All these operations formed part of the new test model.
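To make the idea of a usage-based test model concrete, here is a minimal sketch of one common way a load generator can turn observed frequencies into simulated behavior: each simulated user picks its next operation at random, in proportion to a weight. This is an illustration only, not the code of our load testing tool. The `Action` type, the `Pick` function, the action names, and every weight are hypothetical placeholders rather than measured customer data.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Action is one simulated user operation. The names and weights used in
// main are hypothetical placeholders, not measured customer data.
type Action struct {
	Name   string
	Weight float64 // relative frequency; only the ratios between weights matter
}

// Pick maps r, a value in [0, 1), onto the actions in proportion to their
// weights. It returns "" when no actions are supplied.
func Pick(actions []Action, r float64) string {
	if len(actions) == 0 {
		return ""
	}
	total := 0.0
	for _, a := range actions {
		total += a.Weight
	}
	target := r * total
	for _, a := range actions {
		if target < a.Weight {
			return a.Name
		}
		target -= a.Weight
	}
	// Only reached through floating-point rounding at the upper edge.
	return actions[len(actions)-1].Name
}

func main() {
	// Hypothetical mix of the operations named in the text above.
	model := []Action{
		{Name: "create_post", Weight: 50},
		{Name: "reply_in_thread", Weight: 25},
		{Name: "switch_channel", Weight: 15},
		{Name: "switch_team", Weight: 7},
		{Name: "login_logout", Weight: 3},
	}

	// Draw many actions and count them; the counts should roughly
	// follow the weights.
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[Pick(model, rand.Float64())]++
	}
	for _, a := range model {
		fmt.Printf("%-16s %d\n", a.Name, counts[a.Name])
	}
}
```

Passing the random value in as a parameter keeps `Pick` deterministic, which makes the selection logic easy to check independently of the random source.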

We also knew that performance differs between an empty database and a production one. We used the updated load test tool and the new usage model to generate a representative production database with 100 million posts and all the data that comes with it. We now had the foundation for the ceiling tests.

Round 1: Finding the ceiling

The first round of ceiling tests, run early this year, not only proved the quality and scale we had reached but also formed the basis for the first Reference Architectures we published. The largest deployment of Mattermost would now support 88,000 concurrent users. This is a six-node cluster with a single DB writer supported by four reader DB nodes. After finding the maximum number of supported users, we then set out to create a more granular list of deployment sizes, designed to match the guidance our customers needed.

Round 2: Power up the deployment and retest

Much like creating the representative production database of 100 million posts, we extended the testing to further align with how our customers configure Mattermost. We implemented SSO for the users’ logon/logoff activity and integrated Elasticsearch. Elasticsearch needs a populated index in production, so we built one and, like the 100-million-post database, made it part of the testing tool. We then ran our ceiling tests again.

In this round of tests, we added a step: learnings from the earlier ceiling tests were incorporated into the product or its configuration (all of which is included in the recent 9.11 ESR, released in August).

The ceiling was higher. Now the largest deployment of Mattermost would support 100,000 concurrent users.

The math nerds out there will note that 100,000 is 10 to the power 5. Happy coincidence!

Mattermost v10 and beyond

We have updated our Reference Architectures for Mattermost v10, with a wider list of supported deployment sizes, more details on instance types (this round was all run on AWS), details on the Elasticsearch node sizing and more. We will also be adding more of the testing methodology to our documentation, with the aim that anyone can reproduce the tests.

Beyond this, we plan to extend where we run our Ceiling Test Program. Today we are looking at both the Azure and OCI (Oracle Cloud Infrastructure) platforms. The focus of the program is to meet our customers’ needs, supporting Mattermost wherever they want to run it.

As you look to scale your environment, we recommend a health check: a comprehensive evaluation of your system’s current operational health. To get started, generate a Support Packet and submit it through our Support System as a standard support request with “Health Check Provided” in the subject line, or reach out to your account team.
