Rethinking Data Monitoring System

Cloud Infrastructure Redesign

How can we redesign the data monitoring experience systemically to attract data admins from the legacy software?

85%

user migration from legacy software

12x

Existing cloud user monitoring engagement

Company

Rubrik

My Role

Senior Product Designer

Timeline

Mar 2022 - May 2023

Team

UX researchers (2), product managers (5), product designers (3), engineers (8), visual designers (3), UX writers (2), sales (3), customer support (5), marketing director, legal team

Want to see a deep dive into my design process? Check out my slide deck!

Business Problem

Rubrik is a data security company that has been planning an IPO since 2020. Why the long wait? To go public as a SaaS company, Rubrik needed to reach a certain threshold of SaaS revenue, which meant the majority of users had to migrate from the legacy local software to the new cloud SaaS product, Rubrik Security Cloud. At the start of this project, only 15% of existing customers had adopted the new cloud software, far below target.

What's the Challenge?

Data security is the infrastructure of every operating business. Our customers and decision makers (data admins, super admins, data security officers, etc.) normally prefer familiarity and utility over user experience and intuitiveness.

Many core functions of the legacy software (data health monitoring, cluster job operations, disk replacement, etc.) were not yet mature in the new SaaS software, since its development had emphasized cloud ransomware detection and other promoted product domains. As a result, customer stickiness to the legacy local software was extremely high.

Users had neither the motivation nor the confidence to migrate to the new cloud software, since Rubrik Security Cloud could not support the basic day-to-day data monitoring operations they performed on the legacy software.

Our Idea

Our final push before the IPO was to deliver a full-fledged data monitoring and operations system on Rubrik Security Cloud, one even better than the legacy software. We would tackle users' pain points, reconsider their needs, and redesign the hierarchy and functionality of all their monitoring and operational interactions. A more promising, optimized experience would attract the majority of existing users to the SaaS software and help reach the SaaS revenue target for the IPO.

Result

We delivered a data monitoring and operations system in Rubrik Security Cloud with 4 distinct tiers of contextual views, replacing the single-layer dashboard of the legacy software. It added 8 more operational features, visualized 17 more data points, and brought 15 back-end access points alive in the UI.

After the launch of the new data operation system, 85% of existing users migrated, average individual user engagement rose 10x, and support cases dropped by 70%. This secured the groundwork for a successful Rubrik IPO in 2024. 🎉

"The best part of the RSC (Rubrik Security Cloud) is the monitoring center." - PWCC data admin

"We've accomplished the most successful and crucial groundwork transformation." - Vasu Murthy, Rubrik Chief Product Officer

Familiarizing with users' roles and behaviors

Our customers purchase physical "Data Clusters" from Rubrik to store their company data. Our users, data admins, use Rubrik's legacy local software, which shows each cluster's data transmission status, to monitor and maintain data health and persistence and to troubleshoot each cluster. They make sure all data backup and recovery operations go as planned and coordinate additional storage space when needed.

Many of them work in both the Rubrik legacy local software and a back-end command-line environment for operations the software does not provide. On average, small businesses own 1-2 clusters, while large enterprise customers like Home Depot and Disney own 3000+ clusters. Data admins rely heavily on Rubrik support for some basic data operations on the legacy software: we received 66,267 support cases in 2023.

Identifying pain points and opportunity areas

We conducted 90-minute exploratory interviews with a broad range of user segments, including small, medium, and large enterprise users and managed service providers. We asked them to walk us through their current cluster management experience and give feedback on proposed product enhancements. We also gathered feedback from internal experts in professional services, support, and sales engineering who assist users with cluster operations and monitoring.

Interview Recruitment

Research Synthesis

“If it ain’t red or dead I don’t care.” - Genuine Parts Company

Tracking backup and recovery failures is the single most important task a cluster admin performs daily, followed by troubleshooting failures and infrastructure health issues.

“My manager cares about compliance but this graph is not actionable for me.” - PCCW

IT managers need higher-level trends from the cluster dashboard to track the health of the system, but admins who manage the cluster day to day are more interested in actionable data they can use to triage possible issues.

“Being able to predict capacity utilization & when hardware needs to be ordered would be powerful.” - Buncombe County

Almost all cluster admins build and maintain custom Excel sheets or tools outside of Rubrik to track cluster capacity usage trends and estimated runway. The data provided in Rubrik is not robust enough to predict and plan for future hardware needs.
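
To make "estimated runway" concrete, here is a minimal sketch of the kind of calculation such a spreadsheet might perform. It is an illustration only, not Rubrik's method: the function name `estimate_runway_days`, the linear-growth assumption, and the sample numbers are all hypothetical.

```python
from datetime import date


def estimate_runway_days(samples, capacity_tb):
    """Estimate days until a cluster fills up, assuming linear growth.

    samples: list of (date, used_tb) tuples, oldest first (hypothetical format).
    capacity_tb: total usable capacity of the cluster in TB.
    Returns the number of days after the last sample at which the fitted
    trend line reaches capacity, or None if usage is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Convert dates to day offsets from the first sample.
    first_day = samples[0][0]
    xs = [(day - first_day).days for day, _ in samples]
    ys = [used for _, used in samples]

    # Ordinary least-squares fit of used_tb = slope * day + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None  # no growth trend, so no projected fill date

    intercept = mean_y - slope * mean_x
    day_full = (capacity_tb - intercept) / slope
    return max(0.0, day_full - xs[-1])


# Made-up usage history growing by 1 TB per day on a 150 TB cluster.
history = [
    (date(2023, 1, 1), 100.0),
    (date(2023, 1, 11), 110.0),
    (date(2023, 1, 21), 120.0),
]
print(estimate_runway_days(history, capacity_tb=150.0))  # 30.0
```

A straight-line fit is the simplest possible choice. Real capacity planning would likely also account for deduplication, retention policies, and seasonal growth, which is part of why admins found the built-in data insufficient.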

“We are never sure where the performance bottlenecks are coming from.” - Epic Holdings

Although users can view system performance, network utilization, throughput, etc., they cannot correlate these factors in one place to confirm the cluster is performing efficiently.

“I do not know what healthy means. Is 80% CPU utilization on a node normal?” - Sales Engineer

Users want to view more node-level details such as disk, network, power supply, and memory status in the UI instead of running back-end command-line commands for each node. They also want proactive trend alerts and recommendations in the UI to help manage hardware health.

“If it is not a basic backup or SLA level problem, it has to go to support” - King's College

Users prefer conducting cluster operations themselves from the UI instead of relying on support - e.g. running cluster health checks, checking connectivity status, diagnosing and fixing bad nodes/disks, turning on the LED on a bad disk, etc.

Key Takeaways

Information hierarchy: The current legacy UI does not highlight the most commonly viewed data points, such as backup and recovery failures. Users would like a high-level overview that lets them identify these data points and trends at a glance.


Actionable data: Users would like to see instructions and alerts beyond raw data points, to eliminate the time and effort needed to calculate and identify the appropriate next steps.

Visual data correlation: Discovering a problem usually takes comparing and matching different data trends (e.g. CPU vs. network activity), and users find this comparison difficult without exporting the data points into an Excel sheet.

Missing important data representation: Certain statuses and data points are not provided by the legacy software (e.g. node health statuses, power supply statuses, etc.).

Dependencies on Rubrik support: Users rely heavily on Rubrik's support for some basic operations, and they would prefer to perform these actions themselves through the UI.

Dependencies on back-end command lines: Many operations and data comparisons are not provided by the legacy UI, so users have to resort to back-end commands when they need to switch between clusters or nodes.

Vision & Strategy

Create the concept of a ‘Centralized Monitoring Hub’

First things first: we are moving away from the "one monitoring center for each cluster" structure built for local use. Instead, we are giving users a cloud-based, multi-panel centralized monitoring hub for all of their Rubrik clusters, designed to scale from small customers (1-2 clusters) to large enterprise customers (3000+ clusters).

User Testing and Feedback Cycles

We conducted 90-minute usability sessions in which participants explored the design prototypes. Participants gave feedback on cluster health monitoring at the top-level view (card/list view), drilled down into a specific cluster (dashboard), located and walked through a hardware failure workflow, and evaluated network configuration workflows.

Positive Feedback

“So visually you can see if you are good, or if your cluster requires attention. Then you see the runway. That’s cool. That’s a very nice visual. You see immediately what you would like to know for your cluster.” - Senior System Engineer, PCCW

“What you're showing here [on the dashboard] at the high level is mostly everything we need to see in the single panel.” - Cloud Virtualization Manager, Cranfield University

  • Users feel the overall information architecture and workflows are a big improvement, and are excited to explore the new platform

  • Users like the overview of information for their clusters and the ability to drill down for more context

  • Users feel they have more autonomy and can complete operations they would otherwise have to depend on support for

  • Users feel more informed about the health of their clusters and can see key metrics at a glance

Learnings

Feedback from the prototype walk-through confirmed that our design was largely on the right track. Interviewees provided suggestions regarding the interactivity of data visualizations and emphasized the importance of certain key metrics on each monitoring panel. They also inquired about access points for specific settings and operations. Most suggested improvements were UI-focused, and overall, participants appreciated the concept of a centralized monitoring center with multi-layered control panels for each cluster.

Design Summary

Following an incremental development approach, by the end of this project we had centralized all cluster environments into one platform. For each monitoring environment, we delivered 8 more operational features, visualized 17 more data points, and brought 15 back-end access points to the UI.

Global Dashboard

Cluster Gallery

Cluster Dashboard

Hardware Operational Dashboard

Product Outcome

The product was localized into 8 languages and adopted globally. With key features now supported in the cloud, 85% of users migrated from the legacy software to the new platform before Rubrik's IPO.

By 2025, engagement with cluster monitoring had jumped 12x among existing cloud users.

I collaborated with the visual team to create 32 new components—now used across other product domains at Rubrik.

Business Outcome

With a successful legacy user migration, this project helped secure SaaS revenue as Rubrik’s main revenue stream—laying the groundwork for its 2024 IPO.

Rubrik Security Cloud now powers all core services. Following the launch of the Monitoring Hub and cluster monitoring, the company officially began phasing out the legacy software.

Interested in a Deep Dive?

To learn more about the design decisions, data collection, user validation, and all the other bits and pieces of this project, check out my slide deck ->