The revolution in critical incident response at Dock: efficient integration and service improvement

Dock's transition to FireHydrant addressed previous challenges and also significantly optimized response time to critical incidents. Effective integration allowed for a more customer-centric approach, improving the quality of service provided. Learn how.

By The FireHydrant Team on 2/13/2024

This is a translation of an article written by Renato Matos, Director of Dock, and published on LinkedIn on Feb. 9, 2024.

In this article, we explore how Dock is working to significantly improve its response time to critical incidents, with effective integration between tools as the key to success. We also explain how we challenge the conventional approach by shifting the focus from Mean Time to Acknowledge (MTTA) to Mean Time to Combat (MTTC), a custom metric that measures the time between incident detection and effective communication involving professionals capable of resolving it.
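To make the difference between the two metrics concrete, here is a minimal sketch of how MTTA and MTTC could be computed from incident timestamps. The `Incident` fields, the helper names, and the sample data are all hypothetical and exist only for illustration; they are not Dock's actual data model or a FireHydrant API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    # All field names are assumptions made for this sketch.
    detected_at: datetime            # alert fired or incident reported
    acknowledged_at: datetime        # someone acknowledged the page
    responders_engaged_at: datetime  # people able to resolve it are communicating


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def mtta_minutes(incidents):
    """Mean Time to Acknowledge: detection -> acknowledgement."""
    return mean_minutes([i.acknowledged_at - i.detected_at for i in incidents])


def mttc_minutes(incidents):
    """Mean Time to Combat: detection -> capable responders communicating."""
    return mean_minutes([i.responders_engaged_at - i.detected_at for i in incidents])


# Illustrative data only, not real incident records.
incidents = [
    Incident(
        detected_at=datetime(2024, 1, 10, 10, 0),
        acknowledged_at=datetime(2024, 1, 10, 10, 2),
        responders_engaged_at=datetime(2024, 1, 10, 10, 12),
    ),
    Incident(
        detected_at=datetime(2024, 1, 11, 14, 0),
        acknowledged_at=datetime(2024, 1, 11, 14, 4),
        responders_engaged_at=datetime(2024, 1, 11, 14, 8),
    ),
]

print(f"MTTA: {mtta_minutes(incidents):.1f} min")
print(f"MTTC: {mttc_minutes(incidents):.1f} min")
```

In this made-up sample, acknowledgement is quick, but it takes noticeably longer before the people who can actually fix the problem are talking to each other. That gap is what MTTC is meant to expose and what MTTA alone would hide.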

Current scenario and tools

Dock provides technology for payments and banking in Latin America and, for over 20 years, has been fulfilling its mission to democratize access to financial services, promoting the inclusion of millions of unbanked and underbanked individuals. The company manages technology, operations, and regulatory complexity so that clients can focus on expanding their businesses. Dock operates 70 million active accounts and over eight billion annual transactions. Using the Atlassian suite for ITSM, the company manages incidents, changes, and requests in Jira. Additionally, tools like Opsgenie, Confluence, Slack, Zoom, and StatusCast are crucial for communication and incident resolution. Monitoring tools include Splunk, Datadog, and Grafana.

Main challenges

Despite having cutting-edge tools, Dock found it challenging to integrate them efficiently. Automating tasks, eliminating redundancies, and running activities in parallel were crucial goals. Limited native integration between the Atlassian tools and Slack, for example, led the team to explore alternative solutions such as Zapier.

Solution adopted

Dock adopted FireHydrant as its central management system for critical incidents. Because it integrates natively with Opsgenie, Jira, and Slack, FireHydrant simplified the workflow. It offers two distinct ways of declaring incidents and proved effective both for incidents triggered by automatic alerts and for those declared by people. This centralized approach significantly reduced MTTC and gave the entire response team a unified view.

Benefits achieved

The implementation of FireHydrant resulted in a significant reduction in MTTC, which directly improved Mean Time to Recovery (MTTR). Operational efficiency improved, allowing each tool at Dock to play its specific role. Communication became faster and more accurate, with updates automatically reflected on the internal Status Page and in a dedicated Slack channel. The tool also streamlined the postmortem process by synchronizing information between FireHydrant and Slack.

Points of attention

Although Opsgenie and Jira were solid tools, some limiting factors, such as integration with Slack and the need for cultural change, led to the adoption of FireHydrant. Preserving the existing tools without significant changes was crucial, as was maintaining the Opsgenie structure and sharing a service catalog between Opsgenie and Jira.

Conclusion

The transition to FireHydrant at Dock not only addressed previous challenges but also significantly improved response time to critical incidents. Effective integration between tools allowed for a more customer-centric approach, improving the quality of the service provided. This advancement highlights Dock's commitment to providing innovative and efficient solutions to its clients and positions the company as a benchmark in the financial services sector.
