Bio: I have 8 years of experience working on large-scale distributed storage systems at Google as a Site Reliability Engineer. My current focus is establishing and negotiating SRE engagements with developer teams. On the technical side, I lead a team that works on fleet expansion while maintaining a good balance between release velocity and reliability.

After hours, I play in jam sessions.

L8 Conf talks

Testing in SRE with DiRT

Disaster and Recovery Testing (DiRT) is a flavor of Chaos Engineering used by Site Reliability Engineers (SREs) at Google. Chaos Engineering is often characterized as "breaking things in production", which makes it sound feasible only for elite or sophisticated organizations. In practice, it has been a key element in digital transformation from the ground up for a number of companies, ranging from pre-streaming Netflix to those in highly regulated industries like healthcare and financial services.

In this talk, you will learn the basic prerequisites for Chaos Engineering, including a couple of pragmatic ways to get started.

Patryk Hes