Getting Started with Site Reliability Engineering

This is a quick primer to get started in Site Reliability Engineering if you're interested in becoming a Site Reliability Engineer (SRE).

By Robert Ross on 8/16/2021

Site Reliability Engineer (SRE) is one of the fastest-growing jobs in tech, with LinkedIn reporting 34% growth YoY in 2020 and over 9,000 openings in their Emerging Jobs Report.

If you’re new to SRE and exploring it as a career path, understand that it can be a challenging but rewarding experience. Here are some quick tips on how you can get started with SRE and jump-start a rewarding career.

What do SREs do? 

Before we dig into what you need to know to get started, you should absolutely know what site reliability engineering is and what an SRE does.

So, what is Site Reliability Engineering?

Simply put, it’s a practice of engineering software that makes your system more reliable.

And what are SRE teams generally responsible for? 

Again, simply put, “the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).”
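To make two of those terms concrete, here is a minimal sketch of how availability and latency might be measured. The function names and the sample numbers are made up for illustration, and real teams usually pull these figures from their monitoring tools rather than computing them by hand.

```python
import math


def availability(successful: int, total: int) -> float:
    """Fraction of requests that succeeded."""
    if total == 0:
        raise ValueError("no requests recorded")
    return successful / total


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical data: 10,000 requests with 10 failures, plus ten latency samples.
latencies_ms = [120, 95, 110, 480, 105, 130, 98, 101, 115, 102]

print(f"availability: {availability(9_990, 10_000):.2%}")
print(f"p90 latency: {percentile(latencies_ms, 90)} ms")
```

The single 480 ms request barely moves the p90 figure but would drag up an average, which is one reason latency is often reported as a percentile.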

An SRE’s time is split between operational work, such as on-call duties, and engineering work. The engineering side may include implementing automation, creating new features, or scaling a system in order to increase site reliability and performance.

We won’t go into more detail here, but if you’re interested, read What is SRE? in our Reliability Guide.

What base skills should an SRE have?

Many base skills will prepare you, and you should take the time to learn them, but keep in mind that most companies seeking SREs are looking for engineers with at least a few years of experience. It’s also important to note that there’s no definitive path, by way of education or career, into an SRE role.

The following skills should help you get started, but make sure you review job descriptions and read about each business to know what will make you better suited for the role.

You should…

Be comfortable coding and be able to understand a full software stack

You don’t have to be the world’s best programmer to be an SRE, but you definitely need to know your way around code. The languages you need to know will vary by company, as much as the technologies you may encounter in each job. Here’s a sampling of some things you may need to focus on:

  • Programming languages: familiarity with established languages such as Java and JavaScript can be as important as knowing a bit (or a lot) about newer languages and runtimes (such as Node.js, Go, and Scala)

  • Knowledge of operational performance: server platforms, databases, and networks

  • Old and new technologies and their nuances: Git, CI/CD pipelines, monitoring tools, and incident management

Face complexity head-on (rather than fear it) and be able to scale

In order to be successful as an SRE, you’ll have to understand how software is developed and how complex systems are built. This may include reading up about methodologies and best practices shared by industry practitioners or major companies.

Interested in more ways to learn? Check out the LinkedIn School of SRE.

PSA: You can’t know everything

If you’re in your first (or even second) SRE role, understand that you will not know everything. You may feel a bit of imposter syndrome, but try not to let it get in the way of learning and growing.

Depending on your organization, you may have other SREs to learn from, or you could be in a lone role, expected to save the day when the need arises.

While there are practical skills you’ll need, you will spend most of your time learning the existing codebase and getting to know your business’s unique processes (and your team)!

Other tips, tricks, and helpful SRE resources

If you’re feeling skittish and want to spend time educating yourself (we don’t blame you; we’re reading new stuff all the time), here are some great places to start:

  1. Read the Google SRE Book

  2. Have a process to quickly learn about a new codebase (here are some ideas)

  3. Read up about different tools that SREs use (like FireHydrant!)
