Date: 2022-09-06
Site Reliability Engineer @ Blip

Description

Blip is a Tech and Innovation Hub with strong expertise in software development, mobile apps, web platforms and retail applications for betting and gaming.

We are part of Flutter Entertainment, one of the world's largest groups in the bookmaking industry, with annual revenue of around 2 billion euros. The code we develop, powering brands such as Paddy Power, Betfair, and FanDuel, is used by over 5 million people in more than 100 countries, and we are in the API Billionaire Club alongside players such as Google, Facebook, and Twitter.

We are looking for an exceptional Site Reliability Engineer to deliver game-changing improvements for Blip.

The Role

As a Site Reliability Engineer, you'll be accountable for closely monitoring the availability, performance and stability of our platforms, while working closely with software development teams on how to improve critical components.

Working on complex challenges while assuring uptime and reliability across different setups (AWS Cloud, AWS Outposts, OpenStack) lets you apply a range of skills in coding, algorithms and complexity analysis.

What will you be doing...

  • Engage in and improve the whole lifecycle of services, from design through deployment and operation to refinement.

  • Take an active part in investigating, identifying and (where necessary) resolving the root causes of production problems.

  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.

  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.

  • Be an active part of performance and capacity testing;

  • Optimize reliability monitoring & alerting;

  • Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.

  • Iteratively audit systems for performance and reliability vulnerabilities;

  • Define and revise Service Level Indicators (SLIs);

  • Practice sustainable incident response and blameless postmortems;

We are looking for someone who...

  • Has operating systems and networking knowledge;

  • Has experience with programming languages such as Python, Java or Go;

  • Has experience working with public cloud providers;

  • Has experience working with microservices architectures;

  • Has experience working with message queuing services and databases;

  • Has experience with Configuration Management tools such as Chef and Ansible;

  • Has knowledge of Monitoring Solutions like Datadog and Splunk;

  • Is familiar with CI/CD pipelines comprising Jenkins, Git, Artifactory or others.

Apply here