Observability Reliability Engineer

Company:

Mlp


Details of the offer

We are seeking an experienced Site Reliability Engineer (SRE) specializing in observability to join our team. This role is responsible for designing and implementing observability solutions that ensure reliable, performant, and scalable infrastructure. It also involves reviewing our current observability stack, planning future enhancements, implementing new solutions, and collaborating with developers to create actionable insights through effective dashboards and automated alerting systems. The ideal candidate will have a strong background in analytics and experience with advanced monitoring techniques to help us achieve metrics baselining, anomaly detection, and enhanced correlation and causation analysis.

Responsibilities:
Conduct thorough reviews of our existing observability stack to identify areas for improvement and optimization
Collaborate with the team to plan and design the next version of our observability infrastructure
Assist in the implementation of the new observability stack, ensuring seamless integration and minimal disruption
Create and maintain insightful and actionable dashboards that provide clear visibility into system performance without adding unnecessary noise
Review existing alerts and work closely with developers to automate alert handlers for self-healing systems
Utilize your experience in analytics to perform metrics baselining and anomaly detection, ensuring our systems are operating optimally
Explore and integrate AI tools to enhance our correlation and causation analysis capabilities
Develop and maintain necessary components such as metrics exporters and self-service tools

Required Skills:
Demonstrated experience as a Site Reliability Engineer, Observability Engineer, or in a similar role in software development
Must have experience in observability, such as implementing monitoring, alerting, and dashboarding solutions
Experience with alert management and automation
Experience with custom metrics exporters and tracing tools
Experience with performance tools and optimization
Hands-on experience with the Prometheus ecosystem
Ability to design and develop code in Python or Go
Acute drive to automate manual operations and processes
Strong understanding of Linux operating systems
Hands-on experience with configuration management and infrastructure-as-code tools such as Ansible, SaltStack, or Terraform
Experience in managing and scaling distributed systems
Strong sense of ownership and integrity, demonstrated through clear communication and collaboration
Excellent troubleshooting and problem-solving skills
Ability to communicate complex concepts clearly with both stakeholders and developers


Source: Eightfold_Ai
