Senior Research Scientist, AWS Incident Tooling & Response
DESCRIPTION
AWS Resilience owns services that prevent and respond to availability and security issues for all AWS services. In other words, we're the people who keep the cloud running. We work on the most challenging problems, with a constant stream of new services and possible failure modes to prevent, and we're looking for talented people who want to help.
You'll join a diverse team of software and security experts, operations managers, and other vital roles. You'll collaborate with people across AWS to help us deliver the highest standards for safety, security, and availability. You'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
AWS Incident Response is at the heart of the high availability of Amazon Web Services. We make customer-impacting events shorter and less frequent by driving large-scale event and incident response. Our automated tooling quickly identifies the cause of an issue and helps mitigate its impact, and much of our engineering time is spent on projects to improve that tooling and automation. We also provide manual incident management for AWS and other Amazon groups, directing the resolution of an issue with service teams and diving deep into those events to drive improvements to the tooling. It's an exciting time to join our team as we are growing and expanding our offerings.
Key job responsibilities
You will own the organisation's strategy for using ML and GenAI, and propose the best technologies to improve how we detect incidents, find root causes faster, and correlate new events with prior incidents, all with the goal of shortening customer-facing AWS incidents. Your work will enable us to identify gaps in our current strategy and apply learnings from past incidents. You will contribute to shortening incident response through deep analysis and the introduction of new technology.
BASIC QUALIFICATIONS
- Master's degree (or European advanced degree equivalent) or PhD in Computer Science or a related technical, math, economics, or scientific field
- Several years of relevant experience developing large-scale machine learning or deep learning models and/or systems in a production environment
- Experience using Python, R, MATLAB, or another statistical/machine learning language
- Several years of experience specifically with deep learning (e.g., CNN, RNN, LSTM)
- Experience hiring or mentoring more junior colleagues
PREFERRED QUALIFICATIONS
- PhD in computer science, engineering, mathematics, economics, or a related technical/scientific field
- Hands-on experience building models with deep learning frameworks such as PyTorch or similar
- Experience with machine learning, time series, NLP and CV solutions
- Proven communication skills, presentation skills, and attention to detail
- Comfortable working in a fast-paced, highly collaborative, dynamic work environment
- Scientific thinking and the ability to invent, with a track record of thought leadership and contributions that have advanced the field