Research Engineer, Prime Video Compression Efficiency Team
Prime Video is changing the way millions of customers interact with video content. Every day we face the challenges of a fast-paced market, expanding technology set, and a wide range of viewing devices.
Prime Video is looking for a driven and talented ML engineer with prior expertise in deploying, optimizing, and maintaining ML- and DL-based workloads. These ML/DL solutions enable content-adaptive processing and encoding of video, and include models that measure video quality. You will help deploy proven algorithms and architectures, optimize and re-train them, expand their coverage to additional encoding profiles or codecs, quantize the models (as necessary), and, with the help of other orchestration teams, integrate such workloads at scale on instances that offer the best cost and turn-around times. You will develop suitable monitoring dashboards and guardrails to ensure proper operation.
Key Job Responsibilities
As an ML engineer, you will assist Research/Applied Scientists on the team to collect ground-truth data, clean data and labels, set up scalable training of such models to utilize multiple GPUs efficiently, deploy pre-trained models for inference with optimal performance on appropriate EC2 instances, work with SDEs to define suitable job queues and APIs so that the inference workloads can be integrated into larger orchestration, and develop suitable monitoring dashboards to keep track of the different training/inference jobs.
You will triage operational bottlenecks and failures related to ML/DL workloads, identify the evolving best practices for running such workloads at scale with optimum performance, and define and refine suitable processes for maintaining large datasets, framework versions, and code, for identifying the right instance type for a given algorithm, and for maximizing utilization of provisioned compute instances while meeting SLA guarantees.
A Day in the Life
You will extract and maintain features from a large set of training videos to train classical ML models, obtain and maintain ground-truth labels required for training ML models, develop or adopt tools to monitor progress during training, perform cross-validation in multiple folds to verify the performance of different ML models, benchmark readily available ML/DL solutions (open or proprietary) against internal solutions, and work with stakeholders (e.g., product, studios, Applied Scientists, Engineering team members) to facilitate fully automated as well as human-in-the-loop type of workflows.
You will create appropriate tickets for known issues and will triage and root-cause such issues as per their severity.
About the Team
Our mission is to build and operate the most innovative video streaming technology stack that provides the best customer-centric streaming experience for VOD and Live globally and supports all business use cases (subscription, transactional, ad-supported). We invent and implement technologies that deliver a flawless, engaging streaming experience for our customers, using the fewest bits possible. We commit to our values of respect and integrity by creating a work environment that is supportive, diverse, inspiring, and inclusive.
BASIC QUALIFICATIONS
- 3+ years of non-internship professional software development experience
- 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
- Experience programming with at least one software programming language
- Prior experience in deploying training and inference workloads on cloud instances covering both CPU and multi-GPU
PREFERRED QUALIFICATIONS
- 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
- Bachelor's degree in computer science or equivalent
- Work experience deploying ML/DL workloads in production for video or computer vision use-cases