about 15 hours ago
Base Salary
$197k - $243k/yr
Responsibilities
- Define the roadmap for applying machine learning engineering to improve the reliability of production systems.
- Enhance real-time anomaly detection capabilities using state-of-the-art ML techniques.
- Develop methods for building pipelines over various data streams, including metrics, logs, and traces.
- Create a reasoning layer to identify root causes of production issues.
- Build time-series models to predict capacity exhaustion and manage traffic spikes.
Requirements
- Expert knowledge of various modeling techniques and the ability to fine-tune models for specific use cases.
- Proven ability to propose and architect infrastructure for learning systems.
- Strong fundamentals in distributed systems and an understanding of high-throughput systems.
Categories
AI & ML, Data Science