13 days ago
Arlington, VA, USA
Senior / Staff+
Base Salary
$118k - $189k/yr
Responsibilities
- Design and implement scalable ML and LLM infrastructure on AWS.
- Architect end-to-end ML and Generative AI lifecycle workflows.
- Integrate LLM pipelines into the enterprise MLOps stack.
- Define standards for CI/CD/CT pipelines across ML and GenAI workloads.
- Architect Retrieval-Augmented Generation (RAG) pipelines.
- Design and deploy LLM-based services using managed services and containerized custom inference services.
- Establish prompt versioning and evaluation frameworks for LLM systems.
- Implement guardrails for hallucination control and safety monitoring.
- Define architecture for LLM fine-tuning workflows.
- Implement scalable orchestration of LLM pipelines.
- Architect scalable inference patterns for traditional ML models and LLM APIs.
- Implement model monitoring frameworks for performance and quality.
- Define SLAs/SLOs for ML and GenAI systems.
- Design safe deployment strategies.
- Implement cost tracking for training workloads and inference endpoints.
- Optimize LLM workloads for cost-performance tradeoffs.
- Partner with finance and engineering teams to forecast ML/GenAI infrastructure spend.
- Define enterprise standards for experiment tracking and model registry.
- Provide architectural guidance to data science and engineering teams.
- Evaluate and recommend tooling across the ML/GenAI stack.
Requirements
- 6+ years of experience in ML engineering, data engineering, or MLOps roles.
- Proven experience architecting ML platforms in AWS.
- Strong hands-on experience with SageMaker.
- Experience operationalizing LLM or Generative AI systems in production.
- Experience building RAG pipelines and integrating vector databases.
- Experience working with Databricks in production.
- Experience implementing data governance and catalog systems.
- Strong understanding of CI/CD principles for ML and GenAI.
- Experience with containerization and orchestration.
- Deep knowledge of infrastructure-as-code.
- Strong understanding of observability and monitoring for ML systems.
- Experience implementing cloud cost optimization strategies.
- Strong Python proficiency.
- Experience with foundation model fine-tuning.
- Experience implementing model registries and experiment tracking tools.
- Experience designing feature stores and embedding stores.
- Familiarity with AI risk management and bias mitigation.
- Experience supporting regulated or data-sensitive environments.
- Platform-level architectural thinking.
- Deep understanding of integrating GenAI into enterprise ML ecosystems.
- Ability to balance scalability, governance, security, performance, and cost.
- Strong technical leadership and cross-functional collaboration skills.
- Hands-on ability to move from architecture design to implementation.
Benefits
- Competitive base salary range of $117,800 – $189,000.
- Annual incentive compensation eligibility up to 10%.
- Comprehensive medical, dental, and employer-paid vision plans.
- Flexible Spending Account for qualified out-of-pocket expenses.
- Lifestyle Spending Account for physical, mental, and financial well-being.
- 100% company-paid short-term and long-term disability insurance.
- Paid maternity and parental leave.
- Commuter benefits for travel expenses.
- LifeBalance program offering discounts on activities and services.
- Tuition reimbursement of up to $5,000 annually.
- Travel reimbursement for work-related travel.
- Paid time off and sick time.
- 401(k) plan with a 25% match on contributions.
