Kapitus

MLOps Architect - Gen AI

13 days ago
Arlington, VA, USA
Senior / Staff+

Base Salary

$118k - $189k/yr

Responsibilities

  • Design and implement scalable ML and LLM infrastructure on AWS.
  • Architect end-to-end ML and Generative AI lifecycle workflows.
  • Integrate LLM pipelines into the enterprise MLOps stack.
  • Define standards for CI/CD/CT pipelines across ML and GenAI workloads.
  • Architect Retrieval-Augmented Generation (RAG) pipelines.
  • Design and deploy LLM-based services using managed services and containerized custom inference services.
  • Establish prompt versioning and evaluation frameworks for LLM systems.
  • Implement guardrails for hallucination control and safety monitoring.
  • Define architecture for LLM fine-tuning workflows.
  • Implement scalable orchestration of LLM pipelines.
  • Architect scalable inference patterns for traditional ML models and LLM APIs.
  • Implement model monitoring frameworks for performance and quality.
  • Define SLAs/SLOs for ML and GenAI systems.
  • Design safe deployment strategies.
  • Implement cost tracking for training workloads and inference endpoints.
  • Optimize LLM workloads for cost-performance tradeoffs.
  • Partner with finance and engineering teams to forecast ML/GenAI infrastructure spend.
  • Define enterprise standards for experiment tracking and model registry.
  • Provide architectural guidance to data science and engineering teams.
  • Evaluate and recommend tooling across the ML/GenAI stack.

Requirements

  • 6+ years of experience in ML engineering, data engineering, or MLOps roles.
  • Proven experience architecting ML platforms in AWS.
  • Strong hands-on experience with SageMaker.
  • Experience operationalizing LLM or Generative AI systems in production.
  • Experience building RAG pipelines and integrating vector databases.
  • Experience working with Databricks in production.
  • Experience implementing data governance and catalog systems.
  • Strong understanding of CI/CD principles for ML and GenAI.
  • Experience with containerization and orchestration.
  • Deep knowledge of infrastructure-as-code.
  • Strong understanding of observability and monitoring for ML systems.
  • Experience implementing cloud cost optimization strategies.
  • Strong Python proficiency.
  • Experience with foundation model fine-tuning.
  • Experience implementing model registries and experiment tracking tools.
  • Experience designing feature stores and embedding stores.
  • Familiarity with AI risk management and bias mitigation.
  • Experience supporting regulated or data-sensitive environments.
  • Platform-level architectural thinking.
  • Deep understanding of integrating GenAI into enterprise ML ecosystems.
  • Ability to balance scalability, governance, security, performance, and cost.
  • Strong technical leadership and cross-functional collaboration skills.
  • Hands-on ability to move from architecture design to implementation.

Benefits

  • Competitive base salary range of $117,800 – $189,000.
  • Annual incentive compensation eligibility up to 10%.
  • Comprehensive medical, dental, and employer-paid vision plans.
  • Flexible Spending Account for qualified out-of-pocket expenses.
  • Lifestyle Spending Account for physical, mental, and financial well-being.
  • 100% company-paid short-term and long-term disability insurance.
  • Paid maternity and parental leave.
  • Commuter benefits for travel expenses.
  • LifeBalance program offering discounts on activities and services.
  • Tuition reimbursement of up to $5,000 annually.
  • Travel reimbursement for work-related travel.
  • Paid time off and sick time.
  • 401(k) plan with a 25% match on contributions.

Tech Stack

AWS, Databricks, Docker, Kubernetes, MLflow, Python, Terraform

Categories

AI & ML, Data Science, DevOps