Posted about 2 months ago
Bengaluru, India · Senior
Responsibilities
- Design and build document AI platforms powered by generative AI.
- Implement event-driven and queue-based systems for scalable AI workflows.
- Architect and maintain self-hosted LLM infrastructure on AWS.
- Manage production systems for LLM serving and AI workflow orchestration.
- Develop monitoring systems to detect and reduce hallucinations and unsafe outputs.
- Implement end-to-end observability for AI/ML pipelines.
- Track performance metrics for AI systems.
- Manage machine learning workflows and enable reproducible experiment tracking.
- Implement AI platform security controls and optimize AWS infrastructure.
Requirements
- Strong experience with AWS cloud infrastructure and services.
- Experience building ML infrastructure using Infrastructure-as-Code tools.
- Hands-on experience deploying LLM serving infrastructure.
- Experience managing vector databases and retrieval systems.
- Strong experience designing event-driven or asynchronous systems.
- Experience implementing observability for distributed AI systems.
- Strong programming experience in Python, including asynchronous programming.
- Experience with Docker, Kubernetes, and CI/CD pipelines.
- 5+ years of experience in MLOps, LLMOps, AIOps, or DevOps.
Benefits
- Competitive salary and benefits including family insurance coverage.
- Free health teleconsultations and learning/upskilling budgets.
- Equity in the company.
- Flexible hours and a hybrid work setup.
- Unlimited PTO.
- Opportunity to grow with a fast-scaling company.
