Posted about 2 months ago
Bengaluru, India · Senior
Responsibilities
- Design and build document AI platforms powered by generative AI.
- Implement event-driven and queue-based systems for scalable AI workflows.
- Architect and maintain self-hosted LLM infrastructure on AWS.
- Manage production systems for LLM serving and AI workflow orchestration.
- Develop monitoring systems to detect and reduce hallucinations and unsafe outputs.
- Implement end-to-end observability for AI/ML pipelines.
- Track performance metrics for AI systems.
- Manage machine learning workflows and enable reproducible experiment tracking.
- Implement AI platform security controls and optimize AWS infrastructure.
Requirements
- Strong experience with AWS cloud infrastructure and services.
- Experience building ML infrastructure using Infrastructure-as-Code tools.
- Hands-on experience deploying LLM serving infrastructure.
- Experience managing vector databases and retrieval systems.
- Strong experience designing event-driven or asynchronous systems.
- Experience implementing observability for distributed AI systems.
- Strong programming experience in Python, including asynchronous programming.
- Experience with Docker, Kubernetes, and CI/CD pipelines.
- 5+ years of experience in MLOps, LLMOps, AIOps, or DevOps.
Benefits
- Competitive salary and benefits including family insurance coverage.
- Free health teleconsultations and learning/upskilling budgets.
- Equity in the company.
- Flexible hours and a hybrid work setup.
- Unlimited PTO.
- Opportunity to grow with a fast-scaling company.
