Bengaluru, India
Mid Level / Senior
H1B Sponsor
Responsibilities
- Design, build, and scale services for orchestrating Ray clusters across various environments.
- Optimize control plane components for large-scale AI/ML workloads.
- Develop intelligent scheduling and resource management systems for compute clusters.
- Enhance reliability, performance, scalability, and observability of Ray workloads.
- Manage container images and dependencies for distributed workloads.
- Participate in code reviews and architecture discussions.
- Provide on-call support and troubleshoot infrastructure issues.
- Collaborate with experts in distributed systems and machine learning.
Requirements
- Bachelor's degree in Computer Science or Engineering, or equivalent experience.
- 3+ years of experience writing high-quality production code.
- Experience building and maintaining scalable distributed systems.
- Expertise in cloud-native technologies and Kubernetes deployments.
- Deep understanding of networking, security, and authentication in cloud environments.
- Familiarity with observability stacks such as Prometheus and Grafana.
- Proficiency in Go and Python.
- Knowledge of low-level operating system foundations.
