Machine Learning Engineer I
Tekion
about 2 months ago
Bengaluru, India
Entry Level / Mid Level
H1B Sponsor
Responsibilities
- Accelerate the rollout of LLM-powered and agent-driven features across Tekion products.
- Enable agentic workflows that automate, reason, and interact on behalf of users and internal stakeholders.
- Operationalize secure, compliant, and explainable LLM and agentic services at scale.
- Convert Applied Sciences models into scalable, compliant, cost-efficient production services.
- Standardize how models are trained, validated, deployed, and monitored across Tekion products.
- Power real-time, context-aware experiences by integrating batch/stream features, graph context, and online inference.
- Turn Applied Sciences prototype models into fast, reliable services with well-defined API contracts.
- Integrate with the LLM Gateway/MCP and support prompt/config versioning.
- Build and orchestrate CI/CD pipelines.
- Review data science models; refactor and optimize code; containerize; deploy; version; and monitor for quality.
- Collaborate with data scientists, data engineers, product managers, and architects to design enterprise systems.
- Monitor, detect, and mitigate risks unique to LLMs and agentic systems.
- Implement prompt management: versioning, A/B testing, guardrails, and dynamic orchestration based on feedback and metrics.
- Design batch/stream pipelines and online features linked to our domain graph.
- Build inference microservices with schema versioning, structured outputs, and stringent p95 latency targets.
- Manage the model/feature lifecycle: feature store strategy, model/agent registry, versioning, and lineage.
- Instrument deep observability: traces/logs/metrics, data/feature drift, model performance, safety signals, and cost tracking.
- Ensure real-time reliability: autoscaling, caching, circuit breakers, retries/fallbacks, and graceful degradation.
- Develop templates/SDKs/CLIs, sandbox datasets, and documentation that make shipping ML the default path.
Requirements
- 2.5 to 4 years of experience in ML engineering/MLOps, or in backend/platform engineering with production ML.
- Experience with LLMs, retrieval systems, vector stores, and graph/knowledge stores.
- Strong software engineering fundamentals: Python plus one of Java/Go/Scala; API design; concurrency; testing.
- Hands-on experience with orchestration frameworks and libraries.
- Knowledge of agent architectures and safe execution patterns.
- Experience with pipelines and data tools like Airflow/Kubeflow, Spark/Flink, Kafka/Kinesis.
- Familiarity with microservices and runtime technologies like Docker/Kubernetes.
- Experience with MLOps practices, including experiment tracking and drift detection.
- Knowledge of observability tools like OpenTelemetry/Prometheus/Grafana.
- Familiarity with cloud services, preferably AWS, and security/compliance practices.
Tech Stack
Amazon DynamoDB, Apache Airflow, Apache Flink, Apache Kafka, Apache Spark, AWS, Docker, Go, Grafana, gRPC, Java, Kubernetes, MLflow, Prometheus, Python, Scala
Categories
AI & ML, Backend, Data Engineering