Tekion

Machine Learning Engineer I

about 2 months ago
Bengaluru, India
Entry Level / Mid Level
H1B Sponsor

Responsibilities

  • Accelerate the rollout of LLM-powered and agent-driven features across Tekion products.
  • Enable agentic workflows that automate, reason, and interact on behalf of users and internal stakeholders.
  • Operationalize secure, compliant, and explainable LLM and agentic services at scale.
  • Convert Applied Sciences models into scalable, compliant, cost-efficient production services.
  • Standardize how models are trained, validated, deployed, and monitored across Tekion products.
  • Power real-time, context-aware experiences by integrating batch/stream features, graph context, and online inference.
  • Turn Applied Sciences prototype models into fast, reliable services with well-defined API contracts.
  • Integrate with the LLM Gateway/MCP and with prompt/config versioning.
  • Build and orchestrate CI/CD pipelines.
  • Review data science models; refactor and optimize code; containerize; deploy; version; and monitor for quality.
  • Collaborate with data scientists, data engineers, product managers, and architects to design enterprise systems.
  • Monitor, detect, and mitigate risks unique to LLMs and agentic systems.
  • Implement prompt management: versioning, A/B testing, guardrails, and dynamic orchestration based on feedback and metrics.
  • Design batch/stream pipelines and online features linked to our domain graph.
  • Build inference microservices with schema versioning, structured outputs, and stringent p95 latency targets.
  • Manage the model/feature lifecycle: feature store strategy, model/agent registry, versioning, and lineage.
  • Instrument deep observability: traces/logs/metrics, data/feature drift, model performance, safety signals, and cost tracking.
  • Ensure real-time reliability: autoscaling, caching, circuit breakers, retries/fallbacks, and graceful degradation.
  • Develop templates/SDKs/CLIs, sandbox datasets, and documentation that make shipping ML the default path.

Requirements

  • 2.5 to 4 years of experience in ML engineering/MLOps, or in backend/platform engineering with production ML.
  • Experience with LLMs, retrieval systems, vector stores, and graph/knowledge stores.
  • Strong software engineering fundamentals: Python plus one of Java/Go/Scala; API design; concurrency; testing.
  • Hands-on experience with orchestration frameworks and libraries.
  • Knowledge of agent architectures and safe execution patterns.
  • Experience with pipelines and data tools like Airflow/Kubeflow, Spark/Flink, Kafka/Kinesis.
  • Familiarity with microservices and runtime technologies like Docker/Kubernetes.
  • Experience with model ops practices including experiment tracking and drift detection.
  • Knowledge of observability tools like OpenTelemetry/Prometheus/Grafana.
  • Familiarity with cloud services, preferably AWS, and security/compliance practices.

Tech Stack

Amazon DynamoDB, Apache Airflow, Apache Flink, Apache Kafka, Apache Spark, AWS, Docker, Go, Grafana, gRPC, Java, Kubernetes, MLflow, Prometheus, Python, Scala

Categories

AI & ML, Backend, Data Engineering