Appier

Staff/Senior Software Engineer, Machine Learning Platform (Ad Cloud) - Tokyo

Tokyo, Japan
Senior / Staff+

Responsibilities

  • Architect, implement, and scale batch and streaming pipelines for ML training and evaluation.
  • Design and operate robust ML job execution frameworks for training, inference, and post-processing.
  • Build and maintain internal API servers and developer tools for orchestrating ML jobs on Kubernetes.
  • Design and monitor data infrastructure using ClickHouse and PostgreSQL.
  • Ensure high availability and observability through monitoring tools like Prometheus and Grafana.
  • Collaborate with data scientists, product managers, and engineers to deliver efficient ML platform capabilities.
  • Promote the use of LLM-based tools to accelerate development and debugging.
  • Mentor junior engineers and help evolve team engineering culture.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field; Master’s preferred.
  • 4+ years of hands-on experience in data systems, machine learning infrastructure, or platform engineering.
  • Strong coding proficiency in Python and/or Java, with experience in large-scale production systems.
  • Practical experience with Spark, Flink, Kubernetes, and infrastructure-as-code tools like Terraform and Helm.
  • Experience managing high-throughput data infrastructure using ClickHouse, PostgreSQL, or similar systems.
  • Deep understanding of ML pipelines and distributed job execution in production environments.
  • Proven ability to apply LLM-based tools to boost engineering productivity.
  • Strong ownership, architectural thinking, and ability to lead cross-functional platform projects.

Tech Stack

Apache Flink, Apache Spark, Argo CD, ClickHouse, Grafana, Helm, Java, Kubernetes, PostgreSQL, Prometheus, Python, Terraform

Categories

AI & ML, Data Engineering