Staff/Senior Software Engineer, Machine Learning Platform (Ad Cloud) - Tokyo

3 months ago

Tokyo, Japan

Senior / Staff+

Responsibilities

Architect, implement, and scale batch and streaming pipelines for ML training and evaluation.
Design and operate robust ML job execution frameworks for training, inference, and post-processing.
Build and maintain internal API servers and developer tools for orchestrating ML jobs on Kubernetes.
Design and monitor data infrastructure using ClickHouse and PostgreSQL.
Ensure high availability and observability using monitoring tools such as Prometheus and Grafana.
Collaborate with data scientists, product managers, and engineers to deliver efficient ML platform capabilities.
Promote the use of LLM-based tools to accelerate development and debugging.
Mentor junior engineers and help evolve team engineering culture.

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field; Master’s preferred.
4+ years of hands-on experience in data systems, machine learning infrastructure, or platform engineering.
Strong coding proficiency in Python and/or Java, with experience in large-scale production systems.
Practical experience with Spark, Flink, and Kubernetes, and with infrastructure-as-code and deployment tools such as Terraform and Helm.
Experience managing high-throughput data infrastructure using ClickHouse, PostgreSQL, or similar systems.
Deep understanding of ML pipelines and distributed job execution in production environments.
Proven ability to apply LLM-based tools to boost engineering productivity.
Strong ownership, architectural thinking, and ability to lead cross-functional platform projects.

Apache Flink, Apache Spark, Argo CD, ClickHouse, Grafana, Helm, Java, Kubernetes, PostgreSQL, Prometheus, Python, Terraform