Software Engineer, Inference - Multi Modal
OpenAI
San Francisco, CA, USA
Mid Level / Senior
Base Salary
$295k - $555k/yr
Responsibilities
- Design and implement inference infrastructure for large-scale multimodal models.
- Optimize systems for high-throughput, low-latency delivery of image and audio inputs and outputs.
- Enable experimental research workflows to transition into reliable production services.
- Collaborate closely with researchers, infra teams, and product engineers to deploy state-of-the-art capabilities.
- Contribute to system-level improvements including GPU utilization, tensor parallelism, and hardware abstraction layers.
Requirements
- Experience building and scaling inference systems for LLMs or multimodal models.
- Familiarity with GPU-based ML workloads and performance dynamics of large models.
- Comfortable working across systems that span networking, distributed compute, and high-throughput data handling.
- Familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model parallel systems.
- Ability to own problems end-to-end and operate in ambiguous, fast-moving spaces.
Categories
AI & ML, Backend, Data Engineering