Etched

Inference Software Engineer

about 2 months ago
Cupertino, CA, USA
Mid Level
H-1B Sponsor

Responsibilities

  • Contribute to the architecture and design of the Sohu host software stack.
  • Implement high-performance, modular code across the complete Etched software stack.
  • Interface with the firmware and driver teams to deliver the highest-performance HW/SW stack.
  • Work with AI model researchers and product-facing teams to build out the Etched serving front-end.
  • Build scheduling logic for continuous batching and real-time inference.
  • Implement inference-time acceleration techniques such as speculative decoding and tree search.
  • Implement distributed networking primitives for efficient multi-server inference.

Requirements

  • Experience with C++ and Python.
  • Familiarity with transformer model architectures and inference serving stacks.
  • Experience working cross-functionally in large software and hardware organizations.
  • Experience with Rust is a plus.
  • Familiarity with GPU kernels and the CUDA compilation stack is advantageous.
  • Understanding of distributed systems, networking, and parallel programming.

Benefits

  • Full medical, dental, and vision packages, with 100% of premiums covered.
  • Housing subsidy of $2,000/month for those living within walking distance of the office.
  • Daily lunch and dinner in the office.
  • Relocation support for those moving to Cupertino.

Tech Stack