Fireworks AI

Member of Technical Staff, Performance Optimization

2 months ago
San Mateo, CA, USA
Senior / Staff+
H-1B Sponsor

Base Salary

$175k - $220k/yr

Responsibilities

  • Optimize system and GPU performance for high-throughput AI workloads across training and inference.
  • Analyze and improve latency, throughput, memory usage, and compute efficiency.
  • Profile system performance to detect and resolve GPU- and kernel-level bottlenecks.
  • Implement low-level optimizations using CUDA, Triton, and other performance tooling.
  • Drive improvements in execution speed and resource utilization for large-scale model workloads.
  • Collaborate with ML researchers to co-design and tune model architectures for hardware efficiency.
  • Improve support for mixed precision, quantization, and model graph optimization.
  • Build and maintain performance benchmarking and monitoring infrastructure.
  • Scale inference and training systems across multi-GPU, multi-node environments.
  • Evaluate and integrate optimizations for emerging hardware accelerators and specialized runtimes.

Requirements

  • Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.
  • 5+ years of experience working on performance optimization or high-performance computing systems.
  • Proficiency in CUDA or ROCm and experience with GPU profiling tools.
  • Familiarity with PyTorch and performance-critical model execution.
  • Experience with distributed system debugging and optimization in multi-GPU environments.
  • Deep understanding of GPU architecture, parallel programming models, and compute kernels.

Benefits

  • Meaningful equity in a fast-growing startup.
  • Competitive salary and comprehensive benefits package.
  • Opportunity to solve hard problems at the forefront of AI infrastructure.
  • Work with bleeding-edge technology that impacts global AI usage.
  • Join a passionate team where your work directly shapes the future of AI.

Tech Stack

Kubernetes, PyTorch

Categories

AI & ML, Backend, Data Engineering