Job Description
We are looking for an AI Inference Engineer to join our growing team. Our current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.
Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations
Qualifications
- Experience with ML systems and deep learning frameworks (e.g., PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g., continuous batching, quantization)
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Final offer amounts are determined by multiple factors, including experience and expertise.
Equity: In addition to base salary, equity may be part of the total compensation package.