
Senior AI Engineer (Infrastructure)
Bitcoin Builders
Job Description
About Us
We run an AI platform with 50K+ daily active users and millions of generations per day, powered entirely by open-weight models running on our own GPU fleet.
New models drop weekly. New hardware ships quarterly. Our job is to be fast at adopting both.
The team
You'll be our second AI/ML Infrastructure Engineer.
The first built the system from scratch: dynamic LoRA serving with 100+ adapters hot-swapped per request, plus inference optimization (DeepCache, torch.compile, quantization, abliteration). They also keep us on the latest GPU hardware as it ships.
Together, you'll own everything between "a new model just dropped" and "it's live, fast, and cost-efficient."
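To give a flavor of the core serving mechanic, here is a minimal sketch of per-request LoRA hot-swapping using the Hugging Face PEFT API. The model and adapter IDs are placeholders, and a real server would add routing, batching, and an eviction policy on top:

```python
# Sketch: per-request LoRA hot-swapping with Hugging Face PEFT.
# Model/adapter IDs are hypothetical; a real system would bound the
# set of resident adapters (e.g., LRU eviction).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach one adapter up front; others are hot-loaded on demand.
model = PeftModel.from_pretrained(base, "acme/style-a-lora", adapter_name="style_a")
resident = {"style_a"}

def generate(prompt: str, adapter_name: str, adapter_id: str) -> str:
    """Route a single request through the requested adapter."""
    if adapter_name not in resident:
        model.load_adapter(adapter_id, adapter_name=adapter_name)  # hot-load
        resident.add(adapter_name)
    model.set_adapter(adapter_name)  # activate for this request
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tok.decode(out[0], skip_special_tokens=True)
```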
A typical week
- Benchmark a new open-weight model, quantize it, test LoRA compatibility, decide ship or skip (see the sketch after this list)
- Tune block-level caching for Blackwell architecture, measure quality/speed tradeoffs
- Dig into GPU utilization data, find wasted spend, redesign auto-scaling
- Debug a 3 AM latency spike — OOM on two pods, fix it, write up what happened
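To make the first bullet concrete, here is a hypothetical ship-or-skip micro-benchmark: time a batch of generations and report tokens/sec and p95 latency. The model name and prompts are placeholders, and a real evaluation would also weigh quality and cost:

```python
# Hypothetical ship-or-skip micro-benchmark for a candidate model.
import statistics
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # placeholder candidate
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="cuda"
)

prompts = ["Summarize the benefits of KV caching."] * 16
latencies, new_tokens = [], 0
for p in prompts:
    inputs = tok(p, return_tensors="pt").to("cuda")
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=256)
    torch.cuda.synchronize()
    latencies.append(time.perf_counter() - t0)
    new_tokens += out.shape[1] - inputs["input_ids"].shape[1]

print(f"throughput: {new_tokens / sum(latencies):.1f} tok/s")
print(f"p95 latency: {statistics.quantiles(latencies, n=20)[18]:.2f}s")
```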
You'll thrive here if you
- Have shipped open-weight models to production at scale — not notebooks, not demos. LLMs, VLMs, image models — the more architectures the better.
- Can show real optimization results with numbers — Xs faster, $Y/month saved, Z% latency reduction.
- Think in cost-per-generation, not just raw performance. We care about both.
- Pick up new models and hardware fast. The ecosystem won't wait for you.
- Work independently. You'll figure out what to optimize — we won't hand you a roadmap.
Bonus points
- Built or worked on dynamic adapter serving (LoRA hot-loading, multi-model routing)
- Model surgery beyond default settings: custom quantization, abliteration, architectural pruning
- Evaluated and migrated workloads across GPU generations
What we run
- Models: Various open-weight LLMs, VLMs, and image models — changes constantly
- Optimization: PyTorch, torch.compile, DeepCache, GPTQ/AWQ
- Serving: Custom dynamic LoRA system
- Hardware: RTX 6000 Blackwell, H100 — we evaluate and migrate as new GPUs ship
- Infra: RunPod + on-prem · Docker · Python · Go backend
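As one example from the optimization layer, here is a minimal sketch of block-level caching on a diffusion pipeline using the open-source DeepCache helper. The model ID and cache settings are illustrative; the interval trades speed against quality and should be swept per model:

```python
# Illustrative: DeepCache block-level caching on a diffusers pipeline.
import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model
    torch_dtype=torch.float16,
).to("cuda")

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(
    cache_interval=3,   # reuse cached deep features for runs of 3 steps
    cache_branch_id=0,  # which skip-branch features to cache
)
helper.enable()

image = pipe("an astronaut riding a horse").images[0]
helper.disable()  # disable when measuring the uncached baseline
```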
Why us over a bigger company
- You won't spend 6 months getting access to a GPU cluster. You won't write design docs that never ship. You'll push to production this week.
- The problems are real, the scale is real, and you'll see your work in the numbers every morning.
Benefits
- Social insurance, health insurance & private health insurance
- 13th month salary + year-end bonus based on real contribution
- Breakfast, lunch & afternoon snacks provided
- Flexible working hours
- AI Learning Budget — tools, courses, subscriptions to level up your skills
- Birthday leave
- Competitive pay + bonuses tied directly to impact
- MacBook, iMac, and monitors provided