
Staff Software Engineer, Model LifeCycle

Job Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:

Crusoe is seeking a Staff Software Engineer to join our Model LifeCycle team, where you will be a key architect of a managed platform designed for the next generation of AI application development. In this role, you will build the infrastructure that allows developers to leverage Large Language Models (LLMs) and advanced foundation models at an unprecedented scale. By focusing on the end-to-end model development lifecycle, you will ensure that Crusoe’s sustainable, high-performance cloud remains the platform of choice for the world’s most advanced AI builders.

As a Staff Engineer, you will have significant ownership, taking core systems from first-principles design through high-impact implementation. You will bridge the gap between complex research and robust production systems, creating the abstractions and APIs that will define how models are trained, managed, and deployed. This is a full-time position for a seasoned engineer who is passionate about combining deep AI infrastructure expertise with world-class systems engineering.

What You’ll Be Working On:

  • Fine-Tuning System Development: Contribute to the development of sophisticated fine-tuning systems (SFT, PEFT, LoRA, adapters), ensuring reliable multi-node orchestration, checkpointing, and failure recovery.

  • End-to-End Training Pipelines: Implement and maintain robust training runtimes for LLMs, including distillation and reinforcement learning pipelines such as preference and policy optimization.

  • Agent Execution Infrastructure: Develop and maintain the scalable, high-performance infrastructure required for reliable agentic execution and complex model workflows.

  • Lifecycle Management Features: Implement enterprise-grade features for dataset, model, and experiment management, focusing on versioning, lineage, and reproducible fine-tuning at scale.

  • Collaborative API Design: Work closely with Principal Engineers and product teams to shape the core abstractions and APIs that power the Crusoe AI ecosystem.

  • Architectural Strategy: Contribute to mission-critical decisions regarding training runtimes, scheduling, storage, and the long-term evolution of model lifecycle management.

  • Ecosystem Engagement: Engage with the open-source LLM community to ensure our platform stays at the cutting edge of AI innovation.

What You’ll Bring to the Team:

  • Deep Engineering Foundations: 8–10+ years of industry experience with a demonstrated history of leading a varied portfolio of high-impact technical initiatives.

  • Production Excellence: A proven track record of delivering complex production features on time and at scale within a fast-paced environment.

  • Cloud Infrastructure Expertise: Hands-on experience with core cloud-based services, including elastic compute, object storage, virtual private networks, and managed databases.

  • Generative AI Mastery: Practical experience with Generative AI (LLMs, Multimodal) and the underlying infrastructure required for both training and inference.

  • Systemic Autonomy: A proactive, collaborative approach with the ability to drive independent workstreams while aligning with broader team goals.

  • Communication & Passion: Strong interpersonal skills and a visible passion for solving the industry's most challenging technical problems in the AI space.

  • Education: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.

Bonus Points:

  • Production Proficiency: Advanced skills in Golang or Python for building large-scale, production-level services.

  • Framework Expertise: Deep experience working with PyTorch and a history of training and fine-tuning LLMs in production environments.

  • Performance Optimization: Experience with GPU system optimizations and performance tuning for inference frameworks.

  • Open-Source Contributions: A background in contributing to or maintaining open-source AI projects.

  • Aspirational Drive: A desire to build "gold standard" infrastructure that aligns the future of computing with the future of the climate.

Benefits:

  • Competitive compensation

  • Restricted Stock Units

  • Paid time off & paid holidays

  • Comprehensive health, dental & vision insurance

  • Employer contributions to a health savings account (HSA)

  • Paid parental leave

  • Paid life insurance and short-term and long-term disability

  • Professional development & tuition reimbursement

  • Mental health & wellness support

  • Commuter benefits (parking & transit)

  • Cell phone stipend

  • 401(k) retirement plan with company match up to 4% of salary

  • Volunteer time off

Compensation Range

Compensation will be paid in the range of $208,725 to $253,000, plus bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to genetic information, citizenship, marital status, sexual preference, veteran status, or any other status protected by law or regulation.