Job Description
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About This Role:
Crusoe is seeking a visionary Principal Software Engineer for our Model LifeCycle team to architect a comprehensive managed platform for the next generation of AI development. In this high-impact role, you will be the technical authority responsible for the entire application development lifecycle, specifically optimized for Large Language Models (LLMs) and advanced foundational models. By building these core systems from first principles, you will enable developers to leverage Crusoe’s sustainable, high-performance infrastructure to push the boundaries of what is possible in AI.
As a Principal Engineer, you will have significant 0 → 1 ownership, designing mission-critical abstractions and influencing long-term architectural decisions across training runtimes, scheduling, and storage. This is a full-time position for a seasoned expert who thrives on technical complexity and is eager to lead the industry toward a more sustainable and powerful AI future.
What You’ll Be Working On:
- Model Fine-Tuning Orchestration: Design and manage sophisticated fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), ensuring seamless multi-node orchestration, checkpointing, and cost-efficient scaling.
- End-to-End Training Pipelines: Implement and maintain robust training runtimes for LLMs, including distillation and reinforcement learning pipelines such as PPO, DPO, and reward modeling.
- Agent & Execution Infrastructure: Architect the underlying infrastructure required for reliable agentic execution and complex model-driven workflows.
- Lifecycle Management Systems: Develop enterprise-grade systems for dataset, model, and experiment management, emphasizing versioning, lineage, and reproducible fine-tuning at scale.
- Strategic Architectural Influence: Drive long-term decisions regarding training runtimes and storage, shaping the core APIs that will define the user experience of Crusoe’s AI platform.
- Cross-Functional Collaboration: Partner closely with product, business, and platform teams to translate high-level goals into scalable, performant technical realities.
- Open-Source Engagement: Represent Crusoe within the open-source LLM ecosystem, contributing to and staying ahead of industry-standard frameworks and tools.
What You’ll Bring to the Team:
- Advanced Technical Foundation: An advanced degree (Master's or PhD) in Computer Science, Engineering, or a related technical field.
- Extensive Industry Experience: 10–15+ years of professional experience driving high-impact engineering projects, with a significant tenure dedicated to the AI/ML space.
- 0 → 1 Delivery Track Record: A proven history of architecting and delivering early-stage, foundational projects under tight deadlines and high-growth conditions.
- Cloud Infrastructure Mastery: Expert-level proficiency in cloud-based services, including elastic compute, object storage, virtual private networks, and managed databases.
- Generative AI Expertise: Deep, hands-on experience in Generative AI (LLMs, Multimodal) and the underlying infrastructure required for both training and large-scale inference.
- Leadership & Communication: Exceptional interpersonal skills with the ability to work autonomously while proactively collaborating with stakeholders at all levels.
Bonus Points:
- High-Scale Production Skills: Advanced proficiency in Golang or Python for building large-scale, production-level services.
- Open-Source Contributions: Visible contributions to prominent AI projects such as vLLM, DeepSpeed, or similar high-performance frameworks.
- Hardware & Performance Optimization: Deep experience with GPU system optimizations and inference framework performance tuning.
- Deep Learning Frameworks: Extensive experience working specifically with PyTorch and specialized LLM fine-tuning libraries.
- Technical Passion: A demonstrated obsession with building cutting-edge AI products and solving the industry’s most challenging, seemingly "impossible" technical problems.
Benefits:
- Competitive compensation
- Restricted Stock Units
- Paid time off & paid holidays
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) retirement plan with company match up to 4% of salary
- Volunteer time off
Compensation Range
Compensation will be paid in the range of $260,000 - $326,000 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to genetic information, citizenship, marital status, veteran status, or any other status protected by law or regulation.