Job Description
Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
About This Role:
Crusoe is seeking a Senior Software Engineer to join our Model LifeCycle team, where you will help build a world-class managed platform for the entire AI application development lifecycle. This role focuses on the core infrastructure required to leverage Large Language Models (LLMs) and advanced machine learning models at scale. You will contribute to a platform that Fortune 500 companies trust to power their most sophisticated AI applications, all while aligning the future of computing with the future of the climate.
As a Senior Engineer, you will have significant implementation ownership of core system components. You will work alongside a high-caliber team of Principal and Staff engineers to turn complex architectural designs into reliable, production-ready services. This is an ideal role for an engineer who is passionate about the "metal-to-model" journey and wants to build the foundational abstractions that define how the world interacts with AI.
What You’ll Be Working On:
- Fine-Tuning Infrastructure: Implement and maintain systems for fine-tuning large foundation models (SFT, PEFT, LoRA, adapters), ensuring robust multi-node orchestration, checkpointing, and failure recovery.
- LLM Training Pipelines: Build and optimize end-to-end training pipelines for Large Language Models, focusing on cost-efficient scaling and performance.
- Advanced Model Optimization: Implement components for distillation and reinforcement learning pipelines, including preference optimization, policy optimization, and reward modeling.
- Agentic Execution: Develop the core infrastructure required for agent execution, enabling complex, multi-step AI workflows.
- Lifecycle Management: Build features for dataset, model, and experiment management, with a strict focus on versioning, lineage, and reproducible fine-tuning at scale.
- API & Abstraction Development: Partner with product and platform teams to implement the system abstractions and APIs that our customers interact with daily.
- Collaborative Technical Input: Contribute to high-level technical discussions regarding training runtimes, scheduling, and storage to ensure a cohesive platform experience.
- Ecosystem Engagement: Engage with the open-source LLM ecosystem to keep Crusoe at the cutting edge of infrastructure innovation.
What You’ll Bring to the Team:
- Professional Engineering Depth: 4-5+ years of industry experience with a demonstrated history of consistent success leading a varied portfolio of initiatives.
- Production-Ready Delivery: A proven track record of delivering high-quality, scalable features into production environments.
- Cloud Infrastructure Foundations: Familiarity with essential cloud-based services, including elastic compute, object storage, and networking.
- AI/ML Familiarity: A solid understanding of Generative AI (LLMs, Multimodal) and experience with AI infrastructure components for both training and inference.
- Collaborative Execution: A proactive and collaborative approach to problem-solving, with the ability to work cross-functionally to achieve team goals.
- Clear Communication: Strong interpersonal and communication skills, with the ability to articulate technical concepts and progress effectively.
- Education: Bachelor’s degree in Computer Science, Engineering, or a related technical field.
Bonus Points:
- Modern Proficiency: Proficiency in Golang or Python for building large-scale production services.
- Framework Knowledge: Hands-on familiarity with PyTorch and experience with the nuances of training and fine-tuning LLMs.
- GPU Optimization: Experience with performance optimizations on GPU systems or specialized inference frameworks.
- Open-Source Contributions: Prior involvement in open-source AI projects or infrastructure tooling.
- Aspirational Drive: A genuine passion for building cutting-edge AI products and solving the unique technical challenges of high-performance computing.
Benefits:
- Competitive compensation
- Restricted Stock Units
- Paid time off & paid holidays
- Comprehensive health, dental & vision insurance
- Employer contributions to HSA
- Paid parental leave
- Paid life insurance, short-term and long-term disability
- Professional development & tuition reimbursement
- Mental health & wellness support
- Commuter benefits (parking & transit)
- Cell phone stipend
- 401(k) Retirement plan with company match up to 4% of salary
- Volunteer time off
Compensation Range
Compensation will be paid in the range of $172,425 - $209,000 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant's knowledge, education, and abilities, as well as internal equity and alignment with market data.
Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to genetic information, citizenship, marital status, veteran status, or any other status protected by law or regulation.