Storage Engineer
About Hydra Host
Hydra Host is a Founders Fund-backed NVIDIA cloud partner building the infrastructure platform that powers AI at scale. We connect AI Factories (high-performance GPU data centers) with the teams that depend on them: research labs training foundation models, enterprises running production inference, and developer platforms demanding scalable compute capacity.
Under its Brokkr platform, Hydra Host is building the next-generation bare-metal GPU infrastructure network and marketplace. The company enables independent data centers to monetize GPU capacity while providing enterprises with scalable, high-performance access to NVIDIA-based compute (e.g., H100, H200, B200, L40S, RTX 4090).
As we expand our infrastructure capabilities, Hydra Host is now seeking a Storage Engineer to lead the architecture, development, and deployment of our next-generation AI/HPC storage platform.
The Role
As a Storage Engineer, you will be responsible for designing and building Hydra Host’s first production-grade storage platform from the ground up, supporting the company’s rapidly expanding network of bare-metal GPU clusters.
You’ll own the architecture, technology selection, implementation, and evolution of this platform, defining how Hydra Host manages data for large-scale, distributed AI workloads across global data centers.
This is a senior, hands‑on role for an engineer who has built storage systems for GPU clusters before, with deep expertise in both block and object storage and a strong understanding of parallel file systems, performance optimization, and large-scale orchestration.
Key Responsibilities
- Define, architect, and implement Hydra Host’s first production storage platform tailored for bare-metal GPU clusters and AI/HPC workloads.
- Lead all technical decisions around storage stack design, from hardware infrastructure to parallel file system orchestration and performance tuning.
- Select, build, and maintain storage solutions spanning both block (NVMe, SAN, Ceph, etc.) and object storage (S3-compatible, custom, or Ceph Object Gateway) layers.
- Design for high-throughput, low-latency access, supporting large datasets, rapid checkpointing, and parallel access for distributed AI training workloads.
- Integrate and optimize parallel file systems such as Lustre, BeeGFS, Spectrum Scale, WekaIO, or CephFS, ensuring maximum performance and fault tolerance.
- Ensure compatibility across Hydra Host’s diverse GPU/OEM ecosystem, accounting for unique firmware, BMC/Redfish APIs, and hardware configurations.
- Develop automation, observability, and management tooling for storage, focusing on reliability, scalability, and efficiency.
- Act as a builder and architect: deeply hands‑on in deployment, troubleshooting, and optimization, while guiding the long‑term storage roadmap.
- Collaborate cross‑functionally with GPU, HPC, and platform engineering teams to integrate storage with compute and network layers.
- Interface with customers and product leadership to define feature priorities, performance benchmarks, and future enhancements.
Must-Have Qualifications
- 8+ years of progressive, hands‑on experience designing and implementing high-performance storage systems for compute clusters in HPC, AI, or bare‑metal cloud environments.
- Proven track record building storage infrastructure from scratch, not just operating existing systems.
- Deep expertise in block storage (NVMe, SAN, Ceph, distributed block systems) and object storage (S3, MinIO, Ceph Object Gateway, etc.).
- Strong background in parallel file systems (WekaIO, BeeGFS, Lustre, Spectrum Scale, or similar) supporting GPU or AI cluster workloads.
- Solid foundation in Linux systems engineering, automation, and scripting for distributed environments.
- Familiarity with BMC, Redfish APIs, and OEM server firmware for bare‑metal management.
- Deep understanding of AI/ML data pipelines: model checkpointing, data locality, and multi‑tiered storage optimization.
- Excellent problem‑solving, debugging, and communication skills, with the ability to translate technical decisions into clear architectural direction.
Preferred Qualifications
- Experience building storage solutions for large‑scale GPU or HPC infrastructure.
- History of technical leadership or mentorship, such as growing teams or owning a product roadmap.
- Experience evaluating and managing vendor relationships and negotiating storage hardware/software contracts.
- Contributions to open‑source HPC or storage projects (Ceph, Lustre, BeeGFS, etc.).
- Familiarity with confidential computing, secure data handling, or high‑availability architectures.