Storage Engineer - Hosting Job at Confidential, Miami, FL

  • Confidential
  • Miami, FL

Job Description

Ready to help build the backbone of next-generation AI?

Join a Founders Fund-backed NVIDIA cloud partner that is creating the high-performance infrastructure powering some of the world’s most ambitious AI research. In the realm of GPU-as-a-Service, the bottleneck isn’t compute; it’s the data.

As a Storage Engineer, you will design and implement the data layer that enables foundation model training and enterprise-grade production inference. This role requires a deep understanding that AI at scale demands more than capacity: it demands massive throughput, ultra-low latency, and the ability to feed thousands of GPUs seamlessly.

Take the next step in your career and help shape the infrastructure that drives the future of AI.

Responsibilities:

  • Design & Deploy AI Storage: Architect and implement high-performance parallel file systems (Weka, Lustre, or similar) optimised specifically for GPU-heavy workloads and multi-node training.
  • Optimise Data Pipelines: Fine-tune storage performance to ensure maximum GPUDirect Storage (GDS) efficiency, minimising latency between the storage fabric and the GPU memory.
  • Manage Scale & Reliability: Build and maintain petabyte-scale storage clusters across multiple global data centers, ensuring 99.99% uptime for mission-critical AI research labs.
  • Infrastructure Integration: Partner with Network and Data Center engineers to configure high-speed storage networking (InfiniBand/400G Ethernet) and ensure seamless backend connectivity.
  • Automate Storage Ops: Develop Terraform providers, Ansible playbooks, or Python scripts to automate the provisioning, monitoring, and scaling of storage resources.
  • Troubleshoot Complex I/O: Act as the Tier-3 lead for storage-related performance degradation, identifying root causes in the filesystem, network, or Linux kernel.

Skills/Must have:

  • Specialised Storage Expertise: 5+ years of experience with high-performance storage solutions (WekaIO, VAST Data, BeeGFS, or DDN) in a Linux-heavy environment.
  • AI Infrastructure Knowledge: Deep understanding of how storage interacts with NVIDIA GPU stacks (HGX/DGX) and the specific I/O patterns of ML training (checkpoints, small file reads, etc.).
  • Networking Proficiency: Hands-on experience with InfiniBand, RoCEv2, and NVMe-over-Fabrics (NVMe-oF).
  • Systems Automation: Strong scripting skills in Python, Go, or Bash, and experience with IaC tools like Terraform or Pulumi.
  • Linux Internals: Deep knowledge of the Linux storage stack, including XFS/ZFS, LVM, and kernel tuning for high-throughput networking.

Benefits:

  • 10% bonus
  • Stock options

Salary:

  • $200,000 base salary

Job Tags

Permanent employment
