Infrastructure Operations Engineer (GPU Computing) - Enterprise AI

Aethir
Largo, FL

Aethir is a pioneering technology company at the forefront of GPU-based compute infrastructure, specializing in cutting-edge solutions for diverse industries ranging from AI and machine learning to high-performance computing (HPC). We're dedicated to pushing the boundaries of what's possible, leveraging the latest advancements in hardware and software to empower our clients with unparalleled computational capabilities.

About the Role:

We are seeking a highly skilled and motivated Infrastructure Operations Engineer to join our dynamic team. As an integral member of the InfraOps team, you will play a key role in managing and optimizing our GPU-based compute infrastructure (across multiple locations and partners), ensuring maximum performance, scalability, and reliability.

Responsibilities:

  • Infrastructure Management: Deploy, configure, and maintain GPU-based compute infrastructure, including servers, storage, networking, and the associated software stack. Aethir provides compute sourced from dozens of providers around the world, ranging from NVIDIA RTX 4090s to H200s.
  • Monitoring and Optimization: Implement robust monitoring and alerting systems to proactively identify performance bottlenecks, resource constraints, and potential failures. Continuously optimize infrastructure to improve performance, efficiency, and cost-effectiveness.
  • Automation and Orchestration: Develop automation scripts and tools to streamline deployment, configuration, and management of infrastructure components. Implement infrastructure as code (IaC) principles to enable rapid provisioning and scaling.
  • Security and Compliance: Implement and enforce security best practices to safeguard sensitive data and ensure compliance with relevant regulations and industry standards. Conduct regular security audits and vulnerability assessments.
  • Incident Response and Troubleshooting: Provide tier-3 support for infrastructure-related issues, investigating root causes and implementing timely resolutions. Participate in an on-call rotation to respond to critical incidents outside regular business hours.
  • Capacity Planning and Scaling: Collaborate with cross-functional teams to forecast resource requirements, plan capacity upgrades, and scale infrastructure to accommodate growing workloads and user demands.
  • Documentation and Knowledge Sharing: Maintain comprehensive documentation of infrastructure configurations, procedures, and troubleshooting guidelines. Share knowledge and best practices with team members to foster continuous learning and skill development.

Qualifications:

  • Experience in infrastructure operations focused on GPU compute, preferably in a DevOps, SRE, Sales Engineering, or Solutions Architect role.
  • Proficiency in managing GPU-based compute infrastructure, including NVIDIA GPUs and CUDA programming.
  • Strong expertise in Linux system administration and scripting (e.g., Bash, Python).
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet) and version control systems (e.g., Git).
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Solid understanding of networking concepts, protocols, and troubleshooting techniques.
  • Excellent analytical and problem-solving skills, with a proactive and results-oriented mindset.
  • Strong communication skills and the ability to collaborate effectively with cross-functional teams. We operate in English, but speaking Mandarin is a big bonus, as we have engineering teams in China and Southeast Asia.
  • Experience with cloud computing platforms (e.g., AWS, Azure, GCP) and hybrid cloud architectures.
  • Knowledge of HPC frameworks and job scheduling systems (e.g., Slurm, PBS Pro).
  • Familiarity with GPU-accelerated libraries and frameworks (e.g., TensorFlow, PyTorch, CUDA Toolkit).
  • Understanding of cybersecurity principles and practices, including encryption, access controls, and threat detection/prevention.
  • Bonus if you know Web3 (cryptocurrency, tokenization of real-world assets (RWAs), mining/staking, etc.).

Benefits:

  • Competitive compensation, with a flexible mix of fiat and token payment.
  • Salary and benefits are flexible depending on location and setup.
  • Flexible work hours and remote work options.
Posted 2025-09-22
