Job Description
We are looking for a visionary Senior AI Infrastructure Engineer to lead our next-generation data center operations. As we prepare to scale our neural networks in 2026, we need a technical leader who can architect resilient, high-performance systems capable of handling petabyte-scale data streams. You will bridge the gap between cutting-edge machine learning research and robust, scalable production infrastructure.
Why Join Us?
At Nexus Core, we are redefining the boundaries of artificial intelligence. You will work in a collaborative environment with world-class researchers and engineers, contributing to projects that will shape the future of technology.
Responsibilities
- Design and manage large-scale distributed AI training clusters using Kubernetes and custom orchestration tools.
- Optimize GPU utilization, memory management, and data pipeline efficiency to reduce training time.
- Implement and maintain high-availability infrastructure for deep learning models.
- Collaborate with data scientists to translate research prototypes into scalable production services.
- Drive automation initiatives to streamline CI/CD pipelines for machine learning models.
- Monitor system performance and implement disaster recovery strategies.
Qualifications
- 5+ years of experience in Systems Engineering, DevOps, or Infrastructure Architecture.
- Strong proficiency in Python, Go, or Rust, and experience with containerization (Docker, Kubernetes).
- Deep understanding of Linux internals and high-performance computing environments.
- Experience with cloud platforms (AWS, GCP) and managed AI services.
- Excellent problem-solving skills and the ability to work in a fast-paced, agile environment.
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.