Job Description
We are on the cutting edge of the technological revolution, building the infrastructure for the future of intelligence. As we build out our roadmap toward 2026, we are seeking a visionary AI/ML Infrastructure Architect to design scalable, high-performance systems that can handle next-generation machine learning workloads. If you are passionate about optimizing deep learning pipelines and building resilient cloud-native architectures, this is your opportunity to lead the charge.
At Apex Future Systems, we don't just predict the future; we engineer it. Join a team of world-class engineers and data scientists dedicated to pushing the boundaries of what is possible in artificial intelligence.
Responsibilities
- Architect and implement scalable distributed systems for training and inference of large-scale AI models.
- Design and manage GPU clusters and high-performance computing (HPC) environments to maximize resource utilization.
- Collaborate with data scientists to optimize model latency, throughput, and cost-efficiency.
- Ensure the security, reliability, and observability of all ML infrastructure pipelines.
- Lead the strategy for migrating to next-generation cloud platforms, ensuring seamless integration with existing systems.
- Define technical standards and best practices for the ML engineering team.
Qualifications
- 10+ years of experience in systems engineering, software architecture, or machine learning operations.
- Deep expertise in Python, PyTorch, TensorFlow, and other major ML frameworks.
- Strong proficiency in containerization and orchestration technologies (Docker, Kubernetes) and cloud platforms (AWS, GCP, or Azure).
- Experience with MLOps tools and CI/CD pipelines for machine learning.
- Proven track record of designing high-availability systems capable of handling petabyte-scale data.
- Master's degree or PhD in Computer Science, Machine Learning, or a related technical field.