Navigos Search is seeking a System Operations Engineer to work with global development teams to install, integrate, and secure the hardware and software components of the various systems the company operates or supplies, and to continuously automate this process.
By joining the team, you will have the opportunity to:
- Work with very large systems and servers, operating clusters of hundreds of GPUs/NPUs for development
- Learn and collaborate in a global team of experts, using English daily
- Enjoy a great benefits package, with full social insurance from day 1
Responsibilities
- Build/manage system S/W components such as GPU/NPU device drivers, communication libraries, directory services, distributed file systems, AI acceleration, and object storage for clustering.
- Automate S/W provisioning processes using IaC tools such as Ansible and Terraform, or through programming.
- Build/manage container orchestration tools such as Kubernetes (K8s) in clusters.
- Analyze and resolve the causes of various S/W or H/W errors.
- Provide overall management and technical consulting for Moreh's customer operating infrastructure.
- Install/operate various equipment in data centers, including CPU/GPU/NPU servers, high-speed interconnection networks such as InfiniBand and RoCE, storage servers, and firewalls.
Qualifications
- 3+ years of experience operating and managing Linux-based cluster systems
- Extensive understanding of various H/W and S/W components of computer systems.
- Knowledge of Docker and Kubernetes, and hands-on experience building a Kubernetes cluster yourself.
- Experience in analyzing various logs and operating monitoring solutions for large-scale IT infrastructure.
- Experience in developing high-availability (HA) S/W and related knowledge.
- Experience in installing and maintaining Linux systems at an IT system/solution distributor or reseller.
- Fluent English communication skills (speaking, writing, and reading).
- Excellent logical thinking and problem-solving skills.
Preferred Qualifications
- Bachelor's/Graduate degree in Computer Engineering or related field.
- Experience installing, configuring, and operating InfiniBand networks.
- Python/C programming skills.
- Experience in building and managing cluster systems, especially GPU clusters.
- Experience operating/monitoring large-scale clusters (up to hundreds of nodes).
- Fluent conversational Chinese.
- Fluent conversational Korean.
For more information, feel free to reach out to Ms. Nhung via:
Telegram/Skype: thuynhungng
Phone/Zalo/WhatsApp: 0973723298
DAOU, meaning ‘doing great good to the World’, has been leading the Korean IT industry with outstanding technology over the last 35 years. Daoukiwoom Innovation is a part of Daoukiwoom Group.
- DaouKiwoom Group started out in the IT business and has since expanded into online financial services, content, and service businesses. It continues to expand overseas, transforming itself into a global enterprise.
- Since Daou Technology was first listed on the KOSPI market in 1997, DaouKiwoom Group has grown to 3 affiliates listed on KOSPI and 5 on KOSDAQ, for a total of 8 listed companies.
- DaouKiwoom Group has expanded into the USA, Japan, China, Indonesia, Vietnam, and France, achieving outstanding growth in IT, finance, and services.
If you are looking for a company where you can show your skills in a dynamic environment, please join us! As part of our company, you will work with a fun and happy team.