Internships at CSCS - the Swiss National Supercomputing Centre 2025
At a glance
- Published: 17 December 2024
- Workload: 100%
- Contract: Temporary
- Language: English (fluent)
- Place of work: Lugano
The Swiss National Supercomputing Centre (CSCS) develops and operates cutting-edge, high-performance computing (HPC) systems as an essential service facility for science. The centre enables world-class research through its scientific user lab, which is available to domestic and international researchers in academia, industry, and the business sector. The centre is operated by ETH Zurich and has offices in Lugano (headquarters) and Zurich.
Job description
Leveraging Retrieval-Augmented Generation with LLMs for Accurate Responses from Corporate Documentation.
The Swiss National Supercomputing Centre (CSCS) provides high-performance computing services for Swiss researchers, supporting activities from high-resolution simulations to complex data analysis. To maintain quality and continuity of services, CSCS manages extensive documentation in Confluence, with a publicly accessible section known as the Knowledge Base, and problem-solving knowledge in Jira Service Desk. This project aims to leverage Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to develop a chatbot service that delivers answers based on CSCS documentation. The service will initially be designed for internal use at CSCS. The work involves setting up state-of-the-art pre-trained LLMs for efficient inference, ingesting data from Confluence and Jira, and integrating the components into a web- or Slack-based chatbot. The minimum criterion for success is a service that accurately retrieves and provides links to relevant Confluence pages and Jira issues based on user queries. A more advanced measure of success would be the chatbot's ability to generate precise, contextually relevant answers to queries, beyond simply returning links, thereby enhancing the overall user experience.
Although the project centers around using LLMs, the ideal candidate should have a strong focus on and interest in system engineering. The work will require solid knowledge of Python and Linux, deploying pre-trained LLMs, running them on Kubernetes (k8s) or SLURM clusters, and interacting with Confluence and Jira. The candidate should be independent, motivated, and eager to work in a flexible, international environment.
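To give applicants a feel for the minimum success criterion (returning links to relevant pages for a user query), the retrieval step can be sketched as below. This is only an illustration under stated assumptions: the documents, URLs, and function names are hypothetical, and simple word-overlap cosine similarity stands in for the embedding-based retrieval a real RAG service would more likely use.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for ingested Confluence pages and Jira issues.
DOCS = [
    {"url": "https://example.org/kb/slurm-job-submission",
     "text": "How to submit a batch job with SLURM sbatch and check the queue"},
    {"url": "https://example.org/kb/container-engine",
     "text": "Running containers on compute nodes with the container engine"},
    {"url": "https://example.org/jira/ISSUE-1",
     "text": "Job fails with out of memory error on GPU node"},
]


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the URLs of the k documents most similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(d["text"])), d["url"]) for d in docs]
    scored = [item for item in scored if item[0] > 0]  # drop unrelated docs
    scored.sort(key=lambda item: item[0], reverse=True)
    return [url for _, url in scored[:k]]


links = retrieve("how do I submit a SLURM job", DOCS)
```

In a full RAG pipeline, the text of the retrieved documents would additionally be passed to the LLM as context, which corresponds to the more advanced measure of success described above.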
Requirements
Skills
- solid knowledge of Python and Linux
- independence and motivation
- problem-solving
- knowledge of HPC systems is beneficial
- basic understanding of LLMs is beneficial
- system engineering skills are beneficial
Enhancing an Internal Development Platform with Automation and Monitoring
The Swiss National Supercomputing Centre (CSCS) develops and operates a high-performance computing and data research infrastructure that supports world-class science in Switzerland. To achieve this mission, multiple specialized engineering teams must collaborate effectively, utilizing a dynamic set of processes and tools for developing and operating their solutions. For effective collaboration, an integrated and harmonized Software Development Lifecycle (SDLC), covering planning, documentation, coding, building, testing, and monitoring, is essential. The platform team responsible for core SDLC services at the centre is seeking a motivated intern to join them.
The selected candidate will have the opportunity to make tangible improvements to CSCS's internal development platform. Working alongside experienced system administrators and software engineers, the intern will gain hands-on experience with automation, CI/CD pipelines, containerized deployments, observability, and best practices in DevOps. The goal of the internship is to create a monitoring dashboard that meets the operational needs of the platform team by integrating essential information from tools such as Jira, Confluence, GitLab, JFrog, and Vault. This work will involve centralizing logging and automating validations to report on key metrics. At least one DORA metric should be included, with additional metrics to be defined based on the candidate's interests.
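As an illustration of the kind of metric such a dashboard might report, the sketch below computes deployment frequency, one of the four DORA metrics, over a trailing time window. The timestamps and the function name are hypothetical; in practice the deployment events would presumably come from a tool such as GitLab, and the exact definition of a "deployment" is left to the project.

```python
from datetime import datetime, timedelta


def deployment_frequency(deploy_times, window_end, window_days=7):
    """Average number of deployments per day in the trailing window."""
    window_start = window_end - timedelta(days=window_days)
    in_window = [t for t in deploy_times if window_start < t <= window_end]
    return len(in_window) / window_days


# Hypothetical deployment timestamps (not real CSCS data).
deploys = [
    datetime(2024, 12, 20, 12, 0),  # falls outside the 7-day window
    datetime(2025, 1, 1, 10, 0),
    datetime(2025, 1, 3, 15, 0),
    datetime(2025, 1, 6, 9, 0),
]

freq = deployment_frequency(deploys, window_end=datetime(2025, 1, 7))
print(f"{freq:.2f} deployments per day")
```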
Requirements
- Skills
- Linux and git experience
- Scripting languages (e.g. Bash or Python)
- Familiarity with REST APIs
- Problem decomposition
- Good communication skills
- Beneficial Skills:
- CI/CD pipelines
- Automated testing
- Linux containers and their orchestration
- Monitoring frameworks
Add support for profiling CUDA workloads to Tracy
Tracy is an open source tool for profiling and performance analysis, originally designed for video game development. To see a presentation of Tracy in action, have a look at this talk from CppCon.
It has been widely adopted in other performance-sensitive fields, such as high-frequency trading, because of its responsive user interface, ease of use and low overhead. Tracy currently supports profiling of graphics APIs (DirectX, Vulkan, etc.), and has a well-defined interface for adding support for additional GPU back ends.
The focus of this internship would be to add support for profiling HPC and ML/AI workloads that use CUDA, by adding a new GPU back end to Tracy. At the end of the internship, we will aim to demonstrate Tracy working on the GH200 system Alps. The plugin will be published as an open source extension, which we will try to have upstreamed into the main Tracy code.
There is one main goal, and some optional stretch goals that would depend on the length of the internship and the interests of the applicant.
- main goal: add support for CUDA tracing, based on the existing Vulkan back end and using the CUPTI library.
- stretch goal: add support for automatic tracing of kernels and memory transfers.
- stretch goal: investigate MPI support.
- stretch goal: add AMD GPU support.
Requirements
- Skills
- Familiarity with C/C++/Rust is required, along with an interest in low-level programming.
- Experience with GPU programming would be beneficial, but it is not a hard requirement.
YAULT Tool Integration and Development
We are looking for a motivated and skilled intern to join our team and contribute to the development and integration of YAULT (Yet Another User Logging Tool).
This internship offers an excellent opportunity to gain hands-on experience in HPC system monitoring, benchmarking, and quality assurance, contributing to the success of our research computing initiatives.
YAULT is a tool designed for monitoring applications executed by users on various Linux systems. It plays a crucial role in identifying the applications most used on our systems, which helps us understand and develop user environments that fit the needs of our users.
Join us in advancing the frontier of High-Performance Computing and make a meaningful impact on the world of scientific research! Apply today to embark on an exciting journey in HPC system monitoring and benchmarking.
During the internship, you will be able to
- Collaborate with our team to develop, execute, and maintain the YAULT ecosystem;
- Assist in integrating the YAULT tool into our staging systems;
- Assist in optimizing and automating the testing process to ensure efficient and consistent validation in our staging system;
- Develop and integrate an application knowledge database into our central logging system for comprehensive tracking and analysis;
- Investigate the performance impact introduced by YAULT across different workloads; and
- Conduct an initial statistical analysis of the applications executed by users, providing insights into system usage.
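The last task, an initial statistical analysis of executed applications, could start from something as simple as the sketch below. The record format (user, application) and all names are hypothetical and do not reflect YAULT's actual output; the example only shows how run counts and distinct-user counts per application might be derived.

```python
from collections import Counter, defaultdict

# Hypothetical (user, application) records; not YAULT's real log format.
records = [
    ("alice", "gromacs"),
    ("bob", "gromacs"),
    ("alice", "python"),
    ("carol", "lammps"),
    ("bob", "gromacs"),
]


def usage_summary(records):
    """Return (application, run count, distinct users), most-run first."""
    runs = Counter(app for _, app in records)
    users = defaultdict(set)
    for user, app in records:
        users[app].add(user)
    return [(app, count, len(users[app])) for app, count in runs.most_common()]


summary = usage_summary(records)
for app, count, n_users in summary:
    print(f"{app}: {count} runs by {n_users} user(s)")
```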
If you are eager to learn, contribute, and gain practical experience in software development, we encourage you to apply!
Requirements
- Skills
- Python programming;
- Strong interest in High-Performance Computing and system testing;
- Familiarity with CI/CD pipelines to support continuous integration and deployment processes; and
- Knowledge of data science or machine learning, C++, and eBPF is a plus, but not mandatory.
eBPF-based Tooling Development for HPC System Monitoring
We are seeking a passionate and technically skilled intern to join our team and assist in the development of advanced monitoring tools for our High-Performance Computing (HPC) systems using the eBPF (Extended Berkeley Packet Filter) technology.
This internship offers the opportunity to work at the cutting edge of system monitoring technology. You will gain hands-on experience in developing monitoring tools using state-of-the-art technologies in an HPC environment, collaborate with a skilled and innovative team on critical projects involving performance and system monitoring, and gain exposure to performance analysis and the integration of monitoring solutions into large-scale systems.
During the internship, you will be able to
- Collaborate with our team to design, develop, and maintain monitoring tools leveraging eBPF;
- Assist in executing and refining these tools to monitor various aspects of our HPC systems; and
- Investigate the performance impact of different monitoring tools on diverse workloads and ensure efficiency in resource usage.
If you're eager to learn, contribute, and work on challenging real-world problems, we encourage you to apply and help shape the future of HPC system monitoring!
Requirements
- Skills
- Proficiency in C/C++ for low-level systems programming and tool development;
- Familiarity with eBPF or bpftrace for advanced system monitoring and troubleshooting;
- Familiarity with CI/CD pipelines to support continuous tool integration and deployment; and
- Knowledge of the Linux Kernel and experience with its network stack are advantageous
Profile
For the above positions, students must be enrolled at a Swiss university at Master's level (or in the final year of a Bachelor's degree); for third-country nationals, the internship must be a mandatory part of their university curriculum. The student must attend the university in person (not online) and must live in Switzerland. In addition, the Master's degree must not already be completed.
The candidate must be a student in one of the following fields: Computer Science, Mathematics, Physics or related fields. Ph.D. students will not be considered.
The ideal candidate is a team player and feels comfortable working in an international environment in the heart of Lugano, Canton of Ticino or in Zürich in Switzerland. Excellent command of written and spoken English (our official working language) is a must.
Additional rules for third-country nationals: the internship must be a mandatory internship and must start during the semester. However, it can extend into the semester holidays.
We offer
CSCS values autonomy, ownership, and continuous learning. Students can gain specialised hands-on experience through various challenging activities typical of the HPC field.
- ETH Zurich is a family-friendly employer with excellent and flexible working conditions.
- You can look forward to an exciting working environment, cultural diversity, and attractive offers and benefits.
- We value the diversity of our team, and to further enhance our workforce's diversity, we encourage women to apply.
- We offer internships of 2-6 months. During this period the intern will be mentored by and collaborate with HPC experts in the centre. A salary of 2’500.00 CHF/month is granted.
We value diversity
Curious? So are we.
We look forward to receiving your complete online application, addressed to Stephanie Frequente, HR Partner, with the following documents:
- letter of motivation in PDF
- CV in PDF
- diplomas in PDF
- employment certificates in PDF
We only take applications with PDF documents into consideration.
Please specify in your application explicitly a maximum of 2 topics which fit your interests.
As there is high demand for the internships in certain periods and we can only offer 2 internships per quarter, kindly also state your availability (preferred time frame for the internship).
Please note that we exclusively accept applications submitted through our online application portal. Applications via email or postal services will not be considered.
For further information, please visit our website or contact Dr Guilherme Peretti-Pezzi by email.