The platform described briefly in this section was designed by Atos/Bull, with the participation of DDN, to support the execution of the project's many use cases. The hardware is described below and illustrated in Figure 1; the software stack is illustrated in Figure 2.
The demand for ever more computing power has spurred architectural innovation: considerable effort has been dedicated to integrating new computing devices, including GPUs, FPGAs and ASICs, giving rise to heterogeneous computing. Following this trend, the Evolve platform adopts the following computing devices (detailed descriptions can be found in Deliverables D7.1.a and D7.1.b):
- CPUs (Intel Broadwell/Haswell/SKX), characterized by features including homogeneous NUMA multi-core design, frequency, number of cores, cache sizes and cache coherence / memory consistency protocol (Intel UPI®), providing a total of N102 TFlops
- GPUs (Intel embedded Gen9 GPU; Nvidia Tesla K20, P40, V100), supported by the following respective programming models: OpenMP/OpenACC, CUDA/OpenCL
- FPGAs (Altera Arria 10 / Stratix 10), characterized by the number of CLBs/LEs/LUTs, RAMs, DSPs, interconnect and SERDES/transceivers, and programmed with HDLs (VHDL/SystemVerilog) or the OpenCL SDK
The Evolve/Nova platform adopts InfiniBand FDR (IB/FDR, 56 Gb/s) as its interconnect backbone. InfiniBand uses low-cost bidirectional links and plays a role comparable to Ethernet, but it was designed for high performance: it offers low latency (~1 µs) and speeds of several tens of Gbit/s, and it allows multiple devices to access the network at the same time. Switch-based IB is a preferred interconnect in HPC for both storage and computing. The Evolve platform adopts a fat-tree interconnect topology, which minimizes the number of hops between computing nodes.
The IB/FDR network is complemented by an Ethernet network (Gb/s) for service and additional storage purposes, built on Cisco Catalyst 3560G (44 ports) switches. More details of the interconnect networks can be found in deliverables D7.1.a and D7.1.b.
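The IB/FDR figures quoted above (56 Gb/s, about 1 µs latency) can be turned into a rough estimate of message transfer time. The sketch below is only a first-order model, time = latency + size / bandwidth, for a single link; it assumes the nominal link rate is fully usable and ignores encoding and protocol overhead, the number of hops and congestion. The function names are illustrative and not part of any Evolve tooling.

```python
# First-order transfer-time model: time = latency + size / bandwidth.
# Nominal figures taken from the text; overheads are ignored (assumption).
IB_FDR_BANDWIDTH_BITS_PER_S = 56e9  # 56 Gb/s nominal IB/FDR link rate
IB_LATENCY_S = 1e-6                 # ~1 microsecond latency


def transfer_time_s(message_bytes,
                    latency_s=IB_LATENCY_S,
                    bandwidth_bits_per_s=IB_FDR_BANDWIDTH_BITS_PER_S):
    """Estimated time in seconds to move one message over a single link."""
    if message_bytes < 0:
        raise ValueError("message_bytes must be non-negative")
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


def crossover_bytes(latency_s=IB_LATENCY_S,
                    bandwidth_bits_per_s=IB_FDR_BANDWIDTH_BITS_PER_S):
    """Message size at which latency equals the serialization time."""
    return latency_s * bandwidth_bits_per_s / 8


if __name__ == "__main__":
    # Print the estimate for a few message sizes, from 64 B to 1 GiB.
    for size in (64, 4096, 1 << 20, 1 << 30):
        print(f"{size:>12d} B -> {transfer_time_s(size) * 1e6:14.2f} us")
    print(f"crossover at about {crossover_bytes():.0f} bytes")
```

Under these assumptions, messages smaller than about 7 kB are dominated by latency, while larger transfers are dominated by bandwidth.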
The memory/storage capacity of the Evolve platform has been estimated from the following considerations:
- HPL/HPCG benchmarks: 2 GB per x86 core (resp. 1 GB per PowerPC core)
- Increase as a function of the number of cores: 106cores => 10 GB/core
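The HPL/HPCG rule of thumb above (2 GB per x86 core, 1 GB per PowerPC core) translates directly into a memory estimate for a given core count. The following is a minimal sketch of that arithmetic; the helper name and the example node configuration are hypothetical and do not describe an actual Evolve node.

```python
# Memory-per-core rule of thumb from the HPL/HPCG consideration above.
GB_PER_CORE = {"x86": 2.0, "powerpc": 1.0}


def required_memory_gb(cores, arch="x86"):
    """Total memory in GB suggested for `cores` cores of the given architecture."""
    if cores < 0:
        raise ValueError("cores must be non-negative")
    return cores * GB_PER_CORE[arch.lower()]


if __name__ == "__main__":
    # Hypothetical dual-socket node with 14 cores per socket (illustration only).
    cores_per_node = 2 * 14
    print(f"x86 node:     {required_memory_gb(cores_per_node, 'x86'):.0f} GB")
    print(f"PowerPC node: {required_memory_gb(cores_per_node, 'PowerPC'):.0f} GB")
```

The same function can be applied to the whole machine by passing the total core count instead of the per-node count.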
The adoption of DDN’s IME® scale-out, software-defined flash storage platform is dictated by the need to accelerate the path from computing nodes to storage devices. This in turn makes it possible to streamline the data path for application I/O and to realize flash-cache economics with the storage
Details of the memory and storage capacities can be found in deliverables D7.1.a and D7.1.b. In summary, storage is provided by two NetApp 2700 units offering:
- 60 disks (1.8Gb each): 98Tb
- 24 disks (1.8Gb each): 47Tb
Access to the Evolve platform is granted to all project partners (currently 50 users), with the possibility of accessing container registry repositories on the Internet.