Dolphin Express HPC Solutions

Dolphin Express DX is our new and most powerful solution for desktop and medium-sized HPC. Using the switched, dual-port DX topology (2 x 10 Gbit/s ports, 40 Gbit/s total bidirectional), you can easily build very cost-efficient and powerful clusters of up to 10 nodes. DX is the world's lowest-latency PCI Express based HPC solution. Learn more about Dolphin Express DX.
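
The 40 Gbit/s total is an aggregate figure. Assuming it counts both ports and both directions of each link at the nominal 10 Gbit/s rate (the page does not say whether encoding overhead is included), it decomposes as follows, with the per-port, per-direction rate also expressed in bytes:

```latex
B_{\text{total}} = 2\ \text{ports} \times 10\ \tfrac{\text{Gbit}}{\text{s}} \times 2\ \text{directions} = 40\ \tfrac{\text{Gbit}}{\text{s}},
\qquad
10\ \tfrac{\text{Gbit}}{\text{s}} = \tfrac{10 \times 10^{9}}{8}\ \tfrac{\text{bytes}}{\text{s}} = 1.25 \times 10^{9}\ \tfrac{\text{bytes}}{\text{s}}
```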

Dolphin SCI has for many years been used to power larger low-latency HPC clusters around the world. The largest installed SCI system is a 200-node system at Linköping University, Sweden. Several other systems exist, e.g. with 125 and 144 nodes. The largest new SCI installation so far in 2009 has 49 nodes. Dolphin Express SCI is still the recommended solution for building larger low-latency HPC clusters scaling into hundreds of nodes. Learn more about Dolphin Express SCI.


Better latency than InfiniBand

Both the DX and SCI interconnects provide approximately half the latency of comparable PCIe-connected InfiniBand solutions. Some DX performance results are shown below:

DX results for the Pallas MPI Benchmarks (PMB) PingPong test, using NMPI:

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         2.19         0.00
            1         1000         2.51         0.38
            2         1000         2.49         0.77
            4         1000         2.56         1.49
            8         1000         2.57         2.97
           16         1000         2.60         5.87
           32         1000         2.78        10.96
           64         1000         2.93        20.81
          128         1000         3.50        34.83
          256         1000         4.12        59.23
          512         1000         5.00        97.64
         1024         1000         6.89       141.72
         2048         1000        11.11       175.85
         4096         1000        19.11       204.40
         8192         1000        35.39       220.75
        16384         1000        20.00       781.09
        32768         1000        36.59       854.01
        65536          640        61.11      1022.69
       131072          320       125.44       996.47
       262144          160       233.85      1069.04
       524288           80       471.40      1060.67
      1048576           40       994.11      1005.92
      2097152           20      2351.15       850.65
      4194304           10      5586.15       716.06
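
The derived columns of the PingPong table can be reproduced from the raw ones. The sketch below rests on assumptions inferred from the published numbers rather than from Dolphin documentation: Mbytes/sec uses 1 Mbyte = 2^20 bytes, t[usec] is the one-way time per message (PMB reports half the round-trip time for PingPong), and the repetition count is capped so that about 40 Mbytes (2^20-byte units) are moved per message size. The function and constant names are hypothetical.

```python
# Sketch: recompute derived PingPong columns. Assumptions (inferred from the
# table, not from Dolphin docs): 1 Mbyte = 2**20 bytes, t[usec] is one-way
# time, and repetitions are capped at a ~40 Mbyte total volume.
MIB = 2 ** 20
MAX_REPETITIONS = 1000
OVERALL_VOLUME = 40 * MIB  # hypothetical name for the assumed volume cap


def pingpong_mbytes_per_sec(nbytes: int, t_usec: float) -> float:
    """Throughput of one nbytes message delivered in t_usec microseconds."""
    if t_usec <= 0:
        raise ValueError("t_usec must be positive")
    bytes_per_sec = nbytes / (t_usec * 1e-6)
    return bytes_per_sec / MIB


def repetitions(nbytes: int) -> int:
    """Repetition count consistent with the #repetitions column."""
    if nbytes == 0:
        return MAX_REPETITIONS
    return min(MAX_REPETITIONS, OVERALL_VOLUME // nbytes)


if __name__ == "__main__":
    for nbytes, t_usec in [(524288, 471.40), (4194304, 5586.15)]:
        print(nbytes, repetitions(nbytes),
              round(pingpong_mbytes_per_sec(nbytes, t_usec), 2))
```

For example, the 524288-byte row works out to 524288 / 471.40 usec, about 1112.2 bytes per microsecond, which is 1060.67 Mbytes/sec in 2^20-byte units, as listed. The cap also explains why the repetition count halves each time the message size doubles beyond 32768 bytes.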

DX results for the Pallas MPI Benchmarks (PMB) Sendrecv test, using NMPI:

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         2.51         2.51         2.51         0.00
            1         1000         2.81         2.81         2.81         0.68
            2         1000         2.81         2.81         2.81         1.36
            4         1000         2.90         2.90         2.90         2.63
            8         1000         2.89         2.89         2.89         5.28
           16         1000         2.92         2.92         2.92        10.43
           32         1000         3.10         3.10         3.10        19.69
           64         1000         3.25         3.25         3.25        37.53
          128         1000         3.83         3.84         3.84        63.63
          256         1000         4.43         4.43         4.43       110.15
          512         1000         6.03         6.03         6.03       161.98
         1024         1000         9.86         9.86         9.86       198.13
         2048         1000        17.72        17.72        17.72       220.40
         4096         1000        33.07        33.08        33.07       236.19
         8192         1000        64.08        64.10        64.09       243.75
        16384         1000        20.80        20.80        20.80      1502.62
        32768         1000        38.37        38.37        38.37      1628.75
        65536          640        75.20        75.20        75.20      1662.17
       131072          320       153.16       153.16       153.16      1632.29
       262144          160       303.49       303.51       303.50      1647.38
       524288           80       631.20       631.25       631.23      1584.16
      1048576           40      1259.98      1260.00      1259.99      1587.30
      2097152           20      2831.10      2831.20      2831.15      1412.83
      4194304           10      6415.49      6415.92      6415.70      1246.90
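
Unlike PingPong, each Sendrecv process both sends and receives a message per iteration, so the throughput column counts 2 x #bytes. The sketch below assumes, as the published numbers indicate, that the divisor is t_max (the slower of the two processes) and that 1 Mbyte = 2^20 bytes; the function name is hypothetical.

```python
# Sketch: recompute the Sendrecv Mbytes/sec column. Assumptions (inferred
# from the table): 2 * nbytes are counted per iteration, the time used is
# t_max, and 1 Mbyte = 2**20 bytes.
MIB = 2 ** 20


def sendrecv_mbytes_per_sec(nbytes: int, t_max_usec: float) -> float:
    """Throughput as reported in the Sendrecv table."""
    if t_max_usec <= 0:
        raise ValueError("t_max_usec must be positive")
    # Each process sends nbytes and receives nbytes in the same call.
    return 2 * nbytes / (t_max_usec * 1e-6) / MIB


if __name__ == "__main__":
    for nbytes, t_max in [(524288, 631.25), (1048576, 1260.00)]:
        print(nbytes, round(sendrecv_mbytes_per_sec(nbytes, t_max), 2))
```

For example, the 1048576-byte row gives 2 x 1048576 / 1260.00 usec, or 1587.30 Mbytes/sec, matching the table. Using t_min instead for the 524288-byte row would give about 1584.29 rather than the listed 1584.16, which is why t_max is assumed here.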


Software for running HPC Applications

There are two alternative software solutions for running HPC applications on a Dolphin Express cluster:

NMPI

NMPI is an effective and reliable implementation of the Message-Passing Interface (MPI-2) over Dolphin Express. It is based on MPICH2, the open-source MPI-2 implementation developed by the Mathematics and Computer Science Division at Argonne National Laboratory. The initial NMPI implementation was completed by Nicevt.

NMPI is currently available for Linux and Windows on SCI, and for Linux on DX. Solaris and Windows ports for DX are being implemented.

The code is currently maintained and improved through a collaboration between Nicevt and Dolphin. The latest NMPI code can be found in the Dolphin Download section.


SuperSockets

Dolphin SuperSockets supports any Ethernet-based MPI library and is the recommended solution if you do not want to use NMPI.

The following MPI libraries have been tested with Dolphin SuperSockets:

 

More details and performance data for various MPI libraries are coming soon.