4. SISCI API Demo and Example programs

The eXpressWare development package comes with several example and benchmark programs. It is recommended to study these before designing your own application.

When you install the SISCI-devel RPM, it places the source for the available SISCI demo and example programs in the /opt/DIS/src directory. By default, the Cluster Node installation also installs the corresponding precompiled binaries in /opt/DIS/bin. The following sections describe each program.

All example code can be compiled using the Makefile.demo makefile found in /opt/DIS/src.

	cd /opt/DIS/src
	make -f Makefile.demo
      

All SISCI example, demo and benchmark programs support various command line options. Start an application with the -help option to display the details.

4.1. SISCI API Example programs

The purpose of the example programs is to demonstrate the basic usage of selected SISCI API functionality.

When you install the SISCI-devel RPM, it will place the SISCI example program source in /opt/DIS/src/examples/. The Cluster Node installation will install a precompiled set of these applications into the /opt/DIS/bin directory.

All programs share a common set of command line interface options:

'-rn XX' where XX is the nodeId of the remote system. The nodeId of a local system can be determined by running the 'query' SISCI utility.

'-server' or '-client' to specify the client or server side functionality.
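As an illustration of the shared options above, the following is a minimal sketch of how an application could parse '-rn XX', '-server' and '-client'. It is not the parsing code used by the SISCI example programs. The names demo_role, demo_options and parse_demo_options are hypothetical, and the rule that both a role and a remote nodeId must be given is an assumption made for this sketch.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for this sketch; not taken from the SISCI sources. */
typedef enum { ROLE_UNSET, ROLE_SERVER, ROLE_CLIENT } demo_role;

typedef struct {
    demo_role    role;             /* selected with -server or -client */
    unsigned int remote_node_id;   /* value given after -rn */
    int          have_remote_node; /* non-zero once -rn has been parsed */
} demo_options;

/* Parse "-rn XX", "-server" and "-client".
 * Returns 0 on success and -1 on a malformed or incomplete command line. */
static int parse_demo_options(int argc, char **argv, demo_options *opts)
{
    opts->role = ROLE_UNSET;
    opts->remote_node_id = 0;
    opts->have_remote_node = 0;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-rn") == 0) {
            char *end = NULL;
            unsigned long id;

            /* -rn must be followed by a decimal nodeId */
            if (i + 1 >= argc || !isdigit((unsigned char)argv[i + 1][0]))
                return -1;
            id = strtoul(argv[++i], &end, 10);
            if (*end != '\0')
                return -1;
            opts->remote_node_id = (unsigned int)id;
            opts->have_remote_node = 1;
        } else if (strcmp(argv[i], "-server") == 0) {
            opts->role = ROLE_SERVER;
        } else if (strcmp(argv[i], "-client") == 0) {
            opts->role = ROLE_CLIENT;
        } else {
            return -1; /* unknown option */
        }
    }

    /* Assumption of this sketch: both a role and a remote nodeId are required. */
    return (opts->role != ROLE_UNSET && opts->have_remote_node) ? 0 : -1;
}
```

With this sketch, a command line such as 'shmem -rn 8 -client' yields the client role and remote nodeId 8, while a command line without -rn is rejected. The node number 8 is only an example value.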

4.1.1. shmem

The shmem program code demonstrates how to create a basic SISCI program and exchange data using PIO. An interrupt is created and signalled when the data exchange is completed.

4.1.2. memcopy

The memcopy program code demonstrates how to create a basic SISCI program and exchange data using PIO. An interrupt is created and signalled when the data exchange is completed.

4.1.3. interrupt

The interrupt program code demonstrates how to trigger an interrupt on a remote system using the SISCI API. The receiver thread is blocking, waiting for the interrupt to arrive.

4.1.4. data_interrupt

The data_interrupt program code demonstrates how to trigger an interrupt with data on a remote system.

4.1.5. intcb

The intcb program code demonstrates how to trigger an interrupt on a remote system. The receiver thread is notified using an interrupt callback function.

4.1.6. lsegcb

The lsegcb program code demonstrates the use of local segment callbacks.

4.1.7. rsegcb

The rsegcb program code demonstrates the use of remote segment callbacks.

4.1.8. dma

The dma program code demonstrates the basic use of DMA operations to move data between segments.

The MXH830 PCIe chip does not have a PCIe DMA engine. The SISCI software will automatically utilize system DMA if this is available with your system.

The eXpressWare software fully supports system DMA if this is available with your motherboard / CPU. System DMA will automatically be detected and enabled when the software is loaded.

4.1.9. dmacb

The dmacb program code demonstrates the basic use of DMA operations to move data between segments using the completion callback mechanism.

4.1.10. dmavec

The dmavec program code demonstrates how to set up vectorized DMA operations.
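To illustrate the general idea of a vectorized transfer, the sketch below describes several (local offset, remote offset, size) elements in one list and executes them with a single call. It does not use the SISCI API: a plain memcpy on ordinary buffers stands in for the DMA engine, and the names copy_vec_entry and copy_vector are hypothetical. Refer to the dmavec source for the actual SISCI calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one element of a vectorized transfer. */
typedef struct {
    size_t local_offset;  /* start of the data in the local buffer */
    size_t remote_offset; /* destination offset in the remote buffer */
    size_t size;          /* number of bytes to move */
} copy_vec_entry;

/* Execute all elements of the vector in one call.
 * Every element is bounds-checked before any data is moved, so a bad
 * element leaves the destination untouched.
 * Returns 0 on success and -1 if an element is out of bounds. */
static int copy_vector(const unsigned char *local, size_t local_size,
                       unsigned char *remote, size_t remote_size,
                       const copy_vec_entry *vec, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (vec[i].local_offset > local_size ||
            vec[i].size > local_size - vec[i].local_offset ||
            vec[i].remote_offset > remote_size ||
            vec[i].size > remote_size - vec[i].remote_offset)
            return -1;
    }

    /* memcpy stands in for the DMA engine in this sketch. */
    for (size_t i = 0; i < count; i++)
        memcpy(remote + vec[i].remote_offset,
               local + vec[i].local_offset,
               vec[i].size);

    return 0;
}
```

Describing the whole transfer up front lets scattered regions be moved with one request instead of one request per region.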

4.1.11. rpcia

The rpcia program code demonstrates how to use the PCIe peer to peer functionality to enable remote systems to access a local PCIe resource / Physical address within the system.

Note

Please note that the PCI Express peer to peer functionality is only available with some servers. Please ask your system vendor to confirm PCI Express peer to peer functionality is supported.

4.1.12. smartio_example

The smartio_example program code demonstrates how to use the SISCI API SmartIO extension to access a Transparent PCIe device in the PCIe fabric.

4.1.13. reflective_memory

The reflective_memory program code demonstrates how to use PCIe multicast / reflective memory functionality.

Note

Please note that the MXH830 and MXH930 in 3 and 5 node configurations do not support PCIe multicast (yet).

4.1.14. reflective_device

The reflective_device program code demonstrates how to use the SISCI API to enable a PCIe device to directly send multicast data to a multicast group.

This program requires the PCIe peer to peer functionality.

4.1.15. reflective_device_receive

The reflective_device_receive program code demonstrates how to use the SISCI API to register a PCIe device to directly receive PCIe multicast data, and how to enable a PCIe device to directly send PCIe multicast data to a multicast group.

This program requires the PCIe peer to peer functionality.

4.1.16. reflective_write

The reflective_write program code demonstrates how to use PCIe multicast / reflective memory functionality.

4.1.17. probe

The probe program code demonstrates how to determine if a remote system is accessible via the PCIe network.

4.1.18. query

The query program code demonstrates how to identify various system properties and status settings.

4.1.19. cuda

The cuda program demonstrates basic integration with the NVIDIA CUDA® programming environment for GPUs.

The program code demonstrates how to use the SISCI API to attach and access a CUDA GPU buffer.

To use this program, you need to install the CUDA programming environment from NVIDIA.

The Dolphin eXpressWare software must also be installed using the --enable-cuda-support option. More information can be found in Section 2.9, “eXpressWare CUDA® integration”.

The system in which the GPU is installed must also support PCI Express peer to peer (P2P) transactions.

4.2. SISCI API demo and benchmark programs

The purpose of the benchmark and demo programs is to demonstrate how to measure the actual communication performance over the PCIe network.

When you install the SISCI-devel RPM, it will place the SISCI benchmark and demo program source in /opt/DIS/src/demo/. The Cluster Node installation will install a precompiled set of these applications into the /opt/DIS/bin directory.

4.2.1. scibench2

The scibench2 program can be used to determine the actual CPU load/store performance to a remote or local segment.

The program copies data to the remote segment without any synchronization between the client and server side during the benchmark.

The send latency displayed by the application is the wall clock time to send the data once.
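The measurement idea described above can be sketched as follows: time a run of back-to-back CPU stores with the wall clock, with no synchronization inside the loop, and divide by the number of sends. This is not the scibench2 source. An ordinary local buffer stands in for a mapped segment, and the function names, the byte-wise store loop and the throughput helper are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock time in nanoseconds (C11 timespec_get). */
static int64_t wall_clock_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}

/* Send the data once using plain CPU stores. The volatile destination
 * keeps the compiler from removing the repeated stores. */
static void store_bytes(volatile unsigned char *dst,
                        const unsigned char *src, size_t size)
{
    for (size_t i = 0; i < size; i++)
        dst[i] = src[i];
}

/* Average wall-clock time in nanoseconds to send `size` bytes once,
 * measured over `loops` back-to-back sends. */
static double average_send_latency_ns(volatile unsigned char *dst,
                                      const unsigned char *src,
                                      size_t size, unsigned int loops)
{
    int64_t start, end;

    if (loops == 0)
        return 0.0;

    start = wall_clock_ns();
    for (unsigned int i = 0; i < loops; i++)
        store_bytes(dst, src, size);
    end = wall_clock_ns();

    return (double)(end - start) / (double)loops;
}

/* Convert a per-send latency into throughput in MB/s (10^6 bytes/s).
 * bytes per nanosecond multiplied by 1000 gives MB/s. */
static double throughput_mbytes_per_s(size_t size, double latency_ns)
{
    if (latency_ns <= 0.0)
        return 0.0;
    return ((double)size / latency_ns) * 1000.0;
}
```

Because the receiving side takes no part in the timed loop, a figure obtained this way reflects only how fast the sender can issue stores, which matches the unsynchronized behaviour described for the benchmark.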

4.2.2. scipp

The scipp program can be used to determine the actual CPU store latency to a remote or local segment.

The program sends data to the remote system. The remote system polls for new data and sends a similar amount of data back when it detects the incoming message.

4.2.3. dma_bench

The dma_bench program can be used to determine the actual DMA performance to a remote or local segment.

The program connects to a remote segment and executes a series of single sided DMA operations copying data from a local segment to a remote segment. There is no synchronization between the client and server side during the benchmark.

4.2.4. intr_bench

The intr_bench program can be used to determine the actual latency for sending a remote interrupt.

The program implements an interrupt ping-pong benchmark where the client and server sides exchange interrupts and measure the full round trip latency. The interrupt latency reported by the program is the average of both systems and is the full application to application latency.

4.2.5. reflective_bench

The reflective_bench program can be used to benchmark the reflective memory / multicast functionality enabled by PCI Express networks.

The program implements a multicast data ping-pong benchmark where the client and server sides exchange multicast data.

Reflective memory functionality is fully supported in two node configurations and with a central switch.