NUMA Success Story

The Customer – HP Labs

HP Labs is the exploratory and advanced research group for Hewlett-Packard. The lab has some 600 researchers in seven locations throughout the world. HP Labs research areas include social computing, nanotechnology, sustainable IT, collaboration, business optimization, cloud computing, exascale computing and information management.

The Challenge –

To develop an operating system for MCS (Multi Computer System) – a high-performance multiprocessor system based on the NUMA (Non-Uniform Memory Access) architecture.

The Solutions – 

Calsoft handled the end-to-end product development life cycle, from analysis to testing. The key components we implemented are listed below.

 1. Global (Cluster) File System (GFS) – 
Calsoft started with the Veritas Cluster File System and re-implemented three components to take advantage of shared memory. Our major challenge was to make all shared memory changes in a transactional manner. The first re-implemented component is the Distributed Lock Manager (the GLM), which provides synchronization services for access to shared objects, such as files, in a distributed file system. The second is the Virtual Disk Driver, which provides a single disk-level cache for a shared disk in the MCS environment and so improves data-access performance. Because the cache resides in global shared memory, it is shared across multiple nodes, and its size is not limited by the NT virtual address space. The third component is a shared page cache, also held in global shared memory.
 2. Inter-node communications module –
Another major component we contributed to the NUMA OS project was networking. We built a fast inter-node communications module on top of global shared memory, along with a facility that presents a single global IP address over multiple NICs. This involved writing an NDIS intermediate driver for Windows NT.
 3. NT synchronization –
We also implemented recoverable spinlocks with NT semantics using global shared memory. This was the lowest-level synchronization primitive of the OS, and therefore critical.
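To illustrate what "recoverable" can mean for a spinlock held in shared memory, here is a minimal sketch in portable C11. It is not the code written for MCS and it does not use any Windows NT API. The type `rspinlock_t`, the `rspin_*` functions and the node-ID scheme are all hypothetical names chosen for this example. The sketch assumes that each node has a non-zero ID, that the lock word stores the ID of the current owner, and that some external mechanism (not shown) decides when a node has failed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: node IDs start at 1, so 0 can mean "nobody holds the lock". */
#define RSPIN_FREE 0u

/* Hypothetical lock word that would live in global shared memory. */
typedef struct {
    _Atomic uint32_t owner;   /* RSPIN_FREE or the ID of the owning node */
} rspinlock_t;

static void rspin_init(rspinlock_t *l)
{
    atomic_init(&l->owner, RSPIN_FREE);
}

/* One acquisition attempt: succeeds only if the lock is currently free. */
static bool rspin_try_acquire(rspinlock_t *l, uint32_t self)
{
    uint32_t expected = RSPIN_FREE;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, self,
        memory_order_acquire, memory_order_relaxed);
}

/* Busy-wait until the lock is acquired. */
static void rspin_acquire(rspinlock_t *l, uint32_t self)
{
    while (!rspin_try_acquire(l, self)) {
        /* spin */
    }
}

/* Release the lock; returns false if `self` is not the current owner. */
static bool rspin_release(rspinlock_t *l, uint32_t self)
{
    uint32_t expected = self;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, RSPIN_FREE,
        memory_order_release, memory_order_relaxed);
}

/* Recovery path: called by a surviving node once node `dead` is known to
 * have failed. The lock is freed only if the failed node still owns it,
 * so a lock held by a healthy node is never stolen. */
static bool rspin_recover(rspinlock_t *l, uint32_t dead)
{
    uint32_t expected = dead;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, RSPIN_FREE,
        memory_order_acq_rel, memory_order_relaxed);
}
```

Recording the owner in the lock word is what makes recovery possible: a plain test-and-set lock gives survivors no way to tell whether the holder is still alive. Freeing the lock does not repair the data it protected, which is one reason a design like this would typically be paired with transactional updates to shared memory.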
The Benefits –
  • Increase the performance of an SMP and the availability of a cluster
  • Extend the scalability and availability of the system beyond the limitations of the SMP architecture
  • Provide a single system image for applications while each MCS node runs its own Windows NT kernel for high availability
  • Make the operating system co-exist with Windows NT