I’ve written a little bit about the overall goals of the Raiden project in the past. Now that I’ve spent more time thinking about it (and especially because I’ve spent more time working on it) I have a more crystallized vision of what lies ahead.
The purpose of Raiden Mark I was to construct and measure a traditional commodity supercomputer. The goal was to learn what goes into building, operating and programming one of these systems and measure the baseline performance of a system scaled similarly to what is planned for Mark II (to allow for a more apples-to-apples comparison between architectures down the road).
Most of what I set out to do with Mark I is now complete. Many things did not go as planned and I learned a lot about the practical challenges of building & operating this class of supercomputer. Admittedly the results of my performance tests didn’t measure up to my expectations, and I would like to continue tuning the machine to see if more potential is there, but for the purpose of establishing a point of comparison, what has been accomplished is sufficient.
In simplest terms, Mark II is designed to replicate Mark I using commodity ARM components. Mark II will use the same interconnect (gigabit Ethernet) and physical topology (an array of stand-alone computers running a complete operating system, etc.) but in place of Intel-based server hardware, Mark II will use inexpensive ARM-based single-board computers (SBCs).
The goal of building Mark II is to understand the performance difference between large, expensive, loud and power-hungry Intel-based server hardware and small, inexpensive and low-power ARM computers. Once this difference is understood, a test will be conducted to see if Mark II’s hardware can be scaled to match the performance of Mark I without diminishing Mark II’s advantages (lower cost, lower power, smaller size, etc.).
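The scaling question above comes down to simple arithmetic, which can be sketched as follows. This is only a rough illustration: the function name `nodes_to_match`, the constant `efficiency` factor, and all of the figures are placeholders I made up for the example, not measurements from either machine.

```python
import math


def nodes_to_match(target_gflops, per_node_gflops, efficiency=1.0):
    """Return the smallest node count whose aggregate throughput reaches the target.

    efficiency is the fraction of per-node performance retained when the
    nodes work in parallel (1.0 = perfect scaling). It is assumed constant
    here, which a real cluster on gigabit Ethernet is unlikely to achieve.
    """
    if per_node_gflops <= 0 or not 0 < efficiency <= 1:
        raise ValueError("per_node_gflops must be > 0 and efficiency in (0, 1]")
    return math.ceil(target_gflops / (per_node_gflops * efficiency))


# Placeholder figures for illustration only, not Raiden benchmark results.
mark1_gflops = 100.0     # hypothetical Mark I result
arm_node_gflops = 4.0    # hypothetical per-board result for Mark II
print(nodes_to_match(mark1_gflops, arm_node_gflops, efficiency=0.8))
```

Once real benchmark numbers exist for both machines, the same calculation gives a first estimate of how many boards a scaled-up Mark II would need, which can then be weighed against the added cost, power and size.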
A secondary goal of Mark II is to collect and potentially develop the software components necessary to construct a usable ARM-based supercomputer. Minimally, this will result in a “recipe” which uses existing operating systems, system modules and development tools to produce a software package that can be deployed and maintained on a Mark II-compatible system without specialized HPC knowledge and with minimal ongoing maintenance. Maximally, it may include new operating systems and components along with development tools which make developing high-performance computing applications accessible to a wider range of programmers.
Most of the components for Mark II have been selected and, barring any new developments (or components reaching end-of-life), the Mark II hardware will consist of the following:
- 8x PINE A64 single-board computers (dual-core, 1GB; quad-core, 2GB preferred, but would increase build cost by a third and supply is limited)
- 1x WiFi interface for front-end node (Ethernet preferred but no dual-ethernet available on the PINE boards)
- 8x SD cards (whatever size has the best price-point at time of purchase)
- 1x commodity 8-port gigabit Ethernet switch
- 8x gigabit Ethernet cables
- 1x power supply (single device, 5/12VDC)
- 1x cabinet
- Misc. wiring, connectors, switches and indicator lamps
There are three major differences between Mark III and previous iterations of Raiden:
- FPGA-based hardware acceleration
- A non-Ethernet interconnect
- Custom electronics
The first big change is the addition of FPGAs to support dynamic, software-defined hardware acceleration. My current direction with this is to use SOC processors which combine ARM cores with some FPGA fabric on the same chip (something along the lines of Zynq by Xilinx).
The second change is moving away from an Ethernet-based interconnect.
Ethernet has a number of advantages which make it a natural choice for small and low-cost systems. Gigabit Ethernet switches are inexpensive and Gigabit Ethernet interfaces are available on several SBC devices.
However, using Ethernet for larger clusters has significant limitations. Clusters larger than 8 nodes require rack-mount switches, which limit the minimum size of the machine. While Gigabit Ethernet interfaces are fairly common, the faster forms are less common and considerably more expensive. Finally, while Ethernet is entirely suitable for moving files from place to place, it doesn’t provide the functionality found in interconnects designed for supercomputing applications.
Exactly what will be used in place of Ethernet is not yet set in stone, but I’m currently leaning toward InfiniBand (perhaps implemented in some of the FPGA fabric discussed earlier).
Finally, Mark III is set apart from previous systems because it will require the design of custom electronics to combine the features above. Exactly what form this will take will depend on the final selection of processors and interconnect, but it will likely require custom circuit board design and fabrication. This will be the first iteration that isn’t built from off-the-shelf components, but it may still incorporate some of these components if they are available (for example, one implementation might be a backplane for off-the-shelf SBCs containing ARM+FPGA SOCs, if such a board is available).
Each generation builds on the lessons learned from the previous one, so it’s difficult to estimate the functionality and timeline for the completion of each iteration. For example, the hardware and O/S for Mark I came together much faster than expected while the software (specifically the HPL benchmark) took far longer to get working. I expect similar surges and delays in building future generations of the machine.
I would like to construct Mark II this winter and retire Mark I before the end of 2017. Most of the design work is complete, so this will depend primarily on whether or not I can pull together the time and funding to carry out the build. My preference would be to keep Mark I operational until Mark II has met (or exceeded) the performance of Mark I, but various constraints may rule that out.
Once Mark II is operational I’ll begin formal work on Mark III. At that point I’ll make my initial estimates as to when I think Mark III will begin to take shape and when it might be operational. The amount of knowledge and effort that will go into building Mark III will be far greater than what will be needed for Mark II, and from the perspective of today it appears to be a daunting task. But the same was true of Mark II from a pre-Mark I vantage point, so I assume that things will look a lot more realistic once I have Mark II under my belt.
Along the way I’m hoping to garner interest from others in this work. I don’t expect a project like this to attract a large or mainstream audience, but having a few other people who are interested in seeing it succeed (especially people who would like to put it to work) makes it much easier to justify the work and frankly makes it more exciting to reach each milestone.