While reading a recent Mozilla blog post, it occurred to me that several of the challenges it describes overlap with common challenges faced by high-performance computing clusters: specifically, the problem of granularity and of distributing processing load across multiple processing units.
I’ve largely ignored WebAssembly for a number of reasons, but as it turns out, some of my assumptions were incorrect.
After reading the Mozilla post, I spent some time learning about WebAssembly: how it’s structured and how it’s written. There appear to be a number of reasons why it might be useful in high-performance computing.
Breaking a program down into pieces of the right size is one of the most persistent (and difficult) challenges in writing fast, efficient software for supercomputers. Make the pieces too big and the program can’t use all available processing resources; make them too small and the overhead of communicating between the pieces outweighs the benefit of running them in parallel. Beyond choosing the right size, it’s often harder still for a programmer to decide how a program should be broken apart.
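To make the granularity trade-off concrete, here is a toy cost model in Python. It’s a sketch under simplifying assumptions of my own, not a measurement: every chunk costs its compute time plus a fixed communication overhead, and chunks are handed out to workers in rounds. The function name and all parameter values are hypothetical.

```python
import math

def estimated_runtime(n_items, chunk_size, workers,
                      cost_per_item=1.0, overhead_per_chunk=10.0):
    """Toy model of splitting n_items of work into chunks.

    Assumptions (hypothetical, for illustration only):
    - every chunk is treated as full size (the partial last chunk is rounded up),
    - each chunk pays a fixed overhead for communication/scheduling,
    - workers process chunks in lock-step rounds.
    """
    n_chunks = math.ceil(n_items / chunk_size)
    rounds = math.ceil(n_chunks / workers)  # at most `workers` chunks run at once
    return rounds * (chunk_size * cost_per_item + overhead_per_chunk)

# Too big, about right, and too small for 1,000 items on 8 workers.
for size in (1000, 125, 1):
    print(size, estimated_runtime(1000, size, workers=8))
# 1000 1010.0
# 125 135.0
# 1 1375.0

# Let the computer search for the chunk size instead of the programmer.
best = min(range(1, 1001), key=lambda c: estimated_runtime(1000, c, workers=8))
print(best)  # 125
```

With these made-up numbers, one giant chunk leaves seven of the eight workers idle, while one-item chunks spend most of their time on overhead; the sweet spot here is one chunk per worker. Real workloads are rarely this tidy, which is part of why the decision is hard to automate.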
The holy grail, of course, is to let the computer make these decisions for you, and attempts at this have enjoyed varying degrees of success. However, in many cases this task is left to the programmer, which excludes many programmers from creating highly parallel programs because of the specialized knowledge and experience it requires.
Mozilla’s WebAssembly compiler takes a swing at this goal by spreading the work of compiling browser code across multiple CPU cores. As I understand it (I still have more to learn), the work is split at the function level, and this happens automatically, without any direct input from the programmer. It’s a clever optimization.
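Here is a minimal sketch of what function-level parallelism can look like, modeled on compiling each function of a module independently on a pool of worker threads. compile_function is a hypothetical stand-in for a real compiler backend, and the function bodies are placeholder bytes. Note that in CPython the global interpreter lock keeps pure-Python work like this from truly running in parallel; a real compiler would do the equivalent in native code.

```python
from concurrent.futures import ThreadPoolExecutor

def compile_function(name, body):
    # Hypothetical stand-in for a real backend: returns a placeholder
    # "machine code" string instead of actually generating code.
    return f"{name}: compiled({len(body)} bytes)"

def compile_module(functions, max_workers=4):
    """Compile every function of a module on a pool of worker threads.

    This works because each function can be compiled without the
    compiled output of any other, so the work can be split at function
    boundaries with no input from the programmer.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(compile_function, name, body)
                   for name, body in functions.items()}
        return {name: fut.result() for name, fut in futures.items()}

module = {"add": b"\x20\x00\x20\x01\x6a", "main": b"\x10\x00"}
print(compile_module(module))
# {'add': 'add: compiled(5 bytes)', 'main': 'main: compiled(2 bytes)'}
```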
Of course, optimizing the utilization of cores in a single processor is a different matter from optimizing it across many nodes in a large parallel machine, which has a slower interconnect and no real shared memory. Still, there may be lessons here on how to build a runtime or compiler that makes writing efficient parallel programs accessible to programmers whose experience lies in more traditional computing environments.
Code + Data
WebAssembly defines a “binary” format that bundles both code and data. Keeping data close to where it will be processed is a cornerstone of high-performance clusters, and there may be potential to leverage this aspect of WebAssembly’s “container” format to that end.
Streaming compilation and iterative optimization
Another clever optimization presented in the post is the ability to begin compiling WebAssembly code before it has been completely transmitted across the network. Here again, web applications share a traditional cluster bottleneck: the interconnect. Compilation speed might seem irrelevant for high-performance applications, where compilation usually happens long before runtime. However, if this step can be made fast, it opens some interesting doors for heterogeneous systems (for example, delivering the same WebAssembly module to CPU, GPU, and other nodes and compiling a hardware-specific binary on each at runtime).
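The streaming idea can be sketched in the same style: instead of waiting for the whole module to download, each function is handed to a compiler as soon as its bytes arrive. In the browser, the real entry point for this is JavaScript’s WebAssembly.compileStreaming(); the Python below only simulates the pattern. The network source and both backends are hypothetical, and passing the backend in as a parameter is my own illustration of the heterogeneous-cluster idea, where each node would compile the same stream for its own hardware.

```python
from concurrent.futures import ThreadPoolExecutor

def network_stream():
    # Hypothetical network source: the module arrives one function at a time.
    yield ("init", b"\x01\x02")
    yield ("step", b"\x03\x04\x05")
    yield ("finish", b"\x06")

# Hypothetical hardware-specific backends; each returns a placeholder string.
def cpu_backend(name, body):
    return f"{name}: cpu code ({len(body)} bytes)"

def gpu_backend(name, body):
    return f"{name}: gpu code ({len(body)} bytes)"

def streaming_compile(stream, backend, max_workers=2):
    """Start compiling each function as soon as it arrives, so compilation
    overlaps with receiving the rest of the module."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for name, body in stream:
            # Submitted immediately; the loop keeps reading the stream
            # while earlier functions compile in the background.
            futures[name] = pool.submit(backend, name, body)
        return {name: fut.result() for name, fut in futures.items()}

# The same stream, compiled for two different kinds of node.
print(streaming_compile(network_stream(), cpu_backend)["step"])  # step: cpu code (3 bytes)
print(streaming_compile(network_stream(), gpu_backend)["step"])  # step: gpu code (3 bytes)
```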
Along with streaming compilation, the browser performs non-blocking, multi-pass compilation: it starts running a binary that is fast to compile but less optimized, offloads optimization to a second thread, and switches to the optimized version as soon as it’s ready. It’s not hard to imagine how this multi-layered approach might apply to high-performance computing applications as well.
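A minimal sketch of this tiered approach in Python, assuming a hypothetical “optimizing compiler” that simply returns a faster implementation of the same function. Calls are served by the baseline version immediately, a background thread builds the optimized version, and later calls switch over once it’s ready. The class and function names are my own, for illustration only.

```python
import threading

class TieredFunction:
    """Run quick-to-build code right away; swap in optimized code when ready."""

    def __init__(self, baseline, optimize):
        self._impl = baseline            # usable immediately
        self.tier = "baseline"
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._upgrade, args=(optimize,))
        self._worker.start()             # optimization runs off the calling thread

    def _upgrade(self, optimize):
        optimized = optimize()           # the slow part: build the optimized version
        with self._lock:
            self._impl = optimized       # hot-swap: later calls use the fast version
            self.tier = "optimized"

    def __call__(self, *args):
        with self._lock:
            impl = self._impl
        return impl(*args)

    def wait_until_optimized(self):
        self._worker.join()

def sum_of_squares_baseline(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def build_optimized():
    # Hypothetical "optimizing compiler": returns a closed-form equivalent.
    return lambda n: (n - 1) * n * (2 * n - 1) // 6

f = TieredFunction(sum_of_squares_baseline, build_optimized)
print(f(10))            # 285, whichever tier handles the call
f.wait_until_optimized()
print(f.tier, f(10))    # optimized 285
```

Both tiers must compute the same result; only their speed differs, which is what makes it safe to switch mid-run.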
When worlds collide
I haven’t invested enough time yet to say whether any of these benefits justify the potential trade-offs of using something like a web development environment for high-performance computing, but I think the idea deserves serious consideration.
Aside from the potential “hard” benefits described above, doing so might help bridge the skills gap between “mainstream” developers and the specialists whose skills and experience allow them to write efficient supercomputer applications. This would not only increase the pool of talent available to meet the existing need for high-performance application developers, but also create an environment where programmers with new and interesting ideas can use supercomputers to build innovative applications.