In this article, we will build a Node.js addon that uses HPX to execute certain algorithms. As you might have already seen on my blog, I have written some articles about HPX, a parallelization runtime & framework written in C++. You might also be asking yourself, why on Earth should anyone use HPX inside the Node.js runtime? This is a very reasonable question. Well, the answer is pretty simple: because I wanted to try it out. The reason I wanted to try it out is that I stumbled upon a GitHub repository that uses HPX with Rust. So I thought, I could steal the idea by applying the Principle Of Least Power as defined by Tim Berners-Lee. This, in a way, was inevitable as Atwood’s Law postulates that any application that can be written in JavaScript will eventually be written in JavaScript. All it takes is a mediocre coder with too much free time.
And although I don’t think there will be that many use-cases for HPX on Node.js, I simply wanted to try it out just for the sake of learning. In fact, building an addon in C++ to access a highly parallelized runtime like HPX is really problematic (and this I was about to find out very soon) as it involves dealing with all sorts of tricky problems: context switching, ensuring that Node.js’ main thread remains unblocked, taking care of predicate functions, massive data copies that slow everything down, and other gremlins. But if there is one good thing that comes out of this project, then surely it’s the learning experience. Knowing how to write a Node.js addon in itself is a valuable skill. And knowing how to set up, build, deploy, and integrate a completely “foreign” framework into Node.js to make it useful inside a simple JS program is also a good lesson in software integration. But before we start with the code, here are a few links for those who prefer to simply try out things and come back later to read the article (if at all).
HPX Builder Repository contains the prebuilt HPX runtime. This image can be used directly and is independent of anything else that follows in this article (Node.js, addon development, JavaScript, etc.). It’s just the framework itself. There is also demo code written in C++ so you can try it out directly. Just follow the instructions in the README. I recommend that people who are new to HPX use this image only for a while and come back later to write the Node.js addon.
HPX Addon Builder Repository contains the image to build the addon. Here you will find the complete source code we’re discussing in this article. You’ll also find documents that describe its structure, the classes, the solutions developed to overcome certain obstacles, etc. You can use this image to build the addon and then copy it to your own JS projects, if you prefer.
If you don’t want to use the addon in your own project, there is a Dockerfile in the Addon Repository called app.Dockerfile. The image built with it uses the build artifacts from the Addon Builder Image to run a demo JS file that utilizes the addon. There is also a benchmark JS file and the complete test suite based on Mocha. Just use the regular “npm start”, “npm run benchmark”, and “npm test” commands. For more information on how to use these scripts and anything else regarding the addon, you can read the documents inside the “docs” subfolder in the Addon Repository.
These three images (HPX Builder, Addon Builder, App Builder) depend on each other. HPX must be built first, then the addon, and finally the app, which references both because it depends on the HPX libraries as well as the addon binary.
Writing the Addon
The main addon sources are located in src/addon and contain the declarations and definitions of the exposed functions. Node.js addons use certain macros to expose functions to client JavaScript code. Below is the export call that makes the addon’s lifecycle and HPX functions visible.
We have defined the function InitAddon, which receives a Node-API object that we use to declare functions like “sort”, “find”, “equal”, etc., as exported. You can think of this object as the C++ way of using module.exports in JavaScript code. Each call to exports.Set assigns the exported name of a function to its actual C++ implementation. The call Napi::Function::New creates a new JS function, backed by the original C++ implementation, inside the current execution environment. Returning the exports object establishes the interface to the addon. The macro NODE_API_MODULE registers the InitAddon function, which Node.js calls when it loads the addon file, for example when client code executes the require function.
Using HPX Algorithmic Functions
Just like we split between JS exports and the addon’s C++ functions, we also split between those C++ functions and the actual algorithmic functions coming from the HPX library. There are, of course, many more such functions in HPX, so I have imported only a small subset containing 14 functions. HPX is really big, and it not only offers algorithms but also its own component model. In this article, however, I won’t be talking about that. If interested, check my other articles. But sure, it would be really cool if we could also define “components” in JS and hand them over to HPX. I am not sure if this could be possible (maybe it could), but as this addon is already complex (for me at least), I didn’t want to open that door. Maybe in the future, who knows? So, the algorithmic functions from HPX are located in a separate wrapper file called hpx_wrapper.cpp and look like this:
If we ignore the additional complexity regarding the runtime policy selection, the usage is straightforward. We take some arguments, then invoke hpx::async and within it run the respective algorithm asynchronously. Ultimately, an hpx::future, which is basically a “handle” to a possible return value, will be produced. Other function wrappers in hpx_wrapper.cpp are built the same way, but I am not sure if there is a more elegant way to do it. It works, at least. Here is the header file that shows the available algorithms:
Executing the Algorithms in the Addon
Up until now, using the HPX algorithms was more or less “easy”. The really hard part was bringing the C++ code of the addon (and thus the intricacies of Node.js itself) together with the HPX execution environment. As some of you might already know, HPX does not run your work directly on OS threads; it schedules its own lightweight threads on top of a pool of OS worker threads. Therefore, any attempt “to do threading” with HPX while using facilities provided by your host OS is doomed to fail. Another problem is that Node.js runs our JavaScript on a single main thread that we must not block; otherwise, the whole application would come to a halt.
To make the addon work, Node.js’s main thread has to be running first; then we initialize HPX in another thread so that HPX can later run its own threads from there, and finally we ensure that the entire communication and data transfer run without too much context switching. This part was the hardest one for me to solve. I tried different things, like using Node.js non-blocking calls and similar facilities, but the performance was abysmal. The constant context switching and copying of arrays consumed so much time that, in the beginning, any potential performance gain from using HPX was rendered irrelevant. I needed a way to not only run HPX in a separate thread, which I solved with a dedicated HPX Manager, but also to keep the number of function calls low when exchanging data with JS client code. Ideally, the data wouldn’t move that often, and function execution inside the addon would only happen when an HPX algorithm has finished its work.
The first problem, as I said, was solved with the HPX Manager, which is a singleton that creates a separate OS thread and starts HPX by calling hpx::start within it. This approach avoids blocking Node.js’s main thread. Note that hpx::start itself returns once the runtime is up (the blocking entry point is hpx::init), so something else has to keep the runtime alive for as long as HPX’s algorithms are needed by the client JS code. Basically, we must satisfy different expectations simultaneously.
So, to ensure HPX can later be shut down, we let it wait for a “shutdown signal” to finalize it. Below is the declaration of the HPX Manager class.
To keep HPX active, we pass a handler function as one of the arguments of hpx::start. This handler keeps HPX waiting for the finalization signal. As long as the HPX Manager has not sent it, we can use the HPX runtime. I will not post all of the code, as this would only make the whole article unreadable. I think it’s better to describe what I coded and why. I am not assuming that my solutions always follow best practices or are even the most performant ones, so don’t take them as final. Try to find bugs or break the code. I am sure there will be enough opportunities for both. In any case, the HPX Manager starts the HPX runtime, and now the addon can react to client JS code by executing the exported algorithms. Here is the implementation of “sort” in the addon.cpp file:
The implementation of all exported algorithms follows the same structure:
- Extract the environment and the input arguments (info.Env(), info[0], ...).
- Enqueue the execution inside an async block (QueueAsyncWork).
- Pass arguments to HPX function wrappers (in the first lambda).
- Wait for the hpx::future to complete (fut.get() call).
- When finished, receive the result and error object in the second lambda.
- If the error object is not empty, return a failed Promise; otherwise, return a Promise containing the result value.
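The control flow of these steps can be modeled in plain JavaScript. This is a rough sketch, not the addon's C++ code: queueAsyncWorkModel and sortModel are hypothetical names, the typed-array input is an assumption, and setImmediate merely defers the work on the main thread, whereas the real addon hands it to HPX on other threads.

```javascript
// Hypothetical JS model of the two-callback pattern; not the addon's C++ code.
function queueAsyncWorkModel(execute, complete) {
  return new Promise((resolve, reject) => {
    // Defer the work so the caller gets its Promise back immediately.
    setImmediate(() => {
      let result;
      let error = null;
      try {
        // Stands in for calling the HPX wrapper and waiting on fut.get().
        result = execute();
      } catch (e) {
        error = e;
      }
      // Second callback: receives result and error, settles the Promise.
      complete(result, error, resolve, reject);
    });
  });
}

function sortModel(values) {
  // Extract and copy the input arguments before any async work starts.
  const input = Int32Array.from(values);
  return queueAsyncWorkModel(
    () => input.sort(), // first callback: run the algorithm
    (result, error, resolve, reject) =>
      error ? reject(error) : resolve(result) // failed or fulfilled Promise
  );
}

sortModel([5, 2, 9, 1]).then((sorted) => console.log(Array.from(sorted)));
// prints [ 1, 2, 5, 9 ]
```

Keeping execution and completion in two separate callbacks means only the completion step has to touch JavaScript values, which is the property the addon relies on.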
QueueAsyncWork is a template function that expects a Napi::Env and two std::function arguments. With these three arguments, we can safely create an asynchronous environment that can both execute valid Node.js code and use HPX algorithms without negatively affecting their speed. At least, this is what I would like to believe, but you might find some problems that I overlooked. QueueAsyncWork is our bridge between async C++ and async JS. In JS, we do async stuff with Promises, and this is what QueueAsyncWork ultimately returns: a reference to a Napi::Promise::Deferred instance that holds either a result or an error. And just to clarify how Promises work, here is a simple example:
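A minimal, addon-free illustration of Promise behavior follows. The function divideLater is invented for this example and has nothing to do with HPX; it only shows the two possible outcomes (a value or an error) and the two usual ways of consuming them.

```javascript
// A Promise settles exactly once: either with a value or with an error.
function divideLater(a, b) {
  return new Promise((resolve, reject) => {
    // Simulate work that finishes some time later.
    setTimeout(() => {
      if (b === 0) {
        reject(new Error("division by zero"));
      } else {
        resolve(a / b);
      }
    }, 10);
  });
}

// Consuming the result with then/catch.
divideLater(10, 2)
  .then((value) => console.log("result:", value))
  .catch((err) => console.error("failed:", err.message));

// Consuming the result with async/await.
async function main() {
  try {
    await divideLater(1, 0);
  } catch (err) {
    console.log("caught:", err.message);
  }
}
main();
```

Any function of the addon that returns a Promise can be consumed in either of these two styles.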
Dealing with Predicates
One of the more problematic things I encountered while coding the addon was the slowness introduced by predicate functions, that is, by HPX algorithms that expect not only values but also functions as arguments. Take, for example, the algorithm hpx::count_if, whose signature is:
hpx::count_if(ExPolicy &&policy, FwdIter first, FwdIter last, F &&f)
The first three arguments are the policy and the first and last iterators. So far so good. But the last one, the predicate function, is where the problems begin. Because the addon is used by JS client code, the predicate is defined in JS, and HPX would have to call it for every single element in the range defined by the iterators. This makes the execution extremely slow because the addon would be forced to execute a new Node.js non-blocking function call each time, thus introducing a heavy context switch between HPX and Node.js. Imagine processing an array containing a million elements. So, instead of this, I decided to apply a trick: the JS side creates a “mask array” for the whole original array and returns it to the addon. Inside the addon, we then create a functor (an object that can be called) which holds a reference to the mask array, and pass it to the actual HPX algorithm. The execution is now much faster because HPX doesn’t need to wait for anything and can process the masked array at once, in a batch operation. But for this to succeed, a little bit of “currying” on the JavaScript side had to be used.
We pass the original predicate function to the helper function that returns a batch predicate function that iterates over the complete array and returns a mask representing the original values with 1 or 0 depending on the logic of the original predicate. The returned function can now be passed to the addon. But this time, there’s no need to work element-wise on the C++ side. If we look into the definition of the CountIf implementation, we recognize its structural similarity to non-predicate functions, but there are some differences. The first difference is that we must create a separate ThreadSafeFunction to be executed once in Node.js. The returned value will be the masked array, which we reference by the functor MaskPredicate and pass it to hpx::count_if. The second difference is the usage of Functors, which we will discuss shortly.
The usage in JavaScript is straightforward:
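A sketch of the batch-predicate helper described above, together with a call site, might look like this. The names toBatchPredicate and countIfModel are hypothetical, the mask layout (a Uint8Array of 0 and 1) is an assumption, and countIfModel is a pure-JS stand-in so the snippet runs without the native binary; the addon's real export and its exact signature may differ.

```javascript
// Hypothetical helper; the mask layout (Uint8Array of 0/1) is an assumption.
function toBatchPredicate(predicate) {
  // Curried: first take the element-wise predicate, later take the whole array.
  return (values) => {
    const mask = new Uint8Array(values.length);
    for (let i = 0; i < values.length; i++) {
      mask[i] = predicate(values[i]) ? 1 : 0;
    }
    return mask;
  };
}

// Stand-in for the addon's countIf so the sketch runs without the native code.
// The real export would hand the mask to hpx::count_if instead.
async function countIfModel(values, batchPredicate) {
  const mask = batchPredicate(values); // one call into JS for the whole array
  return mask.reduce((sum, bit) => sum + bit, 0);
}

const isEven = (x) => x % 2 === 0;
const data = Int32Array.from([1, 2, 3, 4, 5, 6]);

countIfModel(data, toBatchPredicate(isEven)).then((n) => console.log(n)); // 3
```

The user still writes an ordinary element-wise predicate; only the wrapping step is new, and it is what replaces a million tiny calls across the JS boundary with a single one.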
What’s maybe less straightforward is the work being done by MaskPredicate. As we saw above, this struct contains the operator() that uses the MaskData property to access the currentIndex of the masked array. The type of this index is std::atomic<size_t>, which allows it to be safely accessed from multiple threads at once. The atomic index ensures that each call claims a unique index in the mask array, preventing data races and ensuring accurate counts. (Presumably, the index a call claims need not belong to the element it was invoked for; for counting, this is harmless because every mask entry is consumed exactly once.) This is crucial when dealing with HPX because we don’t manage threads manually. HPX threads are not traditional OS threads, so we must ensure that our predicate function is thread-safe and can be used concurrently by as many threads as the machine can handle. Additionally, by precomputing the mask in a single batch call, we minimize the overhead of interacting with JavaScript, thereby enhancing performance when processing large datasets. For a quick test, use the example code below:
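The following is a small, addon-free simulation of the MaskPredicate idea in plain JavaScript, not the C++ struct itself. The name makeMaskPredicate and the data are made up; Atomics.add on a shared counter plays the role of the std::atomic<size_t> index, and the scrambled visiting order imitates parallel workers reaching elements in no particular order.

```javascript
// JS simulation of the MaskPredicate idea; names and layout are assumptions.
function makeMaskPredicate(mask) {
  // Shared counter, playing the role of the std::atomic<size_t> currentIndex.
  const counter = new Uint32Array(new SharedArrayBuffer(4));
  // The element itself is ignored: the answer was precomputed in the mask.
  return (_element) => {
    const index = Atomics.add(counter, 0, 1); // returns the value before the add
    return mask[index] === 1;
  };
}

const values = [10, 11, 12, 13, 14];
const mask = Uint8Array.from([1, 0, 1, 0, 1]); // "is even", precomputed

// Visit the elements in a scrambled order, as parallel workers might.
const visitOrder = [3, 0, 4, 2, 1];
const predicate = makeMaskPredicate(mask);

let count = 0;
for (const i of visitOrder) {
  if (predicate(values[i])) count++;
}
console.log(count); // 3: every mask slot is consumed exactly once
```

The count comes out right regardless of the visiting order, because only the number of 1 entries matters. An algorithm that needs to know which element matched would require the mask entry and the element to line up, which this scheme alone does not guarantee.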
Of course, a similar convert-to-batch-predicate helper could also be implemented in C++, but after some experimentation, I decided that it’s better to keep it in the JS world because this is where addon users write their own predicates. The addon should not be changing them in any way. On the contrary, the addon should only ensure that their execution happens in an async fashion. That’s why we go the extra mile with additional structures like MaskData and MaskPredicate. However, I am still not sure if this is really a good solution.
Using the Addon
I have only used the addon for my own experiments, but maybe there are some real-world scenarios where Node.js environments would benefit from high-performance runtimes like HPX. The demo code of this project also includes a benchmark, which gave me some interesting insights. First, trivial algorithms like count, equal, and find don’t benefit from HPX, as there is not much complexity to be solved by utilizing multiple cores or threads. Those are algorithms with linear complexity, O(n), and some of them, like find, can even end early. So, not much to be gained from HPX here. Where the addon really shines is when algorithms like copyIf, countIf, sort, sortComp, partialSort, and partialSortComp are used. These algorithms either use predicates or must employ conditional checks. Some of them are O(n log n), like sort and sortComp, which involve comparisons and data rearrangements. Others, like countIf and copyIf, are just O(n) but nevertheless require conditional checks and potentially more memory operations. All these obstacles are much easier to overcome with HPX than with usual JS coding techniques. Below are the benchmark results measured on my abysmally weak MacBook Air with 8GB of RAM and 4 threads. I am GPU-poor and thus have no clue how this would work in a more powerful environment. Heck, I don’t even know what a GPU looks like. I’ve never had one in my hand.
Some Useful Console Commands
To quickly try out the whole thing without needing to change anything, I’d recommend using Docker and the available Dockerfiles from the repo. Here’s how you can build the image that contains the addon and demo JS scripts:
docker build -t brakmic/hpx-nodejs-app:latest . -f app.Dockerfile
Run the default start script that will showcase all available algorithms with:
docker run --rm -it brakmic/hpx-nodejs-app:latest
To run the benchmarks, use this command:
docker run --rm -it brakmic/hpx-nodejs-app:latest npm run benchmark
To execute tests, use this command:
docker run --rm -it brakmic/hpx-nodejs-app:latest npm test