
Building Node JS Scalable Applications: Best Practices, Tools, and Patterns for Optimal Performance

Learn how to build NodeJs applications that are scalable and reliable. Get the best tips and tricks to ensure your applications are optimized for success!

By BairesDev Editorial Team

In an increasingly digital world, using Node JS development services to build scalable, high-performing applications is not just an advantage but a necessity. This article delves into NodeJS, a runtime environment favored by developers worldwide, and covers best practices, key tools, and strategic patterns for boosting the performance of scalable Node JS apps. Whether you're a novice dipping your toes into Node JS development or a seasoned developer aiming to refine your application, this article will guide you through the fundamental steps to take your NodeJS application from simply operational to outstanding. Harness these insights and strategies to build Node JS apps that do more than just meet your performance expectations – they surpass them.

What is NodeJS?

NodeJs is a JavaScript runtime built on Chrome's V8 JavaScript engine that uses an event-driven, non-blocking I/O model. In other words, NodeJs lets developers execute JavaScript code on the server side, which enables JavaScript developers to write both front-end and back-end applications. Having a single programming language across the stack is only one of NodeJs' many selling points. Some of the others are:

  • NodeJs is asynchronous and event-driven: if an operation takes a long time to complete, the application can keep running other operations while waiting for it to finish. This makes NodeJs applications efficient and fast.
  • Since it's built on the V8 JavaScript engine, NodeJs executes code very quickly.
  • NodeJs has a large community of developers. That means there are lots of resources to learn from when you're stuck, and lots of libraries that make development easier.
  • NodeJs is cross-platform. It runs on Windows, Linux, and macOS. And since it is essentially JavaScript on the server side, it is easy to learn, use, and hire for. It is not difficult to put together a team that can write NodeJs, React Native, and ReactJS applications to cover all parts of the development process.
  • NodeJs is lightweight. It is not resource-intensive, and it is easy to scale. In backend development, scaling means that an application can handle more requests per second without crashing or slowing down, making the user experience smoother. Since scaling is the main focus of this article, we will discuss it in more detail.

Understanding the Event Loop in NodeJS

Before getting into scaling, let's briefly look at what the event loop is. The event loop is a central concept in NodeJs development. It is a single-threaded mechanism that runs continuously and manages the execution of asynchronous tasks such as reading from files, querying databases, or making network requests. Instead of waiting for a task to complete, NodeJs registers a callback function to be executed once the operation at hand is finished. This non-blocking nature makes NodeJs very fast and, with the right techniques, highly scalable.
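To see this non-blocking behavior in action, here is a minimal sketch (data.txt is just a placeholder file name):

const fs = require("fs");

console.log("Before scheduling tasks");

// Register a callback instead of waiting for the disk read to finish
fs.readFile("data.txt", "utf8", (err, contents) => {
  if (err) {
    console.error("Read failed:", err.message);
    return;
  }
  console.log("File read finished, length:", contents.length);
});

// A timer callback is also queued for a later turn of the event loop
setTimeout(() => console.log("Timer fired"), 0);

console.log("After scheduling tasks");
// Typical output order: Before..., After..., Timer fired, File read finished

Both synchronous logs print first; the timer and file callbacks run later, once the event loop picks them up.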

What is Scaling?

Scaling, in the simplest sense, is an application's capacity to handle many requests per second at once. Scaling terminology distinguishes two approaches: vertical and horizontal scaling. Vertical scaling, also known as scaling up, enhances an application's capacity to handle requests by upgrading the resources of its existing server, such as adding more RAM or a faster CPU. Horizontal scaling, also known as scaling out, adds more instances or machines to the system and spreads the load across them.

Scaling in NodeJS with Multiple Instances

First of all, let us ask the question: why scale at all? Simply put, in an era of ever-growing user bases, an application that cannot handle all the incoming requests from its users cannot expect to stay in the game.

And as backend developers, we need to make sure our application is fast, responsive, and safe. Scaling helps on all three fronts: distributing the workload across multiple instances or nodes improves performance, allows the application to handle more traffic, and creates fault tolerance, meaning that if one instance fails, the other instances can take over and keep the Node JS application running.

Now, while some other programming languages like Go can handle concurrent requests by default, NodeJs, due to its single-threaded nature, handles operations a bit differently. Therefore, the techniques used to scale vary too.

NodeJs is fast. Very fast. But because of its single-threaded nature, it executes only one piece of JavaScript at a time and cannot spread CPU-heavy work across cores on its own. Too many CPU-intensive requests at the same time can therefore block the event loop.

How to Scale Node JS Applications

There are different methods to scale Node.js applications. Let us look briefly at some of them: microservices architecture, caching, the cluster module, and worker threads.

Microservices Architecture

Node JS microservices architecture is an approach to developing software as a set of loosely coupled, independent services. Each service is a separate Node JS application that is developed and deployed independently, and services communicate with each other via HTTP requests or messaging systems like RabbitMQ or Apache Kafka. Developing software this way, instead of tucking everything into one monolith, allows developers to work on each service independently and implement necessary changes without directly affecting the others. It should be noted, though, that the benefits of microservices are debated, and the pattern should be applied with caution.

To understand microservices architecture, let’s look at a hypothetical e-commerce application example. This app could be broken down into microservices like Product, Cart, and Order. Each microservice is developed and deployed independently.

For instance, the Product microservice might be responsible for managing product data in the system. It would provide CRUD endpoints and expose an HTTP API that other microservices can use to interact with product information.

The Cart microservice could handle all cart management capabilities like adding items, changing quantities, calculating totals, etc. It too would expose an API for other microservices to build carts and update them. And the Order microservice could enable order creation, payment processing, status tracking, and more. It would provide APIs for cart checkout and order lookup functions.

By separating concerns into standalone, decoupled microservices, the application is easier to scale and maintain. Each microservice focuses on a specific domain capability while still working together to deliver the full application experience.

Each microservice also owns its own data: the Cart microservice manages cart data in its own database, while the Order microservice integrates with the Cart and Product microservices, serving as a bridge between cart and product data when an order is placed.

This way, each microservice team can focus on its specific part of the application. The Cart team manages cart capabilities, the Product team handles product data and APIs, and the Order team deals with order processing and integration.

In theory, this separation of concerns by domain accelerates development by dividing work and reducing overlapping functionality across teams. It also promotes independence and loose coupling between services. Each microservice relies less on other parts of the system, reducing side effects from changes and enhancing reliability.
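To make this concrete, here is a minimal sketch of two such services running as separate NodeJs servers that talk over HTTP. The ports, routes, product data, and order id are illustrative assumptions, both services are shown in one file only for brevity, and the global fetch assumes Node 18+; in a real system each service would be its own application:

const http = require("http");

// --- Product service: owns product data and exposes it over HTTP ---
const products = { 1: { id: 1, name: "Mug", price: 12 } }; // illustrative data

http
  .createServer((req, res) => {
    const match = req.url.match(/^\/products\/(\d+)$/);
    const product = match && products[match[1]];
    res.writeHead(product ? 200 : 404, { "content-type": "application/json" });
    res.end(JSON.stringify(product || { error: "not found" }));
  })
  .listen(4001, () => console.log("Product service on 4001"));

// --- Order service: a separate application that calls the Product service ---
http
  .createServer(async (req, res) => {
    if (req.url === "/order") {
      // Ask the Product service over HTTP instead of sharing code or a database
      const response = await fetch("http://localhost:4001/products/1");
      const product = await response.json();
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ orderId: 99, item: product })); // orderId is a placeholder
    } else {
      res.writeHead(404);
      res.end();
    }
  })
  .listen(4002, () => console.log("Order service on 4002"));

Because the Order service only depends on the Product service's HTTP contract, either one can be changed, redeployed, or scaled without touching the other.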

Caching

Caching is a technique used to enhance the performance and scalability of Node.js apps by temporarily storing frequently accessed data for fast lookup.

Consider this example: We need to build an app that fetches and displays museum data – images, titles, descriptions, etc. in a grid. There is also pagination to let users view different pages of data.

Each paginated request might fetch 20 items from the museum’s public API. Being a public API, it likely has rate limiting to prevent abuse. If we request the data from the API on every page change, we will quickly hit those rate limits.

Instead, we can use caching to avoid redundant API calls. When the first page of data is requested, we cache it locally. On subsequent page visits, we first check if the data is in the cache. If so, we return the cached data to avoid exceeding rate limits.

Caching provides fast lookup of already-fetched data. For public APIs or any data that doesn’t change often, caching can vastly improve performance and reduce costs/limits on backend services.

One great way to solve this issue is to cache the data using a caching service like Redis, an in-memory data store. It works like this: we fetch the data for page 1 from the API and store it in Redis, in memory.

Then, when the user moves on to page 2, we send a request to the museum API as usual.

But caching really demonstrates its value when a user navigates back to a page already visited. For example, when the user returns to page 1 after viewing other pages, instead of sending a fresh API request, we first check if the data for page 1 exists in the cache. If it does, we return the cached data immediately, avoiding an unnecessary API call.

Only if the cache doesn’t contain the data do we make the API request, store the response in the cache, and return it to the user. This way, we reduce duplicate requests to the API as users revisit pages. By serving from cache when possible, we improve performance and stay within API rate limits. The cache acts as a short-term data store, minimizing calls to the backend.
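Here is a minimal sketch of that flow. It assumes the redis npm package (v4), a Redis server running locally, Node 18+ for the global fetch, and a hypothetical museum API URL:

const http = require("http");
const { createClient } = require("redis"); // assumes the "redis" npm package (v4)

const redis = createClient(); // connects to redis://localhost:6379 by default

async function getPage(page) {
  const cacheKey = `museum:page:${page}`;

  // 1. Check the cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // 2. On a miss, call the (hypothetical) museum API
  const response = await fetch(`https://api.example-museum.org/items?page=${page}`);
  const data = await response.json();

  // 3. Cache the result with a TTL so stale data eventually expires
  await redis.set(cacheKey, JSON.stringify(data), { EX: 300 }); // 5 minutes
  return data;
}

redis.connect().then(() => {
  http
    .createServer(async (req, res) => {
      const page = new URL(req.url, "http://localhost").searchParams.get("page") || "1";
      const data = await getPage(page);
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify(data));
    })
    .listen(5001, () => console.log("Caching demo listening on port 5001"));
});

Only the first request for each page pays the full API round trip; every revisit within the TTL is served straight from memory.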

Practice: Cluster Module, Multithreading & Worker Processes

Theory without practice is only half the job done. In this section, we will look at some of the techniques we can use to scale NodeJs applications: the cluster module and multithreading. We will first use NodeJS' built-in cluster module, and once we understand how it works, we will use the pm2 process manager to make things easier. Then, we will change the example a bit and use the worker threads module to create multiple threads.

Cluster Module

Now, since NodeJs is single-threaded, no matter how many cores you have, it will only use a single core of your CPU. This is totally okay for input/output operations, but if the code is CPU-intensive, the Node app might end up with performance issues. To solve this problem, we can use the cluster module. This module allows us to create child processes that share the same server port as the parent process.

This way, we can take advantage of all the cores of the CPU. To understand what that means and how it works, let us create a simple NodeJs application that will serve as our example.

We will start by creating a new folder named nodeJs-scaling and inside of that folder, we will create a file called no-cluster.js. Inside of that file, we will write the following code snippet:

const http = require("http");

const server = http.createServer((req, res) => {
  if (req.url === "/") {
    res.writeHead(200, { "content-type": "text/html" });
    res.end("Home Page");
  } else if (req.url === "/slow-page") {
    res.writeHead(200, { "content-type": "text/html" });
    // simulate a slow page by keeping the event loop busy with CPU work
    let j = 0;
    for (let i = 0; i < 9000000000; i++) {
      j++;
    }

    res.end("Slow Page " + j); // respond only after the loop completes
  }
});

server.listen(5000, () => {
  console.log("Server listening on port : 5000....");
});

Here, we start by importing NodeJs' built-in HTTP module. We use it to create a server that has two endpoints: a base endpoint and a slow-page endpoint. The idea behind this structure is that the base endpoint responds and loads the page as usual, while the slow-page endpoint, because of the for loop, takes a long time to load. While this is a simple example, it is a great way to understand how the process works.

Now, if we start the server by running node no-cluster.js and then send a request to the base endpoint via cURL, or just open the page in a browser, it will load pretty quickly. An example cURL request is curl -i http://localhost:5000/. If we do the same for curl -i http://localhost:5000/slow-page, we will notice that it takes a long time and might even end with an error. This is because the event loop is blocked by the for loop and cannot handle any other requests until the loop completes. There are a couple of ways to solve this problem. We will first use the built-in cluster module, and then a handy library called PM2.

Built-in cluster module

Now let’s create a new file called cluster.js in the same directory and write the following snippet inside of it:

const cluster = require("cluster");
const os = require("os");
const http = require("http");

// Check if the current process is the master (primary) process
// (newer NodeJs versions also expose this flag as cluster.isPrimary)
if (cluster.isMaster) {
  // Get the number of CPUs
  const cpus = os.cpus().length;
  console.log(`${cpus} CPUs`);
} else {
  console.log("Worker process" + process.pid);
}

Here, we start by importing the cluster, operating system, and http modules.

Next, we check whether the current process is the master process; if so, we log the CPU count.

Our machine has 6 cores; yours will differ depending on your hardware. When we run node cluster.js, we should get a response like “6 CPUs”. Now, let's modify the code a bit:

const cluster = require("cluster");
const os = require("os");
const http = require("http");

// Check if the current process is the master process
if (cluster.isMaster) {
  // Get the number of CPUs
  const cpus = os.cpus().length;

  console.log(`Forking for ${cpus} CPUs`);
  console.log(`Master process ${process.pid} is running`);

  // Fork the process for each CPU
  for (let i = 0; i < cpus; i++) {
    cluster.fork();
  }
} else {
  console.log("Worker process" + process.pid);
  const server = http.createServer((req, res) => {
    if (req.url === "/") {
      res.writeHead(200, { "content-type": "text/html" });
      res.end("Home Page");
    } else if (req.url === "/slow-page") {

      res.writeHead(200, { "content-type": "text/html" });

      // simulate a slow page by keeping the event loop busy with CPU work
      let j = 0;
      for (let i = 0; i < 1000000000; i++) {
        j++;
      }

      res.end("Slow Page " + j); // respond only after the loop completes
    }
  });

  server.listen(5000, () => {
    console.log("Server listening on port : 5000....");
  });
}

In this updated version, we fork a worker process for each CPU core. We could also have called cluster.fork() by hand a maximum of 6 times (as this is the CPU count of the machine we are using; yours will vary).

There's a catch here: we should not succumb to the tantalizing idea of creating more forks than the number of CPU cores, as this creates performance issues instead of solving them. So we fork the process once per CPU via a for loop.

Now, if we run node cluster.js we should get a response like this:

Forking for 6 CPUs
Master process 39340 is running
Worker process 39347
Worker process 39348
Worker process 39349
Server listening on port : 5000....
Worker process 39355
Server listening on port : 5000....
Server listening on port : 5000....
Worker process 39367
Worker process 39356
Server listening on port : 5000....
Server listening on port : 5000....
Server listening on port : 5000....

As you can see, each process has a different id. Now, if we first open the slow-page endpoint and then the base endpoint, we will see that instead of waiting for the long for loop to complete, we get a fast response from the base endpoint. (To see which worker serves each request, you could, for example, include process.pid in the response body.)

This is because the slow-page request is being handled by a different worker process than the base request.

PM2 package

Instead of working with the cluster module directly, we can use a third-party package like pm2. As we will use it from the terminal, let us install it globally by running sudo npm i -g pm2. With pm2, we don't have to write any clustering code ourselves: we can reuse the plain, single-process no-cluster.js file we created earlier and let pm2 handle the forking.

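A typical workflow looks like this (pm2 names the process after the file, so the app name below is no-cluster):

# Start no-cluster.js in cluster mode: -i 0 (or "max") forks one worker per CPU core
pm2 start no-cluster.js -i 0

# Inspect the running workers and tail their logs
pm2 list
pm2 logs

# Reload all workers with zero downtime
pm2 reload no-cluster

# Stop and remove the app when done
pm2 delete no-cluster

On top of clustering, pm2 automatically restarts workers that crash, which is a large part of why it is popular for production deployments.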

Now that we’ve learnt how to get multiple processes running, let’s learn how to create multiple threads.

Multiple Threads

While the cluster module allows us to run multiple NodeJs processes that share a workload, the worker_threads module enables us to run multiple application threads within a single NodeJs process, so JavaScript code can run in parallel.

We should note here that code executed in a worker thread runs in a separate thread with its own event loop, preventing it from blocking our main application.

Let us again see this process in action. Let’s create a new file called main-thread.js and add the following code:

const http = require("http");
const { Worker } = require("worker_threads");

const server = http.createServer((req, res) => {
  if (req.url === "/") {
    res.writeHead(200, { "content-type": "text/html" });
    res.end("Home Page");
  } else if (req.url === "/slow-page") {
    // Create a new worker thread to do the CPU-heavy work
    const worker = new Worker("./worker-thread.js");
    // The worker posts its result back to us as a "message" event
    worker.on("message", (j) => {
      res.writeHead(200, { "content-type": "text/html" });
      res.end("Slow Page " + j); // respond once the worker finishes
    });
  }
});

server.listen(5000, () => {
  console.log("Server listening on port : 5000....");
});

Let’s also create a second file named worker-thread.js and add the following code:

const { parentPort } = require("worker_threads");

// simulate CPU-heavy work off the main thread
let j = 0;
for (let i = 0; i < 1000000000; i++) {
  j++;
}

// send the result back to the main thread
parentPort.postMessage(j);

Now what is going on here? In the first file, we destructure the Worker class from the worker_threads module and use it to spawn a worker.

By listening for the message event with worker.on and a callback function, we receive the result that worker-thread.js posts back to its parent, the main-thread.js file. This is how we run code in parallel in NodeJs without blocking the main thread.

Conclusion

In this tutorial, we've discussed different approaches to scaling NodeJs applications, such as microservices architecture, in-memory caching, the cluster module, and multithreading. We've also gone over hands-on examples to show how these approaches work in practice. It's always crucial to work with a reliable NodeJS outsourcing partner or to hire NodeJS developers who are competent and capable of implementing any necessary functionality seamlessly.


FAQ

How can I utilize the Cluster module in Node.js to improve scalability?

The Cluster module in Node.js allows you to create child processes (workers) that run simultaneously and share the same server port. This leverages the full power of multiple cores on the same machine to process requests in parallel (or at least a large number of them), which can significantly improve the scalability of your Node.js application.

What is the role of PM2 in Node.js scalability, and how is it different from the built-in cluster module?

PM2 is a powerful process manager for Node.js that provides several features beyond the built-in Cluster module, like automatic restarts on crashes, zero-downtime reloads, and centralized logging. It also simplifies managing clusters by providing an easy-to-use command-line interface. These features make PM2 a popular choice for managing and scaling production Node.js applications.
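For repeatable setups, the same options can live in a pm2 configuration file instead of command-line flags. A minimal sketch (the app name and script path here are illustrative):

// ecosystem.config.js (a pm2 configuration file)
module.exports = {
  apps: [
    {
      name: "my-app",            // any label you like
      script: "./no-cluster.js", // entry point; reusing the earlier example file
      instances: "max",          // one worker per CPU core
      exec_mode: "cluster",      // run the workers via the cluster module
    },
  ],
};

Running pm2 start ecosystem.config.js then launches the app with these settings.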

How does in-memory caching improve the performance and scalability of a Node.js web application?

In-memory caching, such as with Redis, stores frequently accessed data in memory, reducing the need for expensive database operations. Serving cached data, especially when coupled with a load balancer, lets you handle more requests faster, improving user experience and allowing your application to scale more effectively under high loads. However, it's crucial to implement a robust cache invalidation strategy to ensure data consistency.
