Moving Node-RED to a monorepo with multiple modules

21 Mar 2019

How we tore apart the internals of Node-RED and glued it back together without users knowing

There are two logical parts of Node-RED: the runtime, where flows run, and the editor, where flows are edited. Ever since the start of the project, these two parts have been bundled together in a single blob of an npm module.

With the 0.20 release we’ve just published, the internals of Node-RED have now been split apart into 6 separate npm modules, along with the original node-red module that now has the task of pulling those modules back together so the user doesn’t know what we’ve done.

This post describes how we went about doing that and some of the challenges we faced along the way. If you want to see it for yourself, the code is here.

This blog post was presented at LNUG in September 2019.

One-vs-Many repositories.

When I started looking at how to structure the code to support this approach, I had to decide whether to keep all the code in one repo or to split it into one repo per module.

Splitting it out would have made it clear what code belongs to each module and made it easier for new developers to follow the structure.

But it would also have greatly increased the administrative burden: multiple repositories to manage, multiple issue lists, and the need to carefully co-ordinate pull requests whenever a new feature spanned multiple modules.

Keeping the code in one place made the most sense. The question was then how to do that in practice.

Tooling

I looked around at other projects that maintain multiple modules in a single repository. There seemed to be a split between those that chose to use established tooling, such as Lerna, and those that rolled their own solution.

Not wanting to needlessly reinvent the wheel, I spent some time playing with Lerna.

Lerna is a tool that optimizes the workflow around managing multi-package repositories with git and npm.

The problem I found was Lerna has grown over time and does a lot. That isn’t necessarily a bad thing, but I found it hard to visualise how we’d migrate into using it. I got too caught up bouncing between the different options it provides without settling on one approach.

I also found this post from Nolan Lawson on why PouchDB moved away from Lerna that gave some hands-on perspective - albeit from a while ago now.

Ultimately I decided I wanted to understand the code structure and the consequences of the split, rather than instantly delegate that to the tooling. It wouldn’t preclude us from adopting Lerna in the future - but we’d be able to do that better informed.

Laying out the code

The existing code structure was already split between the node.js based runtime and the browser-based editor:

├── editor   // All editor src and resources
├── nodes    // The default core nodes
├── red      // The node.js runtime code
└── test     // All test material

The primary target for repackaging was all of the node.js code under the red directory. The code was already reasonably well componentised under there, but it wasn’t perfect. It was littered with require statements with relative paths that made assumptions about where particular files were.

The main challenge was figuring out a code layout that would allow the require statements to be updated to the new module structure, whilst still just working when run in the development environment.

This is where I came across an approach linked to in Nolan’s post - the “Alle” model.

First a quick detour into how node does module loading. When you call require with a relative path, node loads that file directly. For example, given a pair of files in adjacent directories:

.
├── a
│   └── index.js
└── b
    └── index.js

The code in a/index.js can use the following code to load b/index.js:

const moduleB = require("../b/index.js");

Now, let’s say a and b are properly formed npm modules - so they include a package.json file. Rather than require using a relative path, we want to require using the name of the module:

const moduleB = require("b");

When you pass the name of a module to require, node will check the current directory for a node_modules directory and look in there for a module with that name. If it doesn’t find one, it then checks the parent directory for node_modules and so on until it reaches the root of the filesystem.
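That lookup order can be sketched in a few lines. This is a simplified illustration rather than node’s actual implementation: the function name nodeModulesCandidates is made up for this example, and it ignores details such as NODE_PATH and the global folders.

```javascript
const path = require("path");

// Simplified sketch: list the node_modules directories node would check,
// starting from startDir and walking up to the root of the filesystem.
function nodeModulesCandidates(startDir) {
    const candidates = [];
    let dir = path.resolve(startDir);
    while (true) {
        // node never looks for node_modules/node_modules
        if (path.basename(dir) !== "node_modules") {
            candidates.push(path.join(dir, "node_modules"));
        }
        const parent = path.dirname(dir);
        if (parent === dir) {
            break; // reached the root of the filesystem
        }
        dir = parent;
    }
    return candidates;
}

// Print the candidate directories for the current working directory
console.log(nodeModulesCandidates(process.cwd()));
```

The first directory in the list that contains the requested module wins, which is why a module nearer to the requiring file takes precedence over one further up the tree.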

We could take advantage of that in laying out the code - by adding a node_modules directory in the structure:

.
└── node_modules
    ├── a
    │   └── index.js
    └── b
        └── index.js

Now, when module a does require("b"), node will search up the directory structure, find the node_modules directory and then find module b in there.
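A quick way to convince yourself of this is to build that layout in a temporary directory and load module a from it. The sketch below does that; the directory prefix and the contents of the two index.js files are invented for the demonstration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Build a throwaway layout: <tmp>/node_modules/{a,b}/index.js
const root = fs.mkdtempSync(path.join(os.tmpdir(), "alle-demo-"));
const modulesDir = path.join(root, "node_modules");
fs.mkdirSync(path.join(modulesDir, "a"), { recursive: true });
fs.mkdirSync(path.join(modulesDir, "b"), { recursive: true });

fs.writeFileSync(
    path.join(modulesDir, "b", "index.js"),
    'module.exports = { name: "b" };\n'
);
// Module a loads b by name, not by relative path
fs.writeFileSync(
    path.join(modulesDir, "a", "index.js"),
    'const moduleB = require("b");\n' +
    'module.exports = { name: "a", uses: moduleB.name };\n'
);

const moduleA = require(path.join(modulesDir, "a"));
console.log(moduleA); // { name: 'a', uses: 'b' }
```

Nothing was installed with npm here; module a finds module b purely because they are siblings inside a directory called node_modules.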

Putting this into practice led to a structure of:

├── packages
│   └── node_modules
│       ├── @node-red
│       │   ├── editor-api
│       │   ├── editor-client
│       │   ├── nodes
│       │   ├── registry
│       │   ├── runtime
│       │   └── util
│       └── node-red
└── test
    ├── editor
    ├── node_modules
    │   └── nr-test-utils
    ├── nodes
    └── unit
        ├── @node-red
        │   ├── editor-api
        │   ├── registry
        │   ├── runtime
        │   └── util
        └── node-red
            └── lib

You can see the seven new modules under the packages/node_modules directory. They can now require each other just as they will when properly installed with npm.

The test material was also restructured to match the layout. You may also spot the same trick was used there to make a module called nr-test-utils available to the test material.

This module provides two functions: require and resolve. They can be used by the test material to require a particular file from the source tree without having to hardcode the relative path from the test material tree into the source tree.

Getting GitHub to not ignore node_modules

One downside of this approach is that many IDEs are told to ignore node_modules directories as they don’t typically contain code a developer is expected to edit.

Adding some rules to .gitignore to not ignore these directories seemed to fix it for my preferred editor, Atom.

node_modules
!packages/node_modules
!test/**/node_modules

We also found that GitHub would not generate diffs when showing changes to any files under those directories, so we had to add a .gitattributes file containing the following:

/packages/node_modules/** linguist-generated=false

That works on the desktop view, but the mobile view still suppresses diffs - we have not found a solution for that yet.

Managing dependencies

Each of the module directories has its own package.json listing its dependencies as normal. There is also a package.json file at the top level of the project that lists all dependencies (including development dependencies). This means we don’t have to run npm install in each module directory - in fact, we actively avoid doing that because those dependencies include references to our other modules that npm won’t be able to install itself.

This does mean there is an overhead around managing the dependencies.

Any new dependency needs adding in two places: the top-level package.json file, so it gets installed in the development environment, and the module’s own package.json file, so it gets installed when the published module is installed.

As soon as you have the same piece of information in two places, you raise the risk of them getting out of sync.

The ‘Alle’ method I linked to earlier talks about automating the generation of the individual package.json files - something we haven’t adopted.

Instead, to help manage this, a new script was added, scripts/verify-package-dependencies.js, that checks that every dependency listed in the individual module package.json files is also listed in the top-level package.json and that the version specifier matches.

The default set of tests now includes this check, so a build will fail if a mismatch is found. The script can also be run with --fix to automatically update the versions in the module package.json file to match the top level one.
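The core of such a check could be sketched as below. This is an illustration only, not the contents of scripts/verify-package-dependencies.js: the function name, the sample dependency names and versions, and the rule for skipping our own modules are all assumptions.

```javascript
// topLevel and each entry of modules are parsed package.json objects.
// Returns a list of human-readable problems; an empty list means all is well.
function findDependencyMismatches(topLevel, modules) {
    const expected = topLevel.dependencies || {};
    const problems = [];
    for (const pkg of modules) {
        for (const [name, version] of Object.entries(pkg.dependencies || {})) {
            // Assumption: our own modules live in the source tree and are
            // not listed in the top-level file, so skip them
            if (name === "node-red" || name.startsWith("@node-red/")) {
                continue;
            }
            if (!(name in expected)) {
                problems.push(`${pkg.name}: ${name} is missing from the top-level package.json`);
            } else if (expected[name] !== version) {
                problems.push(`${pkg.name}: ${name} is ${version}, expected ${expected[name]}`);
            }
        }
    }
    return problems;
}

// Made-up sample data: lib-one and lib-two are hypothetical dependencies
const topLevel = { dependencies: { "lib-one": "1.0.1", "lib-two": "2.3.0" } };
const modules = [
    { name: "@node-red/editor-api", dependencies: { "lib-one": "1.0.0", "@node-red/util": "0.20.3" } },
    { name: "@node-red/util", dependencies: { "lib-two": "2.3.0" } }
];
console.log(findDependencyMismatches(topLevel, modules));
```

Failing the build when the returned list is non-empty is enough to stop the two copies of a version specifier from drifting apart.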

The one scenario this doesn’t catch is when a new dependency is added to the top-level file, but not to the individual module. The unit tests will still pass because the module is installed at the top level - but the published module will be missing it. We’ll have to be careful around that until we plug the gap.

Managing versions

Another important design decision was how to manage the version numbers of the individual modules. For example, if a fix was needed in one module, would we publish a new version of just that module, or would we bump the version of all the modules?

npm makes it easy to take either approach. If we wanted to be able to publish modules individually, we could set the dependency version numbers to 0.20.x so they would always get the latest version available in a given minor release. The alternative would be to set them to a specific version to tie all the modules at a particular level.

The problem with not keeping the versions closely aligned is what happens when a user hits a problem. The main goal of this entire refactoring was to hide the internal details of the split.

If a user hits a problem today, they can tell us what version of Node-RED they have installed with a single number. It simply wouldn’t meet our goal if they had to provide the versions of all seven modules so we would know exactly what they had installed. It would also be confusing to say they need to update their install to get a fix when the node-red module version doesn’t change.

So for now we’re going to keep all the modules in sync.

To help with that task, another script was added, scripts/set-package-version.js, that can be used to update all of the individual package.json files with the right version number.
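As a rough idea of what that involves, here is a minimal sketch that works on a parsed package.json object. It is not the real scripts/set-package-version.js: the function name is invented, and pinning the @node-red/* dependencies to the exact same version is my assumption about what keeping the modules in sync means in practice.

```javascript
// Return a copy of a parsed package.json with its own version, and any
// dependencies on our own modules, set to newVersion.
function setPackageVersion(pkg, newVersion) {
    const updated = { ...pkg, version: newVersion };
    if (pkg.dependencies) {
        updated.dependencies = { ...pkg.dependencies };
        for (const name of Object.keys(updated.dependencies)) {
            // Assumption: inter-module dependencies are pinned exactly
            if (name.startsWith("@node-red/")) {
                updated.dependencies[name] = newVersion;
            }
        }
    }
    return updated;
}

// Made-up example input; lib-one is a hypothetical third-party dependency
const before = {
    name: "@node-red/runtime",
    version: "0.20.2",
    dependencies: { "@node-red/util": "0.20.2", "lib-one": "1.0.1" }
};
console.log(setPackageVersion(before, "0.20.3"));
```

Third-party dependencies are left alone; only the module’s own version and its references to sibling modules move together.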

Building a release

We already had a build task, grunt release, that took the source tree and built the module that can be published to npm along with a zip file to upload to the GitHub release.

That task has been updated to now build the 7 individual modules, this time packed as tgz files.

.dist
├── modules
│   ├── node-red-0.20.3.tgz
│   ├── node-red-editor-api-0.20.3.tgz
│   ├── node-red-editor-client-0.20.3.tgz
│   ├── node-red-nodes-0.20.3.tgz
│   ├── node-red-registry-0.20.3.tgz
│   ├── node-red-runtime-0.20.3.tgz
│   └── node-red-util-0.20.3.tgz
└── node-red-0.20.3.zip

Those tgz files can be published one at a time to npm - ensuring the node-red module is done last. Currently that’s a manual task and something that is ripe for automating in the future.
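If we do automate it, the ordering rule is simple enough to capture in a small function. The sketch below is hypothetical (publishOrder is not part of the existing build) and assumes the file naming shown in the listing above.

```javascript
// Order the packed modules so the top-level node-red module comes last.
function publishOrder(tarballs) {
    // node-red-0.20.3.tgz matches; node-red-runtime-0.20.3.tgz does not
    const isTopLevel = (file) => /^node-red-\d/.test(file);
    return [
        ...tarballs.filter((file) => !isTopLevel(file)),
        ...tarballs.filter(isTopLevel)
    ];
}

const files = [
    "node-red-0.20.3.tgz",
    "node-red-editor-api-0.20.3.tgz",
    "node-red-editor-client-0.20.3.tgz",
    "node-red-nodes-0.20.3.tgz",
    "node-red-registry-0.20.3.tgz",
    "node-red-runtime-0.20.3.tgz",
    "node-red-util-0.20.3.tgz"
];
console.log(publishOrder(files));
```

Each file in the returned list could then be handed to npm publish in turn, so the node-red module only appears on npm once everything it depends on is already there.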

What next?

On reflection, having published the 0.20 release and a handful of subsequent maintenance releases, I’m pretty happy with the approach we took. Aside from a few more bits of task automation we could add, I think we have created a project structure that is well defined and easy to work with.

The main success has been that we did this without a single issue from a user related to the module structure or packaging. Users have no idea we made these changes - unless they read the release notes where we keep talking about it.

knolleary