Introduction to NATS 2.0 Security

Kevin Hoffman
7 min read · Apr 9, 2019

Decentralized Authorization and Authentication with JWTs

NATS is a lightweight, cloud-native, open-source, high-performance messaging system. In this post, I want to talk about security in the upcoming NATS 2.0 release: what it is, why you should care, and what it can do for you and your organization. But before I get into those details, I want to take a moment to explain a journey common to people adopting messaging systems and asynchronous architectures.

This journey starts by evaluating some message broker. We play with it on our workstations, write some code, and then begin the slow and steady march toward production, assuming that first deployment marks the end of our trip.

This journey, unfortunately, doesn’t end at production; it really only begins there. All too often we forget about maintaining the systems and components we’ve deployed. We underestimate or outright neglect the effort and difficulty of upkeep and dynamic (re)configuration while still maintaining SLAs and avoiding downtime.

When evaluating NATS in the past for use in both personal and professional projects, I have enjoyed its power and simplicity, and in particular embraced the ability to do queue subscriptions (similar to Kafka “consumer groups”), without having to perform any central maintenance.

That last part is key. Not having to perform central maintenance means more to me than so many of the other criteria by which we typically judge third-party systems. It’s invaluable that I can get this without sacrificing performance or, as you’ll see, security.

NATS’s ease of use and ease of maintenance stem predominantly from the fact that we don’t have to do anything to it while it’s running. Unlike Kafka, I don’t have to re-configure partitioning to accommodate new scaling patterns. Unlike RabbitMQ, I don’t have to configure fanout exchanges to control how I’m spreading message consumption across my services.

Up until recently, the one aspect of NATS that has remained centrally managed has been security. Authentication was done through either simple credentials or a token, and authorization through explicit definitions of roles in configuration files.

When managing allow and deny lists of subject publish and subscribe permissions (and you absolutely should be) I had to maintain those in configuration files. To change this data at runtime, I had to update the files and SIGHUP the NATS server to reload the configuration. This wasn’t a horrible experience, but it certainly could have been better.

In the course of delivering a broker-based system, once I dug myself out of the morass of managing allow and deny lists and subject security, I then had to manage subject ontology. With multiple different applications all using a single message broker, we all have to agree on a scheme for namespaces, prefixes, and naming conventions so everyone can play nicely together.

This ontology is a soft ontology — it exists as a concept in our minds, and maybe as a living document somewhere, which is outdated at best, and often wholly inaccurate.

Invariably, adherence to these subject naming conventions breaks down and becomes a huge nightmare, especially when messages are delivered to the wrong subject and the receiving services weren’t coded defensively enough (ask me sometime about how big the blast radius is on “poison pill” failures like this. Spoiler: it’s staggeringly large).

Compared to its competitors, NATS 1.x security maintenance was a breeze. NATS 2.0 makes me wish every product did security this way, delivering far more: not just “I’ll have an extra scoop of ice cream” more, but “hey, thanks for that free ice cream truck and unlimited supply of ingredients!” more. With NATS 2.x, we get the ability to have truly decentralized security while also getting a ton of extra features for free.

Decentralized Security

NATS 2.0 inverts the traditional security model. Users aren’t stored on the server, nor are their credentials. This is worth emphasizing because this is a subtle, yet incredibly powerful, distinction. In decentralized security mode, a NATS server will not store a single piece of private information. That means the entire server’s memory can be read without any credential exposure.

NATS’s new security model is based on a hierarchy of entities with different roles and responsibilities. At the top of this new security hierarchy is the operator. An operator is the root administrative entity of a cluster. I typically name my operators the same as my clusters for clarity, but you can call them whatever you like. As with everything in this new security model, an operator has public information (in the form of claims) stored in a JSON Web Token (JWT). This token is signed with the operator’s private key using Ed25519 (this is an oversimplification for the moment; I’ll have more detail in forthcoming blog posts).
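To make "public claims plus a signature" concrete, here is a minimal sketch in Go using only the standard library. It is not the NATS token format: the claim fields, the header, the "my-cluster" name, and the raw base64 key encoding are all assumptions made for illustration (NATS uses its own key encoding and claim layout). What it does show is that a token signed with a private key can be checked by anyone holding only the public key.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// claims holds a few public fields; this field set is illustrative only.
type claims struct {
	Subject string `json:"sub"`  // public key of the entity described
	Issuer  string `json:"iss"`  // public key of the signer
	Name    string `json:"name"` // human-readable name
}

var b64 = base64.RawURLEncoding

// signToken builds header.payload.signature, signing with the private key.
func signToken(c claims, priv ed25519.PrivateKey) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	header := []byte(`{"typ":"JWT","alg":"EdDSA"}`) // assumed header, not NATS's
	signingInput := b64.EncodeToString(header) + "." + b64.EncodeToString(payload)
	sig := ed25519.Sign(priv, []byte(signingInput))
	return signingInput + "." + b64.EncodeToString(sig), nil
}

// verifyToken checks the signature with only the public key and returns the claims.
func verifyToken(token string, pub ed25519.PublicKey) (claims, error) {
	var c claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("token must have three parts")
	}
	sig, err := b64.DecodeString(parts[2])
	if err != nil {
		return c, err
	}
	if !ed25519.Verify(pub, []byte(parts[0]+"."+parts[1]), sig) {
		return c, errors.New("signature does not verify")
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	err = json.Unmarshal(payload, &c)
	return c, err
}

// demoOperatorToken self-signs a hypothetical operator token, verifies it,
// and reports whether a tampered copy is rejected.
func demoOperatorToken() (name string, tamperRejected bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return "", false, err
	}
	id := b64.EncodeToString(pub)
	token, err := signToken(claims{Subject: id, Issuer: id, Name: "my-cluster"}, priv)
	if err != nil {
		return "", false, err
	}
	c, err := verifyToken(token, pub)
	if err != nil {
		return "", false, err
	}
	// Swap in a payload the operator never signed; verification must fail.
	forged, _ := json.Marshal(claims{Subject: id, Issuer: id, Name: "evil-cluster"})
	parts := strings.Split(token, ".")
	tampered := parts[0] + "." + b64.EncodeToString(forged) + "." + parts[2]
	_, verr := verifyToken(tampered, pub)
	return c.Name, verr != nil, nil
}

func main() {
	name, rejected, err := demoOperatorToken()
	fmt.Println(name, rejected, err)
}
```

Note that verifyToken never sees the private key, which is why a server built this way has nothing secret to store.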

At the next level down are accounts. An account is a logical unit of isolation for messaging. Another way to think of accounts is this: what containers do for Linux process isolation, accounts do for message isolation. By default, no subject within an account is shared with any entity outside that account, and traffic cannot enter or exit an account without an explicit grant for it (this is an additional feature I’ll talk about next).

Account JWTs are signed by the operator, asserting their legitimacy and granting them what the security world calls provenance — a verifiable chain of trust traceable back to a single trusted origin (as an aside to be covered in a subsequent post — a NATS cluster can have multiple trusted operators).

Within each account, we have users. A user is an entity that is granted the ability to connect to a NATS cluster. Users, including all of their permissions, are identified via JWTs and those JWTs are, predictably, signed by the owning account’s seed key (soon accounts will get multiple signing keys like operators have).

When your client connects to NATS 2, the server issues a random string called a nonce. When the client responds with a signed nonce, the server can then use the client’s public key (contained in the sub field of its JWT) to verify the signature. What this all distills down to is the notion that NATS can verify the client’s identity and permissions by verifying that it possesses the right seed (private) key. A basic tenet of asymmetric cryptography is that you can verify a signature in the absence of the secret that produced it.
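The challenge-response exchange described above can be sketched in a few lines of Go. This is an illustration under assumptions, not the NATS wire protocol: the function names are hypothetical, the nonce length is arbitrary, and the keys are raw Ed25519 keys from the standard library rather than NATS-encoded seeds.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// newNonce plays the server's part: a fresh random challenge per connection.
func newNonce() ([]byte, error) {
	nonce := make([]byte, 16) // length chosen arbitrarily for this sketch
	_, err := rand.Read(nonce)
	return nonce, err
}

// signNonce plays the client's part: only the holder of the private key can do this.
func signNonce(priv ed25519.PrivateKey, nonce []byte) []byte {
	return ed25519.Sign(priv, nonce)
}

// verifyNonce is the server's check; it needs the public key only.
func verifyNonce(pub ed25519.PublicKey, nonce, sig []byte) bool {
	return len(pub) == ed25519.PublicKeySize && ed25519.Verify(pub, nonce, sig)
}

// demoHandshake runs one honest exchange, then replays the old signature
// against a new nonce to show that a captured response is useless later.
func demoHandshake() (honest, replayed bool, err error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	nonce, err := newNonce()
	if err != nil {
		return false, false, err
	}
	sig := signNonce(priv, nonce)
	honest = verifyNonce(pub, nonce, sig)

	next, err := newNonce()
	if err != nil {
		return false, false, err
	}
	replayed = verifyNonce(pub, next, sig)
	return honest, replayed, nil
}

func main() {
	honest, replayed, err := demoHandshake()
	fmt.Println("honest client accepted:", honest)
	fmt.Println("replayed signature accepted:", replayed)
	fmt.Println("error:", err)
}
```

Because the nonce changes on every connection, a signature captured from one session does not verify in the next one.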

The provenance of the client is then verified by walking backwards up the hierarchy from user to account to operator. Accounts can be resolved remotely via secure web request or maintained internally in server memory (again, without storing any secrets). Updates to accounts and account limits can be pushed in real-time through NATS itself, which is a feature powerful enough to warrant yet another blog post.

NATS 2.0 Security Hierarchy

Before moving on to the free ice cream, let’s recap: Operators vouch for accounts, which in turn vouch for users, all of whom are identified via completely public claims information and verified by signatures.
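The "vouching" chain in that recap can be sketched as a walk from user to account to operator. The token struct below is a deliberately simplified stand-in for a JWT (hex-encoded keys, a pipe-delimited payload), and the names "billing", "alice", and "mallory" are hypothetical; the real server's data structures and checks are more involved.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// token is a simplified stand-in for a JWT: public claims plus a signature.
type token struct {
	Subject string // hex public key of the entity the token describes
	Issuer  string // hex public key of the entity that signed it
	Name    string
	Sig     []byte
}

func (t token) payload() []byte {
	return []byte(t.Subject + "|" + t.Issuer + "|" + t.Name)
}

// issue creates a token about subject, signed by the issuer's private key.
func issue(name string, subject, issuerPub ed25519.PublicKey, issuerPriv ed25519.PrivateKey) token {
	t := token{Subject: hex.EncodeToString(subject), Issuer: hex.EncodeToString(issuerPub), Name: name}
	t.Sig = ed25519.Sign(issuerPriv, t.payload())
	return t
}

// signedByIssuer verifies the signature using only the public key in the token.
func signedByIssuer(t token) bool {
	pub, err := hex.DecodeString(t.Issuer)
	if err != nil || len(pub) != ed25519.PublicKeySize {
		return false
	}
	return ed25519.Verify(ed25519.PublicKey(pub), t.payload(), t.Sig)
}

// verifyChain walks user -> account -> operator and ends at a trusted operator key.
func verifyChain(user, account, operator token, trustedOperator string) bool {
	return signedByIssuer(user) && user.Issuer == account.Subject &&
		signedByIssuer(account) && account.Issuer == operator.Subject &&
		signedByIssuer(operator) && operator.Issuer == operator.Subject &&
		operator.Subject == trustedOperator
}

func mustKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// demoChain builds a legitimate chain and a user token signed by a rogue key.
func demoChain() (valid, rogueAccepted bool) {
	opPub, opPriv := mustKey()
	acPub, acPriv := mustKey()
	usPub, _ := mustKey()
	rgPub, rgPriv := mustKey()

	operator := issue("my-cluster", opPub, opPub, opPriv) // self-signed root
	account := issue("billing", acPub, opPub, opPriv)
	user := issue("alice", usPub, acPub, acPriv)
	rogue := issue("mallory", usPub, rgPub, rgPriv) // signed by nobody we trust

	trusted := hex.EncodeToString(opPub)
	return verifyChain(user, account, operator, trusted),
		verifyChain(rogue, account, operator, trusted)
}

func main() {
	valid, rogueAccepted := demoChain()
	fmt.Println("legitimate chain verifies:", valid)
	fmt.Println("rogue-signed user verifies:", rogueAccepted)
}
```

Every check in verifyChain uses public keys only, which matches the claim that the server can validate provenance without holding any secrets.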

Additional Features

Now that we have account isolation with truly isolated subject namespaces (no cross-account subject naming collisions, and no more out of date ontology documents!), we can start taking advantage of this new foundation.

NATS 2 would be amazing enough with just the core security foundation, but as a bonus we get explicit sharing of services and streams. In NATS terminology, relative to the exporting account, a stream can be thought of as a published subject space, while a service is a subscription to externally supplied messages.

In the following diagram, account two has exported a stream called smurfdates, which has then been imported by account one and given the local name updates. A user in account one with sufficient privilege can then subscribe to the updates subject and receive messages published by users within account two.
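As a rough mental model of that export and import, here is a toy in-memory broker in Go. Everything in it (the broker type, its methods, the inbox map, the extra account "three") is hypothetical and exists only to show the routing idea: subjects are scoped to an account, and a message crosses an account boundary only through an explicit import of an exported subject, arriving under the importer's local name.

```go
package main

import "fmt"

// importRule maps a subject exported by another account to a local name.
type importRule struct {
	fromAccount  string
	fromSubject  string
	localSubject string
}

// broker is a toy model: every subject lives inside exactly one account.
type broker struct {
	exports map[string]bool         // "account/subject" -> exported?
	imports map[string][]importRule // importing account -> its rules
	inbox   map[string][]string     // "account/subject" -> delivered messages
}

func newBroker() *broker {
	return &broker{
		exports: map[string]bool{},
		imports: map[string][]importRule{},
		inbox:   map[string][]string{},
	}
}

func (b *broker) export(account, subject string) {
	b.exports[account+"/"+subject] = true
}

// addImport succeeds only if the other account actually exported the subject.
func (b *broker) addImport(account string, r importRule) bool {
	if !b.exports[r.fromAccount+"/"+r.fromSubject] {
		return false
	}
	b.imports[account] = append(b.imports[account], r)
	return true
}

func (b *broker) deliver(account, subject, msg string) {
	key := account + "/" + subject
	b.inbox[key] = append(b.inbox[key], msg)
}

// publish delivers inside the publisher's account, then follows explicit imports.
func (b *broker) publish(account, subject, msg string) {
	b.deliver(account, subject, msg)
	for importer, rules := range b.imports {
		for _, r := range rules {
			if r.fromAccount == account && r.fromSubject == subject {
				b.deliver(importer, r.localSubject, msg)
			}
		}
	}
}

// received returns what a subscriber on that account-local subject would see.
func (b *broker) received(account, subject string) []string {
	return b.inbox[account+"/"+subject]
}

// demoImport mirrors the diagram: two exports smurfdates, one imports it as updates.
func demoImport() *broker {
	b := newBroker()
	b.export("two", "smurfdates")
	b.addImport("one", importRule{fromAccount: "two", fromSubject: "smurfdates", localSubject: "updates"})
	b.publish("two", "smurfdates", "hello")
	return b
}

func main() {
	b := demoImport()
	fmt.Println(len(b.received("one", "updates")))      // delivered under the local name
	fmt.Println(len(b.received("one", "smurfdates")))   // remote name is not visible in account one
	fmt.Println(len(b.received("three", "smurfdates"))) // no import, so no traffic
}
```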

NATS 2.x Secure Sharing Illustration

Exports can be either public or private. Public exports can be freely imported by any account. For a private (secured) export, the exporter needs to create an activation token (another signed JWT), which is then embedded in the imports section of the target account’s JWT claims. This took me a while to get used to, but once I got the hang of it I realized that it is one of those designs that is brilliant in its combination of simplicity and power.

Let’s dive a little deeper on what this means and why it’s so powerful: two different accounts can mutually agree to import or export certain subjects in a secure, explicit manner. The activation token is like a signed contract, and even if an account is exporting a wildcard, the token might only allow for a specific subject (this is insanely powerful, and I’ll show examples of this in later posts). Depending on the nature of the activation contract, one or both of the accounts involved might need to have their JWTs re-signed because they were modified.
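The idea of an activation that covers only one subject inside a wildcard export comes down to subject matching. The sketch below implements the standard NATS wildcard rules ("*" matches exactly one dot-separated token, ">" matches one or more trailing tokens) as a hypothetical helper, subjectMatches. The subject names are made up for illustration, and this is not how the server itself validates activations.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a literal subject falls within a pattern.
// "*" matches exactly one token; ">" must be last and matches one or more tokens.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// Valid only as the final pattern token, with at least one subject token left.
			return i == len(p)-1 && i < len(s)
		}
		if i >= len(s) || s[i] == "" {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	// Without a trailing ">", both sides must have the same number of tokens.
	return len(p) == len(s)
}

func main() {
	export := "smurfdates.>" // hypothetical wildcard export
	for _, s := range []string{"smurfdates.eu", "smurfdates", "orders.eu"} {
		fmt.Println(s, subjectMatches(export, s))
	}
}
```

Under this model, an activation naming smurfdates.eu is a valid narrowing of the smurfdates.> export, while one naming orders.eu is not.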

All of this is done out of band without the NATS server having direct knowledge of how the contract issuance and signing happened, and the NATS server still doesn’t need a single piece of private or sensitive information. It can continue to use public keys to verify the signature of the explicit import and verify the signature of the operator that vouched for the account containing the import, and so on up to the root of the operator hierarchy.

There is a lot to take in when it comes to the new NATS security model, so I will be following this introductory post up with posts describing how to build and manage operator hierarchies as well as how to write code that works with NATS 2.0.

As you’ll see soon, another brilliant aspect of this new design is that, aside from refactoring just a few lines of code to authenticate with a JWT and a seed key, clients can remain blissfully unaware of whether they are communicating via an account-local subject, an export, or an import.

In summary, I was already excited to use NATS with the 1.x version because of its simplicity, power, and performance. With 2.x, we get an incredible decentralized security and sharing model on top of that which, I think, is powerful enough to change the way we think about building message-oriented applications.

Kevin Hoffman

In relentless pursuit of elegant simplicity. Tinkerer, writer of tech, fantasy, and sci-fi. Converting napkin drawings into code for @CapitalOne