¡Ja! Encryption fingerprinting, right?

October 16, 2024

A very important (better said, fundamental) part of threat intelligence is the acquisition of signatures that help identify components of a campaign's infrastructure. They are “fixed points” that help us identify threats and coordinate defensive actions. Without them, it would be very difficult to hit a target that moves stealthily and quickly. They are called IOCs (Indicators of Compromise).

If you find an IOC in your systems (usually because it triggers one or more rules), it is time to investigate and determine whether it is a false positive, an intrusion, or an attempted one. Without IOCs, the only option left would be to bet everything on behavioral detection and on new technologies that are still heuristics, even if they come with the artificial intelligence seal. We need IOCs, at least in the short and medium term.

An IOC in isolation does not tell us much, but if we pull the thread, correlate it with other indicators and “paint” the directed graph, we get a complete picture. We go from having isolated data to having information. With that information we can plan better, and if we base our plans on previous “successes” (intelligence), we will know what we are facing and how best to face it.

We know several of them, the classics: IP addresses, domains, hashes, and even the mutexes used by malware to, among other things, avoid running (and infecting) twice on the same system. Others, however, are relatively new, such as JA3, JA3S and JARM.

JA3, a signature to identify the fingerprint of encrypted channels

Just as the heading reads. The idea and its implementation originated with three Salesforce engineers (as a curiosity, the initials of all three are J.A.). It is based on the hash of a string that concatenates several fields from the handshake, or negotiation, between client and server when they establish an encrypted connection.

Let's take a breath to explain more concretely what we have just read.

An encrypted connection is negotiated between the two parties to agree on what encryption and configuration they will use. If all goes well, they will exchange keys (Diffie-Hellman, for example) and the payload of that conversation will be encrypted.

So, before the encryption starts, there is a handshake in which both parties announce the encryption options and configuration they support. The idea is that, since many clients and servers always present the same “fixed” configuration, we can extract the fields that do not vary and produce a hash from them.

And that's it: if this hash appears, we know which client or server (JA3 or JA3S, respectively) we are dealing with. An image of the fields used for the calculation:

Source.

From this information, the values that can be inspected are extracted and concatenated into a text string, which is then “hashed”. The hash function used is the classic MD5 (yes, it is considered insecure, but that has no negative impact on what JA3 is meant to do). In JA3S (the server identification part) some of the fields differ, but the procedure and the kind of result are the same.
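To make the calculation concrete, here is a minimal sketch in Python. It assumes the field layout from the public JA3 description (TLS version, cipher suites, extensions, elliptic curves and point formats, written as decimal values joined by `-` within a field and by `,` between fields) and that GREASE placeholder values are skipped. The function names are illustrative and do not belong to any particular tool.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE placeholders have the form 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the comma-separated JA3 string from ClientHello values (assumed layout)."""
    fields = [ciphers, extensions, curves, point_formats]
    parts = [str(version)]
    for field in fields:
        # Values inside a field are decimal and joined with '-'.
        parts.append("-".join(str(v) for v in field if not is_grease(v)))
    return ",".join(parts)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string; MD5 is used as a compact identifier, not for security."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical ClientHello values, for illustration only.
    print(ja3_string(769, [47, 53, 5], [0, 10, 11], [23, 24, 25], [0]))
    print(ja3_hash(769, [47, 53, 5], [0, 10, 11], [23, 24, 25], [0]))
```

The result is always a 32-character hexadecimal string, which is what ends up in detection rules and blocklists.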

A practical use of JA3

Ok, but how do we use this?

It is simple. Imagine we have a rule that detects a certain Trojan, TrickBot for example, by its file hash. Now the creators of that malware release a new version that slightly changes the configuration. This small change means that the rule no longer detects the new versions.

However, as long as the creators keep using the same cryptographic libraries, the TLS negotiation will remain the same even though the file hash has changed, and we will continue to identify the Trojan unless they change the libraries they use to encrypt communications.

Let's look at a JA3 determined on the "TrickBot" family (8916410db85077a5460817142dcbc8de):

Source: Own capture.

Consulting this hash in the ABUSE database we see that it identifies about 60,000 samples of that family.

And now the question that many of you may be asking: Doesn't that hash collide with software that uses that same library in that particular way?

Yes, it is possible. So JA3 is not a silver bullet, but it is a good starting point for associating samples with families (if it is cross-checked with ssdeep, for example) and it adds weight to a detection made through other factors. In other words, it is an indicator to “handle with care”, but it gives us a very good edge in the relationship graph.
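The “handle with care” idea can be sketched as a small scoring rule: a JA3 match on its own yields low confidence, and each independent corroborating indicator raises it. The class, the field names and the thresholds below are hypothetical and chosen only for illustration; the JA3 value is the one shown above for TrickBot.

```python
from dataclasses import dataclass

# JA3 value taken from the article; the dictionary itself is a hypothetical blocklist.
KNOWN_BAD_JA3 = {"8916410db85077a5460817142dcbc8de": "TrickBot"}


@dataclass
class Observation:
    ja3: str
    dest_ip_flagged: bool = False    # hypothetical: destination IP is on a blocklist
    fuzzy_hash_match: bool = False   # hypothetical: ssdeep-style match on the binary


def assess(obs: Observation):
    """Return (confidence, family). A JA3 match alone is only a weak signal."""
    family = KNOWN_BAD_JA3.get(obs.ja3)
    if family is None:
        return ("none", None)
    corroborating = sum([obs.dest_ip_flagged, obs.fuzzy_hash_match])
    if corroborating >= 2:
        return ("high", family)
    if corroborating == 1:
        return ("medium", family)
    return ("low", family)


if __name__ == "__main__":
    print(assess(Observation("8916410db85077a5460817142dcbc8de")))
    print(assess(Observation("8916410db85077a5460817142dcbc8de", fuzzy_hash_match=True)))
```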

JA3 problems grow

Although JA3 has not been around for long (it left the nest in 2017), it has been found to have some major limitations since its rollout. A post by Cloudflare's engineering team points them out and lists them:

  • Browsers have started to send the TLS extensions of their connections in random order. This in itself shatters the accumulated signatures when it comes to identifying more current browsers (clients).
  • There are notable differences in the ecosystem of tools and utilities that compute JA3 hashes. This means that there are several hashes for the same TLS stack. That is, we can have a detection rule for a certain JA3 hash while the detection component calculates a completely different hash for the same object.
  • Finally, they criticize (rightly) that JA3 focuses only on certain fields of the TLS negotiation, the “ClientHello”; it does not take advantage of the other fields that can help us to better “nail” the identification.
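The first limitation is easy to reproduce. The sketch below uses a simplified JA3-style string (only version, ciphers and extensions, with the remaining fields left empty) and hypothetical extension values. It shows that reordering the extensions changes the MD5, whereas sorting them first gives a stable value.

```python
import hashlib


def ja3_like(version, ciphers, extensions):
    """Simplified JA3-style hash: extensions are hashed in the order they were sent."""
    text = "{},{},{},,".format(
        version,
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
    )
    return hashlib.md5(text.encode("ascii")).hexdigest()


def order_insensitive(version, ciphers, extensions):
    """Same hash, but with the extensions sorted before hashing."""
    return ja3_like(version, ciphers, sorted(extensions))


if __name__ == "__main__":
    ciphers = [4865, 4866, 4867]
    sent_once = [0, 23, 65281, 10, 11]       # hypothetical extension order
    sent_again = list(reversed(sent_once))   # same client, different order

    print(ja3_like(771, ciphers, sent_once) == ja3_like(771, ciphers, sent_again))
    print(order_insensitive(771, ciphers, sent_once) == order_insensitive(771, ciphers, sent_again))
```

The first comparison prints `False` and the second prints `True`: the client is the same, but only the sorted variant recognizes it.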

JA3 is still a very good idea, despite these limitations, which have surfaced through use and experience. Thus, in 2023 a new iteration was presented to address some of the shortcomings mentioned above.

JA4, the next generation

JA4 was presented by FoxIO engineers in September 2023. The first thing that stands out is that there are no longer only two types of signature: JA3 and JA3S, but a complete suite of signatures specific to each layer using encryption and specific applications or protocols:

Source.

As we can see, JA4 is a complete family of hashes designed to extract characteristic, distinguishing information from each protocol. Moreover, a JA4 hash can be partly read by a human at a glance, since it carries information that is interpretable to the naked eye. Let us look at the case of JA4 itself, which is the equivalent of JA3:

Source.

Looking at it, the first fragment of the hash can be interpreted just by reading it. Next come two fragments that are parts of the SHA256 hashes of the supported cipher suites and of the advertised extensions... but instead of hashing them in the order in which they appear, they are sorted before the hash is applied (which avoids the randomness problem we mentioned earlier).

This sorting of extensions and ciphers, together with the “upgrade” from MD5 to SHA256 to avoid collisions, already addresses most of the shortcomings of JA3.

It is also a kind of fuzzy hash because, even if only one extension or cipher suite changes, we can continue to group the fragments that have not changed.
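A simplified sketch of this structure follows. It is not the official FoxIO algorithm: the layout of the readable prefix, the 12-character truncation and the hexadecimal encoding of the values are assumptions made for illustration, and the real specification has additional rules that are not reproduced here. What it does show is the sort-then-hash step and the fragment-by-fragment comparison that gives the “fuzzy” behavior.

```python
import hashlib


def _truncated_sha256(values):
    """Sort the values, join them as 4-digit hex, and keep 12 hex chars of the SHA256."""
    joined = ",".join(f"{v:04x}" for v in sorted(values))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()[:12]


def ja4_like(tls_version, ciphers, extensions, alpn="h2", sni=True):
    """Build a JA4-style 'a_b_c' fingerprint (simplified, assumed layout)."""
    # Readable prefix: transport, TLS version, SNI flag, counts, ALPN hint.
    a = f"t{tls_version}{'d' if sni else 'i'}{len(ciphers):02d}{len(extensions):02d}{alpn}"
    b = _truncated_sha256(ciphers)      # fragment for the cipher suites
    c = _truncated_sha256(extensions)   # fragment for the extensions
    return f"{a}_{b}_{c}"


def shared_fragments(fp1, fp2):
    """Compare two fingerprints fragment by fragment."""
    return [x == y for x, y in zip(fp1.split("_"), fp2.split("_"))]


if __name__ == "__main__":
    exts = [0x000A, 0x000B, 0x002B]                        # hypothetical values
    fp1 = ja4_like("13", [0x1301, 0x1302, 0x1303], exts)
    fp2 = ja4_like("13", [0x1301, 0x1302, 0xC02F], exts)   # one cipher suite differs
    print(fp1)
    print(fp2)
    print(shared_fragments(fp1, fp2))
```

With one cipher suite changed, only the middle fragment differs, so the two fingerprints can still be grouped by their first and last fragments.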

Now we will look at a completely different animal. Instead of focusing on TLS connections we will focus on the HTTP protocol:

Source.

JA4H specializes in extracting signatures from an HTTP client request. As we can see, it is organized in four fragments (a, b, c, d).

The first of them carries readable information, which we can understand at a glance. The next three fragments are parts of three SHA256 hashes. The first of these is calculated from the HTTP headers in their order of appearance, the second from the cookie names sorted alphabetically, and the third is built like the second but also includes the cookie values.

The fragments of the complete hash can match independently of each other. For example, two connections can produce the same first three fragments and differ only in the fourth, which carries the cookie values that can potentially change in each client connection. We therefore have a hash that leaves the context of the TLS ClientHello fields and is applied to the HTTP protocol.
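The four fragments can be sketched as follows. Again, this is a simplified illustration rather than the official algorithm: the content of the readable prefix, the 12-character truncation and the exclusion of the Cookie header from the header list are assumptions, and the request data is made up.

```python
import hashlib


def _truncated_sha256(text):
    """Keep the first 12 hex characters of the SHA256 (assumed truncation)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def ja4h_like(method, http_version, headers, cookies):
    """headers: list of (name, value) in order of appearance; cookies: dict name -> value."""
    names = [name for name, _ in headers if name.lower() != "cookie"]
    # a: readable summary (method, version, cookie flag, header count).
    a = f"{method[:2].lower()}{http_version}{'c' if cookies else 'n'}{len(names):02d}"
    # b: header names in their order of appearance.
    b = _truncated_sha256(",".join(names))
    # c: cookie names, sorted alphabetically.
    c = _truncated_sha256(",".join(sorted(cookies)))
    # d: cookie names with their values, sorted by name.
    d = _truncated_sha256(",".join(f"{k}={cookies[k]}" for k in sorted(cookies)))
    return f"{a}_{b}_{c}_{d}"


if __name__ == "__main__":
    headers = [("Host", "example.com"), ("User-Agent", "demo"), ("Cookie", "...")]
    first = ja4h_like("GET", "11", headers, {"session": "abc", "lang": "en"})
    second = ja4h_like("GET", "11", headers, {"session": "xyz", "lang": "en"})
    print(first)
    print(second)
```

The two requests differ only in the value of one cookie, so fragments a, b and c are identical and only fragment d changes.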

Conclusions

As we said at the beginning, information is everything, but if we can create intelligence with it, then it is everything and more.

And since the ingredients of information are data, the better the data, the better the dish will be cooked.

JA3 trod a path that gives good results, but with the JA4 family of hashes we now have a well-paved road. It is only a matter of time before it is widely adopted and we see more databases of JA4 hashes that we can dive into to enrich our information.

Image: 8photo / Freepik.