PackageDNA: Our Development Package Analysis Framework That Made Its Debut at Black Hat

August 30, 2021

After several months of research and development, we presented PackageDNA, our deep analysis tool for development packages, at the Black Hat USA 2021 Arsenal event, in the talk "Scanning DNA to detect malicious packages in your code". The goal was to showcase this library analysis framework, built to help developers and companies validate the security of the packages used in their code.

This tool came about when we, in the innovation team, set out to analyse the malware hidden inside libraries. From time to time, reports emerged of libraries impersonating the original ones, for example a case from late 2018 in which a couple of libraries in PyPI were flagged. The story has repeated itself often since then, but how could we do the research without a tool to make the search easier? Our initial idea was to cover PyPI packages only, but we set ourselves a bigger challenge and the idea evolved to cover the libraries of the main programming languages. So it became a framework that shows, for each package it analyses in PyPI, RubyGems, NPM and Go, the following data:

Metadata of the package.
Hashes of all the files it contains.
Detection of possible IoCs, such as IPs, hashes, URLs and emails.
Static analysis of the code, with an open-source tool for each language.
Analysis using AppInspector, Microsoft's open-source tool for identifying malicious components.
Validation of suspicious files against VirusTotal.
Validation of CVE reports on GitHub, taking into account the specific version of the package.
Identification of other packages published by the same user, within the same library and in those of other programming languages.
Checking for possible typosquatting of the package within the same library.
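To make the hashing and IoC detection items above more concrete, here is a minimal Python sketch. It is not PackageDNA's actual implementation: the function names `hash_files` and `extract_iocs` and the regular expressions are our own assumptions for illustration, and the patterns are deliberately loose, so they will produce false positives on real code.

```python
import hashlib
import re
from pathlib import Path

# Illustrative patterns only (assumption): real IoC extraction needs
# stricter validation, e.g. checking that IPv4 octets are <= 255.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://[^\s<>'\"]+"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def hash_files(package_dir):
    """Return {relative_posix_path: sha256_hex} for every file under package_dir."""
    root = Path(package_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def extract_iocs(text):
    """Return {ioc_type: sorted unique matches} for each pattern found in text."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in IOC_PATTERNS.items()
    }
```

In practice you would call `hash_files` on an unpacked package directory and run `extract_iocs` over the decoded text of each file; the hashes can then be checked against a service such as VirusTotal.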

The result is a powerful framework that allows a deep analysis of the libraries used in the code being reviewed or written, and that also gives security analysts a static view of the security of the code, a view of the attacker's behaviour and data for threat intelligence.

How to use PackageDNA?

The framework is developed in Python 3 with an interactive console that lets the user simply select what they want to do. The first screen the user sees is as follows:

You must start with option 7, the configuration of all the external tools that the framework relies on (all are free to use or open source). As you can see in the following image, it is only a matter of entering each value correctly.

Once everything is configured, the user can do the following with the PyPI, RubyGems, NPM and Go libraries:

Analyse the latest version of a package.
Analyse all versions of a package and compare results between versions.
Load a list of packages with specific versions.
Upload a local package for analysis.

For threat intelligence analysis, select option 4 in the initial panel. It opens another panel where you can:

Search for the packages published in each of the libraries, and the developments uploaded to GitHub, under the username you want to investigate.
Analyse the typosquatting and brandsquatting of a package found in a specific library.
Search for code segments within a specific package.
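As a rough idea of how a typosquatting check can work, the sketch below flags names that sit within a small edit distance of a target package. This is an assumption-laden illustration, not the algorithm PackageDNA uses: `edit_distance`, `typosquat_candidates` and the `max_distance` threshold are hypothetical names and values chosen for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b, using a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def typosquat_candidates(target, known_names, max_distance=2):
    """Return (name, distance) pairs close to target, excluding exact matches."""
    target = target.lower()
    hits = []
    for name in known_names:
        distance = edit_distance(target, name.lower())
        if 0 < distance <= max_distance:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

For example, `typosquat_candidates("requests", ["requests", "reqeusts", "request", "numpy"])` keeps only `request` and `reqeusts`. A plain distance threshold is not enough on its own: `maratlib` and `matplotlib` are four edits apart, so a tight threshold would miss that pair, which is one reason to combine several heuristics.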

Although the tool is designed without a database for storing searches, there is an option to review the results of previous analyses, which are stored locally on the machine.

The information is shown initially in the console, with the option of viewing it in the browser through Flask, as shown in the following images.

Attacks on the software supply chain

During development, attacks on the software supply chain were gaining prominence around the world, with reports of several packages being detected as malicious in many libraries that were within our scope, so we couldn't have had a better testing scenario.

In fact, we were able to analyse the versions of maratlib, a PyPI package that was deployed for malicious cryptocurrency mining and that impersonated matplotlib, a package commonly used in mathematics.

Running the tool and comparing the two versions, we could clearly see the malicious code segment detected by AppInspector, which was present in only one of the versions uploaded to the library.
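A comparison between versions can start from something as simple as diffing per-file hashes, to narrow down which files deserve a closer look. The sketch below assumes each version has already been reduced to a mapping of file path to hash. The function name `compare_versions` and the input format are hypothetical and do not reflect PackageDNA's internal data model.

```python
def compare_versions(old_hashes, new_hashes):
    """Compare two {path: hash} mappings and report which files differ."""
    old_paths, new_paths = set(old_hashes), set(new_hashes)
    return {
        # Files that only exist in the newer version.
        "added": sorted(new_paths - old_paths),
        # Files that were dropped from the newer version.
        "removed": sorted(old_paths - new_paths),
        # Files present in both versions whose content hash changed.
        "changed": sorted(
            path for path in old_paths & new_paths
            if old_hashes[path] != new_hashes[path]
        ),
    }
```

Files listed under `added` or `changed` are the natural candidates to pass to a static analyser, since code that appears in only one version is exactly what stood out in the maratlib case.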

We can also look at the other packages in the report that were created using typosquatting techniques.

So, with this framework we hope to provide the community of developers and code security analysts with a simple but powerful mechanism to achieve their goals. You can download it for free at https://github.com/telefonica/packagedna and we are open to your comments and contributions to improve the tool.