The ironic state of dependency management in Python (III)

December 26, 2023

The third dragon

When we talked about installing dependencies and projects stepping on each other's toes, we solved it with virtual environments. That erases at a stroke any altercation between neighbors. However, what happens if within the dependencies, two packages have overlapping and incompatible dependencies between them? Well, that's our third problem.

As many of you know, Python has a central repository from which projects pull the dependencies they install. This is typically handled with the command-line tool 'pip'.

The problem with pip is that, historically, it has not stopped one package from stepping on the version requirements of another (since version 20.3 its resolver does detect such conflicts, but untangling them is still left to us). Let's illustrate this. If I have a library named tech.py that requires version 2.0 of the Pydantic package and I also have a dependency that uses Pydantic at version 1.2, I have the same problem I had before with projects sharing the same interpreter.

The difference is that I have the problem between packages of the same project and before I had it between projects.

Additionally, pip will not solve that dependency resolution problem without help from whoever maintains the project. In fact, if we persist and force the installation of incompatible packages, the application will fail at some point during execution, because the functionality that some of the packages expect will not be there.
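To make the tech.py scenario concrete, here is a minimal sketch in plain Python. The list of releases, the name "other-dependency" and the exact constraints are assumptions invented for the illustration, and the toy parser only understands simple dotted numbers; real tools rely on a proper version parser such as the one in the packaging library.

```python
# Hypothetical illustration: the releases and constraints below are made up
# to mirror the tech.py / Pydantic scenario described in the text.

def parse(version: str) -> tuple[int, ...]:
    """Turn '2.0' into (2, 0, 0) so versions compare numerically."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, constraint: str) -> bool:
    """Check one version against a constraint such as '>=1.2,<2.0'."""
    candidate = parse(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        # Two-character operators are tested first so '>=' is not read as '>'.
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                bound = parse(clause[len(op):])
                ok = {
                    ">=": candidate >= bound,
                    "<=": candidate <= bound,
                    "==": candidate == bound,
                    ">": candidate > bound,
                    "<": candidate < bound,
                }[op]
                if not ok:
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def common_versions(available: list[str], requirements: dict[str, str]) -> list[str]:
    """Versions that every requirement accepts at the same time."""
    return [
        version
        for version in available
        if all(satisfies(version, c) for c in requirements.values())
    ]


# Hypothetical Pydantic releases and the constraints each package puts on them.
available = ["1.2.0", "1.10.13", "2.0.0", "2.5.3"]
requirements = {
    "tech.py": ">=2.0",                 # assumed: needs the 2.x API
    "other-dependency": ">=1.2,<2.0",   # assumed: written against the 1.x API
}

print(common_versions(available, requirements))  # [] : no release fits both
```

An empty list is the whole problem in miniature: only one Pydantic can be installed in the environment, and no single release keeps both packages happy.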

Welcome to the world of package and dependency managers for Python

Dependencies are not the only thing we will worry about while developing, testing, or deploying the application. Throughout its life we will want to keep its packages free of vulnerabilities, and that means applying a strict security policy: checking them for known bugs, watching for new releases, and updating our dependencies when those releases arrive.

Be careful, because untangling the web of dependency conflicts is a rather big problem that can throw us into the abyss of computational complexity: in its general form it falls into the NP-hard category.

We have already given examples of dependency conflicts, but within a project this problem can be exacerbated because some packages will depend on a particular version while others exclude the use of that particular version.

The conflict will be assured and often the solution is to adjust the project to tip the balance on one side or the other, with no room for Solomonic decisions.
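A rough way to get a feel for why this gets expensive is to write the most naive resolver possible: try every combination of versions and keep the ones where nobody complains. The sketch below does exactly that over a tiny, entirely hypothetical index (the package names and versions are invented). Real managers use far smarter search strategies, but the search space they face grows in the same multiplicative way.

```python
from itertools import product

# Hypothetical index: package -> version -> {dependency: allowed versions}.
INDEX = {
    "app-lib": {
        "1.0": {"core": {"1.0", "2.0"}},
        "2.0": {"core": {"2.0"}},
    },
    "plugin": {
        "1.0": {"core": {"1.0"}},
        "1.1": {"core": {"1.0"}},  # excludes core 2.0
    },
    "core": {
        "1.0": {},
        "2.0": {},
    },
}


def is_consistent(index, selection):
    """True if every chosen version accepts what was chosen for its dependencies."""
    for package, version in selection.items():
        for dependency, allowed in index[package][version].items():
            if selection[dependency] not in allowed:
                return False
    return True


def resolve(index):
    """Brute force: return (valid selections, number of combinations tried)."""
    names = sorted(index)
    solutions = []
    tried = 0
    for combo in product(*(sorted(index[name]) for name in names)):
        tried += 1
        selection = dict(zip(names, combo))
        if is_consistent(index, selection):
            solutions.append(selection)
    return solutions, tried


solutions, tried = resolve(INDEX)
print(f"combinations tried: {tried}")
for selection in solutions:
    print(selection)
```

With three packages and two versions each there are only 8 combinations, but every extra package multiplies that number. Note also that the only valid selections pin core to 1.0 and app-lib to 1.0: the balance has been tipped towards plugin, and app-lib 2.0 is simply out of reach.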

Well, having described the problem and ruled out pip for the dependency management of a moderately serious project (for small projects, pip can be enough), we need a tool that watches for conflicts, updates packages without breaking compatibility (for example, by relying on semver and respecting the declared version specifications) and even builds and publishes our own Python packages.
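As an illustration of what "updating without compatibility problems" can mean under semver, here is a small sketch. The function names and the version numbers are assumptions made up for the example, and it only handles plain MAJOR.MINOR.PATCH strings.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_safe_update(current: str, candidate: str) -> bool:
    """Under semver, a newer release with the same major number should keep the API.

    Caveat: semver makes no such promise while the major number is 0.
    """
    cur, cand = parse_semver(current), parse_semver(candidate)
    return cand > cur and cand[0] == cur[0]


def best_safe_update(current: str, available: list[str]):
    """Newest release that semver considers compatible, or None if there is none."""
    safe = [version for version in available if is_safe_update(current, version)]
    return max(safe, key=parse_semver, default=None)


# Hypothetical releases of some package we currently have at 2.28.1.
print(best_safe_update("2.28.1", ["2.28.2", "2.31.0", "3.0.0"]))  # 2.31.0
```

The jump to 3.0.0 is skipped on purpose: a new major number is the publisher's way of saying that something may break.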

The problem is not that such tools do not exist, but precisely the opposite. There is such an abundance of dependency managers that it is hard to commit to just one. It is like having free time to watch a series and spending it browsing the whole catalog without ever deciding to "hit play".

Just kidding, let's not exaggerate. There are several managers, and it is difficult to decide on one because they are all competent in their work and the features practically overlap. In the end, the decision comes down more to use and comfort as we try them out than to a rigorous examination and technical test.

A practical graphic example

Here are three of them, with a lot in common but with different tastes and ways of doing things: Poetry, PDM and Rye.

Since they have very common elements, we will see examples in Poetry that are easily adaptable to the other managers.

These types of tools are more than just a package and dependency manager. Poetry, for example, allows us to build the project, package it and publish it in PyPI (the Python repository par excellence).

For the resolution of dependencies it relies heavily on the metadata of the packages published and accessible on PyPI, following the versioning scheme standardized in PEP 440 and not the semver that we have already discussed above.
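The difference between the two schemes is easy to see with a few version strings. The sketch below uses a simplified semver pattern (an assumption for illustration, not the full grammar) and runs it against strings that are all valid under PEP 440.

```python
import re

# Simplified semver: MAJOR.MINOR.PATCH with optional -prerelease and +build parts.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def is_semver(version: str) -> bool:
    """True if the string fits the simplified semver pattern above."""
    return SEMVER.match(version) is not None


# All of these are valid PEP 440 versions: two-part releases, calendar-style
# releases, release candidates, post-releases and epochs.
pep440_versions = ["1.2.3", "1.0", "2023.12", "1.0rc1", "1.0.post1", "1!2.0"]

for version in pep440_versions:
    verdict = "also valid semver" if is_semver(version) else "not valid semver"
    print(f"{version:>10}: {verdict}")
```

Only the first string passes the semver check, which is why a Python dependency manager cannot simply assume that every package on the index follows semver.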

One interesting thing is that when we start a new project it creates a directory and file structure typical in a Python project, saving us some input work. Here is an example:

The "new" command has, at a stroke, given us a directory for the sources (which can optionally be src) and another to encourage us to write the always necessary, but so often postponed, tests.

Let's install a dependency, for example the requests package:

As we can see, it installs the package and its dependencies. Best of all, instead of using the usual requirements.txt (we can still export to that format), it declares the dependency in a pyproject.toml file, a format already defined in the Python standards.

Let's look at the content of this file:

If we observe well, the dependencies are not detailed. This is on purpose. What we have is a "clear" view of what the project needs, but we abstract from the dependencies of each package.

Those third-party dependencies are defined in a "lock" file that we have already seen, but we show it again to see what it looks like after installing a required library:

What we see is the end of the file, as it is dense and difficult to process in a cursory reading. It is more of a record for the package manager to get a snapshot of "working" versions.

Then, when we want to replicate the execution environment in a deployment, there won't be (or shouldn't be) unnecessary conflicts because it is a practically cloned environment.

PDM and Rye operate in a similar way, although Rye takes a different path and even includes the functionality of pyenv; that is to say, it has its own manager of "Pythons" and will install the desired version (if available) and attach it to the project.

The case of Rye is curious because it is designed and implemented by the well-known Armin Ronacher ("Mitsuhiko"), the creator of projects such as Flask, Click and other widely used libraries.

Armin has approached the needs of Python package management from the perspective of Rust, since for some years now he has been a fan of this language, which we have already talked about here on more than one occasion.

Bonus track

When a dependency manager "locks" the packages required by a project, their hashes are kept in a file (usually with the extension .lock) where we can see the list of packages, their dependencies and all the hashes according to the installed version.

This allows the manager to compare the hash of the package being installed with the one recorded in the file and, if they do not match, to report the mismatch.

The protection is obvious: the package may have been altered, either accidentally or as part of a supply chain attack. Mind you, these are not cryptographic signatures. Still, something is in place to check that the package that arrives is the one we expected.
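The check itself is simple enough to sketch with the standard library. Below, a few bytes stand in for a downloaded package file, and the "sha256:<hex>" notation is assumed as the way the hash was recorded; the function names are made up for the example and do not belong to any particular manager.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, locked_hash: str) -> bool:
    """Compare downloaded bytes against a hash recorded as 'sha256:<hex>'."""
    algorithm, _, expected = locked_hash.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return sha256_of(data) == expected


# Stand-in for the file that was hashed when the lock file was written.
original = b"pretend this is the content of a wheel file"
locked = "sha256:" + sha256_of(original)

# Stand-in for a file that changed somewhere between the index and our machine.
tampered = original + b" plus something extra"

print(verify_artifact(original, locked))  # True
print(verify_artifact(tampered, locked))  # False
```

A matching hash tells us the file is the same one that was locked. It says nothing about who published it, which is exactly the gap a cryptographic signature would cover.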

Conclusion

Having a portable execution environment (in the sense of moving it from machine to machine) in Python should not be a penance to Cvstodia. Of course, it is necessary to stumble over all the stones on the road to gain experience.

It is ironic that a language whose motto includes "There should be one-- and preferably only one --obvious way to do it." offers so many options for getting to the same point, and by such a bumpy road.

There is no orthodox way, and we are not trying to be prescriptive here, but the approach described is practical and, as they say, "battle tested". We have used it in several of the projects we lead from the Innovation and Lab area and so far no one has broken the build ;)

—CONTINUING THIS SERIES

The ironic state of dependency management in Python (I)
The ironic state of dependency management in Python (II)