PyInstaller is a piece of software that attempts to bundle up a Python application and all its dependencies into a single executable. It's an excellent tool that makes deployments (especially to end users) much easier. It is capable of bundling not only a project's Python dependencies, but its C/C++ library dependencies as well.
This is great for any Python project that wants that self-contained, single-executable experience that compiled languages can offer. It is also very useful for proprietary projects that don't want to deploy their code directly to users. However, PyInstaller is a very aggressive packaging tool and pays no mind to the licenses of the libraries it packages up. This poses a significant challenge if your project does not use a GPL-compatible license, because you may be violating the licenses of your LGPL or GPL dependencies.
This post will take a look at strategies we have used to identify and either externalize or eliminate LGPL and GPL dependencies in a proprietary PyInstaller project.
Please note that I am in no way a lawyer and it's the responsibility of each company or individual to do their due diligence when it comes to license compliance. This is not legal advice.
Chances are that your project has a requirements.txt or setup.py file that specifies your Python dependencies. Looking at the licenses for these projects is a good start, but make sure that you also consult the licenses of your transitive dependencies. Transitive dependencies are the dependencies of your dependencies, and PyInstaller will package those alongside your direct dependencies.
You can use pip show to see what a package requires; running it again on each of those requirements walks the rest of the tree. In the following example, I'm taking a look at the dependencies of the falcon package.
pip show falcon
This command outputs the following:
Name: falcon
Version: 1.4.1
Summary: An unladen web framework for building APIs and app backends.
Home-page: http://falconframework.org
Author: Kurt Griffiths
Author-email: mail@kgriffs.com
License: Apache 2.0
Location: /home/velovix/.virtualenvs/test/lib/python3.6/site-packages
Requires: six, python-mimeparse
Required-by:
If we take a look at the “Requires” section of this output, we can see that falcon depends on the six and python-mimeparse packages. We need to make sure we're adhering to the licenses of these packages as well.
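Running pip show by hand gets tedious for larger trees. The following is a minimal sketch of automating the walk with the standard library's importlib.metadata, assuming Python 3.8 or newer and that the packages are installed in the current environment. The function names requirement_name and transitive_licenses are made up for this example, and the "License" metadata field is often empty or imprecise, so treat the output as a starting point rather than an answer.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare distribution name from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0) if match else ""


def transitive_licenses(package):
    """Map a package and everything it requires to its declared license."""
    found = {}
    pending = [package]
    while pending:
        name = pending.pop()
        key = name.lower().replace("_", "-")
        if not key or key in found:
            continue
        try:
            dist = metadata.distribution(name)
        except metadata.PackageNotFoundError:
            found[key] = "<not installed>"
            continue
        found[key] = dist.metadata.get("License") or "<not declared>"
        for requirement in dist.requires or []:
            # Skip optional extras; they are only installed on request
            _, _, marker = requirement.partition(";")
            if "extra" in marker:
                continue
            pending.append(requirement_name(requirement))
    return found


if __name__ == "__main__":
    for name, license_name in sorted(transitive_licenses("falcon").items()):
        print(f"{name}: {license_name}")
```

Anything reported as not declared still needs to be checked by hand against the project's repository.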
Identifying C/C++ dependencies is less straightforward. It's not always immediately clear what C/C++ libraries your project depends on and which ones PyInstaller finds and bundles up for you. The best way I've found to know for sure is to inspect the directory that a PyInstalled executable creates at runtime.
When you run an executable made with PyInstaller, it starts by unpacking almost everything that it is bundled with into a temporary directory. On Linux, this directory is located in /tmp. The actual name of this directory is random but it always starts with the _MEI prefix. As long as you're not running multiple PyInstaller executables, this should be enough information to find the directory in question.
Start by running your program like normal.
./MyCoolExecutable
While it's running, simply inspect the temporary directory that PyInstaller has created. We're looking for .so files, which are shared library files containing compiled C/C++ code. Using the names of these files, it is usually relatively straightforward to find their corresponding project and license.
Depending on the size of your project, you may find quite a few shared library files in this directory. My project contained more than 50. It may take quite a while to sift through all these, but nobody said license compliance was easy.
When looking through these libraries, take note of the names of library files that are GPL or LGPL licensed.
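Listing the libraries can be scripted. This is a rough sketch, assuming the bundle unpacks into the system temporary directory under a name starting with _MEI as described above. The helper name find_shared_libraries is made up for this example, and it also matches versioned file names such as libfoo.so.1.

```python
import tempfile
from pathlib import Path


def find_shared_libraries(root):
    """Return sorted relative paths of shared libraries below root."""
    root = Path(root)
    libraries = []
    for path in root.rglob("*"):
        # Match both plain (libfoo.so) and versioned (libfoo.so.1) names
        if path.is_file() and (path.suffix == ".so" or ".so." in path.name):
            libraries.append(str(path.relative_to(root)))
    return sorted(libraries)


if __name__ == "__main__":
    # Run this while the PyInstaller executable is still running
    for bundle_dir in Path(tempfile.gettempdir()).glob("_MEI*"):
        print(bundle_dir)
        for library in find_shared_libraries(bundle_dir):
            print("   ", library)
```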
If your product is not using a GPL-compatible license, you will not be able to use a GPL licensed library. Any code that is licensed under the GPL must be removed from your executable before deployment, and you will not be able to link to an external GPL library either. It should be easy to avoid GPL Python libraries by simply uninstalling any GPL-licensed packages from your environment with pip. C/C++ libraries are not always so straightforward.
If you find a library file that is GPL licensed in your project, you'll need to identify where this dependency comes from and remove it. This can be straightforward in circumstances where a Python package you're using directly depends on a GPL library. If that's not the case, it's likely that the GPL library is a transitive dependency of another C/C++ library you depend on.
C/C++ library files can declare their dependencies on other library files by linking to them. PyInstaller uses this linking information to figure out which library files need to be included in the executable. We can inspect this linking information by running the ldd command on a library file. The following command will run ldd on every library file in your executable's temporary directory. You can then inspect the output of this command to find why an offending GPL dependency is being pulled into your build.
find <your /tmp/_MEIxxxx directory> -type f -name "*.so*" -exec echo {} \; -exec ldd {} \;
Here is an example of the kind of output this command generates.
...
./xtables/libip6t_hbh.so
    linux-vdso.so.1 (0x00007ffe79d56000)
    libxtables.so.12 => /usr/lib/libxtables.so.12 (0x00007f4683acc000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f4683908000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f4683903000)
    /usr/lib64/ld-linux-x86-64.so.2 (0x00007f4683b1f000)
./xtables/libip6t_srh.so
    linux-vdso.so.1 (0x00007fff799ad000)
    libxtables.so.12 => /usr/lib/libxtables.so.12 (0x00007f4c7e29b000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f4c7e0d7000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f4c7e0d2000)
    /usr/lib64/ld-linux-x86-64.so.2 (0x00007f4c7e2ef000)
./xtables/libxt_SECMARK.so
    linux-vdso.so.1 (0x00007ffe7a8c2000)
    libxtables.so.12 => /usr/lib/libxtables.so.12 (0x00007ff02efb3000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007ff02edef000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007ff02edea000)
    /usr/lib64/ld-linux-x86-64.so.2 (0x00007ff02f006000)
...
The path to each library file is printed, then every library file it links to is printed under it. Searching through this output for the name of the GPL library file you've identified should be sufficient to find out where it came from.
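With dozens of libraries, searching that output by eye is error-prone. Here is a small sketch that parses output in the shape shown above and reports which files link against a given library. The name find_dependents is made up for this example, and the parser assumes dependency lines either are indented, contain "=>", or end in a load address, which matches the sample but may not cover every ldd variant.

```python
import re

# Dependency lines end with a load address such as "(0x00007f4683acc000)"
_ADDRESS = re.compile(r"\(0x[0-9a-fA-F]+\)\s*$")


def find_dependents(ldd_output, needle):
    """Return the files whose ldd listing mentions needle."""
    dependents = []
    current = None
    for raw_line in ldd_output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        is_dependency = (
            raw_line[:1].isspace()
            or "=>" in line
            or _ADDRESS.search(line) is not None
        )
        if not is_dependency:
            # A bare path printed by "echo" starts a new library's listing
            current = line
            continue
        dependency_name = line.split()[0]
        if current and needle in dependency_name and current not in dependents:
            dependents.append(current)
    return dependents
```

Saving the output of the find command to a file and passing its contents to find_dependents along with a name like "libreadline" would then list the bundled files that pull that library in.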
If you're creating a build on Linux, you're very likely to find library files for GNU Readline. Readline is a library that provides utilities for interactive command line applications and Python provides bindings to GNU Readline by default in the standard library. Here's the problem: GNU Readline is GPL licensed.
Since the readline module is part of the standard library, it will be included regardless of whether you actually use the bindings. The only option I'm aware of to avoid this is to compile the Python interpreter yourself in an environment where the GNU Readline library is not installed. This will create a Python interpreter without the readline module available.
python3 -c "import readline"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'readline'
It's worth mentioning that Python's readline bindings do technically support linking to editline, a mostly API-compatible alternative to GNU Readline with a more permissive license. In fact, Python uses this by default on macOS, where GNU Readline is not commonly installed. However, at the time of this writing, support for editline is really only designed for macOS and I don't know of a way to have the interpreter use editline on Linux. The bright side is that if your only target is macOS, you don't have to worry about this!
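To check which situation an interpreter is in, something like the following sketch can help. It relies on the check described in the Python documentation for the readline module, namely that the module's docstring mentions "libedit" when editline is in use. The function name readline_backend is made up for this example, and it only describes the interpreter that runs it, so run it with the same interpreter you build your executable from.

```python
import importlib.util


def readline_backend():
    """Report which line-editing library backs the readline module."""
    if importlib.util.find_spec("readline") is None:
        return "absent"
    import readline

    # The module docstring mentions libedit when the interpreter was
    # built against editline instead of GNU Readline
    if "libedit" in (readline.__doc__ or ""):
        return "editline"
    return "gnu-readline"


if __name__ == "__main__":
    print("readline backend:", readline_backend())
```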
The LGPL license is complicated and may or may not be an option for your proprietary application. Some companies and individuals choose not to use LGPL licensed code in their proprietary applications at all due to concerns with how the license may be interpreted. Like every other legal decision, I will leave this to the discretion of the reader.
If you decide to use LGPL code in your product, you must allow users to swap out that LGPL component with a modified version. The following is the “Combined Works” excerpt from the LGPL 3.0 regarding this.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.
e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)
If your project distributes its source to the customer, complying with this aspect of LGPL is straightforward. Simply include with your distribution instructions on how to rebuild the executable with PyInstaller. Then the user may install their own modified versions of any LGPL dependency and PyInstaller will create an executable with that version instead. For closed-source projects, things get more complicated.
Our goal in this section is to filter LGPL C/C++ libraries out of the executable and put them into a separate lib folder. This folder will be deployed alongside the executable. End users who wish to swap out the program's LGPL dependencies with their own modified version may simply compile the modified version into a shared library file and replace the one included in the deployment. We can accomplish this by hooking into PyInstaller's library inclusion system.
As you probably already know, we can instruct PyInstaller on how we want our application built by pointing it to a .spec file. A simple build.spec might look something like this:
# -*- mode: python -*-
analysis = Analysis(['./my_main_file.py'],
                    binaries=[],
                    datas=[],
                    hiddenimports=[],
                    hookspath=[],
                    runtime_hooks=[],
                    excludes=[],
                    win_no_prefer_redirects=False,
                    win_private_assemblies=False,
                    cipher=None)
pyz = PYZ(analysis.pure, analysis.zipped_data, cipher=None)
elf = EXE(pyz,
          analysis.scripts,
          analysis.binaries,
          analysis.zipfiles,
          analysis.datas,
          name='MyCoolExecutable',
          debug=False,
          strip=False,
          upx=True,
          console=True)
When you create a new Analysis object, PyInstaller does the work of identifying your dependencies. The binaries attribute of the resulting object is a list of tuples in the format (filename, full_path, type). These tuples represent all of the shared library files that PyInstaller will put in your resulting executable. We may simply filter out the libraries we want externalized from this list. The result might look something like this:
# -*- mode: python -*-
from shutil import copyfile
from pathlib import Path

analysis = Analysis(['./my_main_file.py'],
                    binaries=[],
                    datas=[],
                    hiddenimports=[],
                    hookspath=[],
                    runtime_hooks=[],
                    excludes=[],
                    win_no_prefer_redirects=False,
                    win_private_assemblies=False,
                    cipher=None)

# Name fragments that identify the LGPL libraries we want to externalize
lgpl_libraries = ["libavcodec", "libavformat"]

def is_lgpl(filename):
    for lgpl_lib in lgpl_libraries:
        if lgpl_lib in filename:
            return True
    return False

# Make sure the output directory exists before copying into it
Path("lib").mkdir(exist_ok=True)

filtered_binaries = []
for binary in analysis.binaries:
    filename, path, type_ = binary
    if is_lgpl(filename):
        print("Externalizing library:", filename)
        copyfile(path, Path("lib", filename))
    else:
        filtered_binaries.append(binary)

pyz = PYZ(analysis.pure, analysis.zipped_data, cipher=None)
elf = EXE(pyz,
          analysis.scripts,
          filtered_binaries,
          analysis.zipfiles,
          analysis.datas,
          name='MyCoolExecutable',
          debug=False,
          strip=False,
          upx=True,
          console=True)
In the above example, we keep a list of known LGPL libraries and filter out the binaries field of any libraries that look like them. When we find a match, we copy the file to a lib directory and exclude it from the bundling process. The resulting filtered list of non-LGPL libraries is fed into the EXE constructor.
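Because a spec file only runs inside PyInstaller, the filtering logic is awkward to test in place. One option, sketched below under the assumption that entries keep the (filename, full_path, type) shape described above, is to pull the decision into a plain function. The name partition_binaries is made up for this example, and it matches on the start of the file's base name rather than on any substring, which is a slightly stricter rule than the spec file uses.

```python
import os


def partition_binaries(binaries, lgpl_names):
    """Split (filename, full_path, type) entries into kept and externalized."""
    kept, externalized = [], []
    for entry in binaries:
        base_name = os.path.basename(entry[0])
        if any(base_name.startswith(name) for name in lgpl_names):
            externalized.append(entry)
        else:
            kept.append(entry)
    return kept, externalized
```

The spec file could then import this function, copy everything in the externalized list to the lib directory, and pass the kept list to EXE.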
Now that these library files are no longer managed by PyInstaller, we need to make sure the dynamic linker can find them at runtime. There are a few ways to accomplish this, but we opted to deploy a script alongside the executable that modifies the LD_LIBRARY_PATH variable to include the lib directory we created.
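#!/usr/bin/env bash
# Resolve paths relative to this script so it works from any working directory
here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
LD_LIBRARY_PATH="$here/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" exec "$here/MyCoolExecutable" "$@"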
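If you launch the executable from Python instead, for example from a test harness, the same environment tweak can be sketched as follows. The helper name with_library_dir is made up for this example. It returns a modified copy of the environment rather than changing the current process.

```python
def with_library_dir(environ, lib_dir):
    """Return a copy of environ with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(environ)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Skip empty parts: an empty entry in LD_LIBRARY_PATH makes the
    # dynamic linker search the current working directory as well
    env["LD_LIBRARY_PATH"] = ":".join(part for part in (lib_dir, existing) if part)
    return env
```

The result can be passed as the env argument of subprocess.run, along the lines of subprocess.run(["./MyCoolExecutable"], env=with_library_dir(os.environ, "./lib")).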
PyInstaller does a lot in the background to make its Python dependency management feel as seamless as it does. We need to take a moment to understand how Python looks for modules and what PyInstaller does to change the interpreter's default behavior in this respect before going forward and making modifications of our own.
Note that these explanations are probably an oversimplification of what's actually going on behind the scenes. I'm not an expert on this topic.
sys.path Variable
When you import a module, the interpreter looks in a variety of places to find the corresponding Python file. It consults a list of directories one-by-one until it finds a matching module. That list of directories is sys.path. The sys.path variable is very similar in spirit to the LD_LIBRARY_PATH environment variable that C/C++ developers are all too familiar with. It's also a bit like the regular old PATH variable from Bash. They're all simply lists of paths that instruct a program on where to look for something.
By default, the first entry in sys.path points to the same directory as the Python script that you're running. The other entries depend on your installation and environment. One of the entries will point to the location of the standard library, for instance. If you're running in a virtual environment, there will be entries pointing to where in that virtual environment your dependencies are stored. Basically, if you have a Python package installed somewhere and you are able to import it, it's likely because there's a sys.path entry pointing to that location.
For more information on sys.path, take a look at the official documentation.
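To make this concrete, here is a small sketch that imports a module from an arbitrary directory by temporarily adding that directory to sys.path. The helper import_from_directory and the module name hello_from_tmp are made up for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(module_name, directory):
    """Import module_name after putting directory at the front of sys.path."""
    directory = str(directory)
    sys.path.insert(0, directory)
    try:
        # Make sure freshly created files are noticed by the import system
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # Restore sys.path; the imported module stays cached in sys.modules
        sys.path.remove(directory)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # hello_from_tmp is a throwaway module created just for this demo
        Path(tmp, "hello_from_tmp.py").write_text("GREETING = 'hi'\n")
        module = import_from_directory("hello_from_tmp", tmp)
        print(module.GREETING, "loaded from", module.__file__)
```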
Keep in mind though that everything I've mentioned so far are just the interpreter's defaults. Like so many other things in Python, this system can be completely customized. This brings us to our next topic.
sys.meta_path Variable
For simple cases, editing sys.path is usually all you need to do to allow Python to find your dependencies, but if you want to get into the nitty gritty and completely change how Python finds modules, sys.meta_path is the tool to reach for.
The sys.meta_path variable is a list of “meta path finders”. A meta path finder is an object that can find a module from the information in an import statement. These objects expose a few key methods that the interpreter calls to access this functionality. The built-in meta path finders consult sys.path to do this work, but a custom meta path finder could use any method it likes in order to find modules.
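One way to watch this machinery at work is to install a finder that never finds anything and only records what it was asked for. This is a sketch for exploration only, and the names RecordingFinder and names_looked_up are made up for this example.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder


class RecordingFinder(MetaPathFinder):
    """A finder that declines every request but remembers the names."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        # Returning None hands the request to the next finder in the list
        return None


def names_looked_up(module_name):
    """Import module_name and return the names the interpreter searched for."""
    finder = RecordingFinder()
    sys.meta_path.insert(0, finder)
    try:
        importlib.import_module(module_name)
    except ImportError:
        pass
    finally:
        sys.meta_path.remove(finder)
    return finder.seen
```

Modules that are already cached in sys.modules never reach the finders, so importing something that was loaded earlier records nothing.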
PyInstaller injects its own custom objects into sys.meta_path so that the modules embedded in the executable can be found by the interpreter. These meta path finders do not consult sys.path. If we want to add our own custom module location, editing sys.path will not be enough. Instead, we need to create a meta path finder of our own and inject it into sys.meta_path!
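A quick way to see what you are dealing with at runtime is sketched below. It uses the sys.frozen and sys._MEIPASS attributes that PyInstaller documents for bundled applications. The helper names pyinstaller_bundle_dir and describe_meta_path are made up for this example.

```python
import sys


def pyinstaller_bundle_dir():
    """Return the unpacked bundle directory, or None when not frozen."""
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return sys._MEIPASS
    return None


def describe_meta_path():
    """Return the names of the finders currently in sys.meta_path."""
    # The built-in finders are classes; custom ones are usually instances
    return [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]


if __name__ == "__main__":
    print("bundle directory:", pyinstaller_bundle_dir())
    for name in describe_meta_path():
        print("finder:", name)
```

Running this both as a plain script and inside a built executable shows how the finder list differs between the two.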
By the way, if you want to learn more about how PyInstaller does what it does, take a look at their excellent documentation.
With all this background in mind, let's circle back to the original problem and demonstrate how our newfound knowledge will help us solve it.
Say you're creating a proprietary application that depends on python-vlc. This package is LGPL licensed, so we have to give users a way to substitute in their own modified version. To do this, we're going to allow the user to set an optional environment variable called PYTHON_VLC_PATH, whose value is a path leading to the user's custom python-vlc implementation. If this environment variable is set, we will import from that location instead of our prepackaged version.
As we mentioned earlier, we're going to have to write a custom meta path finder to accomplish this. The only required method for a meta path finder to implement is find_spec. With this in mind, let's create a meta path finder that can import a module given its name and the path to its implementation.
import os
from importlib.abc import MetaPathFinder
from importlib.machinery import ModuleSpec, SourceFileLoader

class LGPLFinder(MetaPathFinder):
    def __init__(self, module_name, custom_location):
        self.module_name = module_name
        self.custom_location = custom_location

    def find_spec(self, fullname, path, target=None):
        if fullname == self.module_name:
            # Expect <custom_location>/<module_name>.py to exist
            source = os.path.join(self.custom_location,
                                  self.module_name + ".py")
            loader = SourceFileLoader(self.module_name, source)
            return ModuleSpec(self.module_name, loader)
        else:
            # Some other module, let other importers handle it
            return None
The class is fairly simple. You can see that we only try to import the module if its fullname is equal to the module we're tasked with importing. Otherwise, we return None, which tells the interpreter to consult the next meta path finder in sys.meta_path. If it is our module of interest, we create a path pointing to a Python file with the same name as the module in the custom location provided earlier. We put this and the module's name into a SourceFileLoader, a class that handles the details of how to load a Python file into a module. What we return is a ModuleSpec, a class that describes import-related information about a module.
I would definitely recommend reading up on these classes if you're curious about the details. See the documentation for SourceFileLoader and ModuleSpec.
Now that we have this custom meta path finder, let's take a look at how we would inject it into sys.meta_path.
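import os
import sys

python_vlc_path = os.environ.get("PYTHON_VLC_PATH", None)
if python_vlc_path is not None:
    finder = LGPLFinder("vlc", python_vlc_path)
    sys.meta_path.insert(0, finder)

import vlc

if python_vlc_path is not None:
    # Only remove the finder if we inserted it above
    sys.meta_path.remove(finder)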
Once again, the code is surprisingly simple. We check to see if the user has set the environment variable. If they have, we create a new LGPLFinder for importing python-vlc. Before importing, we insert our finder at the very beginning of sys.meta_path, ensuring it is the first meta path finder that gets run. Then, we import python-vlc and remove the meta path finder from sys.meta_path, since we no longer need it.
One important thing to know is that this code must run before your LGPL dependency is imported anywhere else. This is because Python caches every imported module in sys.modules, so modules are effectively singletons. No matter how many times a module is imported in a project, its initialization only happens once, and we need to make sure that happens while our custom meta path finder is active. The benefit is that your LGPL dependency can be imported normally everywhere else in your project after this initial import.
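The caching behavior is easy to verify outside of PyInstaller. The sketch below is a variation on the finder above, not the exact class from this post: it builds the spec with importlib.util.spec_from_file_location, which also fills in the module's __file__. The names OverrideFinder and import_with_override are made up for this example.

```python
import importlib
import importlib.util
import os
import sys
from importlib.abc import MetaPathFinder


class OverrideFinder(MetaPathFinder):
    """Illustrative variant of LGPLFinder using spec_from_file_location."""

    def __init__(self, module_name, custom_location):
        self.module_name = module_name
        self.custom_location = custom_location

    def find_spec(self, fullname, path, target=None):
        if fullname != self.module_name:
            # Some other module, let other finders handle it
            return None
        source = os.path.join(self.custom_location, fullname + ".py")
        # Also records the file location, so module.__file__ is populated
        return importlib.util.spec_from_file_location(fullname, source)


def import_with_override(module_name, custom_location):
    """Import module_name from custom_location, then remove the finder."""
    finder = OverrideFinder(module_name, custom_location)
    sys.meta_path.insert(0, finder)
    try:
        return importlib.import_module(module_name)
    finally:
        sys.meta_path.remove(finder)
```

After import_with_override returns, a plain import of the same name anywhere else in the program yields the very same module object, because the lookup is answered from sys.modules before any finder is consulted.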
Achieving full license compliance is difficult and PyInstaller doesn't make it any easier. However, we've demonstrated that it's still possible with enough diligence and some knowledge about PyInstaller's inner workings.