Virtualenv
When working with Python, it is common to depend on many third-party packages. PyPI (the Python Package Index) is the official third-party software repository for Python. You can install packages from PyPI or other sources with pip:
python -m pip install package-name
Warning
Using a plain pip install package-name works in the simplest environments, but it may not have the desired effect in more complicated situations. In particular, with python -m pip you are sure to use the Python version that is currently set up, and not the one the pip executable happens to belong to.
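To see which interpreter a plain pip command would use, you can read the first line of the pip script found on PATH. On Linux and macOS it is normally a shebang line (#!/path/to/python). The helper below is a hypothetical sketch that assumes this layout. With a #!/usr/bin/env shebang it returns the env command, and for very long paths pip may write a /bin/sh wrapper instead, so treat the result as a hint only.

```python
import shutil
import sys
from typing import Optional


def pip_interpreter(path: Optional[str] = None) -> Optional[str]:
    """Return the interpreter named on the shebang line of the first pip on PATH.

    Hypothetical helper: it assumes pip is a script whose first line is "#!...".
    Returns None if no pip executable is found or it has no shebang line.
    """
    pip_path = shutil.which("pip", path=path)
    if pip_path is None:
        return None
    with open(pip_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


if __name__ == "__main__":
    print("python -m pip uses:", sys.executable)
    print("plain pip uses:    ", pip_interpreter())
```

If the two printed interpreters differ, a plain pip install would put packages somewhere other than where your current python looks for them.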
By default pip installs packages into the site-packages directory of the Python installation it runs with, which for the system Python is a system directory. Even if you have permission to write there (e.g. as root on your laptop), this is not recommended, since your system may need specific versions of some packages to work. Solutions such as pip install --user or pip install --target are also not recommended, since they are hard to maintain and make it easy to mix different environments.
The most common solution is to use a virtualenv: an isolated Python environment with a fixed version of Python and its own set of Python packages. Once you have created and activated a virtualenv, packages are installed into it and imported from it instead of from the system site-packages. The good thing is that you can create as many virtualenvs as you like, and they are just folders that you can place in your home directory.
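From inside Python, a common way to tell whether you are running in an environment created by venv (or by recent virtualenv versions) is to compare two values. sys.prefix points to the active environment, and sys.base_prefix points to the installation the environment was created from. A minimal sketch:

```python
import sys


def in_virtualenv() -> bool:
    """True if the running interpreter belongs to a venv/virtualenv.

    Inside an environment, sys.prefix is the environment folder, while
    sys.base_prefix is the Python installation it was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("in a virtualenv:   ", in_virtualenv())
    print("environment prefix:", sys.prefix)
    print("base installation: ", sys.base_prefix)
```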
Warning
Copying a virtualenv folder between machines is very error-prone, since the environment contains absolute paths (for example in the activate script). Recreate the environment instead.
The drawback is that your home directory can easily fill up with many copies of the same packages.
A more advanced alternative is conda. Conda is more general: for example, it can install different versions of Python itself and complete software packages such as ROOT.
On your laptop
Make sure you have Python installed. The following commands create a new virtualenv in your home folder (~) and activate it:
python -m venv ~/my-env # choose a proper name here
source ~/my-env/bin/activate # for bash/zsh; use activate.csh or activate.fish for csh/fish
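The same kind of environment can also be created from Python with the standard-library venv module. The sketch below creates a throwaway environment in a temporary folder and then runs that environment's own interpreter to confirm that it reports the environment as its prefix. It skips pip to stay fast and offline; pass with_pip=True for a usable environment. The folder name demo-env is only an example.

```python
import os
import subprocess
import tempfile
import venv


def create_and_probe(env_dir: str) -> str:
    """Create a venv in env_dir and return the sys.prefix its python reports."""
    # symlinks=True mirrors what `python -m venv` does by default on Linux/macOS
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    env_python = os.path.join(env_dir, bin_dir, "python")
    # Running the environment's interpreter directly uses the environment,
    # even without sourcing the activate script.
    result = subprocess.run(
        [env_python, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "demo-env")
        print("environment prefix:", create_and_probe(env_dir))
```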
After this setup, you can install packages with pip, and they will be installed in the my-env folder, for example:
python -m pip install tqdm
Check that it works:
which python
It should point to the python executable inside your virtualenv (~/my-env/bin/python). Then start the Python console:
python
import tqdm
tqdm.__file__
The printed path should be inside your virtual environment (under ~/my-env).
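The same check can be scripted without importing the module. importlib.util.find_spec returns the file a module would be loaded from, and that path can then be compared with the environment prefix. The helper names below are hypothetical.

```python
import importlib.util
import os
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """File that `import name` would load, or None if not found (also None for namespace packages)."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def is_inside(path: str, prefix: str) -> bool:
    """True if `path` lies inside the directory `prefix` (symlinks resolved)."""
    path, prefix = os.path.realpath(path), os.path.realpath(prefix)
    try:
        return os.path.commonpath([path, prefix]) == prefix
    except ValueError:  # e.g. different drives on Windows
        return False


if __name__ == "__main__":
    for name in ("tqdm", "json"):
        origin = module_origin(name)
        if origin is None:
            print(f"{name}: not found")
        else:
            where = "inside sys.prefix" if is_inside(origin, sys.prefix) else "outside sys.prefix"
            print(f"{name}: {origin} ({where})")
```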
To go back to the original setup, deactivate the virtualenv:
deactivate
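Activation and deactivation only change environment variables of the current shell. Among other things, the activate script sets VIRTUAL_ENV to the environment folder and puts its bin directory first in PATH, which is why which python finds the environment's interpreter. deactivate then restores the previous values. The sketch below is a simplified model of just those two variables, and the function names are hypothetical. The real script also changes the prompt and unsets PYTHONHOME.

```python
import os
import shutil
from typing import Dict


def activate(env: Dict[str, str], venv_dir: str) -> Dict[str, str]:
    """Return a copy of `env` with the venv 'activated' (simplified model)."""
    new_env = dict(env)
    new_env["VIRTUAL_ENV"] = venv_dir
    # Prepend the venv's bin directory so its executables are found first.
    parts = [os.path.join(venv_dir, "bin")]
    if env.get("PATH"):
        parts.append(env["PATH"])
    new_env["PATH"] = os.pathsep.join(parts)
    return new_env


def deactivate(saved_env: Dict[str, str]) -> Dict[str, str]:
    """Return the environment saved before activation (what deactivate restores)."""
    return dict(saved_env)


if __name__ == "__main__":
    before = dict(os.environ)
    after = activate(before, os.path.expanduser("~/my-env"))
    print("python before:", shutil.which("python", path=before.get("PATH")))
    print("python after: ", shutil.which("python", path=after["PATH"]))
```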
Several other tools help you manage multiple virtualenvs or provide similar functionality, such as virtualenvwrapper, conda, mamba, uv, ...
On proof or lxplus
In principle, you can repeat what you have done on your laptop. The problems are:
- Usually you want to install your packages on top of another setup. For example, you first want to set up a recent Python version, since the one provided by the system can be very old.
- virtualenv is not installed. You can install it yourself, but your installation will depend on the setup you did beforehand, which may differ from one session to the next.
Software on tier3s is distributed via cvmfs (look at /cvmfs). Many general (non-experiment-specific) packages are provided by the LCG distributions, which are the same software stacks available on the grid.
First, we need to choose an LCG release from this table, for example LCG105. Clicking on a release shows the available platforms, and we want one compatible with our machine. On proof, uname -a tells us we are using CentOS 7 (el7). LCG packages are precompiled, so we also need to choose the compiler they were built with, for example gcc11. Finally, we choose opt since we want the optimized build.
If you click on one of the platforms of a release, for example https://lcginfo.cern.ch/release_packages/105/x86_64-centos7-gcc11-opt/, you can see the list of packages it provides, for example Python 3.9.12 and ROOT v6.30.02.
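The platform name used on lcginfo, and later by lsetup, is made of four dash-separated parts: architecture, OS, compiler and build type. A small hypothetical helper can assemble it, together with the lcginfo package-list URL, following the pattern of the link above. platform.machine() gives the architecture of the current machine, while the OS and compiler tags still have to be chosen by hand as described.

```python
import platform


def lcg_platform(arch: str, os_tag: str, compiler: str, build: str = "opt") -> str:
    """Hypothetical helper: build a platform string such as x86_64-centos7-gcc11-opt."""
    return "-".join([arch, os_tag, compiler, build])


def lcginfo_packages_url(release: int, plat: str) -> str:
    """Package-list URL, following the lcginfo link pattern shown in the text."""
    return f"https://lcginfo.cern.ch/release_packages/{release}/{plat}/"


if __name__ == "__main__":
    # Architecture of this machine (e.g. x86_64); OS and compiler are chosen by hand.
    plat = lcg_platform(platform.machine(), "centos7", "gcc11")
    print(plat)
    print(lcginfo_packages_url(105, plat))
```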
Creating a virtual environment on top of an LCG release is not straightforward, because LCG makes extensive use of the PYTHONPATH variable. The cvmfs-venv tool helps with that, and instructions on how to use it are on its repository page. In short, for example on a proof machine:
First, you have to download the script:
mkdir -p ~/.local/bin
curl -sL https://raw.githubusercontent.com/matthewfeickert/cvmfs-venv/main/cvmfs-venv.sh -o ~/.local/bin/cvmfs-venv
chmod +x ~/.local/bin/cvmfs-venv # make the script executable
export PATH="${HOME}/.local/bin:${PATH}"
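Both the chmod +x and the PATH steps matter. The shell only runs a command by name if it finds an executable file with that name in one of the PATH directories. Python's shutil.which applies the same rule, which the sketch below demonstrates. A temporary folder and a dummy script stand in for ~/.local/bin and the real cvmfs-venv.

```python
import os
import shutil
import stat
import tempfile
from typing import Optional, Tuple


def which_before_and_after_chmod() -> Tuple[Optional[str], bool]:
    """Show that a file in a PATH directory is only found once it is executable."""
    with tempfile.TemporaryDirectory() as bin_dir:  # stand-in for ~/.local/bin
        script = os.path.join(bin_dir, "cvmfs-venv")  # dummy stand-in script
        with open(script, "w") as f:
            f.write("#!/bin/sh\necho hello\n")
        before = shutil.which("cvmfs-venv", path=bin_dir)  # no execute bit yet
        # Equivalent of `chmod +x`: add execute permission for user, group, others.
        mode = os.stat(script).st_mode
        os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        after = shutil.which("cvmfs-venv", path=bin_dir)
        return before, after is not None


if __name__ == "__main__":
    print(which_before_and_after_chmod())
```

On the real setup, shutil.which("cvmfs-venv") should return the path under ~/.local/bin once both steps are done.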
It is convenient to run the export PATH=... line automatically every time you log in. You can do that by adding it to your ~/.zshrc (or to your ~/.bashrc if you are using bash; check your shell with echo $SHELL).
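If you prefer to script this step, the hypothetical helper below picks the startup file from the shell name in $SHELL (zsh or bash, as described above). It appends the line only if it is not already there, so running it twice does not duplicate it. It writes the $HOME form of the export line, which works the same way in bash and zsh.

```python
import os
from typing import Optional

EXPORT_LINE = 'export PATH="${HOME}/.local/bin:${PATH}"'


def rc_file_for_shell(shell: str, home: str) -> Optional[str]:
    """Return <home>/.zshrc or <home>/.bashrc depending on the shell path (e.g. /bin/zsh)."""
    name = os.path.basename(shell)
    if name in ("zsh", "bash"):
        return os.path.join(home, f".{name}rc")
    return None  # other shells (tcsh, fish, ...) use different files and syntax


def ensure_line(rc_path: str, line: str = EXPORT_LINE) -> bool:
    """Append `line` to rc_path unless it is already present; return True if added."""
    text = ""
    if os.path.exists(rc_path):
        with open(rc_path) as f:
            text = f.read()
    if line in text.splitlines():
        return False
    with open(rc_path, "a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # do not glue our line onto an unterminated last line
        f.write(line + "\n")
    return True


if __name__ == "__main__":
    rc = rc_file_for_shell(os.environ.get("SHELL", ""), os.path.expanduser("~"))
    print("would edit:", rc)  # call ensure_line(rc) yourself to actually change it
```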
Then you have to set up the LCG environment and create the virtualenv:
setupATLAS
lsetup 'views LCG_105 x86_64-centos7-gcc11-opt'
cvmfs-venv venv-lcg-105-centos7-gcc11-opt
Then you can activate it and use it:
source venv-lcg-105-centos7-gcc11-opt/bin/activate
Remember that every time you log in, before using the virtualenv you have to set up the same LCG environment and then activate the virtualenv:
setupATLAS
lsetup 'views LCG_105 x86_64-centos7-gcc11-opt'
source venv-lcg-105-centos7-gcc11-opt/bin/activate
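Since the virtualenv was built against a specific LCG view, the release and platform used at login must match the ones used when it was created. Encoding them in the folder name, as done above, helps with this. A hypothetical helper can also generate the three login lines from a single place so they cannot drift apart. The naming scheme it uses (venv-lcg-<release>-<os>-<compiler>-<build>) simply reproduces the example in the text.

```python
from typing import List


def login_commands(release: int, plat: str) -> List[str]:
    """Hypothetical helper: commands to run at each login for a given LCG view.

    plat is a platform string like "x86_64-centos7-gcc11-opt"; the venv folder
    name follows the example in the text (architecture prefix dropped).
    """
    _arch, rest = plat.split("-", 1)
    venv_name = f"venv-lcg-{release}-{rest}"
    return [
        "setupATLAS",
        f"lsetup 'views LCG_{release} {plat}'",
        f"source {venv_name}/bin/activate",
    ]


if __name__ == "__main__":
    print("\n".join(login_commands(105, "x86_64-centos7-gcc11-opt")))
```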
Then you can install Python packages inside it and use them, for example:
python -m pip install --upgrade tqdm
python
import tqdm
from time import sleep
import numpy as np
for i in tqdm.tqdm(np.arange(200)): # shows a progress bar
    sleep(0.01)

import ROOT
print(ROOT.gROOT.GetVersion())
print(tqdm.__file__)
print(np.__file__)
You should see that tqdm comes from your virtualenv, while numpy is provided by LCG (from cvmfs).
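Some background on why the origin of a package matters here: directories listed in PYTHONPATH are placed in sys.path before the interpreter's default locations, including the site-packages directories where pip installs. This general mechanism is why LCG, which exposes its packages through PYTHONPATH, makes a plain virtualenv hard to use on top of it. The sketch below starts a fresh interpreter with a temporary directory in PYTHONPATH and reports where that directory lands in sys.path. The temporary directory is a stand-in for an LCG path.

```python
import json
import os
import subprocess
import sys
import tempfile
from typing import List, Tuple


def sys_path_with_pythonpath(extra_dir: str) -> List[str]:
    """Return sys.path of a fresh interpreter started with PYTHONPATH=extra_dir."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def rank(paths: List[str], extra_dir: str) -> Tuple[int, List[int]]:
    """Position of extra_dir and positions of site-packages/dist-packages entries."""
    site = [i for i, p in enumerate(paths)
            if os.path.basename(p.rstrip(os.sep)) in ("site-packages", "dist-packages")]
    return paths.index(extra_dir), site


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_lcg_dir:  # stand-in for an LCG path
        extra, site = rank(sys_path_with_pythonpath(fake_lcg_dir), fake_lcg_dir)
        print("PYTHONPATH entry at position", extra)
        print("site-packages entries at positions", site)
```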
If you prefer, you can also install packages on top of the ones provided by LCG. For example, if you want the latest version of numpy, run python -m pip install --upgrade numpy: this installs the latest version in your virtualenv.
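After an upgrade, you can check which version Python picks up and where its files live. One way is importlib.metadata from the standard library (Python 3.8 and later; the LCG release above ships 3.9). The helper name is hypothetical, and numpy is just the example from the text.

```python
import importlib.metadata
from typing import Optional, Tuple


def installed_info(dist_name: str) -> Optional[Tuple[str, str]]:
    """Return (version, install directory) of the first matching distribution on sys.path."""
    try:
        dist = importlib.metadata.distribution(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None
    # For regular installs, locate_file("") is the directory holding the package's
    # metadata, i.e. typically the site-packages folder it was installed into.
    return dist.version, str(dist.locate_file(""))


if __name__ == "__main__":
    print("numpy:", installed_info("numpy"))
```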
Warning
pip is not very good at managing dependencies between packages that are already installed. If you install a package that requires a different version of an already installed package, pip replaces the installed version. This can break other packages that depend on the old version.
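pip can at least detect such conflicts after the fact: python -m pip check reports installed packages whose declared requirements are not satisfied. The sketch below runs it with the current interpreter and returns whether it passed. It is meant to be run after installing or upgrading packages in your virtualenv.

```python
import subprocess
import sys
from typing import Tuple


def pip_check() -> Tuple[bool, str]:
    """Run `python -m pip check` for the current interpreter.

    Returns (ok, report): ok is True when pip finds no broken requirements
    (it is also False if pip itself is not available).
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True, text=True,
    )
    return result.returncode == 0, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    ok, report = pip_check()
    print("consistent" if ok else "problems found:")
    print(report)
```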