SciPy-bundle

This is an EasyBuild module that bundles several Python packages. This page gives an overview of them.

Policy

All the packages that are part of SciPy-bundle are freely available to users at HPC2N.

Citations

See the entry for the specific package.

Usage at HPC2N

At HPC2N, SciPy-bundle is available as a module. The set of included packages differs slightly between versions.

Loading

To use the SciPy-bundle module and the packages included with it, add the module to your environment. You can find versions with

module spider SciPy-bundle 

and you can then find how to load a specific version (including prerequisites), with

module spider SciPy-bundle/<VERSION> 
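Because the included packages differ between versions, it can help to check what a loaded version actually provides. The following is a minimal sketch, assuming the module and its prerequisites are already loaded. The distribution names are taken from the package sections on this page and may not match every installation exactly, and the helper name installed_versions is made up for this illustration.

```python
from importlib import metadata

# Distribution names taken from the package sections on this page (assumed spelling).
PACKAGES = [
    "beniget", "Bottleneck", "deap", "gast", "mpi4py", "mpmath", "numexpr",
    "numpy", "pandas", "ply", "pythran", "scipy", "tzdata", "versioneer",
]


def installed_versions(names):
    """Map each distribution name to its version string, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12s} {version or 'not found'}")
```

A package reported as "not found" is most likely not part of the loaded version, or is provided by a separate module.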

Info on specific packages included

beniget

beniget is a static analyzer for Python code. It extracts semantic information from Python source code without running it.

beniget provides a static over-approximation of the global and local definitions inside a Python module, class, or function. It can also compute def-use chains from each definition.

Loading and using

beniget is available when you have loaded the SciPy-bundle module and its prerequisites.

See the beniget webpage for more information on usage.

Bottleneck

Bottleneck is a collection of fast NumPy array functions written in C.

Bottleneck is available when you have loaded the SciPy-bundle module and its prerequisites.

See the Bottleneck documentation for more information on usage.

deap

DEAP is an evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent, and it works well with parallelisation mechanisms such as multiprocessing and SCOOP. The DEAP documentation presents the key concepts and many of the features you need to build your own evolutions.

Citations

Authors of scientific papers including results generated using DEAP are encouraged to cite the following paper.

@article{DEAP_JMLR2012,
    author    = " F\'elix-Antoine Fortin and Fran\c{c}ois-Michel {De Rainville} and Marc-Andr\'e Gardner and Marc Parizeau and Christian Gagn\'e ",
    title     = { {DEAP}: Evolutionary Algorithms Made Easy },
    pages     = { 2171--2175 },
    volume    = { 13 },
    month     = { jul },
    year      = { 2012 },
    journal   = { Journal of Machine Learning Research }
}

deap is available when you have loaded the SciPy-bundle module and its prerequisites.

For more information about usage, please see the DEAP documentation.

gast

gast is a generic AST that represents the abstract syntax trees (ASTs) of both Python 2 and Python 3.

GAST provides a compatibility layer between the AST of various Python versions, as produced by ast.parse from the standard ast module.

gast is available when you have loaded the SciPy-bundle module and its prerequisites.

For more information about usage, please see the gast documentation.

mpi4py

MPI for Python (mpi4py) provides bindings of the Message Passing Interface (MPI) standard for the Python programming language, allowing any Python program to exploit multiple processors.

  • Versions older than 3.1.4 are included in SciPy-bundle and can be used after loading the specific SciPy-bundle module and its prerequisites.
  • Versions 3.1.4 and newer are available as their own modules. To see how to load and use them, see the local mpi4py documentation page.

For more information about how to use mpi4py, also see the local mpi4py documentation page.

mpmath

mpmath is a free (BSD licensed) Python library for real and complex floating-point arithmetic with arbitrary precision.

Citations

If you use mpmath in your research, please cite it! In BibTeX format, the following entry can be used:

@manual{mpmath,
  key     = {mpmath},
  author  = {The mpmath development team},
  title   = {mpmath: a {P}ython library for arbitrary-precision floating-point arithmetic (version 1.3.0)},
  note    = {{\tt http://mpmath.org/}},
}

This might render as:

  • The mpmath development team. mpmath: a Python library for arbitrary-precision floating-point arithmetic (version 1.3.0), 2023. http://mpmath.org/.

mpmath is available when you have loaded the SciPy-bundle module and its prerequisites.

For more information about how to use mpmath, see the mpmath homepage.
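As a brief illustration of arbitrary-precision arithmetic, the sketch below computes two values to 50 significant digits. It assumes mpmath is importable after loading the module; the variable names are chosen only for this example.

```python
from mpmath import mp, mpf

# Work with 50 significant decimal digits instead of the roughly 16 of a float.
mp.dps = 50

root_two = mp.sqrt(2)
print(root_two)

# Dividing exact integer values is carried out at the working precision.
third = mpf(1) / 3
print(third)
```

The precision is a global setting on mp, so it applies to all subsequent mpmath calculations until it is changed again.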

numexpr

NumExpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like '3*a+4*b') are accelerated and use less memory than doing the same calculation in Python.

In addition, its multi-threaded capabilities can make use of all your cores – which generally results in substantial performance scaling compared to NumPy.

Finally, numexpr can make use of Intel’s VML (Vector Math Library, normally integrated in its Math Kernel Library, or MKL). This allows further acceleration of transcendental expressions.

NumExpr is available when you have loaded the SciPy-bundle module and its prerequisites.
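The expression mentioned above, 3*a+4*b, can be evaluated as in the following minimal sketch. The array sizes and variable names are arbitrary choices for this illustration. The operands are passed explicitly through local_dict so the example does not depend on how the calling scope is looked up.

```python
import numpy as np
import numexpr as ne

a = np.arange(1_000_000, dtype=np.float64)
b = np.ones_like(a)

# numexpr parses the string and evaluates it without building the
# intermediate arrays 3*a and 4*b that plain NumPy would allocate.
result = ne.evaluate("3*a + 4*b", local_dict={"a": a, "b": b})

# The same calculation in plain NumPy, for comparison.
expected = 3 * a + 4 * b
print(np.allclose(result, expected))
```

Whether this is faster than NumPy depends on the array size and the number of threads available to the job, so it is worth timing on your own data.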

More information

numpy

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

  • Powerful N-dimensional arrays
  • Interoperable
  • Numerical computing tools
  • Performant
  • Open source
  • Easy to use

Citations

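If NumPy has been significant in your research, and you would like to acknowledge the project in your academic publication, we suggest citing the following paper:

Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020) Array programming with NumPy. Nature, 585, 357-362. DOI: 10.1038/s41586-020-2649-2.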

NumPy is available when you have loaded the SciPy-bundle module and its prerequisites.

Usage

This is an example of a batch script that can be used to run a Python script that uses NumPy.

#!/bin/bash
# The name of the account you are running in, mandatory.
#SBATCH -A hpc2nXXXX-YYY
# Request resources - here for a serial job
#SBATCH -n 1
# Request runtime for the job (HHH:MM:SS) where 168 hours is the maximum. 
# Here asking for 15 min.
#SBATCH --time=00:15:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job, it could be more or
# less, depending on other package needs. This is for a simple job needing
# numpy. 
module load GCC/13.3.0
module load SciPy-bundle/2024.05

# Running the job
python ./my_numpy_program

Note

  • If you are running a multi-threaded job, you should ask for more cores.
  • If you need mpi4py, you probably need to load it as a separate module. It is only included in older versions of SciPy-bundle.
  • For MPI jobs, you need to run with srun python ./my_mpi_job.
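The batch script above runs a file called my_numpy_program, whose contents are not shown here. As a purely illustrative stand-in, such a program could look like the sketch below, which solves a small linear system. The matrix, seed, and variable names are all assumptions made for this example.

```python
import numpy as np

# Fixed seed so that repeated runs give the same numbers.
rng = np.random.default_rng(seed=0)

# Build a small, strictly diagonally dominant (hence invertible) matrix
# and a random right-hand side.
A = rng.random((4, 4)) + 4 * np.eye(4)
b = rng.random(4)

# Solve A x = b and check how well the solution satisfies the system.
x = np.linalg.solve(A, b)
residual = np.linalg.norm(A @ x - b)

print("solution:", x)
print("residual:", residual)
```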

More information

pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Citations

If you use pandas for a scientific publication, we would appreciate citations to the published software and the following paper:

  • pandas on Zenodo: find the entry for the version you are using on Zenodo and use its citation in place of the example below. You can replace the full author list given there with “The pandas development team”, as in the example.

  @software{reback2020pandas,
      author       = {The pandas development team},
      title        = {pandas-dev/pandas: Pandas},
      month        = feb,
      year         = 2020,
      publisher    = {Zenodo},
      version      = {latest},
      doi          = {10.5281/zenodo.3509134},
      url          = {https://doi.org/10.5281/zenodo.3509134}
  }
  

  @InProceedings{ mckinney-proc-scipy-2010,
    author    = { {W}es {M}c{K}inney },
    title     = { {D}ata {S}tructures for {S}tatistical {C}omputing in {P}ython },
    booktitle = { {P}roceedings of the 9th {P}ython in {S}cience {C}onference },
    pages     = { 56 - 61 },
    year      = { 2010 },
    editor    = { {S}t\'efan van der {W}alt and {J}arrod {M}illman },
    doi       = { 10.25080/Majora-92bf1922-00a }
  }
  

Pandas, like NumPy, has been part of the SciPy-bundle module since 2020.

pandas is available when you have loaded the SciPy-bundle module and its prerequisites.

Usage

This is an example of a batch job for a simple Python script using pandas (print-data-frame.py).

#!/bin/bash
# The name of the account you are running in, mandatory.
#SBATCH -A hpc2nXXXX-YYY
# Request resources - here for a serial job
#SBATCH -n 1
# Request runtime for the job (HHH:MM:SS) where 168 hours is the maximum. 
# Here asking for 15 min.
#SBATCH --time=00:15:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job, it could be more or
# less, depending on other package needs. This is for a simple job needing
# pandas. 
module load GCC/13.3.0
module load SciPy-bundle/2024.05

# Running the job
python print-data-frame.py

The Python script print-data-frame.py contains:

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# Load the data into a DataFrame object
df = pd.DataFrame(data)

print(df)

As usual, you submit the job script with the command sbatch <my-batch-script.sh>.

More information

ply

PLY is yet another implementation of lex and yacc for Python. Notable features are that it is implemented entirely in Python and that it uses LALR(1) parsing, which is efficient and well suited for larger grammars.

PLY provides most of the standard lex/yacc features including support for empty productions, precedence rules, error recovery, and support for ambiguous grammars.

PLY is extremely easy to use and provides very extensive error checking. It is compatible with both Python 2 and Python 3.

PLY is available when you have loaded the SciPy-bundle module and its prerequisites.

pythran

Pythran is an ahead of time compiler for a subset of the Python language, with a focus on scientific computing. It takes a Python module annotated with a few interface descriptions and turns it into a native Python module with the same interface, but (hopefully) faster.

It is meant to efficiently compile scientific programs, and takes advantage of multi-cores and SIMD instruction units.

Up to and including version 0.9.5, Pythran supported both Python 2.7 and Python 3. Newer versions only support Python 3.

Pythran is available when you have loaded the SciPy-bundle module and its prerequisites.
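The "interface descriptions" mentioned above are special comments placed in an ordinary Python file. The sketch below shows one such annotated function. The file name dprod.py and the function itself are illustrative assumptions, and the code also runs unchanged as plain Python.

```python
#pythran export dprod(float list, float list)
def dprod(left, right):
    """Return the dot product of two equally long lists of floats."""
    return sum(x * y for x, y in zip(left, right))


if __name__ == "__main__":
    print(dprod([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Assuming the file is saved as dprod.py, running pythran dprod.py should produce a native extension module that is imported in the same way as the original file. The export comment tells Pythran which argument types to compile the function for.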

More information

scipy

SciPy is a free and open-source Python library used for scientific computing and technical computing.

SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, fast Fourier transform, signal and image processing, ordinary differential equation solvers and other tasks common in science and engineering.

Citations

If SciPy has been significant in your research, and you would like to acknowledge the project in your academic publication, we suggest citing the following paper:

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17(3), 261-272. DOI: 10.1038/s41592-019-0686-2.

For any specific algorithm, also consider citing the original author’s paper (this can often be found under the “References” section of the docstring).

SciPy is available when you have loaded the SciPy-bundle module and its prerequisites.

Usage

This is an example of a batch script for running a serial SciPy Python program:

#!/bin/bash
# The name of the account you are running in, mandatory.
#SBATCH -A hpc2nXXXX-YYY
# Request resources - here for a serial job
#SBATCH -n 1
# Request runtime for the job (HHH:MM:SS) where 168 hours is the maximum. 
# Here asking for 15 min.
#SBATCH --time=00:15:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job, it could be more or
# less, depending on other package needs. This is for a simple job needing
# scipy. 
module load GCC/13.3.0
module load SciPy-bundle/2024.05

# Running the job
python ./my_scipy_program
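The file my_scipy_program run by the batch script above is not shown. As a hypothetical example of what a small serial SciPy program could contain, the sketch below uses two of the areas listed earlier, integration and optimization. The functions being integrated and minimized are arbitrary choices for this illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate exp(-x^2) over the whole real line.
# The exact value is sqrt(pi), which makes the result easy to check.
value, error_estimate = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print("integral:", value, "exact:", np.sqrt(np.pi))

# Find the minimum of the one-dimensional function (x - 2)^2 + 1.
minimum = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print("minimum at x =", minimum.x, "with value", minimum.fun)
```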

More information

tzdata

This is a Python package containing zic-compiled binaries for the IANA time zone database. It is intended to be a fallback for systems that do not have system time zone data installed (or don’t have it installed in a standard location), as a part of PEP 615.

tzdata is available when you have loaded the SciPy-bundle module and its prerequisites.
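tzdata is normally not used directly. The standard library module zoneinfo (Python 3.9 and newer) looks for system time zone data first and falls back to the tzdata package if none is found. The sketch below shows such a lookup; the helper name utc_offset and the chosen zone and date are assumptions made for this illustration.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def utc_offset(zone_name, when):
    """Return the UTC offset of zone_name at local time `when`.

    Returns None if neither the system nor the tzdata package
    provides data for the requested zone.
    """
    try:
        zone = ZoneInfo(zone_name)
    except ZoneInfoNotFoundError:
        return None
    return when.replace(tzinfo=zone).utcoffset()


print(utc_offset("Europe/Stockholm", datetime(2024, 1, 15, 12, 0)))
```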

More information

versioneer

Versioneer is a tool to automatically update version strings (in setup.py and the conventional ‘from PROJECT import _version’ pattern) by asking your version-control system about the current tree.

Versioneer is available when you have loaded the SciPy-bundle module and its prerequisites.