GTDB-Tk

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).

Policy

GTDB-Tk is open source and released under the GNU General Public License (Version 3).

Citations

The GTDB-Tk team encourages you to cite GTDB-Tk and its third-party dependencies as described in References.

Overview

GTDB-Tk is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes.

GTDB-Tk at HPC2N

At HPC2N, GTDB-Tk is available as a module on Kebnekaise. To see the available versions, log in to Kebnekaise and run ml spider GTDB-Tk.

Usage at HPC2N

To use GTDB-Tk, load its module to add it to your environment. The following command shows how to load GTDB-Tk and its prerequisites:

ml spider GTDB-Tk

To see how to load a specific version, including its prerequisites, run:

ml spider GTDB-Tk/<version> 

The database location is predefined when the module is loaded: the GTDBTK_DATA_PATH environment variable points to the default database for the loaded version of GTDB-Tk. The mash_db file is also pre-created and is most easily referred to with

--mash_db $GTDBTK_DATA_PATH-mash_db
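As an illustration, the flag above might be combined with the classify_wf workflow as sketched below. This is only a dry run that prints the command instead of executing it. The helper name build_classify_cmd and the directory names my_genomes and classify_out are hypothetical placeholders, and the exact options should be checked against gtdbtk classify_wf -h for the version you have loaded.

```shell
#!/bin/bash
# Sketch: assemble a classify_wf call that uses the pre-created Mash database.
# build_classify_cmd, my_genomes and classify_out are hypothetical names.

build_classify_cmd() {
    local genome_dir="$1"
    local out_dir="$2"
    local cpus="${3:-1}"
    # The module sets GTDBTK_DATA_PATH; the Mash database path is derived from it.
    echo gtdbtk classify_wf \
        --genome_dir "$genome_dir" \
        --out_dir "$out_dir" \
        --mash_db "${GTDBTK_DATA_PATH}-mash_db" \
        --cpus "$cpus"
}

# Dry run: print the command for inspection rather than running it.
build_classify_cmd my_genomes classify_out 4
```

Once the printed command looks right, it can be pasted into a submit file in place of the dry run.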

Submit file example

To use GTDB-Tk in a submit file, we suggest using the following as a base:

#!/bin/bash
#SBATCH -A <your-project-id>
#SBATCH -J <your-job-name>
#SBATCH -t <hh:mm:ss>
#SBATCH -c <number-of-cores-to-use>

ml purge > /dev/null 2>&1  # Clean environment from outside interference
ml foss/2022a GTDB-Tk/2.3.2  # Change these as per instruction from "ml spider GTDB-Tk/required-version"

gtdbtk <arguments> --cpus $SLURM_CPUS_ON_NODE

Note

The important part of the above submit file is the "--cpus $SLURM_CPUS_ON_NODE" argument, which makes sure gtdbtk runs with the allocated number of cores.
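When the same script is also tested outside a batch job, SLURM_CPUS_ON_NODE is not set and the --cpus value would be empty. One possible way to guard against that is sketched below, assuming a fallback of a single core is acceptable. The helper name pick_cpus is hypothetical and not part of GTDB-Tk or Slurm.

```shell
#!/bin/bash
# Sketch: take the core count from Slurm, or fall back to 1 outside a job.
# pick_cpus is a hypothetical helper name.

pick_cpus() {
    # ${VAR:-1} expands to 1 when VAR is unset or empty.
    echo "${SLURM_CPUS_ON_NODE:-1}"
}

# Dry run: show the flag that would be passed to gtdbtk.
echo "--cpus $(pick_cpus)"
```

Inside a job the Slurm value is used unchanged, so the behaviour of the submit file above is not affected.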

Additional info

You can find help about running GTDB-Tk with the command gtdbtk -h.

There is also help on the GTDB-Tk homepage.

In addition, the GTDB-Tk developers provide a list of command line options here: https://ecogenomics.github.io/GTDBTk/commands/index.html#commands