GTDB-Tk¶
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
Policy¶
GTDB-Tk is open source and released under the GNU General Public License (Version 3).
Citations
The GTDB-Tk team encourage you to cite GTDB-Tk and the third-party dependencies as described in References.
Overview¶
GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB).
It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples.
It can also be applied to isolate and single-cell genomes.
GTDB-Tk at HPC2N¶
At HPC2N, GTDB-Tk is available as a module on Kebnekaise. To see the available versions, log in to Kebnekaise and run ml spider GTDB-Tk.
Usage at HPC2N¶
To use GTDB-Tk, load its module to add it to your environment. To see how to load GTDB-Tk and its prerequisites, run:
ml spider GTDB-Tk
To see how to load a specific version, including its prerequisites, run:
ml spider GTDB-Tk/<version>
The corresponding database location is predefined when the module is loaded: the GTDBTK_DATA_PATH
environment variable points to the default database for the loaded version of GTDB-Tk. The mash_db
file is also pre-created and is most easily referred to by using
Submit file example¶
To use GTDB-Tk in a submit file, we suggest using the following as a base:
#!/bin/bash
#SBATCH -A <your-project-id>
#SBATCH -J <your-job-name>
#SBATCH -t <hh:mm:ss>
#SBATCH -c <number-of-cores-to-use>
ml purge > /dev/null 2>&1 # Clean environment from outside interference
ml foss/2022a GTDB-Tk/2.3.2 # Change these as per instruction from "ml spider GTDB-Tk/required-version"
gtdbtk arguments --cpus $SLURM_CPUS_ON_NODE
Note
The important part of the above submit file is the "--cpus $SLURM_CPUS_ON_NODE" argument, which makes sure gtdbtk runs with the allocated number of cores.
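The submit file relies on two environment variables: SLURM_CPUS_ON_NODE, set by Slurm inside a job, and GTDBTK_DATA_PATH, set when the module is loaded. The sketch below shows one way to guard against either being missing, for example when testing a script outside a job. The helper names resolve_cpus and check_gtdbtk_db are hypothetical and not part of GTDB-Tk or the HPC2N setup, and the fallback of 1 core is an assumption.

```shell
#!/bin/bash
# Sketch only: resolve_cpus and check_gtdbtk_db are hypothetical helper names.

# Print the core count Slurm allocated on this node, or 1 outside a job.
resolve_cpus() {
    echo "${SLURM_CPUS_ON_NODE:-1}"
}

# Succeed only if GTDBTK_DATA_PATH is set and names an existing directory.
check_gtdbtk_db() {
    if [ -z "${GTDBTK_DATA_PATH:-}" ]; then
        echo "GTDBTK_DATA_PATH is not set; load the GTDB-Tk module first" >&2
        return 1
    fi
    if [ ! -d "$GTDBTK_DATA_PATH" ]; then
        echo "GTDBTK_DATA_PATH is not a directory: $GTDBTK_DATA_PATH" >&2
        return 1
    fi
}

# Possible use in the submit file, after the "ml" lines. Left commented out
# so the sketch runs without GTDB-Tk installed:
# check_gtdbtk_db || exit 1
# gtdbtk arguments --cpus "$(resolve_cpus)"
```

Failing early on a missing database path avoids spending queue time on a job that cannot start, while the core-count fallback keeps the same script usable for a quick interactive test.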
Additional info¶
You can find help about running GTDB-Tk with the command gtdbtk -h.
There is also help on the GTDB-Tk homepage.
In addition, the GTDB-Tk documentation lists the available command line options: https://ecogenomics.github.io/GTDBTk/commands/index.html#commands