File Systems

This section provides information about the file systems at HPC2N. Since your home directory is quite small by default, you should keep files needed for your jobs in your project storage.

There are also instructions on how to compress and archive your data and other files.

The section ‘File transfer’ gives some information about how to do file transfers here at HPC2N.

Overview

                                 Project storage             $HOME                  /scratch
Recommended for batch jobs       Yes                         No (size)              Yes
Backed up                        No                          Yes                    No
Accessible by the batch system   Yes                         Yes                    Yes (node only)
Performance                      High                        High                   Medium
Default readability              Group only                  Owner                  Owner
Permission management            chmod, chgrp, ACL           chmod, chgrp, ACL      N/A for batch jobs
Notes                            This is the storage your    Your home directory    Per node
                                 group gets allocated
                                 through the storage
                                 projects

$HOME

This is your home directory (pointed to by the $HOME variable). It has a quota limit of 25 GB by default. Your home directory is backed up regularly.

Since the home directory is quite small, it should not be used for most production jobs. These should instead be run from project storage directories.

To find the path to your home directory, either run pwd just after logging in, or do the following:

b-an01 [~/store]$ cd
b-an01 [~]$ pwd
/home/u/username
b-an01 [~]$

It is generally not possible to get more space in your home directory; use project storage instead. If you need more project storage, the PI of your project should apply for it.

However, if you really need more space in your home directory, have your PI contact support@hpc2n.umu.se and include a good explanation of what you need the extra space for.

Project storage

Project storage is where a project’s members have the majority of their storage. It is applied for through SUPR as a storage project. While storage projects need to be applied for separately, they are usually linked to a compute project.

This is where you should keep your data and run your batch jobs from. It offers high performance when accessed from the nodes, which makes it suitable for data that is accessed from parallel jobs, and it is much larger than your home directory (which usually has too little space).

Project storage is located below /proj/nobackup/ in a directory with the name selected during the creation of the proposal.
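
A minimal sketch of how you might move to your project storage and set up a working area; the directory name "myproject" is just an example, use the name chosen in your proposal:

$ cd /proj/nobackup/myproject
$ mkdir -p $USER                                     # optional: a personal subdirectory for your own files
$ ln -s /proj/nobackup/myproject/$USER ~/mystorage   # optional: a convenience link from your home directory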

Note

The project storage is not intended for permanent storage and there is NO BACKUP of /proj/nobackup.

Quota

The size of the storage depends on the allocation. There are small, medium, and large storage projects, each with their own requirements. You can read about this on SUPR. The quota limits apply to the project as a whole; there are no user-level quotas on that space.

There are actually four quota limits for the project storage space: a soft and a hard limit for disk usage, and a soft and a hard limit for the number of files. The hard limits can never be exceeded. You may be above a soft limit for a grace period, but after the grace period the soft limit behaves like a hard limit until you have gone below it again.
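
To get a rough idea of your current usage against the disk and file-count quotas, standard tools are enough. A sketch, assuming the (hypothetical) project directory /proj/nobackup/myproject; HPC2N may also provide a dedicated command that shows the actual quota limits:

$ du -sh /proj/nobackup/myproject            # total disk usage of the project directory
$ find /proj/nobackup/myproject | wc -l      # number of files and directories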

Misc

It is recommended to use the project’s storage directory for the project’s data. The layout of that project directory is the responsibility of the project itself.

Note

  • As PI, make sure to add every user in SUPR who should be granted access to the storage space to the storage project.
  • The storage project PI can link one or several compute projects to the storage project, thereby giving the users in those compute projects access to the storage project without the PI having to handle access explicitly.

/scratch

Our recommendation is that you use the project storage instead of /scratch when working on compute nodes or login nodes.

On the computers at HPC2N there is a directory called /scratch. It is a small local area shared between the users of the node, and it can be used for saving (temporary) files you create or need during your computations. Please do not keep files in /scratch that you do not need when you are not running jobs on the machine, and please make sure your job removes any temporary files it creates (see the sketch further down).

If anybody needs more space than is available on /scratch, we will remove the oldest/largest files without notice.

Note

There is NO backup of /scratch.

The size of /scratch depends on the type of node, and that space is shared in proportion to the number of cores your job has on the node.

  • Kebnekaise, standard compute nodes: ~170 GB
  • Kebnekaise, GPU nodes: ~170 GB
  • Kebnekaise, Largemem nodes: ~350 GB
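
As an illustration of the cleanup recommendation above, here is a minimal sketch of a batch script that uses /scratch for temporary files and removes them before it finishes. The project ID, program name, and directory layout are hypothetical; adapt them to your own job and check the batch system documentation for the exact conventions:

#!/bin/bash
#SBATCH -A hpc2nXXXX-YYY          # hypothetical project ID
#SBATCH -n 1
#SBATCH --time=01:00:00

# Create a job-specific temporary directory on the node-local /scratch
SCRATCHDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"

# Run the (hypothetical) program, telling it to put temporary files on /scratch
./my_program --tmpdir "$SCRATCHDIR" --input /proj/nobackup/myproject/input.dat

# Remove the temporary files before the job ends
rm -rf "$SCRATCHDIR"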

SweStore - Nationally Accessible Storage

For data archiving and long-term storage we recommend that our users use SweStore - Nationally Accessible Storage. This is a robust, flexible, and expandable long-term storage system aimed at storing large amounts of data produced by various Swedish research projects.

For more information, see the documentation for SweStore available at docs.swestore.se.

Archiving and compressing

There are a number of options for archiving and compressing directories and files at HPC2N.

Note that in the below examples, $ and b-an01 [~]$ are bash prompts from the terminal and you should not write these.

tar (more information)

This program saves many files together into a single archive file, and it can also restore individual files from the archive. Automatic archive compression/decompression options exist, as well as special features that allow tar to be used for incremental and full backups. The command tar --help shows the available options, including the archive format (the default is gnu); the choice of format is generally only important for files larger than 8 GB.

Examples

Archive a file

$ tar cvf myfile.tar myfile.txt 
myfile.txt

It adds the file “myfile.txt” to the tar archive myfile.tar, without any compression.

Archive and compress a file

$ tar cvfz myfile.tar.gz myfile.txt 
myfile.txt

It adds the file “myfile.txt” to the gzip-compressed tar archive myfile.tar.gz.

List contents of a tar archive file

$ tar tvf myfile.tar
-rwxr-xr-x username/folk 23717 2009-10-02 15:48:48 myfile.txt

Extract contents of a tar archive file

$ tar xvf myfile.tar
myfile.txt

In this case there was only one file in the tar archive; if there had been more, all would have been extracted here.

Extract contents of a gzipped tar-archive

$ tar xvfz myfile.tar.gz 
myfile.txt

Archive and compress all files in a directory to a single tar archive file

$ tar cvfz mydir.tar.gz C/
C/
C/hello
C/hello.c
C/hello_submit

Archive and compress all files of a certain type to a single tar archive file

In this example, all .c files in the current directory.

$ tar cvfz myfile.tar.gz *.c
calc.c
converting.c
hello.c
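
Create incremental backups (sketch)

The incremental and full backup feature mentioned above uses GNU tar’s --listed-incremental option. A minimal sketch, where mydir/ and the snapshot file mydir.snar are just example names:

$ tar cvfz backup-full.tar.gz --listed-incremental=mydir.snar mydir/
$ tar cvfz backup-incr1.tar.gz --listed-incremental=mydir.snar mydir/

The first command makes a full backup and records file metadata in mydir.snar; the second run, using the same snapshot file, archives only files that have changed since then.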

gzip (more information)

Compression utility designed as a replacement for compress, with much better compression and no patented algorithms. The standard compression system for all GNU software.

Examples

Compress a file (also removes the uncompressed file)

$ gzip myfile.txt

Uncompress a file (also removes the compressed file)

$ gunzip myfile.txt.gz 

bzip2 (more information)

Strong, lossless data compressor based on the Burrows-Wheeler transform. Also available as a library.

Examples

Compress a file (also removes the uncompressed file)

$ bzip2 myfile.txt

Uncompress a file (also removes the compressed file)

$ bunzip2 myfile.txt.bz2 

zip (more information)

Simple compression and file packaging utility. Note that the maximum size of a zip archive is 4 GB; if this limit is exceeded, the archive becomes prone to corruption, which can lead to failed extraction and inaccessible data.

Zip examples

Compressing myfile.txt

$ zip myfile.zip myfile.txt 
  adding: myfile.txt (deflated 57%)

Uncompressing myfile.zip

If the file already exists, unzip will ask if you want to replace or rename it

$ unzip myfile.zip
Archive:  myfile.zip
  inflating: myfile.txt              

Compress all files in one directory to a single archive file

$ zip -r mydir.zip C/
  adding: C/ (stored 0%)
  adding: C/hello (deflated 66%)
  adding: C/hello.c (deflated 3%)
  adding: C/hello_submit (deflated 24%)

Compress all files of a certain type in the current directory (and in directories under this) to a single archive file

In this example case for all .c files

$ zip -r my_c_files.zip . -i \*.c
  adding: hello.c (deflated 3%)
  adding: C/hello.c (deflated 3%)
  adding: converting.c (deflated 31%)
  adding: calc.c (deflated 5%)

Archiving/compressing on Windows

There are a number of Windows programs using the same formats. These are a few of the more popular ones:

  • 7-Zip. Free Windows software package that can handle all the above formats.
  • WinZip. Commercial Windows software package that can handle all the above formats.
  • WinRAR. Commercial Windows software package that can handle all the above formats.

File transfer

There are several possible ways to transfer files and data to and from HPC2N’s systems.

Note that in the below examples, $ and b-an01 [~]$ are bash prompts from the terminal and you should not write these.

Jump to specific section: [ FTP | SCP | SFTP | LFTP | rsync ]

FTP - NOT PERMITTED!

FTP (File Transfer Protocol) is a simple data transfer mechanism. FTP is the original program for data transfer, but it was not designed for secure communications. FTP exists on the systems, but HPC2N does not permit connections using FTP because of the security problems. There are several modern FTP clients which support either SFTP or SCP which are similar, secure protocols for file transfer. Use one of those methods instead of FTP.

SCP

SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH (Secure SHell) protocol. You may use SCP to connect to any system where you have SSH (log-in) access. There are some graphical file transfer programs which offer SCP as a protocol, and it is also available as a command-line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but it will also recursively copy directory contents if given a directory name.

Command-line usage

From local system to a remote system

$ scp sourcefilename user@hostname:somedir/destfilename

Example:

$ scp irf.png user@machine.umu.se:C/irf.png
Password: 
irf.png                   100% 2100     2.1KB/s   00:00    

From a remote system to a local system

$ scp user@hostname:somedir/sourcefilename destfilename

Example:

$ scp user@machine.umu.se:irf.png irfpic.png
Password: 
irf.png                   100% 2100     2.1KB/s   00:00    

Recursive directory copy from a local system to a remote system

$ scp -r sourcedirectory/ user@hostname:somedir/

Installation

Linux / Solaris / AIX / HP-UX / Unix

The “scp” command line program should already be installed.

Microsoft Windows

  • WinSCP is a full-featured and free graphical SCP and SFTP client.
  • PuTTY also offers “pscp.exe”, which is an extremely small program and a basic SCP client.
  • Secure FX is a commercial SCP and SFTP client.

macOS / Mac OS X

The “scp” command line program should already be installed. You may start a local terminal window from “Applications->Utilities”.

SFTP

SFTP (SSH File Transfer Protocol, sometimes called Secure File Transfer Protocol) is a network protocol that provides file transfer over a reliable data stream. You may use SFTP to connect to most of HPC2N’s systems. SFTP is a command-line program on most Unix, Linux, and Mac OS X systems. It is also available as a protocol choice in some graphical file transfer programs. SFTP has more features than SCP and allows other operations on remote files, such as remote directory listing, and it is also possible to resume interrupted transfers. Note, however, that command-line SFTP cannot recursively copy directory contents; if you need to do so, use SCP or a graphical SFTP client instead.

Command-line usage

Start sftp

$ sftp -B buffersize user@hostname

Examples

From a local system to a remote system

enterprise-d [~]$ sftp user@kebnekaise.hpc2n.umu.se
Connecting to kebnekaise.hpc2n.umu.se...
user@kebnekaise.hpc2n.umu.se's password:
sftp> put file.c C/file.c
Uploading file.c to /home/u/user/C/file.c
file.c                          100%    1    0.0KB/s   00:00
sftp> put -P irf.png pic/
Uploading irf.png to /home/u/user/pic/irf.png
irf.png                         100% 2100    2.1KB/s   00:00
sftp>

From a remote system to a local system

sftp> get file2.c C/file2.c
Fetching /home/u/user/file2.c to C/file2.c
/home/u/user/file2.c  100%  1  0.1KB/s 00:00
sftp> get -P file3.c C/
Fetching /home/u/user/file3.c to C/file3.c
/home/u/user/file3.c  100%  1  0.4KB/s 00:00
sftp> exit
enterprise-d [~]$ 

The following two flags can be useful

  • -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
  • -P: optional, preserve file attributes and permissions

Regarding buffer size: to find an optimal buffer size, use the following formula:

optimal buffer size = 2 x bandwidth x delay

or

optimal buffer size = bandwidth x RTT

RTT = round-trip time, which you get from ping; the one-way delay is approximately half of the average ping time.

Example:

ping average = 2.653 ms

The data link’s capacity is 1 Gbit/s, so the optimal buffer size should be 2.653 ms x (1 Gbit/s / 8) = 331625 bytes.
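
As a sketch, the same calculation can be done directly in the shell and the result passed to sftp with -B; the hostname and numbers are just examples:

$ ping -c 10 kebnekaise.hpc2n.umu.se              # note the average RTT, e.g. 2.653 ms
$ awk 'BEGIN { print int(0.002653 * 1e9 / 8) }'   # RTT (s) x bandwidth (bit/s) / 8 = bytes
331625
$ sftp -B 331625 user@kebnekaise.hpc2n.umu.se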

See http://fasterdata.es.net/TCP-tuning/ for more information.

Installation

Linux / Solaris / AIX / HP-UX / Unix

The “sftp” command line program should already be installed.

Microsoft Windows

  • WinSCP is a full-featured and free graphical SFTP and SCP client.
  • PuTTY also offers “psftp.exe”, which is an extremely small program and a basic SFTP client.
  • Secure FX is a commercial SFTP and SCP client.

Mac OS X

  • The “sftp” command-line program should already be installed. You may start a local terminal window from “Applications->Utilities”.
  • MacSFTP

LFTP

LFTP is a command-line file-transfer program for Linux and Unix systems. It supports the FTP, HTTP, FISH, SFTP, HTTPS, and FTPS protocols. LFTP has additional features not provided by SFTP, such as bandwidth throttling, transfer queues, and parallel transfers. It may be used interactively or scripted. Every operation in LFTP is reliable: any non-fatal error is handled and the operation is retried automatically, so if a download breaks it will be restarted from that point automatically.

In order to connect over SFTP to our resources, the username and hostname shall be prefixed by sftp://

LFTP has a shell-like command syntax allowing you to launch several commands in parallel in the background (&). It is also possible to group commands within () and execute them in the background. All background jobs are executed in the same single process. You can send a foreground job to the background with ^Z (Ctrl-Z) and bring it back with the command “wait” (or “fg”, which is an alias for “wait”). To list running jobs, use the command “jobs”. With parallel transfers LFTP can be much faster than SCP or SFTP, so its use is encouraged when possible.

LFTP is simply a client, so it is not needed on the remote machine involved in a transfer (the remote system need only support SFTP).

Examples

Retrieve and compress

lftp> cat file | gzip > file.gz
lftp> get file &

lftp> (cd /path && get file) &

The first command retrieves the file from the remote server and passes its contents to gzip, which in turn stores the compressed data in file.gz. The other commands show how to start commands or command groups in the background.

LFTP has a built-in mirror command that can download or update a whole directory tree. There is also a reverse mirror (mirror -R), which uploads or updates a directory tree on the server.

More interactive examples

Starting lftp

$ lftp sftp://user@hostname

Transfer all .dat files from a remote system to a local system

lftp :~> mget *.dat

Transfer filename.dat file from a local system to a remote system

lftp :~> put filename.dat

Transfer a directory and all its contents from a remote system to a local system, using 5 connections in parallel

lftp :~> mirror --parallel=5 remotedir localdir/

Transfer a directory and all contents from a local system to a remote system, using 8 connections in parallel

lftp :~> mirror -R --parallel=8 localdir remotedir/

Batch usage

Specify all actions on command line

$ lftp sftp://user@hostname -e "mget *.dat"

Specify all actions in the script file transfer.lftp

$ lftp sftp://user@hostname -f transfer.lftp

rsync

rsync is a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like operating systems and is under the GPL-3.0-or-later license.

Syntax for rsync

rsync FLAGS source destination

Where source is the source directory (on a local disk or a remote system) and destination is the destination directory (also on a local disk or a remote system).

rsync has many useful flags, which you can find with the man rsync command. Here we will only cover the most common:

  • -r: recursive
  • -a: archive. It syncs recursively and preserves symbolic links, modification times, groups/owners, and permissions. Equivalent to -rlptgoD.
  • -v: verbose
  • -n: Dry run; shows what would be done without actually transferring anything
  • -z: Compress data during the transfer
  • -P: Combines the flags --progress (progress bar for the transfer) and --partial (resume an interrupted transfer of a file).
  • --no-o: Do not preserve owner. Default unless you use -a. Useful if you have a different username on the remote and local system.
  • --no-perms: Do not preserve permissions. Default unless you use -a.
  • --no-links: Do not preserve symbolic links. Default unless you use -a.

Note

  • Rerunning rsync will continue the transfer of all files/directories that have not yet been completely transferred. If you have a large file that was partially transferred, the transfer of that file will restart from the beginning unless you included the flag --partial.
  • When preserving modification times, upon rerun rsync will only update files that are new or have been modified since the previous run.

Examples

Recursively synchronize the files from a local source directory to another local destination directory

$ rsync -r dir1/ dir2

Recursively sync files from a remote directory to a local directory, preserving symbolic links, permissions, and modification times

$ rsync -rlpt username@remote_host:sourcedir/ /path/to/localdir

Recursively sync files from one local dir to another. Also preserve symbolic links, owners, permissions, and modification times

$ rsync -a dir1/ dir2

Recursively sync a local directory to a remote destination directory, preserving owners, permission, modification times, and symbolic links

$ rsync -a /path/to/localdir/ username@remote_host:destination_directory

Recursively sync a remote directory to a local directory, while preserving owners, permissions, modification times, and symbolic links

$ rsync -a username@remote_host:/path/to/sourcedir/ /path/to/localdir

Recursively sync a remote directory to a local directory, preserving owners, permissions, modification times, and symbolic links. Also compress before transfer, show progress bar, and allow to continue transferring a file that was not completed when connection was broken

$ rsync -azP username@remote_host:/path/to/sourcedir/ /path/to/localdir

Recursively sync a remote dir to a local dir. Preserve modification times, but not owners, symbolic links, or permissions. Verbose is on

$ rsync -a --no-o --no-links --no-perms -v username@remote_host:/path/to/sourcedir/ /path/to/localdir