HDF5

HDF5 is a data model, library, and file format for storing and managing data.

Policy

HDF5 is freely available to users at HPC2N.

Citations

If you use this software, please cite it using the metadata from this file.

Overview

HDF5 supports an unlimited variety of datatypes and is designed for flexible and efficient I/O and for high-volume, complex data. HDF5 is portable and extensible, allowing applications to evolve in their use of HDF5. The HDF5 technology suite also includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.

The HDF5 technology suite includes:

  • A versatile data model that can represent very complex data objects and a wide variety of metadata.
  • A completely portable file format with no limit on the number or size of data objects in the collection.
  • A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
  • A rich set of integrated performance features that allow for access time and storage space optimizations.
  • Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.

Usage at HPC2N

On HPC2N we have HDF5 available as a module.

Loading

To use the HDF5 module, add it to your environment. You can find versions with

module spider HDF5

and then you can find how to load a specific version (including prerequisites) with

module spider HDF5/<version>

Example

Loading HDF5 version 1.14.3, GCC, OpenMPI

module load GCC/13.2.0 
module load OpenMPI/4.1.6
module load HDF5/1.14.3 

Or one of the other combinations given by module spider HDF5/1.14.3

Example

Loading HDF5 version 1.12.2, intel compilers, Intel MPI, CUDA

module load intel-compilers/2022.1.0 
module load impi/2021.6.0
module load HDF5/1.12.2
module load CUDA/12.6.0 

Loading the module should set any needed environment variables as well as the path.

Running / using

There are example programs in the examples directory of the HDF5 installation. After loading the module, they can be found here: $EBROOTHDF5/share/hdf5_examples.

When the module is loaded, you can use $EBROOTHDF5 to find the binaries and libraries available.

HDF5 is used by adding calls to your program, depending on whether you want to create a new data file, read from an existing data file, or write to an existing data file.

Remember

You must include the HDF5 header file (C, C++) or use the HDF5 module (Fortran)

#include "hdf5.h"  | C
#include "H5Cpp.h" | C++
USE HDF5           | Fortran

at the top of your program.

File Access Modes

  • H5Fcreate accepts H5F_ACC_EXCL or H5F_ACC_TRUNC
  • H5Fopen accepts H5F_ACC_RDONLY or H5F_ACC_RDWR
  • H5F_ACC_EXCL If the file already exists, H5Fcreate fails. If the file does not exist, it is created and opened with read-write access. (Default)
  • H5F_ACC_TRUNC If the file already exists, the file is opened with read-write access, and new data will overwrite any existing data. If the file does not exist, it is created and opened with read-write access.
  • H5F_ACC_RDONLY An existing file is opened with read-only access. If the file does not exist, H5Fopen fails. (Default)
  • H5F_ACC_RDWR An existing file is opened with read-write access. If the file does not exist, H5Fopen fails.

Creating a file

  • Define the file creation property list
  • Define the file access property list
  • Create the file
  • More information here about creating a file, opening an existing file, and closing a file.

Examples from HDF homepage

Creating an HDF5 file using property list defaults

file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL,
    H5P_DEFAULT, H5P_DEFAULT);

Creating an HDF5 file using property lists

fcplist_id = H5Pcreate (H5P_FILE_CREATE);
  /* ... set desired file creation properties ... */
faplist_id = H5Pcreate (H5P_FILE_ACCESS);
  /* ... set desired file access properties ... */
file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL, fcplist_id, faplist_id);

Opening an HDF5 file (read-only access)

faplist_id = H5Pcreate (H5P_FILE_ACCESS);
status = H5Pset_fapl_stdio (faplist_id);
file_id = H5Fopen ("SampleFile.h5", H5F_ACC_RDONLY, faplist_id);

Closing an HDF5 file

status = H5Fclose (file_id);

Viewing a file with h5dump

Included with the HDF5 distribution is a command-line utility called h5dump. This is a program for inspecting the contents of an HDF5 file. It displays ASCII output formatted according to the HDF5 DDL grammar.

Displaying the content of file.h5:

h5dump file.h5

This is how the ‘default’ file looks before any groups or datasets have been created and before any data has been written:

    HDF5 "file.h5" {
    GROUP "/" {
    }
    }
  • You can read more about the program h5dump here.
  • The HDF5 DDL grammar is described in this document.

File Function Summaries

Table of general library functions, macros (H5)

C Function | Fortran Function | Purpose
H5check_version | h5check_version_f | Verifies that HDF5 library versions are consistent.
H5close | h5close_f | Flushes all data to disk, closes all open identifiers, and cleans up memory.
H5dont_atexit | h5dont_atexit_f | Instructs the library not to install the atexit cleanup routine.
H5garbage_collect | h5garbage_collect_f | Garbage collects on all free-lists of all types.
H5get_libversion | h5get_libversion_f | Returns the HDF library release number.
H5open | h5open_f | Initializes the HDF5 library.
H5set_free_list_limits | h5set_free_list_limits_f | Sets free-list size limits.
H5_VERSION_GE | (none) | Determines whether the version of the library being used is greater than or equal to the specified version.
H5_VERSION_LE | (none) | Determines whether the version of the library being used is less than or equal to the specified version.

Table of file functions (H5F)

C Function | Fortran Function | Purpose
H5Fclear_elink_file_cache | (none) | Clears the external link open file cache for a file.
H5Fclose | h5fclose_f | Closes HDF5 file.
H5Fcreate | h5fcreate_f | Creates new HDF5 file.
H5Fflush | h5fflush_f | Flushes data to HDF5 file on storage medium.
H5Fget_access_plist | h5fget_access_plist_f | Returns a file access property list identifier.
H5Fget_create_plist | h5fget_create_plist_f | Returns a file creation property list identifier.
H5Fget_filesize | h5fget_filesize_f | Returns the size of an HDF5 file.
H5Fget_freespace | h5fget_freespace_f | Returns the amount of free space in a file.
H5Fget_info | (none) | Returns global information for a file.
H5Fget_intent | (none) | Determines the read/write or read-only status of a file.
H5Fget_mdc_config | (none) | Obtains current metadata cache configuration for target file.
H5Fget_mdc_hit_rate | (none) | Obtains target file’s metadata cache hit rate.
H5Fget_mdc_size | (none) | Obtains current metadata cache size data for specified file.
H5Fget_name | h5fget_name_f | Retrieves name of file to which object belongs.
H5Fget_obj_count | h5fget_obj_count_f | Returns the number of open object identifiers for an open file.
H5Fget_obj_ids | h5fget_obj_ids_f | Returns a list of open object identifiers.
H5Fget_vfd_handle | (none) | Returns pointer to the file handle from the virtual file driver.
H5Fis_hdf5 | h5fis_hdf5_f | Determines whether a file is in the HDF5 format.
H5Fmount | h5fmount_f | Mounts a file.
H5Fopen | h5fopen_f | Opens existing HDF5 file.
H5Freopen | h5freopen_f | Returns a new identifier for a previously-opened HDF5 file.
H5Freset_mdc_hit_rate_stats | (none) | Resets hit rate statistics counters for the target file.
H5Funmount | h5funmount_f | Unmounts a file.

File creation property list functions (H5P)

C Function | Fortran Function | Purpose
H5Pset/get_userblock | h5pset/get_userblock_f | Sets/retrieves size of user-block.
H5Pset/get_sizes | h5pset/get_sizes_f | Sets/retrieves byte size of offsets and lengths used to address objects in HDF5 file.
H5Pset/get_sym_k | h5pset/get_sym_k_f | Sets/retrieves size of parameters used to control symbol table nodes.
H5Pset/get_istore_k | h5pset/get_istore_k_f | Sets/retrieves size of parameter used to control B-trees for indexing chunked datasets.
H5Pset_shared_mesg_nindexes | h5pset_shared_mesg_nindexes_f | Sets number of shared object header message indexes.
H5Pget_shared_mesg_nindexes | (none) | Retrieves number of shared object header message indexes in file creation property list.
H5Pset_shared_mesg_index | h5pset_shared_mesg_index_f | Configures the specified shared object header message index.
H5Pget_shared_mesg_index | (none) | Retrieves the configuration settings for a shared message index.
H5Pset_shared_mesg_phase_change | (none) | Sets shared object header message storage phase change thresholds.
H5Pget_shared_mesg_phase_change | (none) | Retrieves shared object header message phase change information.
H5Pget_version | h5pget_version_f | Retrieves version information for various objects for file creation property list.

File access property list functions (H5P)

C Function | Fortran Function | Purpose
H5Pset/get_alignment | h5pset/get_alignment_f | Sets/retrieves alignment properties.
H5Pset/get_cache | h5pset/get_cache_f | Sets/retrieves metadata cache and raw data chunk cache parameters.
H5Pset/get_elink_file_cache_size | (none) | Sets/retrieves the size of the external link open file cache from the specified file access property list.
H5Pset/get_fclose_degree | h5pset/get_fclose_degree_f | Sets/retrieves file close degree property.
H5Pset/get_gc_references | h5pset/get_gc_references_f | Sets/retrieves garbage collecting references flag.
H5Pset_family_offset | h5pset_family_offset_f | Sets offset property for low-level access to a file in a family of files.
H5Pget_family_offset | (none) | Retrieves a data offset from the file access property list.
H5Pset/get_meta_block_size | h5pset/get_meta_block_size_f | Sets the minimum metadata block size or retrieves the current metadata block size setting.
H5Pset_mdc_config | (none) | Sets the initial metadata cache configuration in the indicated File Access Property List to the supplied value.
H5Pget_mdc_config | (none) | Gets the current initial metadata cache configuration from the indicated File Access Property List.
H5Pset/get_sieve_buf_size | h5pset/get_sieve_buf_size_f | Sets/retrieves maximum size of data sieve buffer.
H5Pset_libver_bounds | h5pset_libver_bounds_f | Sets bounds on library versions, and indirectly format versions, to be used when creating objects.
H5Pget_libver_bounds | (none) | Retrieves library version bounds settings that indirectly control the format versions used when creating objects.
H5Pset_small_data_block_size | h5pset_small_data_block_size_f | Sets the size of a contiguous block reserved for small data.
H5Pget_small_data_block_size | h5pget_small_data_block_size_f | Retrieves the current small data block size setting.

File driver functions (H5P)

C Function | Fortran Function | Purpose
H5Pset_driver | (none) | Sets a file driver.
H5Pget_driver | h5pget_driver_f | Returns the identifier for the driver used to create a file.
H5Pget_driver_info | (none) | Returns a pointer to file driver information.
H5Pset/get_fapl_core | h5pset/get_fapl_core_f | Sets driver for buffered memory files (i.e., in RAM) or retrieves information regarding driver.
H5Pset_fapl_direct | h5pset_fapl_direct_f | Sets up use of the direct I/O driver.
H5Pget_fapl_direct | h5pget_fapl_direct_f | Retrieves direct I/O driver settings.
H5Pset/get_fapl_family | h5pset/get_fapl_family_f | Sets driver for file families, designed for systems that do not support files larger than 2 gigabytes, or retrieves information regarding driver.
H5Pset_fapl_log | (none) | Sets logging driver.
H5Pset/get_fapl_mpio | h5pset/get_fapl_mpio_f | Sets driver for files on parallel file systems (MPI I/O) or retrieves information regarding the driver.
H5Pset_fapl_mpiposix | h5pset_fapl_mpiposix_f | Stores MPI IO communicator information to a file access property list.
H5Pget_fapl_mpiposix | h5pget_fapl_mpiposix_f | Returns MPI communicator information.
H5Pset/get_fapl_multi | h5pset/get_fapl_multi_f | Sets driver for multiple files, separating categories of metadata and raw data, or retrieves information regarding driver.
H5Pset_fapl_sec2 | h5pset_fapl_sec2_f | Sets driver for unbuffered permanent files or retrieves information regarding driver.
H5Pset_fapl_split | h5pset_fapl_split_f | Sets driver for split files, a limited case of multiple files with one metadata file and one raw data file.
H5Pset_fapl_stdio | h5pset_fapl_stdio_f | Sets driver for buffered permanent files.
H5Pset_fapl_windows | (none) | Sets the Windows I/O driver.
H5Pset_multi_type | (none) | Specifies type of data to be accessed via the MULTI driver enabling more direct access.
H5Pget_multi_type | (none) | Retrieves type of data property for MULTI driver.

File Property Lists

Additional information regarding file structure and access is passed to H5Fcreate and H5Fopen through property list objects. Property lists provide a portable and extensible method of modifying file properties via simple API functions. There are two kinds of file-related property lists:

  • File creation property lists
  • File access property lists

You can read more about File Property lists in the HDF5 User Guide.

Code Examples

The following examples (for version 1.14.X) are taken from the HDF5 - User Guide - Code Examples.

  • Reading/writing a chunked dataset (Chunking refers to a storage layout where a dataset is partitioned into fixed-size multi-dimensional chunks.) [C] [Fortran]
  • Reading/writing a compact dataset [C] [Fortran]
  • Reading/writing an external dataset [C] [Fortran]

There are more examples on the “Examples by API” dataset page of the HDF5 API reference.

Compiling

First, you must load the HDF5 module as mentioned at the beginning of this section.

Then, you compile with

  • h5cc hdf5_program.c (for C programs - or use h5pcc for newer versions)
  • h5c++ hdf5_program.cpp (for C++ programs)
  • h5fc program.f90 (for Fortran 90 programs - or use h5pfc for newer versions)

You can get more help with this by typing

h5cc --help
h5c++ --help
h5fc --help 

Example, compiling, running, GCC

HDF5, version 1.12.2, GCC compiler.

C (h5ex_d_chunk.c - get from download link above)

b-an01 [~]$ module load GCC/11.3.0 OpenMPI/4.1.4 
b-an01 [~]$ module load HDF5/1.12.2
b-an01 [~]$
b-an01 [~]$ h5c++ h5ex_d_chunk.c -o h5ex_d_chunk
b-an01 [~]$ ./h5ex_d_chunk
Original Data:
 [   1   1   1   1   1   1   1   1]
 [   1   1   1   1   1   1   1   1]
 [   1   1   1   1   1   1   1   1]
 [   1   1   1   1   1   1   1   1]
 [   1   1   1   1   1   1   1   1]
 [   1   1   1   1   1   1   1   1]

Storage layout for DS1 is: H5D_CHUNKED

Data as written to disk by hyberslabs:
 [   0   1   0   0   1   0   0   1]
 [   1   1   0   1   1   0   1   1]
 [   0   0   0   0   0   0   0   0]
 [   0   1   0   0   1   0   0   1]
 [   1   1   0   1   1   0   1   1]
 [   0   0   0   0   0   0   0   0]

Data as read from disk by hyperslab:
 [   0   1   0   0   0   0   0   1]
 [   0   1   0   1   0   0   1   1]
 [   0   0   0   0   0   0   0   0]
 [   0   0   0   0   0   0   0   0]
 [   0   1   0   1   0   0   1   1]
 [   0   0   0   0   0   0   0   0]
b-an01 [~]$

Example, compiling, running, Intel

Fortran 90 (h5ex_d_chunk.F90 - get from download link above)

b-an01 [~]$ module load intel-compilers/2022.1.0 impi/2021.6.0 
b-an01 [~]$ module load HDF5/1.12.2
b-an01 [~]$ 
b-an01 [~]$ h5pfc h5ex_d_chunk.F90 -o h5ex_d_chunk
b-an01 [~]$ ./h5ex_d_chunk 
Original Data:
 [  1  1  1  1  1  1  1  1 ]
 [  1  1  1  1  1  1  1  1 ]
 [  1  1  1  1  1  1  1  1 ]
 [  1  1  1  1  1  1  1  1 ]
 [  1  1  1  1  1  1  1  1 ]
 [  1  1  1  1  1  1  1  1 ]

Storage layout for DS1 is: H5D_CHUNKED

Data as written to disk by hyberslabs:
 [  0  1  0  0  1  0  0  1 ]
 [  1  1  0  1  1  0  1  1 ]
 [  0  0  0  0  0  0  0  0 ]
 [  0  1  0  0  1  0  0  1 ]
 [  1  1  0  1  1  0  1  1 ]
 [  0  0  0  0  0  0  0  0 ]

Data as read from disk by hyperslab:
 [  0  1  0  0  0  0  0  1 ]
 [  0  1  0  1  0  0  1  1 ]
 [  0  0  0  0  0  0  0  0 ]
 [  0  0  0  0  0  0  0  0 ]
 [  0  1  0  1  0  0  1  1 ]
 [  0  0  0  0  0  0  0  0 ]
b-an01 [~]$

Additional info

You can find more information at the following locations: