HDF5
HDF5 is a data model, library, and file format for storing and managing data.
Policy
HDF5 is freely available to users at HPC2N.
Citations
If you use this software, please cite it using the metadata from this file.
Overview
HDF5 supports an unlimited variety of datatypes and is designed for flexible and efficient I/O and for high-volume, complex data. HDF5 is portable and extensible, allowing applications to evolve in their use of HDF5. The HDF5 technology suite also includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.
The HDF5 technology suite includes:
- A versatile data model that can represent very complex data objects and a wide variety of metadata.
- A completely portable file format with no limit on the number or size of data objects in the collection.
- A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
- A rich set of integrated performance features that allow for access time and storage space optimizations.
- Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.
Usage at HPC2N
At HPC2N, HDF5 is available as a module.
Loading
To use the HDF5 module, add it to your environment. You can find the available versions with module spider HDF5, and then see how to load a specific version (including prerequisites) with module spider HDF5/<version>.
Example
Loading HDF5 version 1.14.3, GCC, OpenMPI
Or one of the other combinations given by module spider HDF5/1.14.3
Example
Loading HDF5 version 1.12.2, intel compilers, Intel MPI, CUDA
Loading the module sets any needed environment variables as well as the path.
Running / using
There are example programs in the examples directory of the HDF5 installation. After loading the module, they can be found in $EBROOTHDF5/share/hdf5_examples.
When the module is loaded, you can use $EBROOTHDF5 to find the binaries and libraries available.
HDF5 is used by adding calls to your program, depending on whether you want to create a new data file, read from an existing data file, or write to an existing data file.
Remember
You must include the HDF5 header file hdf5.h (in C: #include "hdf5.h") at the top of your program.
File Access Modes
H5Fcreate accepts H5F_ACC_EXCL or H5F_ACC_TRUNC. H5Fopen accepts H5F_ACC_RDONLY or H5F_ACC_RDWR.
Access flag | Resulting access mode |
---|---|
H5F_ACC_EXCL | If the file already exists, H5Fcreate fails. If the file does not exist, it is created and opened with read-write access. (Default) |
H5F_ACC_TRUNC | If the file already exists, the file is opened with read-write access, and new data will overwrite any existing data. If the file does not exist, it is created and opened with read-write access. |
H5F_ACC_RDONLY | An existing file is opened with read-only access. If the file does not exist, H5Fopen fails. (Default) |
H5F_ACC_RDWR | An existing file is opened with read-write access. If the file does not exist, H5Fopen fails. |
Creating a file
- Define the file creation property list
- Define the file access property list
- Create the file
- More information about creating a file, opening an existing file, and closing a file can be found through the links under Additional info.
Examples from HDF homepage
Creating an HDF5 file using property list defaults
file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL, H5P_DEFAULT, H5P_DEFAULT);
Creating an HDF5 file using property lists
fcplist_id = H5Pcreate (H5P_FILE_CREATE);
/* ... set desired file creation properties ... */
faplist_id = H5Pcreate (H5P_FILE_ACCESS);
/* ... set desired file access properties ... */
file_id = H5Fcreate ("SampleFile.h5", H5F_ACC_EXCL, fcplist_id, faplist_id);
Opening an HDF5 file (read-only access)
faplist_id = H5Pcreate (H5P_FILE_ACCESS);
status = H5Pset_fapl_stdio (faplist_id);
file_id = H5Fopen ("SampleFile.h5", H5F_ACC_RDONLY, faplist_id);
Closing an HDF5 file
status = H5Fclose (file_id);
Viewing a file with h5dump
Included with the HDF5 distribution is a command-line utility called h5dump, a program for inspecting the contents of an HDF5 file. It displays ASCII output formatted according to the HDF5 DDL grammar.
Displaying the content of file.h5 (run h5dump file.h5):
This is how the ‘default’ file will look, before any datasets or groups have been created and before any data has been written:
HDF5 "file.h5" {
GROUP "/" {
}
}
- More about the program h5dump and about the HDF5 DDL grammar can be found through the links under Additional info.
File Function Summaries
Table of general library functions, macros (H5)
C Function | Fortran Function | Purpose |
---|---|---|
H5check_version | h5check_version_f | Verifies that HDF5 library versions are consistent. |
H5close | h5close_f | Flushes all data to disk, closes all open identifiers, and cleans up memory. |
H5dont_atexit | h5dont_atexit_f | Instructs the library not to install the atexit cleanup routine. |
H5garbage_collect | h5garbage_collect_f | Garbage collects on all free-lists of all types. |
H5get_libversion | h5get_libversion_f | Returns the HDF library release number. |
H5open | h5open_f | Initializes the HDF5 library. |
H5set_free_list_limits | h5set_free_list_limits_f | Sets free-list size limits. |
H5_VERSION_GE | (none) | Determines whether the version of the library being used is greater than or equal to the specified version. |
H5_VERSION_LE | (none) | Determines whether the version of the library being used is less than or equal to the specified version. |
Table of file functions (H5F)
C Function | Fortran Function | Purpose |
---|---|---|
H5Fclear_elink_file_cache | (none) | Clears the external link open file cache for a file |
H5Fclose | h5fclose_f | Closes HDF5 file. |
H5Fcreate | h5fcreate_f | Creates new HDF5 file. |
H5Fflush | h5fflush_f | Flushes data to HDF5 file on storage medium. |
H5Fget_access_plist | h5fget_access_plist_f | Returns a file access property list identifier. |
H5Fget_create_plist | h5fget_create_plist_f | Returns a file creation property list identifier. |
H5Fget_filesize | h5fget_filesize_f | Returns the size of an HDF5 file. |
H5Fget_freespace | h5fget_freespace_f | Returns the amount of free space in a file. |
H5Fget_info | (none) | Returns global information for a file. |
H5Fget_intent | (none) | Determines the read/write or read-only status of a file. |
H5Fget_mdc_config | (none) | Obtain current metadata cache configuration for target file. |
H5Fget_mdc_hit_rate | (none) | Obtain target file’s metadata cache hit rate. |
H5Fget_mdc_size | (none) | Obtain current metadata cache size data for specified file. |
H5Fget_name | h5fget_name_f | Retrieves name of file to which object belongs. |
H5Fget_obj_count | h5fget_obj_count_f | Returns the number of open object identifiers for an open file. |
H5Fget_obj_ids | h5fget_obj_ids_f | Returns a list of open object identifiers. |
H5Fget_vfd_handle | (none) | Returns pointer to the file handle from the virtual file driver. |
H5Fis_hdf5 | h5fis_hdf5_f | Determines whether a file is in the HDF5 format. |
H5Fmount | h5fmount_f | Mounts a file. |
H5Fopen | h5fopen_f | Opens existing HDF5 file. |
H5Freopen | h5freopen_f | Returns a new identifier for a previously-opened HDF5 file. |
H5Freset_mdc_hit_rate_stats | (none) | Reset hit rate statistics counters for the target file. |
H5Funmount | h5funmount_f | Unmounts a file. |
File creation property list functions (H5P)
C Function | Fortran Function | Purpose |
---|---|---|
H5Pset/get_userblock | h5pset/get_userblock_f | Sets/retrieves size of user-block. |
H5Pset/get_sizes | h5pset/get_sizes_f | Sets/retrieves byte size of offsets and lengths used to address objects in HDF5 file. |
H5Pset/get_sym_k | h5pset/get_sym_k_f | Sets/retrieves size of parameters used to control symbol table nodes. |
H5Pset/get_istore_k | h5pset/get_istore_k_f | Sets/retrieves size of parameter used to control B-trees for indexing chunked datasets. |
H5Pset_shared_mesg_nindexes | h5pset_shared_mesg_nindexes_f | Sets number of shared object header message indexes. |
H5Pget_shared_mesg_nindexes | (none) | Retrieves number of shared object header message indexes in file creation property list. |
H5Pset_shared_mesg_index | h5pset_shared_mesg_index_f | Configures the specified shared object header message index. |
H5Pget_shared_mesg_index | (none) | Retrieves the configuration settings for a shared message index. |
H5Pset_shared_mesg_phase_change | (none) | Sets shared object header message storage phase change thresholds. |
H5Pget_shared_mesg_phase_change | (none) | Retrieves shared object header message phase change information. |
H5Pget_version | h5pget_version_f | Retrieves version information for various objects for file creation property list. |
File access property list functions (H5P)
C Function | Fortran Function | Purpose |
---|---|---|
H5Pset/get_alignment | h5pset/get_alignment_f | Sets/retrieves alignment properties. |
H5Pset/get_cache | h5pset/get_cache_f | Sets/retrieves metadata cache and raw data chunk cache parameters. |
H5Pset/get_elink_file_cache_size | (none) | Sets/retrieves the size of the external link open file cache from the specified file access property list. |
H5Pset/get_fclose_degree | h5pset/get_fclose_degree_f | Sets/retrieves file close degree property. |
H5Pset/get_gc_references | h5pset/get_gc_references_f | Sets/retrieves garbage collecting references flag. |
H5Pset_family_offset | h5pset_family_offset_f | Sets offset property for low-level access to a file in a family of files. |
H5Pget_family_offset | (none) | Retrieves a data offset from the file access property list. |
H5Pset/get_meta_block_size | h5pset/get_meta_block_size_f | Sets the minimum metadata block size or retrieves the current metadata block size setting. |
H5Pset_mdc_config | (none) | Set the initial metadata cache configuration in the indicated File Access Property List to the supplied value. |
H5Pget_mdc_config | (none) | Get the current initial metadata cache configuration from the indicated File Access Property List. |
H5Pset/get_sieve_buf_size | h5pset/get_sieve_buf_size_f | Sets/retrieves maximum size of data sieve buffer. |
H5Pset_libver_bounds | h5pset_libver_bounds_f | Sets bounds on library versions, and indirectly format versions, to be used when creating objects. |
H5Pget_libver_bounds | (none) | Retrieves library version bounds settings that indirectly control the format versions used when creating objects. |
H5Pset_small_data_block_size | h5pset_small_data_block_size_f | Sets the size of a contiguous block reserved for small data. |
H5Pget_small_data_block_size | h5pget_small_data_block_size_f | Retrieves the current small data block size setting. |
File driver functions (H5P)
C Function | Fortran Function | Purpose |
---|---|---|
H5Pset_driver | (none) | Sets a file driver. |
H5Pget_driver | h5pget_driver_f | Returns the identifier for the driver used to create a file. |
H5Pget_driver_info | (none) | Returns a pointer to file driver information. |
H5Pset/get_fapl_core | h5pset/get_fapl_core_f | Sets driver for buffered memory files (i.e., in RAM) or retrieves information regarding driver. |
H5Pset_fapl_direct | h5pset_fapl_direct_f | Sets up use of the direct I/O driver. |
H5Pget_fapl_direct | h5pget_fapl_direct_f | Retrieves direct I/O driver settings. |
H5Pset/get_fapl_family | h5pset/get_fapl_family_f | Sets driver for file families, designed for systems that do not support files larger than 2 gigabytes, or retrieves information regarding driver. |
H5Pset_fapl_log | (none) | Sets logging driver. |
H5Pset/get_fapl_mpio | h5pset/get_fapl_mpio_f | Sets driver for files on parallel file systems (MPI I/O) or retrieves information regarding the driver. |
H5Pset_fapl_mpiposix | h5pset_fapl_mpiposix_f | Stores MPI IO communicator information to a file access property list. |
H5Pget_fapl_mpiposix | h5pget_fapl_mpiposix_f | Returns MPI communicator information. |
H5Pset/get_fapl_multi | h5pset/get_fapl_multi_f | Sets driver for multiple files, separating categories of metadata and raw data, or retrieves information regarding driver. |
H5Pset_fapl_sec2 | h5pset_fapl_sec2_f | Sets driver for unbuffered permanent files or retrieves information regarding driver. |
H5Pset_fapl_split | h5pset_fapl_split_f | Sets driver for split files, a limited case of multiple files with one metadata file and one raw data file. |
H5Pset_fapl_stdio | h5pset_fapl_stdio_f | Sets driver for buffered permanent files. |
H5Pset_fapl_windows | (none) | Sets the Windows I/O driver. |
H5Pset_multi_type | (none) | Specifies type of data to be accessed via the MULTI driver enabling more direct access. |
H5Pget_multi_type | (none) | Retrieves type of data property for MULTI driver. |
File Property Lists
Additional information regarding file structure and access is passed to H5Fcreate and H5Fopen through property list objects. Property lists provide a portable and extensible method of modifying file properties via simple API functions. There are two kinds of file-related property lists:
- File creation property lists
- File access property lists
You can read more about File Property lists in the HDF5 User Guide.
Code Examples
The following examples (for version 1.14.X) are taken from the HDF5 - User Guide - Code Examples.
- Reading/writing a chunked dataset (Chunking refers to a storage layout where a dataset is partitioned into fixed-size multi-dimensional chunks.) [C] [Fortran]
- Reading/writing a compact dataset [C] [Fortran]
- Reading/writing an external dataset [C] [Fortran]
There are more examples on the dataset page of Examples by API in the HDF5 API reference.
Compiling
First, you must load the HDF5 module as mentioned at the beginning of this section.
Then, you compile with
- h5cc hdf5_program.c (for C programs; parallel, MPI-enabled builds of HDF5 provide h5pcc instead)
- h5c++ hdf5_program.cpp (for C++ programs)
- h5fc program.f90 (for Fortran 90 programs; parallel, MPI-enabled builds of HDF5 provide h5pfc instead)
You can get more help with this by running a compiler wrapper with the -help option, for example h5pcc -help.
Example, compiling, running, GCC
HDF5, version 1.12.2, GCC compiler.
C (h5ex_d_chunk.c - get from download link above)
b-an01 [~]$ module load GCC/11.3.0 OpenMPI/4.1.4
b-an01 [~]$ module load HDF5/1.12.2
b-an01 [~]$
b-an01 [~]$ h5c++ h5ex_d_chunk.c -o h5ex_d_chunk
b-an01 [~]$ ./h5ex_d_chunk
Original Data:
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
[ 1 1 1 1 1 1 1 1]
Storage layout for DS1 is: H5D_CHUNKED
Data as written to disk by hyberslabs:
[ 0 1 0 0 1 0 0 1]
[ 1 1 0 1 1 0 1 1]
[ 0 0 0 0 0 0 0 0]
[ 0 1 0 0 1 0 0 1]
[ 1 1 0 1 1 0 1 1]
[ 0 0 0 0 0 0 0 0]
Data as read from disk by hyperslab:
[ 0 1 0 0 0 0 0 1]
[ 0 1 0 1 0 0 1 1]
[ 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 0]
[ 0 1 0 1 0 0 1 1]
[ 0 0 0 0 0 0 0 0]
b-an01 [~]$
Example, compiling, running, Intel
Fortran 90 (h5ex_d_chunk.F90 - get from download link above)
b-an01 [~]$ module load intel-compilers/2022.1.0 impi/2021.6.0
b-an01 [~]$ module load HDF5/1.12.2
b-an01 [~]$
b-an01 [~]$ h5pfc h5ex_d_chunk.F90 -o h5ex_d_chunk
b-an01 [~]$ ./h5ex_d_chunk
Original Data:
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
[ 1 1 1 1 1 1 1 1 ]
Storage layout for DS1 is: H5D_CHUNKED
Data as written to disk by hyberslabs:
[ 0 1 0 0 1 0 0 1 ]
[ 1 1 0 1 1 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 1 0 0 1 0 0 1 ]
[ 1 1 0 1 1 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
Data as read from disk by hyperslab:
[ 0 1 0 0 0 0 0 1 ]
[ 0 1 0 1 0 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 1 0 1 0 0 1 1 ]
[ 0 0 0 0 0 0 0 0 ]
b-an01 [~]$
Additional info
You can find more information at the following locations:
- Creating a file, opening a file/closing a file
- h5dump
- HDF5 DDL grammar
- Datatypes/File Function Summaries
- File creation property list functions (H5P)
- File access property list functions (H5P)
- HDF5 User Guide
- Code Examples
- HDF5 tutorial
- Compiling HDF5 Applications
- Code examples by API
- Simple HDF5 in Python and Fortran
- An introduction to HDF5 (YouTube)