Fortran essentials - useful I/O
This is part one of my series on essential libraries and techniques for modern Fortran. If you need a refresher on what this is all about, check out the last thing I wrote. For this post, I’ll be looking at different solutions for friendly I/O, including options for parsing configuration files and serialised output.
Why use an I/O library?
In contrast to most modern languages, it’s somewhat uncommon for Fortran codes to use any kind of structured I/O library (even built-in ones), with most legacy codebases preferring to roll their own bespoke parser. There are probably many overlapping reasons for this, but I suspect the two biggest factors are:
- Historical conventions - Fortran is a very old language and community norms are very slow to change.
- Most Fortran codes (especially smaller codebases) started out as research projects, where the goal is to get something working and results published as soon as possible; the code is just a tool to achieve that goal. While the attitude of “just get something out the door” is rational in a short-term research project, unfortunately these research codes have a way of sticking around for much longer than initially intended. As the saying goes, nothing is more permanent than a temporary solution.
Writing a parser is non-trivial though, especially if you’re not primarily a programmer, so hand-rolled input parsers tend to be brittle and informally specified. It’s very common for input file formats to just consist of fixed-width records with significant whitespace and very little flexibility. They also usually have non-obvious structures (e.g. “which of these unlabelled integers corresponds to this parameter?” or “why does it fail if I use spaces instead of tabs?”) and often end up relying on a sort of oral tradition among postdocs and grad students.
This is… non-optimal on several fronts. First, the general point: software input should be for humans, not machines. Fixed-format, significant-whitespace input is hard for humans to read and even harder to debug when something goes wrong (and it will). It’s tempting to punt on UX if you’re the only person using your code, but if you want anybody else to be able to use your code effectively (grad students, collaborators and so on), then investing a bit of extra time to use a nice I/O library will pay for itself many times over in productivity gains.
Fair warning and disclaimer
The rest of this post is essentially a dry review of various I/O parsing libraries available in the Fortran ecosystem. I’ve mostly written this so I have something to point at when the question of I/O comes up in future, so it probably won’t be interesting unless you’re also writing or maintaining Fortran code. If you are, and want to stop tearing your hair out over bad homebrew parsers then read on! Otherwise you might want to stop reading here and we’ll return to our regularly scheduled esoterica in due course.
As always, the views here are my own subjective opinion based off a few hours spent poking each library. Your results may vary and none of this is an attack on the fine people who develop these libraries - they do great work and should be celebrated. Also, these views don’t represent my employer(s), obviously.
What I’ll be assessing
I’ll be focusing on the following criteria:
- Ease of installation/compilation. For concreteness, I’ll look at ease of installation with CMake, since that’s what my current project uses (and also I don’t like writing plain Makefiles). I’ll do a proper run-down of fpm and other build systems in a future post.
- How many dependencies does it bring in? Ideally, each library should be self-contained, but that’s not always possible given Fortran’s sparse standard library.
- Ease of use. This includes things like: does it have a sensible API? How much boilerplate does it require? How much do I have to restructure my code to integrate it? How easy is it to make changes to the library’s source code if needed?
What I won’t be assessing is stability, since that often requires long-term usage before you can get a good idea of things like bugs and performance regressions. If I end up using one of these for a real project, however, then I will come back and update this post with what I’ve learned. I also won’t be assessing libraries meant for data-intensive I/O like HDF5 - this post is about configuration files, but I will cover data-intensive I/O in a later post. Finally, I won’t be looking at JSON or XML parsers because I don’t think they’re a good choice for human-writable configuration files, and I won’t be looking at YAML because I don’t like it.
FiNeR - INI parsing
What are INI files?
Who knows? There’s no formal spec! INI files originate from Microsoft Windows, where they were used as configuration files for various programs before MS switched to using registry entries. INI files are still very widely used for program configuration, since they have a relatively simple structure and are easier to write a parser for than other comparable formats.
Despite the lack of formal specification, there are a few core features that most INI implementations can agree upon:
- Input consists of key-value pairs like key=value.
- Key-value pairs can be organised into logically related groups via “section headers” enclosed in square brackets, e.g.:
key1 = value
[Section 1]
key2 = value
- Usually supports values of string, integer, float and boolean types.
- Sometimes supports lists of values like key=[value1,value2,value3], although this is less common and sometimes has different syntax between implementations.
Overall, INI files are fairly simple and they’re not really able to represent scientific data (e.g. atomic wavefunctions, molecular dynamics trajectories), but they’re really useful for handling simulation parameters (e.g. type of system to simulate, level of theory to use, where to store the output).
Compilation
- Need to do a git clone --recursive, since it depends on a bunch of the author’s other libraries. Some of them are potentially worth re-using, like getopt-style command-line argument parsing
- Easy with CMake, just add a library directory to the project.
  - I like that the author has added support for referencing the library via namespaces FiNeR::FiNeR
- Author has his own bespoke build system.
  - Thanks but no thanks
Ease of use
- GitHub wiki is severely out of date and incomplete
- Better to go by the automatically generated API docs
- Can also glean stuff from example code and unit tests
- This is slow and error-prone
- The file parsing sets an (optional) error code if the target file doesn’t exist
- No exceptions raised (regular Fortran I/O will usually crash if the file doesn’t exist) so need to manually check if the import worked
- Annoying, but not a game-breaker. Very much in line with lots of other libraries (doesn’t mean I like it though)
- Support for “basic” INI functionality, plus support for array-valued parameters
- No support for “global” properties which don’t belong to any section header
- String-valued properties returned by ini%get (e.g. key = 'value') are not automatically allocated to the correct size. There’s also no option to get the length of the string before parsing, so you need to pre-allocate the string to some size which is “big enough” and then trim() it after the subroutine returns.
How much do I need to change my code?
- One line to read from an ini file
- Single function to check for existence of sections, etc
- Single function to return the value of a property.
- Has a few custom data types to represent an abstract “configuration”, but properties return standard strings (character(len=:)). Need to manually convert to other data types
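The pre-allocate-then-trim workaround for string-valued properties can be sketched in plain Fortran as follows. This is only an illustration: fetch_string is a hypothetical stand-in for a library getter that fills a fixed-length buffer (it is not FiNeR’s actual API), and max_len is an assumed “big enough” size.

```fortran
program trim_demo
  implicit none

  ! Assumed upper bound on the length of any string-valued property.
  integer, parameter :: max_len = 256
  character(len=max_len) :: buffer
  character(len=:), allocatable :: value

  ! Blank-fill the buffer so that unused characters are spaces.
  buffer = ''
  call fetch_string(buffer)

  ! Assigning to a deferred-length string (re)allocates it to the exact
  ! length of the right-hand side (Fortran 2003 and later).
  value = trim(adjustl(buffer))

  print '(a,i0)', 'buffer length  = ', len(buffer)
  print '(a,i0)', 'trimmed length = ', len(value)
  print '(3a)', '[', value, ']'

contains

  ! Hypothetical stand-in for a library routine that writes into a
  ! caller-supplied, fixed-length string.
  subroutine fetch_string(val)
    character(len=*), intent(inout) :: val
    val = 'output/run_01'
  end subroutine fetch_string

end program trim_demo
```

The obvious weakness is that any value longer than max_len is silently truncated, which is why a getter that allocates the string itself is nicer to work with.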
Verdict
- Nice interface
- Support for a good set of INI functionality (except global properties).
- Kind of annoying to work with string-valued parameters
- I wish it had fewer dependencies
config-fortran - INI file parsing
Installation and compilation
- Makefile only :(
- Aside from examples, there’s only one source file, m_config.f90, so it’s fairly straightforward to integrate with an existing CMake project.
Ease of use
- Very sparse documentation, but the API is pretty simple so it doesn’t need very much
- Need to “register” a property with the CFG_t bookkeeping variable before get-ing it from a file, which is non-obvious. There’s a helper function for this called CFG_add_get, so it’s not too bad in practice.
- CFG functions are generic, so they automatically cast values while parsing based on the type of the variable they are stored in. Need to be careful as Fortran is strongly-typed, so it can only handle types the author has explicitly written implementations for:
  - String
  - Logical (bool)
  - Int
  - Real (double precision). Must be real(kind=8) (on most computers, it could conceivably be different in some weird edge cases because Fortran)
Verdict
- Simple input format, but with support for some extensions to the “standard” INI format
- Too much bookkeeping
- Would probably just use FiNeR instead.
toml-f - TOML parsing
What is TOML?
In classic open-source naming tradition, TOML stands for Tom’s Obvious Minimal Language (the “Tom” is this guy); a fact that has caused some skepticism among non-programmers I’ve tried to explain this to. Despite its sort of amateurish sounding name it’s a mature, very well-supported serialisation format that’s basically like a fancy INI with extra features.
TOML supports lists, nested sections and special data types like dates, but does require making the different data types explicit in the configuration files. For example, opt = 1.5 is a float, while opt = "1.5" is a string and string literals must always be quoted (contrast this with something like YAML or most INI formats which are extremely forgiving).
There’s very little difference between TOML and INI for small, simple configuration files. There is, however, some criticism that it becomes overly complicated and fragile for projects with very large or very many configuration files. For example, the aforementioned first-class support for dates is probably an anti-feature for most numerical projects, since time and date parsing is notoriously fiddly and error-prone, with many subtle edge cases to trip over.
All that being said, I personally prefer INI files because they’re so simple and freeform, but TOML has its uses and I certainly wouldn’t discount it if it turns out that toml-f is the best parsing library (oooh, foreshadowing!).
Installation and compilation
- Doesn’t pull in external packages - good!
- Easy to include via CMake - just do add_subdirectory(toml) if using as a git submodule, or include(toml-f) otherwise.
Ease of use
- Terrible documentation. UPDATE 07/06/2022: the project now has some good, centralised documentation at this link. This specific point of criticism is now less salient than when I originally wrote this post.
  - Project’s GitHub is both sparse and outdated
  - Need to rely on the automatically generated API docs again
  - Authors also recommend reading the fpm source, since it makes a lot of use of toml-f
- Actually not too hard once you figure out which functions and variables to use. Has polymorphic (well, Fortran “polymorphic”) interface to parse either a file or a string (stored in memory) and has nice error types.
- String types are allocated within parse and get_value functions, so no need to trim whitespace.
  - Ditto for list-valued parameters.
- Need to recursively parse tables (this is the structure of TOML), so you can’t just do get_value("heading.subheading.key"). This is a little bit inconvenient, but manageable.
How much do I need to change my code?
A little bit:
- Need to use and allocate special toml-f-specific data types.
- Can’t just do a simple toml_parse() then get_value() if there are nested sections: each section needs its own allocation, which you then call get_value() on.
- Can’t just parse input arrays/lists directly into an array. Need to specially allocate a TOML array type and then iterate over that.
- Fairly easy to isolate all input functions into a single module and treat it like a black box.
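To make the points above concrete, here is a rough sketch of reading a nested section and a list with toml-f. It is based on my reading of the library’s documented interface (toml_parse, get_value and the len overload for arrays, all imported from the tomlf module) and details may differ between versions, so treat it as an illustration and not a reference. The keys title, solver and steps are made up for the example.

```fortran
program toml_nested_demo
  ! Assumes toml-f is available to the build and exports these names.
  use tomlf, only : toml_table, toml_array, toml_error, toml_parse, &
                    get_value, len
  implicit none

  type(toml_table), allocatable :: table   ! root of the parsed document
  type(toml_error), allocatable :: err     ! allocated only if parsing fails
  type(toml_table), pointer :: solver      ! reference to the [solver] table
  type(toml_array), pointer :: list        ! reference to the steps array
  character(len=:), allocatable :: input, title
  integer, allocatable :: steps(:)
  integer :: i

  ! Stand-in for the contents of a configuration file.
  input = 'title = "demo"' // new_line('a') // &
          '[solver]' // new_line('a') // &
          'steps = [10, 20, 30]' // new_line('a')

  call toml_parse(table, input, err)
  if (allocated(err)) then
    print '(a)', err%message
    error stop 1
  end if

  ! Strings come back allocated to the right length.
  call get_value(table, 'title', title)

  ! Nested sections are fetched one level at a time.
  call get_value(table, 'solver', solver)

  ! Lists are fetched as a TOML array and then copied element by element.
  call get_value(solver, 'steps', list)
  allocate(steps(len(list)))
  do i = 1, size(steps)
    call get_value(list, i, steps(i))
  end do

  print '(a)', title
  print '(3(i0,1x))', steps
end program toml_nested_demo
```

Wrapping this kind of boilerplate in a single input module keeps the toml-f types out of the rest of the code.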
Verdict
- Overall, I like it
- Well-specified input format
- Feature rich (perhaps too much so)
- Intuitive API
- Well-designed build system
- Terrible documentation, though: I might end up writing some basic documentation for this, since I’m planning to use it for a professional project.
F90 namelists
Namelist I/O is a very strange thing. It’s both extremely Fortran, being a part of the standard since Fortran 90 and closely mirroring Fortran syntax and semantics, but is also intuitive and robust? I was absolutely expecting some hoary old legacy format that’s impossible to use with modern code, but I actually came away liking namelists and how they integrate with the language.
Namelists have a very simple but limited API: parameters are declared in a NAMELIST block in the code with similar syntax to COMMON or DATA blocks:
INTEGER :: param1
REAL :: param2
LOGICAL :: param3
NAMELIST /input/ param1, param2, param3
Like in a COMMON block, variables must already be declared before they can be used in a NAMELIST block. Values are then provided in a file (or string, which can behave the same under Fortran I/O functions), grouped by namelist identifier with potentially multiple namelists per file:
&input
param1 = 5
param2 = 2.5
param3 = .TRUE.
/
&other_namelist
...
/
Note that the & and / delimiters absolutely must be included, although whitespace is not significant. This is then parsed by calling READ(nml=input, unit=iounit), where nml must be set to some pre-declared namelist identifier and unit must already be opened for reading.
Input values must be written using Fortran syntax in the file, e.g. for boolean parameters:
- param = .TRUE. is valid
- param = true is not
Namelists are also strongly typed, so Fortran will throw a runtime error if the type of variable in the namelist file (or string) does not exactly match its declared type in the NAMELIST block. Finally, namelist files are allowed to omit variables (so we could leave out param3, for example), in which case they will be left with whatever value they had prior to the READ(nml=...) statement. This means that we can provide a default value in the variable declaration (or really anywhere prior to the READ statement) and support optional parameters in our configuration file. Neat! Note that the inverse is not true: namelist files cannot include values which are not declared in the NAMELIST block, and the read will error out if it finds any.
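Here is a minimal sketch of the default-value trick, reusing the input namelist from earlier. To keep it self-contained it reads from a character variable instead of a file (a stand-in for the config file contents), and the default values are arbitrary choices for illustration.

```fortran
program namelist_defaults
  implicit none

  ! Defaults are set in the declarations; the namelist read only
  ! overwrites variables that actually appear in the input.
  integer :: param1 = 1
  real    :: param2 = 0.0
  logical :: param3 = .false.

  character(len=80)  :: text   ! stand-in for the config file contents
  character(len=256) :: msg    ! error message filled in by the runtime
  integer :: ios

  namelist /input/ param1, param2, param3

  ! param3 is deliberately left out, so it should keep its default.
  text = '&input param1 = 5, param2 = 2.5 /'

  ! iostat/iomsg turn a bad input into something we can report,
  ! instead of letting the runtime abort the program.
  read(text, nml=input, iostat=ios, iomsg=msg)
  if (ios /= 0) then
    print '(2a)', 'namelist read failed: ', trim(msg)
    error stop 1
  end if

  print '(a,i0)',   'param1 = ', param1
  print '(a,f0.2)', 'param2 = ', param2
  print '(a,l1)',   'param3 = ', param3
end program namelist_defaults
```

Reading from a file works the same way, with the unit number of an opened file in place of text.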
Installation and compilation
- Part of the Fortran standard since Fortran 90. No need to install anything.
Ease of use
- Strings are not automatically allocated or trimmed, so you need to handle string size manually.
- Can use arrays (even multidimensional arrays), but need to do manual initialisation inside the namelist file using standard Fortran array declarations. Must be fixed-size only (I think. I’d be very happy to learn otherwise). Overall, arrays are kind of annoying to use unless you have very simple needs.
- Somewhat sparse documentation in standard textbooks for some reason, but there’s a few handy resources online:
Verdict
- Actually pretty good, except for array-valued parameters which are very limited compared to other options.
- Has some structure to it, so it’s better than the default Fortran approach of reading records line-by-line with significant whitespace.
- Being part of the Fortran standard is appealing:
- No dependency hell
- No special steps to install
- Standard, stable interface
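To illustrate the string and array caveats listed under “Ease of use”, here is a small sketch with a fixed-length string and fixed-size arrays in a namelist. The group name settings, the variable names and the sizes are all made up for the example, and the input is again read from a character variable so that it runs on its own.

```fortran
program namelist_arrays
  implicit none

  ! Strings need a fixed, "big enough" length and are blank-padded.
  character(len=64) :: label = ''
  ! Arrays must have their shape fixed before the read.
  integer :: grid(3) = 0
  real    :: matrix(2,2) = 0.0

  character(len=200) :: text   ! stand-in for the config file contents
  integer :: ios

  namelist /settings/ label, grid, matrix

  ! Array values are listed in Fortran (column-major) element order.
  text = "&settings label = 'run 01', grid = 4, 4, 2, " // &
         "matrix = 1.0, 2.0, 3.0, 4.0 /"

  read(text, nml=settings, iostat=ios)
  if (ios /= 0) error stop 'namelist read failed'

  ! trim() removes the blank padding from the fixed-length string.
  print '(3a)', '[', trim(label), ']'
  print '(3(i0,1x))', grid
  print '(a,f0.1)', 'matrix(2,1) = ', matrix(2,1)
end program namelist_arrays
```

This is fine for a handful of small, fixed-shape parameters, but it gets awkward quickly if the array size is only known at run time.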
Conclusions
There are two standout winners in all categories: toml-f and namelist I/O. Both are extremely easy to install, have straightforward, unsurprising APIs and use an intuitive syntax for their respective configuration languages. I’ve actually been using both of these in earnest since I started writing this post, and I have been very pleasantly surprised by the ease with which I can spin up something which is both robust and pleasant.
On balance, I prefer toml-f for a few reasons. First, it supports a wider range of data types than namelists, which are limited to Fortran primitives and fixed-length arrays. It also has more useful runtime error messages than namelists. Namelist I/O obviously beats all of the other options when it comes to ease of installation and dependency management (being a native Fortran feature), but toml-f seamlessly integrates with existing CMake build systems and has no mandatory dependencies so it’s not too bad. toml-f unfortunately has nonexistent documentation, which admittedly is a fairly large downside - I should probably write a simple user guide, if only for the benefit of Future Emily.
I must give an honourable mention to FiNeR - it’s definitely got the right idea, but I just wish that the documentation were slightly more complete and it had fewer linked dependencies.
So that’s my estimation of configuration/serialisation I/O libraries in modern Fortran, at least as it currently stands. I’ll update this post if anything interesting pops up, and if I come across some horrifying bug or design choice 6 months into using toml-f or namelists in production.