DIPlib 3 design decisions
Contents
This page gives reasons behind some of the design choices of DIPlib 3. Many of these decisions are inherited from the previous version of the library, and some new ones are made possible by the port to C++.
Function signatures
There are two possible function signature styles for use in an image analysis library:
-
void Filter( dip::Image &in, dip::Image &out, int size );
-
dip::Image Filter( dip::Image &in, int size );
Both of these options have advantages and disadvantages. Style 1 allows for in-place operation:
dip::Image img = ... Filter( img, img, 1 );
The function here is able to write the results in the image’s pixel buffer, without having to allocate a temporary pixel buffer as would be the case for:
dip::Image img = ... img = Filter( img, 1 );
This is a huge advantage both in speed and memory usage. However, resulting programs are not as easy to read (which parameters are inputs and which are outputs?) and not as pretty as with style 2. For example, style 2 allows for a very elegant chaining of operations:
dip::Image img = ... img = Filter2( Filter1( img, 3 ), 1 );
Furthermore, style 2 makes it much easier to automatically generate interfaces to languages (such as MATLAB) that do not allow a function to modify its input arguments. Such an automatic interface generation tool needs to know which arguments are inputs and which are outputs.
In DIPlib 2, (written in C), all functions returned an error code, and so output values had to be function arguments (style 1). But in C++ we have exceptions to handle error conditions, and so are free to have an image as the return value of the function (style 2). Because, the advantage of style 1 is too large to ignore, we have kept the function signature style (and argument order) of DIPlib 2, but additionally we have written a small, inline wrapper function for each of the functions that follows the signature style 2. Such a wrapper is very straight-forward:
inline dip::Image Filter( dip::Image &in, int size ) { dip::Image out; Filter( in, out, size ); return out; }
We have chosen not to pollute the documentation with these wrapper functions.
If a function Filter( in, out )
exists, then you can assume that
there will also be a function out = Filter( in )
.
Class method vs function
Some libraries put all image processing/analysis functionality into the
image object as methods. The idea is to filter an image by
img.Gauss(sigma)
. This is a terrible idea for many reasons:
- one never knows if the image object is modified by the method or not,
- the core include file for the library changes when adding any type of functionality, forcing recompilation of the whole library, and
- it’s ugly.
Filters should be functions, not methods.
In DIPlib, methods to the core dip::Image
class query and manipulate image
properties, not pixel data (with the exception of dip::Image::Copy
and
dip::Image::Fill
). Filters and other algorithms that manipulate image data are
always functions or function objects.
We use function objects sparingly in DIPlib. ITK, for example, has taken the object-oriented approach to an extreme, where each algorithm is encapsulated in an object, and parameters to the algorithm are set through object methods. This leads to very verbose code that is not readable. For example, compare the two code snippets below (function object version is not ITK code, but a simplified version that ignores ITK’s templates and processing pipeline):
lib::GaussianFilter gauss; gauss.SetInput( image ); gauss.SetSigma( FloatArray{ 5, 1 } ); gauss.Execute(); outim = gauss.GetOutput();
outim = dip::Gauss( image, FloatArray{ 5, 1 } );
Compile-time vs run-time pixel type identification
DIPlib uses run-time identification of an image’s pixel type, and functions dispatch internally to the appropriate sub-function. These sub-functions are generated at compile time through templates
The alternative, seen in most C++ image analysis libraries (ITK, Vigra, CImg, etc.), is to define the image class, as well as most functions, as templates. The user declares an image having a specific data type (and dimensionality), and the compiler then creates an image class with that data type for the pixels, as well as instances of all functions called with this image as input. This takes time. Compiling even a trivial program that uses CImg takes a minute, rather than a fraction of a second it takes to compile a similar program that uses DIPlib. Writing most functionality as templates implies that most code is actually in the header files, rather than in the source files. This functionality then ends up in the application executable, rather than in an independent library file (shareable among many applications).
However, the largest disadvantage with templated functions happens when creating an (even slightly) general image analysis program: you need to write code that allows the user of your program to select the image data type, and write code that does all the right dispatching depending on the data type (see, for example, the SimpleITK interface to ITK). Alternatively, you have to restrict the data type to one choice. A library is meant to take away work from the programmer using the library, so it is logical that DIPlib should allow all data types and do the dispatching as necessary. After all, DIPlib is meant as a foundation for DIPimage and similar general-purpose image analysis tools, where you cannot determine in advance which data type the user will want to use. Think about how complicated each of the DIPlib-MEX-files would be if DIPlib had compile-time typing: each MEX-file would have to check the data type of the input array (or arrays), convert it to a DIPlib image class of the same type, then call one of 8 or 10 instances of a function. Furthermore, this MEX-file would need to know which image data types are meaningful for the function being called (integer only? binary only? does it work on complex values?). Instead, we simply convert the input array to a DIPlib image and call a function. The MEX-file is trivial, the DIPlib function itself takes care of everything.
DIPlib does expose a few templated functions to the user. However, these
templates typically abstract the type of a constant (see, for example, the
function dip::Add
with a templated rhs
argument), and never that of an image.
Such a template is always a trivial function that simplifies the library user’s
code.
Passing options to a function
Many algorithms require parameters that select a mode of operation. DIPlib 3
uses strings for such parameters when the function is intended to be usable
from interfaces. By not defining C++ constants, the interface code can be
kept simple. For example, dip::Dilation
has an option for the shape of the
structuring element. Instead of defining an enum
(as DIPlib 2 did)
with various values that the interface code needs to translate, the option
parameter is a string. The user of the interface and the user of the C++ library
see the same parameter and use the function in the same way. The overhead of a
few string comparisons is trivial compared to the computational cost of any
image analysis algorithm.
An other advantage is having fewer possibilities for name clashes when defining a lot of enumerator constants for the many, many options accumulated of the large collection of functions and algorithms in DIPlib.
A function that has multiple independent options takes a dip::StringSet
(a std::set<std::string>
) as input. The user can simply join strings using
curly braces, much like in MATLAB. The algorithm can easily check if a
specific string is given or not.
However, for infrastructure functions not typically exposed in interfaces (i.e.
the functions that DIPlib uses internally to do its work) we do define
numeric constants for options. For example, see the enumerator dip::Option::ThrowException
,
or any of the flags defined through DIP_DECLARE_OPTIONS
. These are more efficient
in use and equally convenient if limited to the C++ code.
Const correctness
When an image object is marked const
, the compiler will prevent modifications
to it, it cannot be assigned to, and it cannot be used as the output argument
to a filter function. However, it is possible to create a non-const
image that
points to the same data segment as a const
image. The assignment operator, the
dip::Image::QuickCopy
method, and most indexing operations will do this.
There were two important reasons for this design decision:
-
Making a
const
and a non-const
version of most of these indexing operators is possible, but some are functions (such asdip::DefineROI
) taking an output image object as function argument. This argument cannot be markedconst
because the function needs to modify it. However, the function must assign the data pointer from aconst
image into it. -
Functions such as the framework functions need to make certain type of modifications to an input image, such as converting the tensor dimension to a spatial dimension, or applying singleton expansion. The simplest way of accomplishing this is to create a copy of the input image, and modify the copy. However, it would make no sense to make a copy of the data also, since the data are not modified. Thus, we need to make a non-
const
copy of aconst
image that shares its data pointer.
Thus, strictly forbidding data pointers from const
images to be assigned to non-const
images would make it impossible to write certain types of functions, and would make
other types of functions much more complicated.
Because a copy of a const
image can provide non-const
access to its pixels, we felt
that it did not really make sense either to have the dip::Image::Data
,
dip::Image::Origin
, and dip::Image::Pointer
methods return const
pointers when applied to a const
image. That is, all accesses
to pixel data ignore the constness of the image object. The constness of the image
object applies only to the image’s properties (size, strides, tensor shape, color space,
etc.), not to its pixel values.
However, none of the functions in DIPlib will modify pixel values of a const
image.
Input images to functions are always const
references, and even though it would be
technically possible to modify its pixel values, we have an explicit policy to not do so.
The same applies to the dip::Measurement
object, but for a different reason:
Implementing correct handling of const
objects would require two versions of all
iterators (a const
one and a non-const
one). Since these iterators are quite complex,
and the benefit of correct const
handling is limited, we decided to follow the same
principle as with the dip::Image
object: non-const
data access is always allowed, but
DIPlib has an explicit policy to not to change data of a const
object.
Frameworks
The frameworks simplify the writing of many image processing algorithms such that they work on images of many different data types, are dimensionality independent, and use multithreading. Framework functions also take on the task of testing input images and creating output images. Typically, an image processing function only needs to write a function that processes a single image line.
Such a line function is passed to the framework function, and the framework
function calls the line function for every image line. The line function might
need a state (parameters, intermediate data, output data). Furthermore, intermediate
and output data (i.e. data that the line function writes) must be separate for each
thread calling the line function. DIPlib 2 did this in the typical C fashion:
a function pointer and a void*
to the state. The framework function passed the
void*
to the line function, which would cast it back to its original type. C++11
offers several alternatives that are more type-safe, and offer greater flexibility.
One option (that we did not choose) would be using std::function
, which is an object
that encapsulates a function pointer, a lambda, or a functor with a predetermined
signature. It would be possible to write a lambda function that captures some variables
by reference, or to use std::bind
to bind local variables to a function pointer. It
would also be possible to write a more complex class whose objects can be ‘called’
with the required signature (functor). std::function
has some overhead, especially
at creation.
The other option (that we did choose) is through derived classes and virtual functions.
A base class with a pure virtual function serves as the “model”. An object of a derived
class, implementing a specific image analysis algorithm, can be referenced using a base
class pointer. The derived class can have variables that the algorithm uses,
including references to local variables in the caller’s workspace. A benefit of this
option over the std::function
is that we were able to define a second function
in the class, which the framework function calls once, before starting the processing,
and after it has decided how many threads to use. This allows the creation of
intermediate state variables for each thread. Without this facility, intermediate
state needs to be created for each potential thread (i.e. by examining the maximum
number of threads setting), which might be wasted effort if fewer threads will be
used.
A std::function
might have offered more flexibility in how to implement the line
function, and would have allowed to write simple line functions inline, using a
lambda. However, the syntax using a derived class with a virtual function is somewhat
simpler and more straight-forward, and thus more accessible. This was the main reason
for us to choose the approach we chose.
Multithreading
We use OpenMP for multithreading, mostly because it seems (to me) easier to use than Intel’s Threading Building Blocks (TBB). TBB does not require special compiler support, but all modern compilers support OpenMP (except that XCode’s CLang on MacOS doesn’t come with the OpenMP library, which needs to be installed separately). The GNU Compiler Collection has very good support for OpenMP, and is available on all platforms. TBB probably also plays better with C++ code than OpenMP, which does not allow exceptions to be thrown across parallel construct boundaries. But we’re dealing only (so far) with trivially parallelizable code, so this is not a major issue.
The framework functions determine, based on the number of operations to perform,
whether it is worthwhile to create threads for a particular computation. To do so,
they call a GetNumberOfOperations
method of the line filter object. Each filter
thus needs to have such a method that determines how much work it will be to process
one image line. If the number of operations (clock cycles) is larger than a threshold,
multiple threads are started to process the image. I noticed that there is not a large
difference in overhead between starting one additional thread or starting three, so
it was not worth while to fine-tune the number of threads based on the number of
operations to perform.
The threshold was determined empirically on one single computer, and the way that the number of operations per line is computed is imprecise and in some cases empirical. It is more than likely that the threshold will not be optimal on a different machine. Furthermore, for some filters it is not even possible to determine ahead of time the number of operations because it depends on the data (e.g. see the pixel table morphology line filter).
Thus, the point at which multiple threads are launched is imprecise at best. However, what matters here is that, for very small images, we do not start threads and double or worse the time spent on the filtering. DIPlib 2 did not have any such logic, always started threads within the frameworks, and consequently behaved poorly with very small images. This system is intended to overcome that problem.
Ranges for indexing
There two common ways to define ranges for indexing, either the end index is included or not. For example,
in MATLAB the end index is included, in Python it is excluded. Some languages such as Ruby and Scala allow
the user to choose between the two. DIPlib uses dip::Range
to represent a range for indexing. These
ranges include the end index.
On Stack Overflow one sees quite a few errors and cases of confusion in Python because the end index is not included in the range. Similar errors in MATLAB are highly uncommon. I think that this is because in English (and many, if not all, other languages too) when one counts from a to b, the b is included in the count. This is a strong argument in favor of including the end index: it is less confusing to the novice programmer.
There are no strong arguments
as to why it should be excluded. The argument typically offered is that array[a:b]
in Python has b - a
elements,
rather than b - a + 1
as would be the case if the end index were included, and this is prettier and thus better.
Also, array[a:b] + array[b:c] == array[a:c]
, whereas in MATLAB one would have to write array(a:b-1)
and array(b:c)
to split array[a:c]
; this is again an argument of aesthetics. The only technical argument is that,
by excluding the upper bound, it is possible to index zero elements, array[1:1]
; however, in MATLAB this is also
possible, array(1:0)
, so this argument is again not meaningful.
Nonetheless, MATLAB’s end index being included is paired with indexing starting at 1, such that the range 1:n
has
n
elements, just like Python’s range(0, n)
. There are very few examples where indexing starts at 0 but includes
the end index.
In DIPlib indexing starts at 0, so the logical solution would be for dip::Range( a, b )
to exclude b
, following
Python’s logic, which also starts indexing at 0. But this leads to certain difficulties when using negative indices
to count from the end.
For example, in Python, array[a:]
is all elements from a
to the end of the array, whereas array[a,-1]
is all
elements from a
to the second to last element. In C++ we cannot copy that syntax, we cannot write dip::Range( a, )
.
So if the end index is not included in the array, then there is no way to express a range that includes the last index.
The same is true for reverse ranges that end in the first array element. There is no number before 0 that can be
used to indicate “one element before the first one”, since -1 means the last element of the array.
This particular issue lead to the decision to include the end index in the range, at the risk of doing things differently. There are two positive side-effects to this decision:
-
Ranges that include the end index are more intuitive, as mentioned above.
-
It is not possible to create an empty range.
dip::Range( 4, 4 )
is one element, anddip::Range( 4, 3 )
is two elements, in reverse. By not allowing empty ranges, we can be sure that indexing an image using a range will produce a valid view.
In C++ it is common to write loops as for( ii = 0; ii < n; ++ii )
, and the end iterator always points to one past
the end of the range. But the dip::Range
start and stop values are not meant to be used directly in loops,
they are meant for the user to communicate ranges to library functions. dip::Range
has begin()
and end()
functions for use in loops. Therefore, this aspect should not cause confusion.