1. Salishan Paraffins Problem

Given an integer n, output the chemical structure of all paraffin molecules
with i carbon atoms for every i<=n, without repetition and in order of
increasing size. Include all isomers, but no duplicates. The chemical formula
for paraffin molecules is C(i)H(2i+2). Any representation of the molecules may
be chosen, as long as it clearly distinguishes among isomers.

The problem addresses the representation of recursive tree structures, creation
and manipulation of these structures, nested loop parallelism and some
combinatorics issues.

2. Overall Strategy

The Paraffins problem was designed to reveal the strength of applicative
languages, which can encode a compact solution based on higher order functions
at the cost of inefficiency. The usual approach is to produce all paraffin
molecules by attaching paraffin radicals of appropriate sizes to a leading
carbon atom, without regard to duplicates, so that many different orientations
of the same paraffin isomer may be produced. As each new molecule is
generated, it is tested for distinctness from all previously retained
molecules. This test compares molecules under various transformations such as
rotation, inversion and swapping of paraffin radicals.

A much more efficient solution, which guarantees that only molecules not
previously generated are produced at each step, is based on the theory of free
and oriented trees. A paraffin molecule can be viewed as a free tree with nodes
corresponding to carbon atoms and edges corresponding to carbon-carbon bonds.
However, the data structures for paraffin molecules are designed to represent
oriented rather than free trees. Duplicates are avoided by fixing a mapping
between the vertices in the free tree and those in the oriented tree. Imposing
an order on the unordered neighbors of a node in a free tree is one source of
duplicates; it is eliminated by keeping the subnodes of every node in
lexicographic order.

Varying the vertex that is mapped to the root vertex in the oriented
representation of a free tree is another source of duplicates. The centroid
theorem is used to avoid this: a tree of odd size has a single centroid (a
vertex whose largest attached subtree is as small as possible), while one of
even size has either a single centroid or a pair of adjacent centroids.
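
The centroid theorem can be checked on small carbon skeletons with a short
sketch. It is written in Python rather than Sather, and the function name
centroids, the edge-list input and the vertex numbering 0..n-1 are assumptions
made for this illustration only; they are not taken from the Sather program.

```python
from collections import defaultdict


def centroids(n, edges):
    """Return the sorted centroid(s) of a free tree on vertices 0..n-1."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Orient the tree at vertex 0 with an iterative depth-first search.
    parent = {0: None}
    order = []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    # Subtree sizes, accumulated from the leaves towards the root.
    size = {u: 1 for u in order}
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]

    # Weight of u = size of the largest piece left when u is removed.
    weight = {}
    for u in order:
        w = n - size[u]  # the piece that contains u's parent
        for v in adj[u]:
            if v != parent[u]:
                w = max(w, size[v])
        weight[u] = w

    best = min(weight.values())
    return sorted(u for u in order if weight[u] == best)
```

For the skeleton of n-pentane, centroids(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
gives the single middle carbon, while the four-carbon chain of n-butane yields
a pair of adjacent centroids and the star-shaped skeleton of isobutane yields
a single one.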

3. The Structure of Paraffin Isomers

Odd-sized paraffin molecules and even-sized single-centroid paraffin molecules
are called "carbon-centered paraffins", or CCPs. The root of a CCP is a carbon
atom that has 4 radicals as subtrees. If i is the size of the molecule, each of
the four radicals has at most (i-1)/2 carbon atoms (rounded down), and together
they have i-1.

Even-sized double-centroid paraffin molecules, called "bond-centered
paraffins", or BCPs, have a root corresponding to a carbon-carbon bond. Each of
the carbon atoms attached to this bond is the root of a paraffin radical. The
two top-level radicals are of size i/2 each.

The root node of a paraffin radical of size i, i>0, corresponds to a carbon
atom and has 3 subnodes, each the root of a subtree representing a paraffin
radical. Each of these three subradicals has a size of at most i-1, and their
total size is i-1.

A paraffin radical of size 0 is a hydrogen atom.

All three kinds of objects (radicals, BCPs, and CCPs) are constructed by
attaching the needed number of paraffin radicals of appropriate sizes to some
other node.
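
The three constructions above can be sketched in a few lines. This is a
minimal illustration in Python, not the Sather implementation: the nested
tuple representation, the names radicals_up_to, ccps, bcps and paraffins, and
the use of itertools.combinations_with_replacement to pick each unordered
group of radicals exactly once are all assumptions made here. The sketch
distinguishes structural isomers only.

```python
from itertools import combinations_with_replacement

H = "H"  # a paraffin radical of size 0 is a hydrogen atom


def radicals_up_to(max_size):
    """Return rads, where rads[k] lists each radical with k carbons once."""
    rads = [[H]]
    for i in range(1, max_size + 1):
        # Candidate subradicals with their sizes, in one fixed global order.
        pool = [(k, r) for k in range(i) for r in rads[k]]
        rads.append([
            ("C", a[1], b[1], c[1])
            # Nondecreasing pool positions: every unordered triple once.
            for a, b, c in combinations_with_replacement(pool, 3)
            if a[0] + b[0] + c[0] == i - 1
        ])
    return rads


def ccps(i, rads):
    """Carbon-centered paraffins of size i: four radicals around a carbon."""
    limit = (i - 1) // 2  # no radical may exceed this size
    pool = [(k, r) for k in range(limit + 1) for r in rads[k]]
    return [
        ("C", a[1], b[1], c[1], d[1])
        for a, b, c, d in combinations_with_replacement(pool, 4)
        if a[0] + b[0] + c[0] + d[0] == i - 1
    ]


def bcps(i, rads):
    """Bond-centered paraffins of size i: two radicals of size i/2."""
    if i % 2:
        return []
    half = rads[i // 2]
    return [("bond", a, b) for a, b in combinations_with_replacement(half, 2)]


def paraffins(n):
    """Return a list whose entry i-1 holds all paraffins with i carbons."""
    rads = radicals_up_to(n // 2)
    return [ccps(i, rads) + bcps(i, rads) for i in range(1, n + 1)]
```

With these definitions, [len(ms) for ms in paraffins(8)] evaluates to
[1, 1, 1, 2, 3, 5, 9, 18], the isomer counts for methane through octane.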


4. Comparison to Ada

The Ada solution to the Paraffins problem contains a surprisingly small number
of executable statements relative to the total number of statements in the
program: only about 25%. The bulk of the code consists of definitions of
various generic functions, modules, etc. The Sather ratio is considerably
higher: 66%.

The other very significant difference is in the implementation of the
partition enumeration routine. One of the goals of the Paraffins problem is to
test the ability of different languages to facilitate the reuse of software
components. The code for partition enumeration is used in the program in many
different contexts. In all cases, exactly the same semantics is required of
the enumeration routine: it has to generate all ordered partitions of a
specified size with a specified sum of all elements such that all elements are
within a given range. Partitions are to be generated in ascending
lexicographic order.
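
The enumeration contract just described can be sketched as a recursive Python
generator. The sketch assumes that "ordered" means the elements of each
partition appear in nondecreasing order; the parameter names size, total, lo
and hi are likewise assumptions of this illustration and are not taken from
the Ada or Sather code.

```python
def enum_partitions(size, total, lo, hi):
    """Yield nondecreasing tuples of `size` integers in [lo, hi] whose sum
    is `total`, in ascending lexicographic order."""
    if size == 0:
        if total == 0:
            yield ()
        return
    rest = size - 1
    for first in range(lo, hi + 1):
        # The remaining elements are at least `first`: sum already too big.
        if first * size > total:
            break
        # The remaining elements are at most `hi`: sum cannot be reached.
        if first + rest * hi < total:
            continue
        for tail in enum_partitions(rest, total - first, first, hi):
            yield (first,) + tail
```

For example, list(enum_partitions(3, 4, 0, 4)) produces
[(0, 0, 4), (0, 1, 3), (0, 2, 2), (1, 1, 2)], which are the possible size
triples for the subradicals of a radical with five carbon atoms.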

What differs from one usage to another is the operation(s) applied to the
generated partitions. The Ada solution adopted the Lisp style of passing an
apply_to_each function into the enumeration routine, thus exposing the
enumeration code to the outside world. Ada uses its generic function mechanism
for such parametrization. Each client of the enumeration abstraction needs to
create its own specialized version, which differs from the rest only in the
semantics of the apply_to_each function. Moreover, the interface of
apply_to_each is fixed (modulo some limited parametrization). Thus every
client has to create a function with a fixed interface that processes the
partitions generated by the enumeration routine, even though nothing in the
semantics of the client/enumeration interaction requires such a restriction.
Exactly the same is repeated for the tuple enumeration routine. As a result,
the code gets polluted by many instantiations of the generic functions.

The Sather solution seems to show certain software engineering benefits. The
partition enumeration abstraction is implemented as an iterator
(enum_partitions!). Once initialized, the iterator produces a stream of
partitions. The client can apply any action to each generated partition, with
no restriction on the interface whatsoever. Admittedly, writing the recursive
iterator enum_partitions! is a bit tricky. However, the advantage is clear:
the iterator implementing the enumeration abstraction deals only with
enumeration and nothing else. Unlike in the Ada solution, none of the
peculiarities of possible usages of the generated partitions are exposed to
partition generation itself.
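
The contrast between the two styles can be illustrated by analogy in Python,
where a generator plays the role of a Sather iterator and a function argument
stands in for the Ada generic parameter. This is only a sketch of the design
difference, not a rendering of either program: for_each_partition is a
hypothetical name, and both enumerators use a deliberately simple brute-force
filter instead of a recursive enumeration.

```python
from itertools import combinations_with_replacement


def for_each_partition(size, total, lo, hi, apply_to_each):
    """Callback style: the enumerator drives the client through a
    one-argument function whose interface is fixed."""
    for p in combinations_with_replacement(range(lo, hi + 1), size):
        if sum(p) == total:
            apply_to_each(p)


def enum_partitions(size, total, lo, hi):
    """Iterator style: the enumerator only produces a stream."""
    for p in combinations_with_replacement(range(lo, hi + 1), size):
        if sum(p) == total:
            yield p


# Callback client: any state has to be reachable from inside the callback.
collected = []
for_each_partition(3, 4, 0, 4, collected.append)

# Iterator clients: each one consumes the stream in whatever way it needs.
count = sum(1 for _ in enum_partitions(3, 4, 0, 4))
first_without_zero = next(p for p in enum_partitions(3, 4, 0, 4) if min(p) > 0)
```

The iterator clients can count, filter, or stop early without any change to
the enumerator, whereas the callback client has to fit each of these needs
into the single apply_to_each interface.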

Finally, parallelization appears to be equally trivial in Ada and pSather. Both
the rendezvous mechanism in Ada and parloop in pSather provide good and simple
abstractions, with no obvious winner.

