Daplex is a functional style data language that was designed as a ``conceptually natural'' interface to the FDM [14]. It is a high-level, declarative language, and is therefore amenable to optimisation, but it is also very expressive, as the examples given in the rest of this section will hopefully illustrate.
A database query in Daplex takes the form of several nested ``for'' loops, which describe sets of database values, followed by a sequence of ``actions'', such as display or update actions, which describe how those sets are to be manipulated. For example, if we wished to display the names and telephone numbers of all the staff members in the unidb database, we would give the following Daplex query, which contains just a single loop and a single action:
for each s in staff
print(forename(s), surname(s), 'has telephone no.', telephone(s));
In order to execute this query against a database, we must compile it into Prolog. To do this, enter the P/FDM system and open the database you wish to retrieve data from:
% pfdm
P/FDM welcome message ...
| ?- open_module(unidb, read).
yes
| ?-
There are two ways to compile the above query. The first is to type it into a file, for example called query.d, and then ask P/FDM to compile the contents of that file:
| ?- daplex('query.d').
This command compiles all the Daplex queries that it finds in the given file, converting each one in turn into a Prolog program called run/0 which it then executes. If you are interested, you can look at the code generated for the last query in a file, by asking Prolog to display the definition of run/0:
| ?- listing(run).
As a result of compiling the query.d file, then, a list of the names and telephone numbers of the staff recorded in the unidb database will be displayed.
The alternative way to compile your query is to type the daplex command by itself. The compiler will then prompt you to type in your query directly, without having to create a separate file for it:
| ?- daplex.
|: for each s in staff
|: print(forename(s), surname(s), 'has telephone no.', telephone(s));
Notice that a piece of Daplex is always terminated by a semi-colon (we saw this earlier when defining schemas in the data definition part of the language). When the daplex compiler detects the semi-colon, it compiles the query that has been entered and executes it, just as it did when compiling the query.d file. When the execution is complete, the compiler prompts for more Daplex code (i.e. with the ``|:'' prompt). You may type in another query, or to return to the Prolog prompt, type a semi-colon by itself.
What then is the behaviour of a Daplex query when it is executed? For a query containing only a single loop, like our example query, the behaviour is very simple. The set described by the loop is evaluated, and then the actions are executed once for each value generated, with that value being substituted for every occurrence of the loop variable (s in the example). When we have more than one loop in a program, as in the following (nonsense) query:
|: for each s in staff
|: for each c in course
|: print(forename(s), surname(s));
the set described by the inner loops is evaluated once for every value of the outer loop's set, producing a set of tuples. In this case, the result is a set of binary tuples giving the cartesian product of the staff and course classes. The sequence of actions is now executed for each tuple in the final loop set, again with appropriate value-for-variable substitutions. It is possible, and more usual, to define the set of a nested loop in terms of the loop variable(s) of some outer loop(s), thus achieving the effect of a ``join'' in a relational query language. We will see an example of this kind of query shortly.
This separation of the execution of a Daplex program into two phases (i.e. the loop set evaluation and action sequence execution) is important because it is this which allows us to perform side-effecting actions such as data updates within a functional language. The compiler knows which parts of a Daplex program can be treated functionally and which cannot. It is therefore able to separate these out into the two phases at compile-time, and to treat each appropriately. However, this does impose some overhead for executing queries, since we need to store the loop sets for use later in the action execution phase. To alleviate this problem, the Daplex compiler is careful to avoid this execution strategy whenever it is safe to do so - namely, when only print actions are involved. For this kind of query, the order of retrieval of set values does not have any significant effect, except upon the ordering of the output values. Therefore, the execution strategy for this simple kind of query is to execute the sequence of actions (i.e. the print commands) as soon as each loop value tuple is generated.
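The difference between the two execution strategies can be illustrated with a small model written in Python. This is only a sketch of the behaviour described above, not the code that the Daplex compiler generates; the function names and the sample values used with them are invented for the illustration.

```python
# Illustrative model only; not the Prolog that P/FDM actually generates.

def staff_course_pairs(staff, courses, log):
    """Nested loops: the inner set is evaluated once per value of the outer set."""
    for s in staff:
        for c in courses:
            log.append(("generate", s, c))
            yield (s, c)


def two_phase(loop_set, action):
    """Evaluate the whole loop set first, then run the action on each stored tuple."""
    stored = list(loop_set)   # phase 1: the loop set is stored (the overhead)
    for row in stored:        # phase 2: the action sequence is executed
        action(row)


def streamed(loop_set, action):
    """Run the action as soon as each tuple is generated (print-only queries)."""
    for row in loop_set:
        action(row)
```

With two staff values and one course value, two_phase records both ``generate'' events before any action runs, whereas streamed alternates between generating a tuple and acting on it. The tuples produced are the same in both cases.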
We can see from our previous example that a loop begins with the key words for each, followed by a variable name (in this case, s), the key word in and finally the name of a database class (staff). In fact, we are not restricted to iterating over complete classes in this way, and Daplex provides a rich set of constructs for describing sets of values. For example, we can iterate over the set of integers from 1 to 5 in the following manner:
|: for each i in {1, 2, 3, 4, 5}
|: print(i);
or we can abbreviate the description of this set to:
|: for each i in {1 to 5}
|: print(i);
Another useful way of describing a set is to restrict a database class by adding further criteria to it, using the key words such that. For example, we can print the names of all students older than 20 as follows:
|: for each s in student such that age(s) > 20
|: print(forename(s), surname(s));
Another kind of set that is commonly required in queries is that defined not by a database class, but by the results of applying a database function to a particular instance. For example, suppose we wished to print the names of all researchers who are working on 3 year projects:
|: for each p in project such that duration(p) = 3
|: for each r in researcher such that employed_on(r) = p
|: print(forename(r), surname(r), 'works on', title(p));
Notice that we needed two for each loops to express this query. This is because we first needed to generate the set of 3 year projects, and we then used the employed_on function, from researchers to projects, to generate the set of researchers that we are interested in. The loop variable p remains in scope, however, so that we can also use it to print the title of the project for each researcher at the end.
An alternative way to express the same query is to use the inverse function of employed_on to transform the set of 3 year projects to the set of researchers who are employed on them:
|: for each p in project such that duration(p) = 3
|: for each r in employed_on_inv(p)
|: print(forename(r), surname(r), 'works on', title(p));
In fact, this results in a much more efficient query, since we are navigating directly along a stored link, rather than iterating over a class testing for equality of some attribute. However, in general, it is not necessary for the user to worry about using inverse functions, as the Daplex query optimiser is able to spot situations in which they may be used, and to optimise the query accordingly. Consequently, the first of the above two queries will be transformed into the second before it is executed.
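To see why the two forms of the query select the same researchers, consider the following Python sketch. It assumes, purely for illustration, that employed_on is held as a dictionary from researchers to projects and that its inverse is a second dictionary built from it; the identifiers r1, p1 and so on are hypothetical, and nothing here describes how P/FDM actually stores links.

```python
# Hypothetical data: researcher -> project, standing in for employed_on.
employed_on = {"r1": "p1", "r2": "p2", "r3": "p1"}


def build_inverse(single_valued):
    """Build the multi-valued inverse of a single-valued function held as a dict."""
    inverse = {}
    for source, target in single_valued.items():
        inverse.setdefault(target, []).append(source)
    return inverse


employed_on_inv = build_inverse(employed_on)


def researchers_by_scan(project):
    """First form: iterate over every researcher, testing employed_on(r) = p."""
    return [r for r in employed_on if employed_on[r] == project]


def researchers_by_inverse(project):
    """Second form: follow the inverse link directly from the project."""
    return employed_on_inv.get(project, [])
```

Both functions return the same researchers for any project, but the first examines every researcher on each call, while the second performs a single lookup.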
This use of function application to express navigation through a database is one of the primary advantages of a functional approach for data manipulation. In this example, we navigated along only one relationship, but it is possible to express navigation over a chain of an arbitrary number of relationships succinctly by composing functions. For example, if we wish to find the salaries of all people who teach fourth year students, we must navigate from the undergrad class, through the course and section classes, to finally reach the appropriate set of teacher instances:
|: for each u in undergrad such that year(u) = 4
|: for each t in has_lecturer(has_course_inv(takes(u)))
|: print(forename(t), surname(t), salary(t));
The important thing to notice here is that the result of the application of the has_lecturer function is a set, even though it is a single-valued function. This is because both the takes and has_course_inv functions are multi-valued, and therefore their composition produces a set of section instances when applied to an undergrad instance. The has_lecturer function is applied to each of these section instances, thus producing a set of teacher instances. In fact, the function composition here can be seen as a shorthand notation for the following query:
|: for each u in undergrad such that year(u) = 4
|: for each c in takes(u)
|: for each s in has_course_inv(c)
|: for each t in has_lecturer(s)
|: print(forename(t), surname(t), salary(t));
which is considerably longer and arguably less readable than the version using composed functions.
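The correspondence between the composed and the nested forms can be modelled in Python as follows. This is a sketch under stated assumptions: the three dictionaries are hypothetical stand-ins for the takes, has_course_inv and has_lecturer functions, the instance names are invented, and the sample data is chosen so that no teacher is reached twice (how duplicates are treated is not addressed here).

```python
def apply_to_set(function, values):
    """Apply a function to every element of a collection and gather all results.

    `function` returns a list for each input; a single-valued function is
    modelled as returning a one-element list.
    """
    return [result for value in values for result in function(value)]


# Hypothetical instances, not taken from the unidb database.
takes = {"u1": ["c1", "c2"]}                           # undergrad -> courses
has_course_inv = {"c1": ["s1", "s2"], "c2": ["s3"]}    # course -> sections
has_lecturer = {"s1": "t1", "s2": "t2", "s3": "t3"}    # section -> teacher


def lecturers_composed(undergrad):
    """Model of has_lecturer(has_course_inv(takes(u)))."""
    courses = apply_to_set(lambda u: takes.get(u, []), [undergrad])
    sections = apply_to_set(lambda c: has_course_inv.get(c, []), courses)
    return apply_to_set(lambda s: [has_lecturer[s]], sections)


def lecturers_nested(undergrad):
    """Model of the three explicit nested for each loops."""
    found = []
    for c in takes.get(undergrad, []):
        for s in has_course_inv.get(c, []):
            found.append(has_lecturer[s])
    return found
```

Applying a single-valued function to a set of three sections yields three teachers, which is why the result of the composition is itself a set.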
In fact, functions can be applied to any set description in this way, not only to variables. This allows us to abbreviate the query given earlier to print the names of all researchers employed on 3 year projects to:
|: for each r in employed_on_inv(p in project such that duration(p) = 3)
|: print(forename(r), surname(r), 'works on', title(employed_on(r)));
However, in this version, the variable p is now local to the definition of the set of values for r. It is no longer in scope at the end of the loops and so cannot be used in the print action, as before. In this case, we have to navigate to the project title from each researcher instance again, although the query optimiser may be able to save this information for us during the loop optimisation. In general, however, it is a good idea to write an explicit for loop for any values which we will want to use again, either in the specification of any nested loops or in the actions.
In addition to the for each loop which we have already encountered, there are two other kinds of loop, each specifying a slightly different kind of iteration. The first of these is introduced by the key words for the. It is used when we wish to specify iteration over a known singleton set. This situation most commonly occurs when we are retrieving instances by their key values, as in the following query:
|: for the u in undergrad such that forename(u) = "Graham" and
|: surname(u) = "Kemp"
|: for each c in takes(u)
|: print(code(c), level(c));
which prints the code and level of all the courses taken by the undergraduate student called Graham Kemp. However, there are other cases where this kind of loop is useful, such as when binding a variable to the result of applying a single-valued function, or when the semantics of the current domain require that a set evaluates to a singleton set.
Of course, it is possible to use a for each loop to iterate over a singleton set, but by using a for the loop we are adding an extra constraint: the set generated by the loop specification must contain exactly one element. Therefore, if a set defined in a for the loop contains either no elements or more than one, then the loop will produce no variable bindings.
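The cardinality rule for a for the loop can be stated compactly in Python. The generator below is only a model of the behaviour just described; the name for_the is hypothetical and is not part of P/FDM.

```python
def for_the(candidates):
    """Yield the element if the set has exactly one member; otherwise yield nothing."""
    values = list(candidates)
    if len(values) == 1:
        yield values[0]
```

An empty set and a set of two or more elements both produce no bindings, so any loops nested inside, and the actions, are never reached in those cases.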
The final type of loop is denoted by the key words for any. This kind of loop is also concerned with the cardinality of the set it generates, but in a rather different way. We use a for any loop whenever we wish to iterate over a fixed number of the elements of the given set, but we do not care which are chosen. For example, suppose that we wish to print the names of any three undergraduate students from each year. In this case, we can state the query using a for any loop:
|: for each y in {1 to 4}
|: for any 3 u in undergrad such that year(u) = y
|: print(forename(u), surname(u));
A shorthand version of this loop syntax exists for the case when only one value is to be selected from the set, so that:
for any 1 p in person
and
for any p in person
are equivalent. It is also possible to define the number of elements that must be selected by giving a variable or an expression rather than a constant value. For example, the following (rather artificial) program prints the name of one year 1 student, two year 2 students, etc.:
|: for each y in {1 to 4}
|: for any y u in undergrad such that year(u) = y
|: print(forename(u), surname(u));
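The selection performed by a for any loop can be modelled in Python as taking some n elements from the set. The sketch below simply takes the first n that are generated, which is one permissible choice among many; the name for_any and the sample data are hypothetical. What P/FDM does when the set holds fewer than n elements is not stated here, and in that case this model just yields however many there are.

```python
from itertools import islice


def for_any(n, candidates):
    """Yield n elements of the set; which ones are chosen is unspecified."""
    return islice(candidates, n)


# Hypothetical data: five undergraduates in each of years 1 to 4.
undergrads = [("u%d_%d" % (y, i), y) for y in range(1, 5) for i in range(5)]


def pick_per_year():
    """Model of: for each y in {1 to 4} for any y u in undergrad such that year(u) = y."""
    picked = []
    for y in range(1, 5):
        in_year = (name for name, year in undergrads if year == y)
        picked.extend(for_any(y, in_year))
    return picked
```

With this data, pick_per_year selects one student from year 1, two from year 2, three from year 3 and four from year 4.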
So far, we have concentrated on describing the ``loops'' part of a Daplex query, and have said very little about what is and is not possible in the ``actions'' part. This part of a query consists of a comma-separated sequence of actions such as the print command, which describes how the sets of values specified by the for loops are to be treated. Since we are concerned here with the use of Daplex as a query language, the only action we are interested in for the moment is the print command, although later, in Chapter 4, we will discuss Daplex actions for updating databases and, in Chapter 5, we will describe how special-purpose actions can be defined in either Daplex or Prolog for use in data manipulation programs.
The print action is unusual in that it does not take a fixed number of arguments. We have already seen examples of its use with two and four arguments. Nor does it impose any specific type on any of its parameters, except for the restriction that only scalar values may be printed. According to this, the following two queries involve legal uses of print:
|: for each p in person
|: print(surname(p));
|: for each i in {1 to 10}
|: print(i * i, i + i);
but this third program does not:
|: for each p in person
|: print(p);
The result of a print action is that each of its arguments is displayed on a single line, separated by a single space in each case. The entire line is terminated by a new-line. The following program illustrates some of the effects of this behaviour:
|: for each i in {1 to 3}
|: print(i, 'has square'), print(i * i);
It produces the following output:
1 has square
1
2 has square
4
3 has square
9
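The formatting rule behind this output can be reproduced with a few lines of Python. This is a model of the stated behaviour only (arguments joined by single spaces, each print ending its own line); the function names are hypothetical.

```python
def daplex_print(*args):
    """Format one print action: arguments separated by a space, then a new-line."""
    return " ".join(str(a) for a in args) + "\n"


def run():
    """Model of: for each i in {1 to 3} print(i, 'has square'), print(i * i);"""
    out = []
    for i in range(1, 4):
        out.append(daplex_print(i, "has square"))
        out.append(daplex_print(i * i))
    return "".join(out)
```

Because each print action terminates its own line, the square appears on the line after the text rather than beside it. Passing i * i as a third argument to the first print would place all three values on one line.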
We saw earlier how we could use function composition in place of an explicit for loop for describing database navigation, and a similar facility is available for use with actions. The following two programs, which are equivalent, illustrate its use:
|: for each p in person
|: print(surname(p));
|: print(surname(person));
However, the user should be aware that, to the compiler, there is no difference between an implicit and an explicit for loop, and they may be reordered by the optimiser as described earlier for queries involving only single print actions. Thus, a query having the following structure:
|: for each p1 in person
|: print(forename(p1), surname(person));
may be translated into either of the following queries:
|: for each p1 in person
|: for each p2 in person
|: print(forename(p1), surname(p2));
|: for each p2 in person
|: for each p1 in person
|: print(forename(p1), surname(p2));
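The effect of the two possible translations can be compared with a short Python model. The records and function names below are hypothetical; the sketch shows only that the two loop orders yield the same pairs in a different sequence.

```python
# Hypothetical person records.
people = [
    {"forename": "Ann", "surname": "Lee"},
    {"forename": "Bob", "surname": "Ray"},
]


def p1_outermost(persons):
    """Model of the first translation: p1 is the outer loop variable."""
    return [(p1["forename"], p2["surname"]) for p1 in persons for p2 in persons]


def p2_outermost(persons):
    """Model of the second translation: p2 is the outer loop variable."""
    return [(p1["forename"], p2["surname"]) for p2 in persons for p1 in persons]
```

Both functions return the same four pairs, but p1_outermost groups them by forename and p2_outermost groups them by surname, so the printed lines would appear in a different order.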
If you would prefer not to have your query reordered in this way, the Daplex optimiser can be turned off, using the optimiser/1 command:
| ?- optimiser(off).
The optimiser can be turned back on by giving the atom on as the argument to this predicate.
These two constructs - the loop and the action - form the basis of the Daplex language. However, the real power comes from the rich set of constructs which are available for specifying sets, predicates and expressions in terms of database values, which we will now go on to describe. Examples of the use of some of these constructs can be found in the file of example unidb queries in the Examples directory (unidb.d).