Content
News
AAIP '09 --- Call for Sites
Call for Sites for the Workshop on Approaches and Applications of Inductive Programmin 2009 is open. Comments and suggestions can be posted at the bulletin board.
Bulletin Board online
We set up a blog as bulletin board for our homepage. You can access it under inductive-programmin.org/IPblog. We will post announcements ans discussions there.
New IP Logo
We are proud to present our new IP logo. It depicts the stylised acronym for Inductive Programming, with the letters I and P overlay on their stems. Circles emerging from the bottom to the top of the stem symbolise the induction from examples. The bowl of the P is an arrow pointing backwards to the first emerging circle representing recursion and loops.
We also adjusted the whole appearence of our site. Thanks to Sanne helping with the layout and especially for designing this fantastic logo.
Publications updated
The list of publications has been updated. Thanks to Ute Schmid for providing the ip-related part of her bibliography. If you want to contribute or you think papers are missing then please send a message with the BibTex entries to the admin.
People Section finished
The 'People' section has been finished. There an overview of researchers working in the fieled of IP can be found.
AAIP Mailing List transferred
The old AAIP mailing list has now been transferred to ip [HYPHEN] list [AT] inductive [HYPHEN] programming [DOT] org. If you registered to the the old AAIP list, you should have recieved a mail from listserv.uni-bamberg.de's postmaster (postmaster [AT] listserv [DOT] uni [HYPHEN] bamberg [DOT] de). If not, please have a look in your spam folder or use the subscription form to subscribe manually.
Mailing List installed
The mailing list of this site is now working. Everybody interested in the latest news of the IP community can sign in here. To post to the list send a message to ip [HYPHEN] list [AT] inductive [HYPHEN] programming [DOT] org.
Homepage online
This site is now online as the web portal of the IP community.
Systems
!!! Under Construction !!!
ADATE
- Roland Olsson's ADATE homepage
Short Description
Author: Roland Olsson
Automatic Design of Algorithms Through Evolution (ADATE) is an automatic programming system that synthesizes function definitions in a subset of the programming language Standard ML. ADATE is primarily intended as a system that automatically improves a part of an existing program, but is also useful for generating programs from scratch provided that they are fairly small.
The evolution of programs is guided by an evaluation function that tests a given program and says how good it is. Note that the function generated by ADATE might be only a small part of this given program. Thus, ADATE is especially suitable for so-called reinforcement learning, but also useful in more traditional machine learning where input-output pairs are employed to define the evaluation function.
For example, one possible application of ADATE is to automatically improve image processing algorithms. One may then start with a possibly advanced and complex algorithm, select a part of it to be improved and then define an evaluation function that measures the performance of the new versions of the original algorithm that result from plugging in ADATE generated functions instead of the selected part. ADATE has been used in this way to improve one of the best known image segmentation algorithms.
Automatic algorithm improvement as outlined above appears to be practically useful in applications where programs written by human beings contain subroutines or functions that are designed through experimental feedback rather than being optimally defined using purely theoretical considerations.
Depending on the final size of the function to be improved and its algorithmic complexity, ADATE may need to test millions or even billions of candidates. However, we have found experimentally that the number of evaluations needed typically is determined by a low degree polynomial in program size provided that the training inputs and the evaluation function are well defined. The overall run time of ADATE approximately equals the number of evaluations multiplied with the time required to run a program on all the training inputs.
Therefore, ADATE is most easily used in applications where the overall program has a short run time and where the part of it selected for improvement is reasonably small, say smaller than 100 lines of Standard ML code.
ADATE can automatically generate non-trivial and novel algorithms that contain one or more help functions that are created by ADATE when and where needed. For example, ADATE can easily synthesize a functional program that generates a list of lists containing all permutations of a given input list even if it is not given any help functions to start with except the necessary data type constructors. In this case, the best synthesized program contains two invented auxiliary functions that perform combinations of list rotations and concatenations in non-trivial ways.
ADATE uses large scale combinatorial search that employs sophisticated program transformations. Examples of transformations includes expression synthesis of terms that do not make a program worse and adding extra parameters to local help functions in order to make them more general.
The system currently runs on Linux clusters with MPI and can effectively utilize a few hundred CPU cores but a rewrite to obtain practically unlimited scalability with an increasing number of cores per CPU and number of CPUs is being planned.
ATRE
- ATRE homepage
DIALOGS
Short Description
Author: Pierre Flener
The DIALOGS (Dialogue-based Inductive/Abductive LOGic program Synthesiser) technique resulted from an effort at building a fully interactive version of SYNAPSE, and at extending its power while at the same time simplifying its machinery. The main objective was to take all burden from the specifier by having the technique ask for exactly and only the information it needs. As a result, no evidence needs to be prepared in advance, since the technique invents its own evidence and queries an oracle (usually a human specifier) about it. This is suitable for all levels of expertise of human users, as the queries are formulated in the specifier's (initially unknown) conceptual language, in a program-independent way, and such that the specifier must know the answers if s/he really feels the need for the program. For instance, the query "When does sort([X,Y],R) hold?" can be answered by
thereby expanding the perceived subset of the conceptual language of the specifier, and giving the same information as in a property provided to SYNAPSE. The evidence language implicitly amounts to (non-recursive) normal logic programs. Type declarations are provided as language bias. Mode and determinism information are not required, because the focus is on synthesising the logic component of logic programs. The technique is schema-guided, and currently has two schemas (divide-and-conquer and accumulate) where multiple base clauses and multiple recursive clauses are possible. The hypothesis language is normal logic programs, where negation is restricted to the case discriminants and appears there by extraction from the query answers (i.e., it can only be applied to primitive predicates and could thus be avoided by using the complementary primitives in the query answers, as we did above by writing X > Y instead of ¬ X ≤ Y).
FLIP
- The homepage of the MIP Group
Short Description
Author: Cèsar Ferri
FLIP implements a framework for the Induction of Functional Logic Programs (IFLP) from facts. This can be seen as an extension to the now consolidated field of Inductive Logic Programming (ILP). Inspired in the inverse resolution operator of ILP, the system is based on the reversal of narrowing, the more usual operational mechanism for Functional Logic Programming. The main advantages of the FLIP system over the most used ILP systems are a natural handling of functions, without the use of mode or determinism declarations, and its power for inducing short recursive programs. Its applications are mainly program synthesis, program debugging and data mining of small highly structured documents.
The FLIP system accepts both positive and negative examples. Examples consist of sets of equations, with its rhs. normalised wrt. the program to be induced.
Background knowledge can also be used as a conditional functional logic program. Note that any logic program can be expressed as a conditional functional logic program. Therefore, virtually any logic program can be used as background knowledge as well as functional definitions. FLIP must be given the function symbols from the background knowledge that should be used in the definition of the function to be learned.
The FLIP system can induce 'correct' programs from very few examples. There is no need for good examples (in the sense of Ling), nor Basic Representative Sets (BRS) such as FOIL, GOLEM, Progol, amongst others, require.
FOIL/FFOIL
- Ross Quinlan's personal homepage
GOLEM
- Stephen Muggleton's personal homepage
Short Description
Author: S.H.Muggleton
Golem (Muggleton and Feng, 1990) is an ILP system, written in C, which iteratively constructs a set of definite clauses from groups of atomic examples based on Plotkin's Relative Least General Generalisation (RLGG). Plotkin's approach provides a method of constructing a unique clause which covers a set of examples relative to given background knowledge. However, such a clause can in the worst case contain infinitely many literals, or at best grow exponentially with the number of examples involved. By bounding the depth of variables and imposing a deterimanacy constraint the Golem system was guaranteed to build clauses whose length was bounded by a polynomial function of various features of the background knowledge.
Golem is given positive and negative examples together with extensional background knowledge. It starts by randomly choosing pairs of positive examples. For each pair the ij-determinate RLGG is constructed and then maximally generalised by dropping body atoms while maintaining consistency with respect to the negative examples. All reduced clauses are evaluated with respect to the degree to which they compress the information in the examples. The pair of examples leading to the most compressive clause is then extended by randomly adding examples to produce triples and later n-tuples, for which further RLGG clauses are constructed and reduced in a similar fashion. The single clause construction terminates once further extenions of the RLGG no longer lead to improved compression. Once a single clause has been asserted, all covered examples are removed and the clause construction process continues on all remaining examples. The process terminates when no remaining pair of examples has a consistent RLGG.
Golem was applied to a variety of relational learning problems including the learning of simple Prolog programs as well as real-world applications involving protein-fold prediction, drug-activity prediction, learning of computer-aided design rules and various chess problems.
The use of inverse tabling methods allowed the efficiency of Golem to be comparable to the contemporary ILP system FOIL. FOIL greedily searches the space guided by an information measure similar to that used in ID3. This measure supports the addition of a literal in the body of a clause on the basis of its ability to discriminate between positive and negative examples. This gains efficiency at the expense of completeness. For instance the literal
in the recursive quick-sort clause does not produce any discrimination between positive and negative examples. As a result quick-sort cannot be learned by FOIL. The problem recurs throughout a large class of predicates in which new variables in the body are functionally derived from terms in the head of the clause. Such predicates include arithmetic multiply, list reverse and various real-world domains.
Golem's key weaknesses with respect to the later-developed Progol system were associated with the required use of ground unit clause background knowledge and its limitation to the construction of determinate clauses. The determinacy constraint was particularly awkward for applications involving learning the properties of chemical compounds described in the form of atom and bond relationships.
Igor I
Short Description
Author: Ute Schmid
IGOR I is a modern extension of Summer's seminal THESYS system. The underlying program template describes the set of all functional programs with the following restrictions: built-in functions can only be first-order, and no nested or mutual recursion is allowed. IGOR I adopts the two-step approach of THESYS. Synthesis is still restricted to structural problems, where only the structure of the arguments matters, but not their contents, such as in list reversing. Nevertheless, the scope of synthesisable programs is considerably larger. For instance, tree-recursive functions and functions with hidden parameters can be induced. Most notably, programs consisting of a calling function and an arbitrary set of further recursive functions can be induced. The first step of synthesis (trace construction) is therefore expanded such that traces can contain nestings of conditions. The second step is expanded such that the synthesis of a function can rely on the invention and synthesis of other functions (that is, IGOR I uses a technique of function invention in correspondence to the concept of predicate invention).
Igor II
Short Description
Author: Emanuel Kitzelmann
IGOR II relies on constructor-term rewriting techniques. Thereby characteristics of modern functional programming languages are covered: Functions are defined with respect to algebraic datatypes and control flow is guided by pattern matching. Specifications are presented as sets of example equations. Several dependent functions, e.g., the mutual recursive solution for even and odd, can be specified and learned simultaneously. The example equations can contain variables, which is especially convenient for specifying more complex problems where large sets of ground examples would be necessary to cover all possible cases of the desired behaviour of the program to be learned.
IGOR II performs synthesis in one step from a given set of example equations together with the algebraic datatype. Background knowledge can be provided in form of additional example equations. Due to the generality of user-defined datatypes, list problems as well as problems on natural numbers or trees or any other algebraic type can be solved. Since also the Boolean values can be defined as algebraic type, functions yielding trees, lists, numbers, etc, as well as predicates, e.g., equality and order relations, can be induced.
The patterns are assured to be non-unifying such that the equations constituting the induced program can be regarded as a \emph{set} of equations, where order does not matter. As a consequence, e.g., member cannot be defined based on the patterns (X,[]), (X,[X|Xs]) and (X,[Y|Xs]) since these patterns unify and hence cannot be considered in an arbitrary order. Thus, for solving problems where pattern variables need to be compared regarding equality or any other predicate, patterns alone do not suffice and if-then-else conditionals are needed. Currently, IGOR II can introduce such conditionals if needed when the condition is an equality or order test on the pattern variables such that, e.g., member or inserting a number at the right place into an ordered list can be induced. When providing background knowledge, a broader class of semantic problems becomes feasible. For example, insertion sort can be synthesised providing insert as background knowledge; quick sort can be synthesised providing append and partition.
Induction in IGOR II relies on calculating least general generalisations (lggs) over the example equations and refining these lggs using the following three techniques: splitting rules by pattern refinement, introducing function calls, and function invention.
MagicHaskeller
- MagicHaskeller: A Search-based Inductive Functional Programming System by Susumu Katayama
Short Description
coming soonPROGOL
- Stephen Muggleton's personal homepage
SYNAPSE
- Download at Pierre Flener's homepage
Short Description
Author: Pierre Flener
The SYNAPSE (SYNthesis of Algorithms from PropertieS and Examples) technique takes as evidence a set of (non-recursive) Horn clauses describing a single intended relation. Ground unit clauses are called (positive) examples and data-drive the synthesis; all other clauses are called properties and are used to find instances of the place-holders of the schema. For instance, the clauses
and
are properties. No other bias is given, though types are inferred from the examples. Mode and determinism information are not required, because the focus is on synthesising the logic component of logic programs. Synthesis is passive, although there is an expert mode where the technique asks for a preference among the possible instances of some place-holders, rather than non-deterministically choosing each from a repository. These problem-independent repositories form the (partitioned) background knowledge. The technique is based on a divide-and-conquer schema where multiple base clauses and multiple recursive clauses are possible. The hypothesis language is normal logic programs, where negation is restricted to the case discriminants and appears there by extraction from the properties (i.e., it can only be applied to primitive predicates and could thus be avoided by using the complementary primitives in the properties).