Machine Learning & Knowledge Discovery
      Research

      University of Aberdeen

      Machine Learning and Knowledge Discovery are two highly-related areas. Machine Learning has traditionally been ...

      People

      Derek Sleeman Professor
      Pete Edwards Lecturer
      Claire Green Research Assistant
      Darren Johns Research Student
      Ian Miller Research Student

      Former People

      Fengru Chen
      Winton Davies
      Adrian Gordon
      Fraser Mitchell
      Rudiger Oehlmann
      Terry Payne
      Davide Roverso

      News

      Workshop on Distributed Data-Mining
      Pete Edwards is on the programme committee for the PKDD-2001 workshop on Distributed Data Mining. For details of the workshop see http://???.


      Current Activities

      Text-Mining

      With the growth of the World Wide Web in recent years, text-mining has emerged as a specialised sub-activity within the field of knowledge discovery/data-mining. One application of such techniques is analysis of a user's Web browsing behaviour to induce a profile of their document preferences.
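
      As a rough illustration of profile induction from browsing behaviour, the sketch below builds a bag-of-words profile from the text of pages a user has visited and scores unseen documents against it by cosine similarity. This is a generic technique chosen for illustration, not a description of the group's own systems; the function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lowercase alphabetic tokens in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_profile(visited_pages):
    """Sum the term counts of every visited page into one profile."""
    profile = Counter()
    for page in visited_pages:
        profile.update(term_vector(page))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical browsing history (page texts), purely illustrative.
profile = build_profile(["machine learning agents", "learning from text"])

relevant = cosine(profile, term_vector("text learning methods"))
unrelated = cosine(profile, term_vector("football results tonight"))
print(relevant > unrelated)
```

      A document sharing no terms with the profile scores zero; a practical system would normally add stop-word removal and term weighting.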

      Feature Selection & Dimensionality Reduction

      Real-world applications of machine learning and knowledge discovery are often faced with the problem of massive data sets containing large numbers of instances, often with sizeable feature sets. Methods for dimensionality reduction attempt to reduce the size of the instance space or feature space (or sometimes both) in order to make the learning task more tractable and to improve performance.
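
      One very simple way to shrink a feature space is a filter that drops near-constant features. The sketch below is a minimal example of that idea, assuming numeric features held as rows of a list; the function name, the threshold and the toy data are hypothetical and do not represent the methods studied by the group.

```python
from statistics import pvariance


def select_by_variance(rows, min_variance=0.01):
    """Return (kept_indices, reduced_rows), dropping near-constant features."""
    if not rows:
        return [], []
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    reduced = [[row[i] for i in kept] for row in rows]
    return kept, reduced


# Toy data: the first feature is constant and carries no information.
data = [
    [1.0, 0.5, 3.0],
    [1.0, 0.7, 1.0],
    [1.0, 0.2, 2.0],
]
kept, reduced = select_by_variance(data)
print(kept)
```

      Reducing the instance space instead would operate on the rows rather than the columns, for example by sampling or removing duplicates.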

      Clustering

      Conventional techniques do not make use of domain knowledge when performing clustering. The KC approach does make use of such knowledge, in various forms: attribute dependencies (both strong and weak) and goal attributes. Experimental comparisons of KC with an earlier clustering system (COBWEB) have shown a marked improvement in performance (as measured by predictive accuracy vs. number of training instances).

      Adaptive Information Agents

      Our work focuses on the use of learning techniques to enhance the capabilities of agent-based systems (user-interface agents, network agents, and multi-agent systems), allowing agents to adapt to available resources, the nature of tasks, user characteristics, etc.

      Distributed Data-Mining

      Increasingly, data exists in a shared, distributed environment. The application of data-mining techniques in such an environment presents significant challenges. Our recent work has investigated the problem from both an abstract and an empirical perspective.

      Analogical Reasoning

      The ACHAB system tackles the problem of discovering analogies in large, multi-functional knowledge bases. The system is based on the fusion of the access, mapping, and generalisation phases of "classical" Analogical Reasoning; this fusion allows analogies to be sought in a knowledge base which has not been built specifically for the analogy task. ACHAB also exploits abstraction, in the form of a set of abstraction operators, to allow more distant analogies through the relaxation of mapping constraints. More recently, the concept of competition among evolving analogies has been introduced by exploiting concurrent processes: each tentative analogy is built incrementally by a separate process, which has to compete for resources with other processes attempting to construct alternative analogies.


      Former Activities

      Predicting Abnormal States & Situations

      The TIGON system accepts a data-dependency graph for the system to be modelled, together with a number of labelled data sets. Using curve-fitting techniques, it learns rules describing how the variables relate in the "normal" state and in each of the abnormal states (the states correspond to the labels given to the data sets). TIGON then compares the variables in the abnormal and normal states and produces rules which can be used to predict abnormal states and situations. Part of the labelled data is held back to test the inferred rules; if the predictions on the test data are poor, the expert is given a chance to modify the data-dependency graph or to change the labelling of the data sets (by modifying existing labels or introducing new ones). The approach has been applied to gas turbines; several enhancements of the data labelling have so far been suggested.
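
      The general shape of this workflow (fit one curve per labelled state, then test on held-back data) can be sketched as follows. This is not TIGON's actual algorithm: it assumes a single dependency between two variables, a straight-line fit, and prediction by smallest residual. The state labels and all numeric values are made up for illustration.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def learn_rules(labelled_sets):
    """Fit one line per labelled data set (one 'rule' per state)."""
    return {label: fit_line(points) for label, points in labelled_sets.items()}


def predict_state(rules, x, y):
    """Pick the state whose fitted line lies closest to the observation."""
    return min(
        rules,
        key=lambda label: abs(y - (rules[label][0] * x + rules[label][1])),
    )


def holdout_accuracy(rules, test_sets):
    """Fraction of held-back observations assigned their own label."""
    cases = [(label, x, y) for label, pts in test_sets.items() for x, y in pts]
    hits = sum(predict_state(rules, x, y) == label for label, x, y in cases)
    return hits / len(cases)


# Hypothetical training and held-back data for two states.
training = {
    "normal": [(1, 2.1), (2, 3.9), (3, 6.0), (4, 8.1)],
    "overheat": [(1, 7.0), (2, 9.1), (3, 10.9), (4, 13.0)],
}
held_back = {"normal": [(5, 10.0)], "overheat": [(5, 15.1)]}

rules = learn_rules(training)
accuracy = holdout_accuracy(rules, held_back)
print(accuracy)
```

      A low accuracy on the held-back data is the point at which, in the process described above, the expert would revisit the dependency graph or the labelling.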

      Computational Models of Scientific Discovery

      Scientific discovery is perhaps amongst the most complex of intellectual activities and its study has attracted the attention of historians and philosophers for many years. Advances in AI and cognitive psychology have provided new approaches and fresh insights into the nature of science through work on computational models of the discovery process.

      Activities:

      • The proposal of a framework in which the complex process of Theory Formation was formulated in terms of Informal Qualitative Models (IQMs). Using this framework, we successfully replicated many of the 18th- and 19th-century discoveries in the area of colligative properties of solutions (depression of freezing point and osmotic pressure).

      • A graphical tool to find and correct errors in a scientific theory. In outline, the expert is asked to provide an initial theory and to suggest a data set on which to test this theory. Statistical techniques determine those points which conform to the theory and highlight those which do not. These "rogue" points are investigated further to determine whether they have any features in common (the system is provided with a large set of previously defined chemical concepts). If an inconsistency is found, the issue is then how to "patch" the original theory. A prototype version of the system was implemented and used, with reasonable success, to predict properties of molten slags. In the course of this investigation, we were able to show that many of the transition metal elements, e.g. iron, behave differently in different situations. The system employed a GUI to enable the chemist or metallurgist to see inherent trends clearly.

      • The IULIAN system fused ideas from machine discovery and case-based reasoning to discover new explanations which could be used to revise an initial theory of some domain. The term exploratory discovery refers to an integration of self-questioning and experimentation which aims to overcome a weakness of current machine discovery systems, namely that they use experimental results to generate explanations without using previous experience. IULIAN employed case-based planning techniques to learn how to improve not only its existing theory, but also its theory revision methods.

      • The majority of work concerned with developing computational models of scientific discovery has focused on modelling individual scientists and their endeavours. Such work has neglected one very important aspect of science: the extent to which scientists communicate and cooperate during the discovery process. The MAMaLS system was used to explore some of these issues, by modelling the discovery of the structure of DNA. MAMaLS represents individual scientists (agents) as objects and supports several inter-agent communication strategies. Agents are provided with problem-solving and learning capabilities, allowing them to apply background knowledge to solve problems and to form generalisations over results.
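
      The "rogue point" step of the graphical theory-correction tool described in the second item above can be illustrated with a small sketch. It assumes that the theory is a function of one variable, that a point is rogue when its residual exceeds a multiple of the median residual, and that each observation carries a set of concept tags. The names, the tolerance and the data are hypothetical; the statistical techniques actually used by the tool are not specified here.

```python
from statistics import median


def split_by_theory(theory, observations, tolerance=3.0):
    """Separate observations into (conforming, rogue) by residual size."""
    residuals = [abs(obs["y"] - theory(obs["x"])) for obs in observations]
    # Median residual as a robust scale; fall back to a tiny value if it is 0.
    scale = median(residuals) or 1e-12
    conforming, rogue = [], []
    for obs, residual in zip(observations, residuals):
        (rogue if residual > tolerance * scale else conforming).append(obs)
    return conforming, rogue


def shared_concepts(points):
    """Concept tags common to every point (empty set if no points)."""
    if not points:
        return set()
    return set.intersection(*(set(p["concepts"]) for p in points))


def initial_theory(x):
    """Hypothetical initial theory: y is proportional to x."""
    return 2 * x


# Made-up observations tagged with illustrative chemical concepts.
observations = [
    {"x": 1, "y": 2.1, "concepts": {"alkali"}},
    {"x": 2, "y": 3.9, "concepts": {"alkali"}},
    {"x": 3, "y": 6.1, "concepts": {"alkaline-earth"}},
    {"x": 4, "y": 11.0, "concepts": {"transition-metal", "iron"}},
    {"x": 5, "y": 13.5, "concepts": {"transition-metal"}},
]

conforming, rogue = split_by_theory(initial_theory, observations)
common = shared_concepts(rogue)
print(common)
```

      A concept shared by all rogue points is a candidate condition for patching the theory, which remains the expert's decision.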



      Last updated April 27, 2001
      webmaster@csd.abdn.ac.uk