
Design Considerations

The following considerations guided our design of KDSL:

1. The set of relevant data should be specified in the knowledge discovery schema.

Since a user may be interested in any portion of the data in a database, he should be able to work on any specific subset of it. This implies that he first selects the relevant set of data. If he cannot identify the relevant set precisely, a superset can be collected, and mechanisms can be developed to help identify or rank the relevant data and/or attributes.

2. The kinds of knowledge to be discovered should be specified in the data mining schema.

Ideally, a knowledge discovery system would perform interesting discovery autonomously, without human instruction or interaction. However, since mining can be performed in many different ways on any given set of data, unguided, autonomous discovery may generate huge amounts of knowledge of many different kinds, much of which may be of no interest to the user. Thus, we specify both the potentially relevant set of data and the kinds of knowledge to be discovered. This leads to guided discovery of the desired kind of knowledge on a relevant set of data, that is, a constrained search for the desired knowledge.

3. Background knowledge can be made generally available in the data mining schema.

Discovery may be performed with the assistance of background knowledge, such as conceptual hierarchy information. The availability of relatively strong background knowledge can substantially improve the efficiency of the discovery process.

4. The selection, cleaning and transformation information could be specified in the schema.

The user should be able to provide selection information to select data, and cleaning information if he wants to address the problem of noisy or missing data. If the user wants the data to be in a specific format before mining, he should be able to specify the transformation information.

5. Various kinds of parameters could be specified flexibly.

Various thresholds, such as confidence, support, number of clusters, and number of classes, can be used to select the desired, interesting rules and filter out less interesting ones. One should also be able to specify parameters such as periodicity, priority, and start time.
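
To make the five considerations concrete, the following Python sketch gathers them into a single structure and applies support and confidence thresholds to a list of candidate rules. It is only an illustration under stated assumptions and is not KDSL syntax: the names DiscoverySchema, Rule, generalize and filter_rules, the field names, and the sample query and rules are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A discovered rule with two interestingness measures (hypothetical type)."""
    description: str
    support: float     # fraction of the relevant tuples the rule covers
    confidence: float  # fraction of covered tuples for which the rule holds


@dataclass
class DiscoverySchema:
    """Hypothetical container for the five kinds of information listed above."""
    relevant_data: str                # 1. query selecting the relevant data
    knowledge_kind: str               # 2. e.g. "association" or "classification"
    concept_hierarchy: dict = field(default_factory=dict)  # 3. child -> parent
    cleaning: str = "ignore_missing"  # 4. treatment of noisy or missing data
    min_support: float = 0.0          # 5. thresholds used to filter rules
    min_confidence: float = 0.0

    def generalize(self, value):
        """Climb one level in the concept hierarchy if a parent is known."""
        return self.concept_hierarchy.get(value, value)

    def filter_rules(self, rules):
        """Keep only the rules that meet both thresholds."""
        return [r for r in rules
                if r.support >= self.min_support
                and r.confidence >= self.min_confidence]


# Illustrative values only; none of them come from the original design.
schema = DiscoverySchema(
    relevant_data="SELECT * FROM sales WHERE year = 1999",
    knowledge_kind="association",
    concept_hierarchy={"Vancouver": "British Columbia",
                       "British Columbia": "Canada"},
    min_support=0.05,
    min_confidence=0.7,
)

candidates = [
    Rule("bread => butter", support=0.10, confidence=0.80),
    Rule("bread => caviar", support=0.01, confidence=0.90),
    Rule("milk => bread", support=0.20, confidence=0.50),
]
kept = schema.filter_rules(candidates)
```

In this sketch only the first candidate survives, since the second falls below the support threshold and the third below the confidence threshold. Keeping the thresholds in the schema rather than in the mining routine reflects the idea that such parameters are specified by the user.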


Deepak Goel
1/5/2000