Digital Library

Title:      DESIGN OF AN AUTOMATED SYSTEM FOR CLUSTERING HETEROGENEOUS DATA
Author(s):      Dorin Carstoiu, Alexandra Cernian, Adriana Olteanu, Tudor Ionescu
ISBN:      978-972-8924-63-8
Editors:      Hans Weghorn and Ajith P. Abraham
Year:      2008
Edition:      Single
Keywords:      Clustering, classification, heterogeneous data, informational content.
Type:      Short Paper
First Page:      82
Last Page:      86
Language:      English
Paper Abstract:      The goal of this work is to study the feasibility of a Heterogeneous Data Classification and Search (HDCS) system and to provide a possible design for its implementation. To design an HDCS system, we propose an actor-oriented modeling technique, for which we show the information flow. We have identified six actors (subsystems) that collaborate to construct a file sheet and produce the final search result. The first five actors add information to the file sheet, which the final actor then uses to produce the desired result. Given the vast quantity of data and the variety of formats and encodings in which it exists, a semantic approach based on metadata has been chosen. Instead of digging into the actual data to extract information, we use the context of the file to collect its metadata. The metadata is then used in the classification process. The reason for this approach is that data are made available by people who want others to understand what the data are about. This observation provided the confidence needed to pursue the presented approach. The HDCS system we propose combines techniques from conventional search systems, classification systems, and search-results clustering systems, while also providing original solutions, such as an innovative data sampling method.
   
