Digital Library

Title:      MODELING DATA CLEANING TECHNIQUES FOR BIG DATA
Author(s):      Diana Martinez-Mosquera, Sergio Luján-Mora and Fidel Parra
ISBN:      978-989-8533-69-2
Editors:      Pedro Isaías and Hans Weghorn
Year:      2017
Edition:      Single
Keywords:      Big Data, Cleaning, Modeling, UML, Technique
Type:      Reflection Paper
First Page:      310
Last Page:      113
Language:      English
Paper Abstract:      Big Data is currently a popular term; it refers to high volumes of data that are processed into relevant information to assist decision making. Few studies on data cleaning techniques have been adapted to Big Data, and we consider filtering irrelevant data an important task for reducing hardware and processing time requirements. Moreover, existing research on data cleaning processes in Big Data is scattered; our approach therefore proposes to model the techniques used for this purpose. Since logs can be considered Big Data, we have modeled two different approaches: one to clean a firewall log in the vertical dimension and another to clean a web log in the horizontal dimension. An advantage of our proposal is its use of the Unified Modeling Language (UML), an International Organization for Standardization (ISO) standard widely accepted since 2005. Consequently, the data cleaning process is composed of logical units that the designer can replace or modify. The examples thus demonstrate that several clustering techniques can be integrated, for example Levenshtein Distance, Longitudinal Distance, Transposition Invariant Distance, and Word Position Invariant Distance.
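
Illustrative note: of the clustering techniques named in the abstract, Levenshtein Distance is the most widely known. The sketch below is not taken from the paper. It shows the standard dynamic-programming edit distance and a simple greedy grouping of similar log lines. The function names `levenshtein` and `group_similar`, the `max_distance` threshold, and the sample log lines are all assumptions made for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def group_similar(lines: List[str], max_distance: int) -> List[List[str]]:
    """Hypothetical greedy grouping: each line joins the first group whose
    first member is within max_distance edits, else starts a new group."""
    groups: List[List[str]] = []
    for line in lines:
        for group in groups:
            if levenshtein(line, group[0]) <= max_distance:
                group.append(line)
                break
        else:
            groups.append([line])
    return groups


if __name__ == "__main__":
    # Made-up sample entries, not real log data.
    sample = ["GET /index.html", "GET /index.htm", "POST /login"]
    for group in group_similar(sample, max_distance=2):
        print(group)
```

Greedy grouping against a single representative keeps the sketch short; a designer could swap in any other distance function with the same signature, which is in the spirit of the replaceable logical units the abstract describes.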
   
