Title:
|
COMPARATIVE STUDY OF WORD EMBEDDING MODELS FOR ARABIC TERMINOLOGY EXTRACTION |
Author(s):
|
Wiem Lahbib, Ibrahim Bounhas and Yahya Slimani |
ISBN:
|
978-989-8533-82-1 |
Editors:
|
Pedro Isaías and Hans Weghorn |
Year:
|
2018 |
Edition:
|
Single |
Keywords:
|
Domain Terminology, Word Embedding, Terminology Enrichment, Arabic Language |
Type:
|
Full Paper |
First Page:
|
245 |
Last Page:
|
252 |
Language:
|
English |
Paper Abstract:
|
In this paper, we study the problem of domain-specific terminology extraction from textual data, an important task for many applications such as Machine Translation (MT), document indexing and Information Retrieval (IR). Existing terminology extraction approaches have investigated models and techniques applied mostly to European languages such as English and French. For highly ambiguous languages like Arabic, however, approaches remain rudimentary. To overcome this limitation, we propose a new approach to Arabic domain terminology extraction based on word embedding models. In particular, we use terms that appear in important parts of documents (e.g. section titles) as a minimal terminology, to which we apply an enrichment process that identifies similar terms based on their embedded vectors. Results show that deep learning outperforms several baselines. The differences between the tested approaches show that the improvement from our contribution is statistically significant in terms of p-value. |
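The enrichment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that similarity between embedded vectors is measured with cosine similarity and that a candidate is kept when it is close enough to at least one seed term. The function names `cosine` and `enrich`, the threshold of 0.8, and the three-dimensional toy vectors with transliterated words are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def enrich(seed_terms, embeddings, threshold=0.8):
    """Return words whose vector is close to at least one seed term.

    seed_terms: the minimal terminology (e.g. terms taken from section titles).
    embeddings: dict mapping each word to its embedded vector.
    threshold:  hypothetical similarity cut-off, an assumption of this sketch.
    """
    seeds = [t for t in seed_terms if t in embeddings]
    enriched = set()
    for word, vec in embeddings.items():
        if word in seed_terms:
            continue  # seeds are already part of the terminology
        if any(cosine(vec, embeddings[s]) >= threshold for s in seeds):
            enriched.add(word)
    return enriched


# Toy vectors invented for illustration; real vectors would come from a
# word embedding model trained on a domain corpus.
embeddings = {
    "salat": [0.9, 0.1, 0.0],
    "wudu": [0.8, 0.2, 0.1],
    "zakat": [0.1, 0.9, 0.1],
    "sadaqa": [0.2, 0.8, 0.0],
    "kitab": [0.0, 0.1, 0.9],
}
seed_terms = {"salat", "zakat"}

result = enrich(seed_terms, embeddings)
print(sorted(result))
```

In this toy setting the two words whose vectors lie near a seed are added to the terminology, while the unrelated word is left out. A different similarity measure or selection rule (for example, keeping the top-k neighbours of each seed) would fit the same outline.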