COMPARATIVE STUDY OF WORD EMBEDDING MODELS FOR ARABIC TERMINOLOGY EXTRACTION

Document Info

Title:	COMPARATIVE STUDY OF WORD EMBEDDING MODELS FOR ARABIC TERMINOLOGY EXTRACTION
Author(s):	Wiem Lahbib, Ibrahim Bounhas and Yahya Slimani
ISBN:	978-989-8533-82-1
Editors:	Pedro Isaías and Hans Weghorn
Year:	2018
Edition:	Single
Keywords:	Domain Terminology, Word Embedding, Terminology Enrichment, Arabic Language
Type:	Full Paper
First Page:	245
Last Page:	252
Language:	English
Paper Abstract:	In this paper, we study the problem of domain-specific terminology extraction from textual data, an important task for many applications such as Machine Translation (MT), document indexing and Information Retrieval (IR). Existing terminology extraction approaches rely on models and techniques that have mostly been applied to European languages such as English and French. For highly ambiguous languages like Arabic, however, approaches are still primitive. To overcome this limitation, we propose a new approach for Arabic domain terminology extraction based on word embedding models. In particular, we use terms that appear in important parts of documents (e.g. section titles) as a minimal terminology, to which we apply an enrichment process that identifies similar terms based on their embedded vectors. Results show that deep learning outperforms several baselines. The differences between the evaluated approaches show that our contribution is statistically significant in terms of p-value.
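
The enrichment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or any cut-off, so cosine similarity and the `threshold` value are assumptions, the function names are hypothetical, and the toy vectors (with English placeholder words standing in for Arabic terms) are invented purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def enrich_terminology(seed_terms, embeddings, threshold=0.8):
    """Return the seed terms plus every vocabulary word whose vector is
    close to at least one seed vector.

    `threshold` is an assumed cut-off, not a value taken from the paper.
    """
    enriched = set(seed_terms)
    seed_vectors = [embeddings[t] for t in seed_terms if t in embeddings]
    for candidate, vector in embeddings.items():
        if candidate in enriched:
            continue
        if any(cosine(vector, sv) >= threshold for sv in seed_vectors):
            enriched.add(candidate)
    return enriched


# Hypothetical toy embeddings; a real system would load vectors trained
# on an Arabic corpus with a word embedding model.
TOY_EMBEDDINGS = {
    "prayer": [0.9, 0.1, 0.0],
    "ablution": [0.8, 0.2, 0.1],
    "fasting": [0.7, 0.3, 0.0],
    "market": [0.0, 0.1, 0.9],
}

# Seed terms would come from salient document parts such as section titles.
result = enrich_terminology(["prayer"], TOY_EMBEDDINGS)
print(sorted(result))
```

With these toy vectors, the two words whose vectors point in nearly the same direction as the seed are added, while the unrelated word is left out.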
