Graph-based sentence level spell checking framework

Date

2017

Journal Title

Journal ISSN

Volume Title

Publisher

Inst Integrative Omics & Applied Biotechnology

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Spelling mistakes are very common on the web, and even more so on social media, because (1) users tend to use informal language that contains slang, and (2) some social services, such as Twitter, impose a character limit. Traditional string similarity measures (1) do not consider the context of the misspelled word when suggesting alternatives, and (2) provide no definite way to choose the right word when multiple alternatives are equally similar to the misspelled word. Therefore, we propose a novel sentence-level spell checking framework that aims to find "the most frequently used similar alternative word". A total of 146,808 sentences from different corpora are stored in a graph database. Similarity is calculated using the Levenshtein distance algorithm alongside the similarity between two given words. As the experimental results presented in the discussion show, the proposed framework is able to correct misspellings that cannot be corrected by approaches based on traditional string similarity measures. The accuracy of the proposed framework is calculated as 84%. Since the proposed framework uses a slang dictionary to determine misspelled words, it can be used to correct misspellings on social media platforms.
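To make the idea of "the most frequently used similar alternative word" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that context is approximated by how often a candidate follows the preceding word, and it uses an in-memory counter as a stand-in for the graph database. The function names (`levenshtein`, `build_bigram_counts`, `correct`) and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_bigram_counts(sentences):
    """Count adjacent word pairs (an assumed stand-in for weighted graph edges)."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def correct(previous_word, misspelled, vocabulary, bigram_counts):
    """Choose the closest candidate; break distance ties by context frequency."""
    return min(
        vocabulary,
        key=lambda w: (levenshtein(misspelled, w),
                       -bigram_counts[(previous_word, w)]),
    )


# Hypothetical toy corpus, used only to illustrate the tie-breaking step.
sentences = [
    "i love my cat",
    "my cat sleeps all day",
    "my cat is hungry",
    "i parked my car",
]
bigrams = build_bigram_counts(sentences)
vocabulary = sorted({w for s in sentences for w in s.split()})

suggestion = correct("my", "caz", vocabulary, bigrams)
print(suggestion)
```

In this toy corpus, "cat" and "car" are both one edit away from "caz", so edit distance alone cannot decide between them. The sketch prefers "cat" because it follows "my" more often, which mirrors the tie-breaking problem the abstract describes.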

Description

Keywords

Spell Check, String Similarity, Edit Distance, Social Media, Twitter

Source

Iioab Journal

WoS Q Value

N/A

Scopus Q Value

Volume

8

Issue

3

Citation