Graph-based sentence level spell checking framework
Date
2017
Publisher
Inst Integrative Omics & Applied Biotechnology
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Spelling mistakes are very common on the web and even more common on social media, because (1) users tend to write informally and use slang, and (2) some services, such as Twitter, impose a character limit. Traditional string similarity measures (1) do not consider the context of the misspelled word when suggesting alternatives, and (2) offer no definite way to choose the right word when several alternatives are equally similar to the misspelled word. We therefore propose a novel sentence-level spell checking framework that aims to find "the most frequently used similar alternative word". 146,808 sentences from different corpora are stored in a graph database. The similarity is calculated using the Levenshtein distance algorithm alongside the similarity between two given words. As the experimental results presented in the discussion show, the proposed framework can correct misspellings that approaches based on traditional string similarity measures cannot. The accuracy of the proposed framework is calculated as 84%. Because the proposed framework uses a slang dictionary to determine misspelled words, it can be used to correct misspellings on social media platforms.
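The idea of choosing "the most frequently used similar alternative word" can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not describe the graph schema or the scoring formula, so the sketch assumes a plain in-memory bigram counter in place of the graph database, uses only the preceding word as context, and introduces the hypothetical names `build_bigram_counts`, `suggest`, and `max_distance` for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_bigram_counts(sentences):
    """Count adjacent word pairs; an assumed stand-in for the graph database."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def suggest(prev_word, misspelled, vocabulary, bigram_counts, max_distance=2):
    """Pick the closest candidate; break ties by how often it follows prev_word."""
    candidates = [w for w in vocabulary
                  if levenshtein(misspelled, w) <= max_distance]
    if not candidates:
        return None
    return min(candidates,
               key=lambda w: (levenshtein(misspelled, w),
                              -bigram_counts[(prev_word, w)],
                              w))


if __name__ == "__main__":
    corpus = ["i love my cat", "i love my car", "my cat sleeps", "feed my cat"]
    vocab = {w for s in corpus for w in s.split()}
    counts = build_bigram_counts(corpus)
    # "cat" and "car" are both one edit away from "caz";
    # the bigram counts decide between them.
    print(suggest("my", "caz", vocab, counts))
```

In this toy corpus "cat" and "car" are equally similar to "caz", which is the tie that plain string similarity cannot resolve; the frequency of the word after "my" settles it.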
Keywords
Spell Check, String Similarity, Edit Distance, Social Media, Twitter
Source
Iioab Journal
WoS Q Value
N/A
Volume
8
Issue
3