Graph-based sentence level spell checking framework

dc.authorid: 0000-0003-2181-4292 [en_US]
dc.authorid: 0000-0001-8902-6837
dc.contributor.author: Kabakuş, Abdullah Talha
dc.contributor.author: Kara, Resul
dc.date.accessioned: 2021-06-23T19:48:55Z
dc.date.available: 2021-06-23T19:48:55Z
dc.date.issued: 2017
dc.department: BAİBÜ, Rectorate, Department of Information Technology [en_US]
dc.description.abstract: Spelling mistakes are very common on the web, and even more common on social media, since (1) users tend to use informal language that contains slang, and (2) some social services, such as Twitter, impose a character limit. Traditional string similarity measurements (1) do not consider the context of the misspelled word while providing alternatives, and (2) do not provide a definitive way to choose the right word when multiple alternatives have the same similarity to the misspelled word. Therefore, we propose a novel sentence-level spell checking framework that aims to find "the most frequently used similar alternative word". A total of 146,808 sentences from different corpora are stored in a graph database. The similarity between two given words is calculated using the Levenshtein distance algorithm. As the experimental results presented in the discussion show, the proposed framework is able to correct misspellings that cannot be corrected by approaches based on traditional string similarity measurements. The accuracy of the proposed framework is calculated as 84%. Since the proposed framework uses a slang dictionary to determine misspelled words, it can be used to correct misspellings on social media platforms. [en_US]
dc.identifier.endpage: 41 [en_US]
dc.identifier.issn: 0976-3104
dc.identifier.issue: 3 [en_US]
dc.identifier.startpage: 36 [en_US]
dc.identifier.uri: https://hdl.handle.net/20.500.12491/9253
dc.identifier.uri: https://www.webofscience.com/wos/woscc/full-record/WOS:000423922500007
dc.identifier.volume: 8 [en_US]
dc.identifier.wos: WOS:000423922500007 [en_US]
dc.identifier.wosquality: N/A [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.institutionauthor: Kabakuş, Abdullah Talha
dc.language.iso: en [en_US]
dc.publisher: Inst Integrative Omics & Applied Biotechnology [en_US]
dc.relation.ispartof: Iioab Journal [en_US]
dc.relation.publicationcategory: Article - International Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Spell Check [en_US]
dc.subject: String Similarity [en_US]
dc.subject: Edit Distance [en_US]
dc.subject: Social Media [en_US]
dc.subject: Twitter [en_US]
dc.title: Graph-based sentence level spell checking framework [en_US]
dc.type: Article [en_US]
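The abstract outlines the core idea. Among the alternatives that are similar to a misspelled word by Levenshtein distance, prefer the one that is used most frequently, while taking sentence context into account. The following is a minimal Python sketch of that idea and is not the authors' implementation. It assumes an in-memory word-adjacency graph, built by the hypothetical `build_graph` helper, as a stand-in for the graph database. It also assumes a hypothetical tie-breaking order in `correct`: smallest edit distance first, then how often the candidate follows the previous word, then overall frequency. The sample sentences are illustrative toy data, not the 146,808-sentence corpus.

```python
from collections import Counter, defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def build_graph(sentences):
    """Hypothetical stand-in for the graph database: word frequencies plus
    directed word -> next-word edges weighted by how often they co-occur."""
    word_freq = Counter()
    edges = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        word_freq.update(words)
        for left, right in zip(words, words[1:]):
            edges[left][right] += 1
    return word_freq, edges


def correct(word, prev_word, word_freq, edges):
    """Hypothetical ranking: smallest edit distance, then how often the
    candidate follows prev_word, then overall frequency. On a full tie,
    min() keeps the word seen first."""
    following = edges.get(prev_word, Counter())  # empty if no known context
    return min(
        word_freq,
        key=lambda c: (levenshtein(word, c), -following[c], -word_freq[c]),
    )


# Toy sentences for illustration only.
sentences = [
    "i love my cat",
    "my cat sat on the mat",
    "the hat is red",
    "i love the hat",
]
word_freq, edges = build_graph(sentences)

print(correct("xat", "my", word_freq, edges))   # cat
print(correct("xat", "the", word_freq, edges))  # hat
print(correct("xat", None, word_freq, edges))   # cat (ties with hat on frequency; seen first)
```

All four candidates (`cat`, `sat`, `mat`, `hat`) are one edit away from `xat`, so edit distance alone cannot choose between them. In this sketch, the previous word and the frequency counts decide, which mirrors the abstract's point that plain string similarity leaves such ties unresolved.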

Files

Original bundle
Name: abdullah-talha-kabakus.pdf
Size: 668.84 KB
Format: Adobe Portable Document Format
Description: Full Text