Other Titles: | Workshop proceedings |
Authors: | Mastora, Anna; Kapidakis, Sarantos |
Issue Date: | 27-Sep-2012 |
Conference: | International Workshop on Supporting User’s Exploration on Digital Libraries |
Keywords: | Digital libraries, Inflectional languages, Natural Language Processing, Spelling, Lemmatising, Query patterns |
Abstract: | This study investigates and reports on the potential implications of applying shallow language processing to rewrite keyword subject queries in Greek. The processing includes a speller and a lemmatiser, along with stop word removal and normalisation of punctuation in queries. Among our findings is that users tend to submit morphologically variant words, which Aspell, the spell checking and correction tool, processes consistently in 98.7% of cases. After applying the spell checker, we recorded a semantic drift from the initial query intent in approximately 8.2% of all submitted queries. The lemmatiser (ilsp_nlp) performs extremely well on the words it identifies: among the initial 750 queries we submitted to the tool, only five cases led to a semantic drift. However, the lemmatiser recognises neither misspelled nor truncated words, which remained unaltered during this step. We therefore conclude that, for the examined data, spell checking before lemmatising is the appropriate order for this shallow language processing. |
URL: | http://ixa2.si.ehu.eus/suedl/SUEDLproceedings.pdf#page=11 |
URI: | https://uniwacris.uniwa.gr/handle/3000/387 |
Type: | Conference Paper |
Department: | Department of Archival, Library and Information Studies |
School: | School of Administrative, Economics and Social Sciences |
Affiliation: | University of West Attica (UNIWA) |
Appears in Collections: | Conference Papers or Poster or Presentation / Δημοσιεύσεις σε Συνέδρια |