Semantic-enhanced search: Finding meaning in large-scale scanned text collections. Annika Hinze, University of Waikato, NZ

When:
September 16, 2016 @ 2:00 pm – 3:00 pm
Where:
Oxford e-Research Centre
7 Keble Rd
Oxford OX1 3QG
UK
Cost:
Free
Contact:
Oxford e-Research Centre
01865 610600

Annika Hinze
University of Waikato, New Zealand
September 16, 2016, 14:00 to 15:00
Conference Room (room 278)
Oxford e-Research Centre, 7 Keble Road, Oxford

Seminar. No booking required. Open to all. Coffee and cakes.

Abstract: This talk presents Capisco, a system for semantic-enhanced search in a digital library of full texts. Document search in digital libraries typically uses purely lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach offers the potential to overcome the shortcomings of lexical search, but even if an appropriate network of ontologies could be decided upon, it would require a full semantic markup of each document. Capisco instead analyzes documents by the semantics and context of their content. The disambiguation of search queries is done interactively, to fully utilize the domain knowledge of the scholar. Our method achieves a form of semantic-enhanced search that simultaneously exploits the proven scale benefits provided by lexical indexing.

For established systems, completely replacing the document retrieval mechanism, or even making significant changes to it, would require major technological effort and would most likely be disruptive. We explored ways to use the results of semantic analysis and disambiguation while retaining an existing keyword-based search and lexicographic index. We engineered this so that the output of semantic analysis (performed off-line) is suitable for direct import into existing digital library metadata and index structures, and can thus be incorporated without the need for architecture modifications.
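
The idea of importing semantic output into an unchanged lexical index can be illustrated with a toy sketch. This is not Capisco's implementation: the documents, the `concept:` identifiers, and the function names are all invented for illustration, and the "off-line analysis" is simply a hand-written table. It only shows that concept identifiers can sit in the same inverted index as ordinary words, so the existing keyword lookup serves both.

```python
from collections import defaultdict

# Toy documents (invented for illustration).
DOCS = {
    "d1": "the jaguar hunts along the river bank",
    "d2": "the jaguar was parked outside the bank",
}

# Stand-in for the output of an off-line semantic analysis step.
# The concept identifiers are hypothetical, not taken from any real ontology.
CONCEPTS = {
    "d1": ["concept:jaguar_animal", "concept:riverbank"],
    "d2": ["concept:jaguar_car", "concept:bank_institution"],
}


def build_index(docs, concepts):
    """Build one inverted index holding both plain words and concept ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
        # Concept ids are imported as extra index terms; the index
        # structure itself is the same one used for keywords.
        for concept in concepts.get(doc_id, []):
            index[concept].add(doc_id)
    return index


def search(index, terms):
    """Return ids of documents that contain every term (AND semantics)."""
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


INDEX = build_index(DOCS, CONCEPTS)
```

In this sketch, a plain keyword query such as `search(INDEX, ["jaguar"])` matches both documents, while a query that the user has disambiguated to `concept:jaguar_car` matches only the second. How the interactive disambiguation step maps a user's query to a concept is outside the scope of the sketch.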

Bio: Annika Hinze is a senior lecturer in the Department of Computer Science at the University of Waikato, New Zealand. She is the head of the Databases and Information Systems (ISDB) group at Waikato. Her research interests include complex event processing, location-based systems and semantic annotation in digital libraries. She is principal investigator on Capisco, exploring semantic-enhanced search in the HathiTrust Digital Library. Annika graduated with a Master’s in Mathematics from TU Berlin and undertook her PhD in Computer Science at Freie Universitaet Berlin.