Make it simple with paraphrases: automated paraphrasing for authoring aids and machine translation

Anabela Barreiro - Centre for Linguistics at the Universidade do Porto, Portugal

For the past few years, researchers have been working on automated paraphrasing in response to commercial enterprises’ goal of including paraphrases in their text processing tools, authoring aids, style editors, learning tools, etc. The benefits of paraphrastic knowledge to natural language processing have been quantified in areas such as question answering, information extraction, text mining, summarization, language generation, plagiarism detection, and machine translation. In this seminar, I will describe paraphrases (sensu lato and in linguistics) and present different types of paraphrase (referential, lexical, phrasal, syntactic, lexical-syntactic, multiword units). I will also discuss the elements that play a key role in paraphrasing and stress the importance of paraphrasing for machine translation (especially paraphrasing of multiword units). Finally, I will introduce SPIDER, a System for Paraphrasing In Document Editing and Revision, which is currently being integrated into an educational program for a cyber-school project. SPIDER was designed to help with content writing optimization, but its applicability extends to machine translation pre-editing.

Short bio:

Anabela Barreiro is a computational linguist with a Ph.D. in Linguistics from the Universidade do Porto; she carried out part of her scientific research at New York University. Her research focuses on the development and evaluation of linguistic resources for automated text processing and machine translation. In recent years she has specialized in automated paraphrasing and its application to authoring aids, text production and revision, with publications on the subject and participation in program committees. She was employed as a linguist at Logos Corporation, a pioneering US-based machine translation company, which for over 30 years developed a commercial system that led to OpenLogos, a free/open-source MT system. At Logos Corporation, Anabela developed the English-Portuguese language pair, and also worked on the Spanish and Italian systems and performed collateral tasks, such as linguistic quality assurance. For the last few years, she has been exploiting OpenLogos resources to create new linguistically enhanced natural language processing applications, and is now engaged in the development of a new generation of linguistically sophisticated machine translation systems. She worked at the major Portuguese R&D institute (INESC – I&D) as a researcher in the Natural Language Group. On several occasions, she collaborated with the Portuguese National Scientific Computing Foundation (FCCN), providing linguistic services for the Linguateca project, and participated in joint evaluation tasks. She has also worked as an independent consultant, as a language teacher, and as a translator/interpreter between English and Portuguese. She is now responsible for Computational Linguistics at Metatrad, a role that she combines with that of researcher at the Centre for Linguistics at the Universidade do Porto (CLUP).

Anabela's PhD dissertation, "Make it simple with paraphrases: automated paraphrasing for authoring aids and machine translation", addresses the problem of formalizing and automating paraphrases of multiword units and exemplifies how paraphrases can be efficiently employed by authoring aids to help simplify and clarify texts, with clear benefits for linguistic quality assurance in text processing. The dissertation emphasizes the positive impact of paraphrasing on machine translation.