Context Aware Paraphrasing of Noun Compounds for Robust Interpretation

Pushpak Bhattacharyya¹
¹Indian Institute of Technology Bombay, Mumbai
DOI: https://doi.org/10.71448/bcds2121-5
Published: 30/12/2021
Cite this article as: Pushpak Bhattacharyya. Context Aware Paraphrasing of Noun Compounds for Robust Interpretation. Bulletin of Computer and Data Sciences, Volume 2, Issue 1, Pages 44-54.

Abstract

Noun compounds (NCs) such as chocolate cake or student protest are ubiquitous in natural language yet notoriously ambiguous when interpreted in isolation. Prior work paraphrases NCs into prepositional or free-form variants but largely ignores the sentential and discourse context that often determines the intended relation. We propose CaNCi, a context-aware paraphrasing framework that conditions on local context and retrieved usage exemplars to produce faithful, diverse paraphrases with calibrated confidence. CaNCi couples a context encoder with a sequence generator and fuses evidence from a dense retriever using Fusion-in-Decoder. To make outputs reliable for downstream insertion (e.g., machine translation or information extraction), we calibrate paraphrase probabilities and construct coverage-controlled top-k sets via conformal prediction. Across standard NC benchmarks augmented with sentence contexts, CaNCi improves isomorphic and non-isomorphic scores, reduces expected calibration error, and yields safer top-k outputs at target coverage. Human evaluations show higher adequacy and fluency compared to isolate-only baselines. We release code, data recipes, and analysis protocols to facilitate reproducibility.

Keywords: noun compounds, paraphrasing, retrieval-augmented generation, calibration, conformal prediction
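
The abstract mentions coverage-controlled paraphrase sets built via conformal prediction but does not spell out the procedure. As a rough illustration only, the sketch below uses standard split conformal prediction, which may differ from what CaNCi actually does. The function names, the nonconformity score (one minus the model probability of the reference paraphrase), and all numbers are assumptions made for this example, not values from the article.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal cutoff for a target miscoverage rate alpha.

    cal_scores: nonconformity scores on a held-out calibration set, here
    assumed to be 1 - p(reference paraphrase) (a hypothetical choice).
    """
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level:
        # fall back to keeping every candidate.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(candidates, threshold):
    """Keep candidate paraphrases whose score 1 - p is within the cutoff.

    candidates: dict mapping paraphrase string -> model probability.
    Returns the kept paraphrases, most probable first.
    """
    kept = [(p, text) for text, p in candidates.items() if 1.0 - p <= threshold]
    return [text for p, text in sorted(kept, reverse=True)]


# Illustrative calibration scores (made up for this sketch).
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.15, 0.25]
threshold = conformal_threshold(cal_scores, alpha=0.5)

# Illustrative candidate paraphrases for "chocolate cake" in some context.
candidates = {
    "cake made of chocolate": 0.8,
    "cake with chocolate flavor": 0.75,
    "cake for chocolate": 0.05,
}
paraphrase_set = prediction_set(candidates, threshold)
```

Under the usual exchangeability assumption, a set built this way contains the reference paraphrase with probability at least 1 - alpha. A smaller alpha raises the cutoff and yields larger sets.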

Copyright © 2021 Pushpak Bhattacharyya. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
