Smart Word Suggestions for Writing Assistance
May 17, 2023
Authors: Chenshuo Wang, Shaoguang Mao, Tao Ge, Wenshan Wu, Xun Wang, Yan Xia, Jonathan Tien, Dongyan Zhao
cs.AI
Abstract
Enhancing word usage is a desired feature for writing assistance. To further
advance research in this area, this paper introduces the "Smart Word Suggestions"
(SWS) task and benchmark. Unlike prior work, SWS emphasizes end-to-end
evaluation and presents a more realistic writing assistance scenario. This task
involves identifying words or phrases that require improvement and providing
substitution suggestions. The benchmark includes human-labeled data for
testing, a large distantly supervised dataset for training, and the framework
for evaluation. The test data includes 1,000 sentences written by English
learners, accompanied by over 16,000 substitution suggestions annotated by 10
native speakers. The training dataset comprises over 3.7 million sentences and
12.7 million suggestions generated through rules. Our experiments with seven
baselines demonstrate that SWS is a challenging task. Based on experimental
analysis, we suggest potential directions for future research on SWS. The
dataset and related code are available at
https://github.com/microsoft/SmartWordSuggestions.
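To make the task definition concrete, here is a minimal sketch that treats an SWS instance as a tokenized sentence plus a set of (target span, substitute) pairs and scores a system's output end to end, so that detecting the right word and proposing the right replacement are judged together. The `Suggestion` class, the token-span encoding, the example sentence, and the exact-match precision/recall are all hypothetical choices made for illustration. They are not the benchmark's data format or its official evaluation framework, which is distributed in the repository above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in sets
class Suggestion:
    """Hypothetical SWS suggestion: replace tokens[start:end] with `substitute`."""
    start: int       # index of the first token of the target word or phrase
    end: int         # exclusive end index of the target span
    substitute: str  # proposed replacement text


def precision_recall(predicted: set[Suggestion], gold: set[Suggestion]) -> tuple[float, float]:
    """Toy end-to-end scoring (an assumption, not the official SWS metric):
    a prediction counts only if its target span AND its substitute exactly
    match one of the human-annotated suggestions."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical learner sentence; the target word "big" is token 3.
tokens = ["I", "have", "a", "big", "interest", "in", "music", "."]

# Several annotators may propose different substitutes for the same target.
gold = {Suggestion(3, 4, "strong"), Suggestion(3, 4, "keen")}

# A system output: one matching suggestion and one unnecessary edit on "music".
predicted = {Suggestion(3, 4, "keen"), Suggestion(6, 7, "songs")}

print(precision_recall(predicted, gold))  # (0.5, 0.5)
```

Because the scoring keys on both the span and the substitute, a system that flags the right word but proposes a substitute no annotator chose gets no credit, which is one simple way to express the end-to-end emphasis described in the abstract.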