PMID- 37346502
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20230701
IS  - 2376-5992 (Electronic)
IS  - 2376-5992 (Linking)
VI  - 9
DP  - 2023
TI  - Unsupervised query reduction for efficient yet effective news background linking.
PG  - e1191
LID - 10.7717/peerj-cs.1191 [doi]
LID - e1191
AB  - In this article, we study efficient techniques to tackle the news background linking problem, in which an online reader seeks background knowledge about a given article to better understand its context. Recently, this problem has attracted many researchers, especially in the Text Retrieval Conference (TREC) community. Surprisingly, the most effective method to date uses the entire input news article as a search query in an ad-hoc retrieval approach to retrieve the background links. In a scenario where the lookup for background links is performed online, this method becomes inefficient, especially if the search scope is large, such as the Web, because the relatively long generated query results in a long response time. In this work, we evaluate different unsupervised approaches for reducing the input news article to a much shorter, and hence more efficient, search query while maintaining retrieval effectiveness. We conducted several experiments using the Washington Post dataset, released specifically for the news background linking problem. Our results show that a simple statistical analysis of the article using a recent keyword extraction technique achieves an average 6.2x speedup in query response time over the full-article approach, with no significant difference in effectiveness. Moreover, we found that the search terms can be reduced further by eliminating terms with relatively low TF-IDF values from the search queries, yielding even more efficient retrieval with a 13.3x speedup while still maintaining retrieval effectiveness. This makes our approach more suitable for practical online scenarios. Our study is the first to address the efficiency of news background linking systems. We, therefore, release our source code to promote research in that direction.
CI  - (c) 2023 Essam and Elsayed.
FAU - Essam, Marwa
AU  - Essam M
AD  - Qatar University, Doha, Qatar.
FAU - Elsayed, Tamer
AU  - Elsayed T
AD  - Qatar University, Doha, Qatar.
LA  - eng
PT  - News
DEP - 20230113
PL  - United States
TA  - PeerJ Comput Sci
JT  - PeerJ. Computer science
JID - 101660598
PMC - PMC10280215
OTO - NOTNLM
OT  - Ad-hoc retrieval
OT  - Efficiency analysis
OT  - Keyword extraction
OT  - News linking
OT  - News recommendation
OT  - Query reduction
COIS- The authors declare that they have no competing interests.
EDAT- 2023/06/22 13:09
MHDA- 2023/06/22 13:10
PMCR- 2023/01/13
CRDT- 2023/06/22 09:53
PHST- 2022/08/15 00:00 [received]
PHST- 2022/11/28 00:00 [accepted]
PHST- 2023/06/22 13:10 [medline]
PHST- 2023/06/22 13:09 [pubmed]
PHST- 2023/06/22 09:53 [entrez]
PHST- 2023/01/13 00:00 [pmc-release]
AID - cs-1191 [pii]
AID - 10.7717/peerj-cs.1191 [doi]
PST - epublish
SO  - PeerJ Comput Sci. 2023 Jan 13;9:e1191. doi: 10.7717/peerj-cs.1191. eCollection 2023.
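The two reduction steps summarized in the abstract can be illustrated with a minimal Python sketch. The first step extracts a small set of key terms from the article, and the second prunes terms with relatively low TF-IDF weights. This is not the authors' released code. The record does not name the keyword extraction technique or the pruning threshold, so plain TF-IDF ranking stands in for keyword extraction here. The parameters `k` and `min_ratio`, the smoothed IDF formula, the tokenizer, and the small stopword list are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; a real system would use a fuller list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "were"}


def tokenize(text):
    # Lowercase and split on runs of non-alphanumeric characters (simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_scores(article, corpus):
    """Score each distinct non-stopword term of `article` by TF-IDF against `corpus`."""
    tf = Counter(t for t in tokenize(article) if t not in STOPWORDS)
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_sets)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for terms in doc_sets if term in terms)
        # Smoothed IDF (assumed variant): never divides by zero and is always positive.
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scores[term] = count * idf
    return scores


def reduce_query(article, corpus, k=10, min_ratio=0.5):
    """Return a short query: the top-k terms by TF-IDF, minus any term scoring
    below `min_ratio` times the best score (hypothetical pruning rule)."""
    scores = tfidf_scores(article, corpus)
    # Rank by descending score; break ties alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    if not ranked:
        return []
    best = ranked[0][1]
    return [term for term, score in ranked if score >= min_ratio * best]


if __name__ == "__main__":
    corpus = [
        "the election results were announced",
        "the weather is sunny today",
        "the stock market fell sharply",
    ]
    article = "the senate election recount the senate vote the recount"
    print(reduce_query(article, corpus, k=10, min_ratio=0.6))  # prints ['recount', 'senate']
```

In this sketch, the returned terms would be submitted as a short ad-hoc query in place of the full article text. Fewer query terms mean fewer posting lists to traverse, which is the intuition behind the response-time gains that query reduction aims for. Raising `min_ratio` corresponds loosely to the abstract's further pruning of low TF-IDF terms.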