item: #1 of 43 id: cord-020793-kgje01qy author: Suominen, Hanna title: CLEF eHealth Evaluation Lab 2020 date: 2020-03-24 words: 2380 flesch: 46 summary: CLEF eHealth tasks offered yearly from 2013 have brought together researchers working on related information access topics, provided them with resources to work with and validate their outcomes, and accelerated pathways from scientific ideas to societal impact. According to our analysis of the impact of CLEF eHealth labs up to 2017 keywords: care; chs; clef; clef ehealth; clinical; codes; codiesp; coding; collection; document; ehealth; evaluation; information; lab; labs; number; overview; participants; patient; processing; queries; resources; retrieval; spanish; spoken; subtasks; task; text; textual; workshop; year cache: cord-020793-kgje01qy.txt plain text: cord-020793-kgje01qy.txt item: #2 of 43 id: cord-020794-d3oru1w5 author: Leekha, Maitree title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date: 2020-03-24 words: 1574 flesch: 52 summary: All comparisons have been made in terms of the F-1 score of the suggestion class for a fair comparison with prior work on representational learning for open domain suggestion mining [5] (refer Baseline in Table 3 ). The procedure is repeated until we have a total of N suggestion reviews. 
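Item #2's over-sampling loop ("the procedure is repeated until we have a total of N suggestion reviews") can be sketched in miniature. This is a hedged illustration, not the paper's LMOTE language-model sampler: here synthetic reviews are simply drawn from the minority class's own unigram vocabulary, and every name (oversample_minority, n_target) is hypothetical.

```python
import random

def oversample_minority(reviews, n_target, seed=0):
    """Grow the minority ('suggestion') class to n_target examples.

    Toy stand-in for LM-based over-sampling: new reviews are built by
    sampling words from the minority corpus's own vocabulary, so the
    synthetic texts stay on-distribution at the unigram level.
    """
    rng = random.Random(seed)
    vocab = [w for r in reviews for w in r.split()]
    out = list(reviews)
    while len(out) < n_target:  # repeat until we have a total of N reviews
        length = len(rng.choice(reviews).split())
        out.append(" ".join(rng.choice(vocab) for _ in range(length)))
    return out

suggestions = ["please add a dark mode", "would love offline support"]
augmented = oversample_minority(suggestions, n_target=5)
```

In the paper's setting the synthetic texts would instead come from a language model; the loop structure, sampling until the class reaches size N, is the part mirrored here.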
keywords: class; classification; domain; grams; language; learning; lmote; mining; model; multi; non; open; performance; reviews; sample; sampling; single; stl; suggestions; task; technique; text cache: cord-020794-d3oru1w5.txt plain text: cord-020794-d3oru1w5.txt item: #3 of 43 id: cord-020801-3sbicp3v author: MacAvaney, Sean title: Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date: 2020-03-24 words: 2530 flesch: 50 summary: Probabilistic models of information retrieval based on measuring the divergence from randomness; MS MARCO: a human generated machine reading comprehension dataset; CLEF 2003 - overview of results; Cross-language information retrieval (CLIR) track overview; Learning to rank: from pairwise approach to listwise approach; A survey of automatic query expansion in information retrieval; TREC 2014 web track overview; Word translation without parallel data; Deeper text understanding for IR with contextual neural language modeling; BERT: pre-training of deep bidirectional transformers for language understanding; Overview of the fourth text retrieval conference (TREC-4); Overview of the third text retrieval conference (TREC-3); PACRR: a position-aware neural IR model for relevance matching; Parameters learned in the comparison of retrieval models using term dependencies; Google's multilingual neural machine translation system: enabling zero-shot translation; Cross-lingual transfer learning for POS tagging without cross-lingual resources; Adam: a method for stochastic optimization; Cross-lingual language model pretraining; Unsupervised cross-lingual information retrieval using monolingual data only; CEDR: contextualized embeddings for document ranking; A Markov random field model for term dependencies; An introduction to neural information retrieval; The TREC 2002 Arabic/English CLIR track; Neural information retrieval: at the end of the early years; Multilingual Information Retrieval: From Research to Practice; Cross-lingual learning-to-rank with shared representations; Cross-lingual transfer learning for multilingual task oriented dialog; Cross-lingual relevance transfer for document retrieval; The sixth text retrieval conference (TREC-6); Overview of the TREC 2005 robust retrieval track; Overview of the fifth text retrieval conference (TREC-5); Monolingual and cross-lingual information retrieval models based on (bilingual) word embeddings; End-to-end neural ad-hoc ranking with kernel pooling; Anserini: reproducible ranking baselines using Lucene; Simple applications of BERT for ad hoc document retrieval; Transfer learning for sequence tagging with hierarchical recurrent networks; The digital language divide; The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval keywords: approaches; arabic; bert; bm25; collections; cross; data; document; english; hoc; information; languages; learning; lingual; mandarin; model; multilingual; neural; non; performance; queries; query; ranking; relevance; results; retrieval; setting; shot; spanish; test; text; topics; training; transfer; trec; use cache: cord-020801-3sbicp3v.txt plain text: cord-020801-3sbicp3v.txt item: #4 of 43 id: cord-020806-lof49r72 author: Landin, Alfonso title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 words: 2374 flesch: 47 summary: key: cord-020806-lof49r72 authors: Landin, Alfonso; Parapar, Javier; Barreiro, Álvaro title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_27 sha: doc_id: 20806 cord_uid: lof49r72 Nowadays, item recommendation is an increasing concern for many companies. Recommendation accuracy became the most studied aspect of the quality of the suggestions.
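Item #4 recommends items with linear models over user and item embeddings. A minimal sketch of the general idea, scoring unseen items by the inner product of user and item vectors; the 2-d embeddings and all names below are invented for illustration and are not the paper's prefs2vec model.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def recommend(user_vec, item_vecs, seen, k=2):
    """Rank unseen items by inner product with the user embedding."""
    scores = {i: dot(user_vec, v) for i, v in item_vecs.items() if i not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# hypothetical 2-d embeddings for one user and three items
user = [1.0, 0.5]
items = {"a": [0.9, 0.1], "b": [0.2, 0.9], "c": [0.8, 0.8]}
top = recommend(user, items, seen={"a"})  # item "a" already consumed
```

The paper's contribution layers novelty and diversity objectives on top of such a linear scorer; the dot-product ranking is only the accuracy-oriented core.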
keywords: accuracy; collaborative; dataset; diversity; eer; elp; embeddings; experiments; fism; information; item; linear; matrix; method; model; novelty; prefs2vec; ratings; recommendations; recommender; representations; results; similarity; systems; table; user cache: cord-020806-lof49r72.txt plain text: cord-020806-lof49r72.txt item: #5 of 43 id: cord-020808-wpso3jug author: Cardoso, João title: Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders’ Requirements date: 2020-03-24 words: 2330 flesch: 45 summary: Section 3 describes our approach to establishing a DMP creation service that allows for semantics-based DMP representation. Our focus, however, is on having the generated DMP be machine-actionable, compliant with the DMP Common Standards Model, and expressed through semantic technologies. keywords: analysis; approach; assessment; common; data; dcso; dmp; focus; funder; funding; information; knowledge; machine; madmp; management; model; ontologies; ontology; paper; project; queries; representation; requirements; researchers; semantic; standards; tasks; technologies; templates cache: cord-020808-wpso3jug.txt plain text: cord-020808-wpso3jug.txt item: #6 of 43 id: cord-020811-pacy48qx author: Muhammad, Shamsuddeen Hassan title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date: 2020-03-24 words: 1726 flesch: 44 summary: A sentiment lexicon is a dictionary of lexical items with their corresponding semantic orientations. However, sentiment lexicons are domain-dependent: a word may convey two different connotations in two different domains.
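Item #6's point that semantic orientation is domain-dependent is often operationalised by scoring a word's similarity to positive versus negative seed words in domain-specific embeddings. A toy sketch under that assumption; the embeddings and seed lists are hypothetical and this is not the paper's incremental method itself.

```python
import math

def cos(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def orientation(word, emb, pos_seeds, neg_seeds):
    """Semantic orientation: mean similarity to positive seeds
    minus mean similarity to negative seeds."""
    p = sum(cos(emb[word], emb[s]) for s in pos_seeds) / len(pos_seeds)
    n = sum(cos(emb[word], emb[s]) for s in neg_seeds) / len(neg_seeds)
    return p - n

# toy movie-domain embeddings: "unpredictable" sits near "good" here,
# though in, say, a car-review domain it would not
emb = {"good": [1.0, 0.1], "bad": [-1.0, 0.1], "unpredictable": [0.8, 0.3]}
score = orientation("unpredictable", emb, ["good"], ["bad"])
```

Swapping in embeddings trained on a different domain's corpus would flip the orientation of such domain-sensitive words, which is exactly the dependence the paper targets.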
keywords: adaptation; analysis; approach; automatic; corpus; different; distribution; domain; embedding; generation; incremental; information; lexicon; neural; new; performance; rating; sect; sentiment; specific; word cache: cord-020811-pacy48qx.txt plain text: cord-020811-pacy48qx.txt item: #7 of 43 id: cord-020813-0wc23ixy author: Hashemi, Helia title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 words: 2941 flesch: 56 summary: key: cord-020813-0wc23ixy authors: Hashemi, Helia; Aliannejadi, Mohammad; Zamani, Hamed; Croft, W. Bruce title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_21 sha: doc_id: 20813 cord_uid: 0wc23ixy Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. This has motivated researchers to study answer sentence and passage retrieval, in particular in response to non-factoid questions [1, 18] . 
keywords: annotations; answer; answering; antique; community; correct; dataset; factoid; information; judgments; label; level; models; neural; non; number; pairs; passage; quality; questions; relevance; relevant; research; results; retrieval; table; task; term; test; training; users; wikipassageqa; workers; yahoo cache: cord-020813-0wc23ixy.txt plain text: cord-020813-0wc23ixy.txt item: #8 of 43 id: cord-020814-1ty7wzlv author: Berrendorf, Max title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 words: 2314 flesch: 53 summary: The datasets used by entity alignments methods are generally based on large-scale open-source data sources such as DBPedia [1] , YAGO key: cord-020814-1ty7wzlv authors: Berrendorf, Max; Faerman, Evgeniy; Melnychuk, Valentyn; Tresp, Volker; Seidl, Thomas title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_1 sha: doc_id: 20814 cord_uid: 1ty7wzlv In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. keywords: adjacency; aligned; alignment; authors; code; convolutional; datasets; different; embedding; entities; entity; gcn; graph; implementation; individual; information; kgs; knowledge; lingual; matrix; methods; networks; node; number; paper; representations; results; table; weights; work cache: cord-020814-1ty7wzlv.txt plain text: cord-020814-1ty7wzlv.txt item: #9 of 43 id: cord-020815-j9eboa94 author: Kamphuis, Chris title: Which BM25 Do You Mean? 
A Large-Scale Reproducibility Study of Scoring Variants date: 2020-03-24 words: 2251 flesch: 56 summary: Varying the scoring function, then, corresponds to varying the expression for calculating the score in the SQL query, allowing us to explore different BM25 variants by expressing them declaratively (instead of programming imperatively). We view our work as having two contributions: -We conducted a large-scale reproducibility study of BM25 variants, focusing on the Lucene implementation and variants described by Trotman et al. keywords: anserini; bm25; component; differences; different; document; effectiveness; function; index; length; lucene; original; prototyping; query; retrieval; scoring; search; source; study; term; time; value; variants; work cache: cord-020815-j9eboa94.txt plain text: cord-020815-j9eboa94.txt item: #10 of 43 id: cord-020820-cbikq0v0 author: Papadakos, Panagiotis title: Dualism in Topical Relevance date: 2020-03-24 words: 2469 flesch: 52 summary: Specifically, we sketch a method in which antonyms are used for producing dual queries, which can in turn be exploited for defining a multi-dimensional topical relevance based on the antonyms. We sketch a method in which antonyms are used for producing dual queries, which in turn can be exploited for defining a multi-dimensional topical relevance. 
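Item #9 explores BM25 variants by swapping the scoring expression (declaratively in SQL, in the paper). A Python sketch of two such variants, Robertson's original formulation and a Lucene-style one that drops the constant (k1 + 1) numerator; the constant changes scores but not ranking. Parameter values and document statistics below are invented for illustration.

```python
import math

def idf(N, df):
    # Lucene-style non-negative idf
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25_robertson(tf, dl, avgdl, N, df, k1=1.2, b=0.75):
    return idf(N, df) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))

def bm25_lucene(tf, dl, avgdl, N, df, k1=1.2, b=0.75):
    # omits the (k1 + 1) factor: a rank-preserving simplification
    return idf(N, df) * tf / (tf + k1 * (1 - b + b * dl / avgdl))

docs = [(3, 90), (1, 120), (2, 100)]  # (term frequency, doc length)
r = [bm25_robertson(tf, dl, 100, 1000, 50) for tf, dl in docs]
l = [bm25_lucene(tf, dl, 100, 1000, 50) for tf, dl in docs]
```

Since the two functions differ only by a constant factor of k1 + 1, any study comparing "BM25" implementations has to ask which of these (and several other) expressions is actually meant, which is the reproducibility question the paper pursues.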
keywords: aloe; antonyms; approach; articles; better; capitalism; concepts; direction; document; dual; dualism; evaluation; example; exploratory; fig; information; original; peace; queries; query; relevance; results; search; socialism; space; specific; tasks; terms; users; war; wse cache: cord-020820-cbikq0v0.txt plain text: cord-020820-cbikq0v0.txt item: #11 of 43 id: cord-020830-97xmu329 author: Ghanem, Bilal title: Irony Detection in a Multilingual Context date: 2020-03-24 words: 2806 flesch: 48 summary: Unsupervised neural machine translation; Irony as relevant inappropriateness; Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis; Bilingual sentiment embeddings: joint projection of sentiment across languages; Opinion analysis and figurative language in tweets: presentation and results of the DEFT2017 Défi Fouille de Textes shared task; Multilingual Natural Language Processing Applications: From Theory to Practice; Clues for detecting irony in user-generated contents: oh...!! it's so easy ;-); Translating irony in political commentary texts from English into Arabic; Irony as indirectness cross-linguistically: on the scope of generic mechanisms; Word translation without parallel data; IDAT@FIRE2019: overview of the track on irony detection in Arabic tweets; LDR at SemEval-2018 task 3: a low dimensional text representation for irony detection; Annotating irony in a novel Italian corpus for sentiment analysis; Learning word vectors for 157 languages; Logic and conversation; SemEval-2018 task 3: irony detection in English tweets; Sentiment polarity classification of figurative language: exploring the role of irony-aware and multifaceted affect features; Irony detection in twitter: the role of affective content; Disambiguating false-alarm hashtag usages in tweets for irony detection; Irony detection with attentive recurrent neural networks; SOUKHRIA: towards an irony detection system for Arabic in social media; Towards a contextual pragmatic model to detect irony in tweets; Exploring the impact of pragmatic phenomena on irony detection in tweets: a multilingual corpus study; Convolutional neural networks for sentence classification; The perfect solution for detecting sarcasm in tweets #not; Unsupervised cross-lingual information retrieval using monolingual data only; Efficient estimation of word representations in vector space; Linguistic regularities in continuous space word representations; Improving multilingual named entity recognition with Wikipedia entity type mapping; Overview of the task on irony detection in Spanish variants; Sarcasm detection on Czech and English twitter; A survey of cross-lingual embedding models; Cross-lingual learning-to-rank with shared representations; A contrastive study of ironic expressions in English and Arabic; AraVec: a set of Arabic word embedding models for use in Arabic NLP; A corpus of English-Hindi code-mixed tweets for sarcasm detection; Chinese irony corpus construction and ironic structure analysis; Reasoning with sarcasm by reading in-between; Creative language retrieval: a robust hybrid of information retrieval and linguistic creativity; Unsupervised cross-lingual word embedding by multilingual neural language models. We show that these monolingual models trained separately on different languages using multilingual word representations or text-based features can open the door to irony detection in languages that lack annotated data for irony.
keywords: analysis; annotated; arabic; art; cnn; corpus; cross; dataset; detection; embeddings; english; european; features; french; information; ironic; irony; languages; learning; models; monolingual; multilingual; neural; non; original; process; representation; results; retrieval; sarcasm; sentiment; similar; state; systems; task; training; tweets; use; vs.; word cache: cord-020830-97xmu329.txt plain text: cord-020830-97xmu329.txt item: #12 of 43 id: cord-020832-iavwkdpr author: Nguyen, Dat Quoc title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date: 2020-03-24 words: 1983 flesch: 43 summary: Our goals are: (1) To develop tasks that impact chemical research in both academia and industry, (2) To provide the community with a new dataset of chemical entities, enriched with relational links between chemical event triggers and arguments, and (3) To advance the state-of-the-art in information extraction over chemical patents. Long sentences listing names of compounds are frequently used in chemical patents. keywords: arguments; chemdner; chemical; chemu; compounds; corpus; development; documents; domain; entities; entity; evaluation; event; extraction; industry; information; key; label; new; patents; product; reaction; recognition; role; scientific; table; task; text; trigger; types; word cache: cord-020832-iavwkdpr.txt plain text: cord-020832-iavwkdpr.txt item: #13 of 43 id: cord-020834-ch0fg9rp author: Grand, Adrien title: From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date: 2020-03-24 words: 2735 flesch: 49 summary: This lag has contributed to the broad perception by researchers that Lucene produces poor search results and is ill-suited for information retrieval research.
Since then, there has been a resurgence of interest in adopting Lucene for information retrieval research, including a number of workshops that brought together like-minded researchers over the past few years [1, 2] . keywords: academic; anserini; block; case; changes; different; ding; evaluation; experiments; impact; implementation; indexes; information; innovations; lucene; max; open; pairs; paper; performance; query; real; report; reproducibility; research; researchers; results; retrieval; scores; scoring; source; suel; systems; time; version; wand; world cache: cord-020834-ch0fg9rp.txt plain text: cord-020834-ch0fg9rp.txt item: #14 of 43 id: cord-020835-n9v5ln2i author: Jangra, Anubhav title: Text-Image-Video Summary Generation Using Joint Integer Linear Programming date: 2020-03-24 words: 2289 flesch: 46 summary: Recent years have shown great promise in the emerging field of multi-modal summarization. Multi-modal summarization has various applications ranging from meeting recordings summarization [7] , sports video summarization keywords: abstractive; average; correlation; dataset; document; extractive; framework; ilp; images; information; integer; joint; key; linear; modal; multi; multimodal; networks; neural; novel; programming; scores; sentence; set; similarity; summarization; summary; task; text; textual; users; vector; video cache: cord-020835-n9v5ln2i.txt plain text: cord-020835-n9v5ln2i.txt item: #15 of 43 id: cord-020841-40f2p3t4 author: Hofstätter, Sebastian title: Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date: 2020-03-24 words: 1526 flesch: 47 summary: We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. 
With the adoption of neural re-ranking models, where the scoring process is arguably more complex than traditional retrieval methods, the divide between result score and the reasoning behind it becomes even stronger. keywords: content; different; document; evaluation; explorer; highlight; information; kernel; models; neural; pooling; queries; query; ranking; result; retrieval; scores; term; tool; users; view; visualization cache: cord-020841-40f2p3t4.txt plain text: cord-020841-40f2p3t4.txt item: #16 of 43 id: cord-020843-cq4lbd0l author: Almeida, Tiago title: Calling Attention to Passages for Biomedical Question Answering date: 2020-03-24 words: 2237 flesch: 46 summary: The inputs to the network are the query, a set of document passages aggregated by each query term, and the absolute position of each passage. For the remaining explanation, let us first define a query as a sequence of terms q = {u_0, u_1, ..., u_Q}, where u_i is the i-th term of the query; a set of document passages aggregated by each query term as D(u keywords: architecture; attention; bioasq; biomedical; deeprank; document; final; information; layer; matching; model; network; neural; passage; performance; query; query term; question; ranking; relevance; relevant; results; retrieval; self; semantic; set; similar; system; term; training; weights cache: cord-020843-cq4lbd0l.txt plain text: cord-020843-cq4lbd0l.txt item: #17 of 43 id: cord-020846-mfh1ope6 author: Zlabinger, Markus title: DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations date: 2020-03-24 words: 2535 flesch: 48 summary: We provide graded symptom judgments for diseases by differentiating between relevant symptoms and primary symptoms. We label the symptoms using graded judgments [5], where we differentiate between: relevant symptoms (graded as 1) and primary symptoms (graded as 2).
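Item #17's graded judgments (relevant = 1, primary = 2) are typically consumed by graded evaluation measures such as nDCG. A minimal sketch of that consumption, not part of the DSR paper itself; the example grades are invented.

```python
import math

def dcg(grades):
    """Discounted cumulative gain over graded judgments
    (0 = not relevant, 1 = relevant, 2 = primary)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def ndcg(grades):
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal else 0.0

# hypothetical grades of a ranked symptom list for one disease
ranking = [2, 0, 1, 2]
score = ndcg(ranking)
```

A binary-judgment collection could not distinguish a system that puts primary symptoms first from one that buries them, which is the motivation for grading in the first place.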
keywords: adaption; articles; available; baselines; collection; cooccur; diagnosis; disease; dsr; effectiveness; evaluation; extraction; graded; keywords; medical; mesh; method; occurrence; pmc; primary; pubmed; relations; relevant; second; symptom; text; title; umls; vocabulary cache: cord-020846-mfh1ope6.txt plain text: cord-020846-mfh1ope6.txt item: #18 of 43 id: cord-020848-nypu4w9s author: Morris, David title: SlideImages: A Dataset for Educational Image Classification date: 2020-03-24 words: 2277 flesch: 48 summary: Beyond the requirements of our taxonomy, our datasets needed to be representative of common educational illustrations in order to fit real-world applications, and legally shareable to promote research on educational image classification. Neural networks designed and trained to make sense of the noise and spatial relationships in photos are sometimes suboptimal for born-digital images and educational images in general. keywords: baseline; charts; classes; classification; cnns; dataset; diagrams; different; digital; docfigure; document; educational; extraction; features; illustrations; images; information; large; networks; neural; open; paper; photos; pre; retrieval; scientific; search; sect; similar; slideimages; task; test; text; training; type; use; vision cache: cord-020848-nypu4w9s.txt plain text: cord-020848-nypu4w9s.txt item: #19 of 43 id: cord-020851-hf5c0i9z author: Losada, David E. title: eRisk 2020: Self-harm and Depression Challenges date: 2020-03-24 words: 2483 flesch: 54 summary: key: cord-020851-hf5c0i9z authors: Losada, David E.; Crestani, Fabio; Parapar, Javier title: eRisk 2020: Self-harm and Depression Challenges date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_72 sha: doc_id: 20851 cord_uid: hf5c0i9z This paper describes eRisk, the CLEF lab on early risk prediction on the Internet. The lab casts early risk prediction as a process of sequential accumulation of evidence. 
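Item #19 casts early risk prediction as sequential accumulation of evidence: a user's writings arrive chunk by chunk, and a system must decide when enough evidence has accumulated to raise an alert, with earlier correct alerts rewarded. A toy sketch of that decision loop; the threshold and per-chunk scores are invented, not eRisk's evaluation protocol.

```python
def first_alert(chunk_scores, threshold=1.0):
    """Sequentially accumulate per-chunk risk evidence; return the index of
    the first chunk at which the running total crosses the threshold,
    or None if the user is never flagged."""
    total = 0.0
    for i, s in enumerate(chunk_scores):
        total += s
        if total >= threshold:
            return i
    return None

# hypothetical per-chunk classifier scores for one user's writings
alert_at = first_alert([0.2, 0.3, 0.6, 0.1])
```

eRisk's time-aware metrics then trade off the correctness of the decision against how many chunks were read before it was made.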
keywords: alerts; anorexia; approach; chunk; data; decision; depression; detection; early; early risk; erisk; evaluation; lab; measures; media; metrics; new; participants; prediction; questionnaire; release; self; set; shared; signs; social; task; test; user; week; writings cache: cord-020851-hf5c0i9z.txt plain text: cord-020851-hf5c0i9z.txt item: #20 of 43 id: cord-020871-1v6dcmt3 author: Papariello, Luca title: On the Replicability of Combining Word Embeddings and Retrieval Models date: 2020-03-24 words: 2137 flesch: 51 summary: We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval. Extending the IR replicability infrastructure to include performance aspects; Distributed representations of sentences and documents; Verboseness fission for BM25 document length normalization; Efficient estimation of word representations in vector space; Neural information retrieval: at the end of the early years; A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts; Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales; The TREC robust retrieval track.
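Item #20's hypothesis concerns aggregating word embeddings into a single document representation with mixture models and the Fisher kernel. The sketch below is a heavily simplified, Fisher-like aggregation: each word is soft-assigned to hand-set component means and the weighted residuals are accumulated per component. It is an illustration of the aggregation idea only, not the paper's GMM/movMF implementation, and all vectors are invented.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def fisher_like_doc_vector(words, emb, means):
    """Soft-assign each word to a mixture component by dot-product affinity,
    accumulate the weighted residuals (word - mean), and concatenate the
    per-component accumulators into one document vector."""
    dim = len(means[0])
    acc = [[0.0] * dim for _ in means]
    for w in words:
        v = emb[w]
        gamma = softmax([sum(a * b for a, b in zip(v, mu)) for mu in means])
        for k, mu in enumerate(means):
            for d in range(dim):
                acc[k][d] += gamma[k] * (v[d] - mu[d])
    return [x / max(len(words), 1) for comp in acc for x in comp]

emb = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
means = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-component mixture
doc = fisher_like_doc_vector(["good", "movie"], emb, means)
```

The resulting vector has (number of components) x (embedding dimension) entries, which is the dimensionality blow-up, relative to plain mean pooling, that the Fisher framework trades for expressiveness.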
keywords: cbow; classification; clustering; dataset; dimensional; distributions; document; experiments; fisher; gaussian; gmm; idf; information; methods; mises; mixture; models; movmf; original; representations; retrieval; vectors; von; word; zhang cache: cord-020871-1v6dcmt3.txt plain text: cord-020871-1v6dcmt3.txt item: #21 of 43 id: cord-020872-frr8xba6 author: Santosh, Tokala Yaswanth Sri Sai title: DAKE: Document-Level Attention for Keyphrase Extraction date: 2020-03-24 words: 2961 flesch: 42 summary: Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents; Scibert: pretrained contextualized embeddings for scientific text; Opinion expression mining by exploiting keyphrase extraction; Citation-enhanced keyphrase extraction from research papers: a supervised approach; Bert: pre-training of deep bidirectional transformers for language understanding; PositionRank: an unsupervised approach to keyphrase extraction from scholarly documents; The Viterbi algorithm; Incorporating expert knowledge into keyphrase extraction; CorePhrase: keyphrase extraction for document clustering; Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art; Long short-term memory; A study on automatically extracted keywords in text categorization; Phrasier: a system for interactive document retrieval using keyphrases; Automatic keyphrase extraction from scientific articles; Adam: a method for stochastic optimization; Conditional random fields: probabilistic models for segmenting and labeling sequence data; Unsupervised keyphrase extraction: introducing new kinds of words to keyphrases; Human-competitive tagging using automatic keyphrase extraction; Deep keyphrase generation; Document-level neural machine translation with hierarchical attention networks; Textrank: bringing order into text; Keyphrase extraction in scientific publications; GloVe: global vectors for word representation; Citation summarization through keyphrase extraction; Enhancing access to scholarly publications with surrogate resources; Dropout: a simple way to prevent neural networks from overfitting; Single document keyphrase extraction using neighborhood knowledge; PTR: phrase-based topical ranking for automatic keyphrase extraction in scientific publications; KEA: practical automated keyphrase extraction; Global attention for name tagging; World wide web site summarization. Previous approaches for keyphrase extraction model it as a sequence labelling task and use local contextual information to understand the semantics of the input text, but they fail when the local context is ambiguous or unclear.
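Item #21 models keyphrase extraction as sequence labelling over tokens. The decoding half of that formulation, turning per-token B/I/O tags into keyphrase strings, can be sketched as follows; this is a generic utility illustrating the task framing, not DAKE's BiLSTM-CRF model itself, and the example sentence is invented.

```python
def decode_bio(tokens, tags):
    """Turn per-token B/I/O labels (the usual sequence-labelling view of
    keyphrase extraction) into keyphrase strings."""
    phrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":                  # a new keyphrase starts
            if current:
                phrases.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:    # continue the open keyphrase
            current.append(tok)
        else:                           # "O" or a stray "I" closes it
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

toks = ["document", "level", "attention", "improves", "keyphrase", "extraction"]
tags = ["B", "I", "I", "O", "B", "I"]
phrases = decode_bio(toks, tags)
```

The hard part, which DAKE addresses with document-level attention, is predicting those tags well when the few words around a token are ambiguous; the decoding step itself is mechanical.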
keywords: approaches; challenges; context; data; deep; disambiguation; disaster; effectiveness; efficiency; gazetteer; information; learning; lmp; location; media; mentions; methods; networks; neural; noisy; prediction; problem; recognition; related; sect; social; solutions; system; task; techniques; time; tools; tweets; twitter cache: cord-020875-vd4rtxmz.txt plain text: cord-020875-vd4rtxmz.txt item: #23 of 43 id: cord-020880-m7d4e0eh author: Barrón-Cedeño, Alberto title: CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date: 2020-03-24 words: 2695 flesch: 60 summary: Automated systems for claim identification and verification can be very useful as supportive technology for investigative journalism, as they could provide help and guidance, thus saving time. Lab on automatic identification and verification of political claims. keywords: automatic; checking; checkthat; claims; clef; dataset; debates; edition; evaluation; evidence; fact; identification; information; lab; media; neural; news; pages; political; precision; previous; ranking; relevant; set; social; systems; task; topic; tweets; verification; web; worthiness; worthy cache: cord-020880-m7d4e0eh.txt plain text: cord-020880-m7d4e0eh.txt item: #24 of 43 id: cord-020885-f667icyt author: Sharma, Ujjwal title: Semantic Path-Based Learning for Review Volume Prediction date: 2020-03-17 words: 4027 flesch: 43 summary: MANTIS: system support for MultimodAl NeTworks of in-situ sensors; HyperLearn: a distributed approach for representation learning in datasets with many modalities; Interaction networks for learning about objects, relations and physics; Heterogeneous network embedding via deep architectures; The task-dependent effect of tags and ratings on social media access; Metapath2vec: scalable representation learning for heterogeneous networks; M-HIN: complex embeddings for heterogeneous information networks via metagraphs; Node2vec: scalable feature learning for networks; Deep residual learning for image recognition; Leveraging meta-path based context for top-n recommendation with a neural co-attention model; Multimodal network embedding via attention based multi-view variational autoencoder; Adam: a method for stochastic gradient descent; Distributed representations of sentences and documents; Deep collaborative embedding for social image understanding; How random walks can help tourism; Image labeling on a network: using social-network metadata for image classification; Distributed representations of words and phrases and their compositionality; Multimodal deep learning; Multi-source deep learning for human pose estimation; The PageRank citation ranking: bringing order to the web; GCap: graph-based automatic image captioning; DeepWalk: online learning of social representations; The visual display of regulatory information and networks; Nonlinear dimensionality reduction by locally linear embedding; Generating visual summaries of geographic areas using community-contributed images; ImageNet large scale visual recognition challenge; Heterogeneous information network embedding for recommendation; Semantic relationships in multi-modal graphs for automatic image annotation; PathSim: meta path-based top-k similarity search in heterogeneous information networks; LINE: large-scale information network embedding; Study on optimal frequency design problem for multimodal network using probit-based user equilibrium assignment; Adaptive image retrieval using a graph model for semantic feature integration; Heterogeneous graph attention network; Network representation learning with rich text information; Interactive multimodal learning for venue recommendation; MetaGraph2Vec: complex semantic path augmented heterogeneous network embedding. We use restaurant nodes as root nodes for the unbiased random walks and perform 80 walks per root node, each with a walk length of 80.
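Item #24's unbiased random walks (80 walks of length 80 from each restaurant root node) can be sketched on a toy graph. Smaller numbers are used here for readability, the node names are invented, and this illustrates only the walk-generation step, not the paper's full metapath-based learning pipeline.

```python
import random

def random_walks(graph, roots, walks_per_root, walk_len, seed=0):
    """Unbiased random walks: from each root node, repeatedly step to a
    uniformly chosen neighbour until the walk reaches walk_len nodes."""
    rng = random.Random(seed)
    out = []
    for root in roots:
        for _ in range(walks_per_root):
            walk = [root]
            while len(walk) < walk_len:
                nbrs = graph[walk[-1]]
                if not nbrs:            # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            out.append(walk)
    return out

# toy heterogeneous graph: a restaurant linked to a user and a category
g = {"r1": ["u1", "cat:pizza"], "u1": ["r1"], "cat:pizza": ["r1"]}
walks = random_walks(g, roots=["r1"], walks_per_root=3, walk_len=4)
```

In the paper's setting the walks would use the real heterogeneous graph of restaurants, users, and attributes, and the resulting node sequences feed a skip-gram-style embedding step.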
keywords: approach; attention; attributes; bimodal; captioning; categorical; concatenation; context; deep; dimensional; edges; embeddings; features; fusion; graph; heterogeneous; image; information; interactions; learning; low; mechanism; metapath; modalities; modality; model; multimodal; multiple; network; nodes; objective; performance; random; real; relations; representations; restaurant; review; semantic; separate; set; similar; similarity; single; specific; task; types; users; venues; views; visual; volume; walks; world cache: cord-020885-f667icyt.txt plain text: cord-020885-f667icyt.txt item: #25 of 43 id: cord-020888-ov2lzus4 author: Formal, Thibault title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 words: 5213 flesch: 48 summary: key: cord-020888-ov2lzus4 authors: Formal, Thibault; Clinchant, Stéphane; Renders, Jean-Michel; Lee, Sooyeol; Cho, Geun Hee title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_39 sha: doc_id: 20888 cord_uid: ov2lzus4 We are interested in the problem of cross-modal retrieval for web image search, where the goal is to retrieve images relevant to a text query. We propose to revisit the problem of cross-modal retrieval in the context of re-ranking. 
keywords: -the; approaches; architecture; attention; context; convolution; cross; datasets; dcmm; deep; different; differentiable; embeddings; experiments; features; feedback; general; graph; image; information; instance; joint; large; layer; learning; listwise; logs; mechanisms; mediaeval; methods; modal; models; multi; neighbors; networks; neural; new; node; number; order; prf; problem; queries; query; ranking; relevance; relevant; results; retrieval; score; search; set; similarity; simple; space; standard; task; text; textual; training; unsupervised; visual; webq cache: cord-020888-ov2lzus4.txt plain text: cord-020888-ov2lzus4.txt item: #26 of 43 id: cord-020890-aw465igx author: Brochier, Robin title: Inductive Document Network Embedding with Topic-Word Attention date: 2020-03-17 words: 4244 flesch: 50 summary: To our knowledge, we are the first to evaluate this kind of inductive setting in the context of document network embedding; and we qualitatively show that our model learns meaningful word and topic vectors and produces interpretable document representations. Recent techniques extend NE for document networks, showing that text and graph information can be combined to improve the resolution of classification and prediction tasks. 
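As a rough illustration of the topic-word attention named in the entry above (an assumed form for illustration only; the actual IDNE mechanism differs in detail), a document vector can be built by letting each topic vector attend over the document's word vectors:

```python
import numpy as np

def topic_word_attention(word_vecs, topic_vecs):
    """word_vecs: (n_words, d); topic_vecs: (n_topics, d).
    Softmax over words per topic gives topic-specific attention weights;
    the document vector averages the resulting topic-specific mixtures."""
    scores = word_vecs @ topic_vecs.T                  # (n_words, n_topics)
    weights = np.exp(scores - scores.max(axis=0))
    weights /= weights.sum(axis=0, keepdims=True)      # softmax over words
    topic_mixtures = weights.T @ word_vecs             # (n_topics, d)
    return topic_mixtures.mean(axis=0)                 # (d,)
```

Because the representation depends only on word and topic vectors, unseen documents can be embedded inductively, without retraining on the graph.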
keywords: algorithms; approach; attention; cane; classification; document; document network; document representations; embedding; following; gvnr; idne; inductive; information; interpretable; learning; links; matrix; meaningful; mechanism; method; model; network; new; nodes; number; paper; prediction; recent; representations; scores; sect; set; setting; similar; space; tadw; tasks; text; textual; topic; twa; vectors; weights; word cache: cord-020890-aw465igx.txt plain text: cord-020890-aw465igx.txt item: #27 of 43 id: cord-020891-lt3m8h41 author: Witschel, Hans Friedrich title: KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources date: 2020-03-17 words: 4927 flesch: 47 summary: Visualising graph query results is different from visualising graphs in general; the former resembles generation of results snippets in text retrieval [16] . This refinement in turn is based on the selection of subsets of nodes for context definition and natural language questions towards this context. 
keywords: answering; answers; approach; base; better; books; clear; complex; concept; context; data; domain; exploratory; field; focus; graph; information; interaction; knowledge; language; large; learning; natural; needs; new; nlidb; nodes; parser; parsing; participants; queries; query; question; relationships; result; retrieval; search; sect; selection; semantic; sequence; sequential; simple; sources; sqa; structured; subgraph; system; users; visual cache: cord-020891-lt3m8h41.txt plain text: cord-020891-lt3m8h41.txt item: #28 of 43 id: cord-020896-yrocw53j author: Agarwal, Mansi title: MEMIS: Multimodal Emergency Management Information System date: 2020-03-17 words: 4880 flesch: 49 summary: Although their contribution was a step towards advancing damage assessment systems, the features used were relatively simple and weak, as opposed to the deep neural network models, where each layer captures complex information about the modality [17] . The existing research on disaster damage analysis has primarily taken only unimodal information in the form of text or image into account. 
keywords: analysis; assessment; attention; case; categories; classification; crisismmd; damage; data; dataset; deep; different; disaster; features; filtering; fusion; groups; image; individual; information; infrastructural; input; learning; location; management; media; modalities; modality; models; modules; multimodal; neural; performance; present; processing; proposed; relevance; relief; response; results; section; severity; social; system; tasks; text; time; training; tweets; twitter; unimodal; use; work cache: cord-020896-yrocw53j.txt plain text: cord-020896-yrocw53j.txt item: #29 of 43 id: cord-020899-d6r4fr9r author: Doinychko, Anastasiia title: Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date: 2020-03-17 words: 4667 flesch: 47 summary: We demonstrate that generated views make it possible to achieve state-of-the-art results on a subset of Reuters RCV1/RCV2 collections compared to multiview approaches that rely on Machine Translation (MT) for translating documents into languages in which their versions do not exist before training the models. In order to evaluate the quality of generated views by Cond²GANs, we considered two scenarios. 
keywords: approaches; case; classes; classification; complete; cond; corresponding; different; discriminator; distribution; documents; example; external; following; function; game; gans; generation; generators; image; input; languages; learning; missing; missing views; models; multiview; networks; objective; observations; observed; problem; proposed; real; results; samples; set; test; training; value; views cache: cord-020899-d6r4fr9r.txt plain text: cord-020899-d6r4fr9r.txt item: #30 of 43 id: cord-020901-aew8xr6n author: García-Durán, Alberto title: TransRev: Modeling Reviews as Translations from Users to Items date: 2020-03-17 words: 5042 flesch: 52 summary: In addition, the availability of product reviews allows users to make more informed purchasing choices and companies to analyze costumer sentiment towards their products. In recent years the availability of large corpora of product reviews has driven text-based research in the recommender system community (e.g. [3, 19, 21] ). keywords: analysis; approaches; approximated; assumption; available; best; bias; completion; data; difference; embedding; function; good; graph; hft; item; knowledge; large; latent; learning; methods; model; modeling; neural; objective; online; overall; parameters; performance; prediction; problem; product; rating; recommendation; recommender; regression; representations; research; review; review embedding; score; sentiment; sets; similar; space; systems; test; text; training; translation; transrev; user; word; work cache: cord-020901-aew8xr6n.txt plain text: cord-020901-aew8xr6n.txt item: #31 of 43 id: cord-020903-qt0ly5d0 author: Tamine, Lynda title: What Can Task Teach Us About Query Reformulations? date: 2020-03-17 words: 4965 flesch: 56 summary: In the literature review, there are many definitions of query sessions. It is more likely that query terms are renewed during long tasks which could be explained by shifts in information needs related to the same driving long-term task. 
keywords: analysis; avg; changes; context; data; differences; different; features; information; large; learning; length; logs; long; longer; medium; min; model; multi; previous; queries; query; reformulation; related; research; resp; results; search; sequences; sessions; set; short; significant; similarity; stages; studies; table; tasks; term; time; trends; user; values; web; work cache: cord-020903-qt0ly5d0.txt plain text: cord-020903-qt0ly5d0.txt item: #32 of 43 id: cord-020904-x3o3a45b author: Montazeralghaem, Ali title: Relevance Ranking Based on Query-Aware Context Analysis date: 2020-03-17 words: 5195 flesch: 50 summary: [13] , therefore we only report the result of embedding-based estimation of query language models. [39] proposed an embedding query expansion named EQE1 to estimate query language model. keywords: analysis; centric; constraints; context; decision; document; embeddings; estimate; exact; expansion; experiments; feedback; function; importance; information; language; latent; lda; local; logistic; matching; method; model; modeling; parameter; performance; principle; queries; query; query terms; ranking; relevance; relevant; respect; retrieval; score; semantic; similarity; terms; use; value; word cache: cord-020904-x3o3a45b.txt plain text: cord-020904-x3o3a45b.txt item: #33 of 43 id: cord-020905-gw8i6tkn author: Qu, Xianshan title: An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction date: 2020-03-17 words: 5416 flesch: 58 summary: A study of customer reviews on amazon Exploring latent semantic factors to find useful product reviews Modeling and prediction of online product review helpfulness: a survey Learning to recommend helpful hotel reviews Glove: global vectors for word representation A dynamic neural network model for CTR prediction in real-time bidding Review helpfulness assessment Twitter sentiment analysis with deep convolutional neural networks Context-aware review helpfulness rating prediction Improving 
review representations with user attention and product attention for sentiment classification Automatically predicting peer-review helpfulness Semantic analysis and helpfulness prediction of text for online product reviews Text understanding from scratch Character-level convolutional networks for text classification. [10] investigated a variety of content features from Amazon product reviews, and found that features such as review length, unigrams and product ratings are most useful in measuring review helpfulness. keywords: amazon; attention; attention layer; attributes; auc; cold; context; customer; data; different; fan; features; function; helpfulness; hidden; improvement; information; layer; level; model; neural; number; parameters; performance; prediction; previous; product; product attention; product information; related; representation; results; review; scenario; score; sentence; sentiment; set; sets; start; state; table; text; training; use; vector; warm; word; yelp cache: cord-020905-gw8i6tkn.txt plain text: cord-020905-gw8i6tkn.txt item: #34 of 43 id: cord-020908-oe77eupc author: Chen, Zhiyu title: Leveraging Schema Labels to Enhance Dataset Search date: 2020-03-17 words: 4312 flesch: 58 summary: We propose to improve dataset search by making use of generated schema labels, since these can be complementary to the original schema labels and especially valuable when they are otherwise absent from a dataset. With generated schema labels, the ranking model can have a higher performance on the dataset retrieval task (Q2). 
keywords: column; context; dataset; description; different; document; embedding; features; field; framework; generation; information; item; labels; latent; matrix; metadata; methods; mixed; model; multifield; new; performance; query; ranking; representations; results; retrieval; rows; schema; schema labels; score; scoring; search; single; table; task; text; title; use; user; web; word cache: cord-020908-oe77eupc.txt plain text: cord-020908-oe77eupc.txt item: #35 of 43 id: cord-020909-n36p5n2k author: Papadakos, Panagiotis title: bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date: 2020-03-17 words: 5006 flesch: 59 summary: In this work, we propose the bias goggles model for computing the bias characteristics of web domains to user-defined concepts based on the structure of the web graph. In this work, we propose the bias goggles model, where users are able to explore the biased characteristics of web domains for a specific biased concept (i.e., a bias goggle). 
To our knowledge, an application [17] that ranks and retrieves image sequences based on longer text paragraphs as queries was the first to extend the pairwise image-text relationship to matching image sequences with longer paragraphs. keywords: appropriate; cases; context; cooking; corresponding; cross; dataset; deep; embedding; feature; gold; gru; human; illustration; image; information; latent; learning; loss; model; multimodal; networks; neural; new; output; pairs; passages; proposed; query; recall@k; recipe; recurrent; representation; results; retrieval; retrieved; semantic; seq2seq; sequence; sequential; similarity; space; standard; stepwise; story; task; text; topic; use; vae; variational; vector; visual; vrss cache: cord-020912-tbq7okmj.txt plain text: cord-020912-tbq7okmj.txt item: #37 of 43 id: cord-020914-7p37m92a author: Dumani, Lorik title: A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date: 2020-03-17 words: 5482 flesch: 64 summary: Finally, to compute the probability of picking a premise cluster instead of a single premise, we additionally need to aggregate over all premises in the cluster; this works since premise clusters are disjoint by construction. Note that if the user does not make a distinction between supporting and attacking clusters, but instead just wants good premise clusters, we can extend the experiment such that the user first throws a fair coin to decide if he will pick a supporting or attacking premise cluster. We write p → c if p ∈ P appears as a premise for c ∈ C in the corpus, and p + → c if p supports c. Similar to claim clusters, we consider premise clusters πj ⊆ P of premises with the same meaning and the corresponding premise clustering Π = {π1, π2, ...}.
keywords: 512; argument; baseline; bert; bm25f; claim; clustering; clusters; corpus; corresponding; different; embeddings; evaluation; example; fossil; framework; frequency; fuels; gain; ground; implementation; information; large; list; paper; premises; probability; query; query claim; ranking; relevance; relevant; result; retrieval; set; similar; stance; standard; support; system; task; text; truth; user; work cache: cord-020914-7p37m92a.txt plain text: cord-020914-7p37m92a.txt item: #38 of 43 id: cord-020916-ds0cf78u author: Fard, Mazar Moradi title: Seed-Guided Deep Document Clustering date: 2020-03-17 words: 5081 flesch: 50 summary: For this evaluation, we have proposed a simple method to automatically select seed words that behaves comparably to manual seed words for evaluation purposes. For document clustering, constraints can be provided in the form of seed words, each cluster being characterized by a small set of words. 
In this paper, we investigated whether the standard insights extraction pipeline is sufficient when applied to a single language family indigenous to Africa, Bantu languages, using the following questions: (1) how well does the standard insights extraction pipeline apply to Bantu languages; and (2) if found to be inadequate, why, and how can the pipeline be modified so as to be applicable to Bantu languages? keywords: adjectives; africa; analysis; bantu; bantu languages; clustering; data; dataset; dimensionality; english; example; insights; languages; matrix; media; mixed; modeling; morphemes; morphological; negation; network; noun; number; order; pipeline; prefix; preprocessing; processes; reduction; runyankore; sect; sentiment; sepedi; single; social; speech; standard; stem; structure; tagging; text; topic; tweets; twitter; verb; words; writing cache: cord-020918-056bvngu.txt plain text: cord-020918-056bvngu.txt item: #40 of 43 id: cord-020927-89c7rijg author: Zhuang, Shengyao title: Counterfactual Online Learning to Rank date: 2020-03-17 words: 5136 flesch: 48 summary: [22] , which does not require sampling candidate rankers to create interleaved results lists for online evaluation. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. 
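The counterfactual evaluation idea behind COLTR — scoring many candidate rankers from logged clicks instead of showing interleaved lists — can be sketched with a basic inverse-propensity-scored utility estimate. The function and names below are illustrative, not the paper's actual estimator:

```python
def counterfactual_utility(candidate_rank, logged_clicks, propensity):
    """IPS estimate of a candidate ranker's utility from logged click data.
    candidate_rank: doc -> 1-based rank under the candidate ranker
    logged_clicks: set of docs clicked in the logged impression
    propensity: doc -> examination probability at the logged position"""
    # Each click contributes its reciprocal-rank gain under the candidate,
    # reweighted by 1/propensity to correct for position bias.
    return sum((1.0 / candidate_rank[d]) / propensity[d] for d in logged_clicks)
```

Because the same log can score any candidate ranker, large pools of rankers can be compared without gathering fresh online interactions.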
keywords: algorithm; bandit; better; bias; candidate; click; coltr; considered; counterfactual; current; datasets; dbgd; distribution; documents; estimator; evaluation; feedback; function; gradient; implicit; impressions; interaction; learning; list; method; model; number; offline; online; pdgd; performance; pigd; pmgd; position; probabilistic; probability; production; propensity; ranker; result; risk; sampling; search; traditional; unbiased; user cache: cord-020927-89c7rijg.txt plain text: cord-020927-89c7rijg.txt item: #41 of 43 id: cord-020931-fymgnv1g author: Meng, Changping title: ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis date: 2020-03-17 words: 4969 flesch: 43 summary: For future work, we are interested in providing personalized recommendations of articles based on the combination of article readability and the understanding ability of the user. In this paper, we propose a new and comprehensive framework which uses a hierarchical self-attention model to analyze document readability. 
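Every entry in this catalogue reports a Flesch reading-ease score (e.g. flesch: 43 for the readability paper above). The classic formula behind such scores is shown below, with syllables approximated by vowel-group counting — a rough heuristic, not necessarily the counter used to produce these numbers:

```python
import re

def flesch_reading_ease(text):
    """Flesch reading ease:
    206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Higher scores mean easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    # Approximate syllables as runs of vowels, at least one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
```

Scores in the 40s and 50s, as seen across these items, correspond to fairly difficult, college-level prose.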
keywords: analysis; approaches; articles; attention; average; cambridge; cee; classification; cohesion; complexity; dataset; different; difficulty; document; embedding; encoder; english; evaluation; exam; explicit; explicit features; features; hierarchical; information; language; layer; learning; length; level; load; matrix; model; modeling; network; neural; number; overall; readability; readers; regression; related; second; self; semantic; sentence; sentiment; sequence; statistical; structure; table; task; text; traditional; training; transfer; transformer; use; vector; web; weebit; wikipedia; words cache: cord-020931-fymgnv1g.txt plain text: cord-020931-fymgnv1g.txt item: #42 of 43 id: cord-020932-o5scqiyk author: Zhong, Wei title: Accelerating Substructure Similarity Search for Formula Retrieval date: 2020-03-17 words: 4603 flesch: 58 summary: Although at the time of writing, it obtains the best effectiveness for the NTCIR-12 dataset, the typically large number of query paths means that query run times are not ideal; maximum run times can be a couple of seconds. For query paths, the corresponding posting lists are merged and approximate matching is performed on candidates one expression at a time. 
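The posting-list merge described in the formula-retrieval entry above can be sketched as a simple union-merge that counts, per candidate expression, how many query paths hit it. This is illustrative only; the paper's index stores richer structural information per posting:

```python
from collections import Counter

def merge_posting_lists(posting_lists):
    """posting_lists: one list of candidate expression ids per query path.
    Returns (expression_id, matched_path_count) pairs, best candidates first;
    approximate structure matching would then run on these candidates."""
    hits = Counter()
    for plist in posting_lists:
        hits.update(set(plist))  # each query path counts at most once per id
    return hits.most_common()
```

Candidates hit by many query paths can be scored first, which supports the pruning and upper-bound strategies the paper uses to cut run times.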
keywords: best; candidate; common; document; dynamic; e.g.; efficiency; fig; formula; gbp; hit; index; leaves; lists; matched; matching; math; maxref; nodes; non; ntcir-12; number; opt; partial; paths; posting; processing; pruning; queries; query; rank; requirement; results; retrieval; rooted; score; search; set; similarity; strategies; structural; subtree; systems; time; tokenized; upperbound; var; widest; wildcard cache: cord-020932-o5scqiyk.txt plain text: cord-020932-o5scqiyk.txt item: #43 of 43 id: cord-020936-k1upc1xu author: Sanz-Cruzado, Javier title: Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective date: 2020-03-17 words: 5657 flesch: 50 summary: However, the exact properties that make such adapted contact recommendation models effective at the task are as yet unknown. These ties have been materialized in the design and development of recommendation approaches based on IR models [2, 10, 39] . keywords: algorithms; analysis; approaches; axiomatic; axioms; bm25; candidate; common; conditions; constraints; contact; contact recommendation; degree; different; discrimination; documents; edge; ewc2; ewc3; fang; frequency; friends; graph; information; length; link; mapping; methods; models; neighborhood; neighbors; networks; new; normalization; original; people; query; recommendation; retrieval; satisfied; set; social; target; task; term; test; twitter; users; weights cache: cord-020936-k1upc1xu.txt plain text: cord-020936-k1upc1xu.txt
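As a concrete example of the IR-to-recommendation mapping the last entry refers to, BM25 can be transplanted to contact recommendation by treating the target user's neighbours as query terms and a candidate's neighbour set as the document. This is a generic adaptation in that spirit, not the paper's exact formulation:

```python
import math

def bm25_contact_score(target_nbrs, cand_nbrs, n_users, nbr_freq,
                       k1=1.2, b=0.75, avg_len=10.0):
    """Common neighbours play the role of matching query terms; a neighbour's
    popularity (nbr_freq) plays the role of document frequency."""
    doc_len = len(cand_nbrs)
    score = 0.0
    for u in target_nbrs & cand_nbrs:
        df = nbr_freq.get(u, 1)
        idf = math.log((n_users - df + 0.5) / (df + 0.5) + 1.0)
        tf = 1.0  # unweighted graph: each common neighbour occurs once
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return score
```

Under this mapping, IR axioms about term frequency, document frequency, and length normalization translate directly into statements about common neighbours, neighbour popularity, and candidate degree.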