In 2022
Finding NLP Papers by Asking a Multi-hop Question
- Authors: Xiaoran Li; Toshiaki Takano
- Venue: Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD), 95th Meeting
- Links: soon
- Code: [Github]
- BibTeX: [Google Scholar]
Abstract:
It remains challenging for researchers to quickly find and understand the articles relevant to their work, owing to the rapid iteration of technologies and the ever-increasing volume of scientific articles. In this paper, we propose a question generation method based on unsupervised multi-hop question answering that adapts to a beginner's questioning style, using Natural Language Processing (NLP) papers as an example. Specifically, we employ a pre-trained keyphrase extraction model and generate questions with a pre-trained question-answering model, using the extracted keyphrases as answers. Next, we construct an extensive keyphrase dictionary by converting questions into descriptions with linguistic rules. Finally, we augment the questions by replacing the keyphrases in them with their definitions. Moreover, to implement the paper recommendation task, we construct a simple neural network recommendation model that predicts the ranking of paper relevance. For training, we collected abstracts of papers published at top NLP venues from 2017 to 2022. In extensive experiments, our method shows comparable performance.
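As a rough illustration of the final augmentation step, the sketch below replaces a keyphrase in a question with its dictionary definition; the keyphrase, question, and definition are invented examples, not data from the paper.

```python
# Hypothetical sketch of the augmentation step: swap a keyphrase in a
# generated question for its dictionary definition, so the question
# resembles a beginner's phrasing. All strings are invented examples.
def augment_question(question: str, keyphrase: str, definition: str) -> str:
    """Return the question with the keyphrase replaced by its definition."""
    return question.replace(keyphrase, definition)

# Invented stand-in for the keyphrase dictionary built by linguistic rules.
keyphrase_dict = {
    "multi-hop QA": "question answering that combines evidence from several passages",
}

question = "Which papers study multi-hop QA?"
for phrase, definition in keyphrase_dict.items():
    if phrase in question:
        question = augment_question(question, phrase, definition)

print(question)
# "Which papers study question answering that combines evidence from several passages?"
```

A real pipeline would draw the keyphrases from the extraction model and the definitions from the rule-converted questions described above; the string substitution itself is this simple.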
The Endeavour to Advance Short Text Classification: Using Heterogeneous Graph Neural Network via Building Sememe-relationships
- Authors: Xiaoran Li; Toshiaki Takano
- Venue: Japanese Society for Artificial Intelligence (JSAI), 36th Annual Conference
- Links: [PDF] from jst.go.jp
- Code: [Github]
- BibTeX: [Google Scholar]
Abstract:
Short Text Classification (STC) is one of the fundamental tasks in natural language processing; the lack of grammatical structure and contextual information makes it challenging. One approach improves STC by introducing the label information of entities from an entity knowledge base to build a hierarchical heterogeneous graph. However, previous entity knowledge bases do not consider the complex semantic relationships among entities, and the number of entities in articles is so large that it strains computational resources. This paper proposes using sememes instead of entities to better exploit the deeper semantic relations between words when building heterogeneous graph networks. The sememe is the smallest semantic unit, and a finite inventory of sememes suffices to describe words. We utilize self-attention to find the sememes in short texts and the weights between them. Extensive experimental results demonstrate that our proposed method outperforms state-of-the-art methods on the Snippets dataset for STC.
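The word–sememe structure of such a heterogeneous graph can be sketched roughly as follows; the tiny sememe lexicon is an invented stand-in for a real sememe knowledge base, and the graph-neural-network layers and self-attention weighting are omitted.

```python
from collections import defaultdict

# Invented toy lexicon mapping words to sememes; a real system would
# use a sememe knowledge base rather than this hand-written table.
SEMEME_LEXICON = {
    "bank": ["institution", "money"],
    "loan": ["money", "borrow"],
    "river": ["waters"],
}

def build_hetero_edges(texts):
    """Build text-word and word-sememe edges of a heterogeneous graph."""
    edges = defaultdict(set)
    for tid, text in enumerate(texts):
        for word in text.lower().split():
            edges[("text", tid)].add(("word", word))          # text-word edge
            for sememe in SEMEME_LEXICON.get(word, []):
                edges[("word", word)].add(("sememe", sememe))  # word-sememe edge
    return edges

edges = build_hetero_edges(["Bank loan rates rise", "River bank erosion"])
# The shared sememe "money" connects "bank" and "loan", and the shared
# word "bank" connects the two short texts through the graph.
```

Because the sememe inventory is finite and small, the sememe layer stays compact even as the number of texts grows, which is the computational advantage the abstract claims over entity nodes.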
A Data Augmentation Method for Building Sememe Knowledge Base via Reconstructing Dictionary Definitions
- Authors: Xiaoran Li; Toshiaki Takano
- Venue: The Association for Natural Language Processing (ANLP), 28th Annual Meeting
- Links: [PDF] from jst.go.jp
- Code: [Github]
- BibTeX: [Google Scholar]
Abstract:
A sememe is an indivisible semantic unit of meaning. Sememe knowledge bases (SKBs), which contain words annotated with sememes, have been successfully applied to many natural language processing tasks. Some existing construction methods for SKBs operate on a limited lexicon with fixed-size sememe annotations, and the resulting annotations are difficult to extend to words from other lexicons. In this paper, we propose a method that reconstructs word definitions to expand the lexicon. Moreover, we present a sememe evaluation method utilizing graph embedding techniques and perform extensive experiments to demonstrate its effectiveness.
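One deliberately simplified way to picture the definition-based idea is to match the content words of a dictionary definition against an existing sememe inventory; the inventory, stopword list, and example definition below are invented illustrations, not the paper's actual method.

```python
# Hypothetical, deliberately simplified sketch: annotate an out-of-lexicon
# word by intersecting the content words of its dictionary definition with
# an existing sememe inventory. Inventory and stopwords are invented.
SEMEME_INVENTORY = {"animal", "water", "move", "human", "institution"}
STOPWORDS = {"a", "an", "the", "that", "in", "of", "to"}

def annotate_from_definition(definition: str) -> list:
    """Return candidate sememes found among the definition's content words."""
    content = {w for w in definition.lower().split() if w not in STOPWORDS}
    return sorted(content & SEMEME_INVENTORY)

print(annotate_from_definition("an animal that can move in water"))
# ['animal', 'move', 'water']
```

The actual method reconstructs definitions rather than matching surface forms, but this sketch shows why a dictionary definition is a natural source of sememe candidates for words outside the original lexicon.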
Textualisation of Dialectal Speech Material Using the Wav2vec Model
- Authors: Kainari Mineo; Xiaoran Li; Joy Taniguchi; Toshiaki Takano
- Venue: Language Resource Workshop in Japan 2022
- Links: [PDF] from jst.go.jp
- Code: [Github]
In 2021
The Analysis about Building Cross-lingual Sememe Knowledge Base Based on Deep Clustering Network
- Authors: Xiaoran Li; Toshiaki Takano
- Venue: Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD), 92nd Meeting
- Links: [PDF] from arXiv.org
- Code: [Github]
- BibTeX: [Google Scholar]
Abstract:
A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks, and we believe that by learning the smallest units of meaning, computers can understand human language more easily. However, existing sememe KBs are built only by manual annotation: human annotations carry personal understanding biases, word meanings are constantly updated and change with the times, and manual methods are not always practical. To address this issue, we propose an unsupervised method based on a deep clustering network (DCN) to build a sememe KB; the method can be applied to any language. We first learn distributed representations of multilingual words, align them in a single vector space with MUSE, learn the multi-layer meaning of each word through a self-attention mechanism, and use a DCN to cluster sememe features. Finally, we complete the prediction using only a 10-dimensional sememe space in English and find that this low-dimensional space still retains the main features of the sememes.
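The clustering step can be pictured with plain k-means as a simplified stand-in for the DCN; the paper's network is neural, so k-means here only illustrates grouping word-feature vectors into sememe-like clusters, and the 2-D vectors are invented examples rather than MUSE-aligned embeddings.

```python
import math
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means: a simplified stand-in for the DCN clustering step,
    grouping word-feature vectors into k sememe-like clusters."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: math.dist(v, centers[c]))
            clusters[nearest].append(v)
        # Recompute each center as the mean of its cluster (keep old center
        # if a cluster went empty).
        centers = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Invented 2-D "word features"; real inputs would be MUSE-aligned embeddings.
vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (4.9, 5.0)]
centers, clusters = kmeans(vectors, k=2)
# The two near-origin points land in one cluster, the two far points in the other.
```

In the paper's setting the cluster assignments play the role of sememe features, which is how a 10-dimensional sememe space can emerge from high-dimensional word vectors.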
Paper-style Text Transfer by Constructing a Pseudo-parallel Corpus in Axis Language
- Authors: Xiaoran Li; Toshiaki Takano
- Venue: YANS 16th 2021
- Links: [URL]
- Code: [Github]
- BibTeX: [Google Scholar]
In 2020
Lexical Semantic Relation Prediction Based on Word Embeddings
- Authors: Xiaoran Li; Yo Ehara
- Venue: YANS 16th 2020
- Links: [URL]
- Code: [Github]
- BibTeX: [Google Scholar]