It is increasingly common to use chatbots as an interface to services. Making this experience more human-like requires the chatbot to understand and express itself in natural language. One of the main components of a chatbot is the Natural Language Understanding (NLU) model, which is responsible for interpreting text and extracting the intent and entities present in it. It is possible to focus on only one of these NLU tasks, such as intent classification. Training an NLU intent classification model generally requires a considerable amount of annotated data, where each sentence in the dataset receives a label indicating an intent. Manually labeling data is an arduous process and, depending on the data volume, an impracticable one. Thus, an unsupervised machine learning technique, such as data clustering, can be applied to find patterns in the data and label them. For this task, it is essential to have an effective vector embedding representation of texts that captures semantic information and helps the machine understand the context, intent, and other nuances of the entire text. In this paper, we perform an extensive evaluation of different text embedding models for clustering and labeling. We also apply operations to improve the quality of the dataset, such as removing sentences according to different distance-threshold strategies (in terms of cosine similarity) relative to the clusters' centroids. We then trained intent classification models with two different architectures, one built with the Rasa framework and the other a Neural Network (NN), using attendance texts from the Coronavirus Platform Service of Ceará, Brazil. We also manually annotated a dataset to be used as validation data. We found that semi-automatic labeling through clustering and visual inspection introduced some biases into the intent classification models. However, the trained models still achieved competitive accuracy.
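The centroid-based filtering mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embeddings, cluster assignments, helper names, and the 0.7 threshold are all hypothetical, and only NumPy is assumed.

```python
# Hypothetical sketch: drop sentences whose embedding falls below a
# cosine-similarity threshold to their cluster's centroid.
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_by_centroid_similarity(embeddings, labels, threshold=0.7):
    """Keep only samples whose similarity to their cluster's centroid
    is at least `threshold`; returns sorted indices of kept samples."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    kept = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = embeddings[idx].mean(axis=0)  # mean embedding of cluster c
        for i in idx:
            if cosine_similarity(embeddings[i], centroid) >= threshold:
                kept.append(int(i))
    return sorted(kept)

# Toy example: index 2 is an outlier pointing away from cluster 0's centroid.
emb = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0], [0.1, 0.9]]
lab = [0, 0, 0, 1, 1]
print(filter_by_centroid_similarity(emb, lab, threshold=0.7))  # → [0, 1, 3, 4]
```

Varying `threshold` trades dataset size against label purity: a stricter threshold keeps only sentences close to the cluster prototype, which is the kind of strategy the dataset-cleaning step explores.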