The MixAttacker method mainly includes three parts:
(1) For the original input sentence, the words that are most sensitive to the classification model are selected. That is, an importance score is calculated for each word, and the words are sorted in descending order of these scores.
(2) The words are processed in the order given by the above ranking. For each word, WoBERT generates a new word based on the context of the sentence in which the original word occurs, and this new word replaces the original word in the sentence.
(3) The best sentence among those obtained in the WoBERT prediction stage is selected and further processed with back-translation.
The specific processing flow of the above three parts is illustrated in Fig. 1.
3.1 Adversarial attacks in text classification
An adversarial example is an example obtained after a reasonable transformation of the original text. Specifically, a qualified adversarial example should have the following characteristics: (1) It has some changes compared with the original input. (2) It is difficult for humans to detect such changes. (3) The adversarial example makes the classification result different from the original text, that is, the attack is successful.
For a text classification task, given a dataset of N sentences and their corresponding labels \(\left(X,Y\right)=\{\left(X_{1},Y_{1}\right),\left(X_{2},Y_{2}\right),\dots ,\left(X_{N},Y_{N}\right)\}\), there is a classifier F that maps \(X\to Y\). For a sentence \(X\in \mathbf{X}\), an effective adversarial example \(X_{adv}\) should satisfy the following requirements:
$$F\left(X_{adv}\right)\ne F\left(X\right),\quad \text{and}\quad Sim\left(X_{adv},X\right)\ge \epsilon ,\quad \text{and}\quad Ptb\left(X_{adv},X\right)\le \mu \tag{1}$$
Sim(·) measures the semantic similarity between the original sentence and the adversarial example, and Ptb(·) measures the degree of perturbation added to the adversarial example relative to the original sentence. In addition, a high-quality adversarial example should not only satisfy the semantic-similarity and perturbation constraints but also remain fluent.
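To make the constraints in Formula 1 concrete, the sketch below checks a candidate against a classifier. The concrete choices of Sim(·) (cosine similarity over sentence embeddings) and Ptb(·) (the fraction of changed word positions) are illustrative assumptions of this sketch, not the paper's exact definitions.

```python
# Illustrative check of Formula 1: a candidate is a valid adversarial example
# only if the label flips, semantic similarity stays high, and the perturbation
# stays small. `classify`, `embed`, `segment`, and the thresholds are
# placeholders standing in for the paper's actual components.
from typing import Callable, List
import math

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def perturbation_ratio(orig_words: List[str], adv_words: List[str]) -> float:
    # Fraction of word positions that differ (assumes equal-length segmentations).
    changed = sum(1 for a, b in zip(orig_words, adv_words) if a != b)
    return changed / max(len(orig_words), 1)

def is_valid_adversarial(x, x_adv, classify: Callable, embed: Callable,
                         segment: Callable, eps: float = 0.8, mu: float = 0.2) -> bool:
    label_flipped = classify(x_adv) != classify(x)                  # F(X_adv) != F(X)
    sim_ok = cosine(embed(x_adv), embed(x)) >= eps                  # Sim(X_adv, X) >= eps
    ptb_ok = perturbation_ratio(segment(x), segment(x_adv)) <= mu   # Ptb(X_adv, X) <= mu
    return label_flipped and sim_ok and ptb_ok
```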
3.2 Characteristics of Chinese
In Chinese, the character is the smallest unit of text, while the word, as defined by Wikipedia, is the smallest unit that can be used independently and carries semantic content. In order to preserve as much of the semantics of the original corpus as possible and to ensure more reasonable sentence transformations, we process the original sentence at the word level when generating adversarial examples.
Chinese words can consist of one, two, or more characters, as shown in Table 2. A proper grasp of these characteristics helps keep the generated results closer to our design goals.
Table 2
Chinese character and word
| Number of Chinese characters | Chinese word |
| --- | --- |
| One character | 风 (wind), 书 (book), 糖 (sugar) |
| Two characters | 电脑 (computer), 家庭 (family), 跑步 (run) |
| More than two characters | 巧克力 (chocolate), 言而无信 (fail to keep faith) |
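Because the attack operates at the word level, the input sentence must first be segmented into words. A minimal sketch follows, using jieba only as a stand-in segmenter (the method itself segments with WoBERT's word-level vocabulary and tags part of speech with LTP).

```python
# Word-level segmentation of a Chinese sentence. jieba is used here only as a
# stand-in; the method itself relies on WoBERT's word-level tokenization.
import jieba

sentence = "无论失去什么, 都不要失去好心情。"
words = jieba.lcut(sentence)
print(words)  # e.g. ['无论', '失去', '什么', ',', ' ', '都', '不要', '失去', '好', '心情', '。']
```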
3.3 Ranking the importance of words
Considering that each word in the input sentence has a different influence on the classification result, it is necessary to calculate the importance of each word in the sentence so as to select the words that are most sensitive to the classification model for subsequent processing. This makes the generation of adversarial examples more targeted and further ensures that the differences between the adversarial example and the original text are minimized.
Following TEXTFOOLER [9], we compare the change in label prediction between the original sentence and the sentence with the target word removed, and use this change as the main basis for scoring the target word. On top of this, we introduce a part-of-speech weighting strategy. Scoring the target words from these two perspectives distinguishes the importance of each word in the sentence, as summarized in Formula 2.
$$I_{w_i}=\alpha \left(TF_{w_i}+A\right) \tag{2}$$
The score \(I_{w_i}\) is assigned to each word \(w_i\in X\). The equation describes how part-of-speech weighting is applied after the score \(TF_{w_i}\) is computed. In a given scenario, the part of speech of word \(w_i\) is fixed and unique, and α is the weight of that part of speech. For example, when classifying emotional tendencies, α can be set to 100 for adjectives and 1 for other parts of speech to highlight the importance of adjectives. During the experiments, we found that \(TF_{w_i}\) fluctuates within a small interval (roughly \(TF_{w_i}\in (-3,10)\)). Since \(TF_{w_i}\) may be negative, a positive constant A is added so that the weight is applied to a number greater than 1, which yields the desired weighting effect. The specific definition of \(TF_{w_i}\) is as follows:
$$TF_{w_i}=\begin{cases}F_{Y}\left(X\right)-F_{Y}\left(X_{\backslash w_i}\right), & \text{if } F\left(X\right)=F\left(X_{\backslash w_i}\right)=Y \\ \left(F_{Y}\left(X\right)-F_{Y}\left(X_{\backslash w_i}\right)\right)+\left(F_{\bar{Y}}\left(X_{\backslash w_i}\right)-F_{\bar{Y}}\left(X\right)\right), & \text{if } F\left(X\right)=Y,\ F\left(X_{\backslash w_i}\right)=\bar{Y},\ \text{and } Y\ne \bar{Y}\end{cases} \tag{3}$$
By querying the classification model F, we can judge the influence of \(w_i\) in the sentence, where \(X_{\backslash w_i}=\{w_1,w_2,\dots ,w_{i-1},w_{i+1},\dots ,w_n\}\) denotes the sentence after deleting the word \(w_i\), and \(F_{Y}(\cdot)\) denotes the predicted score for label Y. The formula shows how the sensitivity of \(w_i\) to the classification model is calculated in two cases (the predicted label is unchanged or changed after the word is deleted).
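A minimal sketch of Formulas 2 and 3 follows, assuming the classifier exposes per-label probabilities and that the original sentence is correctly classified as Y. The adjective weight of 100 and the offset A follow the example values mentioned above, and the part-of-speech tag is treated as given (e.g., from LTP).

```python
# Sketch of the word-importance score I_{w_i} = alpha * (TF_{w_i} + A) from
# Formulas 2 and 3. `predict_probs` is assumed to return a dict of
# label -> probability for a list of words joined back into a sentence.
from typing import Callable, Dict, List

A = 10.0  # positive offset so the weighted quantity stays above 1 (illustrative value)

def tf_score(words: List[str], i: int, y: str,
             predict_probs: Callable[[List[str]], Dict[str, float]]) -> float:
    probs = predict_probs(words)
    reduced = words[:i] + words[i + 1:]              # X \ w_i
    probs_del = predict_probs(reduced)
    y_del = max(probs_del, key=probs_del.get)        # predicted label after deletion
    if y_del == y:                                   # prediction unchanged (first case)
        return probs[y] - probs_del[y]
    # prediction changed to some other label \bar{Y} (second case)
    return (probs[y] - probs_del[y]) + (probs_del[y_del] - probs[y_del])

def importance(words: List[str], i: int, y: str, pos_tag: str,
               predict_probs: Callable[[List[str]], Dict[str, float]]) -> float:
    alpha = 100.0 if pos_tag == "adjective" else 1.0  # part-of-speech weighting example
    return alpha * (tf_score(words, i, y, predict_probs) + A)
```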
3.4 Using WoBERT to generate replacements
In order to generate qualified adversarial examples as efficiently as possible, we no longer rely on heuristics over corpus data; instead, following BERT-Attack (Li et al., 2020), we use a pre-trained masked language model to predict alternatives for keywords from their context, thus generating new sentences that are similar in semantics but different in surface form.
We chose to make predictions for the Chinese words in the sentence for the following reasons:
(1) Shorter sequences and faster processing: a Chinese word consists of n Chinese characters, so the sequence obtained by segmenting a sentence at the word level is necessarily shorter than the character-level sequence, and a word-level model therefore processes it faster than a character-level model.
(2) Mitigating the exposure bias problem in generation: a word-based model predicts one complete Chinese word (n characters) at a time, which is equivalent to a character-based model making n recursively dependent predictions over n steps. The exposure bias problem of the character-based model is therefore more severe than that of the word-based model.
(3) Lower uncertainty of word meanings reduces modeling complexity: in Chinese, the meaning of a character carries more uncertainty than the meaning of a word. As a result, a single embedding layer is sufficient to represent a word's meaning, whereas representing a character's meaning requires multiple layers of embeddings.
In most current BERT-style pre-trained language models for Chinese, such as BERT-wwm (Cui et al., 2021) and AMBERT (Zhang et al., 2021), the basic unit is still the character. These models only incorporate word information indirectly and do not truly capture word-level information from the original corpus. The emergence of WoBERT, a Chinese pre-trained model that genuinely operates at word granularity, confirms the desirability of a word-based pre-training model. WoBERT continues pre-training from the open-source RoBERTa-wwm-ext (Cui et al., 2021) released by Harbin Institute of Technology, using words as the unit and masked language modeling (MLM) as the training task. Its experimental results (Sun, 2020) show that WoBERT is comparable to BERT on NLP tasks that do not require exact boundaries and is significantly faster. In particular, the word-level model works better than the character-based model on generation tasks. For these reasons, the WoBERT-MLM model is finally chosen to predict replacement candidates.
In addition, to preserve the original semantics as much as possible and reduce the cost of running MLM predictions in each iteration, we follow the BERT-Attack setting: the selected words are not masked in the original sentence; instead, the complete sequence is used as input, and top-N predictions of replacement words are made at the positions where the selected words occur. Building on this idea, we design our approach for expanding the sources of the replacement candidate set. Throughout the WoBERT prediction and sentence generation process, the replacement candidate set we search contains the results generated by the following three modes:
(1) mode1: For the target word, predict possible replacement words at the word's position in the original sentence and add the top-N predicted words to the replacement candidate set.
(2) mode2: For the target word, predict possible insertion words immediately before or after the word's position in the original sentence, and join each of the top-N insertion words with the target word to form a new word in the replacement candidate set.
(3) mode3: For each replacement word selected in mode1, take the new sentence obtained after replacing the original word as input, predict possible insertion words before or after the replacement word's position, and join each of the top-N insertion words with the replacement word to form a new word in the replacement candidate set.
The whole replacement candidate set generation strategy is illustrated in Fig. 2.
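The sketch below shows how mode1 could be realized with a Hugging Face masked language model, following the unmasked-input setting described above. The model identifier is a placeholder (an assumption of this sketch, not the released WoBERT checkpoint), the word-to-token mapping assumes a fast tokenizer, and WoBERT itself requires its own word-level tokenizer; mode2 and mode3 would reuse the same prediction call at an insertion slot next to the target or replacement word.

```python
# Sketch of mode1: predict top-N replacement words at the target word's
# position without masking it, following the BERT-Attack setting.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "some-word-level-chinese-mlm"  # placeholder, not a real model id
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def top_n_replacements(words, target_index, n=10):
    """Return the top-N vocabulary items predicted at the target word's slot."""
    # Keep the full word sequence as input; do not replace the target with [MASK].
    inputs = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    # Map the target word index to its first sub-token position (fast tokenizer).
    token_pos = inputs.word_ids(0).index(target_index)
    top_ids = torch.topk(logits[token_pos], n).indices.tolist()
    return [tokenizer.decode([tid]).strip() for tid in top_ids]

# mode2/mode3 follow the same pattern but score an insertion slot next to the
# target word (mode2) or next to an already-substituted replacement (mode3).
```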
3.5 Back-Translation
In each iteration, the new sentences generated by the MLM method may have two problems: first, they may not successfully attack the target model, and second, they may contain unreasonable grammar or collocations that hurt the fluency of the expression. To alleviate these problems while keeping the semantic loss small, back-translation is introduced. Back-translation takes a sentence in a given language, translates it into an intermediate language, and then translates it back into the original language. This process aims to optimize details such as adjusting the sentence structure and standardizing the wording while ensuring that the semantics conveyed by the sentence are not distorted.
To ensure the quality of back-translation, MixAttacker uses the Youdao API with English as the intermediate language, so that the "Chinese-English-Chinese" translation process gives the sentence more of the characteristics of a high-quality adversarial example. Results of the back-translation are illustrated in Table 3.
Table 3
Examples generated at the WoBERT and Back-Translation stages
| (label) Original text | (label) WoBERT generation | (label) Back-translation |
| --- | --- | --- |
| (none) 无论失去什么, 都不要失去好心情。 | (none) 无论失去什么, 都我不想失去好心情。 | (like) 无论我失去什么, 我都不想失去我的好心情。 |
| (politic) 台南法院将于4月8日开庭审理张铭清遇袭案。 | (politic) 南湖法院将于4月8日开庭审理张铭清遇袭案。 | (society) 南湖法院将于4月8日审理张某的案件。 |
| (politic) 文化部: 营业性演出不得以假唱假演奏欺骗观众。 | (politic) 中国文联解读: 营业性演出不得以假唱假演奏欺骗观众。 | (society) 商业表演不应该假唱来欺骗观众。 |
| (negative) 虽然预定比较方便, 但是宾馆的对顾客的关怀不足, 房间冷得让人睡不着觉服务人员还说暖和的!!! | (positive) 虽然预定比较方便, 但是宾馆的对顾客的关怀不足, 房间有些寒冷得让人睡不着觉服务人员还说暖和的!!! | (positive) 虽然预订比较方便, 但是酒店对顾客的关心是不够的, 房间有些冷让人睡不着, 服务人员也说温暖!!! |
| (negative) 房间还可以, 不脏;热水放了30分钟后才有点热, 洗的不爽;还不可以刷卡。 | (positive) 房间还可以, 不脏太;热水放了30分钟后才有点热, 洗的不爽;还不可以刷卡。 | (positive) 房间还可以, 不太脏;热水放了30分钟才有点热。你还不能刷卡。 |
(The back-translation examples in the table reflect the effect of expression-level adjustments on the sentences generated in the WoBERT stage; they show that the back-translated sentences are generally more fluent.)
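The back-translation step itself reduces to a "Chinese-English-Chinese" round trip. A minimal sketch follows; the paper calls the Youdao API, but here the translation call is abstracted into a generic `translate(text, src, dst)` function, which is an assumption of this sketch rather than an actual client.

```python
# Sketch of the "Chinese -> English -> Chinese" back-translation step. The
# paper uses the Youdao translation API; `translate` is an abstract callable
# so the round trip can be expressed without tying it to any specific client.
from typing import Callable

def back_translate(text: str,
                   translate: Callable[[str, str, str], str],
                   pivot: str = "en") -> str:
    """Round-trip the sentence through an intermediate language."""
    intermediate = translate(text, "zh", pivot)   # Chinese -> English
    return translate(intermediate, pivot, "zh")   # English -> Chinese

# Usage: back_translate("无论失去什么, 都不要失去好心情。", youdao_translate),
# where `youdao_translate(text, src, dst)` wraps the actual API call.
```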
In practice, we not only apply back-translation after the WoBERT stage but also place it at the top of the overall algorithm as an independent adversarial example generation strategy. Specifically (as illustrated in Algorithm 1), the model first back-translates the original input text and uses the back-translated result to attack the target model. If the attack succeeds, the back-translated result is output directly as the adversarial example (examples generated this way are illustrated in Table 4). Otherwise, the algorithm enters the processing flow illustrated in Fig. 1.
Table 4
Adversarial examples generated only by back-translation
| (label) Original text | (label) Back-translation |
| --- | --- |
| (negative) 感觉一般, 最便宜的房间没有窗户, 房间太小。不过, 里要去的办公地点很近。 | (positive) 最便宜的房间没有窗户, 而且太小了。然而, 办公室很近。 |
| (positive) 酒店还可以, 房内有电脑, 数字电视。早餐一般啦。隔音效果比较差, 下面是商业街, 不太安静。 | (negative) 酒店没问题。房间里有电脑和数字电视。早餐一般般。隔音效果差, 下面是商业街, 不是很安静。 |
| (positive) 酒店的硬件设施达到了四星的标准, 24小时供应热水, 水量也很足。但是服务只能算是三星的水平, 早餐最多是二星的标准。由于团队过多, 酒店晚上餐厅不提供点餐服务, 只有自助餐。和其他地方的HolidayInn比起来, 性价比差远了。 | (negative) 酒店的硬件达到了四星级标准, 热水全天24小时供应。但是服务只有三星级, 早餐最多是两星级。由于团体人数多, 酒店餐厅晚上不提供点餐服务, 只提供自助餐。与其他地方的HolidayInn相比, 它的性价比远不如假日酒店。 |
3.6 Algorithm description
We summarize the whole MixAttacker process in Algorithm 1.
Algorithm 1: MixAttacker algorithm

Input: text data X with original label Y; classification model F; functions: back-translation function BTranslate, word segmentation function Split, word-influence calculation function Score, descending sort by word influence SortByScore, top-N WoBERT prediction Pred
Output: adversarial example Xadv of text X
1. Initialization: Xadv ← X, Importance ← {}, Candidates ← {}
2. X' ← BTranslate(X)
3. If F(X') ≠ Y:
4.     Xadv ← X'
5.     Return Xadv
6. Else:
7.     W ← {w1, w2, …, wn} ← Split(X)
8.     for wi in W:
9.         Importance.add(Score(wi, X, Y, F))
10.    Wscore ← SortByScore(W, Importance)
11.    Candidates ← Pred(X) using the three modes
12.    for wi in Wscore:
13.        If Perturbation > 20%:
14.            Return None
15.        L ← {}
16.        for c in Candidates[wi]:
17.            L[c] ← w1 … wi−1 [c] wi+1 … wn
18.        If ∃ c in Candidates[wi] s.t. F(L[c]) ≠ Y:
19.            L' ← L[c'] where L[c'] has maximum similarity with X
20.        Else:
21.            L' ← L[c'] where L[c'] causes the maximum reduction in the probability of label Y in F(L[c'])
22.        X' ← BTranslate(L')
23.        If F(X') ≠ Y:
24.            Return Xadv ← X'
25.        Elif F(L') ≠ Y:
26.            Return Xadv ← L'
27. Return None
The overall process of the algorithm is described as follows:
Part1 (line 2–5): Back-translate the input sentence and attack the classifier with the back-translation result; output the back-translation result as the adversarial example if the predicted label changes. Otherwise, enter Part2 (as illustrated in Fig. 1).
Part2 (line 6–26):
A (line 6–10): Use WoBERT to segment the input sentence, use the LTP tool to tag the part of speech of each resulting word, and then obtain the words in descending order of importance via the importance-ranking calculation.
B (line 11–26): Iterate sequentially through the words in the descending sequence of word importance:
① (line 11–21) For the target word, new words are predicted using WoBERT-MLM (the three prediction modes used in this stage are illustrated in Fig. 2) and the top-N predicted words are selected. The n sentences obtained by replacing the original word with each of them are then used to attack the classifier. Finally, a sentence is selected that changes the predicted label and has the highest semantic similarity to the original input; if no such sentence exists, the sentence with the lowest confidence score for the original label is selected.
② (line 21–26) Back-translate the sentence selected in ① and use the back-translation result to attack the classifier. If the predicted label changes, the back-translation result is output as the adversarial example; if not, the result of ① is checked: if the sentence selected in ① changes the predicted label, it is output as the adversarial example. Otherwise, step B is performed on the next word in the descending sequence of word importance.
Note that when the perturbation threshold is reached (i.e., the last word allowed to be modified has been processed), if neither ① nor ② produces a sentence that changes the predicted label, the algorithm returns None.
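For reference, the following Python sketch condenses Algorithm 1 into a single driver function. All helpers (classify, label_prob, back_translate, segment, importance, candidates_for, similarity) are assumed to exist and mirror the functions named in Algorithm 1; the 20% perturbation threshold follows line 13, and sentence reconstruction by simple concatenation is an assumption of the sketch.

```python
# Condensed sketch of Algorithm 1. All helper callables are assumptions that
# stand in for the components named in the algorithm (BTranslate, Split,
# Score, Pred with its three modes, the classifier F, and Sim).
def mix_attacker(x, y, classify, label_prob, back_translate, segment,
                 importance, candidates_for, similarity, ptb_limit=0.2):
    # Part 1 (lines 2-5): try pure back-translation first.
    bt = back_translate(x)
    if classify(bt) != y:
        return bt

    # Part 2 (lines 6-26): word-by-word substitution guided by importance.
    words = segment(x)
    order = sorted(range(len(words)),
                   key=lambda i: importance(words, i, y), reverse=True)
    current = list(words)
    changed = 0
    for i in order:
        if changed / len(words) > ptb_limit:      # perturbation threshold (line 13)
            return None
        variants = [current[:i] + [c] + current[i + 1:]
                    for c in candidates_for(words, i)]   # candidates from the 3 modes
        if not variants:
            continue
        flips = [v for v in variants if classify("".join(v)) != y]
        if flips:                                  # lines 18-19: label flips, keep most similar
            best = max(flips, key=lambda v: similarity("".join(v), x))
        else:                                      # lines 20-21: keep the biggest confidence drop
            best = min(variants, key=lambda v: label_prob("".join(v), y))
        sent = "".join(best)
        bt_sent = back_translate(sent)             # line 22
        if classify(bt_sent) != y:                 # lines 23-24
            return bt_sent
        if classify(sent) != y:                    # lines 25-26
            return sent
        current = best
        changed += 1
    return None                                    # line 27
```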