
Automatic Satisfaction Analysis in Call Centers Considering Global Features of Emotion and Duration


Call centers have been widely used for customer service, technical support and sales. Call center agents are the key to the success of a call center operation. To evaluate the job performance of agents, 7 quantitative indicators are proposed in Ref.[1]: service quality indicator, test score, personal attendance, total calls per hour, first call resolution, survey success rate and customers’ satisfaction. Measuring and monitoring customers’ satisfaction is an essential issue, since deficiencies of services and businesses can be clearly understood through its analysis. A large amount of dialogue data is produced in call centers every day, far too much to process manually. So an intelligent system that accomplishes the satisfaction analysis automatically is greatly needed.

Meanwhile, paralinguistic information has received more and more attention recently. Since 2010, INTERSPEECH has held computational paralinguistics challenges whose goal is to identify more information from a speaker’s voice, such as personality, likability, intoxication, social signals and so on[2-6]. Inspired by these studies, this paper concentrates on the degree of customer satisfaction with the services.

Various studies have been carried out to investigate customers’ satisfaction. Research in Ref.[7] shows that agent traits can influence customers’ satisfaction; according to the authors, the knowledgeableness and preparedness of an agent are good indicators of his/her service quality. The work in Ref.[8] predicts customers’ satisfaction using affective and textual features based on the context of a customer in social media; the affective features contain the customer’s and agent’s personality traits and emotion expression. It is demonstrated in Ref.[9] that negative emotion between a customer and an agent, especially anger, can deliver useful information for analyzing customer satisfaction. The authors use acoustic and lexical features to recognize the customers’ emotion and compute the proportion of emotional turns as the indicator of customer satisfaction.

In our research, we have collected thousands of dialogue recordings from call centers to analyze customers’ satisfaction without a speech recognizer[10]. Since a customer may talk while driving, riding a bus, taking the subway and so on, channel noise, background noise and talking style can dramatically decrease recognition accuracy, making it hard to recognize the speech reliably. Our method extracts acoustic features from the customers’ fragments to recognize emotion, and then extracts global features of emotion and duration to analyze the satisfaction based on the emotion recognition results.

1 Data Processing

The corpus is gathered from China Mobile’s call center, which provides support to Shanghai customers in Chinese. After each phone call, customers are asked to report by short message whether they are satisfied with the agent’s service. The database contains 5 684 recorded audio files: 1 170 recordings labeled dissatisfied, occupying 60 h, and 4 514 labeled satisfied, occupying 100 h. The duration of each dialogue varies from 20 s to 20 min. The sampling frequency is 6 kHz and the resolution is 16 bit. The training set contains 836 dissatisfied and 4 180 satisfied recordings; the testing set contains 334 dissatisfied and 334 satisfied recordings.

Firstly, all the customers’ fragments are collected from the dialogue, and the acoustic features are extracted from every customer fragment. The purpose of the emotion recognition is to obtain the emotion confidence of every fragment. Next we perform the satisfaction analysis. The inputs of the model are the global features of emotion and duration, which are extracted based on the emotion confidence. The output of the model is the customer satisfaction.
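The two-stage flow above, from per-fragment emotion confidence to a dialogue-level satisfaction label, can be sketched as follows. The linear scorer and the 0.5 decision threshold are placeholder assumptions standing in for the trained SVM models, not the paper's actual parameters.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Fragment:
    """One customer speech fragment with its acoustic feature vector."""
    features: List[float]
    duration: float  # seconds

def emotion_confidence(frag: Fragment) -> float:
    """Stage 1 stand-in: score a fragment; positive means negative emotion.
    The weights here are hypothetical, replacing the trained emotion SVM."""
    weights = [0.5, -0.2, 0.1]
    return sum(w * x for w, x in zip(weights, frag.features))

def analyze_satisfaction(fragments: List[Fragment]) -> str:
    """Stage 2: derive a global feature (negative-emotion ratio) from the
    per-fragment confidences and map it to a satisfaction label."""
    confs = [emotion_confidence(f) for f in fragments]
    neg_ratio = sum(c > 0 for c in confs) / len(confs)
    # Placeholder decision rule standing in for the satisfaction model.
    return "dissatisfied" if neg_ratio > 0.5 else "satisfied"

dialogue = [Fragment([1.0, 0.2, 0.1], 2.5), Fragment([-1.0, 0.5, 0.0], 3.0)]
print(analyze_satisfaction(dialogue))  # → satisfied
```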

1.1 Segmentation and annotation

From the experiments, the SVM has the best performance (F value of 0.710) when the ratio is 1∶3. It can be concluded that the ratio of dissatisfied to satisfied recordings in the training set influences system performance.

The object of the emotion annotation is the customer voice. Six emotion labels are used: hot anger (HA), cold anger (CA), boredom (B), disappointment (D), neutral (N) and joy (J). The annotation group has three annotators, all college students of about 24 years old. Before the annotation, the three annotators are trained on the six kinds of emotion. The annotators label all of the customers’ fragments. The customers’ emotions are manually classified into positive emotions (neutral and joy) and negative emotions (hot anger, cold anger, boredom and disappointment). When a fragment receives the same tag from more than one annotator, we take it as a sample for the study. In total, we get 5 478 negative emotion fragments and 5 647 positive emotion fragments.
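The agreement rule above (keep a fragment only when at least two of the three annotators give it the same tag) can be sketched as a small helper; the tag sets mirror the annotation scheme in the text.

```python
from collections import Counter

def consensus_label(labels):
    """Return the majority emotion tag if at least two of the three
    annotators agree; otherwise None (the fragment is discarded)."""
    tag, count = Counter(labels).most_common(1)[0]
    return tag if count >= 2 else None

# HA/CA/B/D are the negative tags; N/J are positive, as in the scheme.
NEGATIVE = {"HA", "CA", "B", "D"}

def polarity(tag):
    return "negative" if tag in NEGATIVE else "positive"

print(consensus_label(["HA", "HA", "N"]))  # two annotators agree on HA
print(consensus_label(["HA", "N", "J"]))   # no agreement -> discarded
```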

1.2 Negative emotion distribution

Fig.1 Negative emotion distribution (X— the ratio of negative emotion fragments in a dialogue; Y— the ratio of recordings in the dataset containing that ratio of negative emotion fragments)

The purpose of our system is to identify the satisfied recordings. An investigation is performed to find the relation between satisfaction and the emotion fragments of customers. Fig.1 shows the correlation between negative emotions and satisfaction. The x axis represents the ratio of negative fragments to all fragments in a dialogue. The y axis represents the ratio of recordings to all the satisfied/dissatisfied recordings. For example, 0% on the x axis means the recording does not contain any negative emotion, and 78% on the y axis means that 78% of the satisfied recordings do not contain any negative emotion.
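The statistic behind Fig.1 can be computed as below; the confidence values are toy numbers chosen for illustration, with positive confidence denoting a negative-emotion fragment.

```python
def negative_ratio(confidences):
    """Fraction of fragments in one dialogue tagged as negative emotion
    (confidence > 0 denotes negative, as in the recognition stage)."""
    return sum(c > 0 for c in confidences) / len(confidences)

def share_without_negatives(dialogues):
    """Share of dialogues whose negative ratio is exactly 0 — the 0% bin
    of Fig.1; the other bins are built the same way per ratio interval."""
    ratios = [negative_ratio(d) for d in dialogues]
    return sum(r == 0 for r in ratios) / len(ratios)

# Toy set of four "satisfied" dialogues.
satisfied = [[-0.3, -0.1], [-0.5, 0.2], [-0.2, -0.4], [-0.9, -0.1]]
print(share_without_negatives(satisfied))  # → 0.75
```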

Fig.1 illustrates that negative emotions are distributed differently between satisfaction and dissatisfaction recordings: Fig.1a shows the negative emotion distribution among all the satisfaction recordings, while Fig.1b shows it among all the dissatisfaction recordings. From Fig.1 we can see that only 22% of the satisfaction recordings contain negative emotions, but all of the dissatisfaction recordings do. So it is effective to analyze customer satisfaction by recognizing emotions. However, it is not sufficient to consider only the ratio of negative emotions, because some satisfaction recordings also contain them. The position and duration of the negative emotions in a recording therefore need to be considered as well.

2 Features

2.1 Acoustic features

We employ openSMILE[11-12], a feature extraction toolkit for speech, to extract 384 features with a predefined configuration file[13]. The details are exhibited in Tab.1. The low-level acoustic features are extracted on a frame level. These low-level descriptors (LLD) and their delta coefficients are projected onto 12 statistical functionals, giving 16×2×12 = 384 features in total.

Tab.1 Details of 384 features

LLD (16×2)        Functionals (12)
(Δ) ZCR           Mean
(Δ) RMS energy    Standard deviation
(Δ) F0            Kurtosis, skewness
(Δ) HNR           Extremes: value, rel. position, range
(Δ) MFCC 1-12     Linear regression: offset, slope, MSE
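The projection of a frame-level LLD contour onto statistical functionals can be sketched as follows. Only a subset of the 12 functionals from Tab.1 is shown, and the F0 contour is a toy example rather than real extractor output.

```python
import math

def functionals(series):
    """Apply a few of the statistical functionals (mean, standard
    deviation, extremes and their relative positions) to one LLD contour."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    mx, mn = max(series), min(series)
    return {
        "mean": mean,
        "std": std,
        "max": mx,
        "min": mn,
        "range": mx - mn,
        "max_pos": series.index(mx) / (n - 1),  # relative position in [0, 1]
        "min_pos": series.index(mn) / (n - 1),
    }

# Toy F0 contour; a real front end yields 16 LLDs plus deltas per frame,
# each mapped through 12 functionals: 16 x 2 x 12 = 384 features.
f0 = [100.0, 120.0, 110.0, 130.0, 90.0]
stats = functionals(f0)
print(round(stats["mean"], 1), stats["range"])  # → 110.0 40.0
```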

2.2 Global features of emotion and duration

Global features are extracted based on the emotion confidence, which is the result of the emotion recognition and reflects the intensity of the emotion expression: the larger the absolute value of the emotion confidence, the more obviously the emotion is expressed. According to annotation experiments and data statistics, we find that the position of the customers’ negative emotion influences the satisfaction degree of the dialogue; roughly, the later it occurs, the more important it is. So the statistical features contain not only the information of emotion intensity, but also the information of emotion position. The dialogue is divided into beginning, middle and ending parts according to its duration and the number of fragments, and the negative emotion rate and intensity are calculated for each part. Between satisfied and unsatisfied dialogues, the durations of customer and agent speech also differ greatly; generally speaking, the customer’s duration is longer than the agent’s in an unsatisfied dialogue. So 13 rhythm features are added which contain information about the customers’ and agents’ duration and interaction. There are 54 global features in total. The details are shown in Tab.2.

Tab.2 Details of 54 features

Global feature (7): emotional confidence’s maximum, minimum, range, position of the maximum, position of the minimum, slope, zero-crossing rate

Positive emotion related features (6); negative emotion related features (6): emotional confidence’s variance, confidence’s mean, total duration, total turns, total duration × confidence, total confidence value

Duration (2): positive emotion duration/total duration; negative emotion duration/total duration

Turn (2): positive emotion turns/total turns; negative emotion turns/total turns

Trisect speech according to duration (9); trisect speech according to segments (9): negative emotion ratio (3), total negative emotion confidence (3), total negative emotion duration × confidence (3)

Halve the speech according to duration (13): duration of the second half of the conversation; total duration ratio of agent segments in the second half; total duration ratio of silence segments in the second half; average duration of agent segments in the second half; average duration of customer segments in the second half; average duration of silence segments in the second half; amount ratio of agent segments in the second half; amount ratio of silence segments in the second half; average duration of all segments in the second half; total duration ratio of agent segments to customer segments in the second half; average duration ratio of agent segments to customer segments in the second half; number of breaks by the customer after a long silence segment; average speed ratio of customer segments in the first half to those in the second half
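The trisection features from Tab.2 can be sketched as below. Fragments are represented as (confidence, duration) pairs with positive confidence meaning negative emotion; the splitting by fragment count (rather than by duration) and the toy values are illustrative assumptions.

```python
def trisect_features(fragments):
    """Per-third negative-emotion statistics: ratio, total confidence,
    and total duration x confidence for the beginning, middle and end."""
    n = len(fragments)
    thirds = [fragments[:n // 3],
              fragments[n // 3: 2 * n // 3],
              fragments[2 * n // 3:]]
    feats = {}
    for name, part in zip(("begin", "middle", "end"), thirds):
        neg = [(c, d) for c, d in part if c > 0]
        feats[f"{name}_neg_ratio"] = len(neg) / len(part) if part else 0.0
        feats[f"{name}_neg_conf"] = sum(c for c, _ in neg)
        feats[f"{name}_neg_dur_conf"] = sum(c * d for c, d in neg)
    return feats

# Toy dialogue: negative emotion concentrated at the end, which the
# text argues is the most important position.
frags = [(-0.2, 2.0), (-0.1, 1.5), (0.4, 3.0),
         (-0.3, 2.0), (0.8, 4.0), (0.6, 2.5)]
f = trisect_features(frags)
print(f["end_neg_ratio"])  # → 1.0
```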

3 Baseline System and Proposed System

3.1 Baseline system

The baseline system assumes that the customer’s satisfaction is constant during the dialogue. It extracts only the 384 acoustic features from the customers’ voice, without considering the global features of emotion and duration, to analyze the customer satisfaction. The basic framework is shown in Fig.2.

Fig.2 Overview of baseline system

3.2 Proposed system


Fig.3 Overview of the proposed system


4 Experiments and Results

An SVM[14-15] classifier with a radial basis function kernel is used for both the baseline and the proposed method. The optimal cost parameter C and kernel parameter g are obtained by a 5-fold cross-validation approach. The performance of the system is measured by the F value, defined as the harmonic mean of precision (P) and recall (R): F = 2PR/(P + R).
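The F value used throughout the experiments is a one-line computation. As a consistency check, reading Tab.3’s columns as F, P, R (an interpretation, since the table layout is garbled in this copy), the reported precision and recall reproduce the reported F:

```python
def f_value(p, r):
    """F value: harmonic mean of precision P and recall R."""
    return 2 * p * r / (p + r)

# With P = 0.76 and R = 0.88 from Tab.3, F rounds to 0.82.
print(round(f_value(0.76, 0.88), 2))  # → 0.82
```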

To evaluate the performance of the proposed system, we compare it with the baseline, which extracts acoustic features from the whole utterance to analyze the customer satisfaction without considering the customer emotion. During training, we assign five ratios of dissatisfied to satisfied recordings: 1∶1, 1∶2, 1∶3, 1∶4 and 1∶5. To ensure the robustness and practicability of the system, we use the recordings without manual correction as the training and test sets. The final sizes of the training and testing sets are shown in Tab.4. Five sets of experiments are conducted for comparison; Tab.5 shows the results in detail.


Tab.3 Results of emotion recognition

F       P       R
0.82    0.76    0.88

P— precision; R— recall; F— harmonic mean of precision and recall

To obtain the emotion confidence, the SVM is utilized to classify the customers’ emotions into negative and positive. The emotion confidence is the signed distance between a sample point and the hyperplane in the SVM. When the emotion confidence is greater than zero, the corresponding sample is recognized as negative emotion; otherwise it is recognized as positive emotion. Typical emotional fragments are used to validate the emotion model: the training set contains 3 835 negative and 3 953 positive fragments, and the testing set consists of 1 643 negative and 1 694 positive fragments. The results are shown in Tab.3.
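The signed-distance confidence can be illustrated directly from the hyperplane equation w·x + b = 0; the 2-D weights below are a hypothetical hyperplane, not the trained model.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from sample x to the hyperplane w.x + b = 0;
    this quantity plays the role of the emotion confidence."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def classify(conf):
    """Confidence > 0 -> negative emotion, otherwise positive."""
    return "negative" if conf > 0 else "positive"

w, b = [3.0, 4.0], -5.0  # hypothetical hyperplane with |w| = 5
print(classify(signed_distance(w, b, [3.0, 1.0])))  # → negative
```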

Tab.4 Size of training set and testing set

Dataset         Unsatisfied recordings    Satisfied recordings
Testing set     334                       334
Training set    836                       836n

n— the factor which controls the ratio of unsatisfied to satisfied recordings in the training set.
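Constructing the 1∶n training sets of Tab.4 amounts to keeping all 836 unsatisfied recordings and sampling 836·n satisfied ones. A minimal sketch, assuming simple random sampling (the paper does not state how the satisfied subset is chosen):

```python
import random

def build_training_set(unsat, sat, n, seed=0):
    """Keep all unsatisfied recordings and sample len(unsat)*n satisfied
    ones, yielding the 1:n class ratio used in Tab.4 and Tab.5."""
    rng = random.Random(seed)
    return unsat + rng.sample(sat, len(unsat) * n)

unsat = [("dis", i) for i in range(836)]
sat = [("sat", i) for i in range(4180)]
train = build_training_set(unsat, sat, 3)
print(len(train))  # → 3344 (836 + 836*3)
```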

Tab.5 Satisfaction analysis results

System      1∶1      1∶2      1∶3      1∶4      1∶5      avg
Baseline    0.669    0.672    0.681    0.655    0.643    0.664
Proposed    0.701    0.702    0.710    0.698    0.695    0.701

5 Conclusion

Tab.5 shows that the proposed system has a better performance than the baseline system. The average F value is improved from 0.664 to 0.701, an increase of 5.57%. The baseline assumes that the customer’s attitude does not vary, and only acoustic features are used to analyze the satisfaction. But in a real conversation between a customer and an agent, the interaction happens more than once, and customers can utter multiple sentences. One difficulty in analyzing customers’ satisfaction is its ambiguity: not all of the sentences exhibit the characteristics of satisfaction or dissatisfaction. However, almost all the dissatisfaction recordings contain negative emotions. So the proposed system, which combines the local acoustic features and the global features of emotion and duration, can improve the effectiveness of customer satisfaction analysis.

The whole process of segmentation has two steps: automatic segmentation and manual correction. Each fragment is labeled with one of four labels: agent voice (A), customer voice (B), silence & noise (S) and overlap (AB); overlapped fragments contain more than one speaker. The automatic segmentations are produced by a commercial ASR engine, and the segmentation points are then corrected manually and labeled with the speaker tag.

Our system analyzes the customer satisfaction based on local acoustic features and global features of emotion and duration. The system consists of two steps: local emotion recognition and global customer satisfaction analysis. In the first step, we detect the customer’s emotions at the customer-fragment level using the acoustic features. In the second step, we aggregate the results of the first step at the whole-utterance level to analyze the customer satisfaction. Fig.3 shows the diagram of the proposed system.

In summary, a method is proposed to analyze customer satisfaction using acoustic features and global features of emotion and duration. The acoustic features are used to recognize the customer’s emotion on customer fragments. Then, global features of emotion and duration are extracted based on the emotion recognition results and used to conduct the satisfaction analysis. The global features contain not only the intensity of the customers’ emotions, but also their position and duration. Experiments show that this method improves the F value by 5.57%.




In future work, we will pay attention to two aspects. Firstly, we will try to classify the customers’ emotions into multiple classes to analyze them in more detail. Secondly, we will shorten the unit of emotion recognition: in the current experiments, each customer turn is used as a unit no matter how long it is, so we would like to segment customers’ turns into units of suitable length for emotion analysis.

References:

[1] Hsu H H, Chen T C, Chan W T, et al. Performance evaluation of call center agents by neural networks[C]∥2016 30th International Conference on Advanced Information Networking and Applications Workshops (WAINA). Crans-Montana, Switzerland: IEEE, 2016: 964-968.

[2] Burkhardt F, Schuller B, Weiss B, et al. “Would you buy a car from me?”-On the likability of telephone voices[C]∥INTERSPEECH, Florence, Italy, 2011.

[3] Schuller B, Batliner A, Steidl S, et al. The INTERSPEECH 2011 Speaker state challenge[C]∥Proceedings INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 2011.

[4] Schuller B, Steidl S, Batliner A, et al. The INTERSPEECH 2012 speaker trait challenge[C]∥INTERSPEECH, Portland, Oregon, USA, 2012.

[5] Schuller B, Steidl S, Batliner A, et al. The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism[C]∥INTERSPEECH 2013, Conference of the International Speech Communication Association, 2013.

[6] Schuller B, Steidl S, Batliner A, et al. The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load[C]∥INTERSPEECH, Max Atria, Singapore, 2014.

[7] Froehle C M. Service personnel, technology, and their interaction in influencing customer satisfaction[J]. Decision Sciences, 2006, 37(1): 5-38.

[8] Herzig J, Feigenblat G, Shmueli-Scheuer M, et al. Predicting customer satisfaction in customer support conversations in social media using affective features[C]∥Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, Halifax, Canada, 2016.

[9] Vaudable C, Devillers L. Negative emotions detection as an indicator of dialogs quality in call centers[C]∥Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on IEEE, Kyoto, Japan, 2012.

[10] Xie Xiang, Kuang Jingming. Mandarin digits speech recognition using support vector machines[J]. Journal of Beijing Institute of Technology, 2005, 14(1): 9-12.

[11] Eyben F, Wöllmer M, Schuller B. Opensmile: the munich versatile and fast open-source audio feature extractor[C]∥Proceedings of the 18th ACM international Conference on Multimedia, Firenze, Italy, 2010.

[12] Eyben F, Weninger F, Gross F, et al. Recent developments in openSMILE, the munich open-source multimedia feature extractor[C]∥Proceedings of the 21st ACM international conference on Multimedia, Barcelona, Spain, 2013.

[13] Schuller B, Steidl S, Batliner A. The INTERSPEECH 2009 emotion challenge[C]∥INTERSPEECH, Brighton, United Kingdom, 2009: 312-315.

[14] Zhang Xuegong. Introduction to statistical learning theory and support vector machines[J]. Acta Automatica Sinica, 2000, 26(1): 32-42. (in Chinese)

[15] Chang C C, Lin C J. LIBSVM: A library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2011, 2(3): 27.

Jing Liu, Chaomin Wang, Yingnan Zhang, Pengyu Cong, Liqiang Xu, Zhijie Ren, Jin Hu, Xiang Xie, Junlan Feng, Jingming Kuang
Journal of Beijing Institute of Technology, 2018, Issue 1

