Academic Lecture: SUDMAD: Sequential and Unsupervised Decomposition of a Multi-Author Document Based on a Hidden Markov Model

Posted: 2017-09-10

Time: Tuesday, September 12, 2017, 9:00-10:00

Venue: Lecture Hall 603, Chenggong Building, Cangshan Campus

Speaker: Xiangjian He (何祥健)

Hosts: School of Mathematics and Informatics; Fujian Provincial Key Laboratory of Network Security and Cryptology

Speaker biography: Professor Xiangjian He is the Director of the Computer Vision and Pattern Recognition Laboratory at the Global Big Data Technologies Centre (GBDTC), University of Technology, Sydney (UTS). He is also the Director of the UTS-NPU International Joint Laboratory on Digital Media and Intelligent Networks. He is an IEEE Senior Member and has served as an IEEE Signal Processing Society Student Committee member. He has been awarded the title 'Internationally Registered Technology Specialist' by the International Technology Institute (ITI). Over the years, his research has focused mainly on computer vision, network security, and pattern recognition. He has served in various chair roles at many international conferences, such as ACM MM, MMM, IEEE TrustCom, IEEE CIT, IEEE AVSS, IEEE ICPR and IEEE ICARCV. In recent years, he has published many high-quality papers in IEEE Transactions journals, such as IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Mobile Computing, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Cloud Computing, IEEE Transactions on Reliability, and IEEE Transactions on Consumer Electronics, and in Elsevier journals such as Pattern Recognition, Signal Processing, Neurocomputing, Future Generation Computer Systems, Computer Networks, Computer and System Sciences, and Network and Computer Applications. He has also published papers at premier international conferences and workshops such as ACL, IJCAI, CVPR, ECCV, ACM MM, TrustCom and WACV. He has recently held editorial roles for various international journals, such as Journal of Computer Networks and Computer Applications (Elsevier) and Signal Processing (Elsevier). He is currently an Advisor of HKIE Transactions.
Since 1985, he has been an academic, a visiting professor, an adjunct professor, a postdoctoral researcher or a senior researcher at various universities and institutions, including Xiamen University, China; Shanghai Jiao Tong University, China; Northwestern Polytechnical University, China; University of New England, Australia; University of Georgia, USA; Electronics and Telecommunications Research Institute (ETRI) of Korea; University of Aizu, Japan; Hong Kong Polytechnic University; and Macau University.

Abstract: Decomposing a document written by more than one author into sentences based on authorship is of great significance due to the increasing demand for plagiarism detection, forensic analysis, civil law (e.g., disputed copyright issues) and intelligence issues that involve disputed anonymous documents. Among existing studies on document decomposition, some are limited to specific languages, depend on topics, or are restricted to documents by two authors, and their accuracy leaves considerable room for improvement. In this paper, we consider the contextual correlation hidden among sentences and propose an algorithm for Sequential and Unsupervised Decomposition of a Multi-Author Document (SUDMAD) written in any language, regardless of topic, through the construction of a Hidden Markov Model (HMM) reflecting the authors' writing styles. To build and learn such a model, an unsupervised, statistical approach is first proposed to estimate the initial values of the HMM parameters of a preliminary model. This approach requires no information about the authors or the document's context other than the number of authors who contributed to the document. To further improve performance, a boosted HMM learning procedure is then proposed, in which the initial classification results are used to create labelled training data for learning a more accurate HMM. Moreover, the contextual relationship among sentences is further utilized to refine the classification results. Our proposed approach is empirically evaluated on three benchmark datasets that are widely used for authorship analysis of documents. Comparisons with recent state-of-the-art approaches are also presented to demonstrate the significance of our new ideas and the superior performance of our approach.

