A Multiple Co-occurrence Visualization Method for Literature Feature Items Based on the LDA Topic Model

Authors: Zhai Junwei (翟君伟), Qu Ying (瞿英), Guo Fei (郭菲), Liu Bin (刘滨)

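  Abstract: Bibliometrics is the science of quantitatively analyzing the knowledge units of literature with mathematical and statistical methods in order to reveal the knowledge content inside documents. Co-occurrence network analysis is a visualization method that bibliometric research uses to analyze relationships among the feature items of documents. Depending on how many feature items are analyzed, it is divided into single co-occurrence network analysis and multiple co-occurrence network analysis. Compared with single co-occurrence analysis, multiple co-occurrence analysis adds feature-item dimensions and therefore presents literature knowledge in more depth. The added dimensions, however, increase the number of nodes in the network and make edges overlap and cross far more often, which weakens the bibliometric visualization. For this reason, bibliometric co-occurrence analysis today relies mainly on single co-occurrence, and the visualization quality of multiple co-occurrence network analysis still needs improvement.
  To address the problems of multiple co-occurrence networks (too many nodes, excessive edge density, difficulty in discovering the value of the data, and poor visualization), this study introduces the LDA topic model and adopts a space-partitioning approach that turns the problem of visualizing feature items over the whole space into a set of subspace visualization problems. First, keywords are extracted with the SATI bibliographic record analysis software and weighted by TF-IDF, and the results serve as the experimental data. Second, a topic model is built in Python to cluster the target document collection by topic. Finally, Ucinet is used to run multiple co-occurrence analysis on the documents of each topic subspace, and the subspace results are superimposed and reconstructed to give a structured representation of the multiple co-occurrence visualization system. The results show that, compared with the original multiple co-occurrence visualization method and with equivalent content presented, the improved method based on the LDA topic model shrinks the scale of the analysis system, that is, the number of documents and feature words in each subspace. This lowers the number of nodes and the edge density of the co-occurrence network, gives the visualization a clearer structure, makes the data easier to read, brings out its value, and effectively improves multiple co-occurrence visualization. The improved method can therefore, to some extent, advance research on mining knowledge from multiple combinations of literature elements and raise the quality of empirical bibliometric studies in different fields.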
  Keywords: management metrology; LDA topic model; multiple co-occurrence analysis; Ucinet; visualization
  CLC number: G353.1   Document code: A
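The first step weights the extracted keywords with TF-IDF. The abstract does not give the exact formula or the format in which SATI exports keywords, so the following is only a minimal Python sketch assuming the common definition tf = (occurrences of a keyword in a document) / (keywords in that document) and idf = ln(N / df). The function names `tfidf_weights` and `select_feature_words`, the top-k selection rule, and the toy keyword lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {keyword: tf-idf} dict per document.

    docs: list of keyword lists (hypothetical stand-in for a SATI export).
    tf  = occurrences of the keyword in the document / keywords in the document
    idf = ln(N / df), where df = number of documents containing the keyword
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each keyword counted once per document
    weights = []
    for doc in docs:
        if not doc:  # a record without keywords contributes nothing
            weights.append({})
            continue
        counts = Counter(doc)
        weights.append({
            kw: (c / len(doc)) * math.log(n_docs / df[kw])
            for kw, c in counts.items()
        })
    return weights


def select_feature_words(weights, top_k):
    """Rank keywords by their highest TF-IDF in any document; keep top_k (assumed rule)."""
    best = {}
    for w in weights:
        for kw, value in w.items():
            best[kw] = max(best.get(kw, 0.0), value)
    # Sort by weight descending, then alphabetically so ties are deterministic.
    return sorted(best, key=lambda kw: (-best[kw], kw))[:top_k]


docs = [
    ["LDA", "topic model", "visualization"],
    ["LDA", "co-occurrence"],
    ["co-occurrence", "Ucinet", "visualization"],
]
weights = tfidf_weights(docs)
print(select_feature_words(weights, 3))  # ['Ucinet', 'topic model', 'LDA']
```

A keyword that appears in every document gets idf = 0, so generic terms sink to the bottom of the ranking before the co-occurrence step.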
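The second step clusters the documents by topic. The abstract only says the topic model was built in Python, without naming a library or hyperparameters, so this sketch swaps in a plain collapsed Gibbs sampler for LDA written with the standard library. The values of `alpha`, `beta`, and `n_iter`, the hypothetical helper `partition_by_topic`, and the rule "assign each document to its most probable topic" are assumptions, not the authors' settings. Standard LDA works on integer counts, so here TF-IDF is assumed to choose the vocabulary rather than to supply input weights.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns theta (per-document topic mixtures)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # doc-topic counts
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                             # tokens per topic

    def update(d, k, v, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, k, wid[w], +1)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                update(d, z[d][i], v, -1)  # take the token out of the counts
                # Full conditional p(z_i = k | all other assignments), unnormalized.
                probs = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + n_vocab * beta)
                         for k in topics]
                z[d][i] = rng.choices(topics, weights=probs)[0]
                update(d, z[d][i], v, +1)

    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
            for d, doc in enumerate(docs)]


def partition_by_topic(theta):
    """Assign each document to its dominant topic: {topic: [doc indices]}."""
    subspaces = {}
    for d, row in enumerate(theta):
        k = max(range(len(row)), key=row.__getitem__)
        subspaces.setdefault(k, []).append(d)
    return subspaces


docs = [
    ["LDA", "topic model", "TF-IDF"],
    ["LDA", "topic model", "clustering"],
    ["Ucinet", "co-occurrence", "network"],
    ["network", "co-occurrence", "visualization"],
]
theta = lda_gibbs(docs, n_topics=2)
print(partition_by_topic(theta))
```

In practice a library implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would replace the hand-written sampler; the point here is only how the topic mixtures turn one document set into disjoint subspaces.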

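The final step runs multiple co-occurrence analysis inside each topic subspace and then superimposes the results. Ucinet is a desktop tool, so this sketch only builds the weighted co-occurrence edge lists that would be exported to it. It assumes that "multiple co-occurrence" means joint co-occurrence of several feature-item types (here keywords `kw` and authors `au`); the function `cooccurrence_network`, the records, and the topic partition are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(records, doc_ids):
    """Multi-type co-occurrence network over the given documents.

    records: one dict per document mapping a feature type to its items,
             e.g. {"kw": [...], "au": [...]} (hypothetical field names).
    Nodes are typed labels such as "kw:LDA"; an edge joins two nodes that occur
    in the same document and is weighted by the number of such documents.
    """
    nodes, edges = set(), Counter()
    for d in doc_ids:
        labels = sorted({f"{ftype}:{item}"
                         for ftype, items in records[d].items() for item in items})
        nodes.update(labels)
        for a, b in combinations(labels, 2):  # labels are sorted, so (a, b) is canonical
            edges[(a, b)] += 1
    return nodes, edges


records = [
    {"kw": ["LDA", "topic model"], "au": ["A1"]},
    {"kw": ["LDA", "TF-IDF"], "au": ["A1", "A2"]},
    {"kw": ["Ucinet", "network"], "au": ["A3"]},
    {"kw": ["network", "visualization"], "au": ["A3", "A4"]},
]
subspaces = {0: [0, 1], 1: [2, 3]}  # hypothetical LDA partition

global_nodes, global_edges = cooccurrence_network(records, range(len(records)))
print("global:", len(global_nodes), "nodes,", len(global_edges), "edges")  # 10 nodes, 16 edges
merged = Counter()
for topic, doc_ids in subspaces.items():
    nodes, edges = cooccurrence_network(records, doc_ids)
    print(f"topic {topic}:", len(nodes), "nodes,", len(edges), "edges")  # 5 nodes, 8 edges each
    merged.update(edges)  # superimpose the subspace networks
print("superimposed == global:", merged == global_edges)  # True
```

Because each document belongs to exactly one subspace, adding the subspace edge counts back together reproduces the global network edge for edge. This is one way to read the abstract's claim that the content presented stays equivalent while each subspace view is smaller.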
If reposting, please credit the source: https://www.xzbu.com/1/view-15433605.htm

相关文章