Similar literature (20 results)
1.
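Research and implementation of an automatic summarization system based on text clustering and NLU   Total citations: 1, self-citations: 0, citations by others: 1
A method for automatic summarization based on text clustering and natural language understanding (NLU) is proposed. It overcomes the shortcomings of conventional automatic summarization methods and substantially improves summary quality. Introducing text clustering into automatic summarization not only improves the quality of single-document summaries but also enables multi-document summarization, which existing automatic summarization techniques have not addressed.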

2.
To address the problem that the ant colony algorithm stagnates easily and cannot fully search the solution space, a text clustering approach based on the fusion of the ant colony and genetic algorithms is proposed. The four parameters that influence the performance of the ant colony algorithm are encoded as chromosomes, and the fitness function and the selection, crossover, and mutation operators are designed to find the optimal combination of parameters over a number of iterations. The resulting parameters are then applied to text clustering. The simulation results show that, compared with classical k-means clustering and the basic ant colony clustering algorithm, the proposed algorithm performs better, and the F-measure is improved by 5.69%, 48.60%, and 69.60%, respectively, on three test datasets. Therefore, it is more suitable for processing larger datasets. Translated from Journal of Xi’an Jiaotong University (西安交通大学学报), 2007, 41(10): 1146–1150.

3.
Most previous studies have focused on enriching text representation to address the text classification (TC) task. However, conventional classification approaches that apply the vector space model (VSM) to Chinese text study only the words and their relationships in a specific corpus or dataset. They ignore the basic concepts of the categories and the general knowledge behind the words that people learn and use to recognize entities. This paper focuses on enriching text representation and proposes a novel approach that supplements Chinese TC with information from the online Chinese encyclopedia Baidu Baike. The similarities between every text and each category concept, and the most related words from Baidu Baike, are added to the feature space. The performance of the proposed approach is measured on the Fudan University TC corpus, an imbalanced Chinese dataset. In the experiments, the proposed Baidu Baike-based concept similarity approach obtains promising results compared with previous research and the conventional method, with a macro-precision of 90.31%, recall of 75.45%, and F1 score of 80.32%, which are about 0.02%, 0.15%, and 0.12% higher, respectively, than the conventional method. It clearly improves recall for some small categories while keeping precision high and improving the macro F1 score. Moreover, the proposed approach is readily extensible: many other knowledge bases could be integrated and many other concepts could be used to improve effectiveness. © 2016 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.

4.
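Power equipment nameplates contain rich equipment information. Obtaining nameplate information through image-text recognition makes equipment information statistics and ledger verification faster and more efficient, and also helps improve equipment management in power systems. For this special application scenario, in which power equipment nameplates differ considerably from ordinary image text recognition, this paper proposes a deep-learning-based algorithm for automatic recognition of power equipment nameplate information. The algorithm consists of three parts: nameplate detection, text detection, and text recognition. By improving the loss function design, adding correction of text recognition results, and synthesizing text images, the nameplate detection model achieves a mean average precision of 92.2% on the test set, the text detection model achieves an F1 score of 91.2% on the test set, and the text recognition model achieves a character recognition accuracy of 94.0% and a text-line recognition accuracy of 82.3%.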

5.
The resolution of overlapping ambiguity strings (OAS) is studied based on the maximum entropy model. The model has two outputs: either the first two characters form a word or the last two characters form a word. The features of the model include one word in the context of the OAS, the current OAS, and the word probability relation between the two kinds of segmentation result. OAS in the training text are found by combining the FMM and BMM segmentation methods. After feature tagging, they are used to train the maximum entropy model. The People's Daily corpus of January 1998 is used for training and testing. Experimental results show a closed test precision of 98.64% and an open test precision of 95.01%. The open test precision is 3.76% higher than that of the common word probability method. Translated from Transactions of Beijing Institute of Technology, 2005, 25(7): 590–593 (in Chinese).
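The step of locating OAS by combining FMM and BMM can be illustrated with a minimal sketch. It assumes that FMM and BMM denote forward and backward maximum matching against a word list, and that a candidate OAS is any stretch of text where the two segmentations place word boundaries differently. The function names, the `max_len` parameter, and the toy lexicon are illustrative assumptions, not details given in the abstract.

```python
def fmm(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word from the left."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # single characters always match
                words.append(piece)
                i += size
                break
    return words


def bmm(text, lexicon, max_len=4):
    """Backward maximum matching: take the longest lexicon word from the right."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def boundaries(words):
    """Character offsets at which a word ends."""
    ends, pos = set(), 0
    for w in words:
        pos += len(w)
        ends.add(pos)
    return ends


def find_oas(text, lexicon, max_len=4):
    """Return substrings on which FMM and BMM disagree (candidate OAS)."""
    f_ends = boundaries(fmm(text, lexicon, max_len))
    b_ends = boundaries(bmm(text, lexicon, max_len))
    shared = sorted((f_ends & b_ends) | {0})
    spans = []
    for start, end in zip(shared, shared[1:]):
        # Between two shared boundaries, differing inner boundaries mean
        # the two segmentations cut this stretch differently.
        inner_f = {e for e in f_ends if start < e < end}
        inner_b = {e for e in b_ends if start < e < end}
        if inner_f != inner_b:
            spans.append(text[start:end])
    return spans


# Hypothetical toy lexicon: "从小" and "小学" overlap on "小".
print(find_oas("他从小学起", {"从小", "小学"}))
```

In this toy case FMM yields 从小 / 学 and BMM yields 从 / 小学, which matches the two outputs described in the abstract: either the first two or the last two characters form a word.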

6.
The development of document image databases is becoming a challenge for document image retrieval techniques. Traditional layout-reconstruction-based methods rely on high-quality document images and on optical character recognition (OCR) precision, and can only deal with a few widely used languages. The complexity of document layouts greatly hinders layout-analysis-based approaches. This paper describes a multi-density-feature-based algorithm for binary document images that is independent of OCR and layout analysis. The text area is extracted after preprocessing such as skew correction and marginal noise removal. Then the aspect ratio and multi-density features are extracted from the text area to select the best candidates from the document image database. Experimental results show that this approach is simple, has loss rates of less than 3%, and can efficiently analyze images with different resolutions and from different input systems. The system is also robust to noise arising from notes, complex layouts, etc. Translated from Journal of Tsinghua University (Science and Technology) (清华大学学报 (自然科学版)), 2006, 46(7): 1231–1234.

7.
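Load clustering and characteristic analysis of power systems is important for the secure and economic dispatch and operation of the grid, and is an important technical means of improving dispatchers' awareness of the grid. To address the problems that traditional load clustering methods require manually specified load feature indices and cannot account for the temporal characteristics of loads, a load clustering method built on a long short-term memory (LSTM) autoencoder is proposed. Using the temporal memory of the LSTM and the nonlinear feature extraction capability of the autoencoder, the method performs automatic feature extraction and nonlinear dimensionality reduction that take the temporal characteristics of loads into account. The k-means clustering algorithm is then applied to the extracted load features for load clustering analysis. Finally, load data from an actual power supply area are used for validation, and the load characteristics are analyzed in detail. The results show that the proposed method achieves better load clustering than other load feature extraction methods.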

8.
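Application of an LSA-based secondary dimensionality reduction method to the classification of Chinese legal case texts   Total citations: 1, self-citations: 0, citations by others: 1
When text mining is used to represent text features, the very high dimensionality of text makes processing computationally complex, so the text should first undergo dimensionality reduction. Latent semantic analysis (LSA), as a text clustering method, shows many distinctive advantages in effectively extracting textual information and has been used in many fields. This paper builds a classification system for Chinese legal case texts, introduces LSA to perform a secondary dimensionality reduction of the text vector space, and replaces the original matrix with the LSA-processed feature-document matrix, thereby further removing noise and speeding up the classification system. The implementation process and experimental data are given, and experiments show that the method achieves good results.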

9.
A new wavelet-domain analysis method for linear time-varying systems, called system analysis in wavelet domain (SAIWD), is developed and introduced. Linear time-varying systems described by a higher-order differential equation or a state-space representation are analyzed in the wavelet domain. To solve the system equations, they are transferred to the wavelet domain by forming algebraic matrix–vector relations from the wavelet transform coefficients. These relations are obtained by defining operator matrices that correspond to the commonly used time-domain operators. Orthogonal and compactly supported wavelets provide a simple way to define these operator matrices. The solved examples show that the percentage error between the analytical and wavelet-domain solutions is around 1% over all sampling points.

10.
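An unclear relationship between customer electricity meters and meter boxes can have serious consequences, and identifying the meter box to which each customer meter belongs is one of the metering topology identification problems in low-voltage distribution networks. Because the incoming lines of customers in the same meter box share the same distribution box busbar, if the total number N of equivalent single-phase meter boxes is known, the original problem can be converted into an N-class classification problem based on customers' voltage time-series waveforms. After analyzing the shortcomings of this approach in practical engineering applications, a meter box identification method for low-voltage distribution that fuses known phase and address information is proposed. First, a clustering method assigns the phase-separated voltage time-series waveforms to the level of distribution branch boxes. Then, address text information that has been mined and completed is used to build constraints based on adjacency relations, and a constrained k-medoids clustering algorithm partitions customers down to the meter box level. Finally, two low-voltage distribution network cases with typical features verify the effectiveness of the proposed method.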

11.
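The development of power systems places new demands on the use of power system knowledge. To automatically extract knowledge from unstructured power grid dispatching and control texts, this paper proposes a deep learning model based on an attention-based bidirectional long short-term memory network and a conditional random field. The model extracts grid operating rules and grid accident handling procedures from text data such as dispatching regulations. Experimental results show that the proposed model achieves a precision, recall, and F1 score of 91.00%, 89.98%, and 90.49%, respectively, on the corpus, slightly better than three other models. F1 was evaluated separately on the training and test sets, and the difference in recognition accuracy was very small, indicating that no overfitting occurred during training and that the proposed model generalizes well.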
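As a quick consistency check on the figures above, the F1 score can be recomputed from the reported precision and recall. This is a minimal sketch assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name is illustrative, and the abstract does not state how its scores were aggregated.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract, in percent.
reported_precision, reported_recall = 91.00, 89.98
print(round(f1_score(reported_precision, reported_recall), 2))  # prints 90.49
```

Under that assumption, the recomputed value agrees with the 90.49% reported in the abstract.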

12.
The requirements and feasibility of a positioning system using digital television (DTV) broadcasting signals are analyzed. The principle of DTV positioning based on frame synchronization is put forward, and the ranging characteristic that the observables are measured asynchronously within the same epoch interval is studied. Models of the pseudo-range observation and the Doppler carrier phase integral are investigated, and the system observation and state equations are presented on the basis of these models. The simulation results show that DTV positioning technology could remarkably improve the precision of system state estimates using smoothing methods for positioning systems or integrated navigation systems. DTV positioning, with a sub-meter ranging error and meter-level positioning accuracy, can parallel and even serve as a beneficial substitute for traditional positioning technology. Translated from Journal of Southeast University (Natural Science Edition) (东南大学学报(自然科学版)), 2006, 36(5): 690–694.

13.
Direction of arrival (DOA) estimation is one of the key technologies for smart antennas in the direct sequence ultra wideband (DS-UWB) system. Traditional DOA estimation methods based on narrow-band signals are not suitable for such a system. Therefore, a fourth-order cumulant-based DOA estimation method for DS-UWB signals is proposed. The method is built on the frequency-domain model of the DS-UWB array signal. Simulation results show that the algorithm is effective and can guarantee adequate estimation accuracy. Translated from Journal of University of Electronic Science and Technology of China (电子科技大学学报), 2007, 36(2): 190–192.

14.
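Wideband swept-frequency automatic RCS measurement system   Total citations: 1, self-citations: 0, citations by others: 1
Traditional far-field radar cross section (RCS) measurement in a microwave anechoic chamber uses single-frequency testing. For low-RCS targets, single-frequency RCS measurement systems have poor accuracy and yield little information. To overcome these shortcomings, this paper applies the principle and method of wideband swept-frequency RCS measurement and builds an automatic measurement system based on it to obtain higher accuracy. The system uses the time-domain function of a vector network analyzer to transform the target's frequency response into the time domain for analysis, then applies software background-echo cancellation to further reduce background echo interference, and finally obtains the target's RCS. Experiments show that the system measures with high accuracy and gives accurate results, and that the method merits further study and wider adoption.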

15.
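To address the high error rate of speech-transcribed text, this paper proposes a MacBERT-based model that first detects and then corrects errors in transcribed text. In the detection stage, a MacBERT-BiLSTM-CRF model checks whether the text contains errors and where they occur. The correction stage considers two dimensions, confidence and phonetic similarity, and draws a "confidence-phonetic similarity" curve to decide whether a candidate character should be used for correction. The confidence of a candidate character is computed with the MacBERT language model, and a pinyin-code-based method for computing phonetic similarity is proposed. Experiments were run on the public speech dataset Thchs-30 by calling the Baidu speech recognition API. Compared with existing methods, precision, recall, and F1 improved in both the detection and correction stages; correction-stage precision reached 83.32%, improving the correctness of the transcribed text.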

16.
This article studies a third-order trajectory planning method for point-to-point motion. All available instances of third-order trajectory planning are first analyzed. To distinguish them, three criteria based on trajectory characteristics are presented. A fast preprocessing approach that considers the trajectory as a whole is then given, based on these criteria and the system constraints. The time-optimality of the trajectory is also obtained. The relevant formulas are derived by combining the geometrical symmetry of the trajectory with the area method. As a result, an accurate algorithm and its implementation procedure are proposed. The experimental results show the effectiveness and precision of the proposed method. The presented algorithm has been applied successfully in semiconductor manufacturing equipment. Translated from Journal of Huazhong University of Science and Technology (Natural Science Edition) (华中科技大学学报 (自然科学版)), 2007, 35(12): 58–61.

17.
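Knowledge acquisition has for many years been regarded as a bottleneck hindering the development of intelligent systems, especially in the Internet era, when large amounts of information exist as unstructured text. Applying distributed computing ideas, this paper designs an automatic knowledge acquisition system based on a large-scale Internet corpus. Weakly supervised machine learning is used to mine and acquire information automatically, providing functions such as automatic learning and mining of knowledge, new-word dictionary discovery, entity relation template extraction, and named entity recognition. The system was tested on two applications, discovery of out-of-vocabulary new words and place name recognition; using N-gram and mutual information (PMI) methods, it achieved accuracies of 72.1% and 87.28%, respectively.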
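The abstract does not describe how the N-gram and PMI methods are combined, so the following is only a minimal sketch of one common way to score new-word candidates. It assumes character bigrams as candidates and pointwise mutual information estimated from raw counts. The function name, the `min_count` threshold, and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(corpus, min_count=2):
    """Score adjacent character pairs by pointwise mutual information (base 2)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)  # count single characters
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        if count < min_count:  # skip pairs too rare to estimate reliably
            continue
        p_xy = count / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus; pairs that co-occur more often than chance score above 0.
scores = pmi_bigrams(["abab", "abcc"])
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A high PMI means the two characters appear together more often than their individual frequencies would predict, which is why such pairs are plausible candidates for out-of-vocabulary words.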

18.
A back-propagation (BP) neural network is proposed to correct nonlinearity and optimize the force measurement and calibration of an optical tweezer system. Because the BP algorithm converges slowly, the Levenberg-Marquardt (LM) algorithm is used to improve the BP network. The proposed method is studied experimentally for force calibration in a typical optical tweezer system using hydromechanics. The results show that, with nonlinear correction by BP networks, the force measurement range of the optical tweezer system is enlarged by 30% and the precision is also improved compared with the polynomial fitting method. This demonstrates that nonlinear correction by the neural network method effectively improves the performance of optical tweezers without adding to or changing the measuring system. Translated from Optics and Precision Engineering (光学精密工程), 2008, 16(1): 6–10.

19.
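A study of an on-vehicle automatic load weighing device   Total citations: 1, self-citations: 0, citations by others: 1
The load of vehicles and other means of transport is mainly weighed with two kinds of device: mechanical weighbridges, and electronic scales that use sensors as the force-measuring element. Although both offer high measurement accuracy, their range of application is limited and they can only be used at fixed locations. The on-vehicle automatic weighing device described in this paper solves these problems and gives vehicle transport and vehicle management departments an instrument that is simple in structure, easy to operate, and able to measure a vehicle's load automatically anytime and anywhere. The paper focuses on the system structure of the device and on the working principle, design, and calculation methods of the sensing system. It also describes the hardware circuit and software design that use single-chip microcontroller technology and 24-bit Σ-Δ A/D conversion to measure the weighing signal with high precision. The current weighing accuracy is only about 2%, with an allowable eccentric load of 20%.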

20.
Question-answering systems provide short answers using available information. This paper presents an implementation mechanism for a question-answering system based on concepts and statistics. The system determines the question and focuses on the answer types, making different conceptual expansions for different questions. It applies the latent semantic indexing (LSI) method to retrieve relevant passages and uses matching algorithms to find matches between questions and sentences stored in a database. It also extracts answers from a frequently asked questions (FAQ) database by finding matching or similar sentences. The answering ability of the system has been improved by the use of LSI and FAQ. The question-answering system introduced in Chinese universities is a developed and proven system capable of precise results. Translated from Journal of Dalian University of Technology (大连理工大学学报), 2006, 46(2): 280–285.
