Similar Documents
20 similar documents found.
1.
Robust regression plays an important role in many machine learning problems. A primal approach relies on the Huber loss and an iteratively reweighted ℓ2 method. However, because the Huber loss is not smooth and its corresponding distribution cannot be represented as a Gaussian scale mixture, such an approach is extremely difficult to handle within a probabilistic framework. To address these limitations, this paper proposes two novel losses and their corresponding probability functions. One, called Soft Huber, is well suited for modeling non-Gaussian noise; the other, Nonconvex Huber, helps produce much sparser results when imposed as a prior on the regression vector. With tuning parameters they can represent any ℓq loss (1/2 ≤ q < 2), which makes the regression model more robust. We also show that both distributions have an elegant form: a Gaussian scale mixture with a generalized inverse Gaussian mixing density. This enables us to devise an expectation maximization (EM) algorithm for solving the regression model. Through EM we obtain an adaptive weight, which is very useful for removing noisy data or irrelevant features in regression problems. We apply our model to the face recognition problem and show that it not only reduces the impact of noisy pixels but also removes more irrelevant face images. Our experiments demonstrate promising results on two datasets.
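For reference, the primal approach the abstract contrasts against can be sketched as iteratively reweighted least squares with the classical Huber weight. This is a generic baseline, not the paper's Soft/Nonconvex Huber EM algorithm; all parameter values are illustrative:

```python
import numpy as np

def huber_irls(X, y, delta=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted least squares for Huber-loss regression.

    Generic IRLS baseline: residuals larger than `delta` get
    down-weighted by delta/|r|, which bounds the influence of outliers.
    """
    n, d = X.shape
    w = np.ones(n)                      # per-observation weights
    beta = np.zeros(d)
    for _ in range(n_iter):
        Xw = X * w[:, None]             # weighted least-squares step
        beta = np.linalg.solve(Xw.T @ X + eps * np.eye(d), Xw.T @ y)
        r = y - X @ beta
        # Huber weights: 1 inside the quadratic zone, delta/|r| outside
        w = np.where(np.abs(r) <= delta, 1.0, delta / (np.abs(r) + eps))
    return beta, w

# toy usage: gross outliers on a few samples
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=200)
y[:10] += 20.0                          # corrupted observations
beta_hat, weights = huber_irls(X, y)
print(beta_hat.round(2), weights[:10].round(2))  # outliers get small weights
```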

2.
A k-CNF (conjunctive normal form) formula is a regular (k, s)-CNF formula if every variable occurs s times in the formula, where k ≥ 2 and s > 0 are integers. Regular (3, s)-CNF formulas have some good structural properties, so carrying out a probability analysis of the structure of random formulas of this type is easier than conducting such an analysis for random 3-CNF formulas. Some subclasses of regular (3, s)-CNF formulas also have intractability characteristics that differ from those of random 3-CNF formulas. For this purpose, we propose the strictly d-regular (k, 2s)-CNF formula, a regular (k, 2s)-CNF formula in which d ≥ 0 is an even number and each literal occurs s − d/2 or s + d/2 times (the literals of a variable x are x and ¬x, where x is positive and ¬x is negative). In this paper, we present a new model to generate strictly d-regular random (k, 2s)-CNF formulas, and focus on strictly d-regular random (3, 2s)-CNF formulas. Let F be a strictly d-regular random (3, 2s)-CNF formula such that 2s > d. We show that there exists a real number s0 such that the formula F is unsatisfiable with high probability when s > s0, and we present a numerical solution for s0. The result is supported by simulated experiments and is consistent with the existing conclusion for the case d = 0. Furthermore, we conjecture that, for a given d, the strictly d-regular random (3, 2s)-SAT problem has a SAT-UNSAT (satisfiable-unsatisfiable) phase transition; our experiments support this conjecture. Finally, our experiments also show that the parameter d is correlated with the intractability of the 3-SAT problem. Our research may therefore be helpful for generating random hard instances of 3-CNF formulas.
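A minimal configuration-model-style generator illustrates the degree constraints (each literal occurring s − d/2 or s + d/2 times). This is an illustrative sketch, not the paper's exact generation model:

```python
import random

def strictly_d_regular_3cnf(n_vars, s, d, seed=0):
    """Toy generator for a strictly d-regular (3, 2s)-CNF formula.

    Each variable occurs 2s times; one of its two literals is chosen to
    occur s + d/2 times and the other s - d/2 times (d even, d < 2s).
    The literal pool is shuffled into clauses of size 3, retrying when a
    clause repeats a variable -- a simplification of the paper's model.
    """
    assert d % 2 == 0 and d < 2 * s and (2 * s * n_vars) % 3 == 0
    rng = random.Random(seed)
    pool = []
    for v in range(1, n_vars + 1):
        hi, lo = s + d // 2, s - d // 2
        if rng.random() < 0.5:
            hi, lo = lo, hi
        pool += [v] * hi + [-v] * lo       # -v encodes the negative literal
    while True:                             # retry until no clause repeats a variable
        rng.shuffle(pool)
        clauses = [pool[i:i + 3] for i in range(0, len(pool), 3)]
        if all(len({abs(l) for l in c}) == 3 for c in clauses):
            return clauses

print(strictly_d_regular_3cnf(n_vars=6, s=2, d=2)[:4])
```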

3.
王梅  许传海  刘勇 《计算机应用》2021,41(12):3462-3467
Multiple kernel learning (MKL) is an important class of kernel methods, but most existing MKL methods suffer from two problems: the base kernels are mostly traditional kernel functions with shallow structures, whose representation ability is weak on large-scale data with uneven distributions; and the generalization error convergence rate of most existing MKL methods is O(1/√n), which is slow. To address this, an MKL method based on the neural tangent kernel (NTK) was proposed. First, the NTK, which has a deep hierarchical structure, was used as the base kernel to strengthen the representation ability of the MKL method. Then, a generalization error bound with a convergence rate of O(1/n) was proved via the principal eigenvalue proportion measure, and on this basis a new MKL algorithm was designed by combining a kernel alignment measure. Finally, experiments on several datasets show that, compared with classification algorithms such as AdaBoost and K-nearest neighbors (KNN), the proposed MKL algorithm achieves higher accuracy and better representation ability, verifying the feasibility and effectiveness of the proposed method.
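The kernel alignment measure mentioned above is standard and easy to compute. Below is a sketch of centered kernel alignment used to weight base kernels, a common alignment-based MKL heuristic shown only to illustrate the measure (not the paper's exact algorithm; the RBF base kernels stand in for NTKs):

```python
import numpy as np

def centered_alignment(K1, K2):
    """Centered kernel alignment A(K1, K2) = <K1c, K2c>_F / (||K1c|| ||K2c||)."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def alignment_weights(kernels, y):
    """Weight each base kernel by its alignment with the label kernel y y^T."""
    Ky = np.outer(y, y).astype(float)
    a = np.array([max(centered_alignment(K, Ky), 0.0) for K in kernels])
    return a / a.sum()

# toy usage with two RBF base kernels of different widths
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = np.sign(X[:, 0])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [np.exp(-d2 / (2 * s ** 2)) for s in (0.5, 2.0)]
w = alignment_weights(kernels, y)
K_combined = sum(wi * Ki for wi, Ki in zip(w, kernels))
print(w.round(3), K_combined.shape)
```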

4.
Dimensionality reduction (DR) methods based on sparse representation, as one of the hottest research topics, have achieved remarkable performance in many applications in recent years. However, existing sparse representation based methods struggle with nonlinear problems because they seek sparse representations of the data in the original space. Motivated by kernel tricks, we propose a new framework called empirical kernel sparse representation (EKSR) to solve nonlinear problems. In this framework, nonlinearly separable data are mapped into a kernel space in which the nonlinear similarity can be captured; the data in the kernel space are then reconstructed by sparse representation to preserve the sparse structure, which is obtained by minimizing an ℓ1 regularization-related objective function. EKSR provides new insights into dimensionality reduction and extends two models: 1) empirical kernel sparsity preserving projection (EKSPP), a feature extraction method based on sparsity preserving projection (SPP); and 2) empirical kernel sparsity score (EKSS), a feature selection method based on sparsity score (SS). Both methods can choose neighborhoods automatically thanks to the natural discriminative power of sparse representation. Compared with several existing approaches, the proposed framework reduces computational complexity and is more convenient in practice.
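The two ingredients of EKSR, an empirical kernel map followed by ℓ1 sparse coding, can be sketched as follows. This is an assumed minimal reading of the framework (the RBF kernel, the Lasso solver, and all parameters are illustrative choices, not the paper's):

```python
import numpy as np
from sklearn.linear_model import Lasso

def empirical_kernel_map(X, rbf_sigma=1.0):
    """Empirical kernel map: factor K = U L U^T and embed the training
    samples as rows of K U L^{-1/2}, so inner products reproduce K."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * rbf_sigma ** 2))
    lam, U = np.linalg.eigh(K)
    keep = lam > 1e-10
    return K @ U[:, keep] / np.sqrt(lam[keep])   # n x r embedded data

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Phi = empirical_kernel_map(X)

# sparse-code sample i over the remaining samples (l1-regularized)
i = 0
D = np.delete(Phi, i, axis=0).T                  # dictionary: other samples
coef = Lasso(alpha=0.05, max_iter=5000).fit(D, Phi[i]).coef_
print(np.count_nonzero(coef), "nonzero reconstruction weights")
```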

5.
To address the limited number of attacked rounds and the high complexity of existing attacks on the Blow-CAST-Fish algorithm, a key-recovery attack on Blow-CAST-Fish based on a difference table was proposed. First, the collision properties of the S-boxes were analyzed, and 6-round and 12-round differential characteristics were constructed based on collisions of two S-boxes and of a single S-box, respectively. Then, the difference table of the round function f3 was computed, and the specific differential characteristic was extended by three rounds to establish the relation between the ciphertext difference and the input and output differences of f3. Finally, plaintexts satisfying the required conditions were selected and encrypted; the input and output differences of f3 were computed from the ciphertext differences, and the difference table was searched for matching input/output pairs to recover the subkey. In the two-S-box collision case, the proposed attack achieves a differential attack on 9 rounds of Blow-CAST-Fish, one round more than the comparison attack, while the time complexity drops from 2^107.9 to 2^74. In the single-S-box collision case, the proposed attack achieves a differential attack on 15 rounds; compared with the comparison attack, the number of attacked rounds decreases by one, but the weak-key fraction increases from 2^-52.4 to 2^-42 and the data complexity drops from 2^54 to 2^47. The test results show that, on the basis of the same differential characteristic, the difference-table-based attack is more efficient.
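The difference table central to this attack is the textbook difference distribution table (DDT) of differential cryptanalysis. A sketch on a toy 4-bit S-box (not Blow-CAST-Fish's round function f3, whose table is far larger):

```python
def difference_distribution_table(sbox, n_in):
    """Difference distribution table of an S-box: DDT[a][b] counts the
    inputs x with S(x ^ a) ^ S(x) == b. High entries mark the
    high-probability differentials an attacker exploits."""
    size = 1 << n_in
    ddt = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            b = sbox[x ^ a] ^ sbox[x]
            ddt[a][b] += 1
    return ddt

# toy 4-bit S-box (the PRESENT cipher's S-box values)
SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]
ddt = difference_distribution_table(SBOX, 4)
# list differentials (a, b) with count > 2, excluding the trivial a = 0
print([(a, b, c) for a in range(1, 16) for b, c in enumerate(ddt[a]) if c > 2])
```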

6.
高闯  唐冕  赵亮 《计算机应用》2021,41(12):3702-3706
To address the poor performance of existing epitope prediction methods on overlapping epitopes in antigens, a model applying an overlapping subgraph discovery algorithm based on a local metric (L-Metric) to epitope prediction was proposed. First, an atom graph was built from the surface atoms of the antigen and lifted to an amino-acid residue graph. Then, an information-flow-based graph partitioning algorithm divided the residue graph into non-overlapping seed subgraphs, which were expanded by the L-Metric-based overlapping subgraph discovery algorithm to obtain overlapping subgraphs. Finally, a classification model built from a graph convolutional network (GCN) and a fully connected network (FCN) classified the expanded subgraphs into epitopes and non-epitopes. Experimental results show that the F1 score of the proposed model on the same dataset is 267.3%, 57.0%, 65.4%, and 3.5% higher than those of the existing epitope prediction models DiscoTope 2, ElliPro, EpiPred, and Glep, respectively. Ablation results further show that the proposed overlapping subgraph discovery algorithm effectively improves prediction: the model using it achieves a 19.2% higher F1 score than the model without it.
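The residue graph that the pipeline partitions can be sketched with a standard distance-cutoff contact-graph construction (the paper builds an atom graph first and lifts it to residues; the cutoff value and random coordinates here are illustrative):

```python
import numpy as np

def residue_graph(coords, cutoff=8.0):
    """Connect two residues if their centroids are closer than `cutoff`
    angstroms -- a common contact-graph construction for proteins."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (d < cutoff) & ~np.eye(len(coords), dtype=bool)
    edges = [(i, j) for i in range(len(coords))
             for j in range(i + 1, len(coords)) if adj[i, j]]
    return adj, edges

rng = np.random.default_rng(2)
coords = rng.uniform(0, 30, size=(40, 3))      # fake residue centroids
adj, edges = residue_graph(coords)
print(len(edges), "contacts among", len(coords), "residues")
```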

7.
In this paper, we consider the k-prize-collecting minimum vertex cover problem with submodular penalties, which generalizes the well-known minimum vertex cover problem, the minimum partial vertex cover problem, and the minimum vertex cover problem with submodular penalties. We are given a cost graph G = (V, E; c) and an integer k. This problem determines a vertex set S ⊆ V such that S covers at least k edges. The objective is to minimize the total cost of the vertices in S plus the penalty of the uncovered edge set, where the penalty is determined by a submodular function. We design a two-phase combinatorial algorithm based on the guessing technique and the primal-dual framework to address the problem. When the submodular penalty cost function is normalized and nondecreasing, the proposed algorithm has an approximation factor of 3. When the submodular penalty cost function is linear, the approximation factor of the proposed algorithm is reduced to 2, which is the best possible factor if the unique games conjecture holds.
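The primal-dual framework the paper builds on is easiest to see in its classical form for full minimum vertex cover. The sketch below shows only that building block, without the guessing phase, partial coverage, or submodular penalties:

```python
def primal_dual_vertex_cover(n, edges, cost):
    """Classical primal-dual 2-approximation for minimum vertex cover:
    raise each uncovered edge's dual variable until some endpoint's cost
    is fully paid, then add that endpoint to the cover."""
    paid = [0.0] * n                       # dual value accumulated per vertex
    tight = [False] * n                    # vertex already in the cover?
    cover = set()
    for (u, v) in edges:
        if tight[u] or tight[v]:
            continue                       # edge already covered
        y = min(cost[u] - paid[u], cost[v] - paid[v])  # raise this edge's dual
        paid[u] += y
        paid[v] += y
        for w in (u, v):
            if cost[w] - paid[w] <= 1e-12:
                tight[w] = True
                cover.add(w)
    return cover

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
print(primal_dual_vertex_cover(4, edges, cost=[3.0, 1.0, 2.0, 1.0]))  # {1, 3}
```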

8.
Sparse representation has been widely used in signal processing, pattern recognition, computer vision, and related fields, and excellent achievements have been made in both theoretical research and practical applications. However, its application to classification has two limitations: sufficient training samples are required for each class, and the samples should be uncorrupted. To alleviate these problems, a sparse and dense hybrid representation (SDR) framework has been proposed, in which the training dictionary is decomposed into a class-specific dictionary and a non-class-specific dictionary. SDR imposes an ℓ1 constraint on the coefficients of the class-specific dictionary. Nevertheless, it over-emphasizes sparsity and overlooks the correlation information in the class-specific dictionary, which may lead to poor classification results. To overcome this disadvantage, an adaptive sparse and dense hybrid representation with nonconvex optimization (ASDR-NO) is proposed in this paper. Unlike general approaches, the trace norm is adopted for the class-specific dictionary; this makes the dictionary structure adaptive and improves the representation ability of the dictionary. Meanwhile, a nonconvex surrogate is used to approximate the rank function in the dictionary decomposition, avoiding a suboptimal solution of the original rank minimization; the resulting problem can be solved by an iteratively reweighted nuclear norm (IRNN) algorithm. Extensive experiments on benchmark datasets verify the effectiveness and advancement of the proposed algorithm compared with state-of-the-art sparse representation methods.
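The core IRNN update is a weighted singular value thresholding step. A sketch of that generic step (with an assumed log-det surrogate; this is not the full ASDR-NO solver):

```python
import numpy as np

def irnn_step(M, weights_fn, tau=2.0):
    """One iteratively reweighted nuclear norm step: soft-threshold each
    singular value by a weight taken from the gradient of a nonconvex
    surrogate of the rank function, so large singular values are barely
    shrunk while small (noise) ones are driven to zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    w = weights_fn(s)                        # larger sigma -> smaller weight
    s_shrunk = np.maximum(s - tau * w, 0.0)  # weighted soft threshold
    return U @ np.diag(s_shrunk) @ Vt

# log-det surrogate log(x + eps): weight 1/(sigma + eps)
log_det_weights = lambda s, eps=1e-2: 1.0 / (s + eps)

rng = np.random.default_rng(0)
L = rng.normal(size=(30, 5)) @ rng.normal(size=(5, 30))  # rank-5 signal
M = L + 0.1 * rng.normal(size=(30, 30))                  # noisy observation
X = irnn_step(M, log_det_weights)
print(np.linalg.matrix_rank(X, tol=1e-6))  # collapses toward the true low rank
```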

9.
Traditional clustering methods operate in the original data space, and the data to be clustered are often high-dimensional. To address these two problems, a new binary image clustering method, clustering based on discrete hashing (CDH), was proposed. The framework performs adaptive feature selection via the L2,1-norm to reduce the data dimensionality, and maps the data into a binary Hamming space by hashing; a low-rank matrix factorization of the sparse binary matrix is then performed in the Hamming space to achieve fast image clustering...

10.
To address the poor steady-state performance, and even divergence, of traditional sparse adaptive filtering under impulsive noise, and to improve the accuracy of sparse parameter identification without adding excessive computational cost, a sparse adaptive filtering algorithm based on the generalized maximum Versoria criterion (GMVC) was proposed: GMVC with a correntropy-induced-metric constraint (CIMGMVC). First, the generalized Versoria function, which contains the reciprocal of the p-th order moment of the error, is used as the learning criterion; when an impulsive disturbance makes the error very large, the GMVC term approaches 0, thereby suppressing the impulsive noise. Second, the correntropy-induced metric (CIM) is combined with the GMVC as a sparsity-penalty constraint to build a new cost function; the CIM is based on a Gaussian probability density function and can approach the l0-norm arbitrarily closely when a suitable kernel width is chosen. Finally, the CIMGMVC algorithm is derived by the gradient method, and its mean-square convergence is analyzed. Simulations on the Matlab platform with impulsive noise generated by an α-stable distribution model show that the proposed CIMGMVC algorithm effectively suppresses non-Gaussian impulsive noise, is more robust than traditional sparse adaptive filters, and achieves a lower steady-state error than the GMVC algorithm.
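A toy re-derivation conveys the two mechanisms: a Versoria-style gain that vanishes for huge errors, and a Gaussian-kernel (CIM-style) zero-attractor on the taps. This is an illustrative sketch under assumed forms of both terms, not the paper's exact update or parameters:

```python
import numpy as np

def cimgmvc_sketch(x, d, n_taps=8, mu=0.05, tau=1.0, p=2.0,
                   rho=1e-3, sigma=0.05):
    """Toy sparse adaptive filter in the spirit of CIMGMVC. Ascends the
    objective 1/(1 + tau*|e|^p), whose gradient vanishes for very large
    errors (impulse suppression), minus rho * CIM(w, 0), whose
    Gaussian-kernel gradient pushes near-zero taps to exactly zero."""
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]   # regressor, most recent first
        e = d[n] - w @ u
        g = (tau * p * np.abs(e) ** (p - 1) * np.sign(e)
             / (1 + tau * np.abs(e) ** p) ** 2)
        cim_grad = (w / sigma ** 2) * np.exp(-w ** 2 / (2 * sigma ** 2))
        w = w + mu * g * u - rho * cim_grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([0.9, 0, 0, -0.4, 0, 0, 0, 0])    # sparse system
x = rng.normal(size=5000)
d = np.convolve(x, w_true)[:len(x)]
d += np.where(rng.random(len(x)) < 0.01,             # 1% large impulses
              50 * rng.normal(size=len(x)), 0.0)
print(cimgmvc_sketch(x, d).round(2))                 # ~[0.9, 0, 0, -0.4, ...]
```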

11.
Function queries are an important operation in big data applications, and query answering has long been a core problem in database theory. To analyze the complexity of answering function queries over big data, the computability of the function query answering problem is first established by a mapping reduction from the function query language to a known decidable language. Next, function queries are expressed in a first-order language, and the complexity of this first-order language is analyzed. On this basis, the class of function queries is reduced, via the NC-factor reduction method, to a known ΠΤQ-complete class. It is thereby proved that, after PTIME (polynomial-time) preprocessing, function query answering can be solved in NC (parallel polylogarithmic) time. From these proofs it follows that the function query answering problem is tractable on big data.

12.
周玉彬  肖红  王涛  姜文超  熊梦  贺忠堂 《计算机应用》2021,41(11):3192-3199
To address the low efficiency and accuracy of detection in the health management of industrial robot mechanical axes, a health indicator (HI) construction method based on the degradation similarity of action cycles, set against the backdrop of large-scale axis monitoring data, was proposed, combined with a long short-term memory (LSTM) network to predict the robot's remaining useful life (RUL) automatically. First, MPdist was used to capture the similarity of sub-cycle sequences between different action cycles of the mechanical axis, and the deviation between normal-cycle and degraded-cycle data was computed to construct the HI. Then, the HI set was used to train an LSTM model establishing the mapping between HI and RUL. Finally, the MPdist-LSTM hybrid model computed the RUL automatically and raised timely alarms. In experiments on a six-axis industrial robot from a company, about 15 million accelerated-aging data records were collected, and the monotonicity, robustness, and trend of the HI, as well as the mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2), error range (ER), early prediction (EP), and late prediction (LP) of RUL prediction, were tested. The proposed method was compared with dynamic time warping (DTW), Euclidean distance (ED), and time-domain eigenvalue (TDE) methods combined with LSTM, and with MPdist combined with a recurrent neural network (RNN) and with LSTM. Experimental results show that, compared with the other methods, the proposed method improves the monotonicity and trend of the constructed HI by at least 0.07 and 0.13, respectively, achieves higher RUL prediction accuracy, and yields a smaller ER, verifying its effectiveness.
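An MPdist-based HI can be sketched with the stumpy library, which implements MPdist: score each action cycle by its distance to a known-healthy reference cycle, so growing values indicate degradation. The signal model and subsequence length below are invented for illustration, and the LSTM mapping from HI to RUL is omitted:

```python
import numpy as np
import stumpy   # pip install stumpy

def mpdist_health_indicator(reference_cycle, cycles, m=32):
    """HI of each action cycle = MPdist to a healthy reference cycle;
    m is the subsequence length compared between the two series."""
    return np.array([stumpy.mpdist(reference_cycle, c, m) for c in cycles])

# fake vibration cycles that drift away from the healthy reference
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 400)
healthy = np.sin(t) + 0.05 * rng.normal(size=t.size)
cycles = [np.sin(t) + 0.05 * k * np.sin(7 * t) + 0.05 * rng.normal(size=t.size)
          for k in range(10)]
hi = mpdist_health_indicator(healthy, cycles)
print(hi.round(3))   # roughly increasing -> usable as a degradation HI
```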

13.
刘帅  蒋林  李远成  山蕊  朱育琳  王欣 《计算机应用》2022,42(5):1524-1530
To address the poor adaptability, high computational complexity, and low efficiency of the minimum mean square error (MMSE) detection algorithm on reconfigurable array architectures in massive multiple-input multiple-output (MIMO) systems, a parallel mapping method for the MMSE algorithm was proposed, based on the reconfigurable array processor developed by the project team. First, exploiting the relatively simple data dependencies of the Gram matrix computation, a pipelined acceleration scheme with high parallelism in both time and space was designed. Second, since the Gram matrix computation and the matched filter computation in the MMSE algorithm are relatively independent, a modular parallel mapping scheme was designed. Finally, the mapping scheme was implemented on a Xilinx Virtex-6 development board and its performance was measured. Experimental results show that the method achieves speedups of 2.80, 4.04, and 5.57 on quadrature phase shift keying (QPSK) uplinks with MIMO scales of 128×4, 128×8, and 128×16, respectively; in the 128×16 massive MIMO system, the reconfigurable array processor consumes 42.6% fewer resources than dedicated hardware.
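The MMSE detector itself is standard, and its two independent sub-computations (Gram matrix and matched filter) are exactly what the modular mapping separates. A minimal numpy sketch with an illustrative 128×8 QPSK setup:

```python
import numpy as np

def mmse_detect(H, y, noise_var):
    """Linear MMSE detection for an uplink y = H x + n:
    x_hat = (H^H H + sigma^2 I)^{-1} H^H y. The Gram matrix G = H^H H
    and the matched filter y_mf = H^H y are independent computations."""
    G = H.conj().T @ H                               # Gram matrix
    y_mf = H.conj().T @ y                            # matched filter
    return np.linalg.solve(G + noise_var * np.eye(G.shape[0]), y_mf)

rng = np.random.default_rng(0)
n_rx, n_tx = 128, 8
H = (rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)
x = (rng.choice([-1, 1], n_tx) + 1j * rng.choice([-1, 1], n_tx)) / np.sqrt(2)
y = H @ x + 0.1 * (rng.normal(size=n_rx) + 1j * rng.normal(size=n_rx))
x_hat = mmse_detect(H, y, noise_var=0.02)
qpsk = (np.sign(x_hat.real) + 1j * np.sign(x_hat.imag)) / np.sqrt(2)
print(np.allclose(qpsk, x))   # True with high probability at this SNR
```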

14.
With the increasing amount of data, there is an urgent need for efficient sorting algorithms to process large data sets. Hardware sorting algorithms have attracted much attention because they can exploit the parallelism of different hardware. But traditional hardware sort accelerators suffer from the "memory wall" problem because of their multiple rounds of data transmission between the memory and the processor. In this paper, we utilize the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can process matrix-vector multiplication and vector-scalar comparison in the same array simultaneously. Using this ReCAM array, we present ReCSA, the first dedicated ReCAM-based sort accelerator. Besides the hardware design, we also develop algorithms that maximize memory utilization and minimize memory exchanges to improve sorting performance. The sorting algorithm in ReCSA can process various data types, such as integer, float, double, and string. We also present experiments evaluating performance and energy efficiency against state-of-the-art sort accelerators. The experimental results show that ReCSA has 90.92×, 46.13×, 27.38×, 84.57×, and 3.36× speedups over CPU-, GPU-, FPGA-, NDP-, and PIM-based platforms when processing numeric data sets. ReCSA also achieves 24.82×, 32.94×, and 18.22× performance improvements over CPU-, GPU-, and FPGA-based platforms when processing string data sets.

15.
At ToSC 2019, Ankele et al. proposed a novel idea for constructing zero-correlation linear distinguishers in a related-tweakey model. This paper further clarifies this principle and gives a search model for zero-correlation distinguishers. As a result, the authors construct, for the first time, 15-round and 17-round zero-correlation linear distinguishers for SKINNY-n-2n and SKINNY-n-3n, respectively, both two rounds longer than Ankele et al.'s. Based on these distinguishers, the paper presents related-tweakey zero-correlation linear attacks on 22-round SKINNY-n-2n and 26-round SKINNY-n-3n, respectively.

16.
The general problem of minimizing the maximal regret in combinatorial optimization problems with interval data is considered. In many cases, the minmax regret versions of classical, polynomially solvable combinatorial optimization problems become NP-hard, and no approximation algorithms for them had been known. Our main result is a polynomial time approximation algorithm with a performance ratio of 2 for this class of problems.
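For problems of this class, the known 2-approximation reduces to solving the classical problem under the midpoint scenario, i.e., with each interval cost replaced by (lo + hi)/2. A sketch for interval-data shortest path using networkx (assuming the midpoint reduction applies; the instance is invented):

```python
import networkx as nx

def midpoint_regret_path(edges, s, t):
    """Midpoint heuristic for interval-data minmax-regret shortest path:
    solve the classical shortest-path problem with costs (lo + hi) / 2.
    This simple scheme is the 2-approximation for minmax regret versions
    of classical polynomially solvable problems."""
    G = nx.DiGraph()
    for u, v, lo, hi in edges:
        G.add_edge(u, v, w=(lo + hi) / 2)
    return nx.shortest_path(G, s, t, weight="w")

# each arc carries an interval cost (lo, hi)
edges = [("s", "a", 1, 5), ("a", "t", 1, 3),
         ("s", "b", 2, 4), ("b", "t", 2, 6)]
print(midpoint_regret_path(edges, "s", "t"))   # ['s', 'a', 't']
```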

17.
Block synchronization is an essential component of blockchain systems. Traditionally, blockchain systems tend to send all the transactions from one node to another for synchronization. However, such a method may lead to extremely high network bandwidth overhead and significant transmission latency. It is crucial to speed up the block synchronization process and save bandwidth. A feasible solution is to reduce the amount of data transmitted between any pair of peers during block synchronization. However, existing methods based on the Bloom filter or its variants still suffer from multiple round trips of communication and significant synchronization delay. In this paper, we propose a novel protocol named Gauze for fast block synchronization. It utilizes the Cuckoo filter (CF) to discern which transactions of the block to be verified are in the receiver's mempool, providing an efficient solution to the set reconciliation problem in the P2P (peer-to-peer) network. By up to two rounds of exchanging and querying the CFs, the sending node learns whether the transactions in a block are contained in the receiver's mempool. Based on this information, the sender only needs to transfer the missing transactions to the receiver, which speeds up block synchronization and saves precious bandwidth resources. The evaluation results show that Gauze outperforms existing methods in terms of average processing latency (about 10× lower than Graphene) and total synchronization space cost (about 10× lower than Compact Blocks) in different scenarios.
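The shape of the exchange can be sketched with short hash fingerprints standing in for the Cuckoo filter (a real CF gives compact membership tests with deletions and different false-positive behavior; this stand-in, the round structure shown, and the toy transaction IDs are simplifications):

```python
import hashlib

def fingerprint(txid, nbytes=4):
    """Short fingerprint of a transaction id (stand-in for a CF entry)."""
    return int.from_bytes(hashlib.sha256(txid.encode()).digest()[:nbytes], "big")

def sync_block(sender_block, receiver_mempool):
    """Gauze-style exchange, sketched: the receiver sends a filter of its
    mempool; the sender then ships only the block transactions the
    filter does not contain, instead of the whole block."""
    mempool_fp = {fingerprint(t) for t in receiver_mempool}  # receiver -> sender
    missing = [t for t in sender_block if fingerprint(t) not in mempool_fp]
    return missing                                           # sender -> receiver

block = [f"tx{i}" for i in range(10)]
mempool = [f"tx{i}" for i in range(8)]      # receiver misses tx8, tx9
print(sync_block(block, mempool))           # -> ['tx8', 'tx9']
```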

18.
On-line transaction processing (OLTP) systems rely on transaction logging and quorum-based consensus protocols to guarantee durability, high availability, and strong consistency, which makes the log manager a key component of distributed database management systems (DDBMSs). The leader of a DDBMS commonly adopts a centralized logging method to write log entries to a stable storage device, and uses a constant log replication strategy to periodically synchronize its state to followers. With the advent of new hardware and the high parallelism of transaction processing, the traditional centralized design of logging limits scalability, and a constant replication trigger condition cannot always maintain optimal performance under dynamic workloads. In this paper, we propose a new log manager named Salmo with scalable logging and adaptive replication for distributed database systems. The scalable logging eliminates centralized contention by utilizing a highly concurrent data structure and speedy log hole tracking. The kernel of adaptive replication is an adaptive log shipping method, which dynamically adjusts the number of log entries transmitted between leader and followers based on the real-time workload. We implemented and evaluated Salmo in the open-sourced transaction processing systems Cedar and DBx1000. Experimental results show that Salmo scales well with an increasing number of working threads, improves peak throughput by 1.56× and reduces latency by more than 4× over the log replication of Raft, and consistently maintains efficient and stable performance under dynamic workloads.
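The flavor of adaptive log shipping can be conveyed with a small control loop that resizes the replication batch from runtime feedback. This is an invented heuristic in the spirit of the idea, not Salmo's actual controller or parameters:

```python
def adaptive_batch(curr_batch, queued, rtt_ms, target_rtt_ms=5.0,
                   lo=16, hi=4096):
    """Toy adaptive log-shipping policy: grow the replication batch when
    the round trip is cheap relative to the backlog, shrink it when the
    follower falls behind the latency target."""
    if rtt_ms < target_rtt_ms and queued > curr_batch:
        return min(hi, curr_batch * 2)     # backlog grows, link is fast
    if rtt_ms > 2 * target_rtt_ms:
        return max(lo, curr_batch // 2)    # follower lagging, back off
    return curr_batch

batch = 64
for queued, rtt in [(500, 2.0), (900, 2.5), (1200, 3.0), (300, 14.0)]:
    batch = adaptive_batch(batch, queued, rtt)
    print(batch)   # 128, 256, 512, 256
```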

19.
This paper describes an extremely fast polynomial time algorithm, NOVCA (Near Optimal Vertex Cover Algorithm), that produces an optimal or near optimal vertex cover for any given undirected graph G(V, E). NOVCA is based on the idea of (1) including the vertex having maximum degree in the vertex cover and (2) rendering the degree of a vertex zero by including all its adjacent vertices. Three versions of the algorithm, NOVCA-I, NOVCA-II, and NOVCA-random, have been developed. Results identifying bounds on the size of the minimum vertex cover, as well as the polynomial complexity of the algorithm, are given with experimental verification. Future research efforts will be directed at tuning the algorithm and providing proof of a better approximation ratio for NOVCA compared with available vertex cover algorithms.
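Rule (1) alone already gives the familiar greedy cover, sketched below; NOVCA additionally applies rule (2), including all neighbors of a vertex to zero its degree, which this sketch omits:

```python
def greedy_max_degree_cover(n, edges):
    """Greedy vertex cover: repeatedly add the vertex of maximum
    remaining degree and delete its incident edges."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cover = set()
    while any(adj.values()):
        v = max(adj, key=lambda x: len(adj[x]))   # max remaining degree
        cover.add(v)
        for u in adj.pop(v):                      # remove v and its edges
            adj[u].discard(v)
    return cover

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
print(greedy_max_degree_cover(5, edges))          # e.g. {0, 1, 3}
```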

20.
This work is concerned with online learning from expert advice. Extensive work on this problem has generated numerous expert advice algorithms whose total loss is provably bounded above in terms of the loss incurred by the best expert in hindsight. Such algorithms were devised for various problem variants corresponding to various loss functions. For some loss functions, such as the square, Hellinger, and entropy losses, optimal algorithms are known. However, for two of the most widely used loss functions, namely the 0/1 and absolute loss, there are still gaps between the known lower and upper bounds. In this paper we present two new expert advice algorithms and prove for them the best known 0/1 and absolute loss bounds. Given an expert advice algorithm ALG, the goal is to form an upper bound on the regret L_ALG − L* of ALG, where L_ALG is the loss of ALG and L* is the loss of the best expert in hindsight. Typically, regret bounds of the canonical form C·√(L* ln N) are sought, where N is the number of experts and C is a constant. So far, the best known constant for the absolute loss function is C = 2.83, which is achieved by the recent IAWM algorithm of Auer et al. (2002). For the 0/1 loss function no bounds of this canonical form are known, and the best known regret bound is C1·√(L* ln N) + C2·ln N, where C1 = e − 2 and C2 = 2. This bound is achieved by a P-norm algorithm of Gentile and Littlestone (1999). Our first algorithm is a randomized extension of the guess-and-double algorithm of Cesa-Bianchi et al. (1997). While the guess-and-double algorithm achieves a canonical regret bound with C = 3.32, the expected regret of our randomized algorithm is canonically bounded with C = 2.49 for the absolute loss function. The algorithm utilizes one random choice at the start of the game. Like the deterministic guess-and-double algorithm, a deficiency of our algorithm is that it occasionally restarts itself and therefore forgets what it learned. Our second algorithm does not forget and enjoys the best known asymptotic performance guarantees for both the absolute and 0/1 loss functions: in the case of the absolute loss, our algorithm is canonically bounded with C approaching √2, and in the case of the 0/1 loss, with C approaching 3/√2. In the 0/1 loss case the algorithm is randomized and the bound is on the expected regret.
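The classical baseline behind all such bounds is the multiplicative-weights (Hedge) forecaster, sketched below; this is the standard algorithm, not the paper's guess-and-double variant, and the loss matrix is synthetic:

```python
import numpy as np

def hedge(expert_losses, eta):
    """Hedge forecaster: expert weights decay exponentially in cumulative
    loss; with a well-tuned eta the regret is O(sqrt(T ln N)) for losses
    in [0, 1]."""
    T, N = expert_losses.shape
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                     # distribution over experts
        total += p @ expert_losses[t]       # expected loss this round
        w *= np.exp(-eta * expert_losses[t])
    return total

rng = np.random.default_rng(0)
T, N = 2000, 10
losses = rng.random((T, N))
losses[:, 3] *= 0.5                         # expert 3 is best in hindsight
L_star = losses.sum(0).min()
L_alg = hedge(losses, eta=np.sqrt(8 * np.log(N) / T))
print(round(L_alg - L_star, 1), "regret; bound ~",
      round(np.sqrt(T * np.log(N) / 2), 1))
```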
