Similar Documents
20 similar documents found (search time: 993 ms).
1.
With the increasing amount of data, there is an urgent need for efficient sorting algorithms to process large data sets. Hardware sorting algorithms have attracted much attention because they can exploit the parallelism of different hardware. However, traditional hardware sort accelerators suffer from the "memory wall" problem because of their multiple rounds of data transmission between memory and the processor. In this paper, we utilize the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can perform matrix-vector multiplication and vector-scalar comparison in the same array simultaneously. Using this ReCAM array, we present ReCSA, the first dedicated ReCAM-based sort accelerator. Besides the hardware design, we also develop algorithms that maximize memory utilization and minimize memory exchanges to improve sorting performance. The sorting algorithm in ReCSA can process various data types, such as integers, floats, doubles, and strings. We also present experiments evaluating performance and energy efficiency against state-of-the-art sort accelerators. The experimental results show that ReCSA achieves 90.92×, 46.13×, 27.38×, 84.57×, and 3.36× speedups over CPU-, GPU-, FPGA-, NDP-, and PIM-based platforms when processing numeric data sets. ReCSA also achieves 24.82×, 32.94×, and 18.22× performance improvements over CPU-, GPU-, and FPGA-based platforms when processing string data sets.

2.
In this paper, we consider the k-prize-collecting minimum vertex cover problem with submodular penalties, which generalizes the well-known minimum vertex cover problem, the minimum partial vertex cover problem, and the minimum vertex cover problem with submodular penalties. We are given a cost graph G=(V, E; c) and an integer k. The problem asks for a vertex set S ⊆ V that covers at least k edges. The objective is to minimize the total cost of the vertices in S plus the penalty of the uncovered edge set, where the penalty is determined by a submodular function. We design a two-phase combinatorial algorithm based on the guessing technique and the primal-dual framework to address the problem. When the submodular penalty cost function is normalized and nondecreasing, the proposed algorithm has an approximation factor of 3. When the submodular penalty cost function is linear, the approximation factor of the proposed algorithm is reduced to 2, which is the best possible factor if the unique games conjecture holds.
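As a concrete illustration of the objective described above (not the paper's two-phase primal-dual algorithm), the Python sketch below evaluates candidate vertex sets on a toy instance with a simple modular penalty; the instance, the unit penalty, and the brute-force search are assumptions made purely for illustration.

```python
from itertools import combinations

def objective(S, vertex_cost, edges, penalty, k):
    """Cost of the vertices in S plus the penalty of the uncovered edges.

    Returns None when S covers fewer than k edges (infeasible).
    """
    uncovered = frozenset(e for e in edges if e[0] not in S and e[1] not in S)
    if len(edges) - len(uncovered) < k:
        return None
    return sum(vertex_cost[v] for v in S) + penalty(uncovered)

def unit_penalty(uncovered):
    # Modular (hence submodular) penalty: one unit per uncovered edge.
    return float(len(uncovered))

edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
cost = {1: 2.0, 2: 1.0, 3: 1.5, 4: 1.0}

feasible = []
for r in range(len(cost) + 1):
    for S in combinations(cost, r):
        val = objective(set(S), cost, edges, unit_penalty, k=3)
        if val is not None:
            feasible.append((val, S))
print(min(feasible))  # brute-force optimum on this tiny instance
```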

3.
On-line transaction processing (OLTP) systems rely on transaction logging and quorum-based consensus protocols to guarantee durability, high availability, and strong consistency. This makes the log manager a key component of distributed database management systems (DDBMSs). The leader of a DDBMS commonly adopts a centralized logging method to write log entries into a stable storage device and uses a constant log replication strategy to periodically synchronize its state to followers. With the advent of new hardware and highly parallel transaction processing, the traditional centralized design of logging limits scalability, and a constant replication trigger condition cannot always maintain optimal performance under dynamic workloads. In this paper, we propose a new log manager named Salmo with scalable logging and adaptive replication for distributed database systems. The scalable logging eliminates centralized contention by utilizing a highly concurrent data structure and speedy log-hole tracking. The kernel of adaptive replication is an adaptive log shipping method, which dynamically adjusts the number of log entries transmitted between leader and followers based on the real-time workload. We implemented and evaluated Salmo in the open-source transaction processing systems Cedar and DBx1000. Experimental results show that Salmo scales well with an increasing number of working threads, improves peak throughput by 1.56× and reduces latency by more than 4× over the log replication of Raft, and consistently maintains efficient and stable performance under dynamic workloads.

4.
At ToSC 2019, Ankele et al. proposed a novel idea for constructing zero-correlation linear distinguishers in a related-tweakey model. This paper further clarifies this principle and gives a search model for zero-correlation distinguishers. As a result, for the first time, the authors construct 15-round and 17-round zero-correlation linear distinguishers for SKINNY-n-2n and SKINNY-n-3n, respectively, both of which are two rounds longer than Ankele et al.'s. Based on these distinguishers, the paper presents related-tweakey zero-correlation linear attacks on 22-round SKINNY-n-2n and 26-round SKINNY-n-3n, respectively.

5.
刘帅  蒋林  李远成  山蕊  朱育琳  王欣 《计算机应用》2022,42(5):1524-1530
To address the poor adaptability, high computational complexity, and low efficiency of the minimum mean square error (MMSE) detection algorithm on reconfigurable array architectures in massive multiple-input multiple-output (MIMO) systems, a parallel mapping method based on the MMSE algorithm was proposed on top of the reconfigurable array processor developed by the project team. First, exploiting the relatively simple data dependencies of the Gram matrix computation, a pipelined acceleration scheme with high parallelism in both time and space was designed. Second, based on the relative independence of the Gram matrix computation and the matched filtering modules in the MMSE algorithm, a modular parallel mapping scheme was designed. Finally, the mapping scheme was implemented on a Xilinx Virtex-6 development board and its performance was measured. Experimental results show that the method achieves speedups of 2.80, 4.04, and 5.57 for quadrature phase shift keying (QPSK) uplinks with MIMO sizes of 128×4, 128×8, and 128×16, respectively; in the 128×16 massive MIMO system, the reconfigurable array processor consumes 42.6% fewer resources than dedicated hardware.
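For reference, the linear MMSE detection that such mappings accelerate computes x_hat = (H^H H + sigma^2 I)^(-1) H^H y from the Gram matrix H^H H and the matched-filter output H^H y. The numpy sketch below evaluates this equation on a hypothetical 128×8 QPSK uplink; it illustrates the arithmetic only and is not the paper's reconfigurable-array mapping.

```python
import numpy as np

def mmse_detect(H, y, noise_var):
    """Linear MMSE estimate x_hat = (H^H H + sigma^2 I)^{-1} H^H y."""
    G = H.conj().T @ H + noise_var * np.eye(H.shape[1])  # regularized Gram matrix
    return np.linalg.solve(G, H.conj().T @ y)            # matched filter, then solve

# Toy 128x8 uplink with QPSK symbols (sizes chosen for illustration).
rng = np.random.default_rng(0)
n_rx, n_tx, noise_var = 128, 8, 0.1
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
x = (rng.choice([-1, 1], n_tx) + 1j * rng.choice([-1, 1], n_tx)) / np.sqrt(2)
y = H @ x + np.sqrt(noise_var / 2) * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))
x_hat = mmse_detect(H, y, noise_var)
```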

6.
Oblivious Cross-Tags (OXT) [1] is the first efficient searchable encryption (SE) protocol for conjunctive queries in a single-writer single-reader framework. However, it trades security for efficiency by leaking partial database information to the server. Recent attacks on SE schemes show that such leakage can be used to recover the content of queried keywords. To address this problem, Lai et al. [2] proposed Hidden Cross-Tags (HXT), which reduces the access pattern leakage from the Keyword Pair Result Pattern (KPRP) to the Whole Result Pattern (WRP). However, the WRP leakage can still be used to recover some additional content of queried keywords. This paper proposes Improved Cross-Tags (IXT), an efficient searchable encryption protocol that achieves access and search pattern hiding based on labeled private set intersection (PSI). We also prove that the proposed labeled PSI protocol is secure against semi-honest adversaries and that IXT is L-semi-honest secure, where L is the leakage function. Finally, we conduct experiments comparing IXT with HXT. The experimental results show that the storage overhead and the client-side computation overhead of the search phase in IXT are much lower than those in HXT. The results also show that IXT is scalable and can be applied to data sets of various sizes.

7.
蒋楚钰  方李西  章宁  朱建明 《计算机应用》2022,42(11):3438-3443
To address the concentration of notary-node functions and the low efficiency of cross-chain transactions in the notary mechanism, a cross-chain interaction security model based on notary groups was proposed. First, notary nodes are divided into three roles, namely transaction verifiers, connectors, and supervisors; the members of the transaction verification group package multiple consensus-confirmed transactions into one large transaction and sign it with a threshold signature scheme. Second, confirmed transactions are placed in a cross-chain pending-transfer pool, from which a connector randomly selects multiple transactions and uses techniques such as secure multi-party computation and homomorphic encryption to judge their authenticity. Finally, if the hash of the package of all qualified transactions is authentic and has been verified by the transaction verification group, the connector can continue to execute the batch processing of multiple cross-chain transactions and exchange information with the blockchain. Security analysis shows that this cross-chain mechanism helps protect the confidentiality of information and the integrity of data, enables collaborative computation without data leaving its store, and guarantees the stability of the cross-chain blockchain system. Compared with traditional cross-chain interaction security models, the complexity of the number of signatures and of the number of notary groups to be assigned is reduced from O(n) to O(1) in the proposed model.

8.
Secure k-nearest neighbor (k-NN) query aims to find the k nearest data points to a given query over an encrypted database hosted in a cloud server, without revealing private information to the untrusted cloud, and has wide applications in areas such as privacy-preserving machine learning and secure biometric identification. Several solutions have been put forward to solve this challenging problem. However, existing schemes still suffer from various limitations in terms of efficiency and flexibility. In this paper, we propose a new encrypt-then-index strategy for the secure k-NN query, which simultaneously achieves sub-linear search complexity (efficiency) and supports dynamic updates over the encrypted database (flexibility). Specifically, we propose a novel algorithm to transform the encrypted database and encrypted query points in the cloud. By indexing the transformed database using spatial data structures such as the R-tree, our strategy enables sub-linear complexity for secure k-NN queries and allows users to dynamically update the encrypted database. To the best of our knowledge, the proposed strategy is the first to provide these two properties simultaneously. Through theoretical analysis and extensive experiments, we formally prove the security and demonstrate the efficiency of our scheme.
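To illustrate only the index-then-query idea (sub-linear search over a spatial index, with updates handled by re-indexing), here is a plaintext Python sketch; the transform function is a hypothetical placeholder for the paper's encrypted-database transformation, and a KD-tree stands in for the R-tree.

```python
import numpy as np
from scipy.spatial import cKDTree

def transform(points):
    # Hypothetical stand-in for the paper's secure transformation; identity here,
    # so only the index-then-query flow is illustrated, not the encryption.
    return points

rng = np.random.default_rng(1)
database = transform(rng.random((10_000, 3)))
tree = cKDTree(database)                  # spatial index enables sub-linear queries
query = transform(rng.random(3))
dist, idx = tree.query(query, k=5)        # 5 nearest neighbours of the query point

# Dynamic update: append new (transformed) points and rebuild the index;
# an R-tree would support incremental inserts instead of a full rebuild.
database = np.vstack([database, transform(rng.random((100, 3)))])
tree = cKDTree(database)
```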

9.
In this paper, we propose a lightweight network with an adaptive batch normalization module, called Meta-BN Net, for few-shot classification. Unlike existing few-shot learning methods, which rely on complex models or algorithms, our approach extends batch normalization, an essential part of current deep neural network training whose potential has not been fully explored. In particular, a meta-module is introduced that learns to adaptively generate more powerful affine transformation parameters, known as γ and β, in the batch normalization layer, so that the representation ability of batch normalization can be activated. The experimental results on miniImageNet demonstrate that Meta-BN Net not only outperforms the baseline methods by a large margin but is also competitive with recent state-of-the-art few-shot learning methods. We also conduct experiments on the Fewshot-CIFAR100 and CUB datasets, and the results show that our approach is effective in boosting the performance of weak baseline networks. We believe our findings can motivate the exploration of the undiscovered capacity of basic components in neural networks as well as more efficient few-shot learning methods.
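A minimal PyTorch sketch of the idea, a batch normalization layer whose affine parameters γ and β are produced by a small meta-module, is shown below; the layer sizes and the choice of task embedding are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MetaBN2d(nn.Module):
    """Batch normalization whose affine parameters are generated by a meta-module.

    A sketch of the idea only; hidden sizes and the task embedding are assumptions.
    """
    def __init__(self, num_features, task_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)   # plain normalization
        self.meta = nn.Sequential(                              # generates gamma and beta
            nn.Linear(task_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * num_features),
        )

    def forward(self, x, task_embedding):
        gamma, beta = self.meta(task_embedding).chunk(2, dim=-1)
        x = self.bn(x)
        return (1 + gamma).view(1, -1, 1, 1) * x + beta.view(1, -1, 1, 1)

feat = torch.randn(4, 32, 8, 8)            # a batch of feature maps
task = torch.randn(16)                     # e.g., pooled support-set statistics
out = MetaBN2d(32, 16)(feat, task)
```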

10.
A k-CNF (conjunctive normal form) formula is a regular (k, s)-CNF formula if every variable occurs s times in the formula, where k ≥ 2 and s > 0 are integers. Regular (3, s)-CNF formulas have some good structural properties, so carrying out a probability analysis of the structure of random formulas of this type is easier than conducting such an analysis for random 3-CNF formulas. Some subclasses of regular (3, s)-CNF formulas also have characteristics of intractability that differ from those of random 3-CNF formulas. For this purpose, we propose the strictly d-regular (k, 2s)-CNF formula, a regular (k, 2s)-CNF formula in which d ≥ 0 is an even number and each literal occurs s − d/2 or s + d/2 times (the literals of a variable x are x and ¬x, where x is positive and ¬x is negative). In this paper, we present a new model to generate strictly d-regular random (k, 2s)-CNF formulas, and focus on strictly d-regular random (3, 2s)-CNF formulas. Let F be a strictly d-regular random (3, 2s)-CNF formula such that 2s > d. We show that there exists a real number s_0 such that the formula F is unsatisfiable with high probability when s > s_0, and we present a numerical solution for s_0. The result is supported by simulated experiments and is consistent with the existing conclusion for the case d = 0. Furthermore, we conjecture that, for a given d, the strictly d-regular random (3, 2s)-SAT problem has a SAT-UNSAT (satisfiable-unsatisfiable) phase transition. Our experiments support this conjecture. Finally, our experiments also show that the parameter d is correlated with the intractability of the 3-SAT problem. Therefore, our research may be helpful for generating hard random instances of the 3-CNF formula.
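The generation model lends itself to a short sketch: build a pool containing each variable's two literals s − d/2 and s + d/2 times, shuffle it, and cut it into clauses of size 3. The Python toy below follows this recipe but, as a simplification, does not forbid a clause from containing the same variable twice, so it only approximates the model described above.

```python
import random

def strictly_d_regular_3cnf(n, s, d, seed=0):
    """Toy generator for a strictly d-regular random (3, 2s)-CNF formula.

    Each variable occurs 2s times; its two literals occur s - d/2 and s + d/2
    times (d even, 2s > d).  Clauses with a repeated variable are not excluded,
    which the actual model would handle more carefully.
    """
    assert d % 2 == 0 and 2 * s > d and (2 * s * n) % 3 == 0
    rng = random.Random(seed)
    pool = []
    for v in range(1, n + 1):
        lo, hi = s - d // 2, s + d // 2
        if rng.random() < 0.5:            # randomly choose which literal is the frequent one
            lo, hi = hi, lo
        pool += [v] * lo + [-v] * hi
    rng.shuffle(pool)
    return [pool[i:i + 3] for i in range(0, len(pool), 3)]

formula = strictly_d_regular_3cnf(n=30, s=6, d=2)   # 30 variables, 120 clauses
```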

11.
Local holographic transformations were introduced by Cai et al., and through them local affine functions, an additional tractable class, were derived for #CSP^2. In the present paper, we not only generalize local affine functions to #CSP^d for general d, but also give new tractable classes by combining local holographic transformations with global holographic transformations. Moreover, we show how to use local holographic transformations to prove hardness. This is of independent interest in the complexity classification of counting problems.

12.
王梅  许传海  刘勇 《计算机应用》2021,41(12):3462-3467
Multiple kernel learning (MKL) is an important class of kernel methods, but most MKL methods suffer from the following problems: the base kernels are mostly traditional kernels with shallow structure, which have weak representation ability when handling large-scale data with uneven distributions, and the generalization error of existing MKL methods mostly converges at a rate of O(1/√n), which is slow. To address this, an MKL method based on the neural tangent kernel (NTK) was proposed. First, the NTK, which has a deep hierarchical structure, was used as the base kernel of the MKL method to strengthen its representation ability. Then, a generalization error bound with a convergence rate of up to O(1/n) was proved based on the principal eigenvalue proportion measure. On this basis, a new MKL algorithm was designed by combining the kernel alignment measure. Finally, experiments on multiple datasets show that, compared with classification algorithms such as AdaBoost and k-nearest neighbors (KNN), the proposed MKL algorithm achieves higher accuracy and better representation ability, which also verifies the feasibility and effectiveness of the proposed method.
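As a small illustration of the kernel alignment measure mentioned above (not the NTK construction or the paper's full algorithm), the Python sketch below computes centered kernel alignment between an RBF kernel and the ideal label kernel; the RBF kernel is only a stand-in for an NTK base kernel.

```python
import numpy as np

def centered_kernel_alignment(K1, K2):
    """Centered alignment <K1c, K2c>_F / (||K1c||_F ||K2c||_F) of two kernel matrices."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = np.sign(X[:, 0])                              # toy binary labels
K_rbf = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)
K_target = np.outer(y, y)                         # ideal kernel built from the labels
print(centered_kernel_alignment(K_rbf, K_target))
```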

13.
The flourishing of deep learning frameworks and hardware platforms has been demanding an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor renders itself a competitive candidate owing to its attractive computational power in both scientific computing and deep learning workloads. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architectural features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning workloads on Sunway. The experimental results show that the code generated by swTVM achieves a 1.79× improvement in inference latency on average compared to the state-of-the-art deep learning framework on Sunway, across eight representative benchmarks. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and the Sunway processor, with productivity and efficiency particularly in mind. We believe this work will encourage more people to embrace the power of deep learning and the Sunway many-core processor.

14.
To solve the problems that existing authentication methods cannot trace malicious nodes and that a centralized certification authority is vulnerable to attack, a distributed lightweight authentication scheme for mobile ad hoc networks was proposed based on the decentralized, tamper-proof, and traceable characteristics of blockchain technology. The scheme covers peer-to-peer network establishment, authentication ledger design, and consensus mechanism selection, and the role of smart contracts is analyzed and discussed. In the proposed scheme, given the maximum transmission delay τ_max between peer nodes, the node authentication time δ and the network size N_t^all satisfy (N_t^all/3 + 1)τ_max ≤ δ ≤ 3(N_t^all + 1)τ_max.

15.
朱槐雨  李博 《计算机应用》2021,41(11):3234-3241
Unmanned aerial vehicle (UAV) aerial images have a wide field of view, and the targets in them are small with blurred edges, so existing Single Shot multibox Detector (SSD) object detection models have difficulty accurately detecting small targets in aerial images. To effectively address the missed detections of the original model, an SSD model based on successive upsampling was proposed, drawing on the Feature Pyramid Network (FPN). The improved SSD model resizes the input image to 320×320, adds a Conv3_3 feature layer, upsamples the high-level features, and fuses the features of the first five stages of the VGG16 network through a feature pyramid structure, thereby enhancing the semantic expressiveness of each feature layer; the sizes of the prior boxes are also redesigned. The model was trained and validated on the public aerial dataset UCAS-AOD. Experimental results show that the improved SSD model achieves a mean average precision (mAP) of 94.78%, an accuracy improvement of 17.62% over the existing SSD model, with improvements of 4.66% for the airplane class and 34.78% for the car class.
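To make the fusion step concrete, the PyTorch sketch below performs an FPN-style top-down fusion, upsampling a deeper feature map and adding it to a 1×1-projected shallower one; the channel sizes and layer choices are assumptions and not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """FPN-style top-down fusion of two feature maps (channel sizes are assumptions)."""
    def __init__(self, c_low, c_high, c_out=256):
        super().__init__()
        self.lateral_low = nn.Conv2d(c_low, c_out, kernel_size=1)    # 1x1 projections
        self.lateral_high = nn.Conv2d(c_high, c_out, kernel_size=1)
        self.smooth = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)

    def forward(self, feat_low, feat_high):
        top_down = F.interpolate(self.lateral_high(feat_high),
                                 size=feat_low.shape[-2:], mode="nearest")
        return self.smooth(self.lateral_low(feat_low) + top_down)

low = torch.randn(1, 256, 40, 40)     # e.g., a shallow VGG16 feature map
high = torch.randn(1, 512, 20, 20)    # a deeper, coarser feature map
fused = TinyFPN(256, 512)(low, high)  # shape (1, 256, 40, 40)
```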

16.
Dimensionality reduction (DR) methods based on sparse representation, one of the hottest research topics, have achieved remarkable performance in many applications in recent years. However, it is a challenge for existing sparse representation based methods to solve nonlinear problems, owing to the limitation of seeking the sparse representation of data in the original space. Motivated by kernel tricks, we propose a new framework called empirical kernel sparse representation (EKSR) to solve nonlinear problems. In this framework, nonlinearly separable data are mapped into a kernel space in which the nonlinear similarity can be captured, and the data in the kernel space are then reconstructed by sparse representation to preserve the sparse structure, which is obtained by minimizing an ℓ1 regularization-related objective function. EKSR provides new insights into dimensionality reduction and yields two models: 1) empirical kernel sparsity preserving projection (EKSPP), a feature extraction method based on sparsity preserving projection (SPP); and 2) empirical kernel sparsity score (EKSS), a feature selection method based on sparsity score (SS). Both methods can choose the neighborhood automatically thanks to the natural discriminative power of sparse representation. Compared with several existing approaches, the proposed framework can reduce computational complexity and is more convenient in practice.
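A minimal Python sketch of the two ingredients, an empirical kernel map followed by ℓ1 sparse reconstruction, is given below; the RBF kernel, the Lasso solver, and the parameter values are assumptions used purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

def empirical_kernel_map(K):
    """Map the training samples into the empirical kernel space via K = V diag(lam) V^T."""
    lam, V = np.linalg.eigh(K)
    keep = lam > 1e-10                               # drop numerically zero eigenvalues
    return K @ V[:, keep] / np.sqrt(lam[keep])       # row i is the image of sample i

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
K = np.exp(-0.5 * np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)  # RBF kernel
Z = empirical_kernel_map(K)

# Sparse (l1) reconstruction of sample 0 from the remaining samples in kernel space,
# standing in for the sparsity-preserving step behind EKSPP/EKSS.
others = np.delete(Z, 0, axis=0)
coef = Lasso(alpha=0.01, fit_intercept=False).fit(others.T, Z[0]).coef_
```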

17.
The problem of subgraph matching is one of the fundamental issues in graph search and is NP-complete. Recently, subgraph matching has become a popular research topic in the field of knowledge graph analysis, with a wide range of applications including question answering and semantic search. In this paper, we study the problem of subgraph matching on knowledge graphs. Specifically, given a query graph q and a data graph G, the problem of subgraph matching is to compute all possible subgraph isomorphic mappings of q on G. A knowledge graph is formed as a directed labeled multi-graph having multiple edges between a pair of vertices, and it has denser semantic and structural features than a general graph. To accelerate subgraph matching on knowledge graphs, we propose a novel subgraph matching algorithm based on a subgraph index for knowledge graphs, called FGqT-Match. The subgraph matching algorithm consists of two key designs. One is a subgraph index of matching-driven flow graphs (FGqT), which reduces redundant calculations in advance. The other is a multi-label weight matrix, which evaluates a near-optimal matching tree to minimize the intermediate candidates. With the aid of these two key designs, all subgraph isomorphic mappings are obtained quickly merely by traversing the FGqT. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

18.
Robust regression plays an important role in many machine learning problems. A primal approach relies on the use of the Huber loss and an iteratively reweighted ℓ2 method. However, because the Huber loss is not smooth and its corresponding distribution cannot be represented as a Gaussian scale mixture, such an approach is extremely difficult to handle within a probabilistic framework. To address those limitations, this paper proposes two novel losses and the corresponding probability functions. One is called Soft Huber, which is well suited to modeling non-Gaussian noise. The other is Nonconvex Huber, which can help produce much sparser results when imposed as a prior on the regression vector. With tuning parameters they can represent any ℓq loss (1/2 ≤ q < 2), which makes the regression model more robust. We also show that both distributions have an elegant form: a Gaussian scale mixture with a generalized inverse Gaussian mixing density. This enables us to devise an expectation-maximization (EM) algorithm for solving the regression model. We can obtain an adaptive weight through EM, which is very useful for removing noisy data or irrelevant features in regression problems. We apply our model to the face recognition problem and show that it not only reduces the impact of noisy pixels but also removes more irrelevant face images. Our experiments demonstrate promising results on two datasets.
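To ground the iteratively reweighted ℓ2 primal baseline mentioned above (not the paper's proposed EM scheme), here is a short numpy sketch of IRLS for the Huber loss; the synthetic data, the threshold δ, and the iteration count are illustrative assumptions.

```python
import numpy as np

def huber_irls(X, y, delta=1.0, n_iter=50):
    """Iteratively reweighted least squares for the Huber loss.

    This is the primal baseline referred to in the abstract, not the proposed model.
    """
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ w
        weights = np.where(np.abs(r) <= delta, 1.0,
                           delta / np.maximum(np.abs(r), 1e-12))   # Huber weights
        W = np.diag(weights)
        w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)               # weighted least squares
    return w, weights                                               # small weights flag outliers

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.arange(1.0, 6.0) + 0.1 * rng.standard_normal(200)
y[:10] += 20.0                                                      # inject gross outliers
w, weights = huber_irls(X, y)
```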

19.
Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class called adversarial online combinatorial optimization with semi-bandit feedback, in which a player repeatedly makes combinatorial decisions and gets the corresponding feedback. While existing algorithms focus on the regret guarantee or assume the existence of an efficient offline oracle, it is still a challenge to solve this problem efficiently if the offline counterpart is NP-hard. In this paper, we propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as an offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it guarantees a sublinear (1 + ε)-scaled regret of order O(T^(2/3)) for any small ε > 0 for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), where T is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.
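A bare-bones illustration of the FPL template (perturb the accumulated losses, then call an offline oracle) is given below in Python. It is a full-information toy that picks m of K items with an exact sorting oracle, whereas the paper's setting uses semi-bandit feedback and an approximation oracle; the perturbation scale and loss data are assumptions.

```python
import numpy as np

def fpl_top_m(loss_matrix, m, eta=0.1, seed=0):
    """Follow-the-Perturbed-Leader for repeatedly choosing m of K items.

    Full-information toy: the offline 'oracle' is a sort over perturbed cumulative losses.
    """
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    cum_loss = np.zeros(K)
    total = 0.0
    for t in range(T):
        perturb = rng.exponential(1.0 / eta, K)       # nonnegative perturbation
        chosen = np.argsort(cum_loss - perturb)[:m]   # oracle: m smallest perturbed losses
        total += loss_matrix[t, chosen].sum()
        cum_loss += loss_matrix[t]                    # full-information update
    return total

rng = np.random.default_rng(1)
losses = rng.random((1000, 10))
print(fpl_top_m(losses, m=3))
```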

20.
Reactive synthesis is a technique for the automatic generation of a reactive system from a high-level specification. The system is reactive in the sense that it reacts to inputs from the environment. The specification is in general given as a linear temporal logic (LTL) formula. The behaviour of the system interacting with the environment can be represented as a game in which the system plays against the environment. Thus, the reactive synthesis problem is commonly treated as solving such a game with the specification as the winning condition. Reactive synthesis has been thoroughly investigated for more than two decades. A well-known challenge is to deal with the complex uncertainty of the environment. We understand that a major issue is the lack of a sufficient treatment of probabilistic properties in the traditional models. For example, a two-player game defined by a standard Kripke structure does not consider probabilistic transitions in reaction to the uncertain physical environment, and a Markov decision process (MDP) in general neither explicitly separates the system from its environment nor describes the interaction between them. In this paper, we propose a new and more general model which combines the two-player game and the MDP. Furthermore, we study probabilistic reactive synthesis for games of General Reactivity of rank 1 (i.e., GR(1)) defined in this model. More specifically, we present an algorithm which, for a given model M, a location s, and a GR(1) specification P, determines the strategy of each player that maximizes/minimizes the probability of satisfying P at location s. We use an example to describe the model of probabilistic games and to demonstrate our algorithm.
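As a loose illustration of computing optimal satisfaction probabilities in a game with probabilistic transitions, the Python sketch below runs value iteration for a max-min reachability objective on a tiny hand-made game; the model shape and the reachability objective (rather than a full GR(1) condition) are simplifying assumptions.

```python
import numpy as np

def reachability_value_iteration(trans, target, n_states, n_iter=200):
    """Max-min probability of reaching 'target' in a two-player stochastic game.

    trans[s] = {a: {b: [(prob, s_next), ...]}}: the system picks a to maximize,
    the environment picks b to minimize.  A toy stand-in for the combined
    game/MDP model discussed above.
    """
    v = np.zeros(n_states)
    v[list(target)] = 1.0
    for _ in range(n_iter):
        new_v = v.copy()
        for s in trans:
            if s in target:
                continue
            new_v[s] = max(
                min(sum(p * v[s2] for p, s2 in dist) for dist in env.values())
                for env in trans[s].values()
            )
        v = new_v
    return v

# Tiny example: from state 0 the system tries to reach the target state 2,
# while the environment can push it toward the sink state 3.
trans = {
    0: {"try": {"calm": [(0.9, 2), (0.1, 3)], "storm": [(0.5, 2), (0.5, 3)]},
        "wait": {"calm": [(1.0, 0)], "storm": [(1.0, 0)]}},
}
print(reachability_value_iteration(trans, target={2}, n_states=4)[0])
```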
