1.

张海军, 潘伟民, 木妮娜, 栾静. 《小型微型计算机系统》, 2012, 33(9): 1968-1971

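Existing sorting algorithms make it difficult to sort strings in a user-defined order. This paper proposes a fast method for sorting strings in a custom order. The character order is defined by consecutive numbering, a hash table converts each string into the corresponding integer array, and the largest character number serves as the new radix of a radix sort, which then sorts the strings. Analysis and experiments show that the method sorts strings in a custom order effectively, has linear time and space complexity, runs faster than Quick Sort, and extends easily to string sorting in other languages.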
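As a rough illustration of the idea in this abstract, and not the authors' implementation, the following Python sketch numbers the characters of a custom alphabet consecutively, converts each string to an integer array through a dictionary (Python's hash table), and runs a least-significant-digit radix sort. The function name `custom_order_sort`, the use of 0 as a padding code, and the padding of shorter strings to a common length are assumptions made for the example; the abstract does not say how strings of different lengths are handled.

```python
def custom_order_sort(strings, alphabet):
    """Sort strings by the character order given in `alphabet` (hypothetical helper)."""
    # Number the characters consecutively from 1. Code 0 is reserved for padding,
    # so a shorter string sorts before any longer string that starts with it.
    rank = {ch: i + 1 for i, ch in enumerate(alphabet)}
    radix = len(alphabet) + 1
    width = max((len(s) for s in strings), default=0)

    # Convert every string into an integer array of equal length.
    keyed = [([rank[ch] for ch in s] + [0] * (width - len(s)), s) for s in strings]

    # LSD radix sort: one stable bucket pass per position, last position first.
    for pos in range(width - 1, -1, -1):
        buckets = [[] for _ in range(radix)]
        for item in keyed:
            buckets[item[0][pos]].append(item)
        keyed = [item for bucket in buckets for item in bucket]

    return [s for _, s in keyed]
```

With the alphabet `"ba"` (so that `b` sorts before `a`), `custom_order_sort(["ba", "ab", "b", "a", "bb"], "ba")` returns `["b", "bb", "ba", "a", "ab"]`. Each pass costs O(n + radix), so the padded version runs in O(d(n + radix)) for maximum string length d.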

2.

An improved method for sorting Chinese strings

张海军, 丁溪源, 朱朝勇. 《计算机工程与应用》, 2010, 46(19): 129-131

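For sorting Chinese strings, the fastest algorithms run in O(n lg n) time. Radix sort is among the fastest sorting methods currently available, with time complexity O(dn), but it is generally suited to integer data of equal length. The paper proposes a fast transformation that converts each string into an integer array of the same length and then applies radix sort to the integer arrays that represent the strings, which yields a fast string sort. Experiments show that the algorithm sorts Chinese strings quickly, outperforms quicksort, and has a running time that grows linearly with the data size, giving a time complexity of O(dn).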

3.

A new efficient binned hybrid sorting algorithm for complex data such as strings

何文明, 崔俊芝. 《小型微型计算机系统》, 2004, 25(4): 698-701

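To address the sorting of complex data such as strings and records, as well as unevenly distributed numeric data, the paper proposes a new efficient binned sorting algorithm that builds on a comprehensive analysis of the strengths of existing sorting algorithms. Its advantages are demonstrated by comparison with recent work on sorting.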

4.

On sorting algorithms for Chinese character strings

钟诚. 《中文信息学报》, 1999, 13(6): 62-65

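The paper analyzes grouped sorting algorithms for Chinese character strings and, building on a discussion of radix selection, presents improved and effective methods for mapping strings to integers and for handling data whose mappings collide.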

5.

The MED algorithm and its application in web search

叶福军. 《计算机工程》, 2010, 36(2): 36-38

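Traditional methods do not handle relevance ranking between short fields of a web page and user queries well. To address this, the paper proposes a modified edit distance (MED) ranking algorithm that brings the position, order, and distance of query terms into the encoding and computation, turning the relevance problem between a query and a short field into a similarity problem between encoded strings. Simulation results show that, compared with traditional relevance ranking algorithms, the algorithm improves relevance ranking for short web page fields in web search.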
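The abstract does not give the MED encoding or its modified cost rules, so they are not reproduced here. For orientation only, the sketch below shows the classic, unmodified edit distance that such methods start from, used to order short fields by similarity to a query. The names `edit_distance` and `rank_fields` are hypothetical, and comparing raw characters rather than encoded query-term information is a simplifying assumption.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences (not the paper's MED)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_fields(query, fields):
    """Order short fields by increasing edit distance to the query (stable on ties)."""
    return sorted(fields, key=lambda field: edit_distance(query, field))
```

Because `edit_distance` only indexes and compares elements, it also accepts lists of tokens, which is closer to comparing encoded strings than comparing characters.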

6.

Parallel quicksort algorithm on NOW systems (total citations: 5; self-citations: 0; citations by others: 5)

王小牛, 何珍祥. 《计算机应用》, 2002, 22(7): 15-17

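The paper describes the design and implementation of a parallel quicksort algorithm on NOW systems, analyzes the factors that affect its performance and ways to improve it, and reports a parallel efficiency of 49.15% for string sorting.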

7.

A fast Chinese sorting and fuzzy retrieval method for the Java environment

刘焕焕, 陆锋, 赵云山. 《数字社区&智能家居》, 2009, (7)

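Managing databases that contain Chinese string records is a common problem in Java development. Because the Java language offers insufficient support for Chinese, the sorting of Chinese string records does not meet application requirements well. Based on a comparative analysis of current Chinese sorting methods, the paper proposes a general sorting method for Chinese string and numeric records in the Java environment, which largely resolves the sorting of Chinese string record sets. To handle the homophone spelling errors that easily arise when records are added or retrieved, it also proposes a homophone retrieval method that improves the fault tolerance and error correction of retrieval.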

8.

A fast Chinese sorting and fuzzy retrieval method for the Java environment

刘焕焕, 陆锋, 赵云山. 《数字社区&智能家居》, 2009, 5(3): 1664-1666

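Managing databases that contain Chinese string records is a common problem in Java development. Because the Java language offers insufficient support for Chinese, the sorting of Chinese string records does not meet application requirements well. Based on a comparative analysis of current Chinese sorting methods, the paper proposes a general sorting method for Chinese string and numeric records in the Java environment, which largely resolves the sorting of Chinese string record sets. To handle the homophone spelling errors that easily arise when records are added or retrieved, it also proposes a homophone retrieval method that improves the fault tolerance and error correction of retrieval.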

9.

An improved comprehensive Borda result-ranking algorithm for metasearch engines

李兵, 谭春. 《计算机光盘软件与应用》, 2014, (4): 87-87, 89

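Traditional metasearch engines use the Borda ranking algorithm: each result receives a score according to its position in a member engine's result list, the scores are summed, and the results are sorted by total score in descending order. This works very well when the member engines' results overlap heavily, but it performs poorly on results returned by a single engine and is slow. Another approach ranks results by the similarity between the query string and each result's title, abstract, and similar fields. It is fast and simple to implement, but the returned information is brief, so results with longer abstracts may be ranked ahead of results whose content is more relevant. To address the shortcomings of both approaches, the paper proposes an improved comprehensive Borda ranking algorithm that computes the similarity between the query string and the result title and abstract and then uses that similarity as the relevance score for ranking, combining the two approaches. Experiments show that its precision is better than that of the traditional Borda algorithm.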
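The traditional Borda step described at the start of this abstract can be sketched as follows. This is only the baseline scheme, not the improved algorithm, whose similarity function the abstract does not specify. The name `borda_merge`, the scoring rule (a list of length n gives n points to its first result and 1 to its last), and the alphabetical tie-break are assumptions for the example.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge ranked result lists from several member engines by Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, doc in enumerate(ranking):
            # Earlier positions earn more points; results absent from a list earn none.
            scores[doc] += n - position
    # Sort by total score, highest first; ties are broken by document id (assumed strings).
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

For the lists `["a", "b", "c"]`, `["b", "a", "d"]`, and `["b", "c", "a"]`, the totals are b = 8, a = 6, c = 3, d = 1, so the merged order is `["b", "a", "c", "d"]`. A result returned by only one engine, such as `d`, can never collect many points, which matches the weakness the abstract points out.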

10.

Binary representation of combinations and its algorithms

戴小平, 施荣华. 《电脑与信息技术》, 1997, 5(2): 26-28

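This paper discusses a method of representing combinations as binary strings and, based on it, gives a recursive enumeration algorithm that generates all combinations in lexicographic order. To establish a one-to-one mapping between the set of combinations and the set I, it describes the corresponding ranking and inverse ranking algorithms. These algorithms have clear advantages over Knott's algorithm.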
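One common way to realize the ideas named in this abstract is sketched below. It is an assumption-laden illustration, not the paper's algorithms: a combination of k items out of n is written as an n-bit string with k ones, strings are generated recursively in lexicographic order, and `rank` and `unrank` map between a string and its index in that order. All three function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def bitstrings(n, k):
    """Yield all n-bit strings with exactly k ones, in lexicographic order."""
    if k < 0 or k > n:
        return
    if n == 0:
        yield ""
        return
    for rest in bitstrings(n - 1, k):      # strings starting with "0" come first
        yield "0" + rest
    for rest in bitstrings(n - 1, k - 1):  # then strings starting with "1"
        yield "1" + rest


def rank(bits):
    """Return the index of `bits` among strings of the same length and weight."""
    k = bits.count("1")
    index = 0
    for i, bit in enumerate(bits):
        remaining = len(bits) - i - 1
        if bit == "1":
            # Every string with "0" here and k ones in the remaining positions is smaller.
            index += comb(remaining, k)
            k -= 1
    return index


def unrank(n, k, index):
    """Inverse of `rank`: rebuild the bit string from its index."""
    bits = []
    for remaining in range(n - 1, -1, -1):
        zeros_first = comb(remaining, k)
        if index < zeros_first:
            bits.append("0")
        else:
            bits.append("1")
            index -= zeros_first
            k -= 1
    return "".join(bits)
```

For n = 4 and k = 2 the generator yields `0011, 0101, 0110, 1001, 1010, 1100`, and `rank` and `unrank` give a one-to-one mapping between these strings and the integers 0 to 5.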

11.

On the possible patterns of inputs for block sorting in the Burrows-Wheeler transformation

Takashi Saso, Atsuyoshi Nakamura. 《Information Processing Letters》, 2011, 111(12): 595-599

Block sorting in the Burrows-Wheeler transformation sorts all n circular shifts of a string of length n lexicographically. We introduce a notion called the width of a sequence of n strings of length n and show that the values of the widths differ greatly between two types of sequences of strings: (1) a sequence of n randomly generated strings of length n, and (2) the sequence of n circular shifts of a randomly generated string of length n.
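The block-sorting step defined in the first sentence of this abstract can be sketched directly. This naive version materializes every circular shift and is meant only to make the definition concrete; it does not compute the paper's "width" measure, whose definition the abstract does not give. The function names are hypothetical, and no end-of-string sentinel is appended.

```python
def sorted_circular_shifts(s):
    """Return all len(s) circular shifts of s in lexicographic order."""
    return sorted(s[i:] + s[:i] for i in range(len(s)))


def bwt_last_column(s):
    """Return the last character of each sorted shift, i.e. the BWT output string."""
    return "".join(shift[-1] for shift in sorted_circular_shifts(s))
```

For `"banana"` the sorted shifts are `abanan, anaban, ananab, banana, nabana, nanaba`, so `bwt_last_column("banana")` returns `"nnbaaa"`. Building all shifts takes O(n^2) space, which is why practical implementations sort shift indices or use suffix arrays instead.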

12.

Compression techniques for fast external sorting

John Yiannis, Justin Zobel. 《The VLDB Journal The International Journal on Very Large Data Bases》, 2007, 16(2): 269-291

External sorting of large files of records involves use of disk space to store temporary files, processing time for sorting, and transfer time between CPU, cache, memory, and disk. Compression can reduce disk and transfer costs, and, in the case of external sorts, cut merge costs by reducing the number of runs. It is therefore plausible that overall costs of external sorting could be reduced through use of compression. In this paper, we propose new compression techniques for data consisting of sets of records. The best of these techniques, based on building a trie of variable-length common strings, provides fast compression and decompression and allows random access to individual records. We show experimentally that our trie-based compression leads to significant reduction in sorting costs; that is, it is faster to compress the data, sort it, and then decompress it than to sort the uncompressed data. While the degree of compression is not quite as great as can be obtained with adaptive techniques such as Lempel-Ziv methods, these cannot be applied to sorting. Our experiments show that, in comparison to approaches such as Huffman coding of fixed-length substrings, our novel trie-based method is faster and provides greater size reductions. Preliminary versions of parts of this paper, not including the work on vargram compression” [41]

13.

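An effective method for detecting approximately duplicate Chinese records (total citations: 7; self-citations: 0; citations by others: 7)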

程国达, 苏杭丽. 《计算机应用》, 2005, 25(6): 1362-1365

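Eliminating duplicate records improves data quality. The paper proposes choosing the sort fields according to the number of distinct values in each field. To detect approximately duplicate records, the first sort field is used to build a two-dimensional linked list that stores the approximate duplicates, and the second and third sort fields are then used to sort and compare the records in that list, which improves detection. To match Chinese strings correctly, the paper studies mismatches caused by abbreviations and input errors caused by characters with similar pronunciation or shape. Some input errors are resolved by looking up a "table of similar Chinese characters", and a similarity function is computed to decide whether the compared records are duplicates. Experiments show that the method detects approximately duplicate Chinese records effectively.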

14.

A study of key-position-related speed equivalents (total citations: 2; self-citations: 2; citations by others: 0)

陈一凡, 周志农. 《中文信息学报》, 1990, 4(4): 14-18

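From more than two million experimental data points, this paper derives by statistical analysis a matrix of key-position-related speed equivalents for consecutive keystrokes on a general-purpose small keyboard. The table of key-position-related speed equivalents provides the ergonomic basis for optimizing key layouts for Chinese character keyboard input, and a scientific basis for automatically measuring the speed quality of Chinese character keyboard input methods. The paper also describes the principles of the experimental design used to collect these data and the methods used to process them.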

15.

Location and interpretation of destination addresses on handwritten Chinese envelopes

《Pattern Recognition Letters》, 2001, 22(6-7): 639-656

Virtually all mail sorting machines currently used in China only recognize post code and ignore the useful destination address information on the envelopes. This paper discusses how to efficiently utilize such important information on handwritten Chinese envelopes in order to improve the sorting performance. For this purpose, two particular problems are addressed, respectively. One is the location of destination address block (DAB) on the envelope, and a new bottom-up location method is described in detail. The other is the interpretation of handwritten Chinese destination address strings. We present our effort on using as many geometric constraints as possible in the string segmentation. Then a novel address interpretation algorithm with global optimization is proposed. It combines the segmentation, recognition and address context information by the best-path search. The effectiveness of the proposed algorithms is fully demonstrated by our experiments on real envelopes.

16.

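Construction of an error-correction knowledge base and an algorithm for generating correction suggestions in a Chinese proofreading system (total citations: 5; self-citations: 1; citations by others: 4)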

张仰森. 《中文信息学报》, 2001, 15(5): 34-40

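Based on the common error types found in text to be proofread, this paper describes how to construct an error-correction knowledge base and presents an automatic error-correction algorithm built on it. By exploiting features of the erroneous string together with heuristic information from the context, the algorithm can effectively provide correction suggestions for errors such as wrong characters, missing characters, extra characters, transpositions, and multi-character substitutions. The paper also discusses the algorithm for ranking correction suggestions.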

17.

A VLSI algorithm for sorting variable-length character strings

Yuzuru Tanaka. 《New Generation Computing》, 1985, 3(3): 307-328

This paper proposes a sorting hardware module that can directly cope with variable-length character strings. It gives a pipelined heap sort algorithm for a set of variable-length character strings, and a VLSI architecture that implements this algorithm. The hardware consists of a specially designed single chip module and an external memory bank. This special chip module is called a V-Sort Engine Core. The number of words in the external memory bank should be larger than the total length of strings to be sorted. A hardware module that can sort no more than 2^L strings uses a V-Sort Engine Core consisting of L levels. The i-th level of a V-Sort Engine Core has a logic cell and a memory bank with 2^i words. Each word consists of three fields and a mark bit, i.e., level number, character, and path number. A triple (j, c, i) consisting of these field values denotes the (j+1)-st character c of the i-th input string. Concurrent execution of the external memory bank and all the level logic cells of the V-Sort Engine Core allows the hardware module to receive a sequence of strings sequentially character by character, and to begin the sequential output of the sort result immediately after receiving the last input character. It requires no extra time other than that required for sequential data transfer to and from itself.

18.

Distributed sorting algorithms for multi-channel broadcast networks

《Theoretical Computer Science》, 1987, 52(3): 193-203

A multi-channel broadcast network is a distributed computation model in which p independent processors communicate over a set of p shared broadcast channels. Computation proceeds in synchronous cycles, during each of which the processors first write and read the channels, then perform local computations. Performance is measured in terms of the number of cycles used in the computation, where each bit to be transmitted is assumed to require a separate cycle. In this paper we investigate the problem of sorting p bit strings of uniform length m, each string initially located at a different processor in the broadcast network. We develop an efficient sorting method that first reduces the length of the strings without affecting their relative order, then proceeds using only the shorter strings. A sequence of three successively improved algorithms based on this approach is presented, the best of which runs in O(m + p log p) cycles. By showing a lower bound of Ω(m) cycles, we prove that the algorithm is optimal for sufficiently large m. Our results improve by a factor of log p the solution of the multiple identification problem presented by Landau, Yung and Galil (1985).

19.

A DUCET-based Tibetan sorting method (total citations: 1; self-citations: 0; citations by others: 1)

黄鹤鸣, 契嘎·德熙嘉措. 《中文信息学报》, 2008, 22(4): 109-113

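DUCET assigns a collation code to every Tibetan character, but the spelling complexity of Tibetan syllables means that these codes cannot be applied directly to Tibetan sorting. The paper proposes a DUCET-based method for sorting Tibetan syllables. First, the two-dimensional Tibetan syllable is converted into a one-dimensional string of letters. Second, the collation code of each letter is looked up in DUCET to obtain the collation code string for the syllable. Finally, syllables are ordered by comparing their collation code strings. The paper also discusses the comparison rules between Tibetan syllables and general Tibetan letter strings, and between Tibetan strings and foreign-language strings.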