1.
No-space graphs present one solution to the familiar problem: given data on the occurrence of fossil taxa in separate, well-sampled sections, determine a range chart; that is, a reasonable working hypothesis of the total range in the area in question of each taxon studied. The solution presented here treats only the relative sequence of biostratigraphic events (first and last occurrences of taxa) and does not attempt to determine an amount of spacing between events. Relative to a hypothesized sequence, observed events in any section may be in-place or out-of-place. Out-of-place events may indicate (1) the event in question reflects a taxon that did not fill its entire range (unfilled-range event), or (2) the event in question indicates a need for the revision of the hypothesized sequence. A graph of relative position only (no-space graph) can be used to facilitate the recognition of in-place and out-of-place events by presenting a visual comparison of the observations from each section with the hypothesized sequence. The geometry of the graph as constructed here is such that in-place events will lie along a line series and out-of-place events will lie above or below it. First-occurrence events below the line series and last-occurrence events above the line series indicate unfilled ranges. First-occurrence events above the line series and last-occurrence events below the line series indicate a need for the revision of the hypothesis. Knowing this, the stratigrapher considers alternative positionings of the line series as alternative range hypotheses and seeks the line series that best fits his geologic and paleontologic judgment. No-space graphs are used to revise an initial hypothesis until a final hypothesis is reached. In this final hypothesis every event is found in-place in at least one section, and all events in all sections may be interpreted to represent in-place events or unfilled-range events. No event may indicate a need for further range revision. The application of the no-space graph method requires the assumption of lack of reworking and the assumption that taxa that are present in a single horizon indicate taxa whose ranges overlap. When applied to hypothetical and actual data, the no-space graph technique produces geologically reasonable range charts that compare favorably with results produced by other methods.
2.
Common OCR (Optical Character Recognition) systems fail to detect and recognize small text strings of a few characters, in particular when a text line is not horizontal. Such text regions are typical of chart images. In this paper we present an algorithm that is able to detect small text regions regardless of string orientation and font size or style. We propose to use this algorithm as a preprocessing step for text recognition with a common OCR engine. According to our experimental results, one can obtain an up to 20 times better text recognition rate and 15 times higher text recognition precision when the proposed algorithm is used to detect text location, size and orientation before using an OCR system. Experiments have been performed on a benchmark set of 1000 chart images created with the XML/SWF Chart tool, which contain about 14000 text regions in total.
3.
Deadlock detection is an important service that the run-time system of a parallel environment should provide. In parallel programs deadlock can occur when the different processes are waiting for various events, as opposed to concurrent systems, where deadlock occurs when processes wait for resources held by other processes. Therefore classical deadlock detection techniques such as checking for cycles in the wait-for graph are inapplicable. An alternative algorithm that checks whether all the processes are blocked is presented. This algorithm deals with situations in which the state transition from blocked to unblocked is indirect, as may happen when busy-waiting is used.
4.
In this paper, we present a new text line detection method for handwritten documents. The proposed technique is based on a strategy that consists of three distinct steps. The first step includes image binarization and enhancement, connected component extraction, partitioning of the connected component domain into three spatial sub-domains and average character height estimation. In the second step, a block-based Hough transform is used for the detection of potential text lines while a third step is used to correct possible splitting, to detect text lines that the previous step did not reveal and, finally, to separate vertically connected characters and assign them to text lines. The performance evaluation of the proposed approach is based on a consistent and concrete evaluation methodology.
6.
We present a new approach for the problem of finding overlapping communities in graphs and social networks. Our approach consists of a novel problem definition and three accompanying algorithms. We are particularly interested in graphs that have labels on their vertices, although our methods are also applicable to graphs with no labels. Our goal is to find k communities so that the total edge density over all k communities is maximized. In the case of labeled graphs, we require that each community is succinctly described by a set of labels. This requirement provides a better understanding for the discovered communities. The proposed problem formulation leads to the discovery of vertex-overlapping and dense communities that cover as many graph edges as possible. We capture these properties with a simple objective function, which we solve by adapting efficient approximation algorithms for the generalized maximum-coverage problem and the densest-subgraph problem. Our proposed algorithm is a generic greedy scheme. We experiment with three variants of the scheme, obtained by varying the greedy step of finding a dense subgraph. We validate our algorithms by comparing with other state-of-the-art community-detection methods on a variety of performance measures. Our experiments confirm that our algorithms achieve results of high quality in terms of the reported measures, and are practical in terms of performance.
7.
This paper presents a new method for detecting and recognizing text in complex images and video frames. Text detection is performed in a two-step approach that combines the speed of a text localization step, enabling text size normalization, with the strength of a machine learning text verification step applied on background-independent features. Text recognition, applied on the detected text lines, is addressed by a text segmentation step followed by a traditional OCR algorithm within a multi-hypotheses framework relying on multiple segments, language modeling and OCR statistics. Experiments conducted on large databases of real broadcast documents demonstrate the validity of our approach.
8.
This study extends Duncan's [1] model to two different manufacturing process models in which the processes continue and discontinue in operations during the search for the assignable cause. A more realistic assumption considered in this paper is that the cost of repair and the net hourly out-of-control income are functions of detection delay. In the continuous model, detection delay is defined as the elapsed time from the time when the shift of the process occurs until it is identified by control charts and the assignable cause is eliminated. The discontinuous model defines detection delay as the time interval from the occurrence of the process shift to the completion of testing a set of samples and interpreting the results. An efficient procedure is developed to determine the optimal designs without using any approximation approach. Thus, the proposed procedure can obtain the truly optimal designs rather than those approximate designs determined by Duncan [1] and other subsequent researchers. This paper illustrates several numerical examples and makes some relevant comparisons. The results indicate that this optimal solution procedure is more accurate than that of Panagos et al. [2]. Also, detection delay is sensitive to the economic design of control charts.
9.
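Based on an analysis of the characteristics of text lines, a text line detection algorithm for document images using horizontal gradient differences is proposed. The algorithm first computes the horizontal gradient difference of the input document image, then finds the maximum gradient difference within a local window and merges text line regions, filters out non-text regions to eliminate the abrupt jumps caused by character steps, and finally displays the document image as line blocks. Experimental results show that, compared with the projection algorithm, the proposed algorithm detects text lines better in document images with small line spacing, has lower time complexity and higher detection accuracy, and offers a degree of robustness and good adaptability.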
12.
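Traditional text detection methods suffer from very low accuracy and severe missed and false detections under the interference of complex backgrounds, noise contamination, and the varied morphological characteristics of text in scene images. To address these problems, a scene-image text detection method based on morphological component analysis (MCA) and discriminative dictionary learning is proposed. By learning an overcomplete dictionary, the text detection problem is transformed into a problem of sparse and robust representation. An overcomplete dictionary is learned using MCA and an improved Fisher discriminant criterion; the sparse coefficients of the text component of the image under test are solved for, the text image within it is reconstructed, and text detection is performed. Extensive experiments on the ICDAR2003/2005/2011 and MSRA-TD500 databases show that, compared with other text detection methods, this method effectively improves detection accuracy.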
13.
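Patterns are proposed as a knowledge representation for complex-type data, and, in combination with structured data mining, a structural model for knowledge discovery in complex-type data is given: the discovery characteristic subspace model (DCSSM). On this basis, methods for text feature extraction and text mining are discussed.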
14.
Making use of special tree search algorithms, the present paper describes two new methods for determining all maximal complete subgraphs (cliques) of a finite nondirected graph. In both methods the blockwise generation of all cliques induces characteristic properties, which guarantee an efficient calculation of special clique subsets, especially the set of all cliques of maximal length. Moreover, by their structure both algorithms allow the complete clique set to be calculated by parallel processing. The algorithms have been tested for many series of characteristic graphs and compared with the algorithm of Bron-Kerbosch (Algorithm 457 of CACM), the most efficient algorithm known to the authors.
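For orientation, the Bron-Kerbosch baseline named in this abstract can be sketched as below. This is the classical recursion with a pivoting refinement, not either of the two new methods the paper describes; the adjacency format (a dict mapping each vertex to its neighbour set) and the function name are assumptions made for illustration.

```python
def bron_kerbosch(adj):
    """Return all maximal cliques of an undirected graph without self-loops.

    adj: dict mapping each vertex to the set of its neighbours
         (an assumed input format, not taken from the paper).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r,
        # x: vertices already explored (used to reject non-maximal cliques)
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates,
        # so that fewer branches need to be explored.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
maximal_cliques = bron_kerbosch(graph)
```

On this small graph the maximal cliques are the triangle on vertices 1, 2, 3 and the edge between 3 and 4.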
15.
We present the results of a user study that compares different ways of representing Dual-Scale data charts. Dual-Scale charts incorporate two different data resolutions into one chart in order to emphasize data in regions of interest or to enable the comparison of data from distant regions. While some design guidelines exist for these types of charts, there is currently little empirical evidence on which to base their design. We fill this gap by discussing the design space of Dual-Scale cartesian-coordinate charts and by experimentally comparing the performance of different chart types with respect to elementary graphical perception tasks such as comparing lengths and distances. Our study suggests that cut-out charts which include collocated full context and focus are the best alternative, and that superimposed charts in which focus and context overlap on top of each other should be avoided.
16.
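To browse the large volume of Chinese text on the web quickly and with high quality, a visual browsing mechanism based on a concave-convex tree structure for text is proposed, and its formal description is given. A hierarchical tree for text classification is built by identifying nodes with the minimal concept set annotated by keywords and a concept dictionary, giving users an effective path for browsing text quickly. Text summaries are extracted by statistical methods, and text content is browsed by outline, logical topic-word paragraphs, and summary, which increases search and query speed and reading efficiency and meets users' need to browse text quickly and actively.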
17.
Text detection is important in the retrieval of texts from digital pictures, video databases and webpages. However, it can be very challenging since the text is often embedded in a complex background. In this paper, we propose a classification-based algorithm for text detection using a sparse representation with discriminative dictionaries. First, the edges are detected by the wavelet transform and scanned into patches by a sliding window. Then, candidate text areas are obtained by applying a simple classification procedure using two learned discriminative dictionaries. Finally, the adaptive run-length smoothing algorithm and projection profile analysis are used to further refine the candidate text areas. The proposed method is evaluated on the Microsoft common test set, the ICDAR 2003 text locating set, and an image set collected from the web. Extensive experiments show that the proposed method can effectively detect texts of various sizes, fonts and colors from images and videos.
18.
Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. The Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale for emerging large-scale data. Scalable parallel algorithms are necessary to process large graph datasets. In this work, we show a comparative analysis of our different parallel implementations of the Louvain algorithm. We design parallel algorithms for the Louvain method in shared memory and distributed memory settings. Developing distributed memory parallel algorithms is challenging because of inter-process communication and load balancing issues. We incorporate dynamic load balancing in our final algorithm DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup scaling to a larger number of processors. We also compare the performance of our algorithm with some other prominent algorithms in the literature and get better or comparable performance. We identify the challenges in developing a distributed memory algorithm and provide an optimized solution, DPLAL, showing performance analysis of the algorithm on large-scale real-world networks from different domains.
19.
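Text mining refers to the use of data mining techniques to automatically discover and extract, from text data, implicit knowledge in a document collection that is independent of users' information needs. Chinese text data is obtained through Chinese information processing technology, which makes automatic word segmentation a fundamental topic in Chinese information processing. For applications that process massive amounts of information, segmentation speed is extremely important and strongly affects the efficiency of the whole system. Several common word segmentation methods are analyzed, and an automatic Chinese word segmentation system based on the forward maximum matching method is designed. To improve segmentation accuracy, algorithms for strengthened ambiguity resolution and word optimization are studied.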
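The forward maximum matching method on which this system is based can be illustrated with a minimal sketch. It shows only the textbook greedy scan, without the ambiguity resolution or word optimization the abstract mentions; the function name, the `max_len` parameter, and the tiny dictionary are hypothetical and chosen for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text by greedily taking the longest dictionary word at each position.

    dictionary: a set of known words (hypothetical toy lexicon in the example).
    max_len: longest candidate to try; defaults to the longest dictionary word.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


lexicon = {"研究", "研究生", "生命", "起源"}
segments = forward_max_match("研究生命起源", lexicon)
# segments == ["研究生", "命", "起源"]
```

The example shows why ambiguity resolution matters: the greedy scan takes 研究生 first and strands 命, whereas the intended reading is 研究 / 生命 / 起源.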