20 similar documents found; search time: 31 ms
1.
Learning to integrate web taxonomies (Total citations: 1; self-citations: 0; by others: 1)
We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy. This problem is not only currently pervasive on the Web, but is also important to the emerging Semantic Web. A straightforward approach to automating this process would be to build classifiers through machine learning and then use these classifiers to classify objects from the source taxonomies into categories of the master taxonomy. However, conventional machine learning algorithms totally ignore the availability of the source taxonomies. In fact, source and master taxonomies often have common categories under different names or other more complex semantic overlaps. We introduce two techniques that exploit the semantic overlap between the source and master taxonomies to build better classifiers for the master taxonomy. The first technique, Cluster Shrinkage, biases the learning algorithm against splitting source categories by making objects in the same category appear more similar to each other. The second technique, Co-Bootstrapping, tries to facilitate the exploitation of inter-taxonomy relationships by providing category indicator functions as additional features for the objects. Our experiments with real-world Web data show that these proposed add-on techniques can enhance various machine learning algorithms to achieve substantial improvements in performance for taxonomy integration.
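The Co-Bootstrapping idea lends itself to a compact illustration. Below is a minimal sketch (Python; the toy data, names, and choice of logistic regression are ours) of the feature-augmentation step: each object gains indicator features for its source-taxonomy category, so a master-taxonomy classifier can exploit inter-taxonomy overlap. The paper's full method derives the indicators from learned classifiers and iterates; this one-shot version only shows the mechanics.

```python
# Minimal sketch of the Co-Bootstrapping mechanics: objects gain 0/1
# indicator features for source-taxonomy categories. Toy data, names, and
# the logistic-regression choice are ours, not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

def add_category_indicators(X, source_cats, all_cats):
    """Append one 0/1 indicator column per source category to X."""
    ind = np.array([[1.0 if c == cat else 0.0 for cat in all_cats]
                    for c in source_cats])
    return np.hstack([X, ind])

# 4 objects, 3 base features, labeled in both taxonomies.
X = np.array([[1., 0., 0.], [1., 1., 0.], [0., 0., 1.], [0., 1., 1.]])
src = ["cameras", "cameras", "laptops", "laptops"]   # source taxonomy
y = ["photo", "photo", "computers", "computers"]     # master taxonomy

cats = ["cameras", "laptops"]
clf = LogisticRegression().fit(add_category_indicators(X, src, cats), y)
new = add_category_indicators(np.array([[1., 0., 1.]]), ["cameras"], cats)
print(clf.predict(new))  # the source category nudges prediction toward "photo"
```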
2.
As software becomes more complex, more sophisticated development and maintenance methods are needed to ensure software quality. Computer-aided prototyping achieves this via quickly built and iteratively updated prototypes of the intended system. This process requires automated support for keeping track of many independent changes and for exploring different combinations of alternative changes and refinements. This article formalizes the update and change merging process, extends the idea to multiple changes to the same base prototype, and introduces a new method of slicing prototypes. Applications of this technology include automatic updating of different versions of existing software with changes made to the baseline version of the system, integrating changes made by different design teams during development, and checking consistency after integration of seemingly disjoint changes to the same software system.
3.
Hazem M. Bahig, The Journal of Supercomputing, 2008, 43(1): 99-104
In this paper, we study the merging of two sorted arrays on EREW PRAM under two restrictions: (1) the elements of the two arrays are taken from the integer range [1, n], where n = max(n1, n2); and (2) the elements are drawn from either a uniform distribution or a non-uniform distribution satisfying a per-processor balance condition for 1 ≤ i ≤ p, where p is the number of processors. We give a new optimal deterministic algorithm using p processors on EREW PRAM; under suitable conditions its running time is O(log^(g) n), which is faster than previous results, where log^(g) n = log log^(g−1) n for g > 1 and log^(1) n = log n. We also extend the domain of the input data to [1, n^k], where k is a constant.
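Two pieces of the abstract can be made concrete: the iterated logarithm log^(g) n from the stated bound, and the way the integer range [1, n] allows merging by counting rather than by comparison. The sketch below is a sequential illustration only, not the paper's EREW PRAM algorithm; the function names are ours.

```python
# Sequential illustration only (not the paper's EREW PRAM algorithm):
# the iterated logarithm log^(g) n from the stated bound, and a
# counting-based merge that exploits the integer range [1, n].
import math

def iterated_log(n, g):
    """log^(g) n: the logarithm applied g times (log^(1) n = log n)."""
    x = float(n)
    for _ in range(g):
        x = math.log2(x)
    return x

def range_merge(a, b, n):
    """Merge two sorted arrays whose elements lie in [1, n] by counting."""
    count = [0] * (n + 1)            # one slot per possible value
    for v in a:
        count[v] += 1
    for v in b:
        count[v] += 1
    out = []
    for v in range(1, n + 1):        # emit values in sorted order
        out.extend([v] * count[v])
    return out

print(range_merge([1, 3, 5], [2, 3, 6], 6))  # [1, 2, 3, 3, 5, 6]
print(iterated_log(2 ** 16, 2))              # log log 65536 = 4.0
```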
4.
While many ocean color satellite sensors (with more on the way) presently provide routine observations of ocean biological processes, limited concrete effort has been made to demonstrate how these data can be used together in any systematic way. One obvious way is to merge these data streams to produce robust merged climate data records with measurable uncertainty bounds. Here, we present and implement a formalism for merging global satellite ocean color data streams to produce uniform data products. Normalized water-leaving radiances (LwN(λ)) from SeaWiFS and MODIS are used together in a semianalytical ocean color merging model to produce global retrievals of three biogeochemically relevant variables (chlorophyll, combined dissolved and detrital absorption coefficient, particulate backscattering coefficient). The model-based merging approach has various benefits over techniques that blend end products such as chlorophyll concentrations: (1) merging at the level of water-leaving radiance ensures simultaneity and consistency of the retrievals; (2) it works with single or multiple data sources regardless of their specific bands; (3) it exploits band redundancies and band differences; (4) it can account for the uncertainties of the incoming LwN(λ) data streams; and (5) it provides confidence intervals for the derived products. These features are illustrated through several examples of ocean color data merging using SeaWiFS and MODIS Terra and Aqua LwN(λ) imagery. Compared to each of the original data sources, the products derived from the merging procedure show enhanced global daily coverage and lower uncertainties in the retrieved variables.
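Points (4) and (5) above, uncertainty handling and confidence intervals, can be illustrated with standard inverse-variance weighting of two overlapping radiance measurements. This is only a sketch of the merging arithmetic under an assumed independent-Gaussian error model; the paper's semianalytical model, which inverts the merged LwN(λ) into the three biogeochemical variables, is not reproduced here, and the numbers are illustrative.

```python
# Inverse-variance weighted merge of per-sensor LwN(lambda) values at one
# band, with a confidence interval for the merged value. Error model and
# numbers are illustrative assumptions, not the paper's actual model.
import math

def merge_radiances(values, sigmas):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / s ** 2 for s in sigmas]
    merged = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    sigma = math.sqrt(1.0 / sum(weights))
    return merged, sigma

# SeaWiFS and MODIS LwN at a hypothetical band, with per-sensor uncertainty:
value, err = merge_radiances([1.82, 1.76], [0.10, 0.07])
print(f"merged LwN = {value:.3f} +/- {1.96 * err:.3f} (95% CI)")
```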
5.
6.
Comparison of stream merging algorithms for media-on-demand (Total citations: 1; self-citations: 0; by others: 1)
Stream merging is a technique for efficiently delivering popular media-on-demand using multicast and client buffers. Recently, several algorithms for stream merging have been proposed, and in this paper we perform a comprehensive comparison of these algorithms. We present the differences in philosophy and mechanics among the various algorithms and illustrate the tradeoffs between their system complexity and performance. We measure performance in total, maximum, and time-varying server bandwidth usage under different assumptions about client request patterns. We also consider the effects on clients when the server has limited bandwidth. The result of this study is a deeper understanding of the system complexity and performance tradeoffs for the various algorithms.
Amotz Bar-Noy: This work was done in part while the author was a member of AT&T Labs-Research, Shannon Lab, Florham Park, NJ.
Justin Goshi: Corresponding author. This work was done in part at AT&T Labs-Research, Shannon Lab, Florham Park, NJ.
Richard E. Ladner: This work was done in part at AT&T Labs-Research, Shannon Lab, Florham Park, NJ, and was partially supported by NSF grants No. CCR-9732828 and CCR-0098012.
A preliminary version of this paper appeared in Multimedia Computing and Networking 2002.
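To make the bandwidth accounting concrete, the sketch below contrasts plain unicast delivery with patching, a simple policy closely related to the stream merging family compared in the paper: a late client joins the most recent full multicast stream and receives only a patch for the prefix it missed. The request times and media length are illustrative, and none of the paper's specific algorithms is reproduced.

```python
# Server bandwidth (in stream-seconds) for unicast vs. simple patching.
# Patching is a related policy used here only to illustrate why merging
# client streams saves server bandwidth; arrival times are illustrative.
def unicast_cost(arrivals, L):
    return len(arrivals) * L                 # every request costs a full stream

def patching_cost(arrivals, L):
    total, full_start = 0, None
    for t in sorted(arrivals):
        if full_start is None or t - full_start >= L:
            full_start = t                   # start a new full multicast stream
            total += L
        else:
            total += t - full_start          # patch covers the missed prefix
    return total

arrivals = [0, 2, 5, 9, 31, 33]              # request times (seconds)
L = 30                                       # media length (seconds)
print(unicast_cost(arrivals, L), patching_cost(arrivals, L))  # 180 vs 78
```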
7.
Updating generalized association rules with evolving taxonomies (Total citations: 1; self-citations: 1; by others: 1)
Mining generalized association rules among items in the presence of taxonomies has been recognized as an important model for data mining. Earlier work on mining generalized association rules, however, required the taxonomies to be static, ignoring the fact that the taxonomies of items cannot necessarily be kept unchanged. For instance, some items may be reclassified from one hierarchy tree to another for more suitable classification, abandoned from the taxonomies if they will no longer be produced, or added into the taxonomies as new items. Additionally, analysts might have to dynamically adjust the taxonomies from different viewpoints so as to discover more informative rules. Under these circumstances, effectively updating the discovered generalized association rules is a crucial task. In this paper, we examine this problem and propose two novel algorithms, called Diff_ET and Diff_ET2, to update the discovered frequent itemsets. Empirical evaluation shows that the proposed algorithms are very effective and have good linear scale-up characteristics.
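The setting can be illustrated compactly: generalized support is counted over transactions extended with taxonomy ancestors, so an edit to the taxonomy changes which generalized itemsets are frequent. The sketch below (Python; data and names are ours) recomputes support from scratch for clarity, whereas Diff_ET and Diff_ET2 update the discovered itemsets incrementally.

```python
# Generalized support over taxonomy-extended transactions. Recomputing from
# scratch is for illustration only; the paper's algorithms are incremental.
def extend(transaction, parent):
    """Add all taxonomy ancestors of each item to the transaction."""
    out = set(transaction)
    for item in transaction:
        while item in parent:
            item = parent[item]
            out.add(item)
    return out

def support(itemset, transactions, parent):
    ext = [extend(t, parent) for t in transactions]
    return sum(itemset <= e for e in ext) / len(ext)

tx = [{"cola"}, {"espresso"}, {"cola", "chips"}]
old_tax = {"cola": "soft-drink", "espresso": "coffee"}
new_tax = {"cola": "beverage", "espresso": "beverage"}  # reclassified items
print(support({"soft-drink"}, tx, old_tax))  # 2/3 under the old taxonomy
print(support({"beverage"}, tx, new_tax))    # 3/3 under the new taxonomy
```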
8.
This paper addresses the problem of grid map merging for multi-robot systems, which can be resolved by acquiring the map transformation matrix (MTM) among robot maps. Without initial correspondence or any rendezvous among robots, the only way to acquire the MTM is to find and match the common regions of the individual robot maps. This paper proposes a novel map merging technique that is capable of merging individual robot maps by matching the spectral information of the maps. The proposed technique extracts the spectra of the robot maps and enhances the extracted spectra using visual landmarks. The MTM is then accurately acquired by finding the maximum cross-correlation among the enhanced spectra. Experimental results in outdoor environments show that the proposed technique performed successfully, and a comparison shows that map merging errors were significantly reduced by the proposed technique.
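The cross-correlation step can be sketched under a simplifying assumption: take a map's "spectrum" to be a circular histogram of wall orientations, so the relative rotation between two maps is the circular shift that maximizes the correlation of their spectra. The paper's actual spectra and their landmark-based enhancement are richer than this toy version, and the arrays below are illustrative.

```python
# Toy version of the spectral matching step: the rotation between two grid
# maps is recovered as the circular shift maximizing the cross-correlation
# of their orientation spectra. Spectra here are illustrative assumptions.
import numpy as np

def best_rotation(spec_a, spec_b):
    """Shift s maximizing correlation of spec_b with spec_a rotated by s."""
    scores = [np.dot(spec_b, np.roll(spec_a, s)) for s in range(len(spec_a))]
    return int(np.argmax(scores))

# 36-bin spectra, 10 degrees per bin; map B is map A rotated by 90 degrees.
spec_a = np.zeros(36)
spec_a[0], spec_a[9], spec_a[18], spec_a[27] = 5.0, 3.0, 4.0, 2.0
spec_b = np.roll(spec_a, 9)
print(f"estimated rotation: {best_rotation(spec_a, spec_b) * 10} degrees")
```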
9.
Carl G. Looney, Pattern Recognition, 2002, 35(11): 2413-2423
Major problems exist in both crisp and fuzzy clustering algorithms. The fuzzy c-means type of algorithms use weights determined by a power m of inverse distances that remains fixed over all iterations and over all clusters, even though smaller clusters should have a larger m. Our method uses a different "distance" for each cluster that changes over the early iterations to fit the clusters. Comparisons show improved results. We also address other perplexing problems in clustering: (i) finding the optimal number K of clusters; (ii) assessing the validity of a given clustering; (iii) preventing the selection of seed vectors as initial prototypes from affecting the clustering; (iv) preventing the order of merging from affecting the clustering; and (v) permitting the clusters to form more natural shapes rather than forcing them into normed balls of the distance function. We employ a relatively large number K of uniformly randomly distributed seeds and then thin them to leave fewer uniformly distributed seeds. Next, the main loop iterates by assigning the feature vectors and computing new fuzzy prototypes. Our fuzzy merging then merges any clusters that are too close to each other. We use a modified Xie-Beni validity measure as the goodness-of-clustering measure for multiple values of K in a user-interaction approach where the user selects two parameters (for eliminating clusters and merging clusters after viewing the results thus far). The algorithm is compared with the fuzzy c-means on the iris data and on the Wisconsin breast cancer data.
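The fuzzy-merging step admits a small sketch: prototypes closer than a user-chosen threshold are merged, weighted by cluster size. Everything here (the threshold, the data, the size-weighted averaging) is an illustrative assumption; the paper's per-cluster adaptive distances and modified Xie-Beni validity measure are omitted.

```python
# Greedy merging of cluster prototypes that are too close to each other,
# weighted by cluster size. Threshold and data are illustrative; this is
# only the merging step, not the paper's full clustering algorithm.
import numpy as np

def merge_close_prototypes(protos, sizes, threshold):
    protos, sizes = list(map(np.asarray, protos)), list(sizes)
    merged = True
    while merged:
        merged = False
        for i in range(len(protos)):
            for j in range(i + 1, len(protos)):
                if np.linalg.norm(protos[i] - protos[j]) < threshold:
                    w = sizes[i] + sizes[j]
                    protos[i] = (sizes[i] * protos[i] + sizes[j] * protos[j]) / w
                    sizes[i] = w
                    del protos[j], sizes[j]
                    merged = True
                    break
            if merged:
                break
    return protos, sizes

protos = [np.array([0.0, 0.0]), np.array([0.4, 0.0]), np.array([5.0, 5.0])]
protos, sizes = merge_close_prototypes(protos, [10, 5, 8], threshold=1.0)
print(len(protos), protos, sizes)   # the two nearby prototypes are merged
```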
10.
Interest in database support for engineering applications is rapidly growing. In this paper we concentrate on conceptual database design and address the question of what a semantic model should look like that meets the needs of engineering applications and is sufficiently formal to support validation, optimization, analysis, and transformation to an implementation schema. We present several case studies of engineering databases in order to determine the major modelling requirements, and compare these to modelling concepts from the database and knowledge representation fields. We demonstrate that the main issue is not adding further concepts, but integrating the existing ones in a selective and precise fashion. We suggest doing so by tailoring the semantic model, starting from a set of base concepts and extending these. An initial model and an extensibility mechanism allowing an explicit and declarative definition of higher-order abstractions are presented. This is demonstrated by specifying some simple concepts such as generalization as well as a more complex time concept for image sequence evaluation.
11.
Second-order tuple generating dependencies (SO tgds) were introduced by Fagin et al. to capture the composition of simple schema mappings. Testing the equivalence of SO tgds would be important for applications like model management and mapping optimization. However, we prove the undecidability of the logical equivalence of SO tgds. Moreover, under weak additional assumptions, we also show the undecidability of a relaxed notion of equivalence between two SO tgds, namely the so-called conjunctive query equivalence.
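For concreteness, an SO tgd existentially quantifies function symbols over a conjunction of implications whose premises may include equalities between terms. The example below follows the shape of the well-known employee/manager example of Fagin et al.; the relation names are illustrative.

```latex
% An SO tgd in the form introduced by Fagin et al.: existentially
% quantified function symbols over a conjunction of implications whose
% premises may contain equalities. Relation names are illustrative.
\exists f \Big(
    \forall e \, \big( \mathrm{Emp}(e) \rightarrow \mathrm{Mgr}(e, f(e)) \big)
    \;\wedge\;
    \forall e \, \big( \mathrm{Emp}(e) \wedge (e = f(e))
        \rightarrow \mathrm{SelfMgr}(e) \big)
\Big)
```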
12.
NoSQL systems have gained popularity for many reasons, including the flexibility they provide in organizing data, as they relax the rigidity imposed by the relational model and by the other structured models. This flexibility, and the heterogeneity that has emerged in the area, has led to little use of traditional modeling techniques, in contrast to what has happened with databases for decades.
In this paper, we argue that traditional notions related to data modeling can be useful in this context as well. Specifically, we propose NoAM (NoSQL Abstract Model), a novel abstract data model for NoSQL databases, which exploits the commonalities of various NoSQL systems. We also propose a database design methodology for NoSQL systems based on NoAM, with initial activities that are independent of the specific target system. NoAM is used to specify a system-independent representation of the application data; this intermediate representation can then be implemented in target NoSQL databases, taking into account their specific features. Overall, the methodology aims at supporting scalability, performance, and consistency, as needed by next-generation web applications.
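A hedged sketch of the kind of system-independent intermediate representation the methodology calls for: application objects grouped into aggregates, each flattened into (collection, aggregate-id, entry-key, value) tuples that a key-value, document, or column-family store could then implement. The field names and flattening granularity below are illustrative, not NoAM's actual specification.

```python
# Flatten application aggregates into system-independent entries that a
# key-value, document, or column-family store could implement. Field names
# and granularity are illustrative, not NoAM's actual specification.
def to_entries(collection, agg_id, obj, prefix=""):
    """Flatten a nested dict into per-field entries for one aggregate."""
    for key, val in obj.items():
        path = f"{prefix}{key}"
        if isinstance(val, dict):
            yield from to_entries(collection, agg_id, val, path + ".")
        else:
            yield (collection, agg_id, path, val)

player = {"username": "mary", "score": 42,
          "games": {"g1": "won", "g2": "lost"}}
for entry in to_entries("Player", "mary", player):
    print(entry)
# A key-value implementation could then use keys like "Player/mary/username".
```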
13.
Research and Implementation of Data Exchange Among Multiple Information Systems (Total citations: 1; self-citations: 0; by others: 1)
Based on a study of the requirements for data exchange among multiple information systems, this paper presents a secure, simple, and flexible solution for exchanging data between information systems built on heterogeneous database systems. On this basis, the design was successfully implemented as part of the information system development at Beijing University of Technology.
14.
To address the density of rock-cutting particles and the complexity of their surface textures, this paper proposes a segmentation method based on entropy rate superpixel segmentation and region merging. Entropy rate superpixel segmentation divides the image into a series of compact, regionally consistent regions, which both locates edges accurately and reduces the computational complexity of subsequent processing. To handle the resulting over-segmentation, a merging criterion combining color histograms and shape information is proposed, and fast region merging based on a RAG (region adjacency graph) structure yields the final segmentation. Experimental results show that the method achieves good results on rock-cutting particle images.
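The merging criterion can be sketched with color histograms alone: adjacent regions (edges of the region adjacency graph) are merged while their histogram distance stays below a threshold. The shape term of the paper's criterion and the entropy rate superpixel stage are omitted; the data and threshold below are illustrative.

```python
# Greedy RAG merging using only color-histogram similarity (the paper's
# criterion also folds in shape information, omitted here). Data and
# threshold are illustrative.
import numpy as np

def hist_distance(h1, h2):
    """L1 distance between normalized color histograms."""
    return float(np.abs(h1 / h1.sum() - h2 / h2.sum()).sum())

def merge_regions(hists, edges, threshold):
    """Union-find over regions connected by sufficiently similar edges."""
    parent = list(range(len(hists)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj and hist_distance(hists[ri], hists[rj]) < threshold:
            parent[rj] = ri
            hists[ri] = hists[ri] + hists[rj]   # pooled histogram
    return [find(i) for i in range(len(hists))]

hists = [np.array([9., 1.]), np.array([8., 2.]), np.array([1., 9.])]
print(merge_regions(hists, [(0, 1), (1, 2)], threshold=0.5))  # [0, 0, 2]
```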
15.
16.
Haixun Wang, Chang-Shing Perng, Sheng Ma, Philip S. Yu, Knowledge and Information Systems, 2005, 8(1): 82-102
Frequent itemset mining aims at discovering patterns whose supports are beyond a given threshold. In many applications, including the network event management systems that motivated this work, patterns are composed of items each described by a subset of the attributes of a relational table. Because this involves an exponential mining space, the efficient implementation of user preferences and mining constraints becomes the first priority for a mining algorithm. User preferences and mining constraints are often expressed in terms of the attribute structures of patterns. Unlike traditional methods that mine all frequent patterns indiscriminately, we regard frequent itemset mining as a two-step process: the mining of the pattern structures and the mining of patterns within each pattern structure. In this paper, we present a novel architecture that uses pattern structures to organize the mining space. In comparison with previous techniques, the advantage of our approach is twofold: (i) by exploiting the interrelationships among pattern structures, execution times for mining can be reduced significantly; and (ii) more importantly, it enables us to incorporate high-level simple user preferences and mining constraints into the mining process efficiently. These advantages are demonstrated by our experiments using both synthetic and real-life datasets.
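The two-step view can be sketched by treating a pattern structure simply as a subset of attributes: structures are enumerated first, a user constraint can prune whole structures before any counting, and frequent value patterns are then mined within each surviving structure. The data, constraint, and representation below are illustrative simplifications, not the paper's architecture.

```python
# Two-step mining sketch: enumerate pattern structures (attribute subsets),
# prune structures by a user constraint, then count value patterns within
# each surviving structure. Data and constraint are illustrative.
from itertools import combinations
from collections import Counter

rows = [{"host": "a", "event": "down", "sev": "hi"},
        {"host": "a", "event": "down", "sev": "hi"},
        {"host": "b", "event": "up",   "sev": "lo"}]
attrs = ["host", "event", "sev"]
min_support = 2
wanted = lambda struct: "event" in struct      # a simple user constraint

for size in range(1, len(attrs) + 1):
    for struct in combinations(attrs, size):   # a pattern structure
        if not wanted(struct):                 # pruned wholesale, no counting
            continue
        counts = Counter(tuple(r[a] for a in struct) for r in rows)
        for pattern, n in counts.items():
            if n >= min_support:
                print(struct, pattern, n)
```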
17.
18.
In this paper we introduce the database design tool EDO: an Evolutionary Database Optimizer. The term ‘evolutionary’ refers to a basic feature of the tool. After generating an initial pool of preliminary internal representations for a given conceptual data model, EDO allows a database designer to activate evolution strategies, modifying the preliminary internal representations into more desirable ones.
The quality of the internal representations found so far is used to perform a guided walk through the solution space of alternative internal representations for the conceptual model under consideration. This quality (called fitness) takes into account the expected storage space and the expected average response time of a candidate internal representation.
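A toy sketch of the evolutionary loop and the fitness just described: candidates are scored by a weighted sum of an (assumed) storage-space estimate and an (assumed) response-time estimate, and fitter mutations replace the worst member of the pool. The encoding, mutation operator, and cost models below are stand-ins, not EDO's.

```python
# Evolution-strategy loop over candidate internal representations, with a
# fitness combining storage space and average response time. The encoding,
# mutation, and cost models are toy stand-ins, not EDO's actual ones.
import random

random.seed(1)

def fitness(rep, alpha=0.5):
    """Lower is better: weighted storage plus response-time estimate."""
    storage = sum(rep)                            # assumed storage cost
    response = sum(10.0 / (1 + x) for x in rep)   # more structure, faster reads
    return alpha * storage + (1 - alpha) * response

def mutate(rep):
    i = random.randrange(len(rep))
    return rep[:i] + [max(0, rep[i] + random.choice([-1, 1]))] + rep[i + 1:]

pool = [[random.randint(0, 5) for _ in range(4)] for _ in range(6)]
for _ in range(200):                              # guided walk iterations
    child = mutate(min(pool, key=fitness))        # mutate the current best
    worst = max(pool, key=fitness)
    if fitness(child) < fitness(worst):           # survival of the fitter
        pool[pool.index(worst)] = child
best = min(pool, key=fitness)
print(best, round(fitness(best), 2))
```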
19.