
1.

Graph vs. bag representation models for the topic classification of web documents

George Papadakis George Giannakopoulos Georgios Paliouras 《World Wide Web》2016,19(5):887-920

Text classification constitutes a popular task in Web research, with applications ranging from spam filtering to sentiment analysis. In this paper, we argue that its performance depends on the quality of Web documents, which varies significantly. For example, the curated content of news articles poses different challenges than the user-generated content of blog posts and Social Media messages. We experimentally verify our claim, quantifying the main factors that affect the performance of text classification. We also argue that the established bag-of-words representation models are inadequate for handling all document types, as they merely extract frequent, yet distinguishing, terms from the textual content of the training set. Thus, they suffer from low robustness in the context of noisy or unseen content, unless they are enriched with contextual, application-specific information. In their place, we propose the use of n-gram graphs, a model that goes beyond the bag-of-words representation by transforming every document into a graph: its nodes correspond to character or word n-grams, and co-occurring n-grams are connected by weighted edges. Individual document graphs can be combined into class graphs, and graph similarities are employed to position and classify documents in the vector space. This approach offers two advantages with respect to bag models. First, classification accuracy increases due to the contextual information that is encapsulated in the edges of the n-gram graphs. Second, it reduces the search space to a limited set of robust, endogenous features that depend on the number of classes rather than on the size of the vocabulary. Our thorough experimental study over three large, real-world corpora confirms the superior performance of n-gram graphs across the main types of Web documents.
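A minimal sketch of the n-gram graph idea follows. It assumes character n-grams and directed edges between n-grams that start within a fixed window of each other. For similarity it uses the value-similarity measure commonly paired with n-gram graphs. The window size, the averaging used to build a class graph, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
from collections import Counter

def ngram_graph(text: str, n: int = 3, window: int = 3) -> Counter:
    """Character n-gram graph: nodes are n-grams; the weight of edge (a, b)
    counts how often n-gram b starts within `window` positions after a."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges: Counter = Counter()
    for i, a in enumerate(grams):
        for b in grams[i + 1:i + 1 + window]:
            edges[(a, b)] += 1
    return edges

def merge_graphs(graphs: list) -> Counter:
    """Hypothetical class graph: average edge weights over the member documents."""
    total: Counter = Counter()
    for g in graphs:
        total.update(g)
    return Counter({e: w / len(graphs) for e, w in total.items()})

def value_similarity(g1: Counter, g2: Counter) -> float:
    """Sum of min/max weight ratios over shared edges, normalised by the larger graph."""
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    score = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return score / max(len(g1), len(g2))

# Usage: one similarity feature per class, so the feature count tracks the classes.
docs = {
    "sports": ["the match ended in a draw", "a late goal won the match"],
    "finance": ["shares fell as markets closed", "the bank raised interest rates"],
}
class_graphs = {c: merge_graphs([ngram_graph(d) for d in ds]) for c, ds in docs.items()}
query = ngram_graph("a goal in the final match")
features = {c: value_similarity(query, g) for c, g in class_graphs.items()}
```

The resulting `features` vector has one entry per class regardless of vocabulary size, which is the property the abstract highlights. A standard classifier can then be trained on it.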

2.

Stemming Hausa text: using affix-stripping rules and reference look-up

Andrew Bimba Norisma Idris Norazlina Khamis Nurul Fazmidar Mohd Noor 《Language Resources and Evaluation》2016,50(3):687-703

Stemming is the process of reducing a derived or inflected word to its root or stem by stripping all of its affixes. It has been used as a pre-processing step in applications such as information retrieval, machine translation, and text summarization to increase efficiency. Stemming algorithms have been developed for languages such as English, Arabic, Turkish, Malay, and Amharic. Unfortunately, no algorithm has yet been used to stem text in Hausa, a Chadic language spoken in West Africa. To address this need, we propose stemming Hausa text using affix-stripping rules and reference look-up. We stemmed Hausa text using 78 affix-stripping rules applied in 4 steps and a reference look-up consisting of 1500 Hausa root words. The over-stemming index, under-stemming index, stemmer weight, word stemmed factor, correctly stemmed words factor, and average words conflation factor were calculated to determine the effect of reference look-up on the strength and accuracy of the stemmer. It was observed that reference look-up reduced both over-stemming and under-stemming errors, increased accuracy, and tended to reduce the strength of an affix-stripping stemmer. The rationale behind the approach is discussed, and directions for future research are identified.
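A minimal sketch of how ordered affix-stripping steps can be combined with a root look-up is shown below. The stemmer stops as soon as the current form is a known root, so the look-up guards against over-stemming. The rule format, `min_stem`, and the tiny `STEPS`/`ROOTS` tables are hypothetical placeholders. They are not the paper's 78 rules or its 1500-word root list.

```python
def stem(word: str, steps: list, roots: set, min_stem: int = 2) -> str:
    """Affix-stripping stemmer with reference look-up.

    `steps` is an ordered list of rule groups; each rule is ("prefix" | "suffix", affix).
    At most one rule fires per step, and stemming stops early on a known root.
    """
    w = word.lower()
    for rules in steps:
        if w in roots:  # reference look-up: never strip a known root further
            return w
        for kind, affix in rules:
            rest = len(w) - len(affix)
            if rest < min_stem:  # refuse to leave a stem that is too short
                continue
            if kind == "suffix" and w.endswith(affix):
                w = w[:rest]
                break
            if kind == "prefix" and w.startswith(affix):
                w = w[len(affix):]
                break
    return w

# Placeholder rules and roots, for illustration only.
STEPS = [[("suffix", "n")], [("suffix", "a")]]
ROOTS = {"gida"}
```

With the look-up, `stem("gidan", STEPS, ROOTS)` stops at `"gida"`. Without it (an empty root set), the second step strips one more letter and over-stems to `"gid"`. This mirrors the reduction in over-stemming reported in the abstract.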

3.

Efficient extraction of spatial relations for extended objects vis-à-vis human activity recognition in video

Shobhanjana Kalita Arindam Karmakar Shyamanta M. Hazarika 《Applied Intelligence》2018,48(1):204-219


4.

Finding maximal ranges with unique topics in a text database

Zhihui Yang Huixin Ma Zhenying He X. Sean Wang 《World Wide Web》2018,21(2):289-310

Recent years have witnessed the rapid growth of text data and, thus, the increasing importance of in-depth analysis of text data for various applications. Text data are often organized in a database with documents labeled by attributes such as time and location. Different documents manifest different topics. The topics of the documents may change along the attributes of the documents, and such changes have been studied in the past. However, previous analysis techniques, such as topic detection and tracking, topic lifetime, and burstiness, all focus on the topic behavior of the documents in a given attribute range without contrasting them with the documents in the overall range. This paper introduces the concept of unique topics, referring to topics that appear frequently only within a small range of documents but not in the whole range. These unique topics may reflect characteristics of the documents in this small range that are not found outside it. The paper presents an efficient pruning-based algorithm that, for a user-given set of keywords and a user-given attribute, finds the maximal ranges along the given attribute and their unique topics that are highly related to the given keyword set. Thorough experiments show that the algorithm is effective in various scenarios.

5.

On the Advice Complexity of the k-server Problem Under Sparse Metrics

Sushmita Gupta Shahin Kamali Alejandro López-Ortiz 《Theory of Computing Systems》2016,59(3):476-499

We consider the k-Server problem under the advice model of computation when the underlying metric space is sparse. On one side, we introduce Θ(1)-competitive algorithms for a wide range of sparse graphs. These algorithms require advice of (almost) linear size. We show that for graphs of size N and treewidth α, there is an online algorithm that receives O(n(log α + log log N)) bits of advice and optimally serves any sequence of length n. We also prove that if a graph admits a system of μ collective tree (q, r)-spanners, then there is a (q + r)-competitive algorithm that requires O(n(log μ + log log N)) bits of advice. Among other results, this gives a 3-competitive algorithm for planar graphs when provided with O(n log log N) bits of advice. On the other side, we prove that advice of size Ω(n) is required to obtain a 1-competitive algorithm for sequences of length n, even for the 2-server problem on a path metric of size N ≥ 3. Through another lower-bound argument, we show that at least $\frac{n}{2}(\log \alpha - 1.22)$ bits of advice are required to obtain an optimal solution for metric spaces of treewidth α, where 4 ≤ α < 2k.

6.

On the Smallest Size of an Almost Complete Subset of a Conic in PG(2, q) and Extendability of Reed–Solomon Codes

D. Bartoli A. A. Davydov S. Marcugini F. Pambianco 《Problems of Information Transmission》2018,54(2):101-115

In the projective plane PG(2, q), a subset S of a conic C is said to be almost complete if it can be extended to a larger arc in PG(2, q) only by the points of C \ S and, when q is even, by the nucleus of C. We obtain new upper bounds on the smallest size t(q) of an almost complete subset of a conic; in particular,

$$t(q) < \sqrt{q(3\ln q + \ln\ln q + \ln 3)} + \sqrt{\frac{q}{3\ln q}} + 4 \sim \sqrt{3q\ln q}, \qquad t(q) < 1.835\sqrt{q\ln q}.$$

The new bounds are used to extend the set of pairs (N, q) for which it is proved that every normal rational curve in the projective space PG(N, q) is a complete (q + 1)-arc or, equivalently, that no $[q+1, N+1, q-N+1]_q$ generalized doubly-extended Reed–Solomon code can be extended to a $[q+2, N+1, q-N+2]_q$ maximum distance separable code.
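To get a feel for the two bounds on t(q) quoted above, they can simply be evaluated with natural logarithms. The abstract does not state the range of q for which each bound holds, so the sketch below is arithmetic only, and the function names are hypothetical. For the sample values it shows the 1.835 bound is smaller at q = 101, while the first bound, which tends to √(3q ln q), is smaller by q = 10⁹.

```python
import math

def t_bound_general(q: float) -> float:
    """First bound: sqrt(q(3 ln q + ln ln q + ln 3)) + sqrt(q / (3 ln q)) + 4."""
    lq = math.log(q)
    return math.sqrt(q * (3 * lq + math.log(lq) + math.log(3))) + math.sqrt(q / (3 * lq)) + 4

def t_bound_simple(q: float) -> float:
    """Second bound: 1.835 * sqrt(q ln q)."""
    return 1.835 * math.sqrt(q * math.log(q))

def leading_term(q: float) -> float:
    """Asymptotic form of the first bound, sqrt(3 q ln q)."""
    return math.sqrt(3 * q * math.log(q))

for q in (101, 1009, 10**6, 10**9):
    print(q, round(t_bound_general(q), 1), round(t_bound_simple(q), 1), round(leading_term(q), 1))
```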


7.

Property-Preserving Data Reconstruction

Nir Ailon Bernard Chazelle Seshadhri Comandur Ding Liu 《Algorithmica》2008,51(2):160-182

We initiate a new line of investigation into online property-preserving data reconstruction. Consider a dataset that is assumed to satisfy various (known) structural properties; e.g., it may consist of sorted numbers, points on a manifold, vectors in a polyhedral cone, or codewords from an error-correcting code. Because of noise and errors, however, an (unknown) fraction of the data is deemed unsound, i.e., in violation of the expected structural properties. Can one still query the dataset in an online fashion and be provided with data that is always sound? In other words, can one design a filter which, when given a query to any item I in the dataset, returns a sound item J that, although not necessarily in the dataset, differs from I as infrequently as possible? No preprocessing should be allowed, and queries should be answered online. We consider the case of a monotone function. Specifically, the dataset encodes a function $f:\{1,\dots,n\} \to \mathbb{R}$ that is at (unknown) distance ε from monotone, meaning that f can (and must) be modified at εn places to become monotone. Our main result is a randomized filter that can answer any query in $O(\log^2 n \log\log n)$ time while modifying the function f at only O(εn) places. The amortized time over n function evaluations is O(log n). The filter works as stated with probability arbitrarily close to 1. We also provide an alternative filter with O(log n) worst-case query time and O(εn log n) function modifications. For reconstructing d-dimensional monotone functions of the form $f:\{1,\dots,n\}^d \to \mathbb{R}$, we present a filter that takes $2^{O(d)}(\log n)^{4d-2}\log\log n$ time per query and modifies at most $O(\varepsilon n^d)$ function values (for constant d).
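The filter above is randomized and online. As an offline point of reference, the quantity εn that it is measured against can be computed exactly. Over the reals, the minimum number of values that must change to make f non-decreasing equals n minus the length of a longest non-decreasing subsequence. The kept values stay fixed, and every other value can be set equal to a neighbouring kept value. The sketch below computes this in O(n log n) with patience sorting. It illustrates the distance only, not the paper's filter, and the function name is hypothetical.

```python
from bisect import bisect_right

def distance_to_monotone(values: list) -> int:
    """Minimum number of entries to modify so that `values` becomes non-decreasing."""
    # tails[k] = smallest possible last element of a non-decreasing subsequence of length k + 1
    tails: list = []
    for v in values:
        k = bisect_right(tails, v)  # bisect_right allows equal values (non-decreasing)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(values) - len(tails)
```

For example, `[1, 5, 2, 3, 4]` is at distance 1 (only the 5 must change), so ε = 1/5 for this f.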

8.

Enriching news events with meta-knowledge information

Paul Thompson Raheel Nawaz John McNaught Sophia Ananiadou 《Language Resources and Evaluation》2017,51(2):409-438


9.

The formula of the solution for some classes of initial boundary value problems for the hyperbolic equation with two independent variables

V. L. Pryadiev A. V. Pryadiev 《Automation and Remote Control》2007,68(2):337-350

A new representation of the solutions of initial boundary value problems for the equation $u_{xx}(x,t) + r(x)u_x(x,t) - q(x)u(x,t) = u_{tt}(x,t) + \mu(x)u_t(x,t)$ on a segment is proved (under boundary conditions of the first, second, or third kind in any combination). This representation has the form of a Riemann integral over the given segment, depending on x and t.

10.

Novel steganographic method based on generalized K-distance N-dimensional pixel matching

Bingwen Feng Wei Lu Wei Sun 《Multimedia Tools and Applications》2015,74(21):9623-9646

In this paper, a steganographic scheme adopting the concept of generalized $K_d$-distance N-dimensional pixel matching is proposed. The generalized pixel matching embeds a B-ary digit (B is a function of K and N) into a cover vector of length N, such that the embedding distortion, measured by the order-d Minkowski distance, is no larger than K. In contrast to other pixel-matching-based schemes, an N-dimensional reference table is used. By choosing d, K, and N adaptively, an embedding strategy suitable for any relative capacity can be developed. Additionally, an optimization algorithm, namely the successive iteration algorithm (SIA), is proposed to optimize the codeword assignment in the reference table. Benefiting from the high-dimensional embedding and the optimization algorithm, the scheme achieves nearly maximal embedding efficiency. Compared with other content-free steganographic schemes, the proposed scheme provides better image quality and statistical security. Moreover, when combined with image models, the proposed scheme performs comparably to state-of-the-art content-based approaches.
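For intuition about pixel matching, the sketch below implements the classic exploiting-modification-direction (EMD) embedding. It appears to be the simplest instance of the setting above: K = 1 under the order-1 Minkowski distance, so at most one pixel changes by ±1. EMD hides a (2N + 1)-ary digit in N pixels by making a weighted pixel sum congruent to the digit. This is not the paper's reference-table scheme. The function names are hypothetical, and clipping at pixel values 0 and 255 is ignored.

```python
def emd_extract(pixels: list) -> int:
    """Recover the hidden digit: sum of (i + 1) * p_i modulo 2N + 1."""
    n = len(pixels)
    return sum((i + 1) * p for i, p in enumerate(pixels)) % (2 * n + 1)

def emd_embed(pixels: list, digit: int) -> list:
    """Embed a digit in [0, 2N] by changing at most one pixel by +1 or -1."""
    n = len(pixels)
    base = 2 * n + 1
    s = (digit - emd_extract(pixels)) % base  # how far the extraction function must move
    out = list(pixels)
    if s == 0:
        return out
    if s <= n:
        out[s - 1] += 1  # raising pixel s (1-indexed) adds s
    else:
        out[base - s - 1] -= 1  # lowering pixel (2N + 1 - s) adds s modulo 2N + 1
    return out
```

With N = 3 pixels each embedding carries a 7-ary digit. The paper's generalization trades a larger K and a learned reference table for higher embedding efficiency.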

11.

Analysis of Multi-Sort Algorithm on Multi-Mesh of Trees (MMT) architecture

Nitin Rakesh Nitin 《The Journal of supercomputing》2011,57(3):276-313

Various sorting algorithms using parallel architectures have been proposed in the search for more efficient results. This paper introduces the Multi-Sort Algorithm for the Multi-Mesh of Trees (MMT) architecture for $N = n^4$ elements, with better time complexity than previous architectures. The shear sort algorithm on the Single Instruction Multiple Data (SIMD) mesh model requires $4\sqrt{N}+O(\sqrt{N})$ time to sort N elements arranged on a $\sqrt{N}\times \sqrt{N}$ mesh, whereas the Multi-Sort algorithm on the SIMD Multi-Mesh (MM) architecture takes $O(N^{1/4})$ time to sort the same N elements, which shows that Multi-Sort is a better sorting approach. We have improved the time complexity of the intra-block sort. The communication time complexity of the 2D sort is O(n) in MM, whereas it is O(log n) in MMT. The time complexity of the compare–exchange step in MMT is the same as in MM, i.e., O(n). Overall, the time complexity of Multi-Sort on MMT is improved relative to the Multi-Mesh architecture.
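For reference, the shear sort baseline mentioned above can be simulated sequentially. Rows are sorted in alternating directions and then columns are sorted top to bottom. This is repeated ⌈log₂ r⌉ + 1 times for r rows, and a final row pass leaves the grid sorted in snake-like (boustrophedon) order. The sketch checks only the data movement on an ordinary list of lists. It does not model the mesh's parallel timing, and the function names are hypothetical.

```python
import math

def shear_sort(grid: list) -> list:
    """Return a copy of `grid` sorted into snake-like row order by shear sort."""
    rows, cols = len(grid), len(grid[0])
    g = [list(r) for r in grid]
    phases = math.ceil(math.log2(rows)) + 1
    for _ in range(phases):
        for i in range(rows):  # even rows ascending, odd rows descending
            g[i].sort(reverse=(i % 2 == 1))
        for j in range(cols):  # every column ascending, top to bottom
            col = sorted(g[i][j] for i in range(rows))
            for i in range(rows):
                g[i][j] = col[i]
    for i in range(rows):  # final row pass
        g[i].sort(reverse=(i % 2 == 1))
    return g

def snake_order(grid: list) -> list:
    """Read the grid left-to-right on even rows and right-to-left on odd rows."""
    return [x for i, r in enumerate(grid) for x in (r if i % 2 == 0 else reversed(r))]
```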

12.

Directional Edge Boxes: Exploiting Inner Normal Direction Cues for Effective Object Proposal Generation


Xiang Bai Zheng Zhang Hong-Yang Wang Wei Shen 《Journal of Computer Science and Technology》2017,32(4):701-713

Edges are important cues for localizing object proposals. Recent progress on this problem has mostly been driven by defining effective objectness measures based on edge cues. In this paper, we develop a new representation named directional edges, in which each edge pixel is assigned a direction toward the object center, by learning a direction prediction model with convolutional neural networks in a holistic manner. Based on directional edges, two new objectness measures are designed for ranking object proposals. Experiments show that, with 1000 proposals on the PASCAL VOC 2007 test dataset, the proposed method achieves 97.1% object recall at an overlap threshold of 0.5 and 81.9% object recall at an overlap threshold of 0.7, which is superior to state-of-the-art methods.
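The recall figures above count a ground-truth object as found when at least one of the top-ranked proposals overlaps it with intersection-over-union (IoU) at or above the threshold. A minimal sketch of that metric follows. It assumes boxes given as continuous `(x1, y1, x2, y2)` coordinates with no +1 pixel convention, which the abstract does not specify. The function names are hypothetical.

```python
def _area(box: tuple) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])

def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0

def recall_at(gt_boxes: list, ranked_proposals: list, threshold: float = 0.5, top_k: int = 1000) -> float:
    """Fraction of ground-truth boxes matched by at least one of the top-k proposals."""
    if not gt_boxes:
        return 0.0
    props = ranked_proposals[:top_k]
    hits = sum(any(iou(g, p) >= threshold for p in props) for g in gt_boxes)
    return hits / len(gt_boxes)
```

Raising the threshold from 0.5 to 0.7 demands tighter boxes, which is why the reported recall drops from 97.1% to 81.9%.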

13.

Guaranteed approach with the farthest of the runaways

I. I. Shevchenko 《Automation and Remote Control》2008,69(5):828-844

Games of the family $\{\Lambda_N\}_{N \ge 2}$ are formulated and studied by applying a generalized Isaacs approach. The game $\Lambda_N$ is the simplest model of the counteraction between one pursuer P and a coalition $E^N$ of N runaways, for the case when the payoff is the distance to the coalition $E^N$, equal to the Euclidean distance between P and the farthest of the runaways; P controls the termination moment. Moreover, an approach is described that, in games with a smooth terminal payoff, generates strategies prescribing the players' motions in the directions of the local gradients of the payoff. This approach is used to construct pursuit strategies in games in which the payoffs are replaced by smooth approximations of the maximum of the Euclidean distances to the runaways. Pursuit strategies prescribing motion in the direction of the farthest of the runaways are studied. A numerical simulation of the games $\Lambda_2$ and $\Lambda_3$ is conducted with the players using different strategies.
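The "toward the farthest runaway" pursuit strategy can be sketched as a single discrete-time step. When the farthest runaway is unique, the payoff max‖P − Eᵢ‖ has gradient (P − E)/‖P − E‖ with respect to P, so moving straight toward that runaway is the direction of steepest decrease. The sketch holds the runaways fixed during the step. The step size, the function names, and the discrete-time setting are illustrative assumptions, not the paper's smoothed-gradient construction.

```python
import math

def farthest(p: tuple, runaways: list) -> tuple:
    """The runaway currently farthest from the pursuer."""
    return max(runaways, key=lambda e: math.dist(p, e))

def payoff(p: tuple, runaways: list) -> float:
    """Terminal payoff of Lambda_N: distance from P to the farthest runaway."""
    return math.dist(p, farthest(p, runaways))

def pursuit_step(p: tuple, runaways: list, speed: float, dt: float) -> tuple:
    """Move P by at most speed * dt straight toward the currently farthest runaway."""
    target = farthest(p, runaways)
    d = math.dist(p, target)
    if d == 0:
        return p
    step = min(speed * dt, d)
    return tuple(pc + step * (tc - pc) / d for pc, tc in zip(p, target))
```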

14.

Observational constraints on a hyperbolic potential in brane-world inflation

Z. Mounzi M. Ferricha-Alami A. Safsafi M. Bennai 《Gravitation and Cosmology》2017,23(1):84-89

We focus on the large-field case of a hyperbolic potential, characterized by a parameter f, in the framework of brane-world inflation in the Randall–Sundrum II model. From the observed form of the power spectrum $P_R(k)$, the parameter f should be of order $0.1\,m_p$ to $0.001\,m_p$, the brane tension must be in the range $\lambda \sim (1\text{--}10)\times 10^{57}\,\mathrm{GeV}^4$, and the energy scale is around $V_0^{1/4} \sim 10^{15}$ GeV. We find that the inflationary parameters ($n_s$, $r$, and $dn_s/d\ln k$) depend only on the number of e-folds N. These parameters are compatible with the latest Planck measurements for large values of N.

15.

The pursuit-evasion game on the 1-skeleton graph of a regular polyhedron. I

A. A. Azamov A. Sh. Kuchkarov A. G. Holboyev 《Automation and Remote Control》2017,78(4):754-761

We consider a game between a group of n pursuers and one evader, all moving with the same maximum velocity along the 1-skeleton graph of a regular polyhedron. The goal of the paper is to find, for each regular polyhedron M, a number N(M) with the following properties: if n ≥ N(M), the group of pursuers wins, while if n < N(M), the evader wins. Part I of the paper is devoted to polyhedra in ℝ³; Part II will be devoted to ℝ^d, d ≥ 5; and Part III, to ℝ⁴.

16.

On ruin probability minimization under excess reinsurance

Yu. G. Grigor’ev Dinh Le Son 《Automation and Remote Control》2007,68(6):1039-1054

The problem of ruin probability minimization in the Cramér–Lundberg risk model under excess reinsurance is studied. Along with the traditional maximization of the Lundberg characteristic coefficient R, we consider the problem of directly calculating the insurer's ruin probability $\psi_r(x)$ as a function of the initial capital x for a prescribed net-retention level r. To solve this problem, we propose an excess variant of the Cramér integral equation, which is equivalent to the Hamilton–Jacobi–Bellman equation. The continuation method is used to solve this equation; with it, an analytical solution is found for the Markov risk model. We demonstrate on a series of standard examples that, for any admissible value of x, the ruin probability $\psi_x(r) := \psi_r(x)$ is usually a unimodal function of r. The analytic representation of the ruin probability $\psi_r(x)$ is compared with its asymptotic approximation as x → ∞.
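A baseline with no reinsurance helps as a point of comparison. In the Cramér–Lundberg model with exponentially distributed claims of mean μ and premium loading θ, the ruin probability has the well-known closed form ψ(x) = e^{−Rx}/(1 + θ). The Lundberg (adjustment) coefficient here is R = θ/((1 + θ)μ), and it solves 1/(1 − μR) − 1 = (1 + θ)μR. The sketch below evaluates these formulas. It does not reproduce the paper's excess-reinsurance integral equation, and the parameter values in the usage note are arbitrary.

```python
import math

def adjustment_coefficient(theta: float, mu: float) -> float:
    """Lundberg coefficient R for exponential claims with mean mu and loading theta."""
    return theta / ((1 + theta) * mu)

def ruin_probability_exp(x: float, theta: float, mu: float) -> float:
    """Exact infinite-horizon ruin probability psi(x) for exponential claims, no reinsurance."""
    return math.exp(-adjustment_coefficient(theta, mu) * x) / (1 + theta)
```

For θ = 0.25 and μ = 2, R = 0.1 and ψ(0) = 0.8. Every value stays below the Lundberg bound e^{−Rx}, and maximizing R is the "traditional" criterion the abstract contrasts with computing ψ directly.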

17.

Two-armed bandit problem for parallel data processing systems

A. V. Kolnogorov 《Problems of Information Transmission》2012,48(1):72-84

We consider the application of the two-armed bandit problem to processing a large number N of data items, where two alternative processing methods can be used. We propose a strategy that, in its first stages (at most r − 1 of them), compares the methods and, at the final stage, applies only the better method found by the comparison. We find asymptotically optimal parameters of the strategy and observe that the minimax risk is of order $N^{\alpha}$, where $\alpha = 2^{r-1}/(2^r - 1)$. Under parallel processing, the total operation time is determined by the number r of stages, not by the number N of data items.
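The exponent α = 2^{r−1}/(2^r − 1) can be tabulated exactly to see the trade-off between the number of stages and the risk. With r = 1 there is no comparison stage and the risk grows linearly in N. Each extra stage lowers α, which approaches 1/2 from above. A small sketch with exact fractions (the function name is hypothetical):

```python
from fractions import Fraction

def risk_exponent(r: int) -> Fraction:
    """Exponent alpha in the minimax-risk order N**alpha for an r-stage strategy."""
    return Fraction(2 ** (r - 1), 2 ** r - 1)

for r in range(1, 7):
    a = risk_exponent(r)
    print(r, a, float(a))
```

The values for r = 1, 2, 3 are 1, 2/3, and 4/7. A few stages already recover most of the gap to the fully sequential order N^{1/2}.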

18.

Polar Codes with Higher-Order Memory

H. Afşer H. Deliç 《Problems of Information Transmission》2018,54(4):301-328

We introduce a construction of a set of code sequences $\{C_n^{(m)} : n \ge 1, m \ge 1\}$ with memory order m and code length N(n). $\{C_n^{(m)}\}$ is a generalization of the polar codes presented by Arıkan in [1], where the encoder mapping with length N(n) is obtained recursively from the encoder mappings with lengths N(n − 1) and N(n − m); $\{C_n^{(m)}\}$ coincides with the original polar codes when m = 1. We show that $\{C_n^{(m)}\}$ achieves the symmetric capacity I(W) of an arbitrary binary-input, discrete-output memoryless channel W for any fixed m. We also obtain an upper bound on the block-decoding error probability $P_e$ of $\{C_n^{(m)}\}$ and show that $P_e = O(2^{-N^\beta})$ is achievable for $\beta < 1/[1+m(\varphi - 1)]$, where $\varphi \in (1, 2]$ is the largest real root of the polynomial $F(m, \rho) = \rho^m - \rho^{m-1} - 1$. The encoding and decoding complexities of $\{C_n^{(m)}\}$ decrease with increasing m, which proves the existence of new polar coding schemes with lower complexity than Arıkan's construction.
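The error-exponent bound β < 1/[1 + m(φ − 1)] can be evaluated numerically. The root is written φ here; the symbol is garbled in the source. F(m, 1) = −1 < 0 and F(m, 2) = 2^{m−1} − 1 ≥ 0. F is also increasing for ρ ≥ 1, since its derivative is ρ^{m−2}(mρ − m + 1) > 0. So the largest real root is the unique root in (1, 2], and plain bisection finds it. The function names below are hypothetical.

```python
def largest_root(m: int, tol: float = 1e-12) -> float:
    """Largest real root of F(m, rho) = rho**m - rho**(m - 1) - 1, by bisection on [1, 2]."""
    def f(rho: float) -> float:
        return rho ** m - rho ** (m - 1) - 1

    lo, hi = 1.0, 2.0  # f(lo) < 0 <= f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi

def beta_bound(m: int) -> float:
    """Supremum of achievable beta in P_e = O(2 ** (-N ** beta)) for memory order m."""
    return 1 / (1 + m * (largest_root(m) - 1))
```

For m = 1 this gives φ = 2 and the familiar polar-code bound β < 1/2. For m = 2, φ is the golden ratio and the bound is 1/√5 ≈ 0.447, and for m = 3 it is about 0.417. The bound on the error exponent shrinks as m grows, while the abstract reports that complexity falls.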

19.

An Improved Extended Active Observer for Adaptive Control of an n-DOF Robot Manipulator

Linping Chan Fazel Naghdy David Stirling 《Journal of Intelligent and Robotic Systems》2017,85(3-4):679-692

A novel algorithm for simultaneous force estimation and friction compensation in the constrained motion of robot manipulators is presented. It extends the improved extended active observer (IEAOB) algorithm reported earlier and proposes a higher-order, or Nth-order, IEAOB (IEAOB-N) for an n-DOF robot manipulator. Central to this observer is the use of extra system states, modeled with a Gauss–Markov (GM) formulation, to estimate the force and disturbances, including robot inertial parameters and friction. The stability of IEAOB-N is verified through a stability analysis. IEAOB-1 is validated on a Phantom Omni haptic device by comparing it against a Nicosia observer, a disturbance observer (DOB)/reaction torque observer (RTOB), and a nonlinear disturbance observer (NDO). The results show that IEAOB-1 outperforms the compared observers in force estimation. The performance of IEAOB-N is then studied experimentally and compared to that of IEAOB-1. The results demonstrate that IEAOB-N has an improved capability to track nonlinear external forces.

20.

On the real complexity of a complex DFT

I. S. Sergeev 《Problems of Information Transmission》2017,53(3):284-293

We present a method to construct a theoretically fast algorithm for computing the discrete Fourier transform (DFT) of order N = 2ⁿ. We show that the DFT of a complex vector of length N can be computed with 3.76875 N log₂ N real operations of addition, subtraction, and scalar multiplication.
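For context, the textbook radix-2 Cooley–Tukey FFT below computes the same transform for N = 2ⁿ with O(N log N) operations. It counts every twiddle multiplication as a general complex product, which takes 4 real multiplications and 2 additions. With two complex additions per butterfly, that comes to roughly 5 N log₂ N real operations, well above the 3.76875 N log₂ N in the abstract. This is a baseline sketch, not the paper's algorithm. A direct O(N²) DFT is included only to check it.

```python
import cmath

def fft(x: list) -> list:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t  # butterfly: one complex multiply, two complex add/subtract
        out[k + n // 2] = even[k] - t
    return out

def dft(x: list) -> list:
    """Direct O(N^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n)) for k in range(n)]
```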