Similar Documents (20 results)
1.
Given natural limitations on the length of DNA sequences, designing phylogenetic reconstruction methods which are reliable under limited information is a crucial endeavor. There have been two approaches to this problem: reconstructing partial but reliable information about the tree (Mossel in IEEE Comput. Biol. Bioinform. 4:108–116, 2007; Daskalakis et al. in SIAM J. Discrete Math. 25:872–893, 2011; Daskalakis et al. in Proc. of RECOMB 2006, pp. 281–295, 2006; Gronau et al. in Proc. of the 19th Annual SODA 2008, pp. 379–388, 2008), and reaching “deeper” in the tree through reconstruction of ancestral sequences. In the latter category, Daskalakis et al. (Proc. of the 38th Annual STOC, pp. 159–168, 2006) settled an important conjecture of M. Steel (My favourite conjecture. Preprint, 2001), showing that, under the CFN model of evolution, all trees on n leaves with edge lengths bounded by the Ising model phase transition can be recovered with high probability from genomes of length O(log n) with a polynomial time algorithm. Their methods had a running time of O(n^10). Here we enhance our methods from Daskalakis et al. (Proc. of RECOMB 2006, pp. 281–295, 2006) with the learning of ancestral sequences and provide an algorithm for reconstructing a sub-forest of the tree which is reliable given the available data, without requiring a priori known bounds on the edge lengths of the tree. Our methods are based on an intuitive minimum spanning tree approach and run in O(n^3) time. For the case of full reconstruction of trees with edges under the phase transition, we maintain the same asymptotic sequence length requirements as in Daskalakis et al. (Proc. of the 38th Annual STOC, pp. 159–168, 2006), despite the considerably faster running time.
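The minimum-spanning-tree intuition can be illustrated with a short sketch. The Python fragment below is a toy illustration only, not the authors' algorithm (the names `cfn_distance` and `kruskal_mst` are ours): it estimates pairwise CFN distances from aligned binary sequences and builds an MST over the taxa, whereas the actual method grows a reliable sub-forest while learning ancestral sequences at internal nodes.

```python
import math

def cfn_distance(s1, s2):
    """CFN distance estimate from two aligned binary sequences:
    d = -(1/2) * ln(1 - 2p), where p is the disagreement fraction."""
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    return -0.5 * math.log(max(1.0 - 2.0 * p, 1e-12))  # clamp to stay finite

def kruskal_mst(n, dist):
    """Minimum spanning tree over taxa 0..n-1 from a distance matrix,
    via Kruskal's algorithm with a union-find structure."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    mst = []
    for w, i, j in sorted((dist[i][j], i, j)
                          for i in range(n) for j in range(i + 1, n)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((i, j, w))
    return mst
```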

2.
Reachability and shortest path problems are NL-complete for general graphs. They are known to be in L for graphs of tree-width 2 (Jakoby and Tantau in Proceedings of FSTTCS’07: The 27th Annual Conference on Foundations of Software Technology and Theoretical Computer Science, pp. 216–227, 2007). In this paper, we improve these bounds for k-trees, where k is a constant. In particular, the main results of our paper are log-space algorithms for reachability in directed k-trees, and for computation of shortest and longest paths in directed acyclic k-trees. Besides the path problems mentioned above, we also consider the problem of deciding whether a k-tree has a perfect matching (decision version), and if so, finding a perfect matching (search version), and prove that these two problems are L-complete. These problems are known to be in P and in RNC for general graphs, and in SPL for planar bipartite graphs, as shown in Datta et al. (Theory Comput. Syst. 47:737–757, 2010). Our results settle the complexity of these problems for the class of k-trees. The results are also applicable for bounded tree-width graphs, when a tree-decomposition is given as input. The technique central to our algorithms is a careful implementation of the divide-and-conquer approach in log-space, along with some ideas from Jakoby and Tantau (Proceedings of FSTTCS’07: The 27th Annual Conference on Foundations of Software Technology and Theoretical Computer Science, pp. 216–227, 2007) and Limaye et al. (Theory Comput. Syst. 46(3):499–522, 2010).
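The divide-and-conquer pattern behind such small-space reachability algorithms is Savitch-style: a path of length at most s exists iff some midpoint splits it into two halves of length at most ⌈s/2⌉. A minimal sketch of the recursion follows (it illustrates only the pattern; the paper's contribution is carrying out such a strategy in genuinely logarithmic space for k-trees, which this naive version does not achieve):

```python
def reach(adj, u, v, steps):
    """Is there a path from u to v of length at most `steps`?
    Recursion depth is O(log steps) and each frame stores O(1) vertices,
    mirroring the small-space divide-and-conquer strategy."""
    if u == v:
        return True
    if steps <= 1:
        return v in adj[u]
    half = (steps + 1) // 2
    return any(reach(adj, u, w, half) and reach(adj, w, v, steps - half)
               for w in adj)

# Example: reach({0: {1}, 1: {2}, 2: set()}, 0, 2, 3) -> True
```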

3.
In this paper we provide improved approximation algorithms for the Min-Max Tree Cover and Bounded Tree Cover problems. Given a graph G=(V,E) with weights $w: E \rightarrow \mathbb{R}^{+}$, a set $T_1, T_2, \ldots, T_k$ of subtrees of G is called a tree cover of G if $V=\bigcup_{i=1}^{k} V(T_{i})$. In the Min-Max k-Tree Cover problem we are given a graph G and a positive integer k, and the goal is to find a tree cover with k trees such that the weight of the largest tree in the cover is minimized. We present a 3-approximation algorithm for this problem, improving upon the two different 4-approximation algorithms presented in Arkin et al. (J. Algorithms 59:1–18, 2006) and Even et al. (Oper. Res. Lett. 32(4):309–315, 2004). The problem is known to have an APX-hardness lower bound of $\frac{3}{2}$ (Xu and Wen in Oper. Res. Lett. 38:169–173, 2010). In the Bounded Tree Cover problem we are given a graph G and a bound λ, and the goal is to find a tree cover with the minimum number of trees such that each tree has weight at most λ. We present a 2.5-approximation algorithm for this problem, improving upon the 3-approximation bound in Arkin et al. (J. Algorithms 59:1–18, 2006).
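A common skeleton for tree-cover algorithms of this kind is to fix a weight bound B and greedily split a spanning tree into pieces of weight roughly B, binary-searching over B. The sketch below is a simplified heuristic in that spirit only, with no approximation guarantee claimed, and `count_pieces` is our own name:

```python
def count_pieces(children, root, B):
    """Greedy bottom-up splitting of a rooted edge-weighted tree: detach a
    subtree as its own piece once its hanging weight reaches B. Returns the
    number of pieces; binary-searching the smallest B yielding <= k pieces
    gives a min-max splitting heuristic. `children[u]` lists (child, w)."""
    pieces = 0
    def go(u):
        nonlocal pieces
        hanging = 0.0
        for child, w in children.get(u, ()):
            sub = go(child) + w
            if sub >= B:
                pieces += 1      # cut edge (u, child): subtree becomes a piece
            else:
                hanging += sub   # keep the subtree attached to u
        return hanging
    go(root)
    return pieces + 1            # plus the component containing the root

# count_pieces({0: ((1, 2.0), (2, 5.0)), 1: ((3, 1.0),)}, root=0, B=4.0) -> 2
```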

4.
The class ${\mathcal{SLUR}}$ (Single Lookahead Unit Resolution) was introduced in Schlipf et al. (Inf Process Lett 54:133–137, 1995) as an umbrella class for efficient (poly-time) SAT solving, with linear-time SAT decision, while the recognition problem was not considered. Čepek et al. (2012) and Balyo et al. (2012) extended this class in various ways to hierarchies covering all of CNF (all clause-sets). We introduce a hierarchy ${\mathcal{SLUR}}_k$ which we argue is the natural “limit” of such approaches. The second source for our investigations is the class ${\mathcal{UC}}$ of unit-refutation complete clause-sets, introduced in del Val (1994) as a target class for knowledge compilation. Via the theory of “hardness” of clause-sets as developed in Kullmann (1999), Kullmann (Ann Math Artif Intell 40(3–4):303–352, 2004) and Ansótegui et al. (2008) we obtain a natural generalisation ${\mathcal{UC}}_k$, containing those clause-sets which are “unit-refutation complete of level k”, which is the same as having hardness at most k. Utilising the strong connections to (tree-)resolution complexity and (nested) input resolution, we develop basic methods for the determination of hardness (the level k in ${\mathcal{UC}}_k$). A fundamental insight now is that ${\mathcal{SLUR}}_k = {\mathcal{UC}}_k$ holds for all k. We can thus exploit both streams of intuitions and methods in the investigation of these hierarchies. As an application we can easily show that the hierarchies from Čepek et al. (2012) and Balyo et al. (2012) are strongly subsumed by ${\mathcal{SLUR}}_k$. Finally we consider the problem of “irredundant” clause-sets in ${\mathcal{UC}}_k$. For 2-CNF we show that strong minimisations are possible in polynomial time, while already for (very special) Horn clause-sets minimisation is NP-complete. We conclude with an extensive discussion of open problems and future directions. We envisage the concepts investigated here to be the starting point for a theory of good SAT translations, which combines the good SAT-solving aspects of ${\mathcal{SLUR}}$ with the knowledge-representation aspects of ${\mathcal{UC}}$, and expands this combination via notions of “hardness”.
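Unit propagation, the engine behind both SLUR and UC, is easy to state concretely. The sketch below is our own minimal formulation (clauses are collections of signed integer literals); a clause-set is unit-refutation complete at the lowest level precisely when this procedure detects inconsistency under every partial assignment:

```python
def unit_propagate(clauses):
    """Apply unit resolution to fixpoint. Returns the set of forced literals,
    or None if the empty clause is derived (a unit refutation)."""
    forced = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in forced for l in clause):
                continue                       # clause already satisfied
            live = [l for l in clause if -l not in forced]
            if not live:
                return None                    # all literals falsified
            if len(live) == 1 and live[0] not in forced:
                forced.add(live[0])            # unit clause forces a literal
                changed = True
    return forced

# unit_propagate([{1}, {-1, 2}, {-2}]) -> None (refuted by unit resolution)
```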

5.
Paturi, Pudlák, Saks, and Zane (Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science (FOCS 1998), pp. 628–637, 1998) proposed a simple randomized algorithm for finding a satisfying assignment of a k-CNF formula. The main lemma of the paper is as follows: Given a satisfiable k-CNF formula that has a d-isolated satisfying assignment z, the randomized algorithm finds z with probability at least $2^{-(1-\mu_{k}/(k-1)+\epsilon_{k}(d))n}$, where $\mu_{k}/(k-1)=\sum_{i=1}^{\infty}1/(i((k-1)i+1))$ and $\epsilon_{k}(d)=o_{d}(1)$. They estimated the lower bound of the probability in an analytical way, using some asymptotics. In this paper, we analyze the same randomized algorithm and estimate the probability in a combinatorial way. The lower bound we obtain is a little simpler: $2^{-(1-\mu_{k}(d)/(k-1))n}$, where $\mu_{k}(d)/(k-1)=\sum_{i=1}^{d}1/(i((k-1)i+1))$. This value is slightly larger (i.e., better) than that of Paturi et al., although the two values are asymptotically equal when d=ω(1).
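The algorithm itself is short: pick a uniformly random ordering of the variables and assign each in turn, following a unit clause when one forces the variable and flipping a fair coin otherwise. The sketch below is our own rendering of one round (repeated roughly the inverse of the success probability above, it finds a satisfying assignment with high probability):

```python
import random

def ppsz_round(clauses, n):
    """One round: visit variables in random order; set each by a unit clause
    if one exists under the current partial assignment, else randomly."""
    assignment = {}
    for x in random.sample(range(1, n + 1), n):    # random permutation
        value = None
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                           # clause already satisfied
            live = [l for l in clause if abs(l) not in assignment]
            if len(live) == 1 and abs(live[0]) == x:
                value = live[0] > 0                # unit clause forces x
                break
        assignment[x] = value if value is not None else random.random() < 0.5
    return assignment

def satisfies(clauses, a):
    return all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)
```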

6.
We propose a uniform method to encode various types of trees succinctly. These families include ordered (ordinal), k-ary (cardinal), and unordered (free) trees. We show the approach is intrinsically suitable for obtaining entropy-based encodings of trees (such as the degree-distribution entropy). Previously existing succinct encodings of trees use ad hoc techniques to encode each particular family of trees. Additionally, the succinct encodings obtained using the uniform approach improve upon the existing succinct encodings of each family of trees; in the case of ordered trees, it simplifies the encoding while supporting the full set of navigational operations, and it also simplifies the implementation of many supported operations. The approach applied to k-ary trees yields a succinct encoding that supports both cardinal-type operations (e.g. determining the child labeled i) as well as the full set of ordinal-type operations (e.g. reporting the number of siblings to the left of a node). Previous work on succinct encodings of k-ary trees does not support both types of operations simultaneously (Benoit et al. in Algorithmica 43(4):275–292, 2005; Raman et al. in ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 233–242, 2002). For unordered trees, the approach achieves the first succinct encoding. The approach is based on two recursive decompositions of trees into subtrees. Recursive decomposition of a structure into substructures is a common technique in succinct encodings and has even been used to encode (ordered) trees (Geary et al. in ACM Trans. Algorithms 2(4):510–534, 2006; He et al. in ICALP, pp. 509–520, 2007) and dynamic binary trees (Munro et al. in ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 529–536, 2001; Storm in Representing dynamic binary trees succinctly, Master’s thesis, 2000). The main distinction of the approach in this paper is that a tree is decomposed into subtrees in a manner that keeps the subtrees maximally isolated from each other. This intermediate decomposition result is interesting in its own right and has proved useful in other applications (Farzan et al. in ICALP (1), pp. 451–462, 2009; Farzan and Munro in ICALP (1), pp. 439–450, 2009; Farzan and Kamali in ICALP, 2011).
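The decomposition at the heart of the approach can be caricatured in a few lines. The greedy sketch below is a heavy simplification (the actual decomposition additionally guarantees components of size Θ(L) that are maximally isolated, touching each other only at a shared root or one boundary node); it merely cuts a rooted tree into pieces of size at least L, letting a piece share its root with its parent piece:

```python
def decompose(children, root, L):
    """Greedily cut a rooted tree into components of >= L nodes (the last
    one may be smaller). A component's root may be shared with the
    enclosing component, as in root-sharing tree decompositions."""
    pieces = []
    def go(u):
        comp = [u]
        for c in children.get(u, ()):
            comp.extend(go(c))
            if len(comp) >= L:
                pieces.append(comp)
                comp = [u]            # u remains as the shared boundary node
        return comp
    leftover = go(root)
    if len(leftover) > 1 or not pieces:
        pieces.append(leftover)
    return pieces

# decompose({0: (1, 4), 1: (2, 3)}, root=0, L=3) -> [[1, 2, 3], [0, 1, 4]]
```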

7.
Given a set of points \(P \subset \mathbb{R}^{d}\), the k-means clustering problem is to find a set of k centers \(C = \{c_{1},\ldots,c_{k}\}, c_{i} \in \mathbb{R}^{d}\), such that the objective function \(\sum_{x \in P} e(x,C)^{2}\), where e(x,C) denotes the Euclidean distance between x and the closest center in C, is minimized. This is one of the most prominent objective functions that has been studied with respect to clustering. \(D^2\)-sampling (Arthur and Vassilvitskii, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’07, pp. 1027–1035, SIAM, Philadelphia, 2007) is a simple non-uniform sampling technique for choosing points from a set of points. It works as follows: given a set of points \(P \subset \mathbb{R}^{d}\), the first point is chosen uniformly at random from P. Subsequently, a point from P is chosen as the next sample with probability proportional to the square of the distance of this point to the nearest previously sampled point. \(D^2\)-sampling has been shown to have nice properties with respect to the k-means clustering problem. Arthur and Vassilvitskii (SODA’07, pp. 1027–1035, 2007) show that k points chosen as centers from P using \(D^2\)-sampling give an O(log k) approximation in expectation. Ailon et al. (NIPS, pp. 10–18, 2009) and Aggarwal et al. (Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 15–28, Springer, Berlin, 2009) extended this result to show that O(k) points chosen as centers using \(D^2\)-sampling give an O(1) approximation to the k-means objective function with high probability. In this paper, we further demonstrate the power of \(D^2\)-sampling by giving a simple randomized \((1+\epsilon)\)-approximation algorithm that uses \(D^2\)-sampling at its core.
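The sampler itself is only a few lines. A minimal sketch, assuming points are numeric tuples in \(\mathbb{R}^d\) (`d2_sample` is our own name):

```python
import random

def d2_sample(points, k):
    """Pick k centers: the first uniformly at random, each subsequent point
    with probability proportional to its squared distance to the nearest
    center chosen so far."""
    centers = [random.choice(points)]
    while len(centers) < k:
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        centers.append(random.choices(points, weights=d2, k=1)[0])
    return centers
```

This is exactly the seeding step popularized as k-means++; the \((1+\epsilon)\)-approximation of the paper runs such sampling many times and prunes candidate centers, which the sketch does not attempt.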

8.
9.
It is shown that for cuckoo hashing with a stash as proposed by Kirsch et al. (Proc. 16th European Symposium on Algorithms (ESA), pp. 611–622, Springer, Berlin, 2008) families of very simple hash functions can be used, maintaining the favorable performance guarantees: with constant stash size s the probability of a rehash is $O(1/n^{s+1})$, the lookup time and the deletion time are O(s) in the worst case, and the amortized expected insertion time is O(s) as well. Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (Discrete Math. Theor. Comput. Sci., 12(3):81–102, 2010) (resp. Θ(log n)-wise independence for standard cuckoo hashing) the new approach even works with 2-wise independent hash families as building blocks. Both construction and analysis build upon the work of Dietzfelbinger and Woelfel (Proc. 35th ACM Symp. on Theory of Computing (STOC), pp. 629–638, 2003). The analysis, which can also be applied to the fully random case, utilizes a graph counting argument and is much simpler than previous proofs. The results can be generalized to situations where the stash size is non-constant.
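The data structure is simple to sketch. In the toy Python class below, salted built-in hashing merely stands in for the simple 2-wise independent families of the analysis, and the constants are illustrative: two tables plus a small stash, with eviction along the usual cuckoo path and fallback to the stash when an insertion loops too long.

```python
import random

class CuckooWithStash:
    def __init__(self, m, stash_size=4, max_kicks=32):
        self.tables = [[None] * m, [None] * m]
        self.m, self.stash, self.stash_size = m, [], stash_size
        self.max_kicks = max_kicks
        self.salt = (random.getrandbits(64), random.getrandbits(64))

    def _h(self, i, key):
        return hash((self.salt[i], key)) % self.m

    def lookup(self, key):
        return (any(self.tables[i][self._h(i, key)] == key for i in (0, 1))
                or key in self.stash)               # O(s) worst case

    def insert(self, key):
        cur, i = key, 0
        for _ in range(self.max_kicks):
            pos = self._h(i, cur)
            if self.tables[i][pos] is None:
                self.tables[i][pos] = cur
                return True
            self.tables[i][pos], cur = cur, self.tables[i][pos]  # evict
            i ^= 1                                  # continue in other table
        if len(self.stash) < self.stash_size:
            self.stash.append(cur)                  # stash absorbs overflow
            return True
        return False                                # a real implementation rehashes
```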

10.
X-ray computed tomography (CT) plays an important role in cancer diagnosis and radiotherapy. However, the high imaging dose delivered to healthy organs during CT scans is a serious clinical concern. Imaging dose in CT scans can be reduced by reducing the number of X-ray projections. In this paper, we consider 2D CT reconstruction using a very small number of projections. Some regularization-based reconstruction methods have already been proposed in the literature for this task, such as total variation (TV) based reconstruction (Sidky and Pan in Phys. Med. Biol. 53:4777, 2008; Sidky et al. in J. X-Ray Sci. Technol. 14(2):119–139, 2006; Jia et al. in Med. Phys. 37:1757, 2010; Choi et al. in Med. Phys. 37:5113, 2010) and the balanced approach with wavelet frame based regularization (Jia et al. in Phys. Med. Biol. 56:3787–3807, 2011). For most of the existing methods, at least 40 projections are usually needed to get a satisfactory reconstruction. In order to keep the radiation dose as low as possible while increasing the quality of the reconstructed images, one needs to enhance the resolution of the projected image in the Radon domain without increasing the total number of projections. The goal of this paper is to propose a CT reconstruction model with wavelet frame based regularization and Radon domain inpainting. The proposed model simultaneously reconstructs a high quality image and its corresponding high resolution measurements in the Radon domain. In addition, we discovered that using the isotropic wavelet frame regularization proposed in Cai et al. (Image restorations: total variation, wavelet frames and beyond, 2011, preprint) is superior to using its anisotropic counterpart. Our proposed model, as well as the other models presented in this paper, is solved rather efficiently by the split Bregman algorithm (Goldstein and Osher in SIAM J. Imaging Sci. 2(2):323–343, 2009; Cai et al. in Multiscale Model. Simul. 8(2):337–369, 2009). Numerical simulations and comparisons are presented at the end.
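To give a flavor of the split Bregman iteration used to solve such models, here is a minimal 1D total-variation denoiser — a toy analogue only, since the paper's model couples a wavelet-frame penalty with Radon-domain inpainting, which this sketch does not include:

```python
import numpy as np

def tv_denoise_split_bregman(f, mu=10.0, lam=1.0, iters=100):
    """Minimize (mu/2)||u - f||^2 + ||Du||_1 via split Bregman: introduce
    d ~ Du with Bregman variable b, then alternate a linear solve in u,
    soft shrinkage in d, and a Bregman update in b (Goldstein-Osher)."""
    n = len(f)
    D = np.diff(np.eye(n), axis=0)              # forward-difference operator
    A = mu * np.eye(n) + lam * D.T @ D          # fixed system matrix
    u, d, b = f.copy(), np.zeros(n - 1), np.zeros(n - 1)
    shrink = lambda x, g: np.sign(x) * np.maximum(np.abs(x) - g, 0.0)
    for _ in range(iters):
        u = np.linalg.solve(A, mu * f + lam * D.T @ (d - b))
        d = shrink(D @ u + b, 1.0 / lam)
        b = b + D @ u - d
    return u
```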

11.
This paper analyzes the role of common data problems when identifying structural breaks in small samples. Most notably, we survey the small sample properties of the most commonly applied endogenous break tests developed by Brown et al. (J R Stat Soc B 37:149–163, 1975) and Zeileis (Stat Pap 45(1):123–131, 2004), Nyblom (J Am Stat Assoc 84(405):223–230, 1989) and Hansen (J Policy Model 14(4):517–533, 1992), and Andrews et al. (J Econ 70(1):9–38, 1996). Power and size properties are derived using Monte Carlo simulations. We find that the Nyblom test is on par with the commonly used F-type tests in small samples in terms of power. While the Nyblom test’s power decreases if the structural break occurs close to the margin of the sample, it proves far more robust to nonnormal distributions of the error term, which are found to matter strongly in small samples although they are asymptotically irrelevant for all tests analyzed in this paper.
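The Monte Carlo setup is straightforward to reproduce in miniature. The sketch below uses the OLS-residual CUSUM variant with its 5% asymptotic critical value of 1.358 for a constant-mean model — a simplification of the recursive-residual tests surveyed in the paper — and estimates empirical size and power in a small sample:

```python
import numpy as np

def ols_cusum_rejects(y, crit=1.358):
    """OLS-based CUSUM test for a break in a constant-mean model;
    1.358 is the 5% asymptotic critical value of sup|Brownian bridge|."""
    e = y - y.mean()
    w = np.cumsum(e) / (e.std(ddof=1) * np.sqrt(len(y)))
    return np.max(np.abs(w)) > crit

rng = np.random.default_rng(0)
n, reps = 30, 5000
size = np.mean([ols_cusum_rejects(rng.standard_normal(n))
                for _ in range(reps)])
power = np.mean([ols_cusum_rejects(np.concatenate(
                    [rng.standard_normal(n // 2),
                     1.0 + rng.standard_normal(n - n // 2)]))   # mid-sample break
                 for _ in range(reps)])
print(f"n={n}: empirical size {size:.3f}, power {power:.3f}")
```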

12.
This work contrasts Giovanni Sartor’s view of inferential semantics of legal concepts (Sartor in Artif Intell Law 17:217–251, 2009) with a probabilistic model of theory formation (Kemp et al. in Cognition 114:165–196, 2010). The work further explores possibilities of implementing Kemp’s probabilistic model of theory formation in the context of mapping legal concepts between two individual legal systems. For implementing the legal concept mapping, we propose a cross-categorization approach that combines three mathematical models: the Bayesian Model of Generalization (BMG; Tenenbaum and Griffiths in Behav Brain Sci 4:629–640, 2001), the probabilistic model of theory formation, i.e., the Infinite Relational Model (IRM) first introduced by Kemp et al. (The twenty-first national conference on artificial intelligence, 2006; Cognition 114:165–196, 2010), and its extended model, the normal-IRM (n-IRM) proposed by Herlau et al. (IEEE International Workshop on Machine Learning for Signal Processing, 2012). We apply our cross-categorization approach to datasets where legal concepts related to educational systems are defined by the Japanese and Danish authorities, respectively, according to the International Standard Classification of Education. The main contribution of this work is the proposal of a conceptual framework for the cross-categorization approach that, inspired by Sartor (Artif Intell Law 17:217–251, 2009), attempts to explain a reasoner’s inferential mechanisms.
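The Bayesian Model of Generalization at the base of this pipeline is compact enough to state directly. A minimal sketch of our own rendering, with strong-sampling likelihoods and hypotheses represented simply as sets of items:

```python
def bmg_generalization(examples, query, hypotheses, prior):
    """P(query belongs to the concept | examples): sum over hypotheses
    consistent with the examples, each weighted by prior times the
    strong-sampling likelihood 1/|h|^len(examples)."""
    num = den = 0.0
    for h, p in zip(hypotheses, prior):
        if all(x in h for x in examples):
            like = p / len(h) ** len(examples)
            den += like
            if query in h:
                num += like
    return num / den

# Interval hypotheses over 1..10 with a uniform prior:
hyps = [set(range(a, b + 1)) for a in range(1, 11) for b in range(a, 11)]
print(bmg_generalization({2, 3}, 4, hyps, [1.0] * len(hyps)))
```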

13.
Danvy’s functional unparsing problem (Danvy in J. Funct. Program. 8(6):621–625, 1998) is to implement a type-safe “printf” function, which converts a sequence of heterogeneous arguments to a string according to a given format. The dual problem is to implement a type-safe “scanf” function, which extracts a sequence of heterogeneous arguments from a string by interpreting (Friedman and Wand in LFP, pp. 348–355, 1984 and in Essentials of Programming Languages, MIT Press, 2008) the same format as an equally heterogeneous sequence of patterns that binds zero or more variables. We derive multiple solutions to both problems (Wand in J. ACM 27(1):164–180, 1980) from their formal specifications (Wand in Theor. Comput. Sci. 20(1):3–32, 1982). On one hand, our solutions show how the Hindley-Milner type system, unextended, permits accessing heterogeneous sequences with the static assurance of type safety. On the other hand, our solutions demonstrate the use of control operators (Felleisen et al. in Proceedings of the 1988 ACM Conference on Lisp and Functional Programming, pp. 52–62, ACM Press, New York, 1988; Wand in POPL ’85: Conference Record of the Annual ACM Symposium on Principles of Programming Languages, vol. 16, ACM Press, New York, 1985; Meyer and Wand in Logics of Programs, Lecture Notes in Computer Science, vol. 193, pp. 219–224, Springer, Berlin, 1985) to communicate with formats as coroutines (Wand in Proceedings of the 1980 ACM Conference on Lisp and Functional Programming, vol. 12, pp. 285–299, ACM Press, New York, 1980 and Haynes et al. in LFP, pp. 293–298, 1984).
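The continuation-passing trick behind functional unparsing carries over to any higher-order language, though without the Hindley-Milner typing that makes it statically safe. A sketch of the “printf” direction in Python (all names here are our own; the “scanf” dual threads an input string through the same directives instead of an accumulator):

```python
# Each directive transforms a continuation expecting the accumulated string.
lit = lambda s: lambda k: lambda acc: k(acc + s)        # literal text
num = lambda k: lambda acc: lambda x: k(acc + str(x))   # consumes an int
str_ = lambda k: lambda acc: lambda x: k(acc + x)       # consumes a string

def fmt(*directives):
    """Compose directives right-to-left and kick off with the empty string;
    the result is a curried function taking one argument per hole."""
    k = lambda acc: acc
    for d in reversed(directives):
        k = d(k)
    return k("")

# fmt(lit("x = "), num, lit(", name = "), str_)(42)("foo")
#   == "x = 42, name = foo"
```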

14.
We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.
  • A linear space index with query time O(m + σ^j log log n + occ). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846–857, [2007]), which requires query time Θ(jn) in the worst case.
  • An index with query time O(m+j+occ) using space \(O(\sigma^{k^{2}} n \log^{k} \log n)\) , where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
  • A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91–100, [2004]).
We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
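For contrast, the baseline these indexes beat is a direct scan with no preprocessing. A minimal sketch (our own; `'*'` marks a wildcard position):

```python
def naive_occurrences(t, p, wild='*'):
    """All positions where pattern p (with wildcards) occurs in text t.
    Every query costs O(nm); the indexes above preprocess t so a query
    costs, e.g., O(m + sigma^j log log n + occ)."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if all(pc == wild or pc == tc for pc, tc in zip(p, t[i:i + m]))]

# naive_occurrences("abcabd", "ab*") -> [0, 3]
```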

15.
Combining the block transmission in Long and Liu (Phys Rev A 65:032302, 2002) and the double operations in Lin et al. (Opt Commun 282:4455, 2009), we propose a secure multiparty quantum secret sharing protocol with the collective eavesdropping-check character. In this protocol, only the boss needs to prepare Bell states and perform Bell state measurements, while all agents only perform local operations, which makes the protocol more feasible with current techniques. Incidentally, we show that the other half of the secret messages in the protocol of Lin et al. (Opt Commun 282:4455, 2009) may also be eavesdropped.

16.
We revisit from a fairness point of view the problem of online load balancing in the restricted assignment model and the 1-∞ model. We consider both a job-centric and a machine-centric view of fairness, as proposed by Goel et al. (In: Symposium on discrete algorithms, pp. 384–390, 2005). These notions are equivalent to the approximate notion of prefix competitiveness proposed by Kleinberg et al. (In: Proceedings of the 40th annual symposium on foundations of computer science, p. 568, 2001), as well as to the notion of approximate majorization, and they generalize the well studied notion of max-min fairness. We resolve a question posed by Goel et al. (2005), proving that the greedy strategy is globally O(log m)-fair, where m denotes the number of machines. This result improves upon the analysis of Goel et al. (2005), who showed that the greedy strategy is globally O(log n)-fair, where n is the number of jobs. Typically n ≫ m, and therefore our improvement is significant. Our proof matches the known lower bound for the problem with respect to the measure of global fairness. The improved bound is obtained by analyzing, in a more accurate way, the more general restricted assignment model studied previously in Azar et al. (J. Algorithms 18:221–237, 1995). We provide an alternative bound which is never worse than the bounds of Azar et al., and is strictly better in many cases. The bound we prove is, in fact, much more general, and it bounds the load on any prefix of the most loaded machines. As a corollary of this more general bound we find that the greedy algorithm results in an assignment that is globally O(log m)-balanced. The last result generalizes the previous result of Goel et al. (2005), who proved that the greedy algorithm yields an assignment that is globally O(log m)-balanced for the 1-∞ model.
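The greedy strategy analyzed here is the obvious one: send each arriving job to the currently least-loaded machine allowed to run it. A minimal sketch for the restricted assignment model (in the 1-∞ model every weight is 1; names are ours):

```python
def greedy_assign(jobs, m):
    """jobs: iterable of (weight, allowed_machine_ids). Each job goes to the
    least-loaded permitted machine; returns the assignment and final loads."""
    load = [0.0] * m
    assignment = []
    for w, allowed in jobs:
        i = min(allowed, key=lambda j: load[j])
        load[i] += w
        assignment.append(i)
    return assignment, load

# greedy_assign([(1, {0, 1}), (1, {0}), (1, {0, 1})], m=2)
#   -> ([0, 0, 1], [2.0, 1.0])
```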

17.
K-anonymity (Samarati and Sweeney 1998; Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001; Sweeney, Int J Uncertain, Fuzziness Knowl-Based Syst, 10(5):557–570, 2002) and its variants, l-diversity (Machanavajjhala et al., ACM TKDD, 2007) and t-closeness (Li et al. 2007) among others, are anonymization techniques for relational and transaction data, which are used to protect privacy against re-identification attacks. A relational dataset D is k-anonymous if every record in D has at least k−1 other records with identical quasi-identifier attribute values. The combination of released data with external data will never allow the recipient to associate each released record with fewer than k individuals (Samarati, IEEE Trans Knowl Data Eng, 13(6):1010–1027, 2001). However, the current concept of k-anonymity on transaction data treats all items as quasi-identifiers. The anonymized data set has k identical transactions in groups and suffers from low data utility (He and Naughton 2009; He et al. 2011; Liu and Wang 2010; Terrovitis et al., VLDB J, 20(1):83–106, 2011; Terrovitis et al. 2008). To improve the utility of anonymized transaction data, this work proposes a novel anonymity concept for transaction data that contain both quasi-identifier items (QID) and sensitive items (SI). A transaction that contains sensitive items must have at least k−1 other identical transactions (Ghinita et al. IEEE TKDE, 33(2):161–174, 2011; Xu et al. 2008). For a transaction that does not contain a sensitive item, no anonymization is required. A transaction dataset that satisfies this property is said to be sensitive k-anonymous. Three algorithms, Sensitive Transaction Neighbors (STN), Gray Sort Clustering (GSC), and Nearest Neighbors for K-anonymization (K-NN), are developed. These algorithms achieve sensitive k-anonymity on transaction data by adding or deleting QID items and only adding SI. Additionally, a simple “privacy value” is proposed to evaluate the degree of privacy for different types of k-anonymity on transaction data. Extensive numerical simulations were carried out to demonstrate the characteristics of the proposed algorithms and to compare them with other types of k-anonymity approaches. The results show that each technique possesses its own advantages under different criteria such as running time, operation, and information loss. The results obtained here can be used as a guideline for selecting an anonymization technique for different data sets and applications.
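The definition translates directly into a checker. A minimal sketch (names are ours; transactions are item collections and `sensitive` is the SI set):

```python
from collections import Counter

def is_sensitive_k_anonymous(transactions, sensitive, k):
    """A dataset is sensitive k-anonymous if every transaction containing a
    sensitive item has at least k-1 identical companions; transactions with
    QID items only need no protection."""
    counts = Counter(frozenset(t) for t in transactions)
    return all(counts[frozenset(t)] >= k
               for t in transactions
               if frozenset(t) & frozenset(sensitive))

# is_sensitive_k_anonymous([{"a", "hiv"}, {"a", "hiv"}, {"b"}],
#                          sensitive={"hiv"}, k=2)  -> True
```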

18.
The Parameterized Complexity of Unique Coverage and Its Variants
In this paper we study the parameterized complexity of the Unique Coverage problem, a variant of the classic Set Cover problem. This problem admits several parameterizations, and we show that all of them, except the standard parameterization and a generalization of it, are unlikely to be fixed-parameter tractable. We use results from extremal combinatorics to obtain the best-known kernel for Unique Coverage, and the well-known color-coding technique of Alon et al. (J. ACM 42(4):844–856, 1995) to show that a weighted version of this problem is fixed-parameter tractable. Our application of color-coding uses an interesting variation of s-perfect hash families called (k,s)-hash families, which were studied by Alon et al. (J. Comb. Theory Ser. A 104(1):207–215, 2003) in the context of a class of codes called parent identifying codes (Barg et al. in SIAM J. Discrete Math. 14(3):423–431, 2001). To the best of our knowledge, this is the first application of (k,s)-hash families outside the domain of coding theory. We prove the existence of such families of size smaller than the best-known s-perfect hash families using the probabilistic method (Alon and Spencer in The Probabilistic Method, Wiley, New York, 2000). Explicitly constructing such families of the size promised by the probabilistic method remains open.
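To make the color-coding technique concrete, here is its textbook instantiation for detecting a simple path on k vertices — a generic illustration only, since the paper applies the refined (k,s)-hash families to a weighted Unique Coverage variant instead:

```python
import random

def colorful_k_path_exists(adj, k, trials=300):
    """Color-coding: randomly k-color the vertices; a simple k-vertex path
    whose vertices get distinct colors is found by dynamic programming over
    (endpoint, used-color-set) pairs. Repeat to boost success probability."""
    for _ in range(trials):
        color = {v: random.randrange(k) for v in adj}
        paths = {v: {frozenset([color[v]])} for v in adj}   # 1-vertex paths
        for _ in range(k - 1):
            nxt = {v: set() for v in adj}
            for u in adj:
                for used in paths[u]:
                    for v in adj[u]:
                        if color[v] not in used:
                            nxt[v].add(used | {color[v]})
            paths = nxt
        if any(paths[v] for v in adj):
            return True
    return False

# Path graph on 4 vertices:
# colorful_k_path_exists({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}, k=3) -> True
```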

19.
The last decade has seen an explosion in the number of people learning English as a second language (ESL). In China alone, the number is estimated to be over 300 million (Yang in Engl Today 22, 2006). Even in predominantly English-speaking countries, the proportion of non-native speakers can be very substantial. For example, the US National Center for Educational Statistics reported that nearly 10% of the students in the US public school population speak a language other than English and have limited English proficiency (National Center for Educational Statistics (NCES) in Public school student counts, staff, and graduate counts by state: school year 2000–2001, 2002). As a result, the last few years have seen a rapid increase in the development of NLP tools to detect and correct grammatical errors so that appropriate feedback can be given to ESL writers, a large and growing segment of the world’s population. As a byproduct of this surge in interest, there have been many NLP research papers on the topic, a Synthesis Series book (Leacock et al. in Automated grammatical error detection for language learners. Synthesis lectures on human language technologies. Morgan Claypool, Waterloo 2010), a recurring workshop (Tetreault et al. in Proceedings of the NAACL workshop on innovative use of NLP for building educational applications (BEA), 2012), and a shared task competition (Dale et al. in Proceedings of the seventh workshop on building educational applications using NLP (BEA), pp 54–62, 2012; Dale and Kilgarriff in Proceedings of the European workshop on natural language generation (ENLG), pp 242–249, 2011). Despite this growing body of work, several issues affecting the annotation and evaluation of ESL error detection systems have received little attention. In this paper, we describe these issues in detail and present our research on alleviating their effects.

20.
In this paper, decision variables for the key-frame detection problem in a video are evaluated using statistical tools derived from the theory of design of experiments. The pixel-by-pixel intensity difference of consecutive video frames is used as the factor, or decision variable, for designing an experiment for key-frame detection. The determination of a key-frame is correlated with the different values of the factor. A novel concept of the meaningfulness of a video key-frame is also introduced to select the representative key-frame from a set of possible key-frames. The use of the concepts of design of experiments and the meaningfulness property to summarize a video is tested on a number of videos taken from the MUSCLE-VCD-2007 dataset. The performance of the proposed approach in detecting key-frames is found to be superior to competing approaches such as the PME-based methods (Liu et al., IEEE Trans Circuits Syst Video Technol 13(10):1006–1013, 2003; Mukherjee et al., IEEE Trans Circuits Syst Video Technol 17(5):612–620, 2007; Panagiotakis et al., IEEE Trans Circuits Syst Video Technol 19(3):447–451, 2009).
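The decision variable itself is trivial to compute. A minimal NumPy sketch (ours) that flags frames whose pixelwise change from their predecessor is an outlier — the paper goes much further, screening this factor with designed experiments and the meaningfulness criterion:

```python
import numpy as np

def candidate_keyframes(frames, z=2.0):
    """frames: array of shape (T, H, W) of grayscale intensities. Returns
    indices whose mean absolute difference from the previous frame exceeds
    mean + z * std of all consecutive-frame differences."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    threshold = diffs.mean() + z * diffs.std()
    return np.where(diffs > threshold)[0] + 1   # +1: diff i compares frames i, i+1

# video = np.random.rand(100, 64, 64); video[50] += 0.5   # inject a cut
# candidate_keyframes(video) -> array([50, 51])
```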
