Chinese word segmentation is challenging because Chinese text has no white space to mark word boundaries, and its results depend largely on the quality of the segmentation dictionary. Many domain phrases are cut into single words because they are not contained in the general dictionary. This paper presents a Chinese domain phrase identification algorithm based on atomic word formation. First, the atomic word formation algorithm is used to extract candidate strings from the preprocessed corpus, and the extracted strings are stored as the candidate domain phrase set. Second, several strategies, such as repeated substring screening, part-of-speech (POS) combination filtering, and prefix and suffix filtering, are used to filter the candidate domain phrases. Third, a domain phrase refining method determines whether a string is a domain phrase by calculating its domain relevance. Finally, the identified strings are sorted and presented to users. With the help of morphological rules, the method combines statistical information with rules instead of relying on machine learning from a corpus. Experiments show that the method obtains better results than traditional n-gram methods.
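The abstract does not spell out the atomic word formation algorithm or its screening rules, so the following is only a minimal sketch under stated assumptions: the corpus is taken to be already split into atomic words, candidates are runs of adjacent atomic words, and "repeated substring screening" is read as dropping a candidate when a longer candidate with the same frequency contains it. The function names, `max_len`, `min_freq`, and the toy corpus are all hypothetical.

```python
from collections import Counter

def extract_candidates(sentences, max_len=4, min_freq=2):
    """Count every run of 2..max_len adjacent atomic words (assumed candidate rule)."""
    counts = Counter()
    for tokens in sentences:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    # Keep only runs that occur at least min_freq times.
    return {cand: freq for cand, freq in counts.items() if freq >= min_freq}

def contains_run(long, short):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))

def screen_repeated_substrings(candidates):
    """Drop a candidate when a longer, equally frequent candidate contains it."""
    kept = {}
    for cand, freq in candidates.items():
        absorbed = any(
            len(other) > len(cand) and other_freq == freq and contains_run(other, cand)
            for other, other_freq in candidates.items()
        )
        if not absorbed:
            kept[cand] = freq
    return kept

# Toy corpus, already split into atomic words (hypothetical data).
sentences = [
    ["支持", "向量", "机", "模型"],
    ["训练", "支持", "向量", "机"],
    ["向量", "空间"],
    ["向量", "空间", "模型"],
]

candidates = extract_candidates(sentences)
kept = screen_repeated_substrings(candidates)
for phrase, freq in kept.items():
    print("".join(phrase), freq)
```

In this toy run the two-word fragments of 支持向量机 are screened out because they never occur outside the longer phrase. POS combination filtering, prefix and suffix filtering, and the domain relevance score would be further stages after this one and are not sketched here.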
Universal Access in the Information Society - Facilitating the professional development of teachers has been considered a critical factor for improving education quality. The use of an online...
In this paper, a novel blind image watermarking scheme based on QR decomposition is proposed to embed a color watermark image into a color host image, which differs significantly from schemes that use a binary or gray-scale image as the watermark. During embedding, the 24-bit color host image of size 512 × 512 is divided into non-overlapping 4 × 4 pixel blocks, and each pixel block is decomposed by QR decomposition. Then, according to the watermark information and the relation between the second-row, first-column coefficient and the third-row, first-column coefficient of the unitary matrix Q, the 24-bit color watermark image of size 32 × 32 is embedded into the color host image. In addition, a new element compensation method is applied to the upper-triangular matrix R to reduce visible distortion. During extraction, only the watermarked image is needed. Compared with SVD-based methods, the proposed method does not suffer from the false-positive detection problem and has lower computational complexity: its average running time is only 1.481403 s. The experimental results show that the proposed method is robust against most common attacks, including JPEG compression, JPEG 2000 compression, low-pass filtering, cropping, added noise, blurring, rotation, scaling, and sharpening. Compared with some related existing methods, the proposed algorithm has stronger robustness and better invisibility.
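The idea of carrying a bit in the relation between the two first-column coefficients of Q can be illustrated roughly as follows. This is a sketch, not the paper's scheme: the abstract gives neither the embedding rule nor a threshold, so the averaging rule, the threshold `t`, and the sample block are assumptions, and the compensation step on R, pixel rounding, and clipping are left out. QR is computed by Gram-Schmidt so that the first diagonal entry of R is positive, and the block is assumed to be full rank.

```python
import math

def qr_gram_schmidt(a):
    """QR of a square full-rank matrix (list of rows) via modified Gram-Schmidt."""
    n = len(a)
    cols = [[a[r][c] for r in range(n)] for c in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            r[i][j] = sum(q_cols[i][k] * v[k] for k in range(n))
            v = [v[k] - r[i][j] * q_cols[i][k] for k in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in v))  # positive for full-rank input
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[c][row] for c in range(n)] for row in range(n)]
    return q, r

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def embed_bit(block, bit, t=0.04):
    """Force q21 > q31 for bit 1 and q21 < q31 for bit 0 (assumed rule)."""
    q, r = qr_gram_schmidt(block)
    avg = (q[1][0] + q[2][0]) / 2
    if bit == 1 and q[1][0] - q[2][0] < t:
        q[1][0], q[2][0] = avg + t / 2, avg - t / 2
    elif bit == 0 and q[2][0] - q[1][0] < t:
        q[1][0], q[2][0] = avg - t / 2, avg + t / 2
    return matmul(q, r)  # watermarked block, left as floats

def extract_bit(block):
    """Blind extraction: only the watermarked block is needed."""
    q, _ = qr_gram_schmidt(block)
    return 1 if q[1][0] > q[2][0] else 0

# Hypothetical 4 x 4 block from one colour channel.
block = [
    [150, 20, 30, 10],
    [140, 160, 25, 15],
    [145, 30, 170, 20],
    [135, 25, 35, 180],
]

for bit in (0, 1):
    print(bit, extract_bit(embed_bit(block, bit)))
```

The relation survives re-decomposition because R is upper triangular: the first column of the rebuilt block is the modified first column of Q scaled by a positive number, so normalizing it again preserves the order of the two coefficients.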
At present, binary images are often used as the original watermark images in watermarking methods, but some of these methods cannot be easily extended to colour image watermarking. To resolve this problem, we propose a new watermarking method for colour images that uses ternary coding and QR decomposition. In the embedding procedure, the colour watermark image is coded into ternary information; the colour host image is separated into image blocks of size 3 × 3, and these blocks are further decomposed via QR decomposition; then, one ternary watermark value is embedded into one orthogonal matrix Q of the QR decomposition by the proposed rules. In the extraction procedure, the proposed method extracts the embedded ternary information blindly. The novelty of this scheme lies in the proposed ternary coding of the watermark image, which can improve the imperceptibility, watermark embedding capacity, and real-time performance of the watermarking scheme. Simulation results show that the presented technique outperforms the compared schemes in imperceptibility, watermark embedding capacity, and real-time performance at similar robustness.
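The abstract does not define its ternary coding, so the sketch below assumes the simplest reading: each 8-bit colour sample is written as base-3 digits. Six digits are needed per sample because 3^5 = 243 is smaller than 256. The function names and the fixed width are hypothetical.

```python
def to_ternary(value, width=6):
    """Encode an integer as `width` base-3 digits, most significant first."""
    if not 0 <= value < 3 ** width:
        raise ValueError("value does not fit in the requested number of digits")
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 3)
        digits.append(digit)
    return digits[::-1]

def from_ternary(digits):
    """Decode base-3 digits (most significant first) back to an integer."""
    value = 0
    for digit in digits:
        value = value * 3 + digit
    return value

sample = 200                    # one 8-bit colour sample
digits = to_ternary(sample)     # [0, 2, 1, 1, 0, 2]
assert from_ternary(digits) == sample
```

Under this reading a sample takes six ternary symbols instead of eight binary ones, which is one plausible source of the capacity gain the abstract reports when each block carries one ternary value.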