Similar Documents
20 similar documents found (search time: 250 ms)
1.
RSS news articles that are partially or completely duplicated in content are easily found on the Internet these days, requiring Web users to sort through the articles to identify non-redundant information. This manual filtering process is time-consuming and tedious. In this paper, we present a new filtering and clustering approach, called FICUS, which starts by identifying and eliminating redundant RSS news articles using a fuzzy set information retrieval approach and then clusters the remaining non-redundant RSS news articles according to their degrees of resemblance. FICUS uses a tree hierarchy to organize clusters of RSS news articles. The contents of the respective clusters are captured by representative keywords from the RSS news articles in the clusters, so that searching and retrieval of similar RSS news articles is fast and efficient. FICUS is simple, since it uses pre-defined word-correlation factors to determine related (words in) RSS news articles and filter redundant ones, and it is supported by well-known yet simple mathematical models, such as the standard deviation, the vector space model, and probability theory, to generate clusters of non-redundant RSS news articles. Experiments performed on (test sets of) RSS news articles on various topics, downloaded from different online sources, verify the accuracy of FICUS in eliminating redundant RSS news articles, clustering similar RSS news articles together, and segregating different RSS news articles in terms of their contents. In addition, further empirical studies show that FICUS outperforms well-known approaches adopted for clustering RSS news articles.
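For illustration, here is a minimal sketch of the redundancy-filtering step, assuming cosine similarity over term-frequency vectors as a stand-in for FICUS's pre-defined word-correlation factors; the 0.8 threshold and the toy feed items are invented:

```python
# Sketch: drop an incoming article if it is too similar to one already kept.
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for one article."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def filter_redundant(articles, threshold=0.8):
    """Keep an article only if it is not near-duplicate of one already kept."""
    kept, vectors = [], []
    for text in articles:
        v = tf_vector(text)
        if all(cosine(v, w) < threshold for w in vectors):
            kept.append(text)
            vectors.append(v)
    return kept

feeds = [
    "stock markets rally as tech shares surge",
    "tech shares surge as stock markets rally strongly",  # near-duplicate
    "heavy rain floods coastal towns",
]
print(filter_redundant(feeds))  # drops the near-duplicate second item
```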

2.
Mining the interests of Chinese microbloggers via keyword extraction
Microblogging provides a new platform for communicating and sharing information among Web users. Users can express opinions and record daily life using microblogs. Microblogs that are posted by users indicate their interests to some extent. We aim to mine user interests via keyword extraction from microblogs. Traditional keyword extraction methods are usually designed for formal documents such as news articles or scientific papers. Messages posted by microblogging users, however, are usually noisy and full of new words, which is a challenge for keyword extraction. In this paper, we combine a translation-based method with a frequency-based method for keyword extraction. In our experiments, we extract keywords for microblog users from the largest microblogging website in China, Sina Weibo. The results show that our method can identify users’ interests accurately and efficiently.
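As a hedged illustration of combining the two signals, the sketch below interpolates a frequency-based score with a translation-based score; the tiny TRANSLATION table, the LAMBDA weight, and the toy post are assumptions, not the paper's trained model:

```python
# Sketch: rank keyword candidates by a mix of raw frequency and
# translation evidence flowing from observed words to keywords.
from collections import Counter

LAMBDA = 0.5  # interpolation weight (assumed)

# P(keyword | word): hypothetical translation probabilities; in practice
# these would be learned from aligned text pairs.
TRANSLATION = {
    "nba":   {"basketball": 0.6, "sports": 0.3},
    "dunk":  {"basketball": 0.8},
    "photo": {"photography": 0.7},
}

def rank_keywords(words):
    tf = Counter(words)
    total = sum(tf.values())
    freq_score = {w: c / total for w, c in tf.items()}
    trans_score = Counter()
    for w, c in tf.items():
        for kw, p in TRANSLATION.get(w, {}).items():
            trans_score[kw] += (c / total) * p
    candidates = set(freq_score) | set(trans_score)
    combined = {k: LAMBDA * freq_score.get(k, 0.0)
                   + (1 - LAMBDA) * trans_score.get(k, 0.0)
                for k in candidates}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

posts = "nba dunk highlights nba photo".split()
print(rank_keywords(posts)[:3])  # surfaces "basketball" although never posted
```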

3.
People encounter more information than they can possibly use every day. But not all information is of equal value: in many cases, certain information appears to be better, or more trustworthy, than other information. The challenge most people then face is to judge which information is more credible. In this paper we propose a new problem called Corroboration Trust, which studies how to find credible news events by seeking more than one source to verify information on a given topic. We design an evidence-based corroboration trust algorithm called TrustNewsFinder, which utilizes the relationships between news articles and related evidence information (person, location, time and keywords about the news). A news article is trustworthy if it provides many pieces of trustworthy evidence, and a piece of evidence is likely to be true if it is provided by many trustworthy news articles. Our experiments show that TrustNewsFinder successfully finds true events among conflicting information and identifies trustworthy news better than popular search engines.
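The mutual reinforcement between articles and evidence can be sketched as a fixed-point iteration in the spirit of HITS; the update rules and toy data below are illustrative assumptions rather than TrustNewsFinder's exact formulas:

```python
# Sketch: article trust and evidence confidence reinforce each other
# until the scores stabilise.
def corroboration_trust(provides, iterations=20):
    """provides: dict article -> set of evidence tuples it reports."""
    evidence = {e for ev in provides.values() for e in ev}
    trust = {a: 1.0 for a in provides}   # article trustworthiness
    conf = {e: 1.0 for e in evidence}    # evidence confidence
    for _ in range(iterations):
        # evidence is confident if trustworthy articles corroborate it
        conf = {e: sum(trust[a] for a, ev in provides.items() if e in ev)
                for e in evidence}
        top = max(conf.values())
        conf = {e: c / top for e, c in conf.items()}  # keep scores bounded
        # an article is trustworthy if it provides much confident evidence
        trust = {a: sum(conf[e] for e in ev) for a, ev in provides.items()}
    return trust, conf

articles = {
    "article_A": {("quake", "Tokyo", "9am")},
    "article_B": {("quake", "Tokyo", "9am"), ("magnitude", "6.1")},
    "article_C": {("quake", "Osaka", "9pm")},  # conflicting account
}
trust, conf = corroboration_trust(articles)
print(max(trust, key=trust.get))  # article_B: most corroborated evidence
```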

4.
Motivated by a long-term goal in education of measuring Taiwanese civic scientific literacy in media (SLiM), this work reports the detailed techniques used to efficiently mine a concept map from 2 years of Chinese news articles (901,446 in total) for SLiM instrument development. From the Chinese news stories, key terms (important words or phrases), known or new to existing lexicons, were first extracted by a simple yet effective rule-based algorithm. They were subjected to an association analysis based on their co-occurrence in sentences to reveal their term-to-term relationships. A given list of 3657 index terms from science textbooks was then matched against the term association network. The resulting term network (including 95 scientific terms) was visualized in a concept map to scaffold the instrument developers. When developing an item, a linked term pair not only suggests the topic for the item, owing to the clear context mutually reinforced by the two terms, but also the content itself, because of the rich background provided by the recurrent snippets in which they co-occur. In this way, the resulting instrument (comprising 50 items) reflects the scientific knowledge revealed in daily news stories, meeting the goal of measuring civic scientific literacy in media. In addition, the concept map mined from the texts served as a convenient tool for item classification, developer collaboration, and expert review and discussion.
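The sentence-level co-occurrence analysis can be sketched as follows, with Jaccard strength standing in for whatever association measure the authors used, and a hand-picked term list replacing the rule-based key-term extractor:

```python
# Sketch: build a term association network from sentence co-occurrence.
from itertools import combinations
from collections import Counter

def association_network(sentences, terms, threshold=0.2):
    occurs = Counter()    # sentences containing each term
    together = Counter()  # sentences containing both terms of a pair
    for s in sentences:
        present = [t for t in terms if t in s]
        occurs.update(present)
        together.update(combinations(sorted(present), 2))
    edges = {}
    for (a, b), both in together.items():
        jaccard = both / (occurs[a] + occurs[b] - both)
        if jaccard >= threshold:
            edges[(a, b)] = round(jaccard, 2)
    return edges

corpus = [
    "the greenhouse effect traps heat and warms the climate",
    "climate change is driven by the greenhouse effect",
    "vaccines train the immune system",
]
index_terms = ["greenhouse effect", "climate", "immune system", "vaccines"]
print(association_network(corpus, index_terms))
```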

5.
Knowledge graphs have gained increasing popularity in the past couple of years, thanks to their adoption in everyday search engines. Typically, they consist of fairly static and encyclopedic facts about persons and organizations, e.g. a celebrity's birth date, occupation and family members, obtained from large repositories such as Freebase or Wikipedia. In this paper, we present a method and tools to automatically build knowledge graphs from news articles. As news articles describe changes in the world through the events they report, we present an approach to create Event-Centric Knowledge Graphs (ECKGs) using state-of-the-art natural language processing and semantic web techniques. Such ECKGs capture long-term developments and histories of hundreds of thousands of entities and are complementary to the static encyclopedic information in traditional knowledge graphs. We describe our event-centric representation schema, the challenges in extracting event information from news, our open source pipeline, and the knowledge graphs we have extracted from four different news corpora: general news (Wikinews), the FIFA World Cup, the Global Automotive Industry, and Airbus A380 airplanes. Furthermore, we present an assessment of the accuracy of the pipeline in extracting the triples of the knowledge graphs. Finally, through an event-centric browser and visualization tool, we show how approaching information from news in an event-centric manner can increase the user's understanding of the domain, facilitate the reconstruction of news story lines, and enable exploratory investigation of facts hidden in the news.
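A minimal sketch of what an event-centric triple representation might look like, using rdflib and the Simple Event Model (SEM) vocabulary that event-centric pipelines commonly build on; the event, entities, and provenance link below are invented for illustration and are not taken from the paper's schema:

```python
# Sketch: one news event as a small event-centric RDF graph.
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, XSD

SEM = Namespace("http://semanticweb.cs.vu.nl/2009/11/sem/")
EX = Namespace("http://example.org/news/")  # hypothetical namespace

g = Graph()
g.bind("sem", SEM)

event = EX["event/airbus-a380-first-flight"]
g.add((event, RDF.type, SEM.Event))
g.add((event, SEM.hasActor, EX["entity/Airbus"]))
g.add((event, SEM.hasPlace, EX["entity/Toulouse"]))
g.add((event, SEM.hasTimeStamp, Literal("2005-04-27", datatype=XSD.date)))
# provenance: which news article the event was extracted from
g.add((event, EX.derivedFrom, URIRef("http://example.org/article/42")))

print(g.serialize(format="turtle"))
```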

6.
In this paper, we propose an innovative architecture to segment a news video into so-called “stories” using both the video and the audio information. Segmenting news into stories is one of the key issues in achieving efficient treatment of news-based digital libraries. While the relevance of this research problem is widely recognized in the scientific community, few established solutions exist in the field. In our approach, the segmentation is performed in two steps: first, shots are classified by combining three different anchor shot detection algorithms using video information only. Then, the shot classification is improved by using a novel anchor shot detection method based on features extracted from the audio track. Tests on a large database confirm that the proposed system outperforms each single video-based method as well as their combination.
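A minimal sketch of the first step, assuming the three video-based detectors are combined by majority vote (the combination rule is not spelled out in the abstract); the detector stubs are placeholders for the paper's actual algorithms:

```python
# Sketch: classify a shot as an anchor shot if at least two of three
# detectors agree.
def detector_a(shot): return shot["face_present"]
def detector_b(shot): return shot["static_background"]
def detector_c(shot): return shot["studio_colors"]

def is_anchor_shot(shot, detectors=(detector_a, detector_b, detector_c)):
    votes = sum(1 for d in detectors if d(shot))
    return votes >= 2  # majority of the three detectors

shots = [
    {"face_present": True,  "static_background": True,  "studio_colors": False},
    {"face_present": False, "static_background": False, "studio_colors": True},
]
print([is_anchor_shot(s) for s in shots])  # [True, False]
```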

7.
Algorithms are playing an increasingly important role in the production of news content as their capacity for manipulating large-scale data continues to grow. In this article, we present the Personalized and Interactive News Generation System (PINGS), an algorithm-driven news generation system designed to provide personalized and interactive news for sports. We designed PINGS to generate baseball news based on the statistical importance of the data and on the direct manipulation of user interface components that alter the underlying algorithmic computation. We discuss the base-level algorithm framework for automated news content generation and describe the architecture of the system in terms of how it supports the generation of personalized news stories. An evaluation revealed that the algorithm is capable of generating news stories that are significantly more interesting and pleasant to read than traditional baseball news articles.
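One plausible reading of "statistical importance of the data" is picking the stat that deviates most from its season norm and filling a sentence template; the stats, the z-score criterion, and the templates below are illustrative assumptions, not PINGS's actual framework:

```python
# Sketch: select the most newsworthy stat by z-score and verbalise it.
import statistics

season = {"hits": [1, 2, 0, 1, 2, 1], "home_runs": [0, 0, 1, 0, 0, 0]}
today = {"hits": 4, "home_runs": 2}

def most_notable(today, season):
    """Return the stat whose value deviates most from its season norm."""
    def z(stat):
        mu = statistics.mean(season[stat])
        sd = statistics.stdev(season[stat]) or 1.0  # guard zero variance
        return (today[stat] - mu) / sd
    return max(today, key=z)

TEMPLATES = {
    "hits": "{player} racked up {n} hits, well above his season average.",
    "home_runs": "{player} belted {n} home runs in a single game.",
}

stat = most_notable(today, season)
print(TEMPLATES[stat].format(player="Kim", n=today[stat]))
```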

8.
To automatically mine the underlying relationships between famous persons in daily news, for example to build a news-person network with faces as icons that facilitates face-based person finding, we need a tool that automatically labels faces in news images with their real names. This paper studies the problem of linking names with faces in large-scale news images with captions. In our previous work, we proposed a method called Person-based Subset Clustering, which is mainly based on face clustering over all face images associated with the same name. The location where a name appears in a caption, as well as the visual structural information within a news image, provides informative cues as to who actually appears in the associated image. By combining the domain knowledge from the captions and the corresponding images, we propose a novel cross-modality approach to further improve the performance of linking names with faces. The experiments are performed on data sets including approximately half a million news images from Yahoo! News, and the results show that the proposed method achieves significant improvement over the clustering-only methods.

9.
In this paper we propose a two-stage segmentation approach for splitting TV broadcast news bulletins into sequences of news stories; codebooks derived from vector quantization are used for retrieving the segmented stories. At the first stage of segmentation, speaker (news reader) specific characteristics present in the initial headlines of a news bulletin are used for gross-level segmentation. During the second stage, errors in the gross-level (first-stage) segmentation are corrected by exploiting the speaker-specific information captured from the individual news stories other than the headlines. During headlines, the captured speaker-specific information is mixed with background music, and hence the segmentation at the first stage may not be accurate. In this work, speaker-specific information is represented using mel-frequency cepstral coefficients and captured by Gaussian mixture models (GMMs). The proposed two-stage segmentation method is evaluated on manually segmented broadcast TV news bulletins. From the evaluation results, it is observed that about 93% of the news stories are correctly segmented, 7% are missed and 6% are spurious. For navigating the bulletins, a quick navigation indexing method is developed based on speaker change points. The performance of the proposed two-stage segmentation and quick navigation methods is evaluated using GMM and neural network models. For retrieving target news stories from the news corpus, sequences of codebook indices derived from vector quantization are explored. The proposed retrieval approach is evaluated using queries of different sizes. Evaluation results indicate that the retrieval accuracy is proportional to the size of the query.
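The second-stage idea, scoring frame windows against per-speaker GMMs, can be sketched with scikit-learn; synthetic 13-dimensional features stand in for real MFCCs, and the window size and component count are arbitrary choices:

```python
# Sketch: assign each window of a bulletin to the speaker GMM with the
# highest average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# training features for two news readers (frames x 13 "MFCCs", synthetic)
train = {
    "reader_1": rng.normal(0.0, 1.0, size=(500, 13)),
    "reader_2": rng.normal(3.0, 1.0, size=(500, 13)),
}
models = {name: GaussianMixture(n_components=4, random_state=0).fit(feats)
          for name, feats in train.items()}

# a bulletin: 200 frames of reader_1 followed by 200 frames of reader_2
bulletin = np.vstack([rng.normal(0.0, 1.0, size=(200, 13)),
                      rng.normal(3.0, 1.0, size=(200, 13))])

window = 50  # frames per decision window
labels = []
for start in range(0, len(bulletin), window):
    chunk = bulletin[start:start + window]
    # average log-likelihood of the window under each speaker model
    labels.append(max(models, key=lambda m: models[m].score(chunk)))
print(labels)  # four reader_1 windows, then four reader_2 windows
```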

10.
Today, people have only limited, valuable leisure time, which they want to fill as well as possible according to their own interests, whereas broadcasters want to produce and distribute news items as fast and as targeted as possible. These developing news stories can be characterised as dynamic, chained, and distributed events, and it is important to aggregate, link, enrich, recommend, and distribute these news event items as targeted as possible to the individual, interested user. In this paper, we show how personalised recommendation and distribution of news events, described using an RDF/OWL representation of the NewsML-G2 standard, can be enabled by automatically categorising and enriching news event metadata via smart indexing and linked open datasets available on the web of data. The recommendations, based on a global, aggregated profile that also takes into account the (dis)likings of peer friends, are finally fed to the user via a personalised RSS feed. As such, the ultimate goal is to provide an open, user-friendly recommendation platform that equips the end user with a tool to access useful news event information that goes beyond basic information retrieval. At the same time, we provide the (inter)national community with standardised mechanisms to describe and distribute news event and profile information.

11.
Building an emotional dictionary for sentiment analysis of online news
Sentiment analysis of online documents such as news articles, blogs and microblogs has received increasing attention in recent years. In this article, we propose an efficient algorithm and three pruning strategies to automatically build a word-level emotional dictionary for social emotion detection. In the dictionary, each word is associated with a distribution over a series of human emotions. In addition, a method based on topic modeling is proposed to construct a topic-level dictionary, where each topic is correlated with social emotions. Experiments on real-world data sets have validated the effectiveness and reliability of the methods. Compared with other lexicons, the dictionary generated using our approach is language-independent, fine-grained, and volume-unlimited. The generated dictionary has a wide range of applications, including predicting the emotional distribution of news articles and identifying social emotions concerning certain entities and news events.
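A minimal sketch of building such a word-level dictionary, assuming documents carry reader emotion votes and each word simply accumulates the emotion distribution of the documents it appears in; the toy corpus and counting scheme are illustrative, and the paper's pruning strategies are omitted:

```python
# Sketch: aggregate document-level emotion votes into per-word
# emotion distributions.
from collections import Counter, defaultdict

# (document text, reader emotion votes)
corpus = [
    ("rescue team saves trapped miners", {"touched": 8, "happy": 5}),
    ("factory fire kills three workers", {"sad": 9, "angry": 4}),
    ("miners trapped after tunnel collapse", {"sad": 6, "worried": 7}),
]

def build_dictionary(corpus):
    counts = defaultdict(Counter)
    for text, votes in corpus:
        for word in set(text.split()):
            counts[word].update(votes)
    # normalise each word's counts into an emotion distribution
    return {w: {e: c / sum(cnt.values()) for e, c in cnt.items()}
            for w, cnt in counts.items()}

dictionary = build_dictionary(corpus)
print(dictionary["trapped"])  # leans towards sad/worried/touched
```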

12.
Online news has become one of the major channels for Internet users to get news. News websites are daily overwhelmed with plenty of news articles: huge amounts of online news articles are generated and updated every day, and the processing and analysis of this large corpus of data is an important challenge. This challenge needs to be tackled with big data techniques that process large volumes of data within limited run times. Also, since we are heading into a social-media data explosion, techniques such as text mining or social network analysis need to be seriously taken into consideration. In this work we focus on one of the most common daily activities: web news reading. News websites produce thousands of articles covering a wide spectrum of topics or categories, which can be considered a big data problem. In order to extract useful information, these news articles need to be processed using big data techniques. In this context, we present an approach for classifying huge amounts of different news articles into various categories (topic areas) based on the text content of the articles. Since these categories are constantly updated with new articles, our approach is based on Evolving Fuzzy Systems (EFS). An EFS can update in real time the model that describes a category according to changes in the content of the corresponding articles. The novelty of the proposed system lies in the treatment of the web news articles to be used by these systems and in the implementation and adjustment of the systems for this task. Our proposal not only classifies news articles, but also creates human-interpretable models of the different categories. This approach has been successfully tested using real online news.
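As a heavily simplified stand-in for the evolving behaviour (a real EFS maintains fuzzy rules, not centroids), the sketch below keeps per-category term-frequency centroids that update incrementally as labelled articles arrive:

```python
# Sketch: an incrementally updated centroid classifier as a simplified
# analogue of a category model that evolves with new articles.
import math
from collections import Counter, defaultdict

class EvolvingNewsClassifier:
    def __init__(self):
        self.centroids = defaultdict(Counter)  # category -> term sums

    def _cosine(self, a, b):
        dot = sum(a[t] * b.get(t, 0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def classify(self, text):
        v = Counter(text.lower().split())
        if not self.centroids:
            return None
        return max(self.centroids,
                   key=lambda c: self._cosine(v, self.centroids[c]))

    def update(self, text, category):
        """Fold a newly labelled article into its category model."""
        self.centroids[category].update(text.lower().split())

clf = EvolvingNewsClassifier()
clf.update("election results parliament vote", "politics")
clf.update("league final goal penalty", "sports")
print(clf.classify("parliament passes new vote"))  # politics
```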

13.
Personalized news recommendation has long been a popular research topic in recommender systems. Previous methods strive to satisfy users by constructing user preference profiles. Traditionally, most recent research uses users' reading history (content based) or access patterns (collaborative filtering based) to recommend newly published news. In this way, it considers only the relationship between news articles and users and ignores the context of the news report's background; in other words, it fails to provide more useful information that takes into account the progression of the news story chain. In this paper, we propose a definition of the quality of a news story chain. Besides, we propose a method to construct a news story chain on a news corpus with date information. Lastly, we use a greedy selection method to filter the final recommended news articles while considering both accuracy and diversity. In this way, we can provide news articles that meet the user's requirement: after reading the recommended news, the user gains a better understanding of the progression of the news story they read before. Finally, we designed several experiments to compare against state-of-the-art approaches, and the experimental results show that our proposed method significantly improves the accuracy, diversity and NDCG metrics.
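The greedy accuracy/diversity trade-off can be sketched in the style of Maximal Marginal Relevance; the 0.7 weight and toy scores are illustrative, and the paper's story-chain quality measure is not reproduced here:

```python
# Sketch: greedily pick articles that are relevant but not redundant
# with articles already selected (MMR-style).
def greedy_select(candidates, relevance, similarity, k=2, lam=0.7):
    """candidates: ids; relevance: id -> score; similarity: pair -> score."""
    selected = []
    while candidates and len(selected) < k:
        def mmr(c):
            redundancy = max((similarity[frozenset((c, s))] for s in selected),
                             default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

relevance = {"n1": 0.9, "n2": 0.85, "n3": 0.6}
similarity = {frozenset(("n1", "n2")): 0.95,  # n1 and n2 tell the same story
              frozenset(("n1", "n3")): 0.1,
              frozenset(("n2", "n3")): 0.1}
print(greedy_select(["n1", "n2", "n3"], relevance, similarity))  # ['n1', 'n3']
```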

14.
Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can be treated as one such stream of text; in this paper we discuss finding news articles on the web that are relevant to news currently being broadcast. We evaluated a variety of algorithms for this problem, looking at the impact of inverse document frequency, stemming, compounds, history, and query length on the relevance and coverage of news articles returned in real time during a broadcast. We also evaluated several postprocessing techniques for improving the precision, including reranking using additional terms, reranking by document similarity, and filtering on document similarity. For the best algorithm, 84–91% of the articles found were relevant, with at least 64% of the articles being on the exact topic of the broadcast. In addition, a relevant article was found for at least 70% of the topics.
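Query construction from broadcast text can be sketched as TF-IDF term weighting against a background corpus, with the top-weighted terms forming the query; the query length of five and the toy corpus are illustrative choices:

```python
# Sketch: build a web-search query from a broadcast transcript by
# taking the terms with the highest TF * IDF weight.
import math
from collections import Counter

background = [
    "the president met with congress today",
    "the game ended in overtime last night",
    "storm warnings issued for the coast",
]

def idf(term, docs):
    df = sum(1 for d in docs if term in d.split())
    return math.log((len(docs) + 1) / (df + 1)) + 1  # smoothed IDF

def build_query(transcript, docs=background, length=5):
    tf = Counter(transcript.lower().split())
    scored = {t: c * idf(t, docs) for t, c in tf.items()}
    return sorted(scored, key=scored.get, reverse=True)[:length]

transcript = ("the hurricane strengthened overnight and storm "
              "warnings were extended along the coast")
print(build_query(transcript))  # rare, topical terms rank first
```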

15.
Social media have ushered in alternative modalities to propagate news and developments rapidly. Just as traditional IR matured to modeling storylines from search results, we are now at a point where we can study how stories organize and evolve in additional media such as Twitter, a new frontier for intelligence analysis. This study takes news articles as well as social media feeds as input and extracts and connects entities into interesting storylines not explicitly stated in the underlying data. First, it proposes a novel method of spatio-temporal analysis on induced concept graphs that models storylines propagating through spatial regions in a time sequence. Second, it describes a method to control search-space complexity by providing regions of exploration. And third, it describes ConceptRank as a ranking strategy that differentiates strongly-typed connections from weakly-bound ones. Extensive experiments on the Boston Marathon Bombings of April 15, 2013, as well as socio-political and medical events in Latin America, the Middle East, and the United States, demonstrate storytelling's high application potential, showcasing its use in event summarization and in association analysis that identifies events before they hit the newswire.

16.
With the information explosion from the Internet, there is a need to efficiently determine the relevance of information. This paper discusses an approach to information filtering using dynamic abstract generation techniques. Different abstract generation techniques, such as the location method, indicative phrases, keyword frequency, and the title-keyword method, are incorporated into a retrieval interface for online news articles. During news retrieval and abstract generation, an extract containing a set of verbatim sentences from the news article is automatically produced. This forms an indicative abstract from which the prospective reader can decide whether to read the full-length news article. In this way, a reader can filter out irrelevant news articles without having to review the entire article.
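The keyword-frequency and title-keyword methods can be sketched together as a simple extractive scorer; the title weight and stopword list are illustrative choices, not the paper's parameters:

```python
# Sketch: score each sentence by frequent content words plus overlap
# with title words, and return the top sentences verbatim.
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "to", "and", "is", "on"}

def extract_abstract(title, sentences, k=1, title_weight=2.0):
    words = [w for s in sentences for w in s.lower().split()
             if w not in STOPWORDS]
    freq = Counter(words)
    title_terms = set(title.lower().split()) - STOPWORDS

    def score(sentence):
        terms = [w for w in sentence.lower().split() if w not in STOPWORDS]
        return sum(freq[w] + title_weight * (w in title_terms) for w in terms)

    return sorted(sentences, key=score, reverse=True)[:k]

title = "Council approves new subway line"
article = [
    "The city council voted on several measures yesterday.",
    "It approved funding for a new subway line downtown.",
    "Local shops hope the subway will bring more visitors.",
]
print(extract_abstract(title, article))
```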

17.
Storyline-based summarization for news topic retrospection
Electronic newspapers are gradually becoming a main source of news for readers. When facing numerous reports on a series of events within a topic, a summary of the stories will help news readers review the topic efficiently. Beyond identifying events and presenting news titles and keywords, as TDT (Topic Detection and Tracking) techniques do, a summarized text that presents event evolution is necessary for general news readers to review the events under a news topic. This paper proposes a topic retrospection process and implements the SToRe (Story-line based Topic Retrospection) system, which identifies the various events under a news topic and composes a summary from which news readers can get a sketch of event evolution within the topic. It consists of three main functions: event identification, main storyline construction and storyline-based summarization. The constructed main storyline removes irrelevant events and presents a main theme. The storyline-based summarization extracts the representative sentences and takes the main theme as the template to compose the summary. The storyline summary not only provides readers with enough information to understand the development of a news topic, but also serves as an index for readers to search corresponding news reports. Following a design science paradigm, a lab experiment is conducted to evaluate the SToRe system in a question-and-answer (Q&A) setting. The experimental results show that SToRe enables news readers to effectively and efficiently capture the evolution of a news topic.

18.
Prior research has identified the influence of using hyperlinks in online information gathering. This study attempts to understand, first, how hyperlinks can influence individuals' perceptions of news credibility and information-seeking behavior. Second, the paper extends previous research by examining the interaction of hyperlinks with the content of the story; in doing so, it examines the influence of hyperlinks on news frames. The data for the study were collected using two experiments embedded in a web-based survey. Findings show that hyperlinks in news stories can increase perceptions of credibility as well as information-seeking. Results reveal the interaction of news frames in the process: hyperlinks increase participants' perception of news credibility, but only in the value-framed condition. Implications are discussed.

19.
20.
This paper proposes a visualization method for news distribution in Blog space. Recently, blogs have become one of the important information resources on the Web, from which trend information can be obtained. On the other hand, online news sites are another information resource, reporting the latest events in the world. This paper focuses on the combination of both resources and proposes a method for visualizing news distribution in Blog space, which indicates various access patterns to news articles in Blog space. The types of objects to be visualized, as well as their relationships, are defined, based on which an interactive information visualization system is proposed. Experiments with test subjects are performed to investigate the viewpoints they employ when examining news distribution in Blog space. The results show that test subjects can examine news distribution in Blog space from various viewpoints, which affects their estimation of the impact of news articles.
