Related Articles
20 related articles retrieved (search time: 15 ms).
1.
The Journal of Supercomputing - The Twitter social network has gained popularity due to the increase in social activity of its registered users. Twitter performs the dual functions of online social...

2.
The University of Manchester's Small-Scale Experimental Machine (SSEM), known as the Baby, was rebuilt as a replica to celebrate, in June 1998, the 50th anniversary of the running of the world's first stored program. This article explains the background of the original Baby, and why and how a replica of it was built. The article concludes with some of the lessons learned from the project.

3.
Location detection and disambiguation from twitter messages (total citations: 1; self-citations: 0; citations by others: 1)
A remarkable number of Twitter messages are generated every second. Detecting the location entities mentioned in these messages is useful in text mining applications; therefore, techniques for extracting location entities from Twitter textual content are needed. In this work, we approach this task in a manner similar to the Named Entity Recognition (NER) task, but we focus only on locations, while NER systems detect names of persons, organizations, locations, and sometimes more (e.g., dates, times). Unlike NER systems, however, we address a deeper task: classifying the detected locations into names of cities, provinces/states, and countries in order to map them onto physical locations. We approach the task in a novel way, consisting of two stages. In the first stage, we train Conditional Random Fields (CRF) models that detect the locations mentioned in the messages. We train three classifiers, one for cities, one for provinces/states, and one for countries, with various sets of features. Since a dataset annotated with this kind of information was not available, we collected and annotated our own dataset for training and testing. In the second stage, we resolve the remaining ambiguities, namely cases in which more than one place has the same name. We propose a set of heuristics that choose the correct physical location in these cases. Our two-stage model allows a social media monitoring system to visualize the places mentioned in Twitter messages on a map of the world or to compute statistics about locations. This kind of information can be of interest to business or marketing applications.
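To make the second stage concrete, here is a minimal sketch of one plausible disambiguation heuristic, not the authors' exact rules: among gazetteer candidates sharing a surface name, prefer a candidate whose province/state or country is also mentioned in the tweet, and otherwise fall back to the most populous candidate. The toy gazetteer, field names, and population figures are assumptions for illustration.

```python
# Hedged sketch of a second-stage disambiguation heuristic (not the paper's
# exact rules). The tiny gazetteer below is illustrative only.
GAZETTEER = {
    "london": [
        {"city": "London", "admin": "England", "country": "United Kingdom", "population": 8_800_000},
        {"city": "London", "admin": "Ontario", "country": "Canada", "population": 400_000},
    ],
}

def disambiguate(place_name, tweet_text):
    """Pick the most plausible physical location for a detected place name."""
    candidates = GAZETTEER.get(place_name.lower())
    if not candidates:
        return None
    text = tweet_text.lower()
    # Heuristic 1: a co-mentioned province/state or country disambiguates directly.
    for cand in candidates:
        if cand["admin"].lower() in text or cand["country"].lower() in text:
            return cand
    # Heuristic 2 (fallback): choose the most populous candidate.
    return max(candidates, key=lambda c: c["population"])

if __name__ == "__main__":
    print(disambiguate("London", "Snow storm hitting London, Ontario tonight"))
    print(disambiguate("London", "Great gig in London last night"))
```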

4.
Change-point methods (CPMs) are statistical tests designed to assess whether a given sequence comes from a single, stationary data-generating process. CPMs also estimate the change-point location, i.e., the point where the data-generating process shifted. While there is a large literature on CPMs for sequences of independent and identically distributed (i.i.d.) random variables, their use on time-dependent signals has not been properly investigated. In this case, a straightforward solution consists of first computing the residuals between the observed signal and the output of a suitable approximation model, and then applying the CPM to the residual sequence. Unfortunately, in practical applications such residuals are seldom i.i.d., and this may prevent the CPMs from operating properly. To counteract this problem, we introduce the ensemble of CPMs, which aggregates estimates obtained from CPMs executed on different randomly sampled subsequences of the residuals. Experiments show that the ensemble of CPMs improves the change-point estimates when the residuals are not i.i.d., as is often the case in real-world scenarios.
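The aggregation idea can be sketched as follows, under stated assumptions: each run applies a simple mean-shift change-point statistic (a stand-in for whichever CPM is used) to a randomly sampled subsequence of the residuals, maps the estimate back to the original index, and the ensemble reports the median of the individual estimates. The statistic, sampling fraction, and aggregation rule are illustrative, not the paper's implementation.

```python
# Hedged sketch of an ensemble of change-point estimators over randomly
# sampled subsequences of residuals. Illustrative only.
import numpy as np

def mean_shift_changepoint(x):
    """Index that maximizes a scaled between-segment difference of means."""
    n = len(x)
    best_k, best_score = 1, -np.inf
    for k in range(1, n - 1):
        score = abs(x[:k].mean() - x[k:].mean()) * np.sqrt(k * (n - k) / n)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def ensemble_changepoint(residuals, n_runs=50, subsample=0.6, seed=0):
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals)
    n = len(residuals)
    estimates = []
    for _ in range(n_runs):
        idx = np.sort(rng.choice(n, size=int(subsample * n), replace=False))
        k = mean_shift_changepoint(residuals[idx])
        estimates.append(idx[k])  # map the estimate back to the original index
    return int(np.median(estimates))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 200)])
    print("estimated change point:", ensemble_changepoint(x))  # expected near 300
```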

5.
6.
Sample statistics and model parameters can be used to infer the properties, or characteristics, of the underlying population in typical data-analytic situations. Confidence intervals provide an estimate of the range within which the true value of a statistic lies; a narrow confidence interval implies low variability of the statistic, justifying a strong conclusion from the analysis. Many statistics used in software metrics analysis do not come with theoretical formulas that allow such accuracy assessment. Efron's bootstrap appears to address this weakness. In this paper, we present an empirical analysis of the reliability of several Efron nonparametric bootstrap methods in assessing the accuracy of sample statistics in the context of software metrics. A brief review of the basic concepts of the methods available for estimating statistical errors is provided, and the stated advantages of the Efron bootstrap are discussed. Several bootstrap algorithms are validated across basic software metrics in both simulated and industrial software engineering contexts. The 90 percent confidence intervals for the mean, median, and Spearman correlation coefficient were accurately predicted. The 90 percent confidence intervals for the variance and Pearson correlation coefficient were typically underestimated (corresponding to 60-70 percent intervals), and those for skewness and kurtosis were overestimated (corresponding to 98-100 percent intervals). The bias-corrected and accelerated (BCa) bootstrap gave the most consistent confidence intervals, but its accuracy depended on the metric examined. A method for correcting the under-/overestimation of bootstrap confidence intervals for small data sets is suggested, but its success was found to be inconsistent across the tested metrics.
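As background, the simplest of the evaluated family is the nonparametric percentile bootstrap; a minimal sketch for an arbitrary sample statistic follows. It deliberately omits the BCa correction the paper also evaluates, and the lognormal test data are illustrative, not a software metrics dataset.

```python
# Minimal percentile-bootstrap confidence interval for an arbitrary sample
# statistic (mean, median, correlation, ...). Illustrative sketch only.
import numpy as np

def bootstrap_ci(data, statistic=np.mean, n_boot=2000, alpha=0.10, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    reps = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(data, size=len(data), replace=True)
        reps[i] = statistic(sample)
    lo, hi = np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

if __name__ == "__main__":
    # Skewed toy data standing in for a software metric (e.g., module size).
    metric = np.random.default_rng(1).lognormal(mean=4.0, sigma=0.8, size=60)
    print("90% CI for the mean:", bootstrap_ci(metric, np.mean))
    print("90% CI for the median:", bootstrap_ci(metric, np.median))
```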

7.
Detecting topics from Twitter streams has become an important task, with uses in fields including natural disaster warning, user opinion assessment, and traffic prediction. In this article, we outline different types of topic detection techniques and evaluate their performance. We group the techniques into five categories: clustering, frequent pattern mining, exemplar-based methods, matrix factorization, and probabilistic models. For clustering, we discuss and evaluate nine techniques: sequential k-means, spherical k-means, kernel k-means, scalable kernel k-means, incremental batch k-means, DBSCAN, spectral clustering, document pivot clustering, and Bngram. For matrix factorization, we analyze five techniques: sequential Latent Semantic Indexing (LSI), stochastic LSI, Alternating Least Squares (ALS), Rank-one Downdate (R1D), and Column Subset Selection (CSS). We additionally evaluate several techniques in the frequent pattern mining, exemplar-based, and probabilistic model categories. Results on three Twitter datasets show that Soft Frequent Pattern Mining (SFM) and Bngram achieve the best term precision, while CSS achieves the best term recall and topic recall in most cases. Exemplar-based topic detection obtains a good balance between term recall and term precision, while also achieving good topic recall and running time.
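A minimal sketch of the clustering family evaluated here: L2-normalized TF-IDF vectors followed by k-means approximate the spherical k-means variant, and each centroid's top-weighted terms serve as a candidate topic. It uses scikit-learn; the parameter values are assumptions, and this is not the evaluation pipeline of the article.

```python
# Hedged sketch of clustering-based topic detection on tweets: TF-IDF (rows
# are L2-normalized by default) + k-means, reporting each centroid's
# top-weighted terms as the cluster's topic. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def detect_topics(tweets, n_topics=5, top_terms=8, seed=0):
    vectorizer = TfidfVectorizer(stop_words="english", min_df=2)
    X = vectorizer.fit_transform(tweets)  # sparse TF-IDF matrix, L2-normalized rows
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=seed).fit(X)
    terms = vectorizer.get_feature_names_out()
    topics = []
    for center in km.cluster_centers_:
        top = center.argsort()[::-1][:top_terms]  # highest-weighted vocabulary indices
        topics.append([terms[i] for i in top])
    return topics

# Example: topics = detect_topics(list_of_tweet_texts, n_topics=10)
```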

8.
Airborne laser profiling data were used to estimate the basal area, volume, and biomass of primary tropical forests. A procedure was developed and tested to decouple the laser and ground data collection efforts, using three distinct data sets acquired in and over the tropical forests of Costa Rica. Fixed-area ground plot data were used to simulate the height characteristics of the tropical forest canopy and to simulate laser measurements of that canopy. On two of the three study sites, the airborne laser estimates of basal area, volume, and biomass grossly misrepresented the corresponding ground estimates. On the third study site, where the widest ground plots were used, airborne and ground estimates agreed within 24%. The prediction inaccuracies for basal area, volume, and biomass in the first two study areas were tied directly to disagreements between simulated laser estimates and the corresponding airborne measurements of average canopy height, height variability, and canopy density. A number of sampling issues were investigated, and the following results were noted in the analyses of the three study areas. 1) Of the four ground segment lengths considered (25 m, 50 m, 75 m, and 100 m), the 25 m segment length introduced a level of variability that may severely degrade prediction accuracy in these Costa Rican primary tropical forests; this effect was more pronounced as plot width decreased. A minimum segment length on the order of 50 m is indicated. 2) The decision to transform or not to transform the dependent variable (e.g., biomass) was by far the most important factor of those considered in this experiment. The natural log transformation of the dependent variable increased prediction error, and the error increased dramatically at the shorter segment lengths. The most accurate models were multiple linear models with a forced zero intercept and an untransformed dependent variable. 3) General linear models were developed to predict basal area, volume, and biomass from airborne laser height measurements; useful laser measurements include the average canopy height computed over all pulses, the average canopy height computed over canopy hits only, and the coefficients of variation of these two quantities. Coefficients of determination range from 0.4 to 0.6. Based on this research, airborne laser and ground sampling procedures are proposed for reconnaissance-level surveys of inaccessible forested regions.
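Point 3 describes multiple linear models with a forced zero intercept relating biomass (or basal area or volume) to laser canopy-height statistics. A minimal sketch of such a fit follows; the predictor names, coefficients, and synthetic data are assumptions, not the Costa Rica datasets.

```python
# Hedged sketch of a zero-intercept multiple linear model predicting biomass
# from airborne-laser canopy height statistics. Synthetic, illustrative data.
import numpy as np

def fit_zero_intercept(X, y):
    """Least-squares coefficients for y ~ X with no intercept term."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 40
    X = np.column_stack([
        rng.uniform(15, 35, n),    # mean canopy height, all pulses (m)
        rng.uniform(20, 40, n),    # mean canopy height, canopy hits only (m)
        rng.uniform(0.2, 0.6, n),  # coefficient of variation, all pulses
        rng.uniform(0.1, 0.5, n),  # coefficient of variation, canopy hits
    ])
    # Synthetic "true" relation plus noise, purely for demonstration.
    biomass = X @ np.array([6.0, 4.0, -50.0, -30.0]) + rng.normal(0, 20, n)
    coef = fit_zero_intercept(X, biomass)
    print("fitted coefficients:", coef)
    print("predicted biomass for first segment:", X[0] @ coef)
```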

9.
Data from the ICESat/GLAS laser altimetry mission is used to obtain an estimate of the volume change of Greenland's ice sheet over the time span of February 2003 to April 2007. A novel processing strategy is developed and applied. It uses approximately 1 million ICESat elevation differences at geometrically overlapping footprints of both crossing and repeated tracks. The data are edited using quality flags defined by the ICESat/GLAS science team, as well as other additional criteria. In order to reduce the influence of surface slope, we propose a correction based on the ICESat/GLAS laser altimetry digital elevation model. Three slightly different processing strategies to convert the observed temporal elevation differences to elevation/volume changes are compared for 6 different drainage systems, further divided into regions above and below 2000 m in elevation. The final chosen strategy includes the correction for surface slopes, but does not include the removal of outlying elevation changes. For the region above 2000 m, a positive elevation change rate of 2 cm/year is obtained, which corresponds to a volume change rate of 21 km³/year. For the region below 2000 m the estimated elevation change rate is −24 cm/year, which corresponds to a volume loss of 168 km³/year. In general, the obtained results are in agreement with trends discovered by other authors that were also derived from laser altimetry. Nevertheless, the estimation obtained in this study suggests a more negative trend than those obtained previously. The differences can be explained by differences in the sampling of the region below 2000 m and, to a certain extent, by different time spans of the datasets used. A representative sampling of coastal areas is identified as the most critical issue for an accurate estimation of volume change rates in Greenland.

10.
With low computation cost, motion vectors can be readily extracted from MPEG video streams and processed to estimate vehicle motion speed. A statistical model is proposed to model vehicle speed and noise. In order to achieve high estimation accuracy and to study the limitations of the proposed algorithm, we quantitatively evaluated four parameters used in our algorithm: the temporal filter window size T, the video resolution Rv (CIF/QCIF), the motion vector frame distance m, and the video bit-rate. Our experiments showed that the mean vehicle speed can be estimated with high accuracy, up to 85-92%, by proper spatial and temporal processing. The proposed algorithm is especially suitable for Skycam-based applications, where traditional tracking-based or virtual-loop-based approaches perform poorly because of their requirements for high-resolution images. Although extensive work has been done in extracting motion information directly from MPEG video data in the compressed domain, to the best of our knowledge this paper is the first work in which the stationary motion (speed) of moving objects is estimated with high accuracy directly from MPEG motion vectors. Furthermore, the proposed method is not inherently limited to vehicle speed estimation and can be applied to other applications where the stationary motion assumption is satisfied.
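A minimal sketch of the core idea, not the authors' statistical model: convert the motion vectors in a region of interest to a pixel velocity using the frame distance m and frame rate, map to ground speed with a camera calibration, and smooth with a temporal filter of window size T. The pixels-per-metre calibration and all numeric values are assumptions for illustration.

```python
# Hedged sketch of speed estimation from MPEG motion vectors: per-frame mean
# vector magnitude -> ground speed via an assumed calibration, then a
# temporal median filter of window T. Not the paper's statistical model.
import numpy as np

def frame_speed(motion_vectors, frame_distance_m, fps, pixels_per_metre):
    """Mean speed (m/s) implied by one frame's motion vectors, shape (k, 2) in pixels."""
    mv = np.asarray(motion_vectors, dtype=float)
    if mv.size == 0:
        return 0.0
    magnitudes = np.hypot(mv[:, 0], mv[:, 1])           # displacement accumulated over m frames
    pixels_per_second = magnitudes.mean() * fps / frame_distance_m
    return pixels_per_second / pixels_per_metre

def temporal_median(speeds, window_T):
    """Sliding median filter over T frame-level speed estimates."""
    speeds = np.asarray(speeds, dtype=float)
    out = np.empty_like(speeds)
    half = window_T // 2
    for i in range(len(speeds)):
        lo, hi = max(0, i - half), min(len(speeds), i + half + 1)
        out[i] = np.median(speeds[lo:hi])
    return out

# Example with an assumed calibration of 8 pixels per metre:
# speeds = [frame_speed(mv, frame_distance_m=3, fps=25, pixels_per_metre=8.0)
#           for mv in per_frame_vectors]
# smoothed = temporal_median(speeds, window_T=9)
```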

11.
This article is intended as a preliminary report on the implementation of a finite volume multilevel scheme for the discretization of the incompressible Navier-Stokes equations. As is well known, the use of staggered grids (e.g., MAC grids; Perić et al., Comput. Fluids, 16(4), 389-403, 1988) is a serious impediment to the implementation of multilevel schemes in the context of finite differences. This difficulty is circumvented here by the use of a colocated finite volume discretization (Faure et al. (2004a), submitted; Perić et al., Comput. Fluids, 16(4), 389-403, 1988), for which the algebra of multilevel methods is much simpler than in the context of MAC-type finite differences. The general ideas and the numerical simulations are presented in this article in the simplified context of a two-dimensional Burgers equation; the two- and three-dimensional Navier-Stokes equations, which introduce new difficulties related to the incompressibility condition and the time discretization, will be considered elsewhere (see Faure et al. (2004a), submitted, and Faure et al. (2004b), in preparation).

12.
The increasing popularity of Twitter as a social networking tool for opinion expression as well as information retrieval has created the need for computational means to detect and track relevant topics and events in the network. Applying topic detection and tracking methods to tweets enables users to extract newsworthy content from the vast and somewhat chaotic Twitter stream. In this paper, we apply our technique, named Transaction-based Rule Change Mining, to extract newsworthy hashtag keywords present in tweets from two different domains, namely sports (the English FA Cup 2012) and politics (the US Presidential Elections 2012 and Super Tuesday 2012). Noting the peculiar nature of event dynamics in these two domains, we apply different time windows and update rates to each of the datasets in order to study their impact on performance. The effectiveness results show that our approach accurately detects and tracks newsworthy content. In addition, the results show that adapting the time window yields better performance, especially on the sports dataset, which can be attributed to the usually shorter duration of football events.
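To illustrate how a time window and update rate interact in this setting, here is a generic sliding-window burst score over hashtag frequencies. It is explicitly not the authors' Transaction-based Rule Change Mining; the window length, minimum count, and growth ratio are assumptions.

```python
# Hedged sketch (NOT the paper's Transaction-based Rule Change Mining): flag
# hashtags whose count in the current time window jumps relative to the
# previous window. Window length and update rate are the tunable knobs.
from collections import Counter

def window_counts(tweets, t_start, t_end):
    """Hashtag counts for tweets with timestamp in [t_start, t_end)."""
    counts = Counter()
    for ts, text in tweets:  # tweets: iterable of (timestamp_seconds, text)
        if t_start <= ts < t_end:
            counts.update(w.lower() for w in text.split() if w.startswith("#"))
    return counts

def bursting_hashtags(tweets, t_now, window, min_count=5, ratio=3.0):
    """Hashtags whose count grew by at least `ratio` versus the previous window."""
    cur = window_counts(tweets, t_now - window, t_now)
    prev = window_counts(tweets, t_now - 2 * window, t_now - window)
    return sorted(
        tag for tag, c in cur.items()
        if c >= min_count and c >= ratio * (prev.get(tag, 0) + 1)
    )

# Example: call bursting_hashtags(stream, t_now=now, window=300) every
# `update_rate` seconds; a shorter window suits fast-moving sports events.
```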

13.
14.
Although partial harvests are common in many forest types globally, there has been little assessment of the potential to map the intensity of these harvests using Landsat data. We modeled basal area removal and percent cover change in a study area in central Washington (northwestern USA) using biennial Landsat imagery and reference data from historical aerial photos and a system of inventory plots. First, we assessed the correlation of Landsat spectral bands and associated indices with measured levels of forest removal. The variables most closely associated with forest removal were the shortwave infrared (SWIR) bands (5 and 7) and those strongly influenced by SWIR reflectance (particularly Tasseled Cap Wetness and the Disturbance Index). The band and indices associated with near-infrared reflectance (band 4, Tasseled Cap Greenness, and the Normalized Difference Vegetation Index) were only weakly correlated with the degree of forest removal. Two regression-based methods of estimating forest loss were tested. The first, termed “state model differencing” (SMD), involves creating a model of the relationship between inventory data from any date and the corresponding, cross-normalized spectral data. This “state model” is then applied to imagery from two dates, with the difference between the two estimates taken as the estimated change. The second approach, which we call “direct change modeling” (DCM), involves modeling forest structure changes as a single term using re-measured inventory data and spectral differences from corresponding image pairs. In a leave-one-out cross-validation, DCM-derived estimates of harvest intensity had lower root mean square errors than SMD for both relative basal area change and relative cover change. The higher measured accuracy of DCM in this project must be weighed against several operational advantages of SMD relating to less restrictive reference data requirements and more specific resultant estimates of change.
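The two strategies can be sketched with ordinary least squares on illustrative arrays: SMD fits one state model (inventory attribute versus cross-normalized spectral data) and differences its predictions at two dates, while DCM regresses observed change directly on the spectral difference. The array shapes and variable names are assumptions; the paper's actual model forms and predictors may differ.

```python
# Hedged sketch of the two estimators compared in the paper, using ordinary
# least squares. Arrays are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

def smd_change(spec_t1, spec_t2, spec_train, state_train):
    """State model differencing: one 'state' model applied to both dates."""
    state_model = LinearRegression().fit(spec_train, state_train)
    return state_model.predict(spec_t2) - state_model.predict(spec_t1)

def dcm_change(spec_t1, spec_t2, dspec_train, dstate_train):
    """Direct change modeling: change regressed on the spectral difference."""
    change_model = LinearRegression().fit(dspec_train, dstate_train)
    return change_model.predict(spec_t2 - spec_t1)

# spec_* are (n_pixels, n_bands) arrays of cross-normalized Landsat bands/indices;
# state_train holds plot-level basal area or cover, dstate_train re-measured change.
```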

15.
A methodology is developed here to model evapotranspiration (λEc) from the canopy layer over large areas by combining satellite and ground measurements of biophysical and meteorological variables. The model follows the energy balance approach, in which λEc is estimated as a residual when the net radiation (Rn), sensible heat flux (H), and ground heat flux (G) are known. Multi-spectral measurements from the NOAA Advanced Very High Resolution Radiometer (AVHRR) were used, along with routine meteorological measurements made on the ground, to estimate the components of the energy balance. The upwelling longwave radiation and H from the canopy layer were modelled using the canopy temperature, obtained from a linear relation between the Normalized Difference Vegetation Index (NDVI) and surface temperature. This method separates the flux contributions of the canopy and bare soil without the need for a complex two-layer model. From a theoretical analysis of canopy reflectance, leaf area, and canopy resistance, a model is developed to scale the transpiration estimates from the full canopy to give an area-averaged estimate from the mean NDVI of the study area. The model was tested using data collected from the First International Satellite Land Surface Climatology Project (ISLSCP) Field Experiment (FIFE), and the results show that the modelled values of total surface evapotranspiration from the soil and canopy layers differ from the ground measurements by less than 9 per cent.
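The residual form of the surface energy balance described above can be written in one line; the sketch below uses illustrative midday flux values (W m⁻²), not FIFE data.

```python
# Residual form of the surface energy balance used in the paper:
# canopy latent heat flux = net radiation - sensible heat flux - ground heat flux.
def latent_heat_flux(Rn, H, G):
    """lambda*Ec (W m^-2) estimated as the energy-balance residual Rn - H - G."""
    return Rn - H - G

# Example with illustrative midday values (W m^-2): 550 - 180 - 70 = 300
print(latent_heat_flux(550.0, 180.0, 70.0))
```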

16.
17.
18.
The restoration of digital images and patterns by the splitting-integrating method (SIM) proposed by Li (1993) and Li et al. (1992) is much simpler than other algorithms because no solutions of nonlinear algebraic equations are required. Let a pixel in a 2D image be split into N² subpixels; the convergence rates for pixel greyness under image normalization by SIM are O(1/N) and O(1/N²). In this paper, the advanced SIM using spline functions raises the convergence rates to O(1/N³) and O(1/N⁴). Error bounds for the pixel greyness obtained are derived from numerical analysis, and numerical experiments are carried out to confirm the high convergence rates of O(1/N³) and O(1/N⁴).

19.
The effort required to service maintenance requests on a software system increases as the system ages and deteriorates. Thus, it may be economical to replace an aged software system with a freshly written one to contain the escalating cost of maintenance. We develop a normative model of software maintenance and replacement effort that enables us to study optimal policies for software replacement. Based on both analytical and simulation solutions, we determine the timings of software rewriting and replacement, and hence the rewriting schedule, as well as the size of the rewriting team, as functions of the user environment, the effectiveness of rewriting, the technology platform, development quality, software familiarity, and the maintenance quality of the existing and new software systems. Among other things, we show that a volatile user environment often leads to delayed rewriting and early replacement (i.e., a compressed development schedule). On the other hand, greater familiarity with either the existing or the new software system allows for a less compressed development schedule. We also show that the potential savings from rewriting are higher if the new software system is developed on a superior technology platform, if programmers' familiarity with the new software system is greater, and if the software system is rewritten with a higher initial quality.

20.
The study of terrorism informatics using the Twitter microblogging service has received little attention in the past few years. Twitter has been identified as both a potential facilitator of and a powerful deterrent to terrorism. Based on observations of Twitter's role in civilian response during the recent 2009 Jakarta and Mumbai terrorist attacks, we propose a structured framework to harvest civilian sentiment and response on Twitter during terrorism scenarios. Coupled with intelligent data mining, visualization, and filtering methods, this data can be collated into a knowledge base of great utility to decision-makers and the authorities for rapid response and monitoring during such scenarios. Using synthetic experimental data, we demonstrate that the proposed framework yields meaningful graphical visualizations of information that reveal potential responses to terrorist threats. The novelty of this study is that microblogging has not previously been studied in the domain of terrorism informatics. This paper also contributes to the understanding of how conjoint structured-data and unstructured-content mining can extract deep knowledge from noisy Twitter messages, through our proposed structured framework.
