Similar Documents
20 similar documents found (search time: 31 ms)
1.
Change data capture (CDC) is a strategic component of data integration infrastructure and continues to drive the development of technologies such as ETL and EAI. Many database vendors offer their own CDC products, but each is limited to the vendor's own database system and tends to be expensive. Although changed data can be captured by scanning database log files, most database systems, such as Oracle, SQL Server, and DB2, do not disclose the internal format of their log files and only provide programmatic interfaces for log access. Some of these interfaces access the active log, some the stable log, and some the archived log, so the reliability of reading the log files is hard to guarantee. Existing research has focused on how to read log files through these application interfaces and has neglected reliability analysis. This paper analyzes the reliability conditions for reading each type of log file, proposes rules and an algorithm for reliable reading, and presents an algorithm for efficiently extracting changed data from log files. Experiments validate the reliability analysis model.
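To make the reliability condition concrete, here is a minimal Python sketch of the rule the abstract implies: a reader should consume records only up to an offset the DBMS has already made stable (flushed), since anything beyond it may still change. The text-line record format here is purely illustrative; as noted above, real systems expose their logs only through vendor APIs.

    def read_stable_records(log_path, stable_offset, last_read_offset=0):
        """Read change records only up to the stable (flushed) offset.

        Records beyond stable_offset may still be rewritten by the DBMS,
        so reading past it would be unreliable.
        """
        with open(log_path, "rb") as f:
            f.seek(last_read_offset)
            chunk = f.read(max(0, stable_offset - last_read_offset))
        changes = []
        for line in chunk.splitlines():
            # Hypothetical textual records; real logs need the vendor API.
            if line.split(b" ", 1)[0] in (b"INSERT", b"UPDATE", b"DELETE"):
                changes.append(line.decode("utf-8", errors="replace"))
        return changes, last_read_offset + len(chunk)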

2.
A database index is a key data structure for speeding up data retrieval. Starting from the common B-tree index structure, this article explains how indexes work, uses the characteristics of external storage to explain why most databases adopt the B+ tree as their index structure, and analyzes the strengths and weaknesses of the index implementation in the InnoDB storage engine of MySQL.
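For intuition about why B+ trees suit external storage, a small Python sketch: page-sized nodes give a fanout in the hundreds, so the tree stays only a few levels deep, and each level costs one page read. The fanout of 500 is an assumed round number, not an InnoDB constant.

    import math
    from bisect import bisect_right

    def btree_height(n_rows, fanout):
        return max(1, math.ceil(math.log(n_rows, fanout)))

    print(btree_height(100_000_000, 500))   # -> 3: ~3 page reads reach any row

    # Minimal B+ tree lookup: interior nodes only route; data lives in leaves.
    def search(node, key):
        if node["leaf"]:
            return node["values"].get(key)
        i = bisect_right(node["keys"], key)
        return search(node["children"][i], key)

    leaf1 = {"leaf": True, "values": {5: "row5"}}
    leaf2 = {"leaf": True, "values": {20: "row20"}}
    root = {"leaf": False, "keys": [10], "children": [leaf1, leaf2]}
    print(search(root, 20))                 # -> "row20"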

3.
The open-source database MySQL is increasingly used in airline information systems. Based on an in-depth analysis of the data characteristics of an air cargo system, this paper proposes combining in-memory and on-disk storage engines: business data is stored in InnoDB and reference data in the Memory engine, which greatly improves the performance of table join operations. The problem of persisting in-memory tables is solved by writing them back to corresponding disk tables.
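A minimal sketch of the engine split, with hypothetical table and column names (the paper's actual schema is not given in the abstract):

    # Illustrative DDL only; names are assumptions, not from the paper.
    DDL = """
    CREATE TABLE airport_code (          -- reference data: small, read-mostly
        code CHAR(3) PRIMARY KEY,
        city VARCHAR(64)
    ) ENGINE=MEMORY;

    CREATE TABLE waybill (               -- business data: must be durable
        id BIGINT PRIMARY KEY,
        origin CHAR(3),
        dest CHAR(3)
    ) ENGINE=InnoDB;
    """

    # Memory tables vanish on restart, so changes are written back to a
    # shadow InnoDB table for persistence, in the spirit of the paper:
    WRITE_BACK = "INSERT INTO airport_code_disk SELECT * FROM airport_code;"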

4.
《Computers in Industry》2014,65(6):937-951
Passage retrieval is usually defined as the task of searching for passages which may contain the answer for a given query. While these approaches are very efficient when dealing with texts, when applied to log files (i.e., semi-structured data containing both numerical and symbolic information) they usually return irrelevant or useless results. Nevertheless, one appealing way to improve the results is query expansion, which automatically or semi-automatically adds information to the query to improve the reliability and accuracy of the returned results. In this paper, we present a new approach for enhancing the relevancy of queries during passage retrieval in log files. It is based on two relevance feedback steps. In the first, we determine the explicit relevance feedback by identifying the context of the requested information within a learning process. The second step is a new kind of pseudo relevance feedback. Based on a novel term-weighting measure, it assigns a weight to terms according to their relatedness to the query. This measure, called TRQ (Term Relatedness to Query), is used to identify the most relevant expansion terms. The main advantage of our approach is that it can be applied both to log files and to documents from general domains. Experiments conducted on real data from logs and documents show that our query expansion protocol enables retrieval of relevant passages.
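As an illustration of relatedness-based term weighting, a small Python sketch follows. It scores candidate expansion terms by a simple co-occurrence ratio; the paper's actual TRQ formula is not reproduced here.

    from collections import Counter

    # A plausible co-occurrence-based relatedness score, for intuition only.
    def relatedness_scores(passages, query_terms):
        cooc, freq = Counter(), Counter()
        for p in passages:
            terms = set(p.lower().split())
            freq.update(terms)
            for q in query_terms:
                if q in terms:
                    for t in terms - {q}:
                        cooc[t] += 1
        return {t: cooc[t] / freq[t] for t in cooc}

    passages = ["kernel error in module eth0", "kernel panic error logged"]
    scores = relatedness_scores(passages, ["error"])
    expansion = sorted(scores, key=scores.get, reverse=True)[:3]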

5.
Logs of distributed business applications are currently stored in local log files on each distributed server node, without centralized storage or management, which slows down problem diagnosis and reduces troubleshooting efficiency. This paper presents an OSGi-based solution for distributed log collection and analysis. The solution uses a dedicated centralized log storage server and provides a common log model; the application's distributed nodes send log data conforming to this model to the server, which stores the data from all nodes centrally and presents it through an analysis UI, helping developers locate and analyze problems quickly. The solution is deployed into an application as an OSGi plug-in; if the plug-in is uninstalled, the application falls back to its original logging. Results show that with this scheme, the average access response time of a logging business application under 1000 concurrent users improved by 2 seconds, with no log data loss. Developers reported that error logs became much clearer and that locating problems took noticeably less time than with conventional log storage.
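A minimal Python sketch of the node side of such a scheme: every node emits records in one shared shape and ships them to the central log server. The field names, host, and port are hypothetical; the paper's common log model is Java/OSGi-based and not specified in the abstract.

    import json, socket, time

    # Hypothetical "common log model": every node sends records with the
    # same fields, so the central server can store and analyze them uniformly.
    def make_record(node_id, level, message):
        return {"node": node_id, "level": level,
                "ts": time.time(), "msg": message}

    def ship(record, host="logserver.example", port=5140):
        data = (json.dumps(record) + "\n").encode("utf-8")
        with socket.create_connection((host, port), timeout=2) as s:
            s.sendall(data)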

6.
Recently, researchers observed that a major problem in mining event logs is discovering a simple, sound and complete process model. Since mining techniques can only reproduce the behaviour recorded in the log, the fitness of the reproduced model is a function of the event log's completeness. In this paper, a Fuzzy-Genetic Mining model based on Bayesian Scoring Functions (FGM-BSF), which we call a probabilistic approach, was developed to tackle problems arising from incomplete event logs. The main motivation for using genetic mining for process discovery is to benefit from the global search performed by the algorithm. Incompleteness in processes involves uncertainty and is tackled by using the probabilistic nature of the scoring functions in a Bayesian network based on fuzzy-logic value prediction. The global search performed by the genetic approach is well suited to populations that contain both good and bad individuals. Hence, the proposed approach helps build a robust fitness function for the genetic algorithm through high-lift traces representing only good individuals that a mining model without an intelligent system would not detect. Our approach was implemented on the Java platform, with MySQL for event log parsing and preprocessing, while the actual discovery was done in ProM. The results showed that the proposed approach achieved a fitness of 0.98 when compared with existing schemes.
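For orientation, a generic genetic-search skeleton in Python; the Bayesian/fuzzy scoring of FGM-BSF is abstracted into a pluggable fitness callable, and individuals are simple activity sequences. This sketches only the search loop, not the paper's algorithm.

    import random

    def evolve(population, fitness, generations=100, mutate=0.05):
        for _ in range(generations):
            scored = sorted(population, key=fitness, reverse=True)
            parents = scored[:len(scored) // 2]       # keep good individuals
            children = []
            while len(children) < len(population) - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, len(a))
                child = a[:cut] + b[cut:]             # one-point crossover
                if random.random() < mutate:
                    i = random.randrange(len(child))
                    child = child[:i] + [random.choice("ABCD")] + child[i+1:]
                children.append(child)
            population = parents + children
        return max(population, key=fitness)

    pop = [list("ABCD"), list("DCBA"), list("ABDC"), list("BADC")]
    best = evolve(pop, fitness=lambda t: t.count("A"), generations=20)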

7.
This paper introduces the log management architecture of Windows NT systems and the file structure of their log files, proposes a scheme for capturing and parsing logs, and discusses how to manage logs within the Windows NT architecture so as to improve log security and repair damaged log files.

8.
Process mining can be seen as the “missing link” between data mining and business process management. The lion's share of process mining research has been devoted to the discovery of procedural process models from event logs. However, often there are predefined constraints that (partially) describe the normative or expected process, e.g., “activity A should be followed by B” or “activities A and B should never be both executed”. A collection of such constraints is called a declarative process model. Although it is possible to discover such models based on event data, this paper focuses on aligning event logs and predefined declarative process models. Discrepancies between log and model are mediated such that observed log traces are related to paths in the model. The resulting alignments provide sophisticated diagnostics that pinpoint where deviations occur and how severe they are. Moreover, selected parts of the declarative process model can be used to clean and repair the event log before applying other process mining techniques. Our alignment-based approach for preprocessing and conformance checking using declarative process models has been implemented in ProM and has been evaluated using both synthetic logs and real-life logs from a Dutch hospital.
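A toy illustration in Python of checking two common declarative constraint types against log traces; full alignment, as in the paper, additionally locates and quantifies each deviation rather than merely flagging violating traces.

    # Minimal checks for the two constraint types quoted above.
    def response_ok(trace, a, b):
        """Every occurrence of a is eventually followed by b."""
        return all(b in trace[i + 1:] for i, x in enumerate(trace) if x == a)

    def not_coexistence_ok(trace, a, b):
        """a and b never both occur in one trace."""
        return not (a in trace and b in trace)

    log = [["A", "C", "B"], ["A", "C"], ["B", "B"]]
    violations = [t for t in log if not response_ok(t, "A", "B")]
    # -> [["A", "C"]]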

9.
The LARGE framework is a log analysis system deployed in the supercomputing environment of the Chinese Academy of Sciences; it monitors and analyzes the environment's various log files through log collection, centralized analysis, and result feedback. When monitoring the environment's system logs, maintenance staff rely on a log-pattern extraction algorithm to reduce large volumes of historical system log records to a small set of log patterns. However, with growing log volume and the peculiarities of the messages log file, the original pattern extraction algorithm can no longer process large-scale logs quickly enough. This paper presents an optimization of the pattern extraction algorithm: a MapReduce mechanism accelerates log processing and pattern extraction when there are multiple input log files. Experiments show that with many input files, the optimization significantly speeds up the word-consistency-rate algorithm and greatly reduces running time. The algorithm's running time and extraction quality when using a word transformation function are also evaluated.
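A rough Python analogue of the map/reduce split: each input file is mapped to pattern counts in parallel, and the per-file counts are merged in a reduce step. The masking regex stands in for the word-consistency-rate algorithm, which is more involved.

    import re
    from multiprocessing import Pool

    # Map phase: collapse each line to a pattern by masking variable tokens.
    def map_file(path):
        patterns = {}
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                pat = re.sub(r"\b(0x[0-9a-f]+|\d+)\b", "<*>", line.strip())
                patterns[pat] = patterns.get(pat, 0) + 1
        return patterns

    # Reduce phase: merge per-file pattern counts.
    def reduce_counts(dicts):
        merged = {}
        for d in dicts:
            for k, v in d.items():
                merged[k] = merged.get(k, 0) + v
        return merged

    if __name__ == "__main__":
        files = ["messages.1", "messages.2", "messages.3"]  # illustrative paths
        with Pool() as pool:
            result = reduce_counts(pool.map(map_file, files))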

10.
A Graphical Tool for Viewing Linux System Audit Logs
Linux systems provide detailed audit logs. Based on the MySQL database and the Kylix GUI development tool, a graphical tool was developed for quickly querying Linux system audit logs; it can also compute statistics over the queried log entries, making the system audit logs more practical to use.

11.
The log files of the EAST plasma control system are large and unwieldy, which makes it hard to compile statistics on error types or to find the causes of errors quickly, causing great inconvenience to experiments. Against this background, it is necessary to mine the log information, build a log database for the plasma control system, and develop a B/S-mode management system on top of it. The log database is built by using regular expressions to extract characteristic information from the log files; after extraction and consolidation, the information is stored in a relational database. The front end is developed with Apache, PHP, and MySQL, providing a good interactive platform for users in different roles, with log query, comment management, and user management functions.
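A small sketch of the regex extraction step; the line pattern is purely illustrative, since the abstract does not show the EAST log format.

    import re

    # Assumed line layout: timestamp, level, message. Not the real format.
    LINE = re.compile(
        r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
        r"(?P<level>INFO|WARN|ERROR)\s+(?P<msg>.*)"
    )

    def extract(lines):
        rows = []
        for line in lines:
            m = LINE.match(line)
            if m:
                rows.append((m["ts"], m["level"], m["msg"]))
        return rows

    # Rows would then be inserted into a relational table, e.g.:
    # INSERT INTO shot_log (ts, level, msg) VALUES (%s, %s, %s)
    rows = extract(["2016-03-01 10:20:30 ERROR PF coil current out of range"])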

12.
Many companies have adopted Process-aware Information Systems (PAIS) to support their business processes in some form. On the one hand these systems typically log events (e.g., in transaction logs or audit trails) related to the actual business process executions. On the other hand explicit process models describing how the business process should (or is expected to) be executed are frequently available. Together with the data recorded in the log, this situation raises the interesting question “Do the model and the log conform to each other?”. Conformance checking, also referred to as conformance analysis, aims at the detection of inconsistencies between a process model and its corresponding execution log, and their quantification by the formation of metrics. This paper proposes an incremental approach to check the conformance of a process model and an event log. First of all, the fitness between the log and the model is measured (i.e., “Does the observed process comply with the control flow specified by the process model?”). Second, the appropriateness of the model can be analyzed with respect to the log (i.e., “Does the model describe the observed process in a suitable way?”). Appropriateness can be evaluated from both a structural and a behavioral perspective. To operationalize the ideas presented in this paper a Conformance Checker has been implemented within the ProM framework, and it has been evaluated using artificial and real-life event logs.
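A drastically simplified fitness computation in Python, for intuition only: the fraction of log traces the model accepts. The paper's metric is finer-grained, giving partial credit within a trace rather than all-or-nothing.

    def fitness(log, accepts):
        ok = sum(1 for trace in log if accepts(trace))
        return ok / len(log)

    # Toy model: an A, then any number of Bs, then a C.
    def accepts(trace):
        if not trace or trace[0] != "A" or trace[-1] != "C":
            return False
        return all(x == "B" for x in trace[1:-1])

    log = [["A", "B", "C"], ["A", "C"], ["A", "C", "B"]]
    print(fitness(log, accepts))   # -> 2/3, about 0.67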

13.
应毅  任凯  刘亚军 《计算机科学》2018,45(Z11):353-355
Traditional log analysis techniques hit computational bottlenecks when processing massive data. To address this, a log analysis solution based on big data technology was studied: log file storage, analysis, and mining are distributed across multiple machines. A parallel web log analysis engine was built on the open-source Hadoop framework, and an IP statistics algorithm and an anomaly detection algorithm were re-implemented under the MapReduce model. Experiments show that using big data technology for data-intensive computation markedly improves algorithm execution efficiency and system scalability.
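A minimal Hadoop Streaming-style mapper/reducer pair for the IP statistics part, assuming a common/combined log format with the client IP as the first field (the paper's exact algorithms are not given in the abstract):

    #!/usr/bin/env python3
    import sys

    # Mapper: emit "ip\t1" for each access-log line.
    def mapper():
        for line in sys.stdin:
            ip = line.split(" ", 1)[0]
            print(f"{ip}\t1")

    # Reducer: streaming sorts by key, so equal IPs arrive consecutively.
    def reducer():
        current, count = None, 0
        for line in sys.stdin:
            ip, _, n = line.rstrip("\n").partition("\t")
            if ip != current:
                if current is not None:
                    print(f"{current}\t{count}")
                current, count = ip, 0
            count += int(n)
        if current is not None:
            print(f"{current}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()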

14.
Motivated by the log analysis requirements of a particular website, a front-end log collection framework for a web log analysis system was designed and implemented. Instead of the traditional approach of writing log files and then polling them, the framework collects logs in a custom format using a client/server model, while remaining fully compatible with traditional log record formats and logging methods.

15.
Traditional storage engines modify data on Flash Memory via in-page updates, which degrades flash performance and aggravates wear. To address this, this paper proposes MV4Flash, a multi-version storage engine for Flash Memory that uses out-of-place updates. The engine adopts multi-version storage and a garbage collection mechanism; all data updates and modifications are performed by appending to files, which matches Flash Memory's erase-before-write characteristic and extends device lifetime. Tests of the engine with NDBBench show that MV4Flash delivers considerably better transaction processing performance than traditional InnoDB and is better suited to applications with large data volumes and strict real-time requirements.
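The out-of-place update idea in miniature, as Python: writes always append a new version, an index tracks the newest one, and garbage collection reclaims stale versions. This sketches the principle, not MV4Flash itself.

    class AppendOnlyStore:
        def __init__(self):
            self.log = []      # stands in for the append-only flash file
            self.index = {}    # key -> position of newest version

        def put(self, key, value):
            self.log.append((key, value))        # never overwrite in place
            self.index[key] = len(self.log) - 1

        def get(self, key):
            pos = self.index.get(key)
            return self.log[pos][1] if pos is not None else None

        def garbage_collect(self):
            live = [(k, v) for i, (k, v) in enumerate(self.log)
                    if self.index.get(k) == i]
            self.log = live
            self.index = {k: i for i, (k, _) in enumerate(live)}

    s = AppendOnlyStore()
    s.put("a", 1); s.put("a", 2)
    assert s.get("a") == 2
    s.garbage_collect()          # reclaims the stale version of "a"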

16.
Process mining aims at deriving order relations between tasks recorded by event logs in order to construct their corresponding process models. The quality of the results is not only determined by the mining algorithm being used, but also by the quality of the provided event logs. As a criterion of log quality, completeness measures the magnitude of information for process mining covered by an event log. In this paper, we focus on the evaluation of the local completeness of an event log. In particular, we consider the direct succession (DS) relations between the tasks of a business process. Based on our previous work, an improved approach called CPL+ is proposed in this paper. Experiments show that the proposed CPL+ works better than other approaches on event logs that contain a small number of traces. Finally, by further investigating CPL+, we also found that the more distinct DSs observed in an event log, the lower the local completeness of the log is.
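Computing the observed DS relation is straightforward; local completeness then asks how much of the process's true DS relation this observed set covers. A sketch (the CPL+ estimator itself is not reproduced here):

    # Direct-succession (DS) pairs observed in a log.
    def direct_successions(log):
        ds = set()
        for trace in log:
            ds.update(zip(trace, trace[1:]))
        return ds

    log = [["A", "B", "C"], ["A", "C", "B"]]
    observed = direct_successions(log)
    # {("A","B"), ("B","C"), ("A","C"), ("C","B")}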

17.
18.
A Provable Data Possession System for Real-World Cloud Storage Environments
肖达  杨绿茵  孙斌  郑世慧 《软件学报》2016,27(9):2400-2413
How support for dynamic data updates and third-party auditing is implemented is an important factor in the practicality of existing provable data possession (PDP) schemes. This paper proposes IDPA-MF-PDP, a secure and efficient PDP system for real-world cloud storage environments. A multi-file possession-proof algorithm, MF-PDP, built around the data update patterns of cloud storage, significantly reduces the cost of auditing multiple files. An implicit third-party auditing architecture with tamper-evident audit logs minimizes the need for users to be online. A three-party interaction protocol among the user, the cloud server, and the implicit auditor combines MF-PDP with the implicit third-party auditing architecture. Theoretical analysis and experimental results show that IDPA-MF-PDP is as secure as single-file PDP schemes, that the audit logs provide a trustworthy history of audit results, and that IDPA-MF-PDP reduces the computation and communication costs of possession auditing from linear in the number of files to near-constant.
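For intuition only, a toy hash-based challenge-response possession check in Python. Real PDP schemes such as MF-PDP use homomorphic authentication tags so that the verifier keeps only constant state and challenges remain unpredictable; this sketch has neither property.

    import hashlib, os

    # Toy scheme: the verifier keeps the expected digest of (nonce || block)
    # for sampled blocks; the prover must still hold the block to answer.
    def challenge():
        return os.urandom(16), 3         # nonce and a sampled block index

    def prove(storage, nonce, block_idx):
        return hashlib.sha256(nonce + storage[block_idx]).hexdigest()

    def verify(proof, nonce, block):
        return proof == hashlib.sha256(nonce + block).hexdigest()

    blocks = [os.urandom(4096) for _ in range(8)]   # file split into blocks
    nonce, idx = challenge()
    assert verify(prove(blocks, nonce, idx), nonce, blocks[idx])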

19.
Nowadays, most systems and applications produce log records that are useful for security and monitoring purposes such as debugging programming errors, checking system status, and detecting configuration problems or even attacks. To this end, a log repository becomes necessary whereby logs can be accessed and visualized in a timely manner. This paper presents Loginson, a high-performance log centralization system for large-scale log collection and processing in large IT infrastructures. Besides log collection, Loginson provides high-level analytics through a visual interface for the purpose of troubleshooting critical incidents. We note that Loginson outperforms all of the other log centralization solutions by taking full advantage of vertical scalability, and therefore decreases Capital Expenditure (CAPEX) and Operating Expense (OPEX) costs for deployment scenarios with a huge volume of log data.

20.
In this paper we present recovery techniques for distributed main-memory databases, specifically for client-server and shared-disk architectures. We present a recovery scheme for client-server architectures which is based on shipping log records to the server, and two recovery schemes for shared-disk architectures—one based on page shipping, and the other based on broadcasting of the log of updates. The schemes offer different tradeoffs, based on factors such as update rates. Our techniques are extensions to a distributed-memory setting of a centralized recovery scheme for main-memory databases, which has been implemented in the Dalì main-memory database system. Our centralized as well as distributed-memory recovery schemes have several attractive features—they support an explicit multi-level recovery abstraction for high concurrency, reduce disk I/O by writing only redo log records to disk during normal processing, and use per-transaction redo and undo logs to reduce contention on the system log. Further, the techniques use a fuzzy checkpointing scheme that writes only dirty pages to disk, yet minimally interferes with normal processing—all but one of our recovery schemes do not require updaters to even acquire a latch before updating a page. Our log shipping/broadcasting schemes also support concurrent updates to the same page at different sites.
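A compressed sketch of two of the listed ideas, redo-only logging during normal processing and fuzzy checkpointing of dirty pages, in Python; the multi-level recovery abstraction and per-transaction logs are omitted.

    import json

    class MainMemoryDB:
        def __init__(self, log_path="redo.log"):
            self.pages = {}          # page_id -> data (the in-memory database)
            self.dirty = set()
            self.log = open(log_path, "a", encoding="utf-8")

        def update(self, page_id, data):
            # Only a redo record goes to disk during normal processing.
            self.log.write(json.dumps({"op": "redo", "page": page_id,
                                       "data": data}) + "\n")
            self.log.flush()         # redo record reaches disk before commit
            self.pages[page_id] = data
            self.dirty.add(page_id)

        def fuzzy_checkpoint(self, snapshot):
            for page_id in list(self.dirty):    # only dirty pages go to disk
                snapshot[page_id] = self.pages[page_id]
                self.dirty.discard(page_id)

    db = MainMemoryDB()
    db.update("p1", "hello")
    disk_snapshot = {}
    db.fuzzy_checkpoint(disk_snapshot)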
