20 related publications found
1.
Construction schedules can mitigate delay risks and are essential to project success. Yet creating a quality construction schedule typically depends on experienced schedulers, and the task is made harder because historical information, including the reasoning behind scheduling decisions, is rarely documented and disseminated for future use. This study proposes a graph-based method for finding the most time-efficient construction sequences in historical projects to improve scheduling productivity and accuracy. The proposed method captures the textual, numerical, and graphical features of construction schedules and was validated on 353 construction schedules obtained from a Tier-1 contractor in the UK. The results indicate that earthwork sequences can be finished in 4.0% of the project time on average, yet they are the least time-efficient sequences in a construction project (29% delayed), particularly in road construction (88% delayed). The study compared the time efficiency of sequences learned from previous projects with case-study sequences; the frequent sequences learned from past projects were 26.7% closer to the actual schedule than the planned ones. These results could help inexperienced schedulers create higher-quality construction schedules and help project managers benchmark project performance.
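The abstract stays at a high level; as a loose illustration only, the following minimal Python sketch mines frequent activity-to-activity transitions from historical schedules and chains them into a "typical" sequence. The activity labels, data structure, and greedy chaining are assumptions for illustration, not the authors' graph-based method.

```python
from collections import Counter

# Hypothetical input: each historical schedule reduced to an ordered list of activity labels.
schedules = [
    ["site setup", "earthworks", "drainage", "pavement"],
    ["site setup", "earthworks", "pavement", "drainage"],
    ["site setup", "earthworks", "drainage", "pavement"],
]

# Count how often one activity directly follows another across all projects.
edge_counts = Counter()
for schedule in schedules:
    edge_counts.update(zip(schedule, schedule[1:]))

def frequent_sequence(start, steps):
    """Greedily chain each activity's most frequent successor into a typical sequence."""
    seq = [start]
    for _ in range(steps):
        candidates = {b: c for (a, b), c in edge_counts.items()
                      if a == seq[-1] and b not in seq}
        if not candidates:
            break
        seq.append(max(candidates, key=candidates.get))
    return seq

print(frequent_sequence("site setup", steps=3))
# ['site setup', 'earthworks', 'drainage', 'pavement']
```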
2.
3.
In this paper we describe the NLP techniques designed and used in collaboration between the CLLE-ERSS research laboratory and the CFH/Safety Data company to manage and analyse aviation incident reports. These reports are written whenever anything abnormal occurs during a civil flight. Although most of them relate to routine problems, they are a valuable source of information about possible sources of greater danger. The texts are written in plain language, show a wide range of linguistic variation (from telegraphic style crowded with acronyms to standard prose), and exist in different languages, even within a single company or country (although our main focus is on English and French). In addition to their variety, their sheer quantity (e.g. 600 per month for a large airline) clearly requires advanced NLP and text-mining techniques to extract useful information from them. Although this context and these objectives may suggest that standard NLP techniques can be applied in a straightforward manner, innovative techniques are required to handle the specifics of aviation report text and the complex classification systems. We present several tools that aim at better access to these data (classification and information retrieval) and that help aviation safety experts in their analyses (data/text mining and interactive analysis). Some of these tools are currently in test or in use at both national and international levels, by airline companies as well as by regulatory authorities (DGAC, EASA, ICAO).
4.
Examining past near-miss reports provides information that can be used to mitigate and control hazards that materialise on construction sites. Yet analysing near-miss reports is time-consuming and labour-intensive. Automatic text classification using machine learning and ontology-based approaches can be used to mine reports of this nature, but such approaches tend to suffer from weak generalisation, which can adversely affect classification performance. To address this limitation and improve classification accuracy, we develop an improved deep-learning-based approach that automatically classifies near-miss information contained within safety reports using Bidirectional Encoder Representations from Transformers (BERT). Our approach pre-trains deep bidirectional representations by jointly extracting context features in all layers. We validate its effectiveness and feasibility using a database of near-miss reports derived from actual construction projects, which was used to train and test our model. The results demonstrate that our approach can accurately classify near misses and outperforms prevailing state-of-the-art automatic text classification approaches. Understanding the nature of near misses can help site managers identify work areas and situations where an accident is likely to occur.
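The abstract does not include implementation details; as a minimal sketch of the general technique it names, the following fine-tunes a pre-trained BERT model for report classification with the Hugging Face transformers library. The checkpoint name, binary label scheme, and toy reports are assumptions, not the authors' setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed setup: binary labels (0 = routine observation, 1 = near miss).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

reports = ["Scaffold plank slipped but no one was underneath.",
           "Toolbox talk completed before the morning shift."]
labels = torch.tensor([1, 0])

# Tokenise the free-text reports and run a single training step.
batch = tokenizer(reports, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)   # cross-entropy loss computed internally
outputs.loss.backward()
optimizer.step()

# Inference: argmax over the logits gives the predicted class per report.
model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())
```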
5.
The literature on supervised machine-learning (ML) approaches for classifying text-based safety reports in the construction sector has been growing. Recent studies have emphasised the need for ML approaches that balance high classification accuracy with performance on management criteria such as resource intensiveness. However, despite being highly accurate, the extensively studied supervised ML approaches may not perform well on management criteria because many factors contribute to their resource intensiveness. The potential for semi-supervised ML approaches to achieve balanced performance has rarely been explored in the construction safety literature. The current study contributes to this scarce knowledge by demonstrating the applicability of a state-of-the-art semi-supervised learning approach, Yet Another Keyword Extractor (YAKE) integrated with Guided Latent Dirichlet Allocation (GLDA), to construction safety report classification. Construction-safety-specific knowledge is extracted as keywords through YAKE, relying on accessible literature with minimal manual intervention. Keywords from YAKE are then seeded into the GLDA model for automatic classification of safety reports without requiring a large quantity of pre-labelled data. The YAKE-GLDA classification performance (F1 score of 0.66) is superior to existing unsupervised methods on benchmark data containing injury narratives from the Occupational Safety and Health Administration (OSHA). The YAKE-GLDA approach is also applied to near-miss safety reports from a construction site, and a moderately high F1 score of 0.86 for several categories in the near-miss data demonstrates a high degree of generality. Unlike existing supervised approaches, the semi-supervised YAKE-GLDA approach can consistently achieve reasonably good classification performance across various construction-specific safety datasets while remaining resource-efficient. Results from an objective comparative and sensitivity analysis provide much-needed insights into the functioning and applicability of YAKE-GLDA. The findings will help construction organisations implement and optimise an efficient ML-based knowledge-mining strategy for domains beyond safety and across sites where the availability of a pre-labelled dataset is a significant limitation.
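A minimal sketch of the keyword-extraction half of such a pipeline using the open-source yake package follows. The sample text and parameter values are assumptions; seeding the resulting keywords into a guided LDA model is described only in the comments, not implemented.

```python
import yake

# Assumed sample safety text; in the paper, keywords come from accessible safety literature.
text = ("Worker slipped on wet scaffold boards while carrying rebar; "
        "no fall arrest system was attached to the guardrail.")

# Extract up to 10 keyphrases of at most 2 tokens from English text.
extractor = yake.KeywordExtractor(lan="en", n=2, top=10)
keywords = extractor.extract_keywords(text)   # list of (phrase, score); lower score = more relevant

for phrase, score in keywords:
    print(f"{score:.4f}  {phrase}")

# In a YAKE-GLDA pipeline, these phrases would become seed words for topic categories
# in a guided LDA model, anchoring each topic to a safety concept.
```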
6.
A Search-Engine Filtering System Based on Automatic Classification
With the spread and growth of the Internet, online information resources have become increasingly abundant, and efficiently and accurately retrieving the web pages that contain the information a user needs has become a pressing problem. The results returned by current search engines often span many domains and are very numerous, so users find it difficult to locate the content they are interested in. This work applies an automatic classifier to the results returned by a search engine, organising them by category to improve retrieval efficiency and accuracy and make the results easier for users to browse.
7.
Advanced Engineering Informatics, 2014, 28(4): 381-394
The dangers of the construction industry due to the risk of fatal hazards, such as falling from extreme heights, being struck by heavy equipment or materials, and the possibility of electrocution, are well known. The concept of Job Hazard Analysis (JHA) is commonly used to mitigate and control these occupational hazards. This technique analyzes the major tasks in a construction activity, identifies all potential task-related hazards, and suggests safe approaches to reduce or avoid each of these hazards. In this paper, the authors explore the possibility of leveraging existing construction safety resources to assist JHA, aiming to reduce the level of human effort required. Specifically, the authors apply ontology-based text classification (TC) to match safe approaches identified in existing resources with unsafe scenarios. These safe approaches can serve as initial references and enrich the solution space when performing JHA. Various document modification strategies are applied to existing resources in order to achieve superior TC effectiveness. The end result of this research is a construction safety domain ontology and its underlying knowledge base. A user scenario is also discussed to demonstrate how the ontology supports JHA in practice.
8.
Learning from past accidents is pivotal for improving safety in construction. However, hazard records are typically documented and stored as unstructured or semi-structured free text, which makes such data difficult to analyse. This study presents a novel and robust framework that combines deep learning and text mining technologies to analyse hazard records automatically. The framework comprises a four-step modelling approach: (1) identification of hazard topics using a Latent Dirichlet Allocation (LDA) model; (2) automatic classification of hazards using a Convolutional Neural Network (CNN); (3) production of a Word Co-occurrence Network (WCN) to determine the interrelations between hazards; and (4) quantitative analysis of keywords using Word Cloud (WC) technology to provide a visual overview of hazard records. The framework is validated by analysing hazard records collected from a large-scale transport infrastructure project. It is envisaged that its use can provide managers with new insights and knowledge to better ensure positive safety outcomes in projects. The contributions of this research are threefold: (1) it demonstrates that the analysis of hazard records can be automated by combining deep learning and text mining; (2) hazards can be visualised using a systematic and data-driven process; and (3) hazard topics can be generated automatically and classified over specific time periods, enabling managers to understand their patterns of manifestation and put strategies in place to prevent them from recurring.
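A minimal sketch of step (1), topic identification over free-text hazard records, using scikit-learn's LatentDirichletAllocation follows. The toy records and topic count are assumptions; the CNN classifier, co-occurrence network, and word-cloud steps are not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Assumed toy hazard records; real input would be the project's free-text entries.
records = [
    "unsecured ladder near excavation edge",
    "crane load swung over walkway during lift",
    "ladder footing unstable on loose soil",
    "pedestrian route crossed by crane lifting zone",
]

# Bag-of-words counts feed the LDA model.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(records)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)        # per-record topic distribution

# Show the top words per topic to label the hazard themes.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-3:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```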
9.
Ergonomics, 2012, 55(6): 1264-1282
10.
An appropriate safety culture helps enhance safety performance in organisations. This study investigates the prevalence of safety culture, assesses the effects of individual sociodemographic parameters and accident experience on that culture, and explores ways to enhance it in public-sector organisations. A specially designed questionnaire was randomly distributed to 805 public-sector employees in Dubai and Kuwait. Respondents were asked to rate their agreement with 24 statements representing seven safety-culture dimensions. Student's t-test and non-parametric tests were used to analyse the responses. The results reveal that employees in both governments reported a reasonably strong safety culture in their workplaces, with safety attitude and teamwork ranked highest and safety rules and workload ranked lowest among the seven dimensions. Male employees reported experiencing more accidents and scored higher on most safety-culture dimensions than female employees. Employees who had experienced accidents in the previous five years reported higher safety-culture scores than others. Accordingly, recommendations are put forward to enhance safety culture in public-sector organisations.
11.
Syed Mustajar Ahmad Shah, Hongwei Ge, Sami Ahmed Haider, Muhammad Irshad, Sohail M. Noman, Jehangir Arshad, Asfandeyar Ahmad, Talha Younas. Computer Systems Science and Engineering, 2021, 36(2): 369-382
Data generated in non-Euclidean domains, and applications of their graph representations (with complex relational interdependence between objects), have grown exponentially. The sophistication of graph data poses substantial obstacles to existing machine learning algorithms. In this study, we consider a revamped version of a semi-supervised learning algorithm for graph-structured data to address the challenge of extending deep learning approaches to graph representations. Quantum information theory is applied through Graph Neural Networks (GNNs) to generate Riemannian metrics in closed form for several graph layers. Further, a new formulation is established to incorporate high-order proximities when pre-processing the adjacency matrix of graphs. The proposed scheme shows substantial improvements over the deficiencies of the Graph Convolutional Network (GCN), particularly information loss and imprecise information representation, with acceptable computational overhead. The proposed Quantum Graph Convolutional Network (QGCN) significantly strengthens the GCN on semi-supervised node-classification tasks, and it improves generalisation by making small random perturbations of the graph during training. Evaluation results on three benchmark datasets, Citeseer, Cora, and PubMed, clearly demonstrate the superiority of the proposed model in computational accuracy over the state-of-the-art GCN and three other methods based on the same algorithms in the existing literature.
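The quantum-information extensions are specific to this paper, but the baseline GCN propagation rule it builds on can be shown compactly. Below is a minimal NumPy sketch of one graph-convolution layer, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), on a toy graph; all values are illustrative, not the QGCN itself.

```python
import numpy as np

# Toy graph: 4 nodes, undirected edges, 3 input features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.random.rand(4, 3)          # node feature matrix
W = np.random.rand(3, 2)          # layer weights (random here, learned in practice)

# Symmetrically normalised adjacency with self-loops, as in the standard GCN.
A_hat = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# One propagation step: aggregate neighbour features, project, apply ReLU.
H_next = np.maximum(A_norm @ H @ W, 0.0)
print(H_next.shape)               # (4, 2)
```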
12.
Bao Cuimei. Computer Applications and Software, 2010, 27(5): 197-199
In automatic text classification, feature selection and extraction is a key and foundational step. This paper proposes using support vectors to measure each term's contribution to classification and then extracting text features accordingly. Experimental results show that the method reduces the dimensionality of the vector space without losing classification-relevant information, improving classifier efficiency and classification accuracy.
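The abstract describes the idea only at a high level; a minimal scikit-learn sketch of the related strategy of keeping the terms that contribute most to a linear SVM's decision follows. The toy corpus and weight threshold are assumptions, not the paper's exact method.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

# Assumed toy corpus with binary labels (sports vs. politics).
docs = ["great match and final score", "election results announced today",
        "the striker scored twice in the match", "parliament passed the new bill"]
labels = [0, 1, 0, 1]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Fit a linear SVM and keep only the terms whose absolute weight exceeds the mean,
# i.e. the terms that contribute most to the separating hyperplane.
selector = SelectFromModel(LinearSVC(C=1.0), threshold="mean").fit(X, labels)
X_reduced = selector.transform(X)

kept = vectorizer.get_feature_names_out()[selector.get_support()]
print(X.shape[1], "->", X_reduced.shape[1], "features kept:", list(kept))
```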
13.
14.
Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, using a text-classification system on such data is more problematic: in the most straightforward approach, each number would be considered a distinct token and treated as a word. This paper presents an alternative approach for applying text-classification methods to supervised learning problems with numerical-valued features, in which the numerical features are converted into bag-of-words features, making them directly usable by text-classification methods. We show that even on purely numerical data, text classification on the derived text-like representation outperforms the more naive numbers-as-tokens representation and, more importantly, is competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data, adding numerical features using our approach can improve performance over not adding those features.
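A minimal sketch of the core idea under stated assumptions: discretise each numerical feature into bins and emit a synthetic token per bin, so a standard bag-of-words text classifier can consume mixed data. The binning scheme and token naming below are illustrative, not the paper's exact encoding.

```python
import numpy as np

def numeric_to_token(name, value, edges):
    """Map a numeric value to a synthetic word such as 'age_bin2'."""
    bin_index = int(np.digitize(value, edges))
    return f"{name}_bin{bin_index}"

# Assumed example: one record with a text field and two numeric fields.
record_text = "payment declined at checkout"
age, amount = 42, 310.0

tokens = record_text.split()
tokens.append(numeric_to_token("age", age, edges=[18, 30, 45, 60]))
tokens.append(numeric_to_token("amount", amount, edges=[50, 200, 500]))

# The record is now a pure bag of words, usable by any text classifier.
print(" ".join(tokens))
# payment declined at checkout age_bin2 amount_bin2
```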
15.
Given that text classification typically involves multiple heterogeneous data sources, this paper proposes a multiple-kernel SVM learning algorithm. The algorithm reformulates the quadratic combination of classification kernel matrices as a semi-infinite program and shows that it can be solved efficiently by repeatedly invoking a standard SVM solver. Experimental results show that the algorithm scales to combinations of hundreds of kernels or hundreds of thousands of samples, and achieves high recall and precision for text classification over multiple heterogeneous data sources.
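A minimal sketch of the underlying idea follows: build one kernel per data source, combine the kernel matrices, and train an SVM on the combination through scikit-learn's precomputed-kernel interface. The fixed weights here stand in for the semi-infinite-programming optimisation described in the abstract.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.svm import SVC

# Assumed toy features from two heterogeneous sources (e.g. title terms vs. body terms).
rng = np.random.default_rng(0)
X1 = rng.random((40, 5))
X2 = rng.random((40, 8))
y = rng.integers(0, 2, size=40)

# One kernel per source, combined with fixed weights; the paper instead learns
# these combination weights by semi-infinite programming.
K = 0.6 * rbf_kernel(X1) + 0.4 * linear_kernel(X2)

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```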
16.
Support vector machines (SVMs) have shown superb performance on text-classification tasks. They are accurate, robust, and quick to apply to test instances. Their only potential drawback is their training time and memory requirement. For n training instances held in memory, the best-known SVM implementations take time proportional to n^a, where a is typically between 1.8 and 2.1. SVMs have been trained on datasets with several thousand instances, but Web directories today contain millions of instances that are valuable for mapping billions of Web pages into Yahoo!-like directories. We present SIMPL, a nearly linear-time classification algorithm that mimics the strengths of SVMs while avoiding the training bottleneck. It uses Fisher's linear discriminant, a classical tool from statistical pattern recognition, to project training instances onto a carefully selected low-dimensional subspace before inducing a decision tree on the projected instances. SIMPL uses efficient sequential scans and sorts and is comparable in speed and memory scalability to widely used naive Bayes (NB) classifiers, but it beats NB accuracy decisively. It not only approaches and sometimes exceeds SVM accuracy, but also beats the running time of a popular SVM implementation by orders of magnitude. While describing SIMPL, we make a detailed experimental comparison of SVM-generated discriminants with Fisher's discriminants, and we also report an analysis of the cache performance of a popular SVM implementation. Our analysis shows that SIMPL has the potential to be the method of choice for practitioners who want the accuracy of SVMs and the simplicity and speed of naive Bayes classifiers.
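SIMPL itself is not publicly packaged; a minimal scikit-learn sketch of the same two-stage idea follows, projecting TF-IDF vectors onto Fisher's discriminant direction and then inducing a decision tree on the projection. The toy corpus is an assumption, and this is an illustration of the strategy, not the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

# Assumed toy corpus; SIMPL was evaluated on much larger web-directory data.
docs = ["cheap flights and hotel deals", "machine learning model training",
        "book your holiday package now", "gradient descent converges slowly",
        "discount travel insurance offer", "neural network weight updates"]
labels = [0, 1, 0, 1, 0, 1]

# Stage 1: TF-IDF vectors projected onto Fisher's discriminant direction.
X = TfidfVectorizer().fit_transform(docs).toarray()   # LDA needs dense input
lda = LinearDiscriminantAnalysis(n_components=1)
X_proj = lda.fit_transform(X, labels)

# Stage 2: a decision tree induced on the low-dimensional projection.
tree = DecisionTreeClassifier(max_depth=2).fit(X_proj, labels)
print("training accuracy:", tree.score(X_proj, labels))
```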
17.
There are thousands of butterfly species, each closely associated with particular plants, so automatic butterfly species recognition is of practical importance. Research on recognising butterfly species in the wild is constrained by existing datasets, which cover few species with few samples (images) per class, so machine-learning-based recognition faces a serious generalisation challenge. In addition, occlusion of butterfly wings in the wild makes it difficult to learn discriminative classification features. This paper therefore proposes DL-MAML (deep learning advanced model-agnostic meta-learning), a new meta-learning model for recognising arbitrary butterfly species in the wild. First, DL-MAML uses L2 regularisation to improve the objective function and parameter-update rule of the classic MAML (model-agnostic meta-learning) algorithm and adds two feature-learning layers to MAML, reducing the risk of overfitting and addressing the generalisation difficulties of existing in-the-wild butterfly species recognition. Second, a ResNet34 deep learning model is used to extract classification features and pre-process the image representations, which serve as input to the meta-learning module of DL-MAML; this compensates for its limited feature-extraction capacity and for the difficulty of learning classification features under wing occlusion in the wild. Extensive ablation studies and comparisons with related models show that the initial parameters learned by DL-MAML transfer well to novel butterfly classes, outperform MAML and other comparable models, and are effective for butterfly species recognition in the wild, making it feasible to build a general and comprehensive butterfly species recognition system from existing in-the-wild butterfly datasets.
18.
To address the classification of Uyghur-language texts, this paper proposes a method for Uyghur keyword extraction and text classification based on the TextRank algorithm and a mutual-information similarity measure. First, the input text is pre-processed to filter out non-Uyghur characters and stop words. Then, a TextRank algorithm weighted by word semantic similarity, word position, and word frequency extracts the text's keyword set. Finally, the similarity between the input text's keyword set and each category's keyword set is computed using the mutual-information similarity measure, and the text is classified accordingly. Experimental results show that the scheme extracts highly discriminative keywords; with a keyword-set size of 1250, the average classification rate reaches 91.2%.
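A minimal, language-agnostic sketch of the TextRank step follows: build a word co-occurrence graph within a sliding window and rank words with PageRank via networkx. The weighting by semantic similarity, position, and frequency described in the abstract is not reproduced, and the toy tokens are assumptions.

```python
import networkx as nx

def textrank_keywords(tokens, window=3, top_k=5):
    """Rank words by PageRank over a co-occurrence graph built from a token list."""
    graph = nx.Graph()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph.add_edge(word, other)
    scores = nx.pagerank(graph)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Assumed pre-tokenised input with stop words removed; Uyghur tokens work the same way.
tokens = ("text classification keyword extraction classification "
          "similarity measure keyword text").split()
print(textrank_keywords(tokens))
```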
19.
Construction quality control is achieved primarily through testing and inspections and the subsequent analysis of massive volumes of unstructured quality records. Quality professionals are required to classify and review inspection texts according to project category, but manually processing such a sheer amount of textual data is time-consuming, laborious, and error-prone, which can lead to overlooked quality issues and harm overall project performance. In response, this paper uses text mining to extract the hidden information in unstructured text records. First, quality text records are obtained on site and cleaned, yielding 9859 usable records. Both Bidirectional Encoder Representations from Transformers (BERT) pre-training and Word2vec are then used to convert the text into numerical representations. A Convolutional Neural Network (CNN) model, improved by expanding its input channels, takes the encoded text as input and extracts key features, enabling quality records to be organised into established categories. The results show that the average precision of the proposed model is 89.69%. Compared with CNN, BERT, and other models, this model requires less manual intervention and less training time while achieving higher precision. Finally, data augmentation of the small-sample classes further improves the precision of the model to 92.02%. The proposed model can help quality professionals quickly spot key quality issues and reference the corresponding quality standards for further action, allowing them to focus on more value-added work such as decision-making and planning corrective actions. This research also provides a reference toward the ultimate goal of an intelligent project-management system.
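A minimal PyTorch sketch of the expanded-input-channel idea follows, stacking two different embeddings of the same sentence as separate channels of a convolutional text classifier. The embedding sizes, filter size, and class count are assumptions; in the paper the two channels would come from BERT and Word2vec encodings of the quality records.

```python
import torch
import torch.nn as nn

class TwoChannelTextCNN(nn.Module):
    def __init__(self, emb_dim=128, num_classes=5):
        super().__init__()
        # Two input channels: e.g. a BERT-derived and a Word2vec-derived embedding sequence.
        self.conv = nn.Conv2d(in_channels=2, out_channels=64, kernel_size=(3, emb_dim))
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):                            # x: (batch, 2, seq_len, emb_dim)
        h = torch.relu(self.conv(x)).squeeze(-1)     # (batch, 64, seq_len - 2)
        h = torch.max(h, dim=-1).values              # max-over-time pooling -> (batch, 64)
        return self.fc(h)

# Assumed shapes: batch of 4 records, 20 tokens, both embeddings projected to 128 dims.
bert_like = torch.randn(4, 20, 128)
w2v_like = torch.randn(4, 20, 128)
x = torch.stack([bert_like, w2v_like], dim=1)        # (4, 2, 20, 128)

model = TwoChannelTextCNN()
print(model(x).shape)                                # torch.Size([4, 5])
```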
20.
In recent years, graph-based semi-supervised classification has been one of the active research topics in machine learning and data mining. Such methods generally construct a graph to mine the information hidden in the data and use the graph's structure to classify unlabelled samples, so the performance of semi-supervised classification depends heavily on the quality of the graph. This paper proposes a semi-supervised classification algorithm based on smooth representation. Specifically, the method applies a low-pass filter to smooth the data, and the smoothed data are then used for semi-supervised...
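The abstract is truncated, but the low-pass-filtering idea it names is a standard one. Below is a minimal NumPy/scikit-learn sketch that smooths node features over a normalised adjacency matrix before fitting a classifier on the few labelled nodes; the toy graph, filter order, and classifier choice are assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy graph: 6 nodes in two loose clusters, 4 features per node, 2 labelled nodes.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = np.random.rand(6, 4)
labelled = [0, 5]
y_labelled = [0, 1]

# Low-pass filter: repeatedly average each node's features with its neighbours'.
A_hat = A + np.eye(6)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
S = D_inv_sqrt @ A_hat @ D_inv_sqrt
X_smooth = np.linalg.matrix_power(S, 3) @ X      # filter order 3

# Train on the labelled nodes only, then predict every node's class.
clf = LogisticRegression().fit(X_smooth[labelled], y_labelled)
print(clf.predict(X_smooth))
```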