Abstract: | Computing the semantic similarity between concepts is a common problem in many language-related tasks and knowledge domains. In the biomedical field, several approaches have been developed to address this problem by exploiting the structured knowledge available in domain ontologies (such as SNOMED-CT or MeSH) and specific, closed and reliable corpora (such as clinical data). In recent years, however, the enormous growth of the Web has motivated researchers to use it as a corpus to assist the semantic analysis of language. This paper proposes and evaluates the use of the Web as a background corpus for measuring the similarity of biomedical concepts. Several ontology-based similarity measures are studied and tested on a benchmark composed of biomedical terms, comparing the results obtained when applying them to the Web against approaches that use specific clinical data. The results show that the similarity values obtained from the Web for ontology-based measures are at least as reliable as, and even more reliable than, those obtained from specific clinical data, which supports the suitability of the Web as an information corpus for the biomedical domain. |