Author

Sanghee Kim

Bio: Sanghee Kim is an academic researcher from the University of Southampton. The author has contributed to research in topics: Ontology (information science) & Knowledge extraction. The author has an h-index of 10 and has co-authored 16 publications receiving 835 citations.

Papers
Journal ArticleDOI
TL;DR: The paper considers the Artequakt project, which links a knowledge extraction tool with an ontology to achieve continuous knowledge support and to guide information extraction; extraction is further enhanced with a lexicon-based term expansion mechanism that provides extended ontology terminology.
Abstract: To bring the Semantic Web to life and provide advanced knowledge services, we need efficient ways to access and extract knowledge from Web documents. Although Web page annotations could facilitate such knowledge gathering, annotations are rare and will probably never be rich or detailed enough to cover all the knowledge these documents contain. Manual annotation is impractical and unscalable, and automatic annotation tools remain largely undeveloped. Specialized knowledge services therefore require tools that can search and extract specific knowledge directly from unstructured text on the Web, guided by an ontology that details what type of knowledge to harvest. An ontology uses concepts and relations to classify domain knowledge. Other researchers have used ontologies to support knowledge extraction, but few have explored their full potential in this domain. The paper considers the Artequakt project which links a knowledge extraction tool with an ontology to achieve continuous knowledge support and guide information extraction. The extraction tool searches online documents and extracts knowledge that matches the given classification structure. It provides this knowledge in a machine-readable format that will be automatically maintained in a knowledge base (KB). Knowledge extraction is further enhanced using a lexicon-based term expansion mechanism that provides extended ontology terminology.

490 citations
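One way to picture the lexicon-based term expansion described in the abstract above is to widen each ontology relation label with synonyms before matching it against sentences. The following is a minimal Python sketch under that reading; the ontology fragment, the synonym table, and the function names are illustrative assumptions, not part of the Artequakt implementation.

```python
# Minimal sketch: expand ontology relation labels with lexicon synonyms,
# then scan sentences for any expanded term (illustrative, not Artequakt's code).

ONTOLOGY_RELATIONS = {          # assumed fragment of an artist ontology
    "date_of_birth": ["born"],
    "place_of_death": ["died"],
}

LEXICON = {                     # assumed synonym lists standing in for a lexical resource
    "born": ["born", "birth"],
    "died": ["died", "death", "passed away"],
}

def expand_terms(seed_terms):
    """Return the seed terms plus their lexicon synonyms."""
    expanded = set()
    for term in seed_terms:
        expanded.update(LEXICON.get(term, [term]))
    return expanded

def match_relations(sentence):
    """Return ontology relations whose expanded terminology appears in the sentence."""
    lowered = sentence.lower()
    hits = []
    for relation, seeds in ONTOLOGY_RELATIONS.items():
        if any(term in lowered for term in expand_terms(seeds)):
            hits.append(relation)
    return hits

print(match_relations("Rembrandt's death in 1669 was recorded in Amsterdam."))
# ['place_of_death'] -- 'death' matched via expansion even though 'died' is absent
```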

01 Jan 2002
TL;DR: An overview of the Artequakt system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction.
Abstract: The Artequakt project seeks to automatically generate narrative biographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future progress is detailed.

98 citations
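The three-component architecture described above (knowledge extraction, information management, biography construction) can be pictured as a simple pipeline. The sketch below is a hypothetical Python outline of that flow; the class and method names are placeholders, not the project's actual API.

```python
# Hypothetical outline of a three-stage pipeline: extract facts from documents,
# consolidate them in a knowledge base, then render a narrative biography.

class KnowledgeExtractor:
    def extract(self, documents):
        # Placeholder: in Artequakt this step uses NLP tools guided by the ontology.
        return [{"subject": "Rembrandt", "relation": "date_of_birth", "value": "1606"}]

class KnowledgeBase:
    def __init__(self):
        self.facts = []

    def store(self, facts):
        # Placeholder for consolidation and duplicate removal (information management).
        for fact in facts:
            if fact not in self.facts:
                self.facts.append(fact)

class BiographyWriter:
    def compose(self, kb, artist):
        # Placeholder: turn stored facts into narrative sentences.
        lines = [f"{artist}: {f['relation'].replace('_', ' ')} {f['value']}"
                 for f in kb.facts if f["subject"] == artist]
        return " ".join(lines)

kb = KnowledgeBase()
kb.store(KnowledgeExtractor().extract(["<web page text>"]))
print(BiographyWriter().compose(kb, "Rembrandt"))
```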

01 Jan 2000
TL;DR: A versatile multi-agent framework designed for Distributed Information Management tasks, SoFAR embraces the notion of proactivity as the opportunistic reuse of the services provided by other agents, and provides the means to enable agents to locate suitable service providers.
Abstract: In this paper we present SoFAR, a versatile multi-agent framework designed for Distributed Information Management tasks. SoFAR embraces the notion of proactivity as the opportunistic reuse of the services provided by other agents, and provides the means to enable agents to locate suitable service providers. The contribution of SoFAR is to combine some ideas from the distributed computing community with the performative-based communications used in other agent systems: communications in SoFAR are based on the startpoint/endpoint paradigm, which is the foundation of Nexus, the communication layer at the heart of the Computational Grid. We explain the rationale behind our design decisions, and describe the predefined set of agents which make up the core of the system. Two distributed information management applications have been written, a general query architecture and an open hypermedia application, and we recount their design and operations.

46 citations
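The abstract above emphasises locating suitable service providers so that agents can opportunistically reuse each other's services. A minimal registry-lookup sketch in Python is given below; the registry, the performative-style message shape, and the agent names are assumptions made for illustration and do not reflect SoFAR's actual startpoint/endpoint interfaces.

```python
# Illustrative sketch of agent service advertisement and lookup
# (assumed interfaces; not SoFAR's API).

class Registry:
    def __init__(self):
        self.providers = {}          # service name -> list of provider agents

    def advertise(self, service, agent):
        self.providers.setdefault(service, []).append(agent)

    def lookup(self, service):
        return self.providers.get(service, [])

class QueryAgent:
    def __init__(self, name):
        self.name = name

    def handle(self, performative, content):
        # A provider answers a 'query' performative with a result message.
        if performative == "query":
            return {"from": self.name, "answer": f"results for {content!r}"}

registry = Registry()
registry.advertise("document-search", QueryAgent("search-agent-1"))

# A client agent locates a suitable provider and reuses its service.
for provider in registry.lookup("document-search"):
    print(provider.handle("query", "open hypermedia"))
```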

01 Jan 2003
TL;DR: This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology.
Abstract: A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.

46 citations
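The extraction step summarised above pairs ontology-defined relations with text drawn from multiple documents. The fragment below is a speculative Python illustration of that idea using regular expressions; the patterns and relation names are invented for the example and are not the patterns Artequakt uses.

```python
import re

# Speculative illustration: ontology relations paired with surface patterns.
PATTERNS = {
    "date_of_birth": re.compile(r"born (?:in|on) (\d{4})"),
    "place_of_birth": re.compile(r"born in ([A-Z][a-z]+)"),
}

def extract_facts(sentences):
    """Collect (relation, value) pairs from sentences gathered across documents."""
    facts = set()
    for sentence in sentences:
        for relation, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match:
                facts.add((relation, match.group(1)))
    return facts

docs = ["Vermeer was born in 1632.", "He was born in Delft to a silk worker."]
print(extract_facts(docs))
# extracts ('date_of_birth', '1632') and ('place_of_birth', 'Delft')
```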

Book ChapterDOI
20 Oct 2003
TL;DR: The design and prototype implementation of a novel architecture for integrated concept, metadata and content based browsing and retrieval of museum information is described. The work is part of a European project involving several major galleries, and the aim is to provide more versatile access to digital collections of museum artefacts.
Abstract: This paper describes the design and prototype implementation of a novel architecture for integrated concept, metadata and content based browsing and retrieval of museum information. The work is part of a European project involving several major galleries and the aim is to provide more versatile access to digital collections of museum artefacts, including 2-D images, 3-D models and other multimedia representations. An ontology for the museum domain, based on the CIDOC Conceptual Reference Model, is being developed as a semantic layer with references to the digital collection as instance information. A graphical concept browser is an integral component in the user interface, allowing navigation through the semantic layer, display of thumbnails, or full representations of artefacts and textual information in appropriate viewers and the invocation of conventional content based searching or combined querying. Semantic Web technologies are used in system integration to describe how tools for analysis and visualisation can be applied to different data types and sources. This supports flexible and managed formulation, execution and interpretation of the results of distributed multimedia queries. Combined searches using concepts, content and metadata can be initiated from a single user interface.

45 citations
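The combined concept, metadata and content-based searching described above can be sketched as intersecting three filters over an annotated collection. The Python fragment below is a toy model of that idea; the record fields and the similarity score are invented stand-ins, not the project's data model or its CIDOC CRM based ontology.

```python
# Toy model: each artefact record carries ontology concepts, metadata fields,
# and a precomputed content-similarity score against the query image.

ARTEFACTS = [
    {"id": "vase-17", "concepts": {"Vessel", "Ceramic"}, "artist": "unknown", "content_sim": 0.82},
    {"id": "plate-03", "concepts": {"Vessel"}, "artist": "Wedgwood", "content_sim": 0.35},
]

def combined_search(concept=None, metadata=None, min_content_sim=0.0):
    """Intersect a concept filter, a metadata filter, and a content-similarity threshold."""
    results = []
    for record in ARTEFACTS:
        if concept and concept not in record["concepts"]:
            continue
        if metadata and any(record.get(k) != v for k, v in metadata.items()):
            continue
        if record["content_sim"] < min_content_sim:
            continue
        results.append(record["id"])
    return results

print(combined_search(concept="Vessel", min_content_sim=0.5))   # ['vase-17']
```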


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations
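The mail-filtering example in the abstract above, where a system learns which messages a user rejects, maps naturally onto a simple supervised classifier. Below is a minimal word-count (naive-Bayes-style) sketch in Python for illustration only; the training data, vocabulary size, and smoothing choice are arbitrary.

```python
from collections import Counter
import math

# Minimal naive-Bayes-style spam filter learned from examples of the user's decisions.
TRAIN = [("win money now", "spam"), ("cheap money offer", "spam"),
         ("meeting at noon", "ham"), ("project meeting notes", "ham")]

counts = {"spam": Counter(), "ham": Counter()}
totals = Counter()
for text, label in TRAIN:
    words = text.split()
    counts[label].update(words)
    totals[label] += len(words)

def score(text, label, vocab_size=50):
    """Log-probability of the text under the label's word distribution (add-one smoothing)."""
    return sum(math.log((counts[label][w] + 1) / (totals[label] + vocab_size))
               for w in text.split())

def classify(text):
    return max(("spam", "ham"), key=lambda label: score(text, label))

print(classify("cheap money"))      # spam
print(classify("meeting notes"))    # ham
```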

Proceedings Article
01 May 2004
TL;DR: It is proposed in this paper that one approach to ontology evaluation should be corpus or data driven, because a corpus is the most accessible form of knowledge and its use allows a measure to be derived of the ‘fit’ between an ontology and a domain of knowledge.
Abstract: The evaluation of ontologies is vital for the growth of the Semantic Web. We consider a number of problems in evaluating a knowledge artifact like an ontology. We propose in this paper that one approach to ontology evaluation should be corpus or data driven. A corpus is the most accessible form of knowledge and its use allows a measure to be derived of the ‘fit’ between an ontology and a domain of knowledge. We consider a number of methods for measuring this ‘fit’ and propose a measure to evaluate structural fit, and a probabilistic approach to identifying the best ontology.

407 citations
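One simple way to read the corpus-driven 'fit' between an ontology and a domain, as proposed above, is as lexical overlap between terms salient in a domain corpus and the labels of the ontology. The Python sketch below computes such an overlap ratio purely for illustration; it is a naive stand-in, not the structural or probabilistic measures proposed in the paper.

```python
from collections import Counter

# Naive illustration: what fraction of the corpus's frequent terms
# does the ontology's vocabulary cover?

def frequent_terms(corpus, top_n=5):
    words = Counter(w.lower() for doc in corpus for w in doc.split())
    return {w for w, _ in words.most_common(top_n)}

def coverage(ontology_labels, corpus, top_n=5):
    """Share of the corpus's top-n terms that appear among the ontology's labels."""
    terms = frequent_terms(corpus, top_n)
    labels = {l.lower() for l in ontology_labels}
    return len(terms & labels) / len(terms) if terms else 0.0

corpus = ["the artist painted a portrait", "the portrait shows the artist"]
print(coverage({"Artist", "Portrait", "Gallery"}, corpus))
# 0.4 -> two of the five most frequent corpus terms are covered by ontology labels
```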

Journal ArticleDOI
01 Oct 2005
TL;DR: An experimental website is constructed to test the approach, and the results show that the news agent based on the fuzzy ontology can operate effectively for news summarization.
Abstract: In this paper, a fuzzy ontology and its application to news summarization are presented. The fuzzy ontology with fuzzy concepts is an extension of the domain ontology with crisp concepts. It is more suitable to describe the domain knowledge than domain ontology for solving the uncertainty reasoning problems. First, the domain ontology with various events of news is predefined by domain experts. The document preprocessing mechanism will generate the meaningful terms based on the news corpus and the Chinese news dictionary defined by the domain expert. Then, the meaningful terms will be classified according to the events of the news by the term classifier. The fuzzy inference mechanism will generate the membership degrees for each fuzzy concept of the fuzzy ontology. Every fuzzy concept has a set of membership degrees associated with various events of the domain ontology. In addition, a news agent based on the fuzzy ontology is also developed for news summarization. The news agent contains five modules, including a retrieval agent, a document preprocessing mechanism, a sentence path extractor, a sentence generator, and a sentence filter to perform news summarization. Furthermore, we construct an experimental website to test the proposed approach. The experimental results show that the news agent based on the fuzzy ontology can effectively operate for news summarization.

377 citations
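The fuzzy-concept idea in the abstract above, where each concept carries membership degrees across news events, can be pictured with a small table of degrees and a scoring step for sentences. The Python fragment below is only a toy reading of that mechanism; the concepts, events, degrees, and threshold are made up for the example.

```python
# Toy reading of a fuzzy ontology: each fuzzy concept holds membership degrees
# over news events; sentences are scored for an event by summing the degrees
# of the concepts they mention.

FUZZY_CONCEPTS = {                       # made-up membership degrees
    "earthquake": {"disaster": 0.9, "economy": 0.2},
    "stock":      {"disaster": 0.1, "economy": 0.8},
}

def event_score(sentence, event):
    """Sum the membership degrees, for the given event, of concepts mentioned in the sentence."""
    words = sentence.lower().split()
    return sum(degrees.get(event, 0.0)
               for concept, degrees in FUZZY_CONCEPTS.items() if concept in words)

def pick_summary_sentences(sentences, event, threshold=0.5):
    """Keep sentences whose score for the event clears a threshold (a crude sentence filter)."""
    return [s for s in sentences if event_score(s, event) >= threshold]

news = ["The earthquake damaged the port.", "The stock index fell sharply."]
print(pick_summary_sentences(news, "disaster"))   # ['The earthquake damaged the port.']
```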