Showing papers by "Shigeo Sugimoto" published in 2010

Open Access

Journal Article•DOI•

Feature Analysis of Metadata Schemas for Records Management and Archives from the Viewpoint of Records Lifecycle

[...]

01 Dec 2010-Journal of Korean Society of Archives and Records Management

TL;DR: This study presents a basis for the interoperability of different metadata schemas used in digital archiving and preservation and shows the features of these metadata standards obtained through the feature analysis based on the records lifecycle model.

Abstract: Digital resources are widely used in modern society. However, maintaining and preserving them over time poses fundamental problems. Several standard methods for preserving digital resources have been developed and are in use. It is widely recognized that metadata is one of the most important components of digital archiving and preservation. There are many metadata standards for the archiving and preservation of digital resources, and each standard has its own features in accordance with its primary application. This means that each schema has to be appropriately selected and tailored to a particular application. In some cases, those schemas are combined in a larger framework or container metadata such as the DCMI application framework and METS. We used the following metadata standards in this study for the feature analysis: AGLS Metadata, which is defined to improve search of both digital and non-digital resources; ISAD(G), which is a commonly used standard for archives; EAD, which is widely used for digital archives; OAIS, which defines a metadata framework for preserving digital objects; and PREMIS, which is designed primarily for the preservation of digital resources. In addition, we extracted attributes from the decision tree defined for the digital preservation process by the Digital Preservation Coalition (DPC) and compared the set of attributes with these metadata standards. This paper shows the features of these metadata standards obtained through the feature analysis based on the records lifecycle model. The features are shown in a single framework, which makes it easy to relate the tasks in the lifecycle to the metadata elements of these standards.
As a result of the detailed analysis of the metadata elements, we clarified the features of the standards from the viewpoint of relationships between the elements and the lifecycle stages. Mapping between metadata schemas is often required in the long-term preservation process because different schemas are used across the records lifecycle. Therefore, it is crucial to build a unified framework to enhance the interoperability of these schemas. This study presents a basis for the interoperability of different metadata schemas used in digital archiving and preservation.

4 citations

Journal Article•DOI•

Automatic Term Recognition Using the Corpora of the Different Academic Areas

[...]

Junko Kubo, Keita Tsuji, Shigeo Sugimoto

01 Jan 2010-Joho Chishiki Gakkaishi

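TL;DR: An experiment tested whether the corpus size could be reduced by selecting arbitrary texts from the 39 fields and using them as the other-domain corpus. As a result, the target [...]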

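Abstract: Computer-based automatic extraction of technical terms has conventionally relied, in most cases, only on a text corpus of the target domain as its data. However, one characteristic of technical terms is that they occur frequently in the target-domain corpus and not very often in corpora of other domains. This study therefore proposes a method that takes into account the difference in term occurrence rates between the target-domain corpus and other-domain corpora. In the experiments, women's studies texts were used as the target-domain corpus, and texts from 39 fields were used as the other-domain corpora. The results showed that terms can be extracted with considerably higher precision than with representative conventional methods. We also tested whether the corpus size could be reduced by selecting arbitrary texts from the 39 fields and using them as the other-domain corpus. As a result, by using texts from fields similar to the target domain, we were able to approach the extraction precision and recall obtained when the texts of all 39 fields were used.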

1 citation

Proceedings Article•DOI•

A constraint-based tool for data integrity management on the web

[...]

Masami Takahashi¹, Atsuyuki Morishima¹, Hiroyuki Kitagawa¹, Shigeo Sugimoto¹•Institutions (1)

University of Tsukuba¹

14 Jan 2010

TL;DR: Proposes a system to maintain the content integrity of Web sites without backend databases, and introduces weak inclusion relationships, which are inclusion relationships associated with inclusion ratios.

Abstract: Today, publishing information on Web sites is common, and the volume of Web content that needs to be managed is increasing. It is therefore important to maintain content integrity on the Web. This paper proposes a system to maintain the content integrity of Web sites without backend databases. First, we explain the architecture of the proposed system. Second, we address the problem of finding the integrity constraints used as input to the system. We focus on inclusion dependencies among HTML/XML elements and discuss how to find inclusion relationships that can be used as hints for finding inclusion dependencies. In particular, we propose weak inclusion relationships, which are inclusion relationships associated with inclusion ratios. Finally, we propose a filter-based approach to the efficient discovery of weak inclusion relationships and discuss some of its possible implementations.
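
The idea of an inclusion relationship annotated with a ratio can be illustrated with a minimal sketch. The abstract does not give the formal definition, so the code below assumes the ratio is the fraction of distinct values of one element set that also occur in another; the function names, the threshold, and the sample data are all hypothetical, and the paper's filter-based discovery method is not reproduced here.

```python
def inclusion_ratio(source_values, target_values):
    """Fraction of distinct source values that also occur among the target values.

    Assumed definition for illustration; the paper may define the ratio differently.
    """
    source = set(source_values)
    if not source:
        return 0.0
    return len(source & set(target_values)) / len(source)


def weak_inclusions(element_values, threshold=0.8):
    """Brute-force scan: list (source, target, ratio) for each ordered pair at or above the threshold."""
    found = []
    for src, src_vals in element_values.items():
        for tgt, tgt_vals in element_values.items():
            if src == tgt:
                continue
            ratio = inclusion_ratio(src_vals, tgt_vals)
            if ratio >= threshold:
                found.append((src, tgt, ratio))
    return found


# Hypothetical text values extracted from three groups of HTML/XML elements.
pages = {
    "staff_list": ["Sato", "Suzuki", "Tanaka", "Ito"],
    "committee": ["Sato", "Suzuki", "Tanaka", "Yamada"],
    "news_authors": ["Sato", "Ito"],
}
result = weak_inclusions(pages, threshold=0.75)
```

A ratio of 1.0 corresponds to an ordinary (strict) inclusion relationship, while lower ratios flag pairs that almost satisfy inclusion and may therefore hint at a violated or emerging inclusion dependency.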

1 citation

Proceedings Article•

Automatic Term Recognition Based on the Statistical Differences of Relative Frequencies in Different Corpora.

[...]

Junko Kubo¹, Keita Tsuji¹, Shigeo Sugimoto¹•Institutions (1)

University of Tsukuba¹

01 May 2010

TL;DR: Proposes a method for automatic term recognition (ATR) that uses the statistical differences of relative frequencies of terms in the target domain corpus and elsewhere, and finds that it outperformed earlier methods.

Abstract: In this paper, we propose a method for automatic term recognition (ATR) that uses the statistical differences of relative frequencies of terms in the target domain corpus and elsewhere. Generally, target terms appear more frequently in the target domain corpus than in other domain corpora, and exploiting this characteristic should improve extraction performance. Most ATR methods proposed so far use only the target domain corpus and do not take this characteristic into account. For the extraction experiment, we used the abstracts of a women's studies journal as the target domain corpus and those of academic journals in 39 domains as the other domain corpora. The women's studies terms used to evaluate extraction were identified manually in the abstracts. We analyzed the extraction performance and found that our method outperformed earlier methods. The earlier methods compared were based on C-value and FLR, along with methods that also use other domain corpora.
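
As a rough illustration of the underlying intuition, the sketch below ranks candidate terms by the plain difference between their relative frequency in a target corpus and in a corpus from other domains. This is a simplification under stated assumptions, not the authors' statistical measure: the abstract does not specify the statistic, candidates here are single whitespace-separated tokens, and the function names and toy corpora are hypothetical.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each token to its share of all token occurrences in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def rank_terms(target_tokens, other_tokens):
    """Rank target-corpus tokens by (target relative frequency - other relative frequency)."""
    target_rf = relative_frequencies(target_tokens)
    other_rf = relative_frequencies(other_tokens)
    scores = {term: rf - other_rf.get(term, 0.0) for term, rf in target_rf.items()}
    # Highest score first: frequent in the target domain, rare elsewhere.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpora, invented for illustration only.
target = "gender role gender study method".split()
other = "method data method study model".split()
ranking = rank_terms(target, other)
```

In this toy example, tokens that occur only in the target corpus rise to the top, while general academic vocabulary that is at least as common elsewhere sinks to the bottom. A realistic system would first extract multi-word candidates and would likely replace the raw difference with a proper statistical test.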

1 citation