WorldCat Linked Data Explorer

http://worldcat.org/entity/work/id/12249317

Terminology identification in a collection of Web resources

The primary goal of the WordSmith project is to obtain subject terminology directly from raw text. We are currently investigating the hypothesis that reliable subject terms can be automatically collected, re-used, and organized into thesaurus-like objects that enhance access to material that is unwieldy to classify by hand, such as the Web documents in the CORC database. Baseline results of our work are already visible in the CORC project. Catalogers who check the Generate possible subject terms button in the process of creating a description for a new item may retrieve novel subject terms, such as animal genome databases, backcountry Web sites, digital communities, e-mail viruses, and world wide Internet music. These terms are too new to appear in standard library classification schemes. In later versions of CORC, we want to make automatic keyword assignment more responsive to the needs of catalogers and use this terminology in other ways to increase subject access to the CORC collection. Our paper describes the current implementation of WordSmith in CORC, an evaluation of the results, and proposed future enhancements.

http://schema.org/description

  • "The primary goal of the WordSmith project is to obtain subject terminology directly from raw text. We are currently investigating the hypothesis that reliable subject terms can be automatically collected, re-used, and organized into thesaurus-like objects that enhance access to material that is unwieldy to classify by hand, such as the Web documents in the CORC database. Baseline results of our work are already visible in the CORC project. Catalogers who check the Generate possible subject terms button in the process of creating a description for a new item may retrieve novel subject terms, such as animal genome databases, backcountry Web sites, digital communities, e-mail viruses, and world wide Internet music. These terms are too new to appear in standard library classification schemes. In later versions of CORC, we want to make automatic keyword assignment more responsive to the needs of catalogers and use this terminology in other ways to increase subject access to the CORC collection. Our paper describes the current implementation of WordSmith in CORC, an evaluation of the results, and proposed future enhancements."@en

http://schema.org/name

  • "Terminology identification in a collection of Web resources"@en