
SIMS 202 Information Organization and Retrieval  

Assignment 4 


Assigned 9/14. Due 9/21.

Readings: Clark & Clark, Fellbaum and Miller (WordNet), Bates.

The goal of this assignment is to give you hands-on experience with a large database of lexical relations (WordNet) and to further explore the idea of hierarchical versus faceted organization of categories.

WordNet
(1) Use the WordNet HTML forms interface (or any other available interface) to explore part of WordNet. Sketch (pen and paper is fine) a portion of the WordNet lexical network. Start by selecting a noun; if it has more than one sense, choose one of its synsets. Then follow its hypernyms upward until you reach one that has meronym (part-of) relations. From there, descend through the hyponyms of a few of the meronyms. Continue until your sketch shows five or six relations. The noun canary, for example, produces results with this sequence of operations, but you may use a different starting word or a different sequence of operations if you like.
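The procedure in (1) is essentially a walk over typed links between synsets. The Python sketch below illustrates it on a tiny, made-up lexical net: the `LEXNET` dictionary and all of its entries are hypothetical and are not real WordNet data. It climbs hypernyms until it finds a synset with meronyms, then descends one level of hyponyms under each meronym. Real WordNet distinguishes several kinds of meronym (part, member, substance), and some synsets have more than one hypernym; this sketch simply follows the first unvisited hypernym. If you work in NLTK's WordNet reader instead, the corresponding `Synset` methods are `hypernyms()`, `part_meronyms()`, and `hyponyms()`.

```python
# Hypothetical toy lexical net, for illustration only; these entries are
# NOT copied from WordNet. Each synset maps a relation name to its targets.
LEXNET = {
    "canary": {"hypernym": ["finch"]},
    "finch": {"hypernym": ["songbird"]},
    "songbird": {"hypernym": ["bird"]},
    "bird": {"meronym": ["wing", "beak"]},
    "wing": {"hyponym": ["pinion"]},
    "beak": {"hyponym": ["hooked_beak"]},
}


def related(synset, relation):
    """Return the synsets linked to `synset` by `relation` (empty if none)."""
    return LEXNET.get(synset, {}).get(relation, [])


def climb_to_meronyms(start):
    """Follow hypernym links upward from `start` until reaching a synset that
    has meronym (part-of) links. Returns the chain of synsets visited, or None
    if the hypernyms run out first."""
    chain = [start]
    seen = {start}
    current = start
    while not related(current, "meronym"):
        parents = [p for p in related(current, "hypernym") if p not in seen]
        if not parents:
            return None
        current = parents[0]  # simplification: follow the first hypernym only
        seen.add(current)
        chain.append(current)
    return chain


def sketch_edges(start):
    """Collect (source, relation, target) edges: the hypernym chain, the
    meronyms of its top synset, and one level of hyponyms under each meronym."""
    chain = climb_to_meronyms(start)
    if chain is None:
        return []
    edges = [(a, "hypernym", b) for a, b in zip(chain, chain[1:])]
    top = chain[-1]
    for part in related(top, "meronym"):
        edges.append((top, "meronym", part))
        for kind in related(part, "hyponym"):
            edges.append((part, "hyponym", kind))
    return edges


for edge in sketch_edges("canary"):
    print(edge)
```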

(2) This is an exploratory exercise, and there is no "correct" answer! Don't spend more than an hour on it. (The idea for this question comes from Philip Resnik.)

    (a) Write down, in English, between 2 and 10 different senses of the verb "break". You may be able to think of more, but 10 is plenty; fewer is also fine. For example, here are two:

      1. break an object into pieces
      Example: Edgar broke the vase.

      2. break a bone
      Example: Sunyi broke her wrist.

    Try to do this without a dictionary. If you are not a native speaker of English, however, feel free to use one if you need to.

    (b) Using one of the online WordNet interfaces, look up the verb senses of "break".

    Which WordNet senses do your senses from part (a) match, if any? (One of your senses might match more than one WordNet sense, of course.) For example,

    Sense 1: Matches WN senses 2,3,4,5

    (c) Do any of your senses group naturally into a class with common elements of meaning? How would you group them? (Use a hierarchy if that makes more sense.)

    (d) How do these variations in word meaning relate to our discussion of the centrality of category membership and of characteristic features?

(3) Comparing WordNet to other systems of organization.

  • (a) How is WordNet similar to and different from a standard thesaurus (like Roget's thesaurus)?
  • (b) How is WordNet similar to and different from a standard dictionary?
  • (c) How is WordNet similar to and different from an encyclopedia?
  • (d) How is WordNet similar to and different from a classification thesaurus like ERIC?

(4) A website called Lexical Freenet combines different WordNet lexical relations in an interesting way. Find out where each type of link comes from. For example, enter canary and bird and run a Connection search between them. How do these connections differ from the relations you found in question (1)? How can you make the system show only the kind of connections WordNet would show between canary and bird (hypernym relations only)?

Optionally (just for fun), try to find a really interesting relationship between two terms. My favorite is a "Connection" search on "hearst" and "berkeley" (be sure to click More Paths).
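One way to picture a Connection search is as a shortest-path search over a graph whose edges are labelled with a relation type; restricting which edge types are allowed changes which paths can be found. The Python sketch below illustrates only that idea under this assumption (it is not how Lexical Freenet is actually implemented, which we do not know). The `EDGES` list, the generic `related` link type, and the functions `build_graph` and `connection` are all hypothetical.

```python
from collections import deque

# Hypothetical toy graph of typed links; illustrative only, not taken from
# WordNet or Lexical Freenet.
EDGES = [
    ("canary", "hypernym", "finch"),
    ("finch", "hypernym", "songbird"),
    ("songbird", "hypernym", "bird"),
    ("canary", "related", "yellow"),
    ("yellow", "related", "bird"),
]


def build_graph(edges, allowed=None):
    """Build an adjacency list, keeping only edges whose relation is in
    `allowed` (all edges if None). Links are traversable in both directions."""
    graph = {}
    for src, rel, dst in edges:
        if allowed is not None and rel not in allowed:
            continue
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append((rel, src))
    return graph


def connection(edges, a, b, allowed=None):
    """Breadth-first search for a shortest path from a to b. Returns a list of
    (node, relation, node) hops, or None if no path exists."""
    graph = build_graph(edges, allowed)
    prev = {a: None}  # node -> (parent, relation) on the BFS tree
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            break
        for rel, nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, rel)
                queue.append(nxt)
    if b not in prev:
        return None
    # Walk back from b to a, then reverse to get the hops in order.
    path = []
    node = b
    while prev[node] is not None:
        parent, rel = prev[node]
        path.append((parent, rel, node))
        node = parent
    return list(reversed(path))


print(connection(EDGES, "canary", "bird"))
print(connection(EDGES, "canary", "bird", allowed={"hypernym"}))
```

With this hypothetical data, the unrestricted search takes the two-hop path through `yellow`, while passing `allowed={"hypernym"}` forces the longer hypernym-only chain.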

Hierarchical Classification vs. Facets

(5) Yahoo! (a catalog service for the World Wide Web) employs human categorizers to assign selected web pages to categories. A small fragment of the Yahoo category system, slightly varied, is shown here. (Names surrounded by * denote actual web pages rather than categories of web pages.)

    (a) Is this a hierarchical or a faceted classification system?

    (b) Convert the category system from its given structure (hierarchical or faceted) to the other kind of structure. Show where each of the categories and web pages appears in your converted structure. (To save time, you may refer to categories by their line numbers rather than writing out each full name. The numbers carry no meaning; they are only a quick way to refer to the categories.)

    (c) Is this new arrangement better? Why or why not?
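The Yahoo fragment for question (5) is not reproduced in this text, so the Python sketch below uses invented category paths (`HIERARCHY`) and an assumed category-to-facet mapping (`FACET_OF`), both hypothetical, purely to contrast the two data structures. In a hierarchy each page sits at the end of a single path; in a faceted scheme each page receives a value along several independent facets, which can then be combined freely in a query.

```python
# Hypothetical category paths, invented for illustration; they are NOT the
# Yahoo fragment referred to in question (5).
HIERARCHY = {
    "*Page A*": ["Recreation", "Travel", "Europe", "France"],
    "*Page B*": ["Recreation", "Food", "Europe", "France"],
    "*Page C*": ["Recreation", "Travel", "Asia", "Japan"],
}

# Assumed mapping from each category name to the facet it belongs to.
FACET_OF = {
    "Travel": "topic", "Food": "topic",
    "Europe": "region", "Asia": "region",
    "France": "country", "Japan": "country",
}


def to_facets(paths, facet_of):
    """Re-express each hierarchical path as a dict of facet -> value,
    skipping categories (such as the root) that map to no facet."""
    return {
        page: {facet_of[c]: c for c in path if c in facet_of}
        for page, path in paths.items()
    }


def select(faceted, **criteria):
    """Return pages whose facet values match every criterion, sorted by name."""
    return sorted(
        page for page, facets in faceted.items()
        if all(facets.get(k) == v for k, v in criteria.items())
    )


FACETED = to_facets(HIERARCHY, FACET_OF)
print(FACETED["*Page A*"])
print(select(FACETED, country="France"))
```

Here `select(FACETED, country="France")` returns both French pages, even though they sit under different branches (`Travel` and `Food`) of the hypothetical hierarchy.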