CS 680: Internet Systems Research
Final Review

General

Closed Book, Closed Notes
Comprehensive, with emphasis on material covered since the midterm. See topics below.

Overall: What is the key contribution of each paper we read? What problem is each paper/system trying to address? How does each system relate to the work that came before it? For instance, if you were walking down the street and someone asked you, "What is the Haystack system all about?", could you answer intelligently?

The sample questions below are indicative of the types of questions you'll see on the final.

Cross-Topic

What is the abundance problem? Discuss how the following systems and system types address the abundance problem:

    Google, HITS (Kleinberg), Outride, del.icio.us, A9, Technorati, Chakrabarti's focused crawler, Referral Web

Research

What is the format of a proper research paper/talk? Why is a related work section important?

Semantic Web

What is an ontology?

What are the key conceptual differences between today's web and the semantic web? How do XML and RDF fit into the semantic web? How do web services fit in?

What is RDFS? How does it compare to RDF? How does it compare to Java?
How does RDF compare to XML?

Semi-structured data allows objects to have dynamic structure, i.e., any user can define a new attribute or association for an object. Discuss the implications of this. How is it liberating? How does it make the job of a semantic web browser more difficult? How do RDFS and schemas in general play into this issue?

What are the challenges in browsing/searching the semantic web?

Do all URIs map to a physical location? What implications does this have for a semantic web browser? What if all URIs did map to a physical location? Would this make browsing the semantic web as easy as browsing the regular web?

All three parts of an RDF triple (subject, predicate, and object) can be URIs. Why is this important for the subject and object? Why is it important for a predicate?
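
As a study aid for the question above, here is a minimal sketch of triples held as plain Python tuples. It is not a real RDF library; the `Triple` type, the helper functions, and every `http://example.org/` URI are made up for illustration. It shows one thing to think about: when objects and predicates are URIs, they can themselves appear as the subject of other statements.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and URIs, invented for this example only.
EX = "http://example.org/"

triples = [
    # The object of the first triple is the subject of the second,
    # so the two statements link up into a graph.
    Triple(EX + "people/alice", EX + "terms/authorOf", EX + "papers/p1"),
    Triple(EX + "papers/p1", EX + "terms/cites", EX + "papers/p2"),
    # The predicate used above is itself a subject here, so a program
    # can look up information about the relationship.
    Triple(EX + "terms/authorOf", EX + "terms/label", "author of"),
]


def describe(uri: str, graph: List[Triple]) -> List[Triple]:
    """Return every triple whose subject is `uri`."""
    return [t for t in graph if t.subject == uri]


def follow(uri: str, predicate: str, graph: List[Triple]) -> List[str]:
    """Return the objects reached from `uri` along `predicate`."""
    return [t.obj for t in graph if t.subject == uri and t.predicate == predicate]
```

Calling `follow` on the hypothetical `alice` URI and then `describe` on the result walks from a person to a paper to what that paper cites, while `describe(EX + "terms/authorOf", triples)` looks up the predicate itself.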

The Semantic Search (TAP) system is based on particular data stores. Haystack is different. How so? How is it more like Technorati?

What is the relationship between the semantic web and the model-view-controller (MVC) paradigm?

What is the difference between the semantic web and yesterday's knowledge bases?

Terminology:
structured, semi-structured, unstructured data
RDF
RDFS -- rdfs:range, rdfs:domain
LSID
VOWL
URI

Systems
 
TAP, Haystack, Magpie, Annotea

Crawling

What is the motivation behind a focused crawler?
How does Chakrabarti's crawler use Bayesian classification?
How does Chakrabarti's system make use of Kleinberg's hubs and authorities idea?
What type of crawling occurs in Lieberman's Letizia?
How would a focused semantic web crawler (scutter) work?

Terminology: frontier, BFS, best-first, classifier, distiller
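
To make the terms frontier and best-first concrete, here is a minimal sketch of a best-first crawl loop. It is not Chakrabarti's implementation: the toy link graph, the fixed relevance scores (a stand-in for whatever estimate a classifier would supply), and all function names are assumptions made for illustration.

```python
import heapq
from typing import Callable, Iterable, List


def best_first_crawl(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]],
    relevance: Callable[[str], float],
    max_pages: int,
) -> List[str]:
    """Visit pages in order of estimated relevance, highest first."""
    seeds = list(seeds)
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-relevance(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance(link), link))
    return visited


# Hypothetical toy web and scores, invented for this example.
TOY_WEB = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
TOY_SCORE = {"seed": 1.0, "a": 0.2, "b": 0.9, "c": 0.1, "d": 0.8}

order = best_first_crawl(
    ["seed"],
    lambda u: TOY_WEB.get(u, []),
    lambda u: TOY_SCORE.get(u, 0.0),
    max_pages=4,
)
# order is ["seed", "b", "d", "a"]; a BFS would visit seed, a, b, c instead.
```

Replacing the priority queue with a plain FIFO queue turns the same loop into BFS, which is a useful way to compare the two strategies.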

Collaborative Filtering, Social Networks, People

What is the difference between Referral Web and what Adamic et al. did in their Friends and Neighbors work?

How could the existence of people and people-association data (e.g., a network of foafs) help us search the web?

Systems: GroupLens, Referral Web, Golbeck's email filtering system based on reputation, Adamic's Friends and Neighbors

P2P

How does a distributed hash table work? How does Chord's skiplist algorithm work? How does the Chord system handle a new node being added to the system?
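
As a study aid for these questions, here is a minimal sketch of Chord-style key placement on an identifier ring. It is a simplification under stated assumptions: it uses a global sorted list of node identifiers, whereas real Chord nodes only know a few other nodes and route lookups hop by hop. The 8-bit ring size, the node identifiers, and all function names are chosen for illustration.

```python
import hashlib
from bisect import bisect_left
from typing import List

M = 8  # identifier bits; a small ring of 2**M positions, chosen for illustration


def ident(name: str) -> int:
    """Hash a key or node name onto the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids: List[int], key_id: int) -> int:
    """First node at or after key_id on the ring, wrapping around.

    node_ids must be sorted in ascending order.
    """
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]


def finger_table(node_ids: List[int], n: int) -> List[int]:
    """Entry i points to successor(n + 2**i): the skiplist-like shortcuts."""
    return [successor(node_ids, (n + 2 ** i) % (2 ** M)) for i in range(M)]


# Hypothetical node identifiers.
nodes = [10, 80, 200]
owner_before = successor(nodes, 30)        # node 80 stores key 30

# A new node with identifier 40 joins: only keys between 10 and 40
# move, and they move from node 80 to the new node.
nodes_after = sorted(nodes + [40])
owner_after = successor(nodes_after, 30)   # node 40 now stores key 30
```

Because each finger entry covers a distance that doubles, a lookup routed through finger tables can roughly halve the remaining distance at each hop, which is where the logarithmic lookup cost comes from.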

What are the key innovations of BitTorrent? How does BitTorrent differ from DNS? Gnutella? Original Napster?

Design a P2P system for browsing the semantic web, making it as decentralized as possible.

Terminology
torrent, tracker, skiplist, Pareto efficiency
BitTorrent: rarest first, random first piece, endgame mode, optimistic unchoking

Systems
Chord, BitTorrent

Folksonomy

How does the idea of folksonomy compare to a system whereby users classify data into fixed ontologies? What are the advantages and disadvantages of each? Is there a way folksonomies and taxonomies could live together? How?

What problem does the concept of folksonomy address?

Terminology
tags, ethnoclassification

Systems

del.icio.us, Flickr