Thursday, April 17, 2014

Week 15 reading note



The use of wildcards allows generalization beyond specific words, while contextual restrictions limit wildcard matching to entities related to the user's query.

The first type of wildcard should match entities of a specific type. We call these S-Wildcards (Specific entity type Wildcards). The second type of wildcard can match entities of any type. We call these A-Wildcards (Any entity type Wildcards).
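To make the difference between the two wildcard types concrete, here is a minimal Python sketch. The entity types ("Person", "City"), the classes `Entity`, `SWildcard`, and `AWildcard`, and the `related` set standing in for the contextual restriction are all hypothetical. The reading does not say how matching is actually implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str
    etype: str  # hypothetical entity type, e.g. "Person" or "City"

@dataclass(frozen=True)
class SWildcard:
    etype: str  # S-Wildcard: matches only entities of this type

@dataclass(frozen=True)
class AWildcard:
    pass  # A-Wildcard: matches an entity of any type

def element_matches(elem, token, related):
    """Match one pattern element against one token (a plain word or an Entity)."""
    if isinstance(elem, str):
        return token == elem  # literal words must match exactly
    # Contextual restriction: wildcards only match entities related to the query.
    if not isinstance(token, Entity) or token.name not in related:
        return False
    if isinstance(elem, SWildcard):
        return token.etype == elem.etype
    return True  # AWildcard accepts any entity type

def pattern_matches(pattern, tokens, related):
    return len(pattern) == len(tokens) and all(
        element_matches(e, t, related) for e, t in zip(pattern, tokens)
    )

obama = Entity("Obama", "Person")
paris = Entity("Paris", "City")
berlin = Entity("Berlin", "City")
related = {"Obama", "Paris"}  # hypothetical entities linked to the user's query

print(pattern_matches(["about", SWildcard("City")], ["about", paris], related))  # True
print(pattern_matches(["about", SWildcard("City")], ["about", obama], related))  # False
print(pattern_matches(["about", AWildcard()], ["about", obama], related))        # True
print(pattern_matches(["about", AWildcard()], ["about", berlin], related))       # False
```

The last call fails even though "Berlin" is an entity, because it is not in the related set: that is the contextual restriction narrowing what the wildcard may match.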

The ontology becomes a means of communication between the user and the system and helps overcome the bottlenecks of information access that relies primarily on keyword search. It supports information retrieval based on the actual content of a page and helps users navigate the information space through semantic concepts.
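A small sketch of what concept-based retrieval can look like, assuming pages are annotated with ontology concepts and the ontology is a simple is-a hierarchy. The toy ontology, the page annotations, and the functions `is_a` and `retrieve` are hypothetical illustrations, not taken from the reading.

```python
# Hypothetical toy ontology: each concept maps to its parent concept.
ONTOLOGY = {
    "Sedan": "Car",
    "SUV": "Car",
    "Car": "Vehicle",
    "Bicycle": "Vehicle",
}

# Hypothetical pages annotated with the concepts they are about.
PAGES = {
    "page1": {"Sedan"},
    "page2": {"Bicycle"},
    "page3": {"SUV", "Insurance"},
}

def is_a(concept, target):
    """True if concept equals target or is a descendant of it in the ontology."""
    while concept is not None:
        if concept == target:
            return True
        concept = ONTOLOGY.get(concept)  # walk up one level; None at the root
    return False

def retrieve(query_concept):
    """Return pages annotated with the query concept or any of its subconcepts."""
    return sorted(
        page for page, concepts in PAGES.items()
        if any(is_a(c, query_concept) for c in concepts)
    )

print(retrieve("Car"))      # ['page1', 'page3']
print(retrieve("Vehicle"))  # ['page1', 'page2', 'page3']
```

A keyword search for "car" would miss page1 and page3 if neither contains that word; matching on concepts finds them through the hierarchy.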

Monday, April 14, 2014

Week 14 muddiest point

Why do tools like stemming behave differently in text classification (TextCat) than in IR?

Thursday, April 10, 2014

Week 14 reading note



Because any learner has to look at the training data at least once, Naive Bayes (NB) can be said to have optimal time complexity. Its efficiency is one reason why NB is a popular text classification method.

How can NB be a good text classifier when its model of natural language is so oversimplified? Even though the probability estimates of NB are of low quality, its classification decisions are surprisingly good.

NB’s main strength is its efficiency: training and classification can be accomplished with one pass over the data. Because it combines efficiency with good accuracy, it is often used as a baseline in text classification research.
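A minimal sketch of multinomial NB with add-one (Laplace) smoothing, one common variant, to show where the efficiency comes from. The function names and toy documents are made up for illustration. Collecting counts is a single pass over the training documents; turning counts into parameters costs O(|C||V|), independent of the number of documents; classifying a document is one pass over its tokens per class.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Counting is one pass over the data."""
    class_docs = Counter()               # documents per class
    term_counts = defaultdict(Counter)   # term frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}
    # Add-one smoothed log P(term | class) for every vocabulary term.
    log_cond = {}
    for c in class_docs:
        denom = sum(term_counts[c].values()) + len(vocab)
        log_cond[c] = {t: math.log((term_counts[c][t] + 1) / denom) for t in vocab}
    return log_prior, log_cond

def classify_nb(model, tokens):
    """Pick the class with the highest log-posterior; unknown terms are ignored."""
    log_prior, log_cond = model
    scores = {
        c: log_prior[c] + sum(log_cond[c][t] for t in tokens if t in log_cond[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

docs = [
    (["Chinese", "Beijing", "Chinese"], "china"),
    (["Chinese", "Chinese", "Shanghai"], "china"),
    (["Chinese", "Macao"], "china"),
    (["Tokyo", "Japan", "Chinese"], "other"),
]
model = train_nb(docs)
print(classify_nb(model, ["Chinese", "Chinese", "Chinese", "Tokyo", "Japan"]))  # china
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied, which is also why the raw scores are poor probability estimates even when the chosen class is right.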



Thus, linear models in high-dimensional spaces are quite powerful despite their linearity. More powerful nonlinear learning methods can model decision boundaries that are more complex than a hyperplane, but they are also more sensitive to noise in the training data. Nonlinear learning methods sometimes perform better if the training set is large, but by no means in all cases.
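To show what a linear decision boundary means for text, here is a sketch of a linear classifier whose boundary is the hyperplane w·x + b = 0 over sparse bag-of-words features. The perceptron used to learn w and b is my choice of training rule for illustration, not a method named in the reading; the feature dictionaries and labels are made up.

```python
def dot(w, x):
    """Sparse dot product: only features present in x contribute."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_perceptron(data, epochs=10):
    """data: list of (feature dict, label in {+1, -1}). Learns w and b for w·x + b = 0."""
    w, b = {}, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if y * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + y * v
                b += y
                mistakes += 1
        if mistakes == 0:  # every training example is on the correct side
            break
    return w, b

def predict(model, x):
    w, b = model
    return 1 if dot(w, x) + b > 0 else -1

data = [
    ({"goal": 1, "match": 1}, 1),
    ({"team": 1, "goal": 1}, 1),
    ({"stock": 1, "market": 1}, -1),
    ({"market": 1, "price": 1}, -1),
]
model = train_perceptron(data)
print(predict(model, {"goal": 1}))    # 1
print(predict(model, {"market": 1}))  # -1
```

With one dimension per vocabulary term, documents from different classes are often linearly separable, which is part of why such a simple boundary works well for text.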


Monday, April 7, 2014

Tuesday, April 1, 2014

Week 13 reading note

The user profiling process generally consists of three main phases.

No matter which construction method is chosen, the profile must be kept current to reflect the user’s preferences accurately; this has proven to be a very challenging task.
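The reading does not say how a profile is kept current. One simple approach, assumed here purely for illustration, is a keyword-weight profile in which older evidence is decayed every time new evidence arrives, so stale interests fade. The class `DecayingProfile` and its `decay` parameter are hypothetical.

```python
from collections import defaultdict

class DecayingProfile:
    """Hypothetical keyword-weight user profile with exponential forgetting."""

    def __init__(self, decay=0.8):
        self.decay = decay                 # fraction of old weight kept per update
        self.weights = defaultdict(float)  # term -> interest weight

    def update(self, terms):
        # Age all existing interests first...
        for t in self.weights:
            self.weights[t] *= self.decay
        # ...then reinforce the terms seen in the newly observed document.
        for t in terms:
            self.weights[t] += 1.0

    def top(self, k=3):
        """Return the k terms with the highest current weight."""
        return sorted(self.weights, key=self.weights.get, reverse=True)[:k]

profile = DecayingProfile(decay=0.5)
profile.update(["football", "goal"])
profile.update(["stocks"])
profile.update(["stocks"])
print(profile.top(1))               # ['stocks']
print(profile.weights["football"])  # 0.25
```

The decay factor trades stability for responsiveness: values near 1 keep long-term interests, while smaller values let the profile follow recent behavior more quickly.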