ma.gnolia


era's Bookmarks Tagged With "ir"

  1. Snowball

    Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball, and presents several useful stemmers which have been implemented using it.

  2. RCV1: A New Benchmark Collection for Text Categorization Research (Lewis, D. D.; Yang, Y.; Rose, T.; and Li, F.; Journal of Machine Learning Research, 2004)

    "Reuters Corpus Volume 1 (RCV1) (Rose, Stevenson and Whitehead, 2003) [...] consists of over 800,000 newswire stories that have been manually coded using three category sets. However, RCV1 as distributed [...] includes known errors in category ass…

  3. Research on N-Grams in Information Retrieval

    Fairly nice and comprehensive bibliography, including links to relevant patents and a few applications, except (a) it's mainly IR-oriented and (b) it was last updated in 1997 (ouch!)

  4. US Patent 6,621,930: Automatic categorization of documents based on textual content (Smajda)

    Patent by Frank Smajda "Haifa, US, IL" (-: I wonder if the patent could be contested for a fib like that)
