
Soumen Chakrabarti

Researcher at Indian Institute of Technology Bombay

Publications - 208
Citations - 16289

Soumen Chakrabarti is an academic researcher at the Indian Institute of Technology Bombay. His research covers topics including ranking (information retrieval) and Web pages. He has an h-index of 55 and has co-authored 208 publications, which have received 15481 citations. His previous affiliations include the University of California and the Indian Institutes of Technology.

Papers
Journal Article

Focused crawling: a new approach to topic-specific Web resource discovery

TL;DR: Describes a new hypertext resource discovery system, the Focused Crawler. It is robust against large perturbations in the starting set of URLs and can explore outward to discover valuable resources dozens of links away from the start set, while carefully pruning the millions of pages that may lie within the same radius.
Proceedings Article

Enhanced hypertext categorization using hyperlinks

TL;DR: This work developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark, comparable to the best results obtained at the time. The technique also adapts gracefully to the fraction of neighboring documents whose topics are known.
Proceedings Article

Keyword searching and browsing in databases using BANKS

TL;DR: Describes BANKS, a system that enables keyword-based search on relational databases together with data and schema browsing, and presents an efficient heuristic algorithm for finding and ranking query results.
Journal Article

Automatic resource compilation by analyzing hyperlink structure and associated text

TL;DR: An evaluation of ARC (Automatic Resource Compiler) suggests that the resources it finds often fare almost as well as, and sometimes better than, resource lists that are manually compiled or classified by topic.
Book

Mining the Web: Discovering Knowledge from Hypertext Data

TL;DR: The book covers the infrastructure of the Web, similarity and clustering, applications of semi-supervised learning to text, and the future of Web mining.