  • Open Source

    Information Retrieval • Enterprise Search • Vertical Search • Digital Libraries • Media Monitoring • Web Mining • Web Science

    Kohlschütter Search Intelligence. Open Source. First-class.

    Our Own Projects

    • Boilerplate Removal and Fulltext Extraction from HTML pages
    • Unix Domain Sockets for Java
    • OS X (Darwin) Kernel module to monitor and control some EeePC functions

    Contributions to Other Projects (Selection)

    • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
      Bug Fixes, Enhancements (LUCENE-2134, 2133, 1918, 1186, 1185, 954)
    • Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
      Enhancements (TIKA-420, 477)
    • Apache HttpClient is designed for extension while providing robust support for the base HTTP protocol. It may be of interest to anyone building HTTP-aware client applications such as web browsers, web service clients, or systems that leverage or extend the HTTP protocol for distributed communication.
      Bug Fixes, Developed simple HTTP server for JUnit testing
    • Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
      Bug Fixes, Host-level Bucket-Queuing, Admin UI enhancements, Frontiers refactoring
    Copyright © 2010 Kohlschütter Search Intelligence