What is it?
Various search tools can be used to create a customized search for a Web site. Components may include:
- user interface (e.g. search box, search results page)
- server (hardware and software), crawler and indexer
Why use it?
- Provide users with an additional way to navigate your site
- Links and navigation menus are the traditional way to navigate the Web
- Search is especially important for sites with many pages
- Many other sites have a search box so users expect it
- Ideally every page on your site should have a site search box
- Search box is most often found in the page header
- Improve staff intranet productivity
- Find errors in order to improve page content
- Assure that all important content is indexed
- Create a custom search engine for a group or research topic
- Can include content from any sites on the Internet
Popular Tools
- Commercial
- Commercial, Free
- Google
- Google Custom Search Engine (Co-op/CSE) - SearchTools review
- Google Syndicated Search (replaced by CSE)
- Google Webmaster Tools
- Gigablast (Web Search, Site Search, Custom Topic Search)
- Swicki
- Open Source
CSU Libraries Demos
Google Custom Search Engine (Co-op/CSE)
- Interface is easy to customize using Libraries template
- Results are Google-like, with Google Custom Search logo
- Added code for menu to narrow search to one subdirectory
- Can search content on multiple servers (lib and digital)
- Keywords to narrow search
- Sites/URLs to include or exclude, wildcards allowed
- Editions: standard has ads, business/university/nonprofits do not
- Add to Google home page, get code
- Refinements to label categories in some sites
- Look and feel of search box and results
- Code to copy and paste in your search and results pages
- Collaboration of contributors, invited or volunteers
- Preview - try out your searches
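The "Sites/URLs to include or exclude, wildcards allowed" setting can be pictured with a small sketch. This is only an illustration of include/exclude matching in general: the host names, the patterns, and the helper names `strip_scheme` and `in_scope` are hypothetical, and the shell-style wildcards used here are an assumption, not the exact pattern syntax of Google Custom Search Engine.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration; "*" matches any run of characters.
INCLUDE = ["lib.example.edu/*", "digital.example.edu/collections/*"]
EXCLUDE = ["lib.example.edu/staff/*", "*.tmp"]


def strip_scheme(url):
    """Drop a leading http:// or https:// so patterns can omit it."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def in_scope(url):
    """Return True if url matches an include pattern and no exclude pattern."""
    target = strip_scheme(url)
    # Exclusions win over inclusions in this sketch.
    if any(fnmatchcase(target, pattern) for pattern in EXCLUDE):
        return False
    return any(fnmatchcase(target, pattern) for pattern in INCLUDE)


if __name__ == "__main__":
    for url in [
        "http://lib.example.edu/hours.html",
        "http://lib.example.edu/staff/minutes.html",
        "https://digital.example.edu/collections/maps/1",
        "http://www.example.com/",
    ]:
        print(url, in_scope(url))
```

Checking exclusions first means a page under an excluded folder stays out even when a broader include pattern also matches it.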
Google Mini
- Turnkey server (hardware and software) in our server room
- Interface is fairly easy to customize using Libraries template
- Menu of collections to narrow search
- No Google branding needed
- Crawl and Index
- Crawl URLs - patterns to start, follow, or not crawl
- Crawl Schedule - continuous or specific days/times
- Crawler Access - internal, password-protected, proxy servers
- Collections - groups of URL patterns to search together
- Serving
- Front ends - separate interfaces for public, staff, test
- Output format, KeyMatch, related queries, remove URLs
- Status and Reports
- Crawl status - documents found/crawled/served
- Crawl diagnostics - URLs crawled, excluded or with errors
- Content statistics - documents by file type
- Search reports - collections, dates, keywords, queries
- Administration
- User accounts - admin or manager, collections, frontends
- Reset index - clear the database and start over
- Import/export configuration - backup all settings
- System, network, SNMP, certificates, SSL, LDAP, license
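The idea of collections (groups of URL patterns searched together, offered as a menu to narrow a search) can be sketched as follows. This is a minimal illustration, not the Google Mini's own configuration format: the collection names, the URLs, and the function `narrow` are hypothetical, and simple URL-prefix matching is an assumption standing in for the appliance's pattern syntax.

```python
# Hypothetical collections: each name maps to a group of URL prefixes.
COLLECTIONS = {
    "archives": ["http://lib.example.edu/archives/"],
    "digital": ["http://digital.example.edu/"],
    "all": ["http://lib.example.edu/", "http://digital.example.edu/"],
}


def narrow(urls, collection):
    """Keep only the URLs that fall under one of the collection's prefixes."""
    prefixes = tuple(COLLECTIONS[collection])
    return [url for url in urls if url.startswith(prefixes)]


if __name__ == "__main__":
    hits = [
        "http://lib.example.edu/hours.html",
        "http://lib.example.edu/archives/photos.html",
        "http://digital.example.edu/maps/index.html",
    ]
    for name in COLLECTIONS:
        print(name, narrow(hits, name))
```

A front end could then offer the collection names as a drop-down menu and pass the chosen name along with the query.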
Features
User Interface
- Public and staff/restricted interfaces (front ends)
- Can the search and results pages be customized?
- page layout, header, footer, colors, styles, ads
- Faceted search
- left navigation links to subcategories or topics with fixed # of items
- e.g. dates, countries, languages, subjects
- Collections (limit search to specific folders or sets of URLs you define)
- KeyMatch (staff-suggested URLs for highly-used keywords)
- Spellchecker ("did you mean...") and suggestions for related terms
- Advanced search
- Keywords/phrases (and, or, not, exact phrase, part of word)
- Limit (to a collection, language, format, domain, or field)
- Sort (by relevance, date, title, etc.)
- Output format (# results per page, long/short/URL, group by site)
- Duplicates/similar items are removed or grouped?
- XML search results available (for flexible formatting by scripts/XSLT)?
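When a tool offers XML search results, a script can format them however the site needs. The sketch below shows one way this might look. The XML layout (`results`, `result`, `url`, `title`, `snippet`) is a made-up example, not the output schema of any particular product, and `results_to_html` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical XML results; a real search tool defines its own element names.
SAMPLE_XML = """<results query="hours">
  <result>
    <url>http://lib.example.edu/hours.html</url>
    <title>Library Hours</title>
    <snippet>Open daily &amp; most holidays.</snippet>
  </result>
  <result>
    <url>http://lib.example.edu/branches.html</url>
    <title>Branch Libraries</title>
    <snippet>Hours and locations for each branch.</snippet>
  </result>
</results>"""


def results_to_html(xml_text):
    """Turn the XML results into an ordered HTML list of links with snippets."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.findall("result"):
        url = result.findtext("url", default="")
        title = result.findtext("title", default=url)
        snippet = result.findtext("snippet", default="")
        # Escape every value so result text cannot break the page markup.
        items.append(
            '<li><a href="%s">%s</a><br>%s</li>'
            % (escape(url, quote=True), escape(title), escape(snippet))
        )
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


if __name__ == "__main__":
    print(results_to_html(SAMPLE_XML))
```

An XSLT stylesheet could do the same job on the server; a script is shown here only because it is easy to read and adapt.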
Crawl and Index
- Crawl/search multiple domains or hosts
- URLs to crawl
- Filters (remove domains or URLs from crawls, indexes or interfaces)
- File formats indexed (HTML, PDF, Word, Excel, etc.)
- Crawl frequency (increase/decrease overall or for certain pages/patterns)
- Usage reports (top queries, top keywords)
- Crawl reports (URLs crawled/excluded, errors)
- Helps create files for crawlers? (robots.txt, sitemap.xml)
- Access to password-protected pages or proxy servers
- Meta tag information used or ignored?
- Language and character set support?
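For the question of creating files for crawlers, here is a small sketch that writes a robots.txt and then checks it with the parser from the Python standard library. The folder names, the site URL, and the helper `build_robots_txt` are hypothetical. Whether a given search tool honors robots.txt or the Sitemap line is something to confirm in its documentation.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths, sitemap_url=None):
    """Build robots.txt text that keeps all crawlers out of the given paths."""
    lines = ["User-agent: *"]
    lines += ["Disallow: " + path for path in disallowed_paths]
    if sitemap_url:
        # Blank line ends the rule group; Sitemap points crawlers to the URL list.
        lines += ["", "Sitemap: " + sitemap_url]
    return "\n".join(lines) + "\n"


# Hypothetical site layout: staff and test folders should not be crawled.
robots_txt = build_robots_txt(
    ["/staff/", "/test/"], "http://lib.example.edu/sitemap.xml"
)

# Check the generated rules the way a well-behaved crawler would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt)
    print(parser.can_fetch("*", "http://lib.example.edu/hours.html"))
    print(parser.can_fetch("*", "http://lib.example.edu/staff/minutes.html"))
```

Note that robots.txt is only a request to crawlers, so pages that must stay private still need password protection.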
Other Selection Criteria
- Provider: Commercial? Cost? Licensing? Open source?
- Limits: # domains, pages, queries; ads, vendor branding
- Platform: Windows or Unix? Apache or IIS? Programming language?
- Performance: Searches must be fast or users will go elsewhere
- Administration: Multiple administrators? Roles?
- Ease of configuration: GUI-based and/or file-based?
- Support: phone/email, user community, documentation, training, upgrades, longevity
Other Resources
- List of search engines - Wikipedia
- Building Vertical Search Engines. Greg Notess, Online, July/August 2007, Vol. 31, Issue 4, pp. 37-39
- Building a Better Search Engine. Mary Ellen Bates, Online, Mar/Apr 2007, Vol. 31, Issue 2, p. 64
- A Comparison of Free Search Engine Software - Yiling Chen, 2004/2006
- Search Engine Software for Your Web Site - Search Engine Watch, 2003
- Search Engine Land: Search Engines
- Network Computing: Enterprise Search
- Search engines for Web developers - VirtualHosting.com