Tag Archives: Crawl

Difference between crawl database and crawl component

What is a crawl database?

In Search Server 2010, a crawl database contains data related to the location of content sources, crawl schedules, and other information specific to crawl operations for a specific Search service application. To distribute the database load, add crawl databases on different computers that are running SQL Server. Crawl databases are associated with crawl components and can be tied to specific hosts by creating host distribution rules.

What is a crawl component?

In Search Server 2010, a crawl component processes crawls of content sources and propagates the resulting index files to query components. It adds information about the location and crawl schedule of content sources to its associated crawl databases. Crawl components are associated with a single Search service application and can be added to different farm servers to distribute the crawl load.

Reference: Search 2010 Architecture and Scale - Part 1 Crawl

SharePoint “Crawl Log error: Access Denied”

Issue
Windows Server 2003 SP1 introduced a loopback security check, which is also present in Windows Server 2008. The check prevents access to a web application through a fully qualified domain name (FQDN) when the request originates from the machine that hosts the application. The result is a 401.1 Access Denied response from the web server and a logon failure in the event log.

To disable the loopback check (non-production environments only):

1. Log in to the SharePoint server.
2. Click Start, click Run, type regedit, and then click OK.
3. In Registry Editor, locate and then click the following registry key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
4. Right-click Lsa, point to New, and then click DWORD Value.
5. Type DisableLoopbackCheck, and then press ENTER.
6. Right-click DisableLoopbackCheck, and then click Modify.
7. In the Value data box, type 1, and then click OK.
8. Quit Registry Editor, and then restart your computer.
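The same change can be scripted instead of made by hand. What follows is a minimal Python sketch, assuming the built-in Windows `reg add` command is available and the script is run from an elevated prompt. The function name `disable_loopback_check_command` and the `--apply` switch are hypothetical, introduced only for illustration. By default the script just prints the command so it can be reviewed first.

```python
import subprocess
import sys

# Registry key from step 3, written in the short form that reg.exe accepts.
LSA_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Lsa"


def disable_loopback_check_command():
    """Return the `reg add` argument list that sets DisableLoopbackCheck to 1."""
    return [
        "reg", "add", LSA_KEY,
        "/v", "DisableLoopbackCheck",  # value name (step 5)
        "/t", "REG_DWORD",             # value type (step 4)
        "/d", "1",                     # value data (step 7)
        "/f",                          # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    command = disable_loopback_check_command()
    print(" ".join(command))
    # Only modify the registry on Windows, and only when explicitly requested.
    if sys.platform == "win32" and "--apply" in sys.argv:
        subprocess.run(command, check=True)
```

The restart described in step 8 is still required after the value is written.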

However, for production environments, DO NOT DISABLE this feature. Doing so removes an important operating system security check, and a competent security audit of that environment would flag it. Instead, list the host names you wish to exclude from the check. This makes your scenario work while retaining the security check.

1. Log in to the SharePoint server.
2. Click Start, click Run, type regedit, and then click OK.
3. In Registry Editor, locate and then click the following registry key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0
4. Right-click MSV1_0, point to New, and then click Multi-String Value.
5. Type BackConnectionHostNames, and then press ENTER.
6. Right-click BackConnectionHostNames, and then click Modify.
7. In the Value data box, type the host name or the host names for the sites that are on the local computer, and then click OK.
8. Quit Registry Editor, and then restart the IISAdmin service (or recycle the application pool).
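For servers that host several sites, the host name list can be derived from the site URLs and written in one command. The Python sketch below assumes the built-in Windows `reg add` command, which separates REG_MULTI_SZ entries with a literal `\0` by default. The helper names `host_names_from_urls` and `back_connection_command` are hypothetical. Note that the resulting command replaces any existing BackConnectionHostNames value rather than appending to it.

```python
from urllib.parse import urlsplit

# Registry key from step 3, written in the short form that reg.exe accepts.
MSV1_0_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0"


def host_names_from_urls(urls):
    """Extract unique host names from site URLs, preserving first-seen order."""
    seen = set()
    hosts = []
    for url in urls:
        host = urlsplit(url).hostname  # lower-cased, with any port removed
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def back_connection_command(hosts):
    """Return the `reg add` argument list that writes BackConnectionHostNames."""
    if not hosts:
        raise ValueError("at least one host name is required")
    return [
        "reg", "add", MSV1_0_KEY,
        "/v", "BackConnectionHostNames",
        "/t", "REG_MULTI_SZ",
        "/d", r"\0".join(hosts),  # literal backslash-zero between entries
        "/f",                     # overwrite an existing value without prompting
    ]


if __name__ == "__main__":
    sites = ["http://intranet.contoso.com/"]
    print(" ".join(back_connection_command(host_names_from_urls(sites))))
```

Building the command separately from running it makes it easy to review the host list before anything is written to the registry.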

Ref: http://support.microsoft.com/kb/896861

SharePoint Search Crawler Error After Installing .NET 3.5 SP1

I installed Microsoft .NET 3.5 SP1 on SharePoint 2007 today, which was a prerequisite for an updated third-party application that we use with SharePoint. After the upgrade, I attempted to perform a full crawl, but the crawl took only a few seconds and the crawl log showed nothing but Access Denied errors. The Event Viewer on the SharePoint Search server showed the following error:

Event Type: Warning
Event Source: Office Server Search
Event Category: Gatherer
Event ID: 2436
Date:  1/27/2011
Time:  10:29:16 PM
User:  N/A
Computer: SHP1
Description:
The start address http://intranet.contoso.com/ cannot be crawled.
Context: Application ‘SSP’, Catalog ‘Portal_Content’
Details:
Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has “Full Read” permissions on the SharePoint Web Application being crawled.   (0x80041205) For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

After some investigation, I found a fix in KB896861: adding the DisableLoopbackCheck value to the registry. Once the registry value was added and the IIS Admin service was restarted, I attempted another full crawl, and this time it was successful!

Ref: http://support.microsoft.com/kb/896861