Types of Crawl in SharePoint 2013

Posted: January 3, 2014 in SharePoint
Tags: ,

In SharePoint search the most of time people annoying that actual content and search content not in Sync. So the search administrator keeps hitting head on the wall and putting same excuse in front of stakeholders “please wait for the next incremental crawl, most of the time :)”. As we know we already have two content crawling methods first is “Full Crawl” and second is “Incremental crawl”.

 

Disadvantage of the “Full Crawl” and “Incremental Crawl” as both can’t run in a parallel i.e. the content change during the crawl, it required next incremental crawl.
So what is new in continues crawl?
The content source that using continues crawl that run in parallel. The default waiting time is 15 min. So the default wait time can change via PowerShell , no UI for that. Now the content is up to date most of the time. This crawler only for SharePoint content source, so the job of the SharePoint administrator need to identify those content which are keep updating on the regular interval & also comes under the part of search need to be comes under “Continues crawl “category. The “Continuous Crawl” is a type of crawl that aims to maintain the index as current as possible. So the following are list of crawl are available in SharePoint 2013 search architecture.
  1. Run By User
    • Full Crawl
    • Incremental Crawl
    • Continues crawl
  2. Run By system (automated crawl)
    • Incremental Crawl (clean-up)
  1. Run by User: The content source created by user/Administrator and it is trigger/ scheduled by the user.
    • Full Crawl: 
      • Crawl full items
      • Can be scheduled
      • Can be stop and paused
      • When required
        • Change content access account
        • Added new manage properties
        • Content enrichment web service codes change/modified.
        • Add new IFilter
    • Incremental Crawl:
      • Crawl last modified content
      • Can be scheduled
      • Can be stop and paused
      • When required
        • Crawl last modified content
    • Continues Crawl
      • Index as current as possible.
      • Cannot be scheduled
      • Cannot be stop and paused (Once started, a “Continuous Crawl” can’t be paused or stopped, you can just disable it.)
      • When required
        • Content frequently changed (Multiple instance can be run in parallel).
        • Only for SharePoint Content Source
        • E-commerce site in crass site publishing mode.
  2. Run by System: The crawl run automatically by the timer job.
    • Clean-Up continues crawl (Microsoft definition): A continuous crawl does not process or retry items that return errors more than three times. A “clean-up” incremental crawl automatically runs every four hours for content sources that have continuous crawl enabled to re-crawl any items that repeatedly return errors. This incremental crawl will try to crawl the item again and then will postpone retries if the error persists.

**********************************************************************************************************

SharePoint 2013: Continuous Crawl and the Difference Between Incremental and Continuous Crawl

With the new version of SharePoint a new type of crawl appeared in 2013 named « Continuous Crawl ».  For Old schools like me on SharePoint 2010 we had 2 crawls available and configurable on our Search Service Application.

  • Full : Crawl all content,
  • Incremental : As the name is says, it crawls content has been modified since the last crawl.

The disadvantage of these crawls, is that once launched, you are not able to launch a second in parallel (on the same content source), and therefore the content changed in the meantime we will need to wait until the current crawl is finished (crawl and another) to be integrated into the index, and therefore to be found via search. An example :

  • A incremental crawl named ALFA is started and will last 50 take minutes,
  • After 10 minutes of crawling a new document has been added, so we need a second incremental crawl named BETA to get the document in the index.
  • This item will have to wait at least 40 minutes to be integrated into the index.

 

So, we can’t keep an updated index with the latest changes, because latency is invited in each crawling process. It is possible that in most of cases this operation is suitable and favorable for your clients, but for those who want to search their content immediately or after their integration into SharePoint there is now a new solution in SharePoint: “Continuous Crawl“.

 

The Continuous Crawl  So resuming: The “Continuous Crawl” is a type of crawl that aims to maintain the index as current as possible.

His operation is simple: once activated, it will launch the crawl at regular intervals. The major difference with incremental crawl is that the crawl can run in parallel, and do not expect that the crawl is completed prior to launch.

Important Points:

  • “Continuous Crawl” is only available for sources of content type “SharePoint Sites”
  •  By default, a new crawl is run every 15 minutes, but the SharePoint administrator can change this interval using the PowerShell cmdlet Set-SPEnterpriseSearchCrawlContentSource  ,
  • Once started, a “Continuous Crawl” can’t be paused or stopped, you can just disable it.

If we take our example above with “Continuous Crawl”:

  •  Our ALFA crawl starts and will take at least 50 minutes,
  •  After 10 minutes of crawling an item already crawl is hereby amended, and requires a new crawl.
  •  Crawl “BETA” is launched,
  •  The crawl “BETA” starts in (15-10) minutes,
  •  Therefore this item will not need to wait 5 minutes (instead of 50 minutes) to be integrated into the index.

 

 

1- How to Enable it?

In Central Administration, click on your search service application, and then in the menu on the “Content Sources”

 

Clique on « New Content Source » at the menu

 

Chose « SharePoint Sites »

  

Select « Enable Continuous Crawls »

  

 

  • The content source has been created so we can see his status on « Crawling Continuous »

 

2 – How to disable it?

  •  From the content source page, chose the option “Enable Incremental Crawls” option. This will disable the continuous crawl.
  •  Save changes.

 

3 – How to see if it works ?

Click on your service application search then “Crawl Log” in the section “Diagnostics”.

 

Select your Content Source and click on « View crawl history »

Or via PowerShell Execute the followoing cmdlets  $SearchSA = « Search Service» Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $SearchSA | select *

 

Impact on our Servers

The impact of a “Continuous Crawl” is the same as an incremental crawl. At the parallel execution of crawls, the “Continuous Crawl” within the parameters defined in the “Crawler Impact Rule” which controls the maximum number of requests that can be executed by the server (default 8).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s