SharePoint Data access techniques using CAML Queries

Posted: January 3, 2014 in SharePoint
Tags:

CAML Queries are most commonly used  for easily accessing  data inside SharePoint.

SharePoint object model provides many useful Classes which use CAML queries for fetching the data you need (listed below). These classes are fine tuned for specific scenarios.

  1. SPQuery
  2. ContentIterator
  3. SPSiteDataQuery
  4. PortalSiteMapProvider
  5. CrossListQueryCache and CrossListQueryInfo

Lets explore each of the above and see which how they can be best used depending upon requirements.

SPQuery

SPQuery is probably most popular among SharePoint developers. Using Query class , we can execute a CAML query against an SPList instance, to retrieve items corresponding to the query.

using (SPSite site = new SPSite("http://spsite")) {
 using (SPWeb web = site.OpenWeb()) {
 SPList list = web.Lists["Contacts"];
 SPQuery query = new SPQuery();

 // Define columns to fetch
 query.ViewFields = "<FieldRef Name=\"Title\" /><FieldRef Name=\"Email\" />";

 // Force to fetch only the specified columns
 query.ViewFieldsOnly = true; 

 query.Query = "<Where><Contains><FieldRef Name=\"Email\" />
 <Value Type=\"Text\">@extremesharepoint.com</Value></Contains></Where>";

 //Define the maximum number of results for each page (like a SELECT TOP)
 query.RowLimit = 10;

 // Query for items
 SPListItemCollection items = list.GetItems(query);
 foreach (SPListItem item in items) {
 Console.WriteLine(item["Title"] +" : "+item["E-mail Address"]);
 }
 }
}

SPQuery can be used within any kind of application that query SharePoint data : windows application or web application. It is the best way to query a single list when items returned by query are frequently changing. And also when you need real time data in your results.

Below are few points you should take into consideration for achieving best performance out of SPQuery

  • Always bound SPQuery object using RowLimit.  An SPQuery without RowLimit has poor performance and may fail on large lists. You should specify a RowLimit between 1 and 2000. You can also usePaging in SPQuery   if you want to retreive more then 2000 items at a time .
  • Avoid Query Throttle exceptions. The maximum number of items you should retrieve in a single query or view in order to avoid performance degradation is 5000 – query threshold. If your query returns more items than the configured query threshold , the query will be blocked and you will not get results.
  • Use indexed fields in query where possible . If you are querying  a field that is not indexed, and the resulting scan encounters more than 5000 items(query threshold), your query not get any results.
ContentIterator

SharePoint Server 2010 provides a new class named ContentIterator that you can use to query lists without hitting throttle limits and hence can avoid receiving an SPQueryThrottleException. You should consider using ContentIterator if you need to run a query that will return more than 5,000 rows of data.

The ContentIterator object divides the list items into chunks and runs the query against one chunk of list data at a time. Each list item is processed asynchronously by a callback method until the query is complete.

The following  example demonstrates usage of the ContentIterator class.

    static int noOfErrors = 0;
    static int noOfItemsProcessed = 0;

    string camlQuery = @"<View><Query><Where>              
                    <IsNotNull>
                        <FieldRef Name='Title' />                       
                    </IsNotNull>               
                   </Where></Query></View>";

    ContentIterator iterator = new ContentIterator();
    SPQuery listQuery = new SPQuery();
    listQuery.Query = query1;
    SPList list = SPContext.Current.Web.Lists["Tasks"];
    iterator.ProcessListItems(list,
        listQuery,
        ProcessItem,
        ProcessError
    );
}

public  bool ProcessError(SPListItem item, Exception e) 
{ 
    // process the error
    noOfErorrs++; 
    return true; 
}
public void ProcessItem(SPListItem item)
{
    noOfItemsProcessed++;
    //process the item.
}

ContentIterator will iterate througth each item in the list and  invoke the callback methodProcessItem specified to process list items . If an error occurs while iterating the list,  theProcessError function is invoked.

To efficiently use ContentIteratoryou should include one of the three OrderBy clauses—

  • ContentIterator.ItemEnumerationOrderByID : Gets an OrderBy clause for a query that orders items by ID.
  • ContentIterator.ItemEnumerationOrderByPath : Gets an OrderBy clause that orders query items by URL.
  •  ContentIterator.ItemEnumerationOrderByNVPField : Gets an OrderBy clause for a query that orders items by the NVP index used in the <Where > clauseIt  actually enables the index to be used.

By default,  SharePoint adds a OrderBy clause that orders by content type, which ensures that folders are processed before list items. You should override this behavior with one of the three OrderBy clauses to take full advantage of  indexed fields.

The following code example shows how to use the ContentIterator.ItemEnumerationOrderByNVPField clause. The example assumes that you are querying an indexed field.

SPQuery query = new SPQuery();
query.Query = "<Where><Eq><FieldRef Name=\"IndexedFieldName\"/><Value Type=\"Text\">Sharepoint</Value></Eq></Where>" 
+ ContentIterator.ItemEnumerationOrderByNVPField;
ContentIterator contentIterator = new ContentIterator();
contentIterator.ProcessItemsInList(query,
    delegate(SPListItem item)
    {
        // Work on each item.
    },
    delegate(SPListItem item, Exception e)
    {
        // Handle an exception that was thrown while iterating.
        // Return true so that ContentIterator rethrows the exception.
        return true;
    }
);
SPSiteDataQuery

You can use SPSiteDataQuery  when you want to query multiple lists within a site collection simultaneously. Like the SPQuery object, the SPSiteDataQuery object has Query and ViewFields properties. In addition, the SPSiteDataQuery object also has Lists and Websproperties. Below example uses SPSiteDataQuery to  return all events from all calendar lists in the current site collection where the end date is later than today.

SPSiteDataQuery query = new SPSiteDataQuery;
query.Query = "<Where><Gt><FieldRef Name='EndDate'/><Value Type='DateTime'><Today OffsetDays=\"-1\"/></Value></Gt></Where>";
//Sets the list types to search.106 is for calendar list.
query.Lists = "<Lists ServerTemplate='106' />";
//Sets the Fields to include in results
query.ViewFields = "<FieldRef Name='Title' /><FieldRef Name='Location' />";
//Sets the scope of the query
query.Webs = @"<Webs Scope='SiteCollection' />";
//Define the maximum number of results for each page (like a SELECT TOP)
query.RowLimit = 10;
//Execute the query
DataTable table = SPContext.Current.Site.RootWeb.GetSiteData(query);

The Lists property is used to specify the lists that should be included in the query within the site collection. The Lists property can take several forms to specify the lists to include in the query.:

  • Setting Lists property to <Lists ServerTemplate=[value]/> limits the query to lists of a certain server template. For example, type 106 is a calendar. By default, this attribute is null and so the query is not limited to lists based on a particular template
  • Setting Lists property to <Lists BaseType=[value]/> limits the query to lists of a certain BaseType.By default, the query considers lists of BaseType 0 (generic lists)
  • Setting Lists property to <Lists Hidden=’true’/> includes hidden lists in the query. By default, the query considers all non-hidden lists.
  • Setting Lists property to <Lists MaxListLimit=[value]/> limits the query to considering no more than the specified number of lists. If the query exceed the limit, it fails with SPException. By default, the limit is 1000. When set to 0, there is no limit to the number of lists that are considered (You should avoid setting limit to 0)
  • You can also instruct to include specific lists only by using Lists property.For example, to include only 2 specific lists to search, use <Lists><List ID=”[list1GUID]” /><List ID=”[list2GUID]” /></Lists>.The ID attribute identifies each list.

The Webs property is used to set the scope of the query

  • Setting Webs property to  <Webs Scope=’SiteCollection’/>   includes all lists in the site collection.
  • Setting Webs property to  <Webs Scope=’Recursive’/>  includes only the lists in the current site or subsites beneath the current site.
A few important points to note about SPSiteDataQuery:
  • Like SPQuery, SPSiteDataQuery also throws exception when results exceeds the no of  items allowed by the MaxItemsPerThrottledOperation or the MaxItemsPerThrottledOperationOverride property of SPWebApplication. So, you should setRowLimit property for optimum performance and to avoid the throttle exceptions.
  • SPSiteDataQuery  does not consider indexed columns and so using index columns in query have no positive effect on performance. This behavior differs from SPQuery which considers indexed column values and can achieve better performance.
PortalSiteMapProvider

PortalSiteMapProvider is the navigation site map provider for SharePoint. The main purpose of PortalSiteMapProvider class is to help to cache the content for navigation.

Additionally, it is useful for aggregating data as it provides cached queries and access to cached object stores.PortalSiteMapProvider also offers efficient management of caching infrastructure for retrieving list data

We can use PortalSiteMapProvider.GetCachedListItemsByQuery method to query the list and also cache the query results.  The PortalSiteMapProvider.GetCachedListItemsByQuery method first checks cache to see if those items already exist. If they exist, the method returns the cached results. If not, it queries the list, stores the results in cache and then returns them.

You can use below example method in a webpart or user control

protected void TestPortalSiteMapProvider(HtmlTextWriter writer)
{
 SPWeb web = SPContext.Current.Web;
 SPQuery spquery = new SPQuery();
 spquery.Query = "<Where><IsNotNull><FieldRef Name='Title'/></IsNotNull></Where>";
 PortalSiteMapProvider provider = PortalSiteMapProvider.WebSiteMapProvider;
 PortalWebSiteMapNode node = (PortalWebSiteMapNode)provider.FindSiteMapNode(web.ServerRelativeUrl);
 SiteMapNodeCollection nodeCollec = provider.GetCachedListItemsByQuery(node, "Tasks", spquery, web);
 foreach (SiteMapNode smnode in nodeCollec)
 writer.Write(smnode.Title+"<br>");
}

It should be noted that PortalSiteMapProvider requires HTTPContext (SPContext)to work. So you cannot use it in the scenarios where HTTPContext is null, for example: Console\Windows applications, Timer Jobs etc.

The main advantage of using PortalMapSiteProvider is that it exploits SharePoint object cache and hence provides efficient data access.

Apart from querying list items, PortalSiteMapProvider can also be used aggregate information  for sites, property bags etc. The below example demonstrates how to use the PortalSiteMapProvider to retrieve the site property bag values for a specific key in the site collection.  Since this information does not reside in a list, neither the SPQuery nor the SPSiteDataQuery can easily retrieve the information in this case.

protected void GetPropertyBagValues()
 {
 PortalSiteMapProvider provider = PortalSiteMapProvider.CombinedNavSiteMapProvider;
 NameValueCollection sitenameAndImageUrl = new NameValueCollection();
 if (provider.CurrentNode != null && provider.CurrentNode.HasChildNodes)
 {
 foreach (PortalWebSiteMapNode node in provider.GetChildNodes(provider.CurrentNode))
 {
 //Retrieves value from the site property bag by using a specified object key. For e.g. SiteImageUrl
 if (node.GetProperty("SiteImageUrl") != null)
 {
 sitenameAndImageUrl.Add(node.Title, (string)node.GetProperty("SiteImageUrl"));
 } } } }

Using PortalSiteMapProvider is one of the best performing data access technique . However, you should be aware of certain limitations:

  • It cannot be used in windows  or console application since HTTPContext is null in these applications
  • It can only be use with Publishing sites in  Sharepoint Server and not with SharePoint foundation.
  • PortalSiteMapProvider is  most useful if the data you are retrieving is not significantly changing over time. If you are trying to frequently retrieve different list items or data, the PortalSiteMapProvider will constantly read from the database, insert data into the cache and then return data. So, if this is the case, we don’t benefit from PortalSiteMapProvider as we are not reading from cache. Due to additional tasks it performs(checking and inserting into cache) , We also have performance loss if we use PortalSiteMapProvider for such situations.
  • PortalSiteMapProvider uses the site collection object cache to store data. By default, the object cache is limited to 100 megabytes (MB). Hence, the amount of memory PortalSiteMapProvider can use may be limited.

[Note:  You can increase the size of the site collection object cache from the Object cache settings page in the site collection. However, we should note that the amount of memory assigned to the object cache comes out of the shared memory available to the application pool.  Therefore, you should carefully increase the limit after ensuring you have that much memory to consume. Check this out for details ]

CrossListQueryCache and CrossListQueryInfo

CrossListQueryCache and CrossListQueryInfo provide very scalable way to run cross–site queries like SPSiteDataQuery. Unlike  SPSiteDataQuery, CrossListQueryCache .GetSiteData()  uses cache and hence have better performance – if you use correct overloaded version of the method.

The CrossListQueryInfo object uses the CrossListQueryInfo object to get the cached results or, if there are no cached results available, it performs a cross-list query to the database and then caches the results for future use. Audience targeting is then applied to the result set, depending on the setting specified in the CrossListQueryInfo object. You can use the CbqQueryCacheobject to obtain a CrossListQueryInfo object for a specific Content by Query Web Part.

Overloaded methods Description   Uses Cache
GetSiteData(SPSite) Retrieves the cached data that is based on the CrossListQueryInfo specification.  YES
GetSiteData(SPWeb) Retrieves the data from the SPWeb object.  NO
GetSiteData(SPSite, String)
Retrieves the cached data from the SPSite 
and from the specified web url.
 YES
GetSiteData(SPWeb, SPSiteDataQuery) Retrieves the data by using the specified SPSiteDataQuery object.  NO

If you don’t use CrossListQueryCache .GetSiteData() version that supports caching then better to go for SPSiteDataQuery instead. Below is an example that uses cached version :

protected DataTable TestCrossListQueryCache()
{
DataTable dt = null;
CrossListQueryInfo crossListQueryInfo = new CrossListQueryInfo();
crossListQueryInfo.Query = "<Where><IsNotNull><FieldRef Name='Title'/></IsNotNull></Where>";
crossListQueryInfo.ViewFields = "<FieldRef Name=\"Title\" />";
crossListQueryInfo.Lists = "<Lists BaseType=\"0\" />";
crossListQueryInfo.Webs = "<Webs Scope=\"SiteCollection\" />";
crossListQueryInfo.UseCache = true;
CrossListQueryCache crossListQueryCache = new CrossListQueryCache(crossListQueryInfo);
dt = crossListQueryCache.GetSiteData(SPContext.Current.Site);
return dt;
}

Like PortalSiteMapProvider,CrossListQueryCache and CrossListQueryInfo also need HTTPContext to work. Hence they cannot be used in windows or console applications. For their usage, you should consider same points mentioned above for PortalSiteMapProvider.

MORE…..
KeywordQuery Class
In order to develop custom search web parts or applications that support ‘search-by-keyword’ scenario, SharePoint Query object model exposesKeywordQuery Class.
Conclusion

PortalSiteMapProvider gives you the power of SPQuery along with cache for better performance. CrossListQueryCache gives you the power of SPSiteDataQuery along with cache for better performance. Now, to use cache or not depends on the kind of data you want to  query. If your query returns frequently changing  Data-Sets then caching is actually a overhead and can in turn cause performance hit.

ContentInterator is good to use only when you want to process >5000 items which is the default query threshold limit.

Advertisements
Comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s