Skip to content


SharePoint 2007 vs. 2010: How do my list content types, policies, customXsn look like?

Sometimes it makes sense to look at the SharePoint content database directly (note: never make any changes to the db without going through documented APIs).

Say you have a site collection policy (for example for auditing) which is attached to one or more content types. When you make subsequent changes to the site collection policy, these changes are not propagated to the lists and the list content types immediately. Instead, they are updated using a (drum roll, please) timer job, which, by default, runs once per day (Service=”PolicyConfigService”, JobDefinition=”Information management policy”). The content type information which is attached to the lists is updated only when the timer job runs. So, the site collection policy definition you will see through the UI may diverge from the policy which is active for a given list at a given time.

Information about lists is stored in the dbo.AllLists table (note: not all information is in the db. SharePoint also reads the schema.xml of the list template at run-time from the file system. The database however contains the content type information). The AllLists table has a tp_ContentTypes column, containing a XML BLOB (or is it a CLOB?) describing the content types which are effective at a given time for the list.

The data in this column may look like this:

 image

Most of the information should look familiar. You’ll notice, that under the XmlDocuments node, there are several nodes containing Base64 encoded information data. This is an extensibility mechanism. Information about custom document information panels (xsn), event receivers, policies, forms … is stored here. See here for the possible types.

This data is surfaced through the object model in the SPContentType.XmlDocuments property, using unique NamespaceURIs as identifiers (The handling is kind of ugly in my eyes …).

The most obvious NamespaceURIs are:

office.server.policy policies
microsoft.office.server.policy.changes PolicyDirtyBag (thanks for the consistent naming, Microsoft)
http://schemas.microsoft.com/office/2006/metadata/customXsn document information panels. afaik, no api other than “build xml and assign to SPContentType.XmlDocuments” available.
http://schemas.microsoft.com/sharepoint/events event receivers
http://schemas.microsoft.com/sharepoint/v3/contenttype/forms custom forms

If you want to subsequently decode the Base64 blobs, which contain XML themselves, use something like the following query. This will dump the content types and decode all the information under XmlDocuments.

— run this query against the SharePoint 2007 content database. You should change the where clause below

;with AllListsContentTypes ([ListId], [ContentTypeId], [ContentTypeName], [XmlDocumentNamespaceURI], [XmlDocument])
as
(
    select xtab.tp_Id,
            ContentTypes.x.value(N’@ID’, N’nvarchar(max)’),
            ContentTypes.x.value(N’@Name’, N’nvarchar(max)’),
            XDocs.x.value(N’@NamespaceURI’, N’nvarchar(max)’),
            cast(XDocs.x.value(N’.’, N’varbinary(max)’) as xml)
    from
    (    select tp_Id, CAST(tp_ContentTypes as xml) CTypes
         from dbo.AllLists
         where tp_Id = N’<List Id here>’
        –Note: Add where-clause here
    ) xtab
    cross apply xtab.CTypes.nodes(N’/ContentType[@Name]’) ContentTypes(x)
    outer apply ContentTypes.x.nodes(N’./XmlDocuments/XmlDocument’) XDocs(x)
)
select ContentTypeName, XmlDocumentNamespaceURI, XmlDocument from AllListsContentTypes
order by ListId, ContentTypeId, ContentTypeName, XmlDocumentNamespaceURI

That query will shred the xml and display it as a nicely formatted rowset …

image

SharePoint 2010

Now, in 2010 (all info Beta 2) the database schema looks fairly similar to 2007. There are some extensions, but the basic schema remains roughly unchanged. However, one bigger change is that most of the columns which contained plain XML (content type definitions, views …) or BLOBs are now compressed in the database. This is a performance improvement, as such information can grow big. If you look at the 2010 content database, you will notice columns of type tCompressedBinary or tCompressedString. These two types are simply aliases for the varbinary(max) type. For SQL Server, this is just a varbinary(max) column with nothing else attached to it.

image

Most of the columns typed as tCompressedString still contain XML (like the tp_ContentTypes column in dbo.AllLists), but deflated using the algorithm described in RFC1950. The (de)compression is done solely in the SharePoint process (application pool, timer service et al). SQL Server knows nothing about these mechanisms.

Unfortunately, it is generally difficult to decompress such data using pure T-SQL. So instead of using a simple script like above for 2007, for 2010 another friend comes in handy: PowerShell.

I’ll post a PowerShell script later. The basic mechanism when reading tCompressedString (thanks) is skipping the first 14 bytes and then using System.IO.Compression.DeflateStream on it. That gives the XML, including the Base64 encoded parts, which you can decode using Convert.FromBase64String.

Posted in SharePoint, SharePoint 2007, SharePoint 2010.

Tagged with , , , , , .


3 Responses

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.

  1. Luiz says

    Hi… great post !!
    Could you send me the script to make the decompression of tp_field in SharePoint 2010 database?

  2. Steven says

    Nice article!

    I’m posting my C# code about the decompression, i didn’t found the powershell script…

    SqlBytes bytes = sqlDataReader.GetSqlBytes(sqlDataReader.GetOrdinal(“xmlFields”));
    Stream stream = bytes.Stream;
    for (int i = 0; i < 14; i++)
    stream.ReadByte();

    System.IO.Compression.DeflateStream deflateStream = new System.IO.Compression.DeflateStream(stream, System.IO.Compression.CompressionMode.Decompress, true);
    StreamReader reader = new StreamReader(deflateStream);
    string fields = reader.ReadToEnd();

Continuing the Discussion



Some HTML is OK

or, reply to this post via trackback.