Thursday, January 1, 2009

Accurate Metadata Values May Be Difficult to Maintain Over Time in SharePoint Server 2007

In working with content types and metadata, I've run into a situation that I hope will be resolved in either a service pack or the next version of SharePoint Server. Follow the steps below and you'll see what I mean. (I'm assuming a site based on the Collaboration Portal template for this post.) My contention is that there is an inherent flaw in how metadata is saved and stored between Office 2007 and SharePoint Server 2007. Specifically, a value saved to a document's Custom properties for a particular metadata field is not always overwritten by the value we later enter in the SharePoint Edit Properties interface. This can lead to confusion about which value is the most accurate or the most recent description of the document, and it can undermine the integrity of your document taxonomy efforts.
In this post, I'll go over the steps that lead to this problem and then offer some best practices for working with content types, metadata and documents in Office SharePoint Server 2007.

Create a New Content Type

To create a new content type, go into the Content Type gallery and create a new content type based on the Document content type. To do this, follow these steps:
  1. From Site Actions, click on Site Settings, then Modify All Site Settings.
  2. Click on the Site Content Types link.
  3. From the Site Content Type Gallery page, click on the Create link.
  4. Configure the page as shown below, and then click OK.

You have now created a new content type that is based on the default Document content type. Why did we do this? Because it is a best practice not to modify the out-of-the-box (OOB) content types. Service pack and hotfix upgrades can overwrite your modifications to default objects in SharePoint, so we always recommend that you leave OOB files and objects unmodified.
Now, let's add this new content type (named Test Document Template in this example) to a document library and then give it some unique metadata. First, to add the content type to a document library, open the document library, click on Settings, then Document Library Settings, and then click on the Advanced settings link. Select "Yes" for the "Allow management of content types?" setting, as shown here:

Click OK (not illustrated) and then navigate back to the Customize Documents page by clicking on the Settings link. In the Content Types section, click the Add from Existing Site Content Types link and then select the new document content type to add it to the document library.

Now I'll modify the content type to include new metadata. I'll add a simple column called "document type". Back in the content type gallery, I'll click on the Test Document Template content type and then click the Add from New Site Column link under the Columns section. I'll configure the column as a single line of text, accept the other defaults, and then click OK.

Create a New Document Based on the New Content Type

Now, I'll go back to the document library that this content type is associated with, click the New drop-down list, and select the Test Document Template content type to create a new document based on it. In Word 2007, you can see that the "document type" metadata field appears in the Document Information Panel.

Once saved, you'll find that the document's metadata appears in the SharePoint interface, as illustrated here:

What's interesting is that this metadata field has been added to the document's Custom properties, but the field is not populated. Even if I change the value to Type100, it is not saved to the Custom tab of the document's Advanced Properties. I can save the document repeatedly, publish it, check it out, check it in, publish it again and save it again, and the Type100 value is still not written to the Custom tab.
So, now, I'll go back to the content type and add another metadata field (column) called Doc Num. I'll populate that field with a value of 1234. (I'll also remove the pending states from the documents in this library to eliminate that variable.) Now, what happens when I remove the Document Type column from the content type? The field's value is automatically written to the Custom tab of the document's Advanced Properties, and the field no longer appears in the SharePoint interface.
Now, if I go back and remove the Doc Num column from the Test Document Template content type and re-add the Document Type column, the Type100 value is not pulled from the document's Advanced Properties to pre-populate the Document Type metadata field. So, you'll be able to enter a new value for this field, such as TypeABC, and save it in the SharePoint interface. Note also that the Doc Num value is written to the Custom tab of the Advanced Properties, since that field is no longer part of the content type.

When we open the Advanced Properties and look at the Custom metadata on the document, we see that the Doc Num field has been written to the Custom tab but the Type100 value has not been overwritten by the new value of TypeABC. This is true even if we publish, save and re-save the current document in SharePoint with the TypeABC value assigned to the Document Type metadata field.
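
One way to check for this kind of drift without opening each file in Word is to read the Custom properties straight out of the .docx package. The following is a minimal Python sketch, not a tested tool: it assumes the Custom tab values live in the docProps/custom.xml part of the file, and the function names read_custom_properties and find_mismatches are made up for illustration. The current SharePoint values are passed in by hand as a dictionary, since how you export them is outside the scope of this sketch.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the custom-properties part in an Office Open XML package.
CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"
# Assumption: this is the part that backs the Custom tab of Advanced Properties.
CUSTOM_PART = "docProps/custom.xml"


def read_custom_properties(docx_file):
    """Return {name: value} for the custom properties stored in a .docx file.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        if CUSTOM_PART not in package.namelist():
            return {}  # the document carries no custom properties at all
        root = ET.fromstring(package.read(CUSTOM_PART))
    props = {}
    for prop in root.findall(CUSTOM_NS + "property"):
        # Each <property> wraps one typed value element (vt:lpwstr, vt:i4, ...).
        props[prop.get("name")] = prop[0].text if len(prop) else None
    return props


def find_mismatches(custom_props, sharepoint_values):
    """List (field, value_in_document, value_in_sharepoint) where the two differ."""
    return [
        (name, custom_props[name], sp_value)
        for name, sp_value in sharepoint_values.items()
        if name in custom_props and custom_props[name] != sp_value
    ]
```

For the example in this post, running find_mismatches(read_custom_properties("Test Document.docx"), {"Document Type": "TypeABC"}) against a downloaded copy (the file name is hypothetical) would be expected to report Type100 against TypeABC if the stale value is still in the file.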

The real problem with this scenario is that the SharePoint crawler indexes both metadata values, so a search on either the old value or the new value of the same metadata field returns the same document. Consider these two screen shots:


There are several problems here:
  1. Since Site Collection Administrators can create and modify content types, there exists a strong need to educate them about when and how to create and modify content types. You will not be successful if you implement content types without forethought, planning and education. Their education should be based on your organization's taxonomy plan and structure.
  2. If you remove columns from a content type after the documents' metadata has been populated, you'll have to visit each document manually to remove the old metadata.
  3. Users can become confused when old metadata that is now considered wrong persists on a document and they can still find the document by searching on it.
  4. Old metadata assignments that persist might violate some legal and industry compliance rules.
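
For the second problem, the manual visit can in principle be scripted against downloaded copies of the files. The sketch below is a hedged illustration in Python, assuming the stale value sits in the docProps/custom.xml part of a .docx package; strip_custom_property is a hypothetical helper, not a SharePoint or Office API. It writes a cleaned copy and leaves the original untouched. How SharePoint treats the cleaned copy once it is uploaded again is not covered here, so try it on test documents first.

```python
import zipfile
import xml.etree.ElementTree as ET

CUSTOM_URI = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
VT_URI = "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"
# Assumption: this part backs the Custom tab of the document's Advanced Properties.
CUSTOM_PART = "docProps/custom.xml"

# Keep the usual prefixes when the part is serialized again.
ET.register_namespace("", CUSTOM_URI)
ET.register_namespace("vt", VT_URI)


def strip_custom_property(source, target, stale_name):
    """Copy the package at `source` to `target` without the named custom property.

    Both arguments may be paths or binary file-like objects.
    Returns True if the property was found and dropped, False otherwise.
    """
    removed = False
    with zipfile.ZipFile(source) as src, zipfile.ZipFile(target, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CUSTOM_PART:
                root = ET.fromstring(data)
                for prop in root.findall("{%s}property" % CUSTOM_URI):
                    if prop.get("name") == stale_name:
                        root.remove(prop)
                        removed = True
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Every other part of the package is copied through unchanged.
            dst.writestr(item, data)
    return removed
```

In the scenario from this post, stale_name would be "Document Type", the field whose old Type100 value lingers in the file.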

