Alexander's Blog

Sharing knowledge with the global IT community since November 1, 2004

Search Server is Not Necessary to Crawl PDF files in SharePoint Foundation 2010


A lot of blogs and articles on the Internet claim that to crawl PDF documents in SharePoint Foundation 2010 you must install Microsoft Search Server. I want to dispel this myth: according to Microsoft, Search Server is not required to crawl PDF files in SharePoint Foundation 2010.

The main problem people run into is that, unlike WSS 3.0, SharePoint Foundation 2010 has no interface for adding file extensions for additional file types and iFilters. So how can you crawl additional file types, such as PDFs, in SharePoint Foundation 2010? One easy solution is to use the following VBScript, which is available in KB article 2518465. Here’s the step-by-step procedure.

  1. Copy the following content to Notepad and save the file with a .vbs extension, for example AddExtension.vbs.

    ' Print usage information.
    Sub Usage
        WScript.Echo "Usage:    AddExtension.vbs extension"
        WScript.Echo
    End Sub

    Sub Main
        ' The extension to add (for example pdf) is expected as the first argument.
        If WScript.Arguments.Count < 1 Then
            Usage
            WScript.Quit(1)
        End If

        Dim extension
        extension = WScript.Arguments(0)

        ' Connect to the SharePoint Foundation search gatherer.
        Set gadmin = WScript.CreateObject("SPSearch4.GatherMgr.1", "")

        ' Add the extension to every gather project of every gather application.
        For Each application In gadmin.GatherApplications
            For Each project In application.GatherProjects
                project.Gather.Extensions.Add(extension)
            Next
        Next
    End Sub

    Call Main

  2. Copy the script to the SharePoint Foundation server and run it at the command prompt. This adds the PDF extension.
    > WScript AddExtension.vbs pdf
  3. Register the PDF iFilter by going to the following registry key:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14.0\Search\Setup\ContentIndexCommon\Filters\Extension
  4. Right-click the Extension key and select New, Key.
  5. Enter .pdf for the key name.
  6. In the right-hand pane, double-click the (Default) value and enter the following for the Value data:
    {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  7. Restart SPSearch4 by typing the following at the command prompt:
    net stop spsearch4
    net start spsearch4
  8. Start a full crawl by typing the following at the command prompt:
    > stsadm -o spsearch -action fullcrawlstart
    The stsadm.exe utility is located in the “14 Hive” folder at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN.
  9. You should now be able to crawl PDF files in SharePoint Foundation 2010.

Note that this method adds the PDF extension. You can use the same technique to add additional filters as necessary.
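
If you need to repeat the registry part of the procedure (steps 3 to 6) for several extensions, the edit can also be written down as a .reg file. The following Python sketch is only an illustration and is not part of KB 2518465: the function name `build_reg_file` and the constant names are my own, and it assumes that a standard .reg file is an acceptable way to set the same key and (Default) value that the steps above set by hand. It only builds the file text and does not touch the registry.

```python
# Sketch only: builds the text of a .reg file; it does not modify the registry.

# Registry key from step 3 of the procedure.
FILTER_EXTENSION_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools"
    r"\Web Server Extensions\14.0\Search\Setup"
    r"\ContentIndexCommon\Filters\Extension"
)

# Value data from step 6 of the procedure.
PDF_FILTER_CLSID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"


def build_reg_file(extension, clsid):
    """Return .reg text that sets the (Default) value of the extension subkey.

    The name and signature are hypothetical; `extension` may be given
    with or without the leading dot ("pdf" or ".pdf").
    """
    ext = "." + extension.strip().lstrip(".").lower()
    if ext == ".":
        raise ValueError("extension must not be empty")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        # Key header: backslashes are written as-is in a .reg key path.
        "[{}\\{}]".format(FILTER_EXTENSION_KEY, ext),
        # "@" stands for the (Default) value of the key.
        '@="{}"'.format(clsid),
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file("pdf", PDF_FILTER_CLSID))
```

Review the generated text before importing it, and note that the script in step 1 takes the extension without a dot (pdf) while the registry key name includes it (.pdf). For any other file type you would have to supply the identifier of that type's own iFilter; the value above applies to the PDF iFilter only.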


3 Comments

  1. I realize this is an old post, but hopefully you are tracking comments. I have seen this script before, but it does not work on my SP2010 Foundation server. When I run it I get a message that it cannot create the object named “SPSearch4.GatherMger.1”. I am running WScript from a SP 2010 Management shell window running as admin. Any suggestions as to why it does not work for me?

  2. You may get this error if you run the script on SharePoint Server 2010. Are you sure you are running SharePoint Foundation 2010?
