kenton/gist:5349785

## gistfile1.xml
    <!-- current implementation -->
    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <!--<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>-->
      </analyzer>
    </fieldType>

    <!-- proposed updated implementation -->
    <fieldType name="text_ws" class="solr.TextField">
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- notes / thoughts on our current solr schema
      StandardTokenizerFactory - I'd keep the StandardTokenizer rather than switch to the
      WhitespaceTokenizer.  The StandardTokenizer splits text on whitespace *and* on the
      word boundary rules specified by Unicode (UAX #29).  The WhitespaceTokenizer splits
      on whitespace only, so punctuation stays attached to tokens (e.g. "schema," keeps
      its trailing comma).

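      StandardFilterFactory - not sure why we have this one in our schema.  According to the docs, its
      old behavior only applies pre-solr3.1; from 3.1 on that behavior lives in ClassicFilterFactory.
      We're running 3.6 on our staging machines, presumably the same version in production.
      Either way, it's unlikely that prod is running < v.3.1, so this could be updated to ClassicFilterFactory.
      I'd keep ClassicFilterFactory in the chain as well.  It removes possessive 's from the end of tokens
      and periods from acronyms.  One caveat to verify: ClassicFilter keys off token types emitted by
      ClassicTokenizer, so it likely needs ClassicTokenizerFactory in front of it to have any effect.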

      LowerCaseFilterFactory - having this makes sense in either implementation.

      PorterStemFilterFactory - we need some sort of stemmer in the mix.  There are a few to choose from
      (e.g. SnowballPorterFilterFactory, KStemFilterFactory).  I can't find much that gives a good
      technical rationale for choosing one over another.

      ASCIIFoldingFilterFactory - I'd keep the ASCIIFoldingFilter rather than use the MappingCharFilter
      w/mapping-ISOLatin1Accent.txt.  That file is just a mapping of accented ISO Latin-1 characters to
      ASCII.  It is probably sufficient, but it may not be: characters outside Latin-1 that we need
      mapped to ASCII could slip through the cracks.  The Solr 3 book from Packt Publishing
      mentions that MappingCharFilter can also be used w/mapping-FoldToASCII.txt, but recommends using
      ASCIIFoldingFilterFactory instead as it should be faster.

    -->
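
    <!-- sketch only: one way the notes above could be combined into a single field type.
      The name "text_revised" is hypothetical.  Putting ASCII folding ahead of the stemmer
      is an assumption (the stemmer then sees unaccented input); compare the output against
      the current "text" type on the Solr analysis page before adopting it.
      StandardFilterFactory is left out pending the ClassicFilterFactory question above. -->
    <fieldType name="text_revised" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>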