solr, sunspot, bad-request, illegal-characters

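I am introducing Sunspot search into my project. I got a proof of concept working by searching on the name field only. When I added the description field and reindexed Solr, I got the following error.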

    ** Invoke sunspot:reindex (first_time)
    ** Invoke environment (first_time)
    ** Execute environment
    ** Execute sunspot:reindex
    Skipping progress bar: for progress reporting, add gem 'progress_bar' to your Gemfile
    rake aborted!
    RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request
    Error: {'responseHeader'=>{'status'=>400,'QTime'=>18},'error'=>{'msg'=>'Illegal character ((CTRL-CHAR, code 11)) at [row,col {unknown-source}]: [42,1]','code'=>400}}

    Request Data: "ItemsDesign 1322ItemsDesignActiveRecord::BaseItemsDesignRiver City Clocks Musical Multi-Colored Quartz Cuckoo ClockThis colorful chalet style German quartz cuckoo clock accurately keeps time and plays 12 different melodies. Many colorful flowers are painted on the clock case and figures of a Saint Bernard and Alpine horn player are on each side of the clock dial. Two decorative pine cone weights are suspended beneath the clock case by two chains. The heart shaped pendulum continously swings back and forth.
    On every

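I assume the bad character is the one you can see at the bottom. It is scattered throughout many of the descriptions. I'm not even sure what character it is.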

What can I do to make Solr ignore it, or to clean the data so that Solr can handle it?

Thanks

Put the following in an initializer to automatically strip control characters from every value Sunspot indexes:

    # config/initializers/sunspot.rb
    module Sunspot
      #
      # DataExtractors present an internal API for the indexer to use to extract
      # field values from models for indexing. They must implement the #value_for
      # method, which takes an object and returns the value extracted from it.
      #
      module DataExtractor #:nodoc: all
        #
        # AttributeExtractors extract data by simply calling a method on the block.
        #
        class AttributeExtractor
          def initialize(attribute_name)
            @attribute_name = attribute_name
          end

          def value_for(object)
            Filter.new( object.send(@attribute_name) ).value
          end
        end

        #
        # BlockExtractors extract data by evaluating a block in the context of the
        # object instance, or if the block takes an argument, by passing the object
        # as the argument to the block. Either way, the return value of the block is
        # the value returned by the extractor.
        #
        class BlockExtractor
          def initialize(&block)
            @block = block
          end

          def value_for(object)
            Filter.new( Util.instance_eval_or_call(object, &@block) ).value
          end
        end

        #
        # Constant data extractors simply return the same value for every object.
        #
        class Constant
          def initialize(value)
            @value = value
          end

          def value_for(object)
            Filter.new(@value).value
          end
        end

        #
        # A Filter to allow easy value cleaning
        #
        class Filter
          def initialize(value)
            @value = value
          end

          def value
            strip_control_characters @value
          end

          def strip_control_characters(value)
            return value unless value.is_a? String

            value.chars.inject("") do |str, char|
              unless char.ascii_only? and (char.ord < 32 or char.ord == 127)
                str << char
              end
              str
            end
          end
        end
      end
    end

Source (Sunspot GitHub issue): Sunspot Solr Reindexing failing due to illegal characters

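I tried the solution proposed by @thekingoftruth, but it did not solve the problem. I found an alternative version of the Filter class in the same GitHub thread he linked to, and that one solved it for me.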

The main difference is that I use nested models through a HABTM relationship.

Here is the search block in my model:

    searchable do
      text :name, :description, :excerpt
      text :venue_name do
        venue.name if venue.present?
      end
      text :artist_name do
        artists.map { |a| a.name if a.present? } if artists.present?
      end
    end

Here is the initializer that works for me (in config/initializers/sunspot.rb):

    module Sunspot
      #
      # DataExtractors present an internal API for the indexer to use to extract
      # field values from models for indexing. They must implement the #value_for
      # method, which takes an object and returns the value extracted from it.
      #
      module DataExtractor #:nodoc: all
        #
        # AttributeExtractors extract data by simply calling a method on the block.
        #
        class AttributeExtractor
          def initialize(attribute_name)
            @attribute_name = attribute_name
          end

          def value_for(object)
            Filter.new( object.send(@attribute_name) ).value
          end
        end

        #
        # BlockExtractors extract data by evaluating a block in the context of the
        # object instance, or if the block takes an argument, by passing the object
        # as the argument to the block. Either way, the return value of the block is
        # the value returned by the extractor.
        #
        class BlockExtractor
          def initialize(&block)
            @block = block
          end

          def value_for(object)
            Filter.new( Util.instance_eval_or_call(object, &@block) ).value
          end
        end

        #
        # Constant data extractors simply return the same value for every object.
        #
        class Constant
          def initialize(value)
            @value = value
          end

          def value_for(object)
            Filter.new(@value).value
          end
        end

        #
        # A Filter to allow easy value cleaning
        #
        class Filter
          def initialize(value)
            @value = value
          end

          def value
            if @value.is_a? String
              strip_control_characters_from_string @value
            elsif @value.is_a? Array
              @value.map { |v| strip_control_characters_from_string v }
            elsif @value.is_a? Hash
              @value.inject({}) do |hash, (k, v)|
                hash.merge( strip_control_characters_from_string(k) => strip_control_characters_from_string(v) )
              end
            else
              @value
            end
          end

          def strip_control_characters_from_string(value)
            return value unless value.is_a? String

            value.chars.inject("") do |str, char|
              unless char.ascii_only? && (char.ord < 32 || char.ord == 127)
                str << char
              end
              str
            end
          end
        end
      end
    end
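A likely reason the first initializer had no effect in this setup: the artist_name block returns an Array, and the first Filter returns any non-String value unchanged, so the strings inside the array were never cleaned. The stand-alone sketch below shows the same idea outside Sunspot. The helper name deep_strip_control_chars and the sample data are made up for illustration and are not part of Sunspot.

```ruby
# Hypothetical helper, not part of Sunspot: removes ASCII control characters
# (codes 0-31 and 127) from strings, including strings nested inside arrays
# and hashes. Any other value is returned unchanged.
def deep_strip_control_chars(value)
  case value
  when String
    value.gsub(/[\x00-\x1F\x7F]/, "")
  when Array
    value.map { |v| deep_strip_control_chars(v) }
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[deep_strip_control_chars(k)] = deep_strip_control_chars(v)
    end
  else
    value
  end
end

# Made-up sample resembling what a block such as artist_name might return:
# an array of names, one entry nil, two containing a vertical tab (code 11).
artist_names = ["River City\v Clocks", nil, "Alpine\v Horn"]
p deep_strip_control_chars(artist_names)
```

Unlike the Filter above, this sketch recurses, so arrays nested inside arrays are cleaned as well. Whether that is needed depends on what your blocks return.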

You need to strip the control characters from the UTF-8 text when you save the content. Otherwise Solr cannot index it and throws this error.
http://en.wikipedia.org/wiki/UTF-8#Codepage_layout

You can use something like this:

 name.gsub!(/\p{Cc}/, "") 
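One thing to be aware of: \p{Cc} matches every Unicode control character, which includes tab, newline and carriage return. The character in the error message, code 11, is a vertical tab. Here is a minimal sketch of two variants, assuming the failure comes from an XML update request (XML 1.0 permits tab, line feed and carriage return but not the other ASCII control characters). Both helper names are hypothetical.

```ruby
# Hypothetical helpers for illustration; not part of Sunspot or Rails.

# Removes every Unicode control character (category Cc),
# including tab, newline and carriage return.
def strip_control_chars(text)
  return text unless text.is_a?(String)
  text.gsub(/\p{Cc}/, "")
end

# Keeps tab (9), line feed (10) and carriage return (13), and removes the
# remaining ASCII control characters plus DEL (127).
def strip_unsafe_control_chars(text)
  return text unless text.is_a?(String)
  text.gsub(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/, "")
end

# Made-up sample with a vertical tab in the middle and a trailing newline.
dirty = "back and forth.\vOn every hour\n"
puts strip_control_chars(dirty).inspect
puts strip_unsafe_control_chars(dirty).inspect
```

Either helper could be called from a before_save callback on the model so that the stored text is already clean by the time it is indexed. The second variant is the gentler choice if line breaks in descriptions matter to you.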

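Edit: If you want to override this globally, I think it can be done by overriding the value_for methods in AttributeExtractor and, if needed, BlockExtractor. See https://github.com/sunspot/sunspot/blob/master/sunspot/lib/sunspot/data_extractor.rb (I haven't checked this). If you manage to add a global patch, please let me know. I ran into the same problem recently.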