A Lucene Analyzer class optimized for tweets

Author:   Published: 2016-03-07 20:38:21
    // Targets the Lucene 3.0.x analysis API (TermAttribute, StopFilter(boolean, ...))
    // plus Apache Commons Codec for DoubleMetaphone.
    package unitTwelve;

    import java.io.IOException;
    import java.io.Reader;
    import java.io.StringReader;

    import org.apache.commons.codec.language.DoubleMetaphone;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.PorterStemFilter;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.lucene.util.Version;

    /**
     * @author YangXin
     * @info Optimizes analysis of Twitter text with the DoubleMetaphone function.
     *       DoubleMetaphone produces the same key for words that sound alike.
     */
    public class TwitterAnalyzer extends Analyzer {
        private final DoubleMetaphone filter = new DoubleMetaphone();

        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // Standard tokenization -> stop-word removal -> Porter stemming
            final TokenStream result = new PorterStemFilter(new StopFilter(true,
                    new StandardTokenizer(Version.LUCENE_CURRENT, reader),
                    StandardAnalyzer.STOP_WORDS_SET));
            TermAttribute termAtt = result.addAttribute(TermAttribute.class);
            StringBuilder buf = new StringBuilder();
            try {
                while (result.incrementToken()) {
                    String word = termAtt.term();
                    // Replace each stemmed term with its Double Metaphone key (encoded once)
                    buf.append(filter.encode(word)).append(" ");
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            // Re-tokenize the space-separated phonetic keys into the final stream
            return new WhitespaceTokenizer(new StringReader(buf.toString()));
        }
    }
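The analyzer above needs Lucene and Commons Codec on the classpath. To show the underlying idea with only the JDK, here is a minimal sketch that swaps Double Metaphone for the much simpler American Soundex algorithm (a different phonetic algorithm, used here only because it fits in a few lines), skips stemming, and uses a tiny hypothetical stop-word list. The class `PhoneticKeyDemo` and the methods `soundex` and `phoneticKeys` are illustrative names, not part of Lucene or Commons Codec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PhoneticKeyDemo {
    // Hypothetical stop-word list for illustration only (not Lucene's STOP_WORDS_SET).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // American Soundex digit for a letter; '0' marks vowels, Y, H and W (not coded).
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0';
        }
    }

    // Soundex key: first letter plus up to three digits, zero-padded to length 4.
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with the same code
            }
            char d = code(c);
            if (d != '0' && d != prev) {
                out.append(d);
            }
            prev = d; // a vowel resets prev, so a repeated code after it is kept
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    // Same shape as TwitterAnalyzer: tokenize, drop stop words, encode each
    // term, and join the keys with spaces (no stemming in this sketch).
    static String phoneticKeys(String text) {
        List<String> keys = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            keys.add(soundex(token));
        }
        return String.join(" ", keys);
    }

    public static void main(String[] args) {
        System.out.println(phoneticKeys("Robert Smith is here")); // prints R163 S530 H600
        System.out.println(phoneticKeys("Rupert Smyth is hear")); // prints R163 S530 H600
    }
}
```

Both inputs collapse to the same key sequence, which is the effect TwitterAnalyzer is after: tweets with sound-alike misspellings end up with identical index terms. Double Metaphone covers far more spelling rules than Soundex (and can also produce an alternate key), so its actual keys would differ from these.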
