
Indexing and searching HBase data with Solr


SolrCloud cluster overview
The SolrCloud cluster is already installed.
Solr version: 5.5.0; ZooKeeper version: 3.4.6
Solr user/password: solr/solr123
ZooKeeper install path (used by Solr): /opt/zookeeper-3.4.6
Solr install path: /opt/solr-5.5.0
Solr port: 8983
ZooKeeper port: 9983
Five machines, each running both Solr and ZooKeeper.

Start ZooKeeper: /opt/zookeeper-3.4.6/bin/zkServer.sh start
Stop ZooKeeper: /opt/zookeeper-3.4.6/bin/zkServer.sh stop
ZooKeeper status: /opt/zookeeper-3.4.6/bin/zkServer.sh status

Start Solr: /opt/solr-5.5.0/bin/solr start
Stop Solr: /opt/solr-5.5.0/bin/solr stop
Solr status: /opt/solr-5.5.0/bin/solr status

Solr admin UI: http://10.1.202.67:8983/solr/ 
                http://10.1.202.68:8983/solr/ 
                http://10.1.202.69:8983/solr/ 
                http://10.1.202.70:8983/solr/ 
                http://10.1.202.71:8983/solr/ 

Copy the IK analyzer into Solr
Note: do not use the old IKAnalyzer2012FF_u1.jar; it does not support Solr 5.0 and above. Use IKAnalyzer2012FF_u2.jar instead, or download the source from GitHub and build it yourself:
https://github.com/EugenePig/ik-analyzer-solr5/blob/master/README.md
I used a self-compiled IK jar, Ik-analyzer-solr5-5.x.jar, and copied it to every node:
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.67:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.68:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.69:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.70:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib
scp ./Ik-analyzer-solr5-5.x.jar solr@10.1.202.71:/opt/solr-5.5.0/server/solr-webapp/webapp/WEB-INF/lib

Modifying managed-schema
Solr 5.5.0 no longer ships a schema.xml; it uses managed-schema instead, located under solr-5.5.0/server/solr/configsets/. In that directory, copy the sample_techproducts_configs folder (note the -r flag, since it is a directory):
cp -r sample_techproducts_configs poc_configs
Edit poc_configs/conf/managed-schema to add the IK field type and the IK-analyzed fields:
vim managed-schema

<fields>
    <field name="title_ik" type="text_general" indexed="true" stored="true"/>
    <field name="content_ik" type="text_ik" indexed="true" stored="false"/>
    <field name="content_outline" type="text_general" indexed="false" stored="true"/>
</fields>
<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

Note: content_ik holds long text, so it is indexed but not stored; storing the full text would inflate the index and could hurt search performance, and the original is not needed for search anyway. Instead, the first 50 characters of the content are stored in content_outline (see the mapper code below), so the gist of each document can still be viewed.

Restart Solr on all nodes
./solr_stop_all.sh

#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin


slaves='dn-1 dn-2 dn-3 dn-4 dn-5'
cmd='/opt/solr-5.5.0/bin/solr stop'
for slave in $slaves
do
    echo $slave
    ssh $slave $cmd
done

./solr_start_all.sh

#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin


slaves='dn-1 dn-2 dn-3 dn-4 dn-5'
cmd='/opt/solr-5.5.0/bin/solr start'
for slave in $slaves
do
    echo $slave
    ssh $slave $cmd
done

Analyzer test
Analyzing text with the text_ik field type:
[Screenshot: text_ik tokenization result on the Solr admin Analysis page]
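
If the jar was picked up correctly, the same check can be done locally. A minimal sketch (not from the original article), assuming the compiled IK jar (org.wltea.analyzer.lucene.IKAnalyzer) and lucene-core are on the classpath; the useSmart flag mirrors the analyzer definitions above:

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IkAnalyzeTest {
    public static void main(String[] args) throws Exception {
        // true = smart (coarse-grained) mode, as in the query-time analyzer above
        IKAnalyzer analyzer = new IKAnalyzer(true);
        TokenStream ts = analyzer.tokenStream("content_ik", "中华人民共和国");
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // one token per line
        }
        ts.end();
        ts.close();
        analyzer.close();
    }
}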

Creating the collection
After editing the config, create the collection:
/opt/solr-5.5.0/bin/solr create -c collection1 -d /opt/solr-5.5.0/server/solr/configsets/poc_configs/conf -shards 5 -replicationFactor 2
With 5 shards and a replication factor of 2, the resulting 10 cores are spread across the five nodes. Once created, the admin UI shows the following:
[Screenshot: collection1 shard/replica graph in the Solr admin UI]

The configuration files are not written under the Solr directory; they are uploaded to ZooKeeper. To inspect them, connect with zkCli:
/opt/zookeeper-3.4.6/bin/zkCli.sh -server 10.1.202.67:9983
[Screenshot: /configs/collection1 znodes listed in zkCli]

If collection creation fails, the collection can be removed with:
/opt/solr-5.5.0/bin/solr delete -c collection1 -deleteConfig true
Note: -deleteConfig true also deletes the configset from ZooKeeper, so the next create neither silently reuses it nor fails on it.
If that still does not help, the configset probably remains in ZooKeeper; log in with zkCli and remove it with rmr /configs/collection1.

Building the SolrCloud index from HBase with MapReduce
The job scans the HBase table and indexes the configured column families into SolrCloud. Driver class:

import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SolrHBaseMoreIndexer {

    public static Logger logger = LoggerFactory.getLogger(SolrHBaseMoreIndexer.class);

    private static void hadoopRun(String[] args) {
        String tbName = ConfigProperties.getHBASE_TABLE_NAME();
        try {
            Job job = Job.getInstance(ConfigProperties.getConf(), "SolrHBaseMoreIndexer");
            job.setJarByClass(SolrHBaseMoreIndexer.class);

            Scan scan = new Scan();
            // The start and stop rows are HBase rowkeys, not numeric IDs. Rowkeys sort
            // lexicographically, not numerically, so "109" comes after "1000":
            // 1 ... 1000 ... 109
            scan.setStartRow(Bytes.toBytes("1"));
            scan.setStopRow(Bytes.toBytes("109"));
            for (String tbFamily : ConfigProperties.getHBASE_TABLE_FAMILY().split(",")) {
                scan.addFamily(Bytes.toBytes(tbFamily));
                logger.info("tbName:" + tbName + ",tbFamily:" + tbFamily);
            }

            scan.setCaching(500);       // batch rows per RPC to improve scan throughput
            scan.setCacheBlocks(false); // don't pollute the block cache during a full scan
            // Create the map task
            TableMapReduceUtil.initTableMapperJob(tbName, scan,
                    SolrHBaseMoreIndexerMapper.class, null, null, job);
            // No job output is needed
            job.setOutputFormatClass(NullOutputFormat.class);
            // job.setNumReduceTasks(0);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            logger.error("hadoopRun failed", e);
        }
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException, URISyntaxException {
        SolrHBaseMoreIndexer.hadoopRun(args);
    }
}
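
Because of this byte-wise ordering, numeric rowkeys only scan in numeric order if they are zero-padded. A small standalone sketch (not part of the original job) demonstrating the comparison and the padding workaround:

import org.apache.hadoop.hbase.util.Bytes;

public class RowkeyOrderDemo {
    public static void main(String[] args) {
        // Lexicographic comparison: "109" > "1000", because '9' > '0' at the third byte.
        System.out.println(Bytes.compareTo(Bytes.toBytes("109"),
                Bytes.toBytes("1000")) > 0); // true

        // Zero-padding restores numeric ordering: "0000000109" < "0000001000".
        String padded109 = String.format("%010d", 109);
        String padded1000 = String.format("%010d", 1000);
        System.out.println(Bytes.compareTo(Bytes.toBytes(padded109),
                Bytes.toBytes(padded1000)) < 0); // true
    }
}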

Mapper class:

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SolrHBaseMoreIndexerMapper extends TableMapper<Text, Text> {

    public static Logger logger = LoggerFactory.getLogger(SolrHBaseMoreIndexerMapper.class);

    CloudSolrClient cloudSolrServer;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        cloudSolrServer = SolrServerFactory.getCloudSolrClient();
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        try {
            cloudSolrServer.commit(true, true, true);
            cloudSolrServer.close();
        } catch (SolrServerException e) {
            logger.error("commit on cleanup failed", e);
        }
    }

    @Override
    public void map(ImmutableBytesWritable key, Result hbaseResult, Context context)
            throws InterruptedException, IOException {
        SolrInputDocument solrDoc = new SolrInputDocument();
        try {
            solrDoc.addField("id", new String(hbaseResult.getRow()));
            logger.info("id:" + new String(hbaseResult.getRow()));
            // Result.list()/KeyValue were removed in HBase 1.x; use the Cell API instead.
            for (Cell cell : hbaseResult.listCells()) {
                String family = new String(CellUtil.cloneFamily(cell));
                String fieldValue = new String(CellUtil.cloneValue(cell));
                if (family.equals("content")) {
                    // Store only the first 50 characters as a preview.
                    solrDoc.addField("content_outline",
                            fieldValue.length() > 50 ? fieldValue.substring(0, 50) + "..." : fieldValue);
                }
                for (String tbFamily : ConfigProperties.getHBASE_TABLE_FAMILY().split(",")) {
                    if (family.equals(tbFamily)) solrDoc.addField(tbFamily + "_ik", fieldValue);
                }
            }
            // commitWithin of 60 s, so we don't pay for a commit on every add
            cloudSolrServer.add(null, solrDoc, 60000);
        } catch (SolrServerException e) {
            logger.error("Failed to update Solr index: " + new String(hbaseResult.getRow()), e);
        }
    }
}

Configuration loader class:

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ConfigProperties {
    public static Logger logger = LoggerFactory.getLogger(ConfigProperties.class);
    private static Properties props;
    private static String HBASE_ZOOKEEPER_QUORUM;
    private static String HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT;
    private static String HBASE_MASTER;
    private static String HBASE_ROOTDIR;
    private static String DFS_NAME_DIR;
    private static String DFS_DATA_DIR;
    private static String FS_DEFAULT_NAME;
    private static String HBASE_TABLE_NAME; // HBase table to index into Solr
    private static String HBASE_TABLE_FAMILY; // column families of the HBase table
    private static String QUERY_FIELD;
    private static String SOLR_ZOOKEEPER;
    private static String SOLRCLOUD_SERVER1;
    private static String SOLRCLOUD_SERVER2;
    private static String SOLRCLOUD_SERVER3;
    private static String SOLRCLOUD_SERVER4;
    private static String SOLRCLOUD_SERVER5;
    private static String wordsFilePath;
    private static String querySeparator;
    private static String COLLECTION;
    private static boolean isQueryContent;
    private static Configuration conf;

    /**
     * Reads config.properties from the classpath and builds the HBase configuration.
     */
    static {
        props = new Properties();
        try {
            InputStream in = ConfigProperties.class.getClassLoader().getResourceAsStream("config.properties");
            props.load(new InputStreamReader(in, "UTF-8"));

            HBASE_ZOOKEEPER_QUORUM = props.getProperty("HBASE_ZOOKEEPER_QUORUM");
            HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT = props.getProperty("HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT");
            HBASE_MASTER = props.getProperty("HBASE_MASTER");
            HBASE_ROOTDIR = props.getProperty("HBASE_ROOTDIR");
            DFS_NAME_DIR = props.getProperty("DFS_NAME_DIR");
            DFS_DATA_DIR = props.getProperty("DFS_DATA_DIR");
            FS_DEFAULT_NAME = props.getProperty("FS_DEFAULT_NAME");
            HBASE_TABLE_NAME = props.getProperty("HBASE_TABLE_NAME");
            HBASE_TABLE_FAMILY = props.getProperty("HBASE_TABLE_FAMILY");
            QUERY_FIELD = props.getProperty("QUERY_FIELD");
            SOLR_ZOOKEEPER = props.getProperty("SOLR_ZOOKEEPER");
            SOLRCLOUD_SERVER1 = props.getProperty("SOLRCLOUD_SERVER1");
            SOLRCLOUD_SERVER2 = props.getProperty("SOLRCLOUD_SERVER2");
            SOLRCLOUD_SERVER3 = props.getProperty("SOLRCLOUD_SERVER3");
            SOLRCLOUD_SERVER4 = props.getProperty("SOLRCLOUD_SERVER4");
            SOLRCLOUD_SERVER5 = props.getProperty("SOLRCLOUD_SERVER5");
            wordsFilePath = props.getProperty("wordsFilePath");
            querySeparator = props.getProperty("querySeparator");
            isQueryContent = Boolean.parseBoolean(props.getProperty("isQueryContent", "false"));

            COLLECTION = props.getProperty("COLLECTION");
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM);
            conf.set("hbase.zookeeper.property.clientPort", HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT);
            conf.set("hbase.master", HBASE_MASTER);
            conf.set("hbase.rootdir", HBASE_ROOTDIR);
            conf.set("mapreduce.job.user.classpath.first", "true");
            conf.set("mapreduce.task.classpath.user.precedence", "true");
        } catch (IOException e) {
            logger.error("Failed to load config file", e);
        } catch (NullPointerException e) {
            logger.error("config.properties not found on the classpath", e);
        } catch (Exception e) {
            logger.error("Unexpected error while loading config file", e);
        }
    }

    public static Logger getLogger() {
        return logger;
    }

    public static Properties getProps() {
        return props;
    }

    public static String getHBASE_ZOOKEEPER_QUORUM() {
        return HBASE_ZOOKEEPER_QUORUM;
    }

    public static String getHBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT() {
        return HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT;
    }

    public static String getHBASE_MASTER() {
        return HBASE_MASTER;
    }

    public static String getHBASE_ROOTDIR() {
        return HBASE_ROOTDIR;
    }

    public static String getDFS_NAME_DIR() {
        return DFS_NAME_DIR;
    }

    public static String getDFS_DATA_DIR() {
        return DFS_DATA_DIR;
    }

    public static String getFS_DEFAULT_NAME() {
        return FS_DEFAULT_NAME;
    }
    public static String getHBASE_TABLE_NAME() {
        return HBASE_TABLE_NAME;
    }

    public static String getHBASE_TABLE_FAMILY() {
        return HBASE_TABLE_FAMILY;
    }

    public static String getQUERY_FIELD() {
        return QUERY_FIELD;
    }

    public static String getSOLR_ZOOKEEPER() {
        return SOLR_ZOOKEEPER;
    }

    public static Configuration getConf() {
        return conf;
    }

    public static String getSOLRCLOUD_SERVER1() {
        return SOLRCLOUD_SERVER1;
    }

    public static void setSOLRCLOUD_SERVER1(String sOLRCLOUD_SERVER1) {
        SOLRCLOUD_SERVER1 = sOLRCLOUD_SERVER1;
    }

    public static String getSOLRCLOUD_SERVER2() {
        return SOLRCLOUD_SERVER2;
    }

    public static void setSOLRCLOUD_SERVER2(String sOLRCLOUD_SERVER2) {
        SOLRCLOUD_SERVER2 = sOLRCLOUD_SERVER2;
    }

    public static String getSOLRCLOUD_SERVER3() {
        return SOLRCLOUD_SERVER3;
    }

    public static void setSOLRCLOUD_SERVER3(String sOLRCLOUD_SERVER3) {
        SOLRCLOUD_SERVER3 = sOLRCLOUD_SERVER3;
    }

    public static String getSOLRCLOUD_SERVER4() {
        return SOLRCLOUD_SERVER4;
    }

    public static void setSOLRCLOUD_SERVER4(String sOLRCLOUD_SERVER4) {
        SOLRCLOUD_SERVER4 = sOLRCLOUD_SERVER4;
    }

    public static String getSOLRCLOUD_SERVER5() {
        return SOLRCLOUD_SERVER5;
    }

    public static void setSOLRCLOUD_SERVER5(String sOLRCLOUD_SERVER5) {
        SOLRCLOUD_SERVER5 = sOLRCLOUD_SERVER5;
    }

    public static String getCOLLECTION() {
        return COLLECTION;
    }

    public static void setCOLLECTION(String cOLLECTION) {
        COLLECTION = cOLLECTION;
    }

    public static String getWordsFilePath() {
        return wordsFilePath;
    }

    public static String getQuerySeparator() {
        return querySeparator;
    }

    public static void setQuerySeparator(String querySeparator) {
        ConfigProperties.querySeparator = querySeparator;
    }

    public static boolean getIsQueryContent() {
        return isQueryContent;
    }


}

The config.properties file:

HBASE_ZOOKEEPER_QUORUM=10.1.202.67,10.1.202.68,10.1.202.69
HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT=2181
HBASE_MASTER=10.1.202.67:16000,10.1.202.68:16000
HBASE_ROOTDIR=hdfs://ocdpCluster/apps/hbase/data
DFS_NAME_DIR=/hadoop/hdfs/namenode
DFS_DATA_DIR=/data1/hadoop/hdfs/data,/data2/hadoop/hdfs/data,/data3/hadoop/hdfs/data,/data4/hadoop/hdfs/data,/data5/hadoop/hdfs/data,/data6/hadoop/hdfs/data,/data7/hadoop/hdfs/data
FS_DEFAULT_NAME=hdfs://ocdpCluster
HBASE_TABLE_NAME=td_poc_dynamic_info
HBASE_TABLE_FAMILY=title,content
QUERY_FIELD=content_ik:公司
SOLR_ZOOKEEPER=10.1.202.67:9983,10.1.202.68:9983,10.1.202.69:9983,10.1.202.70:9983,10.1.202.71:9983
SOLRCLOUD_SERVER1=http://10.1.202.67:8983/solr/
SOLRCLOUD_SERVER2=http://10.1.202.68:8983/solr/
SOLRCLOUD_SERVER3=http://10.1.202.69:8983/solr/
SOLRCLOUD_SERVER4=http://10.1.202.70:8983/solr/
SOLRCLOUD_SERVER5=http://10.1.202.71:8983/solr/
COLLECTION=collection1

wordsFilePath=/usr/local/pocProject/queryProject2/querywords.txt

pom dependencies (note that solr-solrj here is 5.1.0 while the servers run 5.5.0; matching the client version to the server is safer):

<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.1.2</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.6.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-solrj</artifactId>
        <version>5.1.0</version>
    </dependency> 
    <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.1.2</version>
    </dependency>


  </dependencies>

Creating the SolrCloud connection:

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.LBHttpSolrClient;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SolrServerFactory {
    public static Logger logger = LoggerFactory.getLogger(SolrServerFactory.class);
    private static CloudSolrClient cloudSolrServer;

    public static synchronized CloudSolrClient getCloudSolrClient() {
        if (cloudSolrServer == null) {
            logger.info("cloudSolrServer is null, creating a new client");
            createCloudSolrClient();
        }
        return cloudSolrServer;
    }

    private static void createCloudSolrClient() {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 100); // 10
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 20); // 5
        HttpClient httpClient = HttpClientUtil.createClient(params);
        LBHttpSolrClient lbHttpSolrClient = new LBHttpSolrClient(httpClient,
                ConfigProperties.getSOLRCLOUD_SERVER1(),
                ConfigProperties.getSOLRCLOUD_SERVER2(), ConfigProperties.getSOLRCLOUD_SERVER3(),
                ConfigProperties.getSOLRCLOUD_SERVER4(), ConfigProperties.getSOLRCLOUD_SERVER5());
        cloudSolrServer = new CloudSolrClient(ConfigProperties.getSOLR_ZOOKEEPER(), lbHttpSolrClient);
        cloudSolrServer.setDefaultCollection(ConfigProperties.getCOLLECTION());
//      cloudSolrServer.setZkClientTimeout(SearchConfig.getZookeeperClientTimeout());
//      cloudSolrServer.setZkConnectTimeout(SearchConfig.getZookeeperConnectTimeout());
    }
}
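
For reference, a minimal hypothetical usage of this factory (the field values are made up; the commitWithin parameter matches the policy used in the mapper):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = SolrServerFactory.getCloudSolrClient();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("title_ik", "a sample title");
        client.add(doc, 60000); // commit within 60 s rather than per document
        client.close();
    }
}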

HBase connection factory:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HbaseConnectionFactory {
    private static Connection connection = null;

    public static synchronized Connection getHTable() {
        if (connection == null) {
            try {
                connection = ConnectionFactory.createConnection(ConfigProperties.getConf());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return connection;
    }

    public static Connection getConnection() {
        return connection;
    }
}
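
A short hypothetical usage sketch of the factory, fetching one row via the HBase 1.x Table API (table name and rowkey taken from the config above):

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FetchOneRow {
    public static void main(String[] args) throws Exception {
        Connection conn = HbaseConnectionFactory.getHTable();
        Table table = conn.getTable(TableName.valueOf("td_poc_dynamic_info"));
        Result r = table.get(new Get(Bytes.toBytes("1")));
        System.out.println("row exists: " + !r.isEmpty());
        table.close(); // close the table, but keep the shared connection open
    }
}
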
Query code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class QueryData {

    public static Logger logger = LoggerFactory.getLogger(QueryData.class);

    public static void main(String[] args) throws SolrServerException, IOException {
        CloudSolrClient cloudSolrServer = SolrServerFactory.getCloudSolrClient();
        SolrQuery query = new SolrQuery(ConfigProperties.getQUERY_FIELD());
        query.setStart(0); // first row to return, for paging
        query.setRows(10); // number of rows to return, for paging
        QueryResponse response = cloudSolrServer.query(query);
        SolrDocumentList docs = response.getResults();
        System.out.println("documents found: " + docs.getNumFound()); // total hit count
        System.out.println("query time (ms): " + response.getQTime());
        cloudSolrServer.close();

        // Fetch the matching rows back from HBase by rowkey (the Solr id field).
        Table table = HbaseConnectionFactory.getHTable()
                .getTable(TableName.valueOf(ConfigProperties.getHBASE_TABLE_NAME()));
        List<Get> list = new ArrayList<Get>();
        for (SolrDocument doc : docs) {
            logger.info("matched id: " + (String) doc.getFieldValue("id"));
            list.add(new Get(Bytes.toBytes((String) doc.getFieldValue("id"))));
        }
        Result[] res = table.get(list);
        logger.info("rows fetched: " + res.length);
        for (Result rs : res) {
            if (rs.getRow() == null) {
                continue; // skip rows that no longer exist in HBase
            }
            byte[] titleBt = rs.getValue("title".getBytes(), "".getBytes());
            byte[] contentBt = rs.getValue("content".getBytes(), "".getBytes());
            // new String(null) would throw, so guard against missing values
            String title = (titleBt != null && titleBt.length > 0) ? new String(titleBt) : "no data";
            String content = (contentBt != null && contentBt.length > 0) ? new String(contentBt) : "no data";
            logger.info("id:" + new String(rs.getRow()));
            logger.info("title:" + title + "|");
            logger.info("content:" + content + "|");
        }
        table.close();
    }
}
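
Since setStart/setRows implement the paging, here is a small hypothetical loop that walks all pages of the same query; for deep result sets, Solr's cursorMark is generally preferable:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PageThroughResults {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = SolrServerFactory.getCloudSolrClient();
        SolrQuery query = new SolrQuery(ConfigProperties.getQUERY_FIELD());
        int pageSize = 10;
        long found = Long.MAX_VALUE; // corrected after the first response
        for (int start = 0; start < found; start += pageSize) {
            query.setStart(start);
            query.setRows(pageSize);
            QueryResponse rsp = client.query(query);
            found = rsp.getResults().getNumFound(); // total hits, fixes the loop bound
            System.out.println("page starting at " + start + ": "
                    + rsp.getResults().size() + " docs");
        }
        client.close();
    }
}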


At run time it is best to put hdfs-site.xml and hbase-site.xml on the classpath alongside the other configuration files.
[Screenshot: hdfs-site.xml and hbase-site.xml placed with the project configuration files]
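
Alternatively, the files can be loaded into the Hadoop Configuration explicitly. A sketch with assumed paths (substitute your cluster's actual config locations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LoadClusterConfig {
    public static Configuration load() {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical paths; adjust to where your cluster keeps its site files.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
        return conf;
    }
}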
