2012-08-10 15:36:31.0 | Category: lucene | Views: 1435
This is a passage I came across in Lucene in Action. I had planned to translate it, but a Chinese translation would lose the flavor of the original, so I quote it as is.

What is indexing, and why is it important?

Suppose you needed to search a large number of files, and you wanted to be able to find files that contained a certain word or a phrase. How would you go about writing a program to do this? A naïve approach would be to sequentially scan each file for the given word or phrase. This approach has a number of flaws, the most obvious of which is that it doesn’t scale to larger file sets or cases where files are very large.

This is where indexing comes in: To search large amounts of text quickly, you must first index that text and convert it into a format that will let you search it rapidly, eliminating the slow sequential scanning process. This conversion process is called indexing, and its output is called an index.

You can think of an index as a data structure that allows fast random access to words stored inside it. The concept behind it is analogous to an index at the end of a book, which lets you quickly locate pages that discuss certain topics. In the case of Lucene, an index is a specially designed data structure, typically stored on the file system as a set of index files. We cover the structure of index files in detail in appendix B, but for now just think of a Lucene index as a tool that allows quick word lookup.
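To make "a data structure that allows fast random access to words" a little more concrete, here is a minimal sketch of an inverted index in plain Java. It is not how Lucene stores its index. The class name TinyInvertedIndex and the naive tokenization (lowercase, split on anything that is not a letter or digit) are assumptions made purely for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration class, not part of Lucene.
public class TinyInvertedIndex {
    // Maps each term to the sorted ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // "Indexing": split the text into terms once, up front.
    public void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Searching": a single map lookup instead of scanning every document.
    public SortedSet<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptySortedSet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(0, "Lucene Core, our flagship sub-project, provides Java-based indexing and search technology");
        index.add(1, "Solr is a high performance search server built using Lucene Core, with XML/HTTP and ");
        index.add(2, "PyLucene is a Python port of the Core project.");

        System.out.println(index.search("lucene")); // prints [0, 1]
        System.out.println(index.search("core"));   // prints [0, 1, 2]
    }
}
```

The cost of tokenizing is paid once when a document is added. After that, a lookup no longer depends on how long the documents are, which is the point the quoted passage makes about avoiding sequential scans.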
Steps for creating an index:

1. Create a Directory: decide whether the index lives on disk or in memory.
2. Create an IndexWriter.
3. Create a Document object: decide in what form the indexed document (name, path, size, modification time, content) is represented.
4. Add Fields to the Document.
5. Add the Document to the index through the IndexWriter.
6. Close the IndexWriter.

Sample code:

package com.java.lucene.index;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public class MyIndex {

    private String[] ids = {"1", "2", "3", "4", "5", "6"};
    private String[] names = {"tian", "bao", "xing", "zhen", "kun", "xing"};
    private String[] emails = {"aa@qq.com", "bb@qq.com", "cc@qq.com",
                               "dd@qq.com", "ee@qq.com", "ff@qq.com"};
    private String[] contents = {
        "Lucene Core, our flagship sub-project, provides Java-based indexing and search technology",
        "Solr is a high performance search server built using Lucene Core, with XML/HTTP and ",
        "Open Relevance Project is a subproject ",
        "PyLucene is a Python port of the Core project.",
        "22 July 2012 - Apache Lucene 3.6.1 and Apache Solr 3.6.1 available",
        "Lucene 3.6.1 Release Highlights"
    };

    private Directory directory = null;

    public MyIndex() {
        try {
            // 1. Create a Directory: decide whether the index lives on disk or in memory
            // Directory directory = new RAMDirectory(); // index held in memory
            directory = FSDirectory.open(new File("d:/tools/lucene/index02"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void index() {
        IndexWriter writer = null;
        try {
            // 2. Create an IndexWriter
            writer = new IndexWriter(directory,
                    new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
            Document doc = null;
            for (int i = 0; i < ids.length; i++) {
                // 3. Create a Document object: the form in which the indexed
                //    document (name, path, size, modification time, content) is represented
                doc = new Document();
                // 4. Add Fields to the Document
                doc.add(new Field("id", ids[i], Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
                doc.add(new Field("email", emails[i], Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("content", contents[i], Field.Store.NO, Field.Index.ANALYZED));
                doc.add(new Field("name", names[i], Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
                // 5. Add the Document to the index through the IndexWriter
                writer.addDocument(doc);
            }
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (LockObtainFailedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) {
                    // 6. Close the IndexWriter
                    writer.close();
                }
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
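The sample code indexes "content" with Field.Index.ANALYZED and "email" with Field.Index.NOT_ANALYZED. As a rough, Lucene-free sketch of the difference: an analyzed field is broken into separate terms, while a not-analyzed field is indexed as one term holding the whole value. The class FieldIndexModes and both helper methods below are hypothetical stand-ins, and the simple split is only an approximation of what StandardAnalyzer does (for instance, it does not drop stop words).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; it does not call Lucene.
public class FieldIndexModes {

    // Rough stand-in for Field.Index.ANALYZED: lowercase, then split into word terms.
    public static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Stand-in for Field.Index.NOT_ANALYZED: the whole value becomes a single term.
    public static List<String> notAnalyzed(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(notAnalyzed("aa@qq.com")); // prints [aa@qq.com]
        System.out.println(analyzed("Lucene 3.6.1 Release Highlights"));
        // prints [lucene, 3, 6, 1, release, highlights]
    }
}
```

Under this model, an exact-match lookup on "email" has to supply the complete address, whereas "content" can be matched on individual words.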