Building and Maintaining Text Search Indexes
API Definition
Version 0.23
September 14, 2006

The name of the Text Search API dll is WtDocUtil.dll.

The exposed Text Search API contains three sets of interfaces:
  1) DocParse() - parse documents
  2) DocStruct() - create initial structure
  3) DocStructUpdate() - update structure

The requisite #include files for the interface are:
  #include "DocCommon.h"
  #include "DocParse.h"
  #include "DocStruct.h"

I. Universal definitions
II. Registry Settings
III. Comments from source code:
IV. DocParse()
  A. void DocParseInit()
  B. void DocParseTerm()
  C. void DocParseSetCancelCb()
  D. void DocParseSetRankParams()
  E. DocParseSetDetailParams()
  F. DocParseSetScoreBias()
  G. DocParseSetWriteUrlFlag()
  H. DocParseSetUpdateParams()
  I. DocParseSetProcessMode()
  J. DocParseGetResult()
  K. void DocParseFile()
  L. void DocParseMem()
  M. void DocParseBucket()
V. DocParse() - Relevancy
  A. void DocParseSetRelevancyParams()
  B. void DocParseRelevancyMem()
  C. void DocParseRelevancyGetResult()
  D. DocParseRelevancyAbsGetResult()
VI. DocParser()
  A. DocParserInit()
  B. DocParserTerm()
  C. DocParserSetCancelCb()
  D. DocParser()
  E. DocParserAbs()
  F. DocParserCreateResult()
  G. DocParserDestroyResult()
  H. DocParserDupResult()
  I. DocParserResizeResult()
  J. DocParserGrowResult()
  K. DocDetailCreateResult()
  L. DocDetailDestroyResult()
  M. DocDetailResizeResult()
  N. DocDetailGrowResult()
  O. DocTokenArrayCreateResult()
  P. DocTokenArrayDestroyResult()
VII. DocStruct()
  A. void DocStructInit()
  B. void DocStructTerm()
  C. void DocStructFresh()
  D. void DocStructFromFile()
  E. void DocStructFromFileRD()
VIII. DocStructUpdate()
  A. void DocStructUpdateInit()
  B. void DocStructUpdateTerm()
  C. void DocStructUpdate()

I. Universal definitions

A. char *ErrorString

The *ErrorString parameter is included in all Text Search API functions.
It is presumed to be defined by the calling program as

  char ErrorString[256] = "\0";

If an API function encounters an error, it stores a descriptive, non-null string into *ErrorString. It is the responsibility of the calling program to check for this condition and to take appropriate steps.

II. Registry Settings

This application uses the WordNet subsystem and the WordNet Dictionary.

Subsequent to 2005.02.12, the folder containing the WordNet Dictionary files is specified by the registry key:
  HKEY_LOCAL_MACHINE\Software\WhamTech\TextSearch\WordNet

Prior to 2005.02.12, the folder containing the WordNet Dictionary files was specified by the environment variable:
  WNSEARCHDIR

III. Comments from source code:

A. From DocWordCount.cpp:

The rules for forming a word are quite simple: if any character except {A-Z,a-z,0-9,'} is encountered, it is considered a word break. The ' is not considered a word break, but it is not included in the word. For example, O'Brien is indexed as OBRIEN.

The '&' character is considered to be a separate word (except when it occurs within an HTML entity).

The '.' character is considered to be a break for proximity pairs.

B. From DocRank.cpp:

i. WordNet Stemming vs Porter Stemming:

WordNet Stemming (morphing) combines table lookup and computational processing. WordNet Stemming returns the singular case form of nouns and the first person present tense form for verbs. For example:
  nationalizations ==> nationalization
  running ==> run
It is quite comprehensive in dealing with special cases such as mice/mouse and data/datum. WordNet Stemming requires that the source word be in lower case.

Porter Stemming is a strictly computational process. Porter Stemming removes multiple prefixes and suffixes to give the root stem. For example:
  nationalizations ==> nation
Porter Stemming does not attempt to deal with special cases. Porter Stemming requires the source word to be in lower case.

ii. The Synonym-based word weighting works as follows:

  a. The parser (RmaParser.dll) extracts words from a document and stores them in an array of structure items (struct WordDescriptor{}). Each structure item includes the word, multiple attribute counters and flags, and additional temp space for the weighting process.
  b. When the parser has finished collecting words for a document, it calls DocRankWords() in RmaPtos.dll. DocRankWords()

IV. DocParse()

A. void DocParseInit()

Prototype: (Ref DocParse.h)

  void DocParseInit(
      long *DocParseHandle,   /* [OUT] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocParseInit() creates and initializes a context for the DocParseXxx() functions and returns a context handle.

Description of Parameter *DocParseHandle:
  If the function is successful, a non-zero handle is returned for use in subsequent DocParseXxx() function calls. If the function is not successful, zero is returned.

Description of Parameter *ArgString:
  At present, ArgString is the null string.

Description of Parameter *ErrorString:
  Refer to paragraph I.

B. void DocParseTerm()

Prototype: (Ref DocParse.h)

  void DocParseTerm(
      long DocParseHandle,    /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocParseTerm() terminates and destroys the DocParseXxx() context.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *ErrorString:
  Refer to paragraph I.

C. void DocParseSetCancelCb()

Prototype: (Ref DocParse.h)

  void DocParseSetCancelCb(
      long DocParseHandle,    /* [IN] */
      DocCancelCb CancelCb,
      long CancelUserWord);

Description:

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().
Description of Parameter CancelCb: (Ref DocCommon.h)
  CancelCb is a function pointer to the callback function; its prototype is defined by the typedef:

  typedef int (DocStdCall *DocCancelCb)(long UserWord);

  If the callback is defined [i.e., if the function pointer is non-zero], it is called by a process periodically; if the callback returns 0, then the process continues; if the callback returns non-zero, then the process terminates. In this case, the callback is called by DocParseBucket() after processing each document.

D. void DocParseSetRankParams()

Prototype: (Ref DocParse.h)

  void DocParseSetRankParams(
      long DocParseHandle,    /* [IN] */
      int RankMode,           /* [IN] */
      char *OutputMetaFile,   /* [IN] */
      char *OutputInfoFile,   /* [IN] NULL ==> no info file */
      char *ErrorString);     /* [OUT] */

Description:
  DocParseSetRankParams() passes rank-related parameters. The sequence of operations is:
    a) DocParseInit()
    b) 1 or more calls to DocParseSetRankParams()
    c) DocDetailSetParams()
    d) DocParseBucket()
    e) DocParseTerm()

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter RankMode: (Ref DocParse.h)
  DocRankFlPoolXxx flags cause derivatives to be computed and used solely for the purpose of pooling word-score data prior to ranking. The base word is not changed, and no additional words are created. We call this mode "Stem-Pool Mode".

  DocRankFlDerivedXxx flags cause derivatives to be computed to replace the base word. Then, word-score data for like words is combined and the duplicate word is removed. We call this mode "Word-Pool Mode".
  #define DocRankFlPoolNone          0x00
  #define DocRankFlPoolWn            0x02
  #define DocRankFlPoolSoundex       0x04
  #define DocRankFlPoolMetaPhone     0x05
  #define DocRankFlPoolWnSyn         0x06
  #define DocRankFlDerivedWn         (DocRankFlPoolWn | 0x08)
  #define DocRankFlDerivedSoundex    (DocRankFlPoolSoundex | 0x08)
  #define DocRankFlDerivedMetaPhone  (DocRankFlPoolMetaPhone | 0x08)
  #define DocRankFlDerivedWnSyn      (DocRankFlPoolWnSyn | 0x08)

Description of Parameter *OutputMetaFile:
  Contains the name of the meta-data output file for this RankMode.

Description of Parameter *OutputInfoFile:
  Contains the name of the meta-info output file for this RankMode. It is NULL if no meta-info output file is desired.

Description of Parameter *ErrorString:
  Refer to paragraph I.

E. DocParseSetDetailParams()

Prototype: (Ref DocParse.h)

  void DocParseSetDetailParams(
      long DocParseHandle,      /* [IN] */
      int DetailMode,           /* [IN] */
      char *OutputDetailFile,   /* [IN] */
      char *DetailTag,          /* [IN] */
      char *ErrorString);       /* [OUT] */

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter DetailMode: (Ref DocParser.h)
  The four low order bits of DetailMode (masked by DocParserFlDetailMask) specify the Proximity Level. This is the maximum distance between word pairs that are to be emitted.
  For example, for words:
    Word1 Word2 Word3 Word4
  and for Proximity Level 2, the following pairs are emitted:
    Word1 Word2
    Word1 Word3
    Word2 Word3
    Word2 Word4
    Word3 Word4

  The maximum permitted Proximity Level is defined symbolically in DocCommon.h as:

  #define MAX_PROXLEVEL 5   // max proximity for word pair scheme

  The number of characters in each member of the pair is defined symbolically in DocCommon.h as:

  #define CHARS_IN_PROX 4   /* number of chars in Proximity Index */

  The remaining DetailMode flags, from DocParser.h, specify how the pair is formed:

  #define DocParserFlDetailRaw     0x0100   /* use raw word for Prox Pairs */
  #define DocParserFlDetailStem    0x0010   /* use stem for Prox Pairs */
  #define DocParserFlDetailPorter  0x0020   /* use Porter stem for Prox Pairs */
  #define DocParserFlDetailMeta    0x0040   /* use metaphone for Prox Pairs */
  #define DocParserFlDetailPoMeta  0x0080   /* use Porter + metaphone for Prox Pairs */
  #define DocParserFlDetailMask    0x000F

Description of Parameter OutputDetailFile:
  Contains the name of the meta-data output file for all Prox Pairs. The Prox data in each record of this flat file is in the form:
    Byte 0:   Prox Level for this pair (ascii '1' thru '5')
    Byte 1-4: first half of Prox Pair
    Byte 5-8: second half of Prox Pair
  Each record also includes additional meta data such as TB recno, TB table name, etc.

Description of Parameter DetailTag:
  This is a 2 or 3 character ascii string that duplicates the effective -P option; i.e., "PR", "PPM", ... It is used in summary message displays.

Description of Parameter *ErrorString:
  Refer to paragraph I.

F. DocParseSetScoreBias()

Prototype: (Ref DocParse.h)

  void DocParseSetScoreBias(
      long DocParseHandle,    /* [IN] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  This function defines a constant ranking score for a URL based on its domain. If the first N characters of a URL match the specified domain, its score is set to the specified value.

  DocParseSetScoreBias() is an optional call.
  It can also be called multiple times. The placement of DocParseSetScoreBias() is not significant as long as it is after DocParseInit() and before DocParseTerm(). However, it is meaningless unless DocParseSetWriteUrlFlag() is also invoked.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *ArgString:
  ArgString is an ascii string in the form:
    <Domain> <score>

Description of Parameter *ErrorString:
  Refer to paragraph I.

G. DocParseSetWriteUrlFlag()

Prototype: (Ref DocParse.h)

  void DocParseSetWriteUrlFlag(
      long DocParseHandle,    /* [IN] */
      char *DbName,           /* [IN] */
      char *DbTableName,      /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  This function sets a flag and performs initialization so that subsequent calls to DocParseBucket() write the URL contained in each document in the bucket file to the target data base specified by the DbName parameter. The resulting record number overrides the record number in the tag and becomes part of the data that is emitted to the meta file(s).

  In addition to the URL itself, these fields are also written:
    . Crawl Date
    . BulkFileName
    . AccountID
    . Score
    . UrlHash

  The BulkFileName field is stored as "Repository\FileName", with the expectation that this will be interpreted as a path relative to the DocIndex data base path.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *DbName:
  DbName is the full path name of the target data base.

Description of Parameter *DbTableName:
  DbTableName is the table name in the target data base where URLs are to be written. In the current implementation, target data base column names are coded as constants.
  As a result, the schema of the target data base must conform to this model:

  DG 10000 records
    URL              X(180)               @ (*) Full name of the URL
    URLHash          X(14) KEY DISCRETE   @ (*) Hash to URL name
    BulkFileName     X(96)                @ (*) Bulk file name where the URL resides
    UrlText          MEMO :1
    UrlContextAlias  X(20)                @ (*)
    Score            ID                   @ (*) URL score (based on URL references)
    AccountID        ID                   @ (*) Account ID related to this domain
    DateSpidered     Date                 @ (*)
    DateModified     Date                 @
    URL_Text         X(16) KEY VIRTUAL    @ Text search field
    URL_TextDS       X(16) KEY VIRTUAL    @ Text search field
    URL_TextDY       X(16) KEY VIRTUAL    @ Text search field
    URL_TextDX       X(16) KEY VIRTUAL    @ Text search field

  (*) Required column name and type

Description of Parameter *ErrorString:
  Refer to paragraph I.

H. DocParseSetUpdateParams()

Prototype: (Ref DocParse.h)

  void DocParseSetUpdateParams(
      long DocParseHandle,    /* [IN] */
      int ParserFlags,        /* [IN] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocParseSetUpdateParams() changes the mode of operation of the DocParseXxx() functions. This call is pertinent only when it is used in conjunction with DocParseSetWriteUrlFlag().

  The default mode of operation of DocParseXxx() [e.g., DocParseBucket()] when DocParseSetUpdateParams() is not in effect is to unconditionally write all URLs that are parsed to the target data base.

  When the DocParseSetUpdateParams() mode is in effect, all URLs are tested for existence in the target data base. If a URL exists, it is processed in update mode; that is, the DocStructUpdate() functions are called internally using the parameters specified in DocParseSetUpdateParams(). If the URL does not exist in the target data base, it is processed in the default mode.

  On return from pertinent DocParseXxx() calls, the caller can use the DocParseGetResult() function to determine if any URLs were inserted. If this count is greater than zero, the caller must call the DocStruct() functions as in the default case.
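
  The caller-side decision described in the preceding paragraph can be sketched as a small helper. This is only an illustration under stated assumptions: SketchNextStep() and the SKETCH_STEP_* names are hypothetical and are not part of the Text Search API. The comment shows where the documented DocParseBucket() and DocParseGetResult() calls would supply the two inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch; not defined by the API. */
enum SketchStep {
    SKETCH_STEP_ERROR = -1,        /* ErrorString was set by an API call */
    SKETCH_STEP_DONE = 0,          /* no URLs were inserted              */
    SKETCH_STEP_RUN_DOCSTRUCT = 1  /* inserts occurred: run DocStruct()  */
};

/*
 * The inputs are expected to come from the documented calls, e.g.:
 *   DocParseBucket(DocParseHandle, BucketFileName, ErrorString);
 *   DocParseGetResult(DocParseHandle, DocParseIndInsertCount, &insertCount);
 */
enum SketchStep SketchNextStep(const char *ErrorString, int insertCount)
{
    assert(ErrorString != NULL);
    if (ErrorString[0] != '\0')    /* a non-empty string signals an error */
        return SKETCH_STEP_ERROR;
    return insertCount > 0 ? SKETCH_STEP_RUN_DOCSTRUCT : SKETCH_STEP_DONE;
}
```

  A caller would test the returned value after each DocParseBucket() call and invoke the DocStruct() functions only for SKETCH_STEP_RUN_DOCSTRUCT.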
  Also, when the DocParseSetUpdateParams() mode is in effect, the presence of the attribute "delete=true" in the tag causes the URL specified by the attribute "URL=xxx" (and by the SeedUrlID attribute, if applicable) to be deleted from the DocIndex, along with its text index entries. If the URL attribute specifies "URL=all", all URLs for the specified SeedUrlID are deleted.

Description of Parameter ParserFlags:
  These are the same as the DetailMode parameter for DocParseSetDetailParams().

Description of Parameter *ArgString:
  ArgString is an ascii string in the form:
    <mode1> <KeyName1> [<mode2> <KeyName2> ...]
  <mode> is
    WNraw  - Pool None Raw
    WNrank - Pool None Rank
    WSraw  - Pool Stem Raw
    WSrank - Pool Stem Rank
    WYraw  - Pool Synonym Raw
    WYrank - Pool Synonym Rank
    DSraw  - Derived Stem Raw
    DSrank - Derived Stem Rank
    DYraw  - Derived Synonym Raw
    DYrank - Derived Synonym Rank
    DXraw  - Derived Soundex Raw
    DXrank - Derived Soundex Rank
    DMraw  - Derived MetaPhone Raw
    DMrank - Derived MetaPhone Rank
    PROX1  - Proximity Level 1
    PROX2  - Proximity Level 2
    PROX3  - Proximity Level 3
    PROX4  - Proximity Level 4
    PROX5  - Proximity Level 5

I. DocParseSetProcessMode()

Prototype: (Ref DocParse.h)

  void DocParseSetProcessMode(
      long DocParseHandle,    /* [IN] */
      char *ArgString,
      char *ErrorString);

Description:

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *ArgString:
  ArgString is an ascii string in the form:
    <mode>
  <mode> is an ascii literal
    "default"
    "ECA"

J. DocParseGetResult()

Prototype: (Ref DocParse.h)

  void DocParseGetResult(
      long DocParseHandle,    /* [IN] */
      int indicator,          /* [IN] */
      int *result);           /* [OUT] */

Description:
  DocParseGetResult() returns one of three counters that are incremented by the DocParseXxx() functions.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().
Description of Parameter indicator:

  Symbolic Value           Counter
  --------------           -------
  DocParseIndInsertCount   Number of URL records inserted
  DocParseIndUpdateCount   Number of URL records updated
  DocParseIndTotalCount    Total number of URL records processed

Description of Parameter *result:
  The requested counter is returned in *result.

K. void DocParseFile()

Prototype: (Ref DocParse.h)

  void DocParseFile(
      long DocParseHandle,    /* [IN] */
      char *SourceFileName,   /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  Not implemented.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *SourceFileName:
  SourceFileName is the name of a document file to be parsed.

Description of Parameter *ErrorString:
  Refer to paragraph I.

L. void DocParseMem()

Prototype: (Ref DocParse.h)

  void DocParseMem(
      long DocParseHandle,    /* [IN] */
      char *SourceMemImage,   /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  Not implemented.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *SourceMemImage:
  SourceMemImage is the memory image of a document to be parsed.

Description of Parameter *ErrorString:
  Refer to paragraph I.

M. void DocParseBucket()

Prototype: (Ref DocParse.h)

  void DocParseBucket(
      long DocParseHandle,    /* [IN] */
      char *BucketFileName,   /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocParseBucket() parses, ranks and stores meta-data for a specified bucket file. The sequence of operations is:
    a) DocParseInit()
    b) 1 or more calls to DocParseParam()
    c) DocParseBucket()
    d) DocParseTerm()

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *BucketFileName:
  BucketFileName is the name of a bucket file to be parsed.

Description of Parameter *ErrorString:
  Refer to paragraph I.

V. DocParse() - Relevancy

A. void DocParseSetRelevancyParams()

Prototype: (Ref DocParse.h)

  void DocParseSetRelevancyParams(
      long DocParseHandle,     /* [IN] */
      int RankMode,            /* [IN] */
      char *RelevancyParams,   /* [IN] */
      char *ErrorString);      /* [OUT] */

Description:
  DocParseSetRelevancyParams() passes the parameters that are used to calculate document relevancy. Relevancy parameters are specified as a function of RankMode; thus, DocParseSetRelevancyParams() can be called multiple times, once for each RankMode that requires a relevancy calculation. The sequence of operations is:
    a) Initialization
       . DocParseInit()
       . 1 or more calls to DocParseSetRelevancyParams(); there can be up to 16 [MAX_DPRC] calls to DocParseSetRelevancyParams() for distinct values of RankMode
    b) Operation; iteration on
       . DocParseRelevancyMem()
       . 1 or more calls to DocParseRelevancyGetResult() and/or DocParseRelevancyAbsGetResult()
    c) Termination
       . DocParseTerm()

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter RankMode: (Ref DocParse.h)
  Refer to the description at DocParseSetRankParams(). For DocParseSetRelevancyParams() only, there is another possible value for parameter RankMode:

  #define DocRankFlPoolAbs 0x07

  When this value is used, the word list in parameter RelevancyParams is actually a list of names, and the relevancy calculation is made by DocParseRelevancyAbsGetResult() rather than DocParseRelevancyGetResult().

Description of Parameter RelevancyParams:
  RelevancyParams is a null-terminated, comma-separated (with optional quotes ["]) list of words to be used in measuring relevancy. When parameter RankMode is DocRankFlPoolAbs, the word list is actually a list of names; i.e., each item may contain multiple words.

  In all cases, words are constructed from the characters
    [0-9] [A-Z] [’]
  All other characters are treated as separators.
  When a word contains the [’] character it is rendered both with and without the [’]; e.g.,
    w’ord is rendered as both w’ord and as word;
    ’word is rendered as both ’word and as word;
    word’ is rendered as both word’ and as word;

B. void DocParseRelevancyMem()

Prototype: (Ref DocParse.h)

  void DocParseRelevancyMem(
      long DocParseHandle,    /* [IN] */
      char *SourceMemImage,   /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter *SourceMemImage:
  SourceMemImage is the memory image of a document to be parsed.

C. void DocParseRelevancyGetResult()

Prototype: (Ref DocParse.h)

  void DocParseRelevancyGetResult(
      long DocParseHandle,    /* [IN] */
      int RankMode,           /* [IN] */
      int *result);           /* [OUT] */

Description:
  DocParseRelevancyGetResult() returns a relevancy score for the document specified in the most recent DocParseRelevancyMem() call.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().

Description of Parameter RankMode: (Ref DocParse.h)
  Refer to the description at DocParseSetRankParams(). DocParseRelevancyGetResult() accepts all values of RankMode except DocRankFlPoolAbs.

Description of Parameter *result:
  An integer between 0 and 100 is returned; this is the normalized count of hits of words in the most recent document from DocParseRelevancyMem(). In the case of an error, *result is -1.

D. DocParseRelevancyAbsGetResult()

Prototype: (Ref DocParse.h)

  void DocParseRelevancyAbsGetResult(
      long DocParseHandle,      /* [IN] */
      int RankMode,             /* [IN] */
      int *resultCount,         /* [OUT] */
      char *resultString,       /* [OUT] */
      int resultStringLength);  /* [IN] */

Description:
  DocParseRelevancyAbsGetResult() returns a hit count for the document specified in the most recent DocParseRelevancyMem() call.

Description of Parameter DocParseHandle:
  DocParseHandle is the non-zero handle returned by DocParseInit().
Description of Parameter RankMode: (Ref DocParse.h)
  DocParseRelevancyAbsGetResult() accepts only the RankMode value DocRankFlPoolAbs.

Description of Parameter *resultCount:
  An integer between 0 and N is returned; N is the number of words in the DocParseSetRelevancyParams() RelevancyParams parameter. *resultCount is the count of hits of words in the most recent document from DocParseRelevancyMem(). In the case of an error, *resultCount is -1.

Description of Parameter *resultString:
  char resultString[N] is a buffer defined by the caller. The max length N is given by the fifth parameter, resultStringLength. Each hit is copied to *resultString in a quote-delimited (") comma-separated (,) format.

Description of Parameter resultStringLength:
  resultStringLength is the maximum number of characters [including the null terminator] that can be stored in *resultString.

VI. DocParser()

A. void DocParserInit()

Prototype: (Ref DocParser.h)

  void DocParserInit(
      long *DocParserHandle,
      char *ParamString,
      char *ErrorString,
      char *StopWordList);

[Documentation for this function is not complete.]

B. void DocParserTerm()

Prototype: (Ref DocParser.h)

  void DocParserTerm(
      long DocParserHandle,
      char *ErrorString);

[Documentation for this function is not complete.]

C. void DocParserSetCancelCb()

Prototype: (Ref DocParser.h)

  void DocParserSetCancelCb(
      long DocParserHandle,
      DocCancelCb CancelCb,
      long CancelUserWord);

[Documentation for this function is not complete.]

D. void DocParser()

Prototype: (Ref DocParser.h)

  void DocParser(
      long DocParserHandle,
      long ParserFlags,
      char *SourceFileName,       /* can be either file or memory image */
      char *SourceMemImage,       /* if SourceMemImage == NULL, use SourceFileName */
      DocParserResult **dprRef,
      DocDetailResult **ddrRef,   /* may be NULL */
      char *ErrorString);

[Documentation for this function is not complete.]

E. void DocParserAbs()

Prototype: (Ref DocParser.h)

  void DocParserAbs(
      long DocParserHandle,
      long ParserFlags,
      char *SourceFileName,       /* can be either file or memory image */
      char *SourceMemImage,       /* if SourceMemImage == NULL, use SourceFileName */
      DocParserResult **dprRef,
      DocDetailResult **ddrRef,   /* may be NULL */
      DocTokenArrayResult **dtrRef,
      char *ErrorString);

[Documentation for this function is not complete.]

F. DocParserResult *DocParserCreateResult()

Prototype: (Ref DocParser.h)

  DocParserResult *DocParserCreateResult(int maxwords);

[Documentation for this function is not complete.]

G. void DocParserDestroyResult()

Prototype: (Ref DocParser.h)

  void DocParserDestroyResult(
      DocParserResult *dpr,
      char *ErrorString);

[Documentation for this function is not complete.]

H. void DocParserDupResult()

Prototype: (Ref DocParser.h)

  void DocParserDupResult(
      DocParserResult *dpr,
      DocParserResult **dprDupRef,
      char *ErrorString);

[Documentation for this function is not complete.]

I. int DocParserResizeResult()

Prototype: (Ref DocParser.h)

  int DocParserResizeResult(    /* value 0 - failure; value 1 - success */
      DocParserResult *dpr,
      int NewSize);             /* newdpr->dprWordArraySize */

[Documentation for this function is not complete.]

J. int DocParserGrowResult()

Prototype: (Ref DocParser.h)

  int DocParserGrowResult(      /* value 0 - failure; value 1 - success */
      DocParserResult *dpr);

[Documentation for this function is not complete.]

K. DocDetailResult *DocDetailCreateResult()

Prototype: (Ref DocParser.h)

  DocDetailResult *DocDetailCreateResult(int maxpairs);

[Documentation for this function is not complete.]

L. void DocDetailDestroyResult()

Prototype: (Ref DocParser.h)

  void DocDetailDestroyResult(
      DocDetailResult *ddr,
      char *ErrorString);

[Documentation for this function is not complete.]

M. int DocDetailResizeResult()

Prototype: (Ref DocParser.h)

  int DocDetailResizeResult(    /* value 0 - failure; value 1 - success */
      DocDetailResult *ddr,
      int NewSize);             /* newddr->ddrPairArraySize */

[Documentation for this function is not complete.]

N. int DocDetailGrowResult()

Prototype: (Ref DocParser.h)

  int DocDetailGrowResult(      /* value 0 - failure; value 1 - success */
      DocDetailResult *ddr);

[Documentation for this function is not complete.]

O. DocTokenArrayResult *DocTokenArrayCreateResult()

Prototype: (Ref DocParser.h)

  DocTokenArrayResult *DocTokenArrayCreateResult(int maxint);

[Documentation for this function is not complete.]

P. void DocTokenArrayDestroyResult()

Prototype: (Ref DocParser.h)

  void DocTokenArrayDestroyResult(
      DocTokenArrayResult *dtr,
      char *ErrorString);

[Documentation for this function is not complete.]

VII. DocStruct()

A. void DocStructInit()

Prototype: (Ref DocStruct.h)

  void DocStructInit(
      long *DocStructHandle,  /* [OUT] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocStructInit() creates and initializes a context for the DocStructXxx() functions and returns a context handle.

Description of Parameter *DocStructHandle:
  If the function is successful, a non-zero handle is returned for use in subsequent DocStructXxx() function calls. If the function is not successful, zero is returned.

Description of Parameter *ArgString:
  ArgString is an ascii string in the form:
    <DbName> [TOKENMAP OFF]
  <DbName> is the name of the target database.
  TOKENMAP OFF is an optional clause that is required if <DbName> is token mapped.

Description of Parameter *ErrorString:
  Refer to paragraph I.

B. void DocStructTerm()

Prototype: (Ref DocStruct.h)

  void DocStructTerm(
      long DocStructHandle,
      char *ErrorString);     /* [OUT] */

Description:
  DocStructTerm() terminates and destroys the DocStructXxx() context.

Description of Parameter DocStructHandle:
  DocStructHandle is the non-zero handle returned by DocStructInit().
Description of Parameter *ErrorString:
  Refer to paragraph I.

C. void DocStructFresh()

Prototype: (Ref DocStruct.h)

  void DocStructFresh(
      long DocStructHandle,   /* [IN] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocStructFresh() performs a DESTRUCT on the specified index(es).

Description of Parameter DocStructHandle:
  DocStructHandle is the non-zero handle returned by DocStructInit().

Description of Parameter *ArgString:
  ArgString is an ascii string in the form:
    <KeyName1> [<KeyName2> ...]

Description of Parameter *ErrorString:
  Refer to paragraph I.

D. void DocStructFromFile()

Prototype: (Ref DocStruct.h)

  void DocStructFromFile(
      long DocStructHandle,   /* [IN] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocStructFromFile() performs a STRUCTURE using the specified meta-data file; it uses a default RD that is consistent with DocParse() output.

Description of Parameter DocStructHandle:
  DocStructHandle is the non-zero handle returned by DocStructInit().

Description of Parameter *ArgString:
  ArgString is an ascii string in the form:
    <Mode> <KeyName> <MetaFileName> [<table-name-literal>]
  <Mode> is {RAW | RANK | PROXn}
  <MetaFileName> is the name(s) of one or more meta-files; multiple file names are separated by '+'; file names with special characters are enclosed in \" or \'
    [Currently, the buffer that holds file names is 1024 bytes; if/when that is a problem, I can re-code that.]
  <table-name-literal> is the name of the table [enclosed in 's] that contains <KeyName>. When <KeyName> is local to a single table, the table name may be specified either as a literal in ArgString or in the meta-data. If <KeyName> is global to multiple tables, then it can only be passed in the meta-data.

Description of Parameter *ErrorString:
  Refer to paragraph I.

E. void DocStructFromFileRD()

Prototype: (Ref DocStruct.h)

  void DocStructFromFileRD(
      long DocStructHandle,   /* [IN] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocStructFromFileRD() performs a STRUCTURE using the specified meta-data file; it uses a user-specified RD.

Description of Parameter DocStructHandle:
  DocStructHandle is the non-zero handle returned by DocStructInit().

Description of Parameter *ArgString:
  ArgString is an ascii string in one of the forms:
    <Mode> <KeyName> <MetaFileName> <RDname> <RD-KeyName> <RD-recno> <RD-TableName> [<RD-ColumnName> <RD-dbname>]
    <Mode> <KeyName> <MetaFileName> <RDname> <RD-KeyName> <RD-recno> <table-name-literal> [<RD-ColumnName> <RD-dbname>]
  <Mode> is {RAW | RANK | PROXn}
  <MetaFileName> is the name(s) of one or more meta-files; multiple file names are separated by '+'; file names with special characters are enclosed in \" or \'
    [Currently, the buffer that holds file names is 1024 bytes; if/when that is a problem, I can re-code that.]
  <table-name-literal> is the name of the table [enclosed in 's] that contains <KeyName>. When <KeyName> is local to a single table, the table name may be specified either as a literal in ArgString or in the meta-data. If <KeyName> is global to multiple tables, then it can only be passed in the meta-data.

Description of Parameter *ErrorString:
  Refer to paragraph I.

VIII. DocStructUpdate()

A. void DocStructUpdateInit()

Prototype: (Ref DocStruct.h)

  void DocStructUpdateInit(
      long *DocUpdateHandle,  /* [OUT] */
      char *ArgString,        /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocStructUpdateInit() creates and initializes a context for the DocStructUpdateXxx() functions and returns a context handle.

Description of Parameter *DocUpdateHandle:
  If the function is successful, a non-zero handle is returned for use in subsequent DocStructUpdateXxx() function calls. If the function is not successful, zero is returned.
Description of Parameter *ArgString:
  ArgString contains the name of the target database.

Description of Parameter *ErrorString:
  Refer to paragraph I.

B. void DocStructUpdateTerm()

Prototype: (Ref DocStruct.h)

  void DocStructUpdateTerm(
      long DocUpdateHandle,   /* [IN] */
      char *ErrorString);     /* [OUT] */

Description:
  DocStructUpdateTerm() terminates and destroys the DocStructUpdateXxx() context.

Description of Parameter DocUpdateHandle:
  DocUpdateHandle is the non-zero handle returned by DocStructUpdateInit().

Description of Parameter *ErrorString:
  Refer to paragraph I.

C. void DocStructUpdate()

Prototype: (Ref DocStruct.h)

  void DocStructUpdate(
      long DocUpdateHandle,        /* [IN] */
      int ParserFlags,             /* [IN] */
      char *ArgString,             /* [IN] */
      DocUpdateItem *ItemArray1,   /* [IN] */
      int ItemCount1,              /* [IN] */
      DocUpdateItem *ItemArray2,   /* [IN] */
      int ItemCount2,              /* [IN] */
      char *ErrorString);          /* [OUT] */

Description:
  DocStructUpdate() performs an index update based on the incoming ArgString parameters and DocUpdateItem{} data.

Definition of structure DocUpdateItem:

  typedef struct DocUpdateItem{
      int duidItemType;            /* item type */
      int duidPassThruSize;        /* size of item passthru tag */
      char *duidPassThruPointer;   /* pointer to item passthru tag */
      int duidItemSize;            /* size of item text */
      char *duidItemPointer;       /* pointer to the item text */
      char *duidReserved1;         /* reserved for internal use */
      char *duidReserved2;         /* reserved for internal use */
  }DocUpdateItem;

  enum{                            /* enum{} for duidItemType */
      DocUpdateItemTypeText = 1,
      DocUpdateItemTypeDoc = 2,
      DocUpdateItemTypeBlob = 3
  };

Description of Parameter DocUpdateHandle:
  DocUpdateHandle is the non-zero handle returned by DocStructUpdateInit().

Description of Parameter ParserFlags:
  These are the same as the DetailMode parameter for DocParseSetDetailParams().

Description of Parameter *ArgString:
  ArgString is an ascii string in the form:
    <TableName> <recno> <mode1> <KeyName1> [<mode2> <KeyName2> ...]
  <mode> is
    WNraw  - Pool None Raw
    WNrank - Pool None Rank
    WSraw  - Pool Stem Raw
    WSrank - Pool Stem Rank
    WYraw  - Pool Synonym Raw
    WYrank - Pool Synonym Rank
    DSraw  - Derived Stem Raw
    DSrank - Derived Stem Rank
    DYraw  - Derived Synonym Raw
    DYrank - Derived Synonym Rank
    DXraw  - Derived Soundex Raw
    DXrank - Derived Soundex Rank
    DMraw  - Derived MetaPhone Raw
    DMrank - Derived MetaPhone Rank
    PROX1  - Proximity Level 1
    PROX2  - Proximity Level 2
    PROX3  - Proximity Level 3
    PROX4  - Proximity Level 4
    PROX5  - Proximity Level 5

Description of Parameter ItemArray1:
  ItemArray1 is an array of DocUpdateItem{} structures defining the "old" value documents.

Description of Parameter ItemCount1:
  ItemCount1 is the number of items in ItemArray1.

Description of Parameter ItemArray2:
  ItemArray2 is an array of DocUpdateItem{} structures defining the "new" value documents.

Description of Parameter ItemCount2:
  ItemCount2 is the number of items in ItemArray2.

Description of Parameter *ErrorString:
  Refer to paragraph I.
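
As a closing illustration, the word-forming rule from paragraph III.A and the Proximity Level example from paragraph IV.E can be sketched in a few lines of C. This is a minimal sketch, not the code in DocWordCount.cpp: SketchNextWord(), SketchEmitPairs() and SKETCH_MAX_WORD are hypothetical names, and the sketch leaves out the special handling of '&' and '.', the stemming options, and the CHARS_IN_PROX record layout. Only MAX_PROXLEVEL is taken from the text above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROXLEVEL 5      /* value quoted from DocCommon.h in IV.E */
#define SKETCH_MAX_WORD 64   /* hypothetical buffer size for callers  */

/* Copies the next word at *cursor into out, upper-cased and with any
   apostrophes dropped, then advances *cursor past it.
   Returns 1 if a word was produced, 0 at the end of the text. */
int SketchNextWord(const char **cursor, char *out, size_t outSize)
{
    const char *p = *cursor;
    size_t n = 0;

    /* skip word breaks (and stray apostrophes) before the word */
    while (*p != '\0' && !isalnum((unsigned char)*p))
        p++;
    /* an apostrophe does not end the word, but it is not stored */
    while (*p != '\0' && (isalnum((unsigned char)*p) || *p == '\'')) {
        if (*p != '\'' && n + 1 < outSize)
            out[n++] = (char)toupper((unsigned char)*p);
        p++;
    }
    if (outSize > 0)
        out[n] = '\0';
    *cursor = p;
    return n > 0;
}

/* Appends one "first second" line to out for every pair of words whose
   distance is between 1 and level. Returns the number of pairs written;
   stops early if out is too small for the next pair. */
int SketchEmitPairs(const char *words[], int wordCount, int level,
                    char *out, size_t outSize)
{
    int count = 0;
    size_t used = 0;

    assert(outSize > 0 && level >= 1 && level <= MAX_PROXLEVEL);
    out[0] = '\0';
    for (int i = 0; i < wordCount; i++) {
        for (int d = 1; d <= level && i + d < wordCount; d++) {
            /* text of the pair plus one space and one newline */
            size_t need = strlen(words[i]) + strlen(words[i + d]) + 2;
            if (used + need >= outSize)
                return count;
            used += (size_t)sprintf(out + used, "%s %s\n",
                                    words[i], words[i + d]);
            count++;
        }
    }
    return count;
}
```

Calling SketchNextWord() repeatedly on the text "O'Brien runs" yields OBRIEN and then RUNS, and SketchEmitPairs() with the four words and Proximity Level 2 from paragraph IV.E writes the same five pairs, in the same order, as listed there.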
Copyright © 2019, WhamTech, Inc. All rights reserved. This
document is provided for information purposes only and the contents hereof are
subject to change without notice. Names may be
trademarks of their respective owners.