Text Search Indexes API
Version 7.11
 
     Building and Maintaining Text Search Indexes
                   API Definition
                    Version 0.23
                  September 14, 2006
 
   The name of the Text Search API dll is WtDocUtil.dll.
 
   The exposed Text Search API contains three sets of interfaces:
      1) DocParse() - parse documents
      2) DocStruct() - create initial structure
      3) DocStructUpdate() - update structure
 
   The requisite #include files for the interface are:
      #include "DocCommon.h"
      #include "DocParse.h"
      #include "DocStruct.h"

I.   Universal definitions
II.  Registry Settings
III. Comments from source code:
IV.  DocParse()
   A. void DocParseInit()
   B. void DocParseTerm()
   C. void DocParseSetCancelCb()
   D. void DocParseSetRankParams()
   E. DocParseSetDetailParams()
   F. DocParseSetScoreBias()
   G. DocParseSetWriteUrlFlag()
   H. DocParseSetUpdateParams()
   I. DocParseSetProcessMode()
   J. DocParseGetResult()
   K. void DocParseFile()
   L. void DocParseMem()
   M. void DocParseBucket()
V.    DocParse() - Relevancy
   A. void DocParseSetRelevancyParams()
   B. void DocParseRelevancyMem()
   C. void DocParseRelevancyGetResult()
   D. DocParseRelevancyAbsGetResult()
VI.   DocParser()
   A. DocParserInit()
   B. DocParserTerm()
   C. DocParserSetCancelCb()
   D. DocParser()
   E. DocParserAbs()
   F. DocParserCreateResult()
   G. DocParserDestroyResult()
   H. DocParserDupResult()
   I. DocParserResizeResult()
   J. DocParserGrowResult()
   K. DocDetailCreateResult()
   L. DocDetailDestroyResult()
   M. DocDetailResizeResult()
   N. DocDetailGrowResult()
   O. DocTokenArrayCreateResult()
   P. DocTokenArrayDestroyResult()
VII.  DocStruct()
   A. void DocStructInit()
   B. void DocStructTerm()
   C. void DocStructFresh()
   D. void DocStructFromFile()
   E. void DocStructFromFileRD()
VIII. DocStructUpdate()
   A. void DocStructUpdateInit()
   B. void DocStructUpdateTerm()
   C. void DocStructUpdate()
 

I. Universal definitions
 
   A. char *ErrorString
 
      The *ErrorString parameter is included in all Text Search API 
      functions. It is presumed to be defined by the calling program as
         char ErrorString[256] = "\0";
 
      If an API function encounters an error, it stores a descriptive,
      non-null string into *ErrorString. It is the responsibility of
      the calling program to check for this condition and to take
      appropriate steps.
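
      The check described above can be sketched as follows. This is a
      minimal illustration only: ExampleApiCall() is a hypothetical stand-in
      for any Text Search API function and is not part of WtDocUtil.dll;
      CallSucceeded() and RunExample() are likewise names invented for
      this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for any Text Search API function; it is NOT part
   of WtDocUtil.dll. It reports failure by writing text to ErrorString. */
static void ExampleApiCall(int shouldFail, char *ErrorString)
{
    if (shouldFail)
        strcpy(ErrorString, "example failure");
}

/* Returns 1 if the call left ErrorString empty (success), 0 otherwise. */
static int CallSucceeded(const char *ErrorString)
{
    return ErrorString[0] == '\0';
}

/* Starts from an empty buffer, makes the call, then tests the buffer. */
static int RunExample(int shouldFail)
{
    char ErrorString[256] = "\0";

    ExampleApiCall(shouldFail, ErrorString);
    if (!CallSucceeded(ErrorString)) {
        fprintf(stderr, "Text Search API error: %s\n", ErrorString);
        return 0;
    }
    return 1;
}
```

      Starting each call with an empty buffer keeps a message left by an
      earlier call from being mistaken for a new error.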


II. Registry Settings

   This application uses the WordNet subsystem and the WordNet Dictionary. 

   Subsequent to 2005.02.12, the folder containing the WordNet Dictionary 
   files is specified by the registry key:
      HKEY_LOCAL_MACHINE\Software\WhamTech\TextSearch\WordNet

   Prior to 2005.02.12, the folder containing the WordNet Dictionary 
   files was specified by the environment variable:
      WNSEARCHDIR


III. Comments from source code:

   A. From DocWordCount.cpp:
      The rules for forming a word are quite simple: if any character 
      except {A-Z,a-z,0-9,'} is encountered, it is considered a word 
      break. The ' is not considered a word break, but it is not 
      included in the word. For example, O'Brien is indexed as OBRIEN.
    
      The '&' character is considered to be a separate word (except when 
      it occurs within an HTML Entity [ , et al.]).
    
      The '.' character is considered to be a break for proximity pairs.
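
      The word-forming rule can be sketched as a small scanner. This is an
      illustration, not the code in DocWordCount.cpp: the function name
      NextWord() is hypothetical, folding to upper case is an assumption
      drawn from the OBRIEN example, and the special handling of '&', HTML
      entities and '.' is not modeled.

```c
#include <ctype.h>
#include <stddef.h>

/* Extracts the next word from *cursor into out (capacity outSize) and
   advances *cursor past it. Returns 1 if a word was produced, 0 at the
   end of the text. Hypothetical helper, not part of the API. */
static int NextWord(const char **cursor, char *out, size_t outSize)
{
    const char *p = *cursor;
    size_t n = 0;

    if (outSize == 0)
        return 0;

    /* Skip word breaks (and any stray leading apostrophes). */
    while (*p != '\0' && !isalnum((unsigned char)*p))
        p++;

    /* Collect letters and digits; an apostrophe does not end the word,
       but it is not copied into it. */
    while (*p != '\0' && (isalnum((unsigned char)*p) || *p == '\'')) {
        if (*p != '\'' && n + 1 < outSize)
            out[n++] = (char)toupper((unsigned char)*p); /* ASSUMPTION: upper case */
        p++;
    }

    out[n] = '\0';
    *cursor = p;
    return n > 0;
}
```

      Calling NextWord() repeatedly on the same cursor walks through a
      text one word at a time.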
 
   B. From DocRank.cpp:
      i. WordNet Stemming vs Porter Stemming:

         WordNet Stemming (morphing) combines table lookup and computational 
         processing. WordNet Stemming returns the singular case form of 
         nouns and the first person present tense form for verbs. For example:

           nationalizations ==> nationalization
           running          ==> run

         It is quite comprehensive in dealing with special cases such as
         mice/mouse and data/datum. WordNet Stemming requires that the
         source word be in lower case.

         Porter Stemming is a strictly computational process. Porter
         Stemming removes multiple prefixes and suffixes to give the root
         stem. For example:

           nationalizations ==> nation

         Porter Stemming does not attempt to deal with special cases.

         Porter Stemming requires the source word to be in lower case.

      ii. The Synonym-based word weighting works as follows:

         a. The parser (RmaParser.dll) extracts words from a document and 
            stores them in an array of structure items (struct WordDescriptor{}). 
            Each structure item includes the word, multiple attribute counters 
            and flags, and additional temp space for the weighting process.

         b. When the parser has finished collecting words for a document, 
            it calls DocRankWords() in RmaPtos.dll. DocRankWords() 

IV. DocParse()
   
   A. void DocParseInit()
 
      Prototype: (Ref DocParse.h)
         void DocParseInit(
            long *DocParseHandle,          /* [OUT] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocParseInit() creates and initializes a context for the DocParseXxx()
         functions and returns a context handle.
 
      Description of Parameter *DocParseHandle:
         If the function is successful, a non-zero handle is returned
         for use in subsequent DocParseXxx() function calls.
         If the function is not successful, zero is returned.
 
      Description of Parameter *ArgString:
         At present, ArgString is the null string.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   B. void DocParseTerm()
 
      Prototype: (Ref DocParse.h)
         void DocParseTerm(
            long DocParseHandle,           /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocParseTerm() terminates and destroys the DocParseXxx() context.
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   C. void DocParseSetCancelCb()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetCancelCb(
            long DocParseHandle,           /* [IN] */
            DocCancelCb CancelCb,
            long CancelUserWord);
 
      Description:
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter CancelCb: (Ref DocCommon.h)
         CancelCb is a function pointer to the callback function;
         its prototype is defined by the typedef:
            typedef int (DocStdCall *DocCancelCb)(long UserWord);
         If the callback is defined [i.e., if the function pointer is 
         non-zero], it is called periodically by a process. If the 
         callback returns 0, the process continues; if the callback
         returns non-zero, the process terminates.
 
         In this case, the callback is called by DocParseBucket() after  
         processing each document.
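
         A callback that matches the typedef might look like the sketch
         below. The names CancelAfterLimit(), SimulateProcess() and
         g_docsSeen are hypothetical, and DocStdCall is defined empty here
         only so that the sketch compiles without DocCommon.h.
         SimulateProcess() merely imitates the documented polling
         behaviour; it is not DocParseBucket().

```c
#include <stddef.h>

/* ASSUMPTION: DocStdCall is a calling-convention macro supplied by
   DocCommon.h; it is defined empty here so the sketch compiles alone. */
#ifndef DocStdCall
#define DocStdCall
#endif

typedef int (DocStdCall *DocCancelCb)(long UserWord);

static long g_docsSeen = 0;   /* hypothetical progress counter */

/* Hypothetical callback: UserWord is used as a document limit. Returns
   non-zero (terminate) once that many documents have been processed. */
static int DocStdCall CancelAfterLimit(long UserWord)
{
    g_docsSeen++;
    return g_docsSeen >= UserWord;
}

/* Imitates a long-running process: the callback is polled after each
   document, and a non-zero return stops the loop. Returns the number
   of documents processed. */
static int SimulateProcess(int totalDocs, DocCancelCb CancelCb,
                           long CancelUserWord)
{
    int done = 0;

    while (done < totalDocs) {
        done++;                               /* "process" one document */
        if (CancelCb != NULL && CancelCb(CancelUserWord) != 0)
            break;
    }
    return done;
}
```

         Passing a zero function pointer disables polling, so the loop
         then runs to completion.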
   
   D. void DocParseSetRankParams()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetRankParams(
            long DocParseHandle,           /* [IN] */
            int RankMode,                  /* [IN] */
            char *OutputMetaFile,          /* [IN] */
            char *OutputInfoFile,          /* [IN] NULL ==> no info file */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocParseSetRankParams() passes rank-related parameters. 
         The sequence of operations is:
         a) DocParseInit()
         b) 1 or more calls to DocParseSetRankParams()
         c) DocParseSetDetailParams()
         d) DocParseBucket()
         e) DocParseTerm()
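
         The sequence can be sketched as below. The prototypes are copied
         from this document, but the function bodies are stand-in stubs so
         that the sketch builds without WtDocUtil.dll. The detail step is
         shown with DocParseSetDetailParams() as documented in section E.
         ParseOneBucket(), the trace buffer, the file names and the choice
         of DetailTag "PR" are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Flag values copied from the RankMode and DetailMode tables in this document. */
#define DocRankFlPoolNone     0x00
#define DocRankFlPoolWn       0x02
#define DocParserFlDetailRaw  0x0100

/* Stand-in stubs: each one only records that it was called. A real
   program would include DocParse.h and link the DLL instead. */
static char g_trace[16];
static void Trace(char c)
{
    size_t n = strlen(g_trace);
    if (n + 1 < sizeof g_trace) { g_trace[n] = c; g_trace[n + 1] = '\0'; }
}
static void DocParseInit(long *DocParseHandle, char *ArgString, char *ErrorString)
{ (void)ArgString; (void)ErrorString; *DocParseHandle = 1; Trace('I'); }
static void DocParseSetRankParams(long DocParseHandle, int RankMode,
    char *OutputMetaFile, char *OutputInfoFile, char *ErrorString)
{ (void)DocParseHandle; (void)RankMode; (void)OutputMetaFile;
  (void)OutputInfoFile; (void)ErrorString; Trace('R'); }
static void DocParseSetDetailParams(long DocParseHandle, int DetailMode,
    char *OutputDetailFile, char *DetailTag, char *ErrorString)
{ (void)DocParseHandle; (void)DetailMode; (void)OutputDetailFile;
  (void)DetailTag; (void)ErrorString; Trace('D'); }
static void DocParseBucket(long DocParseHandle, char *BucketFileName, char *ErrorString)
{ (void)DocParseHandle; (void)BucketFileName; (void)ErrorString; Trace('B'); }
static void DocParseTerm(long DocParseHandle, char *ErrorString)
{ (void)DocParseHandle; (void)ErrorString; Trace('T'); }

/* Runs steps a) through e); returns 1 if no step reported an error. */
static int ParseOneBucket(char *BucketFileName)
{
    char ErrorString[256] = "\0";
    char noArgs[] = "";
    char metaNone[] = "rank_none.meta";   /* hypothetical file names */
    char metaWn[] = "rank_wn.meta";
    char proxFile[] = "prox.meta";
    char tag[] = "PR";                    /* ASSUMPTION: tag for raw pairs */
    long h = 0;
    int ok;

    DocParseInit(&h, noArgs, ErrorString);                        /* a) */
    if (h == 0 || ErrorString[0] != '\0')
        return 0;

    DocParseSetRankParams(h, DocRankFlPoolNone, metaNone, NULL, ErrorString); /* b) */
    if (ErrorString[0] == '\0')
        DocParseSetRankParams(h, DocRankFlPoolWn, metaWn, NULL, ErrorString);
    if (ErrorString[0] == '\0')                                   /* c) */
        DocParseSetDetailParams(h, DocParserFlDetailRaw | 2, proxFile, tag, ErrorString);
    if (ErrorString[0] == '\0')                                   /* d) */
        DocParseBucket(h, BucketFileName, ErrorString);

    ok = (ErrorString[0] == '\0');
    if (!ok)
        fprintf(stderr, "parse failed: %s\n", ErrorString);

    ErrorString[0] = '\0';
    DocParseTerm(h, ErrorString);                                 /* e) */
    return ok;
}
```

         The context is terminated even when an earlier step fails, so
         that the handle is not leaked.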
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
         
      Description of Parameter RankMode: (Ref DocParse.h)
         DocRankFlPoolXxx flags cause derivatives to be computed and used
         solely for the purpose of pooling word-score data prior to ranking.
         The base word is not changed; and no additional words are created.
         We call this mode "Stem-Pool Mode".
       
         DocRankFlDerivedXxx flags cause derivatives to be computed to replace
         the base word. Then, word-score data for like words is combined
         and the duplicate word is removed.
         We call this mode "Word-Pool Mode".
       
         #define DocRankFlPoolNone         0x00
         #define DocRankFlPoolWn           0x02
         #define DocRankFlPoolSoundex      0x04
         #define DocRankFlPoolMetaPhone    0x05
         #define DocRankFlPoolWnSyn        0x06
         #define DocRankFlDerivedWn        (DocRankFlPoolWn | 0x08)
         #define DocRankFlDerivedSoundex   (DocRankFlPoolSoundex | 0x08)
         #define DocRankFlDerivedMetaPhone (DocRankFlPoolMetaPhone | 0x08)
         #define DocRankFlDerivedWnSyn     (DocRankFlPoolWnSyn | 0x08)
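
         The relationship between the Pool and Derived values can be
         checked with a few lines of C. The #defines are copied from the
         list above; IsDerivedMode() and PoolMethod() are hypothetical
         helpers, not part of DocParse.h. Note that the pool values 0x05
         and 0x06 are not single bits, so the pool part is best compared
         as a whole value rather than tested bit by bit.

```c
#define DocRankFlPoolNone         0x00
#define DocRankFlPoolWn           0x02
#define DocRankFlPoolSoundex      0x04
#define DocRankFlPoolMetaPhone    0x05
#define DocRankFlPoolWnSyn        0x06
#define DocRankFlDerivedWn        (DocRankFlPoolWn | 0x08)
#define DocRankFlDerivedSoundex   (DocRankFlPoolSoundex | 0x08)
#define DocRankFlDerivedMetaPhone (DocRankFlPoolMetaPhone | 0x08)
#define DocRankFlDerivedWnSyn     (DocRankFlPoolWnSyn | 0x08)

/* Non-zero when RankMode carries the 0x08 bit ("Word-Pool Mode"). */
static int IsDerivedMode(int RankMode)
{
    return (RankMode & 0x08) != 0;
}

/* Strips the 0x08 bit, leaving the underlying DocRankFlPoolXxx value. */
static int PoolMethod(int RankMode)
{
    return RankMode & ~0x08;
}
```

         For example, DocRankFlDerivedWn evaluates to 0x0A and
         DocRankFlDerivedMetaPhone to 0x0D.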
 
      Description of Parameter *OutputMetaFile:
         Contains the name of the meta-data output file for this RankMode.
 
      Description of Parameter *OutputInfoFile:
         Contains the name of the meta-info output file for this RankMode.
         It is NULL if no meta-info output file is desired.
          
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   E. DocParseSetDetailParams()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetDetailParams(
            long DocParseHandle,           /* [IN] */
            int DetailMode,                /* [IN] */
            char *OutputDetailFile,        /* [IN] */
            char *DetailTag,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter DetailMode: (Ref DocParser.h)
         The four low order bits of DetailMode (masked by DocParserFlDetailMask)
         specify the Proximity Level. This is the maximum distance 
         between word pairs that are to be emitted. For example, for words:
            Word1 Word2 Word3 Word4
         and for Proximity Level 2, the following pairs are emitted:
            Word1 Word2
            Word1 Word3
            Word2 Word3
            Word2 Word4
            Word3 Word4
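
         The pair list above can be reproduced with two nested loops. The
         following is a minimal sketch, not the library's implementation;
         the function name EmitProxPairs() and its parameters are
         assumptions made for illustration, and the '.' break rule from
         section III is not modeled.

```c
#include <stdio.h>

/* Visits every ordered pair (i, j) with 1 <= j - i <= level, in the same
   order as the example list. Each pair is printed to out when out is
   non-NULL. Returns the number of pairs, or -1 if level is below 1. */
static int EmitProxPairs(const char *const words[], int nWords, int level,
                         FILE *out)
{
    int count = 0;

    if (level < 1)
        return -1;

    for (int i = 0; i < nWords; i++) {
        for (int j = i + 1; j < nWords && j - i <= level; j++) {
            if (out != NULL)
                fprintf(out, "%s %s\n", words[i], words[j]);
            count++;
        }
    }
    return count;
}
```

         For the four words in the example, Proximity Level 2 yields the
         five pairs listed, and Proximity Level 1 yields only the three
         adjacent pairs.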
 
         The maximum permitted Proximity Level is defined symbolically in
         DocCommon.h as:
 
         #define MAX_PROXLEVEL  5   // max proximity for word pair scheme
 
         The number of characters in each member of the pair is defined
         symbolically in DocCommon.h as:
 
         #define CHARS_IN_PROX     4      /* number of chars in Proximity Index */
 
         The remaining DetailMode flags, from DocParser.h, specify
         how the pair is formed:
 
         #define DocParserFlDetailRaw    0x0100   /* use raw word for Prox Pairs */
         #define DocParserFlDetailStem   0x0010   /* use stem for Prox Pairs */
         #define DocParserFlDetailPorter 0x0020   /* use Porter stem for Prox Pairs */
         #define DocParserFlDetailMeta   0x0040   /* use metaphone for Prox Pairs */
         #define DocParserFlDetailPoMeta 0x0080   /* use Porter + metaphone for Prox Pairs */
         #define DocParserFlDetailMask   0x000F
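
         A DetailMode value combines one of the flags above with a
         Proximity Level in the low four bits. The sketch below shows one
         way to build and take apart such a value. The #defines are copied
         from this section; MakeDetailMode(), ProxLevelOf() and
         FormFlagsOf() are hypothetical helpers that do not exist in
         DocParser.h.

```c
#define MAX_PROXLEVEL            5
#define DocParserFlDetailRaw     0x0100
#define DocParserFlDetailStem    0x0010
#define DocParserFlDetailPorter  0x0020
#define DocParserFlDetailMeta    0x0040
#define DocParserFlDetailPoMeta  0x0080
#define DocParserFlDetailMask    0x000F

/* Combines a pair-forming flag with a Proximity Level.
   Returns -1 if the level is outside 1..MAX_PROXLEVEL. */
static int MakeDetailMode(int formFlag, int proxLevel)
{
    if (proxLevel < 1 || proxLevel > MAX_PROXLEVEL)
        return -1;
    return formFlag | proxLevel;
}

/* Extracts the Proximity Level from the four low order bits. */
static int ProxLevelOf(int DetailMode)
{
    return DetailMode & DocParserFlDetailMask;
}

/* Extracts the pair-forming flags (everything above the low four bits). */
static int FormFlagsOf(int DetailMode)
{
    return DetailMode & ~DocParserFlDetailMask;
}
```

         For example, Porter-stem pairs at Proximity Level 3 give the
         value 0x0023.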
 
      Description of Parameter OutputDetailFile:
         Contains the name of the meta-data output file for all Prox Pairs.
         The Prox data in each record of this flat file is in the form:
            Byte 0: Prox Level for this pair; (ascii '1' thru '5');
            Byte 1-4: first half of Prox Pair
            Byte 5-8: second half of Prox Pair
         Each record also includes additional meta data such as TB recno,
         TB table name, etc.
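
         The nine bytes of Prox data can be laid out as in the sketch
         below. It is an illustration of the byte positions only.
         FormatProxData() and CopyHalf() are hypothetical names. The
         treatment of words that are shorter or longer than CHARS_IN_PROX
         characters (space padding and truncation) is an assumption; this
         document does not state what the library does in those cases.

```c
#include <string.h>

#define CHARS_IN_PROX 4   /* from DocCommon.h, as quoted in this document */

/* Copies at most CHARS_IN_PROX characters of word into dst.
   ASSUMPTION: shorter words are padded with spaces, longer words are
   truncated; the real library may behave differently. */
static void CopyHalf(char *dst, const char *word)
{
    size_t n = strlen(word);

    if (n > CHARS_IN_PROX)
        n = CHARS_IN_PROX;
    memset(dst, ' ', CHARS_IN_PROX);
    memcpy(dst, word, n);
}

/* Fills rec[0..8]; rec must hold at least 9 bytes and is not
   null-terminated by this function. Returns 1 on success, 0 if the
   level is outside 1..5. */
static int FormatProxData(char *rec, int level, const char *first,
                          const char *second)
{
    if (level < 1 || level > 5)
        return 0;
    rec[0] = (char)('0' + level);                 /* byte 0: '1' thru '5' */
    CopyHalf(rec + 1, first);                     /* bytes 1-4 */
    CopyHalf(rec + 1 + CHARS_IN_PROX, second);    /* bytes 5-8 */
    return 1;
}
```

         The additional meta data that follows the Prox data in each
         record is not covered by this sketch.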
 
      Description of Parameter DetailTag:
         This is a 2 or 3 character ascii string that duplicates the 
         effective -P option; i.e., "PR", "PPM", ...
         It is used in summary message displays.
          
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   F. DocParseSetScoreBias()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetScoreBias(
            long DocParseHandle,           /* [IN] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         This function defines a constant ranking score for a URL
         based on its domain. If the first N characters of a URL
         match the specified domain, its score is set to the
         specified value.
 
         DocParseSetScoreBias() is an optional call, and it can
         be called multiple times.
 
         The placement of DocParseSetScoreBias() is not significant as
         long as it is after DocParseInit() and before DocParseTerm().
         However, it is meaningless unless DocParseSetWriteUrlFlag()
         is also invoked.
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *ArgString:
         ArgString is an ascii string in the form:
            <Domain> <score>
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
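
      The domain match described for DocParseSetScoreBias() amounts to a
      prefix comparison. The sketch below parses an ArgString in the
      documented "<Domain> <score>" form and applies it to a URL. The
      ScoreBias type, the function names and the buffer size are
      hypothetical, and a case-sensitive comparison is an assumption;
      this document does not say how the library compares characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed "<Domain> <score>" argument. */
typedef struct {
    char domain[181];
    int  score;
} ScoreBias;

/* Parses "<Domain> <score>". Returns 1 on success, 0 otherwise. */
static int ParseScoreBias(const char *ArgString, ScoreBias *out)
{
    return sscanf(ArgString, "%180s %d", out->domain, &out->score) == 2;
}

/* If the first N characters of url equal the domain (N is the domain
   length), stores the bias score in *score and returns 1; otherwise
   leaves *score unchanged and returns 0.
   ASSUMPTION: the comparison is case-sensitive. */
static int ApplyScoreBias(const ScoreBias *bias, const char *url, int *score)
{
    size_t n = strlen(bias->domain);

    if (n > 0 && strncmp(url, bias->domain, n) == 0) {
        *score = bias->score;
        return 1;
    }
    return 0;
}
```

      Because the function can be called multiple times, a caller-side
      model would keep one such entry per call.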
   
   G. DocParseSetWriteUrlFlag()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetWriteUrlFlag(
            long DocParseHandle,           /* [IN] */
            char *DbName,                  /* [IN] */
            char *DbTableName,             /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         This function sets a flag and performs the initialization that causes 
         subsequent calls to DocParseBucket() to write the URL
         that is contained in each document in the bucket file
         to the target data base specified by the DbName parameter.
         The resulting record number overrides the record number
         in the  tag and becomes part of the data that
         is emitted to the meta file(s).
 
         In addition to the URL itself, these fields are also 
         written:
         . Crawl Date
         . BulkFileName
         . AccountID
         . Score
         . UrlHash
 
         The BulkFileName field is stored as "Repository\FileName", 
         with the expectation that this will be interpreted as a
         path relative to the DocIndex data base path.
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *DbName:
         DbName is the full path name of the target data base.
 
      Description of Parameter *DbTableName:
         DbTableName is the table name in the target data base
         where URLs are to be written.
 
         In the current implementation, target data base column
         names are coded as constants. As a result, the schema of
         the target data base must conform to this model:
 
          DG  10000 records
             URL                    X(180)                @ (*) Full name of the URL
             URLHash                X(14)   KEY DISCRETE  @ (*) Hash to URL name
             BulkFileName           X(96)                 @ (*) Bulk file name where the URL resides
             UrlText                MEMO
          :1 UrlContextAlias        X(20)                 @ (*)
             Score                  ID                    @ (*) URL score  (based on URL references)
             AccountID              ID                    @ (*) Account ID related to this domain
             DateSpidered           Date                  @ (*)
             DateModified           Date                  @
             URL_Text               X(16)   KEY VIRTUAL   @ Text search field                   
             URL_TextDS             X(16)   KEY VIRTUAL   @ Text search field                   
             URL_TextDY             X(16)   KEY VIRTUAL   @ Text search field                   
             URL_TextDX             X(16)   KEY VIRTUAL   @ Text search field                   
 
         (*) Required column name and type
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   H. DocParseSetUpdateParams()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetUpdateParams(
            long DocParseHandle,           /* [IN] */
            int ParserFlags,               /* [IN] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocParseSetUpdateParams() changes the mode of operation of the
         DocParseXxx() functions. This call is pertinent only when it
         is used in conjunction with DocParseSetWriteUrlFlag().
 
         The default mode of operation of DocParseXxx() [e.g., DocParseBucket()]
         when DocParseSetUpdateParams() is not in effect is to
         unconditionally write all URLs that are parsed to the target
         data base.
 
         When the DocParseSetUpdateParams() mode is in effect, all URLs
         are tested for existence in the target data base. If a URL exists,
         it is processed in update mode; that is, the DocStructUpdate()
         functions are called internally using the parameters specified
         in the DocParseSetUpdateParams() call.
 
         If the URL does not exist in the target data base, it is processed
         in the default mode.
 
         On return from pertinent DocParseXxx() calls, the caller can
         use the DocParseGetResult() function to determine if any URLs
         were inserted. If this count is greater than zero, the caller
         must call the DocStruct() functions as in the default case.
 
         Also, when the DocParseSetUpdateParams() mode is in effect, 
         the presence of the attribute "delete=true" in the  
         tag causes the URL specified by the attribute "URL=xxx"
         (and by the SeedUrlID attribute, if applicable) to be deleted
         from the DocIndex, along with its text index entries.
         If the URL attribute specifies "URL=all", all URLs for the
         specified SeedUrlID are deleted.
 
      Description of Parameter ParserFlags:
         These are the same as the DetailMode parameter for DocParseSetDetailParams().
 
      Description of Parameter *ArgString:
         ArgString is an ascii string in the form:
            <mode1> <KeyName1> [<mode2> <KeyName2> ...] 
               <mode> is
                  WNraw   - Pool None Raw
                  WNrank  - Pool None Rank
                  WSraw   - Pool Stem Raw
                  WSrank  - Pool Stem Rank
                  WYraw   - Pool Synonym Raw
                  WYrank  - Pool Synonym Rank
                  DSraw   - Derived Stem Raw
                  DSrank  - Derived Stem Rank
                  DYraw   - Derived Synonym Raw
                  DYrank  - Derived Synonym Rank
                  DXraw   - Derived Soundex Raw
                  DXrank  - Derived Soundex Rank
                  DMraw   - Derived MetaPhone Raw
                  DMrank  - Derived MetaPhone Rank
                  PROX1   - Proximity Level 1
                  PROX2   - Proximity Level 2
                  PROX3   - Proximity Level 3
                  PROX4   - Proximity Level 4
                  PROX5   - Proximity Level 5
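
         A caller can assemble this ArgString from mode and key name
         pairs. The sketch below is one possible helper; AppendModeKey()
         and IsKnownMode() are hypothetical names, the key names used with
         it are placeholders chosen by the caller, and exact
         (case-sensitive) matching of the mode literals is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Mode literals copied from the list in this section. */
static const char *const kModes[] = {
    "WNraw", "WNrank", "WSraw", "WSrank", "WYraw", "WYrank",
    "DSraw", "DSrank", "DYraw", "DYrank", "DXraw", "DXrank",
    "DMraw", "DMrank", "PROX1", "PROX2", "PROX3", "PROX4", "PROX5"
};

/* Returns 1 if mode is one of the documented literals. */
static int IsKnownMode(const char *mode)
{
    for (size_t i = 0; i < sizeof kModes / sizeof kModes[0]; i++)
        if (strcmp(mode, kModes[i]) == 0)
            return 1;
    return 0;
}

/* Appends "<mode> <KeyName>" to buf, separated from any earlier pair by
   one space. Returns 1 on success; returns 0 and leaves buf unchanged
   if the mode is unknown or the pair does not fit. */
static int AppendModeKey(char *buf, size_t bufSize, const char *mode,
                         const char *keyName)
{
    size_t used = strlen(buf);
    int n;

    if (!IsKnownMode(mode) || used >= bufSize)
        return 0;
    n = snprintf(buf + used, bufSize - used, "%s%s %s",
                 used ? " " : "", mode, keyName);
    if (n < 0 || (size_t)n >= bufSize - used) {
        buf[used] = '\0';            /* undo a truncated append */
        return 0;
    }
    return 1;
}
```

         The buffer must start out as an empty string before the first
         pair is appended.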
   
   I. DocParseSetProcessMode()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetProcessMode(
            long DocParseHandle,           /* [IN] */
            char *ArgString,
            char *ErrorString);
 
      Description:
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *ArgString:
         ArgString is an ascii string in the form:
            <mode>
               <mode> is an ascii literal
                  "default"
                  "ECA"
 
   
   J. DocParseGetResult()
 
      Prototype: (Ref DocParse.h)
         void DocParseGetResult(
            long DocParseHandle,           /* [IN] */
            int indicator,                 /* [IN] */
            int *result);                  /* [OUT] */
 
      Description:
         DocParseGetResult() returns one of three counters that are
         incremented by the DocParseXxx() functions. 
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter indicator:
         Symbolic Value           Counter
         --------------           -------
         DocParseIndInsertCount   Number of URL records inserted
         DocParseIndUpdateCount   Number of URL records updated
         DocParseIndTotalCount    Total number of URL records processed
 
      Description of Parameter *result:
         The requested counter is returned in *result.
   
   K. void DocParseFile()
 
      Prototype: (Ref DocParse.h)
         void DocParseFile(
            long DocParseHandle,           /* [IN] */
            char *SourceFileName,          /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         Not implemented.
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *SourceFileName:
         SourceFileName is the name of a document file to be parsed.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   L. void DocParseMem()
 
      Prototype: (Ref DocParse.h)
         void DocParseMem(
            long DocParseHandle,           /* [IN] */
            char *SourceMemImage,          /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         Not implemented.
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *SourceMemImage:
         SourceMemImage is the memory image of a document to be parsed.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   M. void DocParseBucket()
 
      Prototype: (Ref DocParse.h)
         void DocParseBucket(
            long DocParseHandle,           /* [IN] */
            char *BucketFileName,          /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocParseBucket() parses, ranks and stores meta-data for a
         specified bucket file. The sequence of operations is:
         a) DocParseInit()
         b) 1 or more calls to DocParseParam()
         c) DocParseBucket()
         d) DocParseTerm()
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *BucketFileName:
         BucketFileName is the name of a bucket file to be parsed.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.

V. DocParse() - Relevancy
   
   A. void DocParseSetRelevancyParams()
 
      Prototype: (Ref DocParse.h)
         void DocParseSetRelevancyParams(
            long DocParseHandle,           /* [IN] */
            int RankMode,                  /* [IN] */
            char *RelevancyParams,         /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocParseSetRelevancyParams() passes the parameters that are 
         used to calculate document relevancy.
         Relevancy parameters are specified as a function of RankMode;
         thus, DocParseSetRelevancyParams() can be called multiple times,
         once for each RankMode that requires a relevancy calculation.
 
         The sequence of operations is:
         a) Initialization 
            . DocParseInit()
            . 1 or more calls to DocParseSetRelevancyParams(); there can
              be up to 16 [MAX_DPRC] calls to DocParseSetRelevancyParams()
              for distinct values of RankMode;
         b) Operation; iteration on
            . DocParseRelevancyMem()
            . 1 or more calls to DocParseRelevancyGetResult() and/or
              DocParseRelevancyAbsGetResult()
         c) Termination
            . DocParseTerm()
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter RankMode: (Ref DocParse.h)
         Refer to description at DocParseSetRankParams;
 
         For DocParseSetRelevancyParams() only, there is another possible
         value for parameter RankMode:
            #define DocRankFlPoolAbs      0x07
         When this value is used, the word list in parameter RelevancyParams
         is actually a list of names; and the relevancy calculation is
         made by DocParseRelevancyAbsGetResult() rather than 
         DocParseRelevancyGetResult().
 
      Description of Parameter RelevancyParams:
         RelevancyParams is a null-terminated, comma-separated (with
         optional quotes ["]) list of words to be used in measuring
         relevancy;
 
         When parameter RankMode is DocRankFlPoolAbs, the word list 
         is actually a list of names; i.e., each item may contain multiple
         words.
 
         In all cases, words are constructed from the characters
            [0-9] [A-Z] [’]
         All other characters are treated as separators. When a word contains
         the [’] character it is rendered both with and without the [’]; e.g.,
            w’ord is rendered as both w’ord and as word;
            ’word is rendered as both ’word and as word;
            word’ is rendered as both word’ and as word;
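
         The two renderings can be produced by keeping the word as given
         and deriving a second copy without the apostrophe. The sketch
         below is illustrative only: StripApostrophes() is a hypothetical
         name, and it handles the ASCII apostrophe ('), which is assumed
         here to be the character intended by [’] above.

```c
#include <stddef.h>

/* Copies word to out (capacity outSize) with every ASCII apostrophe
   removed. Returns 1 if at least one apostrophe was removed, meaning the
   word has two renderings (the original and out); returns 0 if the word
   contains no apostrophe, in which case out equals the original. */
static int StripApostrophes(const char *word, char *out, size_t outSize)
{
    size_t n = 0;
    int removed = 0;

    if (outSize == 0)
        return 0;

    for (; *word != '\0'; word++) {
        if (*word == '\'') {
            removed = 1;
            continue;
        }
        if (n + 1 < outSize)
            out[n++] = *word;
    }
    out[n] = '\0';
    return removed;
}
```

         A word list builder would add both the original word and the
         stripped copy whenever the return value is 1.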
   
   B. void DocParseRelevancyMem()
 
      Prototype: (Ref DocParse.h)
         void DocParseRelevancyMem(
            long DocParseHandle,           /* [IN] */
            char *SourceMemImage,          /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter *SourceMemImage:
         SourceMemImage is the memory image of a document to be parsed.
   
   C. void DocParseRelevancyGetResult()
 
      Prototype: (Ref DocParse.h)
         void DocParseRelevancyGetResult(
            long DocParseHandle,           /* [IN] */
            int RankMode,                  /* [IN] */
            int *result);                  /* [OUT] */
 
      Description:
         DocParseRelevancyGetResult() returns a relevancy score for the
         document specified in the most recent DocParseRelevancyMem() call.
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter RankMode: (Ref DocParse.h)
         Refer to description at DocParseSetRankParams;
 
         DocParseRelevancyGetResult() accepts all values of RankMode
         except DocRankFlPoolAbs.
 
      Description of Parameter *result:
         An integer between 0 and 100 is returned; this is the normalized
         count of hits of words in the most recent document from
         DocParseRelevancyMem().
         In the case of an error, *result is -1.
   
   D. DocParseRelevancyAbsGetResult()
 
      Prototype: (Ref DocParse.h)
         void DocParseRelevancyAbsGetResult(
            long DocParseHandle,           /* [IN] */
            int RankMode,                  /* [IN] */
            int *resultCount,              /* [OUT] */
            char *resultString,            /* [OUT] */
            int resultStringLength);       /* [IN] */
 
      Description:
         DocParseRelevancyAbsGetResult() returns a hit count for the
         document specified in the most recent DocParseRelevancyMem() call.
 
      Description of Parameter DocParseHandle:
         DocParseHandle is the non-zero handle returned by DocParseInit().
 
      Description of Parameter RankMode: (Ref DocParse.h)
         DocParseRelevancyAbsGetResult() accepts only the RankMode value
         DocRankFlPoolAbs.
 
      Description of Parameter *resultCount:
         An integer between 0 and N is returned; N is the number of words
         in the DocParseSetRelevancyParams() RelevancyParams parameter.
         *resultCount is the count of hits of words in the most recent
         document from DocParseRelevancyMem().
         In the case of an error, *resultCount is -1.
 
      Description of Parameter *resultString:
         char resultString[N] is a buffer defined by the caller. The maximum
         length N is given by the fifth parameter, resultStringLength.
         Each hit is copied to *resultString in a quote-delimited (")
         comma separated (,) format.
 
      Description of Parameter resultStringLength:
         resultStringLength is the maximum number of characters [including
         the null terminator] that can be stored in *resultString.
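
         A caller can split the returned string back into individual hits
         as sketched below. NextHit() is a hypothetical helper, not part
         of the API. The sketch assumes that a hit never contains an
         embedded quote character; this document does not say whether
         quotes inside a name are escaped.

```c
#include <stddef.h>
#include <string.h>

/* Copies the next quote-delimited item found at or after *cursor into
   out (capacity outSize) and advances *cursor past its closing quote.
   Returns 1 if an item was copied, 0 when no complete item remains.
   ASSUMPTION: items contain no embedded quote characters. */
static int NextHit(const char **cursor, char *out, size_t outSize)
{
    const char *open = strchr(*cursor, '"');
    const char *close;
    size_t len;

    if (open == NULL || outSize == 0)
        return 0;
    close = strchr(open + 1, '"');
    if (close == NULL)
        return 0;                     /* item cut off by the buffer limit */

    len = (size_t)(close - open - 1);
    if (len >= outSize)
        len = outSize - 1;            /* truncate to fit the caller's buffer */
    memcpy(out, open + 1, len);
    out[len] = '\0';
    *cursor = close + 1;
    return 1;
}
```

         An item whose closing quote is missing is skipped, which covers
         the case where the result was cut short by resultStringLength.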
 

VI. DocParser()
   
   A. void DocParserInit()
 
      Prototype: (Ref DocParser.h)
         void DocParserInit(
            long *DocParserHandle,
            char *ParamString,
            char *ErrorString,
            char *StopWordList);
 
      [Documentation for this function is not complete.]
 
   
   B. void DocParserTerm()
 
      Prototype: (Ref DocParser.h)
         void DocParserTerm(
            long DocParserHandle,
            char *ErrorString);
 
      [Documentation for this function is not complete.]
 
   
   C. void DocParserSetCancelCb()
 
      Prototype: (Ref DocParser.h)
         void DocParserSetCancelCb(
            long DocParserHandle,
            DocCancelCb CancelCb,
            long CancelUserWord);
 
      [Documentation for this function is not complete.]
 
   
   D. void DocParser()
 
      Prototype: (Ref DocParser.h)
         void DocParser(
            long DocParserHandle,
            long ParserFlags,
            char *SourceFileName,       /* can be either file or memory image */
            char *SourceMemImage,       /* if SourceMemImage == NULL, use SourceFileName */
            DocParserResult **dprRef,
            DocDetailResult **ddrRef,   /* may be NULL */
            char *ErrorString);
 
      [Documentation for this function is not complete.]
 
   
   E. void DocParserAbs()
 
      Prototype: (Ref DocParser.h)
         void DocParserAbs(
            long DocParserHandle,
            long ParserFlags,
            char *SourceFileName,       /* can be either file or memory image */
            char *SourceMemImage,       /* if SourceMemImage == NULL, use SourceFileName */
            DocParserResult **dprRef,
            DocDetailResult **ddrRef,   /* may be NULL */
            DocTokenArrayResult **dtrRef,
            char *ErrorString);
 
      [Documentation for this function is not complete.]
 
   
   F. DocParserResult *DocParserCreateResult()
 
      Prototype: (Ref DocParser.h)
         DocParserResult *DocParserCreateResult(int maxwords);
 
      [Documentation for this function is not complete.]
 
   
   G. void DocParserDestroyResult()
 
      Prototype: (Ref DocParser.h)
         void DocParserDestroyResult(
            DocParserResult *dpr,
            char *ErrorString);
 
      [Documentation for this function is not complete.]
 
   
   H. void DocParserDupResult()
 
      Prototype: (Ref DocParser.h)
         void DocParserDupResult(
            DocParserResult *dpr,
            DocParserResult **dprDupRef,
            char *ErrorString);
 
      [Documentation for this function is not complete.]
 
   
   I. int DocParserResizeResult()
 
      Prototype: (Ref DocParser.h)
         int DocParserResizeResult(
            /* value 0 - failure; value 1 - success */
            DocParserResult *dpr,
            int NewSize);               /* new dpr->dprWordArraySize */
 
      [Documentation for this function is not complete.]
 
   
   J. int DocParserGrowResult()
 
      Prototype: (Ref DocParser.h)
         int DocParserGrowResult(
            /* value 0 - failure; value 1 - success */
            DocParserResult *dpr);
 
      [Documentation for this function is not complete.]
 
   
   K. DocDetailResult *DocDetailCreateResult()
 
      Prototype: (Ref DocParser.h)
         DocDetailResult *DocDetailCreateResult(int maxpairs);
 
      [Documentation for this function is not complete.]
 
   
   L. void DocDetailDestroyResult()
 
      Prototype: (Ref DocParser.h)
         void DocDetailDestroyResult(
            DocDetailResult *ddr,
            char *ErrorString);
 
      [Documentation for this function is not complete.]
 
   
   M. int DocDetailResizeResult()
 
      Prototype: (Ref DocParser.h)
         int DocDetailResizeResult(
            /* value 0 - failure; value 1 - success */
            DocDetailResult *ddr,
            int NewSize);               /* new ddr->ddrPairArraySize */
 
      [Documentation for this function is not complete.]
 
   
   N. int DocDetailGrowResult()
 
      Prototype: (Ref DocParser.h)
         int DocDetailGrowResult(
            /* value 0 - failure; value 1 - success */
            DocDetailResult *ddr);
 
      [Documentation for this function is not complete.]
 
   
   O. DocTokenArrayResult *DocTokenArrayCreateResult()
 
      Prototype: (Ref DocParser.h)
         DocTokenArrayResult *DocTokenArrayCreateResult(int maxint);
 
      [Documentation for this function is not complete.]
 
   
   P. void DocTokenArrayDestroyResult()
 
      Prototype: (Ref DocParser.h)
         void DocTokenArrayDestroyResult(
            DocTokenArrayResult *dtr,
            char *ErrorString);
 
      [Documentation for this function is not complete.]
 
 

VII. DocStruct()
   
   A. void DocStructInit()
 
      Prototype: (Ref DocStruct.h)
         void DocStructInit(
            long *DocStructHandle,         /* [OUT] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructInit() creates and initializes a context for the
         DocStructXxx() functions and returns a context handle.
 
      Description of Parameter *DocStructHandle:
         If the function succeeds, a non-zero handle is returned
         for use in subsequent DocStructXxx() function calls.
         If the function fails, zero is returned.
 
      Description of Parameter *ArgString:
         ArgString is an ASCII string in the form:
            <DbName> [TOKENMAP OFF]
               <DbName> is the name of the target database.
               TOKENMAP OFF is a clause that is required only
                  if <DbName> is token mapped.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
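 
      Example (illustrative sketch):
         The ArgString form above can be assembled with a small helper
         before calling DocStructInit(). BuildStructInitArgs() and the
         database name "MyDb" are hypothetical; neither is part of
         WtDocUtil.dll.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of WtDocUtil.dll): formats the
   DocStructInit() ArgString "<DbName> [TOKENMAP OFF]".
   Returns 1 on success, 0 if the buffer is too small. */
static int BuildStructInitArgs(char *buf, size_t size,
                               const char *dbName, int tokenMapOff)
{
    int n;

    assert(buf != NULL && dbName != NULL && size > 0);
    n = snprintf(buf, size, "%s%s", dbName,
                 tokenMapOff ? " TOKENMAP OFF" : "");
    /* snprintf reports the length it wanted; reject truncated output */
    return (n >= 0 && (size_t)n < size) ? 1 : 0;
}

/* Intended use, assuming DocStruct.h is included and ErrorString is
   prepared as described in paragraph I:

      char args[256];
      long handle = 0;
      if (BuildStructInitArgs(args, sizeof args, "MyDb", 1)) {
         DocStructInit(&handle, args, ErrorString);
         if (handle == 0) { ... initialization failed ... }
      }
*/
```

         Checking the handle for zero after the call follows the
         parameter description above; the 256-byte buffer size is an
         arbitrary choice for the sketch.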
   
   B. void DocStructTerm()
 
      Prototype: (Ref DocStruct.h)
         void DocStructTerm(
            long DocStructHandle,
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructTerm() terminates and destroys the DocStructXxx() context.
 
      Description of Parameter DocStructHandle:
         DocStructHandle is the non-zero handle returned by DocStructInit().
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   C. void DocStructFresh()
 
      Prototype: (Ref DocStruct.h)
         void DocStructFresh(
            long DocStructHandle,          /* [IN] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructFresh() performs a DESTRUCT on the specified index(es).
 
      Description of Parameter DocStructHandle:
         DocStructHandle is the non-zero handle returned by DocStructInit().
 
      Description of Parameter *ArgString:
         ArgString is an ASCII string in the form:
            <KeyName1> [<KeyName2> ...]
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
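 
      Example (illustrative sketch):
         The key-name list for DocStructFresh() is a space-separated
         string. BuildFreshArgs() below is a hypothetical helper that
         joins an array of key names into that form; the key names used
         with it are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of WtDocUtil.dll): joins key names
   with single spaces into "<KeyName1> [<KeyName2> ...]".
   Returns 1 on success, 0 if no names were given or the buffer
   is too small. */
static int BuildFreshArgs(char *buf, size_t size,
                          const char *const *keyNames, int count)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && size > 0);
    if (keyNames == NULL || count <= 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < count; i++) {
        size_t len = strlen(keyNames[i]);
        size_t sep = (i > 0) ? 1 : 0;

        /* room for separator, name, and terminating NUL */
        if (used + sep + len + 1 > size)
            return 0;
        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, keyNames[i], len + 1);
        used += len;
    }
    return 1;
}
```

         The result would be passed as ArgString to DocStructFresh()
         together with the handle returned by DocStructInit().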
   
   D. void DocStructFromFile()
 
      Prototype: (Ref DocStruct.h)
         void DocStructFromFile(
            long DocStructHandle,          /* [IN] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructFromFile() performs a STRUCTURE using the specified
         meta-data file; it uses a default RD that is consistent
         with DocParse() output.
 
      Description of Parameter DocStructHandle:
         DocStructHandle is the non-zero handle returned by DocStructInit().
 
      Description of Parameter *ArgString:
         ArgString is an ASCII string in the form:
            <Mode> <KeyName> <MetaFileName> [<table-name-literal>]
               <Mode> is {RAW | RANK | PROXn}
               <MetaFileName> is the name of one or more meta-data files;
                  multiple file names are separated by '+';
                  file names with special characters are enclosed in
                  double quotes (") or single quotes (').
                  [The buffer that holds the file names is currently
                  1024 bytes; that limit can be raised if it becomes
                  a problem.]
               <table-name-literal> is the name of the table (enclosed
                  in single quotes) that contains <KeyName>. When <KeyName>
                  is local to a single table, the table name may be specified
                  either as a literal in ArgString or in the meta-data.
                  If <KeyName> is global to multiple tables, the table name
                  can only be passed in the meta-data.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
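 
      Example (illustrative sketch):
         The following shows one way to build the DocStructFromFile()
         ArgString: meta-file names are joined with '+', and the optional
         table name is wrapped in single quotes. JoinMetaFiles(),
         BuildFromFileArgs(), and all file, key, and table names are
         hypothetical. The sketch assumes the 1024-byte file-name limit
         includes the terminating NUL, which the text above does not state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define META_NAMES_MAX 1024   /* file-name buffer size stated above */

/* Hypothetical helper: joins meta-file names with '+'.
   Returns 1 on success, 0 if the result does not fit in 'size'
   bytes (terminating NUL included). */
static int JoinMetaFiles(char *buf, size_t size,
                         const char *const *names, int count)
{
    size_t used = 0;
    int i;

    assert(buf != NULL && names != NULL && size > 0);
    if (count <= 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        size_t sep = (i > 0) ? 1 : 0;

        if (used + sep + len + 1 > size)
            return 0;
        if (sep)
            buf[used++] = '+';
        memcpy(buf + used, names[i], len + 1);   /* copies the NUL too */
        used += len;
    }
    return 1;
}

/* Hypothetical helper: formats
   "<Mode> <KeyName> <MetaFileName> [<table-name-literal>]".
   tableName may be NULL; when given, it is wrapped in single quotes. */
static int BuildFromFileArgs(char *buf, size_t size, const char *mode,
                             const char *keyName, const char *metaFiles,
                             const char *tableName)
{
    int n;

    assert(buf != NULL && mode != NULL && keyName != NULL);
    assert(metaFiles != NULL && size > 0);
    if (strlen(metaFiles) >= META_NAMES_MAX)
        return 0;   /* assumption: the 1024 bytes include the NUL */
    if (tableName != NULL)
        n = snprintf(buf, size, "%s %s %s '%s'",
                     mode, keyName, metaFiles, tableName);
    else
        n = snprintf(buf, size, "%s %s %s", mode, keyName, metaFiles);
    return (n >= 0 && (size_t)n < size) ? 1 : 0;
}
```

         The sketch does not quote file names that contain special
         characters; callers would need to add the enclosing quotes
         themselves before joining.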
   
   E. void DocStructFromFileRD()
 
      Prototype: (Ref DocStruct.h)
         void DocStructFromFileRD(
            long DocStructHandle,          /* [IN] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructFromFileRD() performs a STRUCTURE using the specified
         meta-data file; it uses a user-specified RD.
 
      Description of Parameter DocStructHandle:
         DocStructHandle is the non-zero handle returned by DocStructInit().
 
      Description of Parameter *ArgString:
         ArgString is an ASCII string in the form:
            <Mode> <KeyName> <MetaFileName> <RDname> <RD-KeyName> <RD-recno> <RD-TableName> [<RD-ColumnName> <RD-dbname>]
            <Mode> <KeyName> <MetaFileName> <RDname> <RD-KeyName> <RD-recno> <table-name-literal> [<RD-ColumnName> <RD-dbname>]
               <Mode> is {RAW | RANK | PROXn}
               <MetaFileName> is the name of one or more meta-data files;
                  multiple file names are separated by '+';
                  file names with special characters are enclosed in
                  double quotes (") or single quotes (').
                  [The buffer that holds the file names is currently
                  1024 bytes; that limit can be raised if it becomes
                  a problem.]
               <table-name-literal> is the name of the table (enclosed
                  in single quotes) that contains <KeyName>. When <KeyName>
                  is local to a single table, the table name may be specified
                  either as a literal in ArgString or in the meta-data.
                  If <KeyName> is global to multiple tables, the table name
                  can only be passed in the meta-data.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
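 
      Example (illustrative sketch):
         Both DocStructFromFile() and DocStructFromFileRD() take a <Mode>
         of RAW, RANK, or PROXn. A caller may want to validate the mode
         before building ArgString. IsStructMode() is a hypothetical
         helper. It assumes n is a single digit from 1 to 5, mirroring
         the PROX1..PROX5 modes listed for DocStructUpdate(), and that
         mode names are case-sensitive; the text above states neither.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check (not part of WtDocUtil.dll): returns 1 if
   'mode' is RAW, RANK, or PROXn, otherwise 0.
   Assumptions: n is one digit in 1..5; comparison is case-sensitive. */
static int IsStructMode(const char *mode)
{
    assert(mode != NULL);
    if (strcmp(mode, "RAW") == 0 || strcmp(mode, "RANK") == 0)
        return 1;
    /* "PROX" followed by exactly one digit 1..5 */
    if (strncmp(mode, "PROX", 4) == 0 &&
        mode[4] >= '1' && mode[4] <= '5' && mode[5] == '\0')
        return 1;
    return 0;
}
```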
 

VIII. DocStructUpdate()
   
   A. void DocStructUpdateInit()
 
      Prototype: (Ref DocStruct.h)
         void DocStructUpdateInit(
            long *DocUpdateHandle,         /* [OUT] */
            char *ArgString,               /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructUpdateInit() creates and initializes a context for the
         DocStructUpdateXxx() functions and returns a context handle.
 
      Description of Parameter *DocUpdateHandle:
         If the function succeeds, a non-zero handle is returned
         for use in subsequent DocStructUpdateXxx() function calls.
         If the function fails, zero is returned.
 
      Description of Parameter *ArgString:
         ArgString contains the name of the target database.
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   B. void DocStructUpdateTerm()
 
      Prototype: (Ref DocStruct.h)
         void DocStructUpdateTerm(
            long DocUpdateHandle,          /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructUpdateTerm() terminates and destroys the DocStructUpdateXxx()
         context.
 
      Description of Parameter DocUpdateHandle:
         DocUpdateHandle is the non-zero handle returned by DocStructUpdateInit().
 
      Description of Parameter *ErrorString:
         Refer to paragraph I.
   
   C. void DocStructUpdate()
 
      Prototype: (Ref DocStruct.h)
         void DocStructUpdate(
            long DocUpdateHandle,          /* [IN] */
            int ParserFlags,               /* [IN] */
            char *ArgString,               /* [IN] */
            DocUpdateItem *ItemArray1,     /* [IN] */
            int ItemCount1,                /* [IN] */
            DocUpdateItem *ItemArray2,     /* [IN] */
            int ItemCount2,                /* [IN] */
            char *ErrorString);            /* [OUT] */
 
      Description:
         DocStructUpdate() performs an index update based on the incoming
         ArgString parameters and DocUpdateItem{} data.
 
      Definition of structure DocUpdateItem:
         typedef struct DocUpdateItem{
            int duidItemType;               /* item type */
            int duidPassThruSize;           /* size of item passthru tag */
            char *duidPassThruPointer;      /* pointer to item passthru tag */
            int duidItemSize;               /* size of item text */
            char *duidItemPointer;          /* pointer to the item text */
            char *duidReserved1;            /* reserved for internal use */
            char *duidReserved2;            /* reserved for internal use */
         }DocUpdateItem;
             
         enum{                              /* enum{} for duidItemType */
            DocUpdateItemTypeText      = 1,
            DocUpdateItemTypeDoc       = 2,
            DocUpdateItemTypeBlob      = 3
         };
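 
      Example (illustrative sketch):
         The structure above might be filled for a plain-text item as
         shown below. The typedef and enum are repeated only so the
         sketch compiles without DocStruct.h; real code would include
         the header instead. InitTextItem() is a hypothetical helper.
         It assumes the size fields are byte counts that exclude the
         terminating NUL and that the reserved fields should be zeroed;
         the text above states neither.

```c
#include <assert.h>
#include <string.h>

/* Copied from the definition above so the sketch is self-contained. */
typedef struct DocUpdateItem{
    int duidItemType;
    int duidPassThruSize;
    char *duidPassThruPointer;
    int duidItemSize;
    char *duidItemPointer;
    char *duidReserved1;
    char *duidReserved2;
}DocUpdateItem;

enum{
    DocUpdateItemTypeText = 1,
    DocUpdateItemTypeDoc  = 2,
    DocUpdateItemTypeBlob = 3
};

/* Hypothetical helper: describes a NUL-terminated text buffer as a
   DocUpdateItemTypeText item. 'tag' is the passthru tag and may be
   NULL. The item only points at the buffers; it does not copy them. */
static void InitTextItem(DocUpdateItem *item, char *text, char *tag)
{
    assert(item != NULL && text != NULL);
    memset(item, 0, sizeof *item);          /* clears reserved fields */
    item->duidItemType = DocUpdateItemTypeText;
    item->duidItemPointer = text;
    item->duidItemSize = (int)strlen(text); /* assumption: NUL excluded */
    item->duidPassThruPointer = tag;
    item->duidPassThruSize = (tag != NULL) ? (int)strlen(tag) : 0;
}
```

         Because the item stores pointers rather than copies, the text
         and tag buffers must stay valid until the DocStructUpdate()
         call that uses the item has returned.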
 
      Description of Parameter DocUpdateHandle:
         DocUpdateHandle is the non-zero handle returned by DocStructUpdateInit().
 
      Description of Parameter ParserFlags:
         These are the same as the DetailMode parameter for DocParseSetDetailParams().
 
      Description of Parameter *ArgString:
         ArgString is an ASCII string in the form:
            <TableName> <recno> <mode1> <KeyName1> [<mode2> <KeyName2> ...] 
               <mode> is
                  WNraw   - Pool None Raw
                  WNrank  - Pool None Rank
                  WSraw   - Pool Stem Raw
                  WSrank  - Pool Stem Rank
                  WYraw   - Pool Synonym Raw
                  WYrank  - Pool Synonym Rank
                  DSraw   - Derived Stem Raw
                  DSrank  - Derived Stem Rank
                  DYraw   - Derived Synonym Raw
                  DYrank  - Derived Synonym Rank
                  DXraw   - Derived Soundex Raw
                  DXrank  - Derived Soundex Rank
                  DMraw   - Derived MetaPhone Raw
                  DMrank  - Derived MetaPhone Rank
                  PROX1   - Proximity Level 1
                  PROX2   - Proximity Level 2
                  PROX3   - Proximity Level 3
                  PROX4   - Proximity Level 4
                  PROX5   - Proximity Level 5
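 
      Example (illustrative sketch):
         The DocStructUpdate() ArgString pairs each mode with a key name
         after the table name and record number. ModeKeyPair and
         BuildUpdateArgs() are hypothetical, as are the table and key
         names used with them. The sketch assumes <recno> is written as
         a decimal integer, which the text above does not state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pairing of one <mode> with one <KeyName>. */
typedef struct {
    const char *mode;      /* e.g. "WSrank" or "PROX2" from the list above */
    const char *keyName;
} ModeKeyPair;

/* Hypothetical helper (not part of WtDocUtil.dll): formats
   "<TableName> <recno> <mode1> <KeyName1> [<mode2> <KeyName2> ...]".
   Returns 1 on success, 0 if there are no pairs or the buffer is
   too small. */
static int BuildUpdateArgs(char *buf, size_t size, const char *tableName,
                           long recno, const ModeKeyPair *pairs, int count)
{
    size_t used;
    int n, i;

    assert(buf != NULL && tableName != NULL && pairs != NULL && size > 0);
    if (count < 1)
        return 0;               /* at least one mode/key pair is required */
    n = snprintf(buf, size, "%s %ld", tableName, recno);
    if (n < 0 || (size_t)n >= size)
        return 0;
    used = (size_t)n;
    for (i = 0; i < count; i++) {
        n = snprintf(buf + used, size - used, " %s %s",
                     pairs[i].mode, pairs[i].keyName);
        if (n < 0 || (size_t)n >= size - used)
            return 0;           /* output would have been truncated */
        used += (size_t)n;
    }
    return 1;
}
```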
 
      Description of Parameter ItemArray1:
         ItemArray1 is an array of DocUpdateItem{} structures defining the
         "old" value documents.
 
      Description of Parameter ItemCount1:
         ItemCount1 is the number of items in ItemArray1.
 
      Description of Parameter ItemArray2:
         ItemArray2 is an array of DocUpdateItem{} structures defining the
         "new" value documents.
 
      Description of Parameter ItemCount2:
         ItemCount2 is the number of items in ItemArray2.
       
      Description of Parameter *ErrorString:
         Refer to paragraph I.
 
               
                   
                     
                    

Copyright © 2019 , WhamTech, Inc.  All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. Names may be trademarks of their respective owners.