Defining and Applying User Transforms
Changing to RTI Mode for Building EIQ Indexes
Designating Key Type for Columns
Designating PRIMARY KEY and FOREIGN KEY Columns
Designating Text Search Fields
'Auto Name Mapping' for Table and Column Names
Creating Indexed Views in EIQ Indexes
Entity Extraction from Text Columns
Appendix A: Column Data Type Mappings
Modifying Global Data Type Map
Modifying Column Data Type Map
Appendix B: Available Column Data Type Mappings.
Appendix C: Available Transformations
Appendix D: RTI Reserved Words
Appendix E: ROWID Column Data Types
The EIQ Server RTI Tool is used to build EIQ Indexes for data sources. The tool connects to data sources; reads and transforms data; and builds clean, standardized indexes for use by EIQ Product servers. The data sources can be relational data sources, such as SQL Server, Oracle, and DB2, or document files of various types including text, html, XML, Word, and Excel formatted files.
The RTI tool includes the following functions:
EIQ Server RTI Tool has two modes: Profile and RTI mode. In Profile mode, users can build profile indexes on data sources in order to analyze the data and test and apply transforms. Users can switch from Profile to RTI mode once they are ready to start building the full EIQ Indexes for the data source.
Note: Profile indexes should not be used for configuring EIQ Product Servers because they have limited indexing structures. Only EIQ Indexes built in RTI mode are valid for EIQ Product configuration.
The RTI tool can build EIQ Indexes for the following data source types:
· Any data source that provides an ODBC or JDBC interface to clients (SQL Server, Oracle, DB2, etc.).
· Various document files (text, html, XML, Word, Excel, etc.)
· RSS Feeds
Accessing JDBC data sources requires WhamTech's Java Gateway server.
EIQ Server RTI Tool presents a wizard-like interface that guides users through several of the initial steps for opening a work session to profile a data source or build EIQ Indexes. Users are asked to select the data source type, enter information required for data source connection, provide the name and path for the EIQ Index, and etc.. Once connected to the data source and the data source metadata is loaded, depending on the data source type, users enter Profile or Index mode. Here, they can select the data source elements (tables and columns or documents) for profiling or building EIQ Indexes.
Once a session is loaded, users can save the session information to the hard drive as map files. They can open saved map files later to continue their work or to make any changes. The map files save work session details, such as the data source connection information; selected schemas, tables, and columns (or document files); index properties; data transforms; and primary key-foreign key relationships. There are two types of map files: Profiling mode uses the extension '.rtip.xml' and Indexing mode uses the extension '.rti.xml'.
To start building data source profiles or indexes, start the EIQ Server RTI Tool from the Start Menu. Select 'Create a new EIQ Index RTI map' from the startup dialog.
Figure 1: RTI startup dialog.
Click 'OK' to configure the data source and EIQ Index information. You now have a choice to connect to the following data sources:
· Relational Data Source (ODBC/JDBC)
· Document Files
· RSS Feeds
Figure 2: Configure the data source and enter index information.
Select the 'Data Source Connection String' option. This will enable the DSN, UID and PWD edit fields, and the 'Build' button.
Select 'Build' and select the 'Machine Data Source' tab or enter an ODBC DSN. Then, enter a UID and PWD if required.
Figure 3: Select the data source.
Click 'OK' and log in to the database server if required.
Figure 4: Enter the password for the data source.
Use or change the default index path and enter the name for the new EIQ Index. Then press 'Next'. If you typed the name of the DSN and did not provide the credentials, a login dialog will appear for you to do so now.
Figure 5: Enter the login information.
Users may find it necessary to change the data type of a data source column while indexing it in an EIQ Index. For example, a data source column of the 'CHAR' type may contain only integer values. In order to improve the query flexibility, users may want to build indexes in the 'INT' type.
The global data type map dialog allows you to globally change the mapping of a data type.
Figure 6: Configure the global data type map.
You can leave the defaults during this step and click 'Finish' to proceed. A progress dialog is displayed while the database schema loads. You can still change the default data type mappings at any time after this point. See section 'Data Type Conversion' for details.
Figure 7: Schema loading is complete.
After the database schema loads, click 'OK'.
Figure 8: New RTI sessions start in 'Profiling' mode.
All new sessions for relational data sources start in Profiling mode. The RTI tool shows the data source metadata information in a tree structure on the left side pane (the data source pane). The right pane (Index pane) will be empty at this point. The RTI tool holds an internal map (Profile map or RTI map) of all the information entered by the user during this session along with data source metadata. Users can start saving the map at any time by clicking 'Save' or 'Save As' in the 'File' menu.
Note: The current mode is displayed in the status bar at the bottom-right of the window. Depending on the size of the window, this may be obscured. |
Expand the data source tree to see the table and column information. Then, select the columns from the left pane and add them to the map on the right side for building indexes. The RTI tool builds indexes for the columns shown in the right pane.
To add a column to the map, double-click on that column. To add all the columns in a table, right-click on the table and select 'Add All'. To remove unwanted columns from the index map, right-click on the unwanted column and select 'Remove'. A whole table can be removed by right clicking on a table and selecting 'Remove'.
Figure 9: Tables and columns added to the index map.
Save the map. In this case, it's a Profile map ('.rtip.xml'). You can now profile the selected data source columns or build an EIQ Index by skipping data profiling.
Figure 10: Configure documents and enter index information.
Select "Document Datasource" as the Data Source. Use or change the default index path, or select 'Browse' to locate the folder. Then enter the index name.
Figure 11: Select document folder and enter an index name.
Select the document types to be indexed. All the documents of the types selected for the specified folder will be displayed in the file system view (left pane).
Figure 12: Selecting document types to index.
Note: To generate indexes for WhamSearch documents ('.TSI' extension), uncheck all other options. This enables the WhamSearch option. Then, select the "Text Search Index" option. See the figure below. |
Figure 12a: Select all the TSI documents to index.
Figure 13: Documents from the selected folder are displayed.
Then, select documents from the document view (left pane) to add to the Index pane (right pane) for building indexes.
To add a document to the index map, double-click on that document. To add all the documents in the documents folder, right-click on the folder and select 'Add All' as shown in Figure 16.
Figure 14: Adding all documents to the index map.
All the documents for the selected document folder have been added to the index map.
Figure 15: Documents RTI index map.
To remove unwanted documents from the index map, right-click on the unwanted document and select 'Remove'. The entire folder can be removed by right-clicking on the folder and selecting 'Remove'.
You can now continue and build an EIQ Index for the document files or from WhamSearch documents.
Figure 16: Configure XML Feeds and enter index information.
Select the 'XML Feeds Folder' option. Then, enter the path to the folder that contains the XML Feed documents to be indexed or click 'Browse'. See Figure 18.
Figure 17: Browse for XML Feeds document folder.
Click 'OK' to use the selected folder.
Figure 18: Enter the index name.
Use or change the default EIQ index path or select 'Browse' to locate the folder. Then, enter the EIQ index name. See Figure 20.
Then, click 'Finish'. The RSS Feeds from the selected folder are displayed in the file system view (left pane).
Figure 19: RSS Feeds from selected folder are displayed.
Then select the RSS feeds from the document view (left pane) to add to the right pane (index pane) for building indexes.
To add an RSS feed to the map, double-click on that feed. To add all the feeds in the folder, right-click on the folder and select 'Add All' as shown below.
Figure 20: Adding all RSS Feed documents to the index map.
All the feeds for the selected folder have been added to the index map.
Figure 21: RSS Feeds RTI index map.
To remove unwanted feeds from the index map, right-click on the unwanted feed and select 'Remove'. The entire feed folder can be removed by right-clicking on the folder and selecting 'Remove'.
You can now build an EIQ Index from the contents of RSS Feed files.
The RTI Tool's data profiling feature allows users to profile selected data in data sources and develop transforms for cleansing and standardizing EIQ Indexes. Profiling mode is only available for relational data sources.
Here, users can build profiles on the selected columns and analyze the data. The RTI tool provides a quick snapshot analysis of selected columns including unique values, frequency distribution, top and bottom frequency values, and minimum and maximum values for an understanding of the data as it exists in the data source. Users can further analyze the data using Microsoft Excel files generated by the profiling feature.
Many data sources contain issues such as unclean data, missing data, insufficient indexes, non-standardized values, and incompatible formats. Using the results of the analysis, users can identify any data issues at the data source. Users can then apply appropriate data type mappings and transforms and test them using the preview feature. In addition, they can also define additional indexes (rich text search indexes on text columns, indexed views with pre-aggregated and pre-joined data) as needed for building clean, standardized, and richer EIQ Indexes.
Note: The indexes built while in 'Profiling' mode are useful only for profiling purposes. You must change from 'Profiling' mode to 'RTI' mode and build EIQ Indexes useful to EIQ Product servers. |
When a new session for relational data sources is loaded, the default mode is 'Profiling'.
Figure 22: Data Profiling mode.
You will notice the map file extension is '.rtip ' to indicate this file contains information for Profiling mode. The 'Mode' menu shows that 'Profiling" is selected. The status line in the lower-right pane displays 'Profiling' to indicate the current mode.
Figure 23: Status line. Depending on the size of the window, the status line may not appear. If you do not see the status line, make your window bigger.
Select the tables and columns you want to profile and add them to the right pane.
Note: If you do not want to perform data profiling and want to build EIQ Indexes directly, you must change the mode to 'RTI' (Indexing mode) from the Mode menu before continuing. Any of the columns already selected in profiling mode will be automatically selected in RTI mode. |
Select the columns from the left pane and add them to the map on the right side to build indexes. The RTI tool builds indexes for the columns in the right pane.
To add a column to the map, double-click on that column. To add all the columns in a table, right-click on the table and select 'Add All'. To remove unwanted columns from the index map, right-click on the unwanted column and select 'Remove'. A whole table can be removed by right-clicking on a table and selecting 'Remove'.
Note that while in Profiling mode character data type columns of 256 or greater length cannot be added to the index map. The 'Add All' command at the table level would not add any such columns to the map. Any such columns need to be added after switching to RTI mode for building indexes.
The next step is to build profiling indexes and files. These contain profile data for all the columns selected in the profile map.
Figure 24: Create data profile indexes.
Click 'Build Profile' on the toolbar and select 'Full Build'.
Figure 25: Building the data profile.
The build process generates profile indexes and files which are written to a predefined folder as follows.
INSTALL_FOLDER\WhamTech\Profiling\INDEXPATH_INDEXNAME
INSTALL_FOLDER is the base folder where the EIQ Server Suite is installed. INDEXPATH is the subfolder name where the indexes are stored and INDEXNAME is the name of the index used.
A separate data profile file is generated for each column with a file extension of '.CSV'. From the example shown above, the profiling file path is as follows:
INSTALL_FOLDER\WhamTech\Profiling\Data\POLICE
The 'Options' item in the 'Tools' menu allows you to change certain aspects of data profiling. Select 'Options' from the 'Tools' menu.
Figure 26: Data Profiling Options menu.
The 'Options' properties sheet is displayed. Select the 'Data Profiling' tab.
Figure 27: Data Profiling Options.
The option 'Build indexes during "Build Profile"' is always set when creating a new session or loading a session from a file. This value does not persist across sessions. When this option is selected, 'Tools->Build' builds both the data profiling indexes and data profiling files (the '.CSV' files corresponding to each profiling index column used for analysis using Microsoft Excel; see below for more details).
If you have lost the profiling files, but you know the profile indexes are up-to-date and you only need to regenerate the associated data profiling files without rebuilding the indexes, you can un-select this option. This saves time while building the index.
The analyze feature allows users to view the original data before any transforms. Users can analyze in two ways: using the 'Analyze Data' feature within the RTI tool and using Microsoft Excel.
Indexes must be built in order to analyze data.
To analyze data using the RTI tool, select and right-click on the column you want to preview. Select 'Analyze Data' from the context menu.
Figure 28: Previewing data.
A dialog opens below the index pane the displays profile information for the column.
Figure 29: Previewing profiling data for a column.
This is a preview of the data as it exists in the data source before applying any transforms.
To view and analyze profile data in Excel files, select the column you want to analyze from the index pane. Right-click and select 'Analyze Data using Excel' from the context menu.
Figure 30: Analyzing data profiling files.
The profile data files have the following name format: SCHEMANAME_TABLENAME_COLUMNNAME.csv
Microsoft Excel opens to display the corresponding CSV file.
Figure 31: Viewing a data profile file in Excel.
To open a list of profiling files for a table, right-click on the table node and select 'Analyze Data' from the context menu.
Figure 32: Displaying a list of profiling files.
The 'Analyze Profiling Data' property sheet is displayed as shown below.
Figure 33: 'Analyze Profiling Data' property page.
Then, select the file to analyze and click 'Analyze'. Microsoft Excel opens and displays the file.
One of the main benefits of using EIQ Indexes for query processing is the ability to build clean indexes without changing the data source. If any data in the data source required cleansing or standardization, you can apply transforms to build clean, standardized, and more useful indexes.
For the purposes of standardization and cleansing, the RTI tool provides column data type mappings and both predefined and user defined transforms. Developers can also create additional user defined transforms using C/C++ and Perl scripts.
Users may find it necessary to change the data type of a data source column while indexing it in an EIQ Index. The 'Column Data Type Mappings' tab in the 'Transformations' dialog allows users to accomplish this.
To map a different data type for a column, right-click on the column in the index map (right pane) and select 'Transformations.' from the context menu. For mode details, See Data Type Conversion.
To apply data transforms to a column, right-click on the column and select 'Transformations…' from the context menu.
Figure 34: Selecting transformations property sheet.
The 'Transformations' property sheet is displayed as follows:
Figure 35: Transformations property sheet.
Select the 'Transformations' tab. It allows users to select the transform(s) for this column. On the right side of the tab it shows all the available transforms for this column under 'Default Transforms' and 'User Defined' lists. It displays only those transformations that are relevant to the selected column's data type.
The 'Transformations' tab provides features to select one or more transforms. Selected transforms are applied in the order they appear in the list on the left-side of the tab. Some transforms may require additional parameters and users can enter the information by clicking 'Edit'.
Figure 36: Transformations tab.
Note: The transformations applied to columns can result in unique values being transformed into non-unique values. Therefore, a column designated as a ROWID column may result in duplicate values.
For example a transformation that removes numbers from the column values could transform previously unique values to duplicates, e.g. if we have Test1 and Test2 before a transformation; we get Test and Test after a transformation. Also, using the 'RemoveCharacters(-())' transform, values like '-b' and 'b-' will be transformed to 'b' and 'b'. Therefore, the user should be careful while applying transformations. |
Note: See EIQ RTI Tool Transformations for additional information on transforms and troubleshooting transform issues. See EIQ User Transform SDK for additional information on defining your own transforms. |
Figure 37: The status bar updates to display the transform for the selected column.
Note: After a transform is added, you must rebuild the profile for the changes to be applied. This is not required for edit changes. |
The preview feature is useful for testing transforms and verifying the results.
To view the profile data after a transform is applied, right-click the column and select 'Preview Transform Data' from the context menu..
Figure 38: 'Preview Transform Data' context menu.
This gives a preview of how the data looks after the transform has been applied. The dialog below the index pane shows the transformed data.
Figure 39: Previewing transformed data.
Users can create additional transforms using Perl scripts and C/C++.
To use transforms written in perl, make sure the script file with subroutines (named as 'perl_transforms.pl') and the library file 'perl510.dll' are present in the '.\WhamTech\bin' folder on your local machine. A sample perl script file 'perl_transforms.pl' should be present in your local installation 'bin' folder already.
Open RTI tool and connect to the POLICE database (one of the sample data sources on the EIQ Server Suite Installation CD). In profiling mode, right-click on the column where the transform should be applied, for instance, the FamilyName column of the LOPERSON table. Select 'Transformations' from the context menu.
Figure 40: Opening the 'Transformations' Property sheet for the FamilyName column.
The 'Transformations' property sheet is displayed as follows:
Figure 41: Transformations Property Sheet.
Select the 'Transformations' tab:
Figure 42: 'Transformations' Tab.
Click the Perlsub() transform from the 'Default Transforms' list and add it to the left pane.
Select the PerlSub() transform in the 'Selected Transforms' list on the left side and click 'Edit' to specify the 'Perlsub' transform properties. Enter the name of the subroutine from the ‘perl_transforms.p’l file, for example, 'testsub'. This will apply the transform(s) written in the testsub subroutine to the selected column data. Click ‘OK’ to exit the Transformations Property Sheet.
Figure 43: Entering the name of the Perl subroutine.
Before building EIQ Indexes, in Profiling mode users can analyze the data as it exists in the data source, build and test transforms, and apply transforms and column data type mappings for cleansing and standardizing data.
To build EIQ Indexes users must change from Profiling to RTI mode.
Select RTI from the 'Mode' menu.
Figure 44: Changing to RTI mode.
If the current Profiling mode session has not been saved, the following message box is displayed.
Figure 45: Save current profiling session file message box.
To save the current profiling session click 'Yes'. To skip saving the file and continue, click 'No'. To remain in profiling mode press 'Cancel'. It is recommended to save the profiling session. Once in RTI mode you cannot change back to 'Profiling' mode. You can however reload the profiling '.rtip' file.
Clicking 'Yes' will display the 'Save As' dialog if this is a new session and not loaded from a saved file.
Figure 46: The 'Save As' dialog.
Click 'Save' to save the file in the current folder with the default name. The status bar now displays the mode as RTI and the file extension is now .rti, as shown below. You can now continue to modify the index map for building the EIQ index.
Figure 47: Once the mode is changed to RTI, Profiling mode is disabled.
Note: To work in 'Profiling' mode after changing to RTI mode, you must load a saved .rtip file. |
You can now setup a complete index map and build an EIQ Index for use with the EIQ Server Suite.
EIQ Indexes are built in RTI mode. Unlike profile indexes, EIQ Indexes are used by EIQ Product servers to execute queries for data sources.
To build an EIQ Index for a relational data source, first create a relational data source session.
When a new session is created, the initial mode is profiling. To learn more about profiling, see 'Data Profiling'. To build an EIQ Index, switch from 'Profiling' to 'RTI' mode.
Now expand the data source tree in the left pane and add items to the index map in the right pane by selecting tables/columns. Indexes are built for the tables/columns added to the right pane (index pane).
To add a column to the index map, double-click on that column. To add all the columns in a table, right-click on the table and select 'Add All'. To remove unwanted columns from the map, right-click on the unwanted column and select 'Remove'. A whole table can be removed by right clicking on a table and selecting 'Remove'.
Note that if the 'Add All' command was used in Profile mode to add all the columns in a table, it would leave out any character type columns of 256 or greater length. Add those columns to the map as desired.
Figure 48: Tables and columns added to the index map.
Figure 49: Assign a 'Row ID' and primary key to a column.
Proceed to assign 'ROW ID' columns for each table in the index map. Next, designate key types, text search fields, and primary and foreign key columns.
In order to retrieve results data from the data source after a query is executed on EIQ Indexes, the EIQ Server needs a way to match EIQ Index result rows with corresponding rows in the data source. For this, the EIQ Server uses ROWID columns. A 'ROWID' column is a data source column that has a unique value for each row in that table. Usually the primary key column for a table serves as its ROWID column.
In cases where a single column is not sufficient to serve as a ROWID column, a combination of columns that make up the unique values can be designated as ROWID columns. In this case, one column would serve as a primary ROWID column, a second column as a secondary ROWID column, and all other columns as extra ROWIDs.
See Appendix E for the allowed data types for ROWID columns.
Each table that is selected in the RTI map for building EIQ Indexes must have at least one ROWID column specified. The RTI tool lets you specify as many as five columns as ROWID columns for a single table.
To designate a field as a ROWID field, follow these steps:
1. Right-click a selected field.
2. Expand the "Modify Flags" submenu.
3. Select "Row ID (PRIMARY)" or "Row ID (SECONDARY)" or "Row ID (Extra)".
Figure 50: Assign a column as RowID field.
The ROWID column designation will be shown through special icons to the left of the column name in the index map.
A column key in an EIQ Index can be defined as one of the following types:
To set the default key type for all the columns of an Index, follow these steps:
Figure 51: Default Key options tab.
To designate a key type for a specific field, follow these steps:
1. Right click on a selected field.
2. Expand the "Modify Flags" submenu.
3. Select "Non-Virtual Key" or "Non Key".
Alternatively, use acceleration keys:
1. Select the desired field and press:
Note: The column key type by default is 'Virtual Key' for a newly created index.
The key designations ([VK] for Virtual Key, [NVK] for Non-Virtual Key) will be shown to the right of the column name in the index map.
Figure 52: Assigning a specific column as 'Non-Virtual Key'.
This step involves designating primary key/foreign key relationships between selected tables in the RTI tool to allow EIQ Product servers to perform automatic joins across multiple data source tables. This is an optional step if you are indexing columns from just a single table or if EIQ Product servers are not going to be used with AutoJoins.
Columns in the RTI Map can be designated as either 'PRIMARY KEY' or 'FOREIGN KEY' or as both. The key designation affects how data is stored in the EIQ Indexes. For these designated columns, the RTI tool saves column data along with index structures in EIQ Index for query processing. For virtual columns data will be discarded once index structures are built.
To designate a field as 'PRIMARY KEY'/ 'FOREIGN KEY', follow these steps:
1. Right-click a selected field.
2. Expand the "Modify Flags" submenu.
3. Select "PRIMARY KEY " or "FOREIGN KEY" or "PRIMARY KEY + FOREIGN KEY"
4. To enable automatic table joins for each primary key column in a table, assign a corresponding foreign key from a different table. You can do this by right-clicking on the desired foreign key field, using the 'Modify Flags' > 'Foreign Key For' submenu, and then clicking on the Primary Key column name.
Figure 53: The RTI Map with ROWID, Primary, and Foreign keys specified.
Note that any column designated as ROWID, Primary key, or Foreign key will be automatically designated as a Non-Virtual Key.
Users can designate any column of a character data type as a 'Text Search Field' and build text search indexes.
To designate a column as 'Text Search Field', right-click on the column and select 'Modify Flags'->'Text Search Field' from the context menu.
The text search field designation will be indicated by '[TSF]' to the right of the column name.
Note: Any character type column of 256 or greater length will be automatically designated as a text search field when added to the index map. For these columns, regular character indexes can also be built by designating them with 'Text Search Field + Index' flag. Regular character indexes differ from text search indexes in that they let users query on the full string or partial strings contained by the column. This designation will be indicated by '[TSF+]' to the right of the column name.
Setting Text Search Options
Users have options to build various text search indexes for text search fields.
Users can build the following types of text search indexes.
Regular text search indexes:
Users can also build associated weighted indexes for each of the regular text search indexes except for Proximity indexes. Unlike regular text search indexes, weighted indexes incorporate the relative weight of the words in a document considering various criteria such as the number of times a word appeared in the document, word context (where in the document the word appeared - titles, headings, normal text, etc.) and font size and style (bold, italics, foot note, etc.).
Weighted indexes can apply stemming, synonyms etc. functions before the weights are calculated. The supported weighted index options for each of the regular text search index are as follows:
Users can set text search options at the global level, which will be applied to all text search columns, and at the individual column level. Any column level options would take precedence over the global settings.
To get to the 'Text Search Options' dialog, right-click on the top item in the right pane, for instance, 'POLICE - (RTI MODE)', and select the 'Text Search Options.' menu item.
Figure 54: Text Search Options dialog.
Note that selecting any weighted index (on the right) would automatically enable building the corresponding non-weighted regular text search index.
· Now click 'OK' to close the dialogs.
Only the selected columns from the data source as described in the index map get built. Depending on the amount of data to be indexed, the indexing process can take some time and can be resource intensive.
· To build an EIQ Index, select "Build EIQ Index" from the tool bar.
Figure 55: Schema loading is complete.
By default, the auto name mapping option is on. When this option is on, the RTI tool maps the data source table and column names to internal unique names while building EIQ Indexes. This avoids data source table/column name conflicts with EIQ reserved words listed in Appendix D: RTI Reserved Words.
Figure 56: Selecting RTI options.
For any reason this option needs to be disabled, select 'Tools | Options.' to open the Options dialog.
Figure 57: Options dialog.
Unselect 'Auto Name Mapping' to let the RTI tool maintain data source table and column names. Then press 'OK' to generate the EIQ indexes.
Document files such as text, html, XML, Word, and Excel can also be indexed using the RTI tool.
First, open a document files session.
The selected file types are displayed in the file system view (left pane). Double-click on a document to select it for indexing or right-click on the 'Documents' folder and click 'Add All'.
Figure 58: Documents from the selected folder are displayed.
Right-click the 'Documents' folder in the 'Documents' view (right pane) and select 'Text Search Options.'.
Figure 59: Select Text Search Options from the context menu.
Then select the required text search options.
Figure 60: Selecting text search options in the dialog.
Note that the selected text search options are displayed on the status line.
Select 'Build EIQ Index' from the toolbar to generate the EIQ indexes for Document Files.
Figure 61: Status bar shows the current text search options.
WhamSearch Intelligent Spider collects content and saves in files with '.TSI' extension. These files can be indexed using RTI tool.
First, create a document files session.
The WhamSearch documents (files with '.TSI' extension) are displayed in the file system view (left pane).
Figure 62: Select root node to change global text search options
Right-click the root node in the 'Document' view (right pane) and select 'Text Search Options.' to display the "Text Search Option" dialog.
Figure 63: Selected text search options
Select the required text search options. Since the global root node text search options are changed an informational message is displayed for user response.
Figure 64: Message regarding global option changes.
Click 'OK' to continue.
Figure 65: Selected text search options are displayed in the status pane.
Notice the status bar displays the selected text search options.
To generate the EIQ Index for the configured map, select "Build EIQ Index" from the toolbar.
Figure 66: Selecting "Build EIQ Index" from the toolbar.
Figure 67: Build progress displayed during index processing.
The "Processing Data" dialog is displayed showing the progress of the build process.
Figure 68: Index processing completed successfully.
Once the build is complete click 'OK' on the information dialog and then click 'Close' on the progress dialog.
RSS Feeds are used for syndicating news and the content of sites (news sites etc.). Building EIQ Indexes for RSS Feeds is similar to building indexes for document files except that the feeds are special purpose documents in XML format.
First, create a RSS Feeds session.
After creating the RSS Feeds, the files from the selected folder are displayed in the file system view (left pane). Right-click the folder in the 'Document' view (right pane) and select 'Text Search Options.'.
Figure 69: Select all the XML Feed documents to index.
Select the required text search options as shown below.
Figure 70: Selecting text search options in the dialog.
Select 'Build EIQ Index' from the toolbar to generate indexes for the documents list in the 'Document' view.
Figure 71: Status bar shows the current text search options.
Indexed views provide predefined 'views' of data in base tables where they are built. A 'view' can contain pre-aggregated or pre-joined information from base tables for ready access to client applications. By doing heavy duty tasks ahead of query execution time, Indexed views can improve query execution speed exponentially. Indexed views can be created either from the EIQ Server RTI Tool or from any query client through EIQ Server. Some types of views, such as the ones involving multiple tables and virtual columns, can only be created through EIQ Server.
Using the EIQ Server RTI Tool, indexed views can be added while building EIQ Indexes for data sources. An indexed view is defined in the EIQ Server RTI Tool by specifying the SQL statement to create the view.
First, create a new EIQ Index for an ODBC data source through the EIQ Server RTI Tool, or open an existing RTI map. Right-click on the index name at the top of the right pane, and click on 'Add Indexed View'. For example, in the figure below, the index name is "FBI_VIEW_INDEX".
Figure 72: Creating an Indexed View
The 'Add Indexed View definition' window opens. Enter a name for the indexed view you wish to create, and define the view using a SQL query such as:
"select count(add_state) as cls, add_state from address group by add_state"
Figure 73: Defining an Indexed View.
Click 'OK'. On the next screen, enter the initial size (number of rows) of the indexed view; for example, 1000.
Enter the maximum size (number of rows) of the indexed view; for example, 10000. As long as the initial size is not larger than the maximum, there should not be an issue. The maximum and the initial size can be the same number.
Figure 74: Initial size of the Indexed View.
You should see the new indexed view in the right pane of the EIQ Server RTI Tool as shown below.
Figure 75: Seeing the new indexed view in the right pane.
Proceed to build the index by clicking 'Build EIQ Index' on the toolbar.
The EIQ Server RTI Tool allows users to modify existing indexes and apply only the changes without rebuilding existing index structures. Discrete builds may include adding one or more tables to the index, adding columns to existing tables, removing columns, and changing the key type of an existing index from Virtual to Non-virtual or vice-versa.
To modify an existing index, open a saved RTI map file (.RTI.XML format). Add new tables/columns or make changes to existing columns as desired. The EIQ Server RTI Tool detects these actions and saves them. Click 'Build' on the toolbar. If the applied actions are supported by the discrete build process, a dialog will appear asking you to choose a full rebuild or apply only the changes since the last save.
Figure 76: Build Options: Rebuild versus 'Apply changes'.
If you click 'Full Build', the existing index files will be deleted and the index will be rebuilt from scratch. Clicking on 'Apply Changes' will only build the new and changed index structures and hence would generally be faster than a complete rebuild.
The Text Search Groups feature allows users to group several columns from a table into one text search index. Queries on text search group indexes find matching text across all the columns in the group. The name of the group is used for the group index that contains data for all the columns selected.
To create a text search group, right-click on a table and select 'Group Text Search Indexes' from the context menu to open the Text Search Groups dialog.
Figure 77: Select 'Group Text Search Indexes' from the context menu.
Figure 78: Text Search Groups dialog.
To add a text search group, click 'Add' to open the 'Text Search Group - Add' dialog.
Figure 79: Text Search Groups - Add dialog.
Enter the name of the text search group in the 'Group Name' edit field. Then, select the Table Columns you want associated with the group. Click the arrows or double-click to move them to the Group Columns box.
Figure 80: Text Search Groups information.
Next, click 'Group Text Search Options' to open the 'Text Search Options' dialog. Then, select the options as shown:
Figure 81: Group Text Search Options dialog.
Now click 'OK' to close the dialogs. The group name now appears in the Group Names box. Click 'OK' to save the new group or 'Cancel' to discard all changes.
Figure 82: Completed adding the Text Search Group
Repeat the above steps to add additional groups that contain other columns.
See Entity Extraction Help for additional information.
The EIQ Server RTI Tool supports data type mappings for columns while building EIQ Indexes. This feature helps RTI users build standardized indexes that are compatible with their target standard data model. A data source column data type may not be compatible with standard data model data types that require a data type conversion in EIQ Indexes. For example, a character type column may contain only integer values in the data source, and the standard model may require those values to be queried in numeric format. Data type mapping enables queries on EIQ Indexes as per the standard data model.
Using RTI column data type mapping, non-string data types can be converted to character strings if needed.
Data type conversion is supported on three levels:
· Globally
· At Table level
· At Column level
The levels take precedence over one another that order.
The global data type map is specified before creating a new map, but can be changed at any time during the RTI map creation process. Modifying the global data type map will affect all selected tables and columns regardless of overridden table or column data type maps.
To modify the global data type map, follow these steps:
1.
From the main menu, select "Edit --> Global Data Type Map"
-or-
2. Right-click the root item of the index map (right pane) and select "Modify Datatype Map."
3. Configure the data type map.
4. Select "OK" to apply the new global data type map.
Table data type maps can be configured for any table selected for indexing. The default table data type map is identical to the global data type map. Modifying the table data type map will override the global and column data type maps for the specified table.
To modify a table data type map, follow these steps:
1. From the index map (right pane), right-click the table whose map is to be modified, and select "Modify Datatype Map".
2. Configure the data type map.
3. Select "OK" to apply the new table data type map.
Column data type maps can be configured for any column that is selected for indexing. Modifying a column data type will override the global or table data type map for the specified column.
To modify a column data type map, follow these steps:
1. From the index map (right pane), right-click the column whose data type map is to be modified and select "Transformations..." menu item
2. Select the new data type map for the column in the Mappings tab.
The available column data type mappings are listed below with an explanation of their function. These mappings are available from the 'Mappings' tab of the Transformations dialog for a given column data type.
The default mapping in RTI for the DECIMAL data type column is NUMERIC.
Map(decimal,decimal) - Map decimal to decimal.
Map(decimal,int) - Map decimal to int.
Map(decimal,real) - Map decimal to real.
Map(decimal,float) - Map decimal to float.
Map(decimal,bigint) - Map decimal to bigint.
Map(decimal,char) - Map decimal to char.
Map(numeric,numeric) - (default)
Map(numeric,int) - Map numeric to int.
Map(numeric,bigint) - Map numeric to bigint.
Map(numeric,real) - Map numeric to real.
Map(numeric,float) - Map numeric to float.
Map(numeric,char) - Map numeric to char.
Map(smallint,smallint) - (default).
Map(smallint,char) - Map smallint to char.
Map(bigint,bigint) - (default).
Map(bigint,char) - Map bigint to char.
Map(int,int) - (default).
Map(int,char) - Map int to char.
Map(real,real) - (default).
Map(real,char) - Map real to char.
Map(double,float) - (default).
Map(double,char) - Map double to char.
Map(double,int) - Map double to int.
Map(double,bigint) - Map double to bigint.
Map(date,date) - (default).
Map(date,char) - Map date to char.
Map(datetime,datetime) - (default).
Map(datetime,date) - Map datetime to date.
Map(datetime,char) - Map datetime to char.
Map(byte,byte) - (default).
Map(byte,blob) - Map byte to blob.
Map(clob,clob) - (default)
Map(clob,char) - Map clob to char.
Map(char,char) - (default).
Map(char,smallint) - Map char to smallint.
Map(char,int) - Map char to int.
Map(char,real) - Map char to real.
Map(char,float) - Map char to float.
Map(varchar,varchar) - (default)
Map(varchar,char) - Map varchar to char.
Map(varchar,clob) - Map varchar to clob.
Map(varchar,smallint) - Map varchar to smallint.
Map(varchar,int) - Map varchar to int.
Map(varchar,real) - Map varchar to real.
Map(varchar,float) - Map varchar to float.
The available transformations for a given data type are listed below with an explanation of their function. Additional user-defined transformations can be developed and added for use. Refer to 'Defining and Applying User Transforms' for more information.
Convert(C,F) - This is a demo transform that converts data from Celsius to Fahrenheit.
IsNull() - This transform replaces null values with a specified value.
Convert(C,F) - This is a demo transform that converts data from Celsius to Fahrenheit.
IsNull() - This transform replaces null values with a specified value.
Convert(C,F) - This is a demo transform that converts data from Celsius to Fahrenheit.
IsNull() - This transform replaces null values with a specified value.
Convert(C,F) - This is a demo transform that converts data from Celsius to Fahrenheit.
IsNull() - This transform replaces null values with a specified value.
Convert(C,F) - This is a demo transform that converts data from Celsius to Fahrenheit.
IsNull() - This transform replaces null values with a specified value.
Convert(C,F) - This is a demo transform that converts data from Celsius to Fahrenheit.
IsNull() - This transform replaces null values with a specified value.
Convert(C,F) - This is a demo transform that converts data from Celsius to Fahrenheit.
IsNull() - This transform replaces null values with a specified value.
FormatPhone() - This transform standardizes phone numbers as (XXX) XXX-XXXX.
FormatSSN() - This transform standardizes social security numbers as XXX-XX-XXXX.
IsNull() - This transform replaces null values with a specified value.
Lookup() - This transform replaces values with specified values from a file.
Lowercase() - This transform converts the column data to lowercase.
PerlSub() - This transform applies a transformation defined in the specified subroutine.
Uppercase() - This transform converts the column data to uppercase.
FormatPhone() - This transform standardizes phone numbers as (XXX) XXX-XXXX.
FormatSSN() - This transform standardizes social security numbers as XXX-XX-XXXX.
IsNull() - This transform replaces null values with a specified value.
Lookup() - This transform replaces values with specified values from a file.
Lowercase() - This transform converts the column data to lowercase.
PerlSub() - This transform applies a transformation defined in the specified subroutine.
Uppercase() - This transform converts the column data to uppercase.
For more details on the available transforms in the EIQ Server RTI Tool, see EIQ RTI Tool Transformations.
The RTI tool uses auto name mapping to avoid conflicts between EIQ Index reserved words and data source table and column names. If the auto name mapping option needs to be turned off for any reason, these reserved words will result in a conflict and are not allowed for table and column names in the EIQ Index map.
Any Column/Table names starting with numeric character
Any Column/Table names starting with ":" or "#" or "_"
@
ACCORDING
ADD
ALIAS
ALL
ALTER
AND
ARE
ASC
ASV
AV
BEGIN
BETWEEN
BY
CHAR
COLUMN
COMMIT
COUNT
CREATE
CRTSPEC
D_CHAR
D_DATE
D_FLOAT
D_LONGINT
D_NUMERIC
D_REAL
D_SHORTINT
DATABASE
DATE
DATETIME
DBOPTION
DELETE
DELIMITED
DELIMITER
DESC
DETAIL
DG
DISCRETE
DISTINCT
DROP
EQ
ERROR
EXCEPT
EXPORT
EXTERNAL
EXTRA
FILE
FIXED
FIXED_ETL
FLOAT
FROM
GE
GROUP
GT
HAVING
IMPORT
IN
INNER
INSERT
INT
INTO
INTO_COLLECTION
IS
ISOLATE
JOIN
KEY
LAYOUT
LE
LIKE
LT
MAXRECORDS
NE
NOT
NUMERIC
ON
OR
ORDER
OVER
PAGE
PAGESPEC
PROTECT
QUOTE
RD
REAL
REALTIME
RECORD
RECORDS
SELECT
SET
SKIPCOUNT
SMALLINT
SUPPRESS
SV
TABLE
TD
THEN
THRU
TIME
TO
TRUNCATE
UNIQUE
UNIQUERTB
UNIQUERTL
UPDATE
VALUES
WHERE
WITH
Only the following data typed columns are allowed to be designated as RowID columns.
Copyright © 2019 , WhamTech, Inc. All rights reserved. This
document is provided for information purposes only and the contents hereof are
subject to change without notice. Names may be
trademarks of their respective owners.