This document provides an overview of SmartData Fabric® (SDF) and its components, along with the concepts behind the External Index and Query (EIQ) approach used by SDF. This document covers the following topics:

Product Overview
Data Access, Integration, Sharing and Interoperability Issues
SDF Approach and Conventional Approaches
SDF Components Overview
SDF Tools

Product Overview

WhamTech SmartData Fabric® (SDF) solves most data access, integration, sharing, and interoperability challenges faced by organizations in an innovative and unique way.

SDF combines technologies from the conventional approaches of data warehousing, federated adapters, and enterprise search. SDF uses a hybrid approach to retain most of the advantages of the conventional approaches while overcoming most of the disadvantages.

SDF provides client applications access to multiple, disparate, and distributed data sources - including those containing structured, unstructured, and semi-structured data. In a typical edge-ware configuration, SDF adapter and federation virtual data sources act similarly to conventional data adapters in federated query systems. However, there is a key difference. SDF provides enhanced query capabilities by using its own data-less indexes (EIQ Indexes) to process Structured Query Language (SQL) queries, going to data sources only for the results data. SDF tools read data from data sources and put the data through cleansing, transformation, and standardization processes to build data-less indexes while leaving data in its original form. SDF tools also maintain the indexes to reflect changes to data sources in near-real-time to provide access to up-to-date information. SDF tools put the results data through the same data quality processes as the indexes, merge multiple data source results, and provide the results to calling applications or middleware.

Queries against EIQ Indexes are highly successful because of their clean and standardized nature. SDF also absorbs the query processing load from the data source system making queries much faster. Additional indexes for actions like fuzzy matching can provide querying capabilities that are otherwise not available from the data source.
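As a rough illustration of how an additional fuzzy-matching index can answer queries that the data source itself cannot, the sketch below looks up a misspelled name against standardized index keys. It is a minimal sketch, not SDF's implementation: the index contents, the `fuzzy_lookup` function, and the use of Python's standard `difflib` module are all assumptions made for the example.

```python
import difflib

# Hypothetical standardized index: cleaned surname -> record numbers in the source.
name_index = {"JOHNSON": [11, 42], "JOHNSTON": [7], "JACKSON": [3]}

def fuzzy_lookup(index, term, cutoff=0.8):
    """Return index entries whose key is similar to the (standardized) search term."""
    cleaned = term.strip().upper()
    keys = difflib.get_close_matches(cleaned, list(index), n=5, cutoff=cutoff)
    return {key: index[key] for key in keys}

# A misspelled search term still resolves to record numbers held in the index.
matches = fuzzy_lookup(name_index, " Jonson ")
```

The data source never sees the fuzzy comparison; it would only be asked for the records whose numbers come back from the index.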

It is important to note that EIQ Indexes are data-less. They do not retain data locally but hold pointers to data in data sources. Regular database management systems, including data warehouses, retain a copy of the data locally along with index trees and record number lists. However, EIQ Indexes use 'virtual indexes' containing only the index trees and the corresponding record number lists - not the data. SDF goes directly to the data sources to access the raw record data results when necessary.
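The idea of a data-less index can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the EIQ Index format: an in-memory SQLite table stands in for the data source, a plain dictionary stands in for the index tree and its record number lists, and the `customers` table and the `standardize`, `build_index`, and `query` helpers are hypothetical names.

```python
import sqlite3
from collections import defaultdict

# Stand-in data source: an in-memory SQLite table (hypothetical schema).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "  Ann Lee ", "dallas"), (2, "Bob Ray", "DALLAS "), (3, "Cy Orr", "Austin")],
)

def standardize(value):
    """Toy cleansing step: trim whitespace and upper-case the value."""
    return value.strip().upper()

def build_index(conn, table, column, rowid_column):
    """Map each standardized value to the record numbers that hold it.

    Only keys and record numbers are kept; the row data stays in the source.
    """
    index = defaultdict(list)
    sql = f"SELECT {rowid_column}, {column} FROM {table} ORDER BY {rowid_column}"
    for rowid, value in conn.execute(sql):
        index[standardize(value)].append(rowid)
    return dict(index)

def query(conn, index, table, rowid_column, value):
    """Resolve the predicate in the index, then fetch only the matching rows."""
    rowids = index.get(standardize(value), [])
    if not rowids:
        return []
    placeholders = ",".join("?" * len(rowids))
    sql = (
        f"SELECT * FROM {table} WHERE {rowid_column} IN ({placeholders}) "
        f"ORDER BY {rowid_column}"
    )
    return conn.execute(sql, rowids).fetchall()

city_index = build_index(source, "customers", "city", "id")
rows = query(source, city_index, "customers", "id", "Dallas")
```

The source rows keep their original, inconsistent spellings of the city; only the index holds the standardized form, and the source is contacted just to return the two matching records.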

SDF connects to data sources through available drivers/APIs and query languages. Clients connect to the SDF query server through standard database access drivers (ODBC, JDBC, OLEDB) and Web Services using SQL.

Data Access, Integration, Sharing, and Interoperability Issues

It's All About Data:

Accessing it - indexes and queries
Securing and protecting it - including across organizations
Making it useful - cleansing, transforming and standardizing
Distilling it - aggregations and calculations
Making sense of it - context, categorization and semantics
Connecting it to other data - obvious/direct and non-obvious/indirect relationships
Alerting/notifying those who need to know - watch-lists, event-triggers, subscriptions, and thresholds

The problem most organizations face with data is that they are awash with it, yet it is rarely available in a form that is usable for other purposes. In many cases, the hurdles to data access, integration, sharing, and interoperability are more than technical and are related to culture, security, and privacy. WhamTech SmartData Fabric® helps alleviate these hurdles. SDF combines technologies from the conventional approaches of data warehousing, federated adapters, and enterprise search. SDF powers best-of-all-worlds solutions by leaving data where it is and providing query-able metadata layers that work with original data sources, data source schemas, and formats. Unlike other approaches, SDF addresses all of the data issues listed above.

SmartData Fabric® provides a real alternative to conventional approaches for data access, integration, sharing, and interoperability across multiple, disparate data sources, with higher success and lower complexity, implementation time, and cost. This provides real value for business intelligence (BI), analytics, marketing, customer data integration (CDI), master data management (MDM), and other data-centric solutions.

The following sections describe the conventional and WhamTech SDF approaches.

Conventional Approaches and the SmartData Fabric® EIQ Approach to Data Issues

Data Warehousing

Data is extracted from multiple data sources, transformed, and loaded (ETL) into a separate database called a data warehouse.

Figure 1: Data Warehousing

The advantages and disadvantages of data warehouses:

Advantages:

Data warehouses tend to have a high query success as they allow complete control over data once copied:

· High query success

o Data is clean and usable

o Multiple and consistent indexes across data

o Complete control over query processing

· No load on, or interference with, data source systems

· High performance

· Allows for pre-aggregated and pre-calculated fields

· Good for archival purposes

· Can provide access security - row, column (and data element) based access

· Original data sources will not be aware of queries

Disadvantages:

However, there are considerable disadvantages involved in moving data from multiple and often highly disparate data sources to a single data warehouse, such as long implementation time, high cost, lack of flexibility, dated information, and limited capabilities:

· Major data schema transforms from individual data sources into one schema in the data warehouse. This can represent more than 50% of the total data warehouse implementation effort.

· Data owners lose control over their data, raising issues on responsibility, accountability, security, and privacy.

· Long initial implementation time and associated high cost

· Adding new data sources takes time and incurs high cost.

· Limited flexibility in use and users, requiring multiple separate data marts for multiple uses and types of users

· Data is typically static and dated, and changes in data cannot be actively monitored

· Does not usually support data drill-down capabilities

· Difficult to accommodate changes in data types and ranges, data source schema, indexes, and queries

· Complete additional system cost including storage

Federated Data Systems with Conventional Adapters

In federated systems using conventional adapters, data remains in data sources and queries are translated from a common data model to queries that each data source can execute. Queries are executed on data source systems and the results are retrieved. The components that translate queries and transform result-sets are called adapters.
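The two jobs of a conventional adapter, translating a query and transforming the result-set, can be sketched as follows. This is a simplified illustration only: the common-model names, the source schema names (`TBL_CUST`, `CUST_NM`, `CTY`), and the helper functions are hypothetical, and a real adapter would handle far more of SQL than a single equality filter.

```python
# Hypothetical mapping from a common data model to one data source's schema.
TABLE_MAP = {"customer": "TBL_CUST"}
COLUMN_MAP = {"customer_name": "CUST_NM", "city": "CTY"}

def translate_query(entity, columns, where):
    """Rewrite a common-model request into parameterized SQL for the source schema."""
    table = TABLE_MAP[entity]
    select_list = ", ".join(COLUMN_MAP[c] for c in columns)
    predicates = " AND ".join(f"{COLUMN_MAP[c]} = ?" for c in where)
    sql = f"SELECT {select_list} FROM {table}"
    if predicates:
        sql += f" WHERE {predicates}"
    return sql, list(where.values())

def transform_results(columns, rows):
    """Relabel source rows with common-model column names."""
    return [dict(zip(columns, row)) for row in rows]

sql, params = translate_query("customer", ["customer_name", "city"], {"city": "Dallas"})
records = transform_results(["customer_name", "city"], [("Ann Lee", "Dallas")])
```

Note that the translated query still runs on the data source itself, which is why this approach inherits the source's data quality, indexes, and query load.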

Figure 2: Federated Data System with Conventional Adapters

The advantages and disadvantages of federated data systems with conventional adapters:

Advantages:

Federated data systems with conventional adapters were pursued in an attempt to overcome some of the disadvantages of data warehouses by providing the following primary benefits:

· Data remains at the source, overcoming many challenges that data warehouses face, such as:

§ Lengthy, costly, and complex ETL process*

§ Data ownership issues

§ Static and dated data

§ Lack of drill-down capabilities

· Little or no additional storage required

Disadvantages:

However, there are considerable disadvantages using federated data systems with conventional adapters due to data source constraints:

· Low query success

· Data is not clean and is sometimes unusable.

· Limited indexes are inconsistent across data sources and are inflexible.

· Limited query processing

· Query load is placed on data source system.

· Query performance can be an issue.

· Queries are "hard-wired".

· Federated systems cannot actively monitor data sources.

· It's expensive and time-intensive to develop adapters.

· No pre-aggregated or pre-calculated fields

· Data sources aware of queries

· No archive

· No results if data source is unavailable.

To accommodate the translation between an application or information sharing system and any particular data source, conventional adapters are developed over a significant period and at great cost to cover basic requirements. In fact, it usually costs 300 to 500% more than the initial adapter purchase to customize conventional adapters that cover basic requirements.

*The only advantage conventional adapters have over the ETL process is that schema transforms become less difficult; however, query processing and subsequent result transforms become more complex.

Federated Data Systems Using SmartData Fabric®

SmartData Fabric® combines the conventional approaches of data warehousing, federated adapters, and enterprise search to retain the majority of the advantages while overcoming most of the disadvantages. EIQ Indexed Adapter and EIQ Hybrid Adapter configurations replace conventional adapters in federated data systems to bring many benefits of data warehousing to federated data systems. SDF adds clean, standardized, and enhanced indexes to absorb query processing from data sources.

Figure 3: Federated Data System with EIQ Products

SmartData Fabric® Configurations

Depending on the configuration, SDF can function either as adapters (EIQ Indexed Adapter, EIQ Hybrid Adapter, and EIQ Conventional Adapter - collectively referred to as EIQ Edgeware) or as federation servers (EIQ Federation Server). When configured as EIQ Edgeware - with a single instance dedicated per data source - SDF provides access to a single data source. A Federation configuration connects multiple EIQ Edgeware servers to provide access to multiple data sources through a single common interface or business data view.

For details on various SDF configurations, refer to the SDF Configurations Introduction.
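The relationship between edgeware instances and a federation configuration can be pictured with a small sketch. It is only a conceptual model, not the EIQ Federation Server: each edgeware instance is represented by a plain function over one in-memory data source, and the names `make_edgeware`, `federate`, `crm`, and `billing` are assumptions made for the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Edgeware = Callable[[str], List[Row]]  # hypothetical: takes a city, returns rows

def make_edgeware(rows: List[Row]) -> Edgeware:
    """Build a stand-in edgeware instance dedicated to a single data source."""
    def run(city: str) -> List[Row]:
        return [row for row in rows if row["city"] == city]
    return run

def federate(edgewares: Dict[str, Edgeware], city: str) -> List[Row]:
    """Send the same request to every edgeware instance and merge the results."""
    merged: List[Row] = []
    for source_name, run in edgewares.items():
        for row in run(city):
            # Tag each row with its origin so callers can tell sources apart.
            merged.append({**row, "source": source_name})
    return merged

edgewares = {
    "crm": make_edgeware([{"name": "Ann Lee", "city": "DALLAS"}]),
    "billing": make_edgeware([
        {"name": "Bob Ray", "city": "DALLAS"},
        {"name": "Cy Orr", "city": "AUSTIN"},
    ]),
}
results = federate(edgewares, "DALLAS")
```

The caller sees one interface and one merged result-set, while each data source is still reached only through its own dedicated instance.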

EIQ Indexes

The EIQ SuperAdapter and EIQ TurboAdapter configurations execute queries at EIQ Indexes. Two features of EIQ Indexes are important to note:

1. Unlike in data warehousing, EIQ Indexes are built according to the schema defined in the data source. They require no schema transformations.

2. EIQ Indexes primarily use 'virtual indexes' containing only index trees and the corresponding record number lists. No actual data is used. The only exceptions are ROWID columns and other columns indicated as foreign keys. These are designated as Non-virtual keys.

EIQ Indexes may contain two types of indexes: Content Indexes and Link Indexes. Content Indexes are for content derived from data sources, whereas Link Indexes are formed to link records from within and across data sources based on specific link criteria. Only EIQ SuperAdapters can form link indexes. Link Indexes are discussed in detail under the Link Indexes section.

In a typical content index, each index table has a column designated as a ROWID column. This ROWID column must contain only unique values; that is, each value in this column uniquely identifies a single row in that table. Usually the Primary Key column in a table is designated as the ROWID column. The ROWID column values are used for retrieving raw results data for matching rows from the data source as needed. If there is no single primary key column for a table, then multiple columns that together form a unique value can collectively be designated as ROWID columns.
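The uniqueness requirement on ROWID columns, including the multi-column case, can be checked with a short sketch. The table contents and the helper names `rowid_for` and `is_unique_rowid` are hypothetical and are not part of any SDF tool.

```python
def rowid_for(row, rowid_columns):
    """Combine one or more column values into a single ROWID key."""
    return tuple(row[column] for column in rowid_columns)

def is_unique_rowid(rows, rowid_columns):
    """Return True if the chosen columns identify every row uniquely."""
    seen = set()
    for row in rows:
        key = rowid_for(row, rowid_columns)
        if key in seen:
            return False
        seen.add(key)
    return True

# Hypothetical order-line table with no single primary key column.
order_lines = [
    {"order_id": 100, "line_no": 1, "sku": "A-1"},
    {"order_id": 100, "line_no": 2, "sku": "B-7"},
    {"order_id": 101, "line_no": 1, "sku": "A-1"},
]

single_ok = is_unique_rowid(order_lines, ["order_id"])
composite_ok = is_unique_rowid(order_lines, ["order_id", "line_no"])
```

Here `order_id` alone repeats across rows, so it cannot serve as a ROWID, whereas the pair of `order_id` and `line_no` identifies each row.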

Synchronizing EIQ Indexes

It is necessary to keep EIQ Indexes in sync with data in the corresponding data source to obtain up-to-date results. If the values in the data source are modified, indexes need to be maintained by updating them with new values. There are multiple options available for updating EIQ Indexes.

The advantages and disadvantages of federated data systems with SDF EIQ adapters:

With EIQ Indexed Adapters, queries are resolved almost 100% at the EIQ Indexes providing the following benefits:

Advantages:

· High query success because indexes/results data are clean and usable.

· Indexes are consistent across disparate data sources.

· Complete control over query processing at EIQ Indexes

· Data remains at the source.

· Latest data is available

· Almost no index or query load on data source systems

· No major schema transforms.

· EIQ Products can actively monitor indexes and data sources and send notifications when complex events happen

· Rapid query response

· Additional indexes

o De-normalized indexes

o Indexed views/aggregate indexes

· Additional query capabilities

o Text Search

o Entity Extraction

o Information geometry tool

o Fuzzy matching

o Link Indexes™ for performance and link analysis

· Highly flexible

· Provide row, column (and data element) security indexes

· User-level access to data sources

· Return results from indexes even if data sources become unavailable.

· Data sources only aware of low-level results requests; not the full queries.

Apart from these advantages, SDF also helps alleviate cultural hurdles for implementing data access and information sharing solutions across organizations.

Disadvantages:

Requires index updates
Indexes require storage
(No built-in archive by default)

Table 1: Comparison between EIQ Products and other approaches ranked by advantage to EIQ Products:

Number	Features	EIQ Products	Data Warehouses	Federated Systems with Conventional Adapters	Comments
1	Minimal implementation time				Advantages are unique to EIQ Products.
2	Quickly add new data sources
3	Flexibility of use and users
4	Actively monitor data sources		₁		EIQ Products provide these advantages over data warehouses and federated systems with conventional adapters.
5	Full text search		₂
6	Unlimited query options and performance		₃
7	De-normalized views		₄
8	Link mapping and analysis/data mapping		₅
9	No major schema transforms			₆
10	Can write to data sources
11	Row, column, and data element security		₇	₈
12	Data source changes readily made	₉		₁₀
13	Clean and usable data				EIQ Products provide these advantages over federated systems with conventional adapters and share them with data warehouses.
14	Consistent and multiple indexes and types
15	Almost any data source
16	Do not install anything on data source systems			₁₁
17	Pre-aggregated and pre-calculated fields
18	Results when data sources are unavailable
19	Data remains at source				EIQ Products provide these advantages over data warehouses and share them with federated systems with conventional adapters.
20	User-level access to data sources
21	Latest data available		₁₂
22	Drill-down capabilities		₁₃
23	No index or query load on data source systems	₁₄			Data warehouses provide these advantages over EIQ Products and federated systems with conventional adapters.
24	Data source owners are not aware of queries	₁₅
25	Archive	₁₆
26	Good for standard application data sources	₁₇			Federated systems with conventional adapters provide these advantages over EIQ Products and data warehouses.
27	No need for data or index update process	₁₈
28	No additional system cost	₁₉		₂₀

^[1] Real-time data warehouses only
^[2] Data warehouses, typically, do not have full text search
^[3] Additional databases or data marts, typically, are used for unlimited query options and performance
^[4] Additional databases or data marts, typically, are used for de-normalized views
^[5] Additional databases or data marts, typically, are used for link analysis/data mining
^[6] No major schema transform if flat front-end schema is used, e.g., GJXDM and NIEM in government
^[7] Data owners relinquish control over their data. Only one or two DBMS vendors provide this level of security
^[8] Only if data sources provide this level of security (see footnote 7)
^[9] SDF can accommodate some changes, for instance, indexes can be used to create new indexes; however, fundamental changes may require individual column re-indexing.
^[10] Federated systems with conventional adapters can accommodate minor changes, but not to the extent that SDF can.
^[11] Many federated systems with conventional adapters require specialized connectors and/or special access that requires installation of something on data source systems.
^[12] Only for real-time or active data warehouses
^[13] Can be implemented in data warehouses, but not typically.
^[14] Small overhead on data source system when retrieving final result-set data
^[15] Data source system receives low-level requests for specific records only - not the query that resulted in them.
^[16] SDF can store results data; work with mirror-image copies of original data sources that act as archives, complying with Sarbanes-Oxley and other regulations (data warehouses are not considered original data source copies); and maintain an index to archive.
^[17] Can use application vendor or third-party change data capture (CDC) capability or results level indexes.
^[18] Only index update process, but not data
^[19] SDF typically has separate servers for query processing and storage for indexes, but no storage for data and no separate DBMS to maintain, as in data warehouses.
^[20] Federated systems with conventional adapters, typically, have separate servers and require minimal storage.

SmartData Fabric® Components Overview

The SmartData Fabric® package comes with a set of servers, client and administrative tools, connectors and access drivers, and software development kits (SDK). Refer to Figure 4 below for an overview of the SDF internal architecture and components.

Figure 4: SmartData Fabric® Architecture and Components

Servers

SDF PG, EIQ Server and EIQ RTIS are the main server components included in the suite.

SDF PG is the query server and can provide various adapters and federation functionality.

EIQ Server is the configuration and management server that lets admins configure and manage various adapters and federation configurations. These configurations include EIQ Indexed Adapter, EIQ Hybrid Adapter, EIQ Conventional Adapter, and EIQ Federation Server.

EIQ RTIS helps keep EIQ Indexes in sync with changes in corresponding data sources by monitoring transaction, change and redo logs and then applying the appropriate changes where necessary.

Administration Tools

The administration tools included in SDF PG provide a user-friendly interface that simplifies the server configuration process. The EIQ Server RTI Tool is used to build the initial EIQ Indexes. The EIQ Server Configuration Tool is used to configure specific adapter and federation configurations. The EIQ Update Configuration Tool is used to configure EIQ RTIS.

A diagnostic tool and query client tools and drivers are also included.

EIQ Server RTI Tool

EIQ Server Configuration Tool

EIQ Update Configuration Tool

EIQ DS TransMon Configuration Tool

EIQ Diagnostics Tool

Other Components

JavaGateway with WhamEE - an Entity Extraction server that extracts entities out of text.

WhamSearch - an intelligent web spider that finds documents matching given criteria based on relevant keywords, watch lists, or Information Geometry models.

Connectors

Various data source connectors and drivers are supplied for connecting to databases such as Oracle, Microsoft SQL Server, Teradata, DB2, and Mainframe data files.

Client Drivers

SDF comes with ODBC, OLEDB, and JDBC drivers to help clients connect with the SDF Query Server. Clients can also use a Web Services interface for connection.

SmartData Fabric® Components

SDF PG

SDF PG provides adapter and federation virtual data source functionality based on configuration.

EIQ Server

The EIQ Server is a key component of SDF. The EIQ Server lets admins configure adapters and federation virtual data sources. These configurations include EIQ Indexed Adapter, EIQ Hybrid Adapter, EIQ Conventional Adapter, and EIQ Federation Server. These configurations and the concepts behind the configurations are described in the SDF Configurations Introduction.

EIQ Server Configuration Tool

The EIQ Server Configuration Tool is used for configuring and managing SDF servers. SDF servers run as services and they must be running before query clients can connect.

This tool helps admins to:

Connect to EIQ Product servers (locally or across networks)
Configure EIQ Product servers into EIQ Indexed Adapters, EIQ Hybrid Adapters, EIQ Conventional Adapters, or EIQ Federation Servers
Manage business data dictionary standard data models
Register data sources
Attach multiple EIQ Indexes (Attach Mode)
Configure entities
Initiate Link Indexing
Set advanced server settings
Monitor and manage connection pooling and client sessions
Manage users and user credentials

EIQ RTIS

EIQ RTIS keeps EIQ Indexes in sync with changes in corresponding data sources. EIQ RTIS communicates with the data sources through a messaging system, which holds any update notification messages from data sources for EIQ RTIS. EIQ RTIS is also capable of monitoring message queues such as MSMQ and JMS, and can be used in place of database triggers.

On the data source side, the admin must set some form of change recognition and notification mechanism that uses transaction logs, change logs, replication, etc., and send any updates to the message system as messages. EIQ RTIS actively polls the message system and applies updates to EIQ Indexes in near-real-time.
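The polling pattern described above can be sketched as follows. This is a conceptual sketch rather than EIQ RTIS itself: Python's standard `queue.Queue` stands in for the message system, the message fields (`op`, `rowid`, `old`, `new`) are an assumed format, and the index is modeled as a dictionary from standardized values to sets of record numbers.

```python
import queue

def apply_update(index, message):
    """Apply one change notification to a value -> record-number index."""
    rowid = message["rowid"]
    if message["op"] in ("update", "delete"):
        old_ids = index.get(message["old"])
        if old_ids is not None:
            old_ids.discard(rowid)
            if not old_ids:
                del index[message["old"]]  # drop keys that no longer point anywhere
    if message["op"] in ("insert", "update"):
        index.setdefault(message["new"], set()).add(rowid)

def poll_once(messages, index):
    """Drain all pending notifications and return how many were applied."""
    applied = 0
    while True:
        try:
            message = messages.get_nowait()
        except queue.Empty:
            return applied
        apply_update(index, message)
        applied += 1

# Index state before the data source changes (hypothetical city index).
city_index = {"DALLAS": {1, 2}, "AUSTIN": {3}}

# Notifications the data source side is assumed to have published.
pending = queue.Queue()
pending.put({"op": "update", "rowid": 2, "old": "DALLAS", "new": "AUSTIN"})
pending.put({"op": "delete", "rowid": 1, "old": "DALLAS"})
pending.put({"op": "insert", "rowid": 4, "new": "HOUSTON"})

applied_count = poll_once(pending, city_index)
```

In a running service, `poll_once` would be called repeatedly at the configured polling interval, so the index trails the data source only by that interval plus the time needed to apply the messages.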

For a detailed description of the index update process and tools, see EIQ Server Update Process.

EIQ RTIS is configured using the EIQ Update Configuration Tool and EIQ DS TransMon Configuration Tool.

EIQ Update Configuration Tool

The EIQ Update Configuration Tool is used to configure EIQ RTIS and provide an interface for administering and monitoring it. The EIQ Update Configuration Tool enables admins to create update tasks associated with data sources and their EIQ indexes. Admins can assign EIQ RTIS to a message queue and set the polling interval.

In addition, a mechanism to start and stop the services that monitor the notifications sent via the message system is provided.

EIQ DS TransMon Configuration Tool

The EIQ DS TransMon Configuration Tool is also used to configure EIQ RTIS and provide an interface for administering and monitoring it. This tool enables administrators to create tasks associated with data sources and task items associated with EIQ Server Virtual Data Sources. Administrators can assign a message queue to each task item.

In addition, a mechanism to start and stop the task items that monitor the data sources is provided.

EIQ Server RTI Tool

The EIQ Server RTI Tool is used to create the initial EIQ Indexes for data sources. Users can perform various functions including data type mappings, transforms, and relational key settings (Primary Key/Foreign Key). This tool also lets users build indexed views and text search indexes with various options, build and view data profiles, and try various transforms.

EIQ Server RTI Tool generates map files containing EIQ Index definitions that can be edited for future modifications.

WhamTech JavaGateway Server

WhamTech JavaGateway serves as a gateway to SDF components for accessing data sources that support only Java APIs (JDBC, Java APIs).

PGAdmin/PSQL

PGAdmin and PSQL are query client tools that let users:

connect to SDF query server
open Virtual Data Sources and registered data sources
make queries
retrieve results

EIQ Diagnostics Tool

The EIQ Diagnostics Tool provides message tracing and logging for troubleshooting SDF components through a simple user-friendly interface.

Copyright © 2023, WhamTech, Inc. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. Names may be trademarks of their respective owners.
