SmartData Fabric® Introduction
Version 8.0.0.490
Introduction

This document provides an overview of SmartData Fabric® (SDF) and its
components, along with the concepts behind the External Index and Query (EIQ)
approach used by SDF. This document covers the following topics:

· Data Access, Integration, Sharing, and Interoperability Issues
· Conventional Approaches and SmartData Fabric’s EIQ Approach to Data Issues
· Federated Data Systems with Conventional Adapters
· The advantages and disadvantages of federated data systems with conventional
  adapters
· Federated Data Systems Using SmartData Fabric®
· SmartData Fabric® Configurations
· The advantages and disadvantages of federated data systems with SDF EIQ
  adapters
· SmartData Fabric® Components Overview
· SmartData Fabric® Components
· EIQ Server Configuration Tool
· EIQ Update Configuration Tool
· EIQ DS TransMon Configuration Tool
· WhamTech JavaGateway Server

Product Overview

WhamTech
SmartData Fabric® (SDF) solves most data access, integration, sharing, and
interoperability challenges faced by organizations in an innovative and unique
way. SDF combines technologies from the conventional approaches
of data warehousing, federated adapters, and enterprise search. SDF uses a
hybrid approach to retain most of the advantages of the conventional approaches
while overcoming most of the disadvantages.

SDF provides client applications access to multiple, disparate, and distributed
data sources, including those containing structured, unstructured, and
semi-structured data. In a typical edge-ware configuration, SDF adapter and
federation virtual data sources act similarly to conventional data adapters in
federated query systems. However, there is a key difference. SDF provides
enhanced query capabilities by using its own data-less indexes (EIQ Indexes) to
process Structured Query Language (SQL) queries, going to the data sources only
for the results data.

SDF tools read data from data sources and put the data through cleansing,
transformation, and standardization processes to build data-less indexes while
leaving the data in its original form. SDF tools also maintain the indexes to
reflect changes to data sources in near-real-time to provide access to
up-to-date information. SDF tools put the results data through the same data
quality processes as the indexes, merge multiple data source results, and
provide the results to calling applications or middleware.

Queries against EIQ Indexes are highly successful because of their clean and
standardized nature. SDF also absorbs the query processing load from the data
source system, making queries much faster. Additional indexes for actions like
fuzzy matching can provide querying capabilities that are otherwise not
available from the data source.

It is important to note that EIQ Indexes are data-less. They do not retain data
locally but hold pointers to data in data sources.
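
The idea of an index that holds pointers rather than data can be sketched in a
few lines of Python. This is only an illustration under stated assumptions, not
SDF's implementation: the `SOURCE` dictionary stands in for a remote data
source, and `DatalessIndex`, `standardize`, `build_index`, and `query` are
hypothetical names. The index keeps cleaned values and the record identifiers
they point to; the records themselves are read from the source only when
results are needed.

```python
from collections import defaultdict

# Hypothetical stand-in for a remote data source, keyed by record id (ROWID).
SOURCE = {
    101: {"name": "  ACME Corp ", "city": "dallas"},
    102: {"name": "Acme corp", "city": "Austin"},
    103: {"name": "Globex", "city": "DALLAS"},
}


def standardize(value):
    """Toy cleansing step (an assumption): trim whitespace and lower-case."""
    return value.strip().lower()


class DatalessIndex:
    """Maps (column, cleaned value) to record ids; stores no record data."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, rowid, column, raw_value):
        self._entries[(column, standardize(raw_value))].add(rowid)

    def lookup(self, column, value):
        # .get() avoids creating empty entries for values that are not indexed.
        return sorted(self._entries.get((column, standardize(value)), set()))


def build_index(source):
    """Read every record once and index its cleaned values."""
    index = DatalessIndex()
    for rowid, record in source.items():
        for column, raw_value in record.items():
            index.add(rowid, column, raw_value)
    return index


def query(index, source, column, value):
    """Resolve the predicate at the index, then fetch matching rows from the source."""
    return [source[rowid] for rowid in index.lookup(column, value)]


index = build_index(SOURCE)
rowids = index.lookup("city", "Dallas")
results = query(index, SOURCE, "city", "Dallas")
```

In this sketch the source is touched twice: once to build the index and once to
fetch the two matching records. The predicate itself is evaluated entirely
against the cleaned index entries, which is why differently cased or padded
values still match.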
Regular database management systems, including data warehouses, retain a copy of
the data locally along with index trees and record number lists. However, EIQ
Indexes use 'virtual indexes' containing only the index trees and the
corresponding record number lists - not the data. SDF goes directly to the
data sources to access the raw record data results when necessary.

SDF connects to data sources through available drivers/APIs and query
languages. Clients connect to the SDF Query Server through standard database
access drivers (ODBC, JDBC, OLEDB) and Web Services using SQL.

Data Access, Integration, Sharing, and Interoperability Issues

It's All About Data:

The
problem with data that most organizations face is that they are awash with it
and it is rarely available in a usable form for other purposes. In many cases,
the hurdles to data access, integration, sharing, and interoperability are more
than technical and are related to culture, security, and privacy.

WhamTech SmartData Fabric® helps alleviate these hurdles. SDF combines
technologies from the conventional approaches of data warehousing, federated
adapters, and enterprise search. SDF powers best-of-all-worlds solutions by
leaving data where it is and providing queryable metadata layers that work
with original data sources, data source schemas, and formats. Unlike other
approaches, SDF addresses all of the data issues listed above.

SmartData Fabric® provides a real alternative to conventional approaches for
data access, integration, sharing, and interoperability across multiple,
disparate data sources, with higher success and lower complexity,
implementation time, and cost. This provides real value for business
intelligence (BI), analytics, marketing, customer data integration (CDI),
master data management (MDM), and other data-centric solutions.

Conventional Approaches and the SmartData Fabric® EIQ Approach to Data Issues

The following sections describe the conventional and WhamTech SDF approaches.

Data Warehousing

Data is extracted from multiple data
sources, transformed, and loaded (ETL) into a separate database called a data
warehouse.

Figure 1: Data Warehousing

The advantages and disadvantages of data warehouses:

Advantages: Data warehouses tend to have a high query success as they allow
complete control over data once copied:

· High query success
  o Data is clean and usable
  o Multiple and consistent indexes across data
  o Complete control over query processing
· No load on, or interference with, data source systems
· High performance
· Allows for pre-aggregated and pre-calculated fields
· Good for archival purposes
· Can provide access security - row, column (and data element) based access
· Original data sources will not be aware of queries

Disadvantages: However, there are considerable disadvantages involved in moving
data from multiple and often highly disparate data sources to a single data
warehouse, such as long implementation time, high cost, lack of flexibility,
dated information, and limited capabilities:

· Major data schema transforms from individual data sources into one schema in
  the data warehouse. This can represent more than 50% of the total data
  warehouse implementation effort.
· Data owners lose control over their data, raising issues of responsibility,
  accountability, security, and privacy.
· Long initial implementation time and associated high cost
· Adding new data sources takes time and incurs associated high cost.
· Has limited flexibility of use and users, requiring multiple separate data
  marts for multiple uses and types of users
· Data is typically static and dated, and changes in data cannot be actively
  monitored
· Does not usually support data drill-down capabilities
· Difficult to accommodate changes in data types and ranges, data source
  schema, indexes, and queries
· Complete additional system cost including storage

Federated Data Systems with Conventional Adapters

In federated systems using conventional
adapters, data remains in data sources and queries are translated from a common
data model to queries that each data source can execute. Queries are executed
on the data source systems and the results are retrieved. The components that
translate queries and transform result-sets are called adapters.

Figure 2: Federated Data System with Conventional Adapters

The advantages and disadvantages of federated data systems with conventional
adapters:

Advantages: Federated data systems with conventional adapters were pursued in
an attempt to overcome some of the disadvantages of data warehouses by
providing the following primary benefits:

· Overcomes these data warehouse disadvantages:
  § Lengthy, costly, and complex ETL process*
  § Data ownership issues
  § Static and dated data
  § Lack of drill-down capabilities
· Little or no additional storage required

Disadvantages: However, there are considerable disadvantages to using federated
data systems with conventional adapters due to data source constraints:

· Low query success
· Data is not clean and is sometimes unusable.
· Limited indexes are inconsistent across data sources and are inflexible.
· Limited query processing
· Query load is placed on data source system.
· Query performance can be an issue.
· Queries are "hard-wired".
· Federated systems cannot actively monitor data sources.
· It's expensive and time-intensive to develop adapters.
· No pre-aggregated or pre-calculated fields
· Data sources aware of queries
· No archive
· No results if data source is unavailable.

To accommodate the translation between an application or information sharing
system and any particular data source, conventional adapters are developed over
a significant period and at great cost to cover basic requirements. In fact, it
usually costs 300 to 500% more than the initial adapter purchase to customize
conventional adapters that cover basic requirements.

*The only advantage conventional adapters have over the ETL process is that
schema transforms become less difficult; however, query processing and
subsequent result transforms become more complex.

Federated Data Systems Using SmartData Fabric®

SmartData Fabric® combines the conventional
approaches of data warehousing, federated adapters, and enterprise search to
retain the majority of the advantages while overcoming most of the
disadvantages. EIQ Indexed Adapter and EIQ Hybrid Adapter configurations replace
conventional adapters in federated data systems to bring many benefits of data
warehousing to federated data systems. SDF adds clean, standardized, and
enhanced indexes to absorb query processing from data sources.

Figure 3: Federated Data System with EIQ Products

SmartData Fabric® Configurations

Depending on the configuration, SDF can function either as adapters (EIQ
Indexed Adapter, EIQ Hybrid Adapter, and EIQ Conventional Adapter -
collectively referred to as EIQ Edgeware) or as federation servers (EIQ
Federation Server). When configured as EIQ Edgeware - with a single instance
dedicated per data source - SDF provides access to a single data source. A
Federation configuration connects multiple EIQ Edgeware servers to provide
access to multiple data sources through a single common interface or business
data view.

For details on various SDF configurations, refer to the SDF Configurations
Introduction.

EIQ Indexes

The EIQ SuperAdapter and EIQ TurboAdapter configurations execute queries at EIQ
Indexes. Two features of EIQ Indexes are important to note:

1. Unlike in data warehousing, EIQ Indexes are built according to the schema
   defined in the data source. They require no schema transformations.
2. EIQ Indexes primarily use 'virtual indexes' containing only index trees and
   the corresponding record number lists. No actual data is used. The only
   exceptions are ROWID columns and other columns indicated as foreign keys.
   These are designated as Non-virtual keys.

EIQ Indexes may contain two types of indexes: Content Indexes and Link Indexes.
Content Indexes are for content derived from data sources, whereas Link Indexes
are formed to link records from within and across data sources based on
specific link criteria. Only EIQ SuperAdapters can form link indexes.
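
As a rough illustration of the Link Index idea, the sketch below groups record
pointers from two sources that satisfy one link criterion. Everything here is
assumed for the example: the `CRM` and `BILLING` dictionaries, the
`build_link_index` helper, and the criterion itself (equal e-mail address after
trimming and lower-casing). It is not the format or algorithm SDF uses.

```python
from collections import defaultdict

# Hypothetical records from two separate data sources, keyed by ROWID.
CRM = {1: {"email": "Ann@Example.com"}, 2: {"email": "bob@example.com"}}
BILLING = {"A7": {"email": "ann@example.com "}, "B9": {"email": "carol@example.com"}}


def link_key(record):
    """Assumed link criterion: records match when their cleaned e-mail is equal."""
    return record["email"].strip().lower()


def build_link_index(sources):
    """Return groups of (source name, ROWID) pointers that satisfy the criterion."""
    groups = defaultdict(list)
    for source_name, records in sources.items():
        for rowid, record in records.items():
            groups[link_key(record)].append((source_name, rowid))
    # Keep only keys that actually link two or more records.
    return {key: refs for key, refs in groups.items() if len(refs) > 1}


link_index = build_link_index({"crm": CRM, "billing": BILLING})
```

Like a content index, the result holds only pointers (source name plus ROWID),
so the linked records still have to be fetched from their own data sources.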
Link Indexes are discussed in detail under the Link Indexes section.

In a typical content index, each index table has a column designated as a ROWID
column. This ROWID column must contain only unique values; that is, each value
in this column uniquely identifies a single row in that table. Usually the
Primary Key column in a table is designated as a ROWID column. The ROWID column
values are used for retrieving raw results data for matching rows from the data
source as needed. If there is no single primary key column for a table, then
multiple columns that together form a unique value can collectively be
designated as ROWID columns.

Synchronizing EIQ Indexes

It is necessary to keep EIQ Indexes in sync with data in the corresponding data
source to obtain up-to-date results. If the values in the data source are
modified, indexes need to be maintained by updating them with new values. There
are multiple options available for updating EIQ Indexes.

The advantages and disadvantages of federated data systems with SDF EIQ
adapters:

With EIQ Indexed Adapters, queries are
resolved almost 100% at the EIQ Indexes, providing the following benefits:

Advantages:

· High query success because indexes/results data are clean and usable.
· Indexes are consistent across disparate data sources.
· Complete control over query processing at EIQ Indexes
· Data remains at the source.
· Latest data is available
· Almost no index or query load on data source systems
· No major schema transforms.
· EIQ Products can actively monitor indexes and data sources and send
  notifications when complex events happen
· Rapid query response
· Additional indexes
  o De-normalized indexes
  o Indexed views/aggregate indexes
· Additional query capabilities
  o Text Search
  o Entity Extraction
  o Information geometry tool
  o Fuzzy matching
  o Link Indexes™ for performance and link analysis
· Highly flexible
· Provide row, column (and data element) security indexes
· User-level access to data sources
· Return results from indexes even if data sources become unavailable.
· Data sources only aware of low-level results requests; not the full queries.

Apart from these advantages, SDF also
helps alleviate cultural hurdles for implementing data access and information
sharing solutions across organizations.

Disadvantages:

Table 1: Comparison between EIQ Products and other approaches, ranked by
advantage to EIQ Products

Columns: Number, Features, EIQ Products, Data Warehouses, Federated Systems
with Conventional Adapters, Comments

1 Minimal implementation time (Comments: Advantages are unique to EIQ
  Products.)
2 Quickly add new data sources
3 Flexibility of use and users
4 Actively monitor data sources
5 Full text search
6 Unlimited query options and performance
7 De-normalized views
8 Link mapping and analysis/data mapping
9 No major schema transforms
10 Can write to data sources
11 Row, column, and data element security
12 Data source changes readily made
13 Clean and usable data
14 Consistent and multiple indexes and types
15 Almost any data source
16 Do not install anything on data source systems
17 Pre-aggregated and pre-calculated fields
18 Results when data sources are unavailable
19 Data remains at source
20 User-level access to data sources
21 Latest data available
22 Drill-down capabilities
23 No index or query load on data source systems (Comments: Data warehouses
   provide these advantages over EIQ Products and federated systems with
   conventional adapters.)
24 Data source owners are not aware of queries
25 Archive
26 Good for standard application data sources
27 No need for data or index update process
28 No additional system cost

[1] Real-time data warehouses only

SmartData Fabric® Components Overview

The SmartData Fabric® package
comes with a set of servers, client and administrative tools, connectors and
access drivers, and software development kits (SDK). Refer to Figure 4 below for
an overview of the SDF internal architecture and components.

Figure 4: SmartData Fabric® Architecture and Components

Administration Tools

The administration tools included in SDF PG provide a user-friendly interface
that simplifies the server configuration process. The EIQ Server RTI Tool is
used to build the initial EIQ Indexes. The EIQ Server Configuration Tool is
used to configure specific adapter and federation configurations. The EIQ
Update Configuration Tool is used to configure EIQ RTIS. A diagnostic tool and
query client tools and drivers are also included.

· EIQ DS TransMon Configuration Tool

Other Components

· JavaGateway with WhamEE - an Entity Extraction server that extracts entities
  out of text.
· WhamSearch - an intelligent web spider that finds documents matching given
  criteria based on relevant keywords, watch lists, or Information Geometry
  models.

Connectors

Various data source connectors and drivers are supplied for connecting to
databases such as Oracle, Microsoft SQL Server, Teradata, DB2, and Mainframe
data files.

Client Drivers

SDF comes with ODBC, OLEDB, and JDBC drivers to help clients connect with the
SDF Query Server. Clients can also use a Web Services interface for connection.

SmartData Fabric® Components

SDF PG

SDF PG provides adapter and federation virtual data source functionality based
on configuration.

EIQ Server

The EIQ Server is a key component of SDF. The EIQ Server lets admins configure
adapters and federation virtual data sources. These configurations include EIQ
Indexed Adapter, EIQ Hybrid Adapter, EIQ Conventional Adapter, and EIQ
Federation Server. These configurations and the concepts behind the
configurations are described in the SDF Configurations Introduction.
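
The relationship between per-source adapter configurations and a federation
configuration can be pictured roughly as follows. This is a conceptual sketch
only, assuming in-memory rows and a Python predicate in place of SQL; the
`Adapter` and `FederationServer` classes are hypothetical and do not correspond
to any SDF API.

```python
class Adapter:
    """Hypothetical per-source adapter: one instance serves one data source."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def query(self, predicate):
        # Tag each result with its source so merged results stay traceable.
        return [dict(row, source=self.name) for row in self._rows if predicate(row)]


class FederationServer:
    """Hypothetical federation layer: fans a query out and merges the results."""

    def __init__(self, adapters):
        self._adapters = list(adapters)

    def query(self, predicate):
        merged = []
        for adapter in self._adapters:
            merged.extend(adapter.query(predicate))
        return merged


hr = Adapter("hr", [{"id": 1, "city": "Dallas"}, {"id": 2, "city": "Austin"}])
sales = Adapter("sales", [{"id": 9, "city": "Dallas"}])
federation = FederationServer([hr, sales])
dallas_rows = federation.query(lambda row: row["city"] == "Dallas")
```

The client sees a single query interface, while each adapter remains
responsible for exactly one data source.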
EIQ Server Configuration Tool

The EIQ Server Configuration Tool is used for configuring and managing SDF
servers. SDF servers run as services, and they must be running before query
clients can connect. This tool helps admins to:

EIQ RTIS

EIQ RTIS keeps EIQ Indexes in sync with changes in the corresponding data
sources. EIQ RTIS communicates with the data sources through a messaging
system. The messaging system holds any update notification messages from data
sources for EIQ RTIS. It is also capable of monitoring message queues such as
MSMQ and JMS and can be used in place of database triggers.

On the data source side, the admin must set up some form of change recognition
and notification mechanism that uses transaction logs, change logs,
replication, etc., and sends any updates to the messaging system as messages.
EIQ RTIS actively polls the messaging system and applies updates to EIQ Indexes
in near-real-time.

For a detailed description of the index update process and tools, see EIQ
Server Update Process. EIQ RTIS is configured using the EIQ Update
Configuration Tool and EIQ DS TransMon Configuration Tool.

EIQ Update Configuration Tool

The EIQ Update Configuration Tool is used to configure EIQ RTIS and provides an
interface for administering and monitoring it. The EIQ Update Configuration
Tool enables admins to create update tasks associated with data sources and
their EIQ Indexes. Admins can assign EIQ RTIS to a message queue and set the
polling interval. In addition, a mechanism to start and stop the services that
monitor the notifications sent via the messaging system is provided.

EIQ DS TransMon Configuration Tool

The EIQ DS TransMon Configuration Tool is also used to configure EIQ RTIS and
provides an interface for administering and monitoring it. This tool enables
administrators to create tasks associated with data sources and task items
associated with EIQ Server Virtual Data Sources. Administrators can assign a
message queue to each task item. In addition, a mechanism to start and stop the
task items that monitor the data sources is provided.

EIQ Server RTI Tool

The EIQ Server RTI Tool is used to create the initial EIQ Indexes for data
sources. Users can perform various functions including data type mappings,
transforms, and relational key settings (Primary Key/Foreign Key). This tool
also lets users build indexed views and text search indexes with various
options, build and view data profiles, and try various transforms. The EIQ
Server RTI Tool generates map files containing EIQ Index definitions that can
be edited for future modifications.

WhamTech JavaGateway Server

WhamTech JavaGateway serves as a gateway to SDF components for accessing data
sources that support only Java APIs (JDBC, Java APIs).

PGAdmin/PSQL

PGAdmin and PSQL are query client tools that let users:

EIQ Diagnostics Tool

The EIQ Diagnostics
Tool provides message tracing and logging for troubleshooting SDF components
through a simple user-friendly interface. Introduction
Product
Overview
Data Access, Integration, Sharing, and Interoperability
Issues
It's All
About Data:
Conventional Approaches and the SmartData Fabric® EIQ
Approach to Data Issues
Data
Warehousing
Federated Data Systems with Conventional
Adapters
The
advantages and disadvantages of federated data systems with conventional
adapters:
Federated Data Systems Using SmartData
Fabric®
SmartData
Fabric® Configurations
EIQ
Indexes
Synchronizing EIQ Indexes
The
advantages and disadvantages of federated data systems with SDF EIQ
adapters:

Table 1 comments:

· EIQ Products provide these advantages over data warehouses and federated
  systems with conventional adapters.
· EIQ Products provide these advantages over federated systems with
  conventional adapters and share them with data warehouses.
· EIQ Products provide these advantages over data warehouses and share them
  with federated systems with conventional adapters.
· Federated systems with conventional adapters provide these advantages over
  EIQ Products and data warehouses.

[2] Data warehouses typically do not have full text search.
[3] Additional databases or data marts are typically used for unlimited query
options and performance.
[4] Additional databases or data marts are typically used for de-normalized
views.
[5] Additional databases or data marts are typically used for link
analysis/data mining.
[6] No major schema transform if a flat front-end schema is used, e.g., GJXDM
and NIEM in government.
[7] Data owners relinquish control over their data. Only one or two DBMS
vendors provide this level of security.
[8] Only if data sources provide this level of security (see footnote 7).
[9] SDF can accommodate some changes; for instance, indexes can be used to
create new indexes. However, fundamental changes may require individual column
re-indexing.
[10] Federated systems with conventional adapters can accommodate minor
changes, but not to the extent that SDF can.
[11] Many federated systems with conventional adapters require specialized
connectors and/or special access that requires installation of something on
data source systems.
[12] Only for real-time or active data warehouses.
[13] Can be implemented in data warehouses, but not typically.
[14] Small overhead on data source system when retrieving final result-set
data.
[15] Data source system receives low-level requests for specific records only -
not the query that resulted in them.
[16] SDF can store results data; work with mirror-image copies of original data
sources that act as archives, complying with Sarbanes-Oxley and other
regulations (data warehouses are not considered original data source copies);
and maintain an index to archive.
[17] Can use application vendor or third-party change data capture (CDC)
capability or results level indexes.
[18] Only index update process, but not data.
[19] SDF typically has separate servers for query processing and storage for
indexes, but not storage for data or a separate DBMS to maintain, as in data
warehouses.
[20] Federated systems with conventional adapters typically have separate
servers and
require minimal storage.

Copyright © 2023, WhamTech, Inc. All rights reserved. This document is
provided for information purposes only and the contents hereof are subject to
change without notice. Names may be trademarks of their respective owners.