WhamTech security-centric distributed data virtualization, federation, master data management, virtual integration, virtual graph database and visualization, and analytics technology
Combining the best of conventional federated data access, data lake, data warehouse and enterprise search solutions
SmartData Fabric® is used to deliver comprehensive data solutions to enable:
- Access to almost any and many data sources on almost any and many platforms, e.g., cloud, on-premise, data centers, SaaS/third-party only, etc.
- Management of different types of data, e.g., structured, unstructured and semi-structured, with different processors to build and maintain different types of indexes on data that needs them
- Upfront data discovery, classification, security, profiling, quality and standardization
- Dynamic data masking, tokenization and encryption, depending on access control and user credentials
- Parallel distributed indexing and query processing in near real-time, or at other update intervals and by other update methods
- A seamless combination of master data with all other data, which is essential for data integration and virtual graph database and visualization
- Standard drivers, standard query languages and standard data views based on standard data models
- Advanced capabilities to support almost any and many applications, e.g., operational, reporting, BI and analytics
- Highly interactive link analysis and visualization, and other graphics through third-party apps
- Advanced access control, data governance and data security based on roles and user credentials
- Multiple configuration options, e.g., cloud only, on-premise only, hybrid cloud 1.0, multi-cloud, and an ideal hybrid cloud 2.0 that allows all compute in the cloud while data remains where it is
Unique indexes are key to enabling advanced capabilities and driving significant value:
A paradigm shift has happened in the world of data. Extract, Transform and Load (ETL) led the way for a long time, but vast amounts of highly diverse data are now created on highly diverse systems, some of them SaaS/third-party. Copying and moving all this data is unscalable, very expensive and, in some cases, not possible, and the copies raise security issues of their own. SmartData Fabric® has pioneered a new process known as Read, Transform and INDEX (RTI). Instead of extracting, moving and copying data, RTI accesses and reads the data that needs to be processed and indexed, leaves it wherever it resides, and stores pointers to the data in its sources in indexes.
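To make the RTI idea concrete, here is a minimal sketch in Python. It is an illustration only and not WhamTech's implementation: the `SOURCES` data, the `transform` rule and the `build_index` function are all hypothetical names chosen for this example. It shows the three steps in order: read each record where it resides, transform the value to a standard form, and index a pointer (source name plus source key) rather than a copy of the record.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for remote data sources.
SOURCES = {
    "crm": [
        {"id": 101, "name": " Alice SMITH "},
        {"id": 102, "name": "bob jones"},
    ],
    "billing": [
        {"id": "B-7", "name": "Alice Smith"},
    ],
}


def transform(value):
    """Standardize a value before indexing: trim, collapse whitespace, lowercase."""
    return " ".join(value.split()).lower()


def build_index(sources, field):
    """Read each record in place, transform one field, and index a pointer to the record."""
    index = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            # The pointer is a reference only; the record itself is not copied.
            pointer = (source_name, record["id"])
            index[transform(record[field])].append(pointer)
    return dict(index)


name_index = build_index(SOURCES, "name")
print(name_index["alice smith"])
```

Because the values are standardized before indexing, the differently formatted names in the two hypothetical sources resolve to the same index entry, and each pointer still identifies the source and record it came from.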
There are three types of indexes: content, link and master data. Content indexes are the basis for the other two. All indexes resolve to “record numbers”, which are internal to SmartData Fabric® (SDF) but correlated with external and data source references, and which can be combined using Boolean operations on physical and virtual bitmaps.
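The way Boolean operations on bitmaps combine index results can be sketched as follows. This is a simplified illustration under stated assumptions, not SDF's internal format: it uses plain Python integers as bitmaps, and the `build_bitmaps` and `record_numbers` helpers and the sample `RECORDS` are hypothetical. Bit n of a bitmap is set when record number n matches the indexed value.

```python
def build_bitmaps(records, field):
    """Map each distinct value of `field` to an integer bitmap over record numbers."""
    bitmaps = {}
    for record_number, record in enumerate(records):
        value = record[field]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << record_number)
    return bitmaps


def record_numbers(bitmap):
    """Expand a bitmap back into a sorted list of record numbers."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]


# Hypothetical sample data; list position is the record number.
RECORDS = [
    {"city": "Dallas", "status": "active"},  # record 0
    {"city": "Austin", "status": "active"},  # record 1
    {"city": "Dallas", "status": "closed"},  # record 2
    {"city": "Dallas", "status": "active"},  # record 3
]

city = build_bitmaps(RECORDS, "city")
status = build_bitmaps(RECORDS, "status")

# Boolean AND and OR across two different indexes.
dallas_and_active = city["Dallas"] & status["active"]
austin_or_closed = city["Austin"] | status["closed"]

print(record_numbers(dallas_and_active))  # [0, 3]
print(record_numbers(austin_or_closed))   # [1, 2]
```

The query is answered entirely from the bitmaps; source records would only need to be fetched for the record numbers that survive the Boolean combination.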
Content indexes have eight possible types, depending on the configuration:
- Source data
- Composite (source data combined)
- Derived from source data
- Indexed views (pre-aggregated, calculated and joined data) that can also be business objects
- Unstructured text
- Extracted entities
- Fuzzy match
- Security or access level – a form of derived data
Although WhamTech SmartData Fabric® can include conventional (non-index-based) federated data access (FDA) adapters, its advanced capabilities and much of its value come from unique INDEX-based FDA adapters that:
- Build and maintain DATA PROFILES using raw indexes for data discovery (metadata), data matching within and across data sources, and developing and testing data transforms.
- Support FORRESTER ZERO TRUST DATA SECURITY FRAMEWORK – discover, INDEX, classify and secure – GDPR, PCI, PHI, PII, etc.
- PREPROCESS DATA to build and maintain production indexes to address data management fundamentals, e.g., cleansing, transformation, standardization and security – data usually discarded.
- Use LINK INDEXES™ AS BASIS FOR MDM – future development that can be used exclusively for MDM match and merge.
- COMBINE CONTENT, LINK AND MASTER DATA INDEXES for complete and multiple views of data.
- Index POINTERS UNIQUELY REFER TO DATA in sources, copies, e.g., Data Lake, or stored in indexes – pointers retained in results data = full traceability.
- Enable HIGH PERFORMANCE, DISTRIBUTED PARALLEL QUERY PROCESSING through standard drivers, APIs, Web/data services, SQL and other query languages.
- MONITOR DATA SOURCES for content and relationships in near real-time, and support EVENT PROCESSING.
- Enable VIRTUAL GRAPH DATABASE, link analysis and graph/link visualization.
- As columnar indexes, can be INVERTED AND COMBINED TO GENERATE DATA RECORDS for optimization, when data sources are unavailable, or as storage for IoT devices, for example.
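The last point, inverting and combining columnar indexes to regenerate data records, can be sketched as follows. This is a hypothetical illustration, not the SDF index format: each column index is assumed to map a value to the list of record numbers holding it, and `invert_to_records` and the sample `COLUMN_INDEXES` are names invented for this example.

```python
def invert_to_records(column_indexes):
    """Rebuild records from per-column indexes of the form value -> record numbers."""
    records = {}
    for column, value_index in column_indexes.items():
        for value, numbers in value_index.items():
            for n in numbers:
                # Place this column's value into the record it belongs to.
                records.setdefault(n, {})[column] = value
    return [records[n] for n in sorted(records)]


# Hypothetical columnar indexes for three sensor readings.
COLUMN_INDEXES = {
    "device": {"sensor-a": [0, 2], "sensor-b": [1]},
    "state": {"ok": [0, 1], "alarm": [2]},
}

rebuilt = invert_to_records(COLUMN_INDEXES)
for row in rebuilt:
    print(row)
```

Since every column value is recoverable from its index, the records can be reassembled without going back to the original source, which is the property the bullet relies on.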
Other configuration options include a hybrid adapter that uses an index-based FDA adapter for data that needs indexes and a conventional (non-index-based) FDA adapter for data that does not, assuming that the data source system can support external queries. The results from the two types of FDA adapters on the same data source can be combined through a join using an SDF federation server or a third-party federated query engine, such as Trino®, Presto® or Dremio®.
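A rough sketch of the hybrid pattern is shown below. Everything in it is an assumption made for illustration: an in-memory SQLite table stands in for a source system that supports external queries, `NAME_INDEX` stands in for a prebuilt index, and `indexed_lookup`, `conventional_query` and `hybrid_join` are hypothetical functions, not SDF, Trino, Presto or Dremio APIs. The index side answers the search and returns source keys; the conventional side fetches non-indexed columns for those keys; the two result sets are joined on the key.

```python
import sqlite3

# Hypothetical source system that can answer external queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice Smith", 120.0), (2, "Bob Jones", 75.5), (3, "Alyce Smyth", 310.0)],
)

# Hypothetical prebuilt index: standardized name -> source keys (includes a fuzzy match).
NAME_INDEX = {"alice smith": [1, 3], "bob jones": [2]}


def indexed_lookup(key):
    """Index-based side: resolve a search key to source keys without touching the source."""
    return [{"id": i, "matched_key": key} for i in NAME_INDEX.get(key, [])]


def conventional_query(connection, ids):
    """Conventional side: query the source directly for non-indexed columns."""
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, name, balance FROM customers WHERE id IN ({placeholders}) ORDER BY id"
    return [{"id": r[0], "name": r[1], "balance": r[2]} for r in connection.execute(sql, ids)]


def hybrid_join(connection, key):
    """Join the results of both sides on the shared source key."""
    left = indexed_lookup(key)
    ids = [row["id"] for row in left]
    if not ids:
        return []
    right = {row["id"]: row for row in conventional_query(connection, ids)}
    return [{**row, **right[row["id"]]} for row in left if row["id"] in right]


results = hybrid_join(conn, "alice smith")
for row in results:
    print(row)
```

In this sketch the join is done in application code for brevity; in the configuration described above, that role would be played by the federation server or a federated query engine.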
Multiple solutions to challenging problems in one scalable, distributed technology
The SmartData Fabric® allows for indexed and non-indexed, adapter-based virtualized and federated access to data across large numbers of data sources, types, and formats, while leaving data in its original sources.