Link Indexes™ are an add-on option for EIQ Adapters™. In combination with normal content indexes, Link Indexes™ provide metadata and enable MDM support, accelerated database joins, degrees-of-separation queries, a virtual graph database, link analysis, virtual customer/employee/patient/person and other entity-centric views, and data, logical and ontological model discovery, among other capabilities.
Link Indexes™ change the way data is managed: they add structure to content and capture relationships in data, both within a data source and across multiple data sources, regardless of whether the data is structured or unstructured, and regardless of type, format, location, system, cloud or non-cloud hosting, access method, etc. They could ultimately empower end-user-level application development.
HISTORY
WhamTech developed the predecessor of Link Indexes™ through work on its Web search engine during 2000 – 2001, when normal content indexes were used to capture hyperlinks in Web pages and documents, and allowed WhamTech to establish the following:
- Indexing “from and to” links and inverting them to establish “to and from” links enabled backwards and forwards Web navigation and therefore lateral browsing, e.g., going to Web pages that hyperlink TO the current Web page, as well as the easier option of following hyperlinks FROM the current Web page
- Hyperlinked communities on the Web could be discovered and navigated, and Web pages filtered in or out, using WhamTech’s powerful search (for more information, see WHAMSEARCH™)
- Hyperlink indexes added structure to content
- Metadata from hyperlink indexes could be used to calculate how popular Web pages and documents were in general, and how popular and important they were to specific communities by using social network analysis (SNA)
- This popularity and importance metadata could be used for ranking results
- All networks, regardless of structural complexity (hierarchical, relational, random and even 3D objects) could be represented by multiple pairs of links, such as hyperlinks or primary key – foreign key (PK-FK) connections in relational databases
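The link inversion and popularity ideas in the list above can be illustrated with a minimal Python sketch. The page names and the `invert_links` helper are hypothetical, and the inbound-link count is only a crude stand-in for the social network analysis mentioned above, not WhamTech's actual method.

```python
from collections import defaultdict

# Hypothetical "from -> to" hyperlinks; the page names are made up.
forward_links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "d.html": ["c.html", "a.html"],
}


def invert_links(forward):
    """Turn a from->to mapping into a to->from mapping."""
    inverted = defaultdict(set)
    for source, targets in forward.items():
        for target in targets:
            inverted[target].add(source)
    return {page: sorted(sources) for page, sources in inverted.items()}


backward_links = invert_links(forward_links)

# Backwards navigation: pages that hyperlink TO c.html.
pages_linking_to_c = backward_links["c.html"]

# Inbound-link count per page, used here as a simple popularity signal.
inbound_counts = {page: len(sources) for page, sources in backward_links.items()}
```

Each hyperlink is stored as a pair of pages, which is the same pairwise representation the last list item describes for PK-FK connections.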
From WhamTech’s experience with content, structure and ranking of very large data sets, an understanding of data-based solutions evolved, as illustrated in the diagram LI1 below:
LI1: WhamTech EIQ Products combine content, links and ranking
LINK INDEXES™ EVOLVED FROM HYPERLINK INDEXES
In 2001, WhamTech started working on law enforcement and intelligence data access issues, and link analysis. Over time, WhamTech extended hyperlink indexes to arrive at the current Link Indexes™ EIQ Adapter™ add-on, which can be described as follows:
- Link Indexes™ are representations of links between data pointers to records/files/documents in the same or other data sources
- Based on data/entity (collectively known as “entity”) matches, e.g., PK-FK, or exact, fuzzy or algorithm match – these are direct or obvious relationships
- As a consequence, all data in the same record, file or document/paragraph are connected – these are indirect or non-obvious relationships
- Link Indexes™ can be combined, using Boolean operations, to represent networks through simple SQL. WhamTech added LINK (a specialized JOIN) and DOS (degrees-of-separation) to conventional SQL, obviating the need for a specialized graph query language, although WhamTech is considering enabling the new standard Graph Query Language (GQL) in addition to continuing with conventional SQL
- Link Indexes™ can be used in conjunction with content indexes to accelerate internal and external joins and, as noted earlier, to provide metadata and enable MDM support, degrees-of-separation queries, a virtual graph database, link analysis, virtual customer/employee/patient/person and other entity-centric views, and data, logical and ontological model discovery, again using simple SQL
- Other applications are possible
There are unlimited ways that Link Indexes™ can be built, including using hyperlinks and PK-FKs mentioned above, and also using various matches on structured data, extracted entities, context, words, predictive analytics algorithms, combinations of these, etc. To maximize the value and minimize the number of links, Link Indexes™ should be built and maintained using high cardinality entities, i.e., entities that are either unique or have limited populations, such as person (defined as more than just a name), address, email, phone number, SSN, etc. Link Indexes™ can be stored at different levels to coexist with content indexes or not, as needed at individual data source, department, organization or regional levels, combinations of levels or as one at a central level. Regardless of where they are built and maintained, similar to content indexes, Link Indexes™ are 100% contiguous across multiple data sources and levels.
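To see why high cardinality entities keep the number of links down, note that n records sharing the same value produce n(n-1)/2 pairwise links. The following Python sketch counts those pairs for two fields; the records and the `pairwise_link_count` helper are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def pairwise_link_count(values):
    """Count record pairs that share the same value: n*(n-1)/2 per value."""
    counts = Counter(values)
    return sum(n * (n - 1) // 2 for n in counts.values())


# Hypothetical records: (record number, city, email).
records = [
    (1, "Dallas", "ann@example.com"),
    (2, "Dallas", "bob@example.com"),
    (3, "Dallas", "ann@example.com"),
    (4, "Dallas", "cy@example.com"),
    (5, "Austin", "dee@example.com"),
    (6, "Dallas", "eve@example.com"),
]

# City is shared by many records, so it generates many weak links.
city_links = pairwise_link_count(r[1] for r in records)

# Email is nearly unique, so it generates few, more meaningful links.
email_links = pairwise_link_count(r[2] for r in records)
```

In this made-up data, matching on city yields ten links while matching on email yields one, which is the effect the paragraph above describes.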
The following diagram LI2 illustrates the basic differences between normal content indexes and Link Indexes™:
LI2: A Content Index is used to generate a Link Index – in this case, of self-joins within a single (table) index
In the above example, a simple name within a single (table) index is used to generate a Link Index through self-joins. In some cases, there may be separate Link Indexes™ for the different types of links – in other cases, not. The Link Index is an inversion of the content index in the sense that record numbers are made queryable, but not in the sense that content can be read from the Link Index like a normal inverted index.
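As a rough illustration of the relationship between the two index types, the Python sketch below builds a content index over a single table and then derives a link index from it through self-joins. The records, function names and dictionary layout are assumptions made for illustration; they are not the actual index formats.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical single-table records: record number -> name.
records = {1: "SMITH", 2: "JONES", 3: "SMITH", 4: "LEE", 5: "JONES", 6: "SMITH"}


def build_content_index(rows):
    """Inverted content index: value -> sorted record numbers containing it."""
    index = defaultdict(list)
    for record_number in sorted(rows):
        index[rows[record_number]].append(record_number)
    return dict(index)


def build_link_index(content_index):
    """Self-join: record number -> record numbers that share a value with it."""
    links = defaultdict(set)
    for record_numbers in content_index.values():
        for left, right in combinations(record_numbers, 2):
            links[left].add(right)
            links[right].add(left)
    return dict(links)


content_index = build_content_index(records)
link_index = build_link_index(content_index)
```

Note that `link_index` holds only record numbers: it can answer "which records are linked to record 1?" but the name itself has to be read through the content index or the data source.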
LINK MAPPING – BUILDING AND MAINTAINING LINK INDEXES™
The process to build and maintain Link Indexes™ can run in parallel with building and maintaining content indexes or after content indexes are built. The following two diagrams LI3 and LI4 illustrate the Link Index build and maintenance process:
LI3: Link Index™ build and maintenance process across four data source indexes
Note that it does not matter which Link Index™ initiates the link mapping process, as all links will eventually be captured. The link mapping process takes place at the logical/Standard Data Model level and not necessarily at the physical data source schema level. As can be seen, the logical/Standard Data Model represents entities. An example of a four data source link map is illustrated in the following diagram LI4:
LI4: An example of a four data source link map, built and maintained using internal, self and external joins
The above link mapping example results in the Link Indexes™ illustrated in the following diagram LI5, when built and maintained per data source:
LI5: Example of a distributed Link Index built from four data sources
As mentioned earlier, Link Indexes™ do not need to be distributed and the following diagram LI6 is an illustration of a consolidated version of the above distributed Link Index:
LI6: Example of a consolidated Link Index built from four data sources
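The difference between distributed and consolidated Link Indexes™ can be sketched in Python as a merge of per-source link tables. The pointer format `(source name, record number)`, the source names and the `consolidate` helper are all hypothetical, and only three sources are shown for brevity.

```python
from collections import defaultdict

# Hypothetical per-source link indexes.
# A pointer is (source name, record number); values are linked pointers.
distributed = {
    "DS1": {("DS1", 1): {("DS2", 7), ("DS1", 4)}},
    "DS2": {("DS2", 7): {("DS1", 1), ("DS3", 2)}},
    "DS3": {("DS3", 2): {("DS2", 7)}},
}


def consolidate(per_source):
    """Merge per-source link indexes into one index keyed by pointer."""
    merged = defaultdict(set)
    for link_index in per_source.values():
        for pointer, neighbors in link_index.items():
            merged[pointer] |= neighbors
            # Record the reverse direction so every link can be walked both ways.
            for neighbor in neighbors:
                merged[neighbor].add(pointer)
    return dict(merged)


consolidated = consolidate(distributed)
```

Either layout carries the same links; the choice in this sketch only affects where the lookups happen.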
LINK ANALYSIS
To illustrate working with Link Indexes™, the above consolidated Link Index is simplified and a query is submitted at a lower entity level to determine if two entities belonging to two nodes (e.g., records in separate databases) are connected in any way. These nodes may contain multiple entities that are inherently connected by being part of the same record. An example could be a link query: “Does a person with a particular SSN have any connection to a vehicle with a particular VIN?” WhamTech passes the appropriate SQL query to an EIQ Federation Server™. The degrees of separation (DOS) for the solution may or may not be specified; in this example, the DOS is not specified. The example Link Index is simplified and the link query is represented in the following diagram LI7:
LI7: Simplified representation of a consolidated Link Index, with source and target nodes specified
Entity queries on content indexes are used to initially isolate the source and target nodes, but thereafter, the process to determine the solution involves only the Link Index, or Link Indexes™ in the distributed case. However, from a presentation point of view, entities associated with any and all nodes can be read from data sources, presented and interacted with visually, as in a conventional link analysis application; this process is discussed further below. When seeking a solution using one or more Link Indexes™, “walking the tree” (or “trees”) from the source node on one side and the target node on the other, along with Boolean operations on bitmap subsets, results in a rapidly converging solution, as illustrated in the following diagram LI8:
LI8: Two solutions determined from the link analysis performed on the Link Index
Of the two solutions determined, the path in black is shorter than the one in gray; however, there could be additional metadata that favors the longer path, such as a higher probability of the links or entities, or a more recent timeline. The link analysis process is performed in middleware, in the background, where the end-user is unaware of it, as it is a query function performed in multiple query engines associated with multiple EIQ Adapters™ for multiple data sources. Any and all entities associated with any and all nodes can be read from data sources and visually presented. WhamTech currently OEMs and supports a thin-client, highly interactive commercial graph visualization tool called KeyLines, but has interfaced in the past, and can still interface, with other open source and commercial tools.
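The approach of walking from the source and target nodes at the same time can be sketched as a bidirectional breadth-first search. In the Python sketch below, Python sets stand in for bitmap subsets and set intersection stands in for the Boolean operations; the node names, the graph and the `degrees_of_separation` function are hypothetical, and the link index is assumed to store every link in both directions.

```python
def expand(link_index, frontier):
    """Return all nodes one link away from any node in the frontier."""
    return set().union(*(link_index.get(node, set()) for node in frontier))


def degrees_of_separation(link_index, source, target, max_dos=None):
    """Smallest number of links joining source and target, or None."""
    if source == target:
        return 0
    seen = {"src": {source}, "tgt": {target}}
    frontier = {"src": {source}, "tgt": {target}}
    hops = 0
    while frontier["src"] and frontier["tgt"]:
        if max_dos is not None and hops >= max_dos:
            return None
        # Grow the smaller side first to keep the search narrow.
        side = "src" if len(frontier["src"]) <= len(frontier["tgt"]) else "tgt"
        other = "tgt" if side == "src" else "src"
        reached = expand(link_index, frontier[side])
        hops += 1
        # Boolean AND of the two sides: any overlap means the walks have met.
        if reached & seen[other]:
            return hops
        frontier[side] = reached - seen[side]
        seen[side] |= frontier[side]
    return None


# Hypothetical simplified link index with a 3-link and a 4-link path.
link_index = {
    "SSN-rec": {"A", "B"},
    "A": {"SSN-rec", "C"},
    "B": {"SSN-rec", "D"},
    "C": {"A", "VIN-rec"},
    "D": {"B", "E"},
    "E": {"D", "VIN-rec"},
    "VIN-rec": {"C", "E"},
}

shortest = degrees_of_separation(link_index, "SSN-rec", "VIN-rec")
within_two = degrees_of_separation(link_index, "SSN-rec", "VIN-rec", max_dos=2)
```

This sketch returns only the shortest degree of separation. Ranking alternative paths by link probability or recency, as described above, would need extra metadata on each link.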
An example of the transition from innate physical models to logical models is illustrated in the following diagrams LI9 and LI10:
LI9: Physical to logical data model, grouped on entities
LI10: Logical model, grouped on entities, combined with physical model
Of note in the above, even though more than one link was discovered between URN1 and URN10, and between URN1 and URN100, only one link is represented in Link Indexes™. There was more than one matching entity in each case: PERSON1 and PHONE1, and PERSON1 and EMAIL1, respectively. Link Indexes™ capture the “physical” links between records/files/documents and represent “physical models”, which can be visualized, but this is not normally how end-users interact with link networks. An application running in middleware combines records/files/documents retrieved through content indexes with those retrieved through Link Indexes™ to group similar entities and provide “logical models”, which are more familiar and can be visualized by almost any visualization software, as mentioned earlier. The following diagram LI11 shows a screenshot from an interactive link analysis visualization interface built using KeyLines graph/link visualization software.
LI11: An interactive link analysis visualization interface built using KeyLines graph/link visualization software.
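The distinction between physical and logical models described above can be sketched in Python. The record contents below are assumptions loosely based on the URN1, URN10 and URN100 example, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical records (URNs) and the entities found in each.
record_entities = {
    "URN1": {"PERSON1", "PHONE1", "EMAIL1"},
    "URN10": {"PERSON1", "PHONE1"},
    "URN100": {"PERSON1", "EMAIL1"},
}


def physical_links(records):
    """One link per record pair, however many entities the pair shares."""
    urns = sorted(records)
    return {
        (a, b)
        for i, a in enumerate(urns)
        for b in urns[i + 1:]
        if records[a] & records[b]
    }


def logical_model(records):
    """Group records under each entity they contain."""
    grouped = defaultdict(set)
    for urn, entities in records.items():
        for entity in entities:
            grouped[entity].add(urn)
    return dict(grouped)


physical = physical_links(record_entities)
logical = logical_model(record_entities)
```

Here URN1 and URN10 share two entities but contribute a single physical link, while the logical model shows each shared entity with the records grouped under it.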
The end-user now has the power to:
- Apply filters, ranges, probabilities and threat/favorability scores
- Update automatically in near real-time within n degrees of separation (DOS)
- Update manually and interactively, select, combine, separate, delete, expand and analyze
- View original records/files/documents
- View more detail
- Execute social network analysis (SNA) calculations
- Set alerts/notifications with thresholds and importance
- Execute external data source queries
- Etc.
All of this power can be made available through a highly interactive, visual and comprehensive link analysis solution built on EIQ Products, which can scale across multiple large data sources and does not need data to be extracted to a database for analysis. For more information, see solutions and the near-real-time updatable solutions, which allow for DOS-based alerts/notifications, e.g., “let me know when any new entities that appear in any data source link to my network within two DOS, with a probability > 80% and a threat score > 60%.” If a threat or favorability score is known for an entity, e.g., address, person, vehicle, etc., WhamTech can use guilt-by-association network techniques to estimate threat or favorability scores of other entities. This, combined with the probabilities of links and entities being accurate, the authority of data sources, social media analysis and social network analysis, creates a powerful analytics tool that could be used for more than just intelligence, e.g., virtual/hybrid CDI-MDM, marketing, fraud detection, anti-money laundering, predictive analytics, etc.
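The example alert rule quoted above (within two DOS, probability > 80%, threat score > 60%) can be expressed as a simple filter. The Python sketch below uses a hypothetical `Candidate` record and made-up entities; it shows only the threshold check, not how DOS, probabilities or threat scores are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity: str
    dos: int             # degrees of separation from the watched network
    probability: float   # confidence in the link, from 0.0 to 1.0
    threat_score: float  # estimated threat, from 0.0 to 1.0


def should_alert(candidate, max_dos=2, min_probability=0.80, min_threat=0.60):
    """Apply the DOS, probability and threat thresholds from the example rule."""
    return (
        candidate.dos <= max_dos
        and candidate.probability > min_probability
        and candidate.threat_score > min_threat
    )


# Hypothetical newly appearing entities.
candidates = [
    Candidate("PERSON7", dos=1, probability=0.92, threat_score=0.75),
    Candidate("VEHICLE3", dos=2, probability=0.85, threat_score=0.40),
    Candidate("ADDRESS9", dos=3, probability=0.95, threat_score=0.90),
    Candidate("PHONE4", dos=2, probability=0.81, threat_score=0.61),
]

alerts = [c.entity for c in candidates if should_alert(c)]
```

In this made-up data, only PERSON7 and PHONE4 satisfy all three thresholds.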
Link Indexes™ can be further used for data discovery and to discover, merge, extend, validate and present ontologies.