Comments:

Examples of SQL JOIN Syntax

Advanced Syntax

Native API Text Search Syntax

Examples

Appendix A - Native API SQL Syntax

Native API supports most of the popular SQL-92 and SQL-99 syntax.

Supported Commands

Apart from SELECT command, some of the important commands supported are:

WhamTech Native API provides support for SQL-92 and SQL-99 SELECT syntax that covers most commonly used query capabilities, including for business intelligence queries. Following is the supported syntax for the SELECT statement.

SELECT [DISTINCT] <FieldList>

FROM {<TableName> | <DerivedTable>} [TableAlias]

[{ ,<TableName> [TableAlias] } ...]

[<Joined Table [Joined Table[...]]]

[JOIN <JoinTable> <JoinTableAlias> ON <JoinCondition> [JOIN <JoinTable> <JoinTableAlias> ON <JoinCondition>]

[WHERE <Conditions>]

[GROUP BY <FieldList>]

[HAVING <Conditions>]

[ORDER BY <OrderByFieldList>]

[UNION SELECT .]

[RECORDS(StartRow, EndRow)]

Where:

<FieldList> ::= * | <FieldName> [[AS] FieldAlias] | <Numeric/Arithmetic constants>

| <AggregateFunctions> | <Calculated Values>

<DerivedTable> ::= ( SELECT [DISTINCT] <FieldList> FROM . )

<Joined Table> ::= <CROSS JOIN> <TableName> [<TableAlias>] |

<[INNER] JOIN> <TableName> [<TableAlias>] ON <JoinCondition> |

<LEFT [OUTER] JOIN> <TableName> [<TableAlias>] ON <JoinCondition> |

<RIGHT [OUTER] JOIN> <TableName> [<TableAlias>] ON <JoinCondition> |

<JoinCondition> ::= TableName1.PKFieldName = TableName2.FKFieldName

<Conditions> ::= [*<JoinCondition> [ { AND <JoinCondition> },,,] ] AND <Condition> [ { <AND|OR> <Condition> }...]

<Condition> ::=

| <FieldReference> [NOT] IN (<Value>,...)

| <FieldReference> [NOT] LIKE 'String Value'

| <FieldReference> [NOT] BETWEEN <Value> AND <Value>

| <FieldReference> IS [NOT] NULL

* | <FieldReference> [NOT] IN <DerivedTable>

* | <FieldReference> <Compare Operator> [ALL | ANY | SOME | [NOT]

EXISTS ] <DerivedTable>

| <CONTAINS()>

<FieldReference> ::= [{<TableName> | TableAlias} <period>] FieldName

<Compare Operator> ::=

| >=

| <=

| <

| >

| <>

| !=

<Value> ::=

'String Value'

| <NumericValue>

| <DateValue>

<NumericValue> ::= xxx[.yyy]

<TimeStampValue> ::= [TIMESTAMP] <quote><Date String><space><Time String><quote>

<Time String> ::= hh:mm:ss.ttt

<quote> ::= '

<OrderFieldList> ::= <FieldName> [ASC | DESC] [,.]

<CONTAINS()> : Please refer to Text Search Syntax section below for details.

<AggregateFunctions> : Please refer to "Advanced Syntax" section below for

details.

<period> ::= .

Notes

* It is invalid if <JoinCondition> is specified in both <Joined Table> and <WHERE > clause.
* When using a sub-query, FieldName and FieldName1 are connecting fields. They must be the same data type and length (match command requirement). FieldName2,.. FieldNameN are redundant.

Comments:

1. Advanced syntax, including for aggregate functions and arithmetic operators, is described below under the "Advanced Syntax" section.

2. SELECT command supports the Text Search function CONTAINS() in the WHERE condition. CONTAINS() could be used along with normal SQL queries. Please see below under "Text Search Syntax" section.

3. Supports most common uses of nested JOIN and SELF JOIN syntax.

4. Supports UNION, CASE, COALESCE, and several string functions

5. Select statements must use explicit joins (i.e. JOIN clause must be specified).

6. The comma(s) which separate the table in FROM clause is(are) treated as CROSS JOIN.

7. The order of join is based on the following sequence:

a. the order of ON <JoinCondition> OR

b. the order of <JoinCondition> specified in WHERE clause

c. the rest of Tables which are not referenced in On 1) and 2) will be treated as CROSS JOIN table in the order in

FROM <TableName> [TableAlias] [{ ,<TableName> [TableAlias] } ...]

8. DISTINCT keyword works only for queries involving a single table (and not when joined tables are involved).

9. Supports RECORDS() function to return only a subset of records from the results. This function behaves similar to TOP in SQL Server and LIMIT in MySQL. RECORDS() function works for statements involving one single table only.

Examples of SQL JOIN Syntax

select * from t1 inner join t2 on t1.t1k = t2.t1k left join t3 on t2.t3k = t3.t3k
select * from t1, t2 left join t3 on t2.t3k = t3.t3k
select * from t1, t2, t3 where t1.t1k = t2.t1k and t2.t3k = t3.t3k

Exceptions:

select * from t2 left join t3 on t2.t3k = t3.t3k, t1

is invalid.

select * from t1, t2 left join t3 on t2.t3k = t3.t3k where t1.t1k = t2.t1k

is invalid because <JoinCondition> specified both in <Joined Table> and <WHERE > clause.

Advanced Syntax

Aggregate Functions:

SUM() - returns the sum of values
COUNT() - returns the count of values
AVG() - returns the average of values
MIN() - returns the minimum value
MAX() - returns the maximum value
VAR() - returns the variance of values
STD() - returns the standard deviation of values

Arithmetic Operators:

Plus (+)
Minus (-)
Modulus (%)
Division (/)
Multiplication (*)

All aggregate and arithmetic functions work with single table and multiple tables.

Alias names are supported for aggregate and calculated fields.

Exceptions:

Return values from arithmetic operations would always be in floating point. Implication is that an integer division such as 5/2 would result in 2.5 instead of 2.
Alias names could not be references in subsequent arithmetic or aggregate functions in the same query.

For example, SELECT column1 + column2 as A, A + column3 FROM table .. is not supported.

Field Aliases can only be used with GROUP BY and ORDER BY. They cannot be used in the condition (WHERE clause).
Arithmetic operations on fields work only when the fields are from the same table.

For example, "SELECT Table1.Field1 + Table2.Field2 FROM .." is not supported.

GROUP BY does not support calculated fields. Also the GROUP BY modifiers 'ROLLUP' and 'CUBE' are not supported.

Examples in using aggregate functions and arithmetic operators:

Simple calculated value in SELECT with alias

SELECT a, b, a+b as x FROM ...

Simple calculated value in SELECT without alias

SELECT a, b, a+b FROM ...

Simple aggregate in SELECT with alias and without GROUP BY

SELECT sum(a) as x FROM ...

Simple aggregate COUNT(*) in SELECT with alias and without GROUP BY

SELECT COUNT(*) as x FROM ...

Simple aggregate in SELECT without alias and without GROUP BY

SELECT sum(a) FROM ...

Simple aggregate in SELECT with alias and with GROUP BY

SELECT a, sum(a) as x FROM ... GROUP BY b ...

Simple aggregate in SELECT without alias and with GROUP BY

SELECT a, sum(a) FROM ... GROUP BY b ...

Aggregate of expression in SELECT with alias and without GROUP BY

SELECT sum(a + b) as x FROM ...

Aggregate of expression in SELECT without alias and without GROUP BY

SELECT sum(a + b) FROM ...

Select constant value

SELECT 1, 'abcde' FROM

Simple calculated value in WHERE and in ORDER BY

SELECT * FROM .WHERE a+b >20 ORDER by a+b

Examples in using Date and DateTime columns:

select * from users where created = date '2010-3-19' is same as
select * from users where created = '2010-3-19'
select * from users where created = '2010\3\19'
select * from users where created = '2010.3.19 '
select * from users where created = '2010/03/19'
select * from users where created = date '03-19-2010'
select * from users where created = '03-19-2010'
select * from users where created = '03\19\2010'
select * from users where created = '03.19.2010'
select * from users where created = '03/19/2010'
select * from table_3 where dt1 = timestamp '1956-11-13 20:12:05.123'

COALESCE:

Returns the first non-null expression among its arguments.

Syntax:

COALESCE (expression [ ,...n ] )

Where, expression is an expression of any type.

Coalesce returns the data type of expression with the highest data type precedence. If all expressions are non-nullable, the result is typed as non-nullable.

For example:

SELECT ProductID, ProductName, COALESCE(LaunchDate, '2008-01-01') FROM Products

This displays '2008-01-01' for products that have NULL values for the LaunchDate column.

CASE:

Evaluates a list of conditions and returns one of multiple possible result expressions.

The CASE expression has two formats:

. The simple CASE expression compares an expression to a set of simple expressions to determine the result.

. The searched CASE expression evaluates a set of Boolean expressions to determine the result.

Both formats support an optional ELSE argument.

Syntax:

Simple CASE expression:

CASE input_expression

WHEN when_expression THEN result_expression [ ...n ]

[ ELSE else_result_expression ]

END

Searched CASE expression:

CASE

WHEN Boolean_expression THEN result_expression [ ...n ]

[ ELSE else_result_expression ]

END

The Case expression returns the highest precedence type from the set of types in result_expressions and the optional else_result_expression.

For example:

1. SELECT ProductNumber, Category =

CASE ProductLine

WHEN 'R' THEN 'Road'

WHEN 'M' THEN 'Mountain'

WHEN 'T' THEN 'Touring'

WHEN 'S' THEN 'Other sale items'

ELSE 'Not for sale'

END,

Name

FROM Production.Product

2. SELECT ProductNumber, Name, 'Price Range' =

CASE

WHEN ListPrice = 0 THEN 'Mfg item - not for resale'

WHEN ListPrice < 50 THEN 'Under $50'

WHEN ListPrice >= 50 and ListPrice < 250 THEN 'Under $250'

WHEN ListPrice >= 250 and ListPrice < 1000 THEN 'Under $1000'

ELSE 'Over $1000'

END

FROM Production.Product

[NOTE:- Include the following usage of CASE as part of Native API SQL syntax (CASE operator as an argument for an aggregate function):

select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= '1995-09-01' and l_shipdate < date '1995-09-01' + interval '1' month;

]

RECORDS(StartRow, EndRow):

StartRow is the starting row number (the offset of the initial row is 1 - not 0)

EndRow is the ending row number

RECORDS(0, EndRow ) is same as RECORDS(1, EndRow).

RECORDS() function works for statements involving one single table only.

Example:

SELECT empnum FROM employee WHERE employeeDept = 'SAS' RECORDS (6 , 11) - skips the first 5 rows and truncates the 12th.

String Functions:

1. SUBSTRING(str, start_position)

2. SUBSTRING(str, FROM start_position)

3. SUBSTRING(str, start_position, sub_length)

4. SUBSTRING(str, FROM start_position, FOR sub_length )

The forms without length arguments return a substring from string 'str' starting at position 'start_position'.

The forms with a length argument return a substring of 'sub_length' characters long from string 'str', starting at position 'start_position'.

The forms that use FROM are as per standard SQL syntax.

SUBSTR() and SUBSTRING() are synonyms.

5. TRIM([[{BOTH | LEADING | TRAILING}] [remstr] FROM] str)

6. TRIM([remstr FROM] str)

Returns the string 'str' with all 'remstr' prefixes or suffixes removed.

Note the current TRIM() support is limited to string constants.

7. LRIM(str) - removes leading spaces from 'str'.

8. RTRIM(str) - removes trailing spaces from 'str'.

9. LCASE(str) - turns 'str' in to lower case.

10. LOWER(str) - turns 'str' in to lower case.

11. UCASE(str) - turns 'str' in to upper case.

12. UPPER(str) - turns 'str' in to upper case.

Where

str - string constant i.e. 'some text' or string column

remstr - string constant, prefixes or suffixes removed

Native API Text Search Syntax

To use text search commands, you must first build text search indexes in RTI tool. Make sure to check the options that you expect to use. For example, make sure to check "Build Indexes for searching words in the proximity of 1 or 2 words" to enable phrase search. Without these indexes, WhamSearch cannot find phrases. The same applies to other search functions such as stem, synonym and Soundex.

The rules for identifying a word during the build process in RTI tool are: if any character except {A-Z,a-z,0-9, '} is encountered, it is considered a word break. The single quote (') is not considered a word break. For example, O'Brien is indexed as both O'BRIEN and OBRIEN. Ampersand (&) when occurs on its own is considered a special word. Otherwise it is considered a word break.

During the query process, open and close parentheses ( '(' , ')' ), and comma (,) are not allowed in the CONTAINS() as they are part of the query syntax. Characters other than {A-Z,a-z,0-9}are considered invalid to be part of a word and should be stripped before submitting to the query process. & (with the escape character '\') is handled as a special word. Single quote (') requires an escape character of another single quote (eg: O''BRIEN).

SELECT [TextSearchReturnFunctions(ReturnColumn, 'Parameters')] FROM TABLE [WHERE CONTAINS()]

Where,

TextSearchReturnFunctions - can be any of URLCONTEXT(), SUMSCORE(), URLCONTENT(), and NORAMLSUMSCORE()

ReturnColumn - column name to hold the return values

Parameters - parameters required for each function. See below for details for each of the text search return functions.

CONTAINS() - function that is central to text search; finds user specified text. See below for details.

Text search return functions:

URLCONTEXT() returns the context for a word, i.e. the phrase/sentence in the matching document that contains the search word(s).

Syntax:

URLCONTEXT(ReturnColumn,'Repository_Filename, SearchWord')

Where

'Repository_Filename - the name of the repository/cache file containing the text (the bucket file name with .tsi file extension)

SearchWord - can be a single word OR multiple words separated by spaces.

SUMSCORE () returns the absolute score of a search word/phrase within a document using the scores of top N words.

Syntax:

SUMSCORE(ScoreColumn, 'WordScoreColumnPrefix TopWordsColumn SearchWord ')

Where

ScoreColum - the column which returns the calculated score.

WordScoreColumnPrefix - the prefix of the names of columns containing scores for each of the top N words - TopWord1Score, TopWord2Score, etc.

TopWordsColumn - the prefix of the names of columns containing the top N words - TopWord1, TopWord2, etc.

SearchWord: [TextFunctionName] Word [.]

Where

TextFunctionName - Syn or Stem

Word - can be a single word OR multiple words separated by spaces.

The score returned is a positive integer value (greater than or equal to zero). A higher score denotes higher occurrence/ranking of the search word in the particular URL.

This function works only when columns for the top N words and their scores are present in the table.

This function works only for queries involving single tables (no JOIN or UNION).

// Find DOCUMENTS containing synonyms of 'dog' and order the results by the score for 'doc' in descending order

select bodyid, word1, word2, word3, word4, word5, sumscore(score, 'score word syn dog') from DOCUMENTBODY where contains(DOCUMENTBODY, 'DOCUMENT, syn(dog)') order by score desc

URLCONTENT() returns the content of a document from the repository/cache.

Syntax:

SELECT URLCONTENT(destination, 'bucketfilename') . FROM ..

Where

bucketfilename: the field contains tsi file information ( ie relative tsi file name offset and length ) is required.

Destination and bucketfilename can't be the same.

For URLCONTENT() and URLCONTEXT() functions the destination column returns information from document repository (bucket file with '.tsi' file extension).

Note that these functions may not return accurate information for queries involving JOIN and UNION if a record gets processed multiple times as internal buffers get overwritten by subsequent operations.

NORAMLSUMSCORE() returns normalized score (SUMSCORE() divided by the total score of the Top 'N' words in each document (currently 'N' is 20)).

CONTAINS():

Syntax:

CONTAINS(TableName, ' TextColumnName, SearchText [, Min|Max(<MinMaxColumn>,<GroupByColum>)] ')

WHERE

TableName is the name of the table.

TextColumnName is the name of the column to search (text search indexes must already be built for this column).

SearchText can be:

A single word.
Words separated by a space (denotes phrase search).
Words separated by '&' but no spaces between words (denotes AND - both words required to be present in the document).
Words separated by '|' but no spaces between words (denotes OR - any of the words required to be present in the document).
Words can be weighted, meaning users can only for words of certain weight in the content.
FunctionName("Word"), where FunctionName can be any of STEM(), SYN(), SOUNDEX(), METAPHONE(), PROX() text functions.

Weighted words:

Weighted words are specified using the following syntax:

Word:n

where n can be any of 0,1,2,3 to indicate weight level (weights in increasing order from low-weighted to high weighted based on the section (title, heading, normal text etc.) in which they occur in a document/field) of the word in a document. By default, when there is no weight mentioned at the end of the word, it means non weighted words (0 weight).

Text functions:

Stem(Word): Finds words that have a common stem (root). E.g. Stem(Unites) would retrieve the results having words "Unites" and "United" as both have the same stem "Unite".

Syn(Word): Finds words that are synonyms of the specified word. E.g. Syn(United) returns results that contain "Joined", "Combined" and other synonyms of United.

Soundex(Word): Finds words that sound similar to the search word. E.g. Soundex(Omar) would return the result containing 'Omeir' as they are similar-sounding words.

Metaphone(Word): Metaphone is similar to Soundex. However, its algorithm is more accurate as it has knowledge of basic rules of English pronunciation.

Prox(Word1 Word2)]: Finds where "Word1" and "Word2" appear in close proximity to each other in a given document or record. E.g. Prox(United Canada) will display documents/records containing ". United States, Canada ...", ".United Canada .", etc.

Escape sequences for Ampersand (&) and single-quote ('):

To search for the Ampersand (&) character in a phrase or otherwise, use the escape sequence (\&).

To search for the single-quote character, use the escape sequence ('') (two successive single-quote characters).

Comma (,) cannot be searched as part of a phrase.

MAX() and MIN():

Returns only the records that have the Minimum or Maximum value for the <MinMaxColumn> field for each distinct <GroupByColumn>.

Examples

// regular text search

SELECT URLCONTEXT(urlcontext,'repository_file,Word'),userdata FROM urldocument WHERE

CONTAINS(urldocument,'urltext, Word')

// using various text search functions