Saturday 8 March 2014

My Words for SQL Testing Information.


Types of Joins:

Joins are one of the most important operations performed by a relational database system. An RDBMS uses joins to match rows from one table with rows from another table. For example, we can use joins to match sales with customers or books with authors. Without joins, we might have a list of sales and customers or books and authors, but we would have no way to determine which customers bought which items or which authors wrote which books.
We can join two tables explicitly by writing a query that lists both tables in the FROM clause. We can also join two tables by using a variety of different sub-queries. Finally, SQL Server may introduce joins for a variety of purposes into a query plan during optimization.
This is the first of several posts that I am planning for joins. In this post, I’m going to start by introducing the logical join operators that SQL Server supports. These are:
While different implementations have many ways of joining tables, you concentrate on the most common joins in this lesson. The types of joins that you learn are
  • EQUIJOINS( Inner join)
  • NATURAL JOINS
  • NON-EQUIJOINS
  • Using Table Aliases
  • OUTER JOINS
  • SELF JOINS
  • Cross join
  • Cross apply
  • Semi-join
  • Anti-semi-join
Here is a simple schema and data set that I will use to illustrate each join type:
create table Customers (Cust_Id int, Cust_Name varchar(10))
insert Customers values (1, 'Craig')
insert Customers values (2, 'John Doe')
insert Customers values (3, 'Jane Doe')
create table Sales (Cust_Id int, Item varchar(10))
insert Sales values (2, 'Camera')
insert Sales values (3, 'Computer')
insert Sales values (3, 'Monitor')
insert Sales values (4, 'Printer')
Inner joins
Perhaps the most used and important of the joins is the EQUIJOIN, also referred to as an INNER JOIN. The EQUIJOIN joins two tables with a common column in which each is usually the primary key.
The syntax for an EQUIJOIN is
SELECT TABLE1.COLUMN1, TABLE2.COLUMN2...
FROM TABLE1, TABLE2 [, TABLE3 ]
WHERE TABLE1.COLUMN_NAME = TABLE2.COLUMN_NAME
[ AND TABLE1.COLUMN_NAME = TABLE3.COLUMN_NAME ]
NOTE
Inner joins are the most common join type. An inner join simply looks for two rows that put together satisfy a join predicate. For example, this query uses the join predicate “S.Cust_Id = C.Cust_Id” to find all Sales and Customer rows with the same Cust_Id:
select *
from Sales S inner join Customers C
on S.Cust_Id = C.Cust_Id
Cust_Id Item Cust_Id Cust_Name
----------- ---------- ----------- ----------
2 Camera 2 John Doe
3 Computer 3 Jane Doe
3 Monitor 3 Jane Doe
Notes:
  • Cust_Id 3 bought two items so this customer row appears twice in the result.
  • Cust_Id 1 did not purchase anything and so does not appear in the result.
  • We sold a ‘Printer’ to Cust_Id 4. There is no such customer so this sale does not appear in the result.
Inner joins are fully commutative. “A inner join B” and “B inner join A” are equivalent.

Natural Joins

NATURAL JOIN is nearly the same as the EQUIJOIN; however, theNATURAL JOIN differs from the EQUIJOIN by eliminating duplicate columns in the joining columns. The JOIN condition is the same, but the columns selected differ.
The syntax is as follows:
SELECT TABLE1.*, TABLE2.COLUMN_NAME
    [ TABLE3.COLUMN_NAME ]
FROM TABLE1, TABLE2 [ TABLE3 ]
WHERE TABLE1.COLUMN_NAME = TABLE2.COLUMN_NAME
[ AND TABLE1.COLUMN_NAME = TABLE3.COLUMN ]
Look at the following example:
SELECT EMPLOYEE_TBL.*, EMPLOYEE_PAY_TBL.SALARY
FROM EMPLOYEE_TBL,
   EMPLOYEE_PAY_TBL
WHERE EMPLOYEE_TBL.EMP_ID = EMPLOYEE_PAY_TBL.EMP_ID;
This SQL statement returns all columns from EMPLOYEE_TBL andSALARY from the EMPLOYEE_PAY_TBL. The EMP_ID is in both tables, but is retrieved only from the EMPLOYEE_TBL because both contain the same information and do not need to be selected.
The following example selects all columns from the EMPLOYEE_TBL table and only one column from the EMPLOYEE_PAY_TBL table. Remember that the asterisk (*) represents all columns of a table.
SELECT EMPLOYEE_TBL.*, EMPLOYEE_PAY_TBL.POSITION
FROM EMPLOYEE_TBL, EMPLOYEE_PAY_TBL
WHERE EMPLOYEE_TBL.EMP_ID = EMPLOYEE_PAY_TBL.EMP_ID;
EMP_ID     LAST_NAM FIRST_NA M ADDRESS          CITY         ST  ZIP  PHONE
---------- -------- -------- - -------------   ------------  -- ----- ----------
PAGER      POSITION
---------- --------------
311549902  STEPHENS TINA     D  RR 3 BOX 17A    GREENWOOD    IN 47890 3178784465
           MARKETING
 
442346889  PLEW     LINDA    C  3301 BEACON     INDIANAPOLIS IN 46224 3172978990
           TEAM LEADER
 
213764555 GLASS    BRANDON S 1710 MAIN ST    WHITELAND    IN 47885 3178984321
3175709980 SALES MANAGER
 
313782439 GLASS    JACOB       3789 RIVER BLVD INDIANAPOLIS IN 45734 3175457676
8887345678 SALESMAN
 
220984332  WALLACE  MARIAH      7889 KEYSTONE   INDIANAPOLIS IN 46741 3173325986
           SHIPPER
 
443679012  SPURGEON TIFFANY     5 GEORGE COURT  INDIANAPOLIS IN 46234 3175679007
           SHIPPER

6 rows selected.
NOTE
Notice how the output has wrapped in the previous example. The wrap occurred because the length of the line has exceeded the limit for the line (which is usually 80 characters per line by default).

Using Table Aliases

The use of table aliases means to rename a table in a particular SQL statement. The renaming is a temporary change. The actual table name does not change in the database. As you will learn later in this hour, giving the tables aliases is a necessity for the SELF JOIN. Giving tables aliases is most often used to save keystrokes, which results in the SQL statement being shorter and easier to read. In addition, fewer keystrokes means fewer keystroke errors. Also, programming errors are typically less frequent if you can refer to an alias, which is often shorter in length and more descriptive of the data with which you are working. Giving tables aliases also means that the columns being selected must be qualified with the table alias. The following are some examples of table aliases and the corresponding columns:
SELECT E.EMP_ID, EP.SALARY, EP.DATE_HIRE, E.LAST_NAME
FROM EMPLOYEE_TBL E,
   EMPLOYEE_PAY_TBL EP
WHERE E.EMP_ID = EP.EMP_ID
AND EP.SALARY > 20000;
The tables have been given aliases in the preceding SQL statement. The EMPLOYEE_TBL has been renamed E. The EMPLOYEE_PAY_TBL has been renamed EP. The choice of what to rename the tables is arbitrary. The letter E is chosen because the EMPLOYEE_TBL starts with E. Because the EMPLOYEE_PAY_TBL also begins with the letter E, you could not use E again. Instead, the first letter (E) and the first letter of the second word in the name (PAY) are used as the alias. The selected columns were justified with the corresponding table alias. Note thatSALARY was used in the WHERE clause and must also be justified with the table alias.

Joins of Non-Equality

NON-EQUIJOIN joins two or more tables based on a specified column value not equaling a specified column value in another table. The syntax for the NON-EQUIJOIN is
FROM TABLE1, TABLE2 [, TABLE3 ]
WHERE TABLE1.COLUMN_NAME != TABLE2.COLUMN_NAME
[ AND TABLE1.COLUMN_NAME != TABLE2.COLUMN_NAME ]
Cross joins
A cross join performs a full Cartesian product of two tables. That is, it matches every row of one table with every row of another table. You cannot specify a join predicate for a cross join using the ON clause though you can use a WHERE clause to achieve essentially the same result as an inner join.
Cross-joins are fairly uncommon. Two large tables should never be cross joined as this will result in a very expensive operation and a very large result set.
select *
from Sales S cross join Customers C
Cust_Id Item Cust_Id Cust_Name
----------- ---------- ----------- ----------
2 Camera 1 Craig
3 Computer 1 Craig
3 Monitor 1 Craig
4 Printer 1 Craig
2 Camera 2 John Doe
3 Computer 2 John Doe
3 Monitor 2 John Doe
4 Printer 2 John Doe
2 Camera 3 Jane Doe
3 Computer 3 Jane Doe
3 Monitor 3 Jane Doe
4 Printer 3 Jane Doe
Cross apply
We introduced cross apply in SQL Server 2005 to enable joins with a table valued function (TVF) where the TVF has a parameter that changes for each execution. For example, the following query returns the same result as the above inner join using a TVF and cross apply:
create function dbo.fn_Sales(@Cust_Id int)
returns @Sales table (Item varchar(10))
as
begin
insert @Sales select Item from Sales where Cust_Id = @Cust_Id
return
end
select *
from Customers cross apply dbo.fn_Sales(Cust_Id)
Cust_Id Cust_Name Item
----------- ---------- ----------
2 John Doe Camera
3 Jane Doe Computer
3 Jane Doe Monitor
We can also use outer apply to find all Customers regardless of whether they purchased anything. This is similar to an outer join.
select *
from Customers outer apply dbo.fn_Sales(Cust_Id)
Cust_Id Cust_Name Item
----------- ---------- ----------
1 Craig NULL
2 John Doe Camera
3 Jane Doe Computer
3 Jane Doe Monitor
Semi-join and Anti-semi-join
A semi-join returns rows from one table that would join with another table without performing a complete join. An anti-semi-join returns rows from one table that would not join with another table; these are the rows that would be NULL extended if we performed an outer join.
Unlike the other join operators, there is no explicit syntax to write “semi-join,” but SQL Server uses semi-joins in a variety of circumstances. For example, we may use a semi-join to evaluate an EXISTS sub-query:
select *
from Customers C
where exists (
select *
from Sales S
where S.Cust_Id = C.Cust_Id
)
Cust_Id Cust_Name
----------- ----------
2 John Doe
3 Jane Doe
Unlike the previous examples, the semi-join only returns each customer one time.
The query plan shows that SQL Server indeed uses a semi-join:
|--Nested Loops(Left Semi Join, WHERE:([S].[Cust_Id]=[C].[Cust_Id]))
|--Table Scan(OBJECT:([Customers] AS [C]))
|--Table Scan(OBJECT:([Sales] AS [S]))
There are left and right semi-joins. A left semi-join returns rows from the left (first) input that match rows from the right (second) input while a right semi-join returns rows from the right input that match rows from the left input.
We might similarly use an anti-semi-join to evaluate a NOT EXISTS sub-query.
Miscellaneous notes
In all of the above examples, I used a join predicate that compares whether two columns, one from each table, are equal. This type of join predicate is called an “equijoin.” Other join predicates (such as inequalities) are possible, but equijoins are especially common and SQL Server has many more alternatives when optimizing equijoins than when optimizing joins with more complex predicates.
SQL Server has more flexibility over join order and algorithms when optimizing inner joins than when optimizing outer joins and cross applies. Thus, given two queries that differ only in that one strictly uses inner joins while the other uses outer joins and/or cross applies, SQL Server may be able to find a better plan for the query that uses inner joins only.