Saturday, March 25, 2017

Loading Data Script for Snowflake type Date Dimension

The Date dimension is the most common dimension in data warehousing and it is generally designed as a single table. However, if it needs to be used with multiple Fact Tables at different levels, the Date dimension should be broken into multiple tables, which creates a Snowflake Schema. If the table is broken into multiple tables such as Year, Quarter, Month and Date, loading them is a bit difficult. A script for loading a single table is available on many sites, but it is hard to find a script for loading multiple tables. I had the same requirement and could not find a proper script, hence wrote one.

Here are the scripts if you need the same.

First, the script for creating the tables;
CREATE TABLE dbo.DimYear
(
 YearKey smallint 
 , Year smallint not null
 , Constraint pk_DimYear Primary Key (YearKey)
);
GO

CREATE TABLE dbo.DimQuarter
(
 QuarterKey smallint 
 , YearKey smallint not null
 , Quarter smallint not null
 , YearQuarter varchar(20) not null
 , QuarterName varchar(20) not null
 , Constraint pk_DimQuarter Primary Key (QuarterKey)
 , Constraint fk_DimQuarter_DimYear Foreign Key (YearKey)
  References dbo.DimYear (YearKey)
);
GO

CREATE TABLE dbo.DimMonth
(
 MonthKey int 
 , QuarterKey smallint not null
 , MonthNumber smallint not null
 , MonthName varchar(20) not null
 , YearMonth varchar(20) not null
 , MonthShortName char(3) not null
 , Constraint pk_DimMonth Primary Key (MonthKey)
 , Constraint fk_DimMonth_DimQuarter Foreign Key (QuarterKey)
  References dbo.DimQuarter (QuarterKey)
);
GO

CREATE TABLE dbo.DimDate
(
 DateKey int 
 , MonthKey int not null
 , Date date not null
 , WeekDayNumber smallint not null
 , WeekDayName varchar(20) not null
 , DayOfMonth smallint not null
 , DayOfYear smallint not null
 , IsWeekend bit not null
 , IsHoliday bit not null
 , WeekNumberOfYear smallint not null
 , Constraint pk_DimDate Primary Key (DateKey)
 , Constraint fk_DimDate_DimMonth Foreign Key (MonthKey)
  References dbo.DimMonth (MonthKey)
);
GO


And here is the script for loading all the tables;
SET DATEFIRST 1;

DECLARE @StartDate date = '2010-01-01';
DECLARE @EndDate date = '2019-12-31';
DECLARE @YearKey smallint;
DECLARE @Quarter smallint, @QuarterKey smallint;
DECLARE @Month int, @MonthKey int;
DECLARE @Date int, @DateKey int;

WHILE (@StartDate <= @EndDate)
BEGIN

 -- Inserting years
 SET @YearKey = YEAR(@StartDate);
 IF NOT EXISTS (SELECT * FROM dbo.DimYear WHERE YearKey = @YearKey)
  INSERT INTO dbo.DimYear (YearKey, [Year]) VALUES (@YearKey, @YearKey);


 -- Inserting quarters
 SET @QuarterKey = Convert(smallint, Convert(varchar(4), YEAR(@StartDate)) + Convert(varchar(1), DATEPART(q, @StartDate)))
 SET @Quarter = DATEPART(q, @StartDate);

 IF NOT EXISTS (SELECT * FROM dbo.DimQuarter WHERE QuarterKey = @QuarterKey)
  INSERT INTO dbo.DimQuarter (QuarterKey, YearKey, [Quarter], YearQuarter, QuarterName) 
  VALUES (@QuarterKey, @YearKey, @Quarter, Convert(varchar(4), YEAR(@StartDate)) + 'Q' + Convert(varchar(4), @Quarter)
   , CASE @Quarter WHEN 1 THEN 'First Quarter' WHEN 2 THEN 'Second Quarter' WHEN 3 THEN 'Third Quarter' ELSE 'Fourth Quarter' END);
  

 ---- Inserting months
 SET @MonthKey = Convert(int, Convert(varchar(4), YEAR(@StartDate)) + (CASE LEN(MONTH(@StartDate)) WHEN 1 THEN '0' ELSE '' END) + Convert(varchar(2), MONTH(@StartDate)));
 SET @Month = MONTH(@StartDate)
 IF NOT EXISTS (SELECT * FROM dbo.DimMonth WHERE MonthKey = @MonthKey)
  INSERT INTO dbo.DimMonth (MonthKey, QuarterKey, MonthNumber, MonthName, YearMonth, MonthShortName) 
  VALUES (@MonthKey, @QuarterKey, @Month, DATENAME(MONTH, @StartDate)
  , Convert(varchar(4), YEAR(@StartDate)) + ' ' + LEFT(DATENAME(MONTH, @StartDate), 3)
  , LEFT(DATENAME(MONTH, @StartDate), 3));
  
 ---- Inserting dates
 SET @DateKey = Convert(int, Convert(varchar(8), @StartDate, 112))
 IF NOT EXISTS (SELECT * FROM dbo.DimDate WHERE DateKey = @DateKey)
  INSERT INTO dbo.DimDate (DateKey, MonthKey, Date, WeekDayNumber, WeekDayName, DayOfMonth
   , DayOfYear, IsWeekend, IsHoliday, WeekNumberOfYear) 
  VALUES (@DateKey, @MonthKey, @StartDate, DatePart(WEEKDAY, @StartDate), DateName(WEEKDAY, @StartDate), DatePart(DAY, @StartDate)
   , DatePart(DAYOFYEAR, @StartDate), CASE WHEN DatePart(WEEKDAY, @StartDate) IN (6, 7) THEN 1 ELSE 0 END, 0
   , DatePart(WEEK, @StartDate));

 SET @StartDate = DATEADD(dd, 1, @StartDate);
END;
GO


Saturday, March 18, 2017

SQL Server Backup Expiry Date Setting

I would say this is the most confusing property in Microsoft SQL Server. Most misunderstand it and many do not know how it works. Although there are many posts on this, only a few explain it with an example, and some articles/posts make it more confusing, hence making this post the way I understand it;

If you refer to MSDN, it explains it well, but in simple terms, what it says is: if you set an expiry date for your backup set (if you need to understand what backup media set, backup media family, backup device or backup set are, read this) or if you set Retention days, the backup set can be overwritten only when the backup set is expired or after the number of days mentioned with Retention days (this is shown as After x days in the GUI).

Simply, if I take a backup today (19-Mar-2017) and set the expiry date as 25-Mar-2017, I will not be able to overwrite the backup set until 25-Mar-2017. Remember, the backup can be restored on any day regardless of the expiry date set.

Here is an example;

Let's take a backup of the AdventureWorks2014 database. See both ways; using the GUI and TSQL. Note that I have mentioned a name for the Media too. Without this, the expiry setting will not work as we expect.




BACKUP DATABASE [AdventureWorks2014] 
TO  DISK = N'E:\DatabaseBackups\AdventureWorks2014.bak' 
WITH  EXPIREDATE = N'03/25/2017 00:00:00', FORMAT, INIT
,  MEDIANAME = N'Media01',  NAME = N'AdventureWorks2014-Full Database Backup'
, SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

Now if you try to take another backup to the same media by overwriting the existing backup sets;


BACKUP DATABASE [AdventureWorks2014] 
TO  DISK = N'E:\DatabaseBackups\AdventureWorks2014.bak' 
WITH NOFORMAT, INIT
,  MEDIANAME = N'Media01',  NAME = N'AdventureWorks2014-Full Database Backup'
, NOSKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

SQL Server will throw this error;


If you still need to overwrite it, with TSQL, you can use SKIP instead of NOSKIP, or with GUI, do not mention the Media name.

Wednesday, March 15, 2017

Cannot change the SQL Server Tempdb log file size during the installation

Every SQL Server database has two types of files; Data files and Log files, and the same goes for system databases as well. You can change the sizes of User-Defined Databases when they are created but not when System Databases are being created by the installation. However, with SQL Server 2016, we have a new page in the installation wizard that allows you to make some changes to Tempdb.


As you see, you can change the number of files required for tempdb, their initial sizes and the increments for Auto Growth. Although the data file size can be changed as you want, it does not allow you to change the Log Initial Size beyond 1,024 MB. If you set a larger value, it automatically sets it back to 1,024 MB.

This does not mean that you cannot set a larger value for the tempdb log file. Once the installation is done, you can change it later to the size you need. I assume that the reason for not allowing a size larger than 1,024 MB during the installation is to save the overall installation time.
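
If a larger log file is needed, it can be set right after the installation with ALTER DATABASE. Here is a minimal sketch; the sizes are just example values and templog is the default logical name of the tempdb log file.

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 4096MB, FILEGROWTH = 512MB);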

Tuesday, March 14, 2017

Which protocol has been used for my SQL Server connection?

SQL Server uses 3 protocols for the communication between the client and the server. Initially there were 4 protocols but now it supports only 3: Shared Memory, Named Pipes and TCP/IP. We can enable/disable these protocols at the server end and change the priority order at the client end. Now, how do I know which protocol has been used for my connection?

We can easily see this by using the sys.dm_exec_connections dynamic management view. It shows all current connections along with the protocol used. The net_transport column is the one that shows it.

Here is a sample. The first connection, 54, was made without specifying anything additional when connecting, hence it has used Shared Memory. This protocol is used when it is enabled and the connection is made from the same machine that hosts SQL Server. The second connection, 56, has been established using Named Pipes because I forced Named Pipes for my connection.
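
If you want to try it yourself, here is a minimal query sketch that shows the protocol used by your own connection;

SELECT session_id, net_transport, auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;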


How can I force the protocol when connecting via SSMS? It is simple. When connecting, if you use lpc: as the prefix for the server name, it uses Shared Memory. If you use np:, then it uses Named Pipes.


Wednesday, March 8, 2017

Adding a Hash column using HASHBYTES based on all columns to all tables

We use either Checksum or Hashbytes for generating a value used for detecting changes of records when records need to be transferred from one source to another and changes cannot be identified at the source end. This is especially used in data warehousing. I have written two posts on it; you can read them for getting an idea of it;


I had to implement something similar with another database but it was not at the design stage. The database was already developed and being used, and it was over 1 TB. The requirement was adding a column that holds a Hash based on all existing columns. Since there were more than 300 tables, it was not practical to open the Design of each table and add the column. The only option I had was to form a dynamic query that alters the table by adding the column. I had to consider a few things with the implementation;
  • All tables should be altered.
  • All columns should be used for generating the Hash.
  • Tables that have records must set Hash immediately.
  • Nulls must be replaced with blanks because a NULL in the concatenation makes the whole input NULL and Hashbytes would return NULL.
Considering all, I wrote the following code for altering tables. You may use the same if you have the same requirement;

-- Getting table names into a table
-- A temporary table or table variable can be used for this
SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords 
INTO dbo.TableNames
FROM
 (
 SELECT t.name TableName, t.object_id TableId, SUM(p.rows) NofRecords 
 FROM sys.partitions p
  INNER JOIN sys.tables t
   ON p.object_id = t.object_id
 WHERE p.index_id < 2 AND t.type = 'U'
 GROUP BY t.name, t.object_id) AS t;

-- Adding a clustered index
-- This is not required if the nof tables is low
CREATE CLUSTERED INDEX IX_TableNames ON dbo.TableNames (Id);
GO

DECLARE @Id int = 1;
DECLARE @LastId int = 0;
DECLARE @TableName varchar(500)
DECLARE @TableId int
DECLARE @Sql varchar(max)
DECLARE @Columns varchar(8000)

SELECT @LastId = COUNT(*) FROM dbo.TableNames;

-- Iterate through all tables
WHILE (@Id <= @LastId)
BEGIN

 SELECT @TableName = TableName, @TableId = TableId FROM dbo.TableNames WHERE Id = @Id;
 
 SET @Sql = 'ALTER TABLE dbo.' + @TableName;
 SET @Sql += ' ADD ';
 SET @Sql += ' MIG_HashKey AS HASHBYTES(''MD5'', ';
 
 -- get all columns, convert them to varchar
 -- and replace null with blank value
 SELECT @Columns = STUFF((SELECT '+ IsNull(Convert(varchar(4000), ' + name + '),'''')' FROM sys.columns WHERE object_id = @TableId FOR XML PATH ('')), 1, 2, '');

 SET @Sql += @Columns;
 SET @Sql += ') ';

 -- Execute the code
 BEGIN TRY
  EXEC (@Sql);
 END TRY
 BEGIN CATCH

  PRINT ERROR_MESSAGE()
  PRINT @Sql;
 END CATCH

 SET @Sql = '';
 SET @Columns = ''
 SET @Id += 1; 
END
GO

Tuesday, March 7, 2017

How to get the total row count of all SQL Server tables

I had a requirement for getting the record count of all tables in one of my client's databases that had many tables with over 10 million records. There are many ways of getting this, hence I explored some to find the most efficient way. I analyzed many techniques using various approaches. Here are some of the ways I used and the time each took to produce the result;

  1. Using sys.partitions Catalog View - 1 second
  2. Using SELECT COUNT(*) with sp_MSforeachtable - 10 minutes
  3. Using sys.indexes and dm_db_partition_stats - 1 second

One thing we need to remember is that the database we have to work with can have tables with different structures. One can be a heap and another can have a clustered structure. Not only that, if we use Dynamic Management Objects or objects that depend on Statistics, we may not get accurate output. However, the 1st option worked well for me; here is the code I wrote for getting the result;

SELECT ROW_NUMBER() OVER(ORDER BY NofRecords) Id, TableName, TableId, NofRecords 
--INTO dbo.TableNames
FROM
 (
 SELECT t.name TableName, t.object_id TableId, SUM(p.rows) NofRecords 
 FROM sys.partitions p
  INNER JOIN sys.tables t
   ON p.object_id = t.object_id
 WHERE p.index_id < 2 AND t.type = 'U'
 GROUP BY t.name, t.object_id) AS t;
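
For completeness, here is a sketch of the 3rd option using sys.dm_db_partition_stats; it follows the same idea as the query above;

SELECT t.name TableName, SUM(s.row_count) NofRecords 
FROM sys.dm_db_partition_stats s
 INNER JOIN sys.tables t
  ON s.object_id = t.object_id
WHERE s.index_id < 2 AND t.type = 'U'
GROUP BY t.name;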


Monday, March 6, 2017

Creating HDInsight Hadoop Cluster with Integration Services (SSIS) Azure HDInsight Create Cluster Task

I have written two posts on HDInsight + SSIS based on the Azure Feature Pack and they speak about how to create an HDInsight Cluster and how to process semi-structured data using Hive. You can read them at;

With the latest release of the Azure Feature Pack, some of the settings have been changed and creating a cluster is easier with the new task than the previous one. This post discusses the way of configuring the Azure HDInsight Create Cluster Task.

In order to create an HDInsight Hadoop Cluster using SSIS, you need to make sure that the following steps have been done;
  1. Azure Feature Pack is installed.
  2. Azure Storage is created and you have the name and key of it.
  3. Azure Active Directory Application is created and linked with the subscription used.
Since the installation of the Azure Feature Pack is straightforward, we do not need to discuss it here. Again, creating a Storage account is also a very common task with almost all Azure related implementations, hence it is not going to be discussed either. Let's start with the Active Directory Application creation.

Creating an Active Directory Application and linking it to the subscription
Here are the steps for creating it.
1. Login to the portal and open the Azure Active Directory blade.
2. Click on App Registration and click Add.


3. In the Create Blade, enter the Name of the app and select the Application Type as Web app / API. Enter the Sign-on URL as you want. It can be changed later as needed, hence enter something and continue even though the domain is not registered with Azure.


4. Once the application is created, you need to take a copy of the Application ID.


5. You need to create an Application Key and take a copy of it too. Go to Keys under Settings and create one. Note the message that appears; it is better to get a copy of the key before closing the blade.


6. You need the Tenant Id for creating the cluster. Go to Properties of the Active Directory Blade and get the Directory Id. It is the Tenant Id.


7. You need two more values to be copied and kept. Go to the Subscription Blade and take a copy of your Subscription Id. The other one is the resource group used; take a copy of its name too. You need to make sure everything is under one resource group.

8. Next is assigning or linking the created application to your subscription. Open the Subscription Blade and select the subscription you need to use. Then select Access Control (IAM) and click on the Add button.


9. With the Add Access Blade, select the Role you need to set for the Application. You need to make sure that you select the right role for this.


10. With the Add Users Blade, search for the Application Name and select it.



At the end of these steps, you have;
  1. Application Id
  2. Application Key
  3. Tenant Id
  4. Subscription Id
  5. Resource Group Name
Let's start with the SSIS package now. Here are the steps for configuring it.

Creating a SSIS Package and configuring Azure HDInsight Create Cluster Task
Here are the steps for doing it.

1. Create an SSIS project and a package. Drag and drop the Azure HDInsight Create Cluster Task.


2. Open the properties of the task and select < New Connection...> for AzureResourceManagerConnection. Create a connection using the copied items: Application Id, Application Key, and Tenant Id.


3. Select < New Connection...> for AzureStorageConnection. Enter the Storage account name and Account key that will be used by the cluster.


4. Next, fill in the following;
  • Subscription Id - enter the copied one
  • ResourceGroup - enter the resource group used for the storage.
  • Location - Select the location used for creating the storage. Everything should be in the same location.
  • ClusterName - enter the name of the cluster you need.
  • ClusterSize - enter the number of Worker Nodes you need in your cluster.
  • BlobContainer - enter the name of the Container that should be created for holding cluster files in the storage.
  • Username and Password - enter the user name and password for the cluster.
  • SshUserName and Password - enter the user name and the password for accessing the cluster remotely.
Everything required is done. If you execute the package or task now, it will create the cluster for you.


Generally it takes about 10-15 minutes. You can open the portal and verify once it is created.


Remember, once this is created, whether you use it or not, you will be charged. Therefore, it is better to delete the cluster once the required operation is completed. You can use the Azure HDInsight Delete Cluster Task for that. Here are the settings for it.



Sunday, March 5, 2017

Table Value Function or Stored Procedure - SQL Server Brain Basher of the Week #067

Let's talk about a very common scenario that comes up with all solutions. If you need to implement a business logic using a database object such as a View, Stored Procedure, or Function, how do you decide which is best for your requirements?

Here is the Interview Question of the week. Should you use a Stored Procedure or Table Value Function (TVF) for the given scenario?

You have a table called Customer that has a column called CreditLimit. You must write a module that accepts a CustomerId and a Credit Limit, and updates the CreditLimit column with the submitted Credit Limit. You need to make sure that exceptions are handled as well.

In order to answer this question, you should know a certain set of factors that can be used for deciding whether the logic must be implemented using a Stored Procedure or a Function. Let's see each and every factor.

The first thing you should know is that the output of a Stored Procedure cannot be easily combined with another output generated by SELECT, though INSERT EXEC is possible. However, this can be easily done with TVFs. In addition to that, Stored Procedures allow us to return multiple results but a function can return only a single result.

The other main thing is modification. You cannot modify records inside Functions but it is possible within Stored Procedures. And you can form a statement dynamically and execute it inside a Stored Procedure but it is not possible with Functions.

Exception handling is not possible with Functions but detailed exception handling is possible with Stored Procedures.

Considering these factors, the answer is Stored Procedure because data modification and exception handling are not possible with Functions.
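
For the given scenario, a minimal sketch of the Stored Procedure approach could look like the following; the procedure name and the simple CATCH block are just for illustration.

CREATE PROCEDURE dbo.UpdateCustomerCreditLimit
 @CustomerId int
 , @CreditLimit decimal(16, 4)
AS
BEGIN
 BEGIN TRY
  -- data modification is possible inside a Stored Procedure
  UPDATE dbo.Customer
   SET CreditLimit = @CreditLimit
  WHERE CustomerId = @CustomerId;
 END TRY
 BEGIN CATCH
  -- detailed exception handling is possible here
  PRINT ERROR_MESSAGE();
 END CATCH
END
GO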

Saturday, March 4, 2017

Get all SQL Server tables that have IDENTITY enabled

Here is a useful script. If you need to find out which tables have the Identity property enabled, you can simply query the sys.tables Catalog View combined with the OBJECTPROPERTY function.

USE AdventureWorks2014;
GO

SELECT SCHEMA_NAME(schema_id) + '.' + name TableName 
FROM sys.tables
WHERE OBJECTPROPERTY(object_id, 'TableHasIdentity') = 1;

Remember, the OBJECTPROPERTY function can be used to check many properties related to SQL Server objects. See this MSDN page for more details on it;
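
For example, a few other documented properties can be checked the same way; here is a small sketch;

SELECT SCHEMA_NAME(schema_id) + '.' + name TableName 
 , OBJECTPROPERTY(object_id, 'TableHasIdentity') HasIdentity
 , OBJECTPROPERTY(object_id, 'TableHasPrimaryKey') HasPrimaryKey
 , OBJECTPROPERTY(object_id, 'TableHasClustIndex') HasClusteredIndex
FROM sys.tables;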


Friday, March 3, 2017

SQL Server View does not show newly added columns

A few months back, this was experienced by one of my clients and I was asked the reason for it. Although I wanted to make a post on it, I could not and forgot, but it popped up again while I was going through notes related to my class.

How can this happen? Assume that you have created a view that references a table called Customer using a SELECT * statement. When you access the view with a SELECT statement, it returns all columns in the Customer table.

USE tempdb;
GO

CREATE TABLE dbo.Customer
(
 CustomerId int identity(1,1) Primary key
 , Name varchar(200) not null
);
GO

CREATE OR ALTER VIEW dbo.vwCustomer
AS
SELECT * FROM dbo.Customer;
GO

SELECT * FROM dbo.vwCustomer;


Next, you add a new column to the table and expect to see all columns, including the newly added one, when retrieving records using the view.

ALTER TABLE dbo.Customer
ADD CreditLimit decimal(16, 4) not null;
GO

SELECT * FROM dbo.vwCustomer;


As you see, the result does not have the newly added column. What is the reason?

This is because of;
  1. We have used SELECT * statement inside the view.
  2. The metadata of the view has not been updated/refreshed.
If you avoid SELECT * statements when you create views, obviously you do not see this issue because the view is created with a defined column set. If you have used SELECT *, then the only option is to either ALTER the view or refresh it.

EXEC sp_refreshview 'dbo.vwCustomer';

Once the view is refreshed, the new column will appear in the result.

Thursday, March 2, 2017

Speeding up INSERTs to Columnstore index by bypassing the Deltastore

The Columnstore index is designed for retrieving a large number of records much faster, with aggregations and grouping. With SQL Server 2016, we can make a table a Clustered Columnstore table and have additional Rowstore indexes for handling SEEK types of requests. Generally we make our data warehousing tables Clustered Columnstore tables but it is not limited to data warehouses.

When inserting records into a Clustered Columnstore table, there are a few best practices for speeding up the process;
  1. Use Bulk Operations for inserting instead of inserting row by row.
  2. Load data in parallel.
There are multiple ways of inserting records as a bulk insert. Generally, in data warehousing, when we have data to be inserted, we load the data into staging and then load it to the main table. When loading from staging to the main table, if we use INSERT INTO ... SELECT then it becomes a bulk operation. What is the benefit? The benefit is getting data directly into a Row Group without sending it through the Deltastore. However, this happens only when the batch size is 102,400 rows or above.

See the code below. The first code loads data using a bulk operation but the number of records is less than 102,400. See the result after that.

INSERT INTO dbo.Sales_2
 ([DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate])
SELECT TOP (5000) [DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate]
FROM ContosoRetailDW.dbo.FactSales
WHERE YEAR(datekey) IN (2008, 2009);


As you see, the Deltastore is used for holding the records. The next code inserts more than 102,400 records. See the result.

SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');

INSERT INTO dbo.Sales_2
 ([DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate])
SELECT TOP (102401) [DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate]
FROM ContosoRetailDW.dbo.FactSales
WHERE YEAR(datekey) IN (2008, 2009)
ORDER BY DateKey;

-- check all rowgroups
SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');


As you see, records go directly to a Compressed Row Group, which is the best. See what happens when you insert the same number of records without using a bulk operation.

SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');

DECLARE cur CURSOR
FOR
SELECT TOP (102401) [DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
      ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
      ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate]
FROM ContosoRetailDW.dbo.FactSales
WHERE YEAR(datekey) IN (2008, 2009)
ORDER BY DateKey;

DECLARE @DateKey datetime ,@channelKey int ,@StoreKey int ,@ProductKey int ,@PromotionKey int ,@CurrencyKey int ,
  @UnitCost money ,@UnitPrice money ,@SalesQuantity int ,@ReturnQuantity int ,@ReturnAmount money ,@DiscountQuantity int ,
  @DiscountAmount money ,@TotalCost money ,@SalesAmount money ,@ETLLoadID int ,@LoadDate datetime ,@UpdateDate datetime 

OPEN cur
FETCH NEXT FROM cur INTO @DateKey , @channelKey, @StoreKey, @ProductKey, @PromotionKey, @CurrencyKey,@UnitCost, @UnitPrice,
  @SalesQuantity, @ReturnQuantity, @ReturnAmount, @DiscountQuantity, @DiscountAmount, @TotalCost,
  @SalesAmount, @ETLLoadID, @LoadDate, @UpdateDate

WHILE (@@FETCH_STATUS = 0)
BEGIN

 INSERT INTO dbo.Sales_2
  ([DateKey] ,[channelKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey]
    ,[UnitCost] ,[UnitPrice] ,[SalesQuantity] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity]
    ,[DiscountAmount] ,[TotalCost] ,[SalesAmount] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate])
 values
  (@DateKey , @channelKey, @StoreKey, @ProductKey, @PromotionKey, @CurrencyKey,@UnitCost, @UnitPrice,
  @SalesQuantity, @ReturnQuantity, @ReturnAmount, @DiscountQuantity, @DiscountAmount, @TotalCost,
  @SalesAmount, @ETLLoadID, @LoadDate, @UpdateDate)

 FETCH NEXT FROM cur INTO @DateKey , @channelKey, @StoreKey, @ProductKey, @PromotionKey, @CurrencyKey,@UnitCost, @UnitPrice,
  @SalesQuantity, @ReturnQuantity, @ReturnAmount, @DiscountQuantity, @DiscountAmount, @TotalCost,
  @SalesAmount, @ETLLoadID, @LoadDate, @UpdateDate
END 

CLOSE cur
DEALLOCATE cur

-- check all rowgroups
SELECT * 
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.Sales_2');
GO


See the result. The row group is not compressed. It will be automatically compressed when the number of records exceeds 1,048,576, but it is better to use a bulk operation for getting it compressed with the initial loading.

Wednesday, March 1, 2017

SQL Server Licensing for High Availability

Licensing is always complex and sometimes we cannot figure out the number of licenses we have to buy for production. It is always better to contact someone from Microsoft for licensing because it is not always as we think; it changes with many options.

This post is not about how to decide the number of licenses you need for Microsoft SQL Server; it is about the licenses you have to buy when you configure High Availability.

There are two models in SQL Server Licensing; Server + CAL and Core based. If the devices and users connecting to SQL Server are known and not many, the Server + CAL model works well. If the number of connections is unknown and many connections are expected, then the Core-based model is the best. However, you need to calculate the cost and compare before making the decision.

We use many techniques for implementing High Availability; we can use Log Shipping, Mirroring (which is deprecated), Clustering and AlwaysOn Availability Groups. I was recently working on a similar implementation and had a question whether we need to purchase a license for the secondary server as it is not always active. Do we really need to purchase licenses for the secondary/stand-by server?

This was with Enterprise Edition, hence the only model we can use with SQL Server 2016 is Core-based. While searching on this, I found a good resource that explains everything needed; here are some points I have taken from it.


First of all, let's see what a core is.


Your server can have multiple processors and a processor can have multiple cores. The above image shows a machine with 2 processors, each an octo-core processor with 8 cores. If you have a machine like the above, you need to buy 16 Core licenses.

When you implement High Availability with Windows Clustering (whether it is in the Cloud or On-Premises, it is the same), one node becomes Active and the other node becomes Passive. Since the Passive Node will not be used unless there is a fail-over, it does not require licenses for its cores.


However, if you use an AlwaysOn Availability Group with Windows Clustering and the second node will be used as a read-only instance for queries and reporting, then it needs licenses based on the number of cores.



Tuesday, February 28, 2017

CHECK constraints accept values that evaluate to UNKNOWN

A few days back, I wrote a post titled SQL Server Default and Rule objects - Should not I use them now? that discussed two deprecated objects that can be used for enforcing data integrity. I received a question based on it, related to the CHECK Constraint.

A CHECK Constraint limits the values for columns based on the condition added. It can be set with a column, or it can be set for the entire record by adding it to the table. If you are adding a CHECK Constraint for enforcing data integrity, you need to remember how it works.

A CHECK Constraint works with any Boolean Expression that can return True, False or Unknown. If the value is False, it will be rejected and if the value is True, it will be accepted. However, if the value is Unknown, it is accepted without being rejected. Therefore, you need to be very careful with the condition you write because, if the condition returns NULL, it will be treated as True.

You can understand it by looking at the following code;

USE tempdb;
GO

CREATE TABLE dbo.Student
(
 StudentId int primary key
 , Name varchar(100) NOT NULL
 , Marks int NULL
 , Credit int NOT NULL
 , Constraint ck_Student_Marks_Credit CHECK (Marks + Credit > 100)
);
GO

-- This record can be inserted
INSERT INTO dbo.Student 
 VALUES (1, 'Dinesh', 60, 55);

-- This record cannot be inserted
INSERT INTO dbo.Student 
 VALUES (2, 'Yeshan', 40, 40);

-- This record CAN BE INSERTED
INSERT INTO dbo.Student 
 VALUES (3, 'Priyankara', null, 60);


Monday, February 27, 2017

NULL values consume storage in SQL Server Tables?

While we were discussing data types to be used and null-able columns, a question was raised asking whether NULLs use space just as SQL Server uses space for other values. My answer was Yes and No because it depends on the data type used.

NULL is not exactly a value. It indicates that the value is unknown, hence it requires some bits/bytes to maintain it. However, if I set one of my column values as NULL, can I assume that it will not use the space the data type is supposed to use? For example, if I have a column with data type int that uses 4 bytes per value and I insert a record with NULL for the column, will SQL Server still use 4 bytes or only a few bits for the NULL?

It is always better to write some code for testing and come to a conclusion. Therefore, let's test this with four tables. The code below creates;
  1. Customer_Without_NullValues_FixedWidthType table
  2. Customer_With_NullValues_FixedWidthType table
  3. Customer_Without_NullValues_VaryWidthType table
  4. Customer_With_NullValues_VaryWidthType table
The last three columns of the First and Second tables are set with the date data type and all are null-able. And the last three columns of the Third and Fourth tables are set with the varchar(4000) data type and all are null-able.

USE tempdb;
GO

CREATE TABLE dbo.Customer_Without_NullValues_FixedWidthType
(
 Id int identity(1,1) PRIMARY KEY
 , Name varchar(100) NOT NULL
 , DateOfBirth date NULL
 , RegisteredDate date NULL
 , LastPurchasedDate date NULL
);
GO

CREATE TABLE dbo.Customer_With_NullValues_FixedWidthType
(
 Id int identity(1,1) PRIMARY KEY
 , Name varchar(100) NOT NULL
 , DateOfBirth date NULL
 , RegisteredDate date NULL
 , LastPurchasedDate date NULL
);
GO


CREATE TABLE dbo.Customer_Without_NullValues_VaryWidthType
(
 Id int identity(1,1) PRIMARY KEY
 , Name varchar(100) NOT NULL
 , Details1 varchar(4000) NULL
 , Details2 varchar(4000) NULL
 , Details3 varchar(4000)  NULL
);
GO

CREATE TABLE dbo.Customer_With_NullValues_VaryWidthType
(
 Id int identity(1,1) PRIMARY KEY
 , Name varchar(100) NOT NULL
 , Details1 varchar(4000) NULL
 , Details2 varchar(4000) NULL
 , Details3 varchar(4000)  NULL
);
GO

The next code inserts 100,000 records into each table. However, the last three columns of the Second and Fourth tables are filled with NULLs instead of known values.

INSERT INTO dbo.Customer_Without_NullValues_FixedWidthType
 (Name, DateOfBirth, RegisteredDate, LastPurchasedDate)
 VALUES
 ('a', getdate(), getdate(), getdate());

INSERT INTO dbo.Customer_With_NullValues_FixedWidthType
 (Name, DateOfBirth, RegisteredDate, LastPurchasedDate)
 VALUES
 ('a', null, null, null);

INSERT INTO dbo.Customer_Without_NullValues_VaryWidthType
 (Name, Details1, Details2, Details3)
 VALUES
 ('a', REPLICATE('a', 4000), REPLICATE('a', 4000), REPLICATE('a', 4000));

INSERT INTO dbo.Customer_With_NullValues_VaryWidthType
 (Name, Details1, Details2, Details3)
 VALUES
 ('a', null, null, null);

GO 100000

In order to see the space usage, the easiest way is to check the number of pages read for data retrieval.
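
Here is one way of checking it, a sketch using STATISTICS IO; compare the logical reads reported for each table.

SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.Customer_Without_NullValues_FixedWidthType;
SELECT COUNT(*) FROM dbo.Customer_With_NullValues_FixedWidthType;
SELECT COUNT(*) FROM dbo.Customer_Without_NullValues_VaryWidthType;
SELECT COUNT(*) FROM dbo.Customer_With_NullValues_VaryWidthType;

SET STATISTICS IO OFF;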


As you see, the space usage of the First and Second tables is the same regardless of the values stored. It means fixed-width data types need the defined space whether the value is null or not. However, the Third and Fourth tables clearly show that it is not the same with variable-width data types. When the data type varies in length, a NULL does not need the space defined with the type.

Sunday, February 26, 2017

Azure SQL Database or SQL Server on Azure Virtual Machine - SQL Server Brain Basher of the Week #066

Here is a modern interview question on SQL Server. In addition to general questions on administration and maintenance, Azure based questions have become common now because many maintain databases either in a VM hosted in the cloud or as a Cloud-Managed-Database. Therefore, let's talk about an interview question related to Azure.

Here is the question: What are the factors you can use for determining whether the database has to be set up with a Virtual Machine hosted on Azure or set up as an Azure SQL Database?



There is a very common answer for this question. If we want to get the entire database managed by Azure, such as Disaster Recovery, Backup, or Performance Tuning, then Azure SQL Database is the best. If we need to manage the instance by ourselves, then we need to go ahead with SQL Server on an Azure Virtual Machine.

There is nothing wrong with that answer but it is always better to know a few more factors that can be used for deciding the way. Let's talk about some important factors.

Azure SQL Database is a relational Database-as-a-Service (DBaaS) (that falls under the industry categories of Software-as-a-Service - SaaS and Platform-as-a-Service - PaaS) and SQL Server on an Azure Virtual Machine is Infrastructure-as-a-Service (IaaS). Azure SQL Database is completely managed by Microsoft; you do not need to worry about the Operating System, configuring hardware, service packs or patches. But if it is SQL Server on an Azure Virtual Machine, then we need to manage everything ourselves. It is good if you have a team or a dedicated DBA for managing the database and you need to be the administrator of it, not Microsoft.

An Azure SQL Database Server is a logical server though it can hold multiple databases. It is not considered an Instance like the one you configure in SQL Server on an Azure Virtual Machine. You know that you can have multiple unrestricted administrators in an instance, which is possible with SQL Server on an Azure Virtual Machine but not possible with Azure SQL Database.

Another key thing you need to remember is that Microsoft makes most new features available initially with Azure SQL Database before making them available with other versions. That is why you see features such as Threat Detection and Temporal Tables with Retention Policies only available with Azure SQL Database. In addition to that, it has an intelligent database service that recommends possible optimizations for performance tuning.

Do not forget, Azure SQL Database is limited to 1 TB whereas an instance of SQL Server on an Azure Virtual Machine can consume 64 TB with multiple databases.

We do not need to worry about High Availability with DBaaS as it is provided with the service. IaaS needs a manual setup with multiple VMs using clustering + Availability Groups because High Availability is only for the VM, not for the SQL Server instance.

Azure SQL Database has no hardware and administrative cost whereas SQL Server on an Azure Virtual Machine has administrative cost. When it comes to license cost, Azure SQL Database is sold as a service based on multiple tiers and SQL Server on an Azure Virtual Machine comes with a license, but you can use your own license if you need.

If you need to use other components such as Integration Services, Analysis Services or Reporting Services, then SQL Server on an Azure Virtual Machine is the only option, though some services such as Analysis Services are available as a managed service.

There are a few more tiny reasons for picking one over the other, but as an interviewee, the mentioned ones should be known.


Saturday, February 25, 2017

SQL Server Default and Rule objects - Should not I use them now?

In order to make sure that the database contains high quality data, we enforce data integrity, which refers to the consistency and accuracy of the data stored. There are different types of data integrity that can be enforced at different levels of a solution. Among these, we have three types called Domain, Entity and Referential Integrity that are specific to the database level.

For enforcing Domain Integrity, SQL Server has given two types of objects called Default and Rule. We have been using these objects for handling Domain Integrity but now it is not recommended to use them.

Let's try to understand what these objects are first and see their usage. A Default object holds a default value and can be bound to a column of a table. A Rule is similar to a Default; it creates an object for maintaining rules for columns. See the code below as an example.

USE tempdb;
GO

-- creating default object
CREATE DEFAULT CreditLimitDefault AS 10000;
GO

-- creating a sample table
CREATE TABLE dbo.Customer
(
 CustomerId int PRIMARY KEY
 , LastName varchar(50) NOT NULL
 , CreditLimit decimal(16,4) NOT NULL
);
GO

-- Binding the default to a column
-- The object can be bound to many tables
EXEC sp_bindefault 'CreditLimitDefault', 'dbo.Customer.CreditLimit';
GO

-- creating rule object
CREATE RULE CreditLimitRule AS @CreditLimit > 9000;
GO

-- Binding the rule to a column
-- The object can be bound to many tables
EXEC sp_bindrule 'CreditLimitRule', 'dbo.Customer.CreditLimit';

As you see, the above code creates two objects, CreditLimitDefault and CreditLimitRule, which are Default and Rule objects. These objects can be bound to any column in any table.

As I mentioned above, it is not recommended to use them now as they are deprecated. It is recommended to use Default and Check constraints instead.
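
For comparison, here is a minimal sketch of the recommended approach, using Default and Check constraints defined with the table; the table name is just for illustration.

CREATE TABLE dbo.Customer_WithConstraints
(
 CustomerId int PRIMARY KEY
 , LastName varchar(50) NOT NULL
 , CreditLimit decimal(16,4) NOT NULL
  CONSTRAINT DF_Customer_CreditLimit DEFAULT (10000)
  CONSTRAINT CK_Customer_CreditLimit CHECK (CreditLimit > 9000)
);
GO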

Read more on CREATE DEFAULT at: https://msdn.microsoft.com/en-us/library/ms173565.aspx

Friday, February 24, 2017

How to hide SysStartTime and SysEndTime columns in Temporal Tables

The Temporal table was introduced with SQL Server 2016 and it is designed to capture and store changes of data in tables. In other words, similar to Change Data Capture (CDC) and Change Tracking (CT), a Temporal table maintains the history with the changed details.

A Temporal table needs two additional columns, such as SysStartTime and SysEndTime. Once they are added, they can be seen with the table just like other columns and will appear with a SELECT * statement. Although it is not recommended to write SELECT * type queries against tables, unfortunately it can still be seen in many applications, and the database I had to analyze today had similar code in almost all areas of the application. I had to make two tables Temporal Tables and I had to make sure that it did not break the existing application.

Fortunately, SQL Server has given a solution for handling it. I was able to alter the table and make it a Temporal Table without making changes to any statement written in the application, while making sure that SELECT * does not return the newly added SysStartTime and SysEndTime columns.

If you use the HIDDEN keyword when creating the Temporal Table, it makes sure that these two columns do not appear when SELECT * is performed. However, the columns can still be explicitly mentioned in the SELECT if required.

-- changing existing table by adding columns
ALTER TABLE dbo.Customer  
ADD  ValidFrom datetime2(0) GENERATED ALWAYS AS ROW START HIDDEN 
  CONSTRAINT DF_SysStartTime 
  DEFAULT CONVERT(datetime2 (0), '2017-02-24 00:00:00')
 , ValidTo datetime2(0) GENERATED ALWAYS AS ROW END HIDDEN
  CONSTRAINT DF_SysEndTime 
  DEFAULT CONVERT(datetime2 (0), '9999-12-31 23:59:59')
 , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);  
GO

-- turning versioning on
ALTER TABLE dbo.Customer
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));  
GO

-- Checking records  
SELECT * FROM dbo.Customer;  
SELECT *, ValidFrom, ValidTo FROM dbo.Customer;
SELECT * FROM dbo.CustomerHistory;  


Thursday, February 23, 2017

SQL Server Date, Datetime and Datetime2 - What should I use when I get the value as a string?

When we have to store a datetime value (or date), in most cases the application accepts the value as a datetime and sends the value to SQL Server as a datetime value. However, if the value is sent as a string (for example, a CSV upload), then what should be the best way of formatting the value and how can we convert it to datetime without making any mistake?

It is always recommended to use the ISO 8601 standard when exchanging datetime values. The standard describes the way of passing a datetime value; generally it is YYYY-MM-DDTHH:MM:SS.sss. You can read more on this in my post: Best way to pass datetime values to SQL Server – SS SLUG Dec 2013 – Brain Bashers - Demo III.

With SQL Server 2016, the default string format for dates is YYYY-MM-DD. If you pass the value with this format, regardless of the current language set, SQL Server will accurately read the value. However, this does not work with all datetime data types as expected. Have a look at the following code;

SET LANGUAGE English;

DECLARE @Date date
DECLARE @Datetime datetime
DECLARE @Datetime2 datetime2

SET @Date = '2017-05-06';
SET @Datetime = '2017-05-06';
SET @Datetime2 = '2017-05-06';

SELECT DATENAME(mm, @Date) As WithEnglish;
SELECT DATENAME(mm, @Datetime) As WithEnglish;
SELECT DATENAME(mm, @Datetime2)As WithEnglish;
GO


SET LANGUAGE German;

DECLARE @Date date
DECLARE @Datetime datetime
DECLARE @Datetime2 datetime2

SET @Date = '2017-05-06';
SET @Datetime = '2017-05-06';
SET @Datetime2 = '2017-05-06';


SELECT DATENAME(mm, @Date) As WithGerman;
SELECT DATENAME(mm, @Datetime) As WithGerman;
SELECT DATENAME(mm, @Datetime2) As WithGerman;


As you see, the Datetime data type conversion happens based on the language set, but the Date and Datetime2 data types are accurately interpreted regardless of the language set. This is something you have to remember. If you expect datetime values as strings and settings related to the session can be changed, then it is always better to use either Date or Datetime2.

If you need to make sure that the date is properly interpreted regardless of the settings (language), then stick to ISO 8601. If you change the values of the variables as below, you will get the same month, May, for both English and German.

SET @Date = '2017-05-06T00:00:00';
SET @Datetime = '2017-05-06T00:00:00';
SET @Datetime2 = '2017-05-06T00:00:00';

Note that ISO 8601 accepts a value like 24:00:00 as the time for midnight but SQL Server does not support it.

Wednesday, February 22, 2017

How to change the default Fill Factor value of SQL Server instance

Fill Factor is a setting used with Indexes for determining how much free space is left on each leaf-level page for future operations (Insert or Update). If you do not specify a value for Fill Factor when creating the index, the default value is set for it, which is 0.

Is it possible to change the default value? Yes, it is possible. The currently set value can be seen using sp_configure;

SP_CONFIGURE;

Here is the output of it.


If you create a table like below in tempdb database;

USE tempdb;
GO

CREATE TABLE dbo.Customer
(
 CustomerId int PRIMARY KEY
 , LastName varchar(20) NOT NULL 
  INDEX IX_Customer_LastName NONCLUSTERED
);
GO

and check the Properties of the index created, you will see that Fill Factor of it is 0.


You can change the default value using sp_configure. Once it is changed, you need a restart of the service in order to get the value applied.

SP_CONFIGURE 'fill factor (%)', 80;
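
Note that fill factor (%) is an advanced option, so the full sequence would be something like the following sketch; RECONFIGURE applies the configured value and the running value takes effect after the restart mentioned above.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO

EXEC sp_configure 'fill factor (%)', 80;
RECONFIGURE;
GO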

If you recreate the table again, you will see the Fill Factor as 80% now.


Tuesday, February 21, 2017

SQL Server Sequence suddenly starts with -2147483648

Have you faced this? Assume that you have created a Sequence object for generating values sequentially for tables, with the data type set as int, and suddenly it shows the next number as -2147483648.

See below code;

-- creating sequence
CREATE SEQUENCE [dbo].[OrderSequence]
AS int
START WITH 1 
INCREMENT BY 1 
MAXVALUE 2 
CYCLE
GO

-- requesting numbers three times
SELECT (NEXT VALUE FOR   dbo.OrderSequence) AS OrderSequence
GO 3



What could be the reason? If you analyze the code written above, you can easily find the issue. I have not used the MINVALUE property when creating the Sequence, hence when it cycles it takes the lowest value of the data type set for the Sequence, which is -2147483648 for the int data type. You may experience the same; if so, check and see whether the MINVALUE has been set or not.
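
Here is a sketch of the corrected definition, assuming the sequence should restart from 1 when it cycles; the name OrderSequence2 is just for illustration.

CREATE SEQUENCE [dbo].[OrderSequence2]
AS int
START WITH 1 
INCREMENT BY 1 
MINVALUE 1
MAXVALUE 2 
CYCLE;
GO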

Monday, February 20, 2017

Can we access the SQL Server temporary table created in different database?

Temporary tables are nothing new and we have been using them for a long time. There are two types of temporary tables; Local, which starts with a single pound sign (#), and Global, which starts with double pound signs (##). Local Temporary Tables are limited to the connection that created them and will be discarded automatically when the connection is disconnected. Global Temporary Tables are global to the instance and can be accessed by anyone connected to the instance. They will be dropped automatically when the last referencing connection is closed.

Now the question is, when a Local Temporary Table is created, can I access it in another database?

The answer is yes and no. See the code below.

USE Sales;
GO

CREATE TABLE #TempTable
(
 Id int
);
GO

-- This works without any issue
SELECT * FROM #TempTable;

The created table can be accessed without any issue because the access is done in the same database using the same connection. If we try the SELECT in a different window (a different connection);

USE Sales;
GO

-- This will throw "Invalid object name '#TempTable'." error.
SELECT * FROM #TempTable;

You will see an error as above. However, if I try to access the table from the same connection but a different database;

--USE Sales;
--GO

--CREATE TABLE #TempTable
--(
-- Id int
--);
--GO

--SELECT * FROM #TempTable;

-- Same first connection but different database
USE AdventureWorks2014;
GO

-- This will work
SELECT * FROM #TempTable;

As you see, it is possible. Remember, Local Temporary Tables are limited to the connection, not to the database in which they were created, hence the created table can be accessed within any database as long as the connection is the same.

Sunday, February 19, 2017

Changing Schema of SQL Server objects

I had a requirement today to change the schema of a set of tables to a new schema but I did not find a direct method to change the schema of all objects using a single statement. ALTER SCHEMA supports transferring one object from one schema to another but it cannot be executed against multiple tables at once.

USE AdventureWorks2014;
GO

-- transferring Person table from Person Schema to Sales
-- Once executed, the table becomes Sales.Person
ALTER SCHEMA Sales TRANSFER Person.Person;

Therefore I wrote a simple piece of code for transferring multiple tables (of course the code can be changed for addressing any object type) and thought to share it because you may look for something similar if you have the same need.

USE AdventureWorks2014;
GO


DECLARE @TableNames TABLE (Id int identity(1,1) PRIMARY KEY, Name varchar(500));
DECLARE @Messages TABLE (Id int identity(1,1) PRIMARY KEY, Message varchar(1000));

DECLARE @TableName varchar(500);
DECLARE @TableId int = 1
DECLARE @NewSchema varchar(20) = 'Sales';
DECLARE @OldSchema varchar(20) = 'Production';
DECLARE @Statement varchar(500)

-- get all table names needed
INSERT INTO @TableNames
 (Name)
SELECT s.name + '.' + t.name TableName
FROM sys.tables t
 INNER JOIN sys.schemas s
  ON t.schema_id = s.schema_id
WHERE s.name = @OldSchema
 AND t.type = 'U';

-- making the ALTER SCHEMA statement for all tables
-- and execute them using EXEC
WHILE EXISTS (SELECT * FROM @TableNames WHERE Id = @TableId)
BEGIN
 SELECT @TableName = Name FROM @TableNames WHERE Id = @TableId;
 SET @Statement = 'ALTER SCHEMA ' + @NewSchema + ' TRANSFER ' + @TableName;
 BEGIN TRY

  EXEC (@Statement);
  -- record success message
  INSERT INTO @Messages (Message) VALUES ('Successfully transfered ' + @TableName + ' to ' + @NewSchema);
 END TRY
 BEGIN CATCH
  
  -- record the error
  INSERT INTO @Messages (Message) VALUES ('Transfer unsuccessful: ' + @TableName + ' [' + ERROR_MESSAGE() + ']');
 END CATCH

 SET @TableId += 1;
END

-- checking the output
SELECT * FROM @Messages;


Thursday, February 16, 2017

Creating multiple administrative accounts in Azure SQL Database Server

You know that we can create multiple administrator accounts in SQL Server if it is an on-premises instance or an instance configured in a VM (hosted in the cloud or on-premises). What we generally do is create a login and add the login to the sysadmin fixed server role. Can we do the same in Azure SQL Database (Server)?

There is only one unrestricted administrative account that can be created in an Azure SQL Database Server. It is created when creating the server. In addition to that, you can add either one Azure Active Directory Account or a Security Group Account (that has multiple accounts as members). If you open the Properties of your server, you can see your administrative accounts;


However, if you need to add multiple administrative accounts (not unrestricted administrative accounts), then there is a way of adding them. There are two server-level administrative roles available in the master database to which user accounts can be added for granting permissions for creating and managing databases and logins.

The first role is dbmanager. This role has permission to create new databases, hence members of this role can create databases. This role exists in the master database, therefore only users in the master database can become members of it.

-- In master database
-- creating a user
CREATE USER Dinesh2 WITH PASSWORD = 'Pa$$w0rd';

-- adding the user to dbmanager role
ALTER ROLE dbmanager ADD MEMBER Dinesh2;

The other role is loginmanager. This role has permissions for creating logins in the master database. Again, this role is in the master database, hence only users in the master database can become members of it.

-- In master database
-- creating a user
CREATE USER Dinesh3 WITH PASSWORD = 'Pa$$w0rd';

-- adding the user to loginmanager role
ALTER ROLE loginmanager ADD MEMBER Dinesh3;

Wednesday, February 15, 2017

Should I create Azure SQL Logins or Azure SQL Users?

Unless it is a contained database, we must create a login for connecting to SQL Server in order to access a database. Databases are accessed using Users and Users are created using Logins. Is it the same with Azure SQL Databases? Should I create a login before creating a user?


With Azure SQL Database, it is not a must, as an Azure SQL database is a contained database (portable). We can create a user account with a password for accessing the database without creating a login. However, it also allows us to create a login in the master database and then create a user in our database using the login created in the master.

What if I created a user in my database using a login?
The main advantage is maintenance. If the login needs to be dropped or disabled, it can simply be done in the master database. Not only that, since multiple user accounts can be created in multiple databases using the same login, one login can be used for connecting to multiple databases.
The biggest disadvantage with this is that the database becomes non-portable. If the database needs to be copied, moved or replicated, the login must be created on the new server.

What if I created a user in my database without creating a login?
The main benefit we get is a portable database. Since all users with passwords are maintained inside the database, it can be copied, moved or replicated without any issue.
However, if the environment has multiple databases and users need to access more than one database, then multiple user accounts have to be created.
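
Here is a minimal sketch of both options; the names and passwords are just examples.

-- Option 1: a contained user with a password, created inside the user database (portable)
CREATE USER ReportUser WITH PASSWORD = 'Pa$$w0rd01';

-- Option 2: a login created in the master database first
-- CREATE LOGIN ReportLogin WITH PASSWORD = 'Pa$$w0rd01';
-- and then a user created in the user database for that login
-- CREATE USER ReportUser2 FOR LOGIN ReportLogin;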

Monday, February 13, 2017

The Integration between Azure Data Lake Store and Azure SQL Data Warehouse

As a part of my presentation, I had to show the integration between Azure Data Lake Store (ADL) and Azure SQL Data Warehouse (Azure SQL DW), hence created the image below;


Generally, ADL is used for storing unlimited data in any format (structured and unstructured) and processing it using various tools that offer Massively Parallel Processing (MPP). We use Azure SQL DW for storing a large volume of structured data, again using MPP. In most cases, Azure SQL DW is the source for Business Intelligence. Earlier, Azure SQL DW could not ingest data from ADL, but now it is possible with PolyBase; hence the best practice is to load data into ADL, process it and then transfer it to Azure SQL DW for reporting.
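
For reference, here is a rough sketch of the PolyBase objects involved when Azure SQL DW reads from ADL; all names, credential values and the file layout are placeholders, so refer to the official documentation for the exact values required for your store.

-- a database scoped credential for the Azure AD application that has access to ADL
CREATE DATABASE SCOPED CREDENTIAL ADLCredential
WITH IDENTITY = '<application id>@<OAuth 2.0 token endpoint>'
 , SECRET = '<application key>';

-- an external data source pointing to the ADL store
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (TYPE = HADOOP
 , LOCATION = 'adl://<store name>.azuredatalakestore.net'
 , CREDENTIAL = ADLCredential);

-- a file format and an external table over the files in ADL
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT
 , FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

CREATE EXTERNAL TABLE dbo.SalesExternal
(
 SaleId int
 , Amount decimal(16,4)
)
WITH (LOCATION = '/sales/'
 , DATA_SOURCE = AzureDataLakeStore
 , FILE_FORMAT = TextFileFormat);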