Wednesday, June 21, 2017

Power BI Parameters - What are they and when can they be used?

Microsoft Power BI supports adding parameters to queries and using them in various areas. However, since parameters have some limitations and cannot be used everywhere, they have not received much attention. Let's talk about Power BI parameters and see how they can be used.

Let's take a scenario and see how parameters can help. I have a set of CSV files that contain monthly sales. We need to create a report using one of the files and then publish it. Whenever the report needs to be refreshed with a different file, I need to change the data source by changing many things. What if I could just open the report using Power BI Desktop, change the source using a menu, and get the report refreshed, rather than changing many things?

This can be done with Power BI parameters. We can create a parameter with predefined values (in this case, source file names along with their paths) and use it to change the source.

A Power BI parameter can be created in three ways;
  • Parameter that accepts a value via an input box
  • Parameter that is formed using a predefined list
  • Parameter that is formed using another query

Parameter with a predefined list

Let's focus on the second one first. Here are the steps for creating a Power BI report with a parameter;

1. Open Power BI Desktop and connect to a CSV file. In my case, the file name is Sales_201501.csv.


2. Once the file is loaded, click Edit Queries in the Home ribbon to open the Query Editor. You should see one query under Queries.

Parameters can be created only with Query Editor. However, created parameters can be accessed in both Query Editor and Data Model.

3. Click Manage Parameters in Home ribbon. Once the dialog box is opened, click New to create a new Parameter.

4. Name the parameter as SourceFile

5. Select Text for Type.

6. Select List of values for Suggested values.

7. Enter File Names along with paths.

8. Set Default Value as you need. This is the default value for the designer.

9. Set Current Value as you need. This setting is for the end user, or for use at the report level.


Now you should see the parameter under Queries. If you want to load it as another table in the data model, you can right-click it and select Enable Load. We do not need to enable this option to use the parameter at the report level.

Now we need to make sure that the source of our file is set using the created parameter. There are two ways of doing it. One is to open the Advanced Editor and change the M code. The other way is using Data source settings. Let's use the second option. If you want to see how the M code gets changed, open the Advanced Editor after making the change.
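
For reference, the changed Source step in the Advanced Editor looks roughly like the sketch below; the exact Csv.Document options (delimiter, encoding and so on) depend on your file, so treat the option values here as assumptions:

let
    // the hard-coded file path is replaced by the SourceFile parameter created above
    Source = Csv.Document(File.Contents(SourceFile), [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    #"Promoted Headers"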

10. Click Data source settings in the Home ribbon. This opens the Data Source Setting dialog box.


11. Select the source and click Change Source.... This opens a dialog box that allows you to change settings related to your source.

12. You should notice that the button given for File Path has an option for creating parameters and selecting a parameter.


13. Select Parameter as the option and select the parameter created.


14. Click OK and close the Data source settings.

15. All set. Click Close & Apply for saving the queries and closing Query Editor.

16. Create a visual so that you can distinguish the results when a different file is selected. This is what I created from my file.


Let's change the source using the parameter now.

17. Click the down-arrow on Edit Queries in the Home ribbon. You should see three options. Select the last one, which is Edit Parameter.


18. You should see the Enter Parameters dialog box that shows all parameters. Change the source file to a different one.


19. Once you click OK, you should be prompted to apply the changes. Apply them and you should see the report with the new source.


In the same way, we can use parameters in other areas such as filtering rows in the Query Editor, replacing values, etc. We still cannot use parameters with DAX, and that is one of the limitations.

Saturday, June 17, 2017

Relational Algebra - Joins - Theta Join, Equijoin, Natural Join, Outer Join, Semijoin

I wrote a post on Relational Algebra that discusses most of the operations related to it. This is the continuation of it, explaining the join operations of Relational Algebra. You may find that they differ from the joins we write today, but they are the foundation for all types of joins we do with our relational databases.

Join Operations

Join is one of the main operations we perform against relations (tables) for retrieving data. It is done over the Cartesian Product of the two operand relations, using a Selection with a predicate. We are familiar with joins like INNER JOIN, OUTER JOIN and CROSS JOIN, but initially they were introduced as types like Theta Join, Equijoin, Natural Join, Outer Join and Semijoin. Modern DBMSs have enhanced these and have different implementations, and that is why we do not see these types under those names anymore. But let's try to understand each of these types and how they are represented with Relational Algebra. Note that I have used one particular SQL statement for each, but they can be implemented using many techniques. Remember, almost all joins are based on Cartesian Products.

Theta Join

This is based on a predicate added to a Cartesian Product. In simple terms, if you have joined two tables using CROSS JOIN, you can then add a filter to the result using one of the comparison operators. See the example given. Note that it can be implemented using a Selection over a Cartesian Product as well.
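
The original example is shown as an image; a rough SQL sketch, assuming two hypothetical tables Employee(EmpId, Name, Salary) and SalaryGrade(Grade, LowLimit, HighLimit), could look like this:

-- Theta join: a Cartesian product restricted by comparison predicates
SELECT e.EmpId, e.Name, g.Grade
FROM dbo.Employee e
 CROSS JOIN dbo.SalaryGrade g
WHERE e.Salary >= g.LowLimit AND e.Salary <= g.HighLimit;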


Equijoin

This is the same as a Theta Join, but the comparison operator is equality. Generally, if the operator of the Theta Join is the equal operator (=), the join is called an Equijoin instead of a Theta Join. Here are two examples;
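
Again, the original images are not reproduced here; two sketches of an equijoin, assuming hypothetical Customer(CustomerId, CustomerName, CountryId) and Country(CountryId, CountryName) tables:

-- Equijoin written with INNER JOIN
SELECT c.CustomerId, c.CustomerName, co.CountryName
FROM dbo.Customer c
 INNER JOIN dbo.Country co
  ON c.CountryId = co.CountryId;

-- The same equijoin written over a Cartesian product
SELECT c.CustomerId, c.CustomerName, co.CountryName
FROM dbo.Customer c
 CROSS JOIN dbo.Country co
WHERE c.CountryId = co.CountryId;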



Natural Join

A Natural Join is an Equijoin of two relations over all common attributes. In other words, when joining two tables, the join is done using all common columns, so an explicit predicate is not required. See the sample given. I have used NATURAL JOIN, which is not available in some DBMSs. Note that the common attributes are not duplicated in the output.
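
A sketch using the same assumed Customer and Country tables, where CountryId is the only common column; note that NATURAL JOIN works in DBMSs such as MySQL and PostgreSQL but not in SQL Server:

-- Natural join: an equijoin over all common columns; the common column appears once
SELECT CustomerId, CustomerName, CountryName
FROM Customer NATURAL JOIN Country;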


Outer Join

This join type includes both matching and non-matching tuples from one relation and matching tuples from the other relation when the two relations are joined. The relation that returns all tuples is determined by the symbol used for the operation: if the symbol is open towards the left relation, that relation is the one that returns all tuples. This is implemented using either LEFT OUTER JOIN or RIGHT OUTER JOIN in SQL.
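
A left outer join sketch with the same assumed tables; every customer is returned, with country details only where a match exists:

SELECT c.CustomerId, c.CustomerName, co.CountryName
FROM dbo.Customer c
 LEFT OUTER JOIN dbo.Country co
  ON c.CountryId = co.CountryId;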


Semijoin

Here is the last one. This join performs a join operation over two relations and projects over the attributes of the first operand (relation). With this join, the tuples taking part in the join can be limited by adding a predicate, increasing the performance of the join operation.
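
A semijoin is commonly expressed with EXISTS; this sketch, using the same assumed tables plus a hypothetical CountryRank column, returns only attributes of the first relation and limits the tuples with a predicate:

SELECT c.CustomerId, c.CustomerName
FROM dbo.Customer c
WHERE EXISTS (SELECT 1
              FROM dbo.Country co
              WHERE co.CountryId = c.CountryId
               AND co.CountryRank = 2);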

 

Wednesday, June 14, 2017

Power BI - Scheduled Refresh section is not available in Settings

Power BI Desktop allows us to import data from files such as CSV and Excel. Once imported, it automatically creates a model that can be used for creating reports. The good thing is, when we publish the report, the dataset created with the model is also published. This makes sure that we can create a schedule and refresh the published dataset to see the latest data.

See below image. It shows the Gateway connection and Scheduled refresh settings for an uploaded text file. The Dataset created for the file is PowerBI Test Report.


Now if I get the settings related to the Dataset called Sales_201501;


I will not see the same settings. What is the reason? It is also based on a text file.

This happens when the uploaded dataset is not a model. This file has been uploaded using the Power BI Service (app.powerbi.com), not using Power BI Desktop. Unlike Power BI Desktop, the Power BI Service does not create a model when a file is uploaded; the uploaded file is still treated as a dataset but not as a model.

If you need to get your Dataset continuously refreshed by creating a schedule, then make sure that it is created using Power BI Desktop.

You can get the uploaded file refreshed automatically if the file is in either SharePoint or OneDrive.

Saturday, June 10, 2017

Power BI - Cannot highlight or pin visuals when grouped with shapes

Power BI allows us to add shapes like rectangles or ovals to mark a specific area, showing that the covered visuals are related and operate together. This is really useful when you need to highlight a group of visuals that have the same behavior or are related to the same business process.


When you have visuals, you can add the required shape and change its properties as you need. The below image shows an added rectangle with a specific title and a fill color with transparency set to 75%.


Once added and set, it looks nice on the report. But you may notice that you have lost some functionality: you cannot highlight a slice of a pie chart and have the other visuals filtered, and you cannot pin the covered visuals to dashboards when the report is published to the online service.


However, this can be handled. The reason for it is that the added rectangle appears on top of the visuals. If you send the rectangle to the back, this issue gets sorted.


As you see, now you can highlight slices in the Pie Chart and you can pin visuals to a Dashboard.


Wednesday, June 7, 2017

Naming columns in the View

How do you name the columns returned from a view you have created? Generally, you do not need to name them explicitly as the names of view columns are derived from the base tables. But if there is any ambiguity in the name of a column, or it is a calculated column, then it must be named explicitly.

What are the ways of naming columns?

There are two ways of naming columns in a view. One way is mentioning the names along with the CREATE VIEW statement. This requires names for all columns; you cannot name just a subset of the columns in the view. Here is an example of it;
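
A minimal sketch of this method, using the AdventureWorks2014 table used elsewhere on this blog (the view and column names are just illustrative):

USE AdventureWorks2014;
GO

CREATE VIEW dbo.Sales_2012_Named (OrderNumber, OrderDate, OrderSubTotal)
AS
SELECT SalesOrderNumber, OrderDate, SubTotal
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2012;
GO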


The other way is adding aliases to columns. Most of us prefer this way because we do not need to name all columns explicitly; in most cases, we only need to name columns that have ambiguous names or are calculated columns. Remember, if the first technique is used for naming columns, any aliases added will be overridden.
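
And a sketch of the alias approach, naming only the calculated column (SubTotal and TaxAmt are columns of Sales.SalesOrderHeader; the view name and alias are illustrative):

CREATE VIEW dbo.Sales_2012_WithTax
AS
SELECT SalesOrderNumber, OrderDate, SubTotal + TaxAmt AS TotalWithTax
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2012;
GO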



Tuesday, June 6, 2017

Creating Horizontal Views and Vertical Views

Everyone has either used or created views for getting or creating a dynamic result using one or multiple tables. And everyone knows the usage of views and the purpose of creating them. But many do not know how they are categorized, such as Horizontal and Vertical. Here is a simple explanation of them.

Horizontal Views

If you think about the standard definition of a view, it is a virtual relation (table) that is the result of one or more relational operations on one or more relations (tables). In simple terms, it is a logical table defined with a SELECT statement based on one or more tables. Now, what is a Horizontal View? It is a view that limits the records (rows) produced. We create Horizontal Views by adding a WHERE clause.

USE AdventureWorks2014;
GO

CREATE VIEW dbo.Sales_2012
AS
SELECT * FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2012;
GO

Vertical Views

This view returns a set of selected columns from one or more tables. We implement it by selecting the required columns, optionally adding a WHERE clause.

USE AdventureWorks2014;
GO

CREATE VIEW dbo.Sales_2012_Summary
AS
SELECT SalesOrderNumber, OrderDate, SubTotal 
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2012;
GO

In addition to that, there is another type called Grouped and Joined Views, which is actually either a Horizontal or a Vertical View with aggregations and joins; hence I do not think it has to be treated as a major type.


Monday, June 5, 2017

What should be considered on Visualization?

Creating a report with stunning, meaningful and appropriate visuals is the key to delivering information quickly, hence special consideration has to be given when designing reports. Otherwise, the created report will not be attractive and users will not see what they want, the way they want it.


There are a few things that need to be considered when designing reports;
  • Placement - you need to make sure that visuals added to the report are placed or positioned in the right place. How do we decide? Generally, the most important visuals (a KPI, for example) should be placed in the top-left corner of the report because it is the place everyone looks at first. Not only that, when the report is opened on a mobile device, this makes sure that the important ones are displayed first. In addition to that, make sure the spacing between visuals is consistent and that they are sized appropriately.
  • Appropriate Visuals - Just because something can be displayed using a Column Chart does not mean it is the best visual for showing it. Therefore, the right visual should be selected for displaying the information. For example, if just one value has to be shown, such as revenue for the month, a visual like a KPI or Card can be used. If values need to be shown against a set of categories, such as revenue by month, a Column Chart is appropriate. If something needs to be shown against a set of categories as percentages rather than values, a Pie or Donut chart will be appropriate.
  • Story Telling - Make sure that your report contains only relevant and related information; it should not be cluttered. If the report is created for the Sales Department, make sure it shows information related to it. Not only that, if the report contains a set of visuals related to another area, for example a process, highlight or group them. You can add an unfilled rectangle covering the related visuals.
  • Formatting - The size of the visual, the font used, font size, color and labels all matter. Make sure that the title is properly displayed for visuals, and you can make some visuals large and some small based on their importance. The same goes for font size: if it is a KPI and it is the most important one, increasing its font size will definitely add value to the report. Another important thing in formatting is showing numerals properly. There is no need to show the whole number unless the visual is something like a table. Generally, if the number is really large, consider showing it in thousands or millions.
  • Coloring - There are two things to consider with coloring: the background color and the colors of visuals. Generally, a light background color works well as visuals can be highlighted with most colors; therefore it is recommended to make the background light, something like white. However, making the background dark has become a trend because it makes the dashboard look elegant. But when selecting colors for visuals, you need to be very careful as dark colors in visuals may not blend properly with a dark background. Another consideration is the company theme. Your company may have a color theme for all web applications and you might have to stick to it. If so, select appropriate colors for visuals based on the theme.
    What about colors used within visuals? One thing you need to remember is the standard color set used in BI. Generally, we use green for excellent, amber for neutral and red for worse. Therefore, you should not use a color like red for indicating the success of something. Another thing is, do not use all three of these colors with all types of visuals. For example, when a KPI is shown, one color is used for either the font or the background of the visual, which is fine. Similarly, these three colors can be used with a tabular visual across many cells, which is something you need to consider: your table may look busy when many cells are colored with all three colors, hence it is better to color only one type of cell with one color.
There are many more small things to consider, but the above are the most important ones. Let me know if you think I have missed something.

Sunday, June 4, 2017

Calculate the average value against a large table - SQL Server Brain Basher of the Week #068

Let's talk about another interview question related to SQL Server development. This is about one of the aggregate functions, AVG. Here is the question;

You have a table called SalesOrderHeader and it has millions of records. It has a column called Freight and you need to calculate the average of it, making sure that only 10% of the records are used for the calculation.

What would be the best way? The standard answer is, write a query to get 10% of records using TOP and calculate the average.

SELECT AVG(t.Freight) AverageFreight
FROM (SELECT TOP 10 PERCENT Freight
 FROM Sales.SalesOrderHeader) t

But this might not give you an accurate average as it considers only the first set of records. It will be more accurate if you pick records randomly and then calculate the average. You can do this by adding ORDER BY with the NEWID() function to your statement.

SELECT AVG(t.Freight) AverageFreight
FROM (SELECT TOP 10 PERCENT Freight
 FROM Sales.SalesOrderHeader
 ORDER BY NEWID()) t

The second method gives a more accurate value as it picks records randomly. However, the cost of the statement is high. There is another way to achieve the same: using the TABLESAMPLE clause.

SELECT AVG(Freight) AverageFreight
FROM Sales.SalesOrderHeader
TABLESAMPLE (10 PERCENT)

Here is the comparison between both methods. Notice the total cost.


As you see, TABLESAMPLE gives better performance than the first method.

Since it picks records randomly, the average it returns differs at each execution. If you need the same value across executions, the REPEATABLE option has to be used with a repeat_seed;

SELECT AVG(Freight) AverageFreight
FROM Sales.SalesOrderHeader
TABLESAMPLE (10 PERCENT) REPEATABLE (1)

You will get the same average as long as the repeat_seed is the same and the data has not changed.

Saturday, June 3, 2017

Power BI - No page to enter SQL Server credentials when connecting - Here is the reason

Have you experienced Power BI connecting to your SQL Server without requesting your login credentials? Or you might need to connect to your SQL Server using someone else's credentials, but Power BI continues to the next page without showing the page related to credentials.

This is not a bug or an issue; it is because of the cache maintained by Power BI Desktop.

When I try to connect with my SQL Server using Power BI Desktop, I get the initial page where I can enter server name and database name. But when I click on OK, it straightaway opens the page for table selection instead of showing the page related to authentication.


I know that this happens because I have connected to the same server before and my connection is cached. What if I want to connect to the same server using a different account?

Clearing the data source cache

It is possible to clear the saved information related to previously connected sources. You can open this setting via the File menu -> Options and settings -> Data source settings. This page shows all connections you have made previously; Data sources in current file shows the connections made with the opened file and Global permissions shows the connections made across all files.


You can clear cached permissions by selecting the connection and clicking Clear Permissions. Once cleared, you should see the page that requests credentials for connecting with the source.


Friday, June 2, 2017

Power BI does not show all columns in SQL Server View

I experienced a strange issue with Power BI today; I might have missed something or there is something unknown to me. This is what I experienced: Power BI does not show all columns when trying to import records from a SQL Server view.

Just to simulate the issue, I created a view in AdventureWorks2014 using three tables, Product, ProductSubcategory and ProductCategory, and named it vProduct.

USE [AdventureWorks2014]
GO

CREATE OR ALTER VIEW dbo.vProduct
AS
 SELECT p.ProductID, p.Name Product, s.Name SubCategory, c.Name Category
 FROM Production.Product p
  INNER JOIN Production.ProductSubcategory s
   ON p.ProductSubcategoryID = s.ProductSubcategoryID
  INNER JOIN Production.ProductCategory c
   ON s.ProductCategoryID = c.ProductCategoryID;
GO

Then I granted SELECT permission to Jack user.

GRANT SELECT ON dbo.vProduct TO Jack;
GO

If Jack executes a SELECT against the view, he sees records with all columns.


Now, if Jack connects to the database with Power BI, this is what he sees.


Has anyone experienced this? I did a search but could not find a solution. I might have made a mistake or missed something but could not figure it out. I would appreciate it if you could share your thoughts, or a solution if you have already seen this and sorted it out.

I used the latest Power BI version, which is the May 2017 release.

Note that I see the same result for both Import and Direct Query.

Wednesday, May 31, 2017

SQL Server AUTOGROW_ALL_FILES Option

A few days back, I wrote an article on SQL Server's default behavior when it expands files with AutoGrowth enabled. When there are multiple files in the file group and all of them are full, SQL Server expands only one file at a time, making the data distribution inconsistent. Read more on it: SQL Server does not always write data to files, proportional to the amount of free space in each file

There is a facility to change this behavior with SQL Server 2016. By default, a file group is set to AUTOGROW_SINGLE_FILE, but if we set it to AUTOGROW_ALL_FILES, it changes the default behavior and grows all files when required.

Let me take the same code used in my previous post and show you.

Let's create a database with an additional file group that contains two files, set the new file group as the default, and create a table on it. The code below does that, inserts 900 records into the table and checks the space used;


USE master;
GO

DROP DATABASE TestDatabase;
GO

-- Two additional data files are added under FG1
CREATE DATABASE [TestDatabase]
 ON  PRIMARY 
( NAME = N'TestDatabase'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase.mdf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB), 
FILEGROUP [FG1]  DEFAULT
( NAME = N'TestDatabase_Data1'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_Data1.ndf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB),
( NAME = N'TestDatabase_Data2'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_Data2.ndf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB)
 LOG ON 
( NAME = N'TestDatabase_log'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_log.ldf' 
 , SIZE = 10MB , FILEGROWTH = 5MB )
GO


USE TestDatabase;
GO

-- Create a table
-- This will be created on FG1 as it is the default
CREATE TABLE dbo.TestTable
(
 Id int identity(1,1) Primary Key
 , Name char(8000) not null
);
GO

-- Inserting 900 records
INSERT INTO dbo.TestTable
 (Name) VALUES (Replicate('a', 8000));
GO 900

SELECT 
 file_id, name, physical_name
 , Convert(Decimal(12,2),Round(size/128.000,2)) AS [FILE_SIZE_MB]
 , Convert(Decimal(12,2),Round(FileProperty(name,'SpaceUsed')/128.000,2)) AS [SPACE_USED_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000,2)) AS [FREE_SPACE_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000/(size/128.000),2)) AS [FREE_PCT_MB]
FROM sys.database_files;


As you see, both files are almost filled. Let's change the default setting of the file group and insert some additional records.

ALTER DATABASE TestDatabase MODIFY FILEGROUP FG1 AUTOGROW_ALL_FILES;
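
Then insert some more records so that the files need to grow again; the insert below mirrors the one used earlier (the number of rows is arbitrary):

-- Inserting 100 more records to force another expansion
INSERT INTO dbo.TestTable
 (Name) VALUES (REPLICATE('a', 8000));
GO 100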

Done. Now let's see whether only one file or both files have been grown.

SELECT 
 file_id, name, physical_name
 , Convert(Decimal(12,2),Round(size/128.000,2)) AS [FILE_SIZE_MB]
 , Convert(Decimal(12,2),Round(FileProperty(name,'SpaceUsed')/128.000,2)) AS [SPACE_USED_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000,2)) AS [FREE_SPACE_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000/(size/128.000),2)) AS [FREE_PCT_MB]
FROM sys.database_files;


Both files have been grown. This makes sure that data gets distributed properly, and I believe this should be the standard setting for our databases.

Tuesday, May 30, 2017

Power BI - Changes you should know on Power BI Free Account from June 01

As per the announcement made by Microsoft on Power BI earlier this month, the following things will change for the Power BI Free account;



  • Almost all Power BI Pro features will be available for Free accounts, except sharing and collaboration. Sharing and Collaboration will only be available for Pro account.
  • All types of data sources can be accessed using Free account.
  • The following are increased: 10GB storage, 8 refreshes per day and 1 million rows/hour of streaming data.
  • Capabilities NOT included: Group workspaces (now it is app-workspace), Export to PowerPoint, Export to CSV/Excel and Analyze in Excel.
As you see, some important things are going to vanish while some are getting added. If you have been using a Free account, you will be offered a Pro trial account for one year to continue with what you have been doing.

For more info on this, read this thread.

Monday, May 29, 2017

Understanding Relational Algebra

When the relational model was introduced, languages were introduced in order to work with the model and to retrieve or update data in it. They are called relational languages. Initially, two languages, Relational Algebra and Relational Calculus, were introduced by Codd around 1971 as the basis of relational languages.

When you start studying databases and their related languages, these are the two main languages you learn first, and of course, they are not very user-friendly. If you have worked with modern Database Management Systems such as Microsoft SQL Server or Oracle, then you know how powerful and rich T-SQL or PL/SQL is compared to Relational Algebra and Relational Calculus. However, if you are studying databases and related languages, it is always better to start with the basis as it explains the fundamentals.

The Relational Algebra

It is a high-level procedural language. It describes how an operation is performed between two relations (tables, to use the word familiar to us), resulting in another relation. The operation is described with expressions and can be nested, meaning the output of one operation can be used to perform another operation.


This language is considered a theoretical language as well as a relation-at-a-time language. It manipulates all tuples (records) using one statement without looping through them. There are many different types of operations in Relational Algebra, but Codd originally proposed eight operations, and they are categorized into two groups: Unary and Binary.

Unary Operations

There are two operations defined under this: Selection and Projection. They are called unary operations since they operate on only one relation.

Binary Operations

There are six operations under this category: Cartesian Product, Union, Set Difference, Join, Intersection and Division. Unlike unary operations, they work on pairs of relations.

Here are some sample images I used in my classes. Note that the SQL can be written in different ways to get the output; what I have written is not the only way to get the result.

Symbols

There is a set of symbols used with these operations; here are the standard symbols used with Relational Algebra.


Selection

This operation returns tuples from a single relation based on the specified predicate (condition). Multiple predicates can be combined using AND, OR and NOT.
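
As a rough SQL equivalent (the original image is not reproduced), assuming a hypothetical Customer(CustomerId, CustomerName, CountryId) relation:

-- Selection: return only the tuples satisfying the predicate
SELECT *
FROM Customer
WHERE CountryId = 1;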


Projection

This operation returns a relation that contains a vertical subset of the given relation. In other words, it returns the tuples with only the specified attributes.
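
A SQL sketch with the same assumed Customer relation; DISTINCT reflects that a relation, being a set, holds no duplicate tuples:

-- Projection: return only the specified attributes
SELECT DISTINCT CustomerId, CustomerName
FROM Customer;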


Cartesian Product

This operation creates a relation combining two relations, concatenating every tuple in one relation with every tuple in the other relation. In simple terms, if the first table has 100 records with 5 attributes and the other table has 50 records with 10 attributes, this creates an output with 5,000 records (100 * 50) and 15 attributes (5 + 10).
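
A sketch with the assumed Customer and Country relations:

-- Cartesian product: every customer tuple paired with every country tuple
SELECT *
FROM Customer
 CROSS JOIN Country;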


Union

This operation makes a relation containing all tuples from the first relation, the second relation, or both. It eliminates duplicates. Remember, both relations used must be union-compatible.
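
A sketch assuming two union-compatible relations, OnlineCustomer and StoreCustomer, each with CustomerId and CustomerName:

-- Union: tuples from either relation, duplicates removed
SELECT CustomerId, CustomerName FROM OnlineCustomer
UNION
SELECT CustomerId, CustomerName FROM StoreCustomer;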


Set Difference

This operation creates a relation containing the tuples in the first relation that are not in the second relation. Just like the previous one, both relations must be union-compatible.
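
With the same assumed relations:

-- Set difference: tuples in OnlineCustomer that are not in StoreCustomer
SELECT CustomerId, CustomerName FROM OnlineCustomer
EXCEPT
SELECT CustomerId, CustomerName FROM StoreCustomer;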


Intersection

This operation creates a relation containing the tuples that are in both the first and second relations. Both relations must be union-compatible.
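
Again with the same assumed relations:

-- Intersection: tuples that appear in both relations
SELECT CustomerId, CustomerName FROM OnlineCustomer
INTERSECT
SELECT CustomerId, CustomerName FROM StoreCustomer;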


Division

This operation creates a relation containing selected attributes from the first relation, for tuples that match every tuple in the second relation. See the image; it tries to answer: which customers are registered from ALL the countries ranked as 2?
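
A sketch of the classic double NOT EXISTS formulation, assuming hypothetical Registration(CustomerId, CountryId) and Country(CountryId, CountryRank) relations:

-- Division: customers registered from ALL countries ranked as 2
SELECT DISTINCT r.CustomerId
FROM Registration r
WHERE NOT EXISTS
 (SELECT 1 FROM Country c
  WHERE c.CountryRank = 2
   AND NOT EXISTS
    (SELECT 1 FROM Registration r2
     WHERE r2.CustomerId = r.CustomerId
      AND r2.CountryId = c.CountryId));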


I have made a separate post on Joins. Here is the link for it: http://dinesql.blogspot.com/2017/06/relational-algebra-joins-theta-join-equijoin-natural-join-outer-join-semijoin.html

Wednesday, May 24, 2017

SQL Server bcp Error - Unexpected EOF file encountered in BCP file

Here is another issue I experienced today with the bcp utility (see my previous post related to this: SQL Server bcp Error - Invalid ordinal for field in xml format). The file I tried to import was a UNIX-based file and bcp did not identify the proper row terminator;


I was sure about the row terminator, but it did not work even with BULK INSERT. As usual, I did a search and found a suggestion to use the hexadecimal value for the row terminator instead of \n.

And it worked.
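
The screenshot is not reproduced here; as an illustration of the idea with BULK INSERT (the table and file names are assumptions), the hexadecimal line-feed value can be specified like this:

-- Using the hexadecimal line-feed value as the row terminator for a UNIX-style file
BULK INSERT dbo.Customer
FROM 'E:\Temp\Customer.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a');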


You may experience the same, try this and see whether it solves the issue.

Tuesday, May 23, 2017

SQL Server bcp Error - Invalid ordinal for field in xml format

We still use the bcp utility for importing and exporting data, specifically when we have to work with large amounts of data (I have written a post on this, comparing different methods for loading data; read it for more info: How to load/insert large amount of data efficiently). In order to load data from a file, we need to instruct the utility about the structure of the file and the mapping, and the instruction can be passed either by specifying details with switches along with the bcp command or by adding a format file to the command.

If you need to use a format file, it needs to be generated first. We can use the same bcp command for generating it.

bcp tempdb.dbo.Customer format nul -T -S DINESH-PC\SQL2014 -c -t, -x -f "E:\Temp\Customer_c.xml"

For more information on the bcp command, read this.

Sometimes it does not work as we expect. You may experience the following error when you try to generate the file;

SQLState = HY000, NativeError = 0
Error = [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Invalid ordinal for field 3 in xml format file.

If you experience it, the main reason could be column names with spaces. Check whether the table used for generating the XML file has column names with spaces. If so, you will surely get this error.

Here is an example. As you see, if you try to execute the above bcp command against this table, you will get this error.

USE tempdb;
GO

IF OBJECT_ID('dbo.Customer') IS NOT NULL
 DROP TABLE dbo.Customer;
GO
CREATE TABLE dbo.Customer
(
 CustomerId int PRIMARY KEY
 , [Customer Code] char(5) NOT NULL
 , [Customer Name] varchar(50) NOT NULL
 , [Credit Limit] money NOT NULL
);
GO


If you remove the spaces from the column names, there will be no errors and the file will be generated.


Monday, May 22, 2017

SQL Server does not always write data to files, proportional to the amount of free space in each file

We add multiple data files to file groups to get data distributed (it is always better to get them distributed among multiple physical volumes) for improving performance. But does SQL Server guarantee that it always distributes data among the files?

This is something every DBA should know. There are certain situations in which SQL Server does not write data to all files allocated to the file group even though there are multiple files in it. Generally, SQL Server uses a round-robin proportional fill algorithm to allocate extents. This means, when there are two data files allocated to one file group and there is a table created on that file group, allocations are made in each data file proportional to the amount of free space in each file. For example, if the first file has 60MB of free space and the second file has 40MB of free space, when 10MB is written to the table, 6MB is written to the first file and 4MB to the second file. This makes sure that data is distributed properly for improving performance.

This behavior changes when data files are expanded automatically. Files are expanded when their space is exhausted and AutoGrowth is enabled. This expansion happens one file at a time, again in a round-robin way. Continuing with the above example, if the space in the files is exhausted, SQL Server expands the first file based on the AutoGrowth setting, without expanding the second file. When the first file is exhausted again, it expands the second file without expanding the first. This makes the data distribution uneven: while only the first file has been expanded, data is written only to the first file, meaning data is not striped across all files. This is where you see the issue.

See this code. It creates a database with three data files: one file under the Primary file group and two files under the FG1 file group. Then it creates a table on FG1 and inserts a set of records.

USE master;
GO

DROP DATABASE TestDatabase;
GO

-- Two additional data files are added under FG1
CREATE DATABASE [TestDatabase]
 ON  PRIMARY 
( NAME = N'TestDatabase'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase.mdf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB), 
FILEGROUP [FG1]  DEFAULT
( NAME = N'TestDatabase_Data1'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_Data1.ndf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB),
( NAME = N'TestDatabase_Data2'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_Data2.ndf' 
 , SIZE = 4MB , MAXSIZE = UNLIMITED, FILEGROWTH = 2MB)
 LOG ON 
( NAME = N'TestDatabase_log'
 , FILENAME = N'D:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\DATA\TestDatabase_log.ldf' 
 , SIZE = 10MB , FILEGROWTH = 5MB )
GO


USE TestDatabase;
GO

-- Create a table
-- This will be created on FG1 as it is the default
CREATE TABLE dbo.TestTable
(
 Id int identity(1,1) Primary Key
 , Name char(8000) not null
);
GO

-- Inserting 900 records
INSERT INTO dbo.TestTable
 (Name) VALUES (Replicate('a', 8000));
GO 900

SELECT 
 file_id, name, physical_name
 , Convert(Decimal(12,2),Round(size/128.000,2)) AS [FILE_SIZE_MB]
 , Convert(Decimal(12,2),Round(FileProperty(name,'SpaceUsed')/128.000,2)) AS [SPACE_USED_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000,2)) AS [FREE_SPACE_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000/(size/128.000),2)) AS [FREE_PCT_MB]
FROM sys.database_files;

See the result when we check the file sizes. Focus on TestDatabase_Data1 and TestDatabase_Data2 files. They are exhausted.


Since we have enabled AutoGrowth, files will be expanded if we enter more data. Let's enter some more data and see whether both files are getting expanded.

-- Inserting 100 more records
INSERT INTO dbo.TestTable
 (Name) VALUES (REPLICATE('a', 8000));
GO 100

SELECT 
 file_id, name, physical_name
 , Convert(Decimal(12,2),Round(size/128.000,2)) AS [FILE_SIZE_MB]
 , Convert(Decimal(12,2),Round(FileProperty(name,'SpaceUsed')/128.000,2)) AS [SPACE_USED_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000,2)) AS [FREE_SPACE_MB]
 , Convert(Decimal(12,2),Round((size-FileProperty(name,'SpaceUsed'))/128.000/(size/128.000),2)) AS [FREE_PCT_MB]
FROM sys.database_files;


See, only the first file has been expanded.


How to avoid this behavior?

This can be overcome either by expanding the files manually or by enabling Trace Flag 1117. When expanding files manually, it is better to disable AutoGrowth to stop automatic expansion.
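
A rough sketch of both options (the file names and sizes are illustrative); note that on SQL Server 2016 and later the AUTOGROW_ALL_FILES file group option replaces Trace Flag 1117:

-- Option 1: enable Trace Flag 1117 so all files in a file group grow together
DBCC TRACEON (1117, -1);

-- Option 2: expand the files manually, keeping them evenly sized
ALTER DATABASE TestDatabase MODIFY FILE (NAME = N'TestDatabase_Data1', SIZE = 20MB);
ALTER DATABASE TestDatabase MODIFY FILE (NAME = N'TestDatabase_Data2', SIZE = 20MB);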

Saturday, May 20, 2017

Management Studio has new DAX Query Window

We have been writing DAX queries using the MDX Query Editor whenever we need to do something with Analysis Services Tabular Model databases (read my old post on this: How to write SELECT for Analysis Services Tabular using DAX). Now, with this month's Management Studio release (release 17.0), we have a new query window for DAX.


Once the new release is installed, you should see the icon and menu item for DAX Query Window;


Writing DAX Queries

Here is an example of DAX queries that can be written. As you see, the first statement uses the SUMMARIZE function to produce a result that shows sales by day of the week and product. The second query creates a measure using DEFINE MEASURE and then uses it in another query.

EVALUATE 
SUMMARIZE(Sales_201501, Sales_201501[CalendarDayOfWeekLabel]
 , Sales_201501[ProductName]
 , "Sales", SUM(Sales_201501[SalesAmount]))

GO

DEFINE MEASURE Sales_201501[NewSales] = SUMX(FILTER(Sales_201501, Sales_201501[IsWorkDay]="WeekEnd")
          , Sales_201501[SalesAmount])
EVALUATE 
SUMMARIZE(Sales_201501, Sales_201501[CalendarDayOfWeekLabel]
 , Sales_201501[ProductName]
 , "Sales", Sales_201501[NewSales])

Here is the result of it;