Tuesday 15 October 2019

Ways of importing excel data into SQL Server Table

There are a couple of ways to import excel data into SQL Server. One of the most popular ways is to use SSIS to import the data into SQL Server from Excel workbooks. Another one is to use OPENROWSET and linked servers. We will take one scenario and implement both the ways.

Scenario: 

I have 'n' number of excel workbooks (all of them have the same schema) in a network drive. I will have to process each excel file into one common SQL table and move the processed file into "ProcessedFiles" folder.



So, SSIS or OPENROWSET approach had to do below steps.
1. Read the network path
2. Loop through each file, process and move the processed file into "ProcessedFiles" folder.

I have added the following 2 columns (Id, Name) and added 4 rows of dummy data in the excel.  I have copied the same excel 4 times and changed the name "Excel Data 10*.xls". I have created a table with the same schema in the SQL Server using the below script.
CREATE TABLE exceldata( id INTNAME sysname) 

Let's do the demo.

SSIS Approach:

Since we have to read each file in the network drive, I am using "Foreach Loop Container" to loop through each file and process. I have chosen "Enumerator" as "Foreach File Enumerator" like below screenshot, Please update "Folder:" and "Files:" correctly as highlighted.  Note that I am retrieving "Fully qualified" file path from here and going to assign it to a variable to use it in "Excel File Path" in excel source.

Go to "Variable Mappings" and create a new variable named "Filepath" to store Fully qualified file Path for each iteration. I have assigned design value for the variable "Filepath" as "C:\NARA\Excel Data 101.xls".

Let's add a data flow task from SSIS Toolbox into the "Foreach Loop Container".

Go into the data flow task and add "Excel Source". Configure excel source by adding "Excel Connection Manager".

Right-click on "Excel Connection Manager", click on properties, set the "Delay Validation" to "True". Click on "Expression", Select "Property" as "ExcelFIlePath" and Click "..." right to the expression and drag "User::FilePath" and drop to the "Expression: "and click on "Evaluated value" to check. Click on "OK"

Add "OLEDB Destination" and configure it appropriately with the right table.

It is like below in the data flow task. Ignore the "Truncation" warning.

Please now come back to control flow, add a new variable to store the "ProcessedFiles" network drive path and add "File System Task" to move the processed file to the "ProcessedFiles" Folder.  Please update the settings of "File System Task" like below and clik "OK".

Now everything is done. Execute the SSIS package and check the "ProcessedFiles" folder.


Check the SQL Table data.

OPENROWSET Approach: 

Let's truncate "exceldata" table in MyDB to use OPENROWSET approach.
TRUNCATE TABLE exceldata 
I have moved the ProcessedFiles again back to the c:\NARA folder to process files again using OPENROWSET approach.

In order to use "OPENROWSET" approach, we will have to install "Microsoft.ACE.OLEDB.12.0 for Windows" in the server. An installer can be found here for 64 bt version. Get the appropriate installer from the Microsoft site and install it. Please make sure to register the "msexcl40.dll" using below command prompt.
> C:\Windows\SysWOW64\regsvr32 C:\Windows\SysWOW64\msexcl40.dll

Reconfigure xp_cmdshell, 'Ad Hoc Distributed Queries' and 'Microsoft.ACE.OLEDB.12.0' using below script.
USE mydb 
go 
EXEC Sp_configure  'xp_cmdshell',1 
go 
RECONFIGURE 
go 
Sp_configure 'show advanced options', 1; 
go 
RECONFIGURE; 
go 
Sp_configure 'Ad Hoc Distributed Queries', 1; 
go 
RECONFIGURE; 
go 
EXEC Sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess',1 
go 
EXEC Sp_msset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1 
go 

Use below script to load the data from excel workbooks into SQL Table and move the processed files to "ProcessedFiles" folder. Update the appropriate values in the highlighted before running the script. You can use this script in a SQL stored procedure and call sproc using SQL agent job to run based on the schedule (daily/weekly/monthly).
USE MyDB
GO
IF(OBJECT_ID('tempdb..#DirectoryTree_F') IS NOT NULL ) DROP TABLE #DirectoryTree_F
CREATE TABLE #DirectoryTree_F  (id int IDENTITY(1,1) ,subdirectory nvarchar(512),depth int,isfile bit);
DECLARE @FolderPath SYSNAME = 'C:\NARA\'
DECLARE @ProcessedFolderPath SYSNAME = 'C:\NARA\ProcessedFiles\'
INSERT #DirectoryTree_F (subdirectory,depth,isfile)
EXEC master.sys.xp_dirtree @FolderPath,1,1;
DELETE #DirectoryTree_F WHERE subdirectory NOT like 'Excel Data%.xls'
--SELECT * FROM #DirectoryTree_F
WHILE EXISTS (SELECT * FROM #DirectoryTree_F)
BEGIN
  DECLARE @FileName SYSNAME,@FilePath NVARCHAR(3000),@SQL NVARCHAR(4000),@OpenRowSQL NVARCHAR(4000)
  SELECT TOP 1  @FileName = subdirectory FROM #DirectoryTree_F
  SELECT @FilePath = @FolderPath + @FileName
  SET @OpenRowSQL =  'INSERT exceldata (Id,[Name]) SELECT Id,[Name] FROM OPENROWSET(''Microsoft.ACE.OLEDB.12.0'',''Excel 12.0; Database='+@FilePath+''', [Sheet1$]);'
  --PRINT @OpenRowSQL
  EXEC SP_EXECUTESQL @OpenRowSQL
  SET @SQL = 'MOVE "' + @FilePath + '" "' +  @ProcessedFolderPath+'" '
  EXEC xp_cmdshell @SQL
  --PRINT @SQL
  DELETE #DirectoryTree_F WHERE subdirectory = @FileName
END
GO

Let's check the data in the table.
We have loaded all the data from excel sheet(s) into SQL Server Table and moved the processed files to the different folder using OPENROWSET approach.
Which approach do you think is better and effective?

Please remember that Microsoft has left the following note in their online docs about "OPENROWSET" approach. 

"The ACE provider (formerly the Jet provider) that connects to Excel data sources is intended for interactive client-side use. If you use the ACE provider on SQL Server, especially in automated processes or processes running in parallel, you may see unexpected results."

Please choose the approach carefully based on your requirement. Thanks

2 comments: