Thursday 26 September 2019

How to consolidate same kind of databases into one common database in SQL Server

I have recently come across this requirement. One of our SQL enterprise servers has got 150+ game databases, eventually, the team has created a separate database for each game. Over time, the team end up having 150+ databases and the number is increasing for each month or quarter.  The team does not want to make any changes from the application side since many applications and reports are talking to this enterprise server, and, it is a huge effort to change the architecture.

As this enterprise server is participating in Always on High Availability, the team is kind of suspect that this would be having an issue down the line since Microsoft has only tested up to 100 databases
 So, I have decided to do a kind of proof concept for them by keeping below points in mind.

  1. Users or reports or applications talking with this enterprise server must NOT having any impact.
  2. Inserts must happen against the Current Game database.
  3. The update statements come from applications or users or reports MUST happen AS-IS without any impact.
  4. Way of accessing the data from applications or users or reports MUST work AS-IS without any impact.
Let's do the Demo
  • Create the Sample Game1 database with 2 tables in it.  Insert a few records in the tables.
SET NOCOUNT ON 
GO 
USE MASTER 
GO 
DROP DATABASE IF EXISTS Game_1 
GO 
CREATE DATABASE Game_1 
GO 
USE Game_1 
GO 
DROP TABLE IF EXISTS Table1 
GO 
CREATE TABLE Table1 (NAME SYSNAME,COLUMN1 SYSNAME) 
GO 
INSERT Table1 
SELECT 'NARA','TESTING' 
GO 100 
GO 
CREATE TABLE Table2 (NAME SYSNAME,COLUMN1 SYSNAME) 
GO 
INSERT Table2 
SELECT 'SIMHA','TESTING' 
GO 100 
GO






























Let's clone the GAME_1 database to GAME_2

DBCC CLONEDATABASE (Game_1,Game_2)
GO
ALTER DATABASE [Game_2]SET  READ_WRITE WITH NO_WAIT
GO
USE Game_2
GO
INSERT Table1 SELECT 'NARA','TESTING'
GO 500
GO
INSERT Table2 SELECT 'SIMHA','TESTING'
GO 200
GO

Let's clone the GAME_1 database to GAME_3
DBCC CLONEDATABASE (Game_1,Game_3)
GO
ALTER DATABASE [Game_3]SET  READ_WRITE WITH NO_WAIT
GO
USE Game_3
GO
INSERT Table1 SELECT 'NARA','TESTING'
GO 300
GO
INSERT Table2 SELECT 'SIMHA','TESTING'
GO 100
GO

The idea is to have one common Game database and have the data copied to common game databases from all the game databases (Game_1, Game_2...Game_n). Here each database has around 20 tables and NO column value in any table denotes the game number. So there is NO straight forward pattern to identify Game_No when inserting/updating the records into a common Game database.

Below script used to clone the GAME_1 database to COMMON_GAME and add DB_NAME (would be the database name) and GAME_NO in all tables. 

DBCC CLONEDATABASE (Game_1,COMMON_GAME)
GO
ALTER DATABASE [COMMON_GAME]SET  READ_WRITE WITH NO_WAIT
GO
USE COMMON_GAME
GO
--ADD NEW COLUMNS IN ALL TABLES IN COMMON DATAABSE
IF (OBJECT_ID('tempdb..#alltables') IS NOT NULL) DROP TABLE #alltables
SELECT TABLE_SCHEMA,TABLE_NAME INTO #alltables FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'
WHILE EXISTS (SELECT * FROM #alltables)
BEGIN
DECLARE @TABLE_SCHEMA SYSNAME,@TABLE_NAME SYSNAME,@SQLCMD NVARCHAR(MAX)
SELECT TOP 1 @TABLE_SCHEMA= TABLE_SCHEMA, @TABLE_NAME = TABLE_NAME FROM #alltables

SELECT @SQLCMD = 'IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME = ''GAME_NO'' AND TABLE_NAME = '''+@TABLE_NAME+''' AND TABLE_SCHEMA = '''+@TABLE_SCHEMA+''') ALTER TABLE ['+@TABLE_SCHEMA+'].['+@TABLE_NAME+'] ADD GAME_NO INT NOT NULL'
PRINT @SQLCMD
EXEC SP_EXECUTESQL @SQLCMD
SELECT @SQLCMD = 'IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME = ''DB_NAME'' AND TABLE_NAME = '''+@TABLE_NAME+''' AND TABLE_SCHEMA = '''+@TABLE_SCHEMA+''') ALTER TABLE ['+@TABLE_SCHEMA+'].['+@TABLE_NAME+'] ADD DB_NAME AS CAST(''GAME_''+CAST(LTRIM(RTRIM(GAME_NO)) AS VARCHAR) AS VARCHAR(20)) PERSISTED '
PRINT @SQLCMD
EXEC SP_EXECUTESQL @SQLCMD
DELETE #alltables WHERE TABLE_NAME = @TABLE_NAME AND TABLE_SCHEMA = @TABLE_SCHEMA
END
GO
Let's copy the data for all tables from all game databases into common_game database using below script.
--COPY DATA FROM ALL GAMES INTO COMMON DATABASE AND AND RENAME THE TABLE (AS A BACKUP) IN INDIVIDUAL GAME DATABASES 
USE master
GO
IF (OBJECT_ID('tempdb..#allgames') IS NOT NULL) DROP TABLE #allgames
SELECT name into #allgames FROM SYS.sysdatabases WHERE name LIKE 'Game_%'
WHILE EXISTS (SELECT * FROM #allgames)
BEGIN
  DECLARE @DatabaseName SYSNAME,@TABLE_CATALOG SYSNAME,@TABLE_SCHEMA SYSNAME,@TABLE_NAME SYSNAME
  SELECT top 1 @DatabaseName = name FROM  #allgames
  DECLARE @SQLTableCommnd NVARCHAR(MAX)
  IF (OBJECT_ID('tempdb..##tableInfo') IS NOT NULL) DROP TABLE ##tableInfo
  SET @SQLTableCommnd = 'SELECT * INTO ##tableInfo FROM ['+@DatabaseName+']. INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE=''BASE TABLE'''
  EXEC SP_EXECUTESQL @SQLTableCommnd
  WHILE EXISTS (SELECT * FROM ##tableInfo)
  BEGIN
  SELECT TOP 1 @TABLE_CATALOG =TABLE_CATALOG ,@TABLE_SCHEMA=TABLE_SCHEMA,@TABLE_NAME=TABLE_NAME FROM ##tableInfo
  SET @SQLTableCommnd = 'INSERT [COMMON_GAME].['+@TABLE_SCHEMA+'].['+@TABLE_NAME+']  SELECT *,'''+SUBSTRING(@TABLE_CATALOG,6,LEN(@TABLE_CATALOG))+'''  FROM ['+@TABLE_CATALOG+'].['+@TABLE_SCHEMA+'].['+@TABLE_NAME+']  (NOLOCK)'
   PRINT @SQLTableCommnd
   EXEC SP_EXECUTESQL @SQLTableCommnd
  SET @SQLTableCommnd = @TABLE_CATALOG+'.'+@TABLE_SCHEMA+'.'+'SP_RENAME '''+@TABLE_NAME+''', '''+@TABLE_NAME+'_old'''
  PRINT @SQLTableCommnd
  EXEC (@SQLTableCommnd)
  DELETE ##tableInfo WHERE TABLE_CATALOG= @TABLE_CATALOG AND TABLE_SCHEMA= @TABLE_SCHEMA AND TABLE_NAME = @TABLE_NAME
  END
  DELETE #allgames WHERE name=@DatabaseName
END
GO











































We now have physical data in two places (one in COMMON_GAME and individual Game database with '*_Old' table).

Let's create one view manually under Game_1 database and try to INSERT and UPDATE the records. If it works as expected, we can create automated scripts to create views under individual game databases for all tables available in COMMON_GAME database.
--CREATE VIEWS IN INDIVIDUAL DATABASES
USE Game_1
GO
--DROP VIEW TABLE1
CREATE VIEW TABLE1 AS
SELECT NAME,COLUMN1--,[DB_NAME],GAME_NO 
FROM COMMON_GAME.DBO.Table1 WHERE [DB_NAME] = 'Game_1' AND GAME_NO = 1
GO
UPDATE Game_1.DBO.Table1  SET COLUMN1 = 'TESTING ADD'
SELECT @@ROWCOUNT
GO
SELECT COUNT(*) GAME_1_ROWS FROM  COMMON_GAME.DBO.Table1 WHERE [DB_NAME] = 'Game_1' AND COLUMN1 = 'TESTING ADD'
SELECT COUNT(*) AS TOTAL_ROWS FROM COMMON_GAME.DBO.Table1
GO

This means it is updating the GAME_1 records as expected.
Let's check the inserting the records into the GAME_1 table.
--INSERT CHECK
USE Game_1
GO
INSERT Table1
SELECT 'NARA','TESTING ADD'
GO 100
GO
SELECT COUNT(*) FROM COMMON_GAME.DBO.Table1 WHERE [DB_NAME] IS NULL
GO

As the GAME_NO column does not allow NULLS, this may not work for on-going records.

Let's create Partition Function and Partition Schema on Game_NO, try implementing table partitions on Table1 in COMMON_GAME database on GAME_NO. If it works, we can create an automated script to apply for all other tables.

--Create Partition Funtion and Partition Schema for all the games, Here i have got 3 games, so i have created created 3 and 1 extra file group AND 
USE [COMMON_GAME]
GO
CREATE PARTITION FUNCTION [PF_GAME_NO](int) AS RANGE LEFT FOR VALUES (N'1', N'2', N'3')
CREATE PARTITION SCHEME [PS_GAME_NO] AS PARTITION [PF_GAME_NO] ALL TO ([PRIMARY])
GO
--ADD INDEX ON [dbo].[Table1] 
CREATE NONCLUSTERED INDEX IX_Table1_GAME_NO ON [dbo].[Table1] (GAME_NO) 
ON PS_GAME_NO(GAME_NO)
GO 


Let's create a scalar function, which helps to get the current partition number by adding default constraint on GAME_NO column in the Table1. Use below script.
USE COMMON_GAME
GO
DROP FUNCTION IF EXISTS [dbo].[ufnGetLatestGameNo]
GO
CREATE FUNCTION  [dbo].[ufnGetLatestGameNo](@OBJECT_ID BIGINT)
RETURNS INT
AS
BEGIN
DECLARE @GameNo INT;
SELECT @GameNo = MAX(partition_number)
FROM SYS.PARTITIONS AS p
 INNER JOIN SYS.INDEXES AS i ON i.object_id = p.object_id  AND i.index_id = p.index_id
 INNER JOIN SYS.DATA_SPACES AS ds ON ds.data_space_id = i.data_space_id
 INNER JOIN SYS.PARTITION_SCHEMES AS ps ON ps.data_space_id = ds.data_space_id
 INNER JOIN SYS.PARTITION_FUNCTIONS AS pf ON pf.function_id = ps.function_id
 LEFT OUTER JOIN SYS.PARTITION_RANGE_VALUES AS prv_left ON ps.function_id = prv_left.function_id
 AND prv_left.boundary_id = p.partition_number- 1
 LEFT OUTER JOIN SYS.PARTITION_RANGE_VALUES AS prv_right ON ps.function_id = prv_right.function_id
 AND prv_right.boundary_id = p.partition_number
WHERE p.object_id = @OBJECT_ID AND prv_right.value IS NOT NULL;
RETURN @GameNo;
END;
GO
--ADD DEFAULT CONSTRAINT FOR THE GAME_NO COLUMN
DECLARE @OBJECT_ID BIGINT,@SQLCMD NVARCHAR(MAX)
SELECT @OBJECT_ID = OBJECT_ID('dbo.Table1')
SET @SQLCMD= 'ALTER TABLE dbo.Table1 ADD CONSTRAINT def_GameNo DEFAULT [dbo].[ufnGetLatestGameNo]('+CAST(@OBJECT_ID AS VARCHAR)+') FOR GAME_NO'
PRINT @SQLCMD
EXEC (@SQLCMD)
GO


Now validate INSERT AND UPDATE statements. 

Let's insert some records and check.
--INSERT CHECK
USE Game_1
GO
SELECT COUNT(*) ROWCNT_PREVIOUS FROM COMMON_GAME.DBO.Table1 WHERE GAME_NO = 3
GO
INSERT Table1
SELECT 'NARA','TESTING ADD_INSERT TESTING'
GO 100
GO
SELECT COUNT(*) ROWCNT_INSERT FROM COMMON_GAME.DBO.Table1 WHERE GAME_NO = 3
GO


Note that update works AS-IS and it updates the respective records but INSERTING statements would only insert against the current partition. If the business is okay to insert the records against the latest partition then it is better to implement table partitions for all tables in COMMON_GAME database.  I have tested this approach only for one table_1 in Game_1 database, We can follow the same approach to implement for all other tables too. 

We can automate additional whenever the new game database arrives using the below script.
ALTER PARTITION SCHEME [PS_GAME_NO] 
 NEXT USED [PRIMARY]ALTER PARTITION FUNCTION [PF_GAME_NO]() SPLIT RANGE(N'500') 
 --here '500' is the  game database name (say GAME_500) which recently arrives


Please note that I have not seen the schema of their databases yet. If there is a way of identifying records based on any value whether it is GAME_1 or GAME_2 or GAME_n in each table then it would be a little tweak in [dbo].[ufnGetLatestGameNo] function to insert records in a respective partition or with correct db_name in COMMON_GAME database.

If everything works as expected, please remember to drop the  '*_Old' tables in all individual game databases to reclaim the space.  Please test this approach in a NON-PRODUCTION environment before implementing in PRODUCTION.

If we follow this approach, all individual game databases may not necessarily part of the Always-On Availability Group. 

Hope this helps. Kindly let me know in case if think you have a better approach. 

No comments:

Post a Comment