21 March 2011

Parsing a query or statement for Identifying Functions


Overview

I am working in the application which was previously integrated with Microsoft Great Plain system (Back-end SQL Server Database). Now it's going to be integrated with Oracle Financial (Back-end Oracle Database). When i analyzed application's current code, i found number of inline queries everywhere in the code. It would be very difficult to scan whole code and modify all queries for making them Oracle compliance. So rather to do this manual work, i decided to write a routine which would return an object after parsing the query. Once i would get the object, i could make required modification easily before sending it to the sever for execution.

Details

If you want to convert SQL Server's query to Oracle's, beside other modifications, you must need to replace all SQL Server's functions and their parameters with some Oracle compliance functions and parameters. This replacement depends on the name of SQL Server's functions as well as values and positions of their arguments/parameters.
Suppose we have the function from SQL Server query Convert(datetime,'23/02/2010',103). In order to make it Oracle compliance, it should be modified something like To_Date('23/02/2010','DD/MM/yyyy').
First argument of Convert function in above example is datetime which tells that function name should be replaced with To_Date. Third argument of Convert function is 103 which tell that To_Date function's second argument should be 'DD/MM/yyyy'.

IdentifyFunctions (parsing a query or statement)

This method parse the query and return a FunctionParameter object.
/// <summary>
        /// Identifying functions into a parameter.
        /// Whole statement or query is considered as a parameter.
        /// </summary>
        /// <param name="sb">A parameter or a statement. A parameter in case of recursive calling</param>
        /// <param name="startIndex"></param>
        /// <param name="endIndex"></param>
        /// <param name="flatListOfFunction">A flat list of all functions in a statement</param>
        /// <param name="parentFunction"> Function, the parameter belongs to. Null value in case of statement.</param>
        /// <returns></returns>
        public static FunctionParameter IdentifyFunctions(StringBuilder sb, int startIndex, int endIndex
            , SortedList<int, Function> flatListOfFunction
            , Function parentFunction)
        {

            //Parsing will be started for "index" position
            int index = startIndex;

            //This flag will be true where parsing will not be required. e.g "sample text" or 'sample text'
            bool blindMove = false;

            //Initialize flatListOfFunction
            if (flatListOfFunction == null)
            {
                flatListOfFunction = new SortedList<int, Function>();
            }



            //Passed value is considered a parameter. Each param may contain number of functions.
            FunctionParameter parameter
                = new FunctionParameter(sb, startIndex, endIndex, parentFunction);


            //Explicit return condition
            if (parameter.Statement == string.Empty)
            {
                return parameter;
            }

            //Fuction can be constructed with any of them enclosing chars. [{(
            //e.g. Function(),Function[] or Function{}
            Stack<char> enclosingChars = new Stack<char>();
            Stack<int> enclosingCharIndexs = new Stack<int>();

            //Variables for current function
            int currentFuncNameStartIndex = -1;
            int currentFuncNameEndIndex = -1;
            bool currentFuncNameEnded = false;
            char currentFuncEnclosingChar = '\0';
            Function currentFunction = null;


            //current text enclosing char
            char currentTextEnclosing = '\'';

            while (index <= endIndex)
            {
                char chr = sb[index];


                if (blindMove && chr != currentTextEnclosing)
                {
                    //Ignore current char, becuase blind move is true.
                    index++;
                    continue;
                }

                char? chrNext = (index + 1 < sb.Length)
                    ? sb[index + 1]
                    : (char?)null;

                char? chrPrev = (index > 0)
                    ? sb[index - 1]
                    : (char?)null;


                char? topEnclosingChar = (enclosingChars.Count > 0)
                        ? enclosingChars.Peek()
                        : (char?)null;

                int? topEnclosingCharIndex = (enclosingCharIndexs.Count > 0)
                       ? enclosingCharIndexs.Peek()
                       : (int?)null;

                switch (chr)
                {
                    // ' 
                    case '\'':
                    case '"':
                        if (topEnclosingChar == chr)
                        {
                            if (chrNext == chr)
                            {
                                //It is not enclosing (closing) char so skip these two chars
                                index += 2;
                                continue;
                            }
                            else if (chrPrev == '\\')
                            {
                                //Previous char was escaping char. so it is not closing of the text.
                            }

                            else
                            {
                                //It is enclosing (closing) char so pop the char from stacks.
                                enclosingChars.Pop();
                                enclosingCharIndexs.Pop();

                                //Parsing will be required for upcomming data.
                                blindMove = false;

                            }
                        }
                        else
                        {
                            //It is enclosing (opening) char so push it to the stacks.
                            enclosingChars.Push(chr);
                            enclosingCharIndexs.Push(index);

                            //Ignore parising of upcomming data untill we get closing char.
                            blindMove = true;

                        }
                        break;

                    // {([
                    case '{':
                    case '[':
                    case '(':

                        //This char may be for nested function 
                        //"Bracket found in the stack" means that it is opening for nested function.
                        // Nested functions will be handled by related iteration (Recursive Calling)
                        bool isNestedFunctionOpening = (enclosingChars.Count > 0);


                        //If it is not for nested functin then set function name's ending index.
                        if (!isNestedFunctionOpening)
                        {

                            //Set endIndex to previous of this index.
                            currentFuncNameEndIndex = index - 1;

                            //Enclosing char, (Can be used later at the time of statement manipulation).
                            currentFuncEnclosingChar = chr;

                            //Make function
                            currentFunction = new Function(sb
                                , currentFuncNameStartIndex
                                , currentFuncNameEndIndex
                                , currentFuncEnclosingChar
                                , parameter);

                            //Add function to main param
                            parameter.Functions.Add(currentFunction);


                            //Add into flat list
                            flatListOfFunction.Add(currentFuncNameEndIndex, currentFunction);



                        }
                        enclosingChars.Push(chr);
                        enclosingCharIndexs.Push(index);
                        break;

                    // )}]
                    case '}':
                    case ']':
                    case ')':

                        if (enclosingChars.Count == 0)
                        {
                            //There is no openning bracket for this char
                            throw new Exception(string.Format("opening char is missing index={0}    char={1}"
                                , index
                                , chr));
                        }

                        //This char may be for nested function 
                        //"More than one char found in the stack" means, it is closing for nested function.
                        bool isNestedFunctionClosing = (enclosingChars.Count > 1);

                        if (!isNestedFunctionClosing)
                        {
                            //It is closing of current function. 
                            //And previous index is the ending of its last parameter if any.

                            //Get its last param by recursive calling
                            FunctionParameter param = IdentifyFunctions(sb
                                , enclosingCharIndexs.Peek() + 1
                                , index - 1
                                , flatListOfFunction
                                , currentFunction);

                            //Add Param to function params List
                            if (currentFunction != null)
                            {
                                currentFunction.Parameters.Add(param);
                            }

                            //Pop function's opening char.
                            enclosingCharIndexs.Pop();
                            enclosingChars.Pop();


                            //Reset current function variables
                            currentFunction = null;
                            currentFuncNameStartIndex = -1;
                            currentFuncNameEndIndex = -1;

                        }
                        else
                        {
                            //Pop nested fuction opening char
                            enclosingCharIndexs.Pop();
                            enclosingChars.Pop();

                        }
                        break;

                    // ,
                    case ',':

                        //This char may be for nested function 
                        //"More than one char found in the stack" means, it is for nested function.
                        bool isNestedFunctionComma = (enclosingChars.Count > 1);


                        if (!isNestedFunctionComma

                            //commas outside the functions will be ignored. 
                            //If we find stack empty, it means this comma doesnt belog to any function.
                            && enclosingChars.Count > 0)
                        {
                            //Previous index is the ending of the last parameter of current function if any

                            //Get the param by recursive calling
                            FunctionParameter param = IdentifyFunctions(sb
                                , enclosingCharIndexs.Peek() + 1
                                , index - 1
                                , flatListOfFunction
                                , currentFunction);


                            //Add param to list
                            if (currentFunction != null)
                            {
                                currentFunction.Parameters.Add(param);
                            }

                            //Pop function's opening char (it may be a comma).
                            //Because after deciding first param boundary, function's opening char will get poped out the stacks.
                            enclosingCharIndexs.Pop();
                            enclosingChars.Pop();

                            //Push this char, it will be used for identifying starting of next param
                            enclosingCharIndexs.Push(index);
                            enclosingChars.Push(chr);
                        }
                        break;

                    // space or new line
                    case '\t':
                    case '\n':
                    case '\r':
                    case ' ':

                        //As a space has come, system is assuming it is either end of the function name
                        //or not part of the function name.
                        currentFuncNameEnded = true;
                        break;

                    default:
                        if ((chr >= '0' && chr <= '9')
                             || (chr >= 'a' && chr <= 'z')
                             || (chr >= 'A' && chr <= 'Z')
                             || (chr == '_')
                            ) //Possible chars for function name
                        {
                            bool isNestedFunctionText = (enclosingChars.Count > 0);

                            //if Char is from nested functions. system will assume that 
                            //function name has already been identified.
                            if (!isNestedFunctionText)
                            {
                                if (currentFuncNameEnded)
                                {
                                    //As current function name has already been ended.
                                    //and now we have another alpha numeric character.
                                    //it means last identified function name is not correct
                                    //So mark this char as start of the fuction name
                                    currentFuncNameStartIndex = index;
                                    currentFuncNameEnded = false;
                                }
                                else if (currentFuncNameStartIndex < 0)
                                {
                                    //it has no been intialized yet
                                    currentFuncNameStartIndex = index;
                                }
                            }
                        }
                        break;
                }
                index++;

            }

            return parameter;

        }

FunctionParameter object

FunctionParameter object is either a whole query/statement or an argument of a function. FunctionParameter object has a collection of Function objects FunctionParameter object belongs to a parent Function. (In case of whole statemet/query, parent Function object is null)
public class FunctionParameter
    {

        /// <summary>
        /// 
        /// </summary>
        /// <param name="source"></param>
        /// <param name="startIndex"></param>
        /// <param name="endIndex"></param>
        /// <param name="parentFunction">Function, which this param belongs to</param>
        public FunctionParameter(StringBuilder source, int startIndex, int endIndex, Function parentFunction)
        {
            this.startIndex = startIndex;
            this.endIndex = endIndex;
            this.source = source;
            this.functions = new List<Function>();
            this.parentFunction = parentFunction;
        }

        private int startIndex;
        public int StartIndex
        {
            get { return startIndex; }

            set
            {
                startIndex = value;

                //Statement shoud be re assigned.
                statement = null;
            }
        }

        private int endIndex;
        public int EndIndex
        {
            get { return endIndex; }
            set
            {
                endIndex = value;

                //Statement shoud be re assigned.
                statement = null;
            }
        }

        private StringBuilder source;
        public StringBuilder Source
        {
            get { return source; }
            set { source = value; }
        }

        private List<Function> functions;
        public List<Function> Functions
        {
            get { return functions; }
        }


        private Function parentFunction;
        public Function ParentFunction
        {
            get { return parentFunction; }
        }

        private string statement = null;
        public string Statement
        {
            get
            {
                if (statement == null)
                {
                    if (startIndex > endIndex
                        || startIndex < 0
                        || endIndex < 0)
                    {
                        //Parameter with empty text (empty statement)
                        statement = string.Empty;
                    }
                    else
                    {
                        statement = source.ToString(startIndex
                            , endIndex - startIndex + 1).Trim();
                    }
                }
                return statement;
            }
            set
            {
                int statementCurrentLength = 0;
                if (startIndex < 0)
                {
                    //Already empty text so no need to remove old text
                    startIndex = 0;
                }
                else if (endIndex > 0)
                {
                    //Remove old text
                    source.Remove(startIndex
                        , endIndex - startIndex + 1);

                    statementCurrentLength = endIndex - startIndex + 1;
                }

                //Insert new text
                source.Insert(startIndex, value);

                //Update endindex
                endIndex = startIndex + value.Length - 1;

                //Reassign statement.
                statement = value.Trim();

                //Offset: subtrace statementCurrentLength from value's length
                int offset = value.Length - statementCurrentLength;


                //Adjust onward functions and params indexes
                if (parentFunction != null)
                {
                    parentFunction.AdjustIndexes(offset, this);
                }



            }
        }

        public void AdjustIndexes(int offset)
        {
            AdjustIndexes(offset, null);
        }


        public void AdjustIndexes(int offset, Function requestingFunction)
        {

            if (requestingFunction == null)
            {
                endIndex = endIndex + offset;
                startIndex = startIndex + offset;
            }
            else
            {
                //It is requested from its own fuction so only endIndex adjustment is required.
                endIndex = endIndex + offset;
            }

            //Adjustment should be applied only for those statement's functions which come after requesting function.
            int index = (requestingFunction != null)
                ? functions.IndexOf(requestingFunction) + 1
                : 0;

            for (; index < functions.Count; index++)
            {
                functions[index].AdjustIndexes(offset);
            }

            //Notify parent function
            if (parentFunction != null
                && requestingFunction != null)
            {
                //If request from its own function, only then notification is required.
                parentFunction.AdjustIndexes(offset, this);
            }
        }


        public override string ToString()
        {
            return Statement;
        }
    }

Function object

Function object has a collection of FunctionParameter objects. Function object always belongs to a parent FunctionParameter object. New FunctionParameter objects can be added to and removed from Function object.
public class Function
    {

        public Function(StringBuilder source
            , int nameStartIndex, int nameEndIndex
            , char enclosingChar
            , FunctionParameter parentParameter
            )
        {
            this.source = source;
            this.nameStartIndex = nameStartIndex;
            this.nameEndIndex = nameEndIndex;
            this.parameters = new List<FunctionParameter>();
            this.enclosingChar = enclosingChar;
            this.parentParameter = parentParameter;



        }


        private FunctionParameter parentParameter;
        public FunctionParameter ParentParameter
        {
            get { return parentParameter; }
        }

        private int nameStartIndex;
        public int NameStartIndex
        {
            get { return nameStartIndex; }
            set
            {
                nameStartIndex = value;

                //Function name shoud be re assigned.
                functionName = null;
            }
        }

        private int nameEndIndex;
        public int NameEndIndex
        {
            get { return nameEndIndex; }
            set
            {
                nameEndIndex = value;

                //Function name shoud be re assigned.
                functionName = null;
            }
        }


        private List<FunctionParameter> parameters;
        public List<FunctionParameter> Parameters
        {
            get { return parameters; }
        }



        private StringBuilder source;
        public StringBuilder Source
        {
            get { return source; }
            set { source = value; }
        }

        private char enclosingChar;
        public char EnclosingChar
        {
            get { return enclosingChar; }
            set
            {
                enclosingChar = value;

                if (parameters.Count > 0)
                {
                    //opening index
                    FunctionParameter firstParam = parameters[0];
                    int openingIndex = (firstParam.EndIndex > firstParam.StartIndex)
                        ? ((firstParam.StartIndex > 0) ? firstParam.StartIndex - 1 : 0)
                          : firstParam.EndIndex;

                    //Closing index
                    FunctionParameter lastParam = parameters[parameters.Count - 1];
                    int closingIndex = (lastParam.EndIndex > lastParam.StartIndex)
                        ? ((lastParam.EndIndex < source.Length - 1) ? lastParam.EndIndex + 1 : source.Length - 1)
                          : lastParam.StartIndex;

                    source[openingIndex] = value;

                    switch (value)
                    {
                        case '[':
                            source[closingIndex] = ']';
                            break;
                        case '{':
                            source[closingIndex] = '}';
                            break;
                        case '(':
                            source[closingIndex] = ')';
                            break;
                        case '<':
                            source[closingIndex] = '>';
                            break;
                        default:
                            source[closingIndex] = value;
                            break;
                    }


                }
            }
        }

        private string functionName;
        public string FunctionName
        {
            get
            {
                if (functionName == null)
                {
                    if (nameStartIndex > nameEndIndex
                        || nameStartIndex < 0
                        || nameEndIndex < 0)
                    {
                        //Function without name
                        functionName = string.Empty;
                    }
                    else
                    {
                        functionName = source.ToString(nameStartIndex
                            , nameEndIndex - nameStartIndex + 1).Trim();
                    }
                }
                return functionName;
            }
            set
            {
                int nameCurrentLength = 0;
                if (nameStartIndex < 0)
                {
                    //Already empty text so no need to remove old text
                    nameStartIndex = 0;
                }
                else if (nameEndIndex > 0)
                {
                    //Remove old text
                    source.Remove(nameStartIndex
                        , nameEndIndex - nameStartIndex + 1);

                    nameCurrentLength = nameEndIndex - nameStartIndex + 1;
                }

                //Insert new text
                source.Insert(nameStartIndex, value);

                //Update endindex
                nameEndIndex = nameStartIndex + value.Length - 1;

                //Assign function name             
                functionName = value.Trim();

                //Offset: subtrace nameCurrentLength from value's length
                int offset = value.Length - nameCurrentLength;

                //Adjust indexes but not of function name.
                AdjustIndexes(offset, true);

            }
        }

        public void RemoveParameter(FunctionParameter parameter)
        {
            int index = parameters.IndexOf(parameter);
            RemoveParameter(index);
        }
        public FunctionParameter RemoveParameter(int index)
        {
            FunctionParameter parameter = parameters[index];
            bool isFirstParam = index == 0;
            bool isLastParam = index == parameters.Count - 1;

            int sourceStartIndex = (isFirstParam)
                ? parameter.StartIndex
                : (parameters[index - 1].EndIndex + 1); //We also need to remove prior comma.

            int sourceEndIndex = (isLastParam)
                ? parameter.EndIndex
                : (parameters[index + 1].StartIndex - 1); //We also need to remove next comma.


            if (sourceEndIndex >= sourceStartIndex)
            {
                int Paramlength = sourceEndIndex - sourceStartIndex + 1;
                source.Remove(sourceStartIndex
                    , Paramlength);

                AdjustIndexes(Paramlength * -1, parameter);
            }

            parameters.RemoveAt(index);

            return parameter;


        }

        public FunctionParameter AddParameter(string parameterText)
        {
            return InsertParameter(parameterText, parameters.Count);
        }

        public FunctionParameter InsertParameter(string parameterText, int index)
        {
            string comma = " , ";

            //index adjustment
            if (index < 0)
            {
                index = 0;
            }
            else if (index > parameters.Count)
            {
                index = parameters.Count;
            }


            bool isFirstParam = index == 0;
            bool isLastParam = index == parameters.Count;


            //Param text for source, it may be with comma
            string textForSource = parameterText;


            int sourceStartIndexForParam;
            int sourceStartIndex;
            if (isFirstParam)
            {
                sourceStartIndexForParam = sourceStartIndex = nameEndIndex + 2; // indext after openning bracket

                if (!isLastParam)
                {
                    textForSource = textForSource + comma; // Ended by a comma
                }
            }
            else
            {
                sourceStartIndex = parameters[index - 1].EndIndex + 1;
                textForSource = comma + textForSource; // Should be started with comma
                sourceStartIndexForParam = sourceStartIndex + comma.Length;
            }

            //Make param object
            FunctionParameter param = new FunctionParameter(source
                , sourceStartIndexForParam
                , sourceStartIndexForParam + parameterText.Length - 1
                , this);

            //Insert into param list
            parameters.Insert(index, param);

            //Insert text into source
            source.Insert(sourceStartIndex, textForSource);

            AdjustIndexes(textForSource.Length, param);

            return param;

        }



        public void AdjustIndexes(int offset)
        {
            AdjustIndexes(offset, null, false);
        }

        public void AdjustIndexes(int offset, bool ownRequest)
        {
            AdjustIndexes(offset, null, ownRequest);
        }

        public void AdjustIndexes(int offset
            , FunctionParameter requestingParam)
        {
            AdjustIndexes(offset, requestingParam, false);
        }
        public void AdjustIndexes(int offset
            , FunctionParameter requestingParam
            , bool ownRequest)
        {
            if (requestingParam == null && !ownRequest)
            {
                nameEndIndex = nameEndIndex + offset;
                nameStartIndex = nameStartIndex + offset;
            }
            else
            {
                //it is requested from either its own argument or itself. 
                //In both cases, function name adjustment is not required.
            }

            //adjustment should be applied only onward params of requeting param
            int index = (requestingParam != null)
                ? parameters.IndexOf(requestingParam) + 1
                : 0;

            for (; index < parameters.Count; index++)
            {
                parameters[index].AdjustIndexes(offset);
            }

            //Also notify parent
            if (requestingParam != null || ownRequest)
            {
                //if request from either child param or itself, only then notification will be required.
                parentParameter.AdjustIndexes(offset, this);
            }
        }

        public override string ToString()
        {
            return FunctionName;
        }
    }


Example

select ISNULL(Convert(datetime , CreatedOn , 103) , getDate()) from Orders.
After parsing above query, system will give a FunctionParameter object with following details
Seq Property Value
1 Statement select ISNULL(Convert(datetime , CreatedOn , 'DD\MM\YYYY') , getDate()) from Orders.
2 ParentFunction [Null]
3 Functions [List]
1.1    FunctionName ISNULL
1.2    ParentFunctionParameter[ Seq# 1]
1.3    FunctionParameters [List]
1.3.1         Statement Convert(datetime , CreatedOn , 'DD\MM\YYYY')
1.3.2         ParentFunction [Seq# 1.1]
1.3.3         Functions [List]
1.3.3.1            FunctionName Convert
1.3.3.2            ParentFunctionParameter[Seq# 1.3.1]
1.3.3.3            FunctionParameters [List]
1.3.3.3.1                 Statement datetime
1.3.3.3.2                 ParentFunction [Seq#  1.3.3.1]
1.3.3.3.3                 Functions [Empty List]
1.3.3.3.4                 Statement CreatedOn
1.3.3.3.5                 ParentFunction [Seq#  1.3.3.1]
1.3.3.3.6                 Functions [Empty List]
1.3.3.3.7                 Statement 103
1.3.3.3.8                 ParentFunction [Seq#  1.3.3.1]
1.3.3.3.9                 Functions [Empty List]
1.3.4         Statement getDate()
1.3.5         ParentFunction [Seq# 1.1]
1.3.6         Functions [List]
1.3.6.1            FunctionName getDate
1.3.6.2            ParentFunctionParameter[Seq# 1.3.4]
1.3.6.3            FunctionParameters [Empty List]

Query Conversion Routine

After getting Function objects , we can easily write a routine for required conversion.
public static string ConvertQuery(SortedList flatListOfFunction
            , StringBuilder source)
        {
            foreach (int key in flatListOfFunction.Keys)
            {
                Function func = flatListOfFunction[key];
                if (func.FunctionName.ToUpper() == "CONVERT")
                {
                    if (func.Parameters.Count == 3)
                    {
                        if (func.Parameters[2].Statement == "103"
                            && func.Parameters[0].Statement.ToUpper() == "DATETIME")
                        {
                            func.FunctionName = "TO_Date";
                            func.RemoveParameter(0);
                            func.Parameters[1].Statement = "'DD/MM/YYYY'";
                        }

                    }
                }

                if (func.FunctionName.ToUpper() == "ISNULL")
                {
                    func.FunctionName = "NVL";
                }

                if (func.FunctionName.ToUpper() == "SUM")
                {
                    func.FunctionName = "Add";
                }
                if (func.FunctionName.ToUpper() == "GETDATE")
                {
                    func.FunctionName = "SysDate";
                    func.EnclosingChar = ' ';
                }
            }

            return source.ToString();

        }
After run query conversion method, above query will modified as follows.
select NVL(To_Date(CreatedOn , 'DD\MM\YYYY') , sysdate) from Orders

Digg Google Bookmarks reddit Mixx StumbleUpon Technorati Yahoo! Buzz DesignFloat Delicious BlinkList Furl

0 comments: on "Parsing a query or statement for Identifying Functions"

Post a Comment