Friday, July 22, 2011

Parsing source code files for function headers and function calls

A few weeks ago I wrote a small program that prints a list of the functions and function calls used in a program. It scans a directory recursively for files of a given type.

Here is the source code: functiondiagram.d
And here is the source as a "quote" (blogger.com lacks a way to present formatted source code...):
/*
    This console application, written in D, can parse C, C++, D and many other file types.
    It scans the files for function headers and function calls and prints a list of all matches for every file.
    The first parameter must be the directory where the search starts. This folder is scanned recursively.
    The second parameter specifies the file type as a pattern such as "*.cpp".
  
    This code stands under the do-with-it-what-you-want license.
  
    Author: Energized
    E-mail: undervoltage@safe-mail.net
    Blog: http://electric-handicraft.blogspot.com
*/


import std.file;
import std.stdio;
import std.stream;
import std.regexp;
import std.string; //provides std.string.find, used below

/*
TODO:
- Self defined data types are not recognized
- Comments will be parsed too. This can result in false "positives".
*/


/*
    Searches one line of text/code for function headers and function calls.
    In the result, a name followed by ";" is a function call; a name without it is a function header.
*/
string[] returnFunctions( string data )
{
    string regex = r"([\w\d]+)\s*[(]";
    string[] rv;
  

    foreach( m; RegExp( regex, "ig" ).search( data ) )
    {
        if ( (m !is null) && (m.match(1) !is null) ) //if something was found:
        {
            //writefln("> '" ~ m.match(1) ~ "'");
            if ( RegExp( r"^(if|for|foreach|return|while|finally|try|catch)$" ).match( m.match(1) ) is null ) //is the match a reserved word?
            {
                rv ~= m.match(1); //if not, it will be part of our result

                //if no datatype is in front of the match, it is a function call:
                auto n = search( m.pre, r"^\s{0,}(string|bool|int|byte|char|long|short|float|double|void|signed|unsigned|HBITMAP)", "i" );
                if ( n is null )
                    rv ~= ";";
                else
                {
                    /*
                        If an assignment is done (function to variable), the function call would be
                        misinterpreted as a function header. That's why we search for a semicolon here:
                        if there is a semicolon in this line, it is a function call (or a forward declaration).
                    */
                    if ( std.string.find( m.post, ';' ) != -1 ) rv ~= ";";
                }
            }
        }
    }
    return rv;
}


/*
    Prints the result for one line: a single entry is a function header, anything else is a list of function calls.
*/
void printTree( string[] data )
{
    if (data.length == 1)
        writefln( "\nFunction \"%s()\":", data[0] );
    else
    {
        foreach( t; data )
            if (t != ";") writefln("\t%s();", t);
    }
}


void main( char[][] parameters )
{
    string path = parameters[1];      //as first parameter the program needs a path name
    string extension = parameters[2]; //extension (like "*.cpp")
    string[] result = listdir( path, extension ); //crawl through all files with the given extension in the given directory and all sub directories and store the filenames in the "result" array.
  
    foreach( filename; result ) //scan one file after the other:
    {
        BufferedFile file = new BufferedFile( filename, FileMode.In ); //open the file for reading

        writefln( "\nFile: %s", filename ); //print the filename; passed as an argument so that a '%' in the name is not read as a format specifier
      
        foreach(ulong n, string line; file) //go through the lines and scan them for function names or function calls:
        {
            string[] functions = returnFunctions( line );
          
            if (functions !is null) printTree( functions ); //print whatever was found on this line
        }
        file.close();
    }
}
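
The listing above relies on std.regexp, std.stream and listdir, which are no longer part of Phobos on current D compilers. As a rough sketch only, the same line scanner could be written with std.regex and std.file.dirEntries like this. It keeps the conventions of the original (a name followed by ";" is a call, a lone name is a header); the usage message and the inline printing are my own simplifications, not part of the original program.

```d
import std.algorithm.searching : canFind;
import std.file : dirEntries, SpanMode;
import std.regex;
import std.stdio;

// Returns the names found on one line. A name followed by ";" is a call,
// a name without it is treated as a function header (same convention as above).
string[] returnFunctions(string line)
{
    auto name = regex(`(\w+)\s*\(`);
    auto keyword = regex(`^(if|for|foreach|return|while|finally|try|catch)$`);
    auto typePrefix = regex(
        `^\s*(string|bool|int|byte|char|long|short|float|double|void|signed|unsigned|HBITMAP)`, "i");

    string[] rv;
    foreach (m; matchAll(line, name))
    {
        string id = m[1];
        if (!matchFirst(id, keyword).empty)
            continue; // reserved word, not a function

        rv ~= id;
        // A header needs a data type at the start of the line and no semicolon after the match.
        bool looksLikeHeader = !matchFirst(m.pre, typePrefix).empty && !m.post.canFind(';');
        if (!looksLikeHeader)
            rv ~= ";";
    }
    return rv;
}

void main(string[] args)
{
    if (args.length < 3)
    {
        writeln("Usage: functiondiagram <directory> <pattern>");
        return;
    }
    // SpanMode.depth walks the directory and all of its subdirectories.
    foreach (entry; dirEntries(args[1], args[2], SpanMode.depth))
    {
        if (!entry.isFile)
            continue;
        writefln("\nFile: %s", entry.name);
        foreach (line; File(entry.name).byLineCopy)
        {
            auto found = returnFunctions(line);
            if (found.length == 1)
                writefln("\nFunction \"%s()\":", found[0]);
            else
                foreach (t; found)
                    if (t != ";") writefln("\t%s();", t);
        }
    }
}
```

This sketch has the same weaknesses as the original: comments and string literals are still scanned, and self-defined data types are not recognized as the start of a header.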
And here is an example of what the result looks like when the original program scans its own source:

File: C:\functiondiagram.d

Function "returnFunctions()":
        RegExp();
        search();
        match();
        writefln();
        match();
        RegExp();
        match();
        match();
        match();
        search();
        done();
        call();
        find();

Function "printTree()":
        writefln();
        s();
        writefln();
        s();

Function "main()":

Function "extension()":
        listdir();
        BufferedFile();
        writefln();
        returnFunctions();
        printTree();
        close();

The program is not perfect yet. Some false positives are possible: in the example, "s()" is not a function but the "%s()" format specifier inside a string literal, and "done()", "call()" and "extension()" come from comment text.
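
One possible way to reduce these false positives is to remove string literals and comments from each line before looking for names. The following is only a sketch in current D (std.regex), not part of the program above. The helper names stripNoise and namesBeforeParen are made up for this illustration. It works on single lines only, so block comments that span several lines and raw strings ending in a backslash are not handled.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: removes double-quoted strings, character literals,
// single-line /* ... */ comments and // comments from one line of code.
string stripNoise(string line)
{
    // The leftmost match wins, so a "//" inside a string literal is
    // consumed as part of that literal and not treated as a comment.
    auto noise = regex(`"(\\.|[^"\\])*"|'(\\.|[^'\\])*'|/\*.*?\*/|//.*`);
    return replaceAll(line, noise, "");
}

// Hypothetical helper: every name directly followed by "(",
// the same basic pattern the program uses.
string[] namesBeforeParen(string line)
{
    string[] names;
    foreach (m; matchAll(line, regex(`(\w+)\s*\(`)))
        names ~= m[1];
    return names;
}

void main()
{
    string line = `foreach( t; data ) if (t != ";") writefln("\t%s();", t); // done (maybe)`;

    // Without cleaning, "s" and "done" show up as names.
    writeln(namesBeforeParen(line));

    // After cleaning, only foreach, if and writefln remain.
    writeln(namesBeforeParen(stripNoise(line)));
}
```

Reserved words such as "foreach" and "if" still have to be filtered afterwards, as the program already does.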
