1. Introduction: In this post, I will show how to build a mini C++ reflection tool using Clang (libclang from Python). First, I will talk about Clang/LLVM and their ecosystem.
2. Clang & LLVM: Many people know Clang as a C++ compiler, an alternative to GCC that is highly compatible with it. Clang is actually more than that. Clang is a compiler front end, and the abstract syntax tree (AST) it builds is also available to other programs through a library. A compiler front end is the first half of a compiler: it tokenizes the source code and then parses the tokens into a traversable syntax tree. Clang then lowers this AST into LLVM's intermediate representation (LLVM IR) and hands it to the back end of the compiler. LLVM, the back end, optimises the IR and turns it into machine instructions; this is where most compilation optimisations happen.
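To make the first front-end step concrete, here is a minimal sketch that asks libclang for the raw token stream of a file, before any parsing happens. It assumes the `clang` Python package and a matching libclang shared library are installed; `dump_tokens` and `format_tokens` are hypothetical helper names, and the `-std=c++11` flag is only an assumed default.

```python
from typing import Iterable, List, Sequence


def format_tokens(tokens: Iterable) -> List[str]:
    """Render each token as 'KIND spelling', e.g. 'KEYWORD class'.

    Accepts clang.cindex.Token objects, or anything with `.kind.name`
    and `.spelling` attributes.
    """
    return [f"{tok.kind.name} {tok.spelling}" for tok in tokens]


def dump_tokens(path: str, args: Sequence[str] = ("-x", "c++", "-std=c++11")) -> List[str]:
    """Tokenize `path` with libclang and return the formatted token list."""
    # Imported lazily so format_tokens() can be used without libclang installed.
    from clang.cindex import Index

    tu = Index.create().parse(path, args=list(args))
    # Lex only the main file's source range; headers named in #include
    # lines are not expanded.
    return format_tokens(tu.get_tokens(extent=tu.cursor.extent))
```

For a file containing `class foo { ... };`, the list starts with entries such as `KEYWORD class` and `IDENTIFIER foo`; the parser then builds the syntax tree from exactly this stream.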
In this post, the example project focuses on working with the abstract syntax tree. You can read more about abstract syntax trees on Wikipedia.
To be more specific, here is a sample “foo” class:
And here is the abstract syntax tree that Clang produces for it:
--CLASS_DECL
---CXX_ACCESS_SPEC_DECL
---FIELD_DECL
---CXX_ACCESS_SPEC_DECL
---FIELD_DECL
---CXX_METHOD
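Translation unit: As defined by the C++ standard, a source file together with all the headers and source files it includes (minus lines skipped by conditional preprocessing) is a translation unit, the basic unit of compilation. In this example, foo.cpp is the translation unit.
CLASS_DECL: Class declaration
CXX_ACCESS_SPEC_DECL: Access specifier (private, public or protected)
FIELD_DECL: A data member (member variable) of the class
CXX_METHOD: A member function (method) of the class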
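In a real project, the translation unit also contains everything pulled in by `#include`, so its top level can hold thousands of nodes that come from headers. Here is a minimal sketch for picking out only the class definitions written in the main file, assuming `clang.cindex` cursors and that `main_file` is the path that was passed to `Index.parse` (usually available as `tu.spelling`); `iter_class_decls` is a hypothetical name.

```python
from typing import Iterator


def iter_class_decls(cursor, main_file: str) -> Iterator:
    """Yield class/struct definitions located in `main_file`.

    `cursor` is a clang.cindex.Cursor, usually tu.cursor. Forward
    declarations are skipped because is_definition() is False for them.
    """
    for child in cursor.get_children():
        loc_file = child.location.file
        # Skip nodes without a file and nodes that come from #include'd headers.
        if loc_file is None or loc_file.name != main_file:
            continue
        kind = child.kind.name
        if kind in ("CLASS_DECL", "STRUCT_DECL") and child.is_definition():
            yield child
        elif kind == "NAMESPACE":
            # Classes inside a namespace sit one level deeper in the AST.
            yield from iter_class_decls(child, main_file)
```

Typical use: `for c in iter_class_decls(tu.cursor, tu.spelling): print(c.spelling)`.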
3. The current ecosystem: Because Clang is easy to use as a library, many utility tools have been built around it. libclang is Clang's stable C API; in this post, I will be using it through its Python binding. There are also tools such as clang-tidy (static analysis), clang-include-fixer, clang-format (built on the LibFormat library) to format C++ source according to your project's coding style, Cling, a command-line (interactive) C++ interpreter, and more.
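Getting the Python binding to parse a file usually takes two details: the compiler flags (libclang accepts the same flags as a clang++ command line) and, on some systems, the location of the libclang shared library. A minimal sketch under those assumptions; `build_args` and `parse` are hypothetical helpers, and the library path in the comment is only an example.

```python
from typing import List, Optional, Sequence


def build_args(std: str = "c++11",
               include_dirs: Sequence[str] = (),
               defines: Sequence[str] = ()) -> List[str]:
    """Compiler flags for libclang, written as on a clang++ command line."""
    args = ["-x", "c++", f"-std={std}"]  # force C++ mode, also for .h files
    args += [f"-I{d}" for d in include_dirs]
    args += [f"-D{d}" for d in defines]
    return args


def parse(path: str, args: Sequence[str], library_file: Optional[str] = None):
    """Parse `path` into a clang.cindex.TranslationUnit and report errors."""
    # Lazy import: only this function needs the clang package and libclang.
    from clang.cindex import Config, Diagnostic, Index

    if library_file:
        # Must run before libclang is first used, e.g.
        # "/usr/lib/llvm-14/lib/libclang.so" (example path only).
        Config.set_library_file(library_file)
    tu = Index.create().parse(path, args=list(args))
    for diag in tu.diagnostics:
        # Errors usually mean missing include paths or flags; the AST may be incomplete.
        if diag.severity >= Diagnostic.Error:
            print(f"{path}:{diag.location.line}: {diag.spelling}")
    return tu
```

Checking diagnostics early matters for tooling: libclang still returns a tree when headers are missing, but the affected declarations can silently disappear from it.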
On the back-end side, LLVM, which turns the intermediate representation produced by Clang into machine code, also has many interesting projects built on top of it, such as:
NVCC: NVIDIA's CUDA compiler. Its device-code generator (NVVM) is based on LLVM, which lets you write CUDA C++ and produce GPU assembly (PTX).
MapD: A database that runs SQL queries on GPUs. According to this article, it uses LLVM to compile queries into GPU code, which greatly speeds up query processing: https://devblogs.nvidia.com/parallelforall/mapd-massive-throughput-database-queries-llvm-gpus/
Emscripten: An open-source toolchain, originally developed at Mozilla, that uses LLVM to compile C++ into asm.js (and nowadays WebAssembly) so that it can run in existing browsers: https://github.com/kripken/emscripten
4. First steps with Clang: First, I will show the simplest libclang code that recursively traverses foo.cpp from above, using libclang's Python binding.
As seen in the example, we start from the translation unit and traverse the syntax tree recursively. When we recurse into a child, we increment the level counter; when the recursive call returns (the stack unwinds), we know we are leaving that child node, so we decrement the level counter. The print function takes the level as an argument so that it can indent each node and visualise the tree structure in a very simple way.
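The traversal described above can be sketched as follows, assuming the `clang.cindex` binding; `dump` is a hypothetical name. Instead of incrementing and decrementing a shared counter, it passes `level + 1` to the recursive call, which has the same effect because the caller's `level` is unchanged when the call returns.

```python
from typing import List, Optional


def dump(cursor, level: int = 0, out: Optional[List[str]] = None) -> List[str]:
    """Visit `cursor` recursively, producing one line per node.

    Each line is `level` dashes followed by the node kind and its spelling.
    Passing level + 1 down the recursion means the caller's level is
    automatically back to its old value once the child call returns.
    """
    if out is None:
        out = []
    label = cursor.kind.name
    if cursor.spelling:
        label += " " + cursor.spelling
    out.append("-" * level + label)
    for child in cursor.get_children():
        dump(child, level + 1, out)
    return out


# Usage with libclang (requires the clang package and libclang):
#   from clang.cindex import Index
#   tu = Index.create().parse("foo.cpp", args=["-x", "c++", "-std=c++11"])
#   print("\n".join(dump(tu.cursor)))
```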
5. Reflection tool: Reflection is the ability to access your code's metadata at runtime. In other languages such as C#, it is provided by the framework, whereas C++ has no built-in equivalent. In C++, common approaches are template techniques such as SFINAE (greatly extended in C++11), or an extra pre-build step that scans the source files and generates reflection data, as Qt does with its Meta-Object Compiler (moc). This example is closer to Qt's approach. Its biggest advantage is that you do not need to change your existing source code at all (Qt, by contrast, requires macros such as Q_OBJECT in each class). Basically, we will traverse every node in the syntax tree and record the data we see:
Traverse each node in the tree recursively.
When we see a class declaration, we create a record for the class.
When we see an access specifier, we set the current access level for the members that follow it (before any specifier, the default is private for a class and public for a struct).
When we see a member variable or method declaration, we create a record associated with the current class and access specifier.
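A minimal sketch of these four steps with `clang.cindex`, assuming the tool only needs each member's name, kind, type spelling and access level. `Member`, `ClassInfo` and `collect` are hypothetical names, and nested classes are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

RECORD_KINDS = ("CLASS_DECL", "STRUCT_DECL")


@dataclass
class Member:
    name: str
    kind: str       # "field" or "method"
    type_name: str  # e.g. "int" or "void (int)"
    access: str     # "public", "protected" or "private"


@dataclass
class ClassInfo:
    name: str
    members: List[Member] = field(default_factory=list)


def collect(cursor, classes: Optional[Dict[str, ClassInfo]] = None) -> Dict[str, ClassInfo]:
    """Record every class/struct definition below `cursor` with its members."""
    if classes is None:
        classes = {}
    for child in cursor.get_children():
        kind = child.kind.name
        if kind in RECORD_KINDS and child.is_definition():
            info = classes.setdefault(child.spelling, ClassInfo(child.spelling))
            # Access level before the first specifier: private for class, public for struct.
            access = "private" if kind == "CLASS_DECL" else "public"
            for member in child.get_children():
                mkind = member.kind.name
                if mkind == "CXX_ACCESS_SPEC_DECL":
                    # AccessSpecifier names are PUBLIC / PROTECTED / PRIVATE.
                    access = member.access_specifier.name.lower()
                elif mkind in ("FIELD_DECL", "CXX_METHOD"):
                    info.members.append(Member(
                        name=member.spelling,
                        kind="field" if mkind == "FIELD_DECL" else "method",
                        type_name=member.type.spelling,
                        access=access,
                    ))
        else:
            # Namespaces, functions, etc.: keep looking for classes further down.
            collect(child, classes)
    return classes
```

Tracking the last `CXX_ACCESS_SPEC_DECL` mirrors the steps above; libclang can also report a member's access directly through `member.access_specifier`, which avoids the bookkeeping.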
So the tool works in two phases: it first collects the metadata and then generates C++ code from it. Here is the source of the simple reflection tool:
And below you can see the generated output for the foo class:
To use it:
auto ret = Reflection::GetClassNames();
auto ret2 = Reflection::GetMembers("Foo");
6. What more can be done: There is a lot more you can do with Clang, and static analysis tools are the best example. Other uses include generating serialization and reflection code, enforcing your team's or company's coding standards, fixing includes, and much more. Many startups build products on top of the Clang parser. There is also LibTooling, a C++ library for writing standalone tools based on Clang.
As for me, I am working on a dynamic execution analysis tool that also uses Clang's AST output to find all possible call flows before the dynamic analysis starts. Below you can see a screenshot of a SQLite database with call-flow information for the Doom source code.
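Extracting call flows from the AST can be sketched as below. This is not the tool described above, just a minimal illustration assuming `clang.cindex` cursors: each function or method definition becomes the current caller, every `CALL_EXPR` under it with a resolvable target (`cursor.referenced`) becomes a (caller, callee) edge, and the edges go into a SQLite table. `call_edges`, `store_edges` and the `calls` table are hypothetical names; callers are identified by their plain spelling, so overloads and same-named methods of different classes are merged.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple


def call_edges(cursor, caller: Optional[str] = None) -> List[Tuple[str, str]]:
    """Collect (caller, callee) pairs by walking function bodies.

    A FUNCTION_DECL/CXX_METHOD definition becomes the current caller; every
    CALL_EXPR below it whose target is known (`cursor.referenced`) is an edge.
    """
    edges = []
    for child in cursor.get_children():
        kind = child.kind.name
        current = caller
        if kind in ("FUNCTION_DECL", "CXX_METHOD") and child.is_definition():
            current = child.spelling
        elif kind == "CALL_EXPR" and caller and child.referenced is not None:
            edges.append((caller, child.referenced.spelling))
        # Calls can be nested (e.g. inside arguments), so always recurse.
        edges.extend(call_edges(child, current))
    return edges


def store_edges(conn: sqlite3.Connection, edges: Iterable[Tuple[str, str]]) -> None:
    """Store unique edges in a `calls(caller, callee)` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS calls "
                 "(caller TEXT, callee TEXT, UNIQUE(caller, callee))")
    conn.executemany("INSERT OR IGNORE INTO calls VALUES (?, ?)", edges)
    conn.commit()
```

With the edges in a table, questions such as "which functions can reach X" become recursive SQL queries instead of repeated AST walks.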
7. Links:
Clang official page: http://clang.llvm.org/
Detailed information about the Clang AST: https://www.youtube.com/watch?v=VqCkCDFLSsc
Generating serialization code: http://llvm.org/devmtg/2012-04-12/Slides/Wayne_Palmer.pdf
A more complete reflection example: http://austinbrunkhorst.com/blog/category/reflection/
An interesting reflection project that gets its data from PDB files: http://msinilo.pl/blog2/post/p707/