GCC and static analysis (lwn.net)
75 points by emillon on May 2, 2012 | 16 comments


I really need to renew my LWN subscription; their articles are outstanding. I wish I could write as clearly on deeply technical subjects.


Is it possible to just install the static analyzer, without having any actual compiler like Clang?


Static analysis is a method, not a goal. Static analysis in compilers is done for the purposes of optimization, so it is meaningless without a compiler. There are commercial and open source products that do static analysis to find logical errors and security vulnerabilities in your code like Klocwork, Coverity, or Gimpel Lint. There are formal verifiers. Decompilers, indenters and obfuscators can be considered sort of static analysis tools too, although on a shallower level.


Oh, I see. The article specifically mentioned thread-safety annotations, and I was wondering if that part could be run without actually compiling the code.


I think there's some terminology confusion. You probably think of the actual generation of a binary executable as "compiling the code," but that's not really accurate. Rather, the entire process of going from source code to a machine executable is "compiling." There are many steps in between, many phases of the compiler, and even if you stop at any one of those steps, you've still "compiled" the code. I know that the compilation phases of gcc roughly go like this:

1. Lexing: http://en.wikipedia.org/wiki/Lexical_analysis

2. Parsing: http://en.wikipedia.org/wiki/Parsing This is where you would do static analysis on the high-level language. [edited because zeugma's comment made me realize it was ambiguous which kind of static analysis I meant]

3. High-level (C, C++, etc.) source is transformed to a three-address intermediate representation called Gimple: http://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html One reason this phase exists is that gcc supports many languages, so all of those high-level languages are translated to a single representation. That way they can all share the same optimization and code generation back end. "Three-address" languages are kind of like a simplified version of assembly, but they're completely machine agnostic.

4. Various optimization passes are performed on the architecture-agnostic intermediate representation (which, again, in gcc is Gimple). These are optimizations that have nothing to do with the target machine. Many modern compilers transform the code into SSA form (http://en.wikipedia.org/wiki/Static_single_assignment_form) to enable other optimizations.

5. The optimized intermediate representation gets transformed into the assembly language for the target architecture. Architecture specific optimizations will typically happen at this phase.

6. The assembly language for the particular architecture is passed to an assembler for that architecture, which produces an object file; a linker then combines object files into the actual binary executable. gcc calls as (and then the linker, ld) for these steps, which means that what you probably think of as "compiling" isn't even done by gcc itself!
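
To make step 3 a bit more concrete, here is a rough, hand-written sketch of what "three-address" form means, written as plain C. The function names and the temporaries t1..t3 are made up for illustration; this is not gcc's actual Gimple output, just the same idea: every statement has at most one operator, with two operands and one result.

```c
#include <assert.h>

/* The expression as a programmer would normally write it. */
int area_plus_offset(int a, int b, int c)
{
    return a * b + c * (a - b);
}

/* Hand-lowered version of the same expression. Each statement uses at
 * most one operator and at most three "addresses" (two operands and one
 * result). The names t1..t3 are hypothetical; a real compiler picks its
 * own temporaries. */
int area_plus_offset_3addr(int a, int b, int c)
{
    int t1 = a * b;
    int t2 = a - b;
    int t3 = c * t2;
    int result = t1 + t3;
    return result;
}

/* Sanity check: both forms must compute the same value. */
void check_lowering(void)
{
    assert(area_plus_offset(2, 3, 4) == area_plus_offset_3addr(2, 3, 4));
}
```

If you want to see what gcc itself produces rather than my approximation, the -fdump-tree-gimple option asks it to write its Gimple form to a dump file during compilation.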

In theory, you could ask a compiler to stop and produce output at any one of those phases. In practice, I'm not sure how many of these phases you can ask gcc to stop at. I know, for example, that if you pass -S to gcc, it will stop at the end of step 5, and you can see the assembly it produces for the high-level source you give it. I'm not sure about stopping earlier.
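
For what it's worth, here is a small sketch of the stopping points I'm aware of, written as a comment block on top of a toy C file. The file name stages.c and the function are made up; the options are standard gcc ones, but treat the one-line descriptions as my summary rather than the manual's wording.

```c
/* stages.c (hypothetical file name)
 *
 * Some ways to stop gcc early on this file:
 *
 *   gcc -E stages.c                     preprocess only, print the result
 *   gcc -fsyntax-only -Wall stages.c    check the code, write no output file
 *   gcc -S stages.c                     stop after generating assembly (stages.s)
 *   gcc -c stages.c                     stop after assembling (stages.o), no link
 *   gcc -c -fdump-tree-gimple stages.c  compile, and also dump the Gimple form
 */
#define SCALE 10

/* Toy function so that each stage has something to show: the macro
 * disappears after -E, and the multiply shows up in the -S output. */
int scale(int x)
{
    return x * SCALE;
}
```

Of these, -fsyntax-only is probably the closest to "run the checks but don't produce a binary."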

Now. Back to your actual questions. No, you can't install the "static analyzer," because it is deeply integrated into the compiler - it's part of phase 2 from above. You may be able to ask gcc to stop after it produces Gimple, which would allow you to take advantage of its static analysis. But, as this article mentions, that means you're married to the Gimple format for doing all further work with the program.


If you do static analysis at step 2 (i.e., on the AST), then you only need the front end of your compiler.

Clang/LLVM is much more modular and library oriented (which is why the Google engineer wanted to switch to it). You can just link to the needed front end and do the static analysis without generating the LLVM bitcode.

So yes, you could install a static-analyzer without the whole compiler.


> So yes, you could install a static-analyzer without the whole compiler.

You can install a static analyzer without the whole compiler, but not gcc's static analyzer. I assumed "the static analyzer" that sp332 was talking about was specifically gcc's, not any generic static analyzer.


OK thanks for the clarification. I just meant, run the compiler as far as complaining about thread safety, without generating a binary. (That way, I could modify my Makefile to run this LLVM static analysis before running GCC like normal to actually make a binary.)



To the person who sent this, could you please give me a subscriber link to https://lwn.net/Articles/494993/ ?


LWN articles are opened up ~1 week after they're published. So you can wait a couple days until then. They also have a mailing list where they'll send out an email with links once an article becomes available.


Sure, you can get it through here: https://lwn.net/subscribe/Info


Jonathan Corbet sometimes posts LWN articles on reddit; that is how I found this one.


Corbet also occasionally posts subscriber links here from his own account ( http://news.ycombinator.com/user?id=corbet ).


Thanks


Don't mind the note in the article:

> Did you like this article? Please accept our trial subscription offer to be able to see more content like it and to participate in the discussion.



