octave and LLVM google summer of code project
David Bateman
David.Bateman at motorola.com
Tue Apr 8 10:17:02 CDT 2008
Duncan Sands wrote:
> Hi, over at the LLVM project (http://llvm.org) we've just received
> a google summer of code project proposal from Max Rietmann who wants
> to use LLVM's just-in-time optimizers and code generation to speed
> up octave's interpreter. I've pasted his application at the end of
> this email. While some of us have used octave, none of the LLVM
> people know anything about its internals so it is hard for us to tell
> if this project makes sense or not. Is this something octave would
> be interested in? Is it do-able in 10 weeks? Finally, if the project
> looks good it would be helpful to have an octave expert as a mentor
> as well as someone from LLVM - any suggestions?
>
> Best wishes,
>
> Duncan.
>
>
This not only makes sense but is probably one of the major stumbling
blocks for widespread acceptance of Octave relative to Matlab (ie Matlab
has a JIT). Whether its doable in 10 weeks, well he'd have to be good to
have a chance, but even a good start that supplies a template of how to
modify the rest of the code would be a major help.. I'm sure that any
help that is requested on the Octave internals would be supplied rapidly
on the maintainers at octave.org, though as this will necessarily impact
the parser, its John Eaton who has the best understanding of how that
works..
D.
> PS: The project proposal:
>
> I will write an Octave to LLVM bytecode compiler. I plan to write it in Python or C++, both of which I know well from previous personal projects. Octave is an open-source numerical programming toolkit that implements similar functionality to that of Mathworks Matlab. Despite recent improvements, in many cases Octave runs much slower than Matlab.
>
> Octave developers have discussed how to remedy this and I propose to use LLVM as the interpreter and use it to compile down to native code when desired. Using LLVMs optimizations, Octave could become a great numerical method library, speeding up development for those in the computational sciences, who often write their final code in C or Fortran. Octave's Vector operations make it easy to code numerical methods and by using LLVM to optimize these, it will be much more competitive than it currently is - Perhaps it could even surpass Matlab in speed.
>
>
> Detailed Description
> GSoC 2008 Application
> Mentoring Organization: LLVM
>
> Max Rietmann
>
> Background: I studied electrical and computer engineering at Cornell university as an undergraduate, but upon graduating I realized that my talents and interests lie in a more mathematical realm. I am currently one semester into a masters in applied mathematics at San Diego State University (SDSU). My specific program has a concentration on (nonlinear) Dynamical Systems, which can range from modeling cellular activity to finding chaotic orbits around galaxies. I'm currently interested in chaotic behavior and where it occurs in a phase space.
> Much of my work is done with the help some sort of mathematical toolkit like Mathematica or Matlab. For much of my numerical calculations I use Matlab, but only out of impatience. Before I was able to obtain a license to Matlab, I used Octave, which is an open source implementation of most of Matlab's functionality. Even after getting a Matlab license, I mostly used Octave because It loads into the command line faster than Matlab can load its gui. Unfortunately I have found Octave to be noticeably slower than Matlab (up to an order of magnitude), which has caused me to use Matlab exclusively.
> In reading the Octave mailing list they spoke of trying to re-implement the Octave runtime on something like LLVM or the JVM so as to take advantage of better optimizations. For example, when Matlab encounters loops, it attempts to use JIT techniques to vectorize the loop speeding up the code tremendously. The Octave developers would love to be able to do this, but they are unsure of how to progress forward. They are not a mentoring organization this year, but LLVM is. I have read through the tutorial on making a LLVM compiler for a simple language called Kaleidoscope that compiles down to LLVM bytecode. After reading the tutorial, I feel that I would be capable of implementing most of the Octave functionality in LLVM.
>
> Project Description and Outline: I will write an Octave to LLVM bytecode compiler. I plan to write it in Python or C++, both of which I know well from previous personal projects. My preference language preference is Python, but since one of the tutorials is using C++ and since Octave is written using mostly C++, I might use that to be consistent.
> My Knowledge of the Octave backend is such: It parses the language and generates its own internal representation. The interpreter then runs through the representation and makes calls to various numerical libraries when appropriate. For example, it uses libraries lapack and fftpack for matrix and fft operations, which are both written in fortran. From discussions on the mailing list, its clear that Octave's speed problems do not lie in its lack of ability in numerical operations, but instead are due to slow interpretation of the base language. I specifically notice the biggest speed issues when looping. In some cases, JIT compilation could be used to vectorize the looping code. But this still does not account for its relatively slow looping over non-vectorizeable code.
> Because it uses a large number of libraries to execute its numerical subroutines, I think that LLVM makes a wonderful choice as a bytecode solution and runtime. From my readings, calling to external libraries from LLVM is relatively straightforward and most importantly fast.
> Despite my excitement for this project, I am less knowledgeable about compiler-design as I want to be. There are a lot of details I do not yet understand or even know about, but I will try to outline my plan in as much detail as I can.
> The first big step will be to write a parser for the Octave code. From my own source-code digging, it seems that Octave uses a parser-generator to parse its code and generate an op-code style language for the interpreter. It has a large library of op-codes (66 in the OPERATORS folder). Many of the op-codes are matrix based and converting matrix operations to LLVM bytecode will be a significant challenge. Because Octave has its own op-code language I imagine it will be difficult to use much existing code to make a compiler for LLVM. My parser will have to interpret the Matrix operations from Octave and appropriately break them apart to either call the appropriate numerics toolbox or generate the functional LLVM code.
> So instead of overwhelming myself talking about Matrix operations I will describe the easy part. First, I will implement the basic functionality of the language leaving out vectors and matrices to learn more about LLVM and writing a parser. I've written a parser tree in Objective-C for a math program I started, so I already have some ideas about how that will work. I was planning on writing my own parser, but I might instead use a parser generator to generate the data-object-tree (I think they call it an AST). From there, I will need to build a tree to LLVM bytecode generator. For the basic operations without vectors and matrices this should be straightforward. Once this is all implemented and working I will begin with adding vectors. Mathematically, vectors are just a type of Matrix, but I am unsure if generalizing this much will make the fastest code. I imagine that I will handle vectors separately and optimize them by hand. Next will be matrix operations. Since octave u!
> ses a matrix library to "outsource" its matrix operations, I would also try to do something similar. I will try to use the same libraries Octave uses, but I do not know much about calling libraries from LLVM, that will take research and advice from the LLVM devs.
> I think the biggest thing for me to keep in mind is looking at the project incrementally. First start simple with the parser and basic code like:
> * simple assignment: x=2;
> * all the various arithmetic ops: +,-,*,/
> * control structures (if,then,else)
> * looping (for,while)
> For loops use vectors, so I would have to implement a basic vector to do this.
> * Function definition and calling
> With these built, one would be able to build most programs sans vector/matrix operations.
> Now I add:
> *vectors as a variable type and all that comes with that:
> * Vector addition/multiplication x=v1+v2, x=v1*v2 and x=v1.*v2
> (the .* multiplication means to multiply corresponding entries)
> The final step will be the Matrix operations, which are an entirely new bag of tricks.
>
> Overall Goals: My main goal for the end of the summer is to have the Octave language running on LLVM. Improving Octave's speed is the motivation for the project, but that can always come later. I hope to lay the ground work for the Octave team to pick up my LLVM compiler, develop it further and migrate the current runtime from its C++ roots to LLVM. I see Octave's speed as a major drawback and helping to fix this would not only help me in my academic pursuits, it might make Octave more mainstream, which will save buying (or pirating in many cases) Matlab licenses by those poor starving engineering students of the world.
>
> Can I do this?: I have quite a lot of experience coding and I've worked in assembly before. I also know the octave (and matlab) code very well because of my coursework. I also have the summer free and would use the stipend from Google as my summer job. I would spend the summer coding with breaks to go surfing and climbing. I don't have a specific mentor picked out, but I imagine that the existing mentors from LLVM would be extremely helpful and that the Octave developers would be extremely excited to have someone like me lay the groundwork for restructuring the Octave runtime. As I said, they've already mentioned this on the Octave mailing list, so I'm sure there are Octave developers who would love to help when I need advice and to help test. The tutorials on the LLVM website should get me up and running quickly and introduce me to the more difficult concepts that I am still learning about.
> One positive part of this project I see is that its relatively clear way forward. The Octave language is well documented and my work at the beginning will just be to implement each part of the language in LLVM. I start first with arithmetic operators (+,-,*,/) and move on to variables, functions, loops, and then finally to vectors and matrices. Because the LLVM language makes writing a compiler so much easier, I really feel like I can at least get the basics working relatively quickly.
>
>
>
--
David Bateman David.Bateman at motorola.com
Motorola Labs - Paris +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE +33 1 69 35 77 01 (Fax)
The information contained in this communication has been classified as:
[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary
More information about the Octave-maintainers
mailing list