RooJSolutions

D sockets, and rooscript, roojs updates

2008-08-10 23:16:00

Article originally from rooJSolutions blog

A few friends complained that they had added me to their RSS feeds and I then never actually posted anything, so I thought I'd post a little something about some of the recent hacks I've been up to.

See the extended version for details on

Non-blocking socketstreams in D
Unix Sockets in D
Rooscript updates - System, and GDC cached building.
RooJS updates - examples in the manual

Walter showing off Digitalmars D cool features

2007-02-25 10:07:01

Normally I don't just post a link, but this one is on the 'have to see' list. Walter Bright, the author of Digitalmars D, gave a talk to a C++ group about some of the cool features of D. Reserve an hour to listen to this one.

Video of Walter Bright at the C++ group.
Slides of Walter Bright's talk at the C++ group
Event details for reference

Some of the features, like scope(exit), make exceptions actually usable. Others, like templates, I'm still trying to get my head round, let alone decide how they fit into writing maintainable code....
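To illustrate the scope(exit) point, here is a minimal sketch in current D (D2). The `events` array and the `work` function are made up for the example; they are not taken from the talk. It shows that cleanup registered with scope(exit) runs whether the function returns normally or throws, and that scope(failure) runs only on the exception path.

```d
// Hypothetical log of what happened, in order.
string[] events;

void work(bool fail)
{
    events ~= "open";
    // Runs when this scope is left, by return or by exception.
    scope(exit) events ~= "close";
    // Runs only when this scope is left because of an exception.
    scope(failure) events ~= "rollback";

    if (fail)
        throw new Exception("something went wrong");
    events ~= "done";
}

unittest
{
    events = [];
    work(false);
    assert(events == ["open", "done", "close"]);

    events = [];
    try { work(true); } catch (Exception e) { events ~= "caught"; }
    // Scope guards run in reverse order of registration.
    assert(events == ["open", "rollback", "close", "caught"]);
}
```

The cleanup sits next to the code that needs it, rather than in a try/finally block further down, which is what makes the exception path easier to follow. The unittest block can be run with something like `dmd -unittest -main -run example.d`.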

Backtracing segfaults in a daemon in digitalmars D

2007-01-31 17:41:05

One of the projects I'm working on is a SyncML server, written from scratch in D. It's currently in testing mode, and we found that the server was mysteriously crashing. Unfortunately, since it's threaded and forked as a daemon, we didn't really want to run it under GDB (and GDB segfaults on startup anyway), so we were in a bit of a quagmire about how to find the bug.

So after a bit of searching through code.google.com I came across the idea of catching the SIGSEGV signal and calling backtrace and backtrace_symbols.

This little trick can produce output that looks something like

/path/to/application [0xAAAAAA] << address of code
/path/to/application [0xAAAAAA] << address of code
/path/to/application [0xAAAAAA] << address of code
/path/to/application [0xAAAAAA] << address of code

which initially seemed a bit cryptic, but by putting it together with
addr2line can result in some great debugging information.

This is my little backtrace logger for the daemon.

static void print_trace()
{
	// Capture at most 10 stack frames and turn them into printable strings.
	void *btarray[10];
	size_t size;
	char **strings;
	size_t i;

	size = backtrace(cast(void**)btarray, 10);
	strings = backtrace_symbols(cast(void**)btarray, size);

	// Start a fresh log; ">" truncates anything left by an earlier crash.
	std.process.system("/bin/echo '----BACKTRACE------' " ~ 
		"> /var/log/myproject/backtrace.log");

	for(i = 0; i < size; i++) {
		// Each line looks like "/path/to/application [0xAAAAAA]".
		char[] line = std.string.toString(strings[i]);
		char[][] bits = std.string.split(line, "[");
		char[] left = std.string.strip(bits[0]);
		if (!left.length) {
			continue;
		}
		// skip lines with ( in them...
		if (std.string.find(left,"(") > -1) {
			continue;
		}
		// Drop the leading "0x" and the trailing "]" to leave the bare address.
		char[] addr = bits[1][2..length-1];

		std.process.system("/bin/echo '----" ~ addr 
			~ "------' >> /var/log/myproject/backtrace.log");
		// addr2line -f prints the function name, then file:line,
		// appended to the same log as the address markers.
		std.process.system("/usr/bin/addr2line -f -e " ~ 
			left ~ " " ~ addr ~ " >> /var/log/myproject/backtrace.log");
	}
	// backtrace_symbols() mallocs one block; a single free() releases it.
	free(strings);
}

Of course, you need to use a few C externs to make this work:

extern (C) {
	int backtrace(void **__array, int __size);
	char** backtrace_symbols(void **__array, int __size);
	pid_t getpid();
	sighandler_t signal(int signum, sighandler_t handler);
 	void  sigsegv(int sig)
	{
		// reset the handler.
		signal(SIGSEGV, cast(sighandler_t) 0);
		print_trace();
		// really die
		exit(SIGSEGV);
	}
	
}

And to add it to your application, stick this in main() somewhere:

signal(SIGSEGV, &sigsegv);

Testing it is quite simple, just do this in D:

void  testSegfault()
{
	
	class SegTest {
		void test() {}
	}
	SegTest a;
	a.test();
}

Now looking at the debug file, you can work out where it failed...

----BACKTRACE------
----805e971------ (THIS IS MY OUTPUT CODE)
_D9myproject7cmdLine6daemon11print_traceFZv
init.c/src/myproject/cmdLine/daemon.d:306
----805e46b------ 
sigsegv
init.c/src/myproject/cmdLine/daemon.d:121
----804db18------ (AND NOW FOR THE LOCATION OF THE SEGFAULT)
_D9myprojectfort7manager7manager7runOptsFZAa
init.c/src/myproject/manager.d:50
----805617a------
_D9myproject10webRequest10webRequest5parseFAaAaKAaZAa
init.c/src/myproject/webRequest.d:89
----8050c3e------
_D9pmyproject14myprojectThread14myprojectThread18dealWithWebRequestFAaAaZv
init.c/src/myproject/myprojectThread.d:331
----80503d0------
_D9myproject14myprojectThread14myprojectThread3runFZi
init.c/src/myproject/myprojectThread.d:111
----8076260------
_D3std6thread6Thread11threadstartUPvZPv
??:0
----a7fd10bd------
??

I'm sure with some more work, you could get it to log to syslog...
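As a rough idea of what the syslog variant might look like, here is a sketch in current D (D2) using the POSIX bindings that ship with druntime (`core.sys.posix.syslog`). The `formatFrame` and `logFrames` helpers and the "myproject" ident are assumptions for illustration, not part of the original daemon. Note also that syslog(), like system(), is not async-signal-safe, so calling it from a SIGSEGV handler is a debugging convenience rather than something guaranteed to work.

```d
import core.sys.posix.syslog;
import std.format : format;
import std.string : toStringz;

/// Hypothetical helper: one printable line per stack frame.
string formatFrame(size_t index, string symbol)
{
    return format("#%d %s", index, symbol);
}

/// Hypothetical helper: send already-resolved frames to syslog.
void logFrames(string[] frames)
{
    openlog("myproject", LOG_PID, LOG_DAEMON);
    scope(exit) closelog();
    foreach (i, frame; frames)
    {
        // Pass the text through "%s" so a '%' in a symbol name
        // cannot be treated as a format directive.
        syslog(LOG_ERR, "%s", formatFrame(i, frame).toStringz);
    }
}

unittest
{
    assert(formatFrame(0, "sigsegv") == "#0 sigsegv");
    assert(formatFrame(2, "/path/to/application [0x804db18]")
        == "#2 /path/to/application [0x804db18]");
}
```

The frames would still need to be resolved with addr2line first, as in the logger above, if file and line information is wanted in the log.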

Autocompletion in leds for Digitalmars D

2006-12-12 10:16:00


I've been spending far too much time looking at the autocompletion in leds for D. The justification, while growing weaker the more time I spend on it, remains the same: autocompletion and AutoHelp save a huge amount of time when writing code.

The PHP parser in leds is working quite well, is organically written, and has room to grow. Its next major step is guessing object types by matching method calls on unknown objects. But since I now spend about the same amount of time coding in D as I do in PHP, I decided to look at the D parser and push that along.

Antonio had written the current parser, which is in the dantfw project. It aimed to parse all C-like languages: C, D, Java, C# etc. This meant that rather than following the D specification, the tokenizer and parser were quite generic, which led to two issues.

When I added autocompletion as you type, rather than on demand (ctrl space), I noticed that it was attempting to autocomplete when I was typing "strings" and comments.
Quite often it appeared to be missing some key variables or methods that it should have known about.

The first issue I resolved by rewriting the tokenizer (following the D specification closely) and re-tokenizing the document as you type (if it thinks your scope may have changed), then deducing the scope being input.
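The idea of deducing whether the cursor sits in code, a string or a comment can be sketched roughly as follows. This is not the leds tokenizer; it is a simplified scanner in current D (D2), with made-up names (`Region`, `regionAt`), and it only knows about double-quoted strings, // comments and /* */ comments. The real D lexical rules (nested /+ +/ comments, wysiwyg strings, character literals) are deliberately left out.

```d
enum Region { code, str, lineComment, blockComment }

/// Scan `src` from the start and report what kind of region
/// the character offset `pos` falls in.
Region regionAt(string src, size_t pos)
{
    auto state = Region.code;
    size_t i = 0;
    while (i < pos && i < src.length)
    {
        char c = src[i];
        char next = (i + 1 < src.length) ? src[i + 1] : '\0';
        final switch (state)
        {
        case Region.code:
            if (c == '"') state = Region.str;
            else if (c == '/' && next == '/') { state = Region.lineComment; i++; }
            else if (c == '/' && next == '*') { state = Region.blockComment; i++; }
            break;
        case Region.str:
            if (c == '\\') i++;                 // skip the escaped character
            else if (c == '"') state = Region.code;
            break;
        case Region.lineComment:
            if (c == '\n') state = Region.code;
            break;
        case Region.blockComment:
            if (c == '*' && next == '/') { state = Region.code; i++; }
            break;
        }
        i++;
    }
    return state;
}

unittest
{
    string src = `int a = "str"; // note`;
    assert(regionAt(src, 4) == Region.code);          // at the 'a'
    assert(regionAt(src, 10) == Region.str);          // inside "str"
    assert(regionAt(src, 14) == Region.code);         // after the ';'
    assert(regionAt(src, 19) == Region.lineComment);  // inside the comment
    assert(regionAt("a /* b */ c", 5) == Region.blockComment);
    assert(regionAt("a /* b */ c", 10) == Region.code);
}
```

An editor would only offer completions when the result is `Region.code`. Rescanning from the start on every keystroke is the simplest approach; caching the state at line starts would be one way to make it cheaper.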

The second, I began to conclude, was more related to the generic nature of the parser. It was making many assumptions about the language that turned out to be incorrect in D's context. The only, rather drastic, solution was to rewrite the parser....

So on rather a quiet day I started that thankless task. Using Antonio's original design, in which object constructors parse and eat their own tokens, I set about writing loads of switch/case combos.

My first effort worked reasonably well, but the more I wrote, the more I realized that both the complexity and the use of a flat series of tokens had a number of flaws. The declaration code for parsing methods and variables was quite large and very similar (ending in a lot of redundant, duplicate code). Declaring multiple variables in one line also made the resulting abstract syntax tree objects quite clunky.

At this point I put it on hold for a while, partly to consider alternative approaches, and more to ensure paid work was not falling behind.

When I returned, I had come up with a few ideas

look at dmd for inspiration
check dsource to see if somebody had already done this
consider preparsing tokens into a tree before sending them to the parser.

From studying the dmd parser, I realized that Walter had gone for a parser object in C++ (rather than our constructor parsing) and broken the different scopes into different parsing routines. When the current parsing routine hits a syntax pattern that matches, it calls the parse routine that deals with that scope, which in turn adds a new object to the AST stack. He also appeared to skip all comments, whitespace and EOLs in the token fetching routine (actually merging the comment block into the next token, although it's a touch more complex than that).

From dsource I found codeanalyser. After a bit of hacking to remove what appeared to be the C++ memory stack allocation routines, I finally got it to compile (without crashing dmd), parse, and output what looked a little bit like syntax trees.

The downside to this project was that extracting the detailed information I required (type definitions, method declarations, line numbers etc) was not really feasible, and the tree that was produced by the code included a significant amount of noise, (in that it frequently created tree nodes for failed syntax matches). Along with this, it would require some work to store the token data within the tree that was created.

On the upside, in debugging the compile issues I began to get a better understanding of templates (one of those black magic features of D). I wonder if the introduction to templates in D should basically start with the statement "D does not have preprocessor macros, it has templates", since while templates are considerably more powerful than macros, from a comprehension point of view that is effectively what they are. D decided not to have macros, on what appears to be the sensible view that they obfuscate the code (along with often making it a nightmare to compile). Given that, I consider templates to be on the list of features to be used with extreme caution, as they have a similar obfuscating effect, although perhaps not to the same degree....

Anyway, the last option was to chuck my first draft and build a token tree to pass to the parser, rather than just giving it a series of tokens. Having got about half as far as I did before, I consider this one of those little gems of an idea that makes writing the parser considerably simpler.

Consider the simple statement

void main(char[][] argv) { }

Which would tokenize into the following.

Token.T_void : void
Token.IDENTIFIER main
'('
Token.T_char char;
'['
']'
'['
']'
Token.IDENTIFIER argv
')'
'{'
'}'

Using a post tokenizing tree building routine, it now looks like this, which from the perspective of the parser, is considerably simpler to deal with.

Token.T_void : void
Token.IDENTIFIER main
'('
    Token.T_char char;
    '['
        ']'
    '['
        ']'
    Token.IDENTIFIER argv
    ')'
'{'
    '}'

From a pattern perspective, we are just looking for

BaseType, IDENTIFIER '(' '{' == a method declaration.

(although it's a bit more complex than that in real life!)

The tokens inside a '(', '{' or '[' collapse into that token, so we can either ignore them when they are not needed, or send them as a set to a parsing routine, which doesn't have to keep dealing with nesting or determining the closer.
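The post-tokenizing step described above might look something like this sketch in current D (D2). It is not the leds code: the `Node` class, `buildTree` and `looksLikeMethod` are hypothetical names, tokens are reduced to plain strings, and mismatched brackets are not checked. As in the tree shown earlier, the closing token is stored as the last child of its opener.

```d
/// A token plus the tokens nested inside it (only openers get children).
class Node
{
    string text;
    Node[] children;
    this(string text) { this.text = text; }
}

bool isOpener(string t) { return t == "(" || t == "[" || t == "{"; }
bool isCloser(string t) { return t == ")" || t == "]" || t == "}"; }

/// Fold a flat token list into a tree: everything between an opener
/// and its closer (closer included) becomes a child of the opener.
Node[] buildTree(string[] tokens)
{
    Node[] top;
    Node[] open;   // stack of openers still waiting for their closer
    foreach (t; tokens)
    {
        auto n = new Node(t);
        if (open.length) open[$ - 1].children ~= n;
        else top ~= n;

        if (isOpener(t)) open ~= n;
        else if (isCloser(t) && open.length) open = open[0 .. $ - 1];
    }
    return top;
}

/// Very rough version of the pattern: two leading tokens (standing in
/// for BaseType and IDENTIFIER), then '(' and '{' at the top level.
bool looksLikeMethod(Node[] nodes)
{
    return nodes.length == 4
        && nodes[2].text == "(" && nodes[3].text == "{";
}

unittest
{
    auto top = buildTree(["void", "main", "(", "char", "[", "]", "[", "]",
                          "argv", ")", "{", "}"]);
    assert(top.length == 4);                           // void main ( {
    assert(top[2].children.length == 5);               // char [ [ argv )
    assert(top[2].children[1].children.length == 1);   // '[' holds ']'
    assert(top[3].children[0].text == "}");
    assert(looksLikeMethod(top));
}
```

With the nesting folded away, the top level of a declaration is always a handful of tokens, so the pattern check does not have to count brackets. A real parser would of course also check the token types of the first two entries.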

Anyway, the new parser is slowly ticking away. Once it parses correctly, the next step is to work out how to make the resolver cleaner...

PHP Grammer added to leds, and how to build leds

2006-11-13 14:46:00

I've spent quite a bit of time working on leds, specifically focusing on the autocompletion and help implementation for D and PHP. While D is still ahead in terms of cross-file lookup, autocompletion and display of docbook comments, PHP is beginning to catch up.

The major part of autocompletion for PHP is dealing with the grammar parser (as I blogged before, I had already done a pretty much perfect tokenizer). For this I took a lot of inspiration from the D parser in dante, which used what was, for me anyway, a rather creative method of parsing the language into components.

In principle it breaks the language down into simple parts

files
classes
methods
codeblocks (code in between { and })
statements (something ending in ";", or "while ( ) { }", which is a statement with a codeblock)

The parser basically relies on each class to find all its subcomponents, then lets each subcomponent determine where it starts and ends, relying on the file to store the pointer to the current position in the list of tokens.

The result is a grammar parser that is very simple to write and can be grown organically, unlike the classic grammar parser, which depends on fully documenting every pattern in the language.
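A toy version of that component approach, in current D (D2), could look like the sketch below. None of this is the dante or leds code: `TokenFile`, `MethodDecl` and `ClassDecl` are invented names, and the "language" is a cut-down one where a class body is just a list of `name ( ) { ... }` methods. The point is only to show each constructor eating its own tokens while the file object owns the shared cursor.

```d
/// Shared cursor over the token list (the "file" in the description above).
class TokenFile
{
    string[] tokens;
    size_t pos;
    this(string[] tokens) { this.tokens = tokens; }
    bool done() { return pos >= tokens.length; }
    string peek() { return tokens[pos]; }
    string next() { return tokens[pos++]; }
}

class MethodDecl
{
    string name;
    size_t bodyTokens;   // tokens in the body, nested braces not counted

    /// Eats: name ( ... ) { ... }
    this(TokenFile f)
    {
        name = f.next();
        while (!f.done && f.next() != "{") {}   // skip the argument list
        int depth = 1;
        while (!f.done && depth > 0)
        {
            auto t = f.next();
            if (t == "{") depth++;
            else if (t == "}") depth--;
            else bodyTokens++;
        }
    }
}

class ClassDecl
{
    string name;
    MethodDecl[] methods;

    /// Eats: class Name { method* }
    this(TokenFile f)
    {
        f.next();                 // "class"
        name = f.next();
        f.next();                 // "{"
        while (!f.done && f.peek() != "}")
            methods ~= new MethodDecl(f);   // the method decides where it ends
        if (!f.done) f.next();    // "}"
    }
}

unittest
{
    auto f = new TokenFile(["class", "A", "{",
                            "foo", "(", ")", "{", "x", ";", "}",
                            "bar", "(", ")", "{", "}",
                            "}"]);
    auto c = new ClassDecl(f);
    assert(c.name == "A");
    assert(c.methods.length == 2);
    assert(c.methods[0].name == "foo" && c.methods[0].bodyTokens == 2);
    assert(c.methods[1].name == "bar" && c.methods[1].bodyTokens == 0);
    assert(f.done);
}
```

Supporting a new construct means adding another class with its own constructor, which is what allows this style of parser to grow piece by piece.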

When I originally started this post, it was intended as install instructions for leds, which are included in the extended body, and partly as a reference for me to get it up and going quickly on other machines. The documentation uses Makefiles (as you would use with C). However, when I chatted to Antonio Moneiro, the original author of leds, it turned out he had been working on a compile tool (compd).

This compile tool negated the need for makefiles, as it basically worked out what commands to run to build an application just from being given the files to build, binary library paths and source library paths (a bit like .h files in C).

This solved a lot of the issues that the rather hacked-together makefiles create, but didn't solve the major problem: you need to install the libraries in specific places, or modify the build command to help it find the libraries. E.g. leds uses the libraries dantefw, dool and dui (GTK), and has to know where you have built those libraries and where the source is.

So with a little bit of hacking, I added code to the build tool, such that when it has built the library, it writes a list of file paths and dependent libraries to either ~/.compd/{libname}.compd or /etc/compd/{libname}.compd.

Then, for example, when building leds, you just tell it that you need -ldantefw. It will look in /etc/compd/libdantefw, and find where you compiled the library to, and all the include paths. Hence you no longer need to specify paths, and applications and libraries just build, without editing makefiles or using autoconf.
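The lookup itself could be sketched like this in current D (D2). This is a guess at the shape of it, not compd's actual code: the function names are invented, the per-user file is assumed to be tried before the system-wide one, and nothing here reads the file, since its format is not described.

```d
import std.file : exists;
import std.path : buildPath;

/// Hypothetical: the places a {libname}.compd description might live,
/// per-user location first (that order is an assumption).
string[] candidateFiles(string libName, string home)
{
    return [
        buildPath(home, ".compd", libName ~ ".compd"),
        buildPath("/etc/compd", libName ~ ".compd"),
    ];
}

/// Hypothetical: first description file that exists, or null if none.
string findDescription(string libName, string home)
{
    foreach (path; candidateFiles(libName, home))
        if (exists(path))
            return path;
    return null;
}

version (Posix) unittest
{
    auto c = candidateFiles("libdantefw", "/home/me");
    assert(c[0] == "/home/me/.compd/libdantefw.compd");
    assert(c[1] == "/etc/compd/libdantefw.compd");
    assert(findDescription("no-such-library", "/nonexistent") is null);
}
```

Passing the home directory in as a parameter keeps the lookup easy to test; a real tool would presumably take it from the environment.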

In addition, we are still discussing how we can make it run more like "make", where it picks up a file in the current working directory and uses that as its arguments, so making leds would be as simple as typing "compd", or "compd -f leds.compd".

Parsing PHP in D.

2006-06-01 08:55:54

I keep meaning to write a full log of the whole mythtv experience, as the number of hardware headaches it has gone through is enough for at least 4 or 5 posts (including the need to rebuild this whole server from scratch a few weeks ago due to hard disk failures from overheating). But this week, for what seems like the first time this year, I actually scraped together 2 days to work on yet another new pet project: Phpmole's replacement....

Phpmole is my lifeblood for development. When I wrote it, I added all the features that were missing from other editors, and the resulting editor made a huge difference to my productivity.

code folding: although annoying at times, it is a good way to dive into new code.
autocompletion: look up PHP functions / variable names etc.
inline help hints: show the signature of a php method inline.
list of open files on a left bar: since I often have 30+ files open, this is very useful to flip between them.
standard editor features, like syntax highlighting etc.

However, Phpmole's code base is now pretty old, and was written before I did much PEAR work (i.e. it's messy....). There are a lot of design decisions that relate to PHPGTK1, and to the need at the time to implement file transports (ssh/midgard/file etc.), which are now redundant as things like sshfs do that much better.

So I've started hacking on leds, the editor I mentioned before when hacking on D. It has the benefit of being relatively small codebase-wise, and very easy to understand. On top of that, it's fast and runs as a binary, so I can eventually just compile the libs and distribute them, rather than a huge array of php files.

Anyway, the first stage in the great conquest was to teach leds a little about PHP. So starting with Zend's lexer, I hand crafted a reasonably complete lexer in D. It ignores variables in quoted strings and heredocs, but should be enough to grab defined classes/methods/vars etc. and work out where to put the folding...

At present it has a few inefficiencies, as the design was a little organic. It is, however, quite an interesting way to build a generic language parser, and it only took 2 days to write and test: http://www.akbkhome.com/svn/D_Stuff/leds/PhpParser.d

D compile time learning...

2006-03-23 16:32:26

Random notes during compiling:

Method argument mismatch:
mime/Document.d(264): function dinc.mime.Source.Source.skipUntilBoundary (char[],uint,bit) does not match argument types (Source,char[],uint,bit)

Method overloading: although you can overload a method within the same class using different signatures, adding a different signature in an extended class will not work unless there is a matching definition in the base class.
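One reading of that note is D's overload hiding rule: declaring a method in a derived class hides all the base-class methods of the same name, unless they are explicitly merged back in with an alias. The sketch below shows that behaviour in current D (D2) syntax; the class and method names are made up and are not the dinc.mime classes from the error message.

```d
class Base
{
    string skip(string s) { return "Base.skip(string)"; }
}

class Hidden : Base
{
    // Declaring any `skip` here hides every `skip` inherited from Base.
    string skip(string s, uint n) { return "Hidden.skip(string, uint)"; }
}

class Merged : Base
{
    // Pull the base-class overloads back into this class's overload set.
    alias skip = Base.skip;
    string skip(string s, uint n) { return "Merged.skip(string, uint)"; }
}

unittest
{
    auto h = new Hidden;
    // The one-argument call is rejected at compile time...
    static assert(!__traits(compiles, h.skip("x")));
    // ...while the new signature works.
    assert(h.skip("x", 1) == "Hidden.skip(string, uint)");

    auto m = new Merged;
    assert(m.skip("x") == "Base.skip(string)");
    assert(m.skip("x", 1) == "Merged.skip(string, uint)");
}
```

In the D of 2006 the alias would have been written `alias Base.skip skip;`, but the rule it works around is the same.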

More DigitalMars D - finding a string in a stream

2006-03-15 10:14:53

The three good ways to learn a language:

hack on some existing code
write a simple program from scratch
port some code from another language to the one you want to learn.

Well, this week I thought I'd have a go at the third. Picking something that was well written to start with, I decided to use binc, an extremely well written imap server written in C++, and see how it converts to D.

Rather than attacking the core imap bit, I decided to start with the MimeDocument decoding part, something relatively self-contained and conceptually quite simple. Most of the porting involved bringing together classes that had methods defined in multiple files (as seems common with C++), and merging them into nice classes in D.

While most of it will probably end up untested until it's all ported, one single method stood out as a good simple test of working with D: searching for a string (or delimiter) in a stream.

Obviously, one of the things that happens with an imap server is that it has to scan an email message and find out what makes up the email (e.g. attachments, different mimetypes and how they are nested). A brute force approach would be to load the whole message into memory, and just scan through looking for the sections. However, since email messages can frequently be over 5Mb, that is obviously horribly inefficient. So the existing code used a simple C++ method to search for a delimiter.
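To give a flavour of the problem, here is one way to search a stream for a delimiter without holding the whole message in memory, written in current D (D2). It is not the binc method or its D port; it swaps in the Knuth-Morris-Pratt algorithm so that a match spanning two chunks is still found, and the `DelimiterFinder` name and interface are invented for the example.

```d
import std.string : representation;

/// Incremental matcher: feed it chunks as they are read; it reports the
/// absolute offset of the delimiter, even across chunk boundaries.
struct DelimiterFinder
{
    const(ubyte)[] delim;
    size_t[] fail;      // KMP failure table for the delimiter
    size_t matched;     // how many delimiter bytes currently match
    size_t consumed;    // total bytes seen so far

    this(const(ubyte)[] delim)
    {
        assert(delim.length > 0);
        this.delim = delim;
        fail = new size_t[delim.length];
        size_t k = 0;
        foreach (i; 1 .. delim.length)
        {
            while (k > 0 && delim[i] != delim[k]) k = fail[k - 1];
            if (delim[i] == delim[k]) k++;
            fail[i] = k;
        }
    }

    /// Returns the offset where the delimiter starts, or -1 if this
    /// chunk did not complete a match.
    long feed(const(ubyte)[] chunk)
    {
        foreach (b; chunk)
        {
            while (matched > 0 && b != delim[matched])
                matched = fail[matched - 1];
            if (b == delim[matched]) matched++;
            consumed++;
            if (matched == delim.length)
            {
                matched = fail[matched - 1];   // stay usable after a hit
                return cast(long)(consumed - delim.length);
            }
        }
        return -1;
    }
}

unittest
{
    auto finder = DelimiterFinder("--b".representation);
    // Stream is "xx-" followed by "--bz"; the delimiter straddles the chunks.
    assert(finder.feed("xx-".representation) == -1);
    assert(finder.feed("--bz".representation) == 3);
}
```

Only the delimiter-sized failure table and two counters are kept between chunks, so memory use does not depend on the message size. With a file, the chunks could come from something like `File.byChunk`, which yields `ubyte[]` buffers of a fixed size.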

Hit the more link for another simple tutorial...