Laying Out Software

   30 July 2018, early morning

I wrote this in 2007, when my life was all C and C++. I was working on migrating something that had morphed from a small, focused C program into a larger, messier C++ program. I don’t remember why I didn’t publish it at the time. I’m sure I had more I wanted to say. Or maybe this advice is bad, and having forgotten all the C++ I used to know, I no longer remember why.

—

I should write some posts about cleaning up old and poorly written programs. As software develops over time it sometimes ends up a huge, unmanageable mess. It takes concerted effort to keep source code neat and organized. Furthermore, spending the time to think about how you organize your software will save you time in the long run. So, my first piece of advice for you budding software developers — I’m looking at you here, Shima — is that source files should be as small as possible, and no smaller.

If you are working in C++ (or a similar object-oriented language), header files should be used to declare classes, and source files should be used for their definitions. Inline functions should go in their own file as well. You should be able to look at a file and know what its contents are. Languages like C++ are fairly easy to work with because the structure of your code in the file system generally mirrors the structure of the program as discrete objects.
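For example, a single class might be split across three files: a header with the class declaration, a separate file for its inline functions, and a source file for everything else. Here is a minimal sketch of that layout; the Point class and the file names are mine, invented for illustration:

    // point.h: declarations only
    #ifndef POINT_H
    #define POINT_H

    class Point {
    public:
        Point(double x, double y);
        double x() const;
        double y() const;
        double distance_to(const Point& other) const;
    private:
        double x_, y_;
    };

    #include "point.inl"  // the inline definitions live in their own file

    #endif

    // point.inl: inline function definitions
    inline double Point::x() const { return x_; }
    inline double Point::y() const { return y_; }

    // point.cpp: definitions of everything that is not inline
    #include "point.h"
    #include <cmath>

    Point::Point(double x, double y) : x_(x), y_(y) {}

    double Point::distance_to(const Point& other) const {
        double dx = x_ - other.x_;
        double dy = y_ - other.y_;
        return std::sqrt(dx * dx + dy * dy);
    }

    // main.cpp: a quick check that the pieces fit together
    #include <iostream>
    #include "point.h"

    int main() {
        Point a(0, 0), b(3, 4);
        std::cout << a.distance_to(b) << "\n";  // prints 5
        return 0;
    }

A glance at any one of these files tells you what it contains, and code that needs a Point includes point.h and nothing else.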

When working with a procedural language like C, it is sometimes harder to see where things should be delineated. It is easy to fall into the habit of having one mother-of-all header files that contains all your declarations, and one source file with all your functions. This is stupid. Code should be organized such that unrelated functions, typedefs, structures, etc., are kept apart. Digging through a 3000-line source file looking for a function definition will make you crazy. You shouldn’t need a fancy IDE to manage your software projects.

Regardless of the programming language you are using, related functionality should be grouped and declared in its own header file, with the definitions of those functions in a corresponding source file. Dividing your source code neatly in this fashion allows code that requires this functionality to (ideally) #include just the declarations it needs, and no more. You should be able to look at the #include directives in your source and header files and understand the dependencies of the code contained within; you should be able to see the relationships between the functions in your program. If you are lazy about the file structure of your source this becomes difficult. Don’t be lazy.

Get Off My Lawn

   21 August 2008, mid-morning

Unspace discuss their past, and the future of Ruby. There are a lot of interesting links in the post, but of particular interest was the post about Hadoop by Ted Dziuba. (The subtitle, “On the emasculation of Twitter and Dirty Harry”, is certainly enjoyable.) There is a lot of interesting stuff being done in Ruby, but like Dziuba, I find a lot of it quaint and half-assed. Sometimes I get the feeling that the community around Rails is a bit of a cargo cult. You have a core group of people who know what they’re doing, and a lot of people who echo what the core says, but who perhaps don’t quite grasp what’s going on. Someone discovers REST and all of a sudden everyone is going on about RESTful this or that. Someone discovers automated testing and everyone is going on about Runit and RSpec. Mind you, I’m probably just an elitist C++ programmer. One day I’ll write a longer blog post about that, but not today.

Update: Rethink sort of mangled one of my comments, which I’ll repost here:

Read the rest of this post. (558 words)

Crappy Programmers

   30 December 2005, early afternoon

Joel from Joel on Software posted an essay today entitled The Perils of JavaSchools. His main point is a simple one: dumbing down Computer Science programs helps no one. In particular, he talks about schools that only use Java in their curriculum. In doing so, such schools fail to teach what Joel considers to be two important things: pointers and recursion. First off, I should say that I don’t think either topic is particularly challenging. Secondly, you can cover this material in Java.

Read the rest of this post. (643 words)

Open Standards are the Future

   24 August 2005, lunch time

Nerd Alert: A lot more on this topic of the future of the web, and then I’ll shut up and start posting about movies I’ve watched again. I read over what I wrote yesterday, and for the most part it sounds like my objection to the WebOS is simply that it is a stupid name—not so, I think it’s a stupid idea too.

Read the rest of this post. (725 words)

Kottke the Quasi-Computer-Scientist

   23 August 2005, early evening

There are several things I find thoroughly suspect or outright stupid in Jason Kottke’s post on the WebOS. I’ll write a longer post later. I had to say something now, as I feel dirty reading all this quasi-computer-science. Briefly: Kottke describes at great length his vision of the “WebOS”, which requires another OS to run. Awesome.

Perhaps I am being unfair. Maybe he hasn’t posted about how his vision of a WebOS will run the file system, perform memory management, schedule tasks, handle network communication and perform a slew of other tasks that operating systems perform. Or, perhaps I am being totally fair, and Kottke doesn’t know much at all about what he is talking about.

   The Web browser (along with other browser-ish applications like Konfabulator) becomes the primary application interface through which the user views content, performs services, and manages data on their local machine and on the Web, often without even knowing the difference. Something like Firefox, Safari, or IE…ideally browser agnostic.

Kottke talks at length about the WebOS, when what he is really describing is the topmost layer we as users typically deal with when working with computers: shells or window managers.

   You don’t need to be on a specific machine with a specific OS…you just need a browser + local Web server to access your favorite data and apps.

If you think what Kottke describes is revolutionary, then you will definitely want to read all about XUL. What Kottke calls the WebOS is already here. Or, if you want to be a little bit boring, Sun already invented the WebOS, only they called it Java.

I wonder how Kottke expects his web server and browser to run. Magic? If you still need an OS, what is the point? This doesn’t shake things up for anyone—least of all Microsoft.

update: I was going to write more about this, but the comments are already full of interesting stuff. I may write about this topic another day, only next time I’ll be a little bit less snarky. Just a bit.

update: I’ve written a bit more about this topic: Open Standards are the Future.

Richard Stevens

   11 August 2005, early afternoon

Richard Stevens was the author of several classic textbooks on computer networks and programming; he wrote the TCP/IP Illustrated series, in addition to Unix Network Programming and Advanced Programming in the UNIX Environment. Stevens passed away in 1999, but his web site is still online today. I’ve been reading it the past few days while trying to find information on dealing with UDP packet loss. His conversational, friendly style of writing obscures the fact that he has been dead for 6 years now; it’s a little strange. There is a lot of interesting stuff on the site. He has links to obscure Usenet posts, interesting because they provide context to papers I have read or subjects I have seen in textbooks. For example, the paper Congestion Avoidance and Control by Van Jacobson is mentioned as a work in progress in the following two email messages: Re: interpacket arrival variance and mean and Re: Your congestion scheme. Interested in implementing software timers in C? You may want to check out Implementing Software Timers by Don Libes. Stevens’ site is definitely worth reading through if you are in a particularly geeky mood.

Awesome MD5 Collisions

   10 June 2005, early afternoon

Perhaps awesome isn’t the word, but these researchers present two different meaningful documents that share the same MD5 checksum. Usually MD5 isn’t used to sign documents like this, but it is quite common to use MD5 to verify binaries on the Internet.

Briefly, MD5 is a hash function: a program that takes a big string of 1s and 0s (which is what everything on your computer is) and outputs a much smaller string of 1s and 0s. This smaller string is usually called a fingerprint or checksum. MD5 was thought to be secure, but was recently broken. For a hash function to be considered secure:

  1. given the value of a hash, it should be infeasible to find the input that produced the hash;
  2. given any input x, it should be infeasible to find another input x' such that the hashes of x and x' match;
  3. it should be infeasible to find two different inputs that have the same hash value.
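To make the fingerprint idea concrete, here is a small sketch of my own (not something from the researchers’ paper) that uses OpenSSL’s legacy MD5 function to fingerprint two nearly identical strings; changing a single character produces a completely different checksum. The build line is an assumption about your setup:

    // md5_demo.cpp: fingerprint two strings with MD5.
    // Build (assumed): g++ md5_demo.cpp -o md5_demo -lcrypto
    #include <openssl/md5.h>
    #include <cstdio>
    #include <cstring>

    static void print_md5(const char* label, const char* msg) {
        unsigned char digest[MD5_DIGEST_LENGTH];  // MD5 digests are 16 bytes
        MD5(reinterpret_cast<const unsigned char*>(msg), std::strlen(msg), digest);
        std::printf("%s ", label);
        for (int i = 0; i < MD5_DIGEST_LENGTH; ++i)
            std::printf("%02x", digest[i]);
        std::printf("\n");
    }

    int main() {
        print_md5("real:", "totally legitimate software");
        print_md5("evil:", "totally legitimate softwarE");
        return 0;
    }

What the researchers managed to do is defeat property 3: they engineered two different inputs that produce the same fingerprint.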

One attack I could envision is creating an evil Trojan distribution of a popular open-source program: you could tell people you are mirroring a popular program—when you download from SourceForge, for example, there are countless mirrors for you to choose from. No one would assume anything was amiss, since the checksum for your application would match the checksum generated by the real program hosted by all the other mirrors; then, when people ran your evil program, it would do its evil things. (Update: not quite right, see comments.)

Language of Choice

   19 March 2005, early evening

Sometimes, while you surf around on the Internet, you may come across people having a Language of Choice argument. In the western world we call these people geeks, but maybe where you are from they are referred to by another name. Recently I stumbled upon such an argument. When it comes to web development, some people are obsessed with Java. Others with Ruby. The lead developer at Signal vs. Noise, David Heinemeier Hansson, is a bit of a Ruby zealot. He wrote the popular framework Ruby on Rails. He used the framework to write a simple to-do list application, Ta-Da Lists. Of course, once the application was released, the bitching began. On the internet, especially with software, people will quickly tell you to put up or shut up. Geert Bevin, the bitcher in question, opted to put up. Of course, this got everyone up in arms about which implementation was better. This is the geek community’s equivalent of a pissing contest.

Eve's Not A Bitch Encryption

   22 February 2005, lunch time

Ju-lian and I devised an encryption scheme called Eve’s Not A Bitch Encryption when we were in university. The security of the scheme hinged on the fact that no one wants to read your stupid email.

Testing Memory Allocation Failure

   15 December 2004, evening time

When programming in C, a programmer must use the malloc function to request memory on the heap. In C++ one can use malloc, but generally the proper thing to do is use new, and for arrays, new[]. The new operator, when successful, returns a chunk of memory big enough for the object or array you wish to store. Unfortunately, sometimes new and malloc will fail to find any memory.
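As a quick sketch of my own (not from the post itself), here is how each failure surfaces: malloc reports failure by returning NULL, while new throws std::bad_alloc unless you ask for the non-throwing form:

    #include <cstdio>    // std::fprintf
    #include <cstdlib>   // std::malloc, std::free
    #include <new>       // std::bad_alloc, std::nothrow

    int main() {
        // C style: malloc returns NULL when no memory is available.
        void* p = std::malloc(1024);
        if (p == NULL) {
            std::fprintf(stderr, "malloc failed\n");
            return 1;
        }
        std::free(p);

        // C++ style: new throws std::bad_alloc on failure...
        try {
            int* q = new int[256];
            delete[] q;
        } catch (const std::bad_alloc&) {
            std::fprintf(stderr, "new failed\n");
            return 1;
        }

        // ...unless you use the non-throwing form, which returns NULL instead.
        int* r = new (std::nothrow) int[256];
        if (r == NULL) {
            std::fprintf(stderr, "new (nothrow) failed\n");
            return 1;
        }
        delete[] r;
        return 0;
    }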

Read the rest of this post. (410 words)

Fuck You Too, Waterloo

   19 April 2004, early morning

Well, it took them a while, but Waterloo finally told me to piss off. I applied to do graduate studies here back in December, and have been waiting to hear from them all this time. My dad called this evening to say that they got a letter saying I didn’t get in.

Am I surprised? A little bit. I’ve done quite well in the computer science program here. Though, in all honesty, that isn’t much of a feat. Perhaps the faculty here knows that as well.

Now I wait to see what U of T has to say. I think I’m going to be looking for a job quite soon.

Bottom-Up? Top-Down?

   28 February 2004, early morning

A parser will take an input source file and produce a parse tree. The parse tree can then be used to do some context-sensitive analysis of the language, e.g. making sure that variables are declared before they are used, or that break statements occur within loops.

Usually this type of analysis is done by attaching attributes to each of the nodes in the parse tree, and calculating the values of these attributes using information from the lexer, and the structure of the tree.

You can compute some attributes in a bottom-up fashion. That is, the attribute at a particular node is computed using information from its children. Such attributes are said to be synthesized attributes. We can do a bottom-up walk through the tree using a depth-first search type traversal. You can compute some attributes in a top-down fashion. The attribute at a particular node inherits its value from its parent. These attributes are referred to as inherited attributes. We can do a top-down walk through the tree with a breadth-first search type traversal.
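Here is a small sketch of my own (invented for illustration, not from any course) of the synthesized case: the value of an arithmetic expression tree is computed bottom-up, each node combining the values already computed for its children during a depth-first traversal:

    #include <iostream>
    #include <memory>

    struct Node {
        char op;                          // '+', '*', or 0 for a leaf
        int value;                        // the synthesized attribute
        std::unique_ptr<Node> left, right;
        explicit Node(int v) : op(0), value(v) {}
        Node(char o, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
            : op(o), value(0), left(std::move(l)), right(std::move(r)) {}
    };

    // Depth-first: compute the children's attributes first, then the parent's.
    int synthesize(Node& n) {
        if (n.op == 0) return n.value;    // a leaf already knows its value
        int l = synthesize(*n.left);
        int r = synthesize(*n.right);
        n.value = (n.op == '+') ? l + r : l * r;
        return n.value;
    }

    int main() {
        // The parse tree for (2 + 3) * 4.
        auto tree = std::make_unique<Node>('*',
            std::make_unique<Node>('+',
                std::make_unique<Node>(2),
                std::make_unique<Node>(3)),
            std::make_unique<Node>(4));
        std::cout << synthesize(*tree) << "\n";  // prints 20
        return 0;
    }

An inherited attribute would flow the other way, with a parent handing information (say, the current scope) down to its children as the traversal descends.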

Sometimes things get more complicated than that. Sometimes things get much more complicated than that.

STL

   22 January 2004, the wee hours

The Standard Template Library is freaking crazy. This is the place I go to figure out what does what. I’m working hard on compilers, though I seem to lack the focus I used to have. Our first assignment is due in a week. I am feeling stressed.

Grammars

   21 January 2004, terribly early in the morning

You specify the syntax of a context-free language with a context-free grammar. Most programming languages correspond to a special class of context-free languages known as LALR languages. I have spent the past few days trying to write a working Ada/CS grammar, only to discover one lying around on the internet that works just fine. After getting rid of all the Yacc-isms I was left with something useful. I can’t believe I wasted so much time these past few days.

I am now trying to parse the output of this program called ilalr, which takes as input an LALR grammar and returns information that can be used to build a program that parses the grammar.

Compilers is probably going to be a very hard course.

Al Aho

   16 January 2004, early evening

Just got back from a talk given by one of the big guys in the Computer Science community, Al Aho. (He has written a lot of important computer science texts, most notably the Dragon Book.) The talk was quite good. He discussed the state of software today, and talked about his research into compilers and programming languages for quantum computers. I saw Yang and Phil, Nabeel, Ju-lian and Ryan at the talk. The room was packed. I’ve never been to such a busy talk.

C++

   15 January 2004, the wee hours

I seem to have forgotten a lot of C++. That sucks. I will probably know a lot when I am done this project.

Scanners

   14 January 2004, early evening

I met with Kumar today to work on our compilers project. We think we have sketched out what we need to implement in order to recognize the tokens in our language.

For those who don’t know, when you write a computer program, the first part of compilation involves a program called a scanner breaking your program up into individual tokens, which are then fed to a parser and processed further. For example, a C code fragment like “x = x + 1” would become something like “ID ASSIGN ID PLUS INT_LITERAL”. Additional information like the name of the identifier and the value of the integer literal would also be stored in the token, to be used later when compiling. So now you know a little bit more about compilers.
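Here is a toy version of that idea, a sketch of my own and far simpler than anything a real compiler would use, that turns “x = x + 1” into that stream of tokens:

    #include <cctype>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Token {
        std::string kind;   // ID, ASSIGN, PLUS, INT_LITERAL
        std::string text;   // the characters the token was built from
    };

    std::vector<Token> scan(const std::string& src) {
        std::vector<Token> tokens;
        std::string::size_type i = 0;
        while (i < src.size()) {
            unsigned char c = src[i];
            if (std::isspace(c)) { ++i; continue; }
            if (std::isalpha(c)) {          // identifier: a letter, then letters/digits
                std::string::size_type start = i;
                while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
                tokens.push_back(Token{"ID", src.substr(start, i - start)});
            } else if (std::isdigit(c)) {   // integer literal: a run of digits
                std::string::size_type start = i;
                while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
                tokens.push_back(Token{"INT_LITERAL", src.substr(start, i - start)});
            } else if (c == '=') { tokens.push_back(Token{"ASSIGN", "="}); ++i; }
            else if (c == '+')   { tokens.push_back(Token{"PLUS", "+"}); ++i; }
            else ++i;                       // this toy silently skips anything else
        }
        return tokens;
    }

    int main() {
        std::vector<Token> tokens = scan("x = x + 1");
        for (const Token& t : tokens)
            std::cout << t.kind << " ";
        std::cout << "\n";  // prints: ID ASSIGN ID PLUS INT_LITERAL
        return 0;
    }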

It is so cold outside. I am in no mood to walk back home, but will probably do so shortly.

Compilers

   13 January 2004, the wee hours

I have started reading the definition of the language we are supposed to write a compiler for, Ada/CS. Ada/CS is a subset of the language Ada, which was developed for use by the US Department of Defense. Compilers looks like it will be a bitch of a course.

E.W. Dijkstra Archive

   18 September 2003, late evening

A very cool site that contains many manuscripts written by Dijkstra. (Dijkstra, for those who don’t know, is a famous computer scientist, best known for his algorithm for finding shortest paths in a graph.)
