Metacircular thoughts

November 28, 2007

Learning Lift – Step 0: Learn the Java platform

Filed under: Java, RIAs, Scala — metacircular @ 9:57 pm

Lift is a Scala web framework. Its creator is none other than the mighty David Pollak, a man who was a mature programmer when I was a newborn infant.

It combines many of the best ideas from other web frameworks. It’s at the point where it’s small enough to be manageable but large enough to be featureful. Now is the perfect time to get in on it.

The problem is that until about a week ago, my computer couldn’t even handle Lift; I didn’t have enough RAM. That problem has been solved.

But I still don’t have a clue about Maven and the other Java components Scala depends on, which I don’t think David even realizes are nontrivial to learn. (They are trivial if you are David Pollak.)

And besides, Lucene is too awesome to pass up.

So, the first step is getting comfortable with Java web programming.

To do this, I am going to create a wiki/PIM type product that is more or less a Backpack ripoff. It will start as a trivial wiki, with Backpack-type features added later. Here are some features that are essential:

  • Fulltext search. This is an essential feature of all modern data-driven applications. Lucene will do the job.
  • Version control. I insist on knowing who changed what and when on pretty much any digital artifact I work on these days, even binary assets like Photoshop files.
  • Simplicity. It must be beyond braindead simple to jot down quick notes, as simple as it is to type in a query to Google.
  • Responsiveness. I fucking hate slow software! Unfortunately we will have to dirty our hands with JavaScript in order to carry out nice Ajax stuff. I hate Ajax hype more than anyone I know, but it’s simply a fact that client-side scripting is the way to make excellent, responsive web applications.

In order to make this tractable, Internet Explorer compatibility is entirely optional, and I’m not going to waste my spare time on it.

Probably mobile access is important but I barely ever use cell phones.

The book I’m going to be working through is a recent release from Apress: Beginning JSP, JSF and Tomcat Web Development: From Novice to Professional.

Let the games begin.

March 1, 2007

Another reason to ignore Scala disappears: excellent IDE integration

Filed under: Scala — metacircular @ 8:15 am

A notice was posted to the Scala mailing list just a short while ago announcing the first release of a plugin for IntelliJ IDEA.

You can see a high-res screenshot of it in action or download it.

As Eugene Vigdorchik wrote in his message, here are some of the features in the plugin:

The feature highlights are as follows: syntax highlighting, formatting, parsing errors highlighting, folding, surround with, file structure outline, keyword completion, goto class, compilation (with no dependent files calculation yet), debugging, cross-language resolve of types/find usages/rename of classes and traits.

Definitely looks very cool. Martin Fowler? Bruce Eckel? Bruce Tate? Dave Thomas? Are you guys listening?

February 20, 2007

So, what’s going on with Scala?

Filed under: Scala — metacircular @ 5:47 pm

I haven’t written about Scala in a while. Here are some updates.

  • I’m working on an RSS aggregator designed for browsing thousands of feeds. I’m implementing it as a desktop application using SWT and JFace. I plan on releasing several components of it independently for use in other applications.
  • Philipp Haller is continuing to work on the actors library, adding a notion of futures, which I don’t really understand, but it’s probably pretty useful if you do. The actors library is kind of a wild frontier and I don’t get how to use it. Supposedly a tutorial is in the works, but knowing how the LAMP people document things, they’ll probably spend 90% of it talking about how it was implemented rather than how to use it.
  • A web framework is in the works and will be released quite soon. Many other people have bits of Scala web programming code they intend to polish off in a releasable form, including code that has been used for real, heavy client work for over a year (not bad considering Scala was only an idea for a new language 5 years ago). When the unreleased web code becomes available I will be on it like stink on poop: screencasts and tutorials lie ahead. I will raise hell if things get too design-y/enterprisey in any Scala web code.
  • Steady incremental releases of both the main Scala distribution and the Eclipse plugin continue.
  • The Scala mailing list is sufficiently active that Martin Odersky has considered splitting it off into a low-volume mailing list for announcements only (which I suppose is what the Scala list initially basically was) and a higher-volume list for the kind of chatter that occurs normally (e.g., me struggling to figure out code using asynchronous message passing).

Thought leaders like Martin Fowler, Bruce Eckel, and Dave Thomas who tend to be early adopters of new agile technologies continue to not really notice Scala (although Eckel wrote a brief blog entry mentioning it). If we can get a good web toolchain in place, improve the Eclipse plugin, and add mock objects to SUnit/Rehersal, I don’t think they’ll have a good reason to continue ignoring it as an option for Java developers who are looking for something more expressive but don’t want to leave the JVM behind.

Afterthought: We also have an IRC channel, #scala on Freenode (irc.freenode.net). There are currently 13 people in there, so it’s definitely growing.

Afterthought 2: I ran some crude tests for benchmarking event-based actors as per Joe Armstrong’s concurrency challenge where you set up a ring of n processes and send a simple message around the ring m times, and basically event-based actors in Scala scale like a motherfucking demon. Details here.
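For readers who have not seen the ring challenge, here is a minimal sketch of its shape. It is not the benchmark code referred to above: it swaps in plain JVM threads and blocking queues instead of event-based actors, and the names Ring and run are made up for illustration. It is written in current Scala syntax.

```scala
import java.util.concurrent.LinkedBlockingQueue

object Ring {
  // Pass a token around a ring of n nodes m times; returns the total hop count.
  def run(n: Int, m: Int): Int = {
    require(n > 0 && m >= 0)
    // One mailbox per node; node i forwards to node (i + 1) % n.
    val mailboxes = Vector.fill(n)(new LinkedBlockingQueue[Int]())
    val done = new LinkedBlockingQueue[Int]()
    val total = n * m

    val nodes = (0 until n).map { i =>
      val t = new Thread(() => {
        var running = true
        while (running) {
          val hops = mailboxes(i).take()
          if (hops < 0) running = false              // poison pill: stop this node
          else if (hops == total) done.put(hops)     // token has gone round m times
          else mailboxes((i + 1) % n).put(hops + 1)  // forward to the next node
        }
      })
      t.start()
      t
    }

    mailboxes(0).put(0)          // inject the token with zero hops so far
    val result = done.take()     // block until it has travelled n * m hops
    mailboxes.foreach(_.put(-1)) // shut every node down
    nodes.foreach(_.join())
    result
  }
}
```

One thread per node is exactly what event-based actors avoid, so this version will run out of steam at far smaller ring sizes; it only shows what is being measured.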

February 11, 2007

F# vs. Scala: an uninformed comparison

Filed under: Scala — metacircular @ 10:35 pm

If you want to do functional programming but also actually get stuff done, you don’t have many options. Pretty much your options are F#, Scala, and BusinessObjects’ new CAL. I don’t really want to write Haskell-like code and too much of what I tinker with is inherently stateful and imperative, so that leaves F# and Scala for me. Scala and F# are pretty much analogues of one another, but with different target virtual machines.

Here are, from what I can tell, some of the differences and similarities between F# and Scala.

  • F# has better IDE integration via Visual Studio. However, this is like saying that a certain kind of boat is really good for cruising around in the sewers. The Visual Studio IDE is an orgy of complex, useless stuff. Then again, it has better REPL integration, creating more of an interactive feel. Scala’s Eclipse plugin, as raw as it is, is for an open IDE whereas Visual Studio is proprietary.
  • F# has far less support for objects and object-oriented programming. Scala wins big on this. This isn’t entirely fair, though, as unifying object-oriented and functional programming is one of the central goals of the Scala project. However, Scala also has this remarkable ability to look like Java if you want it to, down to superfluous semicolons and superfluous ()’s when you call a function that takes no parameters.
  • They have roughly comparable performance.
  • Since F# talks to .NET, it’s going to be better for creating Windows GUI apps. I know there’s SWT, but it really doesn’t compare. Wishing for cross-platform UIs is a pipe dream. The sooner you realize this, the better.
  • F# targets .NET which, while it certainly isn’t going anywhere anytime soon, is proprietary. Of course, this was true of Scala’s platform, Java, up until a few months ago.
  • F#, being more ML-like than Scala, has more of a minimal lambda calculus-y feel to it the way OCaml, Standard ML, and Scheme have. F# appears to be a considerably simpler language.
  • Both are powerful, expressive, pragmatic languages that are far superior to C# or Java.

February 9, 2007

Beautiful code in Scala

Filed under: Scala — metacircular @ 9:32 pm

The most beautiful code I have seen in Scala has come from the combination of higher-order functions and the Smalltalk-style method invocation syntax Scala allows (but does not force you to use). Consider the following:


scala> val data = List range (0,11)
data: scala.List[scala.Int] = List(0,1,2,3,4,5,6,7,8,9,10)

scala> def even(n: int) = (n%2 == 0)
even: (scala.Int)scala.Boolean

scala> data filter even
line12: scala.List[scala.Int] = List(0,2,4,6,8,10)

scala> import Console._
scala> (data filter even) foreach println
0
2
4
6
8
10
line13: scala.Unit = ()

Not bad at all, IMO.

February 8, 2007

The evolution of a small Scala program

Filed under: Scala — metacircular @ 2:54 pm

Update: there is a subtle bug in the code.

To my surprise, when I posted the last code snippet, members of the Scala community actually took some time to play around with it. Much thanks to everyone who took a look.

Eric Willigers suggests some small things to alleviate the awkward parts of the code I originally posted.

David Pollak properly factors things out and introduces a more idiomatic way of gettin’ yer data o’er the wire.

Finally, Adriaan Moors shows that it’s possible to write Haskell in any language. :)

Here is a code kata for you: meditate on each of these 4 ways of doing the same thing and decide which you think is best.

February 7, 2007

Towards polite HTTP retrieval in Scala

Filed under: Scala — metacircular @ 11:09 pm

Update: The Scala mailing list has interesting suggestions on any number of ways to improve this code. It is left as an exercise to the reader to evaluate the merits of each approach.

I’d like to revisit an old post on using the Jakarta Commons HTTP client. Specifically, I’d like to show how to do something approaching polite HTTP retrieval in Scala.

What does “polite” retrieval imply? It means using conditional GET and gzip compression to minimize the amount of bandwidth used when retrieving a resource. This is especially relevant if you’re going to write an RSS aggregator. A widely read post, Conditional GET for RSS Hackers, explains the rationale of conditional GET. When we retrieve something that changes from time to time, we keep track of the last ETag and Last-Modified response header values and then supply them on future requests in the If-None-Match and If-Modified-Since headers of the HTTP GET. If the resource hasn’t changed, we’ll get a response code of 304 indicating as much, with no body.

Gzip compression causes the server to transfer the resource we want to retrieve in compressed, gzipped form, substantially reducing the amount of bandwidth we need to use. We indicate that we can accept it by adding a header of the form Accept-Encoding -> gzip to our request.

Furthermore, it’s desirable to have a custom timeout amount and have a custom number of retries.
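As a rough sketch of what such a request looks like on the wire, the following builds (but does not send) a GET with the headers described above. It uses the JDK's java.net.http API (Java 11 and later) rather than the Jakarta client this post is about, and the names PoliteRequest and build are made up for illustration. Retries are left out because that API has no built-in retry setting.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.time.Duration

object PoliteRequest {
  // Build a GET that asks for gzip and sends validators only when we have them.
  def build(uri: String,
            etag: Option[String],
            lastModified: Option[String],
            timeoutSeconds: Long): HttpRequest = {
    val b = HttpRequest.newBuilder(URI.create(uri))
      .GET()
      .timeout(Duration.ofSeconds(timeoutSeconds))
      .header("Accept-Encoding", "gzip")
    // The builder is mutable, so these calls add headers in place.
    etag.foreach(v => b.header("If-None-Match", v))
    lastModified.foreach(v => b.header("If-Modified-Since", v))
    b.build()
  }
}
```

A server that honors the validators answers 304 with an empty body, so the caller keeps using its cached copy. Note that the JDK client does not decompress gzip responses by itself, so the body still has to be run through a GZIPInputStream.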

I haven’t seen any code samples out there that put all these techniques together for the Jakarta HTTP client, or even any showing how to request and decompress gzipped content over HTTP, so I’m going to go through the code to do that in Scala. This would easily carry over to Java. In fact I have to warn you that the code is going to be ugly and Java-ish.

We start out by having a function httpget that takes in our parameters and returns (1) the response code (2) the body, if any (3) the ETag response header if any and (4) the Last-Modified response header if any.


def httpget(uri: String, timeout: int, retry_cnt: int, etag: String,
      last_modified: String): {int, String, String, String} = {

We create an HttpClient instance, set the timeout as per our parameter and set the retry count as well. We tell the GetMethod to follow redirects, and then, in a key step, we tell the server we will accept gzip-encoded content.


val client = new HttpClient
// our parameter is in seconds, HttpClient uses milliseconds
client.getHttpConnectionManager.getParams
  .setConnectionTimeout(timeout*1000)
val m = new GetMethod(uri)
m.getParams.setParameter(HttpMethodParams.RETRY_HANDLER,
  new DefaultHttpMethodRetryHandler(retry_cnt, false))
m.addRequestHeader("Accept-Encoding", "gzip")
m setFollowRedirects true

Then we give it our ETag and Last-Modified data if we got any.


if(etag != "")
  m.addRequestHeader("If-None-Match", etag)
if(last_modified != "")
  m.addRequestHeader("If-Modified-Since", last_modified)

Next, we actually do the request and get the ETag/Last-Modified response headers.


var code = 0
try { code = client executeMethod m}
catch { case e: Exception => Console.println("Error: " + e) }

// XXX this is awkward
var new_etag = ""
if(m.getResponseHeader("ETag") != null)
  new_etag = m.getResponseHeader("ETag").getValue
var new_lm = ""
if(m.getResponseHeader("Last-Modified") != null)
  new_lm = m.getResponseHeader("Last-Modified").getValue
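One way the null checks above could be smoothed out is to wrap the possibly-null result in Option. This is only a sketch: Header and lookup below are hypothetical stand-ins for the library's header type and its getResponseHeader call, so that the snippet runs on its own.

```scala
object Headers {
  // Hypothetical stand-in for the library's header type.
  final case class Header(name: String, value: String)

  // Mimics a Java-style lookup that returns null when the header is absent.
  def lookup(headers: List[Header], name: String): Header =
    headers.find(_.name.equalsIgnoreCase(name)).orNull

  // Option(null) is None, so no explicit null check is needed.
  def valueOrEmpty(headers: List[Header], name: String): String =
    Option(lookup(headers, name)).map(_.value).getOrElse("")
}
```

The same pattern also avoids calling the lookup twice per header.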

Now we can finally deal with the response. If the server sent the body gzip-encoded, we wrap the response body stream in a Java class, GZIPInputStream. We then wrap that in an InputStreamReader and a BufferedReader and read from the BufferedReader one line at a time in a loop. It’s kind of lousy but it gets the job done. Finally, we release the connection and return our data using the new tuple syntax in Scala 2.3.3.


var resp = ""
if(code == 200) {
  var str = m.getResponseBodyAsStream
  if(str != null) {
    val enc_hdr = m.getResponseHeader("Content-Encoding")
    if (enc_hdr != null && enc_hdr.getValue.equalsIgnoreCase("gzip"))
      str = new GZIPInputStream(m.getResponseBodyAsStream)

    var data = new StringBuffer
    val buf = new BufferedReader(new InputStreamReader(str))

    // XXX this is also awkward
    var line = "ignore"
    while(line != null) {
      line = buf.readLine
      if(line != null) data.append(line)
    }
    str.close
    resp = data.toString
  }
}

m.releaseConnection

{code, resp, new_etag, new_lm}

Something about this code smells but I’m not sure what to do with it; I’m pretty new to the JVM. If any old-time Java programmers want to get some Bileblog-esque Java frustration out and tell me exactly where I’m going wrong, I’d appreciate it.
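For comparison, here is one possible way to shorten the read loop, sketched with only the JDK and the Scala standard library. The names Body, read and gzip are made up for illustration, and UTF-8 is an assumption; a real client would take the charset from the Content-Type header.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.io.{Codec, Source}

object Body {
  // Read a whole stream as UTF-8 text, gunzipping first when asked to.
  def read(raw: InputStream, gzipped: Boolean): String = {
    val in = if (gzipped) new GZIPInputStream(raw) else raw
    try Source.fromInputStream(in)(Codec.UTF8).mkString
    finally in.close()
  }

  // Helper for trying it out without a server: gzip a string in memory.
  def gzip(text: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(bytes)
    out.write(text.getBytes(StandardCharsets.UTF_8))
    out.close()
    bytes.toByteArray
  }
}
```

Unlike a loop that appends the result of readLine, mkString keeps the line terminators, so the text comes back exactly as it was sent.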

OK, let’s write a driver function to call this code so we can experiment with it. This should be self-explanatory.


def main(args: Array[String]) {
  if(args.length < 3) {
    System.err.println("Usage: scala -Dlog4j.ignoreTCL politehttpget <URL> <ETag> <Last-Modified>")
    exit(1)
  }

  Console.println("Retrieving " + args(0))
  var uri = args(0)
  var site_etag = args(1)
  var site_lm = args(2)

  Console.println("Using ETag: " + site_etag)
  Console.println("Using Last-Modified: " + site_lm)
  httpget(args(0), 10, 3, site_etag, site_lm) match {
    case {n, data, etag, lm} =>
      Console.println("Got response code " + n)
      Console.println("Read data of length " + data.length)
      if(etag.length > 0)
        Console.println("ETag: " + etag)
      if(lm.length > 0)
        Console.println("Last modified: " + lm)
  }
}
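A note for anyone trying this on a newer compiler: later Scala releases write tuples with parentheses rather than braces. A minimal sketch of the same return-and-match pattern in that syntax follows; fetchStub is a made-up stand-in for httpget so the snippet runs without a network.

```scala
object TupleDemo {
  // Stand-in for httpget: returns (code, body, etag, lastModified).
  def fetchStub(): (Int, String, String, String) =
    (304, "", "\"abc\"", "Tue, 06 Feb 2007 23:01:32 GMT")

  // Destructure the tuple with a pattern match, as the driver above does.
  def describe(): String = fetchStub() match {
    case (code, body, etag, lm) =>
      s"code=$code length=${body.length} etag=$etag lm=$lm"
  }
}
```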

The -Dlog4j.ignoreTCL is to make log4j not output annoying, unintelligible errors. I don’t know why I have to do that and I don’t care, I just know that doing that makes it shut the fuck up. Here are some examples of calling the program.


$ scala -Dlog4j.ignoreTCL politehttpget http://metacircular.wordpress.com/feed "" ""
Retrieving http://metacircular.wordpress.com/feed
Using ETag:
Using Last-Modified:
Got response code 200
Read data of length 42749
ETag: "0c13d35331869cd6df1d3add0b25f3c3"
Last modified: Tue, 06 Feb 2007 23:01:32 GMT

$ scala -Dlog4j.ignoreTCL politehttpget http://metacircular.wordpress.com/feed "\"0c13d35331869cd6df1d3add0b25f3c3\"" "Tue, 06 Feb 2007 23:01:32 GMT"
Retrieving http://metacircular.wordpress.com/feed
Using ETag: "0c13d35331869cd6df1d3add0b25f3c3"
Using Last-Modified: Tue, 06 Feb 2007 23:01:32 GMT
Got response code 304
Read data of length 0
ETag: "0c13d35331869cd6df1d3add0b25f3c3"
Last modified: Tue, 06 Feb 2007 23:01:32 GMT

Notice how when we first retrieve the feed for this blog, we get back an ETag and a Last-Modified. When we make the same request again but supply that data to the WordPress server, we get a 304 response code, indicating that, unsurprisingly, the feed hasn’t changed since when we made the first request 15 seconds prior.

Hopefully this was useful to you. Happy hacking.

Download the code for this post (Apache license)

February 4, 2007

Scala makes XML processing easy

Filed under: Scala — metacircular @ 2:20 pm

Ruby and Python are great for a lot of data munging tasks, but one thing they, along with most other languages, suck at is XML. XML itself is not complicated, but the DOM is, and SAX kind of is, compared to what I’m about to show you.

The basic thing to realize is that Scala has XML baked into the syntax. So we can fire up our interpreter and have XML be first-class values (and call methods on it), like so:


scala> val data = <data id="1">this is test data</data>
data: scala.xml.Elem = <data id="1">this is test data</data>

scala> data.text
line1: java.lang.String = this is test data

scala> data \\ "@id"
line2: scala.xml.NodeSeq = 1

scala> (data \\ "@id").text
line3: java.lang.String = 1

scala> <data>words here <child>more words</child></data> \\ "child"
line4: scala.xml.NodeSeq = <child>more words</child>

In the first line entered, we see that if we just type raw XML then Scala infers its type to be an XML element. The class Elem has a method text on it which retrieves the text content between the tags, as you can see in our REPL session. It also has a method \ which retrieves the child nodes of an element that match a given name, and a method \\ which does the same search over all descendants rather than just direct children. (Like Lisp, Scala is very liberal in what you can name a method; many non-alphanumeric characters can be used for method names, and in fact the actors library, which implements Erlang-style concurrency as a library, has methods called ! and !?.) Giving either method an argument that starts with @ makes it retrieve attributes instead. If an element has more than one matching child, retrieving the child gives us a sequence we can iterate over using sequence comprehensions (Scala’s way of doing foreach from languages like Ruby, Python, C#, and Java).

Hopefully you agree that this is a saner way of accessing XML than using the DOM or SAX; by comparison, the DOM looks way too design-y. One of the Scala people wrote a paper about this stuff which shows a verbose DOM example and then the equivalent Scala code; the difference is striking.

So, OK, let’s get a real example here which is still pretty easy to digest. Weblogs.com publishes a list of recently updated blogs. It so happens that about 80% of them are spam blogs, but that’s irrelevant for our purposes here. The file they publish looks like this:


<?xml version="1.0" encoding="UTF-8"?>
<weblogUpdates version="2" updated="Tue, 06 Feb 2007 04:31:00 GMT" count="2136465">
<weblog name="name" url="http://example.com" when="n" />
<weblog name="name2" url="http://example2.com" when="n" />
...
</weblogUpdates>

You can find the pings in the last hour here; it’s about a 10-15 MB file so you should probably right click, save as rather than loading it in your browser. The one I downloaded had about 160,000 entries and was about 16.5 MB in size. That’s an appreciable size.

So, let’s just do a simple processing example where we just iterate over the pings (again, 80% of them are going to be from splogs), retrieving the URL each one refers to and incrementing a counter. A more realistic example would be to send the data to a database, but that would obscure the example too much. So here is some code to process the changes.xml file.


import scala.xml._

object processblogpings extends Application {
  val start = System.currentTimeMillis

  val data = XML.loadFile("changes.xml")
  Console.println("Updated: " + (data \\ "@updated").text)

  var cnt = 0
  for(val entry <- data \\ "weblog") {
    // extract the URL field but don't do anything with it
    val url = (entry \\ "@url").text
    cnt = cnt+1
  }

  Console.println("Found " + cnt + " entries")
  val end = System.currentTimeMillis
  Console.println("Took " + (end-start)/1000.0 + "s")
}

Well, that's really not too bad. We read in and parse the XML file in one line with a call to loadFile and then retrieve attributes by looping and using the \ method. Really not too bad at all, I think.

The above code took 4.2 seconds to process about 163,000 entries on my Pentium 4 2.8 GHz. That's pretty fast, especially considering the brevity we achieved, in my opinion. Now, this library might not work for files larger than what can fit in system memory, but that's a pretty rare situation. For most situations, we can take advantage of Scala's XML savvy to make life easier. Scala is, as far as I can tell, the ultimate XML processing language. There, I said it. It's powerful because we can use XML data in conjunction with all the other constructs of a modern programming language: closures, pattern matching, objects, etc. etc.
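For the rare file that really does not fit in memory, a streaming parser is the usual fallback. The following is a sketch under that assumption, using the JDK's StAX API (javax.xml.stream) from Scala instead of the scala.xml library shown above; StreamingPings and countWeblogs are made-up names.

```scala
import java.io.Reader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object StreamingPings {
  // Count <weblog> elements one event at a time, without building a tree.
  def countWeblogs(reader: Reader): Int = {
    val xml = XMLInputFactory.newInstance().createXMLStreamReader(reader)
    var count = 0
    try {
      while (xml.hasNext) {
        val event = xml.next()
        if (event == XMLStreamConstants.START_ELEMENT && xml.getLocalName == "weblog")
          count += 1
      }
    } finally xml.close()
    count
  }
}
```

Inside the START_ELEMENT branch, xml.getAttributeValue(null, "url") would give the URL of the current entry. The price is losing the \ and \\ navigation, which is why the tree-based version remains the nicer default.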

I didn't even show any examples of doing pattern matching on XML (!). However, this should be enough to get you interested in looking further; the best source is probably the draft book on XML in Scala. Note that, unlike some other functional languages which claim to process XML (say, OCaml or a variety of Common Lisp/Scheme distributions out there), we can actually handle Unicode data. Happy hacking!

January 27, 2007

Wrapping up little bits of Scala evangelism

Filed under: Scala — metacircular @ 11:11 am

Thus far I’ve tried to argue that Scala is a functional programming language that is also very pragmatic for real-world work. I think any Java developer who’s looking for something more sophisticated but doesn’t want to give up their development environment or their current toolchain should take a very serious, hard look at what Scala has to offer. JetBrains, creators of the popular IntelliJ IDEA, have shown interest in Scala with good reason; a few of their employees post to the Scala mailing list on a regular basis.

Here are some links to other interesting code examples.

  • Here is an example of using a GUI library in the works by EPFL. It wraps around Swing and makes it much easier to use. It’s only in the early stages but it looks promising.
  • David Pollak has been writing some interesting extended examples in Scala and you should have a look.
  • The Scala wiki has an impressive example of simplifying JDBC with implicit definitions. Java libraries tend to be awkward to use (a state of affairs still preferable to what you find if you’re using Lisp/OCaml/Haskell, where the libraries you need don’t exist and probably never will unless you write them yourself). Implicit definitions allow you to consume them on much more amiable terms without having to extend them and thereby getting stuck in a rigid class hierarchy unfamiliar to a Java programmer who has used the library in question before. Implicit definitions are quite useful when used properly.
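To give a flavor of the implicit-definition idea from the last item, here is a small sketch. It is not the JDBC example from the wiki: it adds a toList method to java.util.StringTokenizer, chosen only because it ships with the JDK, and it uses the implicit class form from later Scala releases. TokenizerSyntax and RichTokenizer are made-up names.

```scala
import java.util.StringTokenizer

object TokenizerSyntax {
  // Adds a toList method to the Java class without subclassing it.
  implicit class RichTokenizer(underlying: StringTokenizer) {
    def toList: List[String] = {
      val buf = List.newBuilder[String]
      while (underlying.hasMoreTokens()) buf += underlying.nextToken()
      buf.result()
    }
  }
}
```

After import TokenizerSyntax._, an expression such as new StringTokenizer("a b c").toList compiles as if the method had always been there, while the tokenizer itself is still the plain Java class.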

With that, I will take it as a given that Scala is a legitimate, useful language that any person who cares about producing brief, readable, high-performance code should definitely consider. The entirety of Java-land is the Scala hacker’s oyster, and he picks and plucks his favorite libraries at will, writing tiny bits of view code to simplify away the ugliness and verbosity, writing code often comparable to Erlang, Haskell, or OCaml in brevity. It is up to the Java community as well as functional programmers whether they want to be open to new ways of doing things that will help them and their customers or not.

Scala is definitely still a little rough around the edges, being so young. Probably its biggest flaw at the moment is documentation, but that is getting better rapidly. The Scala team has shown that they are extremely open to user feedback and they are making rapid, regular progress towards improving both the language and documentation. They have gotten amazingly far, considering that Scala is only about 5 years old and has only had a 1.0 release for a few years.

If all this isn’t enough to convince you about the potential merits of Scala, probably nothing will, and therefore I will cease preaching to either the converted or the unconvertible.

January 16, 2007

Example: Using Jakarta Commons HttpClient for multithreaded HTTP retrieval

Filed under: Scala — metacircular @ 3:01 pm

Public service announcement: The following code is not usable for real-world HTTP retrieval. If you write a spider or a news aggregator, please use gzip compression and conditional GETs. We now return you to our regularly scheduled blogging already in progress.

The Apache people have an HTTP client designed to deal with the shortcomings of the stuff you can find in the java.net packages. They have an example program in their svn repository of retrieving a given list of URLs in multithreaded fashion. Here’s a slightly simpler Scala adaptation which simultaneously retrieves the webpages for Google, Yahoo, and Sun:


import org.apache.commons.httpclient._
import org.apache.commons.httpclient.methods._

object mtget extends Application {
  // a list of uris to retrieve -- make a thread for each and start it
  List("http://www.google.com/", "http://www.yahoo.com/",
       "http://www.sun.com/") map (x => new Getter(x)) foreach (x => x start)

  class Getter(uri: String) extends Thread {
    val m = new GetMethod(uri)
    override def run() =
      try {
        m setFollowRedirects true
        (new HttpClient(new MultiThreadedHttpConnectionManager)) executeMethod m
        Console.println(m.getResponseBody.length + "b from " + uri)
      } catch {
        case e: Exception => Console.println("error: " + e)
      } finally {
        m.releaseConnection
        Console.println("connection released")
      }
  }
}

While my example is slightly simpler there are parts in the original Java program I don’t have to write or can write in a briefer fashion. For instance, code like this:


        Getter[] threads = new Getter[urisToGet.length];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Getter(urisToGet[i]);
        }
        for (int j = 0; j < threads.length; j++) {
            threads[j].start();
        }

would become urisToGet map (x => new Getter(x)) foreach (x => x start) in Scala; there’s no use, after all, in creating variables if you’re not going to use them. I also don’t need to write boilerplate code for class variable declaration and a constructor that just initializes the class variables to the constructor parameters due to how you declare classes in Scala. Notice in the Scala code that I used a style of invoking methods comparable to Smalltalk. Instead of writing client.executeMethod(method), you could write client executeMethod method, whatever you prefer. It’s just syntactic sugar.

Also notice that I have fewer variables and less mutability. Removing unnecessary mutability (e.g., by declaring our GetMethod object to be an immutable val instead of a mutable var) reduces the number of places things can go wrong. The programming style Scala encourages, then, leads to fewer bugs. Also, we can have inner classes rather than having to declare things as static. Finally, the type inference system in Scala’s compiler lets us write code free of redundant type declarations.
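The points about constructor boilerplate and avoiding throwaway variables can be tried without a network connection. The sketch below uses made-up names (Workers, Worker, runAll) and only pretends to fetch each URI; it is written in current Scala syntax and needs Scala 2.13 or later for scala.jdk.CollectionConverters.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.jdk.CollectionConverters._

object Workers {
  // `uri` is declared once: constructor parameter and immutable field in one go.
  class Worker(val uri: String, out: ConcurrentLinkedQueue[String]) extends Thread {
    // Pretend fetch: a real worker would do the HTTP request here.
    override def run(): Unit = { out.add("fetched " + uri); () }
  }

  def runAll(uris: List[String]): Set[String] = {
    val out = new ConcurrentLinkedQueue[String]()
    val threads = uris.map(u => new Worker(u, out)) // no index variables needed
    threads.foreach(_.start())
    threads.foreach(_.join())                       // wait so the results are complete
    out.asScala.toSet
  }
}
```

The only var-like state is the thread-safe queue; everything else is a val.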

The point of this post is that Scala lets you do real-world stuff in a pretty brief manner, and it compares favorably to Java. Scala is not a mere academic curiosity useless for getting real work done. Scala is useful for real-world programming, today. Say goodbye to for(int i = 0; i < N; i++); say hello to type inference and beautiful, brief code.
