Scala-bility

Spark, a Worked Example: Speeding Up DNA Sequencing

2016-07-22T08:50:00.002-07:00

Processing huge amounts of data takes time. To do it efficiently we can write code in a functional style and use the opportunities for parallelism that FP allows us. Still, if we want to spread the code over thousands of processors, there is extra work involved in distributing it and managing failure scenarios.

Fortunately things have got much easier in recent years with the emergence of helper software like Spark which will take your code and run it reliably in a cluster. This assumes that you follow certain conventions. Specifically, your program should be a pipeline of transformations between collections of elements, especially collections of key-value pairs. Such a collection in the Spark system is called an RDD (Resilient Distributed Dataset).

Spark is written in Scala. The API supports other languages too like Python and R, but let's stick with Scala, and illustrate how a sample problem can be solved by modelling the solution in this way and feeding the code to Spark. I will take a simplified task from the area of genomics.

DNA may be sequenced by fragmenting a chromosome into many small subsequences, then reassembling them by matching overlapping parts to get the right order and forming a single correct sequence.In the FASTA format, each sequence is composed of characters: one of T/C/G/A. Here are four sample sequence fragments:

>Frag_56
ATTAGACCTG
>Frag_57
CCTGCCGGAA
>Frag_58
AGACCTGCCG
>Frag_59
GCCGGAATAC

Can you see the overlapping parts? If you put the sequences together, the result is be ATTAGACCTGCCGGAATAC.

A naive solution would do it the same way you do it in your head: pick one sequence, look through all the others for a match, combine the match with what you already have, look though all the remaining ones for a match again... and so on, until all sequences have been handled or there are no more matches.

Let's try a better algorithm, one which divides the solution into a number of tasks, each of which can be parallelized. First, you may want to download Spark. You will also need Scala and sbt.

We need Spark because the number of sequence fragments in a real-life scenario would be many millions. (However, I will continue to illustrate the algorithm using the toy data above.)

The skeleton of a Spark program looks like this:

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.rdd.RDD
object MergeFasta {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MergeFasta")
    val sc = new SparkContext(conf)
  
    // The algorithm's code will go here.
  }
}

main is the entry point. (The Scala extends App syntax also works, but the documentation recommends not using it.) The first thing any Spark program needs to do is get a reference to a SparkContext so it can send computations to the Spark execution environment.

Now, back to the algorithm. We start with a set of small sequences (see above). These would typically be read from a file and converted to an RDD. Throughout the program we try to keep all data wrapped in RDD data structures so Spark knows how to deal with and parallelize the processing. To get an RDD from a file, we can use Spark's own function, sc.textFile("data.txt"), or use regular Scala code to read the file, parse it into a collection, and call sc.parallelize(anyScalaCollection).

By the way if you want to play about with Spark functions like this, the Spark software includes a REPL which you can fire up as follows:

$ spark-shell

scala> val myRDD = sc.parallelize(List(1,2))
myRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:27

scala> myRDD.collect()
res2: Array[Int] = Array(1, 2)

Notice how the collect() function is used to convert an RDD back into a Scala collection so we can view it.

Anyway, now that we have a starting RDD in the program, the first step of the algorithm is to compare each sequence with every other sequence to form a matrix. For each pair, calculate the number of characters at the end of the first sequence that match the characters at the beginning of the second (0 if there is no match). The matrix looks like this:

Key sequence	Other sequences
ATTAGACCTG	(CCTGCCGGAA, 4), (AGACCTGCCG, 7), (GCCGGAATAC, 1)
CCTGCCGGAA	(ATTAGACCTG, 0), (AGACCTGCCG, 0), (GCCGGAATAC, 7)
AGACCTGCCG	(ATTAGACCTG, 0), (CCTGCCGGAA, 7), (GCCGGAATAC, 4)
GCCGGAATAC	(ATTAGACCTG, 0), (CCTGCCGGAA, 1), (AGACCTGCCG, 0)

There are different ways you can build this matrix. Let's assume we have a function matchingLength that calculates the number of characters that overlap between two sequences. Then we can use this code:

val seqList = seqSet.collect().toList
val matrix = seqSet.map(seq => (seq, seqList.filterNot(_ == seq)))
matrix.map {
  case (keySeq, seqs) => (keySeq, seqs.map(seq => (seq, matchingLength(keySeq, seq))))
}

The nice thing is that if you are used to dealing with Scala collections, this will look very familiar. We are actually operating not on Scala collections but RDDs. More good news: RDDs are based on the FP technique of lazy evaluation, so no time is wasted on building intermediate collections. The work is postponed until you call collect() or something similar.

The next step in the algorithm is this: for each row in the matrix, eliminate all values except the one with the longest match. Also eliminate inadequate matches, i.e. where the common part is not more than half the length of a sequence. The reduced matrix looks like:

Key sequence	Sequence with overlap length
ATTAGACCTG	(AGACCTGCCG, 7)
CCTGCCGGAA	(GCCGGAATAC, 7)
AGACCTGCCG	(CCTGCCGGAA, 7)
GCCGGAATAC	(CCTGCCGGAA, 1)

This transformation can be achieved using the mapValues function:

val matrixMaximums = matrix.mapValues(seq => seq.maxBy(_._2))

The last line of the matrix is not a good match, so we can lop it off by applying a filter:

matrixMaximums.filter { case (key, (seq, length)) => length > seq.length / 2 }

Now we have the sequences that we can put together to make the final sequence. We still need to get them in the right order though. Which is the first sequence? It will be any sequence that does not appear as a value anywhere.

val values: RDD[String] = matrix.values.map(_._1)
val startingSeqs: RDD[String] = matrix.keys.subtract(values)

subtract is one of the many useful transformations available on an RDD, and does what you would expect. If you want to see all the available RDD transformations check the Spark API. There are examples of using each one here.

We extract the starting sequence (signalling an error if there wasn't one and only one), and then "follow" that through the matrix: from ATTAGACCTG to AGACCTGCCG to CTGCCGGAA to GCCGGAATAC where the trail ends. To implement this, we may call lookup on the RDD repeatedly inside of a recursive function that builds the String. Each sequence found is appended using the given length (7 in all cases here) to get a final sequence of ATTAGACCTGCCGGAATAC.

To see the final result we can just use println . But how to run it? First compile it to a jar file. sbt is a good choice for building. The full code is available as an sbt project here. When you've cloned or downloaded that, call sbt package in the normal way and then issue the command:

spark-submit --class "MergeFasta" --master local[4] target/scala-2.11/mergefasta_2.11-0.1.jar

The local[4] means that for test purposes we will just be running the program locally rather than on a real cluster, but trying to distribute the work to all 4 processors of the local machine.

When you run spark-submit it will produce a lot of output about how it is parallelizing tasks. You might prefer to see this in the Web UI later. If you want to turn off excessive stdout logging in the meantime, go to Spark's conf directory, copy log4j.properties.template to log4j.properties and set log4j.rootCategory to WARN. Then you will just see the result at the end.

The next question is: how fast was it really? The Spark history server can look at the application logs stored on the filesystem and construct a UI. There is more about configuration here. But the main thing is to run this from the Spark root:

./sbin/start-history-server.sh

Then navigate in your browser to http://localhost:18080/. You will see a list representing all the times you called spark-submit. Click on the latest run, the top one. Now there is a list of "jobs," the subtasks Spark parallelized and how long they took. The jobs run on RDDs will just be a few milliseconds each because of the strategy of deferring the bulk of the work to the end (lazy evaluation), where something like collect() is called.

When I first ran the code, by far the slowest thing in the list of jobs was the count() on line 78, taking 2 seconds. count() is not a transformation: it is an action and it is not lazy but eager, forcing all the previously queued operations to be performed. So it is necessary to dig a bit deeper to find out where the true bottleneck is. Clicking on the link reveals a graph of what Spark has been doing:

This suggests that the really slow thing is the subtract operation on line 76. Let's look at that code again:

val values: RDD[String] = matrix.values.map(_._1)
val startingSeqs: RDD[String] = matrix.keys.subtract(values)

You can see that the matrix RDD is used twice. The problem is that by default Spark is not keeping the matrix in memory, and it has to recompute it. To solve this, the cache function needs to be called just before the above code:

matrix.cache()

After running everything again, the statistics breakdown looks like this:

So, thanks to caching, the time has been nearly halved.

There are still many optimizations possible and perhaps you can think of a better algorithm entirely. However, hopefully this post has shown how Spark can be useful for parallelizing the solution to a simple problem. Again, the full code can be found here.

Scala.js and React: Building an Application for the Web

2015-05-13T16:10:00.001-07:00

Scala.js compiles Scala code to JavaScript. I noticed on the Reactive Programming course at Coursera that Scala.js has been integrated into it to implement a basic spreadsheet on a Web page, suggesting good support from the Scala establishment. The principal developer of Scala.js is a collaborator of Martin Odersky at EPFL, Sébastien Doeraene, and he will be speaking about it next month (June 2015) at Scala Days in Amsterdam.

Before getting into the sample application, let's talk about the motivation for Scala.js. Basically, the Web continues to be a powerful platform for application development. Despite its many problems, it has features that the desktop and mobile phone cannot match. For example, the install process for a web application is negligible from a user's perspective: just load the web page for the first time and it is already there. Also, web applications can connect with each other via hyperlinks and REST calls.

Unfortunately on the Web, JVM languages have traditionally been limited to the server-side. Functionality on the client-side is dominated by JavaScript. The trouble is, from a developer's perspective, two languages mean a lot of extra complexity. You can't share code between them, and to pass objects at runtime between client and server, you end up writing serialization and validation logic in both languages. It would be preferable to implement a web application in a single language.

How is this possible, given that browsers only understand JavaScript? One possibility is to run JavaScript on the server-side, which is the approach that Node.js takes. However, JavaScript meets with many complaints: it is essentially untyped, and you can't take advantage of the solidity and scalability that the JVM has been providing on the server-side for so long. It would be better to use a language on both sides that can take advantage of the performance of JVM, the safety of typing, and also the FP principles crossing over into the mainstream during the past ten years. This is made possible though the use of transpilers which convert one source language (Scala in the case of Scala.js) to JavaScript.

One of the big challenges in Web programming is coordinating events so that elements in the view (client-side) are updated when the model (server-side) changes. For desktop apps, the Observer design pattern is often used, but on the Web, it takes a bit more work, and we often employ the help of some MVC (Model-View-Controller) Web framework. The most general term for getting changes to propagate automatically like this (as opposed to manually making calls from the view all the time) is "Reactive Programming". A particular form of Reactive Programming is Functional Reactive Programming (FRP) which is about capturing relationships in a composable way between values that change over time ("signals"). A related approach is to use a message passing system like Akka that keeps components loosely coupled. In both cases the key goals are to avoid the inefficiency of blocking operations and the hazards of mutable data, so making the overall system scalable and resilient.

I would propose that the term FWP (Functional Web Programming) be used to cover systems that bring FP and Reactive Programming to the Web: including Elm, ClojureScript/Om, and Play Iteratees. The FWP implementation I have chosen is a combination of Scala.js and React: this article describes setting up a simple application without going into depth about FRP or being a full tutorial.

While Scala.js was being developed, back in the JavaScript world frameworks were being developed that tackled the issues of Reactive Programming. One of the most popular has been React, developed by Facebook and Instagram. When it was introduced in May 2013, it surprised seasoned developers as it seemed to violate establish best practices. In JavaScript updating the browser DOM is slow, so it was common to only update the necessary parts when the backing model changed. However, in React when any component's state is changed, a complete re-render is done from the application developer's perspective. It's almost like serving a whole new page, guaranteeing that every place data is displayed it will be up-to-date. This avoids the dangers of mutating state but it seems like it would be very slow: still, it actually outperforms other frameworks like AngularJS thanks to some a clever diffing algorithm involving a "virtual DOM" that is maintained independently of the browser's actual DOM.

Another advantage of React is that developers concentrate on re-using components rather than templates where messy logic tends to accumulate. Components are typically written in JSX, a JavaScript extension language, and then translated to actual JavaScript using a preprocessor. For instance, consider the score text at the top left in the Libanius app (see screenshot). In JSX this would be written:

var ScoreText = React.createClass({
  render: function() {
    return (
      <span className="score-text">
        Score: {this.props.scoreText}
      </span>
    );
  }
});
React.render(
  <ScoreText />,
  document.getElementById('content')
);

This is JavaScript plus some syntactic sugar. Notice how the score variable is passed in using the props for the component.

The code above is converted to normal JavaScript by running the preprocessor. On the command-line you would typically run something like this to watch your source directory and translate code to raw JavaScript in the build directory whenever anything changes:

> jsx --watch src/ build/

This is using standard React so far. However this is not the way we do it with Scala.js.

Firstly, there exists a Scala library from Haoyi Li called Scalatags, that includes Scala equivalents for HTML tags and attributes. Let's assume we have a file QuizScreen.scala in which we are writing the view. The core code may start off looking a bit like this:

@JSExport
object QuizScreen {
  @JSExport
  def main(target: html.Div) = {
    val quizData = // … Ajax call
    target.appendChild(
      span(`class` := "alignleft", "Score: " + quizData.score)
      // ... more view code here
    )
  }
}

Notice that span is a Scala method (the Scalatags library has to be imported). Assuming you've configured SBT to use Scala.js (see below), this is converted to JavaScript in SBT by calling:

> fastOptJS

A good thing about the Scala.js compiler is that it keeps the target JavaScript small by eliminating any code from included libraries that is not used. To stop this from happening on the entry point itself in QuizScreen, it is necessary to use the @JSExport annotation both on the object and the main method. This guarantees that main() will be callable from JavaScript.

So now we have seen the React way and the Scala.js way. How do we combine them? A good option is to use the scalajs-react library from David Barri. Now the ScoreText component looks like this:

  
val ScoreText = ReactComponentB[String]("ScoreText")
    .render(scoreText => <.span(^.className := "alignleft", "Score: " + scoreText))
    .build

Compare this with the JSX version. The Scala code is more concise. Notice that the render() method is present. It's also possible to use other React lifecycle methods if necessary, like componentDidMount() for initializing the component.

Notice also that it uses a Scala span. A specialized version of Scalatags is used here. At first the extra symbols look intimidating, but just remember that < is used for tags and ^ is used for attributes, and they are imported like this:

import japgolly.scalajs.react.vdom.prefix_<^._

In classic JavaScript React, a component can hold state, as seen in references to this.state and the getInitialState() method, which might look like this.

  getInitialState: function() {
    return {data: []};
  }

The Scala version lets us define the state more clearly because it is a strongly typed language. For example, the state for the QuizScreen looks like this:

case class State(userToken: String, currentQuizItem: Option[QuizItemReact] = None,
    prevQuizItem: Option[QuizItemReact] = None, scoreText: String = "",
    chosen: Option[String] = None, status: String = "")

It is best to centralize the state for the screen like this and pass it down to components, rather than having each sub-component have its own separate state object. That could get out of hand!

By the way, you can compose components just as you can in classic React. The central component of the QuizScreen object is the QuizScreen component, and it contains the ScoreText component along with the various other bits and pieces. The code below shows how this is all put together.

val QuizScreen = ReactComponentB[Unit]("QuizScreen")
  .initialState(State(generateUserToken))
  .backend(new Backend(_))
  .render((_, state, backend) => state.currentQuizItem match {
    // Only show the page if there is a quiz item
    case Some(currentQuizItem: QuizItemReact) =>
      <.div(
        <.span(^.id := "header-wrapper", ScoreText(state.scoreText),
          <.span(^.className := "alignright",
            <.button(^.id := "delete-button",
              ^.onClick --> backend.removeCurrentWordAndShowNextItem(currentQuizItem),
                  "DELETE WORD"))
        ),
        QuestionArea(Question(currentQuizItem.prompt,
            currentQuizItem.responseType,
            currentQuizItem.numCorrectResponsesInARow)),
        <.span(currentQuizItem.allChoices.map { choice =>
          <.div(
            <.p(<.button(
              ^.className := "response-choice-button",
              ^.className := cssClassForChosen(choice, state.chosen,
                  currentQuizItem.correctResponse),
              ^.onClick --> backend.submitResponse(choice, currentQuizItem), choice))
          )}),
        PreviousQuizItemArea(state.prevQuizItem),
        StatusText(state.status))
    case None =>
      if (!state.quizEnded)
        <.div("Loading...")
      else
        <.div("Congratulations! Quiz complete. Score: " + state.scoreText)
  })
  .buildU

The central component (QuizScreen) contains the other components (implementations not shown) and also has access to a State and a Backend. The backend contains logic that is a bit more extended. For example, in the code above, observe that submitResponse is called above on the backend when a button is clicked by the user. The code invoked is:

class Backend(scope: BackendScope[Unit, State]) {
    def submitResponse(choice: String, curQuizItem: QuizItemReact) {
      scope.modState(_.copy(chosen = Some(choice)))
      val url = "/processUserResponse"
      val response = QuizItemAnswer.construct(scope.state.userToken, curQuizItem, choice)
      val data = upickle.write(response)

      val sleepMillis: Double = if (response.isCorrect) 200 else 1000
      Ajax.post(url, data).foreach { xhr =>
        setTimeout(sleepMillis) { updateStateFromAjaxCall(xhr.responseText, scope) }
      }
    }

    def updateStateFromAjaxCall(responseText: String, scope: BackendScope[Unit, State]): Unit = {
      val curQuizItem = scope.state.currentQuizItem
      upickle.read[DataToClient](responseText) match {
        case quizItemData: DataToClient =>
          val newQuizItem = quizItemData.quizItemReact
          // Set new quiz item and switch curQuizItem into the prevQuizItem position
          scope.setState(State(scope.state.userToken, newQuizItem, curQuizItem,
              quizItemData.scoreText))
      }
    }
    // more backend methods...
  }

submitResponse makes an Ajax POST call to the server, collects the results and updates the State object. The React framework will take care of the rest, i.e. updating the DOM to reflect the changes to State.

In making the Ajax call, the upickle library (again from Haoyi Li) is used for serialization/deserialization. This is also used on the server side of our Scala.js application. The core of the server side is a Spray server. A simple router is defined which recognizes the call to processUserResponse made above:

object Server extends SimpleRoutingApp {
  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem()
    lazy val config = ConfigFactory.load()
    val port = config.getInt("libanius.port")

    startServer("0.0.0.0", port = port) {
      // .. get route not shown here
      post {
        path("processUserResponse") {
          extract(_.request.entity.asString) { e =>
            complete {
              val quizItemAnswer = upickle.read[QuizItemAnswer](e)
              upickle.write(QuizService.processUserResponse(quizItemAnswer))
            }
          }
        }
      } 
    }
  }
}

The "processUserResponse" path extracts the post data using upickle then passes the call on to the QuizService singleton which contains the mid-tier business logic, and relies on the core Libanius library to run the main back-end functionality on the Quiz. I won't go into detail about this logic, but note that both for historical reasons and future portability it relies on simple files to hold quiz data rather than a database system.

Back to the Spray server: when the QuizScreen page is initially loaded, this route is used:

     get {
        pathSingleSlash {
          complete{
            HttpEntity(
              MediaTypes.`text/html`,
              QuizScreen.skeleton.render
            )
          }
        }

The QuizScreen mentioned here is not the QuizScreen on the client-side that is described above. In fact, it is a server-side QuizScreen that makes a call to the client-side QuizScreen. Like this:

object QuizScreen {

  val skeleton =
    html(
      head(
        link(
          rel:="stylesheet",
          href:="quiz.css"
        ),
        script(src:="/app-jsdeps.js")
      ),
      body(
        script(src:="/app-fastopt.js"),        
        div(cls:="center", id:="container"),
        script("com.oranda.libanius.scalajs.QuizScreen().main()")
      )
    )
}

Again the tags are from Scalatags. The main call is in the last script tag. Recall that on the client-side we use @JSExport to make the QuizScreen().main() available:

  
  @JSExport
  def main(): Unit = {
    QuizScreen() render document.getElementById("container")  
  }

Also notice in the skeleton above, there are two included JavaScript libraries:

app-fastopt.js: In a Scala.js application, the *-fastopt.js file is the final output of the fastOptJS task, containing the JavaScript code that has been generated from your Scala code.
app-jsdeps.js: In a Scala.js application, the *-jsdeps.js, contains all additional JavaScript libraries: in our case, the only thing it incorporates is react-with-addons.min.js.

Here are the essentials of the SBT configuration, which can be used as a starting point for other Scala.js projects, as it just uses the most basic dependencies, including Scalatags, upickle for serialization, and utest for testing.

import sbt.Keys._

name := "Libanius Scala.js front-end"

// Set the JavaScript environment to Node.js, assuming that it is installed, rather than the default Rhino 
scalaJSStage in Global := FastOptStage  

// Causes a *-jsdeps.js file to be generated, including (here) React
skip in packageJSDependencies := false

val app = crossProject.settings(
  unmanagedSourceDirectories in Compile +=
    baseDirectory.value  / "shared" / "main" / "scala",

  libraryDependencies ++= Seq(
    "com.lihaoyi" %%% "scalatags" % "0.5.1",
    "com.lihaoyi" %%% "utest" % "0.3.0",
    "com.lihaoyi" %%% "upickle" % "0.2.8"
  ),
  scalaVersion := "2.11.6",
  testFrameworks += new TestFramework("utest.runner.Framework")
).jsSettings(
  libraryDependencies ++= Seq(
    "org.scala-js" %%% "scalajs-dom" % "0.8.0",
    "com.github.japgolly.scalajs-react" %%% "core" % "0.8.3",
    "com.github.japgolly.scalajs-react" %%% "extra" % "0.8.3",
    "com.lihaoyi" %%% "scalarx" % "0.2.8"
  ),
  // React itself (react-with-addons.js can be react.js, react.min.js, react-with-addons.min.js)
  jsDependencies += "org.webjars" % "react" % "0.13.1" / "react-with-addons.js" commonJSName "React"

).jvmSettings(
  libraryDependencies ++= Seq(
    "io.spray" %% "spray-can" % "1.3.2",
    "io.spray" %% "spray-routing" % "1.3.2",
    "com.typesafe.akka" %% "akka-actor" % "2.3.6",
    "org.scalaz" %% "scalaz-core" % "7.1.2"
  )
)

lazy val appJS = app.js.settings(

  // make the libanius core JAR available
  // ...
  unmanagedBase <<= baseDirectory(_ / "../shared/lib")
)

lazy val appJVM = app.jvm.settings(

  // make sure app-fastopt.js, app-jsdeps.js, quiz.css, the libanius core JAR, application.conf 
  // and shared source code is copied to the server
  // ...
)

As you can see, the special thing about a Scala.js client-server SBT configuration is that it is divided into three parts: js, jvm, and shared. The js folder contains code to be compiled by ScalaJS, the jvm folder contains regular Scala code used on the server-side, and the shared folder contains code and configuration that should be accessible to both js and jvm. This is achieved by using the crossProject builder from Scala.js, which constructs two separate projects, the js one and the jvm one.

So far we've been assuming that any generated JavaScript will run in a browser. However, Scala.js also works with "headless runtimes" like Node.js or PhantomJS to ensure you can run it from the command-line on the server-side too: this is important in testing. Notice the scalaJSStage in Global := FastOptStage line above.

Now for a grand overview of the web application, let's look at the directory structure. You can see how slim the application really is: there are only a few key source files.

libanius-scalajs-react/  
  build.sbt
  app/
    js/
      src/
        main/
          scala/
            com.oranda.libanius.scalajs/
              QuizScreen.scala
      target/
    jvm/
      src/
        main/
          resources/
            application.conf
          scala/
            com.oranda.libanius.sprayserver/
              QuizScreen.scala
              QuizService.scala
              Server.scala
      target/
    shared/
      lib/
        libanius-0.982.jar
      src/
        main/
          resources/
            quiz.css
          scala/
            com.oranda.libanius.scalajs/
              ClientServerObjects.scala
                QuizItemReact

Again notice there is a QuizScreen on both the server-side and client-side: the former calls the latter.

One thing that I didn't mention yet is the quiz.css file that is used in the client-side QuizScreen. This is just an old-fashioned CSS file, but of course it also possible to use LESS files. Furthermore, if you don't anticipate having a graphic designer want to change your styles, you could even go the whole way in making your application type safe, and write your styles in Scala with ScalaCSS.

The full code for this front-end to Libanius is on Github. As of writing there is a deployment on Heroku (may not be supported indefinitely). For a full tutorial on Scala.js, see Hands-on Scala.js from Haoyi Li. There is also a small official tutorial.

Speaking Actors Living in the Phone

2015-04-06T06:58:00.000-07:00

A Hollywood actor who couldn't speak wouldn't get far. How soon before we say the same of mobile apps?

This article is a bit off-topic for this blog, but it's a bit of fun and the subject deserves more attention. Over the past 18 months the Google Text-to-Speech API has come a long way on the Android platform. I decided to take advantage of it in the Libanius quiz app. For example, when the app presents a quiz item, it should speak the prompt ("Clairvoyance" in the picture).

Libanius uses the actor model from Akka to coordinate its subsystems. To see how Android can be set up to use Akka actors, see this from Typesafe, or this from me. Assuming the Akka infrastructure is in place, a speaking actor can be created like this:

class Voice(implicit ctx: Context) extends Actor with TextToSpeech.OnInitListener {

  val tts = new TextToSpeech(ctx, this)

  override def receive = {
    case Speak(text: String, quizGroupKeyType: String) => // see next code snippet
    case _ => logError("Voice received an unknown command")
  }
}

The Context of the Android activity is passed to the actor because the TextToSpeech class needs it for initialization. Speak is just a case class for the message that is sent to the actor. We have to implement what happens when that message is received.

The prompt could be in various different languages. The Google API supplies a different voice for each language, so we need to take advantage of this.

  case Speak(text: String, quizGroupKeyType: String) =>
    KnownQuizGroups.getLocale(quizGroupKeyType).foreach(speak(text, _))

KnownQuizGroups is a simple map of Libanius quiz group types (e.g. "Spanish") to locales
recognized by TextToSpeech.

The speak method finally uses the TextToSpeech API to set the voice according to the given locale, and actually speak the text. It works fine whether the text is a word or a full sentence.

def speak(text: String, locale: Locale) {
  setSpeechLanguage(locale)
  tts.speak(text, TextToSpeech.QUEUE_FLUSH, null)
}

private[this] def setSpeechLanguage(locale: Locale): Unit = {
  val result: Int = tts.setLanguage(locale)
  if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED)
    logError("Language is not available.")
}

Also we must remember in the Speak code to reset the language after the message is spoken.

  setSpeechLanguage(DEFAULT_LOCALE)

While we're adding sound to the Libanius quiz, how about a ping sound for a correct answer, and a buzz sound for a wrong answer? This is a job for another actor: SoundPlayer. The guts of it look like this:

class SoundPlayer(implicit ctx: Context) extends Actor {

  val audioManager: AudioManager =
    ctx.getApplicationContext.getSystemService(Context.AUDIO_SERVICE).asInstanceOf[AudioManager]
  val soundPool: SoundPool = new SoundPool(10, AudioManager.STREAM_MUSIC, 0)
  var soundPoolMap = Map[SoundSample, Int]()

  override def receive = {
    case Load() => 
      loadSounds()
    case Play(soundSample: SoundSample) =>
      val curVolume = audioManager.getStreamVolume(AudioManager.STREAM_MUSIC)
      soundPool.play(soundPoolMap(soundSample), curVolume, curVolume, 1,  0, 1f)
    case _ =>
      logError("SoundPlayer received an unknown command")
  }
}

There is more complete code on Github. This is all for the Android platform, but of course you can also use Akka actors for sound on other platforms like the Mac: see Alvin Alexander's Wikipedia Reader page for some sample code. Have fun!

The QuizItemSource Typeclass

2014-09-23T14:30:00.001-07:00

When talking about scaling an application, people are usually referring to performance: how to design the software so that limited hardware can support millions of users? But it is also important to be able to scale the feature set of an application. If users say they want a new button, you don't want to have to make extensive refactorings of your core model classes as is all too often the case in industry. The key to both of these related demands is decoupling.

While writing the Libanius quiz application, I found typeclasses to be very useful in decoupling behaviour from the model classes. Typeclasses (formally spelt "type classes") originated in Haskell, but can be used in Scala too, being a language that supports FP well. I'll describe one use of typeclasses in Libanius, an example that should be less "theoretical" and more extended than the ones that are usually given.

To start with, the core model in Libanius is a simple containment hierarchy as shown in the diagram. The Quiz data structure contains all the information necessary to provide a stream of quiz items to a single user. It is split by topic into QuizGroups, which, for performance reasons, are further partitioned into "memory levels" according to how well the user has learned the items. The flow of a typical request then is Quiz.findQuizItem() followed by QuizGroup.findQuizItem() calls, followed by QuizGroupMemoryLevel.findQuizItem() calls.

In practice the findQuizItem() signatures and functionality were similar but not identical. Rather than have them scattered throughout the model classes, it was desirable to put them together in a single place where the similarites could be observed and moulded, avoiding any unnecessary repetition. The same was true of other operations in the model classes. Pulling them out would also solve the problem of too much bloat in the model.

Furthermore, although the Quiz data structures are optimized for finding quiz items, I wanted to be able to extend the same action to other entities. A wide range of scenarios can be imagined:

A Twitter feed of quotes might be seen as a mapping from speakers to quotations, which could be turned into a Who-Said-What quiz.
A set of web pages with recipes could be read as a "What ingredient goes with what dish?" quiz.
In a chat forum context, a person might be thinking of quiz questions and feeding them through.
A clever algorithm might not simply be "finding" quiz items in a store, but dynamically generating them based on previous answers.
The Quiz might not be a Scala data structure, but a database. This could be a NoSQL in-memory database like Redis or leveldb. It could also be a traditional RDBMS like Postgres accessed via Slick.

What links all these cases together is that a findQuizItem() function is necessary but with a new implementation. So the immediate priority is to decouple the findQuizItem() action from the simple Quiz data structure. Notice that this action might be applied to structures that didn't even know they were supposed to be quizzes. We don't want to have to retroactively modify classes like TwitterFeed and ChatParticipant to make them "quiz aware".

If you know a lot of Java, some design patterns might occur to you here, like Adapter, Composite, or Visitor. In the Visitor pattern, in particular, you can separate out arbitary operations from the data structures and form a parallel hierarchy to run those operations. But the original data structures still have to have an accept() method defined. In Scala a more convenient alternative, graced with all the polymorphism you could ever need, is the typeclass construct. Typeclasses excel at separating out orthogonal concerns.

Firstly a trait is defined for the typeclass with the behaviour we wish to separate out. Let's begin by calling the main function produceQuizItem() rather than findQuizItem(), to abstract it more from the implementation:

trait QuizItemSource[A] {
  def produceQuizItem(component: A): Option[QuizItem]}
}

The component could be a Quiz, a QuizGroup, a QuizGroupMemoryLevel or anything else. Instead of calling produceQuizItem() on the component, the component is passed as a parameter to the operation.

Next, define some sample implementations (aka typeclass instances) for the entities (aka model components) that we already have:

object modelComponentsAsQuizItemSources {

  implicit object quizAsSource extends QuizItemSource[Quiz] {
    def produceQuizItem(quiz: Quiz): Option[QuizItem]} = 
      … // implementation calls methods on quiz
  }

  implicit object quizGroupAsSource extends QuizItemSource[QuizGroup] {
    def produceQuizItem(quizGroup: QuizGroup): Option[QuizItem]} = 
      … // implementation calls methods on quizGroup
  }

  implicit object quizGroupMemoryLevelAsSource extends QuizItemSource[QuizGroupMemoryLevel] {
    def produceQuizItem(quizGroupMemoryLevel: QuizGroupMemoryLevel): Option[QuizItem]} = 
      … // implementation calls methods on quizGroupMemoryLevel
  }
}

These sample implementations can be accessed in client code once it is imported in this way:

import modelComponentsAsQuizItemSources._

However, we don't want to call implementations directly, like quizGroupAsSource.produceQuizItem(myQuizGroup). Instead we want to just call produceQuizItem(myQuizGroup) and let the correct implementation be selected automatically: this is what polymorphism is about. To facilitate this, let's define a function in the middle for accessing the implementations (and this is really the final piece of the puzzle for typeclasses!):

object QuizItemSource {

  def produceQuizItem[A](component: A)(implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)
}

If you're coming from other languages, it's the use of the implicit that is at first difficult to grasp, but this is really the Scala feature that makes typeclasses convenient to use, where they are not in, say, Java. Let me explain what is happening here:

Suppose in client code, the modelComponentsAsQuizItemSources has been imported as above. Now the quizGroupAsSource is implicitly in scope, among other implementations. If there is now a call to QuizItemSource.produceQuizItem(quizGroup), the compiler, seeing that quizGroup is of type QuizGroup will match qis to an implicit of type QuizItemSource[QuizGroup]: this is quizGroupAsSource, the correct implementation.

The great thing is that the correct QuizItemSource implementation will always be selected based on the type of entity passed to produceQuizItem, and also the context of the client, specifically the imports it has chosen. Also, we have achieved our original goal that whenever we realize we want to turn an entity into a quiz (TwitterFeed, ChatParticipant, etc.) we don't have to modify that class: instead, we add an instance of the typeclass to modelComponentsAsQuizItemSources and import it.

This approach is a break with OO, moving the emphasis from objects to actions. Instead of adding actions (operations) to objects, we are adding objects to an action generalization (typeclass). We can make that action as powerful as we like by continuing to add objects to it. Again, unlike in OO, those objects do not need to be custom-built to be aware of our action.

That is the main work done. There are two more advanced points to cover.

Firstly, it is possible to add some "syntactical sugar" to the definition of produceQuizItem() above. In this code

def produceQuizItem[A](component: A)(implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)

… the qis variable may be viewed as clutter. We can push it into the background using context bound syntax:

def produceQuizItem(A : QuizItemSource](component: A): QuizItem =
  implicitly[QuizItemSource].produceQuizItem(component)

What happens is that the context bound A : QuizItemSource establishes an implicit parameter, which is then pulled out by implicitly. This syntax is a bit shorter, and looks shorter still if you do not need to call implicitly but are simply passing the implicit parameter down to another function.

Secondly, it is possible to have a typeclass with multiple type parameters. Context bound syntax is not convenient for this case, so, first, let's go back to the original syntax:

def produceQuizItem[A](component: A)(implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)

Suppose a Quiz were passed in, the quizAsSource instance will be selected. You will remember this looked like this:

implicit object quizAsSource extends QuizItemSource[Quiz] {
  def produceQuizItem(quiz: Quiz): Option[QuizItem]} = 
    … // implementation calls methods on quiz
}

However, I simplified a bit here. The original implementation did not in fact return simple quiz items in the case of a Quiz. Whenever the Quiz retrieved a quiz item from a QuizGroup, it would add some information. Specifically it would generate some "wrong choices" for the question and return that to the user to allow a multiple-choice presentation of the quiz item. So the QuizItem would be wrapped in a QuizItemViewWithChoices object. The signature had to be:

  def produceQuizItem(quiz: Quiz): Option[QuizItemViewWithChoices]} = …

Now the problem is this does not conform to the typeclass definition. So let's massage the typeclass definition a bit by turning the return type into a new parameter, C:

trait QuizItemSource[A, C] {
  def produceQuizItem(component: A): Option[C]
}

Then the typeclass instance for Quiz will look as desired:

implicit object quizAsSource extends QuizItemSource[Quiz, QuizItemViewWithChoices] {
  def produceQuizItem(quiz: Quiz): Option[QuizItemViewWithChoices]} = 
    … // implementation calls methods on quiz
}

The intermediate function for accessing the typeclass becomes a bit more complex. Recall this was previously:

  def produceQuizItem[A](component: A)
      (implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)

The second type parameter for the return type is added as C:

  def produceQuizItem[A, C](component: A)
      (implicit qis: QuizItemSourceBase[A, C], c: C => QuizItem): Option[C] =
    qis.produceQuizItem(component)

C is constrained so that the return value must be compatible with QuizItem. To make QuizItemViewWithChoices compatible with QuizItem, it is not necessary to make it a subclass with lots of boilerplate as in Java. Instead an implicit conversion is defined next to the QuizItemViewWithChoices class like this:

object QuizItemViewWithChoices {
  // Allow QuizItemViewWithChoices to stand in for QuizItem whenever necessary
  implicit def qiView2qi(quizItemView: QuizItemViewWithChoices): QuizItem =
    quizItemView.quizItem
}

Now it all works!

That ends the description of the QuizItemSource typeclass. For more examples, see the other typeclasses in the Libanius source code such as CustomFormat and ConstructWrongChoices, which cover other concerns orthogonal to the model, or view this excellent video tutorial by Dan Rosen.

The Akka Event Bus on Android

2014-03-03T15:19:00.000-08:00

Event buses are useful for keeping components in a system loosely coupled, and we generally see them in two areas: distributed systems, and UIs. This post is about the use of an event bus in a UI, specifically Android.

There exist event bus libraries for Android like Otto and greenrobot. There also exists an Event Bus API for Akka. What I haven't seen yet is the Akka Event Bus set up on Android. So, here it is!

First a word on the motivation. In the Libanius Android application, I recently separated part of a screen (in Android's parlance, an Activity) into its own separate class. A problem was that, in response to certain events, the new class needed to add data back to the data structure in the main class for the Activity. My initial approach was to define a closure that could do the work of adding the data, and pass the closure as an argument to the new class (the sub-Activity). However, this felt a bit ad hoc, so I decided it was time to introduce a general event-handling mechanism into Libanius Android. Having helped create an event bus for the GWT UI framework before, I thought it would be interesting to see if the same idea could be applied in Android.

The Libanius Event Bus just extends Akka's ActorEventBus. It is nearly as simple as this:

class LibaniusActorEventBus extends ActorEventBus with LookupClassification
    with AppDependencyAccess {

  type Event = EventBusMessageEvent
  type Classifier = String

  protected def classify(event: Event): Classifier = event.channel

  protected def publish(event: Event, subscriber: Subscriber) {
    subscriber ! event
  }
}

The diagram below shows what use we are actually making of the event bus:

The Android parts are OptionsScreen and DictionarySearch. In the UI, DictionarySearch is actually part of the OptionsScreen activity, but they communicate through the event bus. When the user clicks on a button from DictionarySearch to add a new word to the quiz, a NewQuizItemMessage is constructed. This event is just a wrapper for the quiz item and an identifying header:

case class NewQuizItemMessage(header: QuizGroupHeader, quizItem: QuizItem)
  extends EventBusMessage

The NewQuizItemMessage is published to the QUIZ_ITEM_CHANNEL by calling sendQuizItem() in LibaniusActorSystem. LibaniusActorSystem is a facade to LibaniusActorEventBus and other actors in the system not discussed here.

object LibaniusActorSystem extends AppDependencyAccess {
  val system: ActorSystem = ActorSystem("LibaniusActorSystem")

  val appActorEventBus = new LibaniusActorEventBus
  val QUIZ_ITEM_CHANNEL = "/data_objects/quiz_item"

  def sendQuizItem(qgh: QuizGroupHeader, quizItem: QuizItem) {    
    val newQuizItemMessage = NewQuizItemMessage(qgh, quizItem)
    appActorEventBus.publish(EventBusMessageEvent(QUIZ_ITEM_CHANNEL, newQuizItemMessage))
  }
}

That's the guts of it. There is some more code in OptionsScreen to create an actor to subscribe to the QUIZ_ITEM_CHANNEL, and actually add the QuizItem when it sees the event. For more detail, see the code in the Libanius Android project.

Extending Play with AngularJS: Nested MVCs in a Webapp

2014-01-03T23:39:00.000-08:00

I finally got a bit of spare time to tighten up the JavaScript in Libanius-Play, the web interface to my quiz application. For a long time, I was dissatisfied with the way data was being passed around from server to client and then back from client to server: there was too much repetition. The JavaScript library jQuery was of use to create AJAX callbacks, but still, any time there was a new parameter, it had to be added in too many places.

AngularJS is another JavaScript library. Like jQuery it is geared towards asynchronous communication, but it provides an MVC framework. The Play Framework is also MVC, so the result is an MVC-within-MVC architecture (see diagram). It seems to work quite well.

The AngularJS controller has a services layer to fetch data from the Scala back-end. Then it updates the page. The nice thing about it is that I don't have to update variables in the web page one by one: I can just bind variables in the web page to attributes in the model. This is achieved in the HTML using AngularJS tags like ng-model and the use of curly braces like this:

    
    {{ quizData.prompt }}

There are a number of variables like this scattered through the HTML, but they can all be updated at once in a single line of JavaScript if the Play backend returns a JSON structure. (The data structure can be defined as a case class in Scala on the server, and will be dynamically mirrored as a JavaScript structure on the client-side without the need to map attributes manually.) Here is a simplified version of what happens in the JavaScript controller after the server has processed a user response: basically it returns a new quiz item.

    
    services.processUserResponse($scope.quizData, response).then(function(freshQuizData) {
       $scope.quizData = freshQuizData.data // updates values in the web page automagically
       labels.setColors($scope.quizData)    // extra UI work using a custom JavaScript module called labels
    }

In the first line, services.processUserResponse is a function in the services.js file which sends user input to the server, and immediately returns a promise of new data. The promise has a handy then function which takes a callback argument, describing what happens when the data actually comes through, i.e. updating the view.

All in all, a good experience with AngularJS.

Private Constructors

2013-12-18T08:33:00.000-08:00

Private constructors: mostly you won't need them. Commonly in Scala we just define a class and let the constructor and parameters be exposed to the world:

case class SocialFeedCache(tweets: List[Tweet]) { ... }

But suppose the tweets need to be organized for efficiency, and we don't want the cache to be constructed in an invalid state? Then we might have a factory method that performs the necessary initialization. In Java this would be a static method. In Scala we put it in a companion object:

object SocialFeedCache {
  def apply(tweets: List[Tweet]): SocialFeedCache = {
    // ... organize the tweets ...
    SocialFeedCache(organizedTweets) // call the primary constructor
  }
}

So now we must always call the factory method rather than the primary constructor. How can we enforce this? How can we block anyone from calling the constructor directly? A first attempt might look like this:

private case class SocialFeedCache(tweets: List[Tweet]) { ... }

But this doesn't just hide the constructor; it hides the whole class! In fact there will be a compile-time error, because the factory method in object SocialFeedCache is trying to expose a class that is now hidden.

The correct syntax is actually:

case class SocialFeedCache private(tweets: List[Tweet]) { ... }

And that's all there is to it. This is a useful technique whenever you have special initialization taking place in factory methods.

Akka in Your Pocket

2013-08-17T08:05:00.000-07:00

Although Akka is usually thought of in connection with massively scalable applications on the server side, it is a generally useful model for concurrency that can be deployed in an ordinary PC application or even a mobile app.

My Nexus 4 phone has four cores. If I have some CPU-intensive processing in an Android app, I observe that I can speed it up nearly 300% by spreading load across four actors. This can mean the difference between one second response time and quarter-second response time, quite noticeable to the user. Therefore, along with regular Scala futures, Akka is a useful tool that Scala provides to Android developers to make the most of their users' computing resources.

One tricky bit is getting the initial configuration correct with SBT. After some experimentation, I found that these Proguard settings, adapted from an early version of this project, work. (ProGuard shrinks and optimizes an application, but when you include a library like Akka, it needs to be reminded to keep certain classes using "keep options.")

 
      proguardOption in Android := 
      """
        |-keep class com.typesafe.**
        |-keep class akka.**
        |-keep class scala.collection.immutable.StringLike {*;}
        |-keepclasseswithmembers class * {public <init>(java.lang.String, akka.actor.ActorSystem$Settings, akka.event.EventStream, akka.actor.Scheduler, akka.actor.DynamicAccess);}
        |-keepclasseswithmembers class * {public <init>(akka.actor.ExtendedActorSystem);}
        |-keep class scala.collection.SeqLike {public protected *;}
        |
        |-keep public class * extends android.app.Application
        |-keep public class * extends android.app.Service
        |-keep public class * extends android.content.BroadcastReceiver
        |-keep public class * extends android.content.ContentProvider
        |-keep public class * extends android.view.View {public <init>(android.content.Context);
        | public <init>(android.content.Context, android.util.AttributeSet); public <init>
        | (android.content.Context, android.util.AttributeSet, int); public void set*(...);}
        |-keepclasseswithmembers class * {public <init>(android.content.Context, android.util.AttributeSet);}
        |-keepclasseswithmembers class * {public <init>(android.content.Context, android.util.AttributeSet, int);}
        |-keepclassmembers class * extends android.content.Context {public void *(android.view.View); public void *(android.view.MenuItem);}
        |-keepclassmembers class * implements android.os.Parcelable {static android.os.Parcelable$Creator CREATOR;}
        |-keepclassmembers class **.R$* {public static <fields>;}
      """.stripMargin

Note: this has only been tested to work with Akka 2.1.0 and Scala 2.10.1. If you use other versions, you may have do some more tinkering/googling.

Using 2.10 Futures in Android

2013-08-03T07:04:00.001-07:00

In an Android app, it is usually necessary to run some tasks in the background so they don't block
the UI. The standard way to do this in Java code is with an AsyncTask. Basically you fill out the body of an AsyncTask with two parts: what needs to happen in the background, and what needs to happen in the foreground once that is complete. The return value of the background part is passed as a parameter to the foreground part:

Here is an example expressed in Scala, as always more concise than Java:

    new AsyncTask[Object, Object, String] {
      override def doInBackground(args: Object*): String = calculateScore
      override def onPostExecute(score: Int) { showScore(score) }
    }.execute()

From a Scala perspective, it might occur to you that this is a bit similar to what a Future does. Futures are more general and concise, so in my opinion they are to be preferred. Indeed, following the example of Terry Lin, I tried to get the identical functionality as the above working using the Future trait in Scala 2.10, and it worked fine. The new code has this structure:

    future {
      calculateScore
    } map {
      runOnUiThread(new Runnable { override def run() { showScore(score) } }))
    }

This code is a bit shorter. The one thing to remember is that after the score is returned, we have
to remind Android to return control to the main thread -- that's what the runOnUiThread() is for.

A Benefit of Implicit Conversions: Avoiding Long Boilerplate in Wrappers

2013-07-04T06:39:00.002-07:00

In OO practice we use wrapper classes a lot. So much in fact, that the Gang of Four identified several types of wrappers (differentiated more by purpose than structure): Adapters, Decorators, Facades, and Proxies. In this blog post I am going to concentrate on the Proxy design pattern, but the same technique of using implicit def to cut down code repetition can be applied for many different types of wrappers.

As an example, I'm going to take a data structure from my open-source quiz app Libanius. It's not necessary to go into the full data model of the app, but one of the central domain objects is called a WordMappingValueSet, which holds pairs of words and associated user data. This data is serialized and deserialized from a custom format in a simple file.

The problem is that the deserialization, which involves parsing, is rather slow for tens of thousands of WordMappingValueSet's, especially on Android, the target platform. The app performs this slow initialization on startup, but we don't want to keep the user waiting before starting the quiz. How can the delay be avoided? The solution is to wrap the WordMappingValueSet in a Lazy Load Proxy, which gives the app the illusion that it immediately possesses all these WordMappingValueSet objects when in fact all it has is a collection of proxies that each contain the serialized String for a WordMappingValueSet. The expensive work of deserializing the WordMappingValueSet is done by the proxy only when each object is really needed.

The question now is how to implement the Lazy Load Proxy. The classic object-oriented way
to do it, as documented by the Gang of Four book, is to create an interface to the WordMappingValueSet that contains all the same methods, and then a WordMappingValueSetLazyProxy implementation of that interface. The client code instantiates the proxy object but it makes calls on the interface as if it were calling the real object. In this way, the illusion is preserved so not much has to change in the client code, but "under the hood" what is happening is that the proxy is deserializing the String and then forwarding each call to the real object.

In the above diagram, the function fromCustomFormat() does the work of deserialization. In Java this would be a static method; in Scala it is placed within the singleton object WordMappingValueSet. Inside WordMappingValueSetLazyProxy, the serialized String is stored, and fromCustomFormat() is called the first time one of the business methods is called, before the call is delegated to the method of the same name in WordMappingValueSet.

This kind of wrapper is very common in the Java world. But there is a problem. There are a lot of methods in WordMappingValueSet. These all need to be duplicated in IWordMappingValueSet and WordMappingValueSetLazyProxy. It violates the principle of DRY (Don't Repeat Yourself). Implementing this in Scala, I would go from about a hundred lines of code to two hundred lines. (In Java I would be going from two hundred lines to four hundred lines.) Furthermore, every time I need to add a method to WordMappingValueSet, I now need to do the same to the other two types, so the lightweight flexibility of the data structure has been lost.

In Scala, there is a better solution: use an implicit conversion. Firstly, as before, the WordMappingValueSet is wrapped inside a WordMappingValueSetLazyProxy:

case class WordMappingValueSetLazyProxy(valuesString: String) {  
  lazy val wmvs: WordMappingValueSet = WordMappingValueSet.fromCustomFormat(valuesString)
}

But that's all that's necessary for that class. We won't be defining the same methods all over again. Notice also that the lazy keyword, provided by Scala, means that fromCustomFormat() won't actually be called until some method is called on wmvs, i.e. until it is really needed.

But the real trick to make this work is to define the implicit conversion:

object WordMappingValueSetLazyProxy {
  implicit def proxy2wmvs(proxy: WordMappingValueSetLazyProxy): WordMappingValueSet = proxy.wmvs
}

This "converts" a WordMappingValueSetLazyProxy to a WordMappingValueSet. More specifically, it says to the compiler, "Whenever a method is called on WordMappingValueSetLazyProxy such that it doesn't seem to compile because no such method exists, try calling that method on WordMappingValueSet instead and see if that works." So we get the benefits of adding all the methods of WordMappingValueSet to WordMappingValueSetLazyProxy without having to actually do it by hand as in Java.

There are other types of proxies, objects that give you the illusion that a resource has been acquired before it actually has. Perhaps the most common type is the network proxy. This is how EJBs are implemented, and perhaps the most frequent criticism of EJBs, since the early days of 2.x, is the large amount of code redundancy: for each domain object, you need to write a plethora of interfaces surrounding it. A lot of tools and techniques have emerged over the years to deal with the redundancy: code generation, AOP byte code weaving, annotations, etc. Could implicit conversions have avoided this? Some believe that implicits are too powerful and scary, but considering how much time has been spent over the years on frameworks involving reams of XML to work around the limitations of Java, it seems there might be a case for allowing this kind of power to be built into the language. Scala has this.

Scala on the Rise in Germany

2013-06-01T14:39:00.000-07:00

Kevin Shale was nice enough to invite me to a Scala meetup in Cologne on May 27. I had previously been to the Scala meeting in Berlin, but this was my first time with the Scala User Group in Cologne. The talk covered scalaz, the functional Scala library from Tony Morris. The discussion afterwards was broader and also interesting.

I understand previous meetings were pretty small, but twice the number of people came this time. After taking London and Silicon Valley by storm, it's amazing how fast Scala is catching on in Germany too. I have already had recruiters from Munich and Berlin asking me about it.

Functional Programming at Coursera

2013-04-21T08:37:00.000-07:00

Many professions are based on the idea that you do a big amount of learning upfront (university) and then only small updates will be necessary throughout the rest of your career. However, in software development technology advances so quickly that your whole career is really one of continuous learning. To further this end, MOOCs (Massive Open Online Courses) have been become popular, especially over the past year or two. I don't see them as replacing traditional universities. However, they are a very useful resource, both for reminding yourself what you learnt in university and introducing you to new paradigms as they become popular in your field of expertise.

A great example is the Functional Programming Principles in Scala course at Coursera. Coursera is based at Stanford University, and this particular course is led by the Swiss professor Martin Odersky, inventor of the Scala language. I am now on the fourth week of the seven-week course. Each week requires the student to watch a couple of hours of videos interspersed with quizzes, and then take some more serious exercises, which take several hours more and submitted online using the Scala tool sbt. A student can do this work at any time during the week, meaning there doesn't need to be any interruption to one's regular work schedule. At the end of the course, a certificate is available, as well as the total score on the exercises.

In practise the majority of students drop out of online courses soon after starting them. However, I fully intend to pursue this one to the end, as I think the material is both excellently presented and highly relevant to the direction the software world is moving in. Furthermore, completing the exercises feels like solving puzzles in a way that I found quite addictive, and I shall be sorry when the course is complete and there are no more.

For instance this week's lectures, on "Types and Pattern Matching," set the stage for exercises on creating Huffman trees, which was quite interesting. In this course, there is an emphasis on data structures and algorithms that reminds of my days studying Computer Science at university twenty years ago. In those days we learnt Scheme, a LISP variant and a classic functional programming language. These days there are more popular FP languages like Scala and Clojure, which are entering the mainstream for many good reasons including allowing access to the JVM and Java libraries.

Anyway I highly recommend the course. This is the second run of it and I expect there will be another chance for people to take it in a few months.

Load Testing with Gatling

2012-10-16T08:37:00.000-07:00

Recently, I've been working with an organization that had a huge increase in traffic due to mobile phones, and needed to determine how its legacy application would handle the millions of users flooding into its servers. They were accustomed to using JMeter, as it's a full-featured mature Java application for load testing and also useful for functional testing.

However, it turns out that when the numbers get really big, you have to figure out how to deal with the load of JMeter itself! You end up distributing the JMeter application across several servers. Surely this shouldn't be necessary? Also, because it's so UI-oriented, the "scripts" you use for testing are big clunky XML files. You might prefer writing them in a modern programming language, giving you the option to integrate them into a general network management system.

Enter Gatling. This tool is based on the modern Akka framework, so it has much less overhead than JMeter. In fact, I was able to run it on a single machine all the time. Furthermore, since the scripts you write are in Scala, you can make it as easy or complex as the needs of the organization warrant. A simple script (or "simulation") looks like this:

val myScenario = scenario("My Sequence Of Actions")
  .feed(csv("feed_of_users.csv"))
  .exec(http("Confirm Newspaper Subscription")
  .get("/myapp/hasSubscription?userId=" + "${userId}")
  .check(status.is(200))
  .check(regex("""<response status='ok' """).exists))

Run gatling.sh, and choose the script you want to run. After it finishes Gatling gives you a HTML page filled with graphs. Here is an example. This graph is actually showing a problem with the application under the test. The bumps in the red line mean that connections are being dropped from time to time.

In summary, Gatling is not as fancy as JMeter yet, but I think this is the future of load testing.

Keywords as Clutter: Doing More with Less

2012-09-30T09:55:00.003-07:00

When I first started writing Scala, I found it very easy to code in a Java-like way. Perhaps a little too easy in fact. The problem is that if you persist in writing code in an imperative style, you miss a lot of the FP (functional) advantages that Scala provides.

For example, consider a simple for loop in Java, the unenhanced version where you need the index:

for (int i = 1; i <= 10; i++) {
    System.out.println("I like the number " + i);
}

It is easy to transliterate this into Scala like this:

for (i <- 1 to 10)
  println("I like the number " + i)

This is OK. However, a more idiomatic way is to use a foreach function:

Range(1, 10).foreach(println("I like the number " + i))

This is functional, rather than procedural. People will have various reasons for preferring this style. But the interesting thing to me is that foreach is not a Scala keyword like for. It is just another function. Code is easier to compose when you get rid of the magic of keywords. You can do more with less. Another magic keyword is if. Again, Scala allows you to use it, and sometimes you need it. However, if you think about it a little, very often you find that using a Scala idiom is better. For example, it is tempting to translate a Java null check to something like this:

if (functionResult.isDefined) // dubious style
  doMyProcessing(functionResult.get)

However this check on the functionResult Option can be a bit tedious. The same thing can be written like this:

functionResult.foreach { doMyProcessing(_) }

This may look a bit odd if functionResult is only a single value. However, by writing it in this way you have succeeded in moving the boring check down into the functionResult Option implementation. This is especially useful when you have a lot of such "null checks" and wish to compose them. You don't want the code to become spaghetti-like with if statements. Here is a classic cheat sheet about dealing with Option's in a functional way. A different way of getting rid of if is to use pattern matching. Here is some Scala written in a Java-like way:

if (actorMessage == "toast_warm")
  pop()
else if (actorMessage == "toast_cold")
  toastSomeMore()
else
  println("Who moved my toast?")

Better is the Scala idiom:

actorMessage match {
  case "toast_warm" => pop()
  case "toast_cold" => toastSomeMore()
  case _ => println("Who moved my toast?")
}

Admittedly, match and case are still keywords. But in this case, it's probably worth it. Unlike the Java case label, Scala's case clause can take, not just an int or a String, but anything, including nested structures. That is the power of pattern matching. I am collecting Scala idioms, so if you can think of other good examples, especially ones that do away with keywords, add a comment.