Scala-bility

Tuesday, 23 September 2014

The QuizItemSource Typeclass

When talking about scaling an application, people are usually referring to performance: how to design the software so that limited hardware can support millions of users? But it is also important to be able to scale the feature set of an application. If users say they want a new button, you don't want to have to make extensive refactorings of your core model classes as is all too often the case in industry. The key to both of these related demands is decoupling.

While writing the Libanius quiz application, I found typeclasses to be very useful in decoupling behaviour from the model classes. Typeclasses (formally spelt "type classes") originated in Haskell, but can be used in Scala too, being a language that supports FP well. I'll describe one use of typeclasses in Libanius, an example that should be less "theoretical" and more extended than the ones that are usually given.

To start with, the core model in Libanius is a simple containment hierarchy as shown in the diagram. The Quiz data structure contains all the information necessary to provide a stream of quiz items to a single user. It is split by topic into QuizGroups, which, for performance reasons, are further partitioned into "memory levels" according to how well the user has learned the items. The flow of a typical request then is Quiz.findQuizItem() followed by QuizGroup.findQuizItem() calls, followed by QuizGroupMemoryLevel.findQuizItem() calls.

In practice the findQuizItem() signatures and functionality were similar but not identical. Rather than have them scattered throughout the model classes, it was desirable to put them together in a single place where the similarites could be observed and moulded, avoiding any unnecessary repetition. The same was true of other operations in the model classes. Pulling them out would also solve the problem of too much bloat in the model.

Furthermore, although the Quiz data structures are optimized for finding quiz items, I wanted to be able to extend the same action to other entities. A wide range of scenarios can be imagined:

A Twitter feed of quotes might be seen as a mapping from speakers to quotations, which could be turned into a Who-Said-What quiz.
A set of web pages with recipes could be read as a "What ingredient goes with what dish?" quiz.
In a chat forum context, a person might be thinking of quiz questions and feeding them through.
A clever algorithm might not simply be "finding" quiz items in a store, but dynamically generating them based on previous answers.
The Quiz might not be a Scala data structure, but a database. This could be a NoSQL in-memory database like Redis or leveldb. It could also be a traditional RDBMS like Postgres accessed via Slick.

What links all these cases together is that a findQuizItem() function is necessary but with a new implementation. So the immediate priority is to decouple the findQuizItem() action from the simple Quiz data structure. Notice that this action might be applied to structures that didn't even know they were supposed to be quizzes. We don't want to have to retroactively modify classes like TwitterFeed and ChatParticipant to make them "quiz aware".

If you know a lot of Java, some design patterns might occur to you here, like Adapter, Composite, or Visitor. In the Visitor pattern, in particular, you can separate out arbitary operations from the data structures and form a parallel hierarchy to run those operations. But the original data structures still have to have an accept() method defined. In Scala a more convenient alternative, graced with all the polymorphism you could ever need, is the typeclass construct. Typeclasses excel at separating out orthogonal concerns.

Firstly a trait is defined for the typeclass with the behaviour we wish to separate out. Let's begin by calling the main function produceQuizItem() rather than findQuizItem(), to abstract it more from the implementation:

trait QuizItemSource[A] {
  def produceQuizItem(component: A): Option[QuizItem]}
}

The component could be a Quiz, a QuizGroup, a QuizGroupMemoryLevel or anything else. Instead of calling produceQuizItem() on the component, the component is passed as a parameter to the operation.

Next, define some sample implementations (aka typeclass instances) for the entities (aka model components) that we already have:

object modelComponentsAsQuizItemSources {

  implicit object quizAsSource extends QuizItemSource[Quiz] {
    def produceQuizItem(quiz: Quiz): Option[QuizItem]} = 
      … // implementation calls methods on quiz
  }

  implicit object quizGroupAsSource extends QuizItemSource[QuizGroup] {
    def produceQuizItem(quizGroup: QuizGroup): Option[QuizItem]} = 
      … // implementation calls methods on quizGroup
  }

  implicit object quizGroupMemoryLevelAsSource extends QuizItemSource[QuizGroupMemoryLevel] {
    def produceQuizItem(quizGroupMemoryLevel: QuizGroupMemoryLevel): Option[QuizItem]} = 
      … // implementation calls methods on quizGroupMemoryLevel
  }
}

These sample implementations can be accessed in client code once it is imported in this way:

import modelComponentsAsQuizItemSources._

However, we don't want to call implementations directly, like quizGroupAsSource.produceQuizItem(myQuizGroup). Instead we want to just call produceQuizItem(myQuizGroup) and let the correct implementation be selected automatically: this is what polymorphism is about. To facilitate this, let's define a function in the middle for accessing the implementations (and this is really the final piece of the puzzle for typeclasses!):

object QuizItemSource {

  def produceQuizItem[A](component: A)(implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)
}

If you're coming from other languages, it's the use of the implicit that is at first difficult to grasp, but this is really the Scala feature that makes typeclasses convenient to use, where they are not in, say, Java. Let me explain what is happening here:

Suppose in client code, the modelComponentsAsQuizItemSources has been imported as above. Now the quizGroupAsSource is implicitly in scope, among other implementations. If there is now a call to QuizItemSource.produceQuizItem(quizGroup), the compiler, seeing that quizGroup is of type QuizGroup will match qis to an implicit of type QuizItemSource[QuizGroup]: this is quizGroupAsSource, the correct implementation.

The great thing is that the correct QuizItemSource implementation will always be selected based on the type of entity passed to produceQuizItem, and also the context of the client, specifically the imports it has chosen. Also, we have achieved our original goal that whenever we realize we want to turn an entity into a quiz (TwitterFeed, ChatParticipant, etc.) we don't have to modify that class: instead, we add an instance of the typeclass to modelComponentsAsQuizItemSources and import it.

This approach is a break with OO, moving the emphasis from objects to actions. Instead of adding actions (operations) to objects, we are adding objects to an action generalization (typeclass). We can make that action as powerful as we like by continuing to add objects to it. Again, unlike in OO, those objects do not need to be custom-built to be aware of our action.

That is the main work done. There are two more advanced points to cover.

Firstly, it is possible to add some "syntactical sugar" to the definition of produceQuizItem() above. In this code

def produceQuizItem[A](component: A)(implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)

… the qis variable may be viewed as clutter. We can push it into the background using context bound syntax:

def produceQuizItem(A : QuizItemSource](component: A): QuizItem =
  implicitly[QuizItemSource].produceQuizItem(component)

What happens is that the context bound A : QuizItemSource establishes an implicit parameter, which is then pulled out by implicitly. This syntax is a bit shorter, and looks shorter still if you do not need to call implicitly but are simply passing the implicit parameter down to another function.

Secondly, it is possible to have a typeclass with multiple type parameters. Context bound syntax is not convenient for this case, so, first, let's go back to the original syntax:

def produceQuizItem[A](component: A)(implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)

Suppose a Quiz were passed in, the quizAsSource instance will be selected. You will remember this looked like this:

implicit object quizAsSource extends QuizItemSource[Quiz] {
  def produceQuizItem(quiz: Quiz): Option[QuizItem]} = 
    … // implementation calls methods on quiz
}

However, I simplified a bit here. The original implementation did not in fact return simple quiz items in the case of a Quiz. Whenever the Quiz retrieved a quiz item from a QuizGroup, it would add some information. Specifically it would generate some "wrong choices" for the question and return that to the user to allow a multiple-choice presentation of the quiz item. So the QuizItem would be wrapped in a QuizItemViewWithChoices object. The signature had to be:

  def produceQuizItem(quiz: Quiz): Option[QuizItemViewWithChoices]} = …

Now the problem is this does not conform to the typeclass definition. So let's massage the typeclass definition a bit by turning the return type into a new parameter, C:

trait QuizItemSource[A, C] {
  def produceQuizItem(component: A): Option[C]
}

Then the typeclass instance for Quiz will look as desired:

implicit object quizAsSource extends QuizItemSource[Quiz, QuizItemViewWithChoices] {
  def produceQuizItem(quiz: Quiz): Option[QuizItemViewWithChoices]} = 
    … // implementation calls methods on quiz
}

The intermediate function for accessing the typeclass becomes a bit more complex. Recall this was previously:

  def produceQuizItem[A](component: A)
      (implicit qis: QuizItemSource[A]): Option[QuizItem] =
    qis.produceQuizItem(component)

The second type parameter for the return type is added as C:

  def produceQuizItem[A, C](component: A)
      (implicit qis: QuizItemSourceBase[A, C], c: C => QuizItem): Option[C] =
    qis.produceQuizItem(component)

C is constrained so that the return value must be compatible with QuizItem. To make QuizItemViewWithChoices compatible with QuizItem, it is not necessary to make it a subclass with lots of boilerplate as in Java. Instead an implicit conversion is defined next to the QuizItemViewWithChoices class like this:

object QuizItemViewWithChoices {
  // Allow QuizItemViewWithChoices to stand in for QuizItem whenever necessary
  implicit def qiView2qi(quizItemView: QuizItemViewWithChoices): QuizItem =
    quizItemView.quizItem
}

Now it all works!

That ends the description of the QuizItemSource typeclass. For more examples, see the other typeclasses in the Libanius source code such as CustomFormat and ConstructWrongChoices, which cover other concerns orthogonal to the model, or view this excellent video tutorial by Dan Rosen.

Monday, 3 March 2014

The Akka Event Bus on Android

Event buses are useful for keeping components in a system loosely coupled, and we generally see them in two areas: distributed systems, and UIs. This post is about the use of an event bus in a UI, specifically Android.

There exist event bus libraries for Android like Otto and greenrobot. There also exists an Event Bus API for Akka. What I haven't seen yet is the Akka Event Bus set up on Android. So, here it is!

First a word on the motivation. In the Libanius Android application, I recently separated part of a screen (in Android's parlance, an Activity) into its own separate class. A problem was that, in response to certain events, the new class needed to add data back to the data structure in the main class for the Activity. My initial approach was to define a closure that could do the work of adding the data, and pass the closure as an argument to the new class (the sub-Activity). However, this felt a bit ad hoc, so I decided it was time to introduce a general event-handling mechanism into Libanius Android. Having helped create an event bus for the GWT UI framework before, I thought it would be interesting to see if the same idea could be applied in Android.

The Libanius Event Bus just extends Akka's ActorEventBus. It is nearly as simple as this:

class LibaniusActorEventBus extends ActorEventBus with LookupClassification
    with AppDependencyAccess {

  type Event = EventBusMessageEvent
  type Classifier = String

  protected def classify(event: Event): Classifier = event.channel

  protected def publish(event: Event, subscriber: Subscriber) {
    subscriber ! event
  }
}

The diagram below shows what use we are actually making of the event bus:

The Android parts are OptionsScreen and DictionarySearch. In the UI, DictionarySearch is actually part of the OptionsScreen activity, but they communicate through the event bus. When the user clicks on a button from DictionarySearch to add a new word to the quiz, a NewQuizItemMessage is constructed. This event is just a wrapper for the quiz item and an identifying header:

case class NewQuizItemMessage(header: QuizGroupHeader, quizItem: QuizItem)
  extends EventBusMessage

The NewQuizItemMessage is published to the QUIZ_ITEM_CHANNEL by calling sendQuizItem() in LibaniusActorSystem. LibaniusActorSystem is a facade to LibaniusActorEventBus and other actors in the system not discussed here.

object LibaniusActorSystem extends AppDependencyAccess {
  val system: ActorSystem = ActorSystem("LibaniusActorSystem")

  val appActorEventBus = new LibaniusActorEventBus
  val QUIZ_ITEM_CHANNEL = "/data_objects/quiz_item"

  def sendQuizItem(qgh: QuizGroupHeader, quizItem: QuizItem) {    
    val newQuizItemMessage = NewQuizItemMessage(qgh, quizItem)
    appActorEventBus.publish(EventBusMessageEvent(QUIZ_ITEM_CHANNEL, newQuizItemMessage))
  }
}

That's the guts of it. There is some more code in OptionsScreen to create an actor to subscribe to the QUIZ_ITEM_CHANNEL, and actually add the QuizItem when it sees the event. For more detail, see the code in the Libanius Android project.

Friday, 3 January 2014

Extending Play with AngularJS: Nested MVCs in a Webapp

I finally got a bit of spare time to tighten up the JavaScript in Libanius-Play, the web interface to my quiz application. For a long time, I was dissatisfied with the way data was being passed around from server to client and then back from client to server: there was too much repetition. The JavaScript library jQuery was of use to create AJAX callbacks, but still, any time there was a new parameter, it had to be added in too many places.

AngularJS is another JavaScript library. Like jQuery it is geared towards asynchronous communication, but it provides an MVC framework. The Play Framework is also MVC, so the result is an MVC-within-MVC architecture (see diagram). It seems to work quite well.

The AngularJS controller has a services layer to fetch data from the Scala back-end. Then it updates the page. The nice thing about it is that I don't have to update variables in the web page one by one: I can just bind variables in the web page to attributes in the model. This is achieved in the HTML using AngularJS tags like ng-model and the use of curly braces like this:

    
    {{ quizData.prompt }}

There are a number of variables like this scattered through the HTML, but they can all be updated at once in a single line of JavaScript if the Play backend returns a JSON structure. (The data structure can be defined as a case class in Scala on the server, and will be dynamically mirrored as a JavaScript structure on the client-side without the need to map attributes manually.) Here is a simplified version of what happens in the JavaScript controller after the server has processed a user response: basically it returns a new quiz item.

    
    services.processUserResponse($scope.quizData, response).then(function(freshQuizData) {
       $scope.quizData = freshQuizData.data // updates values in the web page automagically
       labels.setColors($scope.quizData)    // extra UI work using a custom JavaScript module called labels
    }

In the first line, services.processUserResponse is a function in the services.js file which sends user input to the server, and immediately returns a promise of new data. The promise has a handy then function which takes a callback argument, describing what happens when the data actually comes through, i.e. updating the view.

All in all, a good experience with AngularJS.

Wednesday, 18 December 2013

Private Constructors

Private constructors: mostly you won't need them. Commonly in Scala we just define a class and let the constructor and parameters be exposed to the world:

case class SocialFeedCache(tweets: List[Tweet]) { ... }

But suppose the tweets need to be organized for efficiency, and we don't want the cache to be constructed in an invalid state? Then we might have a factory method that performs the necessary initialization. In Java this would be a static method. In Scala we put it in a companion object:

object SocialFeedCache {
  def apply(tweets: List[Tweet]): SocialFeedCache = {
    // ... organize the tweets ...
    SocialFeedCache(organizedTweets) // call the primary constructor
  }
}

So now we must always call the factory method rather than the primary constructor. How can we enforce this? How can we block anyone from calling the constructor directly? A first attempt might look like this:

private case class SocialFeedCache(tweets: List[Tweet]) { ... }

But this doesn't just hide the constructor; it hides the whole class! In fact there will be a compile-time error, because the factory method in object SocialFeedCache is trying to expose a class that is now hidden.

The correct syntax is actually:

case class SocialFeedCache private(tweets: List[Tweet]) { ... }

And that's all there is to it. This is a useful technique whenever you have special initialization taking place in factory methods.

Saturday, 17 August 2013

Akka in Your Pocket

Although Akka is usually thought of in connection with massively scalable applications on the server side, it is a generally useful model for concurrency that can be deployed in an ordinary PC application or even a mobile app.

My Nexus 4 phone has four cores. If I have some CPU-intensive processing in an Android app, I observe that I can speed it up nearly 300% by spreading load across four actors. This can mean the difference between one second response time and quarter-second response time, quite noticeable to the user. Therefore, along with regular Scala futures, Akka is a useful tool that Scala provides to Android developers to make the most of their users' computing resources.

One tricky bit is getting the initial configuration correct with SBT. After some experimentation, I found that these Proguard settings, adapted from an early version of this project, work. (ProGuard shrinks and optimizes an application, but when you include a library like Akka, it needs to be reminded to keep certain classes using "keep options.")

 
      proguardOption in Android := 
      """
        |-keep class com.typesafe.**
        |-keep class akka.**
        |-keep class scala.collection.immutable.StringLike {*;}
        |-keepclasseswithmembers class * {public <init>(java.lang.String, akka.actor.ActorSystem$Settings, akka.event.EventStream, akka.actor.Scheduler, akka.actor.DynamicAccess);}
        |-keepclasseswithmembers class * {public <init>(akka.actor.ExtendedActorSystem);}
        |-keep class scala.collection.SeqLike {public protected *;}
        |
        |-keep public class * extends android.app.Application
        |-keep public class * extends android.app.Service
        |-keep public class * extends android.content.BroadcastReceiver
        |-keep public class * extends android.content.ContentProvider
        |-keep public class * extends android.view.View {public <init>(android.content.Context);
        | public <init>(android.content.Context, android.util.AttributeSet); public <init>
        | (android.content.Context, android.util.AttributeSet, int); public void set*(...);}
        |-keepclasseswithmembers class * {public <init>(android.content.Context, android.util.AttributeSet);}
        |-keepclasseswithmembers class * {public <init>(android.content.Context, android.util.AttributeSet, int);}
        |-keepclassmembers class * extends android.content.Context {public void *(android.view.View); public void *(android.view.MenuItem);}
        |-keepclassmembers class * implements android.os.Parcelable {static android.os.Parcelable$Creator CREATOR;}
        |-keepclassmembers class **.R$* {public static <fields>;}
      """.stripMargin

Note: this has only been tested to work with Akka 2.1.0 and Scala 2.10.1. If you use other versions, you may have do some more tinkering/googling.

Saturday, 3 August 2013

Using 2.10 Futures in Android

In an Android app, it is usually necessary to run some tasks in the background so they don't block
the UI. The standard way to do this in Java code is with an AsyncTask. Basically you fill out the body of an AsyncTask with two parts: what needs to happen in the background, and what needs to happen in the foreground once that is complete. The return value of the background part is passed as a parameter to the foreground part:

Here is an example expressed in Scala, as always more concise than Java:

    new AsyncTask[Object, Object, String] {
      override def doInBackground(args: Object*): String = calculateScore
      override def onPostExecute(score: Int) { showScore(score) }
    }.execute()

From a Scala perspective, it might occur to you that this is a bit similar to what a Future does. Futures are more general and concise, so in my opinion they are to be preferred. Indeed, following the example of Terry Lin, I tried to get the identical functionality as the above working using the Future trait in Scala 2.10, and it worked fine. The new code has this structure:

    future {
      calculateScore
    } map {
      runOnUiThread(new Runnable { override def run() { showScore(score) } }))
    }

This code is a bit shorter. The one thing to remember is that after the score is returned, we have
to remind Android to return control to the main thread -- that's what the runOnUiThread() is for.

Thursday, 4 July 2013

A Benefit of Implicit Conversions: Avoiding Long Boilerplate in Wrappers

In OO practice we use wrapper classes a lot. So much in fact, that the Gang of Four identified several types of wrappers (differentiated more by purpose than structure): Adapters, Decorators, Facades, and Proxies. In this blog post I am going to concentrate on the Proxy design pattern, but the same technique of using implicit def to cut down code repetition can be applied for many different types of wrappers.

As an example, I'm going to take a data structure from my open-source quiz app Libanius. It's not necessary to go into the full data model of the app, but one of the central domain objects is called a WordMappingValueSet, which holds pairs of words and associated user data. This data is serialized and deserialized from a custom format in a simple file.

The problem is that the deserialization, which involves parsing, is rather slow for tens of thousands of WordMappingValueSet's, especially on Android, the target platform. The app performs this slow initialization on startup, but we don't want to keep the user waiting before starting the quiz. How can the delay be avoided? The solution is to wrap the WordMappingValueSet in a Lazy Load Proxy, which gives the app the illusion that it immediately possesses all these WordMappingValueSet objects when in fact all it has is a collection of proxies that each contain the serialized String for a WordMappingValueSet. The expensive work of deserializing the WordMappingValueSet is done by the proxy only when each object is really needed.

The question now is how to implement the Lazy Load Proxy. The classic object-oriented way
to do it, as documented by the Gang of Four book, is to create an interface to the WordMappingValueSet that contains all the same methods, and then a WordMappingValueSetLazyProxy implementation of that interface. The client code instantiates the proxy object but it makes calls on the interface as if it were calling the real object. In this way, the illusion is preserved so not much has to change in the client code, but "under the hood" what is happening is that the proxy is deserializing the String and then forwarding each call to the real object.

In the above diagram, the function fromCustomFormat() does the work of deserialization. In Java this would be a static method; in Scala it is placed within the singleton object WordMappingValueSet. Inside WordMappingValueSetLazyProxy, the serialized String is stored, and fromCustomFormat() is called the first time one of the business methods is called, before the call is delegated to the method of the same name in WordMappingValueSet.

This kind of wrapper is very common in the Java world. But there is a problem. There are a lot of methods in WordMappingValueSet. These all need to be duplicated in IWordMappingValueSet and WordMappingValueSetLazyProxy. It violates the principle of DRY (Don't Repeat Yourself). Implementing this in Scala, I would go from about a hundred lines of code to two hundred lines. (In Java I would be going from two hundred lines to four hundred lines.) Furthermore, every time I need to add a method to WordMappingValueSet, I now need to do the same to the other two types, so the lightweight flexibility of the data structure has been lost.

In Scala, there is a better solution: use an implicit conversion. Firstly, as before, the WordMappingValueSet is wrapped inside a WordMappingValueSetLazyProxy:

case class WordMappingValueSetLazyProxy(valuesString: String) {  
  lazy val wmvs: WordMappingValueSet = WordMappingValueSet.fromCustomFormat(valuesString)
}

But that's all that's necessary for that class. We won't be defining the same methods all over again. Notice also that the lazy keyword, provided by Scala, means that fromCustomFormat() won't actually be called until some method is called on wmvs, i.e. until it is really needed.

But the real trick to make this work is to define the implicit conversion:

object WordMappingValueSetLazyProxy {
  implicit def proxy2wmvs(proxy: WordMappingValueSetLazyProxy): WordMappingValueSet = proxy.wmvs
}

This "converts" a WordMappingValueSetLazyProxy to a WordMappingValueSet. More specifically, it says to the compiler, "Whenever a method is called on WordMappingValueSetLazyProxy such that it doesn't seem to compile because no such method exists, try calling that method on WordMappingValueSet instead and see if that works." So we get the benefits of adding all the methods of WordMappingValueSet to WordMappingValueSetLazyProxy without having to actually do it by hand as in Java.

There are other types of proxies, objects that give you the illusion that a resource has been acquired before it actually has. Perhaps the most common type is the network proxy. This is how EJBs are implemented, and perhaps the most frequent criticism of EJBs, since the early days of 2.x, is the large amount of code redundancy: for each domain object, you need to write a plethora of interfaces surrounding it. A lot of tools and techniques have emerged over the years to deal with the redundancy: code generation, AOP byte code weaving, annotations, etc. Could implicit conversions have avoided this? Some believe that implicits are too powerful and scary, but considering how much time has been spent over the years on frameworks involving reams of XML to work around the limitations of Java, it seems there might be a case for allowing this kind of power to be built into the language. Scala has this.