In the last article from the Data Science for Losers series I used a few mini examples in Scala to show how Apache Spark works. Granted, I’m still not sure whether this was a “good” idea at all, but given that the whole series has degenerated into something really chaotic, a few harmless lines of Scala wouldn’t make it any worse. However, the much lower retweet rate of the last article made it clear that the jump from my bad Python code to my even worse Scala code wasn’t very well received. Well, I think it’s time for a crash course in Scala 😯
The sources can be found here.
What is Scala?
A JVM-based object-functional programming language designed by Martin Odersky (EPFL, Lausanne, Switzerland). The Bible of Scala is this book:
I assume that you have installed Scala from here and run Scala IDE to write your code. Also, don’t forget to install Scala’s build tool SBT. The Java VM has, of course, to be installed too because without it you’d neither be able to run Scala nor Scala IDE. All of these tools and environments are available for Linux, Windows and Mac and only differ regarding installation paths and handling of user permissions. Feel free to discuss these issues with your machine/OS.
All code in this tutorial will be written in a Scala Project using “Scala Worksheets”. A worksheet in this case is just a fancy name for a code file that gets compiled & executed when you save it. I recommend using worksheets when learning new stuff from Scala or testing some features before you deploy them as real code. Of course, if you’d rather write small programs instead of worksheets full of examples, just do it. To insert a new worksheet into your Scala project, right-click the name of your package and select “Scala Worksheet” as the new file type:
Our project consists of a worksheet named scala_crash.sc. Here’s what the immediate output, colored in green, looks like:
Everything you type in will be compiled & executed immediately after you save the file. So, let’s start to dissect a few important features of Scala.
Scala prefers immutability, allows mutability
In Scala you have two different types of variables: vars and vals. A val can’t be changed after it has been defined. A var can be changed.
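A minimal worksheet sketch of the difference (the names robotName and batteryLevel are made up for illustration):

```scala
// A val is immutable: once defined, it cannot be reassigned.
val robotName: String = "T-800"
// robotName = "T-1000"   // would not compile: reassignment to val

// A var is mutable and can be reassigned later.
var batteryLevel: Int = 100
batteryLevel = 95
println(s"$robotName runs at $batteryLevel percent")
```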
Three important things should be mentioned:
- The data types come after the name of the variable.
- The data types can be inferred by the Scala compiler if we don’t state them explicitly [However, Scala is statically typed. It’s not a dynamic language. Unlike dynamic languages, Scala knows its types at compile time.]
- Semicolons are optional.
Here’s an example with inference:
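A small sketch of type inference, with hypothetical names; the comments state the type the compiler works out by itself:

```scala
val answer = 42          // inferred as Int
val greeting = "Hello"   // inferred as String
val ratio = 0.5          // inferred as Double

// The types are still checked at compile time:
// val wrong: Int = "text"   // would not compile: type mismatch
val explicit: Int = answer   // an explicit annotation is always allowed
```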
Scala Functions are like variables
Functions in Scala are defined in much the same way as vals or vars. This is in the spirit of Scala’s Uniform Access Principle, which says that client code should not have to care whether a value comes from a stored variable or from a computation. Scala also treats functions like any other object: they can be assigned to a variable, passed as a parameter, or even returned as a value (hint: higher-order functions).
Here we define a function (or method) saySomething that expects a String parameter and prints a String together with the given value by using interpolation. In Scala you can interpolate values into a String by putting an s before the opening quote and prefixing each variable name with $. Another useful String form is the raw String in triple quotes. Such Strings can be defined over multiple lines and retain their formatting (carriage returns, line feeds, tabs etc.):
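A minimal sketch of such a definition. The name saySomething comes from this article; the printed text is only an assumption:

```scala
// Looks almost like a val definition: name, parameter list, equals sign, body.
def saySomething(name: String) = println(s"Hello, $name!")

saySomething("Scala")   // prints: Hello, Scala!
```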
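A short sketch of both String forms, using made-up values:

```scala
val language = "Scala"

// s-interpolation: $language is replaced by the value of the variable.
val greeting = s"Hello, $language!"

// Triple quotes: line breaks and indentation are kept exactly as typed.
val multiLine = """First line
    indented second line
third line"""

println(greeting)
println(multiLine)
```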
But before going any further let’s analyze Scala functions in more detail. First, in our above example we see no return type defined. Also, what does this rather weird equals sign mean? It almost looks like we were defining a variable and not a function. Well, this is the Uniform Access Principle! Defining or using a function in Scala looks the same as defining or using a variable. Scala treats functions as first-class citizens. And regarding the returned data type the answer is simple: if we don’t state a return type, Scala infers it from the function body. Our body is a single println call, which produces no useful value, so the inferred type is Unit. This is an “empty” data type, which means that the function isn’t returning anything meaningful. Here’s the same function with an explicit Unit.
Here we see that the definition of a return type for functions follows the same rules as for variables. We separate the type from the function name by a colon and open the definition of the function body with an equals sign. Here’s a simple juxtaposition of a variable and a function. The only difference is the opening keyword: val or var for variables and def for functions (or methods):
Being an object-functional language, Scala has functions that behave a little differently from functions in non-functional languages. For example, we don’t have to use the return keyword to explicitly return a value. In such cases Scala takes the value of the last expression in a function as the return value.
There is no return in the function returnStringByDefault, so Scala uses the single String value as the result.
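Sketched with an explicit return type (the printed text is again just an assumption):

```scala
// Same function as before, but the Unit result type is written out.
def saySomething(name: String): Unit = println(s"Hello, $name!")

saySomething("Scala")   // prints: Hello, Scala!
```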
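A possible side-by-side sketch, with hypothetical names:

```scala
// keyword  name          : type   = definition
val robotName: String = "T-800"
var batteryLevel: Int = 100
def describeRobot: String = s"$robotName at $batteryLevel%"

println(describeRobot)   // prints: T-800 at 100%
```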
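A minimal sketch. The name returnStringByDefault is the one used in this article; the returned text and the second function are assumptions added for contrast:

```scala
// No return keyword: the last (here: only) expression is the result.
def returnStringByDefault = "I am the return value"

// The explicit form also works, but then the result type must be stated.
def returnStringExplicitly: String = {
  return "I am returned explicitly"
}

println(returnStringByDefault)
```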
Functions support named parameters
Scala functions don’t expect you to put your parameters in the same order as they were defined. Just use their names and mix them as you like:
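A small sketch with a hypothetical function buildRobot:

```scala
def buildRobot(name: String, model: String, manufacturer: String): String =
  s"$name ($model) built by $manufacturer"

// Positional arguments follow the declared order.
val positional = buildRobot("T-800", "Model 101", "Cyberdyne Systems")

// Named arguments can be given in any order.
val named = buildRobot(manufacturer = "Cyberdyne Systems", name = "T-800", model = "Model 101")

println(named)   // prints: T-800 (Model 101) built by Cyberdyne Systems
```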
Scala Classes & Objects
Defining and using classes in Scala is not much different from other object-oriented languages (although there are some differences):
We define classes by using the same keyword as in Java, but there are no getters/setters and no curly braces enclosing the body of the class. But we can still create an object from it! How’s that possible? This is due to the fact that Scala does much more for us behind the scenes, relieving us from writing endless ceremonial code, getters/setters etc. In this case we simply get an instance with three read-only properties because we’ve defined them as vals. No need for us to explicitly write getName, getModel, getManufacturer. Accessing such properties via Code Completion is possible too:
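A sketch of such a class. The class name Android, its three fields and the instance name t800 appear in this article; the concrete values are assumptions:

```scala
// The whole class: no body, no hand-written getters.
class Android(val name: String, val model: String, val manufacturer: String)

val t800 = new Android("T-800", "Model 101", "Cyberdyne Systems")

println(t800.name)           // prints: T-800
println(t800.manufacturer)   // prints: Cyberdyne Systems
// t800.name = "T-1000"      // would not compile: name is a val
```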
Defining a class method is simple.
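For example, with a hypothetical method introduce added to the class body:

```scala
class Android(val name: String, val model: String, val manufacturer: String) {
  // A method is a def inside the class body; it can use the constructor fields.
  def introduce(): String = s"I am $name, a $model built by $manufacturer."
}

val t800 = new Android("T-800", "Model 101", "Cyberdyne Systems")
println(t800.introduce())
```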
Making them private (or protected) is the same as in Java:
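A sketch with made-up members serialNumber, selfTest and label:

```scala
class Android(val name: String, val model: String, val manufacturer: String) {
  // Only visible inside this class.
  private def serialNumber: String = s"$manufacturer-$model".toUpperCase

  // Visible inside this class and its subclasses.
  protected def selfTest(): Boolean = name.nonEmpty

  // Public by default; may use the private method internally.
  def label: String = s"$name [$serialNumber]"
}

val t800 = new Android("T-800", "Model 101", "Cyberdyne Systems")
println(t800.label)
// t800.serialNumber   // would not compile: private member
```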
We’ve already mentioned the Uniform Access Principle, which states that there’s no difference between accessing variables or functions. This also means that the user of an instance, like the above t800 object, should not need to know what’s happening inside. If we change a property, for example, it should be transparent to us regardless of the internal logic. In this example we change the instance name by using a manually defined setter:
First we make the variable “name” in the class definition a var. We also rename it from “name” to “_name” because we’ll only use it internally as the storage behind the accessor methods. Inside the class body we expose “_name” through the (publicly visible!) method “name”. Additionally we define a new method named “name_=” which takes a String parameter and assigns it to the variable _name from the class constructor’s parameter list. Just for logging purposes we insert a println to show the value that is about to be assigned to _name. Now, I’m sure this feels a little confusing the first time. Therefore, we’ll walk through it again, step by step. And to make it (hopefully) easier to grasp we rename the original variable “name” to “cyborgsName”:
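A sketch of what such a setter can look like. Marking _name as private is my own assumption to keep the internal variable hidden, and the log text is made up:

```scala
class Android(private var _name: String, val model: String, val manufacturer: String) {
  // Getter: reads the internal variable.
  def name: String = _name

  // Setter: the method name is literally "name_=".
  def name_=(newName: String): Unit = {
    println(s"Changing name to: $newName")
    _name = newName
  }
}

val t800 = new Android("T-800", "Model 101", "Cyberdyne Systems")
t800.name = "T-1000"   // looks like a plain assignment, calls name_= behind the scenes
println(t800.name)     // prints: T-1000
```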
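The renamed version could be sketched like this (again, private and the log text are assumptions):

```scala
class Android(private var cyborgsName: String, val model: String, val manufacturer: String) {
  // Getter "name" simply hands out the constructor variable.
  def name: String = cyborgsName

  // Setter "name_=" receives the new value and stores it in cyborgsName.
  def name_=(newName: String): Unit = {
    println(s"Assigning new name: $newName")
    cyborgsName = newName
  }
}

val t800 = new Android("T-800", "Model 101", "Cyberdyne Systems")
t800.name = "Uncle Bob"
println(t800.name)   // prints: Uncle Bob
```

Client code only ever sees name; the variable cyborgsName stays an internal detail.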
These are the steps one has to make to create a manually defined setter:
- To create a proper setter its underlying variable must be a var.
- Inside the class define two methods for the future setter.
- Both methods must have the same name but the one that receives a value must end with an underscore.
- The first method (without the trailing underscore) must be set equal to the variable from the constructor’s parameter list.
- The method ending with an underscore, which contains the assignment logic, must not be separated by a space from the following equals sign (write name_= and not name_ =).
- Also, this method must take only a single parameter of the same data type as the variable from the constructor’s parameter list.
Case Classes
Scala offers an alternative way to define classes which provides many interesting functionalities not found in the Java-like classes from the above example. Case classes implicitly generate much of the “usual” boilerplate code we often have to write by hand. Just take any ordinary Java class and you’ll see long trails of getter/setter, equals and toString definitions. Not so with case classes:
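A sketch of the case class version of Android:

```scala
// One line. The compiler generates accessors for all three fields as well as
// equals, hashCode, toString, copy and a companion object with apply/unapply.
case class Android(name: String, model: String, manufacturer: String)
```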
Here we see that the definition is almost the same except for the case keyword and the missing val or var. Here we can omit them and still have access to the fields. If we did the same with “normal” classes, our fields would be private and therefore inaccessible for external clients. But here the fields remain accessible, albeit not changeable, because Scala declares them as immutable by default.
The instantiation looks a little bit weird because there’s no new in sight. However, the result of such a call is still a proper object of class type Android. Soon we’ll see how case classes can be used when dealing with pattern matching, but let us first use another powerful feature: traits.
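A sketch of how such a case class is used; the instance name and the field values are assumptions:

```scala
case class Android(name: String, model: String, manufacturer: String)

// No "new": the generated companion object creates the instance.
val c3po = Android("C-3PO", "Protocol droid", "Anakin Skywalker")

println(c3po.name)   // fields are readable: C-3PO
println(c3po)        // generated toString: Android(C-3PO,Protocol droid,Anakin Skywalker)
// c3po.name = "R2"  // would not compile: fields are immutable

// Generated equals compares field by field.
val twin = Android("C-3PO", "Protocol droid", "Anakin Skywalker")
println(c3po == twin)   // prints: true

// copy creates a modified clone instead of mutating the original.
val renamed = c3po.copy(name = "TC-14")
```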
Inheritance with Traits
We know interfaces, we use them day by day. Scala uses traits, which could be described as interfaces on steroids. Unlike classic Java interfaces (before Java 8 introduced default methods), traits allow partial implementations. However, traits in Scala 2 cannot take constructor parameters.
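A sketch of a trait with a partial implementation. Robot, Android and say are names from this article; the abstract member designation and the message format are assumptions:

```scala
trait Robot {
  // Abstract: every implementor has to provide it.
  def designation: String

  // Concrete: inherited as-is by every implementor.
  def say(message: String): String = s"$designation says: $message"
}

case class Android(name: String, model: String, manufacturer: String) extends Robot {
  def designation: String = name
}

val c3po = Android("C-3PO", "Protocol droid", "Anakin Skywalker")
println(c3po.say("Hello, I am fluent in over six million forms of communication."))
```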
We define a trait Robot which is then extended by our case class Android. Therefore C3PO inherits the method say. But this is not everything. Case classes can also be used for pattern matching.
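A sketch of such a match. The function name whoBuiltIt and the returned texts are made up for illustration:

```scala
trait Robot
case class Android(name: String, model: String, manufacturer: String) extends Robot

def whoBuiltIt(robot: Robot): String = robot match {
  // Underscores ignore name and model; only the manufacturer has to match.
  case Android(_, _, "Anakin Skywalker") => "Built by Anakin Skywalker"
  // Fallback: matches everything else.
  case _ => "Built by somebody else"
}

println(whoBuiltIt(Android("C-3PO", "Protocol droid", "Anakin Skywalker")))
println(whoBuiltIt(Android("T-800", "Model 101", "Cyberdyne Systems")))
```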
We define a method expecting an instance of type (or trait) Robot. Inside the method we define a match statement that takes an instance of Android, which extends Robot, and checks if it fits any of the predefined cases. As we see, a case can ignore certain parameters by using underscores. Here we’ve defined that the first two parameters (name and model) are irrelevant for the first case statement and only the value of the manufacturer string should be checked for “Anakin Skywalker”. If this is the case, the code after => will be executed. There’s another case which matches everything given and therefore serves as a fallback. In some ways match-case resembles the old-school switch-case, but don’t be fooled by the superficial similarity. Scala’s pattern matching is much more powerful. As you see, you can even look inside an object and check the values of its fields. This is not possible with classic Java switch-case statements.
Higher Order Functions
Scala, being a mixed language, also supports functional techniques and idioms. One of its more prominent features is the support of higher-order functions. What are they? As we already mentioned, Scala treats functions like any other value. You can assign a function to a variable, pass it around and even give it back as the result of some other function. The usual examples you can find in any “functional programming tutorial” are surely map, flatMap and reduce. This tutorial will be no different. Let’s use map to see what higher-order functions are.
First, we define a List of Strings. Every List in Scala has a method map, which in the case of a List[String] expects a function with a single parameter of type String.
Recognizing such functions the first time is not always easy, so let’s dissect the whole thing more precisely.
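A minimal sketch with a made-up list of robot names:

```scala
val robots = List("c-3po", "r2-d2", "t-800")

// map receives a function and applies it to every element.
val shouting = robots.map(robot => robot.toUpperCase)

// The same call with the underscore shorthand.
val shorter = robots.map(_.toUpperCase)

println(shouting)   // prints: List(C-3PO, R2-D2, T-800)
```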
Map, by definition, is described like this:
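Roughly, and simplified (the real declaration in the standard library differs between Scala versions), the shape of map can be sketched like this. The names toUpper, robots, upper and lengths are assumptions:

```scala
// Simplified shape of map on a List[A]:
//   def map[B](f: A => B): List[B]
// For a List[String], A is String, so f must accept a String.

// A function stored in a val; its type is "String => String".
val toUpper: String => String = text => text.toUpperCase

val robots = List("c-3po", "r2-d2", "t-800")
val upper: List[String] = robots.map(toUpper)

// B does not have to be String: here the result is a List[Int].
val lengths: List[Int] = robots.map(_.length)
```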
We read that map expects a function that takes a String value and, in our case, returns a String. As a result a new List[String] will be returned. To achieve this result the function will be applied to every element in the List.
For easier understanding of the whole process, just imagine the List contained a little Engine which iterates over the elements, and each time the next element gets “consumed” some sort of internal logic is executed. Our obligation is to pass in this “logic” so the internal Engine can do its work. In this case we pass a function that takes a String value and converts it to upper case. Now, the Engine inside List[String] iterates over its elements, one by one, and uses the new “logic” to do something with them. What exactly happens to the elements is of no importance to the List[String] itself. The List is only iterating and everything else is done inside the function we provided. The advantage of this approach is obvious: if we later change the “logic”, the enclosing Engine won’t notice anything. Of course this is just a microscopically small example but I’m sure you get the point.
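To make the “Engine” idea tangible, here is a hand-written sketch of it. This is not how List.map is implemented in the standard library; applyToEach is a made-up name:

```scala
import scala.collection.mutable.ListBuffer

// The "Engine": it only iterates and collects. What happens to each
// element is decided entirely by the function passed in as "logic".
def applyToEach(items: List[String], logic: String => String): List[String] = {
  val results = ListBuffer.empty[String]
  for (item <- items) results += logic(item)
  results.toList
}

val robots = List("c-3po", "r2-d2")

// Same engine, two different pieces of logic.
val upper = applyToEach(robots, _.toUpperCase)
val reversed = applyToEach(robots, _.reverse)
```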
Conclusion
Scala is a wonderful language that combines functional and object-oriented aspects. Mastering it takes some time, and I can only hope that this first part provided some interesting insights.