where scala is beloved



Type variance in Scala is quite a tricky topic, especially if you do not use it often: the details easily slip out of mind. So below you can find some very short tips whose purpose is to remind you how it works.

In this article we will see how to use different streams (like akka-stream) for pagination and when it can be useful. The main idea of pagination is to partition a big sequence of objects into several parts (or pages) so that it can be processed page by page. For example, you have 1,000,000 users in the database and you need to send an email to all of them. You could try to load all user records into one big list and process it at once, but that would not be a memory-efficient approach. Instead you can partition the list of users into pages of 100 users each, load one page, send emails to the users on this page, load the next page and so on. This is a much more efficient way to deal with big collections of records.
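The page-by-page idea can be sketched with nothing but the standard library. This is a minimal sketch, not the akka-stream version: `PaginationSketch`, `pages`, `processAll` and the `loadPage(offset, limit)` function are hypothetical names, and `loadPage` stands in for whatever query actually fetches one page.

```scala
object PaginationSketch {
  // Lazily yields pages until the loader returns an empty one.
  // `loadPage(offset, limit)` is a hypothetical stand-in for a database query.
  def pages[A](pageSize: Int)(loadPage: (Int, Int) => Seq[A]): Iterator[Seq[A]] =
    Iterator
      .from(0)
      .map(pageNumber => loadPage(pageNumber * pageSize, pageSize))
      .takeWhile(_.nonEmpty)

  // Handles one page at a time, so only one page of records is held at once.
  // Returns the number of records that were handled.
  def processAll[A](pageSize: Int)(loadPage: (Int, Int) => Seq[A])(handle: A => Unit): Int =
    pages(pageSize)(loadPage).foldLeft(0) { (done, page) =>
      page.foreach(handle)
      done + page.size
    }
}
```

Because `Iterator` is lazy, the next page is requested only after the previous one has been handled.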

So let's try to implement this approach, but for a more complex case.

Validation in Scala

This article is about possible solutions for validation in Scala. Validation is the process of checking input data in order to ensure that it is correct and complies with the requirements.
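To make the idea concrete before turning to libraries, here is a minimal hand-rolled sketch that uses only the standard `Either`. It does not use the API of any library mentioned in this article; the `SignUp` type and both rules are hypothetical.

```scala
object ValidationSketch {
  // Hypothetical input type used only for this illustration.
  final case class SignUp(email: String, age: Int)

  // Each rule returns the violations it found (an empty list means the rule passed).
  private def checkEmail(email: String): List[String] =
    if (email.contains("@")) Nil else List("email must contain '@'")

  private def checkAge(age: Int): List[String] =
    if (age >= 18) Nil else List("age must be at least 18")

  // Runs all rules and accumulates every violation instead of stopping at the first one.
  def validate(input: SignUp): Either[List[String], SignUp] = {
    val errors = checkEmail(input.email) ++ checkAge(input.age)
    if (errors.isEmpty) Right(input) else Left(errors)
  }
}
```

Validation libraries typically wrap this pattern in a DSL, so that rules can be declared and combined without writing the plumbing by hand.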


There are several libraries in Scala which can be used for validation:

  • Accord - Accord is a validation library written in and for Scala. Docs here
  • Skinny validator - skinny-validator is a portable library, so it is possible to use skinny-validator with Play2, Scalatra and any other web app frameworks. Docs here
  • DValidation - A little, opinionated Scala domain object validation toolkit
  • io.underscore.validation - Work-in-progress library demonstrating a functional programming approach to data validation in Scala

The source code for this article is here

In this article you can find information about using the t-digest library to measure the average value of some quantity (for example, average session time). It also answers the question: which should you use for such a measurement, the mean or the median, and why? In addition, a list and comparison of different implementations is presented below in the article.

Main problem

So in which cases do we need to calculate a mean or a median? For example, we have a site and we want to understand how much time an average user spends on it. To do this we should calculate the average duration of a user session. There are at least two ways to do it: calculate the arithmetic mean (or just the mean) or calculate the median. Calculating the mean is very simple. You need two fields: one for the sum of the elements and another for their count. But it doesn't work very well with anomalies in the data, that is, when one or several elements differ greatly from the others. Let's assume that we have the following values for our session durations (in milliseconds):

3000, 2000, 3000, 5000, 3000, 4000, 4500, 3200, 2700, 3380

mean = (3000+2000+3000+5000+3000+4000+4500+3200+2700+3380) / 10 = 3378 msecs. In this case everything is OK.

But what if one of these users opens the site, forgets to close the browser tab and goes AFK for an hour (3,600,000 msecs):

3000, 2000, 3000, 5000, 3000, 4000, 4500, 3200, 2700, 3600000

mean = (3000+2000+3000+5000+3000+4000+4500+3200+2700+3600000) / 10 = 363040 msecs. Just one of the users strongly influences the mean value. Generally speaking, the mean is only representative if the distribution of the data is symmetric; otherwise it may be heavily influenced by outlying measurements. In simple cases it is possible to use some kind of filter, but often we just don’t know what threshold to use to filter values. The median, in contrast, is the same in both cases and is equal to 3100. So in cases like this the median is more useful than the mean. However, calculating the exact median in the general case needs a lot of memory: O(n), because all the values have to be kept.
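The numbers above can be checked with a short sketch. It is a naive exact computation and not t-digest; the names `MeanMedianSketch`, `mean`, `median`, `normal` and `withOutlier` are made up for this illustration.

```scala
object MeanMedianSketch {
  // The mean needs only a running sum and a count.
  def mean(xs: Seq[Long]): Double = xs.sum.toDouble / xs.size

  // The exact median needs all values at once (sorted), hence O(n) memory.
  def median(xs: Seq[Long]): Double = {
    val sorted = xs.sorted
    val mid = sorted.size / 2
    if (sorted.size % 2 == 0) (sorted(mid - 1) + sorted(mid)) / 2.0
    else sorted(mid).toDouble
  }

  // Session durations from the two examples above, in milliseconds.
  val normal: Seq[Long] =
    Seq(3000L, 2000L, 3000L, 5000L, 3000L, 4000L, 4500L, 3200L, 2700L, 3380L)
  val withOutlier: Seq[Long] =
    Seq(3000L, 2000L, 3000L, 5000L, 3000L, 4000L, 4500L, 3200L, 2700L, 3600000L)
}
```

With ten values the median is the average of the fifth and sixth sorted values, which are 3000 and 3200 in both data sets, so the outlier does not move it.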

In this article we will look at the HLL algorithm and different implementations of it.

General Info

HLL (HyperLogLog) is a probabilistic algorithm used to estimate the number of unique values. You can get more details about HLL here. The main reason to use HLL is the need to count uniques in a very large amount of data when it is acceptable to sacrifice some accuracy of the unique counter.

List of implementations

You can find several implementations of HLL:

Next in this article we will take a close look at all these libs and answer the question: «Why should we use HLL?».

Lens in Scala

In this article let’s take a look at such a thing as a lens (or lenses). A lens is an abstraction from functional programming which helps with the problem of updating complex immutable nested objects like this:

case class User(id: UserId, generalInfo: GeneralInfo, billInfo: BillInfo)
case class UserId(value: Long)
case class GeneralInfo(email: Email,
                       password: String,
                       siteInfo: SiteInfo,
                       isEmailConfirmed: Boolean = false,
                       phone: String,
                       isPhoneConfirmed: Boolean = false)
case class SiteInfo(alias: String, avatarUrl: String, userRating: Double = 0.0d)
case class Email(value: String)
case class BillInfo(addresses: Seq[Address], name: Name)
case class Name(firstName: String, secondName: String)
case class Address(country: Country, city: City, street: String, house: String, isConfirmed: Boolean = false)
case class City(name: String)
case class Country(name: String)

If we want to increase userRating in this model, we have to write code like this:

val updatedUser = user.copy(
  generalInfo = user.generalInfo.copy(
    siteInfo = user.generalInfo.siteInfo.copy(
      userRating = user.generalInfo.siteInfo.userRating + 1
    )
  )
)

And we have to write the code below to confirm all of the addresses in BillInfo:

val updatedAddresses = user.billInfo.addresses.map(_.copy(isConfirmed = true))
val updatedUser = user.copy(
  billInfo = user.billInfo.copy(addresses = updatedAddresses)
)

If we increase the level of nesting in our structures, the amount of code like this grows considerably. In such cases lenses give a cleaner way to make changes in nested structures.
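To show the shape of the idea, here is a minimal hand-written lens. It is only a sketch and not the API of any lens library: the `Lens` class, its `modify` and `andThen` methods and the `...L` values are hypothetical names, and the model is a trimmed-down copy of the one above so that the example is self-contained.

```scala
object LensSketch {
  // Trimmed-down copies of the model above, kept small on purpose.
  final case class SiteInfo(alias: String, userRating: Double = 0.0d)
  final case class GeneralInfo(password: String, siteInfo: SiteInfo)
  final case class User(id: Long, generalInfo: GeneralInfo)

  // A lens is a getter plus a copying setter for one part A of a structure S.
  final case class Lens[S, A](get: S => A, set: (S, A) => S) {
    def modify(s: S)(f: A => A): S = set(s, f(get(s)))

    // Composes this lens with a lens that looks deeper into A.
    def andThen[B](next: Lens[A, B]): Lens[S, B] =
      Lens[S, B](
        s => next.get(get(s)),
        (s, b) => set(s, next.set(get(s), b))
      )
  }

  val generalInfoL = Lens[User, GeneralInfo](_.generalInfo, (u, g) => u.copy(generalInfo = g))
  val siteInfoL    = Lens[GeneralInfo, SiteInfo](_.siteInfo, (g, s) => g.copy(siteInfo = s))
  val userRatingL  = Lens[SiteInfo, Double](_.userRating, (s, r) => s.copy(userRating = r))

  // One composed lens replaces the nested copy calls.
  val ratingOfUser: Lens[User, Double] = generalInfoL.andThen(siteInfoL).andThen(userRatingL)
}
```

With this in place, increasing the rating becomes `ratingOfUser.modify(user)(_ + 1)`, and adding one more level of nesting costs one more small lens instead of one more nested `copy`.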

As was said in the previous article, there is no way to do truly fail-fast async error handling using only Scala or scalaz.

Look at the example below:

val longFut = longFuture() // very long future
val shortFut = shortFuture()
val failedFut = failedFuture() // throw new IllegalStateException("future is failed")

val result = for {
  long <- longFut
  short <- shortFut
  failed <- failedFut
} yield {
  long + " | " + short + " | " + failed
}

In that example we will wait for all the futures before we get the IllegalStateException, because a for-comprehension always handles futures in the order in which we define them. Scala translates the example above to this:

longFut.flatMap { long =>
  shortFut.flatMap { short =>
    failedFut.map { failed =>
      long + " | " + short + " | " + failed
    }
  }
}
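The waiting can be observed with a small timing sketch. It assumes the global execution context and simulates the long future with `Thread.sleep`; `SequentialFailureSketch` and `millisUntilFailure` are hypothetical names used only here.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object SequentialFailureSketch {
  // Returns how many milliseconds passed before the combined future reported the failure.
  def millisUntilFailure(longDelayMs: Long): Long = {
    val start = System.nanoTime()
    val longFut = Future { Thread.sleep(longDelayMs); "long" }
    // This future has already failed by the time the for-comprehension is built.
    val failedFut: Future[String] =
      Future.failed(new IllegalStateException("future is failed"))

    val result = for {
      long   <- longFut
      failed <- failedFut
    } yield long + " | " + failed

    // The failure surfaces only after longFut has completed.
    Try(Await.result(result, 10.seconds))
    (System.nanoTime() - start) / 1000000
  }
}
```

Even though `failedFut` is failed from the very beginning, the elapsed time is roughly the duration of the long future, because `flatMap` on `longFut` runs its body only after `longFut` completes.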

But it is possible to avoid this problem with the Expression library (link).

Many of us know about a library called scalaz. For those who don’t know, it is a library for functional programming in Scala. You can find it here. When I was trying to learn and understand this lib, it was quite difficult to see how exactly it can be used in real code in a real system. I looked through lots of articles about it, but there were only abstract examples. So I've decided to write a little example in order to show how scalaz can be used in a real system.


Idea: Future and scalaz.Either can be used as a result of an asynchronous operation.

Reason: We must compose futures and eithers in order to deal with possible errors which may occur during execution of async operations.

For example, we have to gather information from several different DBs and an external service like Twitter. We also don't want to use exceptions as notifications about errors. Why? Because it sucks ^_^ It is better if exceptions are used for something really exceptional. One more thing we should implement is fail-fast error handling, because it is wasteful to continue program execution once an error has already occurred.
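The composition idea can be sketched with the standard library alone. This sketch uses the standard `Either` as a stand-in for scalaz's either type, and the `Result` wrapper, the error types and the two loader functions are hypothetical. It only shows how errors travel as values and how later sequential steps are skipped after the first error.

```scala
import scala.concurrent.{ExecutionContext, Future}

object FutureEitherSketch {
  // Errors are plain values, not exceptions.
  sealed trait ServiceError
  final case class DbError(message: String) extends ServiceError
  final case class TwitterError(message: String) extends ServiceError

  // Minimal wrapper around Future[Either[...]] whose flatMap stops at the first Left.
  final case class Result[A](value: Future[Either[ServiceError, A]]) {
    def map[B](f: A => B)(implicit ec: ExecutionContext): Result[B] =
      Result[B](value.map(_.map(f)))

    def flatMap[B](f: A => Result[B])(implicit ec: ExecutionContext): Result[B] =
      Result[B](value.flatMap {
        case Right(a)  => f(a).value
        case Left(err) => Future.successful[Either[ServiceError, B]](Left(err)) // later steps are skipped
      })
  }

  // Hypothetical data sources: a DB lookup that succeeds and a Twitter call that fails.
  def loadName(id: Long): Result[String] =
    Result[String](Future.successful(Right(s"user-$id")))

  def loadTweetCount(name: String): Result[Int] =
    Result[Int](Future.successful(Left(TwitterError(s"no access for $name"))))

  def summary(id: Long)(implicit ec: ExecutionContext): Result[String] =
    for {
      name   <- loadName(id)
      tweets <- loadTweetCount(name)
    } yield s"$name has $tweets tweets"
}
```

The caller gets a `Left` with the first error and never has to catch an exception to find out what went wrong.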