Gathering effects in Tagless Final

March 18, 2024

Developers with Scala experience know how to work with side effects: find them, wrap each one in IO.delay{..} to get an effect, chain the effects together using Apply[F] or Monad[F], and run the resulting program as an IOApp. In this case all side effects are launched in the order in which they are chained in the resulting IO[..]. But there is another way of working with a certain type of effectful computation, one that can improve testability and control over how these computations are executed. So in this article I’d like to show how to gather these effects and process them later, after the logic itself has been executed in full.

Context Passing in Tagless Final

August 30, 2023

Context passing is something all programmers face regardless of the programming language they use. In this article I’d like to discuss how it can be solved in Scala using Cats Effect, ZIO, cats-mtl and the tagless final encoding.

Sum Data Types with Shapeless

April 16, 2018

Scala supports algebraic data types out of the box, but often this is not enough for complex cases. In this article I will try to show the limits of that support and how to bypass them with the shapeless library, using sum data types as an example.

HLL counters as Cassandra user defined aggregates

December 4, 2016

In the previous article we discussed using HLL counters to find the number of unique values in a collection. In this article we will see how to use them inside Cassandra itself, providing the same functionality through user defined functions (UDF) and aggregates (UDA).

Tips about variance in scala

November 26, 2016

Type variance in Scala is quite a tricky topic, especially if you do not use it often: the details slip out of mind easily. So below you can find very short tips about it, whose purpose is to remind you how it works.
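
As a quick reminder of the two annotations, here is a minimal sketch. The types Animal, Cat, Box and Printer are hypothetical and exist only for this illustration.

```scala
// Hypothetical types used only for illustration.
sealed trait Animal { def name: String }
final case class Cat(name: String) extends Animal

// Covariant in A: a Box[Cat] can be used where a Box[Animal] is expected.
final case class Box[+A](value: A)

// Contravariant in A: a Printer[Animal] can be used where a Printer[Cat] is expected.
trait Printer[-A] { def print(a: A): String }

object VarianceDemo {
  val catBox: Box[Cat] = Box(Cat("Tom"))
  val animalBox: Box[Animal] = catBox // compiles thanks to +A

  val animalPrinter: Printer[Animal] = new Printer[Animal] {
    def print(a: Animal): String = s"animal ${a.name}"
  }
  val catPrinter: Printer[Cat] = animalPrinter // compiles thanks to -A

  def describe(): String = catPrinter.print(catBox.value)
}
```

Producers of values (like Box) are natural candidates for +A, while consumers of values (like Printer) are natural candidates for -A.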

Pagination and Streams

April 10, 2016

In this article we will see how to use different streams (like akka-stream) for pagination and when this can be useful. The main idea of pagination is to partition a big sequence of objects into several parts (or pages) so that it can be processed page by page. For example, you have 1,000,000 users in the database and you need to send an email to all of them. You could try to load all user records into one big list and process it at once, but that would not be a memory-efficient approach. Instead you can partition the list of users into pages of 100 users each, load one page, send emails to the users on this page, load the next page and so on. This is a much more efficient way to deal with big collections of records.
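
The page-by-page idea can be sketched with a plain lazy Iterator from the standard library, before any streaming library is involved. Everything here is an assumption made for the example: loadPage stands in for a real database query, and sendEmails only counts the users it receives.

```scala
object PaginationSketch {
  final case class User(id: Long, email: String)

  // Hypothetical stand-in for a database query: returns one page of users.
  def loadPage(offset: Int, limit: Int, total: Int): Vector[User] =
    (offset until math.min(offset + limit, total))
      .map(i => User(i.toLong, s"user$i@example.com"))
      .toVector

  // Lazily walks the pages; a page is loaded only when the iterator asks for it.
  def pages(pageSize: Int, total: Int): Iterator[Vector[User]] =
    Iterator
      .iterate(0)(_ + pageSize)
      .map(offset => loadPage(offset, pageSize, total))
      .takeWhile(_.nonEmpty)

  // Hypothetical side effect: here it only counts the users it was given.
  def sendEmails(page: Vector[User]): Int = page.size

  // Processes all pages one after another and returns the number of emails "sent".
  def run(pageSize: Int, total: Int): Int =
    pages(pageSize, total).map(sendEmails).sum
}
```

With a page size of 100 and 250 users this walks three pages (100, 100 and 50 users) and never holds more than one page at a time.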

So let’s try to implement this approach, but for a more complex case.

Validation in scala

November 27, 2015

This article is about possible solutions for validation in Scala. Validation is the process of checking input data in order to ensure its correctness and compliance with requirements.

Implementations

There are several libraries in Scala which can be used for validation:

Accord - Accord is a validation library written in and for Scala. Docs here
Skinny validator - skinny-validator is a portable library, so it is possible to use skinny-validator with Play2, Scalatra and any other web app frameworks. Docs here
DValidation - A little, opinionated Scala domain object validation toolkit
io.underscore.validation - Work-in-progress library demonstrating a functional programming approach to data validation in Scala

The source code for this article is here

Using T-Digest: Median calculation and anomaly detection

October 13, 2015

In this article you can find information about using the t-digest library to measure the average value of some quantity (for example, average session time). It also answers the question of which measure to use for this, the mean or the median, and why. A list and comparison of different implementations is also presented below in the article.

Main problem

So in what cases do we need to calculate the mean or the median? For example, we have a site and we want to understand how much time an average user spends on it. To do that we should calculate the average duration of a user session. There are at least two ways to do it: calculate the arithmetic mean (or just mean) or calculate the median. The calculation of the mean is very simple. You need two fields: one for the sum of the elements and another for their count. But it doesn’t work very well with anomalies in the data, that is, the case when one or several elements differ greatly from the others. Let’s assume that we have the following values for our session durations (in milliseconds):

3000, 2000, 3000, 5000, 3000, 4000, 4500, 3200, 2700, 3380

mean = (3000+2000+3000+5000+3000+4000+4500+3200+2700+3380) / 10 = 3378 msecs. In this case everything is fine.

But what if one of these users opens the site, forgets to close the browser tab and goes afk for an hour (3,600,000 msecs):

3000, 2000, 3000, 5000, 3000, 4000, 4500, 3200, 2700, 3600000

mean = (3000+2000+3000+5000+3000+4000+4500+3200+2700+3600000) / 10 = 363040 msecs. Just one of the users strongly influences the mean value. Generally speaking, the mean is only representative if the distribution of the data is symmetric; otherwise it may be heavily influenced by outlying measurements. In simple cases it is possible to use some kind of filter, but often we just don’t know what threshold to use to filter values. The median, by contrast, is the same in both cases and is equal to 3100. So in cases like this the median is more useful than the mean. However, calculating the exact median in the general case needs a lot of memory - O(n).
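
The numbers above can be checked with a few lines of plain Scala. This is only a sketch of the exact calculation: the object and value names are made up for the example, and it does not use t-digest. Note that the exact median sorts a full copy of the data, which is where the O(n) memory comes from.

```scala
object MeanVsMedian {
  // Needs only a running sum and a count.
  def mean(xs: Seq[Long]): Double = xs.sum.toDouble / xs.size

  // Exact median: sorts a full copy of the data, hence O(n) memory.
  def median(xs: Seq[Long]): Double = {
    val sorted = xs.sorted
    val mid = sorted.size / 2
    if (sorted.size % 2 == 0) (sorted(mid - 1) + sorted(mid)) / 2.0
    else sorted(mid).toDouble
  }

  // Session durations in milliseconds, taken from the two lists above.
  val normal: Seq[Long] =
    Seq[Long](3000, 2000, 3000, 5000, 3000, 4000, 4500, 3200, 2700, 3380)
  val withAfk: Seq[Long] =
    Seq[Long](3000, 2000, 3000, 5000, 3000, 4000, 4500, 3200, 2700, 3600000)
}
```

Calling mean on the two lists gives 3378.0 and 363040.0, while median gives 3100.0 for both.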

Comparison of HLL implementations

September 22, 2015

In this article we will look at the HLL algorithm and its different implementations.

General Info

HLL is a probabilistic algorithm which is used to estimate the number of unique values. More details about HLL you can get here. The main reason to use HLL is the need to estimate uniques in a very large amount of data when it is acceptable to sacrifice some accuracy of the unique counter.

List of implementations

You can find several implementations of HLL:

twitter/algebird - a scala library from twitter which contains lots of different algorithms including HLL
prasanthj/hyperloglog - a detached java library for HLL
addthis/stream-lib - another java lib which has an implementation of HLL
aggregateknowledge/java-hll - a low-level java implementation of HLL

Next in this article we will take a close look at all these libs and answer the question: “Why should we use HLL?”.

Lens in scala

September 19, 2015

In this article let’s take a look at lenses. A lens is an abstraction from functional programming which helps with the problem of updating complex, immutable, nested objects like this:

case class User(id: UserId, generalInfo: GeneralInfo, billInfo: BillInfo)
case class UserId(value: Long)
case class GeneralInfo(email: Email,
                       password: String,
                       siteInfo: SiteInfo,
                       isEmailConfirmed: Boolean = false,
                       phone: String,
                       isPhoneConfirmed: Boolean = false)
case class SiteInfo(alias: String, avatarUrl: String, userRating: Double = 0.0d)
case class Email(value: String)
case class BillInfo(addresses: Seq[Address], name: Name)
case class Name(firstName: String, secondName: String)
case class Address(country: Country, city: City, street: String, house: String, isConfirmed: Boolean = false)
case class City(name: String)
case class Country(name: String)

If we want to increase userRating in this model, we have to write code like this:

val updatedUser = user.copy(
  generalInfo = user.generalInfo.copy(
    siteInfo = user.generalInfo.siteInfo.copy(
      userRating = user.generalInfo.siteInfo.userRating + 1
    )
  )
)

And we have to write the code below to confirm all of the addresses in BillInfo:

val updatedAddresses = user.billInfo.addresses.map(_.copy(isConfirmed = true))
val updatedUser = user.copy(
  billInfo = user.billInfo.copy(addresses = updatedAddresses)
)

If we increase the level of nesting in our structures, the amount of code like this grows considerably. In such cases lenses give a cleaner way to make changes in nested structures.
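
To make the idea concrete, here is a minimal hand-rolled sketch of a lens. It is not the API of any particular lens library: the Lens class, its modify and andThen methods, and the simplified model (most fields of the case classes above are omitted) are all assumptions made for this illustration.

```scala
object LensSketch {
  // A minimal, hand-rolled lens: a getter plus an immutable setter.
  final case class Lens[S, A](get: S => A, set: (S, A) => S) {
    // Applies f to the focused value and returns an updated copy of s.
    def modify(s: S)(f: A => A): S = set(s, f(get(s)))

    // Composes this lens with a lens that looks deeper into A.
    def andThen[B](inner: Lens[A, B]): Lens[S, B] =
      Lens[S, B](
        s => inner.get(get(s)),
        (s, b) => set(s, inner.set(get(s), b))
      )
  }

  // Simplified versions of the model above (most fields omitted).
  final case class SiteInfo(alias: String, userRating: Double = 0.0d)
  final case class GeneralInfo(email: String, siteInfo: SiteInfo)
  final case class User(id: Long, generalInfo: GeneralInfo)

  // One small lens per level of nesting.
  val generalInfo: Lens[User, GeneralInfo] =
    Lens[User, GeneralInfo](_.generalInfo, (u, g) => u.copy(generalInfo = g))
  val siteInfo: Lens[GeneralInfo, SiteInfo] =
    Lens[GeneralInfo, SiteInfo](_.siteInfo, (g, s) => g.copy(siteInfo = s))
  val rating: Lens[SiteInfo, Double] =
    Lens[SiteInfo, Double](_.userRating, (s, r) => s.copy(userRating = r))

  // The nested path User -> GeneralInfo -> SiteInfo -> userRating as one lens.
  val userRating: Lens[User, Double] = generalInfo.andThen(siteInfo).andThen(rating)

  def bumpRating(user: User): User = userRating.modify(user)(_ + 1)
}
```

The nested copy calls are written once, inside the small lenses, and every later update goes through the composed userRating lens.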

True fail-fast async error handling with Expression

September 12, 2015

As was said in the previous article, there is no way to do truly fail-fast async error handling using only Scala or scalaz.

Look at the example below:

val longFut = longFuture() // very long future
val shortFut = shortFuture()
val failedFut = failedFuture() // throw new IllegalStateException("future is failed")

val result = for {
  long <- longFut
  short <- shortFut
  failed <- failedFut
} yield {
  long + " | " + short + " | " + failed
}

In that example we will not see the IllegalStateException until longFut and shortFut have completed, because a for-comprehension always handles futures in the order in which we define them. Scala translates the example above to this:

longFut.flatMap { long =>
  shortFut.flatMap { short =>
    failedFut.map { failed =>
      long + " | " + short + " | " + failed
    }
  }
}

But it is possible to avoid this problem with the Expression library (link)

Practical Scalaz: Make async operations with scalaz.Either and Futures

September 5, 2015

Many of us know about the scalaz library. For those who don’t, it is a library for functional programming in Scala. You can find it here. When I was trying to learn and understand this lib, it was quite difficult to see how exactly it can be used in real code in a real system. I looked through lots of articles about it, but there were only abstract examples. So I’ve decided to write a little example in order to show how scalaz can be used in a real system.

Intro

Idea: Future and scalaz.Either can be used as a result of an asynchronous operation.

Reason: We must compose futures and eithers in order to deal with possible errors which may occur during execution of async operations.

For example, we have to gather information from several different DBs and an external service like Twitter. We also don’t want to use exceptions as notifications about errors. Why? Because it sucks ^_^ It is better if exceptions are used for something really exceptional. One more thing we should implement is fail-fast error handling, because it is wasteful to continue program execution once an error has already occurred.
