Over the past few decades, the vast majority of institutions and companies have had to confront the reality of Big Data, which created an urgent need for processing platforms capable of storing and analyzing such vast amounts of data. This is why Hadoop, and later [Spark](/spark-consulting/) around 2008, came into the picture.
On one of our client's projects, we were confronted with high-volume data streams and a large number of reports for the real estate market. More specifically, the client faced a tough scalability problem: property market reports generated from such a large data set took up to 3 hours to produce (for just 100 markets). Worse, this time kept growing, as a few million new records were added to the data set each day. To resolve the problem, the client decided to invest in a new system architecture.
We recently had the opportunity to design and develop an independent machine learning-based service for a social publishing and e-learning platform. The client needed a service that would deliver a recommendation system with automatic content classification. In this post I would like to share some background on the work.