The term Elasticsearch is never far from the headlines, and usually for the wrong reasons. It seems every week brings a new story about a breached Elasticsearch server, and these breaches often leave data exposed. But why are so many security breaches linked to Elasticsearch databases, and how can companies take full advantage of this technology while preventing data leaks?
To answer these questions, one must first understand what Elasticsearch is. Elasticsearch is an open source search and analytics engine and data store developed by Elastic.
Whether an organization holds a thousand or a billion discrete pieces of information, Elasticsearch lets it search through large amounts of data and perform calculations on the fly. Elastic offers Elasticsearch as a managed cloud service, but companies can also run it on their own infrastructure or on another cloud provider's platform.
Companies then use the platform to store their information in indices (often loosely called buckets in breach reports). These indices can contain emails, spreadsheets, social media posts, and files.
In 2020 alone, cosmetics giant Avon exposed 19 million records through an unsecured Elasticsearch database. Another misconfigured bucket, involving the online genealogy service Family Tree Maker, exposed over 25 GB of sensitive data. The same thing happened to sporting goods giant Decathlon, which leaked 123 million records. Then more than five billion records were exposed after yet another Elasticsearch database was left unprotected. Notably, it contained an extensive collection of previously breached user information from 2012 to 2019.
As these incidents show, organizations that choose cloud-based databases must also perform the necessary due diligence to configure and secure every corner of the system. Clearly, this step is often overlooked or simply ignored. One security researcher even measured how long it would take hackers to find, attack, and exploit an unprotected Elasticsearch server that was deliberately exposed online: eight hours was all it took.
Digital transformation has changed the way modern businesses think, and the cloud is widely seen as a technology that must be adopted. Cloud technologies certainly have their advantages, but misusing them can have serious consequences. Failing to understand, or choosing to ignore, the security implications of this technology can put the entire business at risk.
It is therefore important to recognize that with Elasticsearch, basic security recommendations and configurations cannot be skipped just because the product is freely available and highly scalable. With data widely celebrated as the new gold, the demand to monetize it has never been greater. For some companies, privacy and security have clearly played second fiddle as they do their best to capitalize on this data gold rush.
Is there only one attack vector through which a server can be breached? Not really. In truth, a server’s contents can be leaked in several ways: a stolen password, attackers infiltrating systems, or an insider exfiltrating data from the protected environment. Most often, however, a leak occurs because a database is left online unsecured (sometimes without even a password), so anyone who finds it can access the data. When that happens, it reveals a poor understanding of Elasticsearch’s security features and of what is expected of organizations in protecting sensitive customer data. This may stem from the common misconception that responsibility for security automatically transfers to the cloud service provider. That assumption is false and often results in misconfigured or underprotected servers. Cloud security is a shared responsibility between the company’s security team and the cloud service provider. At a minimum, however, the organization itself is responsible for performing the necessary due diligence to properly configure and secure every corner of the system and to minimize potential risks.
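To make the "left online without a password" scenario concrete, here is a minimal sketch in Python, using only the standard library, that sends one unauthenticated request to an Elasticsearch HTTP endpoint and reports whether the cluster answered. It assumes the default HTTP port 9200 and that a cluster with security enabled rejects anonymous requests with HTTP 401 (unless anonymous access has been deliberately configured). The function names `classify_status` and `probe` are hypothetical, and the check should only be run against systems you are authorized to test.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated request to a verdict."""
    if 200 <= status < 300:
        # The cluster served data to a request that carried no credentials.
        return "EXPOSED: cluster answered without credentials"
    if status in (401, 403):
        # Assumption: security is enabled and anonymous access is not granted.
        return "protected: credentials required"
    return f"inconclusive: HTTP {status}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one GET request without credentials to `url` and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses; the code is in err.code.
        return classify_status(err.code)
    except OSError as err:
        # URLError, refused connections, and timeouts are all OSError subclasses.
        return f"unreachable: {err}"
```

For example, `probe("http://localhost:9200/")` checks a local cluster on the default port. A cluster that serves HTTPS with a self-signed certificate will fail certificate verification here and be reported as unreachable rather than exposed, so treat that result as inconclusive.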
To effectively prevent Elasticsearch (or similar) data breaches, a different approach to data security is required, one that keeps data protected a) wherever it resides and b) regardless of who manages it on the organization’s behalf. For this reason, a data-centric security model is more suitable, as it lets a company keep data protected while still using it for analysis and data sharing on cloud-based resources.
Standard encryption-based security is one way to do this. However, encryption methods can impose a complicated administrative burden for key management. In addition, weak or outdated encryption algorithms can be cracked. Tokenization, on the other hand, is a data-centric security method in which confidential information is replaced by non-sensitive substitute values called tokens. This means that even if the data falls into the wrong hands, no meaningful information can be derived from the tokens. Sensitive information remains protected, so threat actors cannot profit from breaches and data theft.
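The idea of replacing values with tokens can be illustrated with a minimal, in-memory Python sketch of vault-based tokenization: each sensitive field is swapped for a random token, and the mapping back to the real value lives only in a separate vault. The `TokenVault` class and `tokenize_record` helper are hypothetical names for this example. A production system would keep the vault in hardened, access-controlled storage separate from the Elasticsearch cluster, and vaultless tokenization schemes also exist.

```python
import secrets


class TokenVault:
    """Minimal in-memory token vault (hypothetical, for illustration only).

    Each sensitive value is replaced by a random token that carries no
    information about the original; only the vault can map it back.
    """

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so equal inputs map to equal tokens,
        # which keeps exact-match lookups on tokenized data possible.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(16)  # 128 random bits
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for tokens the vault never issued.
        return self._token_to_value[token]


def tokenize_record(vault: TokenVault, record: dict[str, str],
                    sensitive: set[str]) -> dict[str, str]:
    """Return a copy of `record` with the fields named in `sensitive` tokenized."""
    return {
        key: vault.tokenize(val) if key in sensitive else val
        for key, val in record.items()
    }


vault = TokenVault()
customer = {"name": "Jane Doe", "email": "jane@example.com", "country": "DE"}
indexed = tokenize_record(vault, customer, {"name", "email"})
# `country` stays readable; `name` and `email` become opaque "tok_..." strings.
print(indexed)
```

Only the tokenized record would be sent to the search index, so a leaked index would reveal tokens rather than names or email addresses. Returning the same token for repeated values keeps exact-match searches working but reveals which records share a value; issuing a fresh token each time avoids that, at the cost of those lookups.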
With the GDPR and the new wave of similar data protection and security laws, consumers have a better idea of what to expect when they share sensitive information with vendors and service providers. Protecting data is therefore more important than ever. Had techniques such as tokenization been used to mask the information in many of these Elasticsearch server leaks, criminal threat actors could not have deciphered the data: the underlying information would not have been compromised, and the organizations responsible would have remained compliant and avoided liability.
This is a lesson for everyone who handles data: anyone who believes their data is secure simply because it is “hidden in plain sight” in an “anonymous” cloud resource should treat the series of breaches involving Elasticsearch and other cloud service providers as a wake-up call to act now. Nobody wants to deal with the fallout when a real alarm bell rings!