Using Big Data Analytics When Large-Scale Litigation Stakes are High - Part 2
As Roger Cohen observed recently in the New York Times, "Technology is double-edged. It may produce so much information that it hides the truth." (Please see: http://imcmsimages.mediacorp.sg/CMSFileserver/documents/006/PDF/20110514/1405NYP002.pdf.) In the olden days of large-scale, paper-based litigation in the US, hiding the truth in mountains of paper was a generally accepted tactic among major city firms. As a rule, however, if my experience as a litigation legal assistant in San Francisco in the late 1970s and early 1980s was any indication, the legal teams knew the truth, at least for their own clients.
Electronically stored information (ESI) in the form of emails held on enterprise physical and virtual servers and in the cloud, text messages, voicemails, tweets, Facebook walls, blogs, smartphone data, etc., has made knowing the truth in large-scale litigation a Herculean task. As the mountains of paper have transmogrified into vast oceans of data, legal teams now often flounder amidst the massive swells of a perfect storm in their expeditions to discover the truth, which they must inevitably find if they are to provide wise counsel to the organisations they serve. So, in part 1 of my blog, I asked the question: "Is it time to consider the use of Big Data Analytics in large-scale litigation, employing technologies such as Hadoop, which is based upon the distributed data processing techniques that Google invented, called MapReduce?" (http://inforiskawareness.co.uk/using_big_data_analytics_when_largescale_litigation_stakes_are_high_part_1/)
The appeal of Big Data Analytics to global organisations resides in its versatility, its flexibility, and its relatively low costs. The same technology that predicts customer purchasing behaviour can detect subtle anomalies that reveal signs of internal fraud and bribery. Or again, the same technology that indicates the gradation of customer attitudes towards a company's products can simultaneously identify growing discontent among consumers that would normally signal the likelihood of substantial impending claims.
What makes this versatility possible is the flexibility of Big Data Analytics, enabling users to slice and dice raw data in an infinite number of ways in order to solve a business problem or to analyse a legal issue. As Big Data experts Ron Bodkin and Rick Farnell have pointed out, "Storing data in its raw format and holding on to it for future analysis was possible a couple years ago, but it was painfully expensive and time consuming. The most common practice at the time was to pre-compute specific summaries in a data warehouse to answer questions that were anticipated in advance, often at great investment of time and money." The authors go on to say that thanks to organisations such as Facebook, which, among others, "has invested in building the open source Hadoop distributed storage and processing system, the rest of the world now has the ability to store, access, and process raw data for a relatively small price with unprecedented scalability and flexibility." (Please see "Predictive Analytics Alone is Not the Answer," IQT Quarterly, Spring 2011, Vol. 2 No.4, pp. 5-8, http://www.iqt.org/).
Multinational businesses can, therefore, now retain massive quantities of source data stored online in Hadoop clusters, as opposed to archiving it on tapes, for their business analyst and legal teams to interrogate according to the requirements of the moment. As one commentator has observed: "The mining process . . . lets users create relationships between data, while mining for other information that is applicable for an e-discovery request. For example, data that is not normally related can be retrieved using ad hoc queries that build temporary relationships to combine filtered data sets. This allows an administrator to gather all information pertinent to a particular customer (including VoIP recordings, emails, IMs, documents, spreadsheets and so on) in a matter of minutes by leveraging the power of Big Data Web analytics." (Please see: http://itknowledgeexchange.techtarget.com/it-compliance/weighing-the-balance-of-big-data-web-analytics-and-compliance/) As another sign of its versatility and flexibility, organisations can also use Big Data Analytics for event processing "to respond to incoming interactions within milliseconds . . . to flag possible fraud," or to determine security risks (Please see "Predictive Analytics Alone is Not the Answer," IQT Quarterly, Spring 2011, Vol. 2 No.4, pp. 5-8, http://www.iqt.org/).
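To make the idea of an ad hoc query that builds a temporary relationship between otherwise unrelated data a little more concrete, here is a minimal sketch in plain Python of the map, shuffle and reduce pattern on which Hadoop is based. It is an illustration only, not Hadoop code: the records, the field names (source, customer, text) and the function names are all hypothetical, and a real cluster would run the same steps in parallel across many servers.

```python
from collections import defaultdict

# Hypothetical raw records from unrelated sources; every field name and
# value here is illustrative and does not come from any real system.
RECORDS = [
    {"source": "email", "customer": "acme", "text": "Re: delayed shipment"},
    {"source": "im", "customer": "acme", "text": "call me about the invoice"},
    {"source": "voip", "customer": "globex", "text": "recording-0012.wav"},
    {"source": "document", "customer": "acme", "text": "contract_v3.doc"},
    {"source": "email", "customer": "globex", "text": "Quarterly pricing"},
]


def map_record(record):
    """Map step: emit a (key, value) pair keyed by customer."""
    yield record["customer"], (record["source"], record["text"])


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_customer(customer, values):
    """Reduce step: collect one customer's items, grouped by source type."""
    by_source = defaultdict(list)
    for source, text in values:
        by_source[source].append(text)
    return customer, dict(by_source)


def gather(records, wanted_customer):
    """Run map, shuffle and reduce, then return one customer's material."""
    pairs = (pair for record in records for pair in map_record(record))
    grouped = shuffle(pairs)
    results = dict(reduce_customer(c, v) for c, v in grouped.items())
    return results.get(wanted_customer, {})


if __name__ == "__main__":
    print(gather(RECORDS, "acme"))
```

The relationship between the emails, instant messages and documents exists only for the duration of the query: nothing in the raw records had to be modelled in advance, which is the flexibility the commentator describes.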
The relatively low cost of the capabilities enumerated above derives from the fact that Hadoop clusters can be built from commodity servers that stream reads from their disks at wire speed. This offers an extraordinary return on investment (ROI) for organisations that prefer a safe bet to rolling the dice when the large-scale litigation stakes are high.