While big data is not a new concept, it’s gaining more significance as IT embraces new ways to generate and leverage it. Mobility, cloud, social, analytics, and IoT are all important initiatives that are driving the production and consumption of big data. Therefore, IT needs to take big data management seriously. The challenge is to find a way to manage big data that does not create a big problem. Big data involves ingesting vast quantities of data and then leveraging this data to drive competitive advantage. The volume and velocity of big data make manual management impractical. However, it is possible to logically simplify big data into the following three activities:
- Identify key big data use cases.
- Establish how analytics will support these use cases.
- Acquire and manage the data necessary to fuel the analytics.
These three activities form a closed-loop system, and an enterprise can enter this system at any point. Regardless of the entry point, however, all three activities must be addressed in a comprehensive way. Because any system is constrained by its weakest link, each activity needs a well-conceived approach that will stand up to a combination of automation, scale, complexity, and security requirements.
The least glamorous aspect of big data is data acquisition and management. However, given the complexity and importance of data integration, governance, and security, big data management is a difficult activity and one that is often not well understood. This makes big data management a perfect candidate for a platform-based solution. Data ingestion, data quality, data governance, and data security are all thorny issues for enterprises, but they must be addressed before an enterprise can compete effectively in the big data game. Therefore, it is recommended that enterprises look to a vendor to provide them with a big data management platform. One of the difficulties is that the term “big data management” is often used to describe any collection of capabilities that reside in the big data domain. Consequently, enterprises are advised to first consider their use cases and then look at which tools best support their data acquisition and management needs, taking into account the analytics that will be applied. This approach ensures that the data needed to drive the analysis is identified before any investment in tools, and that the less glamorous but more important tasks of integration, quality, governance, and security are well defined before data acquisition commences.
Informatica, which is perhaps the best-known independent leader in data integration, has just announced its big data management platform. Informatica has done a commendable job of addressing data integration, quality, governance, and security for big data environments with this platform. Here’s an overview of what the platform includes:
- Data Integration. 200+ prebuilt data connectors (Hadoop, NoSQL, MPP appliances, …), support for real-time streaming data, 100+ data integration and data quality transformations, dynamic mappings & schema support, data pipelines, provisioning templates, and workload optimization.
- Data Quality and Governance. Tools that enable analysts and data stewards to collaborate with IT, data profiling and discovery capabilities, relationship discovery, a universal metadata catalog, scalable data quality processes, and end-to-end data auditing and analysis.
- Big Data Security. Automated discovery of sensitive data and its accessibility, visualization and reporting of sensitive data, sensitive data exposure analysis, and data de-identification for specific application classes and activities.
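To make the security capabilities in the list above more concrete, here is a minimal Python sketch of what sensitive data discovery followed by de-identification can look like in principle. It is an illustration only and does not represent Informatica's product or API: the pattern set, the function names `discover_sensitive` and `deidentify`, the salt value, and the sample records are all assumptions made for this example.

```python
import hashlib
import re

# Hypothetical detection rules; a production tool would use far richer ones.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def discover_sensitive(records):
    """Return {field name: set of pattern names detected in that field}."""
    findings = {}
    for record in records:
        for field, value in record.items():
            if not isinstance(value, str):
                continue  # only string values are scanned in this sketch
            for name, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.setdefault(field, set()).add(name)
    return findings


def deidentify(records, sensitive_fields, salt="demo-salt"):
    """Return copies of the records with sensitive fields replaced by a
    truncated, salted SHA-256 token. The input records are not modified."""
    masked = []
    for record in records:
        clean = dict(record)
        for field in sensitive_fields:
            if clean.get(field) is not None:
                raw = (salt + str(clean[field])).encode("utf-8")
                clean[field] = hashlib.sha256(raw).hexdigest()[:12]
        masked.append(clean)
    return masked


# Made-up sample data for illustration.
records = [
    {"id": 1, "contact": "ana@example.com", "note": "renewal due"},
    {"id": 2, "contact": "bob@example.org", "note": "SSN 123-45-6789 on file"},
]

findings = discover_sensitive(records)        # which fields look sensitive
masked = deidentify(records, findings.keys())  # mask every flagged field
```

Note that this sketch masks a flagged field in every record, including rows where the field happens to contain nothing sensitive. That is a deliberately conservative choice; a real platform would apply policy by application class and activity, as the list above describes.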
Because it’s unusual to find such a comprehensive collection of services for big data management, Informatica’s platform is a good starting point for enterprises looking for a big data management reference architecture. Informatica’s big data management platform is also cloud-based, which means that enterprises do not have to go all-in on the services provided. This allows enterprises to pursue a soft-start strategy and ramp up as requirements dictate. Informatica’s big data management platform is intended to extend its leadership position in managing structured data into the unstructured domain, and in this regard we believe that Informatica has succeeded.