The era of “big data” has arrived, and this is not new information. Over the last five to 10 years, the amount of data stored and the diversity of file formats have increased dramatically as companies fully adopt new digital communication and information technologies.

Just how big is that explosion? IBM estimates that each day, another 2.5 quintillion bytes of data are added to the global storage load. The growth is exponential: 90% of all data in the world was created in the last two years.

Midstream oil and gas companies have long been familiar with data governance, but over the years the scope of information that must be managed has broadened to include a larger variety of structured data as well as unstructured files, documents and communications. Not only are there more kinds of information; individual files are also getting larger. In midstream businesses, increased regulation and the drive for operational efficiency have resulted in more system-wide sensors delivering more real-time information to operations for monitoring, management and maintenance purposes.

Structured data

Traditional file storage systems hold structured information such as Word documents, presentations and spreadsheets, as well as information inside corporate databases like enterprise resource planning and customer relationship management systems. Not only are there more sensors; they are also more sensitive and capable of collecting more data.

Implemented originally to ensure efficient and effective real-time operations, whether processing or transporting petrochemicals, this data must now be stored and made accessible within a reasonable time frame. Maintenance logs and inspection data must be stored against future liability risk. While structured data are easier to search than unstructured data because they are, by definition, sortable by keyword criteria or search terms, the sheer volume is becoming daunting.

Searching terabytes of data, even when structured, can be very time-consuming without the proper indexing tools. Companies are also creating retention policies based on corporate and compliance requirements to reduce the data footprint.
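To make the indexing point concrete, the Python sketch below builds a tiny inverted index over a handful of invented maintenance-log records. The record contents and field layout are hypothetical, and a production system would use a dedicated search engine rather than this hand-rolled structure.

```python
from collections import defaultdict

# Hypothetical records: (record_id, text) pairs from maintenance logs.
records = [
    (1, "pump station 7 pressure anomaly logged during inspection"),
    (2, "routine inspection of valve assembly at mile marker 112"),
    (3, "pressure sensor recalibrated at pump station 7"),
]

# Build an inverted index: each term maps to the set of record IDs containing it.
index = defaultdict(set)
for record_id, text in records:
    for term in text.lower().split():
        index[term].add(record_id)

def search(*terms):
    """Return IDs of records containing every search term (AND semantics)."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()

print(search("pump", "pressure"))  # {1, 3}: two lookups, no full scan
```

A linear scan touches every byte of every record on each query; the index answers each term with a single dictionary lookup, which is what makes terabyte-scale search tractable.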

Unstructured data

Some 80% of the corporate information being generated is unstructured: email, instant messages, audio files, videos, web pages, blog posts, Facebook pages and many other new forms of information that relate to the company.

In the midstream, new security systems introduce high data volumes. For example, surveillance camera footage must be stored for quality assurance and e-discovery purposes. Mapping and geographic information systems require increased storage and classification systems. Project files, computer-aided design drawings and complex 3-D models of processes and infrastructure also require attention. The increased availability of fiber-optic cable systems means thousands of miles of pipeline are now lined with optical networks and monitoring devices that provide real-time data on resource delivery, environmental conditions and possible threats to the integrity of the line.

Developing a retention policy

Not all data are equal. The first question is which data are valuable to the company and to what degree: performance, risk mitigation, e-discovery, compliance and so on. The next question is how to categorize the information you need to keep so that it is easily retrievable.

There is no simple answer. From a data-governance point of view, information technology (IT) professionals are struggling to keep up with the onslaught of new data: budgets and staff are constantly squeezed while the costs associated with data management rise.

One way to limit data storage costs is to develop a comprehensive data retention policy. In addition to allowing companies to make better data storage investments, it allows oil and gas organizations to respond to the evolving legal environment surrounding e-discovery of electronically stored information.

A retention policy comprises three key components:

1. Data retention and classification strategy

This critical step allows a company to define what information it needs to keep and how long it must remain accessible. Regulatory requirements play a vital role here; however, there may be reasons beyond regulation for a company to keep data. Big data can make operations more efficient. For example, companies may wish to keep historical pipeline sensor information for performance reasons and to feed predictive failure analyses. A robust data retention policy weighs legal and privacy concerns against economic and availability concerns to determine archival rules, data accessibility, data formats, retention time and data security, as the sketch below illustrates.
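As a rough sketch of how such a policy might be expressed in code, the Python below maps invented data categories to retention rules and decides a record’s disposition. The categories, retention periods and formats are hypothetical, not regulatory guidance; real periods come from regulators and legal counsel.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RetentionRule:
    retain_years: int    # how long the data must remain accessible
    archive_format: str  # required storage format
    restricted: bool     # whether access is limited for security or privacy

# Hypothetical rules; real retention periods come from regulators and counsel.
RULES = {
    "pipeline_sensor":   RetentionRule(10, "parquet", False),
    "inspection_report": RetentionRule(7, "pdf", False),
    "employee_email":    RetentionRule(3, "eml", True),
}

def disposition(category: str, created: date, today: date) -> str:
    """Decide whether a record must be retained or is eligible for disposal."""
    rule = RULES[category]
    # Approximate the retention window; ignores leap-day edge cases.
    expires = created + timedelta(days=365 * rule.retain_years)
    return "retain" if today < expires else "eligible_for_disposal"

print(disposition("inspection_report", date(2005, 6, 1), date(2013, 1, 1)))
# -> eligible_for_disposal (older than the hypothetical seven-year rule)
```

Encoding the rules as data rather than scattering them through application logic makes them easy to review with counsel and easy to audit later.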

2. Audit strategy

Companies must be able to demonstrate that the systems and procedures they have developed to store and protect data are actually in use and fulfill the data retention and classification strategy.
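A minimal sketch of such an audit check, continuing the hypothetical categories above, might simply walk the storage inventory and flag records that are unclassified or past their retention period but never disposed of. The metadata layout here is invented for illustration.

```python
from datetime import date

# Hypothetical record metadata, as a storage audit might see it.
inventory = [
    {"id": "r1", "category": "inspection_report", "created": date(2004, 3, 2)},
    {"id": "r2", "category": None,                "created": date(2012, 8, 9)},
]

RETAIN_YEARS = {"inspection_report": 7, "pipeline_sensor": 10}

def audit(records, today):
    """Flag records that would fail an audit of the retention policy."""
    findings = []
    for r in records:
        if r["category"] not in RETAIN_YEARS:
            findings.append((r["id"], "unclassified record"))
            continue
        age_years = (today - r["created"]).days / 365.25
        if age_years > RETAIN_YEARS[r["category"]]:
            findings.append((r["id"], "past retention period, never disposed"))
    return findings

for record_id, issue in audit(inventory, date(2013, 1, 1)):
    print(record_id, "->", issue)
# r1 -> past retention period, never disposed
# r2 -> unclassified record
```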

3. Business continuity

A data retention policy must also demonstrate that, in the event of a storage system failure caused by natural disaster or other reasons, the data will not be lost. Double redundancy, co-location of information, offsite storage and hosted storage solutions are IT strategies that help companies retain their ability to function despite catastrophe.
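To illustrate, a continuity check can be as simple as verifying that every data class is replicated to at least two distinct sites. The class names and site names below are hypothetical.

```python
# Hypothetical map of where each data class is currently replicated.
replicas = {
    "pipeline_sensor":   ["datacenter_houston", "colo_dallas"],
    "inspection_report": ["datacenter_houston"],  # single copy: a continuity gap
    "employee_email":    ["datacenter_houston", "hosted_cloud"],
}

MIN_COPIES = 2  # the double-redundancy target from the continuity plan

def continuity_gaps(replica_map, min_copies=MIN_COPIES):
    """Return data classes stored at fewer than the required number of sites."""
    return {cls: sites for cls, sites in replica_map.items()
            if len(set(sites)) < min_copies}

print(continuity_gaps(replicas))
# -> {'inspection_report': ['datacenter_houston']}
```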

E-discovery, compliance and regulatory

Even routine regulatory requests can be onerous if no clear data classification and retention policies exist. Think for a minute—what would happen if the U.S. Environmental Protection Agency (EPA) came knocking on your door tomorrow with an information request?

Many legal departments do not yet fully grasp all EPA regulations and how they apply across their lines of business and various departments. Even if they do, they may have no formal way to convey this information to their IT group. Companies are already challenged to identify and communicate which parts of their enormous data volume are needed for regulatory, legal, environmental and strategic market requirements.

If a pipeline fails and causes damage to a third party or the environment, or causes death, several agencies may scrutinize that failure. The Federal Trade Commission will investigate any supply disruption. State departments of transportation will look at pipeline safety issues. The Occupational Safety and Health Administration investigates injuries or deaths of employees; law enforcement investigates injury or death of third parties. The EPA will look at environmental issues. All of these agencies have different compliance and data retention requirements.

In Canada, the National Energy Board lists comprehensive records retention requirements for different kinds of data. In the U.S., regulation 49 CFR 195.404 addresses records retention requirements for liquid pipelines.

Failure to produce records may not only be a violation in itself, but might also raise questions about whether the company destroyed the data because it had something to hide. If a lawsuit is filed, issues regarding the destruction of evidence are likely, along with concerns about records subject to common law requirements and to regulations like the Sarbanes-Oxley Act in the U.S.

Also important is a company’s ability to classify and protect proprietary information that is not subject to search. This classification must be done in an auditable fashion, but when done properly it can protect the organization.

A team effort

Navigating the big data challenge requires a team approach. The simple fact is that developing a solid data classification and retention policy is not just an IT exercise. Legal, compliance and line-of-business executives must lead the development of the strategy. Legal teams usually want to implement complex data governance rules based on many different criteria; end users typically don’t want to lose access to any of their data; and IT departments want to make the policies as simple and efficient as possible to administer.

The rise of hosted data governance systems presents an entirely new way to develop these policies and manage information without taking on the burden of buying and maintaining servers in-house. Many pipeline companies have put programs in place to eliminate excess data. Their goals are to significantly reduce storage and other information management costs while improving their ability to comply with regulations, respond to requests for legal holds and use high-value business information effectively. In this era of big data, hosted solutions, which can easily scale to meet volume requirements, will no doubt play a growing role in data governance.

These are just a few of the considerations oil and gas companies will weigh when evaluating data governance solutions. The important first step is developing a data management policy that will not only protect your business against legal and continuity risks, but also reduce costs and improve corporate performance.

Chris Grossman is senior vice president of enterprise applications at Rand Worldwide, where he manages the Enterprise Applications division, including the Rand Secure Data division.