(provoked by frustration with solipsists) In 3 years, the number of bytes of data generated by cameras, phones, business systems and other devices will equal the number of grains of sand on the world's beaches. While most of it will be consumer-driven, more than half is expected to cross corporate networks. Even if one considers only business data, there will be a lot of it: Wal-Mart, for example, generates 1TB of transactional data a day.
Data management strategies are going to have to improve to deal with business policies, security and compliance issues. Today the average organisation can't even provide a high-level map of the data it deals with; the data is buried in silos (often maintained by a technical clergy devoted to its own arcane way of dealing with data, e.g. ER modelling, documents, multimedia etc.).
You can't manage what you can't see, and even if you can see it, that doesn't necessarily mean you understand it. Few organisations have a real handle on the relationship between business information/data and the underlying ICT systems that implement it (in its many forms). Worse, if they did have visibility into the data, they would often discover that they have several different systems handling the same data, an increasingly diverse set of internal and external stakeholders, and complex governance and compliance issues.
The end result of all this confusion is that most CIOs can't have a candid conversation with their business counterparts about the real issues associated with managing today's data, or with dealing with more of it. The typical response is to try frantically to find out where the data is (e.g. in applications: SAP, Oracle, bespoke; in repositories: file, data, document, content, image etc.) and then to see how hard it is to access it, find it, archive it, change it, aggregate it, report on it etc.
Why hold data in an enterprise knowledge base
Some reasons for holding data in an enterprise knowledge base are to understand:
- how the data relates to the different aspects of the enterprise, e.g. which data are associated with which processes, services, objects etc. (for impact analysis, completeness checking, benchmarking against reference models etc.);
- where and in what form the data has been implemented, e.g. in which systems and in what form (SQL, XML, file, image or other);
- what the business concept is (i.e. the information as distinct from an implementation of it), so that it is clear how it is affected by external factors (e.g. laws, compliance regulations), internal governance mechanisms (e.g. life cycle management) etc.
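To make these three reasons concrete, here is a minimal sketch in Python of how a knowledge base might keep a business concept separate from its implementations, and then answer a simple impact-analysis question. The class names (`BusinessConcept`, `Implementation`), the function `impact_of_system_change` and the sample entries are all hypothetical, chosen only for illustration. A real enterprise knowledge base would hold far richer relationships.

```python
from dataclasses import dataclass, field

@dataclass
class Implementation:
    """Where, and in what form, a business concept is physically held (hypothetical model)."""
    system: str  # e.g. "SAP" or a bespoke application
    form: str    # e.g. "SQL", "XML", "File", "Image"

@dataclass
class BusinessConcept:
    """The information itself, kept distinct from any implementation of it."""
    name: str
    regulations: set[str] = field(default_factory=set)  # external factors, e.g. laws
    related_to: set[str] = field(default_factory=set)   # processes, services, objects
    implementations: list[Implementation] = field(default_factory=list)

def impact_of_system_change(concepts, system):
    """Impact analysis: which concepts are implemented in `system`, and what do they touch?"""
    impacted = {}
    for concept in concepts:
        if any(impl.system == system for impl in concept.implementations):
            impacted[concept.name] = {
                "related_to": sorted(concept.related_to),
                "regulations": sorted(concept.regulations),
            }
    return impacted

# Invented illustrative entries, not real organisational data.
customer = BusinessConcept(
    "Customer",
    regulations={"Data protection law"},
    related_to={"Order-to-cash process", "Billing service"},
    implementations=[Implementation("SAP", "SQL"), Implementation("Web portal", "XML")],
)
invoice = BusinessConcept(
    "Invoice",
    regulations={"Tax record retention"},
    related_to={"Billing service"},
    implementations=[Implementation("SAP", "SQL"), Implementation("Archive", "Image")],
)

print(impact_of_system_change([customer, invoice], "Web portal"))
```

Running it prints `{'Customer': {'related_to': ['Billing service', 'Order-to-cash process'], 'regulations': ['Data protection law']}}`. In other words, changing the web portal touches customer data, the process and service that use it, and the data protection obligations attached to it. The design choice that matters is that regulations and relationships hang off the business concept rather than off any one system, so they survive a change of implementation.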
What is wrong with the current approach
It is clear that this information can't effectively be "managed" in documents or in people's heads (the failure of this approach often becomes apparent within a business/ICT transformation project, and almost always becomes apparent across projects and over time). It is also clear that dedicated tools associated with particular aspects of technology implementation have a role (e.g. ER modellers for relational data, XML modellers, ontology modellers, ETL and data mapping tools etc.). But each of these tools covers only its own technology, so what is required is an understanding of how one moves from the enterprise domain to a technology domain.
What are the impediments
Typically the impediments to maintaining an accurate view of the information/data in an organisation are:
- practitioners in one of the technology silos (e.g. ER modellers, BI/ETL modellers, XML modellers, UML modellers etc.) who cannot see beyond their immediate needs (i.e. to the broader needs of the project and organisation);
- projects which focus on the short term or on a narrow aspect of the business;
- vendors who have an interest in developing direct connections to their own implementation technologies, thereby locking clients into those technologies.
What would an approach to establishing an enterprise knowledge base include:
- Determining the role of the enterprise knowledge base (e.g. to act as a central hub tying together all aspects of what is known about an enterprise) and the degree to which it will manage our knowledge of data (i.e. where the conceptual boundaries lie between, say, an enterprise view and an ER view)
- Determining the nature of the interface between the enterprise knowledge base and dedicated implementation-oriented tools (e.g. ER modellers, XML modellers etc.)
- Inventorying existing (and planned) data, e.g. why it exists, who uses it, where it is implemented (systems, interfaces, stores etc.)
- Benchmarking against industry reference models where they exist; in some industries, Industry Reference Models are available and can be used (see RHE's documents on Industry Reference Models)
- Defining architectural principles about what data should be implemented where and how, e.g. what the source of record should be, what form the data should take, and what the life cycle issues are (including where these are governed by external compliance regulations)
- Analysing business information for context: what drives it, who owns it, and what it relates to (e.g. services, processes)
- Selecting data for optimisation
- Developing optimisation programme
- Integrating the data optimisation work with other programmes and initiatives
- Defining data governance approaches
- Implementing the approaches and optimising the management of the data
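Several of these steps (inventorying, benchmarking against a reference model, and checking an architectural principle such as "one source of record per concept") can be illustrated with a small Python sketch. It assumes a flat inventory of hypothetical `InventoryEntry` records and treats the reference model as a simple list of concept names. Both are simplifications, and the entries are invented rather than taken from any real Industry Reference Model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InventoryEntry:
    """One row of a hypothetical data inventory."""
    concept: str            # business concept, e.g. "Customer"
    system: str             # where it is implemented
    owner: str              # who owns it
    source_of_record: bool  # is this copy the authoritative one?

def missing_from_inventory(inventory, reference_model):
    """Benchmarking: reference-model concepts for which no data is inventoried."""
    held = {entry.concept for entry in inventory}
    return sorted(set(reference_model) - held)

def source_of_record_violations(inventory):
    """Principle check: each concept should have exactly one source of record.

    Returns {concept: number_of_sources_of_record} for every concept that breaks
    the rule (zero sources or more than one).
    """
    counts = {}
    for entry in inventory:
        counts[entry.concept] = counts.get(entry.concept, 0) + int(entry.source_of_record)
    return {concept: n for concept, n in counts.items() if n != 1}

# Invented example data; a real reference model would come from an industry body.
inventory = [
    InventoryEntry("Customer", "SAP", "Sales", True),
    InventoryEntry("Customer", "Web portal", "Marketing", True),
    InventoryEntry("Invoice", "SAP", "Finance", True),
    InventoryEntry("Product", "Spreadsheet", "Operations", False),
]
reference_model = ["Customer", "Invoice", "Product", "Supplier"]

print(missing_from_inventory(inventory, reference_model))  # ['Supplier']
print(source_of_record_violations(inventory))              # {'Customer': 2, 'Product': 0}
```

Concepts flagged by either check are natural candidates for the "selecting data for optimisation" step. How they are then remediated remains a governance decision, not something the code decides.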
[I have used the word "data" throughout and have not attempted to distinguish between "data" and "information", or between "information" and "knowledge". This is not to say that I don't see that distinctions can be drawn; e.g. Chambers gives the following definitions: data is "facts given (quantities, values, names, etc) from which other information may be inferred"; information is "intelligence given; knowledge; ...data"; knowledge is "that which is known; information, instruction; enlightenment, learning". From this it seems that information can equate to knowledge or to data; that data provides the basis for information (which in turn contributes to knowledge); and that information and data can be stored in a system, whereas knowledge implies consciousness (e.g. people).]