When designing your Knowledge Graph, it certainly helps to have some interesting questions that you are trying to answer. “Build it and they will come” approaches didn’t work very well for data warehouse and BI efforts, and the same will be true for your Knowledge Graph. Of course, you can’t connect the dots if you don’t collect the dots, so some of the data you load into the Knowledge Graph won’t directly produce any high-value results. However, the “network effect” of continually combining data will make it possible to answer progressively more difficult questions.
Let’s play a little game to illustrate this – I’m going to show you some data and you’re going to tell me if it’s interesting or boring.
Here is a list of the top executives at BigCorp.
Senior Executives
- Smith
- Jones
- Subrumanian
- …
OK, interesting or boring? Well, if any of you said interesting, you must be in senior management. Obviously, this list is pretty boring.
Let’s try again. Here is a list of IP addresses used at BigCorp.
IP Address
- 10.5.100.10
- 10.5.100.11
- 10.5.100.12
- … (this list goes on for a while)
This should be a little easier. Unless you’re a network administrator, this is obviously really boring.
Now let’s make the plot a little more interesting. The head of security at BigCorp decides to run penetration tests against all the application systems run by the company. This produces a list of IP addresses and vulnerabilities, which is presented to the head of data center operations with the admonition to “fix these issues”. The head of data center operations then turns to his minions and asks, “Which senior business executive owns each of these IP addresses?”
Mr. Smith — OWNS —> 10.5.100.10
Now that’s interesting! But what system can produce this knowledge? It turns out there are quite a few systems and databases involved (some are just in Excel), and the real challenge is connecting the dots, as there is no direct connection between senior management and IP addresses. Let’s be honest: senior management doesn’t know what IP addresses are, and the people who run the data center probably don’t know who the senior managers of the company are. That is why making this connection is very interesting.
Let’s take a look at the path from interesting to boring.
The diagram above illustrates how we can get from senior executive to IP address. The different colors represent the different source systems (Enterprise Architecture tool, Application Portfolio Management tool, IT Asset Management tool, Network Scanner). Most companies already have this information; they just lack the ability to link it all together.
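The linking step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four source systems come from the text, but the intermediate nodes ("Order Management", "OrderApp", "srv-0042") and the relationship labels are hypothetical, invented to make the example runnable. Each source system contributes one hop, and a breadth-first search stitches the hops into a path from executive to IP address.

```python
from collections import deque

# Hypothetical edges, grouped by the source system that would supply them.
# Only "Smith" and "10.5.100.10" come from the example; every other node
# name and relationship label is an assumption made for illustration.
EDGES_BY_SOURCE = {
    "Enterprise Architecture tool": [
        ("Smith", "OWNS", "Order Management"),
    ],
    "Application Portfolio Management tool": [
        ("Order Management", "SUPPORTED_BY", "OrderApp"),
    ],
    "IT Asset Management tool": [
        ("OrderApp", "RUNS_ON", "srv-0042"),
    ],
    "Network Scanner": [
        ("srv-0042", "HAS_IP", "10.5.100.10"),
    ],
}


def build_graph(edges_by_source):
    """Merge the edges from every source system into one adjacency map."""
    graph = {}
    for source, edges in edges_by_source.items():
        for subject, relation, obj in edges:
            graph.setdefault(subject, []).append((relation, obj, source))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search. Returns a list of
    (subject, relation, object, source) hops, or None if no path exists."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt, source in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt, source)]))
    return None


graph = build_graph(EDGES_BY_SOURCE)
path = find_path(graph, "Smith", "10.5.100.10")
for subject, relation, obj, source in path:
    print(f"{subject} -[{relation}]-> {obj}   (from {source})")
```

No single source system contains the answer; the path only exists once all four edge lists are merged into one graph. In practice the hard part is usually agreeing on shared identifiers so that, for instance, the server name in the asset tool matches the one reported by the network scanner.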
What can we learn from this example?
- Simple facts that just state the existence of things (e.g. “there is an IP address 10.5.100.10”) are pretty boring. However, they are essential. You have to collect this information.
- Facts that state the relationship between two things are more interesting, but only when you start crossing boundaries. For example:
  - Fact: This IP address is bound to this L2 interface
    - Not very interesting unless you’re a network admin
    - This information is contained inside the same data domain and organizational boundary
  - Fact: This computer asset has this IP address
    - Getting more interesting, as we’ve crossed a data domain and an organizational boundary (they typically go together)
    - However, the network people and the computer system support people probably know how to figure this out
  - Fact: This Application System runs on this Computer System
    - Even more interesting because we’ve now crossed a knowledge boundary
    - It is very possible that the people who support an application system don’t know what computer systems it runs on, and vice versa. However, the two groups are at least aware of each other.
  - Fact: This senior executive “owns” this IP address
    - Very interesting because now we’ve crossed an awareness boundary. These two worlds (senior executives and network admins) hardly know that the other exists, let alone that there is an important connection between them.
We can conclude that it is the relationships between things that are interesting. And it is not just the relationship between two things, but the path or network of relationships leading from one thing to another, that can produce truly useful knowledge for the organization. Finally, the more boundaries (data domain, organizational, knowledge, and awareness) you cross, the greater the impact your Knowledge Maps will have.