Knowledge Graphs – How to Ask a Good Data Question

by John Singer July 19, 2020 at 3:26 pm

The good news about the Knowledge Graph is that you can start anywhere and build out from there. Because data is organized into facts, each represented on the graph by a subject node, a relationship (predicate), and an object node, you can start with any fact. Individual facts are typically boring, but when linked into a broader web of data they let you easily answer what were once difficult questions.
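To make the idea of linked facts concrete, here is a minimal sketch in Python. It is not how any particular graph database works. The facts, node names, and the follow helper are all made up for illustration. It shows that a question spanning several facts becomes a walk along predicates.

```python
from collections import defaultdict

# Hypothetical facts, each a (subject, predicate, object) triple.
facts = [
    ("order-1001", "placed_by", "customer-42"),
    ("customer-42", "located_in", "Ohio"),
    ("order-1001", "contains", "widget-7"),
    ("widget-7", "supplied_by", "Acme"),
]

# Index facts by subject so we can walk outward from any starting node.
by_subject = defaultdict(list)
for subject, predicate, obj in facts:
    by_subject[subject].append((predicate, obj))


def follow(start, predicates):
    """Follow a chain of predicates from a start node and return the nodes reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {
            obj
            for node in frontier
            for pred, obj in by_subject[node]
            if pred == predicate
        }
    return frontier


# "Who supplies the items on order 1001?" is two hops: contains, then supplied_by.
suppliers = follow("order-1001", ["contains", "supplied_by"])
```

No single fact answers the supplier question. The answer only appears once the "contains" fact and the "supplied_by" fact are linked through the shared node.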

The question is where to start. “Build it and they will come” approaches are to be avoided, so some starting requirements are a good idea. A good way to drive out Knowledge Map requirements is to write a good “data story”.

The elements of a good story

  • Protagonist: There has to be a good guy and a bad guy.  In our case, the protagonist is the person seeking greater information awareness in order to solve a problem.
  • Antagonist: The antagonist isn’t really a person, it’s the situation (stupid systems, broken processes, lack of data) the protagonist finds herself in.
  • Conflict: The conflict, of course, refers to the problem, issue, or crisis that needs to be resolved.
  • Resolution: The resolution is the path through the knowledge map (connecting the dots) that informs the protagonist and resolves the conflict.

The Data Story

Let’s take the general story elements and recast them into a use case template for writing a “data story”.  You can create a fill-in-the-blanks form for the following questions.

  • Actor (protagonist): Who is asking the question?  What role does this person play in what part of the organization?
  • Scenario (conflict): What is the business context? What situation is the Actor in, and what problem needs to be solved?  Who is the problem impacting? What process, organizational, or data issues prevent the problem from being solved?
  • Starting Data (antagonist): What data point(s) are known at the beginning of the scenario?  What systems does this data come from? These are typically the systems the user has immediate access to.
  • Ending Data (resolution): What data does the Actor need to resolve the Scenario?  State how this data will solve the problem described in the Scenario.  Identify the systems that store the missing data and, if possible, demonstrate a “swivel chair” integration of those systems that obtains the needed information.  Identify terminology mismatches between systems that need to be resolved.
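If you would rather capture the template as a structure than as a paper form, one possible shape is sketched below in Python. The class name, field names, and the example values are assumptions made for illustration. The only things taken from the template are the four questions themselves.

```python
from dataclasses import dataclass, field


@dataclass
class DataStory:
    """One filled-in data story (hypothetical structure mirroring the template)."""

    actor: str                       # who is asking, and in what role
    scenario: str                    # business context and the problem to solve
    starting_data: dict[str, str]    # known data point -> source system
    ending_data: dict[str, str]      # needed data point -> system that stores it
    # Pairs of terms that mean the same thing in different systems.
    terminology_mismatches: list[tuple[str, str]] = field(default_factory=list)

    def source_systems(self) -> list[str]:
        """Every system the story touches, i.e. the candidate integrations."""
        systems = set(self.starting_data.values()) | set(self.ending_data.values())
        return sorted(systems)


# Example values are invented purely to show the form filled in.
story = DataStory(
    actor="Customer service rep, support organization",
    scenario="A customer reports a defective item and the rep cannot see who supplied it.",
    starting_data={"order number": "order management system"},
    ending_data={"supplier contact": "procurement system"},
    terminology_mismatches=[("item", "part")],
)
```

Listing the source systems for each story gives a quick view of which integrations the story would require.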

The starting and ending data points (and all links between them), together with their source systems, make up the requirement for adding facts to the knowledge map.  At first, the knowledge map will require many new data integrations to solve the data scenario.  As more data questions are “asked and answered”, many of the needed facts will already exist, so only the missing facts need to be linked in.  All you need is one high-value problem to get started, and then the Knowledge Graph can grow in any direction from there.
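The “only missing facts need to be linked in” point can be sketched as a set difference. This is an illustration under simplifying assumptions: facts are plain tuples, the two stories and their facts are invented, and the answer function is a hypothetical helper rather than part of any tool.

```python
def answer(graph, required_facts):
    """Add whatever the story needs that the graph lacks; return what was added."""
    missing = required_facts - graph
    graph |= missing  # link the missing facts into the existing graph
    return missing


graph = set()

# Hypothetical facts required by two data stories that overlap.
story_one = {
    ("order-1001", "contains", "widget-7"),
    ("widget-7", "supplied_by", "Acme"),
    ("Acme", "has_contact", "acme-support"),
}
story_two = {
    ("widget-7", "supplied_by", "Acme"),
    ("Acme", "has_contact", "acme-support"),
    ("Acme", "located_in", "Ohio"),
}

added_first = answer(graph, story_one)   # empty graph: every fact is new
added_second = answer(graph, story_two)  # only the one fact not yet present
```

The first story pays for three facts. The second reuses two of them and adds only the location fact, which is the growth pattern described above.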

This story originally appeared at DATAVERSITY on August 28th, 2017
