2016: The Year of the Graph

Insights from the inaugural independent Graph Day Conference

The inaugural Graph Day, an independent graph conference, was held on Jan 17th, 2016 in Austin, TX. The brainchild of Lynn Bender, the first Graph Day brought together database vendors, analysts, visualization groups, and data enthusiasts for a day of insightful conversation. The PokitDok team attended in full force and presented a series of the graph team’s open source initiatives. Our slides are available on SlideShare, our Gremlin-Python repository is packaged in a Docker container for exploration, and our custom build of Titan is available here.

Here are our top takeaways from Graph Day Texas 2016:

The proper implementation of a graph db’s architecture can make or break you.
Moving existing use cases and databases to a graph database is probably a good idea: it will let your business extract insightful connections across your data more easily than a relational db would. However, it isn’t advisable to jump into these waters headfirst. First and foremost, what are your use cases, and how should a graph be designed to address them? Implementing a graph database before mapping out its specific use cases will cause downstream design problems. On the other hand, do not make your graph infrastructure more complicated than it needs to be up front. For example, Josh Perryman’s talk, “Graph Database Engine Shout-out: Part 1,” presented a case in which PostgreSQL queries outperformed graph database traversals for the same analytics. In that example, the data was modeled to fit a wide RDBMS table (as opposed to a tall one), and some analytics ran faster in an optimized relational database. However, relational databases are much less effective at storing or expressing relationships between stored data attributes, especially as those relationships become more complex, which is exactly where graph databases become more appealing.
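To make the “more complex relationships” point concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for a relational database. This is an illustration under our own assumptions, not Perryman’s benchmark, and it says nothing about performance: the `EDGES` data and both function names are hypothetical. Asking “what can I reach from this node, at any depth?” requires a recursive CTE in SQL, while the same question is a plain breadth-first traversal over an adjacency list.

```python
import sqlite3
from collections import deque

# Hypothetical relationships; the pair (a, b) means an edge a -> b.
EDGES = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("eve", "alice")]


def reachable_sql(start: str) -> set[str]:
    """Variable-depth reachability in SQL needs a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", EDGES)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    conn.close()
    return {node for (node,) in rows}


def reachable_traversal(start: str) -> set[str]:
    """The same question as a breadth-first traversal over an adjacency list."""
    adjacency: dict[str, list[str]] = {}
    for src, dst in EDGES:
        adjacency.setdefault(src, []).append(dst)
    seen: set[str] = set()
    queue = deque(adjacency.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(adjacency.get(node, []))
    return seen


if __name__ == "__main__":
    print(sorted(reachable_sql("alice")))        # ['bob', 'carol', 'dave']
    print(sorted(reachable_traversal("alice")))  # ['bob', 'carol', 'dave']
```

Both functions return the same answer; the difference is in how naturally the question is expressed once depth is unbounded, which is the kind of workload where a graph model earns its keep.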

Overall, balancing preparation against over-design is a double-edged sword: getting it wrong in either direction can create insurmountable infrastructure problems down the road when building a graph-based solution. Spend the time needed up front to design, and be prepared to iterate, pivot, break, and try again.

2016 is the year of the graph
This was the common theme of the conference, introduced by Lynn Bender, the conference organizer, in his opening remarks. And we couldn’t agree more. As we see it, two mutually reinforcing channels of momentum are required for this field to really take off: graph infrastructure and an open source community. Progress in each area fuels the other; without one, the other cannot gain the traction it needs. On one hand, we are seeing vast improvements to the underlying infrastructure and tooling that are vital to running graph databases in production. On the other hand, business use cases and open source initiatives from the graph community are setting the direction for future infrastructure improvements. The momentum and conversation on both sides of this community point to the broad uptake of graph dbs we expect to see in 2016.

It is still a bit of a wild space.
One of the most useful aspects of the independent Graph Day conference was the chance to talk with vendors across the entire graph stack. From databases to analytics and visualization, the conversations ranged from one developer’s roadblocks to another’s successes, and together they gave a snapshot of the community’s ever-changing view of the leading tools and vendors. While we are betting on DataStax DSE Graph because our use cases demand heavy writes and seamless real-time API processing, it was valuable to gather with like minds and hash through use cases, experiences, and roadblocks with every vendor in the space.

It was also announced at the event that the next independent Graph Day is expected to take place around June in San Francisco. Here are a few things our team would like to see highlighted at the next event:

Infrastructures built with graph databases and traditional stores?
Existing API infrastructures can use event logs such as Kafka to manage updates to multiple stores. For example, you may want each event to be processed and merged into two destinations: Elasticsearch indexes for real-time search and a graph database. The theme here is to use the right tool for each application, and that means multiple tools. How do you manage updates? How do you build an architecture around them? How can you expose search, traversal, OLTP, and OLAP? Tools that claim to do everything may not excel at any one thing. We would like to participate in more discussion of real-time graph architectures at the next event.
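As a minimal sketch of this fan-out pattern, assume an in-memory list as a stand-in for a Kafka topic and two hypothetical sinks in place of Elasticsearch and a graph database; none of the classes below are real client APIs. Each consumer keeps its own offset into the shared log, so the search index and the graph can lag or be rebuilt independently, and because every `apply` is an upsert, replaying an event is harmless.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log; a hypothetical stand-in for a Kafka topic partition."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event


@dataclass
class SearchIndexSink:
    """Hypothetical stand-in for an Elasticsearch index: id -> document."""
    docs: dict[str, dict] = field(default_factory=dict)
    offset: int = 0  # next log position this consumer will read

    def apply(self, event: dict) -> None:
        self.docs[event["id"]] = event["doc"]  # upsert, so replays are idempotent


@dataclass
class GraphSink:
    """Hypothetical stand-in for a graph database: vertices plus labeled edges."""
    vertices: dict[str, dict] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)
    offset: int = 0

    def apply(self, event: dict) -> None:
        self.vertices[event["id"]] = event["doc"]
        for label, target in event.get("links", []):
            self.edges.add((event["id"], label, target))  # set insert is idempotent


def poll(log: EventLog, sink) -> int:
    """Apply every event this sink has not yet seen; return how many were applied."""
    applied = 0
    while sink.offset < len(log.events):
        sink.apply(log.events[sink.offset])
        sink.offset += 1  # advance only after a successful apply
        applied += 1
    return applied


if __name__ == "__main__":
    log = EventLog()
    log.append({"id": "provider:1", "doc": {"name": "Example"},
                "links": [("works_at", "clinic:1")]})
    log.append({"id": "clinic:1", "doc": {"city": "Austin"}})
    search, graph = SearchIndexSink(), GraphSink()
    print(poll(log, search), poll(log, graph))  # 2 2
    print(graph.edges)  # {('provider:1', 'works_at', 'clinic:1')}
```

The design choice here is that the log, not either store, is the source of truth. A real deployment would rely on Kafka consumer groups and committed offsets rather than an in-memory counter.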

How many folks are still using ontologies?
A few talks hinted at building RDF ontologies for their applications in order to run SPARQL queries. Which use cases across the graph community are using this model for analytics? Has it proven more performant and repeatable than the alternatives? Or is it a model relegated to academia rather than the enterprise?

And, just to note, while we did have the opportunity to spend some time in Texas, we did not witness the expected inundation of cowboy hats and boots. Instead, we found Hank. Enjoy.
