Modeling reference data in RavenDB
The question came up in the Mailing list, and I thought it would be a good thing to post about.
How do you handle reference data in RavenDB? The typical example in most applications would be something like the list of states, their names and abbreviations. You might want to refer to a state by its abbreviation, but still allow for something like this:
How do you handle this in RavenDB? Using a relational database, you would probably have a separate table just for states, and it is certainly possible to create something similar to that in RavenDB:
The problem with this approach is that is it a very wrong headed approach one for a document database. In a relational database, you have no choice but to threat anything that has many items as a table with many rows. In a document database, we have much better alternatives.
Instead of threating each state as an individual document, we can treat them as a whole value, like this:
In the sample, I included just a couple of states, but I think that you get the idea. Note the name of this document “Config/States”.
What are the benefits of this approach?
- We only have to retrieve a single document.
- We are almost always going to treat the states as a list, never as individual items.
- There is never a transactional boundary for modifying just a single state.
In fact, I would usually say that this isn’t good enough. I would try to centralized any/all of the reference data that we use in the application into a single document. There is a high likelihood that it would be a fairly small document even after doing so, and it would be insanely easy to cache.
Comments
This is a nice approach but has a drawback: sometimes you need to modify your dictionaries or let your customer modify them. In case of sql database it's a simple insert/update, but in Raven you'll have to edit Json document that can get huge and unmaintainable if there are many dictionaries. Of course you could provide a CRUD gui for that purpose but it's an additional burden for develpoers.
@Rafal
With the Raven Client API it's not that hard, you certainly don't have to edit the Json directly. You just do something like this:
var stateLookup = session.Load<List<State>("Config/states"); ... make changes.. session.Store(stateLookup)
How would you handle localization of those values?
@braco
I guess one way is to have several docs, "Config/en-gb/States", "Config/fr-ca/States", "Config/en-ca/States" etc
@Matt Great, you have shown that Raven has API. And? Am I supposed to tell my customer that they can write a simple program to update a list? Almost everyone in IT can use SQL to modify data and it's quite common practice, but with Raven it's not so easy. Especially if all static data is in a huge Json document.
Please don't threat all those poor many-item'ed tables and states, they are but artifacts of sick minds.
Rafal,
As a serious developers you wouldn't want your customers to insert and update rows in a SQL-table manually but instead have a simple UI for that. In a RDBMS situation it is even more critical since you have to maintain forey-key-relations and I would doubt that someone who gives his customers database-access can assure that he has all the key-constraints set-up correctly in SQL-server...
From the developer perspective, the only drawback I find here is the work needed to maintain the relation, but because the names of states aren't likely to change it doesn't matter here.
not familiar with ravendb, so my question may look dumb, but I'm wondering how mailing list entity looks like (reference to state) and how you load state for each of mailing list item, what displaying grid of mailing list items.
@Matt What happens if you've got a few documents using the same States document and you need to update one or a few states? Will that still work: var stateLookup = session.Load("Config/states"); ... make changes.. session.Store(stateLookup)
@Giedrius this is document DB, not a relational DB. there isn't a relationship between the states config and the mailing list. Instead you would save the state information directly in the the mailing list document.
This is the most difficult concept for developers coming from a relational data background. they ask "how does data persistance work without relations?" or "how do I make raven work like a RDBMS?". for more information on this check out Ayende's nosql tag.
@Rafal
I didn't realize that you were talking about letting the customer edit database tables by hand. In that case, is it really much easier for them to write SQL insert/update statements than using the RavenDB management studio and editing a Json doc?
See http://blogs.hibernatingrhinos.com/5/ravendb-in-practice-part-1-an-introduction-to-ravendb for screen-shots.
BTW I'm not pretending that RavenDB and doc-databases are the answer to everything, just that they make some things easy.
@Nieve
RavenDB has transactions and writing a doc is fully ACID, so the next time you load the doc you'll get the new values. The line I missed out from the code sample was: session.SaveChanges()
This writes the doc to the store as an atomic transaction, see http://ravendb.net/documentation/client-api/docs-documentsession-characteristics for more info.
Rafal, What is the problem with modifying the document? That is what RavenDB is _for_.
Bavo,
{ "States": [ { "Name": [ { "Culture": "en", "Value": "Texas"}, { "Culture": "fr", "Value": "French Texas" } ], "Abbrevation": "TX" } ] }
Rafal, I still fail to see the problem. RavenDB has a management UI that you can use. Beside, if you are advocating FOR having random people execute SQL against the production db, you lost me
What’s the cut off limit for this approach? I thinking what if you’re reference data was very city and town in the US rather than just a list of states? I guess this would still fit in memory without too much of a problem, but if you were going to provide some sort of auto-complete on this then you’d need to provide some sort of efficient way of search this, if each was a document in RavenDb then you could just use an index to ensure an efficient search time on the fields you’re interested in.
Robert, You can still use an index to search inside a single document. Cut off point would be greater than a few megs, but I would probably do is something like: Config/States Config/Texas/Cities Config/Florida/Cities
Etc
And what is the best approach to store this inside a NoSql "db"?
Address Street Country (reference) State (reference) Country Code Name-En/Name-Nl/etc. States (one or more states)
And when showing an address to the user it should show: Street Country.Name-Nl (current culture) State.Name-Nl (current culture)
Thanks
Ayende, The problem i see is general lack of tools for modifying database contents, especially for modifying large Json documents, and general unfamiliarity of db admins with this new database type. But maybe the problem can be alleviated by providing scripts that automate basic tasks and show how to perform typical operations. This is not advocating to allow random people do random things to database, I'm talking about adoption of Raven by IT - usually they are the only people allowed to touch production databases.
@Rafal, The problem with the DB Admins is that relational database was the silver bullet for more than 20 years and now we have new solutions that can be mixed with relational databases or not, they need to get back to the book and learn like all the people who write systems do every single day. I don't know if this is possible with RavenDB but with Mongo theres a command line where you can modify things using "scripts". A kind of javascript if I'm not wrong. In RavenDB I saw that there are a silverlight app, but I don't use this a lot yet, I really don't know all the resources available, you can modify documents there, but I don't know if you can do this using "scripts".
@Marco As far as I'm concerned, the best approach for domain data is replicate this information, since this information almost never change (how many times you see an state or city that changed name?), there will be no problem with this approach.
@Matt or @Ayende ...
can you please elaborate how you would use these states, in a simple document? eg. a User poco/document.
public class { public string Id { get; set; } public string Name { get; set; } public ????? State { get; set; } // <-- state this person was born in. }
Also, why the namespace Config/States ??? is 'Config/' something special?
Marco, The Country / State usually just contains the id of the relevant item, not the actual value. You load that item and then access its name. We have very efficient ways of doing that.
Rafal, Scripts are really easy to write against RavenDB. Hell, you can provide pure JS solution in a few minutes that requires nothing else. As for "general lack of tools", that isn't an issue for RavenDB, it comes already with such a tool to do just that.
Justin, "Config/" is just my conventions for reference data, nothing special about it.
public class Address { public string Id { get; set; } public string Name { get; set; } public string State { get; set; } // <-- state this person was born in. }
@Ayende .. so the value of that Address class is a string .. meaning .. there's no reference to the Config/States document? Just happens to use a value, from there?
Justin, That value can uniquely identify the state in question in that document, but yes. There are no references in RavenDB
Cheers! As always, much appreciated!
FRENCH TEXAS! HAHA
Would you model this as a Config class with a single list? What about nexttime you need som "basic data" for at list? Would you then just extend this existing class/document?
Hi!
I was loading yesterday existing reference/read-most data from sql server into a document. I ended up loading data from SQL Server with EF, shaping it with linq and persisting the shaped data to ravendb in one single document, following this post's ideas.
Is this the recommended way of loading existing data to RavenDB? Wouldn't it be nice to have this kind of one-time loading of data built-in into RavenDB Studio? A tool that would let you specify a sql data source, a linq query to shape/filter results, a destination document and done!
Thanks!
Germán
German, That is how we do it, yes. Having such a tool isn't really useful from our perspective, you already have those tools outside of RavenDB, no need to replicate them inside RavenDB.
Hi!
Is this approach also advisable if you have a lot of information? For example, I have at least 12000 locations with a lot of textual data, country data and so on, but in some parts of this application only id and name are needed as I want to display the name to the user after a webservice gives me the id.
With this in mind I could come up with two solutions: -make one document for each location with all the data and a map/reduce index just to get the subset of information needed in order to populate a select list or decode the id
or
-make one document for each location and another big document that has just the id and name. I would need to get them in sync.
In general, as you said, with sql there are just columns, rows, and relations. With RavenDb I think that we have more choices regarding how to shape our data and for me is just a little bit more diffcullt.
Any chance we get a good blog posts series on data modelling in document databases?
Germán
Germán, I would go with the map/reduce approach, because you are always going to want to search this, never actually show the entire data set.
Comment preview